CN102110071B - Virtual machine cluster system and implementation method thereof - Google Patents

Virtual machine cluster system and implementation method thereof

Info

Publication number
CN102110071B
Authority
CN
China
Prior art keywords
virtual machine
cluster system
machine cluster
computing nodes
computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201110051817
Other languages
Chinese (zh)
Other versions
CN102110071A (en)
Inventor
熊坤
吴楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN 201110051817
Publication of CN102110071A
Application granted
Publication of CN102110071B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a virtual machine cluster system and a method for implementing it, which address the high cost of current cluster systems. The virtual machine cluster system is built on multiple physical servers. Virtual machines run on these physical servers to provide multiple virtual computing nodes, which are divided into one management node and multiple computing nodes. The management node creates a management library that records the host name and dynamic IP address of each computing node, and manages the entire virtual machine cluster system according to these host names and dynamic IP addresses. The computing nodes carry out the computing tasks of the virtual machine cluster system. Because the technical solution builds the cluster system from virtual machines, it reduces the number of physical servers required and lowers the cost of the cluster system.

Figure 201110051817

Description

A virtual machine cluster system and its implementation method

Technical field

The invention relates to the field of computer applications, and in particular to a virtual machine cluster system and a method for implementing it.

Background

A computer cluster system is a computing system in which a loosely integrated set of computer software or hardware is connected and cooperates closely to carry out computing work. High availability in a cluster system generally means that when a node fails, the tasks running on it are automatically migrated to other nodes that are running normally, and that this migration does not affect the operation of the cluster system as a whole.

The hardware of a traditional cluster system generally comprises a server group and shared storage devices. A server group generally contains multiple server nodes, on which the operating system and cluster software must be installed uniformly; the shared storage devices ensure data consistency among the server nodes. In addition, to ensure high availability, redundant backup is generally used, so each server needs a backup device, or even several backup devices per server (known as many-for-one backup). These backup devices are one of the reasons the cost of the whole cluster system remains high.

As technology develops, demand for small-scale cluster systems that perform high-performance computing is gradually increasing, but the small scale also means that the economic cost of the whole cluster system must be kept as low as possible. The high cost of cluster systems has therefore become one of the obstacles to their development.

Summary of the invention

The technical problem to be solved by the present invention is to provide a virtual machine cluster system that overcomes the high cost of current cluster systems.

To solve this technical problem, the present invention provides a virtual machine cluster system built on multiple physical servers. Virtual machines run on these physical servers to provide multiple virtual computing nodes, which are divided into one management node and multiple computing nodes, wherein:

the management node is used to create a management library that records the host name and dynamic IP address of each computing node, and to manage the entire virtual machine cluster system according to these host names and dynamic IP addresses;

the computing nodes are used to carry out the computing tasks of the virtual machine cluster system.

Preferably, the physical servers provide the disk images of the virtual machines in Qemu's copy-on-write format.

Preferably, the management node includes:

a Dynamic Host Configuration Protocol server, used to distribute the dynamic IP addresses;

a domain name server, used for domain name resolution;

a network file system server, used to share the disk image.

Preferably, the management node includes:

an extension server, used to configure a physical server newly connected to the virtual machine cluster system as a physical server that runs computing nodes.

Preferably, the physical servers share disks using a network file system.

Preferably, a heartbeat connection is established between the management node and the computing nodes; the management node periodically sends heartbeat signals to the computing nodes and performs a virtual machine migration operation for any computing node from which no heartbeat feedback is received.

To solve the above technical problem, the present invention also provides a method for implementing a virtual machine cluster system. The virtual machine cluster system is built on multiple physical servers; virtual machines run on these physical servers to provide multiple virtual computing nodes, which are divided into one management node and multiple computing nodes, wherein:

a management library is created on the management node to record the host name and dynamic IP address of each computing node, and the entire virtual machine cluster system is managed according to these host names and dynamic IP addresses;

wherein the computing nodes are used to carry out the computing tasks of the virtual machine cluster system.

Preferably, the physical servers provide the disk images of the virtual machines in Qemu's copy-on-write format.

Preferably, the physical servers share disks using a network file system.

Preferably, a heartbeat connection is established between the management node and the computing nodes.

Compared with the prior art, the technical solution of the present invention builds the cluster system from virtual machines, which reduces the number of physical servers used and lowers the cost of the cluster system. Through the heartbeat connection, the solution can monitor the health of the virtual machines and start live migration of a virtual machine when needed, which keeps the services on the virtual machines running without interruption and makes the whole virtual machine cluster system highly available. The management node handles the management and configuration of the cluster system, while the computing nodes are responsible for computing tasks. With this architecture, computing nodes can be added dynamically without restarting the system, which makes the whole virtual machine cluster system highly scalable. In addition, all virtual machines share one disk image, which greatly reduces the storage space the computing nodes need, and data storage is shared over NFS, so no separate shared storage device is required. This further lowers the cost of the whole virtual machine cluster system.

Other features and advantages of the invention are set forth in the description that follows; in part they will be apparent from the description, or may be learned by practicing the invention. The objectives and other advantages of the invention may be realized and attained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.

Description of the drawings

The accompanying drawings provide a further understanding of the technical solution of the present invention and form part of the description. Together with the embodiments, they serve to explain the technical solution and do not limit it. In the drawings:

FIG. 1 is a schematic diagram of the composition of the virtual machine cluster system of Embodiment 1 of the present invention;

FIG. 2 is a system view of virtualization using KVM in Embodiment 1 of the present invention.

Detailed description

The implementation of the present invention is described in detail below with reference to the drawings and embodiments, so that the way the invention applies technical means to solve the technical problem and achieve its technical effect can be fully understood and put into practice.

First, provided they do not conflict, the embodiments of the present invention and the individual features of those embodiments may be combined with one another, and all such combinations fall within the scope of protection of the present invention. In addition, the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.

With the development of virtualization technology, running multiple operating systems on one server at the same time has become a reality, and hardware virtualization has brought virtual machine performance close to that of the host. With virtualization, a cluster system that once required dozens of servers or more may now need only half as many, or even fewer. The overall progress of virtualization technology also provides the technical basis for keeping the whole cluster system highly available and highly scalable.

Embodiment 1: virtual machine cluster system

FIG. 1 is a schematic diagram of the composition of Embodiment 1. As shown in FIG. 1, the virtual machine cluster system of this embodiment is built on multiple physical servers 120 (which can be accessed over the network from ordinary physical machines 110). The present invention calls these physical servers virtualization service providers (Virtualization Service Provider, VSP). Virtual machines run on these physical servers to provide multiple virtual computing nodes (Virtual Computer Node, VCN) 130. The virtual computing nodes are of two kinds, management node 140 and computing nodes 150: there is exactly one management node 140, and all other virtual computing nodes are computing nodes 150, wherein:

the management node 140, of which there is one and only one in the entire virtual machine cluster system, is used for the configuration and management of the entire system; a management library is created on it to record the host name and dynamic IP address of each computing node, and the entire virtual machine cluster system can be managed according to the host names and dynamic IP addresses recorded in that library;

the computing nodes 150, of which there are several in the entire virtual machine cluster system, are connected to the management node 140 and are used to carry out the computing tasks of the virtual machine cluster system.
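The management library described above can be pictured as a simple mapping from host name to current dynamic IP address. The following is a minimal sketch under that assumption; the class name `ManagementLibrary`, its methods, and the sample host names and addresses are hypothetical and do not come from the patent, which does not specify how the library is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ManagementLibrary:
    """Hypothetical in-memory stand-in for the management library on node 140."""

    # Maps each computing node's host name to its current dynamic IP address.
    nodes: Dict[str, str] = field(default_factory=dict)

    def register(self, hostname: str, ip: str) -> None:
        # Record a new node, or overwrite the stale address of a known one.
        self.nodes[hostname] = ip

    def lookup(self, hostname: str) -> Optional[str]:
        # Return the recorded address, or None for an unknown host name.
        return self.nodes.get(hostname)

    def remove(self, hostname: str) -> None:
        # Forget a node that has left the cluster; unknown names are ignored.
        self.nodes.pop(hostname, None)


library = ManagementLibrary()
library.register("vcn-01", "192.168.10.21")
library.register("vcn-02", "192.168.10.22")
library.register("vcn-01", "192.168.10.35")  # vcn-01 received a different address
print(library.lookup("vcn-01"))
```

Keying the records by host name rather than by address reflects the text's point that the addresses are dynamic: the host name stays stable while the address may change.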

In this embodiment, the structure of the physical server (i.e. VSP) 120 is shown in FIG. 2. It uses the hardware virtualization technology KVM (Kernel-based Virtual Machine): loading the KVM kernel module creates a special character device, /dev/kvm, in the system, which is used to communicate with the user-space device model. Through this device, the address space of each guest operating system is kept independent of the address space of the kernel and of any other running guest operating system. In addition, a guest mode is introduced so that each guest has its own private address space, which is mapped when the guest operating system (Guest OS) is instantiated and within which guest addresses are mapped. Once KVM is loaded, guest operating systems can be started from user space, which completes the provision of the virtualization service. Note that creating the special character device /dev/kvm can be done with existing technology, and the present invention places no specific limitation on it.
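Whether a physical server is ready to act as a VSP can be checked from user space by looking for the /dev/kvm device mentioned above. The sketch below is only an illustration of that check; the function name `kvm_available` is hypothetical, and it assumes that read and write access to the device is what a user-space virtual machine launcher needs.

```python
import os


def kvm_available(device_path: str = "/dev/kvm") -> bool:
    """Return True if the KVM character device exists and is readable and writable."""
    return os.path.exists(device_path) and os.access(device_path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    # On a host without the KVM module loaded this reports False.
    print("KVM usable:", kvm_available())
```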

FIG. 2 is a system view of virtualization using KVM. The hardware platform at the bottom is one that supports hardware virtualization. On top of it runs a Linux operating system (OS), which contains a virtual machine monitor (hypervisor). Guest operating systems (Guest OS) can run on this platform, and the same applications that run on an ordinary operating system can run inside a Guest OS. Note that the KVM virtualization shown in FIG. 2 is only a brief introduction to the prior art, and the present invention does not describe it in detail.

The physical server 120 provides the disk image of the virtual machines in Qemu's copy-on-write (Qemu Copy-On-Write, QCOW) format, which supports a snapshot mode. In snapshot mode, every write that a virtual machine makes to the disk image is written to a separate temporary file instead of to the disk image itself. This sharing mode allows all virtual machines in the entire virtual machine cluster system to use the same disk image without causing data inconsistency, which greatly reduces the storage space the cluster system needs and lowers the cost of storage devices.
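As an illustration of the snapshot mode, the sketch below assembles (but does not run) a QEMU command line that starts a guest from a shared image with QEMU's `-snapshot` option, which redirects guest writes to a temporary file. The helper name `build_guest_command`, the image path, the guest name, the memory size, and the choice of `format=qcow2` are assumptions for the example; the patent only states that the QCOW format is used.

```python
from typing import List


def build_guest_command(image_path: str, name: str, memory_mb: int = 1024) -> List[str]:
    """Assemble a QEMU command line for one computing node without executing it."""
    return [
        "qemu-system-x86_64",
        "-enable-kvm",            # use the KVM accelerator
        "-name", name,
        "-m", str(memory_mb),
        "-drive", f"file={image_path},format=qcow2",
        "-snapshot",              # guest writes go to a temporary file, not the image
    ]


cmd = build_guest_command("/mnt/images/node.qcow2", "vcn-01")
print(" ".join(cmd))
```

Because every guest is started this way, the shared image itself is never modified, which is what lets many guests boot from a single copy.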

In addition, cluster systems in the prior art rely on an extra shared storage device so that all physical servers use a unified storage space. In the virtual machine cluster system of the present invention, the physical servers 120 share disks over NFS. NFS is a distributed file system that lets a local computer use the devices of a computer on the network as if they were local, simply by mounting the remote device on the local system. The disk image therefore needs to be placed on only one VSP, and the other VSPs do not need much disk space. This removes the need for a shared storage device and also reduces the amount of storage space required, which lowers the cost of the cluster system.

In the technical solution of the present invention, a heartbeat connection can be established between the management node 140 and the computing nodes 150. The management node 140 periodically sends a heartbeat signal to all computing nodes 150; receiving the corresponding heartbeat feedback from a computing node 150 indicates that the node is active and in a normal state. If the management node 140 receives no heartbeat feedback from a computing node 150, that node can be considered to be in an abnormal state (for example, failed or hung), which may be caused by a fault in its VSP or in the node itself. In that case a virtual machine migration operation is started, and the computing node is migrated to a VSP that is in a normal state, where it continues to run. This keeps the whole system running reliably, so the high availability of the entire virtual machine cluster system is ensured through live migration of virtual machines.
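The decision logic behind the heartbeat check can be sketched as two small functions: one that finds computing nodes whose last heartbeat feedback is too old, and one that chooses a target VSP for each of them. All names, the timeout value, and the "first healthy VSP other than the current host" placement rule are assumptions made for illustration; the patent does not specify a timeout or a placement policy.

```python
from typing import Dict, List


def find_unresponsive(last_reply: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return the nodes whose last heartbeat feedback is older than `timeout` seconds."""
    return sorted(node for node, seen in last_reply.items() if now - seen > timeout)


def plan_migrations(unresponsive: List[str],
                    placement: Dict[str, str],
                    healthy_vsps: List[str]) -> Dict[str, str]:
    """Map each unresponsive node to a healthy VSP other than its current host."""
    plan = {}
    for node in unresponsive:
        candidates = [vsp for vsp in healthy_vsps if vsp != placement.get(node)]
        if candidates:
            plan[node] = candidates[0]  # simplest possible policy: first candidate
    return plan


# Timestamps (in seconds) of the last heartbeat feedback seen from each node.
last_reply = {"vcn-01": 100.0, "vcn-02": 91.0, "vcn-03": 99.5}
placement = {"vcn-01": "vsp-a", "vcn-02": "vsp-b", "vcn-03": "vsp-a"}

stale = find_unresponsive(last_reply, now=101.0, timeout=5.0)
plan = plan_migrations(stale, placement, healthy_vsps=["vsp-a", "vsp-c"])
print(stale, plan)
```

In a real deployment the plan would be handed to whatever mechanism actually performs the migration; that step is outside this sketch.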

The division of labour is that the management node 140 mainly manages the entire virtual machine cluster system, while the computing nodes 150 mainly carry out computing tasks. For example, if the Condor system (an open-source tool) is deployed in this embodiment, it is sufficient to configure condor_collector, condor_negotiator and condor_schedd on the management node 140 and condor_startd on the computing nodes 150.
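In Condor, the set of daemons started on a machine is normally selected with the `DAEMON_LIST` configuration setting. The sketch below renders that setting for the two roles described above. The role names and the helper `render_daemon_list` are hypothetical, and including `MASTER` (the condor_master daemon that supervises the others) is an assumption that the text does not mention.

```python
# Daemons per role, following the split described in the text.
# MASTER is added as an assumption; the text lists only the other daemons.
ROLE_DAEMONS = {
    "management": ["MASTER", "COLLECTOR", "NEGOTIATOR", "SCHEDD"],
    "compute": ["MASTER", "STARTD"],
}


def render_daemon_list(role: str) -> str:
    """Return the DAEMON_LIST configuration line for the given role."""
    return "DAEMON_LIST = " + ", ".join(ROLE_DAEMONS[role])


print(render_daemon_list("management"))
print(render_daemon_list("compute"))
```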

Because all computing nodes share one disk image, static IP addresses cannot be used: a static IP address would have to be written into the disk image. The management node 140 therefore includes a Dynamic Host Configuration Protocol (DHCP) server, a domain name server (Domain Name Server, DNS) and a Network File System (NFS) server, wherein:

the DHCP server is used to distribute IP addresses dynamically;

the DNS server is used for domain name resolution;

the NFS server is used to share the disk images with every VSP; the disk images are set to read-only mode, and each computing node accesses the disk image in snapshot mode.
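The read-only sharing of the disk image can be sketched as one line in the NFS server's exports file plus a read-only mount on each VSP. The helpers below only build those strings; the function names, the directory `/srv/images`, the client network, the host name `mgmt-node`, and the export options other than `ro` are assumptions for the example.

```python
from typing import List


def exports_line(directory: str, client_network: str) -> str:
    """Build an /etc/exports entry that shares `directory` read-only."""
    return f"{directory} {client_network}(ro,sync,no_subtree_check)"


def mount_command(server: str, directory: str, mount_point: str) -> List[str]:
    """Build the command a VSP would run to mount the shared image directory read-only."""
    return ["mount", "-t", "nfs", "-o", "ro", f"{server}:{directory}", mount_point]


print(exports_line("/srv/images", "192.168.10.0/24"))
print(" ".join(mount_command("mgmt-node", "/srv/images", "/mnt/images")))
```

Exporting the directory read-only matches the text: the image itself is never written, and each guest keeps its own changes in a temporary file through snapshot mode.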

The management node 140 may also include an extension server, which is used to configure a physical server newly connected to the virtual machine cluster system as a VSP that runs computing nodes. This management model makes it very easy to enlarge the virtual machine cluster system: the newly added physical server only needs to be set up as a VSP that runs computing nodes and then configured appropriately. The new computing nodes are then added on the management node 140, which brings them into the cluster computing system. Neither the virtual machine cluster system nor the management node 140 needs to be restarted during this process, so new physical servers can be added dynamically.

Embodiment 2: method for implementing the virtual machine cluster system

With reference to the virtual machine cluster system shown in FIG. 1 and its description, this embodiment implements that virtual machine cluster system. It is built on multiple physical servers; virtual machines run on these physical servers to provide multiple virtual computing nodes, which are divided into one management node and multiple computing nodes, wherein:

a management library is created on the management node to record the host name and dynamic IP address of each computing node, and the entire virtual machine cluster system is managed according to these host names and dynamic IP addresses;

wherein the computing nodes are used to carry out the computing tasks of the virtual machine cluster system;

wherein the physical servers provide the disk images of the virtual machines in Qemu's copy-on-write format;

wherein the physical servers share disks using a network file system;

wherein a heartbeat connection is established between the management node and the computing nodes.

The implementation method provided in this embodiment should be understood with reference to the technical solution of the virtual machine cluster system provided in Embodiment 1.

Traditional cluster systems are built from physical servers and shared storage, both of which require a large investment. The technical solution of the present invention builds the cluster system from virtual machines. This reduces the number of physical servers and raises their utilization, and virtual machine technology can also be used to improve the availability and scalability of the cluster system. Costs fall significantly while the cluster system still serves its purpose of high-performance computing.

In the technical solution of the present invention, live migration of virtual machines is used to make the cluster system highly available. Live migration means that the services running in a virtual machine are not interrupted while the virtual machine is being migrated. It transfers the virtual machine's memory state by iterative copying. A virtual machine (or computing node) that has failed or is in some other abnormal working state is migrated live to another VSP, which preserves service continuity and makes the cluster system highly available.
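The iterative copying mentioned above is commonly known as pre-copy: memory is copied while the guest keeps running, and each later round re-sends only the pages that were modified during the previous round, until the remainder is small enough to send during a brief pause. The toy simulation below illustrates the idea under a deliberately simple assumption, namely that a fixed fraction (`dirty_ratio`) of the pages sent in a round is dirtied again during that round. The function name and all parameters are hypothetical, and no real hypervisor interface is involved.

```python
from typing import List, Tuple


def precopy_rounds(total_pages: int,
                   dirty_ratio: float,
                   stop_threshold: int,
                   max_rounds: int = 30) -> Tuple[List[int], int]:
    """Simulate iterative pre-copy.

    Returns the number of pages sent in each round and the number of pages
    still dirty when the loop stops (to be sent while the guest is paused).
    """
    if not 0.0 <= dirty_ratio < 1.0:
        raise ValueError("dirty_ratio must be in [0, 1) for the copy to converge")
    sent = []
    remaining = total_pages
    while remaining > stop_threshold and len(sent) < max_rounds:
        sent.append(remaining)                    # copy every currently dirty page
        remaining = int(remaining * dirty_ratio)  # pages dirtied again meanwhile
    return sent, remaining


sent, remaining = precopy_rounds(total_pages=1000, dirty_ratio=0.5, stop_threshold=10)
print(len(sent), "rounds, pages per round:", sent, "final pause copies", remaining)
```

The `max_rounds` cap reflects a general design concern: if the guest dirties memory nearly as fast as it can be copied, the loop would otherwise make little progress, so a bound forces the final pause-and-copy step.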

The technical solution of the present invention uses one management node and multiple computing nodes, which makes dynamic expansion of the cluster system easy. To enlarge the cluster system, it is enough to add the corresponding physical servers, create computing nodes on them, and then update the configuration information on the management node. The newly added physical servers then become part of the existing cluster system without restarting the cluster system or the management node. The computing capacity of the cluster system can thus be raised quickly, and computing performance grows almost linearly with the number of physical servers.

With the technical solution of the present invention, the high availability of the proposed cluster system does not require additional backup devices, which saves their cost. In addition, all virtual machines of the proposed cluster system share one image, which greatly reduces the demand for and use of storage space. Furthermore, the cluster system shares disk space over NFS, so no separate shared storage device needs to be purchased, which lowers the cost of storage.

Those skilled in the art should understand that the components of the system embodiment and the steps of the method embodiment described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.

Although the embodiments of the present invention are disclosed above, they are described only to make the invention easier to understand and are not intended to limit it. Anyone skilled in the technical field to which the invention belongs may make modifications and changes to the form and details of the implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention remains that defined by the appended claims.

Claims (9)

1. virtual machine cluster system, it is characterized in that, this virtual machine cluster system is structured on a plurality of physical servers, move virtual machines so that a plurality of virtual computing nodes to be provided at those physical servers, those virtual computing nodes are divided into a management node and a plurality of computing node, virtual computing nodes all in the whole virtual machine cluster system all use same disk mirroring, wherein:
This management node is used for creating host name and the dynamic IP addressing of a management holder to record each computing node, according to those host name and dynamic IP addressing, whole virtual machine cluster system is managed; This management node comprises expansion servers, and this expansion servers is configured to move the virtualization services supplier VSP of computing node for the physical server that will newly access this virtual machine cluster system;
Those computing nodes are for the calculation task of finishing this virtual machine cluster system.
2. virtual machine cluster system according to claim 1 is characterized in that:
Those physical servers provide the disk mirroring of virtual machine for the Copy on write form that adopts Qemu.
3. The virtual machine cluster system according to claim 2, characterized in that the management node further comprises:
A Dynamic Host Configuration Protocol (DHCP) server, configured to dynamically allocate the dynamic IP addresses;
A domain name server, configured to perform domain name resolution;
A network file system (NFS) server, configured to share the disk image.
4. The virtual machine cluster system according to claim 1, characterized in that:
The physical servers are configured to share disks using a network file system (NFS).
5. The virtual machine cluster system according to claim 1, characterized in that:
A heartbeat connection is established between the management node and the computing nodes; the management node periodically sends heartbeat signals to the computing nodes and performs a virtual machine migration operation on any computing node from which no heartbeat response is received.
6. A method for implementing a virtual machine cluster system, characterized in that the virtual machine cluster system is built on a plurality of physical servers, virtual machines are run on the physical servers to provide a plurality of virtual computing nodes, the virtual computing nodes are divided into one management node and a plurality of computing nodes, and all virtual computing nodes in the entire virtual machine cluster system use the same disk image, wherein:
A management repository is created on the management node to record the host name and dynamic IP address of each computing node, and the entire virtual machine cluster system is managed according to the host names and dynamic IP addresses; an expansion server of the management node sets up a physical server newly added to the virtual machine cluster system as a virtualization service provider (VSP) on which computing nodes can run;
Wherein the computing nodes are used to complete the computing tasks of the virtual machine cluster system.
7. The method according to claim 6, characterized in that:
The physical servers provide the disk image of the virtual machines in the copy-on-write format of QEMU.
8. The method according to claim 6, characterized in that:
The physical servers share disks using a network file system (NFS).
9. The method according to claim 6, characterized in that:
A heartbeat connection is established between the management node and the computing nodes.
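
By way of illustration only, the record keeping and heartbeat check recited in the claims above can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of the invention: the class names `ComputeNode` and `ManagementRegistry`, the `timeout` parameter, and the example host names and addresses are all hypothetical and do not appear in the patent. The sketch records the host name and dynamic IP address of each computing node and reports the nodes whose heartbeat response is overdue, which are the candidates for a virtual machine migration operation.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    hostname: str
    ip: str            # dynamic address, e.g. one leased by a DHCP server
    last_reply: float  # time of the most recent heartbeat response


class ManagementRegistry:
    """Hypothetical stand-in for the management node's record of computing nodes."""

    def __init__(self, timeout: float):
        # Assumption: a node is treated as unresponsive after `timeout` seconds.
        self.timeout = timeout
        self.nodes: dict[str, ComputeNode] = {}

    def register(self, hostname: str, ip: str, now: float) -> None:
        # Record the host name and dynamic IP address of a computing node.
        self.nodes[hostname] = ComputeNode(hostname, ip, now)

    def record_reply(self, hostname: str, now: float) -> None:
        # Called when a heartbeat response arrives from a computing node.
        self.nodes[hostname].last_reply = now

    def nodes_to_migrate(self, now: float) -> list[str]:
        # Nodes whose last response is older than the timeout.
        return sorted(
            name for name, node in self.nodes.items()
            if now - node.last_reply > self.timeout
        )


registry = ManagementRegistry(timeout=3.0)
registry.register("node01", "192.168.0.11", now=0.0)
registry.register("node02", "192.168.0.12", now=0.0)
registry.record_reply("node01", now=4.0)
print(registry.nodes_to_migrate(now=5.0))  # ['node02']
```

Timestamps are passed in explicitly rather than read from the system clock so that the behaviour is easy to check; a real management node would presumably use a clock and an actual network heartbeat, neither of which is modelled here.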
CN 201110051817 2011-03-04 2011-03-04 Virtual machine cluster system and implementation method thereof Active CN102110071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110051817 CN102110071B (en) 2011-03-04 2011-03-04 Virtual machine cluster system and implementation method thereof


Publications (2)

Publication Number Publication Date
CN102110071A CN102110071A (en) 2011-06-29
CN102110071B true CN102110071B (en) 2013-04-17

Family

ID=44174235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110051817 Active CN102110071B (en) 2011-03-04 2011-03-04 Virtual machine cluster system and implementation method thereof

Country Status (1)

Country Link
CN (1) CN102110071B (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355369B (en) * 2011-09-27 2014-01-08 华为技术有限公司 Virtualized cluster system and its processing method and equipment
CN102404385A (en) * 2011-10-25 2012-04-04 华中科技大学 Virtual cluster deployment system and deployment method for high performance computing
CN102412988A (en) * 2011-11-14 2012-04-11 浪潮(北京)电子信息产业有限公司 A business information system and its method for realizing continuous operation
US10031782B2 (en) * 2012-06-26 2018-07-24 Juniper Networks, Inc. Distributed processing of network device tasks
CN102970204B (en) * 2012-10-24 2017-09-01 曙光信息产业(北京)有限公司 A kind of distribution switch system and its implementation based on xen virtual platforms
CN102981929B (en) * 2012-11-05 2015-11-25 曙光云计算技术有限公司 The management method of disk mirroring and system
EP2911346B1 (en) * 2012-11-13 2017-07-05 Huawei Technologies Co., Ltd. Method and network device for establishing virtual cluster
CN103019625B (en) * 2012-12-13 2016-01-06 中国电信股份有限公司 Storage controlling method and equipment
CN103067501B (en) * 2012-12-28 2015-12-09 广州杰赛科技股份有限公司 The large data processing method of PaaS platform
US9483352B2 (en) * 2013-09-27 2016-11-01 Fisher-Rosemount Systems, Inc. Process control systems and methods
CN104601622B (en) * 2013-10-31 2018-04-17 国际商业机器公司 A kind of method and system for disposing cluster
TWI676898B (en) * 2013-12-09 2019-11-11 安然國際科技有限公司 Decentralized memory disk cluster storage system operation method
CN103605562B (en) * 2013-12-10 2017-05-03 浪潮电子信息产业股份有限公司 Method for migrating kernel-based virtual machine (KVM) between physical hosts
CN103647840A (en) * 2013-12-19 2014-03-19 深圳市青葡萄科技有限公司 Distributed management method of symmetric cluster
CN103729234B (en) * 2013-12-20 2017-06-27 中电长城网际系统应用有限公司 A kind of cluster virtual machine management method and device
CN103677961A (en) * 2013-12-20 2014-03-26 国云科技股份有限公司 Method for setting host name of virtual machine
CN103729233A (en) * 2013-12-20 2014-04-16 中电长城网际系统应用有限公司 Multiple virtual machines management method and device
WO2015135181A1 (en) * 2014-03-13 2015-09-17 华为技术有限公司 Graphic processing method, guest operating system (os) and guest os system
EP3163461B1 (en) * 2014-07-31 2022-05-11 Huawei Technologies Co., Ltd. Communication system and communication method
CN104318091B (en) * 2014-10-13 2017-03-15 航天东方红卫星有限公司 A kind of moonlet ground test method based on virtualization computer system
US9411628B2 (en) * 2014-11-13 2016-08-09 Microsoft Technology Licensing, Llc Virtual machine cluster backup in a multi-node environment
CN106302569B (en) * 2015-05-14 2019-06-18 华为技术有限公司 Method and computer system for processing virtual machine clusters
CN105978915A (en) * 2016-07-19 2016-09-28 浪潮电子信息产业股份有限公司 Security isolation method based on cloud resource control
CN106502797A (en) * 2016-10-28 2017-03-15 郑州云海信息技术有限公司 A kind of group system and the dispositions method of group system
CN106790477B (en) * 2016-12-12 2020-05-15 广州杰赛科技股份有限公司 System and method for realizing cloud classroom cluster
CN107241460B (en) * 2017-06-30 2020-06-23 联想(北京)有限公司 Floating address processing method and electronic equipment
CN107632937B (en) * 2017-10-10 2020-08-21 苏州浪潮智能科技有限公司 Method and device for testing virtual machine cluster and readable storage medium
CN108469996A (en) * 2018-03-13 2018-08-31 山东超越数控电子股份有限公司 A kind of system high availability method based on auto snapshot
CN110780973B (en) * 2018-07-31 2025-08-12 中兴通讯股份有限公司 Virtual machine migration device, method, equipment and readable storage medium
CN109032761A (en) * 2018-08-06 2018-12-18 郑州云海信息技术有限公司 Automatic deployment virtual machine and the method for installing OS automatically under a kind of Linux
CN110837451B (en) * 2018-08-16 2023-08-15 中国移动通信集团重庆有限公司 Processing method, device, equipment and medium for high availability of virtual machine
CN109491764A (en) * 2018-11-20 2019-03-19 郑州云海信息技术有限公司 A kind of virtual-machine fail management method based on openstack
CN109558212B (en) * 2018-11-27 2023-07-14 深信服科技股份有限公司 A virtualization management method, system, physical equipment and medium of physical equipment
US10503543B1 (en) * 2019-02-04 2019-12-10 Cohesity, Inc. Hosting virtual machines on a secondary storage system
CN111209025A (en) * 2020-01-19 2020-05-29 山东浪潮通软信息科技有限公司 SaaS platform implementation method based on heartbeat mechanism
CN112527325B (en) * 2020-11-23 2024-07-09 山东乾云启创信息科技股份有限公司 Deployment method and system applied to super fusion architecture
CN115202803A (en) * 2021-04-13 2022-10-18 超聚变数字技术有限公司 Fault processing method and device
CN116346332B (en) * 2023-03-10 2025-08-12 中安云科科技发展(山东)有限公司 Cluster creation method of virtualized cryptographic machine

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101504620A (en) * 2009-03-03 2009-08-12 华为技术有限公司 Load balancing method, apparatus and system of virtual cluster system
CN101594387A (en) * 2009-06-29 2009-12-02 北京航空航天大学 Virtual cluster deployment method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8973098B2 (en) * 2007-01-11 2015-03-03 International Business Machines Corporation System and method for virtualized resource configuration


Also Published As

Publication number Publication date
CN102110071A (en) 2011-06-29

Similar Documents

Publication Publication Date Title
CN102110071B (en) Virtual machine cluster system and implementation method thereof
US8370833B2 (en) Method and system for implementing a virtual storage pool in a virtual environment
US7984108B2 (en) Computer system para-virtualization using a hypervisor that is implemented in a partition of the host system
US9164795B1 (en) Secure tunnel infrastructure between hosts in a hybrid network environment
US8473692B2 (en) Operating system image management
CN103491144B (en) A Construction Method of Wide Area Network Virtual Platform
US9197489B1 (en) Live migration of virtual machines in a hybrid network environment
US9928107B1 (en) Fast IP migration in a hybrid network environment
CN102394774B (en) Service state monitoring and failure recovery method for controllers of cloud computing operating system
US20160127206A1 (en) Rack awareness data storage in a cluster of host computing devices
WO2020123149A1 (en) Computing service with configurable virtualization control levels and accelerated launches
CN110912991A (en) Super-fusion-based high-availability implementation method for double nodes
EP2805239A1 (en) Systems and methods for server cluster application virtualization
CN104077199A (en) Isolation method and system for high availability cluster based on shared disk
CN115904608B (en) Control plane configuration
CN112084007A (en) NAS storage upgrade method and device based on virtual machine technology
CN112579008A (en) Storage deployment method, device, equipment and storage medium of container arrangement engine
US20210182116A1 (en) Method for running a quorum-based system by dynamically managing the quorum
CN106612314A (en) System for realizing software-defined storage based on virtual machine
CN115904603A (en) Method and device for virtual machine application migration between heterogeneous platforms
US20230176884A1 (en) Techniques for switching device implementations for virtual devices
TWI763331B (en) Backup method and backup system for virtual machine
US11561856B2 (en) Erasure coding of replicated data blocks
CN107329805A (en) The implementation method and device of a kind of virtual platform high availability

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201123

Address after: 215100 No. 1 Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province

Patentee after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 100085 1st Floor, Building C, No. 2-1, Shangdi Information Road, Haidian District, Beijing

Patentee before: Inspur (Beijing) Electronic Information Industry Co.,Ltd.

CP03 Change of name, title or address

Address after: Building 9, No. 1, Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Patentee after: Suzhou Yuannao Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: Building 9, No. 1, Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Patentee before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China