
CN102136972A - Super large scale cluster monitoring system and method

Info

Publication number: CN102136972A
Application number: CN2011100695219A
Authority: CN
Inventors: 赵欢; 温鑫; 邵宗有
Original assignee: Dawning Information Industry Co Ltd
Current assignee: Dawning Information Industry Co Ltd
Priority date: 2011-03-22
Filing date: 2011-03-22
Publication date: 2011-07-27

Abstract

The invention relates to a super large scale cluster monitoring system and method. The system comprises subregion monitoring servers and a center monitoring server, wherein each subregion monitoring server acquires internal information of each subregion cluster and pushes the information to the center monitoring server; the center monitoring server receives the information from the subregion monitoring server, uniformly configures all the subregions according to the information, and then provides a uniform configuration result to each subregion monitoring server; and the subregion monitoring server receives the uniform configuration result from the center monitoring server, and monitors and manages a cluster according to the result. In the invention, a uniform centralized monitoring platform is adopted, and high-efficiency monitoring and management on a super large scale cluster are effectively realized.

Description

An ultra-large-scale cluster monitoring system and method

Technical field

The present invention relates to the field of monitoring high-performance computing clusters, and in particular to an ultra-large-scale cluster monitoring system and method.

Background technology

As the number of servers grows year by year, administrators need to grasp the state of the cluster promptly and to monitor the cluster in real time.

Monitoring an ultra-large-scale cluster (more than 2000 servers) runs into various performance bottlenecks. The present technology addresses the performance problems of unified aggregation of cluster state and of mass data storage and query in ultra-large-scale clusters. A system and method are therefore needed to solve these problems.

Summary of the invention

To solve the above problems, the present invention provides an ultra-large-scale cluster monitoring system and method.

An ultra-large-scale cluster monitoring system comprises subregion monitoring servers and a center monitoring server.

Each subregion monitoring server comprises:

a monitoring module that collects information from inside the subregion;

a push module that pushes this information to the center monitoring server; and

a receiver module that receives the unified configuration result from the center monitoring server.

The center monitoring server comprises:

a receiver module that receives the information from the subregion monitoring servers;

a configuration module that uniformly configures the subregion monitoring servers according to their monitoring information; and

a sending module that provides the unified configuration result to the subregion monitoring servers.

Preferably, the information from the subregion monitoring server comprises at least one of alarm information, CPU utilization, and memory usage.

Preferably, the unified configuration result comprises at least one of administrator permission information, user management configuration, alarm configuration, and information collection configuration.
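The two preferred feature lists above can be pictured as simple records. The following Python sketch is illustrative only: the class names, field names, field types, and the `is_valid` helper are assumptions made for this example and do not appear in the source, which names the categories of information but not their representation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubregionInfo:
    """Information a subregion monitoring server pushes (hypothetical layout)."""
    alarms: list = field(default_factory=list)   # alarm information
    cpu_utilization: Optional[float] = None      # assumed fraction in [0, 1]
    memory_usage_bytes: Optional[int] = None     # assumed unit: bytes

    def is_valid(self) -> bool:
        # "At least one of" the three items must be present.
        return (bool(self.alarms)
                or self.cpu_utilization is not None
                or self.memory_usage_bytes is not None)


@dataclass
class UnifiedConfig:
    """Unified configuration result sent by the center (hypothetical layout)."""
    admin_permissions: Optional[dict] = None
    user_management: Optional[dict] = None
    alarm_config: Optional[dict] = None
    collection_config: Optional[dict] = None

    def is_valid(self) -> bool:
        # "At least one of" the four items must be present.
        return any(v is not None for v in (
            self.admin_permissions, self.user_management,
            self.alarm_config, self.collection_config))


info = SubregionInfo(cpu_utilization=0.42)
config = UnifiedConfig(alarm_config={"cpu_threshold": 0.9})
```

Making every field optional mirrors the "at least one of" wording: a record is acceptable as soon as one item is filled in.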

An ultra-large-scale cluster monitoring method comprises the following steps:

A. Each subregion monitoring server collects the internal information of its subregion cluster and pushes this information to the center monitoring server;

B. The center monitoring server receives the information from the subregion monitoring servers, uniformly configures all subregions according to the information, and then provides the unified configuration result to the subregion monitoring servers;

C. Each subregion monitoring server receives the unified configuration result from the center monitoring server, and monitors and manages its cluster according to this result.

Preferably, the unified configuration result comprises at least one of administrator permissions, user management configuration, alarm configuration, and information collection configuration.

By using a unified, centralized monitoring platform, the present invention effectively achieves efficient monitoring and management of ultra-large-scale clusters.

Description of drawings

Fig. 1 is a structural diagram of the ultra-large-scale cluster monitoring system according to the present invention;

Fig. 2 is a flow chart of the ultra-large-scale cluster monitoring method according to the present invention.

Embodiment

Fig. 1 is a structural diagram of the ultra-large-scale cluster monitoring system according to the present invention. As shown in Fig. 1, the system comprises a plurality of subregion monitoring servers 100 and a center monitoring server 200. Each subregion monitoring server 100 collects the information from inside its subregion and sends the collected information to the center monitoring server 200 by pushing it. Push is used, rather than having the center actively fetch information from each subregion, because when there are many subregions, fetching from all of them at the same time easily causes a network bandwidth bottleneck. With push, the push times of the individual subregions are random, which largely relieves the bandwidth pressure of sending information simultaneously. The center monitoring server 200 receives the information from the plurality of subregion monitoring servers 100, uniformly configures all subregions according to the information, and provides the unified configuration result to the subregion monitoring servers 100.
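The bandwidth argument above can be illustrated with a small Python simulation. It is a sketch under stated assumptions, not the scheme of the invention: the source says only that push times are random, so the uniform distribution, the 60-second reporting interval, the one-second buckets, and the function names `push_offsets` and `peak_concurrency` are all hypothetical.

```python
import random
from collections import Counter


def push_offsets(num_subregions, interval_s, seed=None):
    """Assign each subregion a random push time within one reporting interval.

    Assumption: offsets are uniform in [0, interval_s).
    """
    rng = random.Random(seed)
    return [rng.random() * interval_s for _ in range(num_subregions)]


def peak_concurrency(send_times, bucket_s=1.0):
    """Largest number of transfers that start within the same time bucket."""
    buckets = Counter(int(t // bucket_s) for t in send_times)
    return max(buckets.values())


# Center fetches from 100 subregions at the same instant: all transfers collide.
synchronized = [0.0] * 100

# Each subregion pushes at its own random moment within a 60 s interval.
jittered = push_offsets(100, 60.0, seed=1)

print("peak when fetched simultaneously:", peak_concurrency(synchronized))
print("peak with random push times:", peak_concurrency(jittered))
```

The simultaneous case concentrates all 100 transfers in a single bucket, while random push times spread them across the interval, so the peak load on the link to the center is much lower.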

The subregion monitoring server 100 comprises a monitoring module 110, a push module 120, and a configuration receiver module 130. The monitoring module 110 collects the information from inside the subregion. The push module 120 pushes the collected information to the center monitoring server 200. The configuration receiver module 130 receives the unified configuration result from the center monitoring server 200.

The center monitoring server 200 comprises a receiver module 210, a configuration module 220, and a sending module 230. The receiver module 210 receives the information from the plurality of subregion monitoring servers 100. The configuration module 220 uniformly configures all subregions. The sending module 230 provides the unified configuration result to the subregion monitoring servers 100.

The ultra-large-scale cluster monitoring method of the present invention is described in detail below with reference to Fig. 2. The method comprises the following steps:

Step S210: each subregion monitoring server 100 collects the information from inside its subregion cluster and pushes the information to the center monitoring server 200.

Step S220: the center monitoring server 200 receives the information from the subregion monitoring servers 100, uniformly configures all subregions, and then provides the unified configuration result to each subregion monitoring server 100.

Step S230: each subregion monitoring server 100 receives the unified configuration result from the center monitoring server 200 and monitors and manages its subregion according to the unified configuration result.
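Steps S210 to S230 can be sketched in Python as follows. This is a minimal in-process illustration, not the implementation of the invention: the class and method names are hypothetical, network transport is replaced by direct method calls, and the configuration policy (a single CPU alarm threshold of 0.9 shared by all subregions) is an assumption, since the source does not say how the unified configuration is derived.

```python
from dataclasses import dataclass, field


@dataclass
class CenterMonitoringServer:
    reports: dict = field(default_factory=dict)

    def receive(self, subregion_id, info):
        """Receiver module 210: store the latest report per subregion."""
        self.reports[subregion_id] = info

    def configure(self):
        """Configuration module 220 (hypothetical policy, not from the source)."""
        return {"cpu_alarm_threshold": 0.9,
                "subregions": sorted(self.reports)}

    def send(self, subregions):
        """Sending module 230: give every subregion the same configuration."""
        config = self.configure()
        for subregion in subregions:
            subregion.receive_config(config)


@dataclass
class SubregionMonitoringServer:
    subregion_id: str
    node_cpu: dict                      # node name -> CPU utilization in [0, 1]
    config: dict = field(default_factory=dict)

    def collect(self):
        """Monitoring module 110: gather information inside the subregion."""
        return {"cpu_utilization": dict(self.node_cpu)}

    def push(self, center):
        """Push module 120: send the collected information to the center."""
        center.receive(self.subregion_id, self.collect())

    def receive_config(self, config):
        """Configuration receiver module 130."""
        self.config = config

    def alarms(self):
        """Monitor the subregion according to the unified configuration."""
        threshold = self.config["cpu_alarm_threshold"]
        return sorted(n for n, u in self.node_cpu.items() if u > threshold)


center = CenterMonitoringServer()
subregions = [
    SubregionMonitoringServer("A", {"a1": 0.95, "a2": 0.40}),
    SubregionMonitoringServer("B", {"b1": 0.10}),
]
for s in subregions:          # step S210: collect and push
    s.push(center)
center.send(subregions)       # step S220: configure uniformly and distribute
for s in subregions:          # step S230: monitor under the unified config
    print(s.subregion_id, s.alarms())
```

Because every subregion receives the same configuration object, a change to the policy at the center takes effect in all subregions on the next distribution.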

It should be understood that the above embodiment is merely illustrative and does not mean that the present invention can be realized only by this embodiment. Those of ordinary skill in the art may propose other modifications or variations based on the above scheme, and all such modifications or variations fall within the scope of the present invention.

The ultra-large-scale cluster monitoring system of the present invention realizes the monitoring of ultra-large-scale clusters and can support the monitoring of up to tens of thousands of servers; the monitoring information occupies little network bandwidth, and the real-time performance of the monitoring is good. The system also provides an integration interface and offers extensibility, integrability, reliability, and ease of use, thereby meeting the needs of various commercial and custom management tools for monitoring ultra-large-scale clusters.

Claims

1. An ultra-large-scale cluster monitoring system, characterized in that the system comprises subregion monitoring servers and a center monitoring server;

each subregion monitoring server comprises:

a monitoring module that collects information from inside the subregion;

the center monitoring server comprises:

2. The ultra-large-scale cluster monitoring system according to claim 1, characterized in that the information from the subregion monitoring server comprises at least one of alarm information, CPU utilization, and memory usage.

3. The ultra-large-scale cluster monitoring system according to claim 1, characterized in that the unified configuration result comprises at least one of administrator permission information, user management configuration, alarm configuration, and information collection configuration.

4. An ultra-large-scale cluster monitoring method, characterized in that the steps are as follows:

5. The ultra-large-scale cluster monitoring method according to claim 4, characterized in that the information from the subregion monitoring server comprises at least one of alarm information, CPU utilization, and memory usage.

6. The ultra-large-scale cluster monitoring method according to claim 4, characterized in that the unified configuration result comprises at least one of administrator permissions, user management configuration, alarm configuration, and information collection configuration.