CN102136972A - Super large scale cluster monitoring system and method
- Publication number
- CN102136972A (application CN2011100695219A)
- Authority
- CN
- China
- Prior art keywords
- monitoring server
- subregion
- information
- configuration
- monitoring
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention relates to a super large scale cluster monitoring system and method. The system comprises subregion monitoring servers and a center monitoring server, wherein each subregion monitoring server acquires internal information of each subregion cluster and pushes the information to the center monitoring server; the center monitoring server receives the information from the subregion monitoring server, uniformly configures all the subregions according to the information, and then provides a uniform configuration result to each subregion monitoring server; and the subregion monitoring server receives the uniform configuration result from the center monitoring server, and monitors and manages a cluster according to the result. In the invention, a uniform centralized monitoring platform is adopted, and high-efficiency monitoring and management on a super large scale cluster are effectively realized.
Description
Technical field
The present invention relates to the field of high-performance computing cluster monitoring, and in particular to an ultra-large cluster monitoring system and method.
Background art
As the number of servers grows year by year, administrators need to keep track of the state of the cluster in a timely manner and to monitor the cluster in real time.
Monitoring an ultra-large cluster (more than 2000 servers) runs into various performance bottlenecks. The present technique addresses the performance problems of collecting the state of an ultra-large cluster in a unified way and of storing and querying massive amounts of data. A system and method that solve these problems are therefore needed.
Summary of the invention
To solve the above problems, the present invention provides an ultra-large cluster monitoring system and method.
An ultra-large cluster monitoring system comprises a subregion monitoring server and a center monitoring server;
the subregion monitoring server comprises:
a monitoring module that collects information inside the subregion;
a pushing module that pushes the information to the center monitoring server;
a receiving module that receives the unified configuration result from the center monitoring server;
the center monitoring server comprises:
a receiving module that receives the information from the subregion monitoring server;
a configuration module that uniformly configures the subregion servers according to the monitoring information of the subregion servers;
a sending module that provides the unified configuration result to the subregion monitoring server.
Preferably, the information of the subregion monitoring server comprises at least one of alarm information, CPU utilization, and occupied memory space.
Preferably, the unified configuration result comprises at least one of administrator permission information, user management configuration, alarm configuration, and information collection configuration.
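As an illustration only, the two kinds of data exchanged between the servers could be modelled as below. This is a minimal sketch, not the applicant's implementation: the class names `SubregionInfo` and `UnifiedConfig`, all field names and the field types are hypothetical. The only property taken from the text is that each message carries at least one of the listed items.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def _require_one(*values: object) -> None:
    """Raise ValueError if every optional field was left unset."""
    if all(v is None for v in values):
        raise ValueError("at least one field must be provided")


@dataclass
class SubregionInfo:
    """Information a subregion monitoring server pushes to the center."""
    subregion_id: str                         # hypothetical identifier
    alarms: Optional[List[str]] = None        # alarm information
    cpu_utilization: Optional[float] = None   # assumed to be a fraction in [0, 1]
    memory_used_bytes: Optional[int] = None   # occupied memory space

    def __post_init__(self) -> None:
        # "At least one of" alarm information, CPU utilization, memory.
        _require_one(self.alarms, self.cpu_utilization, self.memory_used_bytes)


@dataclass
class UnifiedConfig:
    """Unified configuration result the center provides to the subregions."""
    admin_permissions: Optional[Dict[str, List[str]]] = None
    user_management: Optional[Dict[str, str]] = None
    alarm_config: Optional[Dict[str, float]] = None
    collection_config: Optional[Dict[str, int]] = None

    def __post_init__(self) -> None:
        # "At least one of" the four configuration items.
        _require_one(self.admin_permissions, self.user_management,
                     self.alarm_config, self.collection_config)
```

A message with only a CPU reading, such as `SubregionInfo("zone-a", cpu_utilization=0.42)`, is accepted, while `SubregionInfo("zone-a")` is rejected because it carries none of the three items.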
An ultra-large cluster monitoring method comprises the following steps:
A. each subregion monitoring server collects the internal information of its subregion cluster and then pushes this information to the center monitoring server;
B. the center monitoring server receives the information from the subregion monitoring servers, uniformly configures all subregions according to the information, and then provides the unified configuration result to the subregion monitoring servers;
C. each subregion monitoring server receives the unified configuration result from the center monitoring server, and monitors and manages the cluster according to this result.
Preferably, the information of the subregion monitoring server comprises at least one of alarm information, CPU utilization, and occupied memory space.
Preferably, the unified configuration result comprises at least one of administrator permissions, user management configuration, alarm configuration, and information collection configuration.
By using a unified, centralized monitoring platform, the present invention effectively achieves efficient monitoring and management of ultra-large clusters.
Brief description of the drawings
Fig. 1 is a structural diagram of the ultra-large cluster monitoring system according to the present invention;
Fig. 2 is a flow chart of the ultra-large cluster monitoring method according to the present invention.
Detailed description of the embodiments
Fig. 1 is a structural diagram of the ultra-large cluster monitoring system according to the present invention. As shown in Fig. 1, the system comprises a plurality of subregion monitoring servers 100 and a center monitoring server 200. Each subregion monitoring server 100 collects the information inside its own subregion and sends the collected information to the center monitoring server 200 by pushing. Pushing is used, rather than having the center actively fetch the information from each subregion, because fetching from too many subregions at the same time easily causes a network bandwidth bottleneck. With pushing, the push times of the individual subregions are random, which largely relieves the bandwidth pressure caused by information being sent simultaneously. The center monitoring server 200 receives the information from the plurality of subregion monitoring servers 100, uniformly configures all subregions according to the information, and provides the unified configuration result to the subregion monitoring servers 100.
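The bandwidth argument for pushing can be illustrated with a small simulation. This is a sketch under stated assumptions, not a measurement of the described system: the pull case is modelled as the center contacting every subregion at the same instant, the push case as each subregion choosing a uniformly random offset within a reporting interval, and the subregion count (200) and interval (60 seconds) are made-up values.

```python
import random
from collections import Counter


def peak_messages_per_second(send_times):
    """Return the largest number of messages that fall into any one-second slot."""
    return max(Counter(int(t) for t in send_times).values())


def synchronized_pull(num_subregions):
    """Assumed pull model: the center queries every subregion at time 0."""
    return [0.0] * num_subregions


def randomized_push(num_subregions, interval_s, rng):
    """Assumed push model: each subregion sends at its own random offset."""
    return [rng.uniform(0.0, interval_s) for _ in range(num_subregions)]


rng = random.Random(0)  # fixed seed so the run is repeatable
pull_peak = peak_messages_per_second(synchronized_pull(200))
push_peak = peak_messages_per_second(randomized_push(200, 60.0, rng))
print("peak messages per second, pull:", pull_peak)
print("peak messages per second, push:", push_peak)
```

Under these assumptions the pull case concentrates all 200 messages in a single second, while the random offsets spread the same 200 messages over the 60-second interval, so the peak load on the center's network link is much lower.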
The ultra-large cluster monitoring method of the present invention is described in detail below with reference to Fig. 2. The method comprises the following steps:
Step S210: each subregion monitoring server 100 collects the information inside its subregion cluster and then pushes the information to the center monitoring server 200.
Step S220: the center monitoring server 200 receives the information from the subregion monitoring servers 100, uniformly configures all subregions, and then provides the unified configuration result to each subregion monitoring server 100.
Step S230: each subregion monitoring server 100 receives the unified configuration result from the center monitoring server 200, and monitors and manages its subregion according to the unified configuration result.
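The three steps above can be sketched as one in-process round. This is a hypothetical illustration: the class and method names, the stubbed CPU readings, and the configuration policy (collect more often when any subregion reports an alarm) are assumptions that do not appear in the original text, and direct method calls stand in for the network transport, which the text does not specify.

```python
from typing import Dict, Optional


class CenterMonitoringServer:
    """Stands in for center monitoring server 200."""

    def __init__(self) -> None:
        self.reports: Dict[str, dict] = {}

    def receive(self, subregion_id: str, info: dict) -> None:
        # S220, first half: store the information pushed by a subregion.
        self.reports[subregion_id] = info

    def unified_config(self) -> dict:
        # S220, second half: derive one configuration for all subregions.
        # Placeholder policy (assumption): collect every 10 s if any
        # subregion reported an alarm, otherwise every 60 s.
        any_alarm = any(r.get("alarms") for r in self.reports.values())
        return {"collect_interval_s": 10 if any_alarm else 60,
                "alarm_cpu_threshold": 0.9}


class SubregionMonitoringServer:
    """Stands in for one subregion monitoring server 100."""

    def __init__(self, subregion_id: str, center: CenterMonitoringServer,
                 cpu: float) -> None:
        self.subregion_id = subregion_id
        self.center = center
        self.cpu = cpu                        # stubbed reading, not measured
        self.config: Optional[dict] = None

    def collect(self) -> dict:
        # S210, first half: gather information inside the subregion.
        alarms = ["cpu high"] if self.cpu > 0.9 else []
        return {"cpu_utilization": self.cpu, "alarms": alarms}

    def push(self) -> None:
        # S210, second half: push the collected information to the center.
        self.center.receive(self.subregion_id, self.collect())

    def apply(self, config: dict) -> None:
        # S230: adopt the unified configuration for local monitoring.
        self.config = dict(config)


center = CenterMonitoringServer()
subregions = [SubregionMonitoringServer("zone-a", center, cpu=0.35),
              SubregionMonitoringServer("zone-b", center, cpu=0.95)]
for server in subregions:
    server.push()                             # S210
config = center.unified_config()              # S220
for server in subregions:
    server.apply(config)                      # S230
```

Every subregion ends the round with the same configuration, which is the point of the unified configuration step: the decision is made once at the center from all reports, and each subregion applies the result locally.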
It should be understood that the above embodiment is merely illustrative and does not mean that the present invention can only be implemented through this embodiment. Those of ordinary skill in the art may propose other modifications or variations based on the above scheme, and all such modifications or variations fall within the scope of the present invention.
With the ultra-large cluster monitoring system of the present invention, monitoring of ultra-large clusters is achieved: the system can support the monitoring of more than ten thousand servers, the monitoring information occupies little network bandwidth, and the real-time performance of the monitoring is good. In addition, the system provides an integration interface and offers extensibility, integrability, reliability and ease of use, thereby meeting the need to monitor ultra-large clusters with various commercial and custom management tools.
Claims (6)
1. An ultra-large cluster monitoring system, characterized in that the system comprises a subregion monitoring server and a center monitoring server;
the subregion monitoring server comprises:
a monitoring module that collects information inside the subregion;
a pushing module that pushes the information to the center monitoring server;
a receiving module that receives the unified configuration result from the center monitoring server;
the center monitoring server comprises:
a receiving module that receives the information from the subregion monitoring server;
a configuration module that uniformly configures the subregion servers according to the monitoring information of the subregion servers;
a sending module that provides the unified configuration result to the subregion monitoring server.
2. The ultra-large cluster monitoring system according to claim 1, characterized in that the information of the subregion monitoring server comprises at least one of alarm information, CPU utilization, and occupied memory space.
3. The ultra-large cluster monitoring system according to claim 1, characterized in that the unified configuration result comprises at least one of administrator permission information, user management configuration, alarm configuration, and information collection configuration.
4. An ultra-large cluster monitoring method, characterized by the following steps:
A. each subregion monitoring server collects the internal information of its subregion cluster and then pushes this information to the center monitoring server;
B. the center monitoring server receives the information from the subregion monitoring servers, uniformly configures all subregions according to the information, and then provides the unified configuration result to the subregion monitoring servers;
C. each subregion monitoring server receives the unified configuration result from the center monitoring server, and monitors and manages the cluster according to this result.
5. The ultra-large cluster monitoring method according to claim 4, characterized in that the information of the subregion monitoring server comprises at least one of alarm information, CPU utilization, and occupied memory space.
6. The ultra-large cluster monitoring method according to claim 4, characterized in that the unified configuration result comprises at least one of administrator permissions, user management configuration, alarm configuration, and information collection configuration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100695219A CN102136972A (en) | 2011-03-22 | 2011-03-22 | Super large scale cluster monitoring system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102136972A (en) | 2011-07-27 |
Family
ID=44296633
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011100695219A Pending CN102136972A (en) | 2011-03-22 | 2011-03-22 | Super large scale cluster monitoring system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102136972A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101404803A (en) * | 2008-11-13 | 2009-04-08 | 浪潮通信信息系统有限公司 | Multidimensional monitoring method for network management system |
CN101719841A (en) * | 2009-11-13 | 2010-06-02 | 曙光信息产业(北京)有限公司 | Monitoring system and method of distributed type assemblies |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077106A (en) * | 2012-12-02 | 2013-05-01 | 吉林省蓝格信息科技有限公司 | Micro-environmental monitoring system for machine room equipment |
CN108809717A (en) * | 2018-06-12 | 2018-11-13 | 中国铁塔股份有限公司 | Node acquisition zone server, distributed monitoring method and system |
CN111737079A (en) * | 2020-05-20 | 2020-10-02 | 山东鲸鲨信息技术有限公司 | Method and device for monitoring cluster network |
CN111737079B (en) * | 2020-05-20 | 2024-04-09 | 山东鲸鲨信息技术有限公司 | Cluster network monitoring method and device |
Similar Documents
Publication | Title |
---|---|
CN101719841B | Monitoring system and method of distributed type assemblies |
CN110858850B | Comprehensive network management method, device and system for rail transit system |
CN110247810B | System and method for collecting container service monitoring data |
EP2791825B1 | System and method for monitoring and managing data center resources in real time incorporating manageability subsystem |
CN104092719B | Document transmission method, device and distributed cluster file system |
CN203466840U | Cloud service monitoring system |
CN102064975B | Network equipment supervision method and system |
CN107104840A | A kind of daily record monitoring method, apparatus and system |
WO2010099514A3 | System and method for computer cloud management |
CA2724251A1 | System and method for aggregate monitoring of user-based groups of private computer networks |
CN102196373A | Short message alarm system and short message alarm method |
US9104745B1 | Distributed log collector and report generation |
CN103389715A | High-performance distributed data center monitoring framework |
CN109039817B | Information processing method, device, equipment and medium for flow monitoring |
CN108270860A | The acquisition system and method for environmental quality online monitoring data |
CN113190583B | Data acquisition system, method, electronic equipment and storage medium |
CN101222347A | A method and device for enabling users to obtain network data |
CN102136972A | Super large scale cluster monitoring system and method |
CN103248636A | Offline download system and method |
EP2674876A1 | Streaming analytics processing node and network topology aware streaming analytics system |
US9544214B2 | System and method for optimized event monitoring in a management environment |
CN202907094U | Wireless video monitoring system based on Internet of Things |
CN112966051A | Distributed data exchange system and method |
WO2014018875A1 | Cloud-based data center infrastructure management system and method |
CN109376131A | A method, device and system for distributed deployment and storage of logs |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C12 | Rejection of a patent application after its publication | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20110727 |