CN102136972A - Super large scale cluster monitoring system and method - Google Patents

Info

Publication number
CN102136972A
Authority
CN
China
Prior art keywords
monitoring server
subregion
information
configuration
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100695219A
Other languages
Chinese (zh)
Inventor
赵欢
温鑫
邵宗有
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Co Ltd
Priority to CN2011100695219A
Publication of CN102136972A
Legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a super large scale cluster monitoring system and method. The system comprises subregion monitoring servers and a center monitoring server, wherein each subregion monitoring server acquires internal information of each subregion cluster and pushes the information to the center monitoring server; the center monitoring server receives the information from the subregion monitoring server, uniformly configures all the subregions according to the information, and then provides a uniform configuration result to each subregion monitoring server; and the subregion monitoring server receives the uniform configuration result from the center monitoring server, and monitors and manages a cluster according to the result. In the invention, a uniform centralized monitoring platform is adopted, and high-efficiency monitoring and management on a super large scale cluster are effectively realized.

Description

Super large scale cluster monitoring system and method
Technical field
The present invention relates to the field of high-performance computing cluster monitoring, and in particular to a super large scale cluster monitoring system and method.
Background
As the number of servers grows year by year, administrators need to grasp the state of a cluster promptly and to monitor the cluster in real time.
Monitoring a super large scale cluster (more than 2000 servers) runs into various performance bottlenecks, notably in the unified collection of cluster state and in the storage and querying of massive amounts of data. A system and method that solves these problems is therefore needed.
Summary of the invention
To solve the above problems, the invention provides a super large scale cluster monitoring system and method.
A super large scale cluster monitoring system comprises subregion monitoring servers and a center monitoring server.
Each subregion monitoring server comprises:
a monitoring module that collects information from inside the subregion;
a push module that pushes the information to the center monitoring server;
a receiving module that receives the unified configuration result from the center monitoring server.
The center monitoring server comprises:
a receiving module that receives the information from the subregion monitoring servers;
a configuration module that uniformly configures the subregion servers according to their monitoring information;
a sending module that provides the unified configuration result to the subregion monitoring servers.
Preferably, the information from the subregion monitoring server comprises at least one of alarm information, CPU utilization and memory usage.
Preferably, the unified configuration result comprises at least one of administrator permission information, user management configuration, alarm configuration and information collection configuration.
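The two kinds of data exchanged between the servers can be pictured as simple records. The following is only a minimal sketch: the class names, field names, units and the JSON encoding are all assumptions made for illustration, since the text names the kinds of information but not their format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubregionReport:
    """Information a subregion monitoring server pushes to the center (hypothetical layout)."""
    subregion_id: str                                # hypothetical identifier
    alarms: List[str] = field(default_factory=list)  # alarm information
    cpu_utilization: Optional[float] = None          # assumed to be a fraction in [0, 1]
    memory_used_bytes: Optional[int] = None          # assumed unit: bytes

    def validate(self) -> None:
        # The text asks for at least one of the three kinds of information.
        if (not self.alarms and self.cpu_utilization is None
                and self.memory_used_bytes is None):
            raise ValueError("report must carry at least one kind of information")


@dataclass
class UnifiedConfig:
    """Unified configuration result returned by the center (hypothetical layout)."""
    admin_permissions: Dict[str, List[str]] = field(default_factory=dict)
    user_management: Dict[str, str] = field(default_factory=dict)
    alarm_config: Dict[str, float] = field(default_factory=dict)
    collection_config: Dict[str, int] = field(default_factory=dict)


# Example: build a report, check it, and serialize it for pushing.
report = SubregionReport("subregion-01", cpu_utilization=0.42)
report.validate()
payload = json.dumps(asdict(report))
```

Every field is optional so that a report or configuration can carry any subset of the listed items, which matches the "at least one of" wording.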
A super large scale cluster monitoring method comprises the following steps:
A. each subregion monitoring server collects the internal information of its subregion cluster and pushes this information to the center monitoring server;
B. the center monitoring server receives the information from the subregion monitoring servers, uniformly configures all subregions according to the information, and then provides the unified configuration result to the subregion monitoring servers;
C. each subregion monitoring server receives the unified configuration result from the center monitoring server and monitors and manages the cluster according to this result.
Preferably, the information from the subregion monitoring server comprises at least one of alarm information, CPU utilization and memory usage.
Preferably, the unified configuration result comprises at least one of administrator permissions, user management configuration, alarm configuration and information collection configuration.
By using a unified, centralized monitoring platform, the invention effectively achieves efficient monitoring and management of a super large scale cluster.
Brief description of the drawings
Fig. 1 is a structural diagram of the super large scale cluster monitoring system according to the invention;
Fig. 2 is a flow chart of the super large scale cluster monitoring method according to the invention.
Embodiments
Fig. 1 is a structural diagram of the super large scale cluster monitoring system according to the invention. As shown in Fig. 1, the system comprises a plurality of subregion monitoring servers 100 and a center monitoring server 200. Each subregion monitoring server 100 collects the information inside its subregion and sends the collected information to the center monitoring server 200 by pushing it. Pushing is used, rather than having the center actively fetch information from each subregion, because actively fetching from too many subregions at the same time easily causes a network bandwidth bottleneck. With pushing, the times at which the subregions push are random, which largely relieves the bandwidth pressure of sending information simultaneously. The center monitoring server 200 receives the information from the plurality of subregion monitoring servers 100, uniformly configures all subregions according to the information, and provides the unified configuration result to the subregion monitoring servers 100.
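The bandwidth argument for pushing can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the patented mechanism: center-initiated collection is modelled as every subregion answering in the same one-second slot, and pushing is modelled as each subregion choosing a uniformly random offset within a reporting interval. The function names, the 60-second interval and the count of 200 subregions are hypothetical.

```python
import random
from collections import Counter


def peak_load(send_times):
    """Largest number of messages that land in the same one-second slot."""
    return max(Counter(int(t) for t in send_times).values())


def polled_times(n_subregions):
    # Center-initiated collection (simplified): all answers arrive together.
    return [0.0] * n_subregions


def pushed_times(n_subregions, interval_s, rng):
    # Push: each subregion sends at its own random offset in the interval.
    return [rng.uniform(0, interval_s) for _ in range(n_subregions)]


rng = random.Random(0)   # fixed seed so the run is repeatable
n, interval = 200, 60.0  # hypothetical sizes
print("peak messages per second, polled:", peak_load(polled_times(n)))
print("peak messages per second, pushed:", peak_load(pushed_times(n, interval, rng)))
```

In this model the polled peak equals the number of subregions, while the pushed peak stays close to the average rate per slot, which is the effect the paragraph above describes.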
The subregion monitoring server 100 comprises a monitoring module 110, a push module 120 and a configuration receiving module 130. The monitoring module 110 collects the information inside the subregion. The push module 120 pushes the collected information to the center monitoring server 200. The configuration receiving module 130 receives the unified configuration result from the center monitoring server 200.
The center monitoring server 200 comprises a receiving module 210, a configuration module 220 and a sending module 230. The receiving module 210 receives the information from the plurality of subregion monitoring servers 100. The configuration module 220 uniformly configures all subregions. The sending module 230 provides the unified configuration result to the subregion monitoring servers 100.
The super large scale cluster monitoring method of the invention is described in detail below with reference to Fig. 2. The method comprises the following steps:
Step S210: each subregion monitoring server 100 collects the information inside its subregion cluster and pushes the information to the center monitoring server 200.
Step S220: the center monitoring server 200 receives the information from the subregion monitoring servers 100, uniformly configures all subregions, and then provides the unified configuration result to each subregion monitoring server 100.
Step S230: each subregion monitoring server 100 receives the unified configuration result from the center monitoring server 200 and monitors and manages its subregion according to the unified configuration result.
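Steps S210 to S230 can be sketched as one in-process round trip between the modules 110 to 130 and 210 to 230. This is a minimal illustration only: the class and method names are hypothetical, network transport is replaced by direct method calls, and the rule used to derive the configuration is invented for the example because the text does not say how the unified configuration is computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    subregion_id: str
    cpu_utilization: float
    alarms: List[str] = field(default_factory=list)


class CenterMonitoringServer:  # center monitoring server 200
    def __init__(self):
        self.reports: Dict[str, Report] = {}

    def receive(self, report):  # receiving module 210
        self.reports[report.subregion_id] = report

    def configure(self):  # configuration module 220
        # Hypothetical rule: collect more often when any subregion is busy.
        busiest = max(r.cpu_utilization for r in self.reports.values())
        return {"collection_interval_s": 30 if busiest > 0.8 else 60,
                "alarm_cpu_threshold": 0.9}

    def send(self, subregions):  # sending module 230
        config = self.configure()
        for server in subregions:
            server.receive_config(dict(config))  # same result for every subregion


class SubregionMonitoringServer:  # subregion monitoring server 100
    def __init__(self, subregion_id, node_cpu):
        self.subregion_id = subregion_id
        self.node_cpu = node_cpu  # stand-in for real per-node probes
        self.config = {}

    def collect(self):  # monitoring module 110
        average = sum(self.node_cpu) / len(self.node_cpu)
        return Report(self.subregion_id, average)

    def push(self, center):  # push module 120
        center.receive(self.collect())

    def receive_config(self, config):  # configuration receiving module 130
        self.config = config


center = CenterMonitoringServer()
subregions = [SubregionMonitoringServer("a", [0.2, 0.4]),
              SubregionMonitoringServer("b", [0.9, 0.95])]
for server in subregions:
    server.push(center)    # step S210
center.send(subregions)    # step S220
print(subregions[0].config == subregions[1].config)  # step S230: all share one config
```

Each subregion gets its own copy of the same configuration dictionary, so later local changes in one subregion would not silently alter another.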
It should be understood that the above embodiment is merely illustrative and that the invention is not limited to being implemented by this embodiment. Those of ordinary skill in the art may propose other modifications or variations based on the above scheme, and all such modifications or variations fall within the scope of the invention.
The super large scale cluster monitoring system of the invention achieves monitoring of super large scale clusters and can support the monitoring of up to tens of thousands of servers. The monitoring information occupies little network bandwidth, and the real-time performance of the monitoring is good. The system also provides integration interfaces and offers extensibility, integrability, reliability and ease of use, thereby meeting the need to monitor super large scale clusters with various commercial and custom management tools.

Claims (6)

1. A super large scale cluster monitoring system, characterized in that the system comprises subregion monitoring servers and a center monitoring server;
each subregion monitoring server comprises:
a monitoring module that collects information from inside the subregion;
a push module that pushes the information to the center monitoring server;
a receiving module that receives the unified configuration result from the center monitoring server;
the center monitoring server comprises:
a receiving module that receives the information from the subregion monitoring servers;
a configuration module that uniformly configures the subregion servers according to their monitoring information;
a sending module that provides the unified configuration result to the subregion monitoring servers.
2. The super large scale cluster monitoring system according to claim 1, characterized in that the information from the subregion monitoring server comprises at least one of alarm information, CPU utilization and memory usage.
3. The super large scale cluster monitoring system according to claim 1, characterized in that the unified configuration result comprises at least one of administrator permission information, user management configuration, alarm configuration and information collection configuration.
4. A super large scale cluster monitoring method, characterized by the following steps:
A. each subregion monitoring server collects the internal information of its subregion cluster and pushes this information to the center monitoring server;
B. the center monitoring server receives the information from the subregion monitoring servers, uniformly configures all subregions according to the information, and then provides the unified configuration result to the subregion monitoring servers;
C. each subregion monitoring server receives the unified configuration result from the center monitoring server and monitors and manages the cluster according to this result.
5. The super large scale cluster monitoring method according to claim 4, characterized in that the information from the subregion monitoring server comprises at least one of alarm information, CPU utilization and memory usage.
6. The super large scale cluster monitoring method according to claim 4, characterized in that the unified configuration result comprises at least one of administrator permissions, user management configuration, alarm configuration and information collection configuration.
CN2011100695219A 2011-03-22 2011-03-22 Super large scale cluster monitoring system and method Pending CN102136972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100695219A CN102136972A (en) 2011-03-22 2011-03-22 Super large scale cluster monitoring system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100695219A CN102136972A (en) 2011-03-22 2011-03-22 Super large scale cluster monitoring system and method

Publications (1)

Publication Number Publication Date
CN102136972A true CN102136972A (en) 2011-07-27

Family

ID=44296633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100695219A Pending CN102136972A (en) 2011-03-22 2011-03-22 Super large scale cluster monitoring system and method

Country Status (1)

Country Link
CN (1) CN102136972A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077106A (en) * 2012-12-02 2013-05-01 吉林省蓝格信息科技有限公司 Micro-environmental monitoring system for machine room equipment
CN108809717A (en) * 2018-06-12 2018-11-13 中国铁塔股份有限公司 Node acquisition zone server, distributed monitoring method and system
CN111737079A (en) * 2020-05-20 2020-10-02 山东鲸鲨信息技术有限公司 Method and device for monitoring cluster network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404803A (en) * 2008-11-13 2009-04-08 浪潮通信信息系统有限公司 Multidimensional monitoring method for network management system
CN101719841A (en) * 2009-11-13 2010-06-02 曙光信息产业(北京)有限公司 Monitoring system and method of distributed type assemblies

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077106A (en) * 2012-12-02 2013-05-01 吉林省蓝格信息科技有限公司 Micro-environmental monitoring system for machine room equipment
CN108809717A (en) * 2018-06-12 2018-11-13 中国铁塔股份有限公司 Node acquisition zone server, distributed monitoring method and system
CN111737079A (en) * 2020-05-20 2020-10-02 山东鲸鲨信息技术有限公司 Method and device for monitoring cluster network
CN111737079B (en) * 2020-05-20 2024-04-09 山东鲸鲨信息技术有限公司 Cluster network monitoring method and device

Similar Documents

Publication Publication Date Title
CN101719841B (en) Monitoring system and method of distributed type assemblies
CN110858850B (en) Comprehensive network management method, device and system for rail transit system
CN110247810B (en) System and method for collecting container service monitoring data
EP2791825B1 (en) System and method for monitoring and managing data center resources in real time incorporating manageability subsystem
CN104092719B (en) Document transmission method, device and distributed cluster file system
CN203466840U (en) Cloud service monitoring system
CN102064975B (en) Network equipment supervision method and system
CN107104840A (en) A kind of daily record monitoring method, apparatus and system
WO2010099514A3 (en) System and method for computer cloud management
CA2724251A1 (en) System and method for aggregate monitoring of user-based groups of private computer networks
CN102196373A (en) Short message alarm system and short message alarm method
US9104745B1 (en) Distributed log collector and report generation
CN103389715A (en) High-performance distributed data center monitoring framework
CN109039817B (en) Information processing method, device, equipment and medium for flow monitoring
CN108270860A (en) The acquisition system and method for environmental quality online monitoring data
CN113190583B (en) Data acquisition system, method, electronic equipment and storage medium
CN101222347A (en) A method and device for enabling users to obtain network data
CN102136972A (en) Super large scale cluster monitoring system and method
CN103248636A (en) Offline download system and method
EP2674876A1 (en) Streaming analytics processing node and network topology aware streaming analytics system
US9544214B2 (en) System and method for optimized event monitoring in a management environment
CN202907094U (en) Wireless video monitoring system based on Internet of Things
CN112966051A (en) Distributed data exchange system and method
WO2014018875A1 (en) Cloud-based data center infrastructure management system and method
CN109376131A (en) A method, device and system for distributed deployment and storage of logs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110727