CN114640499A - Method and device for carrying out abnormity identification on user behavior - Google Patents

Publication number
CN114640499A
Authority
CN
China
Prior art keywords
sql
sql template
data
template
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210130017.3A
Other languages
Chinese (zh)
Inventor
刘永波
徐裕斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ankki Technology Co ltd
Original Assignee
Shenzhen Ankki Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ankki Technology Co ltd
Priority to CN202210130017.3A
Publication of CN114640499A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a method and a device for identifying abnormal user behavior. The method comprises collecting and storing big data, extracting SQL templates and business data, analyzing the SQL templates with the K-Means clustering algorithm, constructing the association between the SQL templates and the business data, extracting features of the SQL templates and establishing a feature model, and identifying abnormal behavior according to the clustering result, the association, and the feature model. In this way, the SQL templates and SQL data can be extracted and subjected to machine learning and knowledge graph learning and mining, so that unknown risks and deep-level attack behavior are identified without configuring additional rules.

Description

A method and device for identifying abnormal user behavior

Technical Field

Embodiments of the present invention relate to the field of machine learning, and in particular to a method and device for identifying abnormal user behavior.

Background

In today's big data environment, data security is easily overlooked, and data leakage and exposure caused by neglected security management have become very common. Database technology is widely used in information management systems, transaction systems, and social systems such as social software, social networking sites, and online forums. These databases store large amounts of customers' personal private data, such as names, ID numbers, and passwords, and some also store financial private data such as bank card numbers and expiration dates. As long as a database stores anyone's personal data, whether that of a user or of a company employee, the security of the database is critical. Without effective protection, confidential data stolen by criminals not only exposes customers' private information but may also cause them financial loss. As black-market demand for data rises and the profit from data leakage grows, illegitimate intranet users take the risk of dumping or scraping databases to steal data, and incidents of data being sold, illegally modified, or destroyed occur from time to time.

Such risky behavior is hard to identify with preset security policies and rules, and identifying it among a large volume of behavior has become a difficult problem for the industry. The prior art presets rules and policies for known abnormal behavior and searches large volumes of database access logs for abnormal behavior and users. This approach can only encode known risk characteristics and rules, and it cannot identify unknown risks.

Summary of the Invention

The main technical problem addressed by the embodiments of the present invention is to provide a method and device for identifying abnormal user behavior that extract SQL templates and SQL data, apply machine learning and knowledge graph learning and mining to them, and ultimately identify deep-level attack behavior without requiring additional rules to be configured.

To solve the above technical problem, one technical solution adopted by the embodiments of the present invention is a method for identifying abnormal user behavior, the method comprising: collecting and storing database access logs as big data; extracting SQL templates and business data from the SQL statements contained in the database access logs; analyzing the extracted SQL templates with the K-Means clustering algorithm; constructing a knowledge graph from the extracted business data, building the association between the SQL templates and the business data, and saving the association to an analysis database; performing habit feature extraction and analysis on the extracted SQL templates to establish a user feature model; and identifying abnormal behavior by comparing the habit features of the SQL templates, the association between the SQL templates and the business data, or the cluster analysis result of the SQL templates against the application users.

In an embodiment of the present invention, collecting and storing the database access logs further comprises: collecting and storing the database access logs of application users; and collecting and storing the logs of database access made directly through database client tools.

In an embodiment of the present invention, the habit features include the use of blank lines, the use of carriage returns and line feeds, table writing habits, capitalization habits, and the amount of data accessed each time.

In an embodiment of the present invention, identifying abnormal behavior by comparing the cluster analysis result of the SQL templates against the application users comprises:

for application users, comparing the analysis result obtained by applying the K-Means clustering algorithm to the SQL templates with the department to which each application account belongs, identifying user behavior that clearly deviates within the department, and thereby determining the abnormal access behavior of the application users.

In an embodiment of the present invention, identifying abnormal behavior by comparing the cluster analysis result of the SQL templates against the application users further comprises:

the database access logs are confined to the SQL templates of the application system itself, so once a new SQL template appears, an SQL attack is considered to exist.

In an embodiment of the present invention, identifying abnormal behavior through the association between the SQL templates and the business data comprises:

identifying abnormal behavior through the association between the SQL templates and the business data by profiling the operation behavior of the application users, thereby identifying their abnormal access behavior.

In an embodiment of the present invention, identifying abnormal behavior through the habit features of the SQL templates comprises:

for client users who connect to a database account directly through a client, analyzing the SQL templates of each client user, extracting and analyzing features of the SQL templates, and identifying abnormal behavior through the model.

An embodiment of the present invention further provides a device for identifying abnormal user behavior, the device comprising: a data collection module, configured to collect and store database access logs as big data; a data extraction module, configured to extract SQL templates and business data from the SQL statements contained in the database access logs; a data analysis module, configured to analyze the extracted SQL templates with the K-Means clustering algorithm; a relation construction module, configured to construct a knowledge graph from the extracted business data, build the association between the SQL templates and the business data, and save the association to an analysis database; a feature extraction module, configured to perform habit feature extraction and analysis on the extracted SQL templates and establish a user feature model; and a behavior recognition module, configured to identify abnormal behavior by comparing the habit features of the SQL templates, the association between the SQL templates and the business data, or the cluster analysis result of the SQL templates against the application users.

An embodiment of the present invention further provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method described above.

An embodiment of the present invention further provides a non-volatile computer storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method for identifying abnormal user behavior described above.

The beneficial effects of the embodiments of the present invention are as follows. In contrast to the prior art, the embodiments provide a method and device for identifying abnormal user behavior, the method comprising collecting and storing big data, extracting SQL templates and business data, analyzing the SQL templates with the K-Means clustering algorithm, building the association between the SQL templates and the business data, extracting features from the SQL templates and establishing a feature model, and identifying abnormal behavior according to the clustering result, the association, and the feature model. In this way, the embodiments can extract SQL templates and SQL data and apply machine learning and knowledge graph learning and mining to them, so that unknown risks and deep-level attack behavior are identified without configuring additional rules.

Description of the Drawings

FIG. 1 is a schematic flowchart of a method for identifying abnormal user behavior provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a device for identifying abnormal user behavior provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present invention.

The reference numerals in the drawings are as follows:

10: device for identifying abnormal user behavior;

100: data collection module; 200: data extraction module; 300: data analysis module; 400: relation construction module; 500: feature extraction module; 600: behavior recognition module;

700: electronic device;

701: processor; 702: memory.

Detailed Description

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but they do not limit it in any form. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within its scope of protection.

To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and not to limit it.

It should be noted that, provided they do not conflict, the features of the embodiments of the present invention may be combined with one another, and all such combinations fall within the scope of protection of the present application. In addition, although the device diagram divides the device into functional modules and the flowchart shows a logical order, in some cases the steps shown or described may be performed with a different module division or in a different order. Furthermore, the words "first", "second", "third", and the like used herein do not limit the data or the order of execution; they only distinguish identical or similar items whose functions and effects are substantially the same.

Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in this specification serve only to describe specific embodiments and are not intended to limit the present invention. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items.

In addition, the technical features involved in the embodiments of the present invention described below may be combined with one another as long as they do not conflict.

Referring to FIG. 1, which is a schematic flowchart of a method for identifying abnormal user behavior provided by an embodiment of the present invention, the method includes:

Step S100: collect and store big data;

In a multi-transaction database system, each transaction that accesses the database consists of several operation steps. The access log records what a transaction has done, such as the blank lines and carriage return/line feed characters in the access operation and the amount of data accessed.

In an embodiment of the present invention, the database access logs of application users are collected and stored as big data, as are the logs produced when operation and maintenance staff and data analysts access the database directly with database client tools.

In an embodiment of the present invention, what is collected and stored is the SQL statements issued when users access the database; each SQL statement contains an SQL template and business data.

Step S200: extract SQL templates and business data;

In an enterprise, departments are usually divided by function. Because departments have different functions, their purposes for accessing the database can be distinguished by department, and so can the SQL templates their staff use and the business data they access. In addition, the number of SQL templates is bounded, and the form of a template does not change. New SQL templates therefore do not normally appear.

For example, suppose a company has a finance department, an R&D department, and a sales department. Finance staff access the database mainly for daily accounting, invoice management, and tax system work, and the data they retrieve is mostly financial revenue and expenditure data. R&D staff access the database mainly to retrieve the company's product information and enter product data. Sales staff access the database mainly to retrieve the company's sales plans and enter customer order information. Because of these functional differences, the SQL templates and business data contained in the SQL statements that each department's staff issue are associated with that department. Note that this does not mean staff of a given department never use other templates to access other data; they do so comparatively rarely, or almost never.

In an embodiment of the present invention, the SQL templates and business data are extracted from the SQL statements contained in the database access logs collected in step S100.
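The patent does not specify how a statement is split into template and business data. As a minimal sketch under one common assumption, that the business data are the literal values in the statement and the template is what remains once each literal is replaced by a placeholder, the extraction could look like the following. The regular expression and the function name `split_sql` are hypothetical and cover only quoted strings and plain numbers.

```python
import re

# Assumption: business data = quoted strings and standalone numbers.
# Real SQL dialects need a proper tokenizer; this pattern is illustrative only.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def split_sql(statement: str) -> tuple[str, list[str]]:
    """Return (template, business_data) for one logged SQL statement."""
    values: list[str] = []

    def _swap(match: re.Match) -> str:
        # Record the literal (without its quotes) and leave a placeholder.
        values.append(match.group(0).strip("'"))
        return "?"

    template = _LITERAL.sub(_swap, statement)
    return template, values
```

Two statements that differ only in their literal values then map to the same template string, which is what allows the number of templates for a fixed application to stay bounded.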

Step S300: analyze the SQL templates with the K-Means clustering algorithm;

The K-Means clustering algorithm is an iterative cluster analysis algorithm. Clustering is the process of grouping members of a data set that are similar in some respect, and it is a technique for discovering such inherent structure. Clustering techniques are often referred to as unsupervised learning.

As described above, the SQL templates and business data contained in the SQL statements that each department's staff issue are associated with that department. This association is not represented explicitly in the machine, however, so the SQL templates need to be classified by some operation.

In an embodiment of the present invention, the SQL templates extracted in step S200 are divided into as many classes as there are departments. If there are three departments, the number of clusters is set to 3 and three center points are selected, the center points being three SQL templates chosen from the sample SQL templates. For each sample point, the nearest center point is found, and the points nearest to the same center point form one class; this completes one round of clustering. Before the clustering is considered complete, the class assignments of the sample points before and after the round are compared. If they are the same, the algorithm terminates. If they differ, the center of the sample points in each class is computed and taken as the new center point of that class, and the nearest center point is again found for each sample point.
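The loop described above can be sketched in plain Python as follows. The patent does not say how an SQL template is turned into a point with a distance, so this sketch assumes each template has already been encoded as a numeric feature vector; the function name `k_means`, the `seed`, and the `max_iter` cap are illustrative additions.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Cluster numeric vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers are k of the samples
    labels = None
    for _ in range(max_iter):
        # Assign every sample to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: terminate
            break
        labels = new_labels
        # Recompute each center as the mean of the samples assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers
```

Because the initial centers are chosen at random, the result can depend on the seed; in practice the run is often repeated with several seeds and the tightest clustering kept.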

After the above steps, the extracted SQL templates are divided into three classes, class A, class B, and class C, corresponding to the finance department, the R&D department, and the sales department respectively. That is, finance staff mainly use class A SQL templates to access the database and may also use class B or class C templates, but with low probability; R&D staff mainly use class B SQL templates and may also use class A or class C templates, but with low probability; and sales staff mainly use class C SQL templates and may also use class A or class B templates, but with low probability.

Step S400: build the association between the SQL templates and the business data;

In library and information science, a knowledge graph is also called knowledge domain visualization or a knowledge domain map. It is a family of graphs that show the development and structural relationships of knowledge, using visualization techniques to describe knowledge resources and their carriers and to mine, analyze, construct, draw, and display knowledge and the connections within it. The architecture of a knowledge graph comprises its logical structure and its system architecture. Logically, a knowledge graph is divided into a schema layer and a data layer. The data layer consists of a series of facts, and knowledge is stored with the fact as its unit. The schema layer is built on top of the data layer and uses an ontology library to standardize how the facts in the data layer are expressed. The system architecture of a knowledge graph refers to the structure of its construction pattern.

In an embodiment of the present invention, the business data extracted in step S200 is built into the data layer of the knowledge graph, the SQL templates extracted in step S200 are built into the schema layer, and the association between the data layer and the schema layer is established; this is the association between the SQL templates and the business data.
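The patent does not describe the storage format of the association. As a rough illustration only, the association between templates and business data can be held as two adjacency maps built from log records. The record layout `(user, template, data_category)` and the notion of a data category are assumptions made for this sketch, not part of the source.

```python
from collections import defaultdict


def build_associations(records):
    """Build template -> data and user -> template maps from log records.

    records: iterable of (user, template, data_category) tuples
             (a hypothetical layout chosen for this sketch).
    """
    template_to_data = defaultdict(set)
    user_to_templates = defaultdict(set)
    for user, template, data_category in records:
        template_to_data[template].add(data_category)  # schema layer -> data layer
        user_to_templates[user].add(template)          # who uses which template
    return template_to_data, user_to_templates
```

A production system would more likely persist these edges in a graph store or in the analysis database mentioned in the summary; the in-memory maps only show the shape of the relationship.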

Step S500: extract features from the SQL templates and establish a feature model;

In an embodiment of the present invention, for operation and maintenance staff, data analysts, and others who connect to a database account directly through a client, the SQL templates used to access the database vary more than those of application users. Even so, because everyone's writing habits differ, each client user can be analyzed individually: the blank lines, the carriage returns and line feeds, the table writing habits, and the capitalization habits in the user's SQL templates, together with the amount of data accessed each time, are extracted as features, and a feature model is established from the extracted features.
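The listed habit features can be approximated with simple text statistics. The sketch below is one possible reading, not the patent's definition: the keyword list is partial, "table writing habits" is approximated by counting dotted names such as `schema.table`, and `rows_returned` stands in for the amount of data accessed. All feature names are hypothetical.

```python
import re

# A partial keyword list, chosen only for illustration.
_KEYWORDS = re.compile(
    r"\b(select|from|where|join|insert|update|delete|group|order|by|and|or)\b",
    re.IGNORECASE,
)
_DOTTED = re.compile(r"\b[A-Za-z_]\w*\.[A-Za-z_]\w*\b")


def habit_features(sql_text: str, rows_returned: int) -> dict[str, float]:
    """Extract writing-habit features from the raw text of one statement."""
    lines = sql_text.splitlines()
    keywords = _KEYWORDS.findall(sql_text)
    upper = sum(1 for kw in keywords if kw.isupper())
    return {
        "blank_lines": float(sum(1 for ln in lines if not ln.strip())),
        "line_breaks": float(sql_text.count("\n")),
        "crlf_breaks": float(sql_text.count("\r\n")),
        "upper_keyword_ratio": upper / len(keywords) if keywords else 0.0,
        "dotted_names": float(len(_DOTTED.findall(sql_text))),
        "rows_returned": float(rows_returned),
    }
```

These features must be computed on the statement text as the user typed it, before any normalization, since normalizing whitespace or case would erase exactly the habits being measured.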

Step S600: identify abnormal behavior according to the clustering result, the association, and the feature model.

In an embodiment of the present invention, abnormal behavior can be identified by comparing the cluster analysis result of the SQL templates against the application users, which reveals user behavior that clearly deviates within a department. For example, the finance department mainly uses class A SQL templates; if some users never use class B templates, some use them occasionally, and some use them very frequently, the analysis indicates that certain users in the department behave abnormally, and they can be placed under closer observation. At the same time, because the database access logs are confined to the SQL templates of the application system itself, the number of SQL templates is bounded. Therefore, once a new SQL template appears, an SQL attack can be judged to exist.
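One way to make "clearly deviates within the department" concrete is to measure, per user, the share of statements whose template falls outside the department's main cluster. The patent gives no threshold, so the `threshold` value below is purely illustrative, as are the function names.

```python
from collections import Counter


def off_cluster_ratio(user_clusters, dept_cluster):
    """Share of a user's statements outside the department's main cluster."""
    counts = Counter(user_clusters)
    total = sum(counts.values())
    return 1.0 - counts[dept_cluster] / total if total else 0.0


def flag_users(usage, dept_cluster, threshold=0.3):
    """Return users whose off-cluster share exceeds the (assumed) threshold.

    usage: {user: [cluster label of each statement the user issued]}
    """
    return sorted(
        user
        for user, clusters in usage.items()
        if off_cluster_ratio(clusters, dept_cluster) > threshold
    )


def is_new_template(template, known_templates):
    """A template never seen from the application is treated as suspicious."""
    return template not in known_templates
```

Flagged users are candidates for closer observation rather than confirmed attackers, in line with the description above.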

Second, abnormal behavior can also be identified through the association relationships built between SQL templates and business data. For example, if finance department personnel using A-type templates mainly access financial revenue and expenditure data, an association is established between the A-type template and financial revenue and expenditure data, and a finance department employee who uses the A-type template to access that data is behaving normally. If, however, a user in the finance department characteristically accesses the database to view research and development materials, analyzing that user's operations against the association relationships in the knowledge graph reveals the abnormal behavior.
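One way to picture the association check is to learn, from historical accesses, which business data categories each template class normally touches, and then test new accesses against that mapping. The sketch below assumes a frequency threshold (`min_support`) and category names such as `rnd_materials`; both are hypothetical, since the source only says that the association relationship is used.

```python
from collections import Counter


def learn_associations(history, min_support=0.1):
    """Build {template_class: set of data categories} from past accesses.

    history:     iterable of (template_class, data_category) pairs
    min_support: hypothetical cutoff; a category is associated with a
                 template class only if it makes up at least this share
                 of that class's accesses
    """
    counts = {}
    for template_class, category in history:
        counts.setdefault(template_class, Counter())[category] += 1

    associations = {}
    for template_class, counter in counts.items():
        total = sum(counter.values())
        associations[template_class] = {
            category for category, n in counter.items() if n / total >= min_support
        }
    return associations


def is_abnormal_access(associations, template_class, category):
    """An access is abnormal if the category is not associated with the class."""
    return category not in associations.get(template_class, set())


history = [("A", "financial_revenue_expenditure")] * 19 + [("A", "rnd_materials")]
associations = learn_associations(history)
```

With this history, A-type templates are associated only with financial revenue and expenditure data, so an A-type access to research and development materials is reported as abnormal.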

Finally, abnormal behavior can also be identified through the habit features of SQL templates. Because everyone has their own "handwriting" habits, analyzing the SQL templates in each user's database access log yields features such as blank-line usage, carriage-return and line-feed usage, table writing style, capitalization, and the volume of data accessed per query; the model then identifies abnormal behavior from the extracted features.
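The source does not say what kind of model is built from the habit features. As a stand-in, the sketch below uses a simple per-user statistical profile (mean and standard deviation of each feature) and flags a query whose features lie far from that profile. The threshold, the `min_sigma` floor, and the three-feature layout are assumptions made for illustration.

```python
from statistics import mean, pstdev


def build_profile(feature_vectors):
    """Summarize a user's history as (mean, std) for each feature column."""
    columns = list(zip(*feature_vectors))
    return [(mean(col), pstdev(col)) for col in columns]


def anomaly_score(profile, vector, min_sigma=0.05):
    """Largest per-feature deviation, in units of standard deviation.

    min_sigma is a hypothetical floor that keeps nearly constant features
    from producing unbounded scores.
    """
    return max(abs(x - mu) / max(sigma, min_sigma)
               for x, (mu, sigma) in zip(vector, profile))


def is_abnormal(profile, vector, threshold=3.0):
    return anomaly_score(profile, vector) > threshold


# Each row: [blank lines, upper-case keyword ratio, rows accessed]
baseline = [
    [1, 0.90, 100],
    [2, 1.00, 120],
    [1, 0.95, 110],
    [2, 0.90, 90],
]
profile = build_profile(baseline)

typical_query = [1, 0.95, 105]
suspicious_query = [0, 0.0, 5000]  # lower-case keywords, far more rows than usual
```

A query written in the account owner's usual style stays under the threshold, while one with a different capitalization habit and an unusually large result set is flagged, which is the kind of signal that could indicate someone else using the account.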

Unlike the prior art, the present invention provides a method for anomaly identification based on SQL handwriting features. The method includes collecting and storing big data, extracting SQL templates and business data, analyzing the SQL templates with the K-Means clustering algorithm, constructing the association relationship between the SQL templates and the business data, extracting features from the SQL templates and building a feature model, and identifying abnormal behavior according to the clustering analysis results, the association relationship, and the feature model. In this way, embodiments of the present invention can distill SQL templates and SQL data and apply machine learning and knowledge graph learning and mining to them, so that unknown risks and deep-level attack behavior can be identified without configuring any additional rules.

Please refer to FIG. 2, which shows a device for anomaly identification based on SQL handwriting features provided by an embodiment of the present invention. The device includes a data acquisition module 100, a data extraction module 200, a data analysis module 300, a relationship construction module 400, a feature extraction module 500, and a behavior identification module 600.

The data acquisition module 100 is configured to collect and store, as big data, the database access logs of application users, as well as the logs generated when operation and maintenance personnel and data analysts access the database directly with database client tools.

The data extraction module 200 is configured to extract SQL templates and business data from the SQL statements contained in the database access logs collected by the data acquisition module 100.
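A common way to separate a statement into a template and its data is to replace literals with placeholders. The sketch below takes that approach as an assumption: the source does not specify how the extraction is performed, and the regular expression here handles only single-quoted strings and plain numbers, not every SQL dialect's literal syntax.

```python
import re

# Matches single-quoted strings (with '' as an escaped quote) or bare numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def extract_template(sql):
    """Split a SQL statement into (template, literal values).

    Literals are replaced by '?' and whitespace is collapsed so that
    statements differing only in their data map to the same template.
    """
    values = []

    def _swap(match):
        values.append(match.group(0))
        return "?"

    template = _LITERAL.sub(_swap, sql)
    template = re.sub(r"\s+", " ", template).strip()
    return template, values


template, values = extract_template(
    "SELECT name FROM ledger\n  WHERE dept = 'finance' AND amount > 100"
)
```

Because whitespace is normalized here, any analysis of blank lines or line breaks would need to run on the raw statement text before this step.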

The data analysis module 300 is configured to divide the SQL templates extracted by the data extraction module 200 into several classes. For example, if there are three departments, the number of clusters is set to 3 and three center points are selected, each being one of the sample SQL templates. Then, for each sample point, the nearest center point is found, and the points nearest the same center point form one class; this completes one round of clustering. Before the clustering is finalized, the class assignments of the sample points before and after the round are compared. If they are identical, the algorithm terminates; otherwise, the center of the sample points in each class is computed and taken as that class's new center point, and the nearest center point is again found for each sample point.
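The loop described above (assign each sample to its nearest center, stop if assignments did not change, otherwise recompute centers) can be written directly. The sketch assumes the SQL templates have already been converted to numeric vectors; how that conversion is done is not specified in the source, and the two-dimensional points below are placeholder data.

```python
import math


def kmeans(points, initial_centers, max_rounds=100):
    """Plain K-Means following the assign / compare / recompute steps.

    points:          list of numeric tuples (vectorized SQL templates, assumed)
    initial_centers: starting centers, chosen from the sample points
    Returns (labels, centers).
    """
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_rounds):
        # Assign every sample point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Terminate when the assignments are the same as in the previous round.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each center as the mean of the points assigned to it.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
labels, centers = kmeans(points, [points[0], points[2], points[4]])
```

Passing the initial centers explicitly keeps the example deterministic; in practice they would be drawn from the sample templates, and the number of centers would match the number of departments as in the text.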

The relationship construction module 400 is configured to build the business data extracted by the data extraction module 200 into the data layer of a knowledge graph, build the SQL templates extracted by the data extraction module 200 into the schema layer of the knowledge graph, and establish the association between the data layer and the schema layer, that is, the association relationship between the SQL templates and the business data.
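A very reduced picture of the two-layer structure is a bipartite graph with templates on one side and business data items on the other. The class below is a hypothetical simplification for illustration: a real knowledge graph would carry typed entities and relations, and the source does not describe its storage format or the analysis library it is saved to.

```python
from dataclasses import dataclass, field


@dataclass
class TwoLayerGraph:
    """Schema layer (SQL templates) linked to a data layer (business data)."""
    schema_layer: set = field(default_factory=set)
    data_layer: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # template -> set of data items

    def add_observation(self, template, business_data):
        """Record that a template was used to access the given data items."""
        self.schema_layer.add(template)
        for item in business_data:
            self.data_layer.add(item)
            self.edges.setdefault(template, set()).add(item)

    def related_data(self, template):
        """Return the data items associated with a template."""
        return self.edges.get(template, set())


graph = TwoLayerGraph()
graph.add_observation("SELECT * FROM ledger WHERE dept = ?", ["ledger", "finance"])
graph.add_observation("SELECT * FROM ledger WHERE dept = ?", ["ledger", "audit"])
```

Repeated observations of the same template accumulate edges, so the graph gradually records which business data each template normally reaches.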

The feature extraction module 500 is configured to analyze each client user: the use of blank lines and of carriage returns and line feeds when writing SQL, the way tables are written, capitalization habits, and the volume of data accessed per query. It extracts features from these observations and builds a feature model from the extracted features.

The behavior identification module 600 is configured to identify abnormal behavior by comparing the cluster analysis results of the SQL templates against the application users. User behavior that clearly deviates within a department can be identified. For example, if the finance department mainly uses A-type SQL templates, while some of its users use B-type SQL templates, some never use them, and some use them very frequently, it can be concluded that some users in the department behave abnormally and should be observed closely. In addition, because an application's database access log is confined to the SQL templates of the application system itself, the number of SQL templates is fixed. Therefore, once a new SQL template appears, an SQL attack can be determined to exist.

Second, the module can also identify abnormal behavior through the association relationships built between SQL templates and business data. For example, if finance department personnel using A-type templates mainly access financial revenue and expenditure data, an association is established between the A-type template and financial revenue and expenditure data, and a finance department employee who uses the A-type template to access that data is behaving normally. If, however, a user in the finance department characteristically accesses the database to view research and development materials, analyzing that user's operations against the association relationships in the knowledge graph reveals the abnormal behavior.

Finally, the module can also identify abnormal behavior through the habit features of SQL templates. Because everyone has their own "handwriting" habits, analyzing the SQL templates in each user's database access log yields features such as blank-line usage, carriage-return and line-feed usage, table writing style, capitalization, and the volume of data accessed per query; the model then identifies abnormal behavior from the extracted features.

FIG. 3 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present invention. As shown in FIG. 3, the electronic device 700 includes:

One or more processors 701 and a memory 702; FIG. 3 takes one processor 701 as an example.

The processor 701 and the memory 702 may be connected by a bus or by other means; FIG. 3 takes a bus connection as an example.

As a non-volatile computer-readable storage medium, the memory 702 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. By running the non-volatile software programs, instructions, and units stored in the memory 702, the processor 701 executes the various functional applications and data processing of the electronic device, that is, it implements the method for abnormal identification of user behavior of the above method embodiments.

The memory 702 may include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function; the data storage area may store data created through use of the electronic device, and the like. In addition, the memory 702 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 702 optionally includes memory located remotely from the processor 701, and such remote memory may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The one or more units are stored in the memory 702 and, when executed by the one or more processors 701, perform the method for abnormal identification of user behavior of any of the above method embodiments, for example, performing steps S100 to S600 of the method in FIG. 1 described above and implementing the functions of modules 100 to 600 in FIG. 2.

The above electronic device can execute the method for abnormal identification of user behavior provided by the embodiments of the present invention, and has the program modules and beneficial effects corresponding to executing the method. For technical details not described exhaustively in the electronic device embodiment, refer to the method for abnormal identification of user behavior provided by the embodiments of the present invention.

An embodiment of the present invention also provides a non-volatile computer-readable storage medium. The storage medium may be included in the device described in the foregoing embodiments, or it may exist separately without being assembled into the device. The storage medium carries one or more programs, and when the one or more programs are executed, the methods of the embodiments of the present disclosure are implemented.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: these devices have mobile communication capabilities and are primarily intended to provide voice and data communication. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capabilities, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example the iPad.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it faces higher requirements for processing capacity, stability, reliability, security, scalability, and manageability.

(5) Other electronic devices.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the description of the above embodiments, those of ordinary skill in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Within the idea of the present invention, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be carried out in any order, and many other variations of the different aspects of the present invention as described above exist, which are not provided in detail for the sake of brevity. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for abnormal identification of user behavior, characterized by comprising:
collecting and storing database access logs as big data;
extracting SQL templates and business data from the SQL statements contained in the database access logs;
analyzing the extracted SQL templates using the K-Means clustering algorithm;
constructing a knowledge graph from the extracted business data, constructing an association relationship between the SQL templates and the business data, and saving the association relationship to an analysis library;
performing habit feature extraction and analysis on the extracted SQL templates to establish a user feature model;
identifying abnormal behavior through the habit features of the SQL templates, through the association relationship between the SQL templates and the business data, or by comparing the cluster analysis results of the SQL templates with application users.

2. The method according to claim 1, characterized in that collecting and storing database access logs as big data further comprises: collecting and storing, as big data, the database access logs of application users; and collecting and storing, as big data, the logs of database access made directly with database client tools.

3. The method according to claim 1, characterized in that the habit features include blank-line usage, carriage-return and line-feed usage, table writing habits, capitalization habits, and the volume of data accessed per query.

4. The method according to claim 1, characterized in that identifying abnormal behavior by comparing the cluster analysis results of the SQL templates with application users comprises:
for application users, comparing the analysis results obtained by applying the K-Means clustering algorithm to the SQL templates with the department to which the application account belongs, identifying user behavior that clearly deviates within the department, and thereby determining the abnormal access behavior of the application user.

5. The method according to claim 4, characterized in that identifying abnormal behavior by comparing the cluster analysis results of the SQL templates with application users further comprises:
the database access log being confined to the SQL templates of the application system itself, such that once a new SQL template appears, an SQL attack is deemed to exist.

6. The method according to claim 1, characterized in that identifying abnormal behavior through the association relationship between the SQL templates and the business data comprises:
identifying abnormal behavior through the association relationship between the SQL templates and the business data, and performing profile analysis on the operation behavior of the application user, thereby identifying the abnormal access behavior of the application user.

7. The method according to claim 1, characterized in that identifying abnormal behavior through the habit features of the SQL templates comprises:
for client users who connect to a database account directly through a client, analyzing the SQL templates of each client user, extracting and analyzing features of the SQL templates, and identifying abnormal behavior through the model.

8. A device for abnormal identification of user behavior, characterized by comprising:
a data acquisition module, configured to collect and store database access logs as big data;
a data extraction module, configured to extract SQL templates and business data from the SQL statements contained in the database access logs;
a data analysis module, configured to analyze the extracted SQL templates using the K-Means clustering algorithm;
a relationship construction module, configured to construct a knowledge graph from the extracted business data, construct an association relationship between the SQL templates and the business data, and save the association relationship to an analysis library;
a feature extraction module, configured to perform habit feature extraction and analysis on the extracted SQL templates to establish a user feature model;
a behavior identification module, configured to identify abnormal behavior through the habit features of the SQL templates, through the association relationship between the SQL templates and the business data, or by comparing the cluster analysis results of the SQL templates with application users.

9. An electronic device, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 7.

10. A non-volatile computer storage medium, characterized in that the computer storage medium stores computer-executable instructions which, when executed by one or more processors, cause the one or more processors to perform the method for abnormal identification of user behavior according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210130017.3A CN114640499A (en) 2022-02-11 2022-02-11 Method and device for carrying out abnormity identification on user behavior

Publications (1)

Publication Number Publication Date
CN114640499A true CN114640499A (en) 2022-06-17

Family

ID=81946286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210130017.3A Pending CN114640499A (en) 2022-02-11 2022-02-11 Method and device for carrying out abnormity identification on user behavior

Country Status (1)

Country Link
CN (1) CN114640499A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874469A (en) * 2018-09-04 2020-03-10 广州视源电子科技股份有限公司 Database high-risk operation detection method, device, computer equipment and storage medium
CN110457405A (en) * 2019-08-20 2019-11-15 上海观安信息技术股份有限公司 A kind of database audit method based on genetic connection
CN111291070A (en) * 2020-01-20 2020-06-16 南京星环智能科技有限公司 Abnormal SQL detection method, equipment and medium
US11023607B1 (en) * 2020-04-03 2021-06-01 Imperva, Inc. Detecting behavioral anomalies in user-data access logs
CN111488590A (en) * 2020-05-29 2020-08-04 深圳易嘉恩科技有限公司 SQ L injection detection method based on user behavior credibility analysis
CN113515955A (en) * 2021-04-26 2021-10-19 太极计算机股份有限公司 Semantic understanding-based online translation system and method from text sequence to instruction sequence
CN113505371A (en) * 2021-08-06 2021-10-15 四川大学 Database security risk assessment system
CN113672977A (en) * 2021-08-13 2021-11-19 支付宝(杭州)信息技术有限公司 Privacy data processing method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
戴华,秦小麟,刘亮,柏传杰: "基于OCAR 挖掘的数据库异常检测模型", 《通信学报》, vol. 30, no. 9, 25 September 2009 (2009-09-25), pages 8 *
段西强;乔赛;: "一种基于RBAC的数据库入侵检测方法", 泰山学院学报, no. 06, 25 November 2010 (2010-11-25) *
蒋梦丹;林宏刚;曹鹤鸣;: "基于业务逻辑思想的异常检测研究", 成都信息工程大学学报, no. 02, 15 April 2019 (2019-04-15) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269636A (en) * 2022-08-05 2022-11-01 中孚信息股份有限公司 A user classification method and system based on behavior word embedding
CN115587132A (en) * 2022-11-11 2023-01-10 北京中安星云软件技术有限公司 Method and system for identifying abnormal access of database based on session clustering
CN115587132B (en) * 2022-11-11 2023-03-10 北京中安星云软件技术有限公司 Method and system for identifying abnormal access of database based on session clustering
CN116055119A (en) * 2022-12-19 2023-05-02 中通服创发科技有限责任公司 Method and device for identifying fraudulent users based on traffic data
CN116684202A (en) * 2023-08-01 2023-09-01 光谷技术有限公司 Internet of things information security transmission method
CN116684202B (en) * 2023-08-01 2023-10-24 光谷技术有限公司 Internet of things information security transmission method
CN118709175A (en) * 2024-08-29 2024-09-27 杭州海康威视数字技术股份有限公司 Cryptographic service protection method and system based on dynamic defense
CN118709175B (en) * 2024-08-29 2024-11-22 杭州海康威视数字技术股份有限公司 Cryptographic service protection method and system based on dynamic defense

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220617
