CN107544979A - The credibility Analysis method and system of user data - Google Patents


Info

Publication number
CN107544979A
Authority
CN
China
Prior art keywords
user data
user
analyzed
data source
field
Legal status
Pending
Application number
CN201610474402.4A
Other languages
Chinese (zh)
Inventor
于秋林
陈尧
Current Assignee
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date
2016-06-24
Filing date
2016-06-24
Publication date
2018-01-05
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201610474402.4A
Publication of CN107544979A
Legal status: Pending


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a method and system for analyzing the credibility of user data. The method comprises the following steps.

S10: A server acquires a preset number of user sample records and the data of multiple fields corresponding to those records.

S11: The data in each field of the user data from a user data source to be analyzed is matched, one by one, against the corresponding field of the user sample records, and the match rate of each field against the sample records is calculated.

S12: Based on the calculated per-field match rates and according to preset analysis rules, the method determines whether the user data source to be analyzed is a credible user data source.

The invention can improve the accuracy and efficiency of user data analysis.

Description

Method and System for Analyzing the Credibility of User Data

Technical Field

The present invention relates to the technical field of user data processing, and in particular to a method and system for analyzing the credibility of user data.

Background

At present, big data analysis and application of user data require obtaining multiple types of user data from multiple user data sources. For example:

- banking data is obtained from company X1;
- life insurance data from company X2;
- auto insurance data from company X3;
- consumer spending data from company X4;
- telecommunications service data from company X5.

However, the authenticity of user data from a specific data source is currently assessed mainly through expert experience and spot checks, which cannot identify untrustworthy records. For example, some user names in data source A are entered as "so-and-so" (某某), and some user names in data source B are entered as "Mr." (先生). Existing approaches cannot determine whether the user names in data sources A and B are trustworthy. Accurately analyzing the credibility of user data from a specific data source has therefore become a pressing technical problem.

Summary of the Invention

The present invention provides a method and system for analyzing the credibility of user data. It addresses the problem that the credibility of existing user data cannot be accurately analyzed.

In a first aspect, the present invention provides a method for analyzing the credibility of user data, comprising:

S10: a server acquires a preset number of user sample records and acquires the data of multiple fields corresponding to the user sample records;

S11: matching, one by one, the data in each field of the user data of a user data source to be analyzed against the data in the corresponding field of the user sample records, and calculating the match rate of each field of the user data source to be analyzed against the user sample records; and

S12: determining, based on the calculated per-field match rates and according to preset analysis rules, whether the user data source to be analyzed is a credible user data source.
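Steps S11 and S12 can be written compactly as formulas. The notation below is introduced here for illustration only and is not used in the patent:

- N is the number of sample records.
- F is the set of compared fields.
- s_{i,f} is the value of field f in sample record i.
- d_{i,f} is the value of field f in the record of data source D that describes the same user. How records are paired is not stated in the patent; a missing record counts as a mismatch.
- θ_f is the preset match rate for field f.
- K is the preset number of fields.
- F_key is the set of key fields.

The two rules of Embodiments 2 and 3 below are shown as the two possible S12 tests.

```latex
\begin{gathered}
r_f(D) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\, d_{i,f} = s_{i,f} \,\right], \qquad f \in F \\
P(D) = \{\, f \in F : r_f(D) > \theta_f \,\} \\
\text{Embodiment 2: } |P(D)| > K \qquad\qquad \text{Embodiment 3: } F_{\text{key}} \subseteq P(D)
\end{gathered}
```

As a worked case: N = 100 with 99 matching names gives r_name(D) = 99/100 = 0.99. With θ_name = 0.99 the name field is not in P(D), because the comparison in the text is "greater than", not "greater than or equal to".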

In a second aspect, the present invention provides a system for analyzing the credibility of user data, comprising:

an acquisition module, configured to acquire a preset number of user sample records and the data of multiple fields corresponding to the user sample records;

an analysis module, configured to match, one by one, the data in each field of the user data of a user data source to be analyzed against the data in the corresponding field of the user sample records, and to calculate the match rate of each field of the user data source to be analyzed against the user sample records; and

a determination module, configured to determine, based on the calculated per-field match rates and according to preset analysis rules, whether the user data source to be analyzed is a credible user data source.

The present invention thus provides a method and system for analyzing the credibility of user data. The method comprises three steps:

- S10: the server acquires a preset number of user sample records and their field data.
- S11: the server matches each field of the user data source to be analyzed against the corresponding field of the sample records, one by one, and calculates the per-field match rates.
- S12: the server determines, according to preset analysis rules, whether the data source is credible.

By matching the user data source to be analyzed against the user sample records field by field, the technical solution of the embodiments analyzes the credibility of the data source automatically and accurately. This improves the accuracy and efficiency of data analysis.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a method for analyzing the credibility of user data according to Embodiment 1 of the present invention;

FIG. 2 is a schematic flowchart of a method for analyzing the credibility of user data according to Embodiment 2 of the present invention;

FIG. 3 is a schematic flowchart of a method for analyzing the credibility of user data according to Embodiment 3 of the present invention;

FIG. 4 is a schematic structural diagram of a system for analyzing the credibility of user data according to Embodiment 4 of the present invention.

Detailed Description

The technical solutions of the present invention are further described below with reference to the accompanying drawings and specific embodiments. The specific embodiments described here are only intended to explain the present invention, not to limit it. For ease of description, the drawings show only the parts related to the present invention rather than the complete structure.

Embodiment 1

FIG. 1 is a schematic flowchart of a method for analyzing the credibility of user data according to Embodiment 1 of the present invention. The method may be executed by a user data credibility analysis system. That system may be implemented in software and/or hardware and is generally integrated in a server.

Referring to FIG. 1, the method of this embodiment comprises the following steps:

S10: The server acquires a preset number of user sample records and acquires the data of multiple fields corresponding to the user sample records.

Specifically, the server may be connected to multiple databases and may acquire user data from them. Each database can be regarded as one data source.

The preset number can be set according to actual conditions, for example 100,000. The fields include any one or a combination of: name, ID number, age, address, occupation, income, office address, deposit amount, and the like.
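Step S10 leaves open how the preset number of sample records is chosen. The sketch below assumes, purely for illustration, that they are drawn uniformly at random from a pool of reference records and then projected onto the listed fields. The helper `acquire_samples`, the English field keys, and the fixed seed are all hypothetical.

```python
import random
from typing import Dict, List, Sequence

Record = Dict[str, str]

# English keys for the fields listed in the description (hypothetical names).
FIELDS = ["name", "id_number", "age", "address", "occupation",
          "income", "office_address", "deposit_amount"]


def acquire_samples(pool: Sequence[Record], preset_count: int,
                    fields: Sequence[str] = FIELDS,
                    seed: int = 0) -> List[Record]:
    """Draw `preset_count` sample records and keep only the compared fields.

    Assumption: uniform random sampling without replacement; the patent only
    says that a preset number of sample records is obtained.
    """
    if preset_count > len(pool):
        raise ValueError("preset_count exceeds the number of available records")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(list(pool), preset_count)
    # Project each record onto the compared fields, skipping absent ones.
    return [{f: rec[f] for f in fields if f in rec} for rec in chosen]
```

Fixing the seed makes repeated analyses of different data sources run against the same sample set. This is a design choice made here, not something the text specifies.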

S11: Match, one by one, the data in each field of the user data of the user data source to be analyzed against the data in the corresponding field of the user sample records. Then calculate the match rate of each field of the user data source to be analyzed against the user sample records.

Specifically, the data in each field of the user data to be analyzed is matched, one by one, against the corresponding field of the user sample records. If a field value in the user data is identical to the corresponding field value in the sample records, that field is considered a match. The match rate of each field of the user data source against the sample records is then calculated.

For example, 100 user sample records correspond to 100 name values. If 99 of these 100 name values are matched in user data source Z1, the match rate of the name field in data source Z1 is 99%.
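The matching in S11 can be sketched as follows. The patent does not say how a sample record is paired with a record in the data source under analysis. The sketch therefore makes two assumptions, neither of which comes from the source:

- records are paired on a shared, hypothetical `user_id` key;
- "consistent" means exact equality.

`field_match_rates` is an illustrative name.

```python
from typing import Dict, Sequence

Record = Dict[str, str]


def field_match_rates(samples: Sequence[Record], source: Sequence[Record],
                      fields: Sequence[str], key: str = "user_id") -> Dict[str, float]:
    """Step S11 sketch: match rate of each field of `source` against `samples`.

    Assumptions (not stated in the patent): a sample record and a source
    record describe the same user when their `key` values are equal, and
    "consistent" means exact string equality. A sample user missing from the
    source counts as a mismatch, so the denominator is always len(samples).
    """
    if not samples:
        raise ValueError("at least one sample record is required")
    # Index the source by the pairing key; if a key repeats, the last record wins.
    by_key = {rec[key]: rec for rec in source if key in rec}
    rates: Dict[str, float] = {}
    for f in fields:
        matched = 0
        for sample in samples:
            other = by_key.get(sample.get(key))
            if other is not None and f in sample and other.get(f) == sample[f]:
                matched += 1
        rates[f] = matched / len(samples)
    return rates
```

Consider the worked example above: 100 sample records, of which 99 names agree with data source Z1 (say one name was entered as "so-and-so"). The function returns a name match rate of 99/100 = 0.99.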

S12: Determine, based on the calculated per-field match rates of the user data source to be analyzed and according to preset analysis rules, whether the user data source to be analyzed is a credible user data source.

Specifically, the match rate of each field of the user data source to be analyzed is first calculated against the sample records. Whether that data source is credible is then determined according to the preset analysis rules.

In the technical solution of this embodiment, the server:

- acquires a preset number of user sample records and their field data;
- matches each field of the user data source to be analyzed against the corresponding field of the sample records, one by one;
- calculates the per-field match rates;
- determines, according to preset analysis rules, whether the data source is credible.

Matching the data source against the sample records field by field allows its credibility to be analyzed automatically and accurately. This improves the accuracy and efficiency of data analysis.

Embodiment 2

FIG. 2 is a schematic flowchart of a method for analyzing the credibility of user data according to Embodiment 2 of the present invention. Building on Embodiment 1, this embodiment further refines the preset analysis rules to improve the efficiency of the credibility analysis.

S20: The server acquires a preset number of user sample records and acquires the data of multiple fields corresponding to the user sample records.

Specifically, the server may be connected to multiple databases and may acquire user data from them. Each database can be regarded as one data source.

The preset number can be set according to actual conditions, for example 100,000. The fields include name, ID number, age, address, occupation, income, office address, deposit amount, and the like.

S21: Match, one by one, the data in each field of the user data of the user data source to be analyzed against the data in the corresponding field of the user sample records. Then calculate the match rate of each field of the user data source to be analyzed against the user sample records.

S22: Determine the fields of the user data source to be analyzed whose match rate is greater than a preset match rate, and count the number of such fields. If the counted number of fields is greater than a preset number, determine that the user data source to be analyzed is a credible user data source and add a trusted flag to it.

Specifically, in this embodiment a single preset match rate, for example 99%, may be applied to every field. Alternatively, different preset match rates may be set for different fields, for example 99% for the "name" field and 98% for the "ID number" field.

First, the fields of the user data source to be analyzed whose match rate is greater than the preset match rate are counted. If that count is greater than the preset number (for example, 10), the user data source to be analyzed is determined to be a credible user data source.

Further, in this embodiment a trusted flag may be added to a credible user data source. An untrusted flag may likewise be added to a user data source that is not credible.
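The Embodiment 2 rule can be sketched as a function of the per-field match rates. Three details are assumptions made here:

- the function names;
- the flag strings;
- treating a field with no configured per-field rate as not passing.

The strict "greater than" comparisons follow the text of S22. A single float stands for the uniform preset match rate, and a mapping stands for per-field rates.

```python
from collections import abc
from typing import Mapping, Union

Thresholds = Union[float, Mapping[str, float]]


def _threshold(thresholds: Thresholds, field: str) -> float:
    # A float is one preset match rate for every field; a mapping gives
    # per-field rates. A field with no configured rate never counts as passing.
    if isinstance(thresholds, abc.Mapping):
        return thresholds.get(field, float("inf"))
    return thresholds


def is_credible_by_count(rates: Mapping[str, float], thresholds: Thresholds,
                         min_fields: int) -> bool:
    """Embodiment 2 rule (sketch): count fields whose match rate is strictly
    greater than the preset match rate; the source is credible when that
    count is strictly greater than the preset number `min_fields`."""
    passing = sum(1 for f, r in rates.items() if r > _threshold(thresholds, f))
    return passing > min_fields


def source_flag(credible: bool) -> str:
    # Hypothetical labels standing in for the trusted / untrusted flags.
    return "TRUSTED" if credible else "UNTRUSTED"
```

Because both comparisons are strict, a field whose match rate equals its preset rate (for example 99% against a 99% threshold) does not count toward the total.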

Embodiment 3

FIG. 3 is a schematic flowchart of a method for analyzing the credibility of user data according to Embodiment 3 of the present invention. Building on Embodiment 1, this embodiment further refines the preset analysis rules to improve the efficiency of the credibility analysis.

S30: The server acquires a preset number of user sample records and acquires the data of multiple fields corresponding to the user sample records.

S31: Match, one by one, the data in each field of the user data of the user data source to be analyzed against the data in the corresponding field of the user sample records. Then calculate the match rate of each field of the user data source to be analyzed against the user sample records.

S32: Determine the fields of the user data source to be analyzed whose match rate is greater than a preset match rate, and check whether these fields include all predetermined key fields. If they do, determine that the user data source to be analyzed is a credible user data source and add a trusted flag to it.

Specifically, in this embodiment a single preset match rate, for example 99%, may be applied to every field. Alternatively, different preset match rates may be set for different fields, for example 99% for the "name" field and 98% for the "ID number" field.

First, the fields whose match rate is greater than the preset match rate are determined. If those fields include all of the predetermined key fields, the user data source to be analyzed is determined to be a credible user data source. The key fields may be, for example, name and/or address.

Further, in this embodiment a trusted flag may be added to a credible user data source. An untrusted flag may likewise be added to a user data source that is not credible.
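The Embodiment 3 rule differs from Embodiment 2 only in the final test. Instead of counting passing fields, it requires every predetermined key field to be among them. This is again a sketch with hypothetical names. Rejecting an empty key-field list is a design choice added here, so that a misconfiguration cannot mark every source as credible.

```python
from collections import abc
from typing import Iterable, Mapping, Set, Union

Thresholds = Union[float, Mapping[str, float]]


def passing_fields(rates: Mapping[str, float], thresholds: Thresholds) -> Set[str]:
    """Fields whose match rate is strictly greater than their preset rate.
    A field with no configured per-field rate never passes."""
    def threshold(field: str) -> float:
        if isinstance(thresholds, abc.Mapping):
            return thresholds.get(field, float("inf"))
        return thresholds
    return {f for f, r in rates.items() if r > threshold(f)}


def is_credible_by_key_fields(rates: Mapping[str, float], thresholds: Thresholds,
                              key_fields: Iterable[str]) -> bool:
    """Embodiment 3 rule (sketch): credible only if every predetermined key
    field (e.g. name and/or address) is among the passing fields."""
    keys = set(key_fields)
    if not keys:
        raise ValueError("at least one key field must be configured")
    return keys <= passing_fields(rates, thresholds)
```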

Embodiment 4

FIG. 4 is a schematic structural diagram of a system for analyzing the credibility of user data according to Embodiment 4 of the present invention. The system is deployed in a server to analyze the credibility of user data.

The system of this embodiment comprises an acquisition module 40, an analysis module 41, and a determination module 42.

The acquisition module 40 is configured to acquire a preset number of user sample records and the data of multiple fields corresponding to the user sample records.

Specifically, the server may be connected to multiple databases and may acquire user data from them. Each database can be regarded as one data source.

The preset number can be set according to actual conditions, for example 100,000. The fields include name, ID number, age, address, occupation, income, office address, deposit amount, and the like.

The analysis module 41 is configured to match, one by one, the data in each field of the user data of the user data source to be analyzed against the data in the corresponding field of the user sample records. It also calculates the match rate of each field of the user data source to be analyzed against the user sample records.

The determination module 42 is configured to determine, based on the calculated per-field match rates and according to preset analysis rules, whether the user data source to be analyzed is a credible user data source.

Further, the determination module 42 is specifically configured to perform either of the following:

determine the fields of the user data source to be analyzed whose match rate is greater than a preset match rate, and count the number of such fields; if the counted number of fields is greater than a preset number, determine that the user data source to be analyzed is a credible user data source and add a trusted flag to it; or

determine the fields of the user data source to be analyzed whose match rate is greater than a preset match rate; check whether these fields include all predetermined key fields; if they do, determine that the user data source to be analyzed is a credible user data source and add a trusted flag to it.

Further, the determination module 42 is also configured to add an untrusted flag to a user data source that is not credible.

The credibility analysis system of this embodiment works as follows:

- the acquisition module 40 acquires a preset number of user sample records and their field data;
- the analysis module 41 matches each field of the user data source to be analyzed against the corresponding field of the sample records, one by one, and calculates the per-field match rates;
- the determination module 42 determines, based on those match rates and according to preset analysis rules, whether the user data source to be analyzed is credible.

Matching the data source against the sample records field by field allows its credibility to be analyzed automatically and accurately. This improves the accuracy and efficiency of data analysis.

The above system can execute the method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the method provided by any embodiment of the present invention.
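The module structure of Embodiment 4 (acquisition module 40, analysis module 41, determination module 42) can be sketched in Python as below. This is an illustrative sketch, not the applicant's implementation. The following are all assumptions:

- the class names;
- the `user_id` pairing key;
- exact-equality matching;
- the `"TRUSTED"`/`"UNTRUSTED"` labels;
- passing the preset analysis rule in as a callable.

Each entry in `sources` plays the role of one connected database, that is, one data source.

```python
from typing import Callable, Dict, Mapping, Sequence

Record = Dict[str, str]
Rule = Callable[[Mapping[str, float]], bool]


class AcquisitionModule:
    """Module 40 (sketch): holds the preset sample records and the fields to compare."""

    def __init__(self, samples: Sequence[Record], fields: Sequence[str]):
        if not samples:
            raise ValueError("at least one sample record is required")
        self.samples = list(samples)
        self.fields = list(fields)


class AnalysisModule:
    """Module 41 (sketch): per-field match rates; records are paired on a
    hypothetical `key` field and compared by exact equality."""

    def __init__(self, key: str = "user_id"):
        self.key = key

    def match_rates(self, samples: Sequence[Record], source: Sequence[Record],
                    fields: Sequence[str]) -> Dict[str, float]:
        by_key = {rec[self.key]: rec for rec in source if self.key in rec}
        rates: Dict[str, float] = {}
        for f in fields:
            # A sample user absent from the source counts as a mismatch.
            matched = sum(
                1 for s in samples
                if f in s and by_key.get(s.get(self.key), {}).get(f) == s[f])
            rates[f] = matched / len(samples)
        return rates


class DeterminationModule:
    """Module 42 (sketch): applies a preset rule and returns a trusted/untrusted flag."""

    def __init__(self, rule: Rule):
        self.rule = rule

    def flag(self, rates: Mapping[str, float]) -> str:
        return "TRUSTED" if self.rule(rates) else "UNTRUSTED"


class CredibilityAnalysisSystem:
    """Wires modules 40, 41 and 42 together and evaluates several data sources."""

    def __init__(self, acquisition: AcquisitionModule, analysis: AnalysisModule,
                 determination: DeterminationModule):
        self.acquisition = acquisition
        self.analysis = analysis
        self.determination = determination

    def evaluate(self, sources: Mapping[str, Sequence[Record]]) -> Dict[str, str]:
        # One flag per data source (e.g. one per connected database).
        acq = self.acquisition
        return {
            name: self.determination.flag(
                self.analysis.match_rates(acq.samples, records, acq.fields))
            for name, records in sources.items()
        }
```

Passing the rule in as a callable lets the same determination module apply either the field-count rule of Embodiment 2 or the key-field rule of Embodiment 3.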

Note that the above are merely preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein. Various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept. The scope of the present invention is defined by the appended claims.

Claims (10)

1. A method for analyzing the credibility of user data, comprising:
S10: a server acquiring a preset number of user sample records and acquiring the data of multiple fields corresponding to the user sample records;
S11: matching, one by one, the data in each field of the user data of a user data source to be analyzed against the data in the corresponding field of the user sample records, and calculating the match rate of each field of the user data source to be analyzed against the user sample records; and
S12: determining, based on the calculated per-field match rates of the user data source to be analyzed and according to preset analysis rules, whether the user data source to be analyzed is a credible user data source.

2. The method according to claim 1, wherein step S12 specifically comprises:
determining the fields of the user data source to be analyzed whose match rate is greater than a preset match rate, and counting the number of such fields; and, if the counted number of fields is greater than a preset number, determining that the user data source to be analyzed is a credible user data source and adding a trusted flag to it; or
determining the fields of the user data source to be analyzed whose match rate is greater than a preset match rate; checking whether the determined fields include all predetermined key fields; and, if they do, determining that the user data source to be analyzed is a credible user data source and adding a trusted flag to it.

3. The method according to claim 1, wherein the field data comprise any one or a combination of: name, ID number, age, address, occupation, income, office address, and deposit amount.

4. The method according to claim 2, wherein the preset match rate is 99%.

5. The method according to claim 1, further comprising the step of:
adding an untrusted flag to a user data source that is not credible.

6. A system for analyzing the credibility of user data, configured in a server, comprising:
an acquisition module, configured to acquire a preset number of user sample records and acquire the data of multiple fields corresponding to the user sample records;
an analysis module, configured to match, one by one, the data in each field of the user data of a user data source to be analyzed against the data in the corresponding field of the user sample records, and to calculate the match rate of each field of the user data source to be analyzed against the user sample records; and
a determination module, configured to determine, based on the calculated per-field match rates of the user data source to be analyzed and according to preset analysis rules, whether the user data source to be analyzed is a credible user data source.

7. The system according to claim 6, wherein the determination module is specifically configured to:
determine the fields of the user data source to be analyzed whose match rate is greater than a preset match rate, and count the number of such fields; and, if the counted number of fields is greater than a preset number, determine that the user data source to be analyzed is a credible user data source and add a trusted flag to it; or
determine the fields of the user data source to be analyzed whose match rate is greater than a preset match rate; check whether the determined fields include all predetermined key fields; and, if they do, determine that the user data source to be analyzed is a credible user data source and add a trusted flag to it.

8. The system according to claim 6, wherein the field data comprise any one or a combination of: name, ID number, age, address, occupation, income, office address, and deposit amount.

9. The system according to claim 7, wherein the preset match rate is 99%.

10. The system according to claim 6, wherein the determination module is further configured to add an untrusted flag to a user data source that is not credible.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610474402.4A CN107544979A (en) 2016-06-24 2016-06-24 The credibility Analysis method and system of user data


Publications (1)

Publication Number Publication Date
CN107544979A true CN107544979A (en) 2018-01-05

Family

ID=60959845


Country Status (1)

Country Link
CN (1) CN107544979A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595249A (en) * 2020-12-03 2022-06-07 腾讯科技(深圳)有限公司 Service data access method, device, storage medium and equipment

Citations (9)

Publication number Priority date Publication date Assignee Title
US20050187965A1 (en) * 2000-11-21 2005-08-25 Abajian Aram C. Grouping multimedia and streaming media search results
CN102081842A (en) * 2010-11-03 2011-06-01 Beijing CenNavi Technologies Co., Ltd. Method and device for evaluating video data source
CN102135974A (en) * 2010-08-06 2011-07-27 Huawei Software Technologies Co., Ltd. Data source selecting method and system
US8078638B2 (en) * 2008-07-09 2011-12-13 Yahoo! Inc. Operations of multi-level nested data structure
CN103646110A (en) * 2013-12-26 2014-03-19 Credit Reference Center of the People's Bank of China Natural person basic identity information matching method
CN103729369A (en) * 2012-10-15 2014-04-16 Kingdee Software (China) Co., Ltd. Method and device for automatically processing coexisting orders
US20150032738A1 (en) * 2013-07-23 2015-01-29 Salesforce.Com, Inc. Confidently adding snippets of search results to clusters of objects
CN104598598A (en) * 2015-01-23 2015-05-06 Zhejiang Xietong Data System Co., Ltd. Method for evaluating relational data standard
US9336296B2 (en) * 2010-01-06 2016-05-10 International Business Machines Corporation Cross-domain clusterability evaluation for cross-guided data clustering based on alignment between data domains

Non-Patent Citations (2)

Title
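Yu Wei et al.: "Ranking of Deep Web data sources based on data quality", Journal of Chinese Computer Systems (《小型微型计算机系统》) *
Qin Zhengyan: "Research on sampling-based Deep Web data source selection methods", China Master's Theses Full-text Database, Information Science and Technology Series *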

Similar Documents

Publication Publication Date Title
CN108737535B (en) Message pushing method, storage medium and server
US10984483B2 (en) Cognitive regulatory compliance automation of blockchain transactions
CN109272219B (en) Performance assessment method, device, computer equipment and storage medium
CN109634941B (en) Medical data processing method, device, electronic device and storage medium
CN106649831B (en) Data filtering method and device
CN109241358A (en) Metadata management method, device, computer equipment and storage medium
WO2019196304A1 (en) Electronic apparatus, credit feedback message parsing method, and storage medium
WO2020155508A1 (en) Suspicious user screening method and apparatus, computer device and storage medium
CN105303437A (en) Processing method and device for account checking
CN109635564A (en) A kind of method, apparatus, medium and equipment detecting Brute Force behavior
CN110502425A (en) Test data generation method, device, electronic device and storage medium
WO2022073513A1 (en) Information input assistance method and apparatus, electronic device and storage medium
CN110147378B (en) Data checking method, device, computer equipment and storage medium
CN109918385A (en) Tripartite's account checking method, electronic device and readable storage medium storing program for executing
CN116993523A (en) Configurable reconciliation methods, devices, equipment and storage media
WO2019056496A1 (en) Method for generating picture review probability interval and method for picture review determination
CN103020269A (en) Method and device for verifying data
CN114862257B (en) A method, device and equipment for evaluating data source quality
CN109544207B (en) Information processing method, storage medium and server
CN114697110A (en) A network attack detection method, device, equipment and storage medium
CN107544979A (en) The credibility Analysis method and system of user data
CN110708414B (en) Telephone number sorting method and device and electronic equipment
CN111190824A (en) Monitoring method, monitoring device, terminal equipment and storage medium
CN117033552A (en) Information evaluation method, device, electronic equipment and storage medium
CN111506615B (en) A method and device for determining the degree of possession of invalid users

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20180529

Address after: Room 201, Building A, No. 1 Qianwan Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong 518000 (c/o Shenzhen Qianhai Business Secretary Co., Ltd.)

Applicant after: Shenzhen OneConnect Smart Technology Co., Ltd.

Address before: Floors 9 and 10, No. 166 Kaibin Road, Xuhui District, Shanghai 200030

Applicant before: OneConnect Financial Technology Co., Ltd. (Shanghai)

REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1246888

Country of ref document: HK

SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180105

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1246888

Country of ref document: HK