CN111340220A

CN111340220A - Method and apparatus for training predictive models

Info

Publication number: CN111340220A
Application number: CN202010116709.3A
Authority: CN
Inventors: 希滕; 张刚; 温圣召
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-02-25
Filing date: 2020-02-25
Publication date: 2020-06-26
Anticipated expiration: 2040-02-25
Also published as: CN111340220B

Abstract

The present disclosure relates to the field of artificial intelligence. Embodiments of the present disclosure disclose methods and apparatus for training a predictive model. The prediction model is used for predicting the performance of the neural network structure, and the method comprises training the prediction model through a sampling operation; the sampling operation comprises the following steps: sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network; constructing sample data based on the trained sub-network and the corresponding performance information, and training a prediction model by using the sample data; and in response to determining that the precision of the prediction model trained in the current sampling operation does not meet the preset condition, executing the next sampling operation, and increasing the number of sub-networks for sampling in the next sampling operation. The method can reduce the searching cost of the neural network model structure.

Description

Method and apparatus for training predictive models

技术领域technical field

本公开的实施例涉及计算机技术领域，具体涉及人工智能技术领域，尤其涉及用于训练预测模型的方法和装置。The embodiments of the present disclosure relate to the field of computer technology, in particular to the field of artificial intelligence technology, and in particular, to a method and apparatus for training a prediction model.

背景技术Background technique

随着人工智能技术和数据存储技术的发展，深度神经网络在许多领域取得了重要的成果。深度神经网络结构的设计对其性能具有直接的影响。传统的深度神经网络结构的设计由人工根据经验完成。人工设计网络结构需要大量的专家知识，并且针对不同的任务或应用场景需要分别针对性地进行网络结构的设计，成本较高。With the development of artificial intelligence technology and data storage technology, deep neural networks have achieved important results in many fields. The design of the deep neural network structure has a direct impact on its performance. The design of the traditional deep neural network structure is done manually based on experience. The manual design of the network structure requires a lot of expert knowledge, and the network structure needs to be designed for different tasks or application scenarios, and the cost is high.

NAS(neural architecture search，自动化神经网络结构搜索)是用算法代替繁琐的人工操作，自动搜索出最佳的神经网络架构。现有的模型结构自动搜索只能基于特定的约束条件进行搜索，例如针对指定的硬件设备型号进行搜索。然而，实际场景中的约束条件比较复杂，且变化很多，涉及到多种硬件种类，例如多种不同型号处理器。对每一种硬件，搜索约束也是繁多的，例如不同的延时约束。现有的方法需要针对每一种约束条件执行网络结构搜索，大量重复的网络结构搜索任务会消耗很多的计算资源，成本非常高。NAS (neural architecture search, automatic neural network structure search) is to replace tedious manual operations with algorithms to automatically search for the best neural network architecture. The existing model structure automatic search can only be searched based on specific constraints, such as searching for a specified hardware device model. However, the constraints in actual scenarios are complex and vary a lot, involving multiple types of hardware, such as multiple types of processors. For each type of hardware, search constraints are also numerous, such as different delay constraints. Existing methods need to perform network structure search for each constraint condition, and a large number of repetitive network structure search tasks will consume a lot of computing resources and the cost is very high.

发明内容SUMMARY OF THE INVENTION

本公开的实施例提出了用于训练预测模型的方法和装置、电子设备和计算机可读介质。Embodiments of the present disclosure propose methods and apparatuses, electronic devices, and computer-readable media for training predictive models.

第一方面，本公开的实施例提供了一种用于训练预测模型的方法，预测模型用于预测神经网络结构的性能，用于训练预测模型的方法包括通过采样操作训练预测模型；采样操作包括：从已训练完成的超网络中采样出子网络，并对采样出的子网络进行训练，得到训练完成的子网络的性能信息；基于训练完成的子网络和对应的性能信息构建样本数据，并利用样本数据训练预测模型；响应于确定当前采样操作中训练得到的预测模型的精度不满足预设的条件，执行下一次采样操作，并在下一次采样操作中增加采样的子网络的数量。In a first aspect, embodiments of the present disclosure provide a method for training a prediction model, where the prediction model is used to predict the performance of a neural network structure, and the method for training a prediction model includes training the prediction model through a sampling operation; the sampling operation includes: : Sample the sub-network from the trained super-network, and train the sampled sub-network to obtain the performance information of the trained sub-network; build the sample data based on the trained sub-network and the corresponding performance information, and Use the sample data to train the prediction model; in response to determining that the accuracy of the prediction model trained in the current sampling operation does not meet the preset condition, perform the next sampling operation, and increase the number of sampled sub-networks in the next sampling operation.

在一些实施例中，上述从已训练完成的超网络中采样出子网络，包括：采用初始的递归神经网络从已训练完成的超网络中采样出子网络；以及在对采样出的子网络进行训练之前，采样操作还包括：基于训练好的子网络的性能信息生成反馈信息，以基于反馈信息迭代更新递归神经网络；基于迭代更新后的递归神经网络重新从已训练完成的超网络中采样出子网络。In some embodiments, the above-mentioned sampling of the sub-network from the trained super-network includes: sampling the sub-network from the trained super-network by using an initial recurrent neural network; Before training, the sampling operation also includes: generating feedback information based on the performance information of the trained sub-network to iteratively update the recurrent neural network based on the feedback information; re-sampling from the trained super-network based on the iteratively updated recurrent neural network subnet.

在一些实施例中，上述从已训练完成的超网络中采样出子网络，包括：从已训练完成的超网络中采样出未被采样过的子网络；以及上述基于训练完成的子网络和对应的性能信息构建样本数据，包括：基于当前采样操作中采样出的子网络和对应的性能信息、以及上一次采样操作中采样出的子网络和对应的性能信息构建样本数据。In some embodiments, the above-mentioned sampling of the sub-network from the trained super-network includes: sampling the unsampled sub-network from the trained super-network; and the above-mentioned based on the trained sub-network and corresponding The construction of sample data includes: constructing sample data based on the sub-network and corresponding performance information sampled in the current sampling operation, and the sub-network and corresponding performance information sampled in the previous sampling operation.

在一些实施例中，上述采样操作还包括：响应于确定预测模型的精度满足预设的条件，基于当前的采样操作的训练结果生成训练完成的预测模型。In some embodiments, the above sampling operation further includes: in response to determining that the accuracy of the prediction model meets a preset condition, generating a trained prediction model based on the training result of the current sampling operation.

在一些实施例中，上述方法还包括：基于训练完成的预测模型对预设的模型结构搜索空间内的模型结构的性能预测结果，以及预设的深度学习任务场景的性能约束条件，在模型结构搜索空间中搜索出满足性能约束条件的神经网络模型结构。In some embodiments, the above method further includes: based on the performance prediction result of the model structure in the preset model structure search space based on the trained prediction model, and the preset performance constraints of the deep learning task scenario, in the model structure The neural network model structure that satisfies the performance constraints is searched in the search space.

第二方面，本公开的实施例提供了一种用于训练预测模型的装置，预测模型用于预测神经网络结构的性能，用于训练预测模型的装置包括采样单元，被配置为通过采样操作训练预测模型；采样单元执行的采样操作包括：从已训练完成的超网络中采样出子网络，并对采样出的子网络进行训练，得到训练完成的子网络的性能信息；基于训练完成的子网络和对应的性能信息构建样本数据，并利用样本数据训练预测模型；响应于确定当前采样操作中训练得到的预测模型的精度不满足预设的条件，执行下一次采样操作，并在下一次采样操作中增加采样的子网络的数量。In a second aspect, embodiments of the present disclosure provide an apparatus for training a prediction model, where the prediction model is used to predict the performance of a neural network structure, and the apparatus for training a prediction model includes a sampling unit configured to train through a sampling operation Prediction model; the sampling operation performed by the sampling unit includes: sampling the sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network; based on the trained sub-network and the corresponding performance information to construct sample data, and use the sample data to train the prediction model; in response to determining that the accuracy of the prediction model trained in the current sampling operation does not meet the preset conditions, perform the next sampling operation, and in the next sampling operation Increase the number of sub-networks sampled.

用于训练预测模型的述采样单元按照如下方式从已训练完成的超网络中采样出子网络：采用初始的递归神经网络从已训练完成的超网络中采样出子网络；以及在对采样出的子网络进行训练之前，采样单元执行的采样操作还包括：基于训练好的子网络的性能信息生成反馈信息，以基于反馈信息迭代更新递归神经网络；基于迭代更新后的递归神经网络重新从已训练完成的超网络中采样出子网络。The sampling unit for training the prediction model samples the sub-network from the trained super-network in the following manner: using the initial recurrent neural network to sample the sub-network from the trained super-network; Before the sub-network is trained, the sampling operation performed by the sampling unit further includes: generating feedback information based on the performance information of the trained sub-network, so as to iteratively update the recurrent neural network based on the feedback information; Sub-networks are sampled from the completed super-network.

用于训练预测模型的述采样单元按照如下方式从已训练完成的超网络中采样出子网络：从已训练完成的超网络中采样出未被采样过的子网络；以及采样单元按照如下方式构建样本数据：基于当前采样操作中采样出的子网络和对应的性能信息、以及上一次采样操作中采样出的子网络和对应的性能信息构建样本数据。The sampling unit for training the prediction model samples the sub-network from the trained super-network in the following manner: sampling the sub-network that has not been sampled from the trained super-network; and the sampling unit is constructed as follows Sample data: construct sample data based on the sub-network and corresponding performance information sampled in the current sampling operation, and the sub-network and corresponding performance information sampled in the previous sampling operation.

用于训练预测模型的述采样操作还包括：响应于确定预测模型的精度满足预设的条件，基于当前的采样操作的训练结果生成训练完成的预测模型。The sampling operation for training the prediction model further includes: in response to determining that the accuracy of the prediction model satisfies a preset condition, generating a trained prediction model based on the training result of the current sampling operation.

用于训练预测模型的述装置还包括：搜索单元，被配置为基于训练完成的预测模型对预设的模型结构搜索空间内的模型结构的性能预测结果，以及预设的深度学习任务场景的性能约束条件，在模型结构搜索空间中搜索出满足性能约束条件的神经网络模型结构。The apparatus for training a prediction model further includes: a search unit configured to predict the performance of the model structure in the preset model structure search space based on the trained prediction model, and the performance of the preset deep learning task scene Constraints, the neural network model structure that satisfies the performance constraints is searched in the model structure search space.

第三方面，本公开的实施例提供了一种电子设备，包括：一个或多个处理器；存储装置，用于存储一个或多个程序，当一个或多个程序被一个或多个处理器执行，使得一个或多个处理器实现如第一方面提供的用于训练预测模型的方法。In a third aspect, embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device for storing one or more programs, when the one or more programs are processed by the one or more processors Execution causes the one or more processors to implement the method for training a predictive model as provided by the first aspect.

第四方面，本公开的实施例提供了一种计算机可读介质，其上存储有计算机程序，其中，程序被处理器执行时实现第一方面提供的用于训练预测模型的方法。In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for training a prediction model provided in the first aspect.

本公开的上述实施例的用于训练预测模型的方法和装置，通过采样操作训练预测模型；其中，采样操作包括：从已训练完成的超网络中采样出子网络，并对采样出的子网络进行训练，得到训练完成的子网络的性能信息；基于训练完成的子网络和对应的性能信息构建样本数据，并利用样本数据训练预测模型；响应于确定预测模型的精度不满足预设的条件，执行下一次采样操作，并在下一次采样操作中增加采样的子网络的数量，其中，预测模型用于预测神经网络结构的性能。该方法和装置可以获得预测任意模型结构的性能的预测模型，从而在应用至模型结构自动搜索中时针对不同的约束条件都只需搜索一次即可得出性能最优的模型结构，有效减少了模型结构搜索消耗的资源，降低了模型结构搜索成本。The method and device for training a prediction model according to the above-mentioned embodiments of the present disclosure train the prediction model through a sampling operation; wherein the sampling operation includes: sampling a sub-network from a super-network that has been trained, and analyzing the sampled sub-network Perform training to obtain performance information of the trained sub-network; construct sample data based on the trained sub-network and the corresponding performance information, and use the sample data to train the prediction model; in response to determining that the accuracy of the prediction model does not meet the preset conditions, The next sampling operation is performed and the number of sampled sub-networks is increased in the next sampling operation, where the prediction model is used to predict the performance of the neural network structure. The method and device can obtain a prediction model for predicting the performance of any model structure, so that when applied to the automatic search of the model structure, the model structure with the best performance can be obtained by searching only once for different constraints, effectively reducing the number of The resources consumed by model structure search reduce the cost of model structure search.

附图说明Description of drawings

通过阅读参照以下附图所作的对非限制性实施例所作的详细描述，本公开的其它特征、目的和优点将会变得更明显：Other features, objects and advantages of the present disclosure will become more apparent upon reading the detailed description of non-limiting embodiments taken with reference to the following drawings:

图1是本公开的实施例可以应用于其中的示例性系统架构图；FIG. 1 is an exemplary system architecture diagram to which embodiments of the present disclosure may be applied;

图2是根据本公开的用于训练预测模型的方法的一个实施例的流程图；2 is a flowchart of one embodiment of a method for training a predictive model according to the present disclosure;

图3是根据本公开的用于训练预测模型的方法的另一个实施例的流程图；3 is a flowchart of another embodiment of a method for training a predictive model according to the present disclosure;

图4是本公开的用于训练预测模型的装置的一个实施例的结构示意图；4 is a schematic structural diagram of an embodiment of the apparatus for training a prediction model of the present disclosure;

图5是适于用来实现本公开实施例的电子设备的计算机系统的结构示意图。FIG. 5 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.

具体实施方式Detailed ways

下面结合附图和实施例对本公开作进一步的详细说明。可以理解的是，此处所描述的具体实施例仅仅用于解释相关发明，而非对该发明的限定。另外还需要说明的是，为了便于描述，附图中仅示出了与有关发明相关的部分。The present disclosure will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention, but not to limit the invention. In addition, it should be noted that, for the convenience of description, only the parts related to the related invention are shown in the drawings.

需要说明的是，在不冲突的情况下，本公开中的实施例及实施例中的特征可以相互组合。下面将参考附图并结合实施例来详细说明本公开。It should be noted that the embodiments of the present disclosure and the features of the embodiments may be combined with each other under the condition of no conflict. The present disclosure will be described in detail below with reference to the accompanying drawings and in conjunction with embodiments.

图1示出了可以应用本公开的用于训练超网络的方法或用于训练超网络的装置的示例性系统架构100。FIG. 1 illustrates an exemplary system architecture 100 to which a method for training a super-network or an apparatus for training a super-network of the present disclosure may be applied.

如图1所示，系统架构100可以包括终端设备101、102、103，网络104和服务器105。网络104用以在终端设备101、102、103和服务器105之间提供通信链路的介质。网络104可以包括各种连接类型，例如有线、无线通信链路或者光纤电缆等等。As shown in FIG. 1 , the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 . The network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 . The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

终端设备101、102、103通过网络104与服务器105交互，以接收或发送消息等。终端设备101、102、103可以是用户端设备，其上可以安装有各种客户端应用。例如，图像处理类应用、信息分析类应用、语音助手类应用、购物类应用、金融类应用等。The terminal devices 101, 102, and 103 interact with the server 105 through the network 104 to receive or send messages and the like. The terminal devices 101, 102, and 103 may be client devices on which various client applications may be installed. For example, image processing applications, information analysis applications, voice assistant applications, shopping applications, financial applications, etc.

终端设备101、102、103可以是硬件，也可以是软件。当终端设备101、102、103为硬件时，可以是各种电子设备，包括但不限于智能手机、平板电脑、电子书阅读器、膝上型便携计算机和台式计算机等等。当终端设备101、102、103为软件时，可以安装在上述所列举的电子设备中。其可以实现成多个软件或软件模块(例如用来提供分布式服务的多个软件或软件模块)，也可以实现成单个软件或软件模块。在此不做具体限定。The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. It can be implemented as multiple software or software modules (eg, multiple software or software modules for providing distributed services), or as a single software or software module. There is no specific limitation here.

服务器105可以是运行各种服务的服务器，例如运行基于图像或语音数据的目标跟踪、语音处理服务的服务器。服务器105可以从终端设备101、102、103获取深度学习任务数据、或者从数据库获取深度学习任务数据来构建训练样本，对用于执行深度学习任务的神经网络的模型结构进行自动搜索和优化。服务器105还可以运行用于预测神经网络结构的性能的预测模型，在进行模型结构自动搜索时基于该预测模型预测不同的神经网络结构的性能，进而快速确定出性能最优的神经网络模型结构。The server 105 may be a server running various services, such as a server running object tracking, voice processing services based on image or voice data. The server 105 can obtain deep learning task data from the terminal devices 101, 102, 103, or obtain deep learning task data from a database to construct training samples, and automatically search and optimize the model structure of the neural network used to perform the deep learning task. The server 105 may also run a prediction model for predicting the performance of the neural network structure, and predict the performance of different neural network structures based on the prediction model during automatic model structure search, so as to quickly determine the neural network model structure with the best performance.

在本公开的实施例的应用场景中，服务器105可以通过超网络来实现神经网络的模型结构的自动搜索。服务器105可以基于获取到的深度学习任务数据，例如图像、文本、语音等媒体数据，来训练超网络，在超网络训练完成后，服务器105可以从超网络中采样出子网络结构来执行相应的任务。In the application scenario of the embodiment of the present disclosure, the server 105 may realize the automatic search of the model structure of the neural network through the super network. The server 105 can train the super network based on the acquired deep learning task data, such as media data such as images, texts, and voices. After the training of the super network is completed, the server 105 can sample the sub-network structure from the super network to execute the corresponding Task.

服务器105还可以是为终端设备101、102、103上安装的应用提供后端支持的后端服务器。例如，服务器105可以接收终端设备101、102、103发送的待处理的数据，使用神经网络模型对数据进行处理，并将处理结果返回至终端设备101、102、103。The server 105 may also be a back-end server that provides back-end support for applications installed on the terminal devices 101 , 102 , and 103 . For example, the server 105 may receive the data to be processed sent by the terminal devices 101 , 102 , and 103 , process the data using a neural network model, and return the processing result to the terminal devices 101 , 102 , and 103 .

在实际场景中，终端设备101、102、103可以向服务器105发送与语音交互、文本分类、对话行为分类、图像识别、关键点检测等任务相关的深度学习任务请求。服务器105上可以运行已针对相应的深度学习任务训练得到的神经网络模型，利用该神经网络模型来处理信息。In an actual scenario, the terminal devices 101 , 102 , and 103 may send deep learning task requests related to tasks such as voice interaction, text classification, dialogue behavior classification, image recognition, and key point detection to the server 105 . The neural network model that has been trained for the corresponding deep learning task can be run on the server 105, and the neural network model can be used to process information.

需要说明的是，本公开的实施例所提供的用于训练预测模型的方法一般由服务器105执行，相应地，用于训练预测模型的装置一般设置于服务器105中。It should be noted that the method for training a prediction model provided by the embodiments of the present disclosure is generally executed by the server 105 , and accordingly, the apparatus for training the prediction model is generally set in the server 105 .

在一些场景中，服务器105可以从数据库、存储器或其他设备获取训练预测模型所需要的源数据(例如训练样本，已训练完成的超网络等)，这时，示例性系统架构100可以不存在终端设备101、102、103和网络104。In some scenarios, the server 105 may obtain source data (eg, training samples, trained hyper-networks, etc.) required for training the predictive model from a database, memory, or other device, in which case the exemplary system architecture 100 may not have a terminal Devices 101 , 102 , 103 and network 104 .

需要说明的是，服务器105可以是硬件，也可以是软件。当服务器105为硬件时，可以实现成多个服务器组成的分布式服务器集群，也可以实现成单个服务器。当服务器105为软件时，可以实现成多个软件或软件模块(例如用来提供分布式服务的多个软件或软件模块)，也可以实现成单个软件或软件模块。在此不做具体限定。It should be noted that the server 105 may be hardware or software. When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or can be implemented as a single server. When the server 105 is software, it can be implemented as multiple software or software modules (for example, multiple software or software modules for providing distributed services), or can be implemented as a single software or software module. There is no specific limitation here.

应该理解，图1中的终端设备、网络和服务器的数目仅仅是示意性的。根据实现需要，可以具有任意数目的终端设备、网络和服务器。It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

继续参考图2，其示出了根据本公开的用于训练预测模型的方法的一个实施例的流程200。With continued reference to Figure 2, a flow 200 of one embodiment of a method for training a predictive model according to the present disclosure is shown.

本公开的预测模型用于预测神经网络结构的性能。神经网络结构的性能可以包括以下至少一项：神经网络结构执行相应深度学习任务的精度、神经网络结构在指定硬件或软件环境下的运行功耗、神经网络结构在指定硬件或软件环境下的运行延时、神经网络结构在指定硬件或软件环境下的内存占用率，等等。需要说明的是，针对不同的硬件或软件环境，可以分别训练相应的预测模型。针对不同的深度学习任务，也可以分别训练不同的预测模型。The predictive models of the present disclosure are used to predict the performance of neural network structures. The performance of the neural network structure may include at least one of the following: the accuracy of the neural network structure to perform the corresponding deep learning task, the running power consumption of the neural network structure in the specified hardware or software environment, the operation of the neural network structure in the specified hardware or software environment Latency, memory usage of neural network structure in specified hardware or software environment, etc. It should be noted that, for different hardware or software environments, corresponding prediction models can be trained separately. Different prediction models can also be trained separately for different deep learning tasks.

本实施例中用于训练预测模型的方法的流程200，包括通过采样操作训练预测模型。其中，采样操作包括以下步骤201至步骤203。The process 200 of the method for training a prediction model in this embodiment includes training the prediction model through sampling operations. The sampling operation includes the following steps 201 to 203 .

在步骤201中，从已训练完成的超网络中采样出子网络，并对采样出的子网络进行训练，得到训练完成的子网络的性能信息。In step 201, a sub-network is sampled from the trained super-network, and the sampled sub-network is trained to obtain performance information of the trained sub-network.

在本实施例中，用于训练预测模型的方法的执行主体可以获取预先训练完成的超网络。超网络的结构可以是预先设定的，包含网络结构搜索空间中的所有网络结构，超网络的每一层可以包含网络结构搜索空间中的多个网络结构单元。在这里，网络结构单元可以由单个网络层形成，例如单个卷积层、循环神经网络中的单个循环单元，也可以由多个网络层组合形成，例如由卷积层、批量归一化层、非线性层连接形成的卷积块(block)。超网络中，每个网络结构单元可以与其上一层和下一层的所有网络结构单元连接。超网络训练完成后，内部所有的网络结构在构建不同的子网络时共享参数。In this embodiment, the execution body of the method for training a prediction model may acquire a pre-trained super network. The structure of the super network may be preset, including all network structures in the network structure search space, and each layer of the super network may include multiple network structure units in the network structure search space. Here, the network structural unit can be formed by a single network layer, such as a single convolutional layer, a single recurrent unit in a recurrent neural network, or a combination of multiple network layers, such as a convolutional layer, batch normalization layer, A convolutional block formed by connecting non-linear layers. In a super network, each network structural unit can be connected to all network structural units at the upper and lower layers. After the super network is trained, all internal network structures share parameters when building different sub-networks.

可以随机地中超网络中采样子网络，或者可以利用经过训练的递归神经网络中超网络中采样出子网络。需要说明的是，在每一次采样操作中，可以采样出多个子网络。The sub-networks can be randomly sampled from the super-network, or the sub-networks can be sampled from the super-network using a trained recurrent neural network. It should be noted that in each sampling operation, multiple sub-networks can be sampled.

可以获取训练数据，对采样出的子网络进行训练。训练数据可以是图像、文本、语音、视频等媒体数据，也可以是位置、价格、时间等数字型数据，可以根据所执行的深度学习任务确定，例如深度学习任务是图像分类任务，则训练数据是图像数据。Training data can be obtained to train the sampled subnetworks. Training data can be media data such as images, text, voice, and video, or digital data such as location, price, and time, and can be determined according to the deep learning task performed. For example, if the deep learning task is an image classification task, then the training data is image data.

在本实施例中，训练数据可以是具有标注信息的数据，在子网络训练过程中，基于输入子网络的训练数据的标注信息来确定子网络的误差，进而通过误差反向传播的方式迭代调整子网络的参数，使得子网络在训练过程中逐步优化其参数。In this embodiment, the training data may be data with label information. During the sub-network training process, the error of the sub-network is determined based on the label information of the training data input to the sub-network, and then iteratively adjusts by means of error back propagation. The parameters of the sub-network, so that the sub-network gradually optimizes its parameters during the training process.

在子网络训练完成之后，可以利用测试数据来测试子网络的性能。测试数据也可以有具有标注信息，根据训练完成的子网络对测试数据的处理结果与对应的标注信息得出训练完成的子网络的性能信息。After the sub-network training is completed, the test data can be used to test the performance of the sub-network. The test data may also have label information, and the performance information of the trained sub-network is obtained according to the processing result of the trained sub-network on the test data and the corresponding label information.

在步骤202中，基于训练完成的子网络和对应的性能信息构建样本数据，并利用样本数据训练预测模型。In step 202, sample data is constructed based on the trained sub-network and corresponding performance information, and a prediction model is trained by using the sample data.

在本实施例中，可以根据步骤201训练完成的子网络与其性能信息构建成对的样本数据，在该样本数据中，子网络为输入信息，对应的性能信息为输入信息对应的标注信息。In this embodiment, paired sample data can be constructed according to the sub-network trained in step 201 and its performance information. In the sample data, the sub-network is the input information, and the corresponding performance information is the labeling information corresponding to the input information.

可选地，可以将一部分样本数据作为训练样本，另一部分训练样本作为测试样本。Optionally, a part of the sample data can be used as a training sample, and another part of the training sample can be used as a test sample.

可以利用样本数据对待训练的预测模型进行训练。待训练的预测模型的结构可以是预先构建的，例如可以是基于NAS的方法从搜索空间中自动搜索出的，或者可以是预先设定的卷积神经网络、循环神经网络等网络结构。在本实施例中，可以将样本数据中的子网络编码后输入待训练的预测模型，待训练的预测模型可以对输入的子网络的性能信息进行预测。根据待训练的预测模型对子网络的性能信息的预测结果与子网络的标注信息之间的差异构建目标函数，通过最小化目标函数来迭代调整待训练的预测模型的参数。在目标函数的值收敛至预设的范围内，或者迭代调整待训练的预测模型的参数的次数达到预设的次数阈值时，可以固定待训练的预测模型的参数，得到当前的采样操作中生成的预测模型。The predictive model to be trained can be trained using sample data. The structure of the prediction model to be trained may be pre-built, for example, it may be automatically searched from the search space based on the NAS method, or it may be a network structure such as a pre-set convolutional neural network and a recurrent neural network. In this embodiment, the sub-network in the sample data can be encoded and input into the prediction model to be trained, and the prediction model to be trained can predict the performance information of the input sub-network. An objective function is constructed according to the difference between the prediction result of the prediction model to be trained on the performance information of the sub-network and the label information of the sub-network, and the parameters of the prediction model to be trained are iteratively adjusted by minimizing the objective function. When the value of the objective function converges to a preset range, or when the number of times of iteratively adjusting the parameters of the prediction model to be trained reaches a preset number of thresholds, the parameters of the prediction model to be trained can be fixed to obtain the current sampling operation. prediction model.

由于子网络是从已训练的超网络中采样得出的，各子网络的初始参数精度较高，所以子网络在步骤202中训练时能够快速收敛，因而本实施例的方案可以快速地完成每次采样操作，从而加快预测模型的训练速度。Since the sub-network is sampled from the trained super-network, and the initial parameters of each sub-network have high precision, the sub-network can quickly converge during training in step 202, so the solution of this embodiment can quickly complete each sub-network. sub-sampling operations, thereby speeding up the training of predictive models.

在步骤203中，响应于确定预测模型的精度不满足预设的条件，执行下一次采样操作，并在下一次采样操作中增加采样的子网络的数量。In step 203, in response to determining that the accuracy of the prediction model does not meet the preset condition, a next sampling operation is performed, and the number of sampled sub-networks is increased in the next sampling operation.

在当前的采样操作中停止对待训练的预测模型的训练后，可以利用测试样本测试当前采样操作中训练得到的预测模型的预测精度，具体可以利用当前采样操作中训练得到的预测模型对测试样本中的各子网络的性能信息进行预测，将预测结果与测试样本中的各子网络的标注信息进行比对得到预测模型的预测精度。After stopping the training of the prediction model to be trained in the current sampling operation, the prediction accuracy of the prediction model trained in the current sampling operation can be tested by using the test sample. The performance information of each sub-network is predicted, and the prediction result is compared with the label information of each sub-network in the test sample to obtain the prediction accuracy of the prediction model.

若当前采样操作中训练得到的预测模型的精度不满足预设的条件，例如未达到预设的精度阈值，则可以执行下一次采样操作，在下一次采样操作中增加采样的子网络的数量。增加的子网络的数量可以预先设定，例如设定每次采样操作比前一次采样操作增加500个子网络。这样，相邻的两次采样操作中，后一次采样操作中预测模型的样本数量增加，较前一次采样操作可以达到更好的精度。并且，通过逐步增加采样的子网络的数量，能够避免过大的样本数量造成预测模型的训练消耗过多的内存资源。If the accuracy of the prediction model trained in the current sampling operation does not meet the preset conditions, for example, does not reach the preset accuracy threshold, the next sampling operation can be performed, and the number of sampled sub-networks can be increased in the next sampling operation. The number of added sub-networks can be preset, for example, it is set that each sampling operation is increased by 500 sub-networks compared to the previous sampling operation. In this way, in two adjacent sampling operations, the number of samples of the prediction model in the latter sampling operation increases, which can achieve better accuracy than the previous sampling operation. Moreover, by gradually increasing the number of sampled sub-networks, it can be avoided that the training of the prediction model consumes too much memory resources due to the excessively large number of samples.

上述实施例的用于训练预测模型的方法，通过采样操作训练预测模型，其中采样操作包括：从已训练完成的超网络中采样出子网络，并对采样出的子网络进行训练，得到训练完成的子网络的性能信息；基于训练完成的子网络和对应的性能信息构建样本数据，并利用样本数据训练预测模型；响应于确定预测模型的精度不满足预设的条件，执行下一次采样操作，并在下一次采样操作中增加采样的子网络的数量。该方法可以获得预测任意模型结构的性能的预测模型，从而在应用至模型结构自动搜索中时针对不同的约束条件都只需搜索一次即可得出性能最优的模型结构，有效减少了模型结构搜索消耗的资源，降低了模型结构搜索成本。In the method for training a prediction model of the above-mentioned embodiment, the prediction model is trained by a sampling operation, wherein the sampling operation includes: sampling a sub-network from a super-network that has been trained, and training the sampled sub-network, and obtaining a training completed performance information of the sub-network; construct sample data based on the trained sub-network and the corresponding performance information, and use the sample data to train the prediction model; in response to determining that the accuracy of the prediction model does not meet the preset conditions, perform the next sampling operation, And increase the number of sub-networks sampled in the next sampling operation. This method can obtain a prediction model for predicting the performance of any model structure, so that when applied to the automatic search of the model structure, the model structure with the best performance can be obtained by only one search for different constraints, which effectively reduces the model structure. Searching consumes resources, reducing model structure search costs.

可选地，上述采样操作还可以包括：响应于确定预测模型的精度满足预设的条件，基于当前的采样操作的训练结果生成训练完成的预测模型。若当前的采样操作中训练得到的预测模型的精度达到预设的精度阈值，可以将该预测模型作为训练完成的预测模型。这样，在经过多次采样操作逐步增加采样的子网络数量、增加预测模型的样本数量以对预测模型逐步进行优化后，当预测模型的精度满足预设的条件时可以停止采样操作，避免过多的采样操作消耗内存资源。Optionally, the above sampling operation may further include: in response to determining that the accuracy of the prediction model meets a preset condition, generating a trained prediction model based on the training result of the current sampling operation. If the accuracy of the prediction model trained in the current sampling operation reaches the preset accuracy threshold, the prediction model can be used as the prediction model after training. In this way, after the number of sampled sub-networks and the number of samples of the prediction model are gradually increased after multiple sampling operations to gradually optimize the prediction model, the sampling operation can be stopped when the accuracy of the prediction model meets the preset conditions to avoid excessive The sampling operation consumes memory resources.

训练完成的预测模型可以预测预设的神经网络模型的性能。在实际场景中，在应用程序中的功能上线之前，可以利用该虚拟兰完成的预测模型对用于实现该功能的神经网络模型的性能进行预测，该预测结果可以作为参考信息来评估该功能的稳定性和可靠性。The trained prediction model can predict the performance of the preset neural network model. In actual scenarios, before the function in the application goes online, the prediction model completed by the virtual blueprint can be used to predict the performance of the neural network model used to realize the function, and the prediction result can be used as reference information to evaluate the function of the function. Stability and reliability.

在上述实施例的一些可选的实现方式中，上述从已训练完成的超网络中采样出子网络的步骤包括：从已训练完成的超网络中采样出未被采样过的子网络。也即，在每次采样操作中，之前的采样操作中已被采样过的子网络不会被重复采样，每次采样操作采样一批新的子网络。这时上述基于训练完成的子网络和对应的性能信息构建样本数据，包括：基于当前采样操作中采样出的子网络和对应的性能信息、以及上一次采样操作中采样出的子网络和对应的性能信息构建样本数据。也即，可以将当前的采样操作采样出的子网络添加至上一次采样操作中构建的样本数据中，以对样本数据进行扩充。这样可以最小化子网络采样造成的运算资源消耗并逐步增加样本数据的数量，有利于减少预测模型的占用的内存资源。In some optional implementations of the foregoing embodiments, the foregoing step of sampling a sub-network from a trained super-network includes: sampling an unsampled sub-network from the trained super-network. That is, in each sampling operation, the sub-networks that have been sampled in the previous sampling operation will not be repeatedly sampled, and each sampling operation samples a batch of new sub-networks. At this time, the above-mentioned construction of sample data based on the trained sub-network and the corresponding performance information includes: based on the sub-network and corresponding performance information sampled in the current sampling operation, and the sub-network and corresponding performance information sampled in the previous sampling operation. Performance information builds sample data. That is, the sub-network sampled by the current sampling operation can be added to the sample data constructed in the previous sampling operation to expand the sample data. In this way, computing resource consumption caused by sub-network sampling can be minimized and the number of sample data can be gradually increased, which is beneficial to reduce the memory resources occupied by the prediction model.

可选地，上述采样操作中，步骤201的子网络的采样可以按照如下方式实现：采用初始的递归神经网络从已训练完成的超网络中采样出子网络。在执行步骤202对采样出的子网络进行训练之前，采样操作还可以包括：基于训练好的子网络的性能信息生成反馈信息，以基于反馈信息迭代更新递归神经网络；基于迭代更新后的递归神经网络重新从已训练完成的超网络中采样出子网络。Optionally, in the above sampling operation, the sampling of the sub-network in step 201 may be implemented as follows: an initial recurrent neural network is used to sample the sub-network from the super-network that has been trained. Before performing step 202 to train the sampled sub-network, the sampling operation may further include: generating feedback information based on the performance information of the trained sub-network, so as to iteratively update the recurrent neural network based on the feedback information; The network resamples sub-networks from the trained super-network.

具体地，在训练预测模型的过程中，还可以对用于从超网络中采样出子网络的递归神经网络进行训练。在每次一采样操作中，可以采用待训练的递归神经网络采样子网络，该待训练的递归神经网络的参数可以随机初始化，然后将待训练的递归神经网络采样出的子网络的误差等信息作为反馈信息反馈至递归神经网络，以使递归神经网络根据反馈信息更新参数，并重新采样子网络。Specifically, in the process of training the prediction model, the recurrent neural network for sampling the sub-network from the super-network may also be trained. In each sampling operation, the recurrent neural network to be trained can be used to sample the sub-network, the parameters of the recurrent neural network to be trained can be randomly initialized, and then the information such as the error of the sub-network sampled by the recurrent neural network to be trained Feedback to the recurrent neural network as feedback information, so that the recurrent neural network updates parameters according to the feedback information, and resamples the sub-network.

这样，通过基于子网络采样结果训练递归神经网络，可以优化递归神经网络，从而优化子网络采样结果，进而提升基于子网络采样结果训练得到的预测模型的预测精度。In this way, by training the recurrent neural network based on the sub-network sampling results, the recurrent neural network can be optimized, thereby optimizing the sub-network sampling results, and further improving the prediction accuracy of the prediction model trained based on the sub-network sampling results.

请参考图3，其示出了本公开的用于训练预测模型的方法的另一个实施例的流程。如图3所示，本实施例的用于训练预测模型的方法的流程300，包括：Please refer to FIG. 3 , which shows a flow of another embodiment of the method for training a prediction model of the present disclosure. As shown in FIG. 3 , the process 300 of the method for training a prediction model in this embodiment includes:

步骤301，通过采样操作训练预测模型。Step 301, training a prediction model through sampling operation.

其中，采样操作包括以下步骤3011、步骤3012和步骤3013。The sampling operation includes the following steps 3011 , 3012 and 3013 .

步骤3011，从已训练完成的超网络中采样出子网络，并对采样出的子网络进行训练，得到训练完成的子网络的性能信息。Step 3011: Sample sub-networks from the trained super-network, and train the sampled sub-networks to obtain performance information of the trained sub-networks.

步骤3012，基于训练完成的子网络和对应的性能信息构建样本数据，并利用样本数据训练预测模型。Step 3012 , construct sample data based on the trained sub-network and corresponding performance information, and use the sample data to train a prediction model.

步骤3013，响应于确定当前采样操作中训练得到的预测模型的精度不满足预设的条件，执行下一次采样操作，并在下一次采样操作中增加采样的子网络的数量。Step 3013 , in response to determining that the accuracy of the prediction model trained in the current sampling operation does not meet the preset condition, execute the next sampling operation, and increase the number of sampled sub-networks in the next sampling operation.

可选地，上述步骤3011中，可以采用初始的递归神经网络从已训练完成的超网络中采样出子网络；以及在执行步骤3012对采样出的子网络进行训练之前，采样操作还包括：基于训练好的子网络的性能信息生成反馈信息，以基于反馈信息迭代更新递归神经网络；基于迭代更新后的递归神经网络重新从已训练完成的超网络中采样出子网络。Optionally, in the above step 3011, an initial recurrent neural network can be used to sample a sub-network from the trained super-network; and before performing step 3012 to train the sampled sub-network, the sampling operation further includes: based on The performance information of the trained sub-network generates feedback information to iteratively update the recurrent neural network based on the feedback information; based on the iteratively updated recurrent neural network, the sub-network is re-sampled from the trained super-network.

可选地，上述从以训练完成的超网络中采样出子网络的步骤3011可以包括：从已训练完成的超网络中采样出未被采样过的子网络。并且上述步骤3012中可以按照如下方式构建样本数据：基于当前采样操作中采样出的子网络和对应的性能信息、以及上一次采样操作中采样出的子网络和对应的性能信息构建样本数据。Optionally, the above step 3011 of sampling a sub-network from a super-network completed by training may include: sampling an unsampled sub-network from the super-network that has been trained. And in the above step 3012, the sample data may be constructed as follows: the sample data is constructed based on the sub-network and corresponding performance information sampled in the current sampling operation, and the sub-network and corresponding performance information sampled in the previous sampling operation.

可选地，采样操作还可以包括：响应于确定预测模型的精度满足预设的条件，基于当前的采样操作的训练结果生成训练完成的预测模型。Optionally, the sampling operation may further include: in response to determining that the accuracy of the prediction model meets a preset condition, generating a trained prediction model based on the training result of the current sampling operation.

上述采样操作301中的步骤3011、步骤3012、步骤3013分别与前述实施例中的步骤201、步骤202、步骤203一致，步骤3011、步骤3012、步骤3013的具体实施方式及采样操作的可选实现方式可以分别参考前述实施例中对应步骤的描述，此处不再赘述。Steps 3011, 3012, and 3013 in the above sampling operation 301 are respectively consistent with steps 201, 202, and 203 in the foregoing embodiment. For the manner, reference may be made to the descriptions of the corresponding steps in the foregoing embodiments, which will not be repeated here.

在本实施例中，用于训练预测模型的方法还包括：In this embodiment, the method for training the prediction model further includes:

步骤302，基于训练完成的预测模型对预设的模型结构搜索空间内的模型结构的性能预测结果，以及预设的深度学习任务场景的性能约束条件，在模型结构搜索空间中搜索出满足性能约束条件的神经网络模型结构。Step 302 , based on the performance prediction result of the model structure in the preset model structure search space by the trained prediction model, and the performance constraints of the preset deep learning task scenario, search the model structure search space to meet the performance constraints. Conditional neural network model structure.

预设的模型结构搜索空间可以是针对指定的深度学习任务构建的搜索空间，例如针对图像处理任务构建的包含卷积层的搜索空间、针对文本或语音等序列数据构建的包含注意力单元(Attention)的搜索空间。可以利用在步骤301中训练完成的预测模型预测搜索空间内各模型结构的性能进行预测。然后将各模型结构的性能预测结果与预设的深度学习任务场景的性能约束条件进行匹配，将匹配成功的模型结构作为搜索出的满足该性能约束条件的神经网络模型结构。该搜索出的神经网络模型结构可以用于执行上述预设的深度学习任务场景中的任务数据。The preset model structure search space can be a search space constructed for a specified deep learning task, such as a search space constructed for image processing tasks including convolutional layers, and an attention unit constructed for sequence data such as text or speech. ) search space. The prediction can be performed by using the prediction model trained in step 301 to predict the performance of each model structure in the search space. Then, the performance prediction results of each model structure are matched with the preset performance constraints of the deep learning task scenario, and the successfully matched model structure is used as the searched neural network model structure that satisfies the performance constraints. The searched neural network model structure can be used to execute the task data in the above preset deep learning task scene.

上述性能约束条件可以由运行神经网络模型结构的设备的硬件或软件环境决定。例如在一种芯片上运行神经网络模型的最小延时为0.2秒，则可以在搜索空间内搜索出满足该延时条件的网络结构。或者，上述性能约束条件可根据由神经网络模型所执行的任务的需求确定。例如应用程序中的一个功能需要达到95％的精度，则可以从上述搜索空间内搜索出精度不低于95％的神经网络模型结构。The above performance constraints may be determined by the hardware or software environment of the device running the neural network model structure. For example, if the minimum delay of running a neural network model on a chip is 0.2 seconds, a network structure that satisfies the delay condition can be searched in the search space. Alternatively, the performance constraints described above may be determined according to the requirements of the task performed by the neural network model. For example, if a function in an application needs to achieve 95% accuracy, a neural network model structure with an accuracy of not less than 95% can be searched from the above search space.

基于预测模型对搜索空间内的网络结构的性能预测结果以及预设的性能约束条件，能够快速搜索出合适的神经网络模型结构。从而该预测模型可以灵活地应用于搜索不同场景下适合的神经网络模型结构。Based on the performance prediction results of the prediction model on the network structure in the search space and the preset performance constraints, a suitable neural network model structure can be quickly searched. Therefore, the prediction model can be flexibly applied to search for suitable neural network model structures in different scenarios.

请参考图4，作为对上述用于训练预测模型的方法的实现，本公开提供了一种用于训练预测模型的装置的一个实施例，该装置实施例与图2和图3所示的方法实施例相对应，该装置具体可以应用于各种电子设备中。其中，预测模型用于预测神经网络结构的性能。Referring to FIG. 4 , as an implementation of the above method for training a prediction model, the present disclosure provides an embodiment of an apparatus for training a prediction model, which is the same as the method shown in FIGS. 2 and 3 . Corresponding to the embodiments, the apparatus can be specifically applied to various electronic devices. Among them, the prediction model is used to predict the performance of the neural network structure.

如图4所示，本实施例的用于训练预测模型的装置400包括采样单元401。采样单元401被配置为通过采样操作训练预测模型；采样单元执行的采样操作包括：从已训练完成的超网络中采样出子网络，并对采样出的子网络进行训练，得到训练完成的子网络的性能信息；基于训练完成的子网络和对应的性能信息构建样本数据，并利用样本数据训练预测模型；响应于确定当前采样操作中训练得到的预测模型的精度不满足预设的条件，执行下一次采样操作，并在下一次采样操作中增加采样的子网络的数量。As shown in FIG. 4 , the apparatus 400 for training a prediction model in this embodiment includes a sampling unit 401 . The sampling unit 401 is configured to train a prediction model through a sampling operation; the sampling operation performed by the sampling unit includes: sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain a trained sub-network based on the trained sub-network and the corresponding performance information, construct sample data, and use the sample data to train the prediction model; in response to determining that the accuracy of the prediction model trained in the current sampling operation does not meet the preset conditions, execute the following One sampling operation, and in the next sampling operation increase the number of sub-networks sampled.

在一些实施例中，上述采样单元401按照如下方式从已训练完成的超网络中采样出子网络：采用初始的递归神经网络从已训练完成的超网络中采样出子网络；以及在对采样出的子网络进行训练之前，采样单元执行的采样操作还包括：基于训练好的子网络的性能信息生成反馈信息，以基于反馈信息迭代更新递归神经网络；基于迭代更新后的递归神经网络重新从已训练完成的超网络中采样出子网络。In some embodiments, the above-mentioned sampling unit 401 samples the sub-networks from the trained super-network in the following manner: using an initial recurrent neural network to sample the sub-networks from the trained super-network; Before training the sub-network, the sampling operation performed by the sampling unit further includes: generating feedback information based on the performance information of the trained sub-network, so as to iteratively update the recurrent neural network based on the feedback information; Sub-networks are sampled from the trained super-network.

在一些实施例中，上述采样单元401按照如下方式从已训练完成的超网络中采样出子网络：从已训练完成的超网络中采样出未被采样过的子网络；以及上述采样单元401按照如下方式构建样本数据：基于当前采样操作中采样出的子网络和对应的性能信息、以及上一次采样操作中采样出的子网络和对应的性能信息构建样本数据。In some embodiments, the sampling unit 401 samples the sub-networks from the super-network that has been trained in the following manner: sampling the sub-networks that have not been sampled from the super-network that has been trained; and the sampling unit 401 according to The sample data is constructed as follows: the sample data is constructed based on the sub-network and corresponding performance information sampled in the current sampling operation, and the sub-network and corresponding performance information sampled in the previous sampling operation.

在一些实施例中，上述装置还包括：搜索单元，被配置为基于训练完成的预测模型对预设的模型结构搜索空间内的模型结构的性能预测结果，以及预设的深度学习任务场景的性能约束条件，在模型结构搜索空间中搜索出满足性能约束条件的神经网络模型结构。In some embodiments, the above apparatus further includes: a search unit configured to predict the performance of the model structure in the preset model structure search space based on the trained prediction model, and the performance of the preset deep learning task scenario Constraints, the neural network model structure that satisfies the performance constraints is searched in the model structure search space.

上述装置400中的采样单元401与参考图2和图3描述的方法中的步骤相对应。由此，上文针对用于训练预测模型的方法描述的操作、特征及所能达到的技术效果同样适用于装置400及其中包含的单元，在此不再赘述。The sampling unit 401 in the above apparatus 400 corresponds to the steps in the method described with reference to FIG. 2 and FIG. 3 . Therefore, the operations, features, and technical effects that can be achieved as described above with respect to the method for training a prediction model are also applicable to the apparatus 400 and the units included therein, and will not be repeated here.

下面参考图5，其示出了适于用来实现本公开的实施例的电子设备(例如图1所示的服务器)500的结构示意图。图5示出的电子设备仅仅是一个示例，不应对本公开的实施例的功能和使用范围带来任何限制。Referring next to FIG. 5 , it shows a schematic structural diagram of an electronic device (eg, the server shown in FIG. 1 ) 500 suitable for implementing embodiments of the present disclosure. The electronic device shown in FIG. 5 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.

如图5所示，电子设备500可以包括处理装置(例如中央处理器、图形处理器等)501，其可以根据存储在只读存储器(ROM)502中的程序或者从存储装置508加载到随机访问存储器(RAM)503中的程序而执行各种适当的动作和处理。在RAM 503中，还存储有电子设备500操作所需的各种程序和数据。处理装置501、ROM 502以及RAM 503通过总线504彼此相连。输入/输出(I/O)接口505也连接至总线504。As shown in FIG. 5 , an electronic device 500 may include a processing device (eg, a central processing unit, a graphics processor, etc.) 501 that may be loaded into random access according to a program stored in a read only memory (ROM) 502 or from a storage device 508 Various appropriate actions and processes are executed by the programs in the memory (RAM) 503 . In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501 , the ROM 502 , and the RAM 503 are connected to each other through a bus 504 . An input/output (I/O) interface 505 is also connected to bus 504 .

通常，以下装置可以连接至I/O接口505：包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置506；包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置507；包括例如硬盘等的存储装置508；以及通信装置509。通信装置509可以允许电子设备500与其他设备进行无线或有线通信以交换数据。虽然图5示出了具有各种装置的电子设备500，但是应理解的是，并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。图5中示出的每个方框可以代表一个装置，也可以根据需要代表多个装置。Typically, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; including, for example, a liquid crystal display (LCD), speakers, vibration An output device 507 such as a computer; a storage device 508 including, for example, a hard disk; and a communication device 509 . Communication means 509 may allow electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 5 shows electronic device 500 having various means, it should be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 5 can represent one device, and can also represent multiple devices as required.

特别地，根据本公开的实施例，上文参考流程图描述的过程可以被实现为计算机软件程序。例如，本公开的实施例包括一种计算机程序产品，其包括承载在计算机可读介质上的计算机程序，该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中，该计算机程序可以通过通信装置509从网络上被下载和安装，或者从存储装置508被安装，或者从ROM 502被安装。在该计算机程序被处理装置501执行时，执行本公开的实施例的方法中限定的上述功能。需要说明的是，本公开的实施例所描述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于：具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开的实施例中，计算机可读存储介质可以是任何包含或存储程序的有形介质，该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开的实施例中，计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号，其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式，包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质，该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输，包括但不限于：电线、光缆、RF(射频)等等，或者上述的任意合适的组合。In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication device 509 , or from the storage device 508 , or from the ROM 502 . When the computer program is executed by the processing device 501, the above-described functions defined in the methods of the embodiments of the present disclosure are performed. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two. The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. Rather, in embodiments of the present disclosure, a computer-readable signal medium may include a data signal in baseband or propagated as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device . Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.

上述计算机可读介质可以是上述电子设备中所包含的；也可以是单独存在，而未装配入该电子设备中。上述计算机可读介质承载有一个或者多个程序，当上述一个或者多个程序被该电子设备执行时，使得该电子设备：通过采样操作训练预测模型；采样操作包括：从已训练完成的超网络中采样出子网络，并对采样出的子网络进行训练，得到训练完成的子网络的性能信息；基于训练完成的子网络和对应的性能信息构建样本数据，并利用样本数据训练预测模型；响应于确定当前采样操作中训练得到的预测模型的精度不满足预设的条件，执行下一次采样操作，并在下一次采样操作中增加采样的子网络的数量，其中，预测模型用于预测神经网络结构的性能。The above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device. The above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: trains a prediction model through a sampling operation; the sampling operation includes: from the trained super network The sub-network is sampled from the middle of the network, and the sampled sub-network is trained to obtain the performance information of the trained sub-network; the sample data is constructed based on the trained sub-network and the corresponding performance information, and the sample data is used to train the prediction model; In order to determine that the accuracy of the prediction model trained in the current sampling operation does not meet the preset conditions, perform the next sampling operation, and increase the number of sampled sub-networks in the next sampling operation, wherein the prediction model is used to predict the neural network structure. performance.

可以以一种或多种程序设计语言或其组合来编写用于执行本公开的实施例的操作的计算机程序代码，程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++，还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中，远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算机，或者，可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages—such as Java, Smalltalk, C++, and also A conventional procedural programming language - such as the "C" language or similar programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (eg, using an Internet service provider to via Internet connection).

附图中的流程图和框图，图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分，该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在有些作为替换的实现中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个接连地表示的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合，可以用执行规定的功能或操作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more logical functions for implementing the specified functions executable instructions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.

描述于本公开的实施例中所涉及到的单元可以通过软件的方式实现，也可以通过硬件的方式来实现。所描述的单元也可以设置在处理器中，例如，可以描述为：一种处理器包括采样单元。其中，这些单元的名称在某种情况下并不构成对该单元本身的限定，例如，采样单元还可以被描述为“通过采样操作训练预测模型的单元”。The units involved in the embodiments of the present disclosure may be implemented in software or hardware. The described unit may also be provided in the processor, for example, it may be described as: a processor includes a sampling unit. Among them, the names of these units do not constitute a limitation of the unit itself in some cases, for example, the sampling unit can also be described as "a unit for training a prediction model through sampling operations".

以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解，本公开中所涉及的发明范围，并不限于上述技术特征的特定组合而成的技术方案，同时也应涵盖在不脱离上述发明构思的情况下，由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本申请中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。The above description is merely a preferred embodiment of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to the technical solution formed by the specific combination of the above-mentioned technical features, and should also cover, without departing from the above-mentioned inventive concept, the above-mentioned technical features or Other technical solutions formed by any combination of its equivalent features. For example, a technical solution is formed by replacing the above-mentioned features with the technical features disclosed in this application (but not limited to) with similar functions.

Claims

1. A method for training a predictive model for predicting the performance of a neural network structure, the method comprising training the predictive model by a sampling operation;

the sampling operation comprises:

sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network;

constructing sample data based on the trained sub-network and the corresponding performance information, and training the prediction model by using the sample data;

and in response to determining that the precision of the prediction model trained in the current sampling operation does not meet the preset condition, executing the next sampling operation, and increasing the number of sub-networks for sampling in the next sampling operation.

2. The method of claim 1, wherein said sampling a subnetwork from a trained super-network comprises:

sampling a sub-network from the trained super-network by adopting an initial recurrent neural network; and

before training the sampled subnetwork, the sampling operation further comprises:

generating feedback information based on the performance information of the trained sub-network to iteratively update the recurrent neural network based on the feedback information;

and re-sampling the sub-networks from the trained super-network based on the iteratively updated recurrent neural network.

3. The method of claim 1, wherein said sampling a subnetwork from a trained super-network comprises:

sampling a sub-network which is not sampled from the trained super-network; and

the constructing of sample data based on the trained sub-networks and corresponding performance information comprises:

and constructing sample data based on the sub-networks and the corresponding performance information sampled in the current sampling operation and the sub-networks and the corresponding performance information sampled in the last sampling operation.

4. The method of claim 1, wherein the sampling operation further comprises:

and generating a trained prediction model based on a training result of the current sampling operation in response to determining that the precision of the prediction model meets a preset condition.

5. The method of any of claims 1-4, wherein the method further comprises:

and searching a neural network model structure meeting the performance constraint condition in the model structure search space based on the performance prediction result of the trained prediction model on the model structure in the preset model structure search space and the performance constraint condition of the preset deep learning task scene.

6. An apparatus for training a prediction model for predicting performance of a neural network structure, the apparatus comprising a sampling unit configured to train the prediction model by a sampling operation;

the sampling operation performed by the sampling unit comprises:

7. The apparatus of claim 6, wherein the sampling unit samples a subnetwork from a trained super-network as follows:

before training the sampled sub-network, the sampling operation performed by the sampling unit further includes:

8. The apparatus of claim 6, wherein the sampling unit samples a subnetwork from a trained super-network as follows:

sampling a sub-network which is not sampled from the trained super-network; and

the sampling unit constructs sample data according to the following mode:

9. The apparatus of claim 6, wherein the sampling operation further comprises:

10. The apparatus of any of claims 6-9, wherein the apparatus further comprises:

and the searching unit is configured to search out a neural network model structure meeting the performance constraint condition in the model structure searching space based on the performance prediction result of the trained prediction model on the model structure in the preset model structure searching space and the performance constraint condition of the preset deep learning task scene.

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.