CN109255020B

CN109255020B - Method for solving dialogue generation task by using convolution dialogue generation model

Info

Publication number: CN109255020B
Application number: CN201811057115.9A
Authority: CN
Inventors: 赵洲; 章璇; 孟令涛; 梁伟欣; 金志华
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2018-09-11
Filing date: 2018-09-11
Publication date: 2022-04-01
Anticipated expiration: 2038-09-11
Also published as: CN109255020A

Abstract

The invention discloses a method for solving a dialogue generation task by using a convolution dialogue generation model. Add, get the comprehensive expression vector of the word; input to the coding network that combines the convolutional layer and the gated linear unit to obtain the comprehensive expression above; convert the last word above into the meaning vector of the last word, and combine the last word The position vector of the word, the two are added together to obtain the comprehensive expression of the last word; the input is input to the coding network that combines the convolutional layer and the gated linear unit, and the above comprehensive expression is combined to obtain the expression of the next word to be generated. The present invention utilizes the convolutional dialogue generation model, and can overcome the problems that the use of the cyclic neural network in the prior art leads to the inability to utilize the parallel characteristics of the GPU, and the cyclic neural network causes the gradient to disappear.

Description

A method for solving dialogue generation tasks using convolutional dialogue generation models

技术领域technical field

本发明涉及对话生成任务技术领域，具体涉及一种利用卷积对话生成模型解决对话生成任务的方法。The invention relates to the technical field of dialogue generation tasks, in particular to a method for solving dialogue generation tasks by using a convolution dialogue generation model.

背景技术Background technique

当下，非任务导向的对话生成已经引起了广泛关注，成为一项重要的服务，但是目前已有该项服务的效果并不是很好。At present, non-task-oriented dialogue generation has attracted extensive attention and has become an important service, but the effect of the existing service is not very good.

现有的技术主要是使用循环神经网络为基础来做，该方法主要是通过利用循环神经网络所拥有的时序含义，来完成对话的生成。但是循环神经网络由于涉及到时序，所以无法使用到GPU(Graphics Processing Unit,图形处理器)的并行特点。同时，由于循环神经网络的链式求导，导致循环神经网络倾向于出现梯度消失现象。为了克服这些缺陷，本方法将使用卷积对话生成模型完成对话生成任务。The existing technology is mainly based on the recurrent neural network, and the method mainly uses the time sequence meaning possessed by the recurrent neural network to complete the dialogue generation. However, the cyclic neural network cannot use the parallel characteristics of the GPU (Graphics Processing Unit, graphics processor) because it involves timing. At the same time, due to the chain derivation of the recurrent neural network, the recurrent neural network tends to have a gradient disappearance phenomenon. To overcome these shortcomings, this method will use a convolutional dialogue generation model to complete the dialogue generation task.

本发明将首先利用带有注意力机制模块的卷积神经网络获取当前对话上下文的表达，之后将这个表达输入到解码模块中，获取所需对话的下一个字，依次进行，生成整个对话。The present invention will first use a convolutional neural network with an attention mechanism module to obtain the expression of the current dialogue context, and then input this expression into the decoding module to obtain the next word of the required dialogue, and proceed sequentially to generate the entire dialogue.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于解决现有技术中的问题，为了克服现有技术中使用循环神经网络导致无法利用GPU并行特点，且循环神经网络会导致梯度消失的问题，本发明提供一种利用卷积对话生成模型解决对话生成任务的方法。The purpose of the present invention is to solve the problems in the prior art. In order to overcome the problems in the prior art that the use of a cyclic neural network leads to the inability to utilize the parallel characteristics of the GPU, and the cyclic neural network will cause the gradient to disappear, the present invention provides a method using convolutional dialogue. A generative model approach to solving the task of dialogue generation.

本发明所采用的具体技术方案是：The concrete technical scheme adopted in the present invention is:

一种利用卷积对话生成模型解决对话生成任务的方法，包括如下步骤：A method for solving a dialogue generation task using a convolutional dialogue generation model, comprising the following steps:

1)针对于所要生成的对话的下一个词的上文(context)，将上文进行单词映射成相应的含义向量(获取上文的单词表达)，并获得单词的位置向量，之后将得到的单词的含义向量与单词的位置向量相加，获取单词的综合表达向量；1) For the context of the next word of the dialogue to be generated, map the above word into the corresponding meaning vector (obtain the above word expression), and obtain the position vector of the word, and then obtain the The meaning vector of the word is added to the position vector of the word to obtain the comprehensive expression vector of the word;

将获取的单词的综合表达向量输入到结合了卷积层与门式线性单元结合的编码网络，获取上文的综合表达；Input the obtained comprehensive expression vector of the word into the coding network that combines the convolutional layer and the gated linear unit to obtain the above comprehensive expression;

2)将所要生成的对话的下一个词的上文最后一个单词(最后一次生成的单词，简称最后单词)转换成最后单词的含义向量(获取最后单词的表达)，并结合最后单词的位置向量，两者相加获取最后单词的综合表达；2) Convert the last word above the next word of the dialogue to be generated (the last generated word, referred to as the last word) into the meaning vector of the last word (obtain the expression of the last word), and combine the position vector of the last word , the two are added together to obtain the comprehensive expression of the last word;

将最后单词的综合表达输入到结合了卷积层与门式线性单元结合的编码网络，并结合步骤1)获得的上文的综合表达，获取下一个要生成单词的表达(利用该表达获取下一个要生成的单词)；Input the comprehensive expression of the last word into the coding network that combines the convolutional layer and the gated linear unit, and combine the above comprehensive expression obtained in step 1) to obtain the expression of the next word to be generated (use this expression to obtain the next word. a word to generate);

3)经过训练，得到最终的卷积对话生成模型，利用该模型可以生成所需的上下文对话。3) After training, the final convolutional dialogue generation model is obtained, which can be used to generate the required contextual dialogue.

步骤1)中，所述的单词的含义向量为w_c＝{w_c1,...,w_cn}，w_c为第c个单词的含义向量，w_c1为第c个单词的含义向量第1维数值，w_cn为第c个单词的含义向量第n维数值；In step 1), the meaning vector of the word is w _c ={w _c1 ,...,w _cn }, w _c is the meaning vector of the c-th word, and w _c1 is the meaning vector of the c-th word. 1-dimensional value, _wcn is the nth-dimensional value of the meaning vector of the c-th word;

所述的单词的位置向量为p_c＝{p_c1,...,p_cn}，p_c为第c个单词的位置向量，p_c1为第c个单词的位置向量第1维数值，p_cn为第c个单词的位置向量第n维数值；The position vector of the word is p _c ={p _c1 ,...,p _cn }, p _c is the position vector of the cth word, p _c1 is the first dimension value of the position vector of the cth word, p _cn is the nth dimension value of the position vector of the cth word;

所述的单词的综合表达向量o_c＝{o_c1,...,o_cn}，o_c为第c个单词的综合表达向量，o_c1为第c个单词的综合表达向量第1维数值，o_cn为第c个单词的综合表达向量第n维数值。The comprehensive expression vector of the word o _c ={o _c1 ,...,o _cn }, o _c is the comprehensive expression vector of the c-th word, and o _c1 is the first-dimensional value of the comprehensive expression vector of the c-th word , o _cn is the nth dimension value of the comprehensive expression vector of the cth word.

步骤1)中，将获取的单词的综合表达向量输入到结合了卷积层与门式线性单元结合的编码网络，获取上文的综合表达，具体包括：In step 1), the comprehensive expression vector of the acquired word is input into the coding network combining the convolutional layer and the gated linear unit to obtain the comprehensive expression above, specifically including:

1.1)将单词的综合表达向量o_c＝{o_c1,...,o_cn}循环输入到m个卷积模块中，利用这m个卷积模块获得上文的综合表达向量q^m；m个卷积模块中每个卷积模块都由一个卷积计算操作与一个非线性计算操作组成，卷积计算操作会按照如下公式生成两列d维向量Y＝[A,B]∈R^2d，1.1) The comprehensive expression vector o _c ={o _c1 ,...,o _cn } of the word is cyclically input into m convolution modules, and the m convolution modules are utilized to obtain the above comprehensive expression vector q ^m ; m Each convolution module in the convolution modules consists of a convolution calculation operation and a nonlinear calculation operation. The convolution calculation operation will generate two columns of d-dimensional vectors Y=[A,B]∈R ^2d according to the following formula,

Y＝f_conv(X)＝W^mX+b^m Y=f _conv (X)=W ^m X+b ^m

其中，A为第一列d维向量，B为第二列d维向量，R^2d为2d维度所有向量集合，f_conv(X)代表卷积操作，X代表卷积计算操作的输入映射表达向量，W^m代表第m个卷积计算操作中的权重矩阵，b^m代表第m个卷积计算操作中的偏置值向量；Among them, A is the d-dimensional vector of the first column, B is the d-dimensional vector of the second column, R ^2d is the set of all 2d-dimensional vectors, f _conv (X) represents the convolution operation, and X represents the input map expression vector of the convolution calculation operation , W ^m represents the weight matrix in the mth convolution calculation operation, b ^m represents the bias value vector in the mth convolution calculation operation;

通过计算得到两列d维向量Y＝[A,B]∈R^2d；Two columns of d-dimensional vectors Y=[A,B]∈R ^2d are obtained by calculation;

1.2)非线性计算操作会利用步骤1.1)卷积操作生成的输出Y＝[A,B]∈R^2d中的第二列d维向量B，结合门操作函数δ(B)，获取控制网络中信息流动量的输出g＝δ(B)，该输出将传递到下一个神经元；将卷积操作生成的输出Y＝[A,B]∈R^2d中的第一列d维向量A，结合生成的控制网络中信息流动量的输出g＝δ(B)，按照如下公式获取编码器的卷积模块输出，1.2) The nonlinear calculation operation will use the second column d-dimensional vector B in the output Y=[A,B]∈R ^2d generated by the convolution operation in step 1.1), combined with the gate operation function δ(B), to obtain the control network. The output of information flow g = δ(B), which will be passed to the next neuron; the output Y = [A, B] ∈ R ^2d generated by the convolution operation is the first column of d-dimensional vector A in R 2d, combined with The output of the information flow in the generated control network is g=δ(B), and the output of the convolution module of the encoder is obtained according to the following formula,

其中，

代表第m个编码器卷积模块的输出的第i维值，f_conv(.)代表卷积操作，

代表第m-1个编码器卷积模块的输出的第(i-k/2)到第(i+k/2)维，k为定义好的一个参数(例如可以定3、5、7等)，

代表第m-1个编码器卷积模块的输出的第i维值；in,

represents the i-th dimension value of the output of the mth encoder convolution module, f _conv (.) represents the convolution operation,

Represents the (ik/2)th to (i+k/2)th dimension of the output of the m-1th encoder convolution module, k is a defined parameter (for example, it can be set to 3, 5, 7, etc.),

represents the i-th dimension value of the output of the m-1th encoder convolution module;

经过m个卷积模块的连续操作，可以获得上文的综合表达q^m。After successive operations of m convolution modules, the above comprehensive expression q ^m can be obtained.

步骤2)中，所述的最后单词的含义向量为w_w＝{w_w1,...,w_wn}，w_w为最后单词的含义向量，w_w为最后单词的含义向量第1维数值，w_wn为最后单词的含义向量第n维数值；In step 2), the meaning vector of the last word is w _w ={w _w1 ,...,w _wn }, w _w is the meaning vector of the last word, and w _w is the first dimension value of the meaning vector of the last word. , w _wn is the nth dimension value of the meaning vector of the last word;

所述的最后单词的位置向量为p_w＝{p_w1,...,p_wn}，p_w为最后单词的位置向量，p_w1为最后单词的位置向量第1维数值，p_wn为最后单词的的位置向量第n维数值；The position vector of the last word is p _w ={p _w1 ,...,p _wn }, p _w is the position vector of the last word, p _w1 is the first dimension value of the position vector of the last word, and p _wn is the last word. The nth dimension value of the position vector of the word;

所述的最后单词的综合表达为o_w＝{o_w1,...,o_wn}，o_w为最后单词的综合表达向量，o_w1为最后单词的综合表达向量第1维数值，o_wn为最后单词的综合表达向量第n维数值。The comprehensive expression of the last word is o _w ={o _w1 ,...,o _wn }, o _w is the comprehensive expression vector of the last word, o _w1 is the first dimension value of the comprehensive expression vector of the last word, o _wn It is the nth dimension value of the comprehensive expression vector of the last word.

将最后单词的综合表达输入到结合了卷积层与门式线性单元结合的编码网络，并结合步骤1)获得的上文的综合表达，获取下一个要生成单词的表达(利用该表达获取下一个要生成的单词)，具体包括：Input the comprehensive expression of the last word into the coding network that combines the convolutional layer and the gated linear unit, and combine the above comprehensive expression obtained in step 1) to obtain the expression of the next word to be generated (use this expression to obtain the next word. a word to generate), including:

2.1)将最后单词的综合表达o_w＝{o_w1,...,o_wn}循环输入到与编码器中相同的m个卷积模块中，利用这m个卷积模块获得下一个要生成的单词的预测表达r^m；每个卷积模块都由一个卷积计算操作与一个非线性计算操作组成，卷积计算操作会按照如下公式生成两列d维向量Y＝[A,B]∈R^2d，2.1) Input the comprehensive expression of the last word _{ow ={o w1} _, ...,o _wn } cyclically into the same m convolution modules as in the encoder, and use these m convolution modules to obtain the next generation to be generated. The predicted expression r ^m of the word of R ^2d ,

Y＝f_conv(X)＝W^mX+b^m Y=f _conv (X)=W ^m X+b ^m

2.2)非线性计算操作会利用步骤2.1)卷积操作生成的输出Y＝[A,B]∈R^2d中的第二列d维向量B，结合门操作函数δ(B)，获取控制网络中信息流动量的输出g＝δ(B)，该输出将传递到下一个神经元；2.2) The nonlinear calculation operation will use the second column d-dimensional vector B in the output Y=[A,B]∈R ^2d generated by the convolution operation in step 2.1), combined with the gate operation function δ(B), to obtain the control network. The output of information flow g = δ(B), this output will be passed to the next neuron;

将卷积操作生成的输出Y＝[A,B]∈R^2d中的第一列d维向量A，结合生成的控制网络中信息流动量的输出g＝δ(B)，按照如下公式获取编码器的卷积模块输出；The first column d-dimensional vector A in the output Y=[A,B]∈R ^2d generated by the convolution operation is combined with the output g=δ(B) of the information flow in the generated control network, and the code is obtained according to the following formula The output of the convolution module of the device;

其中r_i ^m代表第m个编码器卷积模块的输出的第i维值，f_conv(.)代表卷积操作，

代表第m-1个编码器卷积模块的输出的第(i-k/2)到第(i+k/2)维，k为定义好的一个参数(例如可以定3、5、7等)，r_i ^m-1代表第m-1个编码器卷积模块的输出的第i维值；where r _i ^m represents the i-th dimension value of the output of the m-th encoder convolution module, f _conv (.) represents the convolution operation,

Represents the (ik/2)th to (i+k/2)th dimension of the output of the m-1th encoder convolution module, k is a defined parameter (for example, it can be set to 3, 5, 7, etc.), r _i ^m-1 represents the i-th dimension value of the output of the m-1th encoder convolution module;

2.3)利用如下公式，结合解码器第m个卷积模块的输出的第i维值r_i ^m，获取该解码器卷积模块对应注意力机制输出的第i维值

2.3) Using the following formula, combined with the i-th dimension value r _i ^m of the output of the m-th convolution module of the decoder, obtain the i-th dimension value output by the decoder convolution module corresponding to the attention mechanism

其中，

代表权重矩阵，

代表偏置向量，g_i代表参数系数(g_i可人为设定)；in,

represents the weight matrix,

represents the bias vector, and _gi represents the parameter coefficient ( _gi can be set manually);

之后利用如下公式可以获取对应于解码器第m个卷积模块的对应注意力机制输出的第i维值

结合编码器第m个卷积模块输出中的第j维值

为步骤1)中综合表达向量q^m第j维数值，获得对应的激活参数

After that, the i-th dimension value corresponding to the output of the corresponding attention mechanism of the m-th convolution module of the decoder can be obtained by using the following formula

Combine the jth dimension values in the output of the mth convolutional block of the encoder

is the jth dimension value of the comprehensive expression vector q ^m in step 1) to obtain the corresponding activation parameter

之后结合编码器整体输出的第j维值

结合编码器步骤1)中单词的综合表达向量o_c＝{o_c1,...,o_cn}的第j维值o_cj，o_cj为第c个单词的综合表达向量第j维数值，获取解码器第m个卷积模块输出的第i维值激活部分添加项

Then combine the jth dimension value of the overall output of the encoder

Combined with the jth dimension value o _cj of the comprehensive expression vector o _c ={o _c1 ,...,o _cn } of the word in step 1) of the encoder, o _cj is the jth dimension value of the comprehensive expression vector of the c th word, Get the addition of the activation part of the i-th dimension value of the output of the m-th convolution module of the decoder

将生成的解码器第m个卷积模块输出的第i维值激活部分添加项

与解码器第m个卷积模块输出的第i维值r_i ^m相加，经过m个卷积模块的循环处理，获得最终的解码器输出r^m；Add an item to the activation part of the i-th dimension value output by the m-th convolution module of the generated decoder

Add the i-th dimension value r _i ^m output by the m-th convolution module of the decoder, and obtain the final decoder output r ^m through the cyclic processing of the m-th convolution module;

2.3)通过将解码器的输出r^m，输入到softmax函数中，按照如下公式获取将要生成的下一个单词的概率，2.3) By inputting the output rm of the decoder into the ^softmax function, the probability of the next word to be generated is obtained according to the following formula,

p(y_i+1|y₁,...,y_i)＝softmax(W_or^m+b_o)p(y _i+1 |y ₁ ,...,y _i )=softmax(W _o r ^m +b _o )

其中，W_o代表权重矩阵，b_o代表偏置向量，softmax(.)代表softmax函数，利用该概率输出，找到最大的概率对应的单词作为生成的对话下一个单词输出。p(y_i+1|y₁,...,y_i)为下一个单词的概率，y_i+1|y₁,...,y_i中，y_i+1表示第i+1个单词，y₁为表示第1个单词，y_i表示第i个单词。Among them, W _o represents the weight matrix, b _o represents the bias vector, and softmax(.) represents the softmax function. Using the probability output, find the word corresponding to the maximum probability as the output of the next word in the generated dialogue. p(y _i+1 |y ₁ ,...,y _i ) is the probability of the next word, in y _i+1 |y ₁ ,...,y _i , y _i+1 represents the i+1th word word, y ₁ means the first word, and y _i means the i-th word.

与现有技术相比，本发明具有如下优点：Compared with the prior art, the present invention has the following advantages:

本发明利用卷积对话生成模型解决对话生成任务的方法，相比于一般的对话生成解决方案，本发明利用了卷积对话生成模型，能够克服现有技术中使用循环神经网络导致无法利用GPU并行特点，且循环神经网络会导致梯度消失的问题。本发明在对话生成任务中所取得的效果相比于传统的方法更好。The present invention uses the convolutional dialogue generation model to solve the dialogue generation task. Compared with the general dialogue generation solution, the present invention uses the convolutional dialogue generation model, which can overcome the inability to use GPU parallelism caused by the use of cyclic neural networks in the prior art. characteristics, and the recurrent neural network will cause the problem of gradient disappearance. Compared with the traditional method, the effect achieved by the present invention in the dialogue generation task is better.

附图说明Description of drawings

图1为本发明利用卷积对话生成模型解决对话生成任务的方法的流程示意图。FIG. 1 is a schematic flowchart of a method for solving a dialogue generation task by using a convolution dialogue generation model according to the present invention.

具体实施方式Detailed ways

如图1所示，一种利用卷积对话生成模型解决对话生成任务的方法，包括如下步骤：As shown in Figure 1, a method for solving a dialogue generation task using a convolutional dialogue generation model includes the following steps:

1)针对于所要生成的对话的下一个词的上文(context)，将上文进行单词映射成相应的含义向量(获取上文的单词表达)，并获得单词的位置向量，之后将得到的单词的含义向量与单词的位置向量表达相加，获取单词的综合表达向量；1) For the context of the next word of the dialogue to be generated, map the above word into the corresponding meaning vector (obtain the above word expression), and obtain the position vector of the word, and then obtain the The meaning vector of the word and the position vector expression of the word are added to obtain the comprehensive expression vector of the word;

步骤1)中，单词的含义向量为w_c＝{w_c1,...,w_cn}，w_c为第c个单词的含义向量，w_c1为第c个单词的含义向量第1维数值，w_cn为第c个单词的含义向量第n维数值；In step 1), the meaning vector of the word is w _c ={w _c1 ,...,w _cn }, w _c is the meaning vector of the c-th word, and w _c1 is the first dimension value of the meaning vector of the c-th word. , _wcn is the nth dimension value of the meaning vector of the cth word;

单词的位置向量为p_c＝{p_c1,...,p_cn}，p_c为第c个单词的位置向量，p_c1为第c个单词的位置向量第1维数值，p_cn为第c个单词的位置向量第n维数值；The position vector of the word is p _c ={p _c1 ,...,p _cn }, p _c is the position vector of the c-th word, p _c1 is the first dimension value of the position vector of the c-th word, and p _cn is the th The nth dimension value of the position vector of the c words;

单词的综合表达向量o_c＝{o_c1,...,o_cn}，o_c为第c个单词的综合表达向量，o_c1为第c个单词的综合表达向量第1维数值，o_cn为第c个单词的综合表达向量第n维数值。The comprehensive expression vector of words o _c ={o _c1 ,...,o _cn }, o _c is the comprehensive expression vector of the cth word, o _c1 is the first dimension value of the comprehensive expression vector of the cth word, o _cn is the nth dimension value of the comprehensive expression vector of the cth word.

将获取的单词的综合表达向量输入到结合了卷积层与门式线性单元结合的编码网络，获取上文的综合表达，具体包括：Input the obtained comprehensive expression vector of the word into the coding network that combines the convolutional layer and the gated linear unit to obtain the above comprehensive expression, including:

Y＝f_conv(X)＝W^mX+b^m Y=f _conv (X)=W ^m X+b ^m

其中，

代表第m-1个编码器卷积模块的输出的第i维值；in,

2)通过所要生成的对话的下一个词的上文最后一个单词(最后一次生成的单词，简称最后单词)转换成最后单词的含义向量(获取最后单词的表达)，并结合最后单词的位置向量，两者相加获取最后单词的综合表达；2) Convert the last word above the next word of the dialogue to be generated (the last generated word, referred to as the last word) into the meaning vector of the last word (obtain the expression of the last word), and combine the position vector of the last word , the two are added together to obtain the comprehensive expression of the last word;

步骤2)中，最后单词的含义向量为w_w＝{w_w1,...,w_wn}，w_w为最后单词的含义向量，w_w为最后单词的含义向量第1维数值，w_wn为最后单词的含义向量第n维数值；In step 2), the meaning vector of the last word is w _w ={w _w1 ,...,w _wn }, w _w is the meaning vector of the last word, w _w is the first dimension value of the meaning vector of the last word, w _wn is the nth dimension value of the meaning vector of the last word;

最后单词的位置向量为p_w＝{p_w1,...,p_wn}，p_w为最后单词的位置向量，p_w1为最后单词的位置向量第1维数值，p_wn为最后单词的的位置向量第n维数值；The position vector of the last word is p _w ={p _w1 ,...,p _wn }, p _w is the position vector of the last word, p _w1 is the first dimension value of the position vector of the last word, and p _wn is the value of the last word The nth dimension value of the position vector;

最后单词的综合表达为o_w＝{o_w1,...,o_wn}，o_w为最后单词的综合表达向量，o_w1为最后单词的综合表达向量第1维数值，o_wn为最后单词的综合表达向量第n维数值。The comprehensive expression of the last word is o _w ={o _w1 ,...,o _wn }, o _w is the comprehensive expression vector of the last word, o _w1 is the first dimension value of the comprehensive expression vector of the last word, and o _wn is the last word The composite expression vector of nth dimension values.

Y＝f_conv(X)＝W^mX+b^m Y=f _conv (X)=W ^m X+b ^m

其中，

代表权重矩阵，

代表偏置向量，g_i代表参数系数(g_i可人为设定)；in,

represents the weight matrix,

结合编码器第m个卷积模块输出中的第j维值

为步骤1)中综合表达向量q^m第j维数值，获得对应的激活参数

之后结合编码器整体输出的第j维值

Then combine the jth dimension value of the overall output of the encoder

下面将上述方法应用于下列实施例中，以体现本发明的技术效果，实施例中具体步骤不再赘述。The above method is applied in the following embodiments to reflect the technical effect of the present invention, and the specific steps in the embodiments will not be repeated.

实施例Example

本发明在DailyDialog数据集上进行实验。为了客观地评价本发明的算法的性能，本发明在所选出的测试集中，使用了Average，Greedy，Extrema，Training Time这四种评价标准来对于本发明的效果进行评价。按照具体实施方式中描述的步骤，所得的实验结果如表1所示，本发明针对于Average，Greedy，Extrema，Training Time四种标准的测试结果，本方法表示为ConvTalker。The present invention conducts experiments on the DailyDialog dataset. In order to objectively evaluate the performance of the algorithm of the present invention, the present invention uses four evaluation criteria of Average, Greedy, Extrema, and Training Time in the selected test set to evaluate the effect of the present invention. According to the steps described in the specific embodiment, the obtained experimental results are shown in Table 1. The present invention is directed to the test results of four standards of Average, Greedy, Extrema, and Training Time, and this method is expressed as ConvTalker.

表1Table 1

Claims

1. A method for solving a dialog generation task using a convolutional dialog generation model, comprising the steps of:

1) aiming at the upper text of the next word of the dialog to be generated, mapping the word of the upper text into a corresponding meaning vector, obtaining a position vector of the word, and then adding the obtained meaning vector of the word and the position vector of the word to obtain a comprehensive expression vector of the word;

inputting the obtained comprehensive expression vector of the word into a coding network combined with a convolutional layer and a gate linear unit to obtain the comprehensive expression, specifically comprising:

1.1) comprehensive expression vector o of words_c＝{o_c1,...,o_cnCircularly inputting the data into m convolution modules, and obtaining the comprehensive expression vector q by using the m convolution modules^m(ii) a Each convolution module in the m convolution modules consists of a convolution calculation operation and a nonlinear calculation operation, and the convolution calculation operation generates two columns of d-dimensional vectors Y ═ A, B according to the following formula]∈R^2d，

Y＝f_conv(X)＝W^mX+b^m

Wherein A is a first row d-dimensional vector, B is a second row d-dimensional vector, R^2dSet of all vectors for 2d dimension, f_conv(X) represents convolution operation, X represents input mapping expression vector of convolution calculation operation, W^mRepresenting the weight matrix in the mth convolution calculation operation, b^mRepresenting a vector of bias values in the mth convolution calculation operation;

two columns of d-dimensional vectors Y ═ A, B are obtained through calculation]∈R^2d；

1.2) nonlinear computation operation the output Y ═ a, B generated using the convolution operation of step 1.1) is]∈R^2dThe second column of d-dimensional vectors B, combined with the gate operation function δ (B), obtains the output g ═ δ (B) of the information flow in the control network, which will be passed to the next neuron;

the output Y generated by the convolution operation is ═ A, B]∈R^2dThe first column d-dimensional vector A in the control network is combined with the generated output g of the information flow quantity in the control network to be delta (B), the convolution module output of the encoder is obtained according to the following formula,

wherein,

representing the i-dimensional value, f, of the output of the m-th encoder convolution module_conv(.) represents a convolution operation,

the (i-k/2) th to (i + k/2) th dimensions of the (m-1) th encoder convolution module are represented, k is a defined parameter, and k is 3, 5 and 7，

An ith dimension value representing the output of the (m-1) th encoder convolution module;

through the continuous operation of m convolution modules, the above comprehensive expression q can be obtained^m；

The meaning vector of the word is w_c＝{w_c1,...,w_cn}，w_cIs the meaning vector of the c-th word, w_c1Is the 1 st dimension value, w, of the meaning vector of the c word_cnIs the meaning vector of the c word to the nth dimension value;

the position vector of the word is p_c＝{p_c1,...,p_cn}，p_cIs the position vector of the c-th word, p_c1Is the position vector of the c-th word, the 1 st dimension value, p_cnIs the position vector nth dimension value of the c word;

the comprehensive expression vector o of the words_c＝{o_c1,...,o_cn}，o_cIs a comprehensive expression vector of the c-th word, o_c1Is the 1 st dimension value, o, of the comprehensive expression vector of the c word_cnThe nth dimension value is the comprehensive expression vector of the c word;

2) converting the last word in the previous text of the next word of the dialog to be generated into a meaning vector of the last word, and adding the last word and the position vector of the last word to obtain the comprehensive expression of the last word;

inputting the comprehensive expression of the last word into a coding network combined with the convolutional layer and the gate linear unit, and combining the comprehensive expression obtained in the step 1) to obtain the expression of the next word to be generated, which specifically comprises the following steps:

2.1) expressing the last word in general o_w＝{o_w1,...,o_wnThe predicted expression r of the next word to be generated is obtained by the m convolution modules^m(ii) a Each convolution module consists of a convolution calculation operation and a non-linear calculation operation,the convolution operation generates two columns of d-dimensional vectors Y ═ a, B according to the following formula]∈R^2d，

Y＝f_conv(X)＝W^mX+b^m

2.2) nonlinear computation operation the output Y generated by the convolution operation of step 2.1) is ═ a, B]∈R^2dThe second column of d-dimensional vectors B, combined with the gate operation function δ (B), obtains the output g ═ δ (B) of the information flow in the control network, which will be passed to the next neuron;

the output Y generated by the convolution operation is ═ A, B]∈R^2dCombining the generated output g of the information flow quantity in the control network with the d-dimensional vector A in the first column, and obtaining the convolution module output of the encoder according to the following formula;

wherein r is_i ^mRepresenting the i-dimensional value, f, of the output of the m-th encoder convolution module_conv(.) represents a convolution operation,

(i-k/2) to (i + k/2) dimensions representing the output of the (m-1) th encoder convolution module, k being a well-defined parameter, r_i ^m-1An ith dimension value representing the output of the (m-1) th encoder convolution module;

2.3) combining the i-dimensional value r of the output of the m-th convolution module of the decoder using the following formula_i ^mObtaining the decoder volumeThe product module corresponds to the ith dimension value output by the attention mechanism

Wherein,

a matrix of the weights is represented by,

representing an offset vector, g_iRepresenting a parameter coefficient;

the ith dimension value corresponding to the attention mechanism output of the mth convolution module of the decoder can then be obtained using the following formula

Combining j-dimension values in the m-convolution module output of the encoder

For the comprehensive expression vector q in step 1)^mJ dimensional value to obtain corresponding activation parameter

Then combining j dimension value of integral output of coder

Combining the comprehensive expression vector o of the words in the encoder step 1)_c＝{o_c1,...,o_cnJ-th dimension value o of_cj，o_cjFor the comprehensive expression vector of the c word to the j dimension value, the i dimension value activation part addition item output by the m convolution module of the decoder is obtained

Adding the generated i-dimension value activation part output by the mth convolution module of the decoder

And the ith dimension value r output by the mth convolution module of the decoder_i ^mAdding the data, and circularly processing by m convolution modules to obtain final decoder output r^m；

2.4) by dividing the output r of the decoder^mInput into the softmax function, the probability of the next word to be generated is obtained according to the following formula,

p(y_i+1|y₁,...,y_i)＝softmax(W_or^m+b_o)

wherein, W_oRepresenting a weight matrix, b_oRepresenting a bias vector, and softmax (.) representing a softmax function, and using the probability output to find a word corresponding to the maximum probability as a next word of the generated dialogue to be output;

the meaning vector of the last word is w_w＝{w_w1,...,w_wn}，w_wIs the meaning vector of the last word, w_wIs the meaning vector of the last word, the 1 st dimensional numerical value, w_wnThe nth dimension value is the meaning vector of the last word;

the position vector of the last word is p_w＝{p_w1,...,p_wn}，p_wAs a position vector of the last word, p_w1Is the 1 st dimension value, p, of the position vector of the last word_wnThe nth dimension value is the position vector of the last word;

said final word is expressed as o_w＝{o_w1,...,o_wn}，o_wIs a comprehensive expression vector of the last word, o_w1For the final word's comprehensive expression vector, the 1 st dimension value, o_wnThe nth dimension value of the comprehensive expression vector of the last word;

3) and training to obtain a final convolution dialogue generation model, and generating the required context dialogue by using the model.