CN109255020B - Method for solving dialogue generation task by using convolution dialogue generation model - Google Patents
Method for solving dialogue generation task by using convolution dialogue generation model
- Publication number
- CN109255020B (granted publication of application CN201811057115.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- convolution
- output
- dimension value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Error Detection And Correction (AREA)
Abstract
The invention discloses a method for solving a dialogue generation task with a convolutional dialogue generation model, comprising the following steps: for the context preceding the next word of the dialogue to be generated, each word's meaning vector and position vector are obtained and added together to form the word's comprehensive expression vector; these vectors are fed into an encoding network that combines convolutional layers with gated linear units to obtain the comprehensive expression of the context; the last word of the context is likewise converted into its meaning vector, which is added to its position vector to give the last word's comprehensive expression; this is fed into an encoding network of the same structure and, combined with the comprehensive expression of the context, yields the expression of the next word to be generated. By using a convolutional dialogue generation model, the invention overcomes two drawbacks of the prior-art recurrent neural networks: their sequential nature prevents them from exploiting the parallelism of GPUs, and they are prone to vanishing gradients.
Description
Technical Field

The invention relates to the technical field of dialogue generation tasks, and in particular to a method for solving dialogue generation tasks by using a convolutional dialogue generation model.
Background Art

Non-task-oriented dialogue generation has attracted wide attention and become an important service, but existing implementations of this service still perform poorly.

The prior art is mainly based on recurrent neural networks, exploiting their sequential nature to generate dialogue. However, because a recurrent network processes tokens in sequence, it cannot exploit the parallelism of a GPU (Graphics Processing Unit). Moreover, the chain-rule differentiation through a recurrent network makes it prone to the vanishing-gradient phenomenon. To overcome these drawbacks, the present method uses a convolutional dialogue generation model to perform the dialogue generation task.

The invention first uses a convolutional neural network with an attention-mechanism module to obtain an expression of the current dialogue context; this expression is then fed into a decoding module to obtain the next word of the desired reply, and the process is repeated word by word to generate the whole dialogue.
Summary of the Invention

The purpose of the invention is to solve the problems of the prior art: the use of recurrent neural networks prevents exploitation of GPU parallelism and leads to vanishing gradients. The invention therefore provides a method for solving a dialogue generation task by using a convolutional dialogue generation model.

The specific technical scheme adopted by the invention is as follows:

A method for solving a dialogue generation task using a convolutional dialogue generation model, comprising the following steps:
1) For the context preceding the next word of the dialogue to be generated, map each word of the context to its meaning vector (the word's embedding) and obtain its position vector; add the two to obtain the word's comprehensive expression vector.

Feed the words' comprehensive expression vectors into an encoding network that combines convolutional layers with gated linear units to obtain the comprehensive expression of the context.

2) Convert the last word of the context (the most recently generated word, "the last word" for short) into its meaning vector, and add the last word's position vector to obtain the last word's comprehensive expression.

Feed the last word's comprehensive expression into an encoding network that combines convolutional layers with gated linear units and combine it with the comprehensive expression of the context obtained in step 1) to obtain the expression of the next word to be generated (this expression is used to pick the next word).

3) After training, the final convolutional dialogue generation model is obtained; this model can then generate the required contextual dialogue.
In step 1), the meaning vector of the c-th word is w_c = {w_c1, ..., w_cn}, where w_c1 is the value of its 1st dimension and w_cn the value of its n-th dimension;

the position vector of the c-th word is p_c = {p_c1, ..., p_cn}, where p_c1 is the value of its 1st dimension and p_cn the value of its n-th dimension;

the comprehensive expression vector of the c-th word is o_c = {o_c1, ..., o_cn}, where o_c1 is the value of its 1st dimension and o_cn the value of its n-th dimension.
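The word representation above is simply an element-wise sum of the meaning vector w_c and the position vector p_c. A minimal sketch (the helper name `word_representation` and the toy 4-dimensional values are assumptions for illustration, not from the patent):

```python
import numpy as np

def word_representation(meaning_vec, position_vec):
    """Comprehensive expression vector o_c: element-wise sum of a word's
    meaning (embedding) vector w_c and its position vector p_c."""
    w = np.asarray(meaning_vec, dtype=float)
    p = np.asarray(position_vec, dtype=float)
    assert w.shape == p.shape, "w_c and p_c must both be n-dimensional"
    return w + p

# toy example with n = 4 dimensions
w_c = [0.1, 0.2, 0.3, 0.4]   # meaning vector of the c-th word
p_c = [0.0, 0.1, 0.0, 0.1]   # position vector of the c-th word
o_c = word_representation(w_c, p_c)   # element-wise sum: [0.1, 0.3, 0.3, 0.5]
```

In practice both vectors would come from learned embedding tables, indexed by the word identity and by its position in the context respectively.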
In step 1), feeding the words' comprehensive expression vectors into the encoding network that combines convolutional layers with gated linear units to obtain the comprehensive expression of the context specifically comprises:

1.1) Feed the comprehensive expression vectors o_c = {o_c1, ..., o_cn} successively through m convolution modules, from which the comprehensive expression vector q_m of the context is obtained. Each of the m convolution modules consists of one convolution operation and one non-linear operation. The convolution operation generates two d-dimensional column vectors Y = [A, B] ∈ R^(2d) according to the formula

Y = f_conv(X) = W_m X + b_m

where A is the first d-dimensional column, B is the second d-dimensional column, R^(2d) is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input expression vector of the convolution operation, W_m is the weight matrix of the m-th convolution operation, and b_m is the bias vector of the m-th convolution operation.

1.2) The non-linear operation takes the second d-dimensional column B of the output Y = [A, B] ∈ R^(2d) generated in step 1.1) and applies the gate function δ(B) to obtain the output g = δ(B), which controls the amount of information flowing through the network and is passed on to the next neuron. The first d-dimensional column A is then combined with g = δ(B), and the output of the encoder's convolution module is obtained according to the formula

q_i^m = δ(B) ⊙ A + q_i^(m-1), with [A, B] = f_conv(q_(i-k/2)^(m-1), ..., q_(i+k/2)^(m-1))

where q_i^m is the i-th dimension value of the output of the m-th encoder convolution module, f_conv(.) is the convolution operation, q_(i-k/2)^(m-1), ..., q_(i+k/2)^(m-1) are the (i-k/2)-th to (i+k/2)-th dimension values of the output of the (m-1)-th encoder convolution module, k is a predefined parameter (for example 3, 5 or 7), and q_i^(m-1) is the i-th dimension value of the output of the (m-1)-th encoder convolution module, added as a residual connection.

After the successive operation of the m convolution modules, the comprehensive expression q_m of the context is obtained.
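Steps 1.1) and 1.2) can be sketched as a single NumPy function. This is a minimal illustration, not the patented implementation: it assumes the gate function δ is the sigmoid, zero padding at the sequence boundaries, and a residual connection; the window extraction stands in for the real convolution, and all shapes and parameter values are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_conv_module(q_prev, W, b, k):
    """One convolution module: a convolution producing Y = [A, B] (two
    d-dimensional halves), followed by the gated linear unit g * A with
    g = sigmoid(B), plus a residual connection to the previous module."""
    n, d = q_prev.shape                     # n positions, d-dimensional states
    pad = k // 2
    padded = np.vstack([np.zeros((pad, d)), q_prev, np.zeros((pad, d))])
    out = np.empty_like(q_prev)
    for i in range(n):
        window = padded[i:i + k].reshape(-1)  # k*d inputs around position i
        y = W @ window + b                    # Y = f_conv(X) = W_m X + b_m, in R^(2d)
        A, B = y[:d], y[d:]
        g = sigmoid(B)                        # gate: controls information flow
        out[i] = g * A + q_prev[i]            # gated output plus residual
    return out

rng = np.random.default_rng(0)
n, d, k = 5, 4, 3
q0 = rng.normal(size=(n, d))                 # comprehensive expression vectors
W = rng.normal(scale=0.1, size=(2 * d, k * d))
b = np.zeros(2 * d)
q1 = gated_conv_module(q0, W, b, k)          # output of one encoder module
```

Because every position i depends only on the previous layer, the loop over i can run in parallel on a GPU, which is the advantage over a recurrent network claimed above.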
In step 2), the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, where w_w1 is the value of its 1st dimension and w_wn the value of its n-th dimension;

the position vector of the last word is p_w = {p_w1, ..., p_wn}, where p_w1 is the value of its 1st dimension and p_wn the value of its n-th dimension;

the comprehensive expression of the last word is o_w = {o_w1, ..., o_wn}, where o_w1 is the value of its 1st dimension and o_wn the value of its n-th dimension.
Feeding the last word's comprehensive expression into the encoding network that combines convolutional layers with gated linear units, and combining it with the comprehensive expression of the context obtained in step 1) to obtain the expression of the next word to be generated, specifically comprises:

2.1) Feed the last word's comprehensive expression o_w = {o_w1, ..., o_wn} successively through m convolution modules of the same structure as in the encoder, from which the predicted expression r_m of the next word to be generated is obtained. Each convolution module consists of one convolution operation and one non-linear operation; the convolution operation generates two d-dimensional column vectors Y = [A, B] ∈ R^(2d) according to the formula

Y = f_conv(X) = W_m X + b_m

where A is the first d-dimensional column, B is the second d-dimensional column, R^(2d) is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input expression vector of the convolution operation, W_m is the weight matrix of the m-th convolution operation, and b_m is the bias vector of the m-th convolution operation.

2.2) The non-linear operation takes the second d-dimensional column B of the output Y = [A, B] ∈ R^(2d) generated in step 2.1) and applies the gate function δ(B) to obtain the output g = δ(B), which controls the amount of information flowing through the network and is passed on to the next neuron.

The first d-dimensional column A is then combined with g = δ(B), and the output of the decoder's convolution module is obtained according to the formula

r_i^m = δ(B) ⊙ A + r_i^(m-1), with [A, B] = f_conv(r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1))

where r_i^m is the i-th dimension value of the output of the m-th decoder convolution module, f_conv(.) is the convolution operation, r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1) are the (i-k/2)-th to (i+k/2)-th dimension values of the output of the (m-1)-th decoder convolution module, k is a predefined parameter (for example 3, 5 or 7), and r_i^(m-1) is the i-th dimension value of the output of the (m-1)-th decoder convolution module.
2.3) Using the i-th dimension value r_i^m of the output of the decoder's m-th convolution module, obtain the i-th dimension value d_i^m of that module's attention-mechanism state according to the formula

d_i^m = W_d^m r_i^m + b_d^m + g_i

where W_d^m is a weight matrix, b_d^m is a bias vector, and g_i is a parameter coefficient (g_i can be set manually).

Then the attention state d_i^m of the decoder's m-th convolution module is combined with the j-th dimension value q_j^m of the encoder's m-th convolution module output (the j-th dimension value of the comprehensive expression vector q_m of step 1)) to obtain the corresponding activation parameter

a_ij^m = exp(d_i^m · q_j^m) / Σ_t exp(d_i^m · q_t^m)

Next, the j-th dimension value q_j^m of the encoder output is combined with the j-th dimension value o_cj of the words' comprehensive expression vectors o_c = {o_c1, ..., o_cn} of encoder step 1) (o_cj being the j-th dimension value of the c-th word's comprehensive expression vector), giving the attention addition term for the i-th dimension value of the decoder's m-th convolution module output:

c_i^m = Σ_j a_ij^m (q_j^m + o_cj)

The generated addition term c_i^m is added to the i-th dimension value r_i^m of the decoder's m-th convolution module output; after the cyclic processing of the m convolution modules, the final decoder output r_m is obtained.
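The attention of step 2.3) can be sketched as follows, reading the description in the ConvS2S style: decoder states are projected, dotted against the encoder outputs to obtain the activation parameters a_ij^m, and the weighted sum of encoder outputs plus input expressions is added back to the decoder states. The function and parameter names (`conv_attention`, `Wd`, `bd`, `g`) and the toy shapes are assumptions for illustration.

```python
import numpy as np

def conv_attention(r, q, o, Wd, bd, g):
    """Attention of the decoder's m-th module (a sketch).
    r: decoder module outputs (T_dec x d); q: encoder outputs (T_enc x d);
    o: encoder input comprehensive expressions (T_enc x d)."""
    d_state = r @ Wd.T + bd + g                # d_i^m = W_d^m r_i^m + b_d^m + g_i
    scores = d_state @ q.T                     # dot products d_i^m . q_j^m
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)          # activation parameters a_ij^m
    c = a @ (q + o)                            # c_i^m = sum_j a_ij^m (q_j^m + o_cj)
    return r + c                               # add attention term back to r_i^m

rng = np.random.default_rng(1)
T_dec, T_enc, dim = 3, 6, 4
r = rng.normal(size=(T_dec, dim))
q = rng.normal(size=(T_enc, dim))
o = rng.normal(size=(T_enc, dim))
Wd = np.eye(dim)            # toy projection: identity
bd = np.zeros(dim)
g = np.zeros(dim)
out = conv_attention(r, q, o, Wd, bd, g)
```

Each row of `a` sums to 1, so `c` is a convex combination of encoder positions, weighted by how well each matches the current decoder state.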
2.4) The decoder output r_m is fed into the softmax function, and the probability of the next word to be generated is obtained according to the formula

p(y_(i+1) | y_1, ..., y_i) = softmax(W_o r_m + b_o)

where W_o is a weight matrix, b_o is a bias vector, and softmax(.) is the softmax function; y_(i+1) denotes the (i+1)-th word, y_1 the 1st word, and y_i the i-th word, so p(y_(i+1) | y_1, ..., y_i) is the probability of the next word. Using this probability output, the word with the largest probability is emitted as the next word of the generated dialogue.
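The final softmax step can be sketched as below, with a toy 5-word vocabulary and hand-picked weights (all names and values are assumptions for illustration; a real model learns W_o and b_o):

```python
import numpy as np

def next_word_probs(r_m, W_o, b_o):
    """p(y_{i+1} | y_1..y_i) = softmax(W_o r_m + b_o); the most probable
    word is emitted as the next word of the dialogue."""
    logits = W_o @ r_m + b_o
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

vocab = ["hello", "how", "are", "you", "<eos>"]
r_m = np.array([0.2, -0.1, 0.4])           # final decoder output (d = 3)
W_o = np.array([[ 0.1, 0.0, 0.2],          # one row of weights per vocabulary word
                [ 0.0, 0.3, 0.0],
                [ 0.2, 0.1, 0.0],
                [ 0.5, 0.0, 0.4],
                [-0.1, 0.0, 0.1]])
b_o = np.zeros(len(vocab))
p = next_word_probs(r_m, W_o, b_o)
next_word = vocab[int(np.argmax(p))]       # greedy choice: "you"
```

Generation then appends this word to the context and repeats steps 2.1)-2.3) until an end-of-sequence token is produced.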
Compared with the prior art, the invention has the following advantages:

Compared with common dialogue generation solutions, this method uses a convolutional dialogue generation model and thus overcomes the prior-art problems that recurrent neural networks cannot exploit GPU parallelism and suffer from vanishing gradients. The invention achieves better results on the dialogue generation task than traditional methods.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of the method of the invention for solving a dialogue generation task by using a convolutional dialogue generation model.
具体实施方式Detailed ways
如图1所示,一种利用卷积对话生成模型解决对话生成任务的方法,包括如下步骤:As shown in Figure 1, a method for solving a dialogue generation task using a convolutional dialogue generation model includes the following steps:
1)针对于所要生成的对话的下一个词的上文(context),将上文进行单词映射成相应的含义向量(获取上文的单词表达),并获得单词的位置向量,之后将得到的单词的含义向量与单词的位置向量表达相加,获取单词的综合表达向量;1) For the context of the next word of the dialogue to be generated, map the above word into the corresponding meaning vector (obtain the above word expression), and obtain the position vector of the word, and then obtain the The meaning vector of the word and the position vector expression of the word are added to obtain the comprehensive expression vector of the word;
将获取的单词的综合表达向量输入到结合了卷积层与门式线性单元结合的编码网络,获取上文的综合表达;Input the obtained comprehensive expression vector of the word into the coding network that combines the convolutional layer and the gated linear unit to obtain the above comprehensive expression;
步骤1)中,单词的含义向量为wc={wc1,...,wcn},wc为第c个单词的含义向量,wc1为第c个单词的含义向量第1维数值,wcn为第c个单词的含义向量第n维数值;In step 1), the meaning vector of the word is w c ={w c1 ,...,w cn }, w c is the meaning vector of the c-th word, and w c1 is the first dimension value of the meaning vector of the c-th word. , wcn is the nth dimension value of the meaning vector of the cth word;
单词的位置向量为pc={pc1,...,pcn},pc为第c个单词的位置向量,pc1为第c个单词的位置向量第1维数值,pcn为第c个单词的位置向量第n维数值;The position vector of the word is p c ={p c1 ,...,p cn }, p c is the position vector of the c-th word, p c1 is the first dimension value of the position vector of the c-th word, and p cn is the th The nth dimension value of the position vector of the c words;
单词的综合表达向量oc={oc1,...,ocn},oc为第c个单词的综合表达向量,oc1为第c个单词的综合表达向量第1维数值,ocn为第c个单词的综合表达向量第n维数值。The comprehensive expression vector of words o c ={o c1 ,...,o cn }, o c is the comprehensive expression vector of the cth word, o c1 is the first dimension value of the comprehensive expression vector of the cth word, o cn is the nth dimension value of the comprehensive expression vector of the cth word.
将获取的单词的综合表达向量输入到结合了卷积层与门式线性单元结合的编码网络,获取上文的综合表达,具体包括:Input the obtained comprehensive expression vector of the word into the coding network that combines the convolutional layer and the gated linear unit to obtain the above comprehensive expression, including:
1.1)将单词的综合表达向量oc={oc1,...,ocn}循环输入到m个卷积模块中,利用这m个卷积模块获得上文的综合表达向量qm;m个卷积模块中每个卷积模块都由一个卷积计算操作与一个非线性计算操作组成,卷积计算操作会按照如下公式生成两列d维向量Y=[A,B]∈R2d,1.1) The comprehensive expression vector o c ={o c1 ,...,o cn } of the word is cyclically input into m convolution modules, and the m convolution modules are utilized to obtain the above comprehensive expression vector q m ; m Each convolution module in the convolution modules consists of a convolution calculation operation and a nonlinear calculation operation. The convolution calculation operation will generate two columns of d-dimensional vectors Y=[A,B]∈R 2d according to the following formula,
Y=fconv(X)=WmX+bm Y=f conv (X)=W m X+b m
其中,A为第一列d维向量,B为第二列d维向量,R2d为2d维度所有向量集合,fconv(X)代表卷积操作,X代表卷积计算操作的输入映射表达向量,Wm代表第m个卷积计算操作中的权重矩阵,bm代表第m个卷积计算操作中的偏置值向量;Among them, A is the d-dimensional vector of the first column, B is the d-dimensional vector of the second column, R 2d is the set of all 2d-dimensional vectors, f conv (X) represents the convolution operation, and X represents the input map expression vector of the convolution calculation operation , W m represents the weight matrix in the mth convolution calculation operation, b m represents the bias value vector in the mth convolution calculation operation;
通过计算得到两列d维向量Y=[A,B]∈R2d;Two columns of d-dimensional vectors Y=[A,B]∈R 2d are obtained by calculation;
1.2)非线性计算操作会利用步骤1.1)卷积操作生成的输出Y=[A,B]∈R2d中的第二列d维向量B,结合门操作函数δ(B),获取控制网络中信息流动量的输出g=δ(B),该输出将传递到下一个神经元;将卷积操作生成的输出Y=[A,B]∈R2d中的第一列d维向量A,结合生成的控制网络中信息流动量的输出g=δ(B),按照如下公式获取编码器的卷积模块输出,1.2) The nonlinear calculation operation will use the second column d-dimensional vector B in the output Y=[A,B]∈R 2d generated by the convolution operation in step 1.1), combined with the gate operation function δ(B), to obtain the control network. The output of information flow g = δ(B), which will be passed to the next neuron; the output Y = [A, B] ∈ R 2d generated by the convolution operation is the first column of d-dimensional vector A in R 2d, combined with The output of the information flow in the generated control network is g=δ(B), and the output of the convolution module of the encoder is obtained according to the following formula,
其中,代表第m个编码器卷积模块的输出的第i维值,fconv(.)代表卷积操作,代表第m-1个编码器卷积模块的输出的第(i-k/2)到第(i+k/2)维,k为定义好的一个参数(例如可以定3、5、7等),代表第m-1个编码器卷积模块的输出的第i维值;in, represents the i-th dimension value of the output of the mth encoder convolution module, f conv (.) represents the convolution operation, Represents the (ik/2)th to (i+k/2)th dimension of the output of the m-1th encoder convolution module, k is a defined parameter (for example, it can be set to 3, 5, 7, etc.), represents the i-th dimension value of the output of the m-1th encoder convolution module;
经过m个卷积模块的连续操作,可以获得上文的综合表达qm。After successive operations of m convolution modules, the above comprehensive expression q m can be obtained.
2)通过所要生成的对话的下一个词的上文最后一个单词(最后一次生成的单词,简称最后单词)转换成最后单词的含义向量(获取最后单词的表达),并结合最后单词的位置向量,两者相加获取最后单词的综合表达;2) Convert the last word above the next word of the dialogue to be generated (the last generated word, referred to as the last word) into the meaning vector of the last word (obtain the expression of the last word), and combine the position vector of the last word , the two are added together to obtain the comprehensive expression of the last word;
将最后单词的综合表达输入到结合了卷积层与门式线性单元结合的编码网络,并结合步骤1)获得的上文的综合表达,获取下一个要生成单词的表达(利用该表达获取下一个要生成的单词);Input the comprehensive expression of the last word into the coding network that combines the convolutional layer and the gated linear unit, and combine the above comprehensive expression obtained in step 1) to obtain the expression of the next word to be generated (use this expression to obtain the next word. a word to generate);
步骤2)中,最后单词的含义向量为ww={ww1,...,wwn},ww为最后单词的含义向量,ww为最后单词的含义向量第1维数值,wwn为最后单词的含义向量第n维数值;In step 2), the meaning vector of the last word is w w ={w w1 ,...,w wn }, w w is the meaning vector of the last word, w w is the first dimension value of the meaning vector of the last word, w wn is the nth dimension value of the meaning vector of the last word;
最后单词的位置向量为pw={pw1,...,pwn},pw为最后单词的位置向量,pw1为最后单词的位置向量第1维数值,pwn为最后单词的的位置向量第n维数值;The position vector of the last word is p w ={p w1 ,...,p wn }, p w is the position vector of the last word, p w1 is the first dimension value of the position vector of the last word, and p wn is the value of the last word The nth dimension value of the position vector;
最后单词的综合表达为ow={ow1,...,own},ow为最后单词的综合表达向量,ow1为最后单词的综合表达向量第1维数值,own为最后单词的综合表达向量第n维数值。The comprehensive expression of the last word is o w ={o w1 ,...,o wn }, o w is the comprehensive expression vector of the last word, o w1 is the first dimension value of the comprehensive expression vector of the last word, and o wn is the last word The composite expression vector of nth dimension values.
将最后单词的综合表达输入到结合了卷积层与门式线性单元结合的编码网络,并结合步骤1)获得的上文的综合表达,获取下一个要生成单词的表达(利用该表达获取下一个要生成的单词),具体包括:Input the comprehensive expression of the last word into the coding network that combines the convolutional layer and the gated linear unit, and combine the above comprehensive expression obtained in step 1) to obtain the expression of the next word to be generated (use this expression to obtain the next word. a word to generate), including:
2.1)将最后单词的综合表达ow={ow1,...,own}循环输入到与编码器中相同的m个卷积模块中,利用这m个卷积模块获得下一个要生成的单词的预测表达rm;每个卷积模块都由一个卷积计算操作与一个非线性计算操作组成,卷积计算操作会按照如下公式生成两列d维向量Y=[A,B]∈R2d,2.1) Input the comprehensive expression of the last word ow ={o w1 , ...,o wn } cyclically into the same m convolution modules as in the encoder, and use these m convolution modules to obtain the next generation to be generated. The predicted expression r m of the word of R 2d ,
Y=fconv(X)=WmX+bm Y=f conv (X)=W m X+b m
其中,A为第一列d维向量,B为第二列d维向量,R2d为2d维度所有向量集合,fconv(X)代表卷积操作,X代表卷积计算操作的输入映射表达向量,Wm代表第m个卷积计算操作中的权重矩阵,bm代表第m个卷积计算操作中的偏置值向量;Among them, A is the d-dimensional vector of the first column, B is the d-dimensional vector of the second column, R 2d is the set of all 2d-dimensional vectors, f conv (X) represents the convolution operation, and X represents the input map expression vector of the convolution calculation operation , W m represents the weight matrix in the mth convolution calculation operation, b m represents the bias value vector in the mth convolution calculation operation;
通过计算得到两列d维向量Y=[A,B]∈R2d;Two columns of d-dimensional vectors Y=[A,B]∈R 2d are obtained by calculation;
2.2)非线性计算操作会利用步骤2.1)卷积操作生成的输出Y=[A,B]∈R2d中的第二列d维向量B,结合门操作函数δ(B),获取控制网络中信息流动量的输出g=δ(B),该输出将传递到下一个神经元;2.2) The nonlinear calculation operation will use the second column d-dimensional vector B in the output Y=[A,B]∈R 2d generated by the convolution operation in step 2.1), combined with the gate operation function δ(B), to obtain the control network. The output of information flow g = δ(B), this output will be passed to the next neuron;
将卷积操作生成的输出Y=[A,B]∈R2d中的第一列d维向量A,结合生成的控制网络中信息流动量的输出g=δ(B),按照如下公式获取编码器的卷积模块输出;The first column d-dimensional vector A in the output Y=[A,B]∈R 2d generated by the convolution operation is combined with the output g=δ(B) of the information flow in the generated control network, and the code is obtained according to the following formula The output of the convolution module of the device;
其中ri m代表第m个编码器卷积模块的输出的第i维值,fconv(.)代表卷积操作,代表第m-1个编码器卷积模块的输出的第(i-k/2)到第(i+k/2)维,k为定义好的一个参数(例如可以定3、5、7等),ri m-1代表第m-1个编码器卷积模块的输出的第i维值;where r i m represents the i-th dimension value of the output of the m-th encoder convolution module, f conv (.) represents the convolution operation, Represents the (ik/2)th to (i+k/2)th dimension of the output of the m-1th encoder convolution module, k is a defined parameter (for example, it can be set to 3, 5, 7, etc.), r i m-1 represents the i-th dimension value of the output of the m-1th encoder convolution module;
2.3)利用如下公式,结合解码器第m个卷积模块的输出的第i维值ri m,获取该解码器卷积模块对应注意力机制输出的第i维值 2.3) Using the following formula, combined with the i-th dimension value r i m of the output of the m-th convolution module of the decoder, obtain the i-th dimension value output by the decoder convolution module corresponding to the attention mechanism
其中,代表权重矩阵,代表偏置向量,gi代表参数系数(gi可人为设定);in, represents the weight matrix, represents the bias vector, and gi represents the parameter coefficient ( gi can be set manually);
之后利用如下公式可以获取对应于解码器第m个卷积模块的对应注意力机制输出的第i维值结合编码器第m个卷积模块输出中的第j维值 为步骤1)中综合表达向量qm第j维数值,获得对应的激活参数 After that, the i-th dimension value corresponding to the output of the corresponding attention mechanism of the m-th convolution module of the decoder can be obtained by using the following formula Combine the jth dimension values in the output of the mth convolutional block of the encoder is the jth dimension value of the comprehensive expression vector q m in step 1) to obtain the corresponding activation parameter
之后结合编码器整体输出的第j维值结合编码器步骤1)中单词的综合表达向量oc={oc1,...,ocn}的第j维值ocj,ocj为第c个单词的综合表达向量第j维数值,获取解码器第m个卷积模块输出的第i维值激活部分添加项 Then combine the jth dimension value of the overall output of the encoder Combined with the jth dimension value o cj of the comprehensive expression vector o c ={o c1 ,...,o cn } of the word in step 1) of the encoder, o cj is the jth dimension value of the comprehensive expression vector of the c th word, Get the addition of the activation part of the i-th dimension value of the output of the m-th convolution module of the decoder
将生成的解码器第m个卷积模块输出的第i维值激活部分添加项与解码器第m个卷积模块输出的第i维值ri m相加,经过m个卷积模块的循环处理,获得最终的解码器输出rm;Add an item to the activation part of the i-th dimension value output by the m-th convolution module of the generated decoder Add the i-th dimension value r i m output by the m-th convolution module of the decoder, and obtain the final decoder output r m through the cyclic processing of the m-th convolution module;
2.3)通过将解码器的输出rm,输入到softmax函数中,按照如下公式获取将要生成的下一个单词的概率,2.3) By inputting the output rm of the decoder into the softmax function, the probability of the next word to be generated is obtained according to the following formula,
p(yi+1|y1,...,yi)=softmax(Worm+bo)p(y i+1 |y 1 ,...,y i )=softmax(W o r m +b o )
其中,Wo代表权重矩阵,bo代表偏置向量,softmax(.)代表softmax函数,利用该概率输出,找到最大的概率对应的单词作为生成的对话下一个单词输出。p(yi+1|y1,...,yi)为下一个单词的概率,yi+1|y1,...,yi中,yi+1表示第i+1个单词,y1为表示第1个单词,yi表示第i个单词。Among them, W o represents the weight matrix, b o represents the bias vector, and softmax(.) represents the softmax function. Using the probability output, find the word corresponding to the maximum probability as the output of the next word in the generated dialogue. p(y i+1 |y 1 ,...,y i ) is the probability of the next word, in y i+1 |y 1 ,...,y i , y i+1 represents the i+1th word word, y 1 means the first word, and y i means the i-th word.
3) After training, the final convolutional dialogue generation model is obtained; this model can then be used to generate the required contextual dialogue.
The above method is applied in the following embodiment to demonstrate the technical effect of the present invention; the specific steps are not repeated here.
Example
The present invention is evaluated experimentally on the DailyDialog dataset. To objectively assess the performance of the algorithm, four evaluation criteria are used on the selected test set: Average, Greedy, Extrema, and Training Time. Following the steps described in the detailed description, the experimental results obtained are shown in Table 1, where the present method is denoted ConvTalker.
Table 1
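Average, Greedy, and Extrema are the standard embedding-based response-similarity metrics for dialogue evaluation. As an illustration, the Average metric is the cosine similarity between the mean word embeddings of the generated reply and the reference reply. The sketch below uses a tiny hand-made embedding table; the table and word lists are assumptions for demonstration only.

```python
import numpy as np

def average_metric(reply, reference, emb):
    """Embedding-Average metric: cosine similarity of mean word vectors."""
    def mean_vec(words):
        return np.mean([emb[w] for w in words], axis=0)
    a, b = mean_vec(reply), mean_vec(reference)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 2-dimensional embeddings (illustrative values)
emb = {"good":    np.array([1.0, 0.0]),
       "morning": np.array([0.8, 0.6]),
       "night":   np.array([0.0, 1.0])}

score = average_metric(["good", "morning"], ["good", "night"], emb)
```

Greedy matches each generated word to its most similar reference word before averaging, and Extrema takes the dimension-wise extreme value instead of the mean; both reuse the same cosine-similarity core shown here.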
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811057115.9A CN109255020B (en) | 2018-09-11 | 2018-09-11 | Method for solving dialogue generation task by using convolution dialogue generation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109255020A CN109255020A (en) | 2019-01-22 |
CN109255020B true CN109255020B (en) | 2022-04-01 |
Family
ID=65046678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811057115.9A Active CN109255020B (en) | 2018-09-11 | 2018-09-11 | Method for solving dialogue generation task by using convolution dialogue generation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109255020B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110196928B (en) * | 2019-05-17 | 2021-03-30 | 北京邮电大学 | Fully parallelized end-to-end multi-turn dialogue system and method with domain scalability |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273487A (en) * | 2017-06-13 | 2017-10-20 | 北京百度网讯科技有限公司 | Artificial-intelligence-based chat data generation method, apparatus, and computer device |
CN107506823A (en) * | 2017-08-22 | 2017-12-22 | 南京大学 | Construction method of a hybrid generative model for dialogue generation |
CN107590153A (en) * | 2016-07-08 | 2018-01-16 | 微软技术许可有限责任公司 | Dialogue relevance modeling using convolutional neural networks |
CN107980130A (en) * | 2017-11-02 | 2018-05-01 | 深圳前海达闼云端智能科技有限公司 | Automatic answering method, apparatus, storage medium, and electronic device |
CN108388944A (en) * | 2017-11-30 | 2018-08-10 | 中国科学院计算技术研究所 | LSTM neural network chip and method of use |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9916538B2 (en) * | 2012-09-15 | 2018-03-13 | Z Advanced Computing, Inc. | Method and system for feature detection |
US10546066B2 (en) * | 2016-08-31 | 2020-01-28 | Microsoft Technology Licensing, Llc | End-to-end learning of dialogue agents for information access |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
2018-09-11: CN CN201811057115.9A patent/CN109255020B/en, status Active
Non-Patent Citations (3)
Title |
---|
Improving Variational Encoder-Decoders in Dialogue Generation; Xiaoyu Shen et al.; The Thirty-Second AAAI Conference on Artificial Intelligence; 2018-04-27; vol. 32, no. 1; pp. 5456-5463 *
Investigating Deep Reinforcement Learning Techniques in Personalized Dialogue Generation; Min Yang et al.; Proceedings of the 2018 SIAM International Conference on Data Mining; 2018-05-07; pp. 630-638 *
A Survey of Intelligent Dialogue Systems; Jia Xibin et al.; Journal of Beijing University of Technology; 2017-09-10; vol. 43, no. 9; pp. 1344-1356 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||