CN109255020B - Method for solving dialogue generation task by using convolution dialogue generation model - Google Patents


Info

Publication number
CN109255020B
Authority
CN
China
Prior art keywords
word
vector
convolution
output
dimension value
Prior art date
Legal status
Active
Application number
CN201811057115.9A
Other languages
Chinese (zh)
Other versions
CN109255020A (en
Inventor
赵洲
章璇
孟令涛
梁伟欣
金志华
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201811057115.9A
Publication of CN109255020A
Application granted
Publication of CN109255020B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The invention discloses a method for solving the dialogue generation task with a convolutional dialogue generation model, comprising the following steps: for the context preceding the next word of the dialogue to be generated, obtain the meaning vector and the position vector of each word and add them to obtain the word's combined representation vector; feed these vectors into an encoding network that combines convolutional layers with gated linear units to obtain a representation of the context; convert the last word of the context into its meaning vector and add its position vector to obtain the last word's combined representation; feed this into a network of the same convolution-plus-gated-linear-unit structure and, together with the context representation, obtain the representation of the next word to be generated. By using a convolutional dialogue generation model, the invention overcomes the drawbacks of prior-art recurrent neural networks, namely their inability to exploit GPU parallelism and their tendency toward vanishing gradients.

Description

A method for solving dialogue generation tasks using a convolutional dialogue generation model

Technical Field

The present invention relates to the technical field of dialogue generation, and in particular to a method for solving dialogue generation tasks using a convolutional dialogue generation model.

Background Art

Non-task-oriented dialogue generation has attracted wide attention and become an important service, but existing systems for this task still perform poorly.

Existing approaches are mainly built on recurrent neural networks, exploiting the temporal structure of the recurrence to generate a dialogue word by word. Because a recurrent network processes tokens sequentially, however, it cannot exploit the parallelism of a GPU (Graphics Processing Unit). Moreover, the chain rule applied across many time steps makes recurrent networks prone to vanishing gradients. To overcome these drawbacks, the present method completes the dialogue generation task with a convolutional dialogue generation model.

The present invention first obtains a representation of the current dialogue context with a convolutional neural network equipped with an attention mechanism, then feeds this representation into a decoding module to obtain the next word of the desired reply; repeating this process generates the entire dialogue.

Summary of the Invention

The object of the present invention is to solve the problems of the prior art. To overcome the inability of recurrent neural networks to exploit GPU parallelism, and their tendency toward vanishing gradients, the present invention provides a method for solving dialogue generation tasks using a convolutional dialogue generation model.

The specific technical solution adopted by the present invention is as follows:

A method for solving a dialogue generation task using a convolutional dialogue generation model, comprising the following steps:

1) For the context preceding the next word of the dialogue to be generated, map each word of the context to its meaning vector (the word representation) and obtain its position vector; add the two to obtain the word's combined representation vector.

Feed the combined representation vectors of the words into an encoding network that combines convolutional layers with gated linear units to obtain the representation of the context.
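The patent gives no code for this embedding step; the following is a minimal NumPy sketch of adding a word's meaning vector and position vector, with toy vocabulary size, sequence length, and dimension $n$ — all names and sizes are illustrative assumptions, and the lookup tables stand in for learned embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, max_len, n = 10, 6, 4   # toy sizes; n = embedding dimension

# Lookup tables for meaning vectors w_c and position vectors p_c (stand-ins
# for learned embedding matrices).
W_meaning = rng.normal(size=(vocab_size, n))
W_position = rng.normal(size=(max_len, n))

def combined_representation(word_ids):
    """o_c = w_c + p_c for each word c of the context."""
    positions = np.arange(len(word_ids))
    return W_meaning[word_ids] + W_position[positions]

context = np.array([3, 1, 4, 1])    # hypothetical word ids of the context
o = combined_representation(context)
print(o.shape)                      # one n-dimensional vector per context word
```

Each row of `o` is the combined representation vector $o_c$ that the encoder consumes below.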

2) Convert the last word of the context (the most recently generated word, hereafter the last word) into its meaning vector (the last-word representation), and add the last word's position vector to obtain the last word's combined representation.

Feed the combined representation of the last word into a decoding network that likewise combines convolutional layers with gated linear units and, together with the context representation obtained in step 1), obtain the representation of the next word to be generated (which is used to pick the next word).

3) After training, the final convolutional dialogue generation model is obtained, which can generate the required dialogue from its context.

In step 1), the meaning vector of a word is $w_c=\{w_{c1},\dots,w_{cn}\}$, where $w_c$ is the meaning vector of the $c$-th word, $w_{c1}$ the value of its first dimension, and $w_{cn}$ the value of its $n$-th dimension;

the position vector of a word is $p_c=\{p_{c1},\dots,p_{cn}\}$, where $p_c$ is the position vector of the $c$-th word, $p_{c1}$ the value of its first dimension, and $p_{cn}$ the value of its $n$-th dimension;

the combined representation vector of a word is $o_c=\{o_{c1},\dots,o_{cn}\}=w_c+p_c$, where $o_c$ is the combined representation vector of the $c$-th word, $o_{c1}$ the value of its first dimension, and $o_{cn}$ the value of its $n$-th dimension.

In step 1), feeding the combined representation vectors of the words into the encoding network of convolutional layers and gated linear units to obtain the context representation specifically comprises:

1.1) Feed the combined representation vectors $o_c=\{o_{c1},\dots,o_{cn}\}$ successively through $m$ convolution modules, which together produce the context representation $q^m$. Each of the $m$ convolution modules consists of one convolution operation and one nonlinear operation. The convolution operation generates two columns of $d$-dimensional vectors $Y=[A,B]\in\mathbb{R}^{2d}$ according to

$$Y=f_{conv}(X)=W^m X+b^m$$

where $A$ is the first $d$-dimensional column, $B$ the second $d$-dimensional column, $\mathbb{R}^{2d}$ the set of all $2d$-dimensional vectors, $f_{conv}(X)$ the convolution operation, $X$ the input representation of the convolution operation, $W^m$ the weight matrix of the $m$-th convolution operation, and $b^m$ the bias vector of the $m$-th convolution operation.

1.2) The nonlinear operation takes the second column $B$ of the convolution output $Y=[A,B]\in\mathbb{R}^{2d}$ and applies the gate function $\delta(B)$ to obtain the output $g=\delta(B)$, which controls how much information flows on to the next neuron. Combining the first column $A$ with the gate output $g=\delta(B)$, the output of the encoder convolution module is obtained as

$$q_i^m = A\otimes\delta(B)+q_i^{m-1},\qquad [A,B]=f_{conv}\big(q_{i-k/2}^{m-1},\dots,q_{i+k/2}^{m-1}\big)$$

where $q_i^m$ is the $i$-th dimension value of the output of the $m$-th encoder convolution module, $f_{conv}(\cdot)$ the convolution operation, $q_{i-k/2}^{m-1},\dots,q_{i+k/2}^{m-1}$ the $(i-k/2)$-th through $(i+k/2)$-th dimension values of the output of the $(m-1)$-th encoder convolution module, $k$ a predefined parameter (for example 3, 5 or 7), and $q_i^{m-1}$ the $i$-th dimension value of the output of the $(m-1)$-th encoder convolution module (a residual connection).

After the successive operation of the $m$ convolution modules, the context representation $q^m$ is obtained.
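Steps 1.1) and 1.2) can be sketched as one convolution-plus-gated-linear-unit module in NumPy. This is a toy sketch, not the patented implementation: the gate $\delta$ is taken to be the sigmoid (the usual choice in gated linear units), the loop form is chosen for clarity over speed, and all dimensions and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d, k = 4, 3                     # toy feature dimension d and kernel width k

def conv_glu_block(h, W, b):
    """One encoder convolution module: a convolution producing Y = [A, B]
    in R^{2d}, then the gated linear unit A * sigmoid(B), plus a residual."""
    n = h.shape[0]
    pad = k // 2
    hp = np.pad(h, ((pad, pad), (0, 0)))      # zero-pad to keep sequence length
    out = np.empty_like(h)
    for i in range(n):
        window = hp[i:i + k].reshape(-1)      # inputs h_{i-k/2} .. h_{i+k/2}
        Y = W @ window + b                    # Y = [A, B] in R^{2d}
        A, B = Y[:d], Y[d:]
        out[i] = A * sigmoid(B) + h[i]        # gate g = sigmoid(B), residual h_i
    return out

h0 = rng.normal(size=(5, d))                  # combined word representations o_c
W = rng.normal(size=(2 * d, k * d)) * 0.1
b = np.zeros(2 * d)
q = conv_glu_block(h0, W, b)
print(q.shape)                                # same shape as the input
```

Stacking $m$ such calls (each with its own $W^m$, $b^m$) yields the context representation $q^m$.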

In step 2), the meaning vector of the last word is $w_w=\{w_{w1},\dots,w_{wn}\}$, where $w_w$ is the meaning vector of the last word, $w_{w1}$ the value of its first dimension, and $w_{wn}$ the value of its $n$-th dimension;

the position vector of the last word is $p_w=\{p_{w1},\dots,p_{wn}\}$, where $p_w$ is the position vector of the last word, $p_{w1}$ the value of its first dimension, and $p_{wn}$ the value of its $n$-th dimension;

the combined representation of the last word is $o_w=\{o_{w1},\dots,o_{wn}\}=w_w+p_w$, where $o_w$ is the combined representation vector of the last word, $o_{w1}$ the value of its first dimension, and $o_{wn}$ the value of its $n$-th dimension.

Feeding the combined representation of the last word into the decoding network of convolutional layers and gated linear units and, together with the context representation obtained in step 1), obtaining the representation of the next word to be generated (which is used to pick the next word) specifically comprises:

2.1) Feed the combined representation of the last word $o_w=\{o_{w1},\dots,o_{wn}\}$ successively through $m$ convolution modules of the same form as in the encoder, which together produce the predicted representation $r^m$ of the next word to be generated. Each convolution module consists of one convolution operation and one nonlinear operation; the convolution operation generates two columns of $d$-dimensional vectors $Y=[A,B]\in\mathbb{R}^{2d}$ according to

$$Y=f_{conv}(X)=W^m X+b^m$$

where $A$ is the first $d$-dimensional column, $B$ the second $d$-dimensional column, $\mathbb{R}^{2d}$ the set of all $2d$-dimensional vectors, $f_{conv}(X)$ the convolution operation, $X$ the input representation of the convolution operation, $W^m$ the weight matrix of the $m$-th convolution operation, and $b^m$ the bias vector of the $m$-th convolution operation.

2.2) The nonlinear operation takes the second column $B$ of the convolution output $Y=[A,B]\in\mathbb{R}^{2d}$ and applies the gate function $\delta(B)$ to obtain the output $g=\delta(B)$, which controls how much information flows on to the next neuron.

Combining the first column $A$ with the gate output $g=\delta(B)$, the output of the decoder convolution module is obtained as

$$r_i^m = A\otimes\delta(B)+r_i^{m-1},\qquad [A,B]=f_{conv}\big(r_{i-k/2}^{m-1},\dots,r_{i+k/2}^{m-1}\big)$$

where $r_i^m$ is the $i$-th dimension value of the output of the $m$-th decoder convolution module, $f_{conv}(\cdot)$ the convolution operation, $r_{i-k/2}^{m-1},\dots,r_{i+k/2}^{m-1}$ the $(i-k/2)$-th through $(i+k/2)$-th dimension values of the output of the $(m-1)$-th decoder convolution module, $k$ a predefined parameter (for example 3, 5 or 7), and $r_i^{m-1}$ the $i$-th dimension value of the output of the $(m-1)$-th decoder convolution module.

2.3) Using the following formula, combine the $i$-th dimension value $r_i^m$ of the output of the decoder's $m$-th convolution module to obtain the $i$-th dimension value $d_i^m$ of the attention input of that decoder convolution module:

$$d_i^m = W_d^m r_i^m + b_d^m + g_i$$

where $W_d^m$ is a weight matrix, $b_d^m$ a bias vector, and $g_i$ a parameter coefficient (which may be set manually).

Then, combining the attention value $d_i^m$ of the decoder's $m$-th convolution module with the $j$-th dimension value $q_j^m$ of the output of the encoder's $m$-th convolution module ($q_j^m$ being the $j$-th dimension value of the context representation $q^m$ of step 1)), the corresponding activation weight $a_{ij}^m$ is obtained by

$$a_{ij}^m = \frac{\exp\big(d_i^m\cdot q_j^m\big)}{\sum_{t=1}^{n}\exp\big(d_i^m\cdot q_t^m\big)}$$

Then, combining the $j$-th dimension value $q_j^m$ of the overall encoder output with the $j$-th dimension value $o_{cj}$ of the combined representation vectors $o_c=\{o_{c1},\dots,o_{cn}\}$ of step 1) ($o_{cj}$ being the $j$-th dimension value of the combined representation vector of the $c$-th word), the activation addition term $c_i^m$ for the $i$-th dimension of the output of the decoder's $m$-th convolution module is obtained by

$$c_i^m=\sum_{j=1}^{n}a_{ij}^m\big(q_j^m+o_{cj}\big)$$

The generated addition term $c_i^m$ is added to the $i$-th dimension value $r_i^m$ of the output of the decoder's $m$-th convolution module; after this cyclic processing through the $m$ convolution modules, the final decoder output $r^m$ is obtained.

2.4) Feed the decoder output $r^m$ into a softmax function to obtain the probability of the next word to be generated:

$$p(y_{i+1}\mid y_1,\dots,y_i)=\mathrm{softmax}(W_o r^m+b_o)$$

where $W_o$ is a weight matrix, $b_o$ a bias vector, and $\mathrm{softmax}(\cdot)$ the softmax function; $p(y_{i+1}\mid y_1,\dots,y_i)$ is the probability of the next word, with $y_{i+1}$ denoting the $(i+1)$-th word, $y_1$ the first word, and $y_i$ the $i$-th word. From this probability output, the word with the highest probability is emitted as the next word of the generated dialogue.
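The output step above amounts to a linear projection followed by a softmax and an argmax. A minimal NumPy sketch, with toy vocabulary size and random weights standing in for the trained $W_o$ and $b_o$:

```python
import numpy as np

rng = np.random.default_rng(3)
d, vocab_size = 4, 10                    # toy sizes (assumptions)

W_o = rng.normal(size=(vocab_size, d))   # output weight matrix W_o
b_o = np.zeros(vocab_size)               # output bias b_o
r_m = rng.normal(size=d)                 # final decoder output r^m

def next_word_distribution(r):
    logits = W_o @ r + b_o               # W_o r^m + b_o
    z = np.exp(logits - logits.max())    # numerically stable softmax
    return z / z.sum()

p = next_word_distribution(r_m)
next_word = int(np.argmax(p))            # greedy pick: highest-probability word
print(p.shape, next_word)
```

In practice the chosen word id would be mapped back to a vocabulary entry and fed in as the next "last word" of step 2).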

Compared with the prior art, the present invention has the following advantages:

Compared with general dialogue generation solutions, the present invention uses a convolutional dialogue generation model, which overcomes the inability of prior-art recurrent neural networks to exploit GPU parallelism and their vanishing-gradient problem. The results achieved by the present invention on the dialogue generation task are better than those of traditional methods.

Brief Description of the Drawings

FIG. 1 is a schematic flow diagram of the method of the present invention for solving a dialogue generation task using a convolutional dialogue generation model.

Detailed Description

As shown in FIG. 1, a method for solving a dialogue generation task using a convolutional dialogue generation model comprises the following steps:

1) For the context preceding the next word of the dialogue to be generated, map each word of the context to its meaning vector (the word representation) and obtain its position vector; add the two to obtain the word's combined representation vector.

Feed the combined representation vectors of the words into an encoding network that combines convolutional layers with gated linear units to obtain the representation of the context.

In step 1), the meaning vector of a word is $w_c=\{w_{c1},\dots,w_{cn}\}$, where $w_c$ is the meaning vector of the $c$-th word, $w_{c1}$ the value of its first dimension, and $w_{cn}$ the value of its $n$-th dimension;

the position vector of a word is $p_c=\{p_{c1},\dots,p_{cn}\}$, where $p_c$ is the position vector of the $c$-th word, $p_{c1}$ the value of its first dimension, and $p_{cn}$ the value of its $n$-th dimension;

the combined representation vector of a word is $o_c=\{o_{c1},\dots,o_{cn}\}=w_c+p_c$, where $o_c$ is the combined representation vector of the $c$-th word, $o_{c1}$ the value of its first dimension, and $o_{cn}$ the value of its $n$-th dimension.

Feeding the combined representation vectors of the words into the encoding network of convolutional layers and gated linear units to obtain the context representation specifically comprises:

1.1) Feed the combined representation vectors $o_c=\{o_{c1},\dots,o_{cn}\}$ successively through $m$ convolution modules, which together produce the context representation $q^m$. Each of the $m$ convolution modules consists of one convolution operation and one nonlinear operation. The convolution operation generates two columns of $d$-dimensional vectors $Y=[A,B]\in\mathbb{R}^{2d}$ according to

$$Y=f_{conv}(X)=W^m X+b^m$$

where $A$ is the first $d$-dimensional column, $B$ the second $d$-dimensional column, $\mathbb{R}^{2d}$ the set of all $2d$-dimensional vectors, $f_{conv}(X)$ the convolution operation, $X$ the input representation of the convolution operation, $W^m$ the weight matrix of the $m$-th convolution operation, and $b^m$ the bias vector of the $m$-th convolution operation.

1.2) The nonlinear operation takes the second column $B$ of the convolution output $Y=[A,B]\in\mathbb{R}^{2d}$ and applies the gate function $\delta(B)$ to obtain the output $g=\delta(B)$, which controls how much information flows on to the next neuron. Combining the first column $A$ with the gate output $g=\delta(B)$, the output of the encoder convolution module is obtained as

$$q_i^m = A\otimes\delta(B)+q_i^{m-1},\qquad [A,B]=f_{conv}\big(q_{i-k/2}^{m-1},\dots,q_{i+k/2}^{m-1}\big)$$

where $q_i^m$ is the $i$-th dimension value of the output of the $m$-th encoder convolution module, $f_{conv}(\cdot)$ the convolution operation, $q_{i-k/2}^{m-1},\dots,q_{i+k/2}^{m-1}$ the $(i-k/2)$-th through $(i+k/2)$-th dimension values of the output of the $(m-1)$-th encoder convolution module, $k$ a predefined parameter (for example 3, 5 or 7), and $q_i^{m-1}$ the $i$-th dimension value of the output of the $(m-1)$-th encoder convolution module (a residual connection).

After the successive operation of the $m$ convolution modules, the context representation $q^m$ is obtained.

2) Convert the last word of the context (the most recently generated word, hereafter the last word) into its meaning vector (the last-word representation), and add the last word's position vector to obtain the last word's combined representation.

Feed the combined representation of the last word into the decoding network of convolutional layers and gated linear units and, together with the context representation obtained in step 1), obtain the representation of the next word to be generated (which is used to pick the next word).

In step 2), the meaning vector of the last word is $w_w=\{w_{w1},\dots,w_{wn}\}$, where $w_w$ is the meaning vector of the last word, $w_{w1}$ the value of its first dimension, and $w_{wn}$ the value of its $n$-th dimension;

the position vector of the last word is $p_w=\{p_{w1},\dots,p_{wn}\}$, where $p_w$ is the position vector of the last word, $p_{w1}$ the value of its first dimension, and $p_{wn}$ the value of its $n$-th dimension;

the combined representation of the last word is $o_w=\{o_{w1},\dots,o_{wn}\}=w_w+p_w$, where $o_w$ is the combined representation vector of the last word, $o_{w1}$ the value of its first dimension, and $o_{wn}$ the value of its $n$-th dimension.

将最后单词的综合表达输入到结合了卷积层与门式线性单元结合的编码网络,并结合步骤1)获得的上文的综合表达,获取下一个要生成单词的表达(利用该表达获取下一个要生成的单词),具体包括:Input the comprehensive expression of the last word into the coding network that combines the convolutional layer and the gated linear unit, and combine the above comprehensive expression obtained in step 1) to obtain the expression of the next word to be generated (use this expression to obtain the next word. a word to generate), including:

2.1)将最后单词的综合表达ow={ow1,...,own}循环输入到与编码器中相同的m个卷积模块中,利用这m个卷积模块获得下一个要生成的单词的预测表达rm;每个卷积模块都由一个卷积计算操作与一个非线性计算操作组成,卷积计算操作会按照如下公式生成两列d维向量Y=[A,B]∈R2d2.1) Input the comprehensive expression of the last word ow ={o w1 , ...,o wn } cyclically into the same m convolution modules as in the encoder, and use these m convolution modules to obtain the next generation to be generated. The predicted expression r m of the word of R 2d ,

Y=fconv(X)=WmX+bm Y=f conv (X)=W m X+b m

其中,A为第一列d维向量,B为第二列d维向量,R2d为2d维度所有向量集合,fconv(X)代表卷积操作,X代表卷积计算操作的输入映射表达向量,Wm代表第m个卷积计算操作中的权重矩阵,bm代表第m个卷积计算操作中的偏置值向量;Among them, A is the d-dimensional vector of the first column, B is the d-dimensional vector of the second column, R 2d is the set of all 2d-dimensional vectors, f conv (X) represents the convolution operation, and X represents the input map expression vector of the convolution calculation operation , W m represents the weight matrix in the mth convolution calculation operation, b m represents the bias value vector in the mth convolution calculation operation;

通过计算得到两列d维向量Y=[A,B]∈R2dTwo columns of d-dimensional vectors Y=[A,B]∈R 2d are obtained by calculation;

2.2) The nonlinear operation takes the second d-dimensional column vector B of the output Y = [A, B] ∈ R^{2d} generated by the convolution operation of step 2.1) and, using the gate function δ(B), obtains the output g = δ(B) that controls the amount of information flowing through the network; this output is passed on to the next neuron.

Combining the first d-dimensional column vector A of Y = [A, B] ∈ R^{2d} with the gate output g = δ(B), the output of the convolution module is obtained according to the following formula:

r_i^m = A ⊗ δ(B) + r_i^{m-1}, with [A, B] = f_conv(r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1})

where r_i^m denotes the i-th dimension value of the output of the m-th convolution module, f_conv(.) denotes the convolution operation, r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1} denote the (i-k/2)-th to (i+k/2)-th dimension values of the output of the (m-1)-th convolution module, k is a predefined parameter (for example 3, 5, or 7), and r_i^{m-1} denotes the i-th dimension value of the output of the (m-1)-th convolution module.
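A minimal sketch of this gated nonlinearity, assuming δ is the logistic sigmoid (a common choice for gate functions; the names here are illustrative, not the patent's):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def glu_block(A, B, residual):
    """Gate g = delta(B) controls information flow; the module output is the
    elementwise product A * g plus the previous module's output (residual)."""
    g = [sigmoid(b) for b in B]
    return [a * gi + r for a, gi, r in zip(A, g, residual)]
```

With B = 0 the gate is exactly 0.5, so the output is half of A plus the residual carried over from the previous module.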

2.3) Using the following formula, combined with the i-th dimension value r_i^m of the output of the m-th decoder convolution module, the i-th dimension value d_i^m of the attention output corresponding to that decoder convolution module is obtained:

d_i^m = W_d^m r_i^m + b_d^m + g_i

where W_d^m denotes the weight matrix, b_d^m denotes the bias vector, and g_i denotes a parameter coefficient (g_i can be set manually).

Next, combining the attention output d_i^m of the m-th decoder convolution module with the j-th dimension value q_j^m of the output of the m-th encoder convolution module, where q_j^m is the j-th dimension value of the comprehensive expression vector q^m from step 1), the corresponding activation parameter a_ij^m is obtained using the following formula:

a_ij^m = exp(d_i^m · q_j^m) / Σ_t exp(d_i^m · q_t^m)

Then, combining the j-th dimension value q_j^m of the overall encoder output with the j-th dimension value o_cj of the comprehensive expression vectors of the words o_c = {o_c1, ..., o_cn} from encoder step 1), where o_cj is the j-th dimension value of the comprehensive expression vector of the c-th word, the activation addition term c_i^m for the i-th dimension value of the output of the m-th decoder convolution module is obtained:

c_i^m = Σ_j a_ij^m (q_j^m + o_cj)

The generated activation addition term c_i^m is added to the i-th dimension value r_i^m output by the m-th decoder convolution module; after cyclic processing through the m convolution modules, the final decoder output r^m is obtained.
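The attention computation of step 2.3) can be sketched as follows, treating each position's representation as a small vector (a simplification; the variable names are illustrative, and the softmax/weighted-sum form follows the formulas above):

```python
import math

def attention_add(d_i, q, o, r_i):
    """a_ij = softmax over positions j of (d_i . q_j); the addition term is
    c_i = sum_j a_ij * (q_j + o_j), returned already added to the decoder
    value r_i."""
    scores = [sum(x * y for x, y in zip(d_i, q_j)) for q_j in q]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]  # numerically stable softmax
    total = sum(exps)
    a = [e / total for e in exps]
    dim = len(d_i)
    c_i = [sum(a[j] * (q[j][k] + o[j][k]) for j in range(len(q)))
           for k in range(dim)]
    return [r + c for r, c in zip(r_i, c_i)]
```

With a zero query every score ties, the softmax weights are uniform, and the result is the average of the encoder vectors (plus the word expressions and the decoder residual).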

2.4) By feeding the decoder output r^m into the softmax function, the probability of the next word to be generated is obtained according to the following formula:

p(y_{i+1} | y_1, ..., y_i) = softmax(W_o r^m + b_o)

where W_o denotes the weight matrix, b_o denotes the bias vector, and softmax(.) denotes the softmax function. Using this probability output, the word with the highest probability is output as the next word of the generated dialogue. Here p(y_{i+1} | y_1, ..., y_i) is the probability of the next word; y_{i+1} denotes the (i+1)-th word, y_1 the first word, and y_i the i-th word.
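A minimal sketch of this final step (the vocabulary, weights, and function name are illustrative assumptions, not values from the patent):

```python
import math

def next_word(r_m, W_o, b_o, vocab):
    """Computes p = softmax(W_o r^m + b_o) over the vocabulary and returns
    the highest-probability word together with the full distribution."""
    logits = [sum(W_o[v][k] * r_m[k] for k in range(len(r_m))) + b_o[v]
              for v in range(len(vocab))]
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]  # stable softmax
    total = sum(exps)
    p = [e / total for e in exps]
    best = max(range(len(vocab)), key=lambda v: p[v])
    return vocab[best], p
```

The probabilities always sum to one, and the argmax word is emitted as the next word of the dialogue.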

3) After training, the final convolutional dialogue generation model is obtained; this model can then be used to generate the required contextual dialogue.

The above method is applied in the following example to demonstrate the technical effect of the present invention; the specific steps are not repeated here.

Example

The present invention is evaluated on the DailyDialog dataset. To objectively assess the performance of the proposed algorithm, four evaluation metrics are used on the selected test set: Average, Greedy, Extrema, and Training Time. Following the steps described in the detailed embodiment, the experimental results are shown in Table 1; the proposed method is denoted ConvTalker.

Table 1

(Table 1 is rendered as an image in the source; it reports the results of ConvTalker under the Average, Greedy, Extrema, and Training Time metrics.)

Claims (1)

1. A method for solving a dialog generation task using a convolutional dialog generation model, comprising the steps of:
1) for the context preceding the next word of the dialog to be generated, mapping each word of the context into a corresponding meaning vector, obtaining the position vector of the word, and then adding the obtained meaning vector and position vector of the word to obtain the comprehensive expression vector of the word;
inputting the obtained comprehensive expression vectors of the words into a coding network combining convolutional layers and gated linear units to obtain the comprehensive expression, specifically comprising:
1.1) cyclically inputting the comprehensive expression vectors of the words o_c = {o_c1, ..., o_cn} into m convolution modules, and obtaining the comprehensive expression vector q^m using the m convolution modules; each of the m convolution modules consists of a convolution operation and a nonlinear operation, and the convolution operation generates two d-dimensional column vectors Y = [A, B] ∈ R^{2d} according to the following formula:
Y = f_conv(X) = W_m X + b_m
wherein A is the first d-dimensional column vector, B is the second d-dimensional column vector, R^{2d} is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X denotes the input mapping expression vector of the convolution operation, W_m denotes the weight matrix of the m-th convolution operation, and b_m denotes the bias vector of the m-th convolution operation;
the two d-dimensional column vectors Y = [A, B] ∈ R^{2d} are obtained by this calculation;
1.2) the nonlinear operation uses the second d-dimensional column vector B of the output Y = [A, B] ∈ R^{2d} generated by the convolution operation of step 1.1), combined with the gate function δ(B), to obtain the output g = δ(B) controlling the amount of information flow in the network, which is passed to the next neuron;
combining the first d-dimensional column vector A of the output Y = [A, B] ∈ R^{2d} with the generated information-flow output g = δ(B), the output of the encoder convolution module is obtained according to the following formula:
r_i^m = A ⊗ δ(B) + r_i^{m-1}, with [A, B] = f_conv(r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1})
wherein r_i^m denotes the i-th dimension value of the output of the m-th encoder convolution module, f_conv(.) denotes the convolution operation, r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1} denote the (i-k/2)-th to (i+k/2)-th dimension values of the output of the (m-1)-th encoder convolution module, k is a predefined parameter taking a value such as 3, 5, or 7, and r_i^{m-1} denotes the i-th dimension value of the output of the (m-1)-th encoder convolution module;
through the successive operation of the m convolution modules, the comprehensive expression q^m of the context is obtained;
the meaning vectors of the words are w_c = {w_c1, ..., w_cn}, w_c being the meaning vector of the c-th word, w_c1 its 1st dimension value, and w_cn its n-th dimension value;
the position vectors of the words are p_c = {p_c1, ..., p_cn}, p_c being the position vector of the c-th word, p_c1 its 1st dimension value, and p_cn its n-th dimension value;
the comprehensive expression vectors of the words are o_c = {o_c1, ..., o_cn}, o_c being the comprehensive expression vector of the c-th word, o_c1 its 1st dimension value, and o_cn its n-th dimension value;
2) converting the last word of the context preceding the next word of the dialog to be generated into the meaning vector of the last word, and adding it to the position vector of the last word to obtain the comprehensive expression of the last word;
inputting the comprehensive expression of the last word into a coding network combining convolutional layers and gated linear units, and combining it with the comprehensive expression obtained in step 1) to obtain the expression of the next word to be generated, specifically comprising:
2.1) cyclically inputting the comprehensive expression of the last word o_w = {o_w1, ..., o_wn} into the same m convolution modules, and obtaining the predicted expression r^m of the next word to be generated using these m convolution modules; each convolution module consists of a convolution operation and a nonlinear operation, and the convolution operation generates two d-dimensional column vectors Y = [A, B] ∈ R^{2d} according to the following formula:
Y = f_conv(X) = W_m X + b_m
wherein A is the first d-dimensional column vector, B is the second d-dimensional column vector, R^{2d} is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X denotes the input mapping expression vector of the convolution operation, W_m denotes the weight matrix of the m-th convolution operation, and b_m denotes the bias vector of the m-th convolution operation;
the two d-dimensional column vectors Y = [A, B] ∈ R^{2d} are obtained by this calculation;
2.2) the nonlinear operation uses the second d-dimensional column vector B of the output Y = [A, B] ∈ R^{2d} generated by the convolution operation of step 2.1), combined with the gate function δ(B), to obtain the output g = δ(B) controlling the amount of information flow in the network, which is passed to the next neuron;
combining the first d-dimensional column vector A of the output Y = [A, B] ∈ R^{2d} with the generated information-flow output g = δ(B), the output of the convolution module is obtained according to the following formula:
r_i^m = A ⊗ δ(B) + r_i^{m-1}, with [A, B] = f_conv(r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1})
wherein r_i^m denotes the i-th dimension value of the output of the m-th convolution module, f_conv(.) denotes the convolution operation, r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1} denote the (i-k/2)-th to (i+k/2)-th dimension values of the output of the (m-1)-th convolution module, k is a predefined parameter, and r_i^{m-1} denotes the i-th dimension value of the output of the (m-1)-th convolution module;
2.3) using the following formula, combined with the i-th dimension value r_i^m of the output of the m-th decoder convolution module, obtaining the i-th dimension value d_i^m of the attention output corresponding to that decoder convolution module:
d_i^m = W_d^m r_i^m + b_d^m + g_i
wherein W_d^m denotes the weight matrix, b_d^m denotes the bias vector, and g_i denotes a parameter coefficient;
then obtaining the activation parameter a_ij^m using the following formula, combining the attention output d_i^m of the m-th decoder convolution module with the j-th dimension value q_j^m of the output of the m-th encoder convolution module, q_j^m being the j-th dimension value of the comprehensive expression vector q^m in step 1):
a_ij^m = exp(d_i^m · q_j^m) / Σ_t exp(d_i^m · q_t^m)
then, combining the j-th dimension value q_j^m of the overall encoder output with the j-th dimension value o_cj of the comprehensive expression vectors of the words o_c = {o_c1, ..., o_cn} in encoder step 1), o_cj being the j-th dimension value of the comprehensive expression vector of the c-th word, obtaining the activation addition term c_i^m for the i-th dimension value of the output of the m-th decoder convolution module:
c_i^m = Σ_j a_ij^m (q_j^m + o_cj)
adding the generated activation addition term c_i^m to the i-th dimension value r_i^m output by the m-th decoder convolution module, and obtaining the final decoder output r^m after cyclic processing through the m convolution modules;
2.4) by inputting the decoder output r^m into the softmax function, obtaining the probability of the next word to be generated according to the following formula:
p(y_{i+1} | y_1, ..., y_i) = softmax(W_o r^m + b_o)
wherein W_o denotes the weight matrix, b_o denotes the bias vector, and softmax(.) denotes the softmax function; using this probability output, the word with the highest probability is output as the next word of the generated dialogue;
the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, w_w being the meaning vector of the last word, w_w1 its 1st dimension value, and w_wn its n-th dimension value;
the position vector of the last word is p_w = {p_w1, ..., p_wn}, p_w being the position vector of the last word, p_w1 its 1st dimension value, and p_wn its n-th dimension value;
the comprehensive expression of the last word is o_w = {o_w1, ..., o_wn}, o_w being the comprehensive expression vector of the last word, o_w1 its 1st dimension value, and o_wn its n-th dimension value;
3) training to obtain the final convolutional dialogue generation model, and generating the required contextual dialogue using the model.
CN201811057115.9A 2018-09-11 2018-09-11 Method for solving dialogue generation task by using convolution dialogue generation model Active CN109255020B (en)


Publications (2)

Publication Number Publication Date
CN109255020A CN109255020A (en) 2019-01-22
CN109255020B true CN109255020B (en) 2022-04-01





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant