CN103280218A - Selection method based on voice recognition, mobile terminal device and information system thereof - Google Patents


Publication number
CN103280218A
Authority
CN
China
Prior art keywords
answer
repayment
user
voice
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101828630A
Other languages
Chinese (zh)
Inventor
张国峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Via Technologies Inc
Priority to CN2013101828630A (CN103280218A)
Priority to CN201710007339.8A (CN106847278A)
Priority to TW102121404A (TWI511124B)
Publication of CN103280218A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/081 Search algorithms, e.g. Baum-Welch or Viterbi

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A selection method based on voice recognition, a mobile terminal device, and an information system thereof are provided. The selection method comprises the following steps: receiving a first voice input; performing voice recognition and natural language processing on the first voice input to generate a corresponding first keyword; obtaining at least one first reported answer according to the first keyword; when the number of first reported answers is 1, performing a corresponding operation according to the type of that first reported answer; when the number of first reported answers is greater than 1, sorting the first reported answers, displaying them in a first candidate list, and receiving a second voice input; performing voice recognition and natural language processing on the second voice input to generate a corresponding second keyword; and selecting a second reported answer from the first reported answers in the first candidate list according to the second keyword.

Description

Selection method based on voice recognition, mobile terminal device and information system thereof

Technical Field

The invention relates to a selection method, a mobile terminal device, and an information system thereof, and in particular to a selection method based on voice recognition, a mobile terminal device, and an information system thereof.

Background

In natural language understanding (NLU) by computers, a specific grammar is typically used to capture the intent or information in a user's input sentence. Consequently, if the database stores enough data on user input sentences, reasonable judgments can be made.

One existing approach uses a built-in fixed word list to capture the user's input sentence. The fixed word list contains the specific terms used for particular intents or information, and the system can correctly recognize the user's intent or information only when the user expresses it with those terms. Forcing users to memorize every term in the fixed word list, however, is unfriendly. For example, a prior-art implementation based on a fixed word list requires a user asking about the weather to say: "How is the weather in Shanghai (or Beijing) tomorrow (or the day after tomorrow)?" If the user instead asks in a more natural, colloquial way, such as "How is Shanghai tomorrow?", the word "weather" does not appear in the sentence, so the prior art interprets it as "there is a place called Tomorrow in Shanghai" and clearly misses the user's real intent. Moreover, the sentences users produce are highly varied and change frequently, and users sometimes even enter erroneous sentences, in which case the input must be captured by fuzzy matching. A fixed word list that offers only rigid input rules therefore performs even worse.
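The limitation described above can be illustrated with a minimal sketch of a fixed-pattern matcher. The rule, the function name `match_fixed_rule`, and the English phrasing of the utterances are all assumptions made for illustration; the patent does not disclose how any particular prior-art system is implemented.

```python
import re

# Hypothetical fixed rule of the kind the background criticizes: the user
# must phrase the weather question in exactly this form.
WEATHER_RULE = re.compile(
    r"^how is the weather in (?P<city>shanghai|beijing) "
    r"(?P<day>tomorrow|the day after tomorrow)\??$",
    re.IGNORECASE,
)


def match_fixed_rule(sentence: str):
    """Return the extracted slots, or None when the sentence breaks the rule."""
    m = WEATHER_RULE.match(sentence.strip())
    if m is None:
        return None
    return {"intent": "weather", "city": m.group("city"), "day": m.group("day")}


# The rigid phrasing is recognized ...
rigid = match_fixed_rule("How is the weather in Shanghai tomorrow?")
# ... but the colloquial phrasing, which omits the word "weather", is not.
colloquial = match_fixed_rule("How is Shanghai tomorrow?")
```

The colloquial sentence carries the same intent, yet the rule returns nothing for it, which is the gap the later sections aim to close.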

In addition, when natural language understanding is used to handle multiple types of user intent, some different intents share the same grammatical structure. For example, when the user's input sentence is "I want to see Romance of the Three Kingdoms", the user may want to watch the film or to read the book, so two possible intents are usually matched in this case and offered to the user to choose from. In many cases, however, offering unnecessary possible intents is redundant and inefficient. For example, when the user's input sentence is "I want to watch Super Avenue of Fame (超级星光大道)", there is no need to match the user's intent to reading a book or viewing a painting of that name, because it is a television program.

Furthermore, search results obtained by full-text search are generally unstructured data. The information in unstructured data is scattered and unrelated. For example, the web results returned after entering keywords into a search engine such as Google or Baidu are unstructured data, because the user has to read through them item by item to find the useful information. This not only wastes the user's time but may also miss the desired information, which greatly limits its practicality.

Summary of the Invention

The invention provides a selection method based on voice recognition, a mobile terminal device, and an information system thereof, which can make operation more convenient for the user.

The invention proposes a selection method based on voice recognition, including: receiving a first voice input; performing voice recognition on the first voice input to generate a first keyword; generating at least one first reported answer according to the first keyword; when the number of selected first reported answers is 1, performing a corresponding operation according to the data type of the selected first reported answer; when the number of selected first reported answers is greater than 1, displaying a first candidate list containing the first reported answers and receiving a second voice input; performing voice recognition on the second voice input to generate a second keyword; and selecting a second reported answer from the first reported answers displayed in the first candidate list according to the second keyword.
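The two-round flow of this method can be sketched as follows. This is only an illustration under stated assumptions: voice recognition is stubbed out (keywords arrive as text), the catalog, the `Answer` type, and the functions `find_answers`, `perform`, and `select` are hypothetical, and the "no answer" and "no match" branches are not part of the claimed steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    title: str
    kind: str  # data type of the reported answer, e.g. "movie" or "book"


# Hypothetical in-memory stand-in for the data being searched.
CATALOG = [
    Answer("Romance of the Three Kingdoms", "movie"),
    Answer("Romance of the Three Kingdoms", "book"),
    Answer("Let the Bullets Fly", "movie"),
]


def find_answers(keyword: str) -> List[Answer]:
    """First reported answers: every catalog entry whose title contains the keyword."""
    needle = keyword.lower()
    return [a for a in CATALOG if needle in a.title.lower()]


def perform(answer: Answer) -> str:
    """Operation chosen according to the answer's data type (here just a description)."""
    return f"open {answer.kind}: {answer.title}"


def select(first_keyword: str, ask_again: Callable[[List[Answer]], str]) -> str:
    answers = find_answers(first_keyword)
    if not answers:
        return "no answer"  # assumption: this case is not covered by the method
    if len(answers) == 1:
        return perform(answers[0])
    # More than one answer: show the candidate list and take a second input.
    second_keyword = ask_again(answers).lower()
    for position, answer in enumerate(answers, start=1):
        if second_keyword in (str(position), answer.kind):
            return perform(answer)
    return "no match in candidate list"  # assumption, as above
```

In this sketch `ask_again` plays the role of displaying the first candidate list and returning the second keyword; a caller could pass `lambda candidates: "book"` to simulate the user saying "the book".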

The invention proposes a mobile terminal device including a voice receiving unit, a display unit, a storage unit, and a data processing unit. The voice receiving unit receives a first voice input and a second voice input. The display unit displays a candidate list containing reported answers. The storage unit stores a plurality of data. The data processing unit is coupled to the voice receiving unit, the display unit, and the storage unit. The data processing unit performs voice recognition on the first voice input to generate a first keyword and selects the corresponding first reported answers according to the first keyword. When the number of selected first reported answers is 1, the data processing unit performs a corresponding operation according to the type of the selected first reported answer. When the number of selected first reported answers is greater than 1, the data processing unit controls the display unit to display a first candidate list containing the first reported answers. The data processing unit performs voice recognition on the second voice input to generate a second keyword and selects a second reported answer from the first reported answers in the first candidate list according to the second keyword.

The invention proposes an information system including a server and a mobile terminal device. The server stores a plurality of data and has a voice recognition function. The mobile terminal device includes a voice receiving unit, a display unit, and a data processing unit. The voice receiving unit receives a first voice input and a second voice input. The display unit displays a candidate list containing reported answers. The data processing unit is coupled to the voice receiving unit, the display unit, and the server. The data processing unit has the server perform voice recognition on the first voice input to generate a first keyword, and the server selects the corresponding first reported answers according to the first keyword and transmits them to the data processing unit. When the number of selected first reported answers is 1, the data processing unit performs a corresponding operation according to the type of the selected first reported answer. When the number of selected first reported answers is greater than 1, the data processing unit controls the display unit to display a first candidate list containing the first reported answers, and the data processing unit has the server perform voice recognition on the second voice input to generate a second keyword. The server then selects a second reported answer from the first reported answers in the first candidate list according to the second keyword and transmits it to the data processing unit.

The invention proposes a selection method based on voice recognition, including: searching a structured database according to a first keyword to obtain at least one first reported answer; when the number of selected first reported answers is greater than 1, displaying first candidate data containing the first reported answers; after the first candidate list is displayed, receiving a second voice input and performing voice recognition on the second voice input to generate a second keyword; and selecting a second reported answer from the first reported answers in the first candidate list according to the second user intent.

Based on the above, the selection method based on voice recognition, the mobile terminal device, and the information system of the embodiments of the invention perform voice recognition and natural language processing on the first voice input and the second voice input to determine the keywords corresponding to each, and select among the reported answers according to those keywords. Operation thereby becomes more convenient for the user.

To make the above features and advantages of the invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

FIG. 1 is a block diagram of a natural language understanding system according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of the analysis results produced by a natural language processor for various user request information according to an embodiment of the present invention.

FIG. 3A is a schematic diagram of a plurality of records with a specific data structure stored in a structured database according to an embodiment of the present invention.

FIG. 3B is a schematic diagram of a plurality of records with a specific data structure stored in a structured database according to another embodiment of the present invention.

FIG. 3C is a schematic diagram of guidance data stored in a guidance data storage device according to an embodiment of the present invention.

FIG. 4A is a flowchart of a retrieval method according to an embodiment of the present invention.

FIG. 4B is a flowchart of the working process of a natural language understanding system according to another embodiment of the present invention.

FIG. 5A is a block diagram of a natural language dialogue system according to an embodiment of the present invention.

FIG. 5B is a block diagram of a natural language understanding system according to an embodiment of the present invention.

FIG. 5C is a block diagram of a natural language dialogue system according to another embodiment of the present invention.

FIG. 6 is a flowchart of a method for correcting a voice response according to an embodiment of the present invention.

FIG. 7A is a block diagram of a natural language dialogue system according to an embodiment of the present invention.

FIG. 7B is a block diagram of a natural language dialogue system according to another embodiment of the present invention.

FIG. 8A is a flowchart of a natural language dialogue method according to an embodiment of the present invention.

FIG. 8B is a schematic diagram of a plurality of records with a specific data structure stored in a structured database according to yet another embodiment of the present invention.

FIG. 9 is a system diagram of a mobile terminal device according to an embodiment of the present invention.

FIG. 10 is a system diagram of an information system according to an embodiment of the present invention.

FIG. 11 is a flowchart of a selection method based on voice recognition according to an embodiment of the present invention.

FIG. 12 is a block diagram of a voice control system according to an embodiment of the present invention.

FIG. 13 is a block diagram of a voice control system according to another embodiment of the present invention.

FIG. 14 is a flowchart of a voice control method according to an embodiment of the present invention.

Description of Reference Symbols

100, 520, 520’, 720, 720’: natural language understanding system
102, 503, 503’, 703, 902, 902’: request information
104: analysis result
106: possible intent grammar data
108, 509, 509’, 711, 904, 904’: keyword
110: response result
112: intent data
114: confirmed intent grammar data
116: analysis result output module
200: retrieval system
220: structured database
240: search engine
260: retrieval interface unit
280: guidance data storage device
300: natural language processor
302, 832, 834, 836, 838: record
304: title field
306: content field
308: sub-field
310: guidance field
312: value field
314: source field
316: popularity field
318, 852, 854: preference field
320, 862, 864: dislike field
400: knowledge-assisted understanding module
500, 500’, 700, 700’: natural language dialogue system
501, 701: voice input
507, 507’, 707: voice response
510, 710: voice sampling module
511, 511’, 711, 906, 906’: reported answer
513, 513’, 713: speech
522, 722: speech recognition module
524, 724: natural language processing module
526, 726: speech synthesis module
530, 740: speech synthesis database
702: integrated speech processing module
715: user preference data
717: user preference record
730: characteristic database
872, 874: field
900, 1010: mobile terminal device
908, 908’: candidate list
910, 1011: voice receiving unit
920, 1013: data processing unit
930, 1015: display unit
940: storage unit
1000: information system
1020: server
SP1: first voice
SP2: second voice
1200, 1300: voice control system
1210: auxiliary activation device
1212, 1222: wireless transmission module
1214: trigger module
1216: wireless rechargeable battery
12162: battery unit
12164: wireless charging module
1220, 1320: mobile terminal device
1221: voice system
1224: voice sampling module
1226: speech synthesis module
1227: voice output interface
1228: communication module
1230: (cloud) server
1232: speech understanding module
12322: speech recognition module
12324: speech processing module
S410~S450: steps of the retrieval method according to an embodiment of the present invention
S510~S590: steps of the working process of the natural language understanding system according to an embodiment of the present invention
S602, S604, S606, S608, S610, S612: steps of the method for correcting a voice response
S802~S890: steps of the natural language dialogue method according to an embodiment of the present invention
S1100~S1190: steps of the selection method based on voice recognition according to an embodiment of the present invention
S1402~S1412: steps of the voice control method according to an embodiment of the present invention

Detailed Description

Because existing implementations based on a fixed word list can only offer rigid input rules, they are poorly equipped to interpret users' varied input sentences. As a result, they often misjudge the user's intent and fail to find the required information, or they output unnecessary information to the user. In addition, existing search engines can only give users scattered and weakly related search results, so users must spend time checking each result to filter out what they need, which wastes time and may miss the required information. To address these problems of the prior art, the present invention proposes a method and system for retrieving structured data, in which the structured data provides specific fields for storing different types of data elements. When a user searches using natural-language voice input, the user's intent can thus be determined quickly and correctly, so that the required information, or more precise information to choose from, can be provided to the user.

FIG. 1 is a block diagram of a natural language understanding system according to an embodiment of the present invention. As shown in FIG. 1, the natural language understanding system 100 includes a retrieval system 200, a natural language processor 300, and a knowledge-assisted understanding module 400. The knowledge-assisted understanding module 400 is coupled to the natural language processor 300 and the retrieval system 200. The retrieval system 200 further includes a structured database 220, a search engine 240, and a retrieval interface unit 260, with the search engine 240 coupled to the structured database 220 and the retrieval interface unit 260. Although the retrieval system 200 in this embodiment includes the retrieval interface unit 260, this does not limit the invention; some embodiments may omit the retrieval interface unit 260 and have the search engine 240 perform full-text search on the structured database 220 in some other way.

When a user issues request information 102 to the natural language understanding system 100, the natural language processor 300 analyzes the request information 102 and sends the resulting possible intent grammar data 106 to the knowledge-assisted understanding module 400. The possible intent grammar data 106 contains a keyword 108 and intent data 112. The knowledge-assisted understanding module 400 then extracts the keyword 108 from the possible intent grammar data 106, sends it to the retrieval system 200, and stores the intent data 112 internally. The search engine 240 in the retrieval system 200 performs a full-text search on the structured database 220 according to the keyword 108 and returns the response result 110 of that search to the knowledge-assisted understanding module 400. Next, the knowledge-assisted understanding module 400 compares the stored intent data 112 against the response result 110 and sends the resulting confirmed intent grammar data 114 to the analysis result output module 116. According to the confirmed intent grammar data 114, the analysis result output module 116 sends the analysis result 104 to a server (not shown), which retrieves the data the user needs and sends it to the user.
It should be noted that the analysis result 104 may contain the keyword 108, or it may contain part of the information (such as the number of the record 302) or all of the information of a record containing the keyword 108 (such as the records in FIG. 3A/3B). In addition, the server may convert the analysis result 104 directly into speech for output to the user, or output the corresponding speech after specific processing (the manner of this "specific processing" and the content and information involved are described in detail later). Those skilled in the art can design the information output by the retrieval system 200 according to actual needs, and the present invention places no limitation on this.
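The data flow in the two paragraphs above can be sketched as follows. The record layout, the function names, and above all the comparison rule (an intent is confirmed when a returned record's category equals it) are assumptions for illustration; this part of the text does not specify how the comparison between the intent data 112 and the response result 110 is carried out.

```python
# Hypothetical records standing in for structured database 220.
RECORDS = [
    {"id": 1, "title": "Let the Bullets Fly", "category": "movie"},
    {"id": 2, "title": "The Days We Walked Together", "category": "music"},
]


def full_text_search(keyword):
    """Stand-in for search engine 240: records whose title contains the keyword."""
    needle = keyword.lower()
    return [r for r in RECORDS if needle in r["title"].lower()]


def confirm_intents(possible_intents):
    """Stand-in for knowledge-assisted understanding module 400.

    `possible_intents` is a list of {"keyword": ..., "intent": ...} dicts,
    playing the role of possible intent grammar data 106.
    """
    confirmed = []
    for candidate in possible_intents:
        response = full_text_search(candidate["keyword"])  # response result 110
        categories = {record["category"] for record in response}
        # Assumed comparison rule: keep the candidate only if some returned
        # record belongs to the category named by its intent data.
        if candidate["intent"] in categories:
            confirmed.append(candidate)
    return confirmed  # plays the role of confirmed intent grammar data 114


candidates = [
    {"keyword": "Let the Bullets Fly", "intent": "movie"},
    {"keyword": "Let the Bullets Fly", "intent": "book"},
]
confirmed = confirm_intents(candidates)
```

Under these assumptions the "book" reading is discarded because no record with that title is stored as a book, so only the "movie" intent is passed on.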

The analysis result output module 116 may be combined with other modules as appropriate. For example, in one embodiment it may be incorporated into the knowledge-assisted understanding module 400; in another embodiment it may be separate from the natural language understanding system 100 and located in the server (for example, the server that contains the natural language understanding system 100), in which case the server receives the confirmed intent grammar data 114 directly and then processes it. In addition, the knowledge-assisted understanding module 400 may store the intent data 112 in a storage device inside the module, in the natural language understanding system 100, in the server (for example, the server that contains the natural language understanding system 100), or in any storage that the knowledge-assisted understanding module 400 can access; the present invention places no limitation on this. Furthermore, the natural language understanding system 100, including the retrieval system 200, the natural language processor 300, and the knowledge-assisted understanding module 400, may be built in hardware, software, firmware, or any combination of these, and the present invention places no limitation on this either.

The natural language understanding system 100 may be located in a cloud server, in a server on a local area network, or even in a personal computer, a mobile computing device (such as a notebook computer), or a mobile communication device (such as a mobile phone). The components of the natural language understanding system 100 or the retrieval system 200 need not be installed in the same machine; they may be distributed across different devices or systems as needed and connected through various communication protocols. For example, the natural language processor 300 and the knowledge-assisted understanding module 400 may be placed in the same smartphone while the retrieval system 200 is placed in a cloud server; alternatively, the retrieval interface unit 260, the natural language processor 300, and the knowledge-assisted understanding module 400 may be placed in the same notebook computer while the search engine 240 and the structured database 220 are placed in another server on the local area network. In addition, when the whole natural language understanding system 100 is located on a server (whether a cloud server or a local area network server), the retrieval system 200, the natural language processor 300, and the knowledge-assisted understanding module 400 may be placed on different host computers, with the server's main system coordinating the transfer of messages and data among them.
Of course, two or all of the retrieval system 200, the natural language processor 300, and the knowledge-assisted understanding module 400 may also be merged into a single host computer as needed; the present invention does not limit this configuration.

In the embodiments of the present invention, the user can send request information to the natural language processor 300 in various ways, for example by spoken voice input or by a text description. For example, if the natural language understanding system 100 is located in a server (not shown) in the cloud or on a local area network, the user can first enter the request information 102 on a mobile device (such as a mobile phone, PDA, tablet computer, or similar system). The request information 102 is then transmitted through a telecommunications operator to the natural language understanding system 100 in the server, where the natural language processor 300 analyzes it. Once the server has confirmed the user's intent, the analysis result output module 116 outputs the corresponding analysis result 104, the server processes it, and the information requested by the user is sent back to the user's mobile device. For example, the request information 102 may be a question that the user wants the natural language understanding system 100 to answer (such as "How is the weather in Shanghai tomorrow?"). When the natural language understanding system 100 determines that the user's intent is to query tomorrow's weather in Shanghai, it sends the queried weather data to the user as the analysis result 104 through the analysis result output module 116. In addition, if the user's instruction to the natural language understanding system 100 is "I want to watch Let the Bullets Fly" or "I want to listen to The Days We Walked Together", the titles "Let the Bullets Fly" and "The Days We Walked Together" may belong to different domains. The natural language processor 300 therefore analyzes the user's request information 102 into one or more possible intent grammar data 106, each of which includes keywords 108 and intent data 112, and the user's intent is then confirmed by performing a full-text search on the structured database 220 in the retrieval system 200.

More specifically, when the user's request information 102 is "How is the weather in Shanghai tomorrow?", the natural language processor 300 may, after analysis, generate one possible intent grammar data 106:

"<queryweather>,<city>=Shanghai,<time>=tomorrow".

In one embodiment, if the natural language understanding system 100 considers the user's intent to be sufficiently clear, it can output the user's intent (that is, to query tomorrow's weather in Shanghai) directly as the analysis result 104 to the server through the analysis result output module 116, and the server can look up the weather specified by the user and send it to the user. As another example, when the user's request information 102 is "I want to watch Romance of the Three Kingdoms", the natural language processor 300 may, after analysis, generate three possible intent grammar data 106:

"<readbook>,<bookname>=Romance of the Three Kingdoms";

"<watchTV>,<TVname>=Romance of the Three Kingdoms"; and

"<watchfilm>,<filmname>=Romance of the Three Kingdoms".

This is because the keyword 108 in the possible intent grammar data 106 (that is, "Romance of the Three Kingdoms") may belong to different domains, namely books (<readbook>), TV dramas (<watchTV>), and films (<watchfilm>). One request information 102 can therefore be analyzed into multiple possible intent grammar data 106, and further analysis by the knowledge-aided understanding module 400 is needed to confirm the user's intent. As another example, if the user enters "I want to watch Let the Bullets Fly", "Let the Bullets Fly" may be either a film title or a book title, so at least the following two possible intent grammar data 106 may arise:

"<readbook>,<bookname>=Let the Bullets Fly"; and

"<watchfilm>,<filmname>=Let the Bullets Fly";

which belong to the book domain and the film domain respectively. The possible intent grammar data 106 must then be analyzed further by the knowledge-aided understanding module 400 to obtain the definite intent grammar data 114, which expresses the definite intent of the user's request information. When the knowledge-aided understanding module 400 analyzes the possible intent grammar data 106, it can send the keyword 108 (such as "Romance of the Three Kingdoms" or "Let the Bullets Fly" above) to the retrieval system 200 through the retrieval interface unit 260. The structured database 220 in the retrieval system 200 stores a plurality of records having a specific data structure. The search engine 240 performs a full-text search on the structured database 220 using the keyword 108 received by the retrieval interface unit 260 and returns the response result 110 obtained from the search to the knowledge-aided understanding module 400, which then uses the response result 110 to obtain the definite intent grammar data 114. The details of performing a full-text search on the structured database 220 to determine the definite intent grammar data 114 are described below with reference to FIG. 3A, FIG. 3B, and the related paragraphs.

In the concept of the present invention, the natural language understanding system 100 first extracts the keyword 108 from the user's request information 102 and then determines the domain attribute of the keyword 108 from the result of a full-text search on the structured database 220. For example, the input "I want to watch Romance of the Three Kingdoms" above produces possible intent grammar data 106 in three domains (books, TV dramas, and films), which are then analyzed further to confirm the user's definite intent. The user can therefore express an intent or a piece of information easily and colloquially, without memorizing specific terms such as those required by the fixed word lists of existing approaches.

FIG. 2 is a schematic diagram of the results produced by the natural language processor 300 when analyzing various request information from the user, according to an embodiment of the present invention.

As shown in FIG. 2, when the user's request information 102 is "How is the weather in Shanghai tomorrow?", the natural language processor 300 may, after analysis, generate the following possible intent grammar data 106:

"<queryweather>,<city>=Shanghai,<time>=tomorrow"

Here the intent data 112 is "<queryweather>" and the keywords 108 are "Shanghai" and "tomorrow". Because the analysis by the natural language processor 300 yields only one set of possible intent grammar data 106 (query weather, <queryweather>), in one embodiment the knowledge-aided understanding module 400 can take the keywords 108 "Shanghai" and "tomorrow" directly and send them to the server as the analysis result 104 to query the weather information (for example, an overview of tomorrow's weather in Shanghai, including conditions, temperature, and so on), without performing a full-text search on the structured database 220 to determine the user's intent. Of course, in one embodiment a full-text search on the structured database 220 can still be performed to determine the user's intent more precisely; those skilled in the art can vary this according to actual needs.
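The single-candidate shortcut described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function names `parse_intent_syntax` and `needs_full_text_search` are hypothetical, the `<time>` tag stands in for the tag shown in the example string, and the parser assumes keyword values contain no commas.

```python
import re

# One "<slot>=value" element of an intent grammar string.
SLOT_PATTERN = re.compile(r"<([^<>]+)>=(.+)")


def parse_intent_syntax(text):
    """Split '<intent>,<slot>=value,...' into (intent, {slot: value})."""
    parts = [part.strip() for part in text.split(",")]
    intent = parts[0].strip("<>")
    keywords = {}
    for part in parts[1:]:
        match = SLOT_PATTERN.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed keyword slot: {part!r}")
        keywords[match.group(1)] = match.group(2)
    return intent, keywords


def needs_full_text_search(candidates):
    """A single candidate is treated as unambiguous; several need a lookup."""
    return len(candidates) > 1


weather = [parse_intent_syntax("<queryweather>,<city>=上海,<time>=明天")]
```

With only one candidate in `weather`, `needs_full_text_search` returns `False`, which corresponds to the embodiment that forwards the keywords to the server without consulting the structured database.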

In addition, when the user's request information 102 is "I want to watch Let the Bullets Fly", two possible intent grammar data 106 can be generated:

"<readbook>,<bookname>=Let the Bullets Fly"; and

"<watchfilm>,<filmname>=Let the Bullets Fly";

together with two corresponding intent data 112, "<readbook>" and "<watchfilm>", and two identical keywords 108, "Let the Bullets Fly". These indicate that the intent may be either to read the book "Let the Bullets Fly" or to watch the film "Let the Bullets Fly". To confirm the user's intent, the knowledge-aided understanding module 400 sends the keyword 108 "Let the Bullets Fly" to the retrieval interface unit 260, and the search engine 240 then uses this keyword 108 to perform a full-text search on the structured database 220 and determine whether "Let the Bullets Fly" is a book title or a film title.

Furthermore, when the user's request information 102 is "I want to listen to The Days We Walked Together" (我想听一起走过的日子), two possible intent grammar data 106 can be generated:

"<playmusic>,<singer>=Walked Together,<songname>=Days"; and

"<playmusic>,<songname>=The Days We Walked Together"

These share the same intent data 112, "<playmusic>", and carry two groups of keywords 108: "Walked Together" (一起走过) with "Days" (日子), and "The Days We Walked Together" (一起走过的日子). They indicate that the intent may be either to listen to the song "Days" sung by a singer named "Walked Together", or to listen to the song "The Days We Walked Together". The knowledge-aided understanding module 400 can send the first group of keywords 108 ("Walked Together" and "Days") and the second group ("The Days We Walked Together") to the retrieval interface unit 260 to check whether there is a song "Days" sung by a singer "Walked Together" (the user intent implied by the first group) or a song "The Days We Walked Together" (the user intent implied by the second group), thereby confirming the user's intent. However, the present invention is not limited to the formats and names shown here for the possible intent grammar data and the intent data.

FIG. 3A is a schematic diagram of a plurality of records having a specific data structure stored in the structured database 220 according to an embodiment of the present invention.

In general, some existing full-text search approaches return unstructured data (for example, the results of a Google or Baidu search). Because the items in such results are scattered and unrelated to one another, the user has to examine them one by one, which limits their practical usefulness. In the concept of the present invention, by contrast, a structured database is used to improve the efficiency and accuracy of retrieval. The numerical data contained in each record of the structured database disclosed here are related to one another, and together they express the attributes of that record. Thus, when the search engine performs a full-text search on the structured database and the numerical data of a record matches a keyword, the guide data corresponding to that numerical data can be output and used to confirm the intent of the request information. The implementation details are described further in the following examples.

In the embodiments of the present invention, each record 302 stored in the structured database 220 includes a title field 304 and a content field 306. The title field 304 includes a plurality of sub-fields 308, and each sub-field includes a guide field 310 and a value field 312. The guide fields 310 of the records 302 store guide data, and the value fields 312 of the records 302 store numerical data. Taking record 1 in FIG. 3A as an example, the three sub-fields 308 in the title field 304 of record 1 store, respectively:

"singerguid: Andy Lau",

"songnameguid: The Days We Walked Together"; and

"songtypeguid: Hong Kong and Taiwan, Cantonese, pop";

The guide fields 310 of these sub-fields 308 store the guide data "singerguid", "songnameguid", and "songtypeguid" respectively, and the value fields 312 of the corresponding sub-fields 308 store the numerical data "Andy Lau", "The Days We Walked Together", and "Hong Kong and Taiwan, Cantonese, pop" respectively. The guide data "singerguid" indicates that the domain type of the numerical data "Andy Lau" is singer name (singer); the guide data "songnameguid" indicates that the domain type of the numerical data "The Days We Walked Together" is song name (song); and the guide data "songtypeguid" indicates that the domain type of the numerical data "Hong Kong and Taiwan, Cantonese, pop" is song type (song type). Each guide data here may in practice be represented by its own specific string of digits or characters; the present invention is not limited in this respect. The content field 306 of record 1 stores the lyrics of the song "The Days We Walked Together" or other data (such as the composer or lyricist). The actual data in the content field 306 of each record is not the focus of the present invention, however, and is therefore shown only schematically in FIG. 3A.

In the foregoing embodiment, each record includes a title field 304 and a content field 306, and each sub-field 308 in the title field 304 includes a guide field 310 and a value field 312. This does not limit the present invention: some embodiments may have no content field 306, and some may even have no guide field 310.

In addition, in the embodiments of the present invention, a first special character is stored between the data of the sub-fields 308 to separate one sub-field 308 from the next, and a second special character is stored between the data of the guide field 310 and the data of the value field 312 to separate the two. For example, as shown in FIG. 3A, the second special character ":" separates "singerguid" from "Andy Lau", "songnameguid" from "The Days We Walked Together", and "songtypeguid" from "Hong Kong and Taiwan, Cantonese, pop", while the first special character "|" separates the sub-fields 308 of record 1. The present invention is not limited to ":" or "|" as the separating special characters.
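The delimiter scheme can be illustrated with a short Python sketch. It assumes, purely for illustration, that the title field of record 1 is held as a single string using "|" and ":" as in FIG. 3A; the function name `parse_title_field` is hypothetical.

```python
SUBFIELD_SEP = "|"  # first special character: separates sub-fields
GUIDE_SEP = ":"     # second special character: separates guide data from value


def parse_title_field(title):
    """Return a list of (guide_data, numerical_data) pairs from a title field."""
    subfields = []
    for raw in title.split(SUBFIELD_SEP):
        if not raw:
            continue  # tolerate a trailing separator
        # Split on the first ":" only, so the value may itself contain ":".
        guide, _, value = raw.partition(GUIDE_SEP)
        subfields.append((guide.strip(), value.strip()))
    return subfields


# Title field of record 1, using the original strings from FIG. 3A.
record_1 = "singerguid:刘德华|songnameguid:一起走过的日子|songtypeguid:港台,粤语,流行"
pairs = parse_title_field(record_1)
```

Because the commas inside "港台,粤语,流行" are not special characters, the song-type value survives as a single string.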

On the other hand, in an embodiment of the present invention, each sub-field 308 in the title field 304 may have a fixed length. For example, each sub-field 308 may be fixed at 32 characters, of which the guide field 310 may occupy 7 or 8 bits (enough to index at most 128 or 256 different guide data). Because the first special character and the second special character can also be of fixed size, whatever remains of the sub-field 308 after subtracting the guide field 310, the first special character, and the second special character can be used entirely to store the numerical data of the value field 312. Moreover, the length of the sub-field 308 is fixed, and its contents can be laid out in order as shown in FIG. 3A: the guide field 310 (an index to the guide data), the second special character, the numerical data of the value field 312, and the first special character, each of fixed size. In practice the numerical data can therefore be extracted directly: skip the bits of the guide field 310 (for example, the first 7 or 8 bits), skip the second special character (for example, 1 more character, that is, 8 bits), and drop the first special character (for example, the last character, 8 bits). What remains is the numerical data of the value field 312. For example, the numerical data "Andy Lau" can be taken directly from the first sub-field 308 of record 1; in this layout 32 - 3 = 29 characters are available for the numerical data, where the 3 (that is, 1 + 1 + 1) is one character each for the guide data of the guide field 310, the first special character, and the second special character. The required domain-type determination can then be performed. After the comparison of the extracted numerical data is complete (whether or not it succeeded), the numerical data of the next sub-field 308 is extracted in the same way (for example, "The Days We Walked Together" from the second sub-field 308 of record 1) and compared. Extraction can begin with record 1; after all the numerical data of record 1 has been compared, the numerical data of the first sub-field 308 in the title field 304 of record 2 (for example, "Feng Xiaogang") is extracted and compared. This procedure continues until the numerical data of all records has been compared.
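The fixed-length extraction can be sketched as follows. This is an assumption-laden illustration rather than the patent's storage format: it uses one Python character per stored character, encodes the guide index as a single character, pads short values with NUL, and uses the hypothetical names `pack_subfield` and `iter_subfields`.

```python
SUBFIELD_WIDTH = 32               # characters per sub-field, as in the example
VALUE_WIDTH = SUBFIELD_WIDTH - 3  # 29: minus guide index and two separators
GUIDE_SEP = ":"                   # second special character
SUBFIELD_SEP = "|"                # first special character
PAD = "\0"                        # padding for short values (an assumption)


def pack_subfield(guide_index, value):
    """Lay out one sub-field: guide index, ':', padded value, '|'."""
    if not 0 <= guide_index < 256:
        raise ValueError("guide index must fit in 8 bits")
    if len(value) > VALUE_WIDTH:
        raise ValueError("value too long for a fixed-width sub-field")
    return chr(guide_index) + GUIDE_SEP + value.ljust(VALUE_WIDTH, PAD) + SUBFIELD_SEP


def iter_subfields(title):
    """Yield (guide_index, value) by position only, never scanning for separators."""
    if len(title) % SUBFIELD_WIDTH != 0:
        raise ValueError("title field is not a whole number of sub-fields")
    for start in range(0, len(title), SUBFIELD_WIDTH):
        chunk = title[start:start + SUBFIELD_WIDTH]
        # Skip 1 character of guide index and 1 separator; drop the last separator.
        yield ord(chunk[0]), chunk[2:2 + VALUE_WIDTH].rstrip(PAD)


title = pack_subfield(1, "刘德华") + pack_subfield(2, "一起走过的日子")
values = list(iter_subfields(title))
```

Since every offset is known in advance, the loop advances by a constant stride, which is the property the paragraph above relies on when it describes skipping a fixed number of bits.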

It should be noted that the length of the sub-field 308 and the number of bits used by the guide field 310, the first special character, and the second special character can vary with the actual application; the present invention is not limited in this respect. Extracting the numerical data by comparison as described above is only one embodiment and does not limit the present invention; another embodiment may use a full-text search instead. In addition, skipping the guide field 310, the second special character, and the first special character can be implemented by bit shifting (for example, division), and this can be done in hardware, software, or a combination of both; those skilled in the art can vary this according to actual needs. In another embodiment of the present invention, each sub-field 308 in the title field 304 may have a fixed length, the guide field 310 in each sub-field 308 may have another fixed length, and the title field 304 may contain neither the first special character nor the second special character. Because the lengths of the sub-fields 308 and the guide fields 310 are fixed, the guide data or the numerical data in each sub-field 308 can be extracted directly by skipping a specific number of bits or by bit shifting (for example, division).

It should be noted that, because the sub-field 308 has a fixed length as mentioned above, a counter can be used in the natural language understanding system 100 (or in a server containing it) to record which sub-field 308 of a record is currently being compared, and another counter can store the position of the record being compared. For example, suppose a first counter indicates the record currently being compared and a second counter indicates the sub-field currently being compared. When the third sub-field 308 of record 2 in FIG. 3A (that is, "filenameguid: Huayi Brothers") is being compared, the first counter holds 2 (record 2) and the second counter holds 3 (the third sub-field 308). Furthermore, the guide data of the guide field 310 is stored in only 7 or 8 bits so that most of the bits of the sub-field 308 can be used for numerical data. These 7 or 8 bits serve as an index, according to which the actual guide data is read from the guide data storage device 280 of the retrieval system 200. The guide data is stored there in table form, although any other form accessible to the retrieval system 200 may be used in the present invention. Thus, in practice, besides extracting the numerical data directly for comparison, the guide data can also be fetched directly from the values of the two counters when a match occurs and sent to the knowledge-aided understanding module 400 as the response result 110. For example, when the second sub-field 308 of record 6 (that is, "songnameguid: Betrayal") matches, the first and second counters hold 6 and 2 respectively, so these two values can be used to consult the guide data storage device 280 shown in FIG. 3C and find that the guide data for sub-field 2 of record 6 is "songnameguid". In one embodiment, with the length of the sub-field 308 fixed, all of its bits may be used to store numerical data, so that the guide field, the first special character, and the second special character are eliminated entirely. The search engine 240 then only needs to know that every fixed number of bits begins another sub-field 308 and increment the second counter accordingly (and, of course, increment the first counter each time it moves on to the next record). This leaves more bits for numerical data.
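The two-counter lookup can be sketched in Python. The records and the guide table below are hypothetical stand-ins (they are not the contents of FIG. 3A or FIG. 3C), and `find_guides` is an illustrative name; the sketch also uses exact equality where the text describes a comparison or full-text search.

```python
def find_guides(records, keyword, guide_table):
    """Scan every sub-field and return the guide data for each match.

    records:     list of records, each a list of numerical data per sub-field
    guide_table: {(record_no, subfield_no): guide_data}, both 1-based
    """
    hits = []
    for record_no, values in enumerate(records, start=1):      # first counter
        for subfield_no, value in enumerate(values, start=1):  # second counter
            if value == keyword:
                # The counters alone identify which guide data to fetch.
                hits.append(guide_table[(record_no, subfield_no)])
    return hits


# Hypothetical two-record database holding values only (no guide fields).
records = [
    ["刘德华", "一起走过的日子"],
    ["萧敬腾", "背叛"],
]
# Hypothetical guide table playing the role of guide data storage device 280.
guide_table = {
    (1, 1): "singerguid", (1, 2): "songnameguid",
    (2, 1): "singerguid", (2, 2): "songnameguid",
}
hits = find_guides(records, "背叛", guide_table)
```

Keeping the guide data in a side table is what allows the sub-fields themselves to hold nothing but numerical data, as in the last embodiment described above.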

Another example illustrates how, when a comparison produces a match, the matching record 110 is returned to the knowledge-aided understanding module 400 for further processing. With the data structure of the record 302 described above, in an embodiment of the present invention, when the user's request information 102 is "I want to watch Let the Bullets Fly", two possible intent grammar data 106 can be generated:

"<readbook>,<bookname>=Let the Bullets Fly"; and

"<watchfilm>,<filmname>=Let the Bullets Fly";

The search engine 240 then uses the keyword 108 "Let the Bullets Fly" received by the retrieval interface unit 260 to perform a full-text search on the title fields 304 of the records stored in the structured database 220 of FIG. 3A. The search finds record 5, whose title field 304 stores the numerical data "Let the Bullets Fly", so a match is produced. The retrieval system 200 then takes the guide data "filmnameguid", which corresponds to the keyword 108 "Let the Bullets Fly" in the title field 304 of record 5, as the response result 110 and returns it to the knowledge-aided understanding module 400. Because the title field of record 5 contains the guide data "filmnameguid" for the numerical data "Let the Bullets Fly", the knowledge-aided understanding module 400 compares this guide data with the previously stored intent data 112 ("<watchfilm>" or "<readbook>") of the possible intent grammar data 106 and determines that the definite intent grammar data 114 of this request is "<watchfilm>,<filmname>=Let the Bullets Fly" (because both contain "film"). In other words, the data "Let the Bullets Fly" in the user's request information 102 is a film title, and the intent of the request information 102 is to watch the film "Let the Bullets Fly" rather than to read a book.
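One way to sketch this disambiguation step in Python is shown below. The text only says that the guide data and the intent data are compared (both contain "film"); the specific rule used here, stripping the "guid" suffix and looking for a keyword slot with the resulting name, is an assumption made for illustration, as are the names `slot_for_guide` and `resolve_intent`.

```python
def slot_for_guide(guide):
    """Map guide data such as 'filmnameguid' to a slot name such as 'filmname'."""
    suffix = "guid"
    return guide[:-len(suffix)] if guide.endswith(suffix) else guide


def resolve_intent(candidates, keyword, guide):
    """Pick the candidate whose slot agrees with the returned guide data.

    candidates: list of (intent, {slot: keyword}) pairs
    Returns the matching candidate, or None if the guide data fits none of them.
    """
    slot = slot_for_guide(guide)
    for intent, slots in candidates:
        if slots.get(slot) == keyword:
            return intent, slots
    return None


candidates = [
    ("readbook", {"bookname": "让子弹飞"}),
    ("watchfilm", {"filmname": "让子弹飞"}),
]
# "filmnameguid" is the response result returned for the keyword in record 5.
determined = resolve_intent(candidates, "让子弹飞", "filmnameguid")
```

Had the database returned "booknameguid" instead, the same call would select the `readbook` candidate.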

再举一个实例作更进一步的说明。当用户的请求信息102为"我想听一 起走过的日子"时,可产生出两个可能意图语法数据106:  Give another example for further explanation. When the user's request information 102 is "I want to listen to the days I walked through together", two possible intent grammar data 106 can be produced:

"<playmusic>,<singer>=一起走过,<songname>=日子";与  "<playmusic>,<singer>=walked together,<songname>=days"; with

"<playmusic>,<songname>=一起走过的日子";  "<playmusic>,<songname>=the days we walked together"; 

The search engine 240 then uses the two groups of keywords 108 received by the retrieval interface unit 260:

"一起走过" and "日子" ("walked together" and "days"); and

"一起走过的日子" ("The Days We Walked Together")

to perform a full-text search on the title fields 304 of the records stored in the structured database 220 of FIG. 3A. The search finds no record whose title field 304 matches the first group of keywords 108, "一起走过" and "日子", but it does find record 1, which matches the second group of keywords 108, "一起走过的日子". The retrieval system 200 therefore takes the guide data "songnameguid", which corresponds to the second group of keywords 108 in the title field 304 of record 1, as the matching record 110 and returns it to the knowledge-assisted understanding module 400. After receiving the guide data "songnameguid" for the value data "一起走过的日子", the knowledge-assisted understanding module 400 compares it with the intent data 112 (that is, <singer>, <songname>, and so on) in the possible intent syntax data 106 (that is, "<playmusic>,<singer>=一起走过,<songname>=日子" and "<playmusic>,<songname>=一起走过的日子"). The comparison shows that the user's request information 102 does not contain a singer name; it contains a song name, "一起走过的日子", because only <songname> is matched successfully. The knowledge-assisted understanding module 400 can therefore determine that the determined intent syntax data 114 of this request information 102 is "<playmusic>,<songname>=一起走过的日子", and that the intent of the user's request information 102 is to listen to the song "一起走过的日子".
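The comparison between the returned guide data and the candidate intents can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the dictionary `TITLE_INDEX`, the function names, and the convention that a slot named `songname` expects the guide data `songnameguid` are all hypothetical, and a whole-value lookup stands in for the full-text search.

```python
# Hypothetical sketch; data layout and names are assumptions for illustration.

# Candidate intents from the natural language processor
# (the "possible intent syntax data 106").
CANDIDATES = [
    {"intent": "playmusic", "slots": {"singer": "一起走过", "songname": "日子"}},
    {"intent": "playmusic", "slots": {"songname": "一起走过的日子"}},
]

# Toy stand-in for the title field of record 1: value data -> guide data.
TITLE_INDEX = {
    "刘德华": "singerguid",
    "一起走过的日子": "songnameguid",
}


def lookup_guide(value):
    """Return the guide data for a matched value, or None when nothing matches."""
    return TITLE_INDEX.get(value)


def resolve_intent(candidates):
    """Return the first candidate whose every slot value is matched with the
    guide data expected for that slot (slot 'songname' -> 'songnameguid')."""
    for cand in candidates:
        if all(lookup_guide(value) == slot + "guid"
               for slot, value in cand["slots"].items()):
            return cand
    return None


confirmed = resolve_intent(CANDIDATES)
print(confirmed)
```

In this sketch the first candidate is rejected because "一起走过" is not found as a singer, so only the candidate with `<songname>=一起走过的日子` survives.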

In another embodiment of the present invention, the retrieved response result 110 may be an exact-match record that fully matches the keywords 108, or a partial-match record that only partially matches the keywords 108. For example, if the user's request information 102 is "我想听萧敬腾的背叛" ("I want to listen to Xiao Jingteng's Betrayal"), the natural language processor 300 likewise analyzes it and produces two pieces of possible intent syntax data 106:

"<playmusic>,<singer>=萧敬腾,<songname>=背叛"; and "<playmusic>,<songname>=萧敬腾的背叛";

and sends two groups of keywords 108:

"萧敬腾" and "背叛" ("Xiao Jingteng" and "Betrayal"); and

"萧敬腾的背叛" ("Xiao Jingteng's Betrayal");

to the retrieval interface unit 260. The search engine 240 then uses the keywords 108 received by the retrieval interface unit 260 to perform a full-text search on the title fields 304 of the records 302 stored in the structured database 220 of FIG. 3A. In this search the second group of keywords 108, "萧敬腾的背叛", matches no record, but the first group of keywords 108, "萧敬腾" and "背叛", matches record 6 and record 7. In record 6 the first group of keywords 108 matches the value data "萧敬腾" but not the other value data "杨宗纬" (Yang Zongwei) and "曹格" (Cao Ge), so record 6 is a partial-match record. (Note that record 5, matched by the earlier request information 102 "我要看让子弹飞" ("I want to watch Let the Bullets Fly"), and record 1, matched by the request information "我想听一起走过的日子", are also partial-match records.) The keywords "萧敬腾" and "背叛" match all of the value data of record 7, since both keywords of the first group are matched, so record 7 is an exact-match record. In an embodiment of the present invention, when the retrieval interface unit 260 outputs multiple matching records 110 to the knowledge-assisted understanding module 400, it may output the exact-match records (all value data matched) first and the partial-match records (only part of the value data matched) afterwards, with exact-match records taking priority over partial-match records. When the retrieval interface unit 260 outputs the matching records 110 for record 6 and record 7, record 7 therefore has a higher output priority than record 6, because all of the value data of record 7 ("萧敬腾" and "背叛") are matched, whereas record 6 also contains "杨宗纬" and "曹格", which are not matched. In other words, the better a record stored in the structured database 220 matches the keywords 108 of the request information 102, the earlier it is output, so that the user can review or select the corresponding determined intent syntax data 114. In another embodiment, the matching record 110 for the highest-priority record may be output directly and used as the determined intent syntax data 114. None of this limits the present invention: another embodiment may output a result as soon as any matching record is found, without priority ordering, to speed up retrieval (for example, for the request information 102 "我想听萧敬腾的背叛", as soon as record 6 produces a match, the guide data of record 6 is output as the matching record 110). In yet another embodiment, the action corresponding to the highest-priority record may be performed directly and its result provided to the user. For example, if the highest-priority record is the film Romance of the Three Kingdoms, the film may be played for the user directly; likewise, if the highest-priority record is Betrayal as sung by Xiao Jingteng, the song may be played for the user directly. These examples are illustrative only and do not limit the present invention.
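The distinction between exact-match and partial-match records, and the output order that follows from it, can be illustrated with a short Python sketch. The record contents below are assumptions made for the example (FIG. 3A is not reproduced here), the function names are hypothetical, and whole-value comparison is used as a simplification of the full-text search.

```python
# Hypothetical sketch; record contents are assumed for illustration only.

RECORDS = {
    6: ["萧敬腾", "杨宗纬", "曹格", "背叛"],  # assumed title-field values
    7: ["萧敬腾", "背叛"],
}


def classify(record_values, keywords):
    """Return 'exact', 'partial', or None for one record.

    A record matches only if every keyword is found among its values.
    It is an exact match when, in addition, every value of the record
    is covered by a keyword; otherwise it is a partial match.
    """
    if not all(keyword in record_values for keyword in keywords):
        return None
    if all(value in keywords for value in record_values):
        return "exact"
    return "partial"


def rank(records, keywords):
    """List matching records, exact matches before partial matches."""
    results = []
    for rec_id, values in records.items():
        kind = classify(values, keywords)
        if kind is not None:
            results.append((rec_id, kind))
    # sorted() is stable, so records keep their relative order within each class.
    return sorted(results, key=lambda item: 0 if item[1] == "exact" else 1)


ranked = rank(RECORDS, ["萧敬腾", "背叛"])
print(ranked)
```

An embodiment that skips prioritization to save time would simply return the first record for which `classify` is not `None`.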

In yet another embodiment of the present invention, if the user's request information 102 is "我要听刘德华的背叛" ("I want to listen to Andy Lau's Betrayal"), one piece of the possible intent syntax data 106 is:

"<playmusic>,<singer>=刘德华,<songname>=背叛";

If the retrieval interface unit 260 feeds the keywords 108 "刘德华" and "背叛" to the search engine 240 together, no match is found in the database of FIG. 3A. In a further embodiment of the present invention, the retrieval interface unit 260 may instead feed the keywords 108 "刘德华" and "背叛" to the search engine 240 separately, and learn that "刘德华" is a singer name (guide data singerguid) and that "背叛" is a song name (guide data songnameguid, with the singer possibly being Cao Ge, or Xiao Jingteng, Yang Zongwei, and Cao Ge singing together). The natural language understanding system 100 can then ask the user "Is the song Betrayal the one sung by Xiao Jingteng?" (based on the match with record 7), or "Is it the one sung together by Xiao Jingteng, Yang Zongwei, and Cao Ge?" (based on the match with record 6).
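The fallback of searching each keyword separately and then asking the user to disambiguate might look like the following Python sketch. It is only an illustration: the record contents, the prompt wording, and the function names are assumptions, and each record is modeled as a mapping from value data to guide data.

```python
# Hypothetical sketch; record contents and prompt wording are assumptions.

RECORDS = {
    1: {"刘德华": "singerguid", "一起走过的日子": "songnameguid"},
    6: {"萧敬腾": "singerguid", "杨宗纬": "singerguid",
        "曹格": "singerguid", "背叛": "songnameguid"},
    7: {"萧敬腾": "singerguid", "背叛": "songnameguid"},
}


def lookup_roles(keywords):
    """Search each keyword on its own and collect the guide data it maps to."""
    roles = {}
    for keyword in keywords:
        roles[keyword] = {rec[keyword] for rec in RECORDS.values() if keyword in rec}
    return roles


def clarification_prompts(keywords):
    """Build one question per record in which a keyword is a song name."""
    prompts = []
    for rec_id, rec in RECORDS.items():
        for keyword in keywords:
            if rec.get(keyword) == "songnameguid":
                singers = [v for v, guide in rec.items() if guide == "singerguid"]
                names = ", ".join(singers)
                prompts.append(
                    'Is "%s" the song sung by %s (record %d)?' % (keyword, names, rec_id)
                )
    return prompts


roles = lookup_roles(["刘德华", "背叛"])
prompts = clarification_prompts(["背叛"])
for line in prompts:
    print(line)
```

Here the separate lookups establish that "刘德华" is a singer and "背叛" is a song title, even though no single record contains both.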

In yet another embodiment of the present invention, the records stored in the structured database 220 may further include a source field 314 and a popularity field 316. The database shown in FIG. 3B contains, in addition to the fields of FIG. 3A, a source field 314, a popularity field 316, a preference field 318, and a dislike field. The source field 314 of each record stores a source value indicating which structured database the record comes from (only the structured database 220 is shown in the figure, but more structured databases may exist in practice), or which user or server provided it. The natural language understanding system 100 can use the preferences revealed in the user's previous request information 102 to search the structured database of a specific source (for example, each time a full-text search with the keywords 108 of the request information 102 produces a match, the popularity value of the matched record is incremented by one). The popularity field 316 of each record 302 stores a search popularity value or popularity level for that record 302 (for example, the number of times, or the probability, that the record has been matched by a single user, a specific user group, or all users within a given period), which the knowledge-assisted understanding module 400 can consult when judging the user's intent. The use of the preference field 318 and the dislike field is described in detail later. Specifically, when the user's request information 102 is "我要看三国演义" ("I want to watch Romance of the Three Kingdoms"), the natural language processor 300 analyzes it and may produce several pieces of possible intent syntax data 106:

"<readbook>,<bookname>=三国演义";

"<watchTV>,<TVname>=三国演义"; and

"<watchfilm>,<filmname>=三国演义".

Suppose the natural language understanding system 100 finds, from the history of the user's request information 102 (for example, by using the popularity field 316 to store how many times a given user has selected the record 302), that most of the user's requests are to watch films. The natural language understanding system 100 can then search the structured database that stores film records (in this case the source value in the source field 314 is the code of that structured database), and give priority to "<watchfilm>,<filmname>=三国演义" as the determined intent syntax data 114. In one embodiment, for example, the popularity field 316 of a record 302 is incremented by one each time the record is matched, and this serves as the user's history. When a full-text search is performed with the keyword 108 "三国演义", the record 302 with the highest value in the popularity field 316 can then be selected from all matching results and used to judge the user's intent. In one embodiment, if the natural language understanding system 100 determines that, among the search results for the keyword 108 "三国演义", the record for the television program Romance of the Three Kingdoms has the highest search popularity value in its popularity field 316, it can give priority to "<watchTV>,<TVname>=三国演义" as the determined intent syntax data 114. The value stored in the popularity field 316 may be changed by the computer system on which the natural language understanding system 100 runs; the present invention places no limit on this. The value of the popularity field 316 may also decrease over time to indicate that the user's interest in a record 302 has gradually faded; the present invention places no limit on this either.
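A toy model of the popularity field 316 helps make the increment-on-match and decrease-over-time behavior concrete. The Python sketch below is an assumption-laden illustration: the class name, the multiplicative decay factor, and the match counts are invented for the example, and the text does not specify how the decrease over time is computed.

```python
# Hypothetical sketch; the decay rule and the counts are assumptions.

class PopularityTable:
    """Toy model of the popularity field 316: one counter per record."""

    def __init__(self, decay=0.5):
        self.heat = {}       # record id -> popularity value
        self.decay = decay   # assumed multiplicative decay applied per period

    def record_match(self, rec_id):
        """Add one to the popularity value each time the record is matched."""
        self.heat[rec_id] = self.heat.get(rec_id, 0) + 1

    def apply_decay(self):
        """Let every popularity value fade to model waning interest."""
        for rec_id in self.heat:
            self.heat[rec_id] *= self.decay

    def hottest(self, matched_ids):
        """Pick the matched record with the highest popularity value."""
        return max(matched_ids, key=lambda rec_id: self.heat.get(rec_id, 0))


# Record 8 is the book, 9 the television series, 10 the film.
INTENTS = {
    8: "<readbook>,<bookname>=三国演义",
    9: "<watchTV>,<TVname>=三国演义",
    10: "<watchfilm>,<filmname>=三国演义",
}

table = PopularityTable()
for rec_id in (9, 9, 9, 10):   # assumed match history
    table.record_match(rec_id)
table.apply_decay()

best = table.hottest([8, 9, 10])
chosen_intent = INTENTS[best]
print(best, chosen_intent)
```

With this assumed history the television series remains the most popular record after decay, so the `<watchTV>` intent is preferred.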

As another example, in another embodiment a user may be particularly fond of watching the Romance of the Three Kingdoms television series during a certain period. Because the series is long and cannot be finished in a short time, the user may select it repeatedly within that period (assuming the value in the popularity field 316 is incremented by one for every match), so that the same record 302 is matched repeatedly; this can be learned by analyzing the data in the popularity field 316. In another embodiment, a telecommunications operator may also use the popularity field 316 to indicate how often the data provided by a given source is accessed, with the code of that data provider stored in the source field 314. For example, suppose a provider of the Romance of the Three Kingdoms television series has the highest probability of being selected. When a user enters the request information 102 "我要看三国演义" ("I want to watch Romance of the Three Kingdoms"), a full-text search of the database of FIG. 3B finds three matching results: reading the book Romance of the Three Kingdoms (record 8), watching the Romance of the Three Kingdoms television series (record 9), and watching the Romance of the Three Kingdoms film (record 10). Because the data in the popularity field 316 shows that watching the Romance of the Three Kingdoms television series is currently the most popular option (the popularity fields of records 8, 9, and 10 hold the values 2, 5, and 8, respectively), the guide data of record 10 is provided first as the matching record 110 and output to the knowledge-assisted understanding module 400 as the highest-priority option for determining the user's intent. In one embodiment, the data in the source field 314 may be shown to the user at the same time, so that the user can judge whether the television series he or she wants to watch is offered by a particular provider. The data stored in the source field 314, and the way it is changed, may likewise be handled by the computer system on which the natural language understanding system 100 runs; the present invention places no limit on this. Those skilled in the art will appreciate that the information stored in the popularity field 316, preference field 318, and dislike field 320 of FIG. 3B can be further divided into a part related to the individual user and a part related to all users. The popularity field 316, preference field 318, and dislike field 320 information related to the individual user is stored on the user's mobile phone, while the server stores the popularity field 316, preference field 318, and dislike field 320 information related to all users. In this way, preference information that relates only to the individual user's choices or intent is stored only on the user's personal mobile communication device (such as a mobile phone, tablet computer, or small notebook computer), and the server stores the information related to all users. This saves storage space on the server and also keeps the user's personal preferences private.

Clearly, the value data contained in each record of the structured database disclosed in the present invention are related to one another (for example, the value data "刘德华" (Andy Lau), "一起走过的日子" ("The Days We Walked Together"), and "港台,粤语,流行" ("Hong Kong and Taiwan, Cantonese, pop") in record 1 all describe features of record 1), and together they express the intent that a user's request information has toward that record (for example, a match on "一起走过的日子" indicates that the user's intent may be to access the data of record 1). When the search engine performs a full-text search on the structured database and the value data of a record is matched, the guide data corresponding to that value data can therefore be output (for example, "songnameguid" is output as the response result 110), and the intent of the request information can then be confirmed (for example, by comparison in the knowledge-assisted understanding module 400).

Based on the content disclosed or taught in the above exemplary embodiments, FIG. 4A is a flowchart of a retrieval method according to an embodiment of the present invention. Referring to FIG. 4A, the retrieval method of this embodiment includes the following steps:

Provide a structured database that stores a plurality of records (step S410);

Receive at least one keyword (step S420);

Perform a full-text search on the title fields of the records using the keyword (step S430). For example, the keyword 108 is input to the retrieval interface unit 260 so that the search engine 240 performs a full-text search on the title fields 304 of the records 302 stored in the structured database 220. The search may be carried out as described for FIG. 3A or FIG. 3B, or in any way that does not depart from its spirit;

Determine whether the full-text search produces a match (step S440). For example, the search engine 240 determines whether the full-text search for the keyword 108 produces a match; and

If there is a match, output the exact-match records and the partial-match records in order (step S450). For example, if a record in the structured database 220 matches the keyword 108, the retrieval interface unit 260 outputs, in order, the guide data of the exact-match records and partial-match records that match the keyword 108 (the guide data may be obtained from the guide data storage device 280 of FIG. 3C) as the response result 110 to the knowledge-assisted understanding module 400, with exact-match records taking priority over partial-match records.

If there is no match, the system may notify the user that matching failed and end the process, notify the user that no match was found and ask for further input, or list possible options for the user to choose from (as in the earlier example in which a full-text search with "刘德华" and "背叛" produced no match) (step S460).
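Steps S410 to S460 can be read as a single function with one branch at the match check. The Python sketch below follows that reading under stated assumptions: the record contents, the `NoMatchPolicy` enum, and the function name are hypothetical, whole-value comparison stands in for the full-text search, and the returned record ids stand in for the guide data that the method actually outputs.

```python
# Hypothetical sketch of steps S410-S460; names and data are assumptions.
from enum import Enum


class NoMatchPolicy(Enum):
    FAIL = "notify failure and end"
    ASK_AGAIN = "ask the user for further input"
    SUGGEST = "list possible options"


# S410: a structured database, modeled as record id -> title-field values.
RECORDS = {
    1: ["刘德华", "一起走过的日子", "港台,粤语,流行"],
    6: ["萧敬腾", "杨宗纬", "曹格", "背叛"],
    7: ["萧敬腾", "背叛"],
}


def run_retrieval(records, keywords, policy=NoMatchPolicy.ASK_AGAIN):
    """S420: receive keywords, then search, test for a match, and output."""
    exact, partial = [], []
    for rec_id, values in records.items():              # S430: search title fields
        if not all(keyword in values for keyword in keywords):
            continue                                     # this record does not match
        if all(value in keywords for value in values):
            exact.append(rec_id)
        else:
            partial.append(rec_id)
    if exact or partial:                                 # S440: any match?
        return ("matched", exact + partial)              # S450: exact before partial
    return ("no_match", policy)                          # S460: fall back


hit = run_retrieval(RECORDS, ["萧敬腾", "背叛"])
miss = run_retrieval(RECORDS, ["刘德华", "背叛"])
print(hit)
print(miss)
```

Moving the match check or the ordered output into a separate module, as other embodiments allow, would amount to splitting this function at the S440 or S450 comment.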

The foregoing steps do not limit the present invention, and some steps may be omitted or removed. For example, in another embodiment of the present invention, step S440 may be performed by a match determination module (not shown) located outside the retrieval system 200. In another embodiment, step S450 may be omitted from the retrieval system 200, and the sequential output of exact-match and partial-match records may instead be performed by a match result output module (not shown) located outside the retrieval system 200.

Based on the content disclosed or taught in the above exemplary embodiments, FIG. 4B is a flowchart of the operation of the natural language understanding system 100 according to another embodiment of the present invention. Referring to FIG. 4B, the operation of the natural language understanding system 100 in this embodiment includes the following steps:

Receive request information (step S510). For example, the user sends request information 102 containing voice content or text content to the natural language understanding system 100;

Provide a structured database that stores a plurality of records (step S520);

Convert the request information into syntax form (step S530). For example, the natural language processor 300 analyzes the user's request information 102 and converts it into the corresponding possible intent syntax data 106;

Identify the possible attributes of the keyword (step S540). For example, the knowledge-assisted understanding module 400 identifies the possible attributes of at least one keyword 108 in the possible intent syntax data 106; the keyword 108 "三国演义" (Romance of the Three Kingdoms), for instance, may be a book, a film, or a television program;

Perform a full-text search on the title fields 304 of the records using the keyword 108 (step S550). For example, the keyword 108 is input to the retrieval interface unit 260 so that the search engine 240 performs a full-text search on the title fields 304 of the records stored in the structured database 220;

Determine whether the full-text search produces a match (step S560). For example, the search engine 240 determines whether the full-text search for the keyword 108 produces a match;

If there is a match, output in order the guide data of the exact-match records and partial-match records as the response result 110 (step S570). For example, if a record in the structured database 220 matches the keyword 108, the retrieval interface unit 260 outputs, in order, the guide data of the exact-match records and partial-match records that match the keyword 108 as the response result 110, with exact-match records taking priority over partial-match records; and

Output the corresponding determined intent syntax data in order (step S580). For example, the knowledge-assisted understanding module 400 uses the sequentially output exact-match and partial-match records to output the corresponding determined intent syntax data 114.

If no match is produced in step S560, the situation may be handled as in step S460: notify the user that matching failed and end the process, notify the user that no match was found and ask for further input, or list possible options for the user to choose from (as in the earlier example in which a full-text search with "刘德华" and "背叛" produced no match) (step S590).

The foregoing steps do not limit the present invention, and some steps may be omitted or removed.

In summary, the present invention extracts the keywords contained in the user's request information and performs a full-text search on the title fields of records that have a specific data structure in the structured database. If a match is produced, the domain to which the keyword belongs can be determined, and the intent expressed in the user's request information can thereby be established.

The application of the above structured database to speech recognition is described further below. The first application described is a natural language dialogue system that corrects a wrong voice response according to the user's voice input and then finds other possible answers to report to the user.

As noted above, today's mobile communication devices can provide a natural language dialogue function that lets the user communicate with the device by voice. In current voice dialogue systems, however, when the user's voice input is ambiguous, the same voice input may express several different intents or purposes, so the system easily outputs a voice response that does not fit the input. In many dialogue situations the user therefore has difficulty obtaining a voice response that matches his or her intent. The present invention accordingly proposes a method for correcting a voice response and a natural language dialogue system, in which the natural language dialogue system can correct a wrong voice response according to the user's voice input and then find other possible answers to report to the user. To make the content of the present invention clearer, the following embodiments are given as examples by which the invention can actually be implemented.

FIG. 5A is a block diagram of a natural language dialogue system according to an embodiment of the present invention. Referring to FIG. 5A, the natural language dialogue system 500 includes a voice sampling module 510, a natural language understanding system 520, and a speech synthesis database 530. In one embodiment, the voice sampling module 510 receives a first voice input 501 (for example, speech from the user) and parses it to produce first request information 503. The natural language understanding system 520 then parses the first request information 503 to obtain the first keyword 509 in it and finds a first reported answer 511 that fits the first request information 503. (As described for FIG. 1, the first request information 503 can be processed in the same way as the request information 102: analysis of the request information 102 produces possible intent syntax data 106, whose keywords 108 are used for a full-text search of the structured database 220 to obtain the response result 110; the response result 110 is compared with the intent data 112 in the possible intent syntax data 106 to produce the determined intent syntax data 114; and the analysis result output module 116 finally sends out the analysis result 104, which can serve as the first reported answer 511 of FIG. 5A.) The natural language understanding system 520 then queries the speech synthesis database 530 for the corresponding speech according to the first reported answer 511. (Because the analysis result 104 serving as the first reported answer 511 can include data from the fully or partially matched record 302, such as the guide data stored in the guide field 310, the value data in the value field 312, and the data in the content field 306, these data can be used for the speech query.) It then outputs the first speech 513 found by the query, producing a first voice response 507 to the user that corresponds to the first voice input 501. If the user considers that the first voice response 507 output by the natural language understanding system 520 does not fit the first request information 503 of the first voice input 501, the user enters another voice input, for example a second voice input 501', to indicate this. The natural language understanding system 520 processes the second voice input 501' in the same way as the first voice input 501 to produce second request information 503'. It then parses the second request information 503', obtains the second keyword 509' in it, finds a second reported answer 511' that fits the second request information 503', finds the corresponding second speech 513', and finally produces from the second speech 513' a corresponding second voice response 507' that is output to the user to correct the first reported answer 511. Clearly, the natural language understanding system 520 can be based on the natural language understanding system 100 of FIG. 1, with new modules added (explained below with FIG. 5B) so that a wrong voice response can be corrected according to the user's voice input.
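The two-turn correction described for FIG. 5A can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the candidate tags, the idea of carrying the first keyword into the second turn, and the rule of skipping the answer the user rejected are one possible reading, not the disclosed mechanism, and speech sampling and synthesis are left out entirely.

```python
# Hypothetical sketch; candidate data and the correction rule are assumptions.

# Candidate answers, each tagged with the keywords it can satisfy.
CANDIDATES = [
    {"id": 8, "answer": "read the book", "tags": {"三国演义", "书"}},
    {"id": 9, "answer": "watch the TV series", "tags": {"三国演义", "电视剧"}},
    {"id": 10, "answer": "watch the film", "tags": {"三国演义", "电影"}},
]


def find_answer(keywords, rejected=()):
    """Return the first candidate that covers all keywords and was not rejected."""
    for candidate in CANDIDATES:
        if candidate["id"] in rejected:
            continue
        if keywords <= candidate["tags"]:
            return candidate
    return None


# Turn 1: the first keyword alone is ambiguous, so the first candidate is reported.
first_keywords = {"三国演义"}
first = find_answer(first_keywords)

# Turn 2: the user objects and adds a second keyword; the rejected answer is skipped.
second_keywords = first_keywords | {"电视剧"}
second = find_answer(second_keywords, rejected={first["id"]})

print(first["answer"])
print(second["answer"])
```

In a full system each reported answer would then be passed to the speech synthesis database to produce the corresponding voice response.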

The components of the natural language dialogue system 500 may be placed in the same machine. For example, the voice sampling module 510 and the natural language understanding system 520 may be placed in the same electronic device. The electronic device may be a mobile communication device such as a cell phone, a personal digital assistant (PDA) phone, or a smartphone, or it may be a pocket PC, a tablet PC, a notebook computer, a personal computer, or any other electronic device that has communication functions or has communication software installed; its scope is not limited here. The electronic device may run an Android operating system, a Microsoft operating system, a Linux operating system, and so on, without being limited to these. The components of the natural language dialogue system 500 also need not be placed in the same machine; they may be distributed across different devices or systems and connected through various communication protocols. For example, the natural language understanding system 520 may reside on a cloud server or on a server in a local area network. The components of the natural language understanding system 520 may themselves be distributed across different machines; for example, they may reside on the same machine as the voice sampling module 510 or on a different one.

In this embodiment, the voice sampling module 510 is used to receive voice input. The voice sampling module 510 can be an audio-receiving device such as a microphone, and the first voice input 501 / second voice input 501' can be speech from the user.

In addition, the natural language understanding system 520 of this embodiment can be implemented by a hardware circuit composed of one or several logic gates. Alternatively, in another embodiment of the present invention, the natural language understanding system 520 may be implemented by computer program code. For example, the natural language understanding system 520 may be implemented in an application program, an operating system, or a driver as program code segments written in a programming language; these program code segments are stored in a storage unit and executed by a processing unit (not shown in FIG. 5A). To help those skilled in the art further understand the natural language understanding system 520 of this embodiment, an example is described below. However, this is only an illustration and the present invention is not limited thereto; hardware, software, firmware, or a combination of the three can all be used to implement the present invention.

FIG. 5B is a block diagram of the natural language understanding system 520 according to an embodiment of the present invention. Referring to FIG. 5B, the natural language understanding system 520 of this embodiment may include a speech recognition module 522, a natural language processing module 524, and a speech synthesis module 526. The speech recognition module 522 receives the request information transmitted from the voice sampling module 510, for example the first request information 503 obtained by parsing the first voice input 501, and extracts one or more first keywords 509 (for example the keywords 108 or phrases of FIG. 1A). The natural language processing module 524 then parses these first keywords 509 to obtain a candidate list containing at least one reported answer (processed in the same way as in FIG. 5A; that is, for example, the retrieval system 200 of FIG. 1A performs a full-text search on the structured database 220, the response result 110 is obtained and compared against the intent data 112 to produce the determined intent grammar data 114, and the reported answer is generated from the analysis result 104 sent by the analysis result output module 116), and selects from all the reported answers in the candidate list the one that best matches the first voice input 501 as the first reported answer 511 (for example, by choosing an exactly matching record). Because the first reported answer 511 is an answer obtained by the internal analysis of the natural language understanding system 520, it must be converted into voice output before it can be delivered to the user, so that the user can judge it. The speech synthesis module 526 therefore queries the speech synthesis database 530 according to the first reported answer 511. The speech synthesis database 530 records, for example, text and its corresponding voice information, which allows the speech synthesis module 526 to find the first voice 513 corresponding to the first reported answer 511 and synthesize the first voice response 507. The speech synthesis module 526 can then output the synthesized first voice response 507 to the user through a voice output interface (not shown), such as a speaker, a loudspeaker, or an earphone. Note that when the speech synthesis module 526 queries the speech synthesis database 530 according to the first reported answer 511, it may first need to convert the format of the first reported answer 511 and then make the call through the interface specified by the speech synthesis database 530. Whether format conversion is required depends on the definition of the speech synthesis database 530 itself; since this is well known to those skilled in the art, it is not described in detail here.

An example follows. If the user enters the first voice input 501 "I want to watch Romance of the Three Kingdoms", the speech recognition module 522 receives from the voice sampling module 510 the first request information 503 obtained by parsing the first voice input 501, and extracts, for example, the first keyword 509 containing "Romance of the Three Kingdoms". The natural language processing module 524 then parses this first keyword 509 "Romance of the Three Kingdoms" (for example, the retrieval system 200 of FIG. 1A performs a full-text search on the structured database 220, the response result 110 is obtained and compared against the intent data 112 to produce the determined intent grammar data 114, and the analysis result output module 116 finally sends the analysis result 104), generates reported answers covering three intent options for "Romance of the Three Kingdoms", and integrates them into a candidate list (assuming each intent option has only one reported answer, classified under the three options "read the book", "watch the TV series", and "watch the movie"). It then selects, from the three reported answers in the candidate list, the one with the highest value in the popularity field 316 (for example, record 10 of FIG. 3B) as the first reported answer 511. In one embodiment, the action corresponding to the highest value of the popularity field 316 can be executed directly (for example, directly playing "Betrayal" sung by Xiao Jingteng to the user, as mentioned earlier); the present invention is not limited in this respect.
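The selection by popularity described above can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the `ReportedAnswer` class, the `pick_first_answer` function, and the popularity numbers are all assumptions introduced here, with `popularity` standing in for the popularity field 316.

```python
from dataclasses import dataclass


@dataclass
class ReportedAnswer:
    intent: str      # hypothetical intent label, e.g. "watchfilm"
    title: str       # the keyword the answer relates to
    popularity: int  # stands in for the popularity field 316 (values invented)


def pick_first_answer(candidates):
    """Return the candidate with the highest popularity value."""
    if not candidates:
        raise ValueError("candidate list is empty")
    return max(candidates, key=lambda answer: answer.popularity)


# One reported answer per intent option, as assumed in the example above.
candidates = [
    ReportedAnswer("readbook", "Romance of the Three Kingdoms", 120),
    ReportedAnswer("watchTV", "Romance of the Three Kingdoms", 340),
    ReportedAnswer("watchfilm", "Romance of the Three Kingdoms", 510),
]
first_answer = pick_first_answer(candidates)
```

With these invented values the movie entry is chosen, mirroring the example in which record 10 of FIG. 3B becomes the first reported answer 511.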

In addition, the natural language processing module 524 can determine whether the previous first reported answer 511 was correct by parsing the subsequently received second voice input 501' (which is fed into the voice sampling module 510 in the same way as the earlier voice input 501). The second voice input 501' is the user's reply to the first voice response 507 previously provided, so it carries information about whether the user considers the first voice response 507 correct. If analysis of the second voice input 501' shows that the user considers the first reported answer 511 incorrect, the natural language processing module 524 can select another reported answer from the candidate list as the second reported answer 511'; for example, it removes the first reported answer 511 from the candidate list and picks a second reported answer 511' from the remaining ones. The speech synthesis module 526 then finds the second voice 513' corresponding to the second reported answer 511', synthesizes it into the second voice response 507', and plays it to the user.

Continuing the example in which the user entered "I want to watch Romance of the Three Kingdoms": if the user wants to watch the TV series, then the option of record 10 of FIG. 3B previously output to the user (the "Romance of the Three Kingdoms" movie) is not what the user wants. The user may then enter, as the second voice input 501', "I want to watch the Romance of the Three Kingdoms TV series" (the user states explicitly that the TV series is wanted) or "I don't want to watch the Romance of the Three Kingdoms movie" (the user only rejects the current option). After the second voice input 501' is parsed to obtain its second request information 503' (or second keyword 509'), the second keyword 509' in the second request information 503' is found to contain "TV series" (an explicit instruction) or "not the movie" (a rejection of the current option only), so the first reported answer 511 is judged not to meet the user's needs. Another reported answer can therefore be selected from the candidate list as the second reported answer 511', and the corresponding second voice response 507' is output: for example, "I will now play the Romance of the Three Kingdoms TV series for you" (if the user explicitly asked for the TV series), or "Which option would you like?" (if the user only rejected the current option), presented together with the other options in the candidate list for the user to choose from (for example, selecting the reported answer with the second-highest value in the popularity field 316 as the second reported answer 511'). Furthermore, in another embodiment, the second voice input 501' may contain a "selection" message. For example, when the three options "read the Romance of the Three Kingdoms book", "watch the Romance of the Three Kingdoms TV series", and "watch the Romance of the Three Kingdoms movie" are displayed for the user to choose from, the user may enter the second voice input 501' "I want to watch the movie". After the second request information 503' of the second voice input 501' is analyzed and the user's intent is found (for example, the second keyword 509' shows that the user chose "watch the movie"), the second voice response 507' "I will now play the Romance of the Three Kingdoms movie for you" is output and the movie is played directly to the user. Of course, if the user enters "I want the third option" (assuming the option selected this way is reading the book), the application corresponding to the third option is executed; that is, the second voice response 507' "What you want is to read the Romance of the Three Kingdoms book" is output, together with the action of displaying the Romance of the Three Kingdoms e-book to the user.
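How a "selection" message might be resolved, either by a category keyword or by an ordinal such as "the third option", can be sketched as below. This is an illustrative assumption rather than the method of the embodiment: the `resolve_selection` function, the `ORDINALS` table, the English tokens, and the display order of the options (with the book placed third) are all hypothetical.

```python
ORDINALS = {"first": 0, "second": 1, "third": 2}  # hypothetical ordinal words


def resolve_selection(utterance, options):
    """Map a spoken selection to one of the displayed options.

    options is a list of (label, keyword) pairs in display order.
    Returns the chosen label, or None if nothing matches.
    """
    words = utterance.lower().replace(",", " ").split()
    # "I want the third option": pick by position in the displayed list.
    if "option" in words:
        for word in words:
            index = ORDINALS.get(word)
            if index is not None and index < len(options):
                return options[index][0]
    # "I want to watch the movie": pick by category keyword.
    for label, keyword in options:
        if keyword in words:
            return label
    return None


# Display order is assumed here so that the third option is the book.
options = [
    ("watch the movie", "movie"),
    ("watch the TV series", "tv"),
    ("read the book", "book"),
]
chosen_by_keyword = resolve_selection("I want to watch the movie", options)
chosen_by_ordinal = resolve_selection("I want the third option", options)
```

Returning None leaves it to the caller to decide what to do when the utterance matches no option, for example by asking the user again.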

In this embodiment, the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 of the natural language understanding system 520 can be configured in the same machine as the voice sampling module 510. In other embodiments, the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 can also be distributed over different machines (for example computer systems, servers, or similar devices/systems). For example, in the natural language understanding system 520' shown in FIG. 5C, the speech synthesis module 526 can be configured in the same machine 502 as the voice sampling module 510, while the speech recognition module 522 and the natural language processing module 524 can be configured in another machine. Under the architecture of FIG. 5C, the natural language processing module 524 transmits the first reported answer 511 / second reported answer 511' to the speech synthesis module 526, which then sends the first reported answer 511 / second reported answer 511' to the speech synthesis database to look up the corresponding first voice 513 / second voice 513', as the basis for generating the first voice response 507 / second voice response 507'.

FIG. 6 is a flowchart of a method for correcting the first voice response 507 according to an embodiment of the present invention. In the method of this embodiment, when the user considers that the first voice response 507 currently played does not match the first request information 503 previously entered, the user enters a second voice input 501', which is fed into the voice sampling module 510. When the natural language understanding system 520 then analyzes it and learns that the first voice response 507 previously played does not match the user's intent, it can output a second voice response 507' to correct the original first voice response 507. For convenience, only the natural language dialogue system 500 of FIG. 5A is used as an example here, but the method of this embodiment is also applicable to the natural language dialogue system 500' of FIG. 5C.

Referring to FIG. 5A and FIG. 6 together, in step S602 the voice sampling module 510 receives the first voice input 501. The first voice input 501 is, for example, speech from the user and may carry the user's first request information 503. Specifically, the first voice input 501 from the user may be a question, a command, or other request information, such as "I want to watch Romance of the Three Kingdoms", "I want to listen to the music of Wangqingshui", or "What is the temperature today".

In step S604, the natural language understanding system 520 parses at least one first keyword 509 included in the first voice input 501 to obtain a candidate list, where the candidate list has one or more reported answers. For example, when the user's first voice input 501 is "I want to watch Romance of the Three Kingdoms", the first keywords 509 obtained by the natural language understanding system 520 after analysis are, for example, "Romance of the Three Kingdoms" and "watch". As another example, when the user's first voice input 501 is "I want to listen to the song Wangqingshui", the first keywords 509 obtained are, for example, "Wangqingshui", "listen", and "song".

Next, the natural language understanding system 520 can query the structured database 220 according to the first keyword 509 and obtain at least one search result (for example the analysis result 104 of FIG. 1), which serves as a reported answer in the candidate list. The way the first reported answer 511 is selected from multiple reported answers can be as described for FIG. 1A and is not repeated here. Because the first keyword 509 may span different knowledge domains (for example movies, books, music, or games), and a single knowledge domain can be further divided into multiple categories (for example different authors of the same movie or book title, different singers of the same song title, or different versions of the same game title), the natural language understanding system 520 can find one or more search results related to the first keyword 509 in the structured database (for example the analysis result 104). Each search result may include guide data related to the first keyword 509 and other data. For example, when "Xiao Jingteng" and "Betrayal" are used as the keywords 108 for a full-text search of the structured database 220 of FIGS. 3A and 3B, two matching results such as records 6 and 7 of FIG. 3A are obtained, which contain the guide data "singerguid" and "songnameguid" respectively; this guide data is the data stored in the guide field 310. The other data are, for example, the keywords in a search result other than those related to the first keyword 509; for instance, when "The Days We Spent Together" is used as the keyword for a full-text search of the structured database 220 of FIG. 3A and record 1 is the matching result, "Andy Lau" and "Hong Kong and Taiwan, Cantonese, pop" are the other data. From another point of view, when the first voice input 501 entered by the user has multiple first keywords 509, the user's first request information 503 is more specific, so the natural language understanding system 520 is better able to find search results close to the first request information 503.

For example, when the first keyword 509 is "Romance of the Three Kingdoms" (for example, when the user enters the voice input "I want to watch Romance of the Three Kingdoms"), the natural language understanding system 520 may, after analysis, produce three possible intent grammar data 106 (as shown in FIG. 1):

"<readbook>,<bookname>=Romance of the Three Kingdoms";

"<watchTV>,<TVname>=Romance of the Three Kingdoms"; and

"<watchfilm>,<filmname>=Romance of the Three Kingdoms".

The search results found are therefore the records about "... 'Romance of the Three Kingdoms' ... 'book'" (intent data <readbook>), "... 'Romance of the Three Kingdoms' ... 'TV series'" (intent data <watchTV>), and "... 'Romance of the Three Kingdoms' ... 'movie'" (intent data <watchfilm>) (for example records 8, 9, and 10 of FIG. 3B), where "book", "TV series", and "movie" each indicate the corresponding user intent. As another example, when the first keywords 509 are "Wangqingshui" and "music" (for example, the user enters the voice input "I want to listen to the music of Wangqingshui"), the natural language understanding system 520 may, after analysis, produce the following possible intent grammar data:

"<playmusic>,<songname>=Wangqingshui";

The search results found are, for example, the record about "... 'Wangqingshui' ... 'Andy Lau'" (for example record 11 of FIG. 3B) and the record about "... 'Wangqingshui' ... 'Li Yijun'" (for example record 12 of FIG. 3B), where "Andy Lau" and "Li Yijun" are the intent data corresponding to the user. In other words, each search result may include the first keyword 509 and intent data related to the first keyword 509; the natural language understanding system 520 converts the data included in the search results it finds into reported answers and records them in the candidate list for use in subsequent steps.
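The intent grammar strings listed above have a regular shape, an intent tag followed by a slot tag and its value, so turning them into candidate-list entries can be sketched as below. This is only an illustration: the function names `parse_intent_grammar` and `build_candidate_list` and the dictionary layout are assumptions made here, and the regular expression covers only the single-slot form shown in the examples.

```python
import re

# Matches strings of the form "<intent>,<slot>=value" (single slot only).
_INTENT_RE = re.compile(r"^<(?P<intent>\w+)>,<(?P<slot>\w+)>=(?P<value>.+)$")


def parse_intent_grammar(text):
    """Split one intent grammar string into intent, slot, and value."""
    match = _INTENT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an intent grammar string: {text!r}")
    return match.groupdict()


def build_candidate_list(grammar_strings):
    """Convert each intent grammar string into one candidate entry."""
    return [parse_intent_grammar(text) for text in grammar_strings]


candidate_list = build_candidate_list([
    "<readbook>,<bookname>=Romance of the Three Kingdoms",
    "<watchTV>,<TVname>=Romance of the Three Kingdoms",
    "<watchfilm>,<filmname>=Romance of the Three Kingdoms",
])
```

Each entry keeps the intent separate from the keyword value, which is the distinction later steps rely on when a second voice input names a different intent for the same title.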

In step S606, the natural language understanding system 520 selects at least one first reported answer 511 from the candidate list and outputs the corresponding first voice response 507 according to the first reported answer 511. In this embodiment, the natural language understanding system 520 can arrange the reported answers in the candidate list in priority order and select a reported answer from the candidate list according to that order, so as to output the first voice response 507.

For example, when the first keyword 509 is "Romance of the Three Kingdoms", suppose the natural language understanding system 520 finds the most records about "... 'Romance of the Three Kingdoms' ... 'book'" (that is, priority is determined by the number of records found), followed by records about "... 'Romance of the Three Kingdoms' ... 'music'", and the fewest records about "... 'Romance of the Three Kingdoms' ... 'TV series'". The natural language understanding system 520 then takes "the Romance of the Three Kingdoms book" as the first reported answer (first priority), "the Romance of the Three Kingdoms music" as the second reported answer (second priority), and "the Romance of the Three Kingdoms TV series" as the third reported answer (third priority). Of course, if more than one record relates to the first reported answer "the Romance of the Three Kingdoms book", the first reported answer 511 can further be chosen according to a priority order (for example, the number of times each has been selected); the details were described earlier and are not repeated here.
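Ordering the reported answers by the number of matching records can be sketched with a simple counter. The sketch below is an assumption-laden illustration: the `rank_by_record_count` function, the `(title, category)` record layout, and the record counts are invented here and do not come from the structured database 220.

```python
from collections import Counter


def rank_by_record_count(records):
    """Return categories ordered by how many records fall in each, most first.

    records is a list of (title, category) tuples.
    """
    counts = Counter(category for _, category in records)
    return [category for category, _ in counts.most_common()]


TITLE = "Romance of the Three Kingdoms"
# Invented counts: three book records, two music records, one TV record.
records = [
    (TITLE, "book"), (TITLE, "music"), (TITLE, "book"),
    (TITLE, "TV series"), (TITLE, "book"), (TITLE, "music"),
]
priority_order = rank_by_record_count(records)
```

Here the book comes first, music second, and the TV series third, matching the priority order in the example; ties, which the text does not address, fall back to the order in which categories were first seen.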

Next, in step S608, the voice sampling module 510 receives the second voice input 501', and the natural language understanding system 520 parses this second voice input 501' and determines whether the previously selected first reported answer 511 is correct. Here, the voice sampling module 510 parses the second voice input 501' to extract the second keyword 509' it contains, where the second keyword 509' is, for example, a keyword further provided by the user (such as a time, an intent, or a knowledge domain). When the second keyword 509' in the second voice input 501' does not match the intent data associated with the first reported answer 511, the natural language understanding system 520 determines that the previously selected first reported answer 511 is incorrect. The way of determining whether the second request information 503' of the second voice input 501' confirms or rejects the first voice response 507 was described earlier and is not repeated here.

Furthermore, the second voice input 501' parsed by the natural language understanding system 520 may or may not include an explicit second keyword 509'. For example, the voice sampling module 510 may receive from the user "I don't mean the Romance of the Three Kingdoms book" (case A), "I don't mean the Romance of the Three Kingdoms book; I mean the Romance of the Three Kingdoms TV series" (case B), or "I mean the Romance of the Three Kingdoms TV series" (case C). The second keywords 509' in case A are, for example, "not", "Romance of the Three Kingdoms", and "book"; in case B, "not", "Romance of the Three Kingdoms", "book", "is", "Romance of the Three Kingdoms", and "TV series"; and in case C, "is", "Romance of the Three Kingdoms", and "TV series". For convenience, only cases A, B, and C are listed here, but this embodiment is not limited to them.

Next, the natural language understanding system 520 determines, according to the second keyword 509' included in the second voice input 501', whether the intent data associated with the first reported answer 511 is correct. That is, if the first reported answer 511 is "the Romance of the Three Kingdoms book" and the second keywords 509' are "Romance of the Three Kingdoms" and "TV series", the natural language understanding system 520 determines that the intent data associated with the first reported answer 511 (the user wants the Romance of the Three Kingdoms "book") does not match the second keyword 509' from the user's second voice input 501' (the user wants the Romance of the Three Kingdoms "TV series"), and therefore judges the first reported answer 511 incorrect. Similarly, if the reported answer is "the Romance of the Three Kingdoms book" and the second keywords 509' are "not", "Romance of the Three Kingdoms", and "book", the natural language understanding system 520 also judges the first reported answer 511 incorrect.
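The judgment for cases A, B, and C can be sketched as a small keyword classifier. This is one possible reading, offered as an assumption rather than the embodiment's algorithm: the `classify_keywords` and `first_answer_is_correct` functions, the `POSITIVE` and `NEGATIVE` word sets, and the rule that a polarity word applies to the category keywords that follow it are all introduced here for illustration.

```python
POSITIVE = {"is", "yes"}  # hypothetical confirming keywords
NEGATIVE = {"not", "no"}  # hypothetical rejecting keywords


def classify_keywords(keywords, categories):
    """Split category keywords into those the user wants and those rejected."""
    wanted, rejected = set(), set()
    negated = False  # polarity applied to the category keywords that follow
    for keyword in keywords:
        if keyword in NEGATIVE:
            negated = True
        elif keyword in POSITIVE:
            negated = False
        elif keyword in categories:
            (rejected if negated else wanted).add(keyword)
    return wanted, rejected


def first_answer_is_correct(first_category, keywords, categories):
    """Judge the first reported answer against the second keywords."""
    wanted, rejected = classify_keywords(keywords, categories)
    if first_category in rejected:
        return False
    if wanted and first_category not in wanted:
        return False
    return True  # no keyword contradicts the first answer


CATEGORIES = {"book", "TV series", "movie", "music"}
TITLE = "Romance of the Three Kingdoms"
case_a = ["not", TITLE, "book"]
case_b = ["not", TITLE, "book", "is", TITLE, "TV series"]
case_c = ["is", TITLE, "TV series"]
```

With the first reported answer in the "book" category, all three cases are judged incorrect, while a confirming input such as "yes, the book" is judged correct.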

When the natural language understanding system 520, after parsing the second voice input 501', determines that the previously output first voice response 507 is correct, then as shown in step S610 it makes a response corresponding to the second voice input 501'. For example, if the second voice input 501' from the user is "Yes, the Romance of the Three Kingdoms book", the natural language understanding system 520 can output the second voice response 507' "Opening the Romance of the Three Kingdoms book for you". Alternatively, the natural language understanding system 520 can load the content of the Romance of the Three Kingdoms book directly through a processing unit (not shown) while playing the second voice response 507'.

However, when the natural language understanding system 520, after parsing the second voice input 501', determines that the previously output first voice response 507 (that is, the reported answer 511) is incorrect, then as shown in step S612 it selects a reported answer other than the first reported answer 511 from the candidate list and outputs the second voice response 507' according to the selection. Here, if the second voice input 501' provided by the user has no explicit second keyword 509' (as in case A above), the natural language understanding system 520 can select the second-priority reported answer from the candidate list according to the priority order. Alternatively, if the second voice input 501' has an explicit second keyword 509' (as in cases B and C above), the natural language understanding system 520 can select the corresponding reported answer from the candidate list directly according to the second keyword 509' indicated by the user.

On the other hand, if the second voice input 501' provided by the user contains an explicit second keyword 509' (as in the second voice input of cases B and C above), but the natural language understanding system 520 finds no reported answer in the candidate list that matches the second keyword 509', the natural language understanding system 520 outputs a third voice response, such as "No such book found" or "I don't know".
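The branching of steps S610 and S612 described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_after_rejection`, the dictionary layout of a candidate, and the sample keywords are all assumptions made for the example.

```python
def select_after_rejection(candidates, rejected_ids, second_keyword=None):
    """Pick the next reported answer after the user rejects earlier ones.

    `candidates` is assumed to be sorted by priority already (hypothetical layout).
    Returns None when nothing suitable remains, in which case the caller would
    output the third voice response (for example "I don't know").
    """
    remaining = [c for c in candidates if c["id"] not in rejected_ids]
    if second_keyword is None:
        # Case A: no explicit keyword, so fall back to the next-priority answer.
        return remaining[0] if remaining else None
    # Cases B and C: the user named what they want, so match on that keyword.
    for candidate in remaining:
        if second_keyword in candidate["keywords"]:
            return candidate
    return None


# Hypothetical candidate list for the keyword "Romance of the Three Kingdoms".
sample_candidates = [
    {"id": "a", "keywords": ["book"]},
    {"id": "b", "keywords": ["TV series"]},
    {"id": "c", "keywords": ["music"]},
]
next_by_priority = select_after_rejection(sample_candidates, {"a"})
next_by_keyword = select_after_rejection(sample_candidates, {"a"}, "music")
no_match = select_after_rejection(sample_candidates, {"a"}, "comic")
```

Returning `None` rather than a fixed sentence keeps the wording of the third voice response a separate decision for the caller.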

To help those skilled in the art further understand the method for correcting a voice response and the natural language dialogue system of this embodiment, another example is described in detail below.

First, assume that the first voice input 501 received by the voice sampling module 510 is "I want to watch Romance of the Three Kingdoms" (step S602). The natural language understanding system 520 then parses out the first keyword 509, namely "watch" and "Romance of the Three Kingdoms", and obtains a candidate list containing a plurality of first reported answers, each of which has related keywords and other data (the other data may be stored in the content field 306 of FIG. 3A/3B, or as part of the value field 312 of each record 302) (step S604), as shown in Table 1 (assuming that the search results contain only one entry each for the book, TV series, music, and movie of Romance of the Three Kingdoms).

Table 1

Figure BDA00003204415900311

Next, the natural language understanding system 520 selects the required reported answer from the candidate list. Assuming that the natural language understanding system 520 takes the reported answers in the candidate list in order and selects reported answer a as the first reported answer 511, it outputs, for example, "Open the Romance of the Three Kingdoms book?" as the first voice response 507 (step S606).

At this point, if the second voice input 501' received by the voice sampling module 510 is "Yes" (step S608), the natural language understanding system 520 determines that reported answer a is correct, outputs another voice response, "Please wait" (that is, the second voice response 507'), and loads the content of the Romance of the Three Kingdoms book through a processing unit (not shown) (step S610).

However, if the second voice input 501' received by the voice sampling module 510 is "I don't mean the Romance of the Three Kingdoms book" (step S608), the natural language understanding system 520 determines that reported answer a is incorrect and selects another reported answer from reported answers b to e in the candidate list as the second reported answer 511', for example reported answer b, "Play the Romance of the Three Kingdoms TV series?". If the user then answers "Not the TV series", the natural language understanding system 520 selects one of reported answers c to e to report. Furthermore, if all of reported answers a to e in the candidate list have been reported to the user and none of them matches the user's voice input 501, the natural language understanding system 520 outputs the voice response 507 "No data found" (step S612).
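The turn-by-turn behavior in this example, where answers are offered one after another until the list is exhausted, can be sketched as a small stateful helper. The class name `CorrectionSession`, the prompt wording, and the four sample categories are assumptions for illustration; the actual contents of Table 1 are not reproduced here.

```python
class CorrectionSession:
    """Tracks which reported answers have already been offered to the user."""

    def __init__(self, candidates):
        self.pending = list(candidates)  # not yet reported, in priority order
        self.current = None              # the answer most recently offered

    def next_prompt(self):
        """Return the next confirmation prompt, or a not-found message."""
        if not self.pending:
            self.current = None
            return "No data found"
        self.current = self.pending.pop(0)
        return f"Play the Romance of the Three Kingdoms {self.current}?"


# The user rejects every offer, so the fifth request finds the list empty.
session = CorrectionSession(["book", "TV series", "music", "movie"])
prompts = [session.next_prompt() for _ in range(5)]
```

Keeping the pending list inside the session means an answer that was already rejected is never offered twice.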

In another embodiment, if in step S608 the second voice input 501' received by the voice sampling module 510 is "I mean the Romance of the Three Kingdoms comic", then, because the candidate list contains no reported answer about comics, the natural language understanding system 520 directly outputs the second voice response 507' "No data found".

Based on the above, the natural language understanding system 520 can output a corresponding first voice response 507 according to the first voice input 501 from the user. When the first voice response 507 does not match the request information 503 of the user's first voice input 501, the natural language understanding system 520 can correct the originally output first voice response 507 and, according to the second voice input 501' subsequently provided by the user, output a second voice response 507' that better matches the user's first request information 503. In this way, if the user is still not satisfied with the answer provided, the natural language understanding system 520 can correct it automatically and report a new voice response to the user, making conversation with the natural language dialogue system 500 more convenient.

It is worth noting that, in steps S606 and S612 of FIG. 6, the natural language understanding system 520 may also rank the reported answers in the candidate list using different methods of evaluating priority, select a reported answer from the candidate list according to that priority order, and then output the voice response corresponding to the selected reported answer.

For example, the natural language understanding system 520 may rank the first reported answers 511 in the candidate list according to the usage habits of the general public, with answers that people use more often ranked higher. Taking the first keyword 509 "Romance of the Three Kingdoms" as an example again, assume that the reported answers found by the natural language understanding system 520 are the TV series, the book, and the music of Romance of the Three Kingdoms. If people who mention "Romance of the Three Kingdoms" usually mean the book, fewer mean the TV series, and fewer still mean the music (for example, when the value stored in the popularity field 316 of FIG. 3C represents the matches of all users, the popularity field 316 has its highest value in the "book" record of "Romance of the Three Kingdoms"), the natural language understanding system 520 ranks the reported answers in the order "book", "TV series", "music". That is, the natural language understanding system 520 preferentially selects "the Romance of the Three Kingdoms book" as the first reported answer 511 and outputs the first voice response 507 according to this first reported answer 511.

In addition, the natural language understanding system 520 may determine the priority order of the reported answers according to the user's habits. Specifically, the natural language understanding system 520 may record the voice inputs it has received from the user (including the first voice input 501, the second voice input 501', or any other voice input from the user) in a characteristic database (for example, as shown in FIG. 7A/7B), which may be stored in a storage device such as a hard disk. The characteristic database may record data about the user's preferences and habits, such as the first keyword 509 obtained when the natural language understanding system 520 parses the user's voice input 501 and the response records generated by the natural language understanding system 520. The storage and retrieval of user preference/habit data are described further below with reference to FIG. 7A/7B/8. Moreover, in an embodiment where the value stored in the popularity field 316 of FIG. 3C is related to the user's habits (for example, the number of matches), the value of the popularity field 316 can be used to determine the user's usage habits or priority order. Therefore, when selecting a reported answer, the natural language understanding system 520 can rank the reported answers according to information such as the user habits recorded in the characteristic database 730, so as to output a voice response 507 that better matches the user's voice input 501. For example, in FIG. 3B, the values stored in the popularity field 316 of records 8/9/10 are 2/5/8, which may indicate that the "book", "TV series", and "movie" of "Romance of the Three Kingdoms" have been matched 2, 5, and 8 times respectively, so the reported answer corresponding to "the Romance of the Three Kingdoms movie" is selected first.
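The popularity-based ordering in the example above amounts to a descending sort on the stored match count. The following sketch uses the values 2/5/8 given for records 8/9/10; the dictionary keys are hypothetical stand-ins for the record number and the popularity field 316.

```python
# Records 8/9/10 with the popularity values 2/5/8 from the example.
records = [
    {"record": 8, "category": "book", "popularity": 2},
    {"record": 9, "category": "TV series", "popularity": 5},
    {"record": 10, "category": "movie", "popularity": 8},
]

# Higher match counts come first; sorted() is stable, so ties keep input order.
ranked = sorted(records, key=lambda r: r["popularity"], reverse=True)
first_reported_answer = ranked[0]
```

With these values the movie record is offered first, followed by the TV series and then the book.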

On the other hand, the natural language understanding system 520 may also select the reported answer according to the user's habits. For example, suppose that in conversations with the natural language understanding system 520 the user often says "I want to read the Romance of the Three Kingdoms book", less often says "I want to watch the Romance of the Three Kingdoms TV series", and still less often says "I want to listen to the Romance of the Three Kingdoms music" (for example, the user dialogue database holds 20 records about "the Romance of the Three Kingdoms book" (as shown, for example, in the preference field 318 of record 8 in FIG. 3B), 8 records about "the Romance of the Three Kingdoms TV series" (as shown, for example, in the preference field 318 of record 9 in FIG. 3B), and 1 record about "the Romance of the Three Kingdoms music"). The priority order of the reported answers in the candidate list is then "the Romance of the Three Kingdoms book", "the Romance of the Three Kingdoms TV series", and "the Romance of the Three Kingdoms music". That is, when the first keyword 509 is "Romance of the Three Kingdoms", the natural language understanding system 520 selects "the Romance of the Three Kingdoms book" as the first reported answer 511 and outputs the first voice response 507 according to this first reported answer 511.

It is worth noting that the natural language understanding system 520 may also determine the priority order of the reported answers according to the user's preferences. Specifically, the user dialogue database may also record keywords the user has expressed, such as "like", "idol", "dislike", or "hate". The natural language understanding system 520 can therefore rank the reported answers in the candidate list according to the number of times these keywords have been recorded. For example, a reported answer associated more often with "like" is selected earlier, whereas a reported answer associated more often with "dislike" is selected later.
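One way to turn recorded "like" and "dislike" keywords into an ordering is a simple net score. The text does not specify a formula, so the scoring rule below (one point per "like", minus one per "dislike") and the counts are assumptions chosen only to illustrate the idea.

```python
def preference_score(entry):
    # Assumed rule: each recorded "like" adds a point, each "dislike" removes one.
    return entry["like"] - entry["dislike"]


# Made-up keyword counts for three reported answers.
preference_candidates = [
    {"answer": "music", "like": 0, "dislike": 4},
    {"answer": "book", "like": 5, "dislike": 0},
    {"answer": "TV series", "like": 2, "dislike": 1},
]

# Liked answers are selected earlier, disliked answers later.
by_preference = sorted(preference_candidates, key=preference_score, reverse=True)
```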

For example, suppose that in conversations with the natural language understanding system 520 the user often says "I hate watching the Romance of the Three Kingdoms TV series", less often says "I hate listening to the Romance of the Three Kingdoms music", and still less often says "I hate reading the Romance of the Three Kingdoms book" (for example, the user dialogue database holds 20 records of "I hate watching the Romance of the Three Kingdoms TV series" (which may be recorded, for example, in the dislike field 320 of record 9 in FIG. 3B), 8 records of "I hate listening to the Romance of the Three Kingdoms music", and 1 record of "I hate reading the Romance of the Three Kingdoms book" (recorded, for example, in the dislike field 320 of record 8 in FIG. 3B)). The priority order of the reported answers in the candidate list is then "the Romance of the Three Kingdoms book", "the Romance of the Three Kingdoms TV series", and "the Romance of the Three Kingdoms music". That is, when the first keyword 509 is "Romance of the Three Kingdoms", the natural language understanding system 520 selects the Romance of the Three Kingdoms book as the first reported answer 511 and outputs the first voice response 507 according to this first reported answer 511. In one embodiment, a "dislike field 320" may be added alongside the popularity field 316 of FIG. 3B to record the user's degree of dislike. In another embodiment, when "dislike" information about a record is parsed from the user's input, one (or another value) may be subtracted directly from the popularity field 316 (or the preference field 318) of that record, so that the user's preferences are recorded without adding a field. Any way of recording user preferences may be applied in embodiments of the present invention, and the invention is not limited in this respect. Further embodiments concerning the recording and use of user habit data, and the provision of responses and reported answers based on the usage habits and preferences of the user and of the general public, are explained in more detail below with reference to FIG. 7A/7B/8.
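The two recording options just described (a separate dislike field, or a decrement of the existing popularity field) can be sketched side by side. The function name `record_dislike`, its parameters, and the dictionary keys are hypothetical; only the two strategies themselves come from the text.

```python
def record_dislike(record, use_dislike_field=True, step=1):
    """Record one "dislike" utterance against a record.

    use_dislike_field=True  -> increment a separate dislike counter
    use_dislike_field=False -> subtract from the popularity counter instead,
                               so no extra field is needed
    """
    if use_dislike_field:
        record["dislike"] = record.get("dislike", 0) + step
    else:
        record["popularity"] = record.get("popularity", 0) - step
    return record


with_field = record_dislike({"popularity": 5})
without_field = record_dislike({"popularity": 5}, use_dislike_field=False)
```

The second option loses the distinction between "rarely matched" and "actively disliked", which is the trade-off for not adding a field.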

On the other hand, the natural language understanding system 520 may also determine the priority order of at least one reported answer according to a voice input that the user provides before the natural language dialogue system 500 offers a reported answer (for example, before the first voice response 507 is played, when the user does not yet know which reported answers the natural language dialogue system 500 will offer for selection). That is, if a voice input (for example, a fourth voice input) is received by the voice sampling module 510 before the first voice response 507 is played, the natural language understanding system 520 may also parse a fourth keyword from the fourth voice input, preferentially select from the candidate list a fourth reported answer that matches this fourth keyword, and output a fourth voice response according to this fourth reported answer.

For example, assume that the natural language understanding system 520 first receives the first voice input 501, "I want to watch a TV series", and shortly afterwards (for example, a few seconds later) receives the fourth voice input, "Just play Romance of the Three Kingdoms for me". The natural language understanding system 520 recognizes the first keyword 509 "TV series" in the first voice input 501 and then recognizes the fourth keyword "Romance of the Three Kingdoms" in the fourth voice input. It therefore selects from the candidate list the reported answer that relates to both "Romance of the Three Kingdoms" and "TV series", and outputs a fourth voice response to the user according to this fourth reported answer.
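Combining keywords from consecutive voice inputs, as in this example, can be sketched as accumulating a keyword list and keeping only the candidates that satisfy all of it. The function name `filter_by_keywords` and the candidate layout are assumptions for illustration.

```python
def filter_by_keywords(candidates, keywords):
    """Keep candidates whose keyword set contains every spoken keyword."""
    wanted = set(keywords)
    return [c for c in candidates if wanted <= set(c["keywords"])]


# Hypothetical candidate list.
context_candidates = [
    {"id": "a", "keywords": ["Romance of the Three Kingdoms", "book"]},
    {"id": "b", "keywords": ["Romance of the Three Kingdoms", "TV series"]},
    {"id": "c", "keywords": ["Journey to the West", "TV series"]},
]

context = ["TV series"]                        # from the first voice input
context += ["Romance of the Three Kingdoms"]   # from the later (fourth) voice input
matches = filter_by_keywords(context_candidates, context)
```

Either keyword alone leaves two candidates; only the combination narrows the list to the TV series of Romance of the Three Kingdoms.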

Based on the above, the natural language understanding system 520 can take into account the usage habits of the general public, the user's preferences, the user's habits, or the user's preceding and following utterances when responding to a voice input, and output to the user a voice response that better matches the request information of that voice input. The natural language understanding system 520 may rank the reported answers in the candidate list using any of these criteria. Thus, when the user's voice input is ambiguous, the natural language understanding system 520 can consult these criteria to determine the intent behind the user's voice input 501 (for example, the attribute or knowledge domain of the keyword 509 in the first voice input 501). In other words, when a reported answer is close to an intent the user has expressed before or to what the general public usually means, the natural language understanding system 520 gives that reported answer priority. As a result, the voice response output by the natural language dialogue system 500 better matches the user's request information.

In summary, in the method for correcting a voice response and the natural language dialogue system of this embodiment, the natural language dialogue system can output a corresponding first voice response 507 according to the first voice input 501 from the user. When the first voice response 507 does not match the first request information 503 or the first keyword 509 of the user's first voice input 501, the natural language dialogue system can correct the originally output first voice response 507 and, according to the second voice input 501' subsequently provided by the user, select a second voice response 507' that better meets the user's needs. In addition, the natural language dialogue system can give priority to a more appropriate reported answer according to the usage habits of the general public, the user's preferences, the user's habits, or the user's preceding and following utterances, and output the corresponding voice response to the user. In this way, if the user is not satisfied with the answer provided, the natural language dialogue system can correct it automatically according to the request information in each of the user's utterances and report a new voice response to the user, making conversation with the natural language dialogue system more convenient.

Next, an example is described in which the architecture and components such as the natural language understanding system 100 and the structured database 220 are used to provide responses and reported answers according to the dialogue scenario and context with the user, the user's usage habits, the usage habits of the general public, and the user's preferences.

FIG. 7A is a block diagram of a natural language dialogue system according to an embodiment of the present invention. Referring to FIG. 7A, the natural language dialogue system 700 includes a voice sampling module 710, a natural language understanding system 720, a characteristic database 730, and a speech synthesis database 740. The voice sampling module 710 of FIG. 7A is the same as the voice sampling module 510 of FIG. 5A, and the natural language understanding system 720 is the same as the natural language understanding system 520, so they perform the same functions. In addition, when analyzing the request information 703, the natural language understanding system 720 may also obtain the user's intent by performing a full-text search on the structured database 220 of FIG. 1; this technique has already been described with reference to FIG. 1 and is not repeated here. The characteristic database 730 stores the user preference data 715 sent by the natural language understanding system 720, or provides user preference records 717 to the natural language understanding system 720, as detailed later. The speech synthesis database 740 is equivalent to the speech synthesis database 530 and is used to provide voice output to the user. In this embodiment, the voice sampling module 710 receives the voice input 701 (that is, the first/second voice input 501/501' of FIG. 5A/B, the voice from the user), and the natural language understanding system 720 parses the request information 703 in the voice input (that is, the first/second request information 503/503' of FIG. 5A/B) and outputs the corresponding voice response 707 (that is, the first/second voice response 507/507' of FIG. 5A/B). The components of the natural language dialogue system 700 may be arranged in the same machine; the invention is not limited in this respect.

The natural language understanding system 720 receives from the voice sampling module 710 the request information 703 parsed from the voice input 701. It generates a candidate list containing at least one reported answer according to one or more keywords 709 in the voice input 701, selects from the candidate list the entry that best matches the keywords 709 as the reported answer 711, queries the speech synthesis database 740 to find the voice 713 corresponding to the reported answer 711, and finally outputs the voice response 707 according to the voice 713. The natural language understanding system 720 of this embodiment may be implemented as a hardware circuit composed of one or more logic gates, or as computer program code; these are merely examples and not limitations.

FIG. 7B is a block diagram of a natural language dialogue system 700' according to another embodiment of the present invention. The natural language understanding system 720' of FIG. 7B may include a speech recognition module 722 and a natural language processing module 724, while the voice sampling module 710 may be combined with a speech synthesis module 726 in an integrated voice processing module 702. The speech recognition module 722 receives from the voice sampling module 710 the request information 703 parsed from the voice input 701 and converts it into one or more keywords 709. The natural language processing module 724 then processes these keywords 709 to obtain at least one candidate list and selects from the candidate list the entry that best matches the voice input 701 as the reported answer 711. Because the reported answer 711 is the result of internal analysis by the natural language understanding system 720, it must still be converted into text or speech before it can be output to the user. The speech synthesis module 726 therefore queries the speech synthesis database 740 according to the reported answer 711. The speech synthesis database 740 records, for example, text and its corresponding voice information, so that the speech synthesis module 726 can find the voice 713 corresponding to the reported answer 711 and synthesize the voice response 707. The speech synthesis module 726 can then output the synthesized speech to the user through a voice output interface (not shown), such as a loudspeaker or earphones. Note that in FIG. 7A the speech synthesis module 726 is incorporated into the natural language understanding system 720 (as in the architecture of FIG. 5B, although the speech synthesis module 726 is not shown in FIG. 7A), and the speech synthesis module uses the reported answer 711 to query the speech synthesis database 740 for the voice 713 on which the synthesized voice response 707 is based.
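The division of labor among modules 722, 724, and 726 can be pictured as a three-stage pipeline. The sketch below is only a toy stand-in: real speech recognition and synthesis are replaced by string matching and a dictionary lookup, and every name in it (`recognize`, `choose_answer`, `synthesize`, `SPEECH_SYNTHESIS_DB`, the placeholder audio strings) is hypothetical.

```python
# Stand-in for the speech synthesis database 740: reported answer -> voice.
SPEECH_SYNTHESIS_DB = {
    "Romance of the Three Kingdoms book": "<audio: opening the book>",
}


def recognize(request_text):
    """Role of speech recognition module 722: request information -> keywords."""
    known_keywords = ["Romance of the Three Kingdoms", "book", "TV series"]
    return [k for k in known_keywords if k.lower() in request_text.lower()]


def choose_answer(keywords):
    """Role of natural language processing module 724: keywords -> reported answer."""
    return " ".join(keywords) if keywords else None


def synthesize(answer):
    """Role of speech synthesis module 726: look up the voice for the answer."""
    return SPEECH_SYNTHESIS_DB.get(answer, "<audio: no data found>")


keywords = recognize("Open the Romance of the Three Kingdoms book")
response = synthesize(choose_answer(keywords))
```

Because each stage only passes plain data to the next, the stages could run on different machines, which is the arrangement FIG. 7B describes.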

In this embodiment, the speech recognition module 722, the natural language processing module 724, and the speech synthesis module 726 of the natural language understanding system 720 are equivalent to the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 of FIG. 5B respectively and provide the same functions. The speech recognition module 722, the natural language processing module 724, and the speech synthesis module 726 may be arranged in the same machine as the voice sampling module 710. In other embodiments, they may be distributed across different machines (for example, computer systems, servers, or similar devices/systems). For example, in the natural language understanding system 720' shown in FIG. 7B, the speech synthesis module 726 and the voice sampling module 710 may be arranged in the same machine 702, while the speech recognition module 722 and the natural language processing module 724 are arranged in another machine. Note that in the architecture of FIG. 7B, because the speech synthesis module 726 and the voice sampling module 710 are arranged in the machine 702, the natural language understanding system 720 needs to transmit the reported answer 711 to the machine 702, where the speech synthesis module 726 sends the reported answer 711 to the speech synthesis database 740 to find the corresponding voice 713 on which the voice response 707 is based. In addition, when calling the speech synthesis database 740 with the reported answer 711, the speech synthesis module 726 may first need to convert the format of the reported answer 711 and then make the call through the interface specified by the speech synthesis database 740; because this is well known to those skilled in the art, it is not detailed here.

The natural language dialogue method is described below with reference to the natural language dialogue system 700 of FIG. 7A. FIG. 8 is a flowchart of a natural language dialogue method according to an embodiment of the present invention. For convenience, only the natural language dialogue system 700 of FIG. 7A is used as an example here, but the natural language dialogue method of this embodiment also applies to the natural language dialogue system 700' of FIG. 7B. By comparison, FIG. 5/6 deal with automatically correcting the output information according to the user's voice input, whereas FIG. 7A/7B/8 deal with recording user preference data 715 in the characteristic database 730, selecting one entry from the candidate list as the reported answer 711 on that basis, and playing the corresponding voice to the user. The implementations of FIG. 5/6 and FIG. 7A/7B/8 may be used separately or together; the invention is not limited in this respect.

Referring to FIG. 7A and FIG. 8 together, in step S810 the speech sampling module 710 receives the voice input 701. The voice input 701 is, for example, speech from the user, and may carry the user's request information 703. Specifically, the voice input 701 from the user may be a question, a command, or other request information, such as the examples mentioned earlier: "I want to watch Romance of the Three Kingdoms", "I want to listen to the music of Wangqingshui", or "What is the temperature today". It should be noted that steps S802-S806 are the flow by which the natural language dialogue system 700 stores user preference data 715 from the user's previous voice inputs, and the subsequent steps S810-S840 operate on the user preference data 715 already stored in the characteristic database 730. The details of steps S802-S806 are described later; the operations of steps S820-S840 are described first.

In step S820, the natural language understanding system 720 parses at least one keyword 709 included in the voice input 701 and thereby obtains a candidate list, where the candidate list has one or more reported answers. In detail, the natural language understanding system 720 parses the voice input 701 to obtain one or more keywords 709. For example, when the user's voice input 701 is "I want to watch Romance of the Three Kingdoms", the keywords 709 obtained after analysis are, for example, "Romance of the Three Kingdoms" and "watch" (as described earlier, it must still be determined whether the user wants a book, a TV series, or a movie). As another example, when the user's voice input 701 is "I want to listen to the song Wangqingshui", the keywords 709 obtained are, for example, "Wangqingshui", "listen", and "song" (as described earlier, it can be further determined whether the user wants the version sung by Andy Lau or by Li Yijun). Next, the natural language understanding system 720 can perform a full-text search of the structured database 220 according to the keywords 709 and obtain at least one search result (which may be at least one of the records in FIG. 3A/3B), which serves as a reported answer in the candidate list. Because a keyword 709 may belong to different knowledge domains (e.g., movies, books, music, or games), and a single knowledge domain may be further divided into categories (e.g., different authors of the same movie or book title, different singers of the same song title, or different versions of the same game title), for a given keyword 709 the natural language understanding system 720 can obtain, after analysis (e.g., a full-text search of the structured database 220), one or more search results related to that keyword 709, each containing the keyword 709 together with other information (the content of the other information is as shown in Table 1). Seen from another angle, when the voice input 701 has multiple keywords 709, the user's request information 703 is more specific, so the natural language understanding system 720 is better able to find search results close to the request information 703 (if the natural language understanding system 720 finds a complete match, that should be the option the user wants).

For example, when the keyword 709 is "Romance of the Three Kingdoms", the search results found by the natural language understanding system 720 are, for example, records of the form "...'Romance of the Three Kingdoms'...'TV series'" and "...'Romance of the Three Kingdoms'...'book'" (where 'TV series' and 'book' are the user intents indicated by the results). As another example, when the keywords 709 are "Wangqingshui" and "music", the results found may be the records "...'Wangqingshui'...'music'...'Andy Lau'" and "...'Wangqingshui'...'music'...'Li Yijun'", where 'Andy Lau' and 'Li Yijun' are the search results that indicate the user's intent. In other words, after the natural language understanding system 720 performs a full-text search of the structured database 220, each search result may include the keyword 709 and other data related to the keyword 709 (as shown in Table 1), and the natural language understanding system 720 converts the search results into a candidate list containing at least one reported answer for use in subsequent steps.
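The keyword lookup of step S820 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout, the `RECORDS` sample data, and the function name `build_candidate_list` are hypothetical, and an exact match on field values stands in for the full-text search of the structured database 220.

```python
# Hypothetical records standing in for the structured database 220.
RECORDS = [
    {"title": "Romance of the Three Kingdoms", "category": "book"},
    {"title": "Romance of the Three Kingdoms", "category": "TV series"},
    {"title": "Wangqingshui", "category": "music", "singer": "Andy Lau"},
    {"title": "Wangqingshui", "category": "music", "singer": "Li Yijun"},
]


def build_candidate_list(keywords, records=RECORDS):
    """Return the records that match at least one keyword, best match first."""
    scored = []
    for record in records:
        values = {str(value).lower() for value in record.values()}
        hits = sum(1 for keyword in keywords if keyword.lower() in values)
        if hits:
            scored.append((hits, record))
    # More matched keywords first; the sort is stable, so ties keep record order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]
```

With this sketch, a single keyword such as "Romance of the Three Kingdoms" yields both the book and the TV series records, while adding a third keyword such as "Andy Lau" moves the fully matching record to the front, which mirrors the observation that more keywords make the request more specific.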

In step S830, the natural language understanding system 720 selects a reported answer 711 from the candidate list according to the user preference record 717 sent from the characteristic database 730 (for example, a result compiled from the user preference data 715 stored there, as explained later). In this embodiment, the natural language understanding system 720 can select the reported answer 711 from the candidate list according to a priority order (the possible bases for the priority order are detailed below). In step S840, the voice response 707 is output according to the reported answer 711.

For example, in one embodiment the number of search results can determine the priority order. When the keyword 709 is "Romance of the Three Kingdoms", suppose the natural language understanding system 720 finds after analysis that the structured database 220 contains the most records of the form "...'Romance of the Three Kingdoms'...'book'", fewer records of the form "...'Romance of the Three Kingdoms'...'music'", and the fewest records of the form "...'Romance of the Three Kingdoms'...'TV series'". The natural language understanding system 720 then treats the records related to "Romance of the Three Kingdoms books" as the first-priority reported answer (for example, gathering all records about "Romance of the Three Kingdoms books" into one candidate list, which may be sorted by the value of the popularity field 316), the records related to "Romance of the Three Kingdoms music" as the second-priority reported answer, and the records related to "Romance of the Three Kingdoms TV series" as the third-priority reported answer. It should be noted that, besides the number of search results, the priority order may also be based on user preferences, user habits, or the usage habits of the public, as detailed later.
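Ordering by the number of search results can be sketched as below. The function name `rank_categories_by_hits` and the `category` key are illustrative assumptions; the text only states that the category with the most matching records is answered first.

```python
from collections import Counter


def rank_categories_by_hits(search_results):
    """Order categories by how many search results fall into each one."""
    counts = Counter(result["category"] for result in search_results)
    # most_common() sorts by count, largest first.
    return [category for category, _ in counts.most_common()]
```

For a result set with three book records, two music records, and one TV series record, the function returns the categories in the order book, music, TV series.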

To help those skilled in the art further understand the natural language dialogue method and the natural language dialogue system of this embodiment, another example is described in detail below.

First, suppose the voice input 701 received by the speech sampling module 710 is "I want to watch Romance of the Three Kingdoms" (step S810). The natural language understanding system 720 can then parse out the keywords 709 "watch" and "Romance of the Three Kingdoms" and obtain a candidate list with multiple reported answers, each of which has related keywords and other information, as shown in Table 1 above (step S820).

Next, the natural language understanding system 720 selects a reported answer from the candidate list. Suppose the natural language understanding system 720 selects reported answer a in the candidate list (see Table 1) as the reported answer 711; it then outputs, for example, "Play the book Romance of the Three Kingdoms?" as the voice response 707 (steps S830-S840).

As described above, the natural language understanding system 720 can also order the reported answers in the candidate list by different methods of evaluating priority, and output the voice response 707 corresponding to the reported answer 711 accordingly. For example, the natural language understanding system 720 can determine the user's preferences from multiple dialogue records with the user (e.g., from the positive and negative terms the user has used, as mentioned earlier); that is, the user preference record 717 can be used to decide the priority order of the reported answers 711. Before explaining how the user's positive and negative terms are used, the way the user preference data 715 stores the likes, dislikes, and habits of the user and of the public is described first.

The storage of user preference data 715 in steps S802-S806 is now described. In one embodiment, before the voice input 701 is received in step S810, multiple voice inputs, i.e., the previous dialogue history, are received in step S802; user preference data 715 is extracted from these previous voice inputs (step S804) and then stored in the characteristic database 730. In fact, the user preference data 715 can also be stored in the structured database 220 (in other words, the characteristic database 730 is merged into the structured database 220). For example, in one embodiment, the popularity field 316 of FIG. 3B can be used directly to record the user's preferences; the way the popularity field 316 is recorded was described earlier (e.g., when a record 302 is matched, its popularity field is incremented by one) and is not repeated here. Alternatively, additional fields can be provided in the structured database 220 to store the user preference data 715, for example on a per-keyword basis (e.g., "Romance of the Three Kingdoms") combined with the user's preference: when the user uses a positive term such as "like" or a negative term such as "dislike", the value of the preference field 318 or of the dislike field 320 of FIG. 3B, respectively, is incremented by one, so that the numbers of positive and negative terms are accumulated. Thus, when the natural language understanding system 720 queries the structured database 220 for the user preference record 717, it can directly read the values of the preference field 318 and the dislike field 320 (i.e., how many positive and negative terms have been counted) and judge the user's preference accordingly (that is, the counts of positive and negative terms are sent to the natural language understanding system 720 as the user preference record 717).

The case in which the user preference data 715 is stored in the characteristic database 730 (i.e., the characteristic database 730 is not merged into the structured database 220) is described below. In one embodiment, the user preference data 715 can be stored as a mapping between keywords and the user's preference for those keywords. For example, the preference field 852 and the dislike field 862 of FIG. 8B can record the individual user's likes and dislikes for a set of keywords, and the preference field 854 and the dislike field 864 can record the public's likes and dislikes for the same set of keywords. In FIG. 8B, the keywords "Romance of the Three Kingdoms" and "book" stored in record 832 have values of 20 and 1 in the preference field 852 and the dislike field 862, respectively; the keywords "Romance of the Three Kingdoms" and "TV series" stored in record 834 have values of 8 and 20; and the keywords "Romance of the Three Kingdoms" and "music" stored in record 836 have values of 1 and 8. These values represent the individual user's like and dislike data for the keywords (a higher value in the preference field 852 means the item is liked more, and a higher value in the dislike field 862 means it is disliked more). In addition, the preference field 854 and the dislike field 864 have values of 5 and 3 for record 832, 80 and 20 for record 834, and 2 and 10 for record 836; these values represent the public's like and dislike data for the keywords (referred to for short as a "preference indication"), and the values of the preference field 852 and the dislike field 862 can be increased according to the user's preference. Therefore, when the user says "I want to watch the Romance of the Three Kingdoms TV series", the natural language understanding system 720 can combine the keywords "Romance of the Three Kingdoms" and "TV series" with a "preference indication" that increases the preference field value into user preference data 715 and send it to the characteristic database 730, which then increments the value of the preference field 852 of record 834 by one (because the user wants to watch the Romance of the Three Kingdoms TV series, which indicates increased liking). With user preference data recorded in this way, when the user later enters related keywords, for example "I want to watch Romance of the Three Kingdoms", the natural language understanding system 720 can query the characteristic database 730 of FIG. 8B with the keyword "Romance of the Three Kingdoms" and find the three related records 832, 834, and 836. The characteristic database 730 can return the values of the preference field 852 and the dislike field 862 to the natural language understanding system 720 as the user preference record 717, which the natural language understanding system 720 uses as the basis for judging the individual user's preference. Of course, the characteristic database 730 can also return the values of the preference field 854 and the dislike field 864 as the user preference record 717, in which case the user preference record 717 serves as the basis for judging the public's preference; the invention does not restrict whether the user preference record 717 represents the preferences of the individual user or of the public.
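The records of FIG. 8B and the lookup that produces the user preference record 717 can be sketched as a small in-memory table. The class name `PreferenceRecord`, its attribute names, and the function `query_preference_record` are hypothetical; only the record numbers and field values come from the description above.

```python
from dataclasses import dataclass


@dataclass
class PreferenceRecord:
    keywords: frozenset
    user_like: int = 0       # preference field 852 (individual user)
    user_dislike: int = 0    # dislike field 862 (individual user)
    public_like: int = 0     # preference field 854 (public)
    public_dislike: int = 0  # dislike field 864 (public)


ROTK = "Romance of the Three Kingdoms"

# Values as described for records 832, 834, and 836 of FIG. 8B.
CHARACTERISTIC_DB = {
    832: PreferenceRecord(frozenset({ROTK, "book"}), 20, 1, 5, 3),
    834: PreferenceRecord(frozenset({ROTK, "TV series"}), 8, 20, 80, 20),
    836: PreferenceRecord(frozenset({ROTK, "music"}), 1, 8, 2, 10),
}


def query_preference_record(db, keyword, public=False):
    """Return {record id: (likes, dislikes)} for every record containing keyword."""
    result = {}
    for record_id, record in db.items():
        if keyword in record.keywords:
            if public:
                result[record_id] = (record.public_like, record.public_dislike)
            else:
                result[record_id] = (record.user_like, record.user_dislike)
    return result
```

Calling `query_preference_record(CHARACTERISTIC_DB, ROTK)` returns the individual user's counts for all three records, and passing `public=True` returns the public's counts instead, matching the two uses of the user preference record 717 described above.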

In another embodiment, the values of the preference field 852 and the dislike field 862 can also serve as a basis for judging the habits of the user or of the public. For example, after receiving the user preference record 717, the natural language understanding system 720 can first compare the values of the preference field 852/854 and the dislike field 862/864. If the two values differ by more than a certain threshold, the user habitually converses in a particular way. For example, when the value of the preference field 852 exceeds the value of the dislike field 862 by 10 or more, the user particularly likes to use positive terms in dialogue (this is one way of recording a "user habit"), so in this case the natural language understanding system 720 can select the reported answer using the preference field 852 alone. When the natural language understanding system 720 uses the values of the preference field 854 and the dislike field 864 stored in the characteristic database 730, what is judged is the preference records of all users of the characteristic database 730, and the result can serve as reference data for the usage habits of the public. It should be noted that the user preference record 717 returned from the characteristic database 730 to the natural language understanding system 720 may contain both the individual user's preference record (e.g., the values of the preference field 852 and the dislike field 862) and the public's preference record (e.g., the values of the preference field 854 and the dislike field 864); the invention is not limited in this respect.
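The threshold test for a user habit can be sketched as follows. The function name `detect_habit` is hypothetical, the default of 10 is only the example gap given in the text, and treating a large surplus of dislikes as a "negative" habit is a symmetric assumption that the description mentions only for the positive case.

```python
HABIT_THRESHOLD = 10  # example gap from the text; not a fixed value


def detect_habit(like_count, dislike_count, threshold=HABIT_THRESHOLD):
    """Classify the user's habit from the gap between likes and dislikes."""
    gap = like_count - dislike_count
    if gap >= threshold:
        return "positive"  # rank with the preference field alone
    if -gap >= threshold:
        return "negative"  # assumption: rank with the dislike field alone
    return None  # no clear habit; consider both fields
```

With the FIG. 8B values, record 832 (20 likes, 1 dislike) is classified as a positive habit, record 834 (8 likes, 20 dislikes) as a negative one, and record 836 (1 like, 8 dislikes) shows no clear habit because the gap is below the threshold.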

As for storing the user preference data 715 obtained from the current voice input, the natural language dialogue system 700 can store it when the candidate list is generated in step S820 (whether by a complete or a partial match). For example, in step S820, whenever a keyword produces a match in the structured database 220, the user can be judged to have a preference for that match, so the "keyword" and the "preference indication" can be sent to the characteristic database 730; once the corresponding record is found there, the value of its preference field 852/854 or dislike field 862/864 is changed (e.g., when the user says "I want to read the Romance of the Three Kingdoms book", the value of the preference field 852/854 of record 832 of FIG. 8B is incremented by one). In yet another embodiment, the natural language dialogue system 700 may store the user preference data 715 only after the user selects a reported answer in step S830. In addition, if no corresponding keyword is found in the characteristic database 730, a new record can be created to store the user preference data 715. For example, when the user says "I listen to Andy Lau's Wangqingshui" and the keywords "Andy Lau" and "Wangqingshui" are produced, and no corresponding record is found in the characteristic database 730 at storage time, a new record 838 is created in the characteristic database 730 and the value of its preference field 852/854 is incremented by one. The storage timing and storage method described above are for illustration only; those skilled in the art may modify the embodiments according to the actual application, and all equivalent modifications that do not depart from the spirit of the invention should still fall within the claims of the invention.
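The update-or-create behaviour described above can be sketched as follows. The dictionary layout and the name `record_preference` are illustrative assumptions; the sketch tracks only one like counter and one dislike counter per keyword set rather than the separate individual and public fields.

```python
def record_preference(db, keywords, liked=True):
    """Increment the like or dislike count for a keyword set, creating it if absent."""
    key = frozenset(keywords)  # keyword order does not matter
    entry = db.setdefault(key, {"like": 0, "dislike": 0})
    entry["like" if liked else "dislike"] += 1
    return entry
```

An existing entry such as the one for "Romance of the Three Kingdoms" and "book" has its like count raised from 20 to 21, while an unseen keyword set such as "Andy Lau" and "Wangqingshui" gets a fresh entry with a like count of 1, corresponding to the new record 838.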

Furthermore, although the format of records 832-838 in the characteristic database 730 shown in FIG. 8B differs from the record format of the structured database 220 (e.g., as shown in FIG. 3A/3B/3C), the invention does not restrict the storage format of the records. Moreover, although the above embodiments describe only how the preference fields 852/854 and the dislike fields 862/864 are stored and used, in another embodiment additional fields 872/874 can be provided in the characteristic database 730 to store other habits of the user and of the public, respectively, such as the number of times the data corresponding to the record has been downloaded, cited, recommended, commented on, or referred. In another embodiment, these counts or data can also be stored together in the preference fields 852/854 and the dislike fields 862/864; for example, each time the user gives a record a good review or refers it to others, the value of the preference field 852/854 is incremented by one, and each time the user gives a record a bad review, the value of the dislike field 862/864 is incremented by one. The invention restricts neither the number of records nor the way field values are recorded. It should be noted that, as those skilled in the art will appreciate, because the preference field 852, the dislike field 862, and so on in FIG. 8B relate only to the individual user's choices and preferences, this personal information can be stored in the user's mobile communication device, while the preference field 854, the dislike field 864, and other information relating to all users is stored on the server. This saves storage space on the server and also preserves the privacy of the user's personal preferences.

The user's actual usage is further illustrated below with FIG. 7A and FIG. 8B. Based on the dialogue content of multiple voice inputs 701, suppose that in conversation with the natural language understanding system 720 the user often says "I hate watching the Romance of the Three Kingdoms TV series", less often says "I hate listening to the Romance of the Three Kingdoms music", and even less often says "I hate listening to the Romance of the Three Kingdoms book". For example, the characteristic database 730 holds 20 records of "I hate watching the Romance of the Three Kingdoms TV series" (i.e., in FIG. 8B, the count of negative terms for "Romance of the Three Kingdoms" plus "TV series" is 20), 8 records of "I hate listening to the Romance of the Three Kingdoms music" (i.e., the count of negative terms for "Romance of the Three Kingdoms" plus "music" is 8), and 1 record of "I hate listening to the Romance of the Three Kingdoms book" (i.e., the count of negative terms for "Romance of the Three Kingdoms" plus "book" is 1). Because the user preference record 717 returned from the characteristic database 730 contains these three negative-term counts (20, 8, and 1), the natural language understanding system 720 orders the reported answers 711 in the candidate list as "Romance of the Three Kingdoms book", "Romance of the Three Kingdoms music", and "Romance of the Three Kingdoms TV series". That is, when the keyword 709 is "Romance of the Three Kingdoms", the natural language understanding system 720 selects the Romance of the Three Kingdoms book as the reported answer 711 and outputs the voice response 707 according to this reported answer 711. It should be noted that although the priority order above is determined solely from the counts of negative terms used by the user, in another embodiment the priority order can instead be determined solely from the counts of positive terms used by the user (e.g., as mentioned earlier, when the value of the preference field 852 exceeds the value of the dislike field 862 by more than a certain threshold).
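Ranking by negative-term counts alone amounts to an ascending sort on the dislike count. The sketch below uses the counts from the example (20, 8, and 1); the function name `rank_by_fewest_dislikes` is hypothetical, and treating a candidate with no recorded count as zero dislikes is an assumption.

```python
def rank_by_fewest_dislikes(candidates, dislike_counts):
    """Order candidates so the least-disliked one comes first."""
    # Candidates with no recorded dislikes are treated as count 0 (an assumption).
    return sorted(candidates, key=lambda candidate: dislike_counts.get(candidate, 0))
```

With dislike counts of 20 for the TV series, 8 for the music, and 1 for the book, the result is book, music, TV series, so the book becomes the reported answer 711.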

It is worth mentioning that the natural language understanding system 720 can also consider the amounts of positive and negative terms used by the user together to decide the priority order of the reported answers. Specifically, the characteristic database 730 can also record the keywords the user has expressed, such as "like" and "idol" (positive terms) or "dislike" and "hate" (negative terms). Therefore, besides comparing the difference between the number of times the user has used "like" and "dislike", the natural language understanding system 720 can rank the reported answers in the candidate list directly by the positive and negative term counts associated with the keywords (i.e., by comparing whether positive or negative terms have been used more often). For example, if a reported answer is associated with "like" more often (i.e., positive terms have been used more often, or the value of the preference field 852 is larger), that reported answer is selected preferentially. Conversely, if a reported answer is associated with "dislike" more often (i.e., negative terms have been used more often, or the value of the dislike field 862 is larger), it is selected later. The natural language understanding system 720 can thus arrange all the reported answers into a candidate list according to this priority order. Because some users may prefer positive terms (e.g., the value of the preference field 852 is especially large) while others habitually use negative terms (e.g., the value of the dislike field 862 is especially large), the user preference record 717 in the above embodiment reflects the individual user's usage habits, so options that better fit the user's habits can be offered for selection.
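One way to combine both counts is to sort by likes minus dislikes. This net score is an assumption made for illustration: the description says only that more positive usage moves an answer earlier and more negative usage moves it later, without fixing a formula. The function name `rank_by_net_preference` is likewise hypothetical.

```python
def rank_by_net_preference(counts):
    """counts maps candidate -> (like_count, dislike_count); best candidate first."""
    # Net score = likes - dislikes (an assumed scoring rule, not from the source).
    return sorted(counts, key=lambda c: counts[c][0] - counts[c][1], reverse=True)
```

Using the individual user's values from FIG. 8B (book 20 and 1, TV series 8 and 20, music 1 and 8), the net scores are 19, -12, and -7, so the order is book, music, TV series.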

In addition, the natural language understanding system 720 may rank the reported answers 711 in the candidate list according to the usage habits of the general public, placing answers that people use more often first (recorded, for example, in the popularity field 316 of FIG. 3C). For example, when the keyword 709 is "Romance of the Three Kingdoms", suppose the reported answers found by the natural language understanding system 720 include the TV series, the books, and the music of Romance of the Three Kingdoms. If people who mention "Romance of the Three Kingdoms" usually mean the TV series, fewer mean the film, and fewer still mean the book (for example, the corresponding records in FIG. 8B hold the values 80, 40, and 5 in the preference field 854), the natural language understanding system 720 orders the reported answers 711 as "TV series", "film", "book". That is, it preferentially selects the TV series of Romance of the Three Kingdoms as the reported answer 711 and outputs the voice response 707 accordingly. This public-usage ordering can be recorded in the popularity field 316 of FIG. 3C; the recording method is disclosed in the paragraphs on FIG. 3C above and is not repeated here.
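As a non-limiting illustration, the ordering by public usage described above might be sketched in Python as follows. The `Candidate` class, its `popularity` attribute (standing in for the value held in the preference field 854), and the function name are assumptions made for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    category: str
    popularity: int  # hypothetical stand-in for the value in preference field 854


def rank_by_popularity(candidates):
    """Return the candidates with the most commonly intended one first."""
    return sorted(candidates, key=lambda c: c.popularity, reverse=True)


# Sample values taken from the example above (80, 40, 5).
candidates = [
    Candidate("Romance of the Three Kingdoms", "book", 5),
    Candidate("Romance of the Three Kingdoms", "TV series", 80),
    Candidate("Romance of the Three Kingdoms", "film", 40),
]
ranked = rank_by_popularity(candidates)
reported_answer = ranked[0]  # the entry used to build the voice response
```

With these values the TV series is placed first, followed by the film and the book.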

Furthermore, the natural language understanding system 720 may determine the priority of the reported answers 711 according to the user's own frequency of use. Specifically, the natural language understanding system 720 can record each voice input 701 received from the user in the characteristic database 730, which stores the keywords 709 obtained when the voice input 701 is parsed together with response information such as every reported answer 711 the system has generated. When later selecting a reported answer 711, the natural language understanding system 720 can therefore consult the response information recorded in the characteristic database 730 (for example the user's likes, dislikes, and habits, or even those of the general public), find by priority the reported answer 711 that better matches the user's intention (as determined from the user's voice input), and output the corresponding voice response. This user-habit ordering can likewise be recorded in the popularity field 316 of FIG. 3C; the recording method is disclosed in the paragraphs on FIG. 3C above and is not repeated here.

In summary, the natural language understanding system 720 can store the user preference attributes (for example positive and negative terms), user habits, and public usage habits in the characteristic database 730 (step S806). In other words, in steps S802, S804, and S806 the user preference data 715 is learned from the user's previous dialogue history and added to the characteristic database 730, and user habits and public usage habits are stored there as well, so that the natural language understanding system 720 can draw on the information in the characteristic database 730 (for example the user preference record 717) to give the user more accurate responses.

Step S830 is now described in more detail. After the voice input is received in step S810 and its keyword 709 is parsed in step S820 to obtain the candidate list, the natural language understanding system 720 determines the priority of the at least one reported answer according to the user preference record 717, which covers user preferences, user habits, and public usage habits (step S880). As described above, the priority may be based on the number of records found, on positive or negative terms used by the user or the public, and so on. A reported answer 711 is then selected from the candidate list according to the priority (step S890); as described above, this may be the answer with the highest degree of matching or the one with the highest priority. Finally, the voice response 707 is output according to the reported answer 711 (step S840).

On the other hand, the natural language understanding system 720 may also determine the priority of the at least one reported answer according to a voice input 701 the user made earlier. That is, if another voice input 701 (for example the fourth voice input mentioned above) is received by the voice sampling module 710 before the voice response 707 is played, the natural language understanding system 720 can parse the keyword in that voice input (the fourth keyword 709), preferentially select from the candidate list the reported answer that matches this keyword as the reported answer 711, and output the voice response 707 accordingly.

For example, suppose the natural language understanding system 720 first receives the voice input 701 "I want to watch a TV series" and, a few seconds later, the voice input 701 "Play Romance of the Three Kingdoms for me". The natural language understanding system 720 recognizes the keyword "TV series" (the first keyword) in the first voice input 701 and the keyword "Romance of the Three Kingdoms" (the fourth keyword) in the later one. It therefore selects from the candidate list the reported answer whose intention data relates to both "Romance of the Three Kingdoms" and "TV series", and uses this reported answer 711 to output the voice response 707 to the user.
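The combination of keywords from successive voice inputs might, purely as an illustration, be sketched in Python as below. The dictionary layout, the `intent` key, and the fallback to the full list when nothing matches are assumptions of this sketch rather than features stated in the disclosure.

```python
def select_with_context(candidates, keywords):
    """Keep candidates whose intent data covers every keyword heard so far."""
    wanted = {k.lower() for k in keywords}
    matches = [
        c for c in candidates
        if wanted <= {t.lower() for t in c["intent"]}
    ]
    # Assumption: if no candidate covers all keywords, keep the full list.
    return matches or list(candidates)


candidates = [
    {"answer": "Romance of the Three Kingdoms (TV series)",
     "intent": ["Romance of the Three Kingdoms", "TV series"]},
    {"answer": "Romance of the Three Kingdoms (book)",
     "intent": ["Romance of the Three Kingdoms", "book"]},
]
# Keyword from the earlier input, then keyword from the later input.
heard = ["TV series", "Romance of the Three Kingdoms"]
selected = select_with_context(candidates, heard)
```

Here only the TV series entry relates to both keywords, so it is the one retained.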

Based on the above, the natural language understanding system 720 can take the user's voice input and, drawing on public usage habits, user preferences, user habits, or the surrounding dialogue, output a voice response 707 that better matches the request information 703 of the voice input 701. The natural language understanding system 720 can rank the reported answers in the candidate list by any of these criteria. Thus, when the user's voice input 701 is ambiguous, the system can use the same information to judge the intention behind it (for example the attribute or knowledge domain of the keyword 709 in the voice input). In other words, if a reported answer 711 is close to an intention the user has expressed before or that people generally mean, the natural language understanding system 720 gives priority to that reported answer 711. As a result, the voice response 707 output by the natural language dialogue system 700 better matches the user's request information 703.

It should be noted that although the characteristic database 730 and the structured database 220 are described above as separate databases, the two may be merged; those skilled in the art can choose according to the actual application.

In summary, the present invention provides a natural language dialogue method and system in which the natural language dialogue system outputs a voice response corresponding to the user's voice input. The natural language dialogue system of the present invention can also preferentially select a more appropriate reported answer according to public usage habits, user preferences, user habits, or the surrounding dialogue, and output the voice response to the user accordingly, making dialogue with the system more convenient.

Next, an example is described in which the architecture and components of the natural language understanding system 100 and the structured database 220 are used to decide, according to the number of reported answers obtained by analyzing the request information of the user's voice input, whether to operate directly according to the data type or to ask the user for further instruction, and then to operate directly according to the data type once only one reported answer remains. The benefit of offering the user this choice is that the system need not filter the reported answers on the user's behalf; instead it presents the candidate list containing the reported answers directly and lets the user decide, by selecting a reported answer, which software to run or which service to use, thereby providing a user-friendly interface.

FIG. 9 is a system diagram of a mobile terminal device according to an embodiment of the present invention. Referring to FIG. 9, in this embodiment the mobile terminal device 900 includes a voice receiving unit 910, a data processing unit 920, a display unit 930, and a storage unit 940. The data processing unit 920 is coupled to the voice receiving unit 910, the display unit 930, and the storage unit 940. The voice receiving unit 910 receives the first input voice SP1 and the second input voice SP2 and passes them to the data processing unit 920; the first input voice SP1 and the second input voice SP2 may be the voice inputs 501 and 701. The display unit 930 is controlled by the data processing unit 920 to display the first and second candidate lists 908 and 908'. The storage unit 940 stores a plurality of data, which may include the data of the structured database 220 or the characteristic database 730 described above and is not described again here. The storage unit 940 may be any type of memory in a server or computer system, such as dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, or read-only memory (ROM); the present invention is not limited in this respect, and those skilled in the art may choose according to actual needs.

In this embodiment, the data processing unit 920 functions like the natural language understanding system 100 of FIG. 1. It performs speech recognition on the first input voice SP1 to generate the first request information 902, analyzes the first request information 902 with natural language processing to generate the first keyword 904 corresponding to the first input voice SP1, and uses the first keyword 904 to find the first reported answers 906 (for example the first reported answers 511/711) in the data of the storage unit 940 (for example, the search engine 240 performs a full-text search of the structured database 220 with the keyword 108). When exactly one first reported answer 906 is found, the data processing unit 920 can directly perform the operation corresponding to the type of that answer's data. When more than one is found, the data processing unit 920 arranges the first reported answers 906 into a first candidate list 908 and controls the display unit 930 to show it to the user. While the first candidate list 908 is displayed for further selection, the data processing unit 920 receives the second input voice SP2, performs speech recognition on it to generate the second request information 902', performs natural language processing on the second request information 902' to generate the second keyword 904' corresponding to the second input voice SP2, and selects the corresponding part of the first candidate list 908 according to the second keyword 904'. The first keyword 904 and the second keyword 904' may each consist of several keywords. The second input voice SP2 can be analyzed into the second request information 902' and the second keyword 904' in the same way as the second voice input in FIGS. 5A and 7A, so the details are not repeated.

Similarly, when there is exactly one second reported answer 906', the data processing unit 920 performs the operation corresponding to its type; when there is more than one, the data processing unit 920 arranges the second reported answers 906' into a second candidate list 908' and controls the display unit 930 to display it. The corresponding part is then selected according to the user's next input voice, and the operation that follows again depends on the number of resulting reported answers; this can be inferred by analogy from the description above and is not repeated here.
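The count-based decision described above might be sketched in Python as follows, by way of example only. The function name and the action labels are hypothetical, and the handling of an empty result is an assumption, since the description addresses only the cases of one answer and of more than one.

```python
def decide_next_step(answers):
    """Map the number of reported answers to the next action.

    Returns an (action, payload) tuple; the action names are illustrative.
    """
    if len(answers) == 1:
        # Exactly one answer: act on it directly according to its data type.
        return ("execute", answers[0])
    if len(answers) > 1:
        # Several answers: show them as a candidate list and wait for the user.
        return ("show_candidate_list", list(answers))
    # Zero answers is not covered by the description; assumed here to re-prompt.
    return ("ask_again", None)


first = decide_next_step(["Beijing weather", "Shanghai weather", "Taipei weather"])
second = decide_next_step(["Beijing weather"])
```

The same function can be applied after every further input voice, which mirrors the repeated narrowing from the first candidate list to the second and beyond.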

More specifically, the data processing unit 920 compares the records 302 of the structured database 220 (for example the value data of each sub-field 308 in the title field 304) with the first keyword 904 corresponding to the first input voice SP1, as described above for FIGS. 1, 3A, 3B, and 3C. When a record 302 of the structured database 220 at least partially matches the first keyword 904, that record 302 is treated as a matching result for the first input voice SP1 (as in the matching results of FIGS. 3A/3B). If the data is a music file, the record 302 may include the song title, singer, album title, release date, play order, and so on; if a video file, the film title, release date, crew (including cast), and so on; if a web page file, the website name, page type, associated user account, and so on; if a picture file, the picture name, picture information, and so on; if a name card file, the contact's name, telephone number, address, and so on. These records 302 are examples only; the content of a record 302 depends on the actual application, and the embodiments of the present invention are not limited thereto.
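One possible reading of "at least partially matches" is sketched below in Python for illustration. The substring test, the field names, and the sample records are all assumptions of this sketch; the disclosure does not prescribe a particular matching rule or record layout.

```python
def partially_matches(record, keywords):
    """True if at least one keyword occurs in at least one field value."""
    values = [str(v).lower() for v in record.values()]
    return any(k.lower() in v for k in keywords for v in values)


def find_reported_answers(records, keywords):
    """Collect every record that at least partially matches the keywords."""
    return [r for r in records if partially_matches(r, keywords)]


# Made-up sample records of three different data types.
records = [
    {"type": "music", "song": "Blue Sky", "artist": "Zhang Wei", "album": "Daylight"},
    {"type": "name card", "name": "Zhang San", "phone": "13900000001", "address": "Beijing"},
    {"type": "video", "title": "Night Train", "released": "2010"},
]
answers = find_reported_answers(records, ["Zhang"])
```

In this sample the keyword matches the music record and the name card record but not the video record, so two reported answers result.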

Next, the data processing unit 920 determines whether the second keyword 904' corresponding to the second input voice SP2 contains an ordinal word indicating a position (as in "I want the third option" or "I choose the third one"). If it does, the data processing unit 920 selects the data at the corresponding position in the first candidate list 908. If it does not, the user may be naming one of the first reported answers 906 in the first candidate list 908 directly, so the data processing unit 920 compares the record 302 of each first reported answer 906 in the first candidate list 908 with the second keyword 904' to determine its degree of correspondence with the second input voice SP2, and then decides from these degrees whether any first reported answer 906 in the first candidate list 908 corresponds to the second input voice SP2. In an embodiment of the present invention, the data processing unit 920 makes this decision from how well each first reported answer 906 matches the second keyword 904' (for example a complete or partial match), which simplifies the selection process. The data processing unit 920 may select the data with the highest degree of correspondence as the one corresponding to the second input voice SP2.
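The two selection paths described above (an ordinal word versus the highest degree of correspondence) might be sketched in Python as follows. The ordinal vocabulary, the keyword-count scoring, and all function names are assumptions made for this example; on a tie the sketch simply keeps the earliest entry.

```python
import re

# Hypothetical ordinal vocabulary; a real system would need a fuller list.
ORDINAL_WORDS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}


def parse_ordinal(keyword):
    """Return a 1-based position if the keyword is an ordinal, else None."""
    word = keyword.lower()
    if word in ORDINAL_WORDS:
        return ORDINAL_WORDS[word]
    m = re.fullmatch(r"(\d+)(?:st|nd|rd|th)", word)
    return int(m.group(1)) if m else None


def match_degree(record, keywords):
    """Count how many keywords occur in the record's field values."""
    text = " ".join(str(v).lower() for v in record.values())
    return sum(1 for k in keywords if k.lower() in text)


def select_from_list(candidate_list, keywords):
    # Path 1: an ordinal word picks the entry at that position.
    for k in keywords:
        pos = parse_ordinal(k)
        if pos is not None:
            in_range = 1 <= pos <= len(candidate_list)
            return candidate_list[pos - 1] if in_range else None
    # Path 2: otherwise pick the entry with the highest degree of correspondence.
    best = max(candidate_list, key=lambda r: match_degree(r, keywords), default=None)
    if best is None or match_degree(best, keywords) == 0:
        return None
    return best


weather = [
    {"city": "Beijing", "forecast": "sunny"},
    {"city": "Shanghai", "forecast": "rain"},
    {"city": "Taipei", "forecast": "cloudy"},
]
by_position = select_from_list(weather, ["3rd"])
by_content = select_from_list(weather, ["Beijing", "weather"])
```

A bare number such as "139" is deliberately not treated as an ordinal here, so it falls through to content matching.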

For example, if the first input voice SP1 is "What is the weather like today", then after speech recognition and natural language processing the first keyword 904 includes "today" and "weather". The data processing unit 920 therefore reads the data for today's weather and displays it through the display unit 930 as the first candidate list 908. If the second input voice SP2 is then "I want to see the 3rd entry" or "I choose the 3rd one", the second keyword 904' includes "3rd", which is interpreted as an ordinal word, so the data processing unit 920 reads the third entry of the first candidate list 908 (that is, the third first reported answer 906) and displays the corresponding weather information through the display unit 930. Alternatively, if the second input voice SP2 is "I want to see the weather in Beijing" or "I choose the weather in Beijing", the second keyword 904' includes "Beijing" and "weather", so the data processing unit 920 reads the data for Beijing in the first candidate list 908. When this selection yields exactly one first reported answer 906, the corresponding weather information is displayed directly through the display unit 930; when it yields more than one, a further second candidate list 908' (containing at least one second reported answer 906') is displayed for the user to choose from.

If the first input voice SP1 is "I want to call Lao Zhang", then after speech recognition and natural language processing the first keyword 904 includes "telephone" and "Zhang". The data processing unit 920 therefore reads the contact data for the surname "Zhang" (for example by performing a full-text search of the structured database 220 and then retrieving the detailed data of the matching records 302) and displays a first candidate list 908 of these contacts (the first reported answers 906) through the display unit 930. If the second input voice SP2 is then "the 3rd Lao Zhang" or "I choose the 3rd one", the second keyword 904' includes "3rd", which is interpreted as an ordinal word, so the data processing unit 920 reads the third entry of the first candidate list 908 (the third first reported answer 906) and dials according to the selected data. Alternatively, if the second input voice SP2 is "I choose the one starting with 139", the second keyword 904' includes "139" and "starting with"; here "139" is not interpreted as an ordinal word, so the data processing unit 920 reads the contacts in the first candidate list 908 whose telephone numbers begin with 139. If the second input voice SP2 is "I want the Lao Zhang in Beijing", the second keyword 904' includes "Beijing" and "Zhang", and the data processing unit 920 reads the contacts in the first candidate list 908 whose address is Beijing. When exactly one first reported answer 906 is selected, the number is dialed according to the selected data; when more than one is selected, the selected first reported answers 906 become the second reported answers 906' and are arranged into a second candidate list 908' that is displayed for the user to choose from.
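The narrowing of the contact list in this example might be sketched in Python as below. The field names, the sample contacts (with made-up telephone numbers), and the action labels are assumptions of the sketch; the case where no contact remains is not addressed in the description and is not handled here.

```python
def refine_contacts(candidates, phone_prefix=None, city=None):
    """Keep only the contacts that satisfy every constraint that was given."""
    result = list(candidates)
    if phone_prefix is not None:
        result = [c for c in result if c["phone"].startswith(phone_prefix)]
    if city is not None:
        result = [c for c in result if c["address"] == city]
    return result


def next_action(selected):
    """One match: dial it. Several: show them as the second candidate list."""
    if len(selected) == 1:
        return ("dial", selected[0]["phone"])
    return ("show_second_list", selected)


# Made-up first candidate list for the keyword "Zhang".
first_list = [
    {"name": "Zhang Wei", "phone": "13800000001", "address": "Beijing"},
    {"name": "Zhang Min", "phone": "13900000002", "address": "Shanghai"},
    {"name": "Zhang Lei", "phone": "13600000003", "address": "Beijing"},
]
by_prefix = next_action(refine_contacts(first_list, phone_prefix="139"))
by_city = next_action(refine_contacts(first_list, city="Beijing"))
```

With this sample data the prefix "139" leaves a single contact, which is dialed, whereas "Beijing" leaves two contacts, which form the second candidate list.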

If the first input voice SP1 is "I am looking for a restaurant", then after speech recognition and natural language processing the first keyword 904 includes "restaurant", and the data processing unit 920 reads all first reported answers 906 that correspond to restaurants. Because this instruction is not very specific, the first candidate list 908 (containing the first reported answers 906 for all restaurant data) is displayed to the user through the display unit 930, and the system waits for further instruction. If the user then says "the 3rd restaurant" or "I choose the 3rd one" as the second input voice SP2, the second keyword 904' includes "3rd", which is interpreted as an ordinal word, so the data processing unit 920 reads the third entry of the first candidate list 908 and displays the selected data. Alternatively, if the second input voice SP2 is "I choose the nearest one", the second keyword 904' includes "nearest", so the data processing unit 920 reads the restaurant in the first candidate list 908 whose address is closest to the user. If the second input voice SP2 is "I want a restaurant in Beijing", the second keyword 904' includes "Beijing" and "restaurant", so the data processing unit 920 reads the restaurants in the first candidate list 908 whose address is in Beijing. When exactly one first reported answer 906 is selected, the selected data is displayed; when more than one is selected, the selected first reported answers 906 become the second reported answers 906' and are arranged into a second candidate list 908' that is displayed for the user to choose from.
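The "nearest" and "in Beijing" selections of this example might be sketched in Python as follows. The sketch assumes that each entry already carries a precomputed `distance_km` from the user and a `city` field; how the distance is obtained is not described in the disclosure, and the sample entries are made up.

```python
def pick_nearest(restaurants):
    """Return the restaurant with the smallest distance to the user."""
    return min(restaurants, key=lambda r: r["distance_km"])


def filter_by_city(restaurants, city):
    """Return the restaurants whose address lies in the given city."""
    return [r for r in restaurants if r["city"] == city]


# Made-up first candidate list; distances are assumed to be precomputed.
first_list = [
    {"name": "Restaurant A", "city": "Beijing", "distance_km": 4.2},
    {"name": "Restaurant B", "city": "Tianjin", "distance_km": 120.0},
    {"name": "Restaurant C", "city": "Beijing", "distance_km": 2.5},
]
nearest = pick_nearest(first_list)
in_beijing = filter_by_city(first_list, "Beijing")
```

"Nearest" always yields a single entry that can be displayed at once, while a city filter may leave several entries that then form the second candidate list.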

As described above, the data processing unit 920 performs the operation corresponding to the data type of the selected first reported answer 906 (or second reported answer 906'). For example, when the data of the selected first reported answer 906 is a music file, the data processing unit 920 plays the music; when it is a video file, the data processing unit 920 plays the video; when it is a web page file, the data processing unit 920 displays the page; when it is a picture file, the data processing unit 920 displays the picture; and when it is a name card file, the data processing unit 920 dials according to the selected data.
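The type-dependent operation listed above might be sketched in Python as a simple lookup table. The type labels, the operation names, and the `None` result for an unlisted type are assumptions of this sketch; an actual device would invoke its player, browser, viewer, or dialer instead of returning a string.

```python
# Hypothetical mapping from data type to the operation named in the text.
OPERATIONS = {
    "music": "play music",
    "video": "play video",
    "web page": "display web page",
    "picture": "display picture",
    "name card": "dial number",
}


def operation_for(answer):
    """Return the operation for the selected answer's data type.

    Unlisted types yield None; the description does not cover that case.
    """
    return OPERATIONS.get(answer["type"])


selected = {"type": "name card", "name": "Zhang Min", "phone": "13900000002"}
operation = operation_for(selected)
```

Keeping the mapping in one table makes it straightforward to add further data types without changing the selection logic.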

FIG. 10 is a system diagram of an information system according to an embodiment of the present invention. Referring to FIGS. 9 and 10, in this embodiment the information system 1000 includes a mobile terminal device 1010 and a server 1020. The server 1020 may be a cloud server, a local area network server, or a similar device, but the embodiments of the present invention are not limited thereto. The mobile terminal device 1010 includes a voice receiving unit 1011, a data processing unit 1013, and a display unit 1015. The data processing unit 1013 is coupled to the voice receiving unit 1011, the display unit 1015, and the server 1020. The mobile terminal device 1010 may be a mobile communication device such as a cell phone, a personal digital assistant (PDA) phone, or a smart phone; the present invention is not limited in this respect either. The voice receiving unit 1011 functions like the voice receiving unit 910, and the display unit 1015 functions like the display unit 930. The server 1020 stores a plurality of data and provides speech recognition.

In this embodiment, the data processing unit 1013 has the server 1020 perform speech recognition on the first input voice SP1 to generate the first request information 902 and then natural language processing on the first request information 902 to generate the first keyword 904 corresponding to the first input voice SP1. The server 1020 performs a full-text search of the structured database 220 with the first keyword 904 to find the first reported answers 906 and sends them to the data processing unit 1013. When there is exactly one first reported answer 906, the data processing unit 1013 performs the operation corresponding to its data type; when there is more than one, the data processing unit 1013 arranges the first reported answers 906 into the first candidate list 908, controls the display unit 1015 to show it to the user, and waits for further instruction. After the user gives that instruction, the data processing unit 1013 has the server 1020 perform speech recognition on the second input voice SP2 to generate the second request information 902' and then analysis and natural language processing on the second request information 902' to generate the second keyword 904' corresponding to the second input voice SP2. The server 1020 picks the matching first reported answers 906 from the first candidate list 908 according to the second keyword 904', treats them as the second reported answers 906', and sends them to the data processing unit 1013. Similarly, when there is exactly one second reported answer 906', the data processing unit 1013 performs the operation corresponding to the type of its data; when there is more than one, the data processing unit 1013 arranges the second reported answers 906' into a second candidate list 908' and controls the display unit 1015 to show it to the user for further selection. The server 1020 then selects the corresponding part according to subsequent input voices, and the data processing unit 1013 again acts according to the number of selected data; this can be inferred by analogy from the description above and is not repeated here.

It should be noted that, in one embodiment, if the number of first reported answers 906 selected according to the first keyword 904 corresponding to the first input speech SP1 is 1, the operation corresponding to that data may be performed directly. In another embodiment, a prompt may first be output to notify the user that the operation corresponding to the selected first reported answer 906 is about to be performed. Likewise, in yet another embodiment, when the number of second reported answers 906' selected according to the second keyword 904' corresponding to the second input speech SP2 is 1, the operation corresponding to that data may be performed directly. Of course, in another embodiment, a prompt may also first be output to notify the user that the operation corresponding to the selected data is about to be performed. The present invention is not limited in any of these respects.

More specifically, the server 1020 compares each record 302 of the structured database 220 with the first keyword 904 corresponding to the first input speech SP1. When a record 302 at least partially matches the first keyword 904, that record 302 is regarded as data matched by the first input speech SP1 and is taken as one of the first reported answers 906. If the number of first reported answers 906 selected according to the first keyword 904 is greater than 1, the user may give a further instruction through the second input speech SP2. The instruction given through the second input speech SP2 may specify an order (indicating which item of the displayed information to select), may directly designate one of the displayed items (for example by stating the content of an item), or may express an intent that has to be determined from the instruction (for example, if the user asks for the nearest restaurant, the "nearest" restaurant is displayed to the user). The server 1020 therefore next determines whether the second keyword 904' corresponding to the second input speech SP2 contains an ordinal term indicating an order. When the second keyword 904' contains an ordinal term, the server 1020 selects, according to the ordinal term, the first reported answer 906 located at the corresponding position in the first candidate list 908. When the second keyword 904' does not contain an ordinal term, the server 1020 compares each first reported answer 906 in the first candidate list 908 with the second keyword 904' to determine the degree of correspondence between each first reported answer 906 and the second input speech SP2, and may decide from these degrees of correspondence whether a first reported answer 906 in the first candidate list 908 corresponds to the second input speech SP2. In an embodiment of the present invention, the server 1020 may decide which first reported answers 906 in the first candidate list 908 correspond to the second input speech SP2 according to the degree of correspondence between the first reported answers 906 and the second keyword 904', so as to simplify the selection procedure. In particular, the server 1020 may select the first reported answer 906 with the highest degree of correspondence as the one corresponding to the second input speech SP2.
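The two-branch selection rule described above (ordinal term first, otherwise highest degree of correspondence) can be illustrated with a minimal Python sketch. The ordinal vocabulary `ORDINALS`, the helper `match_degree`, and the keyword-counting metric are assumptions made for illustration; the description does not specify how the degree of correspondence is computed.

```python
from typing import List, Optional

# Hypothetical ordinal vocabulary; the description only says that the keyword
# may contain "an ordinal term indicating an order".
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "last": -1}


def match_degree(answer: str, keywords: List[str]) -> int:
    """Count how many keywords occur in the answer text (a stand-in metric)."""
    text = answer.lower()
    return sum(1 for kw in keywords if kw.lower() in text)


def select_from_candidates(candidates: List[str], keywords: List[str]) -> Optional[str]:
    """Pick one candidate by ordinal term if present, else by best match."""
    for kw in keywords:
        index = ORDINALS.get(kw.lower())
        if index is not None:
            # Ordinal branch: take the answer at the indicated list position.
            if -len(candidates) <= index < len(candidates):
                return candidates[index]
            return None  # the ordinal points outside the candidate list
    if not candidates:
        return None
    # Content branch: the candidate with the highest degree of correspondence.
    best = max(candidates, key=lambda c: match_degree(c, keywords))
    return best if match_degree(best, keywords) > 0 else None
```

In this sketch a keyword such as "third" selects the third entry of the candidate list, while keywords without an ordinal term fall through to the content comparison.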

FIG. 11 is a flowchart of a selection method based on speech recognition according to an embodiment of the present invention. Referring to FIG. 11, in this embodiment a first input speech SP1 is received (step S1100), speech recognition is performed on the first input speech SP1 to generate first request information 902 (step S1110), and analysis and natural language processing are performed on the first request information 902 to generate a first keyword 904 corresponding to the first input speech SP1 (step S1120). Next, the corresponding first reported answers 906 are selected from a plurality of data according to the first keyword 904 (step S1130), and it is determined whether the number of selected first reported answers 906 is 1 (step S1140). When the number of selected first reported answers 906 is 1, that is, when the result of step S1140 is "yes", the operation corresponding to the data type of the first reported answer 906 is performed (step S1150). When the number of selected first reported answers 906 is greater than 1, that is, when the result of step S1140 is "no", the first candidate list 908 is displayed according to the selected first reported answers 906 and the second input speech SP2 is received (step S1160), speech recognition is performed on the second input speech SP2 to generate second request information 902' (step S1170), and analysis and natural language processing are performed on the second request information 902' to generate a second keyword 904' corresponding to the second input speech SP2 (step S1180). Next, the corresponding portion is selected from the first reported answers 906 in the first candidate list 908 according to the second request information 902' (step S1190), and the flow returns to step S1140 to determine whether the number of selected first reported answers 906 is 1. The order of the above steps is given for illustration, and embodiments of the present invention are not limited to it. For details of the steps, refer to the embodiments of FIG. 9 and FIG. 10; they are not repeated here.
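The loop of FIG. 11 can be sketched in Python as follows. This is only an illustration under stated assumptions: speech recognition and natural language processing are replaced by a hypothetical `next_keywords` callable that returns keywords that have already been extracted, `search` uses simple substring matching, and the `max_rounds` guard and the handling of zero matches are additions that the flowchart does not describe.

```python
from typing import Callable, List, Optional


def search(items: List[str], keywords: List[str]) -> List[str]:
    """Keep items that contain at least one keyword (partial match)."""
    return [it for it in items if any(k.lower() in it.lower() for k in keywords)]


def selection_loop(
    records: List[str],
    next_keywords: Callable[[], List[str]],
    show: Callable[[List[str]], None],
    act: Callable[[str], None],
    max_rounds: int = 5,
) -> Optional[str]:
    """Narrow the answers round by round until exactly one remains."""
    answers = search(records, next_keywords())        # S1100 to S1130
    rounds = 0
    while len(answers) > 1 and rounds < max_rounds:   # S1140 is "no"
        show(answers)                                 # S1160: display candidate list
        answers = search(answers, next_keywords())    # S1170 to S1190
        rounds += 1
    if len(answers) == 1:                             # S1140 is "yes"
        act(answers[0])                               # S1150
        return answers[0]
    return None  # zero matches or still ambiguous: not covered by FIG. 11
```

Each pass through the `while` body corresponds to one additional input speech from the user, and the candidate list shown in a round is the list being narrowed in that round.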

In summary, the selection method based on speech recognition, and the mobile terminal device and information system using it, according to embodiments of the present invention perform speech recognition and natural language processing on the first input speech and the second input speech to identify the keywords corresponding to each, and then select among the reported answers according to those keywords. This makes operation more convenient for the user.

Next, an operating example is described in which the architecture and components disclosed in the present invention, such as the natural language understanding system 100 and the structured database 220, are combined with an auxiliary activation device.

FIG. 12 is a block diagram of a voice control system according to an embodiment of the present invention. Referring to FIG. 12, the voice control system 1200 includes an auxiliary activation device 1210, a mobile terminal device 1220, and a server 1230. In this embodiment, the auxiliary activation device 1210 uses a wireless transmission signal to activate the voice system of the mobile terminal device 1220, so that the mobile terminal device 1220 communicates with the server 1230 according to a voice signal.

In detail, the auxiliary activation device 1210 includes a first wireless transmission module 1212 and a trigger module 1214, where the trigger module 1214 is coupled to the first wireless transmission module 1212. The first wireless transmission module 1212 is, for example, a device supporting a communication protocol such as Wireless Fidelity (Wi-Fi), Worldwide Interoperability for Microwave Access (WiMAX), Bluetooth, ultra-wideband (UWB), or radio-frequency identification (RFID); it can emit a wireless transmission signal so as to pair with another wireless transmission module and establish a wireless link. The trigger module 1214 is, for example, a button or a key. In this embodiment, when the user presses the trigger module 1214, a trigger signal is generated; the first wireless transmission module 1212 receives the trigger signal and is activated, and it then transmits the wireless transmission signal to the mobile terminal device 1220. In one embodiment, the auxiliary activation device 1210 may be a Bluetooth headset.

It is worth noting that, although some existing hands-free headsets/microphones are also designed to activate certain functions of the mobile terminal device 1220, in another embodiment of the present invention the auxiliary activation device 1210 may differ from such headsets/microphones. Those headsets/microphones connect to the mobile terminal device and take the place of its built-in earpiece/microphone for listening and talking, with the activation function as an add-on. By contrast, the auxiliary activation device 1210 of the present invention is used "only" to turn on the voice system in the mobile terminal device 1220 and has no listening/talking function, so its internal circuitry can be simpler and its cost lower. In other words, the auxiliary activation device 1210 is a separate device from such a hands-free headset/microphone; a user may have both a hands-free headset/microphone and the auxiliary activation device 1210 of the present invention.

In addition, the auxiliary activation device 1210 may take the form of an article within easy reach of the user, such as a ring, watch, earring, necklace, glasses, or other accessory (that is, any kind of portable personal article), or of a mounted component such as a driving accessory fitted to a steering wheel; the forms are not limited to these. In other words, the auxiliary activation device 1210 is an "everyday" device whose internal system is arranged so that the user can easily reach the trigger module 1214 to turn on the voice system. For example, when the auxiliary activation device 1210 takes the form of a ring, the user can easily move a finger to press the trigger module 1214 on the ring. When the auxiliary activation device 1210 takes the form of a device fitted to a driving accessory, the user can likewise easily trigger the trigger module 1214 while driving. Moreover, compared with the discomfort of wearing a headset/microphone for listening and talking, the auxiliary activation device 1210 of the present invention can turn on the voice system in the mobile terminal device 1220 and even turn on the speakerphone function (described in detail later), so that the user can listen and talk directly through the mobile terminal device 1220 without wearing a headset/microphone. Furthermore, these "everyday" auxiliary activation devices 1210 are articles the user would wear or use anyway, so they cause no unfamiliarity or discomfort and require no time to get used to. For example, when a user cooking in the kitchen needs to place a call on a mobile phone left in the living room, a user wearing an auxiliary activation device 1210 of the present invention in the form of a ring, necklace, or watch can tap the ring, necklace, or watch to turn on the voice system and ask a friend for recipe details. Although some existing headsets/microphones with an activation function can achieve the same purpose, the user does not need to call a friend every time he or she cooks, so wearing a headset/microphone at all times while cooking just to be able to control the mobile terminal device is quite inconvenient.

In other embodiments, the auxiliary activation device 1210 may further be equipped with a wireless rechargeable battery 1216 for powering the first wireless transmission module 1212. More specifically, the wireless rechargeable battery 1216 includes a battery unit 12162 and a wireless charging module 12164, where the wireless charging module 12164 is coupled to the battery unit 12162. Here, the wireless charging module 12164 can receive energy supplied by a wireless power supply device (not shown) and convert this energy into electricity to charge the battery unit 12162. In this way, the first wireless transmission module 1212 of the auxiliary activation device 1210 can conveniently be charged by way of the wireless rechargeable battery 1216.

The mobile terminal device 1220, on the other hand, is for example a cell phone, a personal digital assistant (PDA) phone, a smartphone, or a pocket PC, tablet PC, or notebook computer with communication software installed. The mobile terminal device 1220 may be any portable mobile device with a communication function; its scope is not limited here. In addition, the mobile terminal device 1220 may run an Android operating system, a Microsoft operating system, a Linux operating system, and so on, without being limited to these.

The mobile terminal device 1220 includes a second wireless transmission module 1222, which can pair with the first wireless transmission module 1212 of the auxiliary activation device 1210 and uses the corresponding wireless communication protocol (for example Wi-Fi, WiMAX, Bluetooth, UWB, or RFID) to establish a wireless link with the first wireless transmission module 1212. Note that the terms "first" wireless transmission module 1212 and "second" wireless transmission module 1222 are used only to indicate that the wireless transmission modules are located in different devices, and are not intended to limit the present invention.

In other embodiments, the mobile terminal device 1220 further includes a voice system 1221 coupled to the second wireless transmission module 1222, so that after the user triggers the trigger module 1214 of the auxiliary activation device 1210, the voice system 1221 can be activated wirelessly through the first wireless transmission module 1212 and the second wireless transmission module 1222. In one embodiment, the voice system 1221 may include a voice sampling module 1224, a speech synthesis module 1226, and a voice output interface 1227. The voice sampling module 1224 receives voice signals from the user and is, for example, a microphone or other audio-receiving device. The speech synthesis module 1226 can query a speech synthesis database, which for example records text together with the corresponding speech, so that the speech synthesis module 1226 can find the speech corresponding to a particular text message and synthesize the text message into speech. The speech synthesis module 1226 can then output the synthesized speech through the voice output interface 1227 to play it to the user. The voice output interface 1227 is, for example, a speaker or an earphone.

In addition, the mobile terminal device 1220 may be equipped with a communication module 1228. The communication module 1228 is, for example, a component that can transmit and receive wireless signals, such as a radio-frequency transceiver. More specifically, the communication module 1228 lets the user answer or place calls, or use other services provided by a telecommunications operator, through the mobile terminal device 1220. In this embodiment, the communication module 1228 can receive response information from the server 1230 over the Internet and, according to this response information, establish a call connection between the mobile terminal device 1220 and at least one electronic device, where the electronic device is, for example, another mobile terminal device (not shown).

The server 1230 is, for example, a network server or a cloud server, and has a speech understanding module 1232. In this embodiment, the speech understanding module 1232 includes a speech recognition module 12322 and a speech processing module 12324, where the speech processing module 12324 is coupled to the speech recognition module 12322. Here, the speech recognition module 12322 receives the voice signal sent from the voice sampling module 1224 and converts it into a plurality of semantic segments (for example keywords or phrases). The speech processing module 12324 then parses what these semantic segments denote (for example intent, time, or place) and thereby determines the meaning expressed in the voice signal. The speech processing module 12324 also generates corresponding response information according to the parsing result. In this embodiment, the speech understanding module 1232 may be implemented as a hardware circuit built from one or more logic gates, or as computer program code. It is worth mentioning that, in another embodiment, the speech understanding module 1232 may be located in the mobile terminal device 1320, as in the voice control system 1300 shown in FIG. 13. The speech understanding module 1232 of the server 1230 may operate in the manner of the natural language understanding system 100 of FIG. 1A or the natural language dialogue systems 500/700/700' of FIGS. 5A/7A/7B.

The voice control method is described below with reference to the voice control system 1200. FIG. 14 is a flowchart of a voice control method according to an embodiment of the present invention. Referring to both FIG. 12 and FIG. 14, in step S1402 the auxiliary activation device 1210 sends a wireless transmission signal to the mobile terminal device 1220. In detail, when the first wireless transmission module 1212 of the auxiliary activation device 1210 is triggered by receiving a trigger signal, the auxiliary activation device 1210 sends the wireless transmission signal to the mobile terminal device 1220. Specifically, when the user presses the trigger module 1214 in the auxiliary activation device 1210, the trigger module 1214 is triggered and issues the trigger signal, which causes the first wireless transmission module 1212 to send the wireless transmission signal to the second wireless transmission module 1222 of the mobile terminal device 1220, so that the first wireless transmission module 1212 links with the second wireless transmission module 1222 through the wireless communication protocol. The auxiliary activation device 1210 is used only to turn on the voice system in the mobile terminal device 1220 and has no listening/talking function, so its internal circuitry can be simpler and its cost lower. In other words, the auxiliary activation device 1210 is a separate device from the hands-free headset/microphone that typically accompanies a mobile terminal device 1220; a user may have both a hands-free headset/microphone and the auxiliary activation device 1210 of the present invention.

It is worth mentioning that the auxiliary activation device 1210 may take the form of an article within easy reach of the user, such as a ring, watch, earring, necklace, glasses, or other portable personal article, or of a mounted component such as a driving accessory fitted to a steering wheel; the forms are not limited to these. In other words, the auxiliary activation device 1210 is an "everyday" device whose internal system is arranged so that the user can easily reach the trigger module 1214 to turn on the voice system 1221. The auxiliary activation device 1210 of the present invention can therefore turn on the voice system 1221 in the mobile terminal device 1220 and even turn on the speakerphone function (described in detail later), so that the user can listen and talk directly through the mobile terminal device 1220 without wearing a headset/microphone. In addition, these "everyday" auxiliary activation devices 1210 are articles the user would wear or use anyway, so they cause no unfamiliarity or discomfort in use.

In addition, the first wireless transmission module 1212 and the second wireless transmission module 1222 can each be in a sleep mode or a working mode. In sleep mode, the wireless transmission module is off: it does not receive or detect wireless transmission signals and cannot link with other wireless transmission modules. In working mode, the wireless transmission module is on: it can continuously detect wireless transmission signals or send them at any time, and can therefore link with other wireless transmission modules. When the trigger module 1214 is triggered while the first wireless transmission module 1212 is in sleep mode, the trigger module 1214 wakes the first wireless transmission module 1212 so that it enters working mode and sends the wireless transmission signal to the second wireless transmission module 1222, whereupon the first wireless transmission module 1212 links with the second wireless transmission module 1222 of the mobile terminal device 1220 through the wireless communication protocol.

On the other hand, to prevent the first wireless transmission module 1212 from consuming too much power by remaining in working mode, if the trigger module 1214 is not triggered again within a preset time (for example 5 minutes) after the first wireless transmission module 1212 enters working mode, the first wireless transmission module 1212 goes from working mode to sleep mode and drops its link with the second wireless transmission module 1222 of the mobile terminal device 1220.
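The sleep/working behavior of the first wireless transmission module 1212 amounts to a two-state machine with an idle timeout, which can be sketched as below. The class name `WirelessModule`, the explicit `now` timestamps (in seconds), and the choice to restart the idle timer on every trigger are assumptions made for illustration; the description only states that the module returns to sleep mode if the trigger module is not triggered again within the preset time.

```python
class WirelessModule:
    """Two-state model of a wireless transmission module with an idle timeout."""

    SLEEP = "sleep"
    WORKING = "working"

    def __init__(self, idle_timeout_s: float = 300.0) -> None:
        self.idle_timeout_s = idle_timeout_s  # preset time, e.g. 5 minutes
        self.mode = self.SLEEP
        self.linked = False
        self._last_trigger = 0.0

    def on_trigger(self, now: float) -> None:
        """Trigger pressed: wake the module if asleep and (re)establish the link."""
        self.tick(now)             # apply a timeout that may already have elapsed
        self.mode = self.WORKING
        self.linked = True
        self._last_trigger = now   # assumption: each trigger restarts the timer

    def tick(self, now: float) -> None:
        """Drop to sleep mode once the preset time passes without a trigger."""
        if self.mode == self.WORKING and now - self._last_trigger >= self.idle_timeout_s:
            self.mode = self.SLEEP
            self.linked = False
```

Passing the time in explicitly keeps the sketch deterministic; a real device would more likely rely on a hardware timer.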

Then, in step S1404, the second wireless transmission module 1222 of the mobile terminal device 1220 receives the wireless transmission signal in order to activate the voice system 1221. Next, in step S1406, when the second wireless transmission module 1222 detects the wireless transmission signal, the mobile terminal device 1220 activates the voice system 1221, and the voice sampling module 1224 of the voice system 1221 begins receiving voice signals such as "What is the temperature today?", "Call Lao Wang.", or "Please look up a phone number."

In step S1408, the voice sampling module 1224 sends the voice signal to the speech understanding module 1232 in the server 1230, which parses the voice signal and generates response information. More specifically, the speech recognition module 12322 in the speech understanding module 1232 receives the voice signal from the voice sampling module 1224 and divides it into a plurality of semantic segments, and the speech processing module 12324 performs speech understanding on these semantic segments to generate the response information that answers the voice signal.

In another embodiment of the present invention, the mobile terminal device 1220 may further receive the response information generated by the speech processing module 12324 and accordingly output the content of the response information through the voice output interface 1227 or carry out the operation that the response information instructs. In step S1410, the speech synthesis module 1226 of the mobile terminal device 1220 receives the response information generated by the speech understanding module 1232 and performs speech synthesis on its content (for example words or phrases) to produce a voice response. In step S1412, the voice output interface 1227 receives and outputs this voice response.

For example, when the user presses the trigger module 1214 in the auxiliary activation device 1210, the first wireless transmission module 1212 sends the wireless transmission signal to the second wireless transmission module 1222, and the mobile terminal device 1220 activates the voice sampling module 1224 of the voice system 1221. Suppose the voice signal from the user is a question, such as "What is the temperature today?". The voice sampling module 1224 receives this voice signal and sends it to the speech understanding module 1232 in the server 1230 for parsing, and the speech understanding module 1232 sends the response information produced by the parsing back to the mobile terminal device 1220. If the content of the response information is "30°C", the speech synthesis module 1226 synthesizes "30°C" into a voice response, and the voice output interface 1227 plays this voice response to the user.
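The sequence of steps S1402 to S1412 can be summarized in a short Python sketch. The function `voice_control_round` and its callable parameters are hypothetical stand-ins for the voice sampling module 1224, the speech understanding module 1232, the speech synthesis module 1226, and the voice output interface 1227; real sampling, recognition, and synthesis are not modeled.

```python
from typing import Callable, List


def voice_control_round(
    triggered: bool,
    sample_voice: Callable[[], str],
    understand: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    output: Callable[[bytes], None],
) -> List[str]:
    """Run one round of the flow and return the steps that were executed."""
    steps: List[str] = []
    if not triggered:
        return steps                       # no wireless signal: voice system stays off
    steps += ["S1402", "S1404", "S1406"]   # signal sent, received, voice system started
    voice_signal = sample_voice()          # sampling module picks up the utterance
    steps.append("S1408")
    response = understand(voice_signal)    # parsing yields the response information
    steps.append("S1410")
    audio = synthesize(response)           # response content turned into speech
    steps.append("S1412")
    output(audio)                          # voice response played to the user
    return steps
```

With a stub `understand` that answers the temperature question with "30°C", the value passed to `output` is the synthesized form of that text, mirroring the example above.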

In another embodiment, suppose the voice signal from the user is a command, such as "Call Lao Wang." The speech understanding module 1232 can then recognize this command as a request to place a call to Lao Wang. In addition, the speech understanding module 1232 generates new response information, such as "Please confirm whether to call Lao Wang", and transmits this new response information to the mobile terminal device 1220. The speech synthesis module 1226 synthesizes the new response information into a voice response, which is played to the user through the voice output interface 1227. Furthermore, when the user replies with an affirmative answer such as "Yes", the voice sampling module 1224 similarly receives this voice signal and transmits it to the server 1230 for the speech understanding module 1232 to parse. After parsing, the speech understanding module 1232 records dial instruction information in the response information and transmits it to the mobile terminal device 1220. The communication module 1228 then looks up Lao Wang's phone number in the contact information recorded in the phone database and establishes a call connection between the mobile terminal device 1220 and another electronic device, that is, it dials Lao Wang.
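The two-turn confirmation exchange above can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the `DialogSession` class, the `AFFIRMATIVE` word set, the dictionary-shaped response information, and the sample phone number are not taken from the patent, and utterances are assumed to arrive as already-recognized English text. The server side only decides that a dial instruction should be issued; the number lookup happens on the device, as in the paragraph above.

```python
from typing import Dict, Optional

# Hypothetical set of replies treated as an affirmative answer.
AFFIRMATIVE = {"yes", "yeah", "ok", "sure"}


class DialogSession:
    """Stand-in for the server-side dialog logic of the speech understanding module."""

    def __init__(self) -> None:
        self.pending_contact: Optional[str] = None

    def handle(self, text: str) -> Dict[str, str]:
        cleaned = text.strip().rstrip(".!?")
        normalized = cleaned.lower()
        if self.pending_contact is not None:
            # Second turn: the user is answering the confirmation prompt.
            contact = self.pending_contact
            self.pending_contact = None
            if normalized in AFFIRMATIVE:
                return {"command": "dial", "contact": contact}
            return {"speech": "Cancelled."}
        if normalized.startswith("call "):
            # First turn: recognize the call request and ask for confirmation.
            self.pending_contact = cleaned[len("call "):].strip()
            return {"speech": f"Please confirm whether to call {self.pending_contact}."}
        return {"speech": "Sorry, I did not understand."}


def lookup_number(response: Dict[str, str], phone_book: Dict[str, str]) -> Optional[str]:
    """Device side: resolve a dial instruction against the local phone database."""
    if response.get("command") == "dial":
        return phone_book.get(response["contact"])
    return None


if __name__ == "__main__":
    session = DialogSession()
    print(session.handle("Call Lao Wang."))
    reply = session.handle("Yes")
    print(lookup_number(reply, {"Lao Wang": "555-0100"}))
```

Keeping the phone book out of `DialogSession` mirrors the description, in which the communication module 1228 on the device, not the server, queries the phone database.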

In other embodiments, besides the voice control system 1200 described above, the voice control system 1300 or another similar system may also be used to carry out the above operating method; the invention is not limited to the above embodiments.

In summary, in the voice control system and method of this embodiment, the auxiliary activation device can wirelessly turn on the voice function of the mobile terminal device. Moreover, the auxiliary activation device may take the form of an everyday article within the user's easy reach, such as a ring, watch, earring, necklace, glasses, or other accessory (that is, any kind of portable personal item), or of a mounted component, such as a driving accessory arranged on a steering wheel; it is not limited to these examples. Compared with the discomfort of additionally wearing a hands-free headset/microphone, using the auxiliary activation device 1210 of the present invention to turn on the voice system in the mobile terminal device 1220 is therefore more convenient.

It is worth noting that the server 1230 with the speech understanding module may be a network server or a cloud server, and a cloud server may raise user privacy concerns. For example, the user would need to upload the complete contact list to the cloud server in order to complete contact-related operations such as making calls and sending text messages. Even if the cloud server uses an encrypted connection and processes the data on the fly without storing it, users' worries are hard to eliminate. Accordingly, another voice control method and its corresponding voice interaction system are provided below, in which the mobile terminal device can use the voice interaction service with the cloud server without uploading the complete contact list. To make the content of the present invention clearer, the following embodiments are given as examples by which the invention can actually be implemented.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the claims.

Claims (20)

1. A selection method based on speech recognition, comprising:
Receiving a first voice input;
Performing speech recognition on the first voice input to produce a first keyword;
Performing natural language processing on the first keyword to produce a first user intent corresponding to the first voice input;
Selecting at least one first report answer according to the first user intent;
When the number of selected first report answers is 1, performing a corresponding operation according to the type of the selected first report answer;
When the number of selected first report answers is greater than 1, displaying a first candidate list comprising the first report answers;
Receiving a second voice input, and performing speech recognition on the second voice input to produce a second keyword;
Performing natural language processing on the second keyword to produce a second user intent corresponding to the second voice input; and
Selecting a second report answer from the first report answers in the first candidate list according to the second user intent.
2. The selection method based on speech recognition as claimed in claim 1, wherein the step of selecting the first report answer according to the first user intent comprises:
Comparing a record stored in a structured database with the first user intent; and
When the record at least partially matches the first user intent, regarding the record as the first report answer corresponding to the first voice input.
3. The selection method based on speech recognition as claimed in claim 1, wherein the step of selecting the second report answer from the first report answers in the first candidate list according to the second user intent comprises:
Determining whether the second user intent comprises an ordinal word indicating an order;
When the second user intent comprises the ordinal word indicating an order, selecting, according to the ordinal word, the first report answer located at the corresponding position in the first candidate list;
When the second user intent does not comprise the ordinal word indicating an order, comparing the record corresponding to each first report answer in the first candidate list with the second user intent; and
Determining, according to the comparison result, which first report answer in the first candidate list corresponds to the second voice input.
4. The selection method based on speech recognition as claimed in claim 3, wherein the step of determining, according to the comparison result, which first report answer in the first candidate list corresponds to the second voice input comprises:
Selecting, from the first report answers, the one with the highest matching degree as corresponding to the second voice input.
5. The selection method based on speech recognition as claimed in claim 1, wherein the step of performing the corresponding operation according to the type of the selected first report answer comprises:
When the type of the selected first report answer is a music file, playing music according to the selected first report answer;
When the type of the selected first report answer is a video file, playing video according to the selected first report answer;
When the type of the selected first report answer is a web page file, displaying the selected first report answer;
When the type of the selected first report answer is a picture file, displaying the picture according to the selected first report answer; and
When the type of the selected first report answer is a business card file, dialing and connecting according to the selected first report answer.
6. A mobile terminal device, comprising:
A voice receiving unit, receiving a first voice input and a second voice input;
A display unit;
A storage unit, configured to store a plurality of data; and
A data processing unit, coupled to the voice receiving unit, the display unit, and the storage unit, wherein the data processing unit performs speech recognition on the first voice input to produce a first keyword, performs natural language processing on the first keyword to produce a first user intent corresponding to the first voice input, and selects a first report answer according to the first user intent; when the number of selected first report answers is 1, the data processing unit performs a corresponding operation according to the type of the selected first report answer; when the number of selected first report answers is greater than 1, the data processing unit controls the display unit to display a first candidate list comprising the first report answers; and the data processing unit performs speech recognition on the second voice input to produce a second keyword, performs natural language processing on the second keyword to produce a second user intent corresponding to the second voice input, and selects a second report answer from the first report answers in the first candidate list according to the second user intent.
7. The mobile terminal device as claimed in claim 6, wherein the data processing unit compares the record corresponding to each first report answer with the first user intent, and when a record at least partially matches the first user intent, the record is regarded as the first report answer corresponding to the first voice input.
8. The mobile terminal device as claimed in claim 6, wherein the data processing unit determines whether the second user intent comprises an ordinal word indicating an order; when the second user intent comprises the ordinal word indicating an order, the data processing unit selects, according to the ordinal word, the first report answer located at the corresponding position in the first candidate list; when the second user intent does not comprise the ordinal word indicating an order, the data processing unit compares the record corresponding to each first report answer in the first candidate list with the second user intent to determine a matching degree between each first report answer and the second voice input, and determines, according to the matching degrees, which first report answer in the first candidate list corresponds to the second voice input.
9. The mobile terminal device as claimed in claim 8, wherein the data processing unit selects, from the first report answers, the one with the highest matching degree as corresponding to the second voice input.
10. The mobile terminal device as claimed in claim 6, wherein when the type of the selected first report answer is a music file, the data processing unit plays music according to the selected first report answer; when the type of the selected first report answer is a video file, the data processing unit plays video according to the selected first report answer; when the type of the selected first report answer is a web page file, the data processing unit displays the selected first report answer; when the type of the selected first report answer is a picture file, the data processing unit displays the picture according to the selected first report answer; and when the type of the selected first report answer is a business card file, the data processing unit dials and connects according to the selected first report answer.
11. An information system, comprising:
A server, configured to store a plurality of data and having a speech recognition function; and
A mobile terminal device, comprising:
A voice receiving unit, receiving a first voice input and a second voice input;
A display unit;
A data processing unit, coupled to the voice receiving unit, the display unit, and the server, wherein the data processing unit, through the server, performs speech recognition on the first voice input to produce a first keyword and performs natural language processing on the first keyword to produce a first user intent corresponding to the first voice input, and the server selects, according to the first user intent, at least one corresponding first report answer from the records contained in a structured database and sends it to the data processing unit; when the number of selected first report answers is 1, the data processing unit performs a corresponding operation according to the type of the selected first report answer; when the number of selected first report answers is greater than 1, the data processing unit controls the display unit, according to the selected first report answers, to display a first candidate list comprising the first report answers; and the data processing unit, through the server, performs speech recognition on the second voice input to produce a second keyword and performs natural language processing on the second keyword to produce a second user intent corresponding to the second voice input, and the server selects a second report answer from the first report answers in the first candidate list according to the second user intent and sends it to the data processing unit.
12. The information system as claimed in claim 11, wherein the server compares the record of each first report answer with the first user intent, and when a record at least partially matches the first user intent, the record is regarded as the first report answer corresponding to the first voice input.
13. The information system as claimed in claim 11, wherein the server determines whether the second user intent comprises an ordinal word indicating an order; when the second user intent comprises the ordinal word indicating an order, the server selects, according to the ordinal word, the first report answer located at the corresponding position in the first candidate list; when the second user intent does not comprise the ordinal word indicating an order, the server compares each record in the first candidate list with the second user intent to determine a matching degree between each first report answer and the second voice input, and determines, according to the matching degree, which first report answer in the first candidate list corresponds to the second voice input.
14. The information system as claimed in claim 13, wherein the server selects, from the first report answers, the one with the highest matching degree as corresponding to the second voice input.
15. The information system as claimed in claim 11, wherein when the type of the selected first report answer is a music file, the data processing unit plays music according to the selected first report answer; when the type of the selected first report answer is a video file, the data processing unit plays video according to the selected first report answer; when the type of the selected first report answer is a web page file, the data processing unit displays the selected first report answer; when the type of the selected first report answer is a picture file, the data processing unit displays the picture according to the selected first report answer; and when the type of the selected first report answer is a business card file, the data processing unit dials and connects according to the selected first report answer.
16. A selection method based on speech recognition, comprising:
Performing speech recognition on a first voice input to produce a first keyword;
Retrieving in a structured database according to the first keyword to obtain at least one first report answer;
When the number of selected first report answers is greater than 1, displaying a first candidate list comprising the first report answers;
After displaying the first candidate list, receiving a second voice input, and performing speech recognition on the second voice input to produce a second keyword; and
Selecting a second report answer from the first report answers in the first candidate list according to the second user intent.
17. The selection method based on speech recognition as claimed in claim 16, wherein the step of retrieving in a structured database according to the first keyword to obtain at least one first report answer comprises:
When a record of the structured database at least partially matches the first keyword, regarding the record as the first report answer corresponding to the first voice input.
18. The selection method based on speech recognition as claimed in claim 16, wherein the step of selecting the second report answer from the first report answers in the first candidate list according to the second keyword comprises:
When the second keyword comprises an ordinal word indicating an order, selecting, according to the ordinal word, the first report answer located at the corresponding position in the first candidate list;
When the second keyword does not comprise the ordinal word indicating an order, comparing the record corresponding to each first report answer in the first candidate list with the second keyword; and
Determining, according to the comparison result, which first report answer in the first candidate list corresponds to the second voice input.
19. The selection method based on speech recognition as claimed in claim 18, wherein the step of determining, according to the comparison result, which first report answer in the first candidate list corresponds to the second voice input comprises:
Selecting, from the first report answers, the one with the highest matching degree as corresponding to the second voice input.
20. The selection method based on speech recognition as claimed in claim 16, wherein the step of performing the corresponding operation according to the type of the selected first report answer comprises:
When the type of the selected first report answer is a music file, playing music according to the selected first report answer;
When the type of the selected first report answer is a video file, playing video according to the selected first report answer;
When the type of the selected first report answer is a web page file, displaying the selected first report answer;
When the type of the selected first report answer is a picture file, displaying the picture according to the selected first report answer; and
When the type of the selected first report answer is a business card file, dialing and connecting according to the selected first report answer.
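
As an informal illustration only, and not part of the claims, the selection flow recited above (retrieval by at least partial match, choice from the candidate list by an ordinal word or otherwise by highest matching degree, and an operation chosen by answer type) might be sketched as follows. The record layout, the `ORDINALS` and `OPERATIONS` tables, the word-overlap matching degree, and all function names are assumptions made for this sketch; the claims do not prescribe any of them, and keywords are assumed to be already-recognized English words. The candidate list is assumed to be non-empty.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical ordinal words mapped to zero-based positions in the candidate list.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}

# Hypothetical operation names for each answer type.
OPERATIONS = {
    "music": "play music",
    "video": "play video",
    "webpage": "display web page",
    "picture": "display picture",
    "business_card": "dial",
}


def retrieve(records: List[Record], keywords: List[str]) -> List[Record]:
    """Keep every record whose title at least partially matches the keywords."""
    return [
        r for r in records
        if any(k.lower() in r["title"].lower() for k in keywords)
    ]


def matching_degree(record: Record, keywords: List[str]) -> int:
    """Count how many keywords appear as whole words in the record title."""
    words = set(record["title"].lower().split())
    return sum(1 for k in keywords if k.lower() in words)


def select_from_candidates(candidates: List[Record], keywords: List[str]) -> Record:
    """Pick one candidate by ordinal word, otherwise by highest matching degree."""
    for k in keywords:
        position = ORDINALS.get(k.lower())
        if position is not None and position < len(candidates):
            return candidates[position]
    return max(candidates, key=lambda r: matching_degree(r, keywords))


def operation_for(answer: Record) -> str:
    """Map the type of the selected answer to the operation to perform."""
    return OPERATIONS[answer["type"]]


if __name__ == "__main__":
    records = [
        {"title": "Romance of the Three Kingdoms", "type": "video"},
        {"title": "Three Kingdoms Theme Song", "type": "music"},
        {"title": "Water Margin", "type": "video"},
    ]
    candidates = retrieve(records, ["three", "kingdoms"])
    chosen = select_from_candidates(candidates, ["second"])
    print(chosen["title"], "->", operation_for(chosen))
```

When the first retrieval yields exactly one record, `operation_for` can be applied to it directly; the candidate list and second utterance are only needed when several records match.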
CN2013101828630A 2012-12-31 2013-05-17 Selection method based on voice recognition, mobile terminal device and information system thereof Pending CN103280218A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN2013101828630A CN103280218A (en) 2012-12-31 2013-05-17 Selection method based on voice recognition, mobile terminal device and information system thereof
CN201710007339.8A CN106847278A (en) 2012-12-31 2013-05-17 Selection method based on voice recognition, mobile terminal device and information system thereof
TW102121404A TWI511124B (en) 2012-12-31 2013-06-17 Selection method based on speech recognition and mobile terminal device and information system using the same

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201210593079.4 2012-12-31
CN2012105930794A CN103021403A (en) 2012-12-31 2012-12-31 Selection method based on voice recognition, mobile terminal device and information system thereof
CN2013101828630A CN103280218A (en) 2012-12-31 2013-05-17 Selection method based on voice recognition, mobile terminal device and information system thereof

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201710007339.8A Division CN106847278A (en) 2012-12-31 2013-05-17 Selection method based on voice recognition, mobile terminal device and information system thereof

Publications (1)

Publication Number Publication Date
CN103280218A true CN103280218A (en) 2013-09-04

Family

ID=47969935

Family Applications (3)

Application Number Title Priority Date Filing Date
CN2012105930794A Pending CN103021403A (en) 2012-12-31 2012-12-31 Selection method based on voice recognition, mobile terminal device and information system thereof
CN2013101828630A Pending CN103280218A (en) 2012-12-31 2013-05-17 Selection method based on voice recognition, mobile terminal device and information system thereof
CN201710007339.8A Pending CN106847278A (en) 2012-12-31 2013-05-17 Selection method based on voice recognition, mobile terminal device and information system thereof

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN2012105930794A Pending CN103021403A (en) 2012-12-31 2012-12-31 Selection method based on voice recognition, mobile terminal device and information system thereof

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201710007339.8A Pending CN106847278A (en) 2012-12-31 2013-05-17 Selection method based on voice recognition, mobile terminal device and information system thereof

Country Status (2)

Country Link
CN (3) CN103021403A (en)
TW (1) TWI511124B (en)

CN110706704A (en) * 2019-10-17 2020-01-17 四川长虹电器股份有限公司 Method, device and computer equipment for generating voice interaction prototype
CN110827815B (en) * 2019-11-07 2022-07-15 深圳传音控股股份有限公司 Voice recognition method, terminal, system and computer storage medium
CN110990598B (en) * 2019-11-18 2020-11-27 北京声智科技有限公司 Resource retrieval method and device, electronic equipment and computer-readable storage medium
CN112002321B (en) * 2020-08-11 2023-09-19 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
CN112331185B (en) * 2020-11-10 2023-08-11 珠海格力电器股份有限公司 Voice interaction method, system, storage medium and electronic equipment
CN112562651A (en) * 2020-11-26 2021-03-26 杭州讯酷科技有限公司 Method for generating page based on intelligent recognition of keywords of natural language
CN113470649B (en) * 2021-08-18 2024-08-23 三星电子(中国)研发中心 Voice interaction method and device
TWI808038B (en) * 2022-11-14 2023-07-01 犀動智能科技股份有限公司 Media file selection method and service system and computer program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1941077A (en) * 2005-09-27 2007-04-04 株式会社东芝 Apparatus and method speech recognition of character string in speech input
CN101577115A (en) * 2008-05-09 2009-11-11 台达电子工业股份有限公司 Voice input system and method thereof
CN101599062A (en) * 2008-06-06 2009-12-09 佛山市顺德区顺达电脑厂有限公司 Search method and system
US20090326947A1 (en) * 2008-06-27 2009-12-31 James Arnold System and method for spoken topic or criterion recognition in digital media and contextual advertising
CN102221985A (en) * 2010-04-16 2011-10-19 韦宏伟 Chinese and control command voice recognition input method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4878477B2 (en) * 2006-01-18 2012-02-15 富士通株式会社 Information retrieval appropriateness determination processing program and operator skill determination processing program
TWI312945B (en) * 2006-06-07 2009-08-01 Ind Tech Res Inst Method and apparatus for multimedia data management
TW200943277A (en) * 2008-04-07 2009-10-16 Mitac Int Corp Search methods and systems, and machine readable medium thereof

Cited By (118)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11979836B2 (en) 2007-04-03 2024-05-07 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US12165635B2 (en) 2010-01-18 2024-12-10 Apple Inc. Intelligent automated assistant
US12087308B2 (en) 2010-01-18 2024-09-10 Apple Inc. Intelligent automated assistant
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11862186B2 (en) 2013-02-07 2024-01-02 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11636869B2 (en) 2013-02-07 2023-04-25 Apple Inc. Voice trigger for a digital assistant
US11557310B2 (en) 2013-02-07 2023-01-17 Apple Inc. Voice trigger for a digital assistant
US12009007B2 (en) 2013-02-07 2024-06-11 Apple Inc. Voice trigger for a digital assistant
US12277954B2 (en) 2013-02-07 2025-04-15 Apple Inc. Voice trigger for a digital assistant
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US12073147B2 (en) 2013-06-09 2024-08-27 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US12010262B2 (en) 2013-08-06 2024-06-11 Apple Inc. Auto-activating smart responses based on activities from remote devices
CN106663424B (en) * 2014-03-31 2021-03-05 三菱电机株式会社 Intention understanding device and method
US12067990B2 (en) 2014-05-30 2024-08-20 Apple Inc. Intelligent assistant for home automation
US11699448B2 (en) 2014-05-30 2023-07-11 Apple Inc. Intelligent assistant for home automation
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11810562B2 (en) 2014-05-30 2023-11-07 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US12118999B2 (en) 2014-05-30 2024-10-15 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US12200297B2 (en) 2014-06-30 2025-01-14 Apple Inc. Intelligent automated assistant for TV user interactions
US11838579B2 (en) 2014-06-30 2023-12-05 Apple Inc. Intelligent automated assistant for TV user interactions
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
CN104601202A (en) * 2014-12-23 2015-05-06 惠州Tcl移动通信有限公司 Method and terminal for realizing file search based on Bluetooth technology as well as Bluetooth device
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US12236952B2 (en) 2015-03-08 2025-02-25 Apple Inc. Virtual assistant activation
US11842734B2 (en) 2015-03-08 2023-12-12 Apple Inc. Virtual assistant activation
US12001933B2 (en) 2015-05-15 2024-06-04 Apple Inc. Virtual assistant in a communication session
US12154016B2 (en) 2015-05-15 2024-11-26 Apple Inc. Virtual assistant in a communication session
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback
CN105161098A (en) * 2015-07-31 2015-12-16 北京奇虎科技有限公司 Speech recognition method and speech recognition device for interaction system
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11550542B2 (en) 2015-09-08 2023-01-10 Apple Inc. Zero latency digital assistant
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US12204932B2 (en) 2015-09-08 2025-01-21 Apple Inc. Distributed personal assistant
US11954405B2 (en) 2015-09-08 2024-04-09 Apple Inc. Zero latency digital assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US12051413B2 (en) 2015-09-30 2024-07-30 Apple Inc. Intelligent device identification
CN107615377A (en) * 2015-10-05 2018-01-19 萨万特系统有限责任公司 The key phrase suggestion based on history for the Voice command of domestic automation system
CN107615377B (en) * 2015-10-05 2021-11-09 萨万特系统公司 History-based key phrase suggestions for voice control of home automation systems
US11809886B2 (en) 2015-11-06 2023-11-07 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11853647B2 (en) 2015-12-23 2023-12-26 Apple Inc. Proactive assistance based on dialog communication between devices
US12223282B2 (en) 2016-06-09 2025-02-11 Apple Inc. Intelligent automated assistant in a home environment
US11657820B2 (en) 2016-06-10 2023-05-23 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US12175977B2 (en) 2016-06-10 2024-12-24 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11749275B2 (en) 2016-06-11 2023-09-05 Apple Inc. Application integration with a digital assistant
US12197817B2 (en) 2016-06-11 2025-01-14 Apple Inc. Intelligent device arbitration and control
US12293763B2 (en) 2016-06-11 2025-05-06 Apple Inc. Application integration with a digital assistant
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
CN108228637A (en) * 2016-12-21 2018-06-29 中国电信股份有限公司 Natural language client auto-answer method and system
US12260234B2 (en) 2017-01-09 2025-03-25 Apple Inc. Application integration with a digital assistant
US10741181B2 (en) 2017-05-09 2020-08-11 Apple Inc. User interface for correcting recognition errors
CN110603586A (en) * 2017-05-09 2019-12-20 苹果公司 User interface for correcting recognition errors
CN110603586B (en) * 2017-05-09 2020-09-22 苹果公司 User interface for correcting recognition errors
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11538469B2 (en) 2017-05-12 2022-12-27 Apple Inc. Low-latency intelligent automated assistant
US11837237B2 (en) 2017-05-12 2023-12-05 Apple Inc. User-specific acoustic models
US11862151B2 (en) 2017-05-12 2024-01-02 Apple Inc. Low-latency intelligent automated assistant
US12014118B2 (en) 2017-05-15 2024-06-18 Apple Inc. Multi-modal interfaces having selection disambiguation and text modification capability
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US12254887B2 (en) 2017-05-16 2025-03-18 Apple Inc. Far-field extension of digital assistant services for providing a notification of an event to a user
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US12026197B2 (en) 2017-05-16 2024-07-02 Apple Inc. Intelligent automated assistant for media exploration
CN107452378A (en) * 2017-08-15 2017-12-08 北京百度网讯科技有限公司 Voice interactive method and device based on artificial intelligence
US12211502B2 (en) 2018-03-26 2025-01-28 Apple Inc. Natural assistant interaction
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US11900923B2 (en) 2018-05-07 2024-02-13 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US11487364B2 (en) 2018-05-07 2022-11-01 Apple Inc. Raise to speak
US11907436B2 (en) 2018-05-07 2024-02-20 Apple Inc. Raise to speak
US12067985B2 (en) 2018-06-01 2024-08-20 Apple Inc. Virtual assistant operations in multi-device environments
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11630525B2 (en) 2018-06-01 2023-04-18 Apple Inc. Attention aware virtual assistant dismissal
US11360577B2 (en) 2018-06-01 2022-06-14 Apple Inc. Attention aware virtual assistant dismissal
US12061752B2 (en) 2018-06-01 2024-08-13 Apple Inc. Attention aware virtual assistant dismissal
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US12080287B2 (en) 2018-06-01 2024-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
CN109712619A (en) * 2018-12-24 2019-05-03 出门问问信息科技有限公司 A kind of method, apparatus and voice interactive system that decoupling dialogue is assumed and executed
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US12136419B2 (en) 2019-03-18 2024-11-05 Apple Inc. Multimodality in digital assistant systems
US12154571B2 (en) 2019-05-06 2024-11-26 Apple Inc. Spoken notifications
US12216894B2 (en) 2019-05-06 2025-02-04 Apple Inc. User configurable task triggers
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US12197712B2 (en) 2020-05-11 2025-01-14 Apple Inc. Providing relevant data items based on context
US11924254B2 (en) 2020-05-11 2024-03-05 Apple Inc. Digital assistant hardware abstraction
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US12219314B2 (en) 2020-07-21 2025-02-04 Apple Inc. User identification using headphones
US11750962B2 (en) 2020-07-21 2023-09-05 Apple Inc. User identification using headphones
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US12431128B2 (en) 2022-08-05 2025-09-30 Apple Inc. Task flow identification based on user intent

Also Published As

Publication number Publication date
TW201426736A (en) 2014-07-01
CN106847278A (en) 2017-06-13
TWI511124B (en) 2015-12-01
CN103021403A (en) 2013-04-03

Similar Documents

Publication Publication Date Title
CN103279508B (en) Method for correcting voice response and natural language dialogue system
CN103761242B (en) Search method, search system, and natural language understanding system
CN103268315B (en) Natural Language Dialogue Method and System
TWI511124B (en) Selection method based on speech recognition and mobile terminal device and information system using the same
US9824150B2 (en) Systems and methods for providing information discovery and retrieval
CN102272828B (en) Method and system for providing a voice interface
US9502031B2 (en) Method for supporting dynamic grammars in WFST-based ASR
CN105701254B (en) Information processing method and device for information processing
US20180052824A1 (en) Task identification and completion based on natural language query
US20060143007A1 (en) User interaction with voice information services
CN102982800A (en) Electronic device with audio video file video processing function and audio video file processing method
US20190236208A1 (en) Smart speaker with music recognition
CN111739530A (en) Interaction method and device, earphone and earphone storage device
TWI578175B (en) Searching method, searching system and nature language understanding system
CN112347774B (en) Model determination method and device for user emotion recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130904