JP7385635B2

JP7385635B2 - Voice command recognition system, voice command recognition method, and program

Info

Publication number: JP7385635B2
Application number: JP2021131693A
Authority: JP
Inventors: 武飯野
Original assignee: NEC Personal Computers Ltd
Current assignee: NEC Personal Computers Ltd
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2023-11-22
Anticipated expiration: 2041-08-12
Also published as: JP2023026071A

Description

The present invention relates to a voice command recognition system, a voice command recognition method, and a program.

In recent years, information processing devices equipped with a voice agent function, which performs various processes according to voice commands uttered by a user, have been proposed (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent Application Publication No. 2020-134627

Conventional voice agent functions do not tolerate ambiguous utterance patterns, so the user has had to speak in a way that makes the meaning explicit. For example, given only the utterance "next", a voice agent function cannot interpret what the user intends. The user has therefore had to give instructions through redundant utterances such as "read the next news item" or "play the next song". Such redundant utterances have hindered the realization of dialogue that resembles natural conversation.

The present invention has been made in view of these circumstances, and its object is to provide a voice command recognition system, a voice command recognition method, and a program that can simplify utterance patterns while recognizing commands in a way that reflects the user's intention.

A first aspect of the present invention is a voice command recognition system comprising: a history information management unit that manages command history information in which information on voice commands, including command group information, is registered in association with time information; a command candidate identification unit that uses definition information, in which a plurality of voice commands including command group information are defined, to identify voice commands corresponding to voice data uttered by a user as command candidates; a command identification unit that, when a plurality of command candidates are identified, identifies the voice command to be processed from among the command candidates by using the command group information of the command candidates and the command group information and time information of the voice commands registered in the command history information; and a processing determination unit that determines the processing content corresponding to the voice command identified by the command identification unit. The voice commands are divided into independent commands, for which an execution process is defined, and dependent commands, whose execution process changes according to the independent command on which they depend. In the definition information, information on the plurality of independent commands on which a dependent command depends is registered for that dependent command. The processing determination unit includes: a determination unit that determines whether the voice command identified by the command identification unit is a dependent command; an independent command identification unit that, when the identified voice command is a dependent command, uses the command history information to identify the most recently recognized independent command among the independent commands on which that voice command depends; and a decision unit that determines the processing content based on the identified independent command and the dependent command.

A second aspect of the present invention is a voice command recognition method in which a computer executes: a history information management step of managing command history information in which information on voice commands, including command group information, is registered in association with time information; a command candidate identification step of using definition information, in which a plurality of voice commands including command group information are defined, to identify voice commands corresponding to voice data uttered by a user as command candidates; a command identification step of, when a plurality of command candidates are identified, identifying the voice command to be processed from among the command candidates by using the command group information of the command candidates and the command group information and time information of the voice commands registered in the command history information; and a processing determination step of determining the processing content corresponding to the voice command identified in the command identification step. The voice commands are divided into independent commands, for which an execution process is defined, and dependent commands, whose execution process changes according to the independent command on which they depend. In the definition information, information on the plurality of independent commands on which a dependent command depends is registered for that dependent command. The processing determination step includes: a determination step of determining whether the voice command identified in the command identification step is a dependent command; an independent command identification step of, when the identified voice command is a dependent command, using the command history information to identify the most recently recognized independent command among the independent commands on which that voice command depends; and a decision step of determining the processing content based on the identified independent command and the dependent command.
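
As a non-limiting illustration, the handling of dependent commands recited in the two aspects above could be sketched in Python as follows. The table names DEPENDS_ON and PROCESSING, the command ID "Next", and the processing strings are assumptions introduced only for this sketch; the aspects themselves do not prescribe any data layout.

```python
# Hypothetical definition information: each dependent command lists the
# independent commands on which it can depend (names are illustrative only).
DEPENDS_ON = {"Next": ["ReadNews", "PlayMusic"]}

# Hypothetical processing table keyed by (independent command, dependent command).
PROCESSING = {
    ("ReadNews", "Next"): "read the next news item",
    ("PlayMusic", "Next"): "play the next track",
}


def decide_processing(command_id, history):
    """Decide the processing content for an identified voice command.

    history is a list of (command_id, recognized_at) tuples for past commands.
    """
    if command_id not in DEPENDS_ON:
        # Independent command: its execution process is defined directly.
        return f"execute {command_id}"
    parents = DEPENDS_ON[command_id]
    # Keep only history entries for independent commands this command depends on.
    relevant = [h for h in history if h[0] in parents]
    if not relevant:
        # Assumption: with no matching independent command, nothing is decided.
        return None
    latest_parent, _ = max(relevant, key=lambda h: h[1])
    return PROCESSING[(latest_parent, command_id)]


history = [("PlayMusic", 10.0), ("ReadNews", 25.0)]
print(decide_processing("Next", history))  # read the next news item
```

In this sketch the same dependent command resolves to different processing depending only on which of its independent commands was recognized most recently.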

A third aspect of the present invention is a program for causing a computer to function as the voice command recognition system described above.

The present invention has the effect of simplifying utterance patterns while enabling command recognition that reflects the user's intention.

FIG. 1 is a schematic configuration diagram showing an example of the hardware configuration of an information processing device according to a first embodiment of the present invention.
FIG. 2 is a functional block diagram showing an example of functions included in the information processing device according to the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of command definition information according to the first embodiment of the present invention.
FIG. 4 is a diagram showing an example of execution process definition information according to the first embodiment of the present invention.
FIG. 5 is a functional block diagram showing functions included in the command recognition unit according to the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of command history information according to the first embodiment of the present invention.
FIG. 7 is a diagram for explaining processing performed by the command identification unit according to the first embodiment of the present invention.
FIG. 8 is a diagram for explaining processing performed by the history information management unit according to the first embodiment of the present invention.
FIG. 9 is a diagram showing an example of command definition information according to a second embodiment of the present invention.
FIG. 10 is a diagram for explaining processing performed by the command identification unit according to the second embodiment of the present invention.
FIG. 11 is a diagram showing an example of command definition information according to a third embodiment of the present invention.
FIG. 12 is a diagram showing an example of execution process definition information according to the third embodiment of the present invention.
FIG. 13 is a functional block diagram showing an example of functions included in the command recognition unit according to the third embodiment of the present invention.
FIG. 14 is a diagram for explaining processing performed by the independent command identification unit according to the third embodiment of the present invention.

[First embodiment]
A voice command recognition system, a voice command recognition method, and a program according to a first embodiment of the present invention are described below with reference to the drawings. This embodiment describes, as an example, a case where the voice command recognition system is installed in an information processing device 1.
Examples of the information processing device 1 include a notebook PC, a desktop PC, a tablet terminal, and a smartphone.

FIG. 1 is a schematic configuration diagram showing an example of the hardware configuration of the information processing device 1 according to the first embodiment of the present invention.
As shown in FIG. 1, the information processing device 1 includes, for example, a CPU (Central Processing Unit) 11, a main memory 12, a storage unit 13, a microphone 14, a speaker 15, a communication unit 16, an input unit 17, and a display unit 18. These units are connected to one another directly or indirectly via a bus and cooperate to execute various processes.

The CPU 11 controls the entire information processing device 1 by means of, for example, an OS (Operating System) stored in the storage unit 13, which is connected via a bus, and executes various processes by running the programs stored in the storage unit 13.

The main memory 12 consists of writable memory such as cache memory and RAM (Random Access Memory), and is used as a work area into which the CPU 11 reads executable programs and to which those programs write processing data.

The storage unit 13 is, for example, a ROM (Read Only Memory), an HDD (Hard Disk Drive), or a flash memory. It stores an OS for controlling the entire information processing device 1, such as Windows (registered trademark), iOS (registered trademark), or Android (registered trademark), various device drivers for operating peripheral hardware, various application software (hereinafter simply "applications"), and various data and files. The storage unit 13 also stores programs for implementing various processes and the data those processes require.

The microphone 14 converts speech uttered by the user and environmental sounds into an audio signal and outputs that signal.
The speaker 15 converts an audio signal into sound and outputs it.
The communication unit 16 includes a communication interface for connecting to a network. It connects to wireless networks including 3G, LTE, and 5G lines, wired and wireless LANs, Bluetooth (registered trademark), and similar networks, establishes communication with other devices, and exchanges information with them.

The input unit 17 is a user interface, such as a keyboard, a mouse, or a touch panel, through which the user gives instructions to the information processing device 1.
The display unit 18 has a display screen such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display, and operates based on instructions from the CPU 11.

FIG. 2 is a functional block diagram showing an example of functions included in the information processing device 1 according to this embodiment.

As one example, the series of processes for realizing the functions described below is stored in the storage unit 13 in the form of a program (for example, a voice command recognition program). The CPU 11 reads this program into the main memory 12 and performs information processing and computation, thereby realizing the functions. The program may be preinstalled in the storage unit 13, provided on another computer-readable storage medium, or distributed via wired or wireless communication means. Computer-readable storage media include magnetic disks, magneto-optical disks, CD-ROMs, DVD-ROMs, and semiconductor memories.

As shown in FIG. 2, the information processing device 1 includes, for example, a command definition database 20, a voice agent unit 30, and cooperative applications AP. The command definition database 20 stores command definition information DF1 and execution process definition information DF2.

The voice agent unit 30 includes, for example, a voice recognition unit 40 and a command recognition unit 50.
The voice command recognition system according to this embodiment includes, as one example, the command definition database 20 and the command recognition unit 50 implemented in the voice agent unit 30.

In the command definition information DF1, utterance patterns and command IDs are registered in association with each other, as illustrated in FIG. 3. In the example of FIG. 3, the utterance pattern "read the news" is associated with the command ID "ReadNews", the utterance pattern "next" with the command ID "NextNews", the utterance pattern "play music" with the command ID "PlayMusic", and the utterance pattern "next" with the command ID "NextMusic". The utterance pattern "next" is thus registered twice, once for each of two different command IDs.
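
As a hedged illustration, the command definition information DF1 of FIG. 3 could be held as a list of (utterance pattern, command ID) pairs. The list-of-tuples layout, the names COMMAND_DEFINITIONS and command_ids_for, the English rendering of the patterns, and the exact-match rule are assumptions of this sketch, not details of the embodiment.

```python
# Rows of DF1 as described for FIG. 3. A list (rather than a dict keyed by
# pattern) is used because the same pattern, "next", appears in two rows.
COMMAND_DEFINITIONS = [
    ("read the news", "ReadNews"),
    ("next", "NextNews"),
    ("play music", "PlayMusic"),
    ("next", "NextMusic"),
]


def command_ids_for(utterance):
    """Return every command ID whose utterance pattern equals the utterance."""
    return [cid for pattern, cid in COMMAND_DEFINITIONS if pattern == utterance]


print(command_ids_for("next"))           # ['NextNews', 'NextMusic']
print(command_ids_for("read the news"))  # ['ReadNews']
```

A lookup for "next" returns two command IDs, which is the ambiguity the rest of the embodiment resolves.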

In the execution process definition information DF2, a plurality of voice commands are registered, as illustrated in FIG. 4. Each voice command includes, for example, a command ID, a command group ID (command group information), and the processing content to be executed. Command group IDs are provided, for example, in correspondence with the cooperative applications AP installed in the information processing device 1. In the example of FIG. 4, the command group IDs "NewsGroup" (news group) and "MusicGroup" (music group) are shown.

A voice command that includes the command group ID "NewsGroup" is a voice command for the news application, and a voice command that includes the command group ID "MusicGroup" is a voice command for the music application.
In the execution process definition information DF2 illustrated in FIG. 4, voice commands C1 and C2 are shown as belonging to the command group ID "NewsGroup", and voice commands C3 and C4 as belonging to the command group ID "MusicGroup".
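
The execution process definition information DF2 could, for illustration, be modeled as records carrying a command ID, a command group ID, and processing content. In the sketch below, the class name VoiceCommand, the helper find_command, and the placeholder processing strings are assumptions; the text does not reproduce the processing content of FIG. 4, and the pairing of C1 with "ReadNews" and C3 with "PlayMusic" is inferred from the group membership rather than stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceCommand:
    label: str       # reference label used in the description (C1 to C4)
    command_id: str
    group_id: str    # command group ID, one per cooperative application
    processing: str  # placeholder only; FIG. 4's content is not given in the text


EXECUTION_DEFINITIONS = [
    VoiceCommand("C1", "ReadNews", "NewsGroup", "<processing for ReadNews>"),
    VoiceCommand("C2", "NextNews", "NewsGroup", "<processing for NextNews>"),
    VoiceCommand("C3", "PlayMusic", "MusicGroup", "<processing for PlayMusic>"),
    VoiceCommand("C4", "NextMusic", "MusicGroup", "<processing for NextMusic>"),
]


def find_command(command_id):
    """Return the voice command registered under command_id, or None."""
    for command in EXECUTION_DEFINITIONS:
        if command.command_id == command_id:
            return command
    return None


print(find_command("NextNews").group_id)   # NewsGroup
print(find_command("NextMusic").group_id)  # MusicGroup
```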

As described above, the voice agent unit 30 includes the voice recognition unit 40 and the command recognition unit 50.
The voice recognition unit 40 performs speech recognition on the speech uttered by the user and outputs the recognition result, for example as data indicating the content of the utterance. Any known technique may be used for speech recognition, so a detailed description is omitted here.
The command recognition unit 50 recognizes the command intended by the user based on the utterance content recognized by the voice recognition unit 40.

As shown in FIG. 5, the command recognition unit 50 includes a history information management unit 51, a command candidate identification unit 52, and a command identification unit 53.

The history information management unit 51 manages command history information. The command history information indicates the history of voice commands identified in the past by the command identification unit 53, which is described later. For example, the command group ID of each previously identified voice command is registered in the command history information in association with time information.

FIG. 6 shows an example of the command history information. In the example of FIG. 6, the command ID and the utterance pattern are registered in addition to the command group ID and the time information. The time information registered is the command recognition time, that is, the time at which the voice command was identified. The time information is not limited to the command recognition time, however. Any point in time between receiving the user's utterance and outputting the voice command to the cooperative application AP may be used as the time information.

When the command identification unit 53, described later, identifies a voice command, the history information management unit 51 registers information on the identified voice command in the command history information in association with time information. The history information management unit 51 also deletes from the command history information any voice command information for which a predetermined period has elapsed since the command recognition time. In other words, it deletes voice command information whose time information is at least the predetermined period before the present.
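
The registration and deletion behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class names HistoryEntry and CommandHistory, the retention_seconds parameter, and the use of plain floating-point seconds for time information are all hypothetical, and the embodiment does not specify when deletion is triggered.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryEntry:
    command_id: str
    group_id: str
    recognized_at: float  # command recognition time, in seconds


@dataclass
class CommandHistory:
    retention_seconds: float  # the "predetermined period" (value is an assumption)
    entries: list = field(default_factory=list)

    def register(self, command_id, group_id, now):
        """Register an identified voice command together with its time information."""
        self.entries.append(HistoryEntry(command_id, group_id, now))

    def prune(self, now):
        """Delete entries recognized at least retention_seconds before now."""
        self.entries = [
            e for e in self.entries
            if now - e.recognized_at < self.retention_seconds
        ]


history = CommandHistory(retention_seconds=600.0)
history.register("PlayMusic", "MusicGroup", now=0.0)
history.register("ReadNews", "NewsGroup", now=500.0)
history.prune(now=700.0)
print([e.command_id for e in history.entries])  # ['ReadNews']
```

At time 700 the first entry is 700 seconds old and is deleted, while the second is 200 seconds old and remains.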

The command candidate identification unit 52 uses the definition information stored in the command definition database 20 to identify, as command candidates, voice commands that satisfy a predetermined condition with respect to the content of the user's utterance. For example, the command candidate identification unit 52 compares the recognition result from the voice recognition unit 40 (see FIG. 2) with the command definition information DF1 and identifies the command IDs associated with utterance patterns that match the recognition result. It then identifies, from the execution process definition information DF2, the voice commands identified by those command IDs as command candidates.

When the command candidate identification unit 52 identifies exactly one command candidate, the command identification unit 53 identifies that candidate as the voice command to be processed.
When the command candidate identification unit 52 identifies a plurality of command candidates, the command identification unit 53 identifies one of them as the voice command to be processed by using the command group IDs of the command candidates and the command group IDs and time information registered in the command history information.

For example, among the voice command information in the history whose command group ID matches that of any command candidate, the command identification unit 53 finds the entry with the most recent time information, and identifies the command candidate whose command group ID matches that entry as the voice command to be processed.
If the command candidates cannot be narrowed down to a single candidate, recognition of the voice command is determined to have failed.

For example, the command identification unit 53 retrieves command group IDs from the command history information in order from the most recent time information and compares each retrieved command group ID with the command group IDs of the command candidates. The first command candidate for which the comparison yields a match is identified as the voice command to be processed.
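
The newest-first comparison described above could be sketched as follows. The function name identify_command and the tuple layouts are assumptions made for illustration. The treatment of two candidates that share one command group ID (reported here as a failure) is also an assumption, since the text states only that recognition fails when the candidates cannot be narrowed down to one.

```python
def identify_command(candidates, history):
    """Pick the voice command to be processed from the command candidates.

    candidates: list of (command_id, group_id) tuples.
    history:    list of (group_id, recognized_at) tuples.
    Returns the chosen candidate, or None when recognition fails.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        # A single candidate is used as is; the history is not consulted.
        return candidates[0]
    # Walk the history from the most recent time information backwards.
    for group_id, _ in sorted(history, key=lambda h: h[1], reverse=True):
        matches = [c for c in candidates if c[1] == group_id]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            return None  # assumption: still ambiguous within one group
    return None  # no history entry matched any candidate


candidates = [("NextNews", "NewsGroup"), ("NextMusic", "MusicGroup")]
history = [("MusicGroup", 100.0), ("NewsGroup", 200.0)]
print(identify_command(candidates, history))  # ('NextNews', 'NewsGroup')
print(identify_command(candidates, []))       # None
```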

The voice command identified by the command identification unit 53 is output, for example, to the cooperative application AP identified by the command group ID included in that voice command. Processing according to the voice command is then executed.

Next, the voice command recognition method according to this embodiment is described. For convenience, the following description takes as an example a case where the user utters "next" while the command definition information DF1 shown in FIG. 3 and the execution process definition information DF2 shown in FIG. 4 are stored in the command definition database 20 and the command history information shown in FIG. 6 is managed by the history information management unit 51.

First, the user's spoken "next" is converted into voice data by the microphone 14 (see FIG. 1) and input to the voice agent unit 30 (see FIG. 2).
The voice recognition unit 40 of the voice agent unit 30 recognizes the user's utterance "next" from the voice data and outputs the recognition result to the command recognition unit 50.

The command candidate identification unit 52 of the command recognition unit 50 compares, for example, the recognized utterance "next" with the utterance patterns in the command definition information DF1, identifies the utterance patterns that satisfy a predetermined condition with respect to the utterance "next", and identifies the command IDs associated with those utterance patterns. As a result, the command IDs "NextNews" and "NextMusic" are identified from the command definition information DF1 shown in FIG. 3. The command candidate identification unit 52 then identifies, from the execution process definition information DF2, the voice commands identified by those command IDs as command candidates. As a result, voice command C2, corresponding to the command ID "NextNews", and voice command C4, corresponding to the command ID "NextMusic", are identified as command candidates.

The command identification unit 53 identifies the voice command to be processed from the command candidates C2 and C4. For example, as illustrated in FIG. 7, the command identification unit 53 compares the command group ID "NewsGroup" of candidate C2 and the command group ID "MusicGroup" of candidate C4 with the command group IDs registered in the command history information. The comparison proceeds through the registered command group IDs in order from the most recent command recognition time, and the first command candidate for which the comparison yields a match is identified as the voice command to be processed.

Specifically, the command identification unit 53 compares the command group ID "NewsGroup" of candidate C2 and the command group ID "MusicGroup" of candidate C4 with "NewsGroup", the command group ID that has the most recent command recognition time in the history. As a result, voice command C2, which has the command group ID "NewsGroup", is identified as the voice command to be processed.

Voice command C2 identified by the command identification unit 53 is output to the news application, which is the cooperative application AP identified by the command group ID "NewsGroup" included in voice command C2. The news application then executes processing according to voice command C2.

When the command identification unit 53 identifies voice command C2, the history information management unit 51 updates the command history information based on that voice command. As shown in FIG. 8, information on voice command C2, identified by the command ID "NextNews", is thereby registered in the command history information.
The history information management unit 51 also deletes history entries for which the predetermined period has elapsed since they were registered in the command history information.
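
The walk-through above, from the utterance "next" to the updated history, can be traced with the following sketch. The history contents stand in for FIG. 6, whose actual entries are not reproduced in the text, so the two initial entries and their times are assumptions, as are the names handle_utterance, GROUP_OF, and retention, the exact-match comparison, and the choice to delete expired entries before identification.

```python
# DF1 (FIG. 3): utterance pattern -> command ID; "next" appears twice.
DF1 = [
    ("read the news", "ReadNews"),
    ("next", "NextNews"),
    ("play music", "PlayMusic"),
    ("next", "NextMusic"),
]
# Command group ID of each command, as described for DF2 (FIG. 4).
GROUP_OF = {
    "ReadNews": "NewsGroup", "NextNews": "NewsGroup",
    "PlayMusic": "MusicGroup", "NextMusic": "MusicGroup",
}


def handle_utterance(text, history, now, retention=600.0):
    """Identify the command for text, update history in place, return its ID."""
    # Delete entries older than the retention period (timing is an assumption).
    history[:] = [h for h in history if now - h["time"] < retention]
    candidates = [cid for pattern, cid in DF1 if pattern == text]
    chosen = candidates[0] if len(candidates) == 1 else None
    if len(candidates) > 1:
        # Compare against the history, newest command recognition time first.
        for entry in sorted(history, key=lambda h: h["time"], reverse=True):
            matches = [c for c in candidates if GROUP_OF[c] == entry["group"]]
            if matches:
                chosen = matches[0]
                break
    if chosen is None:
        return None  # recognition failed
    history.append({"command": chosen, "group": GROUP_OF[chosen], "time": now})
    return chosen


history = [
    {"command": "PlayMusic", "group": "MusicGroup", "time": 0.0},
    {"command": "ReadNews", "group": "NewsGroup", "time": 60.0},
]
print(handle_utterance("next", history, now=90.0))  # NextNews
print(history[-1]["command"])                       # NextNews
```

Because the most recent history entry belongs to "NewsGroup", "next" resolves to "NextNews", and that result is itself appended to the history for use by later utterances.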

As described above, the voice command recognition system according to this embodiment includes: the history information management unit 51, which manages command history information in which information on previously identified voice commands is registered in association with a command recognition time (time information); the command candidate identification unit 52, which uses definition information (the command definition information DF1 and the execution process definition information DF2) in which a plurality of voice commands including a command group ID (command group information) are defined to identify voice commands corresponding to voice data uttered by the user as command candidates; and the command identification unit 53, which, when a plurality of command candidates are identified, identifies the voice command to be processed from among them by using the command group IDs of the command candidates and the command group IDs and command recognition times of the voice commands registered in the command history information.

In this way, voice commands that are likely to be uttered in succession are grouped, and when a voice command is recognized, the voice command to be processed is identified using the command group IDs and command recognition times in the command history information. As a result, command recognition that reflects the user's intention is possible even for a simplified utterance such as "next", whose meaning is ambiguous and which is common to multiple applications. The user can therefore have the desired processing executed with utterances close to natural conversation.
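The selection described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the class names, the function `select_by_group`, and the group IDs "NewsGroup" and "MusicGroup" are assumptions introduced for the example, while the command IDs "NextNews" and "NextMusic" are taken from the description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    """One row of the command history information (hypothetical layout)."""
    command_id: str
    group_id: str
    recognized_at: datetime


@dataclass(frozen=True)
class Candidate:
    """A command candidate with its command group ID (hypothetical layout)."""
    command_id: str
    group_id: str


def select_by_group(candidates: List[Candidate],
                    history: List[HistoryEntry]) -> Optional[Candidate]:
    """Return the candidate whose group ID matches the most recently
    recognized history entry that shares a group with any candidate.
    Returning None when nothing matches is an assumption of this sketch."""
    candidate_by_group = {}
    for candidate in candidates:
        candidate_by_group.setdefault(candidate.group_id, candidate)
    # Walk the history from the newest recognition time to the oldest.
    for entry in sorted(history, key=lambda e: e.recognized_at, reverse=True):
        if entry.group_id in candidate_by_group:
            return candidate_by_group[entry.group_id]
    return None
```

With a history whose newest entry belongs to the news group, the ambiguous utterance "next" would resolve to the candidate "NextNews" rather than "NextMusic".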

Input instructions for the same application are also likely to be given in succession. Assigning command group IDs on a per-application basis therefore improves the recognition accuracy of voice commands.

The history information management unit 51 also deletes from the command history information any entry for which a predetermined period or more has elapsed since its command recognition time. Consequently, only voice commands recognized within the most recent predetermined period remain registered in the command history information, which prevents commands from being interpreted in ways the user did not intend.
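As an illustration of this deletion, a minimal sketch is given below. The function name `prune_history`, the representation of a history entry as a (command ID, recognition time) pair, and the five-minute retention period in the usage note are assumptions; the description only states that the period is predetermined.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

HistoryRow = Tuple[str, datetime]  # (command ID, command recognition time)


def prune_history(history: List[HistoryRow],
                  now: datetime,
                  retention: timedelta) -> List[HistoryRow]:
    """Keep only entries recognized less than `retention` before `now`.
    Entries that are `retention` old or older are dropped."""
    return [(command_id, recognized_at)
            for command_id, recognized_at in history
            if now - recognized_at < retention]
```

For example, calling `prune_history(history, now, timedelta(minutes=5))` would leave only the commands recognized within the last five minutes, so an utterance of "next" made long after the last news command would no longer be tied to it.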

Furthermore, assigning a command group ID to each voice command allows voice commands to be partitioned by command group. As a result, utterance patterns and voice commands need not be coordinated between applications, and each application can define them freely.

[Second embodiment]
Next, a voice command recognition system, a voice command recognition method, and a program according to a second embodiment of the present invention will be described with reference to the drawings.
In the first embodiment described above, each voice command includes a command group ID, and the voice command to be processed is identified from the command candidates using the command group ID. The present embodiment differs in that no command group ID is used.
In the following, description of points in common with the first embodiment is omitted, and mainly the differences are described.

For example, when a user who is reading the news utters "next" after uttering "next news", this "next" can be interpreted as a simplified form of the preceding instruction "next news".
Accordingly, in the present embodiment, different utterance patterns that are likely to be uttered in succession and that are intended to have the same meaning are treated as one group and are assigned a common command ID.

For example, as shown in FIG. 9, the utterance patterns "next news" and "next" are treated as one group of successive utterances, and these utterance patterns are registered in advance in command definition information DF1' in association with the common command ID "NextNews".

Building the command definition information DF1' in this way makes it possible to identify the voice command to be processed without using a command group ID, unlike the first embodiment described above. The voice command recognition method according to the present embodiment is briefly described below.
Note that the execution process definition information (not shown) according to the present embodiment is the execution process definition information DF2 shown in FIG. 4 with the command group ID information omitted. In addition, at least a command ID and time information are registered in association with each other in the command history information managed by the history information management unit 51.

For example, when the user utters "next", the command candidate identification unit 52 performs the same processing as in the first embodiment to identify command candidates. As a result, for example, as shown in FIG. 10, voice commands C2 and C4, with command IDs "NextNews" and "NextMusic", are identified as command candidates.

Subsequently, the command identification unit (not shown) identifies one of the command candidates, voice commands C2 and C4, as the voice command to be processed. For example, as illustrated in FIG. 10, the command identification unit compares the command ID "NextNews" of candidate voice command C2 and the command ID "NextMusic" of candidate voice command C4 with the command IDs registered in the command history information. In doing so, it checks the command IDs registered in the command history information in order, starting from the one with the most recent command recognition time. In the example shown in FIG. 10, the command ID "NextNews" matches, so voice command C2 with the command ID "NextNews" is identified as the voice command to be processed.
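The matching by command ID alone can be sketched as below. This is a simplified illustration under stated assumptions: the dictionary `DEFINITIONS` stands in for the command definition information DF1', the utterance pattern "next song" and the function name `identify` are hypothetical, and returning `None` when no history entry matches is a choice made for the sketch, since the description does not specify that case.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Stand-in for DF1': utterance pattern -> command IDs sharing that pattern.
# "next news" and "next" share "NextNews"; "next song" is an assumed pattern.
DEFINITIONS: Dict[str, List[str]] = {
    "next news": ["NextNews"],
    "next song": ["NextMusic"],
    "next": ["NextNews", "NextMusic"],
}


def identify(utterance: str,
             history: List[Tuple[str, datetime]]) -> Optional[str]:
    """Return the command ID to process for `utterance`.
    `history` holds (command ID, command recognition time) pairs."""
    candidates = DEFINITIONS.get(utterance, [])
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Several candidates: check the history from newest to oldest and
    # take the first command ID that is also a candidate.
    for command_id, _ in sorted(history, key=lambda row: row[1], reverse=True):
        if command_id in candidates:
            return command_id
    return None
```

Because the shortened pattern "next" carries the same command ID as "next news", a history entry created by "next news" is enough to resolve a following "next" without any group information.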

According to the present embodiment, utterance patterns that are likely to be uttered in succession are grouped and given a common command ID, and when a voice command is recognized, it is identified using the command IDs and command recognition times in the command history information. This simplifies utterance patterns while enabling command recognition that reflects the user's intention.

[Third embodiment]
Next, a voice command recognition system, a voice command recognition method, and a program according to a third embodiment of the present invention will be described with reference to the drawings.
In addition to the utterance "next" described above, "more" is another utterance pattern with which a user may issue repeated instructions. The meaning of the utterance "more" changes depending on what was uttered before it. For example, when "more" is uttered while the user is watching television, it is hard to judge whether it means raising the channel, lowering the channel, raising the volume, lowering the volume, or something else. However, if "channel up" was uttered before "more", the utterance "more" can be interpreted as intending to raise the channel.
The present embodiment is characterized in that the command intended by the user is recognized from an ambiguous utterance such as "more".
In the following, description of points in common with the first embodiment is omitted, and mainly the differences are described.

FIG. 11 is a diagram showing an example of command definition information DF1 according to the present embodiment, and FIG. 12 is a diagram showing an example of execution process definition information DF2 according to the present embodiment.
As described in the first embodiment, utterance patterns and command IDs are registered in association with each other in the command definition information DF1. In the command definition information DF1 illustrated in FIG. 11, the utterance pattern "channel up" is associated with the command ID "ChannelUp", the utterance pattern "channel down" with the command ID "ChannelDown", and the utterance pattern "more" with the command ID "More".

As described in the first embodiment, a plurality of voice commands are registered in the execution process definition information DF2. In the present embodiment, voice commands are divided into independent commands and dependent commands.
An independent command is a command for which execution processing is defined. For example, the voice commands described in the first embodiment are all independent commands.
A voice command that is an independent command includes a command ID, a command group ID, and the processing content to be executed.

A dependent command, by contrast, is a command whose processing content is not determined by the dependent command alone; its execution processing changes according to the independent command on which it depends. A voice command that is a dependent command includes, for example, a command ID, a command group ID, and the command IDs of the plurality of independent commands on which it can depend.

In the execution process definition information DF2 illustrated in FIG. 12, voice commands C11 and C12 are shown as independent commands, and voice command C13 is shown as a dependent command. Specifically, for voice command C11, which is an independent command, the command ID "ChannelUp", the command group ID "TVGroup", and the execution processing are registered; for voice command C12, the command ID "ChannelDown", the command group ID "TVGroup", and the execution processing are registered.

For voice command C13, which is a dependent command, the command ID "More", the command group ID "TVGroup", and, as information on the independent commands on which it can depend, the command IDs "ChannelUp" and "ChannelDown" are registered. When voice command C13 depends on the command ID "ChannelUp", its processing content is that of voice command C11, identified by the command ID "ChannelUp"; when it depends on the command ID "ChannelDown", its processing content is that of voice command C12, identified by the command ID "ChannelDown".
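One possible in-memory representation of the entries C11 to C13 is sketched below. The class `VoiceCommand`, its field names, and the action strings "channel_up" and "channel_down" are assumptions for illustration; the command IDs and the group ID "TVGroup" follow FIG. 12 as described.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VoiceCommand:
    """One entry of the execution process definition information (sketch)."""
    command_id: str
    group_id: str
    action: Optional[str] = None       # set for independent commands
    depends_on: Tuple[str, ...] = ()   # set for dependent commands

    @property
    def is_dependent(self) -> bool:
        # A command that lists independent commands is treated as dependent.
        return bool(self.depends_on)


# Stand-in for DF2 of FIG. 12; action strings are hypothetical.
DF2: Dict[str, VoiceCommand] = {
    "ChannelUp": VoiceCommand("ChannelUp", "TVGroup", action="channel_up"),
    "ChannelDown": VoiceCommand("ChannelDown", "TVGroup", action="channel_down"),
    "More": VoiceCommand("More", "TVGroup",
                         depends_on=("ChannelUp", "ChannelDown")),
}
```

Keeping the list of independent commands on the dependent entry itself means a new ambiguous word can be added by registering one more entry, without changing the independent commands.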

FIG. 13 is a functional block diagram showing an example of the functions of the command recognition unit 50a according to the present embodiment.
As shown in FIG. 13, the command recognition unit 50a according to the present embodiment includes a history information management unit 51, a command candidate identification unit 52, a command identification unit 53, and a processing determination unit 54.
The processing determination unit 54 determines the processing content of the voice command identified by the command identification unit 53.

The processing determination unit 54 includes a determination unit 61, an independent command identification unit 62, and a decision unit 63.
The determination unit 61 determines whether the voice command identified by the command identification unit 53 is an independent command or a dependent command.
When the determination unit 61 determines that the command is an independent command, processing proceeds as in the first embodiment: the voice command is output to the linked application AP identified by the command group ID included in the identified voice command, and the linked application AP executes the processing corresponding to the voice command.

When the determination unit 61 determines that the command is a dependent command, on the other hand, the independent command identification unit 62 identifies the independent command on which the voice command depends. Using the command history information, the independent command identification unit 62 identifies the most recently recognized of the independent commands registered in that voice command.

For example, when the voice command identified by the command identification unit 53 is voice command C13 shown in FIG. 12, the independent command identification unit 62 obtains from voice command C13 the command IDs "ChannelUp" and "ChannelDown" of the independent commands on which it can depend. It then compares the obtained command IDs "ChannelUp" and "ChannelDown" with the command IDs in the command history information, and identifies the voice command whose matching command ID has the most recent time information as the independent command on which the dependent command depends.

For example, as shown in FIG. 14, when information on voice commands with the command IDs "ChannelUp" and "ChannelDown" is registered in the command history information, the independent command identification unit 62 identifies the voice command with the command ID "ChannelUp", which has the more recent time information, as the independent command.

The decision unit 63 determines the processing content of the dependent command based on the independent command identified by the independent command identification unit 62. For example, when voice command C11 with the command ID "ChannelUp" is identified as the independent command, the decision unit 63 obtains the processing content for the command ID "ChannelUp" from the execution process definition information DF2 and determines the processing content of voice command C13, the dependent command, based on it. For example, the processing content of the identified independent command is used as the processing content of voice command C13. Voice command C13 is then output to the television application among the linked applications AP, and the channel is raised.
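The flow through the determination unit 61, the independent command identification unit 62, and the decision unit 63 can be sketched in one function. This is a simplified illustration, not the embodiment's implementation: the dictionary layout of `DF2`, the action strings, and the function name `decide_action` are assumptions, and returning `None` when the history contains none of the registered independent commands is a choice made for the sketch because the description does not cover that case.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Stand-in for DF2 of FIG. 12; action strings are hypothetical.
DF2 = {
    "ChannelUp": {"group": "TVGroup", "action": "channel_up"},
    "ChannelDown": {"group": "TVGroup", "action": "channel_down"},
    "More": {"group": "TVGroup", "depends_on": ["ChannelUp", "ChannelDown"]},
}


def decide_action(command_id: str,
                  history: List[Tuple[str, datetime]]) -> Optional[str]:
    """Return the processing content for an identified voice command.
    `history` holds (command ID, command recognition time) pairs."""
    command = DF2[command_id]
    # Determination: an entry without "depends_on" is an independent command.
    if "depends_on" not in command:
        return command["action"]
    # Independent command identification: among the registered independent
    # commands, pick the one most recently recognized in the history.
    matching = [row for row in history if row[0] in command["depends_on"]]
    if not matching:
        return None
    latest_id, _ = max(matching, key=lambda row: row[1])
    # Decision: reuse the processing content of that independent command.
    return DF2[latest_id]["action"]
```

If "channel down" and then "channel up" were recognized before "more", the sketch resolves "more" to the processing content of "ChannelUp", which corresponds to the case shown in FIG. 14.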

As described above, according to the voice command recognition system, voice command recognition method, and program of the present embodiment, voice commands are divided into independent commands and dependent commands, and information on the independent commands on which a dependent command can depend is registered in the voice command of that dependent command. When the voice command identified by the command identification unit 53 is a dependent command, the processing determination unit 54 uses the command history information to identify the most recently recognized of the independent commands on which the voice command can depend, and determines the processing content of the voice command, which is a dependent command, based on the processing content of the identified independent command.

With this configuration, the command intended by the user can be recognized from an utterance with an ambiguous meaning such as "more". In particular, "more" is a word that instructs repetition of a previously uttered command. According to the present embodiment, a command previously instructed by the user can be executed repeatedly with a simple utterance.

Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in those embodiments. Various changes or improvements can be made to the embodiments without departing from the gist of the invention, and forms incorporating such changes or improvements are also included in the technical scope of the present invention. The above embodiments may also be combined as appropriate.

For example, the third embodiment has been described for the case where the processing determination unit 54 is applied to the command recognition unit 50 according to the first embodiment, but the application of the processing determination unit 54 is not limited to this example. It can also be applied to the command recognition unit according to the second embodiment, and more generally to known voice agent functions.

Each embodiment has been described for the case where the information processing device 1 is equipped with the voice recognition system, but the invention is not limited to this example. For example, part of the configuration of the voice recognition system may be installed in another system or server. In that case, the information processing device 1 connects to the other components through the communication unit 16 (see FIG. 1) to carry out the processing described above. For example, the voice recognition unit 40 included in the voice agent unit 30 may be provided on a predetermined server.

1: Information processing device
11: CPU
12: Main memory
13: Storage unit
14: Microphone
15: Speaker
16: Communication unit
17: Input unit
18: Display unit
20: Command definition database
30: Voice agent unit
40: Voice recognition unit
50: Command recognition unit
50a: Command recognition unit
51: History information management unit
52: Command candidate identification unit
53: Command identification unit
54: Processing determination unit
61: Determination unit
62: Independent command identification unit
63: Decision unit

Claims

1. A voice command recognition system comprising:
a history information management unit that manages command history information in which voice command information including command group information is registered in association with time information;
a command candidate identification unit that identifies a voice command corresponding to voice data uttered by a user as a command candidate, using definition information in which a plurality of voice commands including command group information are defined;
a command identification unit that, when a plurality of command candidates are identified, identifies a voice command to be processed from the plurality of command candidates using the command group information of the command candidates and the command group information and time information of the voice commands registered in the command history information; and
a processing determination unit that determines processing content corresponding to the voice command identified by the command identification unit,
wherein the voice commands are classified into independent commands, for which execution processing is defined, and dependent commands, whose execution processing changes depending on the independent command on which they depend,
wherein, in the definition information, information on a plurality of independent commands on which a dependent command can depend is registered in the voice command of that dependent command, and
wherein the processing determination unit includes:
a determination unit that determines whether the voice command identified by the command identification unit is a dependent command;
an independent command identification unit that, when the identified voice command is a dependent command, uses the command history information to identify the most recently recognized independent command among the independent commands on which the voice command can depend; and
a decision unit that determines processing content based on the voice command that is the identified independent command and the voice command that is the dependent command.

2. The voice command recognition system according to claim 1, wherein the command identification unit identifies, as the voice command to be processed, the command candidate whose command group information matches that of the voice command having the most recent time information among the voice command information that matches the command group information of the respective command candidates.

3. The voice command recognition system according to claim 1, wherein the command group information is provided corresponding to an application that executes a voice command.

4. The voice command recognition system according to claim 1, wherein the history information management section deletes information on voice commands having time information from a predetermined period or more before the current time from the command history information.

5. A voice command recognition method in which a computer executes:
a history information management step of managing command history information in which voice command information including command group information is registered in association with time information;
a command candidate identification step of identifying a voice command corresponding to voice data uttered by a user as a command candidate, using definition information in which a plurality of voice commands including command group information are defined;
a command identification step of, when a plurality of command candidates are identified, identifying a voice command to be processed from the plurality of command candidates using the command group information of the command candidates and the command group information and time information of the voice commands registered in the command history information; and
a processing determination step of determining processing content corresponding to the voice command identified in the command identification step,
wherein the voice commands are classified into independent commands, for which execution processing is defined, and dependent commands, whose execution processing changes depending on the independent command on which they depend,
wherein, in the definition information, information on a plurality of independent commands on which a dependent command can depend is registered in the voice command of that dependent command, and
wherein the processing determination step includes:
a determination step of determining whether the voice command identified in the command identification step is a dependent command;
an independent command identification step of, when the identified voice command is a dependent command, using the command history information to identify the most recently recognized independent command among the independent commands on which the voice command can depend; and
a decision step of determining processing content based on the voice command that is the identified independent command and the voice command that is the dependent command.

6. A program for causing a computer to function as the voice command recognition system according to any one of claims 1 to 4.