The Speech-To-Steve project aims to let users control Malmo agents by speech. At a high level, it takes speech from the user, converts it into text using Google Speech Recognition, and uses the NLP library spaCy to parse the text into parameters, which are then fed to Malmo commands. In addition, we implemented a similarity check on objects and support for multistep commands. This allows our agent, for example, to understand that stallions and horses are the same, or to jump exactly 10 times when such a command is given.
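The flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `ACTIONS` and `NUMBER_WORDS` tables and the helper names are hypothetical, and a small regex tokenizer stands in for the spaCy parsing step so the example runs without a language model. The microphone helper uses the SpeechRecognition library and is imported lazily, since it needs PyAudio and a network connection.

```python
import re

# Hypothetical vocabulary: spoken verb -> Malmo continuous-movement command.
ACTIONS = {"jump": "jump 1", "walk": "move 1", "attack": "attack 1"}
# Hypothetical table of spelled-out repeat counts.
NUMBER_WORDS = {"once": 1, "twice": 2, "one": 1, "two": 2, "three": 3, "ten": 10}


def transcribe_from_microphone():
    """Record one utterance and return Google's transcription of it."""
    import speech_recognition as sr  # lazy: needs SpeechRecognition + PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)


def parse_command(text):
    """Return (malmo_command, repeat_count), or None if no known verb is found.

    Stand-in for the spaCy step: a plain tokenizer plus table lookups.
    """
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    action = next((ACTIONS[t] for t in tokens if t in ACTIONS), None)
    if action is None:
        return None
    repeat = 1  # default when no count is spoken
    for token in tokens:
        if token.isdigit():
            repeat = int(token)
            break
        if token in NUMBER_WORDS:
            repeat = NUMBER_WORDS[token]
            break
    return action, repeat


def execute(text, send):
    """Parse `text` and call `send(command)` once per repetition."""
    parsed = parse_command(text)
    if parsed is None:
        return 0
    command, repeat = parsed
    for _ in range(repeat):
        send(command)
    return repeat
```

With Malmo, `send` would typically be `agent_host.sendCommand`, and `text` would come from `transcribe_from_microphone()`. How each repetition is timed so that the agent performs exactly that many jumps is left out of this sketch.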
To decrease the cost, the similarity check is built on spaCy and covers both actions and objects, so that the Malmo agent is able to recognize "destroy" as "kill", "stallion" as "horse", etc.
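One way such a similarity check could work is to map each spoken word to the closest word the agent already knows, and reject it if nothing is close enough. The sketch below is an assumption about the approach, not the project's implementation: `canonicalize`, the 0.7 threshold, and the toy vectors are all invented for illustration. `spacy_similarity` shows how the same function could be backed by spaCy's `Doc.similarity`, assuming a model with word vectors such as `en_core_web_md` is installed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def canonicalize(word, known_words, similarity, threshold=0.7):
    """Return the known word most similar to `word`, or None if none clears the threshold."""
    if word in known_words:
        return word
    best = max(known_words, key=lambda known: similarity(word, known))
    return best if similarity(word, best) >= threshold else None


def spacy_similarity(model_name="en_core_web_md"):
    """Build a similarity function from spaCy word vectors (model must be installed)."""
    import spacy  # lazy: only needed for the spaCy-backed variant

    nlp = spacy.load(model_name)
    return lambda a, b: nlp(a).similarity(nlp(b))


# Toy 3-d vectors, invented for illustration only (not real embeddings).
TOY_VECTORS = {
    "horse": (0.9, 0.1, 0.0),
    "stallion": (0.8, 0.2, 0.1),
    "kill": (0.0, 0.9, 0.3),
    "destroy": (0.1, 0.8, 0.4),
    "banana": (0.0, 0.0, 1.0),
}


def toy_similarity(a, b):
    """Similarity over the toy vectors, so the example runs without spaCy."""
    return cosine(TOY_VECTORS[a], TOY_VECTORS[b])
```

For example, `canonicalize("stallion", ["horse", "kill"], toy_similarity)` picks `"horse"`, while an unrelated word such as `"banana"` falls below the threshold and yields `None`. Swapping in `spacy_similarity()` for `toy_similarity` would use real word vectors instead.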

| | Speech Recognition Accuracy | Command Parsing Accuracy | Command Executed Accuracy |
|---|---|---|---|
| Basic Commands | 95.20% | 95.20% | 95.20% |
| Advanced Commands | 86.80% | 88.00% | 91.02% |

- SpeechRecognition: library for performing speech recognition
- PyAudio: records audio input from the microphone
- Google Speech Recognition API: converts audio into text
- spaCy: information extraction and natural language understanding
- NeuralCoref: pipeline extension for spaCy 2.1+ which annotates and resolves coreference clusters using a neural network
- craft_work.py: Malmo tutorial file used as reference for some crafting-related commands

- Speech Recognition Systems: explanation of how basic speech-to-text models work and visualization of these models
- spaCy Documentation: detailed explanation of how spaCy works and its usage, as well as various diagrams
- NeuralCoref Documentation: detailed explanation of how NeuralCoref works and its usage, as well as various diagrams
