Structure / Used technologies

Backend: language model
Middleware: Bash scripts that do various things, plus MaryTTS
Frontend: Simon

BACKEND: language model

For a PC to understand humans, it needs to know how they sound and what words they use. It does that with a language model, which is the brain of this project :-D There are two types of models that describe language: grammars and statistical language models. Grammars describe very simple types of languages for command and control; they are usually written by hand or generated automatically with plain code (Simon scenarios).

However, for now the open source GERMAN language model does not even contain all command and control words.

I use the German language model from Voxforge: http://voxforge.org/

More specifically, the one from Guenter: http://goofy.zamia.org/voxforge/

I rely on Guenter to publish new language models (they won't compile on my machine for some mysterious reason :-( ).

If you want to know more about the whole model compilation business, check out his awesome tools: https://github.com/gooofy/voxforge/

MIDDLEWARE: bash scripts do the stuff .-)

This is where most of the work for my purposes gets done. Get the local weather data and have it read back to you, fetch some jokes from the net, switch on a lamp? It is all done here. I use bash scripts for this because they are easy to understand and easy to modify when a web page changes its information layout. They also don't require much knowledge, so many people can contribute.
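To give an idea of what such a script can look like, here is a minimal sketch of the weather case. It is not one of the project's actual scripts: the URL, the HTML layout and all function names are made up for illustration. The point is that the scraping pattern lives in one small function, so a layout change on the web page only means changing one line.

```shell
#!/usr/bin/env bash
# Sketch of a middleware script: fetch a weather page, pull out one value,
# and turn it into a sentence that can be spoken.
# WEATHER_URL and the HTML layout below are made-up placeholders.

WEATHER_URL="${WEATHER_URL:-http://example.com/weather/mytown}"

# Read HTML on stdin and print the temperature from the first matching line.
# Assumes (hypothetically) the page contains: <span class="temp">21</span>
# If the page layout changes, only this pattern needs to be adjusted.
extract_temperature() {
    sed -n 's/.*<span class="temp">\(-\{0,1\}[0-9][0-9]*\)<\/span>.*/\1/p' | head -n 1
}

# Build the sentence that will be read out loud.
weather_sentence() {
    local temp="$1"
    if [ -z "$temp" ]; then
        echo "Sorry, I could not read the weather data."
    else
        echo "The temperature is ${temp} degrees."
    fi
}

# Fetch the page and print the sentence (needs curl and network access).
weather_report() {
    local html
    html="$(curl --silent --fail "$WEATHER_URL")" || html=""
    weather_sentence "$(printf '%s\n' "$html" | extract_temperature)"
}
```

Calling weather_report prints the finished sentence, which can then be handed to the text-to-speech step.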

MaryTTS generates the spoken responses and reads out the collected data. -> https://github.com/marytts/marytts
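As a rough sketch of how a script can hand a sentence to MaryTTS: the example below assumes a MaryTTS server is already running locally on its default port 59125, that a German voice is installed, and that aplay is available for playback. The function and variable names are my own choice for illustration, not taken from the project's scripts.

```shell
#!/usr/bin/env bash
# Sketch: turn a sentence into speech with a locally running MaryTTS server.
# Assumptions: server on localhost:59125 (MaryTTS default port), a German
# voice installed, and "aplay" available for playback.

MARY_URL="${MARY_URL:-http://localhost:59125/process}"
MARY_LOCALE="${MARY_LOCALE:-de}"

# Ask MaryTTS to synthesize $1 and write the WAV data to file $2.
# curl --get with --data-urlencode takes care of escaping spaces and umlauts.
mary_synthesize() {
    local text="$1" outfile="$2"
    curl --silent --fail --get \
        --data-urlencode "INPUT_TEXT=${text}" \
        --data "INPUT_TYPE=TEXT" \
        --data "OUTPUT_TYPE=AUDIO" \
        --data "AUDIO=WAVE_FILE" \
        --data "LOCALE=${MARY_LOCALE}" \
        --output "$outfile" \
        "$MARY_URL"
}

# Synthesize a sentence and play it back, cleaning up the temporary file.
say_text() {
    local wav status
    wav="$(mktemp)" || return 1
    mary_synthesize "$1" "$wav" && aplay --quiet "$wav"
    status=$?
    rm -f "$wav"
    return "$status"
}
```

With the server running, say_text "Guten Morgen" should play the sentence. Setting MARY_LOCALE to another installed locale switches the language.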

FRONTEND: Simon

Great program! It is very powerful and the heart of this project. The dev is also a really helpful and nice guy! It handles all the basics like microphone calibration, recording what you say, comparing that with the language model, launching commands, etc.

Other than that, I use Simon to do various things for me:

Break down the functionality of the scripts into modules (called Scenarios), so someone can use only the "Search" module and doesn't have to install all the other modules like Translation etc.
Train the speech model to your voice so it understands you (better) (very important!)
Provide an easy method to change the activation sentences to your liking
Do many more things on its own without the scripts :-D

Check out Simon here: http://grasch.net/blog and here https://simon.kde.org/

YouTube video: http://www.youtube.com/watch?v=x_9ImaiOISs&list=UUiVicBYegdFX9BnYOD2EMNw (quite old but you should get the idea :-D)


Linjark

Bringing easy speech recognition / voice control to average Linux desktop users
