Tensorflow Speech Recognition

Speech recognition using Google's TensorFlow deep learning framework and sequence-to-sequence neural networks.

Replaces caffe-speech-recognition; see there for some background.

Ultimate goal

Create a decent standalone speech recognition system for Linux etc. Some people say we have the models but not enough training data. We disagree: there is plenty of training data (100GB here and 21GB here on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.); we just need a simple yet powerful model. It's only a question of time...

Sample spectrogram: Karen uttering 'zero' at 160 words per minute.

Getting started

Toy examples: ./number_classifier_tflearn.py ./speaker_classifier_tflearn.py

Some less trivial architectures: ./densenet_layer.py

Later: ./train.sh ./record.py

Partners + collaborators wanted

We are in the process of tackling this project in earnest. If you want to join the party, just start with a small pull request.

Update: Nervana demonstrated that it is possible for 'independents' to build models that are state of the art. Unfortunately they didn't open source the software.

### Fun tasks for newcomers

Data augmentation: modulate the data on the fly during training, e.g. speed up the speech, add background noise, alter the pitch, etc.
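As a starting point for this task, here is a minimal sketch of on-the-fly augmentation using only NumPy. It is not code from this repository: the function names (`add_noise`, `change_speed`, `augment`) and the parameter ranges are assumptions chosen for illustration. Note that the simple resampling used here changes tempo and pitch together; altering the pitch independently of speed would need a proper pitch-shifting method.

```python
import numpy as np


def add_noise(signal, snr_db, rng=None):
    """Mix white Gaussian noise into `signal` at the given signal-to-noise ratio (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def change_speed(signal, factor):
    """Resample by linear interpolation.

    factor > 1 shortens the clip; played back at the original sample rate
    this raises tempo and pitch together.
    """
    signal = np.asarray(signal, dtype=np.float64)
    new_length = max(1, int(round(len(signal) / factor)))
    old_positions = np.arange(len(signal))
    new_positions = np.linspace(0, len(signal) - 1, new_length)
    return np.interp(new_positions, old_positions, signal)


def augment(signal, rng=None):
    """Apply a random speed change followed by random-level background noise."""
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(0.9, 1.1)   # hypothetical range: +/- 10% speed
    snr_db = rng.uniform(10, 30)     # hypothetical range: 10 to 30 dB SNR
    return add_noise(change_speed(signal, factor), snr_db, rng)
```

Calling `augment(waveform)` on each sample as a batch is assembled yields a slightly different variant every epoch, so nothing extra has to be stored on disk.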

### Extensions

Extensions to current TensorFlow which are probably needed:

WarpCTC on the GPU
Incremental collaborative snapshots ('P2P learning')!
Modular graphs/models + persistence

Even though this project is far from finished, we hope it gives you some starting points.

Looking for a TensorFlow consultant / deep learning contractor? Reach out to info@pannous.com

