[Machine Learning] Finding fast and slow speech automatically



Post by jacques01 » Thu Jun 23, 2016 11:40 pm UTC

I am trying to develop a machine learning classifier that can automatically find fast or slow speech, given only the output of a speech-to-text engine.

I have created a dataset of automatic speech recognition (ASR) output, where each word carries the following data:

{"word":"the", "start":0.0, "end":0.50, "pro":"dh ih s", "class":"O"}

Assume time is in seconds, "pro" is the phonetic transcription of the word (I used my own grapheme-to-phoneme converter to get that), and "class" is the annotated label I added. There are three possible classes for words: "O", "SLOW", and "FAST".

"O" marks a word spoken at normal speed. SLOW and FAST are exactly what they seem.

I have roughly 15,000 words annotated for speed. About half are O, and the other half is either SLOW or FAST.

I ran a baseline conditional random field (CRF) sequence classifier, trained and tested on the same data (no cross-validation yet), that has an overall F1 of about 83%. SLOW words only have a recall of about 66%, whereas FAST has good recall and precision (both around 80%). O, which makes up about half of the data, also has a good F1 at around 86%.
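For reference, per-class precision, recall, and F1 like the numbers above can be computed from gold and predicted label sequences roughly as follows. This is only a minimal sketch: the function name `per_class_prf` and the toy label lists are made up for illustration and are not my actual data or results.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[g] += 1          # missed a true instance of g
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy labels (hypothetical): one SLOW word is mislabeled as O.
gold = ["O", "O", "SLOW", "SLOW", "FAST", "FAST"]
pred = ["O", "O", "SLOW", "O", "FAST", "FAST"]
scores = per_class_prf(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For a fair estimate, the gold and predicted labels should come from data the model was not trained on, for example the held-out fold of a cross-validation split.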

I haven't done any true evaluation yet because I have no idea what my features should be. I am not a speech scientist or a signal processing engineer. My hope is to not have to use actual features from the recorded audio, just the ASR output.
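To make "just the ASR output" concrete, here is a minimal sketch of timing features that can be derived from the word records alone: word duration, phone count, phones per second, and the pause before each word. The function name `timing_features`, the feature names, and the second record are assumptions for illustration; only the first record comes from the example above. Whether these features actually separate SLOW from FAST is something I have not tested.

```python
def timing_features(words):
    """Derive simple per-word timing features from a list of ASR word records."""
    features = []
    prev_end = None
    for w in words:
        duration = w["end"] - w["start"]
        n_phones = len(w["pro"].split())      # "pro" is space-separated phones
        # Phones per second as a rough speaking-rate proxy; guard zero-length words.
        rate = n_phones / duration if duration > 0 else 0.0
        # Silence between the previous word's end and this word's start.
        pause = w["start"] - prev_end if prev_end is not None else 0.0
        features.append({
            "duration": duration,
            "n_phones": n_phones,
            "phones_per_sec": rate,
            "pause_before": pause,
        })
        prev_end = w["end"]
    return features


words = [
    # Record from the example above.
    {"word": "the", "start": 0.0, "end": 0.50, "pro": "dh ih s", "class": "O"},
    # Hypothetical second record, made up for illustration.
    {"word": "cat", "start": 0.75, "end": 1.0, "pro": "k ae t", "class": "FAST"},
]
features = timing_features(words)
for w, f in zip(words, features):
    print(w["word"], f)
```

Each feature dictionary could then be passed to the CRF as the observation for that word, possibly alongside the same features for neighboring words or normalized by the speaker's average rate.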

Given this information, what tools or algorithms can I use to help me discover features automatically?

Should I be using a conditional random field? Are there better approaches to this problem?

Thank you.
