11 : Do You Speak my Language?

“Grandpa, what was life like before computers got so small that they became invisible?” This cartoon caption sums up our predicament: how do we communicate with applications as our computers keep shrinking? On smaller machines, even familiar applications may look different and behave oddly, and we may need help with the unfamiliar interface. However, reading a help document on a cramped netbook or smartphone screen is not easy. Wouldn't it be nicer if the “mouse over” text or a help message could be read out to us instead?

Efforts at text to speech have been around since long before computers were created (see http://en.wikipedia.org/wiki/Text_to_speech). Getting the computer to speak in a pleasant voice has proven to be a remarkably hard task. Many Interactive Voice Response (IVR) systems get around the problem by recording phrases spoken by people and playing back the appropriate sequence of wave files. This is clearly feasible as long as the required vocabulary is small.

If you are willing to compromise and accept a voice which is comprehensible, even if robotic, there are a few options. The eSpeak system (http://espeak.sourceforge.net/) is the first one to explore, as it is widely used as a part of accessibility features and is the platform used by the OLPC/Sugar project. It is small and supports a fair number of languages.

Getting Started

Speech Dispatcher, a generic server for text to speech (TTS) applications, comes with a Python wrapper. The server can talk to various TTS engines, including eSpeak, Flite and Festival. Festival has the best voices, but they are easily available for English only.

You will need to ensure that several software packages are installed. The minimal list for Fedora 10 and Ubuntu 8.10 is:

espeak

espeak-data (also needed on Ubuntu)

speech-dispatcher

speech-dispatcher-python (on Fedora)

python-speechd (on Ubuntu)

You may wish to enter some text in Hindi. Indic Onscreen Keyboard (available on Fedora but not on Ubuntu) is a reasonable option. The on-screen keyboard and the keyboard layouts are provided by:

iok

m17n-contrib-hindi

You will need to configure the default engine in /etc/speech-dispatcher/speechd.conf:

DefaultModule espeak

Since Ubuntu and Fedora now use PulseAudio, you should change the audio output in /etc/speech-dispatcher/modules/espeak.conf to pulse; otherwise, you may hear only silence (on Fedora):

EspeakAudioOutputMethod "pulse"
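If you would like to confirm that both settings are in place, a small script can read them back. The following is only a sketch: the helper name read_directive is made up for this article, and it assumes that each directive sits on its own line as a name followed by a value, that lines starting with # are comments, and that the last occurrence is the one that counts. The sample text stands in for the real files.

```python
def read_directive(conf_text, name):
    """Return the value of the last uncommented `name` directive, or None.

    Assumption: one directive per line, written as `Name value`,
    with `#` starting a comment line.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == name:
            # Drop the optional double quotes around the value.
            value = parts[1].strip().strip('"')
    return value


# Sample text standing in for speechd.conf and espeak.conf.
sample = '''
# DefaultModule flite
DefaultModule espeak
EspeakAudioOutputMethod "pulse"
'''

print(read_directive(sample, 'DefaultModule'))
print(read_directive(sample, 'EspeakAudioOutputMethod'))
```

To check the real configuration, read the contents of the two files named above and pass the text to the same function.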

You will also need to create the default directory for the logs. Without it, the speech dispatcher will still work, but no log messages will be stored.

$ sudo mkdir /var/log/speech-dispatcher

Now, start the speech-dispatcher service and you are ready to go.

$ sudo service speech-dispatcherd start

Verify that the speech dispatcher is working:

$ spd-say 'Hello, Hello, Testing 1 2 3'

If you hear what you expect, you can now proceed further.
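The same check can be made from a Python script, which is handy if an application wants to find out whether speech is available before offering it. This is a minimal sketch: the function name say is hypothetical, and it only assumes that the spd-say command used above is on the PATH when speech dispatcher is installed.

```python
import shutil
import subprocess


def say(text, command='spd-say'):
    """Speak `text` through the given command line tool.

    Returns False if the command is not installed or reports an error,
    True otherwise.
    """
    path = shutil.which(command)
    if path is None:
        return False
    return subprocess.call([path, text]) == 0
```

Calling say('Hello, Hello, Testing 1 2 3') should behave like the shell command above, and a return value of False suggests going back over the installation steps.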

Learning the First Steps

You need to learn what the speech dispatcher provides and how it behaves. Interactive learning is the easiest way. So, start your Python interpreter and try the following:

>>> import speechd

>>> dir(speechd)

>>> help(speechd.Speaker)

A little exploring and you realise that you need to create an object of the type Speaker. So, create one and see what it offers.

>>> spk = speechd.Speaker('me')

>>> dir(spk)

List commands are always helpful. So, try

>>> spk.list_output_modules()

('espeak', 'flite')

This command lists the TTS output engines available. If the default is not espeak or you wish to switch between the modules, you can easily do so:

>>> spk.set_output_module('espeak')

>>> spk.speak('Testing, 1 2 3')

(225, 'OK MESSAGE QUEUED', ('3',))

That was easy. Now, list the voices available:

>>> spk.list_synthesis_voices()

(('afrikaans', 'af', 'none'), ... ('hindi-test', 'hi', 'none'), ...('cantonese-test', 'zh', 'yue'))
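The list is long, so it helps to filter it. Going by the output above, each entry appears to be a triple of voice name, language code and a third field for the dialect or variant. The sketch below relies on that assumption; voices_for is a made-up helper name, and the sample tuple simply repeats the entries shown above.

```python
def voices_for(voices, language):
    """Return the names of the voices whose language code matches."""
    return [name for (name, lang, variant) in voices if lang == language]


# The three entries visible in the output above.
sample = (('afrikaans', 'af', 'none'),
          ('hindi-test', 'hi', 'none'),
          ('cantonese-test', 'zh', 'yue'))

print(voices_for(sample, 'hi'))
```

In the interpreter session, voices_for(spk.list_synthesis_voices(), 'hi') would list the Hindi voices installed on your machine.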

>>> spk.set_language('hi')

Hindi is an option, though as a testing version at present. Try to say something.

>>> spk.speak('Testing, 1 2 3')

(225, 'OK MESSAGE QUEUED', ('4',))

You should have noticed a difference. Maybe it was too fast, so let us slow it down:

>>> help(spk.set_rate)

>>> spk.set_rate(-50)

>>> spk.speak('Testing, 1 2 3')

Negative values for the rate slow down the speech. The numbers should now clearly be spoken in Hindi.

>>> help(spk.set_voice)

>>> spk.set_voice('FEMALE1')

>>> spk.speak('Testing, 1 2 3')

>>> spk.set_pitch(50)

>>> spk.speak('Testing, 1 2 3')

Continue exploring. The voice isn't very feminine, so there is more potential for work!
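While experimenting, it is easy to pass a value the server will not accept. As far as I can tell from the help text, both the rate and the pitch are integers between -100 and 100, with 0 as the default. Assuming that range, a small hypothetical helper can keep your experiments within bounds:

```python
def clamp_setting(value, low=-100, high=100):
    """Force a rate or pitch value into the assumed range of -100 to 100."""
    return max(low, min(high, int(value)))


print(clamp_setting(-50))
print(clamp_setting(250))
print(clamp_setting(-1000))
```

In the session above, you could then write spk.set_rate(clamp_setting(-150)) without worrying about the limits.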

What if you enter a Hindi text message? Try using the on-screen keyboard, iok, to enter a Unicode message:

>>> text=u"गब्बर सिंह कहते थे, जो डर गया वह मर गया."

>>> spk.speak(text)

Don't be surprised if the Hindi text looks different in various consoles. Even the synthetic voice seems scared of Gabbar Singh! These are more areas for potential improvement.

Try changing the TTS language to English and hear the difference.

Read Aloud Application

You can have the computer read aloud one of the first nursery rhymes that I learnt. Save it in a text file, NurseryRhyme.txt.

मछली जल की है रानी.

जीवन उसका है पानी.

हाथ लगाओ, डर जाएगी.

बाहर निकालो, मर जाएगी.

Your basic program, read_aloud.py, would look like:

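import speechd

s = speechd.Speaker('ReadAloud')
s.set_output_module('espeak')
s.set_language('hi')
f = open('NurseryRhyme.txt')
s.set_rate(-50)
for line in f.readlines():
    # The file contains UTF-8 bytes; convert each line to a unicode string.
    sentence = unicode(line, 'utf8')
    print sentence
    # speak() only queues the text, so it returns immediately.
    s.speak(sentence)
s.close()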

In Python 3, all strings will be Unicode. For now, you need to convert the bytes read from the file into a unicode string yourself, which is what the call unicode(line, 'utf8') does.
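If you would rather have code that reads the file the same way on both versions, the open function of the standard io module accepts an encoding and returns unicode strings directly. The sketch below is one way to do it; read_unicode_lines is a made-up name, and a temporary file stands in for NurseryRhyme.txt.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_unicode_lines(path):
    """Return the lines of a UTF-8 text file as unicode strings, without newlines."""
    with io.open(path, encoding='utf-8') as f:
        return [line.rstrip(u'\n') for line in f]


# Write two lines of the rhyme to a temporary file, then read them back.
rhyme = u'मछली जल की है रानी.\nजीवन उसका है पानी.\n'
handle, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(handle, 'wb') as out:
    out.write(rhyme.encode('utf-8'))

lines = read_unicode_lines(path)
os.remove(path)
print(len(lines))
```

Each element of lines can be passed to the speak method as it is, with no further conversion.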

You will notice that the entire poem is displayed even before the first line is spoken. You want to display each line as it is being spoken.

So, you need to think in terms of events. Your program needs to wait till the TTS engine has finished speaking the line. You will notice that the speak method accepts a callback parameter and an event_types parameter. These ensure that the speech dispatcher calls your program back after the events you have requested.

The callback will be passed one parameter, the event which resulted in it being called. The speech dispatcher currently has 2 events: 'begin' and 'end'. Needless to say, you are interested only in the second event.

Python's threading module contains a class Event which will be very useful for this application. So, the better version of your read aloud application will become:

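import speechd
from threading import Event

def spoken(event):
    # Called by the speech dispatcher client when the 'end' event arrives.
    speech_over.set()

s = speechd.Speaker('ReadAloud')
s.set_output_module('espeak')
s.set_language('hi')
f = open('NurseryRhyme.txt')
s.set_rate(-50)
speech_over = Event()
for line in f.readlines():
    sentence = unicode(line, 'utf8')
    print sentence
    # Ask to be called back only when the line has been spoken.
    s.speak(sentence, callback=spoken, event_types=('end',))
    # Block until spoken() has run, then reset the flag for the next line.
    speech_over.wait()
    speech_over.clear()
s.close()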

You have added just six lines and modified one, yet you have gained considerable control over how the playing of sound is integrated with the rest of the application.
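If you wish to study the waiting pattern on a machine without speech dispatcher, you can simulate it. In the sketch below, FakeSpeaker is a hypothetical stand-in and not a part of the speechd module: it pretends to speak by calling the callback from a timer thread a moment later. The log shows that each line is displayed only after the previous one has been "spoken".

```python
import threading


class FakeSpeaker(object):
    """Hypothetical stand-in for a speaker object; it only simulates speech."""

    def speak(self, text, callback=None, event_types=None):
        def finish():
            # Report the 'end' event, provided the caller asked for it.
            if callback is not None and (event_types is None or 'end' in event_types):
                callback('end')
        # Pretend that speaking takes a short while, on another thread.
        threading.Timer(0.05, finish).start()


speech_over = threading.Event()
log = []


def spoken(event_type):
    log.append(('callback', event_type))
    speech_over.set()


speaker = FakeSpeaker()
for line in ['line one', 'line two']:
    log.append(('shown', line))
    speaker.speak(line, callback=spoken, event_types=('end',))
    speech_over.wait()    # block until the callback has fired
    speech_over.clear()   # reset the flag for the next line

print(log)
```

In a real application, you may prefer to pass a timeout to wait so that the program does not hang if the 'end' event never arrives.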

Dhvani and Festival

Dhvani won an award from LFY last year and is available at http://dhvani.sourceforge.net/. Using the source code from the repository and a little help from Santhosh Thottingal, I could try the example code above. The voice quality seems more natural than the eSpeak voice. I hope the package evolves and becomes a part of the Fedora and Ubuntu repositories.

Information about integration with speech dispatcher is available at http://dhvani.sourceforge.net/doc/screenreader.html. In our Python code, the only change needed is:

s.set_output_module('dhvani-generic')
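Since dhvani may not be installed everywhere, an application could choose the module at run time from the tuple returned by list_output_modules. The following is a sketch of that idea; pick_module and its preference order are made up for illustration.

```python
def pick_module(available, preferred=('dhvani-generic', 'espeak')):
    """Return the first preferred module that is available, or None."""
    for name in preferred:
        if name in available:
            return name
    return None


# With the modules listed earlier in this article, espeak is chosen.
print(pick_module(('espeak', 'flite')))
```

In the read aloud program, you would call pick_module(s.list_output_modules()) and pass the result to set_output_module, provided it is not None.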

Debian and Ubuntu have a package for a Hindi voice for festival. The quality of voice is acceptable but not better than eSpeak's.

Final Words

If more applications used TTS, the speech quality would definitely improve. If the speech quality were better, more applications would use TTS.

Fortunately, smartphones, netbooks and ebook readers are going to change the dynamics of the above deadlock. Adding language support to the eSpeak project is likely to gain acceptance the fastest. (On how to go about it, see http://espeak.sourceforge.net/add_language.html)

If user applications rely on speech dispatcher, as its Python wrapper encourages, it becomes easy to use a TTS engine optimised for specific languages, e.g., dhvani for Indic scripts. The choice of engine is just a configuration detail.

It is worth adding TTS to your applications today. If for no other reason, it attracts attention.


