This beta is intended for
technically advanced users. If the steps below seem too involved as you work
through them, you may want to wait a few months until we have this process
integrated as an easy-to-use feature in the product.
SAPI Background
Speech products such as TextAloud use a
Microsoft-created speech interface called SAPI to work with speech engines
from different vendors. There are two versions of SAPI. SAPI4 is
the older version; it is more stable and is supported by more voice engines.
By default, whenever possible, TextAloud uses SAPI4 to work with engines from
L&H, Microsoft, and AT&T Natural Voices. However, SAPI4 does not allow
some advanced features, such
as changing voices within an article.
SAPI5 is a newer version of the
interface. It is supported by some Microsoft voices, AT&T Natural
Voices, and the new Cepstral Beta voices we are testing. While it
sometimes isn't as stable, it enables some advanced features.
Specifically, SAPI5 supports XML tags within the text being spoken that allow
you to change the voice being used, change the speaking speed on the fly,
force a particular pronunciation of a
word, place emphasis on a phrase, insert silence into the text (similar to our
Pause TAG), change the volume, and so on.
For this test, we will focus on
voice changing, but will provide a link where you can get more info on other
TAGs.
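As a rough illustration of the other tags mentioned above, the Python sketch below builds a few of them as plain strings that could be pasted into an article. The tag and attribute names (rate speed, volume level, silence msec, emph) reflect our understanding of the SAPI5 XML format and should be checked against the SAPI5 help file before you rely on them. The helper function names are made up for this example and are not part of TextAloud or SAPI.

```python
# Hypothetical helpers that build SAPI5 XML tags as plain strings.
# Tag and attribute names are assumptions; verify them in the SAPI5 help file.

def rate(text, speed):
    """Wrap text in a rate tag (speed assumed to be a relative value, e.g. -10 to 10)."""
    return f'<rate speed="{speed}">{text}</rate>'

def volume(text, level):
    """Wrap text in a volume tag (level assumed to be a percentage, 0 to 100)."""
    return f'<volume level="{level}">{text}</volume>'

def silence(msec):
    """Return a tag that inserts a pause of the given length in milliseconds."""
    return f'<silence msec="{msec}"/>'

def emph(text):
    """Wrap text in an emphasis tag."""
    return f'<emph>{text}</emph>'

# Assemble a short test article that mixes several tags.
article = (
    "Welcome. "
    + silence(500)
    + rate("This part is slower. ", -3)
    + volume("This part is quieter. ", 60)
    + emph("This is important.")
)
print(article)
```

The printed text can be copied into a TextAloud article while in SAPI5 mode. If a tag is spoken aloud or ignored, that particular tag may not be supported by the voice you are using.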
The point of this quick history
lesson is that to use XML TAGs, your copy of TextAloud must be using SAPI5 to
work with SAPI5 voice engines. XML TAGs will not work with SAPI4 voices.
OK, let's get started.
Step #1 (skip if you know you
already have SAPI5 working on your system)
First, we must make sure SAPI5 is installed and works on your computer. You
must have Win98 or later. Download and install
http://www.webaloud.com/files/Microsoft-English-TTS-51.msi
This places SAPI5 on your computer. You
can verify it by finding the Speech icon in your computer's Control Panel.
Natural Voices support
SAPI5, and you can preview them through SAPI5 by clicking the Speech icon in
the Control Panel, and then the Text to Speech TAB.
You should have at least the
SAM voice. You can add SAPI5 versions of Mary and Mike (older Microsoft voices)
by installing
http://www.nextup.com/files/MarMike5.msi
Step #2
You will need an updated version of TextAloud.
Download and install version 1.454 or later from
http://nextup.com/TextAloud/download.html
Step #3
Next, we need to make the switch to SAPI5.
Start TextAloud, then click the Misc. Options TAB. You'll see a new
option on this screen, "SAPI Version". From that dropdown list, choose
Use SAPI5 ONLY.
After you change the setting, an
informational window appears; click Proceed. This should put TextAloud into
a mode where only SAPI5 voices can be chosen.
Step #4
OK, let's try out voice changing. To
change voices within an article, you put a voice change TAG into
the text at the point where you want the voice to switch. The TAG looks
like this:
<voice required="name=Audrey16">
The only part that you will
customize is the name of the voice. Here are some examples of names you
might use:
Microsoft Mary
Microsoft Mike
Cepstral Emily
Mike16
Crystal16
From these examples, you can
figure out the rest. Microsoft voices use the voice name with
"Microsoft" in front of it. AT&T voices use simply the voice name.
16 kHz voices have 16 appended; 8 kHz voices do not have a number appended.
So, an example of text you
might try in an article as an experiment is:
<voice required="name=Crystal16"> Hi. My name is Crystal. I'd like to
introduce you to my friend Mike.
<voice required="name=Mike16"> Hello, this is Mike.
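If you want to generate a longer multi-voice article rather than typing each TAG by hand, a small script can assemble the text for you. The Python sketch below is only an illustration: voice_tag and build_article are hypothetical helper names, not part of TextAloud, and the voice names are the ones from the example above. Substitute names of voices actually installed on your system.

```python
def voice_tag(name):
    """Return a voice change TAG for the given voice name."""
    return f'<voice required="name={name}">'

def build_article(parts):
    """Build article text from a list of (voice_name, text) pairs.

    Each pair becomes one line that starts with the voice change TAG.
    """
    return "\n".join(f"{voice_tag(name)} {text}" for name, text in parts)

article = build_article([
    ("Crystal16", "Hi. My name is Crystal. I'd like to introduce you to my friend Mike."),
    ("Mike16", "Hello, this is Mike."),
])
print(article)
```

The printed output can be pasted into a TextAloud article while TextAloud is in SAPI5 mode.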
Step #5
Let us know if it worked for you and what
questions you have. If the tagging isn't exactly right, TextAloud will either
not switch voices or give an error. If you are in SAPI4 mode instead of
SAPI5, it will actually speak those tags aloud.
Email Us
with your feedback.
Advanced Issues
You can get more info on SAPI5
XML Tags by downloading the full SAPI5 help file and looking in the index for
XML TAGS at
http://www.microsoft.com/speech/techinfo/apioverview/#_doc
Another issue to be aware of is
that different voice engines use different output sample rates. This can
cause some unpredictable results. For example, 16 kHz Natural Voices set
output for 16 kHz. Microsoft voices use 22 kHz, and Cepstral may be something
different.
If an article is set for one
engine, and then you switch to a voice from another engine, it will work,
but the sound quality may not be as good as you normally expect. This
can happen whether speaking aloud or to an audio file. So a general rule of
thumb is that the article should be set for a voice from a particular vendor,
and any voices you switch to within that article should be from the same
company.
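One way to apply this rule of thumb to a long article is to scan it for voice change TAGs and see whether more than one vendor shows up. The Python sketch below is a rough illustration only. It guesses the vendor from the naming pattern described in Step #4 (names starting with "Microsoft " or "Cepstral ", and everything else assumed to be an AT&T Natural Voice), which is an assumption that may not hold for every installed voice. The function names are hypothetical, and the check does not look at the default voice the article is set to.

```python
import re

# Matches a voice change TAG and captures the voice name.
VOICE_TAG = re.compile(r'<voice\s+required="name=([^"]+)"\s*>', re.IGNORECASE)

def vendor_of(name):
    """Guess the vendor from the voice name (assumption based on naming patterns)."""
    if name.startswith("Microsoft "):
        return "Microsoft"
    if name.startswith("Cepstral "):
        return "Cepstral"
    return "AT&T"  # assumption: unprefixed names are AT&T Natural Voices

def vendors_in(article):
    """Return the set of vendors guessed from all voice change TAGs in the text."""
    return {vendor_of(name) for name in VOICE_TAG.findall(article)}

def mixes_vendors(article):
    """True if the article switches between voices from more than one vendor."""
    return len(vendors_in(article)) > 1

same = '<voice required="name=Crystal16"> Hi. <voice required="name=Mike16"> Hello.'
mixed = '<voice required="name=Crystal16"> Hi. <voice required="name=Microsoft Mary"> Hello.'
print(mixes_vendors(same))
print(mixes_vendors(mixed))
```

An article flagged as mixing vendors will still play, but it is a candidate for the sound quality problems described above.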
Let us know about any questions,
problems, or suggestions by emailing us at
mailto:support@nextup.com?subject=Voice Switching Feedback