The short list below contains a few regular expressions I've created that help with pronunciation. The part before the "/" goes in the word field; the part after the "/" goes in the pronunciation field. The part in brackets is just a description. The first group is for all voices; the second group is specific to some NeoSpeech voice anomalies.

words like iPod, eBay:
{{re=\b([ei])([A-Z]\w+)\b}} / $1 $2

dot com web addresses:
{{re=\b(\w*)\.?(\w+)\.(com|org|gov|us|cc|bus|net)}} / $1 dot $2 dot $3

Most year entries from 1600 through 1999:
{{re=\b(16|17|18|19)([1-9]\d)\b}} / $1 $2

Year entries like "19-oh-1" or "17-oh-9":
{{re=\b(16|17|18|19)0([1-9])\b}} / $1 o $2

Year entries ending in "00" that should be read "yy-hundred":
{{re=\b(16|17|18|19)00\b}} / $1 hundred
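For anyone who wants to try the three year entries outside TextAloud, here is a minimal sketch in Python. It assumes Python's `re` module behaves closely enough to TextAloud's regex engine for these simple patterns, and the names `YEAR_RULES` and `speak_years` are made up for the illustration. The `$1` references are written as `\1` in Python.

```python
import re

# Word-field patterns and pronunciation-field replacements for the year entries.
# The order in which the rules are applied is an assumption of this sketch.
YEAR_RULES = [
    (r"\b(16|17|18|19)([1-9]\d)\b", r"\1 \2"),   # e.g. 1642 is read as "16 42"
    (r"\b(16|17|18|19)0([1-9])\b", r"\1 o \2"),  # e.g. 1905 is read as "19 o 5"
    (r"\b(16|17|18|19)00\b", r"\1 hundred"),     # e.g. 1800 is read as "18 hundred"
]

def speak_years(text: str) -> str:
    """Apply each year rule in turn and return the respelled text."""
    for pattern, replacement in YEAR_RULES:
        text = re.sub(pattern, replacement, text)
    return text

print(speak_years("Compare 1642, 1905 and 1800."))
```

Years outside 1600 through 1999 (for example 2005) are left untouched, because none of the three patterns match them.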

e.g.:
{{re=\be\.g\.}} / for example

i.e.:
{{re=\bi\.e\.}} / that is

The following entries are specific to a few NeoSpeech voice quirks:

Fixes an occasional problem where the second word in a hyphenated word combination doesn't get read:
{{re=\b(\w+)-(\w+)}} / $1$2

Fixes a problem with some two-word combinations (*al adv* like "financial advisor" or "funeral administrator" get pronounced with the first word pluralized):
{{re=\b(\w+[aeiouyAEIOUY]l+)\b\s(ad[mv]\w+)\b}} / $1,$2

These next three fix the problem where Mr., Mrs. and Ms. are all spelled out:
{{re=\bmr\.\s(\w*)}} / mister $1
{{re=\bmrs\.\s}} / misses
{{re=\bms\.\s}} / miz

Fixes a problem with some other two-word combinations (*le adv* like "responsible advisor" or "reasonable administrator" get pronounced with the first word pluralized). This is similar to the entry above where the first word ends in al, el, etc.; here the first word ends in *le:
{{re=\b(\w+l+e)\b\s+(ad[mv]\w+)\b}} / $1 , $2
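The two "adv" entries (first word ending in a vowel plus "l", and first word ending in "le") can be checked outside TextAloud with a short Python sketch. This assumes Python's `re` matches these patterns the same way TextAloud does; `AL_RULE`, `LE_RULE` and `insert_pause` are names invented for the example.

```python
import re

# (pattern, replacement) pairs taken from the two entries; $1/$2 become \1/\2.
AL_RULE = (r"\b(\w+[aeiouyAEIOUY]l+)\b\s(ad[mv]\w+)\b", r"\1,\2")
LE_RULE = (r"\b(\w+l+e)\b\s+(ad[mv]\w+)\b", r"\1 , \2")

def insert_pause(text: str) -> str:
    """Put a comma between the two words so the voice pauses instead of pluralizing."""
    for pattern, replacement in (AL_RULE, LE_RULE):
        text = re.sub(pattern, replacement, text)
    return text

print(insert_pause("a financial advisor"))         # a financial,advisor
print(insert_pause("a reasonable administrator"))  # a reasonable , administrator
```

Only second words starting with "adm" or "adv" are affected, so a phrase like "final answer" passes through unchanged.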

Here is one people ask about frequently: how to keep the items in a bulleted list from running together. Most times I've seen the bullet character, it's been a hex 95 character, so you may need to tweak this if a different character is used for the bullet. This expression finds strings starting with a bullet character, followed by alphanumerics and spaces, then one or more carriage returns (no punctuation between the bullet and the carriage return(s)). The pronunciation sticks a period at the end of the string.

Word: {{re=\x95([\w\s]+)((\r\n)+)}}
Pronunciation: $1.$2

This one fixes a very minor and specific pronunciation bug in the NeoSpeech voices, where the phrase "put it" is incorrectly pronounced "putS it" when it appears after a word ending in "y" and when either a colon or comma follows:

Word: {{re=\b(\w+y)\sput\sit[:,]\s+(\w+)\b}}
Pronunciation: $1 puuht it {{pause=0.25}} $2

Example: As Orwell once brilliantly put it: "Insincerity is the enemy of clear language."

Go figure!
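To see what the bulleted-list entry does to a piece of text, here is a rough Python sketch. It assumes the bullet arrives as the single character `\x95` and that lines end in `\r\n`, as in the entry; `end_bullets_with_period` is a made-up helper name, and Python's `re` is only standing in for TextAloud's engine.

```python
import re

# Same pattern as the word field: a bullet, then word characters and whitespace,
# then one or more CR/LF pairs. If your text uses another bullet (for example
# "\u2022"), the \x95 would need to change accordingly.
BULLET_RULE = re.compile(r"\x95([\w\s]+)((\r\n)+)")

def end_bullets_with_period(text: str) -> str:
    """Append a period to each bullet item so the items do not run together."""
    return BULLET_RULE.sub(r"\1.\2", text)

sample = "\x95 First item\r\n\x95 Second item\r\n"
result = end_bullets_with_period(sample)
print(repr(result))
```

Note that the bullet character itself is not captured, so it disappears from the replacement text; only the item text, the added period and the line breaks remain.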

The regex in the original post,

words like iPod, eBay:
{{re=\b([ei])([A-Z]\w+)\b}} / $1 $2

causes the word "each" to be spelled out.

I should have mentioned that you have to check the Case Sensitive box. That will fix the "each" problem.

Also, make sure that the expression is entered with capital "A-Z" as shown above.

What about sentences that start with the word "Each"? That's the one that hit me. At first I didn't realize it was one of the regular expressions causing the problem, and I tried to enter pronunciations specifically for the word, to no avail. Then, when I renamed the pronunciation file so it wasn't found, the problem disappeared, so I realized I'd just wasted my time trying to tell it how to pronounce the word and the real problem was one of the regexes!

Thanks,
Mark.

So the problem is with matching either eBay or EBay?

The simplest way to do it would be to use the Case Sensitive checkbox as Sean indicated, and add "E" and "I" (uppercase) to the expression:

{{re=\b([eiEI])([A-Z]\w+)\b}} / $1 $2

There is also a modifier you can embed in an expression to toggle the "ignore case" switch within the expression. (?i) sets ignore case, and (?-i) turns it off. Here's how you could use the modifier in this case:

{{re=\b(?i)([ei])(?-i)([A-Z]\w+)\b}} / $1 $2

For more on modifiers, see http://regular-expressions.org/refadv.html
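The "each" problem and the fix can be reproduced in Python, with one caveat: Python's `re` module does not accept the bare mid-pattern `(?i)` ... `(?-i)` toggles, so this sketch uses the scoped form `(?i:...)` (Python 3.6 or later) as the nearest equivalent. The variable names and the `respell` helper are invented for the illustration.

```python
import re

# The original entry, matched case-sensitively and case-insensitively.
case_sensitive = re.compile(r"\b([ei])([A-Z]\w+)\b")
ignore_case = re.compile(r"\b([ei])([A-Z]\w+)\b", re.IGNORECASE)

# Ignore case only for the leading e/i; the second letter must still be uppercase.
mixed = re.compile(r"\b((?i:[ei]))([A-Z]\w+)\b")

def respell(pattern: re.Pattern, text: str) -> str:
    """Insert a space between the leading letter and the rest of the word."""
    return pattern.sub(r"\1 \2", text)

print(respell(ignore_case, "each"))                 # e ach  (the reported problem)
print(respell(case_sensitive, "Each iPod on eBay")) # Each i Pod on e Bay
print(respell(mixed, "Each EBay iPod"))             # Each E Bay i Pod
```

With case ignored everywhere, `[A-Z]` also matches lowercase letters, which is why "each" gets split; keeping the second letter case-sensitive leaves "each" and "Each" alone.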

Hi, Jim. My goal in posting was merely to raise awareness that if a regular expression replacement isn't specific enough, it can cause more problems than it solves.

Thanks,
Mark.

### Using the | to remove pause

The pause between the numbers for this expression bugged me:

{{re=\b(16|17|18|19)([1-9]\d)\b}} / $1 $2
Example: 1642 / 16 42

I tried changing the pause using {{Pause=0.1}}, but this just adds to the initial pause instead of replacing it. I finally found that the pipe character "|" will slur the numbers the way I wanted:

{{re=\b(16|17|18|19)([1-9]\d)\b}} / $1|$2

I'm not sure the pipe is the correct character to use, but it seems to be a neutral character and has the effect I want.

Thanks for the tips,
Arlo

bump. This is an excellent RE post.

### Re: Helpful Regular Expressions

SFCurley wrote: The part before the "/" goes in the word field; the part after the "/" goes in the pronunciation field. The part in brackets is just a description.
dot com web addresses:
{{re=\b(\w*)\.?(\w+)\.(com|org|gov|us|cc|bus|net)}} / $1 dot $2 dot $3

Again, I just can't get the above to work despite following the initial explanation to the letter. What am I doing wrong? I tried checking and unchecking Case Sensitive, but it doesn't work either way.
Melker63


A few thoughts:

1. Can you copy exactly what you have in each field and paste it into a post?

2. Are any reg-exes working for you?

3. If the answer to question 2 is yes, but THIS particular regex is not working for you, it could be a conflict. If any other non-regex/non-mask pronunciation entry matches something in the web address, then this regex won't ever be evaluated, since an entry with higher precedence has already matched. Also, if any other regex with higher precedence matches, that would pre-empt this one, too.

I think the precedence of matching goes:

Pronunciation editor entries, then
Reg-exes (in order of regex string size, from largest to smallest).
SFCurley
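As a toy illustration of the precedence guessed at above (plain pronunciation entries first, then regexes from longest pattern string to shortest), here is a small Python sketch. It only models that guess; it is not TextAloud's actual matching code, and `guessed_order` is a made-up name.

```python
def guessed_order(plain_entries, regex_entries):
    """Return entries in the guessed evaluation order.

    Plain entries come first; regex entries follow, sorted so that the
    longest pattern string is tried first. This mirrors the poster's guess only.
    """
    longest_first = sorted(regex_entries, key=lambda entry: len(entry[0]), reverse=True)
    return list(plain_entries) + longest_first

# (word field, pronunciation field) pairs taken from this thread.
plain = [("&:", "{{Pause=0.8}}<s>")]
regexes = [
    (r"\be\.g\.", "for example"),
    (r"\b(\w*)\.?(\w+)\.(com|org|gov|us|cc|bus|net)", "$1 dot $2 dot $3"),
]

order = guessed_order(plain, regexes)
for word, pronunciation in order:
    print(word, "/", pronunciation)
```

If the guess is right, a plain entry that matches part of a web address would be applied before the longer URL regex ever gets a chance.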


The only other item I have in the pronunciation window is:
&: / {{Pause=0.8}}<s>

{{re=\b(\w*)\.?(\w+)\.(com|org|gov|us|cc|bus|net)}} / $1 dot $2 dot $3
Melker63

Melker63 - It might help if you post the following:

1. What version of TextAloud you're running
2. A small sample of text containing a URL that isn't being handled by the expression
3. How you're hearing TextAloud pronounce the URL.

Hopefully that will be enough to figure out what the problem is.

Jim Bretti
NextUp.com
Listen and Learn Anywhere
http://www.NextUp.com

Jim Bretti wrote: 1. What version of TextAloud you're running 2. A small sample of text containing a URL that isn't being handled by the expression 3. How you're hearing TextAloud pronounce the URL.

1: I have the latest version, v2.185, installed.

2: SFCurley's suggestion works nicely with the following address:
http://news.bbc.co.uk/2/hi/science/nature/3686106.stm
But not with these two:
http://www.911podcasts.com/default.php? ... pi=0&typ=0
http://www.dn.se/DNet/jsp/polopoly.jsp? ... nderType=6

3: In the two links above, TA spells out anything that isn't a word, letter by letter.
Melker63

This one should do it:

word: {{re=www\.([A-Za-z0-9-]+)\.(com|net|org|gov|biz|us|cc|se)([?&A-Za-z0-9/+=,._-]*)(#)*}}
Pronunciation: w w w dot $1 dot $2 <s>

What's the .se domain, by the way?
SFCurley

Sean, .se is the Internet country code top-level domain for Sweden.
D.Leikin

Duh! Should've probably known that. (I would've guessed that if it was .sw). Thanks.
SFCurley

DNS now is supporting short names. For example, to go to NextUp one might simply type “nextup” instead of “www.nextup.com” in the address string.
I just thought maybe there is no need for speaking www aloud.
D.Leikin

Here's an improvement over the regex discussed above:

word: {{re=(\w+)\.([A-Za-z0-9-]+)\.(com|net|org|gov|biz|us|cc)([?&A-Za-z0-9/+=,._-]*)(#)*}}
pronunciation: $1 dot $2 dot $3 <s>

This one accounts for cases where "www" is not the first part of the web address, for example:
http://money.cnn.com/magazines/fortune/ ... /index.htm
SFCurley
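Here is a quick way to try the improved web-address expression outside TextAloud, sketched in Python. It assumes Python's `re` matches this pattern the same way TextAloud does, leaves the trailing `<s>` tag out of the replacement, and uses shortened sample addresses chosen just for the illustration; `URL_RULE` and `speak_url` are made-up names.

```python
import re

# Same pattern as the word field above, split over two lines for readability.
URL_RULE = re.compile(
    r"(\w+)\.([A-Za-z0-9-]+)\.(com|net|org|gov|biz|us|cc)"
    r"([?&A-Za-z0-9/+=,._-]*)(#)*"
)

def speak_url(text: str) -> str:
    """Replace host.domain.tld (plus any trailing path) with a spoken form."""
    # The <s> tag from the pronunciation field is omitted in this sketch.
    return URL_RULE.sub(r"\1 dot \2 dot \3", text)

print(speak_url("www.nextup.com"))
print(speak_url("Visit http://money.cnn.com/magazines/ now"))
```

The path and query string are captured by the fourth group but not used in the replacement, so they are dropped rather than spelled out; the leading "http://" is not part of the match and stays in the text.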


