Repeating the same Regex, x number of times

Forum for TextAloud version 3

Moderator: Jim Bretti

PHenry1026
Posts: 231
Joined: Thu Jan 11, 2007 12:10 pm

Repeating the same Regex, x number of times

Post by PHenry1026 »

Greetings:

Is there currently a way to get the regex engine to repeat the same regex x number of times? If not, could this feature be developed for TA's Pronunciation Dictionary? There are a number of cases (usually involving lists of numbers) where the only way to get the regex to work as intended is for me to enter the same regex manually x number of times.

Example 1:

<<start of text>>

Wethered did not compete in any more ladies' amateur championships but she continued to make her annual appearance in the Worplesdon mixed foursomes, which she won in 1922, 1923, 1927, 1928, 1931, 1932, 1933, and 1936.

<<end of text>>

In the above example, I have to enter the same regex 6 times consecutively for the dates to be pronounced correctly.

Example 2:

<<start of text>>

States With The Highest Starbucks Density (People Per Store)

By Marnie Hanel

Washington

9,918:1

Oregon

11,824:1

Nevada

12,388:1

<<end of text>>

In example 2 above, I have to enter the same regex 3 times consecutively for the ratios to be pronounced correctly.




P.S. In example 2, the regex is anchored by "States With The Highest Starbucks Density" and in example 1, is anchored by "she won in".
Last edited by PHenry1026 on Sun May 12, 2013 6:26 am, edited 3 times in total.
Jim Bretti
Posts: 1558
Joined: Wed Oct 29, 2003 11:07 am

Re: Repeating the same Regex, x number of times

Post by Jim Bretti »

Hi Percy,

Would you post the regular expression and respell field for these dictionary entries so I can better understand?

Thanks.
Jim Bretti
NextUp.com
PHenry1026
Posts: 231
Joined: Thu Jan 11, 2007 12:10 pm

Re: Repeating the same Regex, x number of times

Post by PHenry1026 »

For Example 1:

(?#19[1-9][0-9], 19[1-9][0-9]_1)(?m)(?<=^|\s|['"‘“(]|\p{Pi}|\p{Ps}|\p{Pd})which s?he won in (?:[^.]*|[^.]*?)\K(1[9])([1-9][0-9],)(?!(?: and| or) (?:1[9][1-9][0-9],?|2[0-9][1-9][0-9],?))(?=[’”\p{Po}\p{Pe}\p{Pf}]{0,2}(?:\s|$))


"$1" "$2"


(?#19[1-9][0-9], 19[1-9][0-9]_2)(?m)(?<=^|\s|['"‘“(]|\p{Pi}|\p{Ps}|\p{Pd})which s?he won in (?:[^.]*|[^.]*?)\K(1[9])([1-9][0-9],)(?!(?: and| or) (?:1[9][1-9][0-9],?|2[0-9][1-9][0-9],?))(?=[’”\p{Po}\p{Pe}\p{Pf}]{0,2}(?:\s|$))

"$1" "$2"


(?#19[1-9][0-9], 19[1-9][0-9]_3)(?m)(?<=^|\s|['"‘“(]|\p{Pi}|\p{Ps}|\p{Pd})which s?he won in (?:[^.]*|[^.]*?)\K(1[9])([1-9][0-9],)(?!(?: and| or) (?:1[9][1-9][0-9],?|2[0-9][1-9][0-9],?))(?=[’”\p{Po}\p{Pe}\p{Pf}]{0,2}(?:\s|$))


"$1" "$2"


(?#19[1-9][0-9], 19[1-9][0-9]_4)(?m)(?<=^|\s|['"‘“(]|\p{Pi}|\p{Ps}|\p{Pd})which s?he won in (?:[^.]*|[^.]*?)\K(1[9])([1-9][0-9],)(?!(?: and| or) (?:1[9][1-9][0-9],?|2[0-9][1-9][0-9],?))(?=[’”\p{Po}\p{Pe}\p{Pf}]{0,2}(?:\s|$))

"$1" "$2"


(?#19[1-9][0-9], 19[1-9][0-9]_5)(?m)(?<=^|\s|['"‘“(]|\p{Pi}|\p{Ps}|\p{Pd})which s?he won in (?:[^.]*|[^.]*?)\K(1[9])([1-9][0-9],)(?!(?: and| or) (?:1[9][1-9][0-9],?|2[0-9][1-9][0-9],?))(?=[’”\p{Po}\p{Pe}\p{Pf}]{0,2}(?:\s|$))


"$1" "$2"


(?#19[1-9][0-9], 19[1-9][0-9]_6)(?m)(?<=^|\s|['"‘“(]|\p{Pi}|\p{Ps}|\p{Pd})which s?he won in (?:[^.]*|[^.]*?)\K(1[9])([1-9][0-9],)(?!(?: and| or) (?:1[9][1-9][0-9],?|2[0-9][1-9][0-9],?))(?=[’”\p{Po}\p{Pe}\p{Pf}]{0,2}(?:\s|$))


"$1" "$2"



(?#19[1-9][0-9] and)(?m)(?<= )(1[9])([1-9][0-9],?) and (1[9])([1-9][0-9])(?=\p{Pd}|[’”\p{Po}\p{Pe}\p{Pf}]{0,2}(?:\s|$))


"$1" $2 and "$3" "$4"



For Example 2:


(?#Ratio_\K_1)(?m)^\s*?States With The Highest Starbucks Density(?:.*|.*?)\r?\n\s*?\K([1-9][0-9]{0,}|[1-9][0-9]{0,2}\,[0-9]{3}):([1-9][0-9]{0,}|[1-9][0-9]{0,2}\,[0-9]{3})(?=\s*?$)


$1 to $2



(?#Ratio_\K_2)(?m)^\s*?States With The Highest Starbucks Density(?:.*|.*?)\r?\n\s*?\K([1-9][0-9]{0,}|[1-9][0-9]{0,2}\,[0-9]{3}):([1-9][0-9]{0,}|[1-9][0-9]{0,2}\,[0-9]{3})(?=\s*?$)


$1 to $2


(?#Ratio_\K_3)(?m)^\s*?States With The Highest Starbucks Density(?:.*|.*?)\r?\n\s*?\K([1-9][0-9]{0,}|[1-9][0-9]{0,2}\,[0-9]{3}):([1-9][0-9]{0,}|[1-9][0-9]{0,2}\,[0-9]{3})(?=\s*?$)


$1 to $2
Last edited by PHenry1026 on Sun May 12, 2013 6:28 am, edited 3 times in total.
Jim Bretti
Posts: 1558
Joined: Wed Oct 29, 2003 11:07 am

Re: Repeating the same Regex, x number of times

Post by Jim Bretti »

Let's concentrate on the first example for now, the one that handles dates. As I understand it, one expression is used multiple times to adjust the pronunciation of 4-digit year numbers under these conditions:
1. Text must be anchored with either "which she won in" or "which he won in"
2. Year numbers have to be followed by commas
3. The expression includes a negative lookahead that prevents a year number match if either "and" or "or" follows the year number.

You have a separate expression set up specifically to handle (2) and (3).

I guess the first question is: does the year number correction need to be this restrictive? Could you replace it with something more general, like:

(?<=\b)(19)(\d\d)(?=\b)

and respell as

$1 $2

Using something more general like this would solve the problem with repeating, and should also be more efficient. On the expression you're needing to repeat 6 times, take a look at this part of the expression: (?:[^.]*|[^.]*?). The greedy match there *might* be causing trouble, I'm not sure. It may be causing the expression to find the last year that matches on the first try (1932), and the repeats are required to back up. Just a guess, could be wrong about that. It is possible this may help with handling your repeat problem, but you still would need the second expression for the other case.
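To illustrate why an anchored entry replaces only one year per pass, here is a minimal Python sketch. It is not how TextAloud's dictionary engine works. The pattern is a simplified stand-in for the real entry: Python's `re` module has no `\K`, so the anchor is captured and written back instead. The names `ANCHORED_YEAR`, `RESPELL`, and `apply_until_stable` are made up for this example, and the loop only models what a repeat count might look like.

```python
import re

# Simplified stand-in for the anchored dictionary entry (an assumption, not
# the original expression). The anchor is captured in group 1 and written
# back, because Python's re module does not support \K.
ANCHORED_YEAR = re.compile(
    r'(which s?he won in [^.]*?)'   # anchor plus lazy filler up to the year
    r'(?<!\d)(19)([1-9][0-9],)'     # year split into two halves, comma kept
    r'(?! (?:and|or) )'             # skip a year followed by "and" / "or"
)
RESPELL = r'\1"\2" "\3"'


def apply_until_stable(text, max_passes=20):
    """Re-apply the entry until it stops matching (hypothetical repeat feature)."""
    passes = 0
    while passes < max_passes:
        # Each pass consumes the anchor, so only one year can match per pass.
        text, count = ANCHORED_YEAR.subn(RESPELL, text)
        if count == 0:
            break
        passes += 1
    return text, passes


SAMPLE = ("the Worplesdon mixed foursomes, which she won in "
          "1922, 1923, 1927, 1928, 1931, 1932, 1933, and 1936.")

result, passes = apply_until_stable(SAMPLE)
print(passes)
print(result)
```

On the sample sentence the loop needs six passes, one per year that is followed by a comma and not by "and". Each match consumes the anchor text, so the next year can only be reached by starting over from the anchor.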


Basically same question / idea on second example with ratios ... could you use a general expression like:

(?<=\b)(\d+,)*(\d+):(\d)(?=\b)

This would find strings like 23:1, 1,000:1, etc.
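As a rough check of what that general ratio expression picks up, here is a small Python sketch. It assumes Python's `re` module behaves like the dictionary's regex engine for this pattern, which may not hold exactly. `(?<=\b)` and `(?=\b)` are written as a plain `\b`, which is equivalent, and `find_ratios` is a name made up for the example.

```python
import re

# General ratio expression; \b stands in for (?<=\b) and (?=\b).
GENERAL_RATIO = re.compile(r'\b(\d+,)*(\d+):(\d)\b')


def find_ratios(text):
    """Return every substring the general ratio expression matches."""
    return [m.group(0) for m in GENERAL_RATIO.finditer(text)]


matches = find_ratios("23:1 1,000:1 1:10 9,918:1")
print(matches)  # ['23:1', '1,000:1', '9,918:1']
```

Because the right-hand side is a single digit followed by a word boundary, 1:10 is skipped, and so are clock times such as 12:30. All matches are found in one pass, with no repeated entries needed.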

I need to think some more about having repeat counts for dictionary entries, but I'd look at more general expressions if at all possible.
Jim Bretti
NextUp.com
PHenry1026
Posts: 231
Joined: Thu Jan 11, 2007 12:10 pm

Re: Repeating the same Regex, x number of times

Post by PHenry1026 »

Hi Jim,

Thanks for your suggestions, but the problem with using these simpler matches is that they create a lot of unintended matches. I am still amazed that you favor \b, which can be very unpredictable in its matching, especially when numbers are involved. I have checked the regex with both the greedy and the non-greedy quantifier, and that is not the problem. One causes the first ratio to match and the other causes the last ratio to match. (It does not matter which order they are in; the second ratio won't match. The same analysis applies to the dates.)

I am still not sure from your reply whether you are thinking about allowing some kind of looping (repeating the same regex x number of times). Otherwise, I believe your analysis is right on target.



P.S. I noticed that you were careful in your ratio example not to match 1:10, as this would destroy all date and time functions; but 1:10 is a legitimate ratio. Also, ratios like 0:0 make no sense. And (?<=\b)(19)(\d\d)(?=\b) would not pronounce dates like 1800, 1900, and 1905 correctly. If you are not going to allow some kind of looping, please leave things as they are; I don't mind the pain of entering the regex manually.
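A small Python sketch of the year objection, assuming Python's `re` module matches the way the dictionary engine does for this pattern and using the suggested respelling `$1 $2` (written `\1 \2` in Python). `respell_years` is a name made up for the example.

```python
import re

# The general year expression, with \b in place of (?<=\b) and (?=\b).
GENERAL_YEAR = re.compile(r'\b(19)(\d\d)\b')


def respell_years(text):
    """Split every 19xx year into two space-separated halves."""
    return GENERAL_YEAR.sub(r'\1 \2', text)


respelled = respell_years("1800, 1900, and 1905")
print(respelled)  # 1800, 19 00, and 19 05
```

1800 is left untouched because the pattern only covers 19xx, while 1900 and 1905 are split into "19 00" and "19 05", which a voice would presumably not read as "nineteen hundred" and "nineteen oh five".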