Hi:
I did see this on the TextAloud 3 forum:
([a-z][.?!])(\d+)([,-]\d+)*
This works OK, except that the text TextAloud generates from epub files sometimes has a quotation mark (") immediately preceding the superscript. I'd imagine this regular expression can be modified to handle this case as well.
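For what it's worth, here is a rough Python sketch of how the pattern might be extended to cover the quotation-mark case. It assumes the replacement keeps only the first group (the thread doesn't show the original replacement string), and the function name `strip_footnote_numbers` is just made up for illustration. The only change to the pattern is an optional straight or curly closing quote after the punctuation.

```python
import re

# Pattern from the forum, plus an optional closing quote (straight " or
# curly U+201D) between the sentence punctuation and the footnote digits.
FOOTNOTE = re.compile(r'([a-z][.?!]["\u201d]?)(\d+)([,-]\d+)*')

def strip_footnote_numbers(text: str) -> str:
    # Assumption: keep the letter, punctuation, and optional quote (group 1)
    # and drop the digits that follow.
    return FOOTNOTE.sub(r'\1', text)

print(strip_footnote_numbers('He said "it works."12 Then he left.3,4 Done.'))
# -> He said "it works." Then he left. Done.
```

Like the original, this can also hit text that isn't a footnote (for example "no.5"), so it is probably worth testing on a sample chapter first.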
However, what is generating the textual output that TextAloud uses? I'd think much of the epub file can be extracted as HTML. If so, is it possible to remove the superscripts at that stage, when they're easy to identify clearly and accurately?
Thanks,
Paul
Removing superscripts from documents?
Moderator: Jim Bretti
-
- Posts: 1558
- Joined: Wed Oct 29, 2003 11:07 am
- Contact:
Re: Removing superscripts from documents?
Hi Paul,
I'll take a look at this. It seems like we should be able to strip superscripts when reading the source HTML.
Jim Bretti
NextUp.com
Re: Removing superscripts from documents?
Hi Jim:
At least for epubs, after you extract the epub archive, it seems you can run a replacement regex like this over the HTML files in the archive:
Search: <sup[^>]*>([^<]*)</sup>
Replacement: [\1]
It might need to be modified depending on the flavor of regex engine, but it seems to work: it replaces each HTML superscript tag with just the superscript number enclosed in square brackets. Then you can simply use a Text Filter to 'Filter text in square brackets'.
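As a rough illustration, the same search and replace could look like this in Python, whose `re` module accepts `\1` in the replacement. The function name `bracket_superscripts` and the sample HTML are made up for the example; this is a sketch of the idea, not what TextAloud does internally.

```python
import re

# Same search pattern as above; IGNORECASE is an added assumption so that
# <SUP> tags are caught too.
SUP = re.compile(r'<sup[^>]*>([^<]*)</sup>', re.IGNORECASE)

def bracket_superscripts(html: str) -> str:
    # Replace each <sup ...>text</sup> with [text].
    return SUP.sub(r'[\1]', html)

sample = '<p>It works.<sup class="fn">12</sup> Next.</p>'
print(bracket_superscripts(sample))
# -> <p>It works.[12] Next.</p>
```

One caveat: because the capture is `[^<]*`, a superscript that wraps another tag, such as a footnote link `<sup><a href="#n1">1</a></sup>`, is left unchanged, so the pattern may need adjusting for books marked up that way.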
Paul
Re: Removing superscripts from documents?
Hi Paul,
Thanks for the tip, that helps!
Jim Bretti
NextUp.com