regex help

Forum for TextAloud version 3

Moderator: Jim Bretti

Postby JamesW » Thu Aug 05, 2010 4:05 pm

Hello,

Since I'm terrible at regex, I'm asking here.

Would it be possible to write a regex that turns text like
"T H E P R O C E S S A N D T H E P O W E R" into "THE PROCESS AND THE POWER", so the speech engine speaks it as words and not as letters?

I have a textbook with chapter names like this.

Since TextAloud doesn't do PDFs, I had to convert it to text, and that's how it got this way :(
JamesW
 
Posts: 29
Joined: Sat Jul 11, 2009 5:34 pm

Re: regex help

Postby Jim Bretti » Fri Aug 06, 2010 10:13 am

Hi James,

On the PDF support, TextAloud does support opening pdf documents using File -> Open. There are some cases where we can't handle the pdf, mainly if the document is copy protected, or if the document was created from a scanner. So you might try opening the document with TextAloud File -> Open, and see if the text conversion is any different. Let me know if you have any problems importing the document.

If you end up needing some way to strip spaces like you're asking, I'll look into it, and maybe someone else will have some ideas. One problem will be figuring out which spaces should *not* be removed, so we don't end up with 'THEPROCESSANDTHEPOWER'.
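As a rough illustration of the space-stripping idea and the problem with it, here is a minimal Python sketch. It is not a TextAloud feature, and all names in it (`collapse_spaced_letters`, `segment`, the word list) are hypothetical. It assumes that word boundaries, if they survived the conversion at all, show up as gaps of two or more spaces. When every gap is a single space, as in the sample above, the letters fuse into one string, and a word list is needed to split them again.

```python
import re

# A run of at least three single capital letters separated by spaces,
# e.g. "T H E" or "T H E  P O W E R". Shorter runs such as "A B" are left alone.
SPACED_RUN = re.compile(r"\b[A-Z](?: +[A-Z]\b){2,}")


def collapse_spaced_letters(text):
    """Join letter-spaced runs into words.

    Assumption: a gap of two or more spaces marks a word boundary.
    If all gaps are single spaces, the whole run is fused together.
    """
    def join(match):
        chunks = re.split(r" {2,}", match.group(0))
        return " ".join(chunk.replace(" ", "") for chunk in chunks)

    return SPACED_RUN.sub(join, text)


def segment(fused, words):
    """Split a fused string into words from `words`, or return None.

    Simple dynamic programming: best[i] holds one way to split fused[:i].
    With a large word list the split may be ambiguous; this returns the
    first one it finds.
    """
    best = [None] * (len(fused) + 1)
    best[0] = []
    for end in range(1, len(fused) + 1):
        for start in range(end):
            if best[start] is not None and fused[start:end] in words:
                best[end] = best[start] + [fused[start:end]]
                break
    return best[len(fused)]


# Hypothetical word list, just large enough for the sample heading.
WORDS = {"THE", "PROCESS", "AND", "POWER"}

fused = collapse_spaced_letters("T H E P R O C E S S A N D T H E P O W E R")
parts = segment(fused, WORDS)
print(fused)
print(" ".join(parts) if parts else fused)
```

The regex alone is enough only when the wider gaps are present in the converted text. Otherwise the second step depends entirely on how good the word list is.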
Jim Bretti
NextUp.com
Listen and Learn Anywhere
http://www.NextUp.com
Jim Bretti
 
Posts: 1223
Joined: Wed Oct 29, 2003 11:07 am

Re: regex help

Postby JamesW » Fri Aug 06, 2010 12:42 pm

Oh, I didn't know it could read PDFs; I assumed it couldn't based on forum threads :oops:

TextAloud's PDF-to-text utility worked surprisingly well, better than Foxit Reader's, which was the tool I used before.
It's not perfect though: I see squares before the chapter names, and some text gets moved two lines down, but that's not a problem :)

Ex 1. "SECTION 5" becomes
"SECTION

5"
And here's a square sample, copied and pasted: " CLUE". When I use Notepad2, it shows the squares as "FF", as a single "letter".
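A possible cleanup pass for these two glitches, as a small Python sketch. It rests on an assumption the thread does not confirm: that the squares are form feed characters (U+000C), which converters often emit at page breaks and which some editors display as "FF". The function name and the choice to handle only the "SECTION" keyword are hypothetical.

```python
import re


def clean_converted_text(text):
    """Tidy two artifacts seen in the converted text.

    Assumption: the squares are form feed characters (U+000C).
    """
    # Drop form feed characters.
    text = text.replace("\f", "")
    # Rejoin a heading whose number was pushed two lines down,
    # e.g. "SECTION", blank line, "5" becomes "SECTION 5".
    text = re.sub(r"\b(SECTION)[ \t]*\r?\n[ \t]*\r?\n[ \t]*(\d+)\b", r"\1 \2", text)
    return text


print(clean_converted_text("\fCLUE"))
print(clean_converted_text("SECTION\n\n5\nSome text"))
```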

Update
I noticed another issue with your PDF-to-text converter: it mangles lists. Example:
"• one, blaha
• two, bla bla"
becomes (using TextAloud's PDF-to-text utility):
"one, blaha two, bla bla"

and that gets annoying after a while.

But thanks for the help

One question though: what can't TextAloud's PDF-to-text utility do? Are there any small bugs in it, like missing things (common for many PDF-to-text tools)?
I ask so I know what to watch out for.
JamesW
 
Posts: 29
Joined: Sat Jul 11, 2009 5:34 pm

Re: regex help

Postby Jim Bretti » Sat Aug 07, 2010 11:46 am

I don't know of any way to describe what kinds of problems you'll have with the TextAloud pdf to text conversion. It depends a lot on the document and how it was created. About the only thing you can do is try other pdf conversion utilities if you find the TextAloud conversion isn't working well.
Jim Bretti
 
Posts: 1223
Joined: Wed Oct 29, 2003 11:07 am

