Pauses when reading text copied from PDF

Forum for TextAloud version 4

Moderator: Jim Bretti

codeowl
Posts: 7
Joined: Sat Nov 09, 2019 2:15 am

Pauses when reading text copied from PDF

Post by codeowl »

Hi there,

I am currently trialing TextAloud 4, and it is looking great. I have used another TTS product for years that has some annoying issues, so I thought it was time to find a better solution. One of those annoying problems is missing pauses in text copied from PDFs, Word docs or web pages, specifically around dot points and headings that don't end in full stops. My previous TTS app and TextAloud 4 both read these as one big long sentence, which makes it hard to follow.

Now I see that TextAloud 4 has some great options to assist with this that my previous TTS app did not:
  • Settings
    • Speaking Rules
      • Automatic Pauses
        • Pause between sentences
        • Pause at 1 or more new line characters
        • Pause at 2 or more new line characters
      • Other Rules
        • Insert periods at newline characters for lines that do not end with punctuation
However, I have tried all different combinations of these and none seem to produce the desired result.
Consider the following text:
[Image: the passage as it appears in the PDF]

When I copy this to the clipboard and read it directly from there with TextAloud 4, with none of the options turned on, it does not pause after the heading, or after the end of each dot point.
There is no combination of options I can find that fixes this without breaking something else. For example, if I turn on the option "Insert periods at newline characters for lines that do not end with punctuation", this works perfectly for headings and dot points; they read beautifully with this option on. However, it breaks all the other sentences, because the PDF has a new line at the end of each line. For example, if I copy that text out of the PDF and into NotePad++ it looks like this:
Pre-Market Gappers
Experienced traders are sensitive to being in the right stocks at the
right time. As I mentioned, traders are only as good as the stocks they
trade. I and the traders in our community use a scanner every morning
that is programmed to find Stocks in Play based on the following
criteria:
Stocks that in the pre-market gapped up or down at least 2%
Stocks that have traded at least 50,000 shares in the pre-market
Stocks that have an average daily volume of over 500,000
shares
Stocks that have Average True Range of at least 50 cents (how
large of a range a stock has on average every day)
There is a fundamental catalyst for the stock
As a rule, I do not trade stocks with an enormous short interest
higher than 30% (the short interest is the quantity of stock
shares that investors or traders have sold short but not yet
covered or closed out)
Why these criteria?
And so TextAloud 4 pauses at the end of each line, which is mid sentence on most lines.
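
To make the failure concrete, here is a minimal Python sketch of what a rule like "Insert periods at newline characters for lines that do not end with punctuation" would do to hard-wrapped PDF text. The function name and the punctuation set are assumptions made for illustration; TextAloud's actual implementation is not shown in this thread.

```python
# Hypothetical stand-in for the "insert periods at newline" rule.
# Assumption: the characters below count as "punctuation"; the thread
# suggests ')' and ':' are treated that way, but the real set is unknown.
PUNCTUATION = ".,;:!?)"

def insert_periods_at_newlines(text: str) -> str:
    """Append a period to every non-empty line that does not end with punctuation."""
    out = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped and stripped[-1] not in PUNCTUATION:
            stripped += "."
        out.append(stripped)
    return "\n".join(out)

# First lines of the passage as pasted from the PDF (hard line breaks included).
pdf_text = (
    "Pre-Market Gappers\n"
    "Experienced traders are sensitive to being in the right stocks at the\n"
    "right time. As I mentioned, traders are only as good as the stocks they\n"
    "trade.\n"
)

for line in insert_periods_at_newlines(pdf_text).splitlines():
    print(line)
```

The heading gets the period it needs, but so does every wrapped line ("...right stocks at the."), which is where the mid-sentence pauses come from.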

This is really frustrating, I feel like I am so close to a solution with TextAloud 4, but so far it has eluded me. Is there a way to solve this problem so that headings and dot points that do not end in full stops as well as normal sentences that span over multiple lines and include new line characters mid sentence, can all be read the way a human would read with the correct pauses?
E.g., pause after headings and dot points with no full stops, and pause after full stops.

Thank you for your time.

UPDATE:
I have found that if I copy the text from the ebook's original AZW3 format, it does not have a line break at the end of each line, and then the setting "Insert periods at newline characters for lines that do not end with punctuation" works perfectly!!.... except when the dot point ends in brackets. For example:
  • Here is the first (dot point)
  • Here is the second (dot point)
Here is some additional text.
All the text in the above quote gets read as a single line. When I copy it to the clipboard and paste into NotePad++ it looks like this:
Here is the first (dot point)
Here is the second (dot point)
Here is some additional text.
Also I have found that with numerical dot points, the numbers do not get read. I am not sure if it is possible to fix this as when I copy the numbered dot points from the eBook reader "Calibre", and paste into NotePad++, it strips out the numbers and just leaves them in a format like the above with each one separated by a line break.

Maybe the solution here is to get a better eBook viewing program, so that it copies all the data required (like the numbers in numbered lists). Do you have any suggestions for this?

Thanks again for your time.
Jim Bretti
Posts: 1558
Joined: Wed Oct 29, 2003 11:07 am

Re: Pauses when reading text copied from PDF

Post by Jim Bretti »

If I understand correctly, one problem you're having is that the rule to insert periods is not handling lines that end with a closing parenthesis character. I'll look at fixing the rule, but in the meantime you could add a TextAloud pronunciation dictionary entry to handle this case. From the TextAloud main menu click Control Center -> Pronunciation Dictionary Maintenance. Select a dictionary in the left panel (or create a new dictionary). Then click the New Entry button and create a dictionary entry that looks like this:

Text Matching: Regular Expression
(?<=\))(?=\R)

Pronounce Using: Respell
.

So copy the regular expression above into the regular expression field in the New Dictionary Entry window. Set the Pronounce Using dropdown to "Respell", and place a period in the respell field. The expression will insert a period after any closing parenthesis that is followed by a newline.
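
For anyone who wants to try the expression outside TextAloud, here is a rough Python sketch of the same idea. Python's re module does not support \R, so \r?\n is used in its place (an assumption that the text has LF or CRLF line endings), and the function name is made up for this example.

```python
import re

# (?<=\)) : the position must be preceded by a closing parenthesis
# (?=\r?\n): and followed by a line break (stand-in for \R, which Python lacks)
CLOSE_PAREN_BEFORE_NEWLINE = re.compile(r"(?<=\))(?=\r?\n)")

def add_period_after_paren(text: str) -> str:
    """Insert a period at each zero-width match, like a Respell of '.'."""
    return CLOSE_PAREN_BEFORE_NEWLINE.sub(".", text)

sample = (
    "Here is the first (dot point)\n"
    "Here is the second (dot point)\n"
    "Here is some additional text."
)
result = add_period_after_paren(sample)
print(result)
```

Both lookarounds are zero-width, so nothing is replaced; the period is simply inserted between the parenthesis and the line break. A parenthesis in the middle of a line is left alone.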

Does that help?
Jim Bretti
NextUp.com
codeowl
Posts: 7
Joined: Sat Nov 09, 2019 2:15 am

Re: Pauses when reading text copied from PDF

Post by codeowl »

Jim,

Thanks for the response. Great to see the product is actively supported and developed!!
I will give the workaround a go, thanks. Another issue I just identified is when a line before dot points ends in a colon.
E.g., it looks like this in the book:
Some of these steps you should do before and after each and every single trade you make:
1. Education and simulated trading
2. Preparation
But when I copy it to NotePad++ and view it there, it looks like this:
Some of these steps you should do before and after each and every single trade you make:
Education and simulated trading
Preparation
And even though I have the setting "Insert periods at newline characters for lines that do not end with punctuation" turned on, it reads the first and second lines without a pause, but does pause between the second and third lines as expected.

I tried using the setting "Pause after specified characters", but this had the unfortunate effect of pausing when reading the time.
E.g., there were pauses in all the times in this text:
I would say the first one hour that the market is open (9:30 to 10:30 a.m. New York time) is the absolute minimum time you should be available for trading and practice, in addition to any time you need for preparation before the market opens at 9:30 a.m. New York time. Sometimes I am done with trading and hit my daily goal by 9:45 a.m., but sometimes I need to watch the market longer to find trading opportunities.
It is fine in most cases to pause after a colon, but not in the middle of reading the time.
Is there a way I can get TextAloud to pause after a colon in all instances except when that colon is used in a time? Or only when the colon is at the end of a line with no punctuation?
Jim Bretti
Posts: 1558
Joined: Wed Oct 29, 2003 11:07 am

Re: Pauses when reading text copied from PDF

Post by Jim Bretti »

I'm not sure how the voice you're using would handle a colon followed by a period, but here are two options.

First, here is how you can force a period to follow the colon at the end of a line.

To do this, modify the regular expression in my earlier post to look like this:

Text Matching: Regular Expression
(?<=[\):])(?=\R)

Pronounce Using: Respell
.

This works the same as the expression in the first post, but now we're looking for a closing parenthesis *or* a colon at the end of a line, and inserting a period.


That might work, but it may be better to add a pause in this case. The regular expression is very similar:

Text Matching: Regular Expression
(?<=[:])(?=\R)

Pronounce Using: Respell
{{Pause=0.5}}

This inserts a 1/2 second pause at colons followed by a newline character.

I left the colon character inside square brackets, so if you need to add other characters later, you can include them inside the square brackets. To add a semicolon, you'd change the expression to:

Text Matching: Regular Expression
(?<=[:;])(?=\R)

Pronounce Using: Respell
{{Pause=0.5}}
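
As a rough illustration of why this handles end-of-line colons without touching times, here is a Python sketch of the same pattern. The helper name is hypothetical, \r?\n again stands in for \R (which Python's re module lacks), and the {{Pause=0.5}} tag is just inserted as literal text here; only TextAloud gives it meaning.

```python
import re

def add_pauses(text: str, chars: str = ":", seconds: float = 0.5) -> str:
    """Insert a pause tag after any of `chars` that sits directly before a line break."""
    # Build the character class from `chars`, e.g. ":" or ":;".
    pattern = re.compile("(?<=[" + re.escape(chars) + r"])(?=\r?\n)")
    return pattern.sub("{{Pause=%s}}" % seconds, text)

sample = (
    "Some of these steps you should do before and after each trade you make:\n"
    "Education and simulated trading\n"
    "The market opens at 9:30 a.m. New York time.\n"
)
print(add_pauses(sample))
print(add_pauses("first item;\nsecond item\n", chars=":;"))
```

The colon in "9:30" is followed by a digit rather than a line break, so the lookahead fails there and the time is read without an added pause.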

Does that help?
Jim Bretti
NextUp.com
codeowl
Posts: 7
Joined: Sat Nov 09, 2019 2:15 am

Re: Pauses when reading text copied from PDF

Post by codeowl »

Jim,

Thanks for the great response. TextAloud is so much more advanced than the previous TTS app I had been using, I am just loving its configurability!!
That regular expression totally fixed the problem.

I have one more I could use some help with. Now that the pauses are gone from the colons used in times such as "06:30 pm", I find it pronounces this as "06 to 30 pm". I looked into the help and found I could use a mask such as ##:## to identify this pattern, and I want to swap it out with ## ##. I tried using $1 $2 as the Respell, but it just pronounces "one dollar two dollars pm". What can I put in the Respell field to get it to say the input with a space instead of the colon?

Also, I found a small bug (I am a programmer by trade :). I am using the dark theme, and if I open the help from the Edit Entry dialogue in the Dictionary, click on some different items in the help menu and scroll around the help, the program window's title bar and borders turn white, and you can no longer see the control box. This is fixed after a restart. I am using Windows 10 Pro 64-bit.
I replicated it a second time following the above steps.
Jim Bretti
Posts: 1558
Joined: Wed Oct 29, 2003 11:07 am

Re: Pauses when reading text copied from PDF

Post by Jim Bretti »

I think the problem with the mask is that you're missing parentheses. When you use $1 and $2 in the respell field, they correspond to values matched in the expression in parentheses pairs. If you change your mask from ##:## to (##):(##) I think it should work.
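
The same capture-group idea can be sketched in Python. This assumes that '#' in a TextAloud mask matches a single digit, so (##):(##) is modelled as (\d\d):(\d\d), and Python's \1 and \2 play the role of $1 and $2. The function name is made up for the example.

```python
import re

# Assumption: each '#' in the mask stands for one digit.
# The parentheses create groups 1 and 2 (hours and minutes).
TIME_MASK = re.compile(r"(\d\d):(\d\d)")

def respell_time(text: str) -> str:
    """Replace the colon in a two-digit:two-digit time with a space."""
    # \1 and \2 refer to the two parenthesised groups, like $1 and $2.
    return TIME_MASK.sub(r"\1 \2", text)

print(respell_time("06:30 pm"))
```

Without the parentheses there are no groups for the replacement to refer to, which matches the behaviour described above where $1 $2 was read out literally.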

I'll check on the problem you found with displaying help with the theme enabled.
Jim Bretti
NextUp.com
codeowl
Posts: 7
Joined: Sat Nov 09, 2019 2:15 am

Re: Pauses when reading text copied from PDF

Post by codeowl »

Jim,

That is perfect mate!! Really great functionality!!

I went to buy TextAloud 4 and did a search of my emails first, and it turns out I actually have a version 3 license. I remembered buying it in the past (back in 2013) as I got a voice that wouldn't work with the product I had. But then I found a way around that and just continued using my old TTS app.
Clearly TextAloud 4 is now a superior product to the one I was using; I am kicking myself for putting up with the old one for so long!!

Anyway, when I went to Upgrade my version 3 license the page had a field to use an Upgrade Coupon. How do I get one of these?

Regards
Jim Bretti
Posts: 1558
Joined: Wed Oct 29, 2003 11:07 am

Re: Pauses when reading text copied from PDF

Post by Jim Bretti »

Glad to hear that helped.

If you need help with purchasing or upgrade coupons please contact our support desk at support@nextup.com
Jim Bretti
NextUp.com
codeowl
Posts: 7
Joined: Sat Nov 09, 2019 2:15 am

Re: Pauses when reading text copied from PDF

Post by codeowl »

Jim, thanks for all your help here. A Grade Support mate!!
I will contact support and get a version 4 license sorted ;-)

Thanks again ;-)
kashiff
Posts: 1
Joined: Mon May 01, 2023 4:44 am

Re: Pauses when reading text copied from PDF

Post by kashiff »

When reading text that has been copied from a PDF document, it is important to be aware of potential formatting issues that may affect the readability and accuracy of the content. PDF documents often contain text that is laid out in a specific way, which can result in the text being copied in a jumbled or disorganized manner. In addition, some PDFs may use fonts or formatting that are not compatible with the software used to view the copied text.

To ensure the best possible reading experience, it is recommended to check the copied text for formatting errors and correct them as necessary. This may involve adjusting the spacing, font size, or other formatting elements to make the text more readable and consistent. It is also a good idea to compare the copied text against the original PDF to verify its accuracy and completeness.