PDF to text converting method. W/ free/open-source software?

Postby talker » Thu Sep 10, 2009 8:48 am

I have ebooks in PDF format that I would like to convert to text that is pleasant to listen to in TextAloud.

Let's say a PDF file has the book title in the header. That is easy to remove: I just search for that character string and delete every occurrence from the text. But how do I remove page numbers? That is tougher, because the number is different on each page.

I also want each chapter of the book in a separate text file. Have you found a way to automate splitting the book by chapters, or do you do it manually?
talker
 
Posts: 1
Joined: Wed Sep 02, 2009 2:37 pm

Re: PDF to text converting method. W/ free/open-source software?

Postby D.Leikin » Sat Sep 12, 2009 4:04 pm

Hi,

1. Here's what Jim recommends to suppress page numbers:
(below is the entry that should be added to Pronunciation Editor [Options->Pronunciation Editor])

Word: {{re=(\r\n\s*)p\.\s*\d+\s*(\r\n)}}
Pronunciation: <s>

Please see Jim's original post for details:

viewtopic.php?f=11&t=3269

2. There IS an automation to split books by chapters.

Go to File->File Splitter Utility. Click the "Browse" button next to "Input File Path" and select the file you want to split, then specify the "Output Directory" and the "Base Output Filename". Choose the appropriate "Split Method" (to split by chapters, select "By Specified String in Input File"), type the string you need into the box named "Comma delimited list of strings" (e.g., CHAPTER if each chapter starts with the word "CHAPTER"), and press the "Split" button.

The file will be split into chapters, and the resulting files (each containing only one chapter) will be saved in the Output Directory.

Feel free to experiment with the file splitter; the original file is always left intact.
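
For anyone who would rather script these two steps outside TextAloud, here is a rough Python sketch of the same ideas. It is not what TextAloud does internally. The function names are made up for illustration, and it assumes that each page number sits alone on its own line (optionally prefixed with "p." or "Page") and that every chapter starts on a line beginning with a fixed marker such as CHAPTER.

```python
import re

# Assumption: a page number sits alone on its own line, optionally
# prefixed with "p." or "Page" (e.g. "12", "p. 12", "Page 12").
PAGE_LINE = re.compile(r"^[ \t]*(?:p\.|page)?[ \t]*\d+[ \t]*$", re.IGNORECASE)


def strip_page_numbers(text):
    """Drop every line that holds nothing but a page number."""
    kept = [line for line in text.splitlines() if not PAGE_LINE.match(line)]
    return "\n".join(kept)


def split_by_marker(text, marker="CHAPTER"):
    """Split text into chunks, each starting at a line that begins with marker.

    Any text before the first marker (title page, preface) becomes the
    first chunk.
    """
    chunks, current = [], []
    for line in text.splitlines():
        # A new marker line closes the chunk collected so far.
        if line.lstrip().startswith(marker) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Run strip_page_numbers first and then pass the result to split_by_marker; each returned chunk can be saved to its own text file. Note that a line containing only a number that is real content (a year on its own line, say) would also be dropped, so check the output on a sample before converting a whole book.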
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

Re: PDF to text converting method. W/ free/open-source software?

Postby jimfaster » Mon May 23, 2011 3:07 am

Try any online converter.
jimfaster
 
Posts: 4
Joined: Wed Mar 23, 2011 2:15 am

Re: PDF to text converting method. W/ free/open-source software?

Postby Braddy » Wed Oct 03, 2012 12:24 am

PDF2Text Pilot 3.0.1 is a freeware, open-source product. You can use the code as an example of solving a text extraction task in your software program.

The text extraction itself is handled by the PDF Creator Pilot library. Note that PDF Creator Pilot is a commercial component, so the text extractor code is available for free except for the code of that PDF library.
Braddy
 
Posts: 1
Joined: Tue Oct 02, 2012 6:55 am

