Finereader 7.0 > 8.0 worth it?

Postby feddup » Wed Mar 08, 2006 6:20 pm

For about two years I've made MP3 books from purely technical, computer-related books. I'm completely devoted to TextAloud as well as FineReader. I presently own FineReader 7.0, which works but has issues with tables, equations, screenshots and programming code. I spend long hours editing the output to make it usable. It was in these forums that I heard the new 8.0 version was much better at recognizing technical material. I knew of the new version but thought "no way I'm paying the $180 upgrade price." Is it worth it? Technical information is difficult for OCR software, and I thought that was a given and would not change. Any guidance would be appreciated. Is it worth $180?
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Postby D.Leikin » Thu Mar 09, 2006 4:15 am

Hi,

I think they provide trial versions which you can download and test on your own. (At least that was the case some time ago.) I guess that's the best way to decide whether you need it.
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

Postby Bunger Henry » Thu Mar 09, 2006 8:15 am

I'm interested in this question, too. I use 7.0 to scan paperback novels. I notice that wherever there is a long dash character, the letters before and after the dash get highlighted as uncertain characters. I can't find a way to make this stop happening. It's time-consuming, because I spell check before I use TextAloud, and the spell checker stops at all the long dashes. I wonder if they've fixed this problem with 8.0.
Bunger Henry
 
Posts: 149
Joined: Thu Apr 15, 2004 8:17 pm

Postby SFCurley » Thu Mar 09, 2006 12:40 pm

I upgraded to 8.0 a couple of months ago. I'm very happy. That said, some of the features I'm now using in 8.0, like repetitive cropping, may have been in 7.0 and I just didn't know, but overall I'm very happy w/ 8.0.
SFCurley
 
Posts: 361
Joined: Wed Dec 10, 2003 1:12 pm

Trial?

Postby feddup » Thu Mar 09, 2006 5:58 pm

I hadn't even considered trialing 8.0. I'll have to look into it. If it helps on technical documents then I want it, but I've adapted to 7.0 and would rather not shell out another $180. Thanks for the responses!
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Re: Trial?

Postby D.Leikin » Fri Mar 10, 2006 5:17 am

feddup wrote: If it helps on technical documents then I want it


As far as I know, Finereader 8.0 has OCR support for multilingual texts, simple chemical formulae, and programming languages, e.g. C/C++, Java, Basic, etc. I guess that all these capabilities can be used simultaneously.
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

OCR for scientific docs

Postby DaveH » Fri Mar 10, 2006 8:17 am

feddup wrote: Technical information is difficult for OCR software and I thought that was a given and would not change

I am interested in converting downloaded PDF scientific documents to Word or HTML in order to read them with TextAloud. The closest that I have come to a decent solution is from intrapdf.com. I am also in the process of upgrading my ScanSoft OmniPage Pro OCR software to version 15. Interestingly, this includes their RealSpeak software and, I believe, the option to use voice input to edit pronunciation. Could be interesting. If I find a good solution I will post here. Below is a copy of a reply from intrapdf to one of my queries which may be of interest:

Thank you for your attention to our product.
Converting scientific documents to HTML format is a real headache. We are going to release a special version of PDF converter to convert such files; this version will allow the user to specify text areas for precise conversion. This feature will preserve special symbols, formulas, etc. Regarding text-to-speech software, we can add a feature to add hidden plain text that will be read by this software but will not be visible.
I will notify you when this version will be available for download.
Upgrades are free, you can order your copy now and upgrade to the newest version at any time.
You can order online at
https://www.regnow.com/softsell/nph-sof ... tem=3666-2
Kind regards,
Alexander Mova support@intrapdf.com

Dave
Dave UK
DaveH
 
Posts: 178
Joined: Tue Feb 17, 2004 11:54 am
Location: UK

Re: OCR for scientific docs

Postby D.Leikin » Sat Mar 11, 2006 10:03 am

Hi Dave

I’m interested in converting scientific PDF articles into Word format too and would like to share some info on the subject. Let me start on a merry note.

There’s a simple and effective way to do PDF-to-DOC/RTF conversion that preserves everything, i.e., text, graphics, formulae, formatting, layout, etc. Such conversion is done in just a mouse click if a PDF file is opened in Adobe Acrobat Professional and then saved as a DOC or RTF file. I believe this is the most effective way to tackle the problem. It works fine. I tested this “method” in the following way. First, I converted my paper (let its name be “Original.DOC”), which had both math equations and graphics, into PDF format. Then the PDF file (let it be “Transformed.PDF”) was opened in Adobe Acrobat 7.0 Professional and saved as “Restored.DOC”. I compared Original.DOC with Restored.DOC and could not find any difference between the two files except for the fact that the math formulae in Restored.DOC had turned into graphics. These formulae looked identical to the equations in Original.DOC but could no longer be edited. Obviously, this happened because the DOC-to-PDF converter cannot interpret embedded formulae and just prints them out into the PDF file as graphics. This minor issue is inevitable because PDF readers cannot interpret embedded equations. On the whole, I think that Adobe Acrobat does ensure the best achievable results in PDF-to-DOC conversion.

The success of the above PDF-to-DOC conversion test is due to the fact that this conversion was made on the same machine where the PDF file had been created. Unfortunately, this will not be the case if one tries to convert a PDF file created on a different system.

PDF format was designed to make documents accessible in situations when fonts they employ are not installed on the end users’ systems. When a PDF is created, all the necessary fonts are embedded into the PDF file. If some of these fonts are missing on a system, the PDF reader will use embedded fonts instead to let the user correctly view the document or even to correctly print the document out. However, when a PDF is converted to DOC the result, generally, turns out to be corrupt because MS Word is unable to correctly interpret these missing fonts.

Theoretically, any font embedded into a PDF file can be extracted and installed on the user’s system. However, such extraction is a painstaking procedure and I doubt if it can be automated at all. The problem is that not all necessary font information is embedded into PDF files, so one has to manually specify dozens of input parameters to correctly extract the font one needs. Moreover, PDF can make use of fonts that are not currently supported by MS Word. In addition, some fonts may not be extracted because of severe license restrictions imposed by their owners. Finally, even if you are willing to spend up to $10,000 to get the latest Adobe Font Folio, the problem can still persist, as some publishers may use fonts they have created on their own. (By the way, I’m not sure if Adobe’s OTF fonts are supported by MS Office applications.) So I do not believe that Intrapdf will have much luck in creating an automated PDF-to-DOC converter unless they invent some new technology. The only thing they can really do is let users manually specify the blocks that cannot be converted (such as math formulae) so that these blocks are passed to the output DOC file as graphics. The same result can be achieved right away with the Omnipage and Finereader OCR programs. One can just mark formulae blocks as graphics and send the recognized document to Word. The net result will match the theoretical quality limit that can be achieved with Adobe Acrobat's conversion capabilities. I think that OCR remains the most precise tool to convert PDF files.

If you are willing to experiment with PDF-to-DOC conversion and would like to know how embedded fonts are extracted from a PDF-file, please, let me know and I’ll try to provide you with some details on this issue.

Best regards

Dmitry
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

finereader?

Postby feddup » Sat Mar 11, 2006 11:38 am

Is the pdf to doc conversion you're speaking of being done by finereader or acrobat pro 7.0? I might have missed it. I'm almost resigned to upgrading to finereader 8.0 but don't have the expendable cash right now. I'm also interested in 8.0's supposed ability to automate (sounds like a macro) repeated finereader tasks. Does anybody have input on how that feature works?
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Postby D.Leikin » Sun Mar 12, 2006 4:52 am

Bunger Henry wrote: I'm interested in this question, too. I use 7.0 to scan paperback novels. I notice that wherever there is a long dash character, the letters before and after the dash get highlighted as uncertain characters. I can't find a way to make this stop happening. It's time-consuming, because I spell check before I use TextAloud, and the spell checker stops at all the long dashes. I wonder if they've fixed this problem with 8.0.


Did you try to tweak things in TOOLS->OPTIONS->PROOFING?
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

Re: OCR for scientific docs

Postby DaveH » Sun Mar 12, 2006 7:54 am

D.Leikin wrote:Hi Dave
I’m interested in converting scientific PDF articles into Word format too ...


Hi Dmitry

Yesterday I was walking on the Welsh hills in glorious sunshine but today snow is falling with the Malvern hills covered in a couple of inches and it is freezing. On the radio I am listening to England playing cricket with India in sweltering heat. Sorry...English people love to discuss the weather and drink tea!

Very many thanks for your illuminating discussion of PDF => Word conversion. I have in fact used OmniPage 14 to zone equations as images for conversion and I agree that it works well. It is, however, extremely tedious with a large number of pages to process. I have not used Adobe Acrobat but this approach certainly sounds interesting. I would guess that most journals probably stick to basic fonts. It is a bit pricey though. I hold an honorary post at Birmingham University, however, so I may be able to get a copy there if they have a site license.

I do wonder if the artificial intelligence people might have a way of recognising equations and formulae so that they could be treated as images for a reliable automated conversion technique. I was hoping that the Intrapdf people might be trying this route.

Unfortunately I don't think that I can spare time to experiment with extracting embedded fonts as I am not a software engineer. I am interested though and many thanks for the offer.

Best wishes
Dave
Dave UK
DaveH
 
Posts: 178
Joined: Tue Feb 17, 2004 11:54 am
Location: UK

TRIAL!

Postby feddup » Wed Mar 15, 2006 1:24 am

Thanks to D.Leikin for the FineReader trial suggestion. ABBYY does offer a 15-day trial, which for a $180 upgrade is a good idea. My scanner died after tens of thousands of scans, so my audio book creation has halted. I should be trying the trial by the weekend.
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Postby D.Leikin » Wed Mar 15, 2006 3:12 am

I guess that FR 8.0 has improved the recognition algorithm over its previous version, and now one can use a digital camera instead of a scanner. Just take a look at this feature.
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

Postby Bunger Henry » Wed Mar 15, 2006 8:05 am

Did you try to tweak things in TOOLS->OPTIONS->PROOFING?


There is no such feature on Finereader 7.0. You can do Tools, Options, but there is no Proofing.
Bunger Henry
 
Posts: 149
Joined: Thu Apr 15, 2004 8:17 pm

Postby D.Leikin » Wed Mar 15, 2006 9:02 am

Very many apologies.
It was Tools > Options > Check Spelling.
I can locate it in FineReader 7.0 Professional (Build 7.0.0.522).
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

Trial begins

Postby feddup » Fri Mar 17, 2006 12:55 am

I started trialing FineReader 8.0 tonight and it's definitely better but still far from perfect. I must mention that, when trying to find the best settings, I hit FineReader with the most difficult material I've encountered. This material has code samples, small fonts, large fonts, tables without borders or dividers, highlighted passages, screenshots and separate boxes of dialogue in different colors. 8.0 definitely did much better but the difference isn't miraculous. I'm leaning towards purchasing the upgrade but I'll use the full 15 days to decide. Has anyone tried OmniPage for material such as this? Do they have a trial? I know and like FineReader but I'm not married to it!! It's not perfect. Heck, I'd try other OCR software, but I researched this to death and thought it was pretty much FineReader vs. OmniPage, period. Any news to the contrary would be considered.
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Postby D.Leikin » Fri Mar 17, 2006 5:06 am

You can find a 15-day trial of Omnipage 15.0 Professional here
http://www.soft32.com/download_126275.html
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

Thanks again!

Postby feddup » Fri Mar 17, 2006 7:26 am

Thank you. I've bought a lot of marginal software over the years. I'm willing to waste $30 but not $180. I'll definitely give it a try. Would you happen to know if omnipage (Nuance) offers competitive upgrades like ABBYY does?
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Re: Thanks again!

Postby D.Leikin » Fri Mar 17, 2006 8:46 am

feddup wrote:I'm willing to waste $30 but not $180.


In Central Europe and Australia, the FineReader 8.0 Upgrade goes for about $100. I guess that's just the happy medium.
D.Leikin
 
Posts: 682
Joined: Sat Jan 14, 2006 2:15 pm

Re: Thanks again!

Postby DaveH » Fri Mar 17, 2006 6:36 pm

feddup wrote:Thank you. I've bought a lot of marginal software over the years. I'm willing to waste $30 but not $180. I'll definitely give it a try. Would you happen to know if omnipage (Nuance) offers competitive upgrades like ABBYY does?


I have just upgraded OmniPage Pro from V14 to V15. Am hoping to play with it over the weekend. I paid about £70 (~$120) from Amazon in the UK but it is worth shopping around. An initial look indicates large improvements over V14. It is closely integrated with Microsoft Word and can recognise symbols from the keyboard and various alphabets (including the vital Greek letters). For the document I tried, it reproduced superscripts and subscripts, e.g. powers such as 10^n, but failed with 10^n^m. Looks as though it can handle symbols (from Word's dictionary) on a line but cannot use the equation editor in Word for multi-line expressions. For scanning pages from a typical novel the OCR seemed very accurate. For pages with diagrams it got confused by text on the diagram and it was necessary to zone by hand for best results.

Dave
Dave UK
DaveH
 
Posts: 178
Joined: Tue Feb 17, 2004 11:54 am
Location: UK

Postby Bunger Henry » Fri Mar 17, 2006 7:03 pm

D.Leikin wrote:Very many apologies.
It was Tools > Options > Check Spelling.
I can locate it in FineReader 7.0 Professional (Build 7.0.0.522).


I've used this option, but it doesn't help me with this issue. If there were a way to eliminate the uncertain character highlight for all characters adjacent to a long dash, it would be a major improvement. Those characters are almost never wrong, but it always highlights them. I guess the ends of the long dash are too close to the adjacent characters and this confuses the OCR software.
Bunger Henry
 
Posts: 149
Joined: Thu Apr 15, 2004 8:17 pm

Postby DaveH » Sat Mar 18, 2006 4:27 am

I've used this option, but it doesn't help me with this issue. If there were a way to eliminate the uncertain character highlight for all characters adjacent to a long dash, it would be a major improvement. Those characters are almost never wrong, but it always highlights them. I guess the ends of the long dash are too close to the adjacent characters and this confuses the OCR software.


In omnipage there is an option to ignore specified characters in the OCR. Hopefully this would replace with a space but I haven't tried it. This isn't a problem that I have noticed with omnipage.

Dave
Dave UK
DaveH
 
Posts: 178
Joined: Tue Feb 17, 2004 11:54 am
Location: UK

FR 8.0 better

Postby feddup » Sat Mar 18, 2006 11:36 pm

Finereader 8.0 does seem quite a bit better. I've used it today quite a bit. It still has issues if the content varies greatly. I'm still on dial up but I'm going to download the trial version of omnipage tonight when I go to bed. It's certainly worth a try. I'll hopefully have time tomorrow to try omnipage.
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

omnipage

Postby feddup » Sun Mar 19, 2006 10:40 am

Apparently the OmniPage trial version is almost half a gig in size, so I'll be downloading it somewhere other than my home dial-up connection. I have other resources for large downloads but it won't be happening today.
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Postby Bunger Henry » Sun Mar 19, 2006 8:53 pm

I went ahead and shelled out the $180 for the upgrade to FineReader 8.0. There seems to be an improvement in recognition quality, but I'm still seeing the problem where characters adjacent to long dashes are highlighted as being uncertain. That's unfortunate, since it can take a while to sift through all those highlights in a 90-page document, and those characters by the long dashes are almost always correct. Could be I'm doing something wrong, since no one else seems to be having this problem.
Bunger Henry
 
Posts: 149
Joined: Thu Apr 15, 2004 8:17 pm

questionable

Postby feddup » Mon Mar 20, 2006 1:23 am

FineReader seems better overall but flaky. I had one page that wanted to turn sideways. It did it three times. I finally flipped it manually, set the blocks manually, and it went OK. There was a "read as single column" legacy setting that's gone in 8.0. For columns of definitions and terms with no dividers, that was handy. I did notice much greater speed and better recognition on curved surfaces. I was scanning dual pages where I couldn't before. It's definitely better in most respects but I've got to try OmniPage if I can do it for free. I'm sure it has a different set of issues.
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Postby DaveH » Mon Mar 20, 2006 7:17 pm

DaveH wrote:
Henry Bunger wrote: If there were a way to eliminate the uncertain character highlight for all characters adjacent to a long dash, it would be a major improvement. Those characters are almost never wrong, but it always highlights them. I guess the ends of the long dash are too close to the adjacent characters and this confuses the OCR software.


In omnipage there is an option to ignore specified characters in the OCR. Hopefully this would replace with a space but I haven't tried it. This isn't a problem that I have noticed with omnipage.

Dave


Hi
I am getting more familiar with omnipage 15 now and came across a similar problem. When omnipage does not recognise a character it replaces it with a unique user chosen reject character which I chose to be a twiddle (my comment above is incorrect). In my case the problem character was a symbol used to denote a footnote and was replaced by a twiddle in the recognised text.

Not sure if this will help but is it possible that fine reader is not recognising a symbol and replacing it with a long dash? If this is so, you may be able to train the OCR to recognise the symbol as something from the keyboard.

Dave
Dave UK
DaveH
 
Posts: 178
Joined: Tue Feb 17, 2004 11:54 am
Location: UK
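
For anyone proofreading a long scan that uses a reject character as described above, the following is a minimal Python sketch for finding which pages need attention. It is not an OmniPage feature. It assumes the recognised text has been exported as plain text, that the reject character is the tilde chosen above, and that pages are separated by a form feed character; the function name and both defaults are hypothetical, so adjust them to whatever your export actually produces. A tilde that legitimately appears in the book will be counted as well.

```python
def reject_counts(text: str, reject_char: str = "~", page_break: str = "\f") -> list[tuple[int, int]]:
    """Return (page_number, count) for each page containing the reject character.

    Assumption: pages in the exported text are separated by `page_break`.
    """
    pages = text.split(page_break)
    return [
        (number, page.count(reject_char))
        for number, page in enumerate(pages, start=1)
        if reject_char in page
    ]


if __name__ == "__main__":
    # Three tiny "pages": only the second and third contain reject characters.
    exported = "clean page\fsee footnote~ and ~\fanother ~ here"
    for page_number, count in reject_counts(exported):
        print(f"page {page_number}: {count} unrecognised character(s)")
```

Pages that do not appear in the result contain no reject characters and can probably be skimmed rather than proofread closely.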

Postby Bunger Henry » Mon Mar 20, 2006 9:45 pm

Not sure if this will help but is it possible that fine reader is not recognising a symbol and replacing it with a long dash? If this is so, you may be able to train the OCR to recognise the symbol as something from the keyboard.


No, the reader correctly detects a long dash and the characters to either side, but it highlights the letters as being uncertain anyway. Happens every time.
Bunger Henry
 
Posts: 149
Joined: Thu Apr 15, 2004 8:17 pm
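
One possible workaround for the spell-check half of the long dash problem is to clean the exported text before spell checking it or loading it into TextAloud. The Python sketch below is only an illustration and makes several assumptions: the OCR result has been saved as plain text, the long dashes appear as em dashes, en dashes, horizontal bars, or runs of two or more hyphens, and replacing them with a spaced hyphen is acceptable for listening. The function name is hypothetical, and the script does nothing about the uncertain-character highlighting inside FineReader itself.

```python
import re

# Long-dash variants commonly seen in OCR output: em dash, en dash,
# horizontal bar, or two or more ASCII hyphens in a row. Spaces and tabs
# around the dash are absorbed; newlines are left alone.
DASH_PATTERN = re.compile(r"[ \t]*(?:[\u2014\u2013\u2015]|-{2,})[ \t]*")


def normalize_dashes(text: str) -> str:
    """Replace each long dash (and adjacent spaces) with ' - '."""
    return DASH_PATTERN.sub(" - ", text)


if __name__ == "__main__":
    sample = "She paused\u2014then ran--fast. A well-known trick."
    print(normalize_dashes(sample))
```

Single hyphens inside words such as "well-known" are left untouched, so ordinary hyphenated words should still pass through the spell checker unchanged.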

omnipage?

Postby feddup » Tue Mar 21, 2006 1:38 am

How does OmniPage do on technical books? Tables, graphs, programming code, equations, extreme variations in color, various fonts and font sizes. How does it do on dual page scans as far as the curvature of the page? I'm going to try it this weekend but I'm used to FineReader and I'm sure there will be a learning period with a new OCR. I'm going to upgrade FineReader to 8.0 (much faster than 7.0) unless OmniPage flat out smokes it. I'm just looking for as much input as possible before I shell out $180 I'd rather not spend.
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Re: omnipage?

Postby DaveH » Tue Mar 21, 2006 4:38 am

feddup wrote: How does OmniPage do on technical books? Tables, graphs, programming code, equations, extreme variations in color, various fonts and font sizes. How does it do on dual page scans as far as the curvature of the page?


Hi
I do scan books using a flatbed and it copes well with the curvature. To handle tables it's best to zone the table by hand; likewise, graphs containing text on the axes are best zoned as pictures. Simple equations on a line, e.g. a^2+b^2=c^2, are OK but you may need to train the OCR for certain features. Equations using multiple lines or symbols (e.g. the integral sign) not in the character map are best zoned as pictures. They won't be read by TextAloud anyway. It can cope with varying fonts and font sizes. I don't think colour is a problem. I don't think there should be a problem with programming code as long as all of the symbols are on the keyboard or in the character map. The OCR can be trained to recognise problem expressions. If you proofread each page, you may need to create a user dictionary containing all the bits it recognises but flags as suspect.

In conclusion the performance depends on the complexity of the document and how much time you are prepared to spend manually zoning the pages. You just have to try it and see.

I will be interested to hear how it compares to finereader.

Dave
Dave UK
DaveH
 
Posts: 178
Joined: Tue Feb 17, 2004 11:54 am
Location: UK
