I am new to OCR and am using OmniPage Pro 15.0 to scan college textbooks (some hardback, some paperback). Content is about 98% plain B&W text with few if any formulas, equations, tables, diagrams, or photos. Some words have superscript numbers for footnote references. Most books have a header and/or footer.
Books are both new and used but generally have very good contrast between characters and paper. If the book is pushed down sufficiently at the spine to reduce book curl, most scans are very successful, with excellent accuracy and very few suspect words/characters.
1) HOWEVER, some scanned pages are erroneously reported by OP Pro as being fine when they are not. The preponderance of errors are characters at the beginning or end of a line, on the side closest to the spine. Examples of what was allowed through as OK:
As shown in TextEditor    Should be
y                         you
a                         an or and
th                        the
There are other similar examples. I do NOT have OP set to flag all non-dictionary words, and I expect that I should try that setting to avoid these "false negative" situations.
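As a fallback outside OP, I could also check the exported text myself. Below is only a rough sketch of that idea, not an OmniPage feature: it looks at the first and last word of each line (the positions nearest the spine) and lists any that are not in a word list or are a single letter. The names `edge_suspects` and `KNOWN_WORDS` are made up for illustration, and the tiny word list is a stand-in for a real dictionary file.

```python
import re

# Hypothetical stand-in word list; a real run would load a full dictionary file.
KNOWN_WORDS = {"a", "an", "and", "cat", "saw", "store", "the", "to", "went", "you"}


def edge_suspects(text, known_words=KNOWN_WORDS, max_len=1):
    """List (line_number, position, word) for edge words that look truncated.

    A word is suspect if it is not in known_words, or if it has max_len
    letters or fewer (a lone "a" may be a clipped "an" or "and").
    """
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        words = re.findall(r"[A-Za-z']+", line)
        if not words:
            continue  # blank line or a line with no letters
        edges = [("first", words[0])]
        if len(words) > 1:
            edges.append(("last", words[-1]))
        for position, word in edges:
            lower = word.lower()
            if lower not in known_words or len(lower) <= max_len:
                suspects.append((line_no, position, word))
    return suspects


sample = "y went to the store\nsaw a cat and th\nthe cat saw a"
for line_no, position, word in edge_suspects(sample):
    print(f"line {line_no}: {position} word {word!r} needs a look")
```

This will also flag legitimate single-letter words that happen to fall at a line edge, so it only produces a list to review by hand, not corrections.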
2) If the text is clean and has good contrast, am I correct that it is probably best to scan with "Despeckle" off?
3) My strategy for scanning most or all of a 300 to 400 page book is as follows (any other suggestions are welcome):
a) Look over book and determine what might be typical challenges.
b) Choose settings for page size etc. Leave Brightness at default.
c) Scan several typical pages each from front, middle, and back of book to represent different types of book curl.
d) Evaluate results and modify strategy if needed.
Record the number of errors reported by ProofReader in the Document area. Let TA read the page while checking it against the actual book, and note any mismatches. Visually compare the scanned image edge AND the text editor view of the edge closest to the spine, checking for poor scanning and for errors that are actual but not reported by ProofReader.
If the scan is not good, errors are high, or there are unreported errors, change the scanning strategy. Changes would primarily be attempts to get better contact between the pages and the glass and to reduce book curl, either by changing the amount of pressure or by pressing more uniformly along the spine, especially if the book has a long spine. As needed, a change in the Brightness setting might be tried; however, I believe I read in the OP manual that the Brightness setting should not impact results at all if you are scanning in B&W mode. Since most of my problems are along the spine and seem most related to the relationship of the text to the glass, I don't know whether changing Brightness, or switching to Grayscale mode and adjusting both Brightness and Contrast, would be worth the experiment. Hopefully someone has good practical experience, as was true in this recent thread with advice on scanning mass paperback books using ABBYY FineReader (http://www.nextup.com/phpBB2/viewtopic.php?t=2462)
4) Finally, any advice or pitfalls with the ProofReader step? Are there times when selecting "Change All" might get you into trouble? Are there times when you don't want IntelliTrain operating because it creates more problems than it fixes? It also seems that ProofReader sometimes suggests words that clearly would not be in the Dictionary. Is it simply making the suggestion based on the scanned image without caring whether it is a Dictionary word or not?
Fortunately, while headers and footers can be a challenge in scanning, they are generally consistent text and therefore pretty easy to fix en masse after the scanning process.
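For what it's worth, here is roughly how I picture that en masse cleanup if the recognized text is exported one page at a time. It is only a sketch under my own assumptions: each page is a plain string, the header is the first line and the footer is the last line, and page numbers are the only part that changes. The name `strip_running_lines` and the 60% threshold are made up for illustration.

```python
import re
from collections import Counter


def _normalize(line):
    # Replace digit runs so "A History of Rome 12" and "... 13" compare equal.
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_running_lines(pages, min_share=0.6):
    """Drop first/last lines that recur (ignoring digits) on most pages."""
    split_pages = [page.splitlines() for page in pages]
    first = Counter(_normalize(lines[0]) for lines in split_pages if lines)
    last = Counter(_normalize(lines[-1]) for lines in split_pages if lines)
    threshold = min_share * len(pages)
    headers = {key for key, count in first.items() if count >= threshold}
    footers = {key for key, count in last.items() if count >= threshold}

    cleaned = []
    for lines in split_pages:
        if lines and _normalize(lines[0]) in headers:
            lines = lines[1:]   # remove the running header
        if lines and _normalize(lines[-1]) in footers:
            lines = lines[:-1]  # remove the running footer
        cleaned.append("\n".join(lines))
    return cleaned


pages = [
    "A History of Rome 12\nBody one.\n12",
    "A History of Rome 13\nBody two.\n13",
    "A History of Rome 14\nBody three.\n14",
]
print(strip_running_lines(pages))
```

Books whose header alternates between left and right pages (book title on one, chapter title on the other) would need a lower threshold or separate handling for odd and even pages.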
Sorry to have to post this in this forum but the User forum at Nuance for OP is very inactive - no posts at all today. Thanks for letting me learn from your experience.
