Your OmniPage book scanning advice

Postby EdwardA » Tue Jun 13, 2006 2:14 pm

I am new to OCR and am using OmniPage Pro 15.0 to scan college textbooks (some hardback, some paperback). Content is 98% just B&W text with few if any formulas, equations, tables, or diagrams/photos. Some words have superscript numbers for footnote references. Most books have a header and/or footer.

Books are both new and used but generally have very good contrast between characters and paper. If pushed down sufficiently at spine to reduce book curl, most scans are very successful with excellent accuracy and very few suspect words/characters.

1) HOWEVER, some scanned pages are erroneously reported by OP Pro as being fine when they are not. The preponderance of errors are characters at the beginning or end of a line, closest to the spine. Examples of what was allowed through as OK:

As shown in TextEditor    Should be
y                         you
a                         an or and
th                        the

There were other similar examples. I do NOT have OP set to flag all non-dictionary words, and I expect that I should try that setting to avoid these "false negative" situations.
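For anyone who wants to double-check the exported text outside OmniPage, here is a minimal Python sketch of one way to look for clipped words at the line edges. It is not an OmniPage feature; the names `SHORT_WORDS` and `edge_suspects` and the two-letter cutoff are assumptions made up for illustration. Note that it would catch "y" and "th" from the examples above but not "a", since "a" is a real word.

```python
import re

# Hypothetical list of short words that are legitimate at a line edge;
# extend it for the book being scanned.
SHORT_WORDS = {
    "a", "i", "an", "as", "at", "be", "by", "do", "go", "he", "if", "in",
    "is", "it", "me", "my", "no", "of", "on", "or", "so", "to", "up", "us", "we",
}

def edge_suspects(page_text, max_len=2):
    """Return (line_number, position, token) for short non-words at line edges."""
    suspects = []
    for number, line in enumerate(page_text.splitlines(), start=1):
        tokens = re.findall(r"[A-Za-z]+", line)
        if not tokens:
            continue
        # Check the first token, and the last one if the line has more than one.
        edges = [("start", tokens[0])]
        if len(tokens) > 1:
            edges.append(("end", tokens[-1]))
        for position, token in edges:
            if len(token) <= max_len and token.lower() not in SHORT_WORDS:
                suspects.append((number, position, token))
    return suspects
```

Running it on the text of a page gives a short list of lines worth comparing against the book, which is cheaper than flagging every non-dictionary word.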

2) If text is clean and with good contrast, am I correct that it is probably best to scan with "Despeckle" off?

3) My strategy for scanning most or all of a 300 to 400 page book is as follows - any other suggestions:
a) Look over book and determine what might be typical challenges.
b) Choose settings for page size etc. Leave Brightness at default.
c) Scan several typical pages each from front, middle, and back of book to represent different types of book curl.
d) Evaluate results and modify strategy if needed.

Record number of errors reported by ProofReader in the Document area. Let TA read the page while checking it against the actual book - note any mis-matches. Visually look at the scanned image edge AND text editor view of edge that is closest to spine and check for poor scanning and for errors that are actual but not reported by ProofReader.

If the scan is not good and/or errors are high or there are unreported errors, change scanning strategy. Changes would primarily be attempts to get better contact between pages and glass and to reduce book curl, either by changing the amount of pressure and/or by pressing more uniformly along the book spine, especially if the book has a long spine. As needed, a change in the Brightness setting might be tried; however, I believe I read in the OP manual that the Brightness setting should not impact results at all if you are scanning in B&W mode. Since most of my problems are along the spine and seem most related to the relationship of text to the glass, I don't know whether changing Brightness, or switching to Grayscale mode and adjusting both Brightness and Contrast, would be worth the experiment. Hopefully someone has good practical experience, as was true in this recent thread with advice on scanning mass paperback books using ABBYY FineReader (http://www.nextup.com/phpBB2/viewtopic.php?t=2462).
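The bookkeeping for the sample pages could be as simple as the following Python sketch. Everything in it is an assumption for illustration: the `PageSample` record, the `sections_to_rework` helper, and especially the two limits, which would need to be set from experience with the particular book.

```python
from dataclasses import dataclass

@dataclass
class PageSample:
    section: str     # "front", "middle", or "back" of the book
    reported: int    # suspect words flagged in the proofreading step
    unreported: int  # errors found only by checking against the book

def sections_to_rework(samples, max_unreported=0, max_reported=5):
    """Return sorted section names where any sample exceeds either limit.

    Both limits are hypothetical defaults, not recommended values.
    """
    bad = set()
    for sample in samples:
        if sample.unreported > max_unreported or sample.reported > max_reported:
            bad.add(sample.section)
    return sorted(bad)
```

Keeping the unreported count separate from the reported count matters here, because the pages that look clean in the proofreading step are exactly the ones that caused trouble above.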

4) Finally any advice or pitfalls with the ProofReader step? Are there times when selecting "Change All" might get you into trouble? Are there times when you don't want IntelliTrain to be operating as it creates more problems than it might be fixing? It also seems that the ProofReader sometimes suggests words that clearly would not be in the Dictionary - is it simply making the suggestion based on the scanned image without caring whether it is a Dictionary word or not?

Fortunately, while headers and footers can be a challenge in scanning, they are generally consistent text and therefore pretty easy to fix en masse after the scanning process.
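As a rough illustration of fixing them en masse, here is a Python sketch that works on the exported text, one string per page. It is only one possible approach and not something OmniPage provides: the function name `strip_running_lines` and the 60% threshold are assumptions, and it treats only the first and last non-blank line of each page as header/footer candidates.

```python
import re
from collections import Counter

def _key(line):
    # Replace digit runs so that lines differing only in page number compare equal.
    return re.sub(r"\d+", "#", line.strip())

def strip_running_lines(pages, min_share=0.6):
    """Drop first/last lines that recur on at least min_share of the pages.

    Blank lines are discarded as a side effect of the splitting step.
    """
    split = [[line for line in page.splitlines() if line.strip()] for page in pages]
    tops = Counter(_key(lines[0]) for lines in split if lines)
    bottoms = Counter(_key(lines[-1]) for lines in split if lines)
    need = min_share * len(pages)
    cleaned = []
    for lines in split:
        if lines and tops[_key(lines[0])] >= need:
            lines = lines[1:]
        if lines and bottoms[_key(lines[-1])] >= need:
            lines = lines[:-1]
        cleaned.append("\n".join(lines))
    return cleaned
```

Books that alternate left-page and right-page headers would need a lower threshold or separate passes for odd and even pages.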

Sorry to have to post this in this forum but the User forum at Nuance for OP is very inactive - no posts at all today. Thanks for letting me learn from your experience.
EdwardA
 
Posts: 46
Joined: Tue Apr 25, 2006 12:10 am
Location: Oregon, USA

Postby kdwhite » Wed Jun 14, 2006 11:43 am

We don't know much about the scanning process at all, so let's see if some users step up with some advice.
Ken White
NextUp.com
The Power of Spoken Audio
http://www.NextUp.com

** TextAloud - The world's most popular Text To Speech tool.
http://www.nextup.com/TextAloud/
kdwhite
Site Admin
 
Posts: 2627
Joined: Mon Sep 29, 2003 11:34 am

New to OCR?

Postby feddup » Fri Jun 16, 2006 11:55 pm

If you're new to OCR then you'll be teaching in a matter of weeks. Your breakdown is fantastic. One point: you said to leave brightness at default. Sometimes authors choose to highlight certain text or put a gray text box on an otherwise white sheet, and then brightness and contrast might need tweaking. I wish I could give you a formula, but trial and error is your best friend in this case. It took me years to learn to deliberately look through the book for the most troublesome pages and test there first. As for scanning close to the spine, make sure the book opens well at the binding and firmly hold the spine down from the back. This is an art form and I have very little to teach you. You're a quick study!
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Postby EdwardA » Sat Jun 17, 2006 9:22 pm

Feddup, thanks for the response and encouragement.
It seems clear that accurate observation is a key to actually learning from one's experience with scanning, but perhaps someone can shorten my learning curve on two questions I raised above:


2) If text is clean and with good contrast, am I correct that it is probably best to scan with "Despeckle" off?

4) Finally any advice or pitfalls with the ProofReader step? Are there times when selecting "Change All" might get you into trouble? Are there times when you don't want IntelliTrain to be operating as it creates more problems than it might be fixing? It also seems that the ProofReader sometimes suggests words that clearly would not be in the Dictionary - is it simply making the suggestion based on the scanned image without caring whether it is a Dictionary word or not?

Oh yes, I did experiment with checking the option "mark all non-dictionary words". While this avoided the "false negatives" that I mentioned, I find this setting somewhat useless: the material I was scanning had so many non-dictionary words that training the program was very time-consuming. Since one has to make sure the scans are accurate regardless of whether the program reports very few questionable words/characters, I expect my strategy will be this: if there is a poor scan and/or lots of words that are accepted but are actually incorrect, it will be quicker to re-scan the page than to fix it.
EdwardA
 
Posts: 46
Joined: Tue Apr 25, 2006 12:10 am
Location: Oregon, USA

clueless

Postby feddup » Sun Jun 18, 2006 12:01 am

Despeckle sounds nice in concept but I'm unsure if it does anything at all! I've tried "training" in OmniPage and FineReader and I honestly think I was doing more harm than good. I've actually spent whole nights with a particularly nightmarish page, varying the settings (despeckle included), only to walk away defeated. My problems could well be operator error, but I've repeatedly heard moans of despair from IT experts when OCR software is mentioned! If perfect OCR software comes out I'll be waiting in line at midnight like the idiots that have to have the newest gaming consoles.
feddup
 
Posts: 57
Joined: Tue Mar 07, 2006 8:14 pm
Location: Kansas City, Missouri

Postby EdwardA » Sun Jun 18, 2006 11:51 am

Thanks Feddup ... Your experience will remind me that if I run into some frustration or less than stellar results, it is par for the course. Your post also gave me a good laugh.
EdwardA
 
Posts: 46
Joined: Tue Apr 25, 2006 12:10 am
Location: Oregon, USA

Re: Your OmniPage book scanning advice

Postby johnvarenda » Mon Feb 15, 2010 6:11 am

Thanks for the posting.....
..........
http://www.e-datapro.net
johnvarenda
 
Posts: 2
Joined: Thu Nov 26, 2009 6:53 am

Re: Your OmniPage book scanning advice

Postby Daniell » Fri Sep 30, 2011 1:37 am

I'm in need of an automatic page turner, badly; even if it’s only 99% reliable. Missed pages can be detected by OCRing page numbers. Manually capturing a few pages is better than having to manually capture them all.
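The page-number check could look something like this in Python. It is a sketch under assumptions: the page numbers have already been pulled out of the OCR text into a list, with `None` wherever no number could be read, and `missing_pages` is a made-up helper name.

```python
def missing_pages(page_numbers):
    """Return page numbers absent between the smallest and largest number seen.

    Entries of None (pages where OCR read no number) are ignored, so such
    pages show up in the result and need a manual look.
    """
    seen = {n for n in page_numbers if n is not None}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

For example, `missing_pages([1, 2, 3, 5, 6, None, 9])` gives `[4, 7, 8]`: one of those may be the page whose number was unreadable, and the rest were skipped by the turner.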
Daniell
 
Posts: 3
Joined: Thu Sep 08, 2011 4:27 am

