Page 130 - DCAP303_MULTIMEDIA_SYSTEMS

Multimedia Systems



                                 trend to meet larger volumes and larger groups of users after 30 years of development of desktop
                                 OCR. Internet and broadband technologies have made WebOCR and OnlineOCR practically
                                 available to both individual users and enterprise customers. Since 2000, some major OCR vendors
                                 have offered WebOCR and online software, and a number of new entrants have seized the
                                 opportunity to develop innovative Web-based OCR services, some of which are free of
                                 charge.
                                 Application-Oriented OCR
                                 As OCR technology has been applied more and more widely in paper-intensive industries,
                                 it faces more complex image conditions in the real world: complicated backgrounds,
                                 degraded images, heavy noise, paper skew, picture distortion, low resolution,
                                 interference from grids and lines, and text images containing special fonts, symbols,
                                 glossary words, and so on. All of these factors affect the stability of OCR products’
                                 recognition accuracy.
                                 In recent years, the major OCR technology providers have begun to develop dedicated OCR systems,
                                 each for a special type of image. These systems combine various optimization methods related to
                                 that image type, such as business rules, standard expressions, glossary dictionaries, and the rich
                                 information contained in colour images, to improve recognition accuracy.
                                 This strategy of customizing OCR technology is called “Application-Oriented OCR” or “Customized
                                 OCR”. It is widely used in fields such as Business-card OCR, Invoice OCR, Screenshot OCR, ID card
                                 OCR, Driver-license OCR, and Auto plant OCR.
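
As a rough illustration of how a glossary dictionary and a simple business rule might be used to clean up raw OCR output, here is a minimal Python sketch. The glossary contents, the function names, the similarity cutoff, and the character substitution table are all assumptions made for this example. They are not taken from any particular OCR product.

```python
import difflib
import re

# Hypothetical glossary for an invoice-OCR application (illustrative only).
GLOSSARY = ["invoice", "total", "amount", "date", "customer", "quantity"]

# Hypothetical business rule: an amount field holds only digits, so letters
# that OCR commonly confuses with digits are mapped back to digits.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def correct_token(token, glossary=GLOSSARY, cutoff=0.75):
    """Return the closest glossary word (lowercase), or the token unchanged."""
    matches = difflib.get_close_matches(token.lower(), glossary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_line(line):
    """Apply glossary correction to every whitespace-separated token."""
    return " ".join(correct_token(t) for t in line.split())


def correct_amount(field):
    """Repair an amount field; keep the original if it still looks invalid."""
    fixed = field.translate(DIGIT_FIXES)
    return fixed if re.fullmatch(r"\d+(\.\d{2})?", fixed) else field


print(correct_line("lnvoice t0tal 12.50"))
print(correct_amount("1O5.5O"))
```

A correction of this kind can also change a valid word that happens to resemble a glossary entry, so the cutoff has to be tuned for each application.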

                                 7.2.2 OCR Software in Present Scenario
                                 One study based on recognition of 19th and early 20th century newspaper pages concluded
                                 that character-by-character OCR accuracy for commercial OCR software varied from 71% to
                                 98%; total accuracy can only be achieved by human review. Other areas—including recognition
                                 of hand printing, cursive handwriting, and printed text in other scripts (especially those East
                                 Asian language characters which have many strokes for a single character)—are still the subject
                                 of active research.
                                 Accuracy rates can be measured in several ways, and how they are measured can greatly affect
                                 the reported accuracy rate. For example, if word context (basically a lexicon of words) is not used
                                 to correct output that contains non-existent words, a character error rate of 1% (99% accuracy) may
                                 result in an error rate of 5% (95% accuracy) or worse when the measurement is based on whether
                                 each whole word was recognized with no incorrect letters.
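
The relationship between the two figures can be checked with a short calculation. The sketch below assumes that character errors are independent of one another and that every word has the same length. Both are simplifying assumptions introduced here, and the function name is illustrative.

```python
def word_error_rate(char_error_rate, word_length):
    """Probability that a word contains at least one wrong character,
    assuming independent character errors (a simplifying assumption)."""
    return 1.0 - (1.0 - char_error_rate) ** word_length


# A 1% character error rate on five-letter words gives roughly 4.9%.
print(f"{word_error_rate(0.01, 5):.1%}")
```

Under these assumptions a word of five letters is fully correct with probability 0.99^5, about 0.951, which is where the figure of roughly 5% word error comes from. Longer words give a higher word error rate.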
                                 On-line character recognition is sometimes confused with Optical Character Recognition
                                 (see handwriting recognition). OCR is an instance of off-line character recognition, in which the
                                 system recognizes the fixed, static shape of the character, whereas on-line character recognition
                                 recognizes the dynamic motion during handwriting. For example, on-line recognition, such as that
                                 used for gestures in the PenPoint OS or on the Tablet PC, can tell whether a horizontal mark was
                                 drawn right-to-left or left-to-right. On-line character recognition is also referred to by other terms,
                                 such as dynamic character recognition, real-time character recognition, and Intelligent Character
                                 Recognition (ICR).
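
The difference between the two kinds of input can be sketched in a few lines of Python. The example assumes that an on-line device reports a stroke as a time-ordered list of (x, y) points, while an off-line image keeps only the set of inked pixels. The data layout and the function names are illustrative assumptions, not the format of any real device.

```python
from typing import FrozenSet, List, Tuple

Point = Tuple[float, float]


def stroke_direction(stroke: List[Point]) -> str:
    """Classify a time-ordered pen stroke by comparing its first and last x."""
    if len(stroke) < 2:
        return "undetermined"
    dx = stroke[-1][0] - stroke[0][0]
    if dx > 0:
        return "left-to-right"
    if dx < 0:
        return "right-to-left"
    return "undetermined"


def rasterize(stroke: List[Point]) -> FrozenSet[Tuple[int, int]]:
    """Off-line view: the inked pixels only, with the time order discarded."""
    return frozenset((round(x), round(y)) for x, y in stroke)


ltr = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
rtl = list(reversed(ltr))

print(stroke_direction(ltr), stroke_direction(rtl))
print(rasterize(ltr) == rasterize(rtl))
```

The two strokes produce the same static image, so an off-line system cannot tell them apart, whereas the on-line representation still carries the drawing direction.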
                                 On-line systems for recognizing hand-printed text on the fly have become well known as
                                 commercial products in recent years. Among these are the input devices for personal digital
                                 assistants such as those running Palm OS. The Apple Newton pioneered this kind of product. The
                                 algorithms used in these devices take advantage of the fact that the order, speed, and direction of
                                 individual line segments at input are known. Also, the user can be retrained to use only specific
                                 letter shapes. These methods cannot be used in software that scans paper documents, so accurate
                                 recognition of hand-printed documents is still largely an open problem. Accuracy rates of 80% to
                                 90% on neat, clean hand-printed characters can be achieved, but that accuracy rate still translates
                                 to dozens of errors per page, making the technology useful only in very limited applications.



        124                               Lovely Professional University