Global Patent Index - EP 0602955 A3

EP 0602955 A3 19950118 - Text recognition.

Title (en)

Text recognition.

Title (de)

Texterkennung.

Title (fr)

Reconnaissance de texte.

Publication

EP 0602955 A3 19950118 (EN)

Application

EP 93310134 A 19931215

Priority

  • US 99191392 A 19921217
  • US 99235892 A 19921217

Abstract (en)

[origin: EP0602955A2] Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word, without the need for segmentation or OCR and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs), and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image; a set of features based on the word shape (and preferably also the word's internal structure) within each bounding box is extracted; this set of features is applied to a network that includes one or more keyword HMMs; and a determination is made as to whether a keyword is present. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say, by 2x) and subjecting the reduced image to vertical and horizontal morphological closing operations. The bounding boxes of connected components in the resulting image are then used to hypothesize word or text line bounding boxes, and the original bitmaps within the boxes are used to hypothesize words. In a particular embodiment, a range of structuring elements is used for the closing operations to accommodate the variation of inter- and intra-character spacing with font and font size.
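
The bounding-box step described in the abstract (reduce, close vertically and horizontally, then take connected components) can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. The function names, the OR rule used for the 2x reduction, the 8-connectivity, and the structuring-element lengths `h_len` and `v_len` are all assumptions made for illustration. Closing with a flat 1 x k element is written here as filling background gaps shorter than k, which is equivalent for a row padded with background.

```python
from collections import deque

def reduce2x(img):
    """Halve a binary image; an output pixel is 1 if any pixel of its 2x2 block is 1 (assumed rule)."""
    h, w = len(img), len(img[0])
    return [[1 if any(img[y][x]
                      for y in range(2 * r, min(2 * r + 2, h))
                      for x in range(2 * c, min(2 * c + 2, w))) else 0
             for c in range((w + 1) // 2)]
            for r in range((h + 1) // 2)]

def close_row(row, k):
    """1-D closing with a flat element of length k: fill background gaps shorter than k."""
    out = list(row)
    last = None  # index of the most recent foreground pixel
    for x, v in enumerate(row):
        if v:
            if last is not None and 0 < x - last - 1 < k:
                for g in range(last + 1, x):
                    out[g] = 1
            last = x
    return out

def close_h(img, k):
    return [close_row(row, k) for row in img]

def close_v(img, k):
    # Vertical closing is horizontal closing applied to the transposed image.
    transposed = [list(col) for col in zip(*img)]
    return [list(col) for col in zip(*close_h(transposed, k))]

def component_boxes(img):
    """Bounding boxes (x0, y0, x1, y1), inclusive, of 8-connected foreground components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                x0 = x1 = x
                y0 = y1 = y
                while queue:
                    cy, cx = queue.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                        for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                            if img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

def word_boxes(img, h_len=2, v_len=2):
    """Hypothesize boxes in original-image coordinates (h_len, v_len are illustrative)."""
    h, w = len(img), len(img[0])
    closed = close_h(close_v(reduce2x(img), v_len), h_len)
    return [(2 * x0, 2 * y0, min(2 * x1 + 1, w - 1), min(2 * y1 + 1, h - 1))
            for x0, y0, x1, y1 in component_boxes(closed)]

# Toy bitmap: two "words" of two "characters" each, separated by a wide gap.
img = [[int(ch) for ch in "11001100000000110011"] for _ in range(4)]
print(word_boxes(img))           # small element: characters merge into words
print(word_boxes(img, h_len=5))  # larger element: words merge into a text line
```

Running the sketch with more than one value of `h_len` mirrors the abstract's remark that a range of structuring elements accommodates different character and word spacings: a short element joins characters into word boxes, while a longer one joins words into a text line box.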

IPC 1-7

G06K 9/00; G06K 9/68

IPC 8 full level

G06V 30/262 (2022.01); G06V 30/10 (2022.01)

CPC (source: EP US)

G06F 18/295 (2023.01 - EP); G06V 30/19187 (2022.01 - EP US); G06V 30/262 (2022.01 - EP US); G06V 30/10 (2022.01 - EP US)

Citation (search report)

  • [DXA] C B BOSE AND S KUO: "Connected and degraded text recognition using hidden Markov model", PROC. OF THE 11TH INT. CONF. ON PATTERN RECOGNITION, 30 August 1992 (1992-08-30), THE HAGUE, NL, pages 116 - 9
  • [A] T PAVLIDIS: "A vectorizer and feature extractor for document recognition", COMPUTER VISION, GRAPHICS AND IMAGE PROCESSING, vol. 35, 1986, pages 111 - 27, XP007907277, DOI: 10.1016/0734-189X(86)90128-3
  • [A] L D WILCOX AND M A BUSH: "Training and search algorithms for an interactive wordspotting system", ICASSP-92 IEEE INT. CONF. ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. 2, 23 March 1992 (1992-03-23), SAN FRANCISCO, CA, pages 97 - 100, XP000356946
  • [A] N F BRICKMAN: "Word autocorrelation redundancy match (WARM) technology", IBM JOURNAL OF RESEARCH AND DEVELOPMENT, vol. 26, no. 6, November 1982 (1982-11-01), NEW YORK NY, pages 681 - 6, XP001379913
  • [DA] T K HO, J J HULL AND S N SRIHARI: "A word shape analysis approach to recognition of degraded word images", U.S. POSTAL SERVICE ADVANCED TECHNOLOGY CONFERENCE, 5 November 1990 (1990-11-05), pages 217 - 231

Designated contracting state (EPC)

DE FR GB

DOCDB simple family (publication)

EP 0602955 A2 19940622; EP 0602955 A3 19950118; EP 0602955 B1 20001227; DE 69329789 D1 20010201; DE 69329789 T2 20010503

DOCDB simple family (application)

EP 93310134 A 19931215; DE 69329789 T 19931215