BACKGROUND OF THE INVENTION
(i) Field of the Invention
[0001] The present invention relates to an information processing apparatus, a program,
and an information processing method.
(ii) Description of Related Art
[0002] JP2000-105796A discloses a method of creating, based on a reading result of a document, an intermediate
file including a text code, error information indicating a text position at which
a reading error occurs, and image information of the entire document, detecting a
field to which a text with an error belongs based on the error information included
in the intermediate file, cutting out an image of the field from the image information
of the entire document, and displaying an error correction screen including a text
reading result to be corrected in the field and the image of the field.
SUMMARY OF THE INVENTION
[0003] An object of the present invention is to specify a position of an image corresponding
to a corrected text string.
[0004] According to a first aspect of the present disclosure, there is provided an information
processing apparatus including a processor configured to acquire a text recognition
result including a text string included in an image and position information of the
text string in the image, display the text string included in the text recognition
result, and specify, in a case where the displayed text string is corrected, position
information corresponding to the corrected text string, among pieces of the position
information associated with each text string included in the text recognition result.
[0005] According to a second aspect of the present disclosure, there is provided the information
processing apparatus according to the first aspect, in which the processor may be
configured to display a second text string corresponding to a first text string included
in the text recognition result, and specify, in a case where the second text string
is corrected, position information corresponding to the corrected second text string,
among pieces of the position information associated with each text string included
in the text recognition result.
[0006] According to a third aspect of the present disclosure, there is provided the information
processing apparatus according to the second aspect, in which the processor may be
configured to specify, in a case where the first text string is corrected, position
information corresponding to the corrected first text string from a group of pieces
of the position information associated with each text string included in the text
recognition result.
[0007] According to a fourth aspect of the present disclosure, there is provided the information
processing apparatus according to any one of the first to third aspects, in which
the processor may be configured to acquire image data representing an image, and display,
among images represented by the acquired image data, an image at a position indicated
by the specified position information.
[0008] According to a fifth aspect of the present disclosure, there is provided the information
processing apparatus according to the fourth aspect, in which the processor may be
configured to display, among the images represented by the acquired image data, an
image including the corrected text string.
[0009] According to a sixth aspect of the present disclosure, there is provided the information
processing apparatus according to any one of the first to fifth aspects, in which
the processor may be configured to specify, in a case where the text string is corrected
and a part of the corrected text string and each text string included in the text
recognition result match with each other, position information of the corrected text
string including the matched part.
[0010] According to a seventh aspect of the present disclosure, there is provided the information
processing apparatus according to any one of the first to sixth aspects, in which
the processor may be configured to acquire image data representing an image, display,
in a case where the text string is corrected and a plurality of pieces of position
information corresponding to the corrected text string are specified from a group
of pieces of the position information associated with each text string included in
the text recognition result, a plurality of images at positions indicated by the plurality
of pieces of position information, and display, as an image corresponding to the corrected
text string, an image selected from the plurality of images.
[0011] According to an eighth aspect of the present disclosure, there is provided the information
processing apparatus according to any one of the first to seventh aspects, in which
the processor may be configured to acquire image data representing an image, specify,
in a case where the text string is corrected and a plurality of pieces of position
information corresponding to the corrected text string are specified from a group
of pieces of the position information associated with each text string included in
the text recognition result, a priority for each of a plurality of images at positions
indicated by the plurality of pieces of position information, and display, as an image
corresponding to the corrected text string, an image that is selected according to
the specified priority from the plurality of images.
[0012] According to a ninth aspect of the present disclosure, there is provided the information
processing apparatus according to the eighth aspect, in which the processor may be
configured to use one of a plurality of rules for specifying the priority.
[0013] According to a tenth aspect of the present disclosure, there is provided the information
processing apparatus according to the ninth aspect, in which the processor may be
configured to use, among the plurality of rules, a rule according to the corrected
text string.
[0014] According to an eleventh aspect of the present disclosure, there is provided the
information processing apparatus according to the ninth aspect, in which the processor
may be configured to use, among the plurality of rules, a rule according to an attribute
of the image data.
[0015] According to a twelfth aspect of the present disclosure, there is provided the information
processing apparatus according to any one of the first to eleventh aspects, in which
the processor may be configured to display, in a case where the text string is corrected,
a screen for setting whether or not the image corresponding to the corrected text
string is to be recognized as a text.
[0016] According to a thirteenth aspect of the present disclosure, there is provided the
information processing apparatus according to the twelfth aspect, in which the processor
may be configured to display a screen for designating a position of the image to be
recognized as a text.
[0017] According to a fourteenth aspect of the present disclosure, there is provided a program
causing a computer to execute a process including acquiring a text recognition result
including a text string included in an image and a position of the text string in
the image, displaying the text string included in the text recognition result, and
specifying, in a case where the text string is corrected, position information corresponding
to the corrected text string from a group of pieces of the position information associated
with each text string included in the text recognition result.
[0018] According to a fifteenth aspect of the present disclosure, there is provided an information
processing method including acquiring a text recognition result including a text string
included in an image and position information of the text string in the image, displaying
the text string included in the text recognition result, and specifying, in a case
where the displayed text string is corrected, position information corresponding to
the corrected text string, among pieces of the position information associated with
each text string included in the text recognition result.
[0019] According to the first aspect, the fourteenth aspect, and the fifteenth aspect of
the present invention, it is possible to specify the position of the image corresponding
to the corrected text string.
[0020] According to the information processing apparatus of the second aspect of the present
invention, in a case where the second text string corresponding to the first text
string included in the text recognition result is corrected, it is possible to specify
the position of the image corresponding to the corrected second text string.
[0021] According to the information processing apparatus of the third aspect of the present
invention, in a case where the first text string included in the text recognition
result is corrected, it is possible to specify the position of the image corresponding
to the corrected first text string.
[0022] According to the information processing apparatus of the fourth aspect of the present
invention, it is possible to display an image at a position indicated by the specified
position information.
[0023] According to the information processing apparatus of the fifth aspect of the present
invention, it is possible to display an image including the corrected text string.
[0024] According to the information processing apparatus of the sixth aspect of the present
invention, it is possible to specify the position information of the corrected text
string including the matched part.
[0025] According to the information processing apparatus of the seventh aspect of the present
invention, in a case where a plurality of pieces of position information corresponding
to the corrected text string are specified, it is possible to display any one of a
plurality of images at positions indicated by the plurality of pieces of position
information. According to the information processing apparatus of the eighth aspect
of the present invention, in a case where a plurality of pieces of position information
corresponding to the corrected text string are specified, it is possible to display,
according to the priority, any one of a plurality of images at positions indicated
by the plurality of pieces of position information.
[0026] According to the information processing apparatus of the ninth aspect of the present
invention, it is possible to specify the priority according to any one of a plurality
of rules.
[0027] According to the information processing apparatus of the tenth aspect of the present
invention, it is possible to specify the priority according to a rule according to
the corrected text string, among the plurality of rules.
[0028] According to the information processing apparatus of the eleventh aspect of the present
invention, it is possible to specify the priority according to a rule according to
an attribute of the image data, among the plurality of rules.
[0029] According to the information processing apparatus of the twelfth aspect of the present
invention, in a case where the text string is corrected, it is possible to designate
whether or not the image corresponding to the corrected text string is to be recognized
as a text.
[0030] According to the information processing apparatus of the thirteenth aspect of the
present invention, it is possible to designate a position of the image to be recognized
as a text.
BRIEF DESCRIPTION OF THE DRAWINGS
[0031] Exemplary embodiment(s) of the present invention will be described in detail based
on the following figures, wherein:
Fig. 1 is a block diagram illustrating a configuration of an information processing
system according to an exemplary embodiment of the present invention;
Fig. 2 is a block diagram illustrating a hardware configuration of a document image
data management apparatus according to the present exemplary embodiment;
Fig. 3 is a block diagram illustrating a hardware configuration of a user terminal
according to the present exemplary embodiment;
Fig. 4 is a diagram illustrating a document;
Fig. 5 is a diagram illustrating a text recognition result stored in the document
image data management apparatus;
Fig. 6 is a diagram illustrating an extraction table stored in the document image
data management apparatus;
Fig. 7 is a diagram illustrating an extraction result stored in the document image
data management apparatus;
Fig. 8 is a flowchart illustrating an operation of the document image data management
apparatus;
Fig. 9 is a diagram illustrating a correction UI screen displayed on the user terminal;
Fig. 10 is a diagram illustrating a correction example of an extraction result stored
in the document image data management apparatus;
Fig. 11 is a diagram illustrating a correction UI screen displayed on the user terminal;
Fig. 12 is a diagram illustrating a correction example of an extraction result stored
in the document image data management apparatus; and
Fig. 13 is a diagram illustrating a correction UI screen displayed on the user terminal.
DETAILED DESCRIPTION OF THE INVENTION
[1] Configuration
[0032] Fig. 1 is a block diagram illustrating a configuration of an information processing
system 100 according to the present exemplary embodiment. The information processing
system 100 includes a document image data management apparatus 1 and a user terminal
2. Both the document image data management apparatus 1 and the user terminal 2 are
computer apparatuses, and are connected to each other by a communication line 3 including
a wireless communication line or a wired communication line. The document image data
management apparatus 1 is an example of an information processing apparatus according
to an exemplary embodiment of the invention.
[0033] Fig. 2 is a block diagram illustrating a hardware configuration of the document image data
management apparatus 1. A processor 11 is a processor that controls other components
of the document image data management apparatus 1. A memory 12 is a storage device
that functions as a work area which is used in a case where the processor 11 executes
a program, and includes, for example, a random access memory (RAM). A storage 13
is a storage device that stores various programs and data, and includes, for example,
a solid state drive (SSD) or a hard disk drive (HDD). The processor 11 executes a
program stored in the memory 12 or the storage 13, and thus functions of the document
image data management apparatus 1 are realized. A communication interface (IF) 14
performs communication with another apparatus via the communication line 3 according
to a predetermined wireless communication standard or a predetermined wired communication
standard.
[0034] Fig. 3 is a block diagram illustrating a hardware configuration of the user terminal 2.
A processor 21 is a processor that controls other components of the user terminal
2. A memory 22 is a storage device that functions as a work area which is used in
a case where the processor 21 executes a program, and includes, for example, a RAM.
A storage 23 is a storage device that stores various programs and data, and includes,
for example, an SSD or an HDD. The processor 21 executes a program stored in the memory
22 or the storage 23, and thus functions of the user terminal 2 are realized. A communication
IF 24 performs communication with another apparatus according to a predetermined wireless
communication standard or a predetermined wired communication standard. A user interface
(UI) unit 25 includes, for example, a display device such as a display, and an operation
device such as various keys, and displays a UI screen for a user and receives an operation
of a user.
[0035] In the information processing system 100, in a case where a scanner apparatus (not
illustrated) is caused to read a document by a user, document image data indicating
the read result is generated by the scanner apparatus, and the document image data
is stored in the document image data management apparatus 1. The user can browse various
document image data stored in the document image data management apparatus 1, or assign
any text string to the document image data as information called a tag or metadata,
by operating the user terminal 2.
[0036] Fig. 4 is a diagram illustrating a document according to the present exemplary embodiment.
This example illustrates a situation where a document corresponding to an invoice
includes text strings t1 to t7 corresponding to "INVOICE", "DATE", "NUMBER", and the
like.
[0037] The processor 11 of the document image data management apparatus 1 performs text
recognition processing such as optical character recognition (OCR) on document
image data indicating a document. Thereby, the processor 11 acquires a text recognition
result including a text string included in the document image data and a position
of the text string in the document image data. The text recognition result is stored
in the storage 13 of the document image data management apparatus 1. Fig. 5 is a diagram
illustrating a text recognition result stored in the storage 13 of the document image
data management apparatus 1. Fig. 5 illustrates a result obtained by performing text
recognition processing on the document illustrated in Fig. 4. As illustrated in Fig.
5, a group of text strings recognized by performing the text recognition processing
and a group of pieces of position information indicating a position of each text string
in the document are stored in association with each other. Each piece of position information
corresponding to each text string is represented by, for example, an XY coordinate
value (x, y) of any one vertex of a rectangle including the text string (for example,
a circumscribed rectangle circumscribing the text string), a length (width) of the
rectangle in an X-axis direction, and a length (height) of the rectangle in a Y-axis
direction, based on XY orthogonal coordinate axes which are set for the document (refer
to Fig. 4). For example, in Fig. 4, the position information of the text string t1
"INVOICE" is "p01" as illustrated in Fig. 5, and the position information of the text
string t2 "DATE" is "p02" as illustrated in Fig. 5. Further, the position information
of the text string t3 "ISSUE DATE" is "p03" as illustrated in Fig. 5, and the position
information of the text string t4 "10/01/2018" is "p04" as illustrated in Fig. 5.
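The association between text strings and position information described above can be sketched as follows. This is a minimal illustration only, not the data format of the exemplary embodiment: the class names, the function name, and all coordinate values are assumptions introduced here, and only the text strings and the labels "p01" to "p04" come from Fig. 5 as described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    x: float       # X coordinate of one vertex of the rectangle
    y: float       # Y coordinate of the same vertex
    width: float   # length of the rectangle in the X-axis direction
    height: float  # length of the rectangle in the Y-axis direction


@dataclass(frozen=True)
class RecognizedText:
    text: str          # recognized text string
    position_id: str   # label such as "p01", as used for Fig. 5
    position: Position


# Coordinate values below are hypothetical placeholders.
recognition_result = [
    RecognizedText("INVOICE", "p01", Position(200, 700, 160, 30)),
    RecognizedText("DATE", "p02", Position(40, 600, 60, 20)),
    RecognizedText("ISSUE DATE", "p03", Position(120, 600, 110, 20)),
    RecognizedText("10/01/2018", "p04", Position(250, 600, 110, 20)),
]


def position_ids_of(result, text):
    """Return the position labels of every entry whose text equals `text`."""
    return [entry.position_id for entry in result if entry.text == text]
```

With such a structure, looking up a text string yields the labels of the associated pieces of position information, and an empty list when the text string is not in the text recognition result.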
[0038] The processor 11 of the document image data management apparatus 1 extracts text
strings called a key and a value from the group of the recognized text strings,
and extracts image data corresponding to the text string called a value from the
document image data. Here, the key means an attribute of a text string that is predetermined
from the group of the text strings included in each document, such as a title of the
document, a date of the document, and a reference number of the document. On the other
hand, the value is the text string itself corresponding to the key in each document,
and the key and the value are paired concepts. For example, in the document corresponding
to the invoice, the value corresponding to the key "TITLE" is the text string "INVOICE",
the value corresponding to the key "DATE" is a text string of the form "MM/DD/YYYY" (M,
D, and Y are any digits), and the value corresponding to the key "NUMBER" is a
text string of the form "XXXXXXXXX" (X is any letter, symbol, or digit). The text string corresponding
to the key according to the present exemplary embodiment is an example of a first
text string according to an exemplary embodiment of the present invention, and the
text string corresponding to the value according to the present exemplary embodiment
is an example of a second text string according to an exemplary embodiment of the
present invention.
[0039] The processor 11 of the document image data management apparatus 1 stores an extraction
table in which rules for extracting the keys and the values from the document image
data are described. Fig. 6 is a diagram illustrating an extraction table stored in
the storage 13 of the document image data management apparatus 1. In the extraction
table, the group of the text strings serving as each key in the document and the pieces
of the position information of the text strings serving as the value corresponding
to each key are associated with each other. In the extraction table, for example,
it is defined that the value corresponding to the key "TITLE" is at a position "TOP"
in the document. In addition, it is defined that the value corresponding to the key
"DATE" is at a position "RIGHT SIDE OF key" in the document. Further, it is defined
that the value corresponding to the key "NUMBER" is at a position "RIGHT SIDE OF key"
in the document. In Fig. 6, the position information of the text string that serves
as the value corresponding to each key is represented as "TOP" or "RIGHT SIDE". In
practice, however, the position information is represented using, for example,
a coordinate value in an XY orthogonal coordinate system which is set for the document.
For example, the top position means the position information of the
text string having the largest Y coordinate value on the XY orthogonal coordinate
axes which are set for the document. Further, the position on the right side of the
key means, for example, the position information of the text string having the smallest
X coordinate value that is larger than the X coordinate value of the key on the XY orthogonal
coordinate axes which are set for the document.
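The two rules of the extraction table described above can be sketched as follows. This is a sketch under stated assumptions, not the implementation of the exemplary embodiment: the function name, the tuple layout, and the coordinate values are hypothetical, and restricting the "RIGHT SIDE OF key" rule to text strings with the same Y coordinate as the key is an added assumption that the description does not state.

```python
# Extraction table in the spirit of Fig. 6: key name -> position rule.
EXTRACTION_TABLE = {
    "TITLE": "TOP",
    "DATE": "RIGHT SIDE OF key",
    "NUMBER": "RIGHT SIDE OF key",
}

# (text, x, y) entries; all coordinate values are hypothetical.
ENTRIES = [
    ("INVOICE", 200, 700),
    ("DATE", 40, 600),
    ("ISSUE DATE", 120, 600),
    ("10/01/2018", 250, 600),
    ("NUMBER", 40, 560),
    ("INVOICE NUMBER", 120, 560),
    ("LI-K12554", 250, 560),
]


def extract_value(entries, key_name, rule):
    """Return the text string selected as the value for `key_name`, or None."""
    if rule == "TOP":
        # Text string having the largest Y coordinate value.
        return max(entries, key=lambda e: e[2])[0]
    if rule == "RIGHT SIDE OF key":
        key_entry = next((e for e in entries if e[0] == key_name), None)
        if key_entry is None:
            return None
        _, key_x, key_y = key_entry
        # Assumption: only text strings on the same row as the key are candidates.
        candidates = [e for e in entries if e[2] == key_y and e[1] > key_x]
        if not candidates:
            return None
        # Smallest X coordinate value that is larger than that of the key.
        return min(candidates, key=lambda e: e[1])[0]
    raise ValueError(f"unknown rule: {rule}")


extraction_result = {
    key_name: extract_value(ENTRIES, key_name, rule)
    for key_name, rule in EXTRACTION_TABLE.items()
}
```

With the hypothetical layout above, the rule for the key "DATE" selects "ISSUE DATE" rather than "10/01/2018", because "ISSUE DATE" is the nearest text string to the right of the key.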
[0040] The processor 11 of the document image data management apparatus 1 extracts text
strings called a key and a value from the group of the recognized text strings
according to the extraction table, and extracts image data corresponding to the text
string called a value from the document image data. Fig. 7 is a diagram illustrating
an extraction result stored in the storage 13 of the document image data management
apparatus 1. Fig. 7 illustrates an extraction result, from the document illustrated
in Fig. 4, according to the extraction table illustrated in Fig. 6. As illustrated
in Fig. 7, the text string as the value "INVOICE" corresponding to the key "TITLE"
is extracted, and the position information "p01" of the image data corresponding to
the value "INVOICE" is extracted. In addition, the text string as the value "ISSUE
DATE" corresponding to the key "DATE" is extracted, and the position information "p03"
of the image data corresponding to the value "ISSUE DATE" is extracted. Further, the
text string as the value "INVOICE NUMBER" corresponding to the key "NUMBER" is extracted,
and the position information "p06" of the image data corresponding to the value "INVOICE
NUMBER" is extracted. Here, extraction of the position information of the image data
corresponding to the value corresponds to extraction of the image data.
[0041] In Fig. 7, the text string "ISSUE DATE" is extracted as the value corresponding to
the key "DATE". However, this text string is only an English translation
of the word "DATE", and the correct value is the text string of the form "MM/DD/YYYY"
(M, D, and Y are any digits) corresponding to the key "DATE", that is, "10/01/2018".
Similarly, the text string "INVOICE NUMBER" is extracted as the value corresponding
to the key "NUMBER". However, this text string is only an English translation
of the word "NUMBER", and the correct value is the text string "LI-K12554". Such an error occurs,
for example, because there may be cases where layouts are different in various documents,
such as a case where the value corresponding to the key "DATE" is on the right side
of the key, or as illustrated in the example of Fig. 4, a case where the value corresponding
to the key "DATE" is on the right side of the English translation of the key.
[0042] In such a case, the user may correct the extraction result by operating the user
terminal 2. An operation related to the correction will be described.
[2] Operation
[0043] An operation of the document image data management apparatus 1 will be described
with reference to a flowchart illustrated in Fig. 8. In Fig. 8, the processor 11 of
the document image data management apparatus 1 causes the user terminal 2 to display
a correction UI screen for allowing the user to correct the extraction result (step
S0). Fig. 9 is a diagram illustrating a correction UI screen. A correction UI screen
G1 illustrated in Fig. 9 is, for example, a UI screen according to the contents of
Fig. 7. On the correction UI screen, a text string corresponding to the key included
in the text recognition result, a text string corresponding to the value included
in the text recognition result, and an image which corresponds to the text string
corresponding to the value and is included in the image represented by the document
image data are displayed. Further, a correction UI screen G2 illustrated in Fig. 9
is a UI screen on which the entire document image represented by the document image
data illustrated in Fig. 4 is displayed. The correction UI screens G1 and G2 are displayed
side by side on one screen, for example, such that the screens are browsed at the
same time by the user.
[0044] Here, as described above, the text string "ISSUE DATE" is displayed as the value
corresponding to the key "DATE", and the image corresponding to the value is displayed.
However, the correct value is "10/01/2018". For this reason, the user
performs an operation of correcting "ISSUE DATE" displayed as the value corresponding
to the key "DATE" to "10/01/2018". The correction operation may be, for example, an
operation in which the user directly inputs the text string "10/01/2018" as the value
corresponding to the key "DATE" on the correction UI screen G1, or may be an operation
in which the user designates the text string "10/01/2018" displayed on the correction
UI screen G2 as the value corresponding to the key "DATE".
[0045] In a case where it is determined that the value is corrected (YES in step S1), the
processor 11 of the document image data management apparatus 1 searches for a text
string corresponding to "10/01/2018" as the corrected value from the text recognition
result illustrated in Fig. 5, and determines the number of the text strings corresponding
to the corrected value (step S2). Here, in a case where the text recognition result
does not include a text string corresponding to the corrected value (NONE in step
S2), the processor 11 of the document image data management apparatus 1 causes the
user terminal 2 to display a predetermined error screen, and ends processing illustrated
in Fig. 8.
[0046] In a case where the text recognition result includes one text string corresponding
to the corrected value (ONE in step S2), the processor 11 of the document image data
management apparatus 1 specifies the position information of the image corresponding
to the text string, based on the text recognition result illustrated in Fig. 5 (step
S3). Here, as illustrated in Fig. 5, the position information "p04" corresponding
to the text string "10/01/2018" is specified.
[0047] The processor 11 of the document image data management apparatus 1 rewrites the text
string "ISSUE DATE", which is the value before correction that corresponds to the key "DATE"
in the data illustrated in Fig. 7, into the text string "10/01/2018"
as the corrected value, and rewrites the corresponding position information "p03"
into the specified position information "p04" (step S4). Thereby, the content of the
extraction result illustrated in Fig. 7 can be rewritten into an extraction result
as illustrated in Fig. 10. Therefore, as illustrated in Fig. 11, on the correction
UI screen G1, the value "10/01/2018" corresponding to the key "DATE" is displayed,
and the image corresponding to the position information "p04" (in the document image,
the image corresponding to "10/01/2018") is displayed.
[0048] In the same procedure, in a case where the user corrects "INVOICE NUMBER" displayed
as the value corresponding to the key "NUMBER", to "LI-K12554", as illustrated in
Fig. 12, the processor 11 of the document image data management apparatus 1 rewrites
the text string "INVOICE NUMBER", which is the value before correction, corresponds
to the key "NUMBER", and is included in the data illustrated in Fig. 7, into the text
string "LI-K12554" as the corrected value, and rewrites the corresponding position
information "p06" into the position information "p07". Thereby, the correction UI
screen G1 as illustrated in Fig. 13 is displayed on the user terminal 2.
[0049] Further, in a case where the text recognition result includes a plurality of text
strings corresponding to the corrected value (PLURALITY in step S2), the processor
11 of the document image data management apparatus 1 selects the text string having
the highest priority based on priorities in the text recognition result illustrated
in Fig. 5 (step S5).
[0050] Specifically, the processor 11 of the document image data management apparatus 1
causes the user terminal 2 to display, on the correction UI screen G2, a plurality
of images at positions indicated by pieces of the position information of the plurality
of text strings corresponding to the values according to the example of Fig. 6, and
in a case where the user selects an image from the plurality of images by operating
the user terminal 2, causes the user terminal 2 to display, as the image corresponding
to the corrected text string, the image selected by the user. After correction, as
described above, the extraction result of the text string is rewritten, and the correction
UI screen G1 according to the rewritten extraction result is displayed.
[0051] According to the above-described present exemplary embodiment, it is possible to
specify the position of the image corresponding to the corrected text string, from
the group of the text strings included in the document. Further, according to the
present exemplary embodiment, it is possible to display an image at a specified position.
[3] Modification Example
[0052] The above-described exemplary embodiment is merely an example of implementation of
the present invention, and may be modified as follows. Further, the above-described
exemplary embodiment and each of the following modification examples may be implemented
by being combined with each other as appropriate.
[0053] (1) In the above-described exemplary embodiment, the processor 11 of the document
image data management apparatus 1 causes the user terminal 2 to display the text string
(second text string) corresponding to the value, which corresponds to the text string
(first text string) corresponding to the key detected from the text recognition result,
and in a case where the text string (second text string) corresponding to the value
is corrected, specifies the position information corresponding to the text string
(second text string), which corresponds to the corrected value, from the group of
pieces of position information associated with each text string included in the text
recognition result. On the other hand, the text string (first text string) corresponding
to the key may be corrected by the user. In this case, in a case where the first text
string is corrected, the processor 11 may specify the position information corresponding
to the corrected first text string, from the group of pieces of position information
associated with each text string included in the text recognition result, and cause
the user terminal 2 to display the image at the specified position.
[0054] (2) In the above-described exemplary embodiment, in a case where the text string
corresponding to the value is corrected and the corrected text string matches a text
string included in the text recognition result, the processor 11 of the document image
data management apparatus 1 specifies the position information of the matched text
string, and causes the user terminal 2 to display the image at the specified position.
The match does not need to be exact. In a case where the text string corresponding
to the value is corrected and a part of the corrected text string matches a text string
included in the text recognition result, the processor 11 may specify the position
information of the corrected text string including the matched part, and cause the
user terminal 2 to display the image at the specified position. That is, the corrected
text string and the text string included in the text recognition result may partially
match each other.
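The difference between the exact match of the exemplary embodiment and the partial match of this modification example can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the `RecognizedText` structure, the `(x, y, width, height)` position layout, the `find_positions` function, and the reading of "partially match" as substring containment in either direction are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Position = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


@dataclass(frozen=True)
class RecognizedText:
    """One entry of a text recognition result (hypothetical structure)."""
    text: str
    position: Position


def find_positions(corrected: str,
                   result: List[RecognizedText],
                   allow_partial: bool = False) -> List[Position]:
    """Return the positions of recognized strings that match the corrected string.

    With allow_partial=False only exact matches are returned.
    With allow_partial=True a recognized string also matches when it contains
    the corrected string or is contained in it (an assumed interpretation).
    """
    if not corrected:
        # An empty string is a substring of everything, so treat it as no match.
        return []
    positions = []
    for entry in result:
        exact = entry.text == corrected
        partial = allow_partial and bool(entry.text) and (
            entry.text in corrected or corrected in entry.text
        )
        if exact or partial:
            positions.append(entry.position)
    return positions


# Usage with made-up recognition data.
result = [
    RecognizedText("ABC Corporation", (10, 20, 200, 30)),
    RecognizedText("Invoice", (10, 60, 80, 20)),
]
exact_hits = find_positions("ABC", result)
partial_hits = find_positions("ABC", result, allow_partial=True)
```

In this sketch, more than one position may be returned, which corresponds to the case handled by the priority described in modification example (3).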
[0055] (3) In the above-described exemplary embodiment, in a case where the text string
corresponding to the value is corrected and a plurality of pieces of position information
corresponding to the corrected text string are specified from the group of pieces
of position information associated with each text string included in the text recognition
result, the processor 11 of the document image data management apparatus 1 treats
the position information of the image selected by the user as the position information
having the highest priority. However, the method of determining the priority is not
limited to the example of the exemplary embodiment.
[0056] Further, the processor 11 may store a plurality of rules for specifying the priority
in the storage 13, and use any one of the plurality of rules. For example, the processor
11 may use, among the plurality of rules, a rule according to the corrected text string.
For example, in a case where the corrected text string is the text string corresponding
to the value which corresponds to the key "TITLE", the processor 11 may set the priority
of the text string having the largest size or the text string having a specific font
to be higher.
[0057] Further, the processor 11 may use, among the plurality of rules, a rule according
to an attribute of the document image data. Consider a case where metadata indicating
a type (attribute) is assigned to the document image data. For example,
in a case where certain metadata A is assigned to the document image data and the
corrected text string is the text string corresponding to the value which corresponds
to the key "TITLE", the processor 11 may set the priority of the text string having
the largest size to be higher. Further, for example, in a case where certain metadata
B is assigned to the document image data and the corrected text string is the text
string corresponding to the value which corresponds to the key "TITLE", the processor
11 may set the priority of the text string having a specific font to be higher.
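The rule selection described above can be sketched as a lookup table keyed by the metadata and the key of the corrected text string. This is a minimal sketch under stated assumptions: the `Candidate` structure, the rule functions, the `RULES` table, the preferred font name "Gothic", and the fallback to document order are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

PREFERRED_FONT = "Gothic"  # hypothetical "specific font"


@dataclass(frozen=True)
class Candidate:
    """A recognized text string that matches the corrected text string."""
    text: str
    position: Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)
    font_size: float
    font_name: str


def largest_size(c: Candidate) -> float:
    # Larger text gets a higher priority.
    return c.font_size


def specific_font(c: Candidate) -> float:
    # Text in the preferred font gets a higher priority.
    return 1.0 if c.font_name == PREFERRED_FONT else 0.0


def document_order(c: Candidate) -> float:
    # Constant score: max() then returns the first candidate in the list.
    return 0.0


# (metadata, key) -> rule. None as metadata means "no metadata assigned".
RULES: Dict[Tuple[Optional[str], str], Callable[[Candidate], float]] = {
    ("A", "TITLE"): largest_size,
    ("B", "TITLE"): specific_font,
    (None, "TITLE"): largest_size,
}


def select_candidate(candidates: List[Candidate], key: str,
                     metadata: Optional[str] = None) -> Candidate:
    """Pick the highest-priority candidate; candidates must be non-empty."""
    rule = (RULES.get((metadata, key))
            or RULES.get((None, key))
            or document_order)
    return max(candidates, key=rule)


# Usage with made-up candidates for the corrected value of the key "TITLE".
cands = [
    Candidate("Report", (0, 0, 10, 10), 10.0, "Mincho"),
    Candidate("Report", (0, 50, 40, 20), 24.0, "Mincho"),
    Candidate("Report", (0, 90, 10, 10), 12.0, "Gothic"),
]
chosen_for_a = select_candidate(cands, "TITLE", "A")
chosen_for_b = select_candidate(cands, "TITLE", "B")
```

Keeping the rules in a table means a rule can be added for another key or metadata value without changing the selection logic.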
[0058] (4) In a case where the text string is corrected, the processor 11 of the document
image data management apparatus 1 may cause the user terminal 2 to display a UI screen
for setting whether or not the image corresponding to the corrected text string is
to be recognized as a text. For example, in a case where the text string is corrected,
the processor 11 may cause the user terminal 2 to display a screen for designating
the position of the image to be recognized as a text. More specifically, the processor
11 causes the user terminal 2 to display a screen asking the user whether to rewrite
the position information of the text string as the value corresponding to each key,
which is illustrated in Fig. 6, into the position information indicating the position
of the image which corresponds to the corrected text string and is included in the
document. In a case where a response for rewriting is input from the user, the processor
11 rewrites the position information of the text string into the position information
indicating the position of that image. Thereby, the position information of the text
string, such as "TOP" or "RIGHT SIDE" illustrated in Fig. 6, is rewritten. Therefore,
the user does not need to correct the text string after the rewriting.
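The confirm-then-rewrite flow described above can be sketched as follows. This is an illustrative sketch only: the `fields` dictionary standing in for the per-key settings of Fig. 6, the `maybe_rewrite_position` function, and the `confirm` callback standing in for the inquiry screen on the user terminal 2 are hypothetical names introduced for the example.

```python
from typing import Callable, Dict


def maybe_rewrite_position(fields: Dict[str, Dict[str, str]],
                           key: str,
                           corrected_text: str,
                           image_position: str,
                           confirm: Callable[[str], bool]) -> Dict[str, Dict[str, str]]:
    """Apply a correction and, if the user agrees, rewrite the stored position.

    fields maps each key to {"text": ..., "position": ...}, where "position"
    is a label such as "TOP" or "RIGHT SIDE" (assumed representation).
    The input is not modified; an updated copy is returned.
    """
    updated = {k: dict(v) for k, v in fields.items()}
    updated[key]["text"] = corrected_text
    question = (
        f"Rewrite the position for {key!r} from "
        f"{fields[key]['position']!r} to {image_position!r}?"
    )
    # Rewrite only when a response for rewriting is input by the user.
    if confirm(question):
        updated[key]["position"] = image_position
    return updated


# Usage with made-up data; the lambdas simulate the user's response.
fields = {"TITLE": {"text": "Reprot", "position": "TOP"}}
accepted = maybe_rewrite_position(fields, "TITLE", "Report", "RIGHT SIDE",
                                  lambda question: True)
declined = maybe_rewrite_position(fields, "TITLE", "Report", "RIGHT SIDE",
                                  lambda question: False)
```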
[0059] (5) In the above-described exemplary embodiment, the program executed by the processor
11 of the document image data management apparatus 1 or the program executed by the
processor 21 of the user terminal 2 may be downloaded via a communication line such
as the Internet. Further, the program may be provided by being recorded on a computer-readable
recording medium such as a magnetic recording medium (a magnetic tape, a magnetic
disk, or the like), an optical recording medium (an optical disk or the like), a magneto-optical
recording medium, or a semiconductor memory.
[0060] In the embodiments above, the term "processor" refers to hardware in a broad sense.
Examples of the processor include general processors (e.g., CPU: Central Processing
Unit) and dedicated processors (e.g., GPU: Graphics Processing Unit, ASIC: Application
Specific Integrated Circuit, FPGA: Field Programmable Gate Array, and programmable
logic device). The term "processor" also encompasses both a single processor and plural
processors which are located physically apart from each other but may work cooperatively.
The order of operations of the processor is not limited to the order described in
the embodiments above, and may be changed.
[0061] The foregoing description of the exemplary embodiments of the present invention has
been provided for the purposes of illustration and description. It is not intended
to be exhaustive or to limit the invention to the precise forms disclosed. Obviously,
many modifications and variations will be apparent to practitioners skilled in the
art. The embodiments were chosen and described in order to best explain the principles
of the invention and its practical applications, thereby enabling others skilled in
the art to understand the invention for various embodiments and with the various modifications
as are suited to the particular use contemplated. It is intended that the scope of
the invention be defined by the following claims and their equivalents.
Brief Description of the Reference Symbols
[0062]
- 1: document image data management apparatus
- 11: processor
- 12: memory
- 13: storage
- 14: communication IF
- 2: user terminal
- 21: processor
- 22: memory
- 23: storage
- 24: communication IF
- 25: UI unit
- 100: information processing system
1. An information processing apparatus comprising:
a processor configured to:
acquire a text recognition result including a text string included in an image and
position information of the text string in the image;
display the text string included in the text recognition result; and
specify, in a case where the displayed text string is corrected, position information
corresponding to the corrected text string, among pieces of the position information
associated with each text string included in the text recognition result.
2. The information processing apparatus according to claim 1, wherein the processor is
configured to:
display a second text string corresponding to a first text string included in the
text recognition result, and
specify, in a case where the second text string is corrected, position information
corresponding to the corrected second text string, among pieces of the position information
associated with each text string included in the text recognition result.
3. The information processing apparatus according to claim 2, wherein the processor is
configured to:
specify, in a case where the first text string is corrected, position information
corresponding to the corrected first text string from a group of pieces of the position
information associated with each text string included in the text recognition result.
4. The information processing apparatus according to any one of claims 1 to 3, wherein
the processor is configured to:
acquire image data representing an image, and
display, among images represented by the acquired image data, an image at a position
indicated by the specified position information.
5. The information processing apparatus according to claim 4, wherein the processor is
configured to:
display, among the images represented by the acquired image data, an image including
the corrected text string.
6. The information processing apparatus according to any one of claims 1 to 5, wherein
the processor is configured to:
specify, in a case where the text string is corrected and a part of the corrected
text string and each text string included in the text recognition result match with
each other, position information of the corrected text string including the matched
part.
7. The information processing apparatus according to any one of claims 1 to 6, wherein
the processor is configured to:
acquire image data representing an image, display, in a case where the text string
is corrected and a plurality of pieces of position information corresponding to the
corrected text string are specified from a group of pieces of the position information
associated with each text string included in the text recognition result, a plurality
of images at positions indicated by the plurality of pieces of position information,
and
display, as an image corresponding to the corrected text string, an image selected
from the plurality of images.
8. The information processing apparatus according to any one of claims 1 to 7, wherein
the processor is configured to:
acquire image data representing an image,
specify, in a case where the text string is corrected and a plurality of pieces of
position information corresponding to the corrected text string are specified from
a group of pieces of the position information associated with each text string included
in the text recognition result, a priority for each of a plurality of images at positions
indicated by the plurality of pieces of position information, and
display, as an image corresponding to the corrected text string, an image that is
selected according to the specified priority from the plurality of images.
9. The information processing apparatus according to claim 8, wherein the processor is
configured to:
use one of a plurality of rules for specifying the priority.
10. The information processing apparatus according to claim 9, wherein the processor is
configured to:
use, among the plurality of rules, a rule according to the corrected text string.
11. The information processing apparatus according to claim 9, wherein the processor is
configured to:
use, among the plurality of rules, a rule according to an attribute of the image data.
12. The information processing apparatus according to any one of claims 1 to 11, wherein
the processor is configured to:
display, in a case where the text string is corrected, a screen for setting whether
or not the image corresponding to the corrected text string is to be recognized as
a text.
13. The information processing apparatus according to claim 12, wherein the processor
is configured to:
display a screen for designating a position of the image to be recognized as a text.
14. A program causing a computer to execute a process comprising:
acquiring a text recognition result including a text string included in an image and
position information of the text string in the image;
displaying the text string included in the text recognition result; and
specifying, in a case where the text string is corrected, position information corresponding
to the corrected text string from a group of pieces of the position information associated
with each text string included in the text recognition result.
15. An information processing method comprising:
acquiring a text recognition result including a text string included in an image and
position information of the text string in the image;
displaying the text string included in the text recognition result; and
specifying, in a case where the displayed text string is corrected, position information
corresponding to the corrected text string, among pieces of the position information
associated with each text string included in the text recognition result.