BACKGROUND OF THE INVENTION
Field of the Invention
[0001] The present disclosure relates to an information processing device and an information
processing method.
Description of the Background Art
[0002] Conventionally, there have been attempts to improve work processes at a work site such as a factory by using a moving image captured by a camera installed in the work site. For example,
Japanese Patent Laying-Open No. 2020-204819 discloses an information processing device that analyzes a moving image obtained
by image capture with a ceiling camera. The information processing device analyzes
the moving image to determine whether or not there is a worker in a monitoring area
associated with each process, and generates data indicating a time zone determined
to have a worker in the monitoring area.
SUMMARY OF THE INVENTION
[0003] In the technique disclosed in
Japanese Patent Laying-Open No. 2020-204819, whether or not there is a worker in a monitoring area associated with each process
is monitored, but a detailed situation of the worker cannot be monitored.
[0004] The present disclosure has been made in view of the above problems, and an object
thereof is to provide an information processing device and an information processing
method with which it is possible to recognize a detailed situation of a worker.
[0005] According to an example of the present disclosure, an information processing device
includes an acquisition unit, an operation section detector, and a provision unit.
The acquisition unit acquires a moving image from a camera that is installed at a
production site and that images a target worker and surroundings of the target worker.
The operation section detector detects, from a predetermined number of consecutive
first frames included in the moving image, an operation section of work performed
by the target worker included in the predetermined number of first frames using an
inference model. The provision unit provides a detection result by the operation section
detector. The inference model is generated by learning processing using a plurality
of learning data sets. Each of the plurality of learning data sets includes a predetermined
number of consecutive second frames included in a moving image that includes a specific
worker, and a label indicating an operation section of work performed by the specific
worker included in the predetermined number of second frames.
[0006] According to an example of the present disclosure, an information processing device
includes: an acquisition unit configured to acquire a moving image from a camera that
images a face of a worker, the camera being installed at a production site; an emotion
detector configured to detect an emotion of the worker included in each frame of the
moving image; and a provision unit configured to provide a transition of the emotion
detected by the emotion detector.
[0007] In the above disclosure, the emotion detector preferably outputs a score of each
of a plurality of types of emotions. Furthermore, the provision unit preferably provides
a notification for promoting care for the worker in response to the score of a target
type out of the plurality of types of emotions falling outside a prescribed range.
[0008] According to one example of the present disclosure, an information processing device
includes: an acquisition unit configured to acquire a moving image from a camera that
images a face of a worker, the camera being installed at a production site; a line-of-sight
detector configured to detect a line-of-sight direction of the worker included in
each frame of the moving image; and a provision unit configured to provide an image
including an object in front of the worker. The provision unit determines a position
of a viewpoint of the worker in the image on the basis of the line-of-sight direction
detected by the line-of-sight detector, and displays a mark at the determined position
in the image.
[0009] According to one example of the present disclosure, an information processing method
includes: acquiring a moving image from a camera that is installed at a production
site and that images a target worker and surroundings of the target worker; detecting,
from a predetermined number of consecutive first frames included in the moving image,
an operation section of work performed by the target worker included in the predetermined
number of first frames using an inference model; and providing a detection result.
The inference model is generated by learning processing using a plurality of learning
data sets, and each of the plurality of learning data sets includes a predetermined
number of consecutive second frames included in a moving image that includes a specific
worker, and a label indicating an operation section of work performed by the specific
worker included in the predetermined number of second frames.
[0010] According to one example of the present disclosure, an information processing method
includes: acquiring a moving image from a camera that is installed at a production
site and that images a face of a worker; detecting an emotion of the worker included
in each frame of the moving image; and providing a transition of the emotion detected.
[0011] In the above disclosure, the detecting preferably includes outputting a score of
each of a plurality of types of emotions. The providing preferably includes providing
a notification for promoting care for the worker in response to a score of a target
type out of the plurality of types falling outside a prescribed range.
[0012] According to one example of the present disclosure, an information processing method
includes: acquiring a moving image from a camera that is installed at a production
site and that images a face of a worker; detecting a line-of-sight direction of the
worker included in each frame of the moving image; and providing an image including
an object in front of the worker. The providing includes determining a position of
a viewpoint of the worker in the image on the basis of the line-of-sight direction
that has been detected, and displaying a mark at the determined position in the image.
[0013] According to these disclosures, a user can recognize a detailed situation (operation
section of work, line-of-sight direction, and emotion) of the worker.
[0014] The foregoing and other objects, features, aspects and advantages of the present
invention will become more apparent from the following detailed description of the
present invention when taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015]
Fig. 1 is a diagram illustrating an overall configuration of a system to which an
information processing device according to an embodiment is applied.
Fig. 2 is a schematic diagram illustrating a hardware configuration example of the
information processing device according to the embodiment.
Fig. 3 is a diagram illustrating an example of functional configuration of the information
processing device according to the embodiment.
Fig. 4 is a diagram illustrating an example of an inference model.
Fig. 5 is a diagram illustrating three frames respectively corresponding to three
operation sections corresponding to a "soldering" process, and a frame not belonging
to any operation section, in which Fig. 5(a) illustrates a frame of a first section,
Fig. 5(b) illustrates a frame of a second section, Fig. 5(c) illustrates a frame of
a third section, and Fig. 5(d) illustrates a frame not belonging to any of the first
to third sections.
Fig. 6 is a diagram illustrating a verification result of an estimated operation section.
Fig. 7 is a diagram illustrating one example of a provision screen.
Fig. 8 is a diagram illustrating another example of the provision screen.
Fig. 9 is a diagram illustrating still another example of the provision screen.
Fig. 10 is a diagram illustrating a relationship between worker's emotions and production
indexes.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0016] An embodiment of the present invention will be described in detail with reference
to the drawings. The same or corresponding parts in the drawings are denoted by the
same reference signs, and the description thereof will not be repeated. Modifications
described below may be appropriately and selectively combined.
[0017] Fig. 1 is a diagram illustrating an overall configuration of a system to which an
information processing device according to the present embodiment is applied. As illustrated
in Fig. 1, a system 1 includes a production line 2, an information processing device
10, a programmable logic controller (PLC) 20, and cameras 30 and 40.
[0018] Production line 2 includes multiple processes 3_1 to 3_n and produces various products.
Multiple processes 3_1 to 3_n include, for example, a "soldering" process, a "board
assembly" process, a process of "incorporating board into body", an "inspection" process,
and the like. Various devices can be installed in each process of the production line.
Examples of the devices include a robot, a machining device, an inspection device,
various sensors, and the like.
[0019] PLC 20 is a controller that controls entire production line 2, and is communicably
connected to devices installed in production line 2. Various types of industrial Ethernet
(registered trademark) are used as a network that communicably connects PLC 20 and
the devices. Known examples of industrial Ethernet (registered trademark) include EtherCAT (registered trademark), PROFINET IRT, MECHATROLINK (registered trademark)-III, Powerlink, SERCOS (registered trademark)-III, and CIP Motion, and any of these protocols may be adopted. Further, a field network other than industrial Ethernet (registered trademark) may be used. For example, in a case where motion control is not performed, DeviceNet, CompoNet/IP (registered trademark), or the like may be used.
[0020] PLC 20 operates as a master in a master-slave control system, and acquires information
from the devices as input data. PLC 20 executes arithmetic processing using the acquired
input data in accordance with a user program incorporated in advance. PLC 20 determines
a control content for the master-slave control system in response to the execution
of the arithmetic processing, and outputs control data corresponding to the control
content to the devices. PLC 20 repeatedly acquires input data from the devices and
outputs control data to the devices at a predetermined cycle (control cycle).
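The control cycle described above can be pictured as a simple loop, as in the following Python sketch. It is an illustration only: the Device stub, the 1 ms cycle, and the pass-through user program are assumptions and are not part of the present embodiment; an actual PLC executes this cycle in its runtime rather than in application code.

```python
import time

class Device:
    """Stub for a slave device on the field network (hypothetical)."""
    def read(self):
        return 0.0
    def write(self, command):
        pass

CONTROL_CYCLE_S = 0.001  # assumed 1 ms control cycle; the actual cycle is system-dependent

def user_program(inputs):
    # Placeholder for the user program incorporated in advance; here the commands simply echo the inputs.
    return list(inputs)

def control_loop(devices, cycles=10):
    for _ in range(cycles):
        start = time.monotonic()
        inputs = [d.read() for d in devices]      # acquire input data from the slave devices
        commands = user_program(inputs)           # arithmetic processing -> control content
        for d, c in zip(devices, commands):
            d.write(c)                            # output control data to the devices
        # Wait out the remainder of the fixed control cycle.
        time.sleep(max(0.0, CONTROL_CYCLE_S - (time.monotonic() - start)))

control_loop([Device(), Device()])
```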
[0021] Cameras 30 and 40 are installed so as to capture an image of a worker who works in
production line 2. In the example shown in Fig. 1, cameras 30 and 40 are installed
so as to capture an image of a worker 4 in process 3_3. Specifically, camera 30 is installed at a position where the face of worker 4 can be imaged from the front. Camera 40 is installed at a position where worker 4 and a work table of process 3_3 can be imaged. Cameras 30 and 40 output moving image data (hereinafter simply referred to as a "moving image") obtained by image capture to information processing device 10. Note that cameras 30 and 40 may also be installed in a process other than process 3_3.
[0022] Information processing device 10 is, for example, a general-purpose computer, and
analyzes a detailed situation of worker 4 working in process 3_3 on the basis of the
moving images acquired from cameras 30 and 40. Note that information processing device
10 may use the input data acquired by PLC 20 and the control data output from PLC
20 when analyzing the situation of worker 4.
<Hardware configuration of information processing device>
[0023] Fig. 2 is a schematic diagram illustrating a hardware configuration example of the
information processing device according to the embodiment. Information processing
device 10 typically has a structure according to a general-purpose computer architecture
as illustrated in Fig. 2. Specifically, information processing device 10 includes
a processor 11 such as a central processing unit (CPU) or a micro-processing unit
(MPU), a memory 12, a storage 13, a display controller 14, an input interface 15,
a camera interface 16, and a communication interface 17. These components are connected
to each other via a bus so as to be able to perform data communication.
[0024] Processor 11 implements various kinds of processing according to the present embodiment
by expanding various programs stored in storage 13 in memory 12 and executing the
programs.
[0025] Memory 12 is typically a volatile storage device such as a DRAM, and stores a program
read from storage 13 and the like.
[0026] Storage 13 is typically a non-volatile magnetic storage device such as a hard disk
drive. Storage 13 stores a model generation program 131, an operation section detection
program 134, an emotion detection program 135, a line-of-sight detection program 136,
and a provision program 137 which are to be executed by processor 11. Storage 13 also
stores a plurality of learning data sets 132 used for execution of model generation
program 131 and an inference model 133 generated by execution of model generation
program 131. Various programs installed in storage 13 are distributed in a state of
being stored in a memory card or the like.
[0027] Display controller 14 is connected to display device 70, and outputs a signal for
displaying various types of information to display device 70 in accordance with an
internal command from processor 11.
[0028] Input interface 15 mediates data transmission between processor 11 and an input device
75 such as a keyboard, a mouse, a touch panel, or a dedicated console. That is, input
interface 15 receives an operation command given by a user operating input device
75.
[0029] Camera interface 16 mediates data transmission between processor 11 and cameras 30
and 40. More specifically, an imaging instruction is output from processor 11 to cameras
30 and 40 via camera interface 16. Camera interface 16 outputs the moving image received
from cameras 30 and 40 to processor 11 in response to the imaging instruction. Camera
interface 16 operates as an acquisition unit that acquires a moving image from cameras
30 and 40.
[0030] Communication interface 17 mediates data transmission between processor 11 and an
external device (for example, PLC 20). Communication interface 17 typically includes
Ethernet (registered trademark), a universal serial bus (USB), and the like. Note
that various programs stored in storage 13 may be downloaded from a distribution server
or the like via communication interface 17.
[0031] When a computer having a structure following the general-purpose computer architecture as described above is used, an operating system (OS) for providing basic functions of the computer may be installed in addition to an application for providing the functions according to the present embodiment. In this case, the program according to the present embodiment may execute processing by calling necessary modules, among the program modules provided as a part of the OS, in a predetermined order and at predetermined timing. That is, the program according to the present embodiment need not itself include such modules, and may execute processing in cooperation with the OS.
[0032] Alternatively, some or all of the functions provided by executing model generation
program 131, operation section detection program 134, emotion detection program 135,
line-of-sight detection program 136, and provision program 137 may be implemented
as a dedicated hardware circuit.
<Functional configuration of information processing device>
[0033] Fig. 3 is a diagram illustrating an example of functional configuration of the information
processing device according to the embodiment. As illustrated in Fig. 3, information
processing device 10 includes a storage unit 101, a model generator 102, an operation
section detector 103, an emotion detector 104, a line-of-sight detector 105, and a
provision unit 106. Storage unit 101 is implemented by memory 12 and storage 13. Model
generator 102 is implemented by processor 11 executing model generation program 131.
Operation section detector 103 is implemented by processor 11 executing operation
section detection program 134. Emotion detector 104 is implemented by processor 11
executing emotion detection program 135. Line-of-sight detector 105 is implemented
by processor 11 executing line-of-sight detection program 136. Provision unit 106
is implemented by display controller 14, input interface 15, and processor 11 that
executes provision program 137.
(Configuration related to function of detecting operation section)
[0034] The work of each process includes multiple operation sections. For example, the "soldering"
process includes an operation section in which the board is carried in from a previous
process and is attached to a jig, an operation section in which a component is soldered
to the board, and an operation section in which the board is taken out from the jig
and transferred to the next process.
[0035] Model generator 102 generates inference model 133 that infers the operation section
to which each frame of the moving image obtained by image capture with camera 40 belongs.
Model generator 102 stores generated inference model 133 in storage unit 101.
[0036] Inference model 133 may be appropriately configured to be capable of executing arithmetic
processing of carrying out an inference task corresponding to the target data by,
for example, a predetermined algorithm, a predetermined rule, a functional expression,
or the like. The output of inference model 133 may be appropriately configured to
be able to specify a result of the execution of the inference task. In an example
of the present embodiment, inference model 133 includes a trained machine learning
model generated by machine learning. The machine learning model includes parameters
that can be adjusted by machine learning. The configuration and type of the machine
learning model may be appropriately selected according to the embodiment.
[0037] Fig. 4 is a diagram illustrating an example of the inference model. Fig. 4 illustrates
inference model 133 configured by a neural network.
[0038] As illustrated in Fig. 4, inference model 133 includes an input layer 51, one or
more intermediate (hidden) layers 52, and an output layer 53. The number of intermediate
layers 52 may be appropriately determined according to the embodiment. Intermediate
layer 52 may be omitted. The number of layers of the neural network constituting inference
model 133 may be appropriately determined according to the embodiment. Input layer
51 may be appropriately configured to be able to receive target data. Output layer
53 may be appropriately configured to output a value corresponding to the inference
result. Input layer 51 may be configured to be able to receive information other than
the target data, and output layer 53 may be configured to output information other
than the information corresponding to the inference result.
[0039] Each of input layer 51, intermediate layer 52, and output layer 53 includes one or
more nodes (neurons). The number of nodes included in each of input layer 51, intermediate
layer 52, and output layer 53 is not particularly limited, and may be appropriately
determined according to the embodiment. The node included in each of input layer 51,
intermediate layer 52, and output layer 53 may be connected to all nodes in adjacent
layers. As a result, inference model 133 may be constructed with a fully connected
neural network. However, the connection relationship of the nodes is not limited to
such an example, and may be appropriately determined according to the embodiment.
For example, each node may be connected to a specific node of an adjacent layer or
may be connected to a node of a layer other than the adjacent layer.
[0040] A weight (connection weight) is set for each connection between the nodes. A threshold is set for each node, and basically, the output of each node is determined according to whether or not the sum of the products of the inputs and the corresponding weights exceeds the threshold. The threshold may be expressed by an activation function. In this case, the sum of the products of the inputs and the weights is input to the activation function, and the output of each node is determined by evaluating the activation function. The type of the activation function may be freely selected. The weight of each connection between nodes included in input layer 51, intermediate layer 52, and output layer 53 and the threshold of each node are examples of parameters used for the arithmetic processing of inference model 133.
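As a minimal illustration of the node computation described above, the following sketch computes the output of a single node as an activation applied to the weighted sum of its inputs minus the threshold; the sigmoid is chosen arbitrarily, since the type of activation function is free.

```python
import math

def node_output(inputs, weights, threshold):
    # Weighted sum of the inputs minus the node's threshold
    s = sum(x * w for x, w in zip(inputs, weights)) - threshold
    # Sigmoid chosen arbitrarily; the type of activation function is free
    return 1.0 / (1.0 + math.exp(-s))

# Example: output of one node with three inputs
print(node_output([0.2, 0.5, 0.1], [0.4, -0.3, 0.8], threshold=0.05))
```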
[0041] In the machine learning, the values of the parameters of inference model 133 are adjusted, as appropriate, using the plurality of learning data sets 132 so as to obtain the ability to perform a desired inference task. Each learning data set 132 includes a combination of training data and a correct label. In one example, the machine learning trains inference model 133 (adjusts the values of its parameters) so that, for each learning data set 132, the execution result of the inference task obtained by inputting the training data to inference model 133 matches the corresponding correct label. For example, a known method such as error back-propagation may be adopted as the machine learning method, according to the machine learning model.
[0042] In the present embodiment, learning data set 132 is created in advance from a moving
image obtained by image capture with camera 40. The moving image includes a specific
worker selected for machine learning. Each of the plurality of learning data sets
132 includes training data that is a predetermined number of consecutive frames included
in the moving image, and a correct label indicating an operation section of the work
performed by the specific worker included in the training data. As a result, inference model 133 is generated such that, when a predetermined number of frames are input, it outputs a label indicating the inferred operation section.
[0043] Operation section detector 103 detects an operation section to which each frame of
the moving image obtained from camera 40 belongs. Specifically, operation section
detector 103 inputs a predetermined number of consecutive frames including a frame
(hereinafter referred to as "target frame") from which an operation section is to
be detected to inference model 133. For example, a predetermined number (m + n + 1)
of frames including m consecutive frames before the target frame, the target frame,
and n consecutive frames after the target frame are input to inference model 133.
Operation section detector 103 detects the operation section indicated by the label
output from inference model 133 as the operation section to which the target frame
belongs.
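A sketch of this sliding-window detection is shown below; the values of m and n, the tensor layout, and the model interface are assumptions for illustration (the verification described later uses clips of 16 frames).

```python
import torch

M_BEFORE = 7  # m frames before the target frame (assumed; m + n + 1 = 16)
N_AFTER = 8   # n frames after the target frame (assumed)

def detect_section(frames, target_index, inference_model, labels):
    """Return the operation section label for the frame at target_index.

    frames: tensor of shape (num_frames, C, H, W).
    inference_model: assumed to take a clip of shape (1, C, m + n + 1, H, W)
    and to return per-section scores of shape (1, num_sections).
    """
    start = target_index - M_BEFORE
    end = target_index + N_AFTER + 1
    if start < 0 or end > frames.shape[0]:
        return None                                   # not enough surrounding frames
    clip = frames[start:end]                          # (m + n + 1, C, H, W)
    clip = clip.permute(1, 0, 2, 3).unsqueeze(0)      # (1, C, T, H, W), the layout 3D CNNs expect
    with torch.no_grad():
        scores = inference_model(clip)
    return labels[int(scores.argmax(dim=1))]
```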
(Emotion detector)
[0044] Emotion detector 104 detects the emotion of the worker on the basis of the moving
image acquired from camera 30. Emotion detector 104 may detect emotion using a known
technology (for example,
Japanese Patent Laying-Open No. 2016-149063).
[0045] For example, emotion detector 104 detects the face and face organs (eyes, eyebrows,
nose, mouth, etc.) for each frame of the moving image. Any algorithm, including known methods, may be used for detecting the face and face organs, and thus a detailed description thereof will be omitted.
[0046] Emotion detector 104 recognizes the emotion (expression) of the worker included in
the frame on the basis of the states of the detected face and face organs. In the
present embodiment, emotions are classified into five types: "neutral", "glad", "angry", "surprise", and "sad". Alternatively, emotions may be classified into seven types, namely the above five types plus "disgust" and "fear". A score obtained by quantifying the degree of each of the five (or seven) types of emotions so that the total is 100 is output as the emotion recognition result. The score of each emotion is also
referred to as an expression component value. The emotion (expression) also depends
on the physical condition and mental state of the worker. Therefore, the score can
be used to estimate the physical condition and mental state of the worker.
[0047] Note that any algorithm including known methods may be used for recognizing emotion.
For example, emotion detector 104 extracts a feature amount related to the relative
position and shape of the face organs on the basis of position information of the
face organs. As the feature amount, a Haar-like feature amount, a distance between
feature points, a Fourier descriptor, or the like can be used. Next, emotion detector
104 inputs the extracted feature amount to a discriminator of each of the five types
(or seven types) of face expressions, and calculates the degree of each expression.
Each discriminator can be generated by learning using a sample image. Finally, emotion
detector 104 normalizes the output values from the discriminators for the five types
(or seven types) so that the total is 100, and outputs scores (expression component
values) of the five types (or seven types) of emotions.
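The normalization of the discriminator outputs can be sketched as follows; the dummy discriminators and their fixed degrees are placeholders standing in for the trained discriminators described above.

```python
def emotion_scores(feature_amount, discriminators):
    """Normalize per-expression discriminator outputs so that the scores total 100."""
    raw = {name: max(f(feature_amount), 0.0) for name, f in discriminators.items()}
    total = sum(raw.values()) or 1.0
    return {name: 100.0 * value / total for name, value in raw.items()}

# Dummy discriminators returning fixed degrees, standing in for the trained discriminators
dummy = {
    "neutral": lambda v: 3.0, "glad": lambda v: 1.0, "angry": lambda v: 0.5,
    "surprise": lambda v: 0.25, "sad": lambda v: 0.25,
}
print(emotion_scores([0.1, 0.2], dummy))  # e.g. {'neutral': 60.0, 'glad': 20.0, ...}
```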
[0048] Emotion detector 104 stores the emotion recognition result together with time stamp
information in a database in storage unit 101.
(Line-of-sight detector)
[0049] Line-of-sight detector 105 detects the line-of-sight of the worker on the basis of
the moving image acquired from camera 30. Line-of-sight detector 105 detects the line-of-sight
using a known technology (for example,
Japanese Patent Laying-Open No. 2009-266086).
[0050] For example, line-of-sight detector 105 estimates the face direction of the worker
included in each frame of the moving image. Note that the method used for estimating the face direction is not limited to a specific method, and a method that enables accurate, fast, and simple estimation is desirable.
[0051] Furthermore, line-of-sight detector 105 detects the eye contour and the pupil of
the worker included in each frame. For example, line-of-sight detector 105 may detect the inner corner and the outer corner of the eye by edge detection or corner detection. After detecting the pupil contour by edge detection, line-of-sight detector 105 detects the left end and the right end of the pupil.
[0052] Line-of-sight detector 105 calculates feature parameters on the basis of the detection
results of the eye contour and the pupil. The feature parameter represents a relationship
between the inner corner and the outer corner of the eye and the left end and the
right end of the pupil. For example, the feature parameter indicates i) relative coordinates
of the inner corner of the eye with respect to the left end of the pupil (in other
words, a vector between the left end of the pupil and the inner corner of the eye)
and ii) relative coordinates of the outer corner of the eye with respect to the right
end of the pupil (in other words, a vector between the right end of the pupil and
the outer corner of the eye). Alternatively, the feature parameter may indicate a
ratio of the lengths of the two vectors described above. Both feature parameters represent
the position of the pupil with respect to the eye contour.
[0053] Line-of-sight detector 105 estimates the pupil direction of the worker by applying
the estimated face direction and feature parameters to the correlation between both
the face direction and the feature parameters and the pupil direction. The correlation
is created in advance. Line-of-sight detector 105 obtains the line-of-sight direction
of the worker by adding the estimated face direction to the estimated pupil direction.
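The following sketch outlines this computation, assuming two-dimensional (yaw, pitch) directions and a linear correlation model; the matrix W and offset b stand in for the correlation created in advance and are not specified by the present embodiment.

```python
import numpy as np

def feature_parameters(inner_corner, outer_corner, pupil_left, pupil_right):
    """Vectors from the pupil ends to the eye corners (position of the pupil in the eye contour)."""
    v_inner = np.asarray(inner_corner, dtype=float) - np.asarray(pupil_left, dtype=float)
    v_outer = np.asarray(outer_corner, dtype=float) - np.asarray(pupil_right, dtype=float)
    return np.concatenate([v_inner, v_outer])

def pupil_direction(face_direction, features, W, b):
    """Correlation model mapping (face direction, feature parameters) to the pupil direction.

    W and b stand for the correlation created in advance; any regression model could be used."""
    x = np.concatenate([np.asarray(face_direction, dtype=float), features])
    return W @ x + b

def gaze_direction(face_direction, features, W, b):
    # Line-of-sight direction = estimated face direction + estimated pupil direction
    return np.asarray(face_direction, dtype=float) + pupil_direction(face_direction, features, W, b)

# Example with (yaw, pitch) in degrees and arbitrary correlation parameters
W, b = np.zeros((2, 6)), np.zeros(2)
f = feature_parameters((30, 0), (60, 2), (38, 1), (52, 1))
print(gaze_direction((5.0, -3.0), f, W, b))   # -> [ 5. -3.] with this all-zero correlation
```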
(Provision unit)
[0054] Provision unit 106 provides a screen indicating detection results by operation section
detector 103, emotion detector 104, and line-of-sight detector 105, and various types
of information obtained from the detection results. Specifically, provision unit 106
displays the screen on display device 70. Various types of information may also be
generated from each of the detected operation section, emotion, and line-of-sight
of the worker, or may be generated by combining a plurality of items selected from
the operation section, emotion, and line-of-sight.
<Verification example of operation section estimation>
[0055] A specific verification result of the operation section estimated for the "soldering"
process will be described.
[0056] Fig. 5 is a diagram illustrating three frames respectively corresponding to three
operation sections corresponding to the "soldering" process and a frame not belonging
to any operation section. As described above, the "soldering" process includes a "first
section" which is an operation section in which the board is carried in from a previous
process and is attached to a jig, a "second section" that is an operation section
in which a component is soldered to the board, and a "third section" that is an operation
section in which the board is taken out from the jig and transferred to the next process.
Parts (a), (b), and (c) of Fig. 5 illustrate frames belonging to the operation sections of the "first section", the "second section", and the "third section", respectively. The
moving image includes a frame that does not belong to any of the operation sections
of the "first section", the "second section", and the "third section", that is, a
frame in which no work of any of the operation sections of the "first section", the
"second section", and the "third section" is performed. Therefore, inference model
133 for classifying each frame of the moving image into any one of the operation sections
of the "first section", the "second section", the "third section", and "None" is generated.
The operation section of "None" is a section in which no work of the operation sections
of the "first section", the "second section", and the "third section" is performed.
[0057] Fig. 6 is a diagram illustrating a verification result of an estimated operation
section. The upper part of Fig. 6 illustrates operation sections classified by a person
checking a moving image. That is, the upper part of Fig. 6 shows the correct answer
of the operation sections. On the other hand, the lower part of Fig. 6 illustrates
the operation sections inferred using inference model 133.
[0058] The operation sections illustrated in the lower part of Fig. 6 are inferred using inference model 133 generated according to the following conditions (an illustrative configuration sketch follows the list).
- Used model: 3D ResNet (https://github.com/kenshohara/3D-ResNets-PyTorch)
- Input data: clip of 16 frames in which each pixel indicates RGB intensity, with an image size of 112 pixels × 112 pixels
- Learning rate: 0.1 (reduced to 0.01 when the validation loss converges)
- Data augmentation:
Horizontal flip with a probability of 50%
Spatial crop at a position randomly selected from the four corners and the center
16 frames randomly extracted from the moving image
- Transfer learning: using r3d50_K_200 (depth 50, 200 epochs, 700 classes, trained on the Kinetics-700 data set)
- Number of training samples: operation section of "first section": 10, operation section of "second section": 10, operation section of "third section": 15, operation section of "None": 2
- Mini-batch size: 30
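The following PyTorch sketch illustrates a comparable configuration. It is not the verification code itself: torchvision's 18-layer r3d_18 is used only as a readily importable stand-in for the 50-layer model of the referenced repository, pretrained-weight loading is indicated by a comment, and RandomCrop approximates the corner/center crop.

```python
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.models.video import r3d_18

NUM_CLASSES = 4            # "first section", "second section", "third section", "None"
CLIP_LEN, CROP_SIZE = 16, 112
BATCH_SIZE = 30

# Stand-in backbone: the verification used a 50-layer 3D ResNet (r3d50_K_200, pretrained on
# Kinetics-700) from the referenced repository; torchvision's 18-layer r3d_18 is used here
# only because it is readily importable. Pretrained weights would be loaded at this point.
model = r3d_18(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # transfer learning: replace the head

# Spatial augmentation: 50% horizontal flip and a crop; RandomCrop approximates the crop
# randomly selected from the four corners and the center.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(CROP_SIZE),
])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Reduce the learning rate to 0.01 once the validation loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)
loss_fn = nn.CrossEntropyLoss()

# Shape check with a dummy clip batch (mini-batches of BATCH_SIZE = 30 would be used in training).
clips = torch.randn(2, 3, CLIP_LEN, CROP_SIZE, CROP_SIZE)  # (N, C, T, H, W)
print(model(clips).shape)                                  # -> torch.Size([2, 4])
```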
[0059] As illustrated in Fig. 6, the operation sections inferred by inference model 133 closely match the operation sections classified by a person checking the moving image. The inference accuracy of inference model 133 is thus high.
<Example of provision screen>
[0060] Fig. 7 is a diagram illustrating one example of a provision screen. A screen 60 shown
in Fig. 7 includes a graph 61 provided by provision unit 106 and showing the transition
of the detected operation sections. The user can determine whether the operation procedure
of the worker is appropriate or not by checking screen 60.
[0061] Fig. 8 is a diagram illustrating another example of the provision screen. Fig. 9
is a diagram illustrating still another example of the provision screen. A screen
65 illustrated in Figs. 8 and 9 is provided by provision unit 106. As illustrated
in Figs. 8 and 9, screen 65 includes regions 66 to 68.
[0062] In region 66, a moving image obtained by image capture with camera 30 is played, and a frame is displayed in accordance with an operation on an operation
bar 69. Note that, in a case where there is no operation on operation bar 69, the
latest frame acquired from camera 30 may be displayed in region 66.
[0063] In region 66, marks 66a to 66d and lines 66e and 66f are displayed in the moving
image.
[0064] Mark 66a indicates the position of the pupil with respect to the contour of the right
eye of the worker included in the moving image. Mark 66b indicates the position of
the pupil with respect to the contour of the left eye of the worker included in the
moving image. Marks 66a and 66b are generated on the basis of the eye contour and
the pupil detected from the frame displayed in region 66.
[0065] Line 66e indicates the line-of-sight direction of the right eye of the worker included
in the moving image. Line 66f indicates the line-of-sight direction of the left eye
of the worker included in the moving image. Lines 66e and 66f are generated on the
basis of the line-of-sight direction detected from the frame displayed in region 66.
[0066] As a result, the user can easily recognize the eye contour, the state of the pupils,
and the line-of-sight direction of the worker by checking marks 66a and 66b and lines
66e and 66f.
[0067] Mark 66c indicates a negative type of emotion of the worker included in the moving
image. Specifically, mark 66c indicates an emotion having the highest score among
emotions "neutral", "surprise", "angry", and "sad", and has a picture corresponding
to the emotion. Mark 66c in Fig. 8 indicates the emotion "neutral". Mark 66c in Fig.
9 indicates the emotion "sad". In addition, an indicator 66g indicating the magnitude
of the score of the emotion indicated by mark 66c is illustrated around mark 66c.
[0068] Mark 66d indicates a positive type of emotion of the worker included in the moving
image. Specifically, mark 66d indicates an emotion having the highest score among
emotions "neutral" and "glad", and has a picture corresponding to the emotion. Mark
66d in Fig. 8 indicates the emotion "neutral". Mark 66d in Fig. 9 indicates the emotion
"glad". In addition, an indicator 66h indicating the magnitude of the score of the
emotion indicated by mark 66d is illustrated around mark 66d.
[0069] The user can recognize the emotion of the worker by checking marks 66c and 66d, and
can recognize the degree of the emotion by checking indicators 66g and 66h.
[0070] In region 67, an image including an object in front of the worker is displayed. The
image may be prepared in advance or may be acquired from a camera different from cameras
30 and 40. In region 67, a mark 67a indicating the viewpoint of the worker is also
displayed. The position of mark 67a is determined on the basis of the line-of-sight
direction detected from the frame displayed in region 66. In screen 65 illustrated
in Fig. 8, the line-of-sight of the worker is directed to the upper left, and thus,
mark 67a is displayed in the upper left portion of the image in region 67. Specifically,
in the image in region 67, mark 67a is displayed so as to be superimposed on a standard
operation procedure A on the upper left side. In screen 65 illustrated in Fig. 9,
the line-of-sight of the worker is directed downward, and thus, mark 67a is displayed
in the lower portion of the image in region 67. Specifically, in the image in region
67, mark 67a is displayed so as to be superimposed on a parts box on the lower side.
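One way to determine the position of mark 67a is to map the detected gaze angles linearly onto the image in region 67, as in the following sketch; the angular ranges and sign conventions are assumptions, and a real system would calibrate the mapping against the geometry of the camera and the objects in front of the worker.

```python
def viewpoint_position(yaw_deg, pitch_deg, image_width, image_height,
                       yaw_range=(-30.0, 30.0), pitch_range=(-20.0, 20.0)):
    """Map a line-of-sight direction (yaw, pitch in degrees) to pixel coordinates in region 67."""
    def scale(value, lo, hi, size):
        t = (value - lo) / (hi - lo)
        return int(round(min(max(t, 0.0), 1.0) * (size - 1)))

    x = scale(yaw_deg, yaw_range[0], yaw_range[1], image_width)
    # Looking up (positive pitch) maps to the upper part of the image (smaller y).
    y = scale(-pitch_deg, pitch_range[0], pitch_range[1], image_height)
    return x, y

# Gaze toward the upper left -> mark 67a near the upper-left corner of the image
print(viewpoint_position(-25.0, 15.0, 1280, 720))
```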
[0071] The user can easily recognize where the worker is looking by checking region 67.
[0072] In region 68, a graph indicating transition of the worker's emotion is displayed.
That is, the graph indicates the transition of the score of each of the five types
of emotions "neutral", "glad", "surprise", "angry", and "sad". In region 68, a line
68a indicating the time corresponding to the frame displayed in region 66 is displayed.
Therefore, the user can recognize the emotion of the worker included in the frame
displayed in region 66 by viewing the score of each emotion overlapping line 68a.
<Usage example of detection result>
[0073] Fig. 10 is a diagram illustrating a relationship between worker's emotions and production
indexes. The upper part of Fig. 10 illustrates the transition of the production volume and the defect rate per unit time, which are production indexes. The lower part of Fig. 10 illustrates the transition of the score of each emotion
of the worker. In the example illustrated in Fig. 10, a decrease in the production
volume and an increase in the defect rate per unit time are observed with an increase
in the score of "sad".
[0074] Therefore, an administrator can recognize the worker having the emotion leading to
the decrease in production efficiency by checking region 68 in Figs. 8 and 9, and
can provide appropriate care to the worker. Furthermore, as described above, emotion depends on physical conditions and mental states. Therefore, the administrator can recognize a change in the physical condition or mental state of the worker by checking region 68 in Figs. 8 and 9, and can give the worker a rest.
[0075] Furthermore, provision unit 106 may provide a notification for promoting care for
the worker in response to the score of a target type out of the plurality of types
of emotions falling outside a prescribed range on the basis of the relationship illustrated
in Fig. 10. Specifically, provision unit 106 may compare the score of the emotion
"sad" with a threshold, and provide a notification for promoting appropriate care
in response to the score of the emotion "sad" exceeding the threshold. For example,
workers with an intellectual disorder or a mental disorder often have difficulty in
communication. The administrator can provide appropriate care at an early stage by
receiving the above notification regarding such workers. As a result, a decrease in
production efficiency can be suppressed.
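A minimal sketch of this check is shown below; the threshold value and the notification callback are assumptions, and the prescribed range may be defined per type of emotion.

```python
SAD_THRESHOLD = 40.0   # assumed prescribed range: the "sad" score should stay at or below 40

def check_care_notification(scores, notify):
    """Provide a notification prompting care when the 'sad' score falls outside the prescribed range."""
    sad = scores.get("sad", 0.0)
    if sad > SAD_THRESHOLD:
        notify(f"'sad' score {sad:.0f} exceeds {SAD_THRESHOLD:.0f}: consider caring for the worker.")

check_care_notification(
    {"neutral": 20, "glad": 5, "surprise": 5, "angry": 10, "sad": 60},
    notify=print,
)
```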
[0076] The worker preferably performs the work while checking the standard operation procedure.
Therefore, the administrator checks region 67 in Figs. 8 and 9 to determine whether
or not the viewpoint of the worker moves in a desired order. As a result, the administrator
can determine whether or not the work is performed in an appropriate procedure.
[0077] Furthermore, provision unit 106 may store reference information indicating the transition
of the viewpoint when standard work is performed, and calculate the similarity between
the reference information and the transition of mark 67a displayed in region 67. The
reference information is created in advance. Provision unit 106 may provide a notification
indicating that the work procedure is different in response to the similarity between
the reference information and the transition of mark 67a displayed in region 67 being
less than a threshold. Thus, the administrator can easily recognize a worker who
should be educated about the work procedure.
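The similarity calculation can be sketched as follows; resampling followed by mean point distance is only one possible measure (dynamic time warping is another), and the threshold value is an assumption.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.01   # assumed value; to be tuned for the actual site

def trajectory_similarity(reference, observed, num_samples=50):
    """Similarity between the reference viewpoint transition and the observed transition of mark 67a.

    Both inputs are sequences of (x, y) positions. They are resampled to a common length and
    compared by mean point distance, converted to a score in (0, 1]."""
    def resample(points):
        points = np.asarray(points, dtype=float)
        idx = np.linspace(0, len(points) - 1, num_samples)
        x = np.interp(idx, np.arange(len(points)), points[:, 0])
        y = np.interp(idx, np.arange(len(points)), points[:, 1])
        return np.stack([x, y], axis=1)

    a, b = resample(reference), resample(observed)
    mean_dist = float(np.mean(np.linalg.norm(a - b, axis=1)))
    return 1.0 / (1.0 + mean_dist)   # 1.0 when the trajectories coincide

reference = [(100, 80), (400, 90), (400, 500), (900, 500)]   # standard work (created in advance)
observed = [(110, 90), (380, 100), (420, 480), (880, 520)]   # transition of mark 67a
if trajectory_similarity(reference, observed) < SIMILARITY_THRESHOLD:
    print("The work procedure may differ from the standard procedure.")
```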
[0078] By checking screen 60 illustrated in Fig. 7, the administrator can create an ideal
work procedure manual from the transition of operation sections detected from the
moving image obtained by imaging a skilled worker. Alternatively, provision unit 106
may automatically create a work standard on the basis of the transition of detected
operation sections, and provide the created work standard.
<Modification>
[0079] Storage 13 of information processing device 10 need not store model generation program 131. That is, information processing device 10 need not include model generator 102.
In this case, information processing device 10 may acquire inference model 133 from
another device in which model generation program 131 is installed. A processor of
the other device executes model generation program 131 to implement model generator
102.
[0080] Storage 13 of information processing device 10 need not store one or two of operation section detection program 134, emotion detection program 135, and line-of-sight detection program 136. That is, information processing device 10 need not include one or two functional blocks among operation section detector 103, emotion detector 104, and line-of-sight detector 105. For example, in a case where information processing device 10 includes
only emotion detector 104, it suffices that provision unit 106 provides screen 65 including regions 66 and 68 but not region 67. In a case where information processing device 10 includes only line-of-sight detector 105, it suffices that provision unit 106 provides screen 65 including regions 66 and 67 but not region 68. In a case where information processing device 10 includes only operation section detector 103, provision unit 106 provides screen 60 illustrated in Fig. 7 and does not provide screen 65 illustrated in Figs. 8 and 9. In a case where information processing device 10 includes only emotion detector 104 and line-of-sight detector 105, provision unit 106 provides screen 65 illustrated in Figs. 8 and 9 and does not provide screen 60 illustrated in Fig. 7. In a case where information processing device 10 includes only operation section detector 103 and emotion detector 104, it suffices that provision unit 106 provides screen 60 illustrated in Fig. 7 and screen 65 including regions 66 and 68 but not region 67. In a case where information processing device 10 includes only operation section detector 103 and line-of-sight detector 105, it suffices that provision unit 106 provides screen 60 illustrated in Fig. 7 and screen 65 including regions 66 and 67 but not region 68.
[0081] Although the present invention has been described and illustrated in detail, it is
clearly understood that the same is by way of illustration and example only and is
not to be taken by way of limitation, the scope of the present invention being interpreted
by the terms of the appended claims.
1. An information processing device (10) comprising:
a first acquisition unit (16) configured to acquire a first moving image from a first
camera (40) that images a target worker (4) and surroundings of the target worker
(4), the first camera (40) being installed at a production site;
an operation section detector (11, 103) configured to detect, from a predetermined
number of consecutive first frames included in the first moving image, an operation
section of work performed by the target worker (4) included in the predetermined number
of first frames using an inference model (133); and
a provision unit (11, 106) configured to provide a detection result by the operation
section detector (11, 103), wherein
the inference model (133) is generated by learning processing using a plurality of
learning data sets (132), each of the plurality of learning data sets (132) including
a predetermined number of consecutive second frames included in a moving image that
includes a specific worker, and a label indicating an operation section of work performed
by the specific worker included in the predetermined number of second frames.
2. The information processing device (10) according to claim 1, further comprising:
a second acquisition unit (16) configured to acquire a second moving image from a
second camera (30) that images a face of the target worker (4), the second camera
(30) being installed at the production site; and
an emotion detector (11, 104) configured to detect an emotion of the target worker
(4) included in each frame of the second moving image, wherein
the provision unit (11, 106) further provides a transition of the emotion detected
by the emotion detector (11, 104).
3. The information processing device (10) according to claim 1, further comprising:
a second acquisition unit (16) configured to acquire a second moving image from a
second camera (30) that images a face of the target worker (4), the second camera
(30) being installed at the production site; and
a line-of-sight detector (11, 105) configured to detect a line-of-sight direction
of the target worker (4) included in each frame of the second moving image, wherein
the provision unit (11, 106):
further provides an image including an object in front of the target worker (4);
determines a position of a viewpoint of the worker (4) in the image on the basis of
the line-of-sight direction detected by the line-of-sight detector (11, 105); and
displays a mark (67a) at the determined position in the image.
4. The information processing device (10) according to claim 1, further comprising:
a second acquisition unit (16) configured to acquire a second moving image from a
second camera (30) that images a face of the target worker (4), the second camera
(30) being installed at the production site;
an emotion detector (11, 104) configured to detect an emotion of the target worker
(4) included in each frame of the second moving image; and
a line-of-sight detector (11, 105) configured to detect a line-of-sight direction
of the target worker (4) included in each frame of the second moving image, wherein
the provision unit (11, 106):
further provides a transition of the emotion detected by the emotion detector (11,
104);
further provides an image including an object in front of the target worker (4);
determines a position of a viewpoint of the worker in the image on the basis of the
line-of-sight direction detected by the line-of-sight detector (11, 105); and
displays a mark (67a) at the determined position in the image.
5. An information processing device (10) comprising:
an acquisition unit (16) configured to acquire a moving image from a camera (30) that
images a face of a worker, the camera (30) being installed at a production site;
a line-of-sight detector (11, 105) configured to detect a line-of-sight direction
of the worker (4) included in each frame of the moving image; and
a provision unit (11, 106) configured to provide an image including an object in front
of the worker (4), wherein
the provision unit (11, 106):
determines a position of a viewpoint of the worker (4) in the image on the basis of
the line-of-sight direction detected by the line-of-sight detector (11, 105); and
displays a mark (67a) at the determined position in the image.
6. The information processing device (10) according to claim 5, further comprising:
an emotion detector (11, 104) configured to detect an emotion of the worker (4) included
in each frame of the moving image, wherein
the provision unit (11, 106) further provides a transition of the emotion detected
by the emotion detector (11, 104).
7. An information processing method comprising:
acquiring a moving image from a camera (40) that is installed at a production site
and that images a target worker (4) and surroundings of the target worker (4);
detecting, from a predetermined number of consecutive first frames included in the
moving image, an operation section of work performed by the target worker (4) included
in the predetermined number of first frames using an inference model (133); and
providing a detection result, wherein
the inference model (133) is generated by learning processing using a plurality of
learning data sets (132), each of the plurality of learning data sets (132) including
a predetermined number of consecutive second frames included in a moving image that
includes a specific worker, and a label indicating an operation section of work performed
by the specific worker included in the predetermined number of second frames.
8. An information processing method comprising:
acquiring a moving image from a camera (30) that is installed at a production site
and that images a face of a worker (4);
detecting a line-of-sight direction of the worker (4) included in each frame of the
moving image; and
providing an image including an object in front of the worker (4), wherein
the providing includes:
determining a position of a viewpoint of the worker (4) in the image on the basis
of the line-of-sight direction that has been detected; and
displaying a mark (67a) at the determined position in the image.