CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority from prior Japanese Patent Application No.
2019-086363, filed on April 26, 2019, entitled "METHOD, APPARATUS, AND COMPUTER PROGRAM FOR SUPPORTING DISEASE ANALYSIS,
AND METHOD, APPARATUS, AND PROGRAM FOR TRAINING COMPUTER ALGORITHM", the entire content
of which is incorporated herein by reference.
FIELD OF THE INVENTION
[0002] The present invention relates to a method, an apparatus, and a computer program that
support disease analysis, and a method, an apparatus, and a program that train a computer
algorithm for supporting disease analysis.
BACKGROUND
[0003] Japanese Laid-Open Patent Publication No.
H10-197522 describes a method for distinguishing between a pathologic tissue that shows a "hard
cancer pattern" and one that shows an "endocapillary fibrous tumor pattern" by inputting
two kinds of feature quantities to a neural network. One of the feature quantities
is calculated by using the number, area, shape, roundness, color, and chromaticity
of nuclear regions, the number, area, shape, and roundness of cavity regions, the
number, area, shape, roundness, color, and chromaticity of interstitium regions, and
the number, area, shape, and roundness of lumen regions, which are extracted from
a tissue image, the texture of the image, and a wavelet transform value. The other
of the feature quantities is calculated by using the degree of a two-layer structure
in which epithelial cells are accompanied by myoepithelial cells, the degree of fibrillization,
the presence or absence of a papillary pattern, the presence or absence of a cribriform
pattern, the presence or absence of a necrotic substance, the presence or absence
of a solid pattern, and the color or chromaticity of the image.
SUMMARY OF THE INVENTION
[0004] Japanese Laid-Open Patent Publication No.
H10-197522 discloses discernment of a disease on the basis of an image of a tissue, but does
not disclose discernment of a disease based on individual cell images.
[0005] An object of the present invention is to accurately discern a disease on the basis
of individual cell images.
[0006] The present invention relates to a method for supporting disease analysis. The method
includes classifying, on the basis of images obtained from a plurality of analysis
target cells contained in a specimen collected from a subject, a morphology of each
analysis target cell, and obtaining cell morphology classification information corresponding
to the specimen, on the basis of a result of the classification; and analyzing a disease
of the subject by means of a computer algorithm, on the basis of the cell morphology
classification information. According to these configurations, a disease can be discerned
on the basis of individual cell images.
[0007] Preferably, the classifying of the morphology of each analysis target cell includes
discerning a type of cell of each analysis target cell. More preferably, the cell
morphology classification information is information regarding a cell number for each
type of cell (64). According to these configurations, a disease can be discerned on
the basis of individual types of cells.
[0008] Preferably, the classifying of the morphology of each analysis target cell includes
discerning a type of abnormal finding in each analysis target cell. More preferably,
the cell morphology classification information is information regarding a cell number
for each type of abnormal finding (63). According to these configurations, a disease
can be discerned on the basis of the types of abnormal findings in individual cells.
[0009] The classifying of the morphology of each analysis target cell includes discerning
a type of abnormal finding for each type of cell of the analysis target cell. According
to these configurations, a disease can be more accurately discerned on the basis of
individual cell images.
[0010] The classifying of the morphology of each analysis target cell includes inputting
analysis data that includes information regarding each analysis target cell, to a
deep learning algorithm having a neural network structure, and classifying the morphology
of each analysis target cell by means of the deep learning algorithm. According to
these configurations, a disease can be more accurately discerned.
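The classification step described above can be pictured, purely as an illustrative sketch and not as part of the disclosed embodiments, as a forward pass of a small neural network that maps a per-cell feature vector to class-membership probabilities via a softmax output layer. All weights, feature values, and layer sizes below are hypothetical:

```python
import math

def softmax(z):
    # numerically stable softmax over the output layer
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, W1, b1, W2, b2):
    # one hidden layer with ReLU, then a softmax output layer
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    z = [sum(w * hi for w, hi in zip(row, h)) + b
         for row, b in zip(W2, b2)]
    return softmax(z)

# hypothetical tiny network: 3 input features, 2 hidden units, 3 classes
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.3]]
b2 = [0.0, 0.0, 0.0]

probs = forward([0.9, 0.1, 0.4], W1, b1, W2, b2)
print(probs)  # three class probabilities summing to 1
```

In an actual deep learning algorithm of this kind, the weights would be obtained by training on labeled cell images rather than fixed by hand.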
[0011] The computer algorithm is a machine learning algorithm, and the analyzing of the
disease of the subject is performed by inputting the cell morphology classification
information as a feature quantity to the machine learning algorithm (67). According
to these configurations, a disease can be more accurately discerned.
[0012] Preferably, the machine learning algorithm (67) is selected from tree, regression,
neural network, Bayes, clustering, and ensemble learning algorithms. More preferably, the machine
learning algorithm is a gradient boosting tree. By using these machine learning algorithms,
a disease can be more accurately discerned.
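As a minimal sketch of this step, assuming scikit-learn is available (the feature layout, counts, and disease labels below are illustrative and not taken from the disclosure), the per-specimen cell morphology classification information serves as the feature vector and a gradient boosting tree classifier predicts the disease:

```python
from sklearn.ensemble import GradientBoostingClassifier

# hypothetical feature vectors per specimen: cell numbers for each
# morphology classification (e.g. neutrophils, lymphocytes, and cells
# showing a granulation abnormality)
X_train = [
    [60, 30, 1],   # specimens with few abnormal findings
    [58, 32, 0],
    [40, 20, 25],  # specimens with many abnormal findings
    [35, 22, 30],
]
y_train = ["AA", "AA", "MDS", "MDS"]  # hypothetical disease labels

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

prediction = clf.predict([[38, 21, 28]])[0]
print(prediction)
```

The classifier is interchangeable: any of the algorithm families listed in [0012] could be substituted behind the same feature-vector interface.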
[0013] The obtaining of the cell morphology classification information includes obtaining
a probability that each analysis target cell belongs to each of a plurality of cell
morphology classifications, calculating a sum of the probability for each type of
the cell morphology classifications, and obtaining the sum as the cell morphology
classification information. According to these configurations, more accurate disease
discernment can be realized.
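The probability-summation step of [0013] can be sketched in pure Python as follows (the class names and probability values are hypothetical, not from the disclosure): each analysis target cell contributes its class-membership probabilities, and the per-class sums form the cell morphology classification information of the specimen.

```python
# hypothetical per-cell probabilities over three morphology
# classifications, as output by a classifier for each cell image
cells = [
    {"neutrophil": 0.90, "lymphocyte": 0.05, "blast": 0.05},
    {"neutrophil": 0.10, "lymphocyte": 0.85, "blast": 0.05},
    {"neutrophil": 0.20, "lymphocyte": 0.10, "blast": 0.70},
]

# sum the probability of each classification over all cells; the sums
# (expected cell numbers per class) form the classification information
totals = {}
for probs in cells:
    for cls, p in probs.items():
        totals[cls] = totals.get(cls, 0.0) + p

print(totals)
```

Because each cell's probabilities sum to one, the class-wise sums always add up to the number of analysis target cells.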
[0014] Preferably, the specimen is a blood specimen. Since cells in blood reflect pathologies
of various diseases, more accurate disease discernment can be realized.
[0015] Preferably, the disease is a hematopoietic system disease. According to the present
invention, a hematopoietic system disease can be accurately discerned.
[0016] The hematopoietic system disease is aplastic anemia or myelodysplastic syndrome.
According to the present invention, a hematopoietic system disease can be accurately
discerned.
[0017] Preferably, the abnormal finding is at least one type selected from the group consisting
of: nucleus morphology abnormality; granulation abnormality; cell size abnormality;
cell malformation; cytoclasis; vacuole; immature cell; presence of inclusion body;
Döhle body; satellitism; nucleoreticulum abnormality; petal-like nucleus; increased
N/C ratio; and bleb-like, smudge, and hairy cell-like morphologies. By evaluating
these abnormal findings, more accurate disease discernment can be realized.
[0018] Preferably, the nucleus morphology abnormality includes at least one type selected
from hypersegmentation, hyposegmentation, pseudo-Pelger anomaly, ring-shaped nucleus,
spherical nucleus, elliptical nucleus, apoptosis, polynuclearity, karyorrhexis, enucleation,
bare nucleus, irregular nuclear contour, nuclear fragmentation, internuclear bridging,
multiple nuclei, cleaved nucleus, nuclear division, and nucleolus abnormality. The
granulation abnormality includes at least one type selected from degranulation, granule
distribution abnormality, toxic granule, Auer rod, Fagott cell, and pseudo Chediak-Higashi
granule-like granule. The cell size abnormality includes megathrombocyte. By evaluating
these abnormal findings, more accurate disease discernment can be realized.
[0019] Preferably, the type of cell includes at least one type selected from neutrophil,
eosinophil, platelet, lymphocyte, monocyte, and basophil. By evaluating these types
of cells, more accurate disease discernment can be realized.
[0020] More preferably, the type of cell further includes at least one type selected from
metamyelocyte, myelocyte, promyelocyte, blast, plasma cell, atypical lymphocyte, immature
eosinophil, immature basophil, erythroblast, and megakaryocyte. By evaluating these
types of cells, more accurate disease discernment can be realized.
[0021] The present invention relates to an apparatus (200) for supporting disease analysis.
The apparatus (200) includes a processing unit (20). The processing unit (20) classifies,
on the basis of images obtained from a plurality of analysis target cells contained
in a specimen collected from a subject, a morphology of each analysis target cell,
and obtains cell morphology classification information corresponding to the specimen,
on the basis of a result of the classification; and analyzes a disease of the subject
by means of a computer algorithm, on the basis of the cell morphology classification
information.
[0022] The present invention relates to a program for supporting disease analysis. The program
is configured to cause a computer to execute: classifying, on the basis of images
obtained from a plurality of analysis target cells contained in a specimen collected
from a subject, a morphology of each analysis target cell, and obtaining cell morphology
classification information corresponding to the specimen, on the basis of a result
of the classification; and analyzing a disease of the subject by means of a computer
algorithm, on the basis of the cell morphology classification information.
[0023] According to the apparatus or the program for supporting disease analysis, accurate
disease discernment can be realized.
[0024] The present invention relates to a training method for a computer algorithm for supporting
disease analysis. The training method includes: classifying, on the basis of images
obtained from a plurality of analysis target cells contained in a specimen collected
from a subject, a morphology of each analysis target cell, and obtaining cell morphology
classification information corresponding to the specimen, on the basis of a result
of the classification; and inputting the obtained cell morphology classification information
as first training data and disease information of the subject as second training data,
to the computer algorithm.
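The assembly of the two kinds of training data described above can be sketched as follows (the function name, classification ordering, and values are hypothetical illustrations, not from the disclosure): the cell morphology classification information of a specimen becomes the first training data, and the subject's disease information becomes the second.

```python
def make_training_pair(class_counts, class_order, disease_label):
    """Build (first training data, second training data) for one specimen.

    class_counts : dict mapping morphology classification -> cell number
    class_order  : fixed ordering of classifications for the feature vector
    disease_label: disease information of the subject (the training target)
    """
    first = [class_counts.get(cls, 0) for cls in class_order]
    second = disease_label
    return first, second

order = ["neutrophil", "lymphocyte", "granulation_abnormality"]
x, y = make_training_pair(
    {"neutrophil": 58, "lymphocyte": 32}, order, "aplastic anemia")
print(x, y)
```

A fixed classification ordering ensures that feature vectors from different specimens are comparable when they are input together to the computer algorithm.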
[0025] The present invention relates to a training apparatus (100) for a computer algorithm
for supporting disease analysis. The training apparatus (100) includes a processing
unit (10). The processing unit (10) classifies, on the basis of images obtained from
a plurality of analysis target cells contained in a specimen collected from a subject,
a morphology of each analysis target cell, obtains cell morphology classification
information corresponding to the specimen, on the basis of a result of the classification,
and inputs the obtained cell morphology classification information as first training
data and disease information (55) of the subject as second training data, to the computer
algorithm.
[0026] The present invention relates to a training program for a computer algorithm for
supporting disease analysis. The training program is configured to cause a computer
to execute: classifying, on the basis of images obtained from a plurality of analysis
target cells contained in a specimen collected from a subject, a morphology of each
analysis target cell, and obtaining cell morphology classification information corresponding
to the specimen, on the basis of a result of the classification; and inputting the
cell morphology classification information as first training data and disease information
(55) of the subject as second training data, to the computer algorithm.
[0027] According to the training method, the training apparatus (100), or the training program,
accurate disease discernment can be realized.
[0028] According to the present invention, a disease can be accurately discerned on the
basis of individual cell images.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029]
FIG. 1 shows an outline of a support method using a discriminator;
FIG. 2 is a schematic diagram showing an example of a generation procedure of deep
learning training data and a training procedure of a 1st deep learning algorithm,
a 2nd deep learning algorithm, and a machine learning algorithm;
FIG. 3A shows an example of a label value;
FIG. 3B shows an example of the label value;
FIG. 4 shows an example of machine learning training data;
FIG. 5 is a schematic diagram showing an example of a generation procedure of analysis
data and a procedure of disease analysis using a computer algorithm;
FIG. 6 shows a schematic configuration example of a disease analysis system 1;
FIG. 7 is a block diagram showing an example of a hardware configuration of a vendor-side
apparatus 100;
FIG. 8 is a block diagram showing an example of a hardware configuration of a user-side
apparatus 200;
FIG. 9 is a block diagram for describing an example of functions of a training apparatus
100A;
FIG. 10 is a flow chart showing an example of the flow of a deep learning process;
FIG. 11A is a schematic diagram for describing a neural network;
FIG. 11B is a schematic diagram for describing the neural network;
FIG. 11C is a schematic diagram for describing the neural network;
FIG. 12 is a block diagram for describing an example of functions of a machine learning
apparatus 100A;
FIG. 13 is a flow chart showing an example of the flow of a machine learning process
using first information;
FIG. 14 is a flow chart showing an example of the flow of a machine learning process
using the first information and second information;
FIG. 15 is a block diagram for describing an example of functions of a disease analyzer
200A;
FIG. 16 is a flow chart showing an example of the flow of a disease analysis process
using the first information;
FIG. 17 is a flow chart showing an example of the flow of a disease analysis process
using the first information and the second information;
FIG. 18 shows a schematic configuration example of a disease analysis system 2;
FIG. 19 is a block diagram for describing an example of functions of an integrated-type
disease analyzer 200B;
FIG. 20 shows a schematic configuration example of a disease analysis system 3;
FIG. 21 is a block diagram for describing an example of functions of an integrated-type
disease analyzer 100B;
FIG. 22A shows the structure of the first part of a discriminator used in Example;
FIG. 22B shows the structure of the second part of a discriminator used in Example;
FIG. 22C shows the structure of the third part of a discriminator used in Example;
FIG. 23 is a table showing the number of cells used as training data for a deep learning
algorithm and the number of cells used in validation for evaluating the performance
of the trained deep learning algorithm;
FIG. 24 is a table showing a result of evaluation of the performance of a trained
2nd deep learning algorithm;
FIG. 25 is a table showing a result of evaluation of the performance of a trained
1st deep learning algorithm;
FIG. 26A is the first part of a heat map of abnormal findings contributing to disease
analysis;
FIG. 26B is the second part of a heat map of abnormal findings contributing to disease
analysis;
FIG. 26C is the third part of a heat map of abnormal findings contributing to disease
analysis;
FIG. 26D is the fourth part of a heat map of abnormal findings contributing to disease
analysis; and
FIG. 27 shows a ROC curve of a disease analysis result.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0030] Hereinafter, embodiments of the present disclosure will be described in detail with
reference to the attached drawings. In the description below and the drawings, the
same reference characters represent the same or similar components. Thus, description
of the same or similar components is not repeated.
[0031] A method for supporting analysis of a disease of a subject (hereinafter, sometimes
simply referred to as "support method") will be described. The support method includes
classifying a morphology of each analysis target cell, and analyzing a disease of
the subject on the basis of the classification result. On the basis of images obtained
from a plurality of analysis target cells contained in a specimen collected from a
subject, the morphology of each analysis target cell is classified, and on the basis
of the classification result, cell morphology classification information corresponding
to the specimen is obtained. The support method includes analyzing a disease of the
subject on the basis of information regarding the type of abnormal finding (hereinafter,
sometimes referred to as "first information") as the cell morphology classification
information. The first information is information regarding the type of abnormal finding
that is obtained on the basis of the type of abnormal finding detected from each of
a plurality of analysis target cells contained in the specimen, and that corresponds
to the specimen. The abnormal finding is discerned on the basis of an image obtained
by capturing an analysis target cell. The support method includes analyzing a disease
of the subject on the basis of information regarding the type of cell (hereinafter,
sometimes referred to as "second information") as the cell morphology classification
information. The second information is information regarding the type of cell that
is obtained on the basis of the type of cell of each of a plurality of analysis target
cells contained in the specimen, and that corresponds to the specimen. The type of
cell is discerned on the basis of an image obtained by capturing an analysis target
cell.
[0032] The subject may be any animal for which a disease is to be analyzed. Examples of
the animal include human, dog, cat, rabbit, and monkey. Preferably, the subject is
a human.
[0033] The disease may be any disease that the above-mentioned animal has. For example,
the disease can include tumors of tissues other than the hematopoietic organ system,
diseases of the hematopoietic organ system, metabolic diseases, renal diseases, infectious
diseases, allergic diseases, autoimmune diseases, traumatic injuries, and the like.
[0034] The tumors of tissues other than the hematopoietic organ system can include benign
epithelial tumor, benign non-epithelial tumor, malignant epithelial tumor, and malignant
non-epithelial tumor. Preferable examples of the tumors of tissues other than the
hematopoietic organ system include malignant epithelial tumor and malignant non-epithelial
tumor.
[0035] Examples of the diseases of the hematopoietic organ system include tumor, anemia,
plethora, platelet disorder, and myelofibrosis. Preferable examples of the hematopoietic
system tumor include: myelodysplastic syndrome; leukemia (acute myeloblastic leukemia,
acute myeloblastic leukemia (involving neutrophil differentiation), acute promyelocytic
leukemia, acute myelomonocytic leukemia, acute monocytic leukemia, erythroleukemia,
acute megakaryoblastic leukemia, acute myeloid leukemia, acute lymphoblastic leukemia,
lymphoblastic leukemia, chronic myelogenous leukemia, chronic lymphocytic leukemia,
and the like); malignant lymphoma (Hodgkin's lymphoma, non-Hodgkin's lymphoma, and
the like); multiple myeloma; and granuloma. Malignant tumors of the hematopoietic
system are preferably myelodysplastic syndrome, leukemia, and multiple myeloma, and
more preferably, myelodysplastic syndrome.
[0036] Examples of anemia include aplastic anemia, iron-deficiency anemia, megaloblastic
anemia (including vitamin B12 deficiency, folate deficiency, and the like), hemorrhagic
anemia, renal anemia, hemolytic anemia, thalassemia, sideroblastic anemia, and atransferrinemia.
Anemia is preferably aplastic anemia, pernicious anemia, iron-deficiency anemia, or
sideroblastic anemia, and more preferably aplastic anemia.
[0037] Plethora can include polycythemia vera and secondary polycythemia. Preferably, plethora
is polycythemia vera.
[0038] Platelet disorder can include thrombocytopenia, thrombocytosis, and megakaryocyte
abnormality. Thrombocytopenia can include disseminated intravascular coagulation,
idiopathic thrombocytopenic purpura, MYH9 disorder, Bernard-Soulier syndrome, and
the like. Thrombocytosis can include essential thrombocythemia. Megakaryocyte abnormality
can include small megakaryocyte, multinucleated megakaryocyte, platelet hypoplasia,
and the like.
[0039] Myelofibrosis can include primary myelofibrosis and secondary myelofibrosis.
[0040] Metabolic diseases can include carbohydrate metabolism abnormality, lipid metabolism
abnormality, electrolyte abnormality, metal metabolism abnormality, and the like.
The carbohydrate metabolism abnormality can include mucopolysaccharidosis, diabetes,
and glycogenosis. Preferably, the carbohydrate metabolism abnormality is mucopolysaccharidosis
or diabetes. Lipid metabolism abnormality can include Gaucher's disease, Niemann-Pick
disease, hyperlipidemia, and atherosclerotic disease. Atherosclerotic disease can
include arteriosclerosis, atherosclerosis, thrombosis, embolism, and the like. Electrolyte
abnormality can include hyperkalemia, hypokalemia, hypernatremia, hyponatremia, and
the like. Metal metabolism abnormality can include iron metabolism abnormality, copper
metabolism abnormality, calcium metabolism abnormality, and inorganic phosphorus metabolism
abnormality.
[0041] Nephropathy can include nephrotic syndrome, renal impairment, acute renal failure,
chronic kidney disease, renal failure, and the like.
[0042] Infectious diseases can include bacterial infection, viral infection, rickettsial
infection, chlamydial infection, fungal infection, protozoan infection, and parasitic
infection.
[0043] Pathogenic bacteria of bacterial infections are not limited in particular. Examples
of pathogenic bacteria include coliform bacteria, staphylococci, streptococci, Haemophilus
bacteria, Neisseria bacteria, Moraxella bacteria, Listeria bacteria, Corynebacterium
diphtheria, Clostridium bacteria, Helicobacter bacteria, and Mycobacterium tuberculosis
complex.
[0044] Pathogenic viruses of viral infections are not limited in particular. Examples of
the pathogenic viruses include influenza virus, measles virus, rubella virus, varicellovirus,
dengue fever virus, cytomegalovirus, EB virus, enterovirus, human immunodeficiency
virus, HTLV-1 (human T-lymphotropic virus type-I), rabies virus, and the like.
[0045] Pathogenic fungi of fungal infections are not limited in particular. Pathogenic fungi
can include yeast-like fungi, filamentous fungi, and the like. Yeast-like fungi can
include Cryptococcus fungi, Candida fungi, and the like. Filamentous fungi can include
Aspergillus fungi, and the like.
[0046] Pathogenic protozoa of protozoan infections are not limited in particular. The pathogenic
protozoa can include malaria parasite, kala-azar parasite, and the like.
[0047] Pathogenic parasites of parasitic infections can include lumbricus, nematode, hookworm,
and the like.
[0048] Preferable examples of the infectious diseases include bacterial infections, viral
infections, protozoan infections, and parasitic infections. More preferable examples
are bacterial infections. Pathologies of infectious diseases can include pneumonia,
sepsis, meningitis, and urinary tract infection.
[0049] Allergic diseases can include allergic diseases that belong to type I, type II, type
III, type IV, or type V. Allergic diseases belonging to type I can include pollinosis,
anaphylactic shock, allergic rhinitis, conjunctivitis, bronchial asthma, urticaria,
atopic dermatitis, and the like. Allergic diseases belonging to type II can include
immune-incompatible blood transfusion, autoimmune hemolytic anemia, autoimmune thrombocytopenia,
autoimmune granulocytopenia, Hashimoto's disease, Goodpasture syndrome, and the like.
Allergic diseases belonging to type III can include immune complex nephritis, Arthus
reaction, serum sickness, and the like. Allergic diseases belonging to type IV can
include tuberculosis, contact dermatitis, and the like. Allergic diseases belonging
to type V can include Basedow's disease, and the like. Allergic diseases are preferably
those of type I, type II, type III, and type IV, more preferably those of type I,
type II, and type III, and further preferably those of type I. Allergic diseases belonging
to type II, type III, and type V overlap with some of the autoimmune diseases described later.
[0050] Autoimmune diseases can include systemic lupus erythematosus, rheumatoid arthritis,
multiple sclerosis, Sjogren's syndrome, scleroderma, dermatomyositis, primary biliary
cirrhosis, primary sclerosing cholangitis, ulcerative colitis, Crohn's disease, psoriasis,
vitiligo, bullous pemphigoid, alopecia areata, sudden dilated cardiomyopathy, type
1 diabetes mellitus, Basedow's disease, Hashimoto's disease, myasthenia gravis, IgA
nephropathy, membranous nephropathy, megaloblastic anemia, and the like. The autoimmune
diseases are preferably systemic lupus erythematosus, rheumatoid arthritis, multiple
sclerosis, Sjogren's syndrome, scleroderma, and dermatomyositis. The autoimmune diseases
are preferably autoimmune diseases in which antinuclear antibody is detected.
[0051] Traumatic injuries can include bone fracture, burn, and the like.
[0052] The specimen may be any specimen that can be collected from a subject. Preferably,
the specimen is blood, bone marrow, urine, or body fluid. Examples of blood include
peripheral blood, venous blood, and arterial blood. Preferably, blood is peripheral
blood. Examples of blood include peripheral blood collected by using an anticoagulant
agent such as ethylenediaminetetraacetate (sodium salt or potassium salt), heparin
sodium, or the like. The body fluid means fluids other than blood and urine. Examples
of the body fluid include ascites, pleural fluid, and spinal fluid.
[0053] The specimen may be selected in accordance with the disease to be analyzed. Cells
in blood, particularly in the above-described diseases, often have features that are
different from those of normal cells, in the numerical distribution of the types of
cells and/or the types of abnormal findings, which are described later. Therefore,
with respect to various diseases, analysis can be performed by using blood specimens.
Bone marrow allows analysis of diseases of the hematopoietic organ system, in particular.
Cells contained in ascites, pleural fluid, spinal fluid, and the like are effective
for diagnosis of tumors of tissues other than the hematopoietic organ system, diseases
of the hematopoietic organ system, infectious diseases, and the like, in particular.
Urine allows analysis of tumors of tissues other than the hematopoietic organ system,
infectious diseases, and the like, in particular.
[0054] The analysis target cell may be any cell that is contained in a specimen. The analysis
target cell means a cell that is used in order to analyze a disease. The analysis
target cell can include a plurality of cells. Here, the "plurality" can include a
case where the number of one type of cell is a plural number and a case where the
number of cell types is a plural number. The specimen in a normal state can include
a plurality of types of cells that are morphologically classified through histological
microscopic observation or cytological microscopic observation. The morphological
classification of a cell (also referred to as "cell morphology classification") includes
classification of the type of the cell and classification of the type of abnormal
finding in the cell. Preferably, the analysis target cell is a group of cells that
belong to a predetermined cell lineage. The predetermined cell lineage is a cell group
that belongs to the same lineage differentiated from one type of tissue stem cell.
Preferably, the predetermined cell lineage is cells of the hematopoietic system, and
more preferably, cells in blood (also referred to as "blood cells").
[0055] In a conventional method, a person observes, in a microscopic bright field, a preparation
having been subjected to bright field staining, whereby hematopoietic cells are morphologically
classified. Preferably, the staining is selected from Wright's staining, Giemsa staining,
Wright-Giemsa staining, and May-Giemsa staining. More preferably, the staining is
May-Giemsa staining. The preparation may be any preparation that allows individual
observation of the morphology of each cell belonging to a predetermined cell group.
Examples of the preparation include a smear preparation and an impression preparation.
Preferably, the preparation is a smear preparation using peripheral blood or bone
marrow as a specimen, and more preferably, is a smear preparation of peripheral blood.
[0056] In morphological classification, the type of blood cell includes: neutrophil including
segmented neutrophil and band neutrophil; metamyelocyte; myelocyte; promyelocyte;
blast; lymphocyte; plasma cell; atypical lymphocyte; monocyte; eosinophil; basophil;
erythroblast (which is nucleated erythrocyte and includes proerythroblast, basophilic
erythroblast, polychromatic erythroblast, orthochromatic erythroblast, promegaloblast,
basophilic megaloblast, polychromatic megaloblast, and orthochromatic megaloblast);
platelet; platelet aggregate; megakaryocyte (which is nucleated megakaryocyte and
includes micromegakaryocyte); and the like.
[0057] The predetermined cell group may include abnormal cells that exhibit morphologically
abnormal findings, in addition to normal cells. An abnormality appears as a morphologically
classified cell feature. Examples of abnormal cells are cells that emerge when a person
has a predetermined disease, and are tumor cells and the like. In the case of the
hematopoietic system, the predetermined disease is a disease selected from the group
consisting of: myelodysplastic syndrome; leukemia (including acute myeloblastic leukemia,
acute myeloblastic leukemia (involving neutrophil differentiation), acute promyelocytic
leukemia, acute myelomonocytic leukemia, acute monocytic leukemia, erythroleukemia,
acute megakaryoblastic leukemia, acute myeloid leukemia, acute lymphoblastic leukemia,
lymphoblastic leukemia, chronic myelogenous leukemia, chronic lymphocytic leukemia,
and the like); malignant lymphoma (Hodgkin's lymphoma, non-Hodgkin's lymphoma, and
the like); and multiple myeloma. In the case of the hematopoietic system, the abnormal
finding corresponds to a cell that has at least one type of morphological feature
selected from the group consisting of: nucleus morphology abnormality; presence of
vacuole; granule morphology abnormality; granule distribution abnormality; presence
of abnormal granule; cell size abnormality; presence of inclusion body; and bare nucleus.
[0058] Examples of the nucleus morphology abnormality include: nucleus becoming small; nucleus
becoming large; nucleus becoming hypersegmented; nucleus that should be segmented
in a normal state but has not been segmented (including pseudo-Pelger anomaly and
the like); presence of vacuole; swelled nucleolus; cleaved nucleus; a single cell
that should have one nucleus but has an abnormality of having two; and the like.
[0059] Examples of abnormality in the morphology of an entire cell include: presence of
vacuole in cytoplasm (also referred to as vacuolar degeneration); presence of morphological
abnormality in granule such as megathrombocyte, azurophil granule, neutrophil granule,
eosinophil granule, and basophil granule; presence of abnormality in distribution
(excess, decrease, or disappearance) of the above-mentioned granules; presence of
abnormal granule (for example, toxic granule); cell size abnormality (larger or smaller
than normal cell); presence of inclusion body (Döhle body, Auer rod, and the like);
and bare nucleus.
[0060] Preferably, the abnormal finding is at least one type selected from the group consisting
of: nucleus morphology abnormality; granulation abnormality; cell size abnormality;
cell malformation; cytoclasis; vacuole; immature cell; presence of inclusion body;
Döhle body; satellitism; nucleoreticulum abnormality; petal-like nucleus; increased
N/C ratio; and bleb-like, smudge, and hairy cell-like morphologies.
[0061] Preferably, the nucleus morphology abnormality includes at least one type selected
from hypersegmentation, hyposegmentation, pseudo-Pelger anomaly, ring-shaped nucleus,
spherical nucleus, elliptical nucleus, apoptosis, polynuclearity, karyorrhexis, enucleation,
bare nucleus, irregular nuclear contour, nuclear fragmentation, internuclear bridging,
multiple nuclei, cleaved nucleus, nuclear division, and nucleolus abnormality. The
granulation abnormality includes at least one type selected from degranulation, granule
distribution abnormality, toxic granule, Auer rod, Fagott cell, and pseudo Chediak-Higashi
granule-like granule. Granulation abnormality in eosinophils and basophils includes,
for example, a phenomenon in which granules are unevenly distributed within the cell,
which is treated as abnormal granules. The cell size abnormality includes megathrombocyte.
[0062] Preferably, the type of cell includes at least one type selected from neutrophil,
eosinophil, platelet, lymphocyte, monocyte, and basophil.
[0063] More preferably, the type of cell further includes at least one type selected from
metamyelocyte, myelocyte, promyelocyte, blast, plasma cell, atypical lymphocyte, immature
eosinophil, immature basophil, erythroblast, and megakaryocyte.
[0064] More preferably, the hematopoietic system disease is aplastic anemia or myelodysplastic
syndrome, and when the type of cell is neutrophil, the abnormal finding is at least
one type selected from granulation abnormality and hypersegmentation, or when the
type of cell is eosinophil, the abnormal finding is abnormal granule. The abnormal
finding in a cell includes megathrombocyte. By evaluating these findings, it is possible
to discern between aplastic anemia and myelodysplastic syndrome.
<Outline of support method>
[0065] In the support method, the manners of discerning an abnormal finding and/or discerning
the type of cell are not limited in particular, as long as discerning an abnormal
finding on the basis of an image and/or discerning the type of cell can be realized.
The discerning may be performed by an examiner or may be performed by using a discriminator
described below.
[0066] The outline of the support method using a discriminator is described with reference
to FIG. 1. The discriminator used in the support method includes a computer algorithm.
Preferably, the computer algorithm includes a first computer algorithm and a second
computer algorithm. More preferably, the first computer algorithm includes a plurality
of deep learning algorithms having a neural network structure. The second computer
algorithm includes a machine learning algorithm. Preferably, the deep learning algorithms
include a first neural network 50 for extracting a feature quantity quantitatively
representing a morphological feature of a cell, a second neural network 51 for discerning
the type of abnormal finding in the cell, and/or a second neural network 52 for discerning
the type of the cell. The first neural network 50 extracts a feature quantity of the
cell. The second neural networks 51, 52 are downstream of the first neural network
50, and discern an abnormal finding in the cell or the type of the cell on the basis
of the feature quantity extracted by the first neural network 50. More preferably,
the second neural network 51, 52 may include a neural network trained for discerning
the type of cell, and a plurality of types of neural networks that have been trained
for respective abnormal findings in cells and that correspond to respective abnormal
findings. For example, in FIG. 1, a 1st deep learning algorithm is a deep learning
algorithm for discerning a first abnormal finding (for example, granulation abnormality),
and includes the first neural network 50 and the second neural network 51 trained
for discerning the first abnormal finding. A 1st' deep learning algorithm is a deep
learning algorithm for detecting a second abnormal finding (for example, hypersegmentation),
and includes the first neural network 50 and the second neural network 51 trained
for discerning the second abnormal finding. A 2nd deep learning algorithm is a deep
learning algorithm for discerning the type of cell, and includes the first neural
network 50 and the second neural network 52 trained for discerning the type of cell.
[0067] The machine learning algorithm analyzes, for each specimen, a disease of a subject
from whom the specimen has been collected, on the basis of a feature quantity outputted
from the deep learning algorithm, and outputs, as an analysis result, a disease name
or a label indicating the disease name.
[0068] Next, deep learning training data 75, a method for generating machine learning training
data, and a method for analyzing a disease are described with reference to the examples
shown in FIG. 2 to FIG. 4. In the following, for convenience, description is made
using the first neural network, the second neural network, and a gradient boosting
tree which is a machine learning algorithm.
<Generation of deep learning training data>
[0069] A training image 70 that is used for training a deep learning algorithm is an image
obtained by capturing an analysis target cell contained in a specimen collected from
a subject to whom a disease name has already been given. A plurality of the training
images 70 are captured for one specimen. The analysis target cell included in each
image is associated with the type of cell based on morphological classification and
a result of an abnormal finding discerned by an examiner. Preferably, a preparation
for capturing the training image 70 is created from a specimen containing the same
type of cell as the analysis target cell, by a preparation creating method and a staining
method similar to those employed for a preparation that includes the analysis target
cell. Preferably, the training image 70 is captured in the same condition as that
used for capturing the analysis target cell.
[0070] The training image 70 can be obtained in advance for each cell by using, for example,
a known light microscope or an imaging apparatus such as a virtual slide scanner.
In the example shown in FIG. 2, the training image 70 is generated by reducing a raw
image captured in 360 pixels × 365 pixels by a blood cell differential automatic analyzer
DI-60 (manufactured by Sysmex Corporation) into 255 pixels × 255 pixels. However,
this reduction is not mandatory. The number of pixels of the training image 70 is
not limited in particular as long as analysis can be performed, but the number of
pixels of one side of the image is preferably greater than 100. In the example shown
in FIG. 2, erythrocytes are present around a neutrophil, but the image may be trimmed
such that only the target cell is included in the image. The image can be used as
the training image 70 if it contains at least one cell for which training is to be
performed (erythrocytes and platelets of normal size may also be present) and if the
pixels corresponding to that cell account for about 1/9 of the total pixels of the
image.
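The pixel-count guideline above can be sketched as a simple check; the function name and the boolean-mask representation are illustrative assumptions, not part of the described method:

```python
import numpy as np

def usable_as_training_image(cell_mask):
    """cell_mask: boolean array that is True on pixels of the cell for
    which training is to be performed. Returns True when those pixels
    account for at least about 1/9 of the total pixels."""
    return cell_mask.mean() >= 1.0 / 9.0

# A 100 x 100 target-cell region in a 255 x 255 image:
mask = np.zeros((255, 255), dtype=bool)
mask[60:160, 60:160] = True
print(usable_as_training_image(mask))  # True (10000 / 65025 ≈ 0.154)
```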
[0071] For example, preferably, image capturing by the imaging apparatus is performed in
RGB colors, CMY colors, or the like. Preferably, as for a color image, the darkness/paleness
or brightness of each of primary colors, such as red, green, and blue, or cyan, magenta,
and yellow, is expressed by a 24 bit value (8 bits × 3 colors). It is sufficient that
the training image 70 includes at least one hue, and the darkness/paleness or brightness
of the hue, but more preferably, includes at least two hues and the darkness/paleness
or brightness of each hue. Information including hue and the darkness/paleness or
brightness of the hue is also called tone.
[0072] Next, information of tone of each pixel is converted from, for example, RGB colors
into a format that includes information of brightness and information of hue. Examples
of the format that includes information of brightness and information of hue include
YUV (YCbCr, YPbPr, YIQ, and the like). Here, an example of converting to a YCbCr format
is described. Since the training image is in RGB colors, conversion into brightness
72Y, first hue (for example, bluish color) 72Cb, and second hue (for example, reddish
color) 72Cr is performed. Conversion from RGB to YCbCr can be performed by a known
method. For example, conversion from RGB to YCbCr can be performed according to International
Standard ITU-R BT.601. The brightness 72Y, the first hue 72Cb, and the second hue
72Cr after the conversion can be each expressed as a matrix of gradation values as
shown in FIG. 2 (hereinafter, also referred to as tone matrices 72y, 72cb, 72cr).
The brightness 72Y, the first hue 72Cb, and the second hue 72Cr are each expressed
in 256 gradations, from 0 to 255. Here, instead of the brightness
72Y, the first hue 72Cb, and the second hue 72Cr, the training image may be converted
into the three primary colors of red R, green G, and blue B, or the three primary
colors of pigment of cyan C, magenta M, and yellow Y.
[0073] Next, on the basis of the tone matrices 72y, 72cb, 72cr, for each pixel, tone vector
data 74 is generated by combining three gradation values of the brightness 72y, the
first hue 72cb, and the second hue 72cr.
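As an illustrative sketch (not the patent's implementation), the RGB-to-YCbCr conversion and the per-pixel tone vector data 74 could be computed with NumPy using the full-range ITU-R BT.601 coefficients; the function and variable names are assumptions:

```python
import numpy as np

def rgb_to_tone_vectors(rgb):
    """Convert an RGB image (H x W x 3, uint8) into per-pixel tone vector
    data (brightness Y, first hue Cb, second hue Cr) using the full-range
    ITU-R BT.601 conversion."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    tone = np.stack([y, cb, cr], axis=-1)           # tone matrices 72y, 72cb, 72cr
    tone = np.clip(np.round(tone), 0, 255).astype(np.uint8)
    return tone.reshape(-1, 3)                      # one 3-value tone vector per pixel

# A 255 x 255 training image yields 255 * 255 = 65025 tone vectors.
img = np.zeros((255, 255, 3), dtype=np.uint8)       # placeholder image
vectors = rgb_to_tone_vectors(img)
print(vectors.shape)  # (65025, 3)
```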
[0074] Next, for example, since the training image 70 in FIG. 2 is of a segmented neutrophil,
each tone vector data 74 generated from the training image 70 in FIG. 2 is provided
with "1" as a label value 77 which indicates that the image is of a segmented neutrophil,
whereby the deep learning training data 75 is obtained. In FIG. 2, for convenience,
the deep learning training data 75 is expressed by 3 pixels × 3 pixels. However, in
actuality, the tone vector data exists for each of the pixels obtained when the training
image 70 was captured.
[0075] FIGS. 3A and 3B show an example of the label value 77. A different label value 77
is provided according to the type of cell and the presence or absence of an abnormal
finding in each cell.
<Outline of generation of discriminator>
[0076] Using FIG. 2 as an example, the outline of a method for generating a discriminator
is described. The generation of a discriminator can include training of the deep learning
algorithm and training of the machine learning algorithm.
- Training of deep learning algorithm
[0077] The 1st deep learning algorithm includes the first neural network 50 and the second
neural network 51 in order to generate first information 53, which is information
regarding the type of abnormal finding. The 2nd deep learning algorithm includes the
first neural network 50 and the second neural network 52 in order to generate second
information 54, which is information regarding the type of cell.
[0078] The number of nodes at the input layer 50a of the first neural network 50 corresponds
to the product of the number of pixels of the inputted deep learning training data
75 and the number of brightness and hue components included in the image (in the above
example, three: the brightness 72y, the first hue 72cb, and the second hue 72cr).
The tone vector data 74 is inputted, as a set 76 thereof, to the input
layer 50a of the first neural network 50. The label value 77 of each pixel of the
deep learning training data 75 is inputted to an output layer 50b of the first neural
network, to train the first neural network 50.
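Following the paragraph above, the input-layer size for a 255 × 255 pixel image with three tone components works out as:

```python
# Input-layer node count = number of pixels x number of tone components
# (brightness 72y, first hue 72cb, second hue 72cr), per the example above.
height, width, components = 255, 255, 3
n_input_nodes = height * width * components
print(n_input_nodes)  # 195075
```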
[0079] On the basis of the deep learning training data 75, the first neural network 50 extracts
a feature quantity with respect to a cell feature reflecting the morphological cell
type or abnormal finding described above. The output layer 50b of the first neural
network outputs a result reflecting these feature quantities. Each result outputted
from a softmax function of the output layer 50b of the first neural network 50 is
inputted to an input layer 51a of the second neural network 51 and an input layer
52a of the second neural network 52. Since cells that belong to a predetermined cell
lineage have similar cell morphologies, the second neural networks 51, 52 are trained
so as to be further specialized in discernment of cell features that reflect a morphologically
specific type of cell and a morphologically specific abnormal finding. Thus, the label
value 77 of the deep learning training data 75 is also inputted to output layers 51b,
52b of the second neural network. Reference characters 50c, 51c, and 52c in FIG. 2
represent middle layers. One second neural network 51 can be trained for each abnormal
finding; in other words, as many second neural networks 51 can be trained as there
are types of abnormal findings to be analyzed. In contrast, only one second neural
network 52 for discerning the type of cell is used.
[0080] Preferably, the first neural network 50 is a convolutional neural network,
and the second neural networks 51, 52 are each a fully connected neural network.
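Purely as a dataflow sketch, the way the softmax output of layer 50b feeds the two downstream heads can be illustrated as follows. Random linear maps stand in for the trained networks (the real first neural network is convolutional); all weights and sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-in for the first neural network 50: a linear map from the flattened
# YCbCr image to a 32-dimensional feature quantity (placeholder weights).
W_feat = 0.001 * rng.normal(size=(32, 255 * 255 * 3))

# Stand-ins for the second neural networks (fully connected heads).
W_finding = rng.normal(size=(2, 32))   # presence/absence of one abnormal finding
W_cell = rng.normal(size=(8, 32))      # eight cell types, as in FIG. 4

image = rng.integers(0, 256, size=(255, 255, 3)).astype(np.float64)
features = softmax(W_feat @ image.ravel())   # softmax output at layer 50b
p_finding = softmax(W_finding @ features)    # second neural network 51
p_cell = softmax(W_cell @ features)          # second neural network 52
print(p_finding.shape, p_cell.shape)  # (2,) (8,)
```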
[0081] Accordingly, the 1st deep learning algorithm having the trained first neural network
60 and second neural network 61, and the 2nd deep learning algorithm having the trained
first neural network 60 and second neural network 62 are generated (see FIG. 5).
[0082] For example, the second neural network 61 for discerning an abnormal finding outputs
a probability of the presence or absence of an abnormal finding, as a discernment
result of an abnormal finding. The probability may be given in terms of the name of
an abnormal finding or a label value corresponding to the abnormal finding. The second
neural network 62 for discerning the type of cell outputs, as a discernment result,
a probability that each analysis target cell belongs to each of a plurality of types
of cells inputted as the training data. The probability may be given in terms of the
name of the type of cell or a label value corresponding to the type of cell.
- Training of machine learning algorithm
[0083] Machine learning training data 90 shown in FIG. 4 is used as training data for training
a machine learning algorithm 57. The machine learning training data 90 includes a
feature quantity and disease information 55. For each specimen, the probability of
the abnormal finding and/or of the type of cell, outputted from the deep learning
algorithm, or a value obtained by converting the probability into a cell number, can
be used as the feature quantity to be learned. In the machine learning
training data 90, the feature quantity is associated with disease information expressed
as the name of a disease of the subject from whom the corresponding specimen has been
collected, a label value of the disease, or the like.
[0084] The feature quantity to be inputted to the machine learning algorithm 57 is at least
one of information regarding the type of abnormal finding and information regarding
the type of cell. As the feature quantity, information regarding the type of abnormal
finding and information regarding the type of cell are preferably used. The abnormal
finding to be used as the feature quantity may be of one type or a plurality of types.
The type of cell to be used as the feature quantity may be of one type or a plurality
of types.
[0085] The training image 70 captured from each specimen and used for training the deep
learning algorithm is analyzed by using the trained 1st deep learning algorithm and/or
2nd deep learning algorithm, and the abnormal finding and/or the type of cell is discerned
for the cell in each training image 70. For each cell, a probability of having each
abnormal finding and a label value indicating the abnormal finding are outputted from
the second neural network 61. The probability of having each abnormal finding and
the label value indicating the abnormal finding serve as a discernment result of the
type of abnormal finding. A probability corresponding to each type of cell and a label
value indicating the type of cell are outputted from the second neural network 62.
The probability corresponding to each type of cell and the label value indicating
the type of cell serve as a discernment result of the type of cell. On the basis of
these pieces of information, the feature quantity to be inputted to the machine learning
algorithm 57 is generated.
[0086] FIG. 4 shows an example of the machine learning training data 90. For convenience
of description, FIG. 4 shows an example in which: the cell number is three (cell No.
1 to No. 3); the abnormal finding includes five findings, which are degranulation,
Auer rod, spherical nucleus, hypersegmentation, and megathrombocyte; and the type
of cell includes eight types, which are segmented neutrophil, band neutrophil, lymphocyte,
monocyte, eosinophil, basophil, blast, and platelet. A to F provided above the table
in FIG. 4 represent column numbers of the table. 1 to 19 provided at the left of the
table represent row numbers.
[0087] For each specimen, with respect to each analysis target cell, the first neural network
50 and the second neural network 51 calculate the probability of having each abnormal
finding, and the second neural network 51 outputs the calculated probability. In FIG.
4, the probability of having an abnormal finding is expressed by the numbers from
0 to 1. For example, in the case of having an abnormal finding, the probability can
be expressed by a value close to "1", and in the case of not having an abnormal finding,
the probability can be expressed by a number close to "0". In FIG. 4, the values in
the cells of rows 1 to 5 in columns A to E are values outputted by the second neural
network 51. Next, for each specimen, the sum of the probabilities for each analyzed
type of abnormal finding is calculated. In FIG. 4, the values in the cells of rows
1 to 5 in column F are the sums of the respective abnormal findings. For each specimen,
the group of data in each of which a label indicating the name of an abnormal finding
is associated with the sum per specimen of the probability of having the abnormal
finding is referred to as "information regarding the type of abnormal finding". In
FIG. 2, the information regarding the type of abnormal finding is the first information
53. In FIG. 4, the group of data in which cell B1 is associated with cell F1, cell
B2 is associated with cell F2, cell B3 is associated with cell F3, cell B4 is associated
with cell F4, and cell B5 is associated with cell F5 is the "information regarding
the type of abnormal finding", which serves as the first information 53. The first
information 53 is associated with the disease information 55 expressed as a disease
name or a label value indicating the disease name, to serve as the machine learning
training data 90. In FIG. 4, row 19 indicates the disease information 55.
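The aggregation in FIG. 4 (summing each finding's per-cell probabilities into column F, then attaching the disease label) can be sketched as follows; all probability values and the disease label are invented for illustration:

```python
import numpy as np

# Hypothetical outputs of the second neural network 51 for one specimen:
# rows are abnormal findings (rows 1-5 of FIG. 4), columns are cells No. 1-3.
findings = ["degranulation", "Auer rod", "spherical nucleus",
            "hypersegmentation", "megathrombocyte"]
probs = np.array([
    [0.9, 0.8, 0.1],   # degranulation
    [0.0, 0.1, 0.0],   # Auer rod
    [0.1, 0.0, 0.0],   # spherical nucleus
    [0.0, 0.0, 0.7],   # hypersegmentation
    [0.0, 0.0, 0.0],   # megathrombocyte
])

# First information 53: each finding name paired with the per-specimen sum
# of its probabilities (column F of FIG. 4).
first_information = dict(zip(findings, probs.sum(axis=1)))
print(round(first_information["degranulation"], 1))  # 1.8

# Machine learning training data 90: feature quantity + disease information 55.
training_record = {"features": first_information,
                   "disease": "myelodysplastic syndrome"}  # invented label
```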
[0088] As shown in FIG. 4, with respect to one type of abnormal finding, a probability less
than "1" is indicated in some cases. In such a case, for example, a predetermined
cut off value is determined, and all types of abnormal findings indicating values
lower than the cut off value may be regarded as having a probability of "0". Alternatively,
a predetermined cut off value is determined, and all types of abnormal findings indicating
values higher than the cut off value may be regarded as having a probability of "1".
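The cut off handling above might, for instance, be applied as a simple thresholding step; the cut off value of 0.5 is an arbitrary illustration:

```python
import numpy as np

# Hypothetical per-cell probabilities for one abnormal finding.
probs = np.array([0.9, 0.3, 0.7, 0.1])
cutoff = 0.5  # arbitrary illustrative cut off value

# Findings below the cut off are regarded as probability "0";
# those above it are here set to "1", as described above.
binarized = np.where(probs >= cutoff, 1.0, 0.0)
print(binarized)  # [1. 0. 1. 0.]
```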
[0089] Here, the probability for each type of abnormal finding may be expressed as a cell
number for each type of abnormal finding.
[0090] Also with respect to the type of cell, for each analysis target cell, the first neural
network 50 and the second neural network 52 calculate a probability corresponding
to each type of cell, and the second neural network 52 outputs the calculated probability.
The probability corresponding to each type of cell is calculated for all the types
of cells which are the analysis targets of the first neural network 50 and the second
neural network 52. In the example shown in FIG. 4, for one analysis target cell, with
respect to all the items of segmented neutrophil, band neutrophil, lymphocyte, monocyte,
eosinophil, basophil, blast, and platelet, a probability corresponding to each type
of cell is calculated. The values in the cells of rows 6 to 13 in columns A to E are
values outputted by the second neural network 52. Next, for each specimen, the sum
of the probabilities for each analyzed type of cell is calculated. In FIG. 4, the
values in the cells of rows 6 to 13 in column F are the sums of the respective types
of cells. The group of data in each of which a label indicating the name of the cell
type is associated with the sum per specimen of the probability corresponding to the
cell type is referred to as "information regarding the type of cell". The information
regarding the type of cell is the second information 54 in FIG. 2. In FIG. 4, the
group of data in which cell B6 is associated with cell F6, cell B7 is associated with
cell F7, cell B8 is associated with cell F8, cell B9 is associated with cell F9, cell
B10 is associated with cell F10, cell B11 is associated with cell F11, cell B12 is
associated with cell F12, and cell B13 is associated with cell F13 is "information
regarding the type of cell", which serves as the second information 54. The second
information 54 is associated with the disease information 55 expressed as a disease
name or a label value indicating the disease name, to serve as the machine learning
training data 90. In FIG. 4, row 19 indicates the disease information 55.
[0091] Here, the probability for each type of cell may be expressed as a cell number for
each type of cell. As shown in FIG. 4, with respect to one analysis target cell, a
probability higher than 0 is indicated for a plurality of items of type of cell in
some cases. In such a case, for example, a predetermined cut off value is determined,
and all types of cells indicating values lower than the cut off value may be regarded
as having a probability of "0". Alternatively, a predetermined cut off value is determined,
and all types of cells indicating values higher than the cut off value may be regarded
as having a probability of "1".
[0092] Further, a preferable feature quantity is information regarding the type of abnormal
finding obtained for each type of cell. With reference to FIG. 4, for example, as
shown in row 14 to row 18 which indicate degranulation of neutrophil (cell B14), Auer
rod of blast (cell B15), spherical nucleus of neutrophil (cell B16), hypersegmentation
of neutrophil (cell B17), and megathrombocyte (cell B18), when generating the deep
learning training data 75, a specific type of cell and a specific type of abnormal
finding are associated with each other, and training is performed. The feature quantity
is generated in the same manner as that for the type of abnormal finding not associated
with the type of cell. The information regarding the type of abnormal finding obtained
for each type of cell is referred to as third information. The third information is
associated with the disease information 55 expressed as a disease name or a label
value indicating the disease name, to serve as the machine learning training data
90.
[0093] The machine learning training data 90 is inputted to the machine learning algorithm
57, to train the machine learning algorithm 57, whereby a trained machine learning
algorithm 67 (see FIG. 5) is generated.
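A minimal sketch of this training step, using scikit-learn's gradient boosting implementation as a stand-in for machine learning algorithm 57; the feature values, labels, and hyperparameters are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical machine learning training data 90. Rows: specimens. Columns:
# invented per-specimen feature quantities, e.g. summed probabilities of
# [degranulation, hypersegmentation, megathrombocyte].
X = np.array([
    [5.2, 3.1, 0.2],
    [0.1, 0.2, 0.0],
    [4.8, 2.7, 0.4],
    [0.3, 0.1, 0.1],
])
y = np.array([1, 0, 1, 0])  # invented labels: 1 = MDS, 0 = aplastic anemia

# Training machine learning algorithm 57 yields trained algorithm 67.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)
print(model.predict([[5.0, 3.0, 0.3]]))  # [1]
```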
[0094] Preferably, a training method for the machine learning algorithm 57 uses at least
one of: the machine learning training data 90 in which the first information is associated
with the disease information 55; the machine learning training data 90 in which the
second information is associated with the disease information 55; and the machine
learning training data 90 in which the third information is associated with the disease
information 55. More preferably, the training method uses the machine learning training
data 90 in which the first information is associated with the disease information
55 and the machine learning training data 90 in which the second information is associated
with the disease information 55, or uses the machine learning training data 90 in
which the third information is associated with the disease information 55 and the
machine learning training data 90 in which the second information is associated with
the disease information 55. Most preferably, in the training method, both of the machine
learning training data 90 in which the second information 54 is associated with the
disease information 55 expressed as a disease name or a label value indicating the
disease name, and the machine learning training data 90 in which the third information
is associated with the disease information 55 expressed as a disease name or a label
value indicating the disease name are inputted as training data to the machine learning
algorithm 57. In this case, the types of cells in the second information 54 and the
types of cells associated with the third information may be the same as or different
from each other.
[0095] The machine learning algorithm may be any machine learning algorithm that can analyze
a disease on the basis of the feature quantity described above. For example, the machine
learning algorithm can be selected from regression, tree, neural network, Bayes, time
series model, clustering, and ensemble learning.
[0096] The regression can include linear regression, logistic regression, support vector
machine, and the like. The tree can include gradient boosting tree, decision tree,
regression tree, random forest, and the like. The neural network can include perceptron,
convolution neural network, recurrent neural network, residual network, and the like.
The time series model can include moving average, auto regression, autoregressive
moving average, autoregressive integrated moving average, and the like. The clustering
can include k-nearest-neighbor. The ensemble learning can include boosting, bagging,
and the like. Gradient boosting tree is preferable.
<Support method for disease analysis>
[0097] FIG. 5 shows an example of the support method for disease analysis. In the support
method, analysis data 81 is generated from an analysis image 78 obtained by capturing
an analysis target cell. The analysis image 78 is an image obtained by capturing an
analysis target cell contained in a specimen collected from a subject. The analysis
image 78 can be obtained by using, for example, a known light microscope or a known
imaging apparatus such as a virtual slide scanner. In the example shown in FIG. 5,
similar to the training image 70, the analysis image 78 is generated by reducing a
raw image captured in 360 pixels × 365 pixels by a blood cell differential automatic
analyzer DI-60 (manufactured by Sysmex Corporation) into 255 pixels × 255 pixels.
However, this reduction is not mandatory. The number of pixels of the analysis image
78 is not limited in particular as long as analysis can be performed, but the number
of pixels of one side of the image is preferably greater than 100. In the example
shown in FIG. 5, erythrocytes are present around a segmented neutrophil, but the image
may be trimmed such that only the target cell is included in the image. The image
can be used as the analysis image 78 if it contains at least one cell to be analyzed
(erythrocytes and platelets of normal size may also be present) and if the pixels
corresponding to that cell account for about 1/9 of the total pixels of the image.
[0098] For example, preferably, image capturing by the imaging apparatus is performed in
RGB colors, CMY colors, or the like. Preferably, as for a color image, the darkness/paleness
or brightness of each of primary colors, such as red, green, and blue, or cyan, magenta,
and yellow, is expressed by a 24 bit value (8 bits × 3 colors). It is sufficient that
the analysis image 78 includes at least one hue, and the darkness/paleness or brightness
of the hue, but more preferably, includes at least two hues and the darkness/paleness
or brightness of each hue. Information including hue and the darkness/paleness or
brightness of the hue is also called tone.
[0099] For example, the format of RGB colors is converted into a format that includes information
of brightness and information of hue. Examples of the format that includes information
of brightness and information of hue include YUV (YCbCr, YPbPr, YIQ, and the like).
Here, an example of converting to a YCbCr format is described. Since the analysis
image is in RGB colors, conversion into brightness 79Y, first hue (for example, bluish
color) 79Cb, and second hue (for example, reddish color) 79Cr is performed. Conversion
from RGB to YCbCr can be performed by a known method. For example, conversion from
RGB to YCbCr can be performed according to International Standard ITU-R BT.601. The
brightness 79Y, the first hue 79Cb, and the second hue 79Cr after the conversion can
be each expressed as a matrix of gradation values as shown in FIG. 5 (hereinafter,
also referred to as tone matrices 79y, 79cb, 79cr). The brightness 79Y, the first
hue 79Cb, and the second hue 79Cr are each expressed in 256 gradations, from 0 to
255. Here, instead of the brightness 79Y, the first hue 79Cb, and
the second hue 79Cr, the analysis image may be converted into the three primary colors
of red R, green G, and blue B, or the three primary colors of pigment of cyan C, magenta
M, and yellow Y.
[0100] Next, on the basis of the tone matrices 79y, 79cb, 79cr, for each pixel, tone vector
data 80 is generated by combining three gradation values of the brightness 79y, the
first hue 79cb, and the second hue 79cr. A set of the tone vector data 80 generated
from one analysis image 78 is generated as the analysis data 81.
[0101] Preferably, the analysis data 81 and the deep learning training data 75 are generated
under at least the same image capture conditions and the same conditions for generating
the vector data that is inputted from each image to a neural network.
[0102] The 1st deep learning algorithm includes the first neural network 60 and the second
neural network 61 in order to generate first information 63 which is information regarding
the type of abnormal finding. The 2nd deep learning algorithm includes the first neural
network 60 and the second neural network 62 in order to generate second information
64, which is information regarding the type of cell.
[0103] The analysis data 81 is inputted to an input layer 60a of the trained first neural
network 60. The first neural network 60 extracts a feature quantity of the cell from
the analysis data 81, and outputs the result from an output layer 60b of the first
neural network 60. Each result outputted from a softmax function of the output layer
60b of the first neural network 60 is inputted to an input layer 61a of the second
neural network 61 and an input layer 62a of the second neural network 62.
[0104] Next, the result outputted from the output layer 60b is inputted to the input layer
61a of the trained second neural network 61. For example, on the basis of the inputted
feature quantity, the second neural network 61 for discerning an abnormal finding
outputs, from an output layer 61b, a probability of the presence or absence of an
abnormal finding, as a discernment result of an abnormal finding.
[0105] Meanwhile, the result outputted from the output layer 60b is inputted to the input
layer 62a of the trained second neural network 62. On the basis of the inputted feature
quantity, the second neural network 62 outputs, from an output layer 62b, a probability
that the analysis target cell included in the analysis image belongs to each of the
types of cells inputted as the training data. In FIG. 5, reference characters 60c,
61c, 62c represent middle layers.
[0106] Next, on the basis of the discernment result of the abnormal finding, for each specimen, information
regarding the type of abnormal finding (the first information 63 in FIG. 5) corresponding
to the specimen is obtained. For example, the first information 63 is the sum of the
probabilities for each analyzed type of abnormal finding outputted from the output
layer 61b of the second neural network 61. The generation method for the first information
63 is the same as the generation method for the machine learning training data.
[0107] Meanwhile, on the basis of the discernment result of the type of cell, for each specimen,
information regarding the type of cell (the second information 64 in FIG. 5) corresponding
to the specimen is obtained. For example, the second information 64 is the sum of
the probabilities for each analyzed type of cell outputted from the output layer 62b
of the second neural network 62. The generation method for the second information
64 is the same as the generation method for the machine learning training data 90.
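A minimal sketch of how the first information 63 (and, analogously, the second information 64) can be obtained as a per-specimen sum of per-cell probabilities; the probability values below are invented for illustration:

```python
# Per-cell probabilities over, e.g., three abnormal-finding types, as output
# from the output layer 61b (the values here are invented for illustration).
cells = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]

def specimen_summary(per_cell_probs):
    # Sum each type's probability over all cells of the specimen.
    n_types = len(per_cell_probs[0])
    return [sum(cell[k] for cell in per_cell_probs) for k in range(n_types)]

first_information = specimen_summary(cells)
```

Because each cell's probabilities sum to 1, the summary entries sum to the number of cells analyzed for the specimen.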
[0108] When the generated first information 63 and second information 64 are inputted to
the trained machine learning algorithm 67, an analysis result 83 is generated by the
machine learning algorithm 67. The analysis result 83 can be a disease name or a label
value indicating the disease name.
[0109] Preferably, as the data inputted to the machine learning algorithm 67, at least one
of the first information 63, the second information 64, and the third information
can be used. More preferably, the first information 63 and the second information
64 can be used, or the third information and the second information 64 can be used.
Most preferably, both of the second information 64 and the third information are used
as the analysis data 81. In this case, the types of cells in the second information
64 and the types of cells associated with the third information may be the same as
or different from each other. The third information is information that is generated
by associating a specific type of cell with a specific type of abnormal finding when
generating the analysis data 81, and the generation method therefor is the same as
the method described in the generation method for the machine learning training data
90.
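One plausible way to assemble the data inputted to the machine learning algorithm 67 from the specimen-level summaries — here the preferred combination of the second information 64 and the third information — is simple concatenation into one vector. The block lengths and values are hypothetical:

```python
# Hypothetical specimen-level summaries; lengths and values are made up.
second_information = [2.0, 0.6, 0.4]      # per morphological cell type
third_information = [0.9, 0.1, 0.0, 0.2]  # cell type / abnormal finding pairs

def build_analysis_input(*blocks):
    # The machine learning algorithm 67 receives one flat vector per specimen.
    v = []
    for block in blocks:
        v.extend(block)
    return v

x = build_analysis_input(second_information, third_information)
```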
[Disease analysis support system 1]
<Configuration of disease analysis support system 1>
[0110] A disease analysis support system 1 is described. With reference to FIG. 6, the disease
analysis support system 1 includes a training apparatus 100A and a disease analyzer
200A. A vendor-side apparatus 100 operates as the training apparatus 100A, and a user-side
apparatus 200 operates as the disease analyzer 200A. The training apparatus 100A generates
a discriminator by using the deep learning training data 75 and the machine learning
training data 90, and provides the discriminator to a user. The discriminator is provided
from the training apparatus 100A to the disease analyzer 200A through a storage medium
98 or a network 99. The disease analyzer 200A performs analysis of an image of an
analysis target cell, using the discriminator provided from the training apparatus
100A.
[0111] The training apparatus 100A is implemented as a general purpose computer, for example,
and performs a deep learning process on the basis of a flow chart described later.
The disease analyzer 200A is implemented as a general purpose computer, for example,
and performs a disease analysis process on the basis of a flow chart described later.
The storage medium 98 is a computer-readable non-transitory tangible storage medium
such as a DVD-ROM or a USB memory, for example.
[0112] The training apparatus 100A is connected to an imaging apparatus 300. The imaging
apparatus 300 includes an image pickup device 301 and a fluorescence microscope 302,
and captures a bright field image of a training preparation 308 set on a stage 309.
The training preparation 308 has been subjected to the staining described above. The
training apparatus 100A obtains the training image 70 captured by the imaging apparatus
300.
[0113] The disease analyzer 200A is connected to an imaging apparatus 400. The imaging apparatus
400 includes an image pickup device 401 and a fluorescence microscope 402, and captures
a bright field image of an analysis target preparation 408 set on a stage 409. The
analysis target preparation 408 has been subjected to staining in advance as described
above. The disease analyzer 200A obtains the analysis target image 78 captured by
the imaging apparatus 400.
[0114] As the imaging apparatus 300, 400, a known light microscope, a known virtual slide
scanner, or the like that has a function of capturing a preparation can be used.
<Hardware configuration of training apparatus>
[0115] With reference to FIG. 7, the vendor-side apparatus 100 (the training apparatus 100A)
includes a processing unit 10 (10A), an input unit 16, and an output unit 17.
[0116] The processing unit 10 includes: a CPU (Central Processing Unit) 11 which performs
data processing described later; a memory 12 to be used as a work area for data processing;
a storage unit 13 which stores therein a program and process data described later;
a bus 14 which transmits data between units; an interface unit 15 which inputs/outputs
data with respect to an external apparatus; and a GPU (Graphics Processing Unit) 19.
The input unit 16 and the output unit 17 are connected to the processing unit 10.
For example, the input unit 16 is an input device such as a keyboard or a mouse, and
the output unit 17 is a display device such as a liquid crystal display. The GPU 19
functions as an accelerator that assists arithmetic processing (for example, parallel
arithmetic processing) performed by the CPU 11. That is, the processing performed
by the CPU 11 described below also includes processing performed by the CPU 11 using
the GPU 19 as an accelerator.
[0117] In order to perform the process of each step described below with reference to FIG.
10, FIG. 13, and FIG. 14, the processing unit 10 has previously stored, in the storage
unit 13, a program and a discriminator according to the present disclosure in an executable
form, for example. The executable form is a form generated through conversion of a
programming language by a compiler, for example. The processing unit 10 uses the program
stored in the storage unit 13, to perform training processes on the first neural network
50, the second neural network 51, the second neural network 52, and the machine learning
algorithm 57.
[0118] In the description below, unless otherwise specified, the process performed by the
processing unit 10 means a process performed by the CPU 11 on the basis of the program
stored in the storage unit 13 or the memory 12, as well as the first neural network
50, the second neural network 51, the second neural network 52, and the machine learning
algorithm 57. The CPU 11 temporarily stores, in a volatile manner, necessary data
(such as intermediate data being processed) using the memory 12 as a work area, and
stores as appropriate in the storage unit 13, data to be saved for a long time such
as calculation results, in a nonvolatile manner.
<Hardware configuration of disease analyzer>
[0119] With reference to FIG. 8, the user-side apparatus 200 (disease analyzer 200A, disease
analyzer 200B, disease analyzer 200C) includes a processing unit 20 (20A, 20B, 20C),
an input unit 26, and an output unit 27.
[0120] The processing unit 20 includes: a CPU (Central Processing Unit) 21 which performs
data processing described later; a memory 22 to be used as a work area for data processing;
a storage unit 23 which stores therein a program and process data described later;
a bus 24 which transmits data between units; an interface unit 25 which inputs/outputs
data with respect to an external apparatus; and a GPU (Graphics Processing Unit) 29.
The input unit 26 and the output unit 27 are connected to the processing unit 20.
For example, the input unit 26 is an input device such as a keyboard or a mouse, and
the output unit 27 is a display device such as a liquid crystal display. The GPU 29
functions as an accelerator that assists arithmetic processing (for example, parallel
arithmetic processing) performed by the CPU 21. That is, the processing performed
by the CPU 21 described below also includes processing performed by the CPU 21 using
the GPU 29 as an accelerator.
[0121] In order to perform the process of each step described in the disease analysis process
below, the processing unit 20 has previously stored, in the storage unit 23, a program
and the discriminator according to the present disclosure in an executable form, for
example. The executable form is a form generated through conversion of a programming
language by a compiler, for example. The processing unit 20 uses the program and the
discriminator stored in the storage unit 23, to perform a process.
[0122] In the description below, unless otherwise specified, the process performed by the
processing unit 20 means a process performed, in actuality, by the CPU 21 of the processing
unit 20 on the basis of the program and the deep learning algorithm 60 stored in the
storage unit 23 or the memory 22. The CPU 21 temporarily stores, in a volatile manner,
necessary data (such as intermediate data being processed) using the memory 22 as
a work area, and stores as appropriate in the storage unit 23, data to be saved for
a long time such as calculation results, in a nonvolatile manner.
<Function block and processing procedure>
(Deep learning process)
[0123] With reference to FIG. 9, the processing unit 10A of the training apparatus 100A
includes a deep learning training data generation unit 101, a deep learning training
data input unit 102, and a deep learning algorithm update unit 103. These function
blocks are realized when: a program for causing a computer to execute the deep learning
process is installed in the storage unit 13 or the memory 12 of the processing unit
10A; and the program is executed by the CPU 11. A deep learning training data database
(DB) 104 and a deep learning algorithm database (DB) 105 are stored in the storage
unit 13 or the memory 12 of the processing unit 10A.
[0124] Each training image 70 is captured in advance by the imaging apparatus 300 and is
stored in advance in the storage unit 13 or the memory 12 of the processing unit 10A,
in association with the morphological type of cell or abnormal finding to which an
analysis target cell belongs, for example. The first neural network 50, and the second
neural network 51 and the second neural network 52 that have not been trained are
stored in the deep learning training data database 104 in advance. The first neural
network 50, and the second neural network 51 and the second neural network 52 that
have been trained once and are to be updated are stored in the deep learning algorithm
database 105 in advance.
[0125] The processing unit 10A of the training apparatus 100A performs the process shown
in FIG. 10. With reference to the function blocks shown in FIG. 9, the processes of
steps S11, S12, S16, and S17 are performed by the deep learning training data generation
unit 101. The process of step S13 is performed by the deep learning training data
input unit 102. The process of step S14 is performed by the deep learning algorithm
update unit 103.
[0126] An example of the deep learning process performed by the processing unit 10A is described
with reference to FIG. 10. First, the processing unit 10A obtains training images
70. Each training image 70 is obtained via the I/F unit 15 through an operation by
an operator, from the imaging apparatus 300, from the storage medium 98, or via a
network. When the training image 70 is obtained, information regarding which of the
morphologically classified cell type and/or abnormal finding is indicated by the training
image 70 is also obtained. The information regarding which of the morphologically
classified cell type and/or abnormal finding is indicated may be associated with the
training image 70, or may be inputted by the operator through the input unit 16.
[0127] In step S11, the processing unit 10A converts the obtained training image 70 into
brightness Y, first hue Cb, and second hue Cr, and generates tone vector data 74 in
accordance with the procedure described in the training data generation method above.
[0128] In step S12, the processing unit 10A provides a label value that corresponds to the
tone vector data 74, on the basis of: the information associated with the training
image 70 and regarding which of the morphologically classified cell type and/or abnormal
finding is indicated; and the label value associated with the morphologically classified
cell type or abnormal finding stored in the memory 12 or the storage unit 13. In this
manner, the processing unit 10A generates the deep learning training data 75.
[0129] In step S13 shown in FIG. 10, the processing unit 10A trains the first neural network
50 and the second neural network 51 by using the deep learning training data 75. The
training results of the first neural network 50 and the second neural network 51 are
accumulated every time training is performed by using a plurality of the deep learning
training data 75.
[0130] Next, in step S14, the processing unit 10A determines whether or not training results
of a previously-set predetermined number of trials have been accumulated. When the
training results of the predetermined number of trials have been accumulated (YES),
the processing unit 10A advances to the process of step S15, and when the training
results of the predetermined number of trials have not been accumulated (NO), the
processing unit 10A advances to the process of step S16.
[0131] Next, when the training results of the predetermined number of trials have been accumulated,
the processing unit 10A updates, in step S15, connection weights w of the first neural
network 50 and the second neural network 51, or of the first neural network 50 and
the second neural network 52, by using the training results accumulated in step S13.
In the disease analysis method, the stochastic gradient descent method is used. Thus,
the connection weights w of the first neural network 50 and the second neural network
51, or of the first neural network 50 and the second neural network 52 are updated
at a stage where learning results of the predetermined number of trials have been
accumulated. Specifically, the process of updating the connection weights w is a process
of performing calculation according to the gradient descent method, expressed in Formula
11 and Formula 12 described later.
[0132] In step S16, the processing unit 10A determines whether or not the first neural network
50 and the second neural network 51 or the first neural network 50 and the second
neural network 52 have been trained using a prescribed number of pieces of training
data 75. When the training has been performed using the prescribed number of pieces
of training data 75 (YES), the deep learning process ends.
[0133] When the first neural network 50 and the second neural network 51 or the first neural
network 50 and the second neural network 52 have not been trained using the prescribed
number of pieces of training data 75 (NO), the processing unit 10A advances from step
S16 to step S17, and performs the processes from step S11 to step S16 with respect
to the next training image 70.
[0134] In accordance with the process described above, the first neural network 50 and the
second neural network 51 or the first neural network 50 and the second neural network
52 are trained, whereby the 1st deep learning algorithm and the 2nd deep learning
algorithm are obtained.
(Structure of neural network)
[0135] FIG. 11A shows an example of the structure of the first neural network 50 and the
second neural networks 51, 52. The first neural network 50 and the second neural networks
51, 52 include: the input layers 50a, 51a, 52a; the output layers 50b, 51b, 52b; and
the middle layers 50c, 51c, 52c between the input layers 50a, 51a, 52a and the output
layers 50b, 51b, 52b. Each middle layer 50c, 51c, 52c is composed of a plurality of
layers. The number of layers forming the middle layers 50c, 51c, 52c can be 5 or greater,
for example.
[0136] In the first neural network 50 and the second neural network 51, or in the first
neural network 50 and the second neural network 52, a plurality of nodes 89 arranged
in a layered manner are connected between the layers. Accordingly, information propagates
only in one direction indicated by the arrow D in the figure, from the input-side
layer 50a, 51a, 52a to the output-side layer 50b, 51b, 52b.
(Calculation at each node)
[0137] FIG. 11B is a schematic diagram showing calculation performed at each node. Each
node 89 receives a plurality of inputs and calculates one output (z). In the case
of the example shown in FIG. 11B, the node 89 receives four inputs. The total input
(u) received by the node 89 is expressed by Formula 1 below.
[Math 1]
u = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b ... (Formula 1)
[0138] Each input is multiplied by a different weight. In Formula 1, b is a value called
bias. The output (z) of the node serves as an output of a predetermined function f
with respect to the total input (u) expressed by Formula 1, and is expressed by Formula
2 below. The function f is called an activation function.
[Math 2]
z = f(u) ... (Formula 2)
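Formula 1 and Formula 2 together describe one node: a weighted sum of the inputs plus a bias, passed through an activation function. A minimal sketch (the weights, inputs, and bias are chosen arbitrarily):

```python
def node_output(weights, x, b, f):
    # Formula 1: total input u is the weighted sum of the inputs plus bias b.
    u = sum(w * xi for w, xi in zip(weights, x)) + b
    # Formula 2: the node's output is the activation function applied to u.
    return f(u)

def relu(u):
    return max(u, 0.0)

# Four inputs, as in the node 89 of FIG. 11B (numbers chosen arbitrarily).
z = node_output([0.5, -1.0, 0.25, 2.0], [1.0, 1.0, 4.0, 0.5], 0.1, relu)
```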
[0139] FIG. 11C is a schematic diagram illustrating calculation between nodes. In the first
neural network 50 and the second neural networks 51, 52, with respect to the total
input (u) expressed by Formula 1, nodes that output results (z) each expressed by
Formula 2 are arranged in a layered manner. Outputs from the nodes of the previous
layer serve as inputs to the nodes of the next layer. In the example shown in FIG.
11C, the outputs from nodes 89a in the left layer in the figure serve as inputs to
nodes 89b in the right layer. Each node 89b in the right layer receives outputs from
the respective nodes 89a in the left layer. The connection between each node 89a in
the left layer and each node 89b in the right layer is multiplied by a different weight.
When the respective outputs from the plurality of nodes 89a of the left layer are
defined as x_1 to x_4, the inputs to the respective three nodes 89b in the right layer
are expressed by Formula 3-1 to Formula 3-3 below.
[Math 3]
u_1 = w_11x_1 + w_12x_2 + w_13x_3 + w_14x_4 + b_1 ... (Formula 3-1)
u_2 = w_21x_1 + w_22x_2 + w_23x_3 + w_24x_4 + b_2 ... (Formula 3-2)
u_3 = w_31x_1 + w_32x_2 + w_33x_3 + w_34x_4 + b_3 ... (Formula 3-3)
[0140] When Formula 3-1 to Formula 3-3 are generalized, Formula 3-4 is obtained. Here,
i=1, ···, I and j=1, ···, J.
[Math 4]
u_j = Σ_{i=1}^{I} w_jix_i + b_j ... (Formula 3-4)
When Formula 3-4 is applied to an activation function, an output is obtained. The
output is expressed by Formula 4 below.
[Math 5]
z_j = f(u_j) ... (Formula 4)
(Activation function)
[0141] In the disease analysis method, a rectified linear unit function is used as the activation
function. The rectified linear unit function is expressed by Formula 5 below.
[Math 6]
f(u) = max(u, 0) ... (Formula 5)
Formula 5 is the linear function z=u with its part u<0 replaced by z=0. In the example
shown in FIG. 11C, using Formula 5, the output from the node of j=1 is expressed by
the formula below.
[Math 7]
z_1 = max(w_11x_1 + w_12x_2 + w_13x_3 + w_14x_4 + b_1, 0)
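The calculation of FIG. 11C with the rectified linear unit of Formula 5 can be sketched as one small layer; the weight matrix and bias values below are arbitrary examples:

```python
def relu(u):
    # Formula 5: f(u) = max(u, 0).
    return max(u, 0.0)

def layer(W, b, x):
    # Formulas 3-1 to 3-3 (one total input per right-layer node),
    # each followed by the activation of Formula 5.
    return [relu(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

# Four left-layer outputs feeding three right-layer nodes, as in FIG. 11C
# (the weights and biases are arbitrary examples).
W = [[1.0, -2.0, 0.5, 0.0],
     [0.0, 1.0, 1.0, -1.0],
     [-1.0, 0.0, 0.0, 0.25]]
b = [0.5, 0.5, -1.0]
z = layer(W, b, [1.0, 1.0, 2.0, 4.0])
```

Nodes whose total input is negative output 0, which is the characteristic behavior of the rectified linear unit.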
(Neural network learning)
[0142] If the function expressed by use of a neural network is defined as y(x:w), the function
y(x:w) varies when a parameter w of the neural network is varied. Adjusting the function
y(x:w) such that the neural network selects a more suitable parameter w with respect
to the input x is referred to as neural network learning. It is assumed that a plurality
of pairs of an input and an output of the function expressed by use of the neural
network have been given. If a desirable output for an input x is defined as d, the
pairs of the input/output are given as {(x_1, d_1), (x_2, d_2), ···, (x_N, d_N)}. The
set of pairs each expressed as (x, d) is referred to as training data. Specifically,
the set of pairs of a label of the true value image and a color density value for
each pixel in a single color image of each color, Y, Cb, or Cr shown in FIG. 2 is
the training data in FIG. 2.
[0143] The neural network learning means adjusting the weight w such that, with respect
to any input/output pair (x_n, d_n), the output y(x_n:w) of the neural network given
the input x_n becomes as close as possible to the target output d_n. An error function
is a function for measuring the closeness between the training data and the function
expressed by use of the neural network. The error function is also called a loss
function. The error function E(w) used in the disease analysis method is expressed
by Formula 6 below. Formula 6 is called cross entropy.
[Math 9]
E(w) = -Σ_{n=1}^{N} Σ_{k=1}^{K} d_nk log y_k(x_n; w) ... (Formula 6)
[0144] A method for calculating the cross entropy in Formula 6 is described. In the output
layer 50b of the neural network 50 to be used in the disease analysis method, i.e.,
in the last layer of the neural network, an activation function is used that classifies
inputs x into a finite number of classes according to the contents. The activation
function is called a softmax function and is expressed by Formula 7 below. It is assumed
that, in the output layer 50b, nodes are arranged in the same number as the number of
classes K, and that the total input u of each node k (k=1, ···, K) of the output layer
L is given as u_k^(L) from the outputs of the previous layer L-1. Accordingly, the
output of the k-th node in the output layer is expressed by Formula 7 below.
[Math 10]
y_k = exp(u_k^(L)) / Σ_{j=1}^{K} exp(u_j^(L)) ... (Formula 7)
[0145] Formula 7 is the softmax function. The sum of the outputs y_1, ···, y_K determined
by Formula 7 is always 1.
[0146] When the classes are expressed as C_1, ···, C_K, the output y_k of node k in the
output layer L represents the probability that a given input x belongs to class C_k,
as expressed by Formula 8 below. The input x is classified into the class for which
the probability expressed by Formula 8 is largest.
[Math 11]
p(C_k|x) = y_k ... (Formula 8)
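Formula 7 and Formula 8 can be sketched as follows; the input vector u is an arbitrary example:

```python
import math

def softmax(u):
    # Formula 7; subtracting max(u) keeps exp() from overflowing.
    m = max(u)
    e = [math.exp(v - m) for v in u]
    s = sum(e)
    return [v / s for v in e]

u = [2.0, 1.0, 0.1]                # arbitrary total inputs u_k of the output layer
y = softmax(u)
predicted_class = y.index(max(y))  # Formula 8: pick the class of largest probability
```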
[0147] In the neural network learning, a function expressed by the neural network is considered
as a model of the posterior probability of each class, the likelihood of the weight
w with respect to the training data is evaluated under such a probability model, and
a weight w that maximizes the likelihood is selected.
[0148] It is assumed that the target output d_n for the softmax function of Formula 7
is 1 only for the correct class, and is 0 otherwise. When the target output is expressed
in a vector format d_n = [d_n1, ···, d_nK], if, for example, the correct class for
input x_n is C_3, only the target output d_n3 becomes 1, and the other target outputs
become 0. When coding is performed in this manner, the posterior distribution is
expressed by Formula 9 below.
[Math 12]
p(d|x) = Π_{k=1}^{K} p(C_k|x)^(d_k) ... (Formula 9)
[0149] The likelihood L(w) of weight w with respect to the training data {(x_n, d_n)}
(n=1, ···, N) is expressed by Formula 10 below. When the logarithm of likelihood L(w)
is taken and the sign is inverted, the error function of Formula 6 is derived.
[Math 13]
L(w) = Π_{n=1}^{N} p(d_n|x_n; w) = Π_{n=1}^{N} Π_{k=1}^{K} (y_k(x_n; w))^(d_nk) ... (Formula 10)
Learning means minimizing error function E(w) calculated on the basis of the training
data, with respect to parameter w of the neural network. In the disease analysis method,
error function E(w) is expressed by Formula 6.
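The cross entropy of Formula 6 can be sketched directly from its definition; the probability vectors below are invented:

```python
import math

def cross_entropy(batch):
    # Formula 6: E(w) = -sum_n sum_k d_nk * log y_k(x_n; w).
    # Each batch item pairs the network outputs ys with the one-hot targets ds;
    # terms with d_nk = 0 are skipped, so log(0) is never evaluated.
    return -sum(d * math.log(y)
                for ys, ds in batch
                for y, d in zip(ys, ds) if d)

batch = [([0.7, 0.2, 0.1], [1, 0, 0]),   # correct class predicted with p=0.7
         ([0.1, 0.8, 0.1], [0, 1, 0])]   # correct class predicted with p=0.8
e = cross_entropy(batch)
```

The error is 0 only when each correct class is predicted with probability 1, and grows as the predicted probability of the correct class shrinks.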
[0150] Minimizing error function E(w) with respect to parameter w has the same meaning as
finding a local minimum point of function E(w). Parameter w is a weight of connection
between nodes. The local minimum point of weight w is obtained by iterative calculation
of repeatedly updating parameter w from an arbitrary initial value as a starting point.
An example of such calculation is the gradient descent method.
[0151] In the gradient descent method, a vector expressed by Formula 11 below is used.
[Math 14]
∇E = ∂E/∂w = (∂E/∂w_1, ∂E/∂w_2, ···, ∂E/∂w_M) ... (Formula 11)
[0152] In the gradient descent method, a process of moving the value of the current
parameter w in the negative gradient direction (i.e., -∇E) is repeated many times. If
it is assumed that w^(t) is the current weight and that w^(t+1) is the weight after
the movement, the calculation according to the gradient descent method is expressed
by Formula 12 below. The value t means the number of times the parameter w has been
moved.
[Math 15]
w^(t+1) = w^(t) - ε∇E ... (Formula 12)
[0153] In Formula 12, ε is a constant that determines the magnitude of the update amount
of parameter w, and is called a learning coefficient. As a result of repetition of the
calculation expressed by Formula 12, error function E(w^(t)) decreases as the value t
increases, and parameter w reaches a local minimum point.
[0154] It should be noted that the calculation according to Formula 12 may be performed
on all the training data (n=1, ··· , N) or may be performed on only part of the training
data. The gradient descent method performed on only part of the training data is called
a stochastic gradient descent method. In the disease analysis method, the stochastic
gradient descent method is used.
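The update of Formula 12, applied to only part of the training data per step as described in paragraph [0154], can be sketched on a one-parameter toy error function (not the network's actual E(w)):

```python
import random

random.seed(1)

# Toy "training data": E(w) = mean over d of (w - d)^2, minimized near w = 3.
data = [2.0, 4.0, 3.0, 3.0]
epsilon = 0.1   # learning coefficient of Formula 12
w = 0.0         # arbitrary initial value

for t in range(200):
    d = random.choice(data)    # use only part of the training data per update
    grad = 2.0 * (w - d)       # dE/dw for the chosen term (the role of Formula 11)
    w = w - epsilon * grad     # Formula 12: w(t+1) = w(t) - epsilon * gradient
```

Each step moves w against the gradient of one randomly chosen term, so w fluctuates around the minimum rather than settling exactly on it; this is the stochastic gradient descent behavior described above.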
(Machine learning process 1)
[0155] In a first machine learning process, the machine learning algorithm 57 is trained
on the basis of the first information or the second information.
[0156] With reference to FIG. 12, the processing unit 10A of the training apparatus 100A
trains the machine learning algorithm 57 on the basis of the first information 53
and the disease information 55. The processing unit 10A includes a machine learning
training data generation unit 101a, a machine learning training data input unit 102a,
and a machine learning algorithm update unit 103a. These function blocks are realized
when: a program for causing a computer to execute the machine learning process is
installed in the storage unit 13 or the memory 12 of the processing unit 10A; and
the program is executed by the CPU 11. A machine learning training data database (DB)
104a and a machine learning algorithm database (DB) 105a are stored in the storage
unit 13 or the memory 12 of the processing unit 10A.
[0157] The first information or the second information has been generated by the processing
unit 10A, and is stored in advance in the storage unit 13 or the memory 12 of the
processing unit 10A, in association with the morphological type of cell or abnormal
finding to which an analysis target cell belongs, for example. The first neural network
50, and the second neural network 51 and the second neural network 52 that have not
been trained are stored in advance in the machine learning training data database
104a. The first neural network 50, and the second neural network 51 and the second
neural network 52 that have been trained once and are to be updated are stored in
advance in the machine learning algorithm database 105a.
[0158] The processing unit 10A of the training apparatus 100A performs the process shown
in FIG. 13. With reference to the function blocks shown in FIG. 12, the processes
of steps S111, S112, S114, and S115 are performed by the machine learning training
data generation unit 101a. The process of step S113 is performed by the machine learning
training data input unit 102a.
[0159] An example of the first machine learning process performed by the processing unit
10A is described with reference to FIG. 13.
[0160] The processing unit 10A of the training apparatus 100A generates, in step S111, the
first information or the second information in accordance with the method described
in the section of training of the machine learning algorithm above. Specifically,
the processing unit 10A discerns the type of abnormal finding with respect to a cell
in each training image 70 on the basis of the 1st deep learning algorithm or the 2nd
deep learning algorithm having been trained through step S11 to step S16, and obtains
a discernment result. For each cell, the discernment result of the type of abnormal
finding is outputted from the second neural network 61. In step S111, on the basis
of the discernment result of the type of abnormal finding, the processing unit 10A
generates the first information for each specimen for which the training image 70
has been obtained. Alternatively, the processing unit 10A discerns the type of cell
with respect to a cell in each training image 70 on the basis of the second neural
network 62, and obtains a discernment result. On the basis of the discernment result
of the type of cell, the processing unit 10A generates the second information for
each specimen for which the training image 70 has been obtained.
[0161] Next, in step S112, the processing unit 10A generates the machine learning training
data 90 on the basis of the first information and the disease information 55 associated
with the training image 70. Alternatively, the processing unit 10A generates the machine
learning training data 90 on the basis of the second information and the disease information
55 associated with the training image 70.
[0162] Next, in step S113, the processing unit 10A inputs the machine learning training
data 90 to the machine learning algorithm, to train the machine learning algorithm.
[0163] Next, in step S114, the processing unit 10A determines whether the process has been
performed on all the training specimens. When the process has been performed on all
the training specimens, the process ends. When the process has not been performed
on all the training specimens, the processing unit 10A advances to step S115, obtains
a discernment result of the type of abnormal finding or a discernment result of the
type of cell of another specimen, returns to step S111, and repeats training of the
machine learning algorithm.
(Machine learning process 2)
[0164] In a second machine learning process, the machine learning algorithm 57 is trained
on the basis of the first information and the second information.
[0165] With reference to FIG. 12, the processing unit 10A of the training apparatus 100A
trains the machine learning algorithm 57 on the basis of the first information 53
and the disease information 55. The processing unit 10A includes a machine learning
training data generation unit 101b, a machine learning training data input unit 102b,
and a machine learning algorithm update unit 103b. These function blocks are realized
when: a program for causing a computer to execute the machine learning process is
installed in the storage unit 13 or the memory 12 of the processing unit 10A; and
the program is executed by the CPU 11. A machine learning training data database (DB)
104b and a machine learning algorithm database (DB) 105b are stored in the storage
unit 13 or the memory 12 of the processing unit 10A.
[0166] The first information and the second information have been generated by the processing
unit 10A, and are stored in advance in the storage unit 13 or the memory 12 of the
processing unit 10A, in association with the morphological type of cell or abnormal
finding to which an analysis target cell belongs, for example. The first neural network
50, and the second neural network 51 and the second neural network 52 that have not
been trained are stored in advance in the machine learning training data database
104b. The first neural network 50, and the second neural network 51 and the second
neural network 52 that have been trained once and are to be updated are stored in
advance in the machine learning algorithm database 105b.
[0167] The processing unit 10A of the training apparatus 100A performs the process shown
in FIG. 14. With reference to the function blocks shown in FIG. 12, the processes
in steps S1111, S1112, S1114, and S1115 are performed by the machine learning training
data generation unit 101b. The process of step S1113 is performed by the machine learning
training data input unit 102b.
[0168] An example of the second machine learning process performed by the processing unit
10A is described with reference to FIG. 14. The processing unit 10A of the training
apparatus 100A generates, in step S1111, the first information and the second information
in accordance with the method described in the section of training of the machine
learning algorithm above. Specifically, the processing unit 10A discerns the type
of abnormal finding with respect to a cell in each training image 70 on the basis
of the 1st deep learning algorithm and the 2nd deep learning algorithm having been
trained through step S11 to step S16, and obtains a discernment result. For each cell,
the discernment result of the type of abnormal finding is outputted from the second
neural network 61. In step S1111, on the basis of the discernment result of the type
of abnormal finding, the processing unit 10A generates the first information for each
specimen for which the training image 70 has been obtained. In addition, the processing
unit 10A discerns the type of cell with respect to a cell in each training image 70
on the basis of the second neural network 62, and obtains a discernment result. On
the basis of the discernment result of the type of cell, the processing unit 10A generates
the second information for each specimen for which the training image 70 has been
obtained.
[0169] Next, in step S1112, the processing unit 10A generates the machine learning training
data 90 on the basis of the first information, the second information, and the disease
information 55 associated with the training image 70.
[0170] Next, in step S1113, the processing unit 10A inputs the machine learning training
data 90 to the machine learning algorithm, to train the machine learning algorithm.
[0171] Next, in step S1114, the processing unit 10A determines whether the process has been
performed on all the training specimens. When the process has been performed on all
the training specimens, the process ends. When the process has not been performed
on all the training specimens, the processing unit 10A advances to step S1115, obtains
a discernment result of the type of abnormal finding and a discernment result of the
type of cell of another specimen, returns to step S1111, and repeats training of the
machine learning algorithm.
[0172] The outline of the machine learning algorithm used in steps S113 and S1113 is as follows.
[0173] As the machine learning algorithm, ensemble learning (a classifier configured by a
plurality of classifiers) such as Gradient Boosting can be used. Examples of ensemble
learning include Extreme Gradient Boosting (XGBoost) and Stochastic Gradient Boosting.
Gradient Boosting is a type of boosting algorithm, and is a technique of forming a
plurality of weak learners. As the weak learner, a regression tree can be used, for
example.
[0174] For example, in a regression tree, when an input vector is defined as x and a label
is defined as y, with respect to the entire learner F(x) = f0(x) + f1(x) + ··· + fM(x),
weak learner fm(x), m = 1, 2, ··· M is sequentially learned and integrated so that
loss function L(y, F(x)) becomes smallest. That is, it is assumed that function
F0(x) = f0(x) is given at the start of learning, and in the m-th step of learning,
with respect to the learner Fm(x) = Fm-1(x) + fm(x) composed of the weak learners
obtained so far, weak learner fm(x) is determined so that loss function L(y, Fm(x))
becomes smallest. In ensemble learning, when the weak learner is optimized, not all
pieces of data in the training set are used; instead, a fixed number of pieces of
data sampled at random from the training set are used.
[0175]
- (1) Constant function F0(x) that minimizes the loss is obtained.
- (2) For m = 1 to M
- (a) N pieces of data are sampled from the training set to obtain set D.
- (b) With respect to each element (x1,y1), (x2,y2), ··· (xN,yN) of the set D, gradient
ri = -∂L(yi, F(xi))/∂F(xi), evaluated at F = Fm-1, is calculated.
- (c) Regression tree T(x) that predicts the obtained gradient is generated.
That is, a regression tree that minimizes Σi (ri - T(xi))² is generated.
This regression tree is weak learner fm(x).
- (d) The weights of the leaves of the regression tree T are optimized so that loss
Σi L(yi, Fm-1(xi) + T(xi)) becomes smallest.
- (e) Fm(x) = Fm-1(x) + νT(x) is set. ν is the shrinkage parameter, and is a constant that satisfies 0<ν≤1.
- (3) FM(x) is outputted as F(x).
[0176] Specifically, learner F(x) is obtained according to the algorithm above. The shrinkage
parameter ν may be set to 1, and F0(x) may be changed from a constant function.
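Steps (1) to (3) above can be illustrated with a minimal sketch. It assumes squared-error loss, under which the negative gradient in step (2)(b) reduces to the residual, and uses depth-1 regression stumps as the weak learners; these choices, and all function names, are illustrative assumptions rather than the configuration of the present disclosure.

```python
import numpy as np

def fit_stump(x, r):
    """Step (2)(c)-(d): fit a depth-1 regression tree (stump) to targets r."""
    best = (np.inf, None, 0.0, 0.0)
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, s, left.mean(), right.mean())
    _, s, lv, rv = best
    return lambda q: np.where(q <= s, lv, rv)

def gradient_boost(x, y, M=50, nu=0.1, subsample=0.8):
    rng = np.random.default_rng(0)
    F0 = y.mean()                       # (1) constant F0(x) minimizing squared loss
    pred = np.full_like(y, F0, dtype=float)
    trees = []
    for m in range(M):                  # (2) for m = 1 to M
        # (a) sample N pieces of data from the training set to obtain set D
        idx = rng.choice(len(x), size=int(subsample * len(x)), replace=False)
        grad = y[idx] - pred[idx]       # (b) negative gradient (= residual here)
        T = fit_stump(x[idx], grad)     # (c)-(d) tree predicting the gradient
        pred = pred + nu * T(x)         # (e) Fm(x) = Fm-1(x) + nu * T(x)
        trees.append(T)
    return lambda q: F0 + nu * sum(T(q) for T in trees)  # (3) output FM as F

# toy 1-D regression problem
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x)
F = gradient_boost(x, y)
```

The shrinkage parameter `nu` plays the role of ν in step (2)(e): smaller values slow each update down, which typically requires more boosting rounds but regularizes the ensemble.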
(Disease analysis process)
[0177] FIG. 15 shows a function block diagram of the disease analyzer 200A, which performs
a disease analysis process up to generation of the analysis result 83 on the basis
of the analysis target image 78. A processing unit 20A of the disease analyzer 200A
includes an analysis data generation unit 201, an analysis data input unit 202, and
an analysis unit 203. These function blocks are realized when; a program for causing
a computer according to the present disclosure to execute the disease analysis process
is installed in the storage unit 23 or the memory 22 of the processing unit 20A; and
the program is executed by the CPU 21. The deep learning training data database (DB)
104 and the deep learning algorithm database (DB) 105 are provided from the training
apparatus 100A through the storage medium 98 or the network 99, and are stored in
the storage unit 23 or the memory 22 of the processing unit 20A. The machine learning
training data database (DB) 104a, 104b, and the machine learning algorithm database
(DB) 105a, 105b are provided from the training apparatus 100A through the storage
medium 98 or the network 99, and are stored in the storage unit 23 or the memory 22
of the processing unit 20A.
[0178] Each analysis target image 78 is captured by the imaging apparatus 400 and is stored
in the storage unit 23 or the memory 22 of the processing unit 20A. The first neural
network 60 and the second neural networks 61, 62 which have been trained and which
include connection weights w are stored in the deep learning algorithm database 105,
in association with the morphological-classification-based type of cell or type of abnormal
finding to which the analysis target cell belongs, for example. The first neural network
60 and the second neural networks 61, 62 function as program modules which are part
of the program that causes a computer to execute the disease analysis process. That
is, the first neural network 60 and the second neural networks 61, 62 are used by
the computer including a CPU and a memory, and output a discernment result of the
type of abnormal finding or a discernment result of the type of cell. The CPU 21 of
the processing unit 20A causes the computer to execute calculation or processing of
specific information according to the intended use. The trained machine learning algorithm
67 is stored in the machine learning algorithm database 105a, 105b, and functions
as a program module which is part of the program that causes the computer to execute
the disease analysis process. That is, the machine learning algorithm 67 is used by
the computer including a CPU and a memory, and outputs a disease analysis result.
[0179] Specifically, the CPU 21 of the processing unit 20A generates, in the analysis data
generation unit 201, a discernment result of the type of abnormal finding, by using
the 1st deep learning algorithm stored in the storage unit 23 or the memory 22. The
processing unit 20A generates, in the analysis data generation unit 201, the first
information 63 on the basis of the discernment result of the type of abnormal finding.
The generated first information 63 is inputted to the analysis data input unit 202
and is stored into the machine learning training data DB 104a. The processing unit
20A performs disease analysis in the analysis unit 203, and outputs an analysis result
83 to the output unit 27. Alternatively, the CPU 21 of the processing unit 20A generates,
in the analysis data generation unit 201, a discernment result of the type of cell,
by using the 2nd deep learning algorithm stored in the storage unit 23 or the memory
22. The processing unit 20A generates, in the analysis data generation unit 201, the
second information 64 on the basis of the discernment result of the type of cell.
The generated second information 64 is inputted to the analysis data input unit 202,
and is stored into the machine learning training data DB 104b. The processing unit
20A performs disease analysis in the analysis unit 203, and outputs an analysis result
83 to the output unit 27.
[0180] With reference to the function blocks shown in FIG. 15, the processes of steps S21 and
S22 in FIG. 16 are performed by the analysis data generation unit 201. The processes
of steps S23, S24, S25, and S27 are performed by the analysis data input unit 202.
The process of step S26 is performed by the analysis unit 203.
(Disease analysis process 1)
[0181] With reference to FIG. 16, an example of a first disease analysis process up to outputting
of an analysis result 83 on the basis of the analysis target image 78 performed by
the processing unit 20A is described. In the first disease analysis process, the analysis
result 83 is outputted on the basis of the first information or the second information.
[0182] First, the processing unit 20A obtains analysis images 78. Each analysis image 78
is obtained via the I/F unit 25 through an operation by a user, from the imaging apparatus
400, from the storage medium 98, or via a network.
[0183] Similar to step S11 shown in FIG. 10, in step S21, the obtained analysis image 78
is converted into brightness Y, first hue Cb, and second hue Cr, and the tone vector
data 80 is generated in accordance with the procedure described in the analysis data
generation method above.
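For illustration, the conversion of an RGB analysis image into brightness Y, first hue Cb, and second hue Cr in step S21 can be sketched with the standard ITU-R BT.601 coefficients; the present disclosure does not specify which coefficients the processing unit uses, so this is an assumed example.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an RGB image (H x W x 3, values 0-255) into brightness Y,
    first hue Cb, and second hue Cr using ITU-R BT.601 coefficients
    (an assumed, commonly used conversion)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b            # brightness Y
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128  # first hue Cb
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128  # second hue Cr
    return y, cb, cr

# a neutral gray pixel carries no chroma: Cb = Cr = 128
img = np.full((2, 2, 3), 100, dtype=np.uint8)
y, cb, cr = rgb_to_ycbcr(img)
```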
[0184] Next, in step S22, the processing unit 20A generates the analysis data 81 from the
tone vector data 80, in accordance with the procedure described in the analysis data
generation method above.
[0185] Next, in step S23, the processing unit 20A obtains the 1st deep learning algorithm
or the 2nd deep learning algorithm stored in the algorithm database 105.
[0186] Next, in step S24, the processing unit 20A inputs the analysis data 81 to the first
neural network 60 forming the 1st deep learning algorithm. In accordance with the
procedure described in the disease analysis method above, the processing unit 20A
inputs the feature quantity outputted from the first neural network 60, to the second
neural network 61, and outputs a discernment result of the type of abnormal finding
from the second neural network 61. The processing unit 20A stores the discernment
result into the memory 22 or the storage unit 23. Alternatively, in step S24, the
processing unit 20A inputs the analysis data 81, to the first neural network 60 forming
the 2nd deep learning algorithm. In accordance with the procedure described in the
disease analysis method above, the processing unit 20A inputs the feature quantity
outputted from the first neural network 60, to the second neural network 62, and outputs
a discernment result of the type of cell from the second neural network 62. The processing
unit 20A stores the discernment result into the memory 22 or the storage unit 23.
[0187] In step S25, the processing unit 20A determines whether the discernment has been
performed on all the analysis images 78 obtained first. When the discernment has been
performed on all the analysis images 78 (YES), the processing unit 20A advances to
step S26, and generates the first information 63 on the basis of the discernment result
of the type of abnormal finding, or generates the second information on the basis
of the discernment result of the type of cell. When the discernment has not been performed
on all the analysis images 78 (NO), the processing unit 20A advances to step S27,
and performs the processes from step S21 to step S25 on the analysis images 78 for
which the discernment has not been performed.
[0188] Next, in step S28, the processing unit 20A obtains the machine learning algorithm
67. Subsequently, in step S29, the processing unit 20A inputs the first information
or the second information to the machine learning algorithm 67.
[0189] Lastly, in step S30, the processing unit 20A outputs an analysis result 83 to the
output unit 27, as a disease name or a label value associated with the disease name.
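The overall control flow of steps S21 to S30 can be sketched as plain Python. Here `discern_cell` and `ml_algorithm` are hypothetical stand-ins for the trained deep learning algorithm and the machine learning algorithm 67, and the aggregation into the first information in step S26 is reduced to a simple count of discerned findings per specimen.

```python
from collections import Counter

def analyze_disease(analysis_images, discern_cell, ml_algorithm):
    """Sketch of disease analysis process 1 (steps S21-S30).

    discern_cell: hypothetical stand-in for the trained deep learning
        algorithm, mapping one analysis image to a per-cell discernment.
    ml_algorithm: hypothetical stand-in for machine learning algorithm 67,
        mapping aggregated information to a disease label.
    """
    discernments = []
    for image in analysis_images:             # S25/S27: repeat until all images done
        discernments.append(discern_cell(image))  # S21-S24: per-cell discernment
    # S26: aggregate per-cell results into the "first information"
    # (here simply the count of each discerned finding in the specimen)
    first_information = Counter(discernments)
    return ml_algorithm(first_information)    # S28-S30: disease label for the specimen

# toy stand-ins: each "image" is pre-labeled, and the "ML algorithm" flags a
# specimen as abnormal when any blast was discerned
images = ["normal", "blast", "normal"]
label = analyze_disease(
    images,
    lambda img: img,
    lambda info: "abnormal" if info.get("blast", 0) else "normal",
)
```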
(Disease analysis process 2)
[0190] With reference to FIG. 17, an example of a second disease analysis process up to
outputting of an analysis result 83 on the basis of the analysis target image 78 performed
by the processing unit 20A is described. In the second disease analysis process, the
analysis result 83 is outputted on the basis of the first information and the second
information.
[0191] First, the processing unit 20A obtains analysis images 78. Each analysis image 78
is obtained via the I/F unit 25 through an operation by a user, from the imaging apparatus
400, from the storage medium 98, or via a network.
[0192] Similar to step S11 shown in FIG. 10, in step S121, the obtained analysis image 78
is converted into brightness Y, first hue Cb, and second hue Cr, and the tone vector
data 80 is generated in accordance with the procedure described in the analysis data
generation method above.
[0193] Next, in step S122, the processing unit 20A generates the analysis data 81 from the
tone vector data 80, in accordance with the procedure described in the analysis data
generation method above.
[0194] Next, in step S123, the processing unit 20A obtains the 1st deep learning algorithm
and the 2nd deep learning algorithm stored in the algorithm database 105.
[0195] Next, in step S124, the processing unit 20A inputs the analysis data 81 to the first
neural network 60 forming the 1st deep learning algorithm. In accordance with the
procedure described in the disease analysis method above, the processing unit 20A
inputs the feature quantity outputted from the first neural network 60, to the second
neural network 61, and outputs a discernment result of the type of abnormal finding
from the second neural network 61. The processing unit 20A stores the discernment
result into the memory 22 or the storage unit 23. In addition, in step S124, the processing
unit 20A inputs the analysis data 81 to the first neural network 60 forming the 2nd
deep learning algorithm. In accordance with the procedure described in the disease
analysis method above, the processing unit 20A inputs the feature quantity outputted
from the first neural network 60, to the second neural network 62, and outputs a discernment
result of the type of cell from the second neural network 62. The processing unit
20A stores the discernment result into the memory 22 or the storage unit 23.
[0196] In step S125, the processing unit 20A determines whether the discernment has been
performed on all the analysis images 78 obtained first. When the discernment has been
performed on all the analysis images 78 (YES), the processing unit 20A advances to
step S126, generates the first information 63 on the basis of the discernment result
of the type of abnormal finding, and generates the second information on the basis
of the discernment result of the type of cell. When the discernment has not been performed
on all the analysis images 78 (NO), the processing unit 20A advances to step S127,
and performs the processes from step S121 to step S125 on the analysis images 78 for
which the discernment has not been performed.
[0197] Next, in step S128, the processing unit 20A obtains the machine learning algorithm
67. Subsequently, in step S129, the processing unit 20A inputs the first information
and the second information to the machine learning algorithm 67.
[0198] Lastly, in step S130, the processing unit 20A outputs an analysis result 83 to the
output unit 27, as a disease name or a label value associated with the disease name.
<Computer program>
[0199] A computer program that is for assisting the disease analysis and that causes a computer
to execute the processes of steps S21 to S30 or steps S121 to S130 is described. The
computer program may include a program that is for training a machine learning algorithm
and that causes a computer to execute the processes of steps S11 to S17 and steps
S111 to S115, or a program that is for training a machine learning algorithm and that
causes a computer to execute the processes of steps S11 to S17 and steps S1111 to S1115.
[0200] Further, a program product, such as a storage medium, having stored therein the computer
program is described. The computer program is stored in a storage medium such as a
hard disk, a semiconductor memory device such as a flash memory, or an optical disk.
The storage form of the program into the storage medium is not limited in particular,
as long as the processing unit can read the program. Preferably, the program is stored
in the storage medium in a nonvolatile manner.
[Disease analysis system 2]
<Configuration of disease analysis system 2>
[0201] Another aspect of the disease analysis system is described. FIG. 18 shows a configuration
example of a disease analysis system 2. The disease analysis system 2 includes the
user-side apparatus 200, and the user-side apparatus 200 operates as an integrated-type
disease analyzer 200B. The disease analyzer 200B is implemented as a general purpose
computer, for example, and performs both the deep learning process and the disease
analysis process described in the disease analysis system 1 above. That is, the disease
analysis system 2 is a stand-alone-type system that performs deep learning and disease
analysis on the user side. In the disease analysis system 2, the integrated-type disease
analyzer 200B installed on the user side has both functions of the training apparatus
100A and the disease analyzer 200A.
[0202] In FIG. 18, the disease analyzer 200B is connected to the imaging apparatus 400.
The imaging apparatus 400 captures training images 70 during the deep learning process,
and captures analysis target images 78 during the disease analysis process.
<Hardware configuration>
[0203] The hardware configuration of the disease analyzer 200B is the same as the hardware
configuration of the user-side apparatus 200 shown in FIG. 8.
<Function block and processing procedure>
[0204] FIG. 19 shows a function block diagram of the disease analyzer 200B. A processing
unit 20B of the disease analyzer 200B includes the deep learning training data generation
unit 101, the deep learning training data input unit 102, the deep learning algorithm
update unit 103, the machine learning training data generation unit 101a, 101b, the
machine learning training data input unit 102a, 102b, the machine learning algorithm
update unit 103a, 103b, the analysis data generation unit 201, the analysis data input
unit 202, and the analysis unit 203.
[0205] The processing unit 20B of the disease analyzer 200B performs the process shown in
FIG. 10 during the deep learning process, performs the process shown in FIG. 13 or
FIG. 14 during the machine learning process, and performs the process shown in FIG.
16 or FIG. 17 during the disease analysis process. With reference to the function
blocks shown in FIG. 19, during the deep learning process, the processes of steps
S11, S12, S16, and S17 in FIG. 10 are performed by the deep learning training data
generation unit 101. The process of step S13 in FIG. 10 is performed by the deep learning
training data input unit 102. The process of step S14 in FIG. 10 is performed by the
deep learning algorithm update unit 103. During the machine learning process, the
processes of steps S111, S112, S114, and S115 in FIG. 13 are performed by the machine
learning training data generation unit 101a. The process of step S113 is performed
by the machine learning training data input unit 102a. Alternatively, during the machine
learning process, the processes of steps S1111, S1112, S1114, and S1115 in FIG. 14
are performed by the machine learning training data generation unit 101b. The process
of step S1113 in FIG. 14 is performed by the machine learning training data input
unit 102b. During the disease analysis process, the processes of steps S21 and S22
in FIG. 16 are performed by the analysis data generation unit 201. The processes of
steps S23, S24, S25, and S27 in FIG. 16 are performed by the analysis data input unit
202. The process of step S26 in FIG. 16 is performed by the analysis unit 203. Alternatively,
the processes of steps S121 and S122 in FIG. 17 are performed by the analysis data
generation unit 201. The processes of steps S123, S124, S125, and S127 in FIG. 17
are performed by the analysis data input unit 202. The process of step S126 in FIG.
17 is performed by the analysis unit 203.
[0206] The procedures of the deep learning process and the disease analysis process performed
by the disease analyzer 200B are the same as the procedures of those performed by
the training apparatus 100A and the disease analyzer 200A. However, the disease analyzer
200B obtains the training image 70 from the imaging apparatus 400.
[0207] In the disease analyzer 200B, the user can confirm the discernment accuracy of the
discriminator. When the discernment result produced by the discriminator differs from
the discernment result that the user obtains by observing the image, the analysis
data 81 can be used as the training data 75 and the user's discernment result can
be used as the label value 77 to train the 1st deep learning algorithm and the 2nd
deep learning algorithm again. Accordingly, the training efficiency of the first neural
network 50 and the second neural network 51 can be improved.
[Disease analysis system 3]
<Configuration of disease analysis system 3>
[0208] Another aspect of the disease analysis system is described. FIG. 20 shows a configuration
example of a disease analysis system 3. The disease analysis system 3 includes the
vendor-side apparatus 100 and the user-side apparatus 200. The vendor-side apparatus
100 includes the processing unit 10 (10B), the input unit 16, and the output unit
17. The vendor-side apparatus 100 operates as an integrated-type disease analyzer
100B. The user-side apparatus 200 operates as a terminal apparatus 200C. The disease
analyzer 100B is implemented as a general purpose computer, for example, and is a
cloud-server-side apparatus that performs both the deep learning process and the disease
analysis process described in the disease analysis system 1 above. The terminal apparatus
200C is implemented as a general purpose computer, for example, transmits an analysis
target image to the disease analyzer 100B through the network 99, and receives an
analysis result image from the disease analyzer 100B through the network 99.
[0209] In the disease analysis system 3, the integrated-type disease analyzer 100B installed
on the vendor side has both functions of the training apparatus 100A and the disease
analyzer 200A. Meanwhile, the disease analysis system 3 includes the terminal apparatus
200C, and provides the user-side terminal apparatus 200C with an input interface for
the analysis image 78 and an output interface for an analysis result image. That is,
the disease analysis system 3 is a cloud service-type system in which the vendor side
that performs the deep learning process and the disease analysis process provides
an input interface for providing the analysis image 78 to the user side and the output
interface for providing the analysis result 83 to the user side. The input interface
and the output interface may be integrated.
[0210] The disease analyzer 100B is connected to the imaging apparatus 300 and obtains the
training image 70 captured by the imaging apparatus 300.
[0211] The terminal apparatus 200C is connected to the imaging apparatus 400 and obtains
the analysis target image 78 captured by the imaging apparatus 400.
<Hardware configuration>
[0212] The hardware configuration of the disease analyzer 100B is the same as the hardware
configuration of the vendor-side apparatus 100 shown in FIG. 7. The hardware configuration
of the terminal apparatus 200C is the same as the hardware configuration of the user-side
apparatus 200 shown in FIG. 8.
<Function block and processing procedure>
[0213] FIG. 21 shows a function block diagram of the disease analyzer 100B. A processing
unit 10B of the disease analyzer 100B includes the deep learning training data generation
unit 101, the deep learning training data input unit 102, the deep learning algorithm
update unit 103, the machine learning training data generation unit 101a, 101b, the
machine learning training data input unit 102a, 102b, the machine learning algorithm
update unit 103a, 103b, the analysis data generation unit 201, the analysis data input
unit 202, and the analysis unit 203.
[0214] The processing unit 10B of the disease analyzer 100B performs the process shown in
FIG. 10 during the deep learning process, performs the process shown in FIG. 13 or
FIG. 14 during the machine learning process, and performs the process shown in FIG.
16 or FIG. 17 during the disease analysis process. With reference to the function
blocks shown in FIG. 21, during the deep learning process, the processes of the steps
S11, S12, S16, and S17 in FIG. 10 are performed by the deep learning training data
generation unit 101. The process of step S13 in FIG. 10 is performed by the deep learning
training data input unit 102. The process of step S14 in FIG. 10 is performed by the
deep learning algorithm update unit 103. During the machine learning process, the
processes of steps S111, S112, S114, and S115 in FIG. 13 are performed by the machine
learning training data generation unit 101a. The process of step S113 is performed
by machine learning training data input unit 102a. Alternatively, during the machine
learning process, the processes of steps S1111, S1112, S1114, and S1115 in FIG. 14
are performed by the machine learning training data generation unit 101b. The process
of step S1113 in FIG. 14 is performed by the machine learning training data input
unit 102b. During the disease analysis process, the processes of steps S21 and S22
in FIG. 16 are performed by the analysis data generation unit 201. The processes of
steps S23, S24, S25, and S27 in FIG. 16 are performed by the
analysis unit 203. Alternatively, the processes of steps S121 and S122 in FIG. 17
are performed by the analysis data generation unit 201. The processes of steps S123,
S124, S125, and S127 in FIG. 17 are performed by the analysis data input unit 202.
The process of step S126 in FIG. 17 is performed by the analysis unit 203.
[0215] The procedures of the deep learning process and the disease analysis process performed
by the disease analyzer 100B are the same as the procedures of those performed by
the training apparatus 100A and the disease analyzer 200A.
[0216] The processing unit 10B receives the analysis target image 78 from the user-side
terminal apparatus 200C, and generates the analysis data 81 in accordance with steps
S21 and S22 shown in FIG. 16.
[0217] In step S26 shown in FIG. 16, the processing unit 10B transmits an analysis result
including the analysis result 83 to the user-side terminal apparatus 200C. In the
user-side terminal apparatus 200C, a processing unit 20C outputs the received analysis
result to the output unit 27.
[0218] As described above, the user of the terminal apparatus 200C can obtain the analysis
result 83 by transmitting the analysis target image 78 to the disease analyzer 100B.
[0219] According to the disease analyzer 100B, the user can use the discriminator without
obtaining the training data database 104 and the algorithm database 105 from the training
apparatus 100A. Accordingly, the service of discerning the type of cell and the feature
of cell based on morphological classification can be provided as a cloud service.
[Other embodiments]
[0220] The present disclosure is not limited to the above embodiment.
[0221] In the above embodiment, an example of a method for generating the deep learning
training data 75 by converting the tone into brightness Y, first hue Cb, and second
hue Cr, has been described. However, the conversion of the tone is not limited thereto.
Without converting the tone, the three primary colors of red (R), green (G), and blue
(B), for example, may be directly used. Alternatively, two primary colors obtained
by excluding one hue from the primary colors may be used. Alternatively, one (for
example green (G) only) obtained by selecting any one of the three primary colors
of red (R), green (G), and blue (B) may be used. Conversion into the three primary
colors of pigment of cyan (C), magenta (M), and yellow (Y) may be employed. Also,
for example, the analysis target image 78 is not limited to a color image of the three
primary colors of red (R), green (G), and blue (B), and may be a color image of two
primary colors. It is sufficient that the image includes one or more primary colors.
[0222] In the training data generation method and the analysis data generation method described
above, in step S11, the processing unit 10A, 20B, 10B generates the tone matrices
72y, 72cb, 72cr from the training image 70. However, the training image 70 may be
the one converted into brightness Y, first hue Cb, and second hue Cr. That is, the
processing unit 10A, 20B, 10B may originally obtain brightness Y, first hue Cb, and
second hue Cr, directly from the virtual slide scanner or the like, for example. Similarly,
in step S21, although the processing unit 20A, 20B, 10B generates the tone matrices
72y, 72cb, 72cr from the analysis target image 78, the processing unit 20A, 20B, 10B
may originally obtain brightness Y, first hue Cb, and second hue Cr, directly from
the virtual slide scanner or the like, for example.
[0223] Other than RGB and CMY, various types of color spaces such as YUV and CIE L*a*b*
can be used in image obtainment and tone conversion.
[0224] In the tone vector data 74 and the tone vector data 80, for each pixel, the tone
information is stored in the order of brightness Y, first hue Cb, and second hue Cr.
However, the order in which the tone information is stored and handled is not limited
thereto. Note that the arrangement order of the tone information in the tone vector
data 74 and the arrangement order of the tone information in the tone vector data
80 are preferably the same as each other.
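The per-pixel arrangement described in [0224] can be illustrated as follows; this sketch assumes the three tone matrices share the image's height and width, and the function name is hypothetical.

```python
import numpy as np

def to_tone_vector_data(y, cb, cr):
    """Stack the three tone matrices so that each pixel stores its tone
    information in the order Y, Cb, Cr, as described for the tone vector
    data 74 and 80 (illustrative layout)."""
    # resulting shape is (H, W, 3); the last axis holds (Y, Cb, Cr) per pixel
    return np.stack([y, cb, cr], axis=-1)

y  = np.array([[10, 20]])
cb = np.array([[30, 40]])
cr = np.array([[50, 60]])
data = to_tone_vector_data(y, cb, cr)
```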
[0225] In each image analysis system, the processing unit 10A, 10B is realized as an integrated
device. However, the processing unit 10A, 10B may not necessarily be an integrated
device. Instead, a configuration may be employed in which: the CPU 11, the memory
12, the storage unit 13, the GPU 19 and the like are provided at separate places;
and these are connected through a network. Also, the processing unit 10A, 10B, the
input unit 16, and the output unit 17 may not necessarily be provided at one place,
and may be respectively provided at separate places and communicably connected with
one another through a network. This also applies to the processing unit 20A, 20B,
20C.
[0226] In the disease analysis support system described above, function blocks of the deep
learning training data generation unit 101, the machine learning training data generation
unit 101a, 101b, the deep learning training data input unit 102, the machine learning
training data input unit 102a, 102b, the deep learning algorithm update unit 103,
the machine learning algorithm update unit 103a, 103b, the analysis data generation
unit 201, the analysis data input unit 202, and the analysis unit 203 are executed
by the single CPU 11 or the single CPU 21. However, these function blocks may not
necessarily be executed by a single CPU, and may be executed in a distributed manner
by a plurality of CPUs. These function blocks may be executed in a distributed manner
by a plurality of GPUs, or may be executed in a distributed manner by a plurality
of CPUs and a plurality of GPUs.
[0227] In the disease analysis support system described above, the program for performing
the process of each step described in FIG. 10 and FIG. 12 is stored in advance in
the storage unit 13, 23. Instead, the program may be installed in the processing unit
10B, 20B from a computer-readable non-transitory tangible storage medium 98 such as
a DVD-ROM or a USB memory, for example. Alternatively, the processing unit 10B, 20B
may be connected to the network 99 and the program may be downloaded from, for example,
an external server (not shown) through the network 99 and installed.
[0228] In each disease analysis system, the input unit 16, 26 is an input device such as
a keyboard or a mouse, and the output unit 17, 27 is realized as a display device
such as a liquid crystal display. Instead, the input unit 16, 26 and the output unit
17, 27 may be integrated to realize a touch-panel-type display device. Alternatively,
the output unit 17, 27 may be implemented by a printer or the like.
[0229] In each disease analysis system described above, the imaging apparatus 300 is directly
connected to the training apparatus 100A or the disease analyzer 100B. However, the
imaging apparatus 300 may be connected to the training apparatus 100A or the disease
analyzer 100B via the network 99. Similarly, with respect to the imaging apparatus
400, although the imaging apparatus 400 is directly connected to the disease analyzer
200A or the disease analyzer 200B, the imaging apparatus 400 may be connected to the
disease analyzer 200A or the disease analyzer 200B via the network 99.
[Effect of discriminator]
<Training of deep learning algorithm and machine learning algorithm>
[0230] A total of 3,261 peripheral blood (PB) smear preparations were used for evaluation.
These included 1,165 PB smear preparations derived from subjects having blood diseases
(myelodysplastic syndrome (n = 94), myeloproliferative neoplasm (n = 127), acute myeloid
leukemia (n = 38), acute lymphoblastic leukemia (n = 27), malignant lymphoma (n = 324),
multiple myeloma (n = 82), and non-neoplastic blood disease (n = 473)), which were
obtained at Juntendo University Hospital from 2017 to 2018. The PB smear preparation
slides were created by a smear preparation creation apparatus SP-10 (manufactured by
Sysmex Corporation) and stained with May-Grünwald-Giemsa. From the PB smear preparation
slides, a total of 703,970 digitized cell images were obtained by using a blood cell
differential automatic analyzer DI-60 (manufactured by Sysmex Corporation). From these
images, the deep learning training data 75 was generated according to the deep learning
training data generation method described above.
[0231] As the first computer algorithm, a deep learning algorithm was used. In the
deep learning algorithm, a Convolutional Neural Network (CNN) was used as the first
neural network and a Fully Connected Neural Network (FCNN) was used as the second
neural network, and discernment of the type of cell and the type of abnormal finding
was performed.
[0232] As the second computer algorithm, Extreme Gradient Boosting (EGB), which is
a machine learning algorithm, was used to construct an automatic disease analysis
support system.
[0233] FIG. 22 shows the configuration of the discriminator used in the Example. The
deep learning algorithm was systematized so as to simultaneously detect the type of
cell and the type of abnormal finding.
[0234] This deep learning algorithm is composed of two major modules, i.e., a "CNN
module" and an "FCNN module". The CNN module extracts features, expressed as tone
vector data, from the images captured by the DI-60. The FCNN module analyzes the
features extracted by the CNN module, and classifies the cell images into the 17 types
of cells as well as the 97 types of abnormal findings, such as the size and shape of
cells and nuclei, cytoplasm image patterns, and the like.
[0235] "CNN module" is composed of two sub-modules. The first (upstream) sub-module has
three identical blocks, and each block has two parallel paths each composed of several
convolution network layers. These layers optimize extraction of a feature to the next
block on the basis of on input image data and output parameters. The second (downstream)
sub-module includes eight consecutive blocks. Each block has two parallel paths composed
of a series of convolution layers and a path that does not include convolution components.
This is called Residual Network (ResNet). The ResNet functions as a buffer for preventing
saturation of the system.
[0236] The respective layers of separable convolution, the convolution layer (Conv 2D),
the batch normalization layer (BN), and the activation layer (ACT) have different roles.
Separable convolution is a modified type of convolution called Xception. Conv 2D is a
major component of a neural network that optimizes parameters when forming a "feature
map" through extraction of features and processing of an image. The ACT following the
two layers of Conv 2D and BN is a Rectified Linear Unit (ReLU). The first sub-module
is connected to the second sub-module, which is composed of eight consecutive similar
blocks, in order to create the feature map. In the second sub-module, Conv 2D is bypassed
in order to avoid unexpected saturation of deep layers, which enables effective calculation
of the weights through backpropagation. The deep convolutional neural network architecture
described above was implemented by using Keras with a TensorFlow backend.
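The residual bypass of the second sub-module can be sketched in a framework-free form. The dense multiplication below is only an illustrative stand-in for the convolution path (an assumption for brevity, not the network of the Example); the point is that the block computes x + f(x), so near-zero weights leave the signal almost unchanged and deep stacks resist saturation:

```python
import numpy as np

def relu(x):
    # Activation layer (ACT): Rectified Linear Unit
    return np.maximum(x, 0.0)

def residual_block(x, weight):
    # The path f(x) (here a dense stand-in for the convolution path) is
    # added to an identity path that bypasses the convolution components,
    # so the block outputs x + f(x) rather than f(x) alone.
    f = relu(x @ weight)
    return x + f

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 8)) * 0.001   # near-zero weights
y = residual_block(x, w)
# With near-zero weights the block approximates the identity mapping,
# which is why stacking many such blocks avoids deep-layer saturation.
print(np.allclose(y, x, atol=0.05))       # → True
```

Because the identity path carries the signal (and its gradient) past the convolution components, backpropagation through many stacked blocks remains effective.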
[0237] Discernment results of the 17 types of cells and discernment results of the 97
types of abnormal findings for each type of cell, obtained by the first computer algorithm,
were used for training the machine learning algorithm. When discerning the types of
abnormal findings for each type of cell, neutrophil was associated with abnormal findings
without differentiating between segmented neutrophil and band neutrophil. Among the
abnormal findings shown in FIGS. 3A and 3B, the items "other abnormalities", "pseudo
Chediak-Higashi granule-like", and "other abnormalities (including aggregation)" of
platelet were excluded from the analysis. The first information was generated from the
discernment results of the types of abnormal findings for each type of cell, the second
information was generated from the discernment results of the type of cell, and the
generated first information and second information were inputted to XGBoost.
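The construction of one feature vector per patient from the first information and the second information can be sketched as follows. The data and labels are synthetic, and while the feature counts (97 abnormal-finding detection rates, 17 cell-type results) follow the Example, scikit-learn's GradientBoostingClassifier is used here only as an illustrative stand-in for the XGBoost classifier:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)

N_PATIENTS = 60          # illustrative cohort size, not the Example's
N_FINDINGS = 97          # abnormal-finding detection rates (first information)
N_CELL_TYPES = 17        # cell-type frequencies (second information)

# First information: per-patient detection rates of abnormal findings.
first_info = rng.random((N_PATIENTS, N_FINDINGS))
# Second information: per-patient cell-type frequency ratios (rows sum to 1).
second_info = rng.random((N_PATIENTS, N_CELL_TYPES))
second_info /= second_info.sum(axis=1, keepdims=True)

# Concatenate both kinds of information into one feature vector per
# patient and fit a gradient-boosted tree ensemble on synthetic labels.
X = np.hstack([first_info, second_info])
y = rng.integers(0, 2, size=N_PATIENTS)   # 0 = AA, 1 = MDS (synthetic)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(X.shape)   # → (60, 114)
```

The concatenation step is the essential point: the boosting model receives the two kinds of information as a single 97 + 17 = 114-dimensional input per patient.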
[0238] In order to train the deep learning algorithm, 703,970 digitized cell images were
divided into 695,030 images for a training data set and 8,940 images for a validation
data set.
[0239] In order to construct the system, peripheral blood cells of 89 myelodysplastic syndrome
(MDS) cases and 43 aplastic anemia (AA) cases were used for the training. Next, using
PB smear preparation images obtained from 26 MDS patients and 11 AA patients, the
automatic disease analysis support system based on EGB was validated.
[0240] Discernment of the cells used in the training was performed by two committee-authorized
blood test laboratory technicians and one senior hematopathologist, in accordance with
the morphological criteria of the H20-A2 guideline of the Clinical and Laboratory
Standards Institute (CLSI) and the WHO classification of bone marrow tumors and acute
leukemia revised in 2016. The training data set was classified into the 17 types of
cells and the 97 types of abnormal findings.
[0241] FIG. 23 shows the number and the types of cell images for the training and the validation.
[0242] After the training, the evaluation data set was used to evaluate the performance
of the first computer algorithm. FIG. 24 shows the accuracy of discernment results
of the types of cells according to the trained first computer algorithm. The sensitivity
and the specificity calculated by using an ROC curve were good.
[0243] FIG. 25 shows the accuracy of discernment results of the types of abnormal findings
according to the trained first computer algorithm. The sensitivity, the specificity,
and AUC calculated by using an ROC curve were good.
[0244] Therefore, it was shown that the discerning accuracy of the trained first computer
algorithm was good.
[0245] Next, MDS and AA were differentiated by using the discriminator. FIG. 26 is a
diagram showing the contribution degrees of the types of abnormal findings for each
type of cell, expressed as a heat map of SHAP values. In the heat map shown in FIG. 26,
each column corresponds to the specimen of one patient, and each row corresponds to
an abnormal finding for each type of cell. The first column at the left end to the
26th column correspond to MDS patients, and the 27th column to the 37th column at the
right end correspond to AA patients. The magnitude of the detection rate is expressed
by the shading of the heat map. FIG. 26 reveals that the detection rates of abnormal
degranulation of neutrophil and abnormal granules of eosinophil, and the detection
rate of megathrombocyte, are significantly higher in MDS patients than in AA patients.
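The layout of such a heat map (37 patient columns, one row per abnormal finding for each type of cell) can be sketched with synthetic detection rates; the row count and all values below are illustrative only, not the SHAP values of FIG. 26:

```python
import numpy as np

rng = np.random.default_rng(2)
n_mds, n_aa, n_rows = 26, 11, 10   # 26 MDS columns, 11 AA columns; 10 example rows

# Each column is one patient and each row is an abnormal finding for one
# type of cell, matching the layout described above; values are synthetic.
mds = rng.random((n_rows, n_mds)) * 0.3
aa = rng.random((n_rows, n_aa)) * 0.3
mds[0] += 0.5   # e.g. a finding detected far more often in MDS patients

heatmap = np.hstack([mds, aa])     # MDS columns 1-26, AA columns 27-37
diff = mds.mean(axis=1) - aa.mean(axis=1)
print(heatmap.shape)               # → (10, 37)
print(diff[0] > 0.3)               # → True: row 0 separates MDS from AA
```

Rows whose mean detection rate differs strongly between the MDS and AA column groups appear as darkness/paleness contrasts in the rendered heat map.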
[0246] FIG. 27 shows a result of evaluation as to the accuracy of the discriminator in differential
diagnosis between MDS and AA. The evaluation was performed in terms of sensitivity,
specificity, and AUC calculated by using an ROC curve. The sensitivity and the specificity
of the discriminator were 96.2% and 100%, respectively, and AUC of the ROC curve was
0.990. Thus, high accuracy in differential diagnosis between MDS and AA was shown.
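The sensitivity, specificity, and ROC AUC used in this evaluation can be computed from their standard definitions as follows; the scores below are synthetic and are not the discriminator's actual outputs:

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    # AUC via the Mann-Whitney U statistic: the probability that a
    # randomly chosen positive scores higher than a random negative.
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Synthetic check: 26 positives (MDS) and 11 negatives (AA) with
# well-separated scores, one positive falling below the 0.5 threshold.
y = np.array([1] * 26 + [0] * 11)
s = np.array([0.9] * 25 + [0.4] + [0.1] * 11)
sens, spec = sensitivity_specificity(y, (s >= 0.5).astype(int))
print(round(sens, 3), round(spec, 3), round(roc_auc(y, s), 3))
# → 0.962 1.0 1.0
```

Sensitivity and specificity depend on the chosen score threshold, whereas the AUC summarizes performance across all thresholds, which is why all three measures are reported.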
[0247] The discriminator described above was shown to be useful for supporting disease analysis.
DESCRIPTION OF THE REFERENCE CHARACTERS
[0248]
- 200
- disease analysis support apparatus
- 20
- processing unit
- 60
- first neural network
- 61
- second neural network
- 62
- second neural network
- 67
- machine learning algorithm
- 55
- disease information
- 53
- information regarding the type of abnormal finding
- 54
- information regarding the type of cell
- 81
- analysis data