Technical Field
[0001] This description discloses a method for predicting indications for a test substance
in humans, a device for predicting indications for a test substance in humans, a program
for predicting indications for a test substance in humans, and a method for training
an artificial intelligence model for use in predicting indications for a test substance
in humans and a trained artificial intelligence model.
Background Art
[0002] Development of a new drug starts with drug discovery research (discovery phase)
to find candidate substances for new drugs, followed by a preclinical trial (phase
0) using animals or cultured cells and phase I to III clinical trials in humans.
Only substances that have passed these trials may be submitted to the Ministry of
Health, Labour and Welfare for approval for manufacture and sale as pharmaceutical
products. Even after a substance has passed the review required for approval as a
pharmaceutical product and has been launched on the market, a monitoring period is
provided for adverse events or effects that were not predicted during the development
and approval review stages. As described above, it takes a huge amount of time and
money for one new drug to be launched on the market.
On the other hand, the probability for a substance to proceed from the discovery phase
to approval for manufacture and sale is said to be about 1.6%. It is also said that
only 13.8% of the substances that pass the preclinical trial exhibit an effect without
adverse events throughout the clinical trials up to phase III and reach the application
for approval. In other words, more
than 80% of candidate substances drop out during the phase I to III clinical trials.
The loss due to the dropout is said to be as enormous as 150 to 200 million dollars
per substance.
[0003] As a method for assisting the exploration of candidate substances in the development
of a new drug, Patent Document 1 discloses a method including: comparing test data
of an organ-related index factor in each organ, obtained from cells or tissues derived
from one or more organs of individuals to which a test substance has been administered,
with corresponding predetermined standard data of the organ-related index factor to
calculate a pattern similarity of the organ-related index factor; and predicting the
efficacies or side effects of the test substance in the one or more organs and/or
in organs other than the one or more organs using the pattern similarity as an index.
[0004] Also, as a method for predicting efficacies or side effects of a candidate substance
in the development of a new drug, Patent Document 2 discloses an artificial intelligence
model for predicting one or multiple actions of a test substance on humans from the
dynamics of transcriptome in multiple different organs which are the same as multiple
different organs collected from non-human animals to which the test substance has
been administered to prepare training data. The method includes inputting a data set
indicating the dynamics of transcriptome in multiple different organs collected from
non-human animals to which multiple existing substances with a known action on humans
have been individually administered for each of the non-human animals and data indicating
known actions of each existing substance on humans into the artificial intelligence
model as training data to train the artificial intelligence model.
Citation List
Patent Document
Summary of Invention
Technical Problem
[0006] One object of this disclosure is to predict effectively an indication for a test
substance in humans from the dynamics of a biomarker in response to the administration
of the test substance to animals other than humans.
[0007] By the method described in Patent Document 2, it is only possible to predict efficacies
already known about the existing substances used to acquire transcriptome data of
multiple organs as training data.
[0008] An object of the present invention is to predict an efficacy of a test substance
even if the test substance has an efficacy that has not been known about existing
substances used to acquire training data.
Solution to Problem
[0009] The present invention may include the following aspects as embodiments.
[0010] Embodiment 1. A certain embodiment of the present invention relates to a method for
training an artificial intelligence model. The training method includes inputting
a first training data set, a second training data set and a third training data set
in association with one another into an artificial intelligence model to train the
artificial intelligence model, the first training data set being a set of data in
which a set of data indicating the dynamics of a biomarker in one organ or each of
multiple different organs collected from respective non-human animals to which multiple
predetermined existing substances with a known indication in humans have been individually
administered is linked with labels indicating respective names of the administered
predetermined existing substances, the second training data set being a set of data
in which labels indicating respective names of the multiple predetermined existing
substances are linked with labels indicating the indications reported for each of
the multiple predetermined existing substances, the third training data set being
a set of data in which labels indicating the indications reported for each of the
multiple predetermined existing substances are linked with information about adverse
events reported correspondingly to each of these indications, wherein the artificial
intelligence model is for predicting an indication for a test substance in humans.
[0011] Embodiment 2. In the training method according to Embodiment 1, the first training data
set and the third training data set are linked by means of the second training data
set to generate a fourth training data set, and the fourth training data set is input
into the artificial intelligence model.
[0012] Embodiment 3. In the training method according to Embodiment 1 or 2, the information
about adverse events includes labels indicating the adverse events, and the presence
or absence or frequencies of occurrence of the adverse events in the indications.
[0013] Embodiment 4. In the training method according to any one of Embodiments 1 to 3,
the biomarker is a transcriptome.
[0014] Embodiment 5. In the training method according to any one of Embodiments 1 to 4,
the artificial intelligence model is a One-Class SVM.
[0015] Embodiment 6. A certain embodiment of the present invention relates to a training
device for an artificial intelligence model. The training device includes a processing
part, wherein the processing part inputs a first training data set, a second training
data set and a third training data set in association with one another into an artificial
intelligence model to train the artificial intelligence model, the first training
data set being a set of data in which a set of data indicating the dynamics of a biomarker
in one organ or each of multiple different organs collected from respective non-human
animals to which multiple predetermined existing substances with a known indication
in humans have been individually administered is linked with labels indicating respective
names of the administered predetermined existing substances, the second training data
set being a set of data in which labels indicating respective names of the multiple
predetermined existing substances are linked with labels indicating the indications
reported for each of the multiple predetermined existing substances, the third training
data set being a set of data in which labels indicating the indications reported for
each of the multiple predetermined existing substances are linked with information
about adverse events reported correspondingly to each of these indications, and wherein
the artificial intelligence model is for predicting an indication for a test substance
in humans.
[0016] Embodiment 7. A certain embodiment of the present invention relates to a program
for training an artificial intelligence model that, when executed by a computer, causes
the computer to execute the step of inputting a first training data set, a second
training data set and a third training data set in association with one another into
an artificial intelligence model to train the artificial intelligence model. In the
program, the first training data set is a set of data in which a set of data indicating
the dynamics of a biomarker in one organ or each of multiple different organs collected
from respective non-human animals to which multiple predetermined existing substances
with a known indication in humans have been individually administered is linked with
labels indicating respective names of the administered predetermined existing substances,
the second training data set is a set of data in which labels indicating respective
names of the multiple predetermined existing substances are linked with labels indicating
the indications reported for each of the multiple predetermined existing substances,
and the third training data set is a set of data in which labels indicating the indications
reported for each of the multiple predetermined existing substances are linked with
information about adverse events reported correspondingly to each of these indications,
wherein the artificial intelligence model is for predicting an indication for a test
substance in humans.
[0017] Embodiment 8. A certain embodiment of the present invention relates to a method for
predicting an indication for a test substance in humans. The method includes the steps
of: acquiring a first test data set, the first test data set being a set of data indicating
the dynamics of a biomarker in one or multiple organs collected from non-human animals
to which a test substance has been administered, and inputting the first test data
set and a second test data set into an artificial intelligence model trained by a
method according to any one of Embodiments 1 to 5 to use the trained artificial intelligence
model to predict an indication for the test substance in humans based on the first
test data set and the second test data set input thereinto, the second test data set
being a set of data in which labels of multiple known indications are linked with
information about adverse events reported correspondingly to each of the multiple
known indications.
[0018] Embodiment 9. In the prediction method according to Embodiment 8, the test substance
does not include an existing substance or an equivalent substance of an existing substance.
[0019] Embodiment 10. In the prediction method according to Embodiment 8 or 9, the test
substance is one selected from existing substances or equivalent substances of existing
substances.
[0020] Embodiment 11. A certain embodiment of the present invention relates to a prediction
device for predicting an indication for a test substance in humans. The prediction
device includes a processing part, wherein the processing part inputs a first test
data set and a second test data set into an artificial intelligence model trained
by a method according to any one of Embodiments 1 to 5 to use the trained artificial
intelligence model to predict an indication for the test substance in humans based
on the first test data set and the second test data set input thereinto, the first
test data set being a set of data indicating the dynamics of a biomarker in one or
multiple organs corresponding to one or multiple organs collected from non-human animals
to which the test substance has been administered to generate the first training data
set, the second test data set being a set of data in which labels of multiple known
indications are linked with information, acquired to generate a third training data
set, about adverse events reported correspondingly to each of the multiple known indications.
[0021] Embodiment 12. A certain embodiment of the present invention relates to a computer
program for predicting an indication for a test substance in humans that, when executed
by a computer, causes the computer to execute the step of: inputting a first test
data set and a second test data set into an artificial intelligence model trained
by a method according to any one of Embodiments 1 to 5 to use the trained artificial
intelligence model to predict an indication for the test substance in humans based
on the first test data set and the second test data set input thereinto, the first
test data set being a set of data indicating the dynamics of a biomarker in one or
multiple organs corresponding to one or multiple organs collected from non-human animals
to which the test substance has been administered to generate the first training data
set, the second test data set being a set of data in which labels of multiple known
indications are linked with information about adverse events reported correspondingly
to each of the multiple known indications.
[0022] Embodiment 13. A certain embodiment of the present invention relates to a prediction
system for predicting an indication for a test substance in humans. The system includes:
a server device for transmitting a first test data set, the first test data set being
a set of data indicating the dynamics of a biomarker in one or multiple organs collected
from non-human animals to which the test substance has been administered, and a prediction
device for predicting an action of the test substance on humans connected to the server
device via a network. The server device includes a communication part for transmitting
the first test data set, the prediction device includes a processing part and a communication
part, wherein the processing part acquires the first test data set transmitted via
the communication part of the server device via the communication part of the prediction
device, and inputs the acquired first test data set and a second test data set into
an artificial intelligence model trained by a method according to any one of Embodiments
1 to 5 to use the trained artificial intelligence model to predict an indication for
the test substance in humans based on the first test data set and the second test
data set input thereinto, the first test data set being a set of data indicating the
dynamics of a biomarker in one or multiple organs corresponding to one or multiple
organs collected from non-human animals to which the test substance has been administered
to generate the first training data set, the second test data set being a set of data
in which labels of multiple known indications are linked with information, acquired
to generate a third training data set, about adverse events reported correspondingly
to each of the multiple known indications.
[0023] Embodiment 14. A certain embodiment of the present invention relates to a method
for using a first training data set, a second training data set and a third training
data set to train an artificial intelligence model for predicting an indication for
a test substance in humans, the first training data set being a set of data in which
a set of data indicating the dynamics of a biomarker in one organ or each of multiple
different organs collected from respective non-human animals to which multiple predetermined
existing substances with a known indication in humans have been individually administered
is linked with labels indicating the names of existing substances administered to
acquire the set of data indicating the dynamics of a biomarker, the second training
data set being a set of data in which labels indicating respective names of the multiple
predetermined existing substances are linked with labels indicating the indications
reported for each of the multiple predetermined existing substances, the third training
data set being a set of data in which labels indicating the indications are linked
with information about adverse events reported correspondingly to each of the indications.
[0024] Embodiment 15. This embodiment relates to a method for using a first test data set
and a second test data set as test data for predicting an indication for a test substance
in humans. In the method, the first test data set is a set of data indicating the
dynamics of a biomarker in one or multiple organs corresponding to one or multiple
organs collected from non-human animals to which the test substance has been administered
to generate the first training data set, and the second test data set is a set of
data in which labels of multiple known indications are linked with information about
adverse events reported correspondingly to each of the multiple known indications.
Advantageous Effects of Invention
[0025] It is possible to predict an efficacy of a test substance even if the test substance
has an efficacy that has not been known about existing substances used to acquire
training data.
Brief Description of Drawings
[0026]
FIG. 1 illustrates an overview of the present invention.
FIG. 2 illustrates an overview of the invention described in Patent Document 2 (prior
art).
FIG. 3 shows examples of training data. FIG. 3(A) shows examples of first training
data. FIG. 3(B) shows examples of second training data. FIG. 3(C) shows examples of
third training data. FIG. 3(D) shows examples of fourth training data.
FIG. 4(A) illustrates a hardware configuration of a training system. FIG. 4(B) illustrates
a hardware configuration of a prediction system.
FIG. 5 illustrates a hardware configuration of a training device.
FIG. 6 is a flowchart showing the flow of processing by a training program.
FIG. 7 illustrates a hardware configuration of a prediction device.
FIG. 8 is a flowchart showing the flow of processing by a prediction program.
FIG. 9 illustrates a hardware configuration of a server device.
FIG. 10 is a flowchart showing the flow of processing in a prediction system.
FIG. 11 shows a prediction result from an artificial intelligence trained without
using transcriptome data of test drugs.
FIG. 12 shows a prediction result from an artificial intelligence trained using transcriptome
data of test drugs.
FIG. 13 shows some of the decision function values of alendronate.
Description of Embodiments
1. Overview of training method and prediction method, and description of terms
[0027] First, a method for training an artificial intelligence model and a prediction method as
certain embodiments of this disclosure are outlined. Also, the differences between
conventional methods and the training and prediction methods included in this disclosure
are described.
[0028] The prediction method predicts an indication for a test substance in humans. Preferably,
the prediction method predicts an indication for a test substance in humans based
on information about the dynamics of a biomarker in non-human animals to which an
existing substance with a known action on humans has been administered, known indications,
and adverse events reported correspondingly to the known indications. The prediction
is made using an artificial intelligence model.
(1) Training phase
[0029] As shown in FIG. 1, the artificial intelligence model used for the prediction is
preferably trained by means of a data set including three types of training data sets,
i.e., a first training data set, a second training data set and a third training data
set, which are associated with one another.
[0030] As shown in FIG. 1, the first training data set is a set of data in which a set of
data indicating the dynamics of a biomarker in one organ or each of multiple different
organs collected from respective non-human animals after multiple predetermined existing
substances with a known indication in humans have been individually administered to
the non-human animals is linked with labels indicating respective names of the administered
predetermined existing substances. The first training data set is generated as shown
in FIG. 1. For example, Drugs A, B and C as predetermined existing substances are
individually administered to non-human animals such as mice, and organs or tissues
that are parts of organs are collected from the respective non-human animals. Next, the
dynamics of a biomarker in the collected organs or tissues is analyzed and a first
training data set is generated from [data indicating respective organ names and the
dynamics of a biomarker] and [respective names of the administered drugs]. FIG. 3(A)
shows a more specific example of the first training data set. In the example of the
first training data set shown in FIG. 3(A), the leftmost column is referred to as
"first column." In the first column shown in FIG. 3(A), a drug name "Aripiprazole"
and a drug name "EMPA" are shown as examples. In the second and subsequent columns,
the expression level of RNA in each organ is shown. "Heart" and "Skin" are labels
of organ names, and "Alas2" and "Apod" are labels of names of genes whose expression
was analyzed. In the second and subsequent columns and the second and subsequent rows,
values indicating expression levels of respective genes have been entered as elements.
In the first training data set, [labels indicating organ names and labels indicating
gene names] and [values indicating expression levels of respective genes] correspond
to labels indicating drug names.
[0031] As shown in FIG. 1, the second training data set is a set of data in which labels
indicating respective names of the multiple predetermined existing substances administered
to obtain the first training data set (the first column of FIG. 3(A)) are linked with
labels indicating the indications reported for each of the multiple predetermined
existing substances. FIG. 3(B) shows a specific example of the second training data
set. In the example of the second training data set shown in FIG. 3(B), the leftmost
column is referred to as "first column." In the first column shown in FIG. 3(B), a
drug name "Aripiprazole" and a drug name "EMPA" are shown as examples. The second
and subsequent columns show indications reported for each drug listed in the first
column. Here, "Nerve injury" is shown as a name label indicating an indication for
the drug name "Aripiprazole," and "Type 2 diabetes mellitus" is shown as a name label
indicating an indication for the drug name "EMPA."
[0032] As shown in FIG. 1, the third training data set is a set of data in which labels indicating
the indications reported for each of the multiple predetermined existing substances
administered to obtain the first training data set as shown in FIG. 3(B) are linked
with information about adverse events reported correspondingly to each of these indications.
Here, the information about adverse events may include labels indicating the names
of adverse events, and the presence or absence or frequencies of occurrence of adverse
events. FIG. 3(C) shows a more specific example of the third training data set. In
the example of the third training data set shown in FIG. 3(C), the leftmost column
is referred to as "first column." "Nerve injury," which is an indication for the drug
name "Aripiprazole" shown in "Indication 1" of FIG. 3(B), is shown in the first column
of FIG. 3(C). Also, as an indication for the drug name "EMPA" shown in "Indication
1" of FIG. 3(B), "Type 2 diabetes mellitus" is shown in the first column of FIG. 3(C).
The uppermost row of the second and subsequent columns of FIG. 3(C) shows labels
indicating the names of adverse events, and "Sleep disorder" and "Blood glucose decreased"
are shown here. The numerical values in the second and subsequent rows of the second
and subsequent columns of FIG. 3(C) show the frequencies of occurrence of respective
adverse events.
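As an illustration only, frequencies of this kind could be derived from counted reports. The
sketch below assumes the ratio given in paragraph [0048] (the number of cases reported for
one adverse event divided by the total number of adverse event cases reported for one existing
substance). The function name `incidence_rates`, the report list and the event names
"Nausea" and "Headache" are hypothetical, and how such per-substance ratios are mapped onto
the per-indication rows of FIG. 3(C) is not specified here.

```python
from collections import Counter


def incidence_rates(reported_events):
    """Return {adverse event name: share of all reports} for one substance.

    `reported_events` is a list with one entry per reported case, following
    the rule that one extracted word counts as one reported adverse event.
    """
    counts = Counter(reported_events)
    total = sum(counts.values())
    if total == 0:
        return {}  # no reports, so no rate can be computed
    return {event: n / total for event, n in counts.items()}


# Hypothetical extracted words for a single existing substance (made-up data).
reports = ["Sleep disorder", "Nausea", "Sleep disorder", "Headache"]
rates = incidence_rates(reports)
```

Because every rate is divided by the same total, the rates for one substance sum to 1.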
[0033] In the training method of this embodiment, the data set that is input into the
artificial intelligence model is a fourth training data set, which is generated by
linking the first training data set with the third training data set by means of
the second training data set.
[0034] FIG. 3(D) shows an example of the fourth training data set. In the example of the
fourth training data set shown in FIG. 3(D), the leftmost column is referred to as
"first column." In the first and second columns, labels indicating the names of the
adverse events shown in FIG. 3(C) and the frequencies of occurrence of respective
adverse events are shown. Also, in the fourth and subsequent columns, labels indicating
the names of organs and labels of the names of genes, and the expression levels of
the genes shown in FIG. 3(A) are shown. In other words, FIG. 3(D) shows a data set
including the frequencies of occurrence of the adverse events in the second and subsequent
columns of FIG. 3(C) which are substituted for the labels in the first column showing
the names of drugs in FIG. 3(A).
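The linking described above can be sketched as a two-step lookup: drug name to indication
(second training data set), then indication to adverse-event frequencies (third training data
set). The following is a minimal sketch under stated assumptions, not the implementation used
in this disclosure: the dictionary layout, the function name `build_fourth_set`, all numerical
values, and the choice to fill missing entries with 0.0 are hypothetical.

```python
# Toy data mirroring the layout of FIG. 3(A) to 3(C); every number is made up.

# First training data set: drug name -> {(organ, gene): expression level}
first_set = {
    "Aripiprazole": {("Heart", "Alas2"): 0.8, ("Skin", "Apod"): 1.3},
    "EMPA": {("Heart", "Alas2"): 1.1, ("Skin", "Apod"): 0.6},
}

# Second training data set: drug name -> indications reported for the drug
second_set = {
    "Aripiprazole": ["Nerve injury"],
    "EMPA": ["Type 2 diabetes mellitus"],
}

# Third training data set: indication -> {adverse event: frequency}
third_set = {
    "Nerve injury": {"Sleep disorder": 0.12, "Blood glucose decreased": 0.01},
    "Type 2 diabetes mellitus": {"Sleep disorder": 0.02, "Blood glucose decreased": 0.20},
}


def build_fourth_set(first, second, third):
    """Replace each drug-name label with the adverse-event frequencies
    of the drug's indications, keeping the expression values."""
    ae_names = sorted({ae for freqs in third.values() for ae in freqs})
    feature_keys = sorted({key for row in first.values() for key in row})
    rows = []
    for drug, expression in first.items():
        for indication in second.get(drug, []):
            freqs = third.get(indication)
            if freqs is None:
                continue  # no adverse-event information for this indication
            # Assumption: a missing frequency or expression value becomes 0.0.
            ae_part = [freqs.get(ae, 0.0) for ae in ae_names]
            expr_part = [expression.get(key, 0.0) for key in feature_keys]
            rows.append(ae_part + expr_part)
    return ae_names, feature_keys, rows


ae_names, feature_keys, fourth_rows = build_fourth_set(first_set, second_set, third_set)
```

Each resulting row holds adverse-event frequencies followed by expression levels and contains
neither a drug name nor an indication name, which matches the substitution described for FIG. 3(D).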
(2) Prediction phase
[0035] An artificial intelligence model trained as described in Section 1.(1) above is used
to predict an indication for a test substance in humans. The test data sets that are
input into the trained artificial intelligence model to predict an indication are
a first test data set and a second test data set. The first test data set is input
into the trained artificial intelligence model together with the second test data
set.
[0036] The first test data set is a set of data indicating the dynamics of a biomarker in
one or multiple organs collected from non-human animals to which the test substance
has been administered. Also, the multiple organs correspond to the organs collected
to generate the first training data set. Preferably, the first test data set is a set of
data in which [labels indicating organ names and labels indicating gene names] are linked
with [values indicating the expression levels of respective genes], obtained
by administering one test substance to non-human animals and analyzing the transcriptome
in one or multiple organs collected therefrom.
[0037] The second test data set is a set of data in which labels of multiple known indications
are linked with the information, acquired to generate a third training data set, about
adverse events reported correspondingly to each of the multiple known indications.
Here, the multiple known indications may include not only the indications used as
the second training data set but also known indications registered in an external database.
For the known indications, the term "multiple" may mean, for example,
100, 500, 1000 or 2000, or more.
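The exact way in which the first and second test data sets are combined is not spelled out
in this section. One plausible reading, by analogy with the fourth training data set, is that
the expression values of the test substance are paired with the adverse-event frequencies of
each known indication, giving one candidate row per indication for the trained model to score.
The sketch below illustrates only that pairing; the function name `build_test_rows`, the
data layout, the indication "Osteoporosis" and all numbers are hypothetical, and the scoring
step itself is omitted.

```python
def build_test_rows(test_expression, known_indications, feature_keys, ae_names):
    """Pair one test substance's expression vector with the adverse-event
    vector of every known indication, one candidate row per indication."""
    # Assumption: missing values become 0.0, and the column order matches
    # the order that was used when the model was trained.
    expr_part = [test_expression.get(key, 0.0) for key in feature_keys]
    candidates = []
    for indication, freqs in known_indications.items():
        ae_part = [freqs.get(ae, 0.0) for ae in ae_names]
        candidates.append((indication, ae_part + expr_part))
    return candidates


# Column order assumed to be fixed at training time (made-up example).
feature_keys = [("Heart", "Alas2"), ("Skin", "Apod")]
ae_names = ["Blood glucose decreased", "Sleep disorder"]

# First test data set: expression levels measured for one test substance.
test_expression = {("Heart", "Alas2"): 0.9, ("Skin", "Apod"): 1.0}

# Second test data set: known indication -> {adverse event: frequency}.
second_test_set = {
    "Nerve injury": {"Sleep disorder": 0.12, "Blood glucose decreased": 0.01},
    "Osteoporosis": {"Sleep disorder": 0.03},
}

candidate_rows = build_test_rows(test_expression, second_test_set, feature_keys, ae_names)
```

Under this reading, an indication that never appeared in the training data, such as the
hypothetical "Osteoporosis" row, can still be scored because it is represented by its
adverse-event profile rather than by its name.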
[0038] Here, in the prediction method, the test substance does not have to be an existing
substance or an equivalent substance of an existing substance. When the test substance
is not an existing substance or an equivalent substance of an existing substance,
the prediction method serves as a method for predicting an indication for a new substance.
[0039] Also, in the prediction method, an existing substance or an equivalent substance
of an existing substance may be included as a test substance. In this case, the prediction
method serves as a drug repositioning method for exploring a new indication for an
existing substance or an equivalent substance of an existing substance. When the prediction
method described in this specification is used as a drug repositioning method, it
is preferred to include the test substance in the existing substances administered
to acquire the first training data set. In this way, the prediction accuracy can be
increased.
(3) Comparison with conventional method
[0040] The conventional method shown in FIG. 2 is a method described in Patent Document
2, in which Drugs A, B and C as existing substances, for example, are individually
administered to non-human animals such as mice, and organs or tissues that are parts of
organs are collected from the respective non-human animals. Next, the dynamics of a biomarker
in the collected organs or tissues is analyzed to generate a first training data set.
Also, second training data is generated from a human clinical database for, for example,
adverse events, indications, drug kinetics and indications for existing substances.
Then, the artificial intelligence model shown in FIG. 2 is generated by training using
the first training data set and the second training data. In other words, in the conventional
method, an artificial intelligence model is constructed by associating the dynamics
of a biomarker with each one of adverse events, indications, drug kinetics or indications
for existing substances. Also, the test data used in the conventional method is data
indicating the dynamics of a biomarker in one organ or multiple different organs corresponding
to one organ or multiple organs collected from non-human animals to which the test
substance has been administered to generate the first training data set.
[0041] This embodiment differs from the conventional method in that not only the dynamics
of a biomarker but also information about adverse events, which is substituted for the
indication names, is used as training data. Likewise, as test data, not only the
dynamics of a biomarker but also information about multiple known indications and
adverse events is used.
[0042] It is, therefore, possible to predict an indication for a test substance even if
the test substance has an indication that has not been known about existing substances
used to acquire the training data.
(4) Description of terms
[0043] In this disclosure, the non-human animals are not limited. Examples include mammals
such as mice, rats, dogs, cats, rabbits, cows, horses, goats, sheep and pigs, and
birds such as chickens. Preferably, the non-human animals are mammals such as mice,
rats, dogs, cats, cows, horses and pigs, more preferably mice, rats or the like, and
still more preferably mice. The non-human animals also include fetuses, chicks and
so on of the animals.
[0044] In this disclosure, the term "substance" may include, for example, compounds; nucleic
acids; carbohydrates; lipids; glycoproteins; glycolipids; lipoproteins; amino acids;
peptides; proteins; polyphenols; chemokines; at least one metabolic substance selected
from the group consisting of ultimate metabolites, intermediary metabolites and synthetic
raw material substances of the above-mentioned substances; metal ions; or microorganisms.
Also, the substance may be a simple substance or may be a mixture of multiple substances.
Preferably, the term "substance" includes, for example, pharmaceutical products, quasi-pharmaceutical
products, cosmeceutical products, foods, foods for specified health use, foods with
functional claims, and candidates therefor. Also, the term "substance" may also include
substances whose testing was discontinued or suspended during a preclinical or clinical
trial for pharmaceutical approval.
[0045] The "existing substance" is not limited as long as it is an existing substance. Preferably,
it is a substance with a known action on humans. Also, the term "equivalent substance
of an existing substance" may include those that are similar in structure to an existing
substance and have a similar action thereto. The term "similar action" here is intended
to mean having the same kind of action as an existing substance although the intensity
of the action may be the same or different.
[0046] The "adverse event" is not limited as long as it is an action that is determined
to be harmful to humans. Preferred examples include adverse events listed in an external
database such as FAERS (https://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm)
or clinicaltrials.gov (https://clinicaltrials.gov/).
[0047] The "indication" is not limited as long as it is a disorder or symptom in humans
that should be mitigated, treated, arrested or prevented. Examples of the disorder
or symptom include disorders or symptoms listed in an external database such as the
above-mentioned FAERS, all drug labels of DAILYMED (https://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm),
Medical Subject Headings (https://www.nlm.nih.gov/mesh/meshhome.html), Drugs@FDA
(https://www.accessdata.fda.gov/scripts/cder/daf/), or International Classification
of Diseases (https://www.who.int/health-topics/international-classification-of-diseases).
More specifically, examples of the indication include ischemic diseases such as thrombosis,
embolism and stenosis (in particular, heart, brain, lungs, large intestine, etc.);
circulatory disorders such as aneurysm, phlebeurysm, congestion and hemorrhage (aortae,
veins, lungs, liver, spleen, retinae, etc.); allergic diseases such as allergic bronchitis
and glomerulonephritis; dementia such as Alzheimer's dementia; degenerative disorders
such as Parkinson's disease, amyotrophic lateral sclerosis and myasthenia gravis (nerves,
skeletal muscles, etc.); tumors (benign epithelial tumor, benign non-epithelial tumor,
malignant epithelial tumor, malignant non-epithelial tumor); metabolic diseases (abnormal
carbohydrate metabolism, abnormal lipid metabolism, electrolyte imbalance); infectious
diseases (bacteria, viruses, rickettsia, chlamydia, fungi, protozoa, parasite, etc.);
and symptoms or illnesses associated with autoimmune diseases or the like such as
renal diseases, systemic erythematodes and multiple sclerosis.
[0048] The incidence rate of an adverse event can be obtained by the following method. A
word indicating the name of the adverse event is extracted by, for example, text extraction
from a database such as clinicaltrials.gov, FAERS, or all drug labels of DAILYMED
as described above. One extracted word can be counted as one reported adverse event.
For one existing substance, the incidence rate can be obtained according to the equation:
Incidence rate = (the number of cases reported for one adverse event)/(the total number
of cases of adverse events reported for the existing substance). When explanation
related to actions is registered in text form in a database, syntactic analysis, word
segmentation, semantic analysis or the like may be performed on the registered texts
by natural language processing before the extraction of the texts corresponding to
the actions.
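As a non-limiting illustration, the incidence rate equation above may be sketched as follows. The function name `incidence_rates` and the sample terms are hypothetical, and it is assumed that the adverse event names have already been extracted from a database, with each extracted word counted as one reported case.

```python
from collections import Counter


def incidence_rates(reported_terms):
    """Return {adverse_event: count / total} for one existing substance.

    `reported_terms` is a hypothetical list of adverse-event names that were
    already extracted from a database; each occurrence counts as one report.
    """
    counts = Counter(reported_terms)
    total = sum(counts.values())
    if total == 0:
        return {}  # no reports, so no rate can be computed
    return {event: n / total for event, n in counts.items()}


# Made-up extracted terms for a single existing substance.
terms = ["nausea", "headache", "nausea", "rash"]
rates = incidence_rates(terms)
```

Because every rate shares the same denominator, the rates for one existing substance sum to 1.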
[0049] The "organ" is not limited as long as it is an organ present in the body of a mammal
or bird as described above. For example, in the case of a mammal, the organ is at
least one selected from circulatory system organs (heart, artery, vein, lymph duct,
etc.), respiratory system organs (nasal cavity, paranasal sinus, larynx, trachea,
bronchi, lung, etc.), gastrointestinal system organs (lip, cheek, palate, tooth, gum,
tongue, salivary gland, pharynx, esophagus, stomach, duodenum, jejunum, ileum, cecum,
appendix, ascending colon, transverse colon, sigmoid colon, rectum, anus, liver, gallbladder,
bile duct, biliary tract, pancreas, pancreatic duct, etc.), urinary system organs
(urethra, bladder, ureter, kidney), nervous system organs (cerebrum, cerebellum, mesencephalon,
brain stem, spinal cord, peripheral nerve, autonomic nerve, etc.), female reproductive
system organs (ovary, oviduct, uterus, vagina, etc.), breast, male reproductive system
organs (penis, prostate, testicle, epididymis, vas deferens), endocrine system organs
(hypothalamus, pituitary gland, pineal body, thyroid gland, parathyroid gland, adrenal
gland, etc.), integumentary system organs (skin, hair, nail, etc.), hematopoietic
system organs (blood, bone marrow, spleen, etc.), immune system organs (lymph node,
tonsil, thymus, etc.), bone and soft tissue organs (bone, cartilage, skeletal muscle,
connective tissue, ligament, tendon, diaphragm, peritoneum, pleura, adipose tissue
(brown adipose, white adipose), etc.), and sensory system organs (eyeball, palpebra,
lacrimal gland, external ear, middle ear, inner ear, cochlea, etc.). Preferably, the
"organ" can be at least one selected from bone marrow, pancreas, skull bone, liver,
skin, brain, pituitary gland, adrenal gland, thyroid gland, spleen, thymus, heart,
lung, aorta, skeletal muscle, testicle, epididymal fat, eyeball, ileum, stomach, jejunum,
large intestine, kidney, and parotid gland. Preferably, all of bone marrow, pancreas,
skull bone, liver, skin, brain, pituitary gland, adrenal gland, thyroid gland, spleen,
thymus, heart, lung, aorta, skeletal muscle, testicle, epididymal fat, eyeball, ileum,
stomach, jejunum, large intestine, kidney, and parotid gland are used in the prediction
according to this disclosure. The term "multiple organs" is not limited as long as
the number of organs is two or more. For example, the multiple organs can be selected
from 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
or 24 types of organs.
[0050] The term "organ-derived" is intended to mean, for example, being collected from an
organ, or being cultured from cells, tissues or a body fluid of a collected organ.
[0051] The term "body fluid" includes, for example, serum, plasma, urine, spinal fluid,
ascites, pleural effusion, saliva, gastric juice, pancreatic juice, bile, milk, lymph
and intercellular fluid.
[0052] The term "biomarker" means a biological substance that can be varied in the cells
or tissues of each organ and/or in a body fluid depending on the administration of
the substance. An example of a biological substance that may serve as a "biomarker"
is at least one selected from nucleic acids; carbohydrates; lipids; glycoproteins;
glycolipids; lipoproteins; amino acids; peptides; proteins; polyphenols; chemokines;
at least one metabolic substance selected from the group consisting of ultimate metabolites,
intermediary metabolites and synthetic raw material substances of the above-mentioned
substances; metal ions and so on. More preferred is a nucleic acid. The biomarker
is preferably a group of biological substances that are varied in the cells or tissues
of each organ and/or in a body fluid depending on the administration of the substance.
An example of a group of biological substances can be a group of at least one kind
selected from nucleic acids; carbohydrates; lipids; glycoproteins; glycolipids; lipoproteins;
amino acids; peptides; proteins; polyphenols; chemokines; at least one metabolic substance
selected from the group consisting of ultimate metabolites, intermediary metabolites
and synthetic raw material substances of the above-mentioned substances; metal ions
and so on.
[0053] The term "nucleic acids" preferably means a group of RNAs contained in transcriptome,
such as mRNAs, non-coding RNAs and microRNAs, more preferably a group of mRNAs. The
RNAs are preferably mRNAs, non-coding RNAs and/or microRNAs that may be expressed
in the cells or tissues of the above organs or cells in a body fluid, more preferably
mRNAs, non-coding RNAs and/or microRNAs that may be detected by RNA-Seq or the like
(https://www.ncbi.nlm.nih.gov/gene?LinkName=genome_gene&from_uid=52, http://jp.support.illumina.com/sequencing/sequencing_software/igenome.html).
Preferably, all RNAs that can be analyzed by RNA-Seq are used for the prediction according
to this disclosure.
[0054] The term "set of data indicating the dynamics of a biomarker" is intended to mean
a set of data indicating that the biomarker has or has not been varied in response
to the administration of an existing substance. Preferably, the dynamics of a biomarker
indicates that the biomarker has been varied in response to the administration of
an existing substance. The data can be acquired by, for example, the following method.
For tissues, cells or body fluids derived from certain organs collected from non-human
animals to which an existing substance has been administered, the abundance or concentration
of each biomarker is measured to acquire a measurement value for each organ of the
individuals to which the existing substance has been administered. Also, from non-human
animals to which the existing substance has not been administered, the abundance or
concentration of each biomarker is measured for tissues, cells or body fluids derived
from organs corresponding to the organs from which measurement values of the individuals
to which the existing substance has been administered were acquired in the same manner
to acquire measurement values in non-administered individuals. The measurement values
of each biomarker derived from each organ of the individuals to which the existing
substance has been administered are compared with the measurement values in non-administered
individuals of the biomarker for each organ corresponding to the biomarkers in the
individuals to which the existing substance has been administered to acquire values
indicating the differences therebetween as data. Here, the term "corresponding to"
means that the organs and biomarkers are the same or of the same type. Preferably,
the differences can be represented as ratios (such as quotients) of the measurement
values of respective biomarkers derived from the individuals to which the existing
substance has been administered to the measurement values of biomarkers corresponding
to the above biomarkers in the non-administered individuals. For example, the data
includes quotients obtained by dividing the measurement values of biomarker A in organs
A derived from individuals to which the existing substance has been administered by
the measurement values of biomarker A in organs A derived from non-administered individuals.
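As a non-limiting illustration, the quotient described above may be computed as in the following sketch. The dictionary layout keyed by (organ, biomarker), the function name `dynamics_ratios`, and the use of the mean over individuals are assumptions made for this example only.

```python
from statistics import mean


def dynamics_ratios(treated, control):
    """Quotient of treated to non-administered values per (organ, biomarker).

    Both arguments map (organ, biomarker) to a list of measurement values,
    one per individual. Averaging over individuals is an assumption here.
    """
    ratios = {}
    for key, treated_values in treated.items():
        control_values = control.get(key)
        if not control_values:
            continue  # no corresponding non-administered measurement
        denominator = mean(control_values)
        if denominator == 0:
            continue  # quotient is undefined
        ratios[key] = mean(treated_values) / denominator
    return ratios


# Made-up measurement values for biomarker A in organ A.
treated = {("organ A", "biomarker A"): [20.0, 30.0]}
control = {("organ A", "biomarker A"): [10.0, 15.0]}
ratios = dynamics_ratios(treated, control)
```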
[0055] When the biomarker is transcriptome, all RNAs that can be analyzed by RNA-Seq may
be used. Alternatively, the RNAs may be analyzed for their expression, and divided
into subsets (modules) of data indicating the dynamics of each RNA with which the
organ name and the gene name are linked using, for example, WGCNA (https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/).
For each module divided by means of WGCNA, a Pearson's correlation coefficient with
1-of-K representation may be calculated for each existing substance to select a module
with the highest absolute value of the correlation coefficient for each existing substance,
and the RNA in each organ included in the selected module may be used as a biomarker.
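As a non-limiting illustration, the module selection described above may be sketched as follows. It is assumed here that each module has already been summarized as one value per sample (the modules themselves would be produced by WGCNA, which is not reproduced in this sketch), and the names `pearson` and `best_module_per_substance` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def best_module_per_substance(module_profiles, sample_substances):
    """Select, per substance, the module with the highest |correlation|.

    `module_profiles` maps a module name to one summary value per sample
    (an assumed representation); `sample_substances` names the substance
    administered to each sample.
    """
    best = {}
    for substance in sorted(set(sample_substances)):
        # 1-of-K representation: 1 for samples given this substance, else 0.
        one_hot = [1.0 if s == substance else 0.0 for s in sample_substances]
        best[substance] = max(
            module_profiles,
            key=lambda m: abs(pearson(module_profiles[m], one_hot)),
        )
    return best


# Made-up data: four samples, two substances, two modules.
samples = ["A", "A", "B", "B"]
profiles = {"M1": [1.0, 0.9, 0.1, 0.0], "M2": [0.5, 0.4, 0.5, 0.6]}
selected = best_module_per_substance(profiles, samples)
```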
[0056] Further, when the biomarker in response to the administration of an existing substance
is transcriptome, the variation in transcriptome in each organ of the animals to which
the existing substance has been administered compared with that of the animals to
which the existing substance has not been administered can be obtained using DESeq2
analysis. For example, the expression levels of RNAs in each organ collected from
animals to which the existing substance has been administered and the expression levels
of genes in each corresponding organ collected from animals to which the existing
substance has not been administered are quantified by htseq-count to obtain respective
count data. Then, respective organs and the expression levels of respective genes
in respective organs are compared. As a result of the comparison, a log2(fold) value
of the variation in gene expression in the animals to which the existing
substance has been administered and a p-value, which serves as an index of the probability
of each variation, are output for each gene in each organ. Based on the log2(fold)
value, it is possible to determine whether or not the dynamics of a biomarker
such as transcriptome is present.
[0057] The measurement values of a biomarker can be acquired by a known method. When the
biomarker is a nucleic acid, the measurement values can be acquired by sequencing
such as RNA-Seq, quantitative PCR, or the like. When the biomarker is a carbohydrate,
lipid, glycolipid, amino acid, polyphenol, chemokine, at least one metabolic substance
selected from the group consisting of ultimate metabolites, intermediary metabolites
and synthetic raw material substances of the above-mentioned substances, or the like,
the measurement values can be acquired by, for example, mass spectrometry. When the
biomarker is a glycoprotein, lipoprotein, peptide, protein or the like, the measurement
values can be acquired by, for example, an ELISA (Enzyme-Linked Immuno Sorbent Assay)
method. The method for collecting tissues, cells or body fluids derived from organs
for use in the measurement and the preprocessing method for the measurement of a biomarker
are also known.
[0058] The "test substance" is a substance to be evaluated for its actions. The test substance
may be an existing substance, an equivalent of an existing substance or a new substance.
In the prediction method, even when the relationship between an action of the test
substance and an action of an existing substance or an equivalent substance of an
existing substance has not been found, it is possible to predict an action of the
test substance on humans. On the other hand, when the test substance is one selected
from existing substances or equivalents of existing substances, an unknown action of
the existing substance or equivalent of the existing substance can be found. The unknown
action may be one action or multiple actions. The unknown action is preferably a new
indication. By predicting a new indication for a test substance in humans, drug repositioning
can also be achieved. Administration of a test substance to non-human animals is known.
Also, the data indicating the dynamics of a biomarker in one or multiple organs collected
from non-human animals to which a test substance has been administered can be acquired
in the same manner as the data indicating the dynamics of a biomarker in one or multiple
organs collected from non-human animals to which an existing substance has been administered.
2. Construction of artificial intelligence model
2-1. Generation of training data
(1) Generation of first training data set
[0059] A first training data set is constituted of a set of data indicating the dynamics
of a biomarker in one organ or each of multiple different organs, and the labels indicating
the names of existing substances. The one organ or the multiple different organs may
be collected from respective non-human animals to which multiple existing substances
with a known action on humans have been individually administered. The first training
data set may be stored in an auxiliary storage part 104 of a training device 10 shown
as a database TR1 in FIG. 5.
[0060] The set of data indicating the dynamics of a biomarker in one organ or each of multiple
different organs may be acquired by a method described in Section 1.(4) above.
[0061] Each item of the data indicating the dynamics of a biomarker in the respective organs
may be linked with information about the names of existing substances administered,
information about the names of organs collected, information about the names of biomarkers,
and so on. The "information about the names" may be the names themselves or labels
of abbreviated names or the like, or may be label values corresponding to respective
names.
[0062] Each item of data included in the set of data indicating the dynamics of a biomarker
serves as an element that constitutes a matrix in a first training data set for an
artificial intelligence model, which is described later. When the biomarker is transcriptome,
the expression level of each RNA corresponds to data included in the set of data indicating
the dynamics of a biomarker, and serves as an element of a matrix constituting the
first training data set. For example, when the biomarker is transcriptome, a log2(fold)
value of each existing substance obtained by DESeq2 analysis may be used as
each element of the first training data set.
[0063] An example of the first training data set is as shown in Section 1.(1) above and
FIG. 3(A).
[0064] As a set of data indicating the dynamics of a biomarker, measurement values of the
biomarker may be directly used as elements of the first training data set, or may
be subjected to standardization, dimensionality reduction or the like before being
used as elements of the first training data set. An example of a standardization method
is to transform data indicating expression differences such that the mean value is
0 and the variance is 1. The mean value in the standardization can be
the mean value in each organ, the mean value in each gene, or the mean value of all
data. Also, the dimensionality reduction can be achieved by statistical processing
such as a principal component analysis. The parent population in performing statistical
processing can be set for each organ, for each gene, or for all data. For example,
when the biomarker is transcriptome, only the genes having a p-value not greater than
a predetermined value relative to a log2(fold) value of each existing substance obtained
by DESeq2 analysis may be used as the elements of the first training data set. The
predetermined value can be 10^-3 or 10^-4, for example. Preferred is 10^-4.
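As a non-limiting illustration, the p-value filtering and the standardization described above may be sketched as follows. The record layout with the keys "gene", "log2_fold" and "p_value" and the function names are hypothetical; the default threshold of 1e-4 follows the preferred value stated above, and the population variance is assumed for the standardization.

```python
from statistics import mean, pstdev


def filter_by_p_value(records, threshold=1e-4):
    """Keep only records whose p-value is not greater than the threshold."""
    return [r for r in records if r["p_value"] <= threshold]


def standardize(values):
    """Transform values so that the mean is 0 and the variance is 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # constant input carries no variation
    return [(v - mu) / sigma for v in values]


# Made-up DESeq2-style output rows for one organ and one existing substance.
records = [
    {"gene": "g1", "log2_fold": 2.0, "p_value": 1e-6},
    {"gene": "g2", "log2_fold": 0.1, "p_value": 0.3},
    {"gene": "g3", "log2_fold": -1.0, "p_value": 1e-4},
    {"gene": "g4", "log2_fold": 0.5, "p_value": 1e-5},
]
kept = filter_by_p_value(records)
scaled = standardize([r["log2_fold"] for r in kept])
```

Here the parent population for the standardization is the set of retained genes; as stated above, it could equally be set for each organ, for each gene, or for all data.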
[0065] The labels indicating respective names of the predetermined existing substances administered
included in the first training data set may be the names of the substances themselves
or may be encoded.
[0066] The first training data set may be updated in response to the update of the existing
substances or the addition of data indicating the dynamics of a new biomarker.
(2) Generation of second training data set
[0067] As shown in Section 1.(1) above and FIG. 3(B), the second training data set is generated
by linking labels indicating respective names of multiple predetermined existing substances
administered to non-human animals to generate the first training data set with labels
indicating the indications reported for each of the multiple predetermined existing
substances. For the indications for existing substances, by conducting, for each existing
substance, a search for a word indicating the name of the existing substance, for
example, labels of the names of indications corresponding thereto can be acquired
from an external database such as FAERS, all drug labels of DAILYMED, Medical Subject
Headings, Drugs@FDA, or International Classification of Diseases as described in Section
1.(4) above. There may be one indication or two or more indications per existing substance.
When there are two or more indications per existing substance, the two or more
indications constitute the second training data set. The labels indicating the indications
reported for each of the multiple predetermined existing substances can be acquired
by performing text extraction, natural language processing, digitize processing, image
analysis processing or the like on the data set stored in a database. For example,
when labels indicating the names of respective indications corresponding to respective
existing substances administered to non-human animals to generate the first training
data set which are stored in an external database are registered as inserts in texts,
syntactic analysis, word segmentation, semantic analysis or the like may be performed
on the registered texts by natural language processing before the extraction of the
texts corresponding to actions.
(3) Generation of third training data set
[0068] As described in Section 1.(1) above and FIG. 3(C), the third training data set is a set
of data in which the labels indicating indications reported for each of the multiple
predetermined existing substances administered to acquire the first training data
set as shown in FIG. 3(B) are linked with information about adverse events reported
correspondingly to each of these indications. For the indications reported for each
of the multiple predetermined existing substances, by conducting, for each existing
substance, a search for a word of the name of the existing substance, for example,
labels of the names of indications corresponding thereto can be acquired from an external
database such as FAERS, all drug labels of DAILYMED, Medical Subject Headings, Drugs@FDA,
or International Classification of Diseases. The labels indicating adverse events
reported correspondingly to each of these indications can be acquired from an external
database such as FAERS or clinicaltrials.gov by conducting a search for labels indicating
the indication names. Also, when labels indicating the names of indications or adverse
events are registered as inserts in texts, syntactic analysis, word segmentation,
semantic analysis or the like may be performed on the registered texts by natural
language processing before the extraction of the texts corresponding to actions.
[0069] The frequencies of occurrence of adverse events can be calculated by a method described
in Section 1.(4) above.
(4) Generation of fourth training data set
[0070] As described in Section 1.(1) above and FIG. 3(D), the fourth training data set is
generated by substituting the frequencies of occurrence of adverse events reported
for indications corresponding to the labels indicating the names of existing substances
administered to acquire the first training data (the frequencies of occurrence of
adverse events in the second and subsequent columns shown in FIG. 3(C)) into the parts
of the labels indicating the drug names included in the first training data set (the
first column of FIG. 3(A) indicating the drug names).
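As a non-limiting illustration, the substitution described above may be sketched as follows. The container layouts (a list of name and feature pairs for the first training data set, and dictionaries for the second and third training data sets) and the function name `build_fourth_training_set` are assumptions made for this example and are not taken from FIG. 3.

```python
def build_fourth_training_set(first, second, third):
    """Replace each substance-name label with adverse-event frequencies.

    first:  list of (substance_name, biomarker_dynamics_vector)
    second: dict mapping substance_name to a list of indication labels
    third:  dict mapping an indication label to adverse-event frequencies
    One output row is produced per (substance, indication) pair.
    """
    rows = []
    for substance, dynamics in first:
        for indication in second.get(substance, []):
            frequencies = third.get(indication)
            if frequencies is None:
                continue  # no adverse-event information for this indication
            rows.append((dynamics, frequencies))
    return rows


# Made-up example with two substances and two indications.
first = [("drug X", [0.5, -1.2]), ("drug Y", [1.1, 0.3])]
second = {"drug X": ["indication 1", "indication 2"], "drug Y": ["indication 2"]}
third = {"indication 1": [0.7, 0.3], "indication 2": [0.2, 0.8]}
fourth = build_fourth_training_set(first, second, third)
```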
2-2. Input of training data into artificial intelligence model
[0071] The artificial intelligence model is not limited as long as the problem associated
with the present invention can be solved. In this embodiment, the use of an artificial
intelligence model that can perform Link Prediction is preferred. Examples of such
an artificial intelligence model include One-Class SVM (One-Class support vector machine).
[0072] Input of the fourth training data set is described using, as an example, a case
where Link Prediction is performed with a One-Class SVM. The data are input into
the One-Class SVM as a fourth training data set obtained by associating the first
training data set with the third training data set by a kernel function of the
following equation:

[0073] Here, <·,·> denotes an operator that scales each vector such that its L2 norm is equal
to 1 and takes an inner product between the scaled vectors.
[0074] As the One-Class SVM, the 'scikit-learn' package for Python, for example, may be used
with a parameter nu=0.1.
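As a non-limiting illustration, a One-Class SVM with nu=0.1 may be trained on a precomputed kernel using scikit-learn as in the following sketch. The product form of the kernel in `pair_kernel` is an assumption made for this example only and is not asserted to be the kernel function of this disclosure; only the normalized inner product follows the definition of <·,·> given above. All data are randomly generated placeholders.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def normalized_gram(a, b):
    """Inner products between rows of a and b after scaling to unit L2 norm."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def pair_kernel(x1, y1, x2, y2):
    """ASSUMED kernel on (biomarker dynamics, adverse-event frequency) pairs.

    The element-wise product of two normalized Gram matrices is used here
    purely for illustration.
    """
    return normalized_gram(x1, x2) * normalized_gram(y1, y2)


rng = np.random.default_rng(0)
x_train = rng.normal(size=(20, 5))        # placeholder biomarker dynamics
y_train = rng.random(size=(20, 3)) + 0.1  # placeholder event frequencies

k_train = pair_kernel(x_train, y_train, x_train, y_train)
model = OneClassSVM(kernel="precomputed", nu=0.1)
model.fit(k_train)

# Score candidate pairs: rows are test pairs, columns are training pairs.
x_test = rng.normal(size=(4, 5))
y_test = rng.random(size=(4, 3)) + 0.1
scores = model.decision_function(pair_kernel(x_test, y_test, x_train, y_train))
```

In this sketch, a higher score indicates that a candidate pair lies closer to the region occupied by the training pairs.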
2-3. System for training artificial intelligence model
[0075] FIG. 4(A) illustrates a hardware configuration of a training system 50. The training
system 50 includes a measurement part 30, which is a next-generation sequencer or
the like, for acquiring measurement data of a biomarker, and a training device 10.
The training device 10 may be communicably connected to the measurement part 30 by
a wireless or wired network, or may acquire data acquired by the measurement part
30 via a storage medium such as a CD-R.
(1) Device for training artificial intelligence model
[0076] The training of the artificial intelligence model can be carried out using the training
device 10 (which may be hereinafter referred to also as "device 10"), for example.
[0077] In the description of the device 10 and the processing in the device 10, for the
terms that are common to those described in Sections 1. and 2-1. above, the above
description is incorporated here.
[0078] The device 10 includes at least a processing part 101 and a storage part. The storage
part is constituted of a main storage part 102 and/or an auxiliary storage part 104.
[0079] FIG. 5 illustrates a hardware configuration of the device 10. The device 10 may be
connected to an input part 111, an output part 112 and a storage medium 113. The
device 10 may be also connected to a measurement part 30, which is a next-generation
sequencer, a mass spectrometer or the like. Also, the device 10 may be communicably
connected to an external database 60 such as FAERS, all drug labels of DAILYMED, Medical
Subject Headings, Drugs@FDA, International Classification of Diseases or clinicaltrials.gov.
[0080] In the device 10, the processing part 101, the main storage part 102, a ROM (read
only memory) 103, the auxiliary storage part 104, a communication interface (I/F)
105, an input interface (I/F) 106, an output interface (I/F) 107 and a media interface
(I/F) 108 are connected for mutual data communication by a bus 109.
[0081] The processing part 101 is constituted of a CPU, MPU or the like. The processing
in the processing part 101 may be assisted by a GPU. The processing part 101 executes
a computer program stored in the auxiliary storage part 104 or the ROM 103, and processes
the acquired data, whereby the device 10 functions. The processing part 101 acquires
a data set indicating the dynamics of a biomarker in multiple different organs collected
from non-human animals to which an existing substance has been administered as described
in Section 1. above and known actions of the existing substance on humans as training
data. Also, the processing part 101 trains an artificial intelligence model using
the two types of training data.
[0082] The ROM 103 is constituted of a mask ROM, a PROM, an EPROM, an EEPROM or the like,
and stores computer programs that are executed by the processing part 101 and data
that are used thereby. The ROM 103 stores a boot program that is executed by the processing
part 101 when the device 10 is started up, programs and settings relating to the operation
of the hardware of the device 10, and so on.
[0083] The main storage part 102 is constituted of a RAM (Random access memory) such as
an SRAM or DRAM. The main storage part 102 is used to read out the computer programs
stored in the ROM 103 and the auxiliary storage part 104. The main storage part 102
is also utilized as a workspace when the processing part 101 executes these computer
programs. The main storage part 102 temporarily stores training data or the like acquired
via a network, functions of the artificial intelligence model read out from the auxiliary
storage part 104, and so on.
[0084] The auxiliary storage part 104 is constituted of a hard disk, a semiconductor memory
element such as a flash memory, an optical disc, or the like. In the auxiliary storage
part 104, various computer programs to be executed by the processing part 101 and
various setting data for use in executing the computer programs are stored. Specifically,
the auxiliary storage part 104 stores operation software (OS) 1041, a training program
TP, an artificial intelligence model database AI1, a database TR1 for storing a first
training data set, a database TR2 for storing a second training data set, and a database
TR3 for storing a third training data set in a non-volatile manner. The training program
TP performs processing for training an artificial intelligence model as described later
in cooperation with the operation software (OS) 1041.
[0085] The communication I/F 105 is constituted of a serial interface such as a USB, IEEE1394
or RS-232C, a parallel interface such as a SCSI, IDE or IEEE1284, and an analog interface
constituted of a D/A converter, A/D converter or the like, a network interface controller
(NIC) and so on. The communication I/F 105 functions as a communication part 105,
and, under the control of the processing part 101, receives data from the measurement
part 30 or other external devices, and, when necessary, transmits information stored
in or generated by the device 10 to the measurement part 30 or to the outside, or
displays it. The communication I/F 105 may communicate with the measurement part 30
or other external devices (not shown, e.g., other computers or cloud systems) via
a network.
[0086] The input I/F 106 is constituted of a serial interface such as a USB, IEEE1394 or
RS-232C, a parallel interface such as an SCSI, IDE or IEEE1284, an analog interface
constituted of a D/A converter, A/D converter or the like, and so on. The input I/F
106 accepts character input, clicks, sound input or the like from the input part 111.
The accepted inputs are stored in the main storage part 102 or the auxiliary storage
part 104.
[0087] The input part 111 is constituted of a touch panel, keyboard, mouse, pen tablet,
microphone or the like, and performs character input or sound input into the device
10. The input part 111 may be externally connected to the device 10 or may be integrated
with the device 10.
[0088] The output I/F 107 is constituted, for example, of an interface similar to that for
the input I/F 106. The output I/F 107 outputs information generated by the processing
part 101 to the output part 112. The output I/F 107 outputs information generated
by the processing part 101 and stored in the auxiliary storage part 104 to the output
part 112.
[0089] The output part 112 is constituted, for example, of a display, a printer or the like,
and displays measurement results transmitted from the measurement part 30, various
operation windows in the device 10, respective items of training data, an artificial
intelligence model, and so on.
[0090] The media I/F 108 reads out, for example, application software or the like stored
in the storage medium 113. The read out application software or the like is stored
in the main storage part 102 or the auxiliary storage part 104. Also, the media I/F
108 writes information generated by the processing part 101 into the storage medium
113. The media I/F 108 writes information generated by the processing part 101 and
stored in the auxiliary storage part 104 into the storage medium 113.
[0091] The storage medium 113 is constituted of a flexible disk, a CD-ROM, a DVD-ROM or
the like. The storage medium 113 is connected to the media I/F 108 by a flexible disk
drive, a CD-ROM drive, a DVD-ROM drive or the like. An application program or the
like for a computer to execute an operation may be stored in the storage medium 113.
[0092] The processing part 101 may acquire application software and various settings necessary
for control of the device 10 via a network instead of reading them out of the ROM
103 or the auxiliary storage part 104. It is also possible that the application program
is stored in an auxiliary storage part of a server computer on a network and the device
10 accesses this server computer to download the computer program and stores it in
the ROM 103 or the auxiliary storage part 104.
[0093] Also, in the ROM 103 or the auxiliary storage part 104, an operating system that
provides a graphical user interface environment, such as Windows (trademark) manufactured
and sold by Microsoft Corporation in the United States, has been installed, for example.
An application program according to a second embodiment shall operate on the operating
system. In other words, the device 10 may be a personal computer or the like.
(2) Processing for training artificial intelligence model
[0094] Referring to FIG. 6, the flow of processing for training an artificial intelligence
model by the training program TP is described.
[0095] The processing part 101 accepts a command to start processing input by an operator
through the input part 111, and, in step S1, acquires the first training data set,
the second training data set and the third training data set from the first training
data set database TR1, the second training data set database TR2, the third training
data set database TR3, respectively, stored in the auxiliary storage part 104.
[0096] Next, the processing part 101 accepts a command to start generation of a fourth training
data set input by the operator through the input part 111, and, in step S2, generates
a fourth training data set.
[0097] Next, the processing part 101 accepts a command to input the fourth training data
set input by the operator through the input part 111, and, in step S3, inputs the
fourth training data set into the artificial intelligence model AI1 to train the artificial
intelligence model.
[0098] The processing part 101 stores the trained artificial intelligence model in the auxiliary
storage part 104.
[0099] The transition from one step to another may be made according to a command input
by the operator, or may be triggered by the completion of the previous step so that
the processing part 101 can make the transition automatically.
[0100] In the training processing, for the terms and explanations that are common to those
described in Sections 1. and 2-1. above, the above description is incorporated here.
3. Prediction of indication by artificial intelligence model
3-1. Generation of test data
(1) Generation of first test data set
[0101] A first test data set is a set of data indicating the dynamics of a biomarker in
one organ or each of multiple different organs, and may be acquired from one or multiple
organs corresponding to one organ or multiple different organs from which the first
training data have been acquired. The set of data indicating the dynamics of a biomarker
in respective organs may be acquired in the same manner as a data set indicating the
dynamics of a biomarker that is used as first training data by a method as described
in Section 1.(4) above.
(2) Generation of second test data set
[0102] As described in Section 1.(2) above, the second test data is a set of data in which
labels of multiple known indications are linked with information about adverse events
reported correspondingly to each of the multiple known indications. The labels of
multiple known indications and labels indicating adverse events reported correspondingly
to each of these indications can be acquired from an external database such as FAERS
or clinicaltrials.gov by conducting a search for labels indicating the indication
names. Also, when the labels indicating the names of indications or adverse events
are registered as inserts in texts, syntactic analysis, word segmentation, semantic
analysis or the like may be performed on the registered texts by natural language
processing before the extraction of the texts corresponding to actions.
[0103] The frequencies of occurrence of adverse events can be calculated by a method described
in Section 1.(4) above.
3-2. Prediction system 1
[0104] FIG. 4(A) illustrates a hardware configuration of a prediction system 51. The prediction
system 51 includes a measurement part 30, which is a next-generation sequencer or
the like, for acquiring measurement data of a biomarker, and a prediction device 20.
The prediction device 20 may be connected to the measurement part 30 via a wireless
or wired network, or may acquire data acquired by the measurement part 30 via a storage
medium such as a CD-R.
(1) Indication prediction device
[0105] The prediction of indications can be achieved using the prediction device 20 (which
may be hereinafter referred to simply as "device 20"), for example.
[0106] In the description of the device 20 and the processing in the device 20, for the
terms that are common to those described in Sections 1. and 2-1. above, the above
description is incorporated here.
[0107] FIG. 7 illustrates a hardware configuration of the prediction device 20 (which may
be hereinafter referred to also as "device 20"). The device 20 includes at least a
processing part 201 and a storage part. The storage part is constituted of a main
storage part 202 and/or an auxiliary storage part 204. The device 20 may be connected
to an input part 211, an output part 212, and a storage medium 213. Also, the device
20 may be connected to the measurement part 30, which is a next-generation sequencer,
a mass spectrometer or the like.
[0108] In the device 20, the processing part 201, the main storage part 202, a ROM (read
only memory) 203, the auxiliary storage part 204, a communication interface (I/F)
205, an input interface (I/F) 206, an output interface (I/F) 207, and a media interface
(I/F) 208 are connected for mutual data communication by a bus 209.
[0109] Because the basic hardware configuration of the device 20 is the same as that of
the device 10, the description in Section 2-3.(1) above is incorporated here. The
communication interface 205 functions as a communication part 205.
[0110] However, in the auxiliary storage part 204 of the device 20, operation software (OS)
2041, a prediction program PP, a trained artificial intelligence model AI2, a database
TS1 for storing a first test data set, and a database TS2 for storing a second test
data set are stored in a non-volatile manner in place of the operation software (OS)
1041, the training program TP, the artificial intelligence model AI1, the database
TR1 for storing a first training data set, the database TR2 for storing a second training
data set and the database TR3 for storing a third training data set. The prediction
program PP performs indication prediction processing as described later in cooperation
with the operation software (OS) 2041.
(2) Processing for predicting indication
[0111] Referring to FIG. 8, the flow of processing for predicting an indication by the prediction
program PP is described.
[0112] The processing part 201 accepts a command to start processing input by an operator
through the input part 211, and, in step S51, acquires the first test data set and
the second test data set stored in the auxiliary storage part 204.
[0113] Next, the processing part 201 accepts a command to start prediction input by the
operator through the input part 211, and, in step S52, inputs the first test data
set and the second test data set, acquired from the first test data set database TS1
and the second test data set database TS2, respectively, into the trained artificial
intelligence model AI2 to predict an indication for the test substance.
[0114] At this time, the trained artificial intelligence model AI2 determines, one indication
at a time, whether or not the test substance of interest is effective against each of the
indications input as the second test data. Specifically, the trained artificial intelligence
model AI2 determines, as a link prediction (LP) problem, whether or not there is a link
between the drug of interest and each individual indication.
[0115] Next, the processing part 201 stores the results in the storage part. A result that
the processing part 201 derives from the trained artificial intelligence model AI2
is a label "1" if the test substance is effective against a certain indication and
a label "-1" if the test substance is not effective against a certain indication.
[0116] In other words, the indications marked with "1" are indications predicted for the
test substance.
[0117] Further, when the artificial intelligence model is a One-Class SVM, decision function
values, which indicate the reliability of the prediction, are calculated. When many
indications are output as prediction results, an indication with a higher decision function
value can be regarded as a more likely indication. Also, when many indications are output as
prediction results, prediction may be made in the same manner using data indicating
the dynamics of transcriptome in one or multiple organs collected after the administration
of a drug with an action mechanism similar to that of the test substance of interest
as a test substance. Then, the indications found to be common to the prediction result
for the test substance of interest and the prediction result for the other test substance
with a similar action mechanism by comparison therebetween may be used as prediction
results.
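For illustration only, the ranking by decision function value and the comparison with a substance having a similar action mechanism can be sketched in Python as follows. The function names, the indication names and the numerical values are hypothetical, and the sketch assumes that a positive decision function value corresponds to the label "1".

```python
from typing import Dict, List


def rank_predicted_indications(decision_values: Dict[str, float]) -> List[str]:
    """Return indications with a positive decision value, most reliable first."""
    positive = {name: v for name, v in decision_values.items() if v > 0}
    return sorted(positive, key=positive.get, reverse=True)


def common_indications(ranked_a: List[str], ranked_b: List[str]) -> List[str]:
    """Keep indications predicted for both substances, in the order of the first."""
    in_b = set(ranked_b)
    return [name for name in ranked_a if name in in_b]


# Hypothetical decision function values for the test substance of interest
# and for another substance with a similar action mechanism.
test_substance = {"osteoporosis": 0.42, "hypertension": 0.05, "asthma": -0.30}
similar_drug = {"osteoporosis": 0.31, "hypertension": -0.12, "asthma": -0.25}

ranked = rank_predicted_indications(test_substance)
shared = common_indications(ranked, rank_predicted_indications(similar_drug))
print(ranked)
print(shared)
```

Taking the intersection narrows a long list of predicted indications to those supported by both substances, at the cost of discarding indications specific to the test substance.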
3-3. Prediction system 2
[0118] FIG. 4(B) shows the configuration of a prediction system 400.
[0119] The prediction system 400 is communicably connected to a measurement part 30, the
training device 10, the prediction device 20, and a server device 40 that transmits
a data set indicating the dynamics of a biomarker. The training device 10 and the
prediction device 20 acquire data acquired by the measurement part 30 via the server
device 40.
(1) Server device
[0120] Regarding the server device 40 (which may be hereinafter referred to simply as "device
40"), for the terms that are common to those described in Sections 1. and 2-1. above,
the above description is incorporated here.
[0121] FIG. 9 shows a hardware configuration of the server device 40 (which may be referred
to also as "device 40"). The device 40 includes at least a processing part 401 and
a storage part. The storage part is constituted of a main storage part 402 and/or
an auxiliary storage part 404. The device 40 may be connected to an input part 411,
an output part 412 and a storage medium 413. Also, the device 40 may be communicably
connected to a measurement part 30, which is a next-generation sequencer, a mass spectrometer
or the like, by a wired or wireless network.
[0122] In the device 40, the processing part 401, the main storage part 402, a ROM (read
only memory) 403, the auxiliary storage part 404, a communication interface (I/F)
405, an input interface (I/F) 406, an output interface (I/F) 407 and a media interface
(I/F) 408 are connected for mutual data communication by a bus 409.
[0123] Because the basic hardware configuration of the device 40 is the same as that of
the device 10, the description in Section 2-3.(1) above is incorporated here. The
communication interface 405 functions as a communication part 405.
[0124] However, in the auxiliary storage part 404 of the device 40, operation software (OS)
4041 and a database TS1 for storing a first test data set are stored in a non-volatile
manner in place of the operation software (OS) 1041, the training program TP, the
artificial intelligence model AI1, the database TR1 for storing a first training data
set, the database TR2 for storing a second training data set and the database TR3
for storing a third training data set.
(2) Operation of prediction system 2
[0125] Referring to FIG. 10, the operation of the prediction system is described.
[0126] Here, the sequential flow from the acquisition of measurement values of a biomarker
by the measurement part 30 to the output of prediction results is described.
[0127] In step S81, the measurement part 30 acquires measurement values of a biomarker in
each organ of non-human animals to which an existing substance has been administered.
The acquisition of measurement values in the measurement part 30 may be made in response
to the input of a command to start measurement by an operator. In step S82, the measurement
part 30 transmits the acquired measurement values to the server device 40. The transmission
processing may be performed in response to the input of a command to start transmission
by the operator.
[0128] In step S83, the processing part 401 of the server device 40 acquires the measurement
values via the communication I/F 405. At this time, the communication I/F 405 functions
as a communication part.
[0129] In step S84, in response to a command to start acquisition of measurement values
input by the operator through the input part 111 of the training device 10, the processing
part 101 of the training device 10 transmits a signal for starting transmission of
measurement values from the communication I/F 105 to the server device 40. The processing
part 401 of the server device 40 accepts the input for starting transmission of measurement
values via the communication I/F 405, and starts transmission of the measurement values
from the communication I/F 405. At this time, the communication I/F 105 and the communication
I/F 405 function as a communication part 105 and a communication part 405, respectively.
[0130] In step S85, the processing part 101 of the training device 10 acquires information
about indications for the existing substance administered to non-human animals and
adverse events corresponding to the indications from the external database 60 via
the communication I/F 105.
[0131] Also, in step S86, the processing part 101 of the training device 10 acquires the
measurement values transmitted from the server device 40 via the communication I/F
105, and stores the measurement values in the storage part of the training
device 10. Step S86 may be performed prior to step S85.
[0132] Next, in step S87 of FIG. 10, the processing part 101 of the training device 10 generates
a first training data set, a second training data set and a third training data set
in accordance with the processing shown in step S1 of FIG. 6. The description of step
S1 in FIG. 6 is incorporated here.
[0133] Next, in step S88 of FIG. 10, the processing part 101 of the training device 10 generates
a fourth training data set from the first training data set, the second training data
set and the third training data set in accordance with the processing shown in step
S2 of FIG. 6. The description of step S2 in FIG. 6 is incorporated here.
[0134] Next, in step S89 of FIG. 10, the processing part 101 of the training device 10 inputs
the fourth training data set into an artificial intelligence model in accordance with
the processing shown in steps S3 to S4 of FIG. 6 to train the artificial intelligence
model, and stores the trained artificial intelligence model in the storage part. The
description of steps S3 to S4 of FIG. 6 is incorporated here.
[0135] After accepting a command to start transmission of the artificial intelligence model
from the prediction device 20, the processing part 101 of the training device 10 transmits
the trained and stored artificial intelligence model to the prediction device 20 via
the communication I/F 105 in step S90 of FIG. 10. At this time, the communication
I/F 105 functions as a communication part 105.
[0136] Next, in step S91, the measurement part 30 acquires measurement values of a biomarker
in each organ of non-human animals to which the test substance has been administered.
The acquisition of measurement values in the measurement part 30 may be made in response
to the input of a command to start measurement by the operator. In step S92, the measurement
part 30 transmits the acquired measurement values to the server device 40. The transmission
processing may be performed in response to the input of a command to start transmission
by the operator.
[0137] In step S93, the processing part 401 of the server device 40 acquires the measurement
values via the communication I/F 405. At this time, the communication I/F 405 functions
as a communication part 405.
[0138] In step S94, in response to a command to start acquisition of measurement values
input by the operator through the input part 211 of the prediction device 20, the
processing part 201 of the prediction device 20 transmits a signal for starting transmission
of measurement values from the communication I/F 205 to the server device 40. The
processing part 401 of the server device 40 accepts the input for starting transmission
of measurement values via the communication I/F 405, and starts transmission of the
measurement values from the communication I/F 405. At this time, the communication
I/F 205 and the communication I/F 405 function as a communication part. The processing
part 201 of the prediction device 20 acquires the measurement values via the communication
I/F 205, and stores the measurement values in the storage part of the prediction device
20. Subsequently, the processing part 201 of the prediction device 20 generates a
first test data set. The first test data set is generated as described in Section
2-4.(1) above.
[0139] Next, in step S95, the processing part 201 of the prediction device 20 transmits
a command to start transmission of an artificial intelligence model to the training
device 10 via the communication I/F 205. When accepting the command to start transmission
of an artificial intelligence model from the prediction device 20, the processing
part 101 of the training device 10 transmits the trained artificial intelligence model
to the prediction device 20 via the communication I/F 105 of the training device 10.
The prediction device 20 acquires the trained artificial intelligence model via the
communication I/F 205. Step S95 may be performed prior to step S94.
[0140] Next, in step S96, the processing part 201 of the prediction device 20 inputs the
first test data generated in step S94 and the second test data stored in the storage
part into the trained artificial intelligence model AI2 acquired in step S95, and
predicts indications for the test substance in humans in accordance with step S52 of FIG.
8. In step S97, the processing part 201 of the prediction device 20 outputs the results.
Alternatively, in steps S94 to S97 of FIG. 10, the processing part 201 of the prediction
device 20 may derive a prediction result relating to new indications for an existing
substance.
(3) Method for constructing prediction system
[0141] A method for constructing a prediction system includes the step of preparing the
training device 10 and the prediction device 20. The constructing method may further
include the step of preparing measurement values of a biomarker in one or multiple
organs of non-human animals to which an existing substance has been administered,
or measurement values of a biomarker in one or multiple organs of non-human animals
to which a test substance has been administered.
4. Computer program
4-1. Training program
[0142] The training program TP is a computer program that causes a computer to execute the
processing including steps S1 to S4 of FIG. 6 described above in connection with the
training of an artificial intelligence model to cause the computer to function as
the training device 10.
4-2. Prediction program
[0143] The prediction program PP is a computer program that causes a computer to execute
the processing including steps S51 to S53 described above in connection with prediction
of actions of a test substance to cause the computer to function as the prediction
device 20.
5. Storage medium having computer programs stored therein
[0144] This section relates to a storage medium having the above computer programs stored
therein. The computer programs are stored in a storage medium such as a hard disk,
a semiconductor memory element such as a flash memory, or an optical disc. Also, the
computer programs may be stored in a storage medium connectable via a network such
as a cloud server. The computer programs may be program products in a downloadable
form or stored in a storage medium.
[0145] The storage format of the programs in the storage medium is not limited as long as
a device as described above can read the programs. The storage in the storage medium
is preferably in a non-volatile manner.
6. Modifications
[0146] In Section 2. above, an embodiment is shown in which the training device 10 and the
prediction device 20 are different computers. However, one computer may perform training
of an artificial intelligence model and prediction.
[0147] In this specification, the same reference numeral attached to hardware indicates
the same part or same function.
Examples
[0148] Examples are shown below to describe the present invention in more detail. However,
the present invention should not be construed as being limited to the following embodiments.
[0149] The following animal experiments were conducted with the approval of the Ethics Committee
of Karydo TherapeutiX, Inc.
Experimental example I. Gene expression analysis in drug-administered mice
1-1. Preparation of drug-administered mice, and gene expression analysis
1. Administration of drugs
(1) Alendronate
[0150] A solution of alendronate sodium salt trihydrate (Wako) in PBS (NACALAI TESQUE, INC.)
was subcutaneously injected to 11-week old male C57BL/6N mice in a dose of 1.0 mg/kg
every 3 or 4 days for 8 days. The drug was newly prepared for each administration.
Each organ was collected in the afternoon of the eighth day after the drug administration.
(2) Acetaminophen
[0151] 10-week old male C57BL/6N mice were fasted for 12 hours, during which they were allowed
to take water freely. Immediately after the fasting period, acetaminophen (Wako) dissolved
in saline (Otsuka Pharmaceutical Co., Ltd.) was intraperitoneally administered to
the mice in a single dose of 300 mg/kg. After the administration, the mice were allowed
to take normal diet freely. The administration was done by noon, and organs were collected
two hours after the administration.
(3) Aripiprazole
[0152] A solution of aripiprazole (Sigma-Aldrich) in a 0.5% (w/v) carboxymethyl cellulose
400 solution (Wako) was intraperitoneally administered to 11-week old male C57BL/6N
mice in a single dose of 0.3 mg/kg. The drug was administered in the afternoon, and
organs were collected two hours later.
(4) Asenapine
[0153] A solution of asenapine maleate (Chemscene) in saline was subcutaneously administered
to 11-week old male C57BL/6N mice in a single dose of 0.3 mg/kg. The drug was administered
in the afternoon, and organs were collected two hours later.
(5) Cisplatin
[0154] Cisplatin (Bristol-Myers Squibb) was intraperitoneally administered in a single dose
of 20 mg/kg to 11-week old male C57BL/6N mice. Organs were collected in the afternoon
of the third day after the drug administration.
(6) Clozapine
[0155] Clozapine (Sigma-Aldrich) was subcutaneously administered in a single dose of 0.3
mg/kg to 11-week old male C57BL/6N mice. The clozapine was first dissolved in acetic
acid, and then diluted with saline and adjusted to pH 6 with 1M NaOH. Organs were
collected in the afternoon two hours after the drug administration.
(7) Doxycycline
[0156] 9-week old male C57BL/6N mice were fed with RO water containing 5% sucrose (NACALAI
TESQUE, INC.) and 2 mg/mL of doxycycline hydrochloride n-hydrate (Wako) for 2 weeks.
The RO water containing the drug was replaced with new one every week. Organs were
collected in the afternoon of the 13th day after the drug administration. The negative
control group was fed with RO water containing 5% of sucrose (NACALAI TESQUE, INC.).
(8) Empagliflozin
[0157] Empagliflozin (Toronto Research Chemicals) dissolved in 0.5% carboxymethyl cellulose
was forcibly administered orally to 10-week old male C57BL/6N mice in a daily dose
of 10 mg/kg for 2 weeks. The drug was newly prepared for each administration. Organs
were collected in the afternoon of the 14th day after the start of the drug administration.
(9) Lenalidomide
[0158] Lenalidomide (Wako) was dissolved in a solution containing 0.5% of carboxymethyl
cellulose and 0.25% of Tween-80 (NACALAI TESQUE, INC.), and the solution was forcibly
administered orally to 8-week old male C57BL/6N mice in a daily dose of 50 mg/kg for
69 days. The drug was newly prepared for each administration. Organs were collected
in the afternoon of the 69th day after the start of the drug administration. A solution
containing 0.5% of carboxymethyl cellulose and 0.25% of Tween-80 was administered
to the negative control group.
(10) Lurasidone
[0159] Lurasidone hydrochloride (Medchemexpress) dissolved in a 0.5% carboxymethyl cellulose
solution was forcibly administered orally to 11-week old male C57BL/6N mice in a single
dose of 0.3 mg/kg. Organs were collected in the afternoon two hours after the drug
administration.
(11) Olanzapine
[0160] Olanzapine (Tokyo Chemical Industry Co., Ltd.) dissolved in a 0.5% carboxymethyl
cellulose solution was forcibly administered orally in a single dose of 0.3 mg/kg.
Organs were collected in the afternoon two hours after the drug administration.
(12) Evolocumab (Repatha (trademark))
[0161] Repatha (trademark) (Astellas Pharma Inc.) dissolved in saline was subcutaneously
administered to 11-week old male C57BL/6N mice in a dose of 10 mg/kg every 10 days
for 4 weeks. Organs were collected in the afternoon 4 weeks after the drug administration.
(13) Risedronate
[0162] Risedronate sodium salt (Cayman Chemical Company) dissolved in PBS was forcibly administered
orally to 11-week old male C57BL/6N mice in a dose of 10 mg/kg every other day for
8 days. The drug was newly prepared for each administration. Organs were collected
in the afternoon of the 8th day after the start of administration.
(14) Sofosbuvir
[0163] Sofosbuvir (LKT) was intraperitoneally administered to 7-week old male C57BL/6N mice
in a daily dose of 20 mg/kg for 10 days. The sofosbuvir was first diluted with DMSO
(NACALAI TESQUE, INC.), and then diluted 100-fold with PBS before administration
(the final concentration was 1.0% DMSO/PBS). Organs were collected in the afternoon
of the 10th day after the start of administration.
(15) Teriparatide
[0164] Human parathyroid hormone fragment 1-34 (teriparatide) (Sigma-Aldrich) dissolved
in saline was subcutaneously administered to 10-week old male C57BL/6N mice in a daily
dose of 40 µg/kg. Organs were collected in the afternoon of the 14th day after the
start of drug administration. Saline was administered to the negative controls.
(16) Wild type (WT) mice
[0165] The organs were collected in the afternoon from 11-week old male C57BL/6N mice to
which no drug had been administered.
2. Mice and 24-organ transcriptome analysis
(1) Organs
[0166] The experiments using mice, the extraction of organs, and the transcriptome analysis
were performed in accordance with the methods described in Patent Document 1. The
24 organs are adrenal gland, aorta, bone marrow cell (BM), brain, colon, eye, heart,
ileum, jejunum, left kidney, liver, lung, pancreas, parotid gland, pituitary gland,
skeletal muscles, skin, skull, spleen, stomach, left testicle, thymus, thyroid gland,
and sexual gland white adipose tissue (WAT).
[0167] All mice were raised in a temperature-controlled room at approximately 25°C under
alternating 12-hour light and dark cycles and allowed to freely take water and normal
feed (CE-2, CLEA Japan, Inc., Tokyo, Japan).
(2) Transcriptome analysis
[0168] The transcriptome analysis was conducted using QuantSeq 3'mRNA-Seq Library Prep Kit
for Illumina (FWD) (cat#015.384, LEXOGEN) and Illumina NextSeq 500 (75bp single-read,
ca. 400 million reads/run, NextSeq 500/550 High Output Kit v2.5, cat#20024906).
[0169] Differential gene expression data of each organ collected from the mice to which
each drug had been administered were used as characteristics of each drug for the machine
learning framework. RNA-seq data processing (mapping and counting of transcription products)
was performed in accordance with the method described in Patent Document 1.
[0170] Reads were mapped to the mouse genome (mm10) using TopHat2. Differential gene expression
in each organ of the drug administered groups and the negative control groups (doxycycline
and lenalidomide administered groups) or the WT mouse group (control against groups
to which drugs other than doxycycline and lenalidomide had been administered) was
identified by DESeq2(1.22.1). Each of the drug administered groups, the negative control
groups and the WT mouse group was respectively analyzed with n=2.
3. Examples
[0171] An artificial intelligence model was constructed with Link Prediction (LP) using
a One-class SVM to predict indications for the drugs.
3-1. Training
(1) First training data
[0172] As a characteristic of each drug, genes that showed a change in expression with P<0.0001
in each organ were selected. Labels of organs, combinations of the log2 fold change
values of all genes selected from all organs (24-organ frameworks) or individual
organs (individual organ frameworks) and the organ names, and the name of the drug
administered to acquire the gene expression data were made into a set and used as
first training data.
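For illustration only, the selection of genes by the P<0.0001 threshold can be sketched in Python as follows. The record type, the function name and all numerical values are hypothetical, and the sketch assumes that the differential expression results are available as one record per organ and gene.

```python
from typing import Dict, List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    """One differential expression result for one gene in one organ."""
    organ: str
    gene: str
    log2_fold_change: float
    p_value: float


def select_features(
    results: List[GeneResult], p_cutoff: float = 1e-4
) -> Dict[Tuple[str, str], float]:
    """Keep (organ, gene) -> log2 fold change for genes with P below the cutoff."""
    return {
        (r.organ, r.gene): r.log2_fold_change
        for r in results
        if r.p_value < p_cutoff
    }


# Hypothetical results for one administered drug.
results = [
    GeneResult("liver", "Cyp2e1", 1.8, 2e-6),
    GeneResult("liver", "Alb", 0.1, 0.4),
    GeneResult("left kidney", "Havcr1", 3.2, 5e-7),
]
features = select_features(results)
print(features)
```

Keying each value by both organ and gene keeps the organ label attached to every log2 fold change value, so the same mapping can serve a 24-organ framework or be filtered down to an individual organ framework.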
(2) Second training data
[0173] The labels of the names of drugs administered to mice in Section 1. above and the
labels of the indications for each of the drugs were made into a set and used as second
training data. The names of indications corresponding to the drug names were in accordance
with the FDA Adverse Event Reporting System (FAERS: https://open.fda.gov/data/faers/).
(3) Third training data
[0174] Adverse event report data from 2014 Q2 to 2018 Q1 were downloaded from FAERS (https://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm).
The words indicating adverse events corresponding to the
names of indications for each drug administered to mice in Section 1. above were extracted
from the report data. One word extracted was regarded as one reported adverse event,
and the frequency of occurrence (%) of each adverse event was respectively calculated
by means of the formula: (the number of cases in which one adverse event was reported
for the name of an indication for one drug)/(the number of all adverse events reported
for the name of an indication for one drug).
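For illustration only, the frequency calculation given by the above formula can be sketched in Python as follows. The function name and the example reports are hypothetical. The input is assumed to be the list of adverse event words extracted for one indication name of one drug, one word per reported adverse event.

```python
from collections import Counter
from typing import Dict, Iterable


def adverse_event_frequencies(reported_events: Iterable[str]) -> Dict[str, float]:
    """Return the frequency of occurrence (%) of each adverse event.

    Frequency = (number of reports of one adverse event)
                / (number of all adverse events reported) * 100.
    """
    counts = Counter(reported_events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {event: 100.0 * n / total for event, n in counts.items()}


# Hypothetical extracted words for one drug and one indication name.
reports = ["nausea", "headache", "nausea", "rash"]
freq = adverse_event_frequencies(reports)
print(freq)
```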
(4) Fourth training data
[0175] When the drug names are defined as A and B, for example, $g_A$ and $g_B$ respectively
indicate the patterns of transcriptome observed in 24 organs when drugs A and B are
administered (first training data set). Also, when an indication for drug A and an
indication for drug B are represented by "1" and "2," respectively, and elements of
adverse events (AEs) reported for each indication are represented by i, ii, ..., N, the
vectors of indications 1 and 2 are represented as $d_1 = (d_{1i}, d_{1ii}, \ldots, d_{1N})$
and $d_2 = (d_{2i}, d_{2ii}, \ldots, d_{2N})$, respectively (third training data set).
Also, because the second training data set includes a set of a label indicating the name
of drug A and a label indicating the name of indication 1 and a set of a label indicating
the name of drug B and a label indicating the name of indication 2, the sets can be
represented as $g_A d_1$ and $g_B d_2$, respectively (second training data set). Here, an
indication is regarded as positive (indicated) when the number of records for drug A taken
by patients with indication 1 exceeds 10 in FAERS.
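For illustration only, the selection of positive drug-indication sets by the record count criterion can be sketched in Python as follows. The function name and the counts are hypothetical. The sketch assumes that the number of FAERS records has already been tallied for each pair of drug name and indication name.

```python
from typing import Dict, List, Tuple


def positive_links(
    record_counts: Dict[Tuple[str, str], int], min_records: int = 10
) -> List[Tuple[str, str]]:
    """Return (drug, indication) pairs whose record count exceeds min_records."""
    return sorted(pair for pair, n in record_counts.items() if n > min_records)


# Hypothetical record counts per (drug, indication) pair.
counts = {
    ("drug A", "indication 1"): 25,
    ("drug A", "indication 2"): 10,  # exactly 10 does not exceed the threshold
    ("drug B", "indication 2"): 11,
}
links = positive_links(counts)
print(links)
```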
(5) One-class SVM
[0176] As the data to be input into a One-class SVM, a fourth training data set obtained
by associating the first training data set with the third training data set by a kernel
function below was input into a One-class SVM

[0177] Here, <·,·> denotes an operator that scales each vector so that its L2 norm is equal
to 1 and takes an inner product between the two scaled vectors.
[0178] As the One-class SVM, the 'scikit-learn' package for Python was used with a parameter
nu=0.1.
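For illustration only, training a One-class SVM on drug-indication sets with scikit-learn can be sketched as follows. The kernel used here is an assumption and is not taken from this description: it multiplies the normalized inner product of two transcriptome vectors by the normalized inner product of the two corresponding adverse event vectors. The helper names, the array sizes and the random data are likewise hypothetical. Only the use of `OneClassSVM` with nu=0.1 follows the text.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def unit_rows(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def pair_kernel(g_left, d_left, g_right, d_right) -> np.ndarray:
    """Assumed kernel between sets (g, d) and (g', d'): <g, g'> * <d, d'>."""
    k_g = unit_rows(g_left) @ unit_rows(g_right).T
    k_d = unit_rows(d_left) @ unit_rows(d_right).T
    return k_g * k_d


rng = np.random.default_rng(0)
# Hypothetical training sets: one row per known drug-indication link.
g_train = rng.normal(size=(20, 8))  # transcriptome patterns
d_train = rng.random(size=(20, 5))  # adverse event frequency vectors

model = OneClassSVM(kernel="precomputed", nu=0.1)
model.fit(pair_kernel(g_train, d_train, g_train, d_train))

# Hypothetical candidate links for a drug of interest.
g_test = rng.normal(size=(3, 8))
d_test = rng.random(size=(3, 5))
k_test = pair_kernel(g_test, d_test, g_train, d_train)

labels = model.predict(k_test)            # 1 or -1 per candidate link
scores = model.decision_function(k_test)  # decision function values
print(labels, scores)
```

With `kernel="precomputed"`, the model is fitted on the square matrix of kernel values between training sets, and prediction takes the matrix of kernel values between candidate sets and training sets.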
3-2. Prediction
[0179] The patterns of transcriptome in 24 organs in response to the administration of a
drug of interest (first test data), and [labels indicating the names of all indications]
and [combinations of the names of adverse events corresponding to the indications
and the frequencies of occurrence thereof (gd)] registered in FAERS were input into
the trained One-class SVM to cause the trained One-class SVM to determine, for each
indication separately, whether or not the drug of interest would be effective against it.
Specifically, as an LP problem, the trained One-class SVM was caused to determine whether
or not there was a link between the drug of interest and individual indications. The
SVM returns a label "1" when the drug of interest is effective against a certain indication,
and returns a label "-1" when the drug of interest is not effective against a certain
indication.
3-3. Example 1
[0180] In Example 1, prediction was made assuming that indications for one of the drugs
administered in Section 1. above were unknown. In other words, the One-class SVM was
first trained using data on 14 drugs excluding one of the drugs administered in Section
1. above as training data. After that, the excluded drug was used as a drug of interest,
and the patterns of transcriptome in response to the administration of the drug of
interest as first test data were input in the trained One-class SVM together with
the second test data to predict an indication. The results are shown in FIG. 11. In
FIG. 11, TN represents true negative, TP represents true positive, FN represents false
negative, and FP represents false positive. True negative indicates the number of items
that were able to be predicted as not being indications for those that are not indications,
and true positive indicates the number of items that were able to be predicted as
being indications for those that are indications. False negative indicates the number
of items that were predicted as being not indications for those that are indications,
and false positive indicates the number of items that were predicted as being indications
for those that are not indications. The accuracy score is a score that indicates the
accuracy of prediction. The recall score is the coverage rate in the case of being
predicted as "being an indication." The precision score indicates the reliability
in the case of being predicted as "being an indication."
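The scores described above can be illustrated with a minimal sketch, assuming the standard confusion-matrix definitions of accuracy, recall, precision and F-measure. The function names and the counts in the example are hypothetical and are not taken from FIG. 11. A score whose denominator is zero is returned as None rather than as a number.

```python
from typing import Dict, Optional


def safe_div(num: float, den: float) -> Optional[float]:
    """Return num / den, or None when the denominator is zero (score undefined)."""
    return num / den if den else None


def scores(tp: int, tn: int, fp: int, fn: int) -> Dict[str, Optional[float]]:
    """Compute accuracy, recall, precision and F-measure from confusion-matrix counts."""
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    recall = safe_div(tp, tp + fn)        # share of actual indications that were predicted
    precision = safe_div(tp, tp + fp)     # share of predicted indications that are actual
    if precision is None or recall is None or precision + recall == 0:
        f_measure = None                  # undefined when either component is undefined
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts for one drug (illustrative only).
example = scores(tp=8, tn=80, fp=10, fn=2)
print(example)
```

With these hypothetical counts, precision is well below recall because FPs outnumber TPs, which is the pattern discussed for most drugs in this Example.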
[0181] The accuracy score was high for all 15 drugs (>0.78). These results indicate
that 78% or more of the predictions, whether of indications or of non-indications, agree
with what has actually been reported or not reported. Also, the recall score showed a high
value (>0.8) for alendronate, aripiprazole, asenapine, clozapine, empagliflozin, lurasidone,
olanzapine, evolocumab, risedronate, sofosbuvir and teriparatide. These recall scores
indicate that 80% or more of the indications already reported for these drugs can be
predicted. The recall score of doxycycline was 0.527, which indicates that about 50% of the
reported indications are predicted for this drug. Only acetaminophen (APAP) showed a high
precision score (1.000), and all the others showed a low precision score (<0.35). The
precision score and the F-measure score could not be calculated for cisplatin and
lenalidomide because both of them showed 0 TP and 0 FN. The low precision score for many
drugs was thought to be mainly due to the presence of more FPs than TPs.
[0182] These results indicate that the prediction method of the present invention is useful
for predicting an indication for a new substance with no known indication.
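The hold-out procedure of Example 1, in which one drug is excluded from training and then treated as the drug of interest, can be sketched as follows. This is a minimal illustration of the data split only, not of the training itself. The function name is hypothetical, and the drug list is a shortened subset chosen for brevity.

```python
from typing import Iterator, List, Tuple


def leave_one_drug_out(drugs: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_drugs, drug_of_interest) pairs, holding out one drug per round."""
    for i, held_out in enumerate(drugs):
        training = drugs[:i] + drugs[i + 1:]
        yield training, held_out


# Hypothetical subset of the administered drugs, used only to illustrate the split.
drugs = ["alendronate", "clozapine", "doxycycline", "sofosbuvir"]
splits = list(leave_one_drug_out(drugs))
for training, held_out in splits:
    print(f"train on {training}, predict indications for {held_out}")
```

With all 15 drugs, each round would train on 14 drugs and predict indications for the remaining one, so that the held-out drug plays the role of a substance with no known indication.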
3-4. Example 2
[0183] It was evaluated whether or not the present invention is useful for exploration of
new indications for a known substance, i.e., what is called drug repositioning. An
artificial intelligence model was trained using data on all 15 drugs listed in Section 1.
above to predict indications for each individual drug. The results are shown in FIG.
12. The symbols in the figure are the same as those in FIG. 11.
[0184] As a result, the number of TPs increased and the number of FNs decreased for all
drugs. The accuracy score and the recall score improved for all drugs, both falling
within the range of 0.770 to 1.000. These results indicate that both reported indications
and unreported indications can be captured with a probability of 77% or more. The
precision score was low for all drugs due to a large number of FPs. In FIG. 12, the FPs
indicate potential new indications that have not been previously reported. Because the
number of FPs is relatively large, the candidates can be narrowed down, when necessary, by
calculating a decision function value for each indication among the FPs and ranking the
indications for each drug. FIG. 13 shows examples of decision function values for
alendronate. Also, indications that are predicted to be FPs in common for drugs already
known to have similar action mechanisms (for example, alendronate and risedronate, or
aripiprazole and clozapine) are considered to have high potential as repositioned
indications.
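The two narrowing-down strategies described above can be sketched as follows. This is a minimal illustration that assumes a larger decision function value corresponds to a stronger candidate. The function names, indication names and values are hypothetical placeholders and are not taken from FIG. 13.

```python
from typing import Dict, List, Set, Tuple


def rank_candidates(decision_values: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank FP indications for one drug by decision function value, highest first."""
    return sorted(decision_values.items(), key=lambda item: item[1], reverse=True)


def shared_candidates(fp_a: Set[str], fp_b: Set[str]) -> Set[str]:
    """Return the FP indications common to two drugs with similar action mechanisms."""
    return fp_a & fp_b


# Hypothetical decision function values for the FP indications of one drug.
drug_a_values = {"indication_A": 0.12, "indication_B": 0.87, "indication_C": 0.45}
ranked = rank_candidates(drug_a_values)
print(ranked)

# Hypothetical FP sets for two drugs assumed to share an action mechanism.
drug_a_fps = set(drug_a_values)
drug_b_fps = {"indication_B", "indication_C", "indication_D"}
common = shared_candidates(drug_a_fps, drug_b_fps)
print(sorted(common))
```

The two filters can also be combined, for example by ranking only the indications that are shared between mechanistically similar drugs.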
[0185] These results suggest that the prediction method of the present invention is also
useful for drug repositioning.
Reference Signs List
[0186]
10/ training device
20/ prediction device
40/ server device
101/ processing part
201/ processing part
401/ processing part
400/ prediction system
105/ communication part
405/ communication part