TECHNICAL FIELD
[0001] The present invention relates to the technical field of neural networks, and particularly
to a method and apparatus for training a neural network, and a storage medium.
BACKGROUND
[0002] A neural network is a mathematical algorithm model that conducts distributed parallel
information processing by simulating the behavior characteristics of an animal neural
network. A neural network achieves information processing mainly by relying on the
complexity of the system and by adjusting the interconnections among large numbers of
nodes within the system. Neural networks are widely applied in the field of data processing,
for example, in data classification, voice analysis and image recognition. A neural network
has to be trained before being used; however, existing methods for training a neural
network not only impose large computation burdens but also have low efficiency.
SUMMARY
[0003] A method and apparatus for training a neural network, and a storage medium are provided
in the present invention.
[0004] According to a first aspect of the present invention, a method for training a neural
network is provided, including:
training a super network to obtain a network parameter of the super network, wherein
each network layer of the super network includes multiple alternative network sub-structures
in parallel;
for each network layer of the super network, selecting, from the multiple alternative
network sub-structures, an alternative network sub-structure to be a target network
sub-structure;
constructing a sub-network based on the target network sub-structures, each selected
in a respective layer of the super network; and
training the sub-network, by taking the network parameter inherited from the super
network to be an initial parameter of the sub-network, to obtain a network parameter
of the sub-network.
[0005] Optionally, the super network includes N network layers, and each of the network
layers includes M alternative network sub-structures, where N is a positive integer
no smaller than 2, and M is a positive integer no smaller than 2; and
for each network layer of the super network, selecting, from the multiple alternative
network sub-structures, an alternative network sub-structure to be a target network
sub-structure includes:
selecting an m-th alternative network sub-structure of an n-th network layer of the super
network to be the target network sub-structure constructing an n-th network layer of the
sub-network, where n is a positive integer smaller than or equal to N, and m is a positive
integer smaller than or equal to M.
[0006] Optionally, the method further includes:
after obtaining the network parameter of the super network, for each of the alternative
network sub-structures, storing a mapping relation between a structure identifier
and a network parameter of the respective alternative network sub-structure.
[0007] Optionally, training the sub-network, by taking the network parameter inherited from
the super network to be an initial parameter of the sub-network, to obtain a network
parameter of the sub-network includes:
for each of the alternative network sub-structures contained in the sub-network, querying,
based on a structure identifier of the alternative network sub-structure, the mapping
relation to obtain a network parameter of the alternative network sub-structure; and
training, based on the obtained network parameters of the alternative network sub-structures,
the sub-network, to obtain the network parameter of the sub-network.
[0008] Optionally, for each network layer of the super network, selecting, from the multiple
alternative network sub-structures, an alternative network sub-structure to be a target
network sub-structure includes:
selecting, based on a set search algorithm, an alternative network sub-structure from
the multiple alternative network sub-structures of each network layer of the super
network to be a target network sub-structure; and
the set search algorithm includes at least one of the following: a random search algorithm,
a Bayesian search algorithm, an evolutionary learning algorithm, a reinforcement learning
algorithm, an evolutionary and reinforcement learning combined algorithm, or a gradient
based algorithm.
[0009] Optionally, the method further includes:
processing input data based on the trained sub-network,
wherein a type of the input data includes at least one of the following: an image
data type, a text data type, or an audio data type.
[0010] Optionally, the method further includes:
conducting performance evaluation on the trained sub-network based on a test data
set, to obtain an evaluation result,
wherein a type of test data in the test data set includes at least one of the following:
an image data type, a service data type or an audio data type.
[0011] According to a second aspect of the present invention, an apparatus for training
a neural network is provided, including:
a first training module, configured to train a super network to obtain a network parameter
of the super network, wherein each network layer of the super network includes multiple
alternative network sub-structures in parallel;
a selection module, configured to: for each network layer of the super network, select,
from the multiple alternative network sub-structures, an alternative network sub-structure
to be a target network sub-structure;
a network construction module, configured to construct a sub-network based on the
target network sub-structures, each selected in a respective layer of the super network;
and
a second training module, configured to train the sub-network, by taking the network
parameter inherited from the super network to be an initial parameter of the sub-network,
to obtain a network parameter of the sub-network.
[0012] Optionally, the super network includes N network layers, and each of the network
layers includes M alternative network sub-structures, where N is a positive integer
no smaller than 2, and M is a positive integer no smaller than 2; and
the selection module is specifically configured to select an m-th alternative network
sub-structure of an n-th network layer of the super network to be the target network
sub-structure constructing an n-th network layer of the sub-network, where n is a positive
integer smaller than or equal to N, and m is a positive integer smaller than or equal to M.
[0013] Optionally, the apparatus further includes:
a storage module, configured to: after obtaining the network parameter of the super
network, for each of the alternative network sub-structures, store a mapping relation
between a structure identifier and a network parameter of the respective alternative
network sub-structure.
[0014] Optionally, the second training module is specifically configured to:
for each of the alternative network sub-structures contained in the sub-network, query,
based on a structure identifier of the alternative network sub-structure, the mapping
relation to obtain a network parameter of the alternative network sub-structure; and
train, based on the obtained network parameters of the alternative network sub-structures,
the sub-network to obtain the network parameter of the sub-network.
[0015] Optionally, the selection module is specifically configured to:
select, based on a set search algorithm, an alternative network sub-structure from
the multiple alternative network sub-structures of each network layer of the super
network to be a target network sub-structure; and
the set search algorithm includes at least one of the following: a random search algorithm,
a Bayesian search algorithm, an evolutionary learning algorithm, a reinforcement learning
algorithm, an evolutionary and reinforcement learning combined algorithm, or a gradient
based algorithm.
[0016] Optionally, the apparatus further includes:
a data processing module, configured to process input data based on the trained sub-network,
wherein a type of the input data includes at least one of the following: an image
data type, a text data type, or an audio data type.
[0017] Optionally, the apparatus further includes:
a performance evaluation module, configured to conduct performance evaluation on the
trained sub-network based on a test data set, to obtain an evaluation result,
wherein a type of test data in the test data set includes at least one of the following:
an image data type, a service data type or an audio data type.
[0018] According to a third aspect of the present invention, an apparatus for training a
neural network is provided, including:
a processor; and
a memory, configured to store instructions executable by the processor,
wherein the processor is configured to implement, during execution, steps in the above
method for training a neural network.
[0019] According to a fourth aspect of the present invention, a non-transitory computer-readable
storage medium is provided, wherein instructions in the storage medium, when executed
by a processor of an apparatus for training a neural network, enable the apparatus
to execute the above method for training the neural network.
[0020] The technical solutions provided in embodiments of the present invention may have
the following beneficial effects:
[0021] It can be seen from the above embodiments that, in the present invention, a sub-network
can inherit a network parameter from a super network; the network parameter is taken
to be an initial parameter of the sub-network, so as to train the sub-network to obtain
a network parameter of the sub-network. There is no need to train the sub-network from
scratch. The computation burden in the process of neural network training can be reduced,
thus improving the efficiency of neural network training.
[0022] It should be understood that the general description above and detailed description
later are merely exemplary and explanatory, and are not intended to restrict the present
invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] The accompanying drawings herein are incorporated into the specification and constitute
part of the present specification, illustrate embodiments consistent with the present
invention and intended for explaining the principles of the present invention together
with the specification.
Fig. 1 illustrates a first schematic flow chart of a method for training a neural
network according to an exemplary embodiment.
Fig. 2 illustrates a second schematic flow chart of a method for training a neural
network according to an exemplary embodiment.
Fig. 3 illustrates a third schematic flow chart of a method for training a neural
network according to an exemplary embodiment.
Fig. 4 illustrates a fourth schematic flow chart of a method for training a neural
network according to an exemplary embodiment.
Fig. 5 illustrates a fifth schematic flow chart of a method for training a neural
network according to an exemplary embodiment.
Fig. 6 illustrates a sixth schematic flow chart of a method for training a neural
network according to an exemplary embodiment.
Fig. 7 illustrates a schematic structural diagram of a super network according to
an exemplary embodiment.
Fig. 8 illustrates a schematic flow chart of constructing a sub-network according
to an exemplary embodiment.
Fig. 9 illustrates a schematic flow chart of sharing a weight parameter according
to an exemplary embodiment.
Fig. 10 illustrates a first block diagram of an apparatus for training a neural network
according to an exemplary embodiment.
Fig. 11 illustrates a second block diagram of an apparatus for training a neural network
according to an exemplary embodiment.
Fig. 12 illustrates a block diagram of an apparatus for training a neural network
according to an exemplary embodiment.
Fig. 13 illustrates a block diagram of another apparatus for training a neural network
according to an exemplary embodiment.
DETAILED DESCRIPTION
[0024] Detailed description will be made here to exemplary embodiments, examples of which
are illustrated in the accompanying drawings. When drawings are involved in the following
description, identical numerals in different drawings refer to identical or similar
elements, unless otherwise indicated. Implementations described in the following exemplary
embodiments do not represent all the implementations consistent with the present invention.
On the contrary, they are merely examples of apparatuses and methods consistent with
some aspects of the present invention detailed in the appended claims.
[0025] A method for training a neural network is provided in embodiments of the present
invention. Fig. 1 illustrates a first schematic flow chart of a method for training
a neural network according to an exemplary embodiment. As illustrated in Fig. 1, the
method mainly includes the following steps.
[0026] In step 101, a super network is trained to obtain a network parameter of the super
network. Each network layer of the super network includes multiple alternative network
sub-structures in parallel.
[0027] Here, the network parameter of the super network includes a weight parameter of the
super network. In some embodiments, the network parameter further includes a threshold
parameter of the super network.
[0028] In an embodiment of the present invention, the super network may be trained based
on collected sample data to obtain the weight parameter of the super network. A data
type of the sample data may be an image data type, a text data type, or an audio data
type.
[0029] In an embodiment of the present invention, a sub-network obtained by training may
be a neural network for realizing a pre-determined function, including but not limited
to at least one of the following functions: target segmentation, for segmenting a
target from the background in an input image; classification of a target in the
input image; input image based target tracking; medical image based diagnosis assistance;
input voice based voice recognition, voice correction, etc.
[0030] The above are merely examples of the pre-determined functions realized by the sub-network,
and the particular implementation is not limited to the examples above.
[0031] In embodiments of the present invention, the super network includes at least one
network layer, and each of the network layers contains multiple alternative network
sub-structures. The alternative network sub-structures each constitute part of the super
network. Here, each alternative network sub-structure is distinguished by the structure
identifier of the respective alternative network sub-structure. The structure identifier
may be the serial number or the name of the alternative network sub-structure. Different
alternative network sub-structures may be composed of different network sub-models for
realizing the same or similar functions, or may be composed of different network
sub-models for realizing different functions.
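By way of illustration only, such a super network might be represented as follows. This is a minimal sketch assuming the PyTorch framework; the three sub-structure types (identified as "A", "B" and "C") and the channel count are hypothetical choices for the example, not prescribed by the embodiments.

```python
import torch.nn as nn

def make_substructure(identifier: str, channels: int) -> nn.Module:
    # Each alternative network sub-structure is keyed by its structure
    # identifier; a name is used here, but a serial number would also work.
    if identifier == "A":   # e.g. a 3x3 convolution
        return nn.Conv2d(channels, channels, kernel_size=3, padding=1)
    if identifier == "B":   # e.g. a 5x5 convolution
        return nn.Conv2d(channels, channels, kernel_size=5, padding=2)
    if identifier == "C":   # e.g. a skip connection
        return nn.Identity()
    raise ValueError(f"unknown structure identifier: {identifier}")

class SuperNetwork(nn.Module):
    """N network layers, each holding M parallel alternative sub-structures."""
    def __init__(self, num_layers: int = 3, channels: int = 16):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.ModuleDict({sid: make_substructure(sid, channels)
                           for sid in ("A", "B", "C")})
            for _ in range(num_layers)
        )
```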
[0032] In step 102, for each network layer of the super network, an alternative network
sub-structure is selected from the multiple alternative network sub-structures, to
be a target network sub-structure.
[0033] Here, an alternative network sub-structure may be selected, from a respective network
layer, to be a target network sub-structure for constructing the sub-network.
[0034] In step 103, a sub-network is constructed based on the target network sub-structures,
each selected in a respective layer of the super network.
[0035] In step 104, the sub-network is trained by taking the network parameter inherited
from the super network to be an initial parameter of the sub-network, to obtain a
network parameter of the sub-network.
[0036] Here, after the sub-network is constructed, the network parameter in the super network
may be assigned to the sub-network so that the sub-network inherits the network parameter
from the super network; and the sub-network is trained on the basis that the sub-network
has the network parameter, without the need to train the sub-network from scratch. As
such, the network parameter of the obtained sub-network includes a final weight parameter
obtained by training the sub-network.
[0037] Here, the network parameter inherited from the super network before training the
sub-network corresponds to an initial parameter for training the sub-network alone.
[0038] In embodiments of the present invention, a sub-network can inherit a network parameter
from a super network; the network parameter is taken to be an initial parameter of
the sub-network, so as to train the sub-network to obtain a network parameter of the
sub-network. There is no need to train the sub-network from scratch. The computation
burden in the process of neural network training can be reduced, thus improving the
efficiency of neural network training.
[0039] Fig. 2 illustrates a second schematic flow chart of a method for training a neural
network according to an exemplary embodiment. As illustrated in Fig. 2, the method
mainly includes the following steps.
[0040] In step 201, a super network is trained to obtain a network parameter of the super
network. Each network layer of the super network includes multiple alternative network
sub-structures in parallel.
[0041] In step 202, where the super network includes N network layers and each of the
network layers includes M alternative network sub-structures (N being a positive integer
no smaller than 2, and M being a positive integer no smaller than 2), an m-th alternative
network sub-structure of an n-th network layer of the super network is selected to be the
target network sub-structure constructing an n-th network layer of the sub-network, where
n is a positive integer smaller than or equal to N, and m is a positive integer smaller
than or equal to M.
[0042] Here, an alternative network sub-structure may be selected from a respective network
layer based on a single path activation algorithm, and the selected alternative network
sub-structure is taken to be a target network sub-structure for constructing the sub-network.
[0043] In step 203, a sub-network is constructed based on the target network sub-structures,
each selected in a respective layer of the super network.
[0044] In step 204, the sub-network is trained by taking the network parameter inherited
from the super network to be an initial parameter of the sub-network, to obtain a
network parameter of the sub-network.
[0045] In embodiments of the present invention, an alternative network sub-structure is
selected from each network layer based on a single path activation algorithm, to be
a target network sub-structure for constructing the sub-network, which can reduce the
complexity of neural network training, so as to improve the efficiency of neural network
training.
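The embodiments do not prescribe a particular sampling rule for the single path activation algorithm; the sketch below assumes a uniform-random sampler that activates exactly one alternative sub-structure per layer.

```python
import random

def sample_single_path(num_layers: int, num_choices: int) -> list[int]:
    # Single path activation: exactly one of the M alternative
    # sub-structures is activated in each of the N network layers.
    return [random.randrange(num_choices) for _ in range(num_layers)]

# For N = 3 layers with M = 3 alternatives each, a sample such as
# [0, 0, 1] selects sub-structures A, A and B for layers 1 to 3.
path = sample_single_path(num_layers=3, num_choices=3)
```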
[0046] Fig. 3 illustrates a third schematic flow chart of a method for training a neural
network according to an exemplary embodiment. As illustrated in Fig. 3, the method
mainly includes the following steps.
[0047] In step 301, a super network is trained to obtain a network parameter of the super
network. Each network layer of the super network includes multiple alternative network
sub-structures in parallel.
[0048] In step 302, after obtaining the network parameter of the super network, for each
of the alternative network sub-structures, a mapping relation between a structure
identifier and a network parameter of the respective alternative network sub-structure
is stored.
[0049] Here, the structure identifier may be a serial number or a name of the alternative
network sub-structure. In an embodiment of the present invention, after obtaining
the network parameter, for each of the alternative network sub-structures, a mapping
relation between a structure identifier and a network parameter of the respective
alternative network sub-structure can be established, and stored in a set mapping
table. When the network parameter corresponding to an alternative network sub-structure
is to be acquired, the mapping relation between the structure identifier and the network
parameter of the alternative network sub-structure can be queried directly according
to the structure identifier of the alternative network sub-structure, so that the
efficiency of neural network training can be improved.
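Continuing the SuperNetwork sketch above, the mapping relation could be kept as an ordinary dictionary from a structure identifier to the trained parameters of the respective sub-structure; the layer-qualified key format is an assumption made for illustration.

```python
def build_mapping_table(super_net) -> dict:
    # One entry per alternative sub-structure: the key combines the layer
    # index with the structure identifier, and the value is a snapshot of
    # the sub-structure's network parameters after super-network training.
    table = {}
    for layer_idx, layer in enumerate(super_net.layers):
        for sid, sub in layer.items():
            table[f"layer{layer_idx}/{sid}"] = {
                name: tensor.detach().clone()
                for name, tensor in sub.state_dict().items()
            }
    return table
```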
[0050] In step 303, for each network layer of the super network, an alternative network
sub-structure is selected, from the multiple alternative network sub-structures, to be
a target network sub-structure.
[0051] In step 304, a sub-network is constructed based on the target network sub-structures,
each selected in a respective layer of the super network.
[0052] In step 305, the sub-network is trained by taking the network parameter inherited
from the super network to be an initial parameter of the sub-network, to obtain a
network parameter of the sub-network.
[0053] Fig. 4 illustrates a fourth schematic flow chart of a method for training a neural
network according to an exemplary embodiment. As illustrated in Fig. 4, the method
mainly includes the following steps.
[0054] In step 401, a super network is trained to obtain a network parameter of the super
network. Each network layer of the super network includes multiple alternative network
sub-structures in parallel.
[0055] In step 402, after obtaining the network parameter of the super network, for each
of the alternative network sub-structures, a mapping relation between a structure
identifier and a network parameter of the respective alternative network sub-structure
is stored.
[0056] In step 403, for each network layer of the super network, an alternative network
sub-structure is selected, from the multiple alternative network sub-structures, to be
a target network sub-structure.
[0057] In step 404, a sub-network is constructed based on the target network sub-structures,
each selected in a respective layer of the super network.
[0058] In step 405, for each of the alternative network sub-structures contained in the
sub-network, the mapping relation is queried, based on a structure identifier of the
alternative network sub-structure, to obtain a network parameter of the alternative
network sub-structure.
[0059] After the super network is trained, the network parameter corresponding to each alternative
network sub-structure can be obtained, and the mapping relation between the structure
identifier and the network parameter of the respective network sub-structure can be
established. Here, the mapping relation can be stored in a mapping table. In embodiments
of the present invention, based on the structure identifier of the respective alternative
network sub-structure contained in the sub-network, a corresponding network parameter
can be acquired from the mapping table, and the network parameter is shared to the
corresponding alternative network sub-structure in the sub-network.
[0060] In step 406, the sub-network is trained, based on the obtained network parameters
of the alternative network sub-structures, to obtain the network parameter of the
sub-network.
[0061] In embodiments of the present invention, the mapping relation between a structure
identifier and a network parameter of an alternative network sub-structure is queried
directly according to the structure identifier of the alternative network sub-structure,
and the sub-network is trained to obtain the network parameter of the sub-network.
The computation burden in the process of neural network training can be reduced, thus
improving the efficiency of neural network training.
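A sketch of steps 405 and 406, continuing the sketches above: the sub-network is assumed to store its selected sub-structures in an attribute `layers`, and the loss function, optimizer and training data are placeholders rather than features of the embodiments.

```python
import torch
import torch.nn as nn

def inherit_and_train(sub_net: nn.Module, chosen: list, table: dict,
                      data_loader) -> None:
    # Step 405: query the mapping relation by structure identifier and
    # take the returned parameters as the initial parameters.
    for layer_idx, (sid, sub) in enumerate(zip(chosen, sub_net.layers)):
        sub.load_state_dict(table[f"layer{layer_idx}/{sid}"])

    # Step 406: train the sub-network from the inherited initialization
    # instead of from scratch (optimizer and loss are illustrative).
    optimizer = torch.optim.SGD(sub_net.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for inputs, targets in data_loader:
        optimizer.zero_grad()
        loss = loss_fn(sub_net(inputs), targets)
        loss.backward()
        optimizer.step()
```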
[0062] Fig. 5 illustrates a fifth schematic flow chart of a method for training a neural
network according to an exemplary embodiment. As illustrated in Fig. 5, the method
mainly includes the following steps.
[0063] In step 501, a super network is trained to obtain a network parameter of the super
network. Each network layer of the super network includes multiple alternative network
sub-structures in parallel.
[0064] In step 502, based on a set search algorithm, an alternative network sub-structure
is selected, from the multiple alternative network sub-structures of each network layer
of the super network, to be a target network sub-structure for constructing the
sub-network. The set search algorithm includes at least one of the following: a random
search algorithm, a Bayesian search algorithm, an evolutionary learning algorithm,
a reinforcement learning algorithm, an evolutionary and reinforcement learning combined
algorithm, or a gradient based algorithm.
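Among the listed algorithms, random search is the simplest to sketch. The loop below is one hedged reading of such a search: `evaluate` is a placeholder assumed to construct the sub-network for a sampled path, inherit its parameters, train it briefly and return a performance score.

```python
import random

def random_search(num_layers: int, num_choices: int, num_trials: int,
                  evaluate) -> tuple:
    # Keep the best-scoring path over a fixed budget of sampled candidates.
    best_path, best_score = None, float("-inf")
    for _ in range(num_trials):
        path = [random.randrange(num_choices) for _ in range(num_layers)]
        score = evaluate(path)
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score
```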
[0065] In step 503, a sub-network is constructed based on the target network sub-structures,
each selected in a respective layer of the super network.
[0066] In step 504, the sub-network is trained by taking the network parameter inherited
from the super network to be an initial parameter of the sub-network, to obtain a
network parameter of the sub-network.
[0067] In an optional embodiment, the method further includes: processing input data based
on the trained sub-network. A type of the input data includes at least one of the
following: an image data type, a text data type, or an audio data type.
[0068] In an optional embodiment, the method further includes: conducting performance evaluation
on the trained sub-network based on a test data set, to obtain an evaluation result.
The type of test data in the test data set includes at least one of the following:
an image data type, a service data type or an audio data type.
[0069] Here, after the trained sub-network is constructed, its performance can be evaluated
based on the test data set so as to gradually optimize the network structure, until an
optimal sub-network, for example, a sub-network with a minimal validation loss or a
maximum reward, is found. Here, test data in the test data set may be input into the
trained sub-network, and an evaluation result is output through the sub-network. The
output evaluation result is compared with a preset standard to obtain a comparison result,
and the performance of the sub-network is evaluated according to the comparison result.
A test result may be the rate or precision at which the sub-network processes the test
data.
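As an illustration of such an evaluation, the sketch below measures classification accuracy over a test data set; accuracy is one possible stand-in for the rate or precision mentioned above, assuming a PyTorch sub-network and a standard data loader.

```python
import torch

@torch.no_grad()
def evaluate_subnetwork(sub_net, test_loader) -> float:
    # Compare the sub-network's outputs with the reference labels and
    # return the fraction of correct predictions as the evaluation result.
    sub_net.eval()
    correct, total = 0, 0
    for inputs, targets in test_loader:
        predictions = sub_net(inputs).argmax(dim=1)
        correct += (predictions == targets).sum().item()
        total += targets.numel()
    return correct / total
```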
[0070] The technical solution according to any of the above embodiments of the present invention
may be applied in neural architecture search (NAS). NAS is a technique of automatically
designing a neural network. Based on NAS, a neural network structure of high performance
may be automatically designed according to a sample set, and the costs in using and
implementing the neural network may be reduced effectively.
[0071] Given a search space, namely a set of candidate neural network structures, an optimal
network structure is found in the search space using a search strategy. Then, the
quality, namely the performance, of the neural network structure is evaluated based on
a performance evaluation strategy; for example, performance evaluation is conducted
using metrics such as the data processing precision, the data processing rate, etc.
of the neural network. Here, the set of candidate neural network structures includes
a set of the alternative network sub-structures above.
[0072] NAS may be divided into three components: a search space, a search strategy and
a performance evaluation strategy. A search space represents a group of neural network
architectures available for search, that is, the candidate neural network structures.
[0073] A search strategy defines which algorithm can be used to find an optimal network
structure parameter configuration quickly and accurately, for example, for the
optimization of a hyper-parameter. The search is generally an iterative process. The
search algorithm may include: a random search algorithm, a Bayesian search algorithm,
an evolutionary learning algorithm, a reinforcement learning algorithm, an evolutionary
and reinforcement learning combined algorithm, a gradient based algorithm and so on.
[0074] In each step or iteration of the search process, samples are generated from the search
space, and a neural network is formed according to the samples, which is referred
to as a sub-network. In embodiments of the present invention, the samples are the
target network sub-structures determined from the alternative network sub-structures
in the above embodiments.
[0075] Fig. 6 illustrates a sixth schematic flow chart of a method for training a neural
network according to an exemplary embodiment. As illustrated in Fig. 6, the method
mainly includes the following steps:
In step 601, a super network is trained.
[0076] In embodiments of the present invention, in the process of searching based on NAS,
a super network containing multiple network structures (referred to as sub-structures
hereinafter) is trained; the super network contains the search space of all the
sub-structures, that is, the set of candidate neural network structures. The
sub-structures are parts of the neural network. The super network includes multiple
network layers, and each of the network layers may contain multiple sub-structures.
Here, the sub-structures may be the alternative network sub-structures, and the super
network is the set of all the alternative network sub-structures. Fig. 7 illustrates
a schematic structural diagram of a super network according to an exemplary embodiment.
As illustrated in Fig. 7, the super network 700 contains a first network layer 701,
a second network layer 702 and a third network layer 703. The first network layer
701, the second network layer 702 and the third network layer 703 each contain three
parallel sub-structures: a sub-structure A, a sub-structure B and a sub-structure C.
[0077] A weight parameter corresponding to each network structure can be obtained after
the super network is trained. At this time, a mapping relation between a structure
identifier and a network parameter of the respective network sub-structure can be
established, and the mapping relation is stored in a mapping table. The structure
identifier may be used for uniquely identifying the network structure, and includes
a serial number of the network structure, or a name of the network structure.
[0078] In step 602, sub-structures are sampled from the super network, and a sub-network
is constructed according to the sampled sub-structures.
[0079] Here, the sub-structures may be selected from the super network, and the sub-network
is constructed based on the selected sub-structures. Fig. 8 illustrates a schematic
flow chart of constructing a sub-network according to an exemplary embodiment. As
illustrated in Fig. 8, the super network 800 contains a first network layer 801, a
second network layer 802 and a third network layer 803. The first network layer 801,
the second network layer 802 and the third network layer 803 each contain three parallel
sub-structures: a sub-structure A, a sub-structure B and a sub-structure C. In the
process of constructing the sub-network, a sub-structure can be selected
from each network layer to construct the sub-network. For example, the sub-structure
A is selected to be a first network layer of the sub-network 804 from the first network
layer 801 of the super network 800. The sub-structure A is selected to be a second
network layer of the sub-network 804 from the second network layer 802. The sub-structure
B is selected to be a third network layer of the sub-network 804 from the third network
layer 803.
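Continuing the earlier SuperNetwork sketch, the construction in Fig. 8 might be expressed as instantiating one chosen sub-structure per layer; `make_substructure` is the hypothetical helper introduced earlier, and the forward pass simply chains the selected layers.

```python
import torch.nn as nn

class SubNetwork(nn.Module):
    """One sub-structure per layer, each chosen from a super-network layer."""
    def __init__(self, chosen: list, channels: int = 16):
        super().__init__()
        self.layers = nn.ModuleList(
            make_substructure(sid, channels) for sid in chosen
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Fig. 8: sub-structure A for the first and second layers, B for the third.
sub_net = SubNetwork(chosen=["A", "A", "B"])
```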
[0080] In step 603, weight parameters of the sub-structures in the super network are shared
to the corresponding sub-structures in the sub-network, so as to sufficiently train
the sub-network.
[0081] After the super network is trained, the weight parameter corresponding to each network
structure can be obtained, and a mapping relation between a structure identifier and
a network parameter of the respective network sub-structure can be established. The
mapping relation is stored in the mapping table. Here, the corresponding weight parameter
can be acquired from the mapping table based on the structure identifier of the respective
sub-structure in the sub-network, and the weight parameter is shared to the corresponding
sub-structure in the sub-network. After the weight parameters of the sub-structures
in the super network are shared to the corresponding sub-structures in the sub-network,
the sub-network can be trained sufficiently.
[0082] In step 604, performance evaluation is conducted on the sub-network based on a test
data set, to obtain an evaluation result.
[0083] Fig. 9 illustrates a schematic flow chart of sharing a weight parameter according
to an exemplary embodiment. As illustrated in Fig. 9, the super network 900 contains
a first network layer 901, a second network layer 902 and a third network layer 903.
The first network layer 901, the second network layer 902 and the third network layer
903 each contain three parallel sub-structures: a sub-structure A, a sub-structure B
and a sub-structure C. In the process of constructing a sub-network, a sub-structure
can be selected from each network layer to construct the sub-network. For example,
the sub-structure A is selected to be a first network layer of the sub-network 904
from the first network layer 901 of the super network 900. The sub-structure A is
selected to be a second network layer of the sub-network 904 from the second network
layer 902. The sub-structure B is selected to be a third network layer of the sub-network
904 from the third network layer 903.
[0084] Accordingly, when sharing the weight parameters, the weight parameter of the sub-structure
A of the first network layer 901 in the super network 900 may be shared to the sub-structure
A of the first network layer of the sub-network 904. The weight parameter of the sub-structure
A of the second network layer 902 in the super network 900 may be shared to the sub-structure
A of the second network layer of the sub-network 904. The weight parameter of the
sub-structure B of the third network layer 903 in the super network 900 may be shared
to the sub-structure B of the third network layer of the sub-network 904.
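In terms of the sketches above, the sharing illustrated in Fig. 9 amounts to a per-layer parameter copy from the super network into the sub-network. Copying directly from the super network, as below, is equivalent to querying a mapping table that stores snapshots of the same parameters; the SuperNetwork and SubNetwork classes are the hypothetical ones from the earlier sketches.

```python
def share_weights(super_net, sub_net, chosen: list) -> None:
    # Copy the weight parameters of each selected sub-structure in the
    # super network into the corresponding layer of the sub-network.
    for layer_idx, sid in enumerate(chosen):
        source = super_net.layers[layer_idx][sid]
        sub_net.layers[layer_idx].load_state_dict(source.state_dict())

# Fig. 9: share A, A and B from layers 1, 2 and 3 of the super network.
super_net = SuperNetwork()            # stands in for a trained super network
sub_net = SubNetwork(chosen=["A", "A", "B"])
share_weights(super_net, sub_net, chosen=["A", "A", "B"])
```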
[0085] Technical solutions involved in the present invention can be used for deep learning
tasks such as, but not limited to, image classification, target detection and semantic
segmentation. For example, a series of neural network models can be found based on
weight-sharing NAS, and the found neural network models are used in deployment. Each
found neural network model does not have to be trained from scratch. Instead,
neural network parameters inherited from a trained super network are taken to be initial
parameters for training, so as to obtain a finally trained neural network model.
[0086] In embodiments of the present invention, a sub-structure may be sampled from each
network layer of the super network, and connecting relations among all the sub-structures
may be established to form a sub-network. After that, the weight parameter corresponding
to each sub-structure is acquired from the mapping table based on the structure identifier
of the respective sub-structure in the sub-network, so as to train the sub-network.
In this way, there is no need to train a found sub-network from scratch,
not only reducing the computation burden of the neural network, but also improving
the search efficiency of the search algorithm.
[0087] Fig. 10 illustrates a first block diagram of an apparatus for training a neural network
according to an exemplary embodiment. As illustrated in Fig. 10, the apparatus 1000
for training a neural network mainly includes: a first training module 1001, a selection
module 1002, a network construction module 1003 and a second training module 1004.
[0088] The first training module is configured to train a super network to obtain a network
parameter of the super network. Each network layer of the super network includes multiple
alternative network sub-structures in parallel.
[0089] The selection module is configured to, for each network layer of the super network,
select, from the multiple alternative network sub-structures, an alternative network
sub-structure to be a target network sub-structure for constructing a sub-network.
[0090] The network construction module is configured to construct a sub-network based on
the target network sub-structures, each selected in a respective layer of the super
network.
[0091] The second training module is configured to train the sub-network, by taking the
network parameter inherited from the super network to be an initial parameter of the
sub-network, to obtain a network parameter of the sub-network.
[0092] In an optional embodiment, the super network includes N network layers, and each
of the network layers includes M alternative network sub-structures, where N is a
positive integer no smaller than 2, and M is a positive integer no smaller than 2.
[0093] The selection module is specifically configured to select an m-th alternative
network sub-structure of an n-th network layer of the super network to be the target
network sub-structure constructing an n-th network layer of the sub-network, where n is
a positive integer smaller than or equal to N, and m is a positive integer smaller than
or equal to M.
[0094] Fig. 11 illustrates a second block diagram of an apparatus for training a neural
network according to an exemplary embodiment. As illustrated in Fig. 11, the apparatus
1100 for training a neural network mainly includes: a first training module 1001,
a selection module 1002, a network construction module 1003, a second training module
1004 and a storage module 1101.
[0095] The first training module is configured to train a super network to obtain a network
parameter of the super network. Each network layer of the super network includes multiple
alternative network sub-structures in parallel.
[0096] The selection module is configured to, for each network layer of the super network,
select, from the multiple alternative network sub-structures, an alternative network
sub-structure to be a target network sub-structure for constructing a sub-network.
[0097] The network construction module is configured to construct a sub-network based on
the target network sub-structures, each selected in a respective layer of the super
network.
[0098] The second training module is configured to train the sub-network, by taking the
network parameter inherited from the super network to be an initial parameter of the
sub-network, to obtain a network parameter of the sub-network.
[0099] The storage module is configured to: after obtaining the network parameter of the
super network, for each of the alternative network sub-structures, store a mapping
relation between a structure identifier and a network parameter of the respective
alternative network sub-structure.
[0100] In an optional embodiment, the second training module is specifically configured
to, for each of the alternative network sub-structures contained in the sub-network,
query, based on a structure identifier of the alternative network sub-structure, the
mapping relation to obtain a network parameter of the alternative network sub-structure;
and train, based on the obtained network parameters of the alternative network sub-structures,
the sub-network, to obtain the network parameter of the sub-network.
[0101] In an optional embodiment, the selection module is specifically configured to: select,
based on a set search algorithm, an alternative network sub-structure to be a target
network sub-structure from the multiple alternative network sub-structures of each
network layer of the super network.
[0102] The set search algorithm includes at least one of the following: a random search
algorithm, a Bayesian search algorithm, an evolutionary learning algorithm, a reinforcement
learning algorithm, an evolutionary and reinforcement learning combined algorithm,
or a gradient based algorithm.
[0103] In another optional embodiment, the apparatus further includes a data processing
module, configured to process input data based on the trained sub-network.
[0104] A type of the input data includes at least one of the following: an image data type,
a text data type, or an audio data type.
[0105] In another optional embodiment, the apparatus further includes a performance evaluation
module, configured to conduct performance evaluation on the trained sub-network based
on a test data set, to obtain an evaluation result.
[0106] A type of test data in the test data set includes at least one of the following:
an image data type, a service data type or an audio data type.
[0107] With regard to the apparatus in the above embodiments, the specific ways in which
the various modules execute operations have been described in detail in the embodiments
regarding the method, and will not be described in detail here.
[0108] Accordingly, an apparatus for training a neural network is also provided in the present
invention, including a processor; and a memory, configured to store instructions executable
by the processor.
[0109] The processor is configured to implement, during execution, steps in the method for
training a neural network in any of the above embodiments.
[0110] Fig. 12 illustrates a block diagram of an apparatus 1200 for training a neural network
according to an exemplary embodiment. For example, the apparatus 1200 may be a mobile
phone, a computer, a digital broadcast terminal, a message transceiving device, a
game console, a tablet device, medical equipment, fitness equipment, a personal digital
assistant, etc.
[0111] As illustrated in Fig. 12, the apparatus 1200 may include one or more of the following:
a processing component 1202, a memory 1204, a power component 1206, a multi-media
component 1208, an audio component 1210, an input/output (I/O) interface 1212, a sensor
component 1214, and a communication component 1216.
[0112] The processing component 1202 generally controls the overall operation of the apparatus
1200, such as operations associated with display, a phone call, data communication,
a camera operation and a recording operation. The processing component 1202 may include
one or more processors 1220 to execute instructions, so as to complete all or some of
the steps in the methods above. In addition, the processing component 1202 may include
one or more modules for the interaction between the processing component 1202 and
the other components. For example, the processing component 1202 may include a multi-media
module for interaction between the multi-media component 1208 and the processing component
1202.
[0113] The memory 1204 is configured to store various types of data so as to support operations
at the apparatus 1200. Examples of such data include instructions of any application
or method operated on the apparatus 1200, contact person data, phone book data,
messages, pictures, video, etc. The memory 1204 may be implemented
by any type of volatile or non-volatile storage device or a combination of both, for
example, a static random access memory (SRAM), an electrically erasable programmable
read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable
read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory,
a magnetic disk or an optical disk.
[0114] The power component 1206 supplies power for the various components of the apparatus
1200. The power component 1206 may include a power management system, one or more
power sources, and other components associated with the generation, management and
distribution of power for the apparatus 1200.
[0115] The multi-media component 1208 includes a screen serving as an output interface between
the apparatus 1200 and a user. In some embodiments, the screen may include a liquid
crystal display (LCD) and a touch pad (TP). If the screen includes a touch pad, then
the screen may be implemented as a touch screen so as to receive an input signal from
the user. The touch pad includes one or more touch sensors to sense touch, slide and
gestures on the touch pad. The touch sensors may not only sense the boundary of a
touch or slide action, but can also detect the duration and pressure related to the
touch or slide operation. In some embodiments, the multi-media component 1208 includes
a front camera and/or a rear camera. When the apparatus 1200 is in an operating mode,
such as a photography mode or a video mode, the front camera and/or the rear camera
may receive external multi-media data. Each of the front camera and the rear camera may
be a fixed optical lens system or have focal length and optical zoom capability.
[0116] The audio component 1210 is configured to output and/or input an audio signal. For
example, the audio component 1210 includes a microphone (MIC), and when the apparatus
1200 is in an operating mode, such as a calling mode, a recording mode and a voice
recognition mode, the microphone is configured to receive an external audio signal.
The received audio signal can be further stored in the memory 1204 or sent via the
communication component 1216. In some embodiments, the audio component 1210 further
includes a loudspeaker for outputting an audio signal.
[0117] The I/O interface 1212 provides an interface between the processing component 1202
and a peripheral interface module, and the above peripheral interface module may be
a keyboard, a click wheel, a button, etc. The button may include but is not limited
to a home page button, a volume button, a start button and a locking button.
[0118] The sensor component 1214 includes one or more sensors for providing state evaluation
for the apparatus 1200 from various aspects. For example, the sensor component 1214
may detect an on/off state of the apparatus 1200, and the relative positioning of
components, for example, the display and the keyboard of the apparatus 1200. The
sensor component 1214 may also detect a positional change of the apparatus
1200 or a component of the apparatus 1200, whether there is contact between a user
and the apparatus 1200, the orientation or acceleration/deceleration of the apparatus
1200, and a temperature change of the apparatus 1200. The sensor component 1214 may
include a proximity sensor configured to detect the existence of an object nearby
without any physical contact. The sensor component 1214 may also include an optical
sensor, such as a CMOS or CCD image sensor, for use in an imaging application. In
some embodiments, the sensor component 1214 may also include an acceleration sensor,
a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
[0119] The communication component 1216 is configured for wired or wireless communication
between the apparatus 1200 and another device. The apparatus 1200 may access a wireless
network based on a communication standard, such as WiFi, 2G, 3G, or a combination thereof. In
an exemplary embodiment, the communication component 1216 receives a broadcast signal
or broadcast-related information from an external broadcast management system through
a broadcast channel. In an exemplary embodiment, the communication component 1216
further comprises a near-field communication (NFC) module for short-range communication.
For example, the NFC module may be implemented based on the radio-frequency identification
(RFID) technique, the infrared data association (IrDA) technique, the ultra-wide band
(UWB) technique, the Bluetooth (BT) technique or others.
In an exemplary embodiment, the apparatus 1200 may be implemented by one or more of:
an application-specific integrated circuit (ASIC), a digital signal processor (DSP),
a digital signal processing device (DSPD), a programmable logic device (PLD), a field
programmable gate array (FPGA), a controller, a micro-controller, a micro-processor
or other electronic elements, for executing the above methods.
[0121] In an exemplary embodiment, a non-transitory computer-readable storage medium including
instructions is also provided, for example a memory 1204 including instructions. The
above instructions may be executed by the processor 1220 of the apparatus 1200 to
complete the above methods. For example, the non-transitory computer-readable storage
medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy
disk, an optical data storage device and so on.
[0122] A non-transitory computer-readable storage medium is also provided, wherein instructions in the storage
medium, when executed by a processor of a mobile terminal, enable the mobile terminal
to execute a method for training a neural network. The method includes the following
operations:
[0123] A super network is trained to obtain a network parameter of the super network. Each
network layer of the super network includes multiple alternative network sub-structures
in parallel.
[0124] For each network layer of the super network, an alternative network sub-structure
is selected to be a target network sub-structure from the multiple alternative network
sub-structures.
[0125] A sub-network is constructed based on the target network sub-structures, each selected
in a respective layer of the super network.
[0126] The sub-network is trained by taking the network parameter inherited from the super
network to be an initial parameter of the sub-network, to obtain a network parameter
of the sub-network.
[0127] Fig. 13 illustrates a block diagram of another apparatus 1300 for training a neural
network according to an exemplary embodiment. For example, the apparatus 1300 may
be provided as a server. As illustrated in Fig. 13, the apparatus 1300 includes a
processing component 1322, and further includes one or more processors, and a memory
resource represented by a memory 1332, for storing instructions executable by the
processing component 1322, for example an application program. The application program
stored in the memory 1332 may include one or more modules, each corresponding to a
set of instructions. In addition, the processing component 1322 is configured to execute
instructions so as to carry out the above method for training a neural network.
The method includes the following operations.
[0128] A super network is trained to obtain a network parameter of the super network. Each
network layer of the super network includes multiple alternative network sub-structures
in parallel.
[0129] For each network layer of the super network, an alternative network sub-structure
is selected, from the multiple alternative network sub-structures, to be a target
network sub-structure for constructing a sub-network.
[0130] The sub-network is trained by taking the network parameter inherited from the super
network to be an initial parameter of the sub-network, to obtain a network parameter
of the sub-network.
[0131] The apparatus 1300 may also include: a power component 1326, configured to perform
power management of the apparatus 1300; a wired or wireless network interface 1350,
configured to connect the apparatus 1300 to a network; and an input/output (I/O) interface
1358. The apparatus 1300 may operate based on an operating system stored in the memory
1332, for example Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the
like.
[0132] Other embodiments of the present invention would readily occur to those skilled in
the art upon consideration of the specification and practice of the present invention
disclosed herein. The present invention is intended to cover any variants, usages or
adaptive changes that comply with the generic principles of the present invention and
include common knowledge or customary technical means in the art that are not disclosed
in the present invention. The specification and embodiments are merely considered
exemplary, and the true scope of the present invention is specified by the appended claims.
[0133] It should be understood that the present invention is not limited to the precise
structures described above and illustrated in the accompanying drawings, and modifications
and changes may be made thereto without departing from the scope thereof. The scope
of the present invention is merely defined by the appended claims.
1. A method for training a neural network,
characterized by comprising:
training (101) a super network to obtain a network parameter of the super network,
wherein each network layer of the super network comprises multiple alternative network
sub-structures in parallel;
for each network layer of the super network, selecting (102), from the multiple alternative
network sub-structures, an alternative network sub-structure to be a target network
sub-structure;
constructing (103) a sub-network based on the target network sub-structures, each selected
in a respective network layer of the super network; and
training (104) the sub-network, by taking the network parameter inherited from the
super network to be an initial parameter of the sub-network, to obtain a network parameter
of the sub-network.
2. The method according to claim 1, wherein
the super network comprises N network layers, and each of the network layers comprises
M alternative network sub-structures, where N is a positive integer no smaller than
2, and M is a positive integer no smaller than 2; and
wherein for each network layer of the super network, selecting, from the multiple
alternative network sub-structures, an alternative network sub-structure to be a target
network sub-structure comprises:
selecting an m-th alternative network sub-structure of an n-th network layer of the super
network to be the target network sub-structure constructing an n-th network layer of the
sub-network, where n is a positive integer smaller than or equal to N, and m is a positive
integer smaller than or equal to M.
3. The method according to claim 1 or 2, further comprising:
after obtaining the network parameter of the super network, for each of the alternative
network sub-structures, storing a mapping relation between a structure identifier
and a network parameter of the respective alternative network sub-structure.
4. The method according to any one of the preceding claims, wherein training the sub-network,
by taking the network parameter inherited from the super network to be the initial
parameter of the sub-network, to obtain a network parameter of the sub-network comprises:
for each of the alternative network sub-structures contained in the sub-network, querying,
based on a structure identifier of the alternative network sub-structure, the mapping
relation to obtain a network parameter of the alternative network sub-structure; and
training, based on the obtained network parameters of the alternative network sub-structures,
the sub-network, to obtain the network parameter of the sub-network.
5. The method according to any one of the preceding claims, wherein for each network
layer of the super network, selecting, from the multiple alternative network sub-structures,
the alternative network sub-structure to be the target network sub-structure comprises:
selecting, based on a set search algorithm, an alternative network sub-structure from
the multiple alternative network sub-structures of each network layer of the super
network to be a target network sub-structure; and
wherein the set search algorithm comprises at least one of the following: a random
search algorithm, a Bayesian search algorithm, an evolutionary learning algorithm,
a reinforcement learning algorithm, an evolutionary and reinforcement learning combined
algorithm, or a gradient based algorithm.
6. The method according to any one of claims 1 to 5, further comprising:
processing input data based on the trained sub-network,
wherein a type of the input data comprises at least one of the following: an image
data type, a text data type, or an audio data type.
7. The method according to any one of claims 1 to 6, further comprising:
conducting performance evaluation on the trained sub-network based on a test data
set, to obtain an evaluation result,
wherein a type of test data in the test data set comprises at least one of the following:
an image data type, a service data type or an audio data type.
8. An apparatus for training a neural network, comprising:
a first training module (1001), configured to train a super network to obtain a network
parameter of the super network, wherein each network layer of the super network comprises
multiple alternative network sub-structures in parallel;
a selection module (1002), configured to, for each network layer of the super network,
select, from the multiple alternative network sub-structures, an alternative network
sub-structure to be a target network sub-structure;
a network construction module (1003), configured to construct a sub-network based
on the target network sub-structures, each selected in a respective layer of the super
network; and
a second training module (1004), configured to train the sub-network, by taking the
network parameter inherited from the super network to be an initial parameter of the
sub-network, to obtain a network parameter of the sub-network.
9. The apparatus according to claim 8, wherein
the super network comprises N network layers, and each of the network layers comprises
M alternative network sub-structures, where N is a positive integer no smaller than
2, and M is a positive integer no smaller than 2; and
wherein the selection module (1002) is configured to select an m-th alternative network
sub-structure of an n-th network layer of the super network to be the target network
sub-structure constructing an n-th network layer of the sub-network, where n is a positive
integer smaller than or equal to N, and m is a positive integer smaller than or equal to M.
10. The apparatus according to claim 8 or 9, further comprising:
a storage module, configured to: after obtaining the network parameter of the super
network, for each of the alternative network sub-structures, store a mapping relation
between a structure identifier and a network parameter of the respective alternative
network sub-structure.
11. The apparatus according to any one of claims 8 to 10, wherein the second training
module (1004) is specifically configured to:
for each of the alternative network sub-structures contained in the sub-network, query,
based on a structure identifier of the alternative network sub-structure, the mapping
relation to obtain a network parameter of the alternative network sub-structure; and
train, based on the obtained network parameters of the alternative network sub-structures,
the sub-network, to obtain the network parameter of the sub-network.
12. The apparatus according to any one of claims 8 to 11, wherein the selection module
(1002) is configured to:
select, based on a set search algorithm, an alternative network sub-structure to be
a target network sub-structure from the multiple alternative network sub-structures
of each network layer of the super network; and
wherein the set search algorithm comprises at least one of the following: a random
search algorithm, a Bayesian search algorithm, an evolutionary learning algorithm,
a reinforcement learning algorithm, an evolutionary and reinforcement learning combined
algorithm, or a gradient based algorithm.
13. The apparatus according to any one of claims 8 to 12, further comprising:
a data processing module, configured to process input data based on the trained sub-network,
wherein a type of the input data comprises at least one of the following: an image
data type, a text data type, or an audio data type.
14. The apparatus according to any one of claims 8 to 13, further comprising:
a performance evaluation module, configured to conduct performance evaluation on the
trained sub-network based on a test data set, to obtain an evaluation result,
wherein a type of test data in the test data set comprises at least one of the following:
an image data type, a service data type or an audio data type.
15. A computer-readable storage medium, wherein instructions in the storage medium, when
executed by a processor of an apparatus for training a neural network, enable the
apparatus to execute the method for training the neural network of any one of preceding
claims 1 to 7.