TECHNICAL FIELD
[0001] This application relates to the field of computer technologies, and in particular,
to a data processing method and apparatus, and a related device.
BACKGROUND
[0002] In the fields of image processing and natural language processing, a plurality of
channels of data are usually processed simultaneously. When the plurality of channels
of data are processed by using a convolutional neural network (convolutional neural
network, CNN), to improve data processing efficiency of an intelligent processing
apparatus, the plurality of channels of data are usually input to the convolutional
neural network as one batch, so that processing of the plurality of channels of data
is completed through one forward inference. However, when processing the plurality
of channels of data, a neural network model needs to perform operations such as vectorization
and fractal on each channel of data separately. This increases time of data tiling
(tiling) and memory reading, and reduces data processing efficiency. In addition,
if a data amount of each channel of data is small, compared with processing one channel
of data with a large data amount through one forward inference, processing the plurality
of channels of data with small data amounts through one forward inference cannot fully
utilize a bandwidth and a computing capability of the intelligent processing apparatus.
Therefore, how to fully utilize the bandwidth and the computing capability of the
intelligent processing apparatus to improve data processing efficiency is an urgent
problem to be resolved.
SUMMARY
[0003] This application discloses a data processing method and apparatus, and a related
device, to improve data processing efficiency of an apparatus, and fully utilize a
bandwidth and a computing capability of the apparatus.
[0004] According to a first aspect, this application provides a data processing method,
where the method includes:
[0005] An intelligent processing apparatus obtains first data and second data, pads, between
the first data and the second data according to a preset rule, third data used to
isolate the first data from the second data, to obtain fourth data, and then completes
processing on the fourth data by using a convolutional neural network.
[0006] The first data and the second data are data to be spliced together, and a sequence
of the first data is prior to a sequence of the second data during splicing. The first
data is any one of image data, voice data, or a text sequence, and the second data
is any one of image data, voice data, or a text sequence.
[0007] Splicing two groups of data into one piece of data with a large data amount and more
elements, and inputting the data obtained after splicing into a convolutional neural
network model for processing can avoid separately performing operations such as vectorization
and fractal on each group of data in a process of simultaneously processing a plurality
of groups of data by using the convolutional neural network. This improves efficiency
of data processing performed by the intelligent processing apparatus by using the
convolutional neural network, and fully utilizes a bandwidth of the intelligent processing
apparatus. By padding elements used to isolate two groups of data between the two
groups of data, a result obtained after spliced data passes through a convolutional
layer of the convolutional neural network includes a result obtained when each group
of data is separately processed. This avoids the following problem: After two groups
of data are directly spliced for convolutional processing, receptive fields corresponding
to some elements in an output result include elements in two feature maps, and consequently
final detection and recognition results of the convolutional neural network are inaccurate.
[0008] In a specific implementation, the padding, according to a preset rule, third data
between the first data and the second data, to obtain fourth data specifically includes:
when row quantities of the first data and the second data are the same, or both row
quantities and column quantities of the first data and the second data are the same,
padding third data of h1 rows and c1 columns between the last column on a right side
of the first data and the first column on a left side of the second data, to obtain
the fourth data, where the first data includes h1 rows and w1 columns, the second
data includes h2 rows and w2 columns, the fourth data includes h1+2p1 rows and w1+c1+w2+2p1
columns, values of h1 and h2 are the same, p1 is a padding size corresponding to a
first network layer, and the first network layer is a network layer to which the fourth
data is to be input.
[0009] In a specific implementation, the padding, according to a preset rule, third data
between the first data and the second data, to obtain fourth data specifically includes:
when column quantities of the first data and the second data are the same, or both
row quantities and column quantities of the first data and the second data are the
same, padding third data of r1 rows and w1 columns between the last row on a lower
side of the first data and the first row on an upper side of the second data, to obtain
the fourth data, where the first data includes h1 rows and w1 columns, the second
data includes h2 rows and w2 columns, values of w1 and w2 are the same, p1 is a padding
size corresponding to a first network layer, the first network layer is a network
layer to which the fourth data is to be input, and the fourth data includes h1+r1+h2+2p1
rows and w1+2p1 columns, that is, data of p1 rows is padded on each of an upper side
and a lower side of the fourth data, and data of p1 columns is padded on each of a
left side and a right side of the fourth data.
[0010] Based on the row quantities and the column quantities of the first data and the second
data, the third data is padded between the last column on the right side of the first
data and the first column on the left side of the second data, or the third data is
padded between the last row on the lower side of the first data and the first row
on the upper side of the second data, to connect the first data and the second data,
where values of elements padded in the fourth data are zeroes. By using the method,
two or more groups of data can be spliced into one group of data with a larger data
amount, and the spliced data is input to the convolutional neural network. This avoids
the following problem: After two groups of data are directly spliced for convolutional
processing, receptive fields corresponding to some elements in an output result include
elements in two feature maps, and consequently final detection and recognition results
of the convolutional neural network are inaccurate.
[0011] In a specific implementation, the determining the column quantity c1 of the third
data based on the column quantity w1 of the first data and network parameters of the
first network layer includes: The intelligent processing apparatus determines, based
on the column quantity w1 of the first data, a size k1, the padding size p1, and a
stride size s1, a column quantity wo1 of fifth data output after the first data is
input to the first network layer; determines, based on the column quantity w1, the
padding size p1, the stride size s1, and the column quantity wo1, a distance Δw between
a center of the last operation on the first data and a center of the first operation
on the second data in a horizontal direction when a convolutional kernel or a pooling
kernel processes spliced data after the spliced data is obtained by padding data of
h1 rows and p1 columns between the last column of the first data and the first column
of the second data; and then determines the column quantity c1 based on the padding
size p1, the stride size s1, and the distance Δw.
[0013] In a specific implementation, that an intelligent processing apparatus completes
data processing on the fourth data by using a convolutional neural network includes:
inputting the fourth data to the first network layer of the convolutional neural network
for processing to obtain sixth data, where the sixth data includes seventh data, eighth
data, and interference data, the seventh data is obtained after the first network
layer processes the first data, the eighth data is obtained after the first network
layer processes the second data, and the interference data is data between the last
column of the seventh data and the first column of the eighth data; determining a
column quantity c2 of the interference data, and deleting the interference data of
the c2 columns; determining a column quantity c3 of ninth data padded between the
last column of the seventh data and the first column of the eighth data; padding the
ninth data of c3 columns between the last column of the seventh data and the first
column of the eighth data, to obtain tenth data; and completing data processing on
the tenth data by using the convolutional neural network.
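The delete-and-repad step between network layers can be sketched as follows (an illustrative Python fragment, not part of the claimed method; the function name is ours, and the column quantities c2 and c3 are taken as given):

```python
def replace_isolation_columns(sixth, w7, c2, c3):
    """Delete the c2 interference columns that sit between the seventh
    data (the first w7 columns) and the eighth data, then pad c3 zero
    columns in their place, producing the tenth data."""
    tenth = []
    for row in sixth:
        left = row[:w7]        # columns belonging to the seventh data
        right = row[w7 + c2:]  # columns belonging to the eighth data
        tenth.append(left + [0] * c3 + right)
    return tenth

# One row: 4 columns of seventh data, 2 interference columns (9s),
# 4 columns of eighth data; repad with c3 = 3 zero columns.
row = [[1, 2, 3, 4, 9, 9, 5, 6, 7, 8]]
print(replace_isolation_columns(row, 4, 2, 3))
# [[1, 2, 3, 4, 0, 0, 0, 5, 6, 7, 8]]
```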
[0014] It should be understood that, because the convolutional neural network includes a
plurality of convolutional layers and a plurality of pooling layers, after the fourth
data obtained by splicing is input to the first network layer for convolutional processing
to obtain the sixth data, the sixth data is further input to a next network layer
(a convolutional layer or a pooling layer) of the convolutional neural network for
convolutional processing or pooling processing. The sixth data includes one or more
columns of interference elements used to isolate the seventh data from the eighth
data, and values of the interference elements are not all 0s. Therefore, before the
sixth data is input to the next network layer of the convolutional neural network,
the column quantity c2 of the interference elements needs to be determined, and the
interference elements of the c2 columns need to be deleted from the sixth data. Then,
the column quantity c3 of the ninth data padded between the last column of the seventh data and the first
column of the eighth data is determined based on convolutional parameters or pooling
parameters of the next network layer. The ninth data is padded between the last column
of the seventh data and the first column of the eighth data, to obtain the tenth data,
and finally the tenth data is input to the next network layer.
[0015] In a specific implementation, the determining a column quantity c2 of the interference
data by the intelligent processing apparatus includes: determining a column quantity
wo2 of the sixth data based on a column quantity w1+c1+w2+2p1 of the fourth data and
the network parameters of the first network layer, and determining the column quantity
c2 of the interference data based on the column quantity wo2 of the sixth data, a
column quantity of the seventh data, and a column quantity of the eighth data. In
this application, the column quantity wo1, the distance Δw, and the column quantity
c1 may be calculated by using the following formulas:

wo1 = ceil((w1 + 2*p1 - di*(k1 - 1) - 1)/s1) + 1 (formula 1)
Δw = w1 + p1 - 1 - (wo1 - 1)*s1 (formula 2)
c1 = p1 + s1*ceil((Δw + 1)/s1) - (Δw + 1) (formula 3)
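These quantities can be checked numerically with the following illustrative Python sketch (the function names are ours; the arithmetic reconstructs the calculation from the worked example described later in this document, in which w1 = 4, k1 = 5, p1 = 2, s1 = 1, and di = 1 give wo1 = 4, Δw = 2, and c1 = 2):

```python
import math

def output_cols(w1, k1, p1, s1, di=1):
    # Column quantity of the feature map output by the first network
    # layer; the effective kernel extent is di*(k1 - 1) + 1 (formula 1).
    return math.ceil((w1 + 2 * p1 - di * (k1 - 1) - 1) / s1) + 1

def center_gap(w1, p1, s1, wo1):
    # Columns between the center of the last operation on the first
    # feature map and the center of the first operation on the second
    # feature map when p1 columns are padded between them (formula 2).
    return w1 + p1 - 1 - (wo1 - 1) * s1

def isolation_cols(dw, p1, s1):
    # Smallest c1 >= p1 that makes the center-to-center distance an
    # integer multiple of the stride s1 (formula 3).
    return p1 + s1 * math.ceil((dw + 1) / s1) - (dw + 1)

wo1 = output_cols(4, 5, 2, 1)   # column quantity of the fifth data
dw = center_gap(4, 2, 1, wo1)   # distance between operation centers
c1 = isolation_cols(dw, 2, 1)   # column quantity of the third data
print(wo1, dw, c1)  # 4 2 2
```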
[0016] In a specific implementation, the padding third data of r1 rows and w1 columns between
the last row of the first data and the first row of the second data by the intelligent
processing apparatus, to obtain the fourth data includes: determining the row quantity
r1 of the third data based on the row quantity h1 of the first data and network parameters
of the first network layer, where the network parameters of the first network layer
include a size k1 of a convolutional kernel or a pooling kernel, a padding size p1,
and a stride size s1; obtaining the column quantity w1 of the first data; and padding
the third data of r1 rows and w1 columns between the first data and the second data,
to obtain the fourth data.
[0017] In a specific implementation, the determining the row quantity r1 of the third data
based on the row quantity h1 of the first data and network parameters of the first
network layer includes: determining, based on the row quantity h1 of the first data,
the size k1 of the convolutional kernel or the pooling kernel, the padding size p1,
and the stride size s1 of the kernel, a row quantity ho1 of fifth data output after
the first data is input to the first network layer; determining, based on the row
quantity h1, the padding size p1, the stride size s1, and the row quantity ho1, a distance
Δh between a center of the last operation on the first data and a center of the first
operation on the second data in a vertical direction when the convolutional kernel
or the pooling kernel processes spliced data after the spliced data is obtained by
padding data of p1 rows and w1 columns between the last row of the first data and
the first row of the second data; and determining the row quantity r1 based on the
padding size p1, the stride size s1, and the distance Δh.
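For vertical splicing, the row quantity r1 can be computed by mirroring the horizontal calculation. The following Python sketch is illustrative only (function names are ours; it assumes the same output-size and stride-alignment arithmetic as the horizontal case):

```python
import math

def output_rows(h1, k1, p1, s1, di=1):
    # Row quantity ho1 of the fifth data (row analog of the column case).
    return math.ceil((h1 + 2 * p1 - di * (k1 - 1) - 1) / s1) + 1

def isolation_rows(h1, k1, p1, s1, di=1):
    # Row quantity r1 of the third data: make the vertical
    # center-to-center distance an integer multiple of the stride s1.
    ho1 = output_rows(h1, k1, p1, s1, di)
    dh = h1 + p1 - 1 - (ho1 - 1) * s1   # the distance between centers
    return p1 + s1 * math.ceil((dh + 1) / s1) - (dh + 1)

# Vertically splicing two maps of h1 = 4 rows with k1 = 5, p1 = 2, s1 = 1:
print(output_rows(4, 5, 2, 1), isolation_rows(4, 5, 2, 1))  # 4 2
```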
[0018] In a specific implementation, that an intelligent processing apparatus completes
data processing on the fourth data by using a convolutional neural network includes:
inputting the fourth data to the first network layer for processing to obtain sixth
data, where the sixth data includes seventh data, eighth data, and interference data,
the seventh data is obtained after the first network layer processes the first data,
the eighth data is obtained after the first network layer processes the second data,
and the interference data is data between the last row of the seventh data and the
first row of the eighth data; determining a row quantity r2 of the interference data,
and deleting the interference data of r2 rows; determining a row quantity r3 of ninth
data padded between the last row of the seventh data and the first row of the eighth
data; padding the ninth data of r3 rows between the last row of the seventh data and
the first row of the eighth data, to obtain tenth data; and completing data processing
on the tenth data by using the convolutional neural network.
[0019] In a specific implementation, the determining a row quantity r2 of the interference
data by the intelligent processing apparatus includes: determining a row quantity
ho2 of the sixth data based on the row quantity h1+r1+h2+2p1 of the fourth data and
the network parameters of the first network layer, and determining the row quantity
r2 of the interference data based on the row quantity ho2 of the sixth data, a row
quantity of the seventh data, and a row quantity of the eighth data.
[0020] According to a second aspect, an embodiment of this application provides a data processing
apparatus, including units configured to perform the data processing method in the
first aspect or any one of the possible implementations of the first aspect.
[0021] According to a third aspect, an embodiment of this application provides a computing
device, including a processor and a memory. The memory is configured to store instructions,
the processor is configured to execute the instructions, and when executing the instructions,
the processor performs the method in the first aspect or any one of the specific implementations
of the first aspect.
[0022] According to a fourth aspect, an embodiment of this application provides a computer-readable
storage medium. The computer-readable storage medium stores instructions, and when
the instructions are run on a computing device, the method according to the first
aspect or any one of the specific implementations of the first aspect is performed.
[0023] Based on the implementations provided in the foregoing aspects, further combination
may be performed in this application to provide more implementations.
BRIEF DESCRIPTION OF DRAWINGS
[0024]
FIG. 1 is a schematic diagram of a structure of a convolutional neural network;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment
of this application;
FIG. 3 is a schematic diagram of another system architecture according to an embodiment
of this application;
FIG. 4 is a schematic diagram of feature map splicing according to an embodiment of
this application;
FIG. 5 is a schematic flowchart of a data processing method according to an embodiment
of this application;
FIG. 6 is a schematic diagram of another manner of feature map splicing according
to an embodiment of this application;
FIG. 7 is a schematic diagram of an operation on a spliced feature map according to
an embodiment of this application;
FIG. 8 is a schematic diagram of another operation on a spliced feature map according
to an embodiment of this application;
FIG. 9 is a schematic diagram of still another operation on a spliced feature map
according to an embodiment of this application;
FIG. 10 is a schematic diagram of a data processing apparatus according to an embodiment
of this application;
FIG. 11 is a schematic diagram of another data processing apparatus according to an
embodiment of this application; and
FIG. 12 is a schematic diagram of a computing device according to an embodiment of
this application.
DESCRIPTION OF EMBODIMENTS
[0025] The data processing method and apparatus provided in this application are described in detail
below with reference to the accompanying drawings.
[0026] A convolutional neural network is a deep learning model, which is usually used to
analyze data such as images. As shown in FIG. 1, the CNN usually includes network
layers such as a convolutional layer (convolutional layer), a pooling layer (pooling
layer), and a fully connected layer (fully connected layer). The convolutional layer
is configured to perform feature extraction on input data to obtain a feature map.
The pooling layer is configured to perform, after the convolutional layer performs
feature extraction, feature selection on the feature map output by the convolutional
layer. The fully connected layer is configured to perform nonlinear combination of
features obtained by the convolutional layer and the pooling layer to obtain an output
result. Each network layer includes network layer parameters such as a kernel size
(kernel size), a padding size (padding size), and a stride size (stride size). For
example, convolutional parameters corresponding to the convolutional layer include
a convolutional kernel size (convolutional kernel size), a padding size of the convolutional
layer for input to a feature map, and a stride size of a convolutional kernel, and
pooling parameters corresponding to the pooling layer include a pooling kernel size
(pooling kernel size), a padding size of the pooling layer for input to a feature
map, and a stride size of a pooling kernel. Usually, the CNN inputs feature data obtained
after passing through one or more convolutional layers to the pooling layer for feature
aggregation, and finally inputs the feature data to the fully connected layer after
convolution and pooling processing are performed a plurality of times. In FIG.
1, an example in which one convolutional layer is connected to one pooling layer is
used.
[0027] In the fields of image processing and natural language processing, a plurality of
channels of data are usually processed simultaneously. In an example of image processing,
in a video surveillance system shown in FIG. 2 or a data center shown in FIG. 3, an
intelligent processing apparatus needs to process a plurality of images simultaneously,
for example, a plurality of images uploaded by a plurality of surveillance cameras
or a plurality of pieces of user equipment, a plurality of images uploaded by a same
camera or a same piece of user equipment, or a plurality of images obtained by separately
segmenting a plurality of objects in a same image when the plurality of objects are
processed. It should be understood that the intelligent processing apparatus may be
a card, or may be a server or another computing device. In the intelligent processing
apparatus, the image processing may be performed by any one or more of components
with data processing capabilities, such as a central processing unit (central processing
unit, CPU), a graphics processing unit (graphics processing unit, GPU), a tensor processing
unit (tensor processing unit, TPU), or a neural network processing unit (neural network
processing unit, NPU). The user equipment may be any one or more of a mobile phone,
a tablet computer, a personal computer, a camera, a scanner, a surveillance camera,
or a vehicle-mounted camera.
[0028] An embodiment of this application provides a data processing method. Two or more
images are adaptively spliced to obtain a large image, and then the spliced image
is input to a convolutional neural network for forward inference, to fully utilize
a bandwidth and a computing capability of an intelligent processing apparatus, and
improve data processing efficiency of the intelligent processing apparatus.
[0029] It should be noted that in a process of processing an image by the convolutional
neural network, output results of a convolutional layer and a pooling layer are both
referred to as feature maps (feature map). The image is input to the convolutional
neural network in a form of pixel matrix. For example, if the image is a gray image,
a two-dimensional pixel matrix is input, or if the image is a color image, a three-dimensional
pixel matrix (or referred to as a tensor) is input. After a pixel matrix corresponding
to one image is input to a first convolutional layer of the convolutional neural network,
a feature map corresponding to the image is obtained through convolutional processing.
Therefore, a pixel matrix corresponding to an image and a feature map corresponding
to the image are both in matrix forms. In this embodiment of this application, for
ease of description, a pixel matrix input by the first convolutional layer of the
convolutional neural network is also referred to as a feature map, and each value
in the feature map is referred to as an element.
[0030] When feature maps corresponding to images are spliced, horizontal splicing or vertical
splicing may be performed. FIG. 4 is a schematic diagram of feature map splicing.
A first feature map is a feature map corresponding to a first image, and a second
feature map is a feature map corresponding to a second image, where each row and each
column of the first feature map and the second feature map separately have five elements.
The horizontal splicing means that the last column on a right side of a feature map
is connected to the first column on a left side of another feature map, and the vertical
splicing means that the last row on a lower side of a feature map is connected to
the first row on an upper side of another feature map. In the horizontal splicing,
row quantities of the feature maps are the same, and column quantities of the feature
maps may be the same or different. In the vertical splicing, column quantities of
the feature maps are the same, and row quantities of the feature maps may be the same
or different.
[0031] In this embodiment of this application, an example in which input data is images,
and two images are horizontally spliced and input to a convolutional layer is used
to describe the data processing method in this embodiment of this application. FIG.
5 is a schematic flowchart of a data processing method according to an embodiment
of this application. The data processing method includes S501 to S503.
[0032] S501: Obtain first data and second data.
[0033] The first data is a first feature map corresponding to a first image, and the second
data is a second feature map corresponding to a second image. The first data and the
second data are two pieces of data of adjacent sequences to be spliced. The first
data includes elements of h1 rows and w1 columns. The second data includes elements
of h2 rows and w2 columns. When the first data and the second data are horizontally
spliced, values of h1 and h2 are the same.
[0034] S502: Pad third data between the first data and the second data according to a preset
rule to obtain fourth data.
[0035] The third data is used to isolate the first data from the second data, and the fourth
data includes the first data, the second data, and the third data. A method for padding
the third data between the first feature map and the second feature map to obtain
a third feature map is described in detail by using an example in which the first
feature map and the second feature map are horizontally spliced, and elements of the
first feature map are located on a left side of elements of the second feature map
after splicing.
[0036] After obtaining the first feature map and the second feature map, an intelligent
processing apparatus determines, based on a column quantity w1 of the first feature
map and convolutional parameters of a first convolutional layer
to which the first feature map and the second feature map are to be input, a column
quantity of the third data to be padded between the first feature map and the second
feature map. Convolutional parameters include a convolutional kernel size corresponding
to a convolutional layer, a padding size corresponding to the convolutional layer,
a stride size of a convolutional kernel, and a dilation rate (dilation rate). When
the convolutional kernel is 3*3, the convolutional kernel size is equal to 3.
[0037] With reference to (formula 1), the intelligent processing apparatus can determine,
based on the convolutional parameters of the first convolutional layer to which the
first feature map and the second feature map are to be input and the column quantity
w1 of the first feature map, a column quantity wo1 of fifth data output after the
first feature map is separately input to the first convolutional layer for a convolutional
operation:

wo1 = ceil((w1 + 2*p1 - di*(k1 - 1) - 1)/s1) + 1 (formula 1)
[0038] In formula 1, the ceil operation indicates that a minimum integer greater than or
equal to a specified expression is returned, p1 is a padding size corresponding to
the first convolutional layer, k1 is a convolutional kernel size corresponding to
the first convolutional layer, s1 is a stride size of a convolutional kernel corresponding
to the first convolutional layer, and di is a dilation rate. For example, if the column
quantity w1 of the first feature map is 4, the padding size p1 in the convolutional
parameters is 2, the convolutional kernel size k1 is 5, the stride size s1 is 1, and
the dilation rate di is 1, the column quantity wo1 of the output feature map is 4.
[0039] After obtaining, according to the foregoing (formula 1), the column quantity of the
output feature map after the first feature map is separately input to the first convolutional
layer for the convolutional operation, the intelligent processing apparatus may determine,
based on the column quantity w1 of the first feature map, the padding size p1 of the
first convolutional layer, the column quantity wo1 of the output feature map, and
the stride size s1 of the convolutional kernel, a distance Δw between a center of
the last operation on the first feature map by the convolutional kernel and a center
of the first operation on the second feature map by the convolutional kernel in the
first horizontal movement of the convolutional kernel, in a process in which elements
of p1 columns are padded between the first feature map and the second feature map
to obtain a spliced feature map, the spliced feature map is input to the first convolutional
layer, and the convolutional kernel then performs a convolutional operation on the
spliced feature map. The spliced feature map is a feature map obtained after the elements
of p1 columns are padded between the first feature map and the second feature map
to splice the first feature map and the second feature map, elements of p1 columns
are padded on each of a left side and a right side of the spliced feature map, and
elements of p1 rows are padded on each of an upper side and a lower side of the spliced
feature map. The intelligent processing apparatus calculates the distance Δw by using
the following (formula 2):

Δw = w1 + p1 - 1 - (wo1 - 1)*s1 (formula 2)
[0040] After the distance Δw is obtained, to ensure that the distance between the center
of the last operation on the first feature map by the convolutional kernel and the
center of the first operation on the second feature map by the convolutional kernel
in the first horizontal movement of the convolutional kernel is an integer multiple
of the stride size s1 of the convolutional kernel, the intelligent processing apparatus
calculates, according to (formula 3) based on the distance Δw, the padding size p1,
and the stride size s1, the column quantity c1 of the third data to be finally padded
between the first feature map and the second feature map:
c1 = p1 + s1*ceil((Δw + 1)/s1) - (Δw + 1) (formula 3)
[0041] After the column quantity c1 of the third data is obtained through calculation,
when the first feature map and the second feature map are spliced, elements of h1
rows and c1 columns (that is, the third data) are padded between the last column of
the first feature map and the first column of the second feature map, the elements
of p1 columns are padded on each of the left side and the right side of the spliced
feature map, and the elements of p1 rows are padded on each of the upper side and
the lower side of the spliced feature map, to obtain the third feature map (that is,
the fourth data). In other words, after the third data is padded between the first
feature map and the second feature map to obtain the spliced feature map, values of
the row quantity and the column quantity of the elements that are padded on the upper
side, the lower side, the left side, and the right side of the spliced feature map
are the padding size p1 of the first convolutional layer. The third feature map includes
elements of h1+2p1 rows and w1+c1+w2+2p1 columns, where values of the padded elements
are 0s.
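The splicing described above can be sketched as follows (an illustrative Python fragment; the function name is ours, and zero-valued padding elements are assumed, as stated above):

```python
def splice_horizontal(fmap1, fmap2, c1, p1):
    """Pad c1 zero columns between two feature maps with equal row
    quantities, then pad a zero border of p1 rows/columns around the
    result, producing the third feature map (the fourth data)."""
    assert len(fmap1) == len(fmap2), "horizontal splicing needs equal row quantities"
    core = [r1 + [0] * c1 + r2 for r1, r2 in zip(fmap1, fmap2)]
    w = len(core[0])
    top = [[0] * (w + 2 * p1) for _ in range(p1)]
    middle = [[0] * p1 + row + [0] * p1 for row in core]
    bottom = [[0] * (w + 2 * p1) for _ in range(p1)]
    return top + middle + bottom

# Two 4*4 feature maps with c1 = p1 = 2 give h1+2p1 = 8 rows
# and w1+c1+w2+2p1 = 14 columns, matching the example above.
a = [[1] * 4 for _ in range(4)]
b = [[2] * 4 for _ in range(4)]
spliced = splice_horizontal(a, b, c1=2, p1=2)
print(len(spliced), len(spliced[0]))  # 8 14
```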
[0042] For example, as shown in FIG. 6, both the first feature map and the second feature
map are 4*4 feature maps, and the convolutional kernel size k1 is 5, the padding size
p1 is 2, the stride size s1 is 1, and the dilation rate di is 1. Therefore, the elements
of p1 columns are padded between the first feature map and the second feature map,
elements of two columns are padded on each of the left side and the right side of
the spliced feature map, and elements of two rows are padded on each of the upper
side and the lower side of the spliced feature map. The obtained spliced feature map
is shown in FIG. 6. According to the foregoing (formula 1) and (formula 2), it can
be learned by calculation that Δw is 2. As shown in FIG. 6, in the first horizontal
movement of the convolutional kernel, the center of the last operation on the first
feature map is an element 0, the center of the first operation on the second feature
map by the convolutional kernel is an element 4, and a distance between the two elements
is two columns of elements, that is, Δw is equal to 2. According to the foregoing
(formula 1) to (formula 3), it can be learned by calculation that a value of c1 is
equal to 2. Because the value of c1 is the same as the value of the padding size p1,
the third feature map obtained after the elements of c1 columns are padded between
the first feature map and the second feature map is the same as the spliced feature
map in FIG. 6.
[0043] S503: Complete data processing on the fourth data by using a convolutional neural
network.
[0044] After the elements of c1 columns are padded between the first feature map and the
second feature map, and the first feature map and the second feature map are spliced
to obtain the third feature map, the third feature map is input to the first convolutional
layer for convolutional processing to obtain sixth data (that is, a fourth feature
map). The fourth feature map includes seventh data (that is, a fifth feature map),
eighth data (that is, a sixth feature map), and interference data. The fifth feature
map is a feature map obtained after the first feature map is separately input to the
first convolutional layer for convolutional processing. The sixth feature map is a
feature map obtained after the second feature map is separately input to the first
convolutional layer for convolutional processing. That is, the fifth feature map is
a feature extracted from the first feature map, and the sixth feature map is a feature
extracted from the second feature map. The interference data is elements between the
last column of the fifth feature map and the first column of the sixth feature map.
FIG. 7 is a schematic diagram of performing convolutional processing on a spliced
feature map according to an embodiment of this application. The third feature map
includes elements of eight rows and fourteen columns, and after a convolutional operation
is performed by using a convolutional kernel whose size is 5*5 and whose stride size
is 1, a fourth feature map including elements of four rows and ten columns is output.
The fifth feature map includes elements of four rows and four columns, the sixth feature
map includes elements of four rows and four columns, and the interference elements of
four rows and two columns between the fifth feature map and the sixth feature map
are features that do not belong to the first image or the second image.
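This behavior can be checked numerically. The sketch below is illustrative only: conv2d_valid and pad are assumed helper names, and constant-valued maps with an all-ones 5*5 kernel stand in for real data and trained weights. It reproduces the situation of FIG. 7: the first four output columns of the spliced result equal the output of processing the first feature map alone, the last four equal the second map's standalone output, and the two middle columns are the interference elements.

```python
def conv2d_valid(x, k, stride=1):
    """Valid (no-padding) cross-correlation of 2-D list x with kernel k."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[i * stride + a][j * stride + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range((len(x[0]) - kw) // stride + 1)]
            for i in range((len(x) - kh) // stride + 1)]

def pad(x, p):
    """Pad p zero rows/columns on every border of 2-D list x."""
    w = len(x[0]) + 2 * p
    return ([[0] * w for _ in range(p)]
            + [[0] * p + r + [0] * p for r in x]
            + [[0] * w for _ in range(p)])

k = [[1] * 5 for _ in range(5)]            # 5*5 kernel, stride 1
a = [[1] * 4 for _ in range(4)]            # first feature map
b = [[2] * 4 for _ in range(4)]            # second feature map
spliced = pad([ra + [0, 0] + rb for ra, rb in zip(a, b)], 2)  # 8*14
out = conv2d_valid(spliced, k)             # 4*10 fourth feature map
solo = conv2d_valid(pad(a, 2), k)          # first map processed alone
print(all(out[i][:4] == solo[i] for i in range(4)))  # True
```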
[0045] Splicing feature maps corresponding to two images into one feature map including
more elements, and inputting the one feature map obtained after splicing to a convolutional
neural network model for processing can avoid separately performing operations such
as vectorization and fractal on each image in a process of simultaneously processing
a plurality of images by the convolutional neural network. This improves efficiency
of data processing performed by the intelligent processing apparatus by using the
convolutional neural network, and fully utilizes a bandwidth of the intelligent processing
apparatus.
[0046] In addition, if no element used for isolation is padded between the first feature
map and the second feature map, and the first feature map and the second feature map
are directly spliced to obtain a spliced feature map, in an output feature map obtained
by performing a convolutional operation on the spliced feature map when the spliced
feature map passes through a convolutional layer, a receptive field corresponding
to one element may include elements of the two feature maps. For example, as shown
in FIG. 8, the spliced feature map obtained by directly splicing the first feature
map and the second feature map is input to the first convolutional layer, and the
output feature map obtained after convolution is performed on the spliced feature
map is shown in FIG. 8. A receptive field of any one element in the third to sixth
columns in the output feature map includes an element of the first feature map and
an element of the second feature map, that is, one element in the output feature map
includes features of the two images. For example, an area included in a dashed line
box in FIG. 8 is a receptive field corresponding to the third element 5 in the first
row of the output feature map. The receptive field includes elements of the first
feature map and the second feature map, that is, the element 5 includes features of
the two images. This affects subsequent detection, recognition, and tracking of an
object in the images.
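The receptive-field mixing can also be reproduced numerically. In this illustrative sketch (constant-valued maps and an all-ones 5*5 kernel are assumptions), splicing the two maps with no isolation elements changes the fourth output column relative to processing the first map alone, because that column's receptive field now also covers pixels of the second map:

```python
def conv2d_valid(x, k):
    """Valid (no-padding) cross-correlation of 2-D list x with kernel k."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def pad(x, p):
    """Pad p zero rows/columns on every border of 2-D list x."""
    w = len(x[0]) + 2 * p
    return ([[0] * w for _ in range(p)]
            + [[0] * p + r + [0] * p for r in x]
            + [[0] * w for _ in range(p)])

k = [[1] * 5 for _ in range(5)]
a = [[1] * 4 for _ in range(4)]
b = [[2] * 4 for _ in range(4)]
direct = conv2d_valid(pad([ra + rb for ra, rb in zip(a, b)], 2), k)  # no isolation
solo = conv2d_valid(pad(a, 2), k)                                    # first map alone
# The 4th column's receptive field overlaps b, so the values differ:
print(direct[0][3], solo[0][3])  # 21 9
```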
[0047] In this embodiment of this application, by padding elements used to isolate two feature
maps between the two feature maps, a result obtained after a spliced feature map passes
through a convolutional layer of the convolutional neural network includes a result
obtained when each feature map is separately processed. This avoids the following
problem: After two feature maps are directly spliced for convolutional processing,
receptive fields corresponding to some elements in the output feature map include
elements of the two feature maps, and consequently final detection and recognition
results of the convolutional neural network are inaccurate.
[0048] Because the convolutional neural network includes a plurality of convolutional layers
and a plurality of pooling layers, after the third feature map obtained by splicing
is input to the first convolutional layer for convolutional processing to obtain the
fourth feature map, the fourth feature map is further input to a next network layer
(a convolutional layer or a pooling layer) of the convolutional neural network for
convolutional processing or pooling processing. The fourth feature map includes one
or more columns of interference elements used to isolate the fifth feature map from
the sixth feature map, and values of the interference elements are not all 0s. Therefore,
before the fourth feature map is input to the next network layer of the convolutional
neural network, a column quantity c2 of the interference elements needs to be determined,
and the interference elements of the c2 columns need to be deleted from the fourth
feature map. Then, a column quantity c3 of ninth data that needs to be padded between
the last column of the fifth feature map and the first column of the sixth feature
map is calculated based on convolutional parameters or pooling parameters of the next
network layer with reference to the foregoing (formula 1) to (formula 3). In addition,
the ninth data is padded between the last column of the fifth feature map and the
first column of the sixth feature map to obtain tenth data (that is, a seventh feature
map), and finally the seventh feature map is input to a next network layer.
[0049] The intelligent processing apparatus determines, based on the column quantity of
the third feature map and the convolutional parameters of the first convolutional
layer, a column quantity wo2 of the fourth feature map output by the first convolutional
layer, and then determines, based on the column quantities of the output feature maps
output after the first feature map and the second feature map are separately input
to the first convolutional layer for convolutional operations, a column quantity c2
of elements used for isolation in the fourth feature map. Specifically, the intelligent
processing apparatus determines the column quantity wo2 of the fourth feature map
according to (formula 4):

wo2 = floor((w1 + c1 + w2 + 2*p1 - d1*(k1 - 1) - 1)/s1) + 1 (formula 4)
[0050] Herein, w2 is the column quantity of the second feature map. After the column quantity
of the fourth feature map is obtained, the column quantity c2 of the interference
elements in the fourth feature map is determined according to (formula 5):

c2 = wo2 - wo1 - wo3 (formula 5)
[0051] Herein, wo1 is the column quantity of the output feature map output after the first
feature map is separately input to the first convolutional layer for the convolutional
operation, and wo3 is the column quantity of the output feature map output after the
second feature map is separately input to the first convolutional layer for the
convolutional operation.
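The calculation in (formula 4) and (formula 5) can be sketched as follows, assuming the standard convolution output-size expression with dilation (the function name conv_out_cols is illustrative); the values reproduce the FIG. 7 example:

```python
from math import floor

def conv_out_cols(w, k, p, s, d=1):
    """Column quantity of a convolution output for input width w."""
    return floor((w + 2 * p - d * (k - 1) - 1) / s) + 1

# Values from the FIG. 7 example: w1 = w2 = 4, c1 = 2, k1 = 5, p1 = 2, s1 = 1.
wo2 = conv_out_cols(4 + 2 + 4, k=5, p=2, s=1)  # (formula 4): spliced input width
wo1 = conv_out_cols(4, k=5, p=2, s=1)          # first map alone
wo3 = conv_out_cols(4, k=5, p=2, s=1)          # second map alone
c2 = wo2 - wo1 - wo3                           # (formula 5)
print(wo2, c2)  # 10 2
```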
[0052] After the column quantity c2 of the interference elements is determined, the elements
of c2 columns are deleted starting from the (wo1+1)th column of the fourth feature map.
Then, according to the same method in S502, the column quantity c3 of the ninth data
to be padded between the last column of the fifth feature map and the first column of
the sixth feature map is calculated, and the ninth data of c3 columns is padded between
the last column of the fifth feature map and the first column of the sixth feature map
to obtain a spliced feature map. In addition, elements of p2 rows are padded on each
of an upper side and a lower side of the spliced feature map, and elements of p2 columns
are padded on each of a left side and a right side of the spliced feature map to obtain
the tenth data (that is, the seventh feature map), and the seventh feature map is input
to a next network layer of the convolutional neural network. Herein, p2 is a padding
size corresponding to the next network layer of the convolutional neural network.
[0053] For example, as shown in FIG. 9, after the third feature map passes through the first
convolutional layer to obtain the fourth feature map, the intelligent processing apparatus
can learn, through calculation according to (formula 4), that a value of the column
quantity wo2 of the fourth feature map is 10, and learn, through calculation according
to (formula 5), that a value of the column quantity c2 of the elements used for isolation
in the fourth feature map is 2. Then, the intelligent processing apparatus deletes
elements of the fifth and sixth columns from the fourth feature map. If the next network
layer is a second convolutional layer, a value of a convolutional kernel size k2 is 3,
a padding size p2 is 1, a value of a stride size s2 is 1, and a value of a dilation
rate d2 is 1 for the second convolutional layer. The intelligent processing apparatus
can determine, according to the foregoing (formula 1) to (formula 3), that a value of
the column quantity c3 of the elements that need to be padded between the fifth feature
map and the sixth feature map is 1. Then, the intelligent processing apparatus pads
the ninth data, that is, one column of elements whose values are 0s, between the last
column of the fifth feature map and the first column of the sixth feature map to obtain
the spliced feature map, pads elements of one row on each of the upper side and the
lower side of the spliced feature map and elements of one column on each of the left
side and the right side of the spliced feature map to obtain the tenth data (that is,
the seventh feature map), and inputs the seventh feature map to the next network layer
of the convolutional neural network.
[0054] In a possible implementation, after the value of the column quantity c2 of the
interference elements and the value of the column quantity c3 of the elements to be
padded between the fifth feature map and the sixth feature map are determined, a column
quantity of elements that need to be added or deleted between the fifth feature map
and the sixth feature map may be determined based on the value of c2 and the value
of c3.
[0055] If the value of c2 is the same as the value of c3, all values of the interference
elements are replaced with 0s. Then, elements of p2 columns are padded on each of a
left side and a right side of the fourth feature map, and elements of p2 rows are padded
on each of an upper side and a lower side of the fourth feature map, to obtain the
seventh feature map, and the seventh feature map is input to the next network layer
of the convolutional neural network. Herein, p2 is a padding size corresponding to
the next network layer of the convolutional neural network. If the value of c3 is
less than the value of c2, elements of (c2-c3) columns in the interference elements
are deleted starting from the (wo1+1)th column of the fourth feature map, elements
of c3 columns in the interference elements are retained, and values of the retained
elements of c3 columns are replaced with 0s. Then, elements of p2 columns are padded
on each of a left side and a right side of the fourth feature map, and elements of
p2 rows are padded on each of an upper side and a lower side of the fourth feature
map, to obtain the seventh feature map, and then the seventh feature map is input to
the next network layer of the convolutional neural network. If the value of c3 is
greater than the value of c2, elements of (c3-c2) columns whose values are 0s are added
between the fifth feature map and the sixth feature map, and values of the interference
elements of c2 columns are replaced with 0s. Then, elements of p2 columns are padded
on each of a left side and a right side of the fourth feature map, and elements of
p2 rows are padded on each of an upper side and a lower side of the fourth feature
map, to obtain the seventh feature map, and then the seventh feature map is input to
the next network layer of the convolutional neural network.
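All three cases above amount to the same list operation: remove the c2 interference columns that follow the wo1-th column and insert c3 zero columns in their place. A minimal illustrative sketch (the function name adjust_isolation and the toy 4*10 map are assumptions):

```python
def adjust_isolation(fmap, wo1, c2, c3):
    """Replace the c2 interference columns after column wo1 with c3 zero
    columns; this covers c3 == c2, c3 < c2, and c3 > c2 uniformly."""
    return [row[:wo1] + [0] * c3 + row[wo1 + c2:] for row in fmap]

# 4*10 fourth feature map: 4 columns of the fifth feature map, 2 interference
# columns (marked 9), 4 columns of the sixth feature map.
fmap = [[1] * 4 + [9, 9] + [2] * 4 for _ in range(4)]
same = adjust_isolation(fmap, 4, 2, 2)   # c3 == c2: zero the 2 columns
fewer = adjust_isolation(fmap, 4, 2, 1)  # c3 < c2: keep only 1 zero column
more = adjust_isolation(fmap, 4, 2, 3)   # c3 > c2: widen to 3 zero columns
print(len(same[0]), len(fewer[0]), len(more[0]))  # 10 9 11
```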
[0056] The foregoing describes how to determine the column quantity of the third data by
using the example in which the first feature map and the second feature map are horizontally
spliced, and in the third feature map, the first feature map is located on the left
side of the second feature map. If in the third feature map, the second feature map
is located on the left side of the first feature map, when the column quantity c1 of
the third data is calculated, the column quantity w1 in the foregoing (formula 1) and
(formula 2) is replaced with the column quantity w2 of the second feature map, that
is, in (formula 1) and (formula 2), the column quantity of the feature map processed
first by the convolutional kernel is used for calculation.
[0057] It should be understood that, during splicing of the first feature map and the second
feature map, the first feature map and the second feature map can alternatively be
vertically spliced. When the first feature map and the second feature map are vertically
spliced, the intelligent processing apparatus replaces the column quantity in the
foregoing formulas with the row quantity of the corresponding feature map during
calculation performed according to the foregoing formulas. When the first feature map
and the second feature map are vertically spliced, the intelligent processing apparatus
first determines a row quantity r1 of third data based on a row quantity h1 of the
first feature map and network parameters of a first network layer; and then pads the
third data of r1 rows and w1 columns whose element values are 0s between the last row
of the first feature map and the first row of the second feature map to obtain a spliced
feature map. Then, the intelligent processing apparatus pads elements of p1 columns
on each of a left side and a right side of the spliced feature map, and pads elements
of p1 rows on each of an upper side and a lower side of the spliced feature map, to
obtain a third feature map. The third feature map includes elements of h1+r1+h2+2p1
rows and w1+2p1 columns.
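Vertical splicing mirrors the horizontal case with rows and columns exchanged. An illustrative sketch (splice_vertical is an assumed name) producing a third feature map of h1+r1+h2+2p1 rows and w1+2p1 columns:

```python
def splice_vertical(a, b, r, p):
    """Stack a above b with r zero rows between them, then pad p zero
    rows/columns on every border."""
    w = len(a[0])
    body = ([row[:] for row in a]
            + [[0] * w for _ in range(r)]
            + [row[:] for row in b])
    wp = w + 2 * p
    zeros = lambda n: [[0] * wp for _ in range(n)]
    return zeros(p) + [[0] * p + row + [0] * p for row in body] + zeros(p)

# h1 = h2 = 4, w1 = 4, r1 = 2, p1 = 2 -> (4 + 2 + 4 + 4) rows, (4 + 4) columns.
a = [[1] * 4 for _ in range(4)]
b = [[2] * 4 for _ in range(4)]
m = splice_vertical(a, b, 2, 2)
print(len(m), len(m[0]))  # 14 8
```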
[0058] When the row quantity r1 of the third data is determined, the intelligent processing
apparatus first determines, based on the row quantity h1 of the first feature map and
a convolutional kernel size k1, a padding size p1, and a stride size s1 that correspond
to the first convolutional layer, a row quantity ho1 of fifth data output after the
first feature map is input to the first convolutional layer; determines, based on the
row quantity h1, the padding size p1, the stride size s1, and the row quantity ho1
of the fifth data, a distance Δh between a center of the last operation on the first
feature map and a center of the first operation on the second feature map in a vertical
direction when a convolutional kernel of the first convolutional layer processes spliced
data after the spliced data is obtained by padding data of p1 rows and w1 columns
between the last row of the first feature map and the first row of the second feature
map; and determines the row quantity r1 of the third data based on the padding size
p1, the stride size s1, and the distance Δh. That is, a method for calculating the
row quantity of elements to be padded between the first feature map and the second
feature map when the two feature maps are vertically spliced is the same as the method
for calculating the column quantity of elements to be padded between them when the
two feature maps are horizontally spliced, and only the column quantity of the first
feature map in the foregoing (formula 1) to (formula 3) needs to be replaced with the
row quantity of the first feature map. For example, when the row quantity ho1 of the
output feature map output after the first feature map is separately input to the first
convolutional layer for a convolutional operation is determined according to (formula
1), the column quantity w1 of the first feature map is replaced with the row quantity
h1 of the first feature map.
[0059] After obtaining the third feature map, the intelligent processing apparatus inputs
the third feature map to the first convolutional layer to perform convolutional processing
to obtain sixth data (that is, a fourth feature map). The fourth feature map includes
seventh data (that is, a fifth feature map), eighth data (that is, a sixth feature
map), and interference data. The fifth feature map is a feature map obtained after
the first feature map is separately input to the first convolutional layer for convolutional
processing. The sixth feature map is a feature map obtained after the second feature
map is separately input to the first convolutional layer for convolutional processing.
That is, the fifth feature map is a feature extracted from the first feature map,
and the sixth feature map is a feature extracted from the second feature map. The
interference data is elements between the last row of the fifth feature map and the
first row of the sixth feature map.
[0060] Before the fourth feature map is input to a next network layer, the intelligent
processing apparatus determines a row quantity ho2 of the fourth feature map based
on the row quantity h1+r1+h2+2p1 of the third feature map and the network parameters
of the first network layer; determines a row quantity r2 of the interference data based
on the row quantity ho2 of the fourth feature map, a row quantity of the fifth feature
map, and a row quantity of the sixth feature map, and deletes the interference data
of r2 rows; determines a row quantity r3 of ninth data to be padded between the last
row of the fifth feature map and the first row of the sixth feature map, and pads the
ninth data of r3 rows between the last row of the fifth feature map and the first row
of the sixth feature map, to obtain tenth data (that is, a seventh feature map); and
finally inputs the seventh feature map to a next network layer.
[0061] It should be understood that, a method for calculating the row quantity of the
interference elements between the fifth feature map and the sixth feature map when
the first feature map and the second feature map are vertically spliced is the same
as the method for calculating the column quantity of the interference elements when
the first feature map and the second feature map are horizontally spliced, and only
the column quantity of each feature map in the foregoing (formula 4) and (formula 5)
needs to be replaced with the row quantity of the corresponding feature map. For example,
when the row quantity ho2 of the fourth feature map output after the third feature
map is input to the first convolutional layer for a convolutional operation is determined
according to (formula 4), the column quantity w1 of the first feature map is replaced
with the row quantity h1 of the first feature map, the column quantity w2 of the second
feature map is replaced with the row quantity h2 of the second feature map, and the
column quantity c1 is replaced with r1.
[0062] It should be understood that, when the first feature map is a pixel matrix corresponding
to the first image and the second feature map is a pixel matrix corresponding to the
second image, that is, when the first feature map and the second feature map are directly
input to the first convolutional layer, it is only necessary to determine the column
quantity or the row quantity of elements to be padded between the first feature map
and the second feature map.
[0063] In this embodiment of this application, when the row quantities of the first feature
map and the second feature map are the same, and the column quantities of the first
feature map and the second feature map are also the same, a fully connected layer
may be replaced with a convolutional layer. A convolutional kernel size of the convolutional
layer used to replace the fully connected layer is a size of a single feature map,
a stride size of a convolutional kernel in a horizontal direction is equal to a column
quantity of a single feature map, and a stride size of the convolutional kernel in
a vertical direction is equal to a row quantity of a single feature map. Therefore,
when a feature map output by a convolutional layer or a pooling layer needs to be
input to a fully connected layer, only interference elements of a feature map output
by the last convolutional layer or the last pooling layer need to be determined, and
after the interference elements are deleted, different feature maps are directly spliced
and input to a fully connected layer, with no need to pad elements used for isolation
between the different feature maps. For example, when the fourth feature map is a
feature map output by the last convolutional layer, only a column quantity c2 of the
interference elements between the fifth feature map and the sixth feature map needs
to be determined, and the interference elements of c2 columns are deleted from the
fourth feature map. Then, the fifth feature map and the sixth feature map are directly
spliced to obtain a seventh feature map, and the seventh feature map may be input to
a fully connected layer.
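The fully-connected-as-convolution replacement can be checked with a toy setup (illustrative only; the all-ones kernel stands in for trained parameters). With the kernel size equal to a single feature map and the horizontal stride equal to the single-map width, each stride lands exactly on one spliced map, so the layer produces one output per image with no isolation elements needed:

```python
def conv2d_valid(x, k, stride=1):
    """Valid (no-padding) cross-correlation with equal stride in both axes."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[i * stride + a][j * stride + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range((len(x[0]) - kw) // stride + 1)]
            for i in range((len(x) - kh) // stride + 1)]

# Two 4*4 maps spliced directly (interference elements already deleted).
spliced = [[1] * 4 + [2] * 4 for _ in range(4)]
k = [[1] * 4 for _ in range(4)]              # kernel size = single-map size
out = conv2d_valid(spliced, k, stride=4)     # stride = single-map width
print(out)  # [[16, 32]]: one fully-connected-style output per image
```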
[0064] The foregoing uses an example in which the first feature map and the second feature
map are input to a convolutional layer to describe how to determine the column quantity
of elements to be padded between the first feature map and the second feature map,
and how to splice the first feature map and the second feature map, and describe how
to determine the column quantity of elements to be added or deleted between the fifth
feature map and the sixth feature map after the fourth feature map is obtained and
before the fourth feature map is input to a next convolutional layer or pooling layer.
If the first feature map and the second feature map are input to a pooling layer,
the intelligent processing apparatus needs to obtain pooling parameters of the pooling
layer to which the first feature map and the second feature map are to be input, and
replaces the convolutional parameters in (formula 1) to (formula 5) with the pooling
parameters for calculation. For example, a convolutional kernel size is replaced with
a pooling kernel size of the pooling layer, a padding size of a convolutional layer
is replaced with a padding size of the pooling layer, a stride size of a convolutional
kernel of the convolutional layer is replaced with a stride size of a pooling kernel
of the pooling layer, and the like. When the pooling parameters are used for calculation,
a value of a dilation rate d in the foregoing (formula 1) is 1.
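For a pooling layer, the same output-size expression applies with the dilation rate fixed at 1. An illustrative sketch (the function names are assumptions) checking that a 2*2 max pooling with stride 2 matches the formula:

```python
from math import floor

def pool_out_cols(w, k, p, s):
    """Pooling output width: the convolution size formula with d = 1."""
    return floor((w + 2 * p - (k - 1) - 1) / s) + 1

def max_pool(x, k, s):
    """Valid (no-padding) k*k max pooling with stride s."""
    return [[max(x[i * s + a][j * s + b] for a in range(k) for b in range(k))
             for j in range((len(x[0]) - k) // s + 1)]
            for i in range((len(x) - k) // s + 1)]

x = [list(range(8)) for _ in range(8)]   # an 8*8 feature map
y = max_pool(x, 2, 2)
print(len(y[0]), pool_out_cols(8, 2, 0, 2))  # 4 4
```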
[0065] The foregoing embodiment uses an example in which two feature maps are spliced to
describe the data processing method provided in this embodiment of this application.
It should be understood that the splicing method for feature maps may be further applied
to splicing of more than two feature maps. When a quantity of feature maps that need
to be spliced is greater than or equal to 3, two feature maps may be spliced to obtain
a new feature map according to the splicing method for two feature maps, and then
the new feature map and the other feature map are spliced according to the splicing
method for two feature maps, until all feature maps are spliced into one feature map.
When a quantity of feature maps that need to be spliced is an even number greater
than 2, half of the feature maps may be horizontally spliced according to the splicing
method for feature maps whose quantity is greater than or equal to 3, to obtain a
new feature map. Then, the other half of the feature maps are horizontally spliced
according to the same method to obtain another new feature map. Subsequently, the
two new feature maps are vertically spliced to obtain a final feature map.
[0066] In the foregoing embodiment, an example in which both the first data and the second
data are image data is used to describe the data processing method provided in this
embodiment of this application. It should be understood that the foregoing method
may be further applied to processing of voice data or text sequences. For example,
when the first data and the second data each are a segment of voice data, after receiving
the first data and the second data, the intelligent processing apparatus first separately
converts the voice data into a text sequence, then converts each word (word) in the
text sequence into a word vector by using a word embedding (word embedding) algorithm,
and forms one matrix by using the word vector corresponding to each word in the segment
of voice data according to a preset rule. The matrix is in a same form as the first
feature map. Therefore, the intelligent processing apparatus may convert the two segments
of voice data into matrices, then splice the matrices corresponding to the segments
of voice data according to the same method as the foregoing method for feature maps,
and input the spliced matrix to the convolutional neural network for processing.
[0067] It should be noted that, the foregoing data processing method may be further applied
to processing of different types of data. The first data may be any one of image data,
voice data, or a text sequence, and the second data may be any one of image data,
audio data, or a text sequence. This is not specifically limited in embodiments of
this application.
[0068] It should be noted that, for brief description, the foregoing method embodiment is
described as a series of action combinations. However, a person skilled in the art
should understand that the present invention is not limited by the described action
sequence. In addition, a person skilled in the art should also understand that, embodiments
described in the specification are all preferred embodiments, and related actions
are not necessarily mandatory for the present invention.
[0069] Other appropriate step combinations that can be figured out by a person skilled in
the art based on the content described above also fall within the protection scope
of the present invention. In addition, a person skilled in the art should also understand
that all embodiments described in this specification are preferred embodiments, and
the related actions are not necessarily mandatory to the present invention.
[0070] The foregoing describes in detail the data processing method provided in embodiments
of this application with reference to FIG. 1 to FIG. 9. The following describes a data
processing apparatus and a computing device provided in embodiments of this application
with reference to FIG. 10 to FIG. 12.
[0071] FIG. 10 is a schematic diagram of a data processing apparatus according to an embodiment
of this application. The data processing apparatus is used in the intelligent processing
apparatus shown in FIG. 2 or FIG. 3, and the data processing apparatus 100 includes
an obtaining unit 101, a padding unit 102, and a processing unit 103.
[0072] The obtaining unit 101 is configured to obtain first data and second data, where
the first data is any one of image data, voice data, or a text sequence, and the second
data is any one of image data, audio data, or a text sequence. The first data and
the second data are data that needs to be spliced in adjacent sequences. During splicing,
a sequence of the first data is prior to a sequence of the second data. In other words,
in a sequence obtained after splicing is completed, the first data is processed before
the second data.
[0073] The padding unit 102 is configured to pad third data between the first data and the
second data to obtain fourth data, where the third data is used to isolate the first
data from the second data.
[0074] The processing unit 103 is configured to complete data processing on the fourth data
by using a convolutional neural network. Because the convolutional neural network
includes a plurality of convolutional layers and a plurality of pooling layers, processing
the fourth data by using the convolutional neural network means that the fourth data
is input to a first network layer for processing. The first network layer may be a
convolutional layer or may be a pooling layer.
[0075] It should be understood that the data processing apparatus 100 in this embodiment
of this application may be implemented by using an application-specific integrated
circuit (application-specific integrated circuit, ASIC) or a programmable logic device
(programmable logic device, PLD). The PLD may be a complex programmable logic device
(complex programmable logic device, CPLD), a field-programmable gate array (field-programmable
gate array, FPGA), a generic array logic (generic array logic, GAL), or any combination
thereof. Alternatively, when the data processing method shown in FIG. 1 to FIG. 9
is implemented by using software, the data processing apparatus 100 and modules thereof
may be software modules.
[0076] It should be understood that, when the third data is padded between the first data
and the second data to splice the first data and the second data into the fourth data,
the first data and the second data may be horizontally or vertically spliced. For
a manner of splicing between the first data and the second data and a method for determining
a row quantity and column quantity of the third data during splicing, refer to specific
descriptions in the method embodiment corresponding to FIG. 5. Details are not described
herein again.
[0077] In a possible implementation, as shown in FIG. 11, the data processing apparatus
100 further includes a deletion unit 104. That the processing unit 103 completes data
processing on the fourth data by using a convolutional neural network specifically
includes: inputting the fourth data to the first network layer for processing to obtain
sixth data, where the sixth data includes seventh data, eighth data, and interference
data, the seventh data is obtained after the first network layer processes the first
data, the eighth data is obtained after the first network layer processes the second
data, and the interference data is data between the seventh data and the eighth data.
[0078] The deletion unit 104 is configured to determine a column quantity or a row quantity
of the interference data, and delete the interference data. For a method for determining
the column quantity or the row quantity of the interference data by the deletion unit
104, refer to the method for determining the interference column quantity c2 and the
interference row quantity r2 by the intelligent processing apparatus in the foregoing
method embodiment. Details are not described herein again.
[0079] The padding unit 102 is further configured to: after the interference data is deleted,
determine a column quantity or a row quantity of ninth data to be padded between the
seventh data and the eighth data, and pad the ninth data between the seventh data
and the eighth data to obtain tenth data.
[0080] The processing unit 103 is further configured to complete data processing on the
tenth data by using the convolutional neural network.
[0081] Specifically, for a data processing operation implemented by the foregoing data processing
apparatus 100, refer to related operations of the intelligent processing apparatus
in the foregoing method embodiment. Details are not described herein again.
[0082] FIG. 12 is a schematic diagram of a structure of a computing device according to
an embodiment of this application. The computing device 200 includes a processor 210,
a communication interface 220, and a memory 230. The processor 210, the communication
interface 220, and the memory 230 are connected to each other through a bus 240. The
processor 210 is configured to execute instructions stored in the memory 230. The
memory 230 stores program code, and the processor 210 may invoke the program code
stored in the memory 230 to perform the following operations:
[0083] An intelligent processing apparatus obtains first data and second data, pads, according
to a preset rule, third data used to isolate the first data from the second data between
the first data and the second data, to obtain fourth data, and then completes processing
on the fourth data by using a convolutional neural network. The first data and the
second data are data to be spliced together, and a sequence of the first data is prior
to a sequence of the second data during splicing. The first data is any one of image
data, voice data, or a text sequence, and the second data is any one of image data,
voice data, or a text sequence.
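The splicing-with-isolation operation described above can be sketched as follows. This is an illustrative example only, not the claimed implementation; the function name and the use of zero-valued third data are assumptions made for the sketch:

```python
import numpy as np

# Illustrative sketch: splice two same-height inputs ("first data" and
# "second data") with an isolating pad of zero columns between them
# ("third data"), yielding the spliced "fourth data" that a single
# convolution pass can then process.

def splice_with_isolation(first, second, pad_cols):
    """Pad `pad_cols` zero columns between `first` and `second`."""
    assert first.shape[0] == second.shape[0], "row counts h1 and h2 must match"
    third = np.zeros((first.shape[0], pad_cols), dtype=first.dtype)
    return np.concatenate([first, third, second], axis=1)

first = np.ones((4, 5))    # h1=4 rows, w1=5 columns
second = np.ones((4, 7))   # h2=4 rows, w2=7 columns
fourth = splice_with_isolation(first, second, pad_cols=2)
print(fourth.shape)  # (4, 14), i.e. w1 + c1 + w2 = 5 + 2 + 7 columns
```

The layer's own boundary padding p1 is applied separately by the network layer itself, which is why the fourth data in claim 3 has w1+c1+w2+2p1 columns in total.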
[0084] In this embodiment of this application, the processor 210 may have a plurality of
specific implementation forms. For example, the processor 210 may be any one or a
combination of a plurality of processors such as a CPU, a GPU, a TPU, or an NPU, or
the processor 210 may be a single-core processor or a multi-core processor. The processor
210 may include a combination of a CPU (or a GPU, TPU, or NPU) and a hardware chip. The
hardware chip may be an application-specific integrated circuit (application-specific
integrated circuit, ASIC), a programmable logic device (programmable logic device,
PLD), or a combination thereof. The PLD may be a complex programmable logic device
(complex programmable logic device, CPLD), a field-programmable gate array (field-programmable
gate array, FPGA), generic array logic (generic array logic, GAL), or any combination
thereof. Alternatively, the processor 210 may be implemented independently by using
a logic device with embedded processing logic, for example, an FPGA or a digital signal
processor (digital signal processor, DSP).
[0085] The communication interface 220 may be a wired interface or a wireless interface,
and is configured to communicate with another module or device, for example, receive
a video or an image sent by a surveillance device in FIG. 2, or receive a video or
an image sent by user equipment in FIG. 3. The wired interface may be an Ethernet
interface, a controller area network (controller area network, CAN) interface, or
a local interconnect network (local interconnect network, LIN) interface, and the
wireless interface may be a cellular network interface, a wireless local area network
interface, or the like.
[0086] The memory 230 may be a non-volatile memory, for example, a read-only memory (read-only
memory, ROM), a programmable read-only memory (programmable ROM, PROM), an erasable
programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable
read-only memory (electrically EPROM, EEPROM), or a flash memory. The memory 230 may
alternatively be a volatile memory, which may be a random access memory (random access
memory, RAM) that serves as an external cache.
[0087] The memory 230 may further be configured to store instructions and data, so
that the processor 210 invokes the instructions stored in the memory 230 to implement
an operation performed by the processing unit 103 or an operation performed by the
intelligent processing apparatus in the method embodiment. Further, the computing
device 200 may include more or fewer components than those shown in FIG. 12, or may
have different component configurations.
[0088] The bus 240 may be classified into an address bus, a data bus, a control bus, and
the like. For ease of representation, only one thick line is used for representation
in FIG. 12, but this does not mean that there is only one bus or only one type of
bus.
[0089] Optionally, the computing device 200 may further include an input/output interface
250. The input/output interface 250 is connected to an input/output device, and is
configured to receive input information and output an operation result.
[0090] It should be understood that the computing device 200 in this embodiment of this
application may correspond to the data processing apparatus 100 in the foregoing embodiment,
and may perform operations performed by the intelligent processing apparatus in the
foregoing method embodiment. Details are not described herein again.
[0091] An embodiment of this application further provides a non-transient computer storage
medium. The computer storage medium stores instructions, and when the instructions
are run on a processor, the method steps in the foregoing method embodiment may be
implemented. For specific implementation of performing the foregoing method steps
by the processor of the computer storage medium, refer to specific operations of the
foregoing method embodiment. Details are not described herein again.
[0092] In the foregoing embodiments, the description of each embodiment has respective focuses.
For a part that is not described in detail in an embodiment, refer to related descriptions
in other embodiments.
[0093] All or some of the foregoing embodiments may be implemented by using software, hardware,
firmware, or any combination thereof. When software is used to implement embodiments,
the foregoing embodiments may be implemented completely or partially in a form of
a computer program product. The computer program product includes one or more computer
instructions. When the computer instructions are loaded and executed on a
computer, the procedures or functions according to embodiments of this application
are all or partially generated. The computer may be a general-purpose computer, a
dedicated computer, a computer network, or other programmable apparatuses. The computer
instructions may be stored in a computer-readable storage medium or may be transmitted
from a computer-readable storage medium to another computer-readable storage medium.
For example, the computer instructions may be transmitted from a website, computer,
server, or data center to another website, computer, server, or data center in a wired
(for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL))
or wireless (for example, infrared, radio, or microwave) manner. The computer-readable
storage medium may be any usable medium accessible by a computer, or a data storage
device, such as a server or a data center, integrating one or more usable media. The
usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or
magnetic tape), an optical medium, or a semiconductor medium, and the semiconductor
medium may be a solid state disk.
[0094] The foregoing descriptions are merely specific implementations of this application.
Based on the specific implementations provided in this application, a person skilled
in the art can figure out variations or replacements, which shall all fall within
the protection scope of this application.
1. A data processing method, wherein the method comprises:
obtaining first data and second data, wherein the first data and the second data are
adjacent sequence data, and a sequence of the first data is prior to a sequence of
the second data;
padding third data between the first data and the second data according to a preset
rule, to obtain fourth data, wherein the third data is used to isolate the first data
from the second data; and
completing data processing on the fourth data by using a convolutional neural network.
2. The method according to claim 1, wherein the first data is any one of image
data, voice data, or a text sequence, and the second data is any one of image data,
voice data, or a text sequence.
3. The method according to claim 2, wherein the padding third data between the first
data and the second data according to a preset rule, to obtain fourth data comprises:
padding third data of h1 rows and c1 columns between the last column of the first
data and the first column of the second data, to obtain the fourth data, wherein the
first data comprises h1 rows and w1 columns, the second data comprises h2 rows and
w2 columns, the fourth data comprises h1+2p1 rows and w1+c1+w2+2p1 columns, values
of h1 and h2 are the same, p1 is a padding size corresponding to a first network layer,
and the first network layer is a network layer to which the fourth data is to be input.
4. The method according to claim 2, wherein the padding third data between the first
data and the second data according to a preset rule, to obtain fourth data comprises:
padding third data of r1 rows and w1 columns between the last row of the first data
and the first row of the second data, to obtain the fourth data, wherein the first
data comprises h1 rows and w1 columns, the second data comprises h2 rows and w2 columns,
the fourth data comprises h1+r1+h2+2p1 rows and w1+2p1 columns, values of w1 and w2
are the same, p1 is a padding size corresponding to a first network layer, and the
first network layer is a network layer to which the fourth data is to be input.
5. The method according to claim 3, wherein the padding third data of h1 rows and c1
columns between the last column of the first data and the first column of the second
data, to obtain the fourth data comprises:
determining a column quantity c1 of the third data based on the column quantity w1
of the first data and network parameters of the first network layer, wherein the
network parameters of the first network layer comprise a size k1 of a convolutional
kernel or a pooling kernel, the padding size p1, and a stride size s1; and
obtaining the row quantity h1 of the first data, and padding the third data of h1
rows and c1 columns between the last column of the first data and the first column
of the second data, to obtain the fourth data.
6. The method according to claim 5, wherein the determining a column quantity c1 of the
third data based on the column quantity w1 of the first data and network parameters
of the first network layer comprises:
determining, based on the column quantity w1 of the first data, the size k1, the padding
size p1, and the stride size s1, a column quantity wo1 of fifth data output after
the first data is input to the first network layer;
determining, based on the column quantity w1, the padding size p1, the stride size
s1, and the column quantity wo1, a distance Δw between a center of the last operation
on the first data and a center of the first operation on the second data in a horizontal
direction when the convolutional kernel or the pooling kernel processes spliced data
after the spliced data is obtained by padding data of h1 rows and p1 columns between
the last column of the first data and the first column of the second data; and
determining the column quantity c1 based on the padding size p1, the stride size s1,
and the distance Δw.
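The first determining step in claim 6 can be read against the standard convolution/pooling output-size formula. The sketch below is a hedged illustration of that one step only (the function name is an assumption); the subsequent derivation of the distance Δw and the column quantity c1 follows the embodiment and is not reproduced here:

```python
# Standard output-size formula for a convolution or pooling layer:
# wo1 = floor((w1 + 2*p1 - k1) / s1) + 1, one plausible reading of how
# the column quantity wo1 of the fifth data is determined in claim 6.

def output_width(w1, k1, p1, s1):
    """Column quantity of the output when a w1-column input is processed
    by a kernel of size k1 with padding p1 and stride s1."""
    return (w1 + 2 * p1 - k1) // s1 + 1

print(output_width(5, 3, 1, 1))   # 5: "same"-style padding at stride 1
print(output_width(7, 3, 1, 2))   # 4
```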
7. The method according to claim 5 or 6, wherein the completing data processing on the
fourth data by using a convolutional neural network comprises:
inputting the fourth data into the first network layer for processing to obtain sixth
data, wherein the sixth data comprises seventh data, eighth data, and interference
data, the seventh data is obtained after the first network layer processes the first
data, the eighth data is obtained after the first network layer processes the second
data, and the interference data is data between the last column of the seventh data
and the first column of the eighth data;
determining a column quantity c2 of the interference data, and deleting the interference
data of c2 columns;
determining a column quantity c3 of ninth data padded between the last column of the
seventh data and the first column of the eighth data;
padding the ninth data of c3 columns between the last column of the seventh data and
the first column of the eighth data, to obtain tenth data; and
completing data processing on the tenth data by using the convolutional neural network.
8. The method according to claim 7, wherein the determining a column quantity c2 of the
interference data comprises:
determining a column quantity wo2 of the sixth data based on a column quantity w1+c1+w2+2p1
of the fourth data and the network parameters of the first network layer; and
determining the column quantity c2 of the interference data based on the column quantity
wo2 of the sixth data, a column quantity of the seventh data, and a column quantity
of the eighth data.
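The last step of claim 8 can be sketched as a simple residual count, under the assumption that the sixth data's columns consist exactly of the seventh data's columns, the eighth data's columns, and the interference columns between them. The helper name is illustrative only:

```python
# Hedged reading of claim 8: c2 is whatever remains of the sixth
# data's column quantity wo2 once the columns belonging to the seventh
# and eighth data are accounted for.

def interference_columns(wo2, w_seventh, w_eighth):
    c2 = wo2 - w_seventh - w_eighth
    assert c2 >= 0, "output cannot be narrower than its two constituent parts"
    return c2

print(interference_columns(14, 5, 7))  # 2 interference columns to delete
```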
9. The method according to claim 4, wherein the padding third data of r1 rows and w1
columns between the last row of the first data and the first row of the second data,
to obtain the fourth data comprises:
determining a row quantity r1 of the third data based on the row quantity h1 of the
first data and network parameters of the first network layer, wherein the network
parameters of the first network layer comprise a size k1 of a convolutional kernel
or a pooling kernel, the padding size p1, and a stride size s1; and
obtaining a column quantity w1 of the first data, and padding the third data of r1
rows and w1 columns between the first data and the second data, to obtain the fourth
data.
10. The method according to claim 9, wherein the determining a row quantity r1 of the
third data based on the row quantity h1 of the first data and network parameters of
the first network layer comprises:
determining, based on the row quantity h1 of the first data, the size k1, the padding
size p1, and the stride size s1, a row quantity ho1 of fifth data output after the
first data is input to the first network layer;
determining, based on the row quantity h1, the padding size p1, the stride size s1,
and the row quantity ho1, a distance Δh between a center of the last operation on
the first data and a center of the first operation on the second data in a vertical
direction when the convolutional kernel or the pooling kernel processes spliced data
after the spliced data is obtained by padding data of p1 rows and w1 columns between
the last row of the first data and the first row of the second data; and
determining the row quantity r1 based on the padding size p1, the stride size s1,
and the distance Δh.
11. The method according to claim 9 or 10, wherein the completing data processing on the
fourth data by using a convolutional neural network comprises:
inputting the fourth data into the first network layer for processing to obtain sixth
data, wherein the sixth data comprises seventh data, eighth data, and interference
data, the seventh data is obtained after the first network layer processes the first
data, the eighth data is obtained after the first network layer processes the second
data, and the interference data is data between the last row of the seventh data and
the first row of the eighth data;
determining a row quantity r2 of the interference data, and deleting the interference
data of r2 rows;
determining a row quantity r3 of ninth data padded between the last row of the seventh
data and the first row of the eighth data;
padding the ninth data of r3 rows between the last row of the seventh data and the
first row of the eighth data, to obtain tenth data; and
completing data processing on the tenth data by using the convolutional neural network.
12. The method according to claim 11, wherein the determining a row quantity r2 of the
interference data comprises:
determining a row quantity ho2 of the sixth data based on a row quantity h1+r1+h2+2p1
of the fourth data and the network parameters of the first network layer; and
determining the row quantity r2 of the interference data based on the row quantity
ho2 of the sixth data, a row quantity of the seventh data, and a row quantity of the
eighth data.
13. A data processing apparatus, wherein the apparatus comprises:
an obtaining unit, configured to obtain first data and second data, wherein the first
data and the second data are adjacent sequence data, and a sequence of the first data
is prior to a sequence of the second data;
a padding unit, configured to pad third data between the first data and the second
data to obtain fourth data, wherein the third data is used to isolate the first data
from the second data; and
a processing unit, configured to complete data processing on the fourth data by using
a convolutional neural network.
14. The apparatus according to claim 13, wherein the first data is any one of image data,
voice data, or a text sequence, and the second data is any one of image data, voice
data, or a text sequence.
15. The apparatus according to claim 14, wherein the padding unit is specifically configured
to:
pad third data of h1 rows and c1 columns between the last column of the first data
and the first column of the second data, to obtain the fourth data, wherein the first
data comprises h1 rows and w1 columns, the second data comprises h2 rows and w2 columns,
the fourth data comprises h1+2p1 rows and w1+c1+w2+2p1 columns, values of h1 and h2
are the same, p1 is a padding size corresponding to a first network layer, and the
first network layer is a network layer to which the fourth data is to be input.
16. The apparatus according to claim 14, wherein the padding unit is specifically configured
to:
pad third data of r1 rows and w1 columns between the last row of the first data and
the first row of the second data, to obtain the fourth data, wherein the first data
comprises h1 rows and w1 columns, the second data comprises h2 rows and w2 columns,
the fourth data comprises h1+r1+h2+2p1 rows and w1+2p1 columns, values of w1 and w2
are the same, p1 is a padding size corresponding to a first network layer, and the
first network layer is a network layer to which the fourth data is to be input.
17. The apparatus according to claim 15, wherein that the padding unit pads the third
data of h1 rows and c1 columns between the last column of the first data and the first
column of the second data, to obtain the fourth data specifically comprises:
determining a column quantity c1 of the third data based on the column quantity w1
of the first data and network parameters of the first network layer, wherein the
network parameters of the first network layer comprise a size k1 of a convolutional
kernel or a pooling kernel, the padding size p1, and a stride size s1; and
obtaining the row quantity h1 of the first data, and padding the third data of h1
rows and c1 columns between the last column of the first data and the first column
of the second data, to obtain the fourth data.
18. The apparatus according to claim 17, wherein that the padding unit determines a column
quantity c1 of the third data based on the column quantity w1 of the first data and
network parameters of the first network layer specifically comprises:
determining, based on the column quantity w1 of the first data, the size k1, the padding
size p1, and the stride size s1, a column quantity wo1 of fifth data output after
the first data is input to the first network layer;
determining, based on the column quantity w1, the padding size p1, the stride size
s1, and the column quantity wo1, a distance Δw between a center of the last operation
on the first data and a center of the first operation on the second data in a horizontal
direction when the convolutional kernel or the pooling kernel processes spliced data
after the spliced data is obtained by padding data of h1 rows and p1 columns between
the last column of the first data and the first column of the second data; and
determining the column quantity c1 based on the padding size p1, the stride size s1,
and the distance Δw.
19. The apparatus according to claim 17 or 18, wherein
that the processing unit completes data processing on the fourth data by using a convolutional
neural network specifically comprises: inputting the fourth data into the first network
layer for processing to obtain sixth data, wherein the sixth data comprises seventh
data, eighth data, and interference data, the seventh data is obtained after the first
network layer processes the first data, the eighth data is obtained after the first
network layer processes the second data, and the interference data is data between
the last column of the seventh data and the first column of the eighth data;
the apparatus further comprises a deletion unit, configured to determine a column
quantity c2 of the interference data, and delete the interference data of c2 columns;
the padding unit is further configured to: determine a column quantity c3 of ninth
data padded between the last column of the seventh data and the first column of the
eighth data; and
pad the ninth data of c3 columns between the last column of the seventh data and the
first column of the eighth data, to obtain tenth data; and
the processing unit is further configured to complete data processing on the tenth
data by using the convolutional neural network.
20. The apparatus according to claim 19, wherein that the deletion unit determines a column
quantity c2 of the interference data specifically comprises:
determining a column quantity wo2 of the sixth data based on a column quantity w1+c1+w2+2p1
of the fourth data and the network parameters of the first network layer; and
determining the column quantity c2 of the interference data based on the column quantity
wo2 of the sixth data, a column quantity of the seventh data, and a column quantity
of the eighth data.
21. The apparatus according to claim 16, wherein the padding unit is specifically configured
to:
determine a row quantity r1 of the third data based on the row quantity h1 of the
first data and network parameters of the first network layer, wherein the network
parameters of the first network layer comprise a size k1 of a convolutional kernel
or a pooling kernel, the padding size p1, and a stride size s1; and
obtain a column quantity w1 of the first data, and pad the third data of r1 rows
and w1 columns between the first data and the second data, to obtain the fourth data.
22. The apparatus according to claim 21, wherein that the padding unit determines a row
quantity r1 of the third data based on the row quantity h1 of the first data and network
parameters of the first network layer specifically comprises:
determining, based on the row quantity h1 of the first data, the size k1, the padding
size p1, and the stride size s1, a row quantity ho1 of fifth data output after the
first data is input to the first network layer;
determining, based on the row quantity h1, the padding size p1, the stride size s1,
and the row quantity ho1, a distance Δh between a center of the last operation on
the first data and a center of the first operation on the second data in a vertical
direction when the convolutional kernel or the pooling kernel processes spliced data
after the spliced data is obtained by padding data of p1 rows and w1 columns between
the last row of the first data and the first row of the second data; and
determining the row quantity r1 based on the padding size p1, the stride size s1,
and the distance Δh.
23. The apparatus according to claim 21 or 22, wherein
that the processing unit completes data processing on the fourth data by using a convolutional
neural network specifically comprises: inputting the fourth data into the first network
layer for processing to obtain sixth data, wherein the sixth data comprises seventh
data, eighth data, and interference data, the seventh data is obtained after the first
network layer processes the first data, the eighth data is obtained after the first
network layer processes the second data, and the interference data is data between
the last row of the seventh data and the first row of the eighth data;
the apparatus further comprises a deletion unit, configured to determine a row quantity
r2 of the interference data, and delete the interference data of r2 rows;
the padding unit is further configured to: determine a row quantity r3 of ninth data
padded between the last row of the seventh data and the first row of the eighth data;
and
pad the ninth data of r3 rows between the last row of the seventh data and the first
row of the eighth data, to obtain tenth data; and
the processing unit is further configured to complete data processing on the tenth
data by using the convolutional neural network.
24. The apparatus according to claim 23, wherein that the deletion unit determines a row
quantity r2 of the interference data specifically comprises:
determining a row quantity ho2 of the sixth data based on a row quantity h1+r1+h2+2p1
of the fourth data and the network parameters of the first network layer; and
determining the row quantity r2 of the interference data based on the row quantity
ho2 of the sixth data, a row quantity of the seventh data, and a row quantity of the
eighth data.
25. A computing device, wherein the computing device comprises a processor and a memory,
the memory is configured to store instructions, the processor is configured to execute
the instructions, and when executing the instructions, the processor performs the
method according to any one of claims 1 to 12.
26. A computer-readable storage medium, wherein the computer-readable storage medium stores
a computer program, and the computer program is executed by a processor to implement
the method according to any one of claims 1 to 12.