Field of the invention
[0001] The present invention has its application within telecommunication networks and, more specifically, relates to Generative Adversarial Networks (GANs).
[0002] More particularly, the present invention refers to a method for modelling and shaping data produced by a GAN to fit a required distribution.
Background of the invention
[0003] Generative Adversarial -Neural- Networks (GANs) refer to a neural network architecture capable of producing high-resolution synthetic data with statistical properties similar to those of the real data. A GAN is a Deep Learning model able to learn data distributions through twofold training using an adversarial learning method. This model can generate very sharp data, even for data such as images with complex, highly multimodal distributions. In this context, GANs have gained special attention owing to their high image reproduction ability for generating samples of natural images.
[0004] The adversarial learning method uses two neural networks: a generative one and a discriminative one. Roughly speaking, the generative network (G) implicitly learns the data distribution and acts as a sampler to generate new synthetic instances mimicking the data distribution. In particular, G needs to be flexible enough to approximately transform white noise into the real data distribution. On the other hand, the discriminative network (D) needs to be powerful enough to learn to distinguish the generated distribution from the real data distribution.
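For reference, the interplay between the two networks is usually written in the GAN literature as the following minimax objective. This formula is not part of the present text. The symbols p_{data} (distribution of the real data) and p_{z} (distribution of the white noise) are notation introduced here for illustration only.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

In this formulation D is rewarded for scoring real samples close to 1 and generated samples close to 0, while G is rewarded when D scores its samples as real.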
[0005] In many standard use cases, like those arising in image processing, few problems arise when using GANs. However, this is not always the case in other types of scenarios. The GAN model has limitations when aiming to generate sequences of discrete elements or, in general, to match a specific data distribution. In the case of discrete features, the problem lies in the fact that the associated mass/density function is not differentiable, so it is not suitable for optimizing the weights of the generator network through backpropagation. In the case of continuous distributions, specific activation functions must be handcrafted for each possible real distribution.
[0006] In other words, GANs are trained by propagating gradients back from the discriminator D through the generated samples to the generator G. But, when data include discrete features or data themselves are a sequence of discrete items, the backpropagated gradients vanish, so a gradient descent update via backpropagation cannot be applied directly to a discrete output. Analogously, in the case of continuous data, some extra restrictions on the distribution of the real data may arise (e.g. non-negative values) and the chosen activation function fully determines a concrete output data distribution. For this reason, a generic method for fitting the output distribution to the real data to be mimicked is required. It is worth mentioning that this problem is not observed for GANs applied to image processing, since the pixel intensity distribution tends to be normal-like, so pixels are correctly generated with a simple linear activation, and the human eye is not able to perceive the minor differences between real and fake pixels.
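The vanishing-gradient remark can be illustrated with a minimal sketch. It assumes that the discrete output is obtained by rounding a real value and it estimates the derivative by a central finite difference. The helper names `discretize` and `numerical_grad` are hypothetical and serve only this illustration.

```python
def discretize(x):
    """Hypothetical discrete output: round a real value to the nearest integer."""
    return float(round(x))


def numerical_grad(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# A continuous (linear) output passes a gradient back to the generator ...
grad_linear = numerical_grad(lambda x: x, 2.3)

# ... whereas a piecewise-constant discrete output has zero slope
# almost everywhere, so no learning signal reaches the generator.
grad_discrete = numerical_grad(discretize, 2.3)
```

Away from the jump points the discretized output is flat, which is why gradient descent receives no information about how to change the generator weights.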
[0007] Specifically, traditional GANs produce data at their output that tend to follow a normal distribution. If a given scenario does not fulfil this type of restriction, unsuitable values will be obtained, like negative values for counters and accumulators (e.g. duration of a TCP flow, round-trip time or RTT, number of packets or bytes). In addition, if only finitely many real values (e.g. a discrete set of values) can be attained, the usual GAN architecture will only be able to replicate them as a continuum of values. In other words, the shape of the generated data may not coincide with that of the original data when these data do not follow a normal distribution. In particular, if an inappropriate activation function is applied, then the obtained output data may not fulfil the domain restrictions of the real data (e.g. negative values are obtained when only positive ones are possible, as in the duration of a TCP flow, RTT, number of packets or bytes). Furthermore, if real values belong to a discrete set (e.g. size of a TCP window, Time To Live or TTL), a traditional GAN is going to generate real values following a normal distribution (i.e. a non-discrete set of values).
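The domain violation described above can be quantified with a small sketch. It assumes a hypothetical non-negative feature that is exponentially distributed, and it uses a normal distribution with matching mean and standard deviation as a stand-in for the normal-like output of a generator. Both choices are illustrative assumptions, not part of the described method.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
# Hypothetical non-negative feature (e.g. flow durations), exponentially distributed.
durations = [random.expovariate(1.0) for _ in range(10_000)]

# Assumption: a normal distribution with matching mean and standard deviation
# stands in for the normal-like output of a generator with a linear activation.
normal_like = NormalDist(mean(durations), stdev(durations))

# Probability mass that the normal-like output places on invalid negative values.
negative_mass = normal_like.cdf(0.0)
```

For an exponential distribution with unit rate, whose mean and standard deviation both equal 1, the corresponding theoretical value is the standard normal CDF at -1, roughly 0.16, so a noticeable share of normal-like outputs would be negative although no real observation is.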
[0008] There are some approaches disclosed in articles found in the literature which address the aforementioned problem by training generators with discrete outputs. All the existing solutions in prior art deal with this problem by defining several ad hoc models to directly estimate or fit the data distribution by means of a gradient policy, namely:
- "GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution" by M. Kusner, J. Hernandez-Lobato (arXiv, 2016): This article proposes using a Gumbel-softmax output distribution in generation tasks in order to generate sequences of discrete elements. The Gumbel-softmax distribution is a continuous approximation to a multinomial distribution parameterized in terms of the softmax function, which makes the resulting sampling process differentiable. These results, which are proposed from a theoretical perspective and as a proof of concept, focus only on generating specific discrete sequence data.
- "Categorical Reparameterization with Gumbel-Softmax" by Jang, E. et al. (arXiv:1611.01144, 2016) and "Boundary-Seeking Generative Adversarial Networks" by Devon, A. et al. (arXiv:1702.0843, 2018): These articles provide a policy gradient for training the generator based on a gradient estimator using Gumbel-Softmax, a continuous distribution on the simplex that can approximate categorical samples, and whose parameter gradients can be easily computed via the reparameterization trick. This implies the modification of the implicit estimation model.
- "Binary generative adversarial networks for image retrieval" by J. Song, T. He, et al. (AAAI Conference on Artificial Intelligence, pp. 394-401, 2018): This paper utilizes the REINFORCE algorithm as a policy gradient algorithm. The generator, which generates fake data maximizing the discriminator's output, can be thought of as a policy agent in reinforcement learning, that is, a probability distribution over the actions to take that maximizes a reward in a given state.
- "Importance Weighted Generative Networks" by Elenberg, E. et al. (Joint European Conference on Machine Learning and Knowledge Discovery in Databases ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases. pp 249-265, 2020): This paper modifies the GAN structure, specifically the standard estimator, by rescaling the observed data distribution during training, or equivalently by reweighting the contribution of each data point to the loss function.
- "Statistical Guarantees of Generative Adversarial Networks for Distribution Estimation" by Chen M. et al. (arXiv:2002.03938, 2020): This paper introduces a theoretical study that provides statistical guarantees of GANs for the estimation of data distributions which have densities in a Hölder space. In particular, the generator and discriminator networks utilize general weight matrices and the non-invertible Rectified Linear Unit (ReLU) activation function.
[0009] Recently, some works addressing GAN models for generating structured data on continuous distributions have been proposed (e.g. "GAN-based semi-supervised for imbalanced data classification" by Zhou, T. et al., 2018 4th International Conference on Information Management (ICIM), Oxford, pp. 17-21, 2018; "Deep-Learning-Based Defective Bean Inspection with GAN-Structured Automated Labeled Data Augmentation in Coffee Industry" by Chou, Y.-C.; Appl. Sci. 2019, Vol.9, Issue No.19, Article No.4166, 2019), but in general these solutions only propose ad-hoc mechanisms for replicating a concrete data distribution and do not address the problem in a general way for any continuous or discrete data distribution.
[0010] In summary, none of the existing solutions solves the aforementioned problem, for the following reasons:
- They require a very ad hoc and handcrafted approach to produce the appropriate activation function that mimics the real data distribution, whether it is discrete or continuous.
- They require modification of the GAN internals.
- They are not able to preserve the restrictions of the original data domain.
[0011] Therefore, there is a need in the state of the art for providing a GAN with an activation function which is capable of dealing with real data that follow any arbitrary distribution.
Summary of the invention
[0012] The present invention solves the aforementioned problems and overcomes previously explained state-of-art work limitations by providing a method to be attached, at the last stage, to a neural network architecture of GANs to fit the underlying distribution of the real data to be generated. This whole process is obtained by means of post-composing the standard generator output with a statistical transformation called the Inverse Smirnov transform.
[0013] The GAN (Generative Adversarial Network) comprises i) a generator agent or network configured to generate synthetic data and ii) a discriminator agent or network configured to distinguish between the generated synthetic data and real original data, the original data following any arbitrary distribution, which can be defined by an n-dimensional vector of input variables (x_{i}).
[0014] An aspect of the present invention refers to a method for modelling data produced by a GAN which, before the generator agent generates its output synthetic data, computes an Inverse Smirnov transformation for each of the n input variables x_{i}, and wherein an activation function, which is an n-dimensional vector formed by the n computed Inverse Smirnov transformations, is attached to the generator agent to generate the output synthetic data. By using the activation function based on Inverse Smirnov transformations, the distribution of the generated synthetic data output by the GAN has the same shape (continuous or discrete) as the arbitrary distribution of the original data.
[0015] The method in accordance with the above described aspects of the invention has a number of advantages with respect to the aforementioned prior art, which can be summarized as follows:
- The present invention improves GANs in at least two respects: i) if real data follow a certain distribution (e.g. discrete, or continuous but not normal), the proposed GAN architecture is going to generate synthetic values following the same distribution; ii) if the invention is applied to an existing GAN, the Inverse Smirnov transformation at the end of the generator allows the preservation of the domain restrictions of the data (e.g. generating non-negative values for counters or accumulators).
- The present invention can work as a fast add-on in a GAN. Once precomputed, the Inverse Smirnov transform can be interpolated using standard interpolation techniques to operate efficiently in real-time.
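The precompute-then-interpolate idea can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: the data quantile function is taken to be that of a unit-rate exponential distribution, the grid covers the range -4 to 4 with 801 points, and the helper names `exp_quantile`, `precompute_table` and `interpolate` are hypothetical.

```python
import bisect
import math
from statistics import NormalDist


def exp_quantile(u):
    """Assumed data quantile function: unit-rate exponential distribution."""
    return -math.log(1.0 - u)


def precompute_table(quantile_fn, z_min=-4.0, z_max=4.0, points=801):
    """Tabulate z -> quantile_fn(F_Norm(z)) once, on a regular grid."""
    f_norm = NormalDist().cdf  # CDF of the standard normal distribution
    step = (z_max - z_min) / (points - 1)
    grid = [z_min + i * step for i in range(points)]
    values = [quantile_fn(f_norm(z)) for z in grid]
    return grid, values


def interpolate(grid, values, z):
    """Piecewise-linear lookup in the precomputed table, clamped at the ends."""
    if z <= grid[0]:
        return values[0]
    if z >= grid[-1]:
        return values[-1]
    j = bisect.bisect_right(grid, z)  # grid[j - 1] <= z < grid[j]
    w = (z - grid[j - 1]) / (grid[j] - grid[j - 1])
    return values[j - 1] + w * (values[j] - values[j - 1])


grid, values = precompute_table(exp_quantile)
fast_value = interpolate(grid, values, 0.503)
exact_value = exp_quantile(NormalDist().cdf(0.503))
```

Linear interpolation suits continuous features. For a discrete feature it would produce values between the admissible ones, so a nearest-lower-entry lookup would be the more natural choice in that case.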
[0016] These and other advantages will be apparent in the light of the detailed description of the invention.
Description of the drawings
[0017] For the purpose of aiding the understanding of the characteristics of the invention, according to a preferred practical embodiment thereof and in order to complement this description, the following Figures are attached as an integral part thereof, having an illustrative and non-limiting character:
Figure 1 shows a block diagram of a GAN architecture as known in the state of the art previous to the invention.
Figure 2 shows a block diagram of a GAN architecture, according to a preferred embodiment of the invention.
Figure 3 shows a schematic diagram of a generative neural network implementing an activation function via Smirnov transform, according to a possible embodiment of the invention.
Figure 4 shows a summary of the operation of the proposed method for modelling a shape of data using a distribution generator.
Figure 5 shows the distributions of the original data, the synthetic data generated with a standard activation function and the synthetic data generated with the activation function of the invention, in case that the original data follows a continuous distribution.
Figure 6 shows the distributions of the original data, the synthetic data generated with a standard activation function and the synthetic data generated with the activation function of the invention, in case that the original data follows a discrete distribution.
Preferred embodiment of the invention
[0018] The embodiments of the invention can be implemented in a variety of architectural platforms, operating and server systems, devices, systems, or applications. Any particular architectural layout or implementation presented herein is provided for purposes of illustration and comprehension only and is not intended to limit aspects of the invention.
[0019] The current state-of-art of GAN technologies can be summarized in Figure 1. An existing given GAN (10) comprises two main components (G, D):
- A generative network or generator agent (G) which generates synthetic data (120) by transforming white noise (130) into a real data distribution through a standard activation function (100), such as a linear function, a rectified linear unit (ReLU), tanh or sigmoid. Because the activation function of the existing given GAN (10) of Figure 1 is a standard activation function (100), the output of the generator agent (G) tends to follow a normal distribution. It is worth noting that the original data (110) to be mimicked may not follow this normal shape; e.g. in the example illustrated in Figure 1, the original data (110) distribution is Poisson-like. Therefore, the shape of the generated data (120) produced by the conventional existing GAN (10) does not coincide with that of the original data (110), as shown in Figure 1. The role of the generator agent (G) is to improve the quality of the generated data (120) to replicate the original distribution of the original data (110).
- A discriminative network or discriminator agent (D) whose role is to distinguish between original data (110) and generated data (120). The output (140) delivered by the discriminator agent (D) may be a value selected from 0 and 1 which indicates whether the produced data are real or fake.
[0020] Figure 2 shows an embodiment of the invention, providing a GAN (20) with an activation function (200) which is not a standard activation function (100) but an Inverse Smirnov transform. The GAN (20) can deal with real original data (110) following any arbitrary distribution, continuous or discrete. Its generative agent (G) and discriminative agent (D) may be implemented by neural networks, for instance, but, in general, the main components (G, D) can be implemented in a general scheme.
[0021] Regardless of the architecture adopted for the generating neural network or GAN (20), a preliminary step for modelling the synthetic data, Step 0, is to compute the Inverse Smirnov transform to be used as the activation function (200) for the output layer (L_{o}) of the neural network (G_{N}) implementing the generative network (G). This Inverse Smirnov transformation is computed for each of the features of the input dataset or input variables x_{i}, shown as the column (310) input to each neuron (N_{j}), whose internal details are shown in Figure 3.
[0022] As explained before, in the state of the art of GANs, the activation functions of the neurons are selected from sigmoidal, linear or rectified linear functions. By contrast, the neural network (G_{N}) illustrated in Figure 3, according to a possible embodiment of the invention, uses the Inverse Smirnov transforms of each feature to be fitted as activation functions. The parameters f_{1}, f_{2}, f_{3}, f_{4} stand for the activation functions of the neurons of the output layer (L_{o}) of the neural network (G_{N}). Figure 3 shows at the output layer (L_{o}) a plot indicating f_{i} = f_{Si}^{-1}, i = 1, 2, ..., as the proposed solution is to take the Inverse Smirnov transforms as activations. Also, regarding the example illustrated in Figure 3, another column (320) of parameters w_{1}, ..., w_{n}, which are the so-called weights of the neural network (G_{N}), can be applied in each neuron (N_{j}). Neural networks are deep learning models designed to fit any desired output. For this purpose, the model contains a collection of parameters, in this case the weights w_{1}, ..., w_{n}, which are tuned during a process usually referred to as the training of the neural network.
[0023] For simplicity of calculation of the Inverse Smirnov transformation for an input variable x_{i} at Step 0, the output of the generative network (G) is assumed to be one-dimensional. In general, the G output (L_{o}) is m-dimensional and, therefore, the functions forming the activation function (200) are m-dimensional vectors of functions.
Therefore, for each of the features (x_{i}) of the input dataset, Step 0 computes the Inverse Smirnov transform f_{Si}^{-1} to be applied as activation function (200) in later steps performed by the generative network (G).
Step 0:
for x_{i}, i = 1...n
obtain f_{Si}^{-1}
Here f_{Si} is the Smirnov transform and f_{Si}^{-1} is its inverse function, computed as follows. Explicitly, let F_{Norm} be the cumulative distribution function of a standard normal distribution, and let F_{Data} be the cumulative distribution function of the original sample (x), which may be estimated through a histogram of the input data. The Inverse Smirnov transform, f_{Si}^{-1}, is given by:
f_{Si}^{-1} = F_{Data}^{-1} \circ F_{Norm}
With this computation of f_{Si}^{-1}, the activation function (200) post-composes the distribution generation of a generative agent (G), e.g. of a generating neural network, with the statistical transformation of the Inverse Smirnov transform.
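Step 0 for a single feature can be sketched as follows, using only the Python standard library. The sketch rests on several assumptions: F_{Data} is estimated by the empirical cumulative distribution function of the sample and inverted through its order statistics (a histogram-based estimate, as mentioned above, would serve equally), the feature is a hypothetical exponentially distributed duration, and standard normal draws stand in for the generator output before the activation. The function name `inverse_smirnov` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def inverse_smirnov(sample):
    """Step 0 for one feature: build z -> F_Data^{-1}(F_Norm(z)).

    Assumption: F_Data is the empirical CDF of `sample`, inverted through
    its order statistics.
    """
    ordered = sorted(sample)
    n = len(ordered)
    f_norm = NormalDist().cdf  # CDF of the standard normal distribution

    def transform(z):
        u = f_norm(z)  # probability level in [0, 1]
        # Generalized inverse of the empirical CDF: the smallest observed
        # value x with F_Data(x) >= u, clamped to the sample range.
        k = min(n - 1, max(0, math.ceil(u * n) - 1))
        return ordered[k]

    return transform


random.seed(0)
# Hypothetical non-negative feature, e.g. flow durations.
durations = [random.expovariate(1.0) for _ in range(5000)]
f_inv = inverse_smirnov(durations)

# Stand-in for the generator's normal-like output before the activation.
z_values = [random.gauss(0.0, 1.0) for _ in range(5000)]
synthetic = [f_inv(z) for z in z_values]
```

Because the transform only ever returns values observed in the original sample, the synthetic values stay inside the domain of the real data, for instance non-negative for a duration.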
[0024] At this point, these Inverse Smirnov transformations are gathered into a vector-valued function (f_{S1}^{-1}, ..., f_{Sn}^{-1}). This function is a new activation function (200) to be attached at the end of the generator agent (G), as shown in Figure 2, so that the shape of the generated data (220) can agree with that of the original data (110).
[0025] Observe that no new training method is needed, and the system can be trained as usual, e.g. by backpropagation in the case that the GAN is implemented through neural networks, but with the new activation function (200) attached at the end of the generator (G).
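Gathering the per-feature transforms into one activation can be sketched as below. The example assumes two hypothetical features, a continuous non-negative round-trip time and a discrete TTL, and uses independent standard normal draws as a stand-in for the last-layer output of the generator. The names `inverse_smirnov` and `make_activation` are illustrative, and the empirical CDF is again an assumed estimate of F_{Data}.

```python
import math
import random
from statistics import NormalDist


def inverse_smirnov(sample):
    """One feature: z -> F_Data^{-1}(F_Norm(z)) using the empirical CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    f_norm = NormalDist().cdf
    return lambda z: ordered[min(n - 1, max(0, math.ceil(f_norm(z) * n) - 1))]


def make_activation(columns):
    """Gather one transform per feature into a vector-valued activation."""
    transforms = [inverse_smirnov(col) for col in columns]
    return lambda z_vec: [f(z) for f, z in zip(transforms, z_vec)]


random.seed(0)
# Hypothetical original data with two features (names are illustrative).
rtt = [random.expovariate(20.0) for _ in range(2000)]  # continuous, >= 0
ttl = random.choices([32, 64, 128, 255], weights=[1, 5, 3, 1], k=2000)  # discrete

activation = make_activation([rtt, ttl])

# Stand-in for the generator's last-layer output (one row per sample).
pre_activations = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(2000)]
generated = [activation(row) for row in pre_activations]
```

Each output component is transformed independently by the transform of its own feature, so the first component remains non-negative and the second one only takes the TTL values present in the original data.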
[0026] In the case that neural networks are used for implementing GANs, the detailed attachment of the activation function (200), being an n-dimensional vector of Inverse Smirnov transformations, to the GAN architecture is shown in Figure 3. At each of the output neurons (N_{j}) at the last layer or output layer (L_{o}) of the generator network (G), the corresponding n activation functions are substituted by the Inverse Smirnov transforms obtained at Step 0, f_{S1}^{-1}, ..., f_{Sn}^{-1}.
Figure 4 shows a summary of the operation of the proposed method for modelling a shape of data using a distribution generator (400). In this example, the distribution generator (400) is the generative agent (G) of a standard GAN (10) existing in the prior art, such as the GAN (10) shown in Figure 1, but the method can be applied to any generative technique.
[0027] To emphasize the advantages of the invention, Figures 5 and 6 present graphically the operation mode for some restrictions of the domain of the original data (110). For instance:
- In Figure 5, the original data (110) follows a continuous distribution (410). If a feature of the original data (110) is non-negative, e.g. a counter or a rate, the generated distribution (120) for this feature using standard GANs may attain negative values, but the output synthetic data (220) generated or transformed using the proposed activation function (200) presents a continuous distribution (420) that respects this restriction, leading to non-negative values.
- In Figure 6, the original data (110) follows a discrete distribution (410'). The generated distribution (120) with standard GANs is continuous, as in Figure 5. However, the output synthetic data (220) generated using the proposed activation function (200) leads to an output with a discrete distribution (420') and respects the discretely distributed features, e.g. counters or categories, of the original data (110). It is worth mentioning that this problem is currently an open research topic.
[0028] Therefore, the generated synthetic data (220) output by the GAN using the activation function (200) has a distribution (420, 420') whose shape is the same as the arbitrary distribution of the original data (110).
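The reason why the shapes agree can be sketched with a short derivation. It assumes, as an idealization, that the generator output Z before the activation is exactly standard normal, and it reads F_{Data}^{-1} as the generalized inverse (quantile function) of F_{Data}, which also covers the discrete case. The symbols Z and U are introduced here for illustration only.

```latex
\begin{aligned}
U &= F_{Norm}(Z) \sim \mathrm{Uniform}(0, 1)
  && \text{since } Z \sim \mathcal{N}(0, 1), \\
F_{Data}^{-1}(u) &= \inf\{\, x : F_{Data}(x) \ge u \,\}
  && \text{so that } F_{Data}^{-1}(u) \le x \iff u \le F_{Data}(x), \\
P\bigl(f_{Si}^{-1}(Z) \le x\bigr)
  &= P\bigl(F_{Data}^{-1}(U) \le x\bigr)
   = P\bigl(U \le F_{Data}(x)\bigr)
   = F_{Data}(x).
\end{aligned}
```

Under these assumptions the transformed output has cumulative distribution function F_{Data}, whether the original distribution is continuous or discrete. In practice the agreement is approximate, to the extent that the generator output is only approximately normal and F_{Data} is estimated from a finite sample.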
[0029] Note that in this text, the term "comprises" and its derivations (such as "comprising", etc.) should not be understood in an excluding sense, that is, these terms should not be interpreted as excluding the possibility that what is described and defined may include further elements, steps, etc.
Claims
1. A method for modelling a shape of data in a Generative Adversarial Network, GAN, comprising a generator agent (G) configured to generate synthetic data (120, 220) and a discriminator agent (D) configured to distinguish between the synthetic data (120, 220) generated by the generator agent (G) and real original data (110) following an arbitrary, continuous or discrete, distribution which is defined by an n-dimensional vector of input variables x_{i};
characterized in that the method, before generating output synthetic data (220), computes an Inverse Smirnov transformation f_{Si}^{-1} for each of the n input variables x_{i} and an activation function (200), wherein the activation function (200) is an n-dimensional vector (f_{S1}^{-1}, ..., f_{Sn}^{-1}) formed by the computed Inverse Smirnov transformations,
and generates the output synthetic data (220) using the activation function (200) attached to the generator agent (G).
2. The method according to claim 1, wherein the activation function (200) substitutes a standard activation function (100) of an existing given GAN (10).
3. The method according to any preceding claim, wherein the generator agent (G) and the discriminator agent (D) are implemented by neural networks.
4. The method according to any preceding claim, wherein the Inverse Smirnov transformation f_{Si}^{-1} is computed as
f_{Si}^{-1} = F_{Data}^{-1} \circ F_{Norm}
wherein F_{Norm} is the cumulative distribution function of a standard normal distribution and F_{Data} is the cumulative distribution function of an original sample x from the n-dimensional vector of input variables x_{i}.
5. The method according to claim 4, wherein the activation function (200) is performed by each neuron of an output layer (L_{o}) of the generator network (G).