Field of the Invention
[0001] The present invention relates to methods and systems for determining body measurements
of a person, as well as to methods and systems for providing clothing size recommendations.
Prior Art
[0002] Shopping online for clothing enjoys ever increasing popularity. One downside of the
online shopping experience is that, although the design of items of clothing can be
evaluated to some degree, the item cannot be tried on as it would be possible in an
offline store. This leads to several undesired effects; for example it is not uncommon
for shoppers that are interested in an item of a certain design to order one and the
same item in several different sizings. The intention is to find the one sizing that
optimally fits and/or suits the shopper, and to send back the other sizings.
[0003] Herein, a sizing should be understood to mean one particular version of the same
item of clothing that is different from other versions, i.e. other sizings, only in
at least some of its measurements. For example, one and the same sweater may be available
in sizings XS (extra small), S (small), M (medium), L (large), XL (extra large) and
so on.
[0004] An even more common occurrence is that a shopper orders only one sizing of an item
but finds out that the ordered sizing does not fit after all, either because the shopper
misjudged which sizing would be appropriate, or because for said specific item of
clothing a particular sizing is smaller than other items of clothing of the same size.
For example, an item of clothing that is considered to be "M" in Japan may be considered
to be "S" in Germany or the like.
[0005] As a result, there is a lot of mail traffic because of items of clothing being sent
back to the vendor by shoppers, often encouraged by national laws that force vendors
to pay the transport fees. This mail traffic taxes not only environmental and public
resources, but is also a substantial cost factor for vendors. Moreover, the shoppers'
patience is also tried, and shoppers may refrain from ordering clothing online at all because
of the hassle with trying different sizings and having to send many or even all of
them back to the vendor.
[0006] One way to meliorate these problems would be to compare detailed body measurements
of a particular shopper - at least one measurement but the more measurements, the
better - to properties of each sizing of each item of clothing, for example as a professional
tailor would. However, very few people have their body measurements taken by professional
tailors, mostly due to the inconvenience and cost of it. Many people also take their
own measurements wrongly or simply attempt to guess the right sizing because it is
too much effort to measure oneself, ending up with inaccurate guesses.
[0007] Accordingly, a method is desired for conveniently yet accurately determining at least
one body measurement of a person.
[0008] However, even when body measurements are known, it is still not straightforward to
determine the correct sizing of a particular item of clothing for a particular person
since it will be a rare occurrence that the measurements of the person fit exactly
the measurements of a particular sizing.
[0009] In addition, comparing their measurements with numerous sizing charts is too
time-consuming for shoppers. Moreover, different brands often have different sizing
standards, and sometimes even items of clothing of nominally the same sizing and the
same brand but of different designs may differ from each other. Because sizing charts
are used so rarely, web shops often publish a single sizing chart averaged across
products to avoid the hassle of uploading (let alone generating) one sizing chart for
every product in stock. The more averaged the sizing chart is, however, the less useful
it will be when used to look up the right size of a product.
[0010] Accordingly, a method is desired for conveniently yet accurately providing a clothing
size recommendation for a person based on at least one body measurement of the person.
The clothing sizing recommendation may be a general size recommendation such as "Sizing
M fits this person", a sizing recommendation for a specific type of clothing such
as "For trousers, sizing M fits this person", an item-based sizing recommendation
such as "For this particular item of clothing, sizing M fits this person", an overview
recommendation such as "For this person, the following sizings of the following items
of clothing fit: ..." and/or the like.
[0011] Prior art document
US 2010/111370 A1 describes a system and a method of estimating the body shape of an individual from
input data such as images or range maps.
[0012] US 2015/154453 A1 describes a method for deriving body size measures of a user from a sequence of 2D
images.
[0013] US 5 515 168 A describes a device for the identification and classification of subjects having variable
non-standardized shapes, comprising a device for taking views, a module for the acquisition
and processing of the views and for the extraction of at least one given characteristic
dimensional parameter, and a module for identification and classification of the studied
subjects from the extracted characteristic parameters.
[0014] CN 108 986 159 A describes a method and a device for reconstruction and measurement of a three-dimensional
human body model.
[0015] However, there is still a need for providing a more efficient and more accurate method
for determining at least one body measurement of a person.
Summary of the Invention
[0016] The above objects, as well as further objects that may become explicitly or implicitly
apparent in the following, are solved by the independent claims of the present invention.
[0017] Thus, according to a first aspect of the present invention, a computer-implemented
method for determining at least one body measurement of a person is provided, comprising
at least the steps of:
- obtaining a plurality of images of the person, wherein the images in the plurality
of images show the person from at least two different angles;
- selecting at least a sub-set of the obtained plurality of images to provide first
input images;
- generating segmentation maps for the first input images, wherein the segmentation
maps at least distinguish the person from a background;
- determining at least one body measurement of the person based at least on the generated
segmentation maps.
[0018] Nowadays, with the widespread availability of cameras of high quality, for example
in webcams, laptops, computer monitors, mobile phones, tablet computers and the like,
it is very easy and convenient to obtain images of people, and doing so indeed comes naturally
to many people who are fond of appreciating, buying and experimenting with fashion.
[0019] In some variants or embodiments, simply all of the obtained images may be selected
as the first input images. In other variants, at least one criterion may be applied
to the obtained images to select therefrom the first input images. One criterion may
be that only those of the obtained images are selected as the first input images
that fulfill a pre-defined image quality criterion, for example that have a minimum
image resolution, a minimum size, a minimum contrast (due to lighting etc.) and/or
the like. Additionally, or alternatively, the method may be configured such that every
x-th image of the obtained images is selected, preferably every x-th image out of
a pre-selection of the obtained images based on the at least one criterion, wherein
x is a suitably chosen integer.
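By way of a non-limiting illustration, the selection of the first input images could be sketched as follows. The `Frame` descriptor, its field names and all threshold values are assumptions made for this example only; the quality criteria are applied first, and every x-th image is then taken from the resulting pre-selection.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical frame descriptor; the field names are illustrative only."""
    index: int
    width: int
    height: int
    contrast: float  # assumed to be normalised to the range 0..1


def select_first_input_images(frames, min_width=480, min_height=640,
                              min_contrast=0.2, x=1):
    """Keep frames meeting the quality criteria, then every x-th of those."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    # Pre-selection based on the (assumed) image quality criteria.
    pre_selection = [
        f for f in frames
        if f.width >= min_width
        and f.height >= min_height
        and f.contrast >= min_contrast
    ]
    # Every x-th image out of the pre-selection.
    return pre_selection[::x]
```

With x = 1 and thresholds of zero, all obtained images are selected, which corresponds to the simplest variant mentioned above.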
[0020] The segmentation maps comprise at least one segmentation mask, wherein each segmentation
mask contains, or indicates, pixels that are associated with a specific object category.
In simple cases, a segmentation map comprises a single segmentation mask which differentiates
the person in the image from the background. In other variants, the segmentation maps
may comprise several segmentation masks, wherein different segmentation masks identify
different body parts. For example, a segmentation map may comprise six segmentation
masks: one for each of head, torso, and each extremity.
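The relationship between a segmentation map and its segmentation masks may be illustrated by the following minimal sketch. It assumes, purely for illustration, that a per-pixel label image is already available (for example as the output of a segmentation network) and that the labels are the hypothetical category names listed in `BODY_PARTS`.

```python
# Six illustrative categories: head, torso and each extremity.
BODY_PARTS = ("head", "torso", "left_arm", "right_arm", "left_leg", "right_leg")


def masks_from_label_image(labels, categories=BODY_PARTS):
    """Split a per-pixel label image into one boolean mask per category."""
    return {
        name: [[pixel == name for pixel in row] for row in labels]
        for name in categories
    }


def person_mask(masks):
    """Union of all body-part masks, i.e. person versus background."""
    part_masks = list(masks.values())
    rows, cols = len(part_masks[0]), len(part_masks[0][0])
    return [
        [any(mask[r][c] for mask in part_masks) for c in range(cols)]
        for r in range(rows)
    ]
```

The union computed by `person_mask` corresponds to the simple case of a single mask that differentiates the person from the background.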
[0021] In the present context, "at least one body measurement" shall be understood to include
a particular measurement type of a body of a person in a defined way in a defined
location. Typical body measurement types include, in particular for women, shoulder,
arm hole, upper arm, bust, upper belly, belly, waist, hips, thighs, inseam, bottom
length, and/or knees, and may also include, in particular for men, collar, top of
shoulder, chest, rise, inside leg and/or outside leg. Thus, determining more than
one body measurement may include determining values for more than one type of body
measurement. However, determining more than one body measurement may also include
determining more than one measurement for the same body measurement type but using
different methods.
[0022] In some advantageous embodiments, variants, or refinements of embodiments, the plurality
of images of the person is a video of the person or is extracted from a video of the
person. For example, the method may be configured such that a certain percentage of
frames of the video and/or a certain minimum number of frames or the like is selected
as the first input images.
[0023] In order to more accurately determine a plurality of body measurements, it is preferred
that the video shows the person from at least two, more preferably from at least four,
different viewing angles with respect to the longitudinal axis (from feet to crown)
of the person, wherein the different viewing angles are most preferably separated
from one another by a pre-defined minimum angle difference. For example, when the
video shows a person turning around their longitudinal axis by 90 degrees, and the
pre-defined minimum angle is 30 degrees, then said video shows the person from four
different viewing angles.
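The counting in this example can be made explicit with a short sketch. It assumes that the starting view counts as the first viewing angle and that views repeat after one complete turn; the function name is chosen for illustration only.

```python
def count_viewing_angles(turn_degrees, min_angle_degrees):
    """Count views separated by at least min_angle_degrees within a turn."""
    if turn_degrees < 0 or min_angle_degrees <= 0:
        raise ValueError("turn must be >= 0 and minimum angle must be > 0")
    # Views at 0, d, 2d, ... up to the end of the turn (start view included).
    along_turn = int(turn_degrees // min_angle_degrees) + 1
    # After a full turn the views repeat, so at most floor(360 / d) are distinct.
    per_full_turn = max(1, int(360 // min_angle_degrees))
    return min(along_turn, per_full_turn)
```

For a turn of 90 degrees and a minimum angle difference of 30 degrees, the views at 0, 30, 60 and 90 degrees are counted, which gives the four viewing angles stated above.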
[0024] It is most preferred if the video shows the person turning at least once completely
around the longitudinal axis (i.e. at least 360 degrees), wherein additional complete
turns (i.e. 720 degrees, 1080 degrees and so on) may further improve the accuracy
using averaging techniques and the like.
[0025] In other words, in some advantageous embodiments, variants, or refinements of embodiments,
obtaining the plurality of images of the person comprises acquiring the video of the
person by filming the person while the person is at least partially turning around
their longitudinal axis, and is preferably turning at least once completely around
their longitudinal axis. Advantageously, the
[0026] The filming may be performed by one or more fixed cameras (for example a camera mounted
in a computer screen or a laptop screen) or by a camera handheld by another person
who may, for example, use their mobile phone to film the person for whom the at
least one body measurement is to be determined.
[0027] It is preferred that the filming is performed using a dedicated application ("the
app") which may be installed on and run by a mobile device which, apart from acquiring
the video, may advantageously acquire additional data that may be saved and, partially
or completely, transmitted together with the video data. The app may also be a web
application which is accessed by a web browser of the mobile device and which is displayed
by a display of the mobile device.
[0028] For example, during the video acquisition process (i.e. the filming), the orientation
of the mobile device as well as its linear accelerations in 3-dimensional space may
be recorded. These may be helpful to determine, and possibly compensate, changes in
the viewing angle that are not due to the filmed person moving but instead due to
the filming mobile device moving. For example, when a person holding the mobile device
running the app suddenly moves a step back, the person in the images appears to become
smaller. However, using the orientation and acceleration of the mobile device it may
be easily determined that the person has actually stayed the same size.
[0029] The above is a rather trivial example; however, it will be understood that involuntary
or other movements of the filming mobile device could in principle result in other
distortions of the size of the person in the video which are however compensable due
to the measurements of orientation and/or acceleration of the filming mobile device.
Preferably, the filming (i.e. the acquiring or recording of the video) is performed
at frequencies in the range of 1 Hz to 100 Hz. It has been found that such frequencies
provide a good balance between the constraints of taking enough image frames in the
video so as to show the person from enough different viewing angles on the one hand,
and the constraints of the capabilities of the camera as well as the desire to reduce
data storage space needs on the other hand.
[0030] In some advantageous embodiments, variants, or refinements of embodiments, the method
further comprises selecting at least a sub-set of the first input images to provide
second input images. Preferably, all of the first input images are selected as the
second input images to be provided. However, in some variants additional filters and/or
criteria which have to be fulfilled by the second input images may be applied to the
first input images. It is further preferred that the number of second input images
is in the range of 3 to 60.
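One conceivable way of keeping the number of second input images within the preferred range is sketched below. The even spacing over the sequence is an assumption made for this example; the description itself does not prescribe a particular filter.

```python
def select_evenly_spaced(images, max_count=60, min_count=3):
    """Return all images if few enough, else max_count evenly spaced ones."""
    n = len(images)
    if max_count < 2 or min_count > max_count:
        raise ValueError("invalid count limits")
    if n < min_count:
        raise ValueError("not enough first input images")
    if n <= max_count:
        # Preferred case: all first input images become second input images.
        return list(images)
    # Spread max_count indices evenly from the first to the last image.
    step = (n - 1) / (max_count - 1)
    return [images[round(i * step)] for i in range(max_count)]
```

Because the step is larger than one whenever sub-sampling takes place, no image is selected twice.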
[0031] The method may further comprise determining, in order to estimate a shape and/or
pose of the person in each of the second input images, a corresponding parameter value
θ_ki for each image k of the second input images and for each parameter of a set of
pre-defined parameters θ_i. In particular, this may be performed by fitting a parametric
human body model to the person in each of the second input images based on the generated
segmentation maps.
[0032] The parametric human body model used may e.g. be the SMPL body model (see
Loper et al., "SMPL: A Skinned Multi-Person Linear Model", ACM Trans. Graphics (Proc.
SIGGRAPH Asia), vol. 34, no. 6, pages 248:1-248:16, Oct. 2015), BlendSCAPE (see
D.A. Hirshberg et al., "Coregistration: simultaneous alignment and modeling of articulated
3d shape", Proceedings of the twelfth European conference on computer vision (2012),
pp. 242-255) or any other suitable known parametric human body model. It is preferred that a
parametric body model is used that comprises a set of parameters θ = (θ_1, θ_2, ..., θ_n)
∈ R^n, 10 < n < 1000 (preferably 20 < n < 300), that describe the geometrical shape, pose
and other features of the human body.
[0033] The method further comprises generating an input tensor based on the determined parameter
values θ_ki. The input tensor may be generated such as to comprise all individual values
θ_ki for each parameter θ_i and for each image k of the second input images. Alternatively,
a pre-processing may be performed; for example, an averaging metric θ_avg,i for the values
of each parameter θ_i may be determined, and the input tensor may comprise the n averaging
metrics θ_avg,i. The averaging metric may be a mean such as a geometric mean or an arithmetic mean,
a median and/or the like.
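The two alternatives for generating the input tensor may be sketched as follows. The nested-list layout, in which `theta[k][i]` holds the value of parameter i fitted for image k, and the function name are assumptions made for this illustration.

```python
from statistics import mean, median


def build_input_tensor(theta, averaging=None):
    """Build the input tensor from theta[k][i] (image k, parameter i).

    With averaging=None all individual values are kept (K rows of n values);
    otherwise one averaged value per parameter is returned (n values).
    """
    if not theta or any(len(row) != len(theta[0]) for row in theta):
        raise ValueError("theta must be a non-empty rectangular K x n array")
    if averaging is None:
        return [list(row) for row in theta]
    # zip(*theta) iterates over the parameters, i.e. over the columns.
    return [averaging(column) for column in zip(*theta)]
```

For example, `build_input_tensor(theta, mean)` yields the n arithmetic means, and `build_input_tensor(theta, median)` the n medians.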
[0034] The method may then comprise inputting the input tensor into an artificial neural
network configured and trained to infer at least one body measurement based on the
input tensor, and generating, by the artificial neural network, at least one body
measurement based on the input tensor.
[0035] The method comprises selecting at least a subset of the first input images to provide
third input images. All of the first input images may be selected as the third input
images, and/or the third input images may overlap, or be identical with, the second
input images.
[0036] The method further comprises determining locations for a number of pre-defined key
points on the human body by inputting the third input images into an artificial neural
network configured and trained to infer the locations of the key points based on the
third input images.
[0037] The method further comprises determining, based on the determined locations for the
key points, a subset of the first input images so as to provide fourth input images
such that each of the fourth input images corresponds to one view of a set of pre-defined
views of the person from different angles, preferably at least four different pre-defined
views, each corresponding to a different viewing angle of the person. It is preferred
that the at least four pre-defined views comprise a (full) front view, a (full) back
view, a left (side) view and a right (side) view. More preferably, more than four
pre-defined views are determined to increase accuracy.
[0038] The number of pre-defined key points may comprise a number of pairs of symmetric
key points (in particular symmetric respective to the sagittal-longitudinal plane
of the human body).
[0039] Advantageously, in the images (or: frames), distances between the two key points of each
pair of symmetric key points can be determined, and a pose/orientation of the person
can be determined based on at least one of these determined distances. In particular, the
full front or full back viewing angle can be determined by the images in which at
least one (or a majority, or all) of the determined distances are maximal. Correspondingly,
a left or right side view of the person can be determined by the images in which
at least one (or a majority, or all) of the determined distances are minimal.
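A minimal sketch of this view selection is given below. It assumes that each frame is described by a dictionary of 2D key point coordinates with the hypothetical names used in `SYMMETRIC_PAIRS`, and it uses the sum of all pair distances as a simple stand-in for the "at least one, a majority, or all" criterion.

```python
import math

# Hypothetical pairs of key points symmetric about the sagittal plane.
SYMMETRIC_PAIRS = (
    ("left_shoulder", "right_shoulder"),
    ("left_hip", "right_hip"),
)


def pair_distance_sum(keypoints):
    """Sum of image-plane distances over all symmetric key point pairs."""
    return sum(math.dist(keypoints[a], keypoints[b]) for a, b in SYMMETRIC_PAIRS)


def pick_frontal_and_side_frames(frames_keypoints):
    """Return (index of widest frame, index of narrowest frame)."""
    if not frames_keypoints:
        raise ValueError("at least one frame is required")
    widths = [pair_distance_sum(kp) for kp in frames_keypoints]
    indices = range(len(widths))
    frontal = max(indices, key=widths.__getitem__)  # full front or full back
    side = min(indices, key=widths.__getitem__)     # left or right side
    return frontal, side
```

The distances alone do not distinguish front from back or left from right; further cues, such as key points that are visible only from one side, would be needed for that.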
[0040] The method further comprises generating, for each of the fourth input images, a silhouette
of the person therein, and determining at least one body measurement for the person
by inputting an input based on the determined locations for the key points and the
generated silhouettes, into a body measurement determining model. Said input may comprise,
or consist of, the key points and the generated silhouettes themselves and/or data
calculated based thereon.
[0041] According to a second aspect of the present invention, a computer-implemented method
for providing a clothing size recommendation for a person is provided comprising generating
at least one body measurement according to the method according to any embodiment
of the first aspect of the present invention. The method may further comprise inputting
the at least one body measurement into a clothing size recommendation model configured
to generate a clothing size recommendation based on the at least one body measurement,
and generating, by the size recommendation model, the clothing size recommendation
based on the at least one body measurement.
[0042] As has been described in the foregoing, the clothing sizing recommendation may be
a general size recommendation such as "Sizing M fits this person", a sizing recommendation
for a specific type of clothing such as "For trousers, sizing M fits this person"
or "For jeans, 34W32L fits this person" or "For shirts, size 38 fits this person",
an item-based sizing recommendation such as "For this particular item of clothing,
sizing M fits this person", an overview recommendation such as "For this person, the
following sizings of the following items of clothing fit: ..." and/or the like.
[0043] In some advantageous embodiments, variants, or refinements of embodiments, the method
according to the second aspect comprises receiving or retrieving clothing item information
indicating at least one property of an item of clothing, and the clothing size recommendation
is generated also based on the clothing item information.
[0044] For example, the clothing item information may comprise, or consist of, information
about at least one measurement of at least one item of clothing (e.g. one that a customer
has selected) such as a recommended or a maximum body measurement of a specific type
for the item of clothing (e.g. recommended waist size X cm). The clothing item information
may also comprise, or consist of, information about a relationship between the person
and at least one sizing of the item of clothing.
[0045] For example, a user may have ordered sizing L of one particular item of clothing, may
have found out that sizing L is too large and may have sent the item back, giving the feedback
that sizing L is too large for the person. When generating a clothing size recommendation
for that user regarding the same item of clothing, this information may be used (for
example to recommend a sizing smaller than L for the person). The clothing item information
may comprise, or consist of, many other items or types of information such as an elasticity
of the item of clothing at a position corresponding to one of the body measurements
(such as the waist) and/or the like.
[0046] In some advantageous embodiments, variants, or refinements of embodiments, the clothing
item information indicates properties of at least two different sizings of an item
of clothing, and wherein the clothing size recommendation includes, for at least one
of the at least two different sizings, a recommendation information of fitting or
non-fitting of said at least one sizing for the person. Preferably, the clothing item
information indicates properties of all available or existing sizings of the item
of clothing and the clothing size recommendation includes a recommendation information
of fitting or non-fitting of each of the sizings for the person. Optionally, the user
may input additional parameters, for example indicating that the user prefers a "loose
fit" or a "tight fit" in general or in a particular body measurement region, which
can be used in the method for the clothing size recommendation.
[0047] In some advantageous embodiments, variants, or refinements of embodiments, the method
according to the second aspect further comprises the step of receiving or retrieving
return information indicating at least one item of clothing which has been labelled
as being of a wrong size or poor fit (i.e. has been given a low fit rating) for a
user. This user may be the same as the user that is currently desiring a clothing
size recommendation but may also be another user, preferably a plurality of other
users.
[0048] The reason is that, for example, users with the same body type, or even very similar
body measurements, as the current user may have consistently given low fit ratings
to a particular sizing, so it stands to reason that the same will be true also for
the current user. Conversely, when no such information is available but, for example,
many users whose body measurements differ greatly from the current user's have
given high fit ratings to a particular sizing, it stands to reason that the same
sizing will very likely not be a good fit for the current user.
[0049] "Wrong size" may indicate that the person cannot fit (optionally differentiating
for at least one body measurement where the person does not fit), whereas "poor fit"
may indicate that the person fits but (for at least one body measurement) desires
a different fit, for example because the person desires a looser or tighter fit for
the at least one body measurement for medical, anatomical and/or aesthetic reasons.
[0050] The return information may further indicate at least one property of said returned
at least one item of clothing, most preferably information relating to the fitting
of the person to at least one specific body measurement of the item of clothing. Thus,
the return information may in particular comprise, or consist of, user feedback of
the person for which body measurements have been determined and which has tried the
returned at least one item of clothing.
[0051] For example, the person may have bought a shirt and then have returned it, indicating
the degree or quality of fitting of the person at the shoulders, chest, sleeve length
and/or the like. Such information is not only highly useful for providing an improved
clothing size recommendation to the person in the future, it may also be used to train
at least one of the artificial neural networks used in the method.
[0052] Preferably, the clothing size recommendation is generated also based on the return
information.
[0053] In some advantageous embodiments, variants, or refinements of embodiments, the clothing
size recommendation model comprises a learning sub-model, LSM, and a non-learning
sub-model, NLSM. A weighted sum of an output of the LSM and of an output of the NLSM
may be calculated to provide a fitting score vector, and the clothing size recommendation
is generated based at least on the provided fitting score vector. The weighting between
the LSM and the NLSM may be based on a quality metric of the LSM such that with improving
quality metric of the LSM, automatically the weight of the LSM increases. Said quality
metric may include an accuracy of the LSM with a validation set, a number of training
sets used for training the LSM, a number of training epochs and/or the like.
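The weighted sum may be illustrated by the following sketch. It assumes, for this example only, that the quality metric of the LSM has already been normalised to the range 0 to 1 and is used directly as the weight of the LSM, and that both sub-models output score vectors of equal length.

```python
def blend_fit_scores(lsm_scores, nlsm_scores, lsm_quality):
    """Weighted sum of LSM and NLSM outputs (fitting score vector).

    lsm_quality is an assumed, normalised quality metric in [0, 1]; the
    better the LSM, the more weight its output receives.
    """
    if len(lsm_scores) != len(nlsm_scores):
        raise ValueError("score vectors must have the same length")
    weight = min(max(lsm_quality, 0.0), 1.0)  # clamp to [0, 1]
    return [
        weight * lsm + (1.0 - weight) * nlsm
        for lsm, nlsm in zip(lsm_scores, nlsm_scores)
    ]
```

With a quality metric of 0 the recommendation relies entirely on the NLSM, which may be useful while the LSM has not yet been trained sufficiently.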
[0054] In some advantageous embodiments, variants, or refinements of embodiments, at least
one weighting factor for the weighted sum for a clothing size recommendation regarding
a particular item of clothing is variable and is based on a number of datasets (in
particular training datasets) available for said particular item of clothing, wherein
the datasets include at least one information about at least one body measurement
of a user and at least one information ("fitting information") about how said particular
item of clothing fits said user, specifically how at least one measurement of the
item of clothing compares to at least one body measurement of the user (e.g. "waist
too tight", "hips too loose", "inseam perfect fit").
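One conceivable mapping from the number of available datasets to a weighting factor is sketched below. The saturating form n / (n + c) and the constant `half_weight_count` are assumptions chosen for illustration; any monotonically increasing mapping could be used instead.

```python
def lsm_weight_from_dataset_count(num_datasets, half_weight_count=100):
    """Map the number of datasets for an item to an LSM weight in [0, 1).

    half_weight_count is a hypothetical tuning constant: the number of
    datasets at which the LSM and the NLSM are weighted equally.
    """
    if num_datasets < 0 or half_weight_count <= 0:
        raise ValueError("counts must be non-negative and the constant positive")
    return num_datasets / (num_datasets + half_weight_count)
```

An item of clothing without any fitting information thus receives an LSM weight of 0, and the weight approaches 1 as more datasets become available.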
[0055] In some advantageous embodiments, variants, or refinements of embodiments, the NLSM
is based at least on a distance metric between:
- a) at least one parameter of at least one item of clothing, and
- b) at least one of the parameter values θ_ki and/or at least one entry of the input tensor (for example, an averaging metric θ_avg,i of the parameter values θ_ki for one parameter θ_i).
[0056] This implements the basic idea that an item of clothing fits a person well if the
body measurements of the person are as close as possible to the corresponding
measurements of the item of clothing (i.e. waist measurement value of the person with
respect to waist size of an item of clothing and so on). The distance metric may treat
negative and positive distances differently, and may in particular penalize a too-small
measurement of the item of clothing relatively more than a too-large measurement,
since as a compromise often a loose fit for some body measurement which may cause
a slight unseemliness is generally more acceptable than a tight fit that may cause
discomfort.
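An asymmetric distance metric of this kind may be sketched as follows. The dictionaries of measurements in centimetres, the function name and the two penalty factors are assumptions made for this example; garment ease and elasticity are deliberately ignored.

```python
def asymmetric_distance(body_cm, garment_cm, tight_penalty=2.0, loose_penalty=1.0):
    """Distance between body and garment measurements, penalising tightness.

    Only measurement types present in both dictionaries are compared.
    The penalty factors are illustrative values, not prescribed ones.
    """
    total = 0.0
    for name, body_value in body_cm.items():
        if name not in garment_cm:
            continue
        diff = garment_cm[name] - body_value
        # A negative difference means the garment is too small at this point.
        weight = loose_penalty if diff >= 0 else tight_penalty
        total += weight * abs(diff)
    return total
```

With these factors, a waist that is 2 cm too tight contributes twice as much to the distance as a waist that is 2 cm too loose.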
[0057] In some advantageous embodiments, variants, or refinements of embodiments, the LSM
comprises an artificial intelligence entity configured and trained to receive the
input tensor as an input and to output a first fit likelihood vector of the same dimensions
as a second fit likelihood vector output by the NLSM and/or to output a first fit score.
The fit likelihood vector may comprise, for each of a list of sizings and/or items
of clothing, an entry indicating the likelihood that said sizing or item of clothing
or combination of sizing and item of clothing is the best fit for the person. The
first fit score may indicate a score (e.g. in the range of 0 to 100) of how well a
particular (e.g. the best-fitting) sizing/item according to the fit likelihood vector
actually fits the person. Alternatively, a first fit score vector may be comprised
that indicates for each sizing/item how well it fits the person. Further alternatively,
the fit likelihood vector may comprise the first fit score in one or more entries.
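How a recommendation might be read off a fit likelihood vector is illustrated by the following sketch, which assumes one entry per sizing and simply selects the entry with the highest likelihood; the function name is hypothetical.

```python
def recommend_sizing(sizings, fit_likelihoods):
    """Return the sizing with the highest fit likelihood and that likelihood."""
    if not sizings or len(sizings) != len(fit_likelihoods):
        raise ValueError("one likelihood per sizing is required")
    best = max(range(len(sizings)), key=lambda i: fit_likelihoods[i])
    return sizings[best], fit_likelihoods[best]
```

For instance, `recommend_sizing(["S", "M", "L"], [0.1, 0.7, 0.2])` selects sizing "M".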
[0058] According to a third aspect, the present invention involves training the artificial
intelligence entity.
[0059] In particular, an artificial neural network may be trained by return information
provided by users, wherein the return information indicates fit ratings of the users
in which they describe how well at least one measurement type of at least one item
of clothing fits them.
[0060] Information about at least one body measurement of the users may be provided separately,
e.g. retrievable via a unique identifier, or may be provided in a signal in which
return information is transmitted. Thus, the return information encodes information
about users' body measurements and how well they fit in relation to measurements of
items of clothing. The return information may be automatically transmitted to a computing
device configured for training the artificial neural network from web shops that request
said information from users when they intend to return an item of clothing.
[0061] The invention further provides, according to a fourth aspect, a system configured
to perform the method according to any embodiment of the first aspect of the present
invention and/or configured to perform the method according to any embodiment of the
second aspect of the present invention. The system may in particular comprise a mobile
device (such as a smartphone, a tablet computer and/or the like) and a remote computing
device (such as a cloud computing platform or a web server). The remote computing
device may comprise a database and/or may be operatively coupled to a database of
the system for storing data thereon, for example learning models to be trained, return
information, user parameters, unique identifiers and/or the like.
[0062] Preferably the mobile device is configured (and used) to obtain the plurality of
images of the person, for example by running (or, more preferably, in the case of
a web application, displaying) an app that performs the necessary method steps. Further,
the remote computing device preferably performs the method steps not performed by
the mobile device, in particular the generating of the segmentation maps, the determining
of the at least one body measurement, the generating of the clothing size recommendation
and so on.
[0063] According to a further aspect, the invention thus also provides a server configured
to receive the plurality of images obtained from the mobile device and/or the selected
first input images, and to perform the remaining steps of any embodiment of the first
aspect and/or of the second aspect of the present invention, in particular the generating
of the segmentation maps, the determining of the at least one body measurement, the
generating of the clothing size recommendation and so on.
[0064] The invention will be explained in yet greater detail with reference to exemplary
embodiments depicted in the drawings as appended.
Brief Description of the Drawings
[0065] The accompanying drawings are included to provide a further understanding of the
present invention and are incorporated in and constitute a part of the specification.
The drawings illustrate the embodiments of the present invention and together with
the description serve to illustrate the principles of the invention. Other embodiments
of the present invention and many of the intended advantages of the present invention
will be readily appreciated as they become better understood by reference to the following
detailed description. Like reference numerals designate corresponding similar parts.
[0066] The numbering of method steps is intended to facilitate understanding and should
not be construed, unless explicitly stated otherwise, or implicitly clear, to mean
that the designated steps have to be performed according to the numbering of their
reference signs.
- Fig. 1 shows a flow diagram schematically illustrating an embodiment of the method
according to the first aspect of the present invention;
- Fig. 2 and Fig. 3 schematically illustrate the method of Fig. 1 further;
- Fig. 4 shows a flow diagram schematically illustrating another embodiment of the method
according to the first aspect of the present invention;
- Fig. 5 shows a flow diagram schematically illustrating a method according to an embodiment
of the second aspect of the present invention;
- Fig. 6 shows an exemplary flow diagram according to further embodiments of the present
invention;
- Fig. 7 shows a block diagram schematically illustrating a computer program product
according to an embodiment of the sixth aspect of the present invention;
- Fig. 8 shows a block diagram schematically illustrating a data storage medium according
to an embodiment of the seventh aspect of the present invention; and
- Fig. 9 shows a schematic block diagram illustrating a system according to an embodiment
of the fifth aspect of the invention.
Detailed Description of Embodiments
[0067] Fig. 1 shows a flow diagram schematically illustrating an embodiment of the method
according to the first aspect of the present invention, i.e. a method for determining
at least one body measurement of a person.
[0068] In a step S10 a plurality of images of the person is obtained, wherein the images
in the plurality of images show the person from at least two different viewing angles,
more preferably from at least four viewing angles, wherein the viewing angles are
advantageously equally-spaced.
[0069] Most preferably, the plurality of images is obtained by recording a video of the
person. In the following, an exemplary situation will be described in which a smartphone
is used to take a video of the person for which the body measurements are to be determined
using an app installed on the smartphone. However, it should be understood that numerous
variants, some of which have been described in the foregoing, exist for recording
a video of the person.
[0070] In the present example, the recording of a video is performed using a standard smartphone
RGB camera (front or back-facing).
[0071] Fig. 2 schematically illustrates this process.
[0072] The smartphone 1 is handed to a person nearby to take the video (variant (A) in Fig.
2) or is placed against an object or a wall (variant (B) in Fig. 2). The recording
is started in a step S11, and the person 2 (of which the body measurements are to
be taken, and which may also be designated as the "subject" of the method) rotates,
in a step S12, normally around their own longitudinal axis clockwise or counter-clockwise.
The person 2 can either rotate continuously or introduce pauses in between. The subject
rotates by at least 360 degrees within 3 or more seconds.
[0073] During the video recording process, in a step S13, the orientation of the smartphone
1 as well as the linear accelerations in 3-dimensional space are recorded, preferably
at frequencies of between 1 and 100 Hz. A unit normal vector describing the axis parallel
to the optical axis of the device's front or back-side camera is referred to herein
as n_d and may be calculated in a step S14.
[0074] In a step S20, at least a sub-set of the received plurality of images is selected
to provide first input images. In the present example, individual frames of the recorded
video may be selected as the first input images. For example, every N-th frame of
a video may be selected as a first input image, wherein N may be determined as N=fps/C,
fps being the frames per second of the video and C being a constant that is preferably
in the range of between 5 and 30. Alternatively, N may be fixed to a specific number
of frames per second which may also be in the range of between 5 and 30.
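The frame selection rule N = fps/C can be sketched as follows. This is a minimal illustration only; the function name `select_frame_indices`, the rounding of N to an integer and the default value of C are assumptions that are not specified in the description.

```python
def select_frame_indices(num_frames: int, fps: float, c: float = 10.0) -> list[int]:
    """Return the indices of every N-th frame, with N = fps / C.

    Assumption: N is rounded to the nearest integer and is at least 1.
    """
    if num_frames < 0 or fps <= 0 or c <= 0:
        raise ValueError("num_frames must be >= 0, fps and c must be > 0")
    n = max(1, round(fps / c))
    return list(range(0, num_frames, n))


# A 3-second video at 30 fps with C = 10 keeps every 3rd frame.
first_input_frames = select_frame_indices(num_frames=90, fps=30.0, c=10.0)
```

With C in the range of 5 to 30, the selected frames correspond to roughly 5 to 30 frames per second of video, independent of the recording frame rate.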
[0075] In a step S30, segmentation maps 3 for the first input images are generated, wherein
the segmentation maps 3 at least distinguish the person 2 and/or differentiate their
body parts from a background 4. Any known segmentation method or segmentation algorithm,
in particular using at least one artificial neural network, can be used for the generation
of the segmentation maps.
[0076] In a step S40, locations of a number of pre-defined key points on the human body
as well as at least one silhouette of the person 2 in at least one of the plurality
of obtained images (preferably in the first input images) are determined, specifically
as follows:
In a step S41, at least a subset of the first input images is selected to provide
third input images; all of the first input images may be selected as the third input
images.
In a step S42, the locations of a number of pre-defined key points K_i on the human body
are determined by inputting the third input images into a first
artificial neural network configured and trained to infer the locations of the key
points based on the input third input images.
[0077] The first artificial neural network DCNN1 can include multiple network branches in
a combination of 2D convolution and deconvolution layers with any non-linearity as
activation functions, as well as Batch Normalization (BatchNorm) layers. The first artificial
neural network DCNN1 may comprise between 25 and 500 hidden layers trained on training
datasets to estimate the pose of the person from images (e.g. video frames).
[0078] The model infers a number of key points (preferably in the range of from 4 to 100) on
the human body in 3-dimensional space which describe the pixel position (on the respective
third input image) and orientation (Euler angles, quaternions, dual quaternions
etc.) of pre-defined joints such as wrists, elbows, shoulders, knees etc. or center points
of organs or other landmarks on the human body such as the eyes, mouth, ears and others
(marked as points in Fig. 2). The key points K_i preferably comprise at least one pair of
key points that are symmetric with respect to the sagittal/longitudinal plane, in the
present example: key points K_2 and K_4 (shoulders), K_5 and K_7 (elbows), K_8 and K_9
(wrists), K_10 and K_11 (lateral hip bone end points), K_12 and K_13 (knees) and K_14 and
K_15 (ankles). The pose of the person may be considered to consist of the collection of
all key points K_i and their articulations.
[0079] From a subset (or all) of the key points K_i located on the human torso (in the
present example e.g. K_2, K_4, K_10 and K_11), in a step S43 a unit normal vector n_p
is calculated which represents the person's orientation in 3-dimensional space.
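One possible way of obtaining such a unit normal vector from the four torso key points is sketched below. The construction (cross product of the shoulder axis with the line from the shoulder midpoint to the hip midpoint) and the resulting sign convention are assumptions made for illustration; the description does not prescribe a particular formula, and the function name `torso_normal` is hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def torso_normal(k2: Vec3, k4: Vec3, k10: Vec3, k11: Vec3) -> Vec3:
    """Unit normal of the torso plane spanned by shoulders (k2, k4) and hips (k10, k11).

    Assumption: the normal is the cross product of the shoulder axis and the
    vector from the shoulder midpoint to the hip midpoint.
    """
    shoulder_axis = tuple(b - a for a, b in zip(k2, k4))
    shoulder_mid = tuple((a + b) / 2 for a, b in zip(k2, k4))
    hip_mid = tuple((a + b) / 2 for a, b in zip(k10, k11))
    down = tuple(h - s for s, h in zip(shoulder_mid, hip_mid))
    cross = (
        shoulder_axis[1] * down[2] - shoulder_axis[2] * down[1],
        shoulder_axis[2] * down[0] - shoulder_axis[0] * down[2],
        shoulder_axis[0] * down[1] - shoulder_axis[1] * down[0],
    )
    norm = math.sqrt(sum(c * c for c in cross))
    if norm == 0.0:
        raise ValueError("key points are degenerate (collinear)")
    return tuple(c / norm for c in cross)


n_p = torso_normal((-1.0, 0.0, 2.0), (1.0, 0.0, 2.0), (-1.0, 0.0, 1.0), (1.0, 0.0, 1.0))
```

Which of the two possible directions counts as the front of the person depends on which key point is the left and which is the right shoulder, so the sign may have to be flipped in practice.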
[0080] In a step S44, at least one distance L between pairs of symmetric key points on the
left and right parts of the subject's torso (e.g. length from left shoulder key point
K_2 to right shoulder key point K_4 projected to the current camera view) is measured
and recorded for all or a subset of the frames. This may be performed already during
the recording of the video by the app.
[0081] Preferably, between 1 and 10 distances L are measured between pairs of symmetric
key points. If the pose is estimated for only a subset of the available frames (i.e.
when there are fewer third input images than first input images), the measured distances
L may be, in an optional step S45, interpolated and smoothed such as to have L for
each of the first input images, or even for each frame of the video.
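The optional interpolation and smoothing of step S45 may, for example, look as follows. Linear interpolation and a centred moving average are assumptions chosen for simplicity; the description leaves the concrete interpolation and smoothing methods open, and the function names are hypothetical.

```python
def interpolate_distances(known: dict[int, float], num_frames: int) -> list[float]:
    """Linearly interpolate distances L known at some frame indices to all frames.

    Frames before the first / after the last known index take the nearest value.
    """
    if not known:
        raise ValueError("at least one measured distance is required")
    idx = sorted(known)
    out = []
    for f in range(num_frames):
        if f <= idx[0]:
            out.append(known[idx[0]])
        elif f >= idx[-1]:
            out.append(known[idx[-1]])
        else:
            lo = max(i for i in idx if i <= f)
            hi = min(i for i in idx if i >= f)
            if lo == hi:
                out.append(known[lo])
            else:
                t = (f - lo) / (hi - lo)
                out.append(known[lo] + t * (known[hi] - known[lo]))
    return out


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Smooth a sequence with a centred moving average (shorter window at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Shoulder distance measured on frames 0 and 4 only, extended to all 5 frames.
distances = moving_average(interpolate_distances({0: 10.0, 4: 2.0}, num_frames=5))
```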
[0082] In a step S46, a frontal view F_+ (front) and a back view F_- (back) are selected by
taking the images with the maximal Euclidean distance between the selected key point pairs.
The two side views (left and right), designated as S_L and S_R, are selected by taking the
frames with minimal Euclidean distance either of one of the pairs (e.g. the largest one or
the smallest one), or of the mean of the pairs.
The app used for obtaining the video in step S10 may guide the person 2 to move in such a
way that the specified views of the person are as clear as possible in the video.
[0083] If no clear minima/maxima can be determined by using L, the unit normal vectors
n_p and n_d are used to select the front and back views when they are as parallel/antiparallel
as possible in the x-y plane (scalar product = 1 or scalar product = -1, respectively).
The side views are selected such that n_p and n_d are as perpendicular as possible in the
x-y plane (scalar product = 0), wherein the x-y plane
refers to the plane on which the filmed person stands, the z direction is the direction
in which the longitudinal axis of the person extends and x, y, z form an orthogonal
coordinate system.
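The orientation-based view selection can be illustrated with a short sketch that works on the x-y components of the two unit normal vectors. Using the sign of the 2-dimensional cross product to tell the two side views apart, and calling the positive one "left", are assumptions made for this example; the function name `select_views` is hypothetical.

```python
import math


def _unit(v: tuple[float, float]) -> tuple[float, float]:
    norm = math.hypot(v[0], v[1])
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / norm, v[1] / norm)


def select_views(person_normals: list[tuple[float, float]],
                 device_normal: tuple[float, float]) -> dict[str, "int | None"]:
    """Pick frame indices for front/back/side views from per-frame person normals.

    front: scalar product closest to +1, back: closest to -1,
    sides: scalar product closest to 0 on either side of the device normal.
    """
    if not person_normals:
        raise ValueError("no frames given")
    nd = _unit(device_normal)
    dots, crosses = [], []
    for p in person_normals:
        px, py = _unit(p)
        dots.append(px * nd[0] + py * nd[1])
        crosses.append(px * nd[1] - py * nd[0])
    frames = range(len(dots))

    def most_perpendicular(sign: int):
        candidates = [k for k in frames if crosses[k] * sign > 0]
        return min(candidates, key=lambda k: abs(dots[k])) if candidates else None

    return {
        "front": max(frames, key=dots.__getitem__),
        "back": min(frames, key=dots.__getitem__),
        "left": most_perpendicular(+1),   # assumption: positive cross product = left
        "right": most_perpendicular(-1),
    }


views = select_views([(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)], (0.0, 1.0))
```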
[0084] Apart from said four views, additional views may be determined for improved accuracy
and robustness against occlusions and the like. For example, equidistantly between
each two adjacent views an additional view may be determined so that in total eight
views are provided. In the following, the situation of only four views will be described
for simplicity.
[0085] Once the images (frames) with the four views (front, back, left, right) are obtained
from the video, in an optional step S47 each image is normalized to have the same
number of pixels and cropped to the subject's segmentation map to have equal distance
from the outermost pixel in every side of the image.
[0086] In a step S48, in every fourth input image, the outline of the segmentation maps
(herein designated as a "silhouette") is determined (or: calculated), preferably by
using an edge detection algorithm such as Canny or similar (Canny generally consists
of 5 steps: Noise reduction; Gradient calculation; Non-maximum suppression; Double
threshold; Edge Tracking by Hysteresis). The silhouette of the segmentation maps includes
the pixel coordinates of every pixel in the subject's silhouette.
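As a simplified stand-in for an edge detector such as Canny, the outline of a binary segmentation map can also be obtained by keeping every foreground pixel that touches the background or the image border. The sketch below uses this simpler technique on a nested-list mask; it is not an implementation of Canny, and the function name `silhouette_pixels` is hypothetical.

```python
def silhouette_pixels(mask: list[list[int]]) -> list[tuple[int, int]]:
    """Return (c_w, c_h) coordinates of foreground pixels on the outline of a binary mask.

    A foreground pixel belongs to the outline if any of its 4 neighbours is
    background or lies outside the image.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    outline = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            on_edge = any(
                not (0 <= rr < height and 0 <= cc < width) or not mask[rr][cc]
                for rr, cc in neighbours
            )
            if on_edge:
                outline.append((c, r))  # horizontal coordinate first
    return outline


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
outline = silhouette_pixels(mask)  # all pixels of the 3x3 block except its centre
```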
[0087] In a step S49, for every pixel in the subject's silhouette the nearest key point K_i
is identified via 1-Nearest-Neighbor search with a Euclidean distance metric on the
pixel coordinates along the width (horizontal, c_w) and height (vertical, c_h) of the
2-dimensional fourth input image.
[0088] The horizontal and vertical distances, along with the index of the key point K_i
to which the silhouette pixel j was assigned, are stored as a silhouette triplet
st_j = (i_kp, d_w, d_h), wherein i_kp indicates the corresponding key point K_i.
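The assignment of silhouette pixels to key points and the resulting triplets can be sketched as below. Storing the horizontal and vertical distances as signed offsets (pixel minus key point) is an assumption; the description only states that the distances are stored. The function name `silhouette_triplets` is hypothetical.

```python
def silhouette_triplets(
    silhouette: list[tuple[float, float]],
    key_points: list[tuple[float, float]],
) -> list[tuple[int, float, float]]:
    """Build one triplet (i_kp, d_w, d_h) per silhouette pixel.

    Both inputs hold (c_w, c_h) pixel coordinates. i_kp is the index of the
    nearest key point under the Euclidean metric (1-nearest-neighbour search).
    """
    if not key_points:
        raise ValueError("at least one key point is required")
    triplets = []
    for pw, ph in silhouette:
        i_kp = min(
            range(len(key_points)),
            key=lambda i: (pw - key_points[i][0]) ** 2 + (ph - key_points[i][1]) ** 2,
        )
        kw, kh = key_points[i_kp]
        triplets.append((i_kp, pw - kw, ph - kh))  # assumption: signed offsets
    return triplets


triplets = silhouette_triplets([(12, 9), (48, 55)], [(10, 10), (50, 50)])
```

A brute-force search is sufficient here because the number of key points is small (4 to 100); a spatial index would only pay off for much larger point sets.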
[0089] In a step S50, a first input tensor X̃ is formed by concatenating the key points K_i
from the pose estimation together with all silhouette triplets st_j for each selected
fourth input image, for example as

wherein SSM_+ designates all silhouette triplets st_j of the front view F_+, SSM_-
designates all silhouette triplets st_j of the back view F_-, SSM_L designates all
silhouette triplets st_j of the left side view F_L, SSM_R designates all silhouette
triplets st_j of the right side view F_R, K_+ designates all key points K_i of the front
view F_+, K_- designates all key points K_i of the back view F_-, K_L designates all key
points K_i of the left side view F_L, and K_R designates all key points K_i of the right
side view F_R, and the "," in the above formula designates a concatenation.
[0090] In a step S60, the first input tensor X̃ is input into a body measurement determining
model, preferably into a general linear model or an artificial neural network regressor
model. The artificial neural network regressor model may comprise between 1 and 200 hidden
layers and may be configured and trained to receive the first input tensor X̃ and to
determine, based thereon, at least one body measurement of the person, preferably
all of the pre-defined body measurements (or: body measurement types). In other words,
the body measurement determining model will map the first input tensor X̃ to a first body
measurement vector y_1, wherein y_1 preferably has a size in the range of from 5 to 50,
wherein each of the 5 to 50 entries indicates one body measurement value. In yet other
words, the body measurement determining model represents a function f_1 with f_1(X̃) = y_1.
[0091] The body measurements of the first body measurement vector y_1 are herein also
designated as first type body measurements. The method described
in the foregoing is comparatively easy and quick to perform.
[0092] Fig. 3 schematically illustrates some of the core views of the person that are advantageously
taken from the video of the person turning around their longitudinal axis A.
[0093] In the following, with respect to Fig. 4, additional and/or alternative method steps
will be described which provide another type of body measurements which require longer
to compute but are more accurate. One and the same application of the method (e.g.
realized in an app or in a system) may be provided with options for generating both
types of measurements, depending e.g. on a selection of a user, on a given time frame,
on the requirements of the app and/or the like.
[0094] Thus, Fig. 4 describes a method according to another embodiment of the method according
to the first aspect of the present invention.
[0095] Steps S10, S20 and S30 may be performed as has been described in the foregoing.
[0096] In a step S100, at least a sub-set of the first input images is selected to provide
second input images. The selected first input images may be the same as the selected
third input images, may overlap with the third input images, or may be completely
separate from the third input images. Preferably, a number in the range of 10 to 300
frames is selected from the video as the second input images, and such frames are
selected that have the rotating person facing equidistant angles in the ground plane
(x-y plane in the coordinate system as defined in the foregoing) or, in other words,
in a plane perpendicular to the longitudinal axis of the person, as measured or indicated
by n_p, which may be determined as has been described in the foregoing.
[0097] The goal is to take frames that differ in camera viewpoint V as much as possible
regardless of the speed of rotation of the person, of video duration and frame rate.
Here a viewpoint V constitutes the position and orientation of the camera view's optical
axis.
[0098] In a step S110, in order to estimate a shape and/or pose of the person in each of
the second input images, a corresponding parameter value θ_ki is determined for each image k
of the second input images and for each parameter of a set of pre-defined parameters θ_i
by fitting a parametric human body model (such as SMPL, BlendSCAPE or the like) to
the person in each of the second input images based on the generated segmentation
maps.
[0100] In a simple variant, the influence of clothing on the person 2 while the images are
obtained in step S10 is neglected. In some variants, however, the present clothing
may be modelled by a noise term. The silhouettes of the segmentation maps 3 aligned
in 3-dimensional space based on the determined key points K_i
may be used to form, or as a basis for interpolation of, a 3-dimensional silhouette
of the person. In such an approach, the images may be treated, instead of being consecutive
images of a rotating person 2, as images of a non-rotating person 2 taken from a camera
virtually revolving around the person 2 so that the entirety of the segmentation map
silhouettes form, or at least fit in, the 3-dimensional silhouette. The 3-dimensional
silhouette may then be used to define boundary conditions for a process reconstructing
the exact geometry of the "scanned" person 2 (i.e. of the "subject") apart from the
naked body of the person 2.
[0101] Then, depending on the item of clothing or type of clothing for which a clothing
size recommendation is sought, the noise term caused by the clothing worn by the person
2 may be neglected or included. For example, when the item of clothing or type of
clothing is a jacket which is usually worn over a sweater, a noise term caused by
the scanned person 2 wearing a sweater may be included. If, on the other hand, the
same person 2 also requests a clothing size recommendation for an undershirt, the
noise term may be neglected since undershirts are usually worn on the naked upper
body.
[0102] Here a parametric body model is a model with any set of parameters
θ = (θ_1, θ_2, ..., θ_n) ∈ R^n, 10 < n < 1000, that describe the geometrical shape, pose
and other features of the human body. The number of frames k is preferably in the range
from 3 to 60 and the parametric body pose of a subsequent frame is initialized with the
one from the previous frame to ensure smoothness over time. In the present example, for
every parameter entry θ_i the mean value θ_avg,i across the entire sequence of k frames
(i.e. of all θ_ki) is taken as a definitive parameter value. The resulting 3-dimensional
body can optionally be unposed to a canonical A pose or T pose and the geometry can be
intermediately stored in any standard format such as <.obj>, <.fbx> or similar.
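The averaging of the per-frame parameter values into the definitive parameter values can be written compactly. The sketch below assumes that the fitted parameters of each frame are available as a list of floats; the function name `mean_parameters` is hypothetical.

```python
def mean_parameters(theta_per_frame: list[list[float]]) -> list[float]:
    """Average each model parameter theta_i over all k frames."""
    k = len(theta_per_frame)
    if k == 0:
        raise ValueError("at least one frame is required")
    n = len(theta_per_frame[0])
    if any(len(theta) != n for theta in theta_per_frame):
        raise ValueError("all frames must have the same number of parameters")
    return [sum(theta[i] for theta in theta_per_frame) / k for i in range(n)]


# Two frames with two parameters each.
theta_avg = mean_parameters([[1.0, 2.0], [3.0, 6.0]])
```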
[0103] In a step S120, a second input tensor X̂ is generated. For example, the resulting
geometry can be transformed to:
- A) a 3rd order tensor with width W, height H and number of channels C,
or
- B) a 2nd order tensor (matrix) with the width W and height H squashed into a single
dimension A = W × H and again the same number of channels C.
[0104] Both versions A) and B) describe all or a sub-set of the geometrical data within
the respective format and are referred to as second input tensor X̂ herein. W and H (or A)
represent an ordered list of all geometric vertices v and
vertex normals vn or parameter space vertices vp, polygonal face elements f, line
elements and others included in the format.
[0105] In a step S130, the second input tensor X̂ is input into a second artificial neural
network DCNN3 which is a deep convolutional neural network configured and trained to learn
a mapping function f_2(X̂) = y_2, wherein y_2 is a second body measurement vector.
[0106] In a step S140, the second body measurement vector y_2 is generated by the second
artificial neural network based on the input tensor X̂. y_2 preferably has a size in the
range of from 5 to 100, wherein each of the 5 to 100 entries indicates one body measurement
value. The body measurement vectors y_1 and y_2 may have the same size and their entries
may indicate the same body measurement types, respectively.
[0107] The second artificial neural network preferably has between 3 and 200 hidden layers
and is trained, e.g., on a dataset of between 100 and 1,000,000 unique instances of
a human body's geometry. The training examples may consist of a combination from different
datasets merged together as one data set with a common input format transformed into
the format of the second input tensor X̂ from the following sources:
- up to scale reconstructed 3-dimensional scans of real people made with special hardware
like depth sensing time of flight cameras, KINECT or other IR and RGB sensors;
- up to scale reconstructed geometry of human bodies from Laser Scanners, stereo and
multi view camera systems; and/or
- synthetically generated 3-dimensional human bodies or reconstructed from monocular
video via parametric body models.
[0108] The 3-dimensional human bodies can have different poses, and the target measurements
y for each training sample were extracted from the up to scale scans or taken manually.
[0109] As has been described in the foregoing, both the first and the second body measurement
vector y_1, y_2 may be provided, or it may be decided, based on pre-defined rules and/or on
the input of a user, which body measurement vector is provided in a specific case.
It should be understood that the embodiment or embodiments as described with respect
to Fig. 1 to Fig. 4 may be adapted or modified according to any options, variants
or refinements as are described herein.
[0110] The first and/or the second body measurement vector y_1, y_2 may be stored for the
person in a database, e.g. connected to a unique identifier
of the person so that the person, as long as their body measurements do not significantly
change, may in the future enter only their unique identifier to be provided with their
first and/or second body measurement vector y_1, y_2.
[0111] Fig. 5 shows a flow diagram schematically illustrating a method according to an embodiment
of the second aspect of the present invention, i.e. a computer-implemented method
for providing a clothing size recommendation for a person.
[0112] First, a body measurement is generated according to an embodiment of the method according
to the first aspect of the present invention, in particular a first and/or second
body measurement vector y_1, y_2 is generated according to the method as described with
respect to Fig. 1 to Fig. 4 in the foregoing.
[0113] In a step S200, the at least one generated body measurement vector y_1, y_2 is input
into a clothing size recommendation model configured to generate a clothing
size recommendation based on the at least one body measurement vector y_1, y_2.
[0114] In a step S210, a clothing size recommendation is generated based on the at least
one body measurement.
[0115] In the presently described example, the clothing size recommendation in plain words
uses a weighted sum fr of two terms to indicate to the person to whom the body measurements
belong at least one item of clothing and whether and/or how that item would fit the
person.
[0116] For example, a user of the method (usually the person to whom the body measurements
belong) may select a particular item of clothing, and the clothing size recommendation
will indicate at least one of the sizings of said item of clothing which will fit
the person the best and/or how well said sizing will fit the person. Alternatively
or additionally, the user may also select a number of different items of clothing
and will be provided with the same information, either with the best-fitting sizing
of each item, or with the best-fitting sizing of all of the items and/or with corresponding
information how well said sizing or sizings will fit the person.
[0117] In the presently used example, the goal is to predict fr = (fr_sz1, fr_sz2, ..., fr_szM),
i.e. a fit rating vector fr for all available product or brand sizings 1, 2, ..., M
of a particular item of clothing, wherein the comma indicates a concatenation, and
wherein each entry may comprise a fit likelihood and a fit score. The plurality of
fit likelihoods for all of the sizings may be designated as a fit likelihood vector.
In some variants, the fit rating vector fr comprises the fit likelihood vector concatenated
with the fit score for the sizing most likely to fit (according to the fit likelihood
vector).
[0118] The fit rating vector fr is preferably calculated as a weighted sum by:
$$fr = w_{NL} \cdot f_{NL} + w_{L} \cdot f_{L}$$
[0119] In other words, the fit rating vector fr is calculated by a weighted sum of two terms,
f_NL and f_L (with corresponding weighting factors w_NL and w_L), corresponding to a hybrid
combination of a learning (L) and a non-learning (NL) approach, realized by a learning
sub-model LSM and a non-learning sub-model NLSM.
The weighting factors w_NL and w_L may be normalized such that they always add up to one,
but can also be defined in other ways.
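The weighted sum of the non-learning and the learning term can be illustrated with a few lines of code. The sketch treats each term as a list with one scalar rating per sizing, which is a simplifying assumption; the function name `combine_fit_ratings` is hypothetical.

```python
def combine_fit_ratings(
    f_nl: list[float], f_l: list[float], w_nl: float, w_l: float
) -> list[float]:
    """Per-sizing weighted sum of the non-learning (NL) and learning (L) terms."""
    if len(f_nl) != len(f_l):
        raise ValueError("both terms must cover the same sizings")
    return [w_nl * a + w_l * b for a, b in zip(f_nl, f_l)]


# Two sizings, weights normalized to add up to one.
fr = combine_fit_ratings(f_nl=[1.0, 0.0], f_l=[0.0, 1.0], w_nl=0.75, w_l=0.25)
```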
[0120] In some variants, in order to calculate best fitting sizes of clothes, per-product
or per-brand sizing data is stored in a database. The sizing data containing the relevant
measurements can be derived from the sizing charts/tables provided by the clothing
manufacturer. Alternatively, the relevant measurements can be directly exported from
CAD software used to design clothes.
[0121] A sizing chart SC may be structured as follows in Table 1:
Table 1: Exemplary sizing chart SC
| B: <brand> | Size 1, sz1 (e.g. S) | Size 2, sz2 (e.g. M) | Size 3, sz3 (e.g. L) | Size 4, sz4 (e.g. XL) |
| PC: <product category> | | | | |
| PF: <product fit> | | | | |
| Relevant measurement 1 rm1 (e.g. height) | m1,1,min | m1,2,min | m1,3,min | m1,4,min |
| | m1,1,max | m1,2,max | m1,3,max | m1,4,max |
| Relevant measurement 2 rm2 (e.g. shoulders width) | m2,1,min | m2,2,min | m2,3,min | m2,4,min |
| | m2,1,max | m2,2,max | m2,3,max | m2,4,max |
| Relevant measurement 3 rm3 (waist circumference) | m3,1,min | m3,2,min | m3,3,min | m3,4,min |
| | m3,1,max | m3,2,max | m3,3,max | m3,4,max |
| Relevant measurement 4 rm4 (hips circumference) | m4,1,min | m4,2,min | m4,3,min | m4,4,min |
| | m4,1,max | m4,2,max | m4,3,max | m4,4,max |
| Relevant measurement 5 rm5 (sleeve length) | m5,1,min | m5,2,min | m5,3,min | m5,4,min |
| | m5,1,max | m5,2,max | m5,3,max | m5,4,max |
[0122] The relevant measurements in the first column refer to the different body measurement
types, and the remaining columns to minimum and maximum size values for these measurements
for the respective sizings (S, M, L, XL).
[0123] The product fit modifier PF may be a number in a zero-mean range normalized to the
range -1 to +1 defined by the available fits for a brand B. For example, if a brand offers
skinny fit, slim fit, regular, and comfort fit for the product category PC: pants, then
PF ∈ (skinny fit: -1; slim fit: -0.5; regular: +0.5; comfort fit: +1).
[0124] The product category PC can be for example 'tops' (shirts, t-shirts, sweaters, etc.),
'bottoms' (pants, jeans, tracksuit, shorts, etc.), 'outerwear' (suit, blazer, coat,
parka) and 'dresses' and one may differentiate between men's, women's and children's versions
for each, wherein the person may input additional information such as their gender,
age and/or the like. The relevant measurements and the boundary values in the sizing
chart SC can depend on the product category and the sizing data provided by the brand.
The particular sizes depend on the brand's definition and can be converted between
regional standards like EU, USA, DE, FR, UK, JP etc.
[0125] The learning (L) term is provided by the learning sub-model LSM, which may comprise
a general linear model (multivariate regression model) or a third artificial neural
network (regression network) with between 1 and 10 hidden layers of 3 to 100 hidden
units each. The learning sub-model LSM is trained to predict the fit rating for each
relevant body measurement type (depending on the product category, for example for
pants: waist circumference, hips circumference, crotch length, inseam length, leg
opening, etc.) from the user's relevant body measurements (according to the first
and/or the second body measurement vector y_1, y_2), gender, input height, fit preference
and product sizing chart according to the mapping function:

wherein SC is the target product/brand sizing chart; EBM is the extracted body measurements
of the user according to the first and/or the second body measurement vector y_1, y_2;
UFP is the person's (or: user's) fit preference; UH is the person's (or: user's) height;
UG is the person's (or: user's) gender; and UR is a user rating. As illustrated by
the middle terms of the above equation, the input of the learning sub-model LSM may
be divided into a feature vector u_i relating to a particular person (user or "subject") i,
and a feature vector p_j,s comprising clothing item information, in particular information
related to a specific "target", i.e. target product/brand j and/or sizing s.
[0126] Herein, instead of directly using a distance metric to determine the fit rating, a
distance metric (e.g., Mean Squared Error (L2) or similar) is used to calculate the loss
of the learning sub-model LSM during training via gradient descent-based optimization.
In contrast to the non-learning approach described in the following, here an additional
input UR ("user rating") is introduced. This can be seen as a content-based recommender
system which predicts the fit rating given the similarity of the person's features to the
fit ratings for persons of similar body type, as well as the similarity of products they
ordered and kept in the past and/or dissimilarity with products they ordered and sent back
in the past (encapsulated e.g. in "return information"), as well as the similarity of the
target product/brand, where the similarity measure is implicitly learned. In other words,
the user rating may encapsulate information about products the same person has previously
ordered and kept and/or about items of clothing the same person has previously returned.
[0127] The non-learning (NL) approach uses a distance function ϕ using a distance metric
between the measurements present in the sizing chart of the target product/brand (i.e.
parameters of at least one item of clothing) and the same body measurements according
to the provided first and/or the second body measurement vector y_1, y_2.
[0128] The fit likelihoods (i.e. numbers indicating whether a particular sizing is the
best-fitting sizing for the person 2) for each sizing of a certain item of clothing
and brand may be calculated by building a sizing vector sz_sizing for each available
sizing as sz_sizing = (m_1,avg, m_2,avg, ..., m_K,avg) with the mean reference values
for each of the K body measurement types (measurement points), for each of the M
available sizes of a product.
[0129] The distance function ϕ may then receive parameters as input and may then output
a second fit likelihood vector and/or a fit output score fr_NL, i.e.:
$$fr_{NL} = \phi(SC, PF, EBM, UFP, UH, UG)$$
wherein SC is the target product/brand sizing chart; PF is the product fit modifier;
EBM is the extracted body measurements of the user according to the first and/or the
second body measurement vector y_1, y_2; UFP is the person's (or: user's) fit preference;
UH is the person's (or: user's) height; UG is the person's (or: user's) gender. Any
or all of these parameters may be removed or replaced with other suitable parameters,
and the user requesting the clothing size recommendation may be asked (e.g. by an
app, a web user interface, a web application and/or the like) to input any or all
of these parameters.
[0130] The distance function ϕ can use any distance metric such as Mean Square Error (MSE)
or the like.
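A minimal sketch of the non-learning term is given below. It builds one sizing vector per sizing from the minimum and maximum values of a sizing chart and compares it with the body measurements via the mean squared error. Taking the mean reference value as the midpoint of minimum and maximum, returning the negative MSE as the rating (so that a higher value means a better fit), and ignoring PF, UFP, UH and UG are all assumptions made for brevity; the function names and the example values are hypothetical.

```python
def sizing_vector(ranges: list[tuple[float, float]]) -> list[float]:
    """Mean reference value per measurement (assumption: midpoint of min and max)."""
    return [(lo + hi) / 2 for lo, hi in ranges]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equally long vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def non_learning_fit_ratings(
    chart: dict[str, list[tuple[float, float]]], body: list[float]
) -> dict[str, float]:
    """Rate each sizing by the negative MSE between sizing vector and body measurements."""
    return {size: -mse(sizing_vector(ranges), body) for size, ranges in chart.items()}


# Hypothetical chart with two measurements (chest, waist) in cm.
chart = {"S": [(86.0, 90.0), (70.0, 74.0)], "M": [(90.0, 94.0), (74.0, 78.0)]}
ratings = non_learning_fit_ratings(chart, body=[92.0, 76.0])
best = max(ratings, key=ratings.get)
```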
[0131] Preferably, depending on how much data is available, the weights of the learning
(L) and non-learning (NL) terms are changed such that more weight is put on the
NL term when labelled data is scarce and, vice versa, the weight of the L term is increased
as the dataset grows. This can for example be encoded by providing that:

wherein D_I is the number of users that ordered a particular item and provided fitting
information for it (preferably in relationship to at least one of their own body
measurements, for example "waist was too narrow" together with information that the user
has a specific waist body measurement), and τ is a threshold constant between 10 and
1,000,000. The labelled dataset that is used for training the third artificial neural
network grows automatically as users who order using the recommendation and try the
clothes on submit fitting information as feedback (e.g. on the recommendations).
[0132] As soon as more than τ ratings have been submitted by shoppers for a product,
w_L > 0 and the learning sub-model LSM starts to contribute towards the total fit rating
vector fr.
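One possible weighting schedule that is consistent with the behaviour described here (w_L is zero up to τ ratings and grows afterwards, with both weights adding up to one) is sketched below. The specific growth rule w_L = 1 - τ/D_I and the function name `term_weights` are assumptions chosen purely for illustration and should not be read as the formula of the present example.

```python
def term_weights(num_ratings: int, tau: int) -> tuple[float, float]:
    """Return (w_NL, w_L) for an item with num_ratings submitted fit ratings.

    Hypothetical schedule: w_L = 0 while num_ratings <= tau,
    afterwards w_L = 1 - tau / num_ratings; w_NL = 1 - w_L.
    """
    if tau <= 0 or num_ratings < 0:
        raise ValueError("tau must be positive and num_ratings non-negative")
    w_l = 0.0 if num_ratings <= tau else 1.0 - tau / num_ratings
    return 1.0 - w_l, w_l


w_nl, w_l = term_weights(num_ratings=20, tau=10)
```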
[0133] The fit rating vector fr (indicating for each sizing how well it fits the person
2) can be converted into a fit likelihood vector fl for all available sizes by applying
a softmax function σ over the fit rating vector fr:
$$fl = \sigma(fr)$$
[0134] The item of clothing and sizing most likely to fit the person is then predicted e.g.
via the formula argmax_{sz_i}(fl).
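The conversion of fit ratings into fit likelihoods and the selection of the most likely sizing can be sketched as follows. The sketch assumes one scalar fit rating per sizing; the function names `softmax` and `best_sizing` and the example values are hypothetical.

```python
import math


def softmax(fr: list[float]) -> list[float]:
    """Numerically stable softmax: exp(fr_i) / sum_j exp(fr_j)."""
    if not fr:
        raise ValueError("fit rating vector must not be empty")
    shift = max(fr)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in fr]
    total = sum(exps)
    return [e / total for e in exps]


def best_sizing(sizings: list[str], fr: list[float]) -> tuple[str, float]:
    """Return the sizing with the highest fit likelihood and that likelihood."""
    if len(sizings) != len(fr):
        raise ValueError("one fit rating per sizing is required")
    fl = softmax(fr)
    k = max(range(len(fl)), key=fl.__getitem__)
    return sizings[k], fl[k]


size, likelihood = best_sizing(["S", "M", "L"], [-16.0, 0.0, -4.0])
```

Because the softmax is monotonic, the argmax over fl selects the same sizing as the argmax over fr; the softmax is only needed when the likelihoods themselves are to be shown to the user.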
[0135] In Fig. 6 an exemplary flow diagram according to further embodiments of the present
invention is shown, in particular according to embodiments of the first aspect and/or
of the second aspect. In the lower part of Fig. 6, relevant measurements for a particular
product by a particular brand of the category 'bottoms' (specific: pants) are shown
in an exemplary user interface 30 as: waist, hips, crotch, inseam, and the user may
rate them e.g. according to eleven levels from "Tight" to "Loose" or from "Too short"
to "Too long". The relevant measurements depend on the sizing data. The ratings are
0 if the product is just the right size (no markings by the user in the graphical
user interface), positive if the product is too big (or too loose) on that relevant
measurement point and negative if the product is too small (or too tight).
[0136] According to Fig. 6, a process may be as follows: a user 10 decides to shop at an
online web shop 20 for an item of clothing.
[0137] Three options are in principle available for the user 10 to provide their body measurements.
As an option A, the user may use the method according to any embodiment of the first
aspect of the present invention, in particular according to any embodiment or embodiments
of Fig. 1 to Fig. 4, to generate the first and/or the second body measurement vector
y_1, y_2. For example, in order to provide option A, next to a specific item of clothing that
the user 10 is interested in, the browser displays on the web site of the web shop
20 a button indicating "find my size" or something similar. The result of clicking
this button (or otherwise activating the button, e.g. by touching it on a touch screen
device) can depend on which device is used to display the browser. If the browser
is already displayed by a mobile device 1 comprising a suitable camera, for which a
check may be made, then the app may immediately be started as a web application, for
example in a new browser tab.
[0138] If the user 10 is browsing the web shop 20 on a device without a suitable camera,
e.g. a desktop PC, the user 10 may be provided automatically with a QR code and/or
a URL. The user 10 can then either scan the QR code with a mobile device with a camera,
or enter the URL in a mobile device with a camera; as a result, a browser tab
with the app will again be opened.
[0139] Once the user 10 has been guided to the app, they follow the instructions provided
automatically by the app to complete the scan and record a video of themselves (or
another person 2) rotating. The video can either be recorded by the user 10 alone, by
placing the mobile device on the ground, or be taken by a friend nearby. When
the measurement extraction is completed the user 10 sees a message to return
to the web shop 20 either on the desktop PC or on the mobile device 1. From now on, as
will be described in the following, next to every product they see on the web shop
20, they will be automatically shown e.g. the recommended sizes (or sizings) for the
top two most likely-to-fit product sizes and the respective fit score (fit rating)
for the single most likely-to-fit sizing. Using the app, the user 10 may be prompted
to also enter user parameters such as user gender UG, user height UH, user fit preference
UFP and so on. Any or all of these may be marked by the app as "required" or as "optional".
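The display of the top two most likely-to-fit sizings together with the fit score of
the single most likely-to-fit sizing may, for example, be derived as sketched below.
The function name recommend_sizes, the dictionary layout and the numerical values are
assumptions for this illustration only.

```python
def recommend_sizes(fit_ratings, top_k=2):
    """Rank sizings by fit rating; return the top_k sizings and the best score."""
    if not fit_ratings:
        raise ValueError("at least one sizing is required")
    # Sort the sizing labels by their fit rating, best first.
    ranked = sorted(fit_ratings, key=fit_ratings.get, reverse=True)
    best = ranked[0]
    return {
        "recommended_sizings": ranked[:top_k],
        "best_fit_score": fit_ratings[best],
    }

# Hypothetical fit ratings of one product for one user.
ratings = {"S": 0.31, "M": 0.82, "L": 0.64, "XL": 0.12}
rec = recommend_sizes(ratings)
print(rec["recommended_sizings"], rec["best_fit_score"])
```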
[0140] The recorded video, optionally together with acquired angles and orientation of the
mobile device 1 over time, and further optionally user parameters (height, gender,
fit preference) can be sent to a remote computing device (such as a web server or
a virtual machine, VM, in the cloud) which temporarily stores the video locally and
stores the rest in a database attached to the browser session id.
[0141] The remote computing device may run the algorithm to select the images/frames according
to the method described in the foregoing and reconstruct the human body and extract
measurements. The resulting measurements are saved in the database and attached to
the browser session id. The video and selected frames are preferably deleted afterwards
to ensure privacy. For every new browser session id a (preferably human-readable)
unique identifier (UID) is generated. It is attached to the initial browser session
id on the web shop 20 where the scan was initiated from, but can be reused on any
other web shop 20 that employs the methods described herein to directly get recommendations
without the person 2 having to be scanned anew.
[0142] As an alternative option B, the user 10 may choose to provide manual measurements
MM.
[0143] As a third alternative option C, the user 10 may input (e.g. into the app) a unique
identifier UID which has already been previously provided to them and which encapsulates
or indicates the previously generated first and/or the second body measurement vector
y_1, y_2 (or: EBM).
[0144] Options B and C may be offered to the user 10 within the app, or may be offered to
the user 10 as alternatives to the app, for example as separate buttons shown by the
browser displaying the website of the web shop.
[0145] In any case, the body measurements provided according to the chosen option A, B,
or C are provided to a service provider, in particular to a web shop 20. In addition,
user parameters as have been described in the foregoing may also be provided to the
web shop 20 such as user fit preference UFP, user height UH and/or user gender UG.
In case the user 10 already has a unique identifier UID, such user parameters may
already be stored as linked to said unique identifier UID so that, choosing option
C, the user 10 conveniently does not have to input them again.
[0146] Optionally, in addition return information RI may be provided to (and/or may be already
stored at) the web shop 20, for example as has been described in the foregoing. Further
optionally, clothing item information CII may be provided to the web shop 20, e.g.
from a sizing chart SC.
[0147] The user 10 is then provided with a clothing size recommendation REC for at least
one item of clothing, for example for a type of pants that the user is interested
in. The clothing size recommendation REC may be made according to any embodiment of
the method according to the second aspect of the present invention, in particular
according to any variant as described with respect to Fig. 5 in the foregoing.
[0148] The user 10 then makes an order ORD of a particular sizing as recommended to the
user 10. If the user 10 has used option A and has not previously had a unique identifier
UID provided then, as a result of the order ORD (as illustrated schematically in Fig.
6), or even as a result of performing the steps of option A (i.e. of having the first
and/or the second body measurement vector y_1, y_2 generated for them), the user 10 may
be automatically provided with a unique identifier UID (upper dashed line in Fig. 6),
which will be linked with the provided first and/or second body measurement vector
y_1, y_2, which will be stored in a database DB. Any new or updated user parameters
such as user fit preference UFP, user gender UG, user height UH may also be linked
to the unique identifier UID and stored in the database DB.
[0149] Said user 10 may then, for their next shopping, choose option C of simply entering
their unique identifier UID into a prompt (e.g. in a web interface, a web application,
or in any other type of interface), as a result of which the corresponding first and/or
the second body measurement vector y_1, y_2 (or: EBM) will be retrieved from the database DB.
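The linking of body measurements and user parameters to a unique identifier UID, and
their later retrieval under option C, may be sketched as follows. An in-memory
dictionary stands in for the database DB; the class names, field names and values are
hypothetical and chosen for this illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoredProfile:
    """Data linked to one UID; no images or video are kept."""
    body_measurements: list                     # y_1 and/or y_2 (or: EBM)
    user_height: Optional[float] = None         # UH
    user_gender: Optional[str] = None           # UG
    user_fit_preference: Optional[str] = None   # UFP

class MeasurementDB:
    """Minimal stand-in for the database DB."""

    def __init__(self):
        self._by_uid = {}

    def link(self, uid, profile):
        # Store (or update) the profile linked to the given UID.
        self._by_uid[uid] = profile

    def retrieve(self, uid):
        # Option C: look up a previously stored profile; None if unknown.
        return self._by_uid.get(uid)

db = MeasurementDB()
db.link("A1b2C3", StoredProfile([92.0, 78.5, 99.0], user_height=178.0,
                                user_fit_preference="regular"))
profile = db.retrieve("A1b2C3")
print(profile.body_measurements if profile else "unknown UID")
```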
[0150] In case the user 10 then wants to return, RET, the ordered item of clothing, the
user 10 will have to fill in a user interface 30 inquiring about fitting information
of the various body measurements of the user 10 as related to the specific purchased
item of clothing being returned. As has been described in the foregoing, a schematic
user interface 30 is shown in Fig. 6.
[0151] As is also illustrated in Fig. 6, the information entered by the user 10 may then
form part of the return information RI that is used to determine, in step S200 and
step S210, the next clothing size recommendation for the same user 10 who is preferably
identified by their unique identifier UID.
[0152] Moreover, preferably the third artificial neural network used in the learning sub-model
LSM is then trained, or updated, based on the information input by the user 10 into
the user interface 30 (or based on the entirety of the return information RI).
[0153] In other words, the third artificial neural network may be pre-trained and may then
be trained by the return information provided by all of the users 10. The method as
outlined above ensures that for the first few (depending on the choice of τ) uses
of the described system by the user 10, the non-learning sub-model NLSM will provide
suitable results, and that with increasing return information (for the rare cases
in which the NLSM was incorrect) the learning sub-model LSM will improve, and weigh
in on the clothing size recommendation, more and more. The users 10 will thus help
to train the learning sub-model LSM, which guarantees increasing satisfaction of the users
10 with the process described herein. Thus, the present invention also teaches a method
for training a learning sub-model, LSM, more specifically an artificial neural network
of a learning sub-model, LSM, using return information provided by a plurality of
users.
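The gradual hand-over from the non-learning sub-model NLSM to the learning sub-model
LSM may be illustrated by the following sketch. The linear ramp of the weight w_L and
the convex combination of the two fit rating vectors are assumptions made purely for
this example; the parameter ramp and both function names are hypothetical. Only the
property that w_L is zero until more than τ ratings have been submitted is taken from
the description.

```python
def learning_weight(num_ratings, tau, ramp=50):
    """Weight w_L of the LSM: zero up to tau ratings, then an assumed linear ramp."""
    if num_ratings <= tau:
        return 0.0
    # Assumption: w_L grows linearly and saturates at 1 after `ramp` more ratings.
    return min(1.0, (num_ratings - tau) / ramp)

def total_fit_rating(fr_nlsm, fr_lsm, w_l):
    """Assumed convex combination of the sub-model fit rating vectors."""
    return [(1.0 - w_l) * n + w_l * l for n, l in zip(fr_nlsm, fr_lsm)]

fr_nlsm = [1.0, 2.0, 0.5]   # hypothetical NLSM fit ratings per sizing
fr_lsm = [3.0, 4.0, 0.5]    # hypothetical LSM fit ratings per sizing

for n in (5, 10, 35, 200):
    w = learning_weight(n, tau=10)
    print(n, w, total_fit_rating(fr_nlsm, fr_lsm, w))
```

With these assumptions, the recommendation relies on the NLSM alone for the first τ
ratings and the contribution of the LSM increases as further return information is
collected.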
[0154] The unique identifier UID is preferably a short code of between 3 and 6 digits
and letters in lowercase or uppercase (e.g. 3G7, 412k, OL65A, or A1b2C3) that is attached
to the browser session ID of the web shop 20 from which the user 10 was routed to the
scanning app used for acquiring the video of the user 10. After the EBM are extracted
from the video, preferably all visual information is deleted and only the information
needed for the clothing size recommendations (e.g. EBM, user height UH, user gender
UG and user fit preference UFP) corresponding to the unique identifier UID is stored
in the database DB. The unique identifier UID is anonymous in the sense that it only
connects a browser session ID and a set of letters and digits to a limited set of
values, which are not unique to a specific user/person. No names, emails or other
information which makes it easy to identify a person can be inferred from the unique
identifier UID.
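A short code of the kind described above may, for example, be generated as sketched
below. The use of a cryptographically secure random source, the uniform choice of the
code length and the collision check against already issued codes are assumptions of
this illustration; the function name generate_uid is hypothetical.

```python
import secrets
import string

# Digits and letters of lower or upper case, as in the examples 3G7 or A1b2C3.
ALPHABET = string.ascii_letters + string.digits

def generate_uid(existing, min_len=3, max_len=6):
    """Return a code of min_len..max_len characters not contained in `existing`."""
    while True:
        length = min_len + secrets.randbelow(max_len - min_len + 1)
        uid = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if uid not in existing:   # retry on the (rare) collision
            return uid

issued = set()
for _ in range(5):
    code = generate_uid(issued)
    issued.add(code)
    print(code)
```

Since the code is drawn at random and carries no user data itself, it merely serves as
a key under which the stored values can be looked up.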
[0155] For further improved data protection, optionally, upon submitting a video by the
app to a server, or even already during the acquisition of the video, the face of
the person in the video can be hidden e.g. via blurring so that identification in the time
span between uploading and generating measurements is not possible. The unique identifier
UID can be re-used on the same web shop 20 after the browser session has expired or
when shopping from another device, or can be shared with friends and family to get product
size recommendations for the person (user 10) it was generated for.
[0156] The unique identifier UID can also be used on other supported web shops 20 to get
size recommendations for their fashion products. The unique identifier UID does not
grant access to view the concrete measurements; it only allows the stored measurements
that were extracted from the video, and/or the recommendations associated with them,
to be used. The unique identifier UID can also be used offline in retail stores
to always get the most-likely-to-fit size of clothes and try them on. Here the user
10 opens, for example, a web application in the browser by scanning a code (e.g. a
visual code such as a QR code or barcode, or another type of code such as a wireless
RFID code) on the clothes tags/labels, or navigates directly to the web application and
scans the barcode of the product. If the user does not already have a unique identifier
UID from before, the scanning process can also be done in a fitting room in the store.
[0157] Fig. 7 shows a schematic block diagram illustrating a computer program product 400
comprising executable program code 450 configured to, when executed (e.g. by a computing
device), perform the method according to an embodiment of the first aspect of the
present invention and/or according to an embodiment of the second aspect of the present
invention, in particular as has been described with respect to any of Figures 1 through
6.
[0158] Fig. 8 shows a schematic block diagram illustrating a non-transitory computer-readable
data storage medium 500 comprising executable program code 550 configured to, when
executed (e.g. by a computing device), perform the method according to an embodiment
of the first aspect of the present invention and/or according to an embodiment of
the second aspect of the present invention, in particular as has been described with
respect to any of Figures 1 through 6.
[0159] Fig. 9 shows a schematic block diagram illustrating a system 5 according to an embodiment
of the fifth aspect of the invention, i.e. a system 5 configured to perform the method
according to any embodiment of the first aspect of the present invention and/or configured
to perform the method according to any embodiment of the second aspect of the present
invention, in particular in accordance with any of the methods and variants thereof
described with respect to Fig. 1 to Fig. 6. The system 5 may in particular comprise
a mobile device 1 (such as a smartphone, a tablet computer and/or the like) and a
remote computing device 6 (such as a cloud computing platform or a web server). The
remote computing device 6 may comprise a database DB and/or may be operatively coupled
to a database DB of the system 5 for storing data thereon, for example learning models
to be trained, return information, user parameters, unique identifiers and/or the
like.
[0160] In the foregoing detailed description, various features are grouped together in the
examples with the purpose of streamlining the disclosure. It is to be understood that
the above description is intended to be illustrative and not restrictive.
[0161] In a short summary, the present invention provides at least a method for determining
at least one body measurement by obtaining a plurality of images, in particular a
video, of a user. The video can be easily acquired by the user or a friend using a
mobile device comprising a camera, such as a digital camera, a smartphone, a tablet
computer and/or the like. Optionally, the invention also provides generating a clothing
size recommendation based at least on the determined at least one body measurement.
The clothing size recommendation may take additional parameters into account. Both
for the determining of the body measurements and for the generating of the clothing
size recommendation, artificial intelligence entities, in particular machine learning
entities, more particularly artificial neural networks, may be used. Preferably, return
information comprising ratings of users of whether and how well certain items of clothing
fit them in the past is collected and used for the training of an artificial intelligence
entity.