BACKGROUND
[0001] Transportation of items is typically a delivery process that includes personnel identifying
items that need to be transported; loading the items onto a transportation vessel, such
as a vehicle, a ship, a train, a plane, and the like; operating the transportation
vessel to a drop-off location; and unloading the items at the drop-off location. This
delivery process may be performed by delivery companies, moving companies, personnel
relocating to another home, and so forth. Efficiently completing this delivery process
provides these personnel with more resources or time to complete other tasks. Computer-implemented
technologies can assist users in facilitating the delivery process. Example computer-implemented
technologies include computer applications or services such as global positioning
systems (GPS) to facilitate navigation of the transportation vessel to the drop-off
location, task-managing services to facilitate scheduling the delivery, and the like.
SUMMARY
[0002] This Summary is provided to introduce a selection of concepts in a simplified form
that are further described below in the Detailed Description. This Summary is not
intended to identify key features or essential features of the claimed subject matter,
nor is it intended to be used as an aid in determining the scope of the claimed subject
matter.
[0003] The technologies described in this disclosure are directed toward computerized systems
and methods for providing analytics indicative of object detection or fill-level detection
at or near real-time based on video data captured during an unloading or loading process,
for example, to assess efficiency. A computerized system may detect and classify an
object based on the video data. The computerized system may determine a fill-level,
for example, expressed as a percentage, fraction, time until completion, or visual
progress indication, based on the video data.
[0004] In particular, an aspect of the present disclosure may include receiving a video
of the real-time or near real-time state of the inside of a storage compartment, such
as the inside of a loading trailer, while items are being loaded into or unloaded from the
storage compartment. The video may include a video stream of the storage compartment
as captured by a camera, such as a monocular camera. Video data may be extracted from
the video, to determine at least one video data feature. The video data feature may
correspond to an individual measurable property, such as a characteristic of an object
being detected in the video (e.g., referred to as "object-detection feature") or of
a fill-level of the unloading process (e.g., referred to as "fill-level data feature").
The detectable object may include a human object, a pallet object, a load stand object,
a parcel retainer object, a parcel on a conveyor belt object, a forever bag object,
a conveyor belt object, a small container object, and the like. The fill-level
may include an indication of the level of fullness of items in the storage compartment,
for example, expressed as a percentage (e.g., percent full), fraction (1/4, 1/3, 1/2,
and so forth), a time to completion of the loading (to 100%) or unloading (to 0% full),
or any other visual indication.
[0005] Aspects of the present disclosure include determining a fill-level associated with
the inside of the storage compartment based on at least the video and the fill-level
data feature, as well as detecting, tracking, and/or classifying an object detected
inside the storage compartment based at least on the video and the object-detection
data feature. Thereafter, analytics indicative of the determined fill-level and the
classified object may be generated at or near real-time. The analytics may be generated
based on a classification and/or predictive machine learning model, for example. Aspects
of the present disclosure include communicating the analytics to an external device,
such as a display, to provide real-time indications regarding the objects detected
and the fill-level of the storage compartment.
[0006] In this manner, the various implementations described herein provide a technique
for computing systems that applies image processing and machine learning techniques
to a video data stream to generate analytics associated with the unloading or loading
process at or near real-time. Whereas conventional approaches fail to provide any
analytics in association with a video of the unloading process as it occurs, and instead
may require an administrator to manually monitor the individual in-bound/out-bound
bay doors (which may cause unnecessary expenses associated with having a dedicated
administrator, for each storage compartment being loaded/unloaded, who may fail to
manually identify objects or determine a fill-level), aspects of the present disclosure
can automatically determine a fill-level of the storage compartment, as well as detect,
track, and classify an object in the storage compartment, at or near real-time. By
automatically determining a fill-level and classifying an object, target analytics
may be determined and communicated in the form of graphical user interface (GUI) elements,
alerts, data analytics to provide real-time visibility, analytics, estimates to completion,
and so forth.
[0007] Accordingly, present embodiments provide improved technologies for generating, at
or near real-time, analytics indicative of a fill-level and objects detected in a
computationally inexpensive manner (e.g., employing computationally inexpensive machine
learning techniques on two-dimensional video frames) and reduce the error and expenses
that would otherwise be associated with having a dedicated administrator make such determinations
for a large number of storage compartments being loaded or unloaded.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The technology described herein is described in detail below with reference to the
attached drawing figures, wherein:
FIG. 1 is a block diagram of an exemplary computing environment suitable for use in
implementing some embodiments of this disclosure;
FIG. 2 is a block diagram illustrating an example system in which some embodiments
of this disclosure are employed;
FIG. 3A is a flow diagram of an example process for employing an object-detection
machine learning (ML) model and a fill-level ML model on video frames captured by
a camera, according to some embodiments of this disclosure;
FIG. 3B is a screenshot of an example graphical user interface (GUI) including a stream
region and an analytics region, according to some embodiments of this disclosure;
FIG. 3C is a screenshot of an example GUI including a stream region and an analytics
region, according to some embodiments of this disclosure;
FIG. 3D is a flow diagram of an example process for employing an object and fill-level
analytics deploying engine on outputs of an object-detection ML model and a fill-level
ML model, according to some embodiments of this disclosure;
FIG. 3E is a screenshot of an example GUI generated by an object and fill-level analytics
deploying engine, according to some embodiments of this disclosure;
FIG. 3F is a flow diagram of an example process for employing an object and fill-level
analytics deploying engine on outputs of an object-detection ML model, according to
some embodiments of this disclosure;
FIG. 3G is a screenshot of an example GUI generated by an object and fill-level analytics
deploying engine, according to some embodiments of this disclosure;
FIG. 4A is a flow diagram of an example process for detecting, tracking, and classifying
objects from video frames captured by a camera, according to some embodiments of this
disclosure;
FIG. 4B is a flow diagram of an example process of an object-detection ML model being
applied to detect objects in video frames, according to some embodiments of this disclosure;
FIG. 4C is a flow diagram of an example process of an object-detection ML model being
applied to detect a status associated with a video frame, according to some embodiments
of this disclosure;
FIG. 4D is a flow diagram of an example process of an object-detection ML model being
applied to detect a status associated with a video frame, according to some embodiments
of this disclosure;
FIG. 4E is a flow diagram of an example process of an object-detection ML model being
applied to detect a status associated with a video frame, according to some embodiments
of this disclosure;
FIG. 4F is a flow diagram of an example process of an object-detection ML model being
applied to detect objects in video frames, according to some embodiments of this disclosure;
FIG. 4G is a flow diagram of an example process of an object-detection ML model being
applied to detect objects in video frames, according to some embodiments of this disclosure;
FIG. 4H is a graph diagram of a time graph of results of an object-detection ML model
being applied to determine objects in video frames, according to some embodiments
of this disclosure;
FIG. 4I is a graph diagram of an example time graph of results of an object-detection
ML model being applied to track objects in video frames, according to some embodiments
of this disclosure;
FIG. 5A is a flow diagram of an example process for post-processing data associated
with classified objects, according to some embodiments of this disclosure;
FIG. 5B is a graph diagram of an example time graph of results of an object-detection
ML model being applied before a post-processing engine is employed, according to some
embodiments of this disclosure;
FIG. 5C is a graph diagram of an example time graph of results of an object-detection
ML model employing a post-processing engine, according to some embodiments of this
disclosure;
FIG. 6A is a flow diagram of an example process for determining and tracking a fill-level
from video frames captured by a camera, according to some embodiments
of this disclosure;
FIG. 6B is a screenshot of edges detected inside of a storage compartment by employing
a fill-level ML model, according to some embodiments of this disclosure;
FIG. 6C is a screenshot of a fill-level ML model applied to determine a fill-level
based on edges detected inside of a storage compartment, according to some embodiments
of this disclosure;
FIG. 7A is a flow diagram of an example process for post-processing data associated
with fill-level, according to some embodiments of this disclosure;
FIG. 7B is a graph diagram of an example time graph of results of a fill-level ML
model employing a post-processing engine, according to some embodiments of this disclosure;
FIG. 7C is a graph diagram of an example time graph of results of a fill-level ML
model employing a post-processing engine, according to some embodiments of this
disclosure;
FIG. 8 is a flow diagram of an example process for generating and communicating analytics
indicative of a fill-level and a classified object, according to some embodiments
of this disclosure;
FIG. 9 is a flow diagram of an example process for causing presentation of a GUI that
includes a stream region and an analytics region, according to some embodiments of
this disclosure;
FIG. 10 is a flow diagram of an example process for deploying a trained fill-level
ML model and a trained object-detection ML model, according to some embodiments of
this disclosure;
FIG. 11 is a block diagram of a computing device in which embodiments of this disclosure
may be employed; and
FIG. 12 is a block diagram of a computing environment in which embodiments of the
present disclosure may be employed.
DETAILED DESCRIPTION OF THE INVENTION
[0009] The subject matter of aspects of the present disclosure is described with specificity
herein to meet statutory requirements. However, the description itself is not intended
to limit the scope of this patent. Rather, the inventors have contemplated that the
claimed subject matter might also be embodied in other ways, to include different
steps or combinations of steps similar to the ones described in this document, in
conjunction with other present or future technologies. Moreover, although the terms
"step" and/or "block" may be used herein to connote different elements of methods
employed, the terms should not be interpreted as implying any particular order among
or between various steps herein disclosed unless and except when the order of individual
steps is explicitly described. Each method described herein may comprise a computing
process that may be performed using any combination of hardware, firmware, and/or
software. For instance, various functions may be carried out by a processor executing
instructions stored in memory. The methods may also be embodied as computer-usable
instructions stored on computer storage media. The methods may be provided by a standalone
application, a service or hosted service (standalone or in combination with another
hosted service), or a plug-in to another product, to name a few.
[0010] Aspects of this disclosure are directed toward computer technologies for providing
analytics indicative of object detection or fill-level detection at or near real-time
based on video data associated with a video (e.g., live video stream) captured by
a camera during an unloading or loading process. For example, and at a high level,
a computerized system may detect, track, and classify an object based on the video
data, and the computerized system may determine, based on the video data, a fill-level
expressed as a percentage or time until completion.
[0011] As used herein, the "video stream" or "video" may correspond to a continuous transmission
of video frames or images from one device, such as a camera or a server, to another
device, such as a server, computing device, or client device. From the video, computer
technologies may extract video data, which, as used herein, may refer to disparate
types of structured or unstructured data that can be determined from the video, including
but not limited to video data indicative of pixels and associated time stamps, changes
in pixels, and so forth. In some embodiments, the video data is frame-specific. As
used herein, the "loading process" or the "unloading process" may refer to the steps
associated with loading items into or unloading items from a storage compartment, either through
a manual process, an automatic process (for example, using robots and other computing
systems, such as those discussed herein, among others), or a combination thereof. Although
the embodiments discussed herein include people performing the loading/unloading process,
it should be understood that any alternative or additional mechanisms may be used
to perform the loading/unloading process, including but not limited to robotic devices,
an autonomous shifter, an autonomous loader, building equipment (such as cranes, backhoes,
and so forth), or conveyor belts, to name a few. The "loading/unloading process" may
be used to refer collectively to the process of displacing items within (by loading
the items into and/or unloading the items out of) the storage compartment. As used
herein, "storage compartment" may refer to any suitable space that temporarily stores
items, for example, for temporary storage, cooling, transportation, and/or so forth.
Example storage compartments include a trailer, a storage facility sitting on real
property, a warehouse, or an office space, to name a few.
[0012] Conventionally, existing approaches may fail to provide users with computing logic
and infrastructure to generate analytics at or near real-time indicative of progress
during the delivery process. As a result, certain existing approaches cause inefficiencies
for enterprises or personnel looking to improve their operations by ensuring that
items have been unloaded and planning for future deliveries. In particular, many existing
approaches fail to provide a resource- and computationally efficient technique for
generating, at or near real-time, analytics associated with loading items onto and unloading
items from a storage compartment during the delivery process; the implementation of
which is difficult to achieve in practice given the engineering, hardware, and data
processing challenges of such a possible solution.
[0013] As such, many existing approaches suffer from inefficiencies, such as: failing to
ensure that items have been unloaded, not allowing for the foresight to plan future
deliveries, providing inaccurate predictions regarding completion times that enterprises
and personnel rely on to plan future deliveries, and so forth. At most, existing approaches
may employ purely historical data to predict an ending time of the unloading/loading
process. However, the historical data fails to account for how the current loading/unloading
process may differ from the samples making up the historical data (for example, based
on the number of employees currently completing the loading/unloading process, the
size of the current storage compartment, the number of items to load/unload, and so
forth). As such, the existing approach employing historical data may be inaccurate
and infeasible across different loading/unloading processes. First, as a result of
this inaccuracy, the door to the storage compartment may degrade at a faster rate
due to opening and closing the door more often than necessary. For example, the door
may close because the historical data inaccurately predicted that the loading/unloading
process has been completed, but the door would have to be reopened since the loading/unloading
process has in fact not been completed. Second, as a result of this inaccuracy, the
door to the storage compartment may close when it has been inaccurately predicted
based on this historical data that the loading/unloading process has been completed.
For example, the door may close when there are personnel still performing the loading/unloading
process. Third, in some instances, an automatic system (e.g., employing an automatic
door and a conveyor) may fail to load or unload parcels from a temporary storage compartment.
In these automatic systems, the door may automatically close and open based on the
historical data, such that any disruption to the unloading process may cause the automatic
door to close or open before the loading/unloading process is complete. As a result,
it may be beneficial to develop a system providing real-time analytics with an option
for detecting an object, such as a person, and further for providing real-time predictive
analytics to avoid inaccurate estimates regarding the loading/unloading process.
[0014] Another existing approach may include employing a LiDAR sensor. A LiDAR sensor is
a device that determines ranges by targeting an object or a surface with a laser
and measuring the time for the reflected light to return to a receiver of the LiDAR sensor. One
drawback of LiDAR sensors includes the computational expense associated with operating
the LiDAR sensor, which becomes more resource intensive as LiDAR sensors are added
to storage compartments. Further, many existing storage compartments do not currently
include LiDAR sensors, yet some existing storage compartments may include an existing
camera, such as a security camera or a monocular camera. Thus, leveraging hardware
that may currently exist in the storage compartment may facilitate scaling so that
object detection using the existing cameras may be incorporated into a wider range
of storage compartments. The burden associated with scaling may further be reduced
through the use of computationally inexpensive operations, such as those discussed
herein.
[0015] With this in mind, aspects of the present disclosure include receiving a video of
the real-time or near real-time state of the inside of a storage compartment, such
as the inside of a trailer while items are being loaded into or unloaded from the storage
compartment. The video may include a video of the storage compartment as captured
by a camera, such as a monocular camera. Video data may be extracted from the video,
to determine at least one video data feature. The video data feature may correspond
to an individual measurable property, such as a characteristic of an object being
detected in the video (e.g., referred to as "object-detection feature") or of a fill-level
of the unloading process (e.g., referred to as "fill-level data feature"). The video
data features can be conveniently described by a feature vector, such as an n-dimensional
vector of numerical features that represent some phenomenon, where n is any positive
integer, such as 1, 2, 3, and so forth. The video data features may be used to train
an object-detection machine learning (ML) model and a fill-level ML model, as discussed
herein.
[0016] In the context of training a ML model, such as the object-detection ML model and
the fill-level ML model discussed herein, the video data features may correspond to
"edges" and "objects." An "edge" in an image may refer to an image contour across
which the brightness or hue of the image changes, for example, in the magnitude or
in the rate of change in the magnitude. The edge may be detected using histograms
or Gaussian convolution numerical methods. The edges in an object may include
surface-normal discontinuities (e.g., top versus side), depth discontinuities (e.g., the side of
an object), surface-reflectance or reflective discontinuities (e.g., text or ink),
or illumination discontinuities (e.g., shadows), to name a few. An "object" in an
image may refer to an entity identified as separate from another entity or background,
for example, based on edges separating the entities from each other or from a background.
For example, an object may be detected based on a gradient threshold that is exceeded
for gradients between neighboring pixels.
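By way of a non-limiting illustration, the following Python sketch shows one conventional way to locate edges by thresholding the gradient magnitude between neighboring pixels using OpenCV's Sobel operator; the synthetic frame and the threshold value are assumptions made for this example only and are not part of the embodiments described herein.

```python
import cv2
import numpy as np

# Synthetic stand-in for a grayscale video frame: a vertical brightness step.
frame = np.zeros((480, 640), dtype=np.uint8)
frame[:, 320:] = 200

# Approximate horizontal and vertical brightness gradients with Sobel kernels.
gx = cv2.Sobel(frame, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(frame, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude per pixel; large values indicate abrupt brightness changes.
magnitude = np.sqrt(gx ** 2 + gy ** 2)

# Pixels whose gradient exceeds an (assumed) threshold are marked as edge pixels.
GRADIENT_THRESHOLD = 100.0
edges = (magnitude > GRADIENT_THRESHOLD).astype(np.uint8) * 255

print("edge pixels found:", int(np.count_nonzero(edges)))
```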
[0017] The object-detection ML model may be trained using supervised learning whereby training
data, such as labeled data and/or unlabeled data, is provided to the object-detection
ML model as discussed herein. For example, the labeled data may include a positive
or negative label for a corresponding edge or object. For example, labeled data for
a human object, a pallet object, a load stand object, a parcel retainer object, a
parcel on a conveyor belt object, a forever bag object, a conveyor belt object, or
a small container object, and the like, may be provided to the object-detection ML
model. However, it should be understood that the object-detection ML model may also
be trained via suitable techniques other than supervised learning, including unsupervised
or reinforcement learning, as may be appreciated by a person having ordinary skill
in the art.
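Purely as a hedged sketch of supervised training on labeled feature vectors (the synthetic data, the class names, and the use of scikit-learn are illustrative assumptions rather than requirements of the object-detection ML model described herein):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled training data: each row is an n-dimensional
# video data feature vector; labels name the object class for that vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                                # 300 samples, n = 8
y = rng.choice(["human", "pallet", "load_stand"], size=300)  # hypothetical labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A simple multinomial classifier stands in for the object-detection ML model.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```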
[0018] Similarly, the fill-level ML model may be trained using supervised learning whereby
training data, such as labeled data and/or unlabeled data, may be provided to the fill-level
ML model as discussed herein. For example, the labeled data may include a positive
or negative label for a corresponding edge or object. For example, labeled data for
a plurality of edges corresponding to intersections of planes (e.g., intersection
of the side wall with the floor), may be provided to the fill-level ML model. However,
it should be understood that the fill-level ML model may also be trained via
suitable techniques other than supervised learning, including unsupervised or reinforcement
learning, as would be appreciated by a person having ordinary skill in the art.
[0019] Aspects of the present disclosure include determining, using the fill-level ML model,
a fill-level associated with the inside of the storage compartment based on at least
the video and the fill-level data feature. Additionally, aspects of the present disclosure
include detecting, tracking, and classifying, using the object-detection ML model,
an object detected inside the storage compartment based at least on the video and
the object-detection data feature. Thereafter, analytics indicative of the determined
fill-level and the classified object may be generated at or near real-time. As discussed
herein, the output of the object-detection ML model (e.g., the classified object) and
of the fill-level ML model (e.g., the determined fill-level) may be post-processed
to improve results. Aspects of the present disclosure include communicating
the analytics to an external device, such as a display, to provide real-time indications,
symbols, or icons regarding the objects detected and the fill-level of the storage
compartment. Additionally, or alternatively, an autonomous shifter, an autonomous
loader, and/or a storage door may be operationally controlled based on the analytics
to coordinate other aspects of a delivery process, such as the loading/unloading process,
a transportation process, and so forth.
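The following sketch, using hypothetical names and thresholds, illustrates how per-frame outputs of the two models might be combined into analytics and into a conservative control signal for an automatic storage door; it is one possible policy offered for illustration, not the claimed implementation.

```python
from dataclasses import dataclass

@dataclass
class FrameAnalytics:
    fill_level_pct: float    # 0.0 (empty) to 100.0 (full)
    detected_classes: list   # e.g., ["human", "pallet"]

def door_may_close(analytics: FrameAnalytics, unloading: bool) -> bool:
    """Allow an automatic door to close only when the process appears complete
    and no human is detected inside the storage compartment (a cautious,
    hypothetical policy; the thresholds are assumptions)."""
    if "human" in analytics.detected_classes:
        return False
    if unloading:
        return analytics.fill_level_pct <= 1.0   # unloading targets ~0% full
    return analytics.fill_level_pct >= 99.0      # loading targets ~100% full

# Example usage with made-up per-frame values:
frame_analytics = FrameAnalytics(fill_level_pct=0.4, detected_classes=["conveyor_belt"])
print(door_may_close(frame_analytics, unloading=True))   # True
```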
[0020] Accordingly, embodiments described herein provide improved technologies for computer
systems for, among other aspects, enabling a computer system to train and employ the
object-detection ML model and the fill-level ML model to determine and generate analytics
indicative of an object detected or a fill-level associated with the inside of a storage
compartment during the loading/unloading process. As previously noted, many existing
approaches fail to provide a resource- and computationally efficient technique for
generating, at or near real-time, analytics associated with loading items onto and unloading
items from a storage compartment during the delivery process; the implementation of
which is difficult to achieve in practice given the engineering, hardware, and data
processing challenges of such a possible solution. For example, an existing approach
involves employing LiDAR sensors. However, a major drawback of LiDAR sensors is the
computational expense associated with operating the LiDAR sensors, which becomes
more resource intensive as LiDAR sensors are added to storage compartments. Further,
many existing storage compartments do not currently include LiDAR sensors.
[0021] In contrast, because many storage compartments may include a camera, scaling is facilitated
through the embodiments discussed herein, which employ computationally inexpensive
processing of two-dimensional images and machine learning operations. In
this way, embodiments provide new, enhanced object detection and fill-level determination
technologies during the loading/unloading processes and also reduce resources that
would be required from installation of new LiDAR sensors in each storage compartment
or from manual assignment of a designated administrator to each storage compartment.
[0022] Turning now to FIG. 1, a block diagram is provided showing an example operating environment
100 in which some embodiments of the present disclosure may be employed. It should
be understood that this and other arrangements described herein are set forth only
as examples. Other arrangements and elements (e.g., machines, interfaces, functions,
orders, and groupings of functions) can be used in addition to or instead of those
shown, and some elements may be omitted altogether for the sake of clarity. Further,
many of the elements described herein are functional entities that may be implemented
as discrete or distributed components or in conjunction with other components, and
in any suitable combination and location. Various functions described herein as being
performed by one or more entities may be carried out by hardware, firmware, and/or
software. For instance, some functions may be carried out by a processor or processing
circuitry executing instructions stored in memory.
[0023] Among other components not shown, example operating environment 100 includes a number
of user devices, such as user devices 102a and 102b through 102n; displays 103a and
103b through 103n; a number of data sources, such as data sources 104a and 104b through
104n; a number of cameras, such as cameras 105a and 105b through 105n; server 106;
sensors, such as sensors 107a and 107b through 107n; network 110; autonomous shifter
120; and autonomous loader 122. It should be understood that environment 100 as shown
in FIG. 1 is an example of one suitable operating environment. Any or each of the
components shown in FIG. 1 may be implemented via any type of hardware component having
a computing device, such as computing device 1100 as described in connection to FIG.
11, for example. These components may communicate with each other via network 110,
which may include, without limitation, one or more local area networks (LANs) and/or
wide area networks (WANs). In exemplary implementations, network 110 comprises the
Internet and/or a cellular network, amongst any of a variety of possible public and/or
private networks employing any suitable communication protocol.
[0024] It should be understood that any number of user devices, shifters, loaders, servers,
and data sources may be employed within operating environment 100 within the scope
of the present disclosure. Each may comprise a single device or multiple devices cooperating
in a distributed environment. For instance, server 106 may be provided via multiple
devices arranged in a distributed environment that collectively provide the functionality
described herein. Additionally, other components not shown may also be included within
the distributed environment.
[0025] As used herein, "autonomous shifter" refers to a transportation device, such as a
vehicle (e.g., electric vehicle), used for coupling to, decoupling from, and transporting
semitrailers and other containers (hereinafter referred to as "trailer" or "storage
compartments") storing items for transportation. In one embodiment, the autonomous
shifter comprises at least one computing device, such as the computing device 1100
shown in FIG. 11, configured to execute computer-readable instructions to automatically
couple the autonomous shifter to trailer(s), automatically transport the trailer(s)
to target drop-off locations, and/or automatically find another trailer(s) to couple
to.
[0026] As used herein, "autonomous loader" refers to any loading device, such as a robot,
used to automatically load items to a storage compartment (e.g., trailer) or unload
items from the storage compartment, for example, for the autonomous shifter to transport
the loaded or unloaded trailer as discussed herein. Example autonomous loaders may
include an autonomous forklift or conveyor system. In one embodiment, the autonomous
loader comprises a computing device, such as the computing device 1100 as shown in
FIG. 11, configured to execute computer-readable instructions to automatically load
items onto the trailer(s), automatically organize the items on the trailer(s), and/or
automatically unload the items from the trailer(s). In this manner, the autonomous
shifter 120 and the autonomous loader 122 may perform the delivery process discussed
above without any manual human intervention or with less manual human intervention
than using certain existing manual approaches.
[0027] User devices 102a and 102b through 102n, the autonomous shifter 120, and/or the autonomous
loader 122 can be client devices on the client-side of operating environment 100 while
server 106 can be on the server-side of operating environment 100. Server 106 can
comprise server-side software designed to work in conjunction with client-side software
on user devices 102a and 102b through 102n, the autonomous shifter 120, and/or the
autonomous loader 122 to implement any combination of the embodiments and functionalities
discussed in the present disclosure. This division of operating environment 100 is
provided to illustrate one example of a suitable environment, and there is no requirement
for each implementation that any combination of server 106, the autonomous shifter
120, the autonomous loader 122, and user devices 102a and 102b through 102n remain
as separate entities. The displays 103a and 103b through 103n may be integrated into
the user devices 102a and 102b through 102n, the autonomous shifter 120, and/or the
autonomous loader 122. In one embodiment, the displays 103a and 103b through 103n
are touchscreen displays.
[0028] User devices 102a and 102b through 102n may comprise any type of computing device
capable of use by a user. For example, in one embodiment, user devices 102a and 102b
through 102n may be the type of computing device 1100 described in relation to FIG.
11. By way of example and not limitation, a user device may be embodied as a personal
computer (PC), a laptop computer, a mobile device, a smartphone, a tablet computer,
a smart watch, a wearable computer, a personal digital assistant (PDA), a music player
or an MP3 player, a global positioning system (GPS) or device, a video player, a handheld
communications device, a gaming device or system, an entertainment system, a vehicle
computer system, an embedded system controller, a camera, a remote control, a bar
code scanner, a computerized measuring device, an appliance, a consumer electronic
device, a workstation, or any combination of these delineated devices, or any other
suitable computer device. In the context of an autonomous system, any of user devices
102a and 102b through 102n may be integrated into the autonomous shifter 120 or the
autonomous loader 122.
[0029] Data sources 104a and 104b through 104n may comprise data sources and/or data systems,
which are configured to make data available to any of the various constituents of
operating environment 100, or system 200 described in connection to FIG. 2. For instance,
in one embodiment, one or more data sources 104a through 104n provide (or make available
for accessing) the objects that were classified and detected, as well as the fill-level
determined by the trailer AI engine 210 of FIG. 2 and deployed by the object and fill-level
analytics deploying engine 270 of FIG. 2. Data sources 104a and 104b through 104n
may be discrete from user devices 102a and 102b through 102n, autonomous shifter 120,
the autonomous loader 122, and server 106. Alternatively, the data sources 104a through
104n may be incorporated and/or integrated into at least one of those components.
In one embodiment, one or more of data sources 104a through 104n may be integrated
into, associated with, and/or accessible to one or more of the user device(s) 102a,
102b, or 102n, autonomous shifter 120, autonomous loader 122, or server 106. Examples
of computations performed by server 106, autonomous shifter 120, autonomous loader
122, or user devices 102a, 102b, or 102n, and/or corresponding data made available
by data sources 104a through 104n are described further in connection to system 200
of FIG. 2.
[0030] In one embodiment, one or more of data sources 104a through 104n comprise one or
more sensors 107a and 107b through 107n or cameras 105a and 105b through 105n, which
may be integrated into or associated with one or more of the user device(s) 102a,
102b, or 102n or server 106. Examples of data made available by data sources 104a
through 104n are described further in connection to video collector 212 of FIG. 2.
The cameras 105 may include any device capable of recording visual images (e.g., two-dimensional
images) in the form of photographs, film, video signals, and so forth, to generate
a stream of video. By way of non-limiting example, the cameras 105 may include at
least one of a monocular camera, a compact camera, a bridge camera, or a mirrorless
camera. In one embodiment, the cameras 105 may include a camera that is already installed
inside of a storage compartment, for example, for security and monitoring purposes.
[0031] Operating environment 100 can be utilized to implement one or more of the components
of system 200, as described in association with FIG. 2. Operating environment 100
also can be utilized for implementing aspects of process flows 800, 900, and 1000
as described in FIGS. 8, 9, and 10, respectively. Turning to FIG. 2, depicted is a
block diagram illustrating an example system 200 in which some embodiments of this
disclosure are employed. System 200 represents only one example of a suitable computing
system architecture. Other arrangements and elements can be used in addition to or
instead of those shown, and some elements may be omitted altogether for the sake of
clarity. Further, as with operating environment 100, many of the elements described
herein are functional entities that may be implemented as discrete or distributed
components or in conjunction with other components, and in any suitable combination
and location.
[0032] Example system 200 includes a network 110, which is described in connection to FIG.
1, and which communicatively couples components of system 200 including trailer AI
engine 210 (which includes video collector 212, partitioning engine 214, object classification
engine 220, object feature training module 222, fill-level determining engine 230,
fill-level feature training module 232, and post-processing engine 240), model generating
engine 250 (which includes model initializer 252, model trainer 254, model evaluator
256, and model deploying engine 258), and storage 260 (which includes object detection
logic 262, fill-level logic 264, training data 266, and object-detection machine learning
(ML) model 268, and fill-level ML model 269), and object and fill-level analytics
deploying engine 270. The trailer AI engine 210 and the model generating engine 250
may be embodied as a set of compiled computer instructions or functions, program modules,
computer software services, or an arrangement of processes carried out on one or more
computer systems, such as computing device 1100 as described in connection to FIG.
11, for example.
[0033] In one embodiment, the functions performed by components of system 200 are associated
with one or more applications, services, or routines. In one embodiment, certain applications,
services, or routines may operate on one or more user devices (such as user device
102a), one or more servers (such as server 106), and may be distributed across one
or more user devices and servers, or may be implemented in a cloud-based system, such
as that illustrated in FIG. 12. Moreover, in some embodiments, these components of
system 200 may be distributed across a network, including one or more servers (such
as server 106) and client devices (such as user device 102a), in the cloud, may reside
on a user device (such as user device 102a), or may reside on the autonomous shifter
120 or autonomous loader 122 of FIG. 1. Moreover, these components and/or functions
performed by these components, or services carried out by these components may be
implemented at appropriate abstraction layer(s) such as the operating system layer,
application layer, hardware layer, and so forth, of the computing system(s). Alternatively,
or in addition, the functionality of these components and/or the embodiments of the
disclosure described herein can be performed, at least in part, by one or more hardware
logic components. For example, and without limitation, illustrative types of hardware
logic components that can be used include field-programmable gate arrays (FPGAs),
application-specific integrated circuits (ASICs), application-specific standard products
(ASSPs), system-on-a-chip systems (SOCs), complex programmable logic devices (CPLDs),
and so forth. Additionally, although functionality is described herein with reference
to specific components shown in example system 200, it is contemplated that in some
embodiments functionality of these components can be shared or distributed across
other components.
[0034] Continuing with FIG. 2, the trailer AI engine 210 is generally responsible for detecting,
tracking, and classifying objects, as well as determining (e.g., predicting) a fill-level
of items associated with a loading/unloading process, based on a stream of video,
as described herein. The trailer AI engine 210 may then generate analytics indicative
of the determined fill-level and the classified object. In this manner, the delivery
process may be improved and certain otherwise manual aspects may be automated through
the use of the real-time analytics disclosed herein. In particular, the disclosed
subject matter provides improved technologies for generating, at or near real-time,
analytics indicative of a fill-level and objects detected. These analytics may be
generated in a computationally inexpensive manner (e.g., employing computationally
inexpensive machine learning techniques on two-dimensional video frames) and reduce
error and expenses associated with having a dedicated administrator make such determinations
for a large number of storage compartments being simultaneously loaded or unloaded.
As such, the disclosed subject matter may provide advantages over many existing approaches
such as the use of LiDAR sensors, which fail to provide a resource- and computationally
efficient technique for generating, at or near real-time, analytics associated with loading items
onto and unloading items from a storage compartment during the delivery process.
[0035] The video collector 212 of the trailer AI engine 210 may be configured to receive
or access video data associated with a video, such as a (real-time) stream of video,
a recording of a video, video frames, a saved document, and/or any content formatted
in any suitable manner. Example video formats include, but are not limited to, an MPEG-4
Part 14 (MP4) file, a MOV file, a QuickTime File Format (QTFF) file, a WMV file, an
AVI file, an Advanced Video Coding High Definition (AVCHD) file, a WebM file, an MKV
file, or any other suitably formatted video file. As used herein, "video frame" may
refer to one or more still images which compose a complete moving picture (i.e., the
video). In one embodiment, the video frame may be formatted using any of the aforementioned
formats or any additional formats. Alternatively, or additionally, the trailer AI
engine 210 or the model generating engine 250 may receive images formatted in any
suitable image format, including but not limited to Joint Photographic Experts Group
(JPEG/JFIF), Exchangeable image file format (Exif), Tagged Image File Format (TIFF),
Graphics Interchange Format (GIF), BMP file format (Windows bitmap), Portable Network
Graphics (PNG), Portable Pixmap (PPM), WebP, BAT, and the like.
[0036] Video may be received from a camera 105d (FIG. 1) recording the inside of a storage
compartment. In some embodiments, the camera 105a, 105n of any user device 102a, 102b,
102n (FIG. 1) may be positioned inside of a storage compartment, such as the inside
of a trailer. In particular, a camera 105a, 105b, 105c, 105d, 105n of a computing
device may capture a video of the inside of the storage compartment, and the video
may be communicated to a software application associated with the object classification
engine 220 and/or the fill-level determining engine 230. Indeed, the video may be
communicated to the trailer AI engine 210 via any suitable communication technique
via network 110. For example, the video collector 212 may integrate (e.g., via a suitable
application programming interface (API)) with a camera application of a computing device,
such that the video may be directly communicated from the camera application to the
trailer AI engine 210 by way of the video collector 212. In some embodiments, the
video collector 212 receives the video as raw data. The video may be continuously
received, cached, and processed by the video collector 212.
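A minimal sketch, assuming an OpenCV-compatible camera stream (the RTSP address is a hypothetical placeholder), of how frames might be continuously received and cached in the manner described for the video collector 212:

```python
import cv2
from collections import deque

# Hypothetical stream address of a camera mounted inside the storage compartment.
STREAM_URL = "rtsp://camera.example/trailer-bay-07"

capture = cv2.VideoCapture(STREAM_URL)
frame_cache = deque(maxlen=300)   # keep roughly the last few seconds of frames

while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break                      # stream interrupted; real code would retry
    frame_cache.append(frame)      # cache raw frames for downstream processing

capture.release()
```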
[0037] The partitioning engine 214 is configured to divide the video into computer-recognizable
components. In some embodiments, the partitioning engine 214 is configured to determine
video frames from the video. The partitioning engine 214 may divide the video into
a set of video frames that may be processed for detecting, tracking, and classifying
objects, as well as determining a fill-level. The partitioning engine 214 may determine
an object based on any suitable image processing technique. For example, an object
may be determined based on edges separating pixels or candidate objects from each
other or from a background. For example, an object may be detected based on a gradient
threshold that is exceeded for gradients between neighboring pixels. For example,
the partitioning engine 214 may partition a video or video frame into candidate objects
that are classified by the object classification engine 220.
[0038] In some embodiments, the partitioning engine 214 may determine a position of the candidate
objects. In some embodiments, the partitioning engine 214 may determine coordinates
of the candidate objects relative to the entire video frame, relative to each other,
and the like. In one embodiment, the partitioning engine 214 may divide the video
frame into any number of partitions. For example, the partitioning engine 214 may
divide the video frame into a grid (for example, a 100 by 100 grid having 10,000 grid
elements) having uniform or non-uniform grid elements. The partitioning engine 214
may assign x-coordinates (for example, horizontal coordinates) and y-coordinates (for
example, vertical coordinates). In one embodiment, the x-coordinates and the y-coordinates
are perpendicular to each other. Coordinates of each of the candidate objects may
be determined based on the grid and/or the x and y coordinates. The partitioning engine
214 may generate a position profile for each candidate object, such that the position
profile includes x and y coordinates of the profile of the candidate object. In this
manner, the candidate objects identified in the video frame may better be related
to each other, as well as to the entire video frame to better determine the video
data. Indeed, a computing system may be better able to detect, track, and classify
the object, as well as to determine the fill-level based on a relationship between
a position profile of the candidate objects.
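As an illustrative sketch only, the position profile described above might be computed as follows; the grid size and the bounding-box representation of candidate objects are assumptions made for this example.

```python
def position_profile(bbox, frame_shape, grid=(100, 100)):
    """Map a candidate object's bounding box (x0, y0, x1, y1) in pixels onto
    a uniform grid so that objects can be related to one another and to the
    whole frame. Returns grid-cell coordinates of the box corners."""
    height, width = frame_shape[:2]
    cols, rows = grid
    x0, y0, x1, y1 = bbox
    to_cell = lambda x, y: (int(x / width * cols), int(y / height * rows))
    return {"top_left": to_cell(x0, y0), "bottom_right": to_cell(x1, y1)}

# Example: a candidate object detected in a 1920 x 1080 frame.
profile = position_profile((480, 270, 960, 540), frame_shape=(1080, 1920))
print(profile)   # {'top_left': (25, 25), 'bottom_right': (50, 50)}
```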
[0039] The partitioning engine 214 may determine and generate video data based on the video
frame. In some embodiments, the partitioning engine 214 may determine and/or generate
the video data based on the partitions of the video frames and/or based on the position
profile of the candidate objects. For example, the partitioning engine 214 may process
the raw video data and generate video data, as discussed below. The video data may
include machine-encoded edge information, position information, color information,
and so forth, corresponding to the video frame and its corresponding objects. The
partitioning engine 214 may extract video data for the video frame(s) based on the
candidate objects and their respective positions on the above-referenced coordinates.
Extracting video data may include determining the partitioned elements in the image
(e.g., the candidate objects) and a position profile for the candidate objects. As
discussed below with respect to the model generating engine 250, the video
data may be processed by the model generating engine 250 to train and generate a machine
learning model, such as the object-detection ML model 268 and the fill-level ML model
269.
[0040] Continuing with FIG. 2, the object classification engine 220 is configured with computing
logic, such as the object detection logic 262, to detect, track, and classify the
candidate object determined by the partitioning engine 214. It should be understood
that, in some embodiments, the object classification engine 220 (instead of or in
addition to the partitioning engine 214) may determine the object in the video frame.
The object classification engine 220 may identify the object based on the object detection
logic 262. In some embodiments, the object classification engine 220 classifies and
tracks the object in the video frame based on the video data. For example, the object
classification engine 220 may employ object classification methodologies to determine
a context and classification of objects identified in the video frame. In some embodiments,
the object classification engine 220 may employ the object-detection ML model 268
that is trained and generated by the model generating engine 250. The object detection
logic 262 may define logic for using the object-detection ML model 268 to detect,
track, and classify an object. For example, the object-detection ML model 268 may
include a deep learning model, a neural network model, a logistic regression model,
a support vector machine model, and the like. The machine learning model may employ
a You Only Look Once (YOLO) methodology, Region-based Convolutional Neural Network
(R-CNN) methodology, Single Shot Detector (SSD) methodology, and the like.
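For illustration, a YOLO-family detector could be applied to a single frame roughly as follows; the ultralytics package, the default pretrained weights, and the stand-in frame are assumptions external to this disclosure, and in practice a model would be trained on the object classes discussed herein.

```python
import numpy as np
from ultralytics import YOLO

# Default pretrained weights stand in for a model trained on the classes herein.
model = YOLO("yolov8n.pt")

# Stand-in for a decoded video frame (height x width x BGR channels).
frame = np.random.randint(0, 255, size=(720, 1280, 3), dtype=np.uint8)

results = model(frame)                        # single-frame inference
for box in results[0].boxes:
    class_name = model.names[int(box.cls)]    # e.g., "person" for a human object
    confidence = float(box.conf)
    x0, y0, x1, y1 = box.xyxy[0].tolist()     # bounding-box corners in pixels
    print(class_name, round(confidence, 2), (round(x0), round(y0), round(x1), round(y1)))
```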
[0041] The object classification engine 220 may detect, track, and/or classify the object
(e.g., candidate object) based on the object-detection ML model 268 being trained
based on a set of object-detection data features. The object feature training module
222 may be configured with computing logic, such as the object detection logic 262,
to determine and generate object-detection data features that may be used to train
the object-detection ML model 268. In one embodiment, the object feature training
module 222 may determine the object-detection data feature used to train the machine
learning model via any suitable process. For example, the object feature training
module 222 may determine the object-detection data feature via any suitable engineering
process, which may include at least one of the following steps: brainstorming or testing
features, deciding which features to create, creating the features, testing the impact
of the created features on an object or training data, and iteratively improving features.
Object-detection data features may be engineered by the object feature training module
222 using any suitable computations, including, but not limited to, (1) numerical
transformation (e.g., taking fractions or scaling), (2) employing a category encoder
to categorize data, (3) clustering techniques, (4) group aggregation values, (5) principal
component analysis, and the like. In some embodiments, the object feature training
module 222 may assign different levels of significance to the video data, such that
certain object-detection data features that have a higher level of significance are
weighted accordingly when the model trainer 254 trains the object-detection ML model
268. In this manner, the model trainer 254 may prioritize and/or rank video data features
to improve identifying, tracking, and/or classifying the object.
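A hedged sketch of the kinds of feature computations listed above (numerical transformation, category encoding, and principal component analysis), using scikit-learn; the column layout and values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.decomposition import PCA

# Hypothetical raw per-object measurements: [box_width, box_height, mean_gradient].
numeric = np.array([[120.0, 240.0, 35.2],
                    [ 80.0,  90.0, 12.7],
                    [200.0, 150.0, 60.4]])

# Hypothetical categorical attribute for each candidate object.
category = np.array([["near_door"], ["rear_wall"], ["near_door"]])

scaled = StandardScaler().fit_transform(numeric)              # numerical transformation
encoded = OneHotEncoder().fit_transform(category).toarray()   # category encoding
components = PCA(n_components=2).fit_transform(scaled)        # principal component analysis

# Concatenate into object-detection data feature vectors for model training.
features = np.hstack([scaled, encoded, components])
print(features.shape)
```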
[0042] The object classification engine 220 may employ any suitable classification or prediction
algorithm to classify and/or predict the identity of an object (e.g., the candidate
object identified by the partitioning engine), for example, based on the object-detection
data features. The object classification engine 220 may classify the object based
on the video. Example objects include a human object, a pallet object, a load stand
object, a parcel retainer object, a parcel on a conveyor belt object, a forever bag
object, a conveyor belt object, or a small container object. Based on the objects,
the object classification engine 220 may determine a dock door status (e.g., door
status), the number of loading/unloading workers (e.g., humans or robots), a parcel
on the conveyor belt, the usage of retainer/load stand, and so forth. It should be
understood that the embodiments disclosed herein may be broadly applied to predict
any suitable object, in addition to or other than those described herein.
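As an illustrative sketch (the class names and the door-status rule are assumptions), per-frame classifications might be reduced to the kinds of operational indicators mentioned above:

```python
from collections import Counter

def frame_indicators(detected_classes):
    """Summarize one frame's classified objects into simple indicators such as
    worker count, parcels on the conveyor belt, and retainer/load-stand usage."""
    counts = Counter(detected_classes)
    return {
        "workers": counts["human"],
        "parcels_on_belt": counts["parcel_on_conveyor_belt"],
        "retainer_or_load_stand_in_use": (counts["parcel_retainer"] + counts["load_stand"]) > 0,
        "dock_door_open": "open_dock_door" in counts,   # hypothetical status class
    }

print(frame_indicators(["human", "human", "pallet", "parcel_on_conveyor_belt"]))
```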
[0043] Continuing with FIG. 2, the fill-level determining engine 230 may be configured with
computing logic, such as the fill-level logic 264, to determine a fill-level of a
storage compartment at or near real-time. The fill-level determining engine 230 may
determine the fill-level based on pixel information or the coordinates of a video
frame, for example, as generated by the partitioning engine 214. The fill-level determining
engine 230 determines the fill-level based on the fill-level logic 264. In some embodiments,
the fill-level determining engine 230 determines the fill-level based on the edges
between the floor and side walls. The edges may be determined based on the partitioning
engine 214 or the fill-level determining engine 230. For example, the edges 622, as
illustrated in FIG. 6B, between the floor and the side walls may correspond to bottom-right
and bottom-left edges extending toward the back wall inside the storage compartment.
The fill-level determining engine 230 may receive information indicative of the length
of the storage compartment from the storage 260. The dimensions (e.g., length) of
the storage compartment may be based on the type of storage compartment. For example,
the storage 260 may store information indicative of the type of storage compartment
from which the video was received. In one embodiment, storage 260 may store information
associated with the dimensions (e.g., length) of the storage compartment based on
which camera produced the video, such that the camera from which the video is received
is indexed to the storage compartment.
[0044] The fill-level determining engine 230 may employ any suitable (e.g., predictive)
methodologies to determine a context and fill-level of a storage container based on
the video frame(s). In some embodiments, the fill-level determining engine 230 may
employ a machine learning model, such as the fill-level ML model 269, that is trained
and generated by the model generating engine 250. The fill-level logic 264 may define
logic for training and using fill-level ML model 269 to determine the fill-level.
For example, the fill-level ML model 269 may include a deep learning model, a neural
network model, a logistic regression model, a support vector machine model, or any
suitable model employing lane detection architectures. The fill-level ML model 269
may employ a UFAST methodology, a top-to-down Lane Detection Framework Based on Conditional
Convolution (CondLaneNet) methodology, FOLOLane methodology (e.g., that predicts the
geometry of a lane marker by estimating adjacent key points), and the like.
[0045] The fill-level determining engine 230 may determine the fill-level based on a machine
learning model, such as the fill-level ML model 269, that is trained based on a set
of fill-level data features. The fill-level feature training module 232 may be configured
with computing logic, such as the fill-level logic 264, to determine and generate
fill-level data features that may be used to train the fill-level ML model 269. In
one embodiment, the fill-level feature training module 232 may determine the fill-level
data feature(s) used to train the fill-level ML model 269 via any suitable process.
For example, the fill-level feature training module 232 may determine the fill-level
data features via any suitable engineering process, which may include at least one
of the following steps: brainstorming or testing features, deciding which features
to create, creating the features, testing the impact of the created features on an
object or training data, and iteratively improving features. Fill-level data features
may be engineered by the fill-level feature training module 232 using any suitable
computations, including, but not limited to, (1) numerical transformation (e.g., taking
fractions or scaling), (2) employing a category encoder to categorize data, (3) clustering
techniques, (4) group aggregation values, (5) principal component analysis, and the
like. In some embodiments, the fill-level feature training module 232 may assign different
levels of significance to the video data, such that certain fill-level data features
that have a higher level of significance are weighted accordingly when the model trainer
254 trains the fill-level ML model 269. In this manner, the model trainer 254 may
prioritize and/or rank video data features to improve identifying, tracking, and/or
classifying the object.
[0046] The fill-level determining engine 230 may employ any suitable algorithm to determine
the fill-level, for example, based on the fill-level data features. The fill-level
determining engine 230 may determine the fill-level by applying any of the aforementioned
(or any additional suitable) methodologies discussed above to the video. In some embodiments,
the fill-level may be expressed as a percentage or time until completion. As discussed
above, the fill-level determining engine 230 may determine the fill-level based on
the edges between the floor and side walls (e.g., edges corresponding to the bottom-right
and bottom-left edges extending toward the back wall inside the storage compartment,
as illustrated in FIG. 6B). It should be understood that the embodiments disclosed
herein may be broadly applied to determine the fill-level using any suitable features
or techniques, in addition to or other than those described in this paragraph.
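One hedged way to turn detected floor/side-wall edges into a fill-level, assuming the two edges are available as pixel points, that the visible floor shortens as the compartment fills, and that image rows map approximately linearly to depth (a simplification of any real calibration); all coordinates and the trailer length below are hypothetical.

```python
def fill_level_from_edges(left_edge_pts, right_edge_pts, frame_height, trailer_length_m):
    """Estimate fill-level from floor/side-wall edge points (x, y) in pixels.
    The visible portion of each edge ends where stacked items occlude the floor;
    the shorter the visible floor, the fuller the compartment."""
    # The highest (smallest y) visible edge point approximates where cargo begins.
    y_end = min(min(p[1] for p in left_edge_pts), min(p[1] for p in right_edge_pts))
    visible_fraction = (frame_height - y_end) / frame_height   # 1.0 = entire floor visible
    fill_fraction = 1.0 - visible_fraction
    remaining_m = visible_fraction * trailer_length_m
    return fill_fraction * 100.0, remaining_m

# Example with made-up edge points in a 1080-row frame of a 16 m trailer.
left = [(200, 1050), (320, 800), (430, 600)]
right = [(1700, 1040), (1580, 820), (1460, 640)]
pct_full, meters_visible = fill_level_from_edges(left, right, 1080, 16.0)
print(round(pct_full, 1), "% full;", round(meters_visible, 1), "m of floor visible")
```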
[0047] The post-processing engine 240 may improve results of the object classified by the
object classification engine 220 and of the fill-level predicted by the fill-level
determining engine 230. In some embodiments, the post-processing engine 240 employs
post-processing logic of the object detection logic 262 or of the fill-level logic
264 to improve the predictions and classifications. The post-processing logic may
define intent-specific instructions for filtering, smoothing, and further analyzing
data. For example, the post-processing engine 240 may perform various pruning routines,
rule quality processing, rule filtering, rule combination, model combination, and
time-stamped image processing. A more detailed discussion of the post-processing engine
240 applied to outputs of the object classification engine 220 is provided below
with respect to FIGS. 4A and 5A. Similarly, a more detailed discussion of the post-processing
engine 240 applied to outputs of the fill-level determining engine 230 is provided below
with respect to FIGS. 6A and 7A.
[0048] In the context of the object classification engine 220, the post-processing engine
240 may be implemented to further improve the detection results. In one embodiment,
the post-processing engine 240 may comprise two post-processing engines, namely,
a data filter/smoother engine 504 of FIG. 5A and a data analyzer engine 506 of FIG.
5A. First, the data filter/smoother engine 504 may remove noise or small fluctuations
in the detection results caused by minor instability of the loading/unloading environment.
For example, changing light conditions inside a trailer or poor image resolution
because of low internet bandwidth can cause the object classification engine 220 to
miss detection of an object for a short period of time, so the estimation accuracy
can be improved by adding the proposed data filter/smoother engine 504 of FIG. 5A.
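The following is a minimal sketch of the kind of smoothing the data filter/smoother engine 504 could apply to per-frame detection counts; the sliding-window majority filter and window size are assumptions for illustration, not the engine's actual implementation.

```python
from collections import Counter

def smooth_counts(per_frame_counts, window=5):
    """Replace each per-frame object count with the most common count in a
    sliding window, suppressing brief dropouts caused by lighting changes
    or low-resolution frames. Window size is a tunable assumption."""
    smoothed = []
    half = window // 2
    for i in range(len(per_frame_counts)):
        lo, hi = max(0, i - half), min(len(per_frame_counts), i + half + 1)
        neighborhood = per_frame_counts[lo:hi]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed

# Example: the detector briefly loses the single person in two frames.
raw = [1, 1, 1, 0, 0, 1, 1, 1]
print(smooth_counts(raw))  # the brief dropout is filled in
```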
[0049] Second, the data analyzer engine 506 of FIG. 5A may generate real-time outputs from
a data analyzer for operational purposes. In some embodiments, the data analyzer engine
506 may generate (1) real-time detection results and trailer information in each time
stamp, and (2) up-to-date summary of the current trailer loading/unloading process.
With regard to (1), the trailer information includes the trailer's dimensions (width,
length, and height), source/destination location, the trailer's ID, the name of the current
processing facility, and the camera name and bay number for the current loading/unloading
process; the real-time detection results include the number of humans, pallets, and
other targeted objects detected at every designated time stamp inside
a loading/unloading trailer. During a loading/unloading process, these real-time detection
results and trailer information may be combined and uploaded to a table on a UPS cloud
storage space. With regard to (2), summary information of the detection results
may be calculated and updated for the video frames associated with designated time
stamps. This information may include an indication of the maximum/minimum number
of targeted objects/statuses, cumulated processing time, dock door open/close
time, and remaining number of targeted objects. At the end of a trailer loading/unloading
process, the up-to-date summary and trailer information may be combined and uploaded
to a table stored in storage 260 (FIG. 2).
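A minimal sketch of combining per-time-stamp detection results with static trailer information into a single record for upload follows; the field names and schema are hypothetical, not taken from this disclosure.

```python
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class TrailerInfo:
    # Hypothetical schema; field names are illustrative only.
    trailer_id: str
    width_ft: float
    length_ft: float
    height_ft: float
    facility: str
    camera_name: str
    bay_number: int

def build_record(trailer: TrailerInfo, detections: dict) -> str:
    """Combine per-time-stamp detection counts with static trailer
    information into one row, ready to be appended to a stored table."""
    record = {"timestamp": time.time(), **asdict(trailer), **detections}
    return json.dumps(record)

row = build_record(
    TrailerInfo("TR-0001", 8.5, 53.0, 9.0, "FacilityA", "cam-12", 7),
    {"humans": 2, "pallets": 3, "load_stands": 1},
)
print(row)
```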
[0050] The object and fill-level analytics deploying engine 270 may be configured with computing
logic to configure the analytics output by the trailer AI engine 210 (e.g., via the
object classification engine 220 and/or the fill-level determining engine 230) for
use in any suitable abstraction layer, for example, user device 102a. In some embodiments,
the object and fill-level analytics deploying engine 270 may receive the classified
objects from the object classification engine 220 and the predicted fill-level from
the fill-level determining engine 230. Based on the classified object and/or the predicted
fill-level, the object and fill-level analytics deploying engine 270 may deploy the
associated analytics and any suitable alerts (e.g., an alert indicative of a person
locked inside the storage compartment, an indication of a fill-level, etc.) to an
associated software application, such as any suitable work management or scheduling
application. For example, in response to the trailer AI engine 210 determining the
fill-level and classifying the objects, the object and fill-level analytics deploying
engine 270 may format and deploy analytics for display on a GUI, as illustrated in
the screenshots depicted at least in FIGS. 3B, 3C, 3E, 3G, 4B, 4C, 4D, 4E, 4F, 4G,
4H, 4I, 5B, 5C, 6B, and/or 6C.
[0051] In some embodiments, the object and fill-level analytics deploying engine 270 may
communicate with any software application or computing device via any suitable API
or other communication means. For example, the object and fill-level analytics deploying
engine 270 may communicate with the autonomous shifter 120 and/or the autonomous loader
122. The autonomous shifter 120 and/or the autonomous loader 122 may coordinate with
each other (and other devices) to complete aspects of the delivery process (e.g.,
unloading or loading documents). Although this example includes the object and fill-level
analytics deploying engine 270 formatting, configuring, and communicating the analytics
for use by an autonomous device, the embodiments disclosed herein are not limited
to this use. For example, the object and fill-level analytics deploying engine 270
may instead or additionally communicate analytics to a software application of an
application layer for generating the GUIs discussed herein. It should be understood
that the object and fill-level analytics deploying engine 270 may format, configure,
and communicate the task to any suitable abstraction layer, such as an operating system
layer, another application layer, or a hardware layer.
[0052] Continuing with FIG. 2, the model generating engine 250 may train and generate a
machine learning model (e.g., the object-detection ML model 268 and the fill-level
ML model 269) that may be employed by the trailer AI engine 210. The model initializer
252 may select and initialize a machine learning model. Initializing the machine learning
model may include causing the model initializer 252 to determine model parameters
and provide initial conditions for the model parameters. In one embodiment, the initial
conditions for the model parameters may include a coefficient for the model parameter.
[0053] The model trainer 254 may train the machine learning model determined by the model
initializer 252. As part of training the machine learning model, the model trainer
254 may receive outputs from the model initializer 252 to train the machine learning
model (e.g., the object-detection ML model 268 and the fill-level ML model 269). In
some embodiments, the model trainer 254 may receive the type of machine learning model,
the loss function associated with the machine learning model, the parameters used
to train the machine learning model, and the initial conditions for the model parameters.
Example loss functions include a standard cross entropy loss function, a focal loss
function, a dice loss function, and a self-adjusting loss function, to name a few.
The model trainer 254 may iteratively train the machine learning model. In one embodiment,
training the machine learning model may include employing an optimizer that trains
the machine learning model, using training data 266, until certain conditions are
met, for example, as determined by the model evaluator 256. Alternatively, the model
trainer 254 may feed one set of training data 266 to the machine learning model to
generate a predicted output that is used by the model evaluator 256.
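The following is a minimal sketch of such an iterative training loop, using PyTorch with a generic classifier and standard cross-entropy loss; the architecture, dimensions, learning rate, and stopping condition are illustrative assumptions, not details of the disclosed models.

```python
import torch
from torch import nn

# Generic stand-in classifier; the real models would be the object-detection
# and fill-level architectures described elsewhere in this disclosure.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(256, 16)          # stand-in for training data 266
labels = torch.randint(0, 4, (256,))     # stand-in for class labels

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
    if loss.item() < 0.05:               # stopping condition checked by an evaluator
        break
```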
[0054] Example training data 266 includes any labeled data or unlabeled data. In one embodiment,
an unlabeled 2D image of the storage compartment (e.g., trailer) is received, and
delineating features between the floor and the wall are identified and tagged with
a label identifying the pixels associated with the features. By way of non-limiting
example, training data 266 may include object information (indicative of a human object,
a pallet object, a load stand object, a parcel retainer object, a parcel on a conveyor
belt object, a forever bag object, a conveyor belt object, or a small container object,
to name a few), status information (indicative of a dock/trailer door and trailer door
open status, a dock/trailer door and trailer door closed status, a dock/trailer door
and trailer door partially open status, a dock/trailer door open but trailer not ready
status, or an unloading or loading completed status, to name a few), trailer information,
camera device information (such as charging data, camera model, camera video resolution,
date/time, or other information derived from a computing device), security-sensor
data, appliance data, global positioning system (GPS) data, vehicle signal data, traffic
data, weather data (including forecasts), wearable device data, other user device
data (which may include device settings, profiles, network-related information (e.g.,
network name or ID, domain information, workgroup information, other network connection
data, Wi-Fi network data, or configuration data, data regarding the model number,
firmware, or equipment, device pairings, such as where a user has a mobile phone paired
with a Bluetooth headset, for example, or other network-related information)), gyroscope
data, accelerometer data, other data that may be sensed or otherwise detected, data
derived based on other data (for example, location data that can be derived from Wi-Fi,
cellular network, or IP (internet protocol) address data), calendar items (e.g., deliveries
to be made on a particular day), and nearly any other data that may be used to train
a machine learning model, as described herein.
[0055] The model evaluator 256 may evaluate the accuracy of the machine learning model (e.g.,
the object-detection ML model 268 and the fill-level ML model 269) trained by the
model trainer 254. In some embodiments, the model evaluator 256 is configured to assess
the accuracy of the model based on a loss (e.g., error) determined based on the loss
function. The model evaluator 256 may validate the machine learning model. In some
embodiments, the model evaluator 256 may validate the machine learning model based
on training data 266 used for validation purposes instead of training purposes. In
some embodiments, the training data used by the model evaluator 256 to validate the
machine learning model may correspond to training data different from the training
data used by the model trainer 254 to train the machine learning model. In some embodiments,
the training data 266 received via the model generating engine 250 from storage 260
may be split into training data used by the model trainer 254 and training data used
by the model evaluator 256. In one embodiment, the training data 266 used by the model
evaluator 256 may be unlabeled, while the training data 266 used by the model trainer
254 may be labeled.
[0056] The model evaluator 256 may validate the machine learning model based on a score
function. The score function may facilitate determining probabilistic scores for a
classification machine learning model or estimated averages for regression problems,
to name a couple examples. It should be understood that the score function may include
any suitable algorithm applied to training data 266 to uncover probabilistic insights
indicative of the accuracy of the machine learning model. In some embodiments, the
model evaluator 256 may employ a score function to determine whether the machine learning
model (e.g., the object-detection ML model 268 and the fill-level ML model 269) is
at or above a validation threshold value indicative of an acceptable model validation
metric. The model validation metric may include a percent accuracy or fit associated
with applying the machine learning model trained by the model trainer 254 to the training
data 266. If the model evaluator 256 determines that the machine learning model fails
to meet the model validation metric, then the model trainer 254 may continue to train
the machine learning model. On the other hand, if the model evaluator 256 determines
that the machine learning model passes validation, the model deploying engine 258
may deploy the machine learning model, for example, to the user device 102.
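A minimal sketch of the validation check follows; the accuracy-based score function and the 90% threshold are assumptions for illustration rather than values from this disclosure.

```python
def passes_validation(model_fn, validation_set, threshold=0.90):
    """Score the trained model on held-out validation examples and compare
    the accuracy against a validation threshold (threshold is assumed)."""
    correct = sum(1 for x, y in validation_set if model_fn(x) == y)
    accuracy = correct / len(validation_set)
    return accuracy >= threshold, accuracy

# Hypothetical usage with a toy model: keep training until validation passes.
validation_pairs = [((0.2,), 0), ((0.9,), 1), ((0.7,), 1), ((0.1,), 0)]
ok, acc = passes_validation(lambda x: int(x[0] > 0.5), validation_pairs)
print(ok, acc)  # if not ok, training would continue; otherwise the model deploys
```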
[0057] In some embodiments, the model deploying engine 258 may receive a machine learning
model determined to be sufficiently trained. The model deploying engine 258 may deploy
a trained machine learning model to the trailer AI engine 210. As discussed herein,
the trailer AI engine 210 may use the trained machine learning model deployed via
the model deploying engine 258 to perform the functionality described herein.
[0058] The object and fill-level analytics deploying engine 270 may deploy the trailer AI
engine 210, its outputs, and/or the machine learning model generated by the model
generating engine 250 to any suitable computing device (e.g., user device 102a), via
any suitable abstraction layer. For example, the object and fill-level analytics deploying
engine 270 may transmit the trailer AI engine 210, its outputs, and/or the machine
learning model to the operating system layer, application layer, hardware layer, and
so forth, associated with a device, such as the autonomous shifter 120 and/or the
autonomous loader 122 of FIG. 1. In one embodiment, the trailer AI engine 210, the
model generating engine 250, or any of its components may integrate with an existing
software application, such as a work management or scheduling application. For example,
the trailer AI engine 210, the model generating engine 250, or any of its components
may be installed as a plug-in (for example, a plug-in extension) to a web-based application,
a browser, or a computer productivity application. In this manner, a computing system
may present the GUIs depicted at least in FIGS. 3B, 3C, 3E, 3G, 4B, 4C, 4D, 4E, 4F,
4G, 4H, 4I, 5B, 5C, 6B, and 6C.
[0059] In the context of the object and fill-level analytics deploying engine 270 transmitting
to a computing device the trailer AI engine 210, its outputs, and/or the machine learning
model to the operating system layer (e.g., of a user device), analytics may be generated
to provide real-time insights into the loading/unloading process. Instead of having
to monitor raw footage, an administrator may monitor analytics alongside the raw footage,
as illustrated in at least FIGS. 3B and 3C. In some embodiments, the administrator
may altogether be omitted, such that the analytics may be deployed to computing devices,
such as robots, to automatically complete aspects of the loading/unloading process.
Alternatively, the computing device may access the functionality described herein
as any suitable software-as-a-service (SaaS) service or by any other means.
[0060] In one embodiment, the object and fill-level analytics deploying engine 270 may be
generally responsible for presenting content and related information, such as the
analytics presented on the GUIs illustrated in at least FIGS. 3B or 3C. The object
and fill-level analytics deploying engine 270 may comprise one or more applications
or services on a user device, across multiple user devices, or in the cloud. For example,
in one embodiment, presentation component 1116 (FIG. 11) manages the presentation
of content to a user across multiple user devices associated with that user. In some
embodiments, presentation component 1116 may determine a format in which content is
to be presented. In some embodiments, presentation component 1116 generates user interface
elements, as described herein. Such user interface elements can include queries, prompts,
graphic buttons, sliders, menus, audio prompts, alerts, alarms, vibrations, pop-up
windows, notification-bar or status-bar items, in-app notifications, or other similar
features for interfacing with a user.
[0061] Turning to FIG. 3A, depicted is a flow diagram of an example process 300 for employing
the object-detection ML model 268 and the fill-level ML model 269 on video frames
from a camera, according to some embodiments of this disclosure. The process 300 may
be performed by any suitable component of the operating environment 100 of FIG. 1 and/or
the system 200 of FIG. 2, such as the trailer AI engine 210.
As illustrated, process 300 includes a camera 105d capturing a video
that is communicated to the object-detection ML model 268 and the fill-level ML model
269 (for example, via the video collector 212 of FIG. 2). At a high level, the object-detection
ML model 268 employs the object classification engine 220 to detect, track, and classify
objects in video frames of the video stream; and the fill-level ML model 269 employs
the fill-level determining engine 230 to predict a fill-level inside a storage compartment,
such as a trailer. Outputs of the object-detection ML model 268 and the fill-level
ML model 269 may be received by the post-processing engine 240 to generate analytics
that are deployed (via the object and fill-level analytics deploying engine 270 of
FIG. 2).
[0062] Turning to FIGS. 3B and 3C, depicted are screenshots 320 and 350, respectively, of
an example graphical user interface (GUI) 322 including a stream region 330 and an
analytics region 332, according to some embodiments of this disclosure. In some embodiments,
the GUI 322 may be generated by the trailer AI engine 210 (FIG. 2) and deployed to
a user device 102a... n, autonomous shifter 120 (FIG. 1), or autonomous loader 122
(FIG. 1) by the object and fill-level analytics deploying engine 270 (FIG. 2). As
illustrated, the stream region 330 may be leftward of the analytics region 332 on
the GUI 322. However, it should be understood that the stream region 330 and the analytics
region 332 may be positioned in any suitable arrangement, such that the stream region
330 is above or below the analytics region 332, or such that the stream region 330 and
the analytics region 332 are on separate screens.
[0063] In some embodiments, the stream region 330 includes a video stream 334 of the video
frames captured by a camera 105d (FIG. 1). The video stream 334 may present a (live)
video of the inside of the storage compartment at or near-real time. The stream region
330 may include a visual classification indication indicative of the classified object.
In this example, the visual classification indication indicates that the object detected
is a person, as indicated by text and/or the visually distinct characteristics of
the visual classification indication, such as the color, indication pattern, symbol
structure, or shape of the visual classification indication. In one embodiment, the
visual classification indication may correspond to the classification of the object
determined by the object classification engine 220 (FIG. 2).
[0064] The stream region 330 may include an information panel 340. Among other visual indications,
the information panel 340 may include an open status indicator 342, a person detected
status indicator 344, a parcel detected status indicator 346, and a fill-level indicator
348. The open status indicator 342 may provide an indication of whether the door to
the storage compartment is open or closed. The person detected status indicator 344 may
provide an indication of whether a person has been detected in the storage compartment,
for example, by the object classification engine 220 (FIG. 2). The fill-level indicator
348 may provide an indication of how full the storage compartment is. For example,
the fill-level determining engine 230 (FIG. 2) may predict the fill-level of the storage
compartment. In some embodiments, the fill-level indicator 348 may be expressed as
a percentage (e.g., percent full), fraction (1/4, 1/3, 1/2, and so forth), a time
to completion of the loading (to 100%) or unloading (to 0% full), or any other visual
indication, icon, or symbol. However, it should be understood that the information
panel 340 may be omitted, as shown in FIG. 3C.
[0065] The analytics region 332 may include alphanumeric characters indicative of analytics
determined by the trailer AI engine 210 (FIG. 2). As illustrated, the analytics region
332 may provide information indicative of the dock/trailer door open time, the dock/trailer
door close time, the unloading time (e.g., time taken to complete the load/unload
process), the process time, the human enter time, the human exit time, the maximum
number of workers, the human working hours, the trailer unloading status (e.g., percent
complete of the loading/unloading process), a 5-minute warning, the usage of load
stands, the usage of load straps, and so forth. In some embodiments, the information
presented on the analytics region 332 may be communicated to the autonomous shifter
120 and autonomous loader 122 (FIG. 1).
[0066] FIG. 3D is a flow diagram of an example process 352 for employing an object and fill-level
analytics deploying engine 270 on outputs (block 508) of an object-detection ML model
268 and a fill-level ML model 269, according to some embodiments of this disclosure.
In one embodiment, the real-time object detection outputs (block 353) are classified
(block 355) into their corresponding detection class and post processed for each trailer/dock
door. This data may be used to generate (block 357) the radial bar plot and color
code the dock door labels based on the detection results. The radial bar plot (e.g.,
the radial bar plot 359 of FIG. 3E) may be updated at or near real-time to generate
a real-time dashboard (block 358). In one embodiment, the real-time fill-level detection
(block 354) and object-detection outputs (block 353) are combined to generate a last-5-minute
notification (block 356) of the unloading process. Although discussed in
the context of a 5-minute notification, a notification indicative of any suitable
timing other than 5 minutes may be generated. In this example, the 5-minute notification
allows for the prediction and display of a state during the unloading process based on
the detection results. The prediction may be generated with any suitable model(s),
such as the mean/median of a distribution or a more complex machine learning model
such as a decision tree, random forest, or the like. For example, the distribution of the
fill-level percentage 5 minutes before the end of each unloading process may be used
to find the median fill-level percentage that represents 5 minutes before the end
of each unloading process. When the fill-level of a dock door drops below this threshold,
the 5-minute notification (block 356) may be generated. Alternatively, a more complex
5-minute classification model may be created by fitting the object count and fill-level
trajectory data with true/false labels that represent whether the process has less
than or more than 5 minutes left.
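The following is a minimal sketch of the median-threshold approach described above for generating the 5-minute notification; the sample trajectories and exact threshold logic are illustrative assumptions.

```python
import statistics

def five_minute_threshold(past_processes):
    """Each past process is a list of (seconds_before_end, fill_pct) samples.
    Take the fill-level observed closest to 300 seconds before the end of
    each unloading process and use the median as the notification threshold."""
    samples = []
    for trajectory in past_processes:
        closest = min(trajectory, key=lambda p: abs(p[0] - 300))
        samples.append(closest[1])
    return statistics.median(samples)

def should_notify(current_fill_pct, threshold):
    # For an unloading process the fill-level falls toward 0%, so the
    # notification fires once the fill-level drops below the threshold.
    return current_fill_pct <= threshold

# Hypothetical historical trajectories of past unloading processes.
history = [
    [(600, 25.0), (300, 12.0), (0, 0.0)],
    [(600, 30.0), (310, 15.0), (0, 0.0)],
    [(590, 22.0), (295, 10.0), (0, 0.0)],
]
threshold = five_minute_threshold(history)
print(threshold, should_notify(11.0, threshold))
```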
[0067] FIG. 3E is a screenshot of an example GUI, that includes a radial bar plot 359, generated
by an object and fill-level analytics deploying engine 270, according to some embodiments
of this disclosure. In one embodiment, the radial bar plot 359 is generated based
on process 352 of FIG. 3D. In some embodiments, the trailer AI engine 210 improves
the visibility of loading/unloading status of a facility. Real-time object-detection
(block 353) and real-time fill-level detection (block 354) may better achieve this
goal by digitizing the loading/unloading status of each dock door. However, if the
digitized data is displayed for each dock door separately, the visibility is limited
to each dock door and does not provide a comparison of all dock doors. The trailer
AI engine 210 may further improve the visibility by summarizing the detected results
in a radial bar plot 359, as illustrated in FIG. 3E. As illustrated, the radial bar
plot 359 may provide all the trailer/dock door statuses in a single view. Illustrating
a detected object, such as the fill-level, may be difficult with rectangular bar plots
due to limited space for displaying many dock doors on a single axis. The radial display,
such as that shown in FIG. 3E, can show the status of detected objects for a larger
number of trailer/dock doors with enough space for each trailer/dock door label to
be displayed at a visible font size. With the radial bar plot 359, the trailer/dock
doors may be further grouped based on bar color, dock door label color, and additional
text beside the labels. In this example, the bar colors represent unload parcel types
(e.g., IRREG, SMALL, CPU, SHORT, and LONG), the trailer/dock door label colors represent
the trailer/dock door status (an open dock door is indicated in green; otherwise it is shown
in gray), and the additional text beside the labels identifies unloading processes with
5 minutes left. The radial presentation also allows the additional text to be displayed
in visible font size, just like the dock door labels. This allows the dock doors that
need attention to be quickly illustrated in a limited space.
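A minimal sketch of rendering such a radial bar plot with matplotlib follows; the dock-door names, fill-level values, and color rules are illustrative assumptions, not the plot-generation code used by the object and fill-level analytics deploying engine 270.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical fill-level per dock door and a stand-in door-open flag.
doors = [f"D{i:02d}" for i in range(1, 25)]
fill_pct = np.random.default_rng(0).uniform(0, 100, len(doors))
is_open = fill_pct > 10

# One bar per dock door arranged around a polar axis.
theta = np.linspace(0.0, 2 * np.pi, len(doors), endpoint=False)
ax = plt.subplot(projection="polar")
ax.bar(theta, fill_pct, width=2 * np.pi / len(doors) * 0.8, bottom=20.0)

# Color each dock-door label green when open, gray otherwise.
ax.set_xticks(theta)
ax.set_xticklabels(doors)
for label, open_ in zip(ax.get_xticklabels(), is_open):
    label.set_color("green" if open_ else "gray")
plt.show()
```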
[0068] FIG. 3F is a flow diagram of an example process 360 for employing an object and fill-level
analytics deploying engine 270 on outputs of an object-detection ML model 268, according
to some embodiments of this disclosure. As illustrated, the past object-detection
outputs (block 361) may be used to extract (block 362) process dependent data. Example
process dependent data that may be generated (block 363) based on the extraction includes
a start time and an end time of each process, idle time, active time, or similar.
A process may refer to when both the trailer/dock door open and close. Active time
may refer to the time when people and unloading parcels are detected. Idle time may
refer to when the doors are open, but the unloading process is at a stop (no progress
made). The past object-detection outputs (block 361) may also be used to generate
(block 364) object dependent data, which may be calculated and/or predicted
without a start and end of each process and which characterizes the status of each trailer/dock
door over a given time range. These data may include the average number of people, average
parcel flow rate, or similar. In one embodiment, the process and object dependent
data (blocks 363 and 364) are combined to generate (block 365) various radial bar
plots, such as the radial bar plot 370 of FIG. 3G, to summarize the overall performance
of a loading/unloading facility on a summary dashboard 366.
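The following is a minimal sketch of deriving process dependent and object dependent data from past detection outputs; the sampling interval, flag layout, and summary fields are hypothetical.

```python
def summarize_process(timeline, step_s=60):
    """timeline: list of (door_open, people_detected, parcels_detected) values
    sampled at a fixed interval (step_s seconds). Derives process dependent
    data (start/end, active/idle time) and object dependent data (average
    number of people); the sampling interval is an assumption."""
    active_s = idle_s = 0
    open_samples = [i for i, (door, _, _) in enumerate(timeline) if door]
    start = open_samples[0] * step_s if open_samples else None
    end = open_samples[-1] * step_s if open_samples else None
    for door, people, parcels in timeline:
        if door and people and parcels:
            active_s += step_s       # people and parcels detected: progress made
        elif door:
            idle_s += step_s         # door open but no progress
    avg_people = sum(p for _, p, _ in timeline) / len(timeline)
    return {"start_s": start, "end_s": end, "active_s": active_s,
            "idle_s": idle_s, "avg_people": avg_people}

timeline = [(False, 0, 0), (True, 2, 5), (True, 2, 4), (True, 0, 0), (False, 0, 0)]
print(summarize_process(timeline))
```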
[0069] FIG. 3G is a screenshot of an example GUI, including a radial bar plot 370, generated
by an object and fill-level analytics deploying engine 270, according to some embodiments
of this disclosure. The radial bar plot 370 may show the summary of the loading/unloading
status of each trailer/dock door. As illustrated, when a range of dates is selected,
the radial bar plot 370 may be updated to show statistical characteristics (such as
minimum, maximum, mode, average, or median) of each detected object.
[0070] Turning to FIG. 4A, depicted is a flow diagram of an example process 400 for detecting,
tracking, and classifying objects from video frames captured by a camera 105d (FIG.
1), according to some embodiments of this disclosure. In some embodiments, the object
classification engine 220 (FIG. 2) receives video frames 402 from camera(s) 105d of
the storage compartment. That is, the object classification engine 220 may access
a video of the inside of the compartment and extract video data from the video that
represents one or more video frames of the video at a certain time (e.g., first time).
In some embodiments, the camera(s) 105d may correspond to security cameras that are
already installed and operating within the storage compartment.
[0071] In some embodiments, the object classification engine 220 (FIG. 2) includes an object-detection
engine 404 and an object tracking engine 406. The object-detection engine 404 may
detect objects in the video frames 402. The object-detection engine 404 may include
one or more machine learning models such as deep learning-based models. For example,
the object-detection engine 404 may include at least one model that employs a YOLO
model methodology, an SSD methodology, a Faster-RCNN methodology, or any suitable
model that may detect multiple objects in different classes, locations, and scales.
The object-detection engine 404 may process the video data using the one or more machine
learning models to detect an object present within the inside of the compartment at
the certain time.
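The following is a minimal sketch of running an off-the-shelf detector over a video frame; it uses the pretrained Faster R-CNN from torchvision as a stand-in (the disclosure names YOLO, SSD, and Faster-RCNN only as example methodologies), and the confidence threshold is an assumption.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load an off-the-shelf detector (pretrained on COCO); in practice the
# object-detection engine 404 would be trained on trailer imagery with
# the object classes listed in this disclosure.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

# A single video frame as a CHW float tensor in [0, 1]; a real frame would
# come from the camera stream rather than random data.
frame = torch.rand(3, 480, 640)

with torch.no_grad():
    detections = model([frame])[0]

# Keep confident detections only; the 0.5 threshold is an assumption.
keep = detections["scores"] > 0.5
boxes, labels = detections["boxes"][keep], detections["labels"][keep]
print(boxes.shape, labels.tolist())
```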
[0072] The object-detection engine 404 may be trained by the model generating engine 250
(FIG. 2). For example, the object-detection engine 404 may be continuously trained
using images and/or videos collected from various trailer loading/unloading scenarios.
In this manner, the object-detection engine 404 may produce more accurate results under
changing loading/unloading environments, such as changes in the lighting conditions
of a loading/unloading environment, different loading/unloading background textures,
and the like. The object-detection engine 404 may receive image/video frames 402 from
the cameras via data streaming. In this manner, the object-detection engine 404 may
produce pre-defined object classes with their locations indicated by bounding boxes.
[0073] The object classification engine 220 (FIG. 2) may include an object tracking engine
406 to account for discontinuity of object-detection results from the object-detection
engine 404, for example, due to the poor light conditions of the loading/unloading
environment and low image resolution of certain cameras (e.g., monocular security
cameras), especially for those inside of a storage compartment. The object tracking
engine 406 may generate, based on a first location of where an object is present within
the inside of the compartment at a first time, a trajectory of the object moving through
the inside of the compartment. The trajectory may include a predicted location of
the object within the inside of the compartment at a second time subsequent to the
first time. For example, the object tracking engine 406 may compare and record the
bounding box locations of those detected objects from the previous and current image/video
frames 402. In some embodiments, the object tracking engine 406 may process the coordinates
of detected objects' bounding boxes from the previous frames, using a Kalman filter,
to predict their trajectories and associate them with the detections from the current video
frame for matching.
[0074] For example, the object classification engine 220 may extract second video data from
the video that represents one or more video frames of the video at the second time.
The object-detection engine 404 may process the second video data using the one or
more machine learning models, but not detect the object present within the inside
of the compartment at the second time. As noted, this may be due to poor light conditions
in the compartment and/or low image resolution of the cameras recording the video.
However, the object tracking engine 406 may identify a second location of where the
object is present within the inside of the compartment at the second time based at
least in part on the predicted location of the object as provided by the trajectory
of the object.
[0075] In some embodiments, the object tracking engine 406 may compare the position of a
detected object in the current video frame with a predicted position (predicted by
the Kalman filter) to determine differences between the positions. In some embodiments,
newly detected objects (detected by object-detection engine 404) that move into the
current frame may be added to a birth memory array for future tracking, and the objects
that move out of the current frame may be moved to the death memory array. Thereafter,
the objects that have been identified, tracked, and classified may be output (block
408) by the object classification engine 220 (FIG. 2). Thus, the object classification
engine 220 may make it possible to track objects by transmitting and inputting lower-resolution
images into the machine learning model, thereby using less bandwidth and computational
resources both at training and at inference time.
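A minimal sketch of the predict-associate-update loop with birth and death memory follows; it uses a simple constant-velocity predictor as a stand-in for the Kalman filter, and the distance and miss thresholds are illustrative assumptions.

```python
import numpy as np

class Track:
    """A single tracked object: last center, constant-velocity estimate, and
    a count of consecutive frames with no matching detection."""
    def __init__(self, center):
        self.center = np.asarray(center, dtype=float)
        self.velocity = np.zeros(2)
        self.misses = 0

    def predict(self):
        return self.center + self.velocity

    def update(self, center):
        center = np.asarray(center, dtype=float)
        self.velocity = center - self.center
        self.center = center
        self.misses = 0

def step(tracks, detections, max_dist=50.0, max_misses=5):
    """Greedy nearest-neighbor association of predicted track positions with
    current detections. New detections are 'born' as tracks; tracks missed
    for too long 'die'. Thresholds are illustrative assumptions."""
    unmatched = list(detections)
    for tr in tracks:
        pred = tr.predict()
        if unmatched:
            dists = [np.linalg.norm(pred - np.asarray(d)) for d in unmatched]
            i = int(np.argmin(dists))
            if dists[i] < max_dist:
                tr.update(unmatched.pop(i))
                continue
        tr.misses += 1
        tr.center = pred  # coast on the prediction when the detection is missed
    tracks.extend(Track(d) for d in unmatched)               # birth
    return [tr for tr in tracks if tr.misses <= max_misses]  # death

tracks = []
# The empty third frame simulates a missed detection due to poor lighting.
for frame_centers in [[(100, 200)], [(110, 205)], [], [(130, 215)]]:
    tracks = step(tracks, frame_centers)
    print([tuple(np.round(t.center)) for t in tracks])
```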
[0076] Turning to FIG. 4B, depicted is a flow diagram of an example process 410 of an object-detection
ML model 268 (FIG. 2) being applied by an object classification engine 220 (FIG. 2)
to detect objects in video frames 402, according to some embodiments of this disclosure.
In this example, three separate video frames 402 may be provided to the object-detection
ML model 268. In response to employing aspects of the object classification engine
220, the objects in classified video frames 412 may be identified, tracked, and classified.
As illustrated, the classified video frames 412 include the illustrated visual classification
indications 336 (e.g., symbol) on each identified object. When the object disappears,
the corresponding visual classification indication 336 may be omitted or removed.
In this example, the visual classification indications 336 include a box border around
the identified objects. As illustrated, the object-detection ML model 268 may identify
a person (e.g., an entire contour of a person's body) or a portion of a person (e.g.,
a body part, such as an arm or head).
[0077] In some embodiments, the object-detection ML model 268 may detect the appearance
of a human object at a dock/trailer door or inside a storage compartment based on
images/videos streaming from an on-site security camera. Based on the video data,
the object-detection ML model 268 may determine a number of on-site humans and corresponding
positions of these detected human objects. The information about the number of workers
can help the operation manager understand if the current process has enough workers
or not. In this manner, a monitor/manager can adjust the size of the on-site loading/unloading
team to improve the process quality and efficiency, as well as to maintain efficiency
and flexibility of the loading/unloading process. On the other hand, the location
of the detected human objects may also be utilized for executing enterprise compliance-related
operations, such as determining when to close or open a dock door.
[0078] As discussed above, the embodiments discussed herein are not limited to human object
detection, and may be applied to detect, track, and classify other objects such as
a load stand, parcel retainer, or pallet that may appear during the loading/unloading
process. Moreover, embodiments discussed herein may detect if any of these objects
were captured by the security cameras in a video/image frame. By employing the object-detection
ML model 268, the quantity and location of pallets, load stands, or parcel retainers
may be detected in real-time through bounding boxes. These detection results can help
teams keep track of how many of these objects are inside a trailer and how many of
them need to be removed when a loading/unloading process is completed.
[0079] FIG. 4C is a flow diagram of an example process 420 of an object-detection ML model
268 (FIG. 2) being applied by an object classification engine 220 (FIG. 2) to detect
a status of a video frame 402 based on detection of an object, such as a dock/trailer
door 422, in video frames 402. In some embodiments, the camera 105d (FIG. 1) may be
positioned external to the storage compartment relative to the dock/trailer door 422,
so that the camera 105d may capture video of the dock/trailer door 422 being closed
or open. In this example, three separate video frames 402 may be provided to the object-detection
ML model 268. In response to employing aspects of the object classification engine
220, the objects in classified video frames 412 may be identified, tracked, and classified.
As illustrated, the classified video frames 412 include the illustrated visual classification
indications 336 on each identified object. In this example, the visual classification
indications 336 include a text box proximate to the video frame to provide a description
of whether a dock/trailer door 422 has been detected. As illustrated, the object-detection
ML model 268 may determine whether the dock/trailer door 422 is open, partially opened,
or closed. By tracking the status of the dock/trailer door 422, the evaluation of
the start and/or end of a loading/unloading process may improve through the increased
situational awareness of the loading/unloading environment. In one embodiment, the
object-detection ML model 268 can detect/identify different statuses for the dock/trailer
door 422, such as: dock/trailer door open, dock/trailer door partially open, and dock/trailer
door closed.
[0080] First, the object and fill-level analytics deploying engine 270 (FIG. 2) may communicate
that the status of the dock/trailer door 422 is open to allow the loading/unloading operators
to be aware that a dock/trailer door 422 is opened, and a loading/unloading process
is expected to start. The dock/trailer door 422 open status may be communicated to
the autonomous shifter 120 and/or the autonomous loader 122 (FIG. 1) to automate start
of the loading/unloading process. In one embodiment, the dock/trailer door 422 open
status may be received by the trailer AI engine 210 (FIG. 2) to begin the object-detection
process, using object-detection ML model 268 (FIG. 2), to initiate the detection procedure
for detecting/tracking human objects, pallets, fill-levels, or the other objects in
the trailer loading/unloading environment.
[0081] Second, the object and fill-level analytics deploying engine 270 (FIG. 2) may communicate
the status of the dock/trailer door 422 being in a closed status to allow operators
to be aware that a dock/trailer door 422 is closed, and the loading/unloading process
has ended. Similarly, the status that the dock/trailer door 422 is closed may be communicated
to the autonomous shifter 120 and/or the autonomous loader 122 (FIG. 1) to automate
ending the loading/unloading process. In some embodiments, the time between a pair
of door-open and door-close events can be used for process time estimation of a complete
loading/unloading session.
[0082] Third, the object and fill-level analytics deploying engine 270 (FIG. 2) may communicate
the status of the dock/trailer door 422 as being partially opened (e.g., this third status falls
between the dock/trailer door open and closed statuses). Dock/trailer door 422
partially open can be automatically communicated (e.g., in the form of an alert or
electronic message) to the operation team to take some action if the dock/trailer
door is not closed after a loading/unloading process or open during the loading/unloading
process. In some embodiments, during a loading/unloading process, a partially opened
dock/trailer door 422 may obstruct the view of the camera 105d (FIG. 1) and reduce
the amount of content captured by the camera 105d, making it difficult for the object-detection
ML model 268 (FIG. 2) to track and detect features of the video that may be used to
determine a fill-level, number of humans, number of pallets, and the like. Based on
the dock/trailer door 422 partial open status, the autonomous shifter 120 and/or the
autonomous loader 122 (FIG. 1) can take action to remove the blockage from camera
view, prevent unauthorized entry, and the like.
[0083] FIG. 4D is a flow diagram of an example process 430 of an object-detection ML model
268 (FIG. 2) being applied by an object classification engine 220 (FIG. 2) to detect
a status associated with a video frame 402, according to some embodiments of this
disclosure. In this example, the status indicates that the dock/trailer door 422 (FIG.
4C) is open but the trailer is not ready. As illustrated, four separate video frames
402 may be provided to the object-detection ML model 268. In response to employing
aspects of the object classification engine 220, the objects in classified video frames
412 may be identified, tracked, and classified. As illustrated, the classified video
frame includes the illustrated visual classification indications 336. In this example,
the visual classification indications 336 include a text box proximate to the video
frame to provide a description of whether a loading/unloading process has been completed.
[0084] This status indicative of the "dock/trailer door being open but the trailer not being
ready" may result when in-bound dock/trailer doors are left open and unattended with
no trailer on the other side, or the trailer is not parked at the dock/trailer door.
These occurrences may slow down the loading/unloading process. To prevent any such inefficiency-related
event from happening, the object-detection ML model 268 (FIG. 2) may provide
detection of the dock/trailer door status of door open but trailer not ready. In some
embodiments, whenever the object-detection ML model 268 detects that a trailer/dock
door 422 is open but the trailer is not ready, the object-detection ML model
268 may cause a message to automatically be sent to the operation team so the team
can respond by sending a task force to secure the dock/trailer door and follow up on
the on-site situation. Alternatively, or additionally, whenever the object-detection
ML model 268 detects that a trailer/dock door 422 is open but the trailer is not ready,
the object-detection ML model 268 may cause a control signal to automatically
be sent to the autonomous shifter 120 and/or the autonomous loader 122 (FIG. 1) to
instruct the autonomous shifter 120 and/or the autonomous loader 122 to automatically
secure the dock/trailer door.
[0085] Turning to FIG. 4E, depicted is a flow diagram of an example process 440 of an object-detection
ML model 268 (FIG. 2) being applied by an object classification engine 220 (FIG. 2)
to detect a status associated with a video frame 402, according to some embodiments
of this disclosure. In this example, the status indicates that the loading/unloading
process is complete. As illustrated, two separate video frames 402 may be provided
to the object-detection ML model 268. In response to employing aspects of the object
classification engine 220, no objects (or stacked/empty pallets) in classified video
frames 412 may be identified, such that the object classification engine 220 may determine
that the unloading process is complete. As illustrated, the classified video frames
412 include the illustrated visual classification indications 336. In this example,
the visual classification indications 336 include a text box proximate to the video
frame to provide a description that the loading/unloading process has been completed.
[0086] In some embodiments, a completed loading/unloading process is detected (by the object
classification engine 220) if there are no packages stacked on one another and the
fill-level is zero or almost zero (e.g., less than 5%). The object classification engine
220 may record the time when the loading/unloading process has been completed, and
calculate the time spent before the dock/trailer door is closed. By determining this
status, present embodiments can improve the operational efficiency by highlighting
a completed process, prompting the loading/unloading team to quickly wrap up the trailer,
close the dock/trailer door, and rotate to the next trailer for another new loading/unloading
task.
[0087] Turning to FIG. 4F, depicted is a flow diagram of an example process 460 of an object-detection
ML model 268 (FIG. 2) being applied by an object classification engine 220 (FIG. 2)
to detect objects, such as the illustrated parcels, in video frames 402, according
to some embodiments of this disclosure. In this example, a video frame 402 may be
provided to the object-detection ML model 268. In response to employing aspects of
the object classification engine 220, the object in classified video frames 412 may
be identified, tracked, and classified. As illustrated, the classified video frame
includes the illustrated visual classification indications 336 on each identified
object. In this example, the visual classification indications 336 include a box
border around the identified objects. As illustrated, the object-detection ML model
268 may identify and classify a parcel (e.g., a box of an item for shipment). As illustrated,
the object-detection ML model 268 may track movement of the parcel on a conveyor belt
462.
[0088] By tracking parcels on the conveyor belt, the object-detection ML model 268 may better
estimate the loading/unloading efficiency from different loading/unloading teams.
The time stamps of the generation/removal of a bounding box for a detected parcel
on the conveyor belt can be recorded and the number of loaded/unloaded parcels per
time unit can be calculated using the tracking capability of object classification
engine 220. In this manner, the object classification engine 220 can provide an analysis
of a loading/unloading efficiency under various trailer conditions. Additionally,
the object classification engine 220 can detect, classify, and track the conveyor
belt inside the storage compartment by the proposed object-detection framework via
a polygon-shape as shown in FIG. 4G.
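The following is a minimal sketch of computing a parcel flow rate from the recorded appearance/removal time stamps of tracked parcel bounding boxes; the event format and values are hypothetical.

```python
def parcels_per_minute(parcel_events):
    """parcel_events: list of (appear_ts, disappear_ts) in seconds for each
    parcel bounding box tracked on the conveyor belt; disappear_ts is None
    for parcels still on the belt. Returns parcels handled per minute."""
    if not parcel_events:
        return 0.0
    handled = sum(1 for _, gone in parcel_events if gone is not None)
    span_s = max(gone for _, gone in parcel_events if gone is not None) - \
             min(seen for seen, _ in parcel_events)
    span_s = max(span_s, 1.0)  # avoid division by zero for very short spans
    return 60.0 * handled / span_s

events = [(0.0, 12.0), (5.0, 18.0), (11.0, 25.0), (20.0, None)]  # last parcel still on belt
print(round(parcels_per_minute(events), 2))
```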
[0089] In more detail, turning to FIG. 4G, depicted is a flow diagram of an example process
470 of an object-detection ML model 268 (FIG. 2) being applied by an object classification
engine 220 (FIG. 2) to detect an object, such as the conveyor belt 462, in video frames
402, according to some embodiments of this disclosure. In this example, a video frame
402 may be provided to the object-detection ML model 268. In response to employing
aspects of the object classification engine 220, the object in classified video frames
412 may be identified, tracked, and classified. As illustrated, the classified video
frame 412 includes the illustrated visual classification indications 336 on each identified
object. In this example, the visual classification indications 336 include a box
border (e.g., polygon 472) around the identified object. As illustrated, the object-detection
ML model 268 may identify and classify a conveyor belt 462. The conveyor belt 462
may be determined based on identification of a polygon 472. The detected conveyor
belt 462 with pixel coordinates from the four corner points of the polygon 472 can
further be used to determine whether the parcels are on the conveyor belt 462 or are
omitted from (e.g., not on) the conveyor belt 462.
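A minimal sketch of the on-belt check follows, using a standard ray-casting point-in-polygon test against the four corner pixel coordinates of the detected conveyor belt polygon; the coordinates are illustrative assumptions.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: returns True when the point lies inside the polygon
    defined by the corner pixel coordinates of the detected conveyor belt."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

belt_polygon = [(100, 400), (500, 380), (520, 470), (90, 480)]  # hypothetical corners
parcel_center = (300, 430)                                      # hypothetical parcel centroid
print(point_in_polygon(parcel_center, belt_polygon))  # True: parcel is on the belt
```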
[0090] FIG. 4H is a graph diagram of an example time graph 480 of results of an object-detection
ML model 268 (FIG. 2) being applied by an object classification engine 220 (FIG. 2),
solely, to determine an object in video frames 402; and FIG. 4I is a graph diagram
of an example time graph 490 of results of an object tracking engine 406 being applied,
along with the object-detection ML model 268, by the object classification engine
220 to track the object in the video frames 402 (FIG. 4A), according to some embodiments
of this disclosure. In other words, while the example time graph 480 shown in FIG.
4H includes the object being determined solely using the object-detection engine 404
(FIG. 4A) of the object classification engine 220, FIG. 4I includes the object being
tracked using the object tracking engine 406 (FIG. 4A) of the object classification
engine 220 along with the object-detection engine 404. Here, the example time graph
480 shown in FIG. 4H reflects results of the object classification engine 220 omitting
the use of the object tracking engine 406, while the example time graph 490 shown
in FIG. 4I reflects results of the object classification engine 220 employing the
object tracking engine 406.
[0091] By tracking objects using both the object-detection engine 404 and the object tracking
engine 406 in various embodiments, an object that may temporarily disappear, for example,
due to poor lighting conditions or poor image quality, may be accounted for despite
no longer being detectable. For example, in FIG. 4H, three video frames 402A, 402B,
and 402C are captured over time. In the third video frame 402C, the object may have
become unidentifiable due to the poor lighting conditions and is no longer detected.
However, by employing the object tracking engine 406 of the object classification
engine 220, the object may continue to be tracked despite not being detectable by
the object-detection engine 404 of the object classification engine 220. In this manner,
the objects may continue to be tracked despite poor lighting conditions or poor image
resolution, and the chance of detection discontinuity due to false negatives in
the detection results (targeted objects not detected) can be significantly decreased.
[0092] Turning to FIG. 5A, depicted is a flow diagram of an example process 500 for post-processing
data associated with classified objects, according to some embodiments of this disclosure.
In some embodiments, the process 500 illustrated in the flow diagram may be performed
by the post-processing engine 240 (FIG. 2), which may receive (block 502) objects
that have been identified, tracked, and classified and output (block 408 of FIG.
4A) by the object classification engine 220 (FIG. 2). The post-processing engine 240
may include a data filter/smoother engine 504 and/or a data analyzer engine 506.
[0093] The data filter/smoother engine 504 may remove noise or fluctuations in the detection
results caused by minor instability of the loading/unloading environment. For example,
changing light conditions inside the storage compartment or poor image resolution
resulting from low internet bandwidth can cause the object classification engine 220
to improperly detect certain objects (e.g., miss them in the classified output) during
a short period of time or over a few video frames. Employing the data filter/smoother
engine 504 may improve the estimation accuracy, as shown in FIG. 5C.
[0094] The data analyzer engine 506 may generate, at or near real-time, outputs (block 508)
associated with the classified objects for which noise has been removed. In some embodiments,
the data analyzer engine 506 may generate (block 508) two outputs: (1) real-time detection
results and trailer information at each time stamp, and (2) up-to-date summary of
the current trailer loading/unloading process. In some embodiments, the outputs (block
508) may be communicated by the object and fill-level analytics deploying engine 270
(FIG. 2).
[0095] With respect to (1) real-time detection results and trailer information at each time
stamp, the trailer information may include the trailer's dimensions (width, length, and
height), source/destination location, the trailer's ID, the name of the current processing
facility, camera ID, and/or bay number for the current loading/unloading process.
The real-time detection results may include a number of humans, pallets, and many
other targeted objects that were detected for every designated time stamp inside a
loading/unloading trailer. For example, the data analyzer engine 506 may generate
analytics indicative of an object present at a first location within the inside of
the trailer (e.g., compartment) at a first time and the object present at the second
location within the inside of the trailer at a second time. As an illustrative example,
the data analyzer engine 506 may generate analytics indicative of a human traveling
(e.g., walking) around different locations of the trailer such as total time in the
trailer, distance covered while in the trailer, different positions (e.g., locations)
while in the trailer, etc. During a trailer loading/unloading process, these-real-time
detection results may be displayed on a graphical user interface. Additionally, or
alternatively, these real-time detection results and trailer information may be combined
and uploaded to one or more storage devices, such as storage 260 (FIG. 2).
[0096] With respect to (2) up-to-date summary of the current trailer loading/unloading process,
summary information of the detection results may be calculated and updated at every
designated time stamp. This information may include an indication of the maximum/minimum
number of the targeted objects (e.g., workers) and/or statuses, cumulated processing
time, dock/trailer door open/close time, remaining number of targeted objects, and
the like. At the end of a trailer loading/unloading process, the up-to-date summary
and trailer information may be combined and uploaded to one or more storage devices,
such as storage 260 (FIG. 2).
[0097] FIG. 5B is a graph diagram of a time graph 510 of results of an object-detection
ML model 268 (FIG. 2) being applied by an object classification engine 220 (FIG. 2)
before employing post-processing engine 240 (FIG. 2), while FIG. 5C is a graph diagram
of a time graph 520 of results of an object-detection ML model 268 being applied by
an object classification engine 220 employing post-processing engine 240, according
to some embodiments of this disclosure. As illustrated, the first and second video
frames 402A and 402B may include two objects, while the third and fourth video frames
402C and 402D may include one object. As illustrated in the time graph 510, no objects
may be detected at the third and fourth video frames 402C and 402D when the post-processing
engine 240 is not employed. However, as illustrated in the time graph 520, one object
may be (accurately) detected at the third and fourth video frames 402C and 402D when
the post-processing engine 240 is employed (e.g., to remove noise and perform the
hereinreferenced analysis on the video frames). By employing the post-processing engine
240, objects that may have not been detected at a particular video frame (e.g., at
a particular time) may be accounted for. Therefore, objects, such as the pallet, load
stand, and many other objects and corresponding statuses may be accounted for by employing
post-processing engine 240 to an output of the object classification engine 220.
[0098] FIG. 6A is a flow diagram of an example process 600 for determining and tracking
a fill-level associated with video frames 402 captured by a camera 105d (FIG. 1),
according to some embodiments of this disclosure. In some embodiments, the object
classification engine 220 (FIG. 2) receives video frames 402 from camera(s) 105d of
the storage compartment. In some embodiments, the camera(s) 105d may correspond to
security camera(s) that are already installed and operating within the storage compartment.
For example, the object classification engine 220 may access a video of an inside
of a compartment and extract video data from the video that represents one or more
video frames of the video at a certain time.
[0099] In some embodiments, the object classification engine 220 includes a fill-level detection
engine 604 and a fill-level percentage generator engine 606. The fill-level detection
engine 604 may detect, in the video frames 402, edges corresponding to an intersection
between walls of the inside of the storage compartment. The fill-level detection engine
604 may include one or more machine learning models such as deep learning-based models.
For example, the fill-level detection engine 604 may include at least one model that
employs a UFAST methodology, a CondLaneNet methodology, FOLOLane methodology, or any
suitable model that may detect edges to generate pixel information (e.g., orientation
and size of edge and corresponding pixels) and fill-level data features.
[0100] The fill-level detection engine 604 may be trained by the model generating engine
250 (FIG. 2). For example, the fill-level detection engine 604 may be continuously
trained using images and/or videos collected from various trailer loading/unloading
scenarios. In this manner, the fill-level detection engine 604 may produce more accurate results
under changing loading/unloading environments, such as changes in the lighting conditions
of a loading/unloading environment, different loading/unloading background textures,
and the like.
[0101] The fill-level detection engine 604 may receive image/video frames 402 from the cameras
105. The fill-level detection engine 604 may process video data from the video frames
using the one or more machine learning models to detect a bottom left edge between
a floor and a left side wall of the inside of the compartment and a bottom right edge
between the floor and a right side wall of the inside of the compartment. In some
embodiments, the fill-level detection engine 604 may generate pixel information and
features representing the edges that correspond to the intersection
between the floor of the storage compartment and the side walls of the storage compartment.
The fill-level detection engine 604 can estimate a storage compartment's parcel fullness
by detecting both the bottom-right and bottom-left edges inside the storage compartment.
Thereafter, the fill-level detection engine 604 may generate a set of pixel coordinates
corresponding to the edges.
[0102] The fill-level percentage generator engine 606 may match the camera information or
storage compartment information to a database 609 (e.g., storage 260 of FIG. 2) of
storage compartment dimensions, including the length. Thereafter, the fill-level percentage
generator engine 606 may access, from the database 609, the length of the storage compartment
from which the video frames 402 have been received. The fill-level percentage generator engine 606 may generate, based on the
bottom left edge, the bottom right edge, and the length, a fill-level of the compartment
that identifies a progress of loading items into or unloading the items from the inside
of the compartment. For example, based on the pixel coordinates for the two edges
(as determined by the fill-level detection engine 604) and the length from the database
609, the fill-level percentage generator engine 606 may calculate and output (block
608) the fill-level, which may be expressed as a numeric or visual percentage, ratio,
fraction, and the like. By determining the fill-level of a storage compartment, such
as a trailer, during a loading/unloading process, operation teams and crews can better
evaluate the efficiency of a loading/unloading process and determine how much time
is left until completion.
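As a non-limiting sketch of the calculation performed by the fill-level percentage generator engine 606, the following Python example assumes the visible portions of the two edges have already been converted from pixels to meters by a camera calibration (not shown), and that the compartment length is looked up from a dimension table standing in for database 609. The dictionary contents, function name, and the averaging and clamping choices are illustrative assumptions.

```python
# Hypothetical sketch of the fill-level percentage generator engine 606.
# It assumes the visible floor/wall edges have already been converted from
# pixels to meters; the lookup table below is an illustrative stand-in for
# database 609.
COMPARTMENT_DIMENSIONS_M = {
    "trailer-53ft": {"length": 16.15},
    "trailer-28ft": {"length": 8.53},
}

def fill_level_percent(visible_left_edge_m, visible_right_edge_m, compartment_id):
    """Estimate percent full from the visible floor depth along each edge.

    Assumption: the visible edge length shrinks as parcels fill the compartment,
    so fullness is taken as 1 - (visible depth / total length)."""
    total_length = COMPARTMENT_DIMENSIONS_M[compartment_id]["length"]
    # Average the two edges so a lopsided load does not skew the estimate.
    visible_depth = (visible_left_edge_m + visible_right_edge_m) / 2.0
    visible_depth = min(max(visible_depth, 0.0), total_length)  # clamp to valid range
    return round(100.0 * (1.0 - visible_depth / total_length), 1)

# Example: about 4.0 m of floor still visible in a 16.15 m trailer -> ~75% full.
print(fill_level_percent(4.2, 3.8, "trailer-53ft"))
```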
[0103] FIG. 6B is a screenshot 620 of edges 622 being detected inside of a storage compartment
by employing a fill-level ML model 269 (FIG. 2), according to some embodiments of
this disclosure. As illustrated, the edges 622 may correspond to the visible portion
of the intersection of the side walls with the floor surface. The edges 622 may correspond
to the portion of the intersection of the side wall and the floor surface up to the
point at which objects (in this example, boxes) are detected. In some embodiments, the edges 622
may be collinear or parallel with an indication of the length 624 of the storage compartment.
As discussed above, the fill-level may be determined based at least on a ratio between
the length 624 of the storage compartment (e.g., trailer) and the edges 622. In one
embodiment, the length 624 of the storage compartment and size of the edges 622 may
be normalized so as to be in the same units of measurement. In some embodiments, the
screenshot 620 illustrated in FIG. 6B may be produced by the trailer AI engine 210
(FIG. 2) and exported by the object and fill-level analytics deploying engine 270
(FIG. 2) to be presented on a display device.
[0104] FIG. 6C is a screenshot 630 of a fill-level ML model 269 (FIG. 2) applied to determine
a fill-level based on edges 622 (FIG. 6B) detected inside of a storage compartment,
according to some embodiments of this disclosure. In some embodiments, the fill-level
ML model 269 may implement the fill-level determining engine 230 (FIG. 2) to determine
the fill-level. In this example, four separate video frames 402 may be received by
the fill-level ML model 269 (FIG. 2). In response to employing aspects of the fill-level
determining engine 230 (FIG. 2), the fill-level 640 may be determined, for example,
for each video frame 402. As illustrated, the fill-level 640 may be presented using any
suitable visual indication, such as a fill-level percentage 642 and/or a fill-level
bar indicator 644. In one embodiment, the fill-level bar indicator 644 may be rendered in a visually
distinct manner corresponding to a particular percentage range. For example, the fill-level bar indicator
644 may be green for a fill-level between 80% and 100%, yellow for a fill-level between
40% and 80%, and/or red for a fill-level between 0% and 40%.
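A minimal sketch of the color banding described above for the fill-level bar indicator 644 is shown below; the function name is hypothetical, and the thresholds simply restate the example percentage ranges.

```python
# Minimal sketch of the color banding described for fill-level bar indicator 644.
def bar_indicator_color(fill_level_percent):
    """Map a fill-level percentage to a display color for the bar indicator."""
    if fill_level_percent >= 80:
        return "green"
    if fill_level_percent >= 40:
        return "yellow"
    return "red"

assert bar_indicator_color(92.5) == "green"
assert bar_indicator_color(55.0) == "yellow"
assert bar_indicator_color(12.0) == "red"
```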
[0105] Turning to FIG. 7A, depicted is a flow diagram of an example process 700 for post-processing
data associated with fill-level, according to some embodiments of this disclosure.
In some embodiments, the process 700 illustrated in FIG. 7A may be performed by the
post-processing engine 240 (FIG. 2), which may receive (block 702) a fill-level that
has been determined and output (block 608 of FIG. 6A) by the fill-level determining
engine 230 (FIG. 2). The post-processing engine 240 may include a data filter/smoother
engine 704 and/or a data analyzer engine 706.
The data filter/smoother engine 704 may remove noise or fluctuations in the detection
results caused by minor instability of the loading/unloading environment. For example,
changes in the lighting conditions inside the storage compartment, the material of the
inside of the storage compartment, the object (e.g., parcel) distribution within the
storage compartment, or poor image resolution resulting from low internet bandwidth
can cause certain edges 622 (FIG. 6B) (e.g., output by the fill-level determining
engine 230) to be improperly detected, or not detected at all, during a short period
of time or over a few video frames 402. Employing the data filter/smoother engine
704 may improve the estimation accuracy, as shown in FIGS. 7B and 7C.
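The disclosure does not prescribe a particular filter, but one plausible sketch of the data filter/smoother engine 704 is a median filter followed by an exponential moving average over the per-frame fill-level series, as shown below; the window size and smoothing factor are assumptions.

```python
# Illustrative smoothing of a per-frame fill-level series, standing in for the
# data filter/smoother engine 704. The median window and smoothing factor are
# assumptions; the disclosure does not prescribe a specific filter.
from statistics import median

def smooth_fill_levels(raw_levels, window=5, alpha=0.2):
    """Median-filter out single-frame spikes, then exponentially smooth."""
    smoothed = []
    ema = None
    for i in range(len(raw_levels)):
        lo = max(0, i - window + 1)
        windowed = median(raw_levels[lo:i + 1])   # rejects short-lived outliers
        ema = windowed if ema is None else alpha * windowed + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed

noisy = [10, 11, 35, 12, 13, 14, 14, 2, 15, 16]   # spikes at frames 2 and 7
print([round(v, 1) for v in smooth_fill_levels(noisy)])
```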
[0107] The data analyzer engine 706 may generate analytics indicative of the fill-level
of the compartment. For example, the data analyzer engine 706 may determine idle sections
of the received (block 702) fill-level results. The idle sections may correspond to
a time period or length of time during which the fill-level did not change, for example,
because items were not unloaded or loaded. In some embodiments, after the data filter/smoother
engine 704 processes the received (block 702) fill-level results, the data analyzer
engine 706 may extract candidate idle sections from the cumulative processed fill-level
result. The start and end time associated with these idle sections can be computed
based on the analysis of those processed fill-level results. The idle sections 742
(FIG. 7C) are illustrated as the flat horizontal regions. Information associated with
the idle sections, as well as the start and end time of a trailer loading/unloading
process, can provide insights to the operation teams, allowing them to evaluate the
speed and efficiency of a loading/unloading process.
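By way of illustration only, the following sketch shows one way the data analyzer engine 706 might extract candidate idle sections as flat regions of the processed fill-level series and report their start and end times; the tolerance and minimum duration are assumed parameters not specified in the disclosure.

```python
# Hypothetical extraction of idle sections (flat regions) from a smoothed
# fill-level series, in the spirit of the data analyzer engine 706.
def find_idle_sections(levels, timestamps, tolerance=0.5, min_duration_s=60.0):
    """Return (start_time, end_time) pairs where the fill-level stayed flat."""
    sections, start = [], 0
    for i in range(1, len(levels) + 1):
        still_flat = i < len(levels) and abs(levels[i] - levels[start]) <= tolerance
        if not still_flat:
            if timestamps[i - 1] - timestamps[start] >= min_duration_s:
                sections.append((timestamps[start], timestamps[i - 1]))
            start = i
    return sections

# One frame every 30 s; the level is flat between t=60 s and t=210 s.
levels = [40, 42, 44, 44, 44, 44, 44, 44, 46, 48]
times = [i * 30.0 for i in range(len(levels))]
print(find_idle_sections(levels, times))   # -> [(60.0, 210.0)]
```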
[0108] FIGS. 7B and 7C depict respective graph diagrams 710 and 740 of time graphs 720 and
730 plotting results of a fill-level ML model 269 (FIG. 2) employing a post-processing
engine 240 (FIG. 2), according to some embodiments of this disclosure. As illustrated,
an unloading process is depicted by a first graph 720 and a loading process is depicted
by a second graph 730. During the unloading process, the fill-level may decrease over time
as items are unloaded. On the other hand, during the loading process, the fill-level
may increase over time as items are loaded. In this example, the solid line depicts
the fill-level determined at or near real time, for example, by the fill-level determining
engine 230 (FIG. 2), while the dashed line depicts the fill-level after the filter/smoother
engine 704 (FIG. 7A) may remove the noise or fluctuations of the detection results.
As illustrated in FIG. 7C, the idle sections 742 are illustrated as the flat horizontal
regions.
[0109] Turning now to FIG. 8, depicted is a flow diagram of process 800 for generating and
communicating analytics indicative of a fill-level and a classified object, according
to some embodiments of this disclosure. Process 800 (and processes 900 and 1000 of
FIGS. 9 and 10, respectively) and/or any of the functionality described herein may
be performed by processing logic that comprises hardware (e.g., circuitry, dedicated
logic, programmable logic, microcode), software (e.g., instructions run on a processor
to perform hardware simulation), firmware, or a combination thereof. Although particular
blocks described in this disclosure are referenced in a particular order or a particular
quantity, it is understood that any block may occur substantially in parallel with,
before, or after any other block. Embodiments of example processes 800, 900, and/or
1000 may be carried out using a computer system having components described in connection
with example system 200 (FIG. 2). In particular, additional details of the various
blocks and operations performed in processes 800, 900, and/or 1000 are described in
connection with FIG. 2 and example system 200. Further, more (or fewer) blocks may
exist than illustrated. Such added blocks may include blocks that embody any functionality
described herein. The computer-implemented method, the system (that includes at least
one computing device having at least one processor and at least one computer readable
storage medium), and/or the computer storage media as described herein may perform
or be caused to perform the process 800 (or processes 900 and/or 1000) or any other
functionality described herein.
[0110] Per block 810, particular embodiments include accessing a video indicative of the
inside of a storage compartment from which items are being loaded or unloaded. In
one embodiment, the video collector 212 (FIG. 2) may receive the video.
[0111] Per block 820, particular embodiments include extracting video data for the video
based at least on the items being loaded or unloaded from inside of the storage compartment
to produce a plurality of video data features comprising a fill-level data feature
and an object-detection data feature. In some embodiments, the partitioning engine
214 (FIG. 2), the object feature training module 222 (FIG. 2), and/or the fill-level
feature training module 232 (FIG. 2) may extract the video data.
[0112] Per block 830, particular embodiments include determining, using a fill-level ML
model 269 (FIG. 2), a fill-level associated with the inside of the storage compartment
based on at least the video and the fill-level data feature. In one embodiment, the
fill-level determining engine 230 (FIG. 2) may determine the fill-level and/or employ
the fill-level ML model 269 (FIG. 2).
[0113] Per block 840, particular embodiments include classifying, using an object-detection
ML model 268 (FIG. 2), an object detected inside of the storage compartment based
at least on the video and the object-detection data feature. In one embodiment, the
object classification engine 220 (FIG. 2) may detect, classify, and/or track the object.
[0114] Per block 850, particular embodiments include generating analytics indicative of
the determined fill-level and the classified object. In one embodiment, the trailer
AI engine 210 (FIG. 2) may generate the analytics.
[0115] Per block 860, particular embodiments include communicating the analytics to a display
device to cause the display device to generate a GUI, such as the GUI illustrated
in the figures described herein. In one embodiment, the object and fill-level analytics
deploying engine 270 (FIG. 2) may communicate the analytics.
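The blocks of process 800 may be summarized by the following non-limiting sketch, in which the engine objects and their method names are hypothetical stand-ins for the components of example system 200; only the ordering of blocks 810 through 860 follows the description above.

```python
# High-level sketch of example process 800 (blocks 810-860). The engine objects
# and their method names are hypothetical stand-ins for components of example
# system 200; only the ordering of the blocks follows the disclosure.
def run_process_800(video_collector, fill_level_engine, object_engine,
                    analytics_generator, deploying_engine, display_device):
    video = video_collector.access_video()                         # block 810
    features = video_collector.extract_video_data(video)           # block 820
    fill_level = fill_level_engine.determine(                      # block 830
        video, features["fill_level"])
    objects = object_engine.classify(                              # block 840
        video, features["object_detection"])
    analytics = analytics_generator.generate(fill_level, objects)  # block 850
    deploying_engine.communicate(analytics, display_device)        # block 860
    return analytics
```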
[0116] Turning to FIG. 9, depicted is a flow diagram of an example process 900 for causing
presentation of a GUI that includes a stream region 330 (FIG. 3B) and/or an analytics
region 332 (FIG. 3B), according to some embodiments of this disclosure. Per block
910, particular embodiments include accessing a video indicative of the inside of
a storage compartment from which items are being loaded or unloaded. In one embodiment,
the video collector 212 (FIG. 2) may receive the video.
[0117] Per block 920, particular embodiments include extracting video data for the video
based at least on the items being loaded or unloaded from inside of the storage compartment
to produce a plurality of video data features comprising a fill-level data feature
and an object-detection data feature. In some embodiments, the partitioning engine
214 (FIG. 2), the object feature training module 222 (FIG. 2), and/or the fill-level
feature training module 232 (FIG. 2) may extract the video data.
[0118] Per block 930, particular embodiments include determining a fill-level associated
with the inside of the storage compartment based on at least the fill-level data feature.
In one embodiment, the fill-level determining engine 230 (FIG. 2) may determine the
fill-level and/or employ the fill-level ML model 269 (FIG. 2).
[0119] Per block 940, particular embodiments include detecting and classifying an object
inside of the storage compartment based at least on the object-detection data feature.
In one embodiment, the object classification engine 220 (FIG. 2) may detect, classify,
and/or track the object.
[0120] Per block 950, particular embodiments include causing presentation of a GUI that
includes a stream region 330 and an analytics region 332, such as the GUIs described
herein (e.g., FIGS. 3B and 3C). The stream region may include a stream of the video
at or near real-time. The analytics region may include analytics indicative of progress
in the items being loaded or unloaded from the storage compartment based on the determined
fill-level and the object. In one embodiment, the object and fill-level analytics
deploying engine 270 may communicate the analytics.
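As a non-limiting illustration of block 950, the following sketch shows one form of payload the object and fill-level analytics deploying engine 270 might communicate so that a display device can render the stream region 330 and the analytics region 332; all field names and values are hypothetical.

```python
# Hypothetical payload the analytics deploying engine 270 might send to a
# display device to render stream region 330 and analytics region 332 (block
# 950). All field names and values are illustrative, not taken from the disclosure.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DetectedObject:
    label: str            # e.g., "pallet", "human", "load stand"
    confidence: float
    bounding_box: tuple   # (x, y, width, height) in frame pixels

@dataclass
class GuiPayload:
    stream_url: str                  # stream region: live or near real-time video
    fill_level_percent: float        # analytics region: current fill-level
    status: str                      # e.g., "door open", "unloading completed"
    objects: List[DetectedObject] = field(default_factory=list)

payload = GuiPayload(
    stream_url="rtsp://example.invalid/trailer-7/stream",
    fill_level_percent=63.5,
    status="door open",
    objects=[DetectedObject("pallet", 0.91, (120, 340, 220, 180))],
)
print(payload.fill_level_percent, [o.label for o in payload.objects])
```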
[0121] Turning to FIG. 10, depicted is a flow diagram of an example process 1000 for deploying
a trained fill-level ML model and a trained object-detection ML model, according to
some embodiments of this disclosure. Per block 1010, particular embodiments include
accessing a fill-level ML model 269 (FIG. 2) and an object-detection ML model 268
(FIG. 2). In one embodiment, the fill-level ML model 269 and the object-detection ML
model 268 may be accessed from storage 260 (FIG. 2).
[0122] Per block 1020, particular embodiments include training the fill-level ML model 269
(FIG. 2) based on a first set of labeled data corresponding to a fill-level data feature,
of a plurality of video data features, indicative of a fill-level of items being loaded
or unloaded from a storage compartment. In some embodiments, the fill-level ML model
269 may be trained using the model generating engine 250 (FIG. 2).
[0123] Per block 1030, particular embodiments include training the object-detection ML model
268 (FIG. 2) based on a second set of labeled data corresponding to an object-detection
data feature, of a plurality of video data features, indicative of a classified object
within the storage compartment. In some embodiments, the object-detection ML model
268 may be trained using the model generating engine 250 (FIG. 2).
[0124] Per block 1040, particular embodiments include deploying the trained object-detection
ML model 268 (FIG. 2) and the trained fill-level ML model 269 (FIG. 2) to a computing
device. In one embodiment, the object and fill-level analytics deploying engine 270
(FIG. 2) may deploy the trained object-detection ML model and the trained fill-level
ML model.
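By way of illustration only, blocks 1010 through 1040 may be sketched as a supervised training and export routine such as the following; the use of PyTorch, the loss functions, and the serialization format are assumptions, and the actual model architectures are not reproduced here.

```python
# Illustrative supervised training routine for blocks 1010-1040. PyTorch, the
# loss choices, and the loader names are assumptions; the disclosure only
# requires that each model be trained on its own set of labeled data and then
# deployed to a computing device.
import os
import torch

def train_model(model, labeled_loader, loss_fn, epochs=10, lr=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for frames, labels in labeled_loader:   # labeled video data features
            optimizer.zero_grad()
            loss = loss_fn(model(frames), labels)
            loss.backward()
            optimizer.step()
    return model

def train_and_deploy(fill_level_model, object_model,
                     fill_level_loader, object_loader, export_dir="deploy"):
    # Block 1020: first set of labeled data -> fill-level ML model.
    train_model(fill_level_model, fill_level_loader, torch.nn.MSELoss())
    # Block 1030: second set of labeled data -> object-detection ML model.
    train_model(object_model, object_loader, torch.nn.CrossEntropyLoss())
    # Block 1040: deploy the trained models to a computing device (here, by
    # serializing weights that another device could load).
    os.makedirs(export_dir, exist_ok=True)
    torch.save(fill_level_model.state_dict(), f"{export_dir}/fill_level.pt")
    torch.save(object_model.state_dict(), f"{export_dir}/object_detection.pt")
```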
[0125] Having described various embodiments of the disclosure, an exemplary computing environment
suitable for implementing embodiments of the disclosure is now described. With reference
to FIG. 11, an exemplary computing device is provided and referred to generally as
computing device 1100. The computing device 1100 is but one example of a suitable
computing environment and is not intended to suggest any limitation as to the scope
of use or functionality of the disclosure. Neither should the computing device 1100
be interpreted as having any dependency or requirement relating to any one or combination
of components illustrated.
[0126] Embodiments of the disclosure may be described in the general context of computer
code or machine-useable instructions, including computer-useable or computer-executable
instructions, such as program modules, being executed by a computer or other machine,
such as a personal data assistant, a smartphone, a tablet PC, or other handheld device.
Generally, program modules, including routines, programs, objects, components, data
structures, and the like, refer to code that performs particular tasks or implements
particular abstract data types. Embodiments of the disclosure may be practiced in
a variety of system configurations, including handheld devices, consumer electronics,
general-purpose computers, more specialized computing devices, or similar computing
or processing devices. Embodiments of the disclosure may also be practiced in distributed
computing environments where tasks are performed by remote-processing devices that
are linked through a communications network. In a distributed computing environment,
program modules may be located in both local and remote computer storage media including
memory storage devices.
[0127] With reference to FIG. 11, computing device 1100 includes a bus 1110 that directly
or indirectly couples the following devices: memory 1112, one or more processors 1114,
one or more presentation components 1116, one or more input/output (I/O) ports 1118,
one or more I/O components 1120, and an illustrative power supply 1122. Bus 1110 represents
what may be one or more busses (such as an address bus, data bus, or combination thereof).
Although the various blocks of FIG. 11 are shown with lines for the sake of clarity,
in reality, these blocks represent logical, not necessarily actual, components. For
example, one may consider a presentation component such as a display device to be
an I/O component. Also, processors have memory. The inventors hereof recognize that
such is the nature of the art and reiterate that the diagram of FIG. 11 is merely
illustrative of an exemplary computing device that can be used in connection with
one or more embodiments of the present disclosure. Distinction is not made between
such categories as "workstation," "server," "laptop," "handheld device," or the like,
as all are contemplated within the scope of FIG. 11 and with reference to "computing
device."
[0128] Computing device 1100 typically includes a variety of computer-readable media. Computer-readable
media can be any available media that can be accessed by computing device 1100 and
includes both volatile and nonvolatile media, removable and non-removable media. By
way of example, and not limitation, computer-readable media may include computer storage
media and communication media. Computer storage media includes both volatile and nonvolatile,
removable and non-removable media implemented in any method or technology for storage
of information such as computer-readable instructions, data structures, program modules,
or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM,
flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or
other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage
or other magnetic storage devices, or any other medium which can be used to store
the desired information and which can be accessed by computing device 1100. Computer
storage media does not comprise signals per se. Communication media typically embodies
computer-readable instructions, data structures, program modules, or other data in
a modulated data signal such as a carrier wave or other transport mechanism and includes
any information delivery media. The term "modulated data signal" means a signal that
has one or more of its characteristics set or changed in such a manner as to encode
information in the signal. By way of example, and not limitation, communication media
includes wired media, such as a wired network or direct-wired connection, and wireless
media, such as acoustic, RF, infrared, and other wireless media. Combinations of any
of the above should also be included within the scope of computer-readable media.
[0129] Memory 1112 includes computer storage media in the form of volatile and/or nonvolatile
memory. The memory may be removable, non-removable, or a combination thereof. Exemplary
hardware devices include solid-state memory, hard drives, optical-disc drives, and
similar physical storage media. Computing device 1100 includes one or more processors
1114 that read data from various entities such as memory 1112 or I/O components 1120.
Presentation component(s) 1116 presents data indications to a user or other device.
Exemplary presentation components include a display device, speaker, printing component,
vibrating component, and the like.
[0130] The I/O ports 1118 allow computing device 1100 to be logically coupled to other devices,
including I/O components 1120, some of which may be built in. Illustrative components
include, by way of example and not limitation, a microphone, joystick, game pad, satellite
dish, scanner, printer, wireless device, and other I/O components. The I/O components
1120 may provide a natural user interface (NUI) that processes air gestures, voice,
or other physiological inputs generated by a user. In some instances, inputs may be
transmitted to an appropriate network element for further processing. A NUI may implement
any combination of speech recognition, touch and stylus recognition, facial recognition,
biometric recognition, gesture recognition both on screen and adjacent to the screen,
air gestures, head and eye tracking, and touch recognition associated with displays
on the computing device 1100. The computing device 1100 may be equipped with depth
cameras, such as stereoscopic camera systems, infrared camera systems, red-green-blue
(RGB) camera systems, and combinations of these, for gesture detection and recognition.
Additionally, the computing device 1100 may be equipped with accelerometers or gyroscopes
that enable detection of motion. The output of the accelerometers or gyroscopes may
be provided to the display of the computing device 1100 to render immersive augmented
reality or virtual reality.
[0131] Some embodiments of computing device 1100 may include one or more radio(s) 1124 (or
similar wireless communication components). The radio 1124 transmits and receives
radio or wireless communications. The computing device 1100 may be a wireless terminal
adapted to receive communications and media over various wireless networks. Computing
device 1100 may communicate via wireless protocols, such as code division multiple
access ("CDMA"), global system for mobiles ("GSM"), or time division multiple access
("TDMA"), as well as others, to communicate with other devices. The radio communications
may be a short-range connection, a long-range connection, or a combination of both
a short-range and a long-range wireless telecommunications connection. When we refer
to "short" and "long" types of connections, we do not mean to refer to the spatial
relation between two devices. Instead, we are generally referring to short range and
long range as different categories, or types, of connections (i.e., a primary connection
and a secondary connection). A short-range connection may include, by way of example
and not limitation, a Wi-Fi® connection to a device (e.g., a mobile hotspot) that provides
access to a wireless communication network, such as a wireless local-area network (WLAN)
connection using the 802.11 protocol; a Bluetooth connection to another computing device,
which is a second example of a short-range connection; or a near-field communication connection. A long-range
connection may include a connection using, by way of example and not limitation, one
or more CDMA, GPRS, GSM, TDMA, and 802.16 protocols.
[0132] Referring now to FIG. 12, FIG. 12 illustrates an example distributed computing environment
1200 in which implementations of the present disclosure may be employed. In particular,
FIG. 12 shows a high-level architecture of an example cloud computing platform 1210
that can host a technical solution environment, or a portion thereof (e.g., a data
trustee environment). It should be understood that this and other arrangements described
herein are set forth only as examples. For example, as described above, many of the
elements described herein may be implemented as discrete or distributed components
or in conjunction with other components, and in any suitable combination and location.
Other arrangements and elements (e.g., machines, interfaces, functions, orders, and
groupings of functions) can be used in addition to or instead of those shown.
[0133] Data centers can support a distributed computing environment 1200 that includes a
cloud computing platform 1210, a rack 1220, and a node 1230 (e.g., computing devices,
processing units, or blades) in the rack 1220. The technical solution environment
can be implemented with a cloud computing platform 1210 that runs cloud services across
different data centers and geographic regions. A cloud computing platform 1210 can
implement a fabric controller 1240 component for provisioning and managing resource
allocation, deployment, upgrade, and management of cloud services. Typically, a cloud
computing platform 1210 acts to store data or run service applications in a distributed
manner. A cloud computing platform 1210 in a data center can be configured to host
and support operation of endpoints of a particular service application. A cloud computing
platform 1210 may be a public cloud, a private cloud, or a dedicated cloud.
[0134] Node 1230 can be provisioned with host 1250 (e.g., operating system or runtime environment)
running a defined software stack on node 1230. Node 1230 can also be configured to
perform specialized functionality (e.g., compute nodes or storage nodes) within cloud
computing platform 1210. Node 1230 is allocated to run one or more portions of a service
application of a tenant. A tenant can refer to a customer utilizing resources of the
cloud computing platform 1210. Service application components of a cloud computing
platform 1210 that support a particular tenant can be referred to as a multi-tenant
infrastructure or tenancy. The terms service application, application, or service
are used interchangeably herein and broadly refer to any software, or portions of
software, that run on top of, or access storage and compute device locations within,
a datacenter.
[0135] When more than one separate service application is being supported by nodes 1230,
nodes 1230 may be partitioned into virtual machines (e.g., virtual machine 1252 and
virtual machine 1254). Physical machines can also concurrently run separate service
applications. The virtual machines or physical machines can be configured as individualized
computing environments that are supported by resources 1260 (e.g., hardware resources
and software resources) in the cloud computing platform 1210. It is contemplated that
resources can be configured for specific service applications. Further, each service
application may be divided into functional portions such that each functional portion
is able to run on a separate virtual machine. In the cloud computing platform 1210,
multiple servers may be used to run service applications and perform data storage
operations in a cluster. In particular, the servers may perform data operations independently
but be exposed as a single device referred to as a cluster. Each server in the cluster
can be implemented as a node.
[0136] The client device 1280 may be linked to a service application in the cloud computing
platform 1210. The client device 1280 may be any type of computing device, such as
a user device 102a (FIG. 1), and the client device 1280 can be configured to issue
commands to the cloud computing platform 1210. In embodiments, the client device 1280
may communicate with service applications through a virtual Internet Protocol (IP)
and load balancer or other means that direct communication requests to designated
endpoints in the cloud computing platform 1210. The components of the cloud computing
platform 1210 may communicate with each other over a network (not shown), which may
include, without limitation, one or more local area networks (LANs) and/or wide area
networks (WANs).
[0137] Various aspects of the disclosure provide for at least one computer-storage media
having computer-executable instructions embodied thereon that, when executed by a
computing system having a processor and memory, cause the processor to perform operations
comprising: accessing a video indicative of the inside of a storage compartment from
which items are being loaded or unloaded; extracting video data for the video based
at least on the items being loaded or unloaded from inside of the storage compartment
to produce a plurality of video data features comprising a fill-level data feature
and an object-detection data feature; determining, using a fill-level machine learning
(ML) model, a fill-level associated with the inside of the storage compartment based
on at least the video and the fill-level data feature of the plurality of video data
features; classifying, using an object-detection ML model, an object detected inside
of the storage compartment based at least on the video and the object-detection data
feature of the plurality of video data features; generating analytics indicative of
the determined fill-level and the classified object; and communicating the analytics
to a display device to cause the display device to generate a graphical user interface
(GUI).
[0138] In some aspects, the video is received from a monocular camera. In some aspects,
the GUI comprises a stream region that comprises a live stream of the video and a
fill-level indication. In some aspects, the GUI comprises an analytics region that
comprises an indication of at least one of: a door open time, a door close time, a
door open period, a human enter time, a human exit time, a maximum number of workers,
a number of working hours, or a number of pallets.
[0139] In some aspects, the fill-level ML model is trained, using supervised machine learning,
based on labeled data indicative of an edge, a length of the edge, an orientation
of the edge, or an area coordinate associated with the storage compartment. In some
aspects, the object-detection ML model is trained, using supervised machine learning,
based on labeled data indicative of a human object, a pallet object, a load stand
object, a parcel retainer object, a parcel on a conveyer belt object, a forever bag
object, the conveyer belt object, or a small container object.
[0140] In some aspects, the analytics comprise: a fill-level indicative of the items stored
in the storage compartment; and an object indication indicative of whether the object
corresponds to a human object, a pallet object, a load stand object, a parcel retainer
object, a parcel, a forever bag object, a conveyer belt object, or a small container
object. In some aspects, the object-detection features comprise at least one of: an
object feature, comprising at least one of: a human object, a pallet object, a load
stand object, a parcel retainer object, a parcel, a forever bag object, a conveyer
belt object, or a small container object; or a status feature, comprising at least
one of: door open status, door closed status, door partially open status, door open
but trailer not ready status, or unloading or loading completed status. In some aspects,
the fill-level features comprise at least one of: edge detection, edge length, edge
orientation, or area coordinates associated with the storage compartment.
[0141] Various aspects of the disclosure provide a computerized system that comprises at
least one computer processor and computer memory storing computer-useable instructions
that, when used by at least one computer processor, cause the at least one computer
processor to perform operations comprising: accessing a video indicative of the inside
of a storage compartment from which items are being loaded or unloaded; extracting
video data for the video based at least on the items being loaded or unloaded from
inside of the storage compartment to produce a plurality of video data features comprising
a fill-level data feature and an object-detection data feature; determining a fill-level
associated with the inside of the storage compartment based on at least the fill-level
data feature of the plurality of video data features; detecting and classifying an
object inside of the storage compartment based at least on the object-detection data
feature of the plurality of video data features; and causing presentation of a graphical
user interface (GUI) comprising: a stream region comprising a stream of the video;
and an analytics region comprising analytics indicative of progress in the items being
loaded or unloaded from the storage compartment based on the determined fill-level
and the object.
[0142] In some aspects, causing presentation comprises: updating the stream region to continue
playing the stream of the video; and removing an indication associated with the object
when the object disappears from the video stream, or adding another indication when
another object is classified. In some aspects, the stream region comprises an indication
associated with the object, wherein the indication comprises a visually distinct symbol
that classifies the object based on the classification. In some aspects, the indication
associated with the object is determined based on a Kalman filter that compares a
current detected state of the object with a predicted state of the object, wherein
the state of the object corresponds to whether the object appears in the stream of
the video.
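A minimal sketch of such a Kalman-filter comparison is shown below, assuming a constant-velocity model over the center of the object's bounding box and a consecutive-miss threshold for removing the indication; the state representation, noise values, and threshold are assumptions not fixed by the disclosure.

```python
# Minimal constant-velocity Kalman sketch for deciding whether an object's GUI
# indication should persist. The state vector, noise values, and miss threshold
# are assumptions; the disclosure only says a Kalman filter compares the
# predicted state with the currently detected state.
import numpy as np

class TrackedIndication:
    def __init__(self, cx, cy, max_misses=5):
        self.x = np.array([cx, cy, 0.0, 0.0])   # state: [x, y, vx, vy]
        self.P = np.eye(4) * 10.0                # state covariance
        self.F = np.eye(4)                       # constant-velocity transition
        self.F[0, 2] = 1.0
        self.F[1, 3] = 1.0
        self.H = np.eye(2, 4)                    # we only measure position
        self.Q = np.eye(4) * 0.01                # process noise
        self.R = np.eye(2) * 1.0                 # measurement noise
        self.misses, self.max_misses = 0, max_misses

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                        # predicted (x, y)

    def update(self, detection):                 # detection: (cx, cy) or None
        if detection is None:
            self.misses += 1                     # object missing this frame
            return
        self.misses = 0
        z = np.asarray(detection, dtype=float)
        y = z - self.H @ self.x                  # innovation: detected vs. predicted
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def keep_indication(self):
        """Remove the GUI indication only after several consecutive misses."""
        return self.misses < self.max_misses
```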
[0143] In some aspects, the video is received from a monocular camera positioned within
or on a vehicle that comprises the storage compartment. In some aspects, the analytics
indicative of progress comprises an indication corresponding to at least one of: a
door open status, a door closed status, a door partially open status, a door open
but trailer not ready status, or an unloading or loading completed status. In some
aspects, the object comprises at least one of: a human object, a pallet object, a
load stand object, a parcel retainer object, a parcel on a conveyer belt object, a
forever bag object, the conveyer belt object, or a small container object.
[0144] Various aspects of the disclosure provide a computer-implemented method that comprises:
accessing a fill-level machine learning (ML) model and an object-detection ML model;
training the fill-level ML model based on a first set of labeled data corresponding
to a fill-level data feature, of a plurality of video data features, indicative of
a fill-level of items being loaded or unloaded from a storage compartment; training
the object-detection ML model based on a second set of labeled data corresponding
to an object-detection data feature, of the plurality of video data features, indicative
of a classified object within the storage compartment; and deploying the trained object-detection
ML model and the trained fill-level ML model to a computing device.
[0145] In some aspects, the computer-implemented method further comprises: validating the
fill-level ML model based on a first set of unlabeled data applied to a corresponding
loss function; and validating the object-detection ML model based on a second set
of unlabeled data applied to a corresponding loss function. In some aspects, the first
set of labeled data comprises edges, from a video frame, that have been identified
as an intersection between surfaces defining an edge of an inside of the storage compartment.
In some aspects, the second set of labeled data comprises objects, from a video frame,
that have been identified as at least one of: a human object, a pallet object, a load
stand object, a parcel retainer object, a parcel on a conveyer belt object, a forever
bag object, a conveyer belt object, or a small container object.
[0146] Many different arrangements of the various components depicted, as well as components
not shown, are possible without departing from the scope of the claims below. Embodiments
of the present disclosure have been described with the intent to be illustrative
rather than restrictive. Alternative embodiments will become apparent to readers of
this disclosure after and because of reading it. Alternative means of implementing
the aforementioned can be completed without departing from the scope of the claims
below. Certain features and sub-combinations are of utility and may be employed without
reference to other features and sub-combinations and are contemplated within the scope
of the claims.
[0147] Having identified various components utilized herein, it should be understood that
any number of components and arrangements may be employed to achieve the desired functionality
within the scope of the present disclosure. For example, the components in the embodiments
depicted in the figures are shown with lines for the sake of conceptual clarity. Other
arrangements of these and other components may also be implemented. For example, although
some components are depicted as single components, many of the elements described
herein may be implemented as discrete or distributed components or in conjunction
with other components, and in any suitable combination and location. Some elements
may be omitted altogether. Moreover, various functions described herein as being performed
by one or more entities may be carried out by hardware, firmware, and/or software,
as described below. For instance, various functions may be carried out by a processor
executing instructions stored in memory. As such, other arrangements and elements
(e.g., machines, interfaces, functions, orders, and groupings of functions) can be
used in addition to or instead of those shown.
[0148] Embodiments described in the paragraphs below may be combined with one or more of
the specifically described alternatives. In particular, an embodiment that is claimed
may contain a reference, in the alternative, to more than one other embodiment. The
embodiment that is claimed may specify a further limitation of the subject matter
claimed.
[0149] The subject matter of embodiments of the invention is described with specificity
herein to meet statutory requirements. However, the description itself is not intended
to limit the scope of this patent. Rather, the inventors have contemplated that the
claimed subject matter might also be embodied in other ways, to include different
steps or combinations of steps similar to the ones described in this document, in
conjunction with other present or future technologies. Moreover, although the terms
"step" and/or "block" may be used herein to connote different elements of methods
employed, the terms should not be interpreted as implying any particular order among
or between various steps herein disclosed unless and except when the order of individual
steps is explicitly described.
[0150] For purposes of this disclosure, the word "including" has the same broad meaning
as the word "comprising," and the word "accessing" comprises "receiving," "referencing,"
or "retrieving." Further the word "communicating" has the same broad meaning as the
word "receiving," or "transmitting" facilitated by software or hardware-based buses,
receivers, or transmitters using communication media described herein. In addition,
words such as "a" and "an," unless otherwise indicated to the contrary, include the
plural as well as the singular. Thus, for example, the constraint of "a feature" is
satisfied where one or more features are present. Also, the term "or" includes the
conjunctive, the disjunctive, and both (a or b thus includes either a or b, as well
as a and b).
[0151] For purposes of a detailed discussion above, embodiments of the present invention
are described with reference to a distributed computing environment; however, the
distributed computing environment depicted herein is merely exemplary. Components
can be configured for performing novel aspects of embodiments, where the term "configured
for" can refer to "programmed to" perform particular tasks or implement particular
abstract data types using code. Further, while embodiments of the present invention
may generally refer to the technical solution environment and the schematics described
herein, it is understood that the techniques described may be extended to other implementation
contexts.
[0152] Embodiments of the present invention have been described in relation to particular
embodiments which are intended in all respects to be illustrative rather than restrictive.
Alternative embodiments will become apparent to those of ordinary skill in the art
to which the present invention pertains without departing from its scope.
[0153] From the foregoing, it will be seen that this invention is one well adapted to attain
all the ends and objects hereinabove set forth together with other advantages which
are obvious and which are inherent to the structure.
[0154] It will be understood that certain features and sub-combinations are of utility and
may be employed without reference to other features or sub-combinations. This is contemplated
by and is within the scope of the claims.