Field
[0001] Examples relate to apparatuses and methods for assessing a quality of a prediction
model for predicting mobility-related information. For instance, the prediction model
may be part of an on-street parking information system of a vehicle.
Background
[0002] Prediction models for predicting mobility-related information, such as on-street
parking information systems (OSPI), have become more popular in the original equipment
manufacturer industry over the last decade. Vehicles cruising for on-street parking
contribute to a significant amount of congestion within a city's inner urban area.
Based on 22 studies in different cities ranging from 1927 to 2015 (R. C. Hampshire and D. Shoup, "What Share of Traffic is Cruising for Parking?," J. Transp. Econ. Policy, vol. 52, no. July, pp. 184-201, 2018), the average cruising traffic share in a city is around 34% and drivers spend around
8 minutes searching for parking. OSPI navigate drivers in search of on-street parking
and reduce traffic congestion caused by cruising drivers. OSPI may also provide pre-departure
information on parking availability at the destination.
[0003] Prediction models of OSPI differ in how training data is collected and in the features
considered for training, validating, and testing the prediction models. Data sources
for training data may be smart parking meters, real-time ground sensors, images captured
by a camera mounted on a moving vehicle, crowd-sensing information from probe vehicles,
e.g., taxis, with on-board sensors, cameras, or ultrasonic sensors. Differences in
the training data affect the reliability and quality of the prediction model. The
quality of the prediction models is conventionally validated by comparing randomly
selected predefined mobility-related information, so-called ground truth data, against
predicted mobility-related information.
[0004] Most prediction models aim to achieve a real-time prediction, while some estimate parking
availability for a given time interval, like 10 to 20 minutes. Further added value
for users comes with a capability to correctly assess the quality of the prediction
model. A "true" quality of OSPI needs to be addressed. The accuracy and true quality
of OSPI determine the benefits gained in a transport network.
[0005] Conventional quality assessment of prediction models like OSPI has mainly focused
on measuring an accuracy of the predictions using randomized testing, i.e., collecting
ground truth randomly from any street within a specific area. This quality assessment
may have a low significance as it does not consider features important to system objectives
and user or management expectations.
[0006] On-street parking information systems (OSPI) are a special case, as their quality assessment
requires considerably more effort than that of, for instance, traffic prediction.
The former involves a high number of small streets where a low volume of on-street parking
events occurs, whereas the latter deals with a low number of major roads where a high
volume of traffic events is measured more easily. This makes quality assessment of OSPI
comparably more error-prone, and a high volume of so-called ground truth is needed.
For instance, ground truth is a measured or observed parking availability which is
used for a comparison with a corresponding prediction to assess the quality of the
OSPI. An elaborate collection of ground truth increases the effort for quality assessment.
Although many strategies for ground truth collection exist, there is so far no scalable
method that reduces the effort and cost of ground truth collection. One approach may
be acquiring local knowledge about land use and daily parking behaviour. However,
this approach is not scalable. Thus, the volume of ground truth should be kept low without
affecting the quality assessment.
[0007] The subject matter claimed herein is not limited to embodiments that solve any disadvantages
or that operate only in environments such as those described above. Rather, this background
is only provided to illustrate one exemplary technology area where some embodiments
described herein may be practiced.
Summary
[0008] The present disclosure may introduce apparatuses and methods which may lower a volume
of ground truth data and increase a significance of quality assessment of prediction
models for predicting mobility-related information, such as parking availability.
For example, the relative importance of information about an arrival of a train with
a 15-minute headway is higher than that of a train with a 2-minute headway. As another
example, information about parking availability is more important to a driver in a busy
central area than in a periphery of a city with minimal
traffic. Disregarding such differences in an assessment leads to a gap between the quality
assessed by a service provider and the quality expected by a user based on perceived utility. Ideally, the quality of a prediction
model for predicting mobility-related information may be assessed based on a relative
importance of the prediction to a user. A collection of ground truth data in areas
or at times of low relative importance may be omitted.
[0009] According to a first aspect of the present disclosure, there is provided an apparatus
for assessing a quality of a prediction model for predicting mobility-related information.
The apparatus comprises a processor configured to provide vehicle fleet data indicative
of a usage of vehicles in slices corresponding to different geographical areas and/or
different time periods. The processor is further configured to assign an importance
weight to the slices in accordance with a density of the usage of vehicles in the
respective slice of geographical area and/or time period. The processor is further
configured to compare, for at least one of the slices, predicted mobility-related
information of the prediction model against predefined mobility-related information
in said slice, wherein the processor is configured to weight said slice in accordance
with its assigned importance weight. The processor is further configured to assess
the quality of the prediction model based on the weighted comparison of predicted
and predefined mobility-related information.
[0010] According to a second aspect of the present disclosure, there is provided a computer-implemented
method for assessing a quality of a prediction model for predicting mobility-related
information. The method comprises providing vehicle fleet data indicative of a usage
of vehicles in slices corresponding to different geographical areas and/or different
time periods. The method further comprises assigning an importance weight to the slices
in accordance with a density of the usage of vehicles in the respective slice of geographical
area and/or time. The method further comprises comparing, for at least one of the
slices, predicted mobility-related information of the prediction model against predefined
mobility-related information in said slice, wherein said slice is weighted in accordance
with its assigned importance weight. The method further comprises assessing the quality
of the prediction model based on the weighted comparison of predicted and predefined
mobility-related information.
Brief description of the Figures
[0011] Some examples of apparatuses and/or methods will be described in the following by
way of example only, and with reference to the accompanying figures, in which
- Fig. 1 illustrates an apparatus for assessing a quality of a prediction model for predicting mobility-related information;
- Fig. 2 illustrates a method for assessing the quality of the prediction model for predicting mobility-related information, with optional benefits analysis;
- Fig. 3 illustrates a temporal distribution of parking events of example vehicle fleet data;
- Fig. 4 illustrates a spatial distribution of the parking events of the example vehicle fleet data;
- Fig. 5 illustrates a distribution of the parking events over specific areas and time periods;
- Fig. 6 illustrates a spatial distribution of quadkeys and their encoded labels;
- Fig. 7 illustrates a spatial distribution of clusters.
Detailed Description
[0012] Some examples are now described in more detail with reference to the enclosed figures.
However, other possible examples are not limited to the features of these embodiments
described in detail. Other examples may include modifications of the features as well
as equivalents and alternatives to the features. Furthermore, the terminology used
herein to describe certain examples should not be restrictive of further possible
examples.
[0013] Throughout the description of the figures same or similar reference numerals refer
to same or similar elements and/or features, which may be identical or implemented
in a modified form while providing the same or a similar function. The thickness of
lines, layers and/or areas in the figures may also be exaggerated for clarification.
[0014] When two elements A and B are combined using an 'or', this is to be understood as
disclosing all possible combinations, i.e., only A, only B as well as A and B, unless
expressly defined otherwise in the individual case. As an alternative wording for
the same combinations, "at least one of A and B" or "A and/or B" may be used. This
applies equivalently to combinations of more than two elements.
[0015] If a singular form, such as "a", "an" and "the" is used and the use of only a single
element is not defined as mandatory either explicitly or implicitly, further examples
may also use several elements to implement the same function. If a function is described
below as implemented using multiple elements, further examples may implement the same
function using a single element or a single processing entity. It is further understood
that the terms "include", "including", "comprise" and/or "comprising", when used,
describe the presence of the specified features, integers, steps, operations, processes,
elements, components and/or a group thereof, but do not exclude the presence or addition
of one or more other features, integers, steps, operations, processes, elements, components
and/or a group thereof.
[0016] This document describes systems and techniques for assessing a quality of a prediction
model for predicting mobility-related information. The assessment may therefore reflect
a relative importance of the mobility-related information to a user of the prediction
model. The subsequent description may mainly refer to predicting parking availability,
e.g., of on-street parking information systems (OSPI). This may be due to OSPI being
a special case where an assessment of predictions needs a high volume of ground truth
data, thus, it may lead to high efforts in collecting comparative data for the assessment.
The present disclosure may seek an automated and scalable method to reduce efforts
of ground truth collection. Persons skilled in the relevant art will recognize that
the present disclosure may also be applicable to other systems using prediction models
in a mobility-related context, such as traffic prediction, prediction of public transport
delay, or the like.
[0017] The said prediction model for predicting mobility-related information may be a computer-implemented
program which may output the mobility-related information upon a request of a user
of the prediction model. The user may be any entity that could benefit from the prediction
model, for example, a driver or passenger of a vehicle or a provider of mobility-related
services. The prediction model may be integrated into a board computer of a vehicle
and may be part of an information system of the vehicle. The prediction model may
alternatively be integrated into an external computing system and exchange data with
the board computer or other user devices comprising processing circuitry to forward
a user request and receive the mobility-related information. The mobility-related
information may be information relevant to the user for a usage of the vehicle. For
example, the mobility-related information may comprise probability values of a parking
availability in the case of an OSPI. The prediction model may be part of a navigation
system. It may predict a parking availability in a certain area, like a street near
a destination defined by the user, and at a certain time, like an hour of the day
as defined by the user. Alternatively, the prediction model may be used for predicting
a traffic status of a vehicle infrastructure. The prediction may comprise probability
values for an availability of a vacant parking spot in the area and time defined by
the user or the navigation system. The prediction model may be developed using complex
machine learning techniques and may have previously been trained with appropriate
training data. The processing circuitry integrating the prediction model may have
an interface to a navigation system which may further process the prediction and navigate
the user according to the prediction via, for instance, a graphical user interface
in the vehicle.
[0018] Quality assessments of said prediction models may usually be made by a service provider
of the prediction models. Quality assessments may be necessary to improve a prediction
model, to advertise a quality of the prediction model, or to choose an appropriate
prediction model among a plurality of prediction models. Conventional quality assessment
of a prediction model may be based on randomly collecting ground truth data within a
certain area, like a city. The ground truth data may be observed or measured vacant
parking spots collected in different areas within the city, like in different streets,
and at different dates, like different hours of several days. For example, the ground
truth data used for a BMW study may comprise 20000 random observations throughout
the city of Munich, Germany. The ground truth data may have been collected between
June 2018 and October 2020. Each observation may be made for a block at a certain
time. The block may be a stretch of a street measured from one intersection to the
other. For each observation, the area of the block and the time of the observation
may be recorded. When at least one vacant legal parking spot is observed on the block,
a parking availability value for the corresponding block and time may be set to 1,
otherwise it may be set to 0. Regardless of the number of open spots, the parking
availability may be recorded as a binary outcome: available (1) or not available (0).
[0019] Conventional quality assessment may comprise requesting predictions of the parking
availability from the prediction model. The parking availability may be requested
for the same areas and times as those of the ground truth data. Conventional quality
assessment may then compare the predictions against the ground truth data for each
area and time. It may determine an average difference between the predictions and
the ground truth data over all areas and times. The average difference may be considered
a loss of the prediction.
[0020] However, conventional quality assessment may not consider a relative importance of
the areas and times to a potential user. For instance, a certain area may be located
within a center of a city where vacant parking spots are rare and frequently demanded
by drivers. Likewise, a certain time, like an afternoon hour during a working day,
may be critical in terms of parking availability. So, such an area or time may be
seen as relatively important to users of the prediction model. Another area may be
located in outskirts of a city where vacant parking spots are much easier to find
and seldom requested by a potential user. Another time may be a night hour on Sunday
when vacant parking spots are usually non-critical. So, the latter mentioned area
and time may be seen as relatively unimportant to users of the prediction model. As
the relative importance is not considered in conventional quality assessment, a significance
of the quality assessment may be diminished.
[0021] Thus, it may be seen as an objective of the present disclosure to increase the significance
of the quality assessment of the prediction model.
[0022] Fig. 1 illustrates an apparatus 100 for assessing a quality of a prediction model 110 for
predicting mobility-related information according to some embodiments of the present
disclosure. The apparatus 100 comprises a processor 120 configured to provide vehicle
fleet data 130 indicative of a usage of vehicles in slices 140, such as 142, 144,
146 and others as indicated by the dots 148, corresponding to different geographical
areas and/or different time periods.
[0023] The vehicle fleet data 130 may comprise timestamps and/or geographical coordinates
of parking events or traffic events in the slices 140. Alternatively, the vehicle
fleet data 130 may indicate other usage of vehicles in slices 140. In the following,
OSPI is used as a non-limiting example. In the case of OSPI, the vehicle fleet data
130 may comprise parking events. Geographical coordinates of the parking events may
lie within a certain area, like a city. For instance, vehicle fleet data 130 may be
retrieved from a vehicle fleet of a plurality of vehicles equipped with GPS sensors,
real-time clocks or timers, controllers with an interface to backends of the vehicles'
board computers, and storage devices. The vehicles may usually move within the certain
area. For instance, BMW's OSPI for Munich, Germany may be examined. The vehicle fleet
data 130 may be gathered from a vehicle fleet of BMW vehicles. A data collection of
the vehicle fleet data 130 may be performed at BMW's backend services which may include
anonymization according to EU defined data privacy standards. A parking event of a
vehicle may be generated when a vehicle's engine is switched off or on, triggering
a parked-in event or parked-out event, respectively. Switching off or on the engine
may be tracked by the backend of the vehicle's board computer. The backend may trigger
the controller via the interface to the backend to store current geographical coordinates
of the GPS sensor, a current timestamp generated by the real-time clock, and a type
of the parking event (parked-in or parked-out) on the storage device. After a predefined
study time, data of the storage devices of all vehicles of the vehicle fleet may be
collected and read out by a computing system, such as the processor 120. For the BMW
study, data from February 2020 to September 2020 may have been taken. The collected
data may be considered as vehicle fleet data 130. The computing system may post-process
the vehicle fleet data 130 to contain only parking events within a proximity of a
street. So, parking events triggered by parking in a garage or car park may be excluded.
[0024] Referring back to Fig. 1, the processor 120 is further configured to assign an importance weight 150 to the
slices 140 in accordance with a density of the usage of vehicles in the respective
slice 140 of geographical area and/or time period.
[0025] As mentioned above, the vehicle fleet data 130 may be used for determining a relative
importance of areas and times to the user and using the relative importance for a
quality assessment of a prediction model 110. For this purpose, the slices 140 within
an examined area and/or time for which the vehicle fleet data 130 is collected, like
the city of Munich and the year 2020, may be defined. The slices 140 may correspond
to different geographical areas and/or different time periods. For example, firstly,
a map of the examined area may be partitioned into map tiles which may be square-shaped
and equally sized spatial divisions of the examined area. The map tiles may be labelled
with numbers. Secondly, an optional temporal dimension of the slices 140 may be defined
as 168 week-hours being one-hour divisions of a week. Thirdly, the slices 140 may
be defined as spatial-temporal divisions of the examined area and time. The slices
140 may result in an array of the map tiles over the 168 week-hours. The parking events
of the vehicle fleet data 130 may be assigned to the slices 140 according to the geographical
coordinates and timestamps of the parking events. The relative importance may be defined
by the percent volume weight or density of the parking events that occur within the
slices 140. An importance weight 150 may be assigned to each slice 142, 144, 146,
148 according to the relative importance. For instance, the importance weight 150
for the slices 140 may be calculated as follows:

$$w_s = \frac{\mathrm{PEVolume}_s}{\sum_{i=1}^{N} \mathrm{PEVolume}_i}$$

where $w_s$ may be the importance weight 150 for a slice with a number $s$. The slices
140 may in this case be numbered consecutively from 1 on. With the number $s$, a specific
slice may be selected. $\mathrm{PEVolume}_s$ may be a volume of the parking events in the slice with the number $s$. $N$ may be a
total number of slices 140. The person skilled in the art will recognize that the
slices 140 may be defined as any other geographical areas, e.g., clusters of map tiles
or area divisions of other shapes, and/or time periods, e.g., 2-hour intervals of
a year or irregular time sections, that are appropriate for the application. The spatial
dimension of the slices 140 need not be combined with a temporal dimension, or vice
versa. More examples of slices 140 will be given below. Furthermore, the slices
140 may be of any number other than demonstrated in the examples.
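The weight calculation described above may be illustrated by the following minimal Python sketch. It assumes that each parking event has already been assigned a slice identifier; the function name `importance_weights` and the sample slice labels are hypothetical and serve illustration only.

```python
from collections import Counter


def importance_weights(event_slice_ids):
    """Return {slice_id: share of all parking events falling into that slice}.

    event_slice_ids: iterable with one slice identifier per parking event
    (hypothetical input format, assumed for this sketch).
    """
    counts = Counter(event_slice_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    # w_s = PEVolume_s / sum of PEVolume over all slices
    return {slice_id: n / total for slice_id, n in counts.items()}


# Illustrative data only: eight parking events spread over three slices.
events = ["A", "A", "B", "A", "C", "B", "A", "A"]
weights = importance_weights(events)
```

In this sketch the weights sum to one, so a slice holding five of eight events receives a weight of 0.625.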
[0026] A partitioning of areas and times for defining slices may be referred to as a prioritization-based
subsampling strategy (PSS). In other words, different PSS may look at vehicle fleet
data from several perspectives. A PSS may be chosen by robustness of an assessment
design. A first PSS may be purely based on spatial slices, referred to as neighbourhoods.
The first PSS may only consider a density of vehicle fleet data in each neighbourhood
within a city over an entire study period. The first PSS may be based on a quadkey
concept (Microsoft, "Zoom levels and tile grid," 2020, https://docs.microsoft.com/en-us/azure/azure-maps/zoom-levels-and-tile-grid?tabs=csharp#quadkey-indices). The quadkey concept may be an indexing convention and unique identifier of a standard
map tile at a specific zoom level. The quadkey concept may be a method for partitioning
a map into map tiles. The quadkey concept may be a standard used by Microsoft's Azure
Maps. The zoom level of quadkeys varies from 0 to 24, corresponding to a map tile
size of 40,075,017 m x 40,075,017 m to 2.39 m x 2.39 m, respectively. The finer the
map tile level, the lower the volume of the vehicle fleet data per map tile may be, and
thereby the higher the relative error may be. The quadkey concept may be favorable to
generate reproducible and comparable results for similar applications. Each map tile
may equate to a slice; the densest map tile, or the map tile with the highest number
of parking events, may then be considered most important. In this manner, the map
tiles may be sorted from most to least important.
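As an illustration of how a parking event could be mapped to a map tile, the following Python sketch converts WGS84 coordinates to a quadkey using the publicly documented Web Mercator tile scheme. It is a sketch under stated assumptions rather than the implementation used in the study; the function names are hypothetical and the Munich coordinates are approximate.

```python
import math


def latlon_to_tile(lat, lon, zoom):
    """Convert WGS84 coordinates to Web Mercator tile indices at a zoom level."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Mercator latitude limits
    lon = max(min(lon, 180.0), -180.0)
    n = 2 ** zoom  # number of tiles per axis
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    return tile_x, tile_y


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of the tile indices into a base-4 quadkey string."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(lat, lon, zoom):
    """Return the quadkey of the tile containing the given coordinates."""
    return tile_to_quadkey(*latlon_to_tile(lat, lon, zoom), zoom)


# Approximate coordinates of central Munich (assumed for illustration).
munich_quadkey = latlon_to_quadkey(48.137, 11.575, 14)
```

The length of a quadkey equals its zoom level, so truncating a quadkey yields the identifier of the enclosing coarser tile. This makes it straightforward to aggregate parking events at different neighbourhood sizes.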
[0027] A second PSS may define slices 140 as 168 week-hours. A time interval of one hour
may be chosen heuristically, as it is neither too small nor too large while maintaining
interval consistency. In the mentioned BMW study, half-hour slices were also tested
but produced negligible differences in overall scores and were hence omitted from further
analysis. A busiest week-hour may be the densest slice and, thereby, the most important.
Typically, morning and afternoon peak hours may have the highest densities and
after-midnight hours may be the quietest.
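A week-hour index may, for example, be derived from a timestamp as in the following Python sketch. The convention that index 0 corresponds to Monday 00:00-01:00 and the use of local-time timestamps are assumptions of this sketch, as the source does not fix them.

```python
from datetime import datetime


def week_hour(timestamp):
    """Map a datetime to one of 168 week-hours (0 = Monday 00:00-01:00).

    The Monday-first convention is an assumption of this sketch.
    """
    return timestamp.weekday() * 24 + timestamp.hour


# Monday, 3 February 2020, 13:30 falls into week-hour 13.
example_index = week_hour(datetime(2020, 2, 3, 13, 30))
```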
[0028] A third PSS may be a combination of the first and the second PSS. Each neighbourhood
may be divided into 168 week-hours, resulting in a number of slices 140 equal to a
number of neighbourhoods times 168. The first and the second PSS may be on a higher
aggregated level, while the third PSS may create lower aggregated priority, in other
words, result in more slices 140. The third PSS may be a generic strategy that can
be used in any city use case. It may divide a study area spatially based on the quadkey
concept combined with the week-hour basis. This may allow for a precise identification
of important areas. For instance, it may allow for pinpointing neighbourhoods that
are more important at specific hours during a week. The slices 140 may be sorted according
to a density of vehicle fleet data 130 in the slices 140, more concretely, the slices
140 may be sorted by the number of parking events having occurred in the respective
slices. Since the third PSS may span both neighbourhood and time, the sequence of
slices 140 ordered by importance may discontinuously step from one neighbourhood and
week-hour to another. For example, a topmost important slice may be neighbourhood
A on Monday 13:00-14:00, while a second most important may be from neighbourhood B
on Monday 8:00-9:00.
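The sorting of spatio-temporal slices by density may be sketched in Python as follows. The sketch assumes that each parking event is given as a pair of a neighbourhood label (e.g., a quadkey) and a timestamp; the function name `rank_slices` and the sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime


def rank_slices(events):
    """Count events per (neighbourhood, week-hour) slice, densest first.

    events: iterable of (neighbourhood_label, datetime) pairs (assumed format).
    Ties are broken by slice key so that the ordering is deterministic.
    """
    counts = Counter(
        (neighbourhood, ts.weekday() * 24 + ts.hour) for neighbourhood, ts in events
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative data only: two events in A on Monday 13:00-14:00, one in B at 8:00-9:00.
sample_events = [
    ("A", datetime(2020, 2, 3, 13, 10)),
    ("B", datetime(2020, 2, 3, 8, 5)),
    ("A", datetime(2020, 2, 3, 13, 40)),
]
ranking = rank_slices(sample_events)
```

With the sample events, the slice for neighbourhood A at week-hour 13 ranks before the slice for neighbourhood B at week-hour 8, mirroring the example given in the text.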
[0029] A fourth PSS may be based on neighbourhood clusters and time. Neighbourhood clusters
may be generated based on vehicle fleet data within the neighbourhoods of an examined
area, like a city. Neighbourhoods that have a similar spread of vehicle fleet data
130 over time may be grouped together and be treated as one entity. The clustering
may be done by defining a behaviour of each neighbourhood through an aggregation method
of the vehicle fleet data 130. And then, the clustering may be performed on a corresponding
behavioural pattern. For the case of OSPI, the neighbourhood clusters may be based
on a temporal trend of parking dynamics (TTPD) inferred from parking events. TTPD
may be a week-hour time-series of a cumulative sum of a difference of week-hour normalized
average parked-in and parked-out events per 30-minute intervals at quadkey zoom level
14. For OSPI, zoom level 14 may be selected as an optimum since a more localized level
may generate high relative errors given that a volume of parking events within 30-minute
intervals was small. Each neighbourhood at zoom level 14 may have a particular normalized
TTPD. The TTPDs may be used as a base for clustering similar neighbourhoods. Each
neighbourhood cluster may consist of multiple neighbourhoods and may be spatially
treated together. The logic behind this approach may be that different neighbourhoods
with similar parking behaviour may be analyzed on the same level and therefore be
combined in a neighbourhood cluster. As a next step, the neighbourhood clusters may
then be divided into 168 week-hours to form slices 140 as previously. The slices 140
may then be sorted according to a density of the vehicle fleet data 130.
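One possible reading of the TTPD definition is sketched below in Python. The normalization (dividing each series by its own total) and the Euclidean distance used for comparing neighbourhoods are assumptions of this sketch; the source specifies neither the exact normalization nor the clustering algorithm, and the function names are hypothetical.

```python
import math
from itertools import accumulate


def ttpd(parked_in, parked_out):
    """Cumulative sum of normalized parked-in minus parked-out counts per time bin.

    parked_in, parked_out: equally long sequences of average event counts per
    time bin of the week (assumed input format). Each series is normalized by
    its own total, which is an assumption of this sketch.
    """
    if len(parked_in) != len(parked_out):
        raise ValueError("series must cover the same time bins")
    total_in, total_out = sum(parked_in), sum(parked_out)
    if total_in == 0 or total_out == 0:
        raise ValueError("series must contain at least one event each")
    diffs = [i / total_in - o / total_out for i, o in zip(parked_in, parked_out)]
    return list(accumulate(diffs))


def ttpd_distance(curve_a, curve_b):
    """Euclidean distance between two TTPD curves (one possible similarity measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)))


# Illustrative data only: three time bins for one neighbourhood.
curve = ttpd([2, 0, 2], [0, 4, 0])
```

Neighbourhoods whose curves lie close together under such a distance could then be grouped by any standard clustering method.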
[0030] Referring back to Fig. 1, the processor 120 is further configured to compare, for at least one of the slices
140, for example, slice 142, 144, 146, predicted mobility-related information 162,
164, 166 of the prediction model 110 against predefined mobility-related information
172, 174, 176 in said slice 142, 144, or 146, respectively. The predicted mobility-related
information 162, 164, 166 may be probability values of a parking availability in the
respective slices 142, 144, 146; predefined mobility-related information 172, 174,
176 may be measured or observed ground truth data of a parking availability in the
respective slices 142, 144, 146. The processor 120 is configured to weight said slice
142, 144, or 146 in accordance with its assigned importance weight 150.
[0031] So, once an appropriate PSS is identified, the slices 140 for the PSS may be generated.
The slices 140 may then be used for subsampling collected ground truth data. In
other words: the ground truth data may be reduced by excluding ground truth data of
the least important slices 140. Likewise, the most important slices 140 may be prioritized
in a quality assessment of the prediction model 110. For instance, the processor 120
may be configured to select a certain percentage of the slices 140 with the lowest
importance weight 150 and set the respective importance weight 150 to zero. The weighted
comparison performed by the processor 120 may be considered a key performance indicator
(KPI) which may serve as a quality metric for assessing the prediction model 110.
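Setting the importance weight of the least important slices to zero may be sketched in Python as follows. The function name `zero_least_important` and the sample weights are hypothetical. The sketch does not renormalize the remaining weights, since the source does not state whether renormalization is intended.

```python
def zero_least_important(weights, fraction):
    """Return a copy of weights with the lowest-weighted share of slices set to 0.

    weights: {slice_id: importance weight}; fraction: share of slices to drop,
    between 0 and 1. Remaining weights are left unchanged (an assumption).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    k = int(len(weights) * fraction)
    # Sort by weight, then by slice id, so ties are resolved deterministically.
    lowest = sorted(weights, key=lambda s: (weights[s], s))[:k]
    pruned = dict(weights)
    for slice_id in lowest:
        pruned[slice_id] = 0.0
    return pruned


# Illustrative data only: drop the least important half of four slices.
pruned_weights = zero_least_important({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, 0.5)
```

Ground truth data would then only need to be collected for slices whose weight remains non-zero.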
[0032] The KPI may ensure that prioritizations are consistent over the slices 140 and may
measure a "true" quality correctly. A popular KPI may be the Brier Score, as described
in the following equation:

$$\mathrm{BS} = \frac{1}{N} \sum_{t=1}^{N} \left(p_t - o_t\right)^2$$

where $p_t$ may be the predicted mobility-related information 162, 164, or 166 for an
instance $t$, e.g., for a certain slice; $o_t$ may be the predefined mobility-related information
172, 174, or 176 for the instance $t$ ($o_t$ may be 0 if there was no occurrence, 1 if
there was an occurrence); and N may be a total number of instances. The KPI may be
calculated for each slice 140 of a PSS. A total KPI for a PSS may be calculated based
on an evidence-based multi-criteria decision-making method called the weighted sum model
(WSM), as described in the following equation:

$$\mathrm{KPI}_{\mathrm{total}} = \sum_{s=1}^{N} w_s \cdot \mathrm{KPI}_s$$

where $\mathrm{KPI}_s$ may be the KPI of the slice $s$, $w_s$ may be the importance weight 150 assigned to the
slice s according to the equation above and N may be a total number of slices 140.
WSM may be favorable due to its objectivity and not being prone to score skewness.
The total KPI may be dependent on which prediction model 110 and which PSS is used.
The KPI may be a loss function of the predictions. Hence, a low value of KPI may indicate
a good prediction, in other words, it may indicate a prediction is close to corresponding
ground truth data.
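The per-slice Brier Score and its aggregation by the weighted sum model may be sketched in Python as follows. The function names and the sample values are hypothetical and serve illustration only.

```python
def brier_score(predictions, observations):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predictions) != len(observations) or not predictions:
        raise ValueError("need equally long, non-empty sequences")
    squared = ((p - o) ** 2 for p, o in zip(predictions, observations))
    return sum(squared) / len(predictions)


def total_kpi(kpi_by_slice, weights):
    """Weighted sum model: sum of w_s * KPI_s over all assessed slices."""
    return sum(weights[s] * kpi for s, kpi in kpi_by_slice.items())


# Illustrative data only: two slices with two ground truth observations each.
kpis = {
    "centre": brier_score([0.9, 0.2], [1, 0]),
    "suburb": brier_score([0.5, 0.5], [1, 0]),
}
total = total_kpi(kpis, {"centre": 0.8, "suburb": 0.2})
```

Because the Brier Score is a loss, a lower total indicates predictions that lie closer to the ground truth in the slices that carry the most weight.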
[0033] Referring back to Fig. 1, the processor 120 is further configured to assess the quality of the prediction model
110 based on the weighted comparison of predicted mobility-related information 162,
164, 166 and predefined mobility-related information 172, 174, 176. The weighted comparison
may be the total KPI of the prediction model 110 for a specific PSS.
[0034] The apparatus 100 may use the assessed quality for selecting a prediction model among
a plurality of prediction models for a user-selected slice. For instance, a driver
of a vehicle may use an OSPI integrated into a navigation system or other information
system of a board computer in the vehicle. The driver may request via a user interface
of the board computer information about a parking availability in a destination
area at a destination time. The destination area and destination time may be assigned
to a slice of a PSS, being designated as the user-selected slice. The apparatus 100
may compare the assessed quality of the plurality of prediction models and select
the prediction model with the best assessed quality, e.g., the lowest total KPI, for
the user-selected slice. The selected prediction model may be used for generating
an estimated parking availability for the destination area and destination time. The
processing circuitry of the board computer may answer the request of the driver by
displaying the estimated parking availability on a display in the vehicle. The information
system of the vehicle may use the estimated parking availability to estimate a time
for searching a parking spot or to propose streets near the destination area for parking.
A navigation system may navigate the driver to the proposed streets or consider the
time for searching a parking spot for a proposed departure time. The apparatus 100
may increase a reliability of such a navigation system.
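Selecting a prediction model for a user-selected slice may be sketched in Python as follows. The sketch assumes that a KPI (a loss, lower is better) is available per model and slice; the function name `select_model`, the model names, and the values are hypothetical.

```python
def select_model(kpi_by_model, slice_id):
    """Return the model with the lowest KPI (loss) for the given slice.

    kpi_by_model: {model_name: {slice_id: KPI}} (assumed data layout).
    """
    candidates = {
        model: kpis[slice_id]
        for model, kpis in kpi_by_model.items()
        if slice_id in kpis
    }
    if not candidates:
        raise KeyError(f"no model has been assessed for slice {slice_id!r}")
    return min(candidates, key=candidates.get)


# Illustrative data only: two hypothetical models assessed on two slices.
assessed = {
    "model_a": {"centre": 0.12, "suburb": 0.30},
    "model_b": {"centre": 0.20, "suburb": 0.10},
}
best_for_centre = select_model(assessed, "centre")
```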
[0035] Additionally or alternatively, the apparatus 100 may have an interface to a computing
system generating the predictions of the prediction model 110. The apparatus 100 may
send via the interface the assessed quality of the prediction model 110 to the computing
system. The computing system may use the assessed quality to adapt the prediction
model 110, for instance, to adjust computational weights of a machine-learning model
used for the prediction model 110. The apparatus 100 may increase an accuracy of the
predictions of the prediction model 110.
[0036] Aside from importance weighting based on a percent volume share of vehicle fleet
data in each slice, another weighting technique may be considered in this document:
equal weighting for all slices, which may be computed as one divided by the total number
of slices 140. This may be done to validate importance weighting and to see whether
importance weighting may shift a penalty or incentive of the assessment to important
slices.
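The two weighting techniques can be illustrated with a minimal sketch. The function names and the event counts below are hypothetical and serve only as an illustration; the importance weight is taken here as the share of vehicle fleet data falling into each slice, and the equal weight as one divided by the number of slices.

```python
def importance_weights(event_counts):
    """Weight each slice by its share of the total vehicle fleet data volume."""
    total = sum(event_counts.values())
    if total == 0:
        raise ValueError("no vehicle fleet data in any slice")
    return {slice_id: count / total for slice_id, count in event_counts.items()}


def equal_weights(event_counts):
    """Weight every slice by one divided by the total number of slices."""
    n_slices = len(event_counts)
    return {slice_id: 1.0 / n_slices for slice_id in event_counts}


# Hypothetical number of recorded events per slice (illustrative values only).
counts = {"slice_a": 600, "slice_b": 300, "slice_c": 100}
w_importance = importance_weights(counts)
w_equal = equal_weights(counts)
```

Both weightings sum to one over all slices, so a total KPI computed with either of them stays on the same scale and can be compared directly.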
[0037] Randomized subsampling may be the norm to reduce any biases in subsampling. In contrast,
the present disclosure may introduce a prioritization-based subsampling based on PSS
as a competing method to the conventional randomized subsampling. Prioritization-based
subsampling may enable an assessment of a "true" quality of prediction models as it
prioritizes slices 140 based on a relative importance to a user.
[0038] Fig. 2 illustrates a flow-chart of a method 200 for assessing a quality of a prediction
model 110 for predicting mobility-related information according to some embodiments
of the present disclosure. The method 200 may optionally include a validation of the
above-mentioned prioritization-based subsampling. The method 200 may comprise providing
210 vehicle fleet data 130 indicative of a usage of vehicles in slices 140 corresponding
to different geographical areas and/or different time periods. Providing 210 the vehicle
fleet data 130 may comprise acquiring vehicle fleet data 130 from an external source,
like an external server or other computing system. Providing 210 the vehicle fleet
data 130 may also comprise processing the vehicle fleet data 130, e.g., sorting the
vehicle fleet data 130 by the slices 140. Processing the vehicle fleet data 130 may
comprise identifying relevant information in the vehicle fleet data 130 for the usage
of vehicles, e.g., identifying parking events in the case of OSPI or traffic events
in the case of traffic prediction. Providing 210 the vehicle fleet data 130 may further
comprise determining an appropriate PSS, e.g., the first, second, third, or fourth
PSS as explained above. The PSS may define slices 140. The method 200 may further
comprise assigning 220 an importance weight 150 to the slices 140 in accordance with
a density of the usage of vehicles in the respective slice of geographical area and/or
time. Assigning 220 may comprise generating priority slices as a subset of the slices
140 with the most important slices according to the importance weight 150. The method
200 may further comprise comparing 230, for at least one of the slices 140, predicted
mobility-related information 162, 164, 164 of the prediction model 110 against predefined
mobility-related information 172, 174, 176 in said slice, wherein said slice is weighted
in accordance with its assigned importance weight 150. Comparing 230 may comprise
acquiring the predicted mobility-related information 162, 164, 164 of the prediction
model 110 and the predefined mobility-related information 172, 174, 176. Comparing
230 may comprise calculating a Brier Loss, e.g., for determining a KPI, or using another
loss function for each priority slice. Comparing 230 may comprise determining a total
KPI as a weighted product sum of the KPIs over all priority slices of the PSS. The method
200 may further comprise assessing 240 the quality of the prediction model 110 based
on the weighted comparison of predicted mobility-related information 162, 164, 164
and predefined mobility-related information 172, 174, 176. In other examples, there
may be more or fewer items than the presented predicted mobility-related information 162,
164, 164 and predefined mobility-related information 172, 174, 176.
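The comparing 230 and the total KPI may be sketched as follows. This is a minimal illustration, not a definitive implementation: the names `brier_loss` and `total_kpi`, the slice labels and all numbers are hypothetical, and the renormalization of the weights over the slices actually used is an assumption made here so that the total KPI stays on the scale of the Brier Loss.

```python
def brier_loss(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need equally long, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def total_kpi(slice_kpis, weights):
    """Weighted product sum of per-slice KPIs over the slices that have a weight.

    Assumption: weights are renormalized over the slices actually used.
    """
    used = [slice_id for slice_id in slice_kpis if slice_id in weights]
    norm = sum(weights[slice_id] for slice_id in used)
    return sum(weights[s] * slice_kpis[s] for s in used) / norm


# Hypothetical predictions (probabilities) and ground truth (1 = spot available).
data = {
    "centre_mon_08": ([0.9, 0.2], [1, 0]),
    "suburb_sun_15": ([0.5, 0.5], [1, 0]),
}
kpis = {slice_id: brier_loss(p, o) for slice_id, (p, o) in data.items()}
weights = {"centre_mon_08": 0.8, "suburb_sun_15": 0.2}
kpi_total = total_kpi(kpis, weights)
```

In this toy case the per-slice losses are 0.025 and 0.25, so the important slice dominates and the total KPI is 0.8 × 0.025 + 0.2 × 0.25 = 0.07.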
[0039] Optionally, the method 200 may comprise validating 250 the assessed quality by performing
a randomized comparison of predicted mobility-related information 162, 164, 164 and
predefined mobility-related information. The randomized comparison may be a baseline
for validating the weighted comparison of a prioritization-based subsampling, for
instance. Validating 250 may also comprise comparing the total KPI of different PSS
for selecting an appropriate PSS for the prediction model 110. Validating 250 may
be useful for a benefits analysis of the prioritization-based subsampling against
conventional randomized subsampling. Optionally, the method 200 may comprise reducing
260 a collection of predefined mobility-related information 172, 174, 176. As priority
slices are determined, a collection of ground truth data in non-priority slices may
be omitted for further assessments of the prediction model 110.
[0040] The method 200 may further comprise assessing (not shown) the quality of a plurality
of prediction models based on a weighted comparison of respective predicted and the
predefined mobility-related information. For example, the processor 120 may run through
the above-mentioned steps for each of a plurality of the prediction models. The method
200 may further comprise comparing the assessed quality of the plurality of prediction
models for determining a suitable prediction model for at least one of the slices.
For example, the processor 120 may determine the lowest loss of the predicted mobility-related
information for the at least one of the slices among respective losses of the prediction
models. The method 200 may further comprise using the determined suitable prediction
model to predict the mobility-related information in the slice. For example, the processor
120 may select the suitable prediction model with the lowest loss for the slice.
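The selection of a suitable prediction model per slice may be sketched as follows. The function name, the model names and the loss values are hypothetical; ties are resolved in favour of the model encountered first, which is an arbitrary choice made for this illustration.

```python
def best_model_per_slice(losses):
    """Map each slice to the model with the lowest loss in that slice.

    losses: {model_name: {slice_id: loss}}
    """
    best = {}
    for model, per_slice in losses.items():
        for slice_id, loss in per_slice.items():
            current = best.get(slice_id)
            # Keep the first model seen unless a strictly lower loss appears.
            if current is None or loss < losses[current][slice_id]:
                best[slice_id] = model
    return best


# Hypothetical per-slice losses of two prediction models.
losses = {
    "model_a": {"s1": 0.22, "s2": 0.30},
    "model_b": {"s1": 0.25, "s2": 0.21},
}
selection = best_model_per_slice(losses)
```

Here `model_a` would be used for slice `s1` and `model_b` for slice `s2`.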
[0041] In the case of OSPI, the processor 120 may select the suitable prediction model with
the highest chances to predict correctly a parking availability in a certain slice.
A user of a car may use an application on a board computer of his or her car or an
application on another user device, like a smartphone. In the application, like a
navigation or mapping application, the user may select a certain destination and/or
time. The board computer or user device may have an interface to the apparatus 100
and convey the selection of the user to processing circuitry of the apparatus, like
the processor 120. According to the selection of the user, the processor 120 may define
a slice following a PSS. The slice may correspond to an area around the selected destination
and/or to a time interval around the selected time which is considered as potential
parking area and/or parking time. Alternatively, the user may indicate in the application
that he or she looks for a parking spot and automatically select a slice according
to a location of the user (as indicated by a GPS sensor) and a time (as indicated
by a clock). The selected suitable prediction model may then be used to predict the
parking availability of the slice. The processor 120 may convey the predicted parking
availability, for example in form of probability values, via the interface to the
board computer or user device where the application uses the prediction to answer
a request of the user, such as to navigate the user to a street within the slice with
high probability to have vacant parking spots or to display on a graphical user interface
in the car or on the user device the probability for a vacant parking spot in the
slice. The application may give, based on the prediction, a recommendation to park
in a different area than the one selected by the user or to depart for the destination
at a later point in time. By using the suitable prediction model selected by the apparatus
100, the application may be able to be more precise in its indications to the user.
The user may benefit from better parking predictions which may help the user to choose
a time for departure that decreases the cruising time for parking. The predictions
may help the user to make better decisions in his or her mobility behavior, for example,
the user may prefer to do without the car and use public transport when faced with
a low probability of finding a parking spot at the selected time.
[0042] On a global scale, the suitable prediction model selected by the apparatus 100 may
reduce traffic in an area where the application is used. In particular, traffic in
busy times and areas may be equalized as those busy times and areas would be considered
as important in the slices and would shift the incentive of prediction models to those
busy times and areas, in other words: parking predictions would be very precise for
those busy times and areas. Users of the application would precisely see their chances
to get a parking spot in those busy times and areas. Thus, they may plan their car
trip differently to avoid long cruising for parking. A navigation system may use the
predictions to smartly navigate the users to quieter areas near a destination of the
users.
[0043] An experimental design may be defined to test the prioritization-based subsampling
against thousands of random subsamples of the randomized subsampling. The experimental
design may be defined to test the chances of falsely assessing the quality of the prediction
models 110. After the total KPI are calculated for all considered PSS, the next step
may be to check the benefits of "true" quality assessment. This may be done by comparing
the total KPI against the total KPI of a baseline, which is equal weighting of slices
140 and randomized subsampling of ground truth data. The experimental design for randomized
subsampling of ground truth data may be necessary to assess and ensure the robustness
of the prioritization-based subsampling. One objective may be to ensure that if any
of the PSSs are followed for a collection of ground truth data, the followed PSSs
are likely to be representative of the "true" quality of the prediction model 110.
The goal of randomized subsampling may be to generate different random slices independently
from vehicle fleet data density. The ideal, albeit unrealistic, randomized subsampling
may be adjusted to give the best results for assessing the quality of the prediction
model 110. This may be useful to provide a baseline for the benefits analysis of the
PSS implemented. Validating 250 may aim to identify weakly designed prediction models
110 that only perform well in rare instances. The experimental design may ensure that
the randomized subsamples cover most possible combinations of subsets of slices 140.
The comparison of priority slices with random slices may be done for comparing effects
of a reduction of ground truth data. This also may provide an opportunity to check
the benefits of the PSS with a smaller volume of ground truth data, which may result
in higher relative error. It is noted that the top importance weights may correspond
to the vehicle fleet data percent share that is attributed to a slice, and therefore
may not correspond to a volume of the ground truth data in said slice. For instance,
within the top 50% most important slices, it may be possible to only have a sample
size of 30% of ground truth observations occurring in the top 50% most important slices.
In summary, exemplary steps taken for validating 250 the assessed quality are shown
in the following:
- 1. Sorting the slices 140 of each PSS based on corresponding importance weights 150.
- 2. Selecting top 30% up to top 90% most important slices, at 5% interval steps and
calculating the total KPI for all PSS.
- 3. Getting a share of ground truth data selected randomly for randomized subsampling,
the share of ground truth data may be equivalent to a share of ground truth data resulting
from step 2.
- 4. Running n-number of random trials covering different subsets of the ground truth
data and calculating the KPI for all trials.
- 5. Calculating a variance of the KPI of an m-number of PSS.
- 6. Calculating a variance of a KPI of the n-number of random trials.
- 7. Using an interquartile range (IQR) method of outlier detection for robustness of
KPI scores. The IQR may be calculated as follows: IQR = Q3 - Q1, where Q3 may be a third quartile value and Q1 may be a first quartile value. A
lower bound and upper bound outlier may be detected with the following inequalities:
Lower bound outlier < Q1 - 1.5 × IQR < Median < Q3 + 1.5 × IQR < Upper bound outlier
- 8. Comparing the KPI variance for random trials with the KPI variance for the PSS.
- 9. Assessing the robustness of the PSS.
- 10. Analyzing if it is feasible to reduce a collected ground truth data to the most
important slices and if an assessment based on the reduced ground truth data is representative
of the "true" quality.
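Steps 1 to 6 and step 8 above may be sketched as follows on synthetic data. Everything in this sketch is an assumption made for illustration: the data layout, the two toy PSS with their weights, the number of trials, and the use of an unweighted Brier Loss as KPI of a subsample.

```python
import random
import statistics


def brier(rows):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((row["p"] - row["o"]) ** 2 for row in rows) / len(rows)


def prioritized_rows(rows, pss, weights, fraction):
    """Steps 1-2: sort slices by weight and keep observations in the top fraction."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    top = set(ranked[: max(1, round(fraction * len(ranked)))])
    return [row for row in rows if row["slice"][pss] in top]


def random_trial_kpis(rows, sample_size, n_trials, seed=0):
    """Steps 3-4: KPI of n random subsamples of the ground truth data."""
    rng = random.Random(seed)
    return [brier(rng.sample(rows, sample_size)) for _ in range(n_trials)]


# Synthetic ground truth: each observation belongs to one slice per PSS.
rng = random.Random(42)
rows = []
for _ in range(400):
    p = rng.random()
    rows.append({
        "slice": {"spatial": rng.choice("ABCD"),
                  "temporal": rng.choice(["am", "pm", "night"])},
        "p": p,                            # predicted probability
        "o": 1 if rng.random() < p else 0  # observed outcome
    })

# Hypothetical importance weights of two toy PSS.
pss_weights = {
    "spatial": {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05},
    "temporal": {"am": 0.5, "pm": 0.4, "night": 0.1},
}
samples = {pss: prioritized_rows(rows, pss, w, 0.5) for pss, w in pss_weights.items()}
pss_kpis = [brier(sample) for sample in samples.values()]

# Random subsamples use the same share of ground truth data as the PSS (step 3).
share = round(statistics.mean(len(sample) for sample in samples.values()))
trial_kpis = random_trial_kpis(rows, share, n_trials=200)

# Steps 5, 6 and 8: compare the KPI variance of the PSS with that of the trials.
var_pss = statistics.pvariance(pss_kpis)
var_random = statistics.pvariance(trial_kpis)
```

The two variances can then be compared as in step 8; a real analysis would repeat this for every top fraction and every PSS setup.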
[0044] As an example, the above-described steps may be performed for a use case of OSPI.
Various parking prediction models may be utilized to generate parking availability
predictions. However, the development of the prediction models is not the focus of
this document; the models may be considered as exemplary prediction models that generate
adequate predictions for a quality comparison between the prediction models. Some real
feature-based prediction models and random parking prediction models may be used for
the use case as described below. The code carrying out the benefits analysis may be
written in the programming language Python. The main packages used may be: Pandas,
GeoPandas, Folium, NumPy, OSMnx, Matplotlib, Seaborn, Statsmodels, PySAL, and Scikit-learn.
[0045] The experimental design setup of the PSSs implemented for the use case is shown
in the following table:
| PSS setup # | Neighbourhood zoom level 14 | Neighbourhood zoom level 15 | Neighbourhood zoom level 16 | Neighbourhood zoom level 17 | TTPD clusters level 14 | Time interval |
| 1 | X | | | | | |
| 2 | | X | | | | |
| 3 | | | X | | | |
| 4 | | | | X | | |
| 5 | X | | | | | X |
| 6 | | X | | | | X |
| 7 | | | X | | | X |
| 8 | | | | X | | X |
| 9 | | | | | | X |
| 10 | | | | | X | X |
[0046] The above-mentioned BMW vehicle fleet data may be used for the examined use case.
It may be observed that the parking events from Monday to Friday evening have a similar
temporal distribution with small day to day discrepancies, as illustrated in fig.
3. Hence, those parking events may be grouped together. During a working weekday there
may be peaks in the morning and afternoon, as expected since the study area of Munich
is mainly commercial. On weekends, a peak may occur at around noon during lunch hours
and shopping directly before or after lunch hours.
[0047] For the examined use case, the volume of parking events of BMW vehicles in the vehicle
fleet data may be an indicator for relative importance of a corresponding slice. Only
parking event pairs (parked-in, parked-out) with a duration of more than 5 minutes
between parking in and out may be considered to eliminate noise generated by cars
that are merely standing by. Hundreds of thousands of parking events that were recorded in Munich during
the indicated collection period may show a spatial-temporal importance of a slice.
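The filtering of parking event pairs and the counting per slice may be sketched as follows. The function name, the slice labels and the timestamps are hypothetical; only the rule of keeping pairs with a duration of more than 5 minutes is taken from the description above.

```python
from collections import Counter
from datetime import datetime, timedelta

MIN_DURATION = timedelta(minutes=5)


def count_parking_events(events, min_duration=MIN_DURATION):
    """Count (parked-in, parked-out) pairs per slice that last longer than min_duration."""
    counts = Counter()
    for slice_id, parked_in, parked_out in events:
        if parked_out - parked_in > min_duration:
            counts[slice_id] += 1
    return counts


# Hypothetical parking event pairs: (slice, parked-in time, parked-out time).
day = datetime(2021, 3, 1)
events = [
    ("centre", day.replace(hour=8), day.replace(hour=8, minute=45)),    # kept
    ("centre", day.replace(hour=9), day.replace(hour=9, minute=3)),     # too short
    ("centre", day.replace(hour=12), day.replace(hour=13)),             # kept
    ("suburb", day.replace(hour=10), day.replace(hour=10, minute=5)),   # exactly 5 min
    ("suburb", day.replace(hour=18), day.replace(hour=19, minute=30)),  # kept
]
counts = count_parking_events(events)
total = sum(counts.values())
volume_share = {slice_id: n / total for slice_id, n in counts.items()}
```

The resulting volume share per slice can then serve as the indicator of relative importance.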
[0048] For the first PSS which is based on neighbourhoods, a total volume of parking events
in each quadkey may be considered as importance weight. The highest and lowest quadkey
zoom levels considered as neighbourhood may be level 14 (2457.6m x 2457.6m) and 17
(250m x 250m), respectively. These quadkey zoom levels may be heuristically determined
as assumption of a cruising distance range for on-street parking search. A spread
of the parking events may be mainly focused on city hubs within a polygon, as illustrated
in fig. 4. This may correspond to areas to focus on for the KPI calculation.
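Assigning a parking event to a quadkey may be sketched as follows. The sketch assumes that the quadkeys follow the widely used Bing Maps tile system convention (Web Mercator tiles with base-4 keys); the description does not state which convention is used, and the function names and the coordinates are chosen for illustration only.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def lat_lon_to_tile(lat, lon, zoom):
    """Tile column and row containing a WGS84 coordinate at the given zoom level."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    sin_lat = math.sin(math.radians(lat))
    x = (lon + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    n = 1 << zoom  # number of tiles per axis
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    return tile_x, tile_y


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of column and row into a base-4 string of length zoom."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def quadkey(lat, lon, zoom):
    return tile_to_quadkey(*lat_lon_to_tile(lat, lon, zoom), zoom)


# Approximate coordinates in central Munich (illustrative values).
key_14 = quadkey(48.137, 11.575, 14)
key_17 = quadkey(48.137, 11.575, 17)
```

Under this convention a quadkey at zoom level 14 is a prefix of every quadkey at zoom level 17 inside the same tile, which makes it straightforward to aggregate parking events from finer to coarser neighbourhoods.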
[0049] The second PSS, which is based on time periods, may show that most parking events
occur in early mornings during weekdays and at noon during weekends. It may be observed
that on a global level, the importance weights in the second PSS are not well distinguishable
since the importance weights are similar during the day hence making it difficult
to prioritize. This may confirm a nature of the study area as being mainly commercial
and business centered. With prioritization only based on temporal slices, a small
trend shift of a collection of ground truth data may be done by selecting the following
top prioritized hours: period 7:00-15:00 during weekdays, 9:00-14:00 on Saturdays.
Sundays can essentially be left out, as they may not be as busy as weekdays.
[0050] The third PSS based on neighbourhoods and time periods applied to the vehicle fleet
data may provide detailed prioritized subsamples of slices, as illustrated in the
distribution of importance weights in fig. 5 for a zoom level of 14. The third PSS
may be performed for zoom levels 14 to 17; only level 14 may be discussed hereafter
as an example. The study area may be divided into quadkeys which have encoded labels,
as illustrated by fig. 6. The neighbourhoods at quadkeys 6 and 8 may have the highest
hourly importance. The neighbourhood 8, which is located around the central station
of Munich, may have the highest share of parking events. Within a duration of 6:00-18:00,
most neighbourhoods may have stable hourly importance. In neighbourhood 14, a slight
increase in importance may be observed on Saturday afternoon; this may be traced to
neighbourhood 14 including lots of shopping and dining facilities. Neighbourhoods
0, 4, 10, and 8 may be located at a periphery of the study area, hence, considered
as less important. As an example, slices within the top 50th percentile of importance
weights may be examined. It is noted that the importance weights may not be normalized.
The top 50th percentile may have prioritized 539 slices out of 3671. Such a reduction
may also reduce the number of neighbourhoods from 23 to 10.
For top 10%, 20%, 30%, or 40%, respectively, 76, 167, 276, or 398 slices may be considered.
In this manner, the third PSS may provide narrowed down slices that need to be prioritized
for quick quality assessments.
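The reported counts (for example 539 of 3671 slices for the top 50th percentile) suggest, although the description does not state it explicitly, that slices may be accumulated in descending order of importance until they cover the chosen share of the total importance. The following sketch implements this reading as an assumption; the function name and the weights are hypothetical.

```python
def top_share_slices(weights, share):
    """Select slices in descending order of weight until they cover `share`
    of the total weight (assumed reading of 'top x% most important slices')."""
    total = sum(weights.values())
    selected, covered = [], 0.0
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for slice_id, weight in ranked:
        if covered >= share * total:
            break
        selected.append(slice_id)
        covered += weight
    return selected


# Hypothetical, non-normalized importance weights (e.g. parking event counts).
weights = {"a": 50, "b": 25, "c": 15, "d": 10}
top_50 = top_share_slices(weights, 0.5)
top_60 = top_share_slices(weights, 0.6)
```

With a skewed distribution of importance weights, a small number of slices can already cover half of the total importance, which would be consistent with the strong reduction in the number of slices described above.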
[0051] The fourth PSS based on clusters of neighbourhoods and time periods may build on
the third PSS by aggregating similar neighbourhoods. The logic behind neighbourhood
clustering may be grouping neighbourhoods based on similar temporal trend of parking
dynamics (TTPD). Applying the fourth PSS on the use case of OSPI in Munich may generate
7 neighbourhood clusters, where 2 of them occurring at peripheries may have negligible
importance. So, 5 valid clusters may remain. 5 clusters in the study area may be sufficient
since neighbourhoods within central Munich are quite similar according to the BMW
vehicle fleet data. The fourth PSS may be applied on zoom level 14 as this is considered
an optimal size for modelling TTPD in 15-minute intervals. The clusters may have encoded
labels, as illustrated by fig. 7. Cluster 1 may contain most areas in Munich city
center and may be considered important in almost all week-hours between 6:00-18:00,
with lesser importance on Sundays. For the same period, Cluster 2 has the same stable
hourly distribution but with lesser magnitude in the importance weights. For cluster
6, the importance weights may be lower in the morning hours and higher in the late
afternoon and evening hours, and then fade shortly after the evening. Clusters 0 and
4 may include neighbourhoods in the periphery, where the importance weights are lower
in magnitude, but uniform during the week. The benefit of the fourth PSS may be that
instead of being limited to certain neighbourhoods like in the third PSS, similar
neighbourhoods can be grouped together.
[0052] The priority slices selected for the respective PSS may now be used as input for
quality assessment of parking prediction models. A Brier Loss Score may be used as
KPI. The focus of this use case may lie mainly on assessing the quality of various
prediction models and not on modelling improvement or development. Only the predictions
as output of the prediction models may be used for comparison of the prediction models.
Twelve prediction models may have been used as examples for testing the quality assessment.
For the calculation of the KPIs, two weighting techniques may be applied: equally
weighted slices and importance weighted slices. The following table may display the
algorithms and the KPI for the twelve prediction models, as well as an average of
equally and importance weighted KPI scores:
| Model # | Algorithm | KPI | Average KPI (equally weighted) | Average KPI (importance weighted) |
| 1 | Xgboost | 0.249 | 0.249 | 0.249 |
| 2 | Random Forest | 0.303 | 0.306 | 0.307 |
| 3 | Xgboost | 0.227 | 0.224 | 0.229 |
| 4 | Random Forest | 0.236 | 0.233 | 0.238 |
| 5 | Xgboost | 0.228 | 0.226 | 0.231 |
| 6 | Random Forest | 0.231 | 0.231 | 0.235 |
| 7 | Xgboost | 0.233 | 0.232 | 0.232 |
| 8 | Random Forest | 0.248 | 0.247 | 0.248 |
| 9 | Random | 0.332 | 0.334 | 0.335 |
| 10 | Optimistic Random | 0.273 | 0.267 | 0.273 |
| 11 | Pessimistic Random | 0.486 | 0.493 | 0.487 |
| 12 | Single Optimum Value | 0.226 | 0.224 | 0.227 |
[0053] Models 1 to 8 may use actual on-street parking related features, while 9 to 11 are
random models. Model 12 may essentially be an unrealistic random guesser that only
has a single optimum prediction value determined based on an expected parking availability
from the ground truth data. Nonetheless, model 12 may be useful as a baseline reference
for a benefits analysis of the PSS. The best randomized subsampling models in this
case may be models 3, 5, 7, and 12, whereas the worst by far may be model 11.
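Why a single constant prediction can serve as a strong baseline may be illustrated as follows. For the Brier Loss, the constant that minimizes the loss is the mean of the observed outcomes, which is a standard property of the squared error; whether model 12 was determined in exactly this way is an assumption. The outcomes below are hypothetical.

```python
def brier_constant(constant, outcomes):
    """Brier Loss of a model that always predicts the same probability."""
    return sum((constant - o) ** 2 for o in outcomes) / len(outcomes)


# Hypothetical ground truth (1 = parking spot available, 0 = not available).
outcomes = [1, 0, 0, 1, 1]

# Expected parking availability from the ground truth data.
single_optimum = sum(outcomes) / len(outcomes)

# A brute-force search over a grid of constants arrives at the same value.
grid = [i / 100 for i in range(101)]
grid_best = min(grid, key=lambda c: brier_constant(c, outcomes))
```

Such a model has no predictive skill per slice, but it sets the loss that any feature-based prediction model should at least match.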
[0054] The KPI scores may be calculated for the PSS and their importance weightings. All
feature-based models may have on average a slightly worse importance weighted KPI,
but not significantly different than the equally weighted and normal KPI. It may be
observed that the importance weights do not shift the KPI scores significantly compared
to the equally weighted scores. The KPI scores may be on average -1.06% worse compared
to importance weighting over all prediction models, while -1.07% over all PSSs. The
first PSSs may have the largest negative relative difference between the equal and
importance weighting. The third PSS, starting at zoom level 14 and 168 week-hours,
may have incurred a positive effect but as the zoom level increases, there may be a
gradual decrease in KPI scores. For the second and fourth PSS, the importance weighting
improved the KPI scores compared to the equal weighting. This may imply that, temporally,
the prediction models are assessed to perform better than the measured normal KPI,
while considering spatial importance, the KPI may punish the prediction models' scoring.
Essentially, the temporal and spatial aspects of the PSSs may create a push and pull
effect in the KPI scoring, thus, a difference between equal weighting and importance
weighting cannot be clearly distinguished. Since from multiple angles, it may have
been confirmed that the KPI scores may not significantly show a difference between
equal and importance weighting, further detailed comparison between the two weightings
is not necessary and only the importance weighting may be used hereafter.
[0055] Having calculated the KPI scores for the considered PSSs and weighting techniques,
the next step may be to validate the quality assessment. This may be done by proving
that the quality assessment using PSSs with priority slices may give better insights
about on-street parking prediction models compared to randomized subsampling of ground
truth.
[0056] The benefits analysis may be done by comparing the KPI scores of the topmost important
slices (prioritization-based subsampling) against the KPI scores for the baseline
case of non-prioritized randomized subsampling (NPRS) of ground truth data. The NPRS
may be done for the same slices as the ones from the PSSs, but the importance weight
was not considered, hence they are non-prioritized. In the case of prioritization-based
subsampling, the size of ground truth data may be reduced by sorting the importance
weights of the PSS slices and then taking a certain top fraction percentage of the
slices. For example, using the prioritization-based subsampling of ground truth data
considering top 90% most important slices of PSS setup 6, the ground truth data size
may be reduced to 3563 observations (30% decrease) out of 5152 observations. However,
if reduction of the ground truth data is done randomly, 90% of the ground truth data
may be 4637 observations. There may be two reasons for the large reduction: (1) slices
may only be generated in areas and time frames that have recorded a parking event,
hence, the ground truth data outside these slices may automatically be disregarded
as less important, in the case of the example, only 4838 observations (6% decrease)
exist for PSS setup 6; and (2) there may be a disproportionate distribution of the
ground truth data throughout the city since the observations were conducted randomly.
Based on the performed reduction, a substantial amount of the collected ground truth
data may be outside important areas.
[0057] Further prioritization-based subsampling may be performed at fractions ranging from
30% - 90% at 10% intervals as a preliminary heuristic step. For the main analysis
here the 50% top fraction may be considered as an example. The same experimental design
was set up for the NPRS. For the NPRS, at each fraction, 1000 random subsampling sets
may be created, resulting from 10 PSS setups and 100 unique random sampling trials
each. For prioritization-based subsampling and NPRS, this may be done to see a difference
in the information retained about quality as compared to calculating the KPI score
for the entire ground truth data. As a counterpart to the average sample size of the
top 50% fraction based on the different PSSs, only a 30% fraction was used for NPRS.
Top 50% importance was selected, as the variances of KPI scores from this fraction
size onwards to 90% may be relatively small.
[0058] The robustness indicator used herein may be the IQR method, as defined above. The
IQR method may be used to measure a spread of KPI variance for each prediction model
and to identify KPI scores far from the median. KPI scores that are considered as
outliers may be interpreted as subsampling strategies with an unfortunate selection
of slices; it may be an indication that a strongly biased quality assessment is present.
Outliers are not to be considered as part of decisive factors. In the case of NPRS,
60% of KPI scores across the first 8 feature-based prediction models may be worse
than the normal KPI, while this may be 69% for the prioritization-based subsampling.
Moreover, it may be observed that for the pessimistic prediction model (number 11),
the KPI scores may improve in a PSS-based quality assessment since the important slices
may include busy areas and times, suggesting some pessimism necessary for a prediction
model to perform well in such slices.
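The IQR method of outlier detection may be sketched as follows, with IQR = Q3 - Q1 and the bounds Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. The KPI scores are hypothetical, and the quartiles are computed with the default method of Python's `statistics.quantiles`, which is one of several common quartile definitions.

```python
import statistics


def iqr_outliers(scores):
    """Return the scores outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] and the bounds."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [s for s in scores if s < lower or s > upper]
    return outliers, (lower, upper)


# Hypothetical KPI scores of one prediction model over several subsamples.
kpi_scores = [0.22, 0.23, 0.23, 0.24, 0.24, 0.25, 0.25, 0.26, 0.45]
outliers, (lower, upper) = iqr_outliers(kpi_scores)
```

In this toy example only the score 0.45 lies outside the bounds; it would be interpreted as a subsample with an unfortunate selection of slices and left out of the decision.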
[0059] The benefits assessment may prove able to detect weakly designed prediction models that
only outperform other prediction models because of unfortunate subsampling selection.
It may also verify that marginal benefits comparison between several competing models
can be assessed. In summary, the present disclosure may enable assessing a "true"
quality of a prediction model while reducing the size of ground truth data. It may
also help decide between competing prediction models.
[0060] The proposed data-driven methodology in this document may show that it is possible
to smartly reduce ground truth data and still assess the "true" quality of different
prediction models using prioritization-based subsampling strategies (PSS). Important
neighbourhoods and/or time periods, called slices, may be identified based on a volume
share of parking events in the vehicle fleet data. For the use case of on-street parking
information (OSPI), the methodology was applied using the vehicle fleet data of Munich,
Germany. For OSPI, a particular strategy was created using neighbourhood clusters
based on the concept of temporal trend of parking dynamics (TTPD).
[0061] The benefits assessment of the methodology may confirm that the prioritization-based
subsampling can identify weakly designed parking prediction models. This was evaluated
based on a comparison with non-prioritized randomized subsampling (NPRS) on a 30%
fraction of the ground truth data. The NPRS may be used to quantify the chances of
an unfortunate selection of slices that do not necessarily represent the true quality.
This was accomplished by comparing the quality metric, here KPI, scores at the automatically
defined slices across 10 PSS design setups. The prioritization-based subsampling considered
the top 50% important slices as a subset of slices to assess the "true" quality of
different OSPI models. The methodology also may allow the quality managers of OSPI
service providers to quickly gain first valuable insights into a prediction model at a
lower cost with less ground truth data needed. Thus, the introduced methodology may
be useful to companies that are increasing their resources for quality assessment
of mobility-related information systems.
[0062] The aspects and features described in relation to a particular one of the previous
examples may also be combined with one or more of the further examples to replace
an identical or similar feature of that further example or to additionally introduce
the features into the further example.
[0063] Examples may further be or relate to a (computer) program including a program code
to execute one or more of the above methods when the program is executed on a computer,
processor, or other programmable hardware component. Thus, steps, operations, or processes
of different ones of the methods described above may also be executed by programmed
computers, processors, or other programmable hardware components. Examples may also
cover program storage devices, such as digital data storage media, which are machine-,
processor- or computer-readable and encode and/or contain machine-executable, processor-executable
or computer-executable programs and instructions. Program storage devices may include
or be digital storage devices, magnetic storage media such as magnetic disks and magnetic
tapes, hard disk drives, or optically readable digital data storage media, for example.
Other examples may also include computers, processors, control units, (field) programmable
logic arrays ((F)PLAs), (field) programmable gate arrays ((F)PGAs), graphics processor
units (GPU), application-specific integrated circuits (ASICs), integrated circuits
(ICs) or system-on-a-chip (SoCs) systems programmed to execute the steps of the methods
described above.
[0064] It is further understood that the disclosure of several steps, processes, operations,
or functions disclosed in the description or claims shall not be construed to imply
that these operations are necessarily dependent on the order described, unless explicitly
stated in the individual case or necessary for technical reasons. Therefore, the previous
description does not limit the execution of several steps or functions to a certain
order. Furthermore, in further examples, a single step, function, process, or operation
may include and/or be broken up into several sub-steps, -functions, -processes or
-operations.
[0065] If some aspects have been described in relation to a device or system, these aspects
should also be understood as a description of the corresponding method. For example,
a block, device or functional aspect of the device or system may correspond to a feature,
such as a method step, of the corresponding method. Accordingly, aspects described
in relation to a method shall also be understood as a description of a corresponding
block, a corresponding element, a property or a functional feature of a corresponding
device or a corresponding system.
[0066] The following claims are hereby incorporated in the detailed description, wherein
each claim may stand on its own as a separate example. It should also be noted that
although in the claims a dependent claim refers to a particular combination with one
or more other claims, other examples may also include a combination of the dependent
claim with the subject matter of any other dependent or independent claim. Such combinations
are hereby explicitly proposed, unless it is stated in the individual case that a
particular combination is not intended. Furthermore, features of a claim should also
be included for any other independent claim, even if that claim is not directly defined
as dependent on that other independent claim.