[0001] The present invention concerns a computer-implemented method for determining a reward
specific to a player. The present invention also relates to an associated computer
program product and to an associated computer readable medium. The invention further
relates to an associated interrogator system of a game.
[0002] The invention concerns the field of exchanges between a human being and a computer system,
in particular in the context of e-learning systems. More particularly, the invention relates
to so-called serious games which are computer-implemented. Such serious games aim
to facilitate the assimilation of pedagogical content using playful mechanisms. This
implies that a serious game can be construed as a technical system comprising at least
one server, configured for processing technical data.
[0003] In this context, there is a need for a method for simulating the good reactions
brought by a set of players to an item of a serious game and for increasing the
number of good reactions provided by the set.
[0004] To this end, the present description relates to a computer-implemented method for
determining a reward specific to a player, the player repeatedly playing a game, notably
a serious game, the game evaluating the reactions of the players to at least one item
provided by the game, the reactions being acquired by using an interrogator system shared
with other players, a reward being attributed to the player as a function of his reactions,
the method for determining the reward comprising at least the following steps: acquiring
the previous reactions of the player, to obtain a reaction history of the player, acquiring
the previous reactions of other players, to obtain a collective reaction history, and
determining the reward specific to the player as a function of the reaction history of
the player and of the collective reaction history.
[0005] According to particular embodiments, the method for determining a reward includes
one or more of the following features, taken in isolation or in all technically possible
combinations:
- at the determining step, an optimization criterion is used.
- the optimization criterion is one criterion or a combination of criteria chosen from
the group consisting of: maximizing the value of the probability of success of the player,
a success being defined as the expected reactions, maximizing the value of the probability
of success of the whole set of players, the set of players being formed by the player
and the other players, minimizing the number of games to be played by the player to
reach a predefined threshold of success in the game, minimizing the number of games
to be played by the set of players to reach a predetermined threshold of success
in the game, and maximizing the minimum success probability over the set of rewardable
actions, for all players.
- at the determining step, a heuristic technique is used.
- the heuristic technique is an algorithm obtained by using a reinforcement learning
technique.
- the heuristic technique is an ant colony algorithm.
- at the acquiring steps, the number of games acquired is greater than or equal to 11.
- at the determining step, a total budget allocated to the rewards of the game is taken
into account.
- at the determining step, a corrective coefficient is used, the corrective coefficient
providing more rewards to the players with a probability of success lower than or
equal to a predefined threshold.
- the method is carried out at each game.
- the interrogator system comprises a connected object.
[0006] Furthermore, the present description describes a computer-implemented method for
simulating a reaction model, notably for calibrating a method for determining a reward
specific to a player, the reaction model simulating the reactions of players repeatedly
playing a game, notably a serious game, the game evaluating the reactions of the players
to at least one item provided by the game, the reactions being acquired by using an
interrogator system shared with other players, the method for simulating comprising:
acquiring the previous reactions of the players, to obtain a collective reaction history,
selecting the items provided by the game to obtain sets of items, a set of items gathering
the items provided by the game that imply, with the same level of difficulty, an expected
reaction, and obtaining the reaction model based on the selected sets of items and the
collective reaction history.
[0007] According to particular embodiments, the method for simulating a reaction model includes
one or more of the following features, taken in isolation or in all technically possible
combinations:
- at the obtaining step, the reaction model comprises the initial reaction of the set
of players to the at least one item.
- a reward is attributed to each player as a function of his reactions and, at the
obtaining step, the reaction model is based on the rewards attributed to the players.
- the reaction model depends on the total rewards attributed to one player.
- at the obtaining step, a parameter used to obtain the reaction model is the sum of
the initial response of the set of players to the at least one item and the number
of rewards obtained by the set of players.
- at the obtaining step, a limiting constant is used, inversely proportional to action
difficulty, to set an upper bound on simulated success probabilities.
- a desired number of expected reactions is provided for each item of the game and at
the obtaining step, the reaction model is based on the desired numbers of expected
reactions.
- an individual reaction profile is provided for each player and at the obtaining step,
the reaction model is based on the individual reaction profiles.
- at the obtaining step, the reaction model depends on a mathematical function.
- at the obtaining step, the reaction model depends on an exponential function.
- at the obtaining step, the reaction model has a general form

A and B being constants.
[0008] The present description also concerns a computer-implemented method for determining
a reward specific to a player, the player repeatedly playing a game, notably a serious
game, the game evaluating the reactions of the players to at least one item provided
by the game, the reactions being acquired by using an interrogator system (10) shared
with other players, a reward being attributed to the player as a function of his reactions,
the method for determining the reward comprising at least the following steps: acquiring
the previous reactions of the player, to obtain a reaction history of the player, acquiring
the previous reactions of other players, to obtain a collective reaction history, completing
the acquired collective reaction history with a set of reactions simulated by using the
model obtained when carrying out the method as previously described, to obtain a completed
collective reaction history, and determining the reward specific to the player as a function
of the reaction history of the player and of the completed collective reaction history.
[0009] Furthermore, the description relates to a computer program product comprising instructions
for carrying out the steps of a method as previously described when said computer
program product is executed on a suitable computer device.
[0010] In addition, the description relates to a computer readable medium having encoded
thereon a computer program as previously described.
[0011] In addition, the description also refers to an interrogator system of a game, notably
a serious game, played by a set of players, the game evaluating the reactions of the
players to at least one item provided by the game, the reactions being acquired by
using the interrogator system shared with other players, a reward being attributed
to the player as a function of his reactions, the interrogator system comprising a first
acquisition sensor adapted to acquire the previous reactions of the player, to obtain
a reaction history of the player, a second acquisition sensor adapted to acquire the
previous reactions of other players, to obtain a collective reaction history, and
a controller adapted to determine the reward specific to the player as a function of
the reaction history of the player and of the collective reaction history.
[0012] According to a particular embodiment, the interrogator system comprises a plurality
of input modules specific to each player and a server connected to each input module.
[0013] Other features and advantages of the invention will be shown in the following description
of the particular embodiments of the invention, given as examples only and with reference
to the drawings which are:
- Figure 1, a schematic representation of an interrogator system, and
- Figure 2, a graph showing a reaction model of a simulated player in a set of players
using the interrogator system of figure 1 to play a game.
[0014] An interrogator system 10 is represented in figure 1.
[0015] The interrogator system 10 is an interrogator system of a game shared by a set of
players playing the game.
[0016] The set of players gathers any number of players, for instance at least 50 players,
or even at least 200 players.
[0017] The players are able to interact with the interrogator system 10, which is adapted
to emit information, such as questions or rewards, in the context of the game.
[0018] It is assumed that each player plays the game several times. In this context,
a game can be construed as a single round (a play session), to be distinguished from
the game itself, which gathers all the events that may happen to the set of players.
[0019] By "reward", it is understood any object or attribution that is desired by the player.
The rewards thus create an incentive to continue the serious game. Rewards are, for
example, points, additional virtual possibilities of development in the game or monetary
incentives.
[0020] According to the described example, the game is a serious game.
[0021] A serious game or applied game is a game designed for a primary purpose other than
pure entertainment. The "serious" adjective is generally prepended to refer to video
games used by industries like defense, education, scientific exploration, health care,
emergency management, city planning, engineering, and politics. Serious games share
aspects with simulation.
[0022] The game is adapted to evaluate the reactions of the players to at least one item
provided by the game.
[0023] By "reaction", it is understood a behavior of the player in the games. The reaction
encompasses, for example, an answer to a question, a movement to another place, a
decision to pursue a problem or not in the games. According to other examples, other
reactions are possible.
[0024] In particular, the reaction comprises the behavior of the player in response to at
least one item provided by the game.
[0025] The expression "item" designates an object, in particular a virtual object, which
provides capabilities in the game to the player.
[0026] For example, an item is an additional function attributed to the player in the game,
an additional tool or any other object which may be useful for the player. The player
may appreciate the attribution of a specific item depending on factors specific
to the player, such as his character, previously attributed items, his convictions
and/or his cognitive capabilities.
[0027] The game also attributes the rewards to the player as a function of his reactions.
[0028] As explained before, such rewards are any object or attribution that is desired by
the player. The rewards thus create an incentive to continue the serious game.
[0029] Rewards are, for example, points or additional virtual possibilities of development
in the serious game.
[0030] Other types of rewards are possible, such as bonuses or items to be used or enjoyed
in the game's universe: characters, animals, buildings, currency and spells.
[0031] According to another embodiment, the game is controlled by an independent server
which is adapted to interact with the interrogator system 10.
[0032] In the present case, it is supposed that the game is an ecological game, that is,
a serious game devoted to teaching the good reactions required to respect ecological imperatives.
[0033] The interrogator system 10 comprises a first acquisition sensor 12, a second acquisition
sensor 14 and a controller 16.
[0034] The first acquisition sensor 12 is adapted to acquire the previous reactions of a
specific player (simply named the player in what follows), to obtain a reaction history
of the player.
[0035] According to the proposed example, the first acquisition sensor 12 is a connected
object.
[0036] A connected object is an object which is able to communicate with other electronic
devices such as a computer without using any physical link.
[0037] For instance, in this example, the connected object is a switch 18 controlling light.
[0038] The switch 18 and a memory 20 are part of the first acquisition sensor 12.
[0039] The first acquisition sensor 12 is adapted to record each time the player interacts
with the switch 18.
[0040] Such first acquisition sensor 12 is, for instance, located in a private house.
[0041] The second acquisition sensor 14 is adapted to acquire the previous reactions of
players other than the player, to obtain a collective reaction history.
[0042] Similarly, the second acquisition sensor 14 gathers a switch 22 and a memory 24.
[0043] In the proposed example, the second acquisition sensor 14 corresponds to a public
location where several players have access to play.
[0044] For example, each acquisition sensor 12, 14 is configured to acquire the reactions
of at least 11 previous games for the players, so as to obtain statistically significant
averages from that acquisition.
[0045] According to other examples, the interrogator system 10 comprises three or more
acquisition sensors, for example one acquisition sensor for each player.
[0046] The controller 16 is adapted to determine the reward specific to the player as a
function of the reaction history of the player and of the collective reaction history.
[0047] The controller 16 is integrated into a computer.
[0048] In particular, the controller 16 is at least partially formed by software which is
executable by a processor and stored in a computer memory.
[0049] Alternatively or in addition, the controller 16 is integrated, at least partially,
into a physical device, such as a programmable logic circuit, such as an FPGA (which
stands for "Field Programmable Gate Array"), or as a dedicated integrated circuit,
such as an ASIC (which stands for "Application Specific Integrated Circuit").
[0050] The operation of the controller 16 will be better understood in reference to an example
of carrying out a method for determining a reward specific to the player.
[0051] The method comprises a first step of acquiring, a second step of acquiring and a
step of determining.
[0052] The first step of acquiring is carried out by the first acquisition sensor 12.
[0053] In the first step of acquiring, the previous reactions of the player are acquired.
[0054] This makes it possible to obtain the reaction history of the player.
[0055] The second step of acquiring is carried out by the second acquisition sensor 14.
[0056] In the second step of acquiring, the previous reactions of players other than the
player himself are acquired.
[0057] This makes it possible to obtain a collective reaction history.
[0058] By way of example, a number of specific implementations of the acquiring steps are
described in the following.
[0059] During the first acquiring step, the first acquisition sensor 12 measures different
reactions of a specific player. During the second acquiring step, the second acquisition
sensor 14 measures the previous reactions of other players. The reactions of the specific
player and/or of the other players comprise for example a content of an answer to
a technical question of the serious game, an evaluation of the answer to the technical
question (for example correct/incorrect or a grade), the time of response between
the receipt of the question and the receipt of the answer and/or the quality and/or
completeness of the answer. It can also be a specific behavior exhibited by the player,
in the game or in the real world as sensed by connected objects, and an evaluation
of the correctness of that behavior with respect to a predefined standard.
[0060] The first acquisition sensor 12 measures the answer or behavior by attributing a
value in a given data format to the measured answer. For example, the first acquisition
sensor 12 attributes a number of words/letters to the content of an answer to a technical
question of the serious game, a rating number or a binary number to the evaluation of
the answer to the technical question, a time period in seconds to the time of response
between the receipt of the question and the receipt of the answer and/or a rating
number to the quality and/or completeness of the answer and/or a similarly quantified
evaluation of a specific player behavior.
[0061] The second acquisition sensor 14 measures the answer by computing aggregated metrics
on previous answers by other players, for example a mean value in a given data format
to the measured answers or an average probability of success for a given behavior
model. For example, the second acquisition sensor 14 attributes a mean number of words/letters
to the content of an answer to a technical question in the serious game, a mean rating
number or a binary number to the evaluation of the answer to the technical question,
a mean time period in seconds to the time of response between the receipt of the question
and the receipt of the answer and/or a mean rating number to the quality and/or completeness
of the answer.
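By way of a purely non-limiting illustration, the following Python sketch shows one possible representation of such measured reactions and of the aggregated collective metrics; the record fields and the choice of mean-value aggregates are assumptions made for readability, not elements imposed by the method.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Dict


@dataclass
class Reaction:
    """One measured reaction of a player to an item (hypothetical record layout)."""
    player_id: str
    item_id: str
    answer_length: int       # number of words/letters in the answer
    correct: bool            # evaluation of the answer (correct/incorrect)
    response_time_s: float   # time between receipt of the question and of the answer
    quality_rating: float    # rating of the quality/completeness of the answer


def collective_aggregates(history: List[Reaction]) -> Dict[str, float]:
    """Aggregated metrics over the previous reactions of the other players."""
    return {
        "mean_answer_length": mean(r.answer_length for r in history),
        "success_rate": mean(1.0 if r.correct else 0.0 for r in history),
        "mean_response_time_s": mean(r.response_time_s for r in history),
        "mean_quality_rating": mean(r.quality_rating for r in history),
    }
```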
[0062] In the proposed example, at the acquiring steps, the number of games acquired is
greater than or equal to 11.
[0063] At the end of the acquiring steps, the reaction history of the player and the collective
reaction history are known.
[0064] The step of determining is carried out by the controller 16.
[0065] At the step of determining, the reward specific to the player is determined as a
function of the reaction history of the player and of the collective reaction history.
[0066] In the described example, at the determining step, an optimization criterion is used.
[0067] For instance, the optimization criterion consists in maximizing the value of the
probability of success of the player, a success being defined as the expected reactions.
[0068] In another example, the optimization criterion consists in maximizing the value of
the probability of success of the whole set of players, the set of players being formed
by the player and the other players.
[0069] According to another embodiment, the optimization criterion consists in minimizing
the number of games to be played by the player to reach a predefined threshold of
success in the game.
[0070] In another embodiment, the optimization criterion consists in minimizing the number
of games to be played by the set of players to reach a predetermined threshold of
success in the game.
[0071] In another example, the optimization criterion consists in maximizing the minimum
success probability for all players over the set of possible rewardable actions.
[0072] When technically possible, the optimization criterion consists in using two or more
of the above-mentioned criteria.
[0073] The optimization criterion is, for instance, used in a heuristic or meta-heuristic
technique.
[0074] In the described example, the heuristic technique is an algorithm obtained by using
a reinforcement learning technique.
[0075] In particular, the heuristic algorithm is an ant colony algorithm.
[0076] As a specific example, the controller 16 uses the following definitions in order
to determine a reward value R(a, i, t), with player a, action i and time step t.
[0077] An action success rate ASR(i) is defined as:

[0078] An action failure proportion AFP(i) is defined as:

where α is a positive real number.
[0079] A player success rate PSR(a) is defined as:

[0080] A player failure proportion PFP(a) is defined as:

where β is a positive real number.
[0081] Furthermore, a daily budget DB of rewards for all players is defined, and Ut is
defined as the unspent budget at time t.
[0082] A reward portion RP(a, i, t) is defined as:

[0083] Finally, a wealth factor WF(a, i, t) is defined as:

[0084] The controller 16 determines, in particular, the reward value R(a, i, t) according
to the following equation:

[0085] The reward value R(a, i, t) corresponds to the optimum reward value that is to be
attributed in response to an action i of a player a. The final reward value is "optimum"
in the sense that it respects the factors cited in the above definitions.
[0086] For example, the reward value respects the daily budget DB and is specific to the
player, depending on the reaction history of this player.
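The equations referred to above are not reproduced in this text. The following Python sketch therefore only illustrates one plausible reading of the definitions: the success rates ASR(i) and PSR(a) are assumed to be empirical frequencies, the failure proportions AFP(i) and PFP(a) are assumed to be the corresponding failure rates raised to the powers α and β, the reward portion RP(a, i, t) is assumed to be a fixed share of the unspent budget Ut, and the final reward is assumed to be the product of that portion and a wealth factor; none of these formulas should be taken as the actual equations of the method.

```python
def action_success_rate(history, i):
    """ASR(i): assumed empirical success frequency of action i over the history."""
    attempts = [r for r in history if r["action"] == i]
    return sum(r["success"] for r in attempts) / len(attempts) if attempts else 0.0


def action_failure_proportion(history, i, alpha=1.0):
    """AFP(i): assumed to grow with the failure rate of action i, shaped by alpha."""
    return (1.0 - action_success_rate(history, i)) ** alpha


def player_success_rate(history, a):
    """PSR(a): assumed empirical success frequency of player a over the history."""
    plays = [r for r in history if r["player"] == a]
    return sum(r["success"] for r in plays) / len(plays) if plays else 0.0


def player_failure_proportion(history, a, beta=1.0):
    """PFP(a): assumed to grow with the failure rate of player a, shaped by beta."""
    return (1.0 - player_success_rate(history, a)) ** beta


def reward(history, a, i, unspent_budget, share=0.1):
    """R(a, i, t): assumed product of a reward portion RP(a, i, t), taken as a fixed
    share of the unspent daily budget Ut, and a wealth factor WF(a, i, t) favoring
    actions and players with low success rates, so that the budget is respected."""
    rp = share * unspent_budget
    wf = action_failure_proportion(history, i) * player_failure_proportion(history, a)
    return rp * wf
```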
[0087] The computer-implemented method is, in particular, carried out at each game.
[0088] In another embodiment, the method is carried out each time a reward is afforded
to the player.
[0089] For example, if the player answers a question correctly, he receives a specific type
or number of rewards with a value corresponding to the final reward value R(a, i, t).
[0090] This method for determining therefore makes it possible to obtain dynamic rewards.
[0091] By contrast, in the documents of the prior art, in order to stimulate the will to
succeed of the players playing the serious game, predefined and static rewards are
allocated to each player. In particular, each player obtains statically predefined
rewards during the serious game, for example upon successful completion of a test
or exercise.
[0092] The attribution of dynamic rewards makes it possible to avoid the best players
being rewarded too easily, and thus no longer being stimulated, while the less good
players are not discouraged and continue playing the serious game.
[0093] Thus, the method for determining a reward specific to each player of the serious
game augments the incentive to continue the game for each individual player.
[0094] This results in an increase of the number of good reactions provided by the set.
[0095] It should be emphasized that the method for determining is compatible with different
ways of playing the game. In the current example, the method for determining is
used both in private and public contexts without any specific difficulty.
[0096] To increase this dynamic aspect of the method for determining, the method is carried
out at each game.
[0097] The method for determining can also be rendered more accurate by using at least one
of the following parameters.
[0098] According to an example, at the determining step, a total budget allocated to the
rewards of the game is taken into account.
[0099] In another example, at the determining step, a corrective coefficient is used, the
corrective coefficient providing more rewards to the players with a probability of
success lower than or equal to a predefined threshold.
[0100] This means that using such a corrective coefficient corresponds to applying a function
that dynamically provides more rewards to the players with a lower probability of success.
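A minimal sketch of such a corrective coefficient is given below; the threshold and boost values are illustrative assumptions, not values prescribed by the method.

```python
def corrected_reward(base_reward, success_probability, threshold=0.5, boost=1.5):
    """Corrective coefficient: players whose probability of success is lower than
    or equal to the predefined threshold receive a proportionally larger reward."""
    if success_probability <= threshold:
        return base_reward * boost
    return base_reward
```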
[0101] In another embodiment, the interrogator system 10 comprises a plurality of input
modules specific to each player and a server connected to each input module.
[0102] For instance, the input modules are keyboards and the server is used as a collective
memory shared by each input module. The server usually belongs to the provider
of the game.
[0103] According to a preferred embodiment, the method for determining a reward is calibrated
by using a method for simulating a reaction model which is now described.
[0104] A reaction model simulates the reactions of players repeatedly playing the game.
[0105] Such method for simulating is carried out by using the interrogator system 10 of
figure 1.
[0106] The method for simulating comprises a step of acquiring, a step of selecting and
a step of obtaining.
[0107] The step of acquiring of the method for simulating is similar to the second step
of acquiring of the method for determining. The remarks relative to the second step
of acquiring therefore apply here and are not repeated.
[0108] A collective reaction history is thus obtained.
[0109] For instance, the step of selecting is carried out by the controller 16.
[0110] At the step of selecting, items provided by the game are selected to form sets of
items.
[0111] By definition, a set of items gathers the items provided by the game that imply,
with the same level of difficulty, an expected reaction.
[0112] In other words, a specific set of items is a group of items that is expected to imply
the same reaction of a player, at a specific level of difficulty of a question
or decision context in the game.
[0113] The obtaining step is carried out by the controller 16.
[0114] At the obtaining step, the reaction model is obtained based on the selected sets
of items and the collective reaction history.
[0115] The obtaining step takes into account a plurality of parameters which are now
described, bearing in mind that, in other embodiments, only some of the described
parameters are taken into account.
[0116] In the specific example which is described, at the obtaining step, the reaction model
is based on the initial reaction of the set of players to the at least one item.
[0117] The reaction model is further based on the rewards attributed to the players.
[0118] The reaction model also depends on the total rewards attributed to one player.
[0119] Furthermore, in the described example, a parameter used to obtain the reaction model
is the sum of the initial response of the set of players to the at least one item
and the number of rewards obtained by the set of players.
[0120] A limiting constant is also used to prevent simulated success probabilities from
reaching unrealistically high values:

[0121] Where:
- δ is an adjustment constant empirically set to 0.1, and
- Di represents the intrinsic difficulty of an action i.
[0122] In addition, at the obtaining step, the reaction model is based on the desired numbers
of expected reactions, the desired number of expected reactions being provided for
each item of the game.
[0123] An individual reaction profile is also provided for each player and, at the obtaining
step, the reaction model is based on the individual reaction profiles.
[0124] This results in a reaction model depending on an exponential function.
[0125] More precisely, the reaction model has a general form

A and B being constants.
[0126] In the described example, the reaction model corresponds to the graph represented
in figure 2 and is defined as follows:

with :
- a: player;
- i: action;
- t: time step;
- Di: intrinsic difficulty of an action i;
- Pa,i: intrinsic propensity of the player a for action i;
- Ta,i,t: accumulated rewards by player a for action i from 0 to t;
- δ: boundary factor;
- γ: constant, and
- Amp: target amplitude.
[0127] The success Psuccess of the player a is a function of characteristics of the player a
and of the completed actions i. The success Psuccess corresponds to the reaction of the
player a, as simulated by the reaction model.
[0128] The success Psuccess of the player a presents a maximum value Pmax which corresponds
to 100% (corresponding to Psuccess = 1 in the graph of figure 2) minus a boundary distance
δ · Di. The boundary distance implies that the simulated success Psuccess of the player a
may never attain the value of 100%.
[0129] As shown in figure 2, the minimum value of the success Psuccess corresponds to
1 - Di, with Di being the intrinsic difficulty of an action i. For example, if a question
or problem is difficult, the intrinsic difficulty may be 0.9. The minimum success may
then be 1 - 0.9 = 0.1. In a first portion of the graph G of figure 2, the success Psuccess
rises at a constant rate according to the constant γ. In a second portion of graph G,
the success Psuccess rises logarithmically, with a threshold at Pmax.
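The exact equation of the reaction model is not reproduced in this text. The sketch below only restates, under explicit assumptions, the behaviour described around figure 2: the simulated success starts at 1 - Di, saturates exponentially toward Pmax = 1 - δ·Di, and is driven by the sum of the intrinsic propensity Pa,i and the accumulated rewards Ta,i,t, with γ controlling the growth rate and Amp scaling the contribution of the rewards; this exponential form is one possible instance of the general form with constants A and B, not the actual formula of the model.

```python
import math


def simulated_success(D_i, P_ai, T_ait, delta=0.1, gamma=1.0, amp=1.0):
    """Assumed reaction model: exponential saturation from (1 - D_i) toward
    P_max = 1 - delta * D_i, driven by propensity plus accumulated rewards."""
    p_max = 1.0 - delta * D_i              # upper bound, never reaching 100%
    p_start = 1.0 - D_i                    # minimum success for an action of difficulty D_i
    drive = gamma * (P_ai + amp * T_ait)   # offset on the x-axis of figure 2
    return p_max - (p_max - p_start) * math.exp(-drive)
```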
[0130] The shown reaction model models or simulates the behavior of a specific player a
by taking into account two elements. The first element is the natural propensity to perform
the expected behaviors (Pa,i), represented by a default offset on the x-axis. The second
element is the amount of rewards received so far for that particular action, represented
by a second offset on the x-axis which drives the variations of the success probability.
[0131] The second offset is dynamic because rewards come and go: naturally, they come
when players are successful, but they also fade away progressively through an evaporation
mechanism that represents the decreasing psychological impact rewards have on players
over time.
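A possible sketch of this evaporation mechanism, in the spirit of the ant colony algorithm mentioned above, is given below; the evaporation rate used here is an assumed parameter.

```python
def update_accumulated_rewards(T_ait, new_reward, evaporation_rate=0.05):
    """Accumulated rewards at the next time step: the previous amount evaporates by a
    factor (1 - rho), modeling the fading psychological impact of past rewards, and
    the newly earned reward (zero when the player was not successful) is then added."""
    return (1.0 - evaporation_rate) * T_ait + new_reward
```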
[0132] Such method for simulating can be used for calibrating the method for determining.
[0133] For instance, the method for simulating is used in a completing step between the
second acquiring step of the method for determining and the step of determining the
reward specific to the player.
[0134] In such an example, the method for determining further comprises a step of completing
the acquired collective reaction history with a set of reactions simulated by using
the model obtained when carrying out the method for simulating, to obtain a completed
collective reaction history.
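A minimal sketch of this completing step is given below, assuming that the reaction model is exposed as a function returning a success probability for a given player and action; the record layout and the sampling strategy are illustrative assumptions.

```python
import random


def complete_collective_history(acquired_history, reaction_model, n_simulated):
    """Append simulated reactions to the acquired collective reaction history.
    reaction_model(player, action) is assumed to return a success probability."""
    completed = list(acquired_history)
    for _ in range(n_simulated):
        template = random.choice(completed)   # reuse an observed player/action pair
        p = reaction_model(template["player"], template["action"])
        completed.append({"player": template["player"],
                          "action": template["action"],
                          "success": random.random() < p,
                          "simulated": True})
    return completed
```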
[0135] At the step of determining, the reward specific to the player is determined as a
function of the reaction history of the player and of the completed collective reaction history.
[0136] This makes it possible to obtain a more accurate determination of the reward attributed
to the player.
1. A computer-implemented method for determining a reward specific to a player, the player
repeatedly playing a game, notably a serious game, the game evaluating the reactions
of the players to at least one item provided by the game, the reactions being acquired
by using an interrogator system (10) shared with other players, a reward being attributed
to the player as a function of his reactions, the method for determining the reward
comprising at least the following steps:
- acquiring the previous reactions of the player, to obtain a reaction history of
the player,
- acquiring the previous reactions of other players, to obtain a collective reaction
history, and
- determining the reward specific to the player as a function of the reaction history
of the player and of the collective reaction history.
2. The method according to claim 1, wherein, at the determining step, an optimization
criterion is used.
3. The method according to claim 2, wherein the optimization criterion is one criterion
or a combination of criteria chosen from the group consisting of:
- maximizing the value of the probability of success of the player, a success being defined
as the expected reactions,
- maximizing the value of the probability of success of the whole set of players, the
set of players being formed by the player and the other players,
- minimizing the number of games to be played by the player to reach a predefined
threshold of success in the game,
- minimizing the number of games to be played by the set of players to reach a predetermined
threshold of success in the game, and
- maximizing the minimum success probability over the set of rewardable actions, for
all players.
4. The method according to any one of the claims 1 to 3, wherein, at the determining
step, a heuristic technique is used.
5. The method according to claim 4, wherein the heuristic technique is an algorithm obtained
by using a reinforcement learning technique.
6. The method according to claim 4 or 5, wherein the heuristic technique is an ant colony
algorithm.
7. The method according to any one of the claims 1 to 6, wherein at the acquiring steps,
the number of games acquired is greater than or equal to 11.
8. The method according to any one of the claims 1 to 7, wherein at the determining step,
a total budget allocated to the rewards of the game is taken into account.
9. The method according to any one of the claims 1 to 8, wherein at the determining step,
a corrective coefficient is used, the corrective coefficient providing more rewards
to the players with a probability of success lower than or equal to a predefined threshold.
10. The method according to any one of the claims 1 to 9, wherein the method is carried
out at each game.
11. The method according to any one of the claims 1 to 10, wherein the interrogator system
(10) comprises a connected object.
12. A computer program product comprising instructions for carrying out the steps of a
method according to any one of claims 1 to 11 when said computer program product is
executed on a suitable computer device.
13. A computer readable medium having encoded thereon a computer program according to
claim 12.
14. An interrogator system (10) of a game, notably a serious game, played by a set of
players, the game evaluating the reactions of the players to at least one item provided
by the game, the reactions being acquired by using the interrogator system (10) shared
with other players, a reward being attributed to the player as a function of his reactions,
the interrogator system (10) comprising:
- a first acquisition sensor (12) adapted to acquire the previous reactions of the
player, to obtain a reaction history of the player,
- a second acquisition sensor (14) adapted to acquire the previous reactions of other
players, to obtain a collective reaction history, and
- a controller (16) adapted to determine the reward specific to the player as a function
of the reaction history of the player and of the collective reaction history.
15. Interrogator system according to claim 14, wherein the interrogator system (10) comprises
a plurality of input modules specific to each player and a server connected to
each input module.