TECHNICAL FIELD
[0001] The present invention relates to a method and devices for classifying activity of a
user, especially using sound information.
BACKGROUND
[0002] With the rapid development of mobile terminals such as mobile phones, more and
more functionality is incorporated into the terminal. One such feature is detecting
motion of the terminal and thereby the motion and activity of the user.
[0003] Activity recognition, i.e. classifying how a user is moving, e.g. sitting, running,
walking, riding in a car etc., is currently done mainly using accelerometer sensors and
in some cases location sensors or video. Activity recognition in handsets is a problem
since it may consume a lot of power and also has limited accuracy. This invention
addresses the problem by using body microphones to capture the sound of vibrations transported
through the user's body to improve accuracy and/or reduce power consumption. It improves
accuracy compared to using only an accelerometer or microphones recording external (non-body)
sounds.
SUMMARY
[0004] The present invention provides a solution to the aforementioned problem by using
body-attached microphones to capture the sound of vibrations transported through the user's
body to improve accuracy and/or reduce power consumption.
[0005] Thus, the invention relates to a method for classifying an activity of an object,
the method comprising: receiving a sound signal from a sensor, determining a type of
sound based on the sound signal, and determining the activity based on the type of
sound. The sound signal corresponds to vibrations from the object. According to one
embodiment, the sensor is a microphone attached to a person and facing the skin
of the person. The sensor further comprises a motion detector. The method further
comprises comparing the sound signal with a number of sound signals stored in a memory,
which includes a plurality of sound types and a plurality of attributes associated
with each sound type. Each attribute comprises a predefined value, and each sound type
is associated with each attribute in accordance with Bayes' rule, such that a conditional
probability of each sound type is defined for an occurrence of each attribute. The
attributes may consist of one or several of: histogram features, linear predictive coding,
cepstral coefficients, short-time Fourier transform, timbre, zero-crossing rate, short-time
energy, root-mean-square energy, high/low feature value ratio, spectrum centroid, spectrum
spread, or spectral roll-off frequency.
[0006] The invention also relates to a device for classifying an activity of a person, the
device comprising: a receiver for receiving a sound signal from a sensor, and a controller,
characterised in that the controller is configured to process the sound signal and determine
a type of sound based on the sound signal, and determine the activity based on the type of sound.
The sound signals are received from one or several microphones attached to the person.
The microphones are arranged facing the skin of the person. The device may further receive
motion data from one or several motion detectors. The controller is further configured
to compare the sound signal with a number of sound signals stored in a memory, which
includes a plurality of sound types and a plurality of attributes associated with
each sound type. Each attribute comprises a predefined value, and each sound type
is associated with each attribute in accordance with Bayes' rule, such that a conditional
probability of each sound type is defined for an occurrence of each attribute.
[0007] The invention also relates to a mobile communication terminal comprising a device
as mentioned above.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] Reference is made to the attached drawings, wherein elements having the same reference
number designation may represent like elements throughout.
Fig. 1 is a diagram of an exemplary arrangement in which methods and systems described
herein may be implemented;
Fig. 2 is a diagram of an exemplary system in which methods and systems described
herein may be implemented;
Fig. 3 is a diagram of an exemplary sensor device according to one embodiment of the
invention; and
Fig. 4 is a diagram of the steps of an exemplary embodiment according to the invention.
DETAILED DESCRIPTION
[0009] The following detailed description refers to the accompanying drawings. The same
reference numbers in different drawings may identify the same or similar elements.
[0010] The term "image," as used herein, may refer to a digital or an analog representation
of visual information (e.g., a picture, a video, a photograph, animations, etc.).
[0011] The term "audio," as used herein, may refer to a digital or an analog
representation of audio information (e.g., a recorded voice, a song, an audio book,
etc.).
[0012] Also, the following detailed description does not limit the invention. Instead, the
scope of the invention is defined by the appended claims and equivalents.
[0013] The basic idea of the invention is to record sound waves internally transported through
the body of a user itself. This makes it suitable to also recognize activities that
do not generate distinct external sounds, e.g. walking or running. It also makes it
less susceptible to ambient noise and thus provides higher accuracy.
[0014] The microphone(s) can be placed on the body of a user, e.g. using a holder. The microphones
may be provided facing the body and in direct contact with the skin. The activity
classification itself can be done in a sensor and then communicated to the terminal
to be used in applications. Alternatively, the sound type detection may be carried out
as lower-level feature detection, the result of which is then communicated to the terminal,
where the actual activity classification is done.
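The second split described above, where the sensor reports a detected sound type and the terminal performs the activity classification, can be sketched as a small message exchange. This is an illustrative sketch only: the JSON format, the field names (`sensor`, `sound_type`, `confidence`) and both function names are assumptions made for the example and are not part of the disclosure.

```python
import json

# Hypothetical field names; the disclosure does not define a message format.
REQUIRED_FIELDS = {"sensor", "sound_type", "confidence"}


def encode_sound_report(sensor_id, sound_type, confidence):
    """Sensor side: pack a detected sound type into a small JSON message."""
    return json.dumps(
        {
            "sensor": sensor_id,
            "sound_type": sound_type,
            "confidence": round(confidence, 3),
        },
        sort_keys=True,
    )


def decode_sound_report(message):
    """Terminal side: unpack a report and check that it is complete."""
    report = json.loads(message)
    missing = REQUIRED_FIELDS - report.keys()
    if missing:
        raise ValueError(f"incomplete sound report, missing: {sorted(missing)}")
    return report


message = encode_sound_report("ankle", "footstep_impact", 0.9612)
report = decode_sound_report(message)
print(report["sensor"], report["sound_type"], report["confidence"])
```

In such a split the sensor would transmit only a short label per detection instead of the recorded audio, which is one way the link between sensor and terminal could be kept lightweight.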
[0015] The audio and accelerometer data are preprocessed to extract features and
then fed to the classifier, which can be an assembly of classifiers, which then generates
a classification. The specific classification method used, e.g. Bayesian methods, neural
networks etc., is an implementation detail.
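The feature extraction step can be illustrated with a few of the attributes named in this description (zero-crossing rate, short-time energy, root-mean-square energy, spectrum centroid). The following is a minimal sketch, assuming the audio has already been split into short frames of at least two floating-point samples; the function names and the choice of a direct DFT are assumptions made for the example, not the implementation of the invention.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean of the squared samples in one frame."""
    return sum(x * x for x in frame) / len(frame)


def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(short_time_energy(frame))


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz, computed with a direct DFT."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        magnitudes.append(abs(coeff))
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    weighted = sum(k * sample_rate / n * m for k, m in enumerate(magnitudes))
    return weighted / total


def extract_features(frame, sample_rate):
    """Collect the per-frame attributes into one feature dictionary."""
    return {
        "zero_crossing_rate": zero_crossing_rate(frame),
        "short_time_energy": short_time_energy(frame),
        "rms_energy": rms_energy(frame),
        "spectral_centroid": spectral_centroid(frame, sample_rate),
    }
```

The resulting dictionary is the kind of feature vector that could then be passed to whichever classifier is chosen. The direct DFT is quadratic in the frame length and is used here only to keep the sketch free of external libraries.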
[0016] Fig. 1 is a diagram of an exemplary arrangement 100 (internal) in which methods and
systems described herein may be implemented. Arrangement 100 may include a bus 110,
a processor 120, a memory 130, a read only memory (ROM) 140, a storage device 150,
an input device 160, an output device 170, and a communication interface 180. Bus
110 permits communication among the components of arrangement 100. Arrangement 100
may also include one or more power supplies (not shown). One skilled in the art would
recognize that arrangement 100 may be configured in a number of other ways and may
include other or different elements.
[0017] Processor 120 may include any type of processor or microprocessor that interprets
and executes instructions. Processor 120 may also include logic that is able to decode
media, such as audio files, etc., and generate output to, for example, a
speaker, a display, etc. Memory 130 may include a random access memory (RAM) or another
dynamic storage device that stores information and instructions for execution by processor
120. Memory 130 may also be used to store temporary variables or other intermediate
information during execution of instructions by processor 120.
[0018] ROM 140 may include a conventional ROM device and/or another static storage device
that stores static information and instructions for processor 120. Storage device
150 may include a flash memory (e.g., an electrically erasable programmable read only
memory (EEPROM)) device for storing information and instructions.
[0019] Input device 160 may include one or more conventional mechanisms that permit a user
to input information to the arrangement 100, such as a keyboard, a keypad, a directional
pad, a mouse, a pen, voice recognition, a touch-screen and/or biometric mechanisms,
etc. Output device 170 may include one or more conventional mechanisms that output
information to the user, including a display, a printer, one or more speakers, etc.
Communication interface 180 may include any transceiver-like mechanism that enables
arrangement 100 to communicate with other devices and/or systems. For example, communication
interface 180 may include a modem or an Ethernet interface to a LAN.
[0020] Alternatively, or additionally, communication interface 180 may include other mechanisms
for communicating via a network, such as a wireless network. For example, communication
interface 180 may include a radio frequency (RF) transmitter and receiver and one or more
antennas for transmitting and receiving RF data.
[0021] Arrangement 100, consistent with the invention, provides a platform through which
audible information and motion information may be interpreted as activity information.
Arrangement 100 may also display information associated with the activity to the user
of arrangement 100 in a graphical format or provide it to a third-party system. According
to an exemplary implementation, arrangement 100 may perform various processes in response
to processor 120 executing sequences of instructions contained in memory 130. Such
instructions may be read into memory 130 from another computer-readable medium, such
as storage device 150, or from a separate device via communication interface 180.
It should be understood that a computer-readable medium may include one or more memory
devices or carrier waves. Execution of the sequences of instructions contained in
memory 130 causes processor 120 to perform the acts that will be described hereafter.
In alternative embodiments, hard-wired circuitry may be used in place of or in combination
with software instructions to implement aspects consistent with the invention. Thus,
the invention is not limited to any specific combination of hardware circuitry and
software.
[0022] Fig. 2 illustrates a system 200 according to the invention. The system 200 comprises
a mobile terminal 210, such as a mobile radio phone, and a number of sensors 220 attached
to a user 250.
[0023] The mobile terminal 210 may comprise an arrangement according to Fig. 1 as described
earlier. The sensor 220 is described in more detail in the embodiment of Fig. 3.
[0024] Fig. 3 is a diagram of an exemplary embodiment of a sensor 220. The sensor 220 comprises
a housing 221, inside which a microphone 222, a motion sensor 223, a controller 224
and a transceiver 225 are arranged. A power source and other electrical portions,
such as memory, may also be arranged inside the housing but are not illustrated for
clarity reasons.
[0025] The housing 221 may be provided on an attachment portion 225, such as a strap or band.
The attachment portion 225 allows the sensor portion to be attached to a body part
of the user.
[0026] The attachment portion may comprise a VELCRO fastening band, or any other type of fastening,
which in one embodiment may allow the user to attach the sensor 220 to a body part,
such as a wrist, ankle, chest etc. The sensor may also be integrated in or attached to
a watch, clothing, socks, gloves, etc.
[0027] The microphone 222, in one embodiment facing the skin of the user, records sound
waves internally transported through the body of the user itself, which allows recognizing
activities that do not generate distinct external sounds, e.g. body activities such
as running or walking. It also makes it less susceptible to ambient noise and thus
provides higher accuracy.
[0028] The motion sensor 223, such as an accelerometer, gyroscope etc., allows detecting movement
of the user.
[0029] In one embodiment, the sensor 220 may only record sound, i.e. it may comprise only a
microphone or, in the absence of motion, use only the microphone. In one embodiment both the
microphone and the motion sensor are MEMS (microelectromechanical systems) devices.
[0030] The controller 224 receives signals from the microphone 222 and motion sensor 223 and,
depending on the configuration, may process the signals or transmit them to the mobile
terminal. The controller 224 may include any type of processor or microprocessor that
interprets and executes instructions. The controller may also include logic that is
able to decode media, such as audio files, etc., and generate output to,
for example, a speaker, a display, etc. The controller may also include onboard memory
for storing information and instructions for execution by the controller.
[0031] The transceiver 225, which may include an antenna (not shown), may use wireless communication,
such as radio signals (e.g. Bluetooth or Wi-Fi) or IR, or wired communication, mainly
to transmit signals to the terminal 210 (or other devices).
[0032] With reference now to Figs. 2, 3 and 4, in operation, according to one embodiment,
the microphone 222 of the sensor 220, in contact with the skin of the user 250, receives (1)
sound waves, which are converted to electrical signals and provided to the controller 224.
If the sensor 220 is used to classify activity, parts of arrangement 100 may be incorporated
therein. The controller may store the sound signal. A memory may also store a sound
database, which includes a plurality of sound types and a plurality of attributes
associated with each sound type. Each attribute may have a predefined value and each
sound type may be associated with each attribute in accordance with, e.g., Bayes'
rule, such that a conditional probability of each sound type is defined for an occurrence
of each attribute. The attributes may include: histogram features, linear predictive
coding, cepstral coefficients, short-time Fourier transform, timbre, zero-crossing
rate, short-time energy, root-mean-square energy, high/low feature value ratio, spectrum
centroid, spectrum spread, spectral roll-off frequency, etc. Other determination methods,
such as neural networks or comparison methods, may also be used to determine
the type of sound.
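The Bayes-based association between sound types and attributes can be sketched as a naive Bayes lookup over a small sound database. This is only an illustration under stated assumptions: attributes are treated as discrete, conditionally independent observations, and the sound type names, attribute names and all probability values below are invented for the example rather than taken from the disclosure.

```python
# Hypothetical sound database: prior probability of each sound type and the
# probability of observing each attribute given that sound type.
PRIORS = {"footstep_impact": 0.5, "heartbeat": 0.5}
LIKELIHOODS = {
    "footstep_impact": {"high_energy": 0.8, "high_zcr": 0.6},
    "heartbeat": {"high_energy": 0.2, "high_zcr": 0.1},
}


def classify_sound(observed, priors=PRIORS, likelihoods=LIKELIHOODS):
    """Return (most probable sound type, posterior per sound type).

    Applies Bayes' rule with the naive assumption that the observed
    attributes are independent given the sound type.
    """
    scores = {}
    for sound_type, prior in priors.items():
        score = prior
        for attribute in observed:
            score *= likelihoods[sound_type][attribute]
        scores[sound_type] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("observed attributes are impossible for every sound type")
    posteriors = {t: s / total for t, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


best, posteriors = classify_sound(["high_energy", "high_zcr"])
print(best, round(posteriors[best], 2))
```

With the example values, the unnormalised scores are 0.5 x 0.8 x 0.6 = 0.24 for the footstep type and 0.5 x 0.2 x 0.1 = 0.01 for the heartbeat type, so the normalised posterior of the footstep type is 0.24 / 0.25 = 0.96.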
[0033] A more accurate classification may be obtained using the signal from the motion detector
223. Different motions, e.g. walking, running, dancing etc. have different movement
characteristics and give
[0034] The sensor 220 may also be provided with other detectors, e.g. pulsimeter, heartbeat
meter, temperature meter, etc.
[0035] When the type of sound is determined (2), the activity classification, irrespective
of where (sensor, terminal, network) it is carried out, may comprise comparing the
sound type data (and motion data and other relevant data) with stored data in a database,
or using Bayesian or neural network methods to classify (3) the activity. The classification
may be carried out in the sensor, or the data may be provided to the mobile terminal or
a network device for classification.
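The database comparison variant of step (3) can be sketched as a nearest-match lookup that combines the determined sound type with motion data. This is a minimal sketch under assumptions: the stored entries, the use of a single movement pace value in Hz as the motion data, and all names and numbers are hypothetical and chosen only to make the example concrete.

```python
# Hypothetical reference database: each entry pairs a sound type and a
# typical movement pace (steps per second) with an activity label.
STORED_ACTIVITIES = [
    {"sound_type": "footstep_impact", "pace_hz": 1.8, "activity": "walking"},
    {"sound_type": "footstep_impact", "pace_hz": 2.8, "activity": "running"},
    {"sound_type": "low_vibration", "pace_hz": 0.0, "activity": "sitting"},
]


def classify_activity(sound_type, pace_hz, stored=STORED_ACTIVITIES):
    """Return the activity of the stored entry that best matches the input.

    Entries are first filtered by sound type; among the remaining entries
    the one with the closest movement pace is selected.
    """
    candidates = [e for e in stored if e["sound_type"] == sound_type]
    if not candidates:
        return "unknown"
    closest = min(candidates, key=lambda e: abs(e["pace_hz"] - pace_hz))
    return closest["activity"]


print(classify_activity("footstep_impact", 2.6))
print(classify_activity("footstep_impact", 1.5))
```

The same function could run in the sensor, the terminal or a network device, since it only needs the sound type, the motion data and access to the stored entries.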
[0036] In one example, the user may have two sensors, as in Fig. 2, one attached to the wrist
and one to the ankle. During a walk, the motion sensors register a relatively low movement
pace, and the microphones pick up sound, e.g. from the ankle and wrist, with relatively
low vibration levels. If the user starts running, the vibrations, especially those picked
up by the ankle microphone, will increase, as will the movement pace.
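The walk-to-run transition in this example can be sketched as a simple rule that checks whether both the ankle vibration level and the movement pace have risen. This is an illustrative sketch only: the measurement names (`ankle_rms`, `pace_hz`), the sample values and the 1.3 threshold factor are assumptions, not values from the disclosure.

```python
def running_started(previous, current, min_increase=1.3):
    """Return True when ankle vibration and movement pace have both risen.

    `previous` and `current` are dictionaries holding the RMS level of the
    ankle microphone signal and the movement pace from the motion sensor.
    `min_increase` is a hypothetical factor by which both must grow.
    """
    vibration_up = current["ankle_rms"] >= min_increase * previous["ankle_rms"]
    pace_up = current["pace_hz"] >= min_increase * previous["pace_hz"]
    return vibration_up and pace_up


walking = {"ankle_rms": 0.2, "pace_hz": 1.8}
faster = {"ankle_rms": 0.5, "pace_hz": 2.8}
print(running_started(walking, faster))
```

Requiring both measurements to rise is one possible design choice; it means that a louder impact alone, or a faster arm swing alone, would not be reported as the start of a run.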
[0037] It should be noted that the word "comprising" does not exclude the presence of other
elements or steps than those listed and the words "a" or "an" preceding an element
do not exclude the presence of a plurality of such elements. It should further be
noted that any reference signs do not limit the scope of the claims, that the invention
may be implemented at least in part by means of both hardware and software, and that
several "means", "units" or "devices" may be represented by the same item of hardware.
[0038] A "device," as the term is used herein, is to be broadly interpreted to include a
radiotelephone having ability for receiving and processing sound and other data. The
device may also be a sound recorder; a global positioning system (GPS) receiver; a personal
communications system (PCS) terminal that may combine a cellular radiotelephone with
data processing; a personal digital assistant (PDA); a laptop; a camera (e.g., video
and/or still image camera) having communication ability; and any other computation
or communication device capable of transceiving, such as a personal computer, a home
entertainment system, a television, etc.
[0039] The various embodiments of the present invention described herein are described in
the general context of method steps or processes, which may be implemented in one
embodiment by a computer program product, embodied in a computer-readable medium,
including computer-executable instructions, such as program code, executed by computers
in networked environments. A computer-readable medium may include removable and non-removable
storage devices including, but not limited to, Read Only Memory (ROM), Random Access
Memory (RAM), compact discs (CDs), digital versatile discs (DVD), etc. Generally,
program modules may include routines, programs, objects, components, data structures,
etc. that perform particular tasks or implement particular abstract data types. Computer-executable
instructions, associated data structures, and program modules represent examples of
program code for executing steps of the methods disclosed herein. The particular sequence
of such executable instructions or associated data structures represents examples
of corresponding acts for implementing the functions described in such steps or processes.
[0040] Software and web implementations of various embodiments of the present invention
can be accomplished with standard programming techniques with rule-based logic and
other logic to accomplish various database searching steps or processes, correlation
steps or processes, comparison steps or processes and decision steps or processes.
It should be noted that the words "component" and "module," as used herein and in
the following claims, are intended to encompass implementations using one or more lines
of software code, and/or hardware implementations, and/or equipment for receiving
manual inputs. The foregoing description of embodiments of the present invention
has been presented for purposes of illustration and description. The foregoing description
is not intended to be exhaustive or to limit embodiments of the present invention
to the precise form disclosed, and modifications and variations are possible in light
of the above teachings or may be acquired from practice of various embodiments of
the present invention. The embodiments discussed herein were chosen and described
in order to explain the principles and the nature of various embodiments of the present
invention and its practical application to enable one skilled in the art to utilize
the present invention in various embodiments and with various modifications as are
suited to the particular use contemplated. The features of the embodiments described
herein may be combined in all possible combinations of methods, apparatus, modules,
systems, and computer program products.
[0041] Other solutions, uses, objectives, and functions within the scope of the invention
as claimed in the below described patent claims should be apparent to the person
skilled in the art.
1. A method for classifying an activity of a person, the method comprising:
• receiving (1) a sound signal from a sensor (222),
• determining type of sound based on said sound signal, and
• determining said activity based on said type of sound.
2. The method of claim 1, wherein said sensor (222) is a microphone facing the body of a user (250).
3. The method of claim 1 or 2, wherein said sound signal corresponds to vibrations transported
through the body.
4. The method according to any of the previous claims, wherein said sensor further comprises
a motion detector (223).
5. The method according to any of the previous claims, further comprising comparing said
sound signal with a number of sound signals stored in a memory, which includes a plurality
of sound types and a plurality of attributes associated with each sound type.
6. The method according to any of the previous claims, further comprising using Bayes'
rule or a neural network.
7. The method according to claim 5, wherein each attribute comprises a predefined value
and each sound type is associated with each attribute.
8. The method according to claim 6, wherein each sound type is associated with each attribute
in accordance with Bayes' rule, such that a conditional probability of each sound
type is defined for an occurrence of each attribute.
9. The method according to any of the previous claims, wherein said attributes consist of
one or several of: histogram features, linear predictive coding, cepstral coefficients,
short-time Fourier transform, timbre, zero-crossing rate, short-time energy, root-mean-square
energy, high/low feature value ratio, spectrum centroid, spectrum spread, or spectral
roll-off frequency.
10. A device (100) for classifying an activity of a person (250), the device (100) comprising:
a receiver for receiving a sound signal from a sensor (220), and a controller (120),
characterised in that said sensor is arranged to receive a sound wave and output a sound signal and the
controller (120) is configured to process said sound signal and determine type of
sound based on said sound signal, and determine said activity based on said type of
sound.
11. The device of claim 10, wherein said sound signals are received from one or several
microphones attached to said person.
12. The device of claim 11, wherein said microphones are arranged facing the skin of said
person, such that said sound signals correspond to vibrations transported through a body of the person.
13. The device according to any of claims 10-12, further receiving motion data from one
or several motion detectors.
14. The device according to any of claims 10-13, wherein said controller is configured
to compare said sound signal with a number of sound signals stored in a memory, which
includes a plurality of sound types and a plurality of attributes associated with
each sound type, wherein each attribute comprises a predefined value and each sound type
is associated with each attribute
in accordance with Bayes' rule, such that a conditional probability of each sound
type is defined for an occurrence of each attribute.
15. The device according to any of claims 10-14, wherein said attributes consist of one
or several of: histogram features, linear predictive coding, cepstral coefficients,
short-time Fourier transform, timbre, zero-crossing rate, short-time energy, root-mean-square
energy, high/low feature value ratio, spectrum centroid, spectrum spread, or spectral
roll-off frequency.
16. A mobile communication terminal comprising a device according to any of claims 10-15.