Activity classification - Patent 2636371

(19)

(11)

EP 2 636 371 A1

(12)	EUROPEAN PATENT APPLICATION

(43)	Date of publication:
	11.09.2013 Bulletin 2013/37

(21)	Application number: 12158835.4

(22)	Date of filing: 09.03.2012

(51)

International Patent Classification (IPC):

A61B 5/11 (2006.01)
G06K 9/00 (2006.01)
A61B 7/00 (2006.01)

(84)	Designated Contracting States:
	AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
	Designated Extension States:
	BA ME

(71)	Applicant: Sony Mobile Communications AB
	221 88 Lund (SE)

(72)	Inventors:
	Thörn, Karl Ola, 216 12 Limhamn (SE)
	Mårtensson, Linus, 246 50 Löddeköpinge (SE)
	Bengtsson, Henrik, 227 36 Lund (SE)
	Aronsson, Pär-Andersson, 214 24 Malmö (SE)
	Jonsson, Håkan, 245 65 Hjärup (SE)

(74)	Representative: Valea AB
	Lindholmspiren 5 417 56 Göteborg (SE)

(54)	Activity classification

(57) The present invention relates to a method and device for classifying an activity of an object, the method comprising: receiving a sound signal from a sensor, determining type of sound based on said sound signal, and determining said activity based on said type of sound.

Description

TECHNICAL FIELD

[0001] The present invention relates to a method and devices for classifying the activity of a user, especially using sound information.

BACKGROUND

[0002] With the rapid development of mobile terminals such as mobile phones, more and more functionality is incorporated into the terminal. One such feature is detecting motion of the terminal and thereby the motion and activity of the user.

[0003] Activity recognition, i.e. classifying how a user is moving, e.g. sitting, running, walking, riding in a car etc., is currently done mainly using accelerometer sensors and in some cases location sensors or video. Activity recognition in handsets is a problem since it may consume a lot of power and also has limited accuracy. This invention tries to solve this by using body microphones to capture the sound of vibrations transported through the user's body, in order to improve accuracy and/or reduce power consumption. It improves accuracy compared to using only an accelerometer or microphones recording external (non-body) sounds.

SUMMARY

[0004] The present invention provides a solution to the aforementioned problem by using body-attached microphones to capture the sound of vibrations transported through the user's body, in order to improve accuracy and/or reduce power consumption.

[0005] Thus, the invention relates to a method for classifying an activity of an object, the method comprising: receiving a sound signal from a sensor, determining a type of sound based on the sound signal, and determining the activity based on the type of sound. The sound data corresponds to vibrations from the object. According to one embodiment, the sound receiver is a microphone attached to a person and facing the skin of the person. The sensor further comprises a motion detector. The method further comprises comparing the sound signal with a number of sound signals stored in a memory, which includes a plurality of sound types and a plurality of attributes associated with each sound type. Each attribute comprises a predefined value and each sound type is associated with each attribute. Each sound type is associated with each attribute in accordance with Bayes' rule, such that a conditional probability of each sound type is defined for an occurrence of each attribute. The attributes may consist of one or several of: histogram features, linear predictive coding, cepstral coefficients, short-time Fourier transform, timbre, zero-crossing rate, short-time energy, root-mean-square energy, high/low feature value ratio, spectrum centroid, spectrum spread, or spectral roll-off frequency.
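
The conditional probability mentioned in paragraph [0005] can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the application: $s_i$ denotes a sound type, $a_j$ an observed attribute value, and the sum runs over all stored sound types.

```latex
P(s_i \mid a_j) \;=\; \frac{P(a_j \mid s_i)\, P(s_i)}{\sum_{k} P(a_j \mid s_k)\, P(s_k)}
```

Under this reading, the stored database supplies the likelihoods $P(a_j \mid s_i)$ and the priors $P(s_i)$, and the sound type with the largest posterior would be selected.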

[0006] The invention also relates to a device for classifying an activity of a person, the device comprising: a receiver for receiving a sound signal from a sensor, and a controller, characterised in that the controller is configured to process the sound signal and determine a type of sound based on the sound signal, and determine the activity based on the type of sound. The sound signals are received from one or several microphones attached to the person. The microphones are arranged facing the skin of the person. The device may further receive motion data from one or several motion detectors. The controller is further configured to compare the sound signal with a number of sound signals stored in a memory, which includes a plurality of sound types and a plurality of attributes associated with each sound type, each attribute comprising a predefined value and each sound type being associated with each attribute in accordance with Bayes' rule, such that a conditional probability of each sound type is defined for an occurrence of each attribute.

[0007] The invention also relates to a mobile communication terminal comprising a device as mentioned above.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] Reference is made to the attached drawings, wherein elements having the same reference number designation may represent like elements throughout.

Fig. 1 is a diagram of an exemplary arrangement in which methods and systems described herein may be implemented;

Fig. 2 is a diagram of an exemplary system in which methods and systems described herein may be implemented;

Fig. 3 is a diagram of an exemplary sensor device according to one embodiment of the invention; and

Fig. 4 is a diagram over the steps of an exemplary embodiment according to the invention.

DETAILED DESCRIPTION

[0009] The following detailed description refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.

[0010] The term "image," as used herein, may refer to a digital or an analog representation of visual information (e.g., a picture, a video, a photograph, animations, etc.)

[0011] The term "audio," as used herein, may refer to a digital or an analog representation of audio information (e.g., a recorded voice, a song, an audio book, etc.).

[0012] Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents.

[0013] The basic idea of the invention is to record sound waves internally transported through the body of a user itself. This makes it suitable to also recognize activities that do not generate distinct external sounds, e.g. walking or running. It also makes it less susceptible to ambient noise and thus provides higher accuracy.

[0014] The microphone(s) can be placed on the body of a user, e.g. using a holder. The microphones may be provided facing the body and in direct contact with the skin. The activity classification itself can be done in a sensor and then communicated to the terminal to be used in applications. Alternatively, the sound type detection may be carried out in the sensor as lower-level feature detection, the result of which is then communicated to the terminal, where the actual activity classification is done.

[0015] The audio and accelerometer data are preprocessed to extract features and then fed to the classifier, which can be an ensemble of classifiers, which then generates a classification. The specific classification method used, e.g. Bayesian, neural networks etc., is an implementation detail.
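
The preprocessing step in paragraph [0015] can be illustrated with a minimal sketch that computes three of the attributes named in the application (zero-crossing rate, short-time energy and root-mean-square energy) per frame. The function names, the frame length and the hop size are assumptions made for this example; the application does not specify them.

```python
import math

def frames(samples, frame_len, hop):
    """Split samples into overlapping frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Sum of squared sample values in the frame."""
    return sum(x * x for x in frame)

def rms_energy(frame):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(short_time_energy(frame) / len(frame))

def extract_features(samples, frame_len=4, hop=2):
    """Return one feature dictionary per frame (hypothetical layout)."""
    return [
        {
            "zcr": zero_crossing_rate(f),
            "energy": short_time_energy(f),
            "rms": rms_energy(f),
        }
        for f in frames(samples, frame_len, hop)
    ]

# Illustrative input: an alternating signal, which crosses zero at every sample.
features = extract_features([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
```

The resulting per-frame feature dictionaries would then be passed to whichever classifier is chosen; real body-conducted audio would use much longer frames than the toy values here.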

[0016] Fig. 1 is a diagram of an exemplary arrangement 100 (internal) in which methods and systems described herein may be implemented. Arrangement 100 may include a bus 110, a processor 120, a memory 130, a read only memory (ROM) 140, a storage device 150, an input device 160, an output device 170, and a communication interface 180. Bus 110 permits communication among the components of arrangement 100. Arrangement 100 may also include one or more power supplies (not shown). One skilled in the art would recognize that arrangement 100 may be configured in a number of other ways and may include other or different elements.

[0017] Processor 120 may include any type of processor or microprocessor that interprets and executes instructions. Processor 120 may also include logic that is able to decode media, such as audio files, etc., and generate output to, for example, a speaker, a display, etc. Memory 130 may include a random access memory (RAM) or another dynamic storage device that stores information and instructions for execution by processor 120. Memory 130 may also be used to store temporary variables or other intermediate information during execution of instructions by processor 120.

[0018] ROM 140 may include a conventional ROM device and/or another static storage device that stores static information and instructions for processor 120. Storage device 150 may include a flash memory (e.g., an electrically erasable programmable read only memory (EEPROM)) device for storing information and instructions.

[0019] Input device 160 may include one or more conventional mechanisms that permit a user to input information to the arrangement 100, such as a keyboard, a keypad, a directional pad, a mouse, a pen, voice recognition, a touch-screen and/or biometric mechanisms, etc. Output device 170 may include one or more conventional mechanisms that output information to the user, including a display, a printer, one or more speakers, etc. Communication interface 180 may include any transceiver-like mechanism that enables arrangement 100 to communicate with other devices and/or systems. For example, communication interface 180 may include a modem or an Ethernet interface to a LAN.

[0020] Alternatively, or additionally, communication interface 180 may include other mechanisms for communicating via a network, such as a wireless network. For example, communication interface 180 may include a radio frequency (RF) transmitter and receiver and one or more antennas for transmitting and receiving RF data.

[0021] Arrangement 100, consistent with the invention, provides a platform through which audible information and motion information may be interpreted as activity information. Arrangement 100 may also display information associated with the activity to the user of arrangement 100 in a graphical format, or provide it to a third-party system. According to an exemplary implementation, arrangement 100 may perform various processes in response to processor 120 executing sequences of instructions contained in memory 130. Such instructions may be read into memory 130 from another computer-readable medium, such as storage device 150, or from a separate device via communication interface 180. It should be understood that a computer-readable medium may include one or more memory devices or carrier waves. Execution of the sequences of instructions contained in memory 130 causes processor 120 to perform the acts that will be described hereafter. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement aspects consistent with the invention. Thus, the invention is not limited to any specific combination of hardware circuitry and software.

[0022] Fig. 2 illustrates a system 200 according to the invention. The system 200 comprises a mobile terminal 210, such as a mobile radio phone, and a number of sensors 220 attached to a user 250.

[0023] The mobile terminal 210 may comprise an arrangement according to Fig. 1 as described earlier. The sensor 220 is described in more detail in the embodiment of Fig. 3.

[0024] Fig. 3 is a diagram of an exemplary embodiment of a sensor 220. The sensor 220 comprises a housing 221, inside which a microphone 222, a motion sensor 223, a controller 224 and a transceiver 225 are arranged. A power source and other electrical portions, such as memory, may also be arranged inside the housing but are not illustrated for clarity reasons.

[0025] The housing 221 may be provided on an attachment portion 225, such as a strap or band. The attachment portion 225 allows the sensor portion to be attached to a body part of the user.

[0026] The attachment portion may comprise a VELCRO fastening band, or any other type of fastening, which in one embodiment may allow the user to attach the sensor 220 to a body part, such as the wrist, ankle, chest etc. The sensor may also be integrated in or attached to a watch, clothing, socks, gloves, etc.

[0027] The microphone 222, in one embodiment facing the skin of the user, records sound waves internally transported through the body of the user itself, which allows recognizing activities that do not generate distinct external sounds, e.g. body activities such as running or walking. It also makes it less susceptible to ambient noise and thus provides higher accuracy.

[0028] The motion sensor 223, such as an accelerometer, gyroscope etc., allows movement of the user to be detected.

[0029] In one embodiment, the sensor 220 may only record sound, i.e. it may comprise only a microphone or, in the absence of motion, use only the microphone. In one embodiment both the microphone and the motion sensor are implemented as MEMS (microelectromechanical systems) devices.

[0030] The controller 224 receives signals from the microphone 222 and motion sensor 223 and, depending on the configuration, may process the signals or transmit them to the mobile terminal. The controller 224 may include any type of processor or microprocessor that interprets and executes instructions. The controller may also include logic that is able to decode media, such as audio files, etc., and generate output to, for example, a speaker, a display, etc. The controller may also include onboard memory for storing information and instructions for execution by the controller.

[0031] The transceiver 225, which may include an antenna (not shown), may use wireless communication, including radio signals such as Bluetooth or Wi-Fi, or IR, or wired communication, mainly to transmit signals to the terminal 210 (or other devices).

[0032] With reference now to Figs. 2, 3 and 4, in operation, according to one embodiment, the microphone 222 of the sensor 220, in contact with the skin of the user 250, receives (1) sound waves, which are converted to electrical signals and provided to the controller 224. If the sensor 220 is used to classify activity, parts of arrangement 100 may be incorporated therein. The controller may store the sound signal. A memory may also store a sound database, which includes a plurality of sound types and a plurality of attributes associated with each sound type. Each attribute may have a predefined value and each sound type may be associated with each attribute in accordance with, e.g., Bayes' rule, such that a conditional probability of each sound type is defined for an occurrence of each attribute. The attributes may consist of: histogram features, linear predictive coding, cepstral coefficients, short-time Fourier transform, timbre, zero-crossing rate, short-time energy, root-mean-square energy, high/low feature value ratio, spectrum centroid, spectrum spread, spectral roll-off frequency, etc. Other determination methods, such as neural networks or comparison methods, may also be used to determine the type of sound.
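
One possible reading of the sound database in paragraph [0032] is a naive Bayes model over discretised attribute values. The sketch below is an illustration under that assumption, not the applicant's implementation: the class name, the sound type labels ("footstep", "heartbeat"), the attribute values ("low", "high") and the use of Laplace smoothing are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict

class SoundTypeModel:
    """Hypothetical sound database: counts of attribute values per sound type."""

    def __init__(self):
        self.type_counts = Counter()             # sound type -> number of examples
        self.attr_counts = defaultdict(Counter)  # sound type -> (attribute, value) counts
        self.values = defaultdict(set)           # attribute -> values seen so far

    def add_example(self, sound_type, attributes):
        """Store one labelled example, e.g. {"zcr": "low", "energy": "high"}."""
        self.type_counts[sound_type] += 1
        for name, value in attributes.items():
            self.attr_counts[sound_type][(name, value)] += 1
            self.values[name].add(value)

    def posterior(self, attributes):
        """Return P(sound type | attributes), treating attributes as independent."""
        total = sum(self.type_counts.values())
        scores = {}
        for sound_type, n in self.type_counts.items():
            p = n / total  # prior P(sound type)
            for name, value in attributes.items():
                k = len(self.values[name]) or 1
                # Laplace-smoothed likelihood P(attribute value | sound type)
                p *= (self.attr_counts[sound_type][(name, value)] + 1) / (n + k)
            scores[sound_type] = p
        norm = sum(scores.values())
        return {t: s / norm for t, s in scores.items()}

    def classify(self, attributes):
        """Return the sound type with the highest posterior probability."""
        post = self.posterior(attributes)
        return max(post, key=post.get)

# Illustrative training data (made up for the example).
model = SoundTypeModel()
for _ in range(2):
    model.add_example("footstep", {"zcr": "low", "energy": "high"})
    model.add_example("heartbeat", {"zcr": "low", "energy": "low"})

observed = {"zcr": "low", "energy": "high"}
best = model.classify(observed)
probabilities = model.posterior(observed)
```

Continuous attributes such as the spectrum centroid would first need to be binned or modelled with a density; the discretisation used here is only the simplest way to attach a conditional probability to each attribute occurrence.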

[0033] A more accurate classification may be obtained using the signal from the motion detector 223. Different motions, e.g. walking, running, dancing etc. have different movement characteristics and give

[0034] The sensor 220 may also be provided with other detectors, e.g. pulsimeter, heartbeat meter, temperature meter, etc.

[0035] When the type of sound has been determined (2), the activity classification, irrespective of where it is carried out (sensor, terminal or network), may comprise comparing the sound type data (and motion data and other relevant data) with stored data in a database, or using Bayesian or neural network methods, to classify (3) the activity. The classification may be carried out in the sensor, or the data may be provided to the mobile terminal or a network device for classification.
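
The database comparison in paragraph [0035] could, in its simplest form, be a lookup keyed on the determined sound type and a coarse motion level. The table contents, the labels and the function name below are hypothetical and serve only to show the shape of such a comparison, including the sound-only case in which no motion data is available.

```python
# Hypothetical stored data: (sound type, motion level) -> activity.
ACTIVITY_TABLE = {
    ("footstep", "slow"): "walking",
    ("footstep", "fast"): "running",
    ("none", "still"): "sitting",
}

def classify_activity(sound_type, motion_level=None, table=ACTIVITY_TABLE):
    """Look up an activity; fall back to "unknown" when no entry matches."""
    if motion_level is not None:
        return table.get((sound_type, motion_level), "unknown")
    # Sound-only case: accept the answer only if the sound type is unambiguous.
    candidates = {activity for (s, _), activity in table.items() if s == sound_type}
    return candidates.pop() if len(candidates) == 1 else "unknown"

with_motion = classify_activity("footstep", "fast")
sound_only = classify_activity("footstep")
```

In this sketch the motion data resolves an ambiguity that the sound type alone cannot, which is consistent with the application's statement that the motion detector signal may give a more accurate classification.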

[0036] In one example, the user may have two sensors, as in Fig. 2, one attached to the wrist and one to the ankle. During a walk, the motion sensors register a lower movement pace, and the microphones pick up sound, e.g. from the ankle and wrist, with lower vibration levels. If the user starts running, the vibrations, especially those picked up by the ankle microphone, will increase, as will the movement pace.
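
The wrist-and-ankle example in paragraph [0036] can be sketched as a simple threshold rule. The data layout, the field names and both threshold values are assumptions chosen for illustration; the application gives no numerical values.

```python
from dataclasses import dataclass

@dataclass
class SensorReading:
    location: str         # e.g. "wrist" or "ankle"
    vibration_rms: float  # RMS level of the body-conducted sound (arbitrary units)
    pace_hz: float        # movement pace reported by the motion sensor

def walking_or_running(readings, rms_threshold=0.5, pace_threshold=2.5):
    """Return "running" when ankle vibration and mean pace both exceed thresholds."""
    ankle = next((r for r in readings if r.location == "ankle"), None)
    if ankle is None:
        return "unknown"  # the rule relies on the ankle microphone
    mean_pace = sum(r.pace_hz for r in readings) / len(readings)
    if ankle.vibration_rms > rms_threshold and mean_pace > pace_threshold:
        return "running"
    return "walking"

# Illustrative readings (made up for the example).
walk = [SensorReading("wrist", 0.2, 1.8), SensorReading("ankle", 0.3, 1.8)]
run = [SensorReading("wrist", 0.4, 3.0), SensorReading("ankle", 0.9, 3.0)]
walk_label = walking_or_running(walk)
run_label = walking_or_running(run)
```

A fixed threshold of this kind would need calibration per user and per sensor placement; a trained classifier as described in paragraph [0035] would be the more general alternative.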

[0037] It should be noted that the word "comprising" does not exclude the presence of other elements or steps than those listed and the words "a" or "an" preceding an element do not exclude the presence of a plurality of such elements. It should further be noted that any reference signs do not limit the scope of the claims, that the invention may be implemented at least in part by means of both hardware and software, and that several "means", "units" or "devices" may be represented by the same item of hardware.

[0038] A "device" as the term is used herein, is to be broadly interpreted to include a radiotelephone having ability for receiving and processing sound and other data. The device may also be a sound recorder, global positioning system (GPS) receiver; a personal communications system (PCS) terminal that may combine a cellular radiotelephone with data processing; a personal digital assistant (PDA); a laptop; a camera (e.g., video and/or still image camera) having communication ability; and any other computation or communication device capable of transceiving, such as a personal computer, a home entertainment system, a television, etc.

[0039] The various embodiments of the present invention described herein are described in the general context of method steps or processes, which may be implemented in one embodiment by a computer program product, embodied in a computer-readable medium, including computer-executable instructions, such as program code, executed by computers in networked environments. A computer-readable medium may include removable and non-removable storage devices including, but not limited to, Read Only Memory (ROM), Random Access Memory (RAM), compact discs (CDs), digital versatile discs (DVD), etc. Generally, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps or processes.

[0040] Software and web implementations of various embodiments of the present invention can be accomplished with standard programming techniques with rule-based logic and other logic to accomplish various database searching steps or processes, correlation steps or processes, comparison steps or processes and decision steps or processes. It should be noted that the words "component" and "module," as used herein and in the following claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs. The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. The foregoing description is not intended to be exhaustive or to limit embodiments of the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of various embodiments of the present invention. The embodiments discussed herein were chosen and described in order to explain the principles and the nature of various embodiments of the present invention and its practical application to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated. The features of the embodiments described herein may be combined in all possible combinations of methods, apparatus, modules, systems, and computer program products.

[0041] Other solutions, uses, objectives, and functions within the scope of the invention as claimed in the below described patent claims should be apparent for the person skilled in the art.

Claims

1. A method for classifying an activity of a person, the method comprising:

• receiving (1) a sound signal from a sensor (222),

• determining type of sound based on said sound signal, and

• determining said activity based on said type of sound.

2. The method of claim 1, wherein said sensor (222) is a microphone facing the body of a user (250).

3. The method of claim 1 or 2, wherein said sound data corresponds to vibrations transported through the body.

4. The method according to any of the previous claims, wherein said sensor further comprises a motion detector (223).

5. The method according to any of the previous claims, further comprising comparing said sound signal with a number of sound signals stored in a memory, which includes a plurality of sound types and a plurality of attributes associated with each sound type.

6. The method according to any of the previous claims, further comprising using Bayesian rules or a neural network.

7. The method according to claim 5, wherein each attribute comprises a predefined value and each sound type is associated with each attribute.

8. The method according to claim 6, wherein each sound type is associated with each attribute in accordance with Bayes' rule, such that a conditional probability of each sound type is defined for an occurrence of each attribute.

9. The method according to any of the previous claims, wherein said attributes consist of one or several of: histogram features, linear predictive coding, cepstral coefficients, short-time Fourier transform, timbre, zero-crossing rate, short-time energy, root-mean-square energy, high/low feature value ratio, spectrum centroid, spectrum spread, or spectral roll-off frequency.

10. A device (100) for classifying an activity of a person (250), the device (100) comprising: a receiver for receiving a sound signal from a sensor (220), and a controller (120), characterised in that said sensor is arranged to receive a sound wave and output a sound signal, and the controller (120) is configured to process said sound signal and determine a type of sound based on said sound signal, and determine said activity based on said type of sound.

11. The device of claim 10, wherein said sound signals are received from one or several microphones attached to said person.

12. The device of claim 11, wherein said microphones are arranged facing skin of said person, said sound signals corresponding to vibrations transported through a body of the person.

13. The device according to any of claims 10-12, further receiving motion data from one or several motion detectors.

14. The device according to any of claims 10-13, wherein said controller is configured to compare said sound signal with a number of sound signals stored in a memory, which includes a plurality of sound types and a plurality of attributes associated with each sound type, each attribute comprising a predefined value and each sound type being associated with each attribute in accordance with Bayes' rule, such that a conditional probability of each sound type is defined for an occurrence of each attribute.

15. The device according to any of claims 10-14, wherein said attributes consist of one or several of: histogram features, linear predictive coding, cepstral coefficients, short-time Fourier transform, timbre, zero-crossing rate, short-time energy, root-mean-square energy, high/low feature value ratio, spectrum centroid, spectrum spread, or spectral roll-off frequency.

16. A mobile communication terminal comprising a device according to any of claims 10-15.
