USEFIL D2.4: State of the Art of Technologies


Information and Communication Technologies
Collaborative Project
Unobtrusive Smart Environments For Independent Living (USEFIL)
Grant Agreement Number: 288532

D2.4 STATE OF THE ART OF TECHNOLOGIES

Report Identifier: D2.4
Work-package, Task: WP2
Status Version: Approved V3.0
Distribution Security*: PU
Deliverable Type**: R
Editor(s): Warwick: Sylvester Vijay Rozario, Christopher James, James Amor
Contributors: Fraunhofer, PCL, VTT, NCSR, AUTH
Abstract: This deliverable presents an analysis of the state of the art of technologies, systems and methods related to USEFIL: sensors and networking, monitoring of physiological parameters and patients' behaviour, electronic health records, and so on.
Keywords: state of the art, technologies

* PU Public; PP Restricted to other programme participants (including the Commission Services); RE Restricted to a group specified by the consortium (including the Commission Services); CO Confidential, only for members of the consortium (including the Commission Services).
** R Report; P Prototype; D Demonstrator; O Other.

Disclaimer

Neither the USEFIL Consortium nor any of its members, their officers, employees or agents shall be responsible or liable in negligence or otherwise howsoever in respect of any inaccuracy or omission herein. Without derogating from the generality of the foregoing, neither the USEFIL Consortium nor any of its members, their officers, employees or agents shall be liable for any direct or indirect or consequential loss or damage caused by or arising from any information, advice, inaccuracy or omission herein.

Copyright notice

Copyright 2011-2014 by the USEFIL Consortium. This document contains information that is protected by copyright. All Rights Reserved. No part of this work covered by copyright hereon may be reproduced or used in any form or by any means without the permission of the copyright holders.

Document Revision History

Date       | Version | Author/Editor/Contributor  | Description
10.02.2012 | 0.1     | Jarkko Leino (VTT)         | VTT contributions only.
14.02.2012 | 0.2     | Jarkko Leino (VTT)         | Additions and polishing to section 6.2.
23.03.2012 | 0.9     | Sylvester Rozario (Warwick)| First draft.
05.04.2012 | 0.99    | James Amor (Warwick)       | Editing, introduction and conclusion.
06.04.2012 | 1.0     | Artem Katasonov (VTT)      | Some cleaning; upgraded to Draft status for review.
11.04.2012 | 1.1     | Artem Katasonov (VTT)      | Section 5.2 edited plus small fixes.
18.04.2012 | 1.11    | James Amor (Warwick)       | Merged different versions of the document.
04.05.2012 | 1.11    | James Amor (Warwick)       | Edited section 2.1 and cleaned up section 2.2.
06.05.2012 | 1.2     | Detlef Ruschin (Fraunhofer)| Integration of chapter 7 into chapter 4 and edits in the sections on emotion theories; the resulting chapter renamed to Monitoring Humans. Addition of anonymisation and pseudonymisation to the data security/privacy section.
09.05.2012 | 1.22    | James Amor (Warwick)       | General tidy. Minor additions to sections 2.1 and 10.
10.05.2012 | 1.3     | Artem Katasonov (VTT)      | Small fixes throughout. Added mention of ASUS Xtion. Added section on open interconnected systems.
10.05.2012 | 2.0     | Artem Katasonov (VTT)      | Review process finalized. Upgraded to Proposal status.
10.05.2012 | 3.0     | Artem Katasonov (VTT)      | Upgraded to Approved status.

Table of Contents

Abbreviations
1. Introduction
2. Mobile Monitoring System
   2.1 Mobile Sensor Platforms
       2.1.1 HealthPAL
       2.1.2 Libri
       2.1.3 Body Media SenseWear Armband
       2.1.4 Basis
       2.1.5 Sensory Watch
       2.1.6 Wriskwatch
       2.1.7 Metawatch
       2.1.8 Jawbone: UP
       2.1.9 Affectiva Emotion Sensor
       2.1.10 eZ430 Chronos Development Kit
       2.1.11 im Watch
       2.1.12 Motoactv
       2.1.13 Wimm
       2.1.14 SmartMonitor
       2.1.15 Pebble
       2.1.16 Device Comparison Table
   2.2 Smartphone Platforms
   2.3 Communication Technologies
       2.3.1 Wireless Communication Protocols
       2.3.2 Wired Connection Protocol
3. Slate Tablet-PC
   3.1 Technical Capabilities
   3.2 Social Networks Catering to the Ageing Population
4. Video Monitoring System
   4.1 Cameras
   4.2 Processing Units
5. Monitoring Humans
   5.1 Monitoring Vital Signs
   5.2 Monitoring Emotions
       5.2.1 The Theory of Basic Emotions
       5.2.2 The Theory of Affect Dimensions
       5.2.3 The Theory of Communicative Emotion Displays
       5.2.4 Automated Emotion Recognition
   5.3 Monitoring Behaviour
       5.3.1 Human Behaviour Theories
       5.3.2 Techniques for Human Behaviour Recognition
6. Home Gateway Systems
   6.1 Web-TV
       6.1.1 Introduction
       6.1.2 Philips NetTV
   6.2 Smart Home Platforms
       6.2.1 Home Automation Platforms
       6.2.2 Trend Highlights from Recent Technology Shows
       6.2.3 Research Related to Smart Homes and Ambient Intelligence
   6.3 Open Interconnected Systems
       6.3.1 Java Enterprise Edition
       6.3.2 UPnP / DLNA
       6.3.3 Web Services
       6.3.4 Multi-Agent Systems
       6.3.5 Tuple/Triple Spaces
7. Decision Support Systems
   7.1 Introduction
   7.2 Data to Information to Outcome: Lifecycle Management
       7.2.1 Feature Selection
       7.2.2 Sensor Fusion
       7.2.3 Data Mining Methods
       7.2.4 Knowledge Representation
       7.2.5 Inference Mechanisms
   7.3 Semantic Inference and Personalization Technologies
       7.3.1 Ontologies: User & Domain Modelling
       7.3.2 Reasoning through Ontologies
       7.3.3 Linked Open Data
8. Data Security
   8.1 Authentication
   8.2 Authorisation
   8.3 Secure Data Storage
   8.4 Secure Communication
   8.5 Privacy
9. Other Technologies Worth Watching
   9.1 Interactive Displays
   9.2 Holoflector
10. Summary
   10.1 Mobile Monitoring Systems
       10.1.1 Mobile Sensor Platforms
       10.1.2 Smart-Phone Platform
       10.1.3 Communication Technologies
   10.2 Slate Tablet-PC
       10.2.1 Tablet-PCs for Social Interaction
   10.3 Video Monitoring Units
   10.4 Home Gateway Systems
   10.5 Decision Support Systems
   10.6 Monitoring Humans
       10.6.1 Photoplethysmography
       10.6.2 Emotion Recognition
       10.6.3 Behaviour Monitoring
   10.7 Data Security

List of Tables

Table 1: Comparison of different mobile sensor platforms
Table 2: Comparison of key smartphone OS features
Table 3: Examples of cameras that can be used for behaviour monitoring
Table 4: Examples of small processing units
Table 5: Summary of research projects attempting to automate video coding
Table 6: Summary of studies where physiological signals were used for emotion recognition
Table 7: Technical capabilities of the Philips NetTV
Table 8: Smart home research projects
Table 9: Ontologies for sensors and observations semantic representation

List of Figures

Figure 1: HealthPAL, mobile health monitoring device
Figure 2: Libri, personal health gateway
Figure 3: The Libri system architecture
Figure 4: Body Media Armband
Figure 5: The Basis wrist-watch-based mobile sensor platform
Figure 6: Sensory wristwatch, research project at Fraunhofer IZM
Figure 7: Metawatch, wearable sensor development platform
Figure 8: Jawbone UP
Figure 9: Affectiva Q Sensor
Figure 10: The im Watch smartwatch platform
Figure 11: The Motoactv wrist watch device
Figure 12: The Wimm device mounted onto a watch strap
Figure 13: The Pebble smartwatch
Figure 17: Bettie, a display with integrated PC and social networking software for elderly
Figure 18: Eldy, a PC program to run a limited number of applications tailored to elderly users
Figure 19: Memo, a tablet for elderly and their caregiver network, shown with the most complex home screen layout available (left) and placed in a living room (right)
Figure 20: Angela, another tablet for elderly caretakers
Figure 21: The Connect tablet on a sideboard at home (left) and a larger view of its home screen (right)
Figure 19: Examples of cameras
Figure 15: Kinect and ASUS Xtion PRO LIVE
Figure 16: Examples of small processing units
Figure 22: The human skin (Välisuo et al., 2010)
Figure 23: Using a smartphone (source: Välisuo et al., 2010)
Figure 24: Using a web camera (adapted by Poh et al., 2010)
Figure 25: Emotion in face detection mode (left) and in tracking mode (right)
Figure 26: The mesh used for tracking in Affetivo (left) and the measured time course of happiness while viewing an advertising spot (right)
Figure 27: Person watching a TV spot observed by a camera (left), his emotions measured with SHORE in real time (middle) and the recorded time course of his emotions (right)
Figure 28: Facial expression tracked by the TUM emotion monitoring system in a car (left) and measured time course of happiness (below)
Figure 29: The relationship between performance and arousal (Eysenck & Eysenck, 1985, p. 199)
Figure 30: The Maslow hierarchy of needs
Figure 31: NetTV portal interface
Figure 32: Overview of the NetTV device and services
Figure 33: ThereGate
Figure 34: GreenWave Reality gateway, switches and display
Figure 35: HomeSeer automation gateways
Figure 36: 3rd-party automation devices
Figure 37: System interfaces
Figure 38: Example interfaces
Figure 39: EBTS Home system web interface
Figure 40: The general model of CDSS
Figure 41: CoCon conceptual model
Figure 42: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
Figure 43: USEFIL data processing

(Note: figure numbering follows the source document, which lists some figures out of sequence.)

Abbreviations

AC    Affective Computing
CDSS  Clinical Decision Support System
DECT  Digital Enhanced Cordless Telecommunications (European communication standard for cordless phones)
DSL   Digital Subscriber Line
DSS   Decision Support System
HCI   Human-Computer Interaction
HTTP  Hypertext Transfer Protocol
HVAC  Heating, Ventilation and Air Conditioning (system in a house)
ISM   Industrial, Scientific, Medical (radio-frequency band)
JSON  JavaScript Object Notation
M2M   Machine-to-Machine
MAS   Multi-Agent System
OS    Operating System
OSGi  Open Services Gateway initiative (framework)
OWL   Web Ontology Language
RDF   Resource Description Framework
REST  Representational State Transfer
UI    User Interface
UPnP  Universal Plug and Play
USB   Universal Serial Bus
WS4D  Web Services for Devices

1. Introduction

In developing and implementing the USEFIL concept, a wide range of technologies will be used. Wearable sensor units will form the basis of on-body data capture, and these may be enhanced through the use of a smart-phone; tablet-PCs will be used to facilitate social interaction and the use of social networks; web-TVs will be used as a single home gateway for the system; video monitoring units will be used to facilitate unobtrusive bio-signal acquisition and emotion detection; and a central communication infrastructure will provide the communication pathways within the system.

This report presents an analysis of the state of the art of systems and methods for monitoring physiological and behavioural signals, and of other technologies that are likely to be utilised in the USEFIL system. The report is split into several chapters whose order roughly follows the data flow within the envisioned USEFIL system: from sensors and end-user devices, to processing software, to the central home gateway, to the decision support system on a remote server. Chapter 2 examines mobile technology, including wearable sensor units, smart-phone platforms and communication technologies. Chapter 3 examines slate tablet-PCs and social networks for enabling social interaction for the elderly. Chapter 4 overviews video monitoring systems in the context of USEFIL, while Chapter 5 discusses the state of the art in monitoring human beings (mostly, but not only, through video). Three specific areas are presented: monitoring vital signs, emotion recognition, and monitoring behaviour. Chapter 6 presents in-home gateway technologies, including web-TV, smart-home systems, and approaches for building open interconnected systems. Chapter 7 presents a discussion of decision support systems and their key components. Chapter 8 presents a discussion of data security and how it will be applied in USEFIL. Chapter 9 briefly examines a few more interesting technologies that do not otherwise fit into the above sections. Finally, Chapter 10 presents a summary of this report and some preliminary analysis of which technology platforms would be suitable for USEFIL.

2. Mobile Monitoring System

2.1 Mobile Sensor Platforms

2.1.1 HealthPAL

HealthPAL is a personal health gateway that can connect to compatible medical devices using either a Bluetooth or a wired connection. Through the HealthPAL, data from these devices can be collected and transferred, via M2M, to an electronic health record (EHR) such as Microsoft HealthVault or Google Health. The device offers no on-board sensors and acts purely as a gateway system, but it can operate without the need for a smart-phone or other data-transmitting device. Currently there are compatible devices to measure blood pressure, blood oxygenation, blood glucose level and weight. Further device compatibility is announced on the product website, along with current EHR compatibility.

Figure 1: HealthPAL, mobile health monitoring device

2.1.2 Libri

Figure 2: Libri, personal health gateway

The Libri system is composed of the Libri device connected over the mobile network to a cloud-based architecture. The Libri is a personal health gateway: a small, simple and wearable cellular device. A single button connects the user to the Libri servers, where voice, activity and sensor information are routed to a caregiver.

The Libri connects to personal sensors over a Bluetooth or Bluetooth LE link. The information collected is tracked over time to allow caregivers to make intelligent, timely and informed decisions. Strict privacy and access rules are implemented to safeguard sensitive information.

Figure 3: The Libri system architecture

2.1.3 Body Media SenseWear Armband

Figure 4: Body Media Armband

The SenseWear Armband is a wearable multi-sensor data-logging device. The armband is capable of recording activity, using a 2-axis accelerometer, galvanic skin response (GSR), skin temperature and ambient temperature. Powered by a single AAA battery, the armband has a battery life of approximately 14 days under normal use; however, its on-board memory is only sufficient for about 10 days of normal use.

2.1.4 Basis

Figure 5: The Basis wrist-watch-based mobile sensor platform

The Basis is a wristwatch capable of tracking heart rate, movement (with a 3-axis accelerometer), GSR, and skin and ambient temperatures. Data are aggregated and various health and lifestyle indicators are extracted; these indicators are uploaded to a web portal for the user to access. The Basis appears to be a closed system and does not make raw data available for use. The device is reported as close to market but was not yet available as of 4 May 2012.

2.1.5 Sensory Watch

The sensor wristband, engineered at Fraunhofer IZM, is suited to the long-term monitoring of various important body functions of older patients and also of athletes. It resembles a plastic wristwatch, but instead of a clock dial it is equipped with an illuminated electroluminescent display that can indicate, for example, the current body temperature at any time of day. It also detects skin moisture, which may be a sign of dehydration in the patient or athlete. For a person with a pacemaker, the wristband may also signal potential danger by indicating the strength of electric or electromagnetic fields in close proximity. A number of other applications are conceivable: if needed, a diverse array of sensors can be integrated into the polytronic platform.

Figure 6: Sensory wristwatch, research project at Fraunhofer IZM

2.1.6 Wriskwatch

The Wriskwatch is a monitoring device geared specifically to the detection of cardiac emergencies. If an emergency is detected, the Wriskwatch can alert the relevant authorities to the emergency and to the geographical location of the user. The watch can also connect with an Automated External Defibrillator (AED). The Wriskwatch primarily tracks heart rate, but also includes GPS, fall detection and a panic button. The Wriskwatch is a closed system and there appears to be no way to use the device as a data monitoring or logging system.

2.1.7 Metawatch

Metawatch is a wearable development system that enables connected-watch applications to be developed: the user interface of a smartphone or other compatible device can be extended to the watch. The Metawatch platforms use Bluetooth to provide this connection. For sensing purposes, the Metawatch includes a 3-axis accelerometer and a light sensor. Its screen, however, is of a reflective type and can be difficult to read unless under a direct, bright light source. The Metawatch is an open development platform and can be configured and adapted to suit an intended purpose.

Figure 7: Metawatch, wearable sensor development platform

2.1.8 Jawbone: UP

The UP is a motion-sensing wristband with an inbuilt accelerometer. The UP is marketed as an exercise-tracking and lifestyle gadget and is intended to aid the user in maintaining a healthy lifestyle. The product interfaces with a web portal that provides the user with access to their data and some social network functionality.

The UP requires a smart-phone to synchronise data, which is done via a 3.5 mm jack. The device is also sweat-proof and water-resistant. Data from the UP are uploaded to a web system for the user to access; as such, the UP is a closed system.

Figure 8: Jawbone UP

2.1.9 Affectiva Emotion Sensor

The Affectiva Q Sensor is a wearable, wireless biosensor designed to measure emotional arousal. The sensor monitors GSR, which rises during states such as excitement, attention or anxiety and falls during states such as boredom or relaxation, as well as temperature and activity (via a 3-axis accelerometer). The sensor uses a Bluetooth link to connect to a PC for the download of data. From the literature and the manufacturer's website it appears that the Q Sensor requires specific software on a PC to enable the downloading of data, which is achieved with a USB cable. The Q Sensor itself is a closed system, but the data that it provides are easily accessible.

Figure 9: Affectiva Q Sensor

2.1.10 eZ430 Chronos Development Kit

The eZ430-Chronos is a highly integrated, wearable wireless development system based on the CC430, packaged in a sports watch. It may be used as a reference platform for watch systems, as a personal display for personal area networks, or as a wireless sensor node for remote data collection. It is based on the CC430F6137, which combines an MSP430 microcontroller core with an integrated sub-1 GHz radio transceiver on a single chip.

2.1.12 Motoactv

The device is tethered through Bluetooth to a smart-phone and uses the phone's data connection to upload data to the Motoactv portal. Users can use the portal to view their own data and that of their friends.

Figure 11: The Motoactv wrist watch device

2.1.13 Wimm

The Wimm is a small Android-based development platform that can either be paired to a smart-phone or used as an independent device. It offers both WiFi and Bluetooth connectivity options and has an on-board accelerometer and magnetometer. The device is primarily aimed at the development of connected-watch systems and can utilise the data connection of a paired smart-phone. The WiFi connection, however, allows the Wimm to access a data connection independently of a tethered phone.

Figure 12: The Wimm device mounted onto a watch strap

2.1.14 SmartMonitor

The SmartMonitor is an epileptic-episode detection and notification device in the form of a wrist watch. The device is built on top of the Metawatch platform and must therefore be paired to a smart-phone in order to be used. The inbuilt accelerometer is used to detect the characteristic movements associated with tonic-clonic seizures. Once a seizure is detected, the watch uses the phone's facilities to alert the user's nominated contacts that a seizure is occurring.

2.1.15 Pebble

The Pebble is a smartwatch that uses an e-paper display rather than a conventional LCD or touchscreen. As with other smartwatches, it needs to be paired with a smartphone in order to use the phone's connectivity features. The Pebble has an on-board accelerometer and Bluetooth capability. Because it uses an e-paper screen rather than a touch screen, the Pebble relies on buttons for input, unlike the Wimm and the im Watch. The Pebble is not (as of 09.05.2012) available on the market, but is expected to be by the end of the year. An SDK will be made available so that developers can create custom apps that integrate with the watch.

Figure 13: The Pebble smartwatch
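Several of the wrist-worn platforms above (e.g. the SmartMonitor's seizure detection) ultimately reduce raw accelerometer samples to a movement measure that is compared against thresholds. The plain-Java sketch below illustrates that general pattern over a window of samples; the threshold, window size and synthetic test data are our own illustrative assumptions, not any vendor's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative threshold-based movement detector of the kind used by
 * wrist-worn accelerometer devices. All constants are assumptions for
 * demonstration only, not any vendor's algorithm.
 */
public class MovementDetector {

    private static final double GRAVITY = 9.81;   // m/s^2
    private static final double THRESHOLD = 3.0;  // deviation from 1 g, m/s^2 (assumed)
    private static final int WINDOW = 50;         // samples per analysis window (assumed)
    private static final int MIN_EVENTS = 20;     // crossings counted as "vigorous movement"

    /** Returns true if the window contains sustained vigorous movement. */
    static boolean vigorousMovement(List<double[]> window) {
        int crossings = 0;
        for (double[] s : window) {
            // Magnitude of the acceleration vector, compared against rest (1 g).
            double magnitude = Math.sqrt(s[0] * s[0] + s[1] * s[1] + s[2] * s[2]);
            if (Math.abs(magnitude - GRAVITY) > THRESHOLD) {
                crossings++;
            }
        }
        return crossings >= MIN_EVENTS;
    }

    public static void main(String[] args) {
        // Synthetic "shaking" data: alternating strong and weak accelerations.
        List<double[]> window = new ArrayList<>();
        for (int i = 0; i < WINDOW; i++) {
            double a = (i % 2 == 0) ? 18.0 : 2.0; // m/s^2
            window.add(new double[] { a, 0.0, 0.0 });
        }
        System.out.println("Vigorous movement detected: " + vigorousMovement(window));
    }
}
```

A real seizure detector would additionally look at the frequency and duration of the movement, but the window-plus-threshold structure is the common core.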

2.1.16 Device Comparison Table

The following table presents a comparison of the devices discussed in section 2.1. Where data are omitted from the table, they are not available; this applies particularly to battery life. It should be noted that the battery life of the smartwatch platforms (Chronos, Metawatch, im Watch, Wimm and Pebble) will depend heavily on the operational load placed on the device.

Table 1: Comparison of different mobile sensor platforms.

Device               | Communication                              | Sensors                                                                      | Battery Life                       | Open System
HealthPAL            | Bluetooth, wired, Machine-to-Machine (M2M) | Oximetry, blood pressure, weight and glucose level, all via external sensors | -                                  | No, but connects to some EHRs
Libri                | Bluetooth, GSM                             | Voice and connections to external sensors                                    | -                                  | No
Body Media SenseWear | Bluetooth, USB                             | Accelerometer (2-axis), temperature, galvanic skin response (GSR)            | 14 days                            | No
Basis                | USB                                        | Accelerometer (3-axis), temperature (ambient and skin), GSR                  | -                                  | No
Sensory Watch        | -                                          | Temperature, skin moisture                                                   | -                                  | -
Wriskwatch           | Bluetooth                                  | GPS, fall detection, panic button and heart rate                             | -                                  | No
Metawatch            | Bluetooth                                  | Accelerometer (3-axis) and light sensor                                      | -                                  | Yes; custom APIs available
Jawbone: UP          | Wired                                      | Accelerometer (3-axis)                                                       | < 10 days                          | No
Affectiva Q Sensor   | Bluetooth, USB                             | Accelerometer (3-axis), temperature and GSR                                  | 24 hrs logging; 5 hrs streaming    | No, but data can be freely retrieved from device
Chronos              | RF                                         | Accelerometer (3-axis), temperature and external unit for heart rate         | 2 days (estimated, heavy usage)    | Yes; custom APIs available
im Watch             | Bluetooth                                  | Accelerometer (3-axis) and compass                                           | 3 hrs speakerphone; 24 hrs standby | Yes; Android (Java) based, can run apps
Motoactv             | WiFi, Bluetooth, wired                     | Accelerometer (3-axis), GPS and external heart-rate unit                     | -                                  | No
Wimm                 | WiFi, Bluetooth, wired                     | Accelerometer (3-axis)                                                       | -                                  | Yes; Android (Java) based, can run apps
SmartMonitor         | Bluetooth                                  | Accelerometer (3-axis)                                                       | -                                  | No
Pebble               | Bluetooth                                  | Accelerometer (type not specified)                                           | 7+ days                            | Yes; SDK will be made available

2.2 Smartphone Platforms

Smartphones are mobile phones that combine a phone with many other features in a single platform. Features commonly included in smartphones are PDA functionality, cameras, media players and GPS navigation. Communication technologies such as WiFi and Bluetooth are available on most smartphone platforms, as is a limited set of sensor devices: accelerometers are present on a number of smartphones, as are GPS units, and some devices also incorporate a magnetometer to act as a compass.

It is standard for most smartphone operating system providers to allow third parties to develop mobile applications (apps) that run on the smartphone platform. The use of apps extends the smartphone into the mobile computing space and allows a wide array of functionality to be developed.

Smartphones typically use a large touch screen as the input device, sometimes augmented with physical buttons and/or a physical keyboard (although the latter is becoming rare). The touchscreen allows for fully customised soft input and lets an app developer use the screen space as they wish. This allows a large degree of customisation of user interaction, something that is expected to be important if a smartphone is to be used in the USEFIL project.

There are several different operating systems under which smartphones operate, such as Android, iOS, Windows Phone, Blackberry OS and Symbian. At a basic level these operating systems all serve to offer the same functionality: to enable the functions of whichever phone they are installed on. The choice of operating system can, however, impact app development, principally in the way multi-tasking is handled. In particular, if an app needs to run constantly in the background, it requires a platform that supports proper multi-tasking. The choice of OS can also affect development cost. Table 2 shows a comparison of the three major smartphone OS platforms.

Table 2: Comparison of key smartphone OS features.

Operating System | Open Source | Multitasking | Languages                             | Development Tool Cost
Android          | Yes         | Yes          | Java, with portions possible in C/C++ | Free
iOS              | No          | Very limited | Objective-C                           | Installing on a device requires a fee for a developer signing key
Windows Phone    | No          | Limited      | C#, Visual Basic                      | Free

From the point of view of the USEFIL project, the open-source nature of the operating system is also important, as the decision has been made to use open-source solutions. Currently, Android is the only major mobile operating system that is open source. The Android OS is also the most open in terms of multi-tasking and API exposure: multi-tasking is fully supported and the whole suite of phone features is exposed via APIs. In addition, among the three major systems above, Android is the only one where it is possible to develop background processes that access the phone's sensors such as GPS, accelerometer, etc. In both iOS and Windows Phone, only foreground applications can access sensors.
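As an illustration of the multi-tasking point above: on Android, a background Service can keep receiving sensor data while other applications are in the foreground. The following is a minimal sketch (class name and log tag are our own; lifecycle, permissions and error handling are simplified):

```java
import android.app.Service;
import android.content.Intent;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.os.IBinder;
import android.util.Log;

/** Minimal background service that logs accelerometer readings. */
public class AccelerometerService extends Service implements SensorEventListener {

    private SensorManager sensorManager;

    @Override
    public void onCreate() {
        super.onCreate();
        sensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);
        Sensor accel = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
        // Register for updates; the service keeps receiving events in the background.
        sensorManager.registerListener(this, accel, SensorManager.SENSOR_DELAY_NORMAL);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        // event.values holds acceleration on x, y, z in m/s^2.
        Log.d("UsefilSensor", String.format("x=%.2f y=%.2f z=%.2f",
                event.values[0], event.values[1], event.values[2]));
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { /* not needed here */ }

    @Override
    public void onDestroy() {
        sensorManager.unregisterListener(this);
        super.onDestroy();
    }

    @Override
    public IBinder onBind(Intent intent) {
        return null; // started service, no binding
    }
}
```

The equivalent continuous background access is not possible on iOS or Windows Phone, which is the practical consequence of the restriction noted above.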

2.3 Communication Technologies

2.3.1 Wireless Communication Protocols

2.3.1.1 Personal Area Networks

2.3.1.1.1 Bluetooth

Bluetooth has dominated recent activity in short-range wireless connectivity (Bluetooth standard: IEEE 802.15.1). Bluetooth provides wireless transmission of data over short ranges (typically less than 10 metres, although there are longer-range versions). It is often used for connection to health monitoring devices, although the major expansion has been in hands-free connections to mobile phones. It is the Bluetooth Medical Device Profile (announced in November 2007) that is optimised for the secure transfer of medical data. Data rates of around 1 Mbit/s are sufficient for most applications. Typically, connections are between a computing device (e.g. home health station, mobile phone, telemedicine device, computer or PDA) and one or more Bluetooth-enabled devices (e.g. medical, health and fitness sensors such as heart rate, blood pressure, glucose and weight monitors, and oximeters). The transmission of data takes place quickly and seamlessly, such that the user need not be involved. For the elderly, wireless applications allow a more mobile monitoring system and less confinement to a room or a bed, albeit still within the home. For emergency applications, the reduction of cables may reduce complications.

A Bluetooth profile defines how different applications use Bluetooth wireless technology to set up a connection and exchange data. The Medical Devices Working Group of the Bluetooth Special Interest Group developed a profile to ensure that devices used in medical, health and fitness applications can transfer data between devices in a secure and well-defined way via Bluetooth wireless technology. (http://bluetooth.com/Bluetooth/SIG/)

2.3.1.1.2 OBEX

OBEX (an abbreviation of OBject EXchange, also termed IrOBEX) is a communications protocol that facilitates the exchange of binary objects between devices. It is maintained by the Infrared Data Association, but has also been adopted by the Bluetooth Special Interest Group and the SyncML wing of the Open Mobile Alliance (OMA). OBEX is also used over RS232, USB and WAP. OBEX is the foundation for many higher-layer "profiles", e.g. the Bluetooth SIG Generic Object Exchange Profile:

- Object Push Profile (phone-to-phone transfers)
- File Transfer Profile (phone-to-PC transfers)
- Synchronization Profile
- Basic Imaging Profile
- Basic Printing Profile
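To make the device-to-gateway links described above concrete, the following minimal Android sketch opens a classic Bluetooth serial connection (SPP over RFCOMM) to a sensor and reads raw bytes. The device address is a placeholder, and real medical transfers (e.g. via the Medical Device Profile mentioned above) add structure on top of this; the sketch also assumes the app already holds the Bluetooth permissions.

```java
import android.bluetooth.BluetoothAdapter;
import android.bluetooth.BluetoothDevice;
import android.bluetooth.BluetoothSocket;
import java.io.InputStream;
import java.util.UUID;

/** Minimal sketch: read raw bytes from a Bluetooth sensor over SPP/RFCOMM. */
public class BluetoothSensorReader {

    // Well-known UUID of the Serial Port Profile.
    private static final UUID SPP_UUID =
            UUID.fromString("00001101-0000-1000-8000-00805F9B34FB");

    public static void readFrom(String deviceAddress) throws Exception {
        BluetoothAdapter adapter = BluetoothAdapter.getDefaultAdapter();
        // deviceAddress is a placeholder MAC such as "00:11:22:33:44:55".
        BluetoothDevice device = adapter.getRemoteDevice(deviceAddress);
        BluetoothSocket socket = device.createRfcommSocketToServiceRecord(SPP_UUID);

        adapter.cancelDiscovery(); // discovery slows down a connection attempt
        socket.connect();

        InputStream in = socket.getInputStream();
        byte[] buffer = new byte[256];
        int n;
        while ((n = in.read(buffer)) > 0) {
            // Parse sensor frames here; the format depends on the device.
            System.out.println("Received " + n + " bytes");
        }
        socket.close();
    }
}
```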

2.3.1.1.3 ZigBee

ZigBee (standard: IEEE 802.15.4) is a low-cost, low-power wireless networking standard. It also supports the building of mesh structures, offering the prospect of reliable, wide-range coverage across buildings. The low cost allows the technology to be widely deployed in wireless control and monitoring applications, whilst the low power usage allows longer life with smaller batteries.

The ZigBee Alliance is an association of companies working towards an open, global standard. The goal of the Alliance is to provide the consumer with flexibility, mobility and ease of use by building wireless intelligence and capabilities into everyday devices. It is expected that ZigBee technology will be embedded in a wide range of products and applications across consumer, commercial, industrial and government markets worldwide.

ZigBee operates in the industrial, scientific and medical (ISM) radio bands (868 MHz in Europe, 915 MHz in countries such as the USA and Australia, and 2.4 GHz in most jurisdictions worldwide). The technology is intended to be simpler and cheaper than other Wireless Personal Area Networks such as Bluetooth. This is achieved by pursuing communication protocols that require lower data rates and low power consumption. ZigBee's current focus is to define a general-purpose, inexpensive, self-organizing mesh network that can be used for industrial control, embedded sensing, medical data collection, smoke and intruder warning, building automation, home automation, etc. The resulting network uses very small amounts of power, so individual devices might run for a year or two on the originally installed battery. (http://www.zigbee.org/en/index.asp)

2.3.1.1.4 Z-Wave

An alternative to ZigBee is Z-Wave, which is widespread especially in the U.S. Z-Wave is a proprietary wireless communications protocol designed for home automation, specifically to remotely control applications in residential and light commercial environments. The technology uses a low-power RF radio embedded or retrofitted into home electronics devices and systems, such as lighting, home access control, entertainment systems and household appliances. Z-Wave was developed by a Danish startup called Zen-Sys, which was acquired by Sigma Designs in 2008. Z-Wave is currently supported by over 160 manufacturers (forming the Z-Wave Alliance) worldwide and appears in a broad range of consumer products in the U.S., Europe and Asia. The standard itself is not open and is available only to Sigma Designs' customers under a non-disclosure agreement. All these products share the Z-Wave transceiver chip, which is supplied by Sigma Designs and Mitsumi.

The Z-Wave wireless protocol is optimized for reliable, low-latency communication of small data packets. Z-Wave operates in the sub-gigahertz frequency range, around 900 MHz. The range of Z-Wave is approximately 30 metres assuming "open air" conditions, with reduced range indoors depending on building materials, etc.

2.3.1.1.5 Bluetooth Low Energy (Wibree)

Bluetooth Low Energy is a development from Nokia. It was adapted from the Bluetooth standard to provide lower power usage and price whilst minimizing the differences from standard Bluetooth. The results of the Nokia programme were published in 2004 under the name Bluetooth Low End Extension, and further development with partners led to a public release in October 2006 under the brand name Wibree.
After negotiations with Bluetooth SIG members, an agreement was reached in June 2007 to include Wibree in a future Bluetooth specification as an ultra-low-power Bluetooth technology (ULP Bluetooth: IEEE 802.15.6). Wibree was designed to work side-by-side with and complement Bluetooth. It operates in the 2.4 GHz ISM band with a physical-layer bit rate of 1 Mbit/s. Main applications include devices such as wrist watches, wireless keyboards, toys and sports sensors where low power consumption is a key design requirement.

It was not designed to replace Bluetooth, but rather to complement the technology in supported devices. Enabled devices will be smaller and more energy-efficient than their Bluetooth counterparts. This is especially important in devices such as wristwatches, where Bluetooth models may be too large and heavy to be comfortable; replacing Bluetooth will make such devices closer in dimensions and weight to current standard wristwatches.

2.3.1.1.6 ANT

ANT is a proprietary wireless sensor network technology featuring a wireless communications protocol stack that enables semiconductor radios operating in the 2.4 GHz ISM band to communicate by establishing standard rules for co-existence, data representation, signalling, authentication and error detection. ANT is characterized by low computational overhead and high efficiency, resulting in low power consumption by the radios supporting the protocol.

ANT has been targeted at the sports sector, particularly fitness and cycling performance monitoring. The transceivers are embedded in equipment such as heart-rate belts, watches, cycle power and cadence meters, and distance and speed monitors, to track a user's performance. ANT's developers claim that the technology's low overhead, low power, interference-free characteristics and operation in the 2.4 GHz ISM band also suit applications in the health, home automation and industrial sectors.

2.3.1.1.7 WASP - 802.11 Wi-Fi to ANT+ Bridge

WASP is a standalone unit providing a bridge for ANT+ devices to communicate wirelessly through Wi-Fi networks to other devices or over the Internet. Integrating an NPE WiFi-IT! module, an 8-channel ANT+ receiver, power management circuitry and a rechargeable Li-Ion battery, WASP provides a data gateway for monitoring, recording and analysing ANT+ data remotely. WASP receives data from connected ANT+ devices and translates the data into Wi-Fi packets, making it available to any Wi-Fi connected device. For example, ANT+ home scales, pulse-oximeter monitors and blood glucose monitors are all able to use this bridge module to communicate their data to central monitoring stations via a Wi-Fi network.

WASP is also usable as a bridge between multiple ANT+ nodes in distributed ANT+ network topologies. Since ANT+ is a personal area network, it has a typical range of approximately two metres. If the ANT+ network is used in a mesh or hub-and-spoke topology, WASP can join networks together that would normally not be able to communicate with each other because of range limitations. The WASP Application Programmer's Interface is an open API that is used by developers to integrate WASP into ANT+ applications.

2.3.1.1.8 Ultra Wideband

An alternative or complementary short-range technology is Ultra Wideband (UWB: IEEE 802.15.3/4). UWB is very low power and transmits across a wide frequency range, and can achieve very high data rates. Ofcom, who are responsible for the civil use of radio spectrum in the UK, consulted on the deployment of UWB in 2005, because UWB results in transmissions spread across large parts of the spectrum used by others. In August 2007, following an EC decision to allow the use of spectrum for UWB technology, Ofcom announced the necessary amendments to the regulations, allowing UWB technology to be deployed. Ofcom report that UWB solutions now have the advanced technical characteristics necessary, for example, to allow for the co-location of multiple devices in a small area, which is a requirement of the Short Range Device, consumer electronics, and retail and logistics industries.
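The WASP unit described in section 2.3.1.1.7 essentially implements a gateway pattern: frames arriving from a personal-area-network sensor are forwarded as datagrams over the Wi-Fi network to a monitoring station. WASP's actual API is proprietary and not reproduced here; the sketch below only illustrates that general pattern, and the sensor stub, frame format, address and port are all hypothetical.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

/**
 * Hypothetical sketch of a sensor-to-Wi-Fi bridge in the style of WASP:
 * frames received from a PAN sensor are forwarded as UDP datagrams to a
 * monitoring station on the local network. None of this is the real WASP API.
 */
public class SensorWifiBridge {

    /** Stub standing in for an ANT+-style receiver; returns one raw frame. */
    static byte[] receiveSensorFrame() {
        return new byte[] { 0x01, 0x2C }; // e.g. a heart-rate reading in a made-up format
    }

    public static void main(String[] args) throws Exception {
        InetAddress station = InetAddress.getByName("192.168.1.50"); // placeholder address
        int port = 9000;                                             // placeholder port
        try (DatagramSocket socket = new DatagramSocket()) {
            while (true) {
                byte[] frame = receiveSensorFrame();
                socket.send(new DatagramPacket(frame, frame.length, station, port));
                Thread.sleep(1000); // forward roughly one frame per second
            }
        }
    }
}
```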

2.3.1.2 Local Area Networks

2.3.1.2.1 WLAN/WiFi

The IEEE 802.11 series of WLAN standards provides for in-building coverage. Use in the UK covers the 802.11 a, b, g and n variants, and the technology is widely implemented. The roll-out of WiFi was largely a result of the wide availability of the Intel chipset in laptops. The WiFi Alliance promotes the technology and interoperability (see http://www.wi-fi.org). Typical coverage is tens of metres, but in-building hotspots can leak beyond the confines of the building or home, where security can be a concern but is manageable. Hotspots have been set up in hotels, airports and restaurants, but it may be that WiFi has now passed its peak, because it does not generate an operator profit model, largely owing to the problems of supporting differentiation and quality in unlicensed spectrum. WiFi can provide community-based provision of low-cost access and VoIP, and for free social initiatives it will still be relevant, particularly outside the population groups that are of interest to commercial operators and advertisers. For mainstream mobile or nomadic broadband, and anywhere access, the market could for example move to HSPA 3G mobile (see below). WiFi can also be used to create ad hoc networks, which may connect large numbers of nodes.

2.3.1.2.2 Femtocells

In telecommunications, a femtocell (originally known as an Access Point Base Station) is a small cellular base station, typically designed for use in a home or small business. It connects to the service provider's network via broadband (such as DSL or cable); current designs typically support 2 to 4 active mobile phones in a residential setting, and 8 to 16 active mobile phones in enterprise settings. A femtocell allows service providers to extend service coverage indoors, especially where access would otherwise be limited or unavailable. The femtocell incorporates the functionality of a typical base station but extends it to allow a simpler, self-contained deployment. For a mobile operator, the attractions of a femtocell are improvements to both coverage and capacity, especially indoors. There may also be opportunities for new services and reduced cost.

2.3.1.2.3 Ad Hoc and Mesh Networks

Ad hoc mobile networks are networks that are autonomous and self-organising. An example of an ad hoc network would be a group of vehicles moving at different speeds, perhaps in different directions and at variable distances apart; together they form a dynamic communicating network. Another application area for ad hoc networks is the emergency services, or a disaster-recovery situation where existing infrastructure has been damaged. The advantages for assisted living are that such networks are spontaneous, involve little or no infrastructure, and are able to configure and maintain themselves with little or no input from users. Mesh networks can be made up of nodes or sensors networked together to improve their communications capabilities. Mobile ad hoc and mesh networks can be combined with location technologies to provide improved location information.

2.3.1.3 Wide Area Networks

2.3.1.3.1 Mobile Networks

A major constraint on the use of mobile networks for the transmission of any healthcare information has been lack of coverage. Although the UK has theoretical 99% coverage by mobile networks, coverage holes exist and vary across networks, and in-building coverage will be less than outdoor coverage. HSPA (High Speed Packet Access, or 3G+) mobile networks may gain momentum for WAN applications in assisted living.

2.3.1.3.2 The IP Multimedia Subsystem (IMS)

The IMS is an architectural framework for delivering internet protocol (IP) multimedia to mobile users. To ease integration with the Internet, IMS as far as possible uses IETF (i.e. Internet) protocols such as the Session Initiation Protocol (SIP). IMS is not intended to standardise applications itself, but to aid the access of multimedia and voice applications across wireless and wireline terminals, a form of fixed-mobile convergence. Alternative and overlapping technologies for access and provision of services across wired and wireless networks depend on the actual requirements, and include combinations of the Generic Access Network, soft switches and "naked" SIP. This makes the business use of IMS less appealing: it is easier to sell services than to sell the virtues of "integrated services", and services for IMS have not been prolific.

2.3.1.3.3 WiMAX

Where WiFi provides local-area coverage, WiMAX (IEEE 802.16) is the equivalent for wide-area coverage. WiMAX is a standards-based technology enabling the delivery of last-mile wireless broadband access as an alternative to wired broadband like cable and DSL. WiMAX provides fixed, nomadic and portable wireless broadband connectivity without the need for direct line-of-sight to a base station. In a typical cell-radius deployment of three to ten kilometres, WiMAX Forum Certified systems can be expected to deliver capacity of up to 40 Mbps per channel, for fixed and portable access applications. The main issue around the deployment of WiMAX in the UK has been spectrum. Different frequencies are being used in the USA. In the UK, spectrum at 3.5 GHz was allocated to UK Broadband. The latest spectrum auction proposal from Ofcom is technology neutral, so it may be that spectrum will be bought to launch more WiMAX services in the UK. WiMAX may provide a better service than other wireless broadband technologies in the medium term, and could potentially be optimised for assisted living.

2.3.2 Wired Connection Protocol

The USB-IF Personal Healthcare Device Working Group is tasked with enabling personal healthcare devices to interoperate seamlessly with USB hosts. The group's initial goal is to define a USB Personal Healthcare Device Class specification. The specification will enable health-related devices, such as blood pressure cuffs and exercise watches, to connect via USB to consumer electronic products such as PCs and health appliances. Interoperability of health-related devices and consumer electronic products will facilitate communication between a patient and a doctor, an individual and a fitness coach, or an elderly person and a remote caregiver.

3. Slate Tablet-PC

The term tablet computer, or just tablet, refers to a mobile computing device larger than a mobile phone or personal digital assistant, integrated into a flat touch screen and primarily operated through this touch screen. If alphanumerical input is required, this is normally achieved by using a virtual keyboard on the screen. Some models include pens that can optionally be used, for instance for handwriting input. Often it is possible to connect additional peripheral devices including physical keyboards, cameras, mass-storage devices, wireless network adapters etc. The standard form just described is called a slate. Three other forms with different approaches for providing the user with a separate keyboard that does not consume main screen real estate have been offered on the market, called Convertibles, Hybrids and Booklets respectively. Convertibles have an integrated real keyboard that can be hidden by a swivel or slide joint. Hybrids have a keyboard into which the tablet can be inserted like into a docking station and which holds the tablet in an upright position; the detached tablet can be operated like an ordinary slate. Booklets are devices that fold open like a book and have a screen on either side, both equipped with touchscreen functionality. If the Booklet is operated in a half-opened position and placed in front of the user like a notebook computer, an onscreen keyboard can be loaded on the horizontal screen and the Booklet can be used like a Convertible; otherwise it resembles a foldable slate.

Tablet computers equipped with a desktop operating system like Microsoft Windows are called tablet-PCs1. Tablet PCs are intended to provide similar office productivity functions as a stationary computer through a user interface that has been extended for mobile use (touch screen, virtual keyboard, handwriting input with a pen). The tablet PC approach found some success only in niche markets like health care, education or aviation. A drawback of these devices is that the GUI widgets have been designed for mouse operation and require the precise positioning of a cursor. A typical tablet PC therefore needs to be stylus-driven and must be held with one arm only, to leave the other hand free for using the stylus. The current popularity of the newer tablets is due to the fact that they are primarily designed for media consumption and only a limited amount of text input, as needed for social networks, messaging, e-mail, configuration entries and internet navigation. All these tasks can be achieved either by pushing fairly large virtual buttons with the fingertip, or by grabbing, pushing or otherwise manipulating large graphical entities on the touch screen. Such features are expected to become available to users who wish to employ a Microsoft operating system on a tablet only with the advent of Windows 8 at the turn of the year 2012/2013. More details about the evolution of the tablet computer can be found on Wikipedia (http://en.wikipedia.org/wiki/Tablet_computer).

3.1 Technical Capabilities

The form of tablet computer envisaged for the USEFIL project is the slate. Hybrids will be considered if it turns out that text input has a more important role than expected. Currently many models from a range of manufacturers are on offer, trying to rival the popularity of Apple's iPad, which has given rise to the notion of a "tablet war".
Since the USEFIL consortium made the decision to employ Open Source software, all tablets exclusively running Apple's iOS, one of Blackberry's operating systems, or Microsoft's Windows will not be taken into account here. Cheap tablets with relatively low performance costing less than 100 Euros can be found, as well as high-end models costing several hundred Euros.

1 The distinction between tablet computers and tablet PCs is not always drawn and cannot be found in the original documents describing the work to be performed in the USEFIL project. There the terms tablet PC or slate tablet PC had been used, even though the iPad-like tablet computers currently gaining great popularity were meant.

The most obvious difference between the various slates is the size of their display, which determines the overall size of the device. Display diagonals currently fall into the range between 7" and 10.1", with aspect ratios of either 4:3, like that of legacy TV or computer screens, or 16:10, which is better adapted to displaying widescreen movies. Since the pixel density on all tablet screens is similar, larger models usually have a correspondingly higher display resolution. The highest resolution currently available on tablets is 1280 x 800 pixels. This is good enough to display high-definition videos in the format with the lowest resolution allowed by the standard (720p) and to render web pages in such a way that they require only a little zooming and scrolling. However, considerably higher pixel densities and thus higher screen resolutions, up to 2560 x 1600 for 10" tablets, have been announced, and the first steps towards that mark are to be expected during this year (2012). Many tablets restrict the user to a narrow viewing angle due to the fact that LCDs of the relatively cheap Twisted Nematic (TN) type are used. While this is tolerable during handheld use by a single person, stationary use in a wall mount or docking station and shared viewing by two or more persons are hampered by that type of display. The more advanced technologies of Vertical Alignment (VA) and In-Plane Switching (IPS) have overcome that problem, but they are found only in larger and more expensive tablets. Some feature restrictions had to be introduced to limit the power consumption of the tablets, because users expect at least several hours and up to one day of operation on a single battery charge. Furthermore, there is no room for cooling devices in a tablet, which also dictates a low-power design. None of the current tablets includes true colour (24 bit) graphics, and in most cases de-saturated colours are used to make the display appear brighter than with more saturated colours given the same backlight level. However, these initial restrictions are likely to be overcome at some point in the near future.

Central processing units, graphics processing units and several controllers are always integrated as a System on a Chip (SoC). The central processors used are of the Reduced Instruction Set Computer (RISC) type: they execute a small set of simple, frequently needed instructions very quickly and compose rarer, more complex operations out of several such steps. While the chips employed in tablets stem from a number of different manufacturers like Texas Instruments, Samsung, Renesas, Freescale and so on, many of these manufacturers use designs licensed from ARM Holdings, a company that only produces intellectual property and does not make chips itself. About half of all tablet processors are ARM-based. A similar situation exists with graphics processors, where PowerVR is a vendor from which many chip manufacturers license designs. While the overall performance of tablets may not yet have reached that of a modern desktop computer, impressive advances have already been made and further ones are to be expected in the near future. Multicore processors running at frequencies up to 1 GHz, decoders for modern video codecs and HD video, and shader hardware supporting recent versions of DirectX and OpenGL are already available. The most advanced SoCs to date are the Snapdragon S4 by Qualcomm (own CPU and GPU), the OMAP 5 by Texas Instruments (with ARM CPU and PowerVR GPU) and the TEGRA 3 by NVidia (with ARM CPU and own GPU).
Tablets are currently equipped with up to 1 GB of RAM. They contain a solid state disk with a capacity of some GBs rather than a hard disk. Storage space can be expanded with a memory chip or an external drive. Network connectivity options usually include WiFi, Bluetooth, and often also GSM or UMTS or both. Other features commonly found are USB host and HDMI connectors, front and back cameras, a microphone, acceleration sensors and SD card slots. Sometimes a GPS sensor is also available. In the cheaper models the mandatory touch screen is often of the resistive type, whereas the capacitive type, which is easier to operate with the fingertips, is more often found in the expensive models. For multi-touch functionality the operating systems sometimes support the tracking of only two, sometimes of more simultaneous finger positions, depending on the underlying processing capacities of the system.

Even though a couple of interesting alternatives exist, currently practically all tablets employing an Open Source operating system are running Google's Android, for which programs (and code examples) for all conceivable purposes are available. New programs (apps) are usually only available through the internet, either from the generic Android Market or from another one that the tablet manufacturer or a network provider installed. However, it is technically possible to side-load self-written or otherwise internet-unavailable apps from a storage device, even though this may not be supported and may require certain skills. Hewlett Packard recently announced that it will release webOS, which had been running on its short-lived tablet named TouchPad, as Open Source. WebOS is highly acclaimed for its multitasking ability and its user-friendly interface that allows a direct, smooth transition between the simultaneously running tasks. HP plans to supply webOS with the ability to run Android apps, and it remains to be seen whether any tablet maker will take an interest in using it. SUGAR is a Fedora Linux combined with a user interface written in Python. It can run on ARM as well as on Intel CPUs. SUGAR tries to circumvent the ubiquitous distinction between data files and applications with the concept of activities. An activity is an application that does not require the user to explicitly open or store files, because all events and generated data are automatically stored in a journal, from which the activity can be resumed at each of the entries. Another important difference in the user interface is that all activities are designed for shared use in a network, the single-user mode being just a special case in which the network is restricted to a single computer. The interface was designed for young children and therefore runs all activities in full-screen mode and is operated entirely through icons. Those features seem to be equally well suited to the elderly, even though SUGAR apparently has not yet been tested with this user group. The OLPC project has been developing its own tablet for children, which was presented during CES 2012. Another attempt to produce an Open Source software platform for mobile computing and communication devices had been jointly started by Nokia and Intel early in 2010 under the name MeeGo, again based on Linux. It can be used with Intel as well as with ARM processors. MeeGo was also destined to be used on smart TVs, nettops, in-vehicle infotainment devices and embedded systems, and had found support from other major companies. However, after about one and a half years of development the main protagonists parted ways and the development of MeeGo was halted. Nokia is now using Windows Phone on its smartphones, besides developing a new smartphone-only operating system of its own, and Intel began collaborating with Samsung on a MeeGo successor named Tizen. Tizen, the first full release of which is now available, is targeted at the same devices as MeeGo. Tizen's unique approach is that all applications are developed in HTML5.

3.2 Social Networks catering to the ageing population

In the current context a social networking service or, in short, a social network is defined as an internet service offered to establish and maintain personal relations between the members of spatially distributed groups with shared interests and/or activities2.
Currently Facebook is almost a synonym for services of that kind. Social networks usually combine several methods of communication. One is the release of self-descriptions in text and image format (the profile), which can be searched by others. Another method is messaging via e-mail or chat, and still another is a log of physical or mental activities that can either be retrieved (blog/bulletin board) or is even broadcast. It is always possible to define lists of privileged members (friends or buddies) who are allowed to initiate contact and to receive certain information. Event and appointment calendars are also common. While these features seem to appeal mostly to younger adults when the service is open to pursue any kind of activity or to discuss any topic, many attempts have been made to establish thematically oriented social networks primarily targeted at seniors. Here editorials, newsletters and advertising are used to encourage and sustain topics that are of interest to elderly users, like health, nutrition and diets, travel, death and personal loss etc. An entry in the Senior Communities Blog dated from 2008 lists 20 US-American social networks for mature users3, but many of these sites are now defunct. A test report from 2011 covered five social networks for seniors in Germany4. While "senior" is often defined as 50+, there are in fact social networks with even older users. For example, the age of the members of planetsenior.de is 62 on average. However, planetsenior.de has only about 1,100 members as compared to about 15,000,000 German Facebook users, which might be taken as a hint at the technological barriers that still exist for many of the elderly.

The primary way to participate in a social network is to log in to the corresponding website with a browser on a desktop or laptop computer. This requires computer and internet knowledge as well as the sensorimotor skills to operate a mouse. Even though the first precondition will be met by everybody once the Digital Natives, who are now in their thirties, have grown old, at present both preconditions are less often fulfilled with higher age, and rarely by people aged 80 or beyond. However, smartphone apps make the same services available with a user interface that is, at least in principle, better adapted to the skills of the elderly. Smartphone user interfaces offer a simple alternative to the complicated arrangement of windows, menus, toolbars, text links etc. by always presenting the user with a choice between just a small number of pushbuttons. Unfortunately, the small size of the screens still makes them difficult to use for elderly persons, and smartphones still presume too much background knowledge about the technology and the way of interaction defined by the operating system's style guide. Tablet computers with similar apps on their larger screens can be used to overcome these problems, if they are employed as single-purpose turnkey systems. Once properly configured by an expert, they can boot directly into the social networking app and hide all other apps and widgets that are unnecessary for the interactions with the social network. Further simplifications are possible and advisable. For example, all functions destined to develop new relationships with strangers, i.e. profile management, people search and formal friends-list adding procedures, can be removed so that the social networking application mirrors only the real-life relationships. The same ideas can also be applied to ordinary computers, if the portability of the device is dispensable. Bettie5 is such a PC-based turnkey system for social networking, currently under development by an Irish start-up. Bettie comes with a keyboard, but has no pointing device. Contact to other people is initiated neither by clicking on a friend's ID nor by typing in an address, but rather by placing onto the screen an RFID tag representing that friend. The internet connection is established automatically through WiFi and via a special wireless router.

2 Cf. Boyd, D. M. and Ellison, N. B. (2007), Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication, 13: 210-230.
3 http://www.seniorhome.net/blog/2008/50-best-social-networks-for-seniors/
4 http://www.test.de/themen/computer-telefon/meldung/Soziale-Netzwerke-fuer-Aeltere-Gut-vernetzt-und-ueber-50-4190043-4190045/
5 http://www.bett.ie/

Figure 14: Bettie, a display with integrated PC and social networking software for the elderly

Figure 15: Eldy, a PC program to run a limited number of applications tailored to elderly users

Another PC-based solution - which has also been ported to TV set-top boxes - is Eldy6. Eldy's approach is less radical, being a pure software application. Eldy can be considered a simplified desktop with a simplistic web browser and an otherwise conventional interaction style. Recently, a couple of tablet computers for seniors who require some kind of home care or regular medical consultations have been put on the market. Here the social network is basically a network of caregivers. Memo is a slate destined for persons who are cared for by their family members or other non-professional caregivers.

Figure 16: Memo, a tablet for the elderly and their caregiver network, shown with the most complex home screen layout available (left) and placed in a living room (right)

Memo can display the daily schedule, a to-do list and instant messages from caregivers on its home screen. Additionally, five buttons are available. With a help button an emergency call can be issued as an instant message and as an e-mail, and a pill-box button can be used to display a list and descriptions of the daily medication. The other buttons provide a phone book, a photo album and weather information respectively. A web page is used to edit the data and to compose instant messages, so that the caregivers are not required to have a Memo tablet themselves.

Figure 17: Angela, another tablet for elderly caretakers

6 http://www.eldy.eu/

Angela7 is a product very similar to Memo, but providing more features. It includes a video chat and offers the elderly person the option to compose text chat or e-mail messages themselves. Connect8 is a tablet PC co-developed by Intel and running Windows 7. It goes one step further towards an e-health appliance by incorporating wellness surveys and brain fitness games into the range of available functions.

Figure 18: The Connect tablet on a sideboard at home (left) and a larger view of its home screen (right)

The surveys consist of custom-designable questionnaires that the patient regularly completes at home. Caregiver staff remotely monitoring the well-being of the patients can see their self-assessed status and react accordingly. Compliance with the medication is also actually checked by the professional caregivers. Otherwise, Connect and its complementary system components follow the same asymmetric design as Memo and Angela, insofar as only the elderly person is given a tablet and the other network participants are connected through ordinary computers. All of the above solutions require a conscious effort from the caregivers to provide the elderly caretaker with a feeling of being in contact with and belonging to them. Solutions like the one envisaged in the USEFIL project, in which a natural impression of togetherness is meant to be evoked through automatically created awareness signals, are not yet available.

7 http://independa.com/angela
8 http://www.careinnovations.com/Products/Connect
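As a purely hypothetical sketch of how such a custom-designable wellness survey might be represented and scored in software, consider the following minimal example. All field names and the scoring rule are invented for illustration; no vendor API or data format of Connect is implied.

```python
# Hypothetical representation of a custom wellness survey, loosely inspired
# by the ones described for Connect. Field names are invented; no vendor
# API is implied.
SURVEY = {
    "id": "daily-wellness",
    "questions": [
        {"text": "How well did you sleep last night?", "scale": (1, 5)},
        {"text": "How is your appetite today?", "scale": (1, 5)},
        {"text": "How would you rate your comfort right now?", "scale": (1, 5)},
    ],
}

def self_assessed_status(answers):
    """Collapse the answers into one score a caregiver dashboard could
    track over time and flag when it drops."""
    return sum(answers) / len(answers)

print(self_assessed_status([4, 3, 5]))  # -> 4.0
```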

4. Video Monitoring System

4.1 Cameras

Figure 19 shows examples of cameras that can be used for behaviour recognition, while Table 3 gives a comparative list of their features. All these cameras are relatively small and lightweight, supporting WLAN with the TCP/IP or HTTP protocol. Prices of these cameras range from 80 Euros to 150 Euros (March 2012). Their main differentiating features are:
- the video resolution, ranging from 640x480/30fps to 1280x960/15fps;
- whether they also support audio;
- whether they support night vision using infrared.

Table 3: Examples of cameras that can be used for behaviour monitoring

|                   | TP-Link TL-SC3130G | Trendnet TV-IP110WN | LevelOne WCS-0010 | TP-Link TL-SC3171G | Trendnet TV-IP121WN | LevelOne WCS-0040 |
| resolution        | 640 x 480 | 640 x 480 | 640 x 480 | 640 x 480 | 640 x 480 | 1280 x 960 |
| frame rate        | 30 fps | 30 fps | 25 fps | 30 fps | 30 fps | 15 fps |
| light sensitivity | 0.5 lx | 0.5 lx | 3 lx | 0.5 lx | 3 lx | |
| audio             | 8 kHz (mic) | no | 58 dB | yes | yes (mic?) | yes (mic) |
| weight            | 155 g | 396 g | | 155 g | 235 g | |
| dimensions (mm)   | 96 x 58 x 31 | 70 x 100 x 57 | 114 x 41 x 141 | 75 x 47 x 136 | 70 x 100 x 57 | 114 x 41 x 141 |
| energy            | 5 V | 6 W | 5 V | 12 V | 6 W | 12 V / 8 W |
| infrared          | no | no | no | 12 LEDs | yes | no |
| indicative price  | 80 E | 82.79 E | 88.81 E | 94 E | 101 E | 150 E |

(a) TP-Link SC3130G (b) Trendnet TV-IP110WN (c) LevelOne WCS0010 (d) TP-Link SC3171G

Figure 19: Examples of cameras

Besides cameras, monitoring accuracy can potentially be boosted using Kinect. Kinect is a motion-sensing input device by Microsoft. It was launched in November 2010 for the Xbox 360 video game console, though a PC-dedicated version is expected to be launched in 2012. Kinect possesses an infrared camera and sensor that make it possible to create a range image of the monitored scene, thus making it easier to model the scene in three dimensions. It also possesses a regular 640x480 32-bit colour video camera at 30 fps and a 16 kHz audio microphone. Drivers and an SDK for controlling Kinect and obtaining Kinect data have been developed by Microsoft as well as by the OpenKinect community. The price of Kinect is around 120 Euros (March 2012). A PC version of the Kinect is also scheduled to be available soon, which allows monitoring at a closer distance (40 cm, as opposed to 50 cm). Its cost, however, is expected to be around 100 Euros higher. The technology behind Kinect is developed by the Israeli company PrimeSense. Although Kinect is the only consumer product using this technology at the moment, developers can also purchase the ASUS Xtion Pro (depth sensor only) and Xtion Pro Live (depth sensor + RGB camera) platforms for PC.

Figure 20: Kinect and ASUS Xtion Pro Live

4.2 Processing Units

Processing audio-visual monitoring information is a challenging task that may require substantial computing power. At the same time, USEFIL should try to use unobtrusive and inexpensive devices as much as possible. Towards this end, nettops with good GPU capabilities would be a good choice, since these tend to be small-sized, relatively inexpensive and elegant. Importantly, possessing a GPU that can be used for general-purpose (GP) computing may boost processing to the real-time needs of USEFIL. The specifications of such nettops are given in Table 4. In addition, FXI has recently released a USB-sized computer under the code name Cotton Candy. This device will soon be available for preorders at around 150 Euros. Cotton Candy is an attractive solution, mainly due to its small size. Nevertheless, one should mind that (a) it has lower processing capabilities than the nettops, mainly due to the big difference in GPU performance, (b) it has no hard disk (only additional microSD storage is possible) and (c) it would probably need an extra USB hub to accommodate the required devices (e.g. a Kinect). This would mean that the overall installation may not be so unobtrusive. Figure 21 shows indicative photos of nettops and the Cotton Candy.

Table 4: Examples of small processing units

|          | ZOTAC ZBOX ID41 Plus | Sapphire EDGE-HD2 | ZOTAC ZBOX ID80 Plus | FXI Cotton Candy |
| CPU      | Intel Atom D525 1.8 GHz | Intel Atom D525 1.8 GHz | Intel Atom D2700 2.13 GHz | ARM Cortex-A9 @ 1.2 GHz |
| Memory   | 2 GB DDR3-800 SO-DIMM | 2 GB DDR3-800 SO-DIMM | 2 GB DDR3-1066 SO-DIMM | 1 GB DRAM |
| Storage  | 250 GB SATA 3.0 Gbps | 320 GB SATA 2.5" | | up to 64 GB local storage (microSD) |
| Network  | 1000 Mbps Ethernet | built-in Ethernet, WiFi 802.11 n/g/b | | WiFi 802.11 b/g/n, Bluetooth |
| Graphics | NVIDIA ION 2 | NVIDIA ION 2 (16 cores) | NVIDIA GeForce GT 520M | quad-core ARM Mali-400MP (48 cores) |
| I/O      | HDMI, DVI, 2x USB 3.0, 3x USB 2.0 | VGA, HDMI, 4x USB 2.0 | HDMI, DVI, 2x USB 3.0, 4x USB 2.0 | USB 2.0 male, female micro USB 2.0, HDMI 1.3a with audio |
| Power    | DC 19 V | AC | DC 19 V | USB |
| Size (cm)| 18.7 x 28 x 4.0 | 19.3 x 14.8 x 2.2 | 18.8 x 18.8 x 4.4 | USB-sized |
| Price (no VAT) | ~286 Euros | ~330 Euros | ~300 Euros | ~150 Euros |

35 D2.4 State of the Art (a) Sapphire EDGE-HD2 (b) Zbox ID80 (c) FXI Cotton Candy Figure 21: Examples of small processing units Page 35 of 109

5. Monitoring Humans

5.1 Monitoring Vital Signs

Unobtrusive physiological monitoring is an emerging research topic with a direct potential application to people's health, in particular that of elderly people and those living alone. Its aim is to quietly observe and assess vital signs in order to provide valuable information with respect to the health condition of the people monitored, without requiring extra care or effort from them. In this section, we review the literature that relates to unobtrusive physiological monitoring using visual information and, in particular, by means of observing how skin reflects optical and near-optical radiation. We begin by introducing photoplethysmography and review the vital signs that can be deduced through photoplethysmograms. We then review recent methods that aim to observe such signs through common cameras.

The foundation of monitoring physiological parameters using optical or near-optical information is photoplethysmography (PPG). It is a topic that is widely known in medicine due to a now very popular device, the pulse oximeter, which attaches to a patient's finger or ear and allows monitoring blood oxygenation. The basic idea of PPG is that by using a light source and a detector, and by observing skin absorbance or reflectance of light, we may indirectly measure important indexes related to a person's physical condition. This is possible due to the combination of thinness, low optical absorption and sufficient scattering of the skin, which allows visible light to penetrate deep into the skin and scatter back to its surface9. A model of the skin is shown in Figure 22.

Figure 22: The human skin (Välisuo et al., 2010)15

The skin reacts to many kinds of local and global stimuli by adjusting the perfusion and thus its blood volume. In particular, the PPG signal consists of a steady component, which is related to the relative vascularisation of the tissue, and a pulsatile component, which is related to the changing blood pulse volume. The resultant signal is a measure of the expansion of skin vessels, which is a summation effect of the arterial pulse and the opposing elastic properties of the vessel wall (Välisuo et al., 2010). The signal has at least five different frequency components in the interval 0.007 Hz - 1.5 Hz. These frequency components may be related to several physiological parameters, including heart rate, respiration, blood pressure control, thermoregulation, central baroreflex activity, vasomotoric rhythms and the autonomous nervous system (ANS)10. The potential of detecting physiological parameters is a research topic under continuous investigation11. In particular, we may reliably detect the respiration rate12. Once the respiratory component has been removed, the heart beat may also be monitored to provide time-critical information. For instance, a beat-to-beat change of the waveform amplitude is often the first clue that the person has developed an irregular heart rhythm. Also, heart rate variability analysis may be an index to assess the emotional state of a person13. The waveform amplitude is a further index, since it may either increase due to warming or sedation, which cause vasodilatation, or decrease due to cold or stress, which cause vasoconstriction14.

The photoplethysmographic waveform may give us indications that go beyond heart rate and respiration. For example, gastric motility is the spontaneous peristaltic movement of the stomach that aids digestion, moving food through the stomach and out through the pyloric sphincter into the duodenum. These movements induce a low-frequency component that is also detectable in the PPG waveform16. We may therefore infer whether someone's stomach was full when monitored. Even beyond that, by observing the respiratory sinus arrhythmia phenomenon in the PPG waveform, we may derive conclusions with respect to the physical condition and/or age of a person17. Finally, an important vital sign that can be measured through the PPG signal is the blood hematocrit, defined as the percentage of the packed red blood cell volume in whole blood (Franceschini, 2002). The main causes of an elevated hematocrit in healthy people are hypernutrition, excessive sweating and social stress. Measurement of the hematocrit is possible because of the different absorbance and reflectance properties of oxygen-saturated cells at (at least) two different light wavelengths. This has been the principle of pulse oximetry and, recently, of pulse hematometry18.

Recent Advances

Until recently, the PPG waveform has been acquired only in a controlled way that fixed the source of light and the particular skin location on the body. These parameters have been assumed to greatly influence the accuracy and richness of the measurement results. To give a few examples, Cui et al. (1990)19 concluded that a minimum reflectance and a peak sensitivity to the blood pulsations exist in the wavelength range from 510 to 590 nm. Moreover, it has been claimed that a particularly shaped light source, namely a ring-shaped one, more than doubles the measurement depth compared with diffuse light20. Also, Reuss and Siker (2004)21 found that the selection of the monitoring site is of importance for the accuracy of the measurements. Nevertheless, there is now an increased interest in relaxing the conditions needed to obtain the photoplethysmographic waveform. The motivation is that, by having less control over the light source and/or the exact skin position, and at the expense of obtaining a less accurate waveform, a PPG waveform may be obtained with less involvement from the patient, a feature needed for unobtrusive monitoring. A method that has been investigated is to use mobile phones/smartphones22. An advantage of mobile phone monitoring is that it allows patients to make baseline measurements at any time. The area is illuminated with the white LED flash that the smartphone possesses, and the phone's camera receives the reflected light (see Figure 23). This type of imaging falls into the reflection photoplethysmographic imaging paradigm. Scully et al.23 observed that the technology available in a standard mobile phone camera has the potential to be used as an accurate multi-parameter physiological monitor, including breathing rate, cardiac R-R intervals and blood oxygen saturation. Note that using a smartphone introduces some noise, since there is no physical device ensuring a stable connection with the skin, as is the case with pulse oximeter clips. Nevertheless, the signal acquired is still quite strong, since the light emission and detection are in contact with the fingertip.

9 J. L. Reuss. Multilayer modeling of reflectance pulse oximetry. Biomedical Engineering, IEEE Transactions on, 52(2):153-159, 2005.
10 B. F. Keogh and R. J. Kopotic. Recent findings in the use of reflectance oximetry: a critical review. Current Opinion in Anesthesiology, 18(6):649, 2005.
11 K. H. Shelley. Photoplethysmography: beyond the calculation of arterial oxygen saturation and heart rate. Anesthesia & Analgesia, 105(6S Suppl):S31, 2007.
12 A) P. S. Addison and J. N. Watson. Secondary wavelet feature decoupling (SWFD) and its use in detecting patient respiration from the photoplethysmogram. In Engineering in Medicine and Biology Society, 2003. Proceedings of the 25th Annual International Conference of the IEEE, volume 3, pages 2602-2605. IEEE, 2003. B) P. A. Leonard, J. G. Douglas, N. R. Grubb, D. Clifton, P. S. Addison, and J. N. Watson. A fully automated algorithm for the determination of respiratory rate from the photoplethysmogram. Journal of Clinical Monitoring and Computing, 20(1):33-36, 2006. C) Lena Nilsson, Tomas Goscinski, Anders Johansson, Lars-Göran Lindberg, and Sigga Kalman. Age and gender do not influence the ability to detect respiration by photoplethysmography. Journal of Clinical Monitoring and Computing, 20:431-436, 2006. D) Jinseok Lee and K. H. Chon. Time-varying autoregressive model-based multiple modes particle filtering algorithm for respiratory rate extraction from pulse oximeter. Biomedical Engineering, IEEE Transactions on, 58(3):790-794, March 2011.
13 Pei-Yang Hsieh and Chiun-Li Chin. The emotion recognition system with heart rate variability and facial image features. In Fuzzy Systems (FUZZ), 2011 IEEE International Conference on, pages 1933-1940, June 2011.
14 K. H. Shelley. Photoplethysmography: beyond the calculation of arterial oxygen saturation and heart rate. Anesthesia & Analgesia, 105(6S Suppl):S31, 2007.
15 P. Välisuo, I. Kaartinen, H. Kuokkanen, and J. Alander. The colour of blood in skin: a comparison of Allen's test and photonics simulations. Skin Research and Technology, 16(4):390-396, 2010.
16 A) S. Mohamed Yacin, V. Srinivasa Chakravarthy, and M. Manivannan. Reconstruction of gastric slow wave from finger photoplethysmographic signal using radial basis function neural network. Medical and Biological Engineering and Computing, 49:1241-1247, 2011. B) S. Mohamed Yacin, M. Manivannan, and V. Srinivasa Chakravarthy. On non-invasive measurement of gastric motility from finger photoplethysmographic signal. Annals of Biomedical Engineering, 38:3744-3755, 2010.
17 M. A. Franceschini, D. A. Boas, A. Zourabian, S. G. Diamond, S. Nadgir, D. W. Lin, J. B. Moore, and S. Fantini. Near-infrared spiroximetry: noninvasive measurements of venous saturation in piglets and human subjects. Journal of Applied Physiology, 92(1):372-384, 2002.
18 M. Nogawa, S. Tanaka, and K. Yamakoshi. Development of an optical arterial hematocrit measurement method: pulse hematometry. In Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the, pages 2634-2636, 2005.
19 W. Cui, L. E. Ostrander, and B. Y. Lee. In vivo reflectance of blood and tissue as a function of light wavelength. Biomedical Engineering, IEEE Transactions on, 37(6):632-639, June 1990.
20 P. Välisuo and J. Alander. The effect of the shape and location of the light source in diffuse reflectance measurements. In Computer-Based Medical Systems, 2008. CBMS08. 21st IEEE International Symposium on, pages 81-86. IEEE, 2008.
21 J. L. Reuss and D. Siker. The pulse in reflectance pulse oximetry: modeling and experimental studies. Journal of Clinical Monitoring and Computing, 18(4):289-299, 2004.
22 A) M. J. Gregoski, M. Mueller, A. Vertegel, A. Shaporev, B. B. Jackson, R. M. Frenzel, S. M. Sprehn, and F. A. Treiber. Development and validation of a smartphone heart rate acquisition application for health promotion and wellness telehealth applications. International Journal of Telemedicine and Applications, 2012, 2012. B) D. Grimaldi, Y. Kurylyak, F. Lamonaca, and A. Nastro. Photoplethysmography detection by smartphone's videocamera. In Intelligent Data Acquisition and Advanced Computing Systems (IDAACS), 2011 IEEE 6th International Conference on, volume 1, pages 488-491. IEEE, 2011. C) E. Jonathan and M. Leahy. Investigating a smartphone imaging unit for photoplethysmography. Physiological Measurement, 31:N79, 2010.
23 C. G. Scully, J. Lee, J. Meyer, A. M. Gorbach, D. Granquist-Fraser, Y. Mendelson, and K. H. Chon. Physiological parameter monitoring from optical recordings with a mobile phone. IEEE Transactions on Biomedical Engineering, 59(2):303-306, February 2012.
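To make the smartphone-based measurement described above concrete, the following is a minimal sketch of the core signal path: average the red channel over each video frame of the flash-lit fingertip, band-pass to the cardiac frequency band, and take the dominant spectral peak as the heart rate. The frame rate and band limits are illustrative assumptions, not values taken from the cited studies.

```python
# Minimal sketch of fingertip PPG analysis. Assumes the video frames have
# already been read (e.g. with OpenCV) from a smartphone camera whose lens
# is covered by the flash-lit fingertip.
import numpy as np
from scipy.signal import butter, filtfilt

def heart_rate_bpm(frames, fps=30.0):
    """frames: sequence of HxWx3 uint8 arrays in BGR order (OpenCV default)."""
    # 1. Raw PPG signal: mean red-channel intensity per frame, which varies
    #    with the blood volume under the fingertip.
    signal = np.array([f[..., 2].mean() for f in frames], dtype=float)
    signal -= signal.mean()
    # 2. Band-pass to the cardiac band (here 0.7-3.0 Hz, i.e. 42-180 bpm).
    b, a = butter(3, [0.7 / (fps / 2), 3.0 / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, signal)
    # 3. Heart rate = dominant spectral peak within the pass band.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 3.0)
    return freqs[band][np.argmax(spectrum[band])] * 60.0

# A respiration estimate could be obtained analogously in a ~0.1-0.5 Hz band.
```

The same filter-and-peak analysis applies in principle to the remote, camera-based setting discussed next, e.g. to the mean green-channel signal of a face region, although the much weaker signal then requires additional steps such as source separation and motion compensation.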

Figure 23: Using a smartphone (Source: Välisuo et al., 2010)

Figure 24: Using a web camera (adapted from Poh et al., 201024)

An even more challenging task is acquiring and analysing the PPG signal remotely, using only ambient light and a general-purpose camera, such as the ones common laptops possess (see Figure 24). Though infrared light has been considered the default choice for PPG, the potential of using the visible spectrum, and green light in particular, was pointed out as early as 199225 and again in 200826. The possibility of measuring PPG signals remotely using a camera, though with controlled light and a small distance from the skin, has also been examined27. A justification of the RGB colour space was further analysed by Välisuo et al. in 2010. A simple remote camera and the visible spectrum of ambient light have been combined, showing that remotely acquiring a PPG waveform from the human face is possible28. Work by others29 has gone one step further by incorporating an automatic face detection step and using Independent Component Analysis to separate interesting components of the PPG waveform. Finally, in 2011 Sahindrakar30 considered tracking particular skin sites to compensate for motion artefacts. While some of the aforementioned studies31 have already led to the development of software products, namely MIT's Cardiocam and Philips' Vital Signs Camera, unobtrusive PPG using simple web cameras and ambient light is still a challenging research issue. In particular, robustness with respect to the person's motion is an issue that remains largely unsolved. Incorporating recent results on compensating motion artefacts32 could prove useful in this respect. Also, the indexes monitored by the web camera are currently limited to the heart and respiration rates. To allow monitoring of more complex indexes, such as oxygen saturation, elaborate contact-PPG analysis methods, such as radial basis functions33 or particle filtering34, could be a good starting point.

24 M. Z. Poh, D. J. McDuff, and R. W. Picard. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762-10774, 2010.
25 A) J. A. Crowe and D. Damianou. The wavelength dependence of the photoplethysmogram and its implication to pulse oximetry. In Engineering in Medicine and Biology Society, 1992. 14th Annual International Conference of the IEEE, volume 6, pages 2423-2424. IEEE, 1992. B) D. Damianou. The wavelength dependence of the photoplethysmogram and its implication to pulse oximetry. University of Nottingham, 1995.
26 Y. Maeda, M. Sekine, T. Tamura, A. Moriya, T. Suzuki, and K. Kameyama. Comparison of reflected green light and infrared photoplethysmography. In Engineering in Medicine and Biology Society, 2008. EMBS 2008. 30th Annual International Conference of the IEEE, pages 2270-2272, Aug. 2008.
27 A) F. P. Wieringa, F. Mastik, and A. F. W. Steen. Contactless multiple wavelength photoplethysmographic imaging: A first step toward "SpO2 camera" technology. Annals of Biomedical Engineering, 33(8):1034-1041, 2005. B) K. G. Humphreys. An investigation of remote non-contact photoplethysmography and pulse oximetry. PhD thesis, National University of Ireland, 2007. C) S. Hu, J. Zheng, V. Chouliaras, and R. Summers. Feasibility of imaging photoplethysmography. In BioMedical Engineering and Informatics, 2008. BMEI 2008. International Conference on, volume 2, pages 72-75. IEEE, 2008. D) Alexei A. Kamshilin, Serguei Miridonov, Victor Teplov, Riku Saarenheimo, and Ervin Nippolainen. Photoplethysmographic imaging of high spatial resolution. Biomedical Optics Express, 2(4):996-1006, April 2011.
28 W. Verkruysse, L. O. Svaasand, and J. S. Nelson. Remote plethysmographic imaging using ambient light. Optics Express, 16(26):21434-21445, 2008.
29 A) M. Z. Poh, D. J. McDuff, and R. W. Picard. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762-10774, 2010. B) M.-Z. Poh, D. J. McDuff, and R. W. Picard. Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Transactions on Biomedical Engineering, 58(1):7-11, January 2011.
30 P. Sahindrakar. Improving motion robustness of contact-less monitoring of heart rate using video analysis. Master's thesis, Eindhoven University of Technology, 2011.
31 A) M.-Z. Poh, D. J. McDuff, and R. W. Picard. Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Transactions on Biomedical Engineering, 58(1):7-11, January 2011. B) P. Sahindrakar. Improving motion robustness of contact-less monitoring of heart rate using video analysis. Master's thesis, Eindhoven University of Technology, 2011.
32 A) G. Cennini, J. Arguel, K. Akşit, and A. van Leest. Heart rate monitoring via remote photoplethysmography with motion artifacts reduction. Optics Express, 18(5):4867-4875, 2010. B) Yu Sun, Sijung Hu, Vicente Azorin-Peris, Stephen Greenwald, Jonathon Chambers, and Yisheng Zhu. Motion-compensated noncontact imaging photoplethysmography to monitor cardiorespiratory status during exercise. Journal of Biomedical Optics, 16(7):077010, 2011.
33 S. Mohamed Yacin, V. Srinivasa Chakravarthy, and M. Manivannan. Reconstruction of gastric slow wave from finger photoplethysmographic signal using radial basis function neural network. Medical and Biological Engineering and Computing, 49:1241-1247, 2011.
34 Jinseok Lee and K. H. Chon. Time-varying autoregressive model-based multiple modes particle filtering algorithm for respiratory rate extraction from pulse oximeter. Biomedical Engineering, IEEE Transactions on, 58(3):790-794, March 2011.

5.2 Monitoring Emotions

In the USEFIL project the development of video processing algorithms for automated emotion monitoring is planned, because it is hoped that certain emotions can be used as indicators of actual or forthcoming health problems. An excess of emotions can even be a (mental) health problem in itself. Techniques for emotional monitoring depend on knowledge of the exact nature and the observable symptoms of emotions. This section will therefore start with a brief summary of the state of emotion research. There is no universally accepted definition of emotion, but for the current considerations an intuitive conception will suffice. In the following, emotion will not be distinguished from mood, affect, feeling and other related concepts. The possible differences reside in the timing, duration and intensity of the related phenomena, and in whether the terms refer to a mere mental state or to a mental state with an associated physiological state. The conception that a person's current emotion is encoded in visible movements (of the face or other body parts) is usually associated with additional, more or less implicit assumptions. Because the possible success of any proposed solution for automated face monitoring may depend on the validity of these assumptions, four of them will at least be mentioned here, namely:

I. that all emotions are composed of just a few discrete and clearly distinguishable states (the basic or fundamental emotions);
II. that certain emotions are mapped to visible behaviour;
III. that different emotions are mapped to different behaviours;
IV. that the mapping is essentially the same for all humans, i.e. that people having the same emotion exhibit the same behavioural features.

5.2.1 The Theory of Basic Emotions

All of the above assumptions are widespread beliefs amongst laypersons, but even though they also have strong proponents in the scientific community, like the former UCSF professor of psychology Paul Ekman, they have been debated heavily there. Ekman's list of basic emotions contains happiness, anger, disgust, sadness, fear and surprise as the ones that are visible on the face. In the view of this discrete emotion expression theory, each of the different emotions tends to automatically produce a specific behavioural pattern, e.g. a specific movement of the facial muscles. Therefore happiness will make us smile, and the smile is thus a sign for happiness (it "expresses" happiness) etc. The intensities of the patterns are assumed to be coupled to the intensities of the corresponding emotions, so that, for example, the happier a person is, the stronger is his or her smile. If a facial expression does not fully coincide with one of the prototypical patterns, the discrete emotion theory assumes that this is due to the presence of an emotion mixture. For example, a sudden perception of something terrible may result in fear as well as in surprise35 and may be represented on the face with a pattern that consists of features from both basic expressions. One of the more obvious problems is that some of the behavioural features that are commonly associated with certain emotions are neither mandatory for nor specific to those emotions. For example, angry persons sometimes have their eyes wide open (which is the stereotyped textbook expression), but sometimes closed to narrow slits. Happy persons often pull up their mouth corners, sometimes with an open and sometimes with a closed mouth, but they can also weep and thus exhibit all features commonly associated with sadness.

5.2.2 The Theory of Affect Dimensions

An alternative to the discrete emotion theory is the theory of continuously gradable core affects. It assumes that humans can be in different states on at least two dimensions, namely the dimensions of pleasedness and arousedness. A third dimension is also sometimes assumed: modern authors think about it in terms of self-assuredness or powerfulness, but Wilhelm Wundt, the founder of psychology as an experimental science, had claimed the third dimension to be tension. Those two or three dimensions pertain to the mentioned core affects; different names, like for example evaluation, activity and potency, are used for them. In this type of theory, emotion words are labels referring to different areas in core affect space plus a certain mind-set towards an object, a situation or a behaviour. For example, anger is the label given to core affects with a low level of pleasedness and high levels of arousedness and self-assuredness that occur if a presumably weaker person hinders the achievement of a goal. There can also exist different emotions which do not differ in the core affects but in the cognitive representation of the circumstances that caused the affects, like for example grief, pity and disappointment.
Obviously, different language communities can define different emotions by labelling different combinations of core affects and causes. For example, "feeling blue" has no exact translation in the German language.

35 Surprise is probably a cognitive state rather than an emotion, but the example nevertheless illustrates the assumed principle of expression blending.

In the view of a dimensional emotion theory, emotions are never mixed. Rather, the "mixed" emotions are nothing but affective states positioned in core affect space somewhere between those that have a label. Dimensional emotion theories do not preclude that emotions are expressed with the face, but since the number of possible emotions is large, complex mapping rules would be required. No automated mapping of core affects to facial actions is assumed. Grief, pity and disappointment may be considered subcases of sadness, but it nevertheless remains clear that a facial expression cannot specify their difference.

5.2.3 The Theory of Communicative Emotion Displays

Another difference between emotion theories resides in their assumptions about the semiotic status of emotions. One camp maintains that emotion expressions are symptoms of inner states which, unless they are suppressed or masked, just leak or even break through. The opposite camp believes that bodily, and especially facial, expressions are used similarly to language, so that they are normally produced only if a person wishes to communicate certain emotions. In one of the investigations supporting the view of communicative facial displays, swimming athletes were observed after they had won a match, and it was found that even though the athletes were definitely happy about their win, they almost never smiled before they stepped onto the winners' stand and thus exposed their face to the public.

5.2.4 Automated Emotion Recognition

Emotion recognition begins with passive sensors which capture data about the user's physical state or behaviour without interpreting the input. The data gathered is analogous to the cues humans use to perceive emotions in others. For example, a video camera might capture facial expressions, body posture and gestures, while a microphone might capture emotional cues in speech. Other sensors detect emotional cues by directly measuring physiological data, such as electroencephalography (EEG), skin temperature and skin conductance36. Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. This is done using machine learning techniques that process different modalities (speech recognition, natural language processing, or facial expression detection) and produce either labels (e.g. 'confused') or coordinates in a valence-arousal space; both output styles are sketched below. In the following, the affective computing (AC) approaches to emotion recognition will be overviewed. This review of the state-of-the-art technologies of current affect detection systems is organized with respect to individual modalities or channels (e.g. face, voice, text and physiological signals). Each one of the aforementioned modalities has its own advantages and disadvantages with respect to its use in an emotion recognition system. Some of the factors that are taken into account include:

1. the validation of the modality as a natural way to identify an affective state;
2. the usefulness of the modality in real-world scenarios;
3. the time resolution of the modality with respect to the specific needs of each application;
4. the cost and the intrusiveness to the user.

Each modality has its own space in the literature, and an extensive review of them would be beyond the scope of the current deliverable. Hence, a broad overview of the research in each modality, focused on the major accomplishments, is provided.
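The following minimal sketch illustrates the two output styles mentioned above (categorical labels versus valence-arousal coordinates), assuming scikit-learn. The feature vectors and targets are synthetic placeholders; in practice they would come from a modality-specific front end such as facial geometry or speech prosody features.

```python
# Sketch of the two affect-output styles, using scikit-learn.
# Features and targets are synthetic; no specific system is implied.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))              # 200 samples, 12 features (synthetic)
labels = rng.choice(["neutral", "happy", "confused"], size=200)
va = rng.uniform(-1, 1, size=(200, 2))      # valence and arousal in [-1, 1]

# Categorical output: a classifier producing discrete affect labels.
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:1]))                   # e.g. ['happy']

# Dimensional output: a regressor producing valence-arousal coordinates.
reg = LinearRegression().fit(X, va)
print(reg.predict(X[:1]))                   # e.g. [[ 0.12 -0.41]]
```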

5.2.4.1 Emotion Recognition through Video Analysis

5.2.4.1.1 General

In 1992, a 3-day Planning Workshop on Facial Expression Understanding was held in Washington D.C., supported by the U.S. National Science Foundation37. Delegates from several disciplines jointly concluded that many basic science questions on how to interpret the messages of the face, how messages function in communication, and how they are produced and interpreted by the neuromuscular system remain open. Answering these questions is a prerequisite for developing and applying automated measurements fully, but answering these questions would, in turn, be expedited by automated measurement tools. Some of the anticipated applications after solving the above questions were located in the medical area. The workshop report stated that many disorders in medicine, particularly in neurology and psychiatry, involve aberrations in the expression, perception, or interpretation of facial action. Coding of facial action is thus necessary to assess the effect of the primary disorder. The greatest part of the two decades that have since passed has been devoted to academic research and demonstrations.

For video-based emotion monitoring there are four problem areas in which solutions need to be found, namely (i) detection and segmentation, (ii) tracking, (iii) parameter estimation and (iv) interpretation. The first challenge, segmentation, is to find human bodies, body parts and body features in an image and to tell them apart from the rest of the image content. Numerous approaches to achieve this have been pursued, but necessarily the common denominator of the developed methods is that they attend to characteristics which distinguish a human figure from other objects. Such characteristics can be found in the motion, the appearance (colours) and the shape of a human, or in the depth range occupied by the body38. Another challenge is to track the found objects, i.e. to find out where a previously segmented image part has moved after some time, for example after the time of one image frame. Furthermore, what happens to the tracked entity must be measured, i.e. the changes must be described through a suitable set of parameters, and finally the meaning of the measured parameter values in terms of emotions must be found. The distinction between the problem areas is somewhat artificial though, because, for example, sometimes motion (problem ii) that is common between picture elements is used to distinguish an object from its background (problem i). Also, the parameters (problem iii) of a geometrical model can be readjusted in every image to fit a body that has moved or changed its form, and thus the body is tracked (problem ii). In that case tracking and parameter estimation are achieved simultaneously, and generally the successive algorithmic stages can be interlocked in feedback loops. It might be mentioned that there have been approaches in which human behaviours are detected through observing the activity in a spatio-temporal frequency space, i.e. after applying some kind of Fourier or wavelet transformation to the image data. However, the applications which are now gradually appearing on the market seem to operate in the space-time domain. The first three problems in the processing hierarchy sketched above are mainly technical ones, even if they are motivated by the fourth, which is a psychological one.

37 http://face-and-emotion.com/dataface/nsfrept/nsf_contents.html
38 For an overview see: Thomas B. Moeslund, Adrian Hilton, Volker Krüger. A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding (CVIU), 104(2-3) (2006), pp. 90-126.
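To make stages (i) and (ii) of the processing hierarchy concrete, the following is a minimal sketch for the face, assuming OpenCV's Python bindings and the Haar cascade file bundled with them. Re-detecting in every frame serves here as a crude stand-in for a real tracker; the displacement printed at the end is a trivial example of a measured parameter (stage iii).

```python
# Sketch of face detection (problem i) with per-frame re-detection used as
# a crude form of tracking (problem ii). Assumes OpenCV's bundled cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # default webcam
prev_centre = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        x, y, w, h = faces[0]          # assume a single monitored person
        centre = (x + w // 2, y + h // 2)
        if prev_centre is not None:
            # Frame-to-frame displacement: a trivial example of the motion
            # parameters that stages (iii) and (iv) would operate on.
            print("face moved by",
                  (centre[0] - prev_centre[0], centre[1] - prev_centre[1]))
        prev_centre = centre
cap.release()
```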

44 D2.4 State of the Art It seems natural that engineers and computer scientists tended to concentrate on the technological problems and trusted the psychologists, ethologists, neuroscientists etc. with providing the semantics for human movements. To achieve progress at first it was necessary to decouple the mentioned three technical problems. For example the segmentation problem would be set aside by analysing images depicting single persons in front of an unstructured background, or the tracking problem would be avoided by analysing poses and face configurations in still images. In the case that videos were processed the algorithms ran offline on stored image sequences. Given the limited processing power of the computers in the last millennium there was little choice. However, the interest in real-time expression analysis was strengthened when Rosalind Picard published her book on Affective Computing in 1997. Affective Computing she defined as "computing that relates to, arises from, or deliberately influences emotions", and such a type of computing implies that information about human emotions is immediately available when they occur. 5.2.4.1.2 Facial Expression Inspired by the expressional nature of emotions, the majority on emotional recognition research has focused on detecting the basic emotions from the face. This way of research supports that there is a distinctive facial expression associated with each one of the basic emotions39. The expression is triggered for a short period of time, when emotion is experienced, thus detecting the emotional state of the subjects needs just to detect his prototypical facial expression. Ekman and Friesen40 have developed the Facial Action Coding System (FACS) to measure the facial activity using some objective characteristic facial motions as a method for the recognition of the emotions. They noticed a set of action units that identify independent motions of the face. Then trained human coders decompose an expression into a set of action units. This technique has become the leading methodology for objectively classifying facial expressions in behavioural sciences, which are then linked to the six basic emotions: anger, disgust, fear, joy, sadness and surprise41. In the same sense, the manual coding of a video is an expensive task, since it requires professionally trained coders who spend 1 hour for each minute of video42. As it is expected, various researchers have tried to automate this process; some of their work is listed on the next table43: Table 5: Summary of Research Projects attempting to automate video coding Study Modality Classifier Model Evaluation Stimulus (El Kaliouby & Face DBN (real time) 6 3rd : 10 Acting 39 P. Ekman and W.V. Friesen, Unmasking the Face. Malor Books, 2003; P. Ekman, An Argument for Basic Emotions, Cognition and Emotion, vol. 6, pp. 169-200, 1992.; P. Ekman, Expression and the Nature of Emotion, Approaches to Emotion, K. Scherer and P. Ekman, eds., pp. 319-344, Erlbaum, 1984 40 P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement: Investigators Guide 2 Parts. Consulting Psychologists Press, 1978 41 P. Ekman, An Argument for Basic Emotions, Cognition and Emotion, vol. 6, pp. 169-200, 1992 42 G. Donato, M.S. Bartlett, J.C. Hager, P. Ekman, and T.J. Sejnowski, Classifying Facial Actions, IEEE Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 974-989, Oct. 1999 43 Calvo, Rafael A.; Sidney DMello (2010). 
"Affect Detection: An Interdisciplinary Review of Models, Methods, and their Applications" (PDF). IEEE Transactions on Affective Computing 1(1): 18-37. Retrieved 2011-11-16.

Table 5: Summary of research projects attempting to automate video coding

Study | Modality | Classifier | Model | Evaluation | Stimulus
(El Kaliouby & Robinson, 2004)44 | Face expressions and head movement | DBN (real time) | 6 categories | 3rd: 10 annotators | Acting
(Bartlett et al., 2006)45 | Face | Gabor wavelets / SVM, AdaBoost, LDA | 20 AU | 3rd: (2 FACS coders) | Self (false-option + other)
(Pantic & Patras, 2006)46 | Face | Temporal rules | 27 AU | 3rd: (2 FACS coders) | Self
(Gunes & Piccardi, 2007)47 | Face, body | C4.5, BN (fusion) | 6 categories | 1st | Self
(McDaniel et al., 2007)48 | Face | 33 AU/DA | 6 categories | 1st + 3rd (2 FACS coders) | ITS interactions

Despite the problems faced, important progress is being made in the development of fully automated face-based emotion detection. Robinson and his colleagues have done considerable work in this area4950. Moreover, other projects in the area of intelligent tutoring systems (ITS) have used facial expression recognition to improve the interaction between students and e-learning systems5152. For example, Arroyo et al. (2009) showed how students' facial expressions, along with physiological data, can predict students' emotions.

44 R. El Kaliouby and P. Robinson, Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures, Proc. Int'l Conf. Computer Vision and Pattern Recognition, vol. 3, p. 154, 2004.
45 M.S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, and J. Movellan, Fully Automatic Facial Action Recognition in Spontaneous Behaviour, Proc. Int'l Conf. Automatic Face and Gesture Recognition, pp. 223-230, 2006.
46 M. Pantic and I. Patras, Dynamics of Facial Expression: Recognition of Facial Actions and Their Temporal Segments from Face Profile Image Sequences, IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 433-449, Apr. 2006.
47 H. Gunes and M. Piccardi, Bi-Modal Emotion Recognition from Expressive Face and Body Gestures, J. Network and Computer Applications, vol. 30, pp. 1334-1345, 2007.
48 B. McDaniel, S. D'Mello, B. King, P. Chipman, K. Tapp, and A. Graesser, Facial Features for Affective State Detection in Learning Environments, Proc. 29th Ann. Meeting of the Cognitive Science Soc., 2007
49 R. El Kaliouby and P. Robinson, Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures, Proc. Int'l Conf. Computer Vision and Pattern Recognition, vol. 3, p. 154, 2004.
50 R. El Kaliouby and P. Robinson, Generalization of a Vision-Based Computational Model of Mind-Reading, Proc. First Int'l Conf. Affective Computing and Intelligent Interaction, pp. 582-589, 2005
51 I. Arroyo, D.G. Cooper, W. Burleson, B.P. Woolf, K. Muldner, and R. Christopherson, Emotion Sensors Go to School, Proc. 14th Conf. Artificial Intelligence in Education, pp. 17-24, 2009.
52 B. McDaniel, S. D'Mello, B. King, P. Chipman, K. Tapp, and A. Graesser, Facial Features for Affective State Detection in Learning Environments, Proc. 29th Ann. Meeting of the Cognitive Science Soc., 2007

Zeng et al. (2009)53 reviewed 29 state-of-the-art vision-based emotion recognition methods. A meta-analysis of them provides some important insights into current systems. The most crucial insight is that only 6 of the 29 systems could operate in real-time scenarios, which is a crucial requirement for practical applications. Moreover, half of the aforementioned systems relied on datasets with posed facial expressions, which is a further drawback for real-time applications. Finally, almost all the systems were concerned with detecting only the six basic emotions, irrespective of whether these emotions are relevant to AC applications, and they require pre-segmented emotion expressions instead of naturalistic video sequences.

Only very recently have the available techniques and the processing power of microcomputers reached a state in which applications are leaving the labs. This applies only to facial expression analysis though, because similar applications for monitoring bodily expressions are not yet available. At least four solutions with almost identical features are offered in Europe, even though the algorithms employed might be quite different. All four solutions are able to distinguish one or more faces in live video and require only a webcam to achieve this. They continuously display measured values between 0% and 100% for six basic emotions. These values might either be interpreted as the intensity of the respective emotion in a mixture or as the estimated probability that this emotion is present. The striking similarity of the applications is due to the fact that they all employ Ekman's theory of basic emotions, even though Klaus Scherer from Geneva, another prominent figure in emotion research, seems to have participated in the development of SHORE (see below). Ekman's Facial Action Coding System (FACS) for the parameterization of facial activity in terms of so-called Action Units is also used throughout. Another commonality is that the only serious use that has been made of these applications so far is in market research. A likely reason is that manufacturers are mainly interested in finding out whether people are happy with their products. Because the current vision-based emotion monitoring systems are fairly robust smile detectors, but may be less accurate in measuring negative emotions, their use is more or less restricted to the marketing sector (a minimal sketch of such a smile detector is given below).

eMotion54 is a software product of the Dutch company Visual Recognition, a spin-off from the University of Amsterdam. A face is detected by matching a face template. Then a 3D mesh is fitted onto the detected face, and the changes in that mesh are first categorized as Action Units and then interpreted as emotion expressions.

53 Z. Zeng, M. Pantic, G.I. Roisman, and T.S. Huang, A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58, Jan. 2009
54 http://www.visual-recognition.nl/eMotion.html
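To illustrate the "robust smile detector" observation above, the following is a minimal sketch of a webcam smile-o-meter built from OpenCV's stock Haar cascades (which ship with the library). It is not the pipeline of any of the products discussed in this section; the detector parameters and the simple happy/neutral rule are illustrative assumptions.

import cv2

# Stock cascades bundled with the opencv-python package
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = gray[y:y + h, x:x + w]
        # A high minNeighbors value makes the smile detector conservative
        smiles = smile_cascade.detectMultiScale(roi, 1.7, 20)
        label = "happy" if len(smiles) > 0 else "neutral"
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("smile-o-meter", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()

The commercial systems described here of course go far beyond this binary rule, regressing continuous intensities for several emotion categories, but the face-detection front-end is conceptually the same.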

Figure 25: eMotion in face detection mode (left) and in tracking mode (right)

eMotion has been used to study the reaction to different types of food: "Some 300 women in six European countries were filmed as they ate five foods: vanilla ice cream, chocolate, cereal bars, yogurt and apples. Not surprisingly, ice cream and chocolate produced the happiest expressions across the Old Continent."55

Affetivo is the name of a software product offered by the Swiss company nViso56, a spin-off from the Federal Institute of Technology Lausanne. The employed techniques have been patented57. Affetivo has been employed by the Link Institute for Marketing and Social Studies to evaluate viewers' reactions to an advertising spot for a Swiss insurance company.

Figure 26: The mesh used for tracking in Affetivo (left) and the measured time course of happiness while viewing an advertising spot (right)

Shore58 is a software application developed at the German Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen. In 2010 Unilever installed a smile-o-meter based

55 Cited from http://www.wired.com/science/discoveries/news/2007/07/expression_research
56 http://www.nviso.ch/solutions-for-market-researchers.html
57 Method and System for Measuring Emotional Probabilities of a Facial Image. International Patent Application No. PCT/EP2010/065544
58 http://www.iis.fraunhofer.de/en/bf/bsy/produkte/shore/index.jsp

on Shore in an ice-cream vending machine at Cannes, where customers received free ice-cream for a very big smile, a photograph of which was also uploaded to a social networking site. A set-up with Shore was also placed in the entrance area of the market research trade fair Research & Results 2011, where it was used to count the number of happy visitors. The GfK Group Nuremberg, one of the largest market research enterprises worldwide, conducted feasibility studies with Shore in 2011, after two years of cooperation with the IIS59, in which customer reactions to still images and to advertising spots were measured. Apparently the result was that further development is desirable. All three products have been advertised as tools for market research.

Figure 27: Person watching a TV spot observed by a camera (left), his emotions measured with Shore in real time (middle) and the recorded time course of his emotions (right)

Affdex60 is a commercial technology that measures emotion over the web: it reads emotional states such as liking and attention from facial expressions using a webcam, to give marketers faster, more accurate insight into consumer responses to brands and media. Automated facial expression recognition is a relatively new and quickly evolving technology with its roots in the field of human-computer interaction. Methods for recognizing facial expressions have typically been developed by training the system on participants displaying exaggerated expressions of prototypical emotions, which limited their capabilities and made them unlikely to work in real-world applications. Affdex recognizes anatomical facial muscle movements called action units and characterizes collections of those movements together with larger motions of the head, such as nods or shakes. Bayesian machine learning is used to combine the facial and head movements in order to recognize positive and negative displays of emotion as well as complex states such as interest and confusion.

59 http://www.gfk.com/group/press_information/press_releases/004131/index.en.html
60 http://www.affectiva.com/affdex/

The Technical University Munich (TUM)61 has also developed a real-time emotion monitoring application. It was used by Daimler for a driving-fun study, in which a C-Class model was compared to an older Mercedes 190. Drivers were observed through a camera beside the windscreen while driving on a test track.

Figure 28: Facial expression tracked by the TUM emotion monitoring system in a car (left) and measured time course of happiness (below)

All attempts at vision-based emotion detection are, as a matter of logic, based on a leakage theory of emotions. To conclude, up to now automated emotion detection has invariably been based on discrete emotion theory. However, the ability to detect expressions for the other assumed basic emotions besides happiness has so far been of little utility. It appears that, after the impressive progress in the techniques for detecting and tracking faces in real time, further research on the semantics of facial displays is needed.

5.2.4.1.3 Bodily Expression

For whole-body movements the tracking problem is hampered by additional complexities, because the movements are comprised of kinematic chains with many degrees of freedom and because almost always some parts of the body are occluded by other parts. Depth cameras that have recently been developed based on various principles, like time-of-flight or structured lighting, greatly simplify the problem of distinguishing an object from the background (see the sketch below), and with the availability of low-priced models like Microsoft's Kinect, body tracking has suddenly become feasible. A candidate for a high-level parameterisation of body movements, analogous to the Action Units for faces, that even takes the movement dynamics into account is Labanotation. There are, however, only sparse academic attempts at performing automated Laban Movement Analyses62. Such analyses would only be the first step towards emotion monitoring. Unfortunately, in terms of movement and posture semantics the state of the art is not yet as advanced as that of facial expression research63.

61 http://www9-old.in.tum.de/
62 An example is Zhao L., Badler N.I., Acquiring and validating motion qualities from live limb gestures (2005), Graphical Models, 67(1), pp. 1-16.
63 A most recent contribution can be found in Gross, M., Crane, E., Fredrickson, B., Effort-Shape and kinematic assessment of bodily expression of emotion during gait, Human Movement Science, 31(1), 202-221 (2012)
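To make concrete why depth data simplifies the figure-ground problem described above, here is a minimal sketch assuming a Kinect-style metric depth map as input; the toy values and the 2-metre cut-off are illustrative assumptions, not parameters of any cited system.

import numpy as np

def segment_foreground(depth_m: np.ndarray, max_dist_m: float = 2.0) -> np.ndarray:
    """Return a boolean mask of pixels closer than max_dist_m.

    depth_m: HxW array of metric depth values (0 = no reading, as with Kinect).
    """
    valid = depth_m > 0                      # drop invalid measurements
    return valid & (depth_m < max_dist_m)    # keep everything nearer than the cut-off

# Toy example: a synthetic 4x4 depth map with a "person" at about 1.5 m
depth = np.array([[3.1, 3.0, 3.2, 3.1],
                  [3.0, 1.5, 1.6, 3.1],
                  [3.0, 1.5, 1.5, 3.0],
                  [0.0, 3.0, 3.1, 3.2]])
print(segment_foreground(depth).astype(int))

A single threshold on one channel replaces the appearance models and background statistics that colour-only segmentation needs, which is precisely why low-cost depth sensors made body tracking suddenly practical.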

5.2.4.2 Emotion Recognition through Audio Analysis

Speech carries affective information both through the explicit message (what is said) and through the implicit paralinguistic features of the message (how it is said). The decoding of the paralinguistic features of messages is not yet fully understood, although listeners seem to be able to easily decode the basic emotions using prosody64 and non-linguistic vocalizations (laughs, cries etc.). An extensive analysis of the literature on vocal communication of emotions provides some useful conclusions with important implications for AC applications.6566 The first is that affective information can indeed be encoded in and decoded from speech. The most reliable finding is that pitch appears to be an index of arousal. However, the accuracy rates of detecting emotions from speech are lower than those for facial expressions of the basic emotions. Sadness, anger and fear are among the basic emotions best recognized through voice, while disgust is the worst.

Zeng et al. (2009)67 also reviewed 19 state-of-the-art speech-based emotion recognition systems; a meta-analysis of them gives an indication of the current state of the art in this field. One of the most crucial insights is that speech-based systems are more appropriate for real-world applications, because 16 of the 19 systems were trained on spontaneous speech. Second, although many systems are still focused on detecting the basic emotions, there are some remarkable efforts aiming at the detection of other states like frustration. Finally, the focus on realistic scenarios, such as call centres and tutoring systems, has yielded rich sources of data that will improve the next generation of speech-based emotion detection systems.

5.2.4.3 Emotion Recognition through Biosignal Analysis

Performance is optimal at a critical level of arousal, and the arousal-performance relationship is influenced by task constraints.68 Stimulus-Response (SR) theory holds that, for a specific stimulus, subjects will produce specific physiological response patterns. Ekman and his colleagues69, along with numerous other researchers, found that autonomic nervous system specificity allows certain emotions to be recognized. Individual-Response (IR) specificity integrates SR specificity but with one crucial difference: SR specificity claims that the pattern of a certain response is similar for most people, while IR specificity pertains to how consistent an individual's responses are to different stimulations. Cardio-somatic features refer to the changes in heart responses as the body prepares for a certain behavioural response (e.g. fight or flight). This effect might be the result of physiological changes in situations like the dissonance effect described by Croyle (1983)70. In this study, subjects were stimulated with an argument that was in agreement with their

64 P.N. Juslin and K.R. Scherer, Vocal Expression of Affect, The New Handbook of Methods in Nonverbal Behaviour Research, Oxford Univ. Press, 2005
65 T. Johnstone and K. Scherer, Vocal Communication of Emotion, Handbook of Emotions, pp. 220-235, Guilford Press, 2000
66 J.A. Russell, J.A. Bachorowski, and J.M. Fernandez-Dols, Facial and Vocal Expressions of Emotion, Ann. Rev. of Psychology, vol. 54, pp. 329-349, 2003.
67 Z. Zeng, M. Pantic, G.I. Roisman, and T.S. Huang, A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions, IEEE Trans.
Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58, Jan. 2009
68 E. Duffy, Activation, Handbook of Psychophysiology, pp. 577-622, Holt, Rinehart and Winston, 1972.
69 P. Ekman, R. Levenson, and W. Friesen, Autonomic Nervous System Activity Distinguishes among Emotions, Science, vol. 221, pp. 1208-1210, 1983.
70 R. Croyle and J. Cooper, Dissonance Arousal: Physiological Evidence, J. Personality and Social Psychology, vol. 45, pp. 782-791, 1983.

existing attitudes and with an argument in disagreement with them. The results revealed that the first stimulus produced lower arousal, in the form of electrodermal activity, than the second one. Habituation and rebound are two common effects which appear in sequences of stimuli with the same psychological background. When a certain stimulus is presented repeatedly, the physiological responses decrease (habituation). The rebound effect, on the other hand, refers to the return to pre-stimulus levels after the stimulus's presentation.71

Figure 29: The relationship between performance and arousal (Eysenck & Eysenck, 1985, p. 199)72

Despite the evidence for SR specificity, the accurate recognition of emotions requires models that are personalized, so researchers are now focusing in this direction. In a study by Nasoz et al. (2004)73, different algorithms for mapping physiological signals to certain emotions were compared. More recently, Villon & Lisetti (2006)74 proposed a PsychoPhysiological Emotional Map (PPEM) that creates personalized mappings between biological signals, like heart rate and skin conductance, and the two dimensions of emotions (valence and arousal).

In the following table we summarize some of the studies where physiological signals, including EEG, were used for the recognition of emotions. Some of these studies combined multiple signals, exploiting the advantages of each one for laboratory or real-world applications. Signals include ECG, skin conductivity/electrodermal activity (SC) and skin temperature (ST). Feature selection methods include statistical ones (Chi-square, PSD) and physiological ones (inter-beat interval or IBI, heart rate variability (HRV), and others). Classifiers include Naive Bayes (NB), Function Trees (FT), BNs, Multilayer Perceptron

71 L. Andreassi, Human Behaviour and Physiological Response. Taylor & Francis, 2007.
72 Eysenck, H. J., & Eysenck, M. W. (1985). Personality and individual differences: A natural science approach. New York: Plenum.
73 F. Nasoz, K. Alvarez, C.L. Lisetti, and N. Finkelstein, Emotion Recognition from Physiological Signals Using Wireless Sensors for Presence Technologies, Cognition, Technology and Work, vol. 6, pp. 4-14, 2004.
74 O. Villon and C. Lisetti, A User-Modeling Approach to Build Users' Psycho-Physiological Maps of Emotions Using BioSensors, Proc. IEEE RO-MAN 2006, 15th IEEE Int'l Symp. Robot and Human Interactive Comm., Session Emotional Cues in Human-Robot Interaction, pp. 269-276, 2006.

(MLP), Linear Logistic Regression (LLR), SVMs, Discriminant Function Analysis (DFA), Marquardt Backpropagation (MBP), and C4.5 (Calvo & D'Mello, 2010)75.

Table 6: Summary of studies where physiological signals were used for emotion recognition

Study | Signal | Description | Stimulus and Evaluation
(Picard et al., 2001)76 | ECG, SC, EMG | 8 categories (baseline 12.5%). Best acc: 81.25% using 40 features on dataset II. Fisher Proj. with SFFS/KNN | Self-elicitation
(Wagner et al., 2005)77 | ECG, SC, EMG | 4 categories. Acc: 80-90% using 120 features, then PCA | Self-selected songs
(Kim et al., 2004)78 | ECG, ST, SC | 3 & 4 categories, with acc of 78% for 3 and 62% for 4 using SVM. Features included RR and HRV | Audio-visual, self-evaluation
(Nasoz et al., 2004)79 | SC, HR, ST | 6 categories + intensity of each. KNN (71%), DFA (74%), MBP (83%) for individual emotions. No results reported overall | Movie clips selected by panel
(Villon & Lisetti, 2006)80 | HR + SC | Dimensional model, several signal processing techniques discussed, no classification result | No subjects evaluated

75 Calvo, R.A.; D'Mello, S., "Affect Detection: An Interdisciplinary Review of Models, Methods, and Their Applications," IEEE Transactions on Affective Computing, vol. 1, no. 1, pp. 18-37, Jan. 2010, doi: 10.1109/T-AFFC.2010.1
76 R.W. Picard, E. Vyzas, and J. Healey, Toward Machine Emotional Intelligence: Analysis of Affective Physiological State, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1175-1191, Oct. 2001.
77 J. Wagner, N.J. Kim, and E. Andre, From Physiological Signals to Emotions: Implementing and Comparing Selected Methods for Feature Extraction and Classification, Proc. IEEE Int'l Conf. Multimedia and Expo, pp. 940-943, 2005.
78 K. Kim, S. Bang, and S. Kim, Emotion Recognition System Using Short-Term Monitoring of Physiological Signals, Medical and Biological Eng. and Computing, vol. 42, pp. 419-427, May 2004.
79 F. Nasoz, K. Alvarez, C.L. Lisetti, and N. Finkelstein, Emotion Recognition from Physiological Signals Using Wireless Sensors for Presence Technologies, Cognition, Technology and Work, vol. 6, pp. 4-14, 2004.

(Calvo et al., 2009)81 | ECG, SC, EMG | 8 categories, 68-80% for individual subjects, 42% for all subjects. SVM classifier. 120 features / NB, FT, BN, MLP, LLR, SVM | Self-elicited
(Bailenson et al., 2008)82 | ECG, SC + Face | 2 categories (amusement vs. sadness) + intensity. Chi-square / SVM and logistic regression | Films; 3rd-person evaluation
(Liu et al., 2008)83 | ECG, SC, EMG, ST | 3 categories. 83% acc. using SVM. Aimed at children with autism | 1st- and 3rd-person (parent and therapist) evaluations. Computer-based tasks, Pong game and anagram
(Vyzas & Picard, 1998)84 | EMG, BVP, SC, RESP | 8 categories, 40-46% using SFFS/KNN, FP/MAP, hybrid SFFS/FP | Self-elicited
(Haag et al., 2004)85 | ECG, EMG, SC, ST, BVP, RESP | Dimensional model. Acc. for 10-20% bands: valence (90-97%), arousal (63-90%), using MLP | IAPS

80 O. Villon and C. Lisetti, A User-Modeling Approach to Build Users' Psycho-Physiological Maps of Emotions Using BioSensors, Proc. IEEE RO-MAN 2006, 15th IEEE Int'l Symp. Robot and Human Interactive Comm., Session Emotional Cues in Human-Robot Interaction, pp. 269-276, 2006.
81 R.A. Calvo, I. Brown, and S. Scheding, Effect of Experimental Factors on the Recognition of Affective Mental States through Physiological Measures, Proc. 22nd Australasian Joint Conf. Artificial Intelligence, 2009.
82 J.N. Bailenson, E.D. Pontikakis, I.B. Mauss, J.J. Gross, M.E. Jabon, C.A.C. Hutcherson, C. Nass, and O. John, Real-Time Classification of Evoked Emotions Using Facial Feature Tracking and Physiological Responses, Int'l J. Human-Computer Studies, vol. 66, pp. 303-317, 2008.
83 C. Liu, K. Conn, N. Sarkar, and W. Stone, Physiology-Based Affect Recognition for Computer-Assisted Intervention of Children with Autism Spectrum Disorder, Int'l J. Human-Computer Studies, vol. 66, pp. 662-677, 2008.
84 E. Vyzas and R.W. Picard, Affective Pattern Classification, Proc. AAAI Fall Symp. Series: Emotional and Intelligent: The Tangled Knot of Cognition, pp. 176-182, 1998.
85 A. Haag, S. Goronzy, P. Schaich, and J. Williams, Emotion Recognition Using Bio-Sensors: First Steps towards an Automatic System, Affective Dialogue Systems, pp. 36-48, Springer, 2004.

(AlZoubi et al., 2009)86 | EEG | 10 categories, 33-55% for individual subjects using SVM. Evaluated static and adaptive NB, KNN, SVM | Self-elicited
(Heraz & Frasson, 2007)87 | EEG | 3 dimensions: valence, arousal, dominance. 17 subjects. Evaluated Nearest Neighbours, DT, Bagging | IAPS
(Frantzidis et al., 2010)88 | EEG | 4 categories, acc. 57%-92% depending on gender differences and on the dimensional space of emotions. Several neuroscientific features used. C4.5 classification algorithm | IAPS
(Klados et al., 2009)89 | EEG | A novel model highlighting the role of the delta band in emotional processing. 4 categories. Statistically significant differences in neuroscientific features, no classification result | IAPS

86 O. AlZoubi, R.A. Calvo, and R.H. Stevens, Classification of EEG for Emotion Recognition: An Adaptive Approach, Proc. 22nd Australasian Joint Conf. Artificial Intelligence, pp. 52-61, 2009.
87 A. Heraz and C. Frasson, Predicting the Three Major Dimensions of the Learner's Emotions from Brainwaves, World Academy of Science, Eng. and Technology, vol. 25, pp. 323-329, 2007.
88 Frantzidis, C.; Bratsas, C.; Klados, M.; Konstantinidis, E.; Lithari, C.; Vivas, A.; Papadelis, C.; Kaldoudi, E.; Pappas, C.; Bamidis, P., "On the classification of emotional biosignals evoked while viewing affective pictures: an integrated data mining based approach for healthcare applications", IEEE Transactions on Information Technology in Biomedicine, vol. PP, issue 99, 2010
89 Manousos A. Klados, Christos Frantzidis, Ana B. Vivas, et al., A Framework Combining Delta Event-Related Oscillations (EROs) and Synchronisation Effects (ERD/ERS) to Study Emotional Processing, Computational Intelligence and Neuroscience, vol. 2009, Article ID 549419, 16 pages, 2009.
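Several entries in Table 6 rely on ECG-derived features such as the inter-beat interval (IBI) and heart rate variability (HRV). As a minimal sketch of what such features look like in practice (assuming R-peak timestamps in seconds as input; R-peak detection itself is a separate preprocessing step):

import numpy as np

def hrv_features(r_peak_times_s: np.ndarray) -> dict:
    ibi = np.diff(r_peak_times_s)                # inter-beat intervals, in seconds
    sdnn = ibi.std(ddof=1)                       # overall variability of the IBIs
    rmssd = np.sqrt(np.mean(np.diff(ibi) ** 2))  # short-term beat-to-beat variability
    return {
        "mean_hr_bpm": 60.0 / ibi.mean(),        # mean heart rate
        "mean_ibi_s": ibi.mean(),
        "sdnn_s": sdnn,
        "rmssd_s": rmssd,
    }

# Toy example: a slightly irregular rhythm of roughly 75 beats per minute
peaks = np.cumsum([0.80, 0.78, 0.82, 0.79, 0.81, 0.80])
print(hrv_features(peaks))

Feature vectors of this kind, together with the statistical features listed above, are what the cited classifiers (KNN, SVM, MLP etc.) are trained on.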

5.2.4.4 Emotion Recognition through Multi-Modal Analysis

It is known that multiple physiological and behavioural responses occur during an emotional episode. For example, anger is expected to trigger various responses, such as particular facial, vocal and body expressions, and changes in human physiology like increased heart rate. Responses from multiple channels that are bound in space and time during an emotional experience are essential for emotion theory.90 Although multi-modal emotion recognition systems are widely advocated, they are rarely implemented91 because of the technical difficulties, which are increased in multisensory environments. Nevertheless, the advantages of multi-modal human-computer interaction systems have been recognized, and they are seen as the next-generation approach.92

There are three methodologies for fusing signals from different sources/sensors, each depending on when the information from the different channels is combined93:

Data fusion is performed on the raw signals exported from the sensors, and it can only be applied when these signals have the same temporal resolution. It is not commonly used because of its sensitivity to noise produced by the malfunction or misalignment of the different sensors.

Feature fusion is performed on feature sets extracted from each channel. This is the approach most used in multi-modal HCI, and it has been used in AC as well (for example in the Augsburg Biosignal Toolbox94). As features of each signal (EEG, ECG, SCR, etc.) we can consider the mean, median, standard deviation, maximum and minimum, along with some features unique to each sensor (for example P300 amplitude for EEG, HRV for ECG etc.). These are individually computed for each sensor and then combined across the sensors.

Decision fusion is performed by combining the outputs of the classifiers for each signal. Thus, the emotional states are first classified from each sensor separately and then integrated to obtain a global view across the various sensors. This is the most common approach used in multi-modal HCI.9596

In fact, very few systems have achieved emotion recognition using multiple modalities, or in other words sensor fusion. These modalities include both physiological signals, like brain or heart activity, and combinations of features extracted by face recognition and speech processing techniques97. It is therefore useful to present here the difference in discrimination performance between multi-modal and single-channel systems (a schematic sketch of the feature- and decision-fusion strategies is given after the references below).

90 P. Ekman, An Argument for Basic Emotions, Cognition and Emotion, vol. 6, pp. 169-200, 1992.
91 A. Jaimes and N. Sebe, Multimodal Human-Computer Interaction: A Survey, Computer Vision and Image Understanding, vol. 108, pp. 116-134, 2007.
92 Ludmila I. Kuncheva, Thomas Christy, Iestyn Pierce and Saad P. Mansoor, Multi-modal Biometric Emotion Recognition Using Classifier Ensembles, Lecture Notes in Computer Science, 2011, Volume 6703/2011, 317-326, DOI: 10.1007/978-3-642-21822-4_32
93 R. Sharma, V.I. Pavlovic, and T.S. Huang, Toward Multimodal Human-Computer Interface, Proc. IEEE, vol. 86, no. 5, pp. 853-869, May 1998.
94 J. Wagner, N.J. Kim, and E. Andre, From Physiological Signals to Emotions: Implementing and Comparing Selected Methods for Feature Extraction and Classification, Proc. IEEE Int'l Conf. Multimedia and Expo, pp. 940-943, 2005.
95 M. Pantic and L. Rothkrantz, Toward an Affect-Sensitive Multimodal Human-Computer Interaction, Proc. IEEE, vol. 91, no. 9, pp. 1370-1390, Sept. 2003
96 R. Sharma, V.I. Pavlovic, and T.S. Huang, Toward Multimodal Human-Computer Interface, Proc. IEEE, vol. 86, no. 5, pp. 853-869, May 1998.
97 A) R.W. Picard, E. Vyzas, and J. Healey, Toward Machine Emotional Intelligence: Analysis of Affective Physiological State, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1175-1191, Oct. 2001. B) Z. Zeng, Y. Hu, G. Roisman, Z. Wen, Y. Fu, and T. Huang, Audio-Visual Emotion Recognition in Adult Attachment Interview, Proc. Int'l Conf. Multimodal Interfaces, J.Y.F.K.H. Quek, D.W. Massaro, A.A. Alwan, and T.J. Hazen, eds., pp. 139-145, 2006. C) Y. Yoshitomi, K. Sung-Ill, T. Kawano, and T.
Kitazoe, Effect of Sensor Fusion for Recognition of Emotional States Using Voice, Face Image and Thermal Image of Face, Proc. Int'l Workshop Robot and Human Interactive Comm., pp. 178-183, 2000. D) L. Chen, T. Huang, T. Miyasato, and R. Nakatsu, Multimodal Human Emotion/Expression Recognition, Proc. Third IEEE Int'l Conf. Automatic Face and Gesture Recognition, pp. 366-371, 1998. E) B. Dasarathy, Sensor Fusion Potential Exploitation: Innovative Architectures and Illustrative Approaches, Proc. IEEE, vol. 85, no. 1, pp. 24-38, Jan. 1997. F) G. Caridakis, L. Malatesta, L. Kessous, N. Amir, A. Raouzaiou, and K. Karpouzis, Modeling Naturalistic Affective States via Facial and Vocal Expression Recognition, Proc. Int'l Conf. Multimodal Interfaces, pp. 146-154, 2006
98 K. Scherer and H. Ellgring, Multimodal Expression of Emotion: Affect Programs or Componential Appraisal Patterns?, Emotion, vol. 7, pp. 158-171, 2007
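The following schematic contrasts the feature-level and decision-level fusion strategies described above, using scikit-learn on synthetic data. The two "channels" (stand-ins for, e.g., EEG-derived and facial-geometry features), their dimensionalities and the probability-averaging rule are assumptions made for the example only.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
eeg = rng.normal(size=(n, 8))      # channel 1: e.g., EEG-derived features
face = rng.normal(size=(n, 5))     # channel 2: e.g., facial-geometry features
y = rng.integers(0, 2, size=n)     # binary emotional state (toy labels)

# Feature fusion: concatenate per-channel feature vectors, train one classifier
fused = np.hstack([eeg, face])
feature_fusion_clf = SVC(probability=True).fit(fused, y)

# Decision fusion: one classifier per channel, then combine their outputs
# (here simply by averaging the predicted class probabilities)
clf_eeg = LogisticRegression().fit(eeg, y)
clf_face = LogisticRegression().fit(face, y)
p = (clf_eeg.predict_proba(eeg) + clf_face.predict_proba(face)) / 2.0
decision_fusion_pred = p.argmax(axis=1)

Data fusion, the third strategy, would instead merge the raw synchronized signals before any feature extraction, which is why it requires matching temporal resolutions.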

Scherer & Ellgring (2007)98 combined facial, vocal and body movements to discriminate among 14 emotions (e.g. hot anger, shame etc.). Single-channel classification rates from 21 facial features and 16 acoustic parameters were 52.2% and 52.5% respectively, while their combination led to 79% classification accuracy. Similarly, Castellano and his colleagues (2008)99 tried to detect eight emotions (the basic ones plus irritation and despair) by capturing facial features, speech contours and gestures. The classification rates for the single channels of face, gestures and speech were 48.3%, 67.1% and 57.1% respectively, while the multimodal classifier achieved 78.3% accuracy, a 17% improvement over the best single-channel system. It is thus clear that sensor fusion can lead to better discrimination accuracy in detecting human emotions.

Although Scherer's and Castellano's systems demonstrate that there are advantages to multichannel emotion recognition, these systems were trained and validated on acted emotional expressions, while real-world applications require naturalistic emotional expressions. In this sense, we have to mention that there are some important differences between real and posed affective expressions100; it has therefore been proposed that studies on acted expressions should not be generalized to real contexts.

In a more naturalistic setting, Kapoor and Picard (2005)101 developed a probabilistic system to measure a child's interest level using upper and lower facial feature tracking, posture patterns and some contextual information, such as the difficulty level and the state of the game. The fusion of these modalities yielded a recognition accuracy of 86%, which was greater than the accuracy achieved by the single channels (upper face: 67%; lower face: 53%; contextual information: 57%; posture: 82%). Kapoor et al. (2007)102 extended this system by including skin conductance and a pressure-sensitive mouse. The extended system is able to predict self-reported frustration while children are engaged in a problem-solving task; it scored 79% accuracy, a substantial improvement over the 58.3% accuracy baseline. A further analysis of the discrimination ability of the 14 emotion predictors revealed that mouth fidgets, velocity of the head and the ratio of postures were the most useful diagnostic features.

In another recent study, Arroyo and his colleagues (Arroyo et al., 2009)103 tested a combination of context, facial features, seat pressure, skin conductance and mouse pressure to detect the levels of confidence, frustration, excitement and interest of students in naturalistic school settings. Their results show that two parameters (face + context) explained 52%, 29% and 69% of the variance for the confident, interested and excited states respectively. The combination of context and seat pressure led to the most accurate model for predicting frustration, explaining 46% of the variance.

99 G. Castellano, L. Kessous, and G. Caridakis, Emotion Recognition through Multiple Modalities: Face, Body Gesture, Speech, Affect and Emotion in Human-Computer Interaction, pp. 92-103, Springer, 2008.
100 A) M. Pantic and I. Patras, Dynamics of Facial Expression: Recognition of Facial Actions and Their Temporal Segments from Face Profile Image Sequences, IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 433-449, Apr. 2006. B) S. Afzal and P.
Robinson, Natural Affect Data: Collection & Annotation in a Learning Context, Proc. Third Int'l Conf. Affective Computing and Intelligent Interaction and Workshops, pp. 1-7, 2009. C) J. Cohn and K. Schmidt, The Timing of Facial Motion in Posed and Spontaneous Smiles, Int'l J. Wavelets, Multiresolution and Information Processing, vol. 2, pp. 1-12, 2004. D) P. Ekman, W. Friesen, and R. Davidson, The Duchenne Smile: Emotional Expression and Brain Physiology II, J. Personality and Social Psychology, vol. 58, pp. 342-353, 1990
101 A. Kapoor and R.W. Picard, Multimodal Affect Recognition in Learning Environments, Proc. 13th Ann. ACM Int'l Conf. Multimedia, pp. 677-682, 2005.
102 A. Kapoor, B. Burleson, and R. Picard, Automatic Prediction of Frustration, Int'l J. Human-Computer Studies, vol. 65, pp. 724-736, 2007.
103 I. Arroyo, D.G. Cooper, W. Burleson, B.P. Woolf, K. Muldner, and R. Christopherson, Emotion Sensors Go to School, Proc. 14th Conf. Artificial Intelligence in Education, pp. 17-24, 2009.

Summarizing their results, they concluded that in most cases face and context information yielded the best-fitting models, and the other channels did not provide any additional advantages.

5.3 Monitoring Behaviour

In the USEFIL project the development of video processing algorithms for automated human behaviour extraction is planned. Certain behaviours can be indicators of actual or forthcoming health problems. This section therefore starts with a brief summary of the theories about the motivation behind human behaviour. We then give an overview of human behaviour recognition techniques, focusing on learning from labelled data and unlabelled data, as well as on using depth information.

5.3.1 Human behaviour theories

There are several theories that have tried to explain human behaviour, and these have to be considered in human behaviour recognition frameworks. Human behaviour results from people trying to satisfy their needs, and different people have varying needs. Abraham Maslow developed a theory based on the idea that all humans share common needs and that these needs can be arranged in a hierarchy, or ladder, in order of priority104. Lower-level needs have the highest priority. According to Maslow, lower-level needs must be satisfied before higher-level needs become sources of motivation. For example, a person's need for food, water, and shelter (physiological needs) takes priority over social and self-esteem needs. Only when one's basic needs are met can one focus on higher-level needs. Once a lower-level need is satisfied, the next higher-level need becomes the most important to the person. If a lower-level need suddenly becomes unsatisfied, it can take priority over a higher-level need. The lower-order needs, according to Maslow, are physiological (oxygen, water, food, sleep, and relief from pain), safety (protection from possible threats, such as violence, disease, or poverty), and social (the need to be liked and accepted by one's family, immediate friends, and associates). Higher-order needs become important once the lower-order needs are fulfilled; these include esteem and self-accomplishment.

Figure 30: The Maslow hierarchy of needs

104 A.H. Maslow. A theory of human motivation. Psychological Review, 50(4):370-396, 1943.

Most contemporary theories of motivation assume that people initiate and persist at behaviours to the extent that they believe the behaviours will lead to desired outcomes or goals. This premise has led motivation researchers to explore the psychological value people ascribe to goals105, people's expectations about attaining goals106, and the mechanisms that keep people moving toward selected goals107. Self-determination theory (SDT)108 has also differentiated the concept of goal-directed behaviour, yet it has taken a very different approach. SDT differentiates the content of goals or outcomes and the regulatory processes through which the outcomes are pursued, making predictions for different contents and for different processes. Further, it uses the concept of innate psychological needs as the basis for integrating the differentiations of goal contents and regulatory processes and the predictions that result from those differentiations. Specifically, according to SDT, a critical issue in the effects of goal pursuit and attainment concerns the degree to which people are able to satisfy their basic psychological needs as they pursue and attain their valued outcomes. SDT has maintained that a full understanding not only of goal-directed behaviour, but also of psychological development and well-being, cannot be achieved without addressing the needs that define the goals and that influence which regulatory processes direct people's goal pursuits. Specifically, in SDT, three psychological needs (competence, relatedness, and autonomy) are considered essential for understanding the what (i.e., content) and why (i.e., process) of goal pursuits. An exhaustive survey of human behaviour theories can be found in109.

5.3.2 Techniques for human behaviour recognition

5.3.2.1 General

Human behaviour recognition has been the focus of interest of the computer vision and machine learning communities for years, mostly in the form of isolated activities and not as part of a continuous process. Especially for assistive environments, being able to recognize and analyse human daily activities (e.g., going to bed, mopping the floor, eating a meal) in a low-cost and intelligent way is essential for providing elderly people living alone with appropriate health and medical services. There are several systems that focus on specific domains, e.g., event detection in sports110, retrieving actions in movies111, human gesture recognition using Dynamic Time Warping112 and Time Delay Neural Networks113, and automatic discovery of activities114.

105 Virginia Grow Kasser and Richard M. Ryan. The relation of psychological needs for autonomy and relatedness to vitality, well-being, and mortality in a nursing home. Journal of Applied Social Psychology, 29(5):935-954, 1999.
106 A. Bandura. Human agency in social cognitive theory. American Psychologist, 44(9):1175-1184, 1989.
107 C.S. Carver and M.F. Scheier. On the self-regulation of behaviour. Cambridge University Press, 1998.
108 E. L. Deci and R. M. Ryan. Intrinsic motivation and self-determination in human behaviour. Plenum, 1985.
109 Edward L. Deci and Richard M. Ryan. The what and why of goal pursuits: Human needs and the self-determination of behaviour. Psychological Inquiry, 11(4):227-268, 2000.
110 A) D. A. Sadlier and N. E. O'Connor. Event detection in field sports video using audio-visual features and a support vector machine. Circuits and Systems for Video Technology, IEEE Transactions on, 15(10):1225-1233, Oct. 2005.
B) Mao-Hsiung Hung and Chaur-Heh Hsieh. Event detection of broadcast baseball videos. Circuits and Systems for Video Technology, IEEE Transactions on, 18(12):1713-1726, Dec. 2008.
111 I. Laptev and P. Perez. Retrieving actions in movies. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1-8, Oct. 2007.
112 A. F. Bobick and A. D. Wilson. A state-based approach to the representation and recognition of gesture. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(12):1325-1337, December 1997.
113 Ming-Hsuan Yang and N. Ahuja. Extraction and classification of visual motion patterns for hand gesture recognition. In Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on, pages 892-897, June 1998.

Comprehensive literature reviews regarding isolated human action recognition can be found in115. As for behaviour recognition in an assistive environment setting, the employed techniques do not differ significantly from the general approaches. In116, some short-term behaviours like walking, standing, abrupt motion, running etc. are recognized using cameras or other sensors. Fall detection, which is more specific to the senior-person scenario, has attracted the attention of several researchers; a survey, mainly about accelerometers, can be found in117. The work of Thome et al.118 exploits multiple views for detecting falls using a Layered Hidden Markov Model (LHMM). In Qian et al.119, a background subtraction method is adopted to identify humans, and cascading support vector machines are then exploited to categorize different types of human activities. Foroughi et al.120 adopt an elliptical model to estimate the shape of the human body and then exploit support vector machines to categorize these shapes as falls or other human activities. Finally, Doulamis121 proposes a visual fall detection scheme using an iterative motion estimation algorithm which is constrained by time and shape rules. (A minimal sketch of the background-subtraction-plus-classifier pattern shared by several of these systems follows the references below.)

The dire fate of most of the proposed systems has been to remain prototypes deployed in laboratories. In order to develop visual behaviour classification systems that can work in real environments, much more research effort is required towards the resolution of the following problems:

- How can we extract reliable and representative features of tractable dimensionality that will bypass the error-prone detectors and trackers?
- How can we model highly diverse and complex behaviours in a way that is more tolerant to noise and outliers?
- How can we exploit new sensors that provide a rich set of 3D points and improve accuracy?
- How can we efficiently build reliable behaviour models without having to annotate large amounts of data?
- How can we incorporate knowledge about one's motivational factors in a behaviour recognition system?

114 R. Hamid, S. Maddi, A. Bobick, and M. Essa. Structure from statistics - unsupervised activity analysis using suffix trees. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1-8, Oct. 2007.
115 A) Ronald Poppe. A survey on vision-based human action recognition. Image and Vision Computing, 28(6):976-990, 2010. B) Weiming Hu, Tieniu Tan, Liang Wang, and S. Maybank. A survey on visual surveillance of object motion and behaviours. Systems, Man and Cybernetics, Part C, IEEE Transactions on, 34(3):334-352, 2004.
116 A) Dimitrios I. Kosmopoulos. Behaviour monitoring for assistive environments using multiple views. Universal Access in the Information Society, 10(2):115-123, 2011. B) O. Brdiczka, M. Langet, J. Maisonnasse, and J.L. Crowley. Detecting human behaviour models from multimodal observation in a smart home. Automation Science and Engineering, IEEE Transactions on, 6(4):588-597, Oct. 2009.
117 James T. Perry, Scott Kellog, Sundar M. Vaidya, Jong-Hoon Youn, Hesham Ali, and Hamid Sharif. Survey and evaluation of real-time fall detection approaches. In Proceedings of the 6th international conference on High capacity optical networks and enabling technologies, HONET '09, pages 158-164, Piscataway, NJ, USA, 2009. IEEE Press.
118 N. Thome, S. Miguet, and S. Ambellouis. A real-time, multiview fall detection system: An LHMM-based approach. Circuits and Systems for Video Technology, IEEE Transactions on, 18(11):1522-1532, Nov. 2008.
119 Huimin Qian, Yaobin Mao, Wenbo Xiang, and Zhiquan Wang.
Home environment fall detection system based on a cascaded multi-SVM classifier. In Control, Automation, Robotics and Vision, 2008. ICARCV 2008. 10th International Conference on, pages 1567-1572, Dec. 2008.
120 H. Foroughi, A. Rezvanian, and A. Paziraee. Robust fall detection using human shape and multi-class support vector machine. In Computer Vision, Graphics & Image Processing, 2008. ICVGIP '08. Sixth Indian Conference on, pages 413-420, Dec. 2008
121 Nikolaos Doulamis. Iterative motion estimation constrained by time and shape for detecting persons' falls. In Proceedings of the 3rd International Conference on PErvasive Technologies Related to Assistive Environments, PETRA '10, pages 62:1-62:8, New York, NY, USA, 2010. ACM.
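Several of the fall-detection pipelines cited above follow a background-subtraction-plus-classifier pattern: segment the moving person, fit a simple shape model, and classify the resulting silhouette. The sketch below uses OpenCV's MOG2 background subtractor and a bounding-box aspect ratio as a crude upright-versus-lying feature; the threshold of 1.0 and the parameter values are illustrative assumptions, not values from any of the cited papers (which use elliptical models and SVMs rather than this hand-set rule).

import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)

def posture_from_frame(frame) -> str:
    fg = subtractor.apply(frame)                       # foreground mask
    fg = cv2.medianBlur(fg, 5)                         # suppress speckle noise
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return "no person"
    person = max(contours, key=cv2.contourArea)        # largest moving blob
    x, y, w, h = cv2.boundingRect(person)
    # Tall, narrow silhouette -> standing; wide, flat silhouette -> lying
    return "upright" if h / float(w) > 1.0 else "lying (possible fall)"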

5.3.2.2 Representation of raw data using feature vectors

One of the key challenges that real-time action recognition systems are confronted with concerns the selection of appropriate features for representing the observed raw data. The ideal features should describe different actions accurately, with high discrimination capability, and should be efficiently calculable. Ideally, these features should also provide a hierarchical (coarse-to-fine) representation scheme, so that a desirable, application-specific trade-off between representation capability and computational complexity can be reached. In the following, some popular features and their applicability to behaviour recognition tasks are discussed.

The employment of features directly extracted from the video frames has the significant advantage of obviating the need to detect and track the salient scene objects, a process which is notoriously difficult in cases of occlusions, target deformations, illumination changes etc. Thus, by using such an approach, the intermediate levels of semantic complexity, as met in typical bottom-up systems, are completely bypassed. For this purpose, either local or holistic features (or both122) may be used. An advantage of local descriptors is that their computation does not require static cameras (or a registration process to make the captured video frames comparable); however, real-life installations usually employ static cameras, rendering this advantage of local descriptors rather immaterial in the examined context. On the other hand, a major disadvantage of local descriptors is the significant computational burden required for their calculation. A further drawback of local descriptors is that, despite their suitability for extracting the motion patterns of tracked objects within certain regions (e.g., Willems et al., 2008 and Shechtman, 2007)123, they are not suitable for capturing the shape of the moving objects. Holistic features remedy these drawbacks of local features, while also requiring a much less tedious computational procedure for their extraction. Motion history images and motion energy images are among the first holistic representation methods for behaviour recognition124 (a minimal sketch of the motion history image idea is given after the references below). A very positive attribute of such representations is that they can easily capture the history of a task that is being executed. In one study125, it was shown that pixel change history (PCH) images are able to capture relevant duration information with better discrimination performance.

122 X. H. Sun, M. Y. Chen, and A. G. Hauptmann. Action recognition via local descriptors and holistic features. In IEEE Conference on Computer Vision and Pattern Recognition, pages 58-65, 2009.
123 A) Geert Willems, Tinne Tuytelaars, and Luc Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. In ECCV '08: Proceedings of the 10th European Conference on Computer Vision, pages 650-663, Berlin, Heidelberg, 2008. Springer-Verlag. B) Eli Shechtman and Michal Irani.
Space-time behaviour-based correlation - OR - how to tell if two underlying motion fields are similar without computing them? IEEE Trans. Pattern Anal. Mach. Intell., 29(11):2045-2056, 2007.
124 James W. Davis and Aaron F. Bobick. The representation and recognition of action using temporal templates. In IEEE Conference on Computer Vision and Pattern Recognition, pages 928-934, 1997.
125 Tao Xiang and Shaogang Gong. Beyond tracking: modelling activity and understanding behaviour. International Journal of Computer Vision, 67:21-51, 2006.
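As an illustration of the motion history image (MHI) idea mentioned above, the following minimal numpy/OpenCV sketch keeps, at every pixel, a counter of how recently motion occurred there, so that a single image encodes the history of the movement. The decay length, the difference threshold and the input file name are illustrative assumptions.

import cv2
import numpy as np

TAU = 15      # how many frames of history the MHI retains
DELTA = 30    # frame-difference threshold marking "motion" at a pixel

def update_mhi(mhi, prev_gray, gray):
    motion = cv2.absdiff(gray, prev_gray) > DELTA   # where did anything move?
    mhi = np.maximum(mhi - 1, 0)                    # older motion fades out
    mhi[motion] = TAU                               # fresh motion gets the full value
    return mhi

cap = cv2.VideoCapture("sequence.avi")              # assumed input video
ok, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
mhi = np.zeros(prev.shape, dtype=np.int32)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mhi = update_mhi(mhi, prev, gray)
    prev = gray
cap.release()
# Scaled to 0..255, the final MHI can serve directly as a holistic feature map
mhi_img = (255 * mhi / TAU).astype(np.uint8)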

5.3.2.3 Learning activities from labelled data

One of the key functionalities of any machine learning model (classifier) suitable for application in visual behaviour understanding is the ability to extract the signature of a behaviour from the captured visual input. The key requirements when designing such a classifier are (a) to support task execution at various time scales, since a task or parts of it may have variable duration; and (b) to support stochastic processes, because of the intra-class variability of tasks and noise.

A very flexible framework for the stochastic classification of time series is the HMM (see e.g., Rabiner, 1989126). It can easily be extended to handle outliers (see e.g., Chatzis et al., 2009127) and to fuse multiple streams (e.g., Zeng et al., 2008128). It is very efficient for application to previously segmented sequences129 (a minimal sketch of this classification recipe is given after the references below); however, when the boundaries of the sequence we aim to classify are not known in advance, the search space of all possible beginning and end points makes the search very inefficient130. A typical way to treat this problem is given in Lv and Nevatia (2006)131, where a dynamic programming algorithm of cost T^3 is used to perform segmentation and then classify the segments; however, the cost is restrictive in real applications.

The behaviour recognition problem can become easier if we exploit context information, e.g., knowledge about the motivation of the monitored persons or information about the visual process. In the past there have been some efforts to exploit the hierarchical structure of some time series, e.g., by using hierarchical HMMs132, in which each state is considered to be a self-contained probabilistic model (an HHMM). Examples of such approaches can be found in the literature133, where the workflow in a hospital operating room is described. Another approach is the layered hidden Markov model (LHMM)134, which consists of N levels of HMMs, where the HMMs on level N+1 correspond to observation symbols or probability generators at level N. Every level i of the LHMM consists of K_i HMMs running in parallel. In that work an LHMM is used for event identification in meetings. In Xiang and Gong's paper135, structure learning in HMMs is addressed in order to obtain temporal dependencies between high-level events for video segmentation; an HMM models the simultaneous output of event classifiers to filter out wrong detections.

In many workflow-based activities, such as industrial production or breakfast/meal preparation at home, where a sequence of different tasks has to be completed, the execution of a task means that it will not appear again in the same workflow. Therefore the whole history of tasks must be kept in memory to exclude false positives, and the Markovian property is obviously not applicable. Thus, the above approaches have an inherent problem in describing such workflows.

126 L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.
127 Sotirios P. Chatzis, Dimitrios I. Kosmopoulos, and Theodora A. Varvarigou. Robust sequential data modeling using an outlier tolerant hidden Markov model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9):1657-1669, 2009.
128 Z. H. Zeng, J. L. Tu, B. M. Pianfetti, and T. S. Huang. Audio-visual affective expression recognition through multistream fused HMM. IEEE Trans. Multimedia, 10(4):570-577, June 2008.
129 A) D. Kosmopoulos and S. P. Chatzis. Robust visual behaviour recognition. Signal Processing Magazine, IEEE, 27(5):34-45, September 2010. B) Athanasios Voulodimos, Helmut Grabner, Dimitrios I. Kosmopoulos, Luc J. Van Gool, and Theodora A. Varvarigou. Robust workflow recognition using holistic features and outlier-tolerant fused hidden Markov models. In ICANN (1), pages 551-560, 2010.
130 Stefan Eickeler, Andreas Kosmala, and Gerhard Rigoll. Hidden Markov model based continuous online gesture recognition. In Int. Conference on Pattern Recognition (ICPR), pages 1206-1208, 1998.
131 F. J. Lv and R. Nevatia. Recognition and segmentation of 3-D human action using HMM and multi-class AdaBoost. In ECCV '06, pages IV: 359-372, 2006.
132 Shai Fine, Yoram Singer, and Naftali Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32(1):41-62, 1998.
133 Nicolas Padoy, Diana Mateus, Daniel Weinland, Marie-Odile Berger, and Nassir Navab. Workflow Monitoring based on 3D Motion Features. In Workshop on Video-Oriented Object and Event Classification in Conjunction with ICCV 2009, pages 585-592, Kyoto, Japan, 2009. IEEE.
134 Nuria Oliver, Ashutosh Garg, and Eric Horvitz. Layered representations for learning and inferring office activity from multiple sensory channels. Comput. Vis. Image Underst., 96(2):163-180, 2004.
135 Tao Xiang and Shaogang Gong. Optimising dynamic graphical models for video content analysis. Comput. Vis. Image Underst., 112:310-323, December 2008.
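As a minimal sketch of the standard recipe for classifying previously segmented sequences with HMMs, the following trains one Gaussian HMM per behaviour class and labels a new sequence with the class whose model yields the highest log-likelihood. It uses the hmmlearn package; the class names, feature dimensionality and state count are assumptions made for the example.

import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_models(sequences_by_class, n_states=4):
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)                 # stack all training sequences
        lengths = [len(s) for s in seqs]    # per-sequence lengths for hmmlearn
        m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)                   # Baum-Welch training
        models[label] = m
    return models

def classify(models, seq):
    # Maximum-likelihood decision over the per-class models
    return max(models, key=lambda label: models[label].score(seq))

# Toy data: two "behaviours" with different feature statistics
rng = np.random.default_rng(1)
walk = [rng.normal(0.0, 1.0, size=(40, 3)) for _ in range(10)]
fall = [rng.normal(3.0, 1.0, size=(40, 3)) for _ in range(10)]
models = train_models({"walk": walk, "fall": fall})
print(classify(models, rng.normal(3.0, 1.0, size=(40, 3))))  # expected: "fall"

Note that this recipe presupposes the segmentation: it is exactly the unknown start and end points discussed above that make the continuous case so much harder.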

The Echo State Network (ESN), as in Jaeger et al. (2007)136, could be a promising method for the online classification of workflow time series, because it does not make any explicit Markovian assumption. However, it was shown that it effectively behaves as a Markovian classifier137, i.e., recent states have a far larger influence on the predicted state. The ESN has already been used in a work using the same dataset that we are using138. However, their results are not directly comparable to ours, since the features they are using are different. (A bare-bones sketch of the ESN idea is given after the references below.)

In previous works139, human actions or surveillance scenes are analysed automatically for the extraction of topics from spatio-temporal words. The goal there is to find correlated motion in order to segment behaviour in space and time. In other approaches, abnormal events are often detected as outliers. This has been successfully applied to traffic monitoring140, the surveillance of public places141, assisted living142 and the analysis of motion patterns143. In another study144, temporal relations between consecutive frames are encoded using discriminative slow feature analysis; activities are automatically segmented and represented in a hierarchical coarse-to-fine structure.

Pentney et al.145 demonstrated the use of a large hand-entered common-sense database to interpret activity traces. They also proposed chain graphs to represent objects used and activities performed. The approach is based on combining large user-created common-sense relational databases with techniques for information retrieval on the web. Wang et al.146 combined a generative common-sense model of activity with a discriminative model of actions to automate feature selection. Hamid et al. (2009)147 modelled an activity as a sequence of discrete events; recognition is done by discovering and matching Motifs, defined as subsequences with similar behaviour that appear frequently in time-series data. They also proposed representing activities as bags of n-grams to extract the global structure information of activities, and presented a computational framework for unsupervised activity discovery and classification.

136 Herbert Jaeger, Wolfgang Maass, and Jose Principe. Special issue on echo state networks and liquid state machines. Neural Networks, 20(3):287-289, 2007.
137 Claudio Gallicchio and Alessio Micheli. Architectural and Markovian factors of echo state networks. Neural Networks, 24(5):440-456, 2011.
138 Galina V. Veres, Helmut Grabner, Lee Middleton, and Luc J. Van Gool. Automatic workflow monitoring in industrial environments. In ACCV (1), pages 200-213, 2010.
139 A) Daniel Kuettel, Michael D. Breitenstein, Luc Van Gool, and Vittorio Ferrari. What's going on? Discovering spatio-temporal dependencies in dynamic scenes. In IEEE Conference on Computer Vision and Pattern Recognition, June 2010. B) Xiaogang Wang, Xiaoxu Ma, and W. E. L. Grimson. Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models. IEEE Trans. Pattern Anal. Mach. Intell., 31:539-555, March 2009.
140 T. Hospedales, S. G. Gong, and T. Xiang. A Markov clustering topic model for mining behaviour in video. In ICCV '09, pages 1165-1172, 2009.
141 Amit Adam, Ehud Rivlin, Ilan Shimshoni, and Daviv Reinitz. Robust real-time unusual event detection using multiple fixed-location monitors. IEEE Trans. Pattern Anal. Mach. Intell., 30:555-560, March 2008.
142 Fabian Nater, Helmut Grabner, and Luc J. Van Gool.
Exploiting simple hierarchies for unsupervised human behaviour analysis. In CVPR, pages 2014–2021, 2010.
143 Chris Stauffer and W. Eric L. Grimson. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:747–757, 2000.
144 Fabian Nater, Helmut Grabner, and Luc Van Gool. Temporal relations in videos for unsupervised activity analysis. In Proceedings of the British Machine Vision Conference, pages 21.1–21.11. BMVA Press, 2011.
145 William Pentney, Ana-Maria Popescu, Shiaokai Wang, Henry Kautz, and Matthai Philipose. Sensor-based understanding of daily life via large-scale use of common sense. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1, AAAI06, pages 906–912. AAAI Press, 2006.
146 Shiaokai Wang, William Pentney, Ana-Maria Popescu, Tanzeem Choudhury, and Matthai Philipose. Common sense based joint training of human activity recognizers. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI07, pages 2237–2242, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.
147 Raffay Hamid, Siddhartha Maddi, Amos Johnson, Aaron Bobick, Irfan Essa, and Charles Isbell. A novel sequence representation for unsupervised analysis of human activities. Artificial Intelligence, 173(14):1221–1244, 2009.

In another approach148 actions are encoded in a weighted directed graph, referred to as an action graph, where the nodes represent salient postures that are used to characterize the actions and are shared by all actions. The weight between two nodes measures the transitional probability between the two postures represented by those nodes. An action is encoded as one or multiple paths in the action graph. The salient postures are modelled using Gaussian mixture models (GMMs). Both the salient postures and the action graph are automatically learned from training samples through unsupervised clustering and the expectation–maximization (EM) algorithm. The proposed action graph can be expanded efficiently with new actions, and an algorithm is also proposed for adding a new action to a trained action graph without compromising the existing graph.

5.3.2.4 Using Depth Data

As human bodies and motions are in essence three-dimensional, the loss of depth information can cause significant degradation of the representational and discriminative capability of such feature representations. The recent emergence of depth sensors (e.g., the Microsoft Kinect) has made it feasible and economically sound to capture in real time not only colour images but also depth maps with appropriate resolution (e.g., 640×480 pixels) and accuracy (e.g., around 1 cm). A depth sensor provides three-dimensional structure information about the scene as well as three-dimensional motion information about the subjects/objects in it. Therefore the motion ambiguity of the colour camera, i.e., the projection of three-dimensional motion onto the two-dimensional image plane, can be bypassed.

Recently, a series of methods based on spatio-temporal interest points (STIPs) have been proposed which achieve state-of-the-art performance in activity recognition. These methods include Harris3D149, HOG3D150 and Cuboid151. Although slightly different from each other, these methods share a common feature extraction and representation framework, which involves detecting local extremes of the image gradients and describing each point using histograms of oriented gradients (HOG)152 and histograms of optic flow (HOF)153.

The first work using an RGB-plus-depth sensor for activity recognition154 efficiently samples a bag of 3D points (BOPs) from the depth map and uses Gaussian mixture models to model the human postures. This method yields superior results over the conventional method using 2D silhouettes. However, it has several limitations: 1) instead of directly utilizing the three-dimensional motion information, it uses two-dimensional projections of key poses, which can lead to sub-optimal feature representations; 2) only depth information is used for recognition while colour information is completely ignored, although colour and depth information are complementary rather than exclusive. More recently, another group155 directly use skeleton motion data extracted from the Kinect SDK for activity

148 Wanqing Li, Zhengyou Zhang, and Zicheng Liu. Expandable data-driven graphical modeling of human actions based on salient postures. Circuits and Systems for Video Technology, IEEE Transactions on, 18(11):1499–1510, November 2008.
149 I. Laptev. On space-time interest points. Int. J. Computer Vision, 64(2):107–123, 2005.
150 Alexander Kläser, Marcin Marszałek, and Cordelia Schmid. A spatio-temporal descriptor based on 3D-gradients.
In British Machine Vision Conference, pages 995–1004, September 2008.
151 P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie. Behaviour recognition via sparse spatio-temporal features. In Proceedings of the 14th International Conference on Computer Communications and Networks, pages 65–72, Washington, DC, USA, 2005. IEEE Computer Society.
152 Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, pages 886–893, 2005.
153 Nazli Ikizler, R. Gokberk Cinbis, and Pinar Duygulu. Human action recognition with line and flow histograms. In Proc. ICPR, 2008.
154 W. Q. Li, Z. Y. Zhang, and Z. C. Liu. Action recognition based on a bag of 3D points. In CVPR4HB10, pages 9–14, 2010.
155 Jaeyong Sung, Colin Ponce, Bart Selman, and Ashutosh Saxena. Human activity detection from RGBD images. In AAAI Workshop on Pattern, Activity and Intent Recognition, 2011.

representation; however, this method cannot be applied when skeleton data cannot be reliably obtained. Methods to extract skeletal data are presented in some recent work156, where a pictorial structure model (PSM) and a generative model for limbs are used. In one study157 a random forest classifier is used. In another study158 a dataset for daily activity recognition is presented, which includes synchronized colour and depth images.

5.3.2.5 Symbolic Approaches to Activity Recognition

In addition to the techniques presented above, symbolic approaches to activity recognition may be used in USEFIL. The input to such approaches is a symbolic representation of short-term activities detected in video content as well as by other sensors, such as wrist watches. The output of a symbolic approach is a series of composite/long-term activities. Numerous recognition systems have been proposed in the literature (see159,160 for two recent surveys). In this section we focus on long-term (high-level) activity recognition systems that exhibit a formal, declarative semantics.

A well-known system for activity recognition is the Chronicle Recognition System (CRS) (http://crs.elibel.tm.fr/). A chronicle can be seen as a long-term activity: it is expressed in terms of a set of events (short-term activities) linked together by time constraints and, possibly, a set of context constraints. The language of CRS relies on a reified temporal logic, where propositional terms are related to time-points or other propositional terms. Time is considered a linearly ordered discrete set of instants. The language includes predicates for persistence and event absence. Details about CRS may be found on the web page of the system and in161.

The CRS language does not allow mathematical operators in the constraints on atemporal variables. Consequently, the distance between two people/objects, which is of great interest in the domain of activity recognition, cannot be computed. CRS, therefore, cannot be directly used for activity recognition in video surveillance applications. More generally, CRS cannot be directly used for activity recognition in applications requiring any form of spatial reasoning, or any other type of atemporal reasoning. These limitations could be overcome by developing a separate tool for atemporal reasoning that would be used by CRS whenever this form of reasoning was required. To the best of our knowledge, such extensions of CRS are not available. Clearly, the computational efficiency of CRS, which is one of the main advantages of using this system for activity recognition, would be compromised by the integration of an atemporal reasoner.

156 James Charles and Mark Everingham. Learning shape models for monocular human pose estimation from the Microsoft Xbox Kinect. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 1202–1208, November 2011.
157 Jamie Shotton, Andrew W. Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake. Real-time human pose recognition in parts from single depth images. In CVPR, pages 1297–1304. IEEE, 2011.
158 Bingbing Ni, Gang Wang, and Pierre Moulin. RGBD-HuDaAct: A color-depth video database for human daily activity recognition. In ICCV Workshops, pages 1147–1153, 2011.
159 Artikis A., Skarlatidis A., Portet F., & Paliouras G. (2012). Logic-based event recognition. Knowledge Engineering Review.
160 Cugola G. & Margara A. (2011).
Processing flows of information: From data stream to complex event processing. ACM Computing Surveys.
161 Dousson C. & Maigat P. L. (2007). Chronicle recognition improvement using temporal focusing and hierarchisation. IJCAI 2007, 324–329.

Hakeem and Shah162 have presented a hierarchical activity representation for analysing videos. The temporal relations between the sub-activities of an activity definition are represented using the interval algebra of Allen and Ferguson163 and an extended form of the CASE representation164, originally used for the syntactic analysis of natural languages.

The Event Calculus has also been used for activity recognition165,166,167. In this approach, the availability of the full power of logic programming is one of the main attractions of employing this calculus as the temporal formalism. It allows activity definitions to include not only complex temporal constraints but also complex atemporal constraints.

Shet et al. have presented a logic programming approach to activity recognition; see168,169 for two recent publications. These researchers have presented activity definitions concerning theft, entry violation, unattended packages, and so on. Within their system, Shet and colleagues have incorporated mechanisms for reasoning over rules and facts that have an uncertainty value attached. Uncertainty in rules defining long-term activities corresponds to a measure of rule reliability: it is often the case that we only have imprecise knowledge about a long-term activity definition. On the other hand, uncertainty in facts represents the detection probabilities of the short-term activities. In the VidMAP system, a mid-level module which generates Prolog facts automatically filters out data that a low-level image processing module has misclassified (such as a tree mistaken for a human). Shet and colleagues have noted that this module performs the filtering "by observing whether or not the object has been persistently tracked". In168, a bilattice is used to detect human entities based on the uncertain output of part-based detectors, such as head or leg detectors. The bilattice structure associates every activity with two uncertainty values, one encoding available information and the other encoding confidence. The more confident information is provided, the more probable the respective long-term activity becomes.

Markov Logic Networks (MLNs)170 have also been used for dealing with uncertainty in activity recognition. MLNs combine first-order logic and probabilistic graphical models. The use of first-order logic allows for the representation of activity definitions including complex (temporal) constraints. In MLNs each first-order formula may be associated with a weight, indicating the confidence we have in the formula. The main idea behind MLNs is that the probability of a world increases as the number of formulas it violates decreases. Therefore, a world violating formulas

162 Hakeem, A. & Shah, M. (2007). Learning, detection and representation of multi-agent events in videos. Artificial Intelligence, 171(8-9): 586-605.
163 Allen, J. & Ferguson, J. (1994). Actions and events in interval temporal logic. Journal of Logic and Computation, 4(5), 531-579.
164 Fillmore, C. (1968). The case for case. In E. Bach and R. Harms (Editors), Universals in Linguistic Theory (pages 97-135). Holt, Rinehart, and Winston.
165 Artikis A., Skarlatidis A. & Paliouras G. (2010). Behaviour Recognition from Video Content: A Logic Programming Approach. International Journal of Artificial Intelligence Tools, 19(2):193-209.
166 Paschke, A. & Bichler, M. (2008). Knowledge representation concepts for automated SLA management. Decision Support Systems, 46(1): 187-205.
167 Artikis, A.
& Paliouras, G. (2009). Behaviour recognition using the event calculus. Artificial Intelligence Applications & Innovations. Springer Press.
168 Shet, V., Neumann, J., Ramesh, V. & Davis, L. (2007). Bilattice-based logical reasoning for human detection. In Computer Vision and Pattern Recognition. IEEE.
169 Shet, V., Harwood, D. & Davis, L. (2005). VidMAP: video monitoring of activity with Prolog. In Advanced Video and Signal Based Surveillance. IEEE.
170 Richardson, M. and Domingos, P. (2006). Markov logic networks. Machine Learning 62, 1-2, 107–136.

becomes less probable, but not impossible as in first-order logic. A set of Markov logic formulas represents a probability distribution over possible worlds (a standard formulation is given at the end of this section). Activity recognition in MLNs involves querying a ground Markov network. Such a network is produced by grounding all rules expressing long-term activities using a finite set of constants that typically come from the input short-term activity streams. The MLN inference algorithms take into consideration not only the weights attached to the long-term activity definitions, but also the weights (if any) attached to the input short-term activities by the underlying video processing units.

Activity recognition using MLNs has been performed by Biswas et al.171, for example. An approach that can represent persistent and concurrent activities, as well as their starting and ending time-points, is proposed in172. The method in173 employs hybrid MLNs174 in order to recognise successful and failed interactions between multiple humans using noisy location data. As in pure MLN-based methods, the knowledge base is composed of long-term activity definitions. However, hybrid formulas aiming to remove the noise from the location data are also included. Hybrid formulas are defined as normal formulas, but their weights are also associated with a real-valued function, such as the distance between two persons. As a result, the confidence of a formula is defined by both its weight and its function. Although these methods incorporate a first-order logic representation, the presented long-term activity definitions have a limited temporal representation: for instance, temporal constraints are defined only over successive instants of time.

Note that until training data become available in USEFIL, transfer learning techniques175 may be employed in order to use knowledge concerning long-term activity definitions that is available in other related application domains.

171 Biswas, R., Thrun, S., and Fujimura, K. (2007). Recognizing activities with multiple cues. In Workshop on Human Motion, A. M. Elgammal, B. Rosenhahn, and R. Klette, Eds. Lecture Notes in Computer Science, vol. 4814. Springer, 255–270.
172 Helaoui, R., Niepert, M., & Stuckenschmidt, H. (2011). Recognizing interleaved and concurrent activities: A statistical-relational approach. Pervasive Computing and Communications (pp. 1-9). IEEE.
173 Sadilek, A. and Kautz, H. (2012). Location-based reasoning about complex multi-agent behaviour. Journal of Artificial Intelligence Research 43, 87–133.
174 Wang, J. and Domingos, P. (2008). Hybrid Markov logic networks. In Proceedings of the 23rd National Conference on Artificial Intelligence. Vol. 2, 1106–1111.
175 Pan S. and Yang Q. "A Survey on Transfer Learning," Knowledge and Data Engineering, IEEE Transactions on, vol. 22, no. 10, pp. 1345-1359, Oct. 2010.
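To make the preceding description concrete, the joint distribution that an MLN defines over possible worlds, following Richardson and Domingos170, can be written as:

$$P(X = x) = \frac{1}{Z}\,\exp\Bigl(\sum_i w_i\, n_i(x)\Bigr)$$

where $w_i$ is the weight of the $i$-th formula, $n_i(x)$ is the number of groundings of that formula that are true in world $x$, and $Z$ is a normalising constant (the partition function). A world violating groundings of heavily weighted formulas thus receives a lower, but still non-zero, probability.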

6. Home Gateway Systems

6.1 Web-TV

6.1.1 Introduction

A Web-TV device is either a television set with integrated internet capabilities or a set-top box that offers more advanced computing ability and connectivity than a contemporary basic television set. A Web-TV may be thought of as an information appliance, or as the computer system of a handheld computer integrated within a television set; as such, a Web-TV often allows the user to install and run more advanced applications or plugins/add-ons based on a specific platform. By definition, a Web-TV platform or middleware has to have a public SDK (Software Development Kit) for apps, so that third-party developers can develop applications for it, and an app store, so that end-users can install and uninstall apps themselves, just as on smartphones. The public SDK should enable third-party companies and other interactive application developers to write applications once and see them run successfully on any device that supports the Web-TV platform or middleware architecture they were written for, regardless of who the hardware manufacturer is.

The two main services of Web-TV are:
- To deliver content from other computers or network attached storage devices on your network.
- To provide access to internet-based services including traditional broadcast TV channels, catch-up services, video-on-demand, EPG, interactive advertising, personalisation, voting, games, social networking, and other multimedia applications.

The concept of Web-TV is still in its incipient stages. Up-and-coming software frameworks such as the proprietary Google TV and the open-source XBMC platform are getting a lot of public attention in the news media within the consumer electronics market area, and commercial offerings from companies such as Philips, Logitech, Sony, LG, Boxee, Samsung and Intel promise products that will give television users search capabilities, the ability to run apps, interactive on-demand media, personalized communications, and social networking features.

There is an array of operating systems currently available, and while most target smartphones, nettops or tablet computers, some also run on Web-TVs or are even designed specifically for Web-TV usage. Most often the operating systems of smart TVs are based on Linux, Android, and other open-source software platforms.

6.1.2 Philips NetTV

Net TV was first launched in April 2009 on Philips TVs. The product (a Net TV device) is a television, a Blu-ray player or a home theatre system manufactured by Philips or Sharp. Its use is supported by the so-called Net TV portal. This portal can be accessed via a button on the remote control of a Philips TV device that is connected to the internet. A simple press of the button shows the consumer a portal in which all kinds of services (apps) are presented. These apps can be accessed via a press of a button on the remote control as well. The apps within the Net TV portal are created and hosted by third-party companies. While watching TV, a press of the button brings up the Net TV portal filled with apps for your country, and you can choose your app with another press of the button (see the figure below).

Figure 31: NetTV portal interface

The products on which the Net TV portal is available are:
- Philips 2009 TVs, ranges 8000 and 9000 series
- Philips 2010 TVs, ranges 7000, 8000 and 9000 series
- Philips 2010 AVM devices (multiple device types)
- Sharp 2010 TVs (multiple device types)
- Philips 2011 TVs, ranges 6000, 7000, 8000 and 9000 series
- Philips 2011 AVM devices (multiple device types)
- Sharp 2011 TVs (multiple device types)

To be able to add the service portal with its apps to the TV platform, some changes to the product were needed. The most important of these was to integrate a browser into the platform (Opera created a specific version of their browser for this purpose). In addition to the browser, a list of platform requirements was set. This list contains platform requirements regarding:
- Architecture (Opera browser version, CPU speed)
- Codecs (supported video codecs, audio codecs, containers, 3D, subtitles)
- Storage / Streaming (download to SD, stream to RAM, auto-bandwidth selection)
- DRMs (Marlin-BB, Microsoft WMDRM, SSL streaming, client certificates)

To implement the above in the platform, multiple suppliers are needed: Opera is the supplier of the Opera browser, Trident is the chip supplier responsible for codec support, Marlin is the supplier for Marlin-BB, Microsoft the supplier for WMDRM, et cetera.

For each new generation of Net TV enabled devices, the strategy is to include at least the same requirements as in the previous generation and to add certain requirements (capabilities) to the device based on market insights. For example, the 2010 generation of Net TV devices is equipped with Marlin DRM. This DRM is used to encrypt video content, making it possible to include Hollywood content in apps on the

service portal. In discussions with third parties, our business developers received information that some of the app providers preferred to use WMDRM encryption instead of Marlin. In the 2011 generation of Net TV devices, therefore, both Marlin and WMDRM are supported.

The figure below gives a simplified overview in which the product (device) and the services (third-party services: lifestyle, news, etc.) are represented. The figure also shows the device portal and the service portal. The device portal is the switchboard which authenticates whether the device trying to connect really is a Net TV device. If the device is authenticated, the service portal can be reached. The service portal is the access point to the variety of services. The device portal is created, hosted and maintained by Philips. The service portal is created, hosted and maintained by IBM. The services are created, hosted and maintained by third parties.

Figure 32: Overview of the NetTV device and services (device portal, service portal and third-party services such as TV guide, news, advertising, photo services, community, games and eCommerce)

The following table indicates the state of the art of the technical capabilities of the Philips NetTV. The colour-coded per-generation support markers of the original table could not be reproduced here; their meaning is summarised in the legend after the table.

Table 7: Technical capabilities of the Philips NetTV (Net TV 2.0; 2010 and 2011 generations)

- Net TV Browsing: CE-HTML-RevA, keys, security, Device Portal, etc.; browser in background; scratch window; local UI property setFullScreen()

- Dashboard: basic profile (no PIP); advanced profile (PIP)
- Resolutions: 720p
- Internationalization: fonts and character sets (generic); Chinese font for Open Internet; Arabic font for NetTV (France); Hebrew font for Open Internet
- HbbTV: Red Button 1.0; Red Button 1.5 (mandatory if EPG); HbbTV 1.1.1, all mandatory items except the CI+ API (all EU except UK, IT); HbbTV 1.1.1 compliant; PIP for EPG/Portal only
- IP-EPG: tuning with video/local object, analog / DVB-T/C, DVB-S; recording via IP-EPG; ISDB-T (LATAM); DVB-SI
- EPG on the go: program info; scheduled recording list
- Pushed notifications: EPG reminders; third-party notifications (social messaging)
- Sign-on: iFrame sign-on; Digest sign-on (ICP client)
- Storage: protected content download to SD card; protected content download to USB storage
- Open Internet: access to Open Internet, mode switching; page bookmarking; Flash 10.x plugin
- HTML5: HTML5 audio/video tag for H264/AAC; local storage;

browser highlight switch
- Codecs & Streaming: WMV/H264 SD; WMA/MP3/AAC stereo; HTTP; Shoutcast; WMV/H264 720p/1080i; AAC & AC3 5.1; WMS-HTTP; WMV/H264 1080p; AVC in TS; multi-audio; external subtitles (SAMI), Latin; external subtitles (SAMI), Cyrillic and Greek; 3D autodetect; 3D side-by-side and top-bottom video, 720p 3D, MVC; WMA Pro 5.1; Microsoft Smooth Streaming; Apple HTTP Live Streaming; Google WebM codec
- Content Protection: Marlin-BB download; Marlin-BB streaming; WMDRM-PD streaming; client certificates (range based); SSL streaming; PlayReady DRM smooth streaming; Widevine DRM adaptive streaming
- Ambilight: Ambilight API
- YouTube: YouTube Leanback (local Flash Lite app); YouTube Leanback v3 (Widevine & 3D)
- NetFlix: NetFlix 3.1 local app

Legend for profile definition:
- Mandatory: tested for certification; dependent apps/content enabled on the device.
- Optional: tested for certification if included; dependent apps/content enabled on the device.
- Not required: not tested for certification; dependent apps/content on the device enabled by explicit agreement.

6.2 Smart Home Platforms

This section surveys the technologies related to home automation and smart homes in general. The current goals of the USEFIL project include only data collection from various sensors in an elderly person's home and do not include exercising control over any devices (e.g. lights, heating) in that home. This review is included here mostly for completeness, as well as for the case that a need to control some appliances arises later in the project. Using energy consumption information as an additional source of data about user activities is, however, under consideration. This could be achieved by exploiting a solution such as those discussed below.

Selected examples of companies and products are listed. Some relevant research projects from the field of smart homes and ambient (supporting) intelligence are also included.

There is, as yet, no universal way of connecting consumer electronics with HVAC (heating, ventilation and air conditioning), security, alarm and other home devices. The trends seem to indicate that the solution to interoperability is not likely to emerge from the will of the home automation manufacturers, but rather from the increasing networking of all human activities through various Internet technologies. This development will make the prices of home automation drop and introduce a plethora of new applications, most of which are created not by home automation companies but by creative people and the large digital service industry. In the wake of this technical development, combined with the ageing population in developed countries, we will see more applications targeted at older and mentally less capable people, requiring less configuration and setting up, and smarter, more connected and more automatic home automation solutions.

6.2.1 Home automation platforms

6.2.1.1 ThereGate

www.therecorporation.com

There Corporation is a spin-off from Nokia (from the discontinued Smart Home program of the Nokia Research Center) that specializes in smart home interoperability. Originally the aim was to control different home appliances via a mobile phone, but later the company concentrated on energy management.

ThereGate is the central product of There Corporation. It is a home gateway that acts as a WiFi access point, collects data from various sensors (e.g. temperature, energy consumption, motion detectors) and enables controlling various actuators (e.g. light switches) within a home. The default radio interface for communication with sensors/actuators is Z-Wave (integrated controller). Other radio interfaces like ZigBee can also be used via extensions.

Figure 33: ThereGate

ThereGate also provides an HTTP REST API for easy access to sensor data and for controlling the actuators (a sketch of such access is given at the end of the next subsection). ThereGate is, as such, a Linux-based computer and can run custom applications as well. Some specifications of the ThereGate TG800GZ: 533 MHz Broadcom processor, 4 x Gbps LAN ports, 1 x Gbps WAN port, 4 x USB 2.0 full-speed host, 802.11 b/g/n with 300 Mbps transfer rate, GSM/GPRS/3G, integrated Z-Wave controller (European), Apache Web server.

6.2.1.2 GreenWave Reality

http://www.greenwavereality.com/solutions/

GreenWave Reality is a company originally established in Denmark but at present concentrating its operations mostly in its U.S. office. It offers solutions very similar to those of There Corporation above. At the heart of the GreenWave Reality solution is the concurrent-radio Gateway that supports Z-Wave, ZigBee and 6LoWPAN to create a mesh-based Home Area Network (HAN). The HAN supports smart PowerNodes, connected lighting, and in-home displays, making homes smart, energy conscious, and controllable. Unlike There Corporation, which produces only the gateway node and relies on 3rd-party sensors/actuators, GreenWave Reality also produces its own sensors and switches.

Figure 34: GreenWave Reality gateway, switches and display.
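To illustrate how a USEFIL component might read sensor data from a gateway of this kind over its HTTP REST interface, below is a minimal polling sketch. The gateway host name, resource path and JSON response format are hypothetical assumptions, not the documented API of either product.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Minimal sketch of polling a home-gateway REST API for a sensor reading.
 * The host, endpoint path and response format are assumptions for
 * illustration; consult the gateway's actual API documentation.
 */
public class GatewayPoller {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical resource URI for a temperature sensor on the gateway.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://gateway.local/api/sensors/temperature"))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Expecting something like {"value": 21.5, "unit": "C"} (assumed format).
        System.out.println("Sensor reading: " + response.body());
    }
}
```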

6.2.1.3 HomeSeer - home automation with Android and iPhone

http://www.homeseer.com/
http://www.homeseer.com/products/Android-iPhone-Home-Automation.htm
http://store.homeseer.com/store/HomeSeer-Control-System-Comparison-W13C81.aspx
http://www.youtube.com/watch?v=3itu7S7B1AQ (Android Home Automation with HomeSeer)

HomeSeer is a U.S. company that, like There Corporation and GreenWave Reality, has a home gateway as its main product. The focus is not only on energy monitoring but on home automation in general. HomeSeer enables monitoring and controlling virtually every aspect of a home, including lighting, climate, appliances, audio/video equipment, security systems, shades and blinds, irrigation, energy management and more, and this monitoring and controlling can be exercised from anywhere in the world using a smartphone. The system can utilize different wired or wireless communication technologies like Z-Wave. Setting up requires four components:

A HomeSeer automation gateway controller unit for monitoring and control (the brain of the system).

Figure 35: HomeSeer automation gateways.

One or more 3rd-party automated devices replacing normal manual or "dumb" devices such as thermostats, locks and switches.

Figure 36: 3rd-party automation devices.

System interfaces that establish the communications network using the selected technology: Z-Wave (908 MHz in America / 868 MHz in Europe), X10, UPB, Insteon, infrared, phone touch tone, voice, WLAN (Wi-Fi) or some other network.

Figure 37: System interfaces.

Selecting the controller device and installing the user interface for it (available for Android or iPhone terminals and Windows computers).

Figure 38: Example interfaces.

According to the page http://www.ihomeautomate.eu/tag/homeseer/, the HomeSeer system can also be controlled programmatically using the Python or Java languages. This seems to be a competent technology base for a smart home, but it falls into the category of investing much money and effort in automating the home, thus missing the target of a cheap, off-the-shelf solution requiring little installation, which is the aim in USEFIL.

6.2.1.4 EKE Building Technology Systems

www.ebts.fi (Finnish site, English description below)

EKE Building Technology Systems is a Finnish home automation company. They provide a Web-based control system for home automation, including monitoring of energy and water consumption, security functions (burglar, fire and water leakage alarms), and lighting, air conditioning and heating measurements and control. The system also provides the residents with an access portal to building information and a maintenance database. Although the system does not provide a public API, its Web-based interface could be used to mine needed information. In the USEFIL user scenario, the home automation system could provide additional sensor data (temperature, lighting and luminosity levels, energy consumption, air humidity) for the USEFIL control system.

Figure 39: EBTS Home system Web interface.

6.2.1.5 Unity Home System

http://www.legrand.us/home-systems/unity-home-system.aspx

Another example is the Unity Home System by the French company Legrand. It is an affordable, modular and expandable home automation system targeted at home builders at the entry level. It consolidates popular home technology products such as cameras, intercom, multi-room audio and lighting control into one convenient, easy-to-install system. Control and monitoring are possible through any Web-enabled device (e.g. iPhone or Android terminals or PCs). The Unity Home System is not designed, however, to work with 3rd-party appliances, as all of the system components are supposed to be Legrand's On-Q series products.

6.2.2 Trends highlights from recent technology shows

6.2.2.1 HYDRA middleware (CeBIT 2010)

http://www.hydramiddleware.eu/viewpage.php?page_id=7

The HydraGizer Energy Efficiency Demonstrator was shown at the Fraunhofer FIT demonstration stand at CeBIT 2010 in Hannover. The application runs on Android or iPhone smartphones, where a summary UI shows the current energy consumption of standard household devices like lamps, coffee machines or DVD players. The application recognizes the device using image recognition software and finds the corresponding energy metering device.

6.2.2.2 FRITZ!BOX from AVM (CeBIT 2011)

http://www.avm.de/en/press/announcements/2011/pi_cebit_hausautomat.php3
http://www.avm.de/en/press/announcements/2011/pdf/AVM_PI_Cebit_2011_Hausautomatisierung_EN.pdf

FRITZ!Box is an internet-enabled home automation hub running the open-source server FHEM. The system uses an 868-MHz wireless transmitter, which is connected by USB. Additional actuators can also utilize WLAN, DECT and Ethernet connections. Advertised capabilities include:
- Control, regulate and measure with the FRITZ!Box
- Internet-enabled devices as an intelligent interface for house automation
- Control even when out and about, via the Internet or smartphone apps

6.2.2.3 uRemote (CeBIT 2011)

http://www.techau.tv/blog/tag/cebitaus/

The uRemote company specializes in IP (Web) controlled home automation controller boxes. These are devices that can be commanded to emit customized RF or infrared signals for other devices, like home entertainment or remote-controlled home appliances. Thus it is possible to virtually have your home remote controller with you all the time, in your smartphone or PC. Possible applications include televisions, projectors, Apple TV, home theatres, DVD/Blu-ray, PC or Apple Mac, lighting, air conditioning, blinds, PlayStation/Xbox, security, pool and garage door controls.

6.2.2.4 CES 2012 trends in consumer electronics at home

The International Consumer Electronics Show 2012 (http://www.cesweb.org/), among other things, introduced prototypes and concepts of networked home appliances and HVAC (heating, ventilation and air conditioning) systems. Not all solutions are yet commercially available (but are coming in the next few years). The new trends in the smart home ecosystem and interoperability:

- Home appliances speaking to each other, e.g. a fridge sending a food recipe to an oven (requires appliances from the same manufacturer, as there is no open interoperability).
- Mobile (smart) phones connected to various home appliances, like fridges, ovens, washing machines and TVs, through a cloud (WLAN and Internet), for monitoring and control. Media can be shared from a smartphone to other home devices.
- TVs and other appliances using the Android OS.
- Medical sensors at home connecting with the smartphone.

Links:
- CES 2012: Google's Schmidt Hints at "Smart Home" Strategy (http://www.mobiledia.com/news/123596.html).
- CES 2012: Smart appliances transform the home (http://www.canada.com/technology/2012+Smart+appliances+transform+home/5991494/story.html).
- Bluetooth product announcements flood out of CES (http://www.bluetooth.com/Pages/ces-2012.aspx). Especially the Sony SmartWatch, the iHealth Wireless Blood Pressure Monitor, and the BodyMedia Link armband and smartphone app.
- CES 2012 Smart TV Round-Up (http://www.slashgear.com/ces-2012-smart-tv-round-up-12208959/).
- Android Makes the Jump to Home Appliances (http://www.dailytech.com/Android+Makes+the+Jump+to+Home+Appliances/article21600.htm).
- Samsung rolls out smart appliances at CES 2012 (smart fridge, Android-powered washer and dryer) (http://www.youtube.com/watch?v=ZAhiHY5KtXk).

In the USEFIL scenario the use of the Android platform in TVs, home appliances, computers and user terminals (mobile phone or tablet) would provide benefits in interoperability. However, the OS also imposes its own restrictions (programmability, computing power, and interfaces to other systems) that should be weighed in this consideration.

6.2.3 Research related to smart homes and ambient intelligence

Below is an overview of some research projects and selected publications related to smart homes, with priority given to research that also focuses on elderly care or health-related applications.

6.2.3.1 Research projects

Table 8: Smart home research projects

Smart Homes For All (SM4ALL)
http://www.sm4all-project.eu/
A middleware platform for the inter-working of smart embedded services in immersive and person-centric environments, through the use of composition and semantic techniques for dynamic service reconfiguration. The project created the ViSi tool (based on Google SketchUp and Ruby) to simulate and visualize home environments. The project ended in 2010.

Bridging Research in Ageing and ICT Development (BRAID)
http://www.braidproject.eu/
BRAID is an EU FP7 Support Action. The goal of BRAID is to develop a comprehensive RTD roadmap for active ageing by consolidating existing roadmaps and by describing and launching a stakeholder co-ordination and consultation mechanism. It characterises key research challenges and produces a vision for a comprehensive approach to supporting the well-being and socio-economic integration of increasing numbers of senior citizens in Europe. BRAID responds to the apparent need to consolidate the various existing perspectives, plans, roadmaps and research, and to coordinate effectively the stakeholders in ICT and ageing.

Long Lasting Memories (LLM) project
http://www.longlastingmemories.eu/
The project has three main application areas by which it tackles the challenge of maintaining health in the ageing population with the help of ICT: physical training, cognitive training and independent living. The last one utilizes the smart home concept. The home is equipped with sensors (to detect falling down) but not cameras or microphones. Other devices provide mental and physical training activities, or help in connecting with professional care people or friends and relatives. The system also enables remote health monitoring.

The ICS-FORTH Ambient Intelligence Programme
http://www.ics.forth.gr/ami/
Within this programme, a line of research is targeted towards intelligent home environments capable of assisting their inhabitants in everyday life by supporting the pervasive diffusion of intelligence into the surrounding environment through various wireless technologies and intelligent sensors. A laboratory has been set up to test different user control interfaces for homes and patient rooms in hospitals. The system automatically maps new devices to the system as widgets.

inCASA (Integrated Network for Completely Assisted Senior Citizens' Autonomy)
http://www.incasa-project.eu/news.php
The inCASA project aims to create and demonstrate citizen-centric technologies and a services network that can help and protect frail elderly people and prolong the time they can live well in their own homes. The goal will be achieved by a series of pilots across Europe that integrate solutions and services for health and environment monitoring in order to profile user behaviour. Data will be made available to professional care service providers, including privacy protection; day-by-day activity planning; co-ordination of public social and health care services; and deployment of specialist community-based services. inCASA utilizes the (HYDRA) LinkSmart open-source middleware platform (http://www.hydramiddleware.eu). inCASA is a 30-month project funded by the European Commission under the CIP-PSP programme. It started in 2010.

6.2.3.2 Selected recent publications

The paper "Requirements for Smart Home Applications and Realization with WS4D-PipesBox"176 examines the application and protocol integration efforts required when creating a home automation system from heterogeneous devices. Besides physical connectivity, real interoperability for effectively developing home automation applications requires harmonizing multiple protocol standards, dealing with a heterogeneous device landscape and different data formats, managing the resource constraints of devices, and providing means to react quickly when devices and applications leave or join the system. Multiple layers of abstraction are presented as a solution to the many problems in the integration. WS4D-PipesBox is a software tool and framework to mash up devices and Internet services to create new applications in service-oriented, device-centric environments. It illustrates how applications could be developed using multiple layers of abstraction. The technology uses Java, OSGi, WS4D, AJAX and simple data formats such as XML and JSON.

The paper "Design of an Internet of Things-based smart home system"177 presents an Internet of Things (IoT) based smart home system that should mitigate the problems apparent in current home automation systems: reliance on a central computer, integration of devices from different vendors, and the difficulty of re-adapting to changing user needs. The Internet of Things is shown as a solution, as it can bind the internet with everyday sensors and devices, linking physical and virtual objects through the exploitation of data capture and communication capabilities. The new system integrates information, telecommunication, entertainment and living systems, supporting centralized control through communication between the home network and the Internet (3G or Ethernet), and configurability by the user. The presented solution relies on a dedicated hardware gateway, however.

176 Beckel, C.; Serfas, H.; Zeeb, E.; Moritz, G.; Golatowski, F.; Timmermann, D. Requirements for smart home applications and realization with WS4D-PipesBox. In IEEE 16th Conference on Emerging Technologies & Factory Automation (ETFA), 2011, pp. 1-8.
177 Kang Bing; Liu Fu; Yun Zhuo; Liang Yanlei. Design of an Internet of Things-based smart home system. In 2nd International Conference on Intelligent Control and Information Processing (ICICIP), 2011, pp. 921-924.

The paper "User-Centric Environment Discovery With Camera Networks in Smart Homes"178 proposes a data integration and reasoning technique for automatic environment discovery in smart homes based on observations of user interactions with objects. This approach is complementary to traditional appearance-based object recognition, which often demands large training sets. In the presented approach, object recognition is achieved in a semantic way by linking object behaviours to the pose and activity of the person using them. The embodiments of the proposed approach in two multi-camera smart environments are described. The presented principle would help a home monitoring system to infer interactions between the resident and his or her household objects, and also to locate such objects from the camera views. As an example of usage, it could be detected whether a person lies on a sofa, a bed or the floor.

The paper "Location of an inhabitant for domotic assistance through fusion of audio and non-visual data"179 develops a new method to locate a person using multimodal non-visual sensors and microphones in a home environment. The information extracted from the sensors is combined using a two-level dynamic network to obtain the location hypotheses. This method was tested in two smart homes using data from experiments involving about 25 participants. The preliminary results show that an accuracy of 90% can be reached using several uncertain sources. The use of implicit localisation sources, such as speech recognition (mainly used in this project for voice commands), can improve performance in many cases. The technology appears to be still in development. Still, the ability to find people within an apartment with only cameras and microphones could be a useful addition to the USEFIL solution.

Finally, the paper "The Design of a Video-Based OSGi-compliant Remote Home Network Control System"180 describes an application that enables a user to acquire a remote user interface to an OSGi-compliant home device by pointing at its picture in a live video feed from home, using a mobile phone, desktop PC or PDA. The system is not very flexible, as the devices need to be manually recognized at the time the home gateway and cameras are set up.

6.3 Open Interconnected Systems

The USEFIL system is envisioned as an open interconnected system integrating several of the sub-systems discussed above: mobile devices, tablets, video monitoring units, Web-TV and (potentially) home automation systems. Therefore, a major aspect is building an integrated system. This section reviews the state of the art of software technology for building open interconnected systems.

6.3.1 Java Enterprise Edition

Java Platform, Enterprise Edition, or Java EE, is Oracle's enterprise computing platform. Java EE is a free and highly popular tool for implementing distributed applications that follow the client-server paradigm. It can also be used for building peer-to-peer applications, but each peer has to be a sufficiently resourceful server (e.g. enterprise Web Services).

The Java EE platform provides an API and runtime environment for developing and running software, including network and web services and other large-scale, multi-tiered, scalable, reliable,

178 Chen Wu; Aghajan, H. User-Centric Environment Discovery With Camera Networks in Smart Homes. IEEE Transactions on Systems, Man and Cybernetics, 41(2): 375-383, 2011.
179 Chahuara, P.; Portet, F.; Vacher, M. Location of an inhabitant for domotic assistance through fusion of audio and non-visual data.
In 5th International Conference on Pervasive Computing Technologies for Healthcare, 2011, pp. 242-245.
180 Chuan-Feng Chiu; Chun-Hong Huang; Hsu, S.J.; Sen-Ren Jan. The Design of a Video-Based OSGi-compliant Remote Home Network Control System. In 4th International Conference on Ubi-Media Computing, 2011, pp. 269-273.

and secure network applications. Java EE extends the Java Platform, Standard Edition, providing APIs for fault tolerance, object-relational mapping, distributed and multi-tier architectures, and web services.

One of the most commonly used base instruments of Java EE is the servlet. A servlet is a Java application (or a part of an application) deployed on an HTTP server, such as Apache Tomcat, to implement a request-response programming model. The servlet approach does not dictate which specific application protocols (e.g. SOAP, REST, proprietary) are to be used, so that is up to the developers; a minimal sketch of a servlet is given below.
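The following is a minimal, hypothetical servlet illustrating the request-response model just described; the class name and query parameter are illustrative only.

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Minimal sketch of the servlet request-response model. Deployed on an
 * HTTP server such as Apache Tomcat, it answers GET requests with plain text.
 */
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/plain");
        // Echo an (illustrative) query parameter back to the client.
        String name = req.getParameter("name");
        resp.getWriter().println("Hello, " + (name != null ? name : "world"));
    }
}
```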
Another popular Java EE instrument is Enterprise JavaBeans (EJB). EJB is a managed, server-side component architecture for the modular construction of enterprise applications. The EJB specification intends to provide a standard way to implement the back-end 'business' code typically found in enterprise applications. Such code addresses the same types of problems, and solutions to these problems are often repeatedly re-implemented by programmers. Enterprise JavaBeans are intended to handle common concerns such as persistence, transactional integrity and security in a standard way, leaving programmers free to concentrate on the particular problem at hand. Enterprise JavaBeans can be exposed as Web services or through the Java Remote Method Invocation interface (a remote procedure call (RPC) based approach). To deploy and run EJB beans, a Java EE application server, such as JBoss, can be used.

6.3.2 UPnP / DLNA

Universal Plug and Play (UPnP) is a set of networking protocols that permits networked devices, such as personal computers, printers, Internet gateways, Wi-Fi access points and mobile devices, to seamlessly discover each other's presence on the network and establish functional network services for data sharing, communications, and entertainment. UPnP focuses on peer-to-peer interaction scenarios without a central server. The UPnP technology is promoted by the UPnP Forum.

The Digital Living Network Alliance (DLNA) is a non-profit collaborative trade organization, established by Sony in June 2003, that is responsible for defining interoperability guidelines to enable sharing of digital media between consumer devices such as TVs, computers, printers, cameras, cell phones, game consoles and other multimedia devices. DLNA uses Universal Plug and Play (UPnP) for media management, discovery and control. UPnP defines the types of device that DLNA supports ("server", "renderer", "controller") and the mechanisms for accessing media over a wireless home network (WLAN). The DLNA guidelines then apply a layer of restrictions over the types of media file format, encodings and resolutions that a device must support. DLNA support is often branded differently by non-Sony manufacturers; e.g. Samsung uses "AllShare" to denote its version of the DLNA implementation.

The most common application of UPnP/DLNA in a home context is media (images/video/audio) sharing over WiFi: any UPnP-enabled device can expose its media so that it can be played on any of the other devices. Some devices also enable remote control, so that media can be pushed for playing from another device (e.g. from a smartphone to a TV). On the application level UPnP uses HTTP-based messaging with XML and SOAP. One problem with UPnP is that it uses HTTP over UDP (instead of TCP), even though this approach is not standardized and is specified only in an Internet-Draft that expired in 2001.

6.3.3 Web Services

Web services are a popular approach to building distributed software systems interoperating over the Internet or even a local network. In principle, a Web service is any software system designed to be accessed remotely via the HTTP protocol. The World Wide Web Consortium (W3C), however, promotes a more restricted viewpoint in which Web Services are required to follow one of several standardized

design approaches to further increase the level of interoperability. At present, two such approaches are recognized: one is known as Service-Oriented Architecture (SOA) or WS-*, while the other is Representational State Transfer (REST).

WS-* was developed with business applications in mind and is often considered too heavy for use in ubiquitous applications. A WS-* service is accessed using Simple Object Access Protocol (SOAP) messaging. SOAP defines an XML-based message envelope and presumes XML-based message content. WS-* services are presumed to have their interfaces described in a machine-processable format, using the Web Services Description Language (WSDL). Such WSDL descriptions are to be published with a Universal Description, Discovery and Integration (UDDI) service broker, to enable service discovery and interface adaptation.

The alternative approach to building Web Services, REST, is growing in popularity, especially among developers of ubiquitous and Web of Things (WoT) applications, due to its light weight and simplicity. REST explicitly allows various Web data formats to be used in addition to XML: JSON, RDF (as N3 or RDF/XML), even CSV. REST also attempts to reduce to an absolute minimum the requirements posed on devices hosting Web Services, to enable, in the future, the deployment of such Web services even on very resource-limited devices such as sensors181. To this end, REST assumes stateless interaction between servers and clients: the server does not have to keep a history of past requests in order to process and answer any given request. Also, instead of requests that are commands consisting of parameter/value pairs, REST uses simple URIs to address a resource plus the typical HTTP commands: GET to retrieve the representation of a resource, POST to create a new resource, PUT to update the state of an existing resource or to create a resource, and DELETE to remove a resource. For example, a request like GET http://<host>:<port>/generic-nodes/1/sensors/temperature intends to retrieve a temperature measurement from a sensor, while a request like PUT http://<host>:<port>/generic-nodes/1/sensors/temperature with, e.g., the JSON-encoded HTTP payload {"value": 25, "unit": "C"} can be used by the sensor itself to push the measurement to a client. REST as such does not prescribe how service discovery, etc., is to take place.

The Web of Things (WoT), mentioned above, is a trend in networking that attempts to integrate physical things (e.g. in a smart home domain) seamlessly with the existing Web infrastructure and to expose connected things uniformly as Web resources. The aim is to reuse the architectural principles of the Web and apply them to the connection with the real world, i.e. with (smart) entities like smart fridges (with embedded computers), smart packages (with RFIDs) and smart rooms (with sensors and actuators), thereby making them first-class citizens of the Web. In other words, instead of designing special solutions for localized interaction (e.g. within a home) like UPnP, the focus is on enabling the traditional Internet/Web to grow and reach into the physical world. Then, e.g., a home network is not an isolated entity but just an Intranet within, and seamlessly connected to, the rest of the Web. As also mentioned, WoT developers tend to favor the REST approach to implementing Web services.
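As an illustration of the GET/PUT interaction just described, here is a minimal client-side sketch. The concrete host address and port are placeholder assumptions, while the resource path and payload follow the example above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch of the RESTful sensor interaction described above.
 * The base address stands in for <host>:<port> of a concrete node.
 */
public class RestSensorExample {
    private static final String BASE = "http://192.168.1.10:8080"; // assumed address

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // GET: retrieve the current temperature representation.
        HttpRequest get = HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/generic-nodes/1/sensors/temperature"))
                .GET().build();
        System.out.println(client.send(get, HttpResponse.BodyHandlers.ofString()).body());

        // PUT: push a new measurement, as the sensor itself might do.
        HttpRequest put = HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/generic-nodes/1/sensors/temperature"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{\"value\": 25, \"unit\": \"C\"}"))
                .build();
        System.out.println(client.send(put, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}
```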
6.3.4 Multi-Agent Systems

A major alternative to the Web Services approach to the development of open distributed systems is multi-agent systems (MAS). A multi-agent system is a system composed of multiple interacting intelligent agents (e.g. software agents) within an environment. The generality of this definition means that, in principle, a set of interacting Web Services can be seen as a kind of MAS. In Web Services, however, one important simplifying assumption is made: that the agents do not share any common resources, deliberate use of each other thus being the only kind of interaction between the agents, and

181 D. Guinard. A Web of Things Application Architecture - Integrating the Real-World into the Web. Ph.D. Thesis, ETH Zurich, 2011.

therefore that just the knowledge of the services provided by each agent on its interface is sufficient for interoperation. In general multi-agent systems, such an assumption is not made. Moreover, much of MAS research and development is directed towards enabling the shared use of resources (physical space, computational resources, etc.) through the communication of intentions, coordination, negotiation, and conflict resolution. The simplifying assumption of Web Services works fine as long as the services are running on different or powerful-enough servers and do not need access to the same physical-world entities. In other cases, MAS can be a better paradigm.

In most cases, a middleware-based architecture is utilized for implementing a MAS. This implies that instances of a middleware platform are deployed on all interacting hosts, and these host agents that have to be programmed using the provided APIs. Agents then communicate using asynchronous messaging, with the middleware guaranteeing that the recipient will be discovered and the message will be delivered. Such an architectural approach is standardized by the IEEE Foundation for Intelligent Physical Agents (FIPA). The FIPA specifications182 describe the services to be provided by the middleware (messaging, agent lifecycle management, white and yellow pages discovery, etc.) and define a set of languages that can be used in agent communication, like FIPA ACL (which defines the structure of a message envelope and can be encoded in XML). One popular FIPA-compliant middleware is JADE (Java Agent DEvelopment Framework)183,184; a minimal agent sketch is given below. It is noticeable that most MAS platforms, like JADE, are Java-based, meaning that it is difficult to directly utilize them for connecting to small devices that do not support Java. It should be noted, however, that nothing in principle prohibits developing some of the agents in a different programming language without the middleware; they will still be able to communicate with the rest of the system as long as common communication protocols are followed.
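To illustrate the programming model, below is a minimal, hypothetical JADE agent; the class name and message content are illustrative only, while the lifecycle and messaging calls follow the standard JADE agent API.

```java
import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.lang.acl.ACLMessage;

/**
 * Minimal sketch of a FIPA-compliant agent on the JADE middleware.
 * It waits for ACL messages and replies with an acknowledgement.
 */
public class MonitoringAgent extends Agent {
    @Override
    protected void setup() {
        System.out.println("Agent " + getLocalName() + " started.");
        addBehaviour(new CyclicBehaviour(this) {
            @Override
            public void action() {
                ACLMessage msg = myAgent.receive(); // non-blocking receive
                if (msg != null) {
                    ACLMessage reply = msg.createReply();
                    reply.setPerformative(ACLMessage.INFORM);
                    reply.setContent("received: " + msg.getContent());
                    myAgent.send(reply);
                } else {
                    block(); // suspend until a message arrives
                }
            }
        });
    }
}
```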
A special case of a tuple space is the triple space, where all the tuples are semantic triples {subject, predicate, object}. A triple space (often implied in the research literature when speaking of a smart space) can be organized simply by deploying an RDF data server, such as Sesame or Virtuoso, and making agents post and read data using the standard SPARQL language over HTTP.

182 http://www.fipa.org/repository/standardspecs.html
183 http://jade.tilab.com/
184 F. L. Bellifemine, G. Caire, and D. Greenwood. Developing Multi-Agent Systems with JADE. Wiley, 2007
185 https://cosm.com/

Some triple-space-based approaches were also developed specifically with resource-limited devices in mind. One notable example is Smart-M3186, developed in the EU ARTEMIS JU's SOFIA project. In Smart-M3, both the agents and the RDF database are very lightweight and programming-language agnostic, with the goal of enabling deployment on a variety of devices, including embedded sensor platforms. An additional design choice is support for the publish-subscribe mechanism not available in typical databases: agents can subscribe to a certain data pattern and then receive automatic notifications whenever a change (triples added or removed) has occurred, thus avoiding continuous polling of the data. A number of Smart-M3 implementations have been developed, varying in reliability, performance, and supported feature set.

186 http://en.wikipedia.org/wiki/Smart-M3
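The following toy sketch illustrates the publish-subscribe idea with an in-memory triple store. It is not the actual Smart-M3 API; the class, triple pattern and callback are invented for illustration only.

```python
# Toy, in-memory illustration of publish-subscribe over a triple space:
# an agent registers a triple pattern and is called back whenever a matching
# triple is added, instead of polling the store. NOT the actual Smart-M3 API.
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard

class TripleSpace:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.subscriptions: List[Tuple[Pattern, Callable[[str, Triple], None]]] = []

    @staticmethod
    def _matches(pattern: Pattern, triple: Triple) -> bool:
        return all(p is None or p == t for p, t in zip(pattern, triple))

    def subscribe(self, pattern: Pattern,
                  callback: Callable[[str, Triple], None]) -> None:
        self.subscriptions.append((pattern, callback))

    def insert(self, triple: Triple) -> None:
        self.triples.add(triple)
        # Notify every subscriber whose pattern matches the new triple.
        for pattern, callback in self.subscriptions:
            if self._matches(pattern, triple):
                callback("added", triple)

space = TripleSpace()
space.subscribe((None, "hasHeartRate", None), lambda change, t: print(change, t))
space.insert(("user1", "hasHeartRate", "72"))  # triggers the callback immediately
```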

7. Decision Support Systems

7.1 Introduction

Clinical Decision Support Systems (CDSSs) have almost 40 years of history. From the first generation of CDSSs such as MYCIN187 and QMR188, through the second generation such as Protégé189, to very recent CDSSs such as a DSS for lower back pain diagnosis190, significant research progress, both theoretical and practical, has been made since the idea of computer-based CDSSs first emerged. Several definitions of the term have been given in the literature; some of them are provided in this section. Musen191 defined a CDSS as any piece of software that takes information about a clinical situation as input and produces inferences as output that can assist practitioners in their decision making and that would be judged as intelligent by the program's users.

Figure 40: The general model of CDSS

Miller and Geissbuhler192 defined a CDSS as a diagnostic tool based on computer-based algorithms that assists a clinician with one or more steps of the diagnosis. Sim et al.193 defined CDSSs as software designed to be a direct aid to clinical decision-making, in which the characteristics of an individual patient are matched to a computerized clinical knowledge base, and patient-specific assessments or recommendations are then presented to the clinician or the patient for a decision. Recently, researchers have been trying to classify CDSSs in the literature so as to provide a holistic view of the field. For example, Berlin et al.194 developed a CDSS taxonomy describing the technical, workflow, and contextual characteristics of CDSSs; the results are very useful for gaining a comprehensive understanding of the various designs and functions of CDSSs.

187 E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN (Elsevier, New York, 1976).
188 R. A. Miller, F. E. J. Masarie, Methods of Information in Medicine 28, 340 (1989).
189 M. A. Musen, J. H. Gennari, H. Eriksson, S. Tu, A. R. Puerta, Medinfo 8, 766 (1995).
190 L. Lin, P. J.-H. Hu, O. R. Liu Sheng, Decision Support Systems 42, 1152 (2006).
191 M. A. Musen, in Handbook of Medical Informatics, J. H. van Bemmel, M. A. Musen, Eds. (Bohn Stafleu Van Loghum, Houten, 1997).
192 R. A. Miller, A. Geissbuhler, in Clinical Decision Support Systems, E. S. Berner, Ed. (Springer-Verlag, New York, 1999), pp. 3-34.
193 I. Sim et al., J Am Med Inform Assoc 8, 527 (2001).
194 A. Berlin, M. Sorani, I. Sim, Journal of Biomedical Informatics 39, 656 (2006).

7.2 Data to Information to Outcome: Lifecycle Management

Before proceeding to the description of the literature on CDSSs, a short review is given of the data pre-processing steps, mainly the various data feature (attribute) selection algorithms and sensor fusion techniques.

7.2.1 Feature Selection

The preliminary procedure of feature selection allows the definition of the most relevant markers from the collected data as the most important dimensions of the measurements. The primary goal of marker selection is to find a (relatively small) set of markers that describes the domain of interest fairly well, both in terms of classification accuracy and marker quality. The first attribute (accuracy) relates to the ability of the selected markers to successfully classify samples into their correct class, whereas the quality attribute reflects the ability of each marker to clearly differentiate its measurement among the states of interest.

Feature selection methods can be roughly divided into two categories195, i.e. filter and wrapper approaches. In filter schemes, features are ranked in a pre-processing step according to some coefficient independent of the classification method. Thus, each feature is ranked individually, based on its discriminative power among classes as indicated by the t-test, fold-change, or other metrics of class differences196. In wrapper approaches, by contrast, a classifier is used to generate scores that serve as the feature ranking criterion. Features are thus considered as groups instead of individuals, taking into account the group's classification power rather than per-feature class-discriminative power. Wrapper approaches are very much dependent on the classification outcome, since they follow a recursive process in which feature weights are re-evaluated in every classification cycle. Thus, wrapper approaches focus on accuracy, whereas filter methods consider mainly the quality aspects of the feature selection process.

Among the various feature selection methods proposed, RFE-SVM197 is an approach that has shown remarkable results on various large datasets. It uses a linear kernel to estimate the weight vector of the separating hyperplane, and this vector is then applied as the ranking criterion for features. However, depending on the data distribution and the complexity of the classification problem, an algorithmic design based on the philosophy of RFE-SVM may lead to ill-defined and ill-distinctive clusters of markers. A modified RFE recursion based on a hybrid criterion, employing a learning procedure that takes the quality aspect of features into consideration, produces more compact and distinct clusters of markers198.

195 M. Blazadonakis, A. Perperoglou and M. Zervakis, (2007), Using a Single Neuron as a Marker Selector - A Breast Cancer Case Study, In Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon 2007, 4219-4222.
196 M. J. Van De Vijver, Y. D. He, L. J. Van't Veer, et al., (2002), A gene expression signature as a predictor of survival in breast cancer. The New England Journal of Medicine, 347.
197 I. Guyon, J. Weston, S. Barnhill and V. Vapnik, (2002), Gene selection for cancer classification using support vector machines, Machine Learning, 46, 389-422.
198 M. E. Blazadonakis, M. Zervakis, M. Kounelakis, E. Biganzoli and N. Lama, (2006), Support Vector Machines and Neural Networks as Marker Selectors for Cancer Gene Analysis, In Proceedings of the 3rd International IEEE Conference on Intelligent Systems, London 2006, 626-631.
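As an illustration of the wrapper approach, the following sketch runs recursive feature elimination with a linear SVM, in the spirit of RFE-SVM, using the scikit-learn library on synthetic data; the dataset sizes and parameter choices are arbitrary.

```python
# Sketch of wrapper-style feature selection with RFE and a linear SVM:
# the weight vector of the separating hyperplane ranks the features,
# and the lowest-ranked features are dropped in each recursion cycle.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)

selector = RFE(estimator=SVC(kernel="linear"),  # linear kernel exposes the weight vector
               n_features_to_select=5,          # size of the final marker set
               step=1)                          # features eliminated per cycle
selector.fit(X, y)

print("selected feature indices:",
      [i for i, kept in enumerate(selector.support_) if kept])
print("ranking of first 10 features:", selector.ranking_[:10])
```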

7.2.2 Sensor fusion

Generally speaking, sensors, sensor nodes, and communication links are not always reliable, or even available, during some time periods. By combining individual sensors and sensor nodes, and involving more than one sensor node in the classification process, it is ensured that there are always some sensor nodes contributing to the classification process and compensating for the errors. The fusion-based approach uses the basic notions of the local approach and lets individual sensor nodes first classify and detect activities on their own. The classification results are then all sent to a fuser/voter node (e.g., a cluster head) to reach a consensus.

The most commonly used sensor fusion methods are classical199 or Bayesian inference200, the Dempster-Shafer theory of evidence201, voting, fuzzy logic and neural networks. According to recent research efforts, effective sensor fusion can be used to combine the strengths of several kinds of sensors (e.g. wearable and unobtrusive sensors) by fusing their sensory data at the hardware, raw data, feature, or decision level202. At the hardware level, fusion can be achieved by using simple thresholds203. In data-level fusion, the raw output data of the sensors are combined, and dimensionality reduction such as Principal Component Analysis (PCA) is often applied before further pattern classification techniques. Feature fusion is performed on feature vectors extracted from each sensor; this approach is also used in various multi-modal application areas such as Human Computer Interaction and Affective Computing204. Decision fusion is performed by combining the outputs of the classifiers for each signal205; it involves the combination of high-level sensor output data (e.g. events). The use of sensor fusion at the decision level facilitates an extensible sensor system, because the number and types of sensors are not limited.

199 David Lee Hall, Mathematical Techniques in Multisensor Data Fusion, Artech House Inc., 1992, ISBN 0-89006-558-6
200 Lawrence A. Klein, Sensor and Data Fusion Concepts and Applications (second edition), SPIE Optical Engineering Press, 1999, ISBN 0-8194-3231-8
201 Huadong Wu, Mel Siegel, Rainer Stiefelhagen, and Jie Yang, "Sensor Fusion Using Dempster-Shafer Theory," presented at IEEE International Measurement Technology Conference (IMTC) 2002, Anchorage AK USA, 2002
202 McCullough C L, Dasarathy B V, Lindberg P C (1996) Multi-level sensor fusion for improved target discrimination, Proceedings of the 35th IEEE Conference on Decision and Control, vol. 4, pp. 3674-3675, 11-13 Dec 1996
203 Laerhoven K V, Gellersen H W, Malliaris Y G (2006) Long term activity monitoring with a wearable sensor node. International Workshop on Wearable and Implantable Body Sensor Networks (BSN2006), 3-5 April 2006
204 Kim, J.: Bimodal emotion recognition using speech and physiological changes. Robust Speech Recognition and Understanding, I-Tech Education and Publishing, Vienna, Austria (2007)
205 Ahiskali M, Green D, Kounios J, Clark CM, Polikar R (2009) ERP based decision fusion for AD diagnosis across cohorts. Conf Proc IEEE Eng Med Biol Soc 1: 2494-2497.
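A minimal sketch of decision-level fusion by majority voting, one of the fusion methods listed above, follows; the activity labels and node outputs are invented for illustration.

```python
# Minimal sketch of decision-level fusion by majority voting: each sensor
# node classifies an activity locally, and a fuser/voter node combines the
# local decisions, compensating for individual node errors or outages.
from collections import Counter

def fuse_by_voting(local_decisions):
    """Return the majority class among the per-node classification results."""
    votes = Counter(d for d in local_decisions if d is not None)  # skip failed nodes
    return votes.most_common(1)[0][0] if votes else None

# Three wearable nodes classify the same time window; one node is wrong.
print(fuse_by_voting(["walking", "walking", "sitting"]))  # -> "walking"
# An unavailable node simply contributes no vote.
print(fuse_by_voting(["walking", None, "walking"]))       # -> "walking"
```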
7.2.3 Data mining methods

As shown in the general CDSS model above (Figure 40), i) the knowledge representation scheme of a certain medical domain, along with ii) an appropriate inference engine, form the heart of a computerized CDSS. A short review of these two core components follows in the next sections.

7.2.4 Knowledge representation

According to Carter (1999)206, knowledge representation is the key factor in formalizing domain-specific information in the most efficient way for the expert system to analyse further. A possible classification of knowledge representation schemes is as follows: logic, procedural, graph/network and structured.

206 J. H. Carter, in Clinical Decision Support Systems, E. S. Berner, Ed. (Springer-Verlag, New York, 1999) pp. 169-198

Logic-based representations are declarative in nature, in that they consist of true or false propositions, and all problems are resolved through a standard logic inference mechanism, which is simply a look-up of known facts (Carter, 1999). Procedural knowledge representation, on the other hand, by introducing rules, offers the ability to perform diagnosis and drive decision-making (Carter, 1999). In addition to simple rules, many CDSSs also incorporate probability theory in order to describe uncertainty. Fuzzy logic207 208 and Bayes' rule209 have been used by researchers in designing knowledge representation schemes. However, Bayes' rule has a limitation stemming from the fact that most clinical signs and symptoms are not independent.210

Graph/network knowledge representations include Bayesian belief networks,211 212 decision trees,213 214 artificial neural networks215 and semantic networks.216 Adopting a Bayesian network as the representation scheme allows one to explicitly take advantage of conditional independencies from the modelling viewpoint, and to rely on several powerful algorithms for probabilistic inference.217 Decision trees are frequently used in guideline-based CDSSs targeted at therapeutic recommendations, such as the EsPeR system.218 The main advantage of decision trees is that they are simple to understand and interpret, but they have to be combined with other representation schemes. Types of artificial neural networks used in CDSSs include feed-forward neural networks, recurrent networks, stochastic neural networks, and modular neural networks. The greatest advantage of artificial neural networks is their ability to learn from the observed data; the disadvantage is that they are unable to provide a reliable and logical representation of knowledge beyond their learnt zones.

Structured knowledge is organized and managed in database management systems (DBMSs). Relational database schemas are the most common approach to recording patient history data and clinical signs and symptoms. However, several CDSSs use object-oriented database management systems (OODBMSs) to store medical knowledge that is difficult to express with the data types of relational databases.219 DBMSs could be the ideal mechanism for manipulating declarative and procedural knowledge, with or without uncertainty. However, a DBMS has a major drawback: although its structured query language (SQL) can query existing data, create new records, update existing ones and delete information, it lacks a knowledge inference mechanism to reason and draw logical conclusions from the data.

A semantic network, or net, is a graphic notation for representing knowledge in patterns of interconnected nodes and arcs. Semantic networks are a declarative graphic representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge.

207 S. Shiomi et al., J Nucl Med 36, 593 (1995).
208 S. Suryanarayanan, N. P. Reddy, E. P. Canilang, International Journal of Bio-Medical Computing 38, 207 (1995).
209 H. R. Warner, Methods of Information in Medicine 28, 370 (1989).
210 S. A. Spooner, in Clinical Decision Support Systems, E. S. Berner, Ed. (Springer-Verlag, New York, 1999) pp. 35-60.
211 R. Montironi, P. H. Bartels, D. Thompson, M. Scarpelli, P. W. Hamilton, Analytical and Quantitative Cytology and Histology 16, 101 (1994).
212 P. H. Bartels, D. Thompson, M. Bibbo, J. Weber, Analytical and Quantitative Cytology and Histology 14, 459 (1992).
213 P. Kokol, M. Mernik, J. Zavrnik, K. Kancler, M. Ivan, Journal of Medical Systems 18, 201 (1994).
214 V. Podgorelec, P. Kokol, in Engineering in Medicine and Biology Society, 1998. Proceedings of the 20th Annual International Conference of the IEEE (Hong Kong, 1998), vol. 3, pp. 1202-1205.
215 W. Baxt, Annals of Internal Medicine 115, 843 (1991).
216 Abdel-Badeeh M. Salem, Marco Alfonse, Ontology versus Semantic Networks for Medical Knowledge. 12th WSEAS Int'l Conf. on COMPUTERS, Heraklion, Greece, July 23-25, 2008.
217 P. T. Stefania Montani, International Journal of Intelligent Systems 21, 585 (2006).
218 I. Colombet et al., International Journal of Medical Informatics 74, 597 (2005).
219 F. Pinciroli, C. Combi, G. Pozzi, Medical Informatics 17, 231 (1992).
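To make the use of Bayes' rule concrete, the following worked example computes the posterior probability of a disease given one positive finding. The prevalence, sensitivity and specificity values are illustrative only, and chaining several findings this way would rest on exactly the independence assumption whose limitation is noted above.

```python
# Worked example of Bayes' rule in a diagnostic setting: the posterior
# probability of a disease given a positive finding, from the prior
# (prevalence) and the sensitivity/specificity of that finding.
def posterior_given_positive(prior, sensitivity, specificity):
    """P(D | +) = P(+|D) P(D) / [P(+|D) P(D) + P(+|not D) P(not D)]."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# Illustrative numbers: 2% prevalence, 90% sensitivity, 95% specificity.
print(round(posterior_given_positive(0.02, 0.90, 0.95), 3))  # ~0.269
```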

The main advantages of semantic networks can be summarized as follows: i) hierarchical organization of knowledge; ii) semantic networks in the form of massively parallel networks of agents may provide the underlying framework for modelling reflexive reasoning (is-a link); iii) they are easy to visualize; and iv) related knowledge is easily clustered. On the other hand, semantic networks exhibit ambiguities: there are no formal semantics for semantic networks, the same network admits different interpretations, and there are no standards for node and arc values. Semantic networks lack a common understanding of semantics, which leads us to ontologies; these will be discussed in the following sections with regard to the possible use of semantic inference techniques in the USEFIL DSS.

7.2.5 Inference mechanisms

From the available literature it emerges that the most frequently used inference mechanisms for CDSSs include rules, Bayesian inference, Bayesian belief networks, heuristics, semantic networks, neural networks, genetic algorithms and several other problem-specific approaches.

In rule-based CDSSs, sequences of if-then rules are processed, resulting in a true or false statement. Forward and backward chaining of rules may be used to conclude a diagnosis and provide diagnostic explanations for clinical users.220 Bayesian systems predict the posterior probability of diagnoses based on the prior disease probabilities and the sensitivity and specificity of confirmed clinical signs and symptoms.221 Bayesian belief networks are often created as reformulations of traditional Bayesian representations and can provide many of the same browsing and explanation capabilities as traditional systems222. Heuristic systems include statistical learning methods; models proposed in the literature use, for example, the support vector machine (SVM)223 and the least squares support vector machine (LSSVM).224 As regards semantic networks, since most medical knowledge is ill-structured and involves uncertainty, it is difficult to use a pure semantic network to make clinical inferences in CDSSs. Neural networks are frequently used as an inference mechanism, since during the development of such a CDSS one is not required to understand the relationship between input and output variables: neural networks are a black-box modelling technique that captures relationships by learning from historical data, whereas developers of CDSSs based on Bayesian networks need sufficient domain knowledge, including the related probabilities. According to Li et al.225, neural networks prove to be a better solution than conventional statistical techniques in the case of a complex non-linear CDSS, such as a traumatic brain injury CDSS. The disadvantage of a neural network is that the rules the network uses do not follow a particular logic and are not explicitly understandable. Genetic algorithms226 227 can extract the best solutions through an iterative process, so that an optimal solution, the fittest, can be reached; the main challenge, however, is to define the fitness metric correctly.

220 E. H. Shortliffe, L. E. Perreault, Medical Informatics (Addison Wesley Publishing, Reading, MA, 1990).
221 H. R. Warner, Computer-Assisted Medical Decision Making (Academic Press, New York, 1979).
222 Y. C. Li, P. J. Haug, H. R. Warner, in Proceedings of the Annual Symposium on Computer Application in Medical Care (American Medical Informatics Association, 1994) pp. 765-769.
223 L. Guo, W. Yan, Y. Li, Y. Wu, X. Shen, paper presented at the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2005.
224 E. Comak, K. Polat, S. Gunes, A. Arslan, Expert Systems with Applications 32, 409 (2007).
225 Y.-C. Li, L. Liu, W.-T. Chiu, W.-S. Jian, International Journal of Medical Informatics 57, 1 (2000).
226 J. W. Grzymala-Busse, L. K. Woolery, in Proceedings of the Annual Symposium on Computer Application in Medical Care (American Medical Informatics Association, 1994) pp. 730-734.
227 M. Levin, M.D. Computing: Computers in Medical Practice 12, 193 (1995).
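The following toy sketch illustrates forward chaining of if-then rules as used in rule-based CDSSs: rules fire repeatedly over a set of known facts until no rule adds anything new. The rules and facts are invented for illustration and carry no clinical meaning.

```python
# Toy forward-chaining inference: apply if-then rules to known facts until
# a fixed point is reached. Rules and facts are purely illustrative.
RULES = [
    ({"fever", "cough"}, "respiratory_infection_suspected"),
    ({"respiratory_infection_suspected", "low_spo2"}, "refer_to_physician"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                       # keep firing rules until nothing new
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)    # the rule fires and asserts its conclusion
                changed = True
    return facts

print(forward_chain({"fever", "cough", "low_spo2"}, RULES))
```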

Recent studies have implemented combinations of the aforementioned inference mechanisms to provide medical reasoning, such as LSSVM with fuzzy weighting228 and artificial neural networks with fuzzy weighting.229 A separate section follows in which ontologies and semantic inference mechanisms are discussed as a further CDSS solution.

7.3 Semantic Inference and Personalization Technologies

In USEFIL, information coming from different sensors will be fused into a coherent and personalized representation of the person's status. This will be based on semantic inference and personalization technologies. Such technologies are typically applied to multimedia content interpretation, fusing information from the analysis of the individual modalities (text, audio, images, and video). The overall approach is to use single-source analysis results to populate a semantic model of the domain and then use inference to build a comprehensive abstract interpretation of the multimedia scene, not only putting together the data from different sources but also detecting inconsistencies and inferring implicit information. The semantic model encodes background information about the domain, encoding expectations about what can and cannot happen, as well as defining high-level concepts in terms of low-level analysis results.230 Such a semantic model representation could be realised through event algebras, which have been adopted in recent surveys231 232 for sensor data fusion and event recognition (pattern matching) purposes. A properly structured event algebra can define the semantics of complex events, allowing easy adaptation to different application needs and supporting the integration of applications that combine events from heterogeneous data sources (e.g. sensors).

One of the most prominent formalisms for representing semantic knowledge is OWL.233 OWL is closely coupled to Description Logic (DL), the fragment of first-order predicate logic that is sufficient to reason over OWL ontologies.234 In order to better represent the uncertainty that is inherent in content analysis as well as in the domain itself, state-of-the-art semantic fusion relies on probabilistic235 and fuzzy236 extensions of OWL inference.

228 E. Comak, K. Polat, S. Gunes, A. Arslan, Expert Systems with Applications 32, 409 (2007).
229 K. Polat, S. Gunes, Digital Signal Processing 16, 913 (2006).
230 Castano, S.; Espinosa, S.; Ferrara, A.; Karkaletsis, V.; Kaya, A.; Melzer, S.; Moller, R.; Montanelli, S.; Petasis, G. "Ontology Dynamics with Multimedia Information: The BOEMIE Evolution Methodology". In Proc. Int'l Workshop on Ontology Dynamics (IWOD-07), 4th European Semantic Web Conference (ESWC-07). Innsbruck, Austria, June 2007.
231 Alexander Artikis, Georgios Paliouras, François Portet, Anastasios Skarlatidis: Logic-based representation, reasoning and machine learning for event recognition. DEBS 2010: 282-293
232 Cugola and Margara. Processing flows of information: from data stream to complex event processing. ACM Computing Surveys, 2012.
233 W3C OWL Working Group, "OWL 2 Web Ontology Language". W3C Recommendation, 27 October 2009. http://www.w3.org/TR/owl2-overview
234 Baader, F.; Calvanese, D.; McGuinness, D.; Nardi, D.; Patel-Schneider, P. "The Description Logic Handbook: Theory, Implementation and Applications". Cambridge University Press, 2003.
235 Gries, O.; Möller, R.; Nafissi, A.; Rosenfeld, M.; Sokolski, L.; Wessel, M. "A Probabilistic Abduction Engine for Media Interpretation based on Ontologies". In Proc. 4th International Conference on Web Reasoning and Rule Systems (RR 2010). Bressanone, Italy, September 2010.
236 A) Konstantopoulos, S. and Apostolikas, G. "Fuzzy-DL reasoning over unknown fuzzy degrees". In Proceedings of 3rd International IFIP Workshop on Semantic Web & Web Semantics (IFIP SWWS 2007), Vilamoura, Portugal, November 2007. LNCS 4806, Springer, 2007. B) Konstantopoulos, S. and Charalambidis, A. "Formulating Description Logic learning as an Inductive Logic Programming task". In Proc. 2010 IEEE Conference on Fuzzy Systems (FUZZ-IEEE 2010), Barcelona, 18-23 Jul 2010.

Besides supporting information extraction applications, fusion and interpretation results are also used to refine and evolve the domain knowledge that was used to achieve fusion and interpretation in the first place, placing domain knowledge, fusion, and interpretation in a spiral of increasing accuracy. This methodology was introduced in the BOEMIE project237 238 and, in USEFIL, will be extended to personalize, rather than correct, the domain model.

A complementary approach to fusing sensor data concerns symbolic activity recognition (a survey of related techniques was presented earlier). Languages such as the Event Calculus are used for semantic inference over the multi-modal sensor data. Such languages have built-in rules for temporal representation and reasoning, which is quintessential for this type of semantic inference. Furthermore, they have direct routes to reasoning under uncertainty, as well as direct routes to machine learning, allowing for dynamic knowledge refinement.

7.3.1 Ontologies: User & Domain Modelling

Domains of interest are nowadays mostly modelled in terms of ontologies, which provide conceptualizations of the domain knowledge, but also of more generic knowledge. As ontologies represent the shared understanding of some domain of interest, and thus are the result of a social process239, their development is anything but a trivial task and requires a well-elaborated and mature methodology. A number of relevant projects have dealt with issues related to sensor networks and smart home context representation based on ontologies.

7.3.1.1 Smart Home contextual representation

Until now, many endeavours have been made towards the vision of functional smart homes in the sense of adaptation, personalization and context-awareness. However, there seems to be a gap between the heterogeneous technologies that are part of a smart home infrastructure. This probably stems from the fact that there is no common, sufficiently expressive data model at both the data and application levels. The lack of semantics, and the great difficulty in sharing and integrating multiple data sources, has led to low potential for performing intelligent data analysis and for enriching the existing knowledge base with new patterns. Nevertheless, numerous approaches have tried to support smart homes by creating a more flexible and configurable data model enriched with semantic information.

Latfi et al.240 partitioned the domain of a tele-health monitoring system into subparts, thus creating numerous ontologies, along with properties called meta-relations in order to model the relations between the ontologies. The most important ontologies used to model the context of the smart home application domain are: i) the Habitat ontology, used to describe the household environment, e.g. rooms; ii) the PersonAndMedicalHistory ontology, dedicated to modelling the user and his/her medical history, consisting of a description of deficits, medication and possible risk factors for cognitive decline; iii) the Behaviour ontology, which addresses life habits and critical physiological parameters; and iv) the Software ontology, which represents the software components of the smart home in such a way that they can be easily understood and reused.

237 Kosmopoulos, D.; Petridis, S.; Pratikakis, I.; Gatos, V.; Perantonis, S.; Karkaletsis, V.; Paliouras, G. "Knowledge Acquisition from Multimedia Content using an Evolution Framework". In Proc. 3rd IFIP Conference on Artificial Intelligence Applications & Innovations (AIAI-06). Athens, 7-9 June, 2006.
238 Castano, S.; Espinosa, S.; Ferrara, A.; Karkaletsis, V.; Kaya, A.; Melzer, S.; Moller, R.; Montanelli, S.; Petasis, G. "Ontology Dynamics with Multimedia Information: The BOEMIE Evolution Methodology". In Proc. Int'l Workshop on Ontology Dynamics (IWOD-07), 4th European Semantic Web Conference (ESWC-07). Innsbruck, Austria, June 2007.
239 Davies, J., Fensel, D., & Harmelen, F., Towards the Semantic Web: Ontology-Driven Knowledge Management, John Wiley and Sons, 2003.
240 Latfi, F., Lefebvre, B., Descheneaux, C.: Ontology-Based Management of the Telehealth Smart Home, Dedicated to Elderly in Loss of Cognitive Autonomy, Proceedings of the OWLED 2007 Workshop on OWL: Experiences and Directions

A good example of use within the proposed approach is the adaptation of the user interfaces through which the senior communicates with the outside world according to his/her deficiencies: these are described in the PersonAndMedicalHistory ontology, which is interlinked with the Software ontology, thus feeding intelligent user interfaces with the appropriate information.

Klein et al.241 proposed a shared context ontology focusing on state properties of the people under monitoring and the environment, on detecting events, and on scheduling system responses. Instances of states are enriched with temporal metadata so that the current status can be evaluated against former information. The research of Chen et al.242 shares some consensus with 243 in ontology modelling (including ADLs, inhabitants, devices (sensors and/or actuators), services and applications) and with 244 in the role and use of ontologies. However, it differs significantly from these works in that ontologies are regarded as a conceptual backbone and a common vehicle for enabling and supporting communication, interaction, interoperability, integration and reuse among devices, environments, inhabitants and external sources.

In 245, context is separated into five categories in order to serve personalized healthcare advice. Personal health, environment conditions, task, spatio-temporal and terminal contexts can be of great value to a personalization engine in order to reason about personal health conditions and make health suggestions at the right time and with the right presentation (accessibility is of utmost importance) based upon the available media.

A context-aware system for ubiquitous computing has been introduced by Ko et al.246 Context was formalized into three ontologies: a person context ontology, a device context ontology and an environment context ontology. Psychological context is represented in the form of a subclass, namely the Emotion context ontology, classified into: No Emotion, Anger, Hate, Platonic Love, Romantic Love, Joy, Grief and Reverence. Two inference mechanisms are incorporated in order to produce high-level context from low-level contextual sensor values: axiomatic semantics-based inference and domain-specific rules over the ontology. The latter serves as a personalized alarm-triggering mechanism, as it recognizes the health status of the user and adjusts its logic according to user status changes.

Another ontological representation of context has been proposed by Ricquebourg et al.247 SWRL rules were used as a tool for inference over the ontology content, using first-order logic connectives (AND, IMPLY, NEGATION) and more advanced operators, such as comparison operators (e.g. swrlb:lessThan) and mathematical operators (addition, subtraction, etc.).

241 Klein, M., Schmidt, A., Lauer, R.: Ontology-Centred Design of an Ambient Middleware for Assisted Living: The Case of SOPRANO, in Proceedings of Towards Ambient Intelligence: Methods for Cooperating Ensembles in Ubiquitous Environments (AIM-CU), 30th Annual German Conference on Artificial Intelligence (KI 2007), Osnabrück, (2007)
242 Chen, L., Nugent, C.D., Mulvenna, M., Finlay, D. and Hong, X. (2009) Semantic Smart Homes: Towards Knowledge Rich Assisted Living Environments, Studies in Computational Intelligence, Vol. 189, pp. 279-296.
243 Latfi, F., Lefebvre, B., Descheneaux, C.: Ontology-Based Management of the Telehealth Smart Home, Dedicated to Elderly in Loss of Cognitive Autonomy, Proceedings of the OWLED 2007 Workshop on OWL: Experiences and Directions
244 Klein, M., Schmidt, A., Lauer, R.: Ontology-Centred Design of an Ambient Middleware for Assisted Living: The Case of SOPRANO, in Proceedings of Towards Ambient Intelligence: Methods for Cooperating Ensembles in Ubiquitous Environments (AIM-CU), 30th Annual German Conference on Artificial Intelligence (KI 2007), Osnabrück, (2007)
245 D. Zhang, Z. Yu, C. Y. Chin, Context-Aware Infrastructure for Personalized Healthcare, International Workshop on Personalized Health, IOS Press, Dec. 13-15, 2004.
246 E-J. Ko, H-J. Lee, J-W. Lee, 2007, Ontology-Based Context Modeling and Reasoning for U-Healthcare, IEICE Transactions on Information and Systems 2007, 90(8), pp. 1262-1270.
247 V. Ricquebourg, D. Durand, D. Menga, B. Marhic, L. Delahoche, C. Logé, and A.-M. Jolly-Desodt, "Context Inferring in the Smart Home: An SWRL Approach", presented at the 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW '07), 2007.

Much research has also been done in the domain of monitoring chronic patients; such work is presented in 248 and 249. In 250, an ontology-based context model and a related context-management middleware are described, providing reasoning mechanisms for handling alarm situations, whereas a context model for detecting and handling agitation in people suffering from dementia is described in 251.

Figure 41: CONON conceptual model

Apart from the above-mentioned test cases, there are two upper ontologies that are used as references in context-aware domain modelling middleware: the SOUPA ontology252 and the CONON ontology.253 SOUPA consists of two distinct but related sets of ontologies: SOUPA Core and SOUPA Extension. The SOUPA core ontologies define generic semantics that have a universal character for pervasive computing applications, while the SOUPA extension ontologies define additional, more specific vocabularies and provide examples of how to define new ontology extensions. The SOUPA core ontology consists of nine ontology documents. Together, these documents define vocabularies for describing person contact information (FOAF ontology254), beliefs, desires and intentions of an agent, actions, policies, time (DAML-Time and the entry sub-ontology of time), space (OpenCyc and OpenGIS255), and events.

248 Paganelli, F., Giuli, D.: An Ontology-based Context Model for Home Health Monitoring and Alerting in Chronic Patient Care Networks. First International Workshop on Smart Homes for Tele-Health, AINA (2007)
249 Fook, V.F.S., Tay, S.C., Jayachandran, M., Biswas, J., Zhang, D.: An Ontology-based Context Model in Monitoring and Handling Agitation Behaviour for Persons with Dementia. Proc. of Fourth International Conference on Pervasive Computing and Communications, Workshop on Ubiquitous and Pervasive HealthCare (2006)
250 Paganelli, F., Giuli, D.: An Ontology-based Context Model for Home Health Monitoring and Alerting in Chronic Patient Care Networks. First International Workshop on Smart Homes for Tele-Health, AINA (2007)
251 Fook, V.F.S., Tay, S.C., Jayachandran, M., Biswas, J., Zhang, D.: An Ontology-based Context Model in Monitoring and Handling Agitation Behaviour for Persons with Dementia. Proc. of Fourth International Conference on Pervasive Computing and Communications, Workshop on Ubiquitous and Pervasive HealthCare (2006)
252 Harry Chen, Philip Perich, Tim Finin, and Anupam Joshi, The SOUPA Ontology for Pervasive Computing, in Valentina Tamma, Stephen Cranefield, Tim Finin, and Steven Willmott (Eds.), Ontologies for Agents: Theory and Experiences (London, UK: Springer-Verlag, 2005).
253 Xiao Hang Wang, Da Qing Zhang, Tao Gu, and Hung Keng Pung, Ontology based context modeling and reasoning using OWL, Proc. of the 2nd IEEE Annual Conference on Pervasive Computing and Communications Workshops, Orlando, Florida, USA, 2004, 18-22.
254 Edd Dumbill. Finding friends with XML and RDF. In IBM developerWorks, XML Watch, June 2002.

The context model of the CONON ontology is structured around a set of abstract entities, each describing a physical or conceptual object, including Person, Activity, Computational Entity (CompEntity) and Location, as well as a set of abstract sub-classes.

7.3.1.2 Semantic Sensor Networks

7.3.1.2.1 OGC Sensor Web Enablement

The Sensor Web is an initiative started by the Open Geospatial Consortium (OGC) and forms a web-enabled infrastructure for collecting, modelling, storing, retrieving, sharing, manipulating, analysing, and visualizing information about sensors and sensor observations of phenomena. The OGC describes the Sensor Web as web-accessible sensor networks and archived sensor data that can be discovered and accessed using standard protocols and application program interfaces. A suite of specifications and standards has been developed by the Sensor Web Enablement (SWE)256 initiative in order to provide common sensor data models and reusable sensor Web services that enable sensors to be accessible and controllable via the Web. The core suite of language and service interface specifications includes the following:

Data modelling languages:
o Observations and Measurements (O&M). Real-time observations and measurements obtained by sensors are encoded in an XML-style format.
o Sensor Model Language (SensorML). XML-driven data models for the description of sensors and their processes.
o Transducer Model Language (TML). An XML-style schema for describing transducers and supporting real-time streaming of data to and from sensor systems.

Web services suite (middleware):
o Sensor Observation Service (SOS). A Web service interface that provides access to, and several other actions on, observation data. It is also the intermediary between a client and an observation repository or a near-real-time sensor channel.
o Sensor Planning Service (SPS). A Web service interface for requesting user-driven acquisitions and observations. It is also the intermediary between a client and a sensor collection management environment.
o Sensor Alert Service (SAS). A Web service interface for publishing and subscribing to alerts from sensors.

255 Simon Cox, Paul Daisey, Ron Lake, Clemens Portele, and Arliss Whiteside. Geography Markup Language (GML 3.0). In OpenGIS Documents. OpenGIS Consortium, 2003.
256 Mike Botts et al., OGC Sensor Web Enablement: Overview and High Level Architecture (OGC 07-165), Open Geospatial Consortium white paper, 28 Dec. 2007.

o Web Notification Service (WNS). A Web service interface for asynchronous delivery of messages or alerts from SAS and SPS Web services and other elements of service workflows.

7.3.1.2.2 Semantic Sensor Web

The Semantic Sensor Web is a framework that enriches the existing SWE language standards with semantic annotations and provides the capability to further analyse context or situation awareness. Semantic Web technologies, such as RDF/OWL-based context description along with semantic inference techniques (e.g. SWRL), allow for the analysis of, interoperability between, and reasoning over heterogeneous multimodal sensor data. In order to enable semantic annotation within the existing XML-based languages for describing sensor data, RDFa257 is introduced into XML documents by adding a set of attributes that enable the mapping of existing descriptions to RDF triples; for example, a time stamp such as 2012-01-03T05:00:00 inside an observation document can be annotated with RDFa attributes so that it is exposed as an RDF triple.

A number of ontologies have been developed in order to describe sensors and their observations/measurements semantically. A review by Compton et al.258 studied a number of research efforts on the ontological modelling of knowledge representation in the sensor networks domain. These are listed in the table below.

Table 9: Ontologies for the semantic representation of sensors and observations

Reference | Purpose
Avancha et al.259 | adaptive sensor networks
Matheus et al.260 | pedigree (provenance)
OntoSensor261 | knowledge base and inference
Eid et al.262 | searching heterogeneous sensor network data
Kim et al.263 | services
CESN264 | inferring domain knowledge from data
SWAMO265 | intelligent agents
A3ME266 | resource-constrained devices
ISTAR267 | task assignment
OOSTethys268 | integrating standards-compliant Web services
MMI269 | interoperability
CSIRO270 | data integration, search, classification and workflows
SSN ontology271 | sensor assets

7.3.2 Reasoning through ontologies

7.3.2.1 Knowledge Representation, Reasoning & Querying

For knowledge representation, OWL, the Web Ontology Language, has already been referenced in the sections above. OWL is an instance of the description logic (DL) family of knowledge representation languages. Knowledge is described in terms of classes (or concepts), properties (or roles), and instances (or individuals). Sound, complete, and terminating decision procedures for the basic inference problems have been implemented in highly optimized DL reasoners, such as FaCT (http://www.cs.man.ac.uk/~horrocks/FaCT), Racer (http://www.racer-systems.com) and Pellet (http://pellet.owldl.com/).

OWL builds on top of the Resource Description Framework, RDF (www.w3.org/RDF/), which can be seen as a general-purpose language for representing information. The basic RDF data model is a triple of the form {subject, predicate, object}. RDF Schema (RDFS) then states how to describe RDF vocabularies, and furthermore defines other built-in RDF vocabulary, such as rdfs:domain or rdfs:subClassOf. RDF and RDFS have been given a precise semantics in terms of axiomatic triples (or facts) and entailment rules (or inference rules), for instance in 272.

257 http://www.w3.org/2006/07/SWD/RDFa/
258 Compton, M., Henson, C., Lefort, L., Neuhaus, H.: A survey of the semantic specification of sensors. Technical report (2009) [Available online at http://lists.w3.org/Archives/Public/public-xg-ssn/2009Aug/att-0037/SSN-XG StateOfArt.pdf]
259 S. Avancha, C. Patel, and A. Joshi. Ontology-driven adaptive sensor networks. In 1st Annual International Conference on Mobile and Ubiquitous Systems, Networking and Services, 2004.
260 C. J. Matheus, D. Tribble, M. M. Kokar, M. G. Ceruti, and S. C. McGirr. Towards a formal pedigree ontology for level-one sensor fusion. In 10th International Command & Control Research and Technology Symposium, 2005.
261 D. Russomanno, C. Kothari, and O. Thomas. Building a sensor ontology: a practical approach leveraging ISO and OGC models. In 2005 International Conference on Artificial Intelligence (vol 2), 2005.
262 M. Eid, R. Liscano, and A. E. Saddik. A novel ontology for sensor networks data. In IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, 2006.
263 J. Kim, H. Kwon, D. Kim, H. Kwak, and S. Lee. Building a service-oriented ontology for wireless sensor networks. In 7th IEEE/ACIS International Conference on Computer and Information Science, 2008
264 M. Calder, R. Morris, and F. Peri. Machine reasoning about anomalous sensor data. In International Conference on Ecological Informatics, 2008.
265 K. J. Witt, J. Stanley, D. Smithbauer, D. Mandl, V. Ly, A. Underbrink, and M. Metheny. Enabling Sensor Webs by utilizing SWAMO for autonomous operations. In 8th NASA Earth Science Technology Conference, 2008.
266 A. Herzog, D. Jacobi, and A. Buchmann. A3ME - an Agent-Based middleware approach for mixed mode environments. In 2nd International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, 2008.
267 M. Gomez, A. Preece, M. Johnson, G. de Mel, W. Vasconcelos, C. Gibson, A. Bar-Noy, K. Borowiecki, T. Porta, and D. Pizzocaro. An ontology-centric approach to sensor-mission assignment. In 16th International Conference on Knowledge Engineering and Knowledge Management, 2008.
268 OOSTethys. http://www.oostethys.org/, last accessed 20th February 2012.
269 MMI device ontologies working group. http://marinemetadata.org/community/teams/ontdevices, last accessed 20th February 2012.
270 H. Neuhaus and M. Compton. The semantic sensor network ontology: a generic language to describe sensor assets. In AGILE Workshop: Challenges in Geospatial Data Harmonisation, 2009.
271 Neuhaus, H. and Compton, M. The Semantic Sensor Network Ontology: A Generic Language to Describe Sensor Assets, AGILE 2009 Pre-Conference Workshop Challenges in Geospatial Data Harmonisation, 02 June 2009, Hannover, Germany

OWL defines three increasingly expressive sublanguages: OWL Lite, OWL DL, and OWL Full, the latter being in general undecidable. OWL Lite and OWL DL differ in that OWL DL syntactically (and semantically) provides further expressivity, for instance owl:unionOf, owl:complementOf and owl:oneOf, plus general cardinality constraints.

Since one can easily show that general knowledge formulation usually needs more expressive means than DL or OWL can provide, a number of proposals have been published that extend description logics with rules. The most prominent proposal to have been standardized is SWRL, the Semantic Web Rule Language, which builds on top of OWL. A popular implementation much in the spirit of OWL and SWRL is Sesame, an open-source RDF framework273 that employs OWLIM274, a forward-chaining reasoner, to define a rule-based extension on top of OWL. Domain-specific rules and facts are simply added to the axiomatic triples and entailment rules that implement a semantics for OWL, along the lines of 275 and 276. A similar approach has been implemented in the Jena semantic web framework277.

For querying support over ontologies, SPARQL278 is an SQL-like language to query RDF (and thus OWL) data. Both Sesame 2.0, together with the forthcoming SwiftOWLIM 3.0, and Jena provide a SPARQL interface to RDF data. Besides conjunctive querying, SPARQL provides means to ask for optional and even disjunctive information, plus numerical constraints based on XSD data types.

A distinct feature that will be investigated throughout USEFIL is the incorporation of temporal reasoning, capable of inferring temporal information from representations in the underlying ontology model. The representation of static and dynamic spatio-temporal information in ontologies calls for mechanisms allowing for a uniform representation of the notions of time (and of properties varying in time) within a single uniform ontology279. Methods for achieving this goal include (among others) temporal description logics280, temporal RDF281, versioning282, named graphs283, reification, N-ary relations284 and the 4D-fluent (perdurantist) approach285.

272 Patrick Hayes. RDF Semantics. http://www.w3.org/TR/rdf-mt/, last accessed 20th February 2012.
273 http://www.openrdf.org/
274 http://www.ontotext.com/owlim
275 Patrick Hayes. RDF Semantics. http://www.w3.org/TR/rdf-mt/, last accessed 20th February 2012.
276 Herman J. ter Horst. Combining RDF and Part of OWL with Rules: Semantics, Decidability, Complexity. Proc. of ISWC, 668-684, 2005.
277 http://jena.sourceforge.net/
278 http://www.w3.org/TR/rdf-sparql-query
279 P. Grenon and B. Smith. SNAP and SPAN: Towards Dynamic Spatial Ontology. Spatial Cognition and Computation, Vol 4, No. 1, pp 69-104, 2004
280 A. Artale and E. Franconi. A Survey of Temporal Extensions of Description Logics. Annals of Mathematics and Artificial Intelligence, 30(1-4), 2001.
281 C. Gutierrez, C. Hurtado, and A. Vaisman. Introducing Time into RDF. In IEEE Transactions on Knowledge and Data Engineering, 19(2), pp. 207-218, 2007.
282 M. Klein and D. Fensel. Ontology Versioning for the Semantic Web. In International Semantic Web Working Symposium (SWWS'01), pages 75-92, California, USA, July-August 2001.
283 J. Tappolet and A. Bernstein. Applied Temporal RDF: Efficient Temporal Querying of RDF Data with SPARQL. In Proceedings of the European Semantic Web Conference, LNCS 5554, 308-322, 2009
284 N. Noy and B. Rector. Defining N-ary Relations on the Semantic Web. W3C Working Group Note, 12 April 2006.
285 C. Welty and R. Fikes. A Reusable Ontology for Fluents in OWL. Frontiers in Artificial Intelligence and Applications, 150:226-236, 2006.
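As an illustration of SPARQL querying over RDF data, the following sketch uses the rdflib Python library with a small in-memory graph; a server such as Sesame or Jena would expose the same query language over HTTP. The vocabulary URIs are invented for illustration.

```python
# Sketch of SPARQL querying over RDF triples of the form {subject,
# predicate, object}, using an in-memory rdflib graph.
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/usefil#> .
    ex:user1 ex:hasHeartRate "72" ;
             ex:locatedIn    ex:LivingRoom .
""", format="turtle")

results = g.query("""
    PREFIX ex: <http://example.org/usefil#>
    SELECT ?rate WHERE { ex:user1 ex:hasHeartRate ?rate . }
""")
for row in results:
    print(row.rate)  # -> 72
```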

7.3.3 Linked Open Data

The classic Web is a single global information space built upon the notion of setting hyperlinks between web documents. It rests on a small set of standards: Uniform Resource Locators (URLs), which act as globally unique IDs of web documents and can be exploited by several information retrieval mechanisms; the Hypertext Transfer Protocol (HTTP); and the Hypertext Markup Language (HTML), which acts as a shared content format. Recently, major Web data providers such as Google and Amazon have tended to provide access to their databases through different Web APIs. This phenomenon has led to the development of new mashups that combine information from different sources. However, different Web APIs provide distinct identification and access mechanisms and represent data in different formats, without using globally unique identifiers for data items. As a result, the Web is separated and fragmented into data silos, as there is no way to set links between the different data sets.

To overcome these obstacles, Tim Berners-Lee outlined a number of principles regarding the way structured data are published and connected on the Web, namely the Linked Data principles286. These are:

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information (i.e. a structured description - metadata).
4. Include RDF statements that link to other URIs, so that related things can be discovered.

Summarizing, Linked Data introduces a method of publishing structured data on the Web so that the data can be easily interlinked and knowledge thereby expanded. Linked Data builds upon widely used Web protocols, such as HTTP (the mechanism for retrieval of entities) and URIs (the identification of entities), along with Semantic Web technologies such as RDF (the data format used to semantically describe Linked Data and their relationships). Linked Data technology therefore enables heterogeneous data sources to be integrated into a global data space by using data-level links. This way it is possible to fuse data about entities from multiple data sources, thus enriching the query process with more expressiveness.

The Linking Open Data project287 is the most prominent example of the adoption of the Linked Data principles. The aim of the project is to discover existing datasets on the Web that are under an open license, convert them into RDF triples and publish them to the Web of Data. The Bio2RDF project288 has started a major attempt to interlink more than 30 different bioinformatics and cheminformatics data sources, including UniProt (the Universal Protein Resource), KEGG (the Kyoto Encyclopedia of Genes and Genomes), CAS (the Chemical Abstracts Service), PubMed, and the Gene Ontology.

286 T. Berners-Lee, Linked Data - Design Issues, 2006; www.w3.org/DesignIssues/LinkedData.html.
287 http://www.w3.org/wiki/SweoIG/TaskForces/
288 Belleau F., Nolin M.-A., Tourigny N., Rigault P., and Morissette J. Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. J. Biomed. Infor. 41, 706-716, 2008.
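The following sketch illustrates principles 2 and 3 in practice: dereferencing an HTTP URI with content negotiation to obtain RDF about the named thing instead of an HTML page. DBpedia is used here only as a well-known, publicly available example of a Linked Data source.

```python
# Sketch of Linked Data dereferencing: an HTTP URI names a thing, and
# looking it up with an RDF Accept header returns structured data about it.
import requests

uri = "http://dbpedia.org/resource/Ambient_intelligence"
response = requests.get(uri,
                        headers={"Accept": "application/rdf+xml"},  # ask for RDF, not HTML
                        allow_redirects=True)
print(response.status_code, response.headers.get("Content-Type"))
# The returned RDF contains statements linking this URI to other URIs,
# which a client can dereference in turn to discover related things.
```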

Figure 42: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.

Other efforts related to the life sciences that follow the Linked Open Data concept are:

o The Linked Clinical Trials (LinkedCT) data source289, derived from the ClinicalTrials.gov service, a registry of more than 60,000 clinical trials conducted in 158 countries.
o DrugBank290, a large repository of almost 5,000 FDA-approved small-molecule and biotech drugs.
o The Diseasome dataset291, which contains information about 4,300 disorders and disease genes, linked by known disorder-gene associations, for exploring known phenotype and disease-gene associations and indicating the common genetic origin of many diseases.
o DailyMed292, published by the National Library of Medicine, which provides high-quality information about marketed drugs; its linked data version can be found at http://www4.wiwiss.fu-berlin.de/dailymed/.
o The W3C's Linking Open Drug Data effort293, which aims at delivering a data space with interlinked open-license drug and clinical trials data to support drug discovery.

289 Hassanzadeh O., Kementsietsidis A., Lim L., Miller R.J., and Wang M. LinkedCT: A Linked Data Space for Clinical Trials, 2009, http://arxiv.org/abs/0908.0567.
290 Wishart D.S., Knox C., Guo A.C., Shrivastava S., Hassanali M., Stothard P., Chang Z., Woolsey J.: DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nuc. Acids Res. 1(34): D668-72, 2006.
291 Goh K.-I., Cusick M.E., Valle D., Childs B., Vidal M., Barabási A.L.: The human disease network. Proc. Natl. Acad. Sci. USA 104: 8685-8690, 2007.
292 http://dailymed.nlm.nih.gov/
293 Jentzsch, A., et al. 2009. Enabling tailored therapeutics with linked data. In Proc. 2nd Workshop Linked Data on the Web.

A substantial amount of research has recently taken place regarding Linked Sensor Data on the Web.294 295 296 The main research interests include the identification of sensor resources using URIs, linking them with clear semantics, and how to publish sensor data on the Web. These research efforts focus on many applications, such as: automated conversion from OGC standards and services297, sensor data resource addressability298, alignment with foundation ontologies299, publishing linked sensor locations and attributes300, sensor discovery over Linked Data301, and integration from multiple sensor sources into a single service302.

294 Page, K., De Roure, D., Martinez, K., Sadler, J., Kit, O.: Linked sensor data: RESTfully serving RDF and GML. In Taylor, K., Ayyagari, A., De Roure, D., eds.: Proceedings of the 2nd International Workshop on Semantic Sensor Networks (SSN09). Volume Vol-522, CEUR (2009) 49-63
295 Phuoc, D.L., Hauswirth, M.: Linked open data in sensor data mashups. In Taylor, K., Ayyagari, A., De Roure, D., eds.: Proceedings of the 2nd International Workshop on Semantic Sensor Networks (SSN09). Volume Vol-522, CEUR (2009) 1-16
296 Patni, H., Henson, C., Sheth, A.: Linked sensor data. In: 2010 International Symposium on Collaborative Technologies and Systems, IEEE (2010) 362-370
297 Patni, H., Henson, C., Sheth, A.: Linked sensor data. In: 2010 International Symposium on Collaborative Technologies and Systems, IEEE (2010) 362-370
298 K. Janowicz et al., Towards meaningful URIs for linked sensor data, in Towards Digital Earth: Search, Discover and Share Geospatial Data. Workshop at Future Internet Symposium, 2010.
299 K. Janowicz and M. Compton, The stimulus-sensor-observation ontology design pattern and its integration into the semantic sensor network ontology, in 3rd International Workshop on Semantic Sensor Networks (SSN10), 2010.
300 P. Barnaghi, M. Presser, and K. Moessner, Publishing linked sensor data, in 3rd International Workshop on Semantic Sensor Networks (SSN10), 2010.
301 J. Pschorr et al., Sensor discovery on linked data, in 7th Extended Semantic Web Conference (ESWC2010), 2010.
302 Phuoc, D.L., Hauswirth, M.: Linked open data in sensor data mashups. In Taylor, K., Ayyagari, A., De Roure, D., eds.: Proceedings of the 2nd International Workshop on Semantic Sensor Networks (SSN09). Volume Vol-522, CEUR (2009) 1-16

8. Data Security

Data security is an important issue within the USEFIL system (Figure 43). The following aspects are important:

o Authentication: to verify that someone is who he or she claims to be.
o Authorisation: to ascertain whether the authenticated user has the right to access the intended resources.
o Secure data storage: to ensure that stored data is accessible only to authorised entities.
o Secure communication: to avoid eavesdropping on network-based communication.
o Privacy: the ability to control one's own personal data, i.e. which data is collected and stored, and who else has access to which of these data.

Figure 43: USEFIL data processing

8.1 Authentication

Authentication is the process of verifying whether the user is who he or she claims to be. In computer networks it is usually realized by a combination of username and password. Each user has to be registered in advance with a unique username and an assigned or self-declared password, which has to meet certain security regulations (such as the use of numbers and special characters, or a minimum password length). In contrast to usernames, passwords are usually not stored in restorable form. Usually a hash code is stored, and within the authentication process a hash code of the entered password is calculated and compared to the stored value. The advantage is that an attacker cannot use the user credentials even if he gets access to the file or database table holding these user data.
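A minimal sketch of this hash-based password storage follows, using a salted, iterated hash (PBKDF2) from the Python standard library; parameter values such as the iteration count are illustrative.

```python
# Sketch of storing passwords as non-restorable hashes: a salted, iterated
# digest is stored, and at login the submitted password is hashed again and
# compared against the stored value.
import hashlib, hmac, os

def hash_password(password: str, salt: bytes = None):
    salt = salt or os.urandom(16)                       # per-user random salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest                                 # store both, never the password

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored_digest)  # constant-time compare

salt, digest = hash_password("s3cret!42")
print(verify_password("s3cret!42", salt, digest))  # True
print(verify_password("wrong", salt, digest))      # False
```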

The weakness of this approach is that passwords can be stolen, accidentally revealed or spied out. For critical applications, more stringent authentication processes can be used. A common standard is the use of digital certificates issued and verified by a certification authority (CA) as part of a public key infrastructure (PKI). These certificates are stored on a token (USB token, smartcard, etc.), and the user needs two different factors for authentication: possession (the token) and knowledge (a password or PIN for accessing the certificate stored on the token). The decision which kind of authentication to implement should be the result of a cost-benefit analysis that takes the data protection requirements into account.

8.2 Authorisation

Authorisation is the process of giving an already authenticated user permission to access certain resources. In most computer systems a system administrator defines roles with certain privileges (which resources may be accessed, with write or read-only rights, etc.) and assigns each user to one or more specific roles. Each user gets access to all the data he or she has to work with; all other resources are not accessible and, in the best case, not even visible.

Authorisation in web-based application servers can be applied by explicitly stating which web resources may be accessed by which system user (directory or file permissions granted to specific users or user groups). Other applications use stateful authentication/authorisation. Building stateful web applications requires special care because of the stateless nature of HTTP; session management makes stateful applications possible. A user can be authenticated and start a valid session. If authorisation for accessing specific services is also needed, the session must be associated with a user account where the privileges are stored. This is achieved through the interaction of the web application with a database in which a list of user credentials (username/password pairs) is matched to specific privileges. Although logically authentication precedes authorisation, the two are usually managed together.

In USEFIL, authorisation is necessary for access to the data stored at the USEFIL server. Doctors need access to detailed medical information of their patients (and only their patients!). Carers usually need only some of these data, or a rough aggregation of the detailed information, so doctors and carers have to be assigned to different roles. Other roles may define the rights of relatives, friends, neighbours or maintenance personnel. Even within the home environment some kind of authentication is required, e.g. to ensure that the person standing in front of the USEFIL mirror system is the intended user and not a visitor. Here it is not necessary to use credentials (username/password); it may be more reasonable to establish identity automatically (e.g. via face or voice recognition). This kind of authentication is reasonable for all USEFIL sensors and units.
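A minimal Python sketch of such role-based access control is shown below. The role names and permissions are illustrative assumptions based on the USEFIL roles described above, not a finalised access-control model.

```python
# Illustrative role-based access control: roles map to sets of permissions,
# and each user is assigned one or more roles (all names are assumptions).
ROLE_PERMISSIONS = {
    "doctor": {"read_detailed_medical", "read_aggregated"},
    "carer": {"read_aggregated"},
    "relative": {"read_status"},
}

USER_ROLES = {
    "dr_smith": {"doctor"},
    "carer_jones": {"carer"},
}

def is_authorised(user: str, permission: str) -> bool:
    """Grant access if any of the user's roles carries the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(is_authorised("dr_smith", "read_detailed_medical"))    # True
print(is_authorised("carer_jones", "read_detailed_medical")) # False
```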
8.3 Secure data storage

In USEFIL we can differentiate between the current data stored within the home environment and the (current and historic) data stored at the USEFIL server. Security issues arise particularly at the server, which is accessible via the Internet and contains the full set of (aggregated) current and historic data of all USEFIL users, whereas the locally stored data is accessible only within the home environment and consists only of the current (but detailed/raw) data of one user.

To protect the confidentiality of data stored on a computer (uploaded files, or database files with all their entries), disk encryption is commonly used. Disk encryption solutions often work transparently on an entire disk volume, a directory, or even a single file. Microsoft, for example, introduced the Encrypting File System (EFS) in version 3 of NTFS [303]; EFS is supported by all Windows operating systems since Windows 2000, and transparent encryption can be enabled by users on a per-file, per-directory, or per-drive basis. Similar solutions exist for other operating systems as well [304]. A commonly used multi-OS software is TrueCrypt [305], disk encryption software for Windows 7/Vista/XP, Mac OS X, and Linux. It can create a virtual encrypted disk within a file or encrypt a partition, and it supports on-the-fly encryption. A survey of available disk encryption software can be found at http://en.wikipedia.org/wiki/Comparison_of_disk_encryption_software.
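Where full-disk encryption is not available, sensitive records can also be encrypted at application level before being written to storage. The sketch below illustrates this in Python using the third-party cryptography package (an assumption; USEFIL has not selected a library); in practice the key itself would have to be stored securely, e.g. on a token.

```python
# Application-level encryption of a record before storage (illustrative only).
# Assumes the third-party "cryptography" package: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()                   # in practice: load from secure key storage
cipher = Fernet(key)

record = b"2012-05-10;heart_rate;72"          # hypothetical sensor record
token = cipher.encrypt(record)                # the token is safe to write to disk
print(cipher.decrypt(token) == record)        # True: round trip succeeds
```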
8.4 Secure communication

A now self-evident element of network security is the use of a firewall: a system between the site it protects (here the home environment) and all other networks (see Figure 43). It is usually implemented as part of a router, although a personal firewall may be installed on an end-user machine. Firewall-based security depends on the firewall being the only connection between the site and the outside. A firewall provides access control by restricting which messages it will relay between the site and other networks: it forwards messages that are allowed and filters out messages that are disallowed. In effect, a firewall divides a network into a more trusted zone internal to the firewall and a less trusted zone external to it. Commercially available routers are usually equipped with a firewall.

Communication security between client and server is commonly achieved by using the Secure Sockets Layer (SSL) protocol or its successor, the Transport Layer Security (TLS) protocol. TLS and SSL encrypt network connections above the transport layer. The combination of HTTP and SSL/TLS is called HTTPS (Hypertext Transfer Protocol Secure) and should be used when accessing the USEFIL server via the web interface. SSL can also be used with XMPP, but it secures only the communication between the client and the first XMPP server; encryption between XMPP servers within the XMPP network cannot be guaranteed. If USEFIL operates its own XMPP server, it can be assured that the communication between its own members (USEFIL XMPP client to USEFIL XMPP server to USEFIL XMPP client) is secured.

Another commonly used option for securing network communication is Internet Protocol Security (IPsec), which operates in the lower layers of the TCP/IP model. Applications do not need to be specifically designed to use IPsec, since IPsec protects any application traffic across an IP network. IPsec secures communication by authenticating and encrypting each IP packet, and it can be used between hosts, between networks, or between a host and a network. IPsec is often used with (certificate-based) Virtual Private Networks (VPNs). A VPN might be a suitable solution for connecting the USEFIL server to the home gateway or to a doctor's or clinic's client, encrypting the whole communication between them.

Within the home environment USEFIL will apply wireless technologies such as WLAN and Bluetooth. The current safe encryption standard for wireless LAN is WPA2 [306]; it is implemented in all current WLAN routers and is easy to use. Bluetooth devices have to be paired before exchanging data, and since v2.1 encryption is required for all non-SDP [307] connections.

303 NTFS = New Technology File System
304 See http://en.wikipedia.org/wiki/List_of_cryptographic_file_systems
305 See http://www.truecrypt.org/
306 WPA = Wi-Fi Protected Access
307 SDP = Service Discovery Protocol
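As a concrete illustration of the HTTPS mechanism described above, the following Python sketch (standard library only) opens a TLS-protected connection with certificate verification enabled. The host name is a placeholder, not a real USEFIL endpoint.

```python
# Minimal HTTPS request with server certificate verification.
# "server.example.org" is a placeholder for the USEFIL server address.
import ssl
from urllib.request import urlopen

context = ssl.create_default_context()   # verifies the certificate and host name
with urlopen("https://server.example.org/api/status", context=context) as response:
    print(response.status, response.read())
```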

8.5 Privacy

Privacy plays an important role within the USEFIL project, since a wealth of personal (medical) data will be processed. Besides national regulations and laws, the EU Data Protection Directive has to be applied.

First of all, it is important to collect and distribute data according to the principle that nothing more is collected and stored than is absolutely necessary (data reduction and data economy). Instead of storing or transmitting the collected raw data, pre-processed data might be used. This data aggregation is realised stepwise in USEFIL (see the dotted rectangles in Figure 43): (1) algorithm processing, (2) data fusion, (3) data aggregation, (4) data mining and rule-based analysis.

Data have to be stored and transmitted securely to prevent unauthorised access. This will be achieved by the security measures described above (depending on the protection requirements of the respective data).

Each user has to be able to control his or her personal data, i.e.:

- which data will be collected,
- when and how often they will be collected,
- how and for how long they will be stored,
- who has access to which of these stored data.

The kind of control depends on the specific application (and should be realised in a user-friendly way, designed for the target group). In some cases it might be sufficient to configure it once using an adequate interface on the tablet or Internet TV (e.g. "my doctor and my carer should generally have access to my medical data"), whereas in other situations the user may want to decide case by case (e.g. enabling video and/or audio when using the chat application on the tablet or Internet TV).

Another way to achieve privacy is to hide the person to whom the data belong, by anonymisation or pseudonymisation. Anonymisation changes all person-related data within a data record in such a way that they cannot be matched to a certain person. Pseudonymisation maps all person-related data within a record to an artificial identifier (pseudonym) that in turn maps one-to-one to the person. Neither anonymisation nor pseudonymisation is reasonable in USEFIL: in communication applications the user wants to know to whom he or she is talking and wants the interlocutor to know who he or she is, and storing medical data makes little sense if the doctor or carer does not know who the corresponding patient is. Furthermore, depending on the number of participating members, it might simply be impossible to anonymise or pseudonymise them reliably. Privacy therefore has to be achieved within USEFIL by means of the other measures mentioned above.
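Even though pseudonymisation is not foreseen for USEFIL itself, the following Python sketch illustrates the concept as defined above: a keyed hash maps each person identifier to a stable pseudonym, and only the key holder can maintain the mapping back to the person. The key and identifiers are illustrative.

```python
# Illustrative pseudonymisation: a keyed hash (HMAC) maps a person identifier
# to a stable pseudonym; without the secret key the mapping cannot be reversed
# or reproduced. Key and identifiers below are examples only.
import hashlib
import hmac

SECRET_KEY = b"replace-with-securely-stored-key"

def pseudonymise(person_id: str) -> str:
    return hmac.new(SECRET_KEY, person_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"person": "alice@example.org", "heart_rate": 72}
record["person"] = pseudonymise(record["person"])   # same input -> same pseudonym
print(record)
```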

9. Other Technologies worth Watching

9.1 Interactive Displays

Behind the Screen Overlay Interactions (presented at TechFest 2012) demonstrates behind-the-screen interaction with a transparent OLED display, with view-dependent, depth-corrected gaze. Video: http://www.youtube.com/watch?v=oGa1Q7NvsI0

9.2 Holoflector

Holoflector is a unique, interactive augmented-reality mirror: graphics are superimposed correctly on the user's own reflection to enable a novel augmented-reality experience. Presented at Microsoft Research's TechFest 2012, Holoflector leverages the combined abilities of Kinect and Windows Phone to infer the position of the phone and render graphics that seem to hover above it. Video: http://research.microsoft.com/en-us/events/techfest2012/video.aspx

10. Summary

10.1 Mobile Monitoring Systems

10.1.1 Mobile Sensor Platforms

There is a wide range of mobile sensor platforms currently on the market, including both complete system solutions and individual monitoring platforms. A wide variety of physiological, physical and behavioural parameters can be measured and recorded using these systems, but the exact capabilities differ from system to system. For the USEFIL project, a complete-solution type of system is unsuitable; a monitoring platform that can be paired with a smart-phone is much more desirable. Using this criterion, the Metawatch and the Chronos could both be suitable for USEFIL: both offer a wrist-based platform, have three-axis acceleration sensors (which have been identified as a key area for USEFIL), and can be extended to support additional sensors. The iM watch and the Wimm would also be suitable for the USEFIL project and may offer a greater degree of customisation and a better user interface than either the Chronos or the Metawatch. The exact choice of hardware will be determined by the requirements on the USEFIL system.

10.1.2 Smart-Phone Platform

Due to the USEFIL consortium's decision to use open-source technology, in conjunction with the probable need for multi-tasking, neither the Windows Mobile nor the iOS platform is suitable. The smart-phone choice for the USEFIL project should almost certainly be an Android-based device.

10.1.3 Communication Technologies

A range of communication technologies is employed across wearable sensor devices, smart-phone platforms and communication networks, such as Bluetooth, WiFi and WiMAX. The range and bandwidth requirements of any particular communication link, in addition to the capabilities of the hardware being used, will often define which technology is used. The hardware for the USEFIL system will have to be selected to ensure that each element in the system can connect to the other elements it needs to communicate with; for example, a Bluetooth-enabled mobile wearable unit will need to be connected to a Bluetooth-enabled smart-phone. The USEFIL system will most likely employ a range of communication types.

10.2 Slate Tablet-PC

There are a number of tablet-PCs available on the market at a range of price points, although for the USEFIL system the choice will be restricted to Android platforms. Most tablets offer similar functionality and communication options. In general, the higher the performance of a tablet-PC, especially in terms of screen size, the higher the cost. The exact choice of tablet-PC will depend on the processing, communication and usability requirements.

10.2.1 Tablet-PCs for Social Interaction

Tablet computers have been used in several systems, such as Bettie, Connect and Memo, to provide easy access to social networks for older users. The tablets are configured to load straight into the social network and hide all the other functionality often associated with a tablet-PC.

Furthermore, the capabilities of the social network itself are often customised to meet the needs of the user, such as hiding the functionality to search for new friends. In this way, the tablet can be used to allow the elder to communicate easily with existing friends and family.

10.3 Video Monitoring Units

There are a number of cameras available on the market that would be suitable for the USEFIL system. The choice of camera will depend on the required operating characteristics, especially the sampling rate, light sensitivity, ability to record sound and the physical size of the camera.

10.4 Home Gateway Systems

There are two possible technology options for home gateway systems, web-TV and the smart-home concept, although they offer markedly different capabilities. Web-TV is available on a range of Philips and Sharp TVs and serves to connect the TV to the Internet and provide access to a number of apps on the TV. The smart-home concept is aimed at home automation; it allows the networking of a number of devices within the home and also provides Internet connectivity. There is a range of smart-home systems available, such as ThereGate and HomeSeer. Of the systems reported in this document, HomeSeer is the most technically polished, but it is also expensive and may not be suitable for USEFIL for this reason. A cheaper alternative such as ThereGate may be more appropriate if smart-home technology is to be used; ThereGate is also the platform with which VTT is most familiar and which it is using in some other projects.

10.5 Decision Support Systems

A DSS is a computer system that takes a set of measured parameters and produces an inferred response to those parameters. In a clinical context, a CDSS takes a variety of patient medical information as input and then produces inferences about the patient's condition. This can be used, for example, as a diagnostic aid, and there are several such systems described in the literature.

In general terms a DSS comprises a knowledge base and an inference engine. The knowledge base contains all the knowledge needed for the context of the system; in USEFIL this would include, for example, information about the user, their environment and their medical history. Currently, the most commonly used form of knowledge base is the ontology, often realised using OWL and its extensions. The inference engine operates on the information in the knowledge base and performs semantic inference based on a set of rules and reasoning logic. A number of rule languages and inference engines have been developed, of which SWRL has been identified as the most promising.
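As a simple illustration of the knowledge-base/inference-engine split (not USEFIL's actual OWL/SWRL machinery), the following Python sketch applies if-then rules to a small set of facts until no new facts can be derived. The facts and rules are invented for illustration.

```python
# Toy forward-chaining inference: a knowledge base of facts plus if-then rules.
# Facts and rules are invented examples, not USEFIL's clinical knowledge.
facts = {"heart_rate_high", "user_resting"}

rules = [
    ({"heart_rate_high", "user_resting"}, "possible_anomaly"),
    ({"possible_anomaly"}, "notify_carer"),
]

changed = True
while changed:                          # keep applying rules until a fixed point
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))   # now includes "possible_anomaly" and "notify_carer"
```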

10.6 Monitoring Humans

10.6.1 Photoplethysmography

Photoplethysmography (PPG) is the technique of identifying several physiological functions, such as a person's respiration and heart beat, through the use of a light source and camera. Traditionally this has been accomplished by using a clip design to hold the light source and sensor securely in place and so facilitate the acquisition of good signals. More recently, research has been carried out into using a mobile-phone camera, and light from the camera's flash, to acquire the PPG signal. The use of web-cams and ambient light is also being investigated, but presents some challenging research problems, particularly in providing robustness with respect to a person's motion.
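To indicate what such processing involves, the sketch below estimates a pulse rate from a synthetic PPG trace by locating the dominant frequency in the plausible heart-rate band. It uses NumPy and stands in for the far more robust processing a camera-based system would need.

```python
# Estimate pulse rate from a PPG trace via the dominant spectral peak.
# The signal here is synthetic (1.2 Hz = 72 bpm, plus noise); real camera-based
# PPG would additionally require detrending, motion compensation, etc.
import numpy as np

fs = 30.0                                   # sampling rate in Hz (e.g. a webcam)
t = np.arange(0, 20, 1 / fs)                # 20 seconds of samples
signal = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)

spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

band = (freqs >= 0.7) & (freqs <= 3.0)      # plausible heart rates: 42-180 bpm
peak_freq = freqs[band][np.argmax(spectrum[band])]
print(f"Estimated pulse: {peak_freq * 60:.0f} bpm")   # approximately 72 bpm
```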
10.6.2 Emotion Recognition

Emotion recognition is the task of identifying a person's emotion based on data gathered from one or more data sources, which may be audio, visual or physiological.

Visual emotion recognition systems employ a camera and operate on an image sequence. A number of systems exist that perform some emotion recognition, such as Affectiva, Affdex and SHORE. Many of these systems are advertised as tools for market research; they are good at identifying happiness but perform poorly at detecting other emotions. The technology to track faces in real-time video exists, but more research is required before emotion recognition is adequate for the USEFIL system.

Speech-based systems rely on detecting audible cues to a person's emotion, and there are, again, a number of studies examining speech as a means to detect emotion. In comparison to visual systems, more of the current speech-based systems are trained and tested on naturalistic emotion. This difference suggests that speech, rather than video, may be the easier option for emotion detection in a naturalistic setting.

Physiological-signal-based systems make use of physiological data gathered from the person, such as EEG, ECG and EOG. Research has shown that a person's physiological signals are linked to their current emotional state and that these can be used to detect emotion.

It is possible, and indeed desirable, to combine two or more of the above data sources in an emotion detection system. Multi-modal systems combine the different data channels using various fusion techniques, such as feature fusion, where features are extracted from each channel and then fused, and decision fusion, where each channel is fully processed and the results are fused. Several studies have shown that multi-modal systems perform better than their component channels.

10.6.3 Behaviour Monitoring

Human behaviour stems from the attempt of an individual to meet their needs. According to Maslow there exists a hierarchy of human needs, where the most important needs take priority; an individual's behaviour is the action taken in order to meet these needs. The task of recognising human behaviour in an automated way will be important to the USEFIL system. Several systems have been developed to identify specific behaviours in a lab setting, such as event detection in sports, retrieving actions in movies and human gesture recognition.

Fall detection is a specific case of behaviour monitoring and is of special importance to the elderly, and therefore to the USEFIL project. Fall detection typically involves either accelerometers, to identify the movements of the body, or visual techniques, to extract the body shape and movement from an image sequence. A large number of these systems do not, however, make the transition from the lab into the real world. In order to develop systems suitable for use in real-world settings, more research is needed in several areas, such as feature-set extraction, modelling of behaviour, use of depth information and unsupervised learning.
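As a flavour of the accelerometer-based approach mentioned above, the Python sketch below flags a candidate fall when a near-free-fall interval is followed shortly by a high-magnitude impact. The thresholds are illustrative assumptions; a deployable detector would need far more sophisticated modelling and validation.

```python
# Naive accelerometer-based fall detection: look for a free-fall dip followed
# by an impact spike in the acceleration magnitude. Thresholds are illustrative.
import math

def detect_fall(samples, fs=50, free_fall_g=0.4, impact_g=2.5, window_s=1.0):
    """samples: list of (ax, ay, az) in g; returns index of suspected impact or None."""
    mags = [math.sqrt(ax**2 + ay**2 + az**2) for ax, ay, az in samples]
    window = int(window_s * fs)
    for i, m in enumerate(mags):
        if m < free_fall_g:                          # near free fall
            for j in range(i + 1, min(i + window, len(mags))):
                if mags[j] > impact_g:               # impact shortly afterwards
                    return j
    return None

# Tiny synthetic trace: rest (1 g), free fall (~0 g), impact (3 g), rest.
trace = [(0, 0, 1.0)] * 10 + [(0, 0, 0.1)] * 5 + [(0, 0, 3.0)] + [(0, 0, 1.0)] * 10
print(detect_fall(trace))   # index of the impact sample, here 15
```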

10.7 Data Security

Data security is an important part of USEFIL and comprises five separate but related aspects: authentication, authorisation, secure storage, secure communication and privacy.

Authentication is the process of making certain that a user is who they claim to be. This is conventionally achieved with a username and password. More complex, and therefore more secure, forms of authentication are available, but these come at an increased cost. In the context of USEFIL it may be helpful in some situations to use unconventional forms of authentication, such as facial recognition to determine the identity of the user in front of the VMU.

Authorisation is the process of determining that a user has the authority to perform the task they are trying to perform. This is usually accomplished by setting a user's permissions, which define the things a user may and may not do in a particular system.

Secure storage is the process of ensuring that data is stored in such a way that it cannot be retrieved from storage by an unauthorised person. This is typically achieved by using disk encryption methods to encrypt either an entire disk or a partition of the disk.

Secure communication is the process of preventing any unauthorised access to information while it is being transferred from one component in the system to another. This is achieved in a variety of ways depending on the communication link in question; for example, WiFi communication is secured using WPA2 encryption, and communication between a web server and a browser can be secured using the HTTPS protocol.

Finally, privacy relates to the ability of a user to manage the access rights of others to their data; this additionally has to comply with data protection legislation. In the USEFIL system, privacy control will need to be realised in a user-friendly way and may need to be realised differently for different applications.
