
Research Directions


Human-centered signal processing (HCSP) is the science of decoding human behavior. HCSP is part of the larger emerging field of Behavioral Signal Processing (BSP). BSP seeks to provide a computational account of aspects of human behavior ranging from interaction patterns to individual emotion expression using techniques from both signal processing and machine learning. HCSP encompasses a subset of the BSP domain including emotion production and perception, the coordination of verbal and non-verbal behavior, turn-taking behavior, user preferences, and user judgments. Results from these sub-domains can be integrated to develop quantitative models of user state.

Emotion Quantification and Emotion Dynamics

Emotion profiles (EPs) describe the emotional makeup of an utterance. This makeup is characterized not by black-and-white semantic labels (e.g., the speaker is angry), but by estimates of the degree of presence or absence of multiple emotional components. These components can be defined either by conventional semantic labels (e.g., angry, happy, neutral, sad) or by unsupervised clustering of the feature space. The resulting representations, which we refer to as profiles, give a multi-dimensional description of the emotions present in an utterance.
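As a rough illustration of the idea, the sketch below turns raw per-emotion classifier scores into a profile of independent degrees of presence. The score values, the logistic squashing, and the function names are assumptions made for this example; they are not the estimation procedure used in our work.

```python
from math import exp

# Conventional semantic labels used as profile components.
EMOTIONS = ("angry", "happy", "neutral", "sad")


def emotion_profile(scores):
    """Map raw per-emotion scores to degrees of presence in [0, 1].

    Each component is squashed independently (logistic function), so the
    profile is not forced to sum to 1 and several emotions can be present
    at once. This mapping is an assumption chosen for illustration.
    """
    return {e: 1.0 / (1.0 + exp(-scores[e])) for e in EMOTIONS}


def dominant_emotion(profile):
    """Collapse a profile to a single label when a hard decision is needed."""
    return max(profile, key=profile.get)


# Hypothetical scores for one utterance (e.g., outputs of four binary classifiers).
scores = {"angry": 1.2, "happy": -2.0, "neutral": -0.3, "sad": 0.8}
profile = emotion_profile(scores)
label = dominant_emotion(profile)
```

In this made-up example both the angry and sad components are more present than absent, which a single hard label would hide.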

Our work focuses on methods to estimate the natural dynamics underlying emotional speech. We study utterance-level patterns and investigate methods to identify salient local dynamics. Our recent work (best student paper at ACM MM 2014) has focused on methods to characterize the dynamics of emotional facial movement in the presence of continuous speech.

Emotion Classification and Deep Learning Studies

Engineering models provide an important avenue through which to develop a greater understanding of human emotion. These techniques enable quantitative analysis of current theories, illuminating features that are common to specific types of emotion perception and the patterns that exist across emotion classes. Such computational models can inform the design of systems that automatically classify emotion from speech and other forms of emotion-relevant data.

Research in emotion recognition seeks to develop insights into the temporal properties of emotion. However, automatic emotion recognition from spontaneous speech is challenging due to non-ideal recording conditions and highly ambiguous ground truth labels. Further, emotion recognition systems typically work with noisy, high-dimensional data, which makes it difficult to find representative features and train an effective classifier. We address these challenges using Deep Belief Networks, which can model complex, non-linear, high-level relationships between low-level features. We create suites of hybrid classifiers based on Hidden Markov Models and Deep Belief Networks.
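One common way to combine a frame-level network with a Hidden Markov Model is to convert the network's state posteriors into scaled likelihoods (posterior divided by state prior) and decode with the Viterbi algorithm. The sketch below shows that generic recipe under stated assumptions: the two states, the transition probabilities, and the per-frame posteriors are invented for the example, the posteriors stand in for the output of a trained network, and this is not a description of the hybrid classifiers built in our work.

```python
from math import log


def viterbi(posteriors, priors, trans, init):
    """Return the most likely state sequence for a hybrid network/HMM model.

    posteriors: list of dicts, one per frame, mapping state -> P(state | frame)
                (assumed to come from a frame-level network).
    priors:     dict mapping state -> P(state), used to scale the posteriors.
    trans:      dict of dicts, trans[prev][cur] = P(cur | prev).
    init:       dict mapping state -> initial state probability.
    All probabilities must be strictly positive.
    """
    states = list(priors)

    def emit(t, s):
        # Scaled log-likelihood: log P(s | x_t) - log P(s).
        return log(posteriors[t][s]) - log(priors[s])

    score = {s: log(init[s]) + emit(0, s) for s in states}
    back = []  # back[t - 1][s] = best predecessor of state s at frame t
    for t in range(1, len(posteriors)):
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log(trans[p][s]))
            new_score[s] = score[best_prev] + log(trans[best_prev][s]) + emit(t, s)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state example with "sticky" transitions.
priors = {"neutral": 0.5, "angry": 0.5}
init = {"neutral": 0.5, "angry": 0.5}
trans = {
    "neutral": {"neutral": 0.9, "angry": 0.1},
    "angry": {"neutral": 0.1, "angry": 0.9},
}
posteriors = [
    {"neutral": 0.80, "angry": 0.20},
    {"neutral": 0.45, "angry": 0.55},  # isolated, weakly "angry" frame
    {"neutral": 0.80, "angry": 0.20},
    {"neutral": 0.20, "angry": 0.80},
    {"neutral": 0.10, "angry": 0.90},
    {"neutral": 0.20, "angry": 0.80},
]
path = viterbi(posteriors, priors, trans, init)
```

With these invented numbers, the transition model overrides the isolated, weakly angry second frame, so the decoded sequence changes state only once. This is the kind of temporal smoothing that frame-by-frame classification alone does not provide.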

Perceptual Studies

The proper design of affective agents requires an understanding of human emotional perception. Such an understanding gives designers a way to estimate how an affective interface may be perceived given intended feature modulations. However, human perception of naturalistic expressions is difficult to predict. This difficulty is due partly to the mismatch between the processes of emotional cue generation (the speaker) and cue perception (the observer), and partly to the presence of complex emotions, that is, emotions that contain shades of multiple affective classes.

An understanding of the mapping between signal cue modulation and human perception can facilitate design improvements for both emotionally relevant and emotionally targeted expressions used in human-computer and human-robot interaction. This understanding will further human-centered design, which is necessary for the widespread adoption of affective technology.

Mood Tracking for Individuals with Bipolar Disorder

Speech patterns are modulated by the emotional and neurophysiological state of the speaker. A growing body of work computationally examines this modulation in patients suffering from depression, autism, and post-traumatic stress disorder. However, the majority of the work in this area focuses on the analysis of structured speech collected in controlled environments. Here we expand on the existing literature by examining bipolar disorder (BP). BP is characterized by mood transitions, varying from a healthy euthymic state to states characterized by mania or depression. The speech patterns associated with these mood states provide a unique opportunity to study the modulations characteristic of mood variation. We describe a methodology for collecting unstructured speech continuously and unobtrusively by recording day-to-day cellular phone conversations. Our pilot investigation suggests that manic and depressive mood states can be recognized from this speech data, providing new insight into the feasibility of unobtrusive, unstructured, and continuous speech-based wellness monitoring for individuals with BP.

Assistive Technology for Individuals with Aphasia

Aphasia is a common language disorder that can severely affect an individual’s ability to communicate with others. Aphasia rehabilitation requires intensive practice accompanied by appropriate feedback, and this feedback is difficult to provide outside of therapy. In this work we take a first step towards developing an intelligent system capable of providing feedback to patients with aphasia by automating two typical therapeutic exercises: sentence building and picture description. We describe the natural speech corpus collected from our interaction with clients in the University of Michigan Aphasia Program (UMAP). We develop classifiers to automatically estimate speech quality based on human perceptual judgment. Our automatic prediction yields accuracies comparable to those of the average human evaluator, and our feature selection process gives insights into the factors that influence human evaluation. These results support the feasibility of this type of system.