Posted by Eric Horvitz and Munmun De Choudhury

At Microsoft Research, we’ve been exploring the use of data analysis and machine learning to gain insights about health and well-being—and to enhance the quality of health care. Our efforts in this area include research on using data stored in electronic health records to construct predictive models that can provide physicians with advance warning about patient outcomes.

We’ve worked with colleagues to develop systems that can predict the likelihood that a patient will contract an infection while in the hospital or that a patient being discharged will be readmitted to the hospital within a short time. Some of these models have been deployed and are in use at hospitals throughout the world, providing demonstrated value to patients and physicians.

Beyond examining data from medical health records about hospitalized patients, we have been interested in the prospects of developing new methods that can transform anonymized data about the search and communications activities of people into a large-scale sensor network for public health. As an example of directions and opportunities in this realm, we recently showed how we can detect previously unknown drug interactions via analysis of anonymized web-search logs. We identified useful signals via analysis of tens of millions of queries sent to search engines by millions of users who had consented to share their search activities with Microsoft for research purposes.

In another direction, over the last year, we have been exploring the value of harnessing social media as streaming sensor data for studies in public health. In the upcoming week, during the ACM CHI 2013 meeting in Paris, we are presenting results on the influence of pregnancy and childbirth on Twitter activity and content. We started this work to see if we could identify key milestones in people's lives from what they say in their Twitter posts, and to understand how a major life event such as childbirth might change the social media activity of new mothers. We were particularly interested in signs of how the birth of a baby changes people's lives, as revealed by changes in their communications activity.

Besides being a joyous event for parents, the birth of a baby can induce considerable changes in daily life. Sleep and daily routines are disrupted, and adjustments must be made in personal and professional lives. First-time mothers may find it particularly challenging to navigate the new, complex realm of caring for a newborn. Some new mothers experience changes in mood, from minor postpartum blues to more severe postpartum depression. The latter is a significant challenge, and its roots are not yet well understood. But it is not rare: It is believed that 10 to 15 percent of new mothers grapple with postpartum depression and that it is underreported. Postpartum depression typically begins in the first month following the birth of a child and is characterized by symptoms including sadness, guilt, exhaustion, and anxiety.

In the study we are presenting in Paris, we discuss how we identified, with high confidence, when women had given birth, based on their tweets about the birth. We describe how we aligned the birth milestones of about 400 mothers and then examined patterns of activity and content for three months before and after that milestone. Beyond characterizing shifts in language, activity, social engagement, and measures of mood and emotion around the birth, we found that we could forecast forthcoming changes: machine learning can be used to build predictive models that estimate the likelihood of significant postpartum changes, solely from observations made before the birth announcement. The predictions include measures of risk for behavioral changes that may be associated with significant downturns in mood, before those changes appear.
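To make the alignment step concrete, here is a minimal sketch in Python of how posts might be lined up around each person's milestone date and compared before and after. This is an illustration only, not the method used in the study: the function names, the 90-day window standing in for "three months," and the simple posts-per-day measure are all assumptions made for the example, and the milestone dates are taken as given rather than detected from tweets.

```python
from collections import defaultdict
from datetime import date

# Assumption: 90 days approximates the three-month window on each side.
WINDOW_DAYS = 90


def align_to_milestone(posts, milestones, window_days=WINDOW_DAYS):
    """Group each user's posts by day offset from that user's milestone.

    posts: iterable of (user, posting_date) pairs (hypothetical input format).
    milestones: dict mapping user to milestone date (assumed already known).
    Returns {user: {"before": [offsets], "after": [offsets]}}.
    """
    aligned = defaultdict(lambda: {"before": [], "after": []})
    for user, posted_on in posts:
        milestone = milestones.get(user)
        if milestone is None:
            continue  # no milestone known for this user
        offset = (posted_on - milestone).days
        if -window_days <= offset < 0:
            aligned[user]["before"].append(offset)
        elif 0 <= offset < window_days:
            aligned[user]["after"].append(offset)
        # posts outside the window are ignored
    return dict(aligned)


def daily_rate_change(aligned, window_days=WINDOW_DAYS):
    """Posts per day after the milestone minus posts per day before it."""
    return {
        user: (len(periods["after"]) - len(periods["before"])) / window_days
        for user, periods in aligned.items()
    }


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example_posts = [
        ("user_a", date(2012, 3, 1)),
        ("user_a", date(2012, 3, 20)),
        ("user_a", date(2012, 3, 25)),
    ]
    example_milestones = {"user_a": date(2012, 3, 15)}
    aligned = align_to_milestone(example_posts, example_milestones)
    print(daily_rate_change(aligned))
```

Expressing every post as an offset from the milestone puts all users on a common timeline, so before-and-after patterns can be compared across people whose milestones fall on different calendar dates. A predictive model of the kind described above would use only features computed from the "before" period.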

Our focus has been basic research on the prospect of understanding and predicting behavioral changes via analysis of public tweets. But we believe that these methods might one day be used in valuable tools that enhance awareness about health and well-being, as well as in methods that support broader analyses in public health studies. For example, the results suggest that it may be possible to deploy new kinds of early warning systems that bring people timely information and proactive assistance within the privacy of their own devices.

At the same time, we expect that our results will stimulate discussions on ethics and privacy. Our research used publicly available data, and no personally identifiable information was used in the analysis. But the results nonetheless bring to the fore the prospect that analyses of publicly shared data may reveal information that people could find sensitive, in this case the identification of existing or forthcoming challenges with mental health. We look forward to joining with others to reflect on the study that we have done and the implications of the results. We’d love to hear your thoughts.