Mathematics and health promotion - discussing diabetes on Twitter

Social media for health promotion is a fast-moving, complex environment, teeming with messages and interactions among a diversity of users. To better understand this landscape, a team of mathematicians and medical anthropologists from Oxford, Imperial College and Sinnia, led by Oxford Mathematician Mariano Beguerisse, studied a collection of 2.5 million tweets that contain the term "diabetes". In particular, the research focused on two main questions:

(1) Who are the most influential Twitter users that have posted about diabetes?

(2) What themes arise in these tweets?

To answer these questions, the researchers used a mixed-methods approach that relies on techniques from network science, information retrieval, and medical anthropology.

To answer question (1) the team constructed temporal retweet networks, in which the nodes are Twitter users and a connection exists whenever one user "retweets" a message posted by another. The crucial feature of these networks is that the connections are "directed", that is, there is a distinction between the author of a tweet and the user who retweeted it. The directionality of connections is what makes it possible to extract the "hub" and "authority" centrality scores for each user over time. In networks, a centrality score is a proxy for importance; hub and authority scores are useful to distinguish the different roles played by nodes in retweet networks. A good hub is a user who consistently retweets quality tweets, and a good authority is a user who posts them. Whereas the hub landscape is diffuse and has few consistent players, top authorities are highly persistent across time and comprise bloggers, advocacy groups and NGOs related to diabetes, as well as for-profit entities without specific diabetes expertise.
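The mutual definition of hubs and authorities can be made concrete with a small sketch. The code below is an illustration only, not the researchers' implementation: it assumes the standard iterative hub/authority update (often called HITS) applied to a single static snapshot rather than a temporal network, and the function name `hits_scores`, the iteration count and the toy user names are all hypothetical.

```python
from math import sqrt


def hits_scores(retweets, iterations=50):
    """Hub and authority scores for a directed retweet network.

    `retweets` is a list of (retweeter, author) pairs, so each edge
    points from the user who retweeted to the user who wrote the tweet.
    """
    users = sorted({user for edge in retweets for user in edge})
    hub = {user: 1.0 for user in users}
    auth = {user: 1.0 for user in users}

    def normalise(scores):
        # Rescale to unit Euclidean length so the scores stay bounded.
        norm = sqrt(sum(v * v for v in scores.values())) or 1.0
        return {user: v / norm for user, v in scores.items()}

    for _ in range(iterations):
        # A good authority is retweeted by good hubs.
        new_auth = dict.fromkeys(users, 0.0)
        for retweeter, author in retweets:
            new_auth[author] += hub[retweeter]
        auth = normalise(new_auth)

        # A good hub retweets good authorities.
        new_hub = dict.fromkeys(users, 0.0)
        for retweeter, author in retweets:
            new_hub[retweeter] += auth[author]
        hub = normalise(new_hub)

    return hub, auth


# Made-up toy network: three users retweet "clinic", one also retweets "blog".
example_retweets = [
    ("ann", "clinic"),
    ("bob", "clinic"),
    ("cat", "clinic"),
    ("ann", "blog"),
]
hub, auth = hits_scores(example_retweets)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

In this toy network "clinic" comes out as the top authority, because it is retweeted by every hub, and "ann" as the top hub, because she retweets both authorities. A temporal analysis would presumably repeat a computation of this kind on the network observed in each time window.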

To get a closer look at who the most influential accounts are, the researchers constructed the follower network of the top authorities (i.e., who follows whom among top authority nodes). An analysis of this network's communities places these top authorities in different groups, each with a distinct character: accounts mostly focused on diabetes activism, health and science accounts, lifestyle accounts, commercial accounts, and comedians and parody accounts.

To answer question (2) the team separated the tweets by week, and obtained the topics in each weekly bin using a technique known as "Latent Dirichlet Allocation", which estimates the probability that a tweet containing a specific word belongs to a topic. Once the topics were obtained, the researchers used thematic coding, a technique used by social scientists, to classify them into four broad thematic groups: health information, news, social interaction and commercial. Interestingly, humorous messages and references to popular culture appear consistently more often than any other type of tweet. The abundance of jokes about diabetes in online social media is a signal that there is a baseline understanding of the disease and its causes, which may be the result of nutritional health promotion over the past decades. This observation is at odds with the belief that more health education is required to help people understand the sorts of foods that might contribute to the development of diabetes.
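To give a feel for what Latent Dirichlet Allocation computes on one weekly bin, here is a minimal sketch. It is not the researchers' pipeline: the text does not say which inference method or settings were used, so the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the function name `lda_gibbs` and the made-up tokenised tweets are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA model with collapsed Gibbs sampling.

    `docs` is a list of tokenised tweets (lists of words). Returns
    (phi, theta): per-topic word probabilities and per-tweet topic
    probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(n_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w = word_id[word]
                z = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    phi = [
        {
            vocab[w]: (topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
            for w in range(n_words)
        }
        for k in range(n_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta


# Made-up tokenised tweets standing in for one weekly bin.
toy_tweets = [
    ["insulin", "glucose", "diet"],
    ["glucose", "insulin", "exercise"],
    ["cake", "joke", "sugar"],
    ["sugar", "joke", "cake", "cake"],
]
phi, theta = lda_gibbs(toy_tweets, n_topics=2)
```

Each entry of `phi` is a probability distribution over the vocabulary for one topic, and each row of `theta` gives the estimated topic mix of one tweet. Interpreting the resulting topics, for example by thematic coding, remains a human task.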

The results of this work indicate that the diabetes landscape on Twitter is complex, that it cannot be assumed that people can easily tell "good" information from "bad", and that there is clearly more information available to consumers than they can be expected to absorb. Public health approaches that simply aim to "inform" the public might be insufficient or even counterproductive, as they make a complicated cacophony of messages even busier. For example, information from bloggers, companies or automated accounts may be in line with broad health recommendations (and indeed may provide a valuable service to users), but without a clear distinction from "legitimate" health advice, such information might also push an agenda that could lead to harm or greater health costs in future. In this case, public health agencies may have to develop new approaches to ensure that the electronic health information landscape is one that promotes healthy citizens and not only sweet profits.