Some 500 million tweets are posted each day. Because they carry so many details about users' locations, interests and behaviors, tweets are a trove of useful information for scientists who might, for example, be looking for patterns in human behavior, studying risk factors for health conditions, or tracking the spread of infectious diseases.
There are many potential uses for this information. By analyzing emotional cues in the tweets of pregnant women, Microsoft researchers developed an algorithm that predicts which of them are at risk for postpartum depression. The United States Geological Survey uses Twitter to help locate earthquakes as people tweet about the tremors they feel. I could go on for days.
However, while all tweets are public, researchers who want to access them have to go through Twitter's application programming interface (API), which currently provides access to only 1 percent of the archive. That drastically limits the amount of available data. But all that is about to change.
Twitter has announced that in February 2015 it will make all of its tweets, dating back to 2006, available for scientific research. With everything up for grabs, the use of Twitter as a research tool will likely skyrocket. With so many data points to mine, it's almost impossible to imagine all the potential applications.
But this also raises some tough ethical questions. Will Twitter claim any legal rights to the scientific findings? That would be somewhat understandable, and the company could make a very strong case for it. But the most important question is this: is it ethical to use people's data without their consent? On one hand, it's very valuable data, and scientists could make good use of it, ultimately benefiting humanity. On the other hand, maybe I just don't want my data revealed; I'm not comfortable with it. How could this be resolved?
Caitlin Rivers and Bryan Lewis, computational epidemiologists at Virginia Tech, published guidelines for the ethical use of Twitter data in February. It seems like common sense, but I guess it needed to be written down. The gist of it is: never reveal personal information about users. Usernames, locations, personal preferences, whatever: that's private, and researchers should publish only statistical information. Rivers and Lewis argue that it is crucial for scientists to consider and protect users' privacy as Twitter-based research projects multiply. Well, as Spider-Man said, with great data comes great responsibility! Or was that Snowden?