You’ve probably heard by now that a ‘researcher’ by the name of Emil Kirkegaard released the sensitive data of 70,000 individuals from OKCupid on the Open Science Framework. This is an egregious violation of research ethics, and we’re already beginning to see mainstream media coverage of this unfolding story. I’ve been following this pretty closely as it involves my PhD alma mater, Aarhus University. All I want to do here is collect relevant links and facts for those who may not be aware of the story. This debacle is likely going to become a key discussion piece in future debates over how to conduct open science. Jump to the bottom of this post for a live-updated collection of news coverage, blogs, and tweets as this issue unfolds.
Emil himself continues to fan the flames by being totally unapologetic:
@tranquileye If you don’t want other people to see things, don’t post them publicly on the Internet. OKCupid does feature private answering.
— Emil OW Kirkegaard (@KirkegaardEmil) May 13, 2016
@jetsumgerl Let’s wait until the SJW-storm is over and talk about it.
— Emil OW Kirkegaard (@KirkegaardEmil) May 11, 2016
An open letter has been started here, currently with the signatures of over 150 individuals (myself included), petitioning Aarhus University for a full statement and investigation of the issue:
Meanwhile Aarhus University has stated that Emil acted without oversight or any affiliation with AU, and that if he has claimed otherwise they intend to take (presumably legal) action:
1/2 Aarhus University states the following regarding #OKcupid: The views and actions by student Emil Kirkegaard is not on behalf of AU …
— Aarhus Universitet (@AarhusUni) May 12, 2016
2/2 … his actions are entirely his own responsibility. If @AarhusUni’s name has been misused, we will take action.
— Aarhus Universitet (@AarhusUni) May 12, 2016
I’m sure a lot more is going to be written as this story unfolds; the implications for open science are potentially huge. Already we’re seeing scientists wonder if this portends previously unappreciated risks of sharing data:
This is why I am v cautious of calls for open data @OSFramework https://t.co/xcBQlqOiOW
— Antonia Hamilton (@antoniahamilton) May 13, 2016
I just want to try and frame a few things. In the initial dust-up of this story there was a lot of confusion. I saw multiple accounts describing Emil as a “PI” (principal investigator), asking for his funding to be withdrawn, etc. At the time the details surrounding this were rather unclear. Now, as more and more details emerge, they paint a rather different picture, one that is not being accurately portrayed so far in the media coverage:
Emil is not a ‘researcher’. He acted without any supervision or direct affiliation with AU. He is a master’s student who claims on his website that he is ‘only enrolled at AU to collect SU [government funds]’. I’m seeing that most of the outlets describe this as ‘researchers release OKCupid data’. When considering the implications of this for open science and data sharing, we need to frame this as what it is: a group of hacktivists exploiting a security vulnerability under the guise of open science. NOT a university-backed research program.
What implications does this have for open science? From my perspective it looks like we need to discuss the role of oversight and data protection. Ongoing Twitter discussion suggests Emil violated EU data protection laws and the OKCupid terms of service. But other sources argue that this kind of scraping ‘attack’ is basically data-gathering 101 and that nearly any undergraduate with the right education could have done this. It seems like we need to have a conversation about our digital rights to data privacy, and whether those are doing enough to protect us. Doesn’t OKCupid itself hold some responsibility for allowing this data to be accessed so easily? And what is the responsibility of the Open Science Framework? Do we need to put stronger safeguards in place? Could an organization like Anonymous, or even ISIS, ‘dox’ thousands of people and host the data there? These are extreme situations, but I think we need to frame them now before people walk away with the idea that this is an indictment of data sharing in general.
Below is a collection of tweets, blogs, and news coverage of the incident:
Tweets:
Brian Nosek on the Open Science Framework’s response:
Initial step for OKCupid data release on @OSFramework. @KirkegaardEmil password protected user datafile, version history is now inaccessible
— Brian Nosek (@BrianNosek) May 12, 2016
More tweets on larger issues:
@neuroconscience Protection of personal data is left, right and center of open science discussions with special sessions at meetings…
— Björn Brembs (@brembs) May 13, 2016
@neuroconscience also: dangerous to call someone a hacker who accessed info that they were “allowed” to access: remember Aaron Schwartz
— Richard D. Morey (@richarddmorey) May 13, 2016
@neuroconscience 1) because OSF doesn’t check all postings manually; 2) because that’s how HTML works, unfortunately
— Richard D. Morey (@richarddmorey) May 13, 2016
Emil has stated he is not acting on behalf of AU:
@neuroconscience as he has stated here https://t.co/ZrUmD932o3 https://t.co/8ytnzbQ0yl https://t.co/HGliDdERgs
— Karsten Olsen (@karsolsen) May 13, 2016
News coverage:
Vox:
Motherboard:
http://motherboard.vice.com/read/70000-okcupid-users-just-had-their-data-published
ZDNet:
http://www.zdnet.com/article/okcupid-user-accounts-released-for-the-titillation-of-the-internet/
Forbes:
http://www.themarysue.com/okcupid-profile-leak/
Here is a great example of how bad this is: Wired runs a story with the headline ‘OkCupid Study Reveals the Perils of Big-Data Science’:
OkCupid Study Reveals the Perils of Big-Data Science