When Jillian York, a 36-year-old American activist, was on vacation in February, she received an unexpected text. Her friend Adam Harvey, another activist and researcher, had discovered photos of her in a US government database used to train facial-recognition algorithms, and wondered whether she knew about it.
York, who works in Berlin for the Electronic Frontier Foundation, a digital rights non-profit group, did not. She was stunned to discover that the database contained nearly a dozen images of her, a mixture of photos and YouTube video stills, taken over a period of almost a decade.
When she dug into what the database was used for, it dawned on her that her face had helped to build systems used by the federal government to recognise faces of interest, including suspected criminals, terrorists and illegal aliens.
“What struck me immediately was the range of times they cover,” York says. “The first images were from 2008, all the way through to 2015.” Two of the photos, by a photographer friend, had been scraped from Google. “They were taken at closed meetings. They were definitely private in the sense that it was me goofing around with friends, rather than me on stage,” she adds.
Another half-dozen photos had been clipped from YouTube videos of York speaking at events, on topics including freedom of expression, digital privacy and security. “It troubles me that someone was watching videos of me and clipping stills for this purpose,” she says.
[Illustration: Sébastien Thibault]
York is one of 3,500 subjects in this database, which is known as Iarpa Janus Benchmark-C (IJB-C). Iarpa is a US government body that funds innovative research aimed at giving the US intelligence community a competitive advantage; Janus — named after the two-faced Roman god — is its facial-recognition initiative.
The dataset, which was compiled by a government subcontractor called Noblis, includes a total of 21,294 images of faces (there are other body parts too), averaging six pictures and three videos per person, and is available on application to researchers in the field. By their own admission, its creators picked “subjects with diverse occupations, avoiding one pitfall of ‘celebrity-only’ media [which] may be less representative of the global population.”
Other subjects in the dataset include three EFF board members, an Al-Jazeera journalist, a technology futurist and writer, and at least three Middle Eastern political activists, including an Egyptian scientist who participated in the Tahrir Square protests in 2011, the FT can confirm.
None of the people described above was aware of their inclusion in the database. Their images were obtained without their explicit consent: they had been uploaded under Creative Commons licences, online copyright agreements that allow images to be copied and reused for academic and commercial purposes by anyone.
The primary use of facial-recognition technology is in security and surveillance, whether by private companies such as retailers and events venues, or by public bodies such as police forces to track criminals. Governments increasingly use it to identify people for national and border security.
The biggest technical obstacle to accurate facial recognition has so far been machines' inability to identify human faces that are only partially visible, shrouded in shadow or covered by clothing, as opposed to the high-resolution, front-facing portrait photos the computers were trained on.
To teach a machine how to better read and recognise a human face in these conditions, it has to be trained using hundreds of thousands of faces of all shapes, sizes, colours, ages and genders. The more natural, varied and unposed the faces are, the better they simulate real-life scenarios in which surveillance might take place, and the more accurate the resulting models for facial recognition.
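The matching step that all this training data ultimately feeds can be sketched in miniature: a trained network reduces each face image to a fixed-length "embedding" vector, and two images are judged to show the same person when their vectors point in nearly the same direction. The vectors and threshold below are invented for illustration; they are not drawn from the Janus programme or any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.8):
    """Declare a match when similarity exceeds a tuned threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Hypothetical embeddings: two shots of one person (frontal and profile)
# land close together; a different person's embedding does not.
person_a_frontal = (0.9, 0.1, 0.4)
person_a_profile = (0.85, 0.15, 0.45)
person_b = (0.1, 0.9, 0.2)

print(same_person(person_a_frontal, person_a_profile))  # True
print(same_person(person_a_frontal, person_b))          # False
```

The point of varied, unposed training faces is precisely to make the embeddings robust: a model trained only on frontal portraits would place the frontal and profile shots far apart and miss the match.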
In order to feed this hungry system, a plethora of face repositories — such as IJB-C — have sprung up, containing images manually culled and bound together from sources as varied as university campuses, town squares, markets, cafés, mugshots and social-media sites such as Flickr, Instagram and YouTube.
To understand what these faces have been helping to build, the FT worked with Adam Harvey, the researcher who first spotted Jillian York’s face in IJB-C. An American based in Berlin, he has spent years amassing more than 300 face datasets and has identified some 5,000 academic papers that cite them.
The images, we found, are used to train and benchmark algorithms that serve a variety of biometric-related purposes — recognising faces at passport control, crowd surveillance, automated driving, robotics, even emotion analysis for advertising. They have been cited in papers by commercial companies including Facebook, Microsoft, Baidu, SenseTime and IBM, as well as by academics around the world, from Japan to the United Arab Emirates and Israel.
“We’ve seen facial recognition shifting in purpose,” says Dave Maass, a senior investigative researcher at the EFF, who was shocked to discover that his own colleagues’ faces were in the Iarpa database. “It was originally being used for identification purposes . . . Now somebody’s face is used as a tracking number to watch them as they move across locations on video, which is a huge shift. [Researchers] don’t have to pay people for consent, they don’t have to find models, no firm has to pay to collect it, everyone gets it for free.”
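The "tracking number" shift Maass describes is known in the literature as re-identification: each detected face is matched against previously seen identities, and either inherits an existing track ID or is assigned a new one. A minimal greedy sketch, with made-up embeddings and a made-up threshold (not any vendor's actual pipeline):

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def assign_track_ids(detections, threshold=0.9):
    """Greedy re-identification: match each detected face to the closest
    previously seen identity, or mint a new ID if nothing is close enough."""
    gallery = []  # list of (track_id, embedding) for identities seen so far
    ids = []
    for emb in detections:
        best_id, best_sim = None, threshold
        for tid, known in gallery:
            sim = cos_sim(emb, known)
            if sim >= best_sim:
                best_id, best_sim = tid, sim
        if best_id is None:           # unseen face: create a new identity
            best_id = len(gallery)
            gallery.append((best_id, emb))
        ids.append(best_id)
    return ids

# Hypothetical detections from two cameras: the first and third faces
# belong to the same person, so they receive the same track ID.
detections = [
    (0.9, 0.1, 0.4),    # person seen at camera 1
    (0.1, 0.9, 0.2),    # a different person
    (0.88, 0.12, 0.42), # first person again, at camera 2
]
print(assign_track_ids(detections))  # [0, 1, 0]
```

Once a system works this way, an identity persists across cameras and locations without any name attached: the embedding itself is the identifier.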
The dataset containing Jillian York’s face is one of a series compiled on behalf of Iarpa (earlier iterations are IJB-A and -B), which have been cited by academics in 21 different countries, including China, Russia, Israel, Turkey and Australia.
They have been used by companies such as the Chinese AI firm SenseTime, which sells facial-recognition products to the Chinese police, and the Japanese IT company NEC, which supplies software to law enforcement agencies in the US, UK and India.
The images in them have even been scraped by the National University of Defense Technology in China, which is controlled by China’s top military body, the Central Military Commission. One of its academics collaborated last year in a project that used IJB-A, among other sets, to build a system that would, its architects wrote, “[enable] more detailed understanding of humans in crowded scenes”, with applications including “group behaviour analysis” and “person re-identification”.
In China, facial scanning software has played a significant role in the government’s mass surveillance and detention of Muslim Uighurs in the far-western region of Xinjiang. Cameras made by Hikvision, one of the world’s biggest CCTV companies, and Leon, a former partner of SenseTime, have been used to track Muslims all over Xinjiang, playing a part in what human-rights campaigners describe as the systematic repression of millions of people.
Earlier this week, it emerged that SenseTime had sold its 51 per cent stake in a security joint venture with Leon in Xinjiang amid growing international outcry over the treatment of the Uighurs.
“That was the shocking part,” York says, as she considers the ways multiple companies and agencies have used the database. “It’s not that my image is being used, it’s about how it’s being used.”
Harvey has been investigating face datasets since 2010. The collection he has built up in that time comprises datasets that are readily accessible to researchers for academic and commercial purposes. The 37-year-old has been analysing where these faces come from, and where they’ve ended up.
By mapping out these biometric trade routes, he has started to slowly piece together the scale of distribution of faces, which may have contributed to commercial products and surveillance technologies without any explicit permission from the individuals in question.
“There’s an academic network of data-sharing, because it’s considered publicly beneficial to collaborate. But researchers are ignoring the stark reality that once your face is in a dataset, it’s impossible to get out of it because it’s already been downloaded and re-used all over the world,” he says over coffee in Berlin’s Mitte neighbourhood.