Personas is a visualization project from MIT that converts digital footprints into a category breakdown that describes an individual. The mission of the web project, however, is not to provide some definitive interpretation of one’s data. Its creators hope to raise awareness about the capricious nature of data mining.
On the surface, there appears to be some value in compressing all of the blogs, articles, and social media postings into a compact visualization. The high-level descriptive words about a person are easy to understand. Looking at our family's three most active online members, one thing stands out: 9-year-old Carter has 20 category descriptors, compared with just 11 for Amy. The outcome doesn't completely make sense, either: I'm a designer, but politics and management both eclipse that category in my visualization.
Does it mean anything? Potentially, it could, if you revisit the process over time.
It would be interesting to see how Carter’s description changes as he continues to publish his own material, rather than relying on his parents to share his life online. Ambiguous design like this is a collaborative process between the viewer and the system to co-create meaning, often by comparing what one sees now with what was experienced previously. To mine value in this way, though, there needs to be some longitudinal consistency in how the data is processed.
That consistency isn’t there. I ran the same search for “Kevin Makice” three times in succession. With each run, Personas created a different interpretation of the same data:
That’s OK, because this is one of the quirks the creators want to reveal.
Personas is a critique of data mining. It acknowledges the real value Google and Netflix deliver through statistical analysis of large data sets, but it also points to a dark side, one that includes TSA watch lists. The Personas site explains this insight:
Data mining is “technologically neutral” in the sense that its power is derived from what people do with it. The creators of an algorithm choose how to model the world, deciding (somewhat arbitrarily) what inputs and outputs to use. You as the potential “victim” of data mining cannot control any of these factors, especially given the usual lack of transparency of the process.
The purpose of this project is to allow people to peek into one such black box, while still preventing access to controls to shape its engine.
Inside the Black Box
The analytical process of Personas starts with a Yahoo search for public data using "characterizing" queries (which differ from a simple ego search), limited to no more than 30 results. Those results are then filtered to remove hate speech and to focus on English-language content, and words are stemmed to strip suffixes and simplify the data set.
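The article doesn't specify which stemmer or filter rules Personas uses, but the preprocessing stage it describes can be sketched roughly like this (the suffix list and blocklist below are illustrative assumptions, not the project's actual rules):

```python
# Toy preprocessing pipeline: cap the result set, filter terms, and stem.
# SUFFIXES and BLOCKLIST are hypothetical stand-ins for Personas' real rules.

MAX_RESULTS = 30
SUFFIXES = ("ing", "edly", "ed", "ly", "es", "s")  # checked longest-first
BLOCKLIST = {"someblockedterm"}  # placeholder for the hate-speech filter

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(results):
    """Limit to 30 items, drop blocked terms, then lowercase and stem tokens."""
    docs = []
    for text in results[:MAX_RESULTS]:
        tokens = [stem(w) for w in text.lower().split() if w not in BLOCKLIST]
        docs.append(tokens)
    return docs
```

A real implementation would use a proper stemmer (such as Porter stemming) rather than naive suffix stripping, but the shape of the pipeline is the same: cap, filter, normalize.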
Using a technique called Latent Dirichlet Allocation (LDA), the results are categorized by an unsupervised algorithm, meaning the computer has no way to know whether its work is "correct." Here, a clustering method called topic modeling guesses which categories best describe a given document.
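The site doesn't publish its model parameters, and production systems use optimized libraries, but the heart of LDA can be sketched as a tiny collapsed Gibbs sampler over preprocessed documents (hyperparameters and topic count here are arbitrary choices for illustration):

```python
# Minimal collapsed Gibbs sampler for LDA: repeatedly resample each token's
# topic from a distribution shaped by document-topic and topic-word counts.
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    random.seed(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * num_topics for _ in docs]               # doc -> topic counts
    nkw = [defaultdict(int) for _ in range(num_topics)]  # topic -> word counts
    nk = [0] * num_topics                                # tokens per topic
    z = []                                               # topic of each token

    # Random initial assignment of every token to a topic.
    for di, doc in enumerate(docs):
        assignments = []
        for w in doc:
            t = random.randrange(num_topics)
            assignments.append(t)
            ndk[di][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(assignments)

    for _ in range(iters):
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                t = z[di][wi]
                ndk[di][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # Sample a new topic proportional to the conditional posterior.
                weights = [
                    (ndk[di][k] + alpha) * (nkw[k][w] + beta)
                    / (nk[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                r = random.random() * sum(weights)
                new_t = num_topics - 1
                for k, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        new_t = k
                        break
                z[di][wi] = new_t
                ndk[di][new_t] += 1
                nkw[new_t][w] += 1
                nk[new_t] += 1

    return ndk, nkw
```

The "unsupervised" quality is visible here: nothing in the sampler ever checks an answer key, and different random seeds yield different (equally plausible) topic assignments, which is one source of the run-to-run variation the Personas results show.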
Personas was created as an art installation by Aaron Zinman (with help from Alex Dragulescu, Yannick Assogba and Judith Donath) as part of an interactive exhibit—Metropath(ologies)—by MIT's Sociable Media Group. Zinman is a PhD student whose past work includes "Is Britney Spears Spam" (PDF), an attempt to classify users by the humanness of their communication behavior and social structure.