
How the Web Sees You

The value of data mining is dependent on the people wielding the algorithm. Personas, an art installation by MIT’s Sociable Media Group, wants to create awareness about the capricious nature of results.

Personas is a visualization project from MIT that converts digital footprints into a category breakdown that describes an individual. The mission of the web project, however, is not to provide some definitive interpretation of one’s data. Its creators hope to raise awareness about the capricious nature of data mining.

How the Web Sees the Makice Family

On the surface, there appears to be some value in compressing all of the blogs, articles, and social media postings into a compact visualization. The high-level descriptive words about a person are easy to understand. In looking at our three most active online family members, one striking observation is that 9-year-old Carter has 20 category descriptors compared to just 11 for Amy. The outcome doesn’t completely make sense: I’m a designer, but politics and management both eclipse that category in my visualization.

Does it mean anything? Potentially, it could, if you revisit the process over time.

It would be interesting to see how Carter’s description changes as he continues to publish his own material, rather than relying on his parents to share his life online. Ambiguous design like this is a collaborative process between the viewer and the system to co-create meaning, often by comparing what one sees now with what was experienced previously. To mine value in this way, though, there needs to be some longitudinal consistency in how the data is processed.

That consistency isn’t there. I ran the same search for “Kevin Makice” three times in succession. With each run, Personas created a different interpretation of the same data:

Differences in each run
Blink, and the interpretation changes

That’s OK, because this is one of the quirks the creators want to reveal.

Personas is a critique of data mining. Its creators acknowledge the real value Google and Netflix bring to people through statistical analysis of large data sets, but they also point to a dark side that includes TSA watch lists (to name one). The Personas site explains this insight:

Data mining is “technologically neutral” in the sense that its power is derived from what people do with it. The creators of an algorithm choose how to model the world, deciding (somewhat arbitrarily) what inputs and outputs to use. You as the potential “victim” of data mining cannot control any of these factors, especially given the usual lack of transparency of the process.

The purpose of this project is to let people peek into one such black box, while still withholding the controls that shape its engine.

Inside the Black Box
The analytical process of Personas starts with a Yahoo search for public data using “characterizing” queries (which are different from a simple ego search), limiting the results to no more than 30 items. The results are then filtered to remove hate speech and to keep only English-language text, and words are stemmed (stripped of their suffixes) to simplify the data sets.
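The steps above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the code Personas runs: the suffix list, the `looks_english` heuristic, and every function name here are hypothetical, and the hate-speech filter is left out entirely because the post does not say how it works.

```python
import re

MAX_RESULTS = 30  # the post says results are limited to no more than 30 items

# Hypothetical, deliberately tiny suffix list. A real stemmer has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def looks_english(text):
    """Crude stand-in for a language filter: are the letters mostly ASCII?"""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for c in letters if c.isascii())
    return ascii_letters / len(letters) > 0.9


def preprocess(results):
    """Cap the result list, drop non-English-looking items, tokenize and stem."""
    docs = []
    for text in results[:MAX_RESULTS]:
        if not looks_english(text):
            continue
        tokens = re.findall(r"[a-z]+", text.lower())
        docs.append([stem(token) for token in tokens])
    return docs
```

Even in this toy version, each choice (which suffixes to strip, what counts as English, where to cut the list) changes what the later analysis gets to see, which is the kind of arbitrary modeling decision the project is pointing at.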

The results are then categorized using Latent Dirichlet Allocation (LDA), a topic-modeling technique that clusters words into categories and guesses which categories best describe a given document. The algorithm is unsupervised, which means the computer doesn’t know if its work is “correct.”
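For readers curious what topic modeling looks like in practice, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, one common way to train such a model. The post does not say which fitting method Personas uses, so the sampler, the `alpha` and `beta` values, and the function name `lda_gibbs` are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic counts.

    docs is a list of token lists. alpha and beta are assumed smoothing values.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # uses of word w in topic k
    topic_total = [0] * num_topics                             # all tokens in topic k
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(topics)

    # Repeatedly resample each token's topic given all the other tokens.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w = word_id[word]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic


if __name__ == "__main__":
    toy_docs = [
        ["design", "sketch", "design", "color"],
        ["vote", "senate", "vote", "policy"],
        ["design", "policy", "vote", "sketch"],
    ]
    for seed in (1, 2, 3):
        print(seed, lda_gibbs(toy_docs, num_topics=2, seed=seed))
```

Nothing in the loop ever checks the topics against a right answer, which is what unsupervised means here. The sampler is also driven by random draws, so runs with different seeds can settle on different breakdowns of the same documents. Whether that is what makes Personas change from run to run is a guess on my part, not something the project states.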

Personas was created as an art installation by Aaron Zinman (with help from Alex Dragulescu, Yannick Assogba and Judith Donath) as part of Metropath(ologies), an interactive exhibit by MIT’s Sociable Media Group. Zinman is a PhD student whose past work included “Is Britney Spears Spam” (PDF), an attempt to classify users by the humanness of their communication behavior and social structure.

By Kevin Makice

A Ph.D. student in informatics at Indiana University, Kevin is rich in spirit. He wrestles and reads with his kids, does a hilarious Christian Slater imitation and lights up his wife's days. He thinks deeply about many things, including but not limited to basketball, politics, microblogging, parenting, online communities, complex systems and design theory. He didn't, however, think up this profile.

3 replies on “How the Web Sees You”

Kevin – I stumbled across your blog via Google Alert and I am glad I did. This is a fantastic post and it highlights some important issues regarding data mining and its uses and abuses. What I find amazing is how similar the profile images look to those of gene profiles. Could this be a visualization of our digital DNAs?

Thanks.

The point of the project was really about how much interpretation and choice at the data-crunching level play a role in the outcome, but the visuals are so compelling I wish it were that simple to understand my interaction like a biologist would a gene.
