Our first Ph.D. Brownbag session came and went today with three second-year Ph.D. students talking about their research work.
To kick-start our new doctoral forum, Jim Costello and Amrita Mohan gave nice overviews of bioinformatics and their areas of interest. Since there isn’t much opportunity, outside of capstone presentations, to understand what the science folk are up to, it was a welcome introduction. I’ll muddle through this a bit, I’m sure, but here is what I heard Friday:
Bioinformatics isn’t just about using this gene pattern-matching application, BLAST, or coming up with new widgets to automate some of these lookup tasks biologists need to do regularly. It is also about dealing with extremely large and diverse data sets, trying to find new insights and ways of visualizing the information in a meaningful way. According to Jim, there are a number of very different methods of study biologists explore (protein-protein interactions, gene expression, genetic interactions, phenotypic annotation, and some other things I didn’t fully process). Systems biology looks at all of these data sets and levels of investigation at the same time, trying to mine information about the system as a whole.
Whereas Jim is looking across the board, Amrita is focusing on some specific properties and kinds of proteins. Nearly one-third of proteomes are intrinsically disordered, allowing for the flexibility proteins need to generate all their many functions. A special class of disordered proteins, MoREs (molecular recognition elements), is distinguished because its members are predictable: biologists know Protein A will always bind with Protein B. An interesting thing happens, however, when MoREs bind to much larger proteins: they become structured proteins (plasticity). These proteins are very short, on the order of 17 residues in the sequence instead of 200-300. Amrita is looking closely at one of these, p53, which is relevant to cancer research. By focusing on which substances it binds to and on large structures that attract multiple MoREs, she hopes to be able to make some predictions computationally.
My take-home for the bio work is that there may be some parallels to online communities. Interactions between people are also studied in many ways, often without looking laterally to understand how those approaches relate. Perhaps by following what Jim does with biological data, some methodological approaches might prove useful when studying the mechanics of community. Likewise, Amrita’s technique of data mining, which is to (a) recognize the class or type of form, (b) note something unique about an interaction and its importance in some larger problem, and (c) flip the analysis around and look for interactions from the point of view of the other form, might be a nice approach to understanding critical interactions after they have been identified in some way.
Justin Donaldson’s work is already well known to me, but it is helpful to hear it explained again. His main focus with MyStrands is to help visualize the complicated network that arises when tracking a zillion or so personal music playlists. Since it is practically meaningless to show everything to a user, Justin concentrates on revealing the neighborhood surrounding a single playlist. That neighborhood is a reflection of best fit, rather than the strongest individual connections. In other words, the best recommendation would be a new song connected to every song in a playlist but nothing else. This participation ratio is the measure that prevents a pop sensation like the Beatles from being the top recommendation for everything.
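To make the participation ratio idea concrete for myself, here is a rough Python sketch. It is my own guess at the shape of the measure, not Justin's actual formula or anything from MyStrands: the function name `participation_scores`, the `cooccurrence` dictionary, and the way coverage and "share of connections inside the playlist" are multiplied together are all assumptions made for illustration.

```python
def participation_scores(playlist, cooccurrence):
    """Score candidate songs for a playlist (illustrative assumption only).

    cooccurrence maps each song to {other_song: weight}, where weight is
    how often the two songs appear together in playlists.
    A candidate scores 1.0 only if it is connected to every playlist song
    and to nothing outside the playlist.
    """
    playlist = set(playlist)
    if not playlist:
        return {}
    scores = {}
    for song, neighbors in cooccurrence.items():
        if song in playlist:
            continue  # only recommend songs not already in the playlist
        total = sum(neighbors.values())
        if total == 0:
            continue  # no connections at all, nothing to score
        # Weight of the candidate's connections that land inside the playlist.
        inside = sum(w for other, w in neighbors.items() if other in playlist)
        # Fraction of playlist songs the candidate is connected to.
        coverage = sum(1 for s in playlist if neighbors.get(s, 0) > 0) / len(playlist)
        scores[song] = coverage * inside / total
    return scores


# Made-up data: "hit" is connected to everything, "niche" only to the playlist.
cooccurrence = {
    "hit": {"a": 5, "b": 5, "x": 40, "y": 50},
    "niche": {"a": 3, "b": 2},
}
scores = participation_scores(["a", "b"], cooccurrence)
best = max(scores, key=scores.get)
```

With this made-up data the globally popular "hit" has more raw connections to the playlist than "niche", but most of its connections point elsewhere, so "niche" comes out on top. That is the behavior described in the talk, even if the real measure is computed differently.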
Justin’s technique creates some meaningful structures. The global popularity of a given song is shown through the size of the “bubble” in a graph. The fan contains the mainstream part of the larger network, moving from the most popular songs down to a zero point where the interesting information is seen. Tails spike out from there, indicating songs that correlate only with themselves, forming niches of songs or a particular sub-genre or artist. One of Justin’s big contributions is how he handles occlusion, the points where many nodes are situated in the same space and are therefore difficult to differentiate. He made the other nodes move away from the selected song, to make it clearer what the relationships are in that part of the neighborhood. The result is a tool that facilitates new discovery of information not picked up in a standard field-name query.
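I don't know how Justin actually implements the occlusion fix, but the basic idea of moving other nodes away from the selected song can be sketched in a few lines. Everything here is an assumption for illustration: the `push_away` function, the 2D `positions` dictionary, and the simple rule of sliding any node closer than `min_dist` straight outward until it sits exactly `min_dist` from the selected node.

```python
import math


def push_away(positions, selected, min_dist):
    """Return new positions with crowded nodes moved off the selected node.

    positions maps node name -> (x, y). Nodes already at least min_dist
    from the selected node are left where they are (illustrative rule only).
    """
    cx, cy = positions[selected]
    result = {}
    for name, (x, y) in positions.items():
        if name == selected:
            result[name] = (cx, cy)
            continue
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist >= min_dist:
            result[name] = (x, y)  # far enough away, keep as-is
        elif dist == 0:
            # Exactly overlapping: no direction to follow, pick one arbitrarily.
            result[name] = (cx + min_dist, cy)
        else:
            # Slide outward along the same direction until min_dist away.
            scale = min_dist / dist
            result[name] = (cx + dx * scale, cy + dy * scale)
    return result


layout = {"song": (0.0, 0.0), "near": (0.3, 0.4), "far": (3.0, 4.0)}
cleared = push_away(layout, "song", min_dist=1.0)
```

In this toy layout the "near" node is pushed out to a distance of 1.0 in the same direction it already had, while "far" stays put, so the neighborhood keeps its rough shape but the selected song is no longer buried.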
The take-home here is still that I can see value to such discovery in social networks established through forums. This kind of visualization could help identify groups of people by topic or conversational style based on simple interactive relationships. It could also help to find new discussion partners. Imagine going to a forum and getting a message like: “If you like to talk to this person, then you’ll also like this one.”