I was reading a post yesterday that touches on social filtering, or collaborative filtering; I'm not sure yet whether those terms are equivalent.
It raises questions about recommendations and popularity, but from the other end. Most recommendation engines, in my opinion, focus on similarity of attributes, doing matrix manipulations to come up with more of the same. The reasons for that are obvious: people like something and want to see more of it. If you show them items more aligned with their interests, they are more likely to buy stuff, spend more time on your site, and improve a variety of such positive metrics.
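To make those "matrix manipulations" concrete, here is a toy sketch of the similarity computation at the heart of most collaborative filtering: cosine similarity between users' rating vectors. All the names and ratings below are made up for illustration:

```python
import math

# Toy user-item rating matrix (sparse: only rated items appear).
# Users and ratings here are hypothetical.
ratings = {
    "alice": {"item_a": 5, "item_b": 3, "item_c": 4},
    "bob":   {"item_a": 4, "item_b": 2, "item_c": 5},
    "carol": {"item_a": 1, "item_b": 5},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

sim_ab = cosine(ratings["alice"], ratings["bob"])
sim_ac = cosine(ratings["alice"], ratings["carol"])
# Alice's tastes line up more closely with Bob's than with Carol's,
# so a similarity-based engine would lean on Bob's ratings for her.
```

Once you have these pairwise similarities, "more of the same" falls out naturally: recommend what the most similar users liked.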
I’ve always thought that was an incomplete way of going about it. Anything involving social behavior has countless aspects we may never fully understand; all we can hope to do is capture and simplify everything into the essential bits that explain most of it. I’d rather see recommendations as:
recommendations = similarity + serendipity + sieving
- similarity – this is simply including the items that we think are most similar based on some finite attribute set. This is where the current focus is and what collaborative filtering tries to do.
- serendipity – a good recommendation system should have an element of surprise built in as well. Given the same item, the result set should not be fully predictable. Tastes change over time, and a StumbleUpon-like feature has the potential for interesting user feedback.
- sieving – another kind of filtering: simply the act of filtering out stuff I would hate.
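As a rough sketch of how the three pieces could fit together in one ranking pass — the candidate items, their similarity scores, the block list, and the `adventure` knob are all hypothetical:

```python
import random

# Hypothetical candidates: (item, similarity score from some upstream model).
candidates = [("song_1", 0.95), ("song_2", 0.90), ("song_3", 0.40),
              ("song_4", 0.10), ("song_5", 0.85)]
blocked = {"song_4"}  # items the user has signalled they hate

def recommend(candidates, blocked, n=3, adventure=0.3, seed=None):
    """recommendations = similarity + serendipity + sieving."""
    rng = random.Random(seed)
    # Sieving: drop anything on the hate list outright.
    sieved = [(item, score) for item, score in candidates if item not in blocked]
    # Serendipity: jitter each similarity score by up to `adventure`,
    # so the same input does not always yield the same ordering.
    jittered = [(item, score + rng.uniform(0, adventure))
                for item, score in sieved]
    # Similarity: high-scoring items still dominate the top of the list.
    jittered.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in jittered[:n]]

picks = recommend(candidates, blocked, seed=42)
```

Turning `adventure` up widens the gap a low-similarity item can jump, which is roughly the "explore" dial described below; turning it to zero collapses back to pure similarity ranking.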
I think sites like http://www.thesixtyone.com do a good job of hooking people in with good music. The reason I spend so much time on that site is that they give you a feeler first, of tried/tested/proven music. Music that people before you have liked a lot somewhat increases the probability of you feeling the same way. They also have an explore section where I can set the level of “adventure” and listen to completely random new music, which is the most interesting aspect of the site for me. It’s probably hard to figure out exactly what algorithm they or anyone else use to recommend the next item, but it’s easy to get a feel for it based on the kinds of data they collect.
Facebook Like buttons are everywhere now. They add a social layer on top of external links that tells us something important about their popularity. They also tell me Facebook wants to know what people read, watched, or listened to. But I still think they’re missing important pieces of the puzzle here. Why shouldn’t there be a dislike button too? Why are companies not collecting any negative metrics? PageRank builds on the implicit recommendations in links; does the metadata even exist on the Internet for generic web pages to specify dissimilarity? Sites like Digg/Reddit have down votes, which is a start. My Yahoo Hackday project was an anti-recommendation engine for music/movies (it used to live at http://epcntr.appspot.com). But it didn’t do a very good job, and the reason for that was simply the lack of the right dataset. The questions being asked right now, of course, do not provide enough data to answer the converse questions.
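One standard way to fold down-votes into a ranking — not necessarily what Digg or Reddit actually use — is the lower bound of the Wilson score confidence interval on the up-vote fraction, which penalizes items with few votes or many down-votes:

```python
import math

def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the true up-vote
    fraction, at ~95% confidence for the default z. Items with few votes
    or many down-votes get pulled toward zero."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom

# Five unanimous up-votes rank *below* 90 up / 10 down: the interval
# is wide when the sample is small, so the lower bound stays modest.
few_votes = wilson_lower_bound(5, 0)
many_votes = wilson_lower_bound(90, 10)
```

The interesting property for "anti-recommendation" is that the same formula, run on the down-vote fraction instead, gives a confidence-weighted hate score — but only if the down-vote data is being collected in the first place, which is exactly the gap above.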
How can we start collecting the right data? How would the web improve if we had additional data on “hate patterns”? Does upvote/downvote data from Digg and Reddit tell us something more about all the uncharted anti-matter on the Internet? Like the article asks, what would a negative PageRank look like? I have no clue, but collecting the right data might be a start.
Some other related articles/papers that might be of interest:
- http://www.readwriteweb.com/archives/why_filtering_is_the_next_step.php
- http://ways.org/en/blogs/2010/jan/07/social_filtering_of_scientific_informati...
- http://blog.superfeedr.com/social/algorithm/a-social-filtering-algorithm/
- http://portal.acm.org/citation.cfm?id=1451983.1451997