                    Trust-Based Recommendation Systems

                    Reid Andersen from Microsoft Research is talking about trust-based recommendation systems (PDF). To build a personalized recommendation, you need a trust graph among users. What system should you use to determine the recommendation? The researchers use an axiomatic approach. The context of their axiomatic system is social choice theory (see Arrow's impossibility theorem for voting systems from 1951). More recent treatments include Web page ranking systems (Altman and Tennenholtz, '05). The details are fairly complex, but the basic idea is that by proposing axioms until you get an inconsistency in the axiom set and then backing off and exploring other axioms
                    Exploring Beijing

                    [Photo: Parking attendant] I'm in Beijing for WWW2008, which starts tomorrow. I came out early (last Saturday) because I find conferences much more enjoyable when I'm not suffering from jet lag. I'm pretty well adjusted now and I'm looking forward to the talks tomorrow. In the meantime, I've taken some time to explore Beijing a bit. On Sunday I was quite tired and, other than going to church (a fun experience in Beijing), stuck close to the hotel. It was rainy both Sunday and Monday, so the weather wasn't up to outdoor activities. Because of that, I decided that
                    Elias Torres on SPARQL

                    I just published an interview I did with Elias Torres on SPARQL and the semantic Web at IT Conversations. This is part of my personal podcast that I call Technometria to couple it to this blog. Rohit Khare introduced me to Elias while we were all touring the castle in Edinburgh during WWW2006 in May. I started talking with him about SPARQL and immediately knew I wanted to know more about it and that he was the right guy to explain it. I think you'll find his interview interesting whether or not you're a fan of the
                    WWW2006 Conference Wrap-Up

                    [Photo: Rent-a-cop at the convention center. They look very professional here.] So, WWW2006 is wrapping up. There are still a few sessions and dinner tonight with some new friends, but for the most part it's done. Overall, this has been a good conference. When I looked at the conference program before I came, it was overwhelming and, frankly, there wasn't much that looked all that interesting based on the titles that I scanned. In spite of that, when I got here, I found that it was rather easy to focus on specific tracks that looked interesting and there were
                    Late Breaking News Session

                    My presentation on LDDI was in the "Late Breaking News" session since we basically missed all the deadlines. There were some other interesting presentations in that session as well. Daniel Harris and Niel Harris (no relation) presented Kendra, a non-profit initiative to create an open market for digital goods. They presented Kendra Base, a tool for describing digital goods using meta-data. They describe it as "a semantic information publishing and querying system prototype." They also called it a "provocation," meaning that they're hoping someone can do it better--they're just exposing the ideas. The user shouldn't have to know RDF,
                    China, the Internet's Broken Link

                    [Photo: Danny Weitzner, W3C, at WWW2006] Danny Weitzner from the W3C started out today's plenary session with a discussion of the Internet and Society called "China: A Broken Link on the Web." If everyone's a publisher, is every government then a filter and interceptor? He started off by noting the story of Yahoo! "helping jail a Chinese writer" and made some interesting points: Yahoo! has no basis for ignoring Chinese law while obeying the laws of other countries. That leaves the choice of simply not doing business in China. There's an argument that being
                    Identity Management Panel

                    I attended an identity management panel moderated by Arnaud Sahuguet of Google. On the panel were Rick Hull (Bell Labs), Conor Cahill (Intel), Kim Cameron (Microsoft), Mike Neuenschwander (Burton Group), and Stefan Brands (Credentica & McGill University). Arnaud started off with the famous "nobody knows you're a dog" cartoon and the ACLU pizza video. He asked each panelist how many different identities they have. The answers ranged from 40 to 313 (Cahill knew exactly). Kim said he uses classes of identities (my own strategy) for different kinds of sites. Converged networks (wireless, television, Internet) make the problem of
                    WWW2006 Conference Dinner

                    The conference reception was held at Edinburgh Castle. I've been taking photos while I'm here. Here are a few from our visit to Edinburgh Castle last night. [Photos: Edinburgh Castle; Edinburgh Castle's Main Gate; The Firth of Forth from Edinburgh Castle; Tim Berners-Lee chatting with a bagpipe player at Edinburgh Castle] Yesterday my nine-year-old son asked me if I'd seen a bagpiper yet. I hadn't, so when I saw one at the castle, I went over to take a picture. Interestingly, Tim Berners-Lee was chatting with him, so I snapped a picture. The trip to the castle was a
                    Free the Data!

                    [Photo: Free the Data! Panel] A specially arranged panel session called Freeing the Data was moderated by Kieron O'Hara (Univ. of Southampton). On the panel were Daniel Weitzner (W3C & MIT), Daniel Harris (Kendra), and Jeremy Frey (Univ. of Southampton). Jeremy Frey is a chemist and took the position that any scientist doing research should not only make results available, but the data as well. But making the data available isn't enough. We need to make it findable as well. Moreover, we need the context to be available and machine readable. Another issue with data is correctness. Published papers
                    Detecting Cloaking in Web Pages

                    [Photo: Baoning Wu from Lehigh University] Here's something I'd never heard of before: cloaking. Cloaking is the process of returning different pages to a search engine crawler for a given URL than you return to other users. You can imagine why people intent on getting higher search engine rankings than they deserve might want to do this. When you change the meaning of the page (rather than merely its structure) it's called "semantic cloaking." So, how can you detect semantic cloaking? Baoning Wu from Lehigh University presented work aimed at answering this question. (See the paper.) You can't reliably detect
                    Improving Search Results Inside the Enterprise

                    [Photo: Pavel Dmitriev from Cornell] Organizations often use search engines as part of their corporate information infrastructure. The problem is that, inside corporations, creating Web pages is typically much more difficult than it is on the Web at large and, consequently, links to pages are a much less useful indicator of page relevance. How do you solve this problem? I attended a presentation by Pavel Dmitriev from Cornell that discussed one such solution. (See the paper.) Within an organization, users are much more likely to be interested in improving the results from a search engine. Dmitriev and his co-authors
                    Knowing the User's Every Move

                    I sat through Richard Atterer's talk on User Activity Tracking for Website Usability Evaluation and Implicit Interaction. (See the paper.) The problem is that putting code on the client to track user actions is invasive and users aren't likely to put up with it. On the other hand, putting the code on the server misses JavaScript actions that don't result in server requests. Their answer was to use a rewriting proxy called UsaProxy that rewrites any page you request to make sure their tracking JavaScript is included. Very clever and related to some other things I've seen for modifying
                    Visualizing Flickr Tags

                    This afternoon I popped into Andrew Tomkins' talk on Visualizing Tags over Time. The paper was nominated for a best paper award. The research looks at visualizing Flickr tags. Images and tags form a bipartite graph that encourages "pivot browsing." Tag clouds represent the default way of visualizing tags. Tags are not fixed in time. Does the temporal structure lead to a representation that allows us to surf through time and pick up a gestalt sense of what was happening over time? He demos a visualization that scans through the tags for each day, picks out representative tags and then
                    Symmetric Queries in XML

                    Also in the XML session, Shuohao Zhang from Washington State University spoke on Symmetrically Exploiting XML. This paper was nominated as a best student paper. (See the paper.) XML queries are asymmetric because they're hierarchical. Rearranging the hierarchy requires changing the query. This work is aimed at making a single query work across multiple structures. This is useful when you don't know what the schema is, for heterogeneous or irregular data, or when the schema evolves. Axes (parent, child, ancestor, descendant, preceding, following, etc.) are all directional. This work proposes a non-directional axis called closest. The semantics is a
                    XML Screamer, a Fast XML Parser

                    This afternoon I attended the XML session. The first speaker was Eric Perkins who spoke on XML Screamer, an integrated, high-performance XML parser/validator. This paper has been nominated for the best paper award. (See the paper.) XML parsers are slow. Many people think that the human readability of XML is what makes it slow. How fast should we be able to go? Reading through an input file should take about 10 cycles/byte (1 GHz processor). Xerces-C does 6 Mbytes/sec/GHz. Expat is 12 Mbytes/sec/GHz. What's happening with all the other cycles? Eric walks through the steps required to parse a file. There are
                    Mashups, Web Data, and APIs

                    [Photo: Frank Mantek, Jeff Barr, Dan Theurer, and Kevin Lawver] I decided to take in Rohit Khare's panel on Next Wave (Business) this morning. This was part of the developer track that has normally been Rohit was kind enough to invite me to the panel dinner last night. It was fun and I Dan Theurer from Yahoo! was first up and used the theme "What Powers Web 2.0 Mashups?" Dan introduced the Yahoo! Developer Network. The first APIs that Yahoo! launched were the search APIs a little over a year ago. He showed a long list of APIs that
                    The Next Wave of the Web

                    [Photo: Plenary Panel] WWW2006 started this morning with an introduction to the technical program. The conference is very competitive; of the 697 papers submitted this year, 84 were accepted, or 11%. The plenary panel was entitled "The Next Wave of the Web." Nigel Shadbolt (University of Southampton) was the panel chair. The panelists were Tim Berners-Lee (W3C), Richard Benjamins (iSOCO), Clare Hart (Dow-Jones), and Jim Hendler (MINDSWAP). The discussion was mostly about the semantic web. Shadbolt asked Berners-Lee what the achievements have been in the semantic web since the first article appeared in Scientific American. He pointed to
                    MoodViews: Analyzing Mood Data from Blogs

                    [Photo: Krisztian Balog] Blogs are one of the places on the web where you can reliably find people writing about their moods. Krisztian Balog presented a paper called "Decomposing Bloggers' Moods: Towards a Time Series Analysis of Moods in the Blogosphere." This can be used to produce interesting data. For example, MoodViews tracks a stream of mood-annotated text from LiveJournal. MoodViews tracks, predicts, and analyzes moods on blogs. Moods have a cyclic component. Some moods depend on time of day, some on the day of week. You can show a correlation between major events (say the London bombings) and mood.
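
To make the "cyclic component" idea concrete, here is a minimal sketch of one way a time-of-day pattern could be pulled out of mood-tagged posts. It is not the MoodViews method; the function name `hourly_mood_share` and the sample data are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from datetime import datetime


def hourly_mood_share(posts, mood):
    """Return {hour: fraction of that hour's posts tagged with `mood`}.

    `posts` is an iterable of (datetime, mood_tag) pairs. This is an
    assumed input shape, not the format of any real LiveJournal feed.
    """
    totals = defaultdict(int)  # posts seen per hour of day
    hits = defaultdict(int)    # posts per hour carrying the mood of interest
    for timestamp, tag in posts:
        totals[timestamp.hour] += 1
        if tag == mood:
            hits[timestamp.hour] += 1
    return {hour: hits[hour] / totals[hour] for hour in sorted(totals)}


# Hypothetical sample data: (post time, self-reported mood tag).
sample_posts = [
    (datetime(2006, 5, 22, 8, 15), "tired"),
    (datetime(2006, 5, 22, 8, 40), "happy"),
    (datetime(2006, 5, 22, 23, 5), "tired"),
    (datetime(2006, 5, 23, 8, 5), "tired"),
    (datetime(2006, 5, 23, 23, 30), "tired"),
]

shares = hourly_mood_share(sample_posts, "tired")
for hour, share in shares.items():
    print(f"{hour:02d}:00  {share:.2f}")
```

Grouping by `timestamp.weekday()` instead of `timestamp.hour` would give the day-of-week view in the same way.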
                    Detecting Splogs

                    I went to a session on blogging this afternoon. One talk was by Tim Finin on detecting splogs. He is part of the ebiquity research group at UMBC. He and his students do some interesting work in recognizing splogs. Tim wrote a funny splog bait post to see where it would get picked up. Here's an interesting data point: the in-degree distribution of authentic blogs is described by a power law, but that of splogs is not. The same is true of the out-degree. Ping times for real blogs are periodic according to the sleep cycle of the blogger. Splogs ping on
                    I'm just getting ready to leave for WWW2006 in Edinburgh, Scotland. I'll be blogging interesting talks and events after I get there (sometime tomorrow). You can follow my coverage by looking at my www2006 tag or even subscribing to my www2006-specific RSS feed.
