                    Trust-Based Recommendation Systems

                    Reid Andersen from Microsoft Research is talking about trust-based recommendation systems (PDF). To build a personalized recommendation, you need a trust graph among users. What system should you use to determine the recommendation? The researchers use an axiomatic approach, which has its roots in social choice theory (see Arrow's impossibility theorem for voting systems from 1951). A more recent treatment applies axioms to Web page ranking systems (Altman and Tennenholtz, '05). The details are fairly complex, but the basic idea is to propose axioms until you get an inconsistency in the axiom set, then back off and explore other axioms.
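To make the idea of computing a personalized recommendation from a trust graph concrete, here is a minimal Python sketch of one simple scheme: a random walk that starts at the user, follows trust edges uniformly at random, and stops at the first user who has voted on the item. This is an illustrative assumption, not necessarily one of the systems defined in the paper; the input format (`trust`, `votes`) and the function name `recommend` are hypothetical.

```python
from typing import Dict, List


def recommend(trust: Dict[str, List[str]], votes: Dict[str, int],
              source: str, iterations: int = 200) -> int:
    """Return +1 (recommend), -1 (don't), or 0 (no opinion) for `source`.

    Hypothetical inputs: trust[u] lists the users that u trusts, and
    votes[v] is +1 or -1 for each user v who has voted on the item.
    The source's own vote, if any, is ignored.
    """
    nodes = set(trust) | set(votes) | {source}
    for neighbors in trust.values():
        nodes.update(neighbors)

    # score[u] approximates the expected vote reached by a walk that starts
    # at u, follows a uniformly random trust edge at each step, and stops at
    # the first voter. Iterating from all-zero scores converges toward it.
    score = {u: 0.0 for u in nodes}
    for _ in range(iterations):
        new_score = {}
        for u in nodes:
            if u in votes and u != source:
                new_score[u] = float(votes[u])  # voters absorb the walk
            elif trust.get(u):
                neighbors = trust[u]
                new_score[u] = sum(score[v] for v in neighbors) / len(neighbors)
            else:
                new_score[u] = 0.0  # dead end: no vote, nobody trusted
        score = new_score

    # Treat tiny residuals as ties so that balanced trust yields no opinion.
    if score[source] > 1e-9:
        return 1
    if score[source] < -1e-9:
        return -1
    return 0


# Hypothetical example: alice trusts bob and carol, and so on.
trust = {"alice": ["bob", "carol"], "bob": ["dave", "erin"], "carol": ["dave"]}
votes = {"dave": 1, "erin": -1}
print(recommend(trust, votes, "alice"))  # prints 1
```

Here bob's trusted users disagree, so his score is 0, while carol's walk always reaches dave's positive vote; alice averages the two and ends up with a positive recommendation.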
                    Exploring Beijing

                    (Photo: parking attendant.) I'm in Beijing for WWW2008, which starts tomorrow. I came out early (last Saturday) because I find conferences much more enjoyable when I'm not suffering from jet lag. I'm pretty well adjusted now and I'm looking forward to the talks tomorrow. In the meantime, I've taken some time to explore Beijing a bit. On Sunday I was quite tired and, other than going to church (a fun experience in Beijing), stuck close to the hotel. It was rainy both Sunday and Monday, so the weather wasn't suited to outdoor activities. Because of that, I decided that
                    WWW2007 Wrap-Up

                    Today I'm on my way home from Banff. The conference goes until Saturday, but with IIW starting Monday of next week and Sunday being Mother's Day, I didn't feel like I could hold out until the end. My feelings on WWW2007 are mixed. This is one of the few conferences I'm aware of in this space that mixes academic and commercial interests. I think that's a worthy goal. What's more, I attended many good presentations that led me to new lines of thought. That's the ultimate measure of a presentation or conference, I think. And yet, I was also
                    Marc Hadley on WADL: a RESTful API Description Language

                    Marc Hadley (from Sun Microsystems) is giving a talk called "Describing Web Applications - WADLing with Java." WADL (Web Application Description Language) is a description language for RESTful Web APIs. A WADL description comprises resource, method, request, and response descriptions. Marc gives an example using the Yahoo News Search API. Resources are specified relative to a base URI and can describe parameters that are common to all methods. Methods are the standard HTTP methods, and each can specify a request and a set of responses. Responses have representations that describe the type of the response. The language can also describe faults as responses. There are
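As a rough illustration of the resource/method/request/response structure, the following Python sketch parses a small, hypothetical WADL document with the standard library's `xml.etree.ElementTree` and lists each method with its full URI, parameter names, and response status codes. It assumes the 2009 WADL namespace (`http://wadl.dev.java.net/2009/02`); the example.com base URI, the parameter names, and the `summarize` function are made up for illustration, and nested resources are ignored.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed namespace: the 2009 WADL submission (earlier drafts used another URI).
NS = {"wadl": "http://wadl.dev.java.net/2009/02"}

# Hypothetical WADL description of a news-search resource.
WADL_DOC = """<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="http://example.com/NewsSearchService/V1/">
    <resource path="newsSearch">
      <param name="appid" style="query" required="true"/>
      <method name="GET">
        <request>
          <param name="query" style="query" required="true"/>
          <param name="results" style="query" default="10"/>
        </request>
        <response status="200">
          <representation mediaType="application/xml"/>
        </response>
        <!-- a fault, described as another response -->
        <response status="400">
          <representation mediaType="application/xml"/>
        </response>
      </method>
    </resource>
  </resources>
</application>"""


def summarize(doc: str) -> List[Tuple[str, str, List[str], List[str]]]:
    """List (HTTP method, full URI, parameter names, response statuses)."""
    root = ET.fromstring(doc)
    rows = []
    for resources in root.findall("wadl:resources", NS):
        base = resources.get("base", "")
        # Only top-level resources are handled; nested ones are ignored.
        for resource in resources.findall("wadl:resource", NS):
            uri = base + resource.get("path", "")
            # Resource-level params apply to every method on the resource.
            shared = [p.get("name") for p in resource.findall("wadl:param", NS)]
            for method in resource.findall("wadl:method", NS):
                own = [p.get("name")
                       for p in method.findall("wadl:request/wadl:param", NS)]
                statuses = [r.get("status")
                            for r in method.findall("wadl:response", NS)]
                rows.append((method.get("name"), uri, shared + own, statuses))
    return rows


for row in summarize(WADL_DOC):
    print(row)
```

The resource-level `appid` parameter shows up alongside the request parameters of `GET`, mirroring the idea that parameters declared on a resource are common to all of its methods.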
                    Theodore Bullock: HTTPerf is New and Improved

                    HTTPerf is a tool for measuring Web service performance. The problem is that it hadn't been updated since 2000, even though numerous bug reports had accumulated in the intervening seven years. Theodore Bullock, recently of the University of Calgary, reported on a project, carried out last year by a Software Engineering class, to fix the reported bugs and redo the build system, making it more portable. The result, version 0.9, is freely available. There are plugins that handle sessions and Web log playback, and others could be written. For example, I'd like to see a plugin that incorporates Rhino
                    Olivier Thereaux on the Unicorn Validator

                    I'm in a talk in the Developer's Track where Olivier Thereaux is discussing the Unicorn project, which is building a new generation of open-source Web content validation tools.
                    Hunting Down Spammers

                    The last talk reminds me that on my way into Canada, as I was passing through customs, the customs officer asked me my business. I said I was going to give a tutorial at a Web conference. Here's the conversation: Customs Officer: On what? Me: Digital identity. Customs Officer: What's that? Me: Ways to identify people on the Web. Customs Officer: Will it help with Spam? Me: Not directly. Customs Officer: Will you ask the people at the conference if there's any way we can hunt them [spammers] down and kill them? N.B. I think by "we" he meant
                    Understanding Splogs

                    Have you ever wondered exactly how splogging (spam blogging) works? What's the structure of that industry (and it is an industry)? Yi-Min Wang and Ming Ma (of Microsoft Research) and Yuan Niu and Hao Chen (of UC Davis) have studied the problem and found that there's a bottleneck in the economy of splogging at what they call the "aggregator level." This is the place to fight splogs. Here's the PDF version of the paper and here's a NY Times article on the results.
                    Finding Quality Blogs

                    This talk, entitled "Exploring in the Weblog Space by Detecting Informative and Affective Articles," by researchers from Shanghai Jiao Tong University (see full paper), describes the use of machine learning techniques to classify blogs and blog articles according to the amount of "informative" and "affective" information they contain. "Affective" here is a fancy word for "touchy-feely." The authors use various discrimination techniques and report which work best for their purposes. They propose that being able to find blogs and blog articles they classify as "informative" leads to information, usually by experts, and is the kind of
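The paper compares several discrimination techniques; as a generic illustration of this kind of text classification (not the authors' method), here is a minimal multinomial naive Bayes classifier in Python with add-one smoothing. The two labels and the tiny training set are hypothetical toy data, not drawn from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set = set()

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayes":
        for text, label in docs:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text: str) -> Optional[str]:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(label) + sum of smoothed log P(word | label)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, for illustration only.
train = [
    ("The new release fixes a memory leak in the parser and adds benchmarks", "informative"),
    ("This tutorial explains how the caching layer works with code examples", "informative"),
    ("I am so happy today, I love this sunny weather", "affective"),
    ("Feeling sad and tired, what a terrible week", "affective"),
]
model = NaiveBayes().fit(train)
print(model.predict("How the parser handles memory: benchmarks and code"))  # informative
print(model.predict("I love this, so happy"))  # affective
```

Naive Bayes is used here only because it is short and easy to follow; a real system would train on labeled blog articles and compare several classifiers, as the talk describes.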
                    Compact and Fast XML Processing

                    I went to a talk on a paper called "Querying and Maintaining a Compact XML Storage" by Raymond Wong, Franky Lam, and William Shui. Here's the abstract and here's the paper (PDF). The authors created a clever encoding of XML that not only takes much less storage but is also much faster. For example, the speaker showed data for a 100 MB XML document, compared against MS Vista's native XML libraries. The results are good enough that you could imagine doing this on a mobile phone.
                    Prabhakar Raghavan on Science for Engaging and Monetizing Audience

                    (Photo: Prabhakar Raghavan from Yahoo! Research.) Prabhakar Raghavan, the head of Yahoo! Research, is giving the morning keynote. The title of the talk is "What sciences will Web N.0 take?" But, more accurately, I'd call it "Science for Engaging and Monetizing Audience." Yahoo! takes in editorial, free (including blogs, Twitter, pictures, etc.), and commercial content. The audience "consumes" the content but also enriches it. Finally, the audience transacts (commerce) with the content. Yahoo! isn't the only one in this business. Google, AOL, MSN, and even NewsCorp are in the business of matching content to audience
                    User-Centric Identity Tutorial Resources

                    (Photo: Banff Springs resort.) I gave my tutorial on user-centric identity today. There were around 40 people there, a good crowd and very interested in identity. I promised that I'd post a list of resources, so here we go. First, my slides in PDF format. Warning: the upload from the hotel is going very slowly, so they probably won't be available until later tonight. Here's the tarball for the OpenID demonstration code. The demo adds authentication to a simple Web application using a separate, general-purpose login controller. There are pictures in the slides. It's written in Perl.
                    Off to Banff for WWW2007

                    I'm headed to Banff next week for WWW2007. If you're going to be there too, I arrive Monday afternoon and I'm looking for a group to go to dinner with on Monday night. Let me know. I'm doing a tutorial on user-centric identity on Tuesday morning. Not quite ready, but getting there. The demos are working and the slides are mostly done. Just need a little polish. In any event, I'll be writing about the conference throughout the week and tagging the coverage with www2007. If you're curious, here's what I wrote about www2006 last year in Edinburgh.
