XML.com: XML From the Inside Out

Article:
 Weblogs, Publish-Subscribe, and Web Collections: A REST Analysis
Subject: Some corrections to claims about PubSub
Date: 2004-12-02 00:08:12
From: Bob Wyman

Thanks for taking the time to study PubSub.com and provide your comments. I would, however, like to point out a few inaccuracies in your note:


1. You understate PubSub's throughput capacity by three orders of magnitude! We benchmark at 3 billion matches per second -- not the mere 2.4 million/second that you claim! If PubSub could only handle a few million matches per second, PubSub wouldn't be a very useful system. An Internet Scale system must be able to handle millions of complex subscriptions that are each being matched against at least hundreds of messages per second. Matching rates of billions per second are a *minimal* requirement for Internet Scale matching engines.
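To make the arithmetic behind that requirement concrete, here is a rough sketch. The subscription count and message rate are assumed round numbers chosen for illustration, not measured PubSub figures, and the function name is hypothetical.

```python
# Back-of-the-envelope match-rate estimate.
# Both inputs are illustrative assumptions, not PubSub.com measurements.
def required_match_rate(subscriptions: int, messages_per_second: int) -> int:
    """Matches/second if every message is tested against every subscription."""
    return subscriptions * messages_per_second


rate = required_match_rate(subscriptions=10_000_000, messages_per_second=300)
print(f"{rate:,} matches/second")  # → 3,000,000,000 matches/second
```

With these assumed inputs, ten million subscriptions and three hundred messages per second already land in the billions.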


2. In addition to providing notifications via "Atom over XMPP/Jabber", RSS and Atom files, PubSub also supports email notification for SEC Edgar alerts, press releases, and Airport Alerts. (We don't support email for Weblog or Newsgroup updates simply because volumes could too easily become excessive for users.)


3. PubSub does support "REST" notifications -- i.e. Atom entries which are POSTed using pure HTTP. However, we have found no interest in the community for using these things (primarily because they can't punch through firewalls) and thus have given up on publicizing our REST support. We continue to support those few people who have been using these REST alerts, but don't anticipate finding many new users. Note: We also implemented SOAP-based notification but couldn't find anyone interested in even testing it. Notifications are most valuable when they reach the desktop (not just servers), but the desktop is typically shielded by firewalls.
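For readers wondering what the receiving side of such a notification might look like, here is a minimal sketch using only the Python standard library. It is not PubSub's documented interface: the port, the handler and function names, and the assumption that each POST body is a single Atom entry document are all hypothetical.

```python
# Minimal receiver for Atom entries delivered by HTTP POST.
# Assumptions (not from PubSub.com documentation): the port, and that each
# POST body is a single Atom <entry> document.
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def entry_title(body: bytes) -> Optional[str]:
    """Return the text of the first <title> element, ignoring the namespace."""
    root = ET.fromstring(body)
    for elem in root.iter():
        # Namespaced tags look like "{namespace}title"; compare local names.
        if elem.tag.rsplit("}", 1)[-1] == "title":
            return (elem.text or "").strip()
    return None


class NotificationHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            print("notification:", entry_title(body))
            self.send_response(204)  # accepted, nothing to return
        except ET.ParseError:
            self.send_response(400)  # body was not well-formed XML
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block forever, handling one notification per POST."""
    HTTPServer(("", port), NotificationHandler).serve_forever()
```

Calling `serve()` starts the listener. Note that it has to accept inbound connections, which is exactly what a desktop behind a firewall usually cannot do.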


4. It is odd that you suggest that PubSub is not a "generalized" pubsub system. Actually, that is precisely what we have built. However, at this time, we only choose to expose the specific applications that we have built using our very generalized system. The money is in the applications -- not tools or technologies. We'll provide access to the general system later but need to focus on the applications for now.


5. Your suggestion that PubSub eliminate the "two step" subscription process of 1) specify subscription query and 2) receive result URI ignores a number of important elements of our service. First, doing what you suggest would require that no subscription could be more complex than what can be packed into a limited-size URI. This is a significant restriction. We support general boolean queries with potentially dozens of terms or predicates. A subscription can easily become much larger than what can be stored in a URI. Also, to rely on a user-provided URI would require that user-specific information be included in the URI. That would make it impossible for users to "share" the URIs for the subscriptions they generate.
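A quick sketch of the size problem, for illustration only. The query syntax, the example.com base URL, and the 2,000-character budget are assumptions made up for this example; none of them is PubSub's actual format or limit.

```python
# How quickly a boolean subscription outgrows a URI.
# Assumptions: the query syntax, the base URL, and the 2,000-character
# budget are illustrative; none is PubSub.com's actual format or limit.
from typing import List
from urllib.parse import urlencode

ASSUMED_URI_BUDGET = 2000


def subscription_uri(predicates: List[str]) -> str:
    """Pack an OR-of-predicates query into a single query-string parameter."""
    query = " OR ".join(f"({p})" for p in predicates)
    return "http://example.com/subscribe?" + urlencode({"q": query})


def fits_in_uri(predicates: List[str]) -> bool:
    """True if the encoded subscription stays within the assumed budget."""
    return len(subscription_uri(predicates)) <= ASSUMED_URI_BUDGET


small = ['title CONTAINS "xml"']
large = [
    f'source = "feed{i}.example.org" AND title CONTAINS "term {i}"'
    for i in range(48)  # four dozen predicates
]
print(len(subscription_uri(small)), fits_in_uri(small))
print(len(subscription_uri(large)), fits_in_uri(large))
```

The single-predicate query fits comfortably, while the four-dozen-predicate query exceeds the assumed budget once percent-encoding is applied.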


Thanks again for taking the time to review what we've done. If you have any more ideas on how we might improve our service, please let me know.


bob wyman
CTO, PubSub.com


  • Some corrections to claims about PubSub
    2004-12-02 22:07:45 Mike Dierken [Reply]

    1. I apologize for the incorrect metrics of pubsub.com matching rates - I thought I got it right off of your weblog at http://bobwyman.pubsub.com/main/2004/06/hyperbole_numbe.html. (Given 225B/day I still get 2.6M per second).
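The conversion behind that figure, as a small sketch (only the 225B/day input comes from the text above; the variable names are illustrative):

```python
# Convert a daily match count to an average per-second rate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

matches_per_day = 225_000_000_000  # the 225B/day figure quoted above
matches_per_second = matches_per_day / SECONDS_PER_DAY
print(f"{matches_per_second / 1e6:.1f}M matches/second")  # → 2.6M matches/second
```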


    3. You are right, I didn't realize pubsub.com supported HTTP POST. I would be interested in using this capability, since the mod-pubsub.org system has the ability to receive POSTs and to forward messages to clients and desktops. It would be interesting to see the two systems hooked up together.


    4. Sorry for giving the impression that I thought pubsub.com was not a generalized system - I only meant to describe the data sources currently supported. And I definitely agree that the money is in the applications rather than the technologies.


    5. My impression is that most queries are simple enough to express in a URI. More complex ones are edge cases that don't have to follow the one-step creation process. Also, I don't know what user-specific data would be placed in the query that would wind up in the URI, but it definitely would hinder sharing URIs.


    All in all, I think pubsub.com is a great system and I have the utmost respect for you and your team - keep up the good work of bringing publish/subscribe technologies to the Web.

    • Some corrections to claims about PubSub
      2004-12-03 01:25:09 Bob Wyman [Reply]

      Mike,
      1. Yes, you are right. The most we've ever needed to do in production is a few million matches per second. The 3 billion/second number is what we get in testing.


      3. re: mod-pubsub support. We used a slightly modified version of our REST support to feed messages to KnowNow LiveServers. I believe the mod-pubsub interfaces are very similar to KnowNow's. The KnowNow/mod-pubsub technology is very useful since it allows us to establish a light-weight, persistent, firewall-piercing connection to the desktop. Our focus is on the matching problem and we're pleased to be able to leverage existing solutions to actually do delivery of messages. Let's talk offline in email about working with mod-pubsub.


      5. It would take too long to explain in a comment; however, let me just say that a number of methods for computing "relevance" of matches rely on examining the history of messages that have been delivered to a particular subscription over time as well as on user feedback. Thus, even if two subscriptions have identical queries, they may deliver different results based on when they were created and the user's history of interaction with the results. If we don't provide a binding between a user and a subscription, we will be severely limited in our ability to implement a whole class of improved methods for determining the "relevance" of a matched item. This would not be a good thing. The "single step" solution that you propose works very well with retrospective searches (i.e. what Google, Feedster, etc. do) where the entire result set is available each time the query is re-evaluated. However, this solution is much less useful in a "prospective" system like the one we implement, since in such a system the result set accumulates over time and we can benefit from user interaction and hinting over time.
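A toy sketch of the retrospective/prospective distinction may help. The class and function names are hypothetical, and matching here is plain keyword containment, a stand-in rather than a description of PubSub's engine; relevance scoring and user feedback are not modeled at all.

```python
# Retrospective search vs. prospective matching, in miniature.
# Names are hypothetical; matching is simple keyword containment, a
# stand-in for a real matching engine.
from dataclasses import dataclass, field
from typing import List


def retrospective_search(corpus: List[str], keyword: str) -> List[str]:
    """Re-evaluate the query against everything stored so far."""
    return [msg for msg in corpus if keyword in msg.lower()]


@dataclass
class Subscription:
    keyword: str
    results: List[str] = field(default_factory=list)

    def offer(self, message: str) -> None:
        """Called once per arriving message; matches accumulate over time."""
        if self.keyword in message.lower():
            self.results.append(message)


stream = ["XML 1.1 released", "Atom draft updated", "New XML parser"]
early = Subscription("xml")
late = Subscription("xml")  # identical query, registered later

subscriptions = [early]
corpus: List[str] = []
for i, message in enumerate(stream):
    if i == 2:
        subscriptions.append(late)  # joins before the third message arrives
    corpus.append(message)
    for sub in subscriptions:
        sub.offer(message)

print(early.results)                        # two matches
print(late.results)                         # one match
print(retrospective_search(corpus, "xml"))  # two matches, every time
```

The two subscriptions carry the same query but hold different results because of when they were created, whereas the retrospective search returns the same full set whenever it is re-run.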


      bob wyman


Copyright © 2008 O'Reilly Media, Inc.