Thursday, July 21, 2011

What is wrong with PubSubHubBub.

I have been having a great set of Skype meetings with our team in Trento (Fabio, Marcos and Beatrice) about the Publish and Subscribe system. We are making excellent progress and have the system fleshed out already. However, from the beginning we have all been aware of some limitations of the usual publish and subscribe models for both RSS/ATOM and PubSubHubBub. We are working with Trento because of three limitations in particular: the size of feeds, the lack of security and the need to filter output. Today, in anticipation of our meeting, Beatrice posted her list of PubSubHubBub limitations, which I think is the best to date from anywhere. With her permission, I am posting it here. By the by, for these reasons, we are not using the Google Code PubSubHubBub implementation, but will be using Apache ActiveMQ.

PubSubHubbub limitations

by Beatrice Valeri, Trento University Computer Science, Italy

PubSubHubbub is a protocol for pushing updates of Atom feeds. It is not useful for the UCLA Push project because it is designed for very simple cases and does not cover many of the project's requirements.

1.   When a new subscriber arrives, he should also receive all messages written before his arrival. This is not completely supported by PubSubHubbub. If the subscriber subscribes before the feed is published, then, after the feed is published, the subscriber receives all messages written before publishing and all the following updates. Once the feed is published, new subscribers receive only the updates.
2.   There is no security on the hub. Anyone can subscribe and publish.
3.   Subscribers have no way to subscribe to only some of the messages from a feed. Filtering has to be done by the subscriber after messages are received.
4.   PubSubHubbub is not able to manage feeds that are already big at the moment of publishing. When a feed is published, the hub reads it completely and parses it. If the feed is too big, the hub is not able to parse it. Feeds have to be broken into pieces and each piece has to be published.
5.   The subscriber has to know the feed URL in order to subscribe to it. This is not a real pub-sub system, since the subscriber has to be aware of publishers and has to subscribe again if a new interesting feed is published.
6.   With PubSubHubbub, a feed can be published only if there is already a subscriber waiting for it. This is not what we want.
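
To make limitations 3 and 4 concrete, here is a minimal Python sketch of the two workarounds they imply: breaking a large feed into pieces before publishing, and filtering on the subscriber side after delivery. It is only an illustration. The entry layout (a dict with "id" and "tags") and the function names chunk_entries and filter_entries are assumptions made for this example, not part of PubSubHubbub or of the project's code.

```python
from typing import Callable, Iterator

# Hypothetical representation of one feed entry, e.g. {"id": 1, "tags": ["mask"]}.
Entry = dict


def chunk_entries(entries: list[Entry], size: int) -> Iterator[list[Entry]]:
    """Yield consecutive pieces of at most `size` entries.

    Workaround for limitation 4: each piece would be published separately
    so that the hub never has to parse one very large feed.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def filter_entries(entries: list[Entry],
                   wanted: Callable[[Entry], bool]) -> list[Entry]:
    """Keep only the entries the subscriber cares about.

    Limitation 3: the hub delivers everything, so this runs on the
    subscriber side after the messages have already been received.
    """
    return [entry for entry in entries if wanted(entry)]


if __name__ == "__main__":
    # Five made-up entries: even ids are tagged "basket", odd ids "mask".
    feed = [{"id": i, "tags": ["basket"] if i % 2 == 0 else ["mask"]}
            for i in range(5)]

    pieces = list(chunk_entries(feed, 2))
    print([len(piece) for piece in pieces])  # [2, 2, 1]

    baskets = filter_entries(feed, lambda e: "basket" in e["tags"])
    print([e["id"] for e in baskets])  # [0, 2, 4]
```

Both steps cost something that filtering or paging on the hub would avoid: the publisher has to do the splitting, and every subscriber still receives, and then discards, the messages it did not want.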

    Sunday, May 8, 2011

    Why we are not interested in Portals - 2

    Following on from my last post, I have worked up a model of a Portal implementation so we can now compare our model with that of a Portal. Of course, there is a lot more functionality in your average Portal than I have modeled, but the point remains, as this model fits the fundamental structure of the vast majority of Portals.

    If we look at the Portal model we see that all of the information flow is from the User to the Portal. There is no information moving back to the User, except what they are looking at on the Portal. The control of the presentation, organization and interface is with the Portal and stays with the Portal. Even the comments don't move back to the core Institutional database, but remain within the Portal. As the Portal is almost always under the control of the Institution anyway, this means that information not only moves from the Users to the Institution (no change there), but also the control of the way the information is managed, presented, accessed and ordered remains with the Institution as well (again, no change there).

    If we look at our model, there are three fundamental differences. First, the information that goes to the hub is not organized, presented or ordered there, but simply PuSHed from there to the Subscribers' servers. It is at the Subscribers' servers that the information is organized, presented and ordered many times over, and in the local context. There is also the key difference that comments are not simply attached to a Portal instance, but are available to return into the Institution as part of the object's primary record. Most fundamental of all, though, is that our model is reversible. Any Subscriber in our model can become a Publisher, and any Publisher a Subscriber. Knowledge developed through the local use of Institutional information can be PuSHed back to the Institution to enhance their documentation of the object. A Portal cannot be reversed as it is not a distribution system, but a broadcast system.
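
The reversibility described above can be sketched as a toy, in-memory model in Python. This is an illustration under stated assumptions, not the project's implementation: the Hub and Node classes, the topic names and the messages are all hypothetical, and a real deployment would use a message broker rather than direct method calls.

```python
from collections import defaultdict


class Hub:
    """Toy hub: it only forwards messages; it does not organize or present them."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of subscribed nodes

    def subscribe(self, topic, node):
        self._subscribers[topic].append(node)

    def publish(self, topic, message, sender):
        for node in self._subscribers[topic]:
            if node is not sender:  # do not echo a message back to its publisher
                node.receive(topic, message)


class Node:
    """A participant that can act as both Publisher and Subscriber."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.inbox = []  # received messages, to be organized locally

    def subscribe(self, topic):
        self.hub.subscribe(topic, self)

    def publish(self, topic, message):
        self.hub.publish(topic, message, sender=self)

    def receive(self, topic, message):
        self.inbox.append((topic, message))


if __name__ == "__main__":
    hub = Hub()
    institution = Node("institution", hub)
    community = Node("community", hub)

    # Forward direction: the institution publishes, the community subscribes.
    community.subscribe("object-records")
    institution.publish("object-records", "record for object 42")

    # Reverse direction: the same two nodes swap roles.
    institution.subscribe("object-comments")
    community.publish("object-comments", "local knowledge about object 42")

    print(community.inbox)
    print(institution.inbox)
```

In this sketch nothing distinguishes a Publisher from a Subscriber except which method a node happens to call. A Portal, by contrast, would have no equivalent of the second half of the example.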

    Sunday, May 1, 2011

    Why we are not interested in Portals.

    I thought we would lay down the gauntlet now, even though we are a bit of a way off demonstrating our work. I have uploaded two models for our systems. The first is a User Model, which is a demonstrative diagram for a general audience that shows what our system is intended to do. The second is an Object Model, using UML, which is more for developers. The two more or less depict the same system, though. I should emphasize, for those developers reading this, that the Object Model is not a full Class Model, but more of a "conceptual" Object Model.

    What I want to show, however, is not simply what we intend to do, but also why this is different from a Web Portal. In a recent discussion between the IT Officer for Anthropology at the American Museum of Natural History (New York), Jim and me, it became clear that what we are doing could easily be confused with a Portal. Or, worse in my mind, that it could be assumed that there would be little difference between what we are doing and a Portal. In fact, I think that what we are doing is fundamentally different from, and even opposed to, what Portals do. Here is why.

    If you will pardon me drawing a definition from Wikipedia, a Web Portal is "a web site that functions as a point of access to information on the World Wide Web. A portal presents information from diverse sources in a unified way." (Wikipedia, emphasis added). Wikipedia goes on to say that a Portal "provide[s] a way for enterprises to provide a consistent look and feel with access control and procedures for multiple applications and databases, which otherwise would have been different entities altogether." This is the key difference between what we are trying to do and what Web Portals are trying to do. Whereas a Web Portal takes a diverse set of resources, centralizes them and gives them a single "enterprise" identity, what we are trying to do is the opposite. We are trying to take a diverse set of resources and distribute them as filtered sets to diverse expert communities, so that these filtered sets of resources can be localized and used in completely different ways.

    The difference between a Web Portal and our approach is not simply superficial, but goes right down to our understanding of what Knowledge is. Where the assumption about knowledge in a Web Portal, and in most "knowledge systems," is that knowledge is an accumulated resource, a set of commodities that gain their power as knowledge through their packaging or their organizing, we accept a different, less colonial, view of knowledge. We see knowledge not as a set of prescriptively ordered and presented resources, but as a personal, local and community achievement. Knowledge, for us, is something you do, and do skillfully, not something you acquire, proffer or stockpile.

    So the difference may seem subtle, even trivial, but is in fact fundamental. A Web Portal seeks to share information resources between individuals and communities through a unified, prescribed and centralized system -- an enterprise system much like a museum or archive. However, what we are trying to do is to share information resources between individuals and communities by distributing those resources into the diverse local systems so that they can be directly used to build local knowledge. While a Web Portal is, by its very nature, a system that creates a unified identity for information, and information use, through its "enterprise" identity, our system seeks to fundamentally undermine this universalizing and commodifying approach to knowledge by radically replacing unity and centralization with diversity and localization.


    Thursday, March 10, 2011

    Cost-Share Admin Details

    I am posting a copy of the letter that Ramesh sent out to each of the project partners that describes the details of tracking each institution's cost-sharing commitments. Feel free to reference this for preparing upcoming cost-share letters. -KB

    "I am writing to clarify a few administrative details you should be aware of regarding tracking and documenting the cost sharing that each of you as partners committed to during the original submission of the federally funded IMLS grant for which I serve as the Principal Investigator at UCLA. First, let me say I appreciate your work and financial cost sharing, which has contributed to a successful completion of work performed during the first year of the project. Your partnership has been critical to this!

    It is important that your financial office track the dollar values for each type of cost share (people, purchases, etc.) since UCLA as a public institution of higher education is required to follow federal guidelines to substantiate cost share committed by each organization/institution. To achieve this goal, we will request at the end of each year a cost share report from each of you that accounts for how you met the total cost share commitment for your organization/institution with a brief description of how the cost related to the project. Please share this information with your respective financial offices.

    Sample ($10,000 cost share)
    - J. Smith (dbase manager) = $50,000 Annual Salary x 10% effort (or equivalent hours, if hourly) + $1,000 benefits = $6,000
    - Software Purchase for data analysis = $2,500
    - Materials and supplies for survey development/instruments = $1,500
    Total = $10,000

    The authorized organizational leader signs a letter containing the elements of the sample above specific to their cost share, certifying that the cost share commitment was met as described in the original proposal. Detailed payroll and/or non-payroll documentation is then maintained by the partner should any additional questions arise in the future. The documentation should be maintained by the partner for a period of five (5) years after the final end date of the grant. We appreciate your diligence and cooperation to document the cost share in the most straightforward manner with minimal time investment beyond what you already do to meet your own financial standards.

    Should you have any questions about the contents required for the cost share report, please contact Tracy Nguyen-Phan at (310) 825-4426.

    Ramesh Srinivasan"

    Saturday, January 15, 2011

    Interview with Jussi Parikka

    Hi all, I was recently interviewed by Jussi Parikka (media archaeologist and digital theorist). We talked about the past, present and future of archives. Might be of interest. You can find the interview here.