Thursday, July 21, 2011

What is wrong with PubSubHubbub.

I have been having a great set of Skype meetings with our team in Trento (Fabio, Marcos and Beatrice) about the Publish and Subscribe system. We are making excellent progress and have the system fleshed out already. However, from the beginning we have all been aware of some limitations of the usual publish and subscribe models, both RSS/Atom and PubSubHubbub. We are working with Trento because of three limitations in particular: the size of feeds, the lack of security, and the need to filter output. Today, in anticipation of our meeting, Beatrice posted her list of PubSubHubbub limitations, which I think is the best to date from anywhere. With her permission, I am posting it here. By the way, for these reasons we are not using the Google Code PubSubHubbub implementation, but will be using Apache ActiveMQ (http://activemq.apache.org/getting-started.html).




PubSubHubbub limitations

by Beatrice Valeri (xinecs87@gmail.com), Trento University Computer Science, Italy

PubSubHubbub is a protocol for pushing updates of Atom feeds. It is not suitable for the UCLA Push project because it was designed for very simple use cases and does not cover many of our requirements.

1.   When a new subscriber arrives, he should also receive all messages written before his arrival. This is not completely supported by PubSubHubbub. If the subscriber subscribes before the feed is published, then, once the feed is published, he receives all messages written before publishing as well as all subsequent updates. Once the feed has been published, however, new subscribers receive only the updates.
2.   There is no security on the hub. Anyone can subscribe and publish.
3.   Subscribers have no way to subscribe to only some of the messages from a feed. Filtering has to be done by the subscriber after the messages are received.
4.   PubSubHubbub is not able to manage feeds that are already big at the moment of publishing. When a feed is published, the hub reads it completely and parses it. If the feed is too big, the hub is not able to parse it. Such feeds have to be broken into pieces, and each piece has to be published separately.
5.   The subscriber has to know the feed URL in order to subscribe to it. This is not a real pub-sub system, since the subscriber has to be aware of the publishers and has to subscribe again whenever a new interesting feed is published.
6.   With PubSubHubbub, a feed can be published only if there is already a subscriber waiting for it. This is not what we want.
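
To make items 3 and 4 concrete, here is a minimal Python sketch of the two workarounds they describe: breaking a large feed into pieces before publishing, and filtering on the subscriber side after delivery. It is only an illustration under stated assumptions. The entry shape (a dict with "id", "title" and "category" keys), the function names `split_feed` and `filter_entries`, and the piece size are all hypothetical and are not part of PubSubHubbub or of our system.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical entry shape used only for this sketch.
Entry = Dict[str, str]


def split_feed(entries: List[Entry], piece_size: int) -> Iterator[List[Entry]]:
    """Yield consecutive pieces of at most piece_size entries (item 4)."""
    if piece_size < 1:
        raise ValueError("piece_size must be at least 1")
    for start in range(0, len(entries), piece_size):
        yield entries[start:start + piece_size]


def filter_entries(entries: List[Entry],
                   wanted: Callable[[Entry], bool]) -> List[Entry]:
    """Keep only the entries the subscriber cares about (item 3).

    This runs on the subscriber side, after every entry has already
    been delivered, which is exactly the cost item 3 complains about.
    """
    return [entry for entry in entries if wanted(entry)]


# Seven made-up entries: odd ids are "news", even ids are "sports".
feed = [
    {"id": str(i), "title": f"Entry {i}",
     "category": "news" if i % 2 else "sports"}
    for i in range(1, 8)
]

pieces = list(split_feed(feed, 3))  # each piece would be published separately
news_only = filter_entries(feed, lambda e: e["category"] == "news")
```

In this sketch the publisher would publish each element of `pieces` on its own, while the subscriber still receives every entry and discards the ones it does not want. A system that filters at the hub would avoid sending the unwanted entries in the first place.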