I first tried Diaspora a few years ago by joining a pod with lots of users on it, and loved that right after creating my account I had posts appear in my feed that matched the tags I was interested in. I finally returned to D* a couple weeks ago to help with development, and was disappointed after setting up my own pod how lonely it feels without public posts from the rest of the network being pushed to me! So I’m glad to see so many here who agree that federation of public posts is a very important feature for D*.
Doing this right is not trivial, but I think a DHT-based (Distributed Hash Table) solution might be the right fit. I’m not an expert on the various flavors of DHT out there, but after doing some research it looks like Pastry might be a good choice. In particular, there is already a publish/subscribe application called Scribe designed for it, and an open source implementation called FreePastry. In a nutshell, the Pastry+Scribe combination provides O(log(n)) average routing hops between nodes, high tolerance of nodes entering/leaving the network, automatic load balancing of topic subscription management and notification multicasts across the network, and the ability to structure the routes between nodes in a way that minimizes overall latency/bandwidth (or other relevant metric.) The idea would be that every D* pod would run a node in the DHT network, which would allow the overhead associated with managing subscriptions and disseminating public posts to subscribers to be automatically shared among all the network’s nodes.
I am going to run some simulations using FreePastry+Scribe to verify this approach for a “hashtag subscription” feature for D*, but before digging in too deeply I have a few questions:
FreePastry is written in Java 5 and its architecture takes advantage of Java threads and asynchronous IO. It might not be a trivial exercise to port this to Ruby+Rails, and I think in any case it would be best to keep any new DHT component cleanly decoupled from the main D* application. What is the development team’s stance towards adding a JVM instance (OpenJDK 6/7) as a new tier to the pod design? I think it would complicate pod setup and configuration a little, but probably not too much.
FreePastry’s implementation uses its own TCP connections for messaging, and UDP for keep-alives. Its architecture is very modular, and so it’s probably possible to proxy all its communication through D*'s existing https-based communication scheme if absolutely necessary. But in the interests of performance and clean design, the much better approach is probably to let the Pastry tier handle its own P2P network communications, and let it communicate through a web services API locally with the Ruby+Rails tier and/or directly with the local database for everything else. In terms of security, https isn’t needed, since we’re dealing with public posts; all that needs to be done is make sure that the payloads carried by Pastry are cryptographically signed. The downside to the separate P2P communication is that it would add to the firewall setup requirements for a pod (although FreePastry already has the ability to use uPnP to open its own ports to the internet, where supported.) How does the development team feel about the idea of requiring additional ports to be opened between pods and the internet?
Are there any parts of the database that are designed to be usable as an interface, i.e. not meant to be controlled and accessed exclusively by the Ruby+Rails and Sidekiq tiers? (For example, is it “legal” to write posts directly to the database without going through the Ruby+Rails app?)
With a DHT-based P2P network to leverage, other useful functions could eventually be added in a scalable way to D*, for example (a) load-balancing of requests for relatively large content like images, so that for example an image in a post from a tiny pod that gets wide distribution in the D* network doesn’t result in that pod being swamped with requests for the image from the entire network, (b) network-wide features like user search/discovery, © helping other D* functions to scale as the network grows, such as propagation of public posts from originators to followers.
Thanks for your time, I appreciate any feedback or advice you may have!