Public post federation

macieklozinski · March 17, 2014, 12:00am

Can you tell us more about the search-based approach? When would the search be performed, and how often? By whom, and on which servers? What exactly would be searched for?

rasmusfuhse · March 17, 2014, 12:00am

There is more than one possible search approach. Jason, for example, would like central search server(s) like the ones Friendica or Red Matrix have. But it would also be possible to have a search endpoint on each pod that “neighbor” pods could call periodically, or only when a user requests a search. Those details are not the question at this early stage, I think. The big question is still what we want: pushing the news or pulling the info?
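To make the "pull" option a little more concrete, here is a minimal in-memory sketch in which a pod answers a hashtag query from its own posts and then fans the query out to the neighbor pods it knows about, up to a hop limit. The `Pod` class, its `search` method and the `max_hops` parameter are hypothetical; a real pod would expose this as an HTTP search endpoint, which is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical pod: its own public posts plus the neighbor pods it knows."""
    name: str
    posts: list[tuple[str, str]] = field(default_factory=list)  # (hashtag, text)
    neighbors: list["Pod"] = field(default_factory=list)

    def local_search(self, tag: str) -> list[str]:
        # Answer only from posts stored on this pod.
        return [text for t, text in self.posts if t == tag]

    def search(self, tag: str, max_hops: int = 1) -> list[str]:
        """Pull matching posts from this pod and from pods up to max_hops away."""
        seen = {self.name}
        frontier = [self]
        results = self.local_search(tag)
        for _ in range(max_hops):
            next_frontier = []
            for pod in frontier:
                for nb in pod.neighbors:
                    if nb.name in seen:  # never query the same pod twice
                        continue
                    seen.add(nb.name)
                    results.extend(nb.local_search(tag))
                    next_frontier.append(nb)
            frontier = next_frontier
        return results


a = Pod("a", posts=[("#diaspora", "hello from a")])
b = Pod("b", posts=[("#diaspora", "hello from b")])
c = Pod("c", posts=[("#diaspora", "hello from c")])
a.neighbors = [b]
b.neighbors = [c]
print(a.search("#diaspora", max_hops=1))  # ['hello from a', 'hello from b']
print(a.search("#diaspora", max_hops=2))  # ['hello from a', 'hello from b', 'hello from c']
```

The trade-off shows up directly: what a user sees depends on how far a query may travel, and every extra hop multiplies the number of pods contacted per search.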

macieklozinski · March 17, 2014, 12:00am

Rather than wondering and debating, maybe it would be better to try one way and, if it doesn’t work out, try another. There are quite a few ideas for developers to choose from. Maybe we should let them decide what is easier/faster to implement?

jasonrobinson · March 17, 2014, 12:00am

Interesting proposal @macieklozinski - not a bad concept imho. Would love to hear from the devs with more federation experience.

Although imho I still think federating public posts should be handled outside the pod software itself. Podmins are already complaining about heavy Sidekiq processes - keeping public post federation in the core code would be a big burden on all pods.

Would need to do a simulation to really calculate it.

But I agree that we should just do something. Any sane implementation would be cool.
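A first back-of-the-envelope version of that calculation could compare naive broadcast, where every public post is pushed to every other pod, with subscription-based delivery, where a post only goes to pods that have at least one local subscriber for it. The function names and the inputs below (posts per day, pod count, share of interested pods) are made-up assumptions for illustration, not measurements of the real network.

```python
def broadcast_deliveries(posts_per_day: int, pods: int) -> int:
    """Naive push: each public post is sent to every other pod."""
    return posts_per_day * (pods - 1)


def subscribed_deliveries(posts_per_day: int, pods: int, interested_fraction: float) -> int:
    """Push only to pods with a local subscriber for the post.

    interested_fraction is an assumed average share of pods that want a given post.
    """
    return round(posts_per_day * (pods - 1) * interested_fraction)


# Hypothetical inputs: 10,000 public posts a day, 300 pods, 5% average interest.
print(broadcast_deliveries(10_000, 300))         # 2990000
print(subscribed_deliveries(10_000, 300, 0.05))  # 149500
```

Broadcast grows with the product of posts and pods, so it gets worse on both axes as the network grows; a real simulation would also need the distribution of interest, which is far from uniform.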

markwilliams2 · July 5, 2014, 12:00am

I first tried Diaspora a few years ago by joining a pod with lots of users on it, and loved that right after creating my account, posts matching the tags I was interested in appeared in my feed. I finally returned to D* a couple of weeks ago to help with development, and after setting up my own pod I was disappointed by how lonely it feels without public posts from the rest of the network being pushed to me! So I’m glad to see so many here who agree that federation of public posts is a very important feature for D*.

Doing this right is not trivial, but I think a DHT-based (Distributed Hash Table) solution might be the right fit. I’m not an expert on the various flavors of DHT out there, but after doing some research it looks like Pastry might be a good choice. In particular, there is already a publish/subscribe application called Scribe designed for it, and an open source implementation called FreePastry. In a nutshell, the Pastry+Scribe combination provides O(log(n)) average routing hops between nodes, high tolerance of nodes entering/leaving the network, automatic load balancing of topic subscription management and notification multicasts across the network, and the ability to structure the routes between nodes in a way that minimizes overall latency/bandwidth (or other relevant metric.) The idea would be that every D* pod would run a node in the DHT network, which would allow the overhead associated with managing subscriptions and disseminating public posts to subscribers to be automatically shared among all the network’s nodes.
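In Scribe, a topic is identified by a key obtained by hashing the topic name, and the node whose ID is numerically closest to that key acts as the root of the topic's multicast tree. The sketch below shows only that mapping for a hashtag topic. It assumes 160-bit SHA-1 IDs (FreePastry's ID size); the pod host names and the helper functions are invented for illustration. Real Pastry reaches the root in O(log n) hops through prefix-matching routing tables rather than by scanning a complete node list as done here.

```python
import hashlib

ID_BITS = 160
RING = 1 << ID_BITS  # size of the circular ID space


def make_id(name: str) -> int:
    """Hash a name (pod host or topic) into the 160-bit circular ID space."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    """Numerical distance between two IDs on the circular ID space."""
    d = abs(a - b)
    return min(d, RING - d)


def topic_root(topic: str, node_ids: dict[str, int]) -> str:
    """Return the node whose ID is numerically closest to the topic key."""
    key = make_id(topic)
    return min(node_ids, key=lambda n: ring_distance(node_ids[n], key))


pods = {host: make_id(host) for host in ["pod-a.example", "pod-b.example", "pod-c.example"]}
root = topic_root("#linux", pods)
print(f"#linux subscriptions are managed by {root}")
```

Because every pod computes the same hash, any pod can work out where the subscription list for a tag lives without asking a central server, and different tags spread across different roots, which is where the automatic load sharing comes from.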

I am going to run some simulations using FreePastry+Scribe to verify this approach for a “hashtag subscription” feature for D*, but before digging in too deeply I have a few questions:

FreePastry is written in Java 5 and its architecture takes advantage of Java threads and asynchronous IO. It might not be a trivial exercise to port this to Ruby+Rails, and I think in any case it would be best to keep any new DHT component cleanly decoupled from the main D* application. What is the development team’s stance towards adding a JVM instance (OpenJDK 6/7) as a new tier to the pod design? I think it would complicate pod setup and configuration a little, but probably not too much.
FreePastry’s implementation uses its own TCP connections for messaging, and UDP for keep-alives. Its architecture is very modular, and so it’s probably possible to proxy all its communication through D*'s existing https-based communication scheme if absolutely necessary. But in the interests of performance and clean design, the much better approach is probably to let the Pastry tier handle its own P2P network communications, and let it communicate through a web services API locally with the Ruby+Rails tier and/or directly with the local database for everything else. In terms of security, https isn’t needed, since we’re dealing with public posts; all that needs to be done is make sure that the payloads carried by Pastry are cryptographically signed. The downside to the separate P2P communication is that it would add to the firewall setup requirements for a pod (although FreePastry already has the ability to use UPnP to open its own ports to the internet, where supported). How does the development team feel about the idea of requiring additional ports to be opened between pods and the internet?
Are there any parts of the database that are designed to be usable as an interface, i.e. not meant to be controlled and accessed exclusively by the Ruby+Rails and Sidekiq tiers? (For example, is it “legal” to write posts directly to the database without going through the Ruby+Rails app?)
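On the point that signed payloads are enough for public posts: a signature can be checked by any pod holding the author's public key, whatever channel the payload travelled over. Since diaspora* users already have RSA key pairs, the toy sketch below shows the hash-then-sign flow with textbook RSA, using two well-known Mersenne primes as the key. It is deliberately insecure (tiny, public primes and no padding) and only illustrates sign/verify; a real implementation would use a vetted crypto library with proper RSA padding. The payload format is invented.

```python
import hashlib

# Toy key material: 2**89 - 1 and 2**107 - 1 are known Mersenne primes.
# Never use public or small primes like these for real keys.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, needs Python 3.8+


def digest(payload: bytes) -> int:
    """Hash the payload and reduce it into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def sign(payload: bytes) -> int:
    """Textbook RSA signature over the payload hash (no padding: toy only)."""
    return pow(digest(payload), D, N)


def verify(payload: bytes, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(payload)


post = b'{"author": "alice@pod-a.example", "text": "Hello #diaspora"}'
sig = sign(post)
print(verify(post, sig))                 # True
print(verify(post + b" tampered", sig))  # False
```

The relaying node never needs the private key, so intermediate DHT nodes can forward posts without being able to alter them undetected.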

With a DHT-based P2P network to leverage, other useful functions could eventually be added in a scalable way to D*, for example (a) load-balancing of requests for relatively large content like images, so that for example an image in a post from a tiny pod that gets wide distribution in the D* network doesn’t result in that pod being swamped with requests for the image from the entire network, (b) network-wide features like user search/discovery, (c) helping other D* functions to scale as the network grows, such as propagation of public posts from originators to followers.

Thanks for your time, I appreciate any feedback or advice you may have!

melroyvandenberg · August 6, 2014, 12:00am

@markwilliams2 I love the idea, and I’m also researching the problem and possible solutions. See point #2 on my list: https://wiki.diasporafoundation.org/User:Danger89

I hope that we can get in contact with each other to discuss this further and finally try to implement a working prototype.

melroyvandenberg · August 7, 2014, 12:00am

Let’s put it like this: I think that to make Diaspora a good decentralized social network, the relational database should be removed and replaced by an Apache Cassandra database (for example), or at least by a database that is horizontally scalable with high availability & reliability.

This is also known as a ‘NoSQL database environment’. In fact, this means… that the current project as it stands would have to be rewritten almost entirely (!) to compete against existing social networks like Facebook, Google+, Twitter, etc.

So… good luck.

jasonrobinson · August 7, 2014, 12:00am

@melroyvandenberg I think you will find little support for a complete rewrite - unless you do it yourself. You can always fork and replace the DB.

diaspora* started with MongoDB which didn’t work for some reason. Do you mind explaining in more detail why you think a NoSQL database would be better than a relational database, for diaspora*?

melroyvandenberg · August 7, 2014, 12:00am

@jasonrobinson I’m trying to dive deeper into the distributed hash table (DHT), which makes it possible to search users within the network (regardless of the pod). The same would also work for public messages, hashtags, etc.

A non-relational database using hashing (key-value) will make this possible. That is the current problem of Diaspora, the decentralized network isn’t really connected, a pod floats in the Internet currently.
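The key-value hashing idea behind Cassandra is consistent hashing: nodes and keys are placed on the same hash ring, and each key is stored on the first node at or after its position, wrapping around. Below is a minimal sketch of that idea applied to user handles. The pod names, the `HashRing` class and the choice of MD5 are assumptions for illustration; replication, virtual nodes and failure handling are left out.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 used here for simplicity)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Minimal consistent-hash ring: no replication, no virtual nodes."""

    def __init__(self, nodes: list[str]):
        self._ring = sorted((token(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (token(node), node))

    def owner(self, key: str) -> str:
        """First node whose token is >= the key's token, wrapping around."""
        tokens = [t for t, _ in self._ring]
        i = bisect.bisect_left(tokens, token(key))
        return self._ring[i % len(self._ring)][1]


handles = [f"user{i}@example" for i in range(1000)]
ring = HashRing(["pod-a", "pod-b", "pod-c"])
before = {h: ring.owner(h) for h in handles}
ring.add("pod-d")
after = {h: ring.owner(h) for h in handles}
moved = [h for h in handles if before[h] != after[h]]
print(all(after[h] == "pod-d" for h in moved))  # True: only pod-d gains keys
```

The useful property is that adding a node only moves the keys between the new node and its predecessor; every other handle stays where it was.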

melroyvandenberg · August 7, 2014, 12:00am

Maybe this site gives you a better explanation of the implementation details of the idea of DHT (funny sentence):
http://www.rackspace.com/blog/cassandra-by-example/

jasonrobinson · August 7, 2014, 12:00am

That is the current problem of Diaspora, the decentralized network isn’t really connected, a pod floats in the Internet currently.

I think that is the whole point that pods “float” in the internet. I’m quite sure the current model isn’t something that would be powerful and scalable enough to take on a network like Facebook - but the diaspora* server isn’t really something that is supposed to do that IMHO. It’s just server software - and there is no requirement to connect to the wider network of diaspora pods.

To make it really big, the servers would have to be just nodes that people run and that automatically enhance the network. Today things are different: each pod is very independent, with absolutely no constraints placed on how to run it or even on its configuration.

The diaspora* network really only federates on the protocol level. What uses the protocol doesn’t matter. There is already Friendica (made in PHP) that talks the diaspora* protocol. There is also a Python version (Pyaspora) that also talks diaspora*.

goob · August 7, 2014, 12:00am

diaspora* started with MongoDB which didn’t work for some reason.

Sarah Mei (a former Diaspora developer) wrote this article about why MongoDB didn’t work.

makes it possible to search users within the network (regardless of the pod)

It is already possible to search for users on other pods. Melroy, the problems you’re encountering (including some of those on your to-do list) may be because you’ve set up a new pod for yourself very recently. One of the software’s problems is the case when a new pod connects to the network for the first time - at first it doesn’t have established connections with other pods, so things such as search and following #tags return no results. This is a real problem, and it is something that could usefully be tackled.

It may be that changing the database isn’t the answer to your problem: simply running your pod for a while, making connections with other pods, will bring the results you’re looking for.

melroyvandenberg · August 7, 2014, 12:00am

@goob fair enough, however… I have 2 registered users who don’t do anything… nothing happens with the system; it will not ‘connect with other pods’, meaning it will not share information with other pods, including with me.

That is the problem, the base of a social system should be sharing. That is where DHTs kick in; however, this requires a whole different way of thinking.

jasonrobinson · August 7, 2014, 12:00am

@melroyvandenberg what is your d* handle? Or add me at jaywink@iliketoast.net

Yeah, as @goob said, this is a huge problem. We need either some addition to the protocol to support network-wide searching (imho hackish and a large burden on the network) - OR a central hub that would be queried (with opt-in publishing of handles there).

Unfortunately the opposition to any “central helpers” is kinda strong here - maybe that sentiment will change.

jasonrobinson · August 7, 2014, 12:00am

Public post federation (pushing) around the network is another matter - a few proposals have been made to tackle that, here and here at least - but there is no implementation yet.

melroyvandenberg · August 7, 2014, 12:00am

More info on wiki:
http://en.wikipedia.org/wiki/Distributed_hash_table

goob · August 7, 2014, 12:00am

That is the problem, the base of a social system should be sharing.

The basis of Diaspora is sharing. Every pod in the network shares data (where appropriate) with every other pod. The problem you are experiencing - and it is a major problem, which needs solving - is how to get your pod to start connecting and sharing data with enough other pods to receive all relevant content.

I don’t think it’s a problem with the type of database being used - it’s a problem of your pod, new to the network, knowing what other pods are part of the network and how to find them in order to be able to access their databases.
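One way to attack that discovery problem, sketched here purely as an illustration, is for a new pod to start from a single known seed pod and repeatedly ask the pods it knows for their own lists of known pods, merging the answers breadth-first. The `known` mapping is an invented in-memory stand-in for what would be requests between pods, and `discover` is a hypothetical helper; nothing here claims that such an endpoint exists in diaspora*.

```python
from collections import deque

# Hypothetical: which other pods each pod already knows about.
known = {
    "seed.example": {"pod-a.example", "pod-b.example"},
    "pod-a.example": {"seed.example", "pod-c.example"},
    "pod-b.example": {"pod-d.example"},
    "pod-c.example": set(),
    "pod-d.example": {"pod-a.example"},
}


def discover(seed: str, peers_of: dict[str, set[str]], limit: int = 100) -> set[str]:
    """Breadth-first crawl of pod lists, starting from one seed pod.

    limit is a soft cap: crawling stops once about that many pods are known.
    """
    found = {seed}
    queue = deque([seed])
    while queue and len(found) < limit:
        pod = queue.popleft()
        for peer in peers_of.get(pod, set()):
            if peer not in found:
                found.add(peer)
                queue.append(peer)
    return found


print(sorted(discover("seed.example", known)))
# ['pod-a.example', 'pod-b.example', 'pod-c.example', 'pod-d.example', 'seed.example']
```

A crawl like this only needs one well-known starting point, which is also how DHT bootstrapping typically works; the open question in this thread is who runs and trusts that starting point.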

goob · August 7, 2014, 12:00am

Have a look at our tutorial series on getting started, which will tell you how the network should work, once your pod is connected to other pods.

I made a proposal to help in certain situations, one being when a new pod is added to the network, here. Apparently some of my proposals would not be scalable as the network grows in size, but there are various ideas knocking around related to this problem of the experience of new pods. If you can help solve this problem, that would be fantastic.

melroyvandenberg · August 7, 2014, 12:00am

I still think the solution could be a DHT; please read the BitTorrent DHT spec about nodes / node IDs and routing tables:
http://www.bittorrent.org/beps/bep_0005.html
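The core of the BitTorrent DHT (Kademlia) is its distance metric: the distance between two 160-bit IDs is their XOR, a node keeps nearby nodes in buckets of up to K = 8 entries, and lookups converge by repeatedly asking the closest known nodes. The sketch below shows only the metric, the classic Kademlia bucket index (bucket i holds nodes at distance in [2**i, 2**(i + 1))) and a "k closest" selection. Deriving IDs by hashing pod names is an assumption for convenience; BEP 5 nodes pick random IDs, and the helper names are hypothetical.

```python
import hashlib


def node_id(name: str) -> int:
    """160-bit ID derived from a name (BEP 5 nodes use random IDs instead)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of the two IDs, read as an integer."""
    return a ^ b


def bucket_index(own: int, other: int) -> int:
    """Bucket i holds nodes whose XOR distance lies in [2**i, 2**(i + 1))."""
    d = xor_distance(own, other)
    if d == 0:
        raise ValueError("a node does not store itself")
    return d.bit_length() - 1


def closest(target: int, candidates: list[int], k: int = 8) -> list[int]:
    """The k known nodes closest to target by XOR distance (K = 8 in BEP 5)."""
    return sorted(candidates, key=lambda n: xor_distance(n, target))[:k]


me = node_id("pod-a.example")
peers = [node_id(f"pod-{i}.example") for i in range(20)]
target = node_id("alice@pod-x.example")
nearest = closest(target, peers)
print(len(nearest))  # 8
print(sorted({bucket_index(me, p) for p in peers if p != me}))
```

Each lookup step moves at least one bit closer to the target, which is why a lookup needs only about log2(n) rounds even without any central index.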

And read ‘5.4 Bootstrapping’ of the Facebook Cassandra PDF:

EDIT, even Patrick McFadin says it: