Public post federation

jasonrobinson · November 16, 2013, 12:00am

Great @bradkoehn ! And @goob all ideas are welcome - they all add to our ability to build the best solution.

There are many similarities between my idea and Brad’s - especially the part that relates to a central hub to store pod information. I really really am sure we need this, for many reasons. I’ll put something in the wiki and separate this into its own place, since it is kinda separate though required by the public post federation/aggregation. Once we have a spec we’ll just need to vote, since I know some community members don’t like this idea even if it is totally opt-in.

As for the idea from Brad, I’m quite sure it would do the job and would be happy if either idea was implemented. Initially I was questioning the idea of pull instead of push, but I guess pubsubhubbub takes care of that problem (even though we would then rely on external hub services; the default one Diaspora uses is Google’s).

I do think though that my relay server idea is lighter because there is no need to save posts. It also handles redundancy - pods are not tied to any particular aggregator and thus even if all but one of them are down the post will be delivered to all listeners.

Security I guess in both would be the same. Except I see some worry that an aggregator could be populated with non-authentic posts, and even if no pods accept the posts, some other source might do. Since the aggregator would have an open interface, it wouldn’t take long for someone to build an app to show posts in the diaspora network going through the aggregator. In this situation it would be trivial to inject posts into the aggregator, unless the aggregator checks all of them. In the relay idea this is not a risk since the aggregator doesn’t store posts.

Any other opinions on these ideas?

flaburgan · November 18, 2013, 12:00am

I don’t know enough to talk about the technical side, but I do know one thing: I’m strongly opposed to anything which would involve Google services (even if it’s public data). We saw how they turned off Google Reader, or suddenly decided to make the Google Maps API a paid service. I don’t want to depend on a company for a feature as critical as this one.

macieklozinski · November 19, 2013, 12:00am

There are many P2P networks and routing protocols out there. In my opinion we should go down this path.
What if every pod was a relay for its own users’ followed tags?

loelo · November 20, 2013, 12:00am

I’d prefer the distributed path, even if some seeds/pods would lose a bit of non-federated info. (I’m not sure I understand how it would work out, though.) @macieklozinski: could you explain more precisely how it would work for tag federation?

macieklozinski · November 20, 2013, 12:00am

I’ll try to do some deeper research on possible p2p solutions.

macieklozinski · November 20, 2013, 12:00am

I’m not sure if it fits well with Diaspora’s protocols, but I could suggest something like this:

When user A shares with user B on another pod, user B’s pod becomes a “neighbor” of user A’s pod.
User B’s pod “subscribes” to user A’s pod for all tags that user B’s pod’s users follow.
Each pod keeps a list of its neighbors and the tags they subscribe to.
When a user on a certain pod makes a public post, it’s sent to all neighbors subscribed to tags present in this post.
If a pod receives a public post from another pod and does not have this post in its database, it passes it to all neighbors subscribing to tags present in this post, and saves the post to its database.
If a received post is already present in the database, nothing happens.
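
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Diaspora code: the `Pod` class, its method names, and the in-memory dict standing in for the database are all hypothetical. The one deliberate deviation is that the post is saved before it is forwarded, so that a cycle of neighbors cannot bounce it back and forth.

```python
from collections import defaultdict


class Pod:
    """Hypothetical pod that forwards public posts to tag-subscribed neighbors."""

    def __init__(self, name, followed_tags=()):
        self.name = name
        self.followed_tags = set(followed_tags)  # tags this pod's users follow
        self.subscriptions = defaultdict(set)    # neighbor Pod -> tags it subscribed to
        self.posts = {}                          # post id -> tags (stands in for the database)

    def add_neighbor(self, other):
        """A user here shares with a user on `other`: `other` subscribes to us."""
        self.subscriptions[other] |= other.followed_tags

    def publish(self, post_id, tags):
        """A local user makes a public post."""
        self.receive(post_id, set(tags))

    def receive(self, post_id, tags):
        if post_id in self.posts:
            return  # already in the database: nothing happens
        self.posts[post_id] = tags  # save first so cycles terminate
        for neighbor, wanted in self.subscriptions.items():
            if wanted & tags:
                neighbor.receive(post_id, tags)


a = Pod("a.example")
b = Pod("b.example", {"linux"})
c = Pod("c.example", {"linux", "art"})
a.add_neighbor(b)
b.add_neighbor(c)
a.publish("post-1", {"linux"})  # travels a -> b -> c
a.publish("post-2", {"art"})    # b does not follow #art, so it stops at a
```

In this sketch a post only travels along chains of pods that each follow at least one of its tags, so pod c never sees post-2 even though it follows #art.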

goob · November 28, 2013, 12:00am

I think there’s a big difference between having a central hub which contains information pertaining to the D* network but which is separate from the network itself (such as the project website, poduptime, etc), and a central hub which is an integral part of the D* network and receives/sends/stores data from that network, such as post data, which is what is being proposed here. With a central hub as an integral part of the network, the network would no longer be fully distributed.

If a central hub of any sort is actually needed in order for post/tag federation to work properly, I suggest it be restricted to holding meta-data, such as a list of pods or relay servers and their IP addresses. This could be the same central hub which helps people to choose a pod to register at, as poduptime does at the moment.

It would then only be referred to when a new pod or relay server was brought online. The new pod would then call hub.diasporafoundation.org (for example), which would give it some pods/relay servers to contact from which it could pull post data. The actual transmitting of post data would be done by the pods/relay servers themselves, with no involvement from the central hub.

This is similar to one of the proposals I made in this discussion on adding pull to Diaspora’s push model (the proposal concerning tags).

I’m not sure relay servers separate from the pods themselves would be needed; I think there is a way of making pods federate public data more effectively without using a separate network of relays, if they are connected correctly together.

Note that in the following, when I talk of connections/sharing between pods, I’m not talking about the normal connections between pods which exist, but a kind of meta-network to push public data around more effectively, of the kind Jason talks about in his proposal.

I would suggest using a kind of ‘cell structure’, in which each pod is connected directly with several other pods in the network, and through that structure build up a list of public posts and tagged posts data to pass on to other pods. This avoids the problem of scalability faced if ‘every pod knows every pod’. If the relay connections between pods are made correctly, public data will be federated to every pod quickly, via indirect routes (Pod A shares it with the several pods to which it has direct connections; those pods share it with the pods with which they have direct connections; and so on). If there is redundancy built in to this network, it won’t matter if several pods in this network are down; the data will get fed around to the whole network eventually in any case.

It might be that each pod needs to be connected only to two other pods in the network for this to work, like the classic Communist cell structure – as in the graphic below (not perfectly illustrated, but it gives you an idea):

[Image: cell structure network]
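
To get a feel for how much redundancy "two connections per pod" gives, here is a small Python sketch. It assumes the simplest such layout, a ring in which every pod is linked to exactly two others; that layout and the names `ring` and `reached` are assumptions for illustration, not what the graphic necessarily showed.

```python
from collections import deque


def ring(pods):
    """Link each pod to the previous and next pod in the list."""
    n = len(pods)
    return {p: {pods[(i - 1) % n], pods[(i + 1) % n]} for i, p in enumerate(pods)}


def reached(links, origin, down=frozenset()):
    """Pods that eventually get a post from `origin` while pods in `down` are offline."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        pod = queue.popleft()
        for peer in links[pod]:
            if peer not in seen and peer not in down:
                seen.add(peer)
                queue.append(peer)
    return seen


pods = list("ABCDEF")
links = ring(pods)
print(sorted(reached(links, "A")))                   # every pod
print(sorted(reached(links, "A", down={"C"})))       # everyone except C
print(sorted(reached(links, "A", down={"C", "E"})))  # D is cut off
```

With only two links per pod, a ring survives one pod being down but can split when two are down, so a real design would probably want more than two connections per pod.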

I’m sure there is a way of coding into the D* software itself so that it builds a network of connections such that each time a new pod is brought into the network, the network recalibrates its connections so that this new pod is made a part of the sharing network, without reference to any external source such as relay servers or a central hub. Likewise each time a pod drops out. However, I would have no idea how to do this! I hope someone out there will do, and that my partly developed concept will spark ideas for practical solutions in their mind.

If a central hub is needed to help new pods get connected, I think we should have a mirror or two on other servers just in case the project site is down when a pod is brought online.
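
A rough Python sketch of that bootstrap step with mirrors, under stated assumptions: each directory (the hub or a mirror) is modeled as a plain callable that returns a list of pod hosts, standing in for an HTTP request, and the function name `bootstrap_peers` is made up for this example.

```python
import random


def bootstrap_peers(directories, count=3, rng=random):
    """Ask each directory in turn for known pods and return up to `count` of them.

    `directories` is a list of zero-argument callables (hypothetical interface),
    tried in order: the main hub first, then its mirrors.
    """
    for fetch in directories:
        try:
            known = list(fetch())
        except ConnectionError:
            continue  # this hub or mirror is down: try the next one
        return rng.sample(known, min(count, len(known)))
    raise ConnectionError("no directory reachable")


def main_hub():
    raise ConnectionError("project site is down")


def mirror():
    return ["pod-a.example", "pod-b.example", "pod-c.example", "pod-d.example"]


peers = bootstrap_peers([main_hub, mirror], count=2)
```

The directory is only consulted here, when a pod first comes online; the post data itself would then flow between the pods it was given.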

jasonrobinson · November 28, 2013, 12:00am

@goob, will read the rest of your long comment later, but you should maybe read my proposal too

a central hub which is an integral part of the D* network and receives/sends/stores data from that network, such as post data, which is what is being proposed here.

I have proposed no such thing. This is the reason I stopped the whole vote for the central hub because not many people even understood my proposal.

goob · November 28, 2013, 12:00am

My mistake. I did read your various proposals and wiki articles, but there’s been so much to read and digest that I got confused. I read that suggestion somewhere on one of the several threads on this/related topics, then while writing I got my wires crossed and thought it was you who had proposed it.

Just ignore the last seven words in that extract. The point stands, no matter who proposed it, or even if no one has proposed it yet!

jasonrobinson · November 28, 2013, 12:00am

OK read the whole post now. I think we are thinking on similar lines. However, as a software developer I always think of one of the golden rules of software design - making sure each component has one purpose and that only. Incorporating everything and the rest too is possible - hey we could make diaspora also serve files and incorporate an IRC server + maybe do some test automation services on top. But it’s a bad idea. Diaspora server as it is now exists to provide the UI for the server. The federation stuff is actually being pushed out of the main component just so that diaspora will be more flexible. Why would we want to bundle up more non-UI related features then?

IMHO, the system to federate posts around should be decentralized, but it should also be its own mini-network of volunteers. This is exactly what my relay servers proposal is about. A bunch of relays taking care of the public post handling in a decentralized way - and pods will not even have to decide which relay to use, giving total redundancy even if all except one relay are down.

I still feel many people misunderstood this which is why when I finish the statistics hub, I’ll start working on a POC relay and see if I can provide the hooks on D* side (the more difficult part for myself, being ruby).

Also, as you said, we could federate the metadata for relays around totally without a central hub. Sure it’s possible, but imho it’s a bad idea. It adds nothing to decentralization and does not benefit anyone in any way except adding complexity. A simple list on the project site would do fine, since pods would only need to pull it in every so often to refresh their list.

Decentralization is a good thing and awesome - but it’s not a magic word to use with everything and assume that it makes things better.

jasonrobinson · November 28, 2013, 12:00am

Btw, my original proposal said to store the “wants tags” list on the central hub. This is not really necessary if such data is stored on the relays instead. It just would mean more posting of said lists around, since all relays need to know asap or the pod will miss posts. Storing the list on the central hub would make for less bouncing of lists around - if the central hub is down it doesn’t matter, since relays have the latest list and will then refresh once the central hub is back up.
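
As a minimal sketch of that behaviour in Python (the class name `WantsCache` and the `fetch` callable are assumptions for illustration, not part of the proposal): a relay keeps its last copy of the list and simply carries on with it when the hub cannot be reached.

```python
class WantsCache:
    """Relay-side copy of the pod -> wanted tags list (hypothetical)."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable returning {pod_host: set_of_tags}; stands in for a hub request
        self.wants = {}

    def refresh(self):
        """Pull the latest list; keep the cached one if the hub is unreachable."""
        try:
            self.wants = dict(self.fetch())
            return True
        except ConnectionError:
            return False  # hub is down: keep relaying with the cached list
```

A relay would call `refresh()` every so often; a `False` result changes nothing about delivery, it only means the list may be slightly stale until the hub is back.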

At no point in the proposal was I proposing that traffic stops when the central hub is down

goob · December 21, 2013, 12:00am

Does anyone know what it is specifically in the code or structure of the Diaspora network which is causing public post federation to work unreliably?

If it is because Diaspora relies on push notifications to transmit data between pods, could this be solved by allowing a pod to send pull requests to other pods in the network for any data missed when it comes online after some downtime or after being overloaded and unable to receive communications from other pods, or after it is brought online to the network for the first time? I propose a potential solution to this under ‘Non-communication’ (the third point) in the discussion about adding pull to the push model for federation. While there are concerns about scalability for the other points (getting new pods fully connected and federating tags) in my post, hopefully enabling a pod to send a pull request to other pods when it comes online so it can pick up data (including public posts) it missed while it was offline would help federation of public posts at least in some circumstances where it currently fails.
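
The pull-on-reconnect idea could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the function name `catch_up`, the idea that each peer can be asked for public posts created after a timestamp, and the `"id"` and `"created_at"` keys on each post.

```python
def catch_up(store, last_seen, peers):
    """Pull public posts created after `last_seen` from each peer.

    `store` maps post id -> post dict. `peers` is a list of callables that take
    a timestamp and return a list of post dicts (hypothetical pull endpoint).
    Returns the newest timestamp seen, to be used as the next `last_seen`.
    """
    newest = last_seen
    for fetch_since in peers:
        try:
            posts = fetch_since(last_seen)
        except ConnectionError:
            continue  # peer unreachable: rely on the others
        for post in posts:
            store.setdefault(post["id"], post)  # ignore posts we already have
            newest = max(newest, post["created_at"])
    return newest
```

A pod would run this once when it comes back online, then go back to receiving pushes as usual.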

If we could identify the various factors causing federation of public posts not to work properly in different circumstances, it would, I’m sure, be a big help in solving the problem.

jasonrobinson · December 21, 2013, 12:00am

@goob it’s not that public post federation does not work properly, it’s that it’s not implemented at all. Currently posts just end up on various pods - there is no technical design to say that any public post should be available to any subscriber on any pod.

Personally I want to start prototyping the relay concept - I think once I have a working demo it might be liked

goob · December 21, 2013, 12:00am

OK, thanks. That definitely sounds like a design flaw!

goob · February 17, 2014, 12:00am

By the way, @bradkoehn, I think it would be worth making your proposal here in Loomio, as proposals on the wiki tend to get overlooked.

ryunoki · February 17, 2014, 12:00am

Thanks, @seantilleycommunit, for granting me writing permission here

@jasonrobinson: Say, I’m a bad programmer guy.

Relay receives a post from a pod

Relay already has a cached list of pods and what hashtags they want so relay will deliver post to pods that are interested in one or more of the hashtags in this post. Relay is not for public message keeping - it will delete any posts as soon as they have been pushed out.

Could I misuse your proposal in any way by running a modified version of the code, which does not delete any posts?

To better judge this proposal, it would come in handy if you could list the information you want to “store” in the directory/hub.

jasonrobinson · February 18, 2014, 12:00am

@ryunoki well, since all the posts are public that would go through the relay, what does it matter if someone would?

You could already do it, just start saving posts from large popular pods by following a few hundred popular tags.

Actually the relay way you would only get a subset of posts - the more relays, the fewer posts that go through each relay. Say 5 relays, you would only get approx 1/5 of public posts from opt-in pods, and even then only those with one or more hashtags.

By proposal, the hub would not store anything other than data related to which pods should receive which tags. So something like a dictionary with pod host and N tags that it wants.
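
To make that concrete, a minimal Python sketch of such a dictionary and of a relay that uses it without keeping posts. The names `route` and `Relay`, the shape of a post, and the `send` callable are assumptions for illustration only, not the actual proposal or any existing code.

```python
def route(post_tags, wants):
    """Return the pod hosts that want at least one of the post's tags."""
    tags = set(post_tags)
    return sorted(host for host, wanted in wants.items() if tags & set(wanted))


class Relay:
    """Hypothetical relay: pushes each post out and keeps no copy of it."""

    def __init__(self, wants, send):
        self.wants = wants  # {pod_host: tags it wants}
        self.send = send    # callable(host, post); stands in for delivery to a pod

    def handle(self, post):
        targets = route(post["tags"], self.wants)
        for host in targets:
            self.send(host, post)
        return targets  # the post itself is not stored anywhere on the relay


wants = {
    "pod-a.example": {"linux", "diaspora"},
    "pod-b.example": {"art"},
    "pod-c.example": {"diaspora"},
}
delivered = []
relay = Relay(wants, lambda host, post: delivered.append((host, post["id"])))
relay.handle({"id": "post-1", "tags": ["diaspora", "federation"]})
```

Nothing stops someone from running a modified relay whose `send` also writes to disk, which is the point made above: the posts are public, so the relay gives no more access than following the same tags from an ordinary pod would.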

ryunoki · February 18, 2014, 12:00am

I’m trying to consider worst cases to improve the proposal, Jason. That’s all

macieklozinski · March 17, 2014, 12:00am

my federation protocol proposal:
https://github.com/loziniak/diaspora_federation

rasmusfuhse · March 17, 2014, 12:00am

In my opinion the big question is:

Is it better to make following of hashtags part of the protocol (like Maciek’s proposal)?

Or is it better to make a search endpoint part of the federation, which can be used to search for postings with hashtags (and maybe users and other stuff)?

Both ways will work, but which way is more reliable and more performant? Is there any third option?