Central hub

jasonrobinson · November 16, 2013, 12:00am

Here is my proposal to create a central (opt-in) hub of information about the pods in our little social network. Please read the full proposal before judging it.

I would like to vote on this as soon as all constructive comments relating to the technical specifications have been processed. I am prepared to code it and host it (until official servers exist).

Full proposal here: https://wiki.diasporafoundation.org/Central_hub

P.S. I created this in the main community group, not a subgroup, since I think this is something all community members should be allowed to vote on without the hassle of requesting subgroup rights.

Note: This discussion was imported from Loomio.

emmanouelkapernaro · November 16, 2013, 12:00am

Sorry if my comment is off; I may not have completely understood your proposal, although I have read the wiki pages about the central hub and relays.

I have to say that, as I see it, diasporafoundation.org or the GitHub repository (which are central hubs in a way) are not the same…

For example, if diasporafoundation.org is down, I don’t care; nothing bad happens to my pod. If GitHub closes forever, I don’t care either, because the source code does not depend on GitHub… git is decentralized…

But if the central hub you are proposing is down, then we have a problem. One of the reasons diaspora is better is that it does not depend on a central machine.

From diasporafoundation.org:
What is decentralization?

diaspora* is completely different from most networks that you use. It is completely decentralized, with no central “hub”. Even so, it’s very easy to connect and communicate with people. Here’s how.

jasonrobinson · November 16, 2013, 12:00am

@emmanouelkapernaro the central hub should not be required for a pod to run - so your pod would not be affected in any way if the central hub is down or disappears.

Some services that use it might be affected, but pods would not be affected in any way. Please read the proposal carefully.

jasonrobinson · November 16, 2013, 12:00am

Maybe calling it a “hub” was wrong and confusing. Infostore? Register?

emmanouelkapernaro · November 16, 2013, 12:00am

Yes, I understand this. But if the central hub is down, then my pod stops having public tag federation. How is that not affecting my pod in any way?

jasonrobinson · November 16, 2013, 12:00am

@emmanouelkapernaro actually if you read the proposal for relays carefully, the central hub is not needed since pods would cache a list of relays. Anyway, that proposal does not relate to the hub directly - these are separate things. The relay is not even the only proposal concerning public posts.

Is your pod affected now by missing public post federation? I think it might be.

emmanouelkapernaro · November 16, 2013, 12:00am

@jasonrobinson Public post federation is currently a missing feature. It affects my pod, yes. I don’t want something new that also affects my pod; that’s why I prefer not to have anything centralized.

So, to understand your proposal better: the central hub is only needed to hold as much information as possible about which pods are out there and to help them get in touch by distributing this info openly, right? Like podupti.me, but more useful for features like federation.

If this is correct, then why not think about a decentralized method of distributing information? I am not an expert, but isn’t that what a DHT is for? Am I way off here?

bradkoehn · November 16, 2013, 12:00am

What happens if different pods use different hubs? Can pods use multiple hubs? Is there anything good or bad there?

Also, are the pods listed in the hub (“directory” might be a better term) curated in any way against bad actors (e.g., a pod that is spamming)? I understand that the hub doesn’t move messages on its own, but I’m trying to get at the policy implications.

jasonrobinson · November 16, 2013, 12:00am

@emmanouelkapernaro Because building a decentralized way of providing information about the network would be overkill. Imagine that one morning the data center is wiped off the earth: until someone installs a new hub and restores a backup, you will not be able to see how many pods or users the diaspora network has. Will that affect anyone at all? No, because the hub (or directory) should not contain any information that pods need in order to work. That is stated in the proposal.

Whatever system we built to sync all this information to all pods would just be overkill for this requirement. When building software, one should meet the needs as they are, not spend time building things that don’t offer enough benefit for the effort that goes into them (imho at least). It’s not like we’re overflowing with developers who have nothing to do.

So my question back is - what would be the real benefit from decentralizing opt-in information concerning the diaspora* network as a whole? A real use case or just because it’s possible?

Also, I think you’re missing the main thing this adds: information. The main benefit is information about the network for the rest of the internet. To show that diaspora* is not a dead project. To show the network is growing. To prove to ourselves that the network is growing.

@bradkoehn Similarly, I don’t really see the point of multiple hubs; the benefit is really small. The code would be open source, and if the data is open, any disaster to the hub would just involve someone reassigning the subdomain’s IP and setting up a new hub.
But then again, I don’t think there is any reason not to make the hub address configurable. Maybe multiple hubs will be needed when we have 1 million pods; then they would just need to sync with each other.

As for spamming: the hub collects the information itself; it doesn’t accept information pushed to it. If someone goes to the effort of customizing their pod to give wrong information, or makes a dummy pod that talks like a pod, I’m sure those cases can be dealt with quite quickly with some manual admin work. The important part is that the hub calls the pods for info, not the other way around.

I’ll add this to the proposal:

An endpoint to get the full data export from the hub (for example, just a static link to an archive updated daily; this should of course not be abused). In a disaster, this export would be imported into a new hub. Thus no one needs to trust the server admin.
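A minimal sketch of the export/import round trip described above, written in Python purely as an illustration. The `Hub` and `PodRecord` classes and their fields are hypothetical; the proposal does not define an export format, so a versioned JSON document is assumed here.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict


@dataclass
class PodRecord:
    # Hypothetical fields; the real schema would come from the proposal on the wiki.
    host: str
    name: str
    registrations_open: bool
    version: str


class Hub:
    def __init__(self) -> None:
        self.pods: Dict[str, PodRecord] = {}

    def export_json(self) -> str:
        """Serialize every pod record into one archive (e.g. regenerated daily)."""
        records = [asdict(p) for p in sorted(self.pods.values(), key=lambda p: p.host)]
        return json.dumps({"format": 1, "pods": records}, indent=2)

    @classmethod
    def import_json(cls, data: str) -> "Hub":
        """Rebuild a fresh hub from a previously published export."""
        hub = cls()
        payload = json.loads(data)
        for raw in payload["pods"]:
            record = PodRecord(**raw)
            hub.pods[record.host] = record
        return hub
```

Because anyone can download the archive and call something like `Hub.import_json(...)` on a new server, the data outlives any single admin, which is the point made above.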

Also, please remember this is just a suggestion and a draft; feel free to suggest changes. And btw, some of the resources we have centralized really would hurt the network if they disappeared. Take the wiki: if it just disappears, we will not get any new pods, since no one will know how to set one up. It would be really nice to have the articles synced somewhere.

jasonrobinson · November 16, 2013, 12:00am

Oh yeah, I forgot my original idea that pods could optionally hide some of the dataset. For example, some pod might not want to report the number of users but would still want to register with the hub (directory).

I’ll add this too.
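One way the optional hiding could work, sketched under assumptions: the pod reports only the fields it has not marked as hidden, while its hostname is always kept so it can still be listed. The field names and the `public_stats` helper are hypothetical.

```python
from typing import Any, Dict, Iterable


def public_stats(stats: Dict[str, Any], hidden: Iterable[str]) -> Dict[str, Any]:
    """Return the subset of a pod's statistics it has agreed to publish.

    'host' can never be hidden, so the pod stays listed in the directory.
    """
    hidden_set = set(hidden) - {"host"}
    return {key: value for key, value in stats.items() if key not in hidden_set}
```

For example, `public_stats({"host": "pod.example.org", "total_users": 120, "version": "0.2"}, hidden=["total_users"])` leaves only the host and version.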

bradkoehn · November 16, 2013, 12:00am

@jasonrobinson There are many benefits to having multiple servers; we allow pods to configure the services they want to use (like Google’s Pubsubhubbub provider) in case they want to isolate themselves from another group of pods, or simply prefer another provider’s services. All centralized services should be selectable by configuring the pod.

Once you get into the business of curating the pod list, the ability to support multiple servers is critical.

I’m not sure I like using a different stack for the reference implementation; if everybody who implements a centralized service does it in the stack of their choice, we’ll soon turn into a hodgepodge of disparate server technologies. I guess the API is so small that somebody could whip up a new one with minimal effort, but a new developer now has to learn another stack or write a completely new implementation from the ground up. I don’t think Mongo buys you anything you can’t get from a normal RDBMS anyway (this is pretty much some REST APIs in front of a single table, right?); my guess is that it’s a stack you already know rather than the right stack for the team as a whole.

jasonrobinson · November 16, 2013, 12:00am

@bradkoehn That is why I suggest the hub is configurable already.

The MEAN stack is not something new or unknown; it, or at least its components, is becoming very popular. Node is HUGE and really the best tool for things like this, imho. Someone is constantly saying “I would participate but Ruby…”, so I don’t think the answer is to build everything in one language. Ruby is also very resource hungry compared to a pure JavaScript implementation. There is a reason Diaspora is made in Rails; there is no reason the hub needs to be. Also, there is no reason not to use something like MariaDB instead of Mongo.

Anyway, technical details can be decided on separately, that was just a suggestion. The main thing is to agree (or not agree) on the need for this.

But if it’s in Ruby I cannot do it, since it would take too much time; I don’t have enough interest in learning more Ruby than I need, since I want to concentrate on Python and JavaScript. I’ll still offer to host it.

ram518 · November 16, 2013, 12:00am

This idea just flies in the face of the decentralized nature of D*, it seems.

jasonrobinson · November 16, 2013, 12:00am

@ram518 how?

crazypedia · November 17, 2013, 12:00am

Couldn’t much of this be done with an API (in the works, or so I’ve heard?), so that anyone, or any server, could ping any or all known pods’ APIs to get this information, provided the admin has enabled this feature? This would allow continued federation of information, and the option of a hidden pod, or simply a pod that has decided to opt out for privacy reasons.

jasonrobinson · November 17, 2013, 12:00am

@jacobschleappi how would you know what pods to ping?
Also, this proposal does not force anything on anyone; please note the “opt-in” mentioned many times in the proposal.

rekado · November 17, 2013, 12:00am

Isn’t the proposal a bit “fat”? Pods can’t query unknown pods directly for obvious reasons; to address this, much less work is required: one only needs to let pods “register” with any number of trackers and ship the Diaspora server code with a customisable list of such trackers. Everything else can then be negotiated pod to pod.

A tracker doesn’t need to store anything about a pod except for the hostname and the time of last ping (to allow for record expiry).
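The tracker described above fits in a few lines. This is an illustration, not part of either proposal: the `Tracker` class, the time-to-live value and the explicit `now` arguments (which would normally come from `time.time()`) are assumptions.

```python
from typing import Dict, List


class Tracker:
    """Minimal tracker: remembers only each pod's hostname and last ping time."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.last_ping: Dict[str, float] = {}

    def ping(self, host: str, now: float) -> None:
        # A ping both registers a new pod and refreshes an existing record.
        self.last_ping[host.lower()] = now

    def expire(self, now: float) -> None:
        # Drop pods that have not pinged within the time-to-live window.
        self.last_ping = {h: t for h, t in self.last_ping.items() if now - t <= self.ttl}

    def known_pods(self, now: float) -> List[str]:
        self.expire(now)
        return sorted(self.last_ping)
```

Everything beyond the hostname (version, user counts, and so on) would then be negotiated pod to pod, as suggested above.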

rekado · November 17, 2013, 12:00am

About pod registration: your proposal says this:

/register Pods will call with this initially when they want to register the pod. This call will be followed by the central hub calling back to check the pod is really a pod.

How would the hub be able to ensure that a) the party issuing the registration is a pod and b) speaks on behalf of the registered pod? Re a: is it a problem if a registering client behaves like a pod but is not? Re b: it should not be possible to register a pod you don’t control. To demonstrate control, the creation of a DNS text record with a token could be sufficient to solve this problem.
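The DNS ownership check suggested above might look roughly like this. The `diaspora-hub-verification=` record prefix and the injected `lookup_txt` function are hypothetical; a real hub would wrap some DNS library's TXT query behind that function.

```python
import secrets
from typing import Callable, Iterable

# Hypothetical record prefix; not part of any existing diaspora* specification.
TOKEN_PREFIX = "diaspora-hub-verification="


def issue_token() -> str:
    """Random token the hub hands to the admin who asks to register a pod."""
    return secrets.token_hex(16)


def owns_domain(host: str, token: str,
                lookup_txt: Callable[[str], Iterable[str]]) -> bool:
    """True if any TXT record published for host carries the expected token.

    lookup_txt is injected so the check does not depend on a specific resolver.
    """
    expected = TOKEN_PREFIX + token
    return any(record.strip() == expected for record in lookup_txt(host))
```

The admin would publish `diaspora-hub-verification=<token>` as a TXT record, and the hub would only list the pod once `owns_domain` succeeds.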

This brings me to the next question: if this all only works for Diaspora pods and requires explicit consent — why use REST? Pods are capable of much more complex communication than simple HTTP requests (e.g. stateless vs stateful).

jasonrobinson · November 17, 2013, 12:00am

@rekado I think you (and everyone else who has commented) are missing the (main) point of the whole system: to gather metrics on the diaspora network. And metrics are not just about pods; they’re about users. We cannot have reliable statistics for the outside world without a reliable way of collecting them (from participating opt-in pods). Listing pods is one thing; collecting user counts is another.
Yes, we could do as you say and have a gazillion trackers. You can call the hub I propose a tracker, because that is what it would be. I’m happy to have many trackers, but I don’t see how that makes the system any better.

I’m going to create a vote on the public metrics question; I should have started there. Clearly, proposing something technical only hid the whole point of the proposal, i.e. the end result.

On the other questions:

How would the hub be able to ensure that a) the party issuing the registration is a pod and b) speaks on behalf of the registered pod?

Simple - it doesn’t save the pod until it calls back. If the pod “speaks pod” - it’s a pod. If it replies “wtf I don’t want to be registered go away” - then clearly it did not make a request. This is the same way “verify your account by clicking the link in the email” works to ensure ownership (well, other direction, but same principle).
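The register-then-call-back flow described above, sketched in Python under stated assumptions: `fetch_pod_info` is a hypothetical function standing in for the hub's HTTP request back to the pod, and `wants_registration` is an invented field meaning “yes, I asked to be registered”. The key property is that nothing is stored unless the callback succeeds.

```python
from typing import Callable, Dict, Optional


class Registry:
    def __init__(self, fetch_pod_info: Callable[[str], Optional[dict]]) -> None:
        # Hypothetical callback: asks the pod to describe itself and returns
        # None if whatever answers does not "speak pod".
        self.fetch_pod_info = fetch_pod_info
        self.pods: Dict[str, dict] = {}

    def register(self, host: str) -> bool:
        """Handle a registration request by calling the pod back first."""
        info = self.fetch_pod_info(host)
        # Not a pod, or a pod that says it never asked: store nothing.
        if not info or not info.get("wants_registration"):
            return False
        self.pods[host] = info
        return True
```

This is the same principle as an email confirmation link: the request alone proves nothing, the answer from the claimed owner does.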

To demonstrate control, the creation of a DNS text record with a token could be sufficient to solve this problem.

We’re gathering statistics, not solving privacy or authentication issues; no need to shoot a fly with a cannon. Making pod admins modify their DNS just to register is a bit too much imho. But of course, the way I indicated was only an example; it would be better to first agree on whether this thing is needed.

if this all only works for Diaspora pods and requires explicit consent — why use REST? Pods are capable of much more complex communication than simple HTTP requests (e.g. stateless vs stateful).

What would that achieve that a simple POST/GET would not? To be honest, I’d agree to anything; the end result is what matters most. I just don’t see the point of building complicated things to do something simple. We should leave the complicated things for the important stuff, e.g. Diaspora itself.

jasonrobinson · November 17, 2013, 12:00am

Proposal: Does the diaspora* network need statistics?

Ignoring all the technical details on how to accomplish this, please consider the following.

Would you want the diaspora* network to gather statistics from opt-in participating pods (with non-participation as the default in a new pod’s configuration) about the following:

Name of pod
URL of pod
Registrations open / closed
Version
TOS (when implemented)
Number of local users
Number of local users active in the last 6 months

The statistics would be fully open to anyone with full transparency and possibility to opt-out for pods who change their minds.
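A sketch of what one pod's statistics record could look like, with the field list taken from the proposal above and participation off by default. The class names, field names and the `participate` setting are hypothetical; fields a pod cannot or will not report are left as `None`.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PodStatistics:
    """One pod's entry, mirroring the fields listed in the proposal."""
    name: str
    url: str
    registrations_open: bool
    version: str
    tos_url: Optional[str] = None              # "TOS (when implemented)"
    local_users: Optional[int] = None
    active_users_6_months: Optional[int] = None


@dataclass
class StatisticsConfig:
    # Hypothetical pod setting: off unless the admin explicitly opts in.
    participate: bool = False


def report(stats: PodStatistics, config: StatisticsConfig) -> Optional[dict]:
    """What the pod hands to the hub, or None if it has not opted in (or opted out again)."""
    return asdict(stats) if config.participate else None
```

Flipping `participate` back to `False` is all an admin who changes their mind would need to do for the pod to stop reporting.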

Outcome: Will do in my own repo

Votes:

Yes: 7
Abstain: 2
No: 5
Block: 0

Note: This proposal was imported from Loomio. Vote details, some comments and metadata were not imported.