Public post federation

jasonrobinson · July 16, 2015, 12:00am

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://the-federation.info/social-relay/well-known-schema-v1.json",
  "type": "object",
  "properties": {
    "subscribe": {
      "type": "boolean"
    },
    "scope": {
      "type": "string",
      "pattern": "^(all|tags)$"
    },
    "tags": {
      "type": "array",
      "items": {"type": "string"},
      "uniqueItems": true
    }
  },
  "required": [
    "subscribe",
    "scope",
    "tags"
  ]
}
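
To make the schema concrete, here is a minimal Python sketch that checks a .well-known/social-relay document against the same constraints by hand. The function name validate_relay_doc and the error strings are made up for illustration, and a real pod or relay would more likely hand the schema to a JSON Schema library. The sketch treats scope as exactly "all" or "tags".

```python
import json

ALLOWED_SCOPES = ("all", "tags")
REQUIRED_KEYS = ("subscribe", "scope", "tags")


def validate_relay_doc(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be an object"]

    problems = []
    for key in REQUIRED_KEYS:
        if key not in doc:
            problems.append(f"missing required key: {key}")

    if "subscribe" in doc and not isinstance(doc["subscribe"], bool):
        problems.append("subscribe must be a boolean")
    if "scope" in doc and doc["scope"] not in ALLOWED_SCOPES:
        problems.append("scope must be 'all' or 'tags'")
    if "tags" in doc:
        tags = doc["tags"]
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            problems.append("tags must be an array of strings")
        elif len(set(tags)) != len(tags):
            problems.append("tags must be unique")
    return problems
```

A pod that subscribes to a couple of tags would publish something like {"subscribe": true, "scope": "tags", "tags": ["linux", "foss"]}, which passes with an empty problem list.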

jasonrobinson · July 16, 2015, 12:00am

social-federation can now generate it.

jasonrobinson · July 16, 2015, 12:00am

Actually, @jhass @dennisschubert do you want the .well-known/social-relay generation in diaspora core or in the diaspora-federation gem? It’s not really part of the federation, more like an add-on system to push posts around, so I’m kinda hesitant to push it there. Can I just add a new route/controller/presenter etc, like the current statistics.json is done?

Or should I make a gem well-known-social-relay?

jhass · July 16, 2015, 12:00am

I guess having it in the core would be okay for now, should be fairly easy to push elsewhere if needed.

flaburgan · July 26, 2015, 12:00am

Okay, I read the specification and discussed it with @jasonrobinson on IRC.

First of all Jason, thank you very much for dealing with this important problem of diaspora*.

Although this proposition solves most of the problem, there are some points we should be careful about:

[Warn] External dependency for a core feature is dangerous. Sending messages to other pods is the core feature of a pod. Using external app servers to do that means the network would depend heavily on a few servers, which can be attacked or poorly maintained. This looks dangerous to me.
[Warn] On the same topic, to use a centralized list of pods is a potential vector of attack / problem. We’re losing part of the force of diaspora* here.
[Warn] Pods are not equal anymore. Until now, the difference between pods was on side features like services enabled or chat. With this proposition, we would have to explain to users that, depending on where they choose to register, they will not have the same content available. This is the opposite of what we always said, and this is exactly the problem we are trying to solve: we don’t want users to choose a pod because they have to go there, because that’s where the content is.
[Blocking] The interactions on posts that are transmitted by relay are not federated. This point is a blocking point to me. It completely breaks the usage of diaspora* and means a lot more complaints about the federation being broken. I don’t see the point of displaying a post if I know that only the users of my pod will see my reaction on it. Most of the time, I want to reply to the author of the post.

For those reasons, I think your proposition is not a good solution. I’ll try to propose something else soon.

jasonrobinson · July 26, 2015, 12:00am

For those reasons, I think your proposition is not a good solution. I’ll try to propose something else soon.

This is not a proposition any more. It stopped being one when I changed the original one not to depend on the core so much. Right now it depends on only the carbon copying of posts outwards - even if that which is now in develop was reverted I could do a single commit patch which podmins could pull in if they want.

So this is pretty much live now, just not fully functional. I already see diasporapr.tk sending posts out to the relay. I’ll push the latest changes to the relay live early next week so posts will be relayed for the first time and real world testing can start.

[Warn] External dependency for a core feature is dangerous.

It’s not a core feature. The core feature is to NOT deliver posts by design to all pods. And that works and will continue to work.

[Warn] On the same topic, to use a centralized list of pods is a potential vector of attack / problem. We’re losing part of the force of diaspora* here.

Part 2 would be decentralizing the relays themselves. Initially, yes, each pod configuring a single relay makes it weaker. But less weak than pod email delivery or hosting, which is the weakest part of diaspora: users being locked into a single server for life. And since this is not a core feature, like user login is…

[Warn] Pods are not equal anymore.

They are even less equal now. Right now it makes sense to join a large pod, to see many public posts. Setting up your own pod doesn’t make sense. Using relays will make pods more equal.
But, the relay will also enable pods to be more strongly themed, for example a pod could subscribe to only linux and open source posts, ignoring all the other stuff.

[Blocking] The interactions on posts that are transmitted by relay are not federated.

Well, the same problem exists with reshares. And the interactions can be solved, we just have to decide which way to go: to relay them, or to only use relays for the initial post delivery. I think only using this in the real world will tell which is better. Anyway, it needs to happen before 0.6 is released, and before that the participations bloat needs to be dealt with and the federation tuned to be more efficient. I will be submitting something for both of these for consideration.

I don’t see the point of displaying a post if I know that only the users of my pod will see my reaction on it.

Not entirely true. Since a pod which gets a post via a relay will fetch the author contact (by diaspora protocol design), interactions will be sent to the original pod as if the pod had delivered the post. The problem is that afaict the original pod will only relay the interaction as normal, not to other pods that depend on relays. This is how I understand it:

pod A <— author of post
pod B <— contact of pod A author
pod C <— not in contact with pod A author
pod D <— not in contact with pod A author

So when pod A author sends a post it will be delivered to pod B user directly and pod C and pod D users via relay (assuming both subscribe in this case).

Initial relay concept doesn’t relay interactions, so when a user on any of the pods comments:

pod A will receive it
pod B will receive it (since pod A relays it)
pod C will not receive it (unless done from pod C)
pod D will not receive it (unless done from pod D)

But this situation is fixable by defining whose responsibility it is to do what. Of course, either the relay should take care of whatever “broken” links it creates OR it should create participations so that interactions flow as they should. Though as said, the current reshare concept also has this kind of bug.
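
The pod A to pod D scenario above can be written down as a few lines of Python. This is only a toy model of the delivery rules as described in this post, not diaspora* code. The function names are invented, and it assumes a comment always reaches the author's pod, which then forwards it to direct contacts only.

```python
def post_recipients(author_pod, contact_pods, relay_subscribers):
    """Pods that end up holding the original post."""
    return {author_pod} | set(contact_pods) | set(relay_subscribers)


def comment_recipients(commenting_pod, author_pod, contact_pods):
    """Pods that see a comment when interactions are NOT relayed.

    The comment exists on the pod where it was written, goes to the
    author's pod, and the author's pod forwards it to direct contacts only.
    """
    return {commenting_pod, author_pod} | set(contact_pods)


# Pod A authors the post, pod B holds a contact, pods C and D subscribe via relay.
post_holders = post_recipients("A", ["B"], ["C", "D"])

# A user on pod C comments on the relayed post.
seen_by = comment_recipients("C", "A", ["B"])

# Pods that display the post but never receive the comment.
missing = post_holders - seen_by
```

With a comment written on pod C, the only pod left out is D, which is the gap that either relayed interactions or relay-created participations would need to close.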

Thanks for your comments. I’m looking forward to seeing a proposal for the core that would federate all posts to whoever wants them while still allowing pods not to receive all posts, but I really doubt that kind of solution is doable in the core, and it wouldn’t even make sense to bloat the core with it.

flaburgan · July 26, 2015, 12:00am

I wrote what I have in mind on https://wiki.diasporafoundation.org/Follow_other_pods_tags

I’ll now read your answer

flaburgan · July 26, 2015, 12:00am

In my opinion, it is a core feature to deliver the message. Currently, this feature is incomplete because it doesn’t allow following tags on other pods. So, we have to patch the core, not to build another tool to balance its weakness. Related, the fact that pods are not equal now is due to this incomplete federation, not because of a setting. This is really important to me. In the first case, it only means we need to improve the software, when in the second case, it means the equality is broken by choice. A bad thing in my opinion.

Part 2 would be decentralizing the relays themselves.

With the “perfect situation” becoming one relay per pod? And then, making relays forward interactions? I can’t shake the feeling that we’re building another network on top of the diaspora one instead of patching it here.

jasonrobinson · July 26, 2015, 12:00am

In my opinion, it is a core feature to deliver the message. Currently, this feature is incomplete because it doesn’t allow following tags on other pods. So, we have to patch the core, not to build another tool to balance its weakness.

Well I still disagree - the core doesn’t have to be a do-everything solution. It’s bloated as it is and already takes too many resources to run. Granted, the relay system will increase the load across the network, but it will increase it less than if all the pods did all the work.

Related, the fact that pods are not equal now is due to this incomplete federation, not because of a setting. This is really important to me. In the first case, it only means we need to improve the software, when in the second case, it means the equality is broken by choice. A bad thing in my opinion.

Well, as we are talking about a decentralized place, pods should be allowed to be not equal if they want to be.

I read your tag based proposal and it could be a nice improvement to the core. However, as you note, it would not help in the case of new pods which would still have to do a lot of manual work to register with this and that pod. The relay system only requires a new pod to register with a pod list - and relays could even use many pod lists or even be pod lists themselves.

Also I don’t believe this is true:

Every interactions is possible on the posts received with that solution, so answers (comments), likes and reshares will be received by the original pod which created the post and all the others which received it

Assuming you mean that participations would also follow the tags in the post, then this is true only to the point where users don’t stop following tags. If the last user stops following a tag on a pod, the relations would stop going through.

All in all, that could be a nice addition to consider for the core (maybe with the addition that only active users tags are considered, not everybody’s) but IMHO it doesn’t solve the broken network problem like the relay does. It only makes the broken network problem dissipate faster, but the effect is the same for brand new pods. The solution would also be heavier on every single post for post delivery.

flaburgan · July 26, 2015, 12:00am

I read your tag based proposal and it could be a nice improvement to the core. However, as you note, it would not help in the case of new pods which would still have to do a lot of manual work to register with this and that pod.

That is true, but it is a different issue in my opinion, this is what I would call “network discovery”. It is not only about tags, we may want to find users too, for example.

About the tag following problem and my proposition, if the pod knew every other pod on the network, the problem would be solved. So we can choose to solve this by simply fetching the list of pods from the-federation.info, as you propose to do for the relays.

Assuming you mean that participations would also follow the tags in the post, then this is true only to the point where users don’t stop following tags. If the last user stops following a tag on a pod, the relations would stop going through.

Not sure what you meant here. What I meant was, if you write a post about #diaspora from your pod, and I receive it because my pod told yours that it is interested in diaspora*, and then Jonne answers on your post from his pod, I will receive Jonne’s answer because your pod knows it sent me the message so it is able to forward Jonne’s comment.

active users tags

I don’t get what you’re talking about?

jasonrobinson · July 26, 2015, 12:00am

and then Jonne answers on your post from his pod, I will receive Jonne’s answer because your pod knows it sent me the message so it is able to forward Jonne’s comment.

You mean pods would explicitly track who they’ve sent posts to? I think it currently works by checking contacts (sharing and shared with) when deciding where to send. I don’t think posts “remember” where they have been sent. I might be wrong.
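
For illustration only, explicit tracking of that kind could look like the sketch below. The class name DeliveryLog and its methods are hypothetical and are not part of diaspora*. It only shows the bookkeeping that the tag-following idea would rely on: record each pod a post went to, then reuse that record when a comment arrives.

```python
class DeliveryLog:
    """Remembers which pods each post was delivered to (hypothetical)."""

    def __init__(self):
        self._sent_to = {}

    def record(self, post_id, pod):
        """Note that `post_id` was delivered to `pod`."""
        self._sent_to.setdefault(post_id, set()).add(pod)

    def forward_targets(self, post_id, comment_origin_pod):
        """Pods that should get a new comment, excluding the pod that wrote it."""
        return self._sent_to.get(post_id, set()) - {comment_origin_pod}


log = DeliveryLog()
log.record("post-1", "flaburgan-pod")
log.record("post-1", "jonne-pod")

# Jonne comments from his pod; the author's pod looks up who else got the post.
targets = log.forward_targets("post-1", "jonne-pod")
```

The cost is one stored row per post and receiving pod, which is worth weighing against the concerns about database growth and participations bloat raised in this thread.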

active users tags

I don’t get what you’re talking about?

Just a small detail. For active tags it makes sense to only look at tags followed by active users. Otherwise a user that logs in once and follows a tag will cause the pod to forever follow that tag. The relay subscription prefs work (if set so) using the 6 month active users.
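
A rough sketch of that filter, assuming each user record carries a last-seen timestamp and a list of followed tags. The data shape, the function name and the 182-day window are assumptions for the example, not the relay's actual implementation.

```python
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=182)  # roughly six months; an assumed cutoff


def active_followed_tags(users, now):
    """Union of tags followed by users seen within the active window.

    `users` is an iterable of (last_seen: datetime, tags: iterable of str).
    """
    cutoff = now - ACTIVE_WINDOW
    tags = set()
    for last_seen, followed in users:
        if last_seen >= cutoff:
            tags.update(followed)
    return sorted(tags)


users = [
    (datetime(2015, 7, 1), ["linux", "diaspora"]),  # active
    (datetime(2014, 1, 1), ["basketball"]),         # logged in once, long ago
    (datetime(2015, 3, 1), ["linux"]),              # active
]
subscribed = active_followed_tags(users, now=datetime(2015, 7, 26))
```

Here the tag followed only by the long-inactive user drops out of the subscription, so the pod stops receiving posts nobody on it is reading.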

jasonrobinson · October 3, 2015, 12:00am

Added some notes and ideas regarding the participations relaying to our Paris board. Would love to discuss at least for some brainstorming.

richarddecal · December 14, 2015, 12:00am

Re: Jason’s “pods should be allowed to be not equal if they want to be.”

I strongly believe that which content any user wants to subscribe to should be decided at the user level rather than the pod admin level. If one user wants to follow basketball posts, and another wants to follow Linux posts, they should make that decision rather than it be imposed on them by some stranger. I don’t want to join a pod only to find out the admin severed my access to one of my interests because they don’t share that interest.

jasonrobinson · December 14, 2015, 12:00am

@richarddecal

If one user wants to follow basketball posts, and another wants to follow Linux posts, they should make that decision rather than it be imposed on them by some stranger.

I couldn’t agree more. Currently, the defaults for the diaspora* relay code are probably not the best. We could probably change them before the relay hits “mainstream” in 0.6; currently it’s only in development pods.

The defaults are:

inbound:
  subscribe: false
  scope: tags
  include_user_tags: false
  pod_tags:

So, the podmin must change “subscribe” to true to enable the functionality, and “include_user_tags” to true if user tags should be collected. I think the latter should be changed to “true” by default.

I’d turn the whole relay functionality on for user tags by default, but that would never pass. And since the relay is third-party stuff, it’s prob a good idea to keep it off by default.

jasonrobinson · December 14, 2015, 12:00am

(also, the code allows mixing, podmin can define tags and still have user tags being subscribed to)
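
As a sketch of how those settings could combine into the subscription document, assuming pod_tags is a comma-separated string and that user tags are merged in only when include_user_tags is true. The function name and the exact mapping are guesses for illustration. The real logic lives in the diaspora* code base and may differ.

```python
def build_relay_document(inbound, active_user_tags):
    """Build a subscription document from the `inbound` settings.

    Assumed mapping: pod_tags is a comma-separated string (or None), and
    user tags are merged in only when include_user_tags is true.
    """
    tags = set()
    pod_tags = inbound.get("pod_tags") or ""
    tags.update(t.strip() for t in pod_tags.split(",") if t.strip())
    if inbound.get("include_user_tags"):
        tags.update(active_user_tags)
    return {
        "subscribe": bool(inbound.get("subscribe")),
        "scope": inbound.get("scope", "tags"),
        "tags": sorted(tags),
    }


# The defaults quoted above: nothing is subscribed.
defaults = {"subscribe": False, "scope": "tags",
            "include_user_tags": False, "pod_tags": None}
default_doc = build_relay_document(defaults, ["linux"])

# Mixing podmin-defined tags with tags followed by users.
mixed = {"subscribe": True, "scope": "tags",
         "include_user_tags": True, "pod_tags": "foss, privacy"}
mixed_doc = build_relay_document(mixed, ["linux", "foss"])
```

With the defaults the tag list stays empty and subscribe is false; in the mixed case the podmin's tags and the users' tags are merged and de-duplicated.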

jasonrobinson · January 9, 2016, 12:00am

My proposals to solve the participations and relay decentralization issues.

paulsutton · January 22, 2016, 12:00am

If we use relays, would information on which relays are up be available on a site such as podupti.me? That site also shows if pods are down (for whatever reason), so it could help developers improve the network and identify problems.

alexstacey · May 17, 2016, 12:00am

Hi guys. I’m #newhere and don’t know much about the existing architecture of d* but I read through lots of this thread yesterday with interest, and have a couple of comments…

If I understand correctly, some proposals involve pushing public posts to pods that have users following certain tags. This seems problematic to me as it ignores past public posts. For example, if a user starts following #privacy and happens to be the first user on that pod to do so, they will only get future posts; they won’t be able to look through the history of that tag. I don’t think that would be the expected (or desired) behaviour.

The alternative that came to mind (which may well have been suggested already) is that each pod could publish a list of the tags that they have public posts for, and then they could be pulled in when needed. So, using the example above, when the first user starts following #privacy, the pod then (somehow) finds all of the other pods that have public posts for that tag and pulls them in. Something like that also gives more power to certain pods to decide what they want to pull in. Some pods might want to ignore #nsfw for example.
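
A toy version of that pull model, to show the moving parts. The published_tags mapping stands in for whatever per-pod tag listing would exist, and both function names are invented; nothing like this is specified anywhere in this thread.

```python
def pods_with_tag(published_tags, tag):
    """Pods whose published tag list contains `tag`.

    `published_tags` maps a pod name to the set of tags it says it has
    public posts for (a made-up stand-in for a per-pod tag listing).
    """
    return sorted(pod for pod, tags in published_tags.items() if tag in tags)


def backfill_plan(published_tags, tag, local_pod, ignored_tags=frozenset()):
    """Pods to pull history from when a tag is first followed locally."""
    if tag in ignored_tags:
        return []  # the pod has chosen not to pull this tag at all
    return [pod for pod in pods_with_tag(published_tags, tag) if pod != local_pod]


published = {
    "pod-a": {"privacy", "linux"},
    "pod-b": {"nsfw"},
    "pod-c": {"privacy"},
    "home": {"privacy"},
}
plan = backfill_plan(published, "privacy", local_pod="home")
skipped = backfill_plan(published, "nsfw", local_pod="home", ignored_tags={"nsfw"})
```

The first call lists the remote pods to fetch #privacy history from, and the second shows a pod opting out of a tag entirely.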

Anyway, just my thinking while reading this thread. Excuse me if I’m repeating what has already been said.

flaburgan · July 10, 2017, 6:23pm

This topic appeared again in a conversation in diaspora* and I definitely agree this is the biggest weakness of the software at the moment. I personally consider it as a bug, not as a missing feature. Users should be able to receive all the content they want to read.

So, I was answering as usual that the remote tag following “feature” is still not developed when this comment made me doubt. I always answered that receiving all the public content of the network would blow up the database. But now with the huge improvements made by @supertux88, is this still true?

So I checked.
On diaspora-fr (running PostgreSQL), there are 2,750 users and 62,038 people (external users), 2,363,784 posts (50,331 are local) and 1,613,855 comments (140,314 are local). \l+ says the database is 5609 MB.

On framasphere (running PostgreSQL), there are 41,460 users and 84,558 people with 2,820,140 posts (352,792 local) and 1,650,759 comments (400,156 local). The DB is 8964 MB.

Podmins, what is the size of your database? Can we try to extrapolate the size of a DB containing the whole network? Would that size be acceptable and if so, should we send every public post to every pod out there?
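
One crude way to start that extrapolation is to turn the two measurements into a storage density and scale it. The sketch below divides database size by the number of post and comment rows, which ignores every other table (people, likes, and so on), so it is only a ballpark. The network-wide row counts are not known here and would have to be supplied, and the two pods' remote content overlaps, so their row counts cannot simply be added together.

```python
def mb_per_million_rows(db_mb, posts, comments):
    """Crude density: database size divided by post and comment rows."""
    return db_mb / (posts + comments) * 1_000_000


def estimate_db_mb(posts, comments, density_mb_per_million):
    """Scale a measured density to a hypothetical network-wide row count."""
    return (posts + comments) / 1_000_000 * density_mb_per_million


# Figures quoted above for the two pods.
diaspora_fr = mb_per_million_rows(5609, 2_363_784, 1_613_855)
framasphere = mb_per_million_rows(8964, 2_820_140, 1_650_759)
```

On these numbers the density comes out at roughly 1,410 MB per million rows for diaspora-fr and roughly 2,005 MB per million rows for framasphere, so even this simple ratio varies noticeably between pods.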

supertux88 · July 10, 2017, 6:38pm

Yes. I don’t know how big the database of joindiaspora or geraspora is, but I’m pretty sure that my pod couldn’t handle such a big database (which is still growing). And I think that my pod can handle more than most pods; there are pods much smaller than mine which already complain about storage usage. Not everybody has a big dedicated root server with many resources.

My cleanups didn’t change anything about posts, only interactions (comments and likes) now need less storage.

The problem is that the database only grows. We would need to remove old remote content; then we would be able to store all public content from now on and remove it again later (it would still be stored on the home pod of the content). So not every pod needs to store all old content. We could then fetch old content from the home pod of the content if needed. But that is a really big and complex problem. (That is not a final solution, that was just my result after thinking about this problem many times … but it’s still not a “that’s easy, let’s do it”.)