Cached user data breaks federation?

I think I have figured out why my pod has extremely weird problems with federation, but I don’t see how I could apply the proper magic to fix it.

The pod was sleeping for more than a year (much more), then it was heavily upgraded and restarted. Federation occasionally works, but often it doesn’t. I see some posts immediately, some others heavily delayed (20+ days) or missing, and others cannot be forced to be seen. Some foreign servers cannot access local accounts.

I have noticed plenty of 404s in the proxy log, like

"POST /receive/users/0e8377826c716733 HTTP/1.1" 404 0 "-" "DiasporaFederation/0.2.6" 

which suggests to me that the foreign server (diasp.org this time) thinks that grin@diasp.grin.hu has a different user id than it really has (https://diasp.grin.hu/people/1c6d9630daf1013291df1fa34b479473 seems to be a link to the user, but I’m not sure they use the same id).

But anyhow, it looks like the foreign server thinks that the user has an id which doesn’t exist.
I don’t see how I could

  • force a foreign server to forget its cache and retrieve the ids again,
  • make old contacts’ posts get pulled in (and I suspect the same effect may be causing the breakage),
  • generally how to prevent this (apart from “not stopping pod for extended time”),
  • how to actually check and debug federation problems,
  • and to see whether a completely reinstalled pod would absolutely break federation.

…and indeed, webfingering the account returns the second id as well.
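
WebFinger (RFC 7033) is the discovery step referred to here, so fetching the well-known document is a quick way to check which ids a pod reports for an account. A minimal sketch of building the lookup URL; the `webfinger_url` helper is my own illustration, not part of diaspora*:

```ruby
require "uri"

# Build the RFC 7033 WebFinger lookup URL for a diaspora* ID ("user@host").
def webfinger_url(diaspora_id)
  _user, host = diaspora_id.split("@", 2)
  query = URI.encode_www_form(resource: "acct:#{diaspora_id}")
  "https://#{host}/.well-known/webfinger?#{query}"
end

puts webfinger_url("grin@diasp.grin.hu")
# → https://diasp.grin.hu/.well-known/webfinger?resource=acct%3Agrin%40diasp.grin.hu
```

Fetching that URL (with curl or a browser) returns a JSON document whose links point at the profile the pod currently serves for that handle, which you can compare against what a remote pod has cached.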

What do you mean with “heavily upgraded” or “completely reinstalled”? Did you touch the database at all besides the usual db:migrate after updating the installation?

You’ve been there watching IIRC. :slight_smile:

So, “heavily” means it was upgraded through 2 major versions, from 0.5 to 0.7, and it required a few repeatedly restarted migration steps, but I haven’t manually fiddled with the db. There shouldn’t be any reason for the uids to change, unless it’s planned in the migration steps.

I see that the “wrong ids” (from diasp[.]org) are much shorter than the current ids, but I am not familiar with the changed internals, and I’m not even sure they’re in the same id space.

(My question about a complete reinstall was: if anyone completely reinstalled from scratch on the same domain, would all of the user requests run into 404s? But this is just a theoretical question.)

You can’t, at least not the way you probably want to.

That’s no problem, your pod is working as well as it was before you put it to sleep. So if you were happy with how it worked back then, just ignore the brokenness and continue.

As already said, you can’t. It’s not only the IDs which are different, but also your keys, and since you don’t have the matching private keys anymore, it would be a huge security problem if you could take over accounts you don’t own the key for.

Your pod doesn’t receive posts from the time it was down. diaspora* resends posts for a short time (so short downtimes aren’t a problem), but being down for years means that you’re going to miss posts. You may receive some public posts from the time you were down, because your pod fetches them for various reasons, but it will never be complete. However, you should receive all new posts (at least as many as you received before your downtime).

Stopping a pod over a long time isn’t a problem, as long as you keep the same database. It probably takes a few hours until all pods start sending you new posts again (because your pod was marked as down), but after 24h you should be fully up and running again. But don’t mess with the database and don’t re-use usernames, which means …

… don’t reinstall your pod with an empty database (reinstalling and importing a backup from before, for example when you reinstall to migrate to a new server, isn’t a problem).

And while you did everything correctly this time (as far as I can see, because your current account 1c6d9630daf1013291df1fa34b479473 is from at least 2015, so probably before the sleep), you messed up earlier, and that’s why your pod is currently working as well or as bad as before the sleep.

But I found at least two other GUIDs for your diaspora ID on different pods. One you already found yourself (0e8377826c716733, which is also known at least on geraspora), and there is also 88950694b93a2888 on joindiaspora. Both were used back in September 2011, so you reinstalled at least twice since 2011 and dropped the database, but reused the same username on the same domain, which doesn’t work. Your current account never worked, and never will work, with at least diasp.org, geraspora and joindiaspora, and I don’t know if there are more.

So there are now two ways you can “fix” this, but you will never be able to fix grin@diasp.grin.hu. To make federation work with all pods, you need the same account to be known by all pods, and there are at least three different accounts out there for your current diaspora ID. The easier way is to just register a new account on your pod: by using a new username, you have a different diaspora ID which can then be known by all pods again, because it’s not already in use. If you want to keep grin@ at the beginning of your diaspora ID, you can also change the domain part. BUT when you change the domain, you NEED to start with a fresh database, so your new user grin@your-new-domain has a different (not yet used) GUID and key again.


I have started my reply several times, then deleted it, and now I’m trying yet again.

TL;DR: Future podmins should be warned in all relevant places that it is absolutely forbidden to reinstall a running pod from scratch. It will not cause “just” downtime but it will annihilate all old addresses on the pod forever (for all their relevant federation contacts), and it cannot be undone.

I would put it into the podmin FAQ as well as in prominent places in the installation instructions.

I understand.

The next step is the adventurous one: how can a podmin manually invalidate old addresses on the local pod (what are the necessary steps, without screwing up the DB)?
It seems like the only way is to gather all the misbehaving pods and contact their admins. :frowning:

I agree about taking over, but right now we are talking about a podmin requesting deletion of the cached data for addresses local to his/her own pod. Since a podmin can delete any local user account at will anyway (and by that completely mess up federation of that address for all the future of the Universe), it seems reasonable to be able to handle a request from a pod to have all (or selected) cached local ids invalidated by federation partners. It seems rather simple to make them unfederated, then re-request all the key data, which may cause only a small disruption.

I have tried something similar by unfollowing everyone and then re-following them, but to no avail: the id can’t be forced to be updated.

This part doesn’t seem to be particularly emphasized. I would expect people to stick strongly to their usernames, and such problems to be relatively frequent (in the case of ignorant reinstalls).

So I am sorely missing a big flashing red warning about that in the installation document, that’s basically my moral of the story.

Thank you for the really good and detailed description. I may request this to be promoted to a FAQ, without (or with, I don’t particularly care) the details.

Apart from the two options above there’s the third one which I have mentioned:

  • gathering all misfederating pods from the logs (or actually sidekiq)
  • asking a D* dev how to remove the old data reliably
  • convincing the fellow podmin to invalidate the old cached data
  • :+1:

My guess is that all the relevant data is in the people table, so a purge for one person would be

BEGIN;
DELETE FROM people WHERE diaspora_handle='grin@diasp.grin.hu';
-- or, in fact, using the bad GUID:
DELETE FROM people WHERE guid='0e8377826c716733';

for one specific person, or in the general case of reinstall

BEGIN; 
DELETE FROM people 
 WHERE pod_id=(SELECT id FROM pods WHERE host='diasp.grin.hu'); 

which would remove the account and its cached posts from the pod (at least I see it’s set to CASCADE on DELETE, for whatever mysterious future-telling reasons).

What I don’t see is whether it would be valid to do instead

BEGIN;
UPDATE people 
 SET diaspora_handle='grin@diasp.grin.hu.invalid' 
 WHERE diaspora_handle='grin@diasp.grin.hu';
-- or by automagically invalidating using the GUID
UPDATE people 
 SET diaspora_handle=diaspora_handle || '.' || EXTRACT(EPOCH FROM NOW())
 WHERE guid='0e8377826c716733';

which would keep the posts but free the specific handle(s).

:question:

The installation guides contain a giant red box, saying

You have to do backups of your pod data. If you lose your data, you won’t be able to use the combination of your old username and old domain ever again.

which is a pretty clear indication that if you lose your database for whatever reason (either caused by accidental data loss, or by removing it), you’re doomed. We could add even more text to that in future iterations, but eh, it’s there, it’s bold, and it has a red background.

I agree about taking over, but right now we are talking about a podmin requesting deletion of the cached data for addresses local to his/her own pod.

As outlined by @supertux88, if you do not have a copy of the old database, including the old private keys, then that’s it. You cannot change federated data, and you also cannot delete/refresh federated data. Full stop.

Maybe @supertux88 was not clear enough, so let me try again: All interactions on the federation layer are encrypted and signed using public-key cryptography. When alice@alice.com interacts with bob@bob.com for the first time, bob.com will fetch alice@alice.com’s public key and permanently store it inside the database. After that point, bob.com will verify the signature of all incoming federation payloads with the known public key for alice@alice.com. If the signature check fails for any reason, the payload is dropped. No exceptions. There is no “please accept it anyway” flag in the federation layer, and there never will be.
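
To make the mechanism concrete, here is a hypothetical sketch (this is not diaspora*’s actual wire format, and the variable names are mine): a payload signed with a freshly generated key fails verification against the key the remote pod cached on first contact, so it gets dropped.

```ruby
require "openssl"

old_key = OpenSSL::PKey::RSA.new(2048) # key bob.com cached for alice@alice.com on first contact
new_key = OpenSSL::PKey::RSA.new(2048) # key of a reinstalled alice@alice.com

payload   = "some federation entity"
signature = new_key.sign(OpenSSL::Digest.new("SHA256"), payload)

# bob.com verifies against the *cached* public key, not whatever the domain serves today:
cached_public = OpenSSL::PKey::RSA.new(old_key.public_key.to_pem)
puts cached_public.verify(OpenSSL::Digest.new("SHA256"), signature, payload) # false → payload dropped
```

Only someone holding the original private key can produce a signature that passes, which is exactly why a pod without its old database can never make those checks succeed.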

There are multiple reasons for that, but it ultimately boils down to identity verification and protection against identity theft. An account’s private key is the one and only key to that account. If you have the private key, you can use that account. If you do not have the private key, you cannot use the account.

We also do not allow fetching a new key for existing users just because the domain name matches. Domain names are easy to hijack, and even easier to buy. If someone were to buy a domain formerly used by a pod that has been closed down, fetching a new key would enable the new domain owner to impersonate all the users that were once hosted on that pod.

I do not endorse your database queries, nor any other method that asks podmins to delete existing data. And on a personal note, as the podmin of Geraspora, I will not be removing your previous profile or manually re-fetching the key.

Get a new domain or use different usernames.

Now that you actually mentioned it, I had to search for the text, and I found it at the end of the installation document, in the section called backup.

My general visual image of the events is much more like this: the admin actually reads either sequentially (from top to bottom), or reads the relevant (“relevant-looking”) parts and then starts installing. For a new install I wouldn’t see any reason to care about the backup section. (Since the old install was destroyed at that point anyway.)

Accepted that I cannot possibly imagine all possible behaviours; mine was reading the install doc (“relevant-looking parts”) before installing, so I based my suggestion on that.

The imaginary picture in my head is that the beginning of the install doc shows a similar red box that says

"Warning! You cannot create a new install using the same domain and same identifiers as an already federated (and possibly deleted) pod: the cryptographic keys of those identifiers are already cached everywhere where they were federated and cannot be changed, deleted or overwritten due to protection against identity theft. If you try to re-create them from scratch they will not be able to get any content from their old connections, and this restriction cannot be resolved, neither using the UI nor manually!"

(As a sidenote: I will probably simply resolve my particular problem by creating a new automagic puller account which will subscribe to all the federated content, pulling it into the pod, where it will then be available for every local user, full stop, end of story. The whole problem set is created by not being able to automagically get all the public content from other pods, so hashtags are useless.)

Technical question:

While I am not overly familiar with the internals, it seems that there are multiple unique identifiers assigned to a specific source of content: apart from the “email-like id” there is a GUID as well. I would guess that the protocol follows the GUID and not the “email-like id”, since, as it seems, pulls use the GUID and not the “e-l-i”. So it seems that if someone were to re-create "alice@example.com", it would have a new GUID (and also a new key), so the old contacts would not share with her, would not pull content from her and generally would not consider her to be the same person.

In the case of complete deletion of the old "alice@example.com" and its associated GUID and key material, when the new alice appears, her content would not be pulled by old contacts (the GUID doesn’t match, and even if it did, the key wouldn’t), she will not get content from old contacts (same reasons), but she would be able to re-request connections, which would require manual intervention from everyone. Is that right?

Ack.
It is an administrative decision, as far as I see, not a technical one; I am (actually, I was) interested in both.

This is your personal preference and attitude towards problems. I am different, and both have their merits.

In a rewindable alternative universe I would curiously watch what would happen if Geraspora’s DB got lost and you faced a similar problem, but instead of my humble self, you were a member of the core team. But it’s not possible to see that from this point of time in this universe, so this definitely was not a question, and any answer would be hopelessly inapplicable. :slight_smile:

I would try to help my users, in that alternative universe, to the extent possible within the limits of security. And maybe get burnt, who knows? :roll_eyes:

Thanks for taking your time to reply! :+1:

For a first install, there is nothing for a podmin to be warned about. For a second install, well, they have read the instructions before, so they also read the large red warning. If someone decided not to read the backup section on a new install, then clearly they do not care about data loss, so they’d also not read any other warnings we put up. :wink: If we’d put all “important information” at the top, nobody would ever read the installation guide, because it would be prefixed by 20 warnings and red boxes.

No. There is one GUID assigned to one profile. The GUID of your profile is generated by your pod and federated alongside the rest of your profile. A profile always has one and only one GUID. In your case, there are “different GUIDs on different pods”, because you generated a new profile multiple times, but that doesn’t change the fact that these are technically different profiles that just happen to share the same diaspora* ID, which is where the breakage originated. One diaspora* ID belongs to one GUID, and if that rule is broken, something went horribly wrong.
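
The “different GUIDs on different pods” situation can be sketched like this. The generation scheme below is an assumption for illustration only (random hex via `SecureRandom`; the real code may differ); the point is simply that a reinstalled pod creates a technically new profile whose freshly generated GUID cannot match the one other pods already cached for the same diaspora* ID.

```ruby
require "securerandom"

# Assumed-for-illustration GUID generation: a random hex string created by
# the pod at profile creation time.
old_profile_guid = SecureRandom.hex(16) # first install of alice@example.com
new_profile_guid = SecureRandom.hex(16) # generated again after a reinstall

# Same diaspora* ID, but two technically different profiles:
puts old_profile_guid == new_profile_guid # false
```
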

so the old contacts would not share with her, would not pull content from her and generally would not consider her to be the same person.

GUIDs are a) public, and b) not used for account identification or discovery. Because of a), spoofing the GUID would be simple, and b) is a given, because a GUID alone is worthless in a federated system. If you don’t already know the profile, the GUID does not allow you to look up the profile, as the GUID contains no information about the username or the pod. The GUID is just a simpler internal identification mechanism, which we could simply remove and use the diaspora* ID everywhere (there are reasons why we do not do that, but that’s not the point of this discussion).

GUIDs are not an authoritative account identifier, only the diaspora* ID is.

It is an administrative decision, as far as I see, not a technical one; I am (actually, I was) interested in both. […] This is your personal preference and attitude towards problems

No, it is a technical decision, because as we have explained multiple times, there is no way to alter/refresh information without a profile’s private key. Just manually dropping entities from the database will break relations to other tables, and things will break in hilarious ways because of that. There is a reason we ask people not to mess with their database, and there is also a reason why we design interfaces and protocols the way we do.

I do have backups of the database stored in multiple locations, and I have no intentions of dropping the database because I want to reinstall my pod. Besides, your point is even funnier if you consider the actual past of the pod. Because in the beginning, we actually did drop our database twice. Once on purpose, where we changed the domain from gerspora.de to geraspora.de, and again shortly afterwards, after some heavy reconstruction work on diaspora* corrupted some profiles, where I had to face the reality that 9 account handles were effectively burnt (just like yours) because they missed the deadline of a previous backup. I created dummy accounts with all the previously used usernames and permanently blocked them, to ensure that these identities will not get used ever again, because I couldn’t guarantee that communication with those handles would work, and I also could not retract previously federated profiles. Not sure where you’re going with that point, but you kinda failed.

I would try to help my users, in that alternative universe, to the extent possible within the limits of security. And maybe get burnt, who knows?

Saying “you can’t do that” is *precisely* helping you to the extent of what we can do. Even though it would be simple to come up with a few lines of ruby script that would re-fetch your new key and override the old one, that would not be an honest solution, because you can’t fulfill the requirements needed to make that solution reliable, namely:

  1. Knowing which pods have a copy of your profile. This point is straight up impossible because your pod does not keep track of the pods that have queried a copy of your profile, and you can’t even “brute-force check all pods” because there is no complete list of pods.
  2. Getting all those podmins to run that script, which you also will not achieve, given I am already one blocker, and I know at least 2 more podmins of big pods who will not manually re-fetch keys.

Without fulfilling both requirements, you end up in a state where communicating with some pods works and breaks with other pods, and in a couple of months, nobody will know why that breakage is a thing. And even if you remember what’s up, your users won’t. And from their perspective, all they see is an unreliable communication system where you can’t actually be sure that stuff ever works.

We care about building systems that work, and we try to guarantee that to the best of our ability. Providing workarounds that violate the core design of our system, as you ask us to do, is a direct violation of that, so it’s just not going to happen.

One last time, and then I’ll stop to avoid us spinning in endless circles: If you open a Swiss numbered bank account and you somehow end up losing the identification papers, well, congratulations, you lost the money. No matter what fancy passports you bring up, or how loud you yell at the teller, you won’t be getting your money back. diaspora* is a cryptographic system that depends on asymmetric cryptography for encryption and verification. If you end up losing the matching private key, you won’t be getting that identity back, no matter if you can prove domain ownership, and no matter how loud you yell at the project. It’s just not going to happen.

So if you just want to receive public posts, that still works normally, because public posts are federated to /receive/public, which works normally on your pod. What you see failing in your log is private stuff (private posts, private messages, or contact requests) addressed to your old non-existing users. So as mentioned earlier, if you were fine with the brokenness in the past, you can just ignore it and continue as you are. What is broken is:

  • everything private from pods which know your old user
  • everything you want to post to these pods. So you can’t write your own private or public posts to these pods, and you can’t comment on or like public posts from these pods, or on others’ posts, which would then never reach the pods which know your old user.

So if you just want to passively read public posts from other pods, then you’re fine, everything is working normally. If you want to interact or write your own stuff, you should probably create a new user (either with a different username, or with a fresh domain and a clean database for the new domain).
