Purging old federated content

Having run a pod for a fairly long time, and noticing load times taking progressively longer as the database gets progressively larger, I’m looking for ways to improve performance without hacking the EvilQuery that produces the stream. (I prototyped a hack that ignored posts older than a certain age regardless of origin and it worked very well, but I haven’t taken the time to repair all the tests that broke as a result.)

One thing I’m targeting are old posts that federated to my pod. Does anyone have a query that I could run against my pod’s database that would delete posts older than a certain age that didn’t originate from my pod (so my users wouldn’t lose any of their data)? Any pointers would be welcome; I’m reasonably good at hacking SQL, but I haven’t dived deeply into understanding the diaspora schema.

Could it be as simple as (in my case):

delete from posts inner join people on posts.author_id=people.id where people.diaspora_handle not like '%diaspora.koehn.com' and posts.created_at < '2018-10-27';

That would remove three million federated posts over a year old from my pod’s database, without touching any posts originating from users on my pod. I’m doubtful anyone is looking at content that old, so collateral damage would be minimal.

To detect local users you should check if the owner_id is not null in the people table (that’s how diaspora does it, it’s much easier than comparing the diaspora_handle). But doing this directly in the database is a really bad idea, because it creates a lot of inconsistent data (for example it wouldn’t delete comments for a post, which then can lead to all sort of problems with future migrations). You should always use the rails console to do this and destroy the posts the proper way.

When you just delete all posts that would also destroy all limited posts which were shared with your users and then they can’t see them anymore and this can’t be restored (while public posts could be fetched again in theory). So that’s something you maybe want to think about.

OK, I’m learning Rails queries now. Would this be correct?

Post.where("posts.created_at < ?", Date.today - 365.days).where(public: true).includes(:author).where(people: {owner_id: nil}).destroy_all

I was thinking that one painless way to delete old content would be to delete limited posts shared only with accounts on your pod that are now closed. Whether it’s feasible to write a query to do this, or how much content it would actually remove, I don’t know; but it could be a way to delete some content that no one would every be able to view again in any case.

This seems to work well:

Post.where("posts.created_at < ?", Date.today - 365.days).where(public: true).includes(:author).where(people: {owner_id: nil}).in_batches(of: 100).destroy_all
2 Likes

Note to future me: to get to the Rails Console:

RAILS_ENV=production bin/bundle exec rails console