One of the reasons that Joindiaspora (and possibly other old pods) is slow and requires lots of resources to run is that there are tons of old, dead accounts in the system. I don’t have the exact numbers in front of me, but even if we only expired accounts that have not signed in in the last three years, that would drastically reduce the database size, which means messages would federate much faster, page loads would be more reliable, and the pod would be more cost-effective to run. It would drastically improve the entire experience for people using the pod, and for any new users who try Diaspora out.
There are two tables specifically which really bog down performance (Person and PostVisibilities), and drive the cost and memory of running a large pod up. I’d like to optimize Diaspora for the community who actively use it, so I’d love this discussion to turn into a plan of action to improve this scenario.
There are a few things that it would be a good idea to expire:
- Local user accounts that have not signed in for a period of time. With those, we could also expire:
  - local person objects
  - all post visibilities from this account (things that they can see)
  - contacts (both sides)
- Data from pods that have gone down. One thing we don’t keep track of is whether a pod goes down. If one does:
  - a) we don’t want to send messages to this pod
  - b) if we have not had contact with it for some period of time, we should expire all data related to said pod (Person, contacts, post visibilities for local users)
- Empty accounts just following dhq
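
To make the first item in the list above concrete, here is a rough sketch of expiring inactive local accounts together with their person objects, post visibilities and contacts (both sides). It is written in Python against an in-memory-style SQLite schema purely for illustration: the table and column names, the `expire_inactive_users` function and the three-year default are all assumptions, not diaspora*'s real schema or code.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, simplified schema for illustration only (not diaspora*'s real tables).
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, last_sign_in_at TEXT);
CREATE TABLE people (id INTEGER PRIMARY KEY, owner_id INTEGER);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, user_id INTEGER, person_id INTEGER);
CREATE TABLE post_visibilities (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER);
"""


def expire_inactive_users(conn, now, max_idle_days=3 * 365):
    """Delete users idle for longer than max_idle_days, plus their related rows.

    Timestamps are assumed to be stored as ISO 8601 strings with second
    precision, so plain string comparison orders them correctly.
    Users whose last_sign_in_at is NULL are left untouched here.
    """
    cutoff = (now - timedelta(days=max_idle_days)).isoformat(timespec="seconds")
    stale_ids = [row[0] for row in conn.execute(
        "SELECT id FROM users WHERE last_sign_in_at < ? ORDER BY id", (cutoff,))]
    for user_id in stale_ids:
        person_ids = [row[0] for row in conn.execute(
            "SELECT id FROM people WHERE owner_id = ?", (user_id,))]
        # Things this account can see.
        conn.execute("DELETE FROM post_visibilities WHERE user_id = ?", (user_id,))
        # Contacts, both sides: the ones the user owns and the ones pointing at them.
        conn.execute("DELETE FROM contacts WHERE user_id = ?", (user_id,))
        for person_id in person_ids:
            conn.execute("DELETE FROM contacts WHERE person_id = ?", (person_id,))
        # Local person object, then the account itself.
        conn.execute("DELETE FROM people WHERE owner_id = ?", (user_id,))
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()
    return stale_ids
```

The same shape would presumably apply to dead pods, keyed on a "last successful contact" timestamp per pod rather than a last sign-in per user.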
Any other ideas?
The goal here is that if we can expire a proper amount of the data, JD.com (and most likely other pods) can have smaller data sets and require fewer resources to run, which makes them more sustainable for the future. I’ve been paying for JD.com out of my own pocket, but it’s starting to become a burden, so I wanted to make sure we found a solution that people find acceptable (and to share that process with others).
I’d love all of your thoughts.
Note: This discussion was imported from Loomio.
I don’t like deleting old accounts while there is no way to migrate accounts and data from one pod to another. I got a new account some weeks ago, but my old account has more than 3000 posts and it wouldn’t be nice to lose all of them.
One interesting approach could be to implement account moving by building an exportable archive that can fully restore an account on any pod. If we had that, one could build a mechanism that automatically builds such an archive for old accounts, uploads it to some location and then deletes the account’s data. When the user signs in again, they could download that archive, or it could even be automatically restored on that pod.
No matter what approach we settle on here, the big question is who is going to implement it.
@maxwellsalzberg you cannot imagine how happy I am to see you opening this discussion. I opened #4183 about that a while ago but unfortunately we saw no progress on this.
IMO we can safely delete:
- Users who never signed up (they were invited but never clicked on the link)
- Pods which didn’t respond for a long time (one year?)
We can also delete users who didn’t sign in for more than (period to define), but they should be sent an email first.
My opinion about this is to propose these actions in the admin panel of the pod:
* Clean pods which didn’t respond for more than [period selectable by the podmin, min 6 months]
* Clean users who didn’t sign in for more than [period selectable by the podmin, min 6 months]
* Mail users who didn’t sign in for more than [period selectable by the podmin]
The mail can be an alert asking them to sign in again, something like “Hey, diaspora* has really improved since the last time. Wanna have a look again? If you’re not going to use our service again, please delete your account to improve our performance.”
Or a mail warning that the account will be deleted: “Hey, you haven’t signed in for more than a year. Your account will be deleted if you don’t log in within 15 days.”
As Ravenbird said…
I have been desperately waiting for more than a year for a tool to move the postings from my old account into my new one (thousands of postings). It would be sad to lose them. I wish I could help work on such a tool, but I have no clue about programming.
Well, maybe we could add a criterion: “accounts that never posted anything”.
Certainly we would not delete any info from people who want their data… all they would have to do is log in to JD.com (or whatever pod) ONCE in the last 3 years (or whatever timeframe we agree on).
This way we could still dogear accounts to be migrated.
In the case of a private pod where there aren’t a lot of users, it is often the size of the participations table that bloats the DB.
It would be handy to have a function to clear participations more than a year old. They are too far down in the activity feed to ever be viewed anyway.
One of the things I’ve noticed is that some of the pods (e.g. Diasporg) give an authentication error even though the login creds are correct. That happened with my first account and has happened approx. 3 times to a friend of mine who eventually gave up on Diaspora because of it.
I would contact some of the account owners somehow. In my case, delete my old account on Diasporg and my account on joindiaspora. I really don’t give a shit.
Confirmed what @aj says re smaller DBs blowing up on the “Likes” etc, it’s completely out of all proportion. Did I see a cleanup script for this somewhere?
@maxwellsalzberg thank you for this initiative - it is absolutely very important to start optimizing the network structure, as it’s getting old enough to suffer from the lack of any optimization.
Unlike commercial providers, the diaspora* network is afaik fully powered by private people from their own pockets. Thus we should absolutely offer tools for podmins to clean data and make sure that they can run their pod on as small a hosting plan as possible. This will not only ensure pods run for longer, but also make it easier for new podmins to start their journey.
My suggestion for user deletions:
- Make two rake jobs. The first rake job would send a warning email to accounts that would be closed.
- The date the warning was sent would be stored in the Users table as a timestamp.
- The second rake job would then delete these accounts where “warning sent timestamp + configurable period < current timestamp”.
Why send a warning email? Two reasons. Firstly, the obvious: to warn the user. In some cases, as indicated by @saschamorr, the user could be interested in keeping the account.
But the second reason is more important IMHO. By sending the warning email, we are also contacting the user and saying “Hey! We’re still here”. From the management work I do with the diaspora* project social media accounts, it seems likely that 90% of those 1 million created accounts probably think diaspora* has died.
Whatever way this is done, it would be great to lessen the burden of running a pod. Optimizing joindiaspora.com is a perfect task to achieve that for other pods too.
Oh and forgot - of course in the two rake jobs scenario, when a user logs in, any “warning sent” timestamp should be cleared - thus removing the account from possible deletions.
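
The two-job scenario, including the reset on login, could look roughly like the sketch below. It is plain Python rather than actual rake tasks, and every name in it (`User`, `send_warnings`, `delete_expired`, `record_login`, the `send_mail` callback) is hypothetical and only there to show the timestamp logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class User:
    email: str
    last_sign_in_at: datetime
    warning_sent_at: Optional[datetime] = None  # set by job 1, cleared on login


def send_warnings(users: List[User], now: datetime, max_idle: timedelta,
                  send_mail: Callable[[str], None]) -> List[User]:
    """Job 1: mail inactive users who have not been warned yet and store the timestamp."""
    warned = []
    for user in users:
        if user.warning_sent_at is None and now - user.last_sign_in_at > max_idle:
            send_mail(user.email)
            user.warning_sent_at = now
            warned.append(user)
    return warned


def delete_expired(users: List[User], now: datetime, grace: timedelta) -> List[User]:
    """Job 2: drop users where warning_sent_at + grace < now; return the survivors."""
    return [u for u in users
            if u.warning_sent_at is None or u.warning_sent_at + grace >= now]


def record_login(user: User, now: datetime) -> None:
    """Any login clears the warning, taking the account out of the deletion pool."""
    user.last_sign_in_at = now
    user.warning_sent_at = None
```

Because job 2 only looks at the stored timestamp, it can be run as often as the podmin likes without risk of deleting someone who was never warned.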
In theory, implementation sounds good.
For jd.com, it is most likely better to run/schedule batches (say 1000 at a time).
Trying to delete all that data in one process will most likely take forever, so it might be an ongoing thing that has to run over the course of many days. Keeping it in chunks that make it easy to stop/start as load increases could be good.
@rich1 you are also right, that is a good catch! That would help as well.
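
As a rough illustration of the chunked approach, the sketch below deletes stale rows in batches and commits after each one, so the run can be stopped between batches and picked up again later. It uses Python with SQLite only for the sake of a runnable example; the `users` table layout, the `delete_in_batches` name and the `should_pause` hook (which could check server load) are assumptions.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000, should_pause=lambda: False):
    """Delete users last seen before `cutoff`, batch_size rows at a time.

    Each batch is committed on its own, so stopping between batches loses
    nothing: the next run simply selects whatever stale rows are left.
    Returns the number of rows deleted in this run.
    """
    total = 0
    while not should_pause():
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM users WHERE last_sign_in_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size))]
        if not ids:
            break  # nothing left to expire
        conn.executemany("DELETE FROM users WHERE id = ?", [(i,) for i in ids])
        conn.commit()
        total += len(ids)
    return total
```

Scheduling this from cron with a `should_pause` that returns true after a fixed number of batches would spread the work over many days, as suggested above.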
What is the best way to go about finding the correct processes for figuring out:
- what to delete?
- how to do it in a repeatable, humane, cost-effective way?
- how to implement it?
@jasonrobinson just opened a PR to send an email to all users.
Well I didn’t have this in mind actually but sure, with small adaptations a similar rake job could send out warnings and flag accounts (assuming that is what we want to do). Will see once I finish this up whether I could do that too.
This sounds really good. I made a suggestion a few months ago, very much along the lines of what @jasonrobinson suggested, to help the biggest pods clean up their user bases and improve performance:
- Podmin chooses time limit since last activity (default two years seems sensible).
- Emails sent out to addresses in database, in batches, giving the users a set period (30 days default?) to log in to their account - also giving them a link to use in case they have forgotten their log in details.
- After the period has elapsed for each batch, delete the accounts from that batch which have been inactive since that batch of emails was sent.
I’ve no idea how to achieve this technically, I’m afraid, but if possible, such a feature would be really useful for the network.
Really, at its simplest, just a bunch of rake jobs would do fine IMHO - later they can be built into the admin UI if needed.
Still haven’t finished the “send email to users” rake job thingy - might have a look after that but will take some time tbh.
I began doing something to remove old users, haven’t tested any of it, just putting together some code.
The idea is to:
- Have a cron job (whenever gem) to send expiry warnings per settings, and to queue actual expirations to sidekiq. To-be-expired users will be flagged as such in the user table too (timestamp when ok to remove).
- Login will check for this timestamp and remove it if it is encountered.
- Sidekiq will process the row and, if the expiration timestamp is still there, it will do the expiration.
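
The point of re-checking the flag in the worker is that a user may log in between the moment the job is queued and the moment it runs. A minimal sketch of that flow follows, using Python's standard `queue` module as a stand-in for sidekiq; the dictionary layout and the function names are hypothetical and not taken from the WIP branch.

```python
import queue
from datetime import datetime


def enqueue_expirations(users, jobs, now):
    """Cron step: queue every user whose remove_after timestamp has passed."""
    for user_id, user in users.items():
        if user["remove_after"] is not None and user["remove_after"] <= now:
            jobs.put(user_id)


def on_login(users, user_id):
    """Login step: clear the flag so a queued job becomes a no-op."""
    users[user_id]["remove_after"] = None


def process_jobs(users, jobs, now):
    """Worker step: expire a user only if the flag is still set when the job runs."""
    removed = []
    while True:
        try:
            user_id = jobs.get_nowait()
        except queue.Empty:
            break
        user = users.get(user_id)
        if user is not None and user["remove_after"] is not None \
                and user["remove_after"] <= now:
            del users[user_id]
            removed.append(user_id)
    return removed
```

In this sketch a user who logs in after being queued is simply skipped by the worker, which matches the "if the timestamp is still there" rule above.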
How does this sound for a basic principle? Also, what exactly would be cleaned? The aim here is to remove bloat from pods (optionally of course). So the removals need to be efficient if the podmin wants, not just little slice here and there.
Just a normal
WIP stuff be here: https://github.com/jaywink/diaspora/compare/remove-old-users
I started working on this because joindiaspora is going super slow with all the activity going on. So input from @maxwellsalzberg is appreciated.
Personally, I would like to see such a feature myself, since a lot of people just register and do nothing with their profiles, either because they do not read the ‘Help’ pages to find new friends, or because they think it’s a FB rip-off and lack the knowledge to understand what D* actually is.
Also it would be a good clean up for older pods that have long forgotten members such as Poddery.