A script to back up a pod's data


I’ve written a simple script that creates a tar archive of a pod’s data: a DB dump, the public/uploads folder and the config/diaspora.yml file.
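For readers who want to see the shape of such a script, here is a minimal sketch of the same idea. It is not the script from the wiki or the gist. It assumes that it is run from the diaspora* root directory, that the PostgreSQL database is called diaspora_production (an assumption; check your config/database.yml), and that pg_dump can connect without a password prompt (for example via ~/.pgpass).

```shell
#!/usr/bin/env bash
# Minimal backup sketch (not the wiki script). Assumptions: run from the
# diaspora* root; database name defaults to diaspora_production; pg_dump
# authenticates without prompting (e.g. via ~/.pgpass).
set -euo pipefail

# Writes <dest_dir>/backup-<timestamp>.tar.gz containing a plain SQL dump,
# public/uploads and config/diaspora.yml, then prints the archive path.
backup_pod() {
  local dest_dir="$1"
  local db_name="${2:-diaspora_production}"
  local pod_dir="$PWD"
  local stamp workdir archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  workdir="$(mktemp -d)"
  archive="$dest_dir/backup-$stamp.tar.gz"

  # Dump the database into a temporary directory
  pg_dump "$db_name" > "$workdir/db.sql"

  # -C changes directory before the names that follow it, so all paths
  # inside the archive stay relative (db.sql, public/..., config/...)
  tar -czf "$archive" \
    -C "$workdir" db.sql \
    -C "$pod_dir" public/uploads config/diaspora.yml

  rm -rf "$workdir"
  echo "$archive"
}

# Usage (from the pod root; destination is just an example):
#   backup_pod /var/backups/diaspora
```

A plain SQL dump like this is restored into an empty database with psql, for example psql -d diaspora_production -f db.sql.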

I think it wouldn’t hurt to include it in the diaspora* source repository (e.g. in the script folder), so that podmins can use it and don’t have to write their own backup scripts. Also, if it is in the repo, diaspora* maintainers and contributors can make sure that the script contains nothing harmful and no dramatic mistakes (like rm -rf /), and that it stays up to date with the current diaspora* version.

For now, the script supports only PostgreSQL, but it can easily be extended to support MySQL as well.
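The MySQL extension could be as small as choosing the dump command by database type. This is a sketch: passing the adapter name in explicitly is an assumption (the real setting lives in config/database.yml, which this sketch does not parse), and both mysql and mysql2 are accepted as names for MySQL.

```shell
#!/usr/bin/env bash
# Sketch: pick the dump tool from the database type. The adapter value is
# passed in by the caller (hypothetical); this does not parse database.yml.

# Writes an SQL dump of database <db_name> to stdout.
dump_database() {
  local adapter="$1" db_name="$2"
  case "$adapter" in
    postgresql)
      pg_dump "$db_name" ;;
    mysql|mysql2)
      # --single-transaction takes a consistent snapshot of InnoDB tables
      # without locking them for the duration of the dump
      mysqldump --single-transaction "$db_name" ;;
    *)
      echo "unsupported database adapter: $adapter" >&2
      return 1 ;;
  esac
}

# Usage (DB_ADAPTER is a hypothetical variable set by the podmin):
#   dump_database "${DB_ADAPTER:-postgresql}" diaspora_production > db.sql
```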

If this post gets enough approval, I’ll create a PR on diaspora*.

I’m not sure whether we should put it in the repo or just add it as an example to the wiki (like the init/systemd stuff and the reverse-proxy examples). I think the wiki fits better (and we could put more tips there, e.g. on uploading the backup somewhere). But it’s cool to have an example script. :thumbsup:

Do you think it’s okay to just embed the script code in a wiki page then?

No objections to the wiki. Maybe we should create a backup page, like this one for startup methods, where we can collect things about backups (and also describe what should be backed up). You can also link to the gist if you like.

Yes, I can create this page. I think I should either embed the script in the page or link to the gist, not both (to avoid duplication).

I can also describe how to use Amazon Glacier for backups, since that is what I’m going to set up.

Yes, not both.

Sounds interesting :thumbsup:

I have created a page in the diaspora* wiki about backing up pod data.

What I don’t like is that anyone can change the script on the page. That enables attacks in which parts of the script are replaced with malicious code, in the hope that some podmins will run it on their pods without checking the contents.

Is it possible to require pre-moderation of every change to this page? Or at least of the section containing the code?


Thanks for creating this page. :thumbsup:

I edited the beginning of the script: you can do cd $(dirname $0)/.. directly; there’s no need to call ruby for that.

That’s how a wiki works: everybody can improve it, and I think that’s a good thing. There can be a collection of different scripts that do things differently, and people can then use whichever fits their needs best.

That’s true for every page; one could also put an rm -rf / into the installation or upgrade guides. But people (especially admins) should think and understand before running anything on their server. I know there are many who don’t, but those are the kind of people who only read half of it, execute the other half, and when something doesn’t work, google half of the error message and then execute something from somewhere on the internet … So they’ll break their server anyway :wink:

I don’t know what’s possible and what isn’t; I think @waithamai knows that better. But I know that she monitors every change in the wiki, so nobody can edit anything without the dragon noticing :wink: So I think that’s fine.

Hm, I copied this snippet from script/server.

I just checked: @jhass made a similar change to that script in 2011, but it was then changed back to support OS X.

If we still support OS X, it’s perhaps better to keep the script in a form that works everywhere. But if we can do the same thing without ruby and still support OS X, then I guess we should change script/server so that it doesn’t call ruby either.

Maybe we should add a note to this page, and to other pages that contain scripts, saying something like “check the script’s contents before running it”?

I also use $(dirname $0) at work, where we have people on OS X as well, and nobody has complained that it doesn’t work. I see that the reason for the revert was an additional readlink -e (which doesn’t work on OS X), but I don’t know why that is needed: everywhere I use $(dirname $0) I have no problems, and it works on OS X too.
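A portable way to get the pod root without ruby and without the GNU-only readlink -e could look like this. It is a sketch assuming the script lives one directory below the pod root (e.g. in script/). Unlike readlink -e, it does not follow a symlink that points at the script file itself; pwd -P only resolves symlinked directories. The quotes keep paths containing spaces intact, which the unquoted $(dirname $0) form does not.

```shell
#!/usr/bin/env bash
# Print the absolute path of the directory one level above the given script
# path, without readlink -e (GNU-only) and without calling ruby.
# pwd -P resolves symlinked directories, but not a symlink to the script file.
pod_root() {
  (cd "$(dirname "$1")/.." && pwd -P)
}

# Typical use at the top of a script stored in script/:
#   cd "$(pod_root "$0")"
```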

People could also remove such a note if they want, so I don’t know … But let’s see what @waithamai thinks about that problem; I think she knows best what you could/should do and what not.

So… I don’t know anything about the script, I only know the wiki; so I’ll ignore the rest here :wink:

Yes, it’s possible: the page could be protected so that only wiki admins can edit it (but it’s not possible to protect only some parts of a page). But I’m not sure that’s useful here. It’s a wiki, and as long as there’s no need to protect pages, they should be kept open for editing, imho.
The page history is visible to everyone, so if somebody finds the script suspicious they can always check who wrote the latest version and what they changed. (And I guess admins should at least skim through such scripts anyway, before running them on their servers(?) - those who don’t do this will also ignore any notes on the wiki page.)

As for the wiki in general: it has been running in moderated mode since we got lots of spam users. That means every new user’s edits have to be reviewed before they become visible to other visitors. Only edits made by older users and by users who have already been approved (= who made meaningful edits before) are visible right after editing.

I usually check the wiki’s moderation queue and all recent changes on a daily basis (or at least every few days, if I’m not at home), so I’ll notice vandalism on any page very quickly and can revert it. :slight_smile:
I think we can leave the page open for editing, as long as we don’t get malicious edits there. :slight_smile:


@waithamai, thanks for clarifying, I agree with your points.

About the script:

The current recommendation is to back up the public/uploads folder. It has images, tmp and users subfolders. Is it actually useful to back up users and tmp? tmp is normally empty, and users holds the users’ data export archives, which are redundant if we make a database backup. Maybe we should just pack public/uploads/images then?

If you don’t back up public/uploads/users, all download links in the user settings are broken after a restore, because the database still has the old filenames. While that is not a real problem (because all exported data is still there), it can be weird if a user wants to download a file a second time.
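One middle ground: always skip tmp (normally empty) and keep users by default, so export download links keep working after a restore. This is a sketch assuming GNU tar, run from the pod root; the include_exports switch is a hypothetical parameter, not part of the wiki script.

```shell
#!/usr/bin/env bash
# Pack public/uploads into <archive>, always skipping tmp. Pass "no" as the
# second argument to also skip users (the data export archives).
pack_uploads() {
  local archive="$1" include_exports="${2:-yes}"
  # tmp is normally empty, so it is always skipped
  local -a excludes=(--exclude=public/uploads/tmp)
  if [ "$include_exports" = "no" ]; then
    # Smaller archives, but export download links break after a restore
    excludes+=(--exclude=public/uploads/users)
  fi
  # GNU tar applies exclusion options to the paths that follow them
  tar -czf "$archive" "${excludes[@]}" public/uploads
}
```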

I do incremental backups and a full backup once a month, so the space for this folder is only used once per month, and I have no problem backing it up. But when you add it to an archive as your script does, this content is in every backup, which makes the backups significantly bigger.

But I think it’s up to the podmins whether they want to back it up and how; different methods have different advantages and disadvantages.

(Also, I just noticed that we should probably clean up this folder, because I have many exports for the same user in it.)

How do you make incremental backups? With rsync? Is there a way to back up a database incrementally? Or is it just for the other files, and database backups are always full?

The database backup is always full, but the other files are backed up incrementally (I use rsync and duplicity).
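For the rsync part, a sketch of copying the uploads to a backup location could look like this. It assumes rsync is installed on both ends and is run from the pod root; the target (a local path or user@host:/path) is hypothetical. This only mirrors the current state; keeping older versions is what a tool like duplicity adds on top.

```shell
#!/usr/bin/env bash
# Copy public/uploads to a backup target with rsync. Run from the pod root.
# The target is hypothetical: a local path or user@host:/path.
mirror_uploads() {
  local target="$1"
  # -a keeps permissions and timestamps; on later runs rsync skips files
  # that are unchanged. Without --delete, files removed from the pod are
  # kept in the backup.
  rsync -a public/uploads/ "$target/uploads/"
}

# Usage: mirror_uploads backup@backup.example.org:/srv/pod-backup
```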

Okay, then I think I can achieve the same effect by using tar’s --listed-incremental option.
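A sketch of the monthly-full, otherwise-incremental scheme with tar’s --listed-incremental, assuming GNU tar (the option is a GNU extension) and that it is run from the pod root. Using one snapshot file per month is a design choice of this sketch: a new month has no snapshot yet, so tar writes a full (level-0) archive, and later runs that month store only new or changed files.

```shell
#!/usr/bin/env bash
# Requires GNU tar. Writes uploads-<month>-full-*.tar.gz on the first run of
# a month and uploads-<month>-incr-*.tar.gz afterwards; prints the path.
incremental_uploads() {
  local dest_dir="$1" month="${2:-$(date +%Y-%m)}"
  # Snapshot file that tar uses to remember what it has already archived
  local snapshot="$dest_dir/uploads-$month.snar"
  local kind=full
  if [ -f "$snapshot" ]; then kind=incr; fi
  local archive="$dest_dir/uploads-$month-$kind-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" --listed-incremental="$snapshot" public/uploads
  echo "$archive"
}
```

To restore, extract the month’s full archive and then each incremental archive in order, passing --listed-incremental=/dev/null to tar when extracting, as the GNU tar manual describes.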

Pod databases are large, so incremental DB backups could make sense, but they make everything much more complicated. For now, I’ll follow the same approach as you.

@comradesenya Sorry for coming in late to this, but I’m curious about using Amazon Glacier for backups. What’s involved in getting that set up?

Senya created this page in the wiki, but at the moment it only covers PostgreSQL. In the end, though, you just need to back up everything you would need to move your pod to a new server (database, uploads and configs), so for MySQL just use mysqldump and then push everything to Glacier (I have no experience with Glacier, but I’m sure you can figure out how to upload stuff there yourself :slight_smile: ).
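One way to push an archive to Glacier is through S3 with the GLACIER storage class, using the AWS CLI. Note that this is S3’s Glacier storage class, not the separate Glacier vault API. The sketch assumes the AWS CLI is installed and configured (aws configure) and that the bucket already exists; the bucket name and the pod-backups/ prefix are hypothetical. Objects in this storage class have to be restored before they can be downloaded, which is the slow-restore trade-off.

```shell
#!/usr/bin/env bash
# Upload one backup archive to S3 using the Glacier storage class.
# Assumptions: AWS CLI configured; <bucket> exists (hypothetical name);
# the pod-backups/ key prefix is an arbitrary choice.
upload_to_glacier() {
  local archive="$1" bucket="$2"
  aws s3 cp "$archive" "s3://$bucket/pod-backups/$(basename "$archive")" \
    --storage-class GLACIER
}

# Usage: upload_to_glacier /var/backups/diaspora/backup-20170101-030000.tar.gz my-pod-backups
```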

Maybe there is also a way to do incremental backups of the uploads, so you don’t need to back up all uploads every time, only the new files. For the DB you need a complete dump every time anyway.

Thanks, @supertux88! What alternatives to Glacier are there? On the old server, 1and1 provided a backup service, but they only offer it for data centers in the US.

There are a lot of other solutions. In the end you only need storage somewhere: that can be Glacier (or S3, to mention another Amazon solution), but it can also be another server you own, something like Google Drive, or just storage like this (there are a lot of such storage solutions from different providers; this is just the first that came to mind). They have different prices and features (for example, Glacier is cheap but slow to restore from), so you need to choose what is best for you :slight_smile: