Any appetite for pod deployment on docker-compose and/or AWS?

I’ve recently found out about diaspora*. I think it’s a brilliant idea and I would like to start using it, host a pod, and contribute. Looking at the deployment options, I thought it’d be a good idea to provide a way to run a pod locally using docker-compose (although I believe there’s already something in the main repo), and also to be able to host a pod in AWS for higher availability and redundancy. I’ve not put much thought into it yet, but the plan would still be to keep it simple and easy to deploy, probably involving Fargate, RDS, ACM, and Route53, deployed via CloudFormation or Terraform.

I am an experienced DevOps Engineer and could start working on this in my free time and deliver something fairly quickly, but before I start I would like to know if there’s an appetite for it, or if everything I’ve just mentioned has already been considered or even done :slight_smile: If it has, I’d also like to contribute if help is needed.

Thanks!

I’m in the same boat, except I’m looking to host from my personal server rather than running it out on AWS. If you’re looking for a docker-compose file, @kohen has been building Docker images for diaspora* for over 5 years and has published a docker-compose.yml file on GitLab. https://gitlab.koehn.com/docker/diaspora/-/tree/master/compose

I’m digging through it myself atm and am a little overwhelmed by the plethora of configuration options. Good luck with AWS though; I’m using ECS for another project. I think you may run into issues trying to use RDS, since this project uses Postgres & Redis AFAIK (just going off the docker-compose file).

We do provide a Docker setup for running a development installation, which you can access via script/diaspora-dev. Run it without any parameters, or run script/diaspora-dev help, to get details.

We have discussed a Docker-based deployment for production setups many many times in the past, and we decided not to do it. Not because we don’t have the time or because we’re not interested in making production deployments easier, but because there are many unresolved issues with such a setup that cause those Docker-based installs to be significantly more complex to maintain, to a point where we’re not comfortable offering that.

You’ll find a couple of notes here, and I’m sure you can dig out the other discussions if you’re interested. Ultimately, it boils down to significant issues around things like how to handle manual upgrade steps during major diaspora* upgrades, and how to work with major upgrades for PostgreSQL, which are still a huge pain in the butt and which nobody has cared enough to fix in the last 6 years. There are also a lot of security implications attached to running Docker that most podmins are most likely not aware of: for example, Docker bypasses software firewalls (like iptables), so with improper configuration, running Docker can expose other services running on the same server.

I’d love to be able to offer a quick and easy way for podmins to set up their production pods, but we’d rather not hand them a giant footgun. There are a couple of community-driven Dockerfiles (and compose files) around on the internet. Feel free to use them at your own risk and responsibility. Keep in mind that those might be outdated, both in terms of setting the pod up and in terms of the included configuration files/examples. Unless we magically find a way to resolve all the issues we previously identified, the chances of us officially supporting Docker-based production installs are low.

Also, on a more personal note: quite frankly, I also don’t want to offer any official help with autodeploying pods to AWS, GCP, or Azure, because having a large number of pods on those cloud service providers would reduce the whole decentralization idea ad absurdum. :slight_smile:

Thanks for the responses :slight_smile: To be honest, and replying to your last comment, the idea of centralising it on a public cloud provider does seem to defeat the purpose… I thought about it straight after I created this topic :smiley:

@cleanshooter thanks for your comment, I believe PostgreSQL on RDS and Redis on ElastiCache would definitely be possibilities, but I’m now less inclined to go down the public cloud provider route.

I will have a read through those links you sent me. I worked with PostgreSQL for a few years, so your comments don’t really surprise me :slight_smile: Regarding iptables, Docker does indeed add FORWARD rules on its filter chain that open published ports to the entire world, but AFAIK you can stop Docker from messing with iptables and manage the rules manually, or bind published ports to specific IPs (see the sketch below). Anyway, thanks for the response. I’ll play around with Docker for a bit and see where that leads me.
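
For anyone reading along, a minimal sketch of what I mean by binding published ports to specific IPs (the service name, image, and port below are placeholders, not anything official): publishing on 127.0.0.1 means only a reverse proxy on the same host can reach the container. Alternatively, setting "iptables": false in /etc/docker/daemon.json stops Docker from touching the firewall entirely, at the cost of managing the rules yourself.

```yaml
# Hypothetical docker-compose.yml fragment; names and ports are illustrative only.
services:
  app:
    image: example/diaspora:latest   # placeholder image, not an official one
    ports:
      # Bind the published port to loopback so Docker's FORWARD rule doesn't
      # expose it on the public interface; only a reverse proxy on this host
      # (or another container on the same network) can reach it.
      - "127.0.0.1:3000:3000"
```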

Correct, and that’s great if people who know what they are doing want to use Docker to deploy their things. I can, with absolute confidence, say that most podmins, however, have no idea about the security implications of running Docker in production on a node with a public IP, and it’s not possible for us to do this kind of education. I’ve seen too many horror stories from inexperienced server admins who just followed the Docker installation guide and then entered docker-compose up to get something running, lacking deeper understanding. Most of these stories end in something going horribly wrong at some point.

I don’t even blame those server admins. Docker, and most people working with Docker on a regular basis, are used to setups where the node does not have a public IP but instead gets its traffic routed in via a load balancer. That’s great, and I believe Docker is a great tool for certain scenarios (I use Docker for large parts of everything I host), but for diaspora*, the usual use case is someone with a single VPS at some hosting company and little to no experience with hosting things.

To add more context: this may sound incredibly patronizing, and I understand the “well, let people shoot themselves in the foot if they want to” argument, but we usually try to make decisions that result in people having a more reliable experience. Hosting diaspora* is something fairly critical: not only are you dealing with people’s private information, but you can also make your life seriously harder if things go wrong. For example, if you run the database inside a Docker volume and you somehow destroy that volume, you’ll no longer have the required encryption/signing keys for your users, and you and your users will never be able to use the same username/domain combination ever again. Accidentally dropping Docker volumes is surprisingly easy (and doing proper backups is surprisingly hard!), and our experience shows that just having people install things on their servers without leveraging containers is a lot more robust, even though it is a bit of a pain to set up initially. :slight_smile:

But yeah, if you have ideas on how we can get around those issues, please do let us know.

Pinging @koehn (the ping above to @kohen was incorrect), who is probably the most experienced at packaging Diaspora in docker containers.

Hey there, please let me know if there are particular questions I can answer; I run Diaspora/Postgres/Redis inside Docker-Compose, as has been mentioned. RDS/Postgres should be fine, and AWS Redis should be fine as well, leaving just Diaspora running inside e.g., EC2, EKS, etc.

Wow, a couple of things to unpack here… first off, sorry @koehn, I didn’t realize you were on here and that my original ping wasn’t linked properly :sheep:. I do have some recommendations and requests for help for your docker-compose project. As someone new to diaspora*, I don’t truly understand 95% of the diaspora.yml config options, so I left the majority of them at their defaults to get my pod running. I have been running into some issues with it: the container needed a manual restart after the first run, it couldn’t connect to the Redis DB, and now pages load but the .js and .css files don’t; I can’t tell whether they aren’t being served by the pod at all or just not at the path the pages are looking for them. Take a look here: diaspora.joemotacek.com

I’m using a Traefik container for my reverse proxy needs, but I’m wondering if you’ve seen this issue before… googling has yet to provide any insight.

Ok… now for some of @denschub’s comments. It sounds like there are some architectural issues around smooth updates that will prevent dockerization for the foreseeable future. I would caution against swearing off Docker over these kinds of issues, for the simple fact that pursuing dockerization of the application will expose/highlight/identify those issues, get them into the issue log, and in the end make the app more stable. Having a stable/predictable upgrade methodology is crucial for an application’s longevity. (I’m sure I’m preaching to the choir here…)

As for your thoughts against having pods running on cloud services… I fail to truly understand how that goes against the goal of decentralization. Much like with other decentralized application frameworks, as long as the person hosting it is willing to pay for the uptime on those services, ultimately what’s the difference? Granted, I’ll be hosting this on my personal server, but if I were a broke college student with an AWS account and enough Docker knowledge to boot this up to get myself off Facebook, I’d use a cloud service in a heartbeat.

Dennis, as to your comment, “I can, with absolute confidence, say that most podmins, however, have no idea about the security implications of running Docker in production on a node with a public IP, and it’s not possible for us to do this kind of education.” There are a couple of issues with this concern. First, there are “security implications” when running any kind of server… if your app is truly going to be decentralized, you’ve already ceded that responsibility to any admin who wants to run this. Whether it’s Docker or plain old Debian, if admins don’t know what they’re doing, they can face having their server compromised. Secondly, I don’t understand why you think it’s your responsibility to educate people on the security portion of standing up a Docker setup… do you provide this type of education for standing up the other types of production environments? I didn’t see any in the Ubuntu Xenial guide…

Your concerns in regards to security and Docker inexperience seem more like excuses to avoid pursuing dockerization than legitimate concerns. Also, your assumption that most people working with Docker aren’t exposing it on a public IP is not accurate in my experience.

The biggest argument I can make for pursuing dockerization of the project is that it will simplify the process for people, and Docker is becoming more widely used all the time; not only that, but Kubernetes support is yet another reason to containerize. When you have limited resources, you need to make sure you get the most out of the resources you are deploying… dedicating an entire server or VPS to running this would be costly, and since you’re installing Ruby, Redis, and Postgres directly on the host system, you basically end up dedicating the entire box to one thing. If you have another Ruby app you want to run, you might run into version problems or conflicts. Containerizing allows us to host multiple applications from one server with less fear of conflicts or version issues between applications.

Yes, docker-compose down can destroy data if the docker-compose.yml is not built correctly (e.g. by not using named volumes), which is why having a supported version would be a good idea. As for backups, once you’re using a volume, backing up a dockerized database is no different from backing up one running on the host… you just have to back up the database files; some people use cron jobs, others manually copy them once in a while… there are a multitude of tools one can use to back up files, and backing up your Postgres DB, whether it’s installed on the host or its files are mounted from a Docker volume, shouldn’t be that different (sketch below).
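
To make the volume point concrete, here’s a minimal sketch (the service, image, and volume names are mine, not from any official file). With a named top-level volume, docker-compose down leaves the data in place unless you explicitly pass -v, and a logical dump can still be taken from the host on a schedule, e.g. with docker-compose exec -T db pg_dump.

```yaml
# Hypothetical compose fragment; names are illustrative only.
services:
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: diaspora_production   # placeholder database name
    volumes:
      - pgdata:/var/lib/postgresql/data  # named volume: survives `docker-compose down`
volumes:
  pgdata: {}                             # only removed with an explicit `docker-compose down -v`
```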

At the end of the day I would urge you to reconsider your conclusion to forgo Dockerizing the project. Containerization will only become more prevalent.

That’s because those files are generated by diaspora*, placed inside the public directory, and not served by diaspora* itself; they have to be served by the reverse proxy, and our supported configs do precisely that. You have to find a way to share those assets between the app server and whatever serves the frontend. This is important anyway, as user uploads are stored in the same directory.

We provide example configs for Apache and Nginx in our installation guides. If you’re familiar with Traefik, I’m sure you can adapt those.
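
To illustrate the general pattern (a rough sketch, not a supported config; the service, image, and volume names are made up): put public/ on a shared volume, let a small static-file container serve it, and have Traefik route /assets and /uploads to that container and everything else to the app server, mirroring what the supported Apache/Nginx configs do.

```yaml
# Rough, unsupported sketch of sharing diaspora*'s public/ directory; all names are placeholders.
services:
  diaspora:
    image: example/diaspora:latest          # placeholder image
    volumes:
      - diaspora-public:/diaspora/public    # app writes compiled assets and user uploads here
  static:
    image: nginx:stable
    volumes:
      - diaspora-public:/usr/share/nginx/html:ro   # same volume, served read-only by nginx
volumes:
  diaspora-public: {}
```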

With all due respect, I don’t think you know what you are talking about. Pretty much all server applications have manual steps to perform during major version upgrades at some point, regardless of how amazing your architecture is. This could be because you remove legacy modules from your code, you’re doing a large refactor that needs manual adjustments from people running the service (for example in config files), or just running some long-running migration that can’t be done in the usual startup loop because it could take a significant amount of time.

All large projects have these things, and quite frequently, you can’t automate them away, because using crystal balls to make decisions is quite unreliable. Some projects build migration containers, others spend a lot of time building interactive upgrade scripts, but it’s never as simple as just pointing your compose file to :latest and running docker-compose pull && docker-compose up. That’s not how it’s meant to be, and these are not “architectural issues”. I understand that someone who joined this project apparently two days ago doesn’t quite have the insights to make accurate judgments about what the project and the architecture look like - but I see no reason why anybody should just jump into a project’s discussions and immediately claim there is some kind of bad architecture present. That’s kinda annoying, to be honest.

The goal of building a decentralized social network is to enable people to have more control over their nodes and to reduce the amount of single-points-of-failure in a network. Imagine if a third of all diaspora* nodes was hosted on AWS, and imagine Amazon suddenly having a large-scale outage (happened before, will happen again). A third of the network would suddenly be down. Decentralization means … moving away from centralized infrastructure pieces. AWS is quite the opposite.

If you were a broke college student, you hopefully wouldn’t run things on AWS, because as a broke college student, you wouldn’t enjoy burning money. Instead, you would buy a cheap VPS somewhere, which wouldn’t be as reliable as AWS, but probably an order of magnitude cheaper. :slight_smile:

That’s not a hypothesis, but fact: the project team runs zero diaspora* nodes for anyone but the official team account and we have no intentions of doing so.

Actually, yes, we do. For any piece of software we ask the podmins to install, we make sure that our guides result in an environment that isn’t worse than before. For database servers, we make sure to point to documentation that correctly explains how to set those up so that they’re not exposed without proper protection. We make sure that the Ruby version is updated. We make sure that our distribution doesn’t contain any security vulnerabilities. We make sure that the configs we provide (including the nginx config, for example) match best practices, for example by disabling insecure TLS ciphers. Our default production setup only listens on a local Unix domain socket, not a public port, and the only way to reach the service is via a reverse proxy that we provide a well-known configuration for. If you follow our installation guides, we can be reasonably sure that you don’t open holes in your system by doing what we say. You’re right that we can’t help people with setting up the server itself and things like adequately designed SSH authentication, fail2ban, firewalls if needed, … but we can make sure that the things we ask podmins to do don’t make things worse. Quite frankly, I think we should, because otherwise, only people with a perfect understanding of Ruby on Rails applications could be allowed to run a diaspora* pod.

There is a high level of trust that podmins have to place in our hands - especially people who may only be used to running PHP-based applications for example. Trust is something you don’t play with if you want to be taken seriously.

Docker, on the other hand, would be a different beast. We’d have to tell people to install Docker, and there is no short “how to make your Docker setup not suck” documentation anywhere. We’re also not going to create those documents ourselves, because that’s more effort (both to initially create and to maintain) than we can justify.

We’ve outlined the issues we have with a Docker-based production setup. So far, we have heard from a small number of people who are actually interested in this (the majority of podmins are perfectly fine with our current setup, by the way), and an even smaller number of people who are very vocal about how we “need” to add Docker containers. Those vocal people raise one very valid point: making the setup very easy for people. But nobody has yet addressed a single point on our list with more than a “just look away, that’s just a small issue”. I find that quite concerning.

That’s a slightly weird claim, given that we already went through the trouble of building a Docker-based development setup. The development setup is, unarguably, significantly more complex to build than a production setup would be, as we not only have to set up all the required components, but we also have to make sure that developers can edit the code outside the container and that things like live code reloading work. Our already existing Dockerfile for diaspora* and the compose file would work just fine in a production environment, and are technically ready except for a reverse proxy - we’d just have to strip out a couple of things we put in place to make development setups less painful, like the code sharing between host and container and the 550 lines of cross-platform bash script that make it easier to run things.

It would also be significantly easier to just build a production setup based on our current configs to shut everyone up, instead of spending hours on end discussing these things. That would get us some love from people like you and a nice “look how easy it is to set up diaspora*” blog post. Believe it or not, there are actually very good reasons why we don’t take the easy way.

Again, that’s simply not true. We set up RVM to resolve exactly that - we are in control of the Ruby version and the Gemset in use, and the whole system is designed not to conflict with whatever else is running on your server. The only thing that could possibly conflict is if you’re running another application that uses Redis - but we added a very explicit note about that to the installation guides, and we provide an easy-to-access way to change the Redis database in our config file. Even our nginx and Apache config examples are designed not to conflict with other things running on the server.
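
Roughly, that Redis note boils down to a one-line change in config/diaspora.yml (see diaspora.yml.example for the exact key path and surrounding comments):

```yaml
# Sketch of the relevant diaspora.yml fragment; verify the key path against diaspora.yml.example.
configuration:
  environment:
    # Point diaspora* at a non-default Redis database (here: database 1) so it
    # doesn't collide with another application using the same Redis server.
    redis: "redis://localhost:6379/1"
```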

In my earlier post, I made a very open invitation to @danielgblanco. They claimed to be interested in looking into these things, and I invited them to let us know if they figured out solutions to the issues we have.

This is an invitation to everyone. If you think you have ideas on how we can provide people with a setup that results in the same maintainability and upgradeability, and doesn’t open new potential security issues, I can guarantee you that we’ll be more than happy to work with you on getting this shipped to production. What absolutely doesn’t help, though, is another “you need to support Docker because Docker is cool” kind of post. We’ve had enough of those.

@denschub I’m sorry you took my post that way… it seems my arguments to encourage you to reconsider your stance on dockerization only caused you to entrench further. First, I want to apologize if my assumptions were taken the wrong way. When you explained the issues and I read the other thread, it sounded like you were trying to explain that there were architectural issues… I was simply restating what I thought you had explained. My misunderstanding.

I understand you believe that upgrading an app shouldn’t be as easy as docker-compose pull && docker-compose up, but I’m trying to tell you it really should be… that should be a project goal, not something you avoid because it’s hard. Granted, I’m no Ruby engineer, but I’ve worked on enough projects to know that these days you really can make it that easy, and it’s a goal worth pursuing on any project… if I knew more about Ruby I’d offer to help… but I’m still just getting acquainted with the project, as you mentioned.

As to your thoughts about cloud infrastructure muddying the decentralization goal: I get that you don’t want to put all your eggs in one basket, and TBH, even if the configs were provided/supported, I doubt that people would spin up and pay for enough cloud nodes on a single service provider, in only one of its data centers, for this to be a realistic concern. If people did have the option to more easily use these services to support the project, wouldn’t that be a good thing?

Not sure why you think AWS is more expensive than a VPS… I use it for free, or a few dollars a month, to run testing environments. I could see the cost comparison leaning towards a VPS for something like this, where you put the databases, application, and proxy / load balancer all on a single server, versus splitting it up across all the services required on AWS (RDS, ElastiCache, EC2, etc.).

I think we should (educate people), because otherwise, only people with a perfect understanding of Ruby on Rails applications could be allowed to run a diaspora* pod.

Agreed. I think that the setup instructions (from what I read) are more than adequate to meet security needs. I do have to disagree with your assertion that the OOTB install of docker sucks, however.

You referenced a list of issues that prevent proper dockerization. Manual upgrades would require scripting and automation for releases… which, admittedly, is a big one to tackle. However, any application that intends to have a life cycle will need this eventually. As for the Postgres DB upgrade, the post you linked to has multiple workarounds for the issue, and someone started an entrypoint script to address it last year that’s still in the works. So progress is being made :smiley: .

I’m sure the current method works for a lot of people, and I’m not suggesting that it’s not sufficient. I am just saying that I’m trying to use it for a production setup and would appreciate an official image. I can imagine that I may be in the minority as far as voices go at the moment, but I’m sure that’ll change over time.

Our already existing Dockerfile for diaspora, and the compose file would work just fine in a production environment, and are technically ready except for a reverse proxy - we’d just have to strip out a couple of things…

It sounds like you guys are almost there :smiley:

Believe it or not, there are actually very good reasons why we don’t take the easy way. (and dockerize)

I can tell this isn’t the first time you’ve had this discussion… and for whatever reason it seems like it’s become a contentious issue. After reading the reasons you’ve posted here, I can honestly say I’m at a loss to understand what the reasons are… I understand you’re frustrated, but can you help us understand what the hurdles are so we can help? Some of the issues raised thus far seem more ideological (establishing scripted upgrades, Docker security concerns) than technical. I truly believe that if I can better understand the requirements, I can help.

Thanks for all the comments, I really appreciate the discussion that this has kicked off :slight_smile: Very good food for thought in there. And sorry for taking a while to reply. I would like to touch on a few points, and then move on to what the next steps may be, at least in terms of what I believe I can contribute. For the following, I will leave the discussion on Dockerisation to one side, as that was not my intended focus for this discussion, and it’s a deployment decision that can be considered at a later stage and improved upon if needed.

I would like to focus on the topic of running on AWS as an alternative deployment option. I thought about it again, following @cleanshooter’s comments, and I believe it’s still worth investing some time in (for me). I can provide more figures on running costs later, but I do cost and resource optimisation in AWS for a living, working on optimising deployment configurations and reducing toil in a company serving 100 million monthly users, so I believe I can certainly bring something to the table. I don’t know how it would compare to a VPS, but for a small deployment running on burstable instances with spot pricing, or even possibly adapting parts of it to run on Lambdas, I don’t think it would break anyone’s bank. The beauty of it is to make it easily scalable so that, if needed/wanted, it can seamlessly accommodate more users.

As was mentioned earlier, “we usually try to make decisions that result in people having a more reliable experience”. I fully agree with this. This is where we can utilise some of the tools that AWS provides and balance the commitments different types of podmins can agree to. Of course, one can run highly available infrastructure and provide any degree of reliability without running on a cloud provider, but perhaps some podmins prefer to rely on RDS and EBS snapshots, with easier point-in-time recovery and upgrade paths, rather than doing it themselves, at the expense of a potentially more expensive setup. In terms of AWS outages: they have happened in the past, but AFAIK only to specific availability zones, which operate independently within their 24 regions. For example, during the Fukushima disaster one of their availability zones got taken out, but the rest of the region stayed up. Running on AWS will always be more reliable for an individual pod, provided the infrastructure is built according to the level of reliability desired.

In terms of security, by deploying our infrastructure as code we can make sure that the stack is set up following the least-privilege principle. If we don’t trust that podmins are aware of the implications of running a stack in AWS, the default configuration should limit access to the minimum needed to run a stack (this is good practice anyway), while still allowing podmins to experiment with their stack if needed, at their own expense. Probably not a concern here, but PCI-compliant setups rely on providing no (or very limited and audited) access to hosts, which could be achieved here.

Finally, on the topic of making it easy or not for people to deploy a pod: in any reliable and highly available environment, it should not just be easy, it should be automated and 100% hands-off. This is straight out of Google’s Site Reliability Engineering series. Of course, we’re under a different set of constraints here, as we don’t have a CI/CD pipeline that can apply new deployments and upgrades to the whole decentralised network, but I believe we should aim to provide the tools for podmins to achieve Continuous Delivery (as opposed to Continuous Deployment). We humans are bad at following steps and repetitive tasks. Manual ops should only be considered in case of incidents or special circumstances. As an example, we run dozens of custom Kubernetes clusters across multiple regions, and there is never a time an engineer has to run a manual step when operating the infrastructure. It gets merged to Git, the pipeline triggers, and changes are automatically rolled out. This includes even major upgrades of Kubernetes, draining whole clusters and even regions. This is the best way to achieve reliability.

Now, where do we go from here? (Or perhaps, where do I go from here? :slight_smile:) As I said, the topic of Dockerising the application can be discussed later. I would like to come up with a recommended stack to deploy in AWS. This will also include automating the creation of AMIs with Packer. The deployment will create a new AMI for a given diaspora* version, and deploy the infrastructure using CloudFormation, probably using Auto Scaling Groups, EBS/EFS, RDS, ElastiCache, S3, ACM, Route53 and a few other things to glue it all together (rough sketch below). This will also take care of rolling updates on new versions (although it should allow for upgrades that require downtime, e.g. for non-backward-compatible database upgrades).
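
To give a rough idea of the shape (resource names, instance type, and parameters below are placeholders; the real template would add RDS, ElastiCache, S3, ACM, Route53, security groups and update policies), the CloudFormation side could start out something like this:

```yaml
# Heavily trimmed, hypothetical CloudFormation sketch; placeholders throughout.
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  DiasporaAmiId:
    Type: AWS::EC2::Image::Id            # AMI produced by the Packer build for a given diaspora* version
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>     # subnets the pod instances should run in
Resources:
  PodLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !Ref DiasporaAmiId
        InstanceType: t3.small           # burstable instance; placeholder size
  PodAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "1"
      MaxSize: "2"
      DesiredCapacity: "1"
      VPCZoneIdentifier: !Ref SubnetIds
      LaunchTemplate:
        LaunchTemplateId: !Ref PodLaunchTemplate
        Version: !GetAtt PodLaunchTemplate.LatestVersionNumber
```

A rolling update on a new AMI would then mostly be a matter of an UpdatePolicy on the Auto Scaling Group.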

I hope to come up with something in the next few weeks (quite long days WFH at the moment :sweat:). We’ll be able to talk at more length about things like cost and deployment options with an architecture design in front of us. Thanks.

@danielgblanco This sounds like a fun project. Basically using AMIs (VMs) instead of containers, and CloudFormation instead of docker-compose. If you start a project somewhere (GitHub, GitLab…) I’d be happy to contribute (if you’ll have me) :pleading_face:. Do you have any preference for the root OS? I might play around with building the AMI a bit over the next couple of days.

Thanks! :slight_smile: I will create a repo soon. I want to think a little bit about the architecture to make it cost-effective while reliable, but it’s clear that we’ll need an installation script for Packer, and a way to change the config and start diaspora* with env variables (or similar) via EC2 UserData. I definitely would not say no to some help!
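
For the UserData piece, assuming a cloud-init-enabled base image, something along these lines could work. The file path, variable names, and service unit below are purely illustrative and would depend on how the Packer build wires diaspora* up:

```yaml
#cloud-config
# Hypothetical UserData rendered into the launch template; everything below is a placeholder.
write_files:
  - path: /etc/diaspora/diaspora.env        # env file a systemd unit baked into the AMI would read
    permissions: "0600"
    content: |
      ENVIRONMENT_URL=https://pod.example.org
      DATABASE_URL=postgres://diaspora:CHANGE_ME@rds-endpoint.example:5432/diaspora_production
      REDIS_URL=redis://elasticache-endpoint.example:6379/0
runcmd:
  - systemctl restart diaspora              # placeholder unit name
```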