Problem storing User Assets in S3

Newbie here.

Trying to set up a pod entirely in AWS (EC2 + Postgres + S3, which I hope won’t be too expensive) and I’m running into trouble getting asset storage working with S3.

I’ve managed to configure all the permissions/policies, and from the EC2 instance shell I can access and write to the S3 bucket (using awscli), but when trying to upload a picture via d*, I get a 403 :frowning:

:host          => "mybucket.s3-us-east-2.amazonaws.com"
:local_address => "XX.XX.XX"
:local_port    => 54102
:path          => "/uploads/images/fdc5342c9b3c423243.jpg"
:port          => 443
:reason_phrase => "Forbidden"
:remote_ip     => "XX.XX.XX.XX"
:status        => 403
:status_line   => "HTTP/1.1 403 Forbidden\r\n"

I’ve confirmed that the credentials in diaspora.yml are correct.

s3: ## Section
  enable: true
  key: 'MYKEY'
  secret: 'MY-SECRET'
  bucket: 'MY-BUCKET'
  region: 'us-east-2'

assets: ## Section
  #serve: false
  ## Upload your assets to S3 (default=false).
  upload: true
  ## Specify an asset host. Ensure it does not have a trailing slash (/).
  host: https://MY-BUCKET.s3.amazonaws.com

Any ideas?

P.S. I noticed the most recent S3 patch/upgrade, which I don’t think has anything to do with what I’m trying to do.

You are correct that the PR doesn’t have anything to do with your code base, since it’s a development branch that hasn’t been integrated. As you probably guessed, a 403 error implies a permissions problem with those credentials for the operation the code is performing. There were setup instructions for the user role behind those credentials, etc. I haven’t personally gone through them since I’ve never set up an S3 system, but ironically I’m in the process of setting one up on DigitalOcean Spaces to test the patch you mentioned. I had thought about setting up AWS S3 as an initial pass to prove that works before moving on to the other providers.

When experimenting I was using s3cmd, not awscli. Does it work that way too? Have you confirmed that you can upload a file to that specific path with the command-line tools?
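For example, something along these lines, forcing the shell to use the same key/secret that diaspora.yml uses rather than an instance profile (bucket name, region and file name here are just placeholders):

# upload with awscli using the credentials from diaspora.yml
AWS_ACCESS_KEY_ID=MYKEY AWS_SECRET_ACCESS_KEY=MY-SECRET \
  aws s3 cp test.jpg s3://MY-BUCKET/uploads/images/test.jpg --region us-east-2

# the s3cmd equivalent (s3cmd reads its own credentials from ~/.s3cfg)
s3cmd put test.jpg s3://MY-BUCKET/uploads/images/test.jpg

If a put to uploads/images/ only works with credentials other than the ones in diaspora.yml, that would point back at the IAM policy for that key.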

On a somewhat unrelated note, if you aren’t expecting to have an enormous pod you would probably be fine with a traditional configuration that doesn’t use S3.

I will be playing with that part of the code this week, so I can hopefully shed some more light on this later if another podmin doesn’t have an idea in the meantime.

Hey Hank, thanks for your response.

Yeah, I have confirmed the IAM role/permissions/policies are correct (as per the documentation), and I’ve even tried with admin permissions to make sure nothing is in the way, but same problem.

I can also confirm that the credentials are correct and that it validates the bucket, because if I change it to a nonexistent bucket, it fails saying so…

Just tried s3cmd and it also works; from the command line I can put files into the bucket… it looks like I may still be missing something at the permission level… I’m really puzzled by this…

You may be right that I don’t need S3 at this time; it’s just that I’m an AWS fan :wink: I will probably keep going, set up the pod and make it fully functional, and once I acquire more expertise I can try it again.

Thanks again, let’s see if somebody else has other ideas or if you learn more later.

I’ll keep you posted on my progress as well. Good luck!

@nicolas You need to do more on the AWS side, I think. Let’s assume you’re making a pod called examplepod.com. You’ll use the domain “www.examplepod.com” or just “examplepod.com” to go to the pod itself. (I.e., the thing the user sees in their browser). You have a few choices for how you serve up the assets to end users if you’re using S3 (BTW, I’m doing this myself on A Grumpy World, so I know it works):

First create an S3 bucket for the assets. Let’s assume that’s a bucket named examplepod-assets. Now you need a URL that will point to the objects in the bucket and will allow strangers on the internet to download them.

  • The easy way, which I do, is via HTTPS, direct from S3. You use a URL like https://s3-eu-west-1.amazonaws.com/examplepod-assets/. You have to make the S3 bucket public, which seems suboptimal from a security point of view, but these are only ever public web assets. I haven’t gotten around to doing it the right way. Make sure you use the correct region in the URL (e.g., eu-west-1, or us-east-1, or whatever). In my diaspora.yml file I have:
bucket: 'a-grumpy-world'
image_redirect_url: 'https://s3-eu-west-1.amazonaws.com/a-grumpy-world'
  • The right way: CloudFront. You create a CloudFront distribution and give it an “Alternate Domain Name” of assets.examplepod.com. You’ll go into AWS Certificate Manager in the us-east-1 region and have them issue you an SSL certificate (for free) for assets.examplepod.com. Then you set up an S3 origin for your CloudFront distribution. All this is described in the AWS documentation. When this is all done, if you have an object that is s3://examplepod-assets/image/logo.png, it will be reachable at something like https://assets.examplepod.com/image/logo.png. You then use that URL as your image_redirect_url. When using CloudFront, it becomes really important to set the max-age cache lifetime to something long (e.g., hours), because that allows CloudFront to do its work caching your objects and saving you money. I haven’t figured out how to set that cache time in fog-aws (the subsystem that diaspora uses to put assets into S3) yet; a possible command-line workaround is sketched right after this list.
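Since fog-aws isn’t setting it, one workaround (just a sketch, not something diaspora sets up for you; the bucket name, key prefix and max-age below are placeholders) is to rewrite the Cache-Control header on objects after they have been uploaded, using the AWS CLI’s in-place copy:

# re-copy the already-uploaded objects onto themselves, replacing their
# metadata so they carry a long cache lifetime for CloudFront to honour
aws s3 cp s3://examplepod-assets/uploads/ s3://examplepod-assets/uploads/ \
  --recursive --metadata-directive REPLACE \
  --cache-control "public, max-age=86400" \
  --acl public-read
# notes: REPLACE resets any other custom metadata on those objects, and you can
# drop --acl public-read if your bucket grants read access via a bucket policy
# instead of per-object ACLs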

Here’s what my costs look like for running my pod on AWS. It’s a t2.small in Ireland with 66 users on it. CloudFront is $0.61/month, S3 is about $2. The “EC2-Other” is about $2/month to cover EBS volume snapshots (daily backups). Route53 is $1 every month: $0.50 for the domain (grumpy.world) and $0.50 for a Route53 health check that will alert me if the pod isn’t available. EC2 itself is about $13.50/month because I took a 12-month reserved instance contract. The Cost Explorer graph is a bit off because it is based on tags, and I only started tagging my AWS resources consistently around August.
[screenshot: Cost Explorer cost breakdown]

One more thing to be aware of with respect to user privacy and S3. If you run all your asset hosting on your pod, then your assets have URLs like https://www.examplepod.com/uploads/images/foo.jpg. I’m not sure what Diaspora would do on a restricted post if someone had the direct URL to an image. Let’s say I posted something, a nude picture or something I intend to share only with a small number of people, and I restrict it in Diaspora. Let’s imagine also that all the users I have shared it with are users on my pod, so no federation is required. If you get the URL to my jpg file and try to fetch it, I’m not sure what the Diaspora app server at www.examplepod.com would do. I hope it would force you to log in, and only let you view the image if it was actually shared with you.

When you use S3 asset hosting the way I suggested, all assets shared that way are public with URLs like https://s3-eu-west-1.amazonaws.com/examplepod-assets/uploads/images/foo.jpg. It’s basically security by obscurity. If someone knows the URL to the image, they can fetch that image from S3, regardless of whether they’re a diaspora user or not. Regardless of whether the image/asset was shared with them or not. This may not be what your users expect.

Although S3 and CloudFront both have mechanisms for authentication, for restricting access, and so on, Diaspora itself outsources the whole asset-management layer to fog-aws, and the last time I checked, Fog doesn’t offer support for authentication/authorisation mechanisms in S3 or CloudFront.

Images do open with direct links without authentication on the pod, restricted posts included. I believe this is done to achieve consistency on the federated network. Some other projects (I checked Friendica and Hubzilla) do offer access control for images, but it doesn’t work across networks: if a Friendica user restricts access to an image, their friends on Diaspora and Hubzilla won’t see the image even if they are included.

I don’t see this as much of a vulnerability, though, since guessing a random 20-character URL from outside is highly unlikely, and linking it to a specific user seems impossible. I see only two ways of potential misuse: either someone’s browsing history gets stolen, so the attacker gains the “secret” image URLs, or someone from the restricted friend group leaks the URL of something sensitive, say to law enforcement, and it serves as proof that the content is indeed hosted on the server.

Also note that the first vulnerability can be mitigated by enabling Camo on the pod.

Yeah. I’m a security guy by trade, so I tend to think about this stuff. Nobody randomly guesses 20-character filenames, but that’s not the threat.

If you have access to AWS and S3, I recommend that you run aws s3 ls s3://a-grumpy-world/ and notice that you can list all the assets that all my users have uploaded to S3. Note that I haven’t given you any access to my AWS account, and yet that command succeeds because I have made that a public bucket.

Some really common vulnerabilities would be: unauthorised shell code execution on the app server (e.g., executing “cat diaspora.yml” as the diaspora user and getting the results back in the body of the web request), getting access to a postgres dump file (because you found it in a bucket somewhere, or on the server somewhere), or SQL injection (I know the platform takes a lot of precautions against that, but…). The point is that those URLs are stored in bulk in oodles of places: all the Diaspora servers that saw the post via federation, all the database servers that store the data for those servers, the files that back up those database servers, etc.

What happens a lot these days is big bulk dumps of data. Someone accidentally makes the AMI of their Diaspora instance public and others can launch it (exposing their postgres credentials). Or someone gets access to the S3 bucket that stores database server backups, etc. etc. etc.

The fact that there are actually no access controls on something that nominally says it has access controls is really unintuitive. Take the URL to an image hosted on Facebook, Instagram, Twitter, Tumblr, etc. where it has been shared with a limited number of people (i.e., not public). Now try to load that image as an anonymous (not logged in) user or as a logged-in user who shouldn’t have access to it. You won’t get the image, and that is the semantic that users expect.

I can solve the discoverability problem by changing my pod to use CloudFront instead of direct S3 access. Then I can make the bucket private. But that doesn’t change the semantics problem. We are not delivering the semantics that users expect.

While I welcome any critical discussion about storing user assets on Amazon S3, please make sure that your arguments are sound, and more importantly, factually correct. I will assume that the last post was wrong by accident, but let’s make sure this does not turn into a FUD spreading thread.

Wrong. The command succeeds because you added public read access to the bucket’s ACL. The example bucket policy in the wiki explicitly only allows ListBucket for the authenticated diaspora user, and the only permission for everyone is GetObject for everything inside /uploads/*. GetObject will allow you to download an object if you know the URL, but it will not allow listing the bucket’s contents. The default for newly created buckets is to block all public access unless you explicitly allow it in the bucket policy, and if you set public read permissions in the bucket’s ACL, Amazon will warn you with a huge red warning. What you are describing is a misconfiguration on your side, not an issue with the information or implementations we provide. At the same time, you could also set up your nginx to provide a directory listing of the public/ folder with a single line in the config, but that’s also not our issue.
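For reference, the shape of that policy is roughly the following; the account ID, user name and bucket name are placeholders, the wiki has the authoritative version, and the real policy also grants the diaspora user the write permissions it needs, which I’m leaving out here:

# ListBucket only for the authenticated diaspora IAM user,
# anonymous GetObject only for keys below /uploads/
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:user/diaspora" },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::MY-BUCKET"
    },
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::MY-BUCKET/uploads/*"
    }
  ]
}
EOF
aws s3api put-bucket-policy --bucket MY-BUCKET --policy file://policy.json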

In the 8-ish-year history of this project, we have had zero RCEs. Some of the libraries we use had RCE vulnerabilities, but not a single one of them was actually exploitable in production. We also had zero SQL injections, and it’s unlikely this will change soon. Please don’t turn something that the world mainly sees in poorly designed and unmaintained software into a “really common vulnerability”, because the overall number of actually vulnerable applications is surprisingly low. And believe me, there are a lot of very friendly people who try to get data out of diaspora pods every day. :slight_smile: A more viable vulnerability would actually be some form of XSS, and unsurprisingly, given the amount of user-generated content diaspora handles, we actually had a few of those.

You are right to claim there is a non-zero chance that diaspora will, at some point, be vulnerable in some way or another, and I’d go as far as to say it is absolutely guaranteed that there will be a security issue at some point. However, that’s completely irrelevant to this discussion. Storing assets on S3 is the absolute least of your concerns if someone has an RCE on your server, because no matter how you store your assets, they will be compromised either way. Have them stored on your local server? Easy to transfer those. Stored on an external server? Well, the diaspora server needs to have access to that somehow, so that’s also not a solution.

This is a screenshot of my Nintendo Switch that I uploaded to Facebook, visible to “only me”, and yet the URL is perfectly accessible to all, even with a fresh identity in Tor. The URL will expire after some time, but that’s the same for public images, and is mainly done to improve caching and load balancing. Nothing here requires you to be logged in; in fact, it works perfectly fine with zero cookies attached.

Now, we could come up with very fancy mechanisms to require authentication to see images, but that would not only break federation with external nodes (which would then suddenly have to authenticate somehow), it would also be a straight-up lie, because nothing we can do will make the image more secure: there is absolutely zero protection against someone right-clicking and selecting “Save image”, or simply taking a screenshot.

We are delivering exactly the semantics that users expect. We enable users to decide which contents they want to share with which contacts. We cannot solve the issue behind the fact that you still have to trust your recipients, but that’s a social issue, not a technical one.

And we really don’t like lying to our users.

You corrected a lot of stuff there. Thanks. You’re right that the ListBucket ACL was needless. (I’m in the process of getting rid of the public bucket, too; I’ve been lazy.) The origin of most of the vulnerabilities I talk about is exactly that: sloppy admins (like me!). Only two of the vulnerability possibilities I named (RCE and SQLi) were things in the developers’ control, and I took some pains to caveat one of those with an acknowledgement that the code is sound (because I’ve looked at the SQL layer and seen how it was done). I didn’t caveat the other. My bad.

I’m genuinely surprised that these are the semantics not just here, but on both Facebook and Mastodon. Knowing the URL is tantamount to having access. I’m going to have to go off and do some more exhaustive research. I say “users” are surprised because I assume they’re like me, and I’m surprised. Sometimes one finds out that one is not a representative sample. :slight_smile:

I’ve been testing the whole “private pictures” thing since our discussion yesterday. As I pointed out, and Dennis pointed out, in most places this is the same behavior. The only one I’ve found so far that actually does credential checking on a private photo is Friendica. I didn’t bother testing Hubzilla, but supposedly Hubzilla does this as well. I know users don’t expect that behavior, but it’s not atypical.

I highlighted in my S3 brush-up post that we should probably have quick setup instructions for the various clouds, to prevent things like the bucket contents being browsable.

Thanks for all the help, I have figured it out.

As @grumpy-podmin pointed out, the problem was on the AWS side.

  • The Bucket’s Public Access Settings were too restrictive:
    • “Block new public ACLs and uploading public objects” must be unchecked
    • “Block public and cross-account access if bucket has public policies” must be unchecked
  • To be able to properly deliver the resources, a CORS configuration must be defined on the bucket:
    <CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <CORSRule>
        <AllowedOrigin>https://YOURPODHOST</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
      </CORSRule>
    </CORSConfiguration>
  • The EC2 instance must run under an IAM role with access to the bucket
  • The first time, the assets have to be rebuilt and uploaded to S3 using RAILS_ENV=production bin/rake assets:precompile

In a nutshell, that’s what I had to do to get it going.
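For anyone who prefers doing the AWS side from the CLI, the same changes look roughly like this; MY-BUCKET and YOURPODHOST are placeholders, and which public-access flags you can safely relax depends on your setup:

# only BlockPublicAcls and RestrictPublicBuckets correspond to the two console
# checkboxes above; the call replaces the whole configuration, so all four
# flags have to be spelled out
aws s3api put-public-access-block --bucket MY-BUCKET \
  --public-access-block-configuration \
  BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false

# the same CORS rule as above, in the JSON form the CLI expects
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://YOURPODHOST"],
      "AllowedMethods": ["GET"]
    }
  ]
}
EOF
aws s3api put-bucket-cors --bucket MY-BUCKET --cors-configuration file://cors.json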

Thanks again for the help :slight_smile:

EDIT: Fixed typo

I’d like to make sure we capture this in a revision to the wiki page covering S3 and S3-compatible object storage.