S3 Asset Hosting Testing


(Hank G) #1

To Diaspora podmins: There is a Pull Request that makes the AWS S3 asset hosting feature work across S3-compatible back ends such as Digital Ocean, Google, et cetera. To improve the stability of the initial release, and of later releases, it would be great to have podmins who can help with short- and long-term testing. This would take the form of running the pre-release software on a pod during the initial testing of the feature and during each code freeze period before a release. If any podmin is interested in helping with either the short- or long-term testing, please feel free to reach out, comment here, or comment on the Diaspora post below to discuss further.

Universal Link: web+diaspora://hankg@diasp.org/post/8a6c9fb0f4ee0136770c047d7b62795e
HTML Link: https://diasp.org/posts/8a6c9fb0f4ee0136770c047d7b62795e


(Hank G) #2

I’ve gotten this working, to a certain extent, with the Digital Ocean S3-compatible Spaces system. The proper configuration for that will look something like:

s3: ## Section
  enable: true
  key: <your key>
  secret: <your secret>
  bucket: <your space name>
  region: '<your region>'
  host: '<your region>.digitaloceanspaces.com'
  endpoint: 'https://<your space name>.<your region name>.digitaloceanspaces.com'
  cache: true

This works in the sense that images upload and content is served from the Space, so it completes the whole lifecycle. That’s pretty cool. Unfortunately I think there is still a way to go on the S3 functionality as a whole, and I’d like to know how this interacts with the other systems. My preliminary notes on testing so far:

  • General
    • The path_style flag added to the configuration in the s3_compat branch doesn’t seem to affect behavior in any way. It should probably just be taken out if this is generally true.
    • The path stored in the database for the picture is the full URL to the bucket. This is a problem because you can’t push to the CDN, but you should be reading from the CDN. The CDN would be defined in image_redirect_url; however, because the stored path doesn’t start with /uploads/images, the redirect doesn’t happen. We need to investigate whether this also happens with the AWS S3 example (I would imagine it does) and change the behavior of Photo.update_remote_path in Photo.rb so that it works correctly.
    • The Wiki’s instruction to use environment variables for the key and secret didn’t work. The Fog site’s recommendation to use a .fog file didn’t either. For now they need to go directly in the configuration file shown above.
    • Because each system has its own nuances, we probably want a Gist or a set of examples of the diaspora.yml configuration, plus the permissions configuration on the respective system.
  • Digital Ocean Specific
    • To push your assets and other content up, use the command below from your public folder:
      s3cmd put * s3://diaspora-dev-bucket1 --recursive -P -M --no-mime-magic
    • Because the path_style setting doesn’t work, the uploaded images will have an extra level of depth. So a path to the directory will look like:
      https://<space name>.<region name>.digitaloceanspaces.com/<space name>/uploads/images
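The extra path segment in that URL is consistent with path-style addressing being applied to an endpoint that already contains the space name. That is only one plausible reading of the behavior, not something confirmed in the Fog or Diaspora code. The Python sketch below just illustrates the two S3 URL layouts; object_url, my-space, and nyc3 are placeholder names made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build an object URL in either S3 addressing style (illustrative only)."""
    parts = urlsplit(endpoint)
    key = key.lstrip("/")
    if path_style:
        # Path-style: the bucket becomes the first path segment.
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))


# Endpoint already includes the space name AND the bucket is added as a path
# segment: the space name shows up twice, as in the URL above.
doubled = object_url(
    "https://my-space.nyc3.digitaloceanspaces.com",
    "my-space",
    "uploads/images/photo.jpg",
    path_style=True,
)

# Region-only endpoint with virtual-hosted addressing: no extra depth.
flat = object_url(
    "https://nyc3.digitaloceanspaces.com",
    "my-space",
    "uploads/images/photo.jpg",
    path_style=False,
)

print(doubled)
print(flat)
```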

It would be good for someone on Digital Ocean to confirm my findings and to have people on the other systems report their results here. Thanks for your help!


(Hank G) #3

FYI for people that are trying to seed their new S3 bucket with their current site data…

The DigitalOcean documentation directs the user to the s3cmd command line tool. It doesn’t mention some arguments that I found were required to get this to work correctly. The big problems to overcome are default permissions and MIME type selection. This command, executed from the /public folder, worked for me:

s3cmd put * s3://<bucket-name> --recursive -P -M --no-mime-magic

What’s going on here:

  • put pushes the files up to the bucket.
  • The * wildcard means everything (just like it would on the command line).
  • Next is the name of the bucket in S3 URI syntax. This may have worked with HTTP URI syntax but I didn’t try it.
  • --recursive tells it to do a recursive traversal of the wildcard, just like -r on the command line.
  • -P tells it to set the uploaded files to be public. The bucket is listed as restricted because you don’t want random bots to be able to traverse your bucket directory listing. On DigitalOcean this means that it assumes you want the contents to be restricted as well. We, however, want people to be able to get to a file if they have the URL but not be able to list the bucket. This option gives you that behavior.
  • -M tells s3cmd to guess the MIME type. This is really important. If you don’t do this it will upload CSS and JavaScript as plain text, and when the browser receives them with that MIME type it will not apply the CSS or execute the JavaScript. That creates a very ugly and non-functional Diaspora site because all of our style sheets and JavaScript files are essentially not there.
  • --no-mime-magic: by default the command tries to use mime magic to figure out the MIME type. On my Linux Mint 19 machine (based on Ubuntu 18.04) I could never get this to work. The MIME types we are trying to determine are easy to pick off from the file extension, and the tool is capable of doing that itself, so it works without mime magic. You may want to try the command without this flag to see if it works in the standard way. However, if you run into the same problem I did, this flag will be indispensable.
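To see why extension-based guessing is enough for these assets, here is a small Python sketch using the standard library’s mimetypes module. It illustrates the idea only and is not how s3cmd is implemented; guess_content_type and the application/octet-stream fallback are choices made for this example.

```python
import mimetypes


def guess_content_type(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension alone, without inspecting content."""
    content_type, _encoding = mimetypes.guess_type(filename)
    # guess_type returns None for the type when the extension is unknown.
    return content_type or default


css_type = guess_content_type("application.css")
js_type = guess_content_type("main.js")
png_type = guess_content_type("logo.png")
unknown_type = guess_content_type("data.zzzunknown")

for name, value in [
    ("application.css", css_type),
    ("main.js", js_type),
    ("logo.png", png_type),
    ("data.zzzunknown", unknown_type),
]:
    print(f"{name}: {value}")
```

The exact string reported for .js files depends on the Python version and the system’s MIME database, so treat that one as indicative rather than fixed.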

(Hank G) #4

FYI, this seems to be the behavior in general, not just with S3. When I look at the image redirect code it seems like it won’t redirect with paths like this, only with something beginning with /uploads/images. This makes sense when the asset is stored on another server from which it was federated. I believe it works for locally stored assets because the relative path on the hosting server will still look like /uploads/images. However, for assets stored in the database with an external path to a remote bucket, I don’t think this code is ever going to hit the CDN because the redirect logic isn’t going to trigger.
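A minimal Python sketch of the prefix check as I understand it from the description above. This is not the actual Diaspora redirect code; cdn_url, the example paths, and https://cdn.example.org are hypothetical, and joining the redirect URL with the stored path is an assumption about how image_redirect_url is applied.

```python
from typing import Optional

REDIRECT_PREFIX = "/uploads/images"


def cdn_url(stored_path: str, image_redirect_url: str) -> Optional[str]:
    """Return the redirect target, or None when the path would not be redirected."""
    if not stored_path.startswith(REDIRECT_PREFIX):
        # A full bucket URL starts with "https://", so it never matches the prefix.
        return None
    return image_redirect_url.rstrip("/") + stored_path


local = cdn_url("/uploads/images/photo.jpg", "https://cdn.example.org/")
remote = cdn_url(
    "https://my-space.nyc3.digitaloceanspaces.com/my-space/uploads/images/photo.jpg",
    "https://cdn.example.org/",
)

print(local)   # redirect target for a pod-relative path
print(remote)  # None: the bucket URL bypasses the redirect
```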


(Hank G) #5

Capturing some details from the IRC channel for moving implementation and documentation forward:

  • We probably want to encode the pod’s URL, not the bucket URL, in the database so that it knows to redirect it to the CDN. Even if an admin chooses not to use a literal CDN they could have it redirect to an off-server location.
  • The documentation should highlight that on a production pod the redirect should happen at the reverse proxy level, not in the D* server proper. Having an example of this for Nginx and Apache could be helpful.
  • The YAML file should carry a warning, similar to the one for self-hosted content, that these settings are really meant for development pods and that production pods should be doing this at the reverse proxy level.
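As a rough illustration of the first point, the Python sketch below rewrites a bucket URL into a pod URL that keeps the /uploads/images path, so a prefix-based redirect could match it. It is not a proposed patch to Photo.rb; pod_url, https://pod.example.org, and the example bucket URL are all hypothetical.

```python
from urllib.parse import urlsplit

UPLOADS_PREFIX = "/uploads/images"


def pod_url(stored: str, pod_base: str) -> str:
    """Rewrite a stored image location so it points at the pod, not the bucket."""
    path = urlsplit(stored).path
    index = path.find(UPLOADS_PREFIX)
    if index == -1:
        raise ValueError(f"no {UPLOADS_PREFIX} segment in {stored!r}")
    # Keep everything from /uploads/images onward; drop any bucket-name prefix.
    return pod_base.rstrip("/") + path[index:]


rewritten = pod_url(
    "https://my-space.nyc3.digitaloceanspaces.com/my-space/uploads/images/photo.jpg",
    "https://pod.example.org",
)
print(rewritten)
```

With the pod URL stored, requests for the image arrive at the pod first, and the reverse proxy (or the development-only redirect setting) can send them on to the CDN or bucket.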