Help migrating a MySQL database from utf8_general_ci to utf8mb4_bin


(David Thiery) #1

My MySQL database is utf8_general_ci and I need help migrating it to utf8mb4_bin.


(Flaburgan) #2

I’m just linking the github issue in case someone needs the history of the investigation.


(CSammy) #3

Thank you for creating this topic. Just one last thing before we dive into the mess: I need you to verify that your indices already have a length restriction. A sample will suffice for now. In order to check this, please open your database and run

SHOW INDEX FROM diaspora_production.tags;

In the Sub_part field, there should be the number 191 for the index index_tags_on_name. Please report if that is the case.


(David Thiery) #4

Yes, it has 191 in the Sub_part field for index_tags_on_name.


(CSammy) #5

Very good. Please test the following steps with a copy of your production setup so you know it won’t destroy your pod, as this has not been extensively tested. I only did a proof-of-concept run on an empty installation on my local machine.

  1. Copy this content to a file foo.rb in your diaspora folder:
class SetMysqlToUnicodeMb4 < ActiveRecord::Migration[5.1]
  def self.up
    change_encoding('utf8mb4', 'utf8mb4_bin') if AppConfig.mysql?
  end
  
  def change_encoding(encoding, collation)
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET #{encoding} COLLATE #{collation};"

    tables.each do |table|
      next if ["ar_internal_metadata", "schema_migrations"].include? table

      modify_text_columns = columns(table).select {|column| column.type == :text }.map {|column|
        "MODIFY `#{column.name}` TEXT #{'NOT' unless column.null } NULL#{" DEFAULT '#{column.default}'" if column.has_default?}"
      }.join(", ")

      execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET #{encoding} COLLATE #{collation}#{", #{modify_text_columns}" unless modify_text_columns.empty?};"
    end
  end
end
  1. Stop the pod you are doing this on
  2. In the diaspora folder, execute bin/rails console
  3. In the opening console, execute require_relative 'foo.rb'
  4. Afterwards, do SetMysqlToUnicodeMb4.new.up

The last step may take some time, as the tables need to be rewritten. MySQL actually copies the entire table while changing the respective fields. Please, again, do not forget to test this beforehand in a copy of your production setup. Doing so will give you two benefits. First, you can rule out any data-destroying bugs. Second, you will have a rough estimate of the expected downtime.

If you feel like it, you’re welcome to join on our IRC channel to have more real-time support (as long as somebody is there :wink:).


(David Thiery) #6

Ok, quick question. I made a backup copy of my database to a new one (diaspora_test). Do I need to create a backup of my entire diaspora installation to do this? If so, is it as simple as just copying the entire installation directory of my diaspora to a new directory?


(Flaburgan) #7

Sammy was suggesting that you test it locally. Looks like the perfect timing to introduce you to the docker image for test env if you don’t want to set up a diaspora installation locally.


(David Thiery) #8

Ahh…gotcha!

Also, I noticed that the fields that need changed are varchar(255) and not text. Should I just change the text to varchar(255)? Or do those fields need to be converted to text?


(David Thiery) #9

Never mind, I just realized this is converting them to text.


(Dennis Schubert) #10

Please do not do anything manual with the database. There already has been enough destruction. :wink: If it turns out the script missed something, that’s relatively easy to reproduce and fix, but that’s not true for manual actions.

You can actually use your existing diaspora installation:

  1. Edit config/database.yml and make sure the database name in in the test section matches the name of your clone database. If the database is named diaspora_test, you should be all set already, but please double-check.
  2. Start your Rails console with RAILS_ENV=test bin/rails console
  3. Run ActiveRecord::Base.connection.current_database and make sure it returns the name of your testing database.
  4. Continue with step 4 in @csammy’s reply.

Also make sure to run the commands in a tmux or something, just in case your SSH connection drops.


(David Thiery) #11

The only manual thing I did was the backup and restore to diaspora_ test :slight_smile:

Ok, knowing I can use my existing installation makes this easier, else I’d probably have to wait until this weekend to get a local version setup.

Do I still need to stop the pod before doing this since it’ll be running against diaspora_test?


(Dennis Schubert) #12

No. As long as you verify that you are, indeed, using the test database, running it while the pod is running in production is fine. However, as your database will be doing a lot of disk I/O, there might be a performance hit. If you want to avoid that, the only option you have is to set diaspora up locally and import your database dump in your local machine.


(David Thiery) #13

When I run RAILS_ENV=test bin/rails console I get an error that says Namespace test not found in my diaspora.yml.


(Dennis Schubert) #14

That’s a warning. And as such, can be ignored.


(David Thiery) #15

But then it exits and leaves me at the terminal prompt. I tried that command with production to see what happened and it’s fine (exited that right away).


(Dennis Schubert) #16

If it exists, you’re probably getting an additional error you didn’t tell me about. So please paste the entire output. :slight_smile:


(David Thiery) #17

uninitialized constant Rack::SSL (NameError)

Didn’t include that the first time because I thought there was a problem with diaspora.yml that caused that error :slight_smile:


(CSammy) #18

Would it be possible for you to hop on to IRC? It would speed up this back and forth greatly


(David Thiery) #19

Yep, give me a second and I’ll get on there.


(CSammy) #20

Fair Warning: This is just a summary to capture the situation and the steps we went through. Please don’t use this without getting back to us here on Discourse or on IRC, as these steps can destroy your database if anything we did not capture here is different.

Initial Situation

  • Symptoms:

    • Impossible to post a hashtag that differs only by diacritic from an existing hashtag (in this case: dandelíon exists, dandelion is throwing errors)
      • Error Message to User - <tag> is already taken (Example Screenshot)
      • Error Message in Log - Rails: ActsAsTaggableOn::DuplicateTagError (‘<tag>’ has already been taken)
    • Impossible to use unicode emoticons in posts
      • Error Message to User - Internal server error. Our bad! Sorry about that. :( Example Screenshot
  • Installed using a BitNami installer (for now an advised-against way to install)

  • Database in MySQL has character set utf8 and collation utf8_general_ci due to not being created by bin/rake db:create

  • All indices (except for AR tables) are already restricted in length due to the pod in question being fairly new

  • Change to character set utf8mb4 and collation utf8mb4_bin is required for proper functionality

Our solution for this special case

  1. Copy following code to foo.rb in your diaspora* root folder
class SetMysqlToUnicodeMb4 < ActiveRecord::Migration[5.1]
  def self.up
    change_encoding("utf8mb4", "utf8mb4_bin") if AppConfig.mysql?
  end
  
  def change_encoding(encoding, collation)
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET #{encoding} COLLATE #{collation};"

    tables.each do |table|
      next if ["ar_internal_metadata", "schema_migrations"].include? table

      modify_text_columns = columns(table).select {|column| column.type == :text }.map {|column|
        "MODIFY `#{column.name}` TEXT #{'NOT' unless column.null } NULL#{" DEFAULT '#{column.default}'" if column.has_default?}"
      }.join(", ")

      execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET #{encoding} COLLATE #{collation}#{", #{modify_text_columns}" unless modify_text_columns.empty?};"
    end
  end
end
  1. Copy your database to a new one named diaspora_test (or whatever is in your config/database.yml in the test section, if you changed that)
  2. Add the following lines to the bottom of your config/diaspora.yml file:
test:
  environment:
    require_ssl: false
  1. Start your Rails console with RAILS_ENV=test bin/rails console
  2. Run ActiveRecord::Base.connection.current_database and make sure it returns the name of your testing database.
  3. Run require_relative 'foo.rb'
  4. Run SetMysqlToUnicodeMb4.new.up
    This may put some load on your server as it is rewriting the entire database copy. It also gives you a rough estimate how long changing the production database will take.
  5. If the database looks okay, close the console with quit
  6. Stop your pod
  7. Make a backup of your database
  8. Start your Rails console with RAILS_ENV=production bin/rails console
  9. Run require_relative 'foo.rb'
  10. Run SetMysqlToUnicodeMb4.new.up
    This may take about as much time as your test run.
  11. If the database looks okay, close the console with quit
  12. Start your pod again
  13. Test the functionality of your pod (try a post with an emoticon or a hashtag that did not work previously)

Partial pod-connection-issues (sidekiq)