Wednesday, November 4, 2015

MySQL encoding problem and convertion to UTF8

Many times an incorrect installation of a CMS (eg.
Drupal, Wordpress, OS-Commerce) gives a wrong collation and encoding of a MySQL database. Usually, the problem shows up after a short amount of time, eg. when 1000 records have stored in the database. 
The problem is that the database is in the default LATIN1 encoding, instead of the preferred let's say UTF-8.
So, how can we change the collation and encoding of the "meshed up" database? And the most important, how can we keep our data and convert the data to the correct encoding?

The correct solution seems to be the following:

  1. Backup your database!
  2. Ensure that no data will be stored to the database in that time (eg. put your site in maintenance mode)
  3. Using a Mysql editor (eg. phpMyAdmin), ALTER all tables to the correct UTF-8 encoding using the following SQL command:
    ALTER TABLE `categories_description` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
    for example in eg. 'categories_description' table.
  4. Convert all data of the given table to the correct encoding using CONVERT and CAST command as following:
    UPDATE `categories_description` SET `category_title` = convert( cast( convert( `category_title` USING latin1 ) AS BINARY ) USING utf8 ), SET ....(repear SET for all string fields in table)..;
  5. Repeat the above process for all the tables in the database!
After that, everything seems to be fixed!