WordPress MySQL Database Encoding Fun

I’ve finished moving thesamet.com to a new web host. The most tedious thing to do was to move the WordPress SQL database to the new server, that is why I decided to get a best wordpress hosting to improve the performance. Those who need help installing MySQL on CentOS 7 may follow the guide on this helpful site.

It turns out that the database was created with latin1 character set while WordPress pushed utf8 encoded data to it. Visit collectiveray.com for more info. For those who got stuck in the same problem, here is what I did to transfer the data to a new server.

First, get a sql dump of your database. Use the encoding of the database when doing it (i.e. latin1). Here is how to do it from the command line:

$ mysqldump --default-character-set=latin1 -u db_user -p db_name > mydb.sql

After inspecting the dump, it turned out that this file could be considered as UTF8 encoded. this was to be expected as we had UTF8 data in the database considered as latin1 and we dumped it as latin1.

Python is known to have great support for handling Unicode, and is very comfortable to inspect strings with it. The next thing I did was to load the file with Python, and tried to decode the data as utf8.

>>> s = open('mydb.sql').read()
>>> s.decode('utf8')
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: utf8 codec can't decode byte \x92 in position 561129: unexpected code byte

Great, so the first trouble comes at index 561129. Let’s see what’s in there:

>>> s[561110:561136]
' points of view, it\x92s good'

That \x92 looks suspicious. It turns out that this one is still encoded in latin1, and \x92 in unicode is some kind of ‘ (slanted single-quote or something). To translate it to UTF8, I did:

>>> '\x92'.decode('latin1').encode('utf8')
'\xc2\x92'

Now changing all occurrences of \x92 to \xc2\x92, can change it in places we would not want. So before we do that, we check what the other occurrences of \x92 looks like:

>>> [s[i-5:i+5] for i,ch in enumerate(s) if ch=='\x92']
['\xd7\x9c\xd7\x95\xd7\x92... ', 'w, it\x92s go']

We see that the first occurrence of \x92 should not be changed, it is within a UTF8 string. So we are safe to do

>>> s=s.replace('it\x92s', 'it\xc2\x92s')
>>> u8=s.decode('utf8')

And this time the decoding succeeded. This means that our data contains valid UTF8. My case was easy, but you might have to repeat the above process several times until you get rid of all latin1 characters in your data. It might be possible that you’ll have latin1 data that looks like utf8, and you’ll need to do even more manual search/replace.

Now write the utf8-encoded string (s, not u8) to a new file:

>>> open('/tmp/out.sql', 'w').write(s)

After that, I’ve changed all the CREATE statements in the SQL file from CHARSET=latin1 to CHARSET=latin1.

It is important to add SET NAMES=utf8; at the top of the file.

On the destination server I created the database with UTF8 default encoding:

CREATE DATABASE db_name DEFAULT CHARSET utf8;

and imported the new SQL file into the database:

mysql --default-character-set=utf8 -u db_user -p db_name < /tmp/out.sql

and voila, it works.

I also start searching more details about different databases including NoSQL database.

This entry was posted in daily. Bookmark the permalink.

1,578 Responses to WordPress MySQL Database Encoding Fun

  1. Pingback: pharmaciesshipping.wordpress.com20230515canadian-pharmaceuticals-online-with-no-prescription

  2. Pingback: Can maintaining a healthy blood sugar level help reduce the risk of heart disease - atorvastatin 20 mg price

  3. Pingback: What are 2 problems with antibiotics?

  4. Pingback: Can I take antibiotics an hour early?

  5. Pingback: http://trommelforum.ch/forum/profile/franbervage/

  6. Pingback: www.horreur.clubcommunityprofilecanadianpharmacy

  7. Pingback: https://essidi.cm/community/profile/canadianpharmacy/

  8. Pingback: https://nicol.co.tz/community/profile/canadianpharmacy/

  9. Pingback: abusetalk.co.ukforumprofilecanadianpharmacy

  10. Pingback: Is honey good for your liver?

  11. Pingback: madridbet

  12. Pingback: canadian pharmaceuticals online

  13. Pingback: warriorfarm.co.ukcommunityprofile153413

  14. Pingback: ascenddeals.combeaverageprofilecanadianpharmacy

  15. Pingback: baldstyled.comcommunityprofilecanadianpharmacy

  16. Pingback: Is Salt a natural antibiotic?

  17. Pingback: grandpashabet

  18. Pingback: canadian pharmaceuticals

  19. Pingback: https://chanchuoi.com/community/profile/canadianpharmacy/

  20. Pingback: Quels sont les 3 types de familles: pharmacie en ligne viagra

  21. Pingback: Comment se faire prendre par un chien: cialis 40 mg en france

  22. Pingback: Comment regler les problemes de famille | comment reconnaitre un homme qui prend du viagra

  23. Pingback: How quickly do men fall in love | vidalista 20

  24. Pingback: Is there an over the counter Viagra: buy vidalista online

  25. Pingback: meritking

  26. Pingback: Who reads Mens health vidalista 20 tadalafil tablets

  27. Pingback: Is slow speech more common in certain cultures or regions?

  28. Pingback: Increased fat accumulation, particularly around the abdomen and face, can be observed in individuals with thyroid deficiency?