How to Bulk Convert MySQL Tables and Columns to UTF8 Charset Programmatically



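When working with international datasets or migrating legacy databases, you often need to convert every table and column to a UTF-8 character set. Doing this by hand for hundreds of tables is tedious and error-prone. Note that MySQL's utf8 is an alias for utf8mb3, which stores at most three bytes per character; if you need the full Unicode range (for example emoji), substitute utf8mb4 and a matching collation such as utf8mb4_general_ci in the statements below.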

-- Traditional approach (tedious for many tables)
ALTER TABLE customers CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE orders CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
-- Repeat for every table...

MySQL's information_schema contains all the metadata needed to generate these statements dynamically. The following query emits one ALTER TABLE statement per table in the chosen database:

-- Backticks keep names that are reserved words or contain special characters valid
SELECT DISTINCT 
  CONCAT('ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME, 
         '` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;') AS stmt
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'your_database_name';

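The DISTINCT keyword collapses the per-column rows of COLUMNS into one statement per table. Because COLUMNS also lists the columns of views, any statement generated for a view will fail when executed; delete those lines from the script, or read the table names from information_schema.TABLES filtered on TABLE_TYPE = 'BASE TABLE'.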
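Since information_schema.COLUMNS also covers views, one variation is to read table names from information_schema.TABLES and keep only base tables. The shell sketch below only builds the query text. The function name build_convert_query and its default charset and collation arguments are illustrative assumptions, not part of MySQL.

```shell
#!/bin/sh
# Hypothetical helper: print a query that generates one ALTER TABLE
# statement per base table (views are skipped) in the given schema.
# Arguments: database name, target charset, target collation.
build_convert_query() {
    db=$1
    charset=${2:-utf8}
    collation=${3:-utf8_general_ci}
    # In an unquoted here-document, \` yields a literal backtick.
    cat <<EOF
SELECT CONCAT('ALTER TABLE \`', TABLE_SCHEMA, '\`.\`', TABLE_NAME,
              '\` CONVERT TO CHARACTER SET ${charset} COLLATE ${collation};')
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = '${db}'
  AND TABLE_TYPE = 'BASE TABLE';
EOF
}
```

The output can be piped straight into the client, for example build_convert_query production_db | mysql -B -N --user=admin -p > utf8_conversion.sql. The database name is pasted into the query unescaped, so this is only suitable for names you control.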

For production environments, follow this complete workflow:

# Step 1: Generate conversion script
mysql -B -N --user=admin --password=yourpassword -e \
"SELECT DISTINCT CONCAT('ALTER TABLE ', TABLE_SCHEMA, '.', TABLE_NAME, 
' CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;') 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'production_db';" > utf8_conversion.sql

# Step 2: Review the generated script
head -n 10 utf8_conversion.sql

# Step 3: Execute the conversion
mysql --user=admin --password=yourpassword < utf8_conversion.sql
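Looking at only the first ten lines can miss a stray statement further down the file. As a rough sketch, the hypothetical check_script function below flags every non-empty line that is not an ALTER TABLE ... CONVERT TO CHARACTER SET statement. The pattern is an assumption based on the statements generated in Step 1 and would need adjusting if you change their shape.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the generated script.
# Prints unexpected lines with their line numbers and returns non-zero
# if any are found or if the file cannot be read.
check_script() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # -v selects lines that do NOT match; blank lines are allowed by the "?".
    if grep -n -v -E '^(ALTER TABLE .+ CONVERT TO CHARACTER SET [A-Za-z0-9_]+ COLLATE [A-Za-z0-9_]+;)?$' "$1"; then
        echo "unexpected statements found in $1" >&2
        return 1
    fi
    return 0
}
```

A typical use is check_script utf8_conversion.sql && mysql --user=admin -p < utf8_conversion.sql, so the script is executed only when every line looks as expected.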

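For large databases, plan the conversion carefully. ALTER TABLE is DDL, so each statement commits implicitly and cannot be grouped or rolled back with START TRANSACTION and COMMIT. Converting the character set also rebuilds the table and blocks writes to it while it runs, so convert large tables a few at a time during a maintenance window:

-- Each ALTER TABLE commits on its own; no transaction wrapper is needed
-- First batch of tables
ALTER TABLE large_table1 CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE large_table2 CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Second batch, run once the first has finished and been checked
ALTER TABLE large_table3 CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
-- etc...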
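Because every ALTER TABLE commits on its own, one way to work through a long script is to send it to the server one statement at a time and stop at the first failure. The sketch below assumes the script holds one statement per line, as the generated file does. The function name run_statements and the RUNNER variable are hypothetical; RUNNER defaults to the mysql client and is expected to pick up credentials from your own option file.

```shell
#!/bin/sh
# RUNNER is the command that receives each statement on stdin.
RUNNER=${RUNNER:-mysql}

# Hypothetical helper: execute a one-statement-per-line script
# sequentially, logging progress to stderr and stopping on the first error.
run_statements() {
    script=$1
    n=0
    while IFS= read -r stmt; do
        [ -n "$stmt" ] || continue      # skip blank lines
        n=$((n + 1))
        echo "[$n] $stmt" >&2
        # $RUNNER is left unquoted so that options in it are word-split.
        if ! printf '%s\n' "$stmt" | $RUNNER; then
            echo "statement $n failed, stopping" >&2
            return 1
        fi
    done < "$script"
    echo "$n statement(s) executed" >&2
}
```

Tables converted before a failure stay converted, so after fixing the problem you can remove the completed lines from the script and run it again.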

After conversion, verify the character sets with:

SELECT 
  TABLE_NAME, 
  COLUMN_NAME, 
  CHARACTER_SET_NAME, 
  COLLATION_NAME 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'your_database_name' 
AND CHARACTER_SET_NAME NOT IN ('utf8', 'utf8mb3');  -- newer MySQL 8.0 releases report utf8 as utf8mb3

This should return an empty result set if all conversions succeeded.
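To turn this check into a pass/fail gate inside a migration script, the output of the verification query can be piped into a small shell function. assert_no_rows is a hypothetical name; it simply treats every non-blank input line as a column that still needs converting.

```shell
#!/bin/sh
# Hypothetical gate: succeed only if stdin contains no non-blank lines.
assert_no_rows() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    remaining=$(grep -c . || true)
    if [ "$remaining" -gt 0 ]; then
        echo "$remaining column(s) not converted" >&2
        return 1
    fi
    echo "all character columns converted"
}
```

It would be used as mysql -B -N -e "<verification query>" | assert_no_rows, where -B -N suppress the table borders and header row so that only data rows are counted.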


When working with international applications, converting database character sets to UTF-8 becomes essential. The naive approach of manually altering each table and column quickly becomes impractical for databases with numerous tables.

After some research, I found that MySQL's CONVERT TO CHARACTER SET syntax changes the table's default character set and converts every character column (CHAR, VARCHAR, TEXT and so on) in a single statement:

ALTER TABLE database.table_name 
CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

To generate these statements for all tables in a database, we can query the information_schema:

SELECT DISTINCT CONCAT(
    'ALTER TABLE `', 
    TABLE_SCHEMA, '`.`', TABLE_NAME,  
    '` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;'
) 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'your_database_name';

1. First, generate the conversion statements to a file:

mysql -B -N --user=username --password=your_password -e \
"SELECT DISTINCT CONCAT('ALTER TABLE ', TABLE_SCHEMA, '.', TABLE_NAME, \
' CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;') \
FROM information_schema.COLUMNS \
WHERE TABLE_SCHEMA = 'your_database';" > alter_script.sql

2. Then execute the generated script:

mysql --user=username --password=your_password < alter_script.sql

Performance Impact: The conversion rebuilds each table, so large tables can take a long time and block writes while they are being converted. Run it during a low-traffic period.

Backup First: Always create a database backup before running mass alterations.

mysqldump -u username -p your_database > backup.sql

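If you need to modify specific columns only, generate MODIFY statements instead. MODIFY redefines the whole column, so the query below re-adds NOT NULL where needed; DEFAULT values and comments are not carried over and must be added to the generated statements by hand:

SELECT CONCAT(
    'ALTER TABLE `', 
    TABLE_SCHEMA, '`.`', TABLE_NAME, 
    '` MODIFY `', COLUMN_NAME, '` ', COLUMN_TYPE, 
    ' CHARACTER SET utf8 COLLATE utf8_general_ci',
    IF(IS_NULLABLE = 'NO', ' NOT NULL', ''), ';'
)
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_database'
AND CHARACTER_SET_NAME NOT IN ('utf8', 'utf8mb3');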