Recently I needed to bring a spreadsheet of data into a MySQL database table using the phpMyAdmin web front-end. It had been a while since I had last done this, and some of the import screen’s options didn’t seem to make sense. Why was the default value separator a semicolon? Was there something I had forgotten about the CSV file format? Time for some research. I found very little useful information online, and decided to muddle through it and document the process for the next time it comes up. Here goes.
Note: Currently phpMyAdmin’s CSV import is restricted to one table per CSV file, so keep this in mind when planning your import.
From phpMyAdmin’s main screen, select a database, then select a table, then select the Import tab, and you’ll see the screen below.
Most of these options are probably self-explanatory, but the semicolon default for “Fields terminated by” really threw me for a loop. Why was this there? I found the answer here. From the piece:
Unfortunately, there is no formal specification of the CSV format. The Microsoft Excel implementation is the most widely used and it has become a de facto standard, but the variations are almost endless.
… CSV stands for “comma-separated values,” but most of the spreadsheet applications let the user select the field delimiter from a list of several different characters when saving or exporting data to a CSV file. Furthermore, in a Windows system, when you save a spreadsheet in Excel as a CSV file, Excel will use as the field delimiter the default list separator of your system’s locale, which happens to be a semicolon for several European languages.
So the semicolon is just an artifact of European phpMyAdmin development. Glad we got that cleared up. The next step is to specify your CSV file. There are several potential gotchas in CSV formatting (prior to importing), so we’ll first walk through the process of making the CSV file from Excel (though a text editor or even OpenOffice could easily suffice).
You have to make sure that the column order of your spreadsheet matches the field order of your database. This is because there is no mechanism in the import tool to assign specific spreadsheet columns to specific table columns. It’s first come, first served. If you attempt to load in a CSV with 5 columns into a MySQL table with 6 columns you generate an error. Also, if you don’t make sure the order of fields/columns is the same you’ll have data from your CSV going into the wrong table fields. Two methods exist to accommodate this:
1) CSV Column Padding
This means that in your spreadsheet to make sure there is a column for each MySQL database field, even if that means the column is full of empty data. You’ll have to make sure your spreadsheet’s cell values are legitimate values in the corresponding database field (e.g. boolean values are 0 or 1, not just blank; date values are 0000-00-00 00:00:00, not just blank, etc.). Note: For auto_increment field types, make sure the cells contain a zero value, not just a blank cell.
In the example below you can see the database table structure with field names, order, default values and other attributes (auto_increment, I’m looking at you).
Next is an example of the corresponding Excel sheet (right-most columns removed for screen readability). Note that the ID column (left-most) has zeros for the values. These are necessary, but will be ignored by that field’s auto-incrementing nature (see field definition in above screenshot). Note how the various varchar-style fields can be blank in the XLS file, but that booleans and dates must have values present.
Pay close attention to the postcode column below. Note how leading zeros in the spreadsheet have to be accommodated by providing a leading apostrophe in Excel. The first example cell’s contents below are actually ‘04605, not just 04605. Without this leading apostrophe, the value would come into the database’s postcode varchar field as 4605, not what we want at all.
Observe how the column order matches the field order in the MySQL database. ID is first, followed by name, alias, con_position, etc. This is critical to ensuring the correct data goes into the correct database field.
Once your XLS file is properly padded and ordered, strip out the header line, and save as a comma-separated CSV file.
2) Specifying Column Names
There is a method of specifying which named columns to import (assuming you wish to exclude some). Simply place their names, separated by commas, in the “Column names” field of the import tab. This assumes that you have created an XLS file similar to option 1 above, but that you left the header row in place before exporting as a CSV file. The import wizard will attempt to match and import only the columns you specifically named.
For most use cases here in the United States, you’ll want to use the following options:
Fields terminated by ,
Fields enclosed by ”
Fields escaped by
Lines terminated by auto
With your options set, go ahead and select your CSV file to import and click Go. If successful, you’ll receive a similar message to the one below, letting you know how many records were imported.
I didn’t cover all the options here, as you can see. While it’s true that there are several ways to skin a cat, the above methodology works for most use cases that we’ve come across, and there will always be exceptions. Just wanted to get this out there in case it might help someone (or if I need a refresher in six months).
Did I get something wrong? Do you have a better way? Drop us a line in the comments below and let us know your thoughts.