What is a UTF-8 CSV and Why Should I Care?

UTF-8, or “Unicode Transformation Format, 8-bit,” is a marketing operations pro’s best friend when it comes to data imports and exports. It refers to how a file’s character data is encoded when files move between systems. This article talks specifically about UTF-8 data as it pertains to Salesforce and Pardot, but the same principles apply to imports and exports from other CRMs and marketing automation systems such as HubSpot and Zoho. Read on to see why UTF-8 matters when it comes to CSV files.

You might have noticed that when you export reports from Salesforce, it gives you the option of a UTF-8 encoded Excel or CSV file.

You can also save an Excel file as a UTF-8 encoded CSV when you do a “Save As” operation; Excel offers it alongside three other types of CSV. The takeaway from this blog post is ALWAYS ALWAYS ALWAYS use the “CSV UTF-8” save-as type when creating import files for Pardot or Salesforce.

What isn’t commonly understood is that a UTF-8 encoded CSV is the preferred import format for Pardot! This fact is not mentioned anywhere, even in the considerations for importing documentation. But it’s absolutely essential: if there’s any chance at all that your dataset contains either of the following, you MUST use UTF-8 for the special characters to import properly:

Accented characters
Non-English alphabet characters

Most commonly this occurs in people’s names, which is the absolute worst place for characters to be mis-encoded. Nobody who enters their name as Inès wants to be greeted by an email that says “Hello In?s”. But that is exactly what will happen if you don’t pay attention to your imports.

Here is a side-by-side example of what Greek and Japanese characters look like in an original Excel file (on the left), and what happens when that Excel file is saved and then re-opened as CSV UTF-8 (in the middle) and as regular CSV (on the right).
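The same effect can be reproduced outside Excel. Here is a minimal Python sketch, assuming the “regular” CSV option writes the file in a legacy Windows code page. I use cp1252 here, which is an assumption; the actual code page depends on the machine’s regional settings. The helper name `save_and_reopen` is made up for illustration.

```python
# Simulate saving text in a given encoding and re-opening it.
# Assumption: "regular" CSV uses a legacy code page such as cp1252.

def save_and_reopen(text: str, encoding: str) -> str:
    """Encode text as if saving to disk, then decode it with the same encoding."""
    # errors="replace" substitutes "?" for characters the encoding cannot store
    raw = text.encode(encoding, errors="replace")
    return raw.decode(encoding)

samples = ["Inès", "Ελένη", "山田"]

for name in samples:
    utf8_result = save_and_reopen(name, "utf-8")
    legacy_result = save_and_reopen(name, "cp1252")
    print(f"{name!r}: UTF-8 -> {utf8_result!r}, cp1252 -> {legacy_result!r}")

# A cp1252 file that is later read as UTF-8 also breaks accented Latin letters:
# the è byte is not valid UTF-8 and becomes U+FFFD, the replacement character.
misread = "Inès".encode("cp1252").decode("utf-8", errors="replace")
print(repr(misread))
```

In this sketch UTF-8 round-trips every sample unchanged, while the legacy code page keeps the accented Latin name but turns the Greek and Japanese names into question marks. The last two lines show the other common failure: a legacy file that the receiving system reads as UTF-8.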

What’s sneaky about this problem is that you could be happily working away on an Excel file and everything looks fine. Then when you’re finished, you save it as a CSV to import into Pardot without paying attention to the encoding, and José becomes Jos?. You don’t realize it because you never opened the CSV file after you created it!
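If you would rather not rely on eyeballing the file, a quick check before import can catch this. The following is a rough sketch, not a Pardot or Salesforce tool: the function name `find_encoding_problems`, the column names, and the “letter followed by a question mark” rule are all my own assumptions, and the rule is only a heuristic best applied to name-like columns.

```python
import csv
import io
import re
from pathlib import Path

# Heuristic (assumption): a letter followed by "?" (as in "Jos?") or a U+FFFD
# replacement character usually means the text was mis-encoded earlier.
SUSPICIOUS = re.compile(r"[^\W\d_]\?|\ufffd")

def find_encoding_problems(path, columns):
    """Return a list of human-readable problems found in the CSV at path."""
    raw = Path(path).read_bytes()
    try:
        # utf-8-sig also accepts the byte order mark that Excel's
        # "CSV UTF-8" option writes at the start of the file
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 at byte {exc.start}"]

    problems = []
    reader = csv.DictReader(io.StringIO(text, newline=""))
    # Row 1 is the header; this numbering assumes one record per line
    for row_number, row in enumerate(reader, start=2):
        for column in columns:
            value = row.get(column) or ""
            if SUSPICIOUS.search(value):
                problems.append(f"row {row_number}, {column}: {value!r}")
    return problems
```

Calling something like `find_encoding_problems("import.csv", ["First Name", "Last Name"])` (hypothetical file and column names) returns an empty list when nothing looks wrong. A “?” that was already baked into the file cannot be repaired this way; the check only tells you to go back to the original source.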

Pardot saves records with characters in the UTF-8 format, which means that special characters or alphabets entered on a Pardot form will save with the original characters intact. Pardot provides UTF-8 encoded CSVs for all its export files. Salesforce, as mentioned, will export a UTF-8 CSV file as an option, and if you export an Excel file it will preserve special characters as well.

Here are the most common scenarios where not paying attention to UTF-8 encoding will come back to bite you:

Exporting records from Pardot to do a data append or fix operation and neglecting to save the file for re-import as CSV UTF-8.
Exporting records from Salesforce to import into Pardot for a record add, data append or fix operation and neglecting to export the file as UTF-8 or save the import file as UTF-8. (Note: this presents somewhat less of a problem depending on how your record sync is set up. Any field where “use the most recently updated record” is the rule could be impacted by a faulty upload.)
Importing a list from a third-party source such as a trade show, event, or partner. (With proper permission-based sourcing, of course!)
Transferring data due to decommissioning another marketing system and moving to Pardot.
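
In several of these scenarios you end up holding a CSV that some other system already wrote in a legacy encoding. If you know what that encoding is, the file can be rewritten as UTF-8 before import. Here is a minimal sketch; the function name `reencode_to_utf8` is hypothetical and the default of cp1252 is an assumption you should replace with the real source encoding.

```python
from pathlib import Path

def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Rewrite a CSV from a known legacy encoding to UTF-8 with a byte order mark."""
    # Strict decoding raises an error on bytes the source encoding does not define
    text = Path(src).read_bytes().decode(source_encoding)
    # utf-8-sig writes the byte order mark that helps Excel detect UTF-8
    Path(dst).write_bytes(text.encode("utf-8-sig"))
```

This only helps while the original characters are still in the file. Once a character has been saved as a literal “?”, no re-encoding will bring it back.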

The moral of the story and my mantra to all my clients: ALWAYS ALWAYS ALWAYS just use UTF-8 and you won’t have to worry about it.

A bonus side note for Pardot users: since Pardot encodes the database entries in UTF-8, you can use any language in an automation rule condition, Engagement Studio logic branch, or dynamic list, and you can even mix languages in the same field if you are using a semicolon-separated list. For example, if you wanted to create an automation rule to look for all records with the title “manager” in any language, you could create a semicolon-separated list of the word for manager in each language you wanted, including scripts such as Japanese, Chinese and Korean, whose characters take multiple bytes in UTF-8.
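
To illustrate the idea (and only the idea), here is a small Python sketch of matching a title against a mixed-language, semicolon-separated list. It is not how Pardot implements its rules; the function `title_matches`, the case-insensitive “contains” matching, and the sample terms are all assumptions made for the example.

```python
def title_matches(title: str, terms: str) -> bool:
    """Return True if title contains any term from a semicolon-separated list."""
    # Split on semicolons, trim whitespace, and ignore empty entries
    candidates = [t.strip().casefold() for t in terms.split(";") if t.strip()]
    folded = title.casefold()
    return any(term in folded for term in candidates)

# Hypothetical list: English, Spanish, Japanese, Chinese and Korean terms
manager_terms = "manager; gerente; マネージャー; 经理; 매니저"

print(title_matches("Marketing Manager", manager_terms))
print(title_matches("営業マネージャー", manager_terms))
print(title_matches("Engineer", manager_terms))
```

Because every term is stored as Unicode text, the comparison works the same way whichever script a term is written in.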
