
Is there a way to determine the charset used for a given shapefile?

Mike T
Matthew Finlay



There are two ways for programs to determine the character set of a shapefile.

  • +1 That link to a dBase file format page is great. However, AFAIK, codepages were never included in the dBase III format. The reference there is to a FoxPro extension of the format, which suggests not all .dbf files are going to have codepage info in them (or, if they do, it might be a result of garbage bytes appearing in a free area of the header). But if you can dig this information out, it's still a good start for a trial-and-error search. BTW, welcome to our site! – whuber Jul 20 '11 at 13:33
  • Some python dbf libraries can read the codepage if information is there. – Paulo Scardine Aug 06 '12 at 19:19
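Both hints can also be read without a dbf library. Below is a minimal sketch, assuming the two sources are the .cpg sidecar file and the code-page (language driver ID) byte in the .dbf header, both of which come up elsewhere in this thread. The byte sits at offset 29 of the header, counting from zero, and a value of 0 means no code page was recorded. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

# Offset of the code page (language driver ID) byte in a dBase header,
# counted from zero.
LDID_OFFSET = 29


def read_cpg(dbf_path: str) -> Optional[str]:
    """Return the encoding named in the sidecar .cpg file, or None if absent."""
    cpg_path = Path(dbf_path).with_suffix(".cpg")
    if not cpg_path.exists():
        return None
    # A .cpg file is a short text file holding a name such as "UTF-8".
    return cpg_path.read_text(encoding="ascii", errors="replace").strip()


def read_ldid(dbf_path: str) -> int:
    """Return the raw code page byte from the .dbf header (0 means unset)."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)  # the fixed part of a dBase header is 32 bytes
    if len(header) < 32:
        raise ValueError(f"{dbf_path} is too short to be a dBase file")
    return header[LDID_OFFSET]


def charset_hints(dbf_path: str) -> Tuple[Optional[str], int]:
    """Collect both hints; neither is guaranteed to be present or correct."""
    return read_cpg(dbf_path), read_ldid(dbf_path)
```

As whuber points out, the header byte may be missing or meaningless in files that never supported it, so treat both values as starting points for a trial-and-error search rather than as a definitive answer.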

Trial and error. Try opening the .dbf file in MS Excel or OpenOffice with different character set settings until the text displays correctly.

Look at this post for more clues: https://stackoverflow.com/questions/319095/how-do-i-determine-the-character-set-of-a-string
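The same trial and error can be scripted. Below is a minimal sketch, assuming the standard dBase header layout. It decodes the raw record bytes with each encoding from a hypothetical candidate list and collects the non-ASCII characters each one produces, so you can see which one reads correctly. Keep in mind that latin-1 maps every byte to some character: a failed decode rules an encoding out, but a successful one does not confirm it.

```python
import struct
from typing import Dict, Optional, Sequence

# Candidate encodings to try (an assumption: extend or reorder for your data).
CANDIDATES = ("utf-8", "latin-1", "cp1252", "cp850", "cp437", "cp1250", "cp1251")


def record_bytes(dbf_path: str) -> bytes:
    """Return the raw bytes of all records stored in a .dbf file."""
    with open(dbf_path, "rb") as f:
        data = f.read()
    # dBase header fields (little-endian): record count as uint32 at offset 4,
    # header length as uint16 at offset 8, record length as uint16 at offset 10.
    n_records, header_len, record_len = struct.unpack("<IHH", data[4:12])
    return data[header_len:header_len + n_records * record_len]


def try_encodings(dbf_path: str,
                  candidates: Sequence[str] = CANDIDATES) -> Dict[str, Optional[str]]:
    """Map each candidate to the non-ASCII characters it decodes to,
    or to None if the bytes are invalid in that encoding."""
    raw = record_bytes(dbf_path)
    results: Dict[str, Optional[str]] = {}
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None  # this encoding is ruled out
            continue
        # Keep only the characters worth eyeballing, capped at 40.
        results[enc] = "".join(ch for ch in text if ord(ch) > 127)[:40]
    return results
```

Print the result for your file (for example, loop over try_encodings("roads.dbf").items(), where roads.dbf is a placeholder name) and pick the encoding whose accented characters look right.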

Pablo
  • If nothing is known about the encoding, it's worthwhile to try latin1 or UTF-8 first. – krlmlr Mar 29 '15 at 16:00
  • I have opened the .dbf in MS Excel and the characters show up normally. How can I see which encoding Excel is using or detecting, so that I can set it in QGIS? – user3386170 May 24 '18 at 18:36

The file utility can guess the encoding of a text file. If there is no .cpg file, use ogr2ogr to convert the .dbf to CSV; the conversion preserves the original encoding:

ogr2ogr -f CSV file.csv file.dbf
file file.csv

Example output:

file.csv: ISO-8859 text

I have tested this with two of the most common encodings, UTF-8 and latin1. It works out of the box on Ubuntu; I'm not sure about OS X. I'm not aware of a file utility on Windows.

NOTE: If there is a corresponding .cpg file that indicates the encoding, ogr2ogr will honor it and the output will be in UTF-8. In that case, if the CSV output looks right, you know that the information in the .cpg file is accurate.
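Where file is not available (such as on Windows), a rough stand-in can be written in Python. This is a simplified sketch, not libmagic's actual algorithm. It reports ASCII if no byte is above 0x7F and UTF-8 if the bytes decode as UTF-8. Otherwise it uses the 0x80-0x9F range to separate ISO-8859 text from Windows code pages such as cp1252, in the spirit of file's "ISO-8859 text" and "Non-ISO extended-ASCII text" labels. The function names are hypothetical.

```python
def guess_text_encoding(data: bytes) -> str:
    """Roughly classify bytes the way file(1) labels text files.

    Simplified sketch: libmagic performs many more checks than this.
    """
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # 0x80-0x9F are control codes in ISO-8859-x but printable characters in
    # Windows code pages such as cp1252, so their presence suggests the latter.
    if any(0x80 <= b <= 0x9F for b in data):
        return "Non-ISO extended-ASCII"
    return "ISO-8859"


def guess_file(path: str) -> str:
    """Read a whole file and classify its bytes."""
    with open(path, "rb") as f:
        return guess_text_encoding(f.read())
```

Run it on the CSV produced by the ogr2ogr command, for example print(guess_file("file.csv")). Like file, it only distinguishes broad families: "ISO-8859" does not tell you which ISO-8859 part applies.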

krlmlr

Another table for converting the 29th byte of a *.dbf file to a code page: http://webhelp.esri.com/arcpad/8.0/referenceguide/index.htm#locales/task_code.htm
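A table like that can be turned into a lookup that feeds a decoder. Below is a minimal sketch covering only a handful of well-known DOS and Windows language driver IDs, mapped to Python codec names. Anything not listed, including values specific to ESRI software, should be checked against the full table linked above. LDID_TO_CODEC and codec_for_ldid are hypothetical names.

```python
from typing import Optional

# Partial mapping of dBase language driver IDs (header byte at offset 29,
# counting from zero) to Python codec names. Deliberately incomplete.
LDID_TO_CODEC = {
    0x01: "cp437",   # DOS USA
    0x02: "cp850",   # DOS Multilingual
    0x03: "cp1252",  # Windows ANSI
    0x64: "cp852",   # Eastern European MS-DOS
    0x65: "cp866",   # Russian MS-DOS
    0xC8: "cp1250",  # Eastern European Windows
    0xC9: "cp1251",  # Russian Windows
    0xCA: "cp1254",  # Turkish Windows
    0xCB: "cp1253",  # Greek Windows
}


def codec_for_ldid(ldid: int) -> Optional[str]:
    """Return a codec name, or None if the byte is 0 (unset) or not listed."""
    return LDID_TO_CODEC.get(ldid)
```

For example, a header byte of 0x03 maps to cp1252, so the raw bytes b"caf\xe9" from such a file decode to "café".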