
I am trying to import a .csv file as a layer in QGIS and keep getting a number of errors that I don't understand. Is there any documentation of QGIS errors available? Or can you point me towards a description of what QGIS sees as a valid .csv file? To me, the file looks ok in different editors.

The errors I get are (translated into English, so maybe not exact):

x data sets discarded because of invalid format

y data sets discarded because of invalid geometric definitions

the following lines were not loaded into QGIS because of errors:

invalid sentence format in line xyz, invalid x or y fields in line yxz, ...

there are zyx additional errors in the file.

I tried to find help on the internet, but I did not succeed. Now I don't know where to start because there are so many errors.

The problem exists in QGIS 2.0 (Dufour) and 2.8 (Wien), and the errors look the same to me in both. I am trying to import a .csv file that contains text, a timestamp, and latitude and longitude values.

The file header looks like this:

ID§$%text§$%timestamp§$%latitude§$%longitude 

As you can see, I chose a set of delimiters that are - hopefully - not present in any Tweet ;) At least for the few example lines QGIS shows when importing the file, delimiting the field worked.
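With this many errors, one place to start is to check the file yourself before importing it, so you know which lines are broken and why. The following is a minimal Python sketch, not QGIS's own parser. It assumes the literal multi-character delimiter `§$%` from the header above, exactly five fields per record, and latitude/longitude in the fourth and fifth columns; `check_lines` is a hypothetical helper name. It splits on the whole sequence, so if QGIS interprets a custom delimiter differently, its field counts may differ from this check.

```python
# Hypothetical pre-flight check, NOT QGIS's parser. It only mimics two kinds of
# reported problems: wrong number of fields and unusable x/y (lon/lat) values.
DELIMITER = "§$%"      # assumed: the literal delimiter sequence from the header
EXPECTED_FIELDS = 5    # assumed: ID, text, timestamp, latitude, longitude


def check_lines(lines, delimiter=DELIMITER, expected=EXPECTED_FIELDS):
    """Return (line_number, problem) tuples for suspicious data lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if number == 1:
            continue  # skip the header row
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != expected:
            problems.append((number, f"expected {expected} fields, got {len(fields)}"))
            continue
        try:
            lat, lon = float(fields[3]), float(fields[4])
        except ValueError:
            problems.append((number, "latitude/longitude is not a number"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append((number, "latitude/longitude out of range"))
    return problems
```

To run it on the real file, pass it the lines of the file opened with `open(path, encoding="utf-8")`. A tweet containing a line break will show up here as two records with the wrong number of fields.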

The texts contain emojis that are shown as, e.g., in my bash terminal with locale "de_DE.UTF-8". So maybe they are not proper UTF8?
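One way to test whether the file really is valid UTF-8 is to decode it line by line in strict mode and report the lines that fail. This is a minimal Python sketch; `find_invalid_utf8` is a hypothetical helper name. Emojis that are correctly encoded as UTF-8 pass this check, so an empty result suggests the encoding is not the culprit.

```python
def find_invalid_utf8(path):
    """Hypothetical helper: return 1-based line numbers that are not valid UTF-8."""
    bad_lines = []
    # Read raw bytes so that nothing is decoded (or silently replaced) yet.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")  # strict mode raises on malformed byte sequences
            except UnicodeDecodeError:
                bad_lines.append(number)
    return bad_lines
```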

Edited by PolyGeo; asked by anjuta.
  • You can add info about the .csv file to help us understand the problem better. – Ale Jun 10 '15 at 08:46
  • Are you using a .csvt file as a template? – landocalrissian Jun 10 '15 at 11:45
  • @Ale What kind of information do you need? – anjuta Jun 10 '15 at 11:48
  • @Chris R no, I am not using any template – anjuta Jun 10 '15 at 11:48
  • Is it possible that QGIS somehow has a problem with smileys/emojis? – anjuta Jun 10 '15 at 13:24
  • Can you post a few sample lines from the csv if not the whole thing? It is entirely possible that emoji characters present in a text string are causing issues. It's also possible the text string is too long or also contains commas that are throwing off the delimiting character detection. If there is a header row present, invalid field names could also cause issues. Without more specific information it's very difficult to diagnose any problems. – Chris W Jun 10 '15 at 19:14
  • Thank you for this information. The data is taken from Tweets, so I don't know if I am allowed to post excerpts on the internet. But I'll update my questions with regard to the rest of your comment. What do you mean by "the string is too long"? Is there anywhere I can look these things up, like what QGIS accepts and what it doesn't? – anjuta Jun 11 '15 at 09:36
  • There are dedicated characters for separating fields, btw: http://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text – bugmenot123 Jun 11 '15 at 12:58
  • @bugmenot123 Wow, very useful. I did not know! Here also is a link that explains it: https://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/ But how do I tell QGIS to use these delimiters?? – anjuta Jun 11 '15 at 15:06
  • BTW, I just found this (old) documentation: http://docs.qgis.org/1.8/en/docs/user_manual/plugins/plugins_delimited_text.html (the plugin is not used anymore, I think, but the functionality is still there) – anjuta Jun 11 '15 at 15:14
  • You said your data consisted of a string with other info. The maximum string length depends on the storage format; for example, a shapefile string field can only be 254 characters long. Since tweets are limited to 140 characters, that shouldn't be an issue. It sounds like you have identified the delimiting characters as the issue. – Chris W Jun 11 '15 at 18:13

2 Answers


I found the solution for my problem. There were (probably) two "errors" in my .csv.

  1. QGIS seems to detect that my data field called "text" contained exactly that: text. Thus, it treated it as a string (character) data type. Within such a field, however, double quotation marks are only allowed at the beginning and end, or not at all. Since my Tweets contained quotes within them, this seemed to cause the problem. Single quotation marks (apostrophes) are not a problem, so I simply converted the double quotes to single ones, and since then it works. On the other hand, it does not make a difference whether the whole field is enclosed in quotation marks or not.

  2. QGIS treats a sequence of characters that you enter as "custom delimiters" like a logical OR, not AND: each character acts as a delimiter on its own. Thus, when I used "$%&" as delimiters, it also split the record at a single "&". So my choice of delimiters also caused a problem. I got the idea to use an Alt-key character (pressing the Alt key and a letter at the same time) as a delimiter from Branco's answer here. I used Alt+s, and QGIS imported the file without any problems.
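To illustrate the difference between the two interpretations of a custom delimiter, here is a small Python sketch. It does not call QGIS; the sample record and both function names are made up, and `split_any_char` only mimics the "logical OR" behaviour described in point 2.

```python
import re


def split_any_char(text, chars):
    """Hypothetical helper: split wherever ANY single character of `chars` occurs."""
    # re.escape stops characters such as $ from being read as regex syntax.
    return re.split("[" + re.escape(chars) + "]", text)


def split_literal(text, sequence):
    """Hypothetical helper: split only where the complete sequence occurs."""
    return text.split(sequence)


record = "1$%&text with a & sign$%&52.5$%&13.4"
print(split_literal(record, "$%&"))
# ['1', 'text with a & sign', '52.5', '13.4']
print(split_any_char(record, "$%&"))
# ['1', '', '', 'text with a ', ' sign', '', '', '52.5', '', '', '13.4']
```

Because the tweet text itself contains a `&`, the character-by-character split breaks the text field in two and produces empty fields between adjacent delimiter characters, which is consistent with the errors described in the question.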

Also, for the record, and in case other German-speaking users run into the same error: "Ungültiges Satzformat" translates to "invalid record format", which gets you on the right track when searching for a solution. I ultimately got the idea that the problem had something to do with strings and quotation marks from the comments on this question.
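An alternative to converting the double quotes is to let a CSV library escape them. Python's standard `csv` module, for example, wraps a field in quotes and doubles any quote inside it (the RFC 4180 convention). This sketch assumes the QGIS Delimited Text import is configured with `"` as both the quote and the escape character; check those settings in your QGIS version. `to_csv` is a hypothetical helper name, the sample row is made up, and `csv.writer` only accepts a single-character delimiter.

```python
import csv
import io


def to_csv(rows, delimiter=","):
    """Hypothetical helper: serialize rows, escaping embedded quotes by doubling them."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes a field only if it contains the delimiter, a quote
    # character or a line break; with doublequote=True (the default), an
    # embedded " is written as "".
    writer = csv.writer(buffer, delimiter=delimiter,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(["ID", "text", "timestamp", "latitude", "longitude"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv([[1, 'She said "hi", then left', "2015-06-10 08:46", 52.5, 13.4]]), end="")
# ID,text,timestamp,latitude,longitude
# 1,"She said ""hi"", then left",2015-06-10 08:46,52.5,13.4
```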

Answered by anjuta.

What editor are you using to save the CSV?

I find that saving the CSV data through LibreOffice rather than Microsoft Excel helps resolve this in certain cases, particularly on Mac OS X.

Answered by a_merko.
  • I created the file with Java; no editor was involved. But I looked at it in LibreOffice Calc, Kate and Scite. – anjuta Jun 10 '15 at 11:49
  • Could it be the encoding? My default encoding is latin-8, which for some reason sometimes gives me problems with CSV files in QGIS 2.8 (the same problem you listed). Perhaps try importing the files with UTF-8 as the encoding? – a_merko Jun 10 '15 at 12:53
  • Good idea, but I just checked: My default encoding in QGIS 2.8 is UTF-8. This should be the same in Java 8. I also just opened the file in Scite and set encoding to UTF-8. Looks ok. – anjuta Jun 10 '15 at 13:23