
I'm working in QGIS, and my current method to change CSVs to shapefiles is to import them to QGIS, right click and save as shapefile. For large CSVs (over 100 million rows) it can take quite some time to save. Is there a quicker, more efficient way to do this? Perhaps using GeoPandas?

asked by Joshua Kidd
  • Run ogr2ogr; it is installed in the bin folder of your QGIS folder. http://stackoverflow.com/questions/22497541/ogr2ogr-or-arcpy-for-csv-to-shapefile – klewis Jan 19 '17 at 16:44
  • How do I create a sample vrt file? – Joshua Kidd Jan 19 '17 at 17:03
  • Do you need to store some data in your final shapefile, or do you simply need to create the points (as geometries)? – mgri Jan 19 '17 at 17:21
  • Just create the points. – Joshua Kidd Jan 19 '17 at 17:22
  • 1
    The vrt file is just a bit of xml, but think of it as a text file. Using the example here: http://gis.stackexchange.com/questions/127518/convert-csv-to-kml-or-geojson-etc-using-ogr2ogr-or-equivalent-command-line-tool you just change the layer name to the base name of your csv file (mycoords.csv to mycoords). Though, this may not be ideal for many points, I'm converting a 1 million record file to shp and its taken at least 20 min and not done. Hasn't crashed though! – Dave-Evans Jan 19 '17 at 17:40
  • 1
    I convert 9 million points from csv to shape with ogr2ogr under Windows in about 5 minutes. Note: The DBF is then > 3 GB. But I think 100 million points are something much for a shapefile. – Mike -EZUSoft- Jan 20 '17 at 08:11
  • I agree with @Mike. I have run similar analyses with 1 million rows and it took about 40 seconds to create the points. The main problem is that you don't only need to read the data quickly, but also to create the point geometries, which takes considerable additional time. My suggestion is to split your csv into smaller chunks and then process them separately. However, I can post my solution (1 million rows in 40 seconds, i.e. 100 million rows in more than 1 hour) if it is of interest. – mgri Jan 20 '17 at 11:24
  • That's alright, I'll continue with ogr2ogr on one large file. – Joshua Kidd Jan 20 '17 at 14:59
  • What is the structure of your csv file, can you add a sample of it to your question? – BERA Feb 28 '22 at 10:01
  • 1
    If you convert 100M points to a file ending in .shp you don't have a shapefile, which is limited to 2^31-1 bytes. See https://gis.stackexchange.com/questions/348557/how-does-shapefile-2gb-limit-equate-to-70-million-points/348568#348568 – Vince Feb 28 '22 at 12:16
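
As a sketch of the VRT approach described in the comments above (assuming a file mycoords.csv with coordinate columns named x and y; adjust the layer name, column names and EPSG code to your data), the VRT is just a small XML file placed next to the CSV:

<OGRVRTDataSource>
  <OGRVRTLayer name="mycoords">
    <SrcDataSource>mycoords.csv</SrcDataSource>
    <GeometryType>wkbPoint</GeometryType>
    <LayerSRS>EPSG:4326</LayerSRS>
    <GeometryField encoding="PointFromColumns" x="x" y="y"/>
  </OGRVRTLayer>
</OGRVRTDataSource>

ogr2ogr can then do the conversion in one call:

ogr2ogr -f "ESRI Shapefile" mycoords.shp mycoords.vrt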

1 Answer


I think your process is not efficient because of the limitations of the shapefile format (see this). QGIS can create shapefiles with millions of features, but maybe there are other ways.

If you are familiar with R, you can do this programmatically; it's no big deal. In fact, you can run an R script through QGIS. Here I propose two methods:

  1. Reading all .csv files within a folder and exporting them all to shapefiles in another folder (these CSVs could be splits of your original .csv file, made manually or with a macro).
  2. CSV --> GPKG (a GeoPackage allows many more features per layer: see here)

So here is method 1:

You can transform CSVs into shapefiles quickly, provided that these CSVs are splits of your original data.

library(landtools)
library(mapview)
library(stringr)
library(sf)
#-------------------------------------------------------------------------------
# METHOD 1 --> CSV'S TO SHP
#-------------------------------------------------------------------------------
# List csv files
csvs <- list.files("./data/", '.csv', full.names = TRUE)

Read all csv files

dfs <- lapply(csvs, function(x) read.csv(x, sep = ';'))

Check where your coordinates are stored (for the AtriCoords parameter below)

names(dfs[[1]])

TRANSFORM CSV (data.frame) INTO SPATIAL

layers <- lapply(dfs, function(x) dftopoints(x, "EPSG:25829", "EPSG:25829", AtriCoords = c("x", "y")))

Export the layers, keeping the original file names

outnames <- paste0(str_replace(basename(csvs), '.csv', ''), '.shp')
outfiles <- paste0('./output/', outnames)
for(i in 1:length(layers)){
  st_write(layers[[i]], outfiles[[i]], append = FALSE)
}

Check the results

mapview(layers[[1]], zcol = 'Altura..m.')

(Screenshot: the resulting point layer rendered in mapview.)

And here method 2:

Notice that, as I don't have one of your .csv files, I just create a huge data.frame to imitate one; in your case, x will be your .csv. Also, you can get rid of the system.time() calls, which were only there to measure how long it takes to create a .gpkg with more than 100 million rows.

# Create a huge data.frame to imitate a huge csv
x <- dfs[[1]]
while(dim(x)[[1]] < 100000000){x <- rbind(x, x)}
str(x)

TRANSFORM CSV (data.frame) INTO SPATIAL

system.time({ lay <- dftopoints(x, "EPSG:25829", "EPSG:25829", AtriCoords = c("x", "y")) })

Export file

system.time({
  fname <- paste0("./output/", 'try01.gpkg')
  st_write(lay, fname, append = FALSE)
})

NOTES:

  • The code needs the landtools library just for one function (dftopoints). So, if you can't install the library, or don't want to, you can get the function here. The function needs a data.frame, an input and an output EPSG code (it allows transformations), and the names of the attributes holding the lon/lat coordinates.
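
In case you would rather avoid the landtools dependency altogether, here is a minimal sketch of what a dftopoints-style helper could look like, built only on sf and assuming the signature described in the note above (this is a reconstruction, not the library's actual code):

dftopoints <- function(df, epsg_in, epsg_out, AtriCoords = c("x", "y")) {
  # Build point geometries from the two coordinate columns
  pts <- sf::st_as_sf(df, coords = AtriCoords, crs = epsg_in)
  # Reprojection only matters when the output CRS differs from the input one
  sf::st_transform(pts, crs = epsg_out)
}

With that defined, the calls in both methods above should work unchanged.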
answered by César Arquero Cabral