4

What is a good way to convert a large (100MB) CSV file to GeoJSON on the command line, preferably without GDAL? I tried Mapbox's csv2geojson but it ran out of memory.

PolyGeo
Steve Bennett

3 Answers

6

Since I'm comfortable with Node, I came up with this stream-oriented script. It's easy to transform values in the process.

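#!/usr/bin/env node
const fs = require('fs');
const turf = require('@turf/turf');
// Named exports as in csv-parse v5+ and stream-transform v3+;
// older releases export the function directly.
const { parse } = require('csv-parse');
const { transform } = require('stream-transform');

// Convert one parsed CSV row into one line of output (a GeoJSON Feature).
function rowToFeature(record, callback) {

    // Adjust these field names according to your file
    if (!(record.Latitude && record.Longitude)) {
        // Skip rows without coordinates. The callback must still be
        // called, otherwise the stream can stall.
        return callback(null, null);
    }
    const feature = turf.point([ +record.Longitude, +record.Latitude ], {
        myfield: record['My Input Field'],
        joinedfield: record.field1 + record.field2
        // ... etc
    });
    callback(null, JSON.stringify(feature) + '\n');
}

fs.createReadStream('infile.csv')
    .pipe(parse({ columns: true }))
    .pipe(transform(rowToFeature))
    .pipe(fs.createWriteStream('outfile'));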

Before running:

npm init
npm install csv-parse stream-transform @turf/turf
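
Note that the script writes one Feature per line rather than a single FeatureCollection. If a downstream tool needs a FeatureCollection, a minimal sketch of a wrapper stream is shown here. The names `createEnvelope` and `featureCollectionWrapper` are hypothetical, and the sketch assumes each incoming chunk holds exactly one serialized feature.

```javascript
const { Transform } = require('stream');

const HEADER = '{"type":"FeatureCollection","features":[\n';

// Hypothetical helper: tracks whether a feature has been written yet and
// returns the text to emit for each feature and at the end of input.
function createEnvelope() {
    let count = 0;
    return {
        next(featureJson) {
            // The first feature follows the header; later ones follow a comma.
            const prefix = count === 0 ? HEADER : ',\n';
            count += 1;
            return prefix + featureJson;
        },
        end() {
            // Emit the header here if no feature was ever written.
            return (count === 0 ? HEADER : '\n') + ']}\n';
        }
    };
}

// Hypothetical Transform: wraps a stream of serialized features
// (assumed to be one string per chunk) in a single FeatureCollection,
// without holding the whole collection in memory.
function featureCollectionWrapper() {
    const envelope = createEnvelope();
    return new Transform({
        writableObjectMode: true,
        transform(chunk, encoding, callback) {
            const featureJson = String(chunk).trim();
            // Ignore empty chunks so no stray commas are written.
            callback(null, featureJson ? envelope.next(featureJson) : undefined);
        },
        flush(callback) {
            // Close the features array once the input has ended.
            this.push(envelope.end());
            callback();
        }
    });
}
```

To try it, add `.pipe(featureCollectionWrapper())` just before the final `fs.createWriteStream` step.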
Steve Bennett
  • +1 As a 'Node guy', I like this solution. There is a useful module in the NPM repository called dbGeo, which usually takes data from a DB query, but the input could equally be a parsed CSV file, and it does a similar thing to this. – MappaGnosis Jan 30 '18 at 09:03
3

This could be a job for CSVKit, my new favourite tool:

https://source.opennews.org/articles/eleven-awesome-things-you-can-do-csvkit/

5. Turn a CSV with latitude and longitude columns into GeoJSON

Is your data geographic, like in our previous example? Then don’t stop with JSON—go one step further and make it GeoJSON!

csvjson --lat latitude --lon longitude --key slug --crs EPSG:4269 --indent 4 geo.csv > geo.json

[geo.json]

{
    "type": "FeatureCollection",
    "bbox": [
        -95.31571447849274,
        32.299076986939205,
        -95.28174,
        32.35066
    ],
    "features": [
        {
            "type": "Feature",
            "id": "dcl",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -95.30181,
                    32.35066
                ]
            },
            "properties": {
                "place": "Downtown Coffee Lounge"
            }
        },
        {
            "type": "Feature",
            "id": "tyler-museum",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -95.28174,
...

Reference: csvjson

DPSSpatial_BoycottingGISSE
1

Another Node solution, which requires less coding than Steve Bennett's excellent answer, is the NPM module csv2geojson.

There is also another module called csv3geojson. It purports to be faster than csv2geojson and supports streaming, so it may be better suited to this particular use case. I can't vouch for it, as I have not used it, but it is certainly worth a look.

Also see "Converting CSV file to GeoJSON while preserving data types?", which is a similar Q&A.

PolyGeo
MappaGnosis
  • Thanks. As I mentioned in the Q, csv2geojson ran out of memory for me. I tried to increase the maximum heap size with an environment variable but I'm not sure it had any effect. A streaming version would be likely to work. – Steve Bennett Jan 31 '18 at 21:48
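
One way to check whether a heap-size setting took effect is to print the limit that V8 reports for the running process. The sketch below uses Node's built-in `v8` module. The comment does not name the environment variable, so treating it as `NODE_OPTIONS` with `--max-old-space-size` is an assumption, and `heapLimitMb` is a hypothetical helper name.

```javascript
const v8 = require('v8');

// Hypothetical helper: returns the V8 heap size limit of the current
// process in megabytes.
function heapLimitMb() {
    return v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
}

// Run this once with and once without the setting (assumed here to be
// NODE_OPTIONS=--max-old-space-size=...) and compare the two values.
console.log(`Heap limit: ${heapLimitMb().toFixed(0)} MB`);
```

If the reported limit does not change between the two runs, the setting is probably not reaching the Node process.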