I'm trying to map a CSV with over a million lat/lon points in the US, with no metadata. I've never tried to map this much data and want to know what steps I should take. I know it's too much data for QGIS, so I tried splitting it into 100,000-line CSVs, but that left me with a lot of files I didn't know what to do with. I'm thinking it might be best to aggregate the points by census block, but I don't know the best way to do that using JavaScript (I've never used Python) or command-line tools. Any suggestions?
3 Answers
I would recommend using PostgreSQL/PostGIS, since it is natively supported by QGIS, has some handy built-in functions to join with other spatial data (like census tracts), and the use of a database should limit memory issues with large datasets. My recommended set of steps is below.
To execute SQL commands you can use either PGAdmin or QGIS: the former will give you more informative errors on queries, while the latter will let you load the results of queries as layers on a map. To access the latter, go to Database > DB Manager > DB Manager and click on the second button (the SQL window).
1. Get started setting up PostGIS; there are also quick installers.
2. Create a table in your database using the CREATE TABLE command (example), in either the QGIS SQL window or PGAdmin's SQL window.
3. Import your CSV with either a COPY SQL command or PGAdmin's built-in import function, by right-clicking on your table in PGAdmin and clicking Import... (the latter can be delicate, so I'd recommend COPY for larger datasets).
4. Add a geometry column to your table by executing the following SQL in either PGAdmin or the QGIS SQL window:
   ALTER TABLE some_table ADD COLUMN geom geometry(Point, 4326);
5. Create the point geometries using something like:
   UPDATE yourtable SET geom = ST_SetSRID(ST_MakePoint(x, y), 4326);
6. Display a subset of the data by using a SELECT statement with something like LIMIT 50000.
7. Or join to census data with something like:
   SELECT c.gid, c.geom
   FROM census c
   INNER JOIN yourdata ON ST_Within(yourdata.geom, c.geom)
Function references:
ST_Within
ST_MakePoint
ST_SetSRID
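Since the stated goal is to aggregate the points by census block, the join in the last step can be extended with a GROUP BY to count points per polygon. A sketch, reusing the table and column names assumed above (census, yourdata, geom — adjust to your schema):

```sql
-- A spatial index makes the join far faster on a million points
CREATE INDEX yourdata_geom_idx ON yourdata USING GIST (geom);

-- One row per census polygon with the number of points falling inside it
-- (LEFT JOIN keeps polygons that contain no points, with a count of 0)
SELECT c.gid, c.geom, COUNT(yourdata.geom) AS point_count
FROM census c
LEFT JOIN yourdata ON ST_Within(yourdata.geom, c.geom)
GROUP BY c.gid, c.geom;
```

Loading that query's result through DB Manager gives you one aggregated polygon layer instead of a million points, which QGIS can render comfortably.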
- Thanks for the reply. I tried step 3 but I can't edit the table in QGIS. Is there another way to add a column without using a plugin? – user63623 Dec 09 '15 at 19:38
- @user63623 I updated my answer to include an extra step that should address your issue. – raphael Dec 09 '15 at 20:32
- Thanks, I think there's a step missing between 1 and 2. When you import a CSV layer into QGIS you need to save it as a shapefile to create a database, right? It freezes when I try that, though. I also tried using PGAdmin but I can't find the import function. – user63623 Dec 09 '15 at 22:14
- In PGAdmin you would first have to CREATE your table, using an SQL query like this: https://stackoverflow.com/questions/9826833/create-table-in-postgresql, with all your columns from the CSV properly defined. Then you can right-click on the table name > Import... to access the Import wizard, or use a COPY command: http://www.postgresql.org/docs/current/static/sql-copy.html. I'll update the answer. – raphael Dec 10 '15 at 02:47
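To make that comment concrete, a minimal sketch of the CREATE TABLE plus COPY route — the column names (lon, lat) and the file path are assumptions, so match them to your actual CSV:

```sql
-- Assumed two-column CSV of coordinates; adjust names/types to your file
CREATE TABLE yourdata (
    lon double precision,
    lat double precision
);

-- Server-side COPY; use psql's \copy instead if the CSV lives on your client machine
COPY yourdata (lon, lat) FROM '/path/to/points.csv' WITH (FORMAT csv, HEADER true);
```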
I recently worked with a dataset of 1.4 million points imported from a CSV. I made sure to delete any irrelevant fields in the CSV first. It worked fine for me, even though some processes took a while to execute. (QGIS 2.12, 64-bit Windows 7, 8 GB RAM)
An R candidate fwiw, in pseudocode:
library(rgdal) ## for spatial export; also attaches sp, which provides coordinates()
library(readr) ## for fast file read
x <- as.data.frame(read_csv("file.csv")) ## coordinates()<- wants a plain data.frame
names(x) ## inspect the as-yet-unknown columns
coordinates(x) <- c("x", "y") ## your coordinate names may be different
writeOGR(x, ".", layer = "filepoints", driver = "MapInfo File")
This will give you a filepoints.tab in the working directory you can read with QGIS. Or choose "ESRI Shapefile" to create a filepoints.shp, or whatever format it is you need. QGIS uses GDAL much like rgdal does, so there's a lot of overlap. There are analogous Python mechanisms.
Since you don't have any metadata, you could also set the coordinate reference system after assigning coordinates(x), with proj4string(x) <- CRS("+proj=something +etc"), but we can only guess at what it should be for your data.