46

I have a data set of values over a km grid in the continental U.S. The columns are "latitude", "longitude", and "observation", e.g.:

"lat"    "lon"     "yield"
 25.567  -120.347  3.6 
 25.832  -120.400  2.6
 26.097  -120.454  3.4
 26.363  -120.508  3.1
 26.630  -120.562  4.4

or, as an R data frame:

mydata <- structure(list(lat = c(25.567, 25.832, 26.097, 26.363, 26.63), 
lon = c(-120.347, -120.4, -120.454, -120.508, -120.562), 
yield = c(3.6, 2.6, 3.4, 3.1, 4.4)), .Names = c("lat", 
"lon", "yield"), class = "data.frame", row.names = c(NA, -5L))

(the full data set can be downloaded as csv here)

The data are output from a crop model (intended to be on) a 30km x 30km grid (from Miguez et al 2012).

enter image description here

How can I convert these to a raster file with GIS - related metadata such as map projection?

Ideally the file would be a text (ASCII?) file because I would like for it to be platform and software independent.

Abe
  • 1,933
  • 3
  • 22
  • 29
  • As CSV, this already is a "text file" in ASCII. Also, as it uses no projection at all, there may be little relevant metadata to add (datum, mostly). Could you be a little more specific about what kind of output you seek and what you intend to do with it? – whuber Feb 08 '12 at 22:42
  • I would like to make it as easy as possible for someone to use the data to with a variety of mapping software (ArcGIS, Google Maps, Grass, R, etc.) so as to facilitate reuse, e.g. by not requiring additional conversion steps. Based on the Wikipedia page of GIS file formats, I infer 1) a "raster" file should have rownames with latitude and column names of longitude, like an image and that 2) metadata should include geographical information (location of a corner, area covered by data). – Abe Feb 08 '12 at 23:30
  • This is one of the best references I came across on R and GIS. Thank you very much! Can you please provide another csv with lat and long with correct proj4string? I will really appreciate that. –  Sep 10 '14 at 21:25
  • @Nandini Not sure what the correct proj4string is, I suspect lambert conformal: proj +proj=lcc +lat_1=50.0 +lat_2=50.0 +units=km +lon_0=-145.5 +lat_0=1.0. I am not sure what you are asking for with respect to another csv file - how would it differ from the one linked in to in the question, or that would be produced by the accepted answer? – Abe Sep 18 '14 at 02:20
  • for me doesn 't work! I don' t know what to put on "x" and "y" to "coordinates(pts)=~x+y" –  Apr 10 '15 at 19:21

3 Answers3

48

Several steps required:

  1. You say it's a regular 1km grid, but that means the lat-long aren't regular. First you need to transform it to a regular grid coordinate system so the X and Y values are regularly spaced.

    a. Read it into R as a data frame, with columns x, y, yield.

    pts = read.table("file.csv",......)
    

    b. Convert the data frame to a SpatialPointsDataFrame using the sp package and something like:

    library(sp)
    library(rgdal)
    coordinates(pts)=~x+y
    

    c. Convert to your regular km system by first telling it what CRS it is, and then spTransform to the destination.

    proj4string(pts)=CRS("+init=epsg:4326") # set it to lat-long
    pts = spTransform(pts,CRS("insert your proj4 string here"))
    

    d. Tell R that this is gridded:

    gridded(pts) = TRUE
    

    At this point you'll get an error if your coordinates don't lie on a nice regular grid.

  2. Now use the raster package to convert to a raster and set its CRS:

    r = raster(pts)
    projection(r) = CRS("insert your proj4 string here")
    
  3. Now have a look:

    plot(r)
    
  4. Now write it as a geoTIFF file using the raster package:

    writeRaster(r,"pts.tif")
    

This geoTIFF should be readable in all major GIS packages. The obvious missing piece here is the proj4 string to convert to: this will probably be some kind of UTM reference system. Hard to tell without some more data...

whuber
  • 69,783
  • 15
  • 186
  • 281
Spacedman
  • 63,755
  • 5
  • 81
  • 115
  • +1 Thanks for laying out the workflow. Note that the data are available at the link provided in the question: take a look. You will discover, alas, that some of your assumptions about them are incorrect. (In particular, I hunted for any documentation about the projection used to create the grid but found none. And it's a strange projection, as you can see by plotting the points.) – whuber Feb 09 '12 at 13:38
  • Its very close to being a UTM system, but none of the ones I've tried are close enough to a regular grid for R to grid them. I'm half tempted to loop through R's entire epsg database.... – Spacedman Feb 09 '12 at 14:23
  • That would be a real tour de force if you could discover the projection that way! The key is to find an effective and efficient criterion to determine when these 7,000+ points are close enough to lying on a regular grid (because it's possible they might not form a perfect grid in any standard projection at all). For a quick run through the database, it should be enough to compare a small number of distances, such as an east-west distance in the north part of the grid to an east-west distance in the south part. That ought to eliminate the vast majority of candidates quickly. – whuber Feb 09 '12 at 14:32
  • If gridded(pts)=TRUE works in R then it will have found the projection - unfortunately when doing spTransform a large number of times it crashes R (with a seg fault). – Spacedman Feb 09 '12 at 16:21
  • What I'm suggesting is that gridded(pts) will never be TRUE, even when you have hit upon the correct projection. Often, grids like these are not perfectly regular. We have to search for projections in which the points are close enough to being on a regular grid. I'm sorry to hear about the crashing, but thanks for trying--it would have been interesting to see this approach succeed. – whuber Feb 09 '12 at 16:37
  • I was gambling on the coordinates being the transformation in one direction of an original regular grid. But yes, data is never exactly what you want! – Spacedman Feb 09 '12 at 17:01
  • @whuber I don't quite understand the confusion - is this not precisely a 30km x 30km grid? If not, it was intended to be. Also, each point represents a mean for the grid cell with each point at the center of a cell, and any transformation that moves a point less a few km would not contribute a meaningful amount to existing uncertainty (derived from, e.g., parameter and model uncertainty). – Abe Feb 10 '12 at 14:38
  • When these data are plotted, Abe, they do not form a grid in lat-lon. In fact, it's impossible to form a regular square grid on the sphere (except for the case of 8 points on vertices of a cube). The grid can be regular only with respect to some projected coordinate system, whose metric distortions prevent most distances from being precisely 30 km. The way in which the rows curve on a lat-lon plot show the projection is unusual. I looked all over the source Web site but could find no documentation at all of the projection used. – whuber Feb 10 '12 at 14:40
  • Just to explain this even more. Consider a 30km square grid with 10 cells per side. At the equator, that covers 300km of equator. Now move north, keeping your 300km length. Not a problem, until you get close to the north pole, where 360 degees round the north pole is 300km. Step further north, and you can't fit 10 30km-spaced cells in. – Spacedman Feb 10 '12 at 14:49
  • On the WGS spheroid, north-south geodesic distances on this grid range from 30.0502 km to 32456.2 km and east-west distances range from 29.9226 to 32.552 km. – whuber Feb 10 '12 at 14:57
  • 3
    I ran through all the (default) projections supported by Mathematica 8. It found a projection in which the points really do seem to fall on a grid: Alaska State Plane (1983) Zone 10! This is a Lambert Conformal Conic projection. I believe it is EPSG 26940. If you modify this to center it approximately at longitude -106, the points form a pretty good grid. – whuber Feb 10 '12 at 15:32
  • @spacedman Your explanation is helpful, but is it relevant to the present example, in which range(lat) = [25, 49] degrees? – Abe Feb 10 '12 at 22:07
  • @whuber how did you formulate the query sent to Mathematica? – Abe Feb 10 '12 at 22:10
  • 1
    Abe, do you mean to read the Web page? It was r = Import[ "https://ebi-forecast.igb.illinois.edu/bety/miscanthusyield.csv", "Data"];. You can get a quick plot of the points afterwards via data = Rest[r]; ListPlot[data[[;; , {3, 2}]]] (or ListPointPlot3D[data[[;; , {3, 2, 4}]]]). For reprojections, start with the help on GeoGridPosition, then make some intelligent guesses and cross-references to figure out what's going on :-). BTW, @Spacedman's explanation really is relevant: the metric distortion from 25 to 49 degrees equals cos(25)/cos(49) = 1.38; that's substantial. – whuber Feb 10 '12 at 22:14
35

Since the question was last answered, a much easier solution exists by using the raster package's rasterFromXYZ function that encapsulates all of the steps necessary (including specification of the CRS string).

library(raster)
rasterFromXYZ(mydata)
Abe
  • 1,933
  • 3
  • 22
  • 29
Lucas Fortini
  • 593
  • 4
  • 9
  • 1
    Apologies to the tireless @Spacedman, who has assisted me often, but I think this answer deserves to inherit the jolly green tick. – geotheory Jul 31 '14 at 12:57
  • @geotheory I would select this answer, its a great function, but it seems to be very slow on the dataset I am using (linked to in the op) – Abe Apr 15 '15 at 14:52
  • 1
    ... in fact it choked because it took my ~400KB file and created a file in /tmp/ that was ~19GB when I ran out of disk space. – Abe Apr 15 '15 at 15:12
  • There's probably an n-squared process in there somewhere. You might be able to group the points data by a broad grid, rasterise each group individually and then merge() the results together. – geotheory Apr 15 '15 at 21:22
  • With all due respect, but this answer is way better than Spacedman's. – Ghost May 25 '17 at 19:39
  • 1
    I'm in the same boat, with irregular lat/lon values but with a known grid and this function seems easy but it just doesn't work with irregular values. – B.Quaink May 06 '20 at 11:04
2

After reading the comments (not too old) under the great answer from Lucas, I thought it would be imperative to reply with an up-to-date solution even if the question is too old now.

If your lat lon in the data are equally spaced (to form consistent cell size of the resultant raster) then the simple rast(x, type="xyz") function from the (comparatively) new package terra may be used due to the following 2 reasons

  1. Better Performance
  2. Creation of multidimensional raster in case of more than 3 columns like x, y, yield1, yield2, yield3 etc., or a time series
library(terra)
mydata <- data.frame(lat = c(25.5, 25.5, 25.5, 25.75, 25.75, 25.75), 
          lon = c(-120.3, -120.55, -120.8, -120.3, -120.55, -120.8), 
          yield = c(3.6, 2.6, 3.4, 3.1, 4.4, 5.3))

myras <- rast(mydata, type="xyz")

myras #class : SpatRaster #dimensions : 3, 2, 1 (nrow, ncol, nlyr) #resolution : 0.25, 0.25 (x, y) #extent : 25.375, 25.875, -120.925, -120.175 (xmin, xmax, ymin, ymax) #coord. ref. :
#source(s) : memory #name : yield #min value : 2.6 #max value : 5.3

Robert Hijmans
  • 10,683
  • 25
  • 35
Badar Munir
  • 156
  • 9