1

I have a raster data set with an irregular outline (every value outside is NaN) in Python (loaded from rasterio v.1.3a3 ):

enter image description here

I'd like to extract its outline to a vector shape, such as a Shapely geometry or a WKT/WKB string (red in the figure below) so that, if I use that shape to clip the larger original region which is entirely containing the raster here above, I will end up on the exact same raster data set.

enter image description here

Is there a smarter and more direct way than extracting all the indices of the NaN values to a numpy array (but, hmm, this would be a bunch of useless data as only the ones 'at the border of where the values are' are of interest), figuring a way to apply the affine transform from the raster metadata to that numpy array (how? I can't find a natural way to do so here: https://rasterio.readthedocs.io/en/latest/api/rasterio.transform.html), and finally converting this to a shape so that's it's correctly georeferenced?

I was also thinking using OpenCV but this would probably be overkill...

swiss_knight
  • 10,309
  • 9
  • 45
  • 117
  • See here: https://gis.stackexchange.com/a/216797/57277 – GeoMonkey Apr 25 '22 at 06:02
  • @NathanThomas : would it be possible to stay inside a Python environment? The raster only exists as a numpy array, if I can avoid file system I/O it could be nice. – swiss_knight Apr 25 '22 at 17:44
  • There are GDAL python bindings so you should technically be able to run the same commands but via the api: https://gdal.org/python/ – GeoMonkey Apr 25 '22 at 19:50

1 Answers1

2

You can use rasterio.features.dataset_features

import rasterio as rio
from rasterio.features import dataset_features
import geopandas as gpd

with rio.open(raster_path) as ds: shapes = list(dataset_features(ds, bidx=1, as_mask=True, geographic=False, band=False))

for shape in shapes:
    print(shape['geometry'])

#Or as a GeoDataFrame with rio.open(raster_path) as ds: gdf = gpd.GeoDataFrame.from_features(dataset_features(ds, bidx=1, as_mask=True, geographic=False, band=False)) gdf.to_file(output_path)

user2856
  • 65,736
  • 6
  • 115
  • 196
  • Thanks for you answer, unfortunately, the vector.geojson file that was generated this way is not recognized as valid when I try to load it into QGIS for example. Would it be possible to keep it in memory, e.g. as a Shapely geometry or a WKT string, or a dict (JSON-like structure) in order to, see what's happening within the Python console itself, prior to saving it to the disk, or better, to a PostGIS database? – swiss_knight Apr 25 '22 at 17:34
  • 1
    Yes, just access the shapes. Each of the shape elements is a GeoJSON feature. – user2856 Apr 25 '22 at 21:45
  • Did this solve your question, @swiss_knight? If so, please mark it as solution :) – bugmenot123 Feb 12 '23 at 09:16
  • No, actually, it does not really answer the question. First because of what I said in a previous comment: "the raster only exists as a numpy array, if I can avoid file system I/O it could be nice." and then because the output is not a vector object but a for loop on a list that prints some dictionaries whereas I really need a vector object. I ended up using a solution based on GeoPandas that works very well. If I find the time to post it here... I will. – swiss_knight Feb 12 '23 at 15:27
  • Actually, it does answer your question as written. You then commented that you wanted to keep the vector objects in memory, so I edited my answer to show you how to do that, i.e accessing the features/shapes. If you want a geodataframe, just put the features into one, see edit. Nowhere in your question does it specify the raster "only exists as a numpy array". In fact you actually say that "I have a raster data set" which is not a numpy array. Comments are not for adding additional information to your question. – user2856 Feb 13 '23 at 04:45