I have more than 1,000 CityGML files with building heights that I want to convert to raster masks for deep learning. So far, I have converted the GML data to GeoJSON, with each feature stored as a LineString along with its respective ID. Now I want to convert the polylines to polygons (QGIS: “Lines to polygons”), but it is not working correctly:
I am well aware that similar questions exist on StackOverflow, but none of the proposed solutions have worked for me. I tried the following:
1. Polygonize > works, but no attributes are saved, and I need them for the raster mask
2. Assign projection
3. Buffer lines with 0 m
4. Merge selected lines
5. Dissolve > the result is much better, but it still does not work for all lines
6. Extract vertices > points to path
7. Identification of gaps as shown here (Transforming lines to polygons not working in QGIS?) > gaps found
8. Select lines > Join multiple lines plugin
9. Fix geometries (no errors found) > Lines to polygons
10. Dissolve > Fix geometries > Lines to polygons (same result as 5)
I applied the workflow to a single file, but I need some form of automation, e.g. in Python, because I have so many files. The files also contain features with “NULL” geometries, which would need to be filtered out. The heights are stored in a separate CSV file. I’m new to Python.
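A minimal sketch of how such automation might look, using only the Python standard library. It skips features with NULL geometry, turns each LineString whose end points meet (within a small tolerance) into a Polygon while keeping its attributes, and attaches the height from the CSV. The property name `id`, the CSV columns `id` and `height`, the 0.01 m tolerance, and all function names are assumptions for illustration and would need to be adapted to the real data. Lines with larger gaps are only reported, not repaired.

```python
import csv
import io
import json
import math


def load_heights(csv_text, id_col="id", height_col="height"):
    """Map feature ID -> height. The column names are assumptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col]: float(row[height_col]) for row in reader}


def line_to_ring(coords, tol=0.01):
    """Return a closed ring if the line's ends meet within tol (map units), else None."""
    if len(coords) < 3:
        return None
    ring = [list(c) for c in coords]
    # Compare only x/y, since CityGML-derived coordinates may carry a z value.
    gap = math.dist(ring[0][:2], ring[-1][:2])
    if gap > tol:
        return None  # genuinely open line: leave it for manual inspection
    if gap > 0:
        ring.append(ring[0])  # close a tiny gap by repeating the first vertex
    return ring if len(ring) >= 4 else None


def lines_to_polygons(collection, heights, id_key="id", tol=0.01):
    """Convert closed LineString features to Polygon features, keeping attributes."""
    polygons, skipped = [], []
    for feat in collection.get("features", []):
        geom = feat.get("geometry")
        if not geom or geom.get("type") != "LineString":
            continue  # drops NULL geometries and non-line features
        props = dict(feat.get("properties") or {})
        ring = line_to_ring(geom["coordinates"], tol)
        if ring is None:
            skipped.append(props.get(id_key))
            continue
        # Height is None when the ID has no matching row in the CSV.
        props["height"] = heights.get(str(props.get(id_key)))
        polygons.append({
            "type": "Feature",
            "properties": props,
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": polygons}, skipped
```

For many files, the same functions could be called in a loop, for example over `pathlib.Path(folder).glob("*.geojson")`, reading each file with `json.load` and writing the result with `json.dump`. The list of skipped IDs shows which lines still have gaps larger than the tolerance.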
My test data (EPSG 25832) can be downloaded here.

