
I've written the PyQGIS code below. It loops through a ~30 MB shapefile and adds a new attribute whose values are calculated from another attribute.

Is it normal that it takes around 20 minutes to run?

Sometimes it doesn't finish at all: QGIS just keeps loading (I waited over 30 minutes) until I close it.

The shapefile has ~130,000 features and 27 attributes.

My laptop has 16 GB RAM and an Intel Core i5-1135G7.

Code:

from qgis.PyQt.QtCore import QVariant
from qgis.core import (QgsExpression, QgsExpressionContext, QgsExpressionContextUtils,
                       QgsField, QgsProject, QgsVectorDataProvider, edit)
from qgis.utils import iface

layer = QgsProject.instance().mapLayersByName("677_5334")[0]
iface.setActiveLayer(layer)

caps = layer.dataProvider().capabilities()

# Add the new field through the provider, then refresh the layer's field list
if caps & QgsVectorDataProvider.AddAttributes:
    res = layer.dataProvider().addAttributes([QgsField("Stockwerke", QVariant.Double)])
    layer.updateFields()

exp1 = QgsExpression('"HOEHEGEB"/3.5')

context = QgsExpressionContext()
context.appendScopes(QgsExpressionContextUtils.globalProjectLayerScopes(layer))

# Evaluate the expression for every feature inside one edit session
with edit(layer):
    for f in layer.getFeatures():
        context.setFeature(f)
        f["Stockwerke"] = exp1.evaluate(context)
        layer.updateFeature(f)

Updated code (unfortunately it doesn't run any faster):

from qgis.PyQt.QtCore import QVariant
from qgis.core import (QgsExpression, QgsExpressionContext, QgsExpressionContextUtils,
                       QgsField, QgsProject, QgsVectorDataProvider)
from qgis.utils import iface

layer = QgsProject.instance().mapLayersByName("675_5331")[0]
iface.setActiveLayer(layer)

caps = layer.dataProvider().capabilities()

if caps & QgsVectorDataProvider.AddAttributes:
    res = layer.dataProvider().addAttributes([QgsField("Stockwerke", QVariant.Double)])
    layer.updateFields()

visited_index = layer.fields().indexFromName("Stockwerke")
attr_map = {}

exp1 = QgsExpression('"HOEHEGEB"/3.5')

context = QgsExpressionContext()
context.appendScopes(QgsExpressionContextUtils.globalProjectLayerScopes(layer))

# Collect {feature_id: {field_index: value}} for every feature...
for f in layer.getFeatures():
    context.setFeature(f)
    attr_map[f.id()] = {visited_index: exp1.evaluate(context)}

# ...then write all values with one provider call, after the loop (not once per feature)
layer.dataProvider().changeAttributeValues(attr_map)
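Where changeAttributeValues(attr_map) is placed matters a great deal. If it is called inside the loop, every iteration hands the whole, still-growing attr_map to the provider again, so n features cause 1 + 2 + … + n = n(n+1)/2 value writes instead of n. For ~130,000 features that is roughly 8.45 billion writes instead of 130,000. The sketch below only counts the values handed over, using a hypothetical CountingProvider class as a stand-in; it is not a QGIS class, and the real cost per value in a shapefile provider is an assumption.

```python
from typing import Dict


class CountingProvider:
    """Hypothetical stand-in for a data provider: it only counts value writes."""

    def __init__(self) -> None:
        self.writes = 0

    def changeAttributeValues(self, attr_map: Dict[int, Dict[int, float]]) -> bool:
        # A real provider would write every (feature, field) pair in the map.
        self.writes += sum(len(values) for values in attr_map.values())
        return True


def call_inside_loop(n: int, field_index: int = 27) -> int:
    provider = CountingProvider()
    attr_map: Dict[int, Dict[int, float]] = {}
    for fid in range(n):
        attr_map[fid] = {field_index: fid / 3.5}
        provider.changeAttributeValues(attr_map)  # re-sends everything collected so far
    return provider.writes


def call_once_after_loop(n: int, field_index: int = 27) -> int:
    provider = CountingProvider()
    attr_map: Dict[int, Dict[int, float]] = {}
    for fid in range(n):
        attr_map[fid] = {field_index: fid / 3.5}
    provider.changeAttributeValues(attr_map)  # a single batched call
    return provider.writes


print(call_inside_loop(1000), call_once_after_loop(1000))  # 500500 1000
```

With only 1,000 features the per-iteration call already hands over 500 times more values; the gap grows linearly with the feature count.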

Asked by Jonas (edited by Ian Turton)
  • I cannot provide a detailed answer now, but I would use layer.dataProvider().changeAttributeValues() instead of edit(layer) and layer.updateFeature(). – Kadir Şahbaz Mar 10 '21 at 11:50
  • Please review this thread: https://gis.stackexchange.com/q/381174/29431 – Kadir Şahbaz Mar 10 '21 at 11:51
  • Also https://gis.stackexchange.com/questions/200997/is-there-a-faster-process-to-update-one-column-for-all-features/215464#215464 – Germán Carrillo Mar 10 '21 at 16:28
  • @KadirŞahbaz I updated the code as you suggested (see the updated code in the question), but it's just as slow. Do you have any idea why? Did I use the suggested method wrong? – Jonas Mar 11 '21 at 07:38
  • Copy your whole shapefile into a memory layer; that may speed up the process. – J. Monticolo Mar 11 '21 at 07:59
  • @J.Monticolo how do I do that? – Jonas Mar 11 '21 at 08:54
  • Select all your features, then Edit menu > Copy Features, then Edit menu > Paste as > memory layer. Your features will be held in memory, so processing may be faster. But if your code itself is slow, this won't solve your problem. – J. Monticolo Mar 11 '21 at 09:04
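Kadir Şahbaz's suggestion (one changeAttributeValues call instead of an edit session with per-feature updateFeature) can be pushed a little further: compute "HOEHEGEB"/3.5 directly in Python instead of evaluating a QgsExpression per feature, and tell the provider not to load geometries. This is a sketch, not a measured fix. It assumes HOEHEGEB is numeric, that 3.5 (metres) is the storey height from the question's expression, and that the Stockwerke field has already been added. The helper names storeys_from_height, build_attr_map and update_storeys are hypothetical; the qgis.core import sits inside update_storeys so the two pure helpers also run outside QGIS.

```python
from typing import Dict, Iterable, Optional, Tuple

STOREY_HEIGHT = 3.5  # assumed storey height in metres, from the expression "HOEHEGEB"/3.5


def storeys_from_height(height: Optional[float]) -> Optional[float]:
    """Return height / STOREY_HEIGHT, keeping a missing height as None (NULL)."""
    if height is None:
        return None
    return float(height) / STOREY_HEIGHT


def build_attr_map(
    rows: Iterable[Tuple[int, Optional[float]]], field_index: int
) -> Dict[int, Dict[int, Optional[float]]]:
    """Turn (feature_id, height) pairs into {feature_id: {field_index: storeys}}."""
    return {fid: {field_index: storeys_from_height(h)} for fid, h in rows}


def update_storeys(layer, field_name: str = "Stockwerke", height_field: str = "HOEHEGEB") -> bool:
    """Write the storey count for every feature with a single provider call (QGIS only)."""
    from qgis.core import NULL, QgsFeatureRequest  # available only inside QGIS

    field_index = layer.fields().indexFromName(field_name)
    request = QgsFeatureRequest()
    request.setFlags(QgsFeatureRequest.NoGeometry)  # the calculation needs no geometry
    request.setSubsetOfAttributes([height_field], layer.fields())  # read one attribute only
    rows = (
        (f.id(), None if f[height_field] == NULL else f[height_field])
        for f in layer.getFeatures(request)
    )
    return layer.dataProvider().changeAttributeValues(build_attr_map(rows, field_index))
```

In the QGIS Python console, after the field-adding step from the question, this could be called as update_storeys(QgsProject.instance().mapLayersByName("675_5331")[0]). Because it writes through the data provider rather than the edit buffer, there is no commit step and the change cannot be undone.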

1 Answer


You may use the "Refactor fields" processing tool as an alternative way to achieve your goal. I've made a standalone sample below; the key part is 'expression': '"pop_est" - 20000', which computes the pop_year field.

Replace that logic with your own expression, '"HOEHEGEB"/3.5'.

Let us know whether it speeds up your processing.

import processing
from qgis.core import QgsProcessingFeedback, QgsProject, QgsVectorLayer

input_path = '/vsicurl/https://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_admin_0_countries.geojson'
parameters = { 'INPUT': input_path, 'FIELDS_MAPPING':[{ 'expression': '"scalerank"', 'length': 0, 'name': 'scalerank', 'precision': 0, 'type': 2 },{ 'expression': '"labelrank"', 'length': 0, 'name': 'labelrank', 'precision': 0, 'type': 2 },{ 'expression': '"sovereignt"', 'length': 0, 'name': 'sovereignt', 'precision': 0, 'type': 10 },{ 'expression': '"sov_a3"', 'length': 0, 'name': 'sov_a3', 'precision': 0, 'type': 10 },{ 'expression': '"adm0_dif"', 'length': 0, 'name': 'adm0_dif', 'precision': 0, 'type': 2 },{ 'expression': '"level"', 'length': 0, 'name': 'level', 'precision': 0, 'type': 2 },{ 'expression': '"type"', 'length': 0, 'name': 'type', 'precision': 0, 'type': 10 },{ 'expression': '"admin"', 'length': 0, 'name': 'admin', 'precision': 0, 'type': 10 },{ 'expression': '"adm0_a3"', 'length': 0, 'name': 'adm0_a3', 'precision': 0, 'type': 10 },{ 'expression': '"geou_dif"', 'length': 0, 'name': 'geou_dif', 'precision': 0, 'type': 2 },{ 'expression': '"geounit"', 'length': 0, 'name': 'geounit', 'precision': 0, 'type': 10 },{ 'expression': '"gu_a3"', 'length': 0, 'name': 'gu_a3', 'precision': 0, 'type': 10 },{ 'expression': '"su_dif"', 'length': 0, 'name': 'su_dif', 'precision': 0, 'type': 2 },{ 'expression': '"subunit"', 'length': 0, 'name': 'subunit', 'precision': 0, 'type': 10 },{ 'expression': '"su_a3"', 'length': 0, 'name': 'su_a3', 'precision': 0, 'type': 10 },{ 'expression': '"brk_diff"', 'length': 0, 'name': 'brk_diff', 'precision': 0, 'type': 2 },{ 'expression': '"name"', 'length': 0, 'name': 'name', 'precision': 0, 'type': 10 },{ 'expression': '"name_long"', 'length': 0, 'name': 'name_long', 'precision': 0, 'type': 10 },{ 'expression': '"brk_a3"', 'length': 0, 'name': 'brk_a3', 'precision': 0, 'type': 10 },{ 'expression': '"brk_name"', 'length': 0, 'name': 'brk_name', 'precision': 0, 'type': 10 },{ 'expression': '"brk_group"', 'length': 0, 
'name': 'brk_group', 'precision': 0, 'type': 10 },{ 'expression': '"abbrev"', 'length': 0, 'name': 'abbrev', 'precision': 0, 'type': 10 },{ 'expression': '"postal"', 'length': 0, 'name': 'postal', 'precision': 0, 'type': 10 },{ 'expression': '"formal_en"', 'length': 0, 'name': 'formal_en', 'precision': 0, 'type': 10 },{ 'expression': '"formal_fr"', 'length': 0, 'name': 'formal_fr', 'precision': 0, 'type': 10 },{ 'expression': '"note_adm0"', 'length': 0, 'name': 'note_adm0', 'precision': 0, 'type': 10 },{ 'expression': '"note_brk"', 'length': 0, 'name': 'note_brk', 'precision': 0, 'type': 10 },{ 'expression': '"name_sort"', 'length': 0, 'name': 'name_sort', 'precision': 0, 'type': 10 },{ 'expression': '"name_alt"', 'length': 0, 'name': 'name_alt', 'precision': 0, 'type': 10 },{ 'expression': '"mapcolor7"', 'length': 0, 'name': 'mapcolor7', 'precision': 0, 'type': 2 },{ 'expression': '"mapcolor8"', 'length': 0, 'name': 'mapcolor8', 'precision': 0, 'type': 2 },{ 'expression': '"mapcolor9"', 'length': 0, 'name': 'mapcolor9', 'precision': 0, 'type': 2 },{ 'expression': '"mapcolor13"', 'length': 0, 'name': 'mapcolor13', 'precision': 0, 'type': 2 },{ 'expression': '"pop_est"', 'length': 0, 'name': 'pop_est', 'precision': 0, 'type': 2 },{ 'expression': '"gdp_md_est"', 'length': 0, 'name': 'gdp_md_est', 'precision': 0, 'type': 6 },{ 'expression': '"pop_est" - 20000', 'length': 0, 'name': 'pop_year', 'precision': 0, 'type': 2 },{ 'expression': '"lastcensus"', 'length': 0, 'name': 'lastcensus', 'precision': 0, 'type': 2 },{ 'expression': '"gdp_year"', 'length': 0, 'name': 'gdp_year', 'precision': 0, 'type': 2 },{ 'expression': '"economy"', 'length': 0, 'name': 'economy', 'precision': 0, 'type': 10 },{ 'expression': '"income_grp"', 'length': 0, 'name': 'income_grp', 'precision': 0, 'type': 10 },{ 'expression': '"wikipedia"', 'length': 0, 'name': 'wikipedia', 'precision': 0, 'type': 2 },{ 'expression': '"fips_10"', 'length': 0, 'name': 'fips_10', 'precision': 0, 'type': 10 },{ 
'expression': '"iso_a2"', 'length': 0, 'name': 'iso_a2', 'precision': 0, 'type': 10 },{ 'expression': '"iso_a3"', 'length': 0, 'name': 'iso_a3', 'precision': 0, 'type': 10 },{ 'expression': '"iso_n3"', 'length': 0, 'name': 'iso_n3', 'precision': 0, 'type': 10 },{ 'expression': '"un_a3"', 'length': 0, 'name': 'un_a3', 'precision': 0, 'type': 10 },{ 'expression': '"wb_a2"', 'length': 0, 'name': 'wb_a2', 'precision': 0, 'type': 10 },{ 'expression': '"wb_a3"', 'length': 0, 'name': 'wb_a3', 'precision': 0, 'type': 10 },{ 'expression': '"woe_id"', 'length': 0, 'name': 'woe_id', 'precision': 0, 'type': 2 },{ 'expression': '"adm0_a3_is"', 'length': 0, 'name': 'adm0_a3_is', 'precision': 0, 'type': 10 },{ 'expression': '"adm0_a3_us"', 'length': 0, 'name': 'adm0_a3_us', 'precision': 0, 'type': 10 },{ 'expression': '"adm0_a3_un"', 'length': 0, 'name': 'adm0_a3_un', 'precision': 0, 'type': 2 },{ 'expression': '"adm0_a3_wb"', 'length': 0, 'name': 'adm0_a3_wb', 'precision': 0, 'type': 2 },{ 'expression': '"continent"', 'length': 0, 'name': 'continent', 'precision': 0, 'type': 10 },{ 'expression': '"region_un"', 'length': 0, 'name': 'region_un', 'precision': 0, 'type': 10 },{ 'expression': '"subregion"', 'length': 0, 'name': 'subregion', 'precision': 0, 'type': 10 },{ 'expression': '"region_wb"', 'length': 0, 'name': 'region_wb', 'precision': 0, 'type': 10 },{ 'expression': '"name_len"', 'length': 0, 'name': 'name_len', 'precision': 0, 'type': 2 },{ 'expression': '"long_len"', 'length': 0, 'name': 'long_len', 'precision': 0, 'type': 2 },{ 'expression': '"abbrev_len"', 'length': 0, 'name': 'abbrev_len', 'precision': 0, 'type': 2 },{ 'expression': '"tiny"', 'length': 0, 'name': 'tiny', 'precision': 0, 'type': 2 },{ 'expression': '"homepart"', 'length': 0, 'name': 'homepart', 'precision': 0, 'type': 2 },{ 'expression': '"featureclass"', 'length': 0, 'name': 'featureclass', 'precision': 0, 'type': 10}], 'OUTPUT':'/tmp/refactorized.shp' }

feedback = QgsProcessingFeedback()

out = processing.run("native:refactorfields", parameters, feedback=feedback)

project = QgsProject.instance()
vl = QgsVectorLayer(out['OUTPUT'], "Refactored", "ogr")
project.addMapLayer(vl)
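For the question's 27-attribute shapefile, the FIELDS_MAPPING list does not have to be typed by hand: it can be generated from the layer's existing fields, plus one extra entry for the computed field. A sketch assuming each existing field is described as a (name, type, length, precision) tuple; build_fields_mapping is a hypothetical helper, and the type codes are the QVariant values already used in the sample (2 = Int, 6 = Double, 10 = String).

```python
from typing import Dict, List, Tuple

QVARIANT_DOUBLE = 6  # QVariant.Double, the same code the sample uses for 'gdp_md_est'


def build_fields_mapping(
    fields: List[Tuple[str, int, int, int]],
    new_name: str,
    new_expression: str,
    new_type: int = QVARIANT_DOUBLE,
) -> List[Dict[str, object]]:
    """Copy each (name, type, length, precision) field unchanged, then append one computed field."""
    mapping: List[Dict[str, object]] = [
        {"expression": f'"{name}"', "length": length, "name": name,
         "precision": precision, "type": ftype}
        for name, ftype, length, precision in fields
    ]
    mapping.append({"expression": new_expression, "length": 0, "name": new_name,
                    "precision": 0, "type": new_type})
    return mapping
```

Inside QGIS, the tuples could come from [(f.name(), f.type(), f.length(), f.precision()) for f in layer.fields()], and the result of build_fields_mapping(fields, 'Stockwerke', '"HOEHEGEB"/3.5') would replace the hand-written list, with 'INPUT': layer and 'OUTPUT' set as in the sample.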

Answered by ThomasG77
  • Thank you! I'll definitely try it out and will tell you on Monday whether it worked. – Jonas Mar 12 '21 at 16:25
  • UPDATE: It definitely sped things up significantly, so thank you very much :) But why is that? As far as I can tell, it doesn't do anything different from my code. – Jonas Mar 16 '21 at 09:45
  • Apart from the fact that it's C++ rather than Python code behind the scenes, I have no explanation. I didn't investigate your code in depth and preferred to go the alternative way, so I can't tell where or why it is slow. – ThomasG77 Mar 16 '21 at 11:01
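To see where the Python version actually spends its time, the loop can be run under Python's built-in profiler. A minimal sketch using the standard-library cProfile and pstats modules; profile_call is a hypothetical helper name, and the function handed to it would be the question's loop wrapped in a function.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def profile_call(func: Callable[..., Any], *args: Any, top: int = 15) -> Tuple[Any, str]:
    """Run func(*args) under cProfile; return its result and the top entries by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

For example, with the loop from the question wrapped in a hypothetical run_update(layer) function: _, report = profile_call(run_update, layer); print(report). Calls into QGIS's C++ methods, such as updateFeature or changeAttributeValues, should show up as their own entries, which helps separate time spent in Python from time spent in the provider.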