
I'm loading the Panama Papers into Neo4j using this tutorial, and the last step, loading the edges, has been running for 12 hours. This is the import statement:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:/path/all_edges.csv' AS csvLine
MATCH (n1 { id: toInt(csvLine.node_1)}),(n2 { id: toInt(csvLine.node_2)})
CREATE (n1)-[:ACCOC {role: csvLine.rel_type}]->(n2)

The edges file is 89MB. Is this normal? If not, how can I check on the status of this process?

2 Answers

1

The query itself is problematic.

It's not using labels in the match pattern, which means it has to perform an AllNodesScan to find n1 and another to find n2. So that's happening twice per row in your CSV file. An EXPLAIN of the query would have shown you this in the query plan.
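As a rough illustration of that check, you can prefix just the lookup part of the statement with EXPLAIN, which shows the plan without running anything. The literal ids 1 and 2 here are placeholders standing in for values from the CSV, not real data from the file.

```cypher
// EXPLAIN only plans the query; it does not execute it.
// The ids 1 and 2 are hypothetical stand-ins for csvLine.node_1 / csvLine.node_2.
EXPLAIN
MATCH (n1 { id: 1 }), (n2 { id: 2 })
RETURN n1, n2
```

With no label in the pattern, the plan should contain AllNodesScan operators, each followed by a filter on id.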

You need to add labels into these match patterns for n1 and n2, which would at least get you to two NodeByLabelScans per row, but to make this performant you need to also have an index on the relevant label for the id property.
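A minimal sketch of what that could look like follows. The label :Entity is an assumption, since the question doesn't say which label the nodes were loaded with, so substitute whatever your node import actually used. The index syntax shown is the older form that goes with toInt and USING PERIODIC COMMIT; newer Neo4j versions use a different CREATE INDEX form.

```cypher
// Assumption: nodes were created with a label, here hypothetically :Entity.
// Create the index first and let it come online before starting the import.
CREATE INDEX ON :Entity(id);

// Same import as in the question, with the label added to both patterns
// so each lookup can use the index instead of scanning every node.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:/path/all_edges.csv' AS csvLine
MATCH (n1:Entity { id: toInt(csvLine.node_1) })
MATCH (n2:Entity { id: toInt(csvLine.node_2) })
CREATE (n1)-[:ACCOC { role: csvLine.rel_type }]->(n2)
```

If the two ends of an edge can carry different labels, each label needs its own index on id, and each MATCH should use the matching label.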

0

No, that is not normal. No recent processor & disk should take 12 hours to do anything with 89MB.

Start with the basics. Is the process still consuming CPU? Is the disk active? What sort of reads & writes is it performing? Are there any messages in the DBMS or OS logs?

Michael Green