This a question about database modeling and PostgreSQL. This post partially answers the question but I need advice on more technical stuff: columns and query to extract the data.
I need to store administrative boundaries history within a given territory in order to be able to track boundaries movements back over the passed years.
Here is an example on how boundaries move:
- Martin Land area has always existed
- Bulwers Land area has always existed too
- Those areas were merged together in 2015:
- The new area was renamed as Martin-Bulwers
- The new area kept Bulwers Land code (Martin Land code becoming deprecated)
Here is the way I store the data:
gid | code | name | change_date
-----+--------+-----------------+-------------
1 | 86001 | Martin Land | 2000-01-01
2 | 86002 | Bulwers Land | 2000-01-01
3 | 86002 | Martin-Bulwers | 2015-01-01
I have set a default change_date to 2000-01-01 for historical situation of an area, i.e. before the first changement occurs.
Then, I have many different cases: areas that have remained unmerged, areas that have merged with others at different dates. This can produce the following example:
CREATE TEMPORARY TABLE foo AS
SELECT gid,code,area,change_date::date FROM ( VALUES
( 1,86001,'Martin Land' ,'2000-01-01' ),
( 2,86002,'Bulwers Land' ,'2000-01-01' ),
( 3,86002,'Martin-Bulwers','2015-01-01' ),
( 4,86003,'Coveral Land' ,'2000-01-01' ),
( 5,86004,'Big Tom Area' ,'2000-01-01' ),
( 6,86005,'Small Tom Area','2000-01-01' ),
( 7,86004,'Tom Land' ,'2016-01-01' )
) AS t(gid,code,area,change_date);
Then I have difficulties writing a query that could return the list of areas of a given year. I tried to rely on DISTINCT ON() clause but this does not the job I need...
For instance, the following query will return the following table:
SELECT DISTINCT ON (code) code, area, change_date
FROM myTable WHERE change_date < '2016-01-01'
ORDER BY code, change_date DESC ;
-- Result:
code | area | change_date
-------+-----------------+-------------
86001 | Martin Land | 2000-01-01
86002 | Martin-Bulwers | 2015-01-01
86003 | Coveral Land | 2000-01-01
86004 | Big Tom Area | 2000-01-01
86005 | Small Tom Area | 2000-01-01
This is partially right as before 2016-01-01, Big Tom Area and Small Tom Area had not merged together but Martin Land and Bulwers Land had! There were 5 areas in 2014, then 4 areas in 2015 and 3 in 2016.
In fact, I would need the following result:
code | area | change_date
-------+-----------------+-------------
86002 | Martin-Bulwers | 2015-01-01
86003 | Coveral Land | 2000-01-01
86004 | Big Tom Area | 2000-01-01
86005 | Small Tom Area | 2000-01-01
Is there another way to store these informations or to write the query?
2000-01-01? In your example above, how do we infer thatCoveral Landis separate from all the other sets? Because it both COMES BEFORE, and has a ID AFTER? – Evan Carroll Jan 17 '17 at 19:44