I have ~1000 fixed points representing data collection stations with multiple sensors, and a csv that is ~4.5GB of values in the format:
id pointid sensor measdatetime measvalue
50345 500 tempa 1/1/2011 13:55 72.81
50346 501 tempb 10/31/2011 14:00 75.65
55457 500 gaght 10/31/2011 01:00 6.56
(The data is similar to USGS stream gauges, but from a different network and different format.)
I am somewhat familiar with geodatabase design, but have little experience with the capabilities of the different DBMS's that work with ArcGIS. I'm considering using a variation of the Marine Data Model for the geodatabase design, but am a bit fuzzy about the DBMS ArcSDE Architecture I should use.
It seems that hardware wise, I'll want a dedicated machine for DBMS and SDE with a direct gigabit lan connection to a desktop machine running ArcInfo. For cost and "openness" sake, I'd like to use postgreSQL with postGIS (Followed by MS SQL Server Express, as I'm more familiar with T-SQL), but am not sure of the implications because of having such a large dataset. Also, from what I've researched, PostGRE has much better built in "windowing" for queries (eg. calculating a moving-window 2-week tempa average). If it makes a difference, I'd like to be able to load a given dataset entirely into RAM if I know I will be using it extensively for a specific work session. (Similar to This Helpful Answer)
To The Point:
- Is there a performance reduction by using PostGIS, since there is only a 32-bit version of SDE for it? Can I use 64-bit PostGRE, authorize it with 32-bit SDE, then use a direct connection to PostGRE to avoid a 32-bit "bottleneck"?
- Is there an advantage to keeping the bulk of the data "non-spatial", so I could have 64-bit PostGRE do the queries, then join the results to the points in ArcMap?
- Is there a better overall approach to managing a large amount of data using postGRE, considering ArcSDE's 32-bit limitation?
Other considerations:
- I am using ArcInfo 10.0 with SP3, and have an EDN membership, so I can experiment fairly freely with minimal cost. I also have MS SQL Server 2008 R2 for developers, if needed.
- The desktop machine will obviously need to be windows, but I can run windows or linux for the DBMS machine if there's an advantage.
- There will be a number of static queries (eg 2-week moving average) that I may store the results of, but I would also like to be able to perform a specialized query quickly if needed. (eg. What is the minimum of the daily mean temps from stations 003, 007, and 104, betweem 1/1/2011 and 1/31/2011.)
- I've read this guide, but find it to be mostly ambiguous and not useful for finding any pros or cons regarding specific scenarios (Lots of info, but little useful guidance. Am I really supposed to believe that there is NO difference in performance by using or not using the ArcSDE service?)