
I apologize in advance in case this question has been asked already.

I'm working on revamping a reporting application used within my company. The requirements are:

  • Support adding new fields (done through the web app) and allow users to select those fields when building reports. There are currently 300 of these fields, and their values are stored in a single SQL Server table with 300 columns. In other words, the schema is dynamic.

  • Improve report generation performance.

What's the best way to approach this? My first thought was to split these 300 (and potentially more) columns into multiple tables (normalization), but I'm not sure that's the right approach, given that there doesn't seem to be a logical way of grouping the data without ending up with 20+ tables.

Another option would be to store values in rows using an EAV model (entity, attribute, value) and then pivot, but I'm not sure that would perform well. This option would handle the dynamic schema nicely, but the pivot statements would have to be built programmatically before users can consume the data (e.g., through views), as sketched below.
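A minimal sketch of what that row-based layout plus a programmatically built pivot could look like. All table and column names here (dbo.Attribute, dbo.ReportFieldValue, and so on) are hypothetical, not taken from any existing schema:

    -- Attribute definitions: one row per custom field added through the web app.
    CREATE TABLE dbo.Attribute
    (
        AttributeID   int IDENTITY PRIMARY KEY,
        AttributeName sysname NOT NULL UNIQUE
    );

    -- EAV storage: one row per (entity, attribute) pair, one typed column per data type.
    CREATE TABLE dbo.ReportFieldValue
    (
        EntityID    int           NOT NULL,
        AttributeID int           NOT NULL REFERENCES dbo.Attribute,
        IntValue    int           NULL,
        StringValue nvarchar(400) NULL,
        DateValue   datetime      NULL,
        CONSTRAINT PK_ReportFieldValue PRIMARY KEY (EntityID, AttributeID)
    );

    -- Build the column list from the attribute definitions, then pivot dynamically,
    -- so a field added tomorrow shows up in the output without code changes.
    DECLARE @cols nvarchar(max), @sql nvarchar(max);

    SELECT @cols = STUFF((
        SELECT N',' + QUOTENAME(a.AttributeName)
        FROM dbo.Attribute AS a
        ORDER BY a.AttributeName
        FOR XML PATH(N''), TYPE).value(N'.', N'nvarchar(max)'), 1, 1, N'');

    SET @sql = N'SELECT EntityID, ' + @cols + N'
    FROM (SELECT v.EntityID, a.AttributeName,
                 COALESCE(v.StringValue,
                          CONVERT(nvarchar(400), v.IntValue),
                          CONVERT(nvarchar(400), v.DateValue, 120)) AS Value
          FROM dbo.ReportFieldValue AS v
          JOIN dbo.Attribute AS a ON a.AttributeID = v.AttributeID) AS src
    PIVOT (MAX(Value) FOR AttributeName IN (' + @cols + N')) AS p;';

    EXEC sys.sp_executesql @sql;

The same dynamic SQL could be wrapped in a stored procedure that recreates a reporting view whenever an attribute is added or removed.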

What would be the ideal approach?

Thanks!

Heidern
  • I prefer the EAV model, and was able to work around all of its alleged shortcomings in the 13 years I spent at my previous job (the EAV portion of the data was the focus for much of those years). The arrival of SQL Server 2008 was an absolute dream, because filtered indexes allowed me to pay small amounts of storage for much more efficient queries by the busiest customers against the hottest pieces of data. – Aaron Bertrand Aug 22 '14 at 00:52
  • You don't really need 20+ tables to represent this data in a normalized way. You should be able to do this with a single table (and perhaps with sparse columns). How many distinct data types are there? Basically you need whatever column(s) relate the data to the owning customer or company or whatever, a column that signifies what this piece of data is, and then the value (in our case we just had datetime, string and int, and only one could be populated at a time; you may need more granularity). If you're going to break it into more tables, I'd suggest partitioning by customer, not custom data. – Aaron Bertrand Aug 22 '14 at 00:54 (see the sketch after these comments)
  • A couple of other questions on this site where you might get a large variety of opinions: http://dba.stackexchange.com/a/40693/1186 http://dba.stackexchange.com/questions/20759/is-there-a-name-for-this-database-structure http://dba.stackexchange.com/questions/49618/to-eav-or-not-to-eav http://dba.stackexchange.com/questions/49195/one-or-several-tables-for-attributes/ http://dba.stackexchange.com/questions/49310/ – Aaron Bertrand Aug 22 '14 at 01:06
  • @AaronBertrand: If I understand correctly, these EAV extension tables would be great candidates for column-store indexing. Am I inferring correctly? – Pieter Geerkens Aug 22 '14 at 01:15
  • @PieterGeerkens I'm not sure. I think the way that you query for these values makes it hard for columnstore to excel here - usually you get all of the attributes for a customer, not all of the shoe sizes or favorite colors from a single column. I guess it really depends how you're mapping and querying the data. – Aaron Bertrand Aug 22 '14 at 04:02
  • @AaronBertrand: the data being stored in this table is either an int, string, or date. I haven't seen much else being stored in it. The schema of this big table could change on any given day (if they decide they want to see something new in a report). I'll give the EAV articles a read. Thanks! – Heidern Aug 22 '14 at 17:14
  • @AaronBertrand one more thing: would this structure work well for reporting purposes? The only way I can picture this data being consumed is by doing a huge pivot. Thoughts? – Heidern Aug 22 '14 at 17:29
  • @Heidern That's a pretty broad brush, but in our case, that's all we used it for - well, reporting and for filtering users. – Aaron Bertrand Aug 22 '14 at 17:37
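Following the comments above: the EAV table sketched earlier in the question already matches the single-table shape Aaron Bertrand describes (datetime, string, and int columns, only one populated per row). A CHECK constraint can enforce the "only one populated" rule, and a filtered index of the kind he mentions can target a hot attribute. This is a sketch only; the AttributeID = 42 predicate is a made-up example of a busy attribute:

    -- Enforce that exactly one typed value column is populated per row.
    ALTER TABLE dbo.ReportFieldValue
        ADD CONSTRAINT CK_ReportFieldValue_OneValue CHECK (
            (CASE WHEN IntValue    IS NOT NULL THEN 1 ELSE 0 END)
          + (CASE WHEN StringValue IS NOT NULL THEN 1 ELSE 0 END)
          + (CASE WHEN DateValue   IS NOT NULL THEN 1 ELSE 0 END) = 1);

    -- A filtered index (SQL Server 2008+) covering a single busy attribute:
    -- a small amount of extra storage in exchange for cheap seeks on that slice.
    CREATE INDEX IX_ReportFieldValue_Hot
        ON dbo.ReportFieldValue (EntityID)
        INCLUDE (IntValue)
        WHERE AttributeID = 42;

For reporting, the dynamic pivot sketched earlier in the question applies unchanged to this table.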
