SQL Server Database Design for Reporting

Question

We are designing a reporting solution for survey results. Although datasets are reasonable in size (rarely more than 500.000 respondents and 50 questions), performance is obviously a major concern.

Due to the nature of the solution, most queries return aggregated values and no locks are needed.

Storing answers in a "normal" tabular format (i.e. a column for each question and a row for each respondent) works well in terms of performance, and allows us to query the data like so:

SELECT COUNT(*) FROM Answers WHERE Gender = 'M' and Age < 20

However, this design requires a new table for each survey as the questions (columns) differ, which is obviously not an ideal solution.

Therefore, we are considering a design where we store answer data in a table that would basically just hold a respondent ID, a question ID and an answer value, thereby "transposing" the data (i.e. there would be a row for each respondent/question combination in the Answers table).

In this design, we would have to use exists conditions (or joins) to filter our data, e.g:

SELECT COUNT(*) 
FROM Answers AS A1 
WHERE A1.QuestionID = 'Gender' AND A1.VALUE = 'M' AND EXISTS
( 
    SELECT * 
    FROM Answers AS A2 
    WHERE A2.RespondentID = A1.RespondentID AND A2.QuestionID = 'Age' AND A2.Value > 18
)

This would allow as to handle any survey without changing the database schema but we are concerned about what the impact might be on performance?

Or perhaps there is a better way to deal with this issue altogether?

It is true that the one row per respondent and question is very flexible, but it will increase the complexity of processing and maintaining any needed constraints. Another question: Is every column of the same type of data (e.g Yes/No, A/B/C/D, freetext, or...) or can the answers be of all types? — RLF, Aug 28 '13 at 15:26
Do you report on more than one survey? For example, when you ask WHERE Gender = 'M' and Age < 20, are you wanting the count of all surveys or just one particular survey? If it is only one particular survey then one table per survey is the appropriate answer. — Greenstone Walker, Aug 29 '13 at 00:05
We do not combine results from multiple surveys in our reports, so the "one table per survey" design is feasible. — Rasmus B, Aug 29 '13 at 07:29
Types of answer data vary. Most are small integers (say 1-5) while some are variable length text (rarely more than a few 100 characters, but we need to support a few 1000 at least). We do not need to maitain any constraints as the dataset is never updated (we are simply importing an SPSS file with "static" survey data). — Rasmus B, Aug 29 '13 at 07:33
Your proposed design sounds like an Entity-Attribute-Value schema. Check this for opinions on how this works: http://dba.stackexchange.com/a/20763/10832 — Hannah Vernon, Sep 26 '13 at 04:17

score 0 · Accepted Answer · answered Oct 21 '13 at 18:04

0

We did some performance tests and found that - with scenarios that are typical for our application domain - it is 10-20 times faster to query data when stored in a regular "tabular" format.

answered Oct 21 '13 at 18:04

Rasmus B

3
2

SQL Server Database Design for Reporting

1 Answers1