I am working with Netezza SQL.
I have the following table ("my_table") which has information on university students over a period of years (2010-2015), the major they were enrolled in during a given year, the results of an exam (1 = pass, 0 = fail) during a given year and if they studied before the exam (1 = studied, 0 = did not study) during a given year. Some students did not attend the university during all years - in such instances, there are no rows for that student in that particular year:
student_id current_major year exam_result studied_for_exam
1 1 Science 2010 0 1
2 1 Arts 2013 1 1
3 1 Arts 2014 0 1
4 2 Science 2011 1 0
5 2 Arts 2012 1 1
6 2 Science 2013 1 1
7 3 Arts 2014 1 0
8 3 Arts 2015 1 1
9 4 Arts 2012 0 1
10 4 Science 2013 1 1
11 5 Arts 2010 0 0
12 5 Arts 2011 0 0
13 5 Science 2014 1 1
My Question:
For the year = 2011, I want to find out the major each student was enrolled in (i.e. in 2011), the exam results (i.e. in 2011) and if they studied for the exam (i.e. in 2011).
Then, I want to make a table which groups all students into combinations based on their current_major (2011), exam_result (2011), studied_for_exam (2011) - and see which years they attended the university
The final results should look something like this (note: I don't mind the year_2010 column appearing in the result even though it appears before 2011:
major_in_year exam_result_in_year studied_for_exam_in_year year_2010 year_2011 year_2012 year_2013 year_2014 year_2015 count
1 Science 0 1 0 1 0 1 0 0 3
2 Science 1 1 0 1 0 1 0 0 8
3 Arts 0 1 1 1 1 0 0 0 5
4 Science 0 1 0 1 1 1 1 0 3
Such a table would allow me to answer questions such as:
- Of the students that were enrolled in Science in 2011 that studied for their exam in 2011 and passed their exam in 2011 - 8 of these students were registered in the university 2013 and were not registered in 2012, 2014 and 2015
This is my attempt to write the query:
WITH major_exam_study_in_year AS (
SELECT student_id, current_major AS major_in_year, exam_result AS exam_result_in_year, studied_for_exam AS studied_for_exam_in_year
FROM (
SELECT student_id, current_major, exam_result, studied_for_exam, year,
ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY year) AS rn
FROM my_table
WHERE year = 2011
) sub
WHERE rn = 1
)
SELECT mesy.major_in_year,
mesy.exam_result_in_year,
mesy.studied_for_exam_in_year,
year_2010,
year_2011,
year_2012,
year_2013,
year_2014,
year_2015,
COUNT(*)
FROM (
SELECT student_id,
MAX(CASE WHEN (year = 2010) THEN 1 ELSE 0 END) AS year_2010,
MAX(CASE WHEN (year = 2011) THEN 1 ELSE 0 END) AS year_2011,
MAX(CASE WHEN (year = 2012) THEN 1 ELSE 0 END) AS year_2012,
MAX(CASE WHEN (year = 2013) THEN 1 ELSE 0 END) AS year_2013,
MAX(CASE WHEN (year = 2014) THEN 1 ELSE 0 END) AS year_2014,
MAX(CASE WHEN (year = 2015) THEN 1 ELSE 0 END) AS year_2015
FROM my_table
GROUP BY student_id
) a
LEFT JOIN major_exam_study_in_year mesy ON a.student_id = mesy.student_id
GROUP BY mesy.major_in_year,
mesy.exam_result_in_year,
mesy.studied_for_exam_in_year,
year_2010,
year_2011,
year_2012,
year_2013,
year_2014,
year_2015;
The query is producing the results in the desired format - but seeing as this was a multi-step query, I am worried that I might have missed/overlooked some key steps.
Can someone please tell me if I have done this correctly?
Thanks!