0

I am working with Netezza SQL.

I have the following table ("my_table") which has information on university students over a period of years (2010-2015), the major they were enrolled in during a given year, the results of an exam (1 = pass, 0 = fail) during a given year and if they studied before the exam (1 = studied, 0 = did not study) during a given year. Some students did not attend the university during all years - in such instances, there are no rows for that student in that particular year:

    student_id current_major year exam_result studied_for_exam
1            1       Science 2010           0                1
2            1          Arts 2013           1                1
3            1          Arts 2014           0                1
4            2       Science 2011           1                0
5            2          Arts 2012           1                1
6            2       Science 2013           1                1
7            3          Arts 2014           1                0
8            3          Arts 2015           1                1
9            4          Arts 2012           0                1
10           4       Science 2013           1                1
11           5          Arts 2010           0                0
12           5          Arts 2011           0                0
13           5       Science 2014           1                1

My Question:

  • For the year = 2011, I want to find out the major each student was enrolled in (i.e. in 2011), the exam results (i.e. in 2011) and if they studied for the exam (i.e. in 2011).

  • Then, I want to make a table which groups all students into combinations based on their current_major (2011), exam_result (2011), studied_for_exam (2011) - and see which years they attended the university

The final results should look something like this (note: I don't mind the year_2010 column appearing in the result even though it appears before 2011:

  major_in_year exam_result_in_year studied_for_exam_in_year year_2010 year_2011 year_2012 year_2013 year_2014 year_2015 count
1         Science                   0                        1         0         1         0         1         0         0            3
2         Science                   1                        1         0         1         0         1         0         0            8
3            Arts                   0                        1         1         1         1         0         0         0            5
4         Science                   0                        1         0         1         1         1         1         0            3

Such a table would allow me to answer questions such as:

  • Of the students that were enrolled in Science in 2011 that studied for their exam in 2011 and passed their exam in 2011 - 8 of these students were registered in the university 2013 and were not registered in 2012, 2014 and 2015

This is my attempt to write the query:

WITH major_exam_study_in_year AS (
    SELECT student_id, current_major AS major_in_year, exam_result AS exam_result_in_year, studied_for_exam AS studied_for_exam_in_year
    FROM (
        SELECT student_id, current_major, exam_result, studied_for_exam, year,
        ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY year) AS rn
        FROM my_table
        WHERE year = 2011
    ) sub
    WHERE rn = 1
)
SELECT mesy.major_in_year,
       mesy.exam_result_in_year,
       mesy.studied_for_exam_in_year,
       year_2010,
       year_2011,
       year_2012,
       year_2013,
       year_2014,
       year_2015, 
       COUNT(*) 
FROM (
    SELECT student_id,
           MAX(CASE WHEN (year = 2010) THEN 1 ELSE 0 END) AS year_2010,
           MAX(CASE WHEN (year = 2011) THEN 1 ELSE 0 END) AS year_2011,
           MAX(CASE WHEN (year = 2012) THEN 1 ELSE 0 END) AS year_2012,
           MAX(CASE WHEN (year = 2013) THEN 1 ELSE 0 END) AS year_2013,
           MAX(CASE WHEN (year = 2014) THEN 1 ELSE 0 END) AS year_2014,
           MAX(CASE WHEN (year = 2015) THEN 1 ELSE 0 END) AS year_2015
    FROM my_table
    GROUP BY student_id
) a
LEFT JOIN major_exam_study_in_year mesy ON a.student_id = mesy.student_id
GROUP BY mesy.major_in_year, 
         mesy.exam_result_in_year,
         mesy.studied_for_exam_in_year,
         year_2010, 
         year_2011, 
         year_2012, 
         year_2013, 
         year_2014, 
         year_2015;

The query is producing the results in the desired format - but seeing as this was a multi-step query, I am worried that I might have missed/overlooked some key steps.

Can someone please tell me if I have done this correctly?

Thanks!

stats_noob
  • 143
  • 7

0 Answers0