1

I am working with Netezza SQL.

I have the following table:

CREATE TABLE MY_TABLE (
    id INT,
    gender VARCHAR(1),
    disease VARCHAR(1),
    income DECIMAL(10, 3)
);

INSERT INTO MY_TABLE (id, gender, disease, income) VALUES ('1', 'f', 'y', '9315.461');

INSERT INTO MY_TABLE (id, gender, disease, income) VALUES ('2', 'm', 'y', '6310.993');

INSERT INTO MY_TABLE (id, gender, disease, income) VALUES ('3', 'm', 'y', '6252.358');

INSERT INTO MY_TABLE (id, gender, disease, income) VALUES ('4', 'f', 'y', '4911.177');

INSERT INTO MY_TABLE (id, gender, disease, income) VALUES ('5', 'f', 'n', '5821.646');

INSERT INTO MY_TABLE (id, gender, disease, income) VALUES ('6', 'f', 'y', '3986.903');

I want to calculate 20 different ranges of income (i.e. ntiles), and calculate the disease rate within each of these ranges.

Here is my attempt to do this:

WITH ntiles AS (
    SELECT 
income,
 disease, 
NTILE(10) OVER (ORDER BY income) as income_ntile
    FROM my_table
)
SELECT 
income_ntile, 
MIN(income) as min_income, 
MAX(income) as max_income,
    COUNT(*) as count, COUNT(CASE WHEN disease = 'y' THEN 1 END) as disease_count,
    COUNT(CASE WHEN disease = 'y' THEN 1 END)*100.0/COUNT(*) as disease_rate
FROM ntiles
GROUP BY income_ntile;

The code runs - but I am not sure if I have done everything correctly.

Is this correct?

Thanks!

stats_noob
  • 143
  • 7

0 Answers0