How to assign integer value from bins

Question

I am trying to find a pythonic way to assign a numeric value depending on which range (bucket) a variable falls into. That is:

variable = 23
if variable < -100:
    return_value = -15
elif variable <= -5:
    return_value = -4
elif variable <= 5:
    return_value = 18
else:
    return_value = 88

I could of course create a list containing the buckets/values and iterate through and return when the correct value is found:

bucket_values = [(-100, -15), (-5, -4), (5, 18)]
default = 88
variable = 100
for lower_bound, value in bucket_values:
    if variable < lower_bound:
        return_value = value
        break
else:
    return_value = default

But then I need to handle both strict and inclusive bounds: on the first iteration of the loop I must check for less than (<), and on the following iterations for less than or equal (<=).

I am looking for something like this (Ruby):

buckets = [
[:<, -90, -57], 
[:<=, 5, -10], 
[:<=, 10, 3], 
[:>, 60, 40]] 

# Pass bucket to a method

My question is: Is there a pythonic way of doing this with variable bounds and values?

I am not sure what a good question title might be for this problem. Any suggestion is welcomed. — Ju Bonn, Jan 22 '19 at 06:40
check [this answer](https://stackoverflow.com/questions/14029245/putting-an-if-elif-else-statement-on-one-line)... I don't know if this is what you're looking for — Anwarvic, Jan 22 '19 at 06:45
What is the code you are thinking about in the first place? If you could add that to the question, we could help translate it to Python. — Tobias Wilfert, Jan 22 '19 at 06:46
@Anwarvic I am not looking for an if else. Because this forces the amount of buckets. — Ju Bonn, Jan 22 '19 at 06:49
@TobiasWilfert In ruby I could do the same with this: buckets = [ [:<, -90, -57], [:<=, 5, -10], [:<=, 10, 3], [:>, 60, 40] ] And then pass this to a method. Anything similar in python? — Ju Bonn, Jan 22 '19 at 06:51
If you write the exact Ruby equivalent in your question, it may be easier to find something similar in Python. Also, are your numbers only integers, or can they also be floats? — unlut, Jan 22 '19 at 06:54
@unlut Only integers in my case. I edited my question to add the ruby bucket. — Ju Bonn, Jan 22 '19 at 06:57
This is called **binning** and there are many existing solutions on SO. Can you use the `pandas` library? It has `pd.cut()`. [Binning and transforming in pandas](https://stackoverflow.com/questions/41953865/binning-and-transforming-in-pandas) — smci, Jan 22 '19 at 07:00
@smci From my understanding pd.cut() will separate into bins, but I do not see lower, upper bounds i.e. if i create bins [0, 10, 20], it will create 2 bins [0 - 10] and [10-20] but nothing for say [-inf - 0] nor [20 - inf]. I will look further if I am wrong. But this is the kind of answer I am looking for! — Ju Bonn, Jan 22 '19 at 07:08
@JuBonn: usually that's the case of discrete bins that are adjacent, hence the top of bin (n-1) is the bottom of bin n. But you want disjoint bins. You could include the missing bins too, then afterwards map illegal/missing bin values to whatever you want. Btw, np.NINF, np.PINF are legal bin values. — smci, Jan 22 '19 at 07:18
@smci this is the answer I was looking for! Therefore a combination of pd and np.ninf/np.inf. — Ju Bonn, Jan 22 '19 at 23:02
Do you want a basic Python solution (using `operator.ge/gt/lt/le`), or a pandas+numpy solution (using `pd.cut`), or don't-care which? Do you want your output to be the bottom value of the containing bin, or a categorical corresponding to the number of the bin, or you don't-care which? — smci, Jan 22 '19 at 23:24
@smci Gonna verify with the rest of the codebase. Don't want to add too many dependencies. I think that both are valid answers to my question. — Ju Bonn, Jan 23 '19 at 00:13
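
For reference, the pd.cut() idea from the comments above could look roughly like the sketch below. It assumes pandas and NumPy are installed and that inputs are integers (as stated in the comments), so the strict bound x < -100 is written as the inclusive edge -101. The names BIN_EDGES, BIN_VALUES and assign_values are illustrative and do not come from the thread; -np.inf is used instead of np.NINF, which newer NumPy releases no longer provide.

```python
import numpy as np
import pandas as pd

# Right-inclusive bin edges: (-inf, -101], (-101, -5], (-5, 5], (5, inf].
# Assumption: inputs are integers, so "x < -100" is the same as "x <= -101".
BIN_EDGES = [-np.inf, -101, -5, 5, np.inf]
BIN_VALUES = [-15, -4, 18, 88]  # one value per bin, taken from the question


def assign_values(variables):
    """Map each integer in `variables` to the value of its bin."""
    binned = pd.cut(pd.Series(variables), bins=BIN_EDGES, labels=BIN_VALUES)
    # pd.cut returns a categorical; convert the labels back to plain ints.
    return binned.astype(int).tolist()


print(assign_values([-101, -100, -5, -4, 5, 6, 23]))
```

Padding the edges with infinities is what covers the open-ended ranges below the first bound and above the last one.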

score 1 · Accepted Answer · answered Jan 22 '19 at 18:53

It's pretty simple with the operator module. Here's an example:

>>> import operator
>>> bucket = (operator.ge, -100, operator.le, -5)
>>> def in_bucket(value, bucket): return bucket[0](value, bucket[1]) and bucket[2](value, bucket[3])
...
>>> in_bucket(-101, bucket)
False
>>> in_bucket(-100, bucket)
True
>>> in_bucket(-5, bucket)
True
>>> in_bucket(-4, bucket)
False

But you can do better, by defining a more generic structure:

>>> conditions = ((operator.ge, -100), (operator.le, -5))
>>> def match_conditions(value, conditions): return all(c[0](value, c[1]) for c in conditions)
...
>>> match_conditions(-101, conditions)
False
>>> match_conditions(-100, conditions)
True
>>> match_conditions(-5, conditions)
True
>>> match_conditions(-4, conditions)
False

The built-in all function returns True iff every condition is met. The key difference between bucket and conditions is that you can add conditions that are not boundaries, e.g. the value must be even:

>>> conditions = ((operator.ge, -100), (operator.le, -5), (lambda v, _: v%2==0, None))
>>> match_conditions(-7, conditions)
False
>>> match_conditions(-6, conditions)
True
>>> match_conditions(-5, conditions)    
False

Now you can use a dictionary to summarize your conditions (the first example you gave):

>>> value_by_conditions = { 
... ((operator.lt, -100),): -15,
... ((operator.ge, -100), (operator.le, -5)): -4,
... ((operator.gt, -5), (operator.le, 5)): 18,
... ((operator.gt, 5),): 88,
... }
>>> next((v for cs, v in value_by_conditions.items() if match_conditions(23, cs)), None)
88
>>> next((v for cs, v in value_by_conditions.items() if match_conditions(-101, cs)), None)
-15
>>> next((v for cs, v in value_by_conditions.items() if match_conditions(-100, cs)), None)
-4
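
The same idea can also be written as an ordered list of rules, which is close to the Ruby bucket table in the question. This is only a sketch: BUCKETS and assign_value are hypothetical names, and "first matching rule wins, otherwise return a default" is an assumption about the intended semantics.

```python
import operator

# Each rule is (comparison, bound, value), mirroring the Ruby bucket table
# from the question. Rules are tried in order; the first match wins.
BUCKETS = [
    (operator.lt, -90, -57),
    (operator.le, 5, -10),
    (operator.le, 10, 3),
    (operator.gt, 60, 40),
]


def assign_value(variable, buckets, default=None):
    """Return the value of the first rule that `variable` satisfies."""
    for compare, bound, value in buckets:
        if compare(variable, bound):
            return value
    return default


print(assign_value(-91, BUCKETS))  # -57
print(assign_value(7, BUCKETS))    # 3
print(assign_value(30, BUCKETS))   # None: no rule covers 11..60
```

A list keeps the evaluation order explicit on every Python version, which matters when rules overlap as they do here.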

Notes:

I used tuples since lists are not hashable (and thus can't be used as dict keys);
next((x for x in xs if <test>), None) takes the first element in xs that passes the test. If no element passes the test, it returns the default value None;
In older versions of Python (< 3.7), dictionaries do not guarantee iteration order, so there is no guarantee on the order of the tests. That matters if you have overlapping conditions.
This is clearly suboptimal, because you test whether value < -100, then whether value >= -100, etc.
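
If the redundant comparisons mentioned in the last note matter, one alternative (a different technique from the one in this answer) is a binary search with the standard bisect module. The sketch below assumes integer inputs and contiguous, sorted bins, so the strict bound x < -100 is rewritten as the inclusive upper bound -101. UPPER_BOUNDS, VALUES and assign_value_bisect are illustrative names.

```python
import bisect

# Assumption: inputs are integers, so "x < -100" becomes the inclusive
# upper bound -101. UPPER_BOUNDS must be sorted in ascending order.
UPPER_BOUNDS = [-101, -5, 5]
VALUES = [-15, -4, 18, 88]  # the last entry is the default


def assign_value_bisect(variable):
    """Binary-search the bin for `variable`; O(log n) comparisons."""
    # bisect_left returns the index of the first bound >= variable,
    # or len(UPPER_BOUNDS) when the variable exceeds every bound.
    index = bisect.bisect_left(UPPER_BOUNDS, variable)
    return VALUES[index]


print(assign_value_bisect(23))    # 88
print(assign_value_bisect(-100))  # -4
```

This only works for adjacent bins; extra conditions such as "must be even" still need the match_conditions approach.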

Is this really pythonic? I'm not so sure. Have a look at https://www.python.org/dev/peps/pep-0020/ to form your own opinion.

Anwarvic · Answer 2 · 2019-01-22T07:01:18.583

I think this is pretty pythonic, but I don't recommend it:

>>> variable = 23
>>> return_value = -15 if variable < -100 else -4 if variable <= -5 else 18 if variable <= 5 else 88
>>> print(return_value)
88

Notice that 88 is the default value.

EDIT

You can create a function based on the same concept as the if... else shown above. The function would be something like this:

def pythonic(variable, bucket_values, default):
    # Return the value of the first bucket whose bound is above the variable.
    for bound, value in bucket_values:
        if variable < bound:
            return value
    # No bucket matched: fall back to the default.
    return default

You can use it like so:

>>> variable = 23
>>> bucket_values = [(-100, -15), (-5, -4), (5, 18)]
>>> print(pythonic(variable, bucket_values, 88))
88

>>> variable = 1
>>> print(pythonic(variable, bucket_values, 88))
18

I am looking for a way with variable values and bounds. Therefore an IF-Else does not work for me. — Ju Bonn, Jan 22 '19 at 06:53

score 0 · Answer 3 · answered Jan 22 '19 at 06:59

If I understood you correctly, every "bucket" is an interval. To check whether a value belongs to an interval, you could define a function:

def check_value(value, interval):
    # interval is an (inclusive lower, inclusive upper) pair of integers
    if value in range(interval[0], interval[1] + 1):
        print('Value ', value)
        print('Interval ', interval)

Now just iterate over a list of intervals to find where the value belongs:

for interval in list_of_intervals:
    check_value(value, interval)
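
A runnable variant of this loop might look like the sketch below. The intervals are made-up example data, and find_interval is a hypothetical helper. It uses a chained comparison instead of range() so that float("-inf") and float("inf") can stand in for a missing lower or upper bound.

```python
# Hypothetical intervals: inclusive (lower, upper) pairs. float("-inf") and
# float("inf") stand in for missing bounds, which range() cannot express.
list_of_intervals = [
    (float("-inf"), -101),
    (-100, -5),
    (-4, 5),
    (6, float("inf")),
]


def find_interval(value, intervals):
    """Return the first interval containing `value`, or None."""
    for lower, upper in intervals:
        if lower <= value <= upper:
            return (lower, upper)
    return None


print(find_interval(23, list_of_intervals))
print(find_interval(-100, list_of_intervals))
```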

vladsiv


This would be equivalent to the pd.cut() function mentioned by @smci. And it doesn't include lower and upper bounds. – Ju Bonn Jan 22 '19 at 07:09
