
I have been struggling for a while to define the colors of a bar plot using Pandas and Matplotlib. Let's imagine we have the following dataframe:

import pandas as pd
pers1 = ["Jesús","lord",2]
pers2 = ["Mateo","apostel",1]
pers3 = ["Lucas","apostel",1]
    
dfnames = pd.DataFrame(
    [pers1,pers2, pers3],
    columns=["name","type","importance"]
)

Now I want to create a bar plot with importance as the numerical value, the people's names as tick labels, and the type column determining the colors. I have read other questions (for example: Define bar chart colors for Pandas/Matplotlib with defined column), but the approach there doesn't work for me...

First, I define a color for each type:

colors = {'apostel':'blue','lord':'green'}
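For reference, here is a quick standalone check of what this mapping produces (it rebuilds the same dataframe). Series.map looks up each row's type in the dict and returns one color name per row:

```python
import pandas as pd

# Same data as above
dfnames = pd.DataFrame(
    [["Jesús", "lord", 2], ["Mateo", "apostel", 1], ["Lucas", "apostel", 1]],
    columns=["name", "type", "importance"],
)
colors = {'apostel': 'blue', 'lord': 'green'}

# Series.map replaces each 'type' value with its entry in the dict
bar_colors = dfnames['type'].map(colors)
print(list(bar_colors))  # ['green', 'blue', 'blue']
```

So the mapping does give one color per bar, which suggests the problem lies in how .plot() uses that list rather than in the mapping itself.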

Then I pass the mapped colors to the .plot() function:

dfnames.plot(
    x="name",
    y="importance",
    kind="bar",
    color = dfnames['type'].map(colors)
)

This runs, but all the bars are green:

[Image: the resulting bar plot, with all bars green]

Why? I don't know... I get the same result in both Spyder and Jupyter. Any help? Thanks!

yoonghm
José

2 Answers


According to GH16822, this is a regression introduced in pandas 0.20.3: only the first colour is picked from the list of colours passed. This was not an issue with earlier versions.

According to one of the contributors, the cause is this:

The problem seems to be in _get_colors. I think that BarPlot should define a _get_colors that does something like:

def _get_colors(self, num_colors=None, color_kwds='color'):
    color = self.kwds.get(color_kwds)
    if color is None:
        # No explicit colours given: fall back to the default behaviour
        return super()._get_colors(num_colors=num_colors, color_kwds=color_kwds)
    else:
        # Explicit colours given: size the colour list by the number of rows (bars)
        num_colors = len(self.data)  # maybe? may not work for some cases
        return _get_standard_colors(color=color, num_colors=num_colors)
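To make the difference concrete, here is a simplified toy model, assuming (as described above) that the affected version sizes the colour list by the number of plotted columns instead of the number of rows. The helper pick_colours is hypothetical and is not pandas API:

```python
def pick_colours(colours, n):
    """Hypothetical helper: return n colours, cycling through the given list."""
    return [colours[i % len(colours)] for i in range(n)]

# One entry per row, as produced by dfnames['type'].map(colors)
bar_colours = ['green', 'blue', 'blue']

# Sized by the number of plotted columns (only 'importance'): a single colour,
# so every bar in that column ends up green.
per_column = pick_colours(bar_colours, 1)
print(per_column)  # ['green']

# Sized by len(self.data), as the suggested fix does: one colour per bar.
per_row = pick_colours(bar_colours, 3)
print(per_row)  # ['green', 'blue', 'blue']
```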

There are a couple of options for you:

  1. The most obvious choice is to update to the latest version of pandas (v0.22 at the time of writing).
  2. If you need a workaround, there is one (also mentioned in the issue tracker): wrap the colours in an extra tuple inside a list:

    dfnames.plot(x="name",
                 y="importance",
                 kind="bar",
                 color=[tuple(dfnames['type'].map(colors))])

Though, in the interest of progress, I'd recommend updating your pandas.
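As for why the extra tuple helps: the wrapped argument is a one-element list, so even if the affected version only takes the first "colour", that entry is the whole tuple of per-bar colours, which then presumably reaches matplotlib as one colour per bar. The snippet below only shows what the argument evaluates to; it does not exercise pandas' plotting code itself:

```python
import pandas as pd

dfnames = pd.DataFrame(
    [["Jesús", "lord", 2], ["Mateo", "apostel", 1], ["Lucas", "apostel", 1]],
    columns=["name", "type", "importance"],
)
colors = {'apostel': 'blue', 'lord': 'green'}

# The workaround: all bar colours packed into one tuple, inside a list
wrapped = [tuple(dfnames['type'].map(colors))]
print(wrapped)     # [('green', 'blue', 'blue')]

# Taking only the first entry still yields every bar's colour
print(wrapped[0])  # ('green', 'blue', 'blue')
```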

cs95

I found another solution to your problem, and it works!

Instead of using the DataFrame's plot method, I used the matplotlib library directly. Here is the code:

import pandas as pd
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline

pers1 = ["Jesús", "lord", 2]
pers2 = ["Mateo", "apostel", 1]
pers3 = ["Lucas", "apostel", 1]

dfnames = pd.DataFrame([pers1, pers2, pers3], columns=["name", "type", "importance"])

fig, ax = plt.subplots()
bars = ax.bar(dfnames.name, dfnames.importance)

colors = {'apostel': 'blue', 'lord': 'green'}

# Colour each bar by its row's type; fall back to blue for unknown types
for index, bar in enumerate(bars):
    color = colors.get(dfnames.loc[index, 'type'], 'blue')
    bar.set_facecolor(color)  # pass the full colour name, not just its first letter
plt.show()

And here is the result:

[Image: the resulting bar plot]
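The per-bar loop can also be collapsed into a single call, since matplotlib's Axes.bar accepts a list of colours (one per bar) through its color argument. A minimal sketch, assuming matplotlib 2.1 or later (for string category labels on the x axis); the Agg backend line is only there so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

dfnames = pd.DataFrame(
    [["Jesús", "lord", 2], ["Mateo", "apostel", 1], ["Lucas", "apostel", 1]],
    columns=["name", "type", "importance"],
)
colors = {'apostel': 'blue', 'lord': 'green'}

fig, ax = plt.subplots()
# One colour per bar, looked up from each row's type
bars = ax.bar(dfnames['name'], dfnames['importance'],
              color=dfnames['type'].map(colors).tolist())
# In an interactive session, call plt.show() here to display the figure
```

This sidesteps pandas' colour handling entirely, so it should not depend on the pandas version.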

Espoir Murhabazi