plotnine.geoms.geom_histogram
- class plotnine.geoms.geom_histogram(mapping: Aes | None = None, data: DataLike | None = None, **kwargs: Any)
Histogram
Usage
geom_histogram(mapping=None, data=None, stat='bin', position='stack', na_rm=False, inherit_aes=True, show_legend=None, raster=False, **kwargs)
Only data and mapping can be positional; the rest must be keyword arguments. **kwargs can be aesthetics (or parameters) used by the stat.
Parameters:
- mapping (aes, optional): Aesthetic mappings created with aes(). If specified and inherit_aes=True, it is combined with the default mapping for the plot. You must supply mapping if there is no plot mapping.
Aesthetic | Default value
---|---
**x** |
**y** |
alpha | 1
color | None
fill | '#595959'
group |
linetype | 'solid'
size | 0.5
The bold aesthetics are required.
- data (dataframe, optional): The data to be displayed in this layer. If None, the data from the ggplot() call is used. If specified, it overrides the data from the ggplot() call.
- stat (str or stat, optional; default: stat_bin): The statistical transformation to use on the data for this layer. If it is a string, it must be registered and known to Plotnine.
- position (str or position, optional; default: position_stack): Position adjustment. If it is a string, it must be registered and known to Plotnine.
- na_rm (bool, optional; default: False): If False, removes missing values with a warning. If True, silently removes missing values.
- inherit_aes (bool, optional; default: True): If False, overrides the default aesthetics.
- show_legend (bool or dict, optional; default: None): Whether this layer should be included in the legends. None, the default, includes any aesthetics that are mapped. If a bool, False never includes and True always includes. A dict can be used to exclude specific aesthetics of the layer from showing in the legend, e.g. show_legend={'color': False}; any other aesthetics are included by default.
- raster (bool, optional; default: False): If True, draw this layer as a raster (bitmap) object even if the final image is in vector format.
See also
Examples
[1]:
import pandas as pd
import numpy as np
from plotnine import (
ggplot,
aes,
after_stat,
geom_histogram,
facet_wrap,
facet_grid,
coord_flip,
scale_y_continuous,
scale_y_sqrt,
scale_y_log10,
scale_fill_manual,
theme_bw,
theme_xkcd
)
from plotnine.data import diamonds
from mizani.formatters import percent_format
Histograms
Visualise the distribution of a variable by dividing the x-axis into bins and counting the number of observations in each bin. Histograms display the counts with bars.
You can define the number of bins (e.g. divide the data into five bins) or define the binwidth (e.g. each bin has a width of 10).
Distributions can be visualised as:
- count,
- normalised count,
- density,
- normalised density,
- scaled density as a percentage.
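To make the quantities listed above concrete, here is a minimal standard-library sketch that computes them by hand for a tiny made-up sample. The function name bin_stats, the sample values, and the choice of half-open bins starting at 0 are assumptions for illustration; this is not plotnine's implementation, and plotnine may place its bin edges differently.

```python
import math
from collections import Counter


def bin_stats(values, binwidth):
    """Return count, ncount, density and ndensity for each non-empty bin."""
    n = len(values)
    # Assumption: bins are half-open intervals [k * binwidth, (k + 1) * binwidth)
    counts = Counter(math.floor(v / binwidth) for v in values)
    max_count = max(counts.values())
    max_density = max_count / (n * binwidth)
    rows = []
    for k in sorted(counts):
        count = counts[k]
        density = count / (n * binwidth)  # bar areas sum to 1
        rows.append({
            "left_edge": k * binwidth,
            "count": count,
            "ncount": count / max_count,          # tallest bin becomes 1
            "density": density,
            "ndensity": density / max_density,    # tallest bin becomes 1
        })
    return rows


values = [0.2, 0.3, 0.3, 0.7, 0.9, 1.1, 1.2, 1.6]  # made-up sample
rows = bin_stats(values, binwidth=0.5)
for row in rows:
    print(row)
```

With equal-width bins, ncount and ndensity coincide; they only differ when bins have unequal widths.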
[2]:
diamonds.head(5)
[2]:
carat | cut | color | clarity | depth | table | price | x | y | z | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63 |
4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |
If you create a basic histogram without specifying the bins, a default number of bins is chosen and a warning suggests that you pick a better value with binwidth.
[3]:
(
ggplot(diamonds, aes(x='carat'))
+ geom_histogram()
)
/Users/hassan/scm/python/plotnine/plotnine/stats/stat_bin.py:109: PlotnineWarning: 'stat_bin()' using 'bins = 142'. Pick better value with 'binwidth'.

[3]:
<Figure Size: (640 x 480)>
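The warning above reports a default of 142 bins. One common way to derive such a default is the Freedman-Diaconis rule, which sets the bin width to 2 * IQR / n^(1/3). Whether plotnine uses exactly this rule is an assumption here; the sketch below is a standard-library illustration of the rule itself, and the function name and sample are hypothetical.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Suggest a number of bins using the Freedman-Diaconis rule."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    width = 2 * (q3 - q1) / n ** (1 / 3)
    if width == 0:
        # Assumption: fall back to the square-root rule when the IQR is zero
        return math.ceil(math.sqrt(n))
    return math.ceil((max(values) - min(values)) / width)


sample = [float(i) for i in range(1, 101)]  # 100 evenly spaced values
print(freedman_diaconis_bins(sample))
```

A rule like this tends to suggest many narrow bins for large datasets, which is one reason the warning recommends choosing binwidth yourself.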
You can define the width of the bins by specifying binwidth inside geom_histogram().
[4]:
(
ggplot(diamonds, aes(x='carat'))
+ geom_histogram(binwidth=0.5) # specify the binwidth
)

[4]:
<Figure Size: (640 x 480)>
Or you can define the number of bins by specifying bins inside geom_histogram(). Note that the example below uses 10 bins, but you can't see them all because some of the bars are too short to be noticeable.
[5]:
(
ggplot(diamonds, aes(x='carat'))
+ geom_histogram(bins=10) # specify the number of bins
)

[5]:
<Figure Size: (640 x 480)>
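To see why some of the 10 bins can be effectively invisible, the following sketch counts a made-up right-skewed sample into 10 equal-width bins. The helper count_per_bin and the sample are hypothetical, and the bin edges simply span the data range, which may differ from how plotnine positions its breaks.

```python
def count_per_bin(values, bins):
    """Count values in `bins` equal-width bins spanning the data range."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin
        idx = min(int((v - low) / width), bins - 1)
        counts[idx] += 1
    return counts


# Made-up right-skewed sample: many small values, very few large ones
values = [0.3] * 50 + [0.5] * 30 + [1.0] * 15 + [2.0] * 4 + [5.0]
counts = count_per_bin(values, bins=10)
print(counts)
```

Most observations land in the first two bins, while the remaining bins are empty or nearly empty and would be drawn as bars too short to see on a linear axis.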
There are different ways to visualise the distribution; you choose one using the y argument within aes(). The example below uses the default setting, the raw count, with after_stat('count').
[6]:
(
ggplot(diamonds, aes(x='carat',
y=after_stat('count') # specify each bin is a count
))
+ geom_histogram(binwidth=0.50)
)

[6]:
<Figure Size: (640 x 480)>
You can normalise the counts so that the tallest bin has a value of 1 by using after_stat('ncount'):
[7]:
(
ggplot(diamonds, aes(x='carat',
y=after_stat('ncount') # normalise the count to 1
))
+ geom_histogram(binwidth=0.50)
)

[7]:
<Figure Size: (640 x 480)>
You can display the density of points in each bin (scaled so that the histogram integrates to 1) by using after_stat('density'):
[8]:
(
ggplot(diamonds, aes(x='carat',
y=after_stat('density') # density
))
+ geom_histogram(binwidth=0.50)
)

[8]:
<Figure Size: (640 x 480)>
The proportion of observations that falls in each bin can be shown with after_stat('width*density'). In the example below, the bin at carat 0.5 accounts for about 55% of the data:
[9]:
(
ggplot(diamonds, aes(x='carat',
y=after_stat('width*density')) # show proportion
)
+ geom_histogram(binwidth=0.5)
)

[9]:
<Figure Size: (640 x 480)>
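The expression width*density works because density is count / (n * width), so multiplying by the bin width gives count / n, the share of observations in the bin. A small sketch with made-up counts (the helper name and numbers are hypothetical):

```python
def bin_proportions(counts, binwidth):
    """Convert per-bin counts to proportions via width * density."""
    n = sum(counts)
    densities = [c / (n * binwidth) for c in counts]
    return [binwidth * d for d in densities]


counts = [55, 30, 10, 5]  # made-up counts for four bins
proportions = bin_proportions(counts, binwidth=0.5)
print(proportions)
```

The proportions always sum to 1, whatever binwidth is used.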
We can also display the proportions as percentages by using percent_format(), which is imported from the mizani.formatters module:
[10]:
(
ggplot(diamonds, aes(x='carat', y=after_stat('width*density')))
+ geom_histogram(binwidth=0.5)
+ scale_y_continuous(labels=percent_format()) # display labels as a percentage
)

[10]:
<Figure Size: (640 x 480)>
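If you prefer not to depend on a ready-made formatter, a labelling function can be written by hand. The sketch below assumes that the labels argument of a continuous scale accepts a callable that takes the list of break values and returns a list of strings; the name as_percent is hypothetical.

```python
def as_percent(breaks):
    """Format fractional break values as whole-number percentage labels."""
    return [f"{b:.0%}" for b in breaks]


print(as_percent([0.0, 0.2, 0.4, 0.6]))
```

Under that assumption it would be passed as scale_y_continuous(labels=as_percent) in place of percent_format().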
Because geom_histogram() uses stat_bin by default, you can pass binning parameters such as binwidth directly to geom_histogram() instead of adding a separate stat. This is useful if you want to layer a few different histograms in one figure.
[11]:
(
ggplot(diamonds, aes(x='carat'))
+ geom_histogram(binwidth=0.5, alpha=0.5)
+ geom_histogram(binwidth=0.2, alpha=0.5, fill='green')
)

[11]:
<Figure Size: (640 x 480)>
You can also flip the x-y coordinates:
[12]:
(
ggplot(diamonds, aes(x='carat', y=after_stat('density')))
+ geom_histogram(binwidth=0.5)
+ coord_flip()
)

[12]:
<Figure Size: (640 x 480)>
You can visualise counts by other variables using fill within aes():
[13]:
(
ggplot(diamonds, aes(x='carat', y=after_stat('count'), fill='cut'))
+ geom_histogram(binwidth=0.5)
)

[13]:
<Figure Size: (640 x 480)>
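With the default position='stack', the bars of each fill group are placed on top of one another within a bin, so the total bar height is still the overall count. The sketch below shows that bookkeeping for made-up counts; the helper name, the numbers, and the order in which groups are stacked are assumptions for illustration.

```python
def stack_counts(counts_by_group):
    """Return (ymin, ymax) for each group's bar in every bin when stacked."""
    n_bins = len(next(iter(counts_by_group.values())))
    base = [0] * n_bins
    stacked = {}
    for group, counts in counts_by_group.items():
        tops = [b + c for b, c in zip(base, counts)]
        stacked[group] = list(zip(base, tops))
        base = tops  # the next group starts where this one ended
    return stacked


# Made-up counts for two bins, split by cut
counts_by_cut = {"Fair": [2, 1], "Good": [5, 3], "Ideal": [10, 4]}
stacked = stack_counts(counts_by_cut)
for group, spans in stacked.items():
    print(group, spans)
```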
You can make bars that are too small to see visible by transforming the y-axis: use scale_y_sqrt() for a square-root scale or scale_y_log10() for a log scale (similarly, use scale_x_sqrt() and scale_x_log10() to transform the x-axis).
[14]:
(
ggplot(diamonds, aes(x='carat', y=after_stat('count')))
+ geom_histogram(binwidth=0.5)
+ scale_y_sqrt() # square root scale
)

[14]:
<Figure Size: (640 x 480)>
[15]:
(
ggplot(diamonds, aes(x='carat', y=after_stat('count')))
+ geom_histogram(binwidth=0.5)
+ scale_y_log10() # log scale
)

[15]:
<Figure Size: (640 x 480)>
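The effect of the two transformations can be checked with plain arithmetic. The sketch below applies a square root and a base-10 logarithm to made-up bin counts; the helper names and counts are hypothetical. It also shows one caveat of a log scale: a bin with a count of zero has no defined position.

```python
import math


def sqrt_scale(counts):
    """Square-root transform: compresses large counts, keeps zero at zero."""
    return [math.sqrt(c) for c in counts]


def log10_scale(counts):
    """Log transform: compresses more strongly, undefined for zero counts."""
    return [math.log10(c) if c > 0 else None for c in counts]


counts = [30000, 15000, 400, 12, 0]  # made-up bin counts
print(sqrt_scale(counts))
print(log10_scale(counts))
```

The square-root scale is the gentler option and still handles empty bins; the log scale separates very small counts more clearly.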
Change the look of your plot:
[16]:
(
ggplot(diamonds, aes(x='carat', y=after_stat('density')))
+ geom_histogram(binwidth=0.5,
fill='green', # change the fill colour (one colour)
colour='pink', # change the outline
size=2, # change the thickness of the outline
alpha=0.7 # change the transparency
)
+ theme_xkcd() # play with themes (look/arrangement)
)

[16]:
<Figure Size: (640 x 480)>
Another example, this time setting the fill colours manually:
[17]:
(
ggplot(diamonds, aes(x='carat',
y=after_stat('density'),
fill='cut' # change the fill colour using another variable
))
+ scale_fill_manual(values=["#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442"]) # change the fill colour
+ geom_histogram(binwidth=0.5,
colour="#D55E00", # change the outline
size=1, # change the thickness of the outline
alpha=0.7 # change the transparency
)
+ theme_bw() # play with themes (look/arrangement)
)

[17]:
<Figure Size: (640 x 480)>
When faceting histograms with scaled counts or densities, the values are normalised within each facet, not across the whole plot. Here's an example of a facet wrap:
[18]:
(
ggplot(diamonds, aes(x='carat', y=after_stat('ncount')))
+ geom_histogram(binwidth=0.5)
+ facet_wrap('color') # facet wrap
)

[18]:
<Figure Size: (640 x 480)>
Here's an example of a facet grid with the count normalised within each panel:
[19]:
(
ggplot(diamonds, aes(x='carat', y=after_stat('ncount')))
+ geom_histogram(binwidth=0.5)
+ facet_grid('cut ~ color')
)

[19]:
<Figure Size: (640 x 480)>
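The difference between normalising within each facet and normalising across the whole plot can be illustrated with made-up counts. In the sketch below the facet labels, the counts, and the helper name are hypothetical; it only demonstrates the arithmetic.

```python
def ncount(counts):
    """Scale counts so that the largest value becomes 1."""
    peak = max(counts)
    return [c / peak for c in counts]


# Made-up bin counts for two facets
counts_by_facet = {"D": [40, 20, 4], "J": [10, 8, 2]}

# Normalised within each facet: every panel has a bar that reaches 1
per_facet = {name: ncount(c) for name, c in counts_by_facet.items()}

# Normalised across all facets: only the overall tallest bar reaches 1
overall_peak = max(max(c) for c in counts_by_facet.values())
overall = {name: [v / overall_peak for v in c] for name, c in counts_by_facet.items()}

print(per_facet)
print(overall)
```

Because each panel is rescaled on its own, bar heights are comparable within a panel but not between panels.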