plotnine.geoms.geom_boxplot

class plotnine.geoms.geom_boxplot(mapping: Aes | None = None, data: DataLike | None = None, **kwargs: Any)

Box and whiskers plot

Usage

geom_boxplot(mapping=None, data=None, stat='boxplot', position='dodge2',
             na_rm=False, inherit_aes=True, show_legend=None, raster=False,
             outlier_alpha=1, varwidth=False, outlier_size=1.5,
             outlier_color=None, outlier_shape='o', notch=False, fatten=2,
             outlier_stroke=0.5, width=None, notchwidth=0.5, **kwargs)

Only data and mapping can be positional; the rest must be keyword arguments. **kwargs can be aesthetics (or parameters) used by the stat.

Parameters:
mapping : aes, optional

Aesthetic mappings created with aes(). If specified and inherit_aes=True, it is combined with the default mapping for the plot. You must supply mapping if there is no plot mapping.

Aesthetic    Default value
lower*
middle*
upper*
x*
ymax*
ymin*
alpha        1
color        '#333333'
fill         'white'
group
linetype     'solid'
shape        'o'
size         0.5
weight       1

Aesthetics marked with * are required.

data : dataframe, optional

The data to be displayed in this layer. If None, the data from the ggplot() call is used. If specified, it overrides the data from the ggplot() call.

stat : str or stat, optional (default: stat_boxplot)

The statistical transformation to use on the data for this layer. If it is a string, it must be registered and known to Plotnine.

position : str or position, optional (default: position_dodge2)

Position adjustment. If it is a string, it must be registered and known to Plotnine.

na_rm : bool, optional (default: False)

If False, removes missing values with a warning. If True, silently removes missing values.

inherit_aes : bool, optional (default: True)

If False, overrides the default aesthetics.

show_legend : bool or dict, optional (default: None)

Whether this layer should be included in the legends. None, the default, includes any aesthetics that are mapped. If a bool, False never includes and True always includes. A dict can be used to exclude specific aesthetics of the layer from the legend, e.g. show_legend={'color': False}; any other aesthetics are included by default.

raster : bool, optional (default: False)

If True, draw this layer as a raster (bitmap) object, even if the final image is in vector format.

width : float, optional (default: None)

Box width. If None, the width is set to 90% of the resolution of the data. Note that if the stat has a width parameter, that takes precedence over this one.

outlier_alpha : float, optional (default: 1)

Transparency of the outlier points.

outlier_color : str or tuple, optional (default: None)

Color of the outlier points.

outlier_shape : str, optional (default: 'o')

Shape of the outlier points. An empty string hides the outliers.

outlier_size : float, optional (default: 1.5)

Size of the outlier points.

outlier_stroke : float, optional (default: 0.5)

Stroke-size of the outlier points.

notch : bool, optional (default: False)

Whether the boxes should have a notch.

varwidth : bool, optional (default: False)

If True, boxes are drawn with widths proportional to the square-roots of the number of observations in the groups.

notchwidth : float, optional (default: 0.5)

Width of notch relative to the body width.

fatten : float, optional (default: 2)

A multiplicative factor used to increase the size of the middle bar across the box.
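
The varwidth rule can be illustrated with a little arithmetic. The sketch below is not plotnine's code. It assumes that the largest group is drawn at the full box width and that every other box is scaled by the ratio of the square roots of the group sizes. The helper name relative_box_widths, the group counts, and the base width of 0.9 (90% of a data resolution of 1) are all hypothetical.

```python
import math


def relative_box_widths(counts, width=0.9):
    """Scale box widths by sqrt(n); the largest group gets the full `width`.

    `counts` maps a group label to its number of observations.
    """
    roots = {group: math.sqrt(n) for group, n in counts.items()}
    largest = max(roots.values())
    return {group: width * root / largest for group, root in roots.items()}


# Hypothetical group sizes: each group has 4x the observations of the previous
# one, so each box comes out twice as wide as the previous one.
widths = relative_box_widths({"a": 4, "b": 16, "c": 64})
print(widths)
```

Because the scaling uses square roots, a group needs four times as many observations to get a box twice as wide.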

Examples

[1]:
import pandas as pd
import numpy as np

from plotnine import (
    ggplot,
    aes,
    geom_boxplot,
    geom_jitter,
    scale_x_discrete,
    coord_flip
)

A box and whiskers plot

The boxplot compactly displays the distribution of a continuous variable.

Read more: Wikipedia, ggplot2 docs
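
As a rough illustration of what a box and its whiskers encode, the following sketch computes the five values that geom_boxplot draws (ymin, lower, middle, upper, ymax) together with the outliers. It assumes the common Tukey convention: quartiles by linear interpolation, and whiskers that reach the most extreme observations within 1.5 times the interquartile range of the box. The function name boxplot_summary is hypothetical, and the sample is the first five passenger counts shown below plus one made-up value of 200. In plotnine the real computation is done by stat_boxplot.

```python
def boxplot_summary(values, coef=1.5):
    """Return the box, whisker and outlier values for one group."""
    data = sorted(values)

    def quantile(q):
        # Linear interpolation between the two closest ranks
        pos = (len(data) - 1) * q
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        return data[lo] + (data[hi] - data[lo]) * (pos - lo)

    lower, middle, upper = quantile(0.25), quantile(0.5), quantile(0.75)
    iqr = upper - lower
    low_fence, high_fence = lower - coef * iqr, upper + coef * iqr

    # Whiskers stop at the most extreme observations inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "ymin": min(inside),
        "lower": lower,
        "middle": middle,
        "upper": upper,
        "ymax": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# 200 is a made-up value added to produce an outlier
summary = boxplot_summary([112, 118, 132, 129, 121, 200])
print(summary)
```

Here the value 200 falls above the upper fence, so it is reported as an outlier and the upper whisker stops at 132.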

[2]:
flights = pd.read_csv('data/flights.csv')
flights.head()
[2]:
year month passengers
0 1949 January 112
1 1949 February 118
2 1949 March 132
3 1949 April 129
4 1949 May 121

Basic boxplot

[3]:
months = [month[:3] for month in flights.month[:12]]
print(months)
['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
[4]:
(
    ggplot(flights)
    + geom_boxplot(aes(x='factor(month)', y='passengers'))
    + scale_x_discrete(labels=months, name='month')  # change tick labels on the x-axis
)
../_images/geom_boxplot_5_0.png
[4]:
<Figure Size: (640 x 480)>

Horizontal boxplot

[5]:
(
    ggplot(flights)
    + geom_boxplot(aes(x='factor(month)', y='passengers'))
    + coord_flip()
    + scale_x_discrete(
        labels=months[::-1],
        limits=flights.month[11::-1],  # the 12 month names, December first
        name='month',
    )
)

Boxplot with jittered points:

[6]:
(
    ggplot(flights, aes(x='factor(month)', y='passengers'))
    + geom_boxplot()
    + geom_jitter()
    + scale_x_discrete(labels=months, name='month')  # change tick labels on the x-axis
)
../_images/geom_boxplot_9_0.png
[6]:
<Figure Size: (640 x 480)>