plotnine.mapping.stage

class plotnine.mapping.stage(start=None, after_stat=None, after_scale=None)

Stage allows you to evaluate a mapping at more than one stage.

You can evaluate an expression of a variable in a dataframe, and later evaluate an expression that modifies the values after the statistic has been computed or after they have been mapped by the scale.
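As a rough mental model, here is a simplified pandas sketch of the three points at which the mapping `x='var1', y=stage(after_stat='count', after_scale='y+.1')` from the bar-chart examples below is evaluated. It is not plotnine's own evaluation code: `DataFrame.eval` stands in for plotnine's expression evaluation, a plain per-x count stands in for `stat='count'`, and the y scale is assumed to leave the values unchanged.

```python
import pandas as pd

# Layer data, as in the bar-chart examples.
data = pd.DataFrame({"var1": list("abbcccddddeeeee")})

# start: evaluate the mapping x='var1' against the layer data.
mapped = pd.DataFrame({"x": data.eval("var1", engine="python")})

# The stat (a plain count per x value, standing in for stat='count')
# replaces the mapped data with its own computed variables.
computed = mapped.groupby("x").size().reset_index(name="count")

# after_stat: evaluate 'count' against the stat's computed variables.
computed["y"] = computed.eval("count", engine="python")

# after_scale: evaluate 'y+.1' against the aesthetics. The y scale is
# assumed to leave the values unchanged, so this is a plain offset.
computed["y"] = computed.eval("y+.1", engine="python")

print(computed[["x", "y"]])
```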

Parameters:
start : expression | array_like | scalar

Aesthetic expression using primary variables from the layer data.

after_stat : expression

Aesthetic expression using variables calculated by the stat.

after_scale : expression

Aesthetic expression using aesthetics of the layer.

Examples

[1]:
import pandas as pd
import numpy as np

from plotnine import (
    ggplot,
    aes,
    after_stat,
    stage,
    geom_bar,
    geom_text,
    geom_bin_2d,
    stat_bin_2d,
)

[2]:
df = pd.DataFrame({
    'var1': list('abbcccddddeeeee'),
    'cat': list('RSRSRSRRRSRSSRS')
})

(ggplot(df, aes('var1'))
 + geom_bar()
)

Add the corresponding count on top of each bar.

[3]:
(ggplot(df, aes('var1'))
 + geom_bar()
 + geom_text(aes(label=after_stat('count')), stat='count')
)
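The `count` that `after_stat('count')` refers to is the number of rows for each `x` value, as computed by the stat. The pandas snippet below is a quick way to check which numbers the labels will show. It reproduces only the `count` variable and none of the other variables `stat_count` may compute.

```python
import pandas as pd

df = pd.DataFrame({
    "var1": list("abbcccddddeeeee"),
    "cat": list("RSRSRSRRRSRSSRS"),
})

# Number of rows per value of var1, in sorted order;
# these are the values the text labels display.
counts = df["var1"].value_counts().sort_index()
print(counts)
```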

Adjust the y position so that the counts do not overlap the bars.

[4]:
(ggplot(df, aes('var1'))
 + geom_bar()
 + geom_text(aes(label=after_stat('count'),
                 y=stage(after_stat='count', after_scale='y+.1')),
             stat='count')
)

Note that this works nicely even for stacked bars, where adjusting the position with nudge_y=0.1 would not work.

[5]:
(ggplot(df, aes('var1', fill='cat'))
 + geom_bar()
 + geom_text(aes(label=after_stat('count'),
                 y=stage(after_stat='count', after_scale='y+.1')),
             stat='count', position='stack')
)
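The stacked case is easier to follow with the label positions written out. The following is a simplified pandas sketch of those positions, not plotnine's stacking code. It assumes the segments of each bar are stacked in the sorted order of `cat`, and plotnine's actual order may differ. The key idea still holds: by the time `after_scale` is evaluated, `y` holds the top of each stacked segment, so adding 0.1 places each label just above its own segment.

```python
import pandas as pd

df = pd.DataFrame({
    "var1": list("abbcccddddeeeee"),
    "cat": list("RSRSRSRRRSRSSRS"),
})

# after_stat: one count per (x, fill) combination.
counts = df.groupby(["var1", "cat"]).size().reset_index(name="count")

# Simplified stacking: each segment's top is the running total of counts
# within its bar (order within a bar assumed to be the sorted 'cat' order).
counts["y"] = counts.groupby("var1")["count"].cumsum()

# after_scale: offset the already-stacked y, so each label sits just
# above the top of its own segment.
counts["label_y"] = counts["y"] + 0.1

print(counts)
```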

Create a binned 2D plot in which each bin is filled according to its count.

[6]:
np.random.seed(123)
df = pd.DataFrame({
    'col_1': np.random.rand(1000),
    'col_2': np.random.rand(1000)
})
[7]:
(ggplot(df, aes(x='col_1', y='col_2'))
 + geom_bin_2d(position='identity', binwidth=0.1)
)

Add counts to the bins. stat_bin_2d specifies each rectangular bin by its minimum and maximum end-points along each dimension; we use these values to compute the mid-points at which to place the counts.
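The mid-point arithmetic can be sketched with NumPy alone. This only approximates what `stat_bin_2d` produces. It assumes bins of width 0.1 with edges aligned at 0 and 1, and plotnine may place its bin edges differently. `np.histogram2d` returns the counts together with the bin edges, and each label goes at the centre ((xmin+xmax)/2, (ymin+ymax)/2) of its bin.

```python
import numpy as np

np.random.seed(123)
col_1 = np.random.rand(1000)
col_2 = np.random.rand(1000)

# Assumed bin edges: width 0.1 over [0, 1] in both dimensions.
edges = np.linspace(0, 1, 11)
counts, x_edges, y_edges = np.histogram2d(col_1, col_2, bins=[edges, edges])

# Each bin spans xmin..xmax and ymin..ymax; its centre is the mid-point.
xmin, xmax = x_edges[:-1], x_edges[1:]
ymin, ymax = y_edges[:-1], y_edges[1:]
x_mid = (xmin + xmax) / 2
y_mid = (ymin + ymax) / 2

# One label per bin: (x position, y position, count).
labels = [
    (x, y, int(counts[i, j]))
    for i, x in enumerate(x_mid)
    for j, y in enumerate(y_mid)
]
print(labels[:3])
```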

First, the x and y aesthetics are mapped to the col_1 and col_2 variables. The statistic then consumes them and creates xmin, xmax, ymin and ymax values for each bin, along with the associated count. After the statistic has been computed, the x and y aesthetics no longer exist, but we recreate meaningful values from the minimum and maximum end-points.
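The after_stat strings in the stat_bin_2d example are ordinary expressions over the stat's output columns. The pandas sketch below evaluates the same two expressions on a small hand-made table. The table is hypothetical: it has the same shape as stat_bin_2d output but was not produced by it. It has no x or y columns until they are computed.

```python
import pandas as pd

# Hypothetical stat output for two bins: edges plus counts, no x or y.
stat_data = pd.DataFrame({
    "xmin": [0.0, 0.1], "xmax": [0.1, 0.2],
    "ymin": [0.3, 0.3], "ymax": [0.4, 0.4],
    "count": [12, 7],
})
assert "x" not in stat_data.columns and "y" not in stat_data.columns

# after_stat: recreate x and y as the bin mid-points.
stat_data["x"] = stat_data.eval("(xmin+xmax)/2", engine="python")
stat_data["y"] = stat_data.eval("(ymin+ymax)/2", engine="python")
# label=after_stat('count') simply reuses the count column.
stat_data["label"] = stat_data.eval("count", engine="python")

print(stat_data[["x", "y", "label"]])
```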

Note that the binning parameters of the geom and the stat must be the same; otherwise the labels will not line up with the bins. In this case, that parameter is binwidth.
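A quick numeric check shows why the binning parameters must match. Suppose the tiles used binwidth=0.1 but the labels were computed with a different, hypothetical binwidth=0.2. The label mid-points would then not coincide with the tile centres. The helper `bin_midpoints` is made up for this illustration, and it assumes bin edges start at 0 for both layers.

```python
import numpy as np

def bin_midpoints(binwidth, upper=1.0):
    # Hypothetical helper: mid-points of bins of the given width,
    # with edges assumed to start at 0 and end at `upper`.
    n = int(round(upper / binwidth))
    edges = np.linspace(0, upper, n + 1)
    return (edges[:-1] + edges[1:]) / 2

tile_centres = bin_midpoints(0.1)   # as with geom_bin_2d(binwidth=0.1)
label_points = bin_midpoints(0.2)   # a mismatched stat binwidth of 0.2

# For each label position, check whether it coincides with a tile centre.
# With this mismatch, every label lands on a boundary between tiles.
matched = np.isclose(label_points[:, None], tile_centres[None, :]).any(axis=1)
print(label_points, matched)
```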

[8]:
(ggplot(df, aes(x='col_1', y='col_2'))
 + geom_bin_2d(position='identity', binwidth=0.1)
 + stat_bin_2d(
     aes(
         x=stage(start='col_1', after_stat='(xmin+xmax)/2'),
         y=stage(start='col_2', after_stat='(ymin+ymax)/2'),
         label=after_stat('count')
     ),
     binwidth=0.1,
     geom='text',
     format_string='{:.0f}',
     size=10
 )
)