When the automatic groups are not sufficient

In [1]:
import pandas as pd
from plotnine import *

%matplotlib inline

Some data to plot

In [2]:
df = pd.DataFrame({
    'letter': ['Alpha', 'Beta', 'Delta', 'Gamma'],
    'pos': [1, 2, 3, 4],
    'num_of_letters': [5, 4, 5, 5]
})

df
Out[2]:
   letter  num_of_letters  pos
0   Alpha               5    1
1    Beta               4    2
2   Delta               5    3
3   Gamma               5    4
In [3]:
(ggplot(df)
 + geom_col(aes(x='letter', y='pos'))
 + geom_line(aes(x='letter', y='num_of_letters'))
 + ggtitle('Greek Letter Analysis')
)
/user/path/plotnine/geoms/geom_path.py:82: UserWarning: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
  warn("geom_path: Each group consist of only one "
../_images/tutorials_miscellaneous-automatic-grouping-insufficient_4_1.png
Out[3]:
<ggplot: (97654321012345679)>

We get a plot with a warning and no line(s). This is not what we expected.

The issue is that we have 4 groups, one per letter (Alpha, Beta, ...), and each group contains a single point, so there is nothing for a line to connect. This is a case where the automatic grouping is not sufficient (or just not what you expect). The solution is to manually set the group for geom_line so that all points belong to one group.

In [4]:
(ggplot(df)
 + geom_col(aes(x='letter', y='pos'))
 + geom_line(aes(x='letter', y='num_of_letters'), group=1)
 + ggtitle('Greek Letter Analysis')
)
../_images/tutorials_miscellaneous-automatic-grouping-insufficient_6_0.png
Out[4]:
<ggplot: (97654321012345679)>

That is the plot we wanted: a single line now connects all four points.
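The difference between the two plots can be sketched with pandas alone. The snippet below is only an analogy for the grouping described above, not plotnine's internal code: it counts the rows in each group when the `letter` column defines the groups, and again when every row shares one constant group. The `group` column name is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'letter': ['Alpha', 'Beta', 'Delta', 'Gamma'],
    'pos': [1, 2, 3, 4],
    'num_of_letters': [5, 4, 5, 5]
})

# Analogy for the automatic grouping: one group per distinct letter
auto_sizes = df.groupby('letter').size()

# Analogy for group=1: every row is assigned the same constant group
# ('group' is a hypothetical column name used only for this check)
manual_sizes = df.assign(group=1).groupby('group').size()

print(auto_sizes)
print(manual_sizes)
```

Under the automatic grouping every letter holds one row, which is why no line can be drawn. With the constant group, all four rows fall into a single group, so they can be joined.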

To see why the default grouping that triggers the warning is not wrong, let us try a dataframe with 2 points per group.

In [5]:
df2 = pd.DataFrame({
    'letter': ['Alpha', 'Beta', 'Delta', 'Gamma'] * 2,
    'pos': [1, 2, 3, 4] * 2,
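    'num_of_letters': [5.0, 4.0, 5.0, 5.0] * 2  # floats, so 0.8 can be added in place
})

# Offset the second copy of each letter so that its two points differ
df2.loc[4:, 'num_of_letters'] += 0.8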

(ggplot(df2)
 + geom_col(aes(x='letter', y='pos'))
 + geom_line(aes(x='letter', y='num_of_letters'))
 + ggtitle('Greek Letter Analysis')
)
../_images/tutorials_miscellaneous-automatic-grouping-insufficient_8_0.png
Out[5]:
<ggplot: (97654321012345679)>

This time there is no warning, and we get lines: each letter group now has two points to connect.
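A line needs at least two points, and in `df2` every letter has exactly two. A quick pandas check, again only an analogy for the automatic grouping and not plotnine's own code, shows the group sizes and the two y-values that each line joins. The float literals are an assumption made here so that adding 0.8 does not change the column's dtype.

```python
import pandas as pd

df2 = pd.DataFrame({
    'letter': ['Alpha', 'Beta', 'Delta', 'Gamma'] * 2,
    'pos': [1, 2, 3, 4] * 2,
    'num_of_letters': [5.0, 4.0, 5.0, 5.0] * 2  # floats, so 0.8 can be added in place
})
df2.loc[4:, 'num_of_letters'] += 0.8

by_letter = df2.groupby('letter')['num_of_letters']

# Number of observations per letter group
group_sizes = by_letter.size()

# Lowest and highest y-value within each group
endpoints = by_letter.agg(['min', 'max'])

print(group_sizes)
print(endpoints)
```

Both points in a group share the same letter, so each line is a short vertical segment running from the original count to that count plus 0.8.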

We can add some color to such a plot.

In [6]:
(ggplot(df2)
 + geom_col(aes(x='letter', y='pos', fill='letter'))
 + geom_line(aes(x='letter', y='num_of_letters', color='letter'), size=1)
 + scale_color_hue(l=0.45)  # some contrast to make the lines stick out
 + ggtitle('Greek Letter Analysis')
)
../_images/tutorials_miscellaneous-automatic-grouping-insufficient_10_0.png
Out[6]:
<ggplot: (97654321012345679)>

Credit: GitHub user [@datavistics](https://github.com/datavistics) (derek), whose encounter with this issue motivated this example.