The bare Python/stdlib example is just...bad? (That's setting aside that bare Python, without the add-on data-science-oriented libraries, isn't how most people would use Python for data science.) By "bad" I mean it shows signs of deliberately avoiding stdlib features in order to exaggerate the very things the author then complains about.
A better stdlib-only version would be:
from palmerpenguins import load_penguins
import math
from itertools import groupby
from statistics import fmean, stdev
penguins = load_penguins()
# Convert DataFrame to list of dictionaries
penguins_list = penguins.to_dict('records')
# create key function for grouping/sorting by species/island
def key_func(x):
    return x['species'], x['island']
# Filter out rows where body_mass_g is missing and sort by species and island
filtered = sorted((row for row in penguins_list if not math.isnan(row['body_mass_g'])), key=key_func)
# Group by species and island
groups = groupby(filtered, key=key_func)
# Calculate mean and standard deviation for each group
results = []
for (species, island), group in groups:
    values = [row['body_mass_g'] for row in group]
    mean_value = fmean(values)
    sd_value = stdev(values, xbar=mean_value)
    results.append({
        'species': species,
        'island': island,
        'body_weight_mean': mean_value,
        'body_weight_sd': sd_value
    })
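
The sort before groupby matters, because itertools.groupby only merges consecutive items that share a key. Here is a minimal sketch of that behavior. The rows are made-up toy values rather than the real penguins data, and summarize is a hypothetical helper name used only for illustration:

```python
from itertools import groupby
from statistics import fmean, stdev

# Hypothetical toy rows (invented numbers, NOT the real penguins data).
rows = [
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3000.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000.0},
    {"species": "Adelie", "island": "Dream", "body_mass_g": 4000.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5500.0},
]


def key_func(row):
    return row["species"], row["island"]


# Without sorting, groupby starts a new group every time the key changes,
# so the same (species, island) pair can show up more than once.
unsorted_keys = [key for key, _ in groupby(rows, key=key_func)]


def summarize(rows):
    """Return {(species, island): (mean, sd)} after sorting by the group key."""
    out = {}
    for key, group in groupby(sorted(rows, key=key_func), key=key_func):
        values = [r["body_mass_g"] for r in group]
        mean_value = fmean(values)
        out[key] = (mean_value, stdev(values, xbar=mean_value))
    return out


summary = summarize(rows)
```

With the unsorted input, unsorted_keys has four entries (each pair appears twice), while summary has one entry per pair. Note that stdev raises StatisticsError for a group with fewer than two values, so very small groups would need separate handling.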