The pure Python code in the last example is more verbose than it needs to be.

    groups = {}
    for row in filtered:
        key = (row['species'], row['island'])
        if key not in groups:
            groups[key] = []
        groups[key].append(row['body_mass_g'])

can be rewritten as:

    groups = collections.defaultdict(list)
    for row in filtered:
        groups[(row['species'], row['island'])].append(row['body_mass_g'])

and

    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    std_dev = math.sqrt(variance)

as:

    std_dev = statistics.stdev(values)
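For anyone who wants to check that the two rewrites behave the same, here is a rough sketch. The rows in `filtered` are made up for illustration (the real data isn't shown here), and `manual_std_dev` is just a hypothetical wrapper around the original two lines.

```python
import collections
import math
import statistics

# Hypothetical rows standing in for the `filtered` list from the example.
filtered = [
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3700.0},
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3900.0},
    {"species": "Adelie", "island": "Dream", "body_mass_g": 4100.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5400.0},
]

# Original, verbose grouping.
verbose_groups = {}
for row in filtered:
    key = (row["species"], row["island"])
    if key not in verbose_groups:
        verbose_groups[key] = []
    verbose_groups[key].append(row["body_mass_g"])

# defaultdict version: a missing key is created with an empty list on first access.
groups = collections.defaultdict(list)
for row in filtered:
    groups[(row["species"], row["island"])].append(row["body_mass_g"])


def manual_std_dev(values):
    """Sample standard deviation, written out as in the original example."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


# Both groupings hold the same data, and both std dev computations agree.
same_groups = dict(groups) == verbose_groups
same_std = all(
    math.isclose(manual_std_dev(v), statistics.stdev(v)) for v in groups.values()
)
```

Both `same_groups` and `same_std` come out `True` on this toy data. Note that `statistics.stdev` needs at least two values per group, the same as the `(n - 1)` version.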

▲ roadside_picnic 8 hours ago | parent | next [-]

> (n - 1)

It's also funny that one would write their own standard deviation function and include Bessel's correction. Usually if I'm manually re-implementing a standard deviation function it's because I'm afraid the implementors blindly applied the correction without considering whether or not it's actually meaningful for the given analysis. At the very least, the correct name for what's implemented there should really be `sample_std_dev`.

	▲ m55au 8 hours ago | parent [-]
		It is sadly really inconsistent. The stdlib statistics has two separate functions, stdev for sample and pstdev for population. Numpy and pandas both have .std() with ddof (delta degrees of freedom) as a parameter, but numpy defaults to 0 (population) and pandas to 1 (sample).
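A small stdlib-only sketch of the sample vs. population difference, using made-up numbers. The numpy and pandas defaults are only mentioned in comments, not run here.

```python
import math
import statistics

# Made-up data: mean is 5, sum of squared deviations is 32.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_sd = statistics.stdev(values)       # divides by n - 1 (like ddof=1)
population_sd = statistics.pstdev(values)  # divides by n (like ddof=0)

# The two differ only by the factor sqrt(n / (n - 1)).
n = len(values)
ratio = sample_sd / population_sd
expected_ratio = math.sqrt(n / (n - 1))
```

The gap shrinks as `n` grows, so it mostly matters for small groups, which is exactly where silently mixing the two defaults hurts.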

▲ ashdev 10 hours ago | parent | prev [-]

Disagree.

In the first instance, the original code is readable and tells me exactly what's what. In your example, you're sacrificing readability for being clever.

Clear code (even if verbose) is better than being clever.

▲ billyoyo 10 hours ago | parent | next [-]

Using a very common utility in the standard library to avoid reinventing the wheel is not "clean code"?

defaultdict is ubiquitous in modern python, and is far from a complicated concept to grasp.

	▲ ux266478 8 hours ago | parent [-]
		I don't think that's the right metaphor to use here, it exists at a different level than what I would consider "reinventing the wheel". That to me is more some attempt to make a novel outward-facing facet of the program when there's not much reason to do so. For example, reimplementing shared memory using a custom kernel driver as your IPC mechanism, despite it not doing anything that shared memory doesn't already do. The difference between the examples is so trivial I'm not really sure why the parent comment felt compelled to complain.

▲ MarsIronPI 10 hours ago | parent | prev | next [-]

I think code clarity is subjective. I find the second easier to read because I have to look at less code. When I read code, I instinctively take it apart and see how it fits together, so I have no problem with the second approach. Whereas the first approach is twice as long so it takes me roughly twice as long to read.

▲ explodes 10 hours ago | parent | prev | next [-]

The 2nd version is the most idiomatic.

▲ ashdev 7 hours ago | parent | prev | next [-]

Interesting! Thanks for the responses. I'm not python native and haven't worked as extensively with python as some of you here.

That said, I'll change my mind here and agree on using std library, but I'd still have separate 'key' assignment here for more clarity.

▲ pphysch 9 hours ago | parent | prev [-]

I would keep the explicit key= assignment since it's more than just a single literal but otherwise the second version is more idiomatic and readable.
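Something like the following, as a sketch. The two rows in `filtered` are invented just so the snippet runs on its own.

```python
from collections import defaultdict

# Hypothetical rows standing in for the `filtered` list from the example.
filtered = [
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3700.0},
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3900.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000.0},
]

groups = defaultdict(list)
for row in filtered:
    # Naming the composite key keeps the append line short and easy to scan.
    key = (row["species"], row["island"])
    groups[key].append(row["body_mass_g"])
```

It costs one extra line over the fully inlined version and still drops the `if key not in groups` check.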