| ▲ | moomin 3 hours ago | parent | next [-] | | They're not really that interesting. They're "reduce transformers". So, take a reduction operation, turn it into an object, define a way to convert one reduction operation into another and you're basically done. 99% of the time they're basically mapcat. The real thing to learn is how to express things in terms of reduce. Once you've understood that, just take a look at e.g. the map and filter transducers and it should be pretty obvious. But it doesn't work until you've grasped the fundamentals. | |
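To make "reduce transformers" concrete, here is a minimal sketch of map and filter written as transducers, i.e. as functions that take a reducing function and return a new one. (This mirrors the shape of what clojure.core's single-arity map/filter return; it is not the actual core source.)

    ;; A transducer: reducing fn -> reducing fn
    (defn map-xf [f]
      (fn [rf]
        (fn
          ([] (rf))
          ([result] (rf result))
          ([result input] (rf result (f input))))))

    (defn filter-xf [pred]
      (fn [rf]
        (fn
          ([] (rf))
          ([result] (rf result))
          ([result input]
           (if (pred input) (rf result input) result)))))

    ;; Everything is expressed in terms of reduce:
    (transduce (comp (map-xf inc) (filter-xf even?)) conj [] [1 2 3 4])
    ;; => [2 4]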
| ▲ | eduction 2 hours ago | parent | prev [-] | | Canonical example is rewriting a non-transducing set of collection transformations like

    (->> posts
         (map with-user)
         (filter authorized?)
         (map with-friends)
         (into []))

That’s five collections; this is two, using transducers:

    (into []
          (comp
            (map with-user)
            (filter authorized?)
            (map with-friends))
          posts)
A transducer is returned by comp, and each item within comp is itself a transducer. You can see how the flow is exactly like the thread-last (->>) macro. map, for example, is called with one argument here, which means it returns a transducer; in the first example it has a second argument, the coll posts, so it immediately runs over that and returns a new coll. The composed transducer returned by comp is passed to into as the second of three arguments. In the three-argument form, into applies the transducer to each item in the third argument, the source coll. In the two-argument form, as in the first example, it just pours the source coll into the first argument (also a coll). | | |
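The arity difference described above can be seen at the REPL (a small sketch, using clojure.core's map and into as described):

    (def xf (map inc))       ; one arg: returns a transducer, nothing runs yet
    (into [] xf [1 2 3])     ;=> [2 3 4]  three-arg into: applies xf while pouring
    (into [] [1 2 3])        ;=> [1 2 3]  two-arg into: just pours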
| ▲ | kccqzy 2 hours ago | parent | next [-] | | That does not sound like a good example. The two-argument form of `map` already returns a lazy sequence. Same for `filter`. I thought lazy sequences are already supposed to get rid of the performance problem of materializing the entire collection. | | |
| ▲ | eduction 2 hours ago | parent [-] | | Lazy sequences reduce the size of intermediate collections but they “chunk”: you get 32 items at a time, multiplied by however many transformations you have, and obviously by the size of the items. There are some additional inefficiencies from the context captured at each lazy transformation point. The problem gets worse outside of a tidy immediate set of transformations like you’ll see in any example. This article gives a good overview of the inefficiencies; search on “thunk” for the tldr. https://clojure-goes-fast.com/blog/clojures-deadly-sin/ (I don’t agree with its near condemnation of the whole lazy pattern. Laziness is quite useful; we can complain about it because we have it, and it would suck if we didn’t.) | | |
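The chunking is easy to demonstrate at the REPL (a small sketch; ranges and most core seqs are chunked in 32s):

    ;; We only ask for the first element, but the side effect
    ;; fires for the whole first chunk of 32:
    (first (map #(do (print % " ") %) (range 100)))
    ;; prints 0 1 2 ... 31, then returns 0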
| ▲ | kccqzy an hour ago | parent | next [-] | | So what’s your coding style in Clojure? Do you eschew lazy sequences as much as possible and only use either non-lazy manipulation functions like mapv or transducers? I liked using lazy sequences because it’s more amenable to breaking larger functions into smaller ones and decreases coupling. One part of my program uses map, and a distant part of it uses filter on the result of the map. With transducers it seems like the way to do it is eductions, but I avoided it because each time it is used it reevaluates each item, so it’s sacrificing time for less space, which is not usually what I want. I should add that I almost always write my code with lazy sequences first because it’s intuitive. Then maybe one time out of five I re-read my code after it’s done and realize I could refactor it to use transduce. I don’t think I’ve ever used eduction at all. | |
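The eduction trade-off mentioned above, sketched with hypothetical `expensive` and `data` names:

    ;; eduction captures a recipe, not a result; the work
    ;; re-runs every time the eduction is reduced:
    (def e (eduction (map expensive) data))
    (reduce + 0 e)  ; runs expensive over all of data
    (reduce + 0 e)  ; runs it all again

That re-running is exactly the time-for-space trade described: nothing is cached, so repeated consumers pay the full cost each time.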
| ▲ | eduction 2 hours ago | parent | prev [-] | | This, by the way, is why the lead example in the original linked post on clojure.org is very much like mine. |
| ▲ | fud101 2 hours ago | parent | prev [-] | | Thanks. So is this not an optimisation the Clojure runtime can do for you automatically? I find the first one simpler to read and understand. | | |
| ▲ | jwr an hour ago | parent [-] | | Performance is one of the niceties of transducers, but the real benefits are from better code abstractions. For example, transducers decouple the collection type from data-processing functions. So you can write (into #{} ...) (a set), (into [] ...) (a vector) or (into {} ...) (a map) — and you don't have to modify the functions that process your data, or convert a collection at the end. The functions don't care about your target data structure, or the source data structure. They only care about what they process. The fact that no intermediate structures have to be created is an additional nicety, not really an optimization. It is true that for simple examples the (-> ...) is easier to read and understand. But you get used to the (into) syntax quickly, and you can do so much more this way (composable pipelines built on demand!). |
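The decoupling of transducer from target collection looks like this (a minimal sketch):

    ;; One pipeline, several target collections:
    (def xf (comp (map inc) (filter odd?)))
    (into []  xf (range 5))  ;=> [1 3 5]
    (into #{} xf (range 5))  ;=> #{1 3 5}
    (into '() xf (range 5))  ;=> (5 3 1)

The processing functions never mention vectors or sets; only the call site chooses the target.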