supermdguy 10 hours ago:
> A common pattern would be to separate pure business logic from data fetching/writing. So instead of intertwining database calls with computation, you split into three separate phases: fetch, compute, store (a tiny ETL). First fetch all the data you need from a database, then you pass it to a (pure) function that produces some output, then pass the output of the pure function to a store procedure.

Does anyone have any good resources on how to get better at doing "functional core, imperative shell" style design? I've heard a lot about it, and contrived examples make it seem like something I'd want, but I often find it's much more difficult in real-world cases.

Random example from my codebase: I have a function that periodically sends out reminders for usage-based billing customers. It pulls customer metadata, checks the customer type, computes their latest usage charges based on that, and then may trigger automatic balance top-ups or subscription overage emails (again, depending on the customer type). The code feels very messy and procedural, with business logic mixed with side effects, but I'm not sure where a natural separation point would be: there's no way to "fetch all the data" up front.
lmm 2 hours ago:
Sometimes you really can't separate the business logic from the imperative operations; in that case you use monads and at least make it a bit more testable and refactorable (e.g. https://michaelxavier.net/posts/2014-04-27-Cool-Idea-Free-Mo...).

That said:

> It pulls customer metadata, checks the customer type, and then based on that it computes their latest usage charges, and then based on that it may trigger automatic balance top-ups or subscription overage emails (again, depending on the customer type).

So compute those things, and store them somewhere (if only an in-memory queue to start with)? I can already see a separation into stages: an ETL stage that computes usage charges, which are probably worth recording in a datastore; another ETL stage that computes which top-ups and emails should be sent based on those charges, again probably worth recording for tracing purposes; and then two more stages that actually send the emails and execute the payment pulls. It's actually quite nice to have those last stages separated from the figuring-out-which-emails-to-send part, if only so you can retry and debug the latter without sending out actual emails.
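The staged split described above could be sketched roughly like this in Python. All the record types, field names, and the flat per-unit rate are invented for illustration; the point is that the first two stages are pure and their outputs are plain data you could persist between stages:

```python
from dataclasses import dataclass

# Hypothetical record types; field names are assumptions, not a real schema.
@dataclass(frozen=True)
class Customer:
    id: str
    email: str
    kind: str          # e.g. "prepaid" or "postpaid"
    usage_units: int
    rate_cents: int

@dataclass(frozen=True)
class Charge:
    customer_id: str
    amount_cents: int

@dataclass(frozen=True)
class TopUp:
    customer_id: str
    amount_cents: int

@dataclass(frozen=True)
class OverageEmail:
    customer_id: str
    email: str
    amount_cents: int

def compute_charges(customers):
    """Stage 1 (pure): usage -> charges, worth recording on its own."""
    return [Charge(c.id, c.usage_units * c.rate_cents) for c in customers]

def decide_actions(customers, charges):
    """Stage 2 (pure): charges -> intended side effects, recorded for tracing."""
    by_id = {c.id: c for c in customers}
    actions = []
    for ch in charges:
        cust = by_id[ch.customer_id]
        if cust.kind == "prepaid":
            actions.append(TopUp(cust.id, ch.amount_cents))
        else:
            actions.append(OverageEmail(cust.id, cust.email, ch.amount_cents))
    return actions
```

The imperative shell then just iterates over the action list and dispatches each to the email sender or the payment API; both pure stages can be unit-tested, replayed, and debugged without sending a single email.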
AdieuToLogic 2 hours ago:
> Does anyone have any good resources on how to get better at doing "functional core imperative shell" style design?

Hexagonal architecture[0] is a good place to start. The domain model core can be defined with functional concepts while also defining abstract contracts (abstractly "ports"; concretely, interface/trait types) implemented in "adapters", which are usually technology-specific, such as HTTP and/or SMTP in your example.

0 - https://en.wikipedia.org/wiki/Hexagonal_architecture_(softwa...
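A minimal ports-and-adapters sketch in Python, using `typing.Protocol` for the port contracts. The port names and methods here are invented for illustration, not part of any real library:

```python
from typing import Protocol

# "Ports": abstract contracts the domain core depends on.
class UsageRepo(Protocol):
    def usage_for(self, customer_id: str) -> int: ...

class Notifier(Protocol):
    def send_overage(self, customer_id: str, amount_cents: int) -> None: ...

# Domain core: the decision logic depends only on its ports.
def bill_customer(customer_id: str, rate_cents: int,
                  usage: UsageRepo, notify: Notifier) -> int:
    amount = usage.usage_for(customer_id) * rate_cents
    if amount > 0:
        notify.send_overage(customer_id, amount)
    return amount

# "Adapters" for testing: in-memory implementations of the ports.
# Production adapters would wrap the database driver and SMTP client.
class FakeUsage:
    def __init__(self, data):
        self.data = data
    def usage_for(self, customer_id):
        return self.data[customer_id]

class FakeNotifier:
    def __init__(self):
        self.sent = []
    def send_overage(self, customer_id, amount_cents):
        self.sent.append((customer_id, amount_cents))
```

Because the core only sees the ports, swapping the fakes for real HTTP/SMTP adapters requires no change to `bill_customer`.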
sltr 8 hours ago:
> Does anyone have any good resources on how to get better at doing "functional core imperative shell" style design?

I can recommend Grokking Simplicity by Eric Normand: https://www.manning.com/books/grokking-simplicity
movpasd 9 hours ago:
If your required logic separates nicely into steps (like "fetch, compute, store"), then a procedural interface makes sense, because sequential and hierarchical control flow work well with procedural programming.

But some requirements, like yours, need control flow to be interwoven between multiple concerns. It's hard to do this cleanly with procedural programming because where you want to draw the module boundaries (e.g. so as to separate logic and infrastructure concerns) doesn't line up with the sequential or hierarchical flow of the program. In that case you have to bring in more powerful tools. Usually that means polymorphism; depending on your language, that might be interfaces, typeclasses, callbacks, or something more exotic.

But you pay for these more powerful tools! They are more complex to set up and harder to understand than simple, straightforward procedural code. In many cases judicious splitting of a "mixed-concern function" might be enough, and that should probably be the first option on the list. But it's a tradeoff: you could lose cohesion and invariance properties (a logically singular operation is now split into multiple temporally coupled operations), or pay for the extra complexity of all the data types that interface between the suboperations.

To give an example, in "classic" object-oriented Domain-Driven Design approaches, you use the Repository pattern. The Repository serves as the interface, or hinge point, between your business logic and your database logic. Now, as I said in the last paragraph, you could instead design it so the business logic returns its desired side effects to a coordinating layer that handles dispatching them to the database functions. But if a single business-logic operation naturally intertwines multiple queries or other side-effectful operations, then the Repository can sometimes be simpler.
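A small sketch of the Repository idea in Python. The interface and its methods are invented for illustration; the point is that the business operation below intertwines a query and several writes but sees only the one hinge-point interface, never the database driver:

```python
from typing import Protocol

class CustomerRepo(Protocol):
    """The hinge point: business logic sees only this interface."""
    def find_over_limit(self, limit: int) -> list[tuple[str, int]]: ...
    def record_overage(self, customer_id: str, units_over: int) -> None: ...

def flag_overages(repo: CustomerRepo, limit: int) -> list[str]:
    # A single, cohesive business operation that mixes a query with writes,
    # kept together behind the Repository rather than split into phases.
    flagged = repo.find_over_limit(limit)
    for cid, over in flagged:
        repo.record_overage(cid, over)
    return [cid for cid, _ in flagged]

class InMemoryRepo:
    """Test adapter; a production adapter would hold the SQL."""
    def __init__(self, usage):
        self.usage = usage        # customer_id -> units used
        self.overages = []        # recorded (customer_id, units_over) pairs
    def find_over_limit(self, limit):
        return [(cid, u - limit) for cid, u in self.usage.items() if u > limit]
    def record_overage(self, customer_id, units_over):
        self.overages.append((customer_id, units_over))
```

The alternative design the comment mentions would have `flag_overages` return the overage records as data and leave the writes to a coordinating layer; the Repository version trades that purity for keeping one logical operation in one place.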
brickers 10 hours ago:
This stuff is quite new to me as I’ve been learning F#, so take this with a pinch of salt. Some of the things you’d want are:

- a function to produce a list of customers

- a function or two to retrieve the data, which would be passed into the customer-list function. This allows the customer-list function to be independent of the data retrieval; this is essentially functional dependency injection

- a function to take a list of customers and return a list of effects: things that should happen

- this is where I wave my hands, as I’m not sure of the plumbing, but the final part is something that takes the list of effects and does something with them

With the above you have a core that is ignorant of where its inputs come from and how its effects are achieved. It’s very much a pure domain model, with the messy interfaces to the outside world kept at the edges.
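The hand-waved "plumbing" is usually an interpreter: the pure core returns effect values as plain data, and one imperative function at the edge pattern-matches on them and performs them. A Python sketch of that shape (the effect types, customer fields, and top-up rule are all invented for illustration):

```python
from dataclasses import dataclass

# Effect values: descriptions of things that should happen, not actions.
@dataclass(frozen=True)
class SendEmail:
    to: str
    body: str

@dataclass(frozen=True)
class ChargeCard:
    customer_id: str
    amount_cents: int

def decide(customers):
    """Pure core: returns a list of effect values, performs nothing."""
    effects = []
    for c in customers:
        if c["balance_cents"] < 0:
            effects.append(ChargeCard(c["id"], -c["balance_cents"]))
            effects.append(SendEmail(c["email"], "We topped up your balance."))
    return effects

def run(effects, email_sender, card_charger):
    """Imperative edge: the 'plumbing' that interprets each effect."""
    for e in effects:
        if isinstance(e, SendEmail):
            email_sender(e.to, e.body)
        elif isinstance(e, ChargeCard):
            card_charger(e.customer_id, e.amount_cents)
```

Because the effects are plain values, `decide` can be tested with simple equality assertions, and `run` can be handed fakes in tests and real SMTP/payment clients in production.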
grayhatter 9 hours ago:
> there's no way to "fetch all the data" up front.

This is incorrect. I assume there's more nuance and complexity behind why it feels that way, probably involving larger design decisions that feel difficult to unwind. But data collection, decisions, and actions can all be separated without much difficulty, given some intent to do so.

I would suggest caution before implementing this directly, but imagine a subroutine whose whole job is to lock some database table, read the current list of pending top-up charges, issue each charge, update the row, and unlock the table. An entirely different subroutine wouldn't need to concern itself with anything other than data collection and calculating deltas; it has no idea whether a customer will be charged, all it does is calculate a reasonable amount. Something smart wouldn't run for deactivated/expiring accounts, but why does this need to be smart? It's not going to charge anything; it's just updating the price, which hypothetically might be used later based on data/logic that's irrelevant to the price calculation.

Once any complexity got involved, this is closer to how I would want to implement it, because it also gives you a clear transcript of which actions happened and why. I would want to be able to inspect the metadata around each decision to make a charge.
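The two decoupled subroutines could look something like this, with a plain dict standing in for the locked pending-charges table and a hypothetical flat rate; in a real system both would be transactions against the database:

```python
pending = {}   # customer_id -> amount_cents: the "pending top-up charges" table
ledger = []    # executed charges: the transcript of what happened and when

def update_prices(usage):
    """Data-collection subroutine: recomputes amounts only.
    It has no idea whether a customer will actually be charged."""
    for customer_id, units in usage.items():
        pending[customer_id] = units * 5   # assumed flat rate of 5 cents/unit

def execute_charges(active_accounts):
    """Action subroutine: reads the pending list, issues charges for
    active accounts only, and clears the rows it handled."""
    for customer_id in list(pending):
        if customer_id in active_accounts:
            ledger.append((customer_id, pending.pop(customer_id)))
```

Deactivated accounts simply never get charged because the action subroutine skips them; the price calculator stays dumb, and the ledger is the inspectable record of each charge decision.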
pdmccormick 9 hours ago:
Conceptually, can you break your processing up into a more or less "pure" functional core, surrounded by some gooey, imperative, state-dependent input-loading and output-effecting stages?

For each processing stage, implement functions with well-defined inputs and outputs, with any global side effects clearly stated (i.e. updating a customer record, sending an email).

Then factor all the imperative-ish querying (that is to say, anything dependent on external state, such as what's stored in a database) into the earlier phases, recognizing that some of the querying is going to be data-dependent ("if customer type X, fetch the limits for type X accounts"). The output of these phases should be a sequence of intermediate records that contain all the necessary data to drive the subsequent ones.

Whenever there is an action decision point ("we will be sending an email to this customer"), instead of actually performing that step right then and there, emit a kind of deferred-intent action data object, e.g. "OverageEmailData(customerID, email, name, usage, limits)".

Finally, the later phases are also highly imperative, and actually perform the intended actions that have global visibility and mutate state in durable data stores.

You will need to consider some transactional semantics, such as: what if the customer records change during the course of running this process? Or, what if my process fails half-way through sending customer emails? It is helpful if your queries can be point-in-time based, as in "query customer usage as of the start time for this overall process". That way you can update your process, re-run it with the same inputs as the last time you ran it, and see what your updates changed in terms of the output.

If those initial querying phases take a long time to run because they are computationally or database-query heavy, then during development, run them once and dump the intermediate output records. Then you can reload them to use as inputs into an isolated later phase of the processing. Or you can manually filter those intermediates down to a more useful representative set (i.e. a small number of customers of each type).

Also, it's really helpful to track the stateful processing of the action steps (i.e. for an email, track state as Queued, Sending, Success, Fail). If you have a bug that only bites during a later step in the processing, you can fix it and resume from where you left off (or only re-run the affected failed actions). And by tracking the globally affecting actions, you can take the results of previous runs into account during subsequent ones ("if we sent an overage email to this customer within the past 7 days, skip sending another one for now"). You now have a log of the stateful effects of your processing, which you can also query ("how many overage emails have been sent, and what numbers did they include?").

Good luck! Don't go overboard with functional purity, but just remember: state mutations now can usually be turned into data that can be applied later.
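The deferred-intent objects and the Queued/Sending/Success/Fail tracking described above could be sketched like this in Python; the snapshot shape, field names, and limit rule are all invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class OverageEmailAction:
    """Deferred-intent action object: everything needed to send the email,
    plus a state field so a failed run can be resumed."""
    customer_id: str
    email: str
    usage: int
    limit: int
    state: str = "Queued"   # Queued -> Sending -> Success | Fail

def plan_emails(snapshot, limit):
    """Earlier phase (pure): point-in-time query results in, action objects out.
    Nothing is sent here; the plan itself can be dumped, inspected, re-run."""
    return [OverageEmailAction(cid, email, usage, limit)
            for cid, (email, usage) in snapshot.items()
            if usage > limit]

def send_all(actions, send):
    """Later phase (imperative): performs only non-Success actions, so a
    re-run after a crash skips whatever already went out."""
    for a in actions:
        if a.state == "Success":
            continue
        a.state = "Sending"
        try:
            send(a.email, a.usage, a.limit)
            a.state = "Success"
        except Exception:
            a.state = "Fail"
```

Because the action list survives between phases, you can also query it afterwards ("how many overage emails were sent, and what numbers did they include?") or feed it into the next run's planning ("skip customers emailed within the past 7 days").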