pdmccormick 9 hours ago

Conceptually, can you break your processing up into a more or less "pure" functional core, surrounded by gooey, imperative, state-dependent input-loading and output-effecting stages? For each processing stage, implement functions with well-defined inputs and outputs, with any global side effects clearly stated (e.g. updating a customer record, sending an email). Then factor all the imperative-ish querying (that is, anything dependent on external state, such as what is stored in a database) into the earlier phases, recognizing that some of the querying will be data-dependent ("if customer type X, fetch the limits for type X accounts"). The output of these phases should be a sequence of intermediate records that contain all the data needed to drive the subsequent ones.
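A minimal sketch of that split, in Python. The `db` object, the schema, and the `CustomerUsage` record are all made up for illustration; the point is that only the gather function touches external state, while the core is pure:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerUsage:
    customer_id: int
    email: str
    usage: float
    limit: float

def gather_usage(db) -> list[CustomerUsage]:
    # Imperative edge: the only place that reads external state.
    rows = db.query("SELECT id, email, usage, usage_limit FROM customers")
    return [CustomerUsage(r["id"], r["email"], r["usage"], r["usage_limit"])
            for r in rows]

def find_overages(records: list[CustomerUsage]) -> list[CustomerUsage]:
    # Pure core: the same input records always yield the same output.
    return [r for r in records if r.usage > r.limit]
```

Because `find_overages` depends only on its argument, it can be unit-tested with hand-built records and never needs a database.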

Whenever there is an action decision point ("we will be sending an email to this customer"), instead of actually performing that step right then and there, emit a kind of deferred-intent action data object, e.g. "OverageEmailData(customerID, email, name, usage, limits)". Finally, the later phases are also highly imperative, and actually perform the intended actions that have global visibility and mutate state in durable data stores.
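One way to sketch that deferred-intent object and the imperative tail that consumes it (all names hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OverageEmailData:
    # Deferred intent: everything needed to send the email later.
    customer_id: int
    email: str
    name: str
    usage: float
    limit: float

def decide_actions(overages) -> list[OverageEmailData]:
    # Pure decision point: emit intents instead of sending anything yet.
    return [OverageEmailData(r["id"], r["email"], r["name"], r["usage"], r["limit"])
            for r in overages]

def perform_actions(intents, mailer) -> None:
    # Imperative tail: the only phase that mutates the outside world.
    for i in intents:
        mailer.send(i.email, f"Hi {i.name}, you used {i.usage} of your {i.limit} limit")
```

The intent objects are plain data, so they can be inspected, counted, or dumped for review before anything irreversible happens.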

You will need to consider some transactional semantics: what if the customer records change during the course of running this process? What if the process fails halfway through sending customer emails? It helps if your queries can be point-in-time based, as in "query customer usage as of the start time of this overall process". That way you can update your process, re-run it with the same inputs as of the last run, and see what your updates changed in terms of the output.

If those initial querying phases take a long time to run because they are computationally or database-query heavy, then during development, run them once and dump the intermediate output records. You can then reload them as inputs into an isolated later phase of the processing, or manually filter the intermediates down to a smaller representative set (e.g. a small number of customers of each type).
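Dumping and reloading intermediates can be as simple as JSON round-tripping the record objects. A sketch, assuming the records are frozen dataclasses like this hypothetical `CustomerUsage`:

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class CustomerUsage:
    customer_id: int
    email: str
    usage: float
    limit: float

def dump_intermediates(records, path):
    # Run the expensive query phase once, then persist its output.
    with open(path, "w") as f:
        json.dump([asdict(r) for r in records], f)

def load_intermediates(path, cls):
    # Reload the saved records to feed an isolated later phase.
    with open(path) as f:
        return [cls(**row) for row in json.load(f)]
```

Filtering the dumped JSON down to a few representative customers is then a one-liner (or a text-editor job), and the later phases never know the difference.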

Also, it's really helpful to track the stateful processing of the action steps (e.g. for an email, track state as Queued, Sending, Success, Fail). If you have a bug that only bites during a later step of the processing, you can fix it and resume from where you left off (or re-run only the affected failed actions). By tracking the globally affecting actions, you can also take the results of previous runs into account during subsequent ones ("if we sent an overage email to this customer within the past 7 days, skip sending another one for now"). You now have a log of the stateful effects of your processing, which you can also query ("how many overage emails have been sent, and what numbers did they include?").
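A sketch of that state tracking and resume logic. The log here is just an in-memory dict keyed by action ID; a real one would live in a durable store:

```python
from enum import Enum

class ActionState(Enum):
    QUEUED = "queued"
    SENDING = "sending"
    SUCCESS = "success"
    FAIL = "fail"

def run_pending(log, send, retry_failed=False):
    # Resume-friendly: skip actions that already succeeded, optionally
    # retry failures, and record each transition as it happens.
    for action_id, state in list(log.items()):
        if state is ActionState.SUCCESS:
            continue
        if state is ActionState.FAIL and not retry_failed:
            continue
        log[action_id] = ActionState.SENDING
        try:
            send(action_id)
            log[action_id] = ActionState.SUCCESS
        except Exception:
            log[action_id] = ActionState.FAIL
```

Because each transition is written before and after the send, a crash mid-run leaves an honest record (a stuck `SENDING` row flags the action whose outcome is unknown), and a second invocation picks up only what still needs doing.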

Good luck! Don't go overboard with functional purity, but just remember, state mutations now can usually be turned into data that can be applied later.