mgh95 6 days ago

Why mock at all? Spend the time making integration tests fast. There is little reason a database, queue, etc. can't be set up on a per-test-group basis and made fast. Reliable software is built upon (mostly) reliable foundations.
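For example, here is a minimal sketch of the per-test-group idea, assuming Python with pytest, SQLAlchemy, and testcontainers (the fixture and table names are illustrative, not from any particular project):

```python
import pytest
import sqlalchemy
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="module")  # one real Postgres per test module, so startup cost is amortized
def db_engine():
    with PostgresContainer("postgres:16") as pg:
        engine = sqlalchemy.create_engine(pg.get_connection_url())
        yield engine
        engine.dispose()

def test_insert_roundtrip(db_engine):
    with db_engine.connect() as conn:
        conn.execute(sqlalchemy.text("CREATE TABLE t (id int)"))
        conn.execute(sqlalchemy.text("INSERT INTO t VALUES (1)"))
        assert conn.execute(sqlalchemy.text("SELECT count(*) FROM t")).scalar() == 1
```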

com2kid 5 days ago | parent | next [-]

Because if part of my tests involve calling an OpenAI endpoint, I don't want to pay .01 cent every time I run my tests.

Because my tests shouldn't fail when a 3rd party dependency is down.

Because I want to be able to fake failure conditions from my dependencies.

Because unit tests have value and mocks make unit tests fast and useful.

Even my integration tests have some mocks in them, especially for any services that have usage based pricing.

But in general I'm going to mock out things that I want to simulate failure states for, and since I'm paranoid, I generally want to simulate failure states for everything.
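As a rough sketch of what I mean, assuming pytest and unittest.mock (the module, client, and function names here are hypothetical, not a real API):

```python
from unittest import mock

from myapp import summarize  # hypothetical code under test that wraps an LLM call

def test_summarize_degrades_gracefully_when_llm_is_down():
    # Simulate the third-party dependency failing without ever hitting the network.
    with mock.patch("myapp.llm_client.complete", side_effect=TimeoutError("upstream down")):
        result = summarize("some document")
    # Hypothetical contract: on failure we return a fallback instead of raising.
    assert result.fallback_used is True
```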

End to End tests are where everything is real.

mgh95 5 days ago | parent [-]

> Because if part of my tests involve calling an OpenAI endpoint, I don't want to pay .01 cent every time I run my tests.

This is a good time to think to yourself: do I need these dependencies? Can I replace them with something that doesn't expose vendor risk?

These are very real questions that large enterprises grapple with. In general (but not always), orgs that view technology as the product (or product under test) will view the costs of either testing or inhousing technology as acceptable, and cost centers will not.

> But in general I'm going to mock out things that I want to simulate failure states for, and since I'm paranoid, I generally want to simulate failure states for everything.

This can be achieved with an instrumented version of the service itself.
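For instance, a small stub service whose failure modes the test controls, rather than a per-call mock. A minimal sketch assuming Flask (the routes and failure modes are illustrative, not any vendor's actual API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
failure_mode = {"mode": "none"}  # tests flip this to inject faults

@app.post("/_control/failure")
def set_failure():
    failure_mode["mode"] = request.json["mode"]
    return "", 204

@app.post("/v1/complete")
def complete():
    if failure_mode["mode"] == "http_500":
        return "internal error", 500
    if failure_mode["mode"] == "garbage":
        return jsonify({"unexpected": "shape"})  # deliberately missing "required" fields
    return jsonify({"text": "a canned completion"})
```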

com2kid 5 days ago | parent [-]

> This is a good time to think to yourself: do I need these dependencies? Can I replace them with something that doesn't expose vendor risk?

Given that my current projects all revolve solely around using LLMs to do things, yes I need them.

The entire purpose of the code is to call into LLMs and do something useful with the output. That said, I need to gracefully handle failures, OpenAI giving me back trash results (forgetting fields even though they are marked required in the schema, etc.), and the occasional service outage.
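Concretely, this is the kind of defensive validation I mean. A minimal sketch assuming pydantic v2; the model and field names are illustrative:

```python
from pydantic import BaseModel, ValidationError

class Extraction(BaseModel):
    title: str    # marked required in the request schema, but the model may still omit it
    summary: str

def parse_or_none(raw: dict) -> Extraction | None:
    try:
        return Extraction.model_validate(raw)
    except ValidationError:
        return None  # caller can retry, fall back, or surface an error

assert parse_or_none({"title": "ok", "summary": "fine"}) is not None
assert parse_or_none({"summary": "title went missing"}) is None
```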

Also integration tests only make sense once I have an entire system to integrate. Unit tests let me know that the file I just wrote works.

cornel_io 6 days ago | parent | prev | next [-]

There are thousands of projects out there that use mocks for various reasons, some good, some bad, some ugly. But it doesn't matter: most engineers on those projects do not have the option to go another direction, they have to push forward.

mgh95 6 days ago | parent [-]

In this context, why not refactor, and have your LLM of choice write and optimize the integration tests for you? If the crux of the argument for LLMs is that they are capable of producing sufficient-quality software at dramatically reduced cost, why not have them rewrite the tests?

lanstin 6 days ago | parent | prev [-]

Hmmmm. I do like integration tests, but I often tell people the art of modern software is making reliable systems on top of unreliable components. And the integration tests should 100% include times when the network flakes out, drops half the replies, corrupts messages, and the like.
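One cheap way to do that is to wrap whatever transport the code uses in something deliberately unreliable. A sketch (the interface is illustrative; real setups often do this with a fault-injecting proxy instead):

```python
import random

class FlakyTransport:
    """Wraps a real transport and randomly drops or corrupts replies."""

    def __init__(self, inner, drop_rate=0.5, corrupt_rate=0.1, rng=random.Random(42)):
        self.inner = inner
        self.drop_rate = drop_rate
        self.corrupt_rate = corrupt_rate
        self.rng = rng

    def send(self, msg: bytes) -> bytes | None:
        reply = self.inner.send(msg)
        if self.rng.random() < self.drop_rate:
            return None                      # drop roughly half of the replies
        if self.rng.random() < self.corrupt_rate:
            return reply[:-1] + b"\x00"      # corrupt the payload
        return reply
```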

sethammons 5 days ago | parent | next [-]

Minor nit: I wouldn't call those failing-system tests integration tests.

Unit tests are for validating error paths. Unit tests can leverage mocks or fakes. Need 3 retries with exponential backoff? Use unit tests and fakes. Integration tests should use real components. Typically, integration tests cover the happy path and unit tests cover the error paths.
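A minimal sketch of what that looks like with a fake, assuming Python (the retry helper and its parameters are illustrative, not a real library):

```python
def call_with_retries(do_call, retries=3, base_delay=0.1, sleep=lambda s: None):
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return do_call()
        except ConnectionError:
            if attempt == retries:
                raise
            sleep(delay)
            delay *= 2  # exponential backoff

def test_succeeds_on_third_attempt_with_backoff():
    attempts, delays = [], []

    def flaky():
        attempts.append(1)
        if len(attempts) < 3:
            raise ConnectionError("fake outage")
        return "ok"

    assert call_with_retries(flaky, sleep=delays.append) == "ok"
    assert delays == [0.1, 0.2]  # delay doubled between the two failures
```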

Making real components fail and having tests validate failure handling in a more complete environment jumps from integration testing to resilience or chaos testing. The ability to accurately validate backoffs and retries may diminish there, but intermediate or end state can still be validated by monitoring artifacts via sinks.

There is also unit-integration testing, which fakes out as little as possible but still fakes out some edges. The difference is that failures are introduced via fakes rather than by managing actual system components. If you connect to a real DB in unit-integration tests, you typically wouldn't kill the DB or use Comcast to artificially slow the network. That would be reserved for the next layer of the test pyramid.

mgh95 6 days ago | parent | prev [-]

> I do like integration tests, but I often tell people the art of modern software is to make reliable systems on top of unreliable components.

There is a dramatic difference between unreliable in the sense of S3 or other services and unreliable as in "we get different sets of logical outputs when we provide the same input to an LLM". In the former, you can prepare for the logical outcomes: network failures, durability loss, etc. In the latter, unless you know the total space of outputs for an LLM, you cannot prepare. In the operational sense, LLMs are not a system component; they are a system builder. And a rather poor one, at that.

> And the integration tests should 100% include times when the network flakes out and drops 1/2 of replies and corrupts msgs and the like.

Yeah, it's not that hard to include that in modern testing.