what I've done for a similar script in the past:

    answer_initial = llm(prompt=prompt, site=site) # JSON with answer and any stuff needed to do heuristic checks.
    heuristic_results = heuristics(answer_final) # rule based.
    answer_final = llm(prompt-prompt, site=site, answer=answer_initial)
    mark_for_review = ... # basically just a bunch of hard-coded stuff I add flag possible failures for review.

You can use an extremely small/cheap model for something like this -- granite 4.0 micro works fine for me, 3.3 8b did as well, both run on my macbook. YMMV / try different models and see how it goes.