KPMG wrote 100-page prompt to build agentic TaxBot

"It is very efficient," Munnelly told the Forrester conference. "It does what our team used to do in about two weeks, in a day. It will strip through our documents and the legislation and produce a 25-page document for a client as a first draft.

"That speed is important," he added. "If we have a client who is about to do a merger, and they want to understand the tax implications, getting that knowledge in a day is much more important than getting it in two weeks' time."

---

I really wonder what the foundation for their confidence in LLMs is. If you have ever used ChatGPT you will be highly skeptical that the output is correct. If it's code, you can at least compile, typecheck, and run it to verify it to some extent. How do you do that with a 25 page report?

SvenL 21 hours ago

I wonder the same. I mean, if it is produced in 1 day but I need 2 weeks to verify it, I don’t gain much. Sure, I can ask it to quote and link the sources, but still. I remember the case of the machine learning book from Springer where the author used an LLM and it was only revealed when someone tried to look up the quoted sources: they didn’t exist, they were made up.

yobbo 20 hours ago

It might also be their relative confidence in people vs LLMs for this sort of task. People could be worse when the task itself is trivial but the volume is intractable for a single human.

immibis 17 hours ago

The secret is that nobody both reads the report and wants it to be factual.

defrost a day ago

> How do you do that with a 25 page report?

Like any technical 25 page report it'll be in the ballpark of reality, shorter to read and grasp than crawling through a wall of document-filled boxes, and passed to other people to 'verify' / offer their opinions on.

Once contracts are in place with millions of dollars in play (or tens of millions, or billions) there will be clauses addressing responsibility and recompense should key parts of the reports upon which an agreement is based prove to be false.

The world runs on technical reports that aren't perfect, but "near enough"; errors are assumed and a frequency of deliberate malfeasance (knowingly lying, misleading, faking results) can be estimated.

Part of my career consisted of producing summaries of two to three thousand documents a day from stock markets around the globe, documents that ranged from three lines announcing a change on a board, or a table disclosing a change in holdings by the largest investors, to large (hundred+ page) quarterly and annual reports, to small-book-sized economic feasibility reports with wads of raw data, interpretation, proposed plans, costings, timelines, etc.

> It will strip through our documents and the legislation and produce a 25-page document for a client as a first draft.

is the key point here: it's a rapid first draft of the major dot points seen to be most important for <whatever>. It is intended to be crawled through with a finer comb and a keen eye before contracts are signed based on a separate framing of <deal>.

The big change here is that an AI churns out a draft faster; the quality of the document will be as suspect as that of a human-written first draft: untrusted.

ofrzeta a day ago

Untrusted ... but does it have any value at all when you can't be sure that a lot of it isn't hallucinated? After all, LLMs are not very good with numbers.

defrost 21 hours ago

You're correct that I can't be sure as I don't work at KPMG and haven't had any contact with their piles of documents, existing practices, or TaxBot summaries.

What I do know as a fact is that KPMG are self-reporting satisfaction with their in-house work on putting such a thing together.

The 'proof' will be the next five years of application to corporate clients.

> After all, LLMs are not very good with numbers.

The assumption, always, should be that neither are interns.

Hence why draft summaries should be reviewed and sanity checked by senior experienced people.

I would assume (based on my prior work summarizing large volumes of data for the mineral and energy resources domain) that any report produced would have references back to source documents and pages, making the task of cross checking the product simple and relatively straightforward.
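
A minimal sketch of that kind of cross check, assuming each claim in the draft carries a quote plus a document and page reference. Nothing here reflects how TaxBot actually works; `Citation`, `unverified`, and the sample data are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """A quoted claim in the draft plus where it supposedly came from (hypothetical shape)."""
    quote: str
    doc_id: str
    page: int


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps don't cause false misses."""
    return " ".join(text.lower().split())


def unverified(citations, sources):
    """Return the citations whose quote is not found on the cited page.

    `sources` maps doc_id -> {page number -> page text}.
    An empty quote, unknown document, or unknown page counts as unverified.
    """
    missing = []
    for c in citations:
        page_text = sources.get(c.doc_id, {}).get(c.page, "")
        quote = normalize(c.quote)
        if not quote or quote not in normalize(page_text):
            missing.append(c)
    return missing


# Made-up sample data: one source document with a single page.
sources = {
    "offer-letter": {
        1: "We'll sell you our company for $100.\nBut, you have to do a hand-stand.",
    },
}
draft = [
    Citation("We'll sell you our company for $100.", "offer-letter", 1),
    Citation("The seller waives all conditions.", "offer-letter", 1),
    Citation("Payment is due in 30 days.", "missing-doc", 4),
]
flagged = unverified(draft, sources)
for c in flagged:
    print(f"not found: {c.quote!r} ({c.doc_id}, p.{c.page})")
```

A check like this only catches quotes that are fabricated or attributed to the wrong page. It says nothing about what the draft left out, so it narrows the reviewer's job rather than replacing it.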

Neywiny 17 hours ago

I think the concern is about more than what it gathered; there's a lot of skepticism over it missing something. The same way so many AI tools just ignore commands, imagine it just ignoring a few sentences. Maybe like:

> We'll sell you our company for $100. But, you have to do a hand-stand and spin around 5 times.

If the AI only puts the first sentence in the summary, you could see how it'd be a bad day for the client. Any human would go "huh that's weird, I'll make sure that's noted in the summary" but in my experience, AIs just don't have that feeling.

defrost 16 hours ago

What's being ignored, it seems, is this is explicitly an in-house tool for a first draft summary to be reviewed by an in-house accountant prior to a final presentation to a client.

> imagine it just ignoring a few sentences.

Sure. Just like the risk that every similar draft report prepared by a human intern | associate | junior carries today and has carried in the past.

One would hope that, as a company at risk of litigation and carrying the can for bad advice, an AI-reduced draft such as this would be proofread by a senior in-house expert who would trace back every "We'll sell you our company for $100." to the _original_ context via an embedded hyperlink in the draft.

It's certainly the way in which things were done when generating summaries of tens of thousands of documents for mineral and energy clients looking to invest at least $50 million in advancing projects for return.

Neywiny 16 hours ago

You've missed my point. I don't think any human who has a job at a law firm would ignore a sentence like that. I think any AI I've used has ignored explicit instructions of moderate severity. I'm not worried it'll hallucinate things into existence, I'm worried it'll ignore them out. Can't summarize without throwing away words. I don't trust it to choose the right ones.

defrost 4 hours ago

And you've missed mine. I don't think any human at any law firm, medical practice, major resource company, etc. that deals with volumes of documentation in the course of making multi-million deals would _trust_ an associate / intern pool or an AI to create a perfect product that can be passed directly to a client without any form of checking and verification. It's a _given_ that there will be shortfalls and errors, and the procedures need to be sufficient to embrace an error-prone distillation phase and a circle-back-and-verify phase. At least in my experience to date. It's clear from the article that KPMG feel much the same way.

bashtoni 19 hours ago

If it really is a single 100-page prompt then it will be even less reliable than a KPMG audit.

(See https://www.theguardian.com/business/2023/oct/12/kpmg-fined-... or https://pcaobus.org/news-events/news-releases/news-release-d... or https://www.sec.gov/newsroom/press-releases/2017-142 or any of a myriad of other cases)

roxolotl 16 hours ago

> Munnelly said KPMG built the agent by writing a 100-page prompt it fed into Workbench. The Register asked for details of the prompt and Munnelly said a substantial team worked on it for months, and the resulting agent asks for four or five inputs before it starts working on tax advice, then asks a human for direction before generating a document.

> Only tax agents can use the tool, because its output is not suitable for people without deep tax expertise.

Ok cool, so they wrote a giant piece of software to assist in highly specialized tasks. Would love to know what the LLM adds. Maybe just parsing?

UltraSane 14 hours ago

KPMG can't actually perform accurate audits.

UK-Al05 15 hours ago

"We produced a tool that produces a wonky result, so the results need to be examined in detail anyway."