steveBK123 · 8 hours ago
I think for most non-coding tasks we are still in the "convincing liar" stage, and not even at the "it's right 99.9% of the time and humans need to quickly detect the 0.1% of errors" stage. I think a lot of the HN crowd misses this because they are programmers using it for programming.

I work at a firm that has given AI tooling to non-developer, data-analyst type people who otherwise live and die in Excel. Much of their day job involves reading PDFs. I occasionally use some of the firm's AI tooling for PDF summarizing/parsing/interrogation tasks and remain consistently underwhelmed.

Take something like 10 PDFs, each with a simple 30-row table under the same title in each file: it ends up puking on 3-4 out of 10 with silent failures. Dropped rows, duplicated data, etc. When you point out that it's missed rows, it goes back and duplicates rows to hit the correct row count.

Using it to interrogate standard company filing PDFs that it had been specially trained on, it gave very convincing answers which were wrong, because it had silently truncated its search context to only recent years' financial filings. Nowhere did it surface this limitation to the user. It only became apparent after researching the 4th or 5th company, when it decided to caveat its answer with its knowledge window. That invalidated the previous answers, since questions such as "when was the first X" or "have they ever reported Y" had been operating on incomplete information.

Most users of these tools are not that technical, and are going to be much more naive in taking the answers as fact without considering the context.
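The row drops and duplications described above are exactly the kind of failure a deterministic check can catch, rather than trusting the model's own row count. A minimal sketch, assuming the extracted tables have been saved as CSVs (the function name and the 30-row expectation are from the scenario above; everything else is hypothetical):

```python
import csv

EXPECTED_ROWS = 30  # each source PDF contains a simple 30-row table

def check_extraction(path, expected=EXPECTED_ROWS):
    """Return a list of problems found in one extracted CSV:
    wrong row count, or exact-duplicate data rows."""
    with open(path, newline="") as f:
        rows = [tuple(r) for r in csv.reader(f)]
    body = rows[1:]  # skip the header row
    problems = []
    if len(body) != expected:
        problems.append(f"{path}: expected {expected} rows, got {len(body)}")
    dupes = len(body) - len(set(body))
    if dupes:
        problems.append(f"{path}: {dupes} duplicated row(s)")
    return problems
```

Run over all 10 files, this flags the 3-4 bad extractions instead of letting them pass silently; it can't prove the extraction is right, but it makes the failures loud.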
Terr_ · 7 hours ago
I'm convinced the best use of these systems will be an explicit two-phase process where they just help people prototype, see, and learn how to command regular software.

For example, imagine describing what files you want to find and getting back a command-line string of find/grep piping. It doesn't execute anything without confirmation, and it doesn't "summarize" the results; it's just a narrow tutor for that translation step. A tool for learning that, ideally, eventually puts itself out of a job.

Returning to your PDF scenario: the LLM could help people weave together regular tools like "find regions with keywords", "extract table as spreadsheet", "cross-reference two spreadsheets using column values", etc.
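Each of those regular tools can be an ordinary, deterministic script the LLM merely helps compose. A minimal sketch of the "cross-reference two spreadsheets using column values" step, assuming CSV inputs and a shared key column (all names here are hypothetical):

```python
import csv

def cross_reference(path_a, path_b, key):
    """Inner-join rows of two CSVs on a shared key column.
    Purely mechanical: no summarizing, no silent truncation."""
    with open(path_b, newline="") as f:
        b_rows = {r[key]: r for r in csv.DictReader(f)}
    joined = []
    with open(path_a, newline="") as f:
        for r in csv.DictReader(f):
            match = b_rows.get(r[key])
            if match:
                joined.append({**r, **match})
    return joined
```

The point isn't this particular join; it's that each step is inspectable and repeatable, so the human can verify the pipe rather than audit a paraphrase.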