| ▲ | burningion 2 hours ago |
| The main point raised in the article is that these bots may void attorney-client privilege. But the real danger with these IMO is that they're turning casual conversations into a permanent record, and one that will be completely discoverable in court, should the company get into trouble later. |
|
| ▲ | coffeebeqn 2 hours ago | parent | next [-] |
| Plus they are super inaccurate. Gemini gets one of its three bullet points subtly or majorly wrong almost every time.
Just a few weeks ago Gemini said we’re rolling out our payment setup in Russia. You know, the place we have 20+ sanctions packages on? We were talking about France in the meeting. |
| |
| ▲ | operation_moose an hour ago | parent | next [-] | | We've found they're surprisingly good if everyone on the call is using a decent headset. The problems start when using conference room audio or when someone is on their laptop mic. If they miss a word, they never write "unintelligible"; they just start playing Mad Libs based on the rest of the sentence. We just went through a round of 100+ (non-sensitive) VoC interviews and they really cut down the workload of compiling all of the feedback. If the audio was a little shaky, though, we pretty much had to throw away the transcripts and do them from scratch like we used to. | | |
| ▲ | user_7832 an hour ago | parent [-] | | > If they miss a word, they never write "unintelligible"; they just start playing Mad Libs based on the rest of the sentence. Imo this is the single biggest flaw of LLMs. They're great at a lot of things, but not knowing when they're wrong (or when they don't have enough information to work with) is a critical weakness. IMO there's no structural reason why they shouldn't be able to spot this and correct themselves - I suspect it's a training issue. But presumably bots that infer context/fill in the gaps rank better on what people like... at the cost of accuracy. | | |
| ▲ | r_lee 30 minutes ago | parent | next [-] | | I don't think it's a training issue; there's simply no inherent "I don't know" in the transformer architecture. Unless the input is something completely unknown, the nearest neighbor will be chosen, and that will be whatever sounds similar or relevant, even if it causes a problem. | | |
| ▲ | aspenmartin 14 minutes ago | parent [-] | | It's not inherent in the transformer architecture; we do try to ingrain a sense of uncertainty, but it's difficult not only technically but also philosophically/culturally. How confident do you want the model to be in its answer to “why did Rome fall”? There are lots of tools in our toolbelts for better uncertainty calibration, but it trades off against other capabilities and can actually be rather frustrating to interact with in agentic contexts, since the model will constantly need input from you or otherwise be indecisive and overly cautious. It's not technically a limitation of the transformer architecture, but it is more challenging to deal with than in other architectures/statistical paradigms. For example, you can maintain a belief state, generate conditioned on it, and train to ensure the belief state is stable and performant. But evals reward guessing at this point, and it's very hard to evaluate calibration in these open-ended contexts. We're slowly getting there, just not nearly as fast as other capabilities. |
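(A rough Python sketch of the "evals reward guessing" point: under plain accuracy, a model that guesses everything can outscore one that abstains when unsure, while an abstention-aware score reverses that. The item names, predictions, and the -2.0 penalty are made up for illustration.)

    def accuracy_score(answer, gold):
        # Plain accuracy: an abstention (None, i.e. "I don't know") scores the
        # same as a wrong answer, so there is never an incentive to abstain.
        return 1.0 if answer == gold else 0.0

    def abstention_aware_score(answer, gold, wrong_penalty=-2.0):
        # Penalize a confident wrong answer more heavily than saying "I don't know".
        if answer is None:
            return 0.0
        return 1.0 if answer == gold else wrong_penalty

    # Toy comparison: the "guesser" answers everything (and gets one guess right
    # by luck); the "cautious" model abstains on the two items it is unsure about.
    items    = [("q1", "a"), ("q2", "b"), ("q3", "c"), ("q4", "d")]
    guesser  = {"q1": "a", "q2": "b", "q3": "c", "q4": "x"}    # 3 right, 1 wrong
    cautious = {"q1": "a", "q2": "b", "q3": None, "q4": None}  # 2 right, 2 abstain

    for name, preds in [("guesser", guesser), ("cautious", cautious)]:
        acc = sum(accuracy_score(preds[q], g) for q, g in items)
        cal = sum(abstention_aware_score(preds[q], g) for q, g in items)
        print(f"{name}: plain accuracy {acc:+.1f}, abstention-aware {cal:+.1f}")
    # guesser: plain accuracy +3.0, abstention-aware +1.0
    # cautious: plain accuracy +2.0, abstention-aware +2.0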
| |
| ▲ | moffkalast 5 minutes ago | parent | prev [-] | | It's a benchmark and eval issue. Guessing gets them the right result sometimes, so the models rank better on error rate than they otherwise would. We need the kind of benchmarks that penalize being wrong WAY more than saying "I don't know". Of course there's a secondary problem that the model may then overuse the unintelligible option, but that's a matter of training them properly against that eval. You could also try thresholding the output based on perplexity to remove the parts that the model is less sure about, but that's not going to be super accurate I think. |
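(A rough Python sketch of the perplexity-thresholding idea, assuming the transcription model exposes per-token log-probabilities, which many decoders can return. The segments, the -1.5 threshold, and the marker format are made up for illustration.)

    def segment_confidence(token_logprobs):
        # Mean log-probability per token; a segment's per-token perplexity is
        # exp(-segment_confidence), so lower means the model was less sure.
        return sum(token_logprobs) / len(token_logprobs)

    def flag_uncertain(segments, threshold=-1.5):
        # Replace low-confidence segments with an explicit marker instead of
        # passing the model's guess through as fact.
        out = []
        for text, logprobs in segments:
            if segment_confidence(logprobs) < threshold:
                out.append(f"[unintelligible? model guessed: {text!r}]")
            else:
                out.append(text)
        return " ".join(out)

    # Hypothetical decoder output: (text, per-token log-probabilities).
    segments = [
        ("we're rolling out payments in", [-0.1, -0.2, -0.3, -0.2, -0.1]),
        ("Russia", [-2.4, -3.1]),  # mumbled word, low confidence
    ]
    print(flag_uncertain(segments))
    # we're rolling out payments in [unintelligible? model guessed: 'Russia']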
|
| |
| ▲ | pjc50 2 hours ago | parent | prev [-] | | Given how financial services can impose silent inexplicable lifetime bans for using the wrong words in the "what is this transaction for" field, I'm wondering at what point the AI automatically reports people for sanctions violation based on its mishearing. |
|
|
| ▲ | LanceH 23 minutes ago | parent | prev | next [-] |
| > But the real danger with these IMO is that they're turning casual conversations into a permanent record, and one that will be completely discoverable in court, should the company get into trouble later. I would add that there is no guarantee they are correct, either. |
| |
| ▲ | mock-possum 3 minutes ago | parent [-] | | You’d use a computer-generated transcript as a guide, not as proof - the proof is the recording of the person actually saying the thing, not the LLM’s best guess of what it imagined the person said. “At timestamp X, person Y said Z” says the robot, and then you dutifully scrub the audio to timestamp X to verify. |
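(A small Python sketch of that verification workflow, assuming transcript lines look like "[HH:MM:SS] Speaker: text" and the recording is a local audio file. The line format, the 15-second padding, and the file names are assumptions; pydub is used here only as a convenient way to slice the audio for a human to review.)

    import re
    from pydub import AudioSegment  # pip install pydub (needs ffmpeg installed)

    # Assumed transcript line format: "[HH:MM:SS] Speaker: what they said"
    LINE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\]\s*(?P<speaker>[^:]+):\s*(?P<text>.+)")

    def clip_for_review(transcript_line, audio_path, pad_s=15):
        # Cut pad_s seconds either side of the claimed timestamp so a human can
        # listen and confirm the robot's "person Y said Z" claim.
        m = LINE.match(transcript_line)
        if not m:
            raise ValueError("unrecognized transcript line")
        h, mnt, s = (int(m.group(i)) for i in (1, 2, 3))
        t_ms = (h * 3600 + mnt * 60 + s) * 1000
        audio = AudioSegment.from_file(audio_path)
        clip = audio[max(0, t_ms - pad_s * 1000): t_ms + pad_s * 1000]
        out_path = f"review_{h:02d}{mnt:02d}{s:02d}.wav"
        clip.export(out_path, format="wav")
        print(f"Claimed: {m.group('speaker').strip()} said {m.group('text')!r} -> listen to {out_path}")
        return out_path

    # clip_for_review("[00:14:32] Alice: we agreed to ship the feature in Q3", "meeting.wav")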
|
|
| ▲ | stego-tech 38 minutes ago | parent | prev | next [-] |
| This. The fact that LLMs can also amplify existing closed-set research means even smaller shops can now search through a flood of documents to find smoking guns or critical evidence, much faster. I’ve been saying it since the mid-10s, but it’s worth repeating: data isn’t gold, it’s more like oxygen in a room in that the higher the concentration, the more likely it is to poison the inhabitants or explode with an errant spark (lawsuit). Collect only what’s needed to perform the function, and store it only as long as necessary for compliance. Anything else is going to spook counsel. |
| |
|
| ▲ | HardwareLust 30 minutes ago | parent | prev | next [-] |
| IANAL, but it seems to me the inaccuracy of AI transcription should make those notes inadmissible as any sort of evidence. |
|
| ▲ | watwut an hour ago | parent | prev [-] |
| Basically, it will be harder to hide illegal and unethical stuff companies routinely engage in. |
| |
| ▲ | nz 44 minutes ago | parent | next [-] | | No, that would be a strict improvement. The AI note-takers can easily "mishear" or "misreport", producing records of illegal and unethical things that never happened. It also seems to easily mess up numbers (which is a big problem, because a lot of decisions hinge on precise numbers -- imagine inflating an inventory by an order of magnitude, and then imagine having to pay a tariff on something that never existed). I have a friend who works at a large-ish company that imports and manufactures things (in one of the clerical/quantitative professions). A few years back, they had the IT department go on a kind of "inquisition", wherein they forced employees to disable the summarization function that came with MS Teams, and threatened to fire them if they did not. The resistance to this demand was surprising -- most people are clueless about the cost of their own convenience. Worst of all, people would zone out of meetings, because the AI was producing summaries, which they would then never read. The effect of the technology was that it made meetings infinitely more expensive, because the supposed benefit of meetings was nullified by complacency, _and_ it made the meetings a liability (incorrectly summarized meetings that could be used in the discovery process, sure, but that could also be sold by MSFT as a kind of market-research data to competitors in the space). Nothing illegal has to happen in these meetings at all for this tech to cause an infinity of problems for the corporation. Every employee who uses these is effectively an unwitting spy. And if that is the case, then the meetings might as well be recorded and uploaded to YouTube (or whatever people watch these days)[1]. [1]: Maybe this is the future, which I am okay with, but only if the entire planet has to do it and the penalties for not doing it are irrecoverably severe. | |
| ▲ | chvid 43 minutes ago | parent | prev | next [-] | | Show me the man and I will show you the crime. Modernized, at industrial AI scale. | |
| ▲ | SecretDreams 39 minutes ago | parent | prev [-] | | Going to also be harder to hide completely legal, but not ideal stuff. Like randomly complaining about your boss to a colleague or casually discussing a feature you're stuck working on that you think is a bad idea. |
|