Imustaskforhelp 3 days ago
I genuinely don't think the GP is actually making someone listen to the whole recording and check whether the summary is wrong. That's my gut feeling (I may be wrong, though).

Imagine this: if the agent could just spend 3 minutes writing a summary, why would you use AI to create the summary and then have some other person listen to the whole audio recording to check that it's right? Writing it would take the agent 3 minutes after, let's say, a 1 hour long conversation (call?). On the other hand, you'd have someone listen to the whole 1 hour recording and then check the summary. That's now 1 hour compared to 3 minutes.

Nah, I don't think so. Even if we assume that multiple agents are contacted in the same call, they can each simply write a summary of what they did and to whom they redirected, and you just follow that line of summaries.

And after this, I think your take that they are really just screwing themselves over is accurate.

Kinda funny how the GP comment was the first thing I saw in this post, and how even I was kinda convinced that they were one of the smarter ones integrating AI, but your comment made me realize they're actually just screwing themselves.

Imagine the irony: a post about how AI companies are screwing themselves by burning a lot of money while the people using them don't get any value out of it. And then the one on HN that sounded like it finally made sense for them is also not making sense... and they are screwing themselves over. The irony is just ridiculous. So funny it made me giggle.
doorhammer 3 days ago
They might not be, and their use-case might not be one I agree with. I can just imagine a plausible reality where they made a reasonable decision given the incentives and constraints, and I default to that.

I'm basically inferring how this would go down in the context I worked under, not the GP's, because I don't know the details of their real context.

I think I'm seeing where I'm not being as clear as I could, though.

I'm talking about the lifecycle of a methodology for categorizing calls, regardless of whether a human or a machine is doing the categorizing.

If your call center agent is writing summaries and categorizing their own calls, you still typically have a QA department of humans who listen to a random sample of full calls for any given agent on a schedule, to verify that your human classifiers are accurately tagging calls. The QA agents will typically listen at like 4x speed or more, but mostly they're just sampling and validating the sample.

The same goes for _any_ automated process you want to apply at scale. You run it in parallel to your existing methodology and you randomly sample classified calls, verifying that the results were correct, and you _also_ compare the overall results of the new method to the existing one, because you know how accurate the existing method is.

But you don't do that for _every_ call.

You find a new methodology you think is worth trying and you trial it to validate the results. You compare the cost and accuracy of that method against the cost and accuracy of the old one. And you absolutely would often have a real human listen to full calls, just not _all_ of them.

In that respect, LLMs aren't particularly special. They're just a function that takes a call and returns some categories and metadata. You compare that to the output of your existing function.

But it's all part of the same process:

New tech consideration? -> Set up conditions to validate quantitatively -> run trials -> measure -> compare -> decide

Then on a schedule you go back and do another analysis to make sure your methodology is still providing the accuracy you need it to, even if you haven't changed anything.
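For anyone who wants the sample-and-compare step above in concrete terms, here is a rough Python sketch. It is an illustration only, not anyone's actual pipeline: `compare_methods`, `old_labels`, `new_labels`, and `qa_label` are hypothetical names, and `qa_label` stands in for a human QA reviewer listening to a full call.

```python
import random


def compare_methods(calls, old_labels, new_labels, qa_label, sample_size, seed=0):
    """Score an existing and a new call classifier on one random QA sample.

    calls:       list of call ids (hypothetical identifiers)
    old_labels:  dict mapping call id -> category from the existing method
    new_labels:  dict mapping call id -> category from the method on trial
    qa_label:    function mapping call id -> category chosen by a human reviewer
    sample_size: how many calls the reviewers can afford to listen to
    """
    if not calls:
        raise ValueError("need at least one call to sample from")

    # Humans only review a random sample, never every call.
    rng = random.Random(seed)
    sample = rng.sample(calls, min(sample_size, len(calls)))
    truth = {call: qa_label(call) for call in sample}

    # Both methods are scored on the same sample so the numbers are comparable.
    old_hits = sum(old_labels[call] == truth[call] for call in sample)
    new_hits = sum(new_labels[call] == truth[call] for call in sample)

    # Agreement needs no human time, so it can be computed over all calls.
    agreed = sum(old_labels[call] == new_labels[call] for call in calls)

    return {
        "reviewed": len(sample),
        "old_accuracy": old_hits / len(sample),
        "new_accuracy": new_hits / len(sample),
        "agreement": agreed / len(calls),
    }
```

Running the same function again on a schedule, with a fresh seed and fresh calls, would be one way to do the periodic re-check. How large the sample needs to be depends on how much error you can tolerate, which this sketch does not try to answer.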