| ▲ | jawns 3 days ago |
| Full disclosure: I'm currently in a leadership role on an AI engineering team, so it's in my best interest for AI to be perceived as driving value. Here's a relatively straightforward application of AI that is set to save my company millions of dollars annually. We operate large call centers, and agents were previously spending 3-5 minutes after each call writing manual summaries of the calls. We recently switched to using AI to transcribe and write these summaries. Not only are the summaries better than those produced by our human agents, they also free up the human agents to do higher-value work. It's not sexy. It's not going to replace anyone's job. But it's a huge, measurable efficiency gain. |
|
| ▲ | dsr_ 3 days ago | parent | next [-] |
Pro-tip: don't write the summary at all until you need it for evidence. Store the call audio at 24 kbps Opus - that's 180 KB per minute. After a year or whatever, delete the oldest audio. There, I've saved you more millions. |
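To make that concrete, here's a rough sketch of the archival step (illustrative only: the function name and paths are made up, and it assumes ffmpeg with libopus is installed and calls land as WAV files):

```python
import subprocess
from pathlib import Path

# Rough sketch: re-encode a call recording to 24 kbps Opus for cheap long-term storage.
# 24 kbps is ~3 KB/s, i.e. roughly 180 KB per minute of audio.
def archive_call(wav_path: str, archive_dir: str = "call-archive") -> Path:
    out = Path(archive_dir) / (Path(wav_path).stem + ".opus")
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_path,
         "-c:a", "libopus", "-b:a", "24k",
         "-application", "voip",  # tune the encoder for speech rather than music
         str(out)],
        check=True,
    )
    return out
```

The retention part is then just a scheduled job that deletes anything in the archive older than your cutoff.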
| |
| ▲ | doorhammer 3 days ago | parent | next [-] | | Sentiment analysis, nuanced categorization by issue, detecting new issues, tracking trends, etc., are the bread and butter of any data team at an F500 call center. I'm not going to say every project born out of that data makes good business sense (big enough companies have fluff everywhere), but ime anyway, projects grounded in that kind of data are typically some of the most straightforward to concretely tie to a dollar-value outcome. | | |
| ▲ | la_fayette 3 days ago | parent | next [-] | | Yes, those sound like important and useful use cases. However, these have been solved by boring old-school ML models for years... | | |
| ▲ | williamdclt 3 days ago | parent | next [-] | | I think what they're saying is that you need the summaries to do these things | |
| ▲ | esafak 3 days ago | parent | prev | next [-] | | It's easier and simpler to use an LLM service than to maintain those ad hoc models. Many replaced their old NLP pipelines with LLMs. | | |
| ▲ | prashantsengar 3 days ago | parent [-] | | At the place I work, we replaced our old NLP pipelines with LLMs because they are easier to maintain and reach the same level of accuracy with much less work. We are not running a call centre ourselves, but we are a SaaS offering call centre data analysis services. |
| |
| ▲ | aaomidi 3 days ago | parent | prev | next [-] | | Sentiment analysis was not solved and companies were paying analyst firms shit tons of money to do that for them manually. | |
| ▲ | doorhammer 3 days ago | parent | prev [-] | | So, I wouldn't be surprised if someone in charge of a QA/ops department chose LLMs over similarly effective existing ML models in part because the AI hype is hitting so hard right now. Two things _would_ surprise me, though: - That they'd integrate it into any meaningful process without having done actual analysis of the LLM-based perf vs their existing tech - That they'd integrate the LLM into a core process their department is judged on, knowing it was substantially worse, when they could find a less impactful place to sneak it in. I'm not saying those are impossible realities. I've certainly known call center senior management to make more harebrained decisions than that, but barring more insight I personally default to assuming OP isn't among the harebrained. | |
| ▲ | shortrounddev2 3 days ago | parent [-] | | My company gets a bunch of product listings from our clients and we try to group them together (so that if you search for a product name you can see all the retailers who are selling that product). Since there aren't reliable UPCs for the kinds of products we work with, we need to generate embeddings (vectors) for the products by their name/brand/category and do a nearest-neighbor search. This problem has many many many "old school" ML solutions to it, and when I was asked to design this system I came up with a few implementations and proposed them. Instead of doing any of those (we have the infrastructure to do it) we are paying OpenAI for their embeddings APIs. Perhaps OpenAI is just doing old-school ML under the hood, but there is definitely an instinct among product managers to reach for shiny tools from shiny companies instead of considering more conservative options | |
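For illustration, here's roughly the shape of that nearest-neighbor grouping (a sketch only: the `embed` stand-in below is a toy and the threshold is made up; a real embedding model, in-house or hosted, would slot in in its place):

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy stand-in for a real embedding model: hashed character trigrams.
    v = np.zeros(dim)
    t = text.lower()
    for i in range(len(t) - 2):
        v[hash(t[i:i + 3]) % dim] += 1.0
    return v

def group_listings(listings: list[str], threshold: float = 0.9) -> list[list[str]]:
    vecs = np.array([embed(t) for t in listings])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9  # normalize for cosine similarity
    groups: list[list[int]] = []
    for i, v in enumerate(vecs):
        for group in groups:
            # Greedy: join the first existing group whose representative is close enough.
            if float(v @ vecs[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return [[listings[i] for i in g] for g in groups]

# With a real embedding model, the first two listings below should land in one group.
print(group_listings([
    "Acme Widget Pro 3000 (Blue)",
    "ACME widget pro-3000 blue",
    "Generic Gadget XL",
]))
```

At scale the brute-force loop would be swapped for a proper vector index, but the shape of the problem is the same.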
| ▲ | doorhammer 3 days ago | parent [-] | | Yeah, I don't want to downplay the reality of companies making bad decisions. I think for me, the way the GP phrased things just made me want to give them the benefit of the doubt. Given my experience, people I've worked with, and how the GP phrased things, in my mind it's more likely than not that they're not making a naive "chase-the-AI" decision, and that a lot of replies didn't have a whole lot of call center experience. The department I worked with when I did work in call centers was particularly competent and also pretty org savvy. Decisions were always a mix of pragmatism and optics. I don't think it's hard to find people like that in most companies. I also don't think it's hard to find the opposite. But yeah, when I say something would be surprising, I don't mean it's impossible. I mean that the GP sounds informed and competent, and if I assume that, it'd be surprising to me if they sacrificed long-term success for an immediate boost by slotting LLMs into something so core to their success metrics. But, I could be wrong. It's just my hunch, not a quantitative analysis or anything. Feature factory product influence is a real thing, for sure. It's why the _main_ question I ask in interviews is for everyone to describe the relationship between product and eng, so I definitely self-select toward a specific dynamic that probably unduly influences my perspective. I've been places where the balance is hard product, and it sucks working somewhere like that. But yeah, for deciding if more standard ML techniques are worth replacing with LLMs, I'd ultimately need to see actual numbers from someone concretely comparing the two approaches. I just don't have that context. |
|
|
| |
| ▲ | adrr 3 days ago | parent | prev [-] | | Those have been done for 10+ years. We were running sentiment analysis on email support to determine prioritization back in 2013. Also ran Bayesian categorization to offer support reps quick responses/actions. Don't need expensive LLMs for it. | |
| ▲ | doorhammer 3 days ago | parent [-] | | Yeah, I was a QA data analyst supporting three multi-thousand agent call-centers for an F500 in 2012 and we were using phoneme matching for transcript categorization. It was definitely good enough for pretty nuanced analysis. I'm not saying any given department should, by some objective measure, switch to LLMs and I actually default to a certain level of skepticism whenever my department talks about applications. I'm just saying I can imagine plausible realities where an intelligent and competent person would choose to switch toward using LLMs in a call center context. There are also a ton of plausible realities where someone is just riding the hype train gunning for the next promotion. I think it's useful to talk about alternate strategies and how they might compare, but I'm personally just defaulting to assuming the OP made a reasonable decision and didn't want to write a novel to justify it (a trait I don't suffer from, apparently), vs assuming they just have no idea what they're doing. Everyone is free to decide which assumed reality they want to respond to. I just have a different default. |
|
| |
| ▲ | andix 3 days ago | parent | prev | next [-] | | Imagine a follow-up call from a customer. They are referring to earlier calls, and the call center agent needs to check what those were about, so they can skim/read the transcripts while talking to the customer. I guess it's really hard to listen to the original recordings while you're on the phone. | |
| ▲ | ethagknight 3 days ago | parent | next [-] | | I'm imagining my actual experience of being transferred for the 3rd or 4th time, repeating my name and address for the 3rd or 4th time, restating my problem for the 3rd or 4th time... feels like there's an implementation problem, not a technological problem. Quick and accurate routing and triage of inbound calls may be more fruitful and far easier than summarizing hundreds of hours of "ok now plug the router back into the wall." I'm imagining AI identifying a specific technical problem that sounds a lot like a problem that a specific technician successfully solved previously. | |
| ▲ | 0x457 3 days ago | parent [-] | | Also the hold music being interrupted every minute to tell me: 1) my call is very important to them (it's not) 2) listen carefully because the options have changed (when? 5 years ago?) 3) they have a website where I can do things (I can't, otherwise why would I call?) 4) please stay on at the end of the call to give them feedback (sure, I will waste more of my time) |
| |
| ▲ | dsr_ 3 days ago | parent | prev | next [-] | | That would be awesome! But in fact, customer call centers tend not to be able to even know that you called in yesterday, three days ago and last week. This is why email-ticketing call centers are vastly superior. | | |
| ▲ | Jolter 3 days ago | parent | next [-] | | Perhaps doing this suggested auto-summarizing would be what finally solves that problem? | | |
| ▲ | josefx 3 days ago | parent [-] | | Is doing that going to be cheaper than not doing it? | | |
| ▲ | bmicraft 3 days ago | parent [-] | | Maybe, if it means people spend less time on calls (because their problem got solved sooner?) |
|
| |
| ▲ | dvfjsdhgfv 3 days ago | parent | prev | next [-] | | I'm in love with email-based support as I am on both sides of the chain. When I raise a problem, the engineers on the other side can work at their pace, escalate when needed, and I almost always get a reasonably good reply. I can dig deeper if I wish, and I'm pretty sure the guys on the other end are doing their best. It works the same way when I'm helping someone else: most reasonable people don't expect that if they make an audio call I will magically solve their problem faster. Maybe it will be slower and they will get a lower-quality ad-hoc solution. | |
| ▲ | 3 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | tomwheeler 3 days ago | parent | prev | next [-] | | > But in fact, customer call centers tend not to be able to even know that you called in yesterday, three days ago and last week. Nor what you told the person you talked to three minutes earlier, during the same call, before they transferred you to someone else. Because their performance is measured on how quickly they can get rid of you. | |
| ▲ | ssharp 3 days ago | parent | prev | next [-] | | I've always guessed that they are able to tell when you called/what you called about, but they simply don't give that level of information to their frontline folks. | | |
| ▲ | Imustaskforhelp 3 days ago | parent [-] | | It might be because it's in their interest to do so. It is our problem that needs fixing, so we can just wait until either they redirect us to the right person with the right knowledge, who might be one of the higher-ups in the call center,
or we just quit the call. Either way, it doesn't matter to the company. Plus points that they don't have to teach the frontline customer service more details, and it could be easier for them to onboard new people / fire old employees.
Also they would have to pay less if they require very low qualifications. Man, I remember the 0.001 cents = 0.001 dollars video/meme about Verizon: https://www.youtube.com/watch?v=nUpZg-Ua5ao |
| |
| ▲ | fifilura 3 days ago | parent | prev [-] | | I am sorry about your bad experience. Maybe the ones you called did not have AI transcribed summaries and were not managed by GP? |
| |
| ▲ | variadix 3 days ago | parent | prev [-] | | Still makes more sense to do the transcription and analysis lazily rather than ahead of time (assuming you can do it relatively quickly). If that person never calls in again the transcription was a waste of money. |
| |
| ▲ | alooPotato 3 days ago | parent | prev | next [-] | | you want to be able to search over summaries so you need to generate them right away | | |
| ▲ | deadbabe 3 days ago | parent | next [-] | | Do you want to search summaries, or do you want to save millions of dollars per year? | | |
| ▲ | tene80i 3 days ago | parent | next [-] | | Product teams analyse call summaries at scale to guide the roadmap to reduce future calls. It’s not just about case management. | |
| ▲ | morkalork 3 days ago | parent | prev [-] | | I can assure you that people care very much about searching and mining calls, especially for compliance and QA reasons. | | |
| ▲ | deadbabe 3 days ago | parent [-] | | What’s the ROI? | | |
| ▲ | Windchaser 8 hours ago | parent | next [-] | | What's the ROI on quickly identifying and fixing problems with your product? | |
| ▲ | morkalork 3 days ago | parent | prev [-] | | Transcription cost is a race to the bottom because there's so many vendors competing, same with embeddings. It's positive. Gets better every year. | | |
|
|
| |
| ▲ | krainboltgreene 3 days ago | parent | prev [-] | | Pro-tip: You won't ever do that. | | |
| ▲ | ch4s3 3 days ago | parent | next [-] | | I would imagine OP is probably mining service call summaries to find common service issues, or at least that's what I would do. | | |
| ▲ | krainboltgreene 3 days ago | parent [-] | | That's what everyone says they'll do and then it never gets touched again. | | |
| ▲ | ch4s3 3 days ago | parent [-] | | I guess you just know better than everyone, including the people who do look at user interactions. I know I've done it, so I must be no one. | |
| ▲ | morkalork 3 days ago | parent [-] | | I guess I'm no one too because I've done plenty of call analyses too. | | |
|
|
| |
| ▲ | ninininino 3 days ago | parent | prev | next [-] | | Advanced organizations (think not startups, but companies that have had years or decades of profit in the public market) might have solved all the low-hanging fruit problems and have staff doing things like automated quality audits (search summaries for swearing, abusive language, etc). | |
| ▲ | morkalork 3 days ago | parent | next [-] | | And you could save a bunch of money by replacing the staff that do that with LLMs! | |
| ▲ | krainboltgreene 3 days ago | parent | prev [-] | | I've worked at both. It is extremely rare that anyone ever does it. |
| |
| ▲ | alooPotato 3 days ago | parent | prev [-] | | we do |
|
| |
| ▲ | anoojb 3 days ago | parent | prev | next [-] | | Also entrenches plausible deniability and makes legal contests way more cumbersome for plaintiffs to resolve. | | | |
| ▲ | sillyfluke 3 days ago | parent | prev | next [-] | | You will also have saved them the cost of all the AI summaries that are incorrect. The parent states: >Not only are the summaries better than those produced by our human agents... Now, since they have not mentioned what it took to actually verify that the AI summaries were in fact better than the human agents, I'm sceptical they did the necessary due diligence. Why do I think this? Because I have actually tried to do such a verification. In order to verify that the AI summary is actually correct you have to engage in the incredibly tedious task of listening to the original recording literally second by second and making sure that what is said does not conflict with the AI summary in question. Not only did the AI summary fail at this test, it failed in the first recording I tested. The AI summary stated that "Feature X was going to be in Release 3, not 4" whereas in the recording it is stated that the feature will be in Release 4, not 3, literally the opposite of what the AI said. I'm sorry, but the fact that the AI summary is nicely formatted and has not missed a major topic of conversation means fuck all if the details that are discussed are spectacularly wrong from a decision-tracking perspective, as in literally the opposite of what is stated. And I know "why" the AI summary fucked up: in that instance the topic of conversation was about how there was some confusion about which release that feature was going to be in; that's why the issue was a major item of the meeting agenda in the first place. Predictably, the AI failed to follow the convoluted discussion and "came to" the opposite conclusion. In short, no fucking thanks. | |
| ▲ | doorhammer 3 days ago | parent | next [-] | | Again, not the OP, so I can't speak to exactly their use-case, but the vast majority of call center calls fall into really clear buckets. To give you an idea: Phonetic transcription was the "state of the art" when I was a QA analyst. It broke call transcripts apart into a stream of phonemes and when you did a search, it would similarly convert your search into a string of phonemes, then look for a match. As you can imagine, this is pretty error-prone and you have to get a little clever with it, but realistically, it was more than good enough for the scale we operated at. If it were an ecom site you'd already know the categories of calls you're interested in because you've been doing that tracking manually for years. Maybe something like "late delivery", "broken item", "unexpected out of stock", "missing pieces", etc. Basically, you'd have a lot of known context to anchor the LLM's analysis, which would (probably) cover the vast majority of your calls, leaving you freed up to interact with outliers more directly. At work as a software dev, having an LLM summarize a meeting incorrectly can be really really bad, so I appreciate the point you're making, but at a call center for an F500 company you're looking for trends and you're aware of your false positive/negative rates. Realistically, those can be relatively high and still provide a lot of value. Also, if it's a really large company, they almost certainly had someone validate the calls, second-by-second, against the summaries (I know because that was my job for a period of time). That's a minimum bar for _any_ call analysis software so you can justify the spend. Sure, it's possible that was hand-waved, but as the person responsible for the outcome of the new summarization technique with LLMs, you'd be really screwing yourself to handwave a product that made you measurably less effective. There are better ways to integrate the AI hype train into a QA department than replacing the foundation of your analysis, if that's all you're trying to do. | |
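For a rough feel of how that phoneme-style matching works, here's a toy sketch (the `to_phonemes` stand-in below is not a real grapheme-to-phoneme model, just an illustration of matching a query against a transcript in a fuzzier space):

```python
from difflib import SequenceMatcher

def to_phonemes(text: str) -> str:
    # Toy stand-in for a real grapheme-to-phoneme step: keep consonant letters only.
    return "".join(c for c in text.lower() if c.isalpha() and c not in "aeiou")

def phonetic_search(transcript: str, query: str, threshold: float = 0.8) -> bool:
    """Fuzzily look for the query's phoneme string inside the transcript's."""
    t, q = to_phonemes(transcript), to_phonemes(query)
    if not q or len(t) < len(q):
        return False
    # Slide a query-sized window over the transcript and fuzzy-match each slice.
    return any(
        SequenceMatcher(None, t[i:i + len(q)], q).ratio() >= threshold
        for i in range(len(t) - len(q) + 1)
    )

print(phonetic_search("I'd like a refund for my broken blender", "refund"))  # True
print(phonetic_search("my package never arrived", "refund"))                 # False
```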
| ▲ | sillyfluke 3 days ago | parent | next [-] | | Thanks for the detailed domain-specific explanation. If we assume that some whale clients of the company will end up in the call center, is it not more probable that more competent human agents would be responsible for the call, whereas it's pretty much the same AI agent addressing the whale client as the regular customers in the alternative scenario? | |
| ▲ | doorhammer 3 days ago | parent [-] | | Yeah, if I were running a QA department I wouldn't let LLMs anywhere near actual customers as far as trying to resolve a customer issue directly. And, this is just a guess, but it's not uncommon that whale customers like that have their own dedicated account person, and I'd personally stick with that model. The use-case where I'm like "huh, yeah, I could see that working well" is mostly around doing sentiment analysis and call tagging--maybe actual summaries that humans might read if I had a really well-designed context for the LLM to work within. Basically anything where you can have an acceptable false positive/negative rate. |
| |
| ▲ | Imustaskforhelp 3 days ago | parent | prev [-] | | I genuinely don't think the GP is actually making someone listen to the recording and check the summary against it. I have a gut feeling that's the case (I may be wrong though). Like, imagine this: if the agent could just spend 3 minutes writing a summary, why would you use AI to create a summary and then have some other person listen to the whole audio recording and check if the summary is right? It would take an agent 3 minutes out of, let's say, a 1-hour-long call; on the other hand you'd have someone listen to the whole 1-hour recording and then check the summary.
That's now 1 hour compared to 3 minutes.
Nah, I don't think so. Even if we assume that multiple agents are contacted in the same call, they can all simply write a summary of what they did and to whom they redirected the caller, and you just follow that line of summaries. And given this, I think your point that they'd really be screwing themselves is accurate. Kinda funny how the GP comment was the first thing I saw in this post, and even I was kinda convinced they were one of the smarter ones integrating AI, but your comment made me realize they're actually just screwing themselves. Imagine the irony: a post about how AI companies are screwing themselves by burning a lot of money while the people using them don't get any value out of it, and then the one comment on HN that sounded like it finally made sense also doesn't make sense... and they are screwing themselves over. The irony is just ridiculous. So funny it made me giggle. | |
| ▲ | doorhammer 3 days ago | parent [-] | | They might not be, and their use-case might not be one I agree with. I can just imagine a plausible reality where they made a reasonable decision given the incentives and constraints, and I default to that. I'm basically inferring how this would go down in the context I worked under, not the GP's, because I don't know the details of their real context. I think I'm seeing where I'm not being as clear as I could, though. I'm talking about the lifecycle of a methodology for categorizing calls, regardless of whether it's a human categorizing them or a machine. If your call center agent is writing summaries and categorizing their own calls, you still typically have a QA department of humans that listen to a random sample of full calls for any given agent on a schedule to verify that your human classifiers are accurately tagging calls. The QA agents will typically listen to them at like 4x speed or more, but mostly they're just sampling and validating the sample. The same goes for _any_ automated process you want to apply at scale. You run it in parallel to your existing methodology and you randomly sample classified calls, verifying that the results were correct, and you _also_ compare the overall results of the new method to the existing one, because you know how accurate the existing method is. But you don't do that for _every_ call. You find a new methodology you think is worth trying and you trial it to validate the results. You compare the cost and accuracy of that method against the cost and accuracy of the old one. And you absolutely would often have a real human listen to full calls, just not _all_ of them. In that respect, LLMs aren't particularly special. They're just a function that takes a call and returns some categories and metadata. You compare that to the output of your existing function. But it's all part of the: New tech consideration? -> Set up conditions to validate quantitatively -> run trials -> measure -> compare -> decide Then on a schedule you go back and do another analysis to make sure your methodology is still providing the accuracy you need it to, even if you haven't changed anything. | |
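A rough sketch of that parallel-validation loop, with every function name here being a hypothetical placeholder rather than anything from the comment above:

```python
import random

# Compare a candidate classifier (e.g. an LLM tagger) against the incumbent method
# on a random sample of calls whose labels a QA analyst has verified by hand.
def sample_and_compare(calls, incumbent, candidate, human_label, sample_size=200):
    sample = random.sample(calls, min(sample_size, len(calls)))
    incumbent_hits = candidate_hits = 0
    for call in sample:
        truth = human_label(call)              # human-verified ground truth for this call
        incumbent_hits += incumbent(call) == truth
        candidate_hits += candidate(call) == truth
    n = len(sample)
    return incumbent_hits / n, candidate_hits / n   # accuracy of old vs new method
```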
| ▲ | Imustaskforhelp 3 days ago | parent [-] | | Man, firstly I wanted to say that I loved your comment that I responded to, and this one too.
I actually feel happy reading it; maybe it's hard to explain, but maybe it's because I learned something new. So firstly, I thought you meant that they had to listen to every call, so yeah, a misunderstanding, since I admittedly don't know much about it, but it's still great to hear from an expert. I also don't know the GP's context, but I felt this way because, as I said in some other comments, people are just slapping AI stickers on things and markets are rewarding it even though they are mostly being reckless in how they use AI (which the post basically says), and I thought of them as the same. Though I still have my doubts; only more context from their side can tell. Secondly, I really appreciate the paragraph you wrote about testing different strategies and how in-depth you went, man. It really feels like one of those comments that will be useful to me one day or another.
Seriously, thanks! | |
| ▲ | doorhammer 3 days ago | parent [-] | | Hey, thanks for saying that. I have huge gaps in time commenting on HN stuff because tbh, it's just social anxiety I don't need to sign up for :| so I really value someone taking the time to express appreciation if they got something out of my novels. I don't ever want to come across like I think I know what's up better than someone else. I just want to share my perspective given my experience and, if I'm wrong, hope someone will be kind when they point it out. Tbh it's been a while since I've worked directly in call centers (I've done some consulting-type stuff here and there since then, but not much) so I'm mostly just extrapolating based on new tech and people I still know in that industry. Fwiw, the way I try to approach interpreting something like the GP's post is to try to predict the possible realities and decide which ones I think are most plausible. After that I usually contribute the less represented perspective--but only if I think it's plausible. I think the reality you were describing is totally plausible. My gut feeling is that it's probably not what's happening, but I wouldn't bet any money on that. If someone said "Pick a side. I'll give you $20k if you're right and take $20k if you're wrong" I'm just flat out not participating, lol. If I _had_ to participate I'd reluctantly take the benefit-of-the-doubt side, but I wouldn't love having to commit to something I'm not at all confident about. As it stands it's just a fun vehicle to talk about call center dynamics. Weirdly, I think they're super interesting. |
|
|
|
| |
| ▲ | roywiggins 3 days ago | parent | prev | next [-] | | In the context of call centers in particular I actually can believe that a moderately inaccurate AI model could be better on average than harried humans writing a summary after the call. Could a human do better carefully working off a recording, absolutely, but that's not what needs to be compared against. It just has to be as good as a call center worker with 3-5 minutes working off their own memory of the call, not as good as the ground truth of the call. It's probably going to make weirder mistakes when it makes them though. | | |
| ▲ | sillyfluke 3 days ago | parent | next [-] | | >in the context of call centers in particular I actually can believe that a moderately inaccurate AI model could be better on average than harried humans You're free to believe that of course, but you're assuming the point that has to be proven. Not all fuck ups are equal. Missing information is one thing, but writing literally the opposite of what is said is way higher on the fuck up list. A human agent would be achieving an impressive level of incompetence if they kept on repeating such a mistake, and would definitely have been jettisoned from the task after at most three strikes (assuming someone notices). But firing a specific AI agent that repeats such mistakes is out of the question for some reason. Feel free to expand on why no amount of mistakes in AI summaries will outweigh the benefits in call centers. |
| ▲ | trenchpilgrim 3 days ago | parent | prev [-] | | Especially humans whose jobs are performance-graded on how quickly they can start talking to the next customer. | | |
| ▲ | Imustaskforhelp 3 days ago | parent [-] | | Yeah, maybe that's fair in the current world we live in. But the solution isn't to use AI just because you can't trust the agents / customer service reps whose performance is graded on how quickly they can start talking to the next customer. The solution is to change the economics so that workers are incentivized to write good summaries; maybe paying them more and not grading them that way will help. I imagine some companies say AI is good enough because they themselves are using the wrong grading technique, and AI is the best option under it. So in that sense, AI just benchmark-maxxed, if that makes sense. Man, I am not even kidding, but I sometimes wonder how economies of scale can work so differently from common sense. Like, it doesn't make sense at this point. |
|
| |
| ▲ | shafyy 3 days ago | parent | prev [-] | | Well, in my own experience, the LLMs that summarize video meetings at work are not at all 100% accurate. The issue is if you have not participated in the call, you can't say which part is accurate and which is not. Therefore, they are utterly useless to me. |
| |
| ▲ | paulddraper 3 days ago | parent | prev | next [-] | | This works unless you want to automate something with the transcripts, stats, feedback. | | |
| ▲ | Spivak 3 days ago | parent [-] | | Why wouldn't it? Once you actually have that project, you have the raw audio to generate the transcripts. Only spend the money at the last second when you know you need it. Edit: Tell me more how preemptively spending five figures to transcribe and summarize calls in case you might want to do some "data engineering" on it later is a sound business decision. What if the model is cheaper down the road? YAGNI. | |
| ▲ | thfuran 3 days ago | parent | next [-] | | A company that could save millions by not having staff write up their own call notes almost surely is already doing that. | | |
| ▲ | Spivak 3 days ago | parent [-] | | And yet the topic of conversation is a company that did just that. The AI is just the smoke and mirrors that pushed the business to do it. Staff aren't writing their own call notes anymore. The LLM summary, almost by definition, isn't adding any additional signal to the call audio. If your data engineering pipeline works by processing LLM generated notes then it must work equally well processing the call transcript—they're the exact same data. AI finally got the business to admit that nothing of value was added by call notes and have dropped that work completely. The final step is just dropping the useless use of LLM. Just the audio transcript is way cheaper and can use existing technology. | | |
| ▲ | thfuran 3 days ago | parent [-] | | I think you’ve misunderstood something somewhere in the conversation. Text notes and transcripts are useful. They are widely used and integrated into existing processes at probably every large company that’s producing them. You appear to be suggesting that they should just stop doing that and switch to processing audio instead because that’s somehow more pre-existing than their existing processes and they probably don’t need the text for all the things they’re already using it for? |
|
| |
| ▲ | kenjackson 3 days ago | parent | prev | next [-] | | This is the bread and butter of call centers and the companies that use them. The transcripts and summaries are used for everything from product improvement to agent assessment. This data is used continuously. It's not like they use this transcript for the one rare time someone sues because they claim an agent lied. That rarely happens. |
| ▲ | paulddraper 3 days ago | parent | prev [-] | | The "last second" is right after the call. For example, if 60% of your calls this month mention assembly issues with the product, that information will help you improve it. This is practical, not theoretical. |
|
| |
| ▲ | 3 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | lotsofpulp 3 days ago | parent | prev | next [-] | | The summaries can help automate performance evaluation. If the employee disputes it, I imagine they pull up the audio to confirm. | | |
| ▲ | Imustaskforhelp 3 days ago | parent [-] | | The amount of false positives coming from wrong AI summaries, plus having to pull up the audio to confirm, is so much more hassle than not using AI and evaluating on some different metric in the first place. Seriously, not kidding, but the more I read these comments, the more horrified I become. The only reason I can think of for integrating AI is because you wish to integrate AI. Nothing wrong with that, but unless proven otherwise through some benchmarks there is no way to justify AI. So it's like an experiment: they use AI, and if it works / saves time, great.
If not, then it's time to roll it back. But we do need to think about experiments logically, and the way I'm approaching it, it's maybe good considering what customer service is now, but man, that's such a low standard that as customers we shouldn't really stand for it. Call centres need to improve, period. AI can't fix that. It's like, man, we'll do anything to save some $ for the shareholders.
Only to then "invest" it proudly into AI so that they can say they have integrated AI, and so they can have their valuations increased, since VCs / the stock market react differently to the sticker known as AI. Man... saying that you use AI should be a negative indicator in the market instead of a positive one, and the whole bubble is gonna come crashing down when people realize it. It physically hurts me thinking about it again. This loop of treating humans badly for money, using that money for an inferior product, using that inferior product only because you want the AI sticker, because shareholders want the valuation increase, and the company is willing to do all this because they feel / are rewarded for it by people who will buy anything AI-related thinking it's gold, or that more people will buy it from them at an even higher valuation because of the AI sticker, and so on... Almost sounds like a pyramid. |
| |
| ▲ | FirmwareBurner 3 days ago | parent | prev | next [-] | | >Store the call audio at 24 kbps Opus - that's 180 KB per minute Why Opus though? There are dedicated audio codecs in the VoIP/telecom industry that are specifically designed for the best size/quality for voice call encoding. | |
| ▲ | andrepd 3 days ago | parent | next [-] | | Opus pretty much blows all those codecs out of the water, in every conceivable metric. It's actually pretty impressive that a media codec is able to universally exceed (or match) every previous one in every axis. Still, it's based on ideas from those earlier codecs of course :) | |
| ▲ | pipo234 3 days ago | parent | prev [-] | | Opus is one of those codecs.
Older codecs like G.711 have better latency and a steady bitrate, but they compress terribly (essentially just bandwidth and amplitude remapping). Opus is great for a lot of things, and real-time speech over SIP or WebRTC is just one. | |
| |
| ▲ | smohare 3 days ago | parent | prev [-] | | [dead] |
|
|
| ▲ | jordanb 3 days ago | parent | prev | next [-] |
| We use Google Meet and it has Gemini transcriptions of our meetings. They are hilariously inaccurate. They confuse who said what. They often invert the meaning: "Joe said we should go with approach X" where Joe actually said we should not do X. It also lacks context, causing it to "mishear" all of our internal jargon to "shit my iPhone said" levels. |
| |
| ▲ | rowanseymour 3 days ago | parent | next [-] | | Same here. It's frustrating that it doesn't seem to have contextual awareness of who we are and the things we work on so things like names of our products, names of big clients, that we use repeatedly in meetings, are often butchered. | |
| ▲ | sigmoid10 3 days ago | parent | prev | next [-] | | That's the difference between having real AI guys and your average LinkedIn "AI guys." The other post is a perfect example of a case where you could take a large but still manageable, cutting-edge transcription model like Whisper and fine-tune it using existing handmade transcriptions as ground truth. A match made in heaven for AI engineers. Of course this is going to work way, way better for specific corporate settings than slapping a random closed-source general-purpose model like Gemini on your task and hoping for the best, just because it achieves X% on random benchmark Y. |
| ▲ | ricardonunez 3 days ago | parent | prev | next [-] | | I don’t know how it can confuse speakers, because per-mic input is relatively straightforward to get. I use Fathom and others and they are accurate, better than manually taken notes. Interesting side effect: I don’t memorize 100% of the calls anymore since I rely on note takers; I only remember the major points, but when I read the notes, everything becomes clear. |
| ▲ | orphea 3 days ago | parent | prev | next [-] | | Oh, that's what's happening. I thought my English was just terrible :( |
| ▲ | thisisit 3 days ago | parent | prev | next [-] | | I found that if you have people with accents and they emphasize certain words then it becomes very difficult to read. One example I find is that "th" often comes out as "d" because of how people pronounce it. Apart from that it is hit or miss. |
| ▲ | nostrademons 3 days ago | parent | prev [-] | | I also use Gemini notes for all my meetings and find them quite helpful. The key insight is: they don’t have to be particularly accurate. Their primary purpose is to remind me (or the other participants) of what was discussed, what considerations were brought up, and what the eventual decision was. If it inverts the conclusion and forgets a “not”, we’re going to catch that, because we were all in the meeting too. It’s there to jog our memory of what was said, because it’s much easier to recognize correct information than recall it; it’s not the authoritative source of truth on the meeting. This gets to a common misconception when it comes to GenAI uses: it functions best as “augmented intelligence” rather than “artificial intelligence”. Meaning that it’s at its best when there’s still a human in the loop and the AI supplements the parts the person is bad at rather than replacing the person entirely. We see this with coding, where AI is very good at writing scaffolding, large-scale refactoring, picking decent libraries, reading API docs and generating code that calls them appropriately, etc., but still needs a human to give it very specific directions for anything subtle, and someone to review carefully for bugs and security holes. |
|
|
| ▲ | vasco 3 days ago | parent | prev | next [-] |
| I wonder if the human agents agree the AI summaries are better than their summaries. I was nodding as I read and then told myself "yeah but it wouldn't be able to summarize the meetings I have", so I wonder if this only works in 3rd person. |
| |
| ▲ | mbStavola 3 days ago | parent | next [-] | | Part of me also wonders if people may agree that it's better simply because they don't actually have to do the summarization anymore. Even if it is worse by some %, that is an annoying task you are no longer responsible for; if anything goes wrong down the line, "ah the AI must've screwed up" is your way out. | |
| ▲ | roflc0ptic 3 days ago | parent | next [-] | | I’m inclined to believe that call center employees don’t have a lot of incentive to do a good job/care, so a lossy AI could quite plausibly be higher quality than a human | | |
| ▲ | latexr 3 days ago | parent | next [-] | | For many years now, every time I have to talk with someone on a call centre there has been a survey at the end with at least two questions: 1. Would you recommend us? 2. Was the agent helpful? I have a friend who used to work at a call centre and would routinely get the lowest marks on the first item and the highest on the second. I do that when the company has been shitty but I understand the person on the line really made an effort to help. Obviously, those ratings go back to the supervisor and matter for your performance reviews, which can make all the difference between getting a raise or being fired. If anything, call centre employees have a lot of incentive to do a good job if they have any intention of keeping it, because everything they do with a customer is recorded and scrutinised. | | |
| ▲ | roflc0ptic 3 days ago | parent [-] | | Fair point, though I think “did I accurately summarize a conversation” is much harder to check/get away with vs “did I piss off the person on the other end” |
| |
| ▲ | freehorse 3 days ago | parent | prev [-] | | Also it should be easy to correct some obvious mistakes in less convoluted discussions. And a support call is probably less complex than, e.g., a group meeting in many respects, and probably has a larger margin of acceptable error. |
| |
| ▲ | evereverever 3 days ago | parent | prev [-] | | That re-synthesis of information is incredibly valuable for storing it in your own memory. Of course, we can rely on knowing nothing and just look things up, but I want more for thinking people. |
| |
| ▲ | jcims 3 days ago | parent | prev | next [-] | | I built a little solution to record and transcribe all of my own meetings. I have many meetings (30+ hours a week) and I can't keep pace with adequate note-taking while participating in them all. I'm finding the summarization of individual meetings very useful, and I'm also finding the ability to send in transcripts across meetings, departments, initiatives, whatever, to be very effective at surfacing subtexts and common pain points, much more so than I can. I'm also using it to look at my own participation in meetings to help me see how I interact with others a (little) bit more objectively, and it has helped me find ways to improve. (I don't take its advice directly lol, just think about observations and determine myself if it's something that's important and worth thinking about) | |
| ▲ | mjcohen 3 days ago | parent [-] | | Make sure that is legal where you are and, if needed, you have their permission. |
| |
| ▲ | dymk 3 days ago | parent | prev | next [-] | | Have you tried having it summarize the meetings you have? | |
| ▲ | kenjackson 3 days ago | parent | prev | next [-] | | AI definitely summarizes meetings better than me and _almost_ anyone else I've seen do it (there is one exception -- one guy was a meeting note taker god. He was so good that he set up a mailing list because so many people wanted to read his meeting notes.) I could probably do better than AI if I really tried, but I've only ever done that a few times. | |
| ▲ | 3 days ago | parent | prev [-] | | [deleted] |
|
|
| ▲ | pedrocr 3 days ago | parent | prev | next [-] |
| > agents were previously spending 3-5 minutes after each call writing manual summaries of the calls Why were they doing this at all? It may not be what is happening in this specific case but a lot of the AI business cases I've seen are good automations of useless things. Which makes sense because if you're automating a report that no one reads the quality of the output is not a problem and it doesn't matter if the AI gets things wrong. In operations optimization there's a saying to not go about automating waste, cut it out instead. A lot of AI I suspect is being used to paper over wasteful organization of labor. Which is fine if it turns out we just aren't able to do those optimizations anyway. |
| |
| ▲ | nulbyte 3 days ago | parent [-] | | As a customer of many companies who has also worked in call centers, I can't tell you how frustrating it is when I, as a customer, have to call back and the person I speak with has no record or an insufficient record of my last call. This has required me to repeat myself, resend emails, and wait all over again. It was equally frustrating when I, as a call center worker, had to ask the customer to tell me what should already have been noted. This has required me to apologize and to do someone else's work in addition to my own. Summarizing calls is not a waste, it's just good business. |
|
|
| ▲ | doubled112 3 days ago | parent | prev | next [-] |
| At work we've tried AI summaries for meetings, but we spent so much time fixing those summaries that we started writing our own again. Is there some training you applied or something specific to your use case that makes it work for you? |
| |
| ▲ | nsxwolf 3 days ago | parent | next [-] | | We stopped after it kept transcribing a particular phrase of domain jargon as “child p*rn”, again and again. | |
| ▲ | cube00 3 days ago | parent | prev | next [-] | | Unless a case goes down the legal road, nobody is ever bothering to read old call summaries in a giant call center. When was the last time you called a large company and the person answering was already across all the past history without you giving them a case number first? | | |
| ▲ | doubled112 3 days ago | parent [-] | | Does an AI summary hold up in court? Or would you still need to review a transcript or recording anyway? | | |
| ▲ | cube00 3 days ago | parent [-] | | You can store low quality audio cheaply on cold storage so I suspect that's the real legal record if it got that far. |
|
| |
| ▲ | shawabawa3 3 days ago | parent | prev | next [-] | | My guess is that the summaries are never actually read, so accuracy doesn't actually matter and the AI could equally be replaced with /dev/null | |
| ▲ | mrweasel 3 days ago | parent | prev [-] | | We tried Otter.ai, someone complained and asked: "Could you f-ing not? I don't trust them" and now Otter is accused of training their models on recorded meetings without permission. Yeah, I don't even care if it works, I don't trust any of these companies. |
|
|
| ▲ | recallingmemory 3 days ago | parent | prev | next [-] |
| So long as precision isn't important, I suppose. Hallucination within summaries is the issue I keep running into which prevents me from incorporating it into any of our systems. |
| |
| ▲ | thrown-0825 3 days ago | parent [-] | | I have seen AI summaries of calls get people into trouble because the AI hallucinated prices and features that didn't exist |
|
|
| ▲ | Shank 3 days ago | parent | prev | next [-] |
| Who reads the summaries? Are they even useful to begin with? Or did this just save everyone 3-5 minutes of meaningless work? |
| |
| ▲ | doorhammer 3 days ago | parent | next [-] | | Not the OP, but I did work supporting three massive call centers for an F500 ecom. It's 100% plausible it's busy work, but it could also be for:
- Categorizing calls into broad buckets to see which issues are trending
- Sentiment analysis
- Identifying surges of some novel/unique issue
- Categorizing calls across vendors and doing sentiment analysis that way (looking for upticks in problem calls related to specific TSPs or whatever)
- etc. False positives and negatives aren't really a problem once you hit a certain scale because you're just looking for trends. If you find one, you go spot-check it and do a deeper dive to get better accuracy. Which is also how you end up with some schlep like me listening to a few hundred calls in a day at 8x speed (back when I was a QA data analyst) to verify the bucketing. And when I was doing it everything was based on phonetic indexing, which I can't imagine touches LLMs in terms of accuracy, and it still provided a ton of business value at scale. | |
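A toy sketch of that kind of trend-spotting over bucketed calls (the category names and the lift threshold below are made up for illustration, not taken from the comment above):

```python
from collections import Counter

# Compare this period's category mix against a baseline and flag buckets whose share of
# call volume jumped (or that never appeared before, i.e. potential novel issues).
def flag_trending(baseline_calls, recent_calls, min_lift=1.5):
    base, recent = Counter(baseline_calls), Counter(recent_calls)
    base_total, recent_total = sum(base.values()) or 1, sum(recent.values()) or 1
    flagged = []
    for category, count in recent.items():
        base_share = base.get(category, 0) / base_total
        recent_share = count / recent_total
        if base_share == 0 or recent_share / base_share >= min_lift:
            flagged.append((category, round(base_share, 3), round(recent_share, 3)))
    return flagged

print(flag_trending(
    ["late delivery"] * 50 + ["broken item"] * 10,
    ["late delivery"] * 40 + ["broken item"] * 30,
))  # flags "broken item", whose share jumped from ~17% to ~43%
```

Anything flagged is what you would then spot-check by actually listening to a sample of those calls.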
| ▲ | vosper 3 days ago | parent | prev [-] | | AI reads them and identifies trends and patterns, or answers questions from PMs or others? | | |
| ▲ | cube00 3 days ago | parent [-] | | AI writes inaccurate summaries and then consumes its own slop so it can hallucinate the answer to the PM's questions after misreading said slop. Much like dubbing a video tape multiple times, it's going to get worse as you add more layers of text predictors. |
|
|
|
| ▲ | glimshe 3 days ago | parent | prev | next [-] |
| This makes sense. AI is obviously useful for many things. But people wouldn't invest tens of billions to summarize call center calls and similar tasks. Replacing call center workers isn't where the money is - it's replacing 100K-200K/year workers. |
|
| ▲ | generic92034 3 days ago | parent | prev | next [-] |
| > It's not going to replace anyone's job. Is it not, in the scenario you are describing? You are saying the agents are free now to do higher-value work. Why were there not enough agents before, especially if higher-value work was not done? |
| |
| ▲ | cube00 3 days ago | parent | next [-] | | It's such a useless platitude. The "higher value work" is answering more calls so we can have fewer staff on the queue. | |
| ▲ | c0nducktr 3 days ago | parent [-] | | Exactly. The only thing this does is allow the company to have fewer staff while making the jobs of the few who remain worse. Great product, can't imagine why the general public aren't thrilled about 'AI'. |
| |
| ▲ | hobs 3 days ago | parent | prev [-] | | Because call centers are cost centers - nobody pays a dime more than they have to in these situations and it's all commodity work. | |
| ▲ | generic92034 3 days ago | parent [-] | | But that means the so-called "higher-value" work does not need to be done, so agents can be fired. |
|
|
|
| ▲ | actsasbuffoon 3 days ago | parent | prev | next [-] |
| That’s the thing. There’s value in AI, it’s just not worth half a trillion dollars to train a new model that’s 0.4% better on benchmarks. Meta is never going to get a worthwhile return on spending $100M on individual engineers. But that doesn’t mean AI is without its uses. We’re just in that painful phase where the hype needs to die down and we treat LLMs as what they really are: an interesting new tool in the toolkit that provides some new ways to solve problems. It’s almost certainly not going to turn into AGI any time soon. It’s not worth trillions. It’s certainly worth something, though. I think the financials on developing new frontier models are terrible. But I’ve already built multiple AI projects for my company that are making money and we’ve got extremely happy customers. Investors thought one company was going to win the AI Wars and make a quadrillion dollars. Instead it’s probably going to be 10,000 startups that will build interesting products based on AI, and training new models won’t actually be a good financial move. |
| |
| ▲ | Imanari 3 days ago | parent [-] | | Could you broadly describe the AI projects you have built? | | |
| ▲ | actsasbuffoon 3 days ago | parent | next [-] | | I literally just had a conversation with my CEO this morning where he told me not to disclose the projects I’ve been working on, so I can only speak about it obliquely. We identified some problems our customers have, and I’ve come up with interesting ways to use LLMs as part of an automated system to solve some of those problems. It’s not the kind of thing where we just dump some data into the ChatGPT API and get an answer. We’re doing fairly deep integrations that do some interesting/powerful things. It’s been a big deal for our prospective clients and investors. | |
| ▲ | trueismywork 3 days ago | parent | prev [-] | | PINNs (physics-informed neural networks) for simulations |
|
|
|
| ▲ | didibus 3 days ago | parent | prev | next [-] |
| Where is the money being saved? Are you reducing the number of agents? Otherwise, it should actually cost more; before, you simply had customers wait longer to speak to the next agent, no? Or do you sell "support calls" so you're able to sell more of them given the same number of agents? |
|
| ▲ | doorhammer 3 days ago | parent | prev | next [-] |
| I'm curious, have you noticed an impact on agent morale with this? Specifically: Do they spend more time actually taking calls now? I guess as long as you're not at the burnout point with utilization it's probably fine, but when I was still supporting call centers I couldn't count the number of projects I saw trying to push utilization up without realizing how real burnout is at call centers. I assume that's not news to you, of course. At a certain utilization threshold we'd always start to see AHTs creep up as agents got burned out and, consciously or not, started trying to stay on good calls. Guess it also partly depends on whether you're in more of a customer service call center or sales. I hated working as an actual agent on the phones, but call center ops and strategy at scale has always been fascinating. |
| |
| ▲ | lljk_kennedy 3 days ago | parent [-] | | Thank you, I came to say this too. You're mushing your humans harder, and they'll break. Those 5 mins of downtime post-call aren't 100% note-taking - they're catching their breath, trying to re-compose after dealing with a nasty customer, trying to re-energise after a deep technical session, etc. I think AI in general is just being misused to optimise local minima to the detriment of the overall system. | |
| ▲ | ponector 3 days ago | parent [-] | | Imagine how AI changed the call center's work: 1. Some agents have been laid off. 2. Survivors got stripped of their 5-minute summarizing breaks between calls and assigned new, higher targets for how many calls they should take per hour/day. And it wasn't a good job before AI... | |
| ▲ | doorhammer 3 days ago | parent [-] | | So, I fully agree that we should be aware how AI use is impacting front-line agents--honestly, I'd bet AI is overall a bad thing in most cases--but that's just a gut feeling. That said, it's possible the agents weren't given extra time to make notes about calls and write summaries; often they're not. You usually have different states you can be in as a call center agent. Something like: "On a call", "Available to take a new call", "Unavailable to take a new call" Being on a call is also being unavailable to take a call, but you'd obviously track that separately. "Unavailable" time is usually further broken down into paid time (breaks), unpaid time (lunch) etc And _sometimes_ the agent will have a state called something like "After Call Work" which is an "Unavailable" state that you use to finish up tasks related to the call you were just on. So, full disclosure: I did work for a huge e-com supporting huge call centers, but I only worked for one company supporting call centers. What I'm talking about is my experience there and what I heard from people who also worked there who had experience with other call centers. A lot of call centers don't give agents any "After Call Work" time and if they do, it's heavily discouraged and negatively impacts your metrics. They're expected to finish everything related to the call _during_ the call. If you're thinking "that's not great" then, yeah, I agree, but it was above my paygrade. It's entirely possible that offloading that task to an LLM gives agents _more_ breathing room. But also totally possible that you're right. I don't know the GPs exact situation, but I feel pretty confident that other call centers are doing similar things with AI tagging and summaries and that you see both situations (AI giving more breathing room some places and taking it away others). | | |
| ▲ | ponector 3 days ago | parent [-] | | > It's entirely possible that offloading that task to an LLM gives agents _more_ breathing room
In theory, yes. But there is no way they are going to save millions by giving more breathing room to agents. | |
| ▲ | doorhammer 3 days ago | parent [-] | | As a whole the incentives of capitalism are aligned as you suggest, but every major corp I've worked with has not-so-rare pockets of savvy middle managers that know how to play the game and also care about the welfare of their employees--even if the cultural incentives don't lean that way. (I'm assuming a US market here--and I'm at least tangentially aware that other cultures aren't identical) E.g., when I worked in call centers I was directly part of initiatives that saved millions and made agents lives better, with an intentionality toward both outcomes. I also saw people drive agents into the ground trying to maximize utilization and/or schedule adherence with total disregard for the negative morale and business value they were pushing. It makes me wonder if there are any robust org psych studies about the prevalence and success of middle managers trying to strategically navigate those kinds of situations to benefit their employees. I'd bet it's more rare than not, but I have no idea by how much. | | |
|
|
|
|
|
|
| ▲ | tux3 3 days ago | parent | prev | next [-] |
| > it's a huge, measurable efficiency gain. > It's not going to replace anyone's job Mechanically, more efficiency means less people required for the same output. I understand there is no evidence that any other sentence can be written about jobs. Still, you should put more text in between those two sentences. Reading them so close together creates audible dissonance. |
| |
| ▲ | missedthecue 3 days ago | parent | next [-] | | "Mechanically, more efficiency means less people required for the same output." Why can't it mean more output with the same number of people? If I pay 100 people for 8 hours of labor a day, and after making some changes to our processes, the volume of work completed is up 10% per day, what is that if not an efficiency gain? What would you call it? It really depends on the amount of work. If the demand for your labor is infinite, or at least always more than you can do in a day's work, efficiency gains won't result in layoffs, just more work completed per shift. If the demand for the work is limited, efficiency gains will likely result in layoffs because there's no point in paying someone who was freed up by your new processes to sit around twirling a pen all day. | |
| ▲ | tux3 3 days ago | parent [-] | | All else equal, the demand for support calls doesn't go up as your support becomes more efficient. I get that we're trying to look for positive happy scenarios, but only considering the best possible world instead of the most likely world is bias. It's Optimistic in the sense of Voltaire. | | |
| ▲ | missedthecue 3 days ago | parent [-] | | What I'm saying is that if the volume of support is high enough, even if it never changed, it's completely possible to improve throughput without reducing demand for labor. The result is simply that you improve response times. | | |
| ▲ | tux3 3 days ago | parent [-] | | But I think this comes back to the same question of understaffing/overwork. We have to ask what strategic thinking led to accepting long response times in the past. And the answer is unequivocal. Unless we're claiming there is an intractable shortage of qualified labor in call centers, this is always the result of a much simpler explanation: it's much cheaper to understaff call centers. A company that wants to save money by adding more AI is a company that cares about cost cutting. Like most companies. The strategy that caused the company to understaff has not changed. The result is that we go back to homeostasis, and fewer jobs are needed to reach the same deliberate target. | | |
| ▲ | missedthecue 3 days ago | parent [-] | | OK, but in that case, we reach status quo but with fewer employees. Doesn't that meet your definition of efficiency gains? | | |
| ▲ | tux3 3 days ago | parent [-] | | Yep. I was arguing that in this case more efficiency means you won't need as many jobs as you would otherwise. That does meet my definition of an efficiency gain if there are fewer employees. Whether that's a good thing and for whom is another question. |
|
|
|
|
| |
| ▲ | flkiwi 3 days ago | parent | prev [-] | | You're not accounting for teams already being understaffed and overtasked in many situations, with some AI tools allowing people to get back to doing their jobs. We aren't expecting significant headcount changes, but we are expecting significant performance and quality improvements for the resources we have. | | |
| ▲ | tux3 3 days ago | parent [-] | | The reason that caused the team to be understaffed and overtasked has not gone away because of AI. I am expecting the team to stay understaffed and overtasked, for the same reason it was before: it's less expensive. With or without an LLM summarizing phone calls. |
|
|
|
| ▲ | ethagknight 3 days ago | parent | prev | next [-] |
This highlights the potentially unrealistic spend on AI, where the proposal is to spend tens of millions to save... marginal millions... that could have also been saved by a change in process with limited additional spend. I also would assume that there are far more significant behavioral or human factors that consume the time spent writing those summaries, e.g. an easy spot to kill 5-10 minutes before opening the line for the next inbound call, but the 5-10 minute break will persist anyway. I fully believe AI will create a lot of value and is revolutionary, especially for industries where value is hidden within data. It's the pace of value creation that stands out to me (how long until it's actually useful and better and creates more value than it costs?), but the bubble factor is not ignorable in the near term. |
|
| ▲ | MangoToupe 3 days ago | parent | prev | next [-] |
| > We operate large call centers, and agents were previously spending 3-5 minutes after each call writing manual summaries of the calls. This is a tiny fraction of all work done. This is work people were claiming to have solved 15 years ago. Who cares? |
|
| ▲ | amluto 3 days ago | parent | prev | next [-] |
| I’d like to see a competent AI replace the time that doctors and nurses spend tediously transcribing notes into a medical record system. More time spent doing the actual job is good for pretty much everyone. |
| |
| ▲ | beart 3 days ago | parent | next [-] | | But... that is the actual job. A clear medical history is very important, and I'm not ready yet to cut out my doctor from that process. This reminds me of the way juniors tend to think about things. That is, writing code is "the actual job" and commit messages, documentation, project tracking, code review, etc. are tedious chores that get in the way. Of course, there is no end to the complaints of legacy code bases not having any of those things and being difficult to work with. | | |
| ▲ | hinkley 3 days ago | parent | next [-] | | Not just juniors. Industry is full of senior and some staff engineers who see discipline as a waste of time. The number of things I do in a day that half my coworkers see as a waste of time until they enjoy the outcomes is basically uncountable at this point. If something is a “waste of time” it’s possible that you’re just lousy at it. Self reflection is a rarer commodity than it should be. And most of the tasks you list either require or invite it. | |
| ▲ | wl 3 days ago | parent | prev | next [-] | | Charting is for billing. If the point were to have accurate medical records useful for facilitating diagnosis and treatment, we'd structure medical records way differently. Fishing clinically-useful bits of information out of encounter and progress notes is tedious and only done as a last resort. | | | |
| ▲ | amluto 3 days ago | parent | prev [-] | | Making notes is fine. When a nurse watches a patient in a hospital for an hour and spends 45 of those minutes awkwardly typing into the record system and therefore can’t actually attend to the patient, something is wrong. |
| |
| ▲ | nottorp 3 days ago | parent | prev | next [-] | | Competent, yes. But the current ones are likely to transcribe "recommend amputating the left foot" as "recommend amputating the right foot". Still want it? | |
| ▲ | simmerup 3 days ago | parent | prev [-] | | Until it hallucinates and the AI has written something wrong about you in your official medical record |
|
|
| ▲ | hobs 3 days ago | parent | prev | next [-] |
That is a good thing, but it's also just a training gap. I worked with tech support agents for years in gigantic settings, and taking notes while you take an action is difficult to train but yields tangible results: clarity from the agent on what they are doing step by step, and a shared method of documenting things that focuses on the important details and doesn't miss a piece that may look trivial to an outsider but (for instance) defines SOP for call escalation and the like. |
|
| ▲ | dvfjsdhgfv 3 days ago | parent | prev | next [-] |
What I found is that although some tools are good for transcription and the final result is quite accurate, this is not necessarily true for summaries. I routinely read meeting summaries spit out by the so-called SOTA tools, and they almost always contain major errors. So if you actually need them for anything serious, it would be wise to keep the audio. |
|
| ▲ | positron26 3 days ago | parent | prev | next [-] |
| Given that people skimp on work that is viewed as trash anyway, how were you getting value out of the summaries in the first place? |
| |
| ▲ | thrown-0825 3 days ago | parent [-] | | They weren't. It's likely a checkbox for compliance, or some policy a middle manager put in place that is now tied to a KPI. | | |
| ▲ | positron26 3 days ago | parent [-] | | Could be CRM, leaving summaries for the next person. I suppose it would sound like I'm implying a prior. |
|
|
|
| ▲ | chasd00 3 days ago | parent | prev | next [-] |
So you're saving 3-5 minutes per agent per call. I'm guessing calls come into a queue and then the next available agent starts to handle the next one. If an average call takes about 20 minutes until the agent hangs up and is free for another, then after about 5 calls they've saved enough time to take an extra call they wouldn't have taken before. 5 calls is roughly 1.7 hrs on the phone; with breaks and call center reps not being 100% on the ball all the time, your agents will probably take maybe 3-4 more calls per day with the AI than without (assuming call volume is such that there are always more calls than agents can handle). Is that really millions in savings annually? Maybe it is, but I always hesitate when a process change that saves one person a few minutes is extrapolated all the way out to dollars/year. What you'll probably see is the agents using those 3-5 minutes to check their phones. |
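A rough back-of-the-envelope version of the arithmetic above. Every input is an assumption chosen for illustration (agent count, calls per day, loaded cost), not a figure from the original comment, so treat the output as a sanity check rather than a claim:

```python
# Back-of-the-envelope savings estimate; every parameter here is an assumption.
minutes_saved_per_call = 4          # midpoint of the quoted 3-5 minutes
calls_per_agent_per_day = 20        # assumed
agents = 2000                       # assumed size of a "large" call center operation
loaded_cost_per_agent_hour = 25.0   # assumed fully loaded cost, USD
working_days_per_year = 250

minutes_saved_per_agent_per_day = minutes_saved_per_call * calls_per_agent_per_day
hours_saved_per_year = agents * working_days_per_year * minutes_saved_per_agent_per_day / 60
annual_savings = hours_saved_per_year * loaded_cost_per_agent_hour
print(f"~{hours_saved_per_year:,.0f} agent-hours/year, ~${annual_savings:,.0f}/year")
# With these made-up inputs: ~666,667 agent-hours and ~$16.7M per year - but the savings
# only materialize if the freed time turns into more calls handled or fewer agents paid.
```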
|
| ▲ | Terr_ 3 days ago | parent | prev | next [-] |
| I think the biggest issue is accurately estimating the LLM failure risk, and what impacts the company is willing to tolerate in the long term. (As distinct from what the company is liable to permit through haste and ignorance.) With LLMs the risk is particularly hard to characterize, especially when it comes to adversarial inputs. |
|
| ▲ | pjmorris 3 days ago | parent | prev | next [-] |
| I wonder how this change affects what the agents remember about the calls, and how that affects their performance on future calls.
And I wonder whether agent performance, as measured by customer satisfaction, will decline over time, and whether that will affect the bottom line. |
|
| ▲ | trevor-e 3 days ago | parent | prev | next [-] |
| This is a great use-case of AI. However I strongly doubt your point about "It's not going to replace anyone's job" and that "they also free up the human agents to do higher-value work". The reality in most places is that fewer agents are now needed to do the same work as before, so some downsizing will likely occur. Even if they are able to switch to higher-value work, some amount of work is being displaced somewhere in the chain. And to be clear I'm not saying this is bad at all, I'm just surprised to see so many deluded by the "it won't replace jobs" take. |
|
| ▲ | belter 3 days ago | parent | prev | next [-] |
Are the summaries reviewed by the agents? And if not, how do you handle hallucinations, or the wrong insurance policy ID getting transcribed, for example? Like, the customer wants to cancel insurance policy AB-2345D and the transcript says they want to cancel insurance policy AD-2345B. |
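One way this class of error is sometimes guarded against is to validate any identifiers the model extracts against the customer's actual records before the summary is filed. A minimal sketch of that idea; the ID format, function names, and review step are all assumptions, not anything the parent comment or the original poster describes:

```python
import re

# Assumed policy ID format, e.g. "AB-2345D"; a real system would use its own scheme.
POLICY_ID_PATTERN = re.compile(r"\b[A-Z]{2}-\d{4}[A-Z]\b")

def extract_policy_ids(summary: str) -> list[str]:
    """Pull anything that looks like a policy ID out of an AI-written summary."""
    return POLICY_ID_PATTERN.findall(summary)

def ids_needing_review(summary: str, customer_policy_ids: set[str]) -> list[str]:
    """Return extracted IDs that don't belong to this customer: candidates for human review."""
    return [pid for pid in extract_policy_ids(summary) if pid not in customer_policy_ids]

# Example: the transcript garbled AB-2345D into AD-2345B.
suspect = ids_needing_review(
    "Customer wants to cancel insurance policy AD-2345B.",
    customer_policy_ids={"AB-2345D", "XK-9911C"},
)
print(suspect)  # ['AD-2345B'] -> flag for agent review instead of acting on it automatically
```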
|
| ▲ | ghalvatzakis 3 days ago | parent | prev | next [-] |
| I lead an AI engineering team that automated key parts of an interviewing process, saving thousands of hours each month by handling thousands of interviews. This reduced repetitive, time-consuming tasks and freed human resources to focus on higher-value work |
| |
| ▲ | the_snooze 3 days ago | parent [-] | | I'm under the impression that one of the most critical responsibilities a lead has is to establish and maintain a good working culture. Properly vetting new additions feeds directly into that. Why offload it to AI? | | |
|
|
| ▲ | Capricorn2481 3 days ago | parent | prev | next [-] |
| > Not only are the summaries better than those produced by our human agents We have someone using Firefly for note taking, and it's pretty bad. Frequently gets details wrong or extrapolates way too much from a one-off sentence someone said. How do you verify these are actually better? |
|
| ▲ | varispeed 3 days ago | parent | prev | next [-] |
| What’s the actual business value of a “summary” though? A transcript is the record. A tag or structured note (“warranty claim,” “billing dispute,” “out of scope”) is actionable. But a free-form blob of prose? That’s just narrative garnish - which, if wrong or biased, is worse than useless. Imagine a human agent or AI summarises: “Customer accepted proposed solution.” Did they? Or did they say “I’ll think about it”? Those aren’t the same thing, but in the dashboard they look identical. Summaries can erase nuance, hedge words, emotional tone, or the fact the customer hung up furious. If you’re running a call centre, the question is: are you using this text to drive decisions, or is it just paperwork to make management feel like something is documented? Because “we saved millions on producing inaccurate metadata nobody really needs” isn’t quite the slam dunk it sounds like. |
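For contrast, a sketch of what a structured disposition record might look like instead of (or alongside) free-form prose. The field names and categories are hypothetical, chosen only to illustrate the "tag or structured note" idea above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Disposition(Enum):
    RESOLVED = "resolved"
    CUSTOMER_WILL_CONSIDER = "customer_will_consider"  # "I'll think about it" is not acceptance
    ESCALATED = "escalated"
    OUT_OF_SCOPE = "out_of_scope"

@dataclass
class CallRecord:
    category: str                         # e.g. "warranty claim", "billing dispute"
    disposition: Disposition              # the actionable outcome, not a narrative guess
    transcript_uri: str                   # the transcript remains the record of what was said
    sentiment: Optional[str] = None       # e.g. "caller audibly frustrated at end of call"
    free_text_summary: str = ""           # optional garnish; decisions key off the fields above
```

A record like this can be aggregated, audited, and disputed; "Customer accepted proposed solution" buried in a prose blob cannot.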
|
| ▲ | croes 3 days ago | parent | prev | next [-] |
| But I doubt it justifies the billions of dollars getting burned for training language models and building power plants. And are full transcriptions not the better option? |
|
| ▲ | tempodox 3 days ago | parent | prev | next [-] |
| That’s an important point. Real-life use cases are not sexy. And they don’t lend themselves to overblown hype generation and “creative marketing”. |
|
| ▲ | ozgune 3 days ago | parent | prev | next [-] |
Previously discussed here: https://news.ycombinator.com/item?id=44941118 It's also disappointing that MIT requires you to fill out a form (and wait) for access to the report. I read four separate stories based on the report, and they all provide a different perspective. Here's the original PDF before MIT started gating it: https://web.archive.org/web/20250818145714/https://nanda.med... |
|
| ▲ | nuker 3 days ago | parent | prev | next [-] |
> recently switched to using AI to transcribe and write these summaries Did users know that the conversation was being recorded? |
| |
| ▲ | creaturemachine 3 days ago | parent | next [-] | | Yeah the standard "this call may be recorded for quality or training purposes" preamble shouldn't cover for slurping your voiceprint to further the butchering of client service that this call centre is here for. | |
| ▲ | prophesi 3 days ago | parent | prev | next [-] | | You would be hard-pressed to find a call center that _doesn't_ start every phone call with a warning that the conversation may be recorded. | |
| ▲ | watwut 3 days ago | parent | prev [-] | | A typical call center call is recorded, and you are told so at the start of the conversation. I've had quite a few of those. |
|
|
| ▲ | stronglikedan 3 days ago | parent | prev | next [-] |
| I wouldn't allow myself to be held accountable for anything in a summary I didn't write. |
|
| ▲ | ponector 3 days ago | parent | prev | next [-] |
Your company saving millions means they fired many agents and hiked the number of cases per agent? |
|
| ▲ | apwell23 3 days ago | parent | prev | next [-] |
I was pretty sure you were going to say "meeting summaries" (which apparently is the poster child of LLM applications). My guess was wrong, but not really. |
|
| ▲ | pluc 3 days ago | parent | prev | next [-] |
| You could do that offline 20 years ago with Dragon Naturally Speaking and a one-time licence. |
| |
| ▲ | dymk 3 days ago | parent | next [-] | | You could get a transcript, not a summary | |
| ▲ | loloquwowndueo 3 days ago | parent | prev [-] | | Only if your audio was crystal clear, you spoke like a robot very slowly, and each customer had to do a 30-minute “please read this text slowly to train the speech recognition software”
preamble before talking to the actual human. |
|
|
| ▲ | butlike 3 days ago | parent | prev [-] |
How can you double-check the work? Also, what happens when the AI transcription is wrong in a way that would have gotten a human employee fired? You can't fire a model. Finally, who cares about millions saved (considering the risk introduced above) when trillions are on the line? |
| |
| ▲ | PaulRobinson 3 days ago | parent | next [-] | | Having a human read a summary is way faster than getting them to write it. If they want to edit it, they can. AI today is terrible at replacing humans, but OK at enhancing them. Everyone who gets that is going to find gains - real gains, and fast - and everyone who doesn't, is going to end up spending a lot of money getting into an almost irreversible mistake. | | |
| ▲ | butlike 3 days ago | parent [-] | | "Reading a summary is faster, so enhancing humans with AI is going to receive boons or busts to the implementer." Now, summary, or original? (Provided the summary is intentionally vague to a fault, for arguments sake on my end). |
| |
| ▲ | throitallaway 3 days ago | parent | prev [-] | | I presume they're not using these notes for anything mission or life critical, so anything less than 100% accuracy is OK. | | |
| ▲ | butlike 3 days ago | parent [-] | | I disagree with the idea that any note is inconsequential. All notes are intrinsically actionable; that's why they're a note in the first place. Any note has unbounded consequences depending on the action taken from it. | |
| ▲ | wredcoll 3 days ago | parent [-] | | You're being downvoted, I suspect for being a tad hyperbolic, but I think you are raising a really important point, which is the ever more gradual removal of a human's ability to disobey the computer system running everything, and the lack of responsibility for following computer instructions. It's a tad far-fetched in this specific scenario, but if an AI summary says something like "cancel the subscription for user xyz", someone else acts on that, and xyz is the wrong ID, what happens? |
|
|
|