| ▲ | jazzyjackson 3 months ago |
| Thinking about that time Berkeley delisted thousands of recordings of course content as a result of a lawsuit complaining that they could not be utilized by deaf individuals. Can this be resolved with current technology? Google's auto captioning has been abysmal up to this point, I've often wondered what the cost would be for google to run modern tech over the entire backlog of youtube. At least then they might have a new source of training data. https://news.berkeley.edu/2017/02/24/faq-on-legacy-public-co... Discussed at the time (2017) https://news.ycombinator.com/item?id=13768856 |
|
| ▲ | hackernewds 3 months ago | parent | next [-] |
What a silly requirement. Since the 1% cannot benefit, let's remove it for the 99%.
| |
| ▲ | kleiba 3 months ago | parent | next [-] | | Note that Berkeley is, in theory, not required to remove the video archive. By law, they are only required to add captions. So if they wanted to keep it up, that's what they could do. Except it's not really a choice: the cost of captioning would be prohibitive. So, really, Berkeley is left with no choice; "make the recordings accessible or don't offer them at all" means, in practice, "don't offer them at all". Clearly the result of a regulation that meant well. But the road to hell is paved with good intentions. It's a bit reminiscent of laws that prevent institutions from continually offering employees fixed-term contracts. As in, after two fixed-term contracts, the third one must be permanent. The idea is to guarantee workers more stable, long-term prospects. The result, however, is that the employee's contract doesn't get renewed at all after the second one, and someone else is hired on a fixed-term contract instead. | | |
| ▲ | freedomben 3 months ago | parent [-] | | > the road to hell is paved with good intentions The longer I live, the more the truth of this gets reinforced. We humans really are kind of bad at designing systems and/or solving problems (especially problems of our own making). Most of us are like Ralph Wiggum with a crayon sticking out of our noses saying, "I'm helping!" |
| |
| ▲ | Thorrez 3 months ago | parent | prev | next [-] | | In the past, my university was publishing and mailing me a print magazine, and making it available in pdf form online. Then they stopped making the pdf available. I emailed them and asked why. They said it's because the pdf wasn't accessible. But the print form was even less accessible, and they kept publishing that... | | | |
| ▲ | 3abiton 3 months ago | parent | prev | next [-] | | It's one of those "to motivate the horse to run 1% faster, add a shit ton of weight on top of it" strategies. | |
| ▲ | IanCal 3 months ago | parent | prev [-] | | The problem is that having that rule results in those 1%s always being excluded. It's probably worth just going back and looking at the arguments for laws around accessibility. | | |
| ▲ | mst 3 months ago | parent [-] | | Yeah, every time I try to figure out an approach that could've kept this out of the rules' scope without making it easy for everybody to screw over deaf people entirely, I end up concluding that there probably isn't one. I'm somewhat tempted to think that whoever sued Berkeley and had the whole thing taken down in this specific case was just being a knob, but OTOH there are issues even with that POV, in terms of letting precedents be set that will de facto still become "screw over deaf people entirely" even when everybody involved is doing their best to act in good faith. Hopefully speech-to-text and text-to-speech will make the question moot in the medium term. | | |
| ▲ | freedomben 3 months ago | parent [-] | | > Hopefully speech-to-text and text-to-speech will make the question moot in the medium term. I really think this and other tech advances are going to be our saviors. It's still early days and it sometimes gets things wrong, but it's going to get good and it will basically allow us to have our cake and eat it too (as long as we can prevent having automated solutions banned). | | |
| ▲ | mst 3 months ago | parent [-] | | Yeah, my hopes have the caveat of "this requires regulations to catch up to where technology is at rather than making everything worse" and in addition to my generally low opinion of politicians (the ones I've voted for absolutely included) there's a serious risk of a "boomers versus technology" incident spannering it even if everything else goes right ... but I can still *hope* even if I can see a number of possible futures where said hopes will turn out to be in vain. |
|
|
|
|
|
| ▲ | andai 3 months ago | parent | prev | next [-] |
| Didn't YouTube have auto-captions at the time this was discussed? Yeah they're a bit dodgy but I often watch videos in public with sound muted and 90% of the time you can guess what word it was meant to be from context. (And indeed more recent models do way, way, way better on accuracy.) |
| |
| ▲ | zehaeva 3 months ago | parent | next [-] | | I have a few Deaf/Hard of Hearing friends who find the auto-captions basically useless. Anything that's even remotely domain-specific becomes a garbled mess. Even documentaries on light engineering/archaeology/history subjects are hilariously bad. Names of historical places and people are only randomly correct and almost never consistent. The second anyone has a bit of an accent, it's completely useless. I keep them on partially because I'm of the "everything needs to have subtitles or else I can't hear the words they're saying" cohort, so I can figure out what they really mean. But if you couldn't hear anything, I can see it being hugely distracting/distressing/confusing/frustrating. | | |
| ▲ | hunter2_ 3 months ago | parent | next [-] | | With this context, it seems as though correction-by-LLM might be a net win for your Deaf/HoH friends even if it would be a net loss for you: you can correct on the fly better than an LLM probably would, while the opposite is more often true for them, due to differences in experience with phonetics. Soundex [0] is a prevailing method of codifying phonetic similarity, but unfortunately it's focused exclusively on names. Any correction-by-LLM really ought to weight substitution probabilities heavily on something like that, I would think. [0] https://en.wikipedia.org/wiki/Soundex | |
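For anyone curious, the classic American Soundex algorithm is small enough to sketch; a minimal Python version (ignoring edge cases like empty or non-alphabetic input):

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter + three digits.
    Similar-sounding consonants share a digit; vowels are dropped;
    runs of equal codes (even when separated only by H/W) collapse."""
    codes = {}
    for group, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")]:
        for ch in group:
            codes[ch] = digit
    name = name.upper()
    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":               # H and W don't break a run of equal codes
            continue
        code = codes.get(ch, "")     # vowels and Y map to "" and reset the run
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]      # pad with zeros to four characters
```

So "Robert" and "Rupert" both code to R163, which is exactly the kind of "sounds alike" signal a caption corrector could use to weight substitutions.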
| ▲ | novok 3 months ago | parent | next [-] | | You can also download just the audio with yt-dlp and then remake the subs with Whisper or whatever other model you want. GPU-compute-wise it will probably be cheaper than asking an LLM to try to correct a garbled transcript. | |
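A sketch of that pipeline, using the standard yt-dlp and openai-whisper CLI flags (the URL below is a placeholder):

```python
# Fetch just the audio track with yt-dlp, then re-transcribe it with a
# local Whisper model. These are the stock CLI options of both tools.

def fetch_audio_cmd(url: str, out: str = "%(id)s.m4a") -> list[str]:
    # -f bestaudio: audio-only stream; -x: extract audio to the given format
    return ["yt-dlp", "-f", "bestaudio", "-x", "--audio-format", "m4a",
            "-o", out, url]

def transcribe_cmd(audio_path: str, model: str = "large-v3") -> list[str]:
    # writes an .srt caption file next to the audio
    return ["whisper", audio_path, "--model", model, "--output_format", "srt"]

# e.g. subprocess.run(fetch_audio_cmd("https://youtu.be/VIDEO_ID"), check=True)
```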
| ▲ | ldenoue 3 months ago | parent | next [-] | | The current Flash-8B model I use costs $1 per 500 hours of transcript. | | |
| ▲ | andai 3 months ago | parent [-] | | If I read OpenAI's pricing right, then Google's thing is 200 times cheaper? |
| |
| ▲ | HPsquared 3 months ago | parent | prev [-] | | I suppose the gold standard would be a multimodal model that also looks at the screen (maybe only if the captions aren't making much sense). |
| |
| ▲ | schrodinger 3 months ago | parent | prev | next [-] | | I'd assume Soundex is too basic and English-centric to be a practical solution for an international company like Google. I was taught it and implemented it in a freshman-level CS course in 2004; it can't be anywhere near the state of the art! | |
| ▲ | shakna 3 months ago | parent | prev [-] | | Soundex is fast, but inaccurate. It only prevails because of the computational cost of things like Levenshtein distance. |
| |
| ▲ | creato 3 months ago | parent | prev | next [-] | | I use youtube closed captions all the time when I don't want to have audio. The captions are almost always fine. I definitely am not watching videos that would have had professional/human edited captions either. There may be mistakes like the ones you mentioned (getting names wrong/inconsistent), but if I know what was intended, it's pretty easy to ignore that. I think expecting "textual" correctness is unreasonable. Usually when there are mistakes, they are "phonetic", i.e. if you spoke the caption out loud, it would sound pretty similar to what was spoken in the video. | | |
| ▲ | dqv 3 months ago | parent | next [-] | | > I think expecting "textual" correctness is unreasonable. Of course you think that, you don't have to rely solely on closed captions! It's usually not even posed as an expectation, but as a request to correct captions that don't make sense. Especially now that we have auto-captioning and tools that auto-correct the captions, running through and tweaking them to near-perfect accuracy is not an undue burden. > if you spoke the caption out loud, it would sound pretty similar to what was spoken in the video. Yes, but most deaf people can't do that. Even if they can, they shouldn't have to. | | |
| ▲ | beeboobaa6 3 months ago | parent [-] | | There's helping people and there's infantilizing them. Being deaf doesn't mean you're stupid. They can figure it out. Deleting thousands of hours of course material because you're worried they're not able to understand autogenerated captions just ensures everyone loses. Don't be so ridiculous. |
| |
| ▲ | mst 3 months ago | parent | prev [-] | | They continue to be the worst automated transcripts I encounter and personally I find them sufficiently terribad that every time I try them I end up filing them under "nope, still more trouble than it's worth, gonna find a different source for this information and give them another go in six months." Even mentally sounding them out (which is fine for me since I have no relevant disabilities, I just despise trying to take in any meaningful quantity of information from a video) when they look weird doesn't make them tolerable *for me*. It's still a good thing overall that they're tolerable for you, though, and I hope other people are on average finding the experience closer to how you find it than how I find it ... but I definitely don't, yet. Hopefully in a year or so I'll be in the same camp as you are, though, overall progress in the relevant class of tech seems to've hit a pretty decent velocity these days. |
| |
| ▲ | GaggiX 3 months ago | parent | prev | next [-] | | YouTube captions have improved massively in recent years: they are flawless in most cases, with occasional errors (almost entirely in reported numbers). I think the biggest problem is that the subtitles don't distinguish between speakers. | |
| ▲ | ldenoue 3 months ago | parent | prev [-] | | Definitely: and just giving the LLM context before correcting (in this case the title and description of the video, often written by a person) creates much better transcripts. |
| |
| ▲ | jonas21 3 months ago | parent | prev | next [-] | | Yes, but the DOJ determined that the auto-generated captions were "inaccurate and incomplete, making the content inaccessible to individuals with hearing disabilities." [1] If the automatically-generated captions are now of a similar quality as human-generated ones, then that changes things. [1] https://news.berkeley.edu/wp-content/uploads/2016/09/2016-08... | |
| ▲ | jazzyjackson 3 months ago | parent | prev | next [-] | | Definitely depends on the audio quality and how closely a speaker's dialect matches the mid-Atlantic accent, if you catch my drift. IME YouTube transcripts are completely devoid of meaningful information, especially when domain-specific vocabulary is used. | |
| ▲ | PeterStuer 3 months ago | parent | prev | next [-] | | YouTube auto-captions are extremely poor compared to e.g. running the audio through Whisper. | |
| ▲ | cavisne 3 months ago | parent | prev [-] | | What happened here is a specific scam where companies are targeted for ADA violations, which are so vague it’s impossible to “comply”. |
|
|
| ▲ | georgecmu 3 months ago | parent | prev | next [-] |
| A bit of an aside, but the entire Berkeley collection has been saved by and is available at archive.org: https://archive.org/search?query=subject%3A%22webcast.berkel... It would be great if they were annotated and served in a more user-friendly fashion. As a bonus link, one of my favorite courses from the time: https://archive.org/details/ucberkeley_webcast_itunesu_35482... |
| |
|
| ▲ | IanCal 3 months ago | parent | prev | next [-] |
Probably quite expensive over the whole catalog, but the Berkeley content would be cheap to do. If it's, say, 5000 hours, then through the best model at assembly.ai with no discounts it'd cost less than $2000. I know someone could do Whisper for cheaper, and there would likely be discounts at this volume, but even the worst case seems very doable for an individual.
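The back-of-envelope arithmetic, using the $1-per-500-hours Gemini figure quoted elsewhere in this thread and a hypothetical ~$0.37/hour hosted-ASR list rate for comparison:

```python
HOURS = 5000                       # rough size of the lecture archive

# Hosted ASR at a hypothetical ~$0.37/hour list rate:
hosted_asr = HOURS * 0.37          # $1850, under the $2000 ceiling above

# LLM cleanup of existing transcripts at $1 per 500 hours (rate quoted upthread):
llm_cleanup = HOURS / 500 * 1.00   # $10

assert hosted_asr < 2000
assert llm_cleanup == 10.0
```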
| |
| ▲ | ldenoue 3 months ago | parent | next [-] | | My repo doesn't reprocess the audio track: instead it improves the raw ASR transcript by feeding the LLM additional info (title and description) and asking it to fix errors. It's not perfect (it sometimes replaces a word with a synonym), but it's much faster and cheaper: Gemini 1.5 Flash-8B costs about $1 per 500 hours of transcript. | |
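A minimal sketch of that kind of metadata-grounded correction prompt (the function name and prompt wording are hypothetical, not the repo's actual code):

```python
def build_correction_prompt(asr_text: str, title: str, description: str) -> str:
    """Hypothetical prompt builder: the video's title and description supply
    the proper nouns and domain terms that the ASR pass tends to mangle."""
    return (
        "Fix recognition errors in this automatic transcript. "
        "Keep the original wording; do not paraphrase or summarize.\n"
        f"Video title: {title}\n"
        f"Video description: {description}\n"
        f"Transcript:\n{asr_text}"
    )
```

The returned string is what you'd send to the model; grounding it in the human-written metadata is what lets a cheap model recover names the ASR never had a chance at.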
| ▲ | ei23 3 months ago | parent | prev [-] | | With an RTX 4090 and insanely-fast-whisper on whisper-large-v3-turbo (see Whisper-WebUI for easy testing) you can transcribe 5000h on consumer hardware in about 50h, with timestamps.
So, yeah. I also know someone. | | |
| ▲ | IanCal 3 months ago | parent [-] | | I can also run this all locally; my point was more that, at worst, the most advanced model right now (afaik, I'm not personally benchmarking), paid for at headline rates, would handle a huge content library for such a reasonable amount that an individual could fund it. I've donated more to single charities than this would cost; while it's not an insignificant sum, it's a "find one person who cares enough" level of problem. Grabbing the audio from thousands of hours of video, or even just getting the content from wherever it's stored, is probably more of an issue than actually creating the transcripts. If anyone reading this has access to the original recordings, this is a pretty great time to get transcriptions. |
|
|
|
| ▲ | delusional 3 months ago | parent | prev [-] |
That's a legal issue. If humans wanted that content to stay up, we could just have agreed to keep it up. Legal issues don't get solved by technology.
| |
| ▲ | jazzyjackson 3 months ago | parent | next [-] | | Well. The legal complaint was that transcripts don't exist. The issue was that it was prohibitively expensive to resolve the complaint. Now that transcription is 0.1% of the cost it was 8 years ago, maybe the complaint could have been resolved. Is building a ramp to meet ADA requirements not using technology to solve a legal issue? | | |
| ▲ | delusional 3 months ago | parent [-] | | Nowhere on the linked page at least does it say that it was due to cost. It would seem more likely to me that it was a question of nobody wanting to bother standing up for the videos. If nobody wants to take the fight, the default judgement becomes to take it down. Building a ramp solves a problem. Pointing at a ramp 5 blocks away 7 years later and asking "doesn't this solve this issue" doesn't. | | |
| ▲ | pests 3 months ago | parent [-] | | Yet this feels very Harrison Bergeron to me: handicap those with ability so we can all be at the same level. | |
| ▲ | fuzzy_biscuit 3 months ago | parent [-] | | Right. The judgment doesn't help people with disabilities at all. It only punishes the rest of the population. |
|
|
| |
| ▲ | yard2010 3 months ago | parent | prev [-] | | Yet. Legal issues don't get solved by tech yet! |
|