jazzyjackson a year ago

Thinking about that time Berkeley delisted thousands of recordings of course content as a result of a lawsuit complaining that they could not be utilized by deaf individuals. Can this be resolved with current technology? Google's auto captioning has been abysmal up to this point; I've often wondered what it would cost Google to run modern tech over the entire backlog of YouTube. At least then they might have a new source of training data.

https://news.berkeley.edu/2017/02/24/faq-on-legacy-public-co...

Discussed at the time (2017) https://news.ycombinator.com/item?id=13768856

hackernewds a year ago | parent | next [-]

What a silly requirement? Since 1% cannot benefit, let's remove it for the 99%

kleiba a year ago | parent | next [-]

Note that Berkeley is in theory not required to remove the video archive. It's just that by law, they are required to add captions. So, if they want to keep it up, that's what they could do. Except that it's not really a choice - the costs for doing so would be prohibitive. So, really, Berkeley is left with no choice: "make the recordings accessible or don't offer them at all" means - in practice - "don't offer them at all".

Clearly the result of a regulation that meant well. But the road to hell is paved with good intentions.

It's a bit reminiscent of a law that prevents institutions from continually offering employees non-permanent work contracts. As in, after two fixed-term contracts, the third one must be permanent. The idea is to guarantee workers more stable and long-term perspectives. The result, however, is that the employee's contract won't get renewed at all after the second one, and instead someone else will be hired on a non-permanent contract.

freedomben a year ago | parent [-]

> the road to hell is paved with good intentions

The longer I live the more the truth of this gets reinforced. We humans really are kind of bad at designing systems and/or solving problems (especially problems of our own making). Most of us are like Ralph Wiggum with a crayon sticking out of our noses saying, "I'm helping!"

Thorrez a year ago | parent | prev | next [-]

In the past, my university was publishing and mailing me a print magazine, and making it available in pdf form online. Then they stopped making the pdf available. I emailed them and asked why. They said it's because the pdf wasn't accessible.

But the print form was even less accessible, and they kept publishing that...

giancarlostoro a year ago | parent [-]

ADA compliance will cost you.

3abiton a year ago | parent | prev | next [-]

It's one of those "to motivate the horse to run 1% faster, you add a shit ton of weight on top of it" strategies.

IanCal a year ago | parent | prev [-]

The problem is that having that rule results in those 1%s always being excluded. It's probably worth just going back and looking at the arguments for laws around accessibility.

mst a year ago | parent [-]

Yeah, every time I try and figure out an approach that could've avoided this being covered by the rules without making it easy for everybody to screw over deaf people entirely I end up coming to the conclusion that there probably isn't one.

I'm somewhat tempted to think that whoever sued Berkeley and had the whole thing taken down in this specific case was just being a knob, but OTOH there are issues even with that POV in terms of letting precedents be set that will de facto still become "screw over deaf people entirely" even when everybody involved is doing their best to act in good faith.

Hopefully speech-to-text and text-to-speech will make the question moot in the medium term.

andai a year ago | parent | prev | next [-]

Didn't YouTube have auto-captions at the time this was discussed? Yeah they're a bit dodgy but I often watch videos in public with sound muted and 90% of the time you can guess what word it was meant to be from context. (And indeed more recent models do way, way, way better on accuracy.)

zehaeva a year ago | parent | next [-]

I have a few Deaf/Hard of Hearing friends who find the auto-captions to be basically useless.

Anything that's even remotely domain specific becomes a garbled mess. Even the captions on documentaries about light engineering/archeology/history subjects are hilariously bad. Names of historical places and people are only randomly correct and almost never consistent.

The second anyone has a bit of an accent then it's completely useless.

I keep them on partially because I'm of the "everything needs to have subtitles else I can't hear the words they're saying" cohort. So I can figure out what they really mean, but if you couldn't hear anything I can see it being hugely distracting/distressing/confusing/frustrating.

hunter2_ a year ago | parent | next [-]

With this context, it seems as though correction-by-LLM might be a net win among your Deaf/HoH friends even if it would be a net loss for you, since you're able to correct on the fly better than an LLM probably would, while the opposite is more often true for them, due to differences in experience with phonetics?

Soundex [0] is a prevailing method of codifying phonetic similarity, but unfortunately it's focused on names exclusively. Any correction-by-LLM really ought to generate substitution probabilities weighted heavily on something like that, I would think.

[0] https://en.wikipedia.org/wiki/Soundex
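
To make the idea concrete, here is a minimal sketch of American Soundex in Python. It only illustrates how similar-sounding words collapse to the same code; it is not a claim about how any captioning system or LLM corrector works. The function name `soundex` and the choice to silently drop non-letters are my own assumptions.

```python
# Minimal American Soundex sketch (illustrative only).
# Consonant groups that share a digit under the standard Soundex rules.
_GROUPS = (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
)
CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code for `word` ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # code of the previous significant letter
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and y) map to "" and act as separators
        if code and code != prev:
            digits.append(code)
        prev = code
    # Pad with zeros and truncate to one letter plus three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(name, soundex(name))
```

Two words with the same code (for example "Robert" and "Rupert") are treated as phonetically close, which is the kind of signal a corrector could use to prefer sound-alike substitutions over arbitrary ones.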

creato a year ago | parent | prev | next [-]

I use youtube closed captions all the time when I don't want to have audio. The captions are almost always fine. I definitely am not watching videos that would have had professional/human edited captions either.

There may be mistakes like the ones you mentioned (getting names wrong/inconsistent), but if I know what was intended, it's pretty easy to ignore that. I think expecting "textual" correctness is unreasonable. Usually when there are mistakes, they are "phonetic", i.e. if you spoke the caption out loud, it would sound pretty similar to what was spoken in the video.

GaggiX a year ago | parent | prev | next [-]

YouTube captions have improved massively in recent years; they are flawless in most cases, with at most a few errors (almost entirely in reporting numbers).

I think that the biggest problem is that the subtitles do not distinguish between the speakers.

ldenoue a year ago | parent | prev [-]

Definitely: and just giving the LLM context before correcting (in this case the title and description of the video, often written by a person) creates much better transcripts.

jonas21 a year ago | parent | prev | next [-]

Yes, but the DOJ determined that the auto-generated captions were "inaccurate and incomplete, making the content inaccessible to individuals with hearing disabilities." [1]

If the automatically-generated captions are now of a similar quality as human-generated ones, then that changes things.

[1] https://news.berkeley.edu/wp-content/uploads/2016/09/2016-08...

jazzyjackson a year ago | parent | prev | next [-]

Definitely depends on audio quality and how closely a speaker's dialect matches the mid-atlantic accent, if you catch my drift.

IME youtube transcripts are completely devoid of meaningful information, especially when domain-specific vocabulary is used.

PeterStuer a year ago | parent | prev | next [-]

YouTube auto-captions are extremely poor compared to e.g. running the audio through Whisper.

cavisne a year ago | parent | prev [-]

What happened here is a specific scam where companies are targeted for ADA violations, which are so vague it’s impossible to “comply”.

georgecmu a year ago | parent | prev | next [-]

A bit of an aside, but the entire Berkeley collection has been saved by and is available at archive.org: https://archive.org/search?query=subject%3A%22webcast.berkel...

It would be great if they were annotated and served in a more user-friendly fashion.

As a bonus link, one of my favorite courses from the time: https://archive.org/details/ucberkeley_webcast_itunesu_35482...

freedomben a year ago | parent [-]

Neat, thanks!

IanCal a year ago | parent | prev | next [-]

Probably quite expensive over the whole catalog, but the Berkeley content would be cheap to do.

If it's, say, 5000 hours, then running it through the best model at assembly.ai with no discounts would cost less than $2000. I know someone could do Whisper for cheaper, and there would likely be discounts at this volume, but worst case it seems very doable even for an individual.
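
As a rough sanity check on those figures, the sketch below works out the per-hour price ceiling they imply. The 5000 hours and $2000 come from the comment; the function names are hypothetical, and no actual vendor price list is assumed.

```python
def implied_hourly_rate(total_cost_usd: float, audio_hours: float) -> float:
    """Per-audio-hour price implied by a total budget."""
    if audio_hours <= 0:
        raise ValueError("audio_hours must be positive")
    return total_cost_usd / audio_hours


def transcription_cost(audio_hours: float, usd_per_hour: float) -> float:
    """Total cost of transcribing `audio_hours` at a flat hourly rate."""
    return audio_hours * usd_per_hour


HOURS = 5000        # "say, 5000 hours" (a guess in the comment, not a measured figure)
BUDGET_USD = 2000   # "less than $2000" upper bound from the comment

rate_ceiling = implied_hourly_rate(BUDGET_USD, HOURS)

if __name__ == "__main__":
    print(f"Implied ceiling: ${rate_ceiling:.2f} per audio hour")
```

So the claim holds for any service charging under roughly $0.40 per hour of audio; plug a real quoted rate into `transcription_cost` to check a specific provider.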

ldenoue a year ago | parent | next [-]

My repo doesn't reprocess the audio track: instead it makes the raw ASR text transcript better by feeding it additional info (title and description) and asking the LLM to fix errors.

It is not perfect (it sometimes replaces words with a synonym), but it is much faster and cheaper.

The low-cost Gemini 1.5 Flash-8B costs $1 per 500 hours of transcript.
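
For readers who want to picture the approach, here is a hedged sketch of how such a correction prompt might be assembled. It is not the prompt from the repo: the function name `build_correction_prompt`, the wording of the instructions, and the sample inputs are all made up for illustration, and the call to an actual LLM is deliberately left out.

```python
def build_correction_prompt(title: str, description: str, raw_transcript: str) -> str:
    """Assemble an LLM prompt that uses video metadata as context for fixing ASR errors.

    Hypothetical wording; a real prompt would need tuning against real transcripts.
    """
    return "\n".join([
        "You are correcting an automatic speech recognition (ASR) transcript.",
        "Fix misrecognized words, names, and punctuation.",
        # Guard against the synonym-substitution failure mode mentioned above.
        "Do not paraphrase and do not replace correct words with synonyms.",
        "",
        f"Video title: {title.strip()}",
        f"Video description: {description.strip()}",
        "",
        "Raw transcript:",
        raw_transcript.strip(),
    ])


if __name__ == "__main__":
    # Invented sample inputs, not taken from any real video.
    prompt = build_correction_prompt(
        title="Example Lecture 1: Introduction",
        description="First lecture of an example course.",
        raw_transcript="welcome to the first lecher of the coarse",
    )
    print(prompt)
```

The human-written title and description are placed before the transcript so the model sees the domain vocabulary first. At the quoted $1 per 500 hours, a 5000-hour archive would come to about $10.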

ei23 a year ago | parent | prev [-]

With an RTX 4090 and insanely-fast-whisper on whisper-large-v3-turbo (see Whisper-WebUI for easy testing) you can transcribe 5000h on consumer hardware in about 50h with timestamps. So, yeah. I also know someone.
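
Put as arithmetic, those numbers imply the following throughput. This is only a restatement of the 5000h and 50h figures above, not a benchmark; the function name is hypothetical.

```python
def realtime_factor(audio_hours: float, wall_clock_hours: float) -> float:
    """How many hours of audio are processed per hour of wall-clock time."""
    if wall_clock_hours <= 0:
        raise ValueError("wall_clock_hours must be positive")
    return audio_hours / wall_clock_hours


factor = realtime_factor(5000, 50)       # figures quoted in the comment
seconds_per_audio_hour = 3600 / factor   # wall-clock seconds per hour of audio

if __name__ == "__main__":
    print(f"{factor:.0f}x real time, about {seconds_per_audio_hour:.0f} s per audio hour")
```

That is about 100x real time, or roughly 36 seconds of GPU time per hour of lecture.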

IanCal a year ago | parent [-]

I can also run this all locally; my point was more that, at worst, the most advanced model right now (afaik, I'm not personally benchmarking), paid for at the headline rates, for a huge content library, costs such a reasonable amount that an individual can do it. I've donated more to single charities than this would cost; while it's not an insignificant sum, it's a "find one person who cares enough" level problem.

Grabbing the audio from thousands of hours of video, or even just managing getting the content from wherever it's stored, is probably more of an issue than actually creating the transcripts.

If anyone reading this has access to the original recordings, this is a pretty great time to get transcriptions.

delusional a year ago | parent | prev [-]

That's a legal issue. If humans wanted that content to be up, we just could have agreed to keep it up. Legal issues don't get solved by technology.

jazzyjackson a year ago | parent | next [-]

Well. The legal complaint was that transcripts don't exist. The issue was that it was prohibitively expensive to resolve the complaint. Now that transcription is 0.1% of the cost it was 8 years ago, maybe the complaint could have been resolved.

Is building a ramp to meet ADA requirements not using technology to solve a legal issue?

delusional a year ago | parent [-]

Nowhere on the linked page at least does it say that it was due to cost. It would seem more likely to me that it was a question of nobody wanting to bother standing up for the videos. If nobody wants to take the fight, the default judgement becomes to take it down.

Building a ramp solves a problem. Pointing at a ramp 5 blocks away 7 years later and asking "doesn't this solve this issue" doesn't.

yard2010 a year ago | parent | prev [-]

Yet. Legal issues don't get solved by tech yet!