mcny 6 days ago

I don't spend too much time thinking about cameras or lenses, but this kind of conversation makes me wonder: when I take photos of receipts, street signs, or just text in general, is it possible that at some point the computational photography makes a mistake and changes the text? Or am I being paranoid?

matrss 6 days ago

Worse, Xerox scanners specifically meant for digitizing documents changed text for a long time. The compression algorithm they used (I think even in the default settings) sometimes replaced e.g. a 6 with an 8, and similar things. See: https://www.youtube.com/watch?v=7FeqF1-Z1g0 (in German, but there should be English-language news articles from back then somewhere as well)
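The Xerox case is widely attributed to the lossy symbol-matching mode of the JBIG2 compression format: glyphs that look similar enough are stored once and then reused. A minimal toy sketch of that idea, assuming tiny 5×5 bitmaps, Hamming distance as the similarity measure, and an invented `max_mismatch` threshold (real encoders use their own, more elaborate matching criteria):

```python
# Toy model of lossy symbol matching ("pattern matching and substitution").
# Assumptions: 5x5 binary glyphs, Hamming distance as the similarity measure,
# and a made-up max_mismatch threshold; real JBIG2 encoders differ in detail.

Glyph = tuple[str, ...]  # one string per row, '#' = black pixel

EIGHT: Glyph = (".###.", "#...#", ".###.", "#...#", ".###.")
SIX: Glyph = (".###.", "#....", "####.", "#...#", ".###.")


def mismatch(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two same-sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(glyphs: list[Glyph], max_mismatch: int) -> tuple[list[Glyph], list[int]]:
    """Return (symbol dictionary, one dictionary index per input glyph).

    A glyph "close enough" to an existing entry is stored only as a reference
    to that entry, and its own pixels are thrown away: this is the lossy step.
    """
    dictionary: list[Glyph] = []
    indices: list[int] = []
    for g in glyphs:
        for i, known in enumerate(dictionary):
            if mismatch(g, known) <= max_mismatch:
                indices.append(i)
                break
        else:  # no close match: add the glyph as a new symbol
            dictionary.append(g)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary: list[Glyph], indices: list[int]) -> list[Glyph]:
    """Rebuild the page by looking up each stored index."""
    return [dictionary[i] for i in indices]


page = [EIGHT, SIX]                               # an 8 followed by a 6
lossless = decode(*encode(page, max_mismatch=0))
lossy = decode(*encode(page, max_mismatch=2))     # SIX differs from EIGHT by 2 pixels
print(lossless == page)         # True
print(lossy == [EIGHT, EIGHT])  # True: the 6 is reproduced as an 8
```

With a strict threshold both digits survive; loosen it and the second glyph is silently replaced by the first, with no visible artifact to warn the reader.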

gruez 6 days ago

That's not really "computational photography" in any meaningful sense; it's closer to "digital processing". It's not impossible for such glitches to occur with modern smartphone cameras, but it's implausible. I don't think there has ever been a confirmed instance of such a gaffe. Meanwhile, a few years ago a photo with a misplaced leaf made the rounds, and people complained that it was caused by computational photography, but it turned out the photo was accurate: the leaf was actually there.

matrss 6 days ago

My point was that you don't have to take a photo of a receipt to run into this issue; actual machines specifically built to digitize receipts and other documents have already made this kind of mistake.

No idea if this can happen with what modern smartphone cameras do to photos. If "AI" is involved, I would expect such issues to be possible because these models are fundamentally random generators, just like how LLMs hallucinate stuff all the time. Other "enhancement" approaches might not produce issues like this.

bobbylarrybobby 6 days ago

iPhones can definitely garble text, although it's not clear whether they can substitute one piece of text for another. It seems possible but unlikely (in a purely statistical sense).

https://www.reddit.com/r/iphone/comments/1m5zsj7/ai_photo_ga...

https://www.reddit.com/r/iphone/comments/1jbcl1l/iphone_16_p...

https://www.reddit.com/r/iphone/comments/17bxcm8/iphone_15_n...

jlokier 6 days ago

> is it possible that at some point the computational photography makes a mistake and changes text?

Yes it is. I've seen that happen in real-time with the built-in camera viewfinder (not even taking a photo) on my mid-range Samsung phone, when I zoomed in on a sign.

It only changed one letter, and it looked more like a strange optical warping from one letter into another when I pointed the camera at a particular sign in a shop, but it was very surprising to see.

rasalas 6 days ago

Xerox scanners/photocopiers had this problem.

https://news.ycombinator.com/item?id=29223815

Aachen 6 days ago

It was the compression format, not the scanner, right? The same would have happened if you stored something in that format (with the same quality settings, etc.) on a computer or smartphone.

Not that it helps anyone who was affected, but that situation is more like having an .aip file (an "AI Photo" storage format) that invents details when you zoom in, rather than a sensor (pipeline) issue.

namibj 5 days ago

No, they exhibited it even in plain single-copy copying mode.

Aachen 5 days ago

Oh wtf! I had Ctrl+F'd the article for "cop" (to catch "copy", "copies", and such) to quickly check this, but didn't see that. Then I guess I misremembered the root cause of this issue.

namibj 3 hours ago

Apparently you were mostly right. However, the issue was later determined to be independent of the quality setting, even though the vendor, after the initial findings and after having had plenty of time to try to reproduce it internally, had claimed "that factory default settings would be unaffected".

Probably due to ambiguous phrasing in an old The Register article on the matter, I had mistakenly remembered that the copy mode's temporary storage between scan and print had also been affected.

Since there were many situations where people would scan a document and destroy the original once the offsite backup had run, whereas a physical copy would (or should) usually not entail destroying the original, most of the overall damage would have come from scanning anyway, not copying.

sjsdaiuasgdia 6 days ago

It's definitely a possibility if there's a point where LLM-based OCR is applied.

See https://www.runpulse.com/blog/why-llms-suck-at-ocr and its related HN discussion https://news.ycombinator.com/item?id=42966958

thesuitonym 6 days ago

Like almost everything LLMs do, you don't need an LLM to make these mistakes.

sjsdaiuasgdia 6 days ago

LLM-based OCR and speech transcription do come with a failure condition that is different from what you see in pre-LLM solutions. When the source data is hard to understand, LLMs try to fill the gap with something that makes sense given the surrounding context.

Pre-LLM approaches handle unintelligible source data differently. You'll more commonly see nonsense output for the unintelligible bits. In some cases, the tool might be able to recognize low confidence and return an error or some other indicator of a possible miss.

IMO, that's a feature. The LLM approach makes up something that looks right but may not actually match the source data. These errors are far harder to detect and more likely to make it past human review.

The LLM approach does mean that you can often get more "complete" output from a low-quality data source than with pre-LLM approaches. And sometimes it might even be correct! But other times it will get it wrong.

Another failure condition I've experienced with LLM-based voice transcription that I didn't have pre-LLM: running down the wrong fork in the road. Sometimes the LLM approaches will get a word or two wrong (words with similar phonetics or multiple meanings, that kind of thing). The model may then continue down the path this mistaken context has created, outputting additional words that do not align with the source data at all.
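A deliberately crude sketch of that "wrong fork" effect, assuming a greedy decoder that multiplies invented acoustic scores by a tiny hand-written bigram language model. Real LLM-based transcription conditions on far more context and may search over alternatives, so treat this only as an illustration of how one early choice shapes every later one:

```python
# Toy greedy decoder: acoustic score x bigram language-model score.
# All words and probabilities are invented for illustration.

Step = dict[str, float]  # candidate word -> acoustic score

BIGRAM: dict[str, dict[str, float]] = {
    "<s>": {"their": 0.5, "there": 0.5},
    "their": {"car": 0.8, "are": 0.2},
    "there": {"car": 0.1, "are": 0.9},
}


def greedy_decode(steps: list[Step], start: str = "<s>") -> list[str]:
    """Pick the best word at each step given only the previously chosen word.

    Greedy decoding never revisits a choice, so one early mistake changes the
    context used for every later step.
    """
    prev, out = start, []
    for acoustic in steps:
        lm = BIGRAM[prev]
        word = max(acoustic, key=lambda w: acoustic[w] * lm.get(w, 0.0))
        out.append(word)
        prev = word
    return out


audio = [
    {"their": 0.49, "there": 0.51},  # near-homophones: almost a coin flip
    {"car": 0.6, "are": 0.4},        # the audio actually favours "car"
]
print(greedy_decode(audio))                     # ['there', 'are']
print(greedy_decode(audio[1:], start="their"))  # ['car']
```

The second word goes wrong even though its own audio favoured "car": the mistaken first word is what pulled it onto the other path.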

coredog64 6 days ago

Having uploaded my share of receipts to Concur, I can say there are two checks and balances: if you still have the original, you can correct the OCR'd value. And Concur recognizes both line items and totals and whines if they don't match.
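How Concur actually validates receipts isn't described in this thread; as a hypothetical sketch of the kind of cross-check the comment mentions, assuming the line items and total are OCR'd as decimal strings (the function name `receipt_discrepancy` is invented for illustration):

```python
from decimal import Decimal


def receipt_discrepancy(line_items: list[str], total: str) -> Decimal:
    """Return (sum of line items) - (stated total); zero means they agree.

    Decimal avoids binary floating-point noise such as 0.1 + 0.2 != 0.3.
    """
    items_sum = sum((Decimal(x) for x in line_items), Decimal("0"))
    return items_sum - Decimal(total)


ocr_ok = receipt_discrepancy(["4.50", "16.25", "3.10"], "23.85")
ocr_bad = receipt_discrepancy(["4.50", "18.25", "3.10"], "23.85")  # 6 misread as 8
print(ocr_ok)   # 0.00
print(ocr_bad)  # 2.00
```

A check like this only catches misreads that break the arithmetic; a wrong date or merchant name, or two errors that cancel out, passes straight through.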