stego-tech 6 days ago

I fully agree with your observations, and would add that the irony of such a pursuit by phone makers is that serious hobbyist, amateur, and professional photographers and videographers understand that cameras are inherently inaccurate, and that what we're really capturing is an interpretation of what we see through imperfect glass, coatings, and sensor media to form an artistic creation. Sure, cameras can be used for accuracy, but those models and lenses are often expensive and aimed at specific industries.

We enjoy the imperfections of cameras because they let us create art. Smartphone makers take advantage of that by, as you put it, cranking things to eleven to manipulate psychology rather than investing in more accurate platforms that require skill. The ease is the point, but ease rarely creates lasting art that the creator is genuinely proud of or whose merit others appreciate.

mcny 6 days ago | parent | next [-]

I don't spend too much time thinking about cameras or lenses but this kind of conversation makes me wonder... when I take photos of receipts or street signs or just text in general, is it possible that at some point the computational photography makes a mistake and changes text? or am I being paranoid?

matrss 6 days ago | parent | next [-]

Worse, Xerox scanners specifically meant for digitizing documents changed text for a long time. The compression algorithm they used (I think even in the default settings) sometimes replaced e.g. a 6 with an 8, and similar things. See: https://www.youtube.com/watch?v=7FeqF1-Z1g0 (German, but there should also be English news articles from back then, somewhere).
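
As I understand it, the root cause was JBIG2-style "pattern matching and substitution" compression: visually similar glyphs are collapsed onto a single dictionary symbol, so a slightly blurry 6 can be printed back as an 8. A rough sketch of that idea in Python (purely illustrative, not Xerox's actual code; the glyph representation, similarity metric, and threshold are all made up):

    # Illustrative JBIG2-style symbol substitution: each glyph bitmap is
    # replaced by the closest symbol already in a dictionary if it is
    # "similar enough", which is where 6-vs-8 confusions can creep in.
    import numpy as np

    def pixel_distance(a, b):
        """Number of differing pixels between two equally sized 1-bit glyph bitmaps."""
        return int(np.count_nonzero(a != b))

    def encode_glyphs(glyphs, threshold):
        dictionary = []   # representative bitmaps
        encoded = []      # index into `dictionary` for each input glyph
        for g in glyphs:
            best_idx, best_dist = None, threshold + 1
            for i, rep in enumerate(dictionary):
                d = pixel_distance(g, rep)
                if d < best_dist:
                    best_idx, best_dist = i, d
            if best_idx is None:          # nothing similar enough: add a new symbol
                dictionary.append(g)
                best_idx = len(dictionary) - 1
            encoded.append(best_idx)
        return dictionary, encoded

With a threshold that is too loose, the decoded page shows the same stored glyph wherever either digit appeared, and the substitution is invisible because the output still looks crisp.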

gruez 6 days ago | parent [-]

That's not really "computational photography" in any meaningful sense; it's closer to "digital processing". It's not impossible for such glitches to occur with modern smartphone cameras, but it's implausible. I don't think there has ever been a confirmed instance of such a gaffe happening. Meanwhile, a few years ago a photo with a misplaced leaf made the rounds, and people complained that it was caused by computational photography, but it turned out the photo was accurate. The leaf was actually there.

matrss 6 days ago | parent [-]

My point was that you don't have to take a photo of a receipt to run into this issue; actual machines specifically built to digitize receipts and other documents already made this kind of mistake.

No idea if this can happen with what modern smartphone cameras do to photos. If "AI" is involved then I would expect such issues to be possible, since those models are at their core probabilistic generators, just like how LLMs hallucinate stuff all the time. Other "enhancement" approaches might not produce issues like this.

bobbylarrybobby 6 days ago | parent | prev | next [-]

iPhones can definitely garble text, although it's not clear whether they can substitute one piece of text for another. Seems possible but unlikely (in a purely statistical sense).

https://www.reddit.com/r/iphone/comments/1m5zsj7/ai_photo_ga...

https://www.reddit.com/r/iphone/comments/1jbcl1l/iphone_16_p...

https://www.reddit.com/r/iphone/comments/17bxcm8/iphone_15_n...

jlokier 6 days ago | parent | prev | next [-]

> is it possible that at some point the computational photography makes a mistake and changes text?

Yes it is. I've seen that happen in real-time with the built-in camera viewfinder (not even taking a photo) on my mid-range Samsung phone, when I zoomed in on a sign.

It only changed one letter, and it was more like a strange optical warping from one letter to a different one when I pointed the camera at a particular sign in a shop, but it was very surprising to see.

rasalas 6 days ago | parent | prev | next [-]

Xerox scanners/photocopiers had this problem.

https://news.ycombinator.com/item?id=29223815

Aachen 6 days ago | parent [-]

It was the compression format, not the scanner, right? The same would have happened if you stored a file in that format (with the same quality settings etc.) on a computer or smartphone.

Not that that helps anyone who's affected, but that situation is more like having a hypothetical .aip file, an "AI Photo" storage format that invents details when you zoom in, rather than a sensor (pipeline) issue.

namibj 5 days ago | parent [-]

No, they exhibited it in plain instant single-copy copying mode as well.

Aachen 5 days ago | parent [-]

Oh wtf! I had ctrl+F'd the article for "cop" (to catch "copy", "copies", and such) to quickly check this, but didn't see that. Then I guess I don't remember the root cause of this issue.

namibj 3 hours ago | parent [-]

Apparently you were right, mostly. Though it was later determined to be independent of the quality setting; after the initial findings, and after having had a lot of time to try to reproduce it internally, the vendor had claimed "that factory default settings would be unaffected".

I had mistakenly remembered, probably due to phrasing ambiguity in an old The Register article on the matter, that the temporary storage between scan and print in copy mode had also been affected.

As there were many situations where one would scan a document and destroy the original once the offsite backup had run, while physical copies would/should often not entail destroying the original, most of the overall damage/impact would have come from scanning anyway, not copying.

sjsdaiuasgdia 6 days ago | parent | prev | next [-]

It's definitely a possibility if there's a point where LLM-based OCR is applied.

See https://www.runpulse.com/blog/why-llms-suck-at-ocr and its related HN discussion https://news.ycombinator.com/item?id=42966958

thesuitonym 6 days ago | parent [-]

Like almost everything LLMs do, you don't need an LLM to make these mistakes.

sjsdaiuasgdia 6 days ago | parent [-]

LLM-based OCR and speech transcription do come with a failure condition that is different from what you see in pre-LLM solutions. When the source data is hard to understand, LLMs try to fill the gap with something that makes sense given the surrounding context.

Pre-LLM approaches handle unintelligible source data differently. You'll more commonly see nonsense output for the unintelligible bits. In some cases the tool might be capable of recognizing low confidence and returning an error or other indicator of a possible miss.
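
For example, the classic (non-LLM) Tesseract engine exposes a per-word confidence score, so a pipeline can flag the unintelligible bits for review instead of inventing text. A minimal sketch, assuming pytesseract and Pillow are installed; "receipt.png" is a placeholder path and the 60 threshold is arbitrary:

    import pytesseract
    from PIL import Image

    # Per-word OCR results, including a confidence value for each word.
    data = pytesseract.image_to_data(Image.open("receipt.png"),
                                     output_type=pytesseract.Output.DICT)

    for word, conf in zip(data["text"], data["conf"]):
        conf = float(conf)
        if not word.strip() or conf < 0:   # skip empty/structural entries
            continue
        if conf < 60:
            print(f"LOW CONFIDENCE ({conf:.0f}): {word!r} -- needs human review")
        else:
            print(word)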

IMO, that's a feature. The LLM approach makes up something that looks right but may not actually match the source data. These errors are far harder to detect and more likely to make it past human review.

The LLM approach does mean that you can often get a more "complete" output from a low quality data source vs pre-LLM approaches. And sometimes it might even be correct! But it will get it wrong other times.

Another failure condition I've experienced with LLM-based voice transcription that I didn't have pre-LLM: running down the wrong fork in the road. Sometimes the LLM approaches will get a word or two wrong (words with similar phonetics or multiple meanings, that kind of thing). They may then continue down the path this mistaken context has created, outputting additional words that do not align with the source data at all.

coredog64 6 days ago | parent | prev [-]

Having uploaded my share of receipts to Concur, there are two checks and balances: if you still have the original, you can correct the OCR'd value, and Concur will recognize both line items and totals and whine if they don't match.
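
Something along these lines, purely as a sketch of that kind of consistency check (not Concur's actual logic; the amounts and tolerance are made up):

    # Hypothetical reconciliation check: OCR'd line items should sum to the
    # OCR'd total within a rounding tolerance, otherwise flag for manual review.
    from decimal import Decimal

    def reconcile(line_items, ocr_total, tolerance=Decimal("0.01")):
        """Return True if the line items are consistent with the stated total."""
        return abs(sum(line_items, Decimal("0")) - ocr_total) <= tolerance

    items = [Decimal("12.99"), Decimal("4.50"), Decimal("1.38")]   # sums to 18.87
    if not reconcile(items, Decimal("18.78")):                     # plausible OCR digit swap
        print("Line items don't match the total -- check against the original receipt")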

Karrot_Kream 6 days ago | parent | prev [-]

> We enjoy the imperfections of cameras because they let us create art

For something as widespread as photography I'm not sure you can define a "we". Even pro photographers often have a hard time relating to each other's workflows because they're so different based on what they're shooting.

The folks taking pictures of paintings for preservation are going to be lighting, exposing, and editing very differently from the folks shooting weddings, who will be shooting differently from the folks doing architecture or real estate shots. If you've ever studied under a photographer or studied in school, you'll learn this pretty quickly.

There's a point to be made here that an iPhone is more opinionated than a camera, but in my experience most pro photographers edit their shots, even if it's just bulk application of exposure correction and an appropriate color profile. In that way a smartphone shot may have the composition of the shooter but not the color and processing choices the shooter might want. But one can argue that fixed-lens compacts shooting JPG are often similarly opinionated. The difference of opinion is one of degree, not absolutes.

As an aside, this appeal to a collective form of absolute values in photography bothers me. It seems to me to be a way to frame the conversation emotionally, to create an "us vs them" dynamic. But the reality of professional photography is that there are very few absolute values in photography except the physical existence of the exposure triangle.

There's no such thing as an "accurate photograph". I don't think we can even agree on whether two humans perceive the same picture the same way.

I do think the average person today should learn the basics of photography in school, simply because of how much our daily lives are influenced by images and the visual language of others. I'd love to see additions to civics and social sciences classes that discuss the effects of focal length, saturation, and depth of field on composition. But I don't think that yearning for an "accurate photo" is the way.