amelius 2 days ago

But have a look at the "Thresholding" section. It appears to me that AI would be much better at this operation.

vincenthwt 2 days ago

It really depends on the application. If the illumination is consistent, such as in many machine vision tasks, traditional thresholding is often the better choice. It’s straightforward, debuggable, and produces consistent, predictable results. On the other hand, in more complex and unpredictable scenes with variable lighting, textures, or object sizes, AI-based thresholding can perform better.

That said, I still prefer traditional thresholding in controlled environments because the algorithm is understandable and transparent.

Debugging issues in AI systems can be challenging due to their "black box" nature. If the AI fails, you might need to analyze the model, adjust training data, or retrain, a process that is neither simple nor guaranteed to succeed. Traditional methods, however, allow for more direct tuning and certainty in their behavior. For consistent, explainable results in controlled settings, they are often the better option.
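A minimal sketch of what that "direct tuning" looks like, in Python with NumPy. It assumes an 8-bit grayscale image held as a 2-D array, with a synthetic scene standing in for a controlled-lighting setup. `global_threshold` and `sweep` are hypothetical helper names, not from any library. The point is that the single parameter is inspectable and its effect can be measured directly.

```python
import numpy as np

def global_threshold(img: np.ndarray, t: int) -> np.ndarray:
    """Boolean mask of pixels strictly brighter than the single threshold t."""
    return img > t

def sweep(img: np.ndarray, thresholds) -> dict:
    """Foreground fraction for each candidate t: a direct view of what the knob does."""
    return {t: float(global_threshold(img, t).mean()) for t in thresholds}

# Synthetic controlled-lighting scene: dark background, one bright 40x40 part.
img = np.full((100, 100), 40, dtype=np.uint8)
img[30:70, 30:70] = 200

print(sweep(img, [20, 100, 220]))  # {20: 1.0, 100: 0.16, 220: 0.0}
```

Any t from 40 to 199 isolates the part exactly (16% of the pixels). If a result looks wrong, the sweep shows immediately which way to move t, with no retraining involved.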

shash 2 days ago

Not to mention performance. So often, the traditional method is the only thing that can keep up with performance requirements without needing massive hardware upgrades.

Counterintuitively, I’ve often found that CNNs are worse at thresholding in many circumstances than a simple Otsu or adaptive threshold. My usual technique is to use the least complex algorithm and work my way up the ladder only when needed.
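For reference, here is roughly what those two classical baselines compute, as a NumPy-only sketch (in practice you would likely use a library implementation such as OpenCV's or scikit-image's). `otsu_threshold` picks the gray level that maximizes the between-class variance of the histogram. `adaptive_dark_mask` compares each pixel with the mean of its own `block` x `block` neighborhood. The function names, the `block`/`c` values and the synthetic images are assumptions for illustration.

```python
import numpy as np

def otsu_threshold(img: np.ndarray) -> int:
    """Otsu's method on a uint8 image: class 0 is pixels <= t, class 1 is pixels > t."""
    hist = np.bincount(img.ravel(), minlength=256)
    w0 = np.cumsum(hist)                    # pixel count in class 0 for each t
    w1 = img.size - w0                      # pixel count in class 1
    s0 = np.cumsum(hist * np.arange(256))   # intensity sum of class 0
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(256)
    mu0 = s0[valid] / w0[valid]
    mu1 = (s0[-1] - s0[valid]) / w1[valid]
    # Between-class variance, up to a constant factor of 1 / N^2.
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return int(np.argmax(between))

def adaptive_dark_mask(img: np.ndarray, block: int = 15, c: float = 10.0) -> np.ndarray:
    """True where a pixel is darker than its block x block local mean by more than c."""
    assert block % 2 == 1, "block must be odd"
    pad = block // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    # Integral image with a leading zero row/column: O(1) window sums per pixel.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    window_sum = (ii[block:block + h, block:block + w] - ii[:h, block:block + w]
                  - ii[block:block + h, :w] + ii[:h, :w])
    local_mean = window_sum / (block * block)
    return img < local_mean - c

# Bimodal scene for Otsu: any t in [40, 199] separates it; argmax picks the first.
square = np.full((100, 100), 40, dtype=np.uint8)
square[30:70, 30:70] = 200
t = otsu_threshold(square)
print(t, int((square > t).sum()))  # 40 1600

# Uneven illumination for the adaptive case: a left-to-right ramp with a dark stroke.
page = np.tile(100 + np.arange(120), (100, 1))
stroke = np.zeros(page.shape, dtype=bool)
stroke[20:80, 50:53] = True
page[stroke] -= 60
page = page.astype(np.uint8)
ink = adaptive_dark_mask(page)
print(bool(np.array_equal(ink, stroke)))  # True
```

On the ramp, the local mean follows the illumination, so only the stroke is flagged. The integral image keeps the adaptive version at a constant amount of work per pixel regardless of `block`.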

MassPikeMike a day ago

I am usually working with historical documents, where both Otsu and adaptive thresholding are frustratingly almost but not quite good enough. My go-to approach lately is "DeepOtsu" [1]. I like that it combines the best of both the traditional and deep learning worlds: a deep neural net enhances the image such that Otsu thresholding is likely to work well.
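The pipeline shape described here (enhance first, then apply plain Otsu) can be sketched as below. This is not the DeepOtsu network. The `enhance` argument is a pluggable callable, and the stand-in `flatten_background` is a classical background-division step, used only to show why a better-conditioned image lets global Otsu succeed. All names and the synthetic page are hypothetical.

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method on a uint8 image; class 0 is pixels <= t."""
    hist = np.bincount(img.ravel(), minlength=256)
    w0 = np.cumsum(hist)
    w1 = img.size - w0
    s0 = np.cumsum(hist * np.arange(256))
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(256)
    mu0 = s0[valid] / w0[valid]
    mu1 = (s0[-1] - s0[valid]) / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return int(np.argmax(between))

def box_mean(img, block):
    """Mean over a block x block window (edge-padded), via an integral image."""
    pad = block // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    s = (ii[block:block + h, block:block + w] - ii[:h, block:block + w]
         - ii[block:block + h, :w] + ii[:h, :w])
    return s / (block * block)

def flatten_background(img, block=31):
    """Hypothetical stand-in enhancer (NOT a neural net): divide out a local-mean
    background estimate so paper maps near 128 wherever it sits on the ramp."""
    ratio = img / np.maximum(box_mean(img, block), 1.0)
    return np.clip(ratio * 128.0, 0, 255).astype(np.uint8)

def enhance_then_otsu(img, enhance):
    """Enhance, then plain global Otsu; dark (ink) pixels come back True."""
    enhanced = enhance(img)
    return enhanced <= otsu_threshold(enhanced)

# Synthetic page: illumination ramp left to right, one dark stroke.
page = np.tile(100 + np.arange(120), (100, 1))
stroke = np.zeros(page.shape, dtype=bool)
stroke[20:80, 50:53] = True
page[stroke] -= 60
page = page.astype(np.uint8)

raw_mask = enhance_then_otsu(page, lambda x: x)              # Otsu alone
enhanced_mask = enhance_then_otsu(page, flatten_background)  # enhance, then Otsu
print(int(raw_mask.sum()), int(enhanced_mask.sum()))
```

Global Otsu on the raw page splits the illumination ramp roughly in half and flags thousands of paper pixels. After flattening, the same Otsu call recovers just the 180 stroke pixels. A learned enhancer would slot into the same `enhance` parameter.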

[1] https://arxiv.org/abs/1901.06081

shash a day ago

OK. Those are impressive results. Nice addition to the toolbox.

hansvm a day ago

Something I've had a lot of success with (in cases where you're automating the same task with the same lighting) is having a human operator manually choose a variety of in-sample and out-of-sample regions, ideally with some of those being near real boundaries. Then train a (very simple -- details matter, but not a ton) local model to operate on small image patches and output probabilities for each pixel.

One fun thing is that, with a simple model, it's not much slower than techniques like Otsu (you're still doing a roughly constant amount of fast, vectorized math for each pixel), but you get an alpha channel for free, even when working in color spaces, which lets you segment the background out of an image almost perfectly.
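A minimal sketch of such a per-pixel patch model, filling in details the comment leaves open with assumptions: logistic regression on raw 3x3 RGB neighborhoods, trained by plain gradient descent on a handful of operator-clicked pixels. The helper names, click coordinates and synthetic image are hypothetical. The predicted probability map doubles as an alpha channel.

```python
import numpy as np

def patch_features(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Flatten each pixel's k x k neighborhood (all channels, scaled to [0, 1])."""
    h, w, ch = img.shape
    p = k // 2
    padded = np.pad(img.astype(np.float64) / 255.0, ((p, p), (p, p), (0, 0)), mode="edge")
    shifted = [padded[dy:dy + h, dx:dx + w, :] for dy in range(k) for dx in range(k)]
    return np.concatenate(shifted, axis=2).reshape(h * w, k * k * ch)

def add_bias(X: np.ndarray) -> np.ndarray:
    return np.hstack([X, np.ones((X.shape[0], 1))])

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, weights=None, lr=0.5, steps=1000):
    """Batch gradient descent on the logistic loss; `weights` allows a warm start."""
    Xb = add_bias(X)
    wts = np.zeros(Xb.shape[1]) if weights is None else weights.copy()
    for _ in range(steps):
        wts -= lr * Xb.T @ (sigmoid(Xb @ wts) - y) / len(y)
    return wts

def predict_alpha(img, wts, k=3):
    """Per-pixel foreground probability, reshaped to the image grid."""
    return sigmoid(add_bias(patch_features(img, k)) @ wts).reshape(img.shape[:2])

# Synthetic color scene: red object on a green background.
img = np.zeros((60, 60, 3), dtype=np.uint8)
img[:, :] = (30, 160, 60)
img[20:40, 20:40] = (200, 40, 40)

# Hypothetical operator clicks as (row, col).
fg_clicks = [(25, 25), (30, 30), (35, 35)]
bg_clicks = [(5, 5), (50, 50), (10, 50)]

X_all = patch_features(img)
idx = [r * img.shape[1] + c for r, c in fg_clicks + bg_clicks]
y = np.array([1.0] * len(fg_clicks) + [0.0] * len(bg_clicks))

wts = fit_logistic(X_all[idx], y)
alpha = predict_alpha(img, wts)
rgba = np.dstack([img, (alpha * 255).astype(np.uint8)])  # RGBA with learned alpha
```

Each operator click just appends one row to the training set. Because `fit_logistic` accepts starting `weights`, a model fit on one image can seed the next, which is one simple way to carry learned information across a batch.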

The UX is also dead-simple. If a human operator doesn't like the results, they just click around the image to refine the segmentation. They can then apply directly to a batch of images, or if each image might need some refinement then there are straightforward solutions for allowing most of the learned information to transfer from one image to the next, requiring much less operator input for the rest of the batch.

As an added plus, it also works well even for gridlines and other stranger backgrounds, still without needing any fancy algorithms.

Greamy a day ago

It can benefit from more complex algorithms, but I would stay away from "AI" as much as possible unless there is a real need for it. You can analyse your data and derive dynamic thresholds, you can build small ML models, even tiny DL models, and I would try the options in that order. Some cases do need more complex techniques, but more often than not you can solve most of your problems by preprocessing your data. I've seen too many solutions where a tiny algorithm could do exactly what a junior engineer implemented with a giant model that takes forever to run.
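One reading of "dynamic thresholds" is a threshold computed from each image's own statistics instead of a fixed constant. The sketch below uses the median plus a multiple of the median absolute deviation (MAD), a robust statistic chosen here for illustration. `robust_threshold`, `k=5` and the synthetic scenes are assumptions, not something the comment specifies.

```python
import numpy as np

def robust_threshold(img: np.ndarray, k: float = 5.0) -> float:
    """Per-image threshold: median + k * scaled MAD. The factor 1.4826 makes the
    MAD comparable to a standard deviation for Gaussian noise."""
    med = np.median(img)
    mad = np.median(np.abs(img - med))
    return med + k * 1.4826 * mad

# Checkerboard "sensor noise" around 50 plus ten bright defects.
rows, cols = np.indices((100, 100))
dark_scene = np.where((rows + cols) % 2 == 0, 48, 52)
dark_scene[0, 1:20:2] = 200
# Same scene after the lighting got brighter.
bright_scene = np.clip(dark_scene + 60, 0, 255)

for scene in (dark_scene, bright_scene):
    t = robust_threshold(scene)
    print(round(t, 2), int((scene > t).sum()))
# 64.83 10
# 124.83 10
```

The threshold tracks the exposure change, so both scenes yield the same ten defects, whereas a fixed threshold tuned on the dark scene (about 65) would flag every pixel of the bright one.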

Legend2440 2 days ago

It indeed would be much better. There’s a reason the old CV methods aren’t used much anymore.

If you want to do anything even moderately complex, deep learning is the only game in town.

shash 2 days ago

I’ve found exactly the opposite. In domain after domain, a pure deep learning method performs orders of magnitude worse than either a traditional algorithm or a combination of the two.

And often the CNNs are so finicky about noise or distortion that you need something as an input stage to clean up the data.
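As an example of such an input stage, a 3x3 median filter removes isolated salt-and-pepper pixels before anything downstream sees them. This NumPy sketch applies it ahead of a simple threshold. `median3x3`, the threshold of 100 and the synthetic noisy image are assumptions for illustration.

```python
import numpy as np

def median3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter with edge padding, built from nine shifted views."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    stack = np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0)

noisy = np.full((100, 100), 40, dtype=np.uint8)
noisy[30:70, 30:70] = 200
for r, c in [(5, 5), (10, 80), (90, 15), (85, 85), (50, 5)]:
    noisy[r, c] = 255  # isolated "salt" pixels

raw = noisy > 100
cleaned = median3x3(noisy) > 100
print(int(raw.sum()), int(cleaned.sum()))  # 1605 1596
```

The filter is not free: each corner pixel of the square has only 4 bright pixels in its 3x3 window, so the four corners are clipped along with the noise.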

spookie a day ago

There are also many other classical thresholding algos. Don't worry about it :)

do_not_redeem 2 days ago

sure, if you don't mind it hallucinating different numbers into your image

Legend2440 2 days ago

Right, but the non-deep-learning OCR methods also do that. And they have much, much lower overall accuracy.

There’s a reason deep learning took over computer vision.

vincenthwt 2 days ago

You're absolutely right: deep learning OCR often delivers better results for complex tasks like handwriting or noisy text. It uses advanced models like CNNs or CRNNs to learn patterns from large datasets, making it highly versatile in challenging scenarios.

However, if I can’t understand the system, how can I debug it if there are any issues? Part of an engineer's job is to understand the system they’re working with, and deep learning models often act as a "black box," which makes this difficult.

Debugging issues in these systems can be a major challenge. It often requires specialized tools like saliency maps or attention visualizations, analyzing training data for problems, and sometimes retraining the entire model. This process is not only time-consuming but also may not guarantee clear answers.

Legend2440 2 days ago

No matter how much you tinker and debug, classical methods can’t match the accuracy of deep learning. They are brittle and require extensive hand-tuning.

What good is being able to understand a system if this understanding doesn’t improve performance anyway?

vincenthwt 2 days ago

I agree: deep learning OCR often outperforms traditional methods.

But as engineers, it’s essential to understand and maintain the systems we build. If everything is a black box, how can we control it? Without understanding, we risk becoming dependent on systems we can’t troubleshoot or improve. Don’t you think it’s important for engineers to maintain control and not rely entirely on something they don’t fully understand?

That said, there are scenarios where using a black-box system is justifiable, such as in non-critical applications where performance outweighs the need for complete control. However, for critical applications, black-box systems may not be suitable due to the risks involved. Ultimately, what is "responsible" depends on the potential consequences of a system failure.

throwway120385 a day ago

This is a classic trade-off and the decision should be made based on the business and technical context that the solution exists within.

shash 2 days ago

OCR is one of those places where you can just skip algorithm discovery and go straight to deep learning. But there are precious few of those kinds of places actually.

do_not_redeem 2 days ago

GP is talking about thresholding and thresholding is used in more than just OCR. Thresholding algorithms do not hallucinate numbers.