Remix.run Logo
do_not_redeem 2 days ago

sure, if you don't mind it hallucinating different numbers into your image

Legend2440 2 days ago | parent [-]

Right, but the non-deep learning OCR methods also do that. And they have a much much lower overall accuracy.

There’s a reason deep learning took over computer vision.

vincenthwt 2 days ago | parent | next [-]

You're absolutely right, deep learning OCR often delivers better results for complex tasks like handwriting or noisy text. It uses advanced models like CNNs or CRNNs to learn patterns from large datasets, making it highly versatile in challenging scenarios.

However, if I can’t understand the system, how can I debug it if there are any issues? Part of an engineer's job is to understand the system they’re working with, and deep learning models often act as a "black box," which makes this difficult.

Debugging issues in these systems can be a major challenge. It often requires specialized tools like saliency maps or attention visualizations, analyzing training data for problems, and sometimes retraining the entire model. This process is not only time-consuming but also may not guarantee clear answers.

Legend2440 2 days ago | parent [-]

No matter how much you tinker and debug, classical methods can’t match the accuracy of deep learning. They are brittle and require extensive hand-tuning.

What good is being able to understand a system if this understanding doesn’t improve performance anyway?

vincenthwt 2 days ago | parent | next [-]

I agree, Deep Learning OCR often outperforms traditional methods.

But as engineers, it’s essential to understand and maintain the systems we build. If everything is a black box, how can we control it? Without understanding, we risk becoming dependent on systems we can’t troubleshoot or improve. Don’t you think it’s important for engineers to maintain control and not rely entirely on something they don’t fully understand?

That said, there are scenarios where using a black-box system is justifiable, such as in non-critical applications where performance outweighs the need for complete control. However, for critical applications, black-box systems may not be suitable due to the risks involved. Ultimately, what is "responsible" depends on the potential consequences of a system failure.

throwway120385 a day ago | parent | prev [-]

This is a classic trade-off and the decision should be made based on the business and technical context that the solution exists within.

shash 2 days ago | parent | prev | next [-]

OCR is one of those places where you can just skip algorithm discovery and go straight to deep learning. But there are precious few of those kinds of places actually.

do_not_redeem 2 days ago | parent | prev [-]

GP is talking about thresholding and thresholding is used in more than just OCR. Thresholding algorithms do not hallucinate numbers.