▲ | Jackson__ 2 days ago | |
This appears to be a classic vision fail on the VLM's part, which is entirely unsurprising to anyone who has used open VLMs for anything except "benchmarks" in the past two god damn years. The field is in a truly embarrassing state: it prides itself on how these models can solve equations off a blackboard, yet they can't even accurately read a d20 dice roll, among many other things. I've tried (and failed) to have VLMs accurately caption images for such a long time, yet any time I check the output it is blindingly clear that these models are awful at actually _seeing things_. | ||
▲ | Ianjit 2 days ago | parent [-] | |
5-10 years and radiologists will be out of a job, just you wait and see. |