tehjoker a day ago
Hmm, you note that the problem is that the LLM doesn't have enough image context, but then you zoom the image in further? Why not downscale the image and feed it in as a second input, so that entire planets fit into a single patch, and instruct the model to use the downsampled image for coarse coordinate estimation?
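To make the suggestion concrete, here is a rough sketch of the coordinate bookkeeping it would need. The model call itself is left out, and the names `downscale_factor`, `coarse_to_fine`, and `Box`, the 1024-pixel side limit, and the 512-pixel crop are all made up for illustration, not taken from the article. The idea is to pick a downscale factor, let the model give a coarse pixel estimate on the small image, then map that estimate back to a full-resolution crop for refinement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Crop window in full-resolution pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def downscale_factor(full_w: int, full_h: int, max_side: int) -> int:
    """Smallest integer factor that makes the longer side fit within max_side."""
    longest = max(full_w, full_h)
    return max(1, -(-longest // max_side))  # ceiling division


def coarse_to_fine(cx: int, cy: int, factor: int,
                   full_w: int, full_h: int, crop: int = 512) -> Box:
    """Map a coarse estimate on the downsampled image to a full-res crop window."""
    # Centre of the coarse pixel, expressed in full-resolution coordinates.
    fx = int((cx + 0.5) * factor)
    fy = int((cy + 0.5) * factor)
    half = crop // 2
    # Clamp so the window stays inside the image.
    left = min(max(fx - half, 0), max(full_w - crop, 0))
    top = min(max(fy - half, 0), max(full_h - crop, 0))
    return Box(left, top, min(left + crop, full_w), min(top + crop, full_h))
```

Usage would be something like `factor = downscale_factor(8000, 6000, 1024)`, then `coarse_to_fine(100, 50, factor, 8000, 6000)` once the model has returned `(100, 50)` on the small image. The resulting crop can then be fed in at full resolution for the fine estimate.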