Remix clone Hacker News

	jsight 10 days ago
		I'd guess that it wouldn't be a huge effort to fine-tune them to produce bounding boxes. I haven't done it with OCR tasks, but I have fine-tuned other models to output bounding boxes rather than just descriptive text. I'm not sure whether datasets for this already exist, but creating one shouldn't be very difficult.
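As a rough illustration of what such a dataset might look like, here is a minimal sketch. It assumes you already have word-level OCR annotations (text plus pixel coordinates) and that the model is trained to emit each word followed by its box, normalized to a 0 to 1000 grid. The `<box>` tag syntax, the prompt wording, the record fields, and the `WordBox`/`to_target`/`make_example` names are all hypothetical. The exact target format depends on the model you fine-tune.

```python
import json
from dataclasses import dataclass


@dataclass
class WordBox:
    """One OCR annotation: a word and its pixel box (x0, y0) top-left to (x1, y1) bottom-right."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def to_target(words, img_w, img_h, grid=1000):
    """Serialize word boxes into a text target.

    Coordinates are normalized to integers in [0, grid] so the target does not
    depend on image resolution. The "<box>x0,y0,x1,y1</box>" tag format is an
    assumption for illustration, not a standard.
    """
    lines = []
    for w in words:
        coords = [
            round(w.x0 / img_w * grid),
            round(w.y0 / img_h * grid),
            round(w.x1 / img_w * grid),
            round(w.y1 / img_h * grid),
        ]
        # Clamp in case an annotation slightly exceeds the image bounds.
        coords = [min(max(c, 0), grid) for c in coords]
        lines.append(f"{w.text} <box>{','.join(str(c) for c in coords)}</box>")
    return "\n".join(lines)


def make_example(image_path, words, img_w, img_h):
    """Build one hypothetical fine-tuning record: image, instruction, expected output."""
    return {
        "image": image_path,
        "prompt": "Transcribe the text and give a bounding box for each word.",
        "target": to_target(words, img_w, img_h),
    }


if __name__ == "__main__":
    words = [WordBox("Hello", 10, 20, 110, 60), WordBox("world", 120, 20, 230, 60)]
    record = make_example("page_001.png", words, img_w=1000, img_h=500)
    # One JSON object per line (JSONL) is a common layout for fine-tuning data.
    print(json.dumps(record))
```

Normalizing to a fixed grid keeps targets short and resolution-independent. If your existing OCR output already has word boxes, generating records like this is mostly a conversion script, which is part of why building such a dataset may not be much work.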