Barrin92 12 hours ago

I don't really understand the logic here. All the actual signal about what artillery in bushes looks like is already in the original training data. Synthetic data cannot conjure empirical evidence into existence; it's as likely to produce false images as real ones. Assuming the military has more privileged access to combat footage than a general-purpose public chatbot, I'd expect synthetic data to degrade the accuracy of a drone.

stormfather an hour ago

What you're saying just isn't true.

I can get an AI to generate an image of a bear wearing a sombrero. There are no images of this in its training data, but there are images of bears, images of sombreros, and images of other things wearing sombreros. It can combine the distributions in a plausible way.

If I am trying to train a small model to fit into the optical sensor of a warhead to target bears wearing sombreros, this synthetic training set would be very useful.

Same thing with artillery in bushes. Or artillery in different lighting conditions. This stuff is useful to saturate the input space with synthetic examples.
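To make that concrete, here's a minimal sketch of building such a synthetic set by prompting an off-the-shelf text-to-image model through the diffusers library. The model name, prompts, sample counts, and paths are illustrative assumptions, not anything from this thread:

    import os
    import torch
    from diffusers import StableDiffusionPipeline

    # Off-the-shelf text-to-image model (illustrative choice).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Prompts chosen to cover conditions the downstream detector must handle.
    prompts = [
        "a photo of a bear wearing a sombrero",
        "a bear wearing a sombrero in a forest, overcast lighting",
        "a bear wearing a sombrero at night, grainy surveillance footage",
    ]

    os.makedirs("synthetic", exist_ok=True)
    for i, prompt in enumerate(prompts):
        for j in range(10):  # several samples per prompt to cover more of the distribution
            pipe(prompt).images[0].save(f"synthetic/bear_sombrero_{i}_{j}.png")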

IanCal 3 hours ago

I'm not arguing this is the purpose here, but data augmentation has been done for ages. It just kind of sucks a lot of the time.

You take your images and crop, shift, and otherwise transform them so that your model doesn't learn "all x are in the middle of the image". For text you might automatically replace days of the week with other days; there's a lot of work in that area.

Broadly, the intent is to keep the key information while generating realistic but irrelevant noise, so that you train a model that correctly ignores the noise.
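For concreteness, a minimal version of that kind of classic augmentation pipeline, sketched with torchvision (the parameter values are illustrative):

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),   # vary framing: not everything centered
        transforms.RandomHorizontalFlip(),                     # mirroring is label-preserving for most objects
        transforms.ColorJitter(brightness=0.3, contrast=0.3),  # vary lighting conditions
        transforms.ToTensor(),
    ])

    # Applied per epoch, so every pass over the data sees fresh noise:
    # x = augment(pil_image)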

You don't want a model that identifies some class of ship to base its decision on how choppy the water is, just because that simple signal happened to correlate well. There was a case of a radiology model that detected cancer well but was actually detecting rulers in the images: images with tumors often included a ruler so the tumor could be sized. (I think it was cancer; the broader point applies if it was something else.)

johndough 11 hours ago

Generative models can combine different concepts from the training data. For example, the training data might contain a single image of a new missile launcher at a military parade. The model can then generate an image of that missile launcher hiding in a bush, because it has internalized the general concept of things hiding in bushes, so it can apply it to new objects it has never seen hiding in bushes.
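One hedged sketch of how that concept combination might be exercised in practice: feed the single parade photo to an image-to-image diffusion pipeline and ask for the same object in a new context. The model name, prompt, strength value, and file paths here are assumptions for illustration, not a description of any real system:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    source = Image.open("parade_photo.png").convert("RGB")  # the one known image
    out = pipe(
        prompt="the same military vehicle partially hidden in dense bushes",
        image=source,
        strength=0.6,  # how far the result is allowed to drift from the source
    ).images[0]
    out.save("launcher_in_bushes.png")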

rovr138 9 hours ago

If you're building a system to detect something, you usually need enough variation in the training data: you add noise to the images, shift them, and so on.

With this, you could create a dataset that has that variation built in. You should still corroborate the data, but it puts you a step ahead without having to take 1,000 photos and add enough noise and variations to get to 30,000.
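A rough sketch of that 1,000-to-30,000 expansion, reusing the same sort of augmentation pipeline mentioned above (the paths and the factor of 30 come from the comment's example numbers; the transforms are illustrative):

    import glob
    import os
    from PIL import Image
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.3, contrast=0.3),
    ])

    os.makedirs("augmented", exist_ok=True)
    for path in glob.glob("photos/*.png"):  # ~1,000 originals
        img = Image.open(path).convert("RGB")
        stem = os.path.splitext(os.path.basename(path))[0]
        for k in range(30):  # 30 variants each -> ~30,000 images
            augment(img).save(f"augmented/{stem}_{k}.png")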