Symmetry augmentation sounds good for software.
Traditional computer vision (CV) research has perhaps been supplanted by multimodal LLMs trained on image-analysis annotations. (CLIP, DALL-E, and Latent Diffusion were published in 2021; diffusion models like Latent Diffusion build on Brownian-motion / SDE formulations. More recent research: Brownian bridges, SDEs, Lévy processes. What are the foundational papers in video genAI?)
TOPS (tera operations per second) of compute are now necessary.
I suspect that existing CV algos for feature extraction would also be useful for training LLMs. OpenCV, for example, has open algorithms like ORB (Oriented FAST and Rotated BRIEF), KAZE and AKAZE, and SIFT, which has been patent-free and in mainline OpenCV since 2020. SIFT "is highly robust to rotation, scale, and illumination changes".
But do existing CV feature extraction and transform algos produce useful training data for LLMs as-is?
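For concreteness, a minimal OpenCV sketch of the kind of feature extraction I mean, plus a speculative serialization of the keypoints to text (the "as-is" question above). The image path is a placeholder, and SIFT_create needs opencv-python >= 4.4:

    # Minimal sketch: ORB and SIFT keypoints/descriptors with OpenCV.
    # "scene.jpg" is a placeholder path.
    import cv2

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
    assert img is not None, "image not found"

    orb = cv2.ORB_create(nfeatures=500)
    orb_kp, orb_desc = orb.detectAndCompute(img, None)     # 32-byte binary descriptors

    sift = cv2.SIFT_create()
    sift_kp, sift_desc = sift.detectAndCompute(img, None)  # 128-dim float descriptors

    print(len(orb_kp), "ORB keypoints;", len(sift_kp), "SIFT keypoints")

    # Speculative: flatten keypoints into text lines, i.e. feature output "as-is"
    # as candidate LLM training data.
    for k in sift_kp[:5]:
        print(f"kp x={k.pt[0]:.1f} y={k.pt[1]:.1f} scale={k.size:.1f} angle={k.angle:.1f}")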
Similarly, pairing code and tests with a feature transform at training time probably yields better solutions to SWE-bench.
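One way to read that, as a hedged sketch: apply a behavior-preserving rename to a (code, test) pair and keep the transformed copy as an extra training example, so the "symmetry" is that the tests still pass. The snippets and rename map below are made up; a real pipeline would rewrite the AST rather than regex the source:

    # Hedged sketch: semantics-preserving augmentation of a code + test pair.
    import re

    code = "def add(a, b):\n    return a + b\n"
    test = "assert add(2, 3) == 5\n"

    def rename(src, mapping):
        # Whole-word textual rename; an AST-based rewrite would be safer in practice.
        for old, new in mapping.items():
            src = re.sub(rf"\b{old}\b", new, src)
        return src

    mapping = {"add": "plus", "a": "x", "b": "y"}
    aug_code, aug_test = rename(code, mapping), rename(test, mapping)

    exec(aug_code + aug_test)  # the renamed pair still passes its test
    print(aug_code + aug_test)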
Self-play algos are given the rules of the sim. Are self-play simulations already used as synthetic training data for LLMs and SLMs?
The sim's rules are, in effect, rules for generating synthetic training data.
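A toy version of that idea: the rules of tic-tac-toe plus random self-play emit (state, to-move, move, result) records that could be serialized as synthetic text for a small model. Purely illustrative; I'm not claiming this is how any lab actually builds such data:

    # Toy sketch: game rules + random self-play as a synthetic-data generator.
    import random

    WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(board):
        for a, b, c in WINS:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def self_play_game():
        board, player, trace = ["."] * 9, "X", []
        while winner(board) is None and "." in board:
            move = random.choice([i for i, cell in enumerate(board) if cell == "."])
            trace.append(("".join(board), player, move))
            board[move] = player
            player = "O" if player == "X" else "X"
        result = winner(board) or "draw"
        return [f"state={s} to_move={p} move={m} result={result}" for s, p, m in trace]

    for line in self_play_game():
        print(line)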
The orbits of the planets might be a good example of where synthetic training data is limited: given the cost of experimentation and the need for confirmations of scale invariance, perhaps we should rely upon real observations at different scales.
Extrapolations from orbital observations and classical mechanics failed to predict the perihelion precession of Mercury (the first confirmation of general relativity, GR).
Generating synthetic training data from orbital observations while disregarding Mercury's 43-arcsecond-per-century deviation from Newtonian mechanics as an outlier would yield a model overweighted toward the existing biases in the real observations.
Tests of general relativity > Perihelion precession of Mercury
https://en.wikipedia.org/wiki/Tests_of_general_relativity#Pe...
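As a sanity check on that 43-arcsecond figure, the standard GR per-orbit precession formula, delta_phi = 6*pi*G*M_sun / (c^2 * a * (1 - e^2)), reproduces it from published orbital constants for Mercury:

    # Mercury's anomalous perihelion precession from the GR formula above.
    import math

    GM_SUN = 1.32712440018e20   # m^3/s^2, Sun's standard gravitational parameter
    C      = 299_792_458.0      # m/s, speed of light
    A      = 5.7909e10          # m, Mercury's semi-major axis
    E      = 0.2056             # Mercury's orbital eccentricity
    PERIOD_DAYS = 87.969        # Mercury's orbital period

    per_orbit = 6 * math.pi * GM_SUN / (C**2 * A * (1 - E**2))   # radians per orbit
    orbits_per_century = 100 * 365.25 / PERIOD_DAYS
    arcsec_per_century = per_orbit * orbits_per_century * (180 / math.pi) * 3600
    print(f"{arcsec_per_century:.1f} arcsec/century")  # ~43, the part Newton misses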