Remix clone Hacker News

	▲	manquer 3 hours ago
		> game is to turn knobs until you get a benchmark run that shows an improvement, then ship it

		i.e. reinforcement learning against a weak reward function: the benchmark is insufficiently complex and not sufficiently representative of the real world. The "game", i.e. the decision tree, can be modeled as a multi-armed bandit problem: how to deploy finite resources (compute) between exploitation and exploration. The main issue is that each training or fine-tuning run is very expensive, so the number of pulls at the slot machine, so to speak, is pretty limited today.
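
To make the bandit framing concrete, here is a minimal sketch, not anyone's actual tuning setup. It assumes an epsilon-greedy policy (one of several standard bandit strategies, chosen here only for brevity), and the names `epsilon_greedy`, `reward_fns`, and `budget` as well as the score values are hypothetical. Each "arm" stands in for one knob setting, each pull for one expensive training run, and `budget` for the limited number of runs you can afford.

```python
import random

def epsilon_greedy(reward_fns, budget, epsilon=0.1, seed=0):
    """Spend `budget` pulls across arms; return (pull counts, mean reward per arm)."""
    rng = random.Random(seed)
    n = len(reward_fns)
    counts = [0] * n
    means = [0.0] * n
    for t in range(budget):
        if t < n:
            arm = t  # try every arm once before trusting any estimate
        elif rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick a random arm
        else:
            arm = max(range(n), key=lambda i: means[i])  # exploit: best estimate so far
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts, means

# Hypothetical knob settings: each returns a noisy benchmark score (made-up values).
arms = [lambda rng, mu=mu: rng.gauss(mu, 0.05) for mu in (0.60, 0.62, 0.70)]
counts, means = epsilon_greedy(arms, budget=12, epsilon=0.2, seed=0)
print(counts, [round(m, 3) for m in means])
```

With a budget this small, three of the twelve pulls go to the initial round alone, so the estimates stay noisy and the policy can easily lock onto the wrong arm, which is the point about having very few chances at the slot.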