Remix clone Hacker News

	starchild3001 2 days ago
		Great talk. Dr. Li has a way of cutting through the hype and getting to the fundamental challenges that is really refreshing. Her point about spatial intelligence being the next frontier after language really resonates.

		I'm particularly hung up on the data problem she touched on (41 min). She rightly points out that unlike language, where we could bootstrap LLMs with the vast, pre-existing corpus of the internet, there's no equivalent "internet of 3D space." She mentions a "hybrid approach" for World Labs, and that's where the real engineering challenge seems to lie.

		My mind immediately goes to the trade-offs. If you lean heavily on synthetic data, you're in a constant battle with the "sim-to-real" gap. It works for narrow domains, but for a general "world model," the physics, lighting, and material properties have to be perfect, which is a monumental task. If you lean on real-world capture (e.g., massive-scale photogrammetry, NeRFs, etc.), the MLOps and data pipeline challenges seem staggering. We're not just talking text files; we're talking about petabytes of structured, multi-sensor data that needs to be processed, aligned, and labeled. It feels like an entirely new class of data infrastructure problem.

		Her hiring philosophy of "intellectual fearlessness" (31 min) makes a lot of sense in this context. You'd need a team that's not intimidated by the fact that the foundational dataset for their entire field doesn't even exist yet. They have to build the oil refinery while also figuring out where to drill for oil.

		It's exciting to see a team with this much deep learning and computer vision firepower aimed at such a foundational problem. It pulls the conversation away from just optimizing existing architectures and towards creating entirely new categories.

		It leaves me wondering: what does the "AlexNet moment" for spatial intelligence even look like? Is it a novel model architecture, or is the true breakthrough a new form of data representation that makes this problem tractable at scale?