almaight 4 days ago
Video games have long served as a crucial proving ground for artificial intelligence. Like the real world, they offer rich, dynamic, real-time environments and complex challenges that push the boundaries of AI capabilities. The history of AI in gaming is marked by landmark achievements, from mastering classic board games to achieving superhuman performance in complex strategy titles. However, the next frontier lies beyond mastering individual, known environments.

To meet this challenge, we introduce Game-TARS: a next-generation generalist game agent designed to master complex video games and interactive digital environments using human-like perception, reasoning, and action. Unlike traditional game bots or modular AI frameworks, Game-TARS integrates all core faculties (visual perception, strategic reasoning, action grounding, and long-term memory) within a single vision-language model (VLM). This unified approach enables end-to-end autonomous gameplay, allowing the agent to learn and succeed in any game without game-specific code, scripted behaviors, or manual rules.

Game-TARS is not about achieving the highest possible score in a single game. Instead, our focus is on building a robust foundation model for both generalist game-playing and broader computer use. We aim to create an agent that can learn to operate in any interactive digital environment it encounters, following instructions just like a human.