zh2408 6 days ago
The Linux repository has ~50M tokens, far beyond the 1M-token limit of Gemini 2.5 Pro. I think there are two paths forward: (1) decompose the repository into smaller parts (e.g., kernel, shell, file system), or (2) wait for larger-context models with a 50M+ input limit.
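A rough way to see how far path (1) has to go is to estimate tokens per top-level directory and flag the ones that still exceed the window. This is only a sketch: the function names are made up for illustration, and the 4-bytes-per-token ratio is an assumed rule of thumb, not the tokenizer Gemini actually uses.

```python
import os

def estimate_dir_tokens(root, bytes_per_token=4):
    """Estimate tokens for each top-level directory under root.

    bytes_per_token=4 is an assumed heuristic, not a real tokenizer.
    """
    totals = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if not os.path.isdir(path):
            continue  # only top-level directories are counted
        size = 0
        for dirpath, _dirs, files in os.walk(path):
            for name in files:
                file_path = os.path.join(dirpath, name)
                if os.path.islink(file_path):
                    continue  # skip symlinks, which may be broken
                size += os.path.getsize(file_path)
        totals[entry] = size // bytes_per_token
    return totals

def oversized(totals, limit=1_000_000):
    """Return the directories whose estimate still exceeds the limit."""
    return sorted(name for name, tokens in totals.items() if tokens > limit)
```

Calling `oversized(estimate_dir_tokens("path/to/linux"))` would list the directories that need further splitting. With ~50M tokens in total and a 1M window, at least 50 parts are needed no matter how the split is done.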
achierius 6 days ago
A huge percentage of that is just drivers. The core kernel is likely what would interest someone here; moreover, much of it is architecture-specific. IIRC the x86 kernel is <1M lines, though probably not <1M tokens.
rtolsma 6 days ago
For some languages, you can use the AST to identify smaller modular components that fit into the 1M window.
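As a minimal sketch of the AST idea, here is what it could look like for Python source using the standard `ast` module. The kernel is C, so a real attempt would need a C parser instead. The helper names and the 4-characters-per-token estimate are assumptions for illustration only.

```python
import ast

def top_level_units(source):
    """Return (name, source_text) for each top-level function or class."""
    tree = ast.parse(source)
    units = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the node's text (decorators excluded)
            units.append((node.name, ast.get_source_segment(source, node)))
    return units

def estimate_tokens(text, chars_per_token=4):
    """Crude token estimate; the ratio is an assumed heuristic."""
    return max(1, len(text) // chars_per_token)

def pack_units(units, budget):
    """Greedily group consecutive units so each group stays within budget.

    A single unit larger than the budget still gets a group of its own.
    """
    chunks, current, used = [], [], 0
    for name, text in units:
        cost = estimate_tokens(text)
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Each group returned by `pack_units(top_level_units(source), budget=1_000_000)` could then be sent to the model separately. Greedy packing keeps neighbouring definitions together, which is a simple design choice rather than an optimal one, since it ignores call relationships between components.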
ryao 5 days ago
The first path would be the most interesting, especially if it can be automated. |