I would recommend investigating how contemporary LLMs actually work.
A possible starting point: https://transformer-circuits.pub/2025/attribution-graphs/bio...