| ▲ | wesammikhail 6 hours ago | |
Because in my mind, as a person not working directly on this kind of stuff, I figured that caching was done similar to any resource caching in a webserver environment. It´s a semantics issue where the word caching is overloaded depending on context. For people that are not familiar with the inner workings of llm models, this can cause understandable confusion. | ||