sasipi247 | 6 hours ago
I am working on a system built around the OpenAI Responses API in WebSocket mode, since performance is what interests me. It's a microservices-style architecture with NATS JetStream coordinating the services. I want to keep the worker core as lean as possible: just managing open sockets, threads, and continuations. Document querying is another interest of mine. The system lets me pin a document to a socket as a subagent, which can then be called upon. I have hit a lot of slip-ups along the way, such as infinite loops when calling the OpenAI API, etc.

Example usage: 10 documents on warm sockets with GPT 5.4 nano. The main thread can then call out to those sockets to query the documents in parallel. That opens up a lot of possibilities: cheaper models for cheaper tasks, input caching, and lower latency. There is also a frontend.

A lot of information is here (thoughts, designs, etc.): https://github.com/SamSam12121212/ExplorerPRO/tree/main/docs
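To give a feel for the fan-out pattern, here is a minimal sketch of the "pinned document per worker, parallel query from the main thread" idea. All names (`DocWorker`, `query_all`) are hypothetical and not from the repo, and the warm WebSocket plus model call are faked with a sleep so the example is self-contained:

```python
import asyncio

class DocWorker:
    """Hypothetical stand-in for a subagent holding one pinned document
    on a warm socket. In the real system, query() would send the question
    over the worker's open WebSocket to a cheap model."""

    def __init__(self, doc_id: str, text: str):
        self.doc_id = doc_id
        self.text = text  # stands in for the pinned document

    async def query(self, question: str) -> str:
        await asyncio.sleep(0.01)  # simulated network/model latency
        hit = question.lower() in self.text.lower()
        return f"{self.doc_id}: {'match' if hit else 'no match'}"

async def query_all(workers, question):
    # The main thread fans the question out to every pinned-document
    # worker concurrently and gathers the answers.
    return await asyncio.gather(*(w.query(question) for w in workers))

workers = [
    DocWorker("doc-1", "NATS JetStream coordinates the services."),
    DocWorker("doc-2", "The worker core manages sockets and threads."),
]
results = asyncio.run(query_all(workers, "jetstream"))
print(results)  # → ['doc-1: match', 'doc-2: no match']
```

Because the sockets stay warm and each worker's document is already in context, each query only pays for the question tokens, which is where the input-caching and latency wins come from.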