nishilbhave 3 hours ago

The 'reverse engineering prompts' approach is interesting, but the variance in LLM responses based on temperature and system prompt updates makes consistency a major hurdle for this type of monitoring. One of the biggest technical challenges is distinguishing between when a model retrieves your site via RAG (live search) versus when it relies on stale training data. In the latter case, you can't really optimize for visibility without a new training cutoff, whereas RAG visibility can be influenced by site structure and indexing. Have you found a way to reliably trigger the search-tool use in your pipeline to ensure you're getting live results? Disclosure: I'm building Sivon HQ, where we track similar AI search visibility metrics.

biduskamil 2 hours ago | parent [-]

Thanks for the feedback! A couple of things:

Temperature settings only matter for API usage. Nevertheless, the stochastic nature of LLM responses does produce a distribution of responses for a single prompt. It could be a good idea to run the same query a couple of times in each iteration of the monitoring tool to get a better look at that distribution.
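
A rough sketch of that repeated-sampling idea, not our actual pipeline: `ask` is a hypothetical callable that sends a prompt to whatever model or chat interface you monitor and returns the response text, and the brand names are placeholders. It runs the same prompt several times and reports the share of responses that mention each brand.

```python
import itertools
from collections import Counter
from typing import Callable, Dict, Iterable


def mention_rates(
    ask: Callable[[str], str],  # hypothetical: prompt in, response text out
    prompt: str,
    brands: Iterable[str],
    runs: int = 5,
) -> Dict[str, float]:
    """Run the same prompt `runs` times and return, per brand,
    the fraction of responses that mention it (case-insensitive)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    brand_list = list(brands)
    counts: Counter = Counter()
    for _ in range(runs):
        text = ask(prompt).lower()
        for brand in brand_list:
            # Simple substring match; count each brand once per response.
            if brand.lower() in text:
                counts[brand] += 1
    return {brand: counts[brand] / runs for brand in brand_list}


def make_fake_ask(responses):
    """Stand-in for a real model call: cycles through canned responses."""
    cycle = itertools.cycle(responses)
    return lambda prompt: next(cycle)


if __name__ == "__main__":
    fake_ask = make_fake_ask([
        "Try Acme or Globex.",
        "Acme is popular.",
        "Consider Initech.",
        "Acme again.",
    ])
    print(mention_rates(fake_ask, "best widgets?", ["Acme", "Globex", "Initech"], runs=4))
```

Substring matching is the crudest possible detector; a real tool would likely need to handle aliases and misspellings, and more runs give a tighter estimate at higher cost.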

As for live search, we focus almost exclusively on queries that refer to brands or products. Such queries use the web search tool almost 100% of the time, so you will not encounter the stale training data issue in our tool.

Happy to discuss it in more detail if you want.