joshstrange a day ago

I got a chance to play with this a bit, and while I see the promise, it's still a long way off.

I asked it to find the best price for a new Seagate Exos 24TB drive on Amazon (GPT-4o):

> The best price found for a new Seagate EXOS 24TB enterprise hard drive on Amazon is $274.99 (model ST24000NM000C). This is a renewed/refurbished unit with 4.1/5 stars from 39 reviews. Note: The search results show mostly renewed units, and new units may be available at different prices on individual product pages.

OK, fair, you couldn't find a new one on Amazon. I would have preferred a "Couldn't find one" instead of leading with the price and only mentioning in the second sentence that it's a refurb. Note that this is mostly an LLM failing, not a Nxtscape one; that part seemed to work well overall.

I then followed up with "I only want a new drive, please look at other popular electronic retailers". It replied with (formatted slightly):

> Best Buy - WD Blue 2TB Internal Hard Drive ($59.99, on sale from $70)

> Newegg - Desktop Internal Hard Drives (500GB up to 8TB available) - Price range: $180-$299 - 4.5 store rating - Features: 30-day returns, 5-9 day delivery

> Amazon - Seagate EXOS Enterprise Hard Drives (up to 24TB)

> Additional options include SSDs from Newegg ($60-$201) if you're interested in faster storage solutions.

As is almost always the case with LLMs, I can see where it went off track. I didn't specify "a new one with the same specs I originally asked for", but I shouldn't have to. This is probably mostly on the LLM; I don't know if Nxtscape could improve that with prompting (I don't know exactly what they are sending, so I can't judge either way). Also, it got lazy with the Amazon response (no price).

One way Nxtscape might be able to improve is to parse out what the user is asking for, create a data structure that defines a "result" (in my case: url, name, price, description?), use that to prompt the LLM to conform to that shape, then pass all the results through a one-off LLM instance to summarize the data. I think that would help with the inconsistencies in the results. Then again, that's very "data extraction"/"data lookup"-focused, and I haven't even played with using it for input: fill out this form, loop this process for each input (mail merge), etc.
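
To make that concrete, here is a rough Python sketch of the "define a shape, make the LLM conform, then summarize" idea. Everything in it is my own assumption for illustration: the `Result` fields come from my list above, the function names are made up, and the actual LLM calls are left out since I don't know what Nxtscape sends or how it calls the model.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class Result:
    """Hypothetical shape for one product hit (url, name, price, description?)."""
    url: str
    name: str
    price: float
    description: Optional[str] = None


def build_extraction_prompt(user_request: str) -> str:
    """Ask the model to reply only with a JSON array matching Result."""
    field_names = ", ".join(f.name for f in fields(Result))
    return (
        f"User request: {user_request}\n"
        f"Reply with a JSON array only. Each item must have the keys: {field_names}. "
        "Use null for description if unknown. Omit items that do not match the request."
    )


def parse_results(raw_reply: str) -> List[Result]:
    """Keep only the items that conform to the Result shape; drop the rest."""
    results = []
    for item in json.loads(raw_reply):
        try:
            results.append(
                Result(
                    url=str(item["url"]),
                    name=str(item["name"]),
                    price=float(item["price"]),
                    description=item.get("description"),
                )
            )
        except (KeyError, TypeError, ValueError):
            # A hit with no price (like the Amazon line above) is dropped here.
            continue
    return results


def build_summary_prompt(user_request: str, results: List[Result]) -> str:
    """Prompt for the one-off summarizer, fed only validated results."""
    payload = json.dumps([asdict(r) for r in results], indent=2)
    return (
        f"User request: {user_request}\n"
        f"Summarize these results and lead with the best match:\n{payload}"
    )
```

The point of the validation step is that a hit missing a price never reaches the summarizer, so the final answer can't lead with a half-filled result. An empty list after parsing is also a natural place to return "Couldn't find one".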

Really cool idea, and I'll try throwing some other problems at it as I think of them, but mostly for fun/research; this doesn't seem like a force multiplier for my normal workflows (yet).