kburman 20 hours ago

> a state-of-the-art research tool over Hacker News, arXiv, LessWrong, and dozens

what makes this state of the art?

Xyra 3 hours ago | parent | next [-]

in the direction of "empowering the public with new capabilities they didn't have before", Scry offers, with the copy and paste of a prompt and talking with an agent:

1) Full read-only SQL + vector manipulation in a live public database. Most vector DB products expose a much narrower search API. Basically only a few enterprise-level services let you run arbitrary SQL on remote machines. Google BigQuery gives users SQL power, but it mostly doesn't have embeddings, doesn't connect public corpora, doesn't have indexes as good, and doesn't support an agentic research experience. Beyond object-level research, Scry is a good tool for exploring and acquiring intuitions about embedding space.

2) An agent-native text-to-SQL + lexical + semantic deep research workflow. We have a prompt that's been heavily optimized for taking full advantage of our machine and Claude Code for exploration and answering nuanced questions. Claude fires off many exploratory queries and builds towards really big queries that lean on the SQL query planner. You can interrupt at any time. You have the compute limits to do lots of exhaustive exploration. Often, being confident that a document doesn't exist is more epistemically powerful than finding one.

3) dozens of public commons in one database, with embeddings.
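
To make points 1 and 2 concrete, here is a minimal sketch of what a single query mixing a lexical filter with embedding similarity can look like. It is not Scry's schema or API: the `docs` table, the `cosine_sim` SQL function, the toy 3-dimensional vectors, and the use of in-memory SQLite are all assumptions for illustration.

```python
import json
import math
import sqlite3


def cosine_sim(a_json: str, b_json: str) -> float:
    """Cosine similarity between two JSON-encoded vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_demo_db() -> sqlite3.Connection:
    """Create a toy corpus (hypothetical schema, made-up rows and vectors)."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL so the query planner can call it.
    conn.create_function("cosine_sim", 2, cosine_sim)
    conn.execute(
        "CREATE TABLE docs ("
        "id INTEGER PRIMARY KEY, source TEXT, title TEXT, embedding TEXT)"
    )
    rows = [
        ("hn", "Ask HN: vector search in Postgres", [0.9, 0.1, 0.0]),
        ("arxiv", "Dense retrieval with vector indexes", [0.8, 0.2, 0.1]),
        ("lesswrong", "Notes on forecasting", [0.0, 0.1, 0.9]),
        ("hn", "Vector clocks explained", [0.1, 0.9, 0.1]),
    ]
    conn.executemany(
        "INSERT INTO docs (source, title, embedding) VALUES (?, ?, ?)",
        [(source, title, json.dumps(vec)) for source, title, vec in rows],
    )
    conn.commit()
    # From here on the connection only accepts reads.
    conn.execute("PRAGMA query_only = ON")
    return conn


def hybrid_search(conn, keyword, query_vec, limit=3):
    """Lexical filter (LIKE) plus semantic ranking (cosine) in one statement."""
    sql = """
        SELECT source, title, cosine_sim(embedding, ?) AS score
        FROM docs
        WHERE title LIKE '%' || ? || '%'
        ORDER BY score DESC
        LIMIT ?
    """
    return conn.execute(sql, (json.dumps(query_vec), keyword, limit)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for source, title, score in hybrid_search(conn, "vector", [1.0, 0.0, 0.0]):
        print(f"{score:.3f}  [{source}] {title}")
```

The keyword filter drops the forecasting post before any ranking happens, and the remaining rows are ordered by similarity to the query vector. A real deployment would use a vector index rather than a scan, which this sketch does not attempt.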

rvnx 16 hours ago | parent | prev | next [-]

It's just marketing.

It is not a protected term, so anything is state-of-the-art if you want it to be.

For example, Gemma models at the moment of release were performing worse than their competition, but still, it is "state-of-the-art". It does not mean it's a bad product at all (Gemma is actually good), but the claims are very free.

Juicero was state-of-the-art on release too, though hands were better, etc.

lo_zamoyski 15 hours ago | parent | next [-]

> It's just marketing. [...] It is not a protected term, so anything is state-of-the-art if you want it to be.

But is it true?

I think we ought to stop indulging and rationalizing self-serving bullshit with the "it's just marketing" bit, as if that somehow makes bullshit okay. It's not okay. Normalizing bullshit is culturally destructive and reinforces the existing indifference to truth.

Part of the motivation people have seems to be a cowardly morbid fear of conflict or the acknowledgment that the world is a mess. But I'm not even suggesting conflict. I'm suggesting demoting the dignity of bullshitters in one's own estimation of them. A bullshitter should appear trashy to us, because bullshitting is trashy.

docjay 14 hours ago | parent [-]

I would vote for you as dictator.

econ 9 hours ago | parent [-]

If my comments were only state of the art I wouldn't need to write them.

goopypoop 16 hours ago | parent | prev [-]

just like "cruelty free" and "not tested on animals" in usa

7moritz7 19 hours ago | parent | prev | next [-]

The scale. How many tools do you know that can query the content of all arXiv papers?

eamag 12 hours ago | parent [-]

Doesn't look like the scale is there, even for HN:

> Currently have embedded: posts: 1.4M / 4.6M; comments: 15.6M / 38M. That's with Voyage-3.5-lite
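
As a rough check on those figures, the embedded fraction can be worked out directly. The counts are the ones quoted above, in millions; the helper name is just for illustration.

```python
def coverage(embedded_millions: float, total_millions: float) -> float:
    """Fraction of items that already have embeddings."""
    return embedded_millions / total_millions


# Counts quoted above, in millions of items.
quoted = [("posts", 1.4, 4.6), ("comments", 15.6, 38.0)]

for name, done, total in quoted:
    print(f"{name}: {coverage(done, total):.1%} embedded")
# posts: 30.4% embedded
# comments: 41.1% embedded
```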

Xyra 3 hours ago | parent [-]

The scale is there. I'm scraping, cleaning, token efficientizing dozens of sources every single hour. The lack of monies for embedding everything was a temporary problem.

nandomrumber 19 hours ago | parent | prev | next [-]

The tool is state of the art, the sources are historical.

ashirviskas 19 hours ago | parent | prev [-]

First, so best in this?