the_arun 6 days ago

Just wondering: can't AI read HTML? If it can't, how are we training our models?

justmedep 6 days ago | parent | next [-]

The AI only sees a bit of HTML plus a bunch of JS that, when executed, generates more HTML. If the AI does not run the JS it won’t see everything. During training they probably use a crawler that runs a headless browser behind the scenes to get everything a human would get.
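
To make that concrete, here is a minimal sketch of what a fetch without JS execution leaves you with. The page shell in `RAW_HTML` and the `visible_text` helper are made up for illustration, not taken from any real site or crawler; the only library used is Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical page shell, as a plain HTTP fetch (no JS execution) would return it.
RAW_HTML = """<html><body>
<div id="root"></div>
<script>
document.getElementById("root").textContent = "Plan A costs $5 per month.";
</script>
</body></html>"""


class VisibleText(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader would get from the markup alone, without running JS."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    # The pricing sentence only exists after the script runs, so nothing is printed here.
    print(repr(visible_text(RAW_HTML)))
```

The sentence about pricing only appears in the DOM after the script runs, so a reader of the raw markup never sees it. A headless browser would execute the script first and then read the resulting DOM.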

Lockal 5 days ago | parent [-]

So... the answer is to use the same headless browser for real-time access that they used during training? Which they already do, unless you specifically ask it to write and run a Python script that uses simple requests?

It is like generating static webpages just for SEO: obsolete since 2012[1], and a few years later for other major websites.

[1] https://www.i-programmer.info/news/81-web-general/4248-googl...

y1n0 6 days ago | parent | prev | next [-]

AI can’t read something dynamically rendered with JavaScript, at least at the moment.

novok 6 days ago | parent | prev [-]

They can, but the content-to-token ratio is far lower, so they work less effectively when the HTML is put into the inference context window.
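
A rough way to see the ratio, as a sketch under loud assumptions: `rough_tokens` below is a crude regex stand-in, not any real model's tokenizer, and the `HTML` and `TEXT` samples are invented for illustration.

```python
import re


def rough_tokens(s):
    # Crude stand-in for a model tokenizer (an assumption, not a real one):
    # each word and each punctuation mark counts as one token.
    return re.findall(r"\w+|[^\w\s]", s)


# Hypothetical snippet: the same content as markup and as plain text.
HTML = (
    '<div class="card"><h2 class="title">Pricing</h2>'
    '<p class="body">Plan A costs $5 per month.</p></div>'
)
TEXT = "Pricing\nPlan A costs $5 per month."


def content_ratio(content, payload):
    """Fraction of the payload's rough tokens that carry the actual content."""
    return len(rough_tokens(content)) / len(rough_tokens(payload))


if __name__ == "__main__":
    print(len(rough_tokens(TEXT)), "content tokens")
    print(len(rough_tokens(HTML)), "tokens when sent as HTML")
    print(round(content_ratio(TEXT, HTML), 2), "content ratio")
```

With a real tokenizer the exact numbers will differ, but the direction is the same: tags, attributes and class names take up context window space without adding content.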