noosphr 5 days ago

The AI company I've been working at ran out of money last week, so I'm taking a month-long break.

I've been playing around with defining an easy-to-implement standard for serializing tabular data using the ASCII delimiter characters.

So far I've got:

    <group> ::= GS | <record>
    <record> ::= RS <group> | <unit>
    <unit> ::= <high-ascii> | US <record>
    <high-ascii> ::= 0x20 <unit> | ... | 0x7E <unit>
    
This seems like a good way to avoid all the trouble of escaping separators in CSV files, if a bit clunky, since you need to end each record with US RS and each file with US RS GS.
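
For concreteness, here is a minimal Python sketch of a writer and reader for this layout. It assumes GS, RS and US are the standard ASCII control characters 0x1D, 0x1E and 0x1F, and that fields may only contain the printable range 0x20 to 0x7E from the grammar. The names `encode` and `decode` are hypothetical, and `decode` only does light structural checks rather than validating against the full grammar.

```python
# Separator characters from the ASCII C0 control range (assumed byte values).
GS = "\x1d"  # group separator: terminates the whole file
RS = "\x1e"  # record separator: terminates a record
US = "\x1f"  # unit separator: terminates a field

def encode(rows):
    """Serialize a list of rows (lists of strings): every field is followed
    by US, every record by RS, and the file by a single GS."""
    parts = []
    for row in rows:
        for field in row:
            # The grammar only allows printable ASCII (0x20..0x7E) inside a field,
            # so separators never need escaping; anything else is rejected.
            if not all(0x20 <= ord(c) <= 0x7E for c in field):
                raise ValueError(f"field contains non-printable characters: {field!r}")
            parts.append(field + US)
        parts.append(RS)
    parts.append(GS)
    return "".join(parts)

def decode(data):
    """Inverse of encode; performs only light structural checks."""
    if not data.endswith(GS) or data.count(GS) != 1:
        raise ValueError("input must end with exactly one GS")
    body = data[:-1]
    if not body:
        return []  # a file with no records is just GS
    if not body.endswith(RS):
        raise ValueError("last record is not terminated by RS")
    rows = []
    for record in body[:-1].split(RS):
        if not record:
            rows.append([])  # RS with no fields: an empty record
            continue
        if not record.endswith(US):
            raise ValueError(f"record not terminated by US: {record!r}")
        rows.append(record[:-1].split(US))
    return rows
```

For example, `encode([["id", "name"], ["1", "Ada"]])` produces `"id\x1fname\x1f\x1e1\x1fAda\x1f\x1e\x1d"`, and `decode` turns that back into the original list. Because every field carries its own trailing US, an empty row (`RS`) and a row holding one empty field (`US RS`) stay distinct.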

I also accidentally found another test that _all_ LLMs fail (including all the reasoning models): deciding whether a given string is derivable from a grammar. I asked for test cases before I started coding, and _every_ frontier model gave me obvious garbage. I haven't seen such bad performance on such low-hanging fruit for automated training in over a year.
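
For this particular grammar the derivability check is mechanical: every production is a (possibly empty) terminal prefix followed by at most one nonterminal, so the grammar is right-linear, describes a regular language, and each nonterminal can serve as a state in a small automaton. Below is a minimal Python sketch, assuming <group> is the start symbol and the same byte values for GS, RS and US as above; `derivable` and `is_printable` are hypothetical names.

```python
GS, RS, US = "\x1d", "\x1e", "\x1f"  # ASCII group/record/unit separators (assumed values)

def is_printable(c):
    """The <high-ascii> alternatives: 0x20 through 0x7E."""
    return 0x20 <= ord(c) <= 0x7E

def derivable(s):
    """Return True if s is derivable from <group>.

    The grammar is right-linear, so the nonterminal still to be expanded is
    the only state needed. Unit productions (<group> ::= <record>,
    <record> ::= <unit>) are handled by falling through to the next state.
    """
    state = "group"
    for c in s:
        if state == "done":
            return False              # nothing may follow the final GS
        if state == "group":
            if c == GS:
                state = "done"        # <group> ::= GS
                continue
            state = "record"          # <group> ::= <record>
        if state == "record":
            if c == RS:
                state = "group"       # <record> ::= RS <group>
                continue
            state = "unit"            # <record> ::= <unit>
        # Now expanding <unit>.
        if c == US:
            state = "record"          # <unit> ::= US <record>
        elif is_printable(c):
            state = "unit"            # <unit> ::= <high-ascii>, <high-ascii> ::= c <unit>
        else:
            return False              # GS, RS or any other byte cannot start a <unit>
    return state == "done"            # only the terminal GS ends a derivation
```

For example, `derivable("a\x1fb\x1f\x1e\x1d")` is `True`, while `"a\x1e\x1d"` is rejected because the record does not end with US RS, and `"a\x1f\x1d"` is rejected because the file does not end with US RS GS. `"\x1e\x1e\x1d"` is accepted, since the grammar as written allows empty records.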

mac3n 4 days ago

Hey, good to see someone using ASCII.

Don't forget File Separator (FS), 0x1C.