Remix clone Hacker News

new | show | ask | jobs Github

	▲	roflcopter69 a day ago
		Thanks for the reply, that part about the models attention is pretty interesting! I would suspect that a single attention layer won't be able to figure out to which token a token for an opening bracket should attend the most to. Think of {"x": {y: 1}} so with only one layer of attention, can the token for the first opening bracket successfully attend to exactly the matching closing bracket? I wonder if RNNs work better with JSON or XML. Or maybe they are just fine with both of them because a RNN can have some stack-like internal state that can match brackets? Probably, it would be a really cool research direction to measure how well Transformer-Mamba hybrid models like Jamba perform on structured input/output formats like JSON and XML and compare them. For the LLM era, I could only find papers that do this evaluation with transformer-based LLMs. Damn, I'd love to work at a place that does this kind of research, but guess I'm stuck with my current boring job now :D Born to do cutting-edge research, forced to write CRUD apps with some "AI sprinkled in". Anyone hiring here?