Show HN: A reasoning model that infers over whole tasks in 1ms in latent space (github.com)
2 points by orderone_ai 5 hours ago | 4 comments
I've spent the last few weeks working on a novel model architecture. It is not a transformer: it has no attention, no tokens, and no softmax. However:

Batch processing:
  Average batch size: 10
  Time per batch: 13.03 ms
  Time per example in batch: 1.30 ms

TASK SUMMARY WITH TIMING
===============================================================================
Task                                     Correct  Total  Accuracy  Med Time (ms)
-------------------------------------------------------------------------------
Emotion Classification                        10     10   100.0 %          1.30
Toxicity Classification                        9     10    90.0 %          1.29
Sentiment Classification                      10     10   100.0 %          1.34
Domain Classification                          8     10    80.0 %          1.30
Sarcasm Detection                              6     10    60.0 %          1.34
Scam Detection                                 7     10    70.0 %          1.31
Age Appropriateness Classification             4     10    40.0 %          1.28
Urgency Level Classification                   4     10    40.0 %          1.25
Privacy Policy Classification                  9     10    90.0 %          1.32
Dialogue Speaker Classification                8     10    80.0 %          1.29
Book Review Sentiment                         10     10   100.0 %          1.25
Empathetic Direction Classification           10     10   100.0 %          1.29
Virtual Assistant Action Classification        6     10    60.0 %          1.37
-------------------------------------------------------------------------------
OVERALL                                      101    130    77.7 %
===============================================================================

It can do interesting things, though with a lot of caveats and limitations. The model is available for download via a script in the repo, and the exact benchmarks I used are included. The white paper gets into theory and application, and lays out many of the limitations as well as interesting differences from transformers in training and prompting behavior. It also includes extensive appendices (over 100 pages) on the training datasets used and on performance across the ~260 (I think?) NIV2 tasks in its validation dataset.

Running inference with the DSRU model + BGE embedding model together takes a bit shy of 10GB of VRAM; the reference comparison model, Zephyr 7B, takes about 15GB of VRAM.
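The throughput and accuracy figures above are simple ratios; a quick sanity check of the arithmetic as reported:

```python
# Sanity-check the numbers reported in the post.
batch_ms, batch_size = 13.03, 10
print(f"{batch_ms / batch_size:.2f} ms/example")  # → 1.30 ms/example

correct, total = 101, 130
print(f"overall accuracy: {correct / total:.1%}")  # → overall accuracy: 77.7%
```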
tripplyons 2 hours ago | parent | next [-]
How does this model compare to just using a linear classifier trained on BGE embeddings? | |||||||||||||||||
throwawayffffas 5 hours ago | parent | prev [-]
Can I ask why you have a single model for all these tasks? Wouldn't it be easier and more ergonomic for users to have dedicated models for each of these tasks?