largbae 5 hours ago
I think this article just speaks to the immaturity of our use of AI at this moment. Production-grade systems might be written by agents running on filesystem skills, but the production systems themselves will run on consistent and scalable data structures. Meanwhile, the UI of AI agents will almost certainly evolve away from desktop computers and toward audio/visual interfaces. An agent might get more context from a Zoom call with you, once tone and body language can be used to increase the bandwidth between you.
andai 3 hours ago | parent | next
https://www.youtube.com/watch?v=GH9-EmgtABw Saw this video recently, by an AI company working to extract contextual cues from tone and body language. I think they're converting it to text and feeding it into an LLM, so it's not natively multimodal, but I still thought it was really cool.
fragmede 3 hours ago | parent | prev
I don't think written prompting will ever go away. Writing helps you organize your thoughts in a way that speaking, umm, ah, wait no, hang on, does not. With writing, I can go back and change what I've already written before I hit send. Anybody who's prompted with speech for any length of time has said "wait no, nevermind, start over". So STT will get better, sure, it's already quite good. I just don't see text entry entirely going away, because Human Intelligence (HI) just doesn't work in a way where speech could be the only interface.