sigmoid10 8 hours ago
That's why this model and all the other ones serious about realtime speech don't use such a pipeline and instead process raw audio. The most realistic approach is probably a government-mandated, real-name online identity verification system, and that comes with its very own set of fundamental issues. You can't have the freedom of the web and the accountability of the physical world at the same time.
exe34 8 hours ago
this is amazing - it reminds me of the time when LLM precursors were able to babble in coherent English, but would just write nonsense.