londons_explore 6 hours ago

The mel spectrum is the first part of a speech recognition pipeline...

But perhaps you'd get better results if more of a ML speech/audio recognition pipeline were included?

E.g., the pipeline could separate out drum beats from piano notes and present them differently in the visualization?

An autoencoder network trained to minimize perceptual reconstruction loss would probably have the most 'interesting' information at the bottleneck, so that's the layer I'd feed into my LED strip.
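For anyone curious what that first stage actually computes, here's a minimal sketch of a mel front end: triangular filters spaced evenly on the mel scale, applied to an FFT magnitude frame. The formulas are the standard O'Shaughnessy ones; the frame size and band count are just illustrative.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (O'Shaughnessy formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters mapping an FFT magnitude frame to mel bands."""
    # Equally spaced points on the mel scale, converted back to Hz, then to bins
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising edge of the triangle
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling edge
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mel_frame(samples, sr, n_mels=16):
    """One hop of the pipeline: windowed FFT -> mel band energies."""
    n_fft = len(samples)
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(n_fft)))
    return mel_filterbank(n_mels, n_fft, sr) @ spectrum

# A pure 440 Hz tone should concentrate its energy in one low mel band.
sr = 22050
t = np.arange(1024) / sr
bands = mel_frame(np.sin(2 * np.pi * 440 * t), sr)
```

Each hop you'd map those band energies (or the autoencoder bottleneck, in the fancier version) onto LED brightness.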

akhudek an hour ago | parent | next [-]

I've done this in my own solution in this space (https://thundergroove.com). I use a real-time beat detection neural network combined with similar frequency-spectrum analysis to provide a set of signals that effects can use.

Effects themselves are written in embedded JavaScript and can be layered, a bit like Photoshop layers. Currently it only supports driving Nanoleaf and WLED fixtures, though WLED gives you a huge range of options. The effect language is fully exposed, so you can easily write your own effects against the real-time audio signals.
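As a rough illustration of the layering idea: each effect renders a frame of pixels from the audio signals, and frames composite bottom-up with per-layer opacity, like image layers. Everything here is hypothetical (invented names and signals, Python rather than the embedded JavaScript it actually uses, and nothing to do with thundergroove's real effect API).

```python
# Hypothetical sketch of layered LED effects; names and signals are invented.
def solid(color):
    """Effect: fill the whole strip with one color."""
    def render(n_leds, signals):
        return [color] * n_leds
    return render

def beat_flash(color):
    """Effect: flash the strip, scaled by an assumed 0..1 'beat' signal."""
    def render(n_leds, signals):
        level = signals.get("beat", 0.0)
        return [tuple(int(c * level) for c in color)] * n_leds
    return render

def blend(layers, n_leds, signals):
    """Composite (effect, opacity) layers bottom-up over a black base."""
    out = [(0, 0, 0)] * n_leds
    for effect, opacity in layers:
        frame = effect(n_leds, signals)
        out = [
            tuple(int(o * (1 - opacity) + f * opacity)
                  for o, f in zip(out_px, frame_px))
            for out_px, frame_px in zip(out, frame)
        ]
    return out

# Dim blue base, white flash at half opacity on a beat.
layers = [(solid((0, 0, 40)), 1.0), (beat_flash((255, 255, 255)), 0.5)]
frame = blend(layers, n_leds=4, signals={"beat": 1.0})
```

The nice property of this structure is that effects stay independent: each one only sees the signals dict, and the compositor decides how they combine.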

It isn't open source, though, and still needs better onboarding and tutorials. Currently it's completely free; I haven't really decided whether I want to bother trying to monetize any of it. If I were to, it would probably just be for DMX and maybe MIDI support. Or maybe just for an ecosystem of portable hardware.

calibas 5 hours ago | parent | prev [-]

I was playing around with this recently, but the problem I encountered is that most AI analysis techniques, like stem separation, aren't built to work in real time.
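By contrast, frame-by-frame signal-processing features do run comfortably in real time. A classic one is spectral-flux onset detection, which only needs one FFT per hop and a single previous frame of state; a minimal sketch (the threshold and frame size are illustrative, not from any particular system):

```python
import numpy as np

class SpectralFluxOnset:
    """Per-hop onset detector: cheap enough to keep up in real time,
    unlike offline techniques such as stem separation."""

    def __init__(self, n_fft=1024, threshold=5.0):
        self.window = np.hanning(n_fft)
        self.prev = np.zeros(n_fft // 2 + 1)
        self.threshold = threshold  # illustrative value, tune per source

    def process(self, hop):
        """Feed one block of samples; returns True if an onset fired."""
        mag = np.abs(np.fft.rfft(hop * self.window))
        # Positive spectral flux: only count per-bin energy increases,
        # so steady tones don't trigger, sudden attacks do.
        flux = np.sum(np.maximum(mag - self.prev, 0.0))
        self.prev = mag
        return bool(flux > self.threshold)

# Silence should not trigger; a sudden noise burst should.
det = SpectralFluxOnset()
quiet = det.process(np.zeros(1024))
rng = np.random.default_rng(0)
loud = det.process(rng.standard_normal(1024) * 0.5)
```

The trade-off is that features like this only tell you *when* something happened, not *which instrument* it was, which is what the heavier, offline models buy you.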