Been working on this very idea casually for a couple of years with ESP-IDF, and I could never get the statistical signal processing just right (by my definition). Things I've tried: adaptive filtering (LMS, Kalman), kernel methods (NEWMA, MMD), detectors (CUSUM, GLR), dimensionality reduction (random projection, online PCA), whitening, etc.

I use a single ESP32 in STA/AP mode which sniffs ACK packets with a specific destination MAC, which come from any server on my WiFi network (uses a special sniffing mode IIRC). This way I can receive regular CSI packets originating from a fixed location without needing another device running.

I'll have to look at this code, maybe I just overlooked the obvious or my requirements were too high!

francescopace 9 hours ago | parent [-]

ESPectre takes a different architectural approach that might address some of the challenges you encountered:

1. Instead of STA/AP mode on a single ESP32, ESPectre uses the natural traffic between your existing router and an ESP32-S3 in station mode. To ensure a stable, continuous CSI packet rate, I implemented a traffic generator that sends ICMP pings to the gateway at a configurable rate (default: 20 pps). This provides bidirectional traffic (request + reply) that reliably triggers CSI generation, giving you predictable packet timing without relying on ambient network traffic or special sniffing modes.

2. Rather than applying filters directly to raw CSI, ESPectre uses Moving Variance Segmentation (MVS) on unfiltered spatial turbulence (std dev of subcarrier amplitudes).

3. The filters are applied to features, not to the segmentation signal itself. This preserves motion sensitivity while cleaning up the feature data.
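
A minimal sketch of the "spatial turbulence" value from point 2, i.e. the standard deviation of subcarrier amplitudes within one CSI packet. The function name, the sample packet, and the use of population (rather than sample) standard deviation are assumptions for illustration, not taken from the ESPectre code. The buffer layout assumed here is interleaved signed 8-bit imaginary/real pairs per subcarrier.

```python
import math
import statistics

def spatial_turbulence(csi_bytes):
    """Std dev of per-subcarrier amplitudes for one CSI packet."""
    # Assumed layout: [imag0, real0, imag1, real1, ...] as signed 8-bit values.
    pairs = zip(csi_bytes[0::2], csi_bytes[1::2])
    amplitudes = [math.hypot(re, im) for im, re in pairs]
    # Population std dev across subcarriers (an assumption; sample std dev
    # would differ only by a constant factor for a fixed subcarrier count).
    return statistics.pstdev(amplitudes)

# Four hypothetical subcarriers; one of them has a larger amplitude.
packet = [3, 4, 0, 5, 6, 8, 0, 5]
print(spatial_turbulence(packet))
```

A packet whose subcarriers all have the same amplitude gives a turbulence of zero, so the value only rises when the channel response becomes uneven across subcarriers.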

I found that having a stable transmitter (the router) combined with controlled traffic generation provides more consistent multipath patterns and predictable CSI timing, which makes the segmentation more reliable.

roger_ 8 hours ago | parent [-]

Actually I misspoke. I previously used STA/AP mode (and two ESP32s) but I switched to something close to what you describe. I filter the pings to only get the ones targeting a specific MAC (in promiscuous mode). This way I get only specific CSI packets and they're perfectly periodic at whatever rate I want.

Sounds like your MVS approach is a sliding window variance of the cross-channel variance, with some adaptive thresholding. My pre-processing has generally been an EWMA de-meaning filter followed by some type of dimensionality reduction and feature extraction (kernel or hand-crafted, like raw moments), which I think fits into your overall architecture.
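
For readers unfamiliar with the term, an EWMA de-meaning filter can be sketched roughly as below. This is a generic illustration, not the code used by either poster: the function name, the default `alpha`, and the choice to subtract the mean after updating it are all assumptions.

```python
def ewma_demean(samples, alpha=0.05):
    """Subtract a running EWMA estimate of the mean from each sample."""
    mean = None
    out = []
    for x in samples:
        # Seed the running mean with the first sample, then update it.
        mean = x if mean is None else (1 - alpha) * mean + alpha * x
        out.append(x - mean)
    return out

# A constant input is removed entirely; a step shows up as a decaying residual.
print(ewma_demean([10, 10, 10, 14, 14, 14], alpha=0.5))
```

A smaller `alpha` tracks the baseline more slowly, which keeps more of a slow motion signature in the output at the cost of slower recovery from baseline shifts.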

I'll have to look more closely at your work, thanks for sharing!

francescopace 7 hours ago | parent [-]

Interesting note: I actually disabled promiscuous mode after some testing because it made the CSI signal noisier and consumed more resources. I found that normal station mode with pings to the gateway gave me cleaner, more predictable CSI data. But your MAC filtering approach might mitigate those issues!

You're spot on about the MVS approach. It's essentially a sliding window variance of the spatial turbulence (std dev across subcarriers), with adaptive thresholding based on the moving variance of that signal.
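
The sliding-window variance part can be sketched as follows. The class name and the window length of 50 samples are assumptions, and the adaptive threshold is left out because its exact rule is not described in this thread.

```python
from collections import deque
import statistics

class MovingVariance:
    """Variance of the most recent `window` spatial-turbulence samples."""

    def __init__(self, window=50):  # window length is a hypothetical default
        self.buf = deque(maxlen=window)

    def update(self, x):
        self.buf.append(x)
        if len(self.buf) < self.buf.maxlen:
            return None  # not enough samples yet to fill the window
        return statistics.pvariance(self.buf)

mv = MovingVariance(window=3)
for sample in [1, 1, 1, 4]:
    print(mv.update(sample))
```

Recomputing the variance on every sample costs O(window); an incremental running sum and sum of squares would be cheaper on a microcontroller, at some cost in numerical robustness.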

If you're interested in the MVS details, I wrote a free Medium article that walks through the segmentation algorithm step-by-step with visualizations. Links are in the README.

Your approach is actually quite similar to what I'm doing, just in a different order:

- My flow: Raw CSI → Segmentation (MVS) → Filters (Butterworth/Wavelet/Hampel/SG) → Feature extraction

- Your flow: Raw CSI → EWMA de-meaning → Dimensionality reduction → Feature extraction

The main difference is that I segment first to separate IDLE from MOTION states (keeping segmentation on raw, unfiltered CSI to preserve motion sensitivity), then only extract features during MOTION (to save CPU cycles).
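
A rough sketch of the "segment first, extract later" idea: feature extraction runs only on contiguous MOTION segments, and IDLE samples are skipped. The function names and the two features (mean and range) are hypothetical, and the filtering stage between segmentation and feature extraction is omitted for brevity.

```python
def segment_then_extract(samples, is_motion, extract_features):
    """Run feature extraction only on contiguous MOTION segments."""
    results, segment = [], []
    for x, moving in zip(samples, is_motion):
        if moving:
            segment.append(x)  # accumulate the current MOTION segment
        elif segment:
            results.append(extract_features(segment))  # segment just ended
            segment = []
    if segment:  # flush a segment that is still open at the end
        results.append(extract_features(segment))
    return results

# Hypothetical feature set: mean and peak-to-peak range of the segment.
def simple_features(seg):
    return {"mean": sum(seg) / len(seg), "range": max(seg) - min(seg)}

flags = [False, True, True, False, True]
print(segment_then_extract([1, 5, 7, 1, 9], flags, simple_features))
```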

Thanks for the thoughtful feedback! Always great to exchange notes with someone who's been in the trenches with CSI signal processing.

roger_ 6 hours ago | parent [-]

I noticed your feature vector is large and you don't use ML. What's the final statistic that you threshold?

francescopace 6 hours ago | parent [-]

The final statistic I threshold is the Moving Variance of Spatial Turbulence. The decision is a binary comparison: if moving_variance > threshold, the state is MOTION (movement detected); otherwise it is IDLE. The features are extracted only during MOTION segments (to save CPU cycles) and published via MQTT. They serve as rich foundation data for potential external ML models (e.g., to capture nuances like gestures, running, or falling), but they are not used for the core segmentation decision itself.
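
The binary comparison described above can be written as a short sketch. The function name and the example values are hypothetical, and the threshold is passed in as a plain number here rather than adapted over time.

```python
def classify(moving_variance, threshold):
    """Label each moving-variance sample as MOTION or IDLE."""
    # Strictly greater than the threshold means MOTION; equal counts as IDLE.
    return ["MOTION" if mv > threshold else "IDLE" for mv in moving_variance]

print(classify([0.1, 2.0, 0.5], threshold=1.0))
```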