NalNezumi 6 days ago

My previous job was at a startup doing BMI (brain-machine interface) research. For the first time I had the chance to work with expensive neural signal measurement tools (mainly EEG for us, but some teams used fMRI), and I quickly learned how absolutely horrible the signal-to-noise ratio (SNR) is in this field.

And how it was almost impossible to reproduce many published and well-cited results. It was both exciting and jarring to talk with the neuroscientists, because they of course knew about this and knew how to read the papers, but the people on the funding/business side of course didn't really put much emphasis on it.

One of the teams presented an accepted paper that basically used Deep Learning (attention) to predict images that a person was thinking of from fMRI signals. When I asked "but DL is proven to be able to find patterns even in random noise, so how can you be sure this is not just overfitting to artefacts?", there wasn't really any answer (or rather, the publication didn't take that into account, although it can be determined experimentally). Still, a month later I saw TechXplore or some other tech news site run an article about it, something like "AI can now read your brain" and the 1984 implications yada yada.
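(To make the worry concrete, here is a toy sketch of the kind of thing I had in mind; all data here is made up and sklearn is assumed. A flexible model will happily "learn" pure noise, so training accuracy, or any evaluation with leakage, tells you very little; and if an artefact happens to be correlated with the labels, even the held-out number looks good for the wrong reason.)

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 5000))   # 80 "scans", 5000 pure-noise features ("voxels")
    y = rng.integers(0, 2, size=80)   # random labels: there is no real signal at all

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    print("train acc:", clf.score(X_tr, y_tr))  # ~1.0: the model memorises the noise
    print("test acc: ", clf.score(X_te, y_te))  # ~0.5: chance, since there was nothing to learn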

So this is indeed something most practitioners, whether master's or PhD level, probably realize relatively early.

So now when someone says "you know mindfulness is proven to change your brainwaves?", I always add my story: "yes, but the study was done with EEG, so I don't trust the scientific backing of it" (though anecdotally, it helps me).

SubiculumCode 6 days ago | parent | next [-]

There is a lot of reliable science done using EEG and fMRI; I believe you learned the wrong lesson here. The important thing is to treat motion and physiological sources of noise as a first-order problem that must be taken very seriously and requires strict data-quality inclusion criteria. As for deep learning in fMRI/EEG, your response about overfitting is too sweepingly broad to apply to the entire field.

To put it succinctly, I think you have overfit your conclusions to the amount of data you have seen.

D-Machine 6 days ago | parent | next [-]

I would argue that almost all fMRI research is in fact unreliable, and formally so (test-retest reliabilities are quite miserable: see my post below).

https://news.ycombinator.com/item?id=46289133

EDIT: The reason being, with reliabilities as bad as these, it is obvious almost all fMRI studies are massively underpowered, and you really need to have hundreds or even up to a thousand participants to detect effects with any statistical reliability. Very few fMRI studies ever have even close to these numbers (https://www.nature.com/articles/s42003-018-0073-z).
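A rough back-of-the-envelope sketch of why (all numbers are illustrative; the only real content is the classic attenuation formula and a Fisher-z power approximation):

    import numpy as np
    from scipy.stats import norm

    def n_required(r, alpha=0.05, power=0.80):
        """Sample size to detect a correlation r (two-sided, Fisher-z approximation)."""
        z_r = 0.5 * np.log((1 + r) / (1 - r))
        return int(np.ceil(((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / z_r) ** 2 + 3))

    r_true = 0.30                     # hypothetical "true" brain-behaviour correlation
    for rel in (1.0, 0.7, 0.4):       # test-retest reliability of the fMRI measure
        # attenuation: observed r shrinks with the reliabilities of both measures
        r_obs = r_true * np.sqrt(rel * 0.8)   # 0.8 = assumed reliability of the behavioural measure
        print(f"reliability {rel:.1f}: observed r ~= {r_obs:.2f}, N needed ~= {n_required(r_obs)}")

Even a fairly generous true effect ends up needing a few hundred participants once realistic reliabilities are plugged in, which is exactly the regime most published fMRI studies never reach.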

mattkrause 6 days ago | parent | next [-]

That depends immensely on the type of effect you're looking for.

Within-subject effects (this happens when one does A, but not when doing B) can be fine with small sample sizes, especially if you can repeat variations on A and B many times. This is pretty common in task-based fMRI. Indeed, I'm not sure why you need >2 participants except to show that the principle is relatively generalizable.

Between-subject comparisons (type A people have this feature, type B people don't) are the problem because people differ in lots of ways and each contributes one measurement, so you have no real way to control for all that extra variation.
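A toy simulation of the difference (made-up numbers, scipy assumed): a handful of subjects with many trials each will usually nail a small within-subject effect that a between-subject comparison of the same size has little hope of detecting.

    import numpy as np
    from scipy.stats import ttest_rel, ttest_ind

    rng = np.random.default_rng(1)
    n_subj, n_trials, effect = 5, 200, 0.3   # effect is small relative to trial-to-trial noise (SD = 1)

    # Within-subject: every person does conditions A and B many times
    base = rng.normal(0, 1, n_subj)                            # stable individual differences
    A = base[:, None] + rng.normal(0, 1, (n_subj, n_trials))
    B = base[:, None] + effect + rng.normal(0, 1, (n_subj, n_trials))
    # paired test on per-subject means: usually clearly significant
    print("within-subject:", ttest_rel(B.mean(1), A.mean(1)))

    # Between-subject: two separate groups of 5 people, one measurement each
    # usually nowhere near significant with this N
    print("between-subject:", ttest_ind(rng.normal(effect, 1, n_subj), rng.normal(0, 1, n_subj)))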

D-Machine 6 days ago | parent [-]

Precisely, and agreed 100%. We need far more within-subject designs.

You would still, in general, need many subjects to show the same basic within-subject patterns if you want to claim the pattern is "generalizable", in the sense of "may generalize to most people". But, depending on exactly what you are looking at and the strength of the effect, you may of course not need nearly as many participants as in strictly between-subject designs.

With the low test-retest reliability of task fMRI in general, even in adults, this also means that strictly one-off within-subject designs are not enough for certain claims. One sort of has to demonstrate that the within-subject effect itself is stable too. This may or may not be plausible for certain things, but it really needs to be considered more regularly and explicitly.

SubiculumCode 6 days ago | parent [-]

Between-subject heterogeneity is a major challenge in neuroimaging. As a developmental researcher, I've found that in structural volumetrics, even after controlling for total brain size, individual variance remains so large that age-brain associations are often difficult to detect and frequently differ between moderately sized cohorts (n=150-300). However, with longitudinal data where each subject serves as their own control, the power to detect change increases substantially—all that between-subject variance disappears with random intercept/slope mixed models. It's striking.
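A minimal sketch of what I mean (simulated data, statsmodels assumed, all numbers made up): cross-sectionally the age effect is buried under between-subject variance, but a random intercept/slope model on longitudinal data recovers it easily.

    import numpy as np, pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n_subj, n_visits = 60, 3
    subj = np.repeat(np.arange(n_subj), n_visits)
    age = np.tile(np.arange(n_visits) * 2.0, n_subj) + rng.uniform(8, 12, n_subj)[subj]

    intercepts = rng.normal(0, 30, n_subj)      # large stable between-subject differences in volume
    slopes     = rng.normal(-1.0, 0.3, n_subj)  # everyone declines slightly, at somewhat different rates
    vol = 500 + intercepts[subj] + slopes[subj] * age + rng.normal(0, 2, len(subj))
    df = pd.DataFrame({"subject": subj, "age": age, "volume": vol})

    # Cross-sectional view (first visit only): the age effect drowns in between-subject variance
    print(df.groupby("subject").first()[["age", "volume"]].corr().iloc[0, 1])

    # Longitudinal mixed model: random intercept + slope per subject; the age effect pops out
    m = smf.mixedlm("volume ~ age", df, groups=df["subject"], re_formula="~age").fit()
    print(m.params["age"], m.pvalues["age"])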

Task-based fMRI has similar individual variability, but with an added complication: adaptive cognition. Once you've performed a task, your brain responds differently the second time. This happens when studies reuse test questions—which is why psychological research develops parallel forms. But adaptation occurs even with parallel forms (commonly used in fMRI for counterbalancing and repeated assessment) because people learn the task type itself. Adaptation even happens within a single scanning session, where BOLD signal amplitude for the same condition typically decreases over time.

These adaptation effects contaminate ICC test-retest reliability estimates when applied naively, as if the brain weren't an organ designed to dynamically respond to its environment. Therefore, some apparent "unreliability" may not reflect the measurement instrument (fMRI) at all, but rather highlights the failures in how we analyze and conceptualize task responses over time.
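Here's a toy numerical illustration (made-up numbers) of the ICC point: a habituation effect that differs across people drags ICC(2,1) down even when the underlying individual differences are perfectly stable.

    import numpy as np

    def icc2_1(Y):
        """ICC(2,1): two-way random effects, absolute agreement, single session."""
        n, k = Y.shape
        grand = Y.mean()
        ms_r = k * ((Y.mean(1) - grand) ** 2).sum() / (n - 1)      # between subjects
        ms_c = n * ((Y.mean(0) - grand) ** 2).sum() / (k - 1)      # between sessions
        ss_e = ((Y - Y.mean(1, keepdims=True) - Y.mean(0, keepdims=True) + grand) ** 2).sum()
        ms_e = ss_e / ((n - 1) * (k - 1))
        return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

    rng = np.random.default_rng(3)
    n, k = 40, 2
    trait = rng.normal(0, 1, n)                     # perfectly stable individual differences

    # No adaptation: trait plus measurement noise -> ICC close to 0.8
    Y0 = trait[:, None] + rng.normal(0, 0.5, (n, k))

    # Adaptation: activation drops at session 2, by a different amount for each person
    habit = rng.normal(0.8, 0.8, n)
    Y1 = trait[:, None] - habit[:, None] * np.arange(k) + rng.normal(0, 0.5, (n, k))

    print("ICC without adaptation:", round(icc2_1(Y0), 2))
    print("ICC with adaptation:   ", round(icc2_1(Y1), 2))   # much lower, despite a stable trait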

D-Machine 6 days ago | parent [-]

Yeah, when you start getting into this stuff, see your first dataset with over a hundred MRIs, and actually start manually inspecting things like skull-stripping, it is shocking how dramatically and obviously different people's brains are from each other. The nice clean little textbook drawings and other things you see in a lot of educational materials really hide just how crazy the variation is.

And yeah, part of why we need more within-subject and longitudinal designs is to get at precisely the things you mention. There is no way to know if the low ICCs we see now are in fact adaptation to the task or task generalities, if they reflect learning that isn't necessarily task-relevant adaptation (e.g. the subject is in a different mood on a later test, and this just leads to a different strategy), if the brain just changes far more than we might expect, or all sorts of other possibilities. I suspect if we ever want fMRI to yield practical or even just really useful theoretical insights, we definitely need to suss out within-subject effects that have high test-retest reliability, regardless of all these possible confounds. Likely finding such effects will involve more than just changes to analysis, but also far more rigorous experimental designs (both in terms of multi-modal data and tighter protocols, etc).

FWIW, we've also noticed a lot of magic can happen too when you suddenly have proper longitudinal data that lets you control things at the individual level.

SubiculumCode 6 days ago | parent | prev | next [-]

Yes on many of those fronts, although not all those papers support your conclusion. The field did/does too often use tasks with too few trials and too few participants. That always frustrated me, as my advisor rightly insisted we collect hundreds of participants for each study, while others would collect 20 and publish 10x faster than us.

D-Machine 6 days ago | parent | next [-]

Yes, well "almost all" is vague and needs to be qualified. Sample sizes have improved over the past decade for sure. I'm not sure if they have grown on median meaningfully, because there are still way too many low-N studies, but you do see studies now that are at least plausibly "large enough" more frequently. More open data has also helped here.

EDIT: And kudos to you and your advisor here.

EDIT2: I will also say that a lot of the research on fMRI methods is very solid and often quite reproducible. I.e. papers that pioneer new analytic methods and/or investigate pipelines and such. There is definitely a lot of fMRI research telling us a lot of interesting and likely reliable things about fMRI, but there is very little fMRI research that is telling us anything reliably generalizable about people or cognition.

SubiculumCode 6 days ago | parent [-]

I remember when resting-state had its oh-shit moment, when Power et al. (e.g. https://pubmed.ncbi.nlm.nih.gov/22019881/) showed that major findings in the literature, many of which JD Power himself had helped build, were based on residual motion artifacts. Kudos to JD Power and others like him.

D-Machine 6 days ago | parent [-]

Yes, and a great example of how so much research in fMRI methodology is just really good science working as it should.

parpfish 6 days ago | parent | prev [-]

The small sample sizes are a rational response from scientists in the face of a) funding levels and b) unreasonable expectations from hiring/promotion committees.

Cog neuro labs need to start organizing their research programs more like giant physics projects: lots of PIs pooling funding and resources together into one big experiment, rather than lots of little underpowered independent labs. But it's difficult to set up a more institutional structure like this unless there's a big shift in how we measure career advancement/success.

D-Machine 6 days ago | parent [-]

+1 to pooling funding and resources. This is desperately needed in fMRI (although site and other demographic / cultural effects make this much harder than in physics, I suspect).

leoc 6 days ago | parent [-]

I'm not an expert, but my hunch would be that a similar Big(ger) Science approach is also needed in areas like nutrition and (non-neurological) experimental psychology, where group sizes are apparently often just too small. There are obvious drawbacks to having the choice of experiments controlled by consensus and bureaucracy, but if the experiments are otherwise not worthwhile, what else is there to do?

D-Machine 6 days ago | parent [-]

I think the problems in nutrition are far, far deeper (we cannot properly control diet in most cases, and certainly not over long timeframes; we cannot track enough people long enough to measure most effects; we cannot trust the measurement i.e. self-report of what is consumed; industry biases are extremely strong; most nutrition effects are likely small and weak and/or interact strongly with genetics, making the sample size requirements larger still).

I'm not sure what you mean by "experimental psychology" though. There are areas like psychophysics that are arguably experimental and have robust findings, and there are some decent-ish studies in clinical psychology too. Here the group sizes are probably actually mostly not too bad.

Areas like social psychology have serious sample size problems, so might benefit, but this field also has serious measurement and reproducibility problems, weak experimental designs, and particularly strong ideological bias among the researchers. I'm not sure larger sample sizes would fix much of the research here.

leoc 6 days ago | parent [-]

> Areas like social psychology have serious sample size problems, so might benefit, but this field also has serious measurement and reproducibility problems, weak experimental designs, and particularly strong ideological bias among the researchers. I'm not sure larger sample sizes would fix much of the research here.

I can believe it; but a change doesn't have to be sufficient to be necessary.

D-Machine 6 days ago | parent [-]

Agreed, it is needed regardless.

ieee2 4 days ago | parent | prev | next [-]

The signal is never 100% clear in any real-world scenario... noise is everywhere. But when there is some (even very little) signal in the noise, it is no longer just noise, and the signal can be retrieved given enough data.
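E.g., the classic trial-averaging picture (toy numbers): a response 20x smaller than the noise is invisible in a single trial but emerges cleanly once you average enough trials, with SNR growing roughly as sqrt(N).

    import numpy as np

    rng = np.random.default_rng(4)
    t = np.linspace(0, 1, 200)
    signal = 0.05 * np.sin(2 * np.pi * 5 * t)   # tiny 5 Hz "evoked response", amplitude 0.05
    noise_sd = 1.0                              # noise 20x larger than the signal

    for n_trials in (1, 100, 10000):
        trials = signal + rng.normal(0, noise_sd, (n_trials, t.size))
        avg = trials.mean(0)
        # correlation of the trial-average with the true waveform climbs toward 1 as N grows
        print(n_trials, np.corrcoef(avg, signal)[0, 1].round(2))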

caycep 6 days ago | parent | prev [-]

which is why the good labs follow up fMRI results and then go in with direct neurophysiological recording...

SubiculumCode 6 days ago | parent [-]

You got downvoted, but I think you are right in a way. Direct neurophysiological recording is not a panacea, because 1) you usually can't ethically implant electrodes in your participants, and 2) recordings are usually limited in the number of electrodes or brain areas covered. That said, I think the key is "convergent evidence" that spans multiple levels and tools of analysis. That is how most progress has been made in various areas, like autism research (my current work) or memory function (my dissertation). We try to bridge evidence spanning human behavior, EEG, fMRI, structural MRI, post-mortem, electrode, and eye-tracking work, with primate and rodent models, along with neuron-cultures-in-a-dish type research. We integrate it and cross-pollinate.

caycep 5 days ago | parent [-]

There is actually a field where subjects do have electrodes implanted (see https://www.humansingleneuron.org/). This is done during pre-operative recordings in preparation for brain surgery for the treatment of epilepsy: the electrodes are already there for clinical diagnostic purposes, and patients volunteer a few hours of their time while sitting around in the hospital to participate in various cognitive tasks/paradigms. The areas you can record from are limited for sure, but some are near regions of interest depicted in fMRI scans.

Then there are also papers w/ recordings done in primates...

But overall yes - you integrate many modalities of research for a more robust theory of cognition

j45 6 days ago | parent | prev [-]

I have heard and seen good things about QEEG and fMRI as well.

jtbayly 6 days ago | parent | prev | next [-]

But none of this (signal/noise ratio, etc) is related to the topic of the article, which claims that even with good signal, blood flow is not useful to determine brain activity.

D-Machine 6 days ago | parent | prev | next [-]

The difference is that EEG can be used usefully in e.g. biofeedback training and the study of sleep phases, so there is in fact enough signal here for it to be broadly useful in some simple cases. It is not clear fMRI has enough signal for anything even as simple as these things though.

j45 6 days ago | parent [-]

I have been told QEEG can offer an additional perspective in neurofeedback, etc as well.

fMRIs are being used in TBI/concussion recovery in ways that are study-backed and seem to be delivering results.

D-Machine 6 days ago | parent | next [-]

Yes, there are a few medical cases where fMRI makes good basic sense, and TBI/concussion sounds immediately like one of those to me. I also seem to recall it being useful in some cases prior to brain surgeries and the like.

This all makes sense because fMRI tracks metabolic activity via oxygenation changes, which is much more clearly and plausibly related to tissue health and recovery. In these cases, it is also most likely being used within-subject (i.e. longitudinally) to make comparisons to baselines, rather than in an attempt to make speculative inferences about the mind using groups of people, and likely is a simple comparison to baseline rather than bespoke statistical analyses relying on questionable assumptions about the BOLD response being related to overly-specific kinds of neural activity.

j45 6 days ago | parent [-]

fMRI can track oxygenation changes, and indirectly where the blood flow is or isn't, and perhaps give some ideas on where it needs to go.

All to say, this application might not fall in the 40%.

I just find that articles like these can't help but feel like they have an agenda to undermine something, instead of simply acknowledging the kinds of things it is and isn't working for.

There's no doubt these researchers have found something, but the need for sensationalistic headlines is well known in academia as well.

Sometimes it's noticeable where the research is specific in scope, but the findings are more general and broad.

hirvi74 6 days ago | parent | prev [-]

> fMRI's are being used in TBI/Concussion recovery

Interesting. Do you happen to have any more information on this topic? I ask because I was under the impression that concussions are a functional/metabolic injury and not a structural injury, and therefore not visible on any type of fMRI, CT scan, etc. Though I haven't looked into this topic for almost half a decade, so I imagine things have likely progressed.

D-Machine 6 days ago | parent | next [-]

Well fMRI (as opposed to MRI) is used precisely because it measures things directly related to metabolism and function. Not hard to find info on this stuff: https://scholar.google.ca/scholar?hl=en&as_sdt=0%2C5&q=fMRI%...

j45 6 days ago | parent | prev [-]

Concussions seem to be pretty physiological - first they're a brain bleed, and blood doesn't seem to pump the same as it did before the concussion... resulting in different symptoms.

That might be what you're referring to as functional?

Metabolically or otherwise, if the brain can't operate normally, other things in the body, such as metabolism, would surely be impacted when the brain can't oversee and run them as it usually does?

While I'm not sure whether a concussion itself is directly visible or not (some come with sizeable enough brain bleeds that can be visible), to the extent that a concussion involves changes and issues in blood circulation, it can be visualized on fMRI etc.: where circulation isn't regular, those areas of the brain suffer.

Things have luckily progressed, and it's quite exciting.

Out of convenience, I'll share one I know about (no affiliation) that lays out their therapies and the science behind them as well.

Effectively (I hope I'm getting this accurately), it seems the signalling that the brain's blood vessels get from the blood and oxygen is also affected, which affects things downstream from there.

These guys do an fMRI baseline, have you jump on a bike, do an fMRI again, see what's not getting blood, and then give you exercises and activities for those regions of the brain. It's pretty interesting.

https://www.cognitivefxusa.com/treatment

Some reported patient outcomes: https://www.cognitivefxusa.com/our-patients

Blog links to research: https://www.cognitivefxusa.com/blog

Independently of this, I've heard QEEGs can do a similar thing, showing where brain activity is or isn't at baseline.

Plutoberth 6 days ago | parent | prev | next [-]

I'm not sure I understand. Wouldn't any prediction result above chance (in the image mind-reading study) be significant? If the study was performed correctly, I don't really need to know much about fMRI to tell whether it's an interesting result or not.

ladberg 5 days ago | parent [-]

The study misleadingly claimed to produce images from brainwaves. In reality, they effectively built a combination of a classifier from brainwaves to one of a few predetermined classes of images shown (still cool, but less impressive) and a neural net that reproduces images it was trained on given a classification (boring).

ErroneousBosh 6 days ago | parent | prev | next [-]

> When I asked "but DL is proven to be able to find pattern even in random noise, so how can you be sure this is not just overfitting to artefact?"

You've said quite a mouthful there. If you train it on a pattern, it'll see that pattern everywhere - think of the early "Deep Dream" trippy-dogs-pictures nonsense that was pervasive about eight or nine years ago.

I repaired a couple of cameras for someone who was working with a large university hospital about 15 years ago, where they were using admittedly 2010s-era "Deep Learning" to analyse biopsy scans for signs of cancer. It worked brilliantly, at least with the training materials, incredible hit rate, not too terrible false positive rate (no biggie, you're just trying to decide if you want to investigate further), really low false negative rate (if there was cancer it would spot it, for sure, and you don't want to miss that).

But on real-world patient data it went completely mental. The sample data was real-world patient data too, but on "uncontrolled" patients it was detecting cancer all over the place. It also detected cancer in pictures of the Oncology department lino floor, it detected cancer in a picture of a guy's ID badge, it detected cancer in a closeup of my car tyre, and it detected cancer in a photo of a grey overcast sky.

Aw no. Now what?

Well, that's why I looked at the camera for them. They'd photographed the biopsies with one camera on site, from "real patients", but a lot of the "clear" biopsies were from other sites.

You're ahead of me now, aren't you?

The "Deep Learning" system had in fact trained itself on a speck of shit on the sensor of one of the cameras, the one used for most of the "has cancer" biopsies and most of the "real patient under test" biopsies. If that little blob of about a dozen slightly darker pixels was present, then it must be cancer because that's what the grown-ups told it. The actual picture content was largely irrelevant because the blob was consistent across all of them.

I'm not too keen on AI in healthcare, not as a definitive "go/no-go" test thing.

pedalpete 5 days ago | parent | prev | next [-]

I think you're throwing the baby out with the bathwater, while also pointing to the missing pieces in our understanding of the brain and consciousness.

I also work in the field, specifically with sleep slow-wave enhancement.

I always felt blood flow was a weak proxy for brain activity, since brain activity is involved in all manner of operating our biological systems. So is the increased blood flow measured in fMRI a response to cognition, or to autonomic activity? What does that oxygenation actually mean?

EEG is similarly flawed when we try to equate "brainwaves" to emotions and consciousness. I think we're almost better off measuring HRV, a much simpler and more reliable measure.

I'm fascinated that so many people who discuss brainwaves think of them as actual "waves", when it is just the way we plot electrical activity that creates a visual wave-like pattern.

However, and this is specifically related to our work in sleep, we can detect slow-waves (I dislike that term; it's the synchronous firing of neurons), we are able to stimulate this restorative brain function through sensory input during sleep, and we can even create slow-waves in a lab using TMS.
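For anyone curious, the usual offline starting point for detection is nothing exotic; roughly something like this sketch (toy data, scipy assumed; real detectors use per-wave criteria on zero crossings, durations, and amplitudes rather than a bare threshold):

    import numpy as np
    from scipy.signal import butter, filtfilt

    def detect_slow_waves(eeg_uv, fs=256, band=(0.5, 4.0), amp_uv=75.0):
        """Very rough slow-wave detector: band-pass to the delta range, then flag
        samples where the filtered signal exceeds an amplitude criterion."""
        b, a = butter(2, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        delta = filtfilt(b, a, eeg_uv)
        return delta, np.abs(delta) > amp_uv

    # toy data: 30 s of "EEG" with an injected 1 Hz, 100 uV oscillation in the middle
    fs = 256
    t = np.arange(0, 30, 1 / fs)
    eeg = np.random.default_rng(5).normal(0, 20, t.size)
    eeg[10 * fs:20 * fs] += 100 * np.sin(2 * np.pi * 1.0 * t[10 * fs:20 * fs])

    delta, mask = detect_slow_waves(eeg, fs)
    print("fraction of samples flagged:", mask.mean().round(2))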

Research linked on our website [1]

I agree the industry needs to stop projecting what we hope we're seeing onto what is actually being measured, and that we don't understand enough about how the brain works, but I think completely throwing away the brain-related measures we have is going too far.

1 - https://affectablesleep.com/how-it-works#research

j-krieger 6 days ago | parent | prev | next [-]

90% of the papers I read in computer science / computer security speak of software they wrote or AI models they trained that are nowhere to be found. Not on any git host, nor via email to the authors.

aardvark92 6 days ago | parent | prev | next [-]

Saw the same thing first hand with pathology data. Image analysis is a far more straightforward problem than fMRI, but sorry, I do not trust your AI model that matches our pathologists' scoring with 98.5% accuracy. Our pathologists are literally guesstimating these numbers and can vary by like 10-20% just based on the phase of the moon, whether the pathologist has eaten lunch yet, what slides he looked at earlier that day… and that's not even accounting for inter-pathologist variation…

Also saw this IRL with a particular NGS diagnostic. The model was initially 99% accurate, the PI smelled BS, had the grad student crunch the numbers again, 96% accurate, published it, built a company around the product -> boom, 2 years later it was retracted because the data was a lot of amplified noise, spurious hits, overfitting.

I don’t know jack compared to the average HN contributor, but even I can smell the BS from a mile away in some of these biomedical AI models. Peer review is broken for highly-interdisciplinary research like this.

caycep 6 days ago | parent | prev | next [-]

There are fancier ML studies on EEG signals, but they're probably not consistent enough for clinical work. For now, the things EEG can reliably tell you are whether you're having a seizure, whether you're delirious (or in a coma), and whether you're asleep.

canjobear 6 days ago | parent | prev | next [-]

> but DL is proven to be able to find pattern even in random noise, so how can you be sure this is not just overfitting to artefact?

You test your DL decoder on held-out data. This is the common practice.
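E.g., the standard recipe is cross-validation plus a permutation test to get a chance baseline; here's a sketch with sklearn and toy noise data (names and numbers are purely illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, permutation_test_score

    rng = np.random.default_rng(6)
    X = rng.normal(size=(60, 2000))      # 60 trials x 2000 "voxels" of pure noise
    y = rng.integers(0, 2, size=60)      # random condition labels

    score, perm_scores, pvalue = permutation_test_score(
        LogisticRegression(max_iter=1000), X, y,
        cv=StratifiedKFold(5), n_permutations=200, scoring="accuracy")

    print(f"cross-validated accuracy: {score:.2f}, permutation p-value: {pvalue:.2f}")
    # With real signal you'd want the accuracy to sit well above the permutation
    # distribution; here both hover around chance, since there is nothing to decode.

Held-out evaluation protects against pure overfitting, though it doesn't by itself rule out artefacts (e.g. scanner or motion confounds) that are correlated with the labels in both the training and test sets.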
