enoent 3 hours ago

> Fraud detection in transaction data is mostly SQL. Not machine learning, not graph databases, not whatever Gartner is hyping this year. SQL, run against the right tables, with the right joins, looking for the right shapes.

It's also not all program-integrity work, which is the only domain that could justify such a blanket statement. Worse is better, as long as it addresses the problem domain.

Fintech clients are generally interested in knowing whether a transaction happening _right now_ is fraud. They want that answer in a few milliseconds, for high-dimensional data. At that scale relational databases cannot meet the real-time constraints, and instead find other uses like historical data loading. That's how you end up with in-memory databases, stream-processing engines, and yes, even machine learning.
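To make that concrete, here's a toy sketch of the kind of in-memory check that answers in microseconds where a round trip to a relational database wouldn't. Every name, feature, and threshold here is invented for illustration - real systems precompute features in a stream processor and keep them in something like Redis or an in-process store:

```python
import time

# Hypothetical precomputed feature: transactions per hour on this card,
# maintained out-of-band by a stream-processing job.
velocity_by_card = {"card_123": 14}

def score_transaction(txn):
    """Cheap rules pass over in-memory features; no disk, no network."""
    start = time.perf_counter()
    velocity = velocity_by_card.get(txn["card"], 0)
    risky = txn["amount"] > 5000 or velocity > 10   # made-up thresholds
    latency_ms = (time.perf_counter() - start) * 1000
    return ("decline" if risky else "accept", latency_ms)

decision, latency_ms = score_transaction({"card": "card_123", "amount": 50})
# Velocity 14 exceeds the threshold, so this declines despite the small amount.
# decision == "decline"
```

The point isn't the rules themselves (a real deployment would back this with a model score); it's that the whole decision path touches only local memory.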

Having said that, some of the author's points are valid, and I'm looking forward to their next posts. In particular, dealing with noisy alerts is a general problem that goes well beyond performance engineering.

beamy an hour ago | parent

In my experience, what you're describing would more specifically be called Fraud Prevention rather than Fraud Detection. Both tend to coexist and are complementary in a mature setup.

For Prevention, you're always going to be constrained by latency requirements, the data available at decision time, and an incomplete picture of user behaviour. You make a quick decision using ML and rules that deal with the majority of cases. But those constraints make it impossible to precisely prevent all fraud.

Detection deals with the downstream consequences of this. A team of analysts will typically analyse the accepted transactions for signs of fraud. This is particularly important for fraud types where you don't get an external signal like a chargeback or customer complaint. Platform integrity is one such example. Fintechs will also run into this when building anti-money-laundering systems - you need to go looking for the fraud. This is the process the article is describing.
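This retrospective "go looking for it" work is where the article's SQL claim actually lands. A minimal sketch, with an invented schema and thresholds, of the kind of query an analyst might run over already-accepted transactions - here flagging a structuring-like shape (repeated just-under-10k payments) that's a staple of AML work:

```python
import sqlite3

# In-memory database standing in for the warehouse of accepted transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (account TEXT, amount REAL, accepted INTEGER)")
conn.executemany("INSERT INTO txns VALUES (?, ?, ?)", [
    ("a1", 9900, 1), ("a1", 9800, 1), ("a1", 9950, 1),  # just-under-threshold cluster
    ("a2", 120, 1), ("a2", 60, 1),
])

# Flag accounts with three or more accepted transactions just under 10k.
rows = conn.execute("""
    SELECT account, COUNT(*) AS n
    FROM txns
    WHERE accepted = 1 AND amount BETWEEN 9000 AND 9999
    GROUP BY account
    HAVING n >= 3
""").fetchall()
# rows == [("a1", 3)]
```

No latency budget applies here, which is exactly why plain SQL is enough.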

I say they're complementary because the detected transactions become the labels for training and evaluating the next iteration of prevention models.
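The feedback loop can be sketched in a few lines - everything here (field names, the confirmed-fraud set) is made up, but it shows how analyst-confirmed detections become the supervised labels for the next prevention model:

```python
# Accepted transactions from the prevention system (invented data).
transactions = [
    {"id": 1, "amount": 9900},
    {"id": 2, "amount": 60},
]

# Output of the detection/analyst process: ids confirmed as fraud.
confirmed_fraud_ids = {1}

# Join the two to produce (features, label) pairs for model training.
training_set = [
    (txn["amount"], 1 if txn["id"] in confirmed_fraud_ids else 0)
    for txn in transactions
]
# training_set == [(9900, 1), (60, 0)]
```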