The task of detecting fraudulent online payments is a perfect use case for applying machine learning algorithms that thrive in environments where data volume is high and the characteristics of fraudulent transactions cannot be easily detected using only a handful of features. Nonetheless, many fraud prevention systems still rely on hard-coded rules engines that consolidate the aggregate knowledge of fraud experts. In this piece I will shed some light on the main differences between the two approaches and which use cases fit one or the other better.
A crucial cog in the machine - the decision engine
Systems that guard merchants from fraud are a lot more than just serialized machine learning models or sets of rules expressed in code using many “if-else” statements. They involve many other engineering challenges, ranging from infrastructure to backend and frontend programming. Those challenges differ somewhat depending on the chosen decision engine and the specifics of the business sector, but they are not the main topic of this blog post. Here, we will focus on just one crucial piece, a single cog in the machine - the decision engine that determines whether a transaction is fraudulent or not.
Rules based systems
As the name suggests, these systems rely on hard-coded rules that flag transactions meeting certain criteria. Such rules can be developed by:
- following industry best practices - like blocking multiple transactions from a single account in a short period of time or the ones coming through VPNs or from risky areas,
- analyzing caught / prevented fraudulent transactions and developing new rules to cover all of their suspicious characteristics.
The rules are often expressed using “if-else” statements, present in almost all imperative programming languages, and are easily interpretable. They mirror the way in which a human would process a transaction - the engine checks whether a transaction matches any of the risky patterns expressed in the rules and, if it does, blocks it or sends it for manual review. This is one of the reasons why rule based systems are still so widespread - stakeholders trust them because the rules mimic the way the stakeholders themselves would tackle the task.
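As a rough illustration of the idea, here is a minimal sketch of such an engine in Python. The transaction fields, rule names and thresholds are all hypothetical, chosen only to mirror the examples above (transaction velocity, VPNs, risky areas); a real engine would have many more rules and a richer notion of actions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transaction:
    # Hypothetical fields chosen for illustration only.
    account_id: str
    amount: float
    via_vpn: bool
    country_risk: str      # "low" or "high"
    tx_last_10_min: int    # transactions from this account in the last 10 minutes


@dataclass
class Rule:
    name: str
    predicate: Callable[[Transaction], bool]
    action: str            # "block" or "review"


# Thresholds below are made-up examples, not recommendations.
RULES: List[Rule] = [
    Rule("velocity", lambda t: t.tx_last_10_min > 5, "block"),
    Rule("vpn", lambda t: t.via_vpn, "review"),
    Rule(
        "risky_area_high_amount",
        lambda t: t.country_risk == "high" and t.amount > 500,
        "review",
    ),
]


def evaluate(tx: Transaction, rules: List[Rule] = RULES) -> dict:
    """Check every rule and return the decision plus the rules that fired."""
    fired = [r for r in rules if r.predicate(tx)]
    if any(r.action == "block" for r in fired):
        decision = "block"
    elif fired:
        decision = "review"
    else:
        decision = "approve"
    return {"decision": decision, "fired_rules": [r.name for r in fired]}
```

Because the function returns the names of the rules that fired, every decision comes with its own explanation, which is the transparency that makes stakeholders comfortable with this approach.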
Advantages of rule based systems
- Full explainability out of the box - if a certain rule triggered an alert for a particular transaction it’s 100% transparent why this happened.
- No cold start problem - they are operational from day 1, there’s no need to gather training datasets that are required for machine learning algorithms.
- Low threshold of entry - you don’t need a team of data scientists, machine learning engineers or MLOps - first rules can be easily implemented by the backend team since they are already familiar with translating business logic into code.
Disadvantages of rule based systems
- Continuous need to reverse engineer fraudsters’ attacks - new rules have to be developed as new fraud patterns emerge.
- Ever-growing number of rules - the cost of maintenance grows over time (recalibration & adjusting to new fraud patterns).
- Detection limited to fraud cases of modest complexity - there is a practical limit to the number of rules & transaction features. Because rules are developed and maintained manually, rule based systems are bounded by human comprehension.
Machine learning models in fraud prevention
ML models address the shortcomings of rule based systems. They thrive in environments where the volume and dimensionality of data are high. Algorithms like decision trees, random forests, gradient boosting or neural networks are designed to find complex, nonlinear patterns using hundreds of transaction features (if available). Such an approach demands a shift in focus. For one, deploying ML models requires high quality, labeled historical data to serve as a training dataset. Generally, the more data you have (in terms of both the number of transactions and the number of features capturing their characteristics), the better the model will perform. In such a scenario we concentrate on keeping a record of past transactions (each described in detail by a feature vector) rather than on trying to directly understand the fraud phenomenon.
Advantages of machine learning models
- Automatic fraud pattern recognition - the task of figuring out what makes a transaction fraudulent is handled by the algorithm. Our task is to provide it with as detailed a description as possible (in the form of a feature vector).
- Concept drift, defined as a change in fraud characteristics over time (new fraud methods, new tools used by fraudsters), can often be addressed by retraining the models on new data - there’s no need to reverse engineer fraudsters’ methods.
- Less manual work involved - many of the processes can be automated. Companies that have mature machine learning pipelines spend most of the time on researching new features & algorithms while keeping an eye on performance metrics of current models available through monitoring apps.
- Economic efficiency that grows with data volume - the more data you have and the more complex it is, the harder it is to develop rule based systems. The return on developing automated fraud detection using ML models thus increases as data volume increases.
Disadvantages of machine learning models
- Cold start problem - to run ML models you need a significant amount of historical data.
- Lack of explainability out of the box - not all algorithms’ predictions can be easily explained; some models are “black boxes” for which there is no simple account of how inputs lead to outputs.
ML models deep dive
Most modern fraud prevention systems function as hybrid solutions that gather outputs from both rule based engines and machine learning models, and then propose a synthetic recommendation based on the client specific business logic. Since rule based systems mimic the reasoning process of humans let’s dive deeper into how machine learning algorithms find fraudulent traits in online traffic.
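To make the hybrid idea concrete, here is a minimal sketch of how a rule engine's output and a model's score could be merged into one recommendation. The function name, the rule names and both thresholds are assumptions made for illustration; in practice this logic is client specific.

```python
from typing import List


def hybrid_decision(
    fired_rules: List[str],
    model_score: float,
    hard_block_rules: frozenset = frozenset({"velocity"}),  # hypothetical rule name
    block_threshold: float = 0.9,    # made-up threshold
    review_threshold: float = 0.6,   # made-up threshold
) -> str:
    """Combine rule output and an ML fraud score into one recommendation."""
    # A hard rule overrides the model regardless of its score.
    if any(name in hard_block_rules for name in fired_rules):
        return "block"
    if model_score >= block_threshold:
        return "block"
    # Any other fired rule, or a moderately high score, goes to manual review.
    if fired_rules or model_score >= review_threshold:
        return "review"
    return "approve"
```

For example, `hybrid_decision(["vpn"], 0.1)` returns `"review"` because a soft rule fired even though the model saw little risk, while `hybrid_decision([], 0.95)` returns `"block"` on the strength of the model score alone.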
There has been a lot of hype around machine learning for the past few years but certain tasks, like fraud detection, remain difficult even for many novel methods and techniques. Extreme class imbalance, concept drift (defined as varying characteristics of the detected phenomenon over time), and business stakeholders’ expectations of full explainability of models’ predictions are just some examples of common difficulties.
Fraudulent transactions tend to make up a tiny fraction of traffic. This poses 2 challenges:
- Datasets need to be bigger than usual because fraudulent patterns can be observed in only a small fraction of the data.
- Since most of the traffic is legitimate, models need to be carefully calibrated so as not to “suffocate” the business with frequent false positive errors (situations in which a legitimate transaction is blocked on suspicion of fraud).
These data characteristics disqualify a range of ML algorithms. Gradient boosting methods tend to thrive in such environments thanks to the feedback loop built into the algorithm. During the iterative training process, each new step “focuses” on the parts of the data where the model was previously wrong, which helps it cope with class imbalance.
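A small worked example shows why class imbalance makes calibration tricky. The sketch below uses synthetic numbers (1,000 transactions, 1% fraud, and two made-up sets of model scores) to compare plain accuracy with precision, recall and the false positive rate.

```python
def confusion_counts(labels, scores, threshold):
    """Count true/false positives/negatives; label 1 means fraud."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(labels, scores, threshold):
    """Return accuracy, precision, recall and false positive rate."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Synthetic data: 1,000 transactions, 10 of them fraudulent (1%).
labels = [1] * 10 + [0] * 990
# A "model" that never flags anything.
never_flag = [0.0] * 1000
# A made-up model: catches 8 of 10 frauds, wrongly flags 20 legitimate transactions.
some_model = [0.9] * 8 + [0.1] * 2 + [0.9] * 20 + [0.1] * 970

print(summarize(labels, never_flag, 0.5))
print(summarize(labels, some_model, 0.5))
```

The model that never flags anything reaches 99% accuracy while catching no fraud at all. The second model has lower accuracy (97.8%) but 80% recall, and even with a false positive rate of only about 2%, just 8 of the 28 transactions it flags are actually fraudulent. This is why accuracy alone is a poor guide here and why the decision threshold needs careful tuning.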
Fraudsters play a constant “cops and robbers” game with companies working on fraud prevention software. Their toolset is growing and when a new security measure becomes a new industry standard, they quickly adapt to the situation and find new ways of being efficient at their activities. This calls for frequent retraining of ML models - one trained a year ago may not address the fraud patterns found in newer data samples.
Superiority over rule based systems
Maintaining a complex rule engine with hundreds of interdependent rules that express constantly changing fraud patterns isn’t easy and it definitely isn’t scalable. In contrast, machine learning based solutions can scale with traffic on cloud infrastructure - ideally, the only difference in cost between processing 1k and 100k transactions is the figure on the invoice from your cloud service provider. Data scientists and machine learning engineers do essentially the same job at either volume, provided they use proper tools and automate repetitive tasks like retraining models or data collection.
Automatic adaptation via retraining
Concept drift is less troublesome for machine learning based solutions. In rule based engines, changes in fraud patterns call for manual recalibration of existing rules and research-driven creation of new ones. This is manual work that can’t be easily automated. In comparison, ML models require rerunning the training on new data samples and (sometimes) coming up with new features that capture the change in the detected phenomenon. Retraining can be automated so, again, ML models tend to prove more cost-effective.
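One possible way to automate this is to monitor a metric on recently labeled transactions and retrain when it degrades. The sketch below is a toy under stated assumptions: labels for recent traffic are available (for example from chargebacks), recall is the monitored metric, and `train_threshold_model` is a deliberately simplistic stand-in for a real training pipeline. All names, data and the tolerance value are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label); label 1 means fraud


def recall(model: Callable[[Sequence[float]], int], samples: List[Sample]) -> float:
    """Share of fraudulent samples that the model flags."""
    frauds = [x for x, y in samples if y == 1]
    if not frauds:
        return 1.0  # nothing to catch in this window
    return sum(model(x) for x in frauds) / len(frauds)


def train_threshold_model(samples: List[Sample]):
    """Toy trainer: flag values closer to the fraud mean than to the legitimate mean."""
    fraud = [x[0] for x, y in samples if y == 1]
    legit = [x[0] for x, y in samples if y == 0]
    fraud_mean = sum(fraud) / len(fraud)
    legit_mean = sum(legit) / len(legit)
    return lambda x: int(abs(x[0] - fraud_mean) < abs(x[0] - legit_mean))


def maybe_retrain(model, train_fn, recent, baseline_recall, tolerance=0.1):
    """Retrain on recent data when recall falls noticeably below the baseline."""
    if recall(model, recent) < baseline_recall - tolerance:
        return train_fn(recent), True
    return model, False


# Old pattern: fraud shows up as unusually large amounts.
old_data = [([900.0], 1), ([950.0], 1), ([1000.0], 1),
            ([20.0], 0), ([40.0], 0), ([60.0], 0), ([80.0], 0)]
# New pattern: fraud shows up as many tiny test payments.
new_data = [([5.0], 1), ([6.0], 1), ([7.0], 1), ([4.0], 1),
            ([200.0], 0), ([300.0], 0), ([250.0], 0), ([400.0], 0)]

old_model = train_threshold_model(old_data)
new_model, retrained = maybe_retrain(
    old_model, train_threshold_model, new_data, baseline_recall=1.0
)
```

Here the old model misses every fraud in the new window, so the check triggers retraining. The sketch scores the retrained model on the same data it was trained on purely to keep the example short; a real pipeline would validate on held-out data before swapping models.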
Automatic detection of fraud patterns
Today, you can attend an online bootcamp that teaches you how to effectively commit fraud the same way one might attend an online course to learn programming. This means that obvious fraud patterns, expressed by rule based engines that haven’t evolved as much as ML in recent years, will be swiftly bypassed by modern fraudsters. In light of this, automatic fraud pattern detection that comes with ML models is a necessity rather than a luxury.
Power of ensembles
Many modern day ML algorithms work as ensembles (e.g. random forest, gradient boosting). This means that, under the hood, the algorithm builds numerous separate classifiers that each learn slightly different things about fraud patterns. In a random forest they are trained independently on different data subsets; in gradient boosting they are built one after another, each correcting the errors of its predecessors. When deployed, their outputs are combined (by voting or summing) into the score for every transaction, which reduces the impact of any single classifier’s bias. Compare this with a rule set composed by one analyst: if a fraudster comes from the other part of the world and is half the age of the analyst, the bias transferred from analyst to code can create a gateway for fraudsters coming from different backgrounds. Ensembles partially alleviate this single point of failure.
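The voting idea can be sketched in a few lines. The example below is a simplified form of bagging (the resampling scheme behind random forests) using one-split "decision stumps" as members; it is not a random forest or gradient boosting implementation, and the two features (amount, transactions in the last hour) and all data points are made up.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label); label 1 means fraud


def train_stump(samples: List[Sample]):
    """Pick the single feature/threshold split with the fewest training errors."""
    best = None  # (errors, feature index, threshold, label predicted above threshold)
    n_features = len(samples[0][0])
    for f in range(n_features):
        for x, _ in samples:
            thr = x[f]
            for above in (0, 1):
                errors = sum(
                    (above if xi[f] > thr else 1 - above) != yi
                    for xi, yi in samples
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, above)
    _, f, thr, above = best
    return lambda x: above if x[f] > thr else 1 - above


def train_bagged_ensemble(samples: List[Sample], n_models: int = 25, seed: int = 0):
    """Train each stump independently on its own bootstrap resample."""
    rng = random.Random(seed)
    return [
        train_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_models)
    ]


def fraud_vote_share(ensemble, x: List[float]) -> float:
    """Fraction of ensemble members that vote 'fraud' for x."""
    return sum(model(x) for model in ensemble) / len(ensemble)


# Made-up training data: [amount, transactions in the last hour].
data = [
    ([20.0, 1.0], 0), ([35.0, 2.0], 0), ([50.0, 1.0], 0), ([15.0, 3.0], 0),
    ([900.0, 8.0], 1), ([750.0, 9.0], 1), ([980.0, 7.0], 1), ([820.0, 10.0], 1),
]
ensemble = train_bagged_ensemble(data)
```

Calling `fraud_vote_share(ensemble, [1000.0, 9.0])` returns the share of stumps that consider the transaction fraudulent. Because each stump saw a different resample, an unlucky member can be wrong without deciding the outcome on its own.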
Rule based systems hold a strong advantage over ML models in terms of explainability. In such systems, there is little ambiguity over why a certain transaction was blocked. Some ML algorithms (especially deep neural networks, the most hyped of all ML techniques) work as black boxes - there is no easy way of saying why the model returned a certain value for a certain input. Fortunately, most fraud detection datasets are imbalanced and made of structured data - this means that algorithms that utilize decision trees work really well. Predictions of such models can be explained using packages like ELI5 (which stands for “Explain Like I'm 5”) that enable us to see which traits of a transaction contribute to its likelihood of being fraudulent (just like in rule based systems). Even if the algorithm is not tree-based, there are many tools that try to demystify the internal workings of those black boxes (deep neural networks included). XAI, which stands for “Explainable Artificial Intelligence”, is a field that has gained a lot of attention recently because many real-world applications of ML models demand explainability.
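To show why tree-based predictions are comparatively easy to explain, here is a minimal sketch that walks a tiny hand-written decision tree and records each comparison along the way. It does not use ELI5 or any real library; the tree, its features, thresholds and leaf scores are all hypothetical.

```python
# A tiny hand-written decision tree; features, thresholds and scores are hypothetical.
TREE = {
    "feature": "tx_last_hour", "threshold": 5,
    "low": {"leaf": 0.02},
    "high": {
        "feature": "amount", "threshold": 500.0,
        "low": {"leaf": 0.30},
        "high": {"leaf": 0.85},
    },
}


def explain(tree, tx):
    """Walk the tree and record every comparison made on the way to the leaf."""
    path = []
    node = tree
    while "leaf" not in node:
        name, thr = node["feature"], node["threshold"]
        if tx[name] > thr:
            path.append(f"{name}={tx[name]} > {thr}")
            node = node["high"]
        else:
            path.append(f"{name}={tx[name]} <= {thr}")
            node = node["low"]
    return node["leaf"], path


score, reasons = explain(TREE, {"tx_last_hour": 8, "amount": 900.0})
```

The returned path reads much like the list of fired rules in a rule based engine, which is why a single tree is easy to audit. Ensembles of many trees need dedicated tooling to aggregate such paths into per-feature contributions.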
In this piece I tried to outline the main differences between rule based engines and machine learning models. As stated above, the best setup should contain both since they are not mutually exclusive. Each of the methods has its pros and cons but it looks like the future belongs to machine learning, complemented by rule based systems. One way of looking at this is treating the machine learning model as just another rule in a rule based engine - it’s just a bit smarter, that’s all.