A Beginner’s Guide to Machine Learning
Are you a stranger to Machine Learning? Read our beginner's guide to machine learning to learn how to prevent fraud effectively in five simple steps.
Hubert Rachwalski, Chief Executive Officer
18 August 2017
10 min read
Nowadays, Machine Learning is being applied in nearly all areas of business: customer churn prediction, credit scoring, offer recommendation (e.g. Amazon or Netflix) and more. Machines can pilot an aircraft, drive a car, read texts and recognize their sentiment, and even write short novels or compose music. They have already beaten humans in one of the popular multiplayer cooperative games – DOTA2.
This technology has also proven to be extremely effective when it comes to fighting fraud.
...but what exactly is Machine Learning in the context of detecting fraudulent activities?
Machine Learning is a subfield of computer science that allows a machine to learn to tell fraudsters from legitimate users without being told explicitly what to look for.
Let’s dive deeper…
The idea is that there are certain characteristics of fraudulent transactions that differentiate them from legitimate ones. Machine Learning algorithms recognize patterns in the data that allow them to discern fraudsters from legitimate clients, based on thousands of pieces of information that may sometimes seem completely unrelated to a human observer. The algorithm searches for patterns in fraudsters’ behaviour, their hardware characteristics and so on.
Whenever a customer carries out a transaction, the Machine Learning model thoroughly x-rays their profile, searching for suspicious patterns.
Depending on the severity of the discovered “fraud-like” patterns, such a transaction can be accepted, blocked or handed over for a manual review. Everything is done in milliseconds.
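To make this three-way decision more concrete, here is a minimal sketch in Python. It assumes the model returns a fraud score between 0 and 1. The function name `decide` and the two threshold values are illustrative assumptions, not settings from any real system.

```python
def decide(fraud_score: float,
           review_threshold: float = 0.3,
           block_threshold: float = 0.8) -> str:
    """Map a fraud score in [0, 1] to one of three actions.

    Both thresholds are hypothetical defaults; in practice they would be
    tuned to the business objectives of each company.
    """
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError("fraud_score must be between 0 and 1")
    if fraud_score >= block_threshold:
        return "block"            # strong fraud-like patterns
    if fraud_score >= review_threshold:
        return "manual_review"    # ambiguous, hand over to an analyst
    return "accept"               # nothing suspicious found
```

With these assumed thresholds, a score of 0.05 is accepted, 0.5 goes to manual review and 0.95 is blocked.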
What makes Machine Learning so special is that it can spot fraudulent transactions with very high accuracy. Take the case of Almundo.com. This popular online travel agent from Latin America has reduced fraud, chargebacks and manual reviews by 70% thanks to Machine Learning.
Such a reduction leads to better customer experience (fewer false positives), optimisation of operational costs and a significant increase in revenue.
Machine Learning is not aimed at replacing risk managers – it provides them with a more powerful tool to do their job!
There are a few reasons why companies should consider including Machine Learning in their fraud detection strategy. I’ve selected some that I find most important.
Online fraud has become more sophisticated due to rapid advances in the technology available to fraudsters. To stay one step ahead of them, companies therefore need to analyze much more data to detect fraudulent attempts successfully. However, even a skilful analyst can take in, say, up to 10-20 pieces of information, whereas Machine Learning allows us to analyze thousands of features in the blink of an eye.
The traditional approach to fraud detection, using static rules-based systems (also known as production or expert systems) has its disadvantages which make it less effective:
And last but not least, Machine Learning makes it possible to devise a clear business strategy based on KPIs and the generated predictions of fraud attempts. You can forecast the levels of refusal, acceptance and manual review that maximize revenue. For instance, you can see how many fraudulent transactions will be caught at a given level of refusals.
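The relationship between the refusal level and the amount of fraud caught can be sketched in a few lines of Python. This is only an illustration: the function name `refusal_tradeoff` and the scored history below are made up for the example, and it assumes every transaction at or above the threshold is refused.

```python
def refusal_tradeoff(scored, thresholds):
    """For each threshold, return (threshold, refusal_rate, fraud_caught_rate).

    `scored` is a list of (fraud_score, is_fraud) pairs for past transactions
    whose true outcome is already known.
    """
    if not scored:
        raise ValueError("scored must not be empty")
    total = len(scored)
    total_fraud = sum(1 for _, is_fraud in scored if is_fraud)
    rows = []
    for t in thresholds:
        refused = [(s, f) for s, f in scored if s >= t]
        caught = sum(1 for _, f in refused if f)
        refusal_rate = len(refused) / total
        fraud_caught = caught / total_fraud if total_fraud else 0.0
        rows.append((t, refusal_rate, fraud_caught))
    return rows


# Hypothetical history: (model score, True if it turned out to be fraud).
history = [(0.95, True), (0.90, True), (0.70, False), (0.60, True),
           (0.40, False), (0.30, False), (0.20, False), (0.10, False),
           (0.05, False), (0.02, False)]

table = refusal_tradeoff(history, [0.5, 0.8])
for t, refusal_rate, fraud_caught in table:
    print(f"threshold={t:.1f} refused={refusal_rate:.0%} "
          f"fraud caught={fraud_caught:.0%}")
```

On this toy history, refusing everything scored 0.5 or higher turns away 40% of transactions and catches all three frauds, while a threshold of 0.8 refuses only 20% but lets one fraud through.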
For the purpose of this blog post, I’m presenting a simplified version of the Machine Learning process, to give you a general concept of what it is all about.
First of all, you need to determine your business objectives. Your goals may include, for instance:
Here are some common questions that need to be answered during Step 1:
On a technical level, our main goal is to predict whether a given transaction is a part of the revenue or a fraud attempt.
Imagine that you want to learn a new skill. What do you do? You look for educational information. Read books, guides, various articles, ask questions on forums, talk to professionals in this area etc.
The same applies to machines. To create fraudsters’ profiles, they need historical data about previous fraudulent events. The more data and features a company collects for analysis, the better. These could include the time, frequency or value of the transaction, the history of previous purchases, geolocation information, chargeback reports etc.
This raw data should then be cleaned and prepared in a form that machines can understand. This may take some time (usually 60% – 80% of the whole Machine Learning process) and requires certain technical skills, so it is advisable to build this competency inside your company or outsource it to an external vendor.
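As a rough illustration of what "preparing" raw data can mean, the sketch below turns one transaction and the customer's purchase history into a handful of numeric features. The field names (`timestamp`, `amount`, `chargeback`) and the chosen features are assumptions made for this example, not a prescribed schema.

```python
from datetime import datetime, timedelta


def build_features(transaction, history):
    """Derive simple numeric features from a transaction and past purchases.

    Both arguments use hypothetical dict records with the keys
    'timestamp' (datetime), 'amount' (float) and optionally 'chargeback'.
    """
    ts = transaction["timestamp"]
    earlier = [t for t in history if t["timestamp"] < ts]
    day_ago = ts - timedelta(hours=24)
    recent = [t for t in earlier if t["timestamp"] >= day_ago]
    past_amounts = [t["amount"] for t in earlier]
    avg_amount = sum(past_amounts) / len(past_amounts) if past_amounts else 0.0
    return {
        "amount": transaction["amount"],                  # value
        "hour": ts.hour,                                  # time of day
        "tx_last_24h": len(recent),                       # frequency
        "amount_vs_avg": (transaction["amount"] / avg_amount
                          if avg_amount else 0.0),        # vs. purchase history
        "prior_chargebacks": sum(1 for t in earlier if t.get("chargeback")),
    }


# Made-up records for illustration only.
past = [
    {"timestamp": datetime(2017, 8, 10, 12, 0), "amount": 50.0},
    {"timestamp": datetime(2017, 8, 17, 22, 0), "amount": 150.0},
]
new_tx = {"timestamp": datetime(2017, 8, 18, 3, 0), "amount": 400.0}
features = build_features(new_tx, past)
```

Here the new purchase is four times the customer's average spend, made at 3 a.m., with one other purchase in the preceding 24 hours. Signals like these are what the model later learns from.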
The result of Step 2 is a source dataset that will be used for further analysis (see Step 3). Below, you will find a simplified example of what one can receive as a result of data preparation. Please keep in mind, that in practice, such a dataset may include hundreds or thousands of columns and even millions of rows.
As you can see, in our example, each transaction (row) is described by a set of features (columns). The last column is called the target. It indicates whether a particular transaction turned out to be fraudulent or not. How you mark fraud in your data is up to you: the target can take a value of “1”, “F”, “Fraud” etc. Nor does it matter which transactions your business considers fraudulent; machine learning algorithms will look for patterns that separate the “1” class from the “0” class. However, it’s worth noting that the accuracy of the algorithm depends on the quality of the “Target” column. Of course, ML can also identify more than two categories, e.g. good customer, regular customer and fraudster.
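For readers who prefer code, here is a small, purely hypothetical dataset in Python, together with one way of normalising differently spelled target labels into the “1” and “0” classes. The column names, the rows and the accepted label spellings are all assumptions made for illustration.

```python
FRAUD_LABELS = {"1", "f", "fraud"}     # assumed spellings of the fraud class
LEGIT_LABELS = {"0", "n", "legit"}     # assumed spellings of the legit class


def normalize_target(raw) -> int:
    """Convert a raw target label to 1 (fraud) or 0 (legitimate)."""
    label = str(raw).strip().lower()
    if label in FRAUD_LABELS:
        return 1
    if label in LEGIT_LABELS:
        return 0
    raise ValueError(f"unknown target label: {raw!r}")


# Each row is one transaction; 'target' is the last column.
rows = [
    {"amount": 120.0,  "hour": 14, "country_mismatch": 0, "target": "0"},
    {"amount": 980.0,  "hour": 3,  "country_mismatch": 1, "target": "Fraud"},
    {"amount": 45.5,   "hour": 19, "country_mismatch": 0, "target": "N"},
    {"amount": 1500.0, "hour": 2,  "country_mismatch": 1, "target": "F"},
]

feature_names = ["amount", "hour", "country_mismatch"]
X = [[row[name] for name in feature_names] for row in rows]  # features
y = [normalize_target(row["target"]) for row in rows]        # target
```

Rejecting unknown labels instead of silently treating them as legitimate is a deliberate choice here, since the quality of the target column directly affects what the algorithm learns.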
Wait, Machine Learning what….?
Machine Learning model.
This is what the whole ML process is about, its final product. Once provided with information about a new transaction, the model will generate a recommendation stating whether you are dealing with a fraud attempt or not.
During the process of building such a model, one takes the dataset from Step 2 to find out what characterizes the transactions marked as fraudulent and what the best predictors of fraud are. As there might be hundreds of features describing transactions, customers and their behaviour, analyzing them and drawing a meaningful conclusion is not a trivial task.
This process requires proper technology and Data Scientists with the domain knowledge to decide how to combine different kinds of data, which modelling technique best suits the particular business case and data, what the best set of model parameters is, and more.
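To give a feel for what "building a model" means, below is a deliberately tiny sketch: a logistic regression trained with plain gradient descent, written in pure Python. Logistic regression is just one technique chosen here for illustration (the right one depends on the business case), and the dataset, feature names, learning rate and number of epochs are all made-up assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            pred = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = pred - label          # gradient of the log loss w.r.t. z
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_fraud_probability(model, features) -> float:
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical features: [amount scaled to 0..1, night-time?, country mismatch?]
X = [[0.10, 0, 0], [0.20, 0, 0], [0.15, 1, 0], [0.30, 0, 0],
     [0.90, 1, 1], [0.80, 1, 1], [0.70, 0, 1], [0.95, 1, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]   # target: 1 = fraud, 0 = legitimate

model = train_logistic_regression(X, y)
risky = predict_fraud_probability(model, [0.90, 1, 1])
safe = predict_fraud_probability(model, [0.10, 0, 0])
```

After training, `risky` should come out well above `safe`. A real project would use an established library, far more data and a held-out test set.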
Ok, so we have a Machine Learning model….now what?
Make it work for your business! The model should now be deployed and integrated with your IT infrastructure.
Every time a customer buys a product or service in your e-store, the data about this transaction will be sent to the model. The model will generate a recommendation, based on which your transaction system will decide whether to approve the transaction, block it or mark it for manual review.
This process is called data scoring.
But that’s not the end. During a manual review, if a fraud detection team member marks the suspicious transaction as a legitimate one (false positive), the Machine Learning model will take this information into account to make a better, more accurate decision next time.
Models working in a production environment are in a continuous feedback loop with new chargebacks and are constantly retrained so that they can detect newly emerging fraud patterns. Just as humans deprived of learning stimuli see their intellectual capabilities degrade, so do models.
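One possible shape for such a feedback loop is sketched below. The class name `FeedbackLoop`, the `retrain_after` batch size and the idea of passing in a training function are illustrative assumptions; real systems differ in how and when they retrain.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    """Collects confirmed outcomes and signals when a retrain is due."""
    retrain_after: int = 3                      # assumed batch size
    training_rows: list = field(default_factory=list)
    pending: int = 0                            # labels since last retrain

    def record(self, features, is_fraud: bool) -> bool:
        """Store a confirmed label (from a manual review or a chargeback).

        Returns True once enough new labels have accumulated.
        """
        self.training_rows.append((list(features), int(is_fraud)))
        self.pending += 1
        return self.pending >= self.retrain_after

    def retrain(self, train_fn):
        """Call train_fn(X, y) on everything collected so far."""
        X = [features for features, _ in self.training_rows]
        y = [label for _, label in self.training_rows]
        self.pending = 0
        return train_fn(X, y)


loop = FeedbackLoop(retrain_after=2)
# An analyst clears a flagged transaction: a false positive.
first_due = loop.record([0.4, 1], is_fraud=False)
# A chargeback arrives for a transaction that was accepted.
second_due = loop.record([0.9, 1], is_fraud=True)

# Stand-in for a real training routine, used only to show the call.
summary = loop.retrain(lambda X, y: {"rows": len(X),
                                     "fraud_share": sum(y) / len(y)})
```

The stand-in training function only summarises the data. In practice it would be replaced by whatever routine builds the actual model.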
As I mentioned, fraud attacks are getting more sophisticated, so more data is needed to detect fraud successfully. For instance, detailed device features (e.g. GPU capabilities, processing power, connection type, use of a virtual machine or a VPN connection) can bring a lot of new insights about the consumer and increase the accuracy of prediction.
It is recommended to look for new sources of information or to use one of the available anti-fraud systems, which gather as many as 3,000 data points and analyse them in order to create more precise and detailed fraudsters’ profiles.
I hope you have enjoyed this guide. If you would like to learn more about Machine Learning, I recommend the Visual intro to machine learning from R2D3, which will give you more insights into the topic. Also, check out the article from Harvard Business Review to learn what ML can and cannot do for your organization.
…or just drop us a line at email@example.com. We will gladly answer all your questions concerning the application of ML in your business.