Who's really behind the screen? Understand fraudster profiling

In the previous article, I explained the concept of in-depth data and how its use in deep fraudster profiling increases fraud detection accuracy in online business. Behind in-depth data stands the right technology. Only with it can you find those data attributes, collect them from the website, and combine them into meaningful features for ML models. With the proper technology, you can also create a complete profile of a user from declared and undeclared data the second they enter the website. In this text, you will learn more about the Profiler - the solution for in-depth profiling of online users that uncovers the true motive and intention of each user's visit.

Fraudster profiling - what it is and how it works

First, a team of highly specialised experts in darknet knowledge and computer science searches fraudsters' technology for valuable and insightful data attributes that distinguish the characteristic and unique factors of individual fraudster techniques. They do it by reverse engineering fraudsters' tools and studying their behaviour. Based on that knowledge, the Profiler team lists all the essential characteristics (in-depth data) of an online user that can tell fraudsters apart from actual customers.

The Profiler technology extracts more than 5,000 attributes about each user who lands on a page to see if they are attempting to anonymise their actions, conceal their actual location, or use automation tools to deceive your business. The technology creates a complete profile of a user from declared and undeclared data the second they enter the website. It’s an in-depth screening of a user who will no longer trick you with system emulation or spoofing.

Importantly, the Profiler is as powerful at screening mobile devices and applications as it is at screening computers.

4 groups of data gathered by the Profiler

There are four main groups of in-depth attributes that the Profiler delves into. However, the list is never final. The constant challenge is to develop new ways to screen users’ interactions with the website or application, to understand them better and always stay one step ahead of fraudsters.

  1. Hardware - These attributes expose the truth about the device itself (e.g. is the claim of a “mobile device” actually true, or is this an emulation?) and allow for precise fingerprinting.
  2. Software and browser intelligence - Understanding the device’s software environment, plugins, applications, coherence of the setup, and use of fraudulent tools.
  3. Behavioural data - These attributes show how the user interacts with the website and whether there is a human behind the session.
  4. Network data - Everything related to the user’s internet connection, proxies, VPNs, and detection of various anonymisation techniques.

Signals - a human-readable extract of the above

Out of those 5,000 attributes, the Profiler team created a list of Signals. A signal is an interpretation of information extracted from in-depth data attributes that indicates a higher probability of fraud (e.g. use of a 'Virtual Machine', an instance of 'User-Agent spoofing', or a connection through the 'Tor Network'). The Profiler today delivers more than 80 signals, and the list is growing as fraudster techniques develop.
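
To make the idea of a signal more concrete, here is a minimal Python sketch of how raw attributes might be turned into human-readable signals. It is not the Profiler's actual logic: the attribute names (vm_detected, ua_platform, js_platform, ip), the function derive_signals, and the rules themselves are hypothetical and chosen only for illustration. Only the three signal names come from the text above.

```python
def derive_signals(attrs, tor_exit_ips=frozenset()):
    """Map raw session attributes to a list of human-readable signals.

    All attribute names and rules here are illustrative assumptions,
    not the real Profiler implementation.
    """
    signals = []

    # Hypothetical boolean attribute set by some hardware-level check.
    if attrs.get("vm_detected"):
        signals.append("Virtual Machine")

    # Assumed rule: the platform claimed in the User-Agent header
    # disagrees with the platform observed in the browser itself.
    ua_platform = attrs.get("ua_platform")
    js_platform = attrs.get("js_platform")
    if ua_platform and js_platform and ua_platform != js_platform:
        signals.append("User-Agent spoofing")

    # Assumed rule: the connecting IP is on a caller-supplied list
    # of known Tor exit nodes.
    if attrs.get("ip") in tor_exit_ips:
        signals.append("Tor Network")

    return signals


# Example session using a documentation-only IP address.
session = {
    "vm_detected": True,
    "ua_platform": "iOS",
    "js_platform": "Linux",
    "ip": "203.0.113.7",
}
print(derive_signals(session, tor_exit_ips={"203.0.113.7"}))
```

A real system would likely combine many more attributes per signal; the point of the sketch is only that a signal is a readable conclusion drawn from several low-level attributes.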

Before data is thrown into ML models...

When developing most ML-based fraud detection systems, the raw data gathered from the website, application, or client must be prepared before it is sent to the models to train the algorithms. This is a crucial step and, if done right, truly enhances the power of ML and improves fraud recognition. For outstanding ML performance, you need both: a) a collection of in-depth data attributes (harder-to-reach raw data attributes extracted from the user session, found by reverse engineering fraudsters’ tools), and b) the transformation of that raw data into meaningful features that give the ML models additional context.

Why are new features so important?

Features are representations and interpretations of data that allow the models to extract knowledge. For example, the model does not know that you can compute the distance between two sets of coordinates, but if you create a new field that holds the distance between them, this is a perfect feature for the model to use. The model can then figure out whether a shorter distance points to a fraudster or to a legitimate user.
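
As a rough illustration of the distance feature, the Python sketch below computes the great-circle distance between two coordinate pairs with the haversine formula and stores it as a new field. The field names (ip_lat, ip_lon, billing_lat, billing_lon, ip_billing_distance_km) are assumptions made for this example; the article does not say which two locations are compared.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_distance_feature(record):
    """Return a copy of the record with a derived distance field.

    The input field names are hypothetical.
    """
    enriched = dict(record)
    enriched["ip_billing_distance_km"] = haversine_km(
        record["ip_lat"], record["ip_lon"],
        record["billing_lat"], record["billing_lon"],
    )
    return enriched


# Example: an IP geolocated near Warsaw, a billing address near Berlin.
row = {"ip_lat": 52.2297, "ip_lon": 21.0122,
       "billing_lat": 52.52, "billing_lon": 13.405}
print(add_distance_feature(row)["ip_billing_distance_km"])
```

The model never sees the formula, only the resulting number, which it can then weigh like any other feature.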

Another great example is the email address, where instead of combining two attributes, you use the pieces of one. If a full email address is just passed on to the model, it is typically treated as an identifier. If a lot of fraud has been committed with this email, the model will look for this identifier to detect fraud.

But that’s not why we use Machine Learning. You could get a similar result with a blacklist.

What can be done instead is to decompose the email address into separate features the model can understand, based on the knowledge of what an email address technically is. That means decoupling the first part of the email address (e.g. john.smith) from the domain (email.com) and passing on the domain as a separate feature.

That move already allows the model to spot patterns of use of these domains - e.g., identifying temporary email domains as those far more often tied to fraudulent activity.
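
A minimal Python sketch of this decomposition could look as follows. The function and feature names (split_email, email_features, email_local_part, email_domain) are made up for illustration, and the parsing is deliberately simplistic rather than a full email address validator.

```python
def split_email(email):
    """Split an address into (local part, lower-cased domain).

    Splits on the last '@'; raises ValueError if either side is empty.
    """
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a usable email address: {email!r}")
    # Domains are case-insensitive, so normalise them for the model.
    return local, domain.lower()


def email_features(email):
    """Turn one email string into separate, model-friendly fields."""
    local, domain = split_email(email)
    return {
        "email_local_part": local,
        "email_domain": domain,
    }


print(email_features("John.Smith@Email.com"))
```

With the domain in its own field, the model can learn per-domain patterns across many different addresses instead of memorising individual ones.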

That process is just one of the things that can be done with email. Another is to check if the first part of the email is a random string or not.
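
One possible way to approximate "is this a random string?" is to compute a few simple statistics over the first part of the address and let the model decide what they mean. The Python sketch below uses character entropy, digit ratio, and vowel ratio. These particular heuristics and the feature names are assumptions for this example, not a description of how the Profiler does it.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_part_features(local):
    """Simple randomness hints for the part of an email before the '@'."""
    lowered = local.lower()
    letters = [ch for ch in lowered if ch.isalpha()]
    vowels = sum(ch in "aeiou" for ch in letters)
    digits = sum(ch.isdigit() for ch in lowered)
    return {
        "local_length": len(local),
        "local_entropy": shannon_entropy(lowered),
        "local_digit_ratio": digits / len(local) if local else 0.0,
        # Human-chosen names tend to contain vowels; random strings often do not.
        "local_vowel_ratio": vowels / len(letters) if letters else 0.0,
    }


print(local_part_features("john.smith"))
print(local_part_features("x7k2q9zt"))
```

None of these numbers proves anything alone; they are meant as inputs the model can combine with other features.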

The list of such transformations is immense. The more data you have, the more combinations can be made between and within attributes. The only limits are the data scientist’s breadth of knowledge, creativity, expertise in what each attribute represents, and understanding of how the chosen ML approach interacts with the possible features.

The Profiler + ML - the real deal in the fight with a fraudster

The Profiler today is not a technology that will single-handedly detect fraud. Even as it develops and the range of available data broadens, its advantages only become fully apparent when it is combined with ML models. The two technologies are intended to complement and reinforce each other. Data Scientists (DS) build technically great models, and they know how to challenge them to get better performance. Still, they might lack insider knowledge about the world of cybercriminals or the instinct for what data to use when profiling fraudsters. Therefore, the Profiler’s role is to proactively search for and collect as much unique raw user data as possible and to facilitate the creation of new features that enhance overall performance. The models' power lies in their ability to crunch immense volumes of data and analyse them in the blink of an eye to find patterns unattainable by human analysts.

Combining client data (transaction data and purchased goods), in-depth data, and any third-party data with the accurate use of Machine Learning makes it possible to spot the most advanced patterns, home in on the fraudsters, minimise false positives, and deliver the best value - all in real time.

If you would like to learn more about Machine Learning in fraudster profiling, check our Beginner's Guide to Machine Learning in Payment Fraud Detection and Prevention. We're also happy to talk...

