
Today everyone talks about ML and its strengths, but the reality is that it needs the proper fuel to deliver on everything AI promises. The more raw data attributes you collect about the customer’s interaction with the website, their device, and their browser, the more knowledge you can feed into your ML models, giving them a truer understanding of the user and the ability to spot suspicious behaviour. Some data attributes are out there for the taking and are widely used. Still, there is a category that is far harder to reach - in-depth data - which takes user profiling further and can markedly improve your fraud detection accuracy. What is it exactly?

Raw data collection - this is where the magic begins

Today I will introduce you to the concept of in-depth data - an additional layer of user profiling that helps you understand users’ actions more deeply and enriches ML models with contextual information. It all starts with data gathering. Raw attributes are collected from various sources: the client’s database, third-party providers, and the user’s interaction with the website. Most current anti-fraud systems receive:

  1. cookies data,
  2. some behavioral data,
  3. hardware, software, browser intelligence & network data,
  4. transaction data, purchased goods data (client’s data),
  5. linked data with the history of a given identifying attribute - email, card token/number, phone number, etc. (third-party data).

I will focus on the data points from groups 1, 2, and 3 - user data - as these are the ones that matter for understanding customers. The more unique user data attributes you gather, the better the fuel you provide to the ML algorithms and the better they can recognize impostors. But the constant challenge in the fight against online fraud is to come up with new ways to screen users’ interactions with the website or application. This is where in-depth data comes into play as an advanced method of user profiling.

What “In-depth data” means

In-depth data consists of insightful raw data attributes about the user, extracted from the website and processed into new, meaningful features that help ML improve the authentication process. It covers a variety of hard-to-reach hardware, network, and behavioural characteristics that can shed light on the user’s motives.

They can be combined into meaningful features and/or signals to enhance Machine Learning algorithms with potent and unique knowledge.
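As a rough illustration of turning raw attributes into features, the sketch below derives two numbers from keystroke timestamps. The function name `typing_features`, the feature names, and the choice of typing rhythm as the raw input are all assumptions made for this example; the article does not name specific attributes.

```python
from statistics import mean, pstdev


def typing_features(key_times_ms):
    """Turn raw keystroke timestamps (milliseconds) into candidate model features.

    Hypothetical helper: the feature names are invented for illustration.
    """
    # Gaps between consecutive key presses.
    intervals = [later - earlier
                 for earlier, later in zip(key_times_ms, key_times_ms[1:])]
    if not intervals:
        # Fewer than two keystrokes: there is no rhythm to measure.
        return {
            "keystrokes": len(key_times_ms),
            "mean_interval_ms": None,
            "interval_spread_ms": None,
        }
    return {
        "keystrokes": len(key_times_ms),
        "mean_interval_ms": mean(intervals),
        # Population standard deviation; 0 means perfectly even typing.
        "interval_spread_ms": pstdev(intervals),
    }


features = typing_features([0, 180, 310, 520, 640])
print(features)
```

Whether a given feature actually helps is something only the model and the data can tell; the point is that a list of timestamps means little on its own, while a derived value such as the spread of intervals can be fed to a model directly.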

How to gather in-depth data

To make this clearer, I will walk you through the process of finding that data.

First, security and intelligence experts specializing in computer crime study the darknet. They reverse-engineer fraudsters’ tools and techniques and study their behaviour to identify non-obvious technical or behavioural patterns. They take apart the technology that allows fraudsters to anonymize themselves in order to find a characteristic set of factors that indicates the use of a given tool. In the last stage, they add these new attributes to the list of data collected from the website. It is these new attributes that I call in-depth data - significant information from the browser, software, and hardware, or even behavioural data about the user or device, that makes it easier to distinguish impostors from real users.
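The idea of a "characteristic set of factors" that points to a given tool can be sketched as a simple rule: flag the tool when enough of its known indicators appear together in one session. Everything concrete here is hypothetical, including the `ToolSignature` class, the tool name, the attribute names, and the threshold; real indicators come from the research described above and are not listed in this article.

```python
from dataclasses import dataclass


@dataclass
class ToolSignature:
    name: str
    indicators: dict  # attribute name -> value the tool tends to leave behind
    min_matches: int  # how many indicators must co-occur before we flag


def detect_tools(session, signatures):
    """Return names of tools whose characteristic factors appear in the session."""
    hits = []
    for sig in signatures:
        matched = sum(
            1
            for attr, expected in sig.indicators.items()
            if attr in session and session[attr] == expected
        )
        if matched >= sig.min_matches:
            hits.append(sig.name)
    return hits


# Invented signature for an imaginary anonymization tool.
SIGNATURES = [
    ToolSignature(
        name="example-anonymizer",
        indicators={
            "timezone_matches_ip_country": False,
            "canvas_output_stable": False,
            "font_count": 0,
        },
        min_matches=2,
    )
]

session = {
    "timezone_matches_ip_country": False,
    "canvas_output_stable": False,
    "font_count": 42,
}
print(detect_tools(session, SIGNATURES))  # ['example-anonymizer']
```

Requiring several indicators at once, rather than any single one, reflects the point that it is the combination of factors that identifies a tool.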

At what point in the detection of fraud does in-depth data matter most?

First and foremost, it matters in the data collection process: you want to gather as many unique data attributes about users as you can, to understand their behaviour deeply and identify their true intentions. But quantity must go hand in hand with quality. Simply collecting more and more data will not work if you do not know what context it should be analyzed in or what knowledge it might carry.

Secondly, in-depth data helps in creating robust fingerprints. A fingerprint is a device-specific set of data (hardware, software, browser intelligence, and network data combined) extracted from a user session, which can be used to recognize the user’s device with high probability between visits to a website. In real-world terms, a cookie is like a car’s license plate. A fingerprint is a much richer description: "red Volkswagen Passat with a broken mirror, green spoiler, and bead seat covers." It is a kind of label that each device gets when it enters the website.
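In its simplest form, a fingerprint of this kind can be sketched as a hash over the collected attributes. This is a minimal illustration, not how any particular anti-fraud product builds its fingerprints; the function name `device_fingerprint` and the three sample attributes are assumptions, and a production fingerprint would draw on far more attributes.

```python
import hashlib
import json


def device_fingerprint(attributes):
    """Hash a set of device attributes into a stable identifier."""
    # Sort keys so the same attributes always serialize identically,
    # regardless of the order in which they were collected.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values for two visits by the same device.
first_visit = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Warsaw",
}
second_visit = {
    "timezone": "Europe/Warsaw",
    "screen": "1920x1080",
    "user_agent": "ExampleBrowser/1.0",
}

# Same attributes, different collection order: the labels match.
print(device_fingerprint(first_visit) == device_fingerprint(second_visit))  # True
```

Like the car description, the label gets more distinctive with every attribute added. The trade-off of an exact hash is that a single changed attribute produces a completely different label.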

All fraud prevention systems apply fingerprint technology in the fraud detection process, so the question is not whether you use one but how good it is (how many attributes contribute to the fingerprint). Stronger, more unique fingerprints increase accuracy in both fraud detection and user authentication. There are pre-built JavaScript fingerprint libraries that can be bought and simply plugged into an anti-fraud system, but you never know the quality of those libraries. The whole fraud detection process depends on technology precise enough to spot minor details or anomalies in users’ behaviour. For this, among other factors, you need the most accurate fingerprints, so that you can classify the user as precisely as possible. You can build those only from unique data, and in-depth data gives you that possibility.

Thirdly, in-depth data carries more knowledge about users into the models and, as I have mentioned several times, the more data you provide to the algorithms, the better the recommendations they make. Furthermore, it is not only about higher accuracy but also about stronger resilience: when the system relies on a broader range of data, no single attribute is decisive, so a fraudster who changes one attribute will not fool the system.
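The resilience argument can be illustrated with a weighted comparison of two sessions: each attribute contributes only a share of the score, so changing one of them moves the result a little rather than flipping it. The function `match_score`, the attribute names, the equal weights, and the 0.7 threshold are all assumptions chosen for this sketch.

```python
def match_score(stored, observed, weights):
    """Weighted share of attributes that agree between two sessions (0.0 to 1.0)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    agreed = sum(
        weight
        for name, weight in weights.items()
        if name in stored and name in observed and stored[name] == observed[name]
    )
    return agreed / total


# Hypothetical attributes, all weighted equally.
weights = {"user_agent": 1, "screen": 1, "timezone": 1, "gpu": 1, "fonts_hash": 1}

stored = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Warsaw",
    "gpu": "ExampleGPU",
    "fonts_hash": "abc123",
}
# The same device, but the user agent has been spoofed.
observed = dict(stored, user_agent="OtherBrowser/9.9")

score = match_score(stored, observed, weights)
print(score)         # 0.8
print(score >= 0.7)  # True: still treated as the same device
```

With only two attributes, the same spoofed value would cut the score to 0.5; with twenty, it would barely register. That is the sense in which a broader range of data makes the system harder to fool.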

Data + the right technology

Correctly collected raw data, in both quantity and quality, plays a critical role in anti-fraud systems. However, in this text I have simplified things somewhat in order to explain the very concept of in-depth data. In addition to the darknet know-how required to find these unique indicators, you also need the right technology to extract and process them: to find a way to collect new data attributes from the website, to process them into a form that is digestible for models (features) or humans (signals) - in other words, to give them meaning - and to create unique fingerprints. Not to mention the ML algorithms themselves. You need sophisticated technology that transforms those raw data attributes into real value for the business.

Data will tell you all you want to know (and more), if only you know which set to choose and what technology to use to collect and analyze it!

If you would like to know more about the topic of in-depth data, stay tuned, as I will prepare another blog post on which technology to use to extract it.

