What has changed since the last anti-spam report?
The most important change in the anti-spam system is a new machine-learning model for spam detection, which the artificial intelligence (AI) uses to detect spammers. By improving our model training and adding more training data, we made the AI more reliable and able to detect more kinds of spammers.
The graph above shows that our detection rate increased dramatically at the end of April, when the new model was released. To make this even more visible, the picture below shows the same graph restricted to April. Here it is even clearer how a small change to the AI can affect the entire system: after the new version was released in mid-to-late April, we detected significantly more spammers.
In addition, we added a completely new spam detection method to our anti-spam system: Entity Reputation. Our anti-spam system now rests on four pillars that detect spam independently of each other:
- Rule-based system
This system detects spammers through simple rules, such as keyword analysis or the exclusion of entire IP address ranges.
- Artificial Intelligence
The AI analyzes the behavior of all users and automatically recognizes when a profile acts suspiciously.
- NEW! Entity Reputation
In mid-May, we added a completely new system as our third pillar. This system works with trust points. Each user profile involves various entities, such as the devices used, email domains, and IP addresses. Each of these entities loses trust points (a.k.a. reputation) when a large number of spam profiles can be associated with it. When a new user registers and the combination of their entities is not trustworthy enough, we immediately ask for verification. The idea behind this system is that changing all entities from one spam wave to the next is too much effort for professional spammers. As a result, spammers taint their entities with every spam wave they run.
- User reports
In addition to these three pillars, we also rely on the vigilance of our users. As a last resort, we monitor the reports about each profile that reach our system. Once a critical number is exceeded, the associated profile is either blocked directly or sent for manual review.
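The trust-point idea behind Entity Reputation can be illustrated with a minimal sketch. It is not the actual implementation: the class name, the starting trust, the penalty per spam profile, and the averaging rule for combining entities are all hypothetical assumptions chosen only to make the mechanism concrete.

```python
from collections import defaultdict

# All numbers below are illustrative assumptions, not values from the report.
START_TRUST = 100         # trust points every unseen entity starts with
PENALTY_PER_SPAMMER = 10  # points an entity loses per associated spam profile
MIN_AVERAGE_TRUST = 70    # below this average, a new registration must verify


class EntityReputation:
    def __init__(self):
        # Key: (entity kind, value), e.g. ("ip", "203.0.113.7").
        self._trust = defaultdict(lambda: START_TRUST)

    def record_spam_profile(self, entities):
        """Lower the trust of every entity tied to a confirmed spam profile."""
        for entity in entities:
            self._trust[entity] = max(0, self._trust[entity] - PENALTY_PER_SPAMMER)

    def trust(self, entity):
        """Current trust points of an entity (unseen entities are fully trusted)."""
        return self._trust.get(entity, START_TRUST)

    def needs_verification(self, entities):
        """True if the combined (here: average) trust of the entities is too low."""
        scores = [self.trust(entity) for entity in entities]
        return sum(scores) / len(scores) < MIN_AVERAGE_TRUST


reputation = EntityReputation()
spam_wave = [("device", "dev-1"), ("email_domain", "mail.example"), ("ip", "203.0.113.7")]
for _ in range(5):  # five spam profiles sharing the same entities
    reputation.record_spam_profile(spam_wave)

# A new registration that swaps only the email domain still inherits low trust.
newcomer = [("device", "dev-1"), ("email_domain", "fresh.example"), ("ip", "203.0.113.7")]
print(reputation.needs_verification(newcomer))
```

In this sketch, a spammer who changes only one entity between waves is still asked to verify, because the reused device and IP address have already lost trust.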
With these tools we managed to keep the share of spammers in the entire user base at 0.3%. Although this number is somewhat higher than in the previous quarter, it does not mean that there were more spammers in the app. Because we improved the system, we are simply able to recognize spammers that we could not detect before.
How active are spammers?
Although there are only a few spam profiles, they are more active than the average user and generate a fair number of likes, i.e. positive votes in the Match.
Spam votes account for 4.01% of all votes. However, as can be seen in the graph below, there is a continuous downward trend towards fewer spam votes.
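To put "more active than average" into numbers, the two shares from this report can be combined in a small calculation. This is only a rough sketch: it assumes that the 0.3% share of spammers and the 4.01% share of spam votes refer to the same user base and the same period, which the report does not state explicitly. The function name is hypothetical.

```python
def relative_activity(spam_user_share, spam_vote_share):
    """How many times more votes an average spam profile casts than an
    average regular profile, given both shares as fractions between 0 and 1."""
    votes_per_spam_profile = spam_vote_share / spam_user_share
    votes_per_regular_profile = (1 - spam_vote_share) / (1 - spam_user_share)
    return votes_per_spam_profile / votes_per_regular_profile


ratio = relative_activity(spam_user_share=0.003, spam_vote_share=0.0401)
print(f"{ratio:.1f}x")  # prints 13.9x
```

Under these assumptions, an average spam profile would cast roughly 14 times as many votes as an average regular profile.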
Who detects more spam: our anti-spam system or our users?
Spammers can be reported by users as well as identified automatically by the anti-spam system. Our goal is to block as many spammers as possible automatically, before the critical number of reports is reached or the support staff have to intervene manually.
In this quarter we succeeded in blocking 89% of all spammers automatically using the anti-spam system. The remaining 11%, which were reported by users, were also helpful: these reports help us to continuously improve our system. If a report reveals a new type of spam or completely new spammer behavior, we incorporate this information into the system. This lets us recognize such spammers sooner, and we can even retroactively block profiles that showed the same characteristics in the past. Our main goal, however, is for the anti-spam system to find 100% of all spammers so that no user is forced to encounter them.
What happens to a report?
A frequently asked question is why a user is not blocked immediately after the first report. The answer is simple: users report other users for a wide variety of reasons, and not all of them justify blocking the reported user. Every day we receive a huge number of reports. In this quarter alone we received 1,145,683 reports concerning 883,431 different user profiles. However, 79% of these profiles were, in fact, not spammers.
|Quarter||Reports||Reported users||Of which were not spammers|
|Q2/2017||1,145,683||883,431||79%|
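The idea of acting only once a critical number of reports is exceeded can be sketched as follows. This is a hedged illustration, not the real system: the class name, the threshold of 3, and the choice to count distinct reporters rather than raw reports are assumptions, and the sketch deliberately leaves open whether the profile is then blocked or sent for manual review.

```python
from collections import defaultdict

CRITICAL_REPORTS = 3  # hypothetical value; the report does not state the real one


class ReportMonitor:
    def __init__(self, critical_reports=CRITICAL_REPORTS):
        self.critical_reports = critical_reports
        # Distinct reporters per profile (assumption), so that a single user
        # cannot push a profile over the threshold by reporting it repeatedly.
        self._reporters = defaultdict(set)

    def add_report(self, profile_id, reporter_id):
        """Store a report; return True once the critical number is exceeded."""
        self._reporters[profile_id].add(reporter_id)
        return len(self._reporters[profile_id]) > self.critical_reports


monitor = ReportMonitor()
results = [monitor.add_report("profile-42", f"user-{i}") for i in range(4)]
print(results)  # [False, False, False, True]
```

With most reported profiles turning out not to be spammers, a threshold like this keeps a single mistaken or malicious report from blocking a real user.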
How long does it take until a spammer is detected?
We have to be very careful when blocking users automatically. On the one hand, we want to catch as many spammers as possible, but on the other hand we must not be too quick to judge. Otherwise, we risk blocking real users who only briefly exhibit spam-like behavior. This can happen, for example, when someone plays the Match very quickly. The anti-spam system therefore waits until a user has exhibited negative behavior on several occasions.
On average in this quarter, 2.4 hours passed between a profile's first event, e.g. a vote in the Match, and the moment the anti-spam system blocked it. This is a lot longer than in the last quarter. The reason is that spammers have adapted to our system: they act increasingly cautiously to stay below the radar for as long as possible, and they do so by using likes very sparingly. As a result, it can sometimes take several days until enough actions have been observed to identify a profile as spam, because the anti-spam system needs to observe negative behavior several times before it can justify blocking a profile. A consistently high number of likes, of course, is identified almost immediately, and the corresponding user is blocked within a few seconds.
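The trade-off described here, waiting for repeated negative behavior while reacting immediately to a consistently high like rate, can be sketched as below. Everything in it is an assumption made for illustration: the class name, the two thresholds, and the idea that the like rate is measured as a sustained rate over some observation window are not taken from the actual system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the report gives no concrete values.
REQUIRED_SUSPICIOUS_EVENTS = 3  # occasions of negative behavior before blocking
MAX_LIKES_PER_MINUTE = 60       # a sustained rate above this blocks immediately


@dataclass
class ProfileObserver:
    first_event_at: Optional[float] = None  # timestamp (seconds) of the first event
    suspicious_events: int = 0

    def observe(self, timestamp, likes_per_minute, suspicious):
        """Record one observation; return True if the profile should be blocked."""
        if self.first_event_at is None:
            self.first_event_at = timestamp
        if likes_per_minute > MAX_LIKES_PER_MINUTE:
            return True  # consistently high like rate: block right away
        if suspicious:
            self.suspicious_events += 1
        return self.suspicious_events >= REQUIRED_SUSPICIOUS_EVENTS

    def hours_since_first_event(self, timestamp):
        """Time from the first observed event to the given timestamp, in hours."""
        return (timestamp - self.first_event_at) / 3600


# A cautious spammer: few likes, suspicious behavior spread over time.
cautious = ProfileObserver()
for timestamp in (0, 4000, 8640):  # made-up timestamps in seconds
    blocked = cautious.observe(timestamp, likes_per_minute=1, suspicious=True)
print(blocked, cautious.hours_since_first_event(8640))
```

The more sparingly a spammer acts, the longer it takes to collect the required number of suspicious events, which is why cautious spammers push up the average time until blocking.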
Are spammers more active on male or female accounts?
As in the previous quarters, spammers predominantly used female profiles. Of all spam profiles, 80.21% were female. On average these profiles were 29 years old. The male spam profiles were on average 10 years older. Why this discrepancy exists is a mystery.
|Quarter||Female share||Male share||Avg. age female||Avg. age male|
|Q3/2016||♀ 84%||♂ 16%||♀ 29||♂ 29|
|Q4/2016||♀ 70%||♂ 30%||♀ 31||♂ 29|
|Q1/2017||♀ 77%||♂ 23%||♀ 29||♂ 29|
|Q2/2017||♀ 80%||♂ 20%||♀ 29||♂ 39|
More interesting stuff from the current quarter
We are particularly proud that we were also able to publicly shed some light on the fight against spam and fake profiles. In the last quarter, a TV report showed us achieving very good results in comparison with other dating portals (http://www.daserste.de/information/ratgeber-service/vorsicht-verbraucherfalle/sendung/falsche-flirts-single-boersen-fake-profile-100.html); this time we could even present our own success. Juan, a programmer on the anti-spam system, explained the anti-spam setup and demonstrated some machine-learning methods at the Berlin tech conference ‘BuzzWords’ (11-13 June).
If you want more information about this, you can see the full talk here:
In the end we have to say: the fight against spammers is a never-ending story. Every time you believe that you have found and eliminated them, they come up with a new trick. They often do this by imitating normal user behavior, which makes them even harder to detect.
Success (as in detected spammers) and failure (as in user complaints about profiles that have not been blocked) often go hand in hand. We take on the challenge every day and are becoming more and more efficient at detecting new types of spam.