Machine Learning for Cybercriminals, Part 2
Experts think that ML and AI will allow the bad guys to navigate around CAPTCHA protocols, automate attacks, and more in the coming years.
Welcome back! If you missed Part 1, you can check it out here!
Machine Learning for Unauthorized Access
The next step is obtaining unauthorized access to user accounts. The obvious way for cybercriminals to hijack a user's session is to compromise the account itself, and for mass hacking, one of the most annoying obstacles is CAPTCHA bypass. A number of computer programs can solve simple CAPTCHA tests, but the hardest part is object segmentation. Numerous research papers have described CAPTCHA bypass methods. One of the first machine learning examples was published on June 27, 2012, by Claudia Cruz, Fernando Uceda, and Leobardo Reyes. They used a support vector machine (SVM) to break a system running on reCAPTCHA images with 82% accuracy. CAPTCHA mechanisms were subsequently improved, but a wave of papers followed that leveraged deep learning methods to break them. In 2016, an article detailed how to break simple-captcha with 92% accuracy using deep learning.
Another piece of research used one of the latest advances in image recognition, deep residual networks with 34 layers, to break the CAPTCHA of IRCTC, a popular Indian website, with 95-98% accuracy. These articles mostly addressed character-based CAPTCHAs.
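As a rough sketch of the character-classification step these papers rely on, the following trains an SVM on scikit-learn's built-in digits dataset as a stand-in for segmented CAPTCHA characters. The dataset and hyperparameters here are illustrative assumptions, not taken from any of the cited papers:

```python
# Illustrative sketch: once a CAPTCHA is segmented into individual
# characters, breaking it reduces to image classification. The
# scikit-learn digits dataset stands in for segmented characters.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()  # 8x8 grayscale images of digits 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=42
)

# An RBF-kernel SVM, in the spirit of the early SVM-based work
clf = SVC(kernel="rbf", gamma=0.001, C=10.0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"per-character accuracy: {accuracy:.2f}")
```

On this clean toy dataset the per-character accuracy is very high; real CAPTCHAs add noise, distortion, and overlapping glyphs precisely to make the segmentation and classification steps harder.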
One of the most inspiring papers was presented at a Black Hat conference. The research, called "I Am Robot," broke the latest semantic image CAPTCHAs and compared various machine learning algorithms, reporting 98% accuracy against Google's reCAPTCHA.
To make things even worse, a newer article reports that scientists warn of forthcoming CAPTCHA bypass methods approaching 100% accuracy.
Another area where cybercriminals can benefit from machine learning is password brute-forcing.
Markov models were the first models used to generate password "guesses," back in 2005, long before deep learning became so topical. If you are familiar with current neural networks and LSTMs, you have probably heard of networks that generate text in the style of their training corpus; for example, if you feed such a network the works of Shakespeare, it will produce new text that reads like Shakespeare. The same idea can be applied to passwords: train a network on the most common passwords, and it will generate many similar ones. Researchers took this approach, applied it to passwords, and achieved results better than traditional mutation rules for building password lists, such as substituting letters with symbols, e.g. "s" with "$".
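A minimal sketch of the Markov-model idea, assuming a tiny invented list of common passwords in place of a real leaked corpus:

```python
import random
from collections import defaultdict

# Tiny training set standing in for a leaked common-password list
# (a real attack would train on millions of leaked passwords).
passwords = ["password", "password1", "passw0rd", "letmein",
             "iloveyou", "monkey123", "dragon", "sunshine1"]

ORDER = 2
START, END = "\x02", "\x03"  # sentinel characters

# Count character transitions: 2-character state -> next character
transitions = defaultdict(list)
for pw in passwords:
    padded = START * ORDER + pw + END
    for i in range(len(padded) - ORDER):
        transitions[padded[i:i + ORDER]].append(padded[i + ORDER])

def generate(rng, max_len=16):
    """Sample one password guess from the Markov chain."""
    state, out = START * ORDER, []
    while len(out) < max_len:
        nxt = rng.choice(transitions[state])
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + nxt
    return "".join(out)

rng = random.Random(7)
guesses = [generate(rng) for _ in range(5)]
print(guesses)
```

Because every transition probability comes from the training set, the generated strings share the statistical quirks of real passwords, which is exactly what makes such guess lists more effective than blind mutation rules.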
Another approach appeared in the paper "PassGAN: A Deep Learning Approach for Password Guessing," where researchers used GANs, generative adversarial networks, to generate passwords. A GAN consists of two neural networks: a generator, which produces candidate samples, and a discriminator, which tries to tell the generated samples apart from real ones. Training them against each other pushes the generator's output ever closer to the real data distribution. The core idea is to train the networks on real passwords collected from recent data breaches. And after the publication of the biggest aggregated database of 1.4 billion breached passwords, the idea looks promising for cybercriminals.
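The generator/discriminator loop can be illustrated on toy one-dimensional data. PassGAN itself uses deep networks over character sequences; everything below, from the data distribution to the learning rate, is an illustrative assumption, kept small enough to train with hand-written gradients:

```python
import numpy as np

# Toy GAN: "real" data are samples from N(4, 1); the generator is
# linear, the discriminator logistic. Only the training loop structure
# is the point here, not the model capacity.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a, b = 1.0, 0.0      # generator: g(z) = a*z + b
w, c = 0.1, 0.0      # discriminator: d(x) = sigmoid(w*x + c)
lr, batch = 0.02, 64

for step in range(5000):
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # --- discriminator step: push d(real) -> 1 and d(fake) -> 0 ---
    dr, df = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = np.mean((dr - 1.0) * real) + np.mean(df * fake)
    grad_c = np.mean(dr - 1.0) + np.mean(df)
    w -= lr * grad_w
    c -= lr * grad_c

    # --- generator step: push d(fake) -> 1 (non-saturating loss) ---
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b
    df = sigmoid(w * fake + c)
    grad_fake = (df - 1.0) * w          # d/d(fake) of -log d(fake)
    a -= lr * np.mean(grad_fake * z)
    b -= lr * np.mean(grad_fake)

print(f"generator offset after training: {b:.2f} (real mean is 4.0)")
```

As training proceeds, the generator's output distribution drifts toward the real one; in PassGAN the same dynamic pulls generated character sequences toward the statistics of leaked passwords.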
Machine Learning for Attacks
The fourth area where cybercriminals want to use machine learning is the attack itself. Overall, attacks have three general goals: espionage, sabotage, and fraud. Almost all of them are carried out with malware, spyware, ransomware, or some other malicious program that users download through phishing or that attackers plant on a victim's machine. Either way, attackers need some means of getting malware onto the victim's machine.
The use of machine learning for malware protection was probably the first commercially successful application of machine learning in cybersecurity. There are dozens of works describing different techniques for detecting malware with artificial intelligence (AI), but that's a topic for another article.
How can cybercriminals use machine learning to create malware? The first well-known example of AI for malware creation was presented in 2017 in a paper called "Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN." The authors built a network called MalGAN.
This research proposes an algorithm for generating malware examples that are able to bypass black-box machine learning-based detection models. The presented algorithm turns out to be much better than traditional gradient-based generation algorithms and can decrease the detection rate to nearly zero. The scheme is straightforward: the system takes original malware samples as input and outputs adversarial examples based on the sample plus some noise. The non-linear structure of neural networks enables them to generate more complex and flexible examples that trick the target model.
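MalGAN itself trains a generator network against a substitute detector. As a much simpler illustration of the same evasion goal, the sketch below greedily adds features to a synthetic "malware" feature vector until a stand-in detector flips its verdict. All data, feature meanings, and the detector are fabricated for illustration; only the constraint of adding (never removing) features mirrors the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_features = 20  # e.g. binary API-call indicators

# Synthetic stand-in data: malware tends to use features 0-7,
# benign software tends to use features 8-19.
def sample(n, hot, p_hot=0.8, p_cold=0.1):
    X = (rng.random((n, n_features)) < p_cold).astype(float)
    X[:, hot] = (rng.random((n, len(hot))) < p_hot).astype(float)
    return X

X = np.vstack([sample(200, range(8)), sample(200, range(8, 20))])
y = np.array([1] * 200 + [0] * 200)  # 1 = malware, 0 = benign

detector = LogisticRegression(max_iter=1000).fit(X, y)

# Evasion: starting from a detected malware sample, greedily ADD
# features (never remove any, so the malicious payload keeps working,
# the same constraint MalGAN imposes) until the verdict is "benign".
x = sample(1, range(8))[0]
added = []
while detector.predict([x])[0] == 1:
    weights = detector.coef_[0]
    candidates = [i for i in range(n_features) if x[i] == 0]
    if not candidates:
        break
    best = min(candidates, key=lambda i: weights[i])
    if weights[best] >= 0:
        break  # only detection-increasing features left to add
    x[best] = 1.0
    added.append(best)

evaded = detector.predict([x])[0] == 0
print(f"added features {added}; evaded: {evaded}")
```

The greedy loop exploits the detector's own weights, which MalGAN does not have; the GAN instead learns equivalent perturbations through black-box queries, which is what makes the attack practical against real antivirus models.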
I mentioned earlier that there are three main attack purposes, espionage, sabotage, and fraud, and that most attacks are carried out with malware. Nevertheless, there is another relatively new type of attack that can be considered sabotage, dubbed crowdturfing. Put simply, crowdturfing is the malicious use of crowdsourcing services. For example, an attacker pays workers small sums to write negative online reviews of a competing business. Because real people write them, these reviews often go undetected, since automated detection tools look for machine-generated content.
Other options include mass following, DoS attacks, or the generation of fake information. With the help of machine learning, cybercriminals can cut the cost of these attacks and automate them. The "Automated Crowdturfing Attacks and Defenses in Online Review Systems" research paper, published in September 2017, presented a system that generates fake reviews on Yelp. The advantage was not just producing reviews that couldn't be detected, but reviews that scored better than ones written by humans.
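A crude sketch of the generation idea: the paper trained a character-level RNN on millions of real Yelp reviews, while the word-bigram toy below is only meant to show how a language model stitches plausible text out of a corpus. The corpus here is invented:

```python
import random
from collections import defaultdict

# Toy corpus standing in for scraped restaurant reviews.
corpus = [
    "the food was great and the service was friendly",
    "great food and a friendly staff highly recommend",
    "the service was fast and the food was amazing",
    "amazing place the staff was great highly recommend",
]

# Word-bigram model: word -> list of observed next words
model = defaultdict(list)
for review in corpus:
    words = review.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)

def fake_review(rng, length=10):
    """Sample a fake review by walking the bigram chain."""
    word = rng.choice([r.split()[0] for r in corpus])
    out = [word]
    for _ in range(length - 1):
        if word not in model:
            break  # reached a word with no observed successor
        word = rng.choice(model[word])
        out.append(word)
    return " ".join(out)

print(fake_review(random.Random(3)))
```

With a character-level RNN and a large real corpus, the same principle yields reviews fluent enough that, per the paper, human raters scored them as more useful than genuine ones.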
Machine Learning for Cybercrime Automation
Experienced hackers can use machine learning to automate various necessary tasks. It's hard to say what exactly will be automated and when, but since cybercrime organizations have hundreds of members, different types of supporting software, e.g. support portals or support bots, could appear.
As for specific cybercrime, there is a new term, hivenet, for smart botnets. The idea is that while botnets are managed manually by cybercriminals, hivenets have a sort of brain that lets them react to particular events and change their behavior accordingly. Multiple bots sit in devices and, depending on the task, decide which of them will use the victim's resources at a given moment. It's like a chain of parasites living in an organism.
Published at DZone with permission of Alexander Polyakov, DZone MVB. See the original article here.