Types of Bot Traffic on a Public Internet Website
In this article, we are going to explore the different types of bot traffic that you might encounter on a public internet website and how you might handle them.
A bot is a software application that uses the internet to perform automated tasks. However, not all bot traffic is the same and, in practice, there are a number of different ways in which bots can interfere with the operation of a public internet website.
One of the main reasons bot traffic is a serious consideration for owners of public internet websites is that it can dramatically affect a site's performance. Left unaddressed, bot traffic can interfere with the website's ability to serve genuine users well and, ultimately, damage its reputation. Website owners also have to pay to host bot traffic, so the financial costs can be significant. For example, the website may use a web hosting company that charges based on the amount of data transmitted to visitors.
By generating large amounts of artificial traffic, bots consume the owner's bandwidth allowance and run up various costs, such as data usage and possibly even 'pay per click' charges. It is by no means unusual for a public site on the internet to receive bot traffic. However, it can be difficult to distinguish bot visitors from genuine users.
As we will explore later in this article, one technique to filter out bot traffic is to analyze the features of the traffic against the known behaviors and characteristics of different types of bots.
Crawler Bots
Crawler bots, sometimes called spider bots, are the most common type of bot. Search engines like Google and Yahoo use their own crawler bots to build and maintain their indices, but there are also countless custom-designed crawler bots, each usually created for a specific purpose. For example, a competitor might use a custom-designed crawler bot to repeatedly download pricing information from a commercial website in order to monitor price changes.
Similarly, a data analysis company might use a comparable custom-designed bot to gather useful information from the internet, such as news or weather data, on a daily basis. Crawler bots are very important to the operation of the internet. However, they can consume a lot of resources. For instance, frequent visits from many crawler bots can slow a website's response time, and the website's internet connection can become saturated while serving crawler requests. In addition, a website's operation can be affected if some of its inbound traffic comes from unfriendly sources, such as referral spam generated by some crawler bots.
Crawler bots usually make requests to a website very rapidly, often issuing more than ten times as many requests as a human user would. The website's server can detect such a high request rate and automatically block the offending IP address. Blocking has to be applied carefully, though, because some legitimate crawlers need access for the site to work correctly. For example, interactive Facebook tags such as "share" buttons depend on a correct and working URL being available to Facebook's crawler, so that users are directed to the right page when they click the button. The website can also provide a "robots.txt" file to tell crawler bots which parts of the site not to crawl. The server can generate this "robots.txt" automatically to meet the requirements of the robots exclusion protocol, which defines how the robots.txt file gives instructions to well-behaved crawler bots.
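For illustration, here is a small robots.txt in the format the robots exclusion protocol defines. The bot name and paths are invented for this example, and note that Crawl-delay is a nonstandard directive that only some crawlers honor:

```
# Block a hypothetical pricing crawler entirely
User-agent: PriceWatchBot
Disallow: /

# Ask all other crawlers to stay out of search result pages
# and to slow down between requests (where supported)
User-agent: *
Disallow: /search/
Crawl-delay: 10
```

Keep in mind that robots.txt is purely advisory: well-behaved crawlers respect it, but malicious bots simply ignore it.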
By describing the desired crawler behavior in "robots.txt", the web server can prevent well-behaved bots from consuming too much of the server's resources, making the website more responsive to users' requests. Web administrators can also use a bot's user agent string and IP address to identify it and decide whether its requests are friendly or hostile, so that preventive measures can be taken if necessary.
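To make this concrete, here is a minimal Python sketch of the reverse-then-forward DNS check that Google documents for verifying its crawler. The helper function name is my own, and the same pattern applies to other search engines that publish their crawler hostnames:

```python
import socket

def is_verified_googlebot(ip: str, user_agent: str) -> bool:
    """Check whether a request claiming to be Googlebot really comes
    from Google, using a reverse DNS lookup followed by a forward
    lookup to confirm the hostname resolves back to the same IP."""
    if "Googlebot" not in user_agent:
        return False
    try:
        # Reverse DNS: the IP should resolve to a Google-owned hostname.
        host, _, _ = socket.gethostbyaddr(ip)
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward confirmation: the hostname must resolve back to the IP.
        _, _, addresses = socket.gethostbyname_ex(host)
        return ip in addresses
    except (socket.herror, socket.gaierror):
        # No reverse record, or the forward lookup failed.
        return False
```

A bot that merely fakes the Googlebot user agent string fails this check, which is why the IP-based verification matters more than the user agent alone.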
Malicious Bots
With malicious bots, the main focus is to attack the website or exploit its resources. One of the most common types of malicious bot is the DDoS bot, or distributed denial of service bot. It generates a significant amount of junk traffic by getting a huge number of 'slave' systems, or 'zombies', to request access to the website in question, often from many different places around the world. This traffic can be difficult to filter, as it is usually very similar in makeup to ordinary user requests, just with a hugely multiplied number of hits. It is incredibly effective at overloading a server, which is usually provisioned to serve content to an expected number of legitimate users at once, spread across the globe.
When an excessive number of requests is generated by a botnet, which is the collective name for a group of 'slave' systems, legitimate requests cannot be fulfilled: the server becomes overwhelmed and slows down or stops responding entirely. Another type of malicious bot is the spam bot. These are used to propagate illegal nonsense text or advertising material across the internet with the aim of driving traffic to the originator. Spam bots launch their payloads onto forums, blogs, web forms, and wikis, as these generally allow unfiltered content. This allows the spread of material that might not usually make it past traditional search engines and web filtering. This sort of bot is relatively easy to counter, as servers can ask users to complete a simple test to confirm that a human is submitting the content. However, when spam is used as part of a larger, more complex attack that combines several types of bot-driven activity, the overall challenge of keeping a website secure increases.
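As one basic line of defense against abusive request rates, a server can throttle clients that exceed a sensible budget. Below is a minimal sketch of an in-memory token-bucket limiter in Python; the class name and the rate and burst values are my own choices, and a real deployment would need state shared across servers and would still struggle against a widely distributed botnet:

```python
import time
from collections import defaultdict

class TokenBucketLimiter:
    """Allow each client IP a steady request rate with short bursts."""

    def __init__(self, rate: float = 5.0, burst: float = 20.0):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # maximum bucket size
        # Each new IP starts with a full bucket.
        self.buckets = defaultdict(lambda: (burst, time.monotonic()))

    def allow(self, ip: str) -> bool:
        tokens, last = self.buckets[ip]
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False

limiter = TokenBucketLimiter()
if not limiter.allow("203.0.113.7"):
    pass  # e.g. respond with HTTP 429 Too Many Requests
```

Per-IP throttling stops naive flooding from a handful of sources; a true DDoS spreads its requests across so many addresses that each one stays under the limit, which is why dedicated mitigation services exist.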
These sorts of attacks are sometimes called 'compound attacks': they are made up of multiple stages, each of which might be relatively minor on its own, but which together are much more damaging than the sum of their parts.
Scraping Bots
So-called "scraping bots" use web scraping, a technique in which a computer program extracts data from output generated by another computer program. When a user views a webpage, that webpage's data is typically sent from the website's server to the user's browser, where it is rendered on the screen. Scraping bots bypass the need for a user to view the data in a web browser; the bot sends a request to the website's server, which then sends the data directly back to the bot.
The bot can then navigate and extract what it needs from the page on its own, rather than relying on a user to do so. The ability of bots to scrape data with relative ease — and without the website owner's permission — leads to significant issues for the targets of such bots. In general, website administrators prohibit data scraping in their website's terms of service.
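To make the mechanics concrete, here is a minimal sketch of how a scraping bot fetches a page directly, with no browser involved. The user agent string and the title-extraction logic are illustrative only; a real scraper would target specific data fields:

```python
import urllib.request
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Pull the <title> text out of an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# The bot requests the page itself; the server's response goes
# straight to the program instead of to a user's browser.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "ExampleScraper/1.0"},  # hypothetical bot UA
)
with urllib.request.urlopen(req) as resp:
    html = resp.read().decode("utf-8", errors="replace")

parser = TitleParser()
parser.feed(html)
print(parser.title)
```

Nothing in the HTTP exchange distinguishes this program from a browser except what it chooses to reveal, which is why scraping is so hard to block at the protocol level.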
Moreover, some states and countries have computer crime laws that prohibit unauthorized access to a computer system or the unauthorized gathering of certain data types. However, a bot operator must actually violate these laws to be held criminally liable, which many website owners find to be a significant impediment to recourse. An operator of a scraping bot can be criminally liable under the federal Computer Fraud and Abuse Act or an equivalent state law, provided the operator knew that access to the data was not authorized or that using the bot violated the terms of service of the website being scraped. However, both state and federal law enforcement actions are historically rare, particularly in cases where consumer data, rather than trade secrets or government data, is targeted.
This means most website owners who bring suit against scraping bot operators will do so in civil court based on a number of different legal theories. Given that the targets of scraping bots have a variety of recourses, the operators of such bots should proceed with caution.
Conclusion
After going through the different types of bot traffic introduced in the previous sections, it is not difficult to see that bot traffic is becoming an increasingly severe problem for public internet websites. Business competition is one of the key drivers pushing individuals and organizations with different business orientations to use bot traffic, which creates unfair competition.
Some websites use bot traffic for commercial purposes, such as pulling data from other websites and reusing the information on their own sites. Some individuals create bot traffic for self-interest and gain, such as spying and click fraud.
I hope this article serves as a valuable source to raise your awareness and understanding of this growing and complex online issue, and helps reduce the chance of your website falling victim to harmful bot traffic. Bot traffic is beneficial in some cases, but it becomes harmful when it damages businesses and society. Once again, the importance of website security and of educating the public to tackle bot traffic is self-evident. Through public awareness and a tight legal framework, I believe harmful bot traffic will one day become history.