Search engines are used not only by people looking for information.
Bots sometimes query search engines too: to research keywords, track the positions of specific pages, or click repeatedly on contextual ads or search results to get ahead of competitors.
Such bots consume search engine resources and can also gather the user information that search engines rely on to build query suggestions and ranking algorithms.
Google has long asked webmasters not to use automated programs that check positions or submit pages: “Such programs overload servers and violate search engine usage rules.”
For this reason, many search engines have developed methods to distinguish bot queries from human queries. While tracking queries, a search engine collects a lot of user information. Beyond keywords, this can include metadata such as query time, IP address, chains of search queries, and results pages.
To determine whether a query comes from a user or a bot, the search engine uses two groups of factors: physical query parameters and behavioural characteristics.
One way to identify who issued a query is to track its physical characteristics, which include query volume and location. Unlike bots, human users cannot issue a large number of queries in a short time. Nor can one user send queries from different locations around the world simultaneously or at short intervals. This lets the search engine detect botnets, or a person who uses an anonymous search tool but has not disabled cookies. Physical parameters can therefore identify many automated queries. However, some automated queries mimic normal user queries. To distinguish those, the search engine relies on behavioural characteristics.
These include:
- click-through rate, or CTR (clicks on search results are tracked);
- search order (bots sometimes search alphabetically);
- use of spam words and queries on adult topics;
- many words in a query, especially across several consecutive queries;
- query periodicity;
- use of query operators (bots often use operators);
- narrow category range (a bot’s entire query chain falls into one or a few narrow categories).
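A few of the behavioural signals listed above can be sketched in code. This is only an illustration under stated assumptions: the function name `behavioural_signals`, the operator list in `OPERATOR_RE`, the simplified CTR (clicks per query), and the 0.1 periodicity threshold are all hypothetical and are not taken from any real search engine.

```python
import re
from statistics import mean, pstdev

# Illustrative operator list; a real system would likely track more.
OPERATOR_RE = re.compile(r'\b(?:site|inurl|intitle|filetype):|"', re.IGNORECASE)


def behavioural_signals(queries, timestamps, clicks):
    """Compute simple behavioural signals for one query chain.

    queries    -- query strings in the order they were issued
    timestamps -- issue times in seconds, same length as queries
    clicks     -- total number of clicks on search results in this chain
    """
    n = len(queries)
    if n < 3 or len(timestamps) != n:
        raise ValueError("need at least 3 queries with matching timestamps")

    # Search order: are the queries issued in alphabetical order?
    lowered = [q.lower() for q in queries]
    alphabetical = all(a <= b for a, b in zip(lowered, lowered[1:]))

    # Periodicity: nearly constant gaps between queries (assumed threshold).
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg_gap = mean(gaps)
    periodic = avg_gap > 0 and pstdev(gaps) / avg_gap < 0.1

    # Share of queries that use at least one query operator.
    operator_share = sum(bool(OPERATOR_RE.search(q)) for q in queries) / n

    return {
        "ctr": clicks / n,  # simplified: clicks per query
        "alphabetical": alphabetical,
        "periodic": periodic,
        "operator_share": operator_share,
        "avg_words": mean(len(q.split()) for q in queries),
    }
```

A chain such as `["apple pie", "banana bread site:example.com", "cherry tart"]` issued exactly every 10 seconds with no clicks would come out as alphabetical, periodic, and with zero CTR. No single signal is conclusive; each is a hint that would need to be weighed together with the others.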
When a query series seems suspicious, the search engine asks the user to answer a question or solve a CAPTCHA.
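As a rough illustration of how the physical parameters discussed earlier (query volume and location) could feed such a challenge decision, consider the sketch below. Everything in it is an assumption made for the example: the `Query` fields, the `PhysicalChecker` class, the limit of 30 queries per minute, and the 1000 km/h travel speed are hypothetical values, not figures used by any particular search engine.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

MAX_QUERIES_PER_MINUTE = 30   # assumed volume threshold
MAX_SPEED_KMH = 1000.0        # assumed limit, roughly airliner speed


@dataclass
class Query:
    client_id: str    # e.g. a cookie value identifying the client
    timestamp: float  # seconds
    lat: float        # location derived from the IP address (assumed available)
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


class PhysicalChecker:
    def __init__(self):
        self._recent = defaultdict(deque)  # client_id -> timestamps in last 60 s
        self._last = {}                    # client_id -> previous Query

    def needs_challenge(self, q: Query) -> bool:
        # Volume check: count queries in a sliding 60-second window.
        window = self._recent[q.client_id]
        window.append(q.timestamp)
        while window and q.timestamp - window[0] > 60:
            window.popleft()
        too_many = len(window) > MAX_QUERIES_PER_MINUTE

        # Location check: would the client have had to travel implausibly fast?
        prev = self._last.get(q.client_id)
        self._last[q.client_id] = q
        too_fast = False
        if prev is not None:
            hours = max(q.timestamp - prev.timestamp, 1.0) / 3600
            distance = haversine_km(prev.lat, prev.lon, q.lat, q.lon)
            too_fast = distance / hours > MAX_SPEED_KMH

        return too_many or too_fast
```

Keying the checks on a cookie-like `client_id` mirrors the point that a client who changes IP address but keeps cookies can still be recognised. A query from Moscow followed a minute later by one from New York under the same identifier would trip the location check and trigger the challenge.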