
Shell Script to Detect if the IP Address Is Googlebot

Ever wondered if that IP address you keep seeing is Googlebot? Learn how to make a shell script that can detect if Googlebot is crawling your data.

Mohamed Sanaulla · Apr. 08, 17 · Tutorial

1. Introduction

Google has explained how to verify whether a given IP address belongs to Googlebot. Because the crawler's IP addresses can change over time, Google suggests doing DNS lookups with the host command on Linux instead of hard-coding a list of IPs. The steps suggested in Google's article are:

  1. Reverse DNS lookup using the IP to get the domain name.
  2. Check if the domain name contains googlebot.com or google.com. Google documents the types of bots and their names.
  3. Forward DNS lookup using the domain name obtained in step 1 to get the IP, and verify that this IP is the same as the IP you initially started with.
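
Step 2 can be sketched as a small POSIX shell function. This is only an illustration: the function name is_google_host is hypothetical, and it assumes a stricter check than "contains", namely that the name must end in .googlebot.com or .google.com, so that a name such as googlebot.com.example.net is rejected.

```shell
#!/bin/sh
# is_google_host NAME  (hypothetical helper, not part of the original script)
# Succeeds (exit status 0) when NAME ends in .googlebot.com or .google.com.
# One trailing dot, as printed by host for reverse lookups, is tolerated.
is_google_host() {
    case "${1%.}" in
        *.googlebot.com|*.google.com) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
# is_google_host "crawl-66-249-66-246.googlebot.com" && echo "looks like Google"
```

Matching on the suffix rather than on a substring is a design choice made here, since a reverse DNS name is controlled by whoever owns the IP block.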

2. Implementation Approach

I wanted to check whether a set of IPs (I had around 45) belonged to Googlebot. One option was to run host by hand for each IP, following the steps above, but that would have been tedious and time-consuming. So I came up with a simple shell script to do the job.

  1. Reverse DNS lookup to get the domain name:
    # line holds one IP address; hostName is the complete domain name
    hostName=$(host "$line" | cut -d" " -f 5)

    A sample response of the host command would be: 246.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-246.googlebot.com. We extract the domain name using the cut command shown above, which gives us hostName = crawl-66-249-66-246.googlebot.com.

  2. Forward DNS lookup to get the IP:
    hostIp=$(host "$hostName" | cut -d" " -f 4)

    A sample response to the host command, in this case, would be: crawl-66-249-66-246.googlebot.com has address 66.249.66.246. We extract the IP using the cut command shown above, which gives us hostIp = 66.249.66.246.

  3. Verify that the domain name is googlebot.com and that the IP obtained in step 2 is the same as the IP we started with in step 1. Here domainName is the second and third dot-separated fields of hostName:
    domainName=$(echo "$hostName" | cut -d"." -f 2,3)
    if [ "$line" = "$hostIp" ] && [ "$domainName" = "googlebot.com" ]
    then
        echo "Googlebot: $hostIp -> $hostName"
    fi
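
To see what each cut call picks out without touching the network, the same commands can be run against the two sample responses quoted above. This is a minimal sketch: the variables ptr_line and a_line are introduced here only to hold those sample lines.

```shell
#!/bin/sh
# Sample responses copied from the text above (no DNS query is made).
ptr_line='246.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-246.googlebot.com'
a_line='crawl-66-249-66-246.googlebot.com has address 66.249.66.246'

# Field 5 of the reverse-lookup line, split on spaces, is the host name.
hostName=$(printf '%s\n' "$ptr_line" | cut -d" " -f 5)
# Field 4 of the forward-lookup line, split on spaces, is the IP.
hostIp=$(printf '%s\n' "$a_line" | cut -d" " -f 4)
# Fields 2 and 3 of the host name, split on dots, form the domain.
domainName=$(printf '%s\n' "$hostName" | cut -d"." -f 2,3)

echo "$hostName"    # crawl-66-249-66-246.googlebot.com
echo "$hostIp"      # 66.249.66.246
echo "$domainName"  # googlebot.com
```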

3. Complete Shell Script

Suppose the IPs are stored, one per line, in a file named googlebots, for example:

66.249.66.246
66.249.66.97
66.249.64.12

The shell script is given below:

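#!/bin/sh
# Input file with one IP address per line
file="googlebots"
while read -r line
do
    # Reverse DNS lookup: the 5th field of the response is the host name
    hostName=$(host "$line" | cut -d" " -f 5)
    # Fields 2 and 3 of the host name, e.g. googlebot.com
    domainName=$(echo "$hostName" | cut -d"." -f 2,3)
    # Forward DNS lookup: the 4th field of the response is the IP
    hostIp=$(host "$hostName" | cut -d" " -f 4)
    # Report the IP only if both checks pass
    if [ "$line" = "$hostIp" ] && [ "$domainName" = "googlebot.com" ]
    then
        echo "Googlebot: $hostIp -> $hostName"
    fi
done < "$file"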
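
The script above accepts only googlebot.com names and relies on fixed field positions. A possible variant, offered as a sketch rather than as the author's method, wraps the check in a function that also accepts google.com names (as in step 2 of the introduction) and tolerates a name with several addresses. The function name verify_googlebot is hypothetical, and the sketch assumes that host prints "domain name pointer" and "has address" lines like the samples shown earlier.

```shell
#!/bin/sh
# verify_googlebot IP  (hypothetical helper)
# Prints "Googlebot: IP -> NAME" and returns 0 when the reverse lookup gives a
# googlebot.com or google.com name whose forward lookup returns IP again.
verify_googlebot() {
    ip=$1
    # Reverse lookup: take the name from the first "domain name pointer" line.
    name=$(host "$ip" | awk '/domain name pointer/ { print $NF; exit }')
    name=${name%.}    # drop the trailing dot, if present
    case "$name" in
        *.googlebot.com|*.google.com) ;;
        *) return 1 ;;
    esac
    # Forward lookup: the name may have several addresses, so search for a match.
    for addr in $(host "$name" | awk '/has address/ { print $NF }'); do
        if [ "$addr" = "$ip" ]; then
            echo "Googlebot: $ip -> $name"
            return 0
        fi
    done
    return 1
}

# Usage, assuming one IP per line in the file "googlebots":
# while read -r ip; do verify_googlebot "$ip"; done < googlebots
```

Because the function returns a status as well as printing a line, it can also be used in a condition, for example to count or filter IPs that fail verification.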



Published at DZone with permission of Mohamed Sanaulla, DZone MVB. See the original article here.
