Lesson 13: What is crawlability? – SEO for beginners training

November 8, 2019


As mentioned in the first module, a search engine like Google uses crawlers, also called spiders, robots or simply bots, to follow links. When a crawler finds your site, it starts to read the posts and pages on your site. The crawler reads the content of those posts and pages and adds the HTML version of these pages to a gigantic database, called the index. This index is updated every time the crawler visits your website and finds a new or revised version of it. Depending on how important Google deems your site and the number of changes you make on your website, the crawler comes around
more or less often.

But what exactly is crawlability then? Crawlability describes the possibilities Google has to crawl your website. These possibilities can be restricted in a number of ways by blocking the crawler from your website. If your website or a page on your website is blocked, you’re saying to Google’s crawler: “do not come here”. In most of these cases, your site or the respective page won’t turn up in the search results.

There are a few things
that could prevent Google from crawling or indexing your website.

The first method. You can use a text file named robots.txt to block Google from crawling a page. Before a search engine spider crawls any page it hasn’t encountered before, it will open the robots.txt file for that site. The robots.txt file tells the search engine which URLs on that site it’s allowed to visit. Using the robots.txt file, you can tell a spider where it cannot go on your site. What you cannot do with robots.txt is forbid a search engine from showing a URL in the search results. This means that not allowing a search engine to crawl a URL, called “blocking” it, does not mean that URL will not show up in the search results. If the search engine finds enough links to that URL, it will include it; it just won’t know what’s on that page.

The second method. You can use HTTP headers to prevent
Google from crawling and indexing a page. The HTTP response for a page contains a status code. If this status code says that a page doesn’t exist (404, for example), Google won’t crawl that page. There are several codes with different meanings. If the status code is, for example, 200, the page exists and Google can crawl it. If the status code is 307, the page has been temporarily redirected to another URL and Google won’t crawl the current URL.

The third method. You can use robots meta tags to block
Google from indexing a page. Robots meta tags are short pieces of code which tell Google what it can and can’t do. We won’t go into the details, but let’s explore the options. There are several robots meta tag values. To prevent Google from adding a page to its index, you can use the noindex value. Google will crawl that page, but won’t add it to its index. The opposite value of noindex is index, which can be used to tell Google it can add the page to its index. Another useful robots meta tag value is nofollow. If you have been paying attention, you already know that a crawler follows links on a page. The nofollow value tells the crawler not to follow any links on this specific page at all. As you may have guessed, the opposite of the nofollow value is the follow value. You don’t have to manually set the index or the follow values as they are the default
for any page the crawler will encounter.

These methods are numbered in the order in which Google runs into them when it crawls your site. Let’s run through the life of a crawler. Our crawler finds a link to a page on your site. First, it checks the robots.txt to verify whether it’s allowed on that page. If the robots.txt allows the crawler to crawl the page, it checks the HTTP response of the page for a status code. If the status code is 200, everything is OK and the crawler will crawl the entire page. Lastly, it checks the robots meta tag. If the robots meta tag allows for indexing, it adds the page to Google’s search index and the page can be found when you search in Google.

You can learn more about the technical aspects of crawlability, and how to use them, in our technical SEO course.
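
The three checks from the life of a crawler can be sketched in a few lines of Python. This is only an illustration of the order described in this lesson, not how Google's crawler is actually built. The function name crawl_decision, the user agent ExampleBot, the example.com URLs and the sample robots.txt are all made up for this sketch, and the status code and HTML are passed in by hand instead of being fetched over the network.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the values of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def crawl_decision(url, robots_txt, status_code, html, user_agent="ExampleBot"):
    """Walk through the three checks in the order given in the lesson."""
    # Check 1: robots.txt says whether the crawler may visit the URL.
    # A blocked URL can still show up in search results if enough
    # links point to it; the crawler just never reads the page.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        return "not crawled: blocked by robots.txt"

    # Check 2: only a 200 status code means there is a page to read here.
    if status_code != 200:
        return f"not crawled: status code {status_code}"

    # Check 3: the robots meta tag can still forbid indexing.
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "crawled, not indexed: noindex"

    # index and follow are the defaults, so a missing tag allows indexing.
    return "crawled and indexed"


# Hypothetical sample data for trying the function out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
NOINDEX_PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
```

With this sample data, crawl_decision("https://example.com/private/a", ROBOTS_TXT, 200, "") stops at the robots.txt check, while crawl_decision("https://example.com/post", ROBOTS_TXT, 200, NOINDEX_PAGE) gets as far as the meta tag and reports that the page is crawled but not indexed.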
