How to configure crawl settings in WebSite Auditor

By | August 10, 2019


Hello guys! SEO PowerSuite customer care team here, and today we're going to walk through the project setup process in WebSite Auditor to show you all the crawler settings available in the tool.
Once you start up the tool, you can click 'New' to create a new project and enter the domain URL or a specific page URL. Please note that if you enter a specific page URL, the program will consider it the project domain, so only the pages that contain that URL in full will end up in the project. Basically, what the crawler does is start from scanning the URL you've entered, then follow each and every internal link to find all the pages of the site, harvesting all the resources present on those pages as well as all their outgoing links to external resources. Here at the bottom you can enable or disable the expert options to customize the crawler settings according to your needs.
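
To picture what that crawl looks like under the hood, here is a minimal Python sketch of the same idea: start from one URL, follow internal links, and record external ones. It is only a conceptual illustration with a placeholder start URL, not WebSite Auditor's actual implementation.

    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    START = "https://example.com/"          # placeholder project URL
    DOMAIN = urlparse(START).netloc

    class LinkParser(HTMLParser):
        """Collects href values from <a> tags."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    seen, queue, external = set(), [START], set()
    while queue and len(seen) < 50:          # small cap to keep the demo polite
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
        except Exception:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            full = urljoin(url, href)
            if urlparse(full).netloc == DOMAIN:
                queue.append(full)           # internal link: keep crawling
            else:
                external.add(full)           # external resource: just record it

    print(len(seen), "pages crawled,", len(external), "external links found")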
For instance, with WebSite Auditor you can crawl a website exactly as it is seen by any search engine bot, which can be extremely useful for uncovering crawl and/or indexation issues. Here in the robots instructions section you can leave the default SEO PowerSuite bot to just check the overall instructions that would apply to any random bot, or you can choose one of the predefined search engine bots, like Googlebot, to see exactly how Google sees your website. This can be very helpful for detecting pages that may have been restricted from indexation by mistake. In case you have a website that is still under development and is for now restricted for all bots, you can simply disable the option completely so the pages get collected anyway.
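
As a quick illustration, here is a hypothetical robots.txt where all bots are kept out of a staging area but Googlebot gets one extra restriction; crawling the site as Googlebot versus a generic bot would surface that difference:

    User-agent: *
    Disallow: /staging/

    User-agent: Googlebot
    Disallow: /staging/
    Disallow: /internal-search/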
After the site is crawled, you will be able to revise all the instructions that are associated with specific pages and apply to that certain bot in the Robots Instructions column. The program reports them based on the robots.txt file as well as page-level noindex tags and X-Robots-Tag directives from HTTP response headers.
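
For reference, this is what a page-level noindex tag and its HTTP-header equivalent typically look like; the exact values are just an illustration:

    <!-- page-level robots meta tag in the HTML head -->
    <meta name="robots" content="noindex, nofollow">

    # the same directive sent as an HTTP response header
    X-Robots-Tag: noindex, nofollow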
Next, crawling as a specific user agent is not necessary in most cases; however, it can be used if the content of your website depends on the user agent it is served to. It can also be used to crawl the mobile version of the website based on a mobile user agent, which you can likewise choose from the predefined list.
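
If you want to double-check outside the tool whether your server really serves different content per user agent, a small Python probe like this can tell you (the URL and the mobile user-agent string are placeholders; real search engine user-agent strings change over time):

    import urllib.request

    URL = "https://example.com/"   # placeholder page to test
    MOBILE_UA = ("Mozilla/5.0 (Linux; Android 10; Pixel 3) "
                 "AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36")

    def fetch(url, user_agent):
        # Send the request with a custom User-Agent header
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        return urllib.request.urlopen(req).read()

    desktop = fetch(URL, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
    mobile = fetch(URL, MOBILE_UA)
    print("Same content for both user agents:", desktop == mobile)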
Next, you can limit the scan depth; it controls how many clicks deep the crawler will dive. It is a time saver in case the website you are crawling is huge and you are only interested in the top-level pages or the core website structure.
The 'search for orphan pages' option will allow you to find all the pages that are unlinked from the website, meaning they have no internal links leading to them. You can set the program to search for them in the Google index, the sitemap, or both sources. This can also help you detect valid pages that are not linked from the website by mistake, and you will be able to find those pages by the corresponding Orphan Page tag in the Tags column, which will also mention the source they have been found in.
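
Conceptually, the sitemap part of that check is just a set difference between the URLs listed in the sitemap and the URLs reached through internal links. A rough Python sketch, with a placeholder sitemap URL, a hard-coded set of crawled pages, and no error handling:

    import urllib.request
    import xml.etree.ElementTree as ET

    SITEMAP = "https://example.com/sitemap.xml"   # placeholder sitemap URL
    crawled = {"https://example.com/", "https://example.com/about/"}  # pages found via internal links

    xml = urllib.request.urlopen(SITEMAP).read()
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    listed = {loc.text.strip() for loc in ET.fromstring(xml).findall(".//sm:loc", ns)}

    # URLs present in the sitemap but never reached by following internal links
    orphans = listed - crawled
    for url in sorted(orphans):
        print("possible orphan page:", url)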
The next Filtering section enables you to filter the pages and resources that should land in your project. Please keep in mind that these options won't really affect the estimated crawling time: the program will still need to crawl the website in full, as the pages that don't match the conditions may contain links to pages that do and need to be collected. Still, you can set the program to only collect the pages that contain specific words or symbols in the URL; the delimiter is a single space. You can also exclude pages the same way, and control which types of resources should be gathered into the project along with the pages. Here in the next field you can specify which extensions should be considered as web page extensions.
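
For example, with a hypothetical include filter of '/blog /products' (two terms separated by a single space) and an exclude term of '?sessionid=', the crawl results would shake out roughly like this (the labels below are descriptive, not the exact wording of the fields):

    only collect URLs containing:   /blog /products
    do not collect URLs containing: ?sessionid=

    kept:    https://example.com/blog/crawl-settings
    kept:    https://example.com/products/widget-42
    skipped: https://example.com/cart?sessionid=98765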
In the Speed section, you can limit the number of requests sent to the website per second. By default the speed is unlimited, but you may need to use this option when you are crawling a slow or highly protected website and the server starts blocking the excessive queries to prevent heavy load. Five requests per second is the recommended setup for such cases.
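
A 5-requests-per-second limit simply means at least 0.2 seconds between requests; here is a toy Python illustration of the idea, not of how WebSite Auditor actually schedules its queue:

    import time
    import urllib.request

    URLS = ["https://example.com/"] * 3   # stand-in for a crawl queue
    MAX_PER_SECOND = 5
    MIN_INTERVAL = 1.0 / MAX_PER_SECOND   # 0.2 s between requests

    for url in URLS:
        started = time.monotonic()
        urllib.request.urlopen(url).read()
        # Sleep off whatever is left of the 0.2 s slot before the next request
        elapsed = time.monotonic() - started
        if elapsed < MIN_INTERVAL:
            time.sleep(MIN_INTERVAL - elapsed)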
Next, in the URL Parameters section, you can control how the program treats dynamic URLs. If you set it to ignore all parameters, it will treat similar pages with different parameters in the URLs as the same page. You can also set it to only ignore certain parameters. Here you can also download the list of parameters used on your website from your Google Search Console account, or disable the option completely to collect each and every dynamic URL.
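
To make that concrete, with all parameters ignored, dynamic URLs like these hypothetical ones would be collected as a single page:

    https://example.com/shoes?color=red
    https://example.com/shoes?color=blue&sort=price
    https://example.com/shoes?sessionid=98765
        -> all counted as https://example.com/shoes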
Here in the last section of the advanced options, you can set the program to crawl subdomains, so that the pages that belong to subdomains are collected into the same project. In case you have any dynamically generated content on your website, meaning it uses scripts, you will need to enable the option to execute JavaScript in order to collect those pages and gather their content. If your site serves different versions depending on the user's location, you can crawl a certain regional version only by specifying the Accept-Language header (see the example after this paragraph). And in case your site or some of its pages are restricted with a password, you can enable the program to crawl them by entering your username and password here.
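
The Accept-Language value uses the standard HTTP format; for instance, to crawl a German regional version you would specify something along these lines (the locale codes are just an example and depend on what your site actually serves):

    Accept-Language: de-DE,de;q=0.9,en;q=0.5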
Once you hit Finish, the program will go on to crawl the website using the settings you have just adjusted. For any existing project, the same crawler settings are available under Preferences > Crawler Settings; they can be readjusted anytime before updating any pages, and they can also be accessed when you are rebuilding a project. So that's it.
Happy crawling!
