How to Prune your Website Content in an SEO Process

By | September 3, 2019

(upbeat music) ♪ Crawling Mondays! ♪ – In today’s edition of Crawling Mondays, I am going to focus on a very important topic, especially when doing an SEO process for big websites, although it applies to all websites too: content pruning. Content pruning in SEO is a question that is asked again and again, and I actually got the idea for this topic from Cyrus Shepard. Shout-out to Cyrus, and if you’re still not following him, you should definitely do it; over Twitter he’s always sharing really good SEO material. He posted this question: “When pruning old content, is it better to Noindex/Remove, or 301 Redirect to a relevant, better page? Assuming that the content cannot be refreshed/improved.” We might say, “Okay, if it is old content that at some point we wrote, shouldn’t it be worth updating?” Well, not necessarily: maybe it was written very long ago, or maybe we have another, very similar page targeting the same queries that is much more worth keeping. There are so many different criteria and factors here that the usual answer is, as with everything in SEO, “it depends.” And if you look at the replies, many SEOs answered along these lines: “If that old content is not driving value (traffic/rankings/links), it’s not worth improving/updating, it’s not that relevant, and it’s not a good potential topic, remove it. But if it is driving traffic/rankings/links, or could drive value if improved, then keep/improve/update it, or redirect it to a new version.” Realistically, a content pruning process is something that will usually drive value: it improves the perceived content quality and relevance of a website, and it can also positively impact the website’s crawl budget, in case we have a very large, old website where Google spends a lot of time crawling pages that are not really meant to rank highly anymore. It will end up having many
effects on our SEO process. So, today I would like to share with you a little flowchart that I put together to try to simplify all of these if/thens and “it depends,” and to make it easier for you to go through the different types of scenarios and outcomes that this question usually has. Hopefully it will be highly valuable for you. I will also be sharing this flowchart on my blog, so you can share it and save it, and I will link to the blog post from the video description, so it is far easier for you to get it too, so don’t worry. I will also show how to actually go through all of the steps with the help of a few tools, gathering the data for all of the metrics that we are going to use as criteria, by using certain crawlers, in this case DeepCrawl, and Screaming Frog too, and integrating many other metrics to do the best possible content assessment: to identify whether we should prune or optimize a page of our website. Let’s take a look. Let’s start answering the questions and going through the flowchart. Should you prune or optimize
a page from your website? This is the first question: is the content driving SEO value (organic traffic, rankings, links), social or referral traffic, or overall user engagement value? From all of these pages, which are actually delivering any value, not only from an SEO perspective, because they are ranking, getting traffic, or attracting links, but also playing an important role from a user perspective? If they are not driving value, then we go to the next step: is the content about a useful, relevant topic with search potential, worthy of being improved? Maybe it is not driving any value right now, but is it a worthy topic with good search potential? If not, if the topic is not good, somehow we ended up publishing something that is not really relevant at this point, then: does the content have a business/customer/legal-related goal that requires it to be kept on the site? For example, the legal notice, or the privacy policy. If not, if it has no potential as a topic, no legal requirement, and nothing to do with the business, legal, or operational side of the website, then yes, we can eliminate the page: serve a 410 HTTP status, delete any internal links going towards it, and take it out of the XML sitemaps too, because this content is just consuming our crawl budget.
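The 410 step can be sketched as a small script that turns your prune list into web server rules. This is my own illustration, not a tool from the video; the URL paths and the nginx-style output are hypothetical examples.

```python
# Sketch: turn a list of pruned URL paths into nginx "return 410" rules.
# The URL paths below are hypothetical examples, not from the video.
pruned_paths = ["/old-guide-2012/", "/outdated-glossary/"]

def gone_rules(paths):
    """Emit one nginx location block per pruned path, returning 410 Gone."""
    return "\n".join(
        f"location = {p} {{ return 410; }}" for p in paths
    )

print(gone_rules(pruned_paths))
```

Remember that serving the 410 is only part of the step: the internal links and XML sitemap entries pointing at these URLs need to be removed too.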
On the other hand, we may have found that instead of replying with a no here, we reply with a yes: yes, it is content, like a legal notice, that is not necessarily that specific from a business perspective, but it’s important from a legal standpoint. The next question will then be: is it required that the page content is findable, so that it should be kept indexable? Maybe I should have it on my website because it fulfills some type of role from a legal perspective, but it’s not required that it’s actually findable through search engines, because users won’t be searching for this type of content. If not, then it’s okay to noindex this page: we can add a meta robots noindex, follow tag to the page, and nofollow any links going towards it, because realistically we don’t want to assign any value to these pages. If these pages are not fulfilling any important business role that requires them to be findable, and they are not targeting any query that is important from a business perspective or in the customer journey, we can very well noindex them. And after the new status is processed and the noindexed pages are out of the index, we can very well block them with the robots.txt, so that Google, or search engines in general, don’t spend time crawling these pages. If there are only a few of them, this may not even be worth it, since they are not a huge consumption of crawl budget; however, if we are in a scenario where there is a high number of them, we might just stop them from being crawled to stop consuming our crawl budget.
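Before deploying a robots.txt block like this, it is worth sanity-checking it. A minimal sketch using Python’s standard library robots.txt parser; the disallowed path and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the pruned, noindexed section.
robots_txt = """\
User-agent: *
Disallow: /legal-archive/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The blocked section should not be fetchable; the rest of the site should be.
print(rp.can_fetch("Googlebot", "https://example.com/legal-archive/terms-2012"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/"))  # True
```

Note the order matters: block crawling only after the noindex has been processed, since a blocked page’s noindex tag can no longer be seen by search engines.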
On the other hand, we might answer with yes: the content has a business/customer/legal-related goal, and yes, it is required that it is findable, so we should keep it indexable within our website. If that’s the case, then we come to the next question: is the content of the page to be kept too poor or thin, not updated, not relevant enough, or not formatted to satisfy the user’s needs? Even if it is not a page meant to rank for a top query from the business side of our website, or from a product or service perspective, we want it to fulfill its role very well. If it is too poor or too thin, then it’s important to improve the page content by updating, expanding, and reformatting the information to satisfy the user’s needs, and to make the most out of the SERP features too, if that’s a possibility. Then, of course, validate that the page to be kept is correctly crawlable, indexable, and internally linked, and included in the XML sitemaps. Since it is meant to be findable, indexable, and searched for, we should deliver the best possible experience, right? In case we find that the content is not poor or thin, that it is good enough, updated enough, and relevant enough from a content perspective, then we should only validate and verify that it’s technically correctly configured within our website. Then we go to the other
side of the flowchart: the content actually is driving value of some kind. So, is the page content too similar to other pages with SEO/traffic/user value, offering the same information and/or targeting the same or similar queries? If not, this page is targeting something unique, featuring unique information and content and targeting a unique query, so we go directly to asking: is the content of the page too poor or thin, and can it be updated/improved? And we follow the same steps as before. If yes, we improve and expand the content accordingly, to make sure it provides the best possible experience and fulfills the user’s needs, and we validate that it’s correctly configured from a technical perspective: correctly included within the navigation, the XML sitemaps, etc. If not, we just validate the technical configuration. If the page is actually targeting very similar terms to other pages that are also indexable and also driving value, then we should ask whether the page has a specific role compared to the others: is this page individually targeting unique, relevant, popular queries that make it worthy of being differentiated, with its content optimized? Maybe the content at this point is very similar, but realistically it could be differentiated, because the nature of the pages is actually different, or they can be differentiated very easily: the page can be tweaked, improved, and differentiated well enough to target another topic that might also be very important for our website. If that is the case, then it is worth differentiating the content of these very similar pages, optimizing each page to target its own specific queries, and updating and formatting it in a way that will actually fulfill the user’s needs and make the most out of the SERP features. And once we do this, we will again verify that it’s technically correctly configured: that it’s correctly crawlable, indexable, internally linked, and include
it in the XML sitemaps. If we identify, however, that the page doesn’t have a specific role within the website, that it is featuring very similar or duplicate content and information, then it doesn’t make sense to differentiate it, because it has the same role as another page that is already popular and driving value. The question that we ask here is: is it necessary to keep the page enabled, because it fulfills a specific functionality or purpose for the user experience or the business within the site? This is the type of role that, for example, some filters or facets might have. If these just change the number of items shown on a page, they will generate a new URL that doesn’t necessarily connect with an actual query need, but it does play a role in the user experience of our website, so we need to keep it enabled as a specific URL that is accessible from a user perspective. However, we don’t want it on the indexable side, because it’s not targeting any unique, specific queries that are worth ranking for. So in this particular scenario, we will canonicalize these pages to the original page URL version that we identify: this is usually the one that is more prominently linked from within the website navigation, that brings more SEO traffic and user value, and that has a more prominent role in the site’s organization. So we will canonicalize back to this original URL and link accordingly: we are going to nofollow the links going towards these non-canonical URLs as much as possible, because they are not worthy of getting any value, as they are not the ones meant to rank, and eliminate them from the XML sitemaps. Once this is done, we should ask the question again: is the content of the page that is kept, in this case not the pages that are canonicalizing, but the original canonical URL they canonicalize to, too poor or thin, not updated, not relevant enough, or not formatted to satisfy the user’s needs? If yes, then we should again improve, expand, and update the content to fulfill the user’s needs and make the most out of the SERP features, etc., and then again validate from a technical perspective that the page is correctly crawlable, indexable, internally linked, included in the XML sitemaps, and so on. On the other hand, if the answer here is no, it is not necessary to keep the page enabled, and it doesn’t fulfill a specific functionality that requires it to be kept on the website, then we can consolidate these pages into a similar one, redirecting them to the one that we select as the original, new, updated one that we want to keep. So, consolidate the similar pages by 301 redirecting them to a single URL version to be kept: the one that usually drives the highest SEO and user engagement value, while showing the same or similar content and fulfilling the same or similar role from a topical, functionality, business, and user experience perspective. And we should replace and eliminate any internal links going to the URLs that are going to be redirected, and likewise eliminate them from the XML sitemaps.
Again, once we do this, for the content that is actually kept, the page that these other URLs are redirecting to, we should then ask: is the content of the page that is kept, the final destination of the redirect, too poor or thin, not updated, not relevant enough, or not formatted to satisfy the user’s needs? If so, we improve it again, and then verify and validate that all the technical
configuration is okay. So as you can see, I think that most of the common SEO scenarios in these content pruning journeys are included in this flowchart, so we can act accordingly. Of course, as I mention in the note below, remember that there might be exceptions to these criteria, as happens with everything: apply with care, and after validating against your own website, business, and goals. And this is just the first version: if you see that there is an important SEO scenario worth including that I haven’t covered, please let me know, and I’ll be more than happy to include it.
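To make the branching easier to follow, here is one way the flowchart’s decisions could be encoded. This is my own sketch of the logic described above, not an official implementation, and the field names are made up.

```python
def pruning_decision(page):
    """Sketch of the content pruning flowchart as nested decisions.

    `page` is a dict of hypothetical boolean flags gathered from crawl,
    analytics, and backlink data (e.g. drives_value, worthy_topic, ...).
    """
    if not page["drives_value"]:
        if page["worthy_topic"]:
            return "improve/update, then validate technical configuration"
        if not page["business_legal_goal"]:
            return "eliminate: 410, remove internal links and sitemap entries"
        if not page["must_be_findable"]:
            return "noindex, then optionally block via robots.txt"
        return "improve if thin; keep crawlable, indexable, linked"
    # The content is driving value of some kind.
    if not page["too_similar_to_other_pages"]:
        return "improve if thin, validate technical configuration"
    if page["has_specific_role"]:
        return "differentiate and optimize to target its own queries"
    if page["needed_for_functionality"]:
        return "canonicalize to the original URL version"
    return "consolidate: 301 redirect to the kept URL version"
```

Running each crawled URL through a function like this gives a first-pass action label that you can then review manually, since, as noted, there will be exceptions.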
Now, I’m going to show you how you can actually gather all this data and answer the questions from the flowchart very easily, in a versatile way: by using SEO crawlers to crawl our website and integrating the additional metrics that tell us what value each page brings to our website. For this we can use crawlers like Screaming Frog, as I mentioned before, or DeepCrawl, which I have here. Then, of course, we complement this data with what we can obtain from Google Search Console: for example, with the coverage report we can see which pages have been excluded from the index, and why they have been excluded if they are indexable. These are pages that are important for us to check, and we can gather data like this from Google Search Console and do specific list crawls with any of the SEO crawlers to validate those specific pages. It is also recommended to do a full crawl of all of our website content, to identify and assess all of the crawlable URLs that we have on our site. This is what I have done
already here with Remoters, which is my website project, and as you can see, we can easily integrate external data and metrics that will allow us to answer the questions from the flowchart: from a traffic perspective, Google Analytics, and Google Search Console to identify the rankings of these pages, the impressions they are getting, etc.; then Ahrefs for the link-related metrics, to understand which of these pages are also being externally linked. Not only with Screaming Frog but also with DeepCrawl, as I’m going to show you, we can integrate data from Google Search Console, Google Analytics, the backlinks, and the sitemaps too, to identify gaps, assess these pages, prioritize their evaluation, and identify the low performers.
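If you prefer to do the integration outside the crawler, the same joining step can be approximated with pandas. A minimal sketch, where the crawl, analytics, and backlink tables (and all their values) are made-up stand-ins for DeepCrawl/Screaming Frog, Google Analytics, and Ahrefs exports.

```python
import pandas as pd

# Hypothetical exports: crawl data, analytics sessions, and backlink counts.
crawl = pd.DataFrame({
    "url": ["/a/", "/b/", "/c/"],
    "indexable": [True, True, False],
    "word_count": [120, 950, 40],
})
analytics = pd.DataFrame({"url": ["/a/", "/b/"], "sessions": [0, 340]})
backlinks = pd.DataFrame({"url": ["/b/"], "referring_domains": [12]})

# Left-join everything onto the crawl, filling gaps with zero:
# a URL missing from an export simply has no sessions or links.
pages = (
    crawl.merge(analytics, on="url", how="left")
         .merge(backlinks, on="url", how="left")
         .fillna({"sessions": 0, "referring_domains": 0})
)
print(pages)
```

Left joins keep every crawled URL in the table, which matters here: the pages absent from the analytics and backlinks exports are precisely the low performers we are trying to surface.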
For example, if we go here with DeepCrawl, or Screaming Frog, or whatever SEO crawler you use, and look for the thin pages, the pages showing less content in general, we can also identify which of these pages actually get any organic search traffic: those with duplicate or very thin content, driving the least traffic, or with zero links, for example, while being indexable at the same time. Another important aspect is that any of these tools will allow us to export these pages, along with all of the metrics that we have gathered for them. For example, here I have the export from DeepCrawl: I can select the pages and see which of them are actually indexable and also duplicate (in this case none is duplicate, thankfully), and which have the lowest content-to-HTML ratio or the lowest text count. And exactly the same with the data that we can export from Screaming Frog: from all of these pages that I have crawled and exported, which are actually indexable, and which have the lowest word count? Here I will reorganize a bit and look at which pages are indexable and have a very low word count, which are getting zero GA sessions, zero traffic, and zero links, because again, I have the Google Analytics data and the Ahrefs backlinks data here. So I can see all the pages that, despite being indexable, have very little content, may have content duplication or canonicalization issues, and are delivering zero value from an organic traffic and ranking perspective.
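That "worst performers" filter can be sketched in a few lines. The thresholds and column names below are hypothetical, my own stand-ins rather than the ones DeepCrawl or Screaming Frog use in their exports.

```python
import pandas as pd

# Hypothetical merged export: crawl + GA sessions + Ahrefs links per URL.
pages = pd.DataFrame({
    "url": ["/jobs/1/", "/guide/", "/jobs/2/"],
    "indexable": [True, True, True],
    "word_count": [60, 1400, 45],
    "sessions": [0, 820, 0],
    "referring_domains": [0, 15, 0],
})

# Indexable, thin (under an assumed 200-word threshold), and driving nothing.
worst = pages[
    pages.indexable
    & (pages.word_count < 200)
    & (pages.sessions == 0)
    & (pages.referring_domains == 0)
]
print(worst.url.tolist())  # the prune/differentiate candidates
```

Each URL that survives this filter then goes through the flowchart questions to decide between eliminating, noindexing, consolidating, or differentiating it.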
By filtering like this, we can end up with a very specific set of pages to start with: the worst performers. We can see which ones these are, and unfortunately they are the jobs-type pages, which indeed have very little content, because they were generated in a very automated way; very little information, take a look. So these are the lowest performers, and it makes sense. In this case, I should decide what to do with these low performers, and we follow the flowchart: are they worth keeping? They are not driving much value right now, and the content is also too similar to the original pages, unfortunately. But yes, they do have a specific role within the website, and they target individual queries that should be worth targeting, so ideally in this case I should differentiate them, including and featuring more content to fulfill the type of query that we want to address with these pages. Hopefully, with this couple of tools and integrations, you have been able to see how to gather this information and prioritize which pages you should be improving, and which are not worth keeping and can be eliminated, in a very straightforward way, following the flowchart rules. And of course, if you have any questions or doubts, or if you want me to include any other particular type of use case that you think might be valuable, just let me know; I’ll be more than happy to hear your thoughts. Don’t forget to leave a comment, or tweet at me at @aleyda, or retweet or follow me, in case you still don’t follow me for some reason. I’ll be more than happy to reply. Thank you very much, have a good day. (upbeat rock music) ♪ Crawling Mondays! ♪

3 thoughts on “How to Prune your Website Content in an SEO Process”

  1. Usman Khurshid

    Very helpful, Aleyda. I was using the Ahrefs content audit method, which is a far simpler approach than yours, but I think we should be very careful when auditing content issues and definitely go with a more advanced setup like yours to audit content.

    BTW, here's the ahrefs video I was talking about:

  2. Matija Zajšek

    Thank you, Aleyda, for this video. I do have a question regarding this topic. You are saying that if content is similar on many different URLs, it should be consolidated or deleted. What if the consolidated topic destroys the UX? After all, if I consolidate my articles, I will go far beyond 40.000 words… No one is going to read it. Deleting also doesn’t make sense to me, since each URL brings some clicks.

