Googlebot: SEO Mythbusting

August 9, 2019

A lot of confusion revolves around SEO because no one understands how Googlebot actually works.

Hello and welcome to another episode of SEO Mythbusting. With me today is Suz Hinton from Microsoft. Suz, what do you do at work, and what is your experience with front-end and SEO?

Right now I’m doing less front-end. These days I focus more on IoT.

So in the time you were a front-end developer...

Yeah, I was a front-end developer for, I think, 12 or 13 years, so I got to work in lots of different contexts in front-end development, on different websites, things like that.

Cool.

Today I wanted to address a bunch of stuff about Google, but specifically nerd out about Googlebot, because that was the side of things that I was the most confused about at the time.

So Googlebot is basically a program that we run that does three things. The first thing is it crawls, then it indexes, and then, last but not least, there’s another thing that is not really Googlebot anymore: the ranking bit. So we have to grab the content from the internet, and then we have to figure out what this content is about and what in it we can put out to users looking for these things. And then, last but not least, there is the question of which of the many things that we picked for the index is the best thing for this particular query at this particular time, right?

Yeah, but the ranking, that last bit where we move things around, is informed by Googlebot but is not part of Googlebot.

Is that because there’s this bit in the middle, the indexing? Googlebot is responsible for the indexing and for making sure that the content is useful for the ranking engine?

Absolutely. You can imagine it like a library: someone has to figure out what the books are about and put that into a catalog, the catalog being our index, really. And then someone else uses that index to make informed decisions and says, “Here, this book is what you’re looking for.”

I’m really glad you used that analogy, because I worked in a library for four years, and I was that person. People would be like, “I want Italian cookbooks,” and I’m like, “Well, that’s at 641.5495.”

So if I came to you as a librarian and asked a very specific question, like “What is the best book on making apple pies really quickly?”, would you be able to figure that out from the index? You probably had lots of cookbooks…

We did. Yeah, we had a lot. But given that I also put lots of books back on the shelf, I knew which ones were popular. I’ve no idea if we can link this back to Googlebot.

It does. It’s pretty much that. So you have the index, which probably doesn’t really change that much unless you add new books or new editions.

Right, exactly. Yeah.

So you have this index, which Googlebot provides you with, but then we have the librarian, the second part, that figures out which books to recommend to someone asking for them, based on how the interactions with the index work. So that’s pretty much the exact same thing: someone figures out what goes into the catalog, and then someone uses it.

I love this. This makes total sense to me.

But I guess that’s still not necessarily all the answers you need, right?

Yeah, I just want to know what it actually does. How often does it crawl sites? What does it do when it gets there? How is it generally behaving? Does it behave like a web browser?

That’s a good question. Generally speaking, it behaves a little bit like a browser, at least part of it does. The very first step, the crawling bit, is pretty much a browser coming to your page, either because we found a link somewhere, or you submitted a sitemap, or something else fed that into our systems. You can use Search Console to give us a hint and ask for reindexing, and that triggers a crawl.

I’ve done that before; we asked for it to be done.

And that is perfectly fine. But the problem then, obviously, is: how often do you crawl things, how much do you have to crawl, and how much can the server bear? If you’re on the back-end side, you know that you have a bunch of load, and that might not always be the same. If it’s Black Friday, then the load is probably higher than on any other day. So what Googlebot does is try to figure out, from what we have in the index already, whether this looks like something we need to check more often. Is that probably, like, a newspaper or something?

Got it, yeah.

Or is that something like a retail site that has offerings that change every couple of weeks? Or that do not change at all, because this is actually the site of a museum that changes very rarely, for the exhibitions maybe, but a few bits and pieces don’t change that much. So we try to segregate our index data into something that we call daily or fresh, which gets crawled relatively frequently, and then crawling becomes less and less frequent for content that we find changes less often. And if it’s something that is super spammy or super broken, we might not crawl it as often. Or if you specifically tell us, “Do not index this, do not put this in the index, this is something that I don’t want to show up in the search results,” then we don’t come back every day and check, right? So you might want to use the reindex feature if that changes. You might have a page where you go, “No, this shouldn’t be here,” and then once it has to be there, you want to make sure that we come back and index it again.

So that’s the browser bit. That’s the crawler part. But then a whole slew of stuff happens between us fetching the content from your server and the index having the data that is then served and ranked. The first thing is we have to discover whether you have any other resources on your page. The crawling cycle is very important. The moment we have some HTML from you, we check if there are any links in there, or images for that matter, or video, something that we want to crawl as well, and that feeds right back into the crawling mechanism. Now, if you have a gigantic retail site, let’s say, just hypothetically speaking, we can’t just crawl all the pages at once, partly because of our own resource constraints.

But also, we don’t want to overwhelm your server. So we basically try to figure out how much strain we can put on your server and how many resources we’ve got available as well. That’s often called the crawl budget, but it’s pretty tricky to determine. So one thing that we do is crawl a little bit and then ramp it up, and when we start seeing errors, we ramp it down a little bit more. So it’s like, “Oh, sorry for that,” whenever your server serves us 500 errors. And there are certain tools in Search Console that allow you to say, “Hey, can you maybe chill out a little bit?” But generally we don’t try to get all of it at once and then ramp down. We try to carefully ramp up, ramp down again, ramp up again, ramp down, so it fluctuates a little bit.

There’s a lot more detail in there than I was even expecting. I guess I never considered that a Googlebot crawling event could put strain on somebody’s website. That sounds like it’s a lot more common than I even thought.

It does happen, especially if we discover, say, a page that has lots of links to subpages. Then all of these go into the crawling queue.

Got it.

And then these have links too. Let’s say you have 30 different categories of stuff, and each of these has a few thousand products, and then a few thousand pages of products. So we might go, “Oh, cool, crawl,” and then we might crawl a few hundred thousand pages, and if we don’t spread that out a little bit...

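
The ramp-up and ramp-down behaviour described above can be illustrated with a small sketch. This is not how Googlebot is implemented; it is a hypothetical additive-increase, multiplicative-decrease controller, and the class name, starting rate, step size, and backoff factor are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CrawlRateController:
    """Hypothetical crawl-rate limiter: speed up gently, back off on 5xx errors."""

    rate: float = 1.0      # requests per second currently allowed (assumed start)
    min_rate: float = 0.1  # never drop below this rate
    max_rate: float = 10.0 # never exceed this rate
    step_up: float = 0.5   # added to the rate after each healthy response
    backoff: float = 0.5   # rate is multiplied by this after a server error

    def record(self, status_code: int) -> float:
        """Update the allowed rate from one response and return the new rate."""
        if status_code >= 500:
            # The server is struggling: ramp down.
            self.rate = max(self.min_rate, self.rate * self.backoff)
        else:
            # The server coped fine: ramp up carefully.
            self.rate = min(self.max_rate, self.rate + self.step_up)
        return self.rate


controller = CrawlRateController()
history = [controller.record(code) for code in (200, 200, 200, 503, 200)]
print(history)  # [1.5, 2.0, 2.5, 1.25, 1.75]
```

The rate climbs while responses are healthy, halves on the 503, and then starts climbing again, which is the fluctuation the conversation describes.
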
So it’s a weird balance, right? On one hand, if you add a new product, you want that to be surfaced in Search as quickly as possible. On the other hand, you don’t want us to take all the bandwidth that you serve. I mean, cloud computing makes that a little less scary, I guess. But I remember the days, I’m not sure if you remember them, when you had to call someone, and they asked you to send a form or fax a form, and then two weeks later you got the confirmation letter that your server had been set up.

Yes, I remember the days when we would have to call, and then we would basically pay $200 to have a human go down the aisles and push the physical reset button on the server. So yeah, those times.

And then imagine you’re renting five servers somewhere in a data center, and that taking a week, and then we come in and scoop up all your bandwidth. “Hey, we’re offline today because Google has its crawl day.” That’s not what we want.

Yeah. These days it’s more of a Hacker News kind of moment.

Yeah, exactly.

So it feels like you’re much more considerate.

And yeah, we try to not overwhelm anyone, and we respect robots.txt, so that works within the crawl step as well. And once we have the content, we can’t put strain on your infrastructure anymore, so that’s fantastic. But with modern web apps being mostly JavaScript, we then put that in a queue, and once we have the resources to render it, we actually use another headless-browser kind of thing. We call that the Web Rendering Service. Then there are other crawlers as well that might not have the capacity or the need to run JavaScript. Social media bots, for instance: they come and look for metadata. If that meta tag is coming in through JavaScript, you usually have a bad time, and they’re just like, “Sorry.”

Yeah, so that’s always been a big myth. I remember when single-page applications, or SPAs, really came into vogue, a lot of people were really concerned.

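
As a rough illustration of what respecting robots.txt means for a crawler, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt content, the `example.com` URLs, and the `may_crawl` helper are made up for this example; real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network request is needed.
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("Googlebot", "https://example.com/drafts/post"))
print(may_crawl("Googlebot", "https://example.com/private/page"))
print(may_crawl("SomeOtherBot", "https://example.com/private/page"))
```

In this sketch the group that names Googlebot takes precedence over the `*` group for that crawler, so only `/drafts/` is off limits to it, while other bots fall back to the `*` rules.
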
There was a lot of FUD around it: if crawlers in general don’t execute JavaScript, then they’re going to see a blank page, and how do you get around that? So contextually, within Googlebot, it sounds like Googlebot executes JavaScript, even if it does so at a later point.

Yes.

So that’s good. But is there anything that people need to be aware of beyond just, “Oh well, it’ll just run it, and then it’ll see exactly the same thing as a human with a phone or a desktop”?

Let’s see. There’s a bunch of things that you need to be aware of. The most important thing is, again, as you said, it’s deferred. It happens at a later point. So if you want us to crawl your stuff as quickly as possible, that also means we have to wait to find the links that JavaScript injects. Basically, we crawl, we have to wait until the JavaScript is executed, then we get the rendered HTML, and then we find the links. So the nice little short loop that finds links relatively quickly right after crawling will not work. We will only see the links after we render, and this rendering can take a while, because the web is surprisingly big.

Yeah, just a little bit.

Like 30 trillion docs in 2016, so I’d say now there’s way more than that.

Yes, more than that. So robots.txt is very effective at being able to tell bots how to do a certain thing. But in this scenario, how do you tell that it’s Googlebot visiting your site?

Good question. So, as we are basically using a browser in two steps, one being the crawling and one being the actual rendering, at both of these moments we do give you the user agent header. Basically, it’s literally the string “Googlebot” in there.

That’s so straightforward.

Yes, and you can actually use that to help with your SPA performance as well.

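
To make the point about deferred link discovery concrete, here is a small sketch using Python’s standard `html.parser`. It only looks at markup, the way a non-rendering crawl pass would, so a link that exists only inside a script is not found until the page has been rendered. The two HTML snippets are invented, and `RENDERED_HTML` is a hand-written stand-in for what a rendering step might produce, not real rendering output.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags found in the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return the links visible in the given HTML without running any scripts."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# What the server sends: one real link, one link that only JavaScript would inject.
RAW_HTML = """
<html><body>
  <a href="/about">About</a>
  <div id="app"></div>
  <script>
    document.getElementById("app").innerHTML = '<a href="/products">Products</a>';
  </script>
</body></html>
"""

# Assumed result after rendering: the injected link is now part of the markup.
RENDERED_HTML = """
<html><body>
  <a href="/about">About</a>
  <div id="app"><a href="/products">Products</a></div>
</body></html>
"""

print(extract_links(RAW_HTML))       # ['/about']
print(extract_links(RENDERED_HTML))  # ['/about', '/products']
```

The `/products` link only shows up in the second pass, which is why links injected by JavaScript are discovered later than links present in the initial HTML.
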
So, as you can detect on the server side, “Oh, this is a Googlebot user agent requesting,” you might consider sending us a pre-rendered static HTML version. And you can do the same thing for the others: all the other search engines and social media bots have a specific string saying that they are a robot.

Okay.

So you can then basically go, “Oh, in this case I’m not giving you the real deal, the single-page app. I’m giving you this HTML that we pre-rendered for you.” That’s called dynamic rendering. We have docs on that as well.

The one thing that still doesn’t quite make sense to me is: does Googlebot have different contexts? I think of it as this little mythical creature that’s pretending to do certain things. So does it pretend to be on mobile and then desktop? Are there different user agents, even though it still says Googlebot, and can you differentiate between them?

You’re asking great questions, because yes, we have different user agents. I’m not sure if you’ve heard about mobile-first indexing being rolled out.

I’ve heard that it’s going to affect how you’re ranked, potentially.

Those are two different things that get conflated so often. Mobile-first indexing is about us discovering your content using a mobile user agent and a mobile viewport. So we are using mobile user agents, and the user agent string says so: it says something about Android in the name. And then you’re like, “Aha.”

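
The server-side decision behind dynamic rendering can be sketched in a few lines. This is a minimal illustration, not a recommended implementation: the function names, the returned labels, and the simplified user agent strings are all assumptions, and the exact strings Googlebot sends should be taken from Google’s documentation. User agent headers can also be spoofed, so a substring check like this identifies a claim, not a verified crawler.

```python
def is_googlebot(user_agent: str) -> bool:
    """True if the header contains the literal string 'Googlebot' (case-insensitive)."""
    return "googlebot" in user_agent.lower()


def looks_mobile(user_agent: str) -> bool:
    """True if the header mentions Android, as the mobile crawler is said to do."""
    return "android" in user_agent.lower()


def choose_variant(user_agent: str) -> str:
    """Pick which version of a page to serve (hypothetical labels)."""
    if is_googlebot(user_agent):
        return "prerendered_html"  # static HTML rendered ahead of time
    return "single_page_app"       # the regular JavaScript application


# Simplified, illustrative user agent strings (not the exact documented values).
SMARTPHONE_BOT_UA = "Mozilla/5.0 (Linux; Android 6.0.1) (compatible; Googlebot/2.1)"
DESKTOP_BOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/68.0"

for ua in (SMARTPHONE_BOT_UA, DESKTOP_BOT_UA, BROWSER_UA):
    print(choose_variant(ua), looks_mobile(ua))
```

The same check can be extended with the tokens of other search engine and social media crawlers, which matter more here because they may not run JavaScript at all.
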
So this is the mobile Googlebot. We have documentation on that. There’s literally a Help Center article that lists all these things. So we try to index mobile content to make sure that we have something nice to serve for people who are on mobile. But we’re not pretending with random user agents or anything; we stick to the user agent strings that we have documented as well. And that’s mobile-first indexing, where we try to get your mobile content into the index rather than the desktop content.

Huh.

And then there’s mobile readiness, or mobile-friendliness. If your page is mobile-friendly, it makes sure that everything is within the viewport, that you have large enough tap targets, and all these kinds of lovely things. And that is just a quality indicator. We call these signals; we have over 200 of them.

That’s a lot.

So Googlebot collects all these signals and then stuffs them as metadata into the index. And then when we rank, we’re like, “Okay, so this user’s on mobile, so maybe this thing that has a really good mobile-friendliness signal attached to it might be a better one than the thing where they have to pinch-zoom all the way out to be able to read anything and then can’t actually deal with the different links because they’re too close to each other.” So that’s one of the many. It’s not the signal; it’s one of the over 200 signals to deal with.

I had no idea there were 200. I know that you’re not allowed to share what they all are, because there has to be a certain mystique around it, I guess because a lot of SEOs abused that in the past.

Yeah. Unfortunately, that is a game that is still being played, and people are doing weird stuff to try to game us. And the interesting thing with the 200 signals is this.

It’s really hard to say which one gets what kind of weight, and they keep moving and they keep changing. So I love it when people are like, “No, let’s do this, and then look, my rank changes.” Yeah, for this one query, but you lost on all the other queries because you did really weird and funky stuff for that. So just build good content for the users and then you’ll be fine.

I feel like that’s less effort as well than constantly trying to...

Yeah, but it’s not an easy answer, right? You pay me to make you more successful on search engines, and I come to you and say, “So who are your users, what do they need, and how could you express that so that they know it’s what they need?” That’s a hard one, because that means I basically bring the ball back to you, and you have to think about stuff and figure things out strategically. Whereas if I’m like, “Okay, I’m just going to get you links or do some funky tricks here, and then you’ll be ranking number one,” that’s an easier answer. It’s the wrong answer, but it’s the easier answer. So people are like, “Links are the most important metric ever,” and I’m like, “No, we have over 200 signals. It’s important, but it’s not that important. Chill out, everybody.” But this still happens.

Yeah. I’m so glad it’s better now. I feel like we’re actually at peace in general with SEO as well.

Suz, thank you so much for being here with me. It has been a great pleasure.

Thanks for answering all of my weird and wonderful questions about Googlebot.

Did we bust some myths?

I feel like we did.

Fantastic. I think that’s worth a high five.

Thanks.

Thanks. Join us again for the next episode of SEO Mythbusting, where Jamie Alberico and I will discuss whether JavaScript and SEO can be friends and how to get there.

57 thoughts on “Googlebot: SEO Mythbusting”

  1. steve miller

    Great video! thank you! more iced tea? lol

  2. Anders Hansen

    The set hasn't gotten any less weird 🙂

  3. Ambrose Francis

    Google having a guest from Microsoft. Boop Boop!

  4. Sławomir Piwowarczyk

    I really like that ashtray on the desk

  5. CopyPst TV

    Nice video and very helpful. I am working upwork and seo. Some videos help me to make me expert. Thank you all

  6. josh bachynski

    It is more than a little disingenuous, for Google to blame us for having SEO myths, when their PR team has been purposefully hiding and obfuscating the ranking factors for years

  7. Chris Adels

    unfortunately the name is still not quite suitable

  8. Dark Forest

    What was supposed the myth here? This video had no added value to any other video there is on the internet about crawlers… 16 minutes wasted…

  9. david hart

    Google forced us to create amazing content for readers but in the same breath are now showing ads higher and more frequently than organic – you are hypocrites. I suggest users start looking into

  10. Nabil Nawaz

    should we allow the crawling for 'uploads' folder on wordpress websites in robots.txt?

  11. Jamie Foster

    Great video. I'm sharing with my clients, saying, "SEE!? at the 15 min. mark Martin is literally begging us to create good content!

  12. Ryan McCain

    Why does this set look like an underground hatch from the Lost TV show?

  13. Dalibor Danicic

    What about googlebot spoofing? it happens a lot these days, Blackhat SEO’s are abusing your bot and you’re not doing anything about it.

  14. Daniel Cheung

    Very good foundational video. I hope more people will come across this!

  15. James Scott

    This was amazing, thanks! 🙏🏻 The library analogy is perfect. Most people just take for granted when they type something into Google and get the results. But behind the scenes they have no idea of the complexity that has gone into serving them with the right content. All the SEO gurus keep telling us to build links to rank, but the guy in the video says it all… “Just build good content for your users and you’ll be fine!”

    So basically instead of spending endless hours trying to build links, use that time into making the content more better and we be ok??

    But when I search for stuff I still see that’s it’s always the sites who have built thousands of links who rank top, even if their content is mediocre. And the best content with no links sometimes sits on page 2 lol. Unfair.

  16. Hippie Jugend

    If I close my eyes I have no idea which one is speaking

  17. Adrian Diaz

    I found this episode MUCH better than the first one. The first one was SOOO BAD … you could see it from miles away how false and forced it was.

    Come on. You're Google ffs … you should KNOW what GOOD CONTENT is!

    And for God's sake … change the name of the show into maybe:

    "The mysteries of Google" or something like that.

    Naming this show "SEO mythbusting" is like calling a movie "The beauty and the beast" and when you click to watch it … it's acutally an XXX movie instead of Disney movie.😃

    The title and the content … simply dont match.

    This is like that "bait and switch" myth if you know what I mean 😋

  18. Justinas Kundrotas

    Martin, here's a question about Googlebot – does it get document HTTP Headers when that document is blocked via robots.txt disallow directive? I wonder if Googlebot may see and respect "noindex" HTTP header in that case? Thanks!

  19. Ad Min

    Liars cuz the algorithm is aimed to benefit the big companies

  20. Erik Martin

    Much more informative than the previous one.

  21. Michael Köhl

    Video is great.
    Great people, good questions and answers. Nice atmosphere too.

  22. CHRIS PALMER SEO

    Can not wait for this show to get into the SEO part of the mythbusting. { ***Thanks for this show however *** }

  23. Wotan will awake

    How can someone that worked with frontend for 10 YEARS + ask these simple questions?

  24. Ela Iliesi

    Especially liked the last few minutes of the discussion. Great jobs with this videos, keep it up!

  25. Biyo Link

    Where are the lads who will translate this into Turkish?

  26. Search Officials

    Love this video series! It'll make it easier for me to get internal teams and clients to buy into the best practices Google preaches and I evangelize. To be short, the conversation typically goes something this: Empathy is the ultimate strategy! Google's customer is the searcher. Their mission is to direct that searcher to the content that will best answer their query and ideally engages/educates them beyond the initial query. Quality isn't 10,000 links. A few solid, relevant, contextual links and a well built brand that earns trust from their audience is the key to longevity! — Chris

  27. Lost Mamo

    Could you maybe label the episodes 1 and 2 and so on? Thanks!

  28. Venith K

    At first I thought she was looking at a mirror and speaking

  29. Ivo Faase

    Great video. Too bad about the closed captions mistakes, like 'more my first indexing' instead of 'mobile first indexing'

  30. Mark Seah

    if links are not the most important signal, why does Google allow pages to rank high on page 1 over time, as people engage vendors n agencies to do link building / buying for a fee etc? Google doesnt seem to bother whether those links are legitimate?

  31. Roberto Renteria

    Great Video, I still feel this is a smoke cover for end users and not SEO Pros, I am actually sending it to clients as part of educational and presales pieces. The one thing I have noticed is that they come out even more confused than when I first spoke to them. So that only means "UPSELL" Thanks Google.

  32. Oscar Gonzalez

    Hey! Let's talk about the myths that have been created because we don't really tell anybody how anything works. Let's do it with a girl that doesn't really do SEO or has done SEO before. Yeay!

  33. zac Samu

    When i first started with Google Console , I didn't really know what everything did well I still don't but for some crazy reason I requested a indexing of my sitemap, ever since then it's shows that page (link ) has an error? Any ideas what I can do to solve this also I noticed some of the excluded pages are /feed can someone shed some light on this. thank you , we need more videos like these 😁😁😁

  34. Khandar William

    they can just release the code if they want

  35. Shao Chieh Lo

    In this episode , we know that Googlebot don't execute the JS in the first round of crawl, it do it in the later stage.

    That rise a question, why google prefer Jason LD over micro data? My understanding is that Jason is a kind of Javascript, so if they don't execute it in the first round of crawl, wouldn't the site using micro data get their markup seen earlier and easier by Googlebot compared to site use JasonLD?

    Let me know your thought


  36. Mutahir Nazar

    Great video, very informative question and answers

  37. Allan Shaw

    A great video doesn't explain why Google is not doing anything about webspam in the carpet cleaning niche in Australia or other niches. If you google this long-tail keyword "blood carpet cleaning deagon " I know it is not a very competitive keyword. One company has around seven of there sites on the front page. They are all different sites but it doesn't take much investigation to see that they are the same company. They all have the same MO they use a few articles and then use them on hundreds of pages on the site only changing the suburb. The point I'm making is this company and others makes Google look like a fool. Saying build good content then Google doesn't do anything about that bad content.

  38. Roel Comporal

    Hi what happens when you exceed 155 characters for meta description?

