State of the Index 2009

October 3, 2019

>> CUTTS: Okay. I made a promise that whenever
I give a presentation at a conference, I would try to recreate it. So, today, I’m recreating
this State of the Index talk that I gave at PubCon in November 2009. So, the way that
I decided to structure this talk was to say, “Okay, what has Google done for users, web
developers, and webmasters lately? Have we communicated? What services have we put out
there?” Just sort of showing that in the last year there has been a lot of stuff that Google’s
done which hopefully helps users and webmasters or web developers. So, starting out, what
has Google done for users lately? Well, for an average user I think Chrome, Android, and
Wave are all really interesting. They all have very large open-source components. So,
you know, if you think that Chrome talks back to Google, which really it doesn’t and you
don’t need to worry about it, you could pull down the open-source version, Chromium, compile
it and surf away, and not have to worry about it, because you can see everything in the
open-source code, which is really, really nice. But I wanted to tilt even the “what
have we done for users” part towards the webmaster’s side of things. So, this talk was given in
Las Vegas. So I talked about the Music OneBox which we had just launched. So you can search
for an artist and a song name. So, for example, Sheryl Crow, Leaving Las Vegas. And you can
not only play her song and hear the music right there, but you can also buy it, so for
99 cents or a very reasonable price, you can buy the MP3, which is really pretty handy.
One feature that the Music OneBox has that I don’t think anybody else has that I know
of is you don’t just have to search for artist name and song or the song, you can also type
in lyrics. So, for example, “Lady Luck please let the dice stay hot,” and that brings back
“Viva Las Vegas” by Elvis Presley. So, if you’ve heard a song on the radio and you don’t
know who’s singing it, you can still probably type in a few words and get back to that song.
So you can listen to it or buy it. So, we’re going along with the theme of what has Google
done for users. But, realistically, you’ve got the tunes on; what are you interested in now?
Well, as a user maybe you want to do some keyword research. So, over the summer, we
introduced a product called Google Squared which is really pretty fun. A lot of people
have played with it, but not everybody has realized how deep it is. You can type almost
anything into it. So I showed an example where you typed in Las Vegas Shows, and you got
Bette Midler, Criss Angel, Rita Rudner, stuff that seems like it should be relatively common
sense. You know, like, “Oh, okay, what’s so great about that?” But you can type anything
into Google Squared, and you’ll usually get some pretty interesting or reasonable results.
So if you’re just coming up blank and you’re trying to do some brainstorming–I showed
an example of social networking sites–and what I did is I took all the standard stuff
like Facebook and you know, MySpace out of it. And what was left was myYearbook, Skyrock,
Netlog, MEETin, Essembly, all these sort of really almost niche sites that appeal to young
people or to specific languages. And I asked the audience, “Okay, how many of you are members
of any of these social networks?” and almost no one raised their hand, of course. And so,
I said, “Look, instead of trying to chase the market in Facebook, you could establish
a valid presence on some of these social networks and participate in that community. And it
might be a little bit easier to get attention over there than in some of the louder, crowded,
noisy places where everybody is.” So that’s social networking sites, that’s one where
you could use it. But Google Squared, you can type in almost anything, and you can type
in suggestions and get a lot of good feedback on, in terms of brainstorming or keyword research.
So speaking of social networking sites, the next thing that I mentioned that a lot of
users enjoyed that we’ve done in the last year is Google Social Search. So I showed an
example where I searched for PubCon and I got really good posts. Not just from, you know,
two days ago, but posts that had stood the test of time. So, Michael Gray was talking
about how PubCon is like Star Wars, which is a really fun, entertaining post. And that’s
not just stuff from the web; it’s stuff from the web that’s public but also comes from people
in your social circle. So, if you click on, for example, results from people in your social
circle, you’ll get to this sort of tool belt on the left-hand side where you can slice
and dice your search results. And one of the things that’s shown is Social. So you can
do a search and click on that tool belt, you can go to Social. And the example that I showed
was meta tags, and I got all the people who had done public blog posts or talked on Twitter
about meta tags, and it was really pretty interesting. One feature that I talked about
that not everybody knows about in Social Search appears when you open up this
tool belt mode, where you click Show Options up at the top above the search results. Suppose
you do a search like meta tags: you’ll see the people who are most relevant who have
written about meta tags in the past. And the people who are shown will change depending
on what you search. So if I search for Podcasts, I’d probably get Leo Laporte. But if I search
for meta tags, you know, I get Jennifer Slegg or Eric Goldman or Danny Sullivan. And so,
it’s pretty neat to see how different searches will bring up different people who are experts
in your social circle. So, I kind of wanted to push it a little bit, because whenever
I talked to the social search people before I left, they said, “Hey, we will have query
capacity. Tell them to sign up for it.” So the simple way to sign up for Google Social
Search is, first, it helps if you have a Google profile because then we know from your account
what are the different services like Twitter or FriendFeed that you use. Add the links
to your Google profile to sort of point to those services. It could be your blog or it could
be Twitter, and then you have to opt-in. You go to, and say “Yes,
I’d like to be on the Social Search.” And then, whenever you search you just have to
be signed in so we know it’s you, but we’ll surface people who we think are relevant from
your social circle with the public stuff that they have said. So, it’s very fun. I was really
impressed with the quality of the stuff that it surfaced. It sort of surprised me that
it was a really nice blend of both social but also relevant. So, continuing on with
“what has Google done for users?” I talked about Show Options, which internally at Google
we call Google Tool Belt, because it’s a nice little tool belt of different ways to slice
and dice your search results. So, if you click on Show Options above your search results,
there are all these great ways where you can say, “Okay. I searched for PubCon, but show
me mentions of PubCon within the last 24 hours.” And in fact, you can say, “Sort by date,”
which is really handy. Say you want to find out who has mentioned Google:
show me the most recent blog posts or the most recent web pages that we’ve found. So,
it can be a good way to monitor reputation. We also have the ability to search by date
range, so you can say, “Okay, show me all the mentions of, you know, Barack Obama, but
show me from 2002 to 2006.” So you don’t have to get stuff from the presidential election,
you can get stuff from when he was a senator. So that’s a really helpful way to do power searches,
and a lot of people appreciate that. One last fun thing that is in the tool belt on the
side of the search results is what we call Wonder Wheel. And that’s another way that
as a webmaster or a publisher, you can do a lot of keyword research and brainstorming.
So if you typed in PubCon, you can always use the Google keyword tool. There’s a bunch
of different ways where you can do keyword research. But in Wonder Wheel, it’s in Flash,
so you can type in PubCon. And some of the suggestions include PubCon 2009; Tony Hsieh, who is associated
with Zappos and who did the keynote at the conference; Las Vegas Convention Center. But there’s also
related conferences like Search Engine Strategies and ADTECH and Affiliate Summit. So, it
lets you brainstorm in so many ways you might not normally think of. And if you click on one
of those entries, PubCon will move out of the way a little bit, and this new entry whether
it be ADTECH or PubCon 2009 will take the center, and you’ll see different related concepts
to that particular keyword phrase. So you can kind of click around and explore the space
a little bit and quickly move in to different ways to brainstorm, different ways to get
good keyword research done. And with that, we were done with what has Google done for
users. And I wanted to spend a little bit of time talking about web developers, not
just webmasters, because we’ve done some really nice things. So, at,
we’ve released a bunch of different tools so that people can figure out how to make
their site faster. So one of them is a Firefox extension called Page Speed. And what it will
do is basically try to show you all the different ways whenever you load a page that you can
try to make things load a little bit faster. So leveraging browser caching, minifying JavaScript,
taking a bunch of different CSS files and combining them into one CSS file so that
you have fewer HTTP requests. And at the time I told the people at the conference that while
we currently don’t use Page Speed as a factor in our search results and how we rank different
search results, there are people at Google who definitely want to. And a lot of people
within Google have been thinking about ways where we can figure out how Page Speed can
be one of the factors, not the biggest factor, not the only factor but one of the factors,
because if you have a fast site, it really improves the user experience. And so, I sort
of tried to let people know that if you can improve the speed of your site, it’s good
for users and there’s at least a chance in 2010 that it would be good for your website
rankings as well or help just a little bit at least, so it’s worth paying attention to.
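As an illustration of two of the Page Speed suggestions mentioned above (combining CSS files so there are fewer HTTP requests, and shrinking what goes over the wire), here is a minimal Python sketch. It is not from the talk: the stylesheet names and contents are made up, and gzip simply stands in for whatever compression a web server would actually apply.

```python
import gzip

# Hypothetical stylesheets standing in for separate CSS files on a site.
STYLESHEETS = {
    "reset.css": "body { margin: 0; padding: 0; }\n" * 20,
    "layout.css": "#main { width: 960px; margin: 0 auto; }\n" * 20,
    "theme.css": "a { color: #0645ad; text-decoration: none; }\n" * 20,
}


def combine(stylesheets):
    """Concatenate stylesheets so one HTTP request replaces several."""
    return "\n".join(stylesheets.values())


def gzip_size(text):
    """Return the size in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


combined = combine(STYLESHEETS)
requests_before = len(STYLESHEETS)  # one request per separate file
requests_after = 1                  # a single combined file
raw_bytes = len(combined.encode("utf-8"))
compressed_bytes = gzip_size(combined)

print(f"HTTP requests: {requests_before} -> {requests_after}")
print(f"Bytes on the wire: {raw_bytes} -> {compressed_bytes} (gzip)")
```

The exact byte counts depend entirely on the made-up input; the point is only that both the request count and the transferred size go down.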
The next slide is about Google has similar tools. This particular site is
not associated with Google, but I just wanted to throw it out there because it’s really
neat. You can see like almost a waterfall model of how long it takes to load your site
and what things are loading. It’s just a sort of thing where you can’t manage something
until you can measure it. And having the ability to see how long does it take to load all the
different stuff on a site can be a real eye-opener. I think Barry Schwartz looked
at afterwards and sort of found a way to squeeze two-thirds of the time of
loading the site completely out just by making some very simple changes. So, you’d be amazed
at how much of a difference it really can make. Okay. So what else has Google done for
web developers? We have just recently released this fantastic set of tools called Closure,
and you can find it at, and it’s a bunch of different things. It’s
a compiler, a library, and a templating system, so I’ll just focus on a couple of those. It’s
a compiler in the sense that it will take JavaScript and it will basically try to compile
it down into something that’s very, very compact. So you might have–I’ll talk about that on
the next slide but it will squeeze it down to be very, very small. The library is incredibly
interesting. There’s something like over 180 different just UI elements alone. So, I’m
showing on this slide, goog.ui.DatePicker, which is the same DatePicker that Google uses
whether it’s in Google Calendar or a lab in Gmail. And you can use that code totally for
free. So by providing this library which has got all kinds of user interface components
but also a bunch of different things for just math and time and all sorts of stuff like
that, you save yourself the work of writing that JavaScript code, and it’s very nicely
internationalized. It works really well. It’s the same stuff that we use. So, we’re trying
to make the web better by making it easier to develop for the web. So to talk about the
compiler part of Closure a little bit, take Google Reader’s JavaScript: somebody talked to the Google
Reader team, and I think it was Louis Gray that interviewed Mihai Parparita. And Mihai
said that Google Reader’s JavaScript would have been 2 megabytes uncompressed, and
Closure got it down to 513 kilobytes, so 25 percent of the original size. And then, with
gzipping it, they were able to get it down to 184 kilobytes. So, 2 megabytes down to
184 kilobytes is really worth the few minutes of running this Closure compiler and figuring
out how to do gzipping, because it makes a difference between things loading in two or
three seconds versus 20 or 40 seconds. So, it’s really a pretty good idea to pay some
attention to. So, I didn’t want to emphasize the web developer stuff for too, too long.
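As a rough check on the Google Reader figures quoted above, the arithmetic can be sketched in a few lines of Python. This is not part of the talk. It assumes "2 megabytes" means 2 × 1024 kilobytes, and the 64 KB/s connection speed is a hypothetical value chosen only to see whether the quoted load times are plausible.

```python
KB_PER_MB = 1024  # assumption: binary megabytes

original_kb = 2 * KB_PER_MB  # uncompressed JavaScript, as quoted
compiled_kb = 513            # after the Closure compiler, as quoted
gzipped_kb = 184             # after compiling and gzipping, as quoted

LINK_KB_PER_S = 64  # hypothetical connection, roughly 512 kbit/s


def percent_of_original(size_kb):
    """Express a size as a percentage of the uncompressed original."""
    return 100.0 * size_kb / original_kb


def seconds_to_load(size_kb, link_kb_per_s):
    """Transfer time, ignoring latency and protocol overhead."""
    return size_kb / link_kb_per_s


print(f"compiled: {percent_of_original(compiled_kb):.1f}% of original")
print(f"gzipped:  {percent_of_original(gzipped_kb):.1f}% of original")
print(f"load time before: {seconds_to_load(original_kb, LINK_KB_PER_S):.1f} s")
print(f"load time after:  {seconds_to_load(gzipped_kb, LINK_KB_PER_S):.1f} s")
```

Under these assumptions the compiled code is about 25 percent of the original and the gzipped code about 9 percent, and the transfer drops from roughly 32 seconds to just under 3, which is in line with the "two or three seconds versus 20 or 40 seconds" remark.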
I didn’t want to bore people, but I did want to include a couple of shout-outs to other
Google tools that make things easier. One is the Google Web Toolkit; we use that ourselves
for a bunch of different stuff. I believe Google Wave uses it. The latest version of
AdWords, I believe uses Google Web Toolkit. And it just makes it so that it’s a lot easier
to sort of write your code. It’s almost like you can write it in Java and compile it down
to JavaScript, and you get a lot of reusable components where you don’t have to worry about
cross-compile or cross-browser aspects, all those sort of things. So, a lot of people
enjoy GWT or the Google Web Toolkit. And then, a final thing for web developers is called
the AJAX Libraries API. So what is that? Google found a bunch of really useful AJAX Libraries,
you know, Scriptaculous and a whole bunch of different stuff as I recall. And we said
to these people, “You know what? We will host this for free on Google. We’ll pay the bandwidth
bills, all that sort of stuff.” We’ll also make sure that if you include it from here,
you always get the most recent version. So now you don’t have to worry about a security
hole in your JavaScript library or a third party library that you’re using. As long as
you’re using the version hosted on Google, you’ll always get the most recent version.
So, it’s a very handy thing. It just, you know, means that there’s infrastructure that
you don’t have to worry about. You can let somebody else deal with it and that can be
a very handy thing. So a lot of different stuff in the last few months that has been
pretty helpful for web developers. And then we got to what was my favorite part which
was, what has Google done for webmasters? So there’s a bunch of different stuff, starting
off February 2009, rel=canonical. So this is something that the major search engines
support. If you have two pages that are basically the same pages, you can say, “You know what?
This is my preferred page, so I’m going to put a rel=canonical on this page to point
to this page,” and Google can sort of glom those together and say, “Oh, the links to this page
should be combined with the links to this page.” Now, if you can do it with your site
architecture where you don’t have to worry about the incoming links and the duplicate
content at all, that’s best. If you can do a 301 redirect where it passes the PageRank
and you can say, “You know what? I have duplicate URLs but I can do a 301 redirect to this one
single location,” that’s almost as good. But if you can’t generate either one of those
because of your CMS or for whatever reasons, rel=canonical is a pretty good way to say,
“You know what? These two pages are actually the same page.” So that’s been something that
a lot of people–I’ve been surprised at how much uptake we’ve gotten and how much traction
it’s sort of gotten in just a few months. I also took people on a little bit of a tour
of what’s new in webmaster console, because not everybody goes back all the time to see
what’s new. So, I sent out a shout-out to Yahoo, because they have this great
feature that lets you say, “Here are URL parameters, which I don’t find that useful”; session IDs,
for example. And if you want to, you can specify now in Google’s Webmaster console the URL
parameters that you think should be ignored. So that’s very handy. A lot of people really
enjoy that. It took us a while to deliver it, but I’m glad that we did because now you
can say, “You know what? Here’s this parameter. I can’t get rid of it because of my CMS. Google,
just please ignore it,” and Google will do that for you. My personal favorite new feature
in the Google Webmaster Console is that you can fetch as Googlebot. So you prove that
you own a site and you can tell Google to go and fetch that site or a page on that site
and show exactly what Google saw. Now, why would you need to use this? Primarily, my
favorite reason why you’d use it is if your site has been hacked. It turns out people
are so evil these days that they will only show the hacked content when Google comes
crawling. So, if it’s someone who pretends to be Googlebot but they’re not coming from
the right IP address, they don’t show the hacked content. They only show it to Googlebot
when it’s coming from the right address, the right IP address. And if they’re really sneaky,
they’ll put a noarchive meta tag on. And then, you can’t see the cached link so there’s
no way that you can see what Googlebot saw when it crawled your page. So as long as you
own the page and you register that it’s yours in the webmaster tools area, you can fetch
the page as Googlebot and then you can say, “Uh-ah, here’s my hacked content, buy cheap
[INDISTINCT].” And you can iterate; you can fetch it a bunch of different times a day to sort
of say, “Okay, I tried to clean it up. Is it gone? No. Crap! Okay. I’ll try to clean
it up again. Oh, got it. Okay. Good. Now, my site is clean.” So it’s a very handy tool.
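To see why simply pretending to be Googlebot does not reveal this kind of hacked content, it helps to look at how a server can tell a genuine Googlebot request from an impostor by IP address. The Python sketch below is not from the talk. It follows the reverse-then-forward DNS check that Google documents for verifying its crawler; the function names are hypothetical, and the demo uses made-up DNS data so that it runs without a network.

```python
import socket

# Host-name suffixes that Google documents for its crawlers.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_dns(ip):
    """Return the host name that the IP address reverse-resolves to."""
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host):
    """Return the list of IPv4 addresses the host name resolves to."""
    return socket.gethostbyname_ex(host)[2]


def looks_like_real_googlebot(ip, reverse_dns=_reverse_dns, forward_dns=_forward_dns):
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        host = reverse_dns(ip)
        if not host.endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # Anyone can set a misleading reverse record, so the forward
        # lookup must lead back to the same IP address.
        return ip in forward_dns(host)
    except OSError:
        return False


# Offline demo with illustrative, made-up DNS records.
FAKE_PTR = {
    "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
    "203.0.113.9": "crawl-66-249-66-1.googlebot.com",  # forged reverse record
    "198.51.100.7": "host.example.net",
}
FAKE_A = {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"]}


def fake_reverse(ip):
    if ip not in FAKE_PTR:
        raise OSError("no reverse record")
    return FAKE_PTR[ip]


def fake_forward(host):
    return FAKE_A.get(host, [])


for address in FAKE_PTR:
    verdict = looks_like_real_googlebot(address, fake_reverse, fake_forward)
    print(address, "->", verdict)
```

A check like this passes only for requests that really come from Google's crawler, which is why fetching the page through the webmaster console shows what a spoofed user agent cannot.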
In my mind, it’s primarily the best for detecting malware. We’ve even seen people at Google
get their blogs hacked, so it can be really, really handy even for people at Google to
be able to fetch the page as Googlebot. What else? Better malware warnings. So not just
telling you that you have malware but trying to actually show you, you know, more information:
what is the URL or what’s the specific content on the page that was giving you the malware
or making us flag the site as having malware. The more information we can provide, the faster
you can diagnose and debug the stuff, clean it up, and get back into Google, or you know,
hopefully, not have to worry about having infected your users, so a very simple thing
but very, very handy. This one’s kind of interesting: message forwarding. So, if you have a message
in the webmaster console, it used to be you had to go and check on it and say, “Go log
in every day and see do I have a new message.” And not everybody lives and dies and breathes
the webmaster console and wants to show up every morning and, “Oh, I wonder if I have
any new messages. I can’t wait to find out.” So, adding the ability to say, “You know what?
If I get a message in the webmaster console, send it to me by email or forward it to me
by email.” I was kind of surprised. This one got spontaneous applause. So, apparently,
there are a few people who are really glad that they could forward their messages on
to their email address. Relatively straightforward thing but very, very helpful. Another straightforward
but helpful thing is keyword details. So we’re starting to show more information where, suppose
you look at my blog, for example. I rank maybe for a keyword like SEO. I could click on that
and I could see some of the top pages that have the keyword SEO. So, it could be very
handy to just sort of drill down in more detail on what are the exact pages that have these
keywords, things like SEO. So, to close out what the PubCon presentation was talking about,
one of the big things we talked about was communication. So we did over 80 blog posts
in the last year. We posted a 20-plus-page SEO Beginner’s Guide in PDF. So that’s really
handy, because a lot of people think Google hates SEO, and nothing could be further from
the truth. SEO, when done well and when done in a whitehat way can make your site more
crawlable, more accessible, and can help users find useful content on your site. So a lot
of people like to think, “Oh, Google hates SEO. Google thinks all SEO is evil.” And I
was really glad that we published this SEO Beginner’s Guide because, you know, if we
thought SEO is evil, we wouldn’t tell people, “Hey, here’s an introduction to what it is,
how to do it well, how to do it in a whitehat way.” I think there’s nothing that we
would like more than to have webmasters and Google cooperating to try to return good information
to the users. That’s in everybody’s interest. So that was a really big step, to be able
to put that beginner’s guide out there. Something that eased a lot of people’s minds was just
doing a blog post and a video that say, “We don’t use keyword meta tags.” You see
these lawsuits going on where somebody is like, “He put, you know, my business’s name
in the keyword meta tags; therefore, I’m going to sue him.” And it was kind of nice to do
this blog post, because we’ve already seen a little bit of an effect from that where
people are like, look, it’s been a well-known fact for a long time. You can test it by putting
a weird, unique keyword meta tag and, you know, you search for it later and you don’t
find it. So, any reasonable person running the experiment would conclude Google doesn’t
use the keyword meta tag. But just to come out and confirm it so that people don’t have
to worry about it and don’t have to waste their time on it is really nice. So I was
glad we were able to do that. Something people in the United States might not appreciate,
but people around the world appreciate is that we now have the webmaster console in
40 different languages. So, that was towards the beginning of the year but still very,
very important. And then one thing that we’ve been doing which you are very familiar with
is Webmaster Videos. So I just mentioned the fact that we’ve done over 165 videos to date,
over 1 million page views. I think it’s something like 1.2 or 1.3 million page views at this
point. And you know, we have a webmaster channel on YouTube where hundreds of people subscribe
to it. And they’re sometimes watching the video before I even Tweet about it. So, it’s
at And it’s just all kinds of really interesting stuff.
Sometimes it’s keynote presentations, you know, recreating talks; sometimes it’s one
or two-minute videos. But it’s free, it’s often very useful information, and I’m really
glad that we’ve tried that experiment to communicate more. So Caffeine was kind of a hot topic
right around the time PubCon was going on, so we just included a slide
or two to talk about Caffeine. So, just to remind everybody, it’s a rewrite of our indexing
infrastructure so it’s taking the old way that we used to index things that we crawled
around the web and replacing that with new architecture that was fresh and was written
to be more scalable, more flexible, the ability to attach different types of data in the process
of indexing, the ability to do more documents or more comprehensive version of the web,
and the ability to do it faster. All of that sort of stuff is really, really useful. And
people were a little worried so we just reassured them that we got great feedback for the beta,
but we were going to open up Caffeine at one data center. It was going to stay at one data
center before the holidays, so you wouldn’t see Caffeine at any data centers other than
that one until after the holidays, until at least January. And that just put everybody’s
mind at ease. They don’t need to worry about Caffeine. We are mindful of the fact that
when the holidays are coming, webmasters get a little jittery. They get a little anxious.
They don’t want rankings to change. They don’t want major changes to happen. And to the extent
that we can, we try not to make any major changes. Now, Q4 is one-fourth of the year,
and you can’t just shut down the search engine and not make any daily changes or make any
changes at all for one-fourth of the year or you’d lose a lot of productivity. But we
try to figure out if there’s something big coming, can we either do it earlier in the
year or can we do it after the holidays so that we can avoid causing any major problems.
And I think that people appreciate that. Looking forward to the future, what do people see
coming down the pipe? I think hacking and malware will continue. We see a lot more people
sort of checking those webmaster documentation pages. So malware and hacking will continue
to keep growing. But, we’re going to keep working on making our relevance better, trying
to find ways to detect hacked sites and detect spam as we always do. We’re going to keep
looking at ways to communicate in better, more scalable ways. We’ve tried everything
from webmaster chats to forums to Twittering, Tweeting, to videos. And we just keep trying
to find ways whether it’s blogs or conferences to answer questions, and we’re going to keep
doing that. So, I closed the presentation with just a few takeaways. I said if there’s nothing
else that you remember from this talk, the four things that I’d like you to remember
are number one, try Social Search out. It’s surprisingly useful; it’s really good at surfacing
relevant public content from your friends. Number two, try to speed up your site. There’s
a bunch of tools and you’d be amazed at how easily you can speed up your site very quickly
in some ways without doing a ton of new things, just trying to tweak a few small things, and
users really appreciate that. If you haven’t looked at the webmaster console in a while,
dig into it because there’s a lot of good content there. And finally, go ahead and subscribe
to the official blog, the Google Webmaster Central Blog and the video channel on YouTube.
I am kind of proud that I feel like I’m a little superfluous. People don’t really need
me as much anymore. So I’ve noticed that I don’t post as much Google stuff on my personal
blog, because there’s so much more stuff going up on the official blog. And I’m sort of urging
people to think about switching their mental model from “let’s see what Matt has to say
today” to “let’s see what the official blog has to say today,” because that’s always going
to be completely comprehensive. They’re going to go into a lot of detail. And if you look
at the schedule, they’re posting a lot of new information all the time on there. So
I would definitely make sure that you subscribe to that blog and check it. There’s a lot
of great information. Okay. That’s basically how the talk went. Everything else was panel
and questions. So I hope you enjoyed the recreation of the panel presentation.
