Hi, my name is Maile Ohye, and I work at Google
as a Developer Programs tech lead. I’m so glad to be speaking to you today
because, speaking for myself and on behalf of all my colleagues at Google, we understand how important
it is to have a strong news ecosystem. I hope you find something useful in this presentation.
Today we’re going to talk about three main topics. First, the
ranking factors of Google News search. Next we’re going to cover some of the frequently
asked questions that we hear from publishers or from SEOs. And last we’re going
to talk more about the best practices when you publish articles. So let’s take
a first look at how your articles appear in a Google search result. There are several
ways. The first is on google.com, where people might see a news onebox. The upper
screenshot shows a news result for a search like Obama medals, where
the user is shown some news articles. That’s one way your articles
can appear in Google News. The second screenshot is from a user going directly
to news.google.com, where they see a similar cluster of articles, but
instead of the google.com homepage they’re seeing it on the News home page. So you might
be asking yourself, “How did these articles appear?” We gather these
articles by first crawling them, next grouping them, and then ranking all of the information.
And we’ll cover each of these steps more in depth. Let’s start with crawling.
In the crawling stage, much like web search, Googlebot goes
out to your news sites to look for new articles. There are two ways that we retrieve
these articles. One is through our discovery crawl, where Google sees new URLs and then
crawls those articles. In addition to that discovery crawl, you can also create News
Sitemaps. News Sitemaps are a way for you to list exactly which of your URLs are new,
so we can use them alongside our discovery crawl to find your new information.
And of course, we respect the Robots Exclusion Protocol, so you can create a robots.txt file
or use HTTP headers to let us know specifically which documents you want crawled and which documents
you want excluded from Google search results. Last, once we’ve crawled and made sure
that we’ve only crawled what we are allowed to crawl, we bring those articles
back to Google. And that’s the end of the crawling phase.
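As a rough illustration of the robots.txt step described above, the sketch below uses Python's standard `urllib.robotparser` module to decide which URLs a crawler may fetch. The robots.txt rules, the paths and the example.com URLs are all hypothetical, and this shows only how a polite crawler can apply the Robots Exclusion Protocol, not how Googlebot is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example news site (the paths are made up).
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /subscribers-only/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that the rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)
candidate_urls = [
    "https://example.com/news/story-48213.html",
    "https://example.com/drafts/unfinished-story.html",
]
allowed_urls = crawlable(parser, "Googlebot", candidate_urls)
print(allowed_urls)
```

Only the first URL survives the filter, because the second falls under a `Disallow` rule.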
So next we get into that grouping phase, and here’s where we have this classification
idea. In classification, what we’re doing is actually looking at each individual
article’s contents. Take this article, “The millions Kozlowski
didn’t steal”. We pull out individual words like “business”,
“Tyco”, “money” and “CFO” and understand that this article
pertains to the Business section. That’s how we populate the different sections in
Google News like Business, Health and Entertainment. We also populate
our editions, whether it’s UK, US or India, and we can take that from the text
as well. Here we’ve taken words like New York and Manhattan, and that’s led
us to believe that this article pertains to the United States. So this is the grouping
stage, where we understand what an article is about and which sections and editions
it pertains to. So now that we’ve covered crawling and grouping, we have ranking.
And ranking is going to come in two phases. First of course is story ranking. Story ranking
is much like what you see on the Google News page where there’s a group of stories,
whether it might be Obama and the medal ceremony, or it might be the death of Michael Jackson.
Or it might be rising oil prices. Story ranking decides which of these story clusters should
be placed first, which second, which third, and so on.
And we rank these story clusters according to aggregate editorial interest. So let’s
take a deeper look at what that means. In the upper diagram you can see that a smaller
story has a small effect on publishing activity. Let’s say in North Carolina a man was
giving out free cars to those who really needed them. That’s a great human interest
story. It might be covered in a local paper and also be picked up by a few wires. But
this is still a relatively small story, not showing as much aggregate editorial interest
as a larger story like the death of Michael Jackson, which is published not only in
local newspapers but in foreign and national papers, is covered by many wires, and also draws
op-ed articles and follow-up articles. Given all the editorial interest
in that story, we will likely rank it higher than the human interest story about a man giving
out free cars in North Carolina. So that’s story ranking. We’re actually ranking
those clusters. The next part about ranking is the individual article ranking. Article
ranking helps us take a cluster of stories, say the death of Michael Jackson, and helps
us determine out of those 200 stories which one should be ranked first for our users,
which should be ranked second and so on. There are many signals that go into article ranking,
but I’m just going to cover four of the major ones for you here. First is fresh
and new. It’s important to us that an article contain recent substantial information
about a topic. And it needs to be objective news to lead a cluster of stories, so press
releases, satire and op-eds aren’t eligible to lead clusters. Another factor is duplication
and novelty detection. That’s where we try to distinguish the original source of
content from those that are duplicating the information. Something we use there
is the idea of citation rank. If a news story was broken
by the Los Angeles Times, and later another article, say from Washington, cited the Los Angeles
Times as the source of the information, then we can start to see citation rank
taking place for this story: the article from the Los Angeles Times might now rank
higher because others are citing it as the original story.
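The citation idea can be sketched as a simple count of how many other articles in a cluster name a given source. This is a deliberately naive illustration, not the actual citation rank computation, which the talk does not spell out. The article records, the `cites` field and the source names other than the Los Angeles Times are hypothetical.

```python
from collections import Counter


def citation_counts(articles: list[dict]) -> Counter:
    """Count how many other articles in the cluster cite each source."""
    counts = Counter()
    for article in articles:
        for cited in set(article["cites"]):
            if cited != article["source"]:  # ignore self-citations
                counts[cited] += 1
    return counts


cluster = [
    {"source": "Washington paper", "cites": ["Los Angeles Times"]},
    {"source": "Los Angeles Times", "cites": []},
    {"source": "Wire service", "cites": ["Los Angeles Times"]},
]
counts = citation_counts(cluster)

# Order the cluster so the most-cited source comes first.
ranked = sorted(cluster, key=lambda article: counts[article["source"]], reverse=True)
print([article["source"] for article in ranked])
```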
Another factor is local and personal relevancy. This applies to individual sections as
well as editions of your publication. What we want to do is give more weight
to local sources that are likely more relevant to the news item. So if we take that example
of a man giving out free cars in North Carolina, it’s likely that we would take a paper
like the Charlotte Observer and recognize that it could be a higher authority for that story,
and therefore that article might be ranked higher in this cluster. The last signal I
wanted to cover in article ranking is the idea of trusted sources. For us, trusted sources
aren’t about some arbitrary decision that we make; the signal is
data driven. According to our data over time, did users look at your articles
and then click on them? Let’s say that there were five articles listed and
a significant number of users chose the third article and went to that source. Then we might
start to determine that this source is very trusted for this type of information,
and over time we build out which publications are trusted sources. This isn’t done for an entire
publication, though; it’s done on a section and category basis. So something like the
Sporting News could be very trusted for sports information but maybe not so much for business.
And likewise something like the Wall Street Journal might be very trusted in the United
States for business information but maybe not in India. So again, these trusted sources
have to do with section and edition. It’s a very specific thing that we’re looking
for, based on aggregate user behaviour. So those are just four of the signals that we use in
News Search article ranking. Next let’s go into some of your frequently
asked questions. You might be asking “What are the benefits of submitting a News Sitemap?”
Well, we think that Sitemaps are beneficial to us and to you as a publisher as well. First
of all, they give you greater control over which of your articles appear in Google News.
That’s because, as I mentioned earlier, they help complement our discovery
crawl and tell us exactly which articles are new and which articles we should crawl. Second,
News Sitemaps are great because they let you give us meta-information about your articles.
So rather than rely on our extractor, you can give us the publication date. And rather than
rely on just our extractor to determine the categories for your article, you can give us
good hints by using the keywords field. So all in all, we think News Sitemaps provide
a huge benefit to publishers. Another frequently asked question is “Can
Googlebot visit our URLs more than once?” And the answer is yes, we can definitely recrawl
URLs to check for updates. But let’s take a step back. Initially, Google can
find your new content within a matter of minutes of when you published it. We find your
new content through our discovery crawl or through News Sitemaps, and after that initial
discovery we will definitely go back and re-check for new article content. The
re-crawl rate varies, but it’s pretty safe to say that we’ll
probably go back and check for new content within 12 hours. So we’ll find it within
a matter of minutes and we’ll re-crawl for new content within 12 hours.
You might also be asking “How do I optimize my multimedia content?” Well that’s
a great question. So we’re going to take a look at two types of content. First,
let’s talk about videos. With videos you can create a YouTube channel and submit
that to us. We’re looking to include other video hosting platforms, but right now
with YouTube we have a pretty good idea of the user experience, that the video will load
and so on, so YouTube is a trusted video hosting platform for us. And if you do use YouTube,
remember that including textual descriptions and transcripts is also helpful because that
helps us associate a specific video with the subject matter.
Now let’s talk about images. With images we have five tips that will help your images
get included in Google News Search. First you want to use a large size image with a
good aspect ratio. Second you want descriptive captions and alt text. Third you want to keep
your good image near the title. And that again helps us associate an image with the subject
matter. Fourth, you want your good image to be inline and not a clickable version. So
again, you want your good image near the title and inline. And last, we prefer JPEG. If
you use things like PNG images, that’s not as good for Google News as a JPEG. So
I would definitely stick to JPEG if you would like your images included in Google News.
So the last frequently asked question is of
course “What about PageRank?” PageRank is a lesser factor in Google News than it
is in web search. And that makes sense, right? The linking structure for an article
that was only published minutes ago isn’t going to be the same as for one that was published
months or years ago. So we have to use PageRank delicately in Google News. Instead of relying on
signals like PageRank, we lean more on the signals we talked about earlier, things
like timeliness. Is it fresh and new? Does it have local or personal relevancy? Those
types of things. So now that we’ve covered how Google crawls, groups and ranks
articles, and we’ve answered some of your frequently asked questions, let’s get into
some best practices.
First, it’s important that you create
permanent, unique URLs with at least 3 digits. The reason for this is that, traditionally,
news publishers have used article IDs, an equals sign and then a number, in their URL strings,
and that has helped us determine that a page is an article and not just a static HTML page.
But if your news publishing system doesn’t include digits, at least three for Google
News, then you can submit a News Sitemap. So that’s the workaround.
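The digit rule and the Sitemap workaround can be sketched together as follows. Several things here are assumptions rather than statements from the talk: that "at least 3 digits" means a run of three or more consecutive digits in the path or query string, the example.com URLs and article data, and the helper names. The element names (`news:publication_date`, `news:keywords`) and the namespace URI follow the News Sitemap schema as best understood here, and cover only the fields mentioned in the talk. A real News Sitemap may need further elements, so check the current documentation before relying on this.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"  # assumed; verify in the docs


def has_three_digits(url: str) -> bool:
    """Assumed rule: three or more consecutive digits in the path or query."""
    parts = urlparse(url)
    return re.search(r"\d{3,}", parts.path + "?" + parts.query) is not None


def build_news_sitemap(articles: list[dict]) -> str:
    """Build a minimal News Sitemap with publication date and keywords."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["published"]
        ET.SubElement(news, f"{{{NEWS_NS}}}keywords").text = ", ".join(article["keywords"])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical articles: the first URL carries an article ID, the second does not.
articles = [
    {"loc": "https://example.com/story?articleid=48213",
     "published": "2009-08-12", "keywords": ["politics"]},
    {"loc": "https://example.com/business/kozlowski-verdict",
     "published": "2009-08-12", "keywords": ["business", "tyco"]},
]
needs_sitemap = [a for a in articles if not has_three_digits(a["loc"])]
sitemap_xml = build_news_sitemap(needs_sitemap)
print(sitemap_xml)
```

Listing every article in the Sitemap, not only those without digits, is equally valid; the filter is there just to show which URLs depend on the workaround.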
If you don’t have 3 digits in your URLs, you can create a News Sitemap and let
us know which specific URLs belong in News.
The second best practice is to not break up
the article body. Your news article should have sequential paragraphs that
can all be included in Google News. You don’t want to break that up with user comments or
links to related posts, or even with links to additional pages.
That’s not as good for Google News. We’ll take all of the article on that
first page. So again, try not to break up the article body.
A third best practice is to put dates between the title and the body and that will help
our date extractor to have the correct publication date.
Fourth, titles matter. And this is to have a good HTML title as well as an article title.
So you want your title to be extremely indicative of the story at hand.
Fifth, it’s best for Google News if you separate your original article content from your press
releases. And you can do this in a directory structure. And this helps us determine what
is specifically a news article versus what might be satire or opinion or a press release.
And the last tip, of course, is to create unique and informative content. That’s
always going to help you do well in the rankings. The more unique content you create,
and the more users enjoy it, the more users we’ll send there. This is the
converse of just publishing other people’s content or just having duplicate
information. So again, the more great information you put out for all of us to read, the
more users you’ll attract to your site. If you have additional questions, please feel
free to visit our News Publisher Help Center and thanks so much for watching.