# 14 2 PageRank Overview and Markov Chains 12 10

By | January 17, 2020

Let’s talk about PageRank, one of the most
important uses of link analysis. We’ve talked a lot in information retrieval about
using the overlap between the query and the document in terms of words. We talked
about TF-IDF counts, but there’s more than just word overlap that tells us a page is
good. And one instinct is that a page that’s very popular is a good page. How do we measure popularity? Well, one simple
measure might be just to look at who points at it. A page that’s pointed to by lots
of other pages is probably a good page. And one way to do that is to use link counts.
We’ll just count the number of links to a page, and maybe we’ll combine that with the
text match score. So you can imagine two ways of using link
counts to simply measure the popularity of a page. One is the undirected popularity of
a page. Undirected popularity’s also called degree. Degree is just the number of in-links
plus the number of out-links, the total number of links attached to a page. So here’s a page; call that page B. B
has three in-links and two out-links, a total of five links. So that’s the
degree of B: B’s degree is five, and we could use that as a page score. Alternatively, we might just use
the number of in-links, the pages that point to me; if lots of pages point to me, maybe I’m important. So we could use that
number three, the in-links, as a directed measure of popularity. Either of these is fine. Now
the problem with simple link-counting popularity is that it’s very easy to spam, and this is
true whether we use the degree of a node, the total number of in-links and out-links, or
just the in-links. What we really want is to weight a link more if it comes from a page that’s
more important itself, and here’s a graphical
representation of this idea. Here E has a lot of links. It’s got six links coming in
and some coming out. Whereas C only has one link coming in. But
C’s one link coming in is from B, a very important page. And so the intuition of PageRank is
that we’d like to measure not just how many links come in, but who the links are from. And if
a link is from a much more important page, we’re going to weight that link more heavily.
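As a quick sketch of those two simple popularity measures, here's a small example in plain Python (the five-page graph here is made up for illustration, not the lecture's figure):

```python
# Sketch: scoring pages by in-link count and by degree (in-links + out-links).
# The graph below is a made-up example, not the lecture's figure.
links = {  # page -> pages it links to
    "A": ["B", "E"],
    "B": ["C", "E"],
    "C": ["E"],
    "D": ["E", "B"],
    "E": ["A", "B"],
}

in_links = {page: 0 for page in links}
out_links = {page: len(targets) for page, targets in links.items()}
for targets in links.values():
    for target in targets:
        in_links[target] += 1

degree = {page: in_links[page] + out_links[page] for page in links}

print(in_links["E"])  # E has 4 in-links
print(degree["E"])    # degree = 4 in-links + 2 out-links = 6
```

Both scores are trivially spammable: creating new pages that link to E inflates them, which is exactly the weakness noted above.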
We’re going to come up with an iterative algorithm for measuring the importance of a page, and
then passing this importance along the links. So imagine a browser doing a random walk on
web pages. We’ll start at a random page, and at each step, we go out of the current page
along one of the links on the page, equiprobably. So if a page has three links, with probability
1/3 we’ll take the first link, with probability 1/3 we’ll walk along the second, and with
probability 1/3 we’ll walk along the third. And the intuition of PageRank is that, in
the steady state, after a lot of walking around, each page has a long-term visit rate, and
we’re going to use this long-term visit rate as the page’s score, which is going to be called the PageRank of the page. Now just a random walk isn’t quite good enough,
because the web is full of dead ends. If you do a random walk, and you get to a page, and
imagine the page is a node C, and C has no out-links, just in-links. We
can’t walk away from C; we’re stuck in C. So we’re going to
add one more thing to our random walk, and that’s called teleporting. Here’s how teleporting works. When we get
to a dead end, we just jump to any random web page: whatever the number of pages in
the entire collection is, the whole web if that’s our collection, we just jump to one of those
pages at random. If we’re at a non-dead end, with probability 10%, let’s say, we still jump to a random page. But with the remaining probability, the 90%
probability, we’ll go out on a random link. And that 10% is a parameter that we’ll call alpha.
So now we’ve modified our random walk to solve the problem of dead ends. So teleporting means we can’t get stuck locally,
and it also lets us compute a long-term rate, the PageRank, at which any page is visited.
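The teleporting random walk can be simulated directly. Here's a sketch (the three-page graph is a made-up example; the 10% teleport probability is the one from above) that walks for many steps, teleporting out of dead ends, and tallies each page's long-term visit rate:

```python
import random

# Sketch: a random surfer with teleporting. With probability alpha (10% here)
# we jump to a uniformly random page; otherwise we follow a random out-link.
# Dead ends always teleport. Long-run visit rates approximate PageRank.
random.seed(0)
graph = {1: [2], 2: [1, 3], 3: [2]}  # page -> out-links (made-up example)
pages = list(graph)
alpha = 0.10

visits = {page: 0 for page in pages}
current = random.choice(pages)
steps = 200_000
for _ in range(steps):
    visits[current] += 1
    out = graph[current]
    if not out or random.random() < alpha:  # dead end, or teleport
        current = random.choice(pages)
    else:
        current = random.choice(out)

rates = {page: visits[page] / steps for page in pages}
print(rates)  # long-term visit rate per page; rates sum to 1
```

With this graph, page 2 receives links from both other pages, and sure enough it ends up with the highest visit rate.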
To see how to compute this, we’re going to have to start with Markov chains. So now a
Markov chain has a set of states, n states, and a transition probability matrix, an n
by n transition probability matrix P. At each step in a Markov chain, we’re at one
of the states. And the entries of P, P sub i j, tell us the probability of j given
i: the probability that, if we’re in state i, we’re going to go next to state j. So if we look at all the links
out of i, those probabilities must sum to 1. So for a given i, the probabilities of all
the j’s we can go to sum to 1; it’s a probability condition on each row i. So let’s look at an example. Here’s a little
Markov chain. So we have three nodes, A, B, and C. So if we’re in A, the probability of
going to B is 0.5. The probability of going to C is 0.5. If we’re in C, we’re definitely
going to go to A. And if we’re in B, we’re definitely going to go to A. So everything sums up to 1. Given A, the probability
of going to B is 0.5 and C is 0.5, that sums to 1. And the other two also sum to 1. So
we can build a little transition probability matrix, with rows A, B, C and columns A, B, C. So if we’re in
A, the probability of going to A in the next step is 0. There is no self loop here on A.
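Filling in the remaining entries the same way gives the full matrix for this A, B, C chain. A minimal sketch, assuming NumPy is available, that writes it out and checks the row-sum condition:

```python
import numpy as np

# Transition probability matrix for the A, B, C chain described above:
# rows and columns are ordered A, B, C. From A we go to B or C with
# probability 0.5 each; from B and from C we always go to A.
P = np.array([
    [0.0, 0.5, 0.5],  # A: no self loop, 0.5 to B, 0.5 to C
    [1.0, 0.0, 0.0],  # B: always to A
    [1.0, 0.0, 0.0],  # C: always to A
])

# Each row is a probability distribution over next states, so rows sum to 1.
print(P.sum(axis=1))  # [1. 1. 1.]
```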
If we’re in A, the probability of going to B is 0.5, and the probability of going to C is 0.5,
and so on. So we can fill out the whole probability matrix, and we’re going to be using this transition
probability matrix to talk about PageRank. So a Markov chain is an abstraction of a random
walk, and we’re going to use it to talk about the random surfer surfing the web. So each
state of our Markov chain’s got to represent one web page. And the transition probability
will represent the probability of moving from one page to another. Now we can generate this
transition probability matrix P from the adjacency matrix of the web graph. And the adjacency
matrix just has a 1 if one page links to another, and 0 if it doesn’t. So more formally, if we’re at some node in
the web, and the node has no out-links, the random surfer will teleport, and the transition
probability to each node in the entire n-node graph is 1 over n. So if there are n pages
on the web, and we’re at a dead-end page, then with probability 1 over n, we’ll
go to any other page in the whole web. But if a node does have outgoing links, let’s
say it has k outgoing links, k greater than 0, then with some probability alpha, alpha between
0 and 1, the surfer will teleport to a random node. And the probability, since there are n possible
nodes in the web, is alpha over n. And with the remaining probability, 1 minus alpha,
we’ll take a normal random walk: we said there are k outgoing links from this
particular page, and so, with that probability 1 minus alpha, we’ll take each of those
k links with probability 1 over k. To make this precise, let A be the adjacency matrix
of the web graph. So a sub i j, in this matrix, is 1 if there is a hyperlink from page i to
page j, so just a 1, 0 matrix expressing the structure of the web. 1 between i and j if
there’s a link from i to j. Now here’s how we’re going to derive our transition
probability matrix P. If node i is a dead end, then its row of the A matrix
will have no 1’s, because there are no outgoing links. So if that’s the case, we’re going to replace each cell
in that row with 1 over n.
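That dead-end rule can be sketched directly; a minimal example, assuming NumPy, with a made-up three-node adjacency matrix whose second node is a dead end:

```python
import numpy as np

# Sketch: replace each dead-end row (all zeros in the adjacency matrix A)
# with a uniform distribution of 1/n over all n nodes.
A = np.array([
    [0, 1, 0],
    [0, 0, 0],  # node at row index 1 is a dead end: no out-links
    [1, 1, 0],
], dtype=float)

n = A.shape[0]
P = A.copy()
dead_ends = P.sum(axis=1) == 0  # boolean mask of all-zero rows
P[dead_ends] = 1.0 / n

print(P[1])  # the dead-end row becomes [1/3, 1/3, 1/3]
```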
So the probability, for that node, of going to any place on the web is 1 over n. For any other row, rows that do have outgoing
links, we’re going to proceed as follows. First, we’re going to normalize: we’ll
divide each 1 in A by the number of 1’s in its row. So if there’s a row that has three
1’s, we’ll turn them all into 1/3, so we’re getting a probability matrix here. Now we’re going to multiply that by 1 minus
alpha, so with probability 1 minus alpha, we’re just going to take one of our outgoing links at random.
And now we’re going to add
alpha over n to every entry of the matrix, so with probability alpha over n, we’re going
to go to some other random place. Let’s look at an example. Here, we have another
little sample of the web, a little mini web with only three nodes, node 1, node 2, and
node 3. And for the adjacency matrix of this little example, we’ll have that from node 2, we
can get to node 3 and node 1; from node 1, we can only get to node 2; and
from node 3, we can get to node 2. So here’s our adjacency matrix, with rows 1, 2, 3 and columns 1, 2, 3. From
node 2, we can get to nodes 1 and 3; from node 3, we can get to node 2; and so
on. Now let’s consider computing the transition probability matrix P when
alpha equals 0, meaning there’s no teleporting. So in that case, all we’re really doing to
compute P is normalize A into a probability matrix, so that each row sums to
1, because the outgoing transitions from each node must form a probability distribution, where we follow
each link with its probability. So the first row is already normalized, and the last row is already normalized; we just normalize
the middle row, turning those 1’s into 0.5’s. So there’s our alpha equals 0 transition probability matrix. Now the more interesting case, of course,
happens when we have some other alpha, so let’s think about a different alpha, alpha equals
0.5, where we do have a possibility of teleportation. So now, let’s look just at this first row,
and we’ll recompute that for alpha equals 0.5. So how do we compute that for the first
row? We said with probability alpha, we’re going
to teleport to any place on the web. And there are three nodes in this little mini web, n equals
3. So with probability alpha, this row vector will be 1 over n, 1 over n, 1 over n, which is 1/3, 1/3, 1/3. With
probability 1 minus alpha, we’re going to go to each node with its out-link probability.
And we can only go from node 1 to node 2, and we do that with probability 1. So with probability 1 minus alpha, we
take that set of transitions. And alpha, we said, is 0.5, so the new row is 0.5 times the teleport
vector plus 0.5 times the out-link vector, and that’s going to be 1/6, 2/3, and then 1/6. So that’s
going to be the first new row of the matrix. Let me clear that off. And sure enough, 1/6, 2/3, 1/6. And we do
the same computation for each of the other two rows, so here’s the new transition probability matrix,
assuming a 0.5 teleportation probability. So we’ve introduced the intuition of PageRank,
talked about Markov chains, how to think about the transition probability
matrix, and how to compute it with the teleportation probabilities. In the next section, we’ll
look at the actual computation of PageRank itself.
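Putting the whole construction together, here's a sketch, assuming NumPy, of building P from an adjacency matrix exactly as described: uniform rows for dead ends, and otherwise (1 minus alpha) times the normalized row plus alpha over n everywhere. Run on the three-node mini web, it reproduces the 1/6, 2/3, 1/6 first row for alpha equals 0.5:

```python
import numpy as np

def transition_matrix(A, alpha):
    """Build P from adjacency matrix A: uniform rows for dead ends,
    otherwise (1 - alpha) * (normalized row) + alpha / n in every entry."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    P = np.empty_like(A)
    for i in range(n):
        k = A[i].sum()                  # number of out-links from node i
        if k == 0:                      # dead end: teleport uniformly
            P[i] = 1.0 / n
        else:                           # normalize, damp, add teleport mass
            P[i] = (1 - alpha) * A[i] / k + alpha / n
    return P

# The lecture's mini web: 1 -> 2, 2 -> 1 and 3, 3 -> 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

print(transition_matrix(A, alpha=0.0))  # just row-normalized A
print(transition_matrix(A, alpha=0.5))  # first row: [1/6, 2/3, 1/6]
```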

## 5 thoughts on “14 2 PageRank Overview and Markov Chains 12 10”

1. Aria Farmani

Thanks for this 🙂

2. Solidwater Slayer

dude thanks so much

I was reading dis and couldn't understand

https://nlp.stanford.edu/IR-book/html/htmledition/markov-chains-1.html

did yall copy each other?

3. Kristina Jakaviciute

Thank you for the explanation! 🙂

4. Waleed Ahmad

wonderfully explained. thank you

5. Shayma Faiz

in min 11:05 why is it always 1/N why not alpha/N