Let's talk about PageRank, one of the most important uses of link analysis. We've talked a lot in information retrieval about using the overlap between the query and the document in terms of words. We talked about TF-IDF counts, but there's more than just word overlap that tells us a page is good. One instinct is that a page that's very popular is a good page. How do we measure popularity? Well, one simple measure might be just to look at who points at a page. A page that's pointed to by lots of other pages is probably a good page. And one way to measure that is to use link counts.
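A link count is just a tally over the hyperlink graph. As a quick sketch with made-up page names, counting both in-links and total links (two variants discussed below) might look like:

```python
from collections import Counter

# Hypothetical hyperlinks (source, target); page B gets three in-links
# and has two out-links, like the page-B example discussed below.
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "E"), ("B", "F")]

in_links = Counter(dst for _, dst in edges)    # directed popularity
out_links = Counter(src for src, _ in edges)
pages = {p for edge in edges for p in edge}
degree = {p: in_links[p] + out_links[p] for p in pages}  # undirected popularity

print(in_links["B"], degree["B"])  # 3 in-links, degree 5
```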

We'll just count the number of links to a page, and maybe we'll combine that with the text match score.

So you can imagine two ways of using link counts to measure the popularity of a page. One is the undirected popularity of a page, also called its degree. The degree is just the number of in-links plus the number of out-links, the total number of links touching a page. So here's a page; call it page B. B has three in-links and two out-links, for a total of five links. That's the degree of B: B's degree is five, and we could use that as a page score.

Alternatively, we might just use the number of in-links, the number of pages that point to me. Maybe lots of pages point to me because I'm important. So we could use that number, three, the in-links, as a directed measure of popularity. Either of these is fine.

Now the problem with simple link-counting popularity is that it's very easy to spam, and this is true whether we use the degree of a node, the total number of in-links and out-links, or just the number of in-links. Let's look at a question about this.

So the intuition of PageRank is that, instead of just counting the number of in-links and out-links, we're going to weight a link as more important if it comes from a page that's more important itself. Here's a graphical representation of this idea. E has a lot of links: it's got six links coming in and some coming out. Whereas C only has one link coming in. But C's one link comes from B, a very important page. And so the intuition of PageRank is that we'd like to measure not just how many links come in, but who the links are from. If a link is from a much more important page, we're going to weight that more heavily.
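One way to make that concrete is to repeatedly pass each page's current score along its out-links and see where the scores settle. Here's a sketch on a made-up three-page graph with no dead ends (teleporting, which handles dead ends, comes later):

```python
# Made-up toy graph with no dead ends: each page lists its out-links.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

score = {p: 1 / len(links) for p in links}   # start everyone equal
for _ in range(100):
    new = {p: 0.0 for p in links}
    for p, outs in links.items():
        for q in outs:
            new[q] += score[p] / len(outs)   # p passes its score along each out-link
    score = new

print({p: round(s, 3) for p, s in score.items()})  # A and C end up above B
```

Notice that C scores as highly as A even though it has fewer in-links, because its one in-link carries weight from important pages.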

We're going to come up with an iterative algorithm for measuring the importance of a page, and then passing this importance along the links. So imagine a browser doing a random walk on web pages. We start at a random page, and at each step, we leave the current page along one of its links, chosen equiprobably. So if a page has three links, with probability 1/3 we'll take the first link, with probability 1/3 we'll walk along the second, and with probability 1/3 we'll walk along the third.

The intuition of PageRank is that, in the steady state, after a lot of walking around, each page has a long-term visit rate, and we're going to use this as the page's score. That long-term visit rate is going to be called the PageRank of the page.

Now, just a random walk isn't quite good enough, because the web is full of dead ends. Imagine you do a random walk and you get to a page, say a node C, and C has no out-links, just in-links. We can't walk away from C; we're stuck at C. So we're going to add one more thing to our random walk, and that's called teleporting.

Here's how teleporting works. When we get to a dead end, we just jump to a random web page, chosen uniformly from all the pages in the entire collection, the whole web if that's what we're indexing. If we're at a non-dead end, then with some probability, 10% let's say, we still jump to a random page. But with the remaining 90% probability, we'll go out on a random link. That 10% is a parameter that we'll call alpha. So with teleporting, if we're at a dead end, we jump someplace random; if we're not at a dead end, we still might jump someplace random, or we'll follow a random out-link.

So now we've modified our random walk to solve the problem of dead ends. Teleporting means we can't get stuck locally, and it also lets us compute a long-term rate, the PageRank, at which any page is visited.
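Here's a small simulation of that teleporting random surfer, a sketch with a made-up three-page web where page 3 is a dead end (the graph, alpha, and step count are illustrative assumptions):

```python
import random

random.seed(0)
links = {1: [2], 2: [1, 3], 3: []}   # made-up mini web; page 3 is a dead end
pages = list(links)
alpha = 0.1                          # teleport probability at non-dead ends
steps = 100_000

visits = {p: 0 for p in pages}
page = random.choice(pages)
for _ in range(steps):
    visits[page] += 1
    if not links[page] or random.random() < alpha:
        page = random.choice(pages)        # dead end or teleport: jump anywhere
    else:
        page = random.choice(links[page])  # follow a random out-link

rates = {p: visits[p] / steps for p in pages}
print(rates)
```

The long-term visit rates this converges to are exactly the scores the lecture is defining as PageRank.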

To see how to compute this, we're going to have to start with Markov chains. A Markov chain has a set of n states and an n-by-n transition probability matrix P. At each step in a Markov chain, we're in one of the states, and the entry P sub i j tells us the probability of j given i: the probability that, if we're in state i, we go next to state j.

If we look at state i, at all the transitions out of i, those probabilities must sum to 1. That is, for a given i, the probabilities over all the j's we can go to sum to 1. It's a probability condition on each i.

Let's look at an example. Here's a little Markov chain with three states: A, B, and C. If we're in A, the probability of going to B is 0.5, and the probability of going to C is 0.5. If we're in C, we definitely go to A. And if we're in B, we definitely go to A.

So everything sums to 1: given A, the probability of going to B is 0.5 and to C is 0.5, which sums to 1, and the other two rows also sum to 1. So we can build a little transition probability matrix, with rows and columns labeled A, B, C. If we're in A, the probability of being in A at the next step is 0; there is no self loop on A.

If we're in A, the probability of going to B is 0.5, and the probability of going to C is 0.5, and so on. So we can fill out that transition probability matrix, and we're going to use this matrix to talk about PageRank.

A Markov chain is an abstraction of a random walk, and we're going to use it to talk about the random surfer surfing the web. Each state of our Markov chain represents one web page, and the transition probabilities represent the probability of moving from one page to another. We can generate this transition probability matrix P from the adjacency matrix of the web graph, which just has a 1 if one page links to another and a 0 if it doesn't.

More formally, if we're at some node in the web and the node has no out-links, the random surfer teleports, and the transition probability to each node in the entire N-node graph is 1 over N. So if there are N pages on the web and we're at a dead-end page, then with probability 1 over N we go to any particular page on the whole web. But if a node does have outgoing links, say it has k of them with k greater than 0, then with some probability alpha, alpha between 0 and 1, the surfer teleports to a random node; since there are N possible nodes on the web, that's probability alpha over N for each. And with the remaining probability, 1 minus alpha, we take a normal random-walk step: with that remaining mass, we take each of the k outgoing links from this particular page with probability 1 over k.

So suppose we have the adjacency matrix of the web graph, where a sub i j is 1 if there is a hyperlink from page i to page j: just a 0/1 matrix expressing the structure of the web, with a 1 between i and j if there's a link from i to j.

Now here's how we're going to derive our transition probability matrix P. If node i is a dead end, then its row of the A matrix has no 1's, because there are no outgoing links. If that's the case, we replace each cell in that row with 1 over N.

So the 1– the probability for that node of going to any place on the web is 1 over n. For any other row, so rows that do have outgoing

links, we’re going to proceed as follows. First, we’re going to normalize. So we’ll

divide each 1 in A by the number of 1’s in it’s row. So if there’s a row that has three

1’s, we’ll turn them all into 1/3, so we’re getting a probability matrix here. Now we’re going to multiply that by 1 minus

alpha. So with probability 1 minus alpha, we’re just going to go to to a random outgoing

node from– on our– take one of our random outgoing links. And now we’re going to add

alpha over n to every entry of the matrix. So with probability alpha over n, we’re going

to go to some other random place. Let’s look at an example. Here, we have another

little sample of the web, a little mini web with only three nodes, node 1, node 2, and

node 3. And the adjacency matrix for this little example, we’ll have from node 2, we

can get to node 3 and node 1. From node 1, we can only get to node 2. And

from node 3, we can get to node 2. So here’s our adjacency matrix, 1, 2, 3, 1, 2, 3. So

from node 2, we can get to nodes 1 and nodes 3. From node 3, we can get to node 2 and so

on. Now let’s consider computing at the transition

probability matrix from this adjacency matrix, and let’s start with the simple case where

alpha equals 0, meaning there’s no teleporting. So in that case, all we’re really doing to

compute P is we’re going to normalize A into a probability matrix. So each row sums to

1, because the outgoing nodes from each node must be a probability where we’re going to

each link with its probability. So this row is already normalized. This first

row is already normalized. And here, in the last row, is normalized. So we just did normalize

this and turn all those 1’s into 0.5’s. So there is our alpha equals 0 probability transition–

transition probability– so there’s our alpha equals 0 transition probability matrix. Now the more interesting case, of course,

happens and we have some other alpha, so let’s think about a different alpha, alpha equals

0.5, where we do have a possibility of teleportation. So now, let’s look just at this first row,

and we’ll recompute that for alpha equals 0.5. So how do we compute that for the first

row? We said with probability alpha, we’re going

to teleport to any place on the web. And there’s three nodes in this little mini web, N equals

3. So with probability alpha, this vector will be 1 over n, 1 over n, 1 over n. With

probability 1 minus alpha, we’re going to go to each node with its out-link probability.

And we can only go from node 1 to node 2, and we do that with probability 1. So with probability 1 minus alpha, we take that set of transitions. Alpha, we said, is 0.5, so the new row is 0.5 times the teleport vector plus 0.5 times the out-link vector, which gives 1/6, 2/3, 1/6. That's the new first row of the matrix. And sure enough: 1/6, 2/3, 1/6. We do the same computation for each of the other two rows, and here's the new transition probability matrix assuming a 0.5 teleportation probability.

So we've introduced the intuition of PageRank, talked about Markov chains and how to think about the transition probability matrix, and how to compute it with the teleportation probabilities. In the next section, we'll look at the actual computation of PageRank itself.
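The construction just walked through can be sketched in a few lines of Python (a sketch, not production code, using the three-node mini web from the example):

```python
# Mini web from the example: 1 -> 2; 2 -> 1 and 3; 3 -> 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
alpha = 0.5          # teleport probability
N = len(A)

P = []
for row in A:
    k = sum(row)     # number of out-links on this page
    if k == 0:
        # Dead end: jump anywhere, probability 1/N per page.
        P.append([1 / N] * N)
    else:
        # Normalize by k, scale by (1 - alpha), add alpha/N everywhere.
        P.append([(1 - alpha) * a / k + alpha / N for a in row])

print(P[0])  # the first row works out to 1/6, 2/3, 1/6, as in the example
```

Each row of P sums to 1, so P is a valid transition probability matrix for the teleporting random surfer.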
