Page Rank – CS101 – Udacity

By | December 14, 2019


The idea for how to rank web pages is the same idea as how we measured popularity of people. Instead of thinking about friendships as the way to measure popularity, what we’re measuring, is links on the web. When one page has a link to another page, that indicates that this more likely that this other page is popular. Just like when someone is a friend, it indicates that the person they are friends with is more likely to be popular. The goal in ranking web pages is to get a measure of how popular a page is based on the other pages that link to it. We have the same issue with popularity, that not all links are the same– that a link from a page that is really important counts for a lot more than a link from a page that is not important. If the NY Times has a link to your page, that counts for a lot more than if your mom sets up a web page and puts a link to your page in it, unless your mom is Lady Gaga, in which case her link probably counts for more. Another way of thinking about this– is what we’re trying to model– is a random web surfer. Our random web surfer has some set of pages that they know about. Those pages have links to other pages. Some pages might have a lot of links. Some pages might just have one link. Some pages might have no links. One way to think about this– is that we’re trying to model a random web surfer. The web surfer starts knowing about some pages. She picks one page at random. Let’s say she picks this page. And then, when she’s on that page, she picks a random link, and follows that link. Oops, this was a bad starting page, it actually has no out links. So, then she picks another random page. Let’s say she picks this one. She follows the link from that page, and now she got to the page with no links again. Let’s say she picked a better starting point. Let’s say she randomly picked this one. Now she’s got 2 links to follow. She randomly picks one of those. She follows it. She gets to a new page. She randomly picks a link from that page. In this case, it only had 1. In this case, it seems we have a bit of a problem, because all of the starting pages eventually lead into this one, which has no outgoing links. We’ll think about how to solve that problem later. We can think about what our random web surfer is doing–it is picking random pages, following links. What we want to measure is the popularity of the page. That’s the probability is that she reaches that given page starting from these random pages. If you did this over-and-over again, and you counted the number of times you reached these pages, that would give you a measure of that page’s popularity. This is very similar to the popularity function. We’re going to define a function that we’ll call the rank of a page. Like the way we defined the popularity function, it’s going to have 2 inputs. It is going to have a timestep, and it’s going to have the page, which we will use the URL for. The output of rank will be some number, except we’ll define for timestep 0– this is our base case–we’re going to define–all the ranks have value 1– we’ll actually change that shortly, but we’ll start out thinking of all the ranks having value 1 like we did with the popularity scores. Then we’ll define the value of the rank at timestep T for any given URL, just like we defined the popularity score. It’s going to be the sum of all the pages that are friends with this page, and what it means for our web page to be friends with another web page is that it has a link to it. This is going to be for all the pages that exist, that have some like to that URL, or its friends. We are going to go through each of those pages– we’ll call them in-links instead of friends. We’re going to go through those, and we’re going to sum up the ranks that we got for those pages at time T minus 1. This is our first model popularity of web pages. This is exactly the same as the model we had on popularity for people. It’s not going to work that well yet. One of the reasons it’s not going to work that well is some pages might have lots of links– and if a page has lots of links, the value of each one of its links should be diminished. It shouldn’t have the same value as a page that only 1 link that links to this URL. Maybe that should be the same case for friend popularity. If someone has lots of friends, each friend is less valued. Whereas, if someone only has a small number of friends, has lots of time for each friend. This is the way we’re going to model web popularity. We don’t want to just give the same score to each link. We’re going to change this by dividing by the number of out links. If a page has many outgoing links, the value of the page that it links to, is less for each page. A page that is just a long list of links won’t have that much influence on the rankings. If a page only has a few outgoing links, well, then they are worth more. What we’re going to do is divide this by the number of out links from P. Each of the P pages–these will be the value of P– as they go through the in links of URL, we’re going to sum up the rank that we got on the previous timestep and divide that by the number of out links.

3 thoughts on “Page Rank – CS101 – Udacity

  1. 150metri Post author

    what is the initial value you pass for the time t on the function rank for page p?

    Reply
  2. Nilanjan Bhattacharya Post author

    Great video.  I wish I knew which was the previous and which is the next in the series

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *