Computer Ranks – CS101 – Udacity

By | December 13, 2019

So we’re going to provide a start on this code and then we’ll leave the crucial part of it for you to do, as a quiz. You should feel free, at any point, to stop and try to figure out more of the code on your own. There are lots of different ways to do this and we’ll show you one– and you’ll finish that as a quiz. So the first thing we’re going to do is define 2 constants. So we’re going to use “d” as the damping factor. And I’ll use 0.8 as my damping factor. That’s the probability, thinking about our random Web surfer– that she selects the link on the current page, rather than starting over with a new random page. The other constant I’m going to define here, we’ll call “numloops”. That’s the number of times we’re going to go through the relaxation. What we’re computing is the value of rank at some time step. The number of times we go through that is going to determine the accuracy of our ranks. We’ll use 10 as the number of loops. You can experiment with changing that and one of the questions in the homework assignment will ask you to think about what happens when you change that. So now we need to start–we said, initially, the rank of each URL is 1 divided by the number of pages. and so the dictionary ranks, we want to initialize with those values. So we’re going to create and empty dictionary–we’ll call it ranks. The number of pages–the number of pages, we can get from the graph. The graph has a dictionary of nodes and len(graph) will tell us the number of entries in that dictionary. So that’s the number of nodes in the graph, which is the number of pages that we crawled. And now we want to go through the pages, initializing each page to the value, 1.0 / npages. And I’m remembering to use 1.0 here, to make sure the division is done as floating point division and we get an accurate number, rather than integer division. So now we’ve initialized the ranks. We have a dictionary that maps each page to its current rank, which is the 1.0 / npages. So now we get to the interesting part. We need a loop that’s going to go through the number of times of numloops. Each time through this loop what we want to do is update the newranks, based on this formula–using the oldranks. And then, at the end of the loop, we’re going to make the variable ranks hold what was previously newranks– and that way we can keep going. Each time, we’re going to get a new value for newranks. At the end of doing all the updates we’re going to update ranks to refer to whatever newranks did. So that means, each time through this loop, we’re going to create a new dictionary, called newranks, that starts as empty and we’re going to add all the pages to newranks as we update their rank. So to do that, we need to go through the pages in the graph, and for each page, what we want to do is compute the newrank for that page. And the first thing we’ll do is this part– The newrank is: (1 – d) / npages plus this summation. So the first thing we’ll do is introduce a new variable. We’ll call it newrank, and we’ll assign it to this value. Then we’re going to update it, as we go through the pages that link into this page. So we’ll start by initializing newrank as (1 -d) / npages. So then what we need to do is update for the summation. And I’m going to leave this blank, and I’m going to skip that for now. This is going to be left, as a quiz. We’ll finish the rest of the code, and then your quiz will be to finish this part of the code, which is really the most interesting part of computing the Page Ranks. Once we’ve done that–so we use newrank as our variable to keep track of the rank for this page. But we want to update our dictionary so we’re going to add an entry, newranks. We’re still within the foreloop–you’re going to put your code that sums up all the links here. Once we’ve done that, we’ve got the value of newrank that reflects both the probability of starting from that page and the popularity, from all the inlinks. And so we’ll update this to be newrank. We’ve added that to our dictionary. So once we’ve finished looping through all the pages in the graph– well now we’re ready for the next step. So that means we want to make the variable, ranks, refer to the newranks– so we’ve changed the time step to the next time step. And now we’re going to go back through this loop. And we go through this loop the number of loops times. Each time, we’re updating the ranks and when we’re done, what we want to return is ranks–that’s the dictionary that maps each page to its rank.

Leave a Reply

Your email address will not be published. Required fields are marked *