**Problem 2: Google Page Rank.**
  
Note 3/21/2018: This problem has been revised slightly to ameliorate problems people have experienced with the ''surfer.m'' script and ''www.unh.edu''.

Compute the Google Page Rank for a network of 32 pages within ''www.mit.edu'', ''www.unh.edu'', or another website of your choice.

**(a)**  Download the [[:gibson:teaching:spring-2018:math445:surfer|surfer.m]] Matlab script (an updated version of a script from [[http://www.mathworks.com/moler/chapters.html|Numerical Computing with Matlab]] by Cleve Moler). Then set ''M=32'' and run the command ''[w,L] = surfer('http://www.unh.edu',M);'' (or whatever website you choose). After some time this will generate a vector $w$ of websites and a matrix $L$ of links between them. All the entries of $L$ should be either 0 or 1.
  
**(b)** Now from the matrix $L$, generate a transition matrix $A$ that governs the dynamics of people clicking on random links, according to $x^{n+1} = A x^n$. As in the hamster dynamics problem above, the $i$th component of the vector $x^n$ is the fraction of websurfers visiting the $i$th webpage at time $n$. You will probably run into a problem: a webpage might contain no links, or only links to pages outside the set we are working with. In that case we will just assume that the person randomly selects a new website among all the $M$ pages in our set. Revise your calculation of $A$ so that if a column of $L$ sums to zero, then each of the entries in that column of $A$ is $1/M$.
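
A minimal sketch of one way to form $A$, assuming ''L(i,j) = 1'' means that page $j$ links to page $i$ (check which convention your ''surfer.m'' output uses):

<code>
A = zeros(M,M);
c = sum(L,1);                  % c(j) = number of links on page j
for j = 1:M
    if c(j) == 0
        A(:,j) = 1/M;          % no usable links: jump to any page with equal probability
    else
        A(:,j) = L(:,j)/c(j);  % equal probability for each link on page j
    end
end
</code>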
  
**%%(c)%%** Suppose that instead of always clicking on a random link within a page, a web surfer sometimes (with probability $\alpha$) jumps to a random page chosen from all the pages in the network. We can model this behavior by using a modified transition matrix $B$ and dynamics $x^{n+1} = B x^n$, as follows
  
\begin{equation*}
B = (1-\alpha) A + \frac{\alpha}{M} [1]
\end{equation*}
where $[1]$ is an $M \times M$ matrix of ones. Rumor has it that Google uses $\alpha = 0.15$, so use that value and calculate $B$.
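
In Matlab this is a one-liner (a sketch, assuming ''A'' and ''M'' are defined as in part (b)):

<code>
alpha = 0.15;                            % rumored Google value
B = (1-alpha)*A + (alpha/M)*ones(M,M);   % ones(M,M) plays the role of [1]
</code>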
  
**(d)** Double-check that the sum of each column of $B$ is 1. Again, be clever and get Matlab to do the work, rather than listing out the sums of all the columns and verifying manually that each of the $M$ numbers is 1!
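
For example, one way to let Matlab do the checking (assuming ''B'' from the previous part):

<code>
colsums = sum(B,1);      % 1 x M row vector of column sums
max(abs(colsums - 1))    % should be essentially zero, up to roundoff error
</code>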
  
**(e)** Let's assume that all websurfers start at the first webpage, ''x=zeros(M,1); x(1)=1;''. If we iteratively apply ''B'' to ''x'' many times (say 40 times), the resulting vector will give the probability that we end up on a particular website after a long session of random web surfing. We can do this with a for-loop:
  
<code>
x = zeros(M,1);  x(1) = 1;   % everyone starts at the first page
for n = 1:40                 % iterate x^{n+1} = B x^n
    x = B*x;
end
</code>
After that is complete, the elements of ''x'' give the probabilities that a random web surfer will end up at the web pages listed in ''w'', and you can rank the pages in ''w'' according to their probabilities in ''x''. Look up the ''sort'' command in the Help menu and find a way to print the list of websites in order of their rank. Turn in the list of the top 10, the fraction of the time a random web surfer would spend in each one of those 10, and your code to complete the Lab.
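
For instance, one possible use of ''sort'' (a sketch, assuming ''w'' is the cell array of URLs returned by ''surfer''):

<code>
[p, idx] = sort(x, 'descend');                % probabilities from largest to smallest
for k = 1:10
    fprintf('%8.4f  %s\n', p(k), w{idx(k)});  % probability and URL of the k-th ranked page
end
</code>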
  
**(f)**  If you are interested: Our calculation involved two free parameters: the probability $\alpha$ of jumping to an entirely random page in the network, and $N$, the number that specifies the length of our long session of random web surfing. How robust is the page rank algorithm with respect to these two parameters? If you change $\alpha$ to 0.1 or 0.05, do the top 10 pages change? How about if you change $M$ to 64, 128, 256, 512, or 1024? (Some of these larger values might take a really long time.)
  
----