Note 3/21/2018: This problem has been revised slightly to ameliorate problems people have experienced with the ''surfer.m'' script and ''www.unh.edu''.
Compute the Google Page Rank for a network of 32 pages within ''www.mit.edu'', ''www.unh.edu'', or another website of your choice.
**(a)** Download the [[:gibson:teaching:spring-2018:math445:surfer|surfer.m]] Matlab script (an updated version of a script from [[http://www.mathworks.com/moler/chapters.html|Numerical Computing with Matlab]] by Clive Moler). Then set ''M=32'' and run the command
''[w,L] = surfer('http://www.unh.edu',M);'' (or whatever website you choose). After some time this will generate a vector $w$ of websites and a matrix $L$ of links between them. All the entries of $L$ should be either 0 or 1.
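For reference, here is this step as a runnable sketch (the ''spy'' plot is just an optional way to visualize the link pattern, not part of the assignment):

<code matlab>
M = 32;                                  % number of pages to collect
[w,L] = surfer('http://www.unh.edu',M);  % w: list of page URLs, L: link matrix
spy(L)                                   % optional: plot the nonzero pattern of L
</code>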
**(b)** Now from the matrix $L$, generate a transition matrix $A$ that governs the dynamics of people clicking on random links, according to $x^{n+1} = A x^n$. As in the hamster dynamics problem above, the $i$th component of the vector $x^n$ is the fraction of websurfers visiting the $i$th webpage at time $n$. You will probably run into a problem: a webpage might contain no links, or only links outside the set of pages we are working with. In that case we will just assume that the person randomly selects a new website among all the $M$ pages in our set. Revise your calculation of $A$ so that if a column of $L$ sums to zero, then each of the entries in the same column of $A$ is $1/M$.
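One way to build $A$, as a sketch (this assumes, as in ''surfer.m'', that ''L(i,j) = 1'' means page $j$ links to page $i$, so that column $j$ of $L$ lists page $j$'s outgoing links):

<code matlab>
c = sum(L,1);                  % c(j) = number of outgoing links from page j
A = zeros(M,M);
for j = 1:M
    if c(j) == 0
        A(:,j) = 1/M;          % no usable links: jump to any of the M pages
    else
        A(:,j) = L(:,j)/c(j);  % follow one of page j's links, chosen uniformly
    end
end
</code>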
**%%(c)%%** Suppose that instead of always clicking on a random link within a page, a web surfer sometimes (with probability $\alpha$) jumps to a random page chosen from all the pages in the network. We can model this behavior by using a modified transition matrix $B$ and dynamics $x^{n+1} = B x^n$, as follows
$$ B = (1-\alpha) \, A + \frac{\alpha}{M} \, [1] $$
where $[1]$ is an $M \times M$ matrix of ones. Rumor has it that Google uses $\alpha = 0.15$, so use that value and calculate $B$.
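In Matlab this is a one-liner (a sketch, using ''ones(M,M)'' for the matrix $[1]$):

<code matlab>
alpha = 0.15;                           % rumored Google value
B = (1-alpha)*A + (alpha/M)*ones(M,M);  % mostly follow links, sometimes jump
</code>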
**(d)** Double-check that the sum of each column of $B$ is 1. Again, be clever and get Matlab to do the work, rather than listing out the sums of all the columns and verifying manually that each of the $M$ numbers is 1!
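For example (a sketch; the small tolerance guards against floating-point round-off):

<code matlab>
all(abs(sum(B,1) - 1) < 1e-12)  % returns 1 (true) if every column of B sums to 1
</code>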
**(e)** Let’s assume that all websurfers start at the first webpage, ''x=zeros(M,1); x(1)=1;''. If we iteratively apply ''B'' to ''x'' many times (say 40 times), the resulting vector will give the probability that we end up on a particular website after a long session of random web surfing. We can do this with a for-loop
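along these lines (a sketch; the sort-and-print step at the end is one way to display the ranking, assuming ''w'' is the cell array of URLs from part (a)):

<code matlab>
x = zeros(M,1); x(1) = 1;      % all websurfers start at page 1
N = 40;                        % length of the random-surfing session
for n = 1:N
    x = B*x;                   % one step of x^{n+1} = B x^n
end
[xs,idx] = sort(x,'descend');  % rank pages by their final probabilities
for k = 1:10
    fprintf('%8.4f  %s\n', xs(k), w{idx(k)});  % top ten pages and their page ranks
end
</code>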
**(f)** If you are interested: Our calculation involved two free parameters: the probability $\alpha$ of jumping to an entirely random page in the network, and $N$, the number that specifies the length of our long session of random web surfing. How robust is the page rank algorithm with respect to these two parameters? If you change $\alpha$ to 0.1 or 0.05, do the top 10 pages change?
How about if you change $M$ to 64, 128, 256, 512, or 1024? (Some of these larger values might take a really long time.)
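A quick way to run the $\alpha$ experiment, as a sketch (it reuses $A$, $M$, and ''w'' from the earlier parts):

<code matlab>
for alpha = [0.15 0.10 0.05]
    B = (1-alpha)*A + (alpha/M)*ones(M,M);  % rebuild B for this alpha
    x = zeros(M,1); x(1) = 1;
    for n = 1:40
        x = B*x;
    end
    [~,idx] = sort(x,'descend');
    fprintf('alpha = %.2f  top page: %s\n', alpha, w{idx(1)});
end
</code>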
----