**Problem 2: Google Page Rank.**
Note 3/21/2018: This problem has been revised slightly to ameliorate problems people have experienced with the ''surfer.m'' script and ''www.unh.edu''.

Compute the Google Page Rank for a network of 32 pages within ''www.mit.edu'', ''www.unh.edu'', or another website of your choice.

**(a)** Download the [[:gibson:teaching:spring-2018:math445:surfer|surfer.m]] Matlab script (an updated version of a script from [[http://www.mathworks.com/moler/chapters.html|Numerical Computing with Matlab]] by Clive Moler). Then set ''M=32'' and run the command ''[w,L] = surfer('http://www.unh.edu',M);'' (or whatever website you choose). After some time this will generate a vector $w$ of websites and a matrix $L$ of links between them. All the entries of $L$ should be either 0 or 1.
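If you would like to double-check that, one quick way (a minimal sketch, assuming ''w'' and ''L'' are in your workspace from running ''surfer.m'') is

<code>
% Optional sanity check: prints 1 (true) only if every entry of L is 0 or 1.
all(L(:) == 0 | L(:) == 1)
</code>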
**(b)** Now from the matrix $L$, generate a transition matrix $A$ that governs the dynamics of people clicking on random links, according to $x^{n+1} = A x^n$. Similar to the hamster dynamics problem above, the $i$th component of the vector $x^n$ is the fraction of websurfers visiting the $i$th webpage at time $n$. You will probably run into a problem: a webpage might contain no links, or only links outside the set of pages we are working with. In that case we will just assume that the person randomly selects a new website among all the $M$ pages in our set. Revise your calculation of $A$ so that if a column of $L$ sums to zero, then each of the entries in the same column of $A$ is $1/M$.
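For instance, one possible way to build $A$ (a sketch only; the helper variable ''c'' is just a suggested name) is to divide each column of $L$ by its column sum, and overwrite any zero-sum columns with $1/M$:

<code>
% Build the transition matrix A from the link matrix L (one possible approach).
c = sum(L,1);                  % c(j) = number of links on page j
A = zeros(M,M);
for j = 1:M
    if c(j) == 0
        A(:,j) = 1/M;          % page j has no usable links: jump to any page
    else
        A(:,j) = L(:,j)/c(j);  % follow one of page j's links at random
    end
end
</code>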
**%%(c)%%** Suppose that instead of always clicking on a random link within a page, a web surfer sometimes (with probability $\alpha$) jumps to a random page chosen from all the pages in the network. We can model this behavior by using a modified transition matrix $B$ and dynamics $x^{n+1} = B x^n$, as follows
\begin{equation*}
B = (1-\alpha) A + \frac{\alpha}{M} [1]
\end{equation*}
where $[1]$ is an $M \times M$ matrix of ones. Rumor has it that Google uses $\alpha = 0.15$, so use that value and calculate $B$.
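In Matlab this is one line; for example (a sketch, assuming ''A'' and ''M'' from part (b)):

<code>
% Modified transition matrix: with probability alpha, jump to a random page.
alpha = 0.15;
B = (1-alpha)*A + (alpha/M)*ones(M,M);
</code>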
**(d)** Double-check that the sum of each column of $B$ is 1. Again, be clever and get Matlab to do the work, rather than listing out the sums of all the columns and verifying manually that each of the $M$ numbers is 1!
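One possible one-line check (allowing for floating-point roundoff):

<code>
% Largest deviation of any column sum of B from 1; should be tiny, e.g. ~1e-15.
max(abs(sum(B,1) - 1))
</code>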
**(e)** Let’s assume that all websurfers start at the first webpage, ''x=zeros(M,1); x(1)=1;''. If we iteratively apply ''B'' to ''x'' many times (say 40 times), the resulting vector will give the probability that we end up on a particular website after a long session of random web surfing. We can do this with a for-loop
<code>
N = 40;              % length of the web-surfing session
for n = 1:N
    x = B*x;         % one step of the dynamics x^{n+1} = B x^n
end
</code>
After that is complete, the elements of ''x'' give the probabilities that a random web surfer will end up at the web pages listed in ''w'', and you can rank the pages in ''w'' according to their probabilities in ''x''. Look up the ''sort'' command in the Help menu and find a way to print the list of websites in order of their rank. Turn in the list of the top 10, the fraction of the time a random web surfer would spend in each one of those 10, and your code to complete the Lab.
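For example, one possible approach (a sketch, assuming ''w'' is a cell array of URLs as returned by ''surfer.m''):

<code>
% Sort pages by probability, highest first, and print the top 10.
[p, idx] = sort(x, 'descend');   % p = sorted probabilities, idx = page indices
for k = 1:10
    fprintf('%8.5f  %s\n', p(k), w{idx(k)});   % assumes w is a cell array
end
</code>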
**(f)** If you are interested: our calculation involved two free parameters, the probability $\alpha$ of jumping to an entirely random page in the network, and $N$, the number that specifies the length of our long session of random web surfing. How robust is the page rank algorithm with respect to these two parameters? If you change $\alpha$ to 0.1 or 0.05, do the top 10 pages change? How about if you change $M$ to 64, 128, 256, 512, or 1024? (Some of these larger values might take a really long time.)
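If you want to experiment, a sketch like the following (reusing ''A'', ''M'', and ''w'' from the earlier parts) reruns the whole ranking for several values of $\alpha$:

<code>
% Recompute the ranking for several values of alpha and compare the top 10.
for alpha = [0.15 0.1 0.05]
    B = (1-alpha)*A + (alpha/M)*ones(M,M);
    x = zeros(M,1); x(1) = 1;
    for n = 1:40
        x = B*x;
    end
    [p, idx] = sort(x, 'descend');
    fprintf('alpha = %g\n', alpha);
    disp(w(idx(1:10)))           % top 10 pages (w assumed a cell array)
end
</code>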
----