Compute the Google Page Rank for a network of 100 pages within ''www.unh.edu''.
**(a)** Download the [[https://www.mathworks.com/matlabcentral/mlc-downloads/downloads/submissions/37976/versions/7/previews/surfer.m/index.html?access_key=|surfer.m]] Matlab script from [[http://www.mathworks.com/moler/chapters.html|Numerical Computing with Matlab]] by Clive Moler. Set ''M=100'' and run the command ''[w,L] = surfer('http://www.unh.edu',M);''. This might take a few minutes.
**(b)** After some time this will generate a vector $w$ of websites and a matrix $L$ of links between them. All the entries of $L$ should be either 0 or 1. But $L$ will have $100^2 = 10,000$ entries, so you can't check this by eye. Write a short piece of Matlab code that double-checks that all entries of $L$ are either 0 or 1.
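For example, one possible check uses Matlab's elementwise logical operators (a minimal sketch; any equivalent test is fine):

<code>
% verify that every entry of L is either 0 or 1
if all(L(:) == 0 | L(:) == 1)
  disp('all entries of L are 0 or 1')
else
  disp('L has entries that are neither 0 nor 1')
end
</code>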
**%%(c)%%** Now from the matrix $L$, generate a transition matrix $A$ that governs the dynamics of people clicking on random links, according to $p^{n+1} = A p^n$. Similar to the hamster dynamics problem above, the $i$th component of the vector $p^n$ is the fraction of websurfers visiting the $i$th webpage at time $n$. You will probably run into a problem: some webpages contain no links, or only links outside the set of pages we are working with. In that case we will just assume that the person randomly selects a new website among all the $M$ pages in our set. Revise your calculation of $A$ so that if a column of $L$ sums to zero, then each of the entries in the same column of $A$ is $1/M$.
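As a sketch of one way to do this (assuming the surfer.m convention that $L_{ij} = 1$ means page $j$ links to page $i$, so column $j$ of $L$ lists the outgoing links of page $j$):

<code>
% build the transition matrix A from the link matrix L
A = zeros(M,M);
for j = 1:M
  c = sum(L(:,j));        % number of outgoing links from page j
  if c == 0
    A(:,j) = 1/M;         % no usable links: jump to a random page
  else
    A(:,j) = L(:,j)/c;    % equal probability for each outgoing link
  end
end
</code>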
**(d)** Suppose that instead of always clicking on a random link within a page, a web surfer sometimes (with probability $\alpha$) jumps to a random page chosen from all the pages in the network. We can model this behavior by using a modified transition matrix $B$ and dynamics $p^{n+1} = B p^n$, as follows
\begin{equation*}
B = (1-\alpha) A + \alpha [1]/M
\end{equation*}
where $[1]$ is an $M \times M$ matrix of ones. Rumor has it that Google uses $\alpha = 0.15$, so use that value and calculate $B$.
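In Matlab this formula translates almost directly; for example (''ones(M)'' produces the $M \times M$ matrix of ones):

<code>
alpha = 0.15;                        % rumored Google value
B = (1-alpha)*A + alpha*ones(M)/M;   % modified transition matrix
</code>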
**(e)** Double-check that the sum of each column of $B$ is 1. Again, be clever and get Matlab to do the work, rather than listing out the sums of all the columns and verifying manually that each of the 100 numbers is 1!
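For example, you could check that the largest deviation of any column sum from 1 is tiny (at the level of round-off error):

<code>
max(abs(sum(B,1) - 1))   % should be near machine precision, e.g. < 1e-12
</code>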
**(f)** Let’s assume that all websurfers start at the first webpage, ''p=zeros(M,1); p(1)=1;''. If we apply ''B'' to ''p'' many times (say 40 times), the resulting vector will give the probability that we end up on a particular website after a long session of random web surfing. We can do this with a for-loop
<code>
p = zeros(M,1); p(1) = 1;   % start at the first webpage
N = 40;
for n = 1:N
  p = B*p;                  % one step of random web surfing
end
</code>
After that is complete, the elements of ''p'' give the probabilities that a random web surfer will end up at the web pages listed in ''w'', and you can rank the pages in ''w'' according to their probabilities in ''p''. Look up the ''sort'' command in the Help menu and find a way to print the list of websites in order of their rank. Turn in the list of the top 10, the fraction of the time a random web surfer would spend in each one of those 10, and your code to complete the Lab.
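As a sketch of one possible approach (assuming, as in surfer.m, that ''w'' is a cell array of URL strings):

<code>
% sort pages by probability, largest first, and print the top 10
[psorted, idx] = sort(p, 'descend');
for k = 1:10
  fprintf('%8.4f  %s\n', psorted(k), w{idx(k)});
end
</code>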
**(g)** If you are interested: Our calculation involved two free parameters: the probability $\alpha$ of jumping to an entirely random page in the network, and $N$, the number that specifies the length of our long session of random web surfing. How robust is the page rank algorithm with respect to these two parameters? If you change $\alpha$ to 0.1 or 0.05, do the top 10 pages change?
How about if you change $N$ to 100 or 1,000?
----