**Part 2:** Compute the Google Page Rank for a network of 100 pages within ''www.unh.edu''.
**(a)** Go to the website [[http://www.mathworks.com/moler/chapters.html|Numerical Computing with Matlab]]
by Cleve Moler and download the ''surfer.m'' program. Set ''N=100'' and run the command
''[U,G] = surfer('http://www.unh.edu',N);''. This might take a few minutes.

**(b)** After some time the program will generate a vector ''U'' of websites and a matrix ''G'' of links.
All the entries of ''G'' should be either 0 or 1. But ''G'' has ''N^2'' = 10,000 entries,
so you can't check this by eye. Write a short piece of Matlab code that double-checks that all entries of
''G'' are either 0 or 1.

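One minimal way to do this check (just a sketch; any equivalent test is fine) uses logical indexing and ''all'':

<code>
% true only if every entry of G equals 0 or 1
if all( G(:) == 0 | G(:) == 1 )
    disp('all entries of G are 0 or 1')
else
    disp('some entry of G is neither 0 nor 1')
end
</code>
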
**%%(c)%%** Now generate an ''H'' matrix as described in problem 1. You will probably run into a problem
here in part 2, namely that a webpage might contain no links, or only links outside the set of pages we are
working with. In that case we will just assume that the person randomly selects a new website among all the
''N'' pages in our set. Revise your calculation of ''H'' so that if a column of ''G'' sums to zero, then each
of the entries in the same column of ''H'' is ''1/N''.

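As a rough sketch (assuming, as in problem 1, that ''H'' is built by dividing each column of ''G'' by its
column sum), the zero-sum columns can be handled like this:

<code>
H = zeros(N,N);
colsum = sum(G,1);                   % column sums of G (out-links of each page)
for j = 1:N
    if colsum(j) == 0
        H(:,j) = 1/N;                % no usable links: jump to any of the N pages
    else
        H(:,j) = G(:,j)/colsum(j);   % otherwise follow a random link on page j
    end
end
</code>
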
**(d)** Suppose that instead of always clicking on a random link within a page, a web surfer sometimes
(with probability alpha) jumps to a random page chosen from all the pages in the network. We can
model this behavior with a new transition matrix ''T'' given by

''T = (1-alpha) H + alpha **1**/N;''

where **1** is a matrix of ones. Rumour has it that Google uses alpha = 0.15, so use that value
and calculate ''T''.

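In Matlab this is a one-line computation; for example (using ''ones(N,N)'' for the matrix of ones):

<code>
alpha = 0.15;
T = (1-alpha)*H + alpha*ones(N,N)/N;
</code>
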
**(e)** Double-check that the sum of each column of ''T'' is 1. Again, be clever and get Matlab to do
the work, rather than printing out the sum of each column and verifying manually that each of the
''N'' numbers is 1!

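For instance, a short check along the lines of part (b), with a small tolerance for roundoff error, might
look like this:

<code>
colsums = sum(T,1);                  % row vector containing the N column sums
if all( abs(colsums - 1) < 1e-12 )
    disp('every column of T sums to 1')
else
    disp('some column of T does not sum to 1')
end
</code>
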
**(f)** Let's assume we start at the first webpage, ''x=zeros(N,1); x(1)=1;''. If we apply ''T''
to ''x'' many times (say 10^5 times), the resulting vector will give the probability that we end up
on a particular website after a long session of random web surfing. We can do this the slow way
by using the code

<code>
m = 1e05;          % number of applications of T
for j = 1:m
    x = T*x;       % one step of random surfing
end
</code>

or in a faster, equivalent way

<code>
m = 1e05;
x = (T^m)*x;       % m applications of T via one matrix power
</code>


After that is complete, the values of ''x'' correspond to the rank of the particular websites as
ordered in ''U''. Look up the ''sort'' command in the Help menu and find a way to print the list of
websites in order of their rank. Turn in the list of the top 10, the fraction of the time a random
web surfer would spend in each one of those 10, and your code to complete the Lab. Note: there is
a faster way to compute the solution for very large matrices, but we will not worry about that here.
Our method works fine until you reach the scale of 8,000-10,000 websites. Don't try it!

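As a sketch of the kind of thing ''sort'' makes possible (assuming ''U'' is the cell array of URLs
returned by surfer.m), the printout might look something like this:

<code>
[xsorted, idx] = sort(x, 'descend');     % largest rank first; idx is the permutation
for j = 1:10
    fprintf('%8.5f   %s\n', xsorted(j), U{idx(j)});
end
</code>
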
**(g)** If you are interested: our calculation involved two free parameters: the probability alpha
of jumping to an entirely random page in the network, and ''m'', the number that specifies the
length of our long session of random web surfing. How robust is the page rank algorithm with
respect to these two numbers? If you change alpha to 0.1 or 0.05, do the top 10 pages change?
How about if you change ''m'' to 10^4?

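One way to experiment, sketched under the assumption that ''H'', ''N'', and ''U'' are still in your
workspace, is to loop over a few values of alpha and recompute the ranking each time:

<code>
for alpha = [0.15 0.10 0.05]
    T = (1-alpha)*H + alpha*ones(N,N)/N;
    x = zeros(N,1);  x(1) = 1;
    x = (T^1e5)*x;                       % long session of random surfing
    [~, idx] = sort(x, 'descend');
    fprintf('alpha = %4.2f   top page: %s\n', alpha, U{idx(1)});
end
</code>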