User Tools

Site Tools


gibson:teaching:fall-2014:math445:lab5

====== Differences ====== This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
gibson:teaching:fall-2014:math445:lab5 [2014/11/12 18:12]
gibson
gibson:teaching:fall-2014:math445:lab5 [2015/03/09 13:39] (current)
gibson
Line 25: Line 25:
  
 **(a)** ​ Go to the website [[http://​www.mathworks.com/​moler/​chapters.html|Numerical Computing with Matlab]] ​ **(a)** ​ Go to the website [[http://​www.mathworks.com/​moler/​chapters.html|Numerical Computing with Matlab]] ​
-by Clive Moler and download the “surfer.m” program. Set ''​N=100''​ and run the command  +by Clive Moler and download the “surfer.m” program. Set ''​M=100''​ and run the command  
-''​[U,​L] = surfer('​http://​www.unh.edu',​N);''​. This might take a few minutes. ​+''​[U,​L] = surfer('​http://​www.unh.edu',​M);''​. This might take a few minutes. ​
  
 **(b)** ​ After some time a vector ''​U''​ of websites will be generated and a matrix ''​L''​ of links will be generated. All the entries of ''​L''​ should be either 0 or 1. But ''​L''​ will have 100^2 == 10,000 entries, so you can't check this by eye. Write a short piece of Matlab code that double-checks that all entries of ''​L''​ are either 0 or 1.  **(b)** ​ After some time a vector ''​U''​ of websites will be generated and a matrix ''​L''​ of links will be generated. All the entries of ''​L''​ should be either 0 or 1. But ''​L''​ will have 100^2 == 10,000 entries, so you can't check this by eye. Write a short piece of Matlab code that double-checks that all entries of ''​L''​ are either 0 or 1. 
Line 37: Line 37:
 ''​G = (1-alpha) T + alpha **1**/​M;''​ ''​G = (1-alpha) T + alpha **1**/​M;''​
  
-where **1** is a matrix of ones. Rumor has it that Google uses alpha = 0.15, so use that value and calculate ''​T''​. ​+where **1** is a matrix of ones. Rumor has it that Google uses alpha = 0.15, so use that value and calculate ''​G''​. ​
  
-**(e)** Double-check that the sum of each column of ''​T''​ is 1. Again, be clever and get Matlab to do the work, rather than listing out the sums of all the columns and verifying manually that each of the 100 numbers is 1! +**(e)** Double-check that the sum of each column of ''​G''​ is 1. Again, be clever and get Matlab to do the work, rather than listing out the sums of all the columns and verifying manually that each of the 100 numbers is 1! 
  
 **(f)** Let’s assume we start at the first webpage, ''​p=zeros(N,​1);​ p(1)=1;''​. If we applying ​ ''​G''​ to ''​p''​ many times (say 1000 times), the resulting vector will give the probability that we end up on a particular website and after a long session of random web surfing. We can do this with a for-loop **(f)** Let’s assume we start at the first webpage, ''​p=zeros(N,​1);​ p(1)=1;''​. If we applying ​ ''​G''​ to ''​p''​ many times (say 1000 times), the resulting vector will give the probability that we end up on a particular website and after a long session of random web surfing. We can do this with a for-loop
Line 46: Line 46:
 N=100; N=100;
 for n=1:N; for n=1:N;
-  p=T*p;+  p=G*p;
 end end
 </​code>​ </​code>​
  
-or equivalently 
- 
-<​code>​ 
-N=100; 
-p=(G^N)*p; 
-</​code>​ 
  
 After that is complete the elements of ''​p''​ give the probabilities that a random web surfer will end up at the web pages listed in ''​U'',​ and you can rank the pages in ''​U''​ according to their probabilities in ''​p''​. Look up the ''​sort''​ command in the Help menu and find a way to print the list of websites in order of their rank. Turn in the list of the top 10, the fraction of the time a random web surfer would spend in each one of those 10, and your code to complete the Lab.  After that is complete the elements of ''​p''​ give the probabilities that a random web surfer will end up at the web pages listed in ''​U'',​ and you can rank the pages in ''​U''​ according to their probabilities in ''​p''​. Look up the ''​sort''​ command in the Help menu and find a way to print the list of websites in order of their rank. Turn in the list of the top 10, the fraction of the time a random web surfer would spend in each one of those 10, and your code to complete the Lab. 
Line 72: Line 66:
 ---- ----
  
-** Problem 4:** In 2(f) above, it was claimed that  
  
-<code+**Problem 4:** This lab has mentioned three different algorithms for calculating the page rank. For this problem, you will compare the execution time T of these three algorithms as a function of M, the number of pages in your web of links. The end result will be a plot of execution time versus M, with three lines, one for each algorithm.  
-m=1000+ 
-p=(T^m)*p;+It will be easiest to do the comparisons if you convert your page rank script to a Matlab function and then write three different algorithms as three different versions of that function. 
 + 
 + 
 +^function^algorithm^ 
 +| ''​pagerank_fullpower'' ​    | matrix power method with full matrix | 
 +| ''​pagerank_fulliterate'' ​  | iterated matrix multiplication with full matrix |  
 +| ''​pagerank_sparseiterate''​ | iterated matrix multiplication with sparse matrix |  
 + 
 + 
 +** 1. matrix power method with full matrix** 
 + 
 +Write a Matlab function 
 + 
 +  function [i,p] = pagerank_fullpower(L);​ 
 + 
 +that takes a matrix of links L as an input and returns the indices i of webpages sorted by their page ranks, and the page ranks p sorted the same way (so that p(1) >= p(2) etc). 
 + 
 + 
 +To be sure that the function operates with full matrices, make this the first line of the function body 
 + 
 +  L full(L)
 + 
 +Compute T and G as discussed above, and then compute the page rank G^N by the straightforward Matlab code 
 + 
 +  N 100; 
 +  p = G^N p; 
 + 
 + 
 +** 2. iterated matrix multiplication with full matrix ** 
 + 
 +Write a Matlab function  
 + 
 +  function [i,p] = pagerank_fulliterate(L);​ 
 + 
 +just as before, except compute G^N p with the for-loop 
 + 
 +  N=100; 
 +  for n=1:N 
 +    p = G*p 
 +  end 
 + 
 + 
 +** 3. iterated matrix multiplication with sparse matrix ** 
 + 
 +Write a Matlab function  
 + 
 +  function [i,p] = pagerank_sparseiterate(L);​ 
 + 
 +that converts the input L to a sparse matrix in the first line with  
 + 
 +  L = sparse(L) 
 + 
 +and computes G^N p with the for-loop 
 + 
 +  N = 100; 
 +  for n=1:N 
 +    p = (1-alpha)*(T*p) + alpha*ones(M,​1)/​M;​ 
 +  end 
 +   
 +or equivalently 
 + 
 +  N = 100; 
 +  one_alpha_T = (1-alpha)*T;​ 
 +  alpha_ones ​ = alpha*ones(M,​1)/​M;​ 
 +  for n=1:N 
 +    p = one_alpha_T*p + alpha_ones;​ 
 +  end 
 + 
 +Note that the sparse algorithm avoids the matrix of 1's that is present in the other two algorithms, because that matrix is necessarily full. 
 + 
 + 
 +Now write a script that tests and times these algorithms on link matrices L of sizes M = 32, 64, 128, 256, ...., up to whatever power of two your computer can handle. It's smart to develop and test your code on a smaller set of M, perhaps just M = 32, 64, 128. As a starting point, here's the code we developed in class to measure execution time of full versus sparse matrices. 
 + 
 +<code matlab>​ 
 + 
 +% script to compare full versus sparse matrix multiplication 
 + 
 +for n=1:12 
 +   
 +  % set up random M x M Ax=b problem 
 +  m = 2^n;  
 +  A = randomL(m)
 +  x = randn(m,​1);​ 
 + 
 +  % measure execution time of sparse matrix multiplication 
 +  tic; 
 +  b = A*x   
 +  Tsparse(n) = toc; 
 + 
 +  % measure execution time of full matrix multiplication 
 +  A = full(A); 
 +  tic; 
 +  b = A*x;    
 +  Tfull(n) = toc; 
 + 
 +  M(n) = m; 
 +end  ​
 </​code>​ </​code>​
  
-is faster than +You can use this Matlab function to produce synthetic data for testing any given value of M
  
-<​code>​ +<​code ​matlab
-m=1000+function L randomL(M)
-for j=1:m+% return random link matrix for testing pagerank algorithms 
-  p=T*p;+ 
 +  maxlinks = 10;  % maximum of number of links per page 
 +  L = sparse(M,​M);​ 
 +   
 +  ​for j=1:M                     
 +    i = randi(M, maxlinks, 1)% get vector of random numbers between 1 and M  
 +    for n 1:​maxlinks 
 +      L(i(n),j) = 1          % create links from page j to the random pages i 
 +    end 
 +  end 
 +  ​
 end end
 </​code>​ </​code>​
  
-as way of computing ​the steady-state distribution ​of websurfers+ 
 +Produce ​plot of execution time T versus M for each of your three pagerank functions using Matlab'​s ''​tic''​ and ''​toc''​. From the plot, estimate the explicit function that relates T to M for each of the three algorithms. 
 + 
 +Which algorithm is fastest? Why? How long do you estimate each algorithm would take to rank M=10^6 web pages? Provide your answer in easily understandable units of time (days, weeks, years, etc.
 + 
 + 
 + 
  
gibson/teaching/fall-2014/math445/lab5.1415844772.txt.gz · Last modified: 2014/11/12 18:12 by gibson