====== Differences ====== This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
gibson:teaching:fall-2012:math445:lab8 [2012/11/01 09:40] gibson created |
gibson:teaching:fall-2012:math445:lab8 [2012/11/06 06:24] (current) gibson [Math 445 Lab 8: Presidential election] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Math 445 Lab 8; Presidential election ====== | ||
- | Your job is to predict the outcome of today's Presidential election, given the last-minute polling data. | + | ====== Math 445 Lab 8: Presidential election ====== |
- | Specifically, given a list of states, their electoral votes, the composite polling percentages for each | + | Your job is to predict the outcome of today's Presidential election |
- | candidate, and the margins of error those polling percentages, you are to run a large number (''Nelections'') | + | given the last-minute polling data, using Monte Carlo simulation. |
- | of simulations of the election. For each state, award each of the two candidates the specified composite | + | |
- | polling percent plus | + | |
- | a random number in the range between -margin and +margin. Then award that state's electoral votes to the candidate | + | |
- | with the larger percentage of votes. Add up all the electoral votes for each candidate, and award the nth election | + | |
+ | Specifically, given a list of states, their electoral votes, the composite | ||
+ | polling percentages for each candidate, and the margins of error of those | ||
+ | polling percentages, you are to run a large number of simulations of the | ||
+ | election and determine the likelihood that either candidate will win based | ||
+ | on the results of those simulations. For each state, start by assigning the | ||
+ | specified composite polling percentages to the two candidates. Then add to | ||
+ | each candidate's percentage a different random number in the range between | ||
+ | ''-margin'' and ''+margin''. Compare the two percentages and award that state's | ||
+ | electoral votes to the candidate with the larger percentage of votes. Do this | ||
+ | for all fifty states (plus DC), add up all the electoral votes for each candidate, | ||
+ | and award the ''n''th election to the candidate with the majority of electoral votes. | ||
+ | Run a large number of such simulated elections, keeping track of the number of | ||
+ | electoral votes for each candidate in each election. Make a histogram that | ||
+ | shows the statistical distribution of total electoral votes for one of the | ||
+ | candidates, using bins of width 10 between 0 and 540 (0-9.99 for bin 1, 10-19.99 for | ||
+ | bin 2, etc.). If you can figure out how, color the bins corresponding to Romney | ||
+ | wins red and the bins corresponding to Obama wins blue, or else just draw a vertical | ||
+ | line at the magic number of 270 electoral votes needed to win the election outright. | ||
+ | ===== Questions ===== | ||
+ | |||
+ | Then answer the following questions: | ||
+ | |||
+ | - Who is most likely to win the presidential election? | ||
+ | - What is the probability that the most likely winner will actually win? | ||
+ | - What is the most likely range of electoral votes for the winner? (among the bins of width 10 specified above) | ||
+ | - What is the likelihood of a 269-269 electoral vote tie? | ||
+ | |||
+ | Turn in printouts of your code, your histogram, and your answers to the above questions. | ||
+ | |||
+ | |||
+ | ===== Tips ===== | ||
+ | |||
+ | |||
+ | * Start with a small number of simulated elections (say 100) and then increase to a large number (say 10,000) when you're confident your code is working correctly. | ||
+ | * You can also develop your code using simulated data, for example, just ten states all with the same polling numbers and a very small margin of error. | ||
+ | * Try to use as few for-loops as possible. If you are really on fire, you can do it with just one for-loop that loops over the number of trials. | ||
+ | * Changing the colors of histogram bins in Matlab is not as easy as one might hope. You'll need to take data returned from the **hist** function and replot it with the **bar** command. See http://www.mathworks.com/matlabcentral/newsreader/view_thread/290534 for an example of how to do this. | ||
+ | |||
+ | ===== Broader questions ===== | ||
+ | |||
+ | Some further questions you might address: | ||
+ | |||
+ | * The margins of error reported in the table are really 95% confidence levels, corresponding to two standard deviations of a Gaussian distribution. Modify your code so that the random number added to each percentage is from a Gaussian distribution with standard deviation of one-half the margin of error. Does this significantly change your results? | ||
+ | * Does doubling or halving the margins of error significantly change your results? | ||
+ | * How many elections do you need to simulate in order to get reliable answers? | ||
+ | * The lab as written assumes a two-party presidential election. Should we include third-party candidates? Why or why not? How would you revise your code to include a third party? Would it change the results significantly? | ||
+ | * We are trusting that the polling data form an accurate estimate of the actual votes cast, to within the margins of error. The data reported below were obtained from [[http://fivethirtyeight.blogs.nytimes.com/]], and are claimed by their compiler to be an unbiased and statistically reliable estimate, though there is a fair amount of controversy about this, split along party and ideological lines. Do you think the given polling data are fair and accurate? Is there a reason to suspect they are or are not? | ||
+ | * Do you believe your own election prediction? Why or why not? | ||
+ | |||
+ | |||
+ | Relevant Matlab commands: **rand**, **randn**, **sum**, **hist**, and **bar**, plus standard plotting commands such as **xlabel**, **ylabel**, **title**. | ||
+ | |||
+ | ===== Background ===== | ||
+ | |||
+ | Nate Silver, a sports statistician, pioneered the use of Monte Carlo methods | ||
+ | in election prediction during the 2008 elections ([[http://fivethirtyeight.blogs.nytimes.com/]], [[http://en.wikipedia.org/wiki/FiveThirtyEight]]). In the 2008 elections, his model predicted 49 of 50 states correctly for the Presidential race (missing Indiana, which went to Obama by 1%) and all 35 Senate races correctly. Note that this lab does not cover the subtlest and most difficult aspect of election prediction: producing good composite poll numbers and margins of error from large numbers of pollsters using different methods, sample sizes, and polling dates. There is quite a bit of controversy in the current election over Mr. Silver's methods and his assessment that Obama has a 91% chance of winning the election. See, for example, | ||
+ | |||
+ | * [[http://cosmiclog.nbcnews.com/_news/2012/10/30/14809227-political-forecasts-stir-up-a-storm?lite]] | ||
+ | * [[http://www.dailykos.com/story/2012/11/01/1153661/-Nate-Silver-s-Math-Based-Math]] | ||
+ | * [[http://2012.talkingpointsmemo.com/2012/11/nate-silver-colbert-report-pundits.php?ref=fpnewsfeed|Nate Silver on the Colbert Report]] | ||
+ | * google:"Nate Silver controversy"| | ||
+ | ===== Data ===== | ||
+ | |||
+ | Here's some current polling data, taken from [[http://fivethirtyeight.blogs.nytimes.com]] on 2012-11-01. You can load this into Matlab as a matrix ''P'' by cutting and pasting the data into a text file ''P.asc'' and running ''load P.asc'' within Matlab. If you don't believe this polling data, feel free to use something you trust more. | ||
+ | <code> | ||
+ | % Composite Presidential election polling numbers | ||
+ | % from http://fivethirtyeight.blogs.nytimes.com | ||
+ | % 2012-11-06 1am | ||
+ | % | ||
+ | % O == Obama percentage | ||
+ | % R == Romney percentage | ||
+ | % M == margin of error | ||
+ | % EV == electoral votes | ||
+ | % | ||
+ | % O R M EV state | ||
+ | 36.8 62.7 3.8 9 % AL | ||
+ | 38.8 59.7 6.0 3 % AK | ||
+ | 46.2 53.0 3.3 11 % AZ | ||
+ | 38.7 59.7 3.8 6 % AR | ||
+ | 58.2 40.5 2.9 55 % CA | ||
+ | 50.9 48.2 3.0 9 % CO | ||
+ | 56.7 42.4 3.3 7 % CT | ||
+ | 59.6 39.7 5.5 3 % DE | ||
+ | 93.1 6.3 3.2 3 % DC | ||
+ | 49.9 49.7 2.7 29 % FL | ||
+ | 45.5 54.1 2.7 16 % GA | ||
+ | 66.5 32.6 3.9 4 % HI | ||
+ | 32.2 66.1 4.4 4 % ID | ||
+ | 59.9 39.5 3.0 20 % IL | ||
+ | 45.3 53.9 3.0 11 % IN | ||
+ | 51.2 47.8 3.2 6 % IA | ||
+ | 38.0 61.0 6.1 6 % KS | ||
+ | 40.4 58.7 4.5 8 % KY | ||
+ | 39.4 59.8 3.5 8 % LA | ||
+ | 56.1 42.7 3.7 4 % ME | ||
+ | 61.0 38.0 3.0 10 % MD | ||
+ | 59.1 39.8 3.7 11 % MA | ||
+ | 53.1 45.8 2.7 16 % MI | ||
+ | 53.8 45.0 2.9 10 % MN | ||
+ | 39.4 60.1 5.3 6 % MS | ||
+ | 45.6 53.6 2.8 10 % MO | ||
+ | 45.3 53.1 3.9 3 % MT | ||
+ | 40.5 58.8 3.3 5 % NE | ||
+ | 51.9 47.2 2.9 6 % NV | ||
+ | 51.5 47.8 3.4 4 % NH | ||
+ | 55.6 43.4 3.3 14 % NJ | ||
+ | 54.2 44.6 3.6 5 % NM | ||
+ | 62.5 36.9 2.8 29 % NY | ||
+ | 48.9 50.5 2.6 15 % NC | ||
+ | 42.1 56.5 3.9 3 % ND | ||
+ | 51.4 47.6 2.7 18 % OH | ||
+ | 33.9 65.8 3.8 7 % OK | ||
+ | 53.7 44.0 3.6 7 % OR | ||
+ | 52.6 46.5 2.6 20 % PA | ||
+ | 61.9 36.3 4.3 4 % RI | ||
+ | 43.3 56.0 4.6 9 % SC | ||
+ | 42.6 56.1 4.2 3 % SD | ||
+ | 41.4 57.7 3.9 11 % TN | ||
+ | 41.3 58.1 3.1 38 % TX | ||
+ | 27.8 70.5 4.1 6 % UT | ||
+ | 66.3 32.5 4.8 3 % VT | ||
+ | 50.8 48.6 2.5 13 % VA | ||
+ | 56.2 42.5 3.5 12 % WA | ||
+ | 41.4 57.4 4.7 5 % WV | ||
+ | 52.5 46.8 2.9 10 % WI | ||
+ | 30.9 67.6 6.0 3 % WY | ||
+ | </code> |
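
The simulation described in the lab text can be sketched roughly as follows. This is one possible approach under stated assumptions, not the intended solution: it assumes ''P.asc'' holds the table above with columns ''[O R M EV]'', and it falls back to made-up test data (ten identical states, as suggested in the Tips) when that file is not present. The names ''useGaussian'', ''shiftO'', ''shiftR'', ''needed'', and the other variables are illustrative choices, not part of the assignment. It uses matrix operations instead of a for-loop over trials, and bins the totals with **accumarray** rather than **hist** so that integer totals such as 270 land in a well-defined bin.

```matlab
% Sketch only: vectorized Monte Carlo election simulation.
% Assumes P has one row per state with columns [O R M EV].
if exist('P.asc', 'file')
    P = load('P.asc');                      % text after % on each line is ignored
else
    P = repmat([52.0 47.0 1.0 10], 10, 1);  % hypothetical test data: ten identical states
end
O = P(:,1);  R = P(:,2);  M = P(:,3);  EV = P(:,4);

Nstates     = numel(EV);
Nelections  = 10000;     % use something like 100 while debugging
useGaussian = false;     % true: Gaussian shifts with sigma = margin/2

% Independent random shifts for each candidate, state, and election
if useGaussian
    scale  = repmat(M/2, 1, Nelections);
    shiftO = scale .* randn(Nstates, Nelections);
    shiftR = scale .* randn(Nstates, Nelections);
else
    scale  = repmat(M, 1, Nelections);
    shiftO = scale .* (2*rand(Nstates, Nelections) - 1);   % uniform in (-M, +M)
    shiftR = scale .* (2*rand(Nstates, Nelections) - 1);
end

% obamaWins(s,n) is true if Obama carries state s in election n.
% An exact tie within a state (probability zero) would go to Romney here.
obamaWins = (repmat(O, 1, Nelections) + shiftO) > (repmat(R, 1, Nelections) + shiftR);

% Electoral vote totals per election (1 x Nelections)
evO = EV' * double(obamaWins);
evR = sum(EV) - evO;

% Majority threshold: 270 when the electoral votes total 538
needed  = floor(sum(EV)/2) + 1;
pObama  = mean(evO >= needed);
pRomney = mean(evR >= needed);
pTie    = mean(evO == evR);

% Bins of width 10: bin k covers leftEdges(k) <= votes < leftEdges(k)+10
nbins     = 54;
leftEdges = 0:10:10*(nbins-1);
binIdx    = min(floor(evO/10) + 1, nbins);
counts    = accumarray(binIdx(:), 1, [nbins 1])';
[~, kmax] = max(counts);

fprintf('P(Obama wins)  = %.4f\n', pObama);
fprintf('P(Romney wins) = %.4f\n', pRomney);
fprintf('P(tie)         = %.4f\n', pTie);
fprintf('Most likely Obama range: %d-%d electoral votes\n', ...
        leftEdges(kmax), leftEdges(kmax) + 9);

% Histogram of Obama's totals: red bins lie below the threshold, blue at or above.
% The split is clean only when the threshold is a multiple of 10 (as with 270).
centers = leftEdges + 5;
isBlue  = leftEdges >= needed;
figure; hold on
bar(centers, counts .* ~isBlue, 1, 'FaceColor', 'r');
bar(centers, counts .* isBlue,  1, 'FaceColor', 'b');
plot([needed needed], [0 max(counts)], 'k--');   % votes needed to win outright
xlabel('Obama electoral votes');
ylabel('number of simulated elections');
title(sprintf('%d simulated elections', Nelections));
hold off
```

Setting ''useGaussian = true'' gives the Gaussian variant from the broader questions, and multiplying ''M'' by 2 or 0.5 before the simulation is one way to test sensitivity to the margins of error. Because no random seed is fixed, the estimated probabilities will vary slightly from run to run, which is itself a rough indication of whether ''Nelections'' is large enough.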