Predicting Survivor 6: To What Factors Does Survivor Success Relate?
by Jeffrey D. Sadow -- 02/10/2003
In the previous article, I explicated a theory about predicting success in Survivor. Here, using statistical methods, I test the validity of the theory and use it to build a model of how well one does in Survivor.
I will use two different techniques with two different goals in mind. First, and more ambitiously, one could try to predict the exact placement of a particular player. This is possible through a technique known as ordinary least-squares (OLS) regression. Assuming a relationship between one or more independent variables (each ideally causally unaffected by the other variables in the equation) and a dependent variable (the one variable assumed to be affected causally, directly or indirectly, by all the independent variables), OLS essentially tells us how much effect each independent variable has upon the dependent variable; whether that effect is significant (that is, the covariation between the two is real rather than chance: as values of the independent variable change, values of the dependent variable change with them); whether the independent variables taken together have a significant effect on the dependent variable; and how influential they are upon it.
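To make the mechanics concrete, here is a minimal sketch of an OLS fit with a single predictor. The data are invented for illustration (the article's actual dataset is not reproduced here), with placement running 1 (winner) to 16 (first out) and "appearance" standing in as a hypothetical 1-to-10 score:

```python
# Minimal one-predictor OLS fit on made-up data, not the article's real data.
def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)               # variation in x
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # covariation
    slope = sxy / sxx
    return my - slope * mx, slope

appearance = [2, 4, 5, 7, 8, 9]     # hypothetical appearance scores
placement  = [14, 12, 9, 7, 4, 2]   # hypothetical finishing places
b0, b1 = ols_fit(appearance, placement)
# A negative slope means a higher score predicts a better (lower) placement.
```

A full multivariate analysis with 16 predictors works the same way in principle; statistical packages also report the significance tests described above.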
There are a number of methodological assumptions that must be met, but this is an appropriate technique here: the dependent variable, placement in the game, can be measured on a scale of 1 to 16 (1 being the winner); the independent variables, even the either/or ones (such as the indicator for overt religiosity, coded as whether a person does or doesn't show it), have sufficient variation in their distributions (that is, enough different values across the contestants); and there are 80 cases (players) to use as data.
(And, after all, it's not something serious I'm doing, like I'm trying to model voting behavior, or states' willingness to go to war, or policy outcomes in government - it's just a game, people! Always remember that!)
There are two ways to utilize this technique. One is to assume that all the variables derived from the theory (remember, 16 of them) matter, even if subsequent analysis shows some or most of them do not really have a relationship with the variable under study (placement). Here one assumes that, since theory calls for these variables' presence in the equation, they minimally serve as control variables, disentangling interactive effects among two or more variables that may obscure their individual effects on the variable of interest.
But another way of looking at this is to say that if a variable (standing in for a concept) does not show a significant relationship with the variable (and its underlying concept) under study, then the theory regarding that particular relationship is wrong, and the analysis ought to be redone leaving out that variable (and any other statistically unrelated ones), keeping only those variables which do have a significant relationship with the dependent variable. This study will utilize both approaches.
Perhaps a less lofty goal would be to predict not an exact placement but a general range. Most observers find it more valuable to know whether a player makes the jury, the final four, or the final two than to know an exact placement (exact placement in 10th through 16th doesn't matter; you still didn't make the jury). A technique known as discriminant analysis fits the bill here.
It is like OLS except that it assumes the dependent variable is categorical, in the sense that cases are exhaustively assigned to mutually exclusive groups (a single case fits in one and only one category, and every case can be assigned to a category). It thus enables us to predict whether a past observation (a player's performance), given what we know about things related to that observation (the 16 independent variables), falls into the category that theory says it should ("doesn't make the jury," "makes the jury but not the finals," "final four," "final two," etc.).
While less specific than OLS because it assigns a broader category instead of an exact placement, it is also more robust in explication in some ways, such as its ability to compile a "confusion matrix" that specifies hits and misses, where fewer missed predictions signify a better model. As with OLS, we can take two approaches: retaining all theoretically relevant variables in constructing the confusion matrix, or using only those with significant relationships (which is more a matter of judgment than with OLS, given the way discriminant functions get computed).
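A confusion matrix is just a cross-tabulation of actual versus predicted groups, with correct predictions on the diagonal. A brief sketch with invented three-group labels and data:

```python
# Confusion-matrix tally for a hypothetical three-group scheme
# ("finals" / "jury" / "non-jury"); labels and data are invented.
def confusion_matrix(actual, predicted, labels):
    """Rows are actual groups, columns are predicted groups."""
    m = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m

def hit_rate(actual, predicted):
    """Fraction of cases landing on the matrix diagonal."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

actual    = ["finals", "jury", "jury", "non-jury", "non-jury", "finals"]
predicted = ["finals", "jury", "non-jury", "non-jury", "jury", "finals"]
m = confusion_matrix(actual, predicted, ["finals", "jury", "non-jury"])
rate = hit_rate(actual, predicted)   # here 4 of 6 cases are correct
```

The "percent correctly assigned" figures reported below are exactly this diagonal fraction, computed over all 80 players.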
Further, we can test multiple categorization schemes; obviously, the fewer the categories, the greater the odds of making accurate categorical predictions (but the less that tells us in terms of shades of distinguishing playing ability). I will test three different schemes.
By analyzing multiple schemes, I hope to gather some rules of thumb to help make predictions. Thus, with both the full and reduced variable sets applied to OLS and to each of the three schemes, there are eight different analyses.
Enough with the boring methodological explanation - on to the results! Since there are eight analyses, I will cull them to just three: the best three-group model, the best two-group model, and the best OLS model. The models will be rank-ordered in terms of predictive power (for anybody crazy enough to want to know how, I will be looking at the F-values and adjusted R-squared scores in OLS, and Wilks' lambda and the confusion matrix totals in the discriminant analyses). For each analysis I will present the variables considered significant (and to be considered so, they must run in the predicted direction, i.e., older age must be correlated with higher placement, not the reverse). Finally, I will conclude with some interesting implications from all this that may be used to increase the chances of correct prediction.
Generally speaking, the models without removal performed much better - and there was a lot of removal. In the OLS model, only three variables were significant - in order, appearance (personality), sex, and family structure, all in the hypothesized direction. Since the full models fit better, we may presume the removed variables remain valuable as controls. Nonetheless, the model did not explain a whole lot of variance - 26.2 percent, which in the social science business is decidedly average (better than 50 percent is considered extraordinary).
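The "variance explained" figure is adjusted R-squared, which penalizes a model for carrying extra predictors. A quick check of the standard formula, using a hypothetical raw R-squared (the article does not report one):

```python
# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1),
# for n cases and k independent variables.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With 80 players and all 16 predictors, a raw R-squared of about 0.41
# (an assumed value) would shrink to roughly the reported 26 percent.
adj = adjusted_r2(0.41, 80, 16)
```

This illustrates why a 16-variable model on 80 cases pays a real penalty: sixteen of the seventy-nine degrees of freedom go to the predictors themselves.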
Discriminant analysis also came up with some decent but unspectacular models. In the 4JN removed model (three groups), the confusion matrix came in at 56.3% correctly assigned, while in the FN total model (two groups), the figure was 86.3%. The two most prominent variables in both cases were appearance and age (in the former, the only two significant variables).
(Ironically, the one thing the contestants or CBS can alter in its presentation out of all the data gathered is the strongest predictor. By publishing this, I may have destroyed the usefulness of these models, for in the future the web site photos may be chosen with this article in mind. Perhaps I credit myself with too much influence, but the way Mark Burnett likes to keep people guessing...)
Viewing the accuracy of the predictions (exact placement for OLS, correct category for each discriminant analysis) reveals some interesting information. The three-group analysis showed excellent accuracy in predicting the final four, placing 60 percent correctly; widening to the top five, that rose to 90 percent. In fact, its overall rate was so low only because of some mixing between the other two categories (more of the jurors placed into the non-juror category than the opposite). This major fault of the model gets accentuated when looking at the two-group model, where only one prediction for the top four comes from outside the actual top five (Linda from S3), and within those top five, 60 percent of the predictions are accurate as to who makes the top four. Three of the five winners (Ethan and Brian missing) were picked for the final four by the three-group method, while the two-group method missed only Vecepia. Since different numbers of variables are involved in each, results will differ somewhat.
Another way of assessing the three-group model's performance is to look at how badly the miscategorizations missed. With 80 cases, taking the absolute value of the difference between actual and predicted group (the finals coded as value 1, the rest of the jury as value 2, and the non-jurors as value 3), the maximum possible mismatch sum would be 120, the minimum (everybody perfectly placed) 0. The 35 errors summed to 45 - 37.5 percent of the maximum, closer to the accurate end than not (and only 10 of the errors were finals-in-non-jury or non-jury-in-finals, meaning most misses were off by just one group). All in all, this is a decent model.
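The arithmetic behind that error-severity tally can be reproduced directly from the figures in the text: 10 misses off by two groups and the remaining 25 off by one group give the reported sum of 45.

```python
# Error-severity metric from the text: groups coded 1 (finals),
# 2 (rest of jury), 3 (non-jury); score is sum of |actual - predicted|.
def error_sum(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted))

# Reconstructing the article's tally: 10 two-group misses, 25 one-group misses.
errors = [2] * 10 + [1] * 25
total = sum(errors)            # 45, as reported
share = total / 120            # 0.375 of the worst possible score
```

Correctly placed cases contribute zero, so only the 35 errors matter to the sum.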
In OLS, we can study each case to see how far it deviated from its prediction and whether it really falls outside the realm of chance - the fewer such cases, the better the model. Only one case was a real outlier - Debb from S2 - whom the model predicted would finish in a jury slot when in fact she was the first evictee. On the other hand, winner Rich from S1 had the third-highest predicted score, behind Rodger in S2 (5th) and Clay in S5 (2nd). The worst score belonged to Robb of S5, who actually finished four spots higher than predicted.
(The OLS model, for methodological reasons I won't go into, predictably attenuated scores at each end. Clay's prediction was for 2.935th "place" and Robb's 14.739th "place.")
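The outlier check described above amounts to screening standardized residuals: compare each player's actual placement to the model's prediction and flag cases more than about two standard deviations off. A sketch with invented placements (not the article's data):

```python
# Residual screening on made-up placements and model predictions.
def standardized_residuals(actual, predicted):
    """Residuals (actual minus predicted) rescaled by their std deviation."""
    res = [a - p for a, p in zip(actual, predicted)]
    n = len(res)
    mean = sum(res) / n
    sd = (sum((r - mean) ** 2 for r in res) / (n - 1)) ** 0.5
    return [(r - mean) / sd for r in res]

actual    = [1, 3, 5, 7, 16, 11, 13]   # hypothetical finishes
predicted = [2, 4, 5, 8, 9, 12, 13]    # hypothetical model output
flags = [abs(z) > 2 for z in standardized_residuals(actual, predicted)]
# Only the fifth case (predicted 9th, finished 16th) is flagged.
```

A Debb-style outlier is exactly this kind of case: one prediction far enough from its actual value to stand out against the spread of all the residuals.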
These results give me a good deal of confidence in these models. In the next article, I will apply the models to the characteristics of the cast of S6 and predict how this one turns out.
Jeffrey D. Sadow is an associate professor of political science at Louisiana State University in Shreveport where he teaches, among other things, classes in international politics, international organizations, and diplomatic history. He has published in the area of gaming simulations in international politics.