Predicting Survivor 6: To What Factors Does Survivor Success Relate?
by Jeffrey D. Sadow -- 02/10/2003
In the previous article, I laid out a theory about predicting success in Survivor. Here, using statistical methods, I test the validity of that theory and use it to build a model of how well a player does in Survivor. I will use two different techniques with two different goals in mind.

First, and more ambitiously, one could try to predict the exact placement of a particular player. This is possible through a technique known as ordinary least-squares regression (OLS). Assuming a relationship between one or more independent variables (ideally causally unaffected by all other relevant variables in the equation set) and a dependent variable (the one variable assumed to be affected causally, directly or indirectly, by all the independent variables), OLS essentially tells us four things: how much effect each independent variable has upon the dependent variable; whether that effect is significant (that is, whether it is unlikely to have occurred by chance and instead reflects real covariation between the two, so that as the values of one change, the values of the other change in step); whether all the independent variables together have a significant effect on the dependent variable; and how influential they are on it.

While there are a number of methodological assumptions that must be met, several facts make this an appropriate technique: we can measure the dependent variable, placement in the game, on a scale of 1 to 16 (1 being the winner); the independent variables, even those that are either/or (such as the indicator of overt religiosity, coded by whether a person shows it or doesn't), have sufficient variation in their distributions (that is, enough different values across the contestants); and there are 80 cases (players) to use as data.
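To make the mechanics concrete, here is a minimal sketch of an OLS fit in plain Python. It is not the model used in this study: the function name `fit_ols`, the two predictors (an age and a 0/1 either/or indicator), and the handful of data rows are all hypothetical, made up purely for illustration. The sketch returns only the coefficients and R-squared (one common measure of how influential the predictors are together); significance tests are left out.

```python
def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    rows: one list of predictor values per case; y: one outcome per case.
    Returns (coefficients with the intercept first, R-squared).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    coefs = [A[i][k] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(coefs, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot


# Hypothetical toy data, NOT real contestants: [age, either/or indicator].
toy_predictors = [[22, 0], [35, 1], [41, 0], [28, 1], [53, 0], [30, 0]]
toy_placement = [12, 5, 9, 3, 15, 8]  # 1 = winner, 16 = first out

coefs, r_squared = fit_ols(toy_predictors, toy_placement)
print("intercept and slopes:", coefs)
print("R-squared:", r_squared)
```

Each slope is read as the change in predicted placement for a one-unit change in that predictor, holding the other constant; with only six invented cases the numbers themselves mean nothing.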
(And, after all, it's not as if I'm doing something serious, like trying to model voting behavior, or states' willingness to go to war, or policy outcomes in government. It's just a game, people! Always remember that!)

There are two ways to utilize this technique. One is to assume that all variables derived from the theory (remember, 16 of them) matter, even if subsequent analysis shows that some or most of them do not really have a relationship with the variable under study (placement). Here, one assumes that since theory calls for these variables' presence in the equation, they minimally serve as control variables, disentangling interactive effects among two or more variables that might otherwise obscure each variable's individual effect on the variable of interest. The other way is to say that if a variable (standing in for a concept) does not show a significant relationship with the variable (and its underlying concept) under study, then the theory regarding that particular relationship is wrong, and a reanalysis ought to be done that leaves out that variable (and any other statistically unrelated ones), keeping only those variables that do have a significant relationship with the dependent variable. This study will utilize both approaches.

Perhaps a less lofty goal would be to predict not an exact placement but a general range. After all, most observers find it more valuable to know whether a player makes the jury, or the final four, or the final two, than to know an exact placement (exact placement in 10th through 16th place doesn't matter; you still didn't make the jury, regardless). A technique known as discriminant analysis can fit the bill here.
It is like OLS except that it assumes the dependent variable is categorical, in the sense that cases are assigned exhaustively to mutually exclusive groups (that is, a single case can fit in one and only one category, and all cases can be assigned to a category). So it enables us to check whether a past observation (a player's performance), given what we know about things related to that observation (the 16 independent variables), falls into the category that theory says it should ("doesn't make the jury," "makes the jury but not finals," "final four," "final two," etc.). While less specific than OLS, because it assigns to a broader category instead of an exact placement, it is also more robust in explication in some ways, such as in its ability to compile a "confusion matrix" that tallies hits and misses, where fewer missed predictions signify a better model.

As with OLS, we can take two approaches: retaining all theoretically relevant variables in constructing the confusion matrix, or using only those that have significant relationships (which is more a matter of judgment than with OLS, given the way in which discriminant analysis gets computed). Further, we can test multiple categorization schemes where, obviously, the fewer the categories tested for, the greater the odds of making accurate categorical predictions (but the less that tells us about finer distinctions in playing ability). I will test three different schemes: