Please submit your problem set via GauchoSpace. Submit both the .Rmd file and the HTML file it creates. This assignment is due by 11:55pm on Monday, July 20th. No late problem sets will be accepted. Please write your name below, and list any students you collaborated with.
a. (1 point) This dataset also tells us whether each state has a ban on texting while driving. Look at the variable text_ban. What values can it take? What are the mean and standard deviation of this variable? Given the mean, would you say that most states do or do not have bans on texting?
Extra credit (0.5 points) What do we call variables like this?
b. (1 point) As a first analysis, you want to know whether the mean number of deaths differs between states with and without bans. Get the difference in means, i.e. the mean number of deaths in states with a text ban minus the mean number of deaths in states without a text ban. Hint: You need to subset your data using “[ ]” or the subset command.
c. (1 point) Get this difference in means again, but by using regression. (0.4 points) (You should get the same result – if you didn’t, you made a mistake somewhere.) Interpret the coefficient estimates and their p-values (both for the intercept and the slope coefficient). (0.6 points)
d. (1 point) Can you conclude from this model result whether bans work? Explain why or why not, following the rules we have discussed in class for what such an argument requires.
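The hint in part b and the claim in part c (that regression reproduces the difference in means) can be illustrated on made-up data. The sketch below is only an illustration: the data frame toy and its columns outcome and treated are hypothetical and are not part of the assignment's dataset.

```r
# Toy data with hypothetical column names; NOT the assignment's dataset.
toy <- data.frame(
  outcome = c(10, 12, 14, 20, 22, 24),
  treated = c(1, 1, 1, 0, 0, 0)
)

# Option 1: logical indexing with [ ]
mean_treated   <- mean(toy$outcome[toy$treated == 1])
mean_untreated <- mean(toy$outcome[toy$treated == 0])
diff_brackets  <- mean_treated - mean_untreated

# Option 2: subset() returns the matching rows as a data frame
diff_subset <- mean(subset(toy, treated == 1)$outcome) -
  mean(subset(toy, treated == 0)$outcome)

# Option 3: regress the outcome on the 0/1 variable.
# The slope is the difference in group means; the intercept is the
# mean of the group coded 0.
fit <- lm(outcome ~ treated, data = toy)
diff_lm <- unname(coef(fit)["treated"])

print(c(brackets = diff_brackets, subset = diff_subset, lm = diff_lm))
```

All three numbers should agree. If they do not agree on your own data, check for missing values or a grouping variable that is not coded 0/1.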
a. (0.2 points) Sometimes our variables can be reconceptualized in ways that are more sensible. Let’s make a new variable that tells us the number of cell subscriptions per person. Rather than trying to control for population by including it as a predictor, construct the variable cellperpop equal to the number of cell phone subscriptions divided by the population times one thousand. (We want to multiply the population by a thousand because the cell subscription variable happened to be reported as subscriptions per thousand people). Check that it has a mean of almost 1 (about 0.94) to make sure you made it correctly.
b. (0.2 points) Run a regression of numberofdeaths on text_ban, this time controlling for cell subscriptions per person (cellperpop), and show the regression table using summary.
c. (0.3 points) The actual mean of cellperpop is about 0.94. What is the number of deaths that the model predicts for a hypothetical state that has the average number of cell subscriptions per person (cellperpop=0.94) and has imposed a text ban?
d. (0.3 points) What is the number of deaths that the model predicts for a hypothetical state that has the average number of cell subscriptions per person (cellperpop=0.94) and has not imposed a text ban?
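Predictions for hypothetical cases, as in parts c and d, can be computed either by plugging values into the estimated equation by hand or with predict() and a newdata data frame. The sketch below uses made-up data; the names toy, y, d, and x are hypothetical stand-ins and do not come from the assignment's dataset.

```r
# Toy data with hypothetical names; NOT the assignment's dataset.
toy <- data.frame(
  y = c(5, 8, 9, 8, 10, 13),   # outcome
  d = c(0, 0, 0, 1, 1, 1),     # 0/1 indicator
  x = c(1, 2, 3, 1, 2, 3)      # continuous control
)

fit <- lm(y ~ d + x, data = toy)

# Two hypothetical cases: indicator on and off, control held at its mean.
new_cases <- data.frame(d = c(1, 0), x = mean(toy$x))

# Route 1: let predict() apply the fitted equation.
pred <- unname(predict(fit, newdata = new_cases))

# Route 2: plug into intercept + b_d * d + b_x * x by hand.
b <- unname(coef(fit))   # order: intercept, d, x
by_hand <- b[1] + b[2] * new_cases$d + b[3] * new_cases$x

print(cbind(new_cases, pred, by_hand))
```

Because the control is held fixed across the two rows, the gap between the two predictions equals the coefficient on the indicator.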
Extra credit (0.5 points) Predict which states will impose a ban on texting by regressing a dummy for text_ban on other variables. In this case, what is your dependent variable? Choose whatever variables you would like (except for cell_ban) to predict your outcome variable.
Extra credit (0.5 points) Run your regression model, show the summary, and interpret each of the coefficients correctly. Use the \(R^2\) statistic reported by the model (the “multiple R-squared” is fine) to describe how well the model works in terms of the variance of \(Y\) that is “explained” by the model.
Extra credit (0.5 points) Get the predicted values (probabilities) from your model for each observation. There are multiple ways of doing this, and you can use any of them you would like (google to figure out the coding options). Choose one variable that you had included in your model and use it to create a scatter plot that has that variable for the horizontal axis and the predicted probability of text_ban=1 for the vertical axis.
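One possible way to get predicted probabilities and plot them is sketched below on simulated data. It assumes a linear model fit with lm on a 0/1 outcome, which matches the R-squared the previous question asks for; other approaches are equally acceptable. The names toy, ban, x1, and x2 are hypothetical and do not come from the assignment's dataset.

```r
# Simulated toy data with hypothetical names; NOT the assignment's dataset.
set.seed(1)
toy <- data.frame(x1 = runif(50), x2 = rnorm(50))
toy$ban <- as.numeric(toy$x1 + 0.3 * rnorm(50) > 0.5)  # 0/1 outcome

# Linear regression with a dummy as the dependent variable.
lpm <- lm(ban ~ x1 + x2, data = toy)

# fitted() returns one predicted value per observation used in the fit;
# predict(lpm) with no newdata gives the same numbers.
toy$p_hat <- fitted(lpm)

# Scatter plot: one predictor on the horizontal axis,
# predicted probability on the vertical axis.
plot(toy$x1, toy$p_hat,
     xlab = "x1",
     ylab = "Predicted probability of ban = 1")
```

With a linear model, some predicted values may fall below 0 or above 1, which is worth keeping in mind when reading them as probabilities.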