Final rules

GOOD LUCK!

Name:

Reminder

You can write math in RStudio by enclosing it between dollar signs, like so: $\beta_1$, which renders as \(\beta_1\). Refer to the explainer “How to write math in RStudio” posted at the top of the GauchoSpace page for more information.
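Below is a minimal illustration of the syntax. It assumes the standard R Markdown (pandoc) convention: single dollar signs for inline math and double dollar signs for math displayed on its own line. The symbols are only examples, and the GauchoSpace explainer remains the reference for this class.

```latex
The slope coefficient is $\beta_1$ and its estimate is $\hat{\beta}_1$.

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
```

The first line shows inline math. The second line is wrapped in double dollar signs, so it is displayed on its own line. Under pandoc's rules, the opening $ must not be followed by a space, and the closing $ must not be preceded by one.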


Multiple choice questions (4 points total)

Simply write your chosen answer (A, B, C, or D). No explanation needed.

1. (1 point) Endogeneity occurs when:

  • A. You have a confounder
  • B. Your independent variable is correlated with the error term
  • C. Your dependent variable is correlated with the error term
  • D. A and B

Your answer

2. (1 point) An independent variable is endogenous if:

  • A. It is correlated with another independent variable in the model
  • B. It is correlated with the error term
  • C. It is not correlated with the error term
  • D. It is correlated with Y

Your answer

3. (1 point) Which of the following is used to evaluate the goodness of fit of a model?

  • A. The residual \(\hat{\epsilon}_i\)
  • B. The standard error of the coefficient \(SE(\hat{\beta})\)
  • C. The statistic \(R^2\)
  • D. The intercept \(\beta_0\)

Your answer

4. (1 point) What allows us to claim that, for large samples, the sample average of any random variable will approximately follow a normal distribution?

  • A. The Law of Large Numbers
  • B. The Central Limit Theorem
  • C. Randomization
  • D. The fact that the underlying random variable follows a normal distribution

Your answer


Fill in the blank (4 points total)

Write down the appropriate technical term (1 or 2 words) for your answer. No explanation needed.

5. (1 point) A researcher is interested in understanding whether wealth shapes political attitudes towards redistribution. In an observational study, she collects data on the political attitudes of lottery winners and of people nearby who did not win the lottery, and finds that those who win the lottery are more hostile towards estate taxes and government redistribution. Since lottery winnings are randomly assigned, she argues that this is a plausibly exogenous independent variable. What type of research design would you call this?

Your answer

6. (1 point) A researcher wants to analyze the effect of a regular exercise schedule on health outcomes. He randomly allocates vouchers for private training sessions at the gym: those who receive the voucher are in the treatment group, those who do not are in the control group. In the first year of the experiment, 70% of those who were given free vouchers for the gym used them. In the second year, only 60% of those offered the free vouchers used them. What is the name for this problem?

Your answer

7. (1 point) You decide to run a Randomized Controlled Trial to understand the effect of a treatment \(X\) on some outcome \(Y\). You randomly assign participants either to a treatment group, which receives the treatment, or to a control group, which does not. Next, you run a bivariate regression of \(Y\) on \(X\), where \(X\) is a dummy variable for treatment. In this experimental setting, what is another name for the slope coefficient \(\beta_1\) on \(X\)?

Your answer

8. (1 point) A researcher has some strongly held pre-existing ideas about the relationship between gun control policy and gun deaths. He runs a bunch of models in order to find one with significant results that fits his beliefs, and only reports that model. What is this researcher engaging in?

Your answer


Short answers (4 points total)

Explain your answer in two or three sentences (no more).

9. (2 points) You toss a fair coin (i.e., one that lands heads with probability 1/2) 1000 times and record the outcome of each toss. If you were to graph a histogram or density plot of all the outcomes, would you roughly see a normal distribution? Explain why or why not.

Your answer below

10. (2 points) What are the consequences of including irrelevant variables in your model?

Your answer below


Data analysis (24 points total)

The United Nations Sustainable Development Goals (SDGs) are the blueprint to achieve a better world for all. Goal 5 of the SDGs is “Achieve gender equality and empower all women and girls”.

The demand for female labor increased substantially during the Second World War, as men were away at war. Female labor force participation continued to rise throughout the second half of the 20th century.

What are the determinants of female labor force participation? We will investigate this question using data from the OECD and the World Bank.

The data-set gender.RData contains the following variables for a sample of OECD countries from 1997 to 2016.

Variable Name      Description
Country            Country name
Year               Year of the observation
LaborParticip      Female labor participation rate (% of females)
SeatsParliament    Female share of seats in national parliaments (%)
FertilityRate      Fertility rate (births per woman)
PTERateFem         Share of females employed in part-time employment (% of females)
ParentalLeave      Length of parental leave with job protection (in weeks)

11. (1 point) Set your working directory and load the data-set gender.RData.

# Your code here

12. (2 points total) Let’s start by examining the data-set and looking at some summary statistics. What is your sample size? (0.4 points) What is the mean value of LaborParticip? (0.4 points) What is the standard deviation of this variable? (0.4 points) What are the minimum and maximum values of female labor force participation? (0.4 points) What type of data-set are we working with here? (Hint: Lecture 8) (0.4 points)

# Your code here

Your answer below

13. (2 points total) Create a density plot to display the distribution of the variable LaborParticip. (0.5 points) Add the mean and the median of this variable to your density plot; you can answer these first two sub-questions on the same graph. (1 point) How would you characterize this distribution? (0.5 points)

# Your code here

Your answer below

14. (2 points total) We might posit that countries with greater gender equity also have higher female labor force participation. Let’s examine whether female representation in the legislature predicts female labor force participation. What is the correlation between those two variables? (1 point) Create a scatter plot showing the relationship between those two variables, using the IV/DV set-up implied in this question. (1 point)

# 14a. Your code here

Your answer below

# 14b. Your code here

15. (1 point total) Let’s examine this formally. You want to know whether female representation in the legislature predicts female labor force participation, controlling for the fertility rate and the share of female part-time employment.

Given this set-up, what is your dependent variable? (0.4 points) What is your main independent variable? (0.4 points) What are your control variables? (0.2 points)

Your answers below

16. (1 point total) Write down the multivariate regression model that we want to estimate (as implied by the question above).

Your answer below

17. (2 points total) State your null hypothesis for the primary explanatory variable of interest, in words and in math. (1 point) State the corresponding two-sided alternative hypothesis, in words and in math. (1 point)

Your answers below

18. (4 points total) Run a regression of female labor force participation on female parliamentary representation, controlling for the fertility rate and the share of female part-time employment. Display the output of this regression. (1 point) Interpret each of the coefficients (except for the intercept) and comment on their statistical significance. (3 points)

# Your code here

Your answers below

19. (1 point) Interpret your findings in terms of the hypotheses stated in Q17. Do you find evidence for or against the null hypothesis?

Your answer below

20. (4 points total) A critic points out that the fertility rate is not the only determinant of female labor force participation, and that family-supportive workplace policies could reduce the burden of childcare on women. To evaluate this criticism, add parental leave as an additional control variable to your model from Q18. Show the output of this regression. (1 point) Substantively interpret the coefficient on the parental leave variable and its p-value. (1 point)

Now look at the coefficient on the fertility rate in this model and in the previous one from Q18. Comment on the magnitude of that coefficient. (1 point) Is the result from the previous model robust to the addition of the new control? (1 point)

# Your code here

Your answers below

21. (4 points total) Can we say that we have identified a causal relationship between female parliamentary representation and female labor force participation? Why or why not?

Your answer below

Extra credit. (2 points) What is the most interesting and useful idea related to research methods that you learned in this class and will remember in the future?

Your answer below