Welcome to the first problem set. There is not a great deal of material in this problem set, but since this is your first time using R and R Markdown, you will want to leave yourself plenty of time to complete it. The goal is to get you started with coding right away. You are not expected to be able to sit down and write the answers straight away. Rather, you have several days to figure it out and to work on your answers.
If you are looking at the HTML version of the problem set (pset1.html) that may have opened in your web browser, you are seeing the output produced by running the script included in the file pset1.Rmd, also available on the course website. Go ahead and open the file called pset1.Rmd. Once you open pset1.Rmd, you can continue reading the text easily in that file.
It will be easiest for you to open the .Rmd file posted for each problem set and write your solutions directly in it, learning from the code you see in the questions.
This is an R Markdown document. Markdown is a simple formatting syntax to create documents that include text, math equations, code, and the output from executing that code.
For more details on using R Markdown, see https://rmarkdown.rstudio.com/lesson-1.html.
When you click the Knit to HTML button in RStudio, an HTML document will be generated from the .Rmd file. This document will include your answers (in text), but also the code you write and the results from running that code (such as tables or figures).
Code should be written inside a code chunk. Below is what a code chunk looks like.
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
Text should be written outside of a code chunk, like so.
Please submit your problem set via GauchoSpace. Submit both the .Rmd file and the HTML file it creates. This assignment is due by 11:55pm on Friday, June 26th. No late problem sets will be accepted. Please write your name below, and list any students you collaborated with.
Here is an example problem, with an example solution.
In this question, we will provide the answer for you, as an example. You need to be looking at the .Rmd file right now for this to make sense.
Execute the code in the chunk below by running the command getwd(). (Ask your TA if you don’t know how to run a command; this is essential.) Describe what this command does.
getwd()
## [1] "C:/Users/Alice/Box Sync/PhD/Teaching/POLS 15 Introduction to Research Methods_Summer 2020/Problem Sets/Problem Set 1"
This command, when executed, tells the user which directory is set as the working directory. The working directory is the directory where R will look for files (such as a dataset that you want to analyze), and where it will save the output (such as the HTML file that is created when you click on Knit).
Before you write any code, you need to set your working directory so that R knows where to pull the data from.
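As a minimal sketch of how the working directory can be inspected and changed, the chunk below uses R's temporary directory (`tempdir()`) as a stand-in for your own problem set folder. That stand-in is an assumption made only so the example runs on any machine; it is not where the course files live.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Switch to a new working directory. tempdir() is only a stand-in here;
# in practice you would pass the path to your own problem set folder.
setwd(tempdir())

# Confirm the change: getwd() now reports the temporary directory.
print(getwd())

# List the files R can see in the current working directory.
print(list.files())

# Restore the original working directory.
setwd(old_wd)
```

When knitting, chunks are by default evaluated relative to the folder that contains the .Rmd file, so keeping the data file next to the .Rmd file is usually the simplest arrangement.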
Now, you should knit your .Rmd file by clicking Knit. Take a look at the HTML file that it created and see what you get.
Remember to knit often to make sure that your R Markdown file can compile.
Okay, your turn to answer the remaining questions!
Start by reading Chapter 1 of the textbook, Real Stats. You can also review your lecture notes.
A researcher observes that more educated people vote at a higher rate. He decides to publish a research article that says completing a bachelor’s degree causes people to participate in elections at a higher rate. Would you like to be a co-author on this paper? Why or why not? (100 words max)
Your answer.
Explain what this sentence means: “Experiments create exogeneity via randomization.” (75 words max)
Your answer.
What are some problems with experiments, particularly in a social science discipline such as political science? (100 words max)
Your answer.
You decide to run an experiment to see whether going to lectures helps students learn. You randomly assign half of the class to go to the lecture and section for the course, and to read the textbook. For the other half of the class, you just assign the students to read the textbook. At the end of the quarter, you give the entire class a test. You find that the students in the first group did much better than those in the second group, who only read the textbook.
4a (0.7 points) What could you call each group?
Your answer.
4b (0.7 points) What is your independent variable and what is your dependent variable?
Your answer.
4c (0.7 points) Given the setup of this experiment, list some factors that you are controlling for.
Your answer.
4d (0.7 points) Can you say that attending lectures caused the students to do better on the test? Why or why not? Explain using the technical terms in the textbook.
Your answer.
4e (0.7 points) Can you say that this finding would also apply in courses with online lectures? Why or why not? Explain using the technical terms in the textbook.
Your answer.
Imagine that you are looking at the relationship between level of education and salary. List some of the factors that could lead to endogeneity. (50 words max)
Your answer.
When James Carville was crafting a simple catchphrase to summarize then-presidential candidate Bill Clinton’s electoral message, he hung a sign up at campaign headquarters that read “The Economy, stupid”. The phrase has since morphed into “It’s the economy, stupid” and it still reflects the core message that the economy decides elections. When times are good, voters want more of the same; when times are bad, they want a fresh face. If we look at presidential elections from the last 70 years, do the data support this claim?
The Presidential Voteshare database from 1948–2012 offers a chance to evaluate this hypothesis.
Download the dataset, presvote.Rdata, which you’ll find on the course website. You may want to put it in your working directory to make it easy to find (use getwd() to see what your current working directory is; you can use the setwd() command to change your working directory).
Here is a brief description of the variables:
You will need to use the $ operator to access these variables within the presdata dataset. Specifically, once you have loaded presvote.RData, the result will be available in your environment as the data object presdata. To get at the variable vote, for example, you would use presdata$vote. Remember, the end of each chapter in the textbook includes R code that can be helpful. We have also posted R resources on GauchoSpace.
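To illustrate the `$` operator without touching the problem set data, here is a small sketch on the built-in `cars` data frame, the same one summarized in the example chunk near the top of this document. The column names `speed` and `dist` belong to `cars`, not to `presdata`.

```r
# cars is a data frame that ships with R; it has columns speed and dist.
speeds <- cars$speed

# The result is an ordinary numeric vector with one value per row of cars.
print(head(speeds))
print(length(speeds) == nrow(cars))

# Functions can be applied directly to a column selected with $.
print(mean(cars$speed))
```

The same pattern applies to any data frame: write the name of the data object, then `$`, then the name of the variable.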
Load the data into R. The data are stored as an RData file, so you can use the load() function to load it.
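The behaviour of `load()` can be sketched with a throwaway file. The object name `toy_data`, its made-up contents, and the temporary file are all hypothetical and exist only for this illustration; they are not part of the problem set.

```r
# Create a small made-up data frame and save it to a temporary .RData file.
toy_data <- data.frame(id = 1:3, label = c("a", "b", "c"))
toy_file <- tempfile(fileext = ".RData")
save(toy_data, file = toy_file)

# Remove the object so we can see load() bring it back.
rm(toy_data)

# load() restores objects under the names they were saved with and
# invisibly returns those names as a character vector.
loaded_names <- load(toy_file)
print(loaded_names)

# The restored object is available in the environment again.
print(toy_data)

# Clean up the temporary file.
unlink(toy_file)
```

Because `load()` restores objects under their saved names, the problem set data appear as `presdata` rather than under the name of the file.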
# Your code
Using code, check the dimensions of the data (i.e. the number of rows and columns). How many observations are there? How many variables are there?
# Your code
Your answer.
What is the range of years covered in this data set?
# Your code
Your answer.
Calculate the average change in real disposable income across all observations in the sample. Do you think this is a large or a small average?
# Your code
Your answer.
What are the minimum and maximum changes in real disposable income?
# Your code
Your answer.
Calculate the average two-party vote share of the incumbent. What does this tell you about the power of incumbency?
# Your code
Your answer.