Garret Christensen, BITSS Project Scientist
For whatever reason, economists use a lot of Stata. It does what we want to do (data cleaning, regression analysis, data visualization) well, and the $1,000 fee we pay every other version or so doesn’t seem to have stopped its widespread adoption. But is that changing, and are people switching to R and/or Python? I have no good data on the question, but I see a few reasons to make the switch. First, it seems obvious that if you’re teaching students who won’t be going to grad school in economics, requiring them to use Stata instead of R is pointlessly harmful to their job prospects in the tech sector.
Second, the dynamic-document and reproducible-workflow capabilities of R Markdown and knitr, along with version control, are all built right into RStudio, and they blow anything I’ve seen in Stata out of the water. (Link to what I’ve seen in Stata.)
Third, it’s nice to be able to have more than one dataset in memory at the same time. And a lot of other things like that.
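As a minimal sketch of the multiple-datasets point, the base R snippet below keeps two data frames in memory at once and then joins them. The data frames `gdp` and `housing`, their column names, and all of their values are made up for illustration, and the Stata command in the comment is only a rough analog.

```r
# Two hypothetical datasets held in memory at the same time (values are made up).
gdp <- data.frame(
  year       = c(2001, 2002, 2003),
  gdp_growth = c(1.0, 1.7, 2.9)
)
housing <- data.frame(
  year   = c(2001, 2002, 2003),
  starts = c(1600, 1700, 1850)
)

# Both objects coexist, so either can be summarized without unloading the other.
mean(gdp$gdp_growth)
mean(housing$starts)

# Join on the shared key, similar in spirit to Stata's `merge 1:1 year using ...`.
combined <- merge(gdp, housing, by = "year")
print(combined)
```

In Stata the second dataset would typically sit on disk (or in a separate frame in recent versions) until merged; here both are ordinary objects in the workspace.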
With that in mind, I’m looking around for resources designed to help Stata users learn R. If I get time I’d like to develop a Software Carpentry-style lesson for it. The resources I’ve found so far are:
- Bob Muenchen’s website and book: r4stats.com
- Mostly Harmless Econometrics in four languages: R, Stata, Julia, Python
- A one-page guide to dplyr tools and their Stata analogs.
Does anyone else have other resources? Experience or suggestions on making (and/or teaching) the switch?
For me, writing the code to run a regression or two in R is like speaking Chinese, and doing it in Stata is like speaking German: the second is difficult and the first is really, really difficult. I prefer just to tell Alexa what I want, and I look forward to the time when she will understand my command to “run a regression of US GDP growth on the lagged and twice-lagged growth of US housing starts!” The future is normal language with well-designed interaction with our AI assistants. R and Chinese are for those who already speak those languages, not for me. EViews is closer to Alexa but not very close. Let’s all move into the Future together!
Ed Leamer
Thoughts:
1. Use RStudio as your IDE
2. The cheat sheets at https://www.rstudio.com/resources/cheatsheets/ are awesome
3. Use Hadley Wickham’s “tidyverse”, especially tidyr and dplyr, for data cleaning. Embrace the %>% (“magrittr”) operator. Some R purists think using the tidyverse makes you weak. These people are wrong. You would probably not go far wrong to simply ignore every tutorial that does not use the tidyverse constituent packages.
4. The canonical reference is http://r4ds.had.co.nz
5. vignette("tidy-data") and demo(package = "tidyr") are nice intros
6. https://rpubs.com/bradleyboehmke/data_wrangling is a nice overview, but I also like your suggestion of http://johnricco.github.io/2016/06/14/stata-dplyr/
7. Now that they are available, I think more people should be using R notebooks
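To make point 3 concrete, here is a minimal sketch of a dplyr pipeline built with the %>% operator, with rough Stata equivalents in the comments. The `workers` data frame and its values are made up, and the Stata lines are an approximate analog rather than an exact translation.

```r
library(dplyr)

# Hypothetical data: wages for a few workers (values are made up).
workers <- data.frame(
  id     = 1:6,
  sector = c("tech", "tech", "retail", "retail", "tech", "retail"),
  wage   = c(40, 55, 15, 18, NA, 22)
)

# Rough Stata analog:
#   drop if missing(wage)
#   gen log_wage = log(wage)
#   collapse (mean) mean_wage = wage mean_log_wage = log_wage ///
#            (count) n = wage, by(sector)
#   gsort -mean_wage
by_sector <- workers %>%
  filter(!is.na(wage)) %>%              # drop rows with missing wages
  mutate(log_wage = log(wage)) %>%      # add a derived column
  group_by(sector) %>%                  # group for the summary
  summarise(
    mean_wage     = mean(wage),
    mean_log_wage = mean(log_wage),
    n             = n()
  ) %>%
  arrange(desc(mean_wage))              # highest mean wage first

print(by_sector)
```

Each step takes the data frame produced by the previous one, so the pipeline reads top to bottom much like a short do-file, and the original `workers` object is left unchanged.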