Introduction

Objective

This study guide is a resource for graduate students, PhD candidates, and early career researchers performing applied empirical research in economics and management sciences. It focuses on the analysis of health care markets using secondary data. Many textbook examples rely on readily available datasets to illustrate econometric problems. Students who develop their own research question and build their own analysis dataset often find that these examples omit the important steps leading to a final analysis dataset. Additionally, many resources focus on labor economics problems, and resources that showcase how to process and generate secondary data are scarce. One reason is that data sources used in health care applications are often subject to confidentiality and data protection restrictions.

This guide explains the five essential steps needed to create a reproducible research project. We introduce important terminology, highlight relevant tasks, and provide key resources in the form of textbooks and websites available via open access. We provide a concise guide that users can easily access when starting academic research. Each section takes about 10 to 15 minutes to read. We do not cover any specific data science or econometric method, but point to the relevant resources.

To use this guide efficiently, readers should have basic knowledge of statistics, econometrics, and program evaluation methods. They should also be familiar with at least one programming language and one major statistical package such as R or Stata. For maximum benefit, readers should have background knowledge and a research idea for their own reproducible project in mind.

Learning objectives

The goal is to set up and carry out a data science project using secondary data. Students will learn all steps, from hypothesis formulation through data generation and analysis to the presentation of empirical results.

After reading and applying the principles introduced in this study guide, you will be able to:

  1. Recognize the features of using secondary (health care) data in empirical research.
  2. Execute the steps of a reproducible research project.
  3. Implement an empirical research project.
  4. Recall the steps taken to execute a reproducible research project using secondary data.

Structure of the study guide

The study guide consists of five chapters that cover the essential steps of a reproducible research project. Each step is covered in four parts:

  1. An introduction to the basic concepts and key terminology.
  2. A resources box that includes textbooks, articles and references to current web resources with emphasis on open access material.
  3. A checklist for each step of the reproducible research project to follow.
  4. A showcase example of an empirical project, a replication based on the article by Hellerstein, Judith K. 1998. “The Importance of the Physician in the Generic versus Trade-Name Prescription Decision.” The RAND Journal of Economics 29 (1): 108–36. https://doi.org/10.2307/2555818.

This is a living document

How can you contribute to this study guide? Best practices for reproducible research are constantly evolving, and we aim to keep the resources up to date. If you come across good resources that would make useful additions, preferably open access, or if you have suggestions for improvement, please open an issue in the corresponding GitHub repository.