causalbenchmarks is an R package that provides convenient functions to interface with the causalbenchmarks.org API.

Getting started

To begin using the package, you need an algorithm key, which you can obtain from the causalbenchmarks dashboard. An algorithm key is an API key specific to an individual causal inference algorithm and is used to request new data analysis tasks. Like other API keys, it should not be shared publicly.
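
One common way to keep a key out of scripts and version control is to read it from an environment variable. The following is a minimal sketch in base R; the helper `read_algorithm_key` and the variable name `CAUSALBENCHMARKS_ALGORITHM_KEY` are hypothetical names chosen for illustration, not something the package defines.

```r
# Hypothetical helper and variable name (not part of the package).
# Set the variable in ~/.Renviron or your shell rather than in code.
read_algorithm_key <- function(var = "CAUSALBENCHMARKS_ALGORITHM_KEY") {
  key <- Sys.getenv(var, unset = NA_character_)
  # Fail early with a clear message if the key is missing or empty.
  if (is.na(key) || !nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}
```

The returned string can then be passed wherever an algorithm key is expected, for example as the argument to get_new_ticket.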

If you are unsure whether you want to make an account, or would just like to experiment with the package, you can also use the testing algorithm key, ABCDEFGHIJKLMNOPQRSTUVWXYZ1234. This key belongs to a special algorithm for the rct_100 task that never shows up on the leaderboards, but it lets you experiment with the API. You can check out some details about the test algorithm here.

Step 1: Request a New Ticket

Once you have your algorithm key, you can request a data analysis ticket. A ticket is a unique identifier associated with a simulation; it provides a way to request a dataset and to submit the results of your analysis once your algorithm has run. To request a ticket, use the get_new_ticket function, which returns a new ticket for the algorithm associated with the submitted algorithm key.

If we print the ticket ID, we see that it is another alphanumeric identifier, much like the algorithm key. However, this identifier is specific to this simulation and is used only to request a dataset and submit the analysis.

ticket_id <- get_new_ticket("ABCDEFGHIJKLMNOPQRSTUVWXYZ1234")
print(ticket_id)

## [1] "ATHJ55MJA71DXZ2DCVKUJJB2S6P5F7IN"

Step 2: Request the Dataset

Now that we have a new data analysis ticket, we can ask for a dataset and begin our analysis using the get_dataset function in conjunction with the ticket identifier. Because submissions to causalbenchmarks.org are timed, requesting this dataset starts a timer which is stopped when we submit our answer to the data analysis task.

data <- get_dataset(ticket_id)
print(head(data))

## # A tibble: 6 × 2
##   treatment outcome
##   <lgl>       <dbl>
## 1 TRUE        2.38 
## 2 TRUE        2.27 
## 3 FALSE       0.356
## 4 TRUE        3.13 
## 5 FALSE       1.11 
## 6 FALSE       2.76

Step 3: Run Your Algorithm

Once we have the dataset, we can run our analysis. In this case, we use linear regression, via the lm function, to estimate the difference in group means; the fitted model also provides standard errors for the treatment effect estimate. In practice, you would perform your own data analysis here with your own algorithm.

# estimate the treatment effect via the difference in group means
linear_model <- lm(outcome ~ treatment, data = data)
# extract our answer from the model's coefficients
estimate <- coef(linear_model)[2]
estimate

## treatmentTRUE 
##     0.4829739
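
The same fitted model can also supply a 95% confidence interval through confint. The sketch below wraps the estimate and its interval in a helper, `estimate_ate` (a hypothetical name), and runs it on simulated data that only mimics the treatment and outcome columns shown above; it is not data from the API. Defining a function like this before requesting the dataset keeps setup work out of the timed window.

```r
# Hypothetical helper: difference in group means with a 95% CI from lm().
estimate_ate <- function(data) {
  fit <- lm(outcome ~ treatment, data = data)
  ci <- confint(fit, "treatmentTRUE", level = 0.95)
  list(
    estimate = unname(coef(fit)["treatmentTRUE"]),
    ci_lo = ci[1, 1],
    ci_hi = ci[1, 2]
  )
}

# Simulated stand-in for an API dataset (assumed columns: treatment, outcome).
set.seed(1)
sim <- data.frame(treatment = rep(c(TRUE, FALSE), each = 50))
sim$outcome <- 1 + 0.5 * sim$treatment + rnorm(100)

result <- estimate_ate(sim)

# The regression coefficient equals the raw difference in group means.
diff_means <- mean(sim$outcome[sim$treatment]) - mean(sim$outcome[!sim$treatment])
all.equal(result$estimate, diff_means)
```

With a real dataset, result$estimate, result$ci_lo and result$ci_hi would take the place of the estimate and the interval bounds when submitting.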

Step 4: Submit Your Results

With our estimate in hand, we can submit our answer to the API for benchmarking. The submit_estimate function takes the ticket_id, the estimate and, optionally, lower and upper bounds of the 95% confidence interval for our estimate. The function returns a tidy table of benchmark statistics so we can see how we did! These statistics are also available online, and if you have made your algorithm public, they will show up on the leaderboard if you are in the top 50 algorithms for a task!

# Optional: also include ci_lo and ci_hi to get coverage statistics (in this case our CI is [-20,20])
submit_estimate(ticket_id, estimate, -20.0, 20.0)

## # A tibble: 1 × 8
##   estimate ground_truth     mse total_mse time_s ci_lo ci_hi coverage_rate
##      <dbl>        <dbl>   <dbl>     <dbl>  <dbl> <dbl> <dbl>         <dbl>
## 1    0.483        0.439 0.00196   0.00196  0.165   -20    20             1
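
To help read this output, here is a sketch of how the mse and coverage_rate columns appear to relate to the other columns for a single submission. This is an interpretation inferred from the numbers printed above, not a documented definition from the API, and the inputs are the rounded values from the table.

```r
# Values copied (rounded) from the output above.
estimate <- 0.483
ground_truth <- 0.439
ci_lo <- -20
ci_hi <- 20

# Assumed: mse is the squared error of this single estimate.
squared_error <- (estimate - ground_truth)^2

# Assumed: coverage is 1 if the interval contains the ground truth, else 0.
covered <- as.numeric(ci_lo <= ground_truth && ground_truth <= ci_hi)
```

Under these assumptions the squared error comes out close to the reported mse, with the small gap explained by rounding of the inputs. An interval as wide as [-20, 20] covers the ground truth trivially, so a coverage of 1 here says little about the quality of the estimate.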