Mixing languages using Jupyter notebooks

Better Code, Better Science: Chapter 6, Part 5

Nov 25, 2025

This is a possible section from the open-source living textbook Better Code, Better Science, which is being released in sections on Substack. The entire book can be accessed here and the GitHub repository is here. This material is released under CC-BY-NC. Thanks to Steffen Bollman for helpful suggestions on a draft of this section.

It’s very common for researchers to use different programming languages to solve different problems. A common case is the Python user who wishes to take advantage of the much wider range of statistical methods implemented in R. The rpy2 package allows this within pure Python code, but it can be cumbersome to work with, particularly because complex data types must be converted between the two languages. Fortunately, Jupyter notebooks provide a convenient solution to this problem via magic commands. These are commands that start with either % (line magics, which apply to a single line) or %% (cell magics, which apply to the entire cell) and enable additional functionality.
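To make the difference between the two kinds of magic concrete, the toy Python function below applies the basic syntax rules to the text of a notebook cell: a %% magic is recognized only on the first line of a cell and takes the rest of the cell as its input, whereas % magics can appear on any line and apply only to that line. The function name find_magics is hypothetical, and this is a simplified illustration rather than how IPython actually parses cells.

```python
def find_magics(cell_source: str) -> list[tuple[str, str, str]]:
    """Return (kind, name, args) tuples for the magics found in a cell.

    Toy illustration of the syntax rules only; IPython's real parser
    handles many more cases than this.
    """
    lines = cell_source.splitlines()
    if lines and lines[0].startswith("%%"):
        # A cell magic must sit on the first line; it receives the whole
        # rest of the cell as input, so nothing else is scanned.
        name, _, args = lines[0][2:].partition(" ")
        return [("cell", name, args.strip())]
    magics = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("%"):
            # A line magic applies only to the text on its own line.
            name, _, args = stripped[1:].partition(" ")
            magics.append(("line", name, args.strip()))
    return magics
```

For example, find_magics("import pandas as pd\n%load_ext rpy2.ipython") reports a single line magic named load_ext, while a cell starting with %%R is reported as one cell magic named R.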

An example of mixing languages can be seen in the mixing_languages.ipynb notebook, in which we load and preprocess some data using Python and then use R magic commands to analyze the data with a package (mirt) that is only available in R. In this example, we will work with data from a study published by our laboratory (Eisenberg et al., 2019), in which 522 people completed a large battery of psychological tests and surveys. We will focus here on the responses to a survey known as the “Barratt Impulsiveness Scale”, which includes 30 questions related to different aspects of the psychological construct of “impulsiveness”; for example, “I say things without thinking” or “I plan tasks carefully”. Each participant rated each of these statements on a four-point scale from ‘Rarely/Never’ to ‘Almost Always/Always’; the scores were coded so that 1 always represented the most impulsive choice and 4 the most self-controlled choice.

In order to enable the R magic commands, we first need to load the rpy2 extension for Jupyter:

import pandas as pd
%load_ext rpy2.ipython
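Before the analysis, the notebook reshapes the survey responses into a data frame called data_df_spread, with one row per participant and one column per survey item; that preprocessing code is not reproduced in this section. As a rough sketch of the reshaping step only, assuming the responses start out in long format with one row per participant-item pair, pandas' DataFrame.pivot method does the job. The column names and values here are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (participant, item) response.
# The column names "worker", "item", and "response" are assumptions,
# not necessarily those used in the published dataset.
long_df = pd.DataFrame({
    "worker": ["s001", "s001", "s002", "s002"],
    "item": ["bis_01", "bis_02", "bis_01", "bis_02"],
    "response": [2, 4, 3, 1],
})

# Pivot to wide format: participants become the row index and each
# survey item becomes its own column.
data_df_spread = long_df.pivot(index="worker", columns="item", values="response")
```

Keeping the participant IDs in the index rather than in a column matters here, because mirt treats every column of its input as a survey item.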

In the notebook, we first load the data from GitHub and preprocess it into the required format: a data frame with one row per participant and one column for each item in the survey (not shown here). Once we have that data frame (called data_df_spread here), we can create a notebook cell that takes in the data frame and performs a multidimensional item response theory (MIRT) analysis with the mirt package, searching for the optimal number of factors according to the Bayesian Information Criterion (BIC):

%%R -i data_df_spread -o bic_values

# Perform a multidimensional item response theory (MIRT) analysis using the `mirt` R package

library(mirt)

# Fit models with an increasing number of factors to find the best-fitting model (minimum BIC)

bic_values <- c()
n <- 1
best_model_found <- FALSE
fit <- list()

while (!best_model_found) {
    fit[[n]] <- mirt(data_df_spread, n, itemtype = 'graded', SE = TRUE,
        verbose = FALSE, method = 'MHRM')

    bic <- extract.mirt(fit[[n]], 'BIC')
    if (n > 1 && bic > bic_values[length(bic_values)]) {
        # BIC got worse, so the previous model (n - 1 factors) is the best
        best_model_found <- TRUE
        best_model <- fit[[n - 1]]
        cat('Best model has', n - 1, 'factor(s) with BIC =',
            bic_values[length(bic_values)], '\n')
    } else {
        cat('Model with', n, 'factor(s): BIC =', bic, '\n')
        n <- n + 1
    }
    bic_values <- c(bic_values, bic)
}
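The stopping rule in the R loop can also be written in plain Python, which may make its logic easier to follow. BIC is defined as -2 ln(L) + k ln(N), where L is the maximized likelihood, k the number of estimated parameters, and N the number of observations; lower values indicate a better trade-off between fit and complexity. In this sketch, bic_for is a hypothetical stand-in for "fit a model with n factors and return its BIC", which in the notebook is done by mirt in R.

```python
import math
from typing import Callable


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: -2 ln(L) + k ln(N). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_n_factors(
    bic_for: Callable[[int], float], max_factors: int = 10
) -> tuple[int, list[float]]:
    """Add factors one at a time until BIC gets worse.

    Mirrors the R loop: fit 1, 2, 3, ... factors and stop at the first
    model whose BIC is higher than the previous one; the previous model
    is then taken as the best. Returns (best_n, all BIC values computed).
    """
    bic_values: list[float] = []
    for n in range(1, max_factors + 1):
        current = bic_for(n)
        if bic_values and current > bic_values[-1]:
            bic_values.append(current)
            return n - 1, bic_values
        bic_values.append(current)
    # BIC never increased within the cap, so the largest model tried wins.
    return max_factors, bic_values
```

With made-up BIC values of 500, 480, and 490 for one, two, and three factors, select_n_factors picks two factors. The max_factors cap is an addition that the R loop does not have; it guards against an endless loop if BIC keeps decreasing.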

The %%R cell above uses the -i flag to import the data_df_spread data frame from the preceding Python cells; a major advantage of this approach is that the pandas data frame is automatically converted into an R data frame. After performing the analysis in R, the cell exports the bic_values variable back to Python using the -o flag, again converting it automatically; because bic_values is a numeric vector rather than a data frame, it will typically arrive in Python as a NumPy array. The R session remains active in the background, so a later cell in the notebook can use the variables created in this one. For example, we can compute the loadings of each item onto each factor from best_model and export them back into Python:

%%R -o loadings
loadings <- as.data.frame(summary(best_model, verbose = FALSE)$rotF)
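Once loadings is back in Python it should arrive as an ordinary pandas data frame, with one row per item and one column per factor, so the rest of the analysis can continue in Python. As a small sketch, we can assign each item to the factor on which it loads most strongly. The stand-in data frame below is hypothetical: the item names, the factor column names F1 and F2, and the loading values are all invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the `loadings` frame exported from R:
# rows are survey items, columns are factors.
loadings = pd.DataFrame(
    {"F1": [0.72, 0.10, -0.65], "F2": [0.05, 0.81, 0.12]},
    index=["item_a", "item_b", "item_c"],
)

# Assign each item to the factor with the largest absolute loading,
# so that strong negative loadings count as well as positive ones.
primary_factor = loadings.abs().idxmax(axis=1)

# Keep the signed loading on that factor, since its sign shows the
# direction in which the item relates to the factor.
primary_loading = pd.Series(
    [loadings.loc[item, factor] for item, factor in primary_factor.items()],
    index=primary_factor.index,
)
item_summary = pd.DataFrame({"factor": primary_factor, "loading": primary_loading})
```

Using the absolute value is a design choice: an item such as item_c, with a loading of -0.65, is still treated as belonging to F1.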

The ability to easily integrate code from Python and many other languages is one of the most important applications of Jupyter notebooks for scientists.

In the next post I will outline a set of best practices for the use of Jupyter notebooks.

Neural Strategies
