4 Rpy2

4.1 Rpy2: basic functionality

Rpy2 is a foreign function interface to R. It can be used in the following way:

import rpy2
import rpy2.robjects as robjects

/home/runner/work/polygloty/polygloty/renv/python/virtualenvs/renv-python-3.12/lib/python3.12/site-packages/rpy2/rinterface_lib/embedded.py:276: UserWarning: R was initialized outside of rpy2 (R_NilValue != NULL). Trying to use it nevertheless.
  warnings.warn(msg)
R was initialized outside of rpy2 (R_NilValue != NULL). Trying to use it nevertheless.

vector = robjects.IntVector([1,2,3])
rsum = robjects.r['sum']

rsum(vector)

IntVector with 1 elements.

6

Luckily, we’re not restricted to just calling R functions and creating R objects. The real power of this in-memory interoperability lies in the conversion of Python objects to R objects to call R functions on, and then to the conversion of the results back to Python objects.

Rpy2 requires specific conversion rules for different Python objects. It is straightforward to create R vectors from corresponding Python lists:

str_vector = robjects.StrVector(['abc', 'def', 'ghi'])
flt_vector = robjects.FloatVector([0.3, 0.8, 0.7])
int_vector = robjects.IntVector([1, 2, 3])
mtx = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5)

However, for single cell biology, the objects that are most interesting to convert are (count) matrices, arrays and dataframes. In order to do this, you need to import the corresponding rpy2 modules and specify the conversion context.

import numpy as np

from rpy2.robjects import numpy2ri
from rpy2.robjects import default_converter

rd_m = np.random.random((10, 7))

with (default_converter + numpy2ri.converter).context():
    mtx2 = robjects.r.matrix(rd_m, nrow = 10)

import pandas as pd

from rpy2.robjects import pandas2ri

pd_df = pd.DataFrame({'int_values': [1,2,3],
                      'str_values': ['abc', 'def', 'ghi']})

with (default_converter + pandas2ri.converter).context():
    pd_df_r = robjects.DataFrame(pd_df)

One big limitation of rpy2 is the inability to convert sparse matrices: there is no built-in conversion module for scipy. The anndata2ri package provides, apart from functionality to convert SingleCellExperiment objects to an anndata objects, functions to convert sparse matrices.

import scipy as sp

from anndata2ri import scipy2ri

sparse_matrix = sp.sparse.csc_matrix(rd_m)

with (default_converter + scipy2ri.converter).context():
    sp_r = scipy2ri.py2rpy(sparse_matrix)

We will showcase how to use anndata2ri to convert an anndata object to a SingleCellExperiment object and vice versa as well:

import anndata as ad
import scanpy.datasets as scd

import anndata2ri

adata_paul = scd.paul15()


  0%|          | 0.00/9.82M [00:00<?, ?B/s]
  0%|          | 8.00k/9.82M [00:00<03:28, 49.4kB/s]
  0%|          | 32.0k/9.82M [00:00<01:36, 106kB/s] 
  1%|          | 96.0k/9.82M [00:00<00:43, 235kB/s]
  2%|1         | 200k/9.82M [00:00<00:25, 390kB/s] 
  4%|4         | 408k/9.82M [00:00<00:14, 703kB/s]
  8%|8         | 840k/9.82M [00:01<00:06, 1.35MB/s]
 17%|#6        | 1.65M/9.82M [00:01<00:03, 2.59MB/s]
 34%|###3      | 3.33M/9.82M [00:01<00:01, 5.07MB/s]
 67%|######7   | 6.62M/9.82M [00:01<00:00, 9.85MB/s]
 83%|########2 | 8.15M/9.82M [00:01<00:00, 11.1MB/s]
100%|##########| 9.82M/9.82M [00:01<00:00, 6.21MB/s]


with anndata2ri.converter.context():
    sce = anndata2ri.py2rpy(adata_paul)
    ad2 = anndata2ri.rpy2py(sce)

4.2 Interactive sessions

One of the most useful ways to take advantage of in-memory interoperability is to use it in interactive sessions, where you’re exploring the data and want to try out some functions non-native to your language of choice.

Jupyter notebooks (and some other notebooks) make this possible from the Python side: using IPython line and cell magic and rpy2, you can easily run an R jupyter cell in your notebooks.

%load_ext rpy2.ipython  # line magic that loads the rpy2 ipython extension.
                        # this extension allows the use of the following cell magic

%%R -i input -o output  # this line allows to specify inputs 
                        # (which will be converted to R objects) and outputs 
                        # (which will be converted back to Python objects) 
                        # this line is put at the start of a cell
                        # the rest of the cell will be run as R code

4.3 Usecase: ran in Python

We will perform the Compute DE step not in R, but in Python The pseudobulked data is read in:

import anndata as ad

pd_adata = ad.read_h5ad("../usecase/data/pseudobulk.h5ad")

Select small molecule and control:

sm_name = "Belinostat"
control_name = "Dimethyl Sulfoxide"

Creating a DESeq dataset: This requires a bit more effort: we need to import the DESeq2 package, and combine the default, numpy2ri and pandas2ri converter to convert the count matrix and the obs dataframe.

import numpy as np

import rpy2
import rpy2.robjects as robjects

from rpy2.robjects import numpy2ri
from rpy2.robjects import pandas2ri

from rpy2.robjects import default_converter
from rpy2.robjects.packages import importr

DESeq2 = importr("DESeq2")

np_cv_rules = default_converter + numpy2ri.converter + pandas2ri.converter

with np_cv_rules.context() as cv:
    counts_dense = np.transpose(pd_adata.X.astype(np.int32))

    robjects.globalenv["count_data"] = counts_dense
    robjects.globalenv["obs_data"] = pd_adata.obs

We can also specify R formulas!

from rpy2.robjects import Formula

design_formula = Formula('~ sm_name + plate_name')

dds = DESeq2.DESeqDataSetFromMatrix(countData = robjects.globalenv["count_data"],
        colData = robjects.globalenv["obs_data"],
        design = design_formula)

Run DESeq2:

dds = DESeq2.DESeq(dds)

Get results:

contrastv = robjects.StrVector(["sm_name", sm_name, control_name])
res = DESeq2.results(dds, contrast=contrastv)

base = importr('base')
res = base.as_data_frame(res)

Preview results:

dplyr = importr('dplyr')
utils = importr('utils')

res = utils.head(dplyr.arrange(res, 'padj'), 10)

Write to disk: this again requires the pandas2ri converter to convert the results to a pandas dataframe.

with (robjects.default_converter + pandas2ri.converter).context():
    res_pd = robjects.conversion.get_conversion().rpy2py(res)

    res_pd.to_csv("../usecase/data/de_contrasts.csv")