EVmutation supplementary data and code

Source Code

EVmutation github repository

Python code to compute the effects of mutations from the parameters of an undirected graphical model estimated using plmc (included in this repository). Please refer to the included EVmutation.html for usage instructions and an example calculation.

plmc github repository

Software that infers undirected graphical models to describe coevolution and covariation in families of biological sequences. The model parameters estimated with plmc are used by EVmutation to compute the effects of mutations. Please refer to the included README.md for compilation and usage instructions.

EVzoom github repository

JavaScript/D3-based tool to interactively visualize the parameters of undirected graphical models of protein families.

Supplementary Web Data

Sequence alignments (35 MB)

Alignments for all proteins with mutational scans (A2M format, .tar.gz archive). Note that insertions relative to the target sequence were removed, and that columns with lowercase residues were excluded during model inference.
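As a rough illustration of the column exclusion mentioned above, the following sketch lists the alignment columns in which no sequence carries a lowercase residue. It reflects one reading of "columns with lowercase residues", not necessarily the exact rule applied during inference. The function name uppercase_columns and the toy sequences are made up for this example.

```python
def uppercase_columns(sequences):
    """Return indices of alignment columns without any lowercase residue.

    Assumption: a column counts as "lowercase" if at least one sequence
    has a lowercase character there. Gap characters are never lowercase.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    return [
        col
        for col in range(length)
        if not any(seq[col].islower() for seq in sequences)
    ]


# Toy alignment (hypothetical sequences, not taken from the archive)
toy_alignment = ["MkT-", "MKTA", "M-SA"]
kept = uppercase_columns(toy_alignment)  # column 1 is dropped
```

Reading the sequences out of the A2M files themselves is left out here; any FASTA-style parser that preserves letter case should do.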

Mutation effects (3 MB)

Predicted mutation effects for all proteins/RNAs with mutational scans (CSV format, .tar.gz archive).

We also provide predicted mutation landscapes for almost 7000 human proteins.

Description of file types

Predictions of experimental datasets (.csv)

Column Description
mutant String representation of mutant. The individual substitutions of higher-order mutants are separated by commas. The wild-type sequence is denoted WT.
effect_prediction_epistatic Statistical energy difference of mutant, computed using the epistatic model (wild-type sequence: 0)
effect_prediction_independent Statistical energy difference of mutant, computed using the independent model (wild-type sequence: 0)
Experimental data columns Columns containing experimental measurements. Their names are listed explicitly in the comment lines at the top of the .csv file. For some of the datasets, we used the name "linear" for experimental fitness measurements between 0 and 1 (WT=1), and "log" if the fitness was reported as a log enrichment ratio relative to WT (WT=0).
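The columns above can be read with a few lines of standard-library Python. This is a minimal sketch under two assumptions that should be checked against the actual files: comment lines start with "#", and fields are separated by ";" (a comma delimiter would clash with the commas inside higher-order mutant strings unless fields are quoted). The function names, the substitution strings, and all numbers in the example are hypothetical.

```python
import csv
import io


def read_predictions(text, comment="#", delimiter=";"):
    """Split file content into comment lines and parsed data rows.

    Both `comment` and `delimiter` are assumptions about the file layout.
    """
    comments, data_lines = [], []
    for line in text.splitlines():
        if line.startswith(comment):
            comments.append(line[len(comment):].strip())
        elif line.strip():
            data_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter=delimiter)
    return comments, list(reader)


def split_mutant(mutant):
    """Return the individual substitutions of a mutant; [] for the wild type."""
    mutant = mutant.strip()
    if mutant == "WT":
        return []
    return [part.strip() for part in mutant.split(",")]


# Made-up file content for illustration only
example = (
    "# experimental columns: linear\n"
    "mutant;effect_prediction_epistatic;effect_prediction_independent;linear\n"
    "WT;0;0;1.0\n"
    "A24G,K50R;-3.2;-2.9;0.4\n"
)
comments, rows = read_predictions(example)
orders = [len(split_mutant(row["mutant"])) for row in rows]  # 0 for WT, 2 for the double mutant
```

The comment lines are returned unparsed because their exact layout is not specified here; the experimental column names have to be picked out of them by inspection.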

Predicted single-substitution landscapes (.csv)

Column Description
mutant String representation of mutant
pos Substituted position in target sequence
wt Wild-type residue at position in target sequence
subs Substitution introduced at position
prediction_epistatic Statistical energy difference of mutant, computed using the epistatic model (wild-type sequence: 0)
prediction_independent Statistical energy difference of mutant, computed using the independent model (wild-type sequence: 0)
frequency Frequency of substituted residue in family sequence alignment used for inference
column_conservation Conservation of position in family sequence alignment (0: fully variable, 1: fully conserved)
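One simple way to summarize a landscape file is to average the predicted effects of all substitutions at each position. The sketch below assumes the rows have already been loaded as dictionaries keyed by the column names listed above (for example with csv.DictReader); the helper name mean_effect_by_position and the example values are invented for illustration.

```python
from collections import defaultdict


def mean_effect_by_position(rows, column="prediction_epistatic"):
    """Average the values of `column` over all substitutions at each position."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        pos = int(row["pos"])
        sums[pos] += float(row[column])
        counts[pos] += 1
    return {pos: sums[pos] / counts[pos] for pos in sorted(sums)}


# Hypothetical rows, not taken from the released data
landscape = [
    {"mutant": "A10C", "pos": "10", "wt": "A", "subs": "C", "prediction_epistatic": "-1.0"},
    {"mutant": "A10D", "pos": "10", "wt": "A", "subs": "D", "prediction_epistatic": "-3.0"},
    {"mutant": "K11R", "pos": "11", "wt": "K", "subs": "R", "prediction_epistatic": "-0.5"},
]
per_position = mean_effect_by_position(landscape)
```

Passing column="prediction_independent" gives the same summary for the independent model.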

Evolutionary couplings (.csv)

Column Description
i Index of first position in sequence
A_i Residue at position i of sequence
j Index of second position in sequence
A_j Residue at position j of sequence
fn (placeholder column)
cn Evolutionary coupling score (corrected Frobenius norm, CN) of residue pair (i, j). Higher scores correspond to stronger coupling; scores around 0 indicate that no coupling could be detected.
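Since higher cn scores indicate stronger coupling, a common first step is to rank the residue pairs by that column. The following sketch assumes the rows are available as dictionaries and that the two position columns are named i and j; the helper name top_couplings and the example scores are made up.

```python
def top_couplings(rows, n=10):
    """Return the n pairs with the highest cn score as (i, j, cn) tuples."""
    ranked = sorted(rows, key=lambda row: float(row["cn"]), reverse=True)
    return [(int(row["i"]), int(row["j"]), float(row["cn"])) for row in ranked[:n]]


# Hypothetical coupling rows for illustration only
couplings = [
    {"i": "5", "A_i": "L", "j": "42", "A_j": "V", "cn": "0.10"},
    {"i": "7", "A_i": "D", "j": "30", "A_j": "K", "cn": "1.25"},
    {"i": "12", "A_i": "G", "j": "18", "A_j": "P", "cn": "-0.02"},
]
strongest = top_couplings(couplings, n=2)
```

No score threshold is applied here; where to cut the ranked list is a separate choice that this sketch does not make.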