Python code to compute the effects of mutations from the parameters of an undirected graphical model estimated using plmc (included in this repository). Please refer to the included EVmutation.html for usage instructions and an example calculation.

Software that infers undirected graphical models to describe coevolution and covariation in families of biological sequences. The model parameters estimated with plmc are used by EVmutation to compute the effects of mutations. Please refer to the included README.md for compilation and usage instructions.
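
As a rough illustration of how pairwise model parameters become a mutation effect score, here is a minimal sketch. It assumes the standard pairwise form E(σ) = Σ_i h_i(σ_i) + Σ_{i<j} J_ij(σ_i, σ_j), with the effect of a mutant taken as E(mutant) - E(wild type), so the wild type scores 0. The 20-letter alphabet and its ordering, the array shapes, and the function names are hypothetical. This is not the EVmutation API or the plmc parameter file format. Passing `J=None` stands in for a site-independent model, assuming that model has field terms only (with its own separately estimated values).

```python
import itertools

import numpy as np

# Assumed alphabet and ordering for this toy example; plmc's actual alphabet may differ.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def statistical_energy(seq, h, J=None):
    """E(seq) = sum_i h[i, s_i] + sum_{i<j} J[i, j, s_i, s_j].

    h has shape (L, q) and J has shape (L, L, q, q); only entries with i < j are read.
    J=None evaluates a fields-only (site-independent) model.
    """
    idx = [AA_INDEX[aa] for aa in seq]
    energy = sum(h[i, a] for i, a in enumerate(idx))
    if J is not None:
        for i, j in itertools.combinations(range(len(idx)), 2):
            energy += J[i, j, idx[i], idx[j]]
    return float(energy)


def delta_energy(wt, mut, h, J=None):
    """Effect of a mutant as E(mut) - E(wt); the wild type itself scores 0."""
    if len(wt) != len(mut):
        raise ValueError("wild-type and mutant sequences must have equal length")
    return statistical_energy(mut, h, J) - statistical_energy(wt, h, J)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, q = 5, len(ALPHABET)
    h = rng.normal(size=(L, q))                 # toy fields
    J = rng.normal(scale=0.1, size=(L, L, q, q))  # toy couplings
    wt = "ACDEF"
    print(delta_energy(wt, wt, h, J))       # 0.0 for the wild type
    print(delta_energy(wt, "AWDEF", h, J))  # pairwise ("epistatic") model
    print(delta_energy(wt, "AWDEF", h))     # fields only
```

The coupling terms are what make the score epistatic: the contribution of a substitution depends on the residues present at the other positions.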

JavaScript/D3-based tool to interactively visualize the parameters of undirected graphical models of protein families.

Sequence alignments for all proteins with mutational scans (A2M format, .tar.gz archive). Note that inserts relative to the target sequence were removed and that columns with lowercase residues were excluded during model inference.
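
To reproduce the set of columns used for inference from one of these alignments, one could keep only the columns where the target sequence has an uppercase residue. The sketch below makes two assumptions not stated above: the target is the first record in the file, and all sequences have equal length because inserts were already removed. The function names are hypothetical.

```python
def read_a2m(lines):
    """Parse FASTA-style A2M text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def focus_columns(target):
    """Indices of columns where the target residue is uppercase (lowercase columns are excluded)."""
    return [k for k, c in enumerate(target) if c.isupper()]


def restrict_to_focus(records):
    """Keep only the focus columns, assuming records[0] is the target sequence."""
    target = records[0][1]
    cols = focus_columns(target)
    out = []
    for header, seq in records:
        if len(seq) != len(target):
            raise ValueError(f"{header}: length {len(seq)} != target length {len(target)}")
        out.append((header, "".join(seq[k] for k in cols)))
    return out


if __name__ == "__main__":
    # Toy alignment; the lowercase "lv" columns in the target are dropped.
    text = ">target/1-6\nMKlvEF\n>seq1\nMR-iEY\n"
    print(restrict_to_focus(read_a2m(text.splitlines())))
```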

Mutation effect predictions and experimental data for all proteins/RNAs with mutational scans (csv format, .tar.gz archive).

We also provide predicted mutation landscapes for almost 7000 human proteins.

Column | Description |
---|---|
mutant | String representation of the mutant. The individual substitutions of higher-order mutants are separated by commas; WT denotes the wild-type sequence. |
effect_prediction_epistatic | Statistical energy difference of the mutant, computed using the epistatic model (wild-type sequence: 0) |
effect_prediction_independent | Statistical energy difference of the mutant, computed using the independent model (wild-type sequence: 0) |
Experimental data columns | Columns containing experimental measurements. Their names are listed explicitly in the comment lines at the top of the .csv file. For some datasets, we used the name "linear" for experimental fitness measurements between 0 and 1 (WT = 1), and "log" if fitness was reported as a log enrichment ratio relative to WT (WT = 0). |
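
A minimal sketch for loading one of these files and splitting the `mutant` column into individual substitutions. Several details are assumptions, not facts from this page: comment lines start with `#`; each substitution is written as wild-type residue, 1-based position, new residue (e.g. `A123G`); and fields are separated by `;`, chosen because multi-mutant strings already contain commas. Check these against the actual files. The function names and the toy rows are hypothetical.

```python
import csv
import re

# Assumed substitution notation: wild-type residue, position, new residue (e.g. "A123G").
SUBSTITUTION = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_mutant(mutant):
    """Split a mutant string into (wt, pos, subs) tuples; "WT" yields an empty list."""
    mutant = mutant.strip()
    if mutant == "WT":
        return []
    subs = []
    for token in mutant.split(","):
        m = SUBSTITUTION.match(token.strip())
        if m is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        wt, pos, new = m.groups()
        subs.append((wt, int(pos), new))
    return subs


def read_dataset(text, delimiter=";", comment="#"):
    """Parse table text into dicts, skipping lines that start with the comment prefix."""
    lines = [line for line in text.splitlines() if not line.startswith(comment)]
    return list(csv.DictReader(lines, delimiter=delimiter))


if __name__ == "__main__":
    # Hypothetical toy content, for illustration only.
    text = (
        "# hypothetical comment line\n"
        "mutant;effect_prediction_epistatic;effect_prediction_independent\n"
        "WT;0.0;0.0\n"
        "A2G,L5P;-4.2;-3.1\n"
    )
    for row in read_dataset(text):
        print(row["mutant"], parse_mutant(row["mutant"]), float(row["effect_prediction_epistatic"]))
```

The length of the list returned by `parse_mutant` gives the order of the mutant (0 for WT, 1 for single mutants, and so on).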

Column | Description |
---|---|
mutant | String representation of mutant |
pos | Substituted position in target sequence |
wt | Wild-type residue at position in target sequence |
subs | Substitution introduced at position |
prediction_epistatic | Statistical energy difference of mutant, computed using the epistatic model (wild-type sequence: 0) |
prediction_independent | Statistical energy difference of mutant, computed using the independent model (wild-type sequence: 0) |
frequency | Frequency of substituted residue in family sequence alignment used for inference |
column_conservation | Conservation of position in family sequence alignment (0: fully variable, 1: fully conserved) |
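
Because each row carries `pos`, `wt` and `subs`, a predicted single-mutant landscape can be arranged as a position-by-residue matrix, for example to plot it as a heatmap. The sketch below assumes rows are dicts keyed by the column names above with string values (as `csv.DictReader` would produce; the file delimiter is not stated here). The amino-acid ordering and the function name are hypothetical. Wild-type cells are set to 0.0, following the convention that the wild-type sequence scores 0; combinations absent from the file stay NaN.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the matrix


def landscape_matrix(rows, value="prediction_epistatic", alphabet=ALPHABET):
    """Pivot single-mutant rows into (positions, matrix).

    matrix[r][c] holds `value` for substitution alphabet[c] at positions[r].
    Wild-type cells are 0.0; missing combinations are NaN.
    """
    positions = sorted({int(row["pos"]) for row in rows})
    row_of = {pos: r for r, pos in enumerate(positions)}
    col_of = {aa: c for c, aa in enumerate(alphabet)}
    matrix = [[math.nan] * len(alphabet) for _ in positions]
    for row in rows:
        r = row_of[int(row["pos"])]
        if row["wt"] in col_of:
            matrix[r][col_of[row["wt"]]] = 0.0  # wild type scores 0 by definition
    for row in rows:
        if row["subs"] in col_of:  # skip residues outside the assumed alphabet
            matrix[row_of[int(row["pos"])]][col_of[row["subs"]]] = float(row[value])
    return positions, matrix


if __name__ == "__main__":
    toy_rows = [  # hypothetical values, for illustration only
        {"pos": "3", "wt": "A", "subs": "G", "prediction_epistatic": "-1.0"},
        {"pos": "1", "wt": "M", "subs": "L", "prediction_epistatic": "0.5"},
    ]
    positions, matrix = landscape_matrix(toy_rows)
    print(positions, matrix[0][ALPHABET.index("L")])
```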

Column | Description |
---|---|
i | Index of first position in sequence |
A_i | Residue at position i of sequence |
j | Index of second position in sequence |
A_j | Residue at position j of sequence |
fn | (placeholder column) |
cn | Evolutionary coupling score (corrected Frobenius norm, CN) of residue pair (i, j). Higher scores correspond to stronger coupling; scores around 0 indicate that no coupling could be detected. |
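
Since higher `cn` values indicate stronger coupling, a typical first step is to rank residue pairs by score. A minimal sketch, assuming each line holds the six columns above in order and is whitespace-separated by default (pass `delimiter=","` otherwise). Lines whose first field is not an integer, such as a header, are skipped. The `min_separation` filter and all names are hypothetical additions for illustration.

```python
from typing import NamedTuple


class Coupling(NamedTuple):
    i: int
    A_i: str
    j: int
    A_j: str
    cn: float


def parse_couplings(lines, delimiter=None):
    """Parse rows "i A_i j A_j fn cn"; the fn placeholder column is dropped.

    delimiter=None splits on whitespace; lines not starting with an integer are skipped.
    """
    out = []
    for line in lines:
        fields = line.strip().split(delimiter)
        if len(fields) < 6 or not fields[0].strip().isdigit():
            continue
        i, a_i, j, a_j, _fn, cn = (f.strip() for f in fields[:6])
        out.append(Coupling(int(i), a_i, int(j), a_j, float(cn)))
    return out


def top_couplings(couplings, n=10, min_separation=1):
    """Return the n highest-scoring pairs with |i - j| >= min_separation."""
    kept = [c for c in couplings if abs(c.i - c.j) >= min_separation]
    return sorted(kept, key=lambda c: c.cn, reverse=True)[:n]


if __name__ == "__main__":
    toy = ["1 M 5 L 0 0.12", "2 K 3 E 0 0.90", "4 V 20 I 0 0.45"]  # hypothetical rows
    for c in top_couplings(parse_couplings(toy), n=2):
        print(c.i, c.A_i, c.j, c.A_j, c.cn)
```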