WEBSERVERS / CODE / SOFTWARE
We work hard to make our research fully reproducible and extensible by other researchers, providing well-documented data and code for each project. We also strive to make our computational tools widely usable by biologists and biomedical scientists through reusable software and interactive webservers.
PecanPy is parallelized, efficient, and accelerated node2vec software written in Python. Learning low-dimensional representations (embeddings) of nodes in large graphs is key to applying machine learning to massive biological networks. Node2vec is the most widely used method for node embedding. PecanPy is an ultrafast implementation of node2vec that combines cache-optimized, compact graph data structures with precomputation and parallelization to produce high-quality node embeddings for biological networks of all sizes and densities.
PecanPy: a fast, efficient, and parallelized Python implementation of node2vec.
Liu R, Krishnan A
bioRxiv (2020) doi.org/10.1101/2020.07.23.218487.
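To illustrate the kind of computation node2vec performs, here is a minimal Python sketch of its second-order biased random walk over a compact sparse-row (CSR) adjacency structure. It is not PecanPy's code or API: the helper names (`to_csr`, `has_edge`, `node2vec_walk`) are hypothetical, the graph is assumed undirected and unweighted, and the sketch omits the precomputation, parallelization, and embedding-training steps a full pipeline would include. The return parameter `p` and in-out parameter `q` follow the usual node2vec definitions: stepping back to the previous node gets weight 1/p, moving to a neighbor of the previous node gets weight 1, and moving anywhere else gets weight 1/q.

```python
import random
from bisect import bisect_left


def to_csr(num_nodes, edges):
    """Build an undirected, unweighted graph in CSR form (indptr, indices)."""
    neighbors = [set() for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    indptr, indices = [0], []
    for nbrs in neighbors:
        indices.extend(sorted(nbrs))  # sorted so edge lookups can bisect
        indptr.append(len(indices))
    return indptr, indices


def has_edge(indptr, indices, u, v):
    """Binary-search v within u's sorted neighbor slice."""
    lo, hi = indptr[u], indptr[u + 1]
    i = bisect_left(indices, v, lo, hi)
    return i < hi and indices[i] == v


def node2vec_walk(indptr, indices, start, length, p=1.0, q=1.0, rng=random):
    """Generate one node2vec-style biased random walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = indices[indptr[cur]:indptr[cur + 1]]
        if not nbrs:  # dead end: stop the walk early
            break
        if len(walk) == 1:  # first step has no previous node: sample uniformly
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif has_edge(indptr, indices, prev, x):
                weights.append(1.0)  # stay within prev's neighborhood
            else:
                weights.append(1.0 / q)  # move outward
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Example: a small graph and one walk per node (p < 1 favors backtracking, q > 1 stays local).
indptr, indices = to_csr(5, [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)])
rng = random.Random(0)
walks = [node2vec_walk(indptr, indices, s, 10, p=0.5, q=2.0, rng=rng) for s in range(5)]
```

Storing each node's neighbors as one sorted slice of a flat `indices` list keeps neighbor lookups contiguous in memory and lets `has_edge` use binary search; this is one plausible reading of "cache-optimized compact graph data structures", not a description of PecanPy's internals.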
The Expresto repository contains data and code to generate/reproduce the results in our work on imputing the expression of unmeasured genes in gene-expression profiles. This work introduces SampleLASSO, a sparse regression-based method that accurately imputes unmeasured genes in samples from any platform by capturing context-specific, biologically relevant information to guide imputation. The code includes a function that lets users apply SampleLASSO to fill in the unmeasured genes in their dataset of interest and get a report on which samples in the training data were most helpful for imputation.
A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes.
Mancuso CA*, Canfield JL*, Singla D, Krishnan A
Nucleic Acids Research (2020) doi.org/10.1093/nar/gkaa881.
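As we read the description above, SampleLASSO treats training samples as the regression features: it fits a sparse linear model that reconstructs a target sample's measured genes from the same genes in the training samples, then applies the fitted weights to those samples' values for the unmeasured genes. The plain-Python sketch below illustrates that idea with a small coordinate-descent LASSO. It is a simplified assumption, not the Expresto code: the function names (`lasso_cd`, `sample_lasso_impute`), the fixed penalty `alpha`, and the absence of standardization or cross-validation are all illustrative choices.

```python
def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO with intercept.

    Minimizes (1 / (2n)) * ||y - b - Xw||^2 + alpha * ||w||_1,
    where X is a list of n rows with m features each.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    # Centering X and y lets the intercept be recovered after fitting w.
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    resid = [v - y_mean for v in y]  # residual of the centered fit with w = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(m)]
    w = [0.0] * m
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue  # a constant feature carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_wj
    b = y_mean - sum(w[j] * x_mean[j] for j in range(m))
    return w, b


def sample_lasso_impute(train, target, measured, unmeasured, alpha=0.01):
    """Impute `unmeasured` genes of `target` using training samples as features.

    train:  {sample_id: {gene: expression}} covering measured and unmeasured genes
    target: {gene: expression} for the measured genes only
    Returns (imputed values, training samples with nonzero weight ranked by |weight|).
    """
    ids = sorted(train)
    # One row per measured gene, one column (feature) per training sample.
    X = [[train[s][g] for s in ids] for g in measured]
    y = [target[g] for g in measured]
    w, b = lasso_cd(X, y, alpha)
    imputed = {g: b + sum(wj * train[s][g] for wj, s in zip(w, ids)) for g in unmeasured}
    helpful = sorted(((abs(wj), s) for wj, s in zip(w, ids) if wj != 0.0), reverse=True)
    return imputed, [s for _, s in helpful]


# Toy example: the target's measured genes match training sample "A".
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
train = {
    "A": dict(zip(genes, [2, 4, 6, 8, 10, 3])),
    "B": dict(zip(genes, [6, 4, 4, 6, 1, 7])),
    "C": dict(zip(genes, [4, 8, 2, 6, 5, 5])),
}
target = {g: train["A"][g] for g in genes[:4]}
imputed, helpful = sample_lasso_impute(train, target, genes[:4], genes[4:])
```

In this sketch the nonzero weights double as a ranking of which training samples drove the imputation, which is the same kind of information as the per-sample report described above.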
The GenePlexus repository contains data and code to generate/reproduce the results in our work systematically benchmarking supervised learning for network-based gene classification across diverse prediction tasks (functions, diseases, and traits) and molecular networks, using meaningful validation schemes and evaluation metrics. We designed the code so that new methods can be added easily and then benchmarked alongside the existing methods in the same evaluation environment.
Supervised-learning is an accurate method for network-based gene classification.
Liu R*, Mancuso CA*, Yannakopoulos A, Johnson KA, Krishnan A
Bioinformatics (2020) doi.org/10.1093/bioinformatics/btaa150.
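Network-based gene classification with supervised learning can be illustrated by using each gene's row of the network adjacency matrix as its feature vector and training a classifier to separate genes annotated to a function or disease from unannotated ones. The following is a minimal plain-Python sketch under those assumptions, using L2-regularized logistic regression fit by batch gradient descent. The toy network, gene names, and helpers (`adjacency_features`, `train_logreg`, `score_genes`) are hypothetical and not part of the GenePlexus code, which additionally implements the validation schemes and evaluation metrics mentioned in the description.

```python
import math
from itertools import combinations


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def adjacency_features(genes, edges):
    """Use each gene's row of an undirected, unweighted adjacency matrix as its features."""
    idx = {g: i for i, g in enumerate(genes)}
    rows = [[0.0] * len(genes) for _ in genes]
    for u, v in edges:
        rows[idx[u]][idx[v]] = rows[idx[v]][idx[u]] = 1.0
    return {g: rows[idx[g]] for g in genes}


def train_logreg(X, y, lr=0.5, epochs=500, l2=0.01):
    """L2-regularized logistic regression fit by full-batch gradient descent."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score_genes(features, positives, negatives):
    """Train on labeled genes, then score every gene in the network."""
    X = [features[g] for g in positives + negatives]
    y = [1.0] * len(positives) + [0.0] * len(negatives)
    w, b = train_logreg(X, y)
    return {g: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for g, x in features.items()}


# Toy network: two 4-gene modules joined by a single bridge edge (g3-g4).
genes = [f"g{i}" for i in range(8)]
edges = list(combinations(genes[:4], 2)) + list(combinations(genes[4:], 2)) + [("g3", "g4")]
scores = score_genes(adjacency_features(genes, edges), ["g0", "g1"], ["g5", "g6", "g7"])
```

Because each gene is described by its full network neighborhood, the classifier can learn which neighbors are informative for a given label; here the unlabeled gene `g2`, which sits in the same module as the positives, scores higher than `g4`. This is one simple way to frame supervised learning on a network, not the specific model configuration benchmarked in the paper.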
The ASD webserver contains a genome-wide ranking of human candidate genes associated with Autism Spectrum Disorder (ASD), predicted from known ASD-related genes and their functional relationships in a human brain-specific gene interaction network (from GIANT, described below). Using the ASD webserver, researchers can interactively access all autism gene predictions in the context of their relationships in the human brain-specific gene network, along with the results from subsequent analyses, including spatiotemporal brain signatures, functional modules, and prioritized copy-number variants (CNVs).
Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder.
Krishnan A*, Zhang R*, Yao V, Theesfeld CL, Wong AK, Tadych A, Volfovsky N, Packer A, Lash A, Troyanskaya OG
Nature Neuroscience (2016) 19:1454–1462.
The GIANT webserver contains data-driven, genome-scale human functional interaction networks among ~26,000 genes in more than 280 tissues and cell types. Using GIANT, researchers can (i) look up the tissue-specific interactions of one or more genes, (ii) compare a gene's functional interactions across tissues by selecting the relevant tissues in the dropdown menu, and (iii) reprioritize associations from a genome-wide association study (GWAS) with tissue-specific networks, using an approach named NetWAS, to identify additional candidate disease-associated genes.
GIANT 2.0: genome-scale integrated analysis of gene networks in tissues.
Wong AK, Krishnan A, Troyanskaya OG
Nucleic Acids Research (2018) 46:W65–W70.
Understanding multi-cellular function and disease with human tissue-specific gene interaction networks.
Greene CS*, Krishnan A*, Wong AK*, Ricciotti E, Zelaya R, Himmelstein D, Chasman D, Fitzgerald G, Dolinski K, Grosser T, Troyanskaya OG
Nature Genetics (2015) 47:569–576.