
WEBSERVERS / CODE / SOFTWARE
We work hard to make our research fully reproducible and extendable by other researchers by providing well-documented data and code for each project. We also strive to make our computational tools widely usable to biologists and biomedical scientists via reusable software and interactive webservers.
Txt2Onto is a Python utility for text-based tissue classification along with NLP-ML (natural-language-processing + machine learning) models trained to perform the tissue classification. The repo also contains demo scripts with extensive documentation. Given an input file where each line is a piece of text to be classified, the txt2onto utility will perform the necessary text preprocessing, create an embedding for each piece of text, and then run each embedding through our pre-trained tissue models.
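The three stages described above (preprocess, embed, score) can be sketched in a few lines. This is a toy illustration only, not the txt2onto API. The function names, the hashing-based embedding, and the model weights are all assumptions standing in for the real text embeddings and pre-trained tissue models.

```python
import math
import re
import zlib

DIM = 8  # toy embedding size (assumption; real embeddings are much larger)


def preprocess(text):
    """Lowercase, replace anything that is not a letter or space, and tokenize."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()


def embed(tokens):
    """Toy embedding (stand-in for a real one): hash tokens into DIM buckets and average."""
    vec = [0.0] * DIM
    for tok in tokens:
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def score(embedding, weights, bias):
    """Logistic score from a hypothetical per-tissue linear model."""
    z = bias + sum(w * x for w, x in zip(weights, embedding))
    return 1.0 / (1.0 + math.exp(-z))


def classify_lines(lines, models):
    """Return one {tissue: probability} dict per input line of text.

    models maps a tissue name to a (weights, bias) pair; both are made up here.
    """
    results = []
    for line in lines:
        emb = embed(preprocess(line))
        results.append({t: score(emb, w, b) for t, (w, b) in models.items()})
    return results
```

Each line of the input file would be passed through `classify_lines` with one entry in `models` per tissue, mirroring the one-text-per-line input the utility expects.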
Publication
-
Systematic tissue annotations of –omics samples by modeling unstructured metadata.
Hawkins NT, Maldaver M, Yannakopoulos A, Guare LA, Krishnan A
bioRxiv (2021) doi.org/10.1101/2021.05.10.443525.
PecanPy is a parallelized, efficient, and accelerated implementation of node2vec written in Python. Learning low-dimensional representations (embeddings) of nodes in large graphs is key to applying machine learning on massive biological networks. Node2vec is the most widely used method for node embedding. PecanPy is an ultrafast implementation of node2vec that uses cache-optimized compact graph data structures and precomputing/parallelization to produce high-quality node embeddings for biological networks of all sizes and densities.
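To make the node2vec idea concrete, here is a minimal sketch of the second-order biased random walk that node2vec uses to generate node sequences. It is not PecanPy's implementation or API. The dict-of-sets graph, the function names, and the unweighted-graph simplification are assumptions chosen for readability, and transition weights are computed on the fly rather than precomputed.

```python
import random


def biased_step(graph, prev, cur, p, q, rng):
    """Pick the next node from cur, given that the walk arrived from prev."""
    neighbors = sorted(graph[cur])
    weights = []
    for nxt in neighbors:
        if nxt == prev:
            weights.append(1.0 / p)  # step back to the previous node
        elif nxt in graph[prev]:
            weights.append(1.0)      # stay within the neighborhood of prev
        else:
            weights.append(1.0 / q)  # move outward, away from prev
    return rng.choices(neighbors, weights=weights, k=1)[0]


def node2vec_walk(graph, start, length, p=1.0, q=1.0, seed=None):
    """Generate one walk of up to `length` nodes on an unweighted graph.

    graph maps each node to the set of its neighbors (illustrative format).
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not graph[cur]:
            break  # dead end: no neighbors to move to
        if len(walk) == 1:
            nxt = rng.choice(sorted(graph[cur]))  # first step is unbiased
        else:
            nxt = biased_step(graph, walk[-2], cur, p, q, rng)
        walk.append(nxt)
    return walk
```

In node2vec, many such walks per node are then passed to a skip-gram model to learn the embeddings. A lower p favors backtracking, and a lower q favors exploring outward.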
Publication
-
PecanPy: a fast, efficient, and parallelized Python implementation of node2vec.
Liu R, Krishnan A
Bioinformatics (2021) doi.org/10.1093/bioinformatics/btab202.
The Expresto repository contains data and code to generate/reproduce the results in our work on imputing the expression of unmeasured genes in gene-expression profiles. This work introduces a new method called SampleLASSO, a sparse regression-based approach that accurately imputes unmeasured genes in samples from any platform and captures context-specific, biologically relevant information to guide imputation. The code includes a function that lets users apply SampleLASSO to fill in the unmeasured genes in their dataset of interest and get a report on which samples in the training data were the most helpful for imputation.
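The general idea of sparse regression over training samples can be sketched as follows. This is a toy illustration under stated assumptions, not the Expresto code. It assumes a formulation in which the training samples act as features: a LASSO model is fit on the genes measured in the target sample, and its sparse weights over training samples are reused to predict the unmeasured genes. The function names, the data layout, and the plain coordinate-descent solver are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO without intercept; rows of X are observations."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(m):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
    return w


def impute_sample(train_measured, train_unmeasured, target_measured, alpha=0.01):
    """Impute unmeasured genes of one target sample (assumed formulation).

    train_measured[g][s]:   measured gene g in training sample s
    train_unmeasured[g][s]: unmeasured gene g in training sample s
    target_measured[g]:     measured gene g in the target sample
    Returns (imputed values, training-sample indices ranked by |weight|).
    """
    w = lasso_fit(train_measured, target_measured, alpha)
    imputed = [sum(wi * v for wi, v in zip(w, row)) for row in train_unmeasured]
    ranked = sorted(((abs(wi), s) for s, wi in enumerate(w) if wi != 0.0), reverse=True)
    return imputed, [s for _, s in ranked]
```

The ranked list of training-sample indices plays the role of the report mentioned above: samples with the largest absolute weights contributed most to the imputed values.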
Publication
-
A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes.
Mancuso CA*, Canfield JL*, Singla D, Krishnan A
Nucleic Acids Research (2020) doi.org/10.1093/nar/gkaa881.
The GenePlexus repository contains data and code to generate/reproduce the results in our work on systematically benchmarking supervised learning for network-based gene classification across diverse prediction tasks (functions, diseases, and traits) and molecular networks using meaningful validation schemes and evaluation metrics. We have designed the code to enable easy addition of new methods, which can then be benchmarked along with the other methods using the same evaluation environment.
Publication
-
Supervised-learning is an accurate method for network-based gene classification.
Liu R*, Mancuso CA*, Yannakopoulos A, Johnson KA, Krishnan A
Bioinformatics (2020) doi.org/10.1093/bioinformatics/btaa150.
ASD
The ASD webserver contains a genome-wide ranking of human candidate genes associated with Autism Spectrum Disorder (ASD), predicted based on known ASD-related genes and their functional relationships in a human brain-specific gene interaction network (from GIANT; below). Using the ASD webserver, researchers can interactively access all autism gene predictions in the context of their relationships in the human brain-specific gene network, along with the results from subsequent analyses, including spatiotemporal brain signatures, functional modules and prioritized copy-number variants (CNVs).
Publication
-
Krishnan A*, Zhang R*, Yao V, Theesfeld CL, Wong AK, Tadych A, Volfovsky N, Packer A, Lash A, Troyanskaya OG
Nature Neuroscience (2016) 19:1454-1462.
The GIANT webserver contains data-driven human genome-scale functional interaction networks between ~26,000 genes in more than 280 tissues and cell types. Using GIANT, researchers can (i) look up the tissue-specific interactions of one or more genes, (ii) compare a gene's functional interactions in different tissues by selecting the relevant tissues in the dropdown menu, and (iii) reprioritize functional associations from a genome-wide association study (GWAS) with tissue-specific networks, using an approach named NetWAS, to identify additional candidate disease-associated genes.
Publications
-
GIANT 2.0: genome-scale integrated analysis of gene networks in tissues.
Wong AK, Krishnan A, Troyanskaya OG
Nucleic Acids Research (2018) 46:W65–W70.
-
Greene CS*, Krishnan A*, Wong AK*, Ricciotti E, Zelaya R, Himmelstein D, Chasman D, Fitzgerald G, Dolinski K, Grosser T, Troyanskaya OG
Nature Genetics (2015) 47:569-576.