Human diseases have prompted rapidly expanding research to understand their causes and find solutions. This research has associated many genes with these diseases. However, the underlying mechanisms often remain unknown or unclear. There is currently a bias toward well-studied genes and a tendency to ignore poorly annotated genes, which may be the missing pieces of the puzzle. By using the same pieces over and over again, we may never be able to see the complete picture of the disease under study. To identify new pieces and complete the puzzle, we have created this online tool. It associates unstudied genes with well-studied and disease-related genes by identifying which genes tend to work together. These associations can then be used to predict the function of unstudied genes, adding new pieces to the puzzle.
A one-minute presentation on the motivation and purpose of GeneFriends:
GeneFriends:RNAseq relies on a co-expression map that describes which genes tend to activate (increase in expression) and deactivate (decrease in expression) simultaneously across approximately 4000 human RNAseq samples, obtained from the Sequence Read Archive (SRA) database at NCBI. This gives a general impression of which genes tend to be activated together.
Since co-expressed genes tend to be involved in the same biological processes, this map can be used to:
- Assign putative functions to poorly annotated genes.
- Identify new target genes related to a disease or biological process using a guilt-by-association approach.
The results contain the following columns:
- User ID - List of IDs of the same type as the user input, associated with the seed list.
- Ensembl Gene ID - Ensembl gene IDs of the genes associated with the seed list, ranked by significance.
- Gene Symbol - Gene symbols corresponding to these Ensembl gene IDs.
- Mutual rank - Average rank of a pair of genes in each other's co-expressed gene lists. The calculation of gene co-expression is described below.
- p-value - p-value calculated from "Gene set Friends", the total number of genes in the seed list and "Total Friends", using the binomial cumulative distribution function.
- Gene set Friends - Number of times this gene is associated (friends) with a gene in the seed list, i.e. it is among the top 5% of that gene's co-expressed genes.
- Total Friends - Number of times this gene is associated/friends with any gene in the map.
- HGNC Annotation - The HGNC annotation this gene is associated with.
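The p-value column can be illustrated with a small sketch. It assumes, since the text does not spell this out, that the per-trial probability is "Total Friends" divided by the number of genes in the map, and that the reported value is the upper tail of the binomial distribution (the chance of seeing at least "Gene set Friends" hits among the seed genes). The function and parameter names are hypothetical and not taken from the GeneFriends code.

```python
from math import comb


def friend_p_value(gene_set_friends: int, seed_list_size: int,
                   total_friends: int, genes_in_map: int) -> float:
    """Upper-tail binomial probability P(X >= gene_set_friends).

    Assumption (not stated in the source): X ~ Binomial(n, p) with
    n = number of genes in the seed list and
    p = total_friends / genes_in_map.
    """
    n = seed_list_size
    p = total_friends / genes_in_map
    k = max(gene_set_friends, 0)
    # Sum the tail directly rather than computing 1 - CDF(k - 1),
    # which loses precision for very small p-values.
    # Suitable for modest seed list sizes only: comb(n, i) grows quickly.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# A gene that is a friend of 5% of all genes in a hypothetical map of
# 10,000 genes, tested against a seed list of 10 genes.
for hits in (1, 3, 5):
    print(hits, friend_p_value(hits, 10, 500, 10_000))
```

The more seed genes a candidate is friends with, relative to how many friends it has across the whole map, the smaller the resulting p-value.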
The full co-expression maps for human and mouse are available for download.
- Co-expressed genes - A full list of co-expressed genes and the co-expression values. Also includes genomic positions and HGNC annotation per gene.
- Transcription factors - Same as the list of co-expressed genes, but only including transcription factors.
- Seed list genes - Same as the list of co-expressed genes, but only including genes in the input/seed list.
- Inversely expressed genes - A list of genes that are negatively correlated with the input/seed list genes. In other words, genes that tend to be down-regulated when genes in the input list are up-regulated.
- DAVID Friend Annot - Functional enrichment of the genes that are co-expressed with the input/seed genes.
- DAVID Enemy Annot - Functional enrichment of the genes that are negatively correlated with the input/seed genes.
- BioLayout - A file that can be imported into BioLayout to visualize the network. BioLayout is freely available on the BioLayout website. The network is a 20*20*20 tree: the top 20 friends, the top 20 friends of each of those friends, and the top 20 friends of each of those. The top 10 friends are considered "good friends" and have a connection strength of 1. Friends ranking between 10 and 20 are considered "lesser friends" and have a connection strength of 0.5. If a gene is a friend of a friend, 0.25 is deducted from the connection strength. Use CTRL+W to hide nodes by number of edges. Use CTRL+ALT+W to hide nodes by edge strength to visualize the core network.
- Cytoscape - A file that can be imported into Cytoscape to visualize the network. Cytoscape is freely available on the Cytoscape website. The network is a 20*20*20 tree: the top 20 friends, the top 20 friends of each of those friends, and the top 20 friends of each of those. The top 10 friends are considered "good friends" and have a connection strength of 1. Friends ranking between 10 and 20 are considered "lesser friends" and have a connection strength of 0.5. If a gene is a friend of a friend, 0.25 is deducted from the connection strength.
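The connection strengths used in the BioLayout and Cytoscape files can be sketched as follows. This is only an illustration of the rules described above, under two assumptions: rank 10 still counts as a "good friend", and the 0.25 deduction is applied once to every edge beyond the first level of the tree. The names `connection_strength`, `build_edges` and `top_friends` are hypothetical.

```python
def connection_strength(rank: int, depth: int) -> float:
    """Edge weight for a friend at a given rank and tree depth.

    rank:  1-based position in the friend list (1..20).
    depth: 1 for friends of the starting gene, 2 or 3 for friends of friends.
    """
    if not 1 <= rank <= 20:
        raise ValueError("only the top 20 friends are included")
    strength = 1.0 if rank <= 10 else 0.5  # good friend vs. lesser friend
    if depth > 1:
        strength -= 0.25  # assumed: one deduction for any friend of a friend
    return strength


def build_edges(start, top_friends, max_depth=3, n_friends=20):
    """Expand the friend tree level by level and return (gene, friend, strength).

    top_friends maps a gene to its friends, ordered from strongest to weakest.
    """
    edges = []
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for gene in frontier:
            ranked = top_friends.get(gene, [])[:n_friends]
            for rank, friend in enumerate(ranked, start=1):
                edges.append((gene, friend, connection_strength(rank, depth)))
                next_frontier.append(friend)
        frontier = next_frontier
    return edges


# Toy input with made-up gene names.
toy = {"S": ["A", "B"], "A": ["C"]}
for edge in build_edges("S", toy):
    print(edge)
```

With the full top-20 lists, three levels give at most 20 + 400 + 8000 edges, which is where the 20*20*20 figure comes from.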
The GeneFriends tool employs a genome-wide co-expression map that describes which genes are related based on how often they are co-expressed. To construct this map, 4133 human and 2531 mouse RNAseq samples with at least 10,000,000 reads each were downloaded and analysed using STAR. Raw read counts were obtained using a custom script that performs a similar job to the freely available HTSeq tool (albeit more than 20-fold faster). Each sample was normalized by dividing the read counts for each gene/exon by the total read count in the sample (only reads that overlap features are included). The correlation between each gene pair was calculated using a weighted Pearson correlation. For each gene, all other genes were ranked based on their co-expression strength. The mutual rank was then calculated by adding the ranks of the two genes in each other's Pearson-correlation-ranked lists and dividing by two. The mouse co-expression map will be updated 21-03-2015 to include >4000 samples.
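The steps from normalization to mutual rank can be sketched on toy data. This is a minimal illustration, not the GeneFriends implementation: the per-sample weights are an assumed input (the text does not say how they are chosen, and uniform weights reduce to an ordinary Pearson correlation), and all function names and values are hypothetical.

```python
from math import sqrt


def normalize(counts):
    """Divide each gene's read count by the sample's total feature-overlapping reads."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with per-sample weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(var_x * var_y)


def rank_lists(expr, weights):
    """For each gene, rank all other genes by correlation (rank 1 = strongest)."""
    ranks = {}
    for gene, profile in expr.items():
        scored = [(weighted_pearson(profile, other_profile, weights), other)
                  for other, other_profile in expr.items() if other != gene]
        scored.sort(reverse=True)
        ranks[gene] = {other: r for r, (_, other) in enumerate(scored, start=1)}
    return ranks


def mutual_rank(ranks, a, b):
    """Average of the rank of b in a's list and the rank of a in b's list."""
    return (ranks[a][b] + ranks[b][a]) / 2


# One toy sample: raw counts become fractions of the sample total.
print(normalize({"A": 30, "B": 70}))

# Toy normalized expression of four genes across four samples.
expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8.5],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 4],
}
ranks = rank_lists(expr, weights=[1, 1, 1, 1])
print(mutual_rank(ranks, "A", "B"), mutual_rank(ranks, "B", "D"))
```

Because the two ranks are averaged, a pair only gets a low mutual rank when each gene is near the top of the other's list, so a gene that correlates with almost everything does not dominate every friend list.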