Research into human diseases has expanded rapidly in an effort to understand their causes and find solutions, and many genes have been associated with these diseases as a result. However, the underlying mechanisms often remain unknown or unclear. There is currently a bias toward well-studied genes and a tendency to ignore poorly annotated genes, which may be the missing pieces of the puzzle. By using the same pieces over and over again, we may never be able to see the complete picture of the disease under study. To identify new pieces and complete the puzzle, we have created this online tool. It associates unstudied genes with well-studied and disease-related genes by identifying which genes tend to work together. These associations can then be used to predict the function of unstudied genes, adding new pieces to the puzzle.
GeneFriends is a co-expression map that describes which genes tend to be activated (increase in expression) and deactivated (decrease in expression) simultaneously across the conditions described in a large collection of microarray datasets obtained from the GEO database.
Because these datasets cover a wide range of different conditions, the map gives a general impression of which genes tend to be activated simultaneously irrespective of the condition.
Since co-expressed genes tend to be involved in the same biological processes, this map can be used to:
- Assign putative functions to poorly annotated genes.
- Identify new target genes related to a disease or biological process using a guilt-by-association approach.
- Gene Symbol - Gene symbols of genes associated with the seed list ranked by significance.
- Entrez Gene ID - Entrez IDs corresponding to these gene symbols.
- P-value - P-value calculated from the binomial cumulative distribution function, based on "Gene set Friends", the total number of genes in the seed list and "Total Friends".
- Gene set Friends - Number of times this gene is associated/friends with a gene in the seed list (i.e. it is among the top 5% of genes co-expressed with that seed gene).
- Total Friends - Number of times this gene is associated/friends with any gene in the map.
- Chromosome - Chromosome on which this gene is located.
- Chrom Start - Start location of this gene on the chromosome.
- Chrom End - End location of this gene on the chromosome.
- GO Annotation - The GO:function or GO:Process this gene is associated with. Only one is listed to keep the interface clean and allow a quick assessment of the type of genes in the results.
- Remaining columns - Co-expression values for each gene in the seed list. A value of 0.60 indicates that the gene in this row is increased in expression (at least 2-fold) in 60% of the cases in which the gene in the header of this column is increased in expression (at least 2-fold).
- Co expressed genes - A full list of co-expressed genes and the co-expression values. Also includes genomic positions and GO annotation per gene.
- Transcription factors - Same as the list of co-expressed genes, but only including transcription factors.
- Seed list genes - Same as the list of co-expressed genes, but only including genes in the input/seed list.
- Enemy genes - A list of genes that are negatively correlated with the input/seed list genes, i.e. genes that tend to be down-regulated when genes in the input list are up-regulated.
- DAVID Friend Annot - Functional enrichment of the genes that are co-expressed with the input/seed genes.
- DAVID Enemy Annot - Functional enrichment of the genes that are negatively correlated with the input/seed genes.
- BioLayout - A file that can be imported into BioLayout to visualize the network. BioLayout is freely available on the BioLayout website. The network is a tree of 20*20*20: the top 20 friends, the top 20 friends of each of those friends, and the top 20 friends of each of those. The top 10 friends are considered "good friends" and have a connection strength of 1. Friends ranking between 10 and 20 are considered "lesser friends" and have a connection strength of 0.5. If a gene is a friend of a friend, 0.25 is deducted from the connection strength. Use CTRL+W to hide nodes by number of edges. Use CTRL+ALT+W to hide nodes by edge strength to visualize the core network.
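The P-value column listed above is described only as a binomial calculation over "Gene set Friends", the seed list size and "Total Friends". The sketch below shows one plausible reading of that description and is not necessarily the tool's exact computation. It assumes that each seed gene is an independent trial, that the chance of a gene being a friend of any one seed gene is "Total Friends" divided by the number of genes in the map, and that the reported value is the upper tail of the distribution. The function names and the `genes_in_map` parameter are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    # Sum the probability mass for k, k+1, ..., n successes.
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def friend_p_value(gene_set_friends: int, seed_list_size: int,
                   total_friends: int, genes_in_map: int) -> float:
    """Chance of seeing at least `gene_set_friends` friendships by chance.

    Assumption (not stated in the text): the background probability of a
    friendship with any single seed gene is total_friends / genes_in_map.
    """
    background = total_friends / genes_in_map
    return binomial_tail(gene_set_friends, seed_list_size, background)


if __name__ == "__main__":
    # Hypothetical numbers: a gene that is a friend of 8 out of 10 seed genes
    # while being a friend of 5% of all genes in a 20000-gene map.
    print(friend_p_value(8, 10, 1000, 20000))
```

Under these assumptions, a gene that is a friend of many seed genes but of few genes overall receives a small P-value, which matches the ranking by significance described for the Gene Symbol column.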
The GeneFriends tool employs a genome-wide co-expression map that describes which genes are related, based on how often they are co-expressed. To construct this map we used normalized microarray data from the GEO database. Entrez IDs present in at least 5% of the datasets were included in the co-expression map. These were paired to establish whether genes were co-regulated, with co-regulation defined as both genes increasing or decreasing in expression at least two-fold simultaneously, a standard (even if arbitrary) measure of differential expression. Then, for all gene pairs, we calculated a co-expression ratio based on how often the pair was co-regulated compared to how often the single genes showed a two-fold increase or decrease in expression. This ratio quantifies how strongly two genes are co-expressed.
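As a rough illustration of the co-expression ratio, the sketch below counts, for a pair of genes, the conditions in which a reference gene changes at least two-fold and the fraction of those in which the other gene changes in the same direction. It is a minimal sketch under stated assumptions, not the tool's actual implementation: it assumes expression changes are given as log2 fold changes per condition (so two-fold corresponds to an absolute value of 1), and the names `direction` and `co_expression_ratio` are hypothetical.

```python
from typing import Sequence

# Assumption: inputs are log2 fold changes, so a two-fold change is +/- 1.0.
FOLD_THRESHOLD = 1.0


def direction(log2_fold_change: float) -> int:
    """Return 1 for a >= two-fold increase, -1 for a decrease, 0 otherwise."""
    if log2_fold_change >= FOLD_THRESHOLD:
        return 1
    if log2_fold_change <= -FOLD_THRESHOLD:
        return -1
    return 0


def co_expression_ratio(gene: Sequence[float], reference: Sequence[float]) -> float:
    """Fraction of conditions where `reference` changes two-fold and `gene`
    changes two-fold in the same direction."""
    if len(gene) != len(reference):
        raise ValueError("both genes need one value per condition")
    changed = 0   # conditions where the reference gene changes two-fold
    together = 0  # of those, conditions where the other gene follows
    for gene_value, reference_value in zip(gene, reference):
        reference_direction = direction(reference_value)
        if reference_direction == 0:
            continue
        changed += 1
        if direction(gene_value) == reference_direction:
            together += 1
    return together / changed if changed else 0.0


if __name__ == "__main__":
    # Hypothetical log2 fold changes over five conditions.
    gene_a = [1.2, -1.1, 0.1, 0.5, -1.5]
    gene_b = [1.5, -2.0, 0.2, 1.0, 3.0]
    print(co_expression_ratio(gene_a, gene_b))  # gene_a relative to gene_b
    print(co_expression_ratio(gene_b, gene_a))  # gene_b relative to gene_a
```

The ratio in this sketch is not symmetric, because it is normalized by how often the reference gene changes. That is consistent with the results table reporting a separate co-expression value per seed gene column.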
Number of datasets included in different co-expression maps:
- Mouse: 3571 microarray datasets containing 20455 experimental conditions and 22760 genes
- Human: 4164 microarray datasets containing 26113 experimental conditions and 19080 genes
- Rat: 717 microarray datasets containing 4970 experimental conditions and 13960 genes
- Fruit fly: 230 microarray datasets containing 1045 experimental conditions and 12660 genes
- Yeast: 260 microarray datasets containing 1544 experimental conditions and 5880 genes