GEOexplorer

Summary information of the gene expression study is displayed below.

A table containing information for each of the experimental conditions used in the gene expression study is displayed below. Each experimental condition relates to a column in the gene expression dataset in the 'Gene Expression Dataset' tab.

A table containing the gene expression data is displayed below. Each column relates to an experimental condition, each row relates to a gene, and each value relates to a gene expression value for that gene under that experimental condition. The values are displayed post KNN imputation, count per million transformation and log transformation if selected.
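As an illustration of two of the transformations named above, the following Python sketch applies a counts per million transformation followed by a log2 transformation to one column of raw counts. The function names, the example counts and the pseudocount of 1 are assumptions made for illustration only; GEOexplorer performs these steps in R and may use different settings.

```python
import math

def counts_per_million(counts):
    """Scale the raw counts of one experimental condition so they sum to one million."""
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("library size is zero, cannot scale")
    return [count / library_size * 1_000_000 for count in counts]

def log2_transform(values, pseudocount=1.0):
    """Log2-transform values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(value + pseudocount) for value in values]

# Hypothetical raw counts for four genes in one experimental condition.
sample_counts = [10, 0, 90, 900]
cpm = counts_per_million(sample_counts)
log_cpm = log2_transform(cpm)

for count, scaled, logged in zip(sample_counts, cpm, log_cpm):
    print(f"count={count:>4}  cpm={scaled:>10.1f}  log2(cpm+1)={logged:.3f}")
```

Scaling each column by its own total is what makes columns with different sequencing depths comparable before the log transformation is applied.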

Download

Generated using R plotly. The plot below displays the distribution of the values of the genes in the dataset. This plot is useful for checking whether the data is normalized before performing differential expression analysis. If the density curves are similar from one experimental condition to the next, this indicates that the data is normalized and cross-comparable. The values are displayed post KNN imputation, count per million transformation and log transformation if selected.

Generated using R plotly. The plot below displays the distribution of the values of the genes in the dataset. The quartiles are calculated using the linear method. Viewing the distribution can be useful for determining if the data in the dataset is suitable for differential expression analysis. Generally, median-centred values are indicative that the data is normalized and cross-comparable. The values are displayed post KNN imputation, count per million transformation and log transformation if selected.

Generated using R prcomp and plotly. Principal component analysis (PCA) reduces the dimensionality of multivariate data to two dimensions that can be visualized graphically with minimal loss of information.
Eigenvalues correspond to the amount of the variation explained by each principal component (PC). The plot displays the eigenvalues against the number of dimensions. The values are displayed post KNN imputation, count per million transformation and log transformation if selected.
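To make the link between eigenvalues and explained variation concrete, the sketch below computes the eigenvalues of the covariance matrix of two variables and converts them to the share of variance explained by each principal component. It is a toy example restricted to two variables so that the eigenvalues have a closed form; the data and function names are hypothetical, and it is not the code GEOexplorer runs.

```python
import math
from statistics import mean

def covariance(x, y):
    """Sample covariance of two equally long lists."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    half_trace = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [half_trace + radius, half_trace - radius]

# Hypothetical expression values: two variables measured over four observations.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

eigenvalues = eigenvalues_2x2_symmetric(
    covariance(x, x), covariance(x, y), covariance(y, y)
)
total = sum(eigenvalues)
explained = [value / total for value in eigenvalues]

for number, share in enumerate(explained, start=1):
    print(f"PC{number}: {share:.1%} of the variance")
```

Because the two variables are almost perfectly correlated, nearly all of the variance falls on the first principal component, which is the pattern a steep scree plot shows.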

Generated using R prcomp and R plotly. Principal component analysis (PCA) reduces the dimensionality of multivariate data to two dimensions that can be visualized graphically with minimal loss of information.
Eigenvalues correspond to the amount of the variation explained by each principal component (PC). The plot displays the coordinates of each individual (row) in the gene expression dataset on the top two principal components (PC1 and PC2). The values are displayed post KNN imputation, count per million transformation and log transformation if selected.

Generated using R limma and plotly. The plot below is used to check the mean-variance relationship of the expression data, after fitting a linear model. It can help show if there is a lot of variation in the data. Each point represents a gene. The values are displayed post KNN imputation, count per million transformation and log transformation if selected.

Generated using R cor and heatmaply. The plot below compares the correlation values of the samples in a heatmap. The values are displayed post KNN imputation, count per million transformation and log transformation if selected.
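The correlation values behind such a heatmap can be sketched as follows. The example computes pairwise Pearson correlations between the columns of a small, made-up dataset; Pearson is the default method of R's cor function, but whether GEOexplorer keeps that default is an assumption here, and the sample names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(samples):
    """Return a nested dict holding the correlation of every pair of samples."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}

# Hypothetical log-expression values of four genes in three samples.
samples = {
    "control_1": [5.1, 7.2, 3.3, 9.0],
    "control_2": [5.0, 7.0, 3.5, 9.2],
    "treated_1": [9.1, 3.0, 7.4, 5.2],
}
matrix = correlation_matrix(samples)

for row_name, row in matrix.items():
    print(row_name, "  ".join(f"{value:+.2f}" for value in row.values()))
```

Samples from the same group would be expected to correlate more strongly with each other than with samples from a different group, which appears in the heatmap as blocks of similar colour.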

Generated using R prcomp and R plotly. Principal component analysis (PCA) reduces the dimensionality of multivariate data to two dimensions that can be visualized graphically with minimal loss of information.
Eigenvalues correspond to the amount of the variation explained by each principal component (PC). The plot displays the coordinates of each variable (column) in the gene expression dataset on the top two principal components (PC1 and PC2). The values are displayed post KNN imputation, count per million transformation and log transformation if selected.

Generated using R prcomp and R plotly. Principal component analysis (PCA) reduces the dimensionality of multivariate data to two dimensions that can be visualized graphically with minimal loss of information.
Eigenvalues correspond to the amount of the variation explained by each principal component (PC). The plot displays the coordinates of each variable (column) in the gene expression dataset on the top three principal components (PC1, PC2 and PC3). The values are displayed post KNN imputation, count per million transformation and log transformation if selected.

Generated using R umap and plotly. Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique useful for visualizing how genes are related to each other. The number of nearest neighbours used in the calculation is indicated in the graph. The values are displayed post KNN imputation, count per million transformation and log transformation if selected.

Input the k-nearest neighbors value to use:

A table containing information for each of the experimental conditions used in the gene expression study is displayed below. In the group column, select the experimental conditions you want to include in group 1, group 2 or N/A if you want the experimental condition excluded from differential gene expression analysis. During differential gene expression analysis, group 1 is compared against group 2.

Select the experimental conditions to include in Group 1.

Select the experimental conditions to include in Group 2.

The parameters for differential gene expression analysis are displayed below. Please select the appropriate parameters and click analyse to perform differential gene expression analysis.

Apply adjustment to the P-values:

Apply limma precision weights (vooma):

Yes

Force normalization:

Yes

Significance level cut-off:

Generated using R limma. The table below displays the top differentially expressed genes between the groups selected.

adj.P.Val is the P-value after adjustment for multiple testing. This column is generally recommended as the primary statistic by which to interpret results. Genes with the smallest P-values will be the most reliable.

P.Value is the raw P-value

t is the moderated t-statistic

B is the B-statistic, the log-odds that the gene is differentially expressed

logFC is the log2 fold change between two experimental conditions

F is the moderated F-statistic which combines the t-statistics for all the pair-wise comparisons into an overall test of significance for that gene
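To show how an adjusted P-value relates to a raw P-value, the sketch below implements the Benjamini-Hochberg false discovery rate procedure. Which adjustment is applied depends on the option selected in the app, so the choice of Benjamini-Hochberg, the function name and the example P-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw P-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)

for p, q in zip(raw, adjusted):
    print(f"P.Value={p:.3f}  adj.P.Val={q:.3f}")
```

An adjusted value is never smaller than the raw value it comes from, which is why fewer genes pass a given cut-off after adjustment.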

Download

Generated using R limma and plotly. Use this plot to view the distribution of the P-values in the analysis results. The P-value here is the same as in the Top differentially expressed genes table and is computed using all selected contrasts. While the displayed table is limited in size, this plot lets you see the 'big picture' by showing the P-value distribution for all analyzed genes.

Generated using R limma (vennDiagram). Displays the number of differentially expressed genes versus the number of non-differentially expressed genes.

Generated using R limma (qqt) and plotly. Plots the quantiles of a data sample against the theoretical quantiles of a Student's t distribution. This plot helps to assess the quality of the limma test results. Ideally the points should lie along a straight line, meaning that the values for moderated t-statistic computed during the test follow their theoretically predicted distribution.

Generated using R limma and plotly. The volcano plot displays statistical significance (-log10 P value) versus magnitude of change (log2 fold change) and is useful for visualizing differentially expressed genes. Highlighted genes are significantly differentially expressed at the selected adjusted p-value cutoff value.
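The coordinates of a volcano plot can be derived from the results table as in the sketch below. The column names follow the table described earlier (logFC, P.Value, adj.P.Val); the gene names, the values and the 0.05 cut-off are hypothetical, and the sketch assumes every P-value is greater than zero.

```python
import math

def volcano_points(results, cutoff=0.05):
    """Turn result rows into volcano plot points with a highlight flag."""
    points = []
    for row in results:
        points.append({
            "gene": row["gene"],
            "x": row["logFC"],                       # magnitude of change
            "y": -math.log10(row["P.Value"]),        # statistical significance
            "highlighted": row["adj.P.Val"] < cutoff,
        })
    return points

# Hypothetical rows from a differential expression results table.
results = [
    {"gene": "GENE_A", "logFC": 2.5, "P.Value": 0.001, "adj.P.Val": 0.004},
    {"gene": "GENE_B", "logFC": -0.2, "P.Value": 0.5, "adj.P.Val": 0.6},
]
points = volcano_points(results)

for point in points:
    print(point)
```

Taking the negative logarithm places the most significant genes at the top of the plot, so genes that are both strongly changed and significant appear in the upper corners.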

Generated using R limma and plotly. The mean difference (MD) plot displays log2 fold change versus average log2 expression values and is useful for visualizing differentially expressed genes. Highlighted genes are significantly differentially expressed at the selected adjusted p-value cutoff.
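For a single gene, the two coordinates of a mean difference plot can be sketched as below. The example assumes the expression values are already on a log2 scale and that the fold change is computed as group 1 minus group 2; the direction of the comparison, the function name and the values are assumptions for illustration.

```python
from statistics import mean

def md_point(group1_values, group2_values):
    """Return (average log2 expression, log2 fold change) for one gene."""
    # On the log2 scale a fold change is a difference of group means.
    log_fc = mean(group1_values) - mean(group2_values)
    average = mean(group1_values + group2_values)
    return average, log_fc

# Hypothetical log2 expression values of one gene in two groups.
average, log_fc = md_point([8.0, 8.2, 7.8], [6.0, 6.1, 5.9])

print(f"average log2 expression = {average:.2f}")
print(f"log2 fold change = {log_fc:.2f} (about {2 ** log_fc:.1f}-fold)")
```

A log2 fold change of 2 corresponds to a four-fold difference on the original scale, and a value of 0 means no difference between the groups.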

Generated using R limma and heatmaply. A heatmap displaying the expression values of the top differentially expressed genes for each experimental condition. The expression values are displayed post KNN imputation, count per million transformation, log transformation, normalization and limma precision weights if selected.

Input the number of genes to display:

Gene enrichment analysis is performed using Enrichr. Information on each of the databases is available from the Enrichr website via the link below. Enrichr

Select the column containing the gene symbols and input any missing gene symbols.

Select a database to use for gene enrichment analysis

Generated using R enrichR. The table below displays the gene sets identified from the genes, including several summary and statistical values.

Select if you want to view results for all differentially expressed genes, upregulated genes or downregulated genes

Download

Generated using R enrichR and plotly. The bar chart displays the gene sets along the y axis and the user-selected column along the x axis. The bars are ordered based on the user-selected column.

Select if you want to view results for all differentially expressed genes, upregulated genes or downregulated genes

Select the column to display:

The number of gene sets to display:

Sort the values ascendingly or descendingly:

Ascendingly

Descendingly
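The effect of the three controls above (column, number of gene sets and sort order) can be sketched as follows. The sketch assumes the gene sets are sorted by the selected column first and then truncated to the requested number; the actual order of operations in GEOexplorer, as well as the column names and values used here, are assumptions.

```python
def select_gene_sets(rows, column, number, descending=True):
    """Sort gene sets by the chosen column and keep the first `number` rows."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=descending)
    return ordered[:number]

# Hypothetical enrichment results with made-up column names and values.
rows = [
    {"Term": "pathway A", "Odds.Ratio": 3.2, "P.value": 0.002},
    {"Term": "pathway B", "Odds.Ratio": 8.9, "P.value": 0.030},
    {"Term": "pathway C", "Odds.Ratio": 1.4, "P.value": 0.400},
]

top_by_odds = select_gene_sets(rows, "Odds.Ratio", number=2, descending=True)
top_by_p = select_gene_sets(rows, "P.value", number=2, descending=False)

print([row["Term"] for row in top_by_odds])
print([row["Term"] for row in top_by_p])
```

For a column such as a P-value, ascending order places the most significant gene sets first, whereas for an odds ratio descending order does.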

Generated using R enrichR and plotly. The volcano plot displays statistical significance (-log10 P value) versus odds ratio and is useful for visualizing the statistically significant gene sets.

Select if you want to view results for all differentially expressed genes, upregulated genes or downregulated genes

Generated using R enrichR and plotly. The Manhattan plot displays the gene sets along the x axis and the user-selected column along the y axis. The points are ordered based on the user-selected column.

Select if you want to view results for all differentially expressed genes, upregulated genes or downregulated genes

Select the column to display:

OS	Version	Chrome	Firefox	Edge	Safari
Linux	Ubuntu 20.04.2	96.0.4664.53	94.0.2	Not tested	Not tested
MacOS	macOS 10.14.6	96.0.4664.53	94.0.2	Not tested	15.0
Windows	10 & 11	96.0.4664.53	94.0.2	96.0.1054.43	Not tested

Introduction

Operating System and Browser Compatibility

Additional Information

Contact Details

Reporting Problems or Bugs

Acknowledgements

Citation

Workflow

1. Introduction

2. Accessing the GEOexplorer Web Server

3. Alternatively Installing the GEOexplorer R package

4. Launching the GEOexplorer R package

5. Tutorial

a. GEOexplorer Structure

I. Home Tab

II. About Tab

III. Workflow Tab

IV. Tutorial Tab

V. GEO Search Tab

VI. Example Datasets Tab

b. Loading GEO Datasets into GEOexplorer

I. Searching for a GEO Dataset

II. Using a GEO Accession Code

c. Performing Exploratory Data Analysis

I. Checking if RNA-seq Datasets Contain Transformed Data

II. Reviewing the Results of Exploratory Data Analysis

d. Performing Differential Gene Expression Analysis

e. Performing Gene Enrichment Analysis

f. Transforming Datasets into the Format Needed by GEOexplorer

I. Identifying that a GEO Dataset Failed to Load into GEOexplorer

II. Downloading GEO datasets

g. Uploading a Dataset to GEOexplorer

6. Reporting Problems or Bugs

7. Acknowledgements

8. Session Information