How to run MAGeCK Plot on Latch

  1. Find MAGeCK Plot in your Workspace
    1. Find MAGeCK Plot in “All Workflows” and open the workflow
  2. Enter the parameters for MAGeCK Plot
    1. First add your Gene Ranking file. If you used MAGeCK Test to generate it, this will be the *.gene_summary.txt file from the test outputs.
    2. Then add your Count Table (see Required Parameters below for the expected format).
    3. Then fill out the Output Prefix and Output Location and click Launch Workflow.
  3. Within about 30 seconds your results will show up in the Data tab!

FYI

  • If you want to run multiple executions of this workflow, click the large plus button at the bottom of the parameters to add another execution.
  • Many of the optional parameters are tucked away under Hidden Parameters. Click that section if you would like to fine-tune your execution or use any of the advanced parameters.

Required Parameters

Gene Ranking File

  • The gene ranking file generated by MAGeCK Test. This will be the *.gene_summary.txt from the outputs.

Count Table

  • A tab-separated count table. Each line in the table should include the sgRNA name (1st column), the targeted gene (2nd column) and the read counts in each sample. If you used MAGeCK Count to generate your count table, it will be the *.count.txt file from the count outputs.
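
As a rough illustration of the layout described above, the following Python sketch checks that a count table is tab-separated, with an sgRNA name, a gene, and numeric read counts on every line. The function name check_count_table, the presence of a header row, and the sample data are assumptions made for this example; they are not part of MAGeCK or Latch.

```python
import csv
import io


def check_count_table(text):
    """Return (sample_names, n_sgrnas) for a tab-separated count table.

    Assumes a header row: sgRNA name, gene, then one column per sample.
    Raises ValueError when a row is malformed.
    """
    # Skip completely empty lines (for example a trailing newline).
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows or len(rows[0]) < 3:
        raise ValueError("expected sgRNA, gene and at least one sample column")
    header = rows[0]
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
        for value in row[2:]:
            try:
                count = float(value)  # accepts integer and decimal counts
            except ValueError:
                raise ValueError(f"line {line_no}: non-numeric count {value!r}")
            if count < 0:
                raise ValueError(f"line {line_no}: negative count {value!r}")
    return header[2:], len(rows) - 1


# Hypothetical example data, not taken from a real screen.
example = (
    "sgRNA\tGene\tday0\tday14\n"
    "s_1\tGENE_A\t120\t35\n"
    "s_2\tGENE_A\t98\t40\n"
    "s_3\tGENE_B\t210\t190\n"
)
samples, n_sgrnas = check_count_table(example)
print(samples, n_sgrnas)
```

Running a check like this before uploading can catch space-separated files or missing columns early, before the workflow is launched.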

Output Prefix

  • The prefix added to the names of all output files.

Output Location

  • The directory where the files produced by this subcommand will be placed. You can either select an existing path or type a new one into the field, in which case Latch will automatically create the folders in the data viewer.

Hidden Parameters

Plot Settings

Method for Normalization

  • By default MAGeCK will use Median normalization.
  • Options:
    • None: No normalization
    • Median: Median normalization, default
    • Total: Normalization by total read counts
    • Control: Normalization by control sgRNAs specified by the Control sgRNA option. The median factor used for normalization will be calculated based on control sgRNAs only, rather than all the sgRNAs
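
To make the four options more concrete, here is a simplified Python sketch of how per-sample scaling factors could be computed. It assumes that "Median" means a median-ratio approach (each sgRNA is compared with its geometric mean across samples) and that "Total" rescales every sample to the mean total read count. MAGeCK's own implementation may differ in its details, and the function name size_factors and the example counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts, method="median", control_ids=None):
    """Return one multiplicative scaling factor per sample.

    counts: dict mapping sgRNA ID -> list of raw counts (one per sample).
    method: "none", "median", "total" or "control".
    control_ids: sgRNA IDs to use when method == "control".
    """
    n_samples = len(next(iter(counts.values())))
    if method == "none":
        return [1.0] * n_samples
    if method == "total":
        totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
        mean_total = sum(totals) / n_samples
        return [mean_total / t for t in totals]
    if method == "control":
        # Same calculation as "median", restricted to control sgRNAs.
        rows = [counts[i] for i in control_ids]
    elif method == "median":
        rows = list(counts.values())
    else:
        raise ValueError(f"unknown method: {method}")

    ratios = [[] for _ in range(n_samples)]
    for row in rows:
        if min(row) <= 0:
            continue  # the geometric mean is undefined with zero counts
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_samples)
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    return [1.0 / median(r) for r in ratios]


# Hypothetical counts: sample 2 was sequenced twice as deeply for targeting
# guides, while the two control guides are unchanged.
counts = {
    "s_1": [100, 200],
    "s_2": [50, 100],
    "s_3": [10, 20],
    "ctrl_1": [30, 30],
    "ctrl_2": [40, 40],
}
print(size_factors(counts, "median"))
print(size_factors(counts, "control", control_ids=["ctrl_1", "ctrl_2"]))
```

In this toy data the "median" factors are driven by the targeting guides, whereas the "control" factors stay close to 1 because only the control sgRNAs are considered.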

Control sgRNAs

  • A list of control sgRNAs for normalization and for generating the null distribution of RRA. Alternatively Control Genes can be specified instead of this parameter. This option tells MAGeCK to use provided negative control sgRNAs to generate the null distribution when calculating the p values. By providing the corresponding sgRNA IDs in this parameter, MAGeCK will have a better estimation of p values.
  • When using this option, you will need to provide a plain text file containing only the negative control sgRNA IDs (one per line). For example:
NonTargetingControlGuideForHuman_0001
NonTargetingControlGuideForHuman_0002
NonTargetingControlGuideForHuman_0003
NonTargetingControlGuideForHuman_0004
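
The file format above is simple enough to check with a few lines of Python. The sketch below reads a control file (one ID per line) and reports any IDs that do not appear among the sgRNA names of a count table. The helper names read_control_ids and missing_ids, the file name, and the table IDs are all made up for illustration.

```python
import tempfile
from pathlib import Path


def read_control_ids(path):
    """Read one sgRNA ID per line, ignoring blank lines and extra whitespace."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def missing_ids(control_ids, count_table_ids):
    """Return control IDs that are absent from the count table, in file order."""
    known = set(count_table_ids)
    return [i for i in control_ids if i not in known]


# Write a small hypothetical control file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "control_sgrnas.txt"
    path.write_text(
        "NonTargetingControlGuideForHuman_0001\n"
        "NonTargetingControlGuideForHuman_0002\n"
        "\n"
    )
    control_ids = read_control_ids(path)

# Hypothetical sgRNA names taken from the first column of a count table.
table_ids = ["NonTargetingControlGuideForHuman_0001", "s_1", "s_2"]
print(missing_ids(control_ids, table_ids))
```

An empty result means every control ID matches an sgRNA name in the count table exactly.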

Control Genes

  • A list of genes whose sgRNAs are used as control sgRNAs for normalization and for generating the null distribution of RRA. Alternatively Control sgRNA can be specified instead of this parameter. There are several issues that you need to keep in mind:
    • You should have enough negative control guides (>100 recommended) for accurate p value estimation and normalization.
    • It is known that for growth-based screens, non-targeting controls may lead to high false positive rates (e.g., Morgens et al. 2017). Use non-targeting controls carefully.
  • By default MAGeCK will generate the null distribution of RRA scores by assuming all of the genes in the library are non-essential. This approach is sometimes over-conservative, and you can improve this if you know some genes are not essential.
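
The ">100 recommended" guideline can be checked before launching. The following Python sketch, with hypothetical helper names and data, collects the sgRNAs that target a list of control genes and tests whether there are more than 100 of them.

```python
def control_guides(sgrna_to_gene, control_genes):
    """Return the sgRNA IDs whose target gene is in control_genes."""
    wanted = set(control_genes)
    return [sgrna for sgrna, gene in sgrna_to_gene.items() if gene in wanted]


def has_enough_controls(guides, recommended_minimum=100):
    """True when there are more than recommended_minimum control guides."""
    return len(guides) > recommended_minimum


# Hypothetical mapping built from the first two columns of a count table.
sgrna_to_gene = {
    "s_1": "GENE_A",
    "s_2": "GENE_A",
    "s_3": "SAFE_1",
    "s_4": "SAFE_2",
}
guides = control_guides(sgrna_to_gene, ["SAFE_1", "SAFE_2"])
print(guides, has_enough_controls(guides))
```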

Specify Specific Genes To Be Plotted

  • A list of genes to be plotted.
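
One way to build such a list is to take the top-ranked genes from the Gene Ranking file. The Python sketch below assumes the gene summary is tab-separated with an "id" column and a "neg|rank" column; please confirm these column names against your own *.gene_summary.txt before relying on them. The function name top_genes and the example rows are hypothetical.

```python
import csv
import io


def top_genes(summary_text, n=3, rank_column="neg|rank"):
    """Return the IDs of the n best-ranked genes in a gene summary table.

    Assumes tab-separated text with an "id" column and an integer rank
    column (rank 1 is the strongest hit).
    """
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    rows = sorted(reader, key=lambda row: int(row[rank_column]))
    return [row["id"] for row in rows[:n]]


# Hypothetical, heavily trimmed gene summary.
summary = (
    "id\tnum\tneg|rank\tpos|rank\n"
    "GENE_A\t4\t2\t3\n"
    "GENE_B\t4\t1\t4\n"
    "GENE_C\t4\t4\t1\n"
    "GENE_D\t4\t3\t2\n"
)
print(top_genes(summary, n=2))
print(top_genes(summary, n=2, rank_column="pos|rank"))
```

Switching rank_column selects the top positively selected genes instead of the top negatively selected ones.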

Specify Specific Samples To Be Plotted

  • A list of samples to be plotted. By default MAGeCK uses all samples in the count table.

Outputs

PDF of Plots

  • Just a PDF of the plots. It isn’t very pretty though.

.R File

  • This file contains code that can be executed within the R software environment to plot the data from the count subcommand and create a PDF from it. This file can be used in a program such as RStudio.

plot_summary.Rnw

  • This file is called by the counts summary.R file and has the specific code for plotting the results.

Log File

  • This file contains all of the logs of the execution. It is mostly a bunch of techno gobbledygook, but you can open it to see any errors the execution might have encountered.

What is MAGeCK

Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK) is a computational tool to identify important genes from the recent genome-scale CRISPR-Cas9 knockout screens (or GeCKO) technology. MAGeCK can be used for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. MAGeCK identifies both positively and negatively selected genes simultaneously and reports robust results across different experimental conditions. MAGeCK is developed and maintained by Wei Li and Han Xu from Prof. Xiaole Shirley Liu’s lab at the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health. MAGeCK has been used to identify functional lncRNAs from screens with a close to 100% validation rate.