MAGeCK - Test - LatchBio

How to run MAGeK Test on Latch

Find MAGeCK Test in your Workspace
1. Find MAGeCK Test in “All Workflows” and open the workflow
Enter the parameters for MAGeCK Test
1. First add your Sample Labels, these labels should correspond to the labels in your count table. Ex. “L1”, “CTRL”
2. Then, specify which labels are treatment and which are control. This can either be done by listing each in Sample Types and must correspond to the Sample Labels given in the same order.
— or — You can specify the name of a single Sample Label (usually the day 0 or plasmid label) in the Specify Single Control Label parameter which will tell MAGeCK to treat all other samples as treatments.
1. Select the count table, if you used MAGeCK Count to generate your count table it will be *.count.txt file from the count outputs.
2. Then fill out the Output Prefix and set an Output Location and click Launch Workflow.
Within no time your results will show up in the Data tab!

FYI

If you want to run multiple executions of this workflow click the large plus button at the bottom of the parameters to add an additional execution of the test workflow.
We have hidden many of the optional parameters under Hidden Parameters, you can click that if you would like to fine tune your execution run or want to use any of the advanced parameters.

Required Parameters

Sample Labels

The labels of each sample in the count table. The names and number of labels listed must correspond to samples listed in the count table.

Which samples are treatment or control, this can be done in one of two ways:

Sample Types

A list that specifies whether each of the sample labels listed above are a treatment or a control sample. This list must correspond to the list of Sample Labels given.

— or —

Specify Single Control Label

The name of a single Sample Label (usually the day 0 or plasmid label) which will tell MAGeCK to treat all other samples as treatments.

Sample Inputs Are Paired

Enabling this tells MAGeCK to do paired sample comparisons.
- When enabled, the sample labels given must be listed paired, so treatment-1, control-1, treatment-2, control-2, etc…
For example: Sample Labels: HL60.final, HL60.intial, KBM7.final, KBM7.intial
Sample Types: Treatment, Control, Treatment, Control

Count Table

A tab-separated count table, each line in the table should include sgRNA name (1st column), targeting gene (2nd column) and read counts in each sample. If you used MAGeCK Count to generate your count table it will *.count.txt file from the count outputs.
The read count file should list the names of the sgRNA, the gene it is targeting, followed by the read counts in each sample. Each item should be separated by the tab (‘\t’). A header line is optional. For example in the studies of T. Wang et al. Science 2014, there are 4 CRISPR screening samples, and they are labeled as: HL60.initial, KBM7.initial, HL60.final, KBM7.final. Here are a few lines of the read count file:

sgRNA	Gene	HL60.initial	KBM7.initial	HL60.final	KBM7.final
A1CF_m52595977	A1CF	213	274	883	175
A1CF_m52596017	A1CF	294	412	1554	1891
A1CF_m52596056	A1CF	421	368	566	759
A1CF_m52603842	A1CF	274	243	314	855
A1CF_m52603847	A1CF	0	50	145	266

Output Prefix

The prefix appended to all of the outputted files.

Output Location

The directory where the files produced by this subcommand will be placed. A path can either be selected or if a new path is typed in field Latch will automatically create the folders in the data viewer.

Hidden Parameters

Quality Control & Trimming

Method for Normalization

By default MAGeCK will use Median normalization.
Options:
- None: No normalization
- Median: Median normalization – Default
- Total: Normalization by total read counts
- Control: Normalization by control sgRNAs specified by the Control sgRNA option. The median factor used for normalization will be calculated based on control sgRNAs only, rather than all the sgRNAs

Control sgRNAs

A list of control sgRNAs for normalization and for generating the null distribution of RRA. Alternatively Control Genes can be specified instead of this parameter. This option tells MAGeCK to use provided negative control sgRNAs to generate the null distribution when calculating the p values. By providing the corresponding sgRNA IDs in this parameter, MAGeCK will have a better estimation of p values.
When using this option, you will need to provide a plain text file just containing negative control sgRNA IDS (one per each line). For example,

NonTargetingControlGuideForHuman_0001
NonTargetingControlGuideForHuman_0002
NonTargetingControlGuideForHuman_0003
NonTargetingControlGuideForHuman_0004

Control Genes

A list of genes whose sgRNAs are used as control sgRNAs for normalization and for generating the null distribution of RRA. Alternatively Control sgRNA can be specified instead of this parameter. There are several issues that you need to keep in mind:
- You should have enough number of negative control guides (>100 recommended) for accurate p value estimation and normalization.
- It is known that for growth based screens, non-targeting controls may lead to high false positives (e.g., Morgens et al. 2017. Use non-targeting controls carefully.
By default MAGeCK will generate the null distribution of RRA scores by assuming all of the genes in the library are non-essential. This approach is sometimes over-conservative, and you can improve this if you know some genes are not essential.

p-Value Threshold For FDR Gene Test

The p value threshold to determine the alpha value of RRA in gene test. The default value is set as 0.25.

sgRNA-Level p-Value Adjustment Method

The method for sgrna-level p-value adjustment, by default MAGeCK will use False Discovery Rate (fdr). Holm’s Method (holm) and Pounds’s Method (pounds) can also be used.

Labels of Samples for Estimating Variance

Sample label or sample index for estimating variances. The label names given here must match the name given in Sample Labels.

Remove sgRNAs Whose Mean Value is Zero In

Specify where sgRNAs whose mean value is zero should be removed, this can be either remove none, only the treatment, only the control, both the treatment and control, or any sgRNA where the mean value is zero.
MAGeCK defaults to removing from both the treatment and control sgRNA.

Remove Zeros Threshold

The sgRNA normalized count threshold to be considered removed in the Remove sgRNAs Whose Mean Value Is Zero option. By default this threshold is 0.

Log2 Fold Changes (LFC) Calculation Method

Method to calculate gene log2 fold changes (LFC) from sgRNA LFCs. Available methods include the median/mean of all sgRNAs (median/mean), or the median/mean sgRNAs that are ranked in front of the alpha cutoff in RRA (alphamedian/alphamean), or the sgRNA that has the second strongest LFC (secondbest). In the alphamedian/alphamean case, the number of sgRNAs correspond to the “goodsgrna” column in the output, and the gene LFC will be set to 0 if no sgRNA is in front of the alpha cutoff. The default value is median.

Copy Number Variation (CNV) Matrix for Normalization

A matrix of copy number variation data across cell lines to normalize CNV-biased sgRNA scores prior to gene ranking.

Cell Line Name For Copy Number Variation (CNV) Normalization

The name of the cell line to be used for copy number variation normalization. Must match one of the column names in the file provided for Copy Number Variation (CNV) Matrix.

BED File with Gene Positions For Copy Number Variation (CNV) Estimation

Estimate CNV profiles from screening data. A BED file with gene positions are required as input. The CNVs of these genes are to be estimated and used for copy number bias correction.

Output Settings

Sort Criteria for Output Summaries

Tells MAGeCK to sort summaries either by negative selection (neg) or positive selection (pos). By default MAGeCK will sort by negative selection.

Generate PDF Report of Analysis

Enabling this might do something, not sure…

Generate File With Normalized Read Counts

Save unmapped reads to file, with sgRNA lengths specified by sgRNA Length parameter which might be combine with this one or be a separate parameter, I’m not sure yet…

Keep Intermediate Files

This will have MAGeCK keep the _.gene.high.txt, _.gene.low.txt, phigh.txt and plow.txt which are used in generating the gene/sgrna ranking and normally deleted at the end of the execution by MAGeCK.

Outputs

gene_summary.txt

This file is a table with the p-values and FDRs of all genes computed by MAGeCK. You can learn more about the format of it at the MAGeCK Wiki.

sgrna_summary.txt

Results by guide. You can learn more about the format of it at the MAGeCK Wiki.

.R File

This file contains code that can be executed within the R software environment to plot the data from the count subcommand and create a PDF from it. This file can be used in a program such as RStudio.

summary.Rnw

This file is called by the counts summary.R file and has the specific code for plotting the results.

Log File

This file contains all of the logs of the execution. This file is mostly a bunch of techno gobbledygook but you can view it to view any errors the execution might have encountered.

Temporary Files

gene.high.txt

The gene ranking results (positively selected genes). Used in the RRA analysis.

gene.low.txt

The gene ranking results (negatively selected genes). Used in the RRA analysis.

phigh.txt

An sgrna ranking file created and used by MAGeCK for RRA analysis.

plow.txt

An sgrna ranking file created and used by MAGeCK for RRA analysis.

What is MAGeCK

Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK is a computational tool to identify important genes from the recent genome-scale CRISPR-Cas9 knockout screens (or GeCKO) technology. MAGeCK can be used for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. MAGeCK identifies both positively and negatively selected genes simultaneously and reports robust results across different experimental conditions. MAGeCK is developed and maintained by Wei Li and Han Xu from Prof. Xiaole Shirley Liu’s lab at the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health. MAGeCK has been used to identify functional lncRNAs from screens with close to 100% validation rate.

Workflows & SDK

Developer Tools

Workflow User Interface

Testing

Latch Console Dashboard

Ready-to-Use Workflows

​How to run MAGeK Test on Latch

​FYI

​Required Parameters

​Sample Labels

​Sample Types

​Specify Single Control Label

​Sample Inputs Are Paired

​Count Table

​Output Prefix

​Output Location

​Hidden Parameters

​Quality Control & Trimming

​Method for Normalization

​Control sgRNAs

​Control Genes

​p-Value Threshold For FDR Gene Test

​sgRNA-Level p-Value Adjustment Method

​Labels of Samples for Estimating Variance

​Remove sgRNAs Whose Mean Value is Zero In

​Remove Zeros Threshold

​Log2 Fold Changes (LFC) Calculation Method

​Copy Number Variation (CNV) Matrix for Normalization

​Cell Line Name For Copy Number Variation (CNV) Normalization

​BED File with Gene Positions For Copy Number Variation (CNV) Estimation

​Output Settings

​Sort Criteria for Output Summaries

​Generate PDF Report of Analysis

​Generate File With Normalized Read Counts

​Keep Intermediate Files

​Outputs

​gene_summary.txt

​sgrna_summary.txt

​.R File

​summary.Rnw

​Log File

​Temporary Files

​gene.high.txt

​gene.low.txt

​phigh.txt

​plow.txt

​What is MAGeCK