Using single cell RNA-seq, single cell CITE-seq, and single cell LIBRA-seq
This repository contains a pipeline that can be used to fully replicate all analysis and figures associated with the manuscript [TODO add manuscript name].
In order to fully replicate analysis download the following docker images from dockerhub:
The entire analysis is contained within a Snakemake pipeline. It starts from fastq files and generates a final directory containing all final figures, numbered to match the figures in the manuscript.
This pipeline was written to interact with an LSF scheduler. Feel free to reach out if you need help modifying it to run on your own server.
The majority of the packages required to run this are in docker images that are publicly available:
To load these packages as docker images:
Other required packages include
- cellranger v7.1.0 from 10x Genomics
- singularity, apptainer, or docker
- snakemake v6.0.3
- Download and install miniconda3: For Linux
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
- Install Snakemake:
conda install snakemake=6.3.0 -c bioconda -c conda-forge
If you are using docker, you can download the images directly from dockerhub
[TODO] - Test how downloading works.
[TODO] - Test making sif from docker hub directly
Once everything is installed, you will need to update the config file to make sure all paths are correctly configured for your environment.
- RAW_DATA: Path to the directory containing fastq files
- SAMPLES: the list of samples you want to test. This is the name that will be in the output files. The order must be identical to the order of RNA_SAMPLES, ADT_SAMPLES, and VDJ_SAMPLES
- AGGR_SAMPLES: Which samples should be aggregated together at the end of the pipeline. These should be samples that were split between multiple 10x runs. This was left blank for our experiment.
- RNA_SAMPLES: The name of the RNA samples. Not the full fastq name, but the name that is the same for all samples. Likely ends in "GEX". This will likely need to be updated depending on how you downloaded files from GEO.
- ADT_SAMPLES: optional The name of the samples from CITE-seq or hashtagging. If CITE-seq and hashtagging files are separate, include them both separated by a comma. Put samples in the same order as their RNA counterparts. If CITE-seq or hashtagging were not performed, leave this blank.
- VDJ_SAMPLES: optional The name of the samples from VDJ-seq. Put samples in the same order as their RNA counterparts. If VDJ-sequencing was not performed, leave this blank.
- RESULTS: Path to the output directory. This will be created if it doesn't already exist
- GENOME: Path to the cellranger genome reference. We used refdata-gex-GRCh38-2020-A downloaded from 10x genomics
- ADT_REF: optional Path to the ADT-reference. Should be a comma separated file with the columns described in the 10x tutorial: id, name, read, pattern, sequence, and feature_type. The feature_type will be Antibody Capture. The name will be the name in the final output matrix. Leave this blank if CITE-seq or hashtagging were not performed. This can be found here
- VDJ_REF: Path to the cellranger VDJ reference. If VDJ sequencing was not performed, leave this blank.
- MAX_JOBS: The maximum number of jobs that can be submitted by cell ranger at a time
- LSF_TEMPLATE: Path to an LSF template. One is included in this git repo.
- CHEMISTRY: optional Arguments to the --chemistry flag in cellranger count. If left blank, chemistry will be auto. Only use this if the pipeline failed because of a failure to detect the chemistry. Can be filled in for only some samples.
- SCRIPT_PATH: The path to the scripts. Does not need to be updated if you downloaded this directory and haven't modified the paths.
- SCRIPTS_RUN: Name of one-off scripts that need to be run
- SAMPLE_SCRIPTS: A dictionary containing 1. What samples to run the scripts on. Either should match the sample names in SAMPLES or "all" for all samples. 2. What scripts should be run. These scripts will be found in src/scripts/individual_analysis and will be run on all samples. Any parameters that are sample-specific will be found in the sample info file
- MERGE_SCRIPTS: A dictionary containing 1. What samples to merge, if set to "all", all samples will be merged. 2. What scripts should be run. These will be found in src/scripts/integrated_analysis. Any parameters such as the batch correction will be found in the sample info file
- SAMPLE_METADATA: Any metadata to add to the sample objects. This file is found here
- SAMPLE_INFO: Path to the file containing sample info. If you are running analysis from scratch, you can fill this out as you get outputs to help make decisions. This file is here
- RSRIPT_CONTAINER: Path to docker or singularity rscript file. See above for access.
- DROPKICK_CONTAINER: Path to docker or singularity dropkick file. See above for access.
- SCAR_CONTAINER: Path to docker or singularity scar file. See above for access.
- T1K_CONTAINER: Path to docker or singularity T1K file. See above for access.
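Because the sample lists must line up position by position, a quick check before launching the pipeline can catch a misordered or incomplete config. The following is a minimal sketch, not part of the repository: it assumes the config has already been loaded into a Python dictionary and that each sample entry is a list (or blank). The function name `check_sample_lists` is hypothetical; only the key names come from the list above.

```python
def _as_list(value):
    """Treat a blank config entry (None or "") as an empty list."""
    if value is None or value == "":
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def check_sample_lists(config):
    """Return a list of problems; an empty list means the sample lists line up.

    Hypothetical helper: only lengths are compared, because the order itself
    cannot be verified without knowing the naming scheme of the fastq files.
    """
    problems = []
    samples = _as_list(config.get("SAMPLES"))
    rna = _as_list(config.get("RNA_SAMPLES"))
    if not samples:
        problems.append("SAMPLES is empty")
    if len(rna) != len(samples):
        problems.append(
            "RNA_SAMPLES has %d entries but SAMPLES has %d" % (len(rna), len(samples))
        )
    # ADT_SAMPLES and VDJ_SAMPLES are optional, so blank entries are allowed.
    for key in ("ADT_SAMPLES", "VDJ_SAMPLES"):
        values = _as_list(config.get(key))
        if values and len(values) != len(samples):
            problems.append(
                "%s has %d entries but SAMPLES has %d" % (key, len(values), len(samples))
            )
    return problems


if __name__ == "__main__":
    example = {
        "SAMPLES": ["sample_1", "sample_2"],
        "RNA_SAMPLES": ["sample_1_GEX", "sample_2_GEX"],
        "ADT_SAMPLES": "",
        "VDJ_SAMPLES": ["sample_1_VDJ"],
    }
    for problem in check_sample_lists(example):
        print(problem)
```

The sample names in the example are made up. A length check like this does not prove the order is right, so it complements rather than replaces reading the config by eye.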
The script to submit the job is here. You will need to update this command to work with your cluster or system.
This pipeline will run
- Cellranger data preprocessing
- Dropkick to determine cutoffs for cells
- Scar to remove ambient protein reads
- Immcantation VDJ clone calling
- T1K for HLA identification
- R analysis scripts to recreate analysis and figures (described below)
This runs through snakemake and will process the whole pipeline for you.
I highly recommend looking at the csv files that are generated and passed to cell ranger to ensure that the correct fastq files have been detected for each sample.
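One way to do this check is to count the fastq files that match each row of a CSV. The sketch below assumes the CSVs follow the standard cellranger libraries layout (columns `fastqs`, `sample`, `library_type`) and that fastq files follow the usual `{sample}_S*_L*_R1_*.fastq.gz` naming. Both are assumptions about this pipeline's output, and `count_matching_fastqs` is a hypothetical helper.

```python
import csv
import glob
import io
import os


def count_matching_fastqs(csv_text):
    """Count R1 fastq files found for each (sample, library_type) row.

    Assumes a cellranger-style libraries CSV with the columns
    fastqs, sample, and library_type.
    """
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed Illumina-style file naming; adjust if your files differ.
        pattern = os.path.join(row["fastqs"], row["sample"] + "_S*_L*_R1_*.fastq.gz")
        counts[(row["sample"], row["library_type"])] = len(glob.glob(pattern))
    return counts


def report(csv_path):
    """Print one line per library so missing fastq files stand out."""
    with open(csv_path) as handle:
        counts = count_matching_fastqs(handle.read())
    for (sample, library_type), n_files in sorted(counts.items()):
        status = "ok" if n_files > 0 else "NO FASTQ FILES FOUND"
        print("%s\t%s\t%d\t%s" % (sample, library_type, n_files, status))
```

A row that reports zero files usually points to a sample name in the config that does not match the fastq prefix.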
Once you've run T1K, map to HLA types here
R scripts for the analysis are found in src/scripts. There are two directories here. One called individual_analysis includes scripts to process an individual sample. integrated_analysis contains scripts to run analysis on multiple samples.
All packages needed to run these scripts are in the smith_r_docker described above.
Scripts within each directory are numbered based on the order in which they should be run.
They include:
- Steps for initial processing and filtering of the data
- Steps for doublet removal
- Steps for normalizing ADTs (if they are present in your data)
- Steps for dimensional reduction with PCA and UMAP either on RNA alone or with ADTs included
- Steps to name clusters based on references with clustifyr
- Steps for marker detection
- Steps for gene ontology and pathway analysis
- Steps for integrating multiple samples
- Steps for differential expression when multiple samples are present
For many of these steps, I've included different methods of performing them (for example batch correction can be done with harmony or fastMNN) and ways of comparing the different approaches so you can make an informed decision about what method is best for your samples.
For all scripts in this analysis, please make sure you understand the processing steps so you know which parameters are appropriate for each step. These scripts are not intended to be run blindly with no understanding of the underlying algorithms.
- 01_dropkick.py - Runs dropkick on each sample individually to identify cells to be filtered.
- 01b_run_scar.py - Runs scar on each sample individually to estimate and remove background contamination based on ambient droplets
- 02_Initial_processing.R - Sets up seurat objects for each sample individually. This runs the following steps.
  - Creates a seurat object that contains ADT, HTO, and RNA data
  - Determines the percentage of reads that map to the mitochondria.
  - Runs scuttle to estimate cutoffs based on quality metrics and compares this to the output of dropkick. This comparison is output as a heatmap and barplots saved in results/R_analysis/{sample_name}/images/dropkick_vs_cellqc.pdf
  - Subsets the seurat object based on the scuttle cutoffs
  - Reads in VDJ data using djvdj
  - Processes tetramers
    - Pulls the tetramers out of the ADT slot
    - Normalizes the tetramers and ADT using the CLR method
    - Identifies the libra score from the raw tetramer data and adds the score as an assay

          clr_matrix <- log2((all_tetramers + 1) / rowMeans(all_tetramers))
          libra_score <- scale(clr_matrix, center = TRUE, scale = TRUE)

  - Reads in the output from scar
    - Pulls out the tetramers
    - Normalizes the tetramers and ADTs with the CLR method
    - Identifies the libra score from the scar corrected tetramer data and adds the score as an assay

          clr_matrix <- log2((all_tetramers + 1) / rowMeans(all_tetramers))
          libra_score <- scale(clr_matrix, center = TRUE, scale = TRUE)

  - Runs an updated form of HTOdemux on the tetramers (both raw and scar corrected)
    - HTOdemux was updated because the previous form would take the highest counts for any feature even if it was not the feature with the highest value above the determined cutoff. Because of this, sometimes the actual feature that was higher than the cutoff was not returned while features that were below the cutoff were returned. The updated function also returns a list of all features that were above the cutoff in the full_hash_id column.
  - Identifies positive values for each tetramer based on a libra cutoff of 1 for both the raw and scar corrected libra scores.
  - Runs SCT normalization on the RNA assay
- 03_remove_doublets.R - Identifies and tags doublets using DoubletFinder. Images related to this can be found in results/R_analysis/{sample_name}/images/dropkick_vs_cellqc.pdf.
- 04_adt_dsb_normalization.R - Uses dsb to identify and remove background. This is similar to scar and was only used to compare to scar output.
- 05_PCA.R - Runs PCA using methods from Seurat. Makes the output results/R_analysis/{sample_name}/images/RNA_pca.pdf to help determine the number of pcs to use for downstream processing.
- 06a_UMAP_find_resolution.R - Uses the number of pcs found above (specified in the file files/sample_info.tsv) and uses clustree to visualize many clustering resolutions. The clustree and umaps of each resolution will be in a file results/R_analysis/{sample_name}/images/clustering_resolution.pdf to guide resolution selection for downstream analysis.
- 06b_UMAP.R - Taking the number of pcs and resolution determined in the previous steps (specified in files/sample_info.tsv), generates the final umap and clustering for the object. The final images can be found in results/R_analysis/{sample_name}/images/RNA_pca.pdf
- 07_name_clusters.R - Uses two references to name clusters: the seurat reference, which can be downloaded here, and a BND reference previously published by Mia Smith. I name each cluster on individual samples to provide more confidence in the integration of samples.
- 08_find_markers.R - Finds markers for each of the clusters and cell types using a one-vs-all approach with FindAllMarkers in Seurat. This outputs a file of markers that should be used to double check cell type determination.
- 09_demultiplex_tets.R - More attempts to identify positive tetramers. Here, UMAPs are generated on the ADT_CLR, TET_CLR, ADT_DSB and the TET_DSB assays and clusters are found in the same. These clusters are compared to the tetramer determination at other steps. Nothing from this script was used in the final manuscript.
- 10_compare_dsb_clr.R - A script comparing the output of the dsb and clr normalization approaches. Nothing from this script was used in the final manuscript
- 11_remove_ambience.R - A script that runs removeAmbience from DropletUtils to attempt to remove ambient contamination. This assay was used for all steps and compared to the non-ambient removal downstream, but was not used in the final manuscript because ambience removal from a population of only B cells did not make much difference.
- 12_improve_cutoff.R - Steps to try a tetramer identification that uses the non-b cells in our population to determine a cutoff. The 95th percentile of the non-b cells was used as the cutoff. A proportion above the cutoff was then calculated for each cell (and made into an assay) and tetramer calls were based on any tetramers with scores above 1. This scoring system ended up being used in the final manuscript.
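The cutoff idea behind 12_improve_cutoff.R can be illustrated with a small pure-Python sketch. This is not the code from the repository. It assumes that the "proportion above the cutoff" is the cell's value divided by the per-tetramer cutoff, and that the cutoff is the 95th percentile of the non-b cells computed with linear interpolation. The function names and the toy numbers are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 1])."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


def call_by_background_cutoff(scores, is_b_cell, q=0.95):
    """Score every cell against a cutoff drawn from the non-b cells.

    scores: {tetramer name: [value per cell]}
    is_b_cell: [True/False per cell], same cell order as the score lists.
    Returns (proportions, calls), where calls[i] lists the tetramers whose
    proportion is above 1 for cell i.
    """
    calls = [[] for _ in is_b_cell]
    proportions = {}
    for tetramer, values in scores.items():
        background = [v for v, b in zip(values, is_b_cell) if not b]
        cutoff = percentile(background, q)  # assumes the cutoff is above zero
        proportions[tetramer] = [v / cutoff for v in values]
        for cell_index, proportion in enumerate(proportions[tetramer]):
            if proportion > 1:
                calls[cell_index].append(tetramer)
    return proportions, calls


if __name__ == "__main__":
    toy_scores = {"TET_A": [10, 1, 2, 3], "TET_B": [1, 1, 2, 3]}
    toy_is_b_cell = [True, False, False, False]
    _, toy_calls = call_by_background_cutoff(toy_scores, toy_is_b_cell)
    print(toy_calls)
```

Because the cutoff is a percentile of the background cells themselves, a small fraction of non-b cells will always score above 1 by construction.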
NOTE on tetramer labeling: while libra, HTODemux, scar, and raw data were all used, only the libra score on scar corrected values and the t-cell/myeloid cell cutoff on the scar corrected values were used to identify tetramer names. The others were only used to test methods and compare results.
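For readers more comfortable with Python than R, the two-line libra score calculation from 02_Initial_processing.R can be sketched as follows. This is a translation under stated assumptions, not the repository's code: it assumes the matrix has tetramers as rows and cells as columns, so that `rowMeans` gives one mean per tetramer and R's `scale` standardizes each cell (column) using the sample standard deviation. The function name `libra_scores` is hypothetical.

```python
import math
import statistics


def libra_scores(counts):
    """Translate the R libra score calculation to plain Python.

    counts: list of rows, one row per tetramer, one column per cell.
    Assumes every tetramer has a non-zero mean and every cell has at
    least two tetramers with differing values.
    """
    # log2((counts + 1) / rowMeans(counts)): each row is divided by its own mean.
    row_means = [statistics.mean(row) for row in counts]
    clr = [
        [math.log2((value + 1) / mean) for value in row]
        for row, mean in zip(counts, row_means)
    ]
    # scale(..., center = TRUE, scale = TRUE) standardizes each column.
    n_rows, n_cols = len(clr), len(clr[0])
    scored = [[0.0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        column = [clr[row][col] for row in range(n_rows)]
        center = statistics.mean(column)
        spread = statistics.stdev(column)  # sample sd, as used by R's scale()
        for row in range(n_rows):
            scored[row][col] = (column[row] - center) / spread
    return scored


def above_cutoff(scored, tetramer_names, cell_index, cutoff=1.0):
    """List the tetramers whose libra score exceeds the cutoff for one cell."""
    return [
        name
        for name, row in zip(tetramer_names, scored)
        if row[cell_index] > cutoff
    ]
```

The `above_cutoff` helper mirrors the libra cutoff of 1 mentioned for the positive calls; how calls are then grouped into multi-reactive labels is not covered here.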
[TODO]
Below is a detailed explanation of all parts of the seurat object
- RNA - Raw and log normalized RNA-seq reads
- ADT - Raw and CLR normalized values for all ADTs
- TET - Raw and CLR normalized values for all tetramers
- TET_LIBRA - libra score on all raw tetramer values
- SCAR_ADT - scar corrected and CLR normalized scar corrected values for all ADTs
- SCAR_ADT_LOG - scar corrected and log normalized (with log1p) scar corrected values for all ADTs
- SCAR_TET - scar corrected and CLR normalized scar corrected values for all tetramers
- SCAR_TET_LOG - scar corrected and log normalized (with log1p) scar corrected values for all tetramers
- SCAR_TET_LIBRA - libra score on all scar corrected tetramer values
- DSB_ADT - ADTs with background corrected with dsb
- SCT - SCT normalized data
- TET_PROPORTIONS - The proportions above the cutoff based on the modified HTOdemux run on the raw tetramer data.
- NEW_TET_PROPORTIONS - The proportion above the cutoff based on the 95th percentile of the non-b cells included in the assay. The scores were determined based on the scar corrected tetramer data. This was added in 12_improve_cutoff.R
These columns are all in the metadata available on GEO
- Seurat default columns
  - orig.ident - Sample name
  - nCount_RNA - Number of RNA molecules in the cell
  - nFeature_RNA - Number of genes in the cell
  - nCount_ADT - Number of ADT molecules in the cell
  - nFeature_ADT - Number of different ADTs in the cell
  - percent.mt - Percent of reads mapping to the mitochondria
- Meta data columns
  - Sex - Sex
  - Collection.Date - Date the blood was collected
  - Age.at.Collection..years. - Age of patient at date of collection in years
  - Age.at.Collection..Months. - Age of patient at date of collection in months
  - Status - Disease status (ND = non-diabetic, T1D = type 1 diabetes, AAB = autoantibody positive)
  - Date.of.Diagnosis - Date of T1D diagnosis
  - Days.post.onset - Number of days between diagnosis and collection
  - millions.of.cells.frozen - Number of cells collected
  - HLA - HLA type of the individual
  - Autoantibodies - What autoantibodies were detected
  - date.processed.for.scSeq - The date cells were thawed and prepped for scRNA-seq
  - Notes..FDR.relationship. - For first degree relative, how they are related
  - sample - Sample ID
  - Ethnicity - The ethnicity of the individual
- Cell cycle columns
  - Phase - Final cell cycle phase determination as determined by Seurat's CellCycleScoring function
- VDJ columns (Added by djvdj)
  - chains - What chains are present in the cell, each chain is separated by a semicolon
  - n_chains - How many chains are in the cell
  - cdr3 - The amino acid sequence of the CDR3 region(s). Each chain is separated by a semicolon
  - cdr3_nt - The nucleotide sequence of the CDR3 region(s). Each chain is separated by a semicolon
  - cdr3_length - The CDR3 amino acid length(s). Each chain is separated by a semicolon
  - cdr3_nt_length - The CDR3 nucleotide length(s). Each chain is separated by a semicolon
  - v_gene - The V gene(s) identified by cellranger. Each chain is separated by a semicolon
  - d_gene - The D gene(s) identified by cellranger. Each chain is separated by a semicolon
  - j_gene - The J gene(s) identified by cellranger. Each chain is separated by a semicolon
  - c_gene - The C gene(s) identified by cellranger. Each chain is separated by a semicolon
  - isotype - The isotype(s) identified by cellranger. Each chain is separated by a semicolon
  - reads - How many reads support each chain. Each chain is separated by a semicolon
  - umis - How many umis support each chain. Each chain is separated by a semicolon
  - productive - If each chain (separated by a semicolon) is productive
  - full_length - If each chain (separated by a semicolon) is full length
  - paired - If there were paired chains present
  - all_mis_freq - The number of mismatches in all region(s). Each chain is separated by a semicolon
- Columns relating to tetramer calling
  - libra_tet_hash_id - The classification of tetramers based on the libra score: the name of the tetramer if only one is above the cutoff, or multi-reactive (Islet or Other) depending on what tetramers were above the cutoff. Defined in 02_Initial_processing.R. This uses the libra score based on the raw tetramer data.
  - libra_full_hash_id - Full id based on libra scores. This gives all tetramers above the cutoff. Defined in 02_Initial_processing.R. This uses the libra scores computed based on the raw tetramer data.
  - scar_libra_tet_hash_id - The classification of tetramers based on the libra score: the name of the tetramer if only one is above the cutoff, or multi-reactive (Islet or Other) depending on what tetramers were above the cutoff. Defined in 02_Initial_processing.R. This uses the libra score based on the scar corrected tetramer data.
  - scar_libra_full_hash_id - Full id based on libra scores. This gives all tetramers above the cutoff. Defined in 02_Initial_processing.R. This uses the libra scores computed based on the scar corrected tetramer data.
  - tet_name_cutoff - Cutoff determined based on the non-b cells present for each sample. Here a cutoff was drawn at the 95th percentile of the non-b cells. The value is the name of the tetramer if only one is above the cutoff, or multi-reactive (Islet or Other) depending on what tetramers were above the cutoff. Defined in 12_improve_cutoff.R. This uses the scar corrected tetramer data.
  - full_tet_name_cutoff - Cutoff determined based on the non-b cells present for each sample. Here a cutoff was drawn at the 95th percentile of the non-b cells. This gives all tetramers above the cutoff. Defined in 12_improve_cutoff.R. This uses the scar corrected tetramer data.
- Clustering and cell type columns
  - RNA_cluster - The final clusters used for downstream analysis
  - cluster_celltype - A combination of the final cluster and final cell type
  - final_celltype - Final cell types that were used for making the figures
- immcantation columns
  - final_clone - Clone call by immcantation
  - imcantation_isotype - Isotype determined by immcantation
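Several of the VDJ columns listed above store one value per chain, joined with semicolons, so they need to be split before per-chain analysis. The sketch below shows one way to do that for a single cell's metadata row. It assumes each selected column holds the same number of semicolon-separated entries in the same chain order. The helper name `split_chains` and the example values are made up for illustration.

```python
def split_chains(row, columns=("chains", "cdr3", "v_gene", "j_gene", "productive")):
    """Turn one cell's semicolon-separated VDJ columns into one dict per chain.

    row: mapping of column name to the semicolon-separated string for one cell.
    Raises ValueError if the selected columns disagree on the number of chains.
    """
    parts = {column: str(row[column]).split(";") for column in columns}
    n_chains = len(parts[columns[0]])
    for column, values in parts.items():
        if len(values) != n_chains:
            raise ValueError(
                "%s has %d entries, expected %d" % (column, len(values), n_chains)
            )
    return [
        {column: parts[column][index] for column in columns}
        for index in range(n_chains)
    ]


if __name__ == "__main__":
    # Invented example values, not taken from the dataset.
    cell = {
        "chains": "IGH;IGK",
        "cdr3": "CARDYW;CQQYNSYPLTF",
        "v_gene": "IGHV3-23;IGKV1-5",
        "j_gene": "IGHJ4;IGKJ4",
        "productive": "TRUE;TRUE",
    }
    for chain in split_chains(cell):
        print(chain["chains"], chain["cdr3"])
```

Cells without VDJ data will have missing values in these columns, so filter those rows out before calling a helper like this.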
These columns will be generated when the seurat object is made following the pipeline in this repository. These are not all in the published meta data.
- Seurat default columns
  - orig.ident - Sample name
  - nCount_RNA - Number of RNA molecules in the cell
  - nFeature_RNA - Number of genes in the cell
  - nCount_ADT - Number of ADT molecules in the cell
  - nFeature_ADT - Number of different ADTs in the cell
  - nCount_SCT - Number of reads based on SCT normalization
  - nFeature_SCT - Number of genes based on SCT normalization
  - nCount_TET - Number of reads from the tetramers
  - nFeature_TET - Number of features from the tetramers
  - nCount_TET_LIBRA - Number of reads associated with the libra score (ignore)
  - nFeature_TET_LIBRA - Number of genes associated with the libra score (ignore)
  - nCount_SCAR_ADT_LOG - Number of reads from the log normalized scar corrected ADTs (ignore)
  - nFeature_SCAR_ADT_LOG - Number of genes from the log normalized scar corrected ADTs (ignore)
  - nCount_SCAR_ADT - Number of reads from the scar corrected ADTs
  - nFeature_SCAR_ADT - Number of genes from the scar corrected ADTs
  - nCount_SCAR_TET - Number of reads from the scar corrected tetramers
  - nFeature_SCAR_TET - Number of genes from the scar corrected tetramers
  - nCount_SCAR_TET_LOG - Number of reads from the log normalized scar corrected tetramers (ignore)
  - nFeature_SCAR_TET_LOG - Number of genes from the log normalized scar corrected tetramers (ignore)
  - nCount_SCAR_TET_LIBRA - Number of reads associated with the tetramer libra score (ignore)
  - nFeature_SCAR_TET_LIBRA - Number of genes associated with the tetramer libra score (ignore)
  - nCount_CLR_ADT - Number of reads associated with the clr normalized adts (ignore)
  - nFeature_CLR_ADT - Number of genes associated with the clr normalized adts (ignore)
  - nCount_CLR_TET - Number of reads associated with the clr normalized tetramers (ignore)
  - nFeature_CLR_TET - Number of genes associated with the clr normalized tetramers (ignore)
  - nCount_AMBRNA - Number of reads from the ambient corrected rna
  - nFeature_AMBRNA - Number of genes from the ambient corrected rna
  - nCount_TET_PROPORTIONS - Number of reads from the hto determined proportions on the raw data (ignore)
  - nFeature_TET_PROPORTIONS - Number of features from the hto determined proportions on the raw data (ignore)
  - nCount_SCAR_TET_PROPORTIONS - Number of reads from the hto determined proportions on the scar corrected data (ignore)
  - nFeature_SCAR_TET_PROPORTIONS - Number of features from the quantile determined proportions on the scar corrected data (ignore)
  - nCount_DSB_ADT - Number of reads associated with the dsb normalized adts (ignore)
  - nFeature_DSB_ADT - Number of genes associated with the dsb normalized adts (ignore)
  - nCount_DSB_TET - Number of reads associated with the dsb normalized tetramers (ignore)
  - nFeature_DSB_TET - Number of genes associated with the dsb normalized tetramers (ignore)
  - nCount_NEW_TET_PROPORTIONS - Number of reads from the quantile determined proportions based on the scar corrected tetramers (ignore)
  - nFeature_NEW_TET_PROPORTIONS - Number of features from the quantile determined proportions based on the scar corrected tetramers (ignore)
  - percent.mt - Percent of reads mapping to the mitochondria
- Meta data columns
  - ID - Sample id
  - Initials - Individual initials
  - Sample.Name - Full sample name
  - Sex - Sex
  - Collection.Date - Date the blood was collected
  - Age.at.Collection..years. - Age of patient at date of collection in years
  - Age.at.Collection..Months. - Age of patient at date of collection in months
  - Status - Disease status (ND = non-diabetic, T1D = type 1 diabetes, AAB = autoantibody positive)
  - Date.of.Diagnosis - Date of T1D diagnosis
  - Days.post.onset - Number of days between diagnosis and collection
  - millions.of.cells.frozen - Number of cells collected
  - HLA - HLA type of the individual
  - Autoantibodies - What autoantibodies were detected
  - date.processed.for.scSeq - The date cells were thawed and prepped for scRNA-seq
  - Notes..FDR.relationship. - For first degree relative, how they are related
  - sample - Sample ID
  - Ethnicity - The ethnicity of the individual
- scuttle qc columns (Added by scuttle)
  - cell_qc_sum - Same as nCount_RNA
  - cell_qc_detected - Same as nFeature_RNA
  - cell_qc_subsets_Mito_sum - Sum of mito counts per cell
  - cell_qc_subsets_Mito_detected - Number of mito genes per cell
  - cell_qc_subsets_Mito_percent - Percent mito per cell
  - cell_qc_altexps_ADT_sum - Same as nCount_ADT
  - cell_qc_altexps_ADT_detected - Same as nFeature_ADT
  - cell_qc_altexps_ADT_percent - Percent of ADT counts per cell
  - cell_qc_total - Sum of counts for each cell across the main and alternative experiment
  - cell_qc_low_lib_size - If the cell passed QC based on library size
  - cell_qc_low_n_features - If the cell passed QC based on number of features
  - cell_qc_high_subsets_Mito_percent - If the cell passed QC based on mito percent
  - cell_qc_discard - If the cell is flagged to be removed by perCellQCFilters
- Cell cycle columns
  - S.Score - Cell cycle score for S phase as determined by Seurat's CellCycleScoring function
  - G2M.Score - Cell cycle score for G2M phase as determined by Seurat's CellCycleScoring function
  - Phase - Final cell cycle phase determination as determined by Seurat's CellCycleScoring function
- VDJ columns (Added by djvdj)
  - clonotype_id - Clonotype determined by djvdj
  - exact_subclonotype_id - Clonotype sub id determined by djvdj
  - chains - What chains are present in the cell, each chain is separated by a semicolon
  - n_chains - How many chains are in the cell
  - cdr3 - The amino acid sequence of the CDR3 region(s). Each chain is separated by a semicolon
  - cdr3_nt - The nucleotide sequence of the CDR3 region(s). Each chain is separated by a semicolon
  - cdr3_length - The CDR3 amino acid length(s). Each chain is separated by a semicolon
  - cdr3_nt_length - The CDR3 nucleotide length(s). Each chain is separated by a semicolon
  - v_gene - The V gene(s) identified by cellranger. Each chain is separated by a semicolon
  - d_gene - The D gene(s) identified by cellranger. Each chain is separated by a semicolon
  - j_gene - The J gene(s) identified by cellranger. Each chain is separated by a semicolon
  - c_gene - The C gene(s) identified by cellranger. Each chain is separated by a semicolon
  - isotype - The isotype(s) identified by cellranger. Each chain is separated by a semicolon
  - reads - How many reads support each chain. Each chain is separated by a semicolon
  - umis - How many umis support each chain. Each chain is separated by a semicolon
  - productive - If each chain (separated by a semicolon) is productive
  - full_length - If each chain (separated by a semicolon) is full length
  - paired - If there were paired chains present
  - v_ins - The number of insertions in the v region(s). Each chain is separated by a semicolon
  - v_del - The number of deletions in the v region(s). Each chain is separated by a semicolon
  - v_mis - The number of mismatches in the v region(s). Each chain is separated by a semicolon
  - d_ins - The number of insertions in the d region(s). Each chain is separated by a semicolon
  - d_del - The number of deletions in the d region(s). Each chain is separated by a semicolon
  - d_mis - The number of mismatches in the d region(s). Each chain is separated by a semicolon
  - j_ins - The number of insertions in the j region(s). Each chain is separated by a semicolon
  - j_del - The number of deletions in the j region(s). Each chain is separated by a semicolon
  - j_mis - The number of mismatches in the j region(s). Each chain is separated by a semicolon
  - c_ins - The number of insertions in the c region(s). Each chain is separated by a semicolon
  - c_del - The number of deletions in the c region(s). Each chain is separated by a semicolon
  - c_mis - The number of mismatches in the c region(s). Each chain is separated by a semicolon
  - all_ins - The number of insertions in all region(s). Each chain is separated by a semicolon
  - all_del - The number of deletions in all region(s). Each chain is separated by a semicolon
  - all_mis - The number of mismatches in all region(s). Each chain is separated by a semicolon
  - vd_ins - The number of insertions in the v and d region(s). Each chain is separated by a semicolon
  - vd_del - The number of deletions in the v and d region(s). Each chain is separated by a semicolon
  - dj_ins - The number of insertions in the d and j region(s). Each chain is separated by a semicolon
  - dj_del - The number of deletions in the d and j region(s). Each chain is separated by a semicolon
  - v_mis_freq - The number of mismatches in the v region(s). Each chain is separated by a semicolon
  - d_mis_freq - The number of mismatches in the d region(s). Each chain is separated by a semicolon
  - j_mis_freq - The number of mismatches in the j region(s). Each chain is separated by a semicolon
  - c_mis_freq - The number of mismatches in the c region(s). Each chain is separated by a semicolon
  - all_mis_freq - The number of mismatches in all region(s). Each chain is separated by a semicolon
- Columns relating to tetramer calling
  - TET_maxID - Output of running the new version of HTOdemux based on the TET assay (the CLR normalized raw tetramer counts). This gives the highest tet id. (Ignore)
  - TET_secondID - Output of running the new version of HTOdemux based on the TET assay (the CLR normalized raw tetramer counts). This gives the second highest tet id. (Ignore)
  - TET_margin - Output of running the new version of HTOdemux based on the TET assay (the CLR normalized raw tetramer counts). This gives the difference between the highest and second highest cutoffs (ignore)
  - TET_classification - Output of running the new version of HTOdemux based on the TET assay (the CLR normalized raw tetramer counts). This gives the two highest tet ids. (ignore)
  - TET_classification.global - Output of running the new version of HTOdemux based on the TET assay (the CLR normalized raw tetramer counts). This gives the values Negative, Singlet and Doublet depending on how many were positive (Ignore)
  - hash.ID - Ignore
  - tet_hash_id - Output of running the new version of HTOdemux based on the TET assay (the CLR normalized raw tetramer counts). This gives the final label: the name of the tetramer if only one is above the cutoff, or multi-reactive (Islet or Other) depending on what tetramers were above the cutoff. Defined in 02_Initial_processing.R
  - full_hash_id - Output of running the new version of HTOdemux based on the TET assay (the CLR normalized raw tetramer counts). This gives all tetramers above the cutoff. Defined in 02_Initial_processing.R
  - libra_tet_hash_id - The classification of tetramers based on the libra score: the name of the tetramer if only one is above the cutoff, or multi-reactive (Islet or Other) depending on what tetramers were above the cutoff. Defined in 02_Initial_processing.R. This uses the libra score based on the raw tetramer data.
  - libra_full_hash_id - Full id based on libra scores. This gives all tetramers above the cutoff. Defined in 02_Initial_processing.R. This uses the libra scores computed based on the raw tetramer data.
  - old_hash_id - ignore
  - SCAR_TET_maxID - Output of running the new version of HTOdemux based on the SCAR_TET assay (the CLR normalized scar corrected tetramer counts). This gives the highest tet id. (Ignore)
  - SCAR_TET_secondID - Output of running the new version of HTOdemux based on the SCAR_TET assay (the CLR normalized scar corrected tetramer counts). This gives the second highest tet id. (Ignore)
  - SCAR_TET_margin - Output of running the new version of HTOdemux based on the SCAR_TET assay (the CLR normalized scar corrected tetramer counts). This gives the difference between the highest and second highest cutoffs (ignore)
  - SCAR_TET_classification - Output of running the new version of HTOdemux based on the SCAR_TET assay (the CLR normalized scar corrected tetramer counts). This gives the two highest tet ids. (ignore)
  - SCAR_TET_classification.global
  - scar_hash_id - Output of running the new version of HTOdemux based on the SCAR_TET assay (the CLR normalized scar corrected tetramer counts). This gives the values Negative, Singlet and Doublet depending on how many were positive (Ignore)
  - full_scar_hash_id - Output of running the new version of HTOdemux based on the SCAR_TET assay (the CLR normalized scar corrected tetramer counts). This gives all tetramers above the cutoff. Defined in 02_Initial_processing.R
  - scar_libra_tet_hash_id - The classification of tetramers based on the libra score: the name of the tetramer if only one is above the cutoff, or multi-reactive (Islet or Other) depending on what tetramers were above the cutoff. Defined in 02_Initial_processing.R. This uses the libra score based on the scar corrected tetramer data.
  - scar_libra_full_hash_id - Full id based on libra scores. This gives all tetramers above the cutoff. Defined in 02_Initial_processing.R. This uses the libra scores computed based on the scar corrected tetramer data.
  - old_scar_hash_id - ignore
  - tet_name_cutoff - Cutoff determined based on the non-b cells present for each sample. Here a cutoff was drawn at the 95th percentile of the non-b cells. The value is the name of the tetramer if only one is above the cutoff, or multi-reactive (Islet or Other) depending on what tetramers were above the cutoff. Defined in 12_improve_cutoff.R. This uses the scar corrected tetramer data.
  - full_tet_name_cutoff - Cutoff determined based on the non-b cells present for each sample. Here a cutoff was drawn at the 95th percentile of the non-b cells. This gives all tetramers above the cutoff. Defined in 12_improve_cutoff.R. This uses the scar corrected tetramer data.
- Clustering and cell type columns
  - seurat_clusters - Final clusters called by the clustering algorithm (ignore)
  - RNA_cluster - The final clusters used for downstream analysis
  - RNA_celltype_seurat - Cell types determined by mapping the RNA clusters from individual samples to the seurat reference. Download of the reference is here
  - RNA_celltype_bnd - Cell types determined by mapping the RNA clusters from individual samples to a BND reference previously published by Mia Smith
  - RNA_celltype - Final cell type determined by a combined best score from the BND and Seurat references based on the RNA clusters from individual samples.
  - cluster_celltype - A combination of the final cluster and final cell type
  - adtdsb_clusters - Clusters determined by using the adt dsb normalized data (ignore)
  - adtclr_clusters - Clusters determined by using the adt clr normalized data (ignore)
  - adtscar_clusters - Clusters determined by using the adt scar normalized data (ignore)
  - tetdsb_clusters - Clusters determined by using the tet dsb normalized data (ignore)
  - tetclr_clusters - Clusters determined by using the tet clr normalized data (ignore)
  - tetscar_clusters - Clusters determined by using the tet scar normalized data (ignore)
  - tetscarlog_clusters - Clusters determined by using the tet scar log normalized data (ignore)
  - tetdsb_nd_clusters
  - grouped_celltype
  - celltype_cluster
  - rna_uncorrected_cluster - Clusters determined before RNA correction
  - rna_harmony_clust - Clusters determined using the harmony dimensionality reduction
  - rna_mnn_clust - Clusters determined using the mnn dimensionality reduction
  - AMBRNA_cluster - Clusters determined using the ambient corrected RNA data (ignore)
  - ambience_uncorrected_cluster - Clusters determined before RNA correction using the ambient RNA corrected data
  - ambience_harmony_clust - Clusters determined using the harmony dimensionality reduction using the ambient RNA corrected data
  - ambience_mnn_clust - Clusters determined using the mnn dimensionality reduction using the ambient RNA corrected data
  - rna_corrected_cluster - Final clustering based on the mnn reduction
  - ambience_corrected_cluster - Final clustering using the ambient RNA based on the mnn reduction
  - RNA_comb_celltype_seurat - Cell types determined by mapping the RNA clusters from the combined samples to the seurat reference. Download of the reference is here
  - RNA_comb_celltype_bnd - Cell types determined by mapping the RNA clusters from the combined samples to a BND reference previously published by Mia Smith
  - RNA_combined_celltype - Final cell type determined by a combined best score from the BND and Seurat references based on the RNA clusters from individual samples.
  - AMBRNA_comb_celltype_seurat - Combined cell type based on the ambience removed assay using the seurat reference. Download of the reference is here
  - AMBRNA_comb_celltype_bnd - Combined cell type based on the ambience removed assay using a BND reference previously published by Mia Smith
  - AMBRNA_combined_celltype - Combined cell type based on the ambience removed assay using both references.
  - final_celltype - Final cell types that were used for making the figures
- Doublet finder
  - Doublet_finder - Doublet finder results
- immcantation columns
  - final_clone - Clone call by immcantation
  - imcantation_isotype - Isotype determined by immcantation