Analysis pipeline for genotyping individuals based on either microsatellites or SNPs (Single Nucleotide Polymorphisms).
Clone this repository and move into it:
git clone https://github.com/CGI-NRM/NGSGenotype
cd NGSGenotype/
Move your data into the Raw_data folder:
mv [folder with your data in it]/*.fastq.gz ./Raw_data/
If you have paired-end data, merge the reads (the log will be saved to 'bbmerge_log.out'):
cd Merged_data/
bash merge_all_pairs.sh
Trim all primers. Choose the primer file and specify whether the data is merged or single-end (the log will be saved to 'cutadapt_log.out'):
cd ..
bash cutadapt_script.sh merged Primers/Bear_SNP_all_loci.fa # for merged data with bear primers
# or:
bash cutadapt_script.sh single Primers/L_vulgaris.fa # for single end data with Lissotriton primers
Continue the analysis in R using the rmarkdown file 'ngsgenotype.rmd'.
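When choosing between `merged` and `single` for the primer-trimming step, it can help to check whether the files in Raw_data actually come in pairs. The sketch below is not part of this repository. It assumes the common convention where the two mates carry `_R1` and `_R2` in the file name, which may not match your data, and the function names are hypothetical.

```python
import re

# Assumption: mates are marked by "_R1" / "_R2" followed by "_" or "." in the file name.
MATE_PATTERN = re.compile(r"_R([12])(?=[_.])")


def split_pairs(filenames):
    """Group FASTQ file names into complete (R1, R2) pairs and unpaired leftovers."""
    mates = {}
    unpaired = []
    for name in filenames:
        match = MATE_PATTERN.search(name)
        if match is None:
            # No mate marker at all: treat as a single-end file.
            unpaired.append(name)
            continue
        # Mask the mate number so both mates of a sample share one key.
        key = name[:match.start()] + "_R?" + name[match.end():]
        mates.setdefault(key, {})[match.group(1)] = name

    pairs = []
    for key in sorted(mates):
        found = mates[key]
        if "1" in found and "2" in found:
            pairs.append((found["1"], found["2"]))
        else:
            # Only one mate present: report it as unpaired.
            unpaired.extend(found.values())
    return pairs, sorted(unpaired)


def suggest_mode(filenames):
    """Suggest 'merged', 'single', or 'mixed' based on the file names alone."""
    pairs, unpaired = split_pairs(filenames)
    if pairs and not unpaired:
        return "merged"
    if not pairs:
        return "single"
    return "mixed"
```

For example, calling `suggest_mode` on the names of the `*.fastq.gz` files in Raw_data gives a hint about which mode fits. A result of `mixed` suggests that some mates are missing or named differently.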
Trim the superfluous sequences around the SNPs:
bash filter_out_SNPs.sh # currently hardcoded for bear SNPs
Genotype the SNPs:
cd Genotyped_SNPs/
python snpotypewriter.py ../Filtered_data/SNP_filtered/ > genotyped_SNP_data.csv
# or, if data is divided into chunks (where chunk folder names start with "chunk_"):
bash loop_over_chunks.sh ../path/to/folder/with/chunks/
Then continue the analysis in R using the rmarkdown file 'ngsgenotype_snp.rmd'.
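The chunked case relies on chunk folder names starting with "chunk_". As a rough illustration of that convention only, the sketch below lists the chunk folders under a parent folder and builds one `snpotypewriter.py` command per chunk. It does not reproduce `loop_over_chunks.sh`, which may work differently. The function names and the per-chunk output file names are hypothetical.

```python
from pathlib import Path


def find_chunks(parent):
    """Return sub-folders of `parent` whose names start with 'chunk_', sorted by name."""
    return sorted(
        (p for p in Path(parent).iterdir()
         if p.is_dir() and p.name.startswith("chunk_")),
        key=lambda p: p.name,
    )


def build_commands(parent):
    """Build (command, output_file) tuples, one per chunk folder."""
    commands = []
    for chunk in find_chunks(parent):
        # Hypothetical naming scheme: one CSV per chunk.
        output = f"genotyped_SNP_data_{chunk.name}.csv"
        commands.append((["python", "snpotypewriter.py", str(chunk)], output))
    return commands
```

Each command could then be run with `subprocess.run(command, stdout=open(output, "w"), check=True)`, mirroring the `> genotyped_SNP_data.csv` redirect used for the unchunked case.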