Skip to content

RC-PCR CLASSIFIER

Version: V0.2 (BETA)

Alt text

install

  1. obtain docker image

    docker pull jonovox/nextflowcentos:latest
    

  2. download this project

  3. extract this project

  4. navigate to project

  5. Download prebuild conda.tar.gz https://surfdrive.surf.nl/files/index.php/s/5q2feFVult4v81k

  6. Extract conda.tar.gz in folder

single sample docker

sh docker/run.sh RC jonovox/nextflowcentos:latest

batch run docker

# USAGE
cd project
# USAGE
# bash run_batch_docker.sh <inputpath> <file_extension> <database> <threads> <image> <outputname>
# bash run_batch_docker.sh ${1}             ${2}          ${3}        ${4}     ${5}   ${6}
# Example:
bash run_batch_docker.sh /workflow/input/ _001.fastq.gz SILVA 8 jonovox/easyseq_covid19:latest SILVA_test
# <file_extension> most common _001.fastq.gz

FLOW-DIAGRAM

Alt text

conda environments

  • 1A_clean_reads (fastp, version 0.20.1)\ env-f07c78eef9e8319c7eb087d931e36003
  • 2A_measure_amplicons\ env-cd4ea0676bf53b5d7e5c6c6c523f0013
  • 3A_KMA (version 1.3.28)\ env-84e06c5335c0a958ed012db619fdfceb
  • 3B_process_KMA\ env-4ca5b26b8a059c60e73996439311c22f
  • 4A_abricate\ env-b415f051979c22cdef40a3cbee1f0aa3
  • 5A_annotation\ env-9f6b61e20675ae28786fdb538092d4db
  • 6_multiQC (version 1.12)\ env-3abca7a24ea4d6c708bf4c6cea6413d2

output

SHANKEY PLOT

Alt text

.
├── QC
│   └── multiqc_report.html
├── abricate
│   └── test_blast.txt        #abricate/blast result
├── annotation
│   └── test.final.vcf        #annotated vcf file
├── fastp
│   └── test.fastp.json
├── kma
│   ├── test.aln
│   ├── test.frag.gz
│   ├── test.frag_raw.gz
│   ├── test.fsa
│   ├── test.mapstat
│   ├── test.res              #KMA result file
│   ├── test.sam
│   ├── test.sorted.bam       #bam for genomebrowser
│   ├── test.sorted.bam.bai
│   └── test.vcf.gz
└── test_UMI_counttable.xlsx  #primer count table

database structure

db
├── databasename
│   ├── KMA
│   │   ├── databasename.comp.b
│   │   ├── databasename.length.b
│   │   ├── databasename.name
│   │   └── databasename.seq.b
│   ├── blast
│   │   ├── sequences.fasta
│   │   ├── sequences.fasta.fai
│   │   ├── sequences.nhr
│   │   ├── sequences.nin
│   │   └── sequences.nsq
│   └── primers
│       └── databasename_primers.fasta

Extra

blastdb

18S

makeblastdb -in /workflow/db/blast_db/18S/sequences.fasta -title 18S -dbtype nucl -out /workflow/db/blast_db/18S/sequences

CYP51A (Afu4g06890)

makeblastdb -in /workflow/db/blast_db/CYP51A/sequences.fasta -title CYP51A -dbtype nucl -out /workflow/db/blast_db/CYP51A/sequences

CYP51A

18S

kma_index -i /workflow/db/KMA/18S.fa -o /workflow/db/KMA/18S
CYP51A (Afu4g06890)
kma_index -i /workflow/db/KMA/CYP51A.fa -o /workflow/db/KMA/CYP51A

snpEff

manual CYP51A (Afu4g06890)

snpEff build -gff3 CYP51A

SILVA database

SILVA_138.1_SSURef_NR99_tax_silva_trunc convert rRNA to DNA

perl -pe 'tr/tU/uT/ unless(/>/)' < db/SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta > SILVA_138.1_SSURef_NR99_tax_silva_trunc_DNA.fasta

NOTES

splitting of samples on UMI using seqkit seqkit grep -irp UMI samplename.fastq.gz > output.fastq