nf-core/abotyper

A pipeline for characterising the Human Blood Group and Red Cell Antigens using Oxford Nanopore third-generation sequencing data.

This is the development version of the pipeline.

Launch development version https://github.com/nf-core/abotyper

Introduction

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Output

For each sample, and for each of the two target exons (exon6 and exon7), the pipeline generates BAM files, alignment metrics, and pileup results.

The output directory generated by this Nextflow pipeline will look something like this:

OUTDIR/
├── ABO_results.log
├── ABO_result.txt
├── ABO_result.xlsx
├── final_export.csv
├── per_sample_processing
│   ├── SAMPLE1_barcode01
│   │   ├── exon6
│   │   │   ├── ABOReadPolymorphisms.txt
│   │   │   ├── alignment
│   │   │   │   ├── SAMPLE1_barcode01.bam
│   │   │   │   ├── SAMPLE1_barcode01.bam.bai
│   │   │   │   ├── SAMPLE1_barcode01.coverage.txt
│   │   │   │   ├── SAMPLE1_barcode01.flagstat
│   │   │   │   └── SAMPLE1_barcode01.stats
│   │   │   ├── SAMPLE1_barcode01.ABOPhenotype.txt
│   │   │   ├── SAMPLE1_barcode01.AlignmentStatistics.tsv
│   │   │   ├── SAMPLE1_barcode01.log.txt
│   │   │   └── mpileup
│   │   │       └── SAMPLE1_barcode01.mpileup.gz
│   │   └── exon7
│   │       ├── ABOReadPolymorphisms.txt
│   │       ├── alignment
│   │       │   ├── SAMPLE1_barcode01.bam
│   │       │   ├── SAMPLE1_barcode01.bam.bai
│   │       │   ├── SAMPLE1_barcode01.coverage.txt
│   │       │   ├── SAMPLE1_barcode01.flagstat
│   │       │   └── SAMPLE1_barcode01.stats
│   │       ├── SAMPLE1_barcode01.ABOPhenotype.txt
│   │       ├── SAMPLE1_barcode01.AlignmentStatistics.tsv
│   │       ├── SAMPLE1_barcode01.log.txt
│   │       └── mpileup
│   │           └── SAMPLE1_barcode01.mpileup.gz
├── pipeline_info
│   ├── execution_report_DATETIME.html
│   ├── execution_timeline_DATETIME.html
│   ├── execution_trace_DATETIME.txt
│   ├── nf_core_pipeline_software_mqc_versions.yml
│   ├── params_DATETIME.json
│   └── pipeline_dag_DATETIME.html
└── qc-reports
    ├── fastqc
    │   ├── SAMPLE1_barcode01_fastqc.html
    │   └── SAMPLE1_barcode01_fastqc.zip
    └── multiqc
        ├── multiqc_data
        ├── multiqc_plots
        │   ├── pdf
        │   ├── png
        │   └── svg
        └── multiqc_report.html

The ABO_result.xlsx Excel worksheet contains details of all SNVs and metrics used to deduce the ABO phenotype for each sample.

A summary of the ABO typing results is provided in final_export.csv.
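
The per-sample layout shown above can be walked programmatically. The following is a minimal Python sketch, assuming the directory and file names match the example tree (per_sample_processing/SAMPLE/exon6|exon7/SAMPLE.ABOPhenotype.txt). The function name find_phenotype_files is hypothetical and not part of the pipeline.

```python
from pathlib import Path


def find_phenotype_files(outdir):
    """Map (sample, exon) -> path of the per-exon ABOPhenotype.txt file.

    Assumes the layout shown in the example tree; adjust the glob
    pattern if your results directory is organised differently.
    """
    root = Path(outdir) / "per_sample_processing"
    found = {}
    # exon[67] matches the exon6 and exon7 sub-directories only
    for path in sorted(root.glob("*/exon[67]/*.ABOPhenotype.txt")):
        exon = path.parent.name
        sample = path.parent.parent.name
        found[(sample, exon)] = path
    return found
```

Calling find_phenotype_files("OUTDIR") returns a dictionary keyed by sample directory name and exon, which makes it easy to spot samples that are missing a result for one of the two exons.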

Feel free to raise an issue or reach out if you need any support getting this tool running, or with suggestions for improvement.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

MakeIndex – Generate BED index from reference FASTA
FastQC – Raw read quality control
Minimap2 – Mapping reads to reference
SAMtools Coverage – Coverage metrics
SAMtools Flagstat – Alignment summary statistics
SAMtools Stats – Detailed alignment metrics
SAMtools Mpileup – Base-level pileup generation
Nucleotide Quantification – Frequency analysis of bases at target loci
ABO SNP Interpretation – Custom logic to infer ABO phenotype
MultiQC – Aggregate report describing results and QC from the whole pipeline

MakeIndex

Output files

reference/
- *.bed: BED file generated from FASTA index.

This step converts the FASTA index (.fai) into BED format for use in downstream analysis and visualization.
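
As an illustration of what such a conversion involves, here is a minimal Python sketch. It relies only on the standard .fai layout, where the first two tab-separated columns are the sequence name and its length, and emits one BED interval spanning each sequence. The function name fai_to_bed is hypothetical, and the pipeline's own implementation may differ.

```python
def fai_to_bed(fai_text: str) -> str:
    """Convert the text of a FASTA index (.fai) into BED lines.

    Each .fai record starts with NAME and LENGTH; BED uses 0-based,
    half-open coordinates, so a whole sequence is NAME, 0, LENGTH.
    """
    bed_lines = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, length = line.split("\t")[:2]
        bed_lines.append(f"{name}\t0\t{int(length)}\n")
    return "".join(bed_lines)
```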

FastQC

Output files

fastqc/
- *_fastqc.html: FastQC report containing quality metrics.
- *_fastqc.zip: Zip archive containing the FastQC report, tab-delimited data file and plot images.

FastQC provides general quality metrics about your sequenced reads, including base quality scores, GC content, adapter contamination, and overrepresented sequences.

Minimap2

Output files

alignment/
- *.bam: Aligned reads in BAM format.
- *.bai: BAM index files.

Minimap2 aligns the Nanopore reads to the ABO reference. It is a fast aligner designed for long, error-prone reads such as those produced by Nanopore sequencing.
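
For readers who want to reproduce this step outside the pipeline, the sketch below builds a typical long-read alignment command chain in Python. The map-ont preset, the thread count, and the function name alignment_commands are assumptions chosen for illustration; the parameters the pipeline actually passes to Minimap2 are not documented here and may differ.

```python
def alignment_commands(reference, reads, out_bam, threads=4):
    """Return argv lists for aligning, sorting and indexing (assumed settings)."""
    # -a: SAM output; -x map-ont: Minimap2's Oxford Nanopore preset
    align = ["minimap2", "-a", "-x", "map-ont", "-t", str(threads),
             reference, reads]
    # Read SAM from standard input ("-") and write a coordinate-sorted BAM
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    # Creates out_bam + ".bai" next to the BAM file
    index = ["samtools", "index", out_bam]
    return align, sort, index
```

The first two commands are meant to be connected with a pipe (minimap2 ... | samtools sort ...), after which the index command is run on the sorted BAM.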

SAMtools Modules

This set of modules uses SAMtools and its subtools to extract key metrics and data from aligned BAM files. These outputs are used for quality control, coverage analysis, and variant quantification.

Output subtool files

coverage/
- *.coverage.txt: Coverage metrics per sample.
flagstat/
- *.flagstat.txt: Summary of alignment flags.
stats/
- *.stats.txt: Detailed alignment statistics.
mpileup/
- *.mpileup.txt: Base-level pileup data.

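Coverage: Reports coverage statistics for each reference sequence, including the number of reads, the number and percentage of covered bases, and the mean depth, to assess sequencing depth and uniformity.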

Flagstat: Provides a quick summary of alignment quality, including total reads, mapped reads, duplicates, and other key metrics.

Stats: Generates comprehensive metrics such as insert size distributions, read lengths, and mapping quality.

Mpileup: Produces raw base-level pileups at each genomic position, which are used for downstream nucleotide frequency analysis and SNP detection.
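
The coverage table can be loaded for quick checks with a few lines of Python. This sketch assumes the default tabular output of samtools coverage, whose header line begins with #rname and includes the numreads, coverage and meandepth columns. The function name parse_samtools_coverage is hypothetical.

```python
import csv
import io


def parse_samtools_coverage(text):
    """Parse the default tab-separated output of `samtools coverage`."""
    rows = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        rows.append({
            "rname": row["#rname"],                  # reference sequence name
            "numreads": int(row["numreads"]),        # reads aligned to it
            "coverage": float(row["coverage"]),      # percent of bases covered
            "meandepth": float(row["meandepth"]),    # mean depth of coverage
        })
    return rows
```

A list comprehension over the returned rows, for example selecting entries whose meandepth falls below a threshold of your choosing, is then enough to flag poorly covered targets.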

Nucleotide Quantification

Output files

nucleotide_freq/
- *.nucl_freq.txt: Nucleotide frequency tables.

This generic module parses mpileup output to calculate the frequency of each base at target loci, enabling SNP detection.
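
To show what this parsing involves, the sketch below counts base calls in the read-bases column of a single samtools mpileup line (columns: sequence name, 1-based position, reference base, depth, read bases, base qualities). It is a simplified illustration, not the pipeline's module: the function names are hypothetical, base qualities are ignored, and inserted or deleted sequences are skipped rather than tallied.

```python
import re
from collections import Counter

_INDEL_LENGTH = re.compile(r"\d+")


def count_bases(ref_base, read_bases):
    """Count base calls in the read-bases column of one mpileup line."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            i += 2  # read start marker plus its mapping-quality character
            continue
        if ch in "+-":
            # Indel: sign, decimal length, then that many sequence characters
            match = _INDEL_LENGTH.match(read_bases, i + 1)
            if match is None:
                raise ValueError(f"malformed indel at offset {i}")
            i = match.end() + int(match.group())
            continue
        if ch in ".,":
            counts[ref_base.upper()] += 1  # match to the reference base
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1        # mismatch, either strand
        elif ch in "*#":
            counts["del"] += 1             # position deleted in this read
        # '$' (read end) and '<' / '>' (reference skip) carry no base call
        i += 1
    return counts


def base_frequencies(counts):
    """Convert counts to fractions of all calls at the position."""
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}
```

Note that matches ('.' and ',') are attributed to the reference base in column three, so the pileup must have been generated with a reference FASTA for these counts to be meaningful.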

ABO SNP Interpretation

Output files

phenotype/
- *.snp_summary.txt: Summary of ABO-related SNPs.
- *.phenotype.txt: Predicted ABO phenotype per sample.

This custom module interprets SNP data to infer ABO blood group phenotypes using curated reference profiles and decision rules.
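
The pipeline's curated profiles and decision rules are not reproduced here. As a deliberately simplified illustration of the final step only, the sketch below applies the standard ABO dominance relationship (A and B are co-dominant, O is recessive) to a pair of already-called alleles. The function name phenotype_from_alleles and the single-letter allele labels are assumptions made for this example.

```python
def phenotype_from_alleles(allele1, allele2):
    """Derive an ABO phenotype from two allele labels ('A', 'B' or 'O').

    Textbook dominance only; this is not the pipeline's rule set.
    """
    alleles = {allele1.upper(), allele2.upper()}
    if not alleles <= {"A", "B", "O"}:
        raise ValueError(f"unexpected allele label(s): {sorted(alleles)}")
    if alleles == {"A", "B"}:
        return "AB"          # A and B are co-dominant
    if "A" in alleles:
        return "A"           # AA or AO
    if "B" in alleles:
        return "B"           # BB or BO
    return "O"               # only OO gives phenotype O
```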

MultiQC

Output files

multiqc/
- multiqc_report.html: A standalone HTML file that can be viewed in your web browser.
- multiqc_data/: Directory containing parsed statistics from the different tools used in the pipeline.
- multiqc_plots/: Directory containing static images from the report in various formats.

MultiQC aggregates results from FastQC, SAMtools, and other modules into a single interactive report. It also includes software version tracking for reproducibility.

Pipeline information

Output files

pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.

Nextflow generates several reports on the running and execution of the pipeline. These allow you to troubleshoot errors, and also provide other information such as launch commands, run times and resource usage.
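
When troubleshooting, the execution trace is often the quickest place to look. The Python sketch below lists tasks that did not finish successfully, assuming the trace is the tab-separated file Nextflow writes by default and that it contains the name and status columns. The function name unsuccessful_tasks is hypothetical.

```python
import csv


def unsuccessful_tasks(trace_path):
    """Return (name, status) for tasks not marked COMPLETED or CACHED."""
    with open(trace_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            (row["name"], row["status"])
            for row in reader
            if row["status"] not in ("COMPLETED", "CACHED")
        ]
```

Pass it the path of the execution_trace file in pipeline_info/ to get a short list of failed or aborted tasks to investigate.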
