Abstract
Next-generation sequencing (NGS) using two-dye chemistry has reduced DNA sequencing costs but has introduced new challenges, such as overrepresented poly-guanine (poly-G) tails, especially in reverse reads. Because two-dye chemistry encodes G as the absence of signal, a loss of signal near the end of a read is called as a run of erroneous, high-confidence G bases, which complicates downstream analyses. This study evaluated the efficiency and speed of three popular trimming tools, namely BBDuk, Cutadapt, and Fastp, in removing poly-G artifacts from NGS datasets. The sample dataset, generated on the Illumina NovaSeq 6000 platform from crossbred cattle, contained 26.32 million reads with 6.79 per cent poly-G content. Read quality was assessed with FastQC, and trimming was performed with BBDuk, Fastp, and Cutadapt. After trimming, the datasets were re-evaluated with FastQC, and metrics such as the number of poor-quality sequences, GC content, and trimming time were recorded. BBDuk was the fastest tool (8.42 seconds), followed by Fastp (9.50 seconds) and Cutadapt (24.42 seconds). All three tools trimmed poly-G tails efficiently, with BBDuk and Cutadapt retaining more sequences than Fastp.
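To make the poly-G artifact described above concrete, the following Python sketch trims a 3' poly-G tail from a read sequence and estimates the fraction of reads that carry one. It is a simplified illustration, not the algorithm used by BBDuk, Cutadapt, or Fastp; the function names `trim_poly_g` and `poly_g_fraction` and the thresholds (a minimum tail of 10 bases, at most one non-G base per 8 tail bases) are hypothetical choices made for this example.

```python
from typing import Iterable


def trim_poly_g(seq: str, min_len: int = 10, mismatch_every: int = 8) -> str:
    """Remove a 3' poly-G tail from `seq` (hypothetical, simplified rule).

    The tail is the longest 3' suffix that starts with a G and contains at
    most one non-G base per `mismatch_every` bases. It is removed only if it
    is at least `min_len` bases long; otherwise the read is returned unchanged.
    """
    best_start = len(seq)  # start index of the longest acceptable tail so far
    non_g = 0              # non-G bases seen in the current suffix
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] != "G":
            non_g += 1
        tail_len = len(seq) - i
        # Accept this suffix if it begins on a G and stays within the
        # mismatch budget (integer division: 1 allowed per 8 bases).
        if seq[i] == "G" and non_g <= tail_len // mismatch_every:
            best_start = i
    if len(seq) - best_start >= min_len:
        return seq[:best_start]
    return seq


def poly_g_fraction(reads: Iterable[str], min_len: int = 10) -> float:
    """Fraction of reads that would lose a poly-G tail under `trim_poly_g`."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if len(trim_poly_g(r, min_len)) < len(r))
    return hits / len(reads)


if __name__ == "__main__":
    read = "ACGTACGTAC" + "GGGGGGAGGGGGGGGG"  # tail with one embedded A
    print(trim_poly_g(read))
```

The tools compared in this study use their own detection rules (and, in some cases, base-quality information), so this sketch is not expected to reproduce their output.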
Keywords: Poly-G tail trimming, BBDuk, Cutadapt, Fastp
Article history: Received: 27-09-2024, Accepted: 16-10-2024, Published online: 01-12-2024
Corresponding author: Pramod S