FastqPuri: high-performance preprocessing of RNA-seq data

Pérez-Rubio, P.; Lottaz, C.; Engelmann, J.C.

doi:10.1186/s12859-019-2799-0

Pérez-Rubio, P.; Lottaz, C.; Engelmann, J.C. (2019). FastqPuri: high-performance preprocessing of RNA-seq data. BMC Bioinformatics 20(1): 226. https://dx.doi.org/10.1186/s12859-019-2799-0

Associated data:

https://doi.org/10.4121/uuid:9d88ee8d-ceda-4d7e-8109-1cfcd2892632
https://doi.org/10.4121/uuid:b1c4ee4f-9b88-493f-81d8-4040f0d1af25

In: BMC Bioinformatics. BioMed Central: London. e-ISSN 1471-2105

Available in
NIOZ: NIOZ Open Repository 328753

Author keywords

fastq; RNA-seq; Quality control; Preprocessing; Sequence data

Authors
Pérez-Rubio, P.; Lottaz, C.; Engelmann, J.C.

Abstract

Software, Open Access. Authors: Paula Pérez-Rubio, Claudio Lottaz and Julia C. Engelmann.

Background: RNA sequencing (RNA-seq) has become the standard means of analyzing gene and transcript expression in high-throughput. While previously sequence alignment was a time demanding step, fast alignment methods and even more so transcript counting methods which avoid mapping and quantify gene and transcript expression by evaluating whether a read is compatible with a transcript, have led to significant speed-ups in data analysis. Now, the most time demanding step in the analysis of RNA-seq data is preprocessing the raw sequence data, such as running quality control and adapter, contamination and quality filtering before transcript or gene quantification. To do so, many researchers chain different tools, but a comprehensive, flexible and fast software that covers all preprocessing steps is currently missing.

Results: We here present FastqPuri, a light-weight and highly efficient preprocessing tool for fastq data. FastqPuri provides sequence quality reports on the sample and dataset level with new plots which facilitate decision making for subsequent quality filtering. Moreover, FastqPuri efficiently removes adapter sequences and sequences from biological contamination from the data. It accepts both single- and paired-end data in uncompressed or compressed fastq files. FastqPuri can be run stand-alone and is suitable to be run within pipelines. We benchmarked FastqPuri against existing tools and found that FastqPuri is superior in terms of speed, memory usage, versatility and comprehensiveness.

Conclusions: FastqPuri is a new tool which covers all aspects of short read sequence data preprocessing. It was designed for RNA-seq data to meet the needs for fast preprocessing of fastq data to allow transcript and gene counting, but it is suitable to process any short read sequencing data of which high sequence quality is needed, such as for genome assembly or SNV (single nucleotide variant) detection. FastqPuri is most flexible in filtering undesired biological sequences by offering two approaches to optimize speed and memory usage dependent on the total size of the potential contaminating sequences. FastqPuri is available at https://github.com/jengelmann/FastqPuri. It is implemented in C and R and licensed under GPL v3.
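
Illustrative note: the abstract refers to quality filtering of fastq reads. The following minimal Python sketch shows what such a step can look like in general. It is not FastqPuri's implementation or interface: the function names (`read_fastq`, `mean_quality`, `filter_reads`), the mean-quality criterion, the threshold of 20, and the Phred+33 encoding are all assumptions chosen for illustration.

```python
import io

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record fastq stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual):
    """Return the mean Phred score of a quality string (0.0 if empty)."""
    if not qual:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def filter_reads(handle, min_mean_q=20.0):
    """Yield only the records whose mean quality reaches min_mean_q."""
    for header, seq, qual in read_fastq(handle):
        if mean_quality(qual) >= min_mean_q:
            yield header, seq, qual


if __name__ == "__main__":
    # Two hypothetical reads: one high quality ('I' = Q40), one low ('#' = Q2).
    example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n"
    for header, seq, qual in filter_reads(io.StringIO(example)):
        print(header, seq, qual)
```

A mean-quality cutoff is only one possible criterion; preprocessing tools commonly also trim low-quality ends or discard reads with too many uncalled bases.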
