bulk RNA-seq(1):Concatenation of raw read files

bulk RNA-seq(1):Concatenation of raw read files

Concatenation of Raw Read Files

To begin with, my operating environment is Linux Ubuntu 20.04.4, and I am working with 64 raw read fastq.gz files.

What we need to accomplish first is the concatenation of 4 files into a single file. For instance:

File name (before)File name(after)
LAB_273_17_…_L001_R1_001.fastq.gzLAB_273_17_R1.fastq.gz
LAB_273_17_…_L002_R1_001.fastq.gz
LAB_273_17_…_L003_R1_001.fastq.gz
LAB_273_17_…_L004_R1_001.fastq.gz

Tip

Lane Splitting: To increase throughput and reduce experimental bias, a sample’s DNA or RNA library might be sequenced across several lanes. This results in a separate data file for each lane, leading to multiple R1.fastq.gz files for a single sample.

To achieve this in Linux, open the terminal and execute the following script:

for name in *.fastq.gz; do
  printf '%s\n' "${name%_*_*_R[12]*}"
done | uniq |

while read prefix; do
  cat "$prefix"*R1*.fastq.gz > "${prefix}_R1.fastq.gz"
  cat "$prefix"*R2*.fastq.gz > "${prefix}_R2.fastq.gz"
done

Subsequently, verify if the files have been successfully concatenated.

Tags :
comments powered by Disqus

Related Posts

Bulk RNA-seq (4): Mapping and Quality Control with STAR, Qualimap, and featureCounts

Bulk RNA-seq (4): Mapping and Quality Control with STAR, Qualimap, and featureCounts

Note This section of the blog series will guide you through mapping RNA-seq read files to a genome index using STAR, performing quality control (QC) with Qualimap, and generating count data with featureCounts.

Read More
Bulk RNA-seq (5): Streamlining Mapping with a Custom Linux Script

Bulk RNA-seq (5): Streamlining Mapping with a Custom Linux Script

Note To overcome the inconvenience of manually mapping each RNA-seq sample to the reference genome, I’ve developed a Linux shell script.

Read More
What is the RNA-seq analysis?

What is the RNA-seq analysis?

Overall description of RNA-Seq The transcriptome has a high degree of complexity including multiple types of coding and non-coding RNA species, like mRNA, pre-mRNA, microRNA, and long ncRNA.

Read More