Identification of extracellular fungal enzymes via de novo assembly of short-read transcript sequence data
Last modified: 2010-03-05
Abstract
Whole transcriptome shotgun sequencing, or “RNA-seq”, is a major application of recent next-generation sequencing technologies. The technique yields precise measurements of the RNA content of a biological sample, allowing, for example, the quantitative assessment of differential gene expression and the refinement of gene models. These applications rely on knowledge of the sequenced genome of the organism of interest, because transcript sequence reads are aligned to the genomic sequence. De novo assembly of transcript sequence reads, i.e. assembly without use of the genomic sequence, has in the past proved computationally difficult because transcript abundance varies widely. We have used the Illumina sequencing technology and the Velvet sequence assembly software to successfully assemble short RNA reads (39 to 84 bp), producing hundreds of full-length transcripts for several fungal species. We concentrate on enzymes secreted by biomass-degrading fungi because they are of interest to the biofuel community. Using the contiguous transcript sequences produced by our de novo assembly, we have developed a computational pipeline to semi-automatically identify extracellular fungal enzymes. The pipeline relies on a sequence database of lignocellulose-active enzymes, against which candidate transcript sequences are searched for close matches. Matches are classed as full-length transcripts when suitable start and stop codons encompass the region of sequence alignment with known enzyme sequences. The presence or absence of a secretory signal peptide is checked with the SignalP software, and cloning primers are automatically generated to aid the downstream characterization of potential target enzymes.
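The full-length check described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function name `find_full_length_orf`, the 0-based half-open coordinates, and the rule of taking the most upstream in-frame ATG are all assumptions made for illustration. The sketch also assumes that the alignment lies on the forward strand of the contig and that its first base is the first base of a codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_full_length_orf(transcript, aln_start, aln_end):
    """Return (orf_start, orf_end) if start and stop codons enclose the
    aligned region [aln_start, aln_end), otherwise None.

    Hypothetical helper: coordinates are 0-based, half-open, and
    aln_start is assumed to be the first base of a codon.
    """
    seq = transcript.upper()
    if not (0 <= aln_start < aln_end <= len(seq)):
        raise ValueError("alignment coordinates fall outside the transcript")

    # Walk upstream one codon at a time in the alignment's reading frame.
    # Remember the most upstream ATG seen before an in-frame stop codon
    # (or the 5' end of the contig) interrupts the frame.
    orf_start = None
    pos = aln_start
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == START_CODON:
            orf_start = pos
        pos -= 3
    if orf_start is None:
        return None  # no start codon: the contig is truncated at the 5' end

    # Walk downstream from the start codon to the first in-frame stop codon.
    for pos in range(orf_start + 3, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            orf_end = pos + 3
            # The stop codon must lie at or beyond the end of the alignment.
            return (orf_start, orf_end) if orf_end >= aln_end else None
    return None  # no stop codon: the contig is truncated at the 3' end


# Toy contig: 2 nt of 5' UTR, ATG, three codons, TAA, 2 nt of 3' UTR.
contig = "CC" + "ATG" + "AAACCCGGG" + "TAA" + "GG"
orf = find_full_length_orf(contig, aln_start=5, aln_end=14)
```

In this toy case the aligned region covers only the three internal codons, and the sketch extends it to the enclosing ATG and TAA. A contig that returns `None` would be treated as a partial transcript under these assumptions.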