CSFG Conferences, Cellulosic Biofuel Network AGM 2010


Grass genomes guide prediction of wheat full length cDNA for transcriptome sequencing analysis

Yong Xu, Michele Frick, Dallas Thomas, André Laroche

Last modified: 2010-03-04

Abstract


A major challenge for triticale and wheat transcriptome sequencing analysis is that no reference, such as a genome sequence or a set of full-length cDNA sequences, is available. In this project, we are developing a pipeline to generate in silico wheat full-length cDNA sequences from the more than 10.6 M wheat EST records available in GenBank. The basic principle is to cluster and assemble wheat ESTs into contigs and then correct errors in them (frame shifts, chimeric sequences) using the protein sequences available from four known grass genomes: Brachypodium, rice, sorghum and maize. After full analysis of the contigs and removal of redundancies, we identified 11,000 full-length sequences, judged by their comparable length to the corresponding orthologs from the four grass genomes. Integrating the wheat full-length CDSs from GenBank (http://www.ncbi.nlm.nih.gov/) and the Triticeae Full-Length CDS DataBase (http://trifldb.psc.riken.jp/index.pl) into our full-length EST-based database led to the identification of 20,000 non-redundant full-length CDS genes, designated TaFLCDS. Using 454 transcriptome data from triticale, we could associate 40% of the reads (n=735,534) with these full-length CDS genes.
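The length-based full-length criterion described above could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the input dictionaries, and the 0.9 to 1.1 length-ratio window are all assumptions, since the abstract does not state what counts as "comparable length".

```python
def is_full_length(contig_len, ortholog_lens, min_ratio=0.9, max_ratio=1.1):
    """Return True if the contig's predicted protein length falls within the
    assumed ratio window of at least one ortholog length.

    min_ratio and max_ratio are hypothetical thresholds, not values from the abstract.
    """
    return any(
        min_ratio <= contig_len / ref_len <= max_ratio
        for ref_len in ortholog_lens
        if ref_len > 0  # skip missing or zero-length references
    )


def select_full_length(contig_lengths, ortholog_lengths):
    """Return the sorted names of contigs that pass the length check.

    contig_lengths:   {contig name: predicted protein length (aa)}
    ortholog_lengths: {contig name: [ortholog protein lengths (aa) from the
                       grass genomes, e.g. Brachypodium, rice, sorghum, maize]}
    Both structures are hypothetical inputs for this sketch.
    """
    return sorted(
        name
        for name, length in contig_lengths.items()
        if is_full_length(length, ortholog_lengths.get(name, []))
    )


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    contigs = {"contig_A": 480, "contig_B": 200, "contig_C": 350}
    orthologs = {"contig_A": [500, 510], "contig_B": [495], "contig_C": []}
    print(select_full_length(contigs, orthologs))
```

In this toy input, only a contig whose length is close to at least one ortholog is retained; a contig with no ortholog hit is never called full length.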

