riboHMM: Comprehensive annotation of translated regions using ribosome footprint profiling data
Understanding the functional effects of gene expression depends critically on accurate and comprehensive annotation of the sequence elements that are translated in each gene. Ribosome profiling provides direct, genome-wide measurements of translation levels in a given cell type. In this talk, I will first introduce riboHMM, a method that 1) models the codon (three-nucleotide) periodicity in ribosome profiling data and 2) integrates RNA sequence information and transcript expression levels to identify translated regions within a transcript. Applying riboHMM to ribosome profiling data collected from human lymphoblastoid cell lines, we identified 7273 novel translated regions, including 2442 translated upstream open reading frames (uORFs) and 2551 coding sequences in transcripts that were previously annotated as non-coding. We observed that more than 60% of the novel coding sequences use non-canonical start codons, and that ~40% of the 2442 translated uORFs are likely to regulate the translation of their downstream coding regions.

Motivated by this observation, I will briefly introduce a second method, riboHMM2, which annotates a comprehensive set of translated uORFs by jointly modelling the fine-scale structure of ribosome profiling data around translated uORFs and their downstream coding regions. Whereas riboHMM could search for translated uORFs only in transcripts with translated downstream coding regions, riboHMM2 can annotate translated uORFs across all transcripts. It also infers the regulatory impact of uORFs on downstream coding regions (e.g., suppression), which can help gene regulation studies uncover the mechanisms by which uORFs act.
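As a rough illustration of what "codon periodicity" in ribosome profiling data means, the sketch below counts how many ribosome P-site positions fall in each of the three reading frames relative to a candidate start position. In a translated region, most positions should fall in one frame. This is not riboHMM's model (riboHMM is a hidden Markov model that also uses sequence and expression information); it assumes P-site positions have already been inferred from footprint reads, and the function name `frame_fractions` is hypothetical.

```python
from collections import Counter


def frame_fractions(psite_positions, cds_start):
    """Return the fraction of P-site positions in frames 0, 1, 2.

    Frames are defined relative to the hypothetical candidate start
    position `cds_start`. A strong bias toward one frame reflects the
    three-nucleotide periodicity expected from actively translating
    ribosomes.
    """
    # Reading frame of each position = offset from the start, modulo 3
    counts = Counter((pos - cds_start) % 3 for pos in psite_positions)
    total = sum(counts.values())
    if total == 0:
        # No footprints: no evidence either way
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]


# Toy P-site positions, mostly in frame 0 relative to position 100
positions = [100, 103, 106, 109, 101, 112, 115, 104]
print(frame_fractions(positions, cds_start=100))  # [0.75, 0.25, 0.0]
```

A real annotation method must also handle noisy, sparse counts and choose among overlapping candidate frames and start codons, which is why a probabilistic model is used rather than a simple frame tally.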
Dr Heejung Shim, University of Melbourne