Published on: **Mar 3, 2016**

Source: www.slideshare.net

- 1. “NaRCiSuS” Noncoding RNA Comparative Searching System Marie Curie European Reintegration Grants (ERG) Call: FP7-PEOPLE-RG-2009
- 2. What are noncoding RNAs? • RNA transcripts • without a long translatable open reading frame • highly structured • realising their functions as RNAs
- 3. What do they do? • RNA-protein machines: – Transfer RNA (tRNA). – Ribosomal RNA (rRNA). – Small nuclear RNAs (snRNAs) in the spliceosome. • Catalytic RNAs (ribozymes): catalyzing chemical reactions. • Micro RNAs (miRNAs): regulatory roles. • Small interfering RNAs (siRNAs): RNA silencing – The genome’s immune system. [Plasterk, Science (2002)] – Named Breakthrough of the Year by Science magazine in 2002.
- 4. Protein coding gene prediction 1. Prokaryota • translation initiation site • ribosome binding site (Shine-Dalgarno sequence) • ORF features – length, similarity to known proteins, codon usage, coding capacity 2. Eukaryota • Kozak sequence • poly-A site • 3’, 5’ UTR • exon-intron location, splice sites
- 5. ncRNA gene prediction ? • no translation initiation site • no ribosome binding site • no Kozak sequence • no significant ORF
- 6. ncRNA gene prediction ? • base composition statistics • simple translated regions search • secondary structure based search • combined comparative approach
- 7. RNA secondary structure • ncRNA is not a random sequence. • Most RNAs fold into particular base-paired secondary structure. • Canonical basepairs: – Watson-Crick basepairs: • G - C • A - U – Wobble basepair: • G – U
- 8. RNA secondary structure • Stacks: continuous nested basepairs. (energetically favorable) • Non-basepaired loops: – Hairpin loop. – Bulge. – Internal loop. – Multiloop.
- 9. RNA secondary structure • Most basepairs are non-crossing basepairs. – Any two pairs (i, j) and (i’, j’) are either nested (i < i’ < j’ < j or i’ < i < j < j’) or disjoint (j < i’ or j’ < i). • Pseudoknots are formed by crossing basepairs.
- 10. Pseudoknots • Pseudoknots are important for certain ncRNAs. • They violate the non-crossing assumption. • Pseudoknots make most problems harder. • We assume there are no pseudoknots unless otherwise noted. [Rivas and Eddy (1999)]
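The non-crossing condition from slides 9 and 10 can be checked mechanically. The following is a minimal Python sketch, assuming a structure is given as a list of base pairs (i, j) of sequence positions; the function names `crosses` and `has_pseudoknot` are illustrative and not taken from the slides.

```python
from itertools import combinations


def crosses(p, q):
    """Return True if base pairs p and q cross, i.e. i < k < j < l."""
    # Normalise each pair to (smaller, larger), then order the two pairs
    # by their left end so that i <= k always holds.
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l


def has_pseudoknot(pairs):
    """Return True if any two base pairs in the structure cross."""
    return any(crosses(p, q) for p, q in combinations(pairs, 2))


if __name__ == "__main__":
    print(has_pseudoknot([(1, 10), (2, 9)]))  # nested pairs
    print(has_pseudoknot([(1, 5), (3, 8)]))   # crossing pairs
```

Nested and disjoint pairs both fail the `i < k < j < l` test, so only crossing pairs are reported. The pairwise check is quadratic in the number of base pairs, which is acceptable for a sketch but not tuned for long sequences.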
- 11. ncRNA evolution is constrained by its secondary structure http://www.sanger.ac.uk/Software/Rfam/ • Drastic sequence changes can be tolerated. • Compensatory mutations are very common. – One basepair mutates into another basepair. – Doesn’t change its secondary structure. • In this talk: ncRNA – conserved structured RNA. • [Figure: alignment of tRNA1 and tRNA2 showing a compensatory mutation]
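To make the idea of a compensatory mutation concrete, here is a small Python sketch. It assumes two gap-free aligned RNA sequences and a shared list of base-paired columns (0-based). It reports the pairs where both bases differ between the sequences while both sequences still form a canonical or wobble pair. The function name and the toy sequences are hypothetical, not data from the talk.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair, as listed on slide 7.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def compensatory_pairs(seq1, seq2, pairs):
    """Return paired columns (i, j) where both bases changed but pairing is kept."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for i, j in pairs:
        both_changed = seq1[i] != seq2[i] and seq1[j] != seq2[j]
        both_pair = (seq1[i] + seq1[j]) in CANONICAL and (seq2[i] + seq2[j]) in CANONICAL
        if both_changed and both_pair:
            result.append((i, j))
    return result


if __name__ == "__main__":
    # Toy hairpin: three stacked pairs around an AAA loop.
    stem = [(0, 8), (1, 7), (2, 6)]
    print(compensatory_pairs("GGGAAACCC", "GAGAAACUC", stem))
```

In the toy example the second stem pair changes from G-C to A-U, so the sequence differs at two columns while the structure is unchanged.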
- 12. Secondary structure alone is not enough • de novo prediction: – Find stable secondary structure from genome. [Shapiro et al. (1990)] • The stability of ncRNA secondary structure is not sufficiently different from the predicted stability of a random sequence. [Rivas and Eddy (2000)]. – Look for transcript signals. [Wassarman et al. (2001), Argaman et al. (2000)] • ncRNA transcript signals are not strong compared with protein coding gene signals (open reading frame, promoter). [Rivas and Eddy (2000)].
- 13. RNA secondary structure prediction • It is a basic issue in ncRNA analysis. • It provides important information for biologists. • Searching and alignment algorithms are based on these models. • RNA secondary structure -- a set of non-crossing base pairs.
- 14. Base pair maximization problem • A simple energy model is to maximize the number of basepairs in order to minimize the free energy. [Waterman (1978), Nussinov et al (1978), Waterman and Smith (1978)] • G – C, A – U, and G – U are treated as equally stable. • Contributions of stacking are ignored.
- 15. A dynamic programming solution • Let s[1…n] be an RNA sequence. • δ(i,j) = 1 if s[i] and s[j] form a complementary base pair, else δ(i,j) = 0. • M(i,j) is the maximum number of base pairs in s[i…j]. [Nussinov (1980)]
- 16. A dynamic programming solution • M(1,n) is the number of base pairs in the optimal basepaired structure for s[1…n]. • All these basepairs can be found by tracing back through the matrix M. • Filling M needs O(n^3) time.
- 17. RNA structure: example • Sequence: A C G A U U (positions 1 to 6) • [Figure: filled matrix M(i, j) for this sequence, rows i and columns j; entries range from 0 to 3]
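The base pair maximization scheme of slides 14 to 17 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that G-C, A-U and G-U pairs each count 1, that stacking is ignored, and that no minimum hairpin loop length is enforced (so adjacent bases may pair). Indices are 0-based, whereas the slides use 1-based positions; `nussinov` and `traceback` are illustrative names.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq):
    """Fill M, where M[i][j] is the maximum number of base pairs in seq[i..j]."""
    n = len(seq)
    M = [[0] * n for _ in range(n)]
    for span in range(1, n):              # distance j - i, short intervals first
        for i in range(n - span):
            j = i + span
            best = max(M[i + 1][j], M[i][j - 1])       # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:               # i pairs with j
                inner = M[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            for k in range(i + 1, j):                   # split into two parts
                best = max(best, M[i][k] + M[k + 1][j])
            M[i][j] = best
    return M


def traceback(seq, M):
    """Recover one optimal set of base pairs by walking back through M."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        inner = M[i + 1][j - 1] if i + 1 <= j - 1 else 0
        if M[i][j] == M[i + 1][j]:
            stack.append((i + 1, j))
        elif M[i][j] == M[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and M[i][j] == inner + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if M[i][j] == M[i][k] + M[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


if __name__ == "__main__":
    seq = "ACGAUU"
    M = nussinov(seq)
    print(M[0][len(seq) - 1], traceback(seq, M))
```

The three nested loops over `span`, `i` and `k` give the O(n^3) running time stated on slide 16. Real folding programs add a minimum loop length and an energy model, which this sketch deliberately leaves out.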
- 18. Zuker-Sankoff minimum energy model • Stacks (contiguous nested base pairs) are the dominant stabilizing force – they contribute negative energy. • Unpaired bases form loops, which contribute positive energy. – Hairpin loops, bulge/internal loops, and multiloops. • Zuker-Sankoff minimum energy model. [Zuker and Sankoff (1984), Sankoff (1985)] • Mfold and ViennaRNA are both based on this model (it is also called the mfold model).
- 19. Zuker-Sankoff minimum energy model [Lyngsø (1999)] • Hairpin loop closed by (i, j): eH(i,j) • Stacking of (i, j) on (i+1, j-1): eS(i,j,i+1,j-1) • Bulge/internal loop between (i, j) and (i’, j’): eL(i,j,i’,j’) • Multiloop (example in figure): a+3*b+4*c
- 20. RNA minimum energy problem • This problem can be solved by a dynamic programming algorithm in O(n^4) time. • Lyngsø et al. (1999) revised the energy function for internal loops and proposed an O(n^3) time solution.
- 21. Zuker-Sankoff model
- 22. Recursive functions • W(i) holds the minimum energy of a structure on s[1…i]. • V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair. • WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.
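The recursion formulas themselves did not survive in this transcript. As a hedged reconstruction, one common textbook formulation that matches the definitions of W, V and WM and the energy terms eH, eS, eL and a, b, c named on the slides is given below. The exact index ranges, and whether the closing pair of a multiloop is charged the per-branch penalty b, are assumptions and may differ from the original slides.

```latex
\begin{aligned}
W(0) &= 0, \\
W(i) &= \min\Big\{\, W(i-1),\ \min_{1 \le k < i} \big[\, W(k-1) + V(k,i) \,\big] \Big\}, \\
V(i,j) &= \min\Big\{\, eH(i,j),\ \ eS(i,j,i+1,j-1) + V(i+1,j-1), \\
       &\qquad\quad \min_{i < i' < j' < j} \big[\, eL(i,j,i',j') + V(i',j') \,\big], \\
       &\qquad\quad \min_{i+1 < k < j} \big[\, WM(i+1,k-1) + WM(k,j-1) \,\big] + a + b \,\Big\}, \\
WM(i,j) &= \min\Big\{\, V(i,j) + b,\ \ WM(i,j-1) + c,\ \ WM(i+1,j) + c, \\
        &\qquad\quad \min_{i < k \le j} \big[\, WM(i,k-1) + WM(k,j) \,\big] \Big\}.
\end{aligned}
```

Here a is the multiloop offset, b the penalty per branch and c the penalty per unpaired base, consistent with the a+3*b+4*c example on slide 19. The internal loop term minimizes over all inner pairs (i', j'), excluding the stacked pair (i+1, j-1), which is what gives the O(n^4) bound quoted on slide 20. The minimum free energy of the whole sequence is W(n).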
- 26. Idea of the project • An integrated bioinformatics platform specifically designed for detecting, verifying, and classifying noncoding RNAs. • This complex approach to "Computational RNomics" will provide a pipeline capable of detecting RNA motifs with low sequence conservation. • It will also integrate RNA motif prediction, which should significantly improve the quality of the RNA homolog search. • The secondary structure of an RNA can be represented as a graph. • Graphs can distinguish RNAs of known families from the Rfam database.
- 27. Goals • Using expressed sequence tags and profile-profile, protein-like methods for predicting ORFs. • The candidates will be folded “ab initio” using the method proposed by Zuker (Mfold). • The secondary structures of the candidates can then be compared to sequences from Rfam or an ncRNA database to find novel ncRNAs or detect missing members of already known families.
- 28. Service Features • Prediction of missing members of already defined RNA families. • Description of ncRNA properties and measurement of the confidence level of their functional importance. • Prediction of novel ncRNAs “ab initio”. • Visualization of secondary and tertiary structure “online”. • Exchange formats for RNA annotation. • Identification of interactions with coding genes. • Design of novel RNAs for therapy and drug delivery (RNAi).
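The slides do not say which graph representation the project uses. As one simple possibility, the sketch below turns a pseudoknot-free structure written in dot-bracket notation (an assumption, the slides do not mention this format) into an undirected graph. Its nodes are nucleotide positions and its edges are backbone neighbours plus base pairs. The function name `dotbracket_to_graph` is hypothetical.

```python
def dotbracket_to_graph(structure):
    """Return an adjacency dict {position: set of neighbours}, 0-based."""
    n = len(structure)
    adj = {i: set() for i in range(n)}
    # Backbone edges between consecutive nucleotides.
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    # Base-pair edges: match each ')' with the most recent unmatched '('.
    open_positions = []
    for i, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                raise ValueError("unbalanced structure: extra ')'")
            j = open_positions.pop()
            adj[i].add(j)
            adj[j].add(i)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if open_positions:
        raise ValueError("unbalanced structure: extra '('")
    return adj


if __name__ == "__main__":
    graph = dotbracket_to_graph("((...))")
    for node in sorted(graph):
        print(node, sorted(graph[node]))
```

Coarser encodings, for example one node per stem or loop, are also widely used for comparing RNA families. This nucleotide-level graph is only the most direct choice for illustration.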
- 29. SERVER [Figure: server pipeline] • Steps: 1. Remove repeats 2. Searching for sequences of high functionality 3. Searching for coding sequences 4. Comparison of expression patterns 5. Searching for sequences of well defined structure or containing compensatory mutations 6. Searching for remaining genes • Tools shown: RepeatMasker, BLASTz, tBLASTn, BLASTn, mRNA/EST, %GC, PromH, Loop Scaner, QRNA, mFOLD • Data sources shown: NCBI, Sanger, Human and Mouse genomes, PFAM, NR, EST