Predicting RNA 3D Structures with Motivus
Rodrigo Inostroza, 2020-12-29
What is RNA?
Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes.
For years, RNA was thought of as a simple messenger and a path between DNA and proteins. In recent decades, it has been found to be much more than a mere messenger and translator of the cell's genetic information. It is now known that RNA can regulate the functions of different proteins or even perform functions by itself. Its enzymatic and regulatory functions have been observed in a variety of cellular processes, giving it a major role in evolution and cellular metabolism.
"For a thorough understanding of these functions, an insight on the three-dimensional structure of RNA molecules is of crucial importance. Nevertheless, the reliable prediction of the full structure of an RNA motif based uniquely on its sequence is still a challenging aim. More than 100,000 structures are currently available in the Protein Data Bank; however, RNA-containing structures take up less than 6% of these depositions, including RNA structures complexed with other molecules."
RNA-Puzzles Round II: assessment of RNA structure prediction programs applied to three large RNA structures.
Despite the new technological advances of the 21st century, determining and analyzing RNA structures are still difficult and time-consuming tasks. The size, complexity, and specific detail of RNA 3D structures have been studied using nuclear magnetic resonance, electron microscopy, and crystallography. These techniques require multiple stages to perform in a laboratory.
This is why a program that can produce the full structure of an RNA motif, based uniquely on its sequence, is so significant. Accurate prediction and refinement of 3D structures would support better and more efficient RNA laboratory work.
How does RNA 3D prediction work?
To build an RNA model, the atoms that make up the RNA are grouped by component (nitrogenous base, sugar group, or phosphate group), and each group is represented by a single shape. For example, all the atoms that make up the phosphate group of a given nucleotide are represented by one sphere. This is known as a coarse-grained model, which is a good option for representing complex molecular systems like RNA because it tracks groups of atoms rather than every individual atom.
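The grouping idea can be illustrated with a minimal Python sketch. It is not the SPQR representation: the atom records, the coordinates, the simplified rule in group_of, and the choice of the centroid as the bead position are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical atom records: (nucleotide index, atom name, x, y, z).
# Atom names follow common PDB conventions; the coordinates are made up.
ATOMS = [
    (1, "P",   0.0, 0.0, 0.0),
    (1, "OP1", 1.0, 0.0, 0.0),
    (1, "OP2", 0.0, 1.0, 0.0),
    (1, "C1'", 3.0, 3.0, 0.0),
    (1, "C4'", 5.0, 3.0, 0.0),
    (1, "N9",  6.0, 6.0, 2.0),
    (1, "C8",  8.0, 6.0, 2.0),
]

def group_of(atom_name):
    """Assign an atom to phosphate, sugar, or base (simplified, assumed rule)."""
    if atom_name in {"P", "OP1", "OP2", "OP3"}:
        return "phosphate"
    if atom_name.endswith("'"):  # primed names belong to the ribose ring
        return "sugar"
    return "base"

def coarse_grain(atoms):
    """Replace each (nucleotide, group) set of atoms by one bead at its centroid."""
    buckets = defaultdict(list)
    for nucleotide, name, x, y, z in atoms:
        buckets[(nucleotide, group_of(name))].append((x, y, z))
    beads = {}
    for key, coords in buckets.items():
        n = len(coords)
        beads[key] = tuple(sum(c[axis] for c in coords) / n for axis in range(3))
    return beads

beads = coarse_grain(ATOMS)
for key, position in sorted(beads.items()):
    print(key, position)
```

Seven atoms collapse into three beads for this nucleotide, which is the kind of reduction that makes long simulations affordable.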
Modeling RNA sequences can be very time-consuming if inappropriate algorithms are used in the simulation, which is where the solution proposed by Dr. Simon Poblete comes in: an algorithm that uses divide-and-conquer and Monte Carlo techniques to implement the simulation (SPQR-MC simulation). It works well for this type of problem, where the RNA structure starts from a sequence of nitrogenous bases.
"The SPQR code represents RNA through its nitrogen base, sugar group or phosphate group. Motivus uses this code to explore what happens with a certain RNA sequence, when it, for example, is thrown into a cup of water. Motivus will then give you a 3D structure in a file, with the different positions of all the elements", says Poblete.
For example, the sequence "GGGCGCAAGCCU" is initialized as a disordered 3D structure. The simulation in Motivus then iterates over it until it reaches its minimum-energy state.
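As a rough illustration of this kind of iteration, here is a generic Metropolis Monte Carlo sketch in Python. It is not the SPQR-MC algorithm: the energy function (springs between consecutive beads), the one-bead-per-nucleotide representation, and every parameter value are assumptions chosen only to show a disordered start being driven toward lower energy. It also runs for a fixed number of steps rather than testing for convergence.

```python
import math
import random

SEQUENCE = "GGGCGCAAGCCU"

def toy_energy(coords):
    """Hypothetical energy: springs of rest length 1.0 between consecutive beads."""
    return sum((math.dist(a, b) - 1.0) ** 2 for a, b in zip(coords, coords[1:]))

def random_start(n_beads, box=10.0, seed=0):
    """Disordered starting structure: one bead per nucleotide, placed at random."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(n_beads)]

def metropolis(coords, steps=5000, step_size=0.5, temperature=0.1, seed=1):
    """Propose random single-bead moves; accept them with the Metropolis rule."""
    rng = random.Random(seed)
    coords = [list(c) for c in coords]
    energy = toy_energy(coords)
    best, best_energy = [c[:] for c in coords], energy
    for _ in range(steps):
        i = rng.randrange(len(coords))
        old = coords[i]
        coords[i] = [v + rng.uniform(-step_size, step_size) for v in old]
        new_energy = toy_energy(coords)
        delta = new_energy - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best, best_energy = [c[:] for c in coords], energy
        else:
            coords[i] = old  # reject the move and restore the bead
    return best, best_energy

start = random_start(len(SEQUENCE))
start_energy = toy_energy(start)
final, final_energy = metropolis(start)
print(f"energy before: {start_energy:.2f}, after: {final_energy:.4f}")
```

Accepting some uphill moves is what lets a Monte Carlo search escape poor local arrangements instead of getting stuck in the first one it finds.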
The role of distributed computing through Motivus
SPQR runs on the Motivus framework as a set of simultaneous simulations. Let's say that you want to do a calculation that involves four different conformations of an RNA; they will have to run simultaneously to reach an accurate result. For example, one simulation connects the different components, another removes knots and errors, another one remodels, and another one minimizes the structure.
The SPQR-based algorithm then works through Motivus as a black box, in which different calculations that inform each other happen simultaneously. These simulations are sent through the Motivus framework to different users all over the world, whose personal devices therefore function as data processing nodes.
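The fan-out pattern can be sketched locally in Python. This does not use the Motivus API: a thread pool from the standard library stands in for remote nodes, run_simulation is a hypothetical placeholder that returns a made-up energy, and the tasks are treated as independent, which simplifies the mutually informing simulations described above.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_simulation(task):
    """Hypothetical stand-in for one simulation; returns a fake energy value."""
    sequence, seed = task
    rng = random.Random(seed)
    return {"sequence": sequence, "seed": seed, "energy": rng.uniform(-10.0, 0.0)}

def distribute(sequence, n_tasks, n_workers):
    """Send one task per seed to a pool of workers and collect every result."""
    tasks = [(sequence, seed) for seed in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_simulation, tasks))  # preserves task order
    best = min(results, key=lambda r: r["energy"])
    return best, results

best, results = distribute("GGGCGCAAGCCU", n_tasks=8, n_workers=4)
print("lowest-energy run used seed", best["seed"])
```

In a real volunteer network the workers would be remote devices and results would arrive out of order, so the coordinator would also need retry and validation logic that this sketch omits.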
In predicting RNA 3D structures through SPQR, parallel data processing is essential. You can have a workstation with 32 processors, where each processor takes 2 hours per simulation, but if your structures require 700 simulations, the computer would take about 2 days to finish the calculations. On the other hand, if you have 700 computers or nodes, even if they are significantly slower than the workstation, the calculations could run simultaneously and deliver results in as little as 10 hours.
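The arithmetic behind these figures can be checked with a short Python calculation. It assumes that simulations run in full batches, one per processor, and that a volunteer node needs 10 hours per simulation (five times slower than a workstation processor); the function name and that node speed are illustrative assumptions, not measurements.

```python
import math

def wall_clock_hours(n_simulations, n_workers, hours_per_simulation):
    """Total elapsed time when simulations run in batches of n_workers."""
    batches = math.ceil(n_simulations / n_workers)
    return batches * hours_per_simulation

# Workstation: 700 simulations over 32 processors, 2 hours each.
workstation = wall_clock_hours(700, 32, 2)    # 22 batches * 2 h = 44 h

# Volunteer network: 700 nodes, each assumed to need 10 hours per simulation.
volunteers = wall_clock_hours(700, 700, 10)   # 1 batch * 10 h = 10 h

print(f"workstation: {workstation} h (about {workstation / 24:.1f} days)")
print(f"volunteer nodes: {volunteers} h")
```

The workstation needs 22 batches (700 / 32 rounded up), so 44 hours in total, which matches the roughly 2 days quoted above, while 700 slower nodes finish in a single 10-hour batch.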
"There is no limit to what can be achieved. Motivus took my code and implemented it differently so that it rests on their server. Then, if I or any scientist in the world has a calculation they want to do, they can enter the sequence in the Motivus framework and then distribute it all over the world to be calculated."
"There are very interesting things that my model can offer, and that is why I want people to be able to use it. I'm not satisfied with only publishing things, I want them to be utilized. Hopefully it will inspire other scientists to solve different problems".
Dr. Simon Poblete is part of the Physics Sciences and Mathematics Institute of Universidad Austral de Chile.