The computational analysis of proteins reveals that many repetitive sequences are shared across proteins and show similarities in species from bacteria to humans.
MIT biologists have found a new method to study proteins that have similar sequences as a unified group. The researchers used dot plots and dimensionality reduction to identify “low-complexity regions” (LCR)/copy relationships systematically and create a map of low-complexity regions sequence space capable of integrating LCR features and functions. The study provides insight into how LCR types and copy numbers contribute to higher order assemblies by defining LCR relationships across the proteome, such as the importance of K-rich LCR copy numbers on the assembly of nucleolar protein RPA43.
There are many biological processes involving low-complexity regions, however, a comprehensive understanding of their sequences, features, relationships, or functions is missing. There are about 70% of human proteins that contain more than one sequence, composed of a single amino acid repeated many times with a few additional amino acids scattered in. In most organisms, these LCRs are also present. These proteins have many different functions, but the new method will help scientists learn more about them. This technique allows them to look at the similarities and differences between different species’ LCRs and helps them figure out what functions these sequences have and in what proteins they are found.
A method developed by the researchers was used to analyze all proteins found in eight different species, including bacteria and humans. The low-complexity regions vary between proteins and species, but they usually share a common function – assisting the protein they are found in to join larger assemblies, such as the nucleolus, an organelle found in almost every human cell.
According to Byron Lee, an MIT graduate student, rather than focusing on specific LCRs and their functions, which may seem different since they are involved in different processes, our broader approach allows us to see similarities between their properties, indicating that perhaps LCR functions are not as disparate as they seem.
There were also some differences between the LCR sequences of different species, and these species-specific sequences were associated with species-specific functions, including the formation of plant cell walls.
Computational analysis of large biological datasets
According to earlier research, LCRs are involved in a variety of cellular processes, such as cell adhesion and DNA binding. These LCRs are frequently abundant in single amino acids, such as alanine, lysine, or glutamic acid.
Studying these sequences individually is time-consuming, so the MIT team applied bioinformatics approaches for analyzing large sets of biological data.
According to Jaberi-Lashkari, rather than focusing on individual LCRs, the researchers took a step back to check if they could see some patterns on a larger scale instead of just looking at individual ones. Taking a closer look might allow us to understand better what the ones with assigned functions do and also provide some insight into what the ones without assigned functions do.
To generate images of each protein, the researchers used a technique called dot plot matrix, which shows amino acid sequences visually. A computational image processing method was then used to compare thousands of these matrices simultaneously.
This method enabled the researchers to categorize LCRs according to their most frequent amino acid repeats. Proteins containing LCRs were also grouped according to the number of copies of each LCR type. This analysis helped the researchers gain a deeper understanding of the functions of these LCRs.
The human protein picked for demonstration by the researchers, known as RPA43, has three lysine-rich LCRs. The protein is one of many subunits that make up the enzyme RNA polymerase 1, which synthesizes ribosomal RNA. According to the researchers, Lysine-rich LCRs can facilitate protein integration into the nucleolus, the organelle responsible for synthesizing ribosomes.
Highly conserved LCRs
The sequences of some LCR types have not changed much over evolution, suggesting that they are highly conserved among species. Many of these sequences can also be found in highly conserved elements of proteins and cell structures, such as nucleolus.
According to Lee, sequences like these appear to be essential for nucleolus assembly. Several of the principles known to be essential for higher-order assembly appears to be at work since copy numbers, which can influence the number of interactions a protein can have, are necessary for it to embed in that compartment.
Additionally, the researchers found differences between the LCRs of two kinds of nucleolus-assembling proteins. They found that a nucleolar protein known as TCOF contains many glutamine-rich LCRs that can assist in scaffolding assembly formation, while nucleolar proteins with a few of these glutamic acid-rich LCRs can be recruited to act as clients.
Nuclear speckle, which is found within the nucleus of cells, also appears to have many conserved LCRs. Moreover, researchers discovered many similarities between LCRs and those involved in forming large-scale assemblies, like the extracellular matrix, which provides structural support to animals and plants.
Moreover, the team found examples of structures with LCRs that seem to have diverged among species. There are distinctive LCR sequences in the proteins of plants that form their cell walls, and these sequences are not seen in other organisms.
The researchers are now looking forward to expanding their LCR analysis to additional species.
According to Lee, since this map can be expanded to essentially any species, there’s a lot more to discover. It provides researchers with the opportunity and framework to identify new biological assemblies.
Freely available courses to learn each and every aspect of bioinformatics.
Stay updated with the latest discoveries in the field of bioinformatics.
Dr. Tamanna Anwar is a Scientist and Co-founder of the Centre of Bioinformatics Research and Technology (CBIRT). She is a passionate bioinformatics scientist and a visionary entrepreneur. Dr. Tamanna has worked as a Young Scientist at Jawaharlal Nehru University, New Delhi. She has also worked as a Postdoctoral Fellow at the University of Saskatchewan, Canada. She has several scientific research publications in high-impact research journals. Her latest endeavor is the development of a platform that acts as a one-stop solution for all bioinformatics related information as well as developing a bioinformatics news portal to report cutting-edge bioinformatics breakthroughs.