RNAmining is a web tool for predicting the coding potential of nucleotide sequences. It takes user-provided sequences in FASTA format. The tool was implemented using the XGBoost machine learning algorithm. Machine learning is a subfield of computer science that developed from the study of pattern recognition and computational learning theory in artificial intelligence. The tool operates through a model obtained by training: the algorithm analyzes the training data and produces an inferred function, which can then be used to map new examples.
You need to upload your RNA sequences in FASTA format; see the image example below:
The algorithm begins by reading the RNA sequences provided in the uploaded file. It then proceeds in two main parts: preprocessing and prediction. In preprocessing, the tool computes the trinucleotide frequencies of each RNA sequence and normalizes them by the sequence's length. The result is saved in a file, which is used as input for the second part. In prediction, because the user provides the organism type (e.g. Homo sapiens), the tool selects the organism-specific model trained with XGBoost and performs the prediction. The results are shown in the platform and can be downloaded as a .zip file.
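The preprocessing step described above can be illustrated with a minimal Python sketch. It is not RNAmining's actual code: the function names `read_fasta` and `trinucleotide_frequencies` are hypothetical, and the sketch assumes an overlapping sliding window, that U is treated as T, that windows containing other characters (such as N) are skipped, and that "normalizing by length" means dividing each count by the sequence length. The example sequence is made up for illustration.

```python
from itertools import product
from typing import Dict, Iterable, List, Optional

# All 64 trinucleotides over A, C, G, T (assumption: U is mapped to T).
TRINUCLEOTIDES: List[str] = ["".join(p) for p in product("ACGT", repeat=3)]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect {sequence id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Use the first word after '>' as the sequence id.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # A sequence may span several lines; concatenate them.
            records[current] += line
    return records


def trinucleotide_frequencies(sequence: str) -> Dict[str, float]:
    """Count overlapping trinucleotides and divide by the sequence length."""
    seq = sequence.upper().replace("U", "T")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if window in counts:  # skip windows with ambiguous bases
            counts[window] += 1
    length = len(seq)
    return {tri: (counts[tri] / length if length else 0.0) for tri in TRINUCLEOTIDES}


if __name__ == "__main__":
    # Hypothetical input, split over two lines as FASTA files often are.
    example = [">seq1 hypothetical example", "AUGGCU", "AGC"]
    for seq_id, seq in read_fasta(example).items():
        features = trinucleotide_frequencies(seq)
        nonzero = {k: round(v, 3) for k, v in features.items() if v > 0}
        print(seq_id, nonzero)
```

Each sequence yields a fixed-length vector of 64 values, which is the kind of input a gradient-boosted classifier such as XGBoost can consume. Whether the tool uses overlapping windows or this exact normalization is not stated in the text, so treat those choices as placeholders.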
Non-coding RNAs (ncRNAs) are RNA molecules that are not translated into protein, but they are important players in cellular regulation in organisms from different kingdoms. Research interest in non-coding RNAs has therefore increased dramatically in recent years. Their investigation is routine in every transcriptome or genome project, since mutations or misregulation affecting them can result in disorders such as tumor formation (cancerous or otherwise), cardiovascular and neurological diseases, and other human illnesses. An important step in ncRNA research is therefore the ability to distinguish coding from non-coding sequences.
RNAmining was therefore built to give researchers without programming experience easy access to coding potential prediction for nucleotide sequences. Additionally, the results are easy to interpret.