MIReStruC

MIReStruC: A tool to find microRNA structural clusters at genome scale and from deep sequencing data.

Overview:

MIReStruC predicts structural clusters of miRNAs along a genomic sequence and reports, for each cluster, the positions of the miRNA/pre-miRNA pairs that compose it. It can be run in three different ways, each with a pre-treatment specific to the kind of input data:

(1) an ab initio sequence analysis, which looks for repeated sequences in palindromic regions (black path, Figure 1 of the article);
(2) a structural analysis, which considers deep sequencing reads as potential miRNAs (red path, Figure 1 of the article);
(3) a combination of sequence analysis and deep sequencing data, which finds structural clusters from deep sequencing reads and from multiple palindromic sequences (green path, Figure 1 of the article).

After the pre-treatment appropriate to the input data, the algorithm filters the potential miRNA structural clusters using five combinatorial and structural criteria that describe acceptable pre-miRNAs.

Download

The MIReStruC-1.0 package can be downloaded here.

You can unpack the archive with the command:

tar xzf MIReStruC-1.0.tar.gz

System requirements:

MIReStruC version 1.0 has been developed under Ubuntu Linux operating system.
The bash environment should be installed.
The Python executable should be accessible from the PATH environment variable.
The awk executable should be accessible from the PATH environment variable.
MIReStruC includes C source code, so you need a build environment that passes the checks performed by the ./configure script.
RNAfold (version 1.8x) should be pre-installed on the operating system and be accessible from the PATH environment variable.
RNAeval should be pre-installed on the operating system and be accessible from the PATH environment variable.
If you want to use MIReStruC with deep sequencing data, you will need to use MicroRazerS.
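
The requirements above can be checked before compiling with a short script. The following is a minimal sketch, not part of the MIReStruC package. The executable names are assumptions: "micro_razers" is assumed to be the name of the MicroRazerS binary, and "python" may be installed as "python3" on recent systems. The script only checks that each tool is on the PATH; it does not verify the RNAfold version.

```python
import shutil

# Executable names are assumptions, not taken from the MIReStruC package:
# adjust them to match your installation (e.g. "python3", "gawk").
REQUIRED_TOOLS = ["bash", "python", "awk", "RNAfold", "RNAeval"]
# Only needed when working with deep sequencing data.
DEEP_SEQ_TOOLS = ["micro_razers"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS + DEEP_SEQ_TOOLS):
        print(f"not found on PATH: {tool}")
```

If the script prints nothing, every listed executable was found.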

How to compile the MIReStruC C source code:

To compile the C sources used by MIReStruC, type the following commands (see also the INSTALL file for more information):

cd [MIReStruC-1.0 directory]
./configure
make

How to execute MIReStruC:

To see how to run MIReStruC, type the following command:

./MIReStruC.sh -h

The "dataset/" directory contains examples of input files. Go into the dataset directory and type one of the following commands to obtain the corresponding out.txt file:

for method (1):
```
../MIReStruC.sh -P
```
for method (2):
```
../MIReStruC.sh -D
```
for method (3):
```
../MIReStruC.sh -C
```

Input files:

Using MIReStruC with method (1)

When invoking MIReStruC to predict structural clusters from a genomic sequence with method (1) (black path, Figure 1 of the article), you need a single file. The file contains the DNA sequence to analyse (e.g. dataset/par_seq.txt), given on a single line.
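
Because the sequence must be on a single line, it can be useful to check an input file before running MIReStruC. The sketch below is an illustration only and is not part of the package. The function name and the accepted alphabet (A, C, G, T, N, in either case) are assumptions; the alphabet that MIReStruC actually accepts is not stated here.

```python
def check_single_line_sequence(text, alphabet="ACGTN"):
    """Return a list of problems found in the content of a sequence file.

    `alphabet` is an assumption about which characters are acceptable;
    the comparison is case-insensitive.
    """
    problems = []
    # Ignore empty lines such as a trailing newline at the end of the file.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        problems.append(f"expected 1 non-empty line, found {len(lines)}")
    found = {char for line in lines for char in line.upper()}
    unexpected = sorted(found - set(alphabet))
    if unexpected:
        problems.append("unexpected characters: " + "".join(unexpected))
    return problems
```

For example, calling the function on the content of dataset/par_seq.txt (read with open(...).read()) should return an empty list if the file follows the single-line convention.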

Using MIReStruC with method (2)

When invoking MIReStruC to predict structural clusters from deep sequencing reads, you need two files. The first file contains a DNA sequence (e.g. dataset/deep_seq.txt) and the second file contains the output of the MicroRazerS algorithm (e.g. dataset/deep_mraz.txt). The MicroRazerS output is obtained by running MicroRazerS on the DNA sequence with the deep sequencing reads.

Using MIReStruC with method (3)

When invoking MIReStruC to predict structural clusters by combining sequence analysis and deep sequencing reads, you need two files. The first file contains a DNA sequence (e.g. dataset/deep_seq.txt) and the second file contains the output of the MicroRazerS algorithm (e.g. dataset/deep_mraz.txt). The MicroRazerS output is obtained by running MicroRazerS on the DNA sequence with the deep sequencing reads.

Results:

The output file of MIReStruC contains the structural clusters predicted by the selected method. It is a list of clusters, each composed of several miRNA/pre-miRNA pairs whose positions are given in the following format:

Cluster n°A:
>B-C D E
>F-G H I
>J-K L M

Here A is the number of the cluster in the list. The positions of the miRNAs composing the cluster (in the corresponding genomic sequence) are given by the numbers B, C, F, G, J and K, where B, F and J are start positions and C, G and K are end positions. For a given miRNA, the positions of its predicted pre-miRNA are obtained from the two numbers that follow. For instance, ">B-C D E" indicates that the pre-miRNA is obtained by adding D nt before the miRNA sequence and E nt after it. When ".rc" is given for a miRNA/pre-miRNA pair, the pair lies on the complementary strand.
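
The format described above can be read with a few lines of Python. The following is a minimal sketch under stated assumptions, not a parser shipped with MIReStruC. The names Hit and parse_clusters are hypothetical. The exact placement of ".rc" on a line is not specified here, so the sketch accepts it right after the miRNA positions or anywhere else on the line. The pre-miRNA span is computed as (start - D, end + E) in the coordinates of the given sequence; for ".rc" entries, which side counts as "before" is an assumption that should be checked against real output.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    cluster: int      # A in "Cluster n°A:"
    mirna_start: int  # B
    mirna_end: int    # C
    pre_start: int    # B - D
    pre_end: int      # C + E
    reverse: bool     # True when ".rc" appears on the line


# "n" followed by any non-digit characters tolerates encoding variants of "°".
CLUSTER_RE = re.compile(r"^Cluster\s+n\D*(\d+)\s*:")
# Assumption: an optional ".rc" may directly follow the miRNA positions.
HIT_RE = re.compile(r"^>(\d+)-(\d+)(?:\.rc)?\s+(\d+)\s+(\d+)")


def parse_clusters(text):
    """Parse MIReStruC-style output text into a list of Hit records."""
    hits = []
    cluster = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = CLUSTER_RE.match(line)
        if header:
            cluster = int(header.group(1))
            continue
        entry = HIT_RE.match(line)
        if entry and cluster is not None:
            start, end, before, after = map(int, entry.groups())
            hits.append(Hit(cluster, start, end,
                            start - before, end + after, ".rc" in line))
    return hits
```

A typical use would be parse_clusters(open("out.txt").read()), followed by grouping the returned records by their cluster field.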

Licence:

The MIReStruC program has been developed under the CeCILL licence (see LICENCE).

MIReStruC uses a modified implementation of RNAfold from the ViennaRNA package [Hofacker et al. 1994].

Contacts:

For questions, comments, or suggestions feel free to contact Alessandra Carbone or Anthony Mathelier.

Reference:

If you are using MIReStruC, please cite:

A. Mathelier and A. Carbone (2013). Large scale chromosomal mapping of human microRNA structural clusters. Nucleic Acids Research. doi:10.1093/nar/gkt112

Last Update Sept. 2013