Step by step Description

[Work in progress]

Data import

If no input files are found in resources/1.1-fluo-ceq8000 or resources/1.2-fluo-ceq8000, IPANEMAP Suite will import them using a path constructed as following : raw_data:prefix_path from config.yaml concatenated with probe_file and control_file column from samples.tsv.

If samples.tsv specifies a QuShape file (qushape_file column), and no QuShape file is already present in results/2.1-qushape, the IPANEMAP Suite will import it. The file name is interpreted relative to prefix_path as defined in the configuration.

If samples.tsv specifies a Map file (map_file column), the IPANEMAP Suite will import it to the normalized reactivity folder; all previous steps of data import are overriden. The file name is interpreted relative to prefix_path as defined in the configuration.

:::{warning} Importing QuShape files and Map files created outside of IPANEMAP Suite should be done with special care.

If you want to import such files, you must ensure that you used the exact same sequence file as provided in config.yaml and that data are coherents between the qushape section of your config.yaml, columns of samples.tsv (e.g.: ddNTP column, rt_start and rt_stop)

IPANEMAP Suite avoids to overwrite existing QuShape project files. It will created a new project using sequencer data, only if no QuShape project file exists yet in the corresponding results folder. If a QuShape project exists, the pipeline extracts its reactivities (with the exception of direct import from Map files).

Data conversion

Files from CEQ8015 sequencer must converted to be used with QuShape. Headers are removed to obtain a tabular file

QuShape project generation

To simplify the use of QuShape (saving many manual configuration steps), the pipeline generates a complete QuShape project with probe_file control_file, sequence , ddNTP. Optionally, it uses a reference project, if specified in the sample file. All QuShape options used for generation can be controlled through the configuration dialog or the config.yaml file.

QuShape treatment of SHAPE-CE data

Treatment of the SHAPE-CE data in QuShape is the only remaining truly manual step in IPANEMAP Suite, even if the pipeline can facilitate it by preparing the project files. The user must open each QuShape project, and perform treatment to allow the calculation of reactivities.

Once the file is treated, the pipeline will be able to extract the reactivities, which will be used in downstream steps.

QuShape reactivity extraction

Reactivity is retrieved from fully treated QuShape projects.

Reactivity Normalization

Reactivity is normalized from raw reactivity extracted from QuShape.

Parameters :

reactive_nucleotides

Nucleotides which are affected by the SHAPE probe used

low_norm_reactivity_threshold

normalized reactivity threshold above which reactivity is not considered as significant, and then clipped to 0

stop_percentile

(default: 90. )The threshold above which background is estimated to be too high - data above this threshold will be discarded

simple_outlier_percentile

(default)simple method only - threshold (in percent) above which reactivity is considered as too high

simple_norm_term_avg_percentile

simple_method_only - threshold (in percent) above which reactivities are used as to calculate normalization term

Steps :

  1. Substracting background luminescence

  2. Removing top 10% - considered as Reverse transcriptase stop sites.

  3. Selecting reactive nucleotides

  4. Computing Normalization Term (simple normalization / interquartile method)

  5. Dividing nucleotide by normalization term.

  6. clipping reactivity under the low_norm_reactivity_threshold to 0

Simple normalization method

  1. Remove top 2% (simpleoutlier_percentile) reactivity which are considered as outliers

  2. Get top 8% (simplenorm_term_avg_percentile) following. the avg of those values constitues the normalization term.

Interquartile method

  1. Compute interquartile threshold defined as $Ir = 1.5 \times (Q_3 - Q_1)$ all values above $Q_3 + Ir$ this threshold are considered as outliers

  2. Avg of the top 10% of the remains values is the normalization factor.

Reactivity Aggregation

Structure generation - IPANEMAP

  • Added reactivity

Footprinting