normalize_reactivity

ipasuite.workflow.scripts.tools.normalize_reactivity.compute_interquart_norm_term(df: DataFrame, norm_column: str = 'corr_areaRX') float

“BoxPlot” normalization

Define the outlier threshold as 1.5 × the interquartile range. Remove values that exceed the 3rd quartile by more than this threshold, then average approximately the 10% most reactive remaining positions.

Parameters:
  • df (pd.DataFrame) – Dataframe used to calculate normalization term

  • norm_column (str) – column name used for normalization

Returns:

normalization term for df

Return type:

float
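The box-plot procedure described above can be sketched as follows. This is a minimal illustration, not the ipasuite implementation: the function name `interquartile_norm_term_sketch`, the use of pandas' default linear quantile interpolation, and the rounding rule for "around 10%" are all assumptions.

```python
import pandas as pd


def interquartile_norm_term_sketch(
    df: pd.DataFrame, norm_column: str = "corr_areaRX"
) -> float:
    """Hypothetical sketch of box-plot normalization (not the ipasuite code)."""
    values = df[norm_column].dropna()
    q1, q3 = values.quantile([0.25, 0.75])
    # Outlier cutoff: 3rd quartile plus 1.5 times the interquartile range.
    cutoff = q3 + 1.5 * (q3 - q1)
    kept = values[values <= cutoff].sort_values(ascending=False)
    # Average roughly the top 10% of the remaining positions (at least one).
    n_top = max(1, int(round(0.1 * len(kept))))
    return float(kept.iloc[:n_top].mean())


# Toy data: nine ordinary reactivities and one obvious outlier (10.0).
example = pd.DataFrame(
    {"corr_areaRX": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 10.0]}
)
norm_term = interquartile_norm_term_sketch(example)
```

In this toy dataset the value 10.0 lies above the cutoff and is dropped, so the term is computed from the nine remaining positions only.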

ipasuite.workflow.scripts.tools.normalize_reactivity.compute_simple_norm_term(df: DataFrame, norm_column: str = 'corr_areaRX', stop_percentile: float = 90, outlier_percentile: float = 98, norm_term_avg_percentile: float = 90) float

“Simple” normalization

Average of the top 10% most reactive nucleotides, excluding the top 2% (outliers).

Parameters:
  • df (pd.DataFrame) – Dataframe used to calculate normalization term

  • norm_column (str) – df column used to calculate normalization term

  • stop_percentile (float) – threshold above which the background is considered too high; data above this threshold are excluded from normalization

  • outlier_percentile (float) – threshold (in percent) above which reactivity is considered too high

  • norm_term_avg_percentile (float) – threshold (in percent) above which reactivities are used to calculate the normalization term

Return type:

float
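As an illustration of the "simple" method with its default percentiles, the sketch below averages the values lying between the 90th and 98th percentiles. It is an assumption-laden example rather than the ipasuite code: the function name `simple_norm_term_sketch` is hypothetical, both percentile bounds are treated as inclusive, and the `stop_percentile` background filter is deliberately left out.

```python
import numpy as np
import pandas as pd


def simple_norm_term_sketch(
    df: pd.DataFrame,
    norm_column: str = "corr_areaRX",
    outlier_percentile: float = 98.0,
    norm_term_avg_percentile: float = 90.0,
) -> float:
    """Hypothetical sketch of the "simple" normalization term."""
    values = df[norm_column].dropna().to_numpy()
    # Lower bound: start of the top 10% most reactive positions.
    low = np.percentile(values, norm_term_avg_percentile)
    # Upper bound: values above it (top 2%) are treated as outliers.
    high = np.percentile(values, outlier_percentile)
    selected = values[(values >= low) & (values <= high)]
    return float(selected.mean())


# Toy data: reactivities 1.0, 2.0, ..., 100.0.
example = pd.DataFrame({"corr_areaRX": np.arange(1, 101, dtype=float)})
simple_term = simple_norm_term_sketch(example)
```

With these toy values the selected window is 91 through 98, so the two largest values (99 and 100) do not influence the term.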

ipasuite.workflow.scripts.tools.normalize_reactivity.normalize_all(*inputs: [str], output: str | None = None, reactive_nucleotides: [str] = ['G', 'C', 'A', 'U'], stop_percentile: float = 90.0, simple_outlier_percentile: float = 98.0, simple_norm_term_avg_percentile: float = 90.0, low_norm_reactivity_threshold: float = -0.3, norm_methods: [str] = ['simple', 'interquartile'], normcol: str = 'simple_norm_reactivity', plot: str | None = None, plot_title: str | None = None, shape_output: bool | None = None, map_output: bool | None = None) int

Normalize reactivity for each input file

Outputs a TSV file with normalized reactivities

Parameters:
  • inputs ([str]) – list of TSV files to normalize

  • output (str) – path to the directory where the .tsv files containing normalized reactivities are written

  • reactive_nucleotides ([str]) – (default: A,C,G,U) comma-separated list of reactive nucleotides; A, C, G and U are accepted

  • stop_percentile (float) – (default: 90.0) threshold above which the background is considered too high; data above this threshold are discarded

  • simple_outlier_percentile (float) – (default: 98.0) simple method only; threshold (in percent) above which reactivity is considered too high

  • simple_norm_term_avg_percentile (float) – (default: 90.0) simple method only; threshold (in percent) above which reactivities are used to calculate the normalization term

  • low_norm_reactivity_threshold (float) – (default: -0.3) normalized reactivity threshold below which reactivity is not considered significant and is removed

  • norm_methods ([str]) – comma-separated list of normalization methods: simple and interquartile are allowed

Returns:

output tsv files

Return type:

int