normalize_reactivity
- ipasuite.workflow.scripts.tools.normalize_reactivity.compute_interquart_norm_term(df: DataFrame, norm_column: str = 'corr_areaRX') → float
“BoxPlot” normalization
Define the outlier threshold as 1.5 * the interquartile range. Remove values above the 3rd quartile that exceed this threshold, then average roughly the 10% most reactive positions that remain.
- Parameters:
df (pd.DataFrame) – Dataframe used to calculate normalization term
norm_column (str) – column name used for normalization
- Returns:
normalization term for df
- Return type:
float
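The box-plot procedure described above can be sketched in plain Python. This is an illustrative sketch, not the ipasuite implementation: `boxplot_norm_term` and its `top_fraction` argument are hypothetical names, the outlier rule is read here as the usual box-plot fence (3rd quartile + 1.5 * IQR), and the rounding used to pick "around 10%" of positions is an assumption. With a DataFrame, the input would be something like `df[norm_column].tolist()`.

```python
from statistics import mean, quantiles


def boxplot_norm_term(values, top_fraction=0.10):
    """Hypothetical helper: box-plot style normalization term.

    Assumptions (not confirmed by the documentation):
    - an outlier is a value above Q3 + 1.5 * IQR;
    - "around 10%" means round(n * top_fraction), with at least one value.
    """
    # Drop NaN (NaN is the only value not equal to itself) and sort.
    data = sorted(v for v in values if v == v)
    if len(data) < 2:
        raise ValueError("need at least two reactivities")

    # Quartiles with linear interpolation between data points.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)

    # Discard outliers, then average the most reactive remaining positions.
    kept = [v for v in data if v <= fence]
    n_top = max(1, round(len(kept) * top_fraction))
    return mean(kept[-n_top:])
```

Dividing each reactivity by the returned term would then give the normalized values; if the tool compares values against 1.5 * IQR directly rather than against the fence, only the `fence` line changes.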
- ipasuite.workflow.scripts.tools.normalize_reactivity.compute_simple_norm_term(df: DataFrame, norm_column: str = 'corr_areaRX', stop_percentile: float = 90, outlier_percentile: float = 98, norm_term_avg_percentile: float = 90) → float
“Simple” normalization
Average of the top 10% most reactive nucleotides, excluding the top 2% (outliers)
- Parameters:
df (pd.DataFrame) – Dataframe used to calculate normalization term
norm_column (str) – df column used to calculate normalization term
stop_percentile (float) – threshold (in percent) above which background is estimated to be too high; data above this threshold are excluded from normalization
outlier_percentile (float) – threshold (in percent) above which reactivity is considered too high (outlier)
norm_term_avg_percentile (float) – threshold (in percent) above which reactivities are used to calculate the normalization term
- Return type:
float
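As a rough illustration of the "simple" method, the sketch below averages the reactivities lying between the two percentile thresholds. It is not the ipasuite implementation: `percentile` and `simple_norm_term` are hypothetical helpers, the linear-interpolation percentile and the inclusive bounds are assumptions, and the `stop_percentile` background filter is left out because the documentation does not say which column it applies to.

```python
from statistics import mean


def percentile(sorted_vals, pct):
    """Linear-interpolation percentile of an already sorted list (pct in 0..100)."""
    if not sorted_vals:
        raise ValueError("no values")
    rank = (len(sorted_vals) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def simple_norm_term(values, outlier_percentile=98.0, norm_term_avg_percentile=90.0):
    """Hypothetical helper: mean of values between the two percentiles.

    With the defaults this is the top 10% of reactivities minus the top 2%.
    """
    # Drop NaN (NaN is the only value not equal to itself) and sort.
    data = sorted(v for v in values if v == v)
    low = percentile(data, norm_term_avg_percentile)
    high = percentile(data, outlier_percentile)

    # Assumption: both bounds are inclusive.
    window = [v for v in data if low <= v <= high]
    if not window:
        raise ValueError("no reactivity falls between the two percentiles")
    return mean(window)
```

For very short inputs the window between the two percentiles can be empty, which is why the sketch raises an error instead of returning a term.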
- ipasuite.workflow.scripts.tools.normalize_reactivity.normalize_all(*inputs: [str], output: str | None = None, reactive_nucleotides: [str] = ['G', 'C', 'A', 'U'], stop_percentile: float = 90.0, simple_outlier_percentile: float = 98.0, simple_norm_term_avg_percentile: float = 90.0, low_norm_reactivity_threshold: float = -0.3, norm_methods: [str] = ['simple', 'interquartile'], normcol: str = 'simple_norm_reactivity', plot: str | None = None, plot_title: str | None = None, shape_output: bool | None = None, map_output: bool | None = None) → int
Normalize reactivity for each input file
Outputs a tsv file with normalized reactivities
- Parameters:
inputs ([str]) – list of tsv files to normalize
output (str) – path to the directory where .tsv files containing normalized reactivities are written
reactive_nucleotides ([str]) – (default: A,C,G,U) comma-separated list of reactive nucleotides; A, C, G and U are accepted
stop_percentile (float) – (default: 90.0) threshold (in percent) above which background is estimated to be too high; data above this threshold are discarded
simple_outlier_percentile (float) – (default: 98.0) simple method only; threshold (in percent) above which reactivity is considered too high
simple_norm_term_avg_percentile (float) – simple method only; threshold (in percent) above which reactivities are used to calculate the normalization term
low_norm_reactivity_threshold (float) – normalized reactivity threshold above which reactivity is not considered significant and is removed
norm_methods ([str]) – comma-separated list of normalization methods; simple and interquartile are allowed
- Returns:
output tsv files
- Return type:
int
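To make the output step concrete, here is a minimal sketch of how a normalization term could be applied to tsv data to produce a normalized-reactivity column. It is not how `normalize_all` is implemented: `add_normalized_column` is a hypothetical helper, the `seqNum` column in the usage example is an invented placeholder, and only the names `corr_areaRX` and `simple_norm_reactivity` come from the documented defaults.

```python
import csv
import io


def add_normalized_column(tsv_text, norm_term,
                          norm_column="corr_areaRX",
                          normcol="simple_norm_reactivity"):
    """Hypothetical helper: return tsv text with an extra normalized column."""
    if norm_term == 0:
        raise ValueError("normalization term must be non-zero")

    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + [normcol],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        raw = row[norm_column]
        # Leave missing reactivities empty instead of inventing a value.
        row[normcol] = "" if raw == "" else repr(float(raw) / norm_term)
        writer.writerow(row)
    return out.getvalue()


# Usage with an in-memory table; "seqNum" is a placeholder column name.
example = "seqNum\tcorr_areaRX\n1\t2.0\n2\t\n3\t5.0\n"
normalized = add_normalized_column(example, norm_term=4.0)
```

Working on text keeps the sketch free of file-system assumptions; reading from and writing to real paths would only replace the two `io.StringIO` objects with opened files.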