(I am seeing about 5-10 views a day on the Pipeline Pilot pages. Please be so kind as to acknowledge or cite my blog when you use these tools and guides.)

**Why would we want such a thing?**

During the time I have been using Pipeline Pilot (PP), I found it inconvenient that there was no component to calculate the correlation coefficient between two properties present in the data stream (for instance when performing external validation of a model).

Therefore I have written a component to do just that. One of the features I find useful is the option to include both an upper and a lower error margin line, which allows a quick visual inspection of your model's reliability.

The latest version (8.5) does include a component called "Regression Model Evaluation Viewer", which calculates an RMSE and R^{2}, but it has some downsides:

- The component calculates the modeled values internally, so it cannot be used to calculate the correlation between two sets of values obtained from external sources.
- The component only calculates R^{2} and RMSE, while a proper evaluation also requires R_{0}^{2} and the k-slope.

My component is available on my website and is compatible with PP 8.5 and up; it can be found here.

It has been tested with up to approximately 20,000 records and works fine. In addition, the parameters it shares with the 'Regression Model Evaluation Viewer' and the 'R-statistics fit plots' come out identical to the values those components report.

**So what does it do?**

The component calculates correlation parameters according to Tropsha (2010)^{1} between two properties present in the stream. These properties are defined as 'Activity' (Y-values) and 'Model' (X-values). Both have to be present in the stream, so in the case of a model the predictions need to be calculated beforehand. In addition, a scatter plot containing all values is output. Both the parameters and the plot are output as reporting items.

The following values are calculated:

- RMS Error (RMSE)
- R^{2} (R2)
- R_{0}^{2} (R2_zero)
- R_{0}^{2}' (R2_zero_acc)
- k-Slope (Slope_K)
- k-Slope' (Slope_K_acc)
- % Difference between R^{2} and R_{0}^{2} (Perc_Diff_R2_with_R2_zero)
- % Difference between R^{2} and R_{0}^{2}' (Perc_Diff_R2_with_R2_zero_acc)
- Absolute difference between R_{0}^{2} and R_{0}^{2}' (Absolute_diff_R2_zero_and_R2_zero_acc)
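For readers who want to sanity-check these numbers outside Pipeline Pilot, here is a minimal Python sketch of one common formulation of the parameters. It is not the component's own code. The function name `tropsha_metrics`, the choice of which regression through the origin is called k and which k', and the reporting of the "% difference" as a fraction are all assumptions made for illustration; the literature contains several variants of R_{0}^{2}, so compare against the reference before relying on it.

```python
import math


def tropsha_metrics(activity, model):
    """Correlation parameters between observed (activity, y) and predicted (model, x) values.

    Assumed formulation (not taken from the component itself):
      k  : slope of y = k  * x through the origin
      k' : slope of x = k' * y through the origin
      R0^2 and R0^2' are the matching coefficients of determination.
    """
    if len(activity) != len(model) or len(activity) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    n = len(activity)
    mean_y = sum(activity) / n
    mean_x = sum(model) / n

    # Sums of squares around the mean; these are zero for constant input,
    # in which case the ratios below are undefined (ZeroDivisionError).
    ss_y = sum((y - mean_y) ** 2 for y in activity)
    ss_x = sum((x - mean_x) ** 2 for x in model)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(model, activity))

    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(model, activity)) / n)
    r2 = s_xy ** 2 / (ss_x * ss_y)  # squared Pearson correlation

    sum_xy = sum(x * y for x, y in zip(model, activity))
    k = sum_xy / sum(x * x for x in model)
    k_acc = sum_xy / sum(y * y for y in activity)

    r2_zero = 1 - sum((y - k * x) ** 2 for x, y in zip(model, activity)) / ss_y
    r2_zero_acc = 1 - sum((x - k_acc * y) ** 2 for x, y in zip(model, activity)) / ss_x

    return {
        "RMSE": rmse,
        "R2": r2,
        "R2_zero": r2_zero,
        "R2_zero_acc": r2_zero_acc,
        "Slope_K": k,
        "Slope_K_acc": k_acc,
        "Perc_Diff_R2_with_R2_zero": (r2 - r2_zero) / r2,
        "Perc_Diff_R2_with_R2_zero_acc": (r2 - r2_zero_acc) / r2,
        "Absolute_diff_R2_zero_and_R2_zero_acc": abs(r2_zero - r2_zero_acc),
    }
```

Calling `tropsha_metrics(activity_values, model_values)` with two equally long lists returns a dictionary keyed by the same names the component uses for its output properties, which makes a side-by-side comparison straightforward.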

**Additional Settings:**

- Under 'Plot Parameters' variables for the x-y scatter plot can be defined. Furthermore, the range of the upper and lower error lines can be set (*default 0.5 from the line of unity*).
  - 'Auto_range': when set to 'True', the scale of the axes is automatically fitted to the scale of the data. When set to 'False' (*default*), a range can be entered manually for 'Activity' (y-value) and 'Model' (x-value) (*default is 2.0 - 12.0*).
  - 'Uncertainty' defines the margin between the line of unity and the uncertainty lines (*default 0.5 units away from the line of unity*).
  - If 'Uncertainty_in_plot' is set to 'True' (*default*), two lines indicating a lower and an upper error line are drawn in the plot.
- If 'Output_Records' is set to 'True', all values are output unchanged to the 'Fail' port while the plot and correlation parameters are output to the 'Pass' port.
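To make the plot settings concrete, the sketch below computes the axis range and the end points of the line of unity and the two uncertainty lines. It is only an illustration under assumptions: the function `plot_guides` is hypothetical, it uses one shared range for both axes, and for 'Auto_range' it simply takes the minimum and maximum over all values, which may not match how the component scales its axes.

```python
def plot_guides(activity, model, uncertainty=0.5, auto_range=False,
                axis_range=(2.0, 12.0)):
    """Return the axis range and the end points of the guide lines.

    Defaults mirror the settings described above (0.5 units, 2.0 - 12.0);
    the shared-axis behaviour is an assumption of this sketch.
    """
    if auto_range:
        # Fit the range to the data: smallest and largest value overall.
        values = list(activity) + list(model)
        low, high = min(values), max(values)
    else:
        low, high = axis_range

    return {
        "axis_range": (low, high),
        # Line of unity: y = x
        "unity": ((low, low), (high, high)),
        # Upper error line: y = x + uncertainty
        "upper": ((low, low + uncertainty), (high, high + uncertainty)),
        # Lower error line: y = x - uncertainty
        "lower": ((low, low - uncertainty), (high, high - uncertainty)),
    }
```

Each line is returned as a pair of (x, y) points, so it can be handed to any plotting library as a straight segment.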

| RMSE | R2_zero | R2 | R2_zero_acc | Slope_K_acc | Slope_K | Perc_Diff_R2_with_R2_zero | Perc_Diff_R2_with_R2_zero_acc | Absolute_diff_R2_zero_and_R2_zero_acc |
|---|---|---|---|---|---|---|---|---|
| 0.679 | 0.839 | 0.839 | 0.827 | 0.997 | 0.928 | 0.000 | 0.015 | 0.012 |

^{1} *Tropsha, A. (2010). Predictive Quantitative Structure-Activity Relationships Modeling. In: Handbook of Chemoinformatics Algorithms, J. Faulon and A. Bender (eds.).*
