Fig. 4

Distribution of MCC values obtained in 400 cross-validation runs at Stage 3 of the modelling pipeline. Each point represents the MCC value obtained for an RF classifier's prediction on the validation set in the cross-validation loop. Each RF classifier was built on a different training set constructed in the cross-validation loop, using the variables selected as most relevant for that training set. Values for the G-145, CNV, MA-145, and MA+CNV data sets are presented from left to right. Each box plot summarises the distribution of the points to its left.
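
For reference, the MCC plotted for each point can be computed from the confusion matrix of the validation-set predictions. The following is a minimal sketch, assuming binary class labels coded as 0 and 1; the function name `mcc` and the zero-denominator convention are illustrative assumptions, not taken from the original pipeline.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded 0/1.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# One validation fold: true labels vs. classifier predictions.
score = mcc([1, 1, 0, 0], [1, 0, 0, 0])
```

An MCC of 1 corresponds to perfect agreement, 0 to chance-level prediction, and -1 to complete disagreement, so each point in the figure is read on that scale.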