Fig. 2
From: Associating expression and genomic data using co-occurrence measures

Accuracy on the validation set for PAM50 subtype classification, comparing a classifier trained on continuous microarray data with classifiers trained on GMM-discretized data and on data discretized with a naive strategy (STD). For the GMM discretization, we set the maximum number of regimes to 2 (GMM 2), 3 (GMM 3) or 6 (GMM 6). Cross-validation was repeated 20 times with a 0.7/0.3 training/validation split, using a Random Forest classifier with 1500 trees [49].
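The evaluation protocol described in the caption can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `gmm_discretize` and `repeated_holdout_accuracy` are hypothetical; the number of regimes per gene (1 up to the maximum) is assumed to be chosen by BIC, with each sample assigned to its most probable component; discretization is assumed to be applied to the whole matrix once, before splitting; and the STD baseline is not reproduced because its exact rule is not given here. The defaults mirror the caption (20 repeats, 0.7/0.3 split, 1500 trees).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split


def gmm_discretize(X, max_regimes=3, random_state=0):
    """Hypothetical helper: discretize each gene (column) into expression regimes.

    Assumption: a 1-D Gaussian mixture with 1..max_regimes components is fit per
    gene, the model with the lowest BIC is kept, and each sample gets the index of
    its most probable component, relabelled so regime 0 has the lowest mean.
    """
    X = np.asarray(X, dtype=float)
    D = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        x = X[:, j].reshape(-1, 1)
        fits = [GaussianMixture(n_components=k, random_state=random_state).fit(x)
                for k in range(1, max_regimes + 1)]
        best = min(fits, key=lambda g: g.bic(x))
        order = np.argsort(best.means_.ravel())   # component ids sorted by mean
        rank = np.empty_like(order)
        rank[order] = np.arange(len(order))       # component id -> regime rank
        D[:, j] = rank[best.predict(x)]
    return D


def repeated_holdout_accuracy(X, y, n_repeats=20, train_size=0.7,
                              n_trees=1500, seed=0):
    """Hypothetical helper: mean and std of validation accuracy over
    repeated random train/validation splits with a Random Forest."""
    scores = []
    for r in range(n_repeats):
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, train_size=train_size, random_state=seed + r)
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed + r)
        clf.fit(X_tr, y_tr)
        scores.append(accuracy_score(y_va, clf.predict(X_va)))
    return float(np.mean(scores)), float(np.std(scores))


if __name__ == "__main__":
    # Synthetic data for illustration only (not the PAM50 data set).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    y = (X[:, 0] > 0).astype(int)
    for k in (2, 3, 6):
        D = gmm_discretize(X, max_regimes=k)
        # Small n_repeats/n_trees to keep the demo fast.
        print(f"GMM {k}", repeated_holdout_accuracy(D, y, n_repeats=3, n_trees=50))
    print("continuous", repeated_holdout_accuracy(X, y, n_repeats=3, n_trees=50))
```

Fitting the mixtures on the full matrix before splitting keeps the sketch short, but it lets the validation samples influence the regime boundaries; fitting `gmm_discretize` on the training split only and mapping validation samples with the same fitted models would avoid that leakage.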