Measuring, testing, and identifying heterogeneity of large parallel datasets