T23C-2953 – Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures

Authors

Richard T Mills
Intel Corporation
Forrest M. Hoffman (forrest at climatemodeling dot org)
Oak Ridge National Laboratory
Jitendra Kumar
Oak Ridge National Laboratory
Sarat Sreepathi
Oak Ridge National Laboratory
Vamsi Sripathi
Oak Ridge National Laboratory

Session

State of the Art in Computational Geoscience II Posters
Tuesday, December 13, 2016 13:40–18:00
Moscone South Poster Hall

Abstract

The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi (“Knights Landing”) processor based on the Intel Many Integrated Core (MIC) architecture, which introduces several new features, among them on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
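
As a rough illustration of what "accelerated" k-means can mean, the sketch below prunes distance evaluations with the triangle inequality. The abstract does not say which acceleration scheme the implementation uses, so this choice is an assumption, and the function names (`dist`, `assign_with_pruning`, `update`, `kmeans`) are hypothetical. It is a serial, pure-Python sketch and does not attempt to show the parallel or SIMD optimizations mentioned above.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids, prev_labels):
    """Assign each point to a nearest centroid, skipping distance
    evaluations that the triangle inequality proves cannot win.
    Returns the new labels and the number of evaluations skipped."""
    k = len(centroids)
    # Centroid-to-centroid distances, computed once per iteration.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels, skipped = [], 0
    for p, start in zip(points, prev_labels):
        # Start from the previous assignment (assumed to be a good guess).
        best, best_d = start, dist(p, centroids[start])
        for j in range(k):
            if j == start:
                continue
            # If d(c_best, c_j) >= 2 * d(p, c_best), then
            # d(p, c_j) >= d(p, c_best), so c_j cannot be strictly closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(p, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its assigned points."""
    k, dim = len(centroids), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    # Keep the old centroid if a cluster ends up empty.
    return [
        [s / counts[i] for s in sums[i]] if counts[i] else list(centroids[i])
        for i in range(k)
    ]


def kmeans(points, k, iters=20, seed=0):
    """Lloyd-style k-means using the pruned assignment step."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for it in range(iters):
        new_labels, _ = assign_with_pruning(points, centroids, labels)
        converged = it > 0 and new_labels == labels
        labels = new_labels
        if converged:
            break
        centroids = update(points, labels, centroids)
    return centroids, labels
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` separates the first two points from the last two. Pruning pays off mainly in later iterations, when most points already sit close to their assigned centroid and few candidates need an explicit distance evaluation.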

