Welcome to KMR
This is KMR, a high-performance map-reduce library. KMR-1.0 has been available on the K computer since 2013-04-26. KMR works on ordinary clusters as well.
The latest release is KMR-1.8.1 (2016-04-25).
KMR is a set of high-performance map-reduce operations for the MPI (Message Passing Interface) environment. It makes data-processing programs much easier to write by hiding the low-level details of message passing. Its main targets are large-scale supercomputers with thousands of compute nodes, such as the K computer and Fujitsu FX10. On these platforms, KMR also provides utilities beyond the map-reduce operations, which address issues such as access to very large file systems.
KMR is designed to work in memory and to exploit the large amount of memory available on supercomputers, whereas most map-reduce implementations are designed around external (disk-based) operations. Data exchanges in KMR therefore occur as message passing instead of remote file operations. The KMR routines work in a bulk-synchronous manner and most of the code is sequential, but the code inside the mapper and reducer is multi-threaded.
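The execution model described above can be illustrated with a small, self-contained sketch. It is not KMR code and does not use the KMR or MPI APIs: the ranks are simulated as plain Python lists inside one process, and all names (`owner`, `map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. It only shows the general shape of an in-memory, bulk-synchronous map-reduce: a map step whose inner work runs in threads, a shuffle step that moves key-value pairs between ranks without touching files, and a reduce step.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def owner(key, nranks):
    """Pick the rank that owns a key (simple deterministic partitioning)."""
    return sum(key.encode("utf-8")) % nranks


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def map_phase(local_lines):
    """Run the mapper over one rank's records using a small thread pool."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        chunks = list(pool.map(mapper, local_lines))
    return [kv for chunk in chunks for kv in chunk]


def shuffle_phase(per_rank_kvs):
    """Move pairs between simulated ranks in memory so equal keys meet.

    In a real MPI program this step would be message passing between
    processes; here it is only list manipulation (an assumption made
    to keep the sketch runnable anywhere).
    """
    nranks = len(per_rank_kvs)
    inboxes = [defaultdict(list) for _ in range(nranks)]
    for kvs in per_rank_kvs:
        for key, value in kvs:
            inboxes[owner(key, nranks)][key].append(value)
    return inboxes


def reduce_phase(inbox):
    """Sum the values collected for each key on one rank."""
    return {key: sum(values) for key, values in inbox.items()}


def word_count(per_rank_lines):
    """Three phases; each one finishes on all ranks before the next starts."""
    mapped = [map_phase(lines) for lines in per_rank_lines]
    shuffled = shuffle_phase(mapped)
    return [reduce_phase(inbox) for inbox in shuffled]


if __name__ == "__main__":
    data = [["a b a"], ["b c"], ["a c c"]]  # one list of lines per simulated rank
    for rank, counts in enumerate(word_count(data)):
        print(rank, sorted(counts.items()))
```

Each phase completes for every simulated rank before the next one begins, which is the bulk-synchronous pattern; only the body of the map step is threaded, while the driver code stays sequential.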
- Project Overview and other Activities of the Team at RIKEN:
- Overview and API Document
- It is a Doxygen-generated document, included in the installation.
- Source Code Download:
- The KMR source is available under the BSD license.
- KMR Issue Tracker:
- Tutorial (in Japanese)
- cluster2013.pdf: K MapReduce: A Scalable Tool for Data-Processing and Search/Ensemble Applications on Large-Scale Supercomputers. Motohiko Matsuda, Naoya Maruyama, and Shinichiro Takizawa. IEEE Cluster Computing (CLUSTER) 2013. (C) Copyright IEEE. IEEE Xplore
It gives an overview of KMR and describes the optimizations used in it.
- hpcs2014.pdf: Supporting Workflow Management of Scientific Applications by MapReduce Programming Model. Shinichiro Takizawa, Motohiko Matsuda, and Naoya Maruyama. IPSJ HPCS 2014. (Japanese)
It describes workflows of some scientific applications implemented in MapReduce using KMR.
- bigdata2014.pdf: Evaluation of Asynchronous MPI Communication in Map-Reduce System on the K Computer. Motohiko Matsuda, Naoya Maruyama, and Shinichiro Takizawa. EuroMPI Workshop 2014. (C) Copyright ACM. ACM Digital Library
It compares all-to-all collective communication with asynchronous communication in the shuffle step, to assess the presumed effectiveness of overlapping communication and computation.
KMR comes with ABSOLUTELY NO WARRANTY. This wiki also comes with ABSOLUTELY NO WARRANTY. Contents are liable to change.
KMR is a product of RIKEN AICS. Part of the results was obtained by using the K computer at RIKEN AICS.