Welcome to KMR#

This is KMR, a high-performance map-reduce library. KMR-1.0 has been available since 2013-04-26. KMR works on ordinary clusters as well as large-scale supercomputers. The KMR source code is available under the BSD license.

The latest release is KMR-1.9 (2018-08-27).

KMR is a set of high-performance map-reduce operations for the MPI (Message Passing Interface) environment. It makes data-processing programs much easier to write by hiding the low-level details of message passing. Its main targets are large-scale supercomputers with thousands of compute nodes. Besides map-reduce operations, KMR provides utilities that address issues such as accessing very large file systems on the K computer and Fujitsu FX10 platforms.

KMR is designed to work in-memory and to exploit the large amount of memory available on supercomputers, whereas most map-reduce implementations are designed around external (disk-based) operations. Data exchanges in KMR therefore occur as message passing instead of remote file operations. The KMR routines work in a bulk-synchronous manner and most of the code is sequential, but the code inside the mapper and the reducer is multi-threaded.
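The execution model described above can be illustrated with a small, self-contained sketch. It is not KMR code and does not use the KMR or MPI APIs: ranks are simulated as plain Python lists, a list append stands in for a message, and every name in it (`owner`, `map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) is hypothetical. It only shows the shape of the idea: a multi-threaded mapper inside each rank, an in-memory all-to-all shuffle, and phases that run one after another in bulk-synchronous fashion.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def owner(key, nranks):
    """Pick the rank that owns a key (deterministic hash partitioning)."""
    return crc32(key.encode("utf-8")) % nranks


def map_phase(local_lines):
    """Mapper for one simulated rank; lines are processed by a thread pool."""
    def map_line(line):
        return [(word, 1) for word in line.split()]

    with ThreadPoolExecutor(max_workers=4) as pool:
        chunks = list(pool.map(map_line, local_lines))
    return [kv for chunk in chunks for kv in chunk]


def shuffle_phase(per_rank_pairs):
    """In-memory all-to-all exchange: each pair goes to the rank owning its key."""
    nranks = len(per_rank_pairs)
    inboxes = [[] for _ in range(nranks)]
    for pairs in per_rank_pairs:
        for key, value in pairs:
            # A list append stands in for sending a message to another rank.
            inboxes[owner(key, nranks)].append((key, value))
    return inboxes


def reduce_phase(local_pairs):
    """Reducer for one simulated rank: sum the values collected for each key."""
    grouped = defaultdict(list)
    for key, value in local_pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


def word_count(per_rank_lines):
    """Run map, shuffle, and reduce as three consecutive bulk steps."""
    mapped = [map_phase(lines) for lines in per_rank_lines]
    shuffled = shuffle_phase(mapped)
    return [reduce_phase(pairs) for pairs in shuffled]


if __name__ == "__main__":
    data = [["a b a", "c"], ["b a"], []]
    for rank, counts in enumerate(word_count(data)):
        print(rank, sorted(counts.items()))
```

Each phase finishes on all simulated ranks before the next one starts, and no intermediate data touches the disk. In a real MPI program the shuffle would be a collective or point-to-point exchange between processes rather than a loop over lists.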

Documents#

Downloading#

Tutorials#

Project Site#

Publications#

  • cluster2013.pdf(info): K MapReduce: A Scalable Tool for Data-Processing and Search/Ensemble Applications on Large-Scale Supercomputers. Motohiko Matsuda, Naoya Maruyama, and Shinichiro Takizawa. IEEE Cluster Computing (CLUSTER) 2013. (C) Copyright IEEE. IEEE Xplore
    It gives an overview of KMR and describes the optimizations used in it.
  • hpcs2014.pdf(info): Supporting Workflow Management of Scientific Applications by MapReduce Programming Model. Shinichiro Takizawa, Motohiko Matsuda, and Naoya Maruyama. IPSJ HPCS 2014. (Japanese)
    It describes workflows of scientific applications implemented in MapReduce using KMR.
  • bigdata2014.pdf(info): Evaluation of Asynchronous MPI Communication in Map-Reduce System on the K Computer. Motohiko Matsuda, Naoya Maruyama, and Shinichiro Takizawa. EuroMPI Workshop 2014. (C) Copyright ACM. ACM Digital Library
    It compares all-to-all collective communication with asynchronous communication in the shuffle phase, to assess the presumed effectiveness of overlapping communication and computation.

DISCLAIMER#

KMR comes with ABSOLUTELY NO WARRANTY. This wiki also comes with ABSOLUTELY NO WARRANTY. Contents are liable to change.


Acknowledgment#

KMR is a product of RIKEN R-CCS. Part of the results were obtained by using the K computer at RIKEN R-CCS.