Guide

This page describes how to quickly get started with MALT-2. MALT-2 parallelizes Torch over multiple CPUs and GPUs.

Building MALT with Torch

Requirements

Setup

Install Torch, MPI, Boost and CUDA (if using GPU).
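
On Ubuntu, the MPI and Boost dependencies might be installed with something like the following (the package names are assumptions; Torch and CUDA follow their own installation guides):

sudo apt-get install libopenmpi-dev libboost-all-dev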

git clone https://github.com/malt2/malt2.git --recursive

Set up the environment variables

Source your Torch/CUDA/MKL environment. On some machines, you may need something like the following (MKL is optional):

source [torch-dir]/install/bin/torch-activate
source /opt/intel/mkl/bin/intel64/mklvars.sh intel64
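
To confirm the Torch environment is active before building, a quick check (assuming a standard Torch install that puts th on the PATH):

which th && th -e 'print(jit and jit.version or _VERSION)'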

If using environment modules, you can try:

module load icc cuda80 luajit

To build everything, including dstorm, orm, and the Torch packages, run from the top-level directory:

make

This command builds the distributed shared memory component (dstorm), the shared memory transport hook (orm), and the luarocks for the Torch hooks and distributed optimization.

Component-wise build

To build component-wise (not required if using make above):

To build the dstorm component, run:

cd dstorm
./mkit.sh GPU test

You should get SUCCESS as the output. Check the log files to ensure the build completed successfully.

The general format is:

./mkit.sh <type>

where <type> is: SHM (liborm only), MPI (liborm + mpi), or GPU (liborm + mpi + gpu). A side effect is to create ../dstorm-env.{mk|cmake} environment files, so the Lua capabilities can match the libdstorm compile options.
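
For example, to produce a CPU-only build over MPI (mirroring the GPU invocation shown above):

./mkit.sh MPI test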

To build the orm, run:

cd orm
./mkorm.sh GPU

Building the Torch packages

With the Torch environment set up, install the malt-2 and dstoptim (distributed optimization) packages:

cd dstorm/src/torch
rm -rf build && VERBOSE=7 luarocks make malt-2-scm-1.rockspec >& mk.log && echo YAY  # build and install the malt-2 package
cd dstoptim
rm -rf build && VERBOSE=7 luarocks make dstoptim-scm-1.rockspec >& mk.log && echo YAY  # build the dstoptim package

Test

Each test script loads the package with:

require "malt2"
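
As a quick sanity check that the rock installed, you can load it in a one-liner (th's -e flag executes a string; running under mpirun covers the case where the transport expects an MPI context):

mpirun -np 1 `which th` -e 'require "malt2"'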

Run a quick test:

mpirun -np 2 `which th` `pwd -P`/test.lua mpi 2>&1 | tee test-mpi.log
mpirun -np 2 `which th` `pwd -P`/test.lua gpu 2>&1 | tee test-GPU-gpu.log
mpirun -np 2 `which th` `pwd -P`/test.lua mpi 2>&1 | tee test-GPU-mpi.log

The default transport is the “highest” one built into libdstorm2 (GPU > MPI > SHM):

mpirun -np 2 `which th` `pwd -P`/test.lua 2>&1 | tee test-best.log

Running over multiple GPUs

mpirun -np 2 ./redirect.sh `which th` `pwd`/test.lua

This script assigns available GPUs to ranks in a round-robin fashion. Since MPI requires each rank to have visibility of all other GPUs to access shared memory correctly, the script only changes the enumeration order and does not restrict visibility. A sketch of what such a wrapper can look like is shown below.
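
A minimal illustration, assuming OpenMPI (which exports OMPI_COMM_WORLD_LOCAL_RANK) and nvidia-smi; the actual redirect.sh shipped with the repository may differ:

#!/usr/bin/env bash
# Illustrative wrapper: rotate the GPU enumeration order per local rank so
# each rank gets a different default device, while keeping all GPUs visible
# (as required for shared-memory access across GPUs).
NGPU=$(nvidia-smi --list-gpus | wc -l)
RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
ORDER=""
for ((i = 0; i < NGPU; i++)); do
  ORDER+="$(( (RANK + i) % NGPU )),"
done
export CUDA_VISIBLE_DEVICES=${ORDER%,}   # e.g. rank 1 with 2 GPUs -> "1,0"
exec "$@"

With two GPUs, rank 0 enumerates devices as 0,1 and rank 1 as 1,0, so each rank's default device differs while both GPUs remain visible to every rank.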

Running applications