On some machines, you may need environment settings along the following lines (MKL is optional):
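For example (a hypothetical sketch; the exact variables and paths depend on your machine and toolchain):

```
export MKLROOT=/opt/intel/mkl                 # optional, only if using MKL
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$MKLROOT/lib/intel64:$LD_LIBRARY_PATH
```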
If using modules, you can try:
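For example (module names are site-specific placeholders; adjust to whatever your cluster provides):

```
module load cuda
module load openmpi
module load mkl        # optional
```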
To build everything, including dstorm, orm, and the Torch packages, type the following from the top-level directory:
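```
make
```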
This command builds the distributed shared memory component (dstorm), the shared memory transport hook (orm),
and the Lua rocks for the Torch hooks and distributed optimization.
Component-wise build
To build component-wise (not required if using make above):
To build dstorm, go into the dstorm directory and run:
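For example (a hypothetical invocation; the script name mkit.sh and its arguments are assumptions, so check the dstorm directory for the actual build script):

```
./mkit.sh GPU
```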
You should get a SUCCESS as the output. Check the log files to ensure the build is successful.
The general format is:
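(Again assuming the mkit.sh script name from above; verify against your checkout.)

```
./mkit.sh TYPE
```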
where TYPE is one of:
SHM (liborm only),
or MPI (liborm + mpi),
or GPU (liborm + mpi + gpu)
A side effect is the creation of ../dstorm-env.{mk|cmake} environment files, so that the Lua
capabilities can match the libdstorm compile options.
Build the orm
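A hypothetical sketch; the orm component may have its own build script, so consult the orm directory for the exact steps:

```
cd orm
make
```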
Building Torch packages. With the Torch environment set up, install the malt-2 and dstoptim (distributed optimization) packages:
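A sketch of one way to do this, assuming each package directory contains a rockspec (the directories themselves are assumptions):

```
# Run luarocks from each package's source directory (the one containing its rockspec):
luarocks make        # in the malt-2 package directory
luarocks make        # in the dstoptim package directory
```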
Test
A very basic test is to run th and then, by hand, try:
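For example (the module name malt2 is an assumption; use the name of the rock you installed):

```
$ th
> require 'malt2'
```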
Run a quick test.
With MPI, you’ll need to run via mpirun, perhaps something like:
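A hypothetical example; substitute the actual test script and process count:

```
mpirun -np 2 th <test-script>.lua
```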
If built with GPU support:
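Again hypothetical; a GPU build is typically launched the same way, with one MPI rank per GPU:

```
mpirun -np <number-of-gpus> th <test-script>.lua
```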
NEW: a WITH_GPU build can also run with the MPI transport.
The default transport is set to the “highest” one built into libdstorm2: GPU > MPI > SHM.
Running over multiple GPUs.
MPI only sees the hostname. By default, on every host, MPI jobs enumerate the
GPUs and start running the processes. The only way to change this and run on
other GPUs in a round-robin fashion is to change this enumeration for every
rank using CUDA_VISIBLE_DEVICES. An example script, redirect.sh, is provided in
the top-level directory.
To run:
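A hypothetical invocation (the process count and application script are placeholders):

```
mpirun -np 4 ./redirect.sh th <your-script>.lua
```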
This script assigns available GPUs in a round-robin fashion. Since MPI requires
visibility of all other GPUs to correctly access shared memory, this script only
changes the enumeration order and does not restrict visibility.
Running applications.
See here for how to run Torch applications.