To build and install FlexFlow, follow the instructions below.
Clone the FlexFlow source code and its third-party dependencies from GitHub:
git clone --recursive https://github.com/flexflow/FlexFlow.git
FlexFlow has system dependencies on CUDA and/or ROCm, depending on which GPU backend you target. The GPU backend is configured by the CMake variable FF_GPU_BACKEND. By default, FlexFlow targets CUDA. docker/base/Dockerfile installs the system dependencies on a standard Ubuntu system.
If you are targeting CUDA, FlexFlow requires CUDA and cuDNN to be installed. You can follow the standard NVIDIA installation instructions for CUDA and cuDNN.
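If you want to confirm that CUDA and cuDNN are visible before configuring FlexFlow, a quick check might look like the following (the cudnn_version.h path assumes a conventional install under /usr/local/cuda; adjust it for your system):

```shell
# Check the CUDA toolkit version (nvcc must be on your PATH)
nvcc --version

# Check the installed cuDNN version from its header
grep -A 2 "define CUDNN_MAJOR" /usr/local/cuda/include/cudnn_version.h
```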
Disclaimer: CUDA architectures < 60 (Maxwell and older) are no longer supported.
If you are targeting ROCm, FlexFlow requires a ROCm and HIP installation with a few additional packages. Note that this can be done on a system with or without an AMD GPU. You can follow the standard installation instructions for ROCm and HIP. When running amdgpu-install, install the hip and rocm use cases. You can skip installing the kernel drivers (not necessary on systems without an AMD graphics card) with --no-dkms, i.e. amdgpu-install --usecase=hip,rocm --no-dkms. Additionally, install the packages hip-dev, hipblas, miopen-hip, and rocm-hip-sdk.
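Put together, the ROCm setup described above might look like this on an Ubuntu system (the package names come from the text; the exact amdgpu-install invocation can differ across ROCm releases):

```shell
# Install the hip and rocm use cases without kernel drivers
# (drivers are unnecessary on machines without an AMD GPU)
sudo amdgpu-install --usecase=hip,rocm --no-dkms

# Additional packages FlexFlow needs
sudo apt-get install -y hip-dev hipblas miopen-hip rocm-hip-sdk
```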
See ./docker/base/Dockerfile for an example ROCm install.
This is not currently supported.
If you are planning to build the Python interface, you will need to install several additional Python libraries; please check this for details. If you are only looking to use the C++ interface, you can skip to the next section.
We recommend that you create your own conda environment and then install the Python dependencies, to avoid any version mismatching with your system pre-installed libraries.
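For example, a minimal conda setup might look like the following (the environment name and Python version here are arbitrary choices, not requirements; the actual dependency list lives in the FlexFlow repository):

```shell
# Create and activate an isolated environment for FlexFlow's Python dependencies
conda create -n flexflow python=3.10 -y
conda activate flexflow
```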
You can configure a FlexFlow build by running the config/config.linux file in the build folder. If you do not want to build with the default options, you can set your configuration by passing (or exporting) the relevant environment variables. We recommend spending some time familiarizing yourself with the available options by scanning the config/config.linux file. In particular, the main parameters are:
- CUDA_DIR is used to specify the directory of CUDA. It is only required when CMake cannot automatically detect the installation directory of CUDA.
- CUDNN_DIR is used to specify the directory of CUDNN. It is only required when CUDNN is not installed in the CUDA directory.
- FF_CUDA_ARCH is used to set the architecture of the targeted GPUs; for example, the value can be 60 if the GPU architecture is Pascal. To build for more than one architecture, pass a comma-separated list (e.g. FF_CUDA_ARCH=70,75). To compile FlexFlow for all GPU architectures detected on the machine, pass FF_CUDA_ARCH=autodetect (this is the default value, so you can also leave FF_CUDA_ARCH unset). To build for all GPU architectures compatible with FlexFlow, pass FF_CUDA_ARCH=all. If your machine does not have a GPU, you must set FF_CUDA_ARCH to at least one valid architecture code (or all), since the compiler cannot detect the architecture(s) automatically.
- FF_USE_PYTHON controls whether to build the FlexFlow Python interface.
- FF_USE_NCCL controls whether to build FlexFlow with NCCL support. By default, it is set to ON.
- FF_USE_GASNET enables distributed runs of FlexFlow. You can then set the preferred GASNet conduit with FF_GASNET_CONDUIT. For instance, to run FlexFlow on multiple nodes using MPI, set FF_USE_GASNET=ON and FF_GASNET_CONDUIT=mpi.
- FF_BUILD_EXAMPLES controls whether to build all C++ example programs.
- FF_MAX_DIM is used to set the maximum dimension of tensors; by default it is set to 4.
- FF_USE_{NCCL,LEGION,ALL}_PRECOMPILED_LIBRARY controls whether to build FlexFlow using a pre-compiled version of Legion, NCCL (if FF_USE_NCCL is ON), or both. By default, FF_USE_NCCL_PRECOMPILED_LIBRARY and FF_USE_LEGION_PRECOMPILED_LIBRARY are both set to ON, allowing you to build FlexFlow faster. To build Legion and NCCL from source, set them to OFF.
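As an illustration, to configure a CUDA build for Volta and Turing GPUs with the Python interface and NCCL enabled, you could export the relevant variables before invoking the config script from the build directory (the architecture values here are examples, not defaults):

```shell
# Target Volta (70) and Turing (75) GPUs
export FF_CUDA_ARCH=70,75
export FF_USE_PYTHON=ON
export FF_USE_NCCL=ON

# Run the configuration script from the build directory
../config/config.linux
```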
More options are available in CMake; run ccmake and search for options starting with FF.
You can build FlexFlow in three ways: with CMake, with Make, and with pip. We recommend the CMake build system, as it automatically builds all C++ dependencies, including NCCL and Legion.
To build FlexFlow with CMake, go to the FlexFlow home directory, and run
mkdir build
cd build
../config/config.linux
make -j N
where N is the desired number of threads to use for the build.
To build FlexFlow with pip, run pip install . from the FlexFlow home directory. This command builds FlexFlow and also installs the Python interface as a Python module.
The Makefile we provide is mainly for development purposes, and may not be fully up to date. To use it, run:
cd python
make -j N
After building FlexFlow, you can test it to ensure that the build completed without issue, and that your system is ready to run FlexFlow.
Set the FF_HOME environment variable before running FlexFlow. To make it permanent, you can add the following line in ~/.bashrc.
export FF_HOME=/path/to/FlexFlow
The Python examples are in the examples/python folder. The native, Keras integration, and PyTorch integration examples are in the native, keras, and pytorch subfolders, respectively.
To run the Python examples, you have two options: you can use the flexflow_python interpreter, available in the python folder, or you can use the native Python interpreter. If you choose to use the native Python interpreter, you should either install FlexFlow, or, if you prefer to build without installing, export the following flags:
export PYTHONPATH="${FF_HOME}/python:${FF_HOME}/build/python"
export FF_USE_NATIVE_PYTHON=1
We recommend running the mnist_mlp test under native, using the following command, to check that FlexFlow has been installed correctly:
cd python
./flexflow_python examples/python/native/mnist_mlp.py -ll:py 1 -ll:gpu 1 -ll:fsize <size of gpu buffer> -ll:zsize <size of zero buffer>
A script that runs all the Python examples is available at tests/multi_gpu_tests.sh.
The C++ examples are in the examples/cpp folder. For example, AlexNet can be run as:
./alexnet -ll:gpu 1 -ll:fsize <size of gpu buffer> -ll:zsize <size of zero buffer>
Buffer sizes are given in MB; e.g., for an 8 GB GPU, use -ll:fsize 8000.
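As a quick sanity check on the arithmetic, the -ll:fsize value is just the GPU memory converted to MB (the example in the text uses decimal MB, not MiB):

```shell
# An 8 GB GPU has 8 * 1000 = 8000 MB, so pass -ll:fsize 8000
GPU_MEM_GB=8
FSIZE=$((GPU_MEM_GB * 1000))
echo "$FSIZE"  # prints 8000
```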
If you built/installed FlexFlow using pip, this step is not required. If you built using Make or CMake, install FlexFlow with:
cd build
make install