OpenFFT_Cpp_wrapper

Introduction

C++ interface & utilities for OpenFFT library.
This wrapper provides array-index management between global 3D/4D array data and distributed buffer data for OpenFFT.

License

The implementations of OpenFFT_Cpp_wrapper (inside ./include directory) are released under the MIT license.
The sample codes (inside ./sample directory) are released under the GPLv3 or any later version license because they written based on the original sample codes of the OpenFFT library.

How to use

Include the ./include/openfft.hpp in your source code, then compile and link with the libopenfft.a of OpenFFT.
Do not include the openfft.h of OpenFFT.

This library uses C++11 standard.

This library is developed in the environment shown in below.

GCC 6.4
OpenMPI 2.1.2
FFTW 3.3.7
OpenFFT 1.2

Calling wrapper functions from your C++ program

Definition of Interface.
All definitions are implemented in the namespace of OpenFFT:: .

The complex type for OpenFFT library is OpenFFT::dcomplex . The members of dcomplex are shown in below.

OpenFFT::dcomplex complex_data;

double real_part = complex_data.r;
double imag_part = complex_data.i;

The management class for OpenFFT library is OpenFFT::Manager<double> .
In current version of OpenFFT, that is only implemented in 64bit float (double) version.

Data type and index type informations are available from manager class.

using float_type   = typename OpenFFT::Manager<double>::float_type;    //  same to double
using complex_type = typename OpenFFT::Manager<double>::complex_type;  //  same to OpenFFT::dcomplex
using index_type   = typename OpenFFT::Manager<double>::IndexList;     //  same to std::array<int, 8>

Step 1: Initialize manager.

//--- make manager instance
OpenFFT::Manager<double> fft_mngr;

//--- initialize for r2c 3D FFT
fft_mngr.init_r2c_3d(N1, N2, N3,
                     offt_measure, measure_time, print_memory);

//--- initialize for c2c 3D FFT
fft_mngr.init_c2c_3d(N1, N2, N3,
                     offt_measure, measure_time, print_memory);

//--- initialize for c2c 4D FFT
fft_mngr.init_c2c_4d(N1, N2, N3, N4,
                     offt_measure, measure_time, print_memory);


//--- initialize for c2c_3D FFT with default settings
//      auto-tuning, without measure-time and print-memory.
fft_mngr.init_c2c_3d(N1, N2, N3);

These grid size and transform type are shared in global manager (that is static instance in this wrapper).
You can select each one configuration exclusively in same time. (re-initialize to different configuration is possible.)

Step 2: Copy the global 3D/4D array into the local input buffer.

//--- copy data for r2c 3D FFT
double RealGlobalInput[N1][N2][N3];
std::vector<double> real_input_buffer;
fft_mngr.copy_array_into_input_buffer( &( RealGlobalInput[0][0][0] ), real_input_buffer );

//--- copy data for c2c 3D FFT
OpenFFT::dcomplex GlobalInput[N1][N2][N3];
std::vector<OpenFFT::dcomplex> input_buffer;
fft_mngr.copy_array_into_input_buffer( &( GlobalInput[0][0][0] ), input_buffer );

//--- copy data for c2c 4D FFT
OpenFFT::dcomplex GlobalInput[N1][N2][N3][N4];
std::vector<OpenFFT::dcomplex> input_buffer;
fft_mngr.copy_array_into_input_buffer( &( GlobalInput[0][0][0][0] ), input_buffer );

All 'buffer' input/output accepts both of reference to std::vector<T> and pointer to buffer array (double * or OpenFFT::dcomplex *).
Using std::vector<T> is recommended because the buffer length is checked for input buffer or resized for output buffer in wrapper API.

Step 3: Execute FFT.

//--- for r2c 3D FFT
std::vector<OpenFFT::dcomplex> output_buffer;
fft_mngr.fft_r2c_forward( real_input_buffer, output_buffer );

//--- for c2c 3D/4D FFT
std::vector<OpenFFT::dcomplex> output_buffer;
fft_mngr.fft_c2c_forward( input_buffer, output_buffer );

Step 4: write back the local output buffer data into the local 3D/4D array.

//--- write back for r2c_3D FFT
OpenFFT::dcomplex LocalOutput[N1][N2][N3r];   // N3r = N3/2+1
fft_mngr.copy_array_from_output_buffer( &( GlobalOutput[0][0][0] ), output_buffer);

//--- write back for c2c_3D FFT
OpenFFT::dcomplex LocalOutput[N1][N2][N3];
fft_mngr.copy_array_from_output_buffer( &( GlobalOutput[0][0][0] ), output_buffer);

//--- write back for c2c_4D FFT
OpenFFT::dcomplex LocalOutput[N1][N2][N3][N4];
fft_mngr.copy_array_from_output_buffer( &( GlobalOutput[0][0][0][0] ), output_buffer);

Step 4-2: Transpose the output buffer into the input buffer for Inverse FFT (available in c2c_3D or c2c_4D mode only).
```
fft_mngr.transpose_output_to_input( output_buffer, input_buffer );
```
In this function, communicate the data of 'output_buffer' by using MPI_Alltoallv() and reshape received data into the 'input_buffer'.
Step 4-3: Execute Inverse FFT (available in c2c_3D or c2c_4D mode only).
```
fft_mngr.fft_c2c_backward( input_buffer, output_buffer );
```
This function is compatible with backward DFT compute of FFTW3.
step 5: Finalize.
```
fft_mngr.finalize();
```
The OpenFFT configurations are shared in global manager object (It allows you to make many instance of OpenFFT::Manager<>).
It is recommended that you call the 'finalize' function at once in finalization part of your program.

Other API

Get OpenFFT information of local process.

int                My_Max_NumGrid = fft_mngr.get_max_n_grid();
int                My_NumGrid_In  = fft_mngr.get_n_grid_in();
int                My_NumGrid_Out = fft_mngr.get_n_grid_out();
std::array<int, 8> My_Index_In    = fft_mngr.get_index_in();
std::array<int, 8> My_Index_Out   = fft_mngr.get_index_out();

double                elapsed_time = fft_mngr.get_time();
OpenFFT::FFT_GridType grid_type    = fft_mngr.get_grid_type();

The values of [6] and [7] in "My_Index_In" and "My_Index_Out" are significant only at c2c_4D mode.
OpenFFT::FFT_GridType is enum class type. It is defined as below.

namespace OpenFFT {
    enum class FFT_GridType {
        none,
        r2c_3D,
        c2c_3D,
        c2c_4D,
    };
}

Get OpenFFT information of other process.

int tgt_proc;  // your target MPI process ID

int                NumGrid_In  = fft_mngr.get_n_grid_in(  tgt_proc );
int                NumGrid_Out = fft_mngr.get_n_grid_out( tgt_proc );
std::array<int, 8> Index_In    = fft_mngr.get_index_in(   tgt_proc );
std::array<int, 8> Index_Out   = fft_mngr.get_index_out(  tgt_proc );

Gather functions are available to obtain the global 3D/4D array data from local output buffer.

//--- gather for r2c 3D FFT and c2c 3D FFT
OpenFFT::dcomplex GlobalOutput[N1][N2][N3];
const int tgt_proc = 0; // MPI proc id
fft_mngr.gather_array( &( GlobalOutput[0][0][0] ), output_buffer, tgt_proc );

//--- gather for c2c 4D FFT
OpenFFT::dcomplex GlobalOutput[N1][N2][N3][N4];
const int tgt_proc = 0; // MPI proc id
fft_mngr.gather_array( &( GlobalOutput[0][0][0][0] ), output_buffer, tgt_proc );


//--- allgather for r2c 3D FFT and c2c 3D FFT
OpenFFT::dcomplex GlobalOutput[N1][N2][N3];
fft_mngr.allgather_array( &( GlobalOutput[0][0][0] ), output_buffer );

//--- allgather for c2c 4D FFT
OpenFFT::dcomplex GlobalOutput[N1][N2][N3][N4];
fft_mngr.allgather_array( &( GlobalOutput[0][0][0][0] ), output_buffer );



//--- gather user-defined data type (c2c_3D example)
struct YourData {
    // define anything your data.
    // (notice: this class will be transported with MPI function.
    //          be careful if this class contains pointer or reference member.)
};
std::vector<YoutData> your_output;
Yourdata YourGlobalOutput[N1][N2][N3];

fft_mngr.gather_array( &( YourGlobalOutput[0][0][0] ), your_output, tgt_proc );

These 'gather' functions accept any type of 3D/4D array and buffer (defined as template class).
The gather interface may reduce data transport than using MPI_Reduce() or MPI_Allreduce() to make global array.

Apply your function between global 3D/4D array and local input/output buffer.

//--- example 1: collect real part of output.
double RealGlobalOutput[N1][N2][N3];

//------ using functor
struct CopyRealpartFromBuffer {
    void operator () (double &arr_v, const OpenFFT::dcomplex &buf_v){
        arr_v = buf_v.r;
    }
};
CopyRealpartFromBuffer your_func;

fft_mngr.apply_array_with_output_buffer( &(RealGlobalOutput[0][0][0]), output_buffer, your_func );



//--- example 2: if you have the local buffer in other process.
int proc_i;  // MPI process ID which has the part of 'output_buffer_at_proc_i'.
std::vector<OpenFFT::dcomplex> output_buffer_at_proc_i;

//------- this API accepts explicit MPI process ID.
fft_mngr.apply_array_with_output_buffer( &(RealGlobalOutput[0][0][0]), output_buffer_at_proc_i, your_func, proc_i);

The implementations of OpenFFT::Manager<>::apply_array_with_[input/output]_buffer() are used in other API between global 3D/4D array and local buffer. These 'apply' functions can accept explicit MPI process ID.

These 'apply' functions return the functor that you passed. It can use for reducing value from local buffer.

//--- calculate sum of element-wise product between ApplyMatrix array and Global output array.
double ApplyMatrix[N1][N2][N3]
std::vector<OpenFFT::dcomplex> output_buffer;

//--- reduce real part product
struct ReduceValue{
    double v = 0.0;
    void operator () (const double arr_v, const OpenFFT::dcomplex buf_v){
        this->v += arr_v*buf_v.r;
    }
};
ReduceValue reduce_value;

//--- get local sum
const ReduceValue local_sum = fft_mngr.apply_array_with_output_buffer( &(ApplyMatrix[0][0][0]), output_buffer, reduce_value );

//--- get global sum
double global_sum;
MPI_Allreduce( &local_sum.v, &global_sum, 1
              MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

The apply interface accept any type of 3D/4D array and buffer (both of input are defined as template class respectively).

//--- perform 'apply' between user-defined class array and buffer.
struct ArrayClass {
    double v;
    int    m;
};
struct BufferClass {
    int j, k;
    OpenFFT::dcomplex c;
};

//--- something function between ArrayClass and BufferClass
struct ApplyFunc {
    void operator () (ArrayClass &arr_v, const BufferClass buf_v){
        arr_v.v =  buf_v.c.r * static_cast<double>(buf_v.j)
                +  buf_v.c.i * static_cast<double>(buf_v.k);
        arr_v.m = buf_v.j - buf_v.k;
    }
};
ApplyFunc apply_func;

ArrayClass array_3d[N1][N2][N3];
std::vector<BufferClass> apply_buffer;

//--- apply your function (select as index pattern)
fft_mngr.apply_array_with_input_buffer(  &(array_3d[0][0][0]), apply_buffer, apply_func );
//  or
fft_mngr.apply_array_with_output_buffer( &(array_3d[0][0][0]), apply_buffer, apply_func );

Index sequence generator for input/output buffer.

//--- for 3D array
int tgt_proc = 0;
std::vector< std::array<int, 3> > index3d_seq;

fft_mngr.gen_input_index_sequence(index3d_seq);  //  for local proc.
fft_mngr.gen_output_index_sequence(index3d_seq);

fft_mngr.gen_input_index_sequence(index3d_seq , tgt_proc);  //  for another proc
fft_mngr.gen_output_index_sequence(index3d_seq, tgt_proc);

//--- usage sample for 3D array: make input buffer
complex_t InputArray3D[N1][N2][N3];
std::vector<complex_t> input_buffer;
input_buffer.reserve(My_Max_NumGrid);
input_buffer.clear();
for(const auto& g : index3d_seq){
    input_buffer.emplace_back( InputArray3D[ g[0] ][ g[1] ][ g[2] ] );
}



//--- for 4D array
int tgt_proc = 0;
std::vector< std::array<int, 4> > index4d_seq;

fft_mngr.gen_input_index_sequence(index4d_seq);  //  for local proc.
fft_mngr.gen_output_index_sequence(index4d_seq);

fft_mngr.gen_input_index_sequence(index4d_seq , tgt_proc);  //  for another proc.
fft_mngr.gen_output_index_sequence(index4d_seq, tgt_proc);

//--- usage sample for 4D array: copy from output buffer
complex_t OutputArray4D[N1][N2][N3][N4];
std::vector<complex_t> output_buffer;
output_buffer.reserve(My_Max_NumGrid);
output_buffer.clear();

fft_mngr.fft_c2c_forward( input_buffer, output_buffer );

for(size_t ii=0; ii<index4d_seq.size(); ++ii){
    const auto g = index4d_seq[ii];
    OutputArray4D[ g[0] ][ g[1] ][ g[2] ][ g[3] ] = output_buffer[ii];
}

The index sequence generator interface is useful to make the data part in local process.

Transpose functions with user-defined data class.

struct YourData {
    // define anything your data.
    // (notice: this class will be transported with MPI function.
    //          be careful if this class contains pointer or reference member.)
};

std::vector<YourData> input_buf, output_buf;

//--- transpose input_buf -> output buf
fft_mngr.transpose_input_to_output(input_buf, output_buf);

//--- transpose output_buf -> input_buf
fft_mngr.transpose_output_to_input(output_buf, input_buf);

The transpose interface accept any type of buffer (defined as template class).

Buffer size management interface for gather/transpose API

struct YourData {
    // define anything your data.
};

//
//  after you called gather or transpose function of OpenFFT::Manager.
//

//--- get buffer size
const size_t sz_buf = fft_mngr.get_buf_size<YourData>();

//--- resize buffer size
const size_t sz_buf_new = 4;  // your target size
fft_mngr.shrink_buf<YourData>(sz_buf_new);

The buffer for gather/transpose API will be allocated for each data type.
The minimum shrinked buffer size is sz_min = 4 * sz_buf_new.

Note

Makefile configuration example to compile the OpenFFT library.

CC  = mpicc -O3 -fopenmp -I$(YOUR_FFTW_DIR)/include -I./include
LIB = -L$(YOUR_FFTW_DIR)/lib -lfftw3 -lfftw3_omp -lm
FC  = 0

The setting example is validated at the environment shown in below.

GCC 6.4
OpenMPI 2.1.2
FFTW 3.3.7
OpenFFT 1.2

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
include		include
sample		sample
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenFFT_Cpp_wrapper

Introduction

License

How to use

Calling wrapper functions from your C++ program

Other API

Note

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenFFT_Cpp_wrapper

Introduction

License

How to use

Calling wrapper functions from your C++ program

Other API

Note

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages