clFFT  2.0
OpenCL Fast Fourier Transforms (FFT's)

The clFFT library is an OpenCL library implementation of discrete Fast Fourier Transforms. It:

  • Provides a fast and accurate platform for calculating discrete FFTs.
  • Works on CPU or GPU backends.
  • Supports in-place or out-of-place transforms.
  • Supports 1D, 2D, and 3D transforms with a batch size that can be greater than 1.
  • Supports planar (real and complex components in separate arrays) and interleaved (real and complex components as a pair contiguous in memory) formats.
  • Supports dimension lengths that can be any mix of powers of 2, 3, and 5.
  • Supports single and double precision floating point formats.

Installation of clFFT library

Downloadable Binaries

clFFT library pre-compiled packages for recent versions of Microsoft Windows operating systems and several flavors of Linux are available.

The downloadable binary packages are freely available at https://github.com/clMathLibraries/clFFT/releases

Once the appropriate package for the respective OS has finished downloading, uncompress the package using the native tools available on the platform in a directory of the user's choice. Everything needed to build a program using clFFT is included in the directory tree, including documentation, header files, binary library components, and sample programs for programming illustration.

CMake

After the clFFT package is uncompressed on the user's hard drive, a samples directory exists with source code, but no Visual Studio project files, Unix makefiles, or other native build system exist. Instead, it contains a CMakeLists.txt file. clFFT uses CMake as its build system, and other build files, such as Visual Studio projects, nmake makefiles, or Unix makefiles, are generated by the CMake build system, during configuration. CMake is freely available for download from: http://www.cmake.org/

Note
CMake generates the native OS build files, so any changes made to the native build files are overwritten the next time CMake is run.

CMake is written to pull compiler information from environment variables, and to look in default install directories for tools. Once installed, a popular interface to control the process of creating native build files is CMake-gui. When the GUI is launched, two text boxes appear at the top of the dialog: a path to source and a separate path to generate binaries. For the browse source... box, find the path to where you unzipped clFFT, and select the root samples directory that contains the CMakeLists.txt; for clFFT, this should be clFFT/samples. For browse build..., select an appropriate directory where the build environment generates build files; a convenient location is a sibling directory to the source. This makes it easy to wipe all the binaries and start a fresh build. For instance, for a debug configuration of NMake, an example directory could be clFFT/bin/NMakeDebug. This is where the generated makefile, native build files, and intermediate object files are built. These generated files are kept separate from the source; this is referred to as 'out-of-source' builds, and is very similar in concept to what 'autotools' does for Linux. To build using NMake, simply type NMake in the build directory containing the makefile. To build using Visual Studio, generate the solution and project files into a directory such as clFFT/bin/vs10, find the generated .sln file, and open the solution.

The first time the configure button near the bottom of the screen is clicked, it causes CMake to prompt for what type of native build files to make. Various properties appear in red in the properties box. Red indicates that the value has changed since last time configure was clicked. (The first time configure is clicked, everything is red.) CMake tries to configure itself automatically to the client's system by looking at a systems environment variables and by searching through default install locations for project dependencies. Take a moment to verify the settings and paths that are displayed on the configuration screen; if any changes must be made, you can provide correct paths or adjust settings by typing directly into the CMake configuration screen. Click the configure button a second time to 'bake' those settings and serialize them to disk.

Options relevant to the clFFT project include:

  • 'AMDAPPSDKROOT': Location of the Stream SDK installation. This value is already populated if CMake could determine the location by looking at the environment variables. If not, the user must provide a path to the root installation of the Stream SDK here.
  • 'BOOST_ROOT': Location of the Boost SDK installation. This value is already populated if CMake could determine the location by looking at the environment variables or default install locations. If not, the user must provide a path to the root installation of the Stream SDK here. This dependency is only relevant to the sample client; the FFT library does not depend on Boost.
  • 'CMAKE_BUILD_TYPE': Defines the build type (default is debug). For Visual Studio projects, this does not appear (modifiable in IDE); for makefile-based builds, this is set in CMake.
  • 'CMAKE_INSTALL_PREFIX': The path to install all binaries and headers generated from the build. This is used when the user types make install or builds the INSTALL project in Visual Studio. All generated binaries and headers are copied into the path prefixed with CMAKE_INSTALL_PREFIX. The Visual Studio projects are self explanatory, but a few other projects are autogenerated; these might be unfamiliar.

The Visual Studio projects are self explanatory, but a few other projects are autogenerated; these might be unfamiliar.

  • 'ALL_BUILD': A project that is empty of files, but since it depends on all user projects, it provides a convenient way to rebuild everything.
  • 'ZERO_CHECK': A CMake-specific project that checks to see if the generated solution and project files are in sync with the CMakeLists.txt file. If these files are modified, the solutions and projects are now out-of-sync, and this project prompts the user to regenerate their environment.
Note
If the user chooses to build on Windows with a NMake based build, it is important to launch CMake from within a Visual Studio Command Prompt (20xx). This is because CMake must be able to parse environment variables to properly initialize NMake. This is not necessary if a Visual Studio solution is generated, because solution files contain their own environmental setup.

Boost

clFFT includes one sample project that has source dependencies on Boost: the sample client project. Boost is freely available from: http://www.boost.org/.

The command-line clFFT sample client links with the program_options library, which provides functionality for parsing command-line parameters and .ini files in a cross-platform manner. Once Boost is downloaded and extracted on the hard drive, the program_options library must be compiled. The Boost build system uses the BJam builder (a project for a CMake-based Boost build is available for separate download). This is available for download from the Boost website, or the user can build BJam; Boost includes the source to BJam in its distribution, and the user can execute bootstrap.bat (located in the root boost directory) to build it.

After BJam is either built or installed, an example BJam command-line is given below for building a 64-bit program_options binary, for both static and dynamic linking:

bjam --with-program_options address-model=64 link=static,shared stage

The last step to make boost readily available and usable by CMake and the native compiler is to add an environment variable to the system called BOOST_ROOT. In Windows, right click on the computer icon and go to

'Properties|Advanced system settings|Advanced|Environment Variables...'

Remember to relaunch any new processes that are open, in order to inherit the new environment variable. On Linux, consider modifying the .bash_rc file (or shell equivalent) to export a new environment variable every time you log in.

If you are on a Linux system and have used a package manager to install Boost, you may have to confirm where the Boost include and library files have been placed. For example, after installing Boost with the Ubuntu Synaptic Package Manager, the Boost include files are in /usr/include/boost, and the library files either /usr/lib or /usr/lib64. The CMakeLists.txt file in this project defaults the BOOST_ROOT value to /usr on Linux; so, if the system is set up similarly, no further action is necessary. If the system is set up differently, you may have to set the BOOST_ROOT environmental variable accordingly.

Note
Note that CMake does not recognize version numbers at the end of the library filename; so, if the package manager only created a libboost_module_name.so.x.xx.x file (where x.xx.x is the version of Boost), the user may need to manually create a soft link called libboost_module_name.so to the versioned libboost_module_name.so.x.xx.x. See the clFFT binary artifacts in the install directory for an example.

Introduction to clFFT

The FFT is an implementation of the Discrete Fourier Transform (DFT) that makes use of symmetries in the FFT definition to reduce the mathematical intensity required from O( \(N^2\)) to O( \( N \log N\)) when the sequence length, N, is the product of small prime factors. Currently, there is no standard API for FFT routines. Hardware vendors usually provide a set of high-performance FFTs optimized for their systems: no two vendors employ the same interfaces for their FFT routines. clFFT provides a set of FFT routines that are optimized for AMD graphics processors, and that also functional across CPU and other compute devices.

Supported Radices

clFFT supports powers of 2, 3 and 5 sizes. This means that the vector lengths that can be configured through a plan can be any length that is a power of two, three, and five; examples include \(2^7, 2^1*3^1, 3^2*5^4, 2^2*3^3*5^5\), up to the limit that the device can support.

Transform Size Limits

Currently, there is an upper bound on the transform size the library supports. This limit is \(2^{24}\) for single precision and \(2^{22}\) for double precision. This means that the product of transform lengths must not exceed these values. As an example, a 1D single-precision FFT of size 1024 is valid since 1024 \(<= 2^{24}\). Similarly, a 2D double-precision FFT of size 1024x1024 is also valid, since 1024*1024 \(<= 2^{22}\). But, a 2D single-precision FFT of size 4096x8192 is not valid because 4096*8192 > 224.

Dimensionality

clFFT currently supports FFTs of up to three dimensions, given by the enum clFFT-Dim. This enum is a required parameter into clfftCreateDefaultPlan() to create an initial plan; there is no default for this parameter. Depending on the dimensionality that the client requests, clFFT uses the formulations shown below to compute the DFT.

The definition of a 1D complex DFT used by clFFT is given by:

\[ {\tilde{x}}_j = {{1}\over{scale}}\sum_{k=0}^{n-1}x_k\exp\left({\pm i}{{2\pi jk}\over{n}}\right)\hbox{ for } j=0,1,\ldots,n-1 \]

where \(x_k\) are the complex data to be transformed, \(\tilde{x}_j\) are the transformed data, and the sign of \(\pm\) determines the direction of the transform: \(-\) for forward and \(+\) for backward. Note that the user must provided the scaling factor. Typically, the scale is set to 1 for forward transforms, and \({{1}\over{N}}\) for backwards transforms.

The definition of a complex 2D DFT used by clFFT is given by:

\[ {\tilde{x}}_{jk} = {{1}\over{scale}}\sum_{q=0}^{m-1}\sum_{r=0}^{n-1}x_{rq}\exp\left({\pm i} {{2\pi jr}\over{n}}\right)\exp\left({\pm i}{{2\pi kq}\over{m}}\right) \]

for \(j=0,1,\ldots,n-1\hbox{ and } k=0,1,\ldots,m-1\), where \(x_{rq}\) are the complex data to be transformed, \(\tilde{x}_{jk}\) are the transformed data, and the sign of \(\pm\) determines the direction of the transform. Typically, the scale is set to 1 for forwards transforms and \({{1}\over{M \cdot N}}\) for backwards transforms.

The definition of a complex 3D DFT used by clFFT is given by:

\[ \tilde{x}_{jkl} = {{1}\over{scale}}\sum_{s=0}^{p-1}\sum_{q=0}^{m-1}\sum_{r=0}^{n-1} x_{rqs}\exp\left({\pm i} {{2\pi jr}\over{n}}\right)\exp\left({\pm i}{{2\pi kq}\over{m}}\right)\exp\left({\pm i}{{2\pi ls}\over{p}}\right) \]

for \(j=0,1,\ldots,n-1\hbox{ and } k=0,1,\ldots,m-1\hbox{ and } l=0,1,\ldots,p-1\), where \(x_{rqs}\) are the complex data to be transformed, \(\tilde{x}_{jkl}\) are the transformed data, and the sign of \(\pm\) determines the direction of the transform. Typically, the scale is set to 1 for forwards transforms and \({{1}\over{M \cdot N \cdot P}}\) for backwards transforms.

Setup and Teardown of clFFT

clFFT is initialized by a call to clfftSetup(), which must be called before any other API exported from clFFT. This allows the library to create resources used to manage the plans that are created and destroyed by the user. This API also takes a structure clfftInitSetupData that is initialized by the client to control the behavior of the library. The corresponding clfftTeardown() method must be called by the client when it is done using the library. This instructs clFFT to release all resources, including any acquired references to any OpenCL objects that may have been allocated or passed to it through the API.

Thread safety

The clFFT API is designed to be thread-safe. It is safe to create plans from multiple threads, and to destroy those plans in separate threads. Multiple threads can call clfftEnqueueTransform() to place work into a command queue at the same time. clFFT does not provide a single-threaded version of the library. It is expected that the overhead of the synchronization mechanisms inside of clFFT thread safe is minor.

Currently, multi-device operation must be managed by the user. OpenCL contexts can be created that are associated with multiple devices, but clFFT only uses a single device from that context to transform the data. Multi-device operation can be managed by the user by creating multiple contexts, where each context contains a different device, and the user is responsible for scheduling and partitioning the work across multiple devices and contexts.

Row Major formats

clFFT expects all multi-dimensional input passed to it to be in row-major format. This is compatible with C-based languages. However, clFFT is very flexible in the input and output data organization it accepts by allowing the user to specify a stride for each dimension. This feature can be used to process data in column major arrays, and other non-contiguous data formats. See clfftSetPlanInStride and clfftSetPlanOutStride.

OpenCL object creation

OpenCL objects, such as contexts, cl_mem buffers, and command queues, are the responsibility of the user application to allocate and manage. All of the clFFT interfaces that must interact with OpenCL objects take those objects as references through the API. Specifically, the plan creation function clfftCreateDefaultPlan() takes an OpenCL context as a parameter reference, increments the reference count on that object, and keeps the object alive until the corresponding plan has been destroyed through a call to clfftDestroyPlan().

Flushing of command queues

The clFFT API operates asynchronously, and with the exception of thread safety locking with multiple threads, all APIs return immediately. Specifically, the clfftEnqueueTransform() API does not explicitly flush the command queues that are passed by reference to it; it pushes the transform work onto the command queues and returns the modified queues to the client. The client is free to issue its own blocking logic, using OpenCL synchronization mechanisms, or push further work onto the queue to continue processing.

clFFT Plans

A plan is the collection of (almost) all of the parameters needed to specify an FFT computation. This includes:

  • What OpenCL context executes the transform?
  • Is this a 1D, 2D or 3D transform?
  • What are the lengths or extents of the data in each dimension?
  • How many datasets are being transformed?
  • What is the data precision?
  • Should a scaling factor be applied to the transformed data?
  • Does the output transformed data replace the original input data in the same buffer (or buffers), or is the output data written to a different buffer (or buffers).
  • How is the input data stored in its data buffers?
  • How is the output data stored in its data buffers?

The plan does not include:

  • The OpenCL handles to the input and output data buffers.
  • The OpenCL handle to a temporary scratch buffer (if needed).
  • Whether to execute a forward or reverse transform.

These are specified when the plan is executed.

Default Plan Values

When a new plan is created by calling clfftCreateDefaultPlan, its parameters are initialized as follows:

  • Dimensions: as provided by the caller.
  • Lengths: as provided by the caller.
  • Batch size: 1.
  • Precision: CLFFT_SINGLE.
  • Scaling factors:
    1. For the forward transform, the default is 1.0, or no scale factor is applied.
    2. For the reverse transform, the default is 1.0 / P, where P is the product of the FFT lengths.
  • Location: CLFFT_INPLACE.
  • Input layout: CLFFT_COMPLEX_INTERLEAVED.
  • Input strides: the strides of a multidimensional array of the lengths specified, where the data is compactly stored using the row-major convention.
  • Output layout: CLFFT_COMPLEX_INTERLEAVED.
  • Output strides: same as input strides.

Writing client programs that depend on these initial values is not recommended.

Supported Memory Layouts

There are two main families of Discrete Fourier Transform (DFT):

  • Routines for the transformation of complex data. clFFT supports two layouts to store complex numbers: a 'planar' format, where the real and imaginary components are kept in separate arrays:
    1. Buffer1: RRRRR
    2. Buffer2: IIIII
    and an interleaved format, where the real and imaginary components are stored as contiguous pairs:
    1. Buffer1: RIRIRIRIRIRI
  • Routines for the transformation of real to complex data and vice versa; clFFT provides enums to define these formats. For transforms involving real data, there are two possibilities:
    • Real data being subject to forward FFT transform that results in complex data.
    • Complex data being subject to backward FFT transform that results in real data. See the Section "FFTs of Real Data".

Strides and Distances

For one-dimensional data, if clStrides[0] = strideX = 1, successive elements in the first dimension are stored contiguously in memory. If strideX is an integral value greater than 1, gaps in memory exist between each element of the vectors.

For multi-dimensional data, if clStrides[1] = strideY = LenX for 2 dimensional data and clStrides[2] = strideZ = LenX*LenY for 3 dimensional data, no gaps exist in memory between each element, and all vectors are stored tightly packed in memory. Here, LenX, LenY, and LenZ denote the transform lengths clLengths[0], clLengths[1], and clLengths[2], respectively, which are used to set up the plan.

By specifying non-default strides, it is possible to process either row-major or column-major arrays. Data can be extracted from arrays of structures. Almost any regular data storage pattern can be accommodated.

Distance is the amount of memory that exists between corresponding elements in an FFT primitive in a batch. Distance is measured in the units of the FFT primitive; complex data measures in complex units, and real data measures in real data. Stride between tightly packed elements is 1 in either case. Typically, one can measure the distance between any two elements in a batch primitive, be it 1D, 2D, or 3D data. For tightly packed data, the distance between FFT primitives is the size of the FFT primitive, such that dist=LenX for 1D data, dist=LenX*LenY for 2D data, and dist=LenX*LenY*LenZ for 3D data. It is possible to set the distance of a plan to be less than the size of the FFT vector; most often 1 for this case. When computing a batch of 1D FFT vectors, if distance == 1, and strideX == length( vector ), a transposed output is produced for a batch of 1D vectors. It is left to the user to verify that the distance and strides are valid (not intersecting); if not valid, undefined results can occur.

A simple example is to perform a 1D length 4096 on each row of an array of 1024 rows x 4096 columns of values stored in a column-major array, such as a FORTRAN program might provide. (This would be equivalent to a C or C++ program that had an array of 4096 rows x 1024 columns stored in a row-major manner, and you wanted to perform a 1-D length 4096 transform on each column.) In this case, specify the strides [1024, 1].

For a more complex example, an input buffer contained a raster grid of 1024 x 1024 monochrome pixel values, and you want to compute a 2D FFT for each 64 x 64 subtile of the grid. Specifying strides allows you to treat each horizontal band of 1024 x 64 pixels as an array of 16 64 x 64 matrixes, and process an entire band with a single call to clfftEnqueueTransform. (Specifying strides is not quite flexible enough to transform the entire grid of this example with a single kernel execution.) It is possible to create a Plan to compute arrays of 64 x 64 2D FFTs, then specify three strides: [1, 1024, 64]. The first stride, 1, indicates that the rows of each matrix are stored consecutively; the second stride, 1024, gives the distance between rows, and the third stride, 64, defines the distance from matrix to matrix. Then call clfftEnqueueTransform 16 times: once for each horizontal band of pixels.

Supported Precisions in clFFT

Both CLFFT_SINGLE and CLFFT_DOUBLE precisions are supported by the library for all supported radices. With both of these enums the host computer's math functions are used to produce tables of sines and cosines for use by the OpenCL kernel.

Both CLFFT_SINGLE_FAST and CLFFT_DOUBLE_FAST are meant to generate faster kernels with reduced accuracy, but are disabled in the current build..

See clfftPrecision, clfftSetPlanPrecision, and clfftGetPlanPrecision.

clfftDirection

The direction of the transform is not baked into the plan; the same plan can be used to specify both forward and backward transforms. Instead, clfftDirection is passed as a parameter into clfftEnqueueTransform.

In-Place and Out-of-Place

The clFFT API supports both in-place and out-of-place transforms. With inplace transforms, only input buffers are provided to the clfftEnqueueTransform() API, and the resulting data is written in the same buffers, overwriting the input data. With out-of-place transforms, distinct output buffers are provided to the clfftEnqueueTransform() API, and the inputdata is preserved. In-place transforms require that the cl_mem objects the client creates have both read and write permissions. This is given in the nature of the in-place algorithm. Out-of-place transforms require that the destination buffers have read and write permissions, but input buffers can still be created with read-only permissions. This is a clFFT requirement because internally the algorithms may go back and forth between the destination buffers and internally allocated temp buffers. For out-of-place transforms, clFFT never writes back to the input buffers.

Batches

The efficiency of clFFT is improved by utilizing transforms in batches. Sending as much data as possible in a single transform call leverages the parallel compute capabilities of OpenCL devices (and GPU devices in particular), and minimizes the penalty of transfer overhead. It's best to think of an OpenCL device as a high-throughput, high-latency device. Using a networking analogy as an example, it's similar to having a massively high-bandwidth pipe with very high ping response times. If the client is ready to send data to the device for compute, it should be sent in as few API calls as possible. This can be done by batching. clFFT plans have a parameter to describe the number of transforms being batched: clfftSetPlanBatchSize(), and to describe how those batches are laid out and spaced in memory: clfftSetPlanDistance(). 1D, 2D, or 3D transforms can be batched.

Using clFFT on a Client Application

To perform FFT calculations using clFFT, the client program must:

  • Initialize the library by calling clfftSetup.
  • For each distinct type of FFT needed:

    1. Create an FFT Plan object. This usually is done by calling the factory function clfftCreateDefaultPlan. Some of the most fundamental parameters are specified at this time, and others assume default values. The OpenCL context must be provided when the plan is created; it cannot be changed. Another way is to call clfftCopyPlan. In either case, the function returns an opaque handle to the Plan object.
    2. Complete the specification of all of the Plan parameters by calling the various parameter-setting functions, clAmdFFtSet_____.
    3. Optionally, "bake" or finalize the plan, calling clfftBakePlan. This signals to the library the end of the specification phase, and causes it to generate and compile the exact OpenCL kernels needed to perform the specified FFT on the OpenCL device provided.

      At this point, all performance-enhancing optimizations are applied, possibly including executing benchmark kernels on the OpenCL device context in order to maximize runtime performance.

      Although this step is optional, most users probably want to include it so that they can control when this work is done. Usually, this time consuming step is done when the application is initialized. If the user does not call clfftBakePlan, this work is done during the first call to clfftEnqueueTransform.

  • The OpenCL FFT kernels now are ready to execute as many times as needed.
    1. Call clfftEnqueueTransform. At this point, specify whether you want to execute a forward or reverse transform; also, provide the OpenCL cl_mem handles for the input buffer(s), output buffer(s)–unless you want the transformed data to overwrite the input buffers, and (optionally) scratch buffer.

      clfftEnqueueTransform performs one or more calls to the OpenCL function clEnqueueNDRangeKernel. Like clEnqueueNDRangeKernel, clfftEnqueueTransform is a non-blocking call. The commands to execute the FFT compute kernel(s) are added to the OpenCL context queue to be executed asynchronously. An OpenCL event handle is returned to the caller. If multiple NDRangeKernel operations are queued, the final event handle is returned.

    2. The application now can add additional OpenCL tasks to the OpenCL context's queue. For example, if the next step in the application's process is to apply a filter to the transformed data, the application would generate that clEnqueueNDRangeKernel, specifying the transform's output buffer(s) as the input to the filter kernel, and providing the transform's event handle to ensure proper synchronization.
    3. If the application must access the transformed data directly, it must call one of the OpenCL functions for synchronizing the host computer's execution with the OpenCL device (for example: clFinish()).
  • Terminate the library by calling clfftTeardown.

FFTs of Real Data

When real data is subject to DFT transformation, the resulting complex output follows a special property. About half of the output is redundant because they are complex conjugates of the other half. This is called the Hermitian redundancy. So, for space and performance considerations, it is only necessary to store the non-redundant part of the data. Most FFT libraries use this property to offer specific storage layouts for FFTs involving real data. clFFT provides 3 enumerated types to deal with real data FFTs:

  • CLFFT_REAL
  • CLFFT_HERMITIAN_INTERLEAVED
  • CLFFT_HERMITIAN_PLANAR

The first enum specifies that the data is purely real. This can be used to feed real input or get back real output. The second and third enums specify layouts for storing FFT output. They are similar to the corresponding full complex enums in the way they store real and imaginary components. The difference is that they store only about half of the complex output. Client applications can do just a forward transform and analyze the output. Or they can do some processing of the output and do a backward transform to get back real data. This is illustrated in the following figure.

realfft_fwdinv.jpg
Forward and Backward Transform Processes

Let us consider a 1D real FFT of length N. The full output looks as shown in following figure.

realfft_1dlen.jpg
1D Real FFT of Length N

Here, C* denotes the complex conjugate of. Since the values at indices greater than N/2 can be deduced from the first half of the array, clFFT stores data only up to the index N/2. This means that the output contains only 1 + N/2 complex elements, where the division N/2 is rounded down. Examples for even and odd lengths are given below.

Example for N = 8 is shown in following figure.

realfft_ex_n8.jpg
Example for N = 8

Example for N = 7 is shown in following figure.

realfft_ex_n7.jpg
Example for N = 7

For length 8, only (1 + 8/2) = 5 of the output complex numbers are stored, with the index ranging from 0 through 4. Similarly for length 7, only (1 + 7/2) = 4 of the output complex numbers are stored, with the index ranging from 0 through 3.

For 2D and 3D FFTs, the FFT length along the least dimension is used to compute the (1 + N/2) value. This is because the FFT along the least dimension is what is computed first and is logically a real-to-hermitian transform. The FFTs along other dimensions are computed afterwards; they are simply 'complex-tocomplex' transforms. For example, assuming clLengths[2] is used to set up a 2D real FFT, let N1 = clLengths[1], and N0 = clLengths[0]. The output FFT has N1*(1 + N0/2) complex elements. Similarly, for a 3D FFT with clLengths[3] and N2 = clLengths[2], N1 = clLengths[1], and N0 = clLengths[0], the output has N2*N1*(1 + N0/2) complex elements.

Supported Modes

Out-of-place transforms:

  • CLFFT_REAL to CLFFT_HERMITIAN_INTERLEAVED
  • CLFFT_REAL to CLFFT_HERMITIAN_PLANAR
  • CLFFT_HERMITIAN_INTERLEAVED to CLFFT_REAL
  • CLFFT_ CLFFT_HERMITIAN_PLANAR to CLFFT_REAL

In-place transforms:

  • CLFFT_REAL to CLFFT_HERMITIAN_INTERLEAVED
  • CLFFT_HERMITIAN_INTERLEAVED to CLFFT_REAL

Setting strides

The library currently requires the user to explicitly set input and output strides for real transforms. See the following examples to understand what values to use for input and output strides under different scenarios. The examples only show typical usages. The user has flexibility in allocating their buffers and laying out data according to their needs.

Examples

The following pages provide figures and examples to explain in detail the real FFT features of this library.

realfft_expl_01.jpg
1D FFT - Real to Hermitian
realfft_expl_02.jpg
1D FFT - Real to Hermitian, Example 1
realfft_expl_03.jpg
1D FFT - Real to Hermitian, Example 2
realfft_expl_04.jpg
1D FFT - Real to Hermitian, Example 3
realfft_expl_05.jpg
1D FFT - Hermitian to Real
realfft_expl_06.jpg
1D FFT - Hermitian to Real, Example
realfft_expl_07.jpg
2D FFT - Real to Hermitian In Place
realfft_expl_08.jpg
2D FFT - Real to Hermitian, Example