3.5. Optional build settings

LAMMPS can be built with several optional settings. Each subsection explains how to apply them when building with either CMake or make.


3.5.1. C++11 standard compliance

A C++11 standard compatible compiler is a requirement for compiling LAMMPS. LAMMPS version 3 March 2020 is the last version compatible with the previous C++98 standard for the core code and most packages. Most currently used C++ compilers are compatible with C++11, but some older ones may need extra flags to enable C++11 compliance. Example for GNU c++ 4.8.x:

CCFLAGS = -g -O3 -std=c++11

3.5.2. FFT library

When the KSPACE package is included in a LAMMPS build, the kspace_style pppm command performs 3d FFTs which require use of an FFT library to compute 1d FFTs. The KISS FFT library is included with LAMMPS but other libraries can be faster. LAMMPS can use them if they are available on your system.

-D FFT=value              # FFTW3 or MKL or KISS, default is FFTW3 if found, else KISS
-D FFT_SINGLE=value       # yes or no (default), no = double precision
-D FFT_PACK=value         # array (default) or pointer or memcpy

Note

The values for the FFT variable must be in upper-case. This is an exception to the rule that all CMake variables can be specified with lower-case values.

Usually these settings are all that is needed. If FFTW3 is selected, CMake will try to detect whether threaded FFTW libraries are available and enable them by default. This setting is independent of whether OpenMP threads are enabled and whether a package like KOKKOS or OPENMP is used. If CMake cannot detect the FFT library, you can set these variables to assist:

-D FFTW3_INCLUDE_DIR=path   # path to FFTW3 include files
-D FFTW3_LIBRARY=path       # path to FFTW3 libraries
-D FFT_FFTW_THREADS=on      # enable using threaded FFTW3 libraries
-D MKL_INCLUDE_DIR=path     # path to Intel MKL include files
-D MKL_LIBRARY=path         # path to MKL libraries
-D FFT_MKL_THREADS=on       # enable using threaded FFTs with MKL libraries

The KISS FFT library is included in the LAMMPS distribution. It is portable across all platforms. Depending on the size of the FFTs and the number of processors used, the other libraries listed here can be faster.

However, note that long-range Coulombics are only a portion of the per-timestep CPU cost, FFTs are only a portion of long-range Coulombics, and 1d FFTs are only a portion of the FFT cost (parallel communication can be costly). A breakdown of these timings is printed to the screen at the end of a run when using the kspace_style pppm command. The Screen and logfile output page gives more details. A more detailed (and time consuming) report of the FFT performance is generated with the kspace_modify fftbench yes command.
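The nesting of costs described above can be put into numbers with Amdahl's law. The sketch below is only an illustration: the function name and the three fractions are hypothetical placeholders, not measurements. Real values would come from the timing breakdown printed at the end of a run.

```python
def overall_speedup(f_kspace, f_fft, f_1d, lib_speedup):
    """Amdahl's-law estimate of the whole-run speedup from faster 1d FFTs.

    f_kspace    : fraction of per-timestep time spent in long-range Coulombics
    f_fft       : fraction of that time spent in FFTs
    f_1d        : fraction of the FFT time spent in 1d FFTs
                  (the rest is communication and data movement)
    lib_speedup : factor by which the 1d FFTs become faster
    """
    affected = f_kspace * f_fft * f_1d          # share of the run that speeds up
    return 1.0 / ((1.0 - affected) + affected / lib_speedup)


# Hypothetical fractions, chosen only to illustrate the effect.
print(f"{overall_speedup(0.30, 0.50, 0.50, 2.0):.3f}")
```

With these made-up inputs only 7.5% of the run is affected, so even a 1d FFT library that is twice as fast improves the total run time by less than 4%.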

FFTW is a fast, portable FFT library that should also work on any platform and can be faster than the KISS FFT library. You can download it from www.fftw.org. LAMMPS requires version 3.X; the legacy version 2.1.X is no longer supported.

Building FFTW for your box should be as simple as ./configure; make; make install. The install command typically requires root privileges (e.g. invoke it via sudo), unless you specify a local directory with the --prefix option of configure. Type ./configure --help to see various options.

The Intel MKL math library is part of the Intel compiler suite. It can be used with the Intel or GNU compiler (see the FFT library settings above).

Performing 3d FFTs in parallel can be time consuming due to data access and required communication. This cost can be reduced by performing single-precision FFTs instead of double precision. Single precision means the real and imaginary parts of a complex datum are 4-byte floats. Double precision means they are 8-byte doubles. Note that Fourier transform and related PPPM operations are somewhat less sensitive to floating point truncation errors and thus the resulting error is less than the difference in precision. Using the -DFFT_SINGLE setting trades off a little accuracy for reduced memory use and parallel communication costs for transposing 3d FFT data.
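As a rough illustration of the memory side of this trade-off, the sketch below counts the bytes needed for one complex value per grid point. The function name and the 256 x 256 x 256 grid are hypothetical. The actual PPPM code keeps several such arrays, so only the factor of two between the precisions should be read from it.

```python
def fft_grid_bytes(nx, ny, nz, single=False):
    """Bytes needed to store one complex value per point of an nx*ny*nz grid."""
    bytes_per_real = 4 if single else 8        # 4-byte float or 8-byte double
    return nx * ny * nz * 2 * bytes_per_real   # 2 = real + imaginary part


# Hypothetical grid size, for illustration only.
double_bytes = fft_grid_bytes(256, 256, 256)
single_bytes = fft_grid_bytes(256, 256, 256, single=True)
print(double_bytes // 2**20, "MiB double,", single_bytes // 2**20, "MiB single")
# prints: 256 MiB double, 128 MiB single
```

The same factor of two applies to the data that has to be sent between processors when the grid is transposed.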

When using -DFFT_SINGLE with FFTW3 you may need to build the FFTW library a second time with support for single-precision.

For FFTW3, do the following, which should produce the additional library libfftw3f.a or libfftw3f.so.

make clean
./configure --enable-single; make; make install

Performing 3d FFTs requires communication to transpose the 3d FFT grid. The data packing/unpacking for this can be done in one of 3 modes (ARRAY, POINTER, MEMCPY) as set by the FFT_PACK setting above. Depending on the machine, the size of the FFT grid, and the number of processors used, one option may be slightly faster. The default is ARRAY mode.


3.5.3. Size of LAMMPS integer types and size limits

LAMMPS has a few integer data types which can be defined as either 4-byte (= 32-bit) or 8-byte (= 64-bit) integers at compile time. This has an impact on the size of a system that can be simulated or how large counters can become before “rolling over”. The default setting of “smallbig” is almost always adequate.

With CMake the choice of integer types is made via setting a variable during configuration.

-D LAMMPS_SIZES=value   # smallbig (default) or bigbig or smallsmall

If the variable is not set explicitly, “smallbig” is used.

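LAMMPS system size restrictions

=================  ============================================  ==============================================  ===========================================
Limit              smallbig                                      bigbig                                          smallsmall
=================  ============================================  ==============================================  ===========================================
Total atom count   \(2^{63}\) atoms (= \(9.223 \cdot 10^{18}\))  \(2^{63}\) atoms (= \(9.223 \cdot 10^{18}\))    \(2^{31}\) atoms (= \(2.147 \cdot 10^9\))
Total timesteps    \(2^{63}\) steps (= \(9.223 \cdot 10^{18}\))   \(2^{63}\) steps (= \(9.223 \cdot 10^{18}\))     \(2^{31}\) steps (= \(2.147 \cdot 10^9\))
Atom ID values     \(1 \le i \le 2^{31} (= 2.147 \cdot 10^9)\)   \(1 \le i \le 2^{63} (= 9.223 \cdot 10^{18})\)  \(1 \le i \le 2^{31} (= 2.147 \cdot 10^9)\)
Image flag values  \(-512 \le i \le 511\)                        \(- 1\,048\,576 \le i \le 1\,048\,575\)         \(-512 \le i \le 511\)
=================  ============================================  ==============================================  ===========================================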

The “bigbig” setting increases the size of image flags and atom IDs over “smallbig”. The “smallsmall” setting is only needed if your machine does not support 64-bit integers or incurs performance penalties when using them.

These are limits for the core of the LAMMPS code; specific features or styles may impose additional limits. The ATC package cannot be compiled with the “bigbig” setting. Also, there are limitations when using the library interface, where some functions with known issues have been replaced by dummy calls that print a corresponding error message rather than crashing randomly or corrupting data.

Atom IDs are not required for atomic systems which do not store bond topology information, though IDs are enabled by default. The atom_modify id no command will turn them off. Atom IDs are required for molecular systems with bond topology (bonds, angles, dihedrals, etc). Similarly, some force or compute or fix styles require atom IDs. Thus if you model a molecular system or use one of those styles with more than 2 billion atoms, you need the “bigbig” setting.
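The decision rule in the paragraph above can be written down as a small helper. This is a minimal sketch: the function name is hypothetical, and the code uses the exact signed-integer maxima, which the table rounds to \(2^{31}\) and \(2^{63}\). It only considers atom counts and atom IDs, not image flags or other style-specific limits.

```python
MAX_32BIT = 2**31 - 1   # largest signed 32-bit value (table rounds to 2^31)
MAX_64BIT = 2**63 - 1   # largest signed 64-bit value (table rounds to 2^63)


def required_sizes(natoms, needs_atom_ids):
    """Return the LAMMPS_SIZES value needed for natoms atoms.

    needs_atom_ids: True for molecular systems or styles that require atom IDs.
    """
    if natoms > MAX_64BIT:
        raise ValueError("system exceeds the 64-bit atom count limit")
    if needs_atom_ids and natoms > MAX_32BIT:
        return "bigbig"      # atom IDs no longer fit into 32-bit integers
    return "smallbig"        # default: 64-bit atom count, 32-bit atom IDs


print(required_sizes(3_000_000_000, needs_atom_ids=True))    # bigbig
print(required_sizes(3_000_000_000, needs_atom_ids=False))   # smallbig
```

An atomic system with 3 billion atoms and atom IDs turned off stays within “smallbig”; the same atom count with bond topology needs “bigbig”.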

Regardless of the total system size limits, the maximum number of atoms per MPI rank (local + ghost atoms) is limited to 2 billion for atomic systems and 500 million for systems with bonds (the additional restriction is due to using the 2 upper bits of the local atom index in neighbor lists for storing special bonds info).

Image flags store 3 values per atom in a single integer which count the number of times an atom has moved through the periodic box in each dimension. See the dump manual page for a discussion. If an atom moves through the periodic box more than this limit, the value will “roll over”, e.g. from 511 to -512, which can cause diagnostics like the mean-squared displacement, as calculated by the compute msd command, to be faulty.
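The following sketch shows how three signed counters can share one integer and why the value wraps from 511 to -512. The bit layout is an assumption inferred from the ranges in the table (10 bits per dimension for a 32-bit image integer, 21 bits for “bigbig”), and the function names are hypothetical. It is not taken from the LAMMPS source.

```python
def pack_image(ix, iy, iz, bits=10):
    """Pack three image counters into one integer, `bits` bits per dimension."""
    mask = (1 << bits) - 1        # 1023 for 10 bits
    bias = 1 << (bits - 1)        # 512 for 10 bits; shifts the range to 0..1023
    return (((iz + bias) & mask) << (2 * bits)
            | ((iy + bias) & mask) << bits
            | ((ix + bias) & mask))


def unpack_image(image, bits=10):
    """Recover the three signed image counters from a packed integer."""
    mask = (1 << bits) - 1
    bias = 1 << (bits - 1)
    ix = (image & mask) - bias
    iy = ((image >> bits) & mask) - bias
    iz = ((image >> (2 * bits)) & mask) - bias
    return ix, iy, iz


print(unpack_image(pack_image(3, -2, 0)))     # (3, -2, 0)
print(unpack_image(pack_image(512, 0, 0)))    # (-512, 0, 0): rolled over
```

One more crossing after 511 no longer fits into the 10-bit field, so the masked value decodes as -512. An unwrapped coordinate computed from it would then jump by 1024 box lengths.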

Also note that the GPU package requires its lib/gpu library to be compiled with the same size setting, or the link will fail. A CMake build does this automatically. When building with make, the setting in whichever lib/gpu/Makefile is used must be the same as above.


3.5.4. Output of JPG, PNG, and movie files

The dump image command has options to output JPEG or PNG image files. Likewise the dump movie command outputs movie files in MPEG format. Using these options requires the following settings:

-D WITH_JPEG=value      # yes or no
                        # default = yes if CMake finds JPEG files, else no
-D WITH_PNG=value       # yes or no
                        # default = yes if CMake finds PNG and ZLIB files, else no
-D WITH_FFMPEG=value    # yes or no
                        # default = yes if CMake can find ffmpeg, else no

Usually these settings are all that is needed. If CMake cannot find the graphics header, library, or executable files, you can set these variables:

-D JPEG_INCLUDE_DIR=path    # path to jpeglib.h header file
-D JPEG_LIBRARY=path        # path to libjpeg.a (.so) file
-D PNG_INCLUDE_DIR=path     # path to png.h header file
-D PNG_LIBRARY=path         # path to libpng.a (.so) file
-D ZLIB_INCLUDE_DIR=path    # path to zlib.h header file
-D ZLIB_LIBRARY=path        # path to libz.a (.so) file
-D FFMPEG_EXECUTABLE=path   # path to ffmpeg executable

Using ffmpeg to output movie files requires that your machine supports the “popen” function in the standard runtime library.

Note

On some clusters with high-speed networks, using the fork() library call (required by popen()) can interfere with the fast communication library and cause simulations that use ffmpeg to hang or crash.


3.5.5. Read or write compressed files

If this option is enabled, large files can be read or written with gzip compression by several LAMMPS commands, including read_data, rerun, and dump.

-D WITH_GZIP=value       # yes or no
                         # default is yes if CMake can find gzip, else no
-D GZIP_EXECUTABLE=path  # path to gzip executable if CMake cannot find it

This option requires that your operating system fully supports the “popen()” function in the standard runtime library and that a gzip executable can be found by LAMMPS during a run.
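The mechanism described above, writing data into a pipe that an external gzip process compresses, can be imitated in a few lines. This is a sketch of the general technique, not of the LAMMPS implementation. The function name and the file name are hypothetical, and the demo is skipped when no gzip executable is on the search path.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def write_via_gzip_pipe(path, text, gzip_exe="gzip"):
    """Compress text by piping it through an external gzip process.

    Raises FileNotFoundError if the gzip executable cannot be found,
    and CalledProcessError if gzip exits with an error.
    """
    with open(path, "wb") as out:
        # "gzip -c" reads from stdin and writes compressed data to stdout.
        subprocess.run([gzip_exe, "-c"], input=text.encode(), stdout=out, check=True)


if shutil.which("gzip"):                       # gzip must be found at run time
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.data.gz")
    write_via_gzip_pipe(demo_path, "3 atoms\n")
    with gzip.open(demo_path, "rt") as handle:
        print(handle.read(), end="")
```

Starting the helper process is the step that relies on fork(), which is why this approach depends on operating system support for it.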

Note

On some clusters with high-speed networks, using the “fork()” library call (required by “popen()”) can interfere with the fast communication library and cause simulations that use compressed output or input to hang or crash. For selected operations, compressed file I/O is also available using a compression library instead, which is what the COMPRESS package enables.


3.5.6. Memory allocation alignment

This setting enables the use of the “posix_memalign()” call instead of “malloc()” when LAMMPS allocates large chunks of memory. Vector instructions on CPUs may become more efficient if dynamically allocated memory is aligned on larger-than-default byte boundaries. On most current operating systems, the “malloc()” implementation returns pointers that are aligned to 16-byte boundaries. Using SSE vector instructions efficiently, however, requires memory blocks to be aligned on 64-byte boundaries.

-D LAMMPS_MEMALIGN=value            # 0, 8, 16, 32, 64 (default)

Use a LAMMPS_MEMALIGN value of 0 to disable using “posix_memalign()” and revert to using the “malloc()” C-library function instead. When compiling LAMMPS for Windows systems, “malloc()” will always be used and this setting is ignored.
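What "aligned on a 64-byte boundary" means can be shown with plain integer arithmetic on addresses. The helper names and the example address below are hypothetical; the sketch assumes the boundary is a power of two, as all the values listed above are.

```python
def is_aligned(address, boundary):
    """True if address is a multiple of boundary."""
    return address % boundary == 0


def align_up(address, boundary):
    """Round address up to the next multiple of boundary (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)


address = 0x1010                       # hypothetical pointer value
print(is_aligned(address, 16))         # True
print(is_aligned(address, 64))         # False
print(hex(align_up(address, 64)))      # 0x1040
```

An address can satisfy the 16-byte alignment of a typical malloc() and still miss the 64-byte boundary, which is the case the stricter allocation call avoids.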


3.5.7. Workaround for long long integers

If your system or MPI version does not recognize “long long” data types, the following setting will be needed. It converts “long long” to a “long” data type, which should be the desired 8-byte integer on those systems:

-D LAMMPS_LONGLONG_TO_LONG=value     # yes or no (default)

3.5.8. Exception handling when using LAMMPS as a library

This setting is useful when external codes drive LAMMPS as a library. With this option enabled, LAMMPS errors do not kill the calling code. Instead, the call stack is unwound and control returns to the caller, e.g. to Python. Of course, the calling code has to be set up to catch exceptions thrown from within LAMMPS.

-D LAMMPS_EXCEPTIONS=value        # yes or no (default)

Note

When LAMMPS is running in parallel, it is not always possible to cleanly recover from an exception since not all parallel ranks may throw an exception and thus other MPI ranks may get stuck waiting for messages from the ones with errors.
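On the calling side, catching such an error could look like the sketch below. It assumes a LAMMPS handle whose command() method raises a Python exception when a command fails, which is how the LAMMPS Python module behaves for a build with this setting enabled. FakeLammps and run_commands are hypothetical names; the stand-in class only exists so the example runs without a LAMMPS installation.

```python
class FakeLammps:
    """Hypothetical stand-in for a LAMMPS handle; fails on unknown commands."""

    def command(self, cmd):
        if cmd.startswith("bogus"):
            raise RuntimeError(f"Unknown command: {cmd}")


def run_commands(lmp, commands):
    """Send commands one by one; report the first failure instead of aborting."""
    for cmd in commands:
        try:
            lmp.command(cmd)
        except Exception as err:        # raised from within the library call
            return False, f"{cmd!r} failed: {err}"
    return True, "all commands completed"


print(run_commands(FakeLammps(), ["units lj", "bogus_command 1", "run 0"]))
```

With the real module, an object created via `from lammps import lammps; lmp = lammps()` would take the place of FakeLammps. In a parallel run the caveat from the note still applies: catching the exception on one rank does not release other ranks that are waiting for messages.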


3.5.9. Trigger selected floating-point exceptions

Many kinds of CPUs can detect when a calculation results in an invalid math operation, like a division by zero or taking the square root of a negative number. The default behavior on most operating systems is to continue and produce NaN (= not a number) or Inf (= infinity) as the result. This allows software to detect and recover from such conditions. This behavior can be changed, however, often through compiler flags. On Linux systems (or, more generally, on systems using the GNU C library), these so-called floating-point traps can also be selectively enabled through library calls. LAMMPS supports that when the -DLAMMPS_TRAP_FPE pre-processor define is set. Because this is done in the main() function, it applies only to the standalone executable, not the library.

-D CMAKE_TUNE_FLAGS=-DLAMMPS_TRAP_FPE

After compilation with this flag set, the LAMMPS executable will stop and produce a core dump when a division by zero, overflow, illegal math function argument or other invalid floating point operation is encountered.
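For contrast, the non-trapping default looks like the sketch below: the calculation continues silently, and the bad value is only noticed if the program checks for it afterwards. The helper name is hypothetical. Python is used only to illustrate the floating-point semantics; it cannot enable the traps discussed here.

```python
import math


def first_nonfinite(values):
    """Index of the first NaN or Inf in values, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


overflowed = 1e308 * 10.0            # overflow: silently becomes inf
invalid = overflowed - overflowed    # inf - inf: silently becomes nan

print(overflowed, invalid)
print(first_nonfinite([1.0, 2.5, invalid, 4.0]))   # 2
```

With traps enabled, execution stops at the operation that produced the first such value, rather than at whatever later point a check like this happens to run.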