Shared memory: Difference between revisions

From VASP Wiki
(18 intermediate revisions by 4 users not shown)
Latest revision as of 09:02, 21 February 2024

VASP is mainly parallelized using MPI, and as far as practically feasible the computational work and storage demands are distributed over the MPI ranks. Unavoidably, however, some data structures are duplicated across all MPI ranks. For some of these data structures, VASP offers the option to reduce memory consumption by placing them in shared-memory segments, that is, segments of memory shared between the MPI ranks that reside on the same compute node and hence have access to the same physical memory.
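The potential saving can be illustrated with simple arithmetic. The following is a minimal sketch, not VASP code: the function name and the numbers (64 ranks per node, 0.5 GiB of duplicated data) are hypothetical and chosen only to show how the per-node footprint of a duplicated data structure changes when a single copy is shared by all ranks on the node.

```python
def per_node_memory_gib(ranks_per_node: int, duplicated_gib: float, shared: bool) -> float:
    """Per-node memory taken by a data structure that every MPI rank needs.

    Without shared memory each rank on the node holds its own copy;
    with shared memory the ranks on one node share a single copy.
    """
    copies = 1 if shared else ranks_per_node
    return copies * duplicated_gib


# Hypothetical example: 64 MPI ranks on one node, 0.5 GiB of duplicated data.
replicated = per_node_memory_gib(64, 0.5, shared=False)
in_shmem = per_node_memory_gib(64, 0.5, shared=True)
print(replicated, in_shmem)
```

The saving grows linearly with the number of ranks per node, which is why shared memory matters most on nodes with many cores.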

Whether to use shared memory or not has to be decided when compiling the code. It is controlled by the precompiler options: -Duse_shmem, -Dshmem_bcast_buffer, -Dshmem_rproj, and -Dsysv:

-Duse_shmem

Use shared-memory segments to reduce the memory demands of GW (ALGO = EVGW0, EVGW, QPGW0, and QPGW) and machine-learned force-field calculations.

-Dshmem_bcast_buffer

Use shared-memory segments to reduce the amount of MPI communication in hybrid-functional calculations.

-Dshmem_rproj

Use shared-memory segments to reduce the storage demands of the real-space PAW projectors.

-Dsysv (recommended if possible, see notes below)

Use ipcs shared-memory segments and system-V semaphores instead of the default MPI-3 shared-memory capabilities (see below).

In any case, the aforementioned precompiler options have to be accompanied by an additional change to your makefile.include: the variable OBJECTS_LIB must also contain the object file getshmem.o, e.g., it may look like this:

OBJECTS_LIB = linpack_double.o getshmem.o
Mind: If you forget to add getshmem.o you may receive errors in the linking stage at the end of the VASP build process, e.g., undefined reference to `getshmem_C'.
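This consistency requirement can be checked before starting a build. The following is a minimal sketch of such a check, not part of VASP: the function name is hypothetical, the use of CPP_OPTIONS in the sample is only an assumption about where the flags are set, and backslash line continuations in the makefile are not handled.

```python
import re

# Precompiler options that enable shared memory, as listed on this page.
SHMEM_FLAGS = ("-Duse_shmem", "-Dshmem_bcast_buffer", "-Dshmem_rproj", "-Dsysv")


def missing_getshmem(makefile_text: str) -> bool:
    """Return True if a shared-memory flag is set but OBJECTS_LIB lacks getshmem.o."""
    # Drop everything after '#' so that commented-out settings are ignored.
    lines = [line.split("#", 1)[0] for line in makefile_text.splitlines()]
    body = "\n".join(lines)
    uses_shmem = any(
        re.search(re.escape(flag) + r"(?!\w)", body) for flag in SHMEM_FLAGS
    )
    # Collect the objects from every OBJECTS_LIB assignment (=, := or +=).
    objects = []
    for line in lines:
        match = re.match(r"\s*OBJECTS_LIB\s*[:+]?=\s*(.*)", line)
        if match:
            objects.extend(match.group(1).split())
    return uses_shmem and "getshmem.o" not in objects


# Hypothetical makefile.include fragments for illustration.
sample = """\
CPP_OPTIONS += -Duse_shmem
OBJECTS_LIB = linpack_double.o
"""
fixed = sample.replace("linpack_double.o", "linpack_double.o getshmem.o")
print(missing_getshmem(sample), missing_getshmem(fixed))
```

Running such a check on the text of makefile.include catches the omission before the linking stage, where it would otherwise surface as an undefined-reference error.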

The allocation and handling of shared-memory segments have been implemented in two different ways:

  • Using the MPI-3 shared-memory capabilities (default, any of -Duse_shmem, -Dshmem_bcast_buffer, -Dshmem_rproj but not -Dsysv).
Warning: By default, VASP uses MPI-3 calls to allocate and manage shared-memory segments. Unfortunately, we have observed that for some MPI implementations an abnormal termination of the code (e.g., a segfault or a user-initiated abort) does not free these shared-memory segments. This is not a VASP-related error. It is caused by the way these shared-memory segments are handled by the operating system and MPI. Without explicit clean-up this leads to a "memory leak" that persists until the compute node is rebooted. This is very problematic at high-performance-computing centers. For this reason we do not recommend using shared memory indiscriminately (i.e., without explicit need).
  • Using ipcs shared-memory segments and system-V semaphores (add precompiler option -Dsysv).
Tip: Using ipcs shared-memory segments and system-V semaphores (add precompiler option -Dsysv) rarely leads to memory leakage. However, when it does, the leak is guaranteed to persist until the node is rebooted, and no other form of clean-up will be effective.
A common problem with the use of ipcs shared-memory segments and system-V semaphores is that the maximum allowed number of semaphores and shared-memory segments, and the maximum allowed size of the latter, are system-wide kernel settings. The default settings of many Linux distributions are so strict, i.e., the allowed number and size of the shared-memory segments are so small, that they are completely unusable for our purposes.
One can check the current limits with:
#ipcs -l
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 16777216
max total shared memory (kbytes) = 16777216
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 262
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
i.e., on this particular machine the maximum number of semaphores is 32000, the maximum number of shared-memory segments is 4096, and their maximum size is 16 GiB (16777216 kbytes).
Warning: How to change the maximum number of semaphores and shared-memory segments, and the maximum size of the latter, depends on the particular Linux distribution and generally requires superuser rights. For this reason the use of this implementation of shared memory (-Dsysv) is not practical in many situations.
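The limits reported by ipcs -l can also be read programmatically, for example to check a node before submitting a job. The following is a minimal sketch under the assumption that the output has the "label = number" layout shown above; the function name is hypothetical, and the exact labels may differ between versions of the ipcs utility.

```python
import re


def parse_ipcs_limits(text: str) -> dict:
    """Map each label to its integer value for every 'label = number' line."""
    limits = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(\d+)\s*$", line)
        if match:
            limits[match.group(1)] = int(match.group(2))
    return limits


# Excerpt of the output shown above, used here as sample input.
sample = """\
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 16777216
max total shared memory (kbytes) = 16777216
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 262
max semaphores per array = 250
max semaphores system wide = 32000
"""
limits = parse_ipcs_limits(sample)
# Convert the maximum segment size from kbytes to GiB.
seg_gib = limits["max seg size (kbytes)"] / 1024**2
print(limits["max number of segments"], seg_gib, limits["max semaphores system wide"])
```

On a live system the sample string could be replaced by the output of the command itself, for instance obtained with subprocess.run(["ipcs", "-l"], capture_output=True, text=True).stdout.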

Related articles

Installing VASP.6.X.X, makefile.include, Precompiler options, Machine-learned force fields