Path length limit causing missing output

Posted: Fri Jan 31, 2025 4:21 pm
by oscarlb

On the cluster where we run VASP, it has long been known that runs misbehave when the path of the run directory exceeds 256 characters. After recent experience on other clusters, I have narrowed down the behavior and its cause, and would like to request a fix in upcoming versions of VASP.

Specifically, if the absolute path of the OUTCAR file is 259 characters long, the calculation starts, but once it enters the electronic convergence, the paths of reopened output files are capped at 256 characters, so that, e.g., the remaining OUTCAR output is written to a file named "OUT" in the same folder. It is worse when the path is cut higher up in the directory tree, where files are not supposed to be written or writing is not permitted.
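
To make the arithmetic concrete, here is a minimal sketch in Python (not VASP's Fortran) of what happens when a 259-character path is stored in a fixed-width character variable. The helper `fortran_assign`, the directory name, and the cap of 256 are illustrative assumptions taken from the numbers above, not code from VASP.

```python
import os

def fortran_assign(value: str, width: int) -> str:
    """Mimic assignment to a fixed-length Fortran CHARACTER(LEN=width) variable:
    longer values are silently truncated, shorter ones are blank-padded."""
    return value[:width].ljust(width)

CAP = 256                      # cap observed above (assumption for this sketch)
run_dir = "/" + "d" * 251      # hypothetical 252-character run directory
outcar = run_dir + "/OUTCAR"   # 259 characters in total

# Absolute path: the last three characters ("CAR") do not fit.
stored = fortran_assign(outcar, CAP).rstrip()
reopened_name = os.path.basename(stored)
print(len(outcar), reopened_name)   # 259 OUT

# Relative name: fits easily, so nothing is lost.
print(fortran_assign("OUTCAR", CAP).rstrip())   # OUTCAR
```

The same truncation applied to a path that is longer still would cut inside a directory name, which is the "higher up in the path tree" case.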

The output files are closed before the electronic convergence starts and reopened using the filename obtained from INQUIRE (I am sure this was needed for flushing or other reasons at the time, but is it still necessary?). See the subroutines WFORCE, REOPEN, etc. in lib/diolib.F. This filename is stored in a buffer of 255 characters, which is usually fine, since the proper/GNU behavior of INQUIRE is to return the relative path of the file. The Intel compiler, however, returns the absolute path, so the path is truncated when it is too long. I realize this is an unusual situation, but in our production runs we encode a lot of run information in the folder name for various reasons, and therefore encounter this error often.

I see some possible solutions I would like to discuss:

  1. Add a compiler option for the size of this buffer, so one may ask the cluster staff to update the current and upcoming binaries built with the Intel compilers.

  2. Make the buffer dynamic, enlarging it until the full path is captured and the buffer is properly terminated.

  3. Add a flag/compiler option for skipping the reopening of files if users do not see the need.

  4. Ask Intel about possible options on their end for uniformity across compilers.

On the user end we could always compile with a different compiler or manually modify the buffer size, but neither of these solutions is very sustainable when moving between clusters and VASP versions.
I would be very glad if you could come up with an official solution to this issue; otherwise, I hope this post can serve as documentation for those running older versions with long paths. Thank you in advance for any guidance and discussion.


Re: Path length limit causing missing output

Posted: Tue Feb 04, 2025 12:33 pm
by alexey.tal

Dear oscarlb,

Thank you for a very detailed report and for bringing this issue to our attention. We weren't aware of this problem.

Indeed, you identified the issue correctly and I was able to reproduce this behavior with the Intel compiler.

The reason we still use WFORCE is that, with some compilers, FLUSH does not guarantee that the data is actually written out; it might be written only at the very end, when the calculation has finished. NVHPC is one such compiler. So far WFORCE has been the most reliable way to flush.

The most straightforward solution is to increase the buffer size. I believe that 1024 characters would suffice, or do you use even longer file paths?
We will release the fix in the upcoming version.

Best wishes,
Alexey


Re: Path length limit causing missing output

Posted: Fri Feb 07, 2025 10:18 am
by oscarlb

Dear Alexey,

Thank you for your prompt response and for explaining the reason for WFORCE.

Increasing the buffer to 1024 characters would probably suffice for most of what we do, for now. But I would urge a more sustainable solution; otherwise the next generation of PhD students might contact you again once they have fearlessly made even more deeply nested runs with the new versions. Since WFORCE is not called that often (to my understanding) and is not an expensive routine executed by many processes, I would think that a dynamically allocated buffer, sized from an INCAR parameter or a compiler flag, is the right way to minimize future trouble.

I personally made a dynamic solution to the problem a while back, which reallocates until the whole path fits into the buffer. This has worked quite well, without any noticeable effect on my calculations, but I realize the condition for "fitting into the buffer" could break if the path has a whitespace at exactly the 255th or 511th position, etc. I include it, but do with it what you wish.
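
For readers without the attachment, the idea and its edge case can be sketched roughly as follows. This is a Python illustration, not the attached Fortran patch: `inquire_name` is a hypothetical stand-in for reading the name into a fixed-width buffer, and the growth rule 255, 511, 1023 is an assumption chosen to match the positions mentioned above.

```python
def inquire_name(full_path: str, width: int) -> str:
    """Hypothetical stand-in for INQUIRE filling a CHARACTER(LEN=width) buffer:
    truncates silently or pads with blanks."""
    return full_path[:width].ljust(width)

def grow_until_fits(full_path: str, width: int = 255):
    """Enlarge the buffer until its last character is blank, which is taken
    as the sign that the whole name fit with room to spare."""
    while True:
        buf = inquire_name(full_path, width)
        if buf[-1] == " ":
            return buf.rstrip(), width
        width = 2 * width + 1   # 255 -> 511 -> 1023 -> ...

# Usual case: a 300-character path is recovered after one enlargement.
long_path = "/" + "a" * 299
name, used = grow_until_fits(long_path)
print(name == long_path, used)   # True 511

# Edge case: a blank at exactly the 255th position looks like padding,
# so the loop stops early and returns a truncated name.
tricky = "/" + "a" * 253 + " " + "b" * 45
name, used = grow_until_fits(tricky)
print(len(tricky), len(name), used)   # 300 254 255
```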

Best regards,
Oscar


Re: Path length limit causing missing output

Posted: Tue Feb 11, 2025 9:24 am
by alexey.tal

Hi Oscar,

Thanks for the patch.
The Linux kernel limits the maximum path length to 4096 characters (PATH_MAX, see include/linux/limits.h), so you can't have a path of arbitrary length. Also, simply using a larger fixed-size buffer is less error-prone.

Best regards,
Alexey


Re: Path length limit causing missing output

Posted: Tue Feb 11, 2025 10:08 am
by oscarlb

Dear Alexey,

I see. Then I suppose the best solution is to keep a static buffer length and perhaps warn when the buffer looks full. If you could increase the default in future versions to 1024 (or slightly more for some margin), we would be grateful.

A compiler flag (with a default) would make it easier to communicate the change to the cluster staff if needed, but do as you see fit.
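
As a rough illustration of the "warn when the buffer looks full" idea, here is a short Python sketch (not VASP code). The name `store_name` and the constant `NAME_BUFFER_LEN` are hypothetical, and 1024 is simply the value discussed in this thread.

```python
import warnings

NAME_BUFFER_LEN = 1024  # hypothetical fixed width, value discussed in this thread

def store_name(full_path: str, width: int = NAME_BUFFER_LEN) -> str:
    """Store a file name in a fixed-width buffer and warn if the buffer is
    completely filled, since the name may then have been truncated."""
    buf = full_path[:width].ljust(width)
    if buf[-1] != " ":
        warnings.warn(
            f"file name fills all {width} characters of the buffer; "
            "it may have been truncated"
        )
    return buf.rstrip()
```

A name of exactly 1024 characters would also trigger the warning even though nothing was lost, which is why the message only says "may".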

Best regards,
Oscar