Empty core dump file after segmentation fault

I am running a program, and it is interrupted by a segmentation fault. The problem is that a core dump file is created, but its size is zero.

Have you heard of such a case, and do you know how to resolve it?

I have enough space on the disk. I have already run ulimit -c unlimited to remove the limit on the core file size, both by running it in the shell and by putting it at the top of the submitted batch file, but I still get 0-byte core dump files. The permissions of the folder containing these files are uog+rw, and the permissions on the core files created are u+rw only.

The program is written in C++ and submitted on a Linux cluster with the qsub command of the Grid Engine. I don't know whether this information is relevant to the question.

4 Answers

#1

Setting ulimit -c unlimited turned on the generation of dumps. By default, core dumps were generated in the current directory, which was on NFS. Setting /proc/sys/kernel/core_pattern to /tmp/core helped me solve the problem of empty dumps.
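A minimal sketch of how this workaround could be checked before applying it, assuming a Linux node with GNU coreutils. The helper name show_core_setup is made up for illustration, and changing the pattern needs root, so that step is left as a comment.

```shell
#!/bin/bash
# Hypothetical helper: print where the kernel writes core files, the current
# core size limit, and the filesystem type of a directory (default: current).
show_core_setup() {
    dir="${1:-.}"
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
    echo "core limit:   $(ulimit -c)"
    # GNU stat: -f queries the filesystem, %T prints its type (e.g. nfs, tmpfs)
    echo "fs type:      $(stat -f -c %T "$dir")"
}

show_core_setup .

# As root, send core files to local /tmp instead of the working directory:
#   echo /tmp/core > /proc/sys/kernel/core_pattern
```

Note that core_pattern is a kernel setting on the machine where the program crashes, so on a cluster it would have to be changed on the execution node, not on the submit host.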

The comment from Ranjith Ruban helped me develop this workaround:

What filesystem are you using for dumping the core?

#2

It sounds like you're using a batch scheduler to launch your executable. Maybe the shell that Torque/PBS uses to spawn your job inherits a different ulimit value? Maybe the scheduler's default config is not to preserve core dumps?

Can you run your program directly from the command line instead?

Or, if you add ulimit -c unlimited and/or ulimit -s unlimited to the top of your PBS batch script before invoking your executable, you might be able to override PBS's default ulimit behavior. Adding a bare 'ulimit -c' would at least report what the limit is.
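As a rough sketch of that idea, a job script could report the limits it inherits and then raise the soft core limit as far as the hard limit allows. The #$ lines are Grid Engine directives (the asker's scheduler; a PBS script would use #PBS lines instead), and ./my_program is a placeholder name, not something from the question.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd

# Report what the job actually inherited from the scheduler.
echo "soft core limit before: $(ulimit -S -c)"
echo "hard core limit:        $(ulimit -H -c)"

# Raise the soft limit up to the hard limit. Asking for "unlimited" directly
# would fail if the scheduler imposed a lower hard limit.
ulimit -S -c "$(ulimit -H -c)"
echo "soft core limit after:  $(ulimit -S -c)"

# Run the executable from the same shell so it inherits the new limit.
# ./my_program    # placeholder executable name
```

If the hard limit printed here is 0, no ulimit call inside the script can help, and the limit would have to be changed in the scheduler or queue configuration.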

#3

You can set resource limits, such as the memory required, by using a qsub option such as -l h_vmem=6G to request 6 GB of memory.

For file blocks, you can set h_fsize to an appropriate value as well.

See the RESOURCE LIMITS section of the queue_conf manpage:

http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html

s_cpu     The per-process CPU time limit in seconds.

s_core    The per-process maximum core file size in bytes.

s_data    The per-process maximum memory limit in bytes.

s_vmem    The same as s_data (if both are set the minimum is
           used).

h_cpu     The per-job CPU time limit in seconds.

h_data    The per-job maximum memory limit in bytes.

h_vmem    The same as h_data (if both are set the minimum is
           used).

h_fsize   The total number of disk blocks that this job  can
           create.

Also, if the cluster uses a TMPDIR that is local to each node, and that is filling up, you can set TMPDIR to an alternate location with more capacity, e.g. an NFS share:

export TMPDIR=<some NFS mounted directory>

Then launch qsub with the -V option to export the current environment to the job.
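Put together, a submission along these lines might look like the sketch below. The SCRATCH_DIR and JOB_SCRIPT variables, their defaults, and the submit_cmd helper are all placeholders for illustration; the script only prints the qsub command so it can be reviewed before being run by hand.

```shell
#!/bin/bash
# Stand-in defaults: point SCRATCH_DIR at your NFS-mounted directory
# and JOB_SCRIPT at your own batch script.
scratch_dir="${SCRATCH_DIR:-/tmp}"
job_script="${JOB_SCRIPT:-job.sh}"

# Make the alternate temporary directory part of the current environment.
export TMPDIR="$scratch_dir"

# Hypothetical helper: build the qsub command line for a given job script.
# -V exports the current environment (including TMPDIR) to the job;
# -l h_vmem=6G is the memory request mentioned earlier in this answer.
submit_cmd() {
    echo "qsub -V -l h_vmem=6G $1"
}

echo "TMPDIR=$TMPDIR"
echo "Would run: $(submit_cmd "$job_script")"
```

Further limits such as h_fsize can be requested with additional -l options in the same way, if the cluster's configuration allows them to be requested.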

One or a combination of the above may help you solve your problem.

#4

This happens if you run the program on a mounted drive. The core file can't be written to a mounted drive; it must be written to the local drive.

You can copy the file to the local drive.
