I would like to efficiently log every file access a process makes during its lifetime.
Currently we do this with LD_PRELOAD: we preload a shared library that intercepts the C library calls that deal with file access. The method is efficient, with little performance overhead, but it is not leak-proof.
For instance, our LD_PRELOAD library has a hook for dlopen. The hook tracks accesses to shared libraries, but it fails to log the loaded library's own (tertiary) dependencies, which the dynamic linker loads without going through the hook.
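One way to see what the dlopen hook misses is to read the process's memory map, which lists every file-backed mapping, including libraries the dynamic linker pulled in on its own. The sketch below assumes Linux procfs and the usual /proc/<pid>/maps layout; the helper names and the SAMPLE text are hypothetical, and a map is only a snapshot, so files that were opened and closed again will not show up.

```python
from pathlib import Path


def mapped_files(maps_text):
    """Return the sorted, de-duplicated file paths in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        # Anonymous mappings have no pathname; [stack], [heap] etc. are not files.
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5].strip())
    return sorted(paths)


def mapped_files_of(pid):
    """Read the live map of a running process (needs permission to read it)."""
    return mapped_files(Path(f"/proc/{pid}/maps").read_text())


# Hypothetical sample: libfoo.so stands in for a tertiary dependency.
SAMPLE = """\
7f1c2a000000-7f1c2a1b0000 r-xp 00000000 08:01 131 /usr/lib/libc.so.6
7f1c2a1b0000-7f1c2a1b4000 rw-p 001b0000 08:01 131 /usr/lib/libc.so.6
7f1c2a400000-7f1c2a420000 r-xp 00000000 08:01 140 /usr/lib/libfoo.so
7f1c2a600000-7f1c2a621000 rw-p 00000000 00:00 0
7ffd1c000000-7ffd1c021000 rw-p 00000000 00:00 0 [stack]
"""

print(mapped_files(SAMPLE))
```

Calling mapped_files_of(pid) from the preloaded library's exit path, or from a watcher process, would be one possible way to cross-check the hook's log.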
We did try strace, but its performance overhead was a non-starter for us. Are there other mechanisms we could explore to efficiently intercept the file accesses made by a process and its sub-processes? I am open to options at the kernel level, hooks into the VFS layer, or anything else.
Thoughts?
1 solution
#1
We did try using strace but the performance overhead of using strace was a non-starter for us.
strace is slow because it is built on the old and slow ptrace syscall, acting as something like a debugger for the application. Every syscall the application makes is turned into a signal to strace, followed by around two ptrace syscalls from strace (plus some printing and reads of the other process's memory for string/struct values) and then a resume of the target application (2 context switches). strace supports syscall filters, but a filter can't be registered with ptrace, so strace does the filtering in user space and still traces all syscalls.
There are faster kernel-based solutions. Brendan Gregg (author of the DTrace book covering Solaris, OS X, and FreeBSD) has many overviews of tracing tools on his blog (tracing in 15 minutes, BPF superpowers, 60s of Linux perf, Choosing a Tracer 2015 (with Magic pony), page cache stats), for example:
You are interested in the left part of this diagram, near the VFS block: perf (the standard tool), dtrace (supported only on some Linux distributions and has license problems, since the CDDL is incompatible with the GPL), and stap (SystemTap, which works better on Red Hat family distributions such as CentOS).
There is a direct replacement for strace: the sysdig tool (requires an additional kernel module; available on GitHub), which does for system calls what tcpdump does for network interface sniffing. It sniffs syscalls inside the kernel without additional context switches, signals, or poking into another process's memory with ptrace (the kernel already has all strings copied from user space), and it uses smart buffering to dump traces to the userspace tool in huge packets.
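The buffering idea can be illustrated in a few lines. This is only a toy model of batching events before handing them to a consumer, not sysdig's actual implementation; the class name, capacity, and event tuples are all made up for the illustration.

```python
class BatchingBuffer:
    """Collect events and hand them to the sink in batches, not one by one."""

    def __init__(self, capacity, sink):
        self.capacity = capacity  # events per batch (arbitrary for this demo)
        self.sink = sink          # callable that receives a list of events
        self.pending = []
        self.flushes = 0          # number of hand-offs to the consumer

    def record(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        if self.pending:
            self.sink(self.pending)
            self.pending = []
            self.flushes += 1


received = []
buf = BatchingBuffer(capacity=4, sink=received.extend)
for i in range(10):
    buf.record(("open", f"/tmp/file{i}"))
buf.flush()  # drain whatever is left at the end

print(len(received), "events delivered in", buf.flushes, "hand-offs")
```

With ptrace every event costs its own round trip to the tracer; with batching the number of hand-offs shrinks by roughly the batch size.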
There are other universal tracing frameworks and tools, such as lttng (out of tree) and ftrace / trace-cmd. bcc, built on the eBPF support included in modern (4.9+) Linux kernels, is a very powerful framework (see http://www.brendangregg.com/Slides/SCALE2017_perf_analysis_eBPF.pdf). bcc and eBPF allow you to write small (and safe) code fragments that do some data aggregation in-kernel, near the tracepoint.
Try Brendan's tools near VFS if your Linux kernel is recent enough: opensnoop, statsnoop, syncsnoop; probably some of the file* tools too (the tools support PID filtering with -p PID, or may work system-wide). They are partially described at http://www.brendangregg.com/dtrace.html and published on his GitHub: https://github.com/brendangregg/perf-tools (also https://github.com/iovisor/bcc#tools).
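If you want to drive one of these tools from your own logger, a small wrapper is enough. The sketch below only relies on the -p PID option mentioned above; the function names are hypothetical, and it assumes the tool is installed under the name opensnoop (some distributions package the bcc version under a different name, for example with a -bpfcc suffix) and that you run it with sufficient privileges.

```python
import shutil
import subprocess


def opensnoop_command(pid=None, tool="opensnoop"):
    """Build the argument list: system-wide by default, one process with -p."""
    cmd = [tool]
    if pid is not None:
        cmd += ["-p", str(pid)]
    return cmd


def start_opensnoop(pid=None, tool="opensnoop"):
    """Start the tracer and return the Popen handle; read lines from .stdout."""
    cmd = opensnoop_command(pid, tool)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)


print(opensnoop_command(1234))
```

A caller would iterate over the returned process's stdout and append each line to the access log.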
As of Linux 4.9, the Linux kernel finally has similar raw capabilities as DTrace. ...
opensnoop is a program to snoop file opens. The filename and file handle are traced along with some process details.
# opensnoop -g
  UID    PID PATH                  FD ARGS
  100   3528 /var/ld/ld.config     -1 cat /etc/passwd
  100   3528 /usr/lib/libc.so.1     3 cat /etc/passwd
  100   3528 /etc/passwd            3 cat /etc/passwd
  100   3529 /var/ld/ld.config     -1 cal
  100   3529 /usr/lib/libc.so.1     3 cal
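Output like this is easy to turn into a per-process access log. The following is a minimal sketch that assumes the five-column layout shown above (UID PID PATH FD ARGS) and paths without spaces; the function names are hypothetical, and other opensnoop versions print different columns.

```python
from collections import defaultdict

# The sample output from above, used as test input.
OUTPUT = """\
UID PID PATH FD ARGS
100 3528 /var/ld/ld.config -1 cat /etc/passwd
100 3528 /usr/lib/libc.so.1 3 cat /etc/passwd
100 3528 /etc/passwd 3 cat /etc/passwd
100 3529 /var/ld/ld.config -1 cal
100 3529 /usr/lib/libc.so.1 3 cal
"""


def parse_opensnoop(text):
    """Parse 'UID PID PATH FD ARGS' lines into dicts, skipping the header."""
    records = []
    for line in text.splitlines():
        fields = line.split(None, 4)  # ARGS may itself contain spaces
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # header line or something unexpected
        uid, pid, path, fd, args = fields
        records.append({"uid": int(uid), "pid": int(pid), "path": path,
                        "fd": int(fd), "args": args})
    return records


def opened_paths_by_pid(records):
    """Map each PID to the paths it opened successfully (FD >= 0), in order."""
    result = defaultdict(list)
    for r in records:
        if r["fd"] >= 0 and r["path"] not in result[r["pid"]]:
            result[r["pid"]].append(r["path"])
    return dict(result)


records = parse_opensnoop(OUTPUT)
print(opened_paths_by_pid(records))
```

Failed opens (FD of -1) are dropped here; keep them if attempted accesses matter for your use case.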
rwsnoop snoops read/write events. It measures reads and writes at the application level, that is, at the syscall level.
# rwsnoop
  UID    PID CMD          D   BYTES FILE
    0   2924 sh           R     128 /etc/profile
    0   2924 sh           R     128 /etc/profile
    0   2924 sh           R     128 /etc/profile
    0   2924 sh           R      84 /etc/profile
    0   2925 quota        R     757 /etc/nsswitch.conf
    0   2925 quota        R       0 /etc/nsswitch.conf
    0   2925 quota        R     668 /etc/passwd
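Because rwsnoop reports one line per read or write, you will usually want to aggregate it. The sketch below sums bytes per file and direction, assuming the six-column layout shown above (UID PID CMD D BYTES FILE); the function name is hypothetical.

```python
from collections import Counter

# The sample output from above, used as test input.
OUTPUT = """\
UID PID CMD D BYTES FILE
0 2924 sh R 128 /etc/profile
0 2924 sh R 128 /etc/profile
0 2924 sh R 128 /etc/profile
0 2924 sh R 84 /etc/profile
0 2925 quota R 757 /etc/nsswitch.conf
0 2925 quota R 0 /etc/nsswitch.conf
0 2925 quota R 668 /etc/passwd
"""


def bytes_per_file(text):
    """Sum the BYTES column per (FILE, direction) pair, skipping the header."""
    totals = Counter()
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[4].isdigit():
            continue  # header line or something unexpected
        _uid, _pid, _cmd, direction, nbytes, path = fields
        totals[(path.strip(), direction)] += int(nbytes)
    return totals


totals = bytes_per_file(OUTPUT)
for (path, direction), nbytes in sorted(totals.items()):
    print(direction, nbytes, path)
```

The set of keys alone already gives the list of files a process touched, which is the log the question asks for.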