Performance analysis tools for Linux.
Performance counters for Linux are a new kernel-based subsystem that provides a framework for performance analysis. It covers hardware-level features (the CPU's PMU, Performance Monitoring Unit) as well as software features (software counters, tracepoints).
-p: stat events on existing process id (comma separated list). Only the target process and the threads it creates are analyzed.
-a: system-wide collection from all CPUs.
-r: repeat command and print average + stddev (max: 100).
-C: count only on the list of CPUs provided (comma separated list).
-v: be more verbose (show counter open errors, etc).
-n: null run - don't start any counters; only the task's execution time is shown.
-x SEP: specify the separator used between output columns.
-o file: specify the output file; --append selects append mode.
--pre <cmd>: a program to run before the target program.
--post <cmd>: a program to run after the target program.
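As a rough illustration of how these options combine, here is a minimal Python sketch that assembles a perf stat command line. The function name build_perf_stat and its parameters are hypothetical; only the flags themselves (-r, -C, -p, -x, -o, --append) come from the list above, and the 100-repetition limit is taken from the -r description.

```python
def build_perf_stat(command, repeat=None, cpus=None, pids=None,
                    sep=None, output=None, append=False):
    """Assemble an argv list for `perf stat` (hypothetical helper)."""
    argv = ["perf", "stat"]
    if repeat is not None:
        # The option list above gives 100 as the maximum for -r.
        if not 1 <= repeat <= 100:
            raise ValueError("repeat must be between 1 and 100")
        argv += ["-r", str(repeat)]
    if cpus:
        # -C takes a comma separated list of CPUs.
        argv += ["-C", ",".join(str(c) for c in cpus)]
    if pids:
        # -p takes a comma separated list of process ids.
        argv += ["-p", ",".join(str(p) for p in pids)]
    if sep is not None:
        argv += ["-x", sep]
    if output is not None:
        argv += ["-o", output]
        if append:
            argv.append("--append")
    # perf options come first; the command to measure goes last.
    argv += list(command)
    return argv


print(build_perf_stat(["ls"], repeat=5, sep=","))
```

The resulting list can be handed to subprocess.run() on a machine where perf is installed.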
perf record
Run a command and record its profile into perf.data.
This command runs a command and gathers a performance counter profile from it into perf.data, without displaying anything. The file can then be inspected later with perf report.
(1) Common options
-e: Select the PMU event.
-a: System-wide collection from all CPUs.
-p: Record events on existing process ID (comma separated list).
-A: Append to the output file to do incremental profiling.
-f: Overwrite existing data file.
-o: Output file name.
-g: Do call-graph (stack chain/backtrace) recording.
-C: Collect samples only on the list of CPUs provided.
(2) Usage examples
Record performance data for the nginx processes:
# perf record -p `pgrep -d ',' nginx`
Record performance data while running ls:
# perf record -g ls
Record the system calls made while running ls, which shows which system calls are most frequent:
# perf record -e syscalls:sys_enter ls
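The nginx example relies on `pgrep -d ','` to produce the comma separated pid list that -p expects. A minimal Python sketch of the same idea follows; the helper names pids_of and perf_record_argv are hypothetical, and the sketch assumes pgrep is available on the system.

```python
import subprocess


def pids_of(name):
    """Return the PIDs that `pgrep name` reports (empty list if none)."""
    result = subprocess.run(["pgrep", name], capture_output=True, text=True)
    return [int(tok) for tok in result.stdout.split()]


def perf_record_argv(pids, call_graph=False):
    """Build a `perf record` argv that attaches to the given PIDs."""
    if not pids:
        raise ValueError("no process ids given")
    argv = ["perf", "record"]
    if call_graph:
        argv.append("-g")  # also record call graphs
    # -p expects one comma separated list, like `pgrep -d ','` prints.
    argv += ["-p", ",".join(str(p) for p in pids)]
    return argv


print(perf_record_argv([101, 102], call_graph=True))
```

On a real system, perf_record_argv(pids_of("nginx")) would give the equivalent of the first example above.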
perf report
Reads the data file created by perf record and presents the hotspot analysis results.
Read perf.data (created by perf record) and display the profile.
This command displays the performance counter profile information recorded via perf record.
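Name: the name of the kernel lock.
acquired: the number of times the lock was acquired directly, without waiting, because no other kernel path held it.
contended: the number of times the lock was acquired only after waiting, because another kernel path held it.
total wait: the total time spent waiting to acquire the lock.
max wait: the longest time spent waiting to acquire the lock.
min wait: the shortest time spent waiting to acquire the lock.
A summary follows at the end: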
=== output for debug===
bad: 10, total: 246
bad rate: 4.065041 %
histogram of events caused bad sequence
acquire: 0
acquired: 0
contended: 0
release: 10
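The numbers in this summary are consistent with the bad rate being bad / total expressed as a percentage. That reading is inferred from the sample output rather than stated by the source, so treat the following Python sketch (with the hypothetical function name bad_rate) as an assumption.

```python
def bad_rate(bad, total):
    """Percentage of sequences flagged as bad (assumed: bad / total * 100)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * bad / total


# Values taken from the sample summary: bad: 10, total: 246
print(f"bad rate: {bad_rate(10, 246):.6f} %")  # bad rate: 4.065041 %
```

In the same sample, the histogram counts (0 + 0 + 0 + 10) add up to the bad count of 10.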
perf kmem
Performance analysis of the slab allocator.
Tool to trace/measure kernel memory (slab) properties.
perf kmem {record | stat} [<options>]
(1) Common options
-i <file>: input file.
--caller: show per-callsite statistics, i.e. the places in the kernel that call kmalloc and kfree.
--alloc: show per-allocation statistics, i.e. the addresses of the allocated memory.
-l <num>: print n lines only, i.e. show only num lines.
-s <key[,key2...]>: sort the output (default: frag,hit,bytes)
(2) Usage examples
# perf kmem record ls // record
# perf kmem stat --caller --alloc -l 20 // report
SUMMARY
=======
Total bytes requested: 290544
Total bytes allocated: 447016
Total bytes wasted on internal fragmentation: 156472
Internal fragmentation: 35.003669%
Cross CPU allocations: 2/509
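The fragmentation figures in this summary can be reproduced from the two byte totals: the wasted bytes are allocated minus requested, and the percentage is wasted over allocated. The Python sketch below checks that arithmetic against the sample output; the function name internal_fragmentation is hypothetical.

```python
def internal_fragmentation(requested, allocated):
    """Return (wasted_bytes, wasted_percent_of_allocated)."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    wasted = allocated - requested
    percent = 100.0 * wasted / allocated
    return wasted, percent


# Values taken from the sample SUMMARY above.
wasted, percent = internal_fragmentation(requested=290544, allocated=447016)
print(f"Total bytes wasted on internal fragmentation: {wasted}")  # 156472
print(f"Internal fragmentation: {percent:.6f}%")                  # 35.003669%
```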
# perf sched record sleep 10 // perf sched record <command>
# perf sched latency --sort max
(2) Output format
---------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |
---------------------------------------------------------------------------------------------------------------
events/10:61 | 0.655 ms | 10 | avg: 0.045 ms | max: 0.161 ms | max at: 9804.958730 s
sleep:11156 | 2.263 ms | 4 | avg: 0.052 ms | max: 0.118 ms | max at: 9804.865552 s
edac-poller:1125 | 0.598 ms | 10 | avg: 0.042 ms | max: 0.113 ms | max at: 9804.958698 s
events/2:53 | 0.676 ms | 10 | avg: 0.037 ms | max: 0.102 ms | max at: 9814.751605 s
perf:11155 | 2.109 ms | 1 | avg: 0.068 ms | max: 0.068 ms | max at: 9814.867918 s
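Task: the process name and pid.
Runtime: the actual running time.
Switches: the number of context switches.
Average delay: the average scheduling latency.
Maximum delay: the maximum scheduling latency.
Maximum delay at: the time at which the maximum scheduling latency occurred.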
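For post-processing, each row of this table can be split on the `|` separators into the fields just described. The Python sketch below assumes exactly the row layout shown in the sample output (other perf versions may print different columns); the function name parse_latency_row and the dictionary keys are hypothetical.

```python
def parse_latency_row(line):
    """Parse one data row of the latency table shown above into a dict."""
    cells = [cell.strip() for cell in line.split("|")]
    task, runtime, switches, avg, maximum, max_at = cells[:6]
    # The task cell is "<name>:<pid>"; split on the last colon.
    name, _, pid = task.rpartition(":")
    return {
        "task": name,
        "pid": int(pid),
        "runtime_ms": float(runtime.split()[0]),      # "0.655 ms"
        "switches": int(switches),
        "avg_delay_ms": float(avg.split()[1]),        # "avg: 0.045 ms"
        "max_delay_ms": float(maximum.split()[1]),    # "max: 0.161 ms"
        "max_at_s": float(max_at.split()[2]),         # "max at: 9804.958730 s"
    }


# Two rows copied from the sample output above.
sample = [
    "sleep:11156 | 2.263 ms | 4 | avg: 0.052 ms | max: 0.118 ms | max at: 9804.865552 s",
    "events/10:61 | 0.655 ms | 10 | avg: 0.045 ms | max: 0.161 ms | max at: 9804.958730 s",
]
rows = [parse_latency_row(line) for line in sample]
worst = max(rows, key=lambda row: row["max_delay_ms"])
print(worst["task"], worst["pid"], worst["max_delay_ms"])
```

Parsing the rows this way makes it easy to re-sort or filter by any column, for example to list only tasks whose maximum delay exceeds some threshold.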
perf probe
Lets you define custom probe points.
Define new dynamic tracepoints.
Usage examples
(1) Display which lines in schedule() can be probed