Archive for November, 2013

Using ftrace to understand the Linux kernel API

Here we will attempt to understand how swap space is used in the Linux kernel.

First, the following kernel configuration options are used:

CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
# CONFIG_PSTORE_FTRACE is not set
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set

CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_FUNCTION_PROFILER=y

This is for kernel 3.11.0-rc3+, built from Linus's git tree. My present distro is Ubuntu 12.04.1 LTS 32-bit.

I am not sure which of the options above are really needed by ftrace, but together they are sufficient for our present objective.
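To see what a given kernel was built with, you can grep its config file. The following is a minimal sketch; the function name check_ftrace_config is made up for illustration, and the three options it checks are only my guess at the core set the function tracer and set_ftrace_filter rely on, not a verified minimum.

```shell
#!/bin/bash
# Sketch: report whether a few ftrace-related options are enabled in a
# kernel config file. The option list is an assumption, not a verified
# minimal set.
check_ftrace_config() {
    local config_file=$1
    local opt
    for opt in CONFIG_FTRACE CONFIG_FUNCTION_TRACER CONFIG_DYNAMIC_FTRACE; do
        # An enabled built-in option appears as a line of the form OPTION=y
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
        fi
    done
}

# Typical use on a distro that installs the config under /boot:
#   check_ftrace_config "/boot/config-$(uname -r)"
```

If any of the three is reported as missing, the tracing files used in the script further down may not exist on that kernel.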

Recompile the Linux kernel with the above options and reboot into it. Then run the following script in bash as root (note that earlier kernels need a different set of commands):

#!/bin/bash

mkdir -p /debug
mount -t debugfs nodev /debug

echo "" >/debug/tracing/trace
echo 0 >/debug/tracing/tracing_on

echo "*swap*" > /debug/tracing/set_ftrace_filter
echo function >/debug/tracing/current_tracer
echo 1 >/debug/tracing/tracing_on

swapon -a
xxd < YEYEYE.txt | tail -100

echo 0 >/debug/tracing/tracing_on
cat /debug/tracing/trace

The file “YEYEYE.txt” is a large (200 MB) binary file in my case. For a good explanation of the ftrace mechanism (which is always kernel-version specific), just “cat /debug/tracing/README” to read the built-in documentation.
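The script above mounts debugfs at /debug, but on many systems it is already mounted at /sys/kernel/debug, and newer kernels also expose the tracing files at /sys/kernel/tracing. Here is a small sketch that picks whichever candidate directory actually contains the tracing files; the function name find_tracing_dir and the candidate paths in the usage comment are assumptions about your system, not something this post tested.

```shell
#!/bin/bash
# Sketch: print the first candidate directory that looks like an ftrace
# control directory (it contains a current_tracer file).
find_tracing_dir() {
    local dir
    for dir in "$@"; do
        if [ -f "$dir/current_tracer" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1   # none of the candidates is usable
}

# Example use (candidate paths are assumptions):
#   TRACING=$(find_tracing_dir /sys/kernel/tracing /sys/kernel/debug/tracing /debug/tracing) \
#       || echo "tracing directory not found; is debugfs mounted?"
#   cat "$TRACING/README"
```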

The output of the above:

# tracer: function
#
# entries-in-buffer/entries-written: 3228/3228 #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
sh-5897 [001] ...1 10896.768829: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.768836: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.768840: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.768843: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.768846: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.768850: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.768853: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.768857: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.768860: reuse_swap_page <-do_wp_page
swapon-5898 [006] ...1 10896.768864: reuse_swap_page <-do_wp_page
swapon-5898 [006] ...1 10896.768873: reuse_swap_page <-do_wp_page
swapon-5898 [005] .... 10896.770397: free_pages_and_swap_cache <-tlb_flush_mmu
swapon-5898 [005] .... 10896.770399: free_pages_and_swap_cache <-tlb_flush_mmu
sh-5897 [001] ...1 10896.770602: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.770685: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.770689: reuse_swap_page <-do_wp_page
sh-5897 [001] ...1 10896.770692: reuse_swap_page <-do_wp_page

If you extract only the "xxd" lines from the output above, that is, the ftrace output produced while the userspace program "xxd" was running (it is meant to trigger swap usage by reading a large file), you get:

xxd-5899 [006] ...1 10896.770817: reuse_swap_page <-do_wp_page
xxd-5899 [007] .... 10896.770997: free_pages_and_swap_cache <-tlb_flush_mmu
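Picking out the lines for one program can be done with a small awk filter on the TASK-PID column. This is only a sketch: the function name filter_task is made up here, and it assumes the task name contains no spaces.

```shell
#!/bin/bash
# Sketch: keep only the trace lines whose TASK-PID column belongs to the
# given task name. Reads trace text on stdin.
filter_task() {
    local task=$1
    awk -v task="$task" '
        /^#/ { next }                    # skip the header comment lines
        {
            name = $1                    # first column is TASK-PID
            sub(/-[0-9]+$/, "", name)    # drop the trailing -PID
            if (name == task) print
        }'
}

# Example use:
#   cat /debug/tracing/trace | filter_task xxd
```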

Each line above shows a caller-callee relationship: the traced function on the left of "<-" was called by the function on the right.
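With thousands of entries in the buffer, it helps to summarize how often each caller invoked each traced function. A rough sketch follows (the function name count_call_pairs is hypothetical); it assumes the function-tracer line format shown above, where the caller appears as a field starting with "<-".

```shell
#!/bin/bash
# Sketch: count "caller -> callee" pairs in function-tracer output read
# from stdin, most frequent pair first.
count_call_pairs() {
    awk '
        /^#/ { next }                        # skip header comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^<-/) {
                    caller = substr($i, 3)   # text after the "<-" marker
                    callee = $(i - 1)        # traced function precedes it
                    count[caller " -> " callee]++
                }
            }
        }
        END { for (pair in count) print count[pair], pair }
    ' | sort -k1,1nr -k2
}

# Example use:
#   cat /debug/tracing/trace | count_call_pairs | head
```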

Only functions with “swap” in their names are traced, because the command:

echo "*swap*" > /debug/tracing/set_ftrace_filter

is issued before enabling the tracing.
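Before setting a filter, you can preview which functions a pattern would select by matching it against the available_filter_functions file in the tracing directory. The sketch below uses the shell's own glob matching, which only approximates what the kernel does with the pattern; the function name preview_filter is made up for illustration.

```shell
#!/bin/bash
# Sketch: list the function names in a file (one per line, optionally
# followed by a [module] tag) that match a shell-style glob pattern.
preview_filter() {
    local list_file=$1 pattern=$2
    local name rest
    while read -r name rest; do
        # $pattern is deliberately unquoted so that case treats it as a glob
        case $name in
            $pattern) echo "$name" ;;
        esac
    done < "$list_file"
}

# Example use:
#   preview_filter /debug/tracing/available_filter_functions '*swap*' | wc -l
```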

If you want all kernel functions to be traced instead, just issue:

echo "*" > /debug/tracing/set_ftrace_filter

It is important to wrap the star in double quotes (single quotes work as well); otherwise the shell expands it to the file names in the current directory before echo runs.

If you want multiple groups of functions from different families, the following is possible:

echo vfs_* blk_* tcp_* ipv4_* *socket* > /debug/tracing/set_ftrace_filter

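Because the stars above are not quoted, make sure that no files in the current directory match these patterns: the shell performs filename (glob) expansion, not regular expression matching, before echo runs. Quoting the whole list avoids the problem.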
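The difference is easy to see in a scratch directory. This small demo does not touch ftrace at all; the file name vfs_notes.txt is invented purely to provoke a match, and the behaviour shown for the unmatched blk_* pattern is bash's default (no nullglob or failglob set).

```shell
#!/bin/bash
# Demo: what echo actually receives with unquoted versus quoted patterns.
demo_dir=$(mktemp -d)
touch "$demo_dir/vfs_notes.txt"    # hypothetical file that matches vfs_*

# Unquoted: vfs_* is replaced by the matching file name; blk_* matches
# nothing and is passed through literally.
unquoted=$(cd "$demo_dir" && echo vfs_* blk_*)

# Quoted: both patterns reach echo untouched, which is what ftrace needs.
quoted=$(cd "$demo_dir" && echo 'vfs_* blk_*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -rf "$demo_dir"
```

In the unquoted case the kernel would receive a file name instead of the vfs_* pattern.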

The details of the tracing are given here:

http://pastebin.com/JdsZZacX

To repeat: the above covers only the kernel functions with “swap” as part of their names.

Here are some introductions to ftrace in the Linux kernel:

http://lwn.net/Articles/370423/

https://www.kernel.org/doc/Documentation/trace/ftrace.txt (updated as of 3.10)

https://www.kernel.org/doc/Documentation/trace/ftrace-design.txt

Learning PF_RING and netmap

http://www.ntop.org/products/pf_ring/

http://www.ntop.org/support/documentation/

http://metaflowsblog.wordpress.com/2013/09/26/and-now-for-something-completely-technical-pf-ring-10-gbps-snort-ids/

http://www.ntop.org/products/pf_ring/dna/

http://www.ntop.org/pf_ring/benchmarking-pf_ring-dna/

(Good coverage of PF_RING DNA: use of hardware plus the zero-copy feature for direct access.)

http://wiki.aanval.com/wiki/Netmap

http://info.iet.unipi.it/~luigi/netmap/

USENIX best paper:

https://www.usenix.org/conference/usenixfederatedconferencesweek/netmap-novel-framework-fast-packet-io

http://paolozaino.wordpress.com/2013/04/23/packetlinux-the-fastest-linux-distribution-for-networking-and-deep-packet-inspection/
