Archive for the ‘kernel_general’ Category

How to extend the filesystem size in CentOS (default is XFS) running as a VMware guest?

First, create a partition on the new disk: "sudo fdisk /dev/sdc" -> this creates /dev/sdc1.

Then issue "pvcreate /dev/sdc1".

This initializes the partition as an LVM physical volume; "pvdisplay" will list it, and notice that its "VG Name" is not assigned yet:

Then "vgextend centos /dev/sdc1" -> after this, the enlarged total of 59.5 G is displayed.

And "sudo pvdisplay" will show that /dev/sdc1 is assigned to the VG group "centos".

But "lvdisplay" still shows the old values:

Finally, "lvextend -l +100%FREE /dev/centos/root" and this will extend the VG extents to the 57G limit.

Finally, boot into rescue mode and do the following (this is for an ext4 filesystem):

resize2fs /dev/mapper/centos-root

But if your filesystem is XFS, which is the default for CentOS 7.2 (1511):

The rootfs is running XFS, and XFS does allow online resizing.

So, without any rebooting or rescue mode, the rootfs can be properly resized with the command:

xfs_growfs -d /dev/mapper/centos-root

The option "-d" is for maximum possible size.

So, in summary, for LVM only the last step differs between the XFS and ext4 filesystems: the xfs_growfs or resize2fs command, respectively.
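Putting it all together, here is a minimal sketch of the whole flow (assuming, as above, that the new disk is /dev/sdc, the volume group is named "centos", and the root LV is /dev/centos/root; adjust the names to your setup):

sudo fdisk /dev/sdc                            # create partition /dev/sdc1
sudo pvcreate /dev/sdc1                        # initialize it as an LVM physical volume
sudo vgextend centos /dev/sdc1                 # add the PV to the "centos" volume group
sudo lvextend -l +100%FREE /dev/centos/root    # grow the root LV over all free extents
sudo xfs_growfs -d /                           # XFS: grow online to the maximum size
# For ext4 instead, from rescue mode:
# resize2fs /dev/mapper/centos-root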


How does kcovtrace work (in the Linux kernel)?

We are all familiar with how strace works: it uses the "ptrace" system call to attach to a process and then intercept every system call the process makes.

So how does kcovtrace work?

The tool is available here:

https://github.com/google/syzkaller/blob/master/tools/kcovtrace/kcovtrace.c

It requires the kernel to be compiled with CONFIG_KCOV.
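A minimal sketch of building and running it (assuming a kernel built with CONFIG_KCOV=y, debugfs mounted, and a checkout of the syzkaller repository):

gcc tools/kcovtrace/kcovtrace.c -o kcovtrace   # a single-file tool, no extra libraries
sudo ./kcovtrace /bin/ls                       # prints the kernel PCs covered by /bin/ls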

When we apply it to a binary like "/bin/ls" and let it execute, we get a list of kernel addresses as output:

0xffffffff81a109e4
0xffffffff81a109da
0xffffffff81a109e4
0xffffffff81a109da
0xffffffff81a109e4

To identify the address 0xffffffff81a109e4, take the Linux kernel image file "vmlinux" and disassemble it via "objdump -d vmlinux":

[Screenshot: objdump disassembly around 0xffffffff81a109e4]

As we can see, the address is the return address immediately following the call to <__sanitizer_cov_trace_pc>.
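Equivalently, addr2line can map such a return address straight back to a kernel source line (a sketch, assuming the same vmlinux was built with debug info):

addr2line -e vmlinux 0xffffffff81a109e4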

The __sanitizer_cov_trace_pc() call is inserted using the GCC plugin mechanism, when the Linux kernel is compiled with GCC plugins enabled and CONFIG_KCOV=y.

Inside kernel/kcov.c, you can see that the address of each basic block executed in the kernel is written into the "area[]" array, which is shared with userspace and retrieved through the /sys/kernel/debug/kcov interface via ioctl() and mmap().

The essential flow, extracted from the kcovtrace.c source at the URL above: open /sys/kernel/debug/kcov, issue the KCOV_INIT_TRACE and KCOV_ENABLE ioctl()s, mmap() the coverage buffer, execute the target program, and then read the recorded PCs out of the shared buffer.

Since only the process that explicitly opens the /sys/kernel/debug/kcov interface receives the output, kcov coverage is specific to the process that opened the file descriptor

(unlike FTRACE, where the interface can be opened by one process, another process generates the trace activity in the kernel, and a final process reads the trace out).

Having fun with funcgraph, functrace, and funccount

The utility tool "funcgraph" is a script that is based on "FTRACE", and in Ubuntu it is available through the following packages.

apt-file search funcgraph

perf-tools-unstable: /usr/bin/funcgraph
perf-tools-unstable: /usr/share/doc/perf-tools-unstable/examples/funcgraph_example.txt.gz

And looking at all the binaries inside perf-tools-unstable:
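A quick way to list them is to query the package's file list (the package ships a family of FTRACE-based tools such as funcgraph, functrace, funccount, kprobe, and tpoint):

dpkg -L perf-tools-unstable | grep /usr/bin/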

First, trying to understand kmem_cache_alloc calls (execute via "sudo", as root privilege is needed):

(sudo funcgraph -m 100 "kmem_cache_alloc")

or understanding "scheduler":

(funcgraph -m 100 "schedule*")

Or trying to understand ext4* calls:

(funcgraph -m 100 "ext4*")

And then there is the "functrace" calls:

functrace ‘tcp*’

The command "funccount -t 10 ‘tcp*’" cannot be generated as my kernel lacks the kernel configuration:

And then there is tpoint:
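tpoint traces a single kernel tracepoint; a sketch of typical invocations (block:block_rq_issue is just an example tracepoint):

sudo tpoint -l | head                  # list the available tracepoints
sudo tpoint block:block_rq_issue       # print events as disk I/O requests are issued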

References:

http://www.brendangregg.com/Slides/LISA2014_LinuxPerfAnalysisNewTools.pdf

How to understand the crash dump and identify the source code in the Linux kernel?

Problem:

I got this in my crash dump; what does it mean? How do I identify exactly where in the assembly listing this dump comes from?

[   90.984131] dump_stack+0xe6/0x15b
[   90.984346] ? _atomic_dec_and_lock+0x165/0x165
[   90.984628] ? do_raw_spin_lock+0xf7/0x1e0
[   90.984899] ubsan_epilogue+0x12/0x53
[   90.985131] __ubsan_handle_type_mismatch+0x1f1/0x2c6
[   90.985443] ? ubsan_epilogue+0x53/0x53
[   90.985683] ? debug_lockdep_rcu_enabled+0x26/0x40
[   90.985979] ? ubsan_epilogue+0x53/0x53
[   90.986219] ? __build_skb+0x87/0x320
[   90.986452] dev_gro_receive+0x1b2b/0x1c00
[   90.986805] ? trace_event_raw_event_fib_table_lookup_nh+0x8/0x4b0
[   90.987205] napi_gro_receive+0xa9/0x410
[   90.987446] e1000_clean_rx_irq+0x76f/0x1670
[   90.987709] ? e1000_alloc_jumbo_rx_buffers+0xd80/0xd80
[   90.988052] e1000_clean+0x8c2/0x2b60
[   90.988284] ? trace_hardirqs_on+0xd/0x10
[   90.988527] ? __lock_acquire+0xf4/0x39c0
[   90.988794] ? e1000_clean_tx_ring+0x3c0/0x3c0
[   90.989076] ? __ubsan_handle_type_mismatch+0x29e/0x2c6
[   90.989396] ? time_hardirqs_on+0x2a/0x380
[   90.989712] ? ubsan_epilogue+0x53/0x53
[   90.989947] ? net_rx_action+0x22e/0xe90
[   90.990207] net_rx_action+0x818/0xe90
[   90.990438] ? napi_complete_done+0x440/0x440
[   90.990726] ? debug_lockdep_rcu_enabled+0x26/0x40
[   90.991050] ? __do_softirq+0x18a/0xbc4
[   90.991293] ? time_hardirqs_on+0x2a/0x380
[   90.991560] ? do_raw_spin_trylock+0xd0/0xd0
[   90.991818] ? __do_softirq+0x18a/0xbc4
[   90.992063] __do_softirq+0x1ae/0xbc4
[   90.992336] irq_exit+0x149/0x1d0
[   90.992540] do_IRQ+0x92/0x1b0
[   90.992730] common_interrupt+0x90/0x90
[   90.992963] RIP: 0010:ext4_da_write_begin+0x5fa/0xee0

Firstly, I will assume you know that the entries in the crash dump above are all return addresses of a "call xxx" instruction. So, if you map each one to the assembly listing of the "vmlinux" file produced when the Linux kernel is built, you will see a "call xxx" just before each of the addresses above.

The root cause is somewhere near the top of the trace, as it is what triggered the UBSAN instrumentation in this UBSAN-compiled kernel and thus led to the dump_stack() call.

Therefore start reading from the top:

First we need to convert "vmlinux" into its assembly listing:

objdump -d vmlinux > vmlinux.S

Then go directly to the "dump_stack" symbol:

So the address is “0xffffffff81e0ae65” for dump_stack():

dump_stack+0xe6/0x15b ==> 0xffffffff81e0ae65 + 0xe6 = 0xffffffff81e0af4b
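These two lookups can be done from the shell; a minimal sketch, assuming vmlinux.S was produced by the objdump command above:

# find the start address of dump_stack in the disassembly listing
addr=`grep '<dump_stack>:$' vmlinux.S | sed 's/ .*//'`
# add the 0xe6 offset from "dump_stack+0xe6/0x15b"
printf '0x%x\n' $((0x$addr + 0xe6))    # prints 0xffffffff81e0af4b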

So now looking at “0xffffffff81e0af4b”:

So we know the dumped address is the address immediately after the "call show_stack" instruction.

Therefore, using “addr2line -e vmlinux 0xffffffff81e0af4b” we get:

/home/tteikhua/linux-4.12/lib/dump_stack.c:54

Now we go directly to the source:

So line 54 is exactly the line after the "__dump_stack()" call, and we can now do the same for every other entry in the trace. The script below automates this:

cat crash.addr | sed 's/.* //' | tr -d '\r' | sed 's|/.*||' | sed 's/+/ /' > /tmp/tmp$$
backtrace=20
forwardtrace=40
cat /tmp/tmp$$ | while read func offset
do
    # find the start address of the function in the disassembly listing
    addr=`grep "<${func}>:$" vmlinux.S | sed 's/ .*//'`
    # compute function start + offset (the return address in the trace)
    addr1=`printf '%x' $((0x$addr + $offset))`
    output=`addr2line -e ./vmlinux 0x$addr1 | sed 's/ .*//'`
    echo "${output}"
    source=`echo "${output}" | tail -1`
    filename=`echo $source | sed 's/:.*//'`
    line=`echo $source | sed 's/.*://'`
    line1=`expr ${line} - ${backtrace}`
    echo "==========================================================="
    # show the surrounding source, with the target line marked by ">>"
    cat $filename | nl -ba | sed -n "${line1},+${forwardtrace}p" | sed "${backtrace}s/^/>>/"
    echo "==========================================================="
done

The original version of the script above contained some control codes, so the attached files may be easier to use (sample output included):

https://drive.google.com/open?id=0B1hg6_FvnEa8cGpvZEdxTFNCVDA

https://drive.google.com/open?id=0B1hg6_FvnEa8MHEwNU10VDRXQzA

References:

http://helenfornazier.blogspot.sg/2015/07/linux-kernel-memory-corruption-debug.html

https://serverfault.com/questions/605946/kernel-stack-trace-to-source-code-lines

https://stackoverflow.com/questions/6151538/addr2line-on-kernel-module

http://elinux.org/Addr2line_for_kernel_debugging

http://www.linuxdevcenter.com/cmd/cmd.csp?path=a/addr2line

https://plus.google.com/+TheodoreTso/posts/h8H9z4EopkS

How to extend the disk size in my VMware guest running CentOS 7

To increase the disk size inside a VMware guest running CentOS 7 (7.3, 1611), these are the steps needed:

a. Remove all snapshots from your VMware image, or, if you cannot afford to remove them, shut down the guest OS and "clone from snapshot" (you can select any specific snapshot to clone) to create a new VMware image. This new image is totally independent and will not carry any of the snapshots.

b. Now go to VMware settings, hardware, and "hard disk", and select the "Expand Disk" option.

c. After expanding the disk, reboot into your CentOS guest.

d. Issue "sudo fdisk /dev/sda" to repartition the /dev/sda inside the guest OS:

And here is the partition deletion and recreation part (be careful): the starting block number must stay the same, while the end will default to the largest possible block number.

Now remember to REBOOT, as the partition table will not be re-read until after the reboot.

And now check using "pvdisplay": it is not updated yet, still 75G. Use "pvresize" to extend it.

So now the PV occupies 200G.

Now check with lvdisplay:

So we need to extend the LV to 200G:

Followed by xfs_growfs:

Check now:

Updated.
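In summary, a minimal sketch of steps (b) onwards (assuming the grown partition is /dev/sda2 and the root LV is /dev/centos/root; adjust the names to your layout):

sudo fdisk /dev/sda            # delete and recreate /dev/sda2 with the same start block
sudo reboot                    # force the new partition table to be re-read
sudo pvresize /dev/sda2        # grow the PV to the new partition size
sudo lvextend -l +100%FREE /dev/centos/root    # grow the root LV over the free extents
sudo xfs_growfs -d /           # grow the XFS root filesystem online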

Linux kernel command-line bootup options

In your /boot/grub/grub.cfg kernel bootup setup, there is a line where you can add a lot of kernel options (see the "linux" line below):

Or, if you start your kernel using QEMU, the place to add the kernel options is the "-append" argument:

qemu-system-x86_64 -m 3048 -net nic -net user,host=10.0.2.10,hostfwd=tcp::11398-:22 -display none -serial stdio -no-reboot -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp sockets=2,cores=2,threads=1 -enable-kvm -usb -usbdevice mouse -usbdevice tablet -soundhw all -hda /home/user/syz4123/wheezy_4.12.img -snapshot -initrd /home/tteikhua/syz4129/initrd -kernel /home/tteikhua/syz4129/bzImage -append "console=ttyS0 slub_debug=FZ vsyscall=native rodata=n oops=panic nmi_watchdog=panic panic_on_warn=1 panic=86400 ftrace_dump_on_oops=orig_cpu earlyprintk=serial net.ifnames=0 biosdevname=0 kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.vmm_exclusive=1 kvm-intel.fasteoi=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 root=/dev/sda"

Or you can add it inside the /etc/default/grub file by modifying the "GRUB_CMDLINE_LINUX_DEFAULT" parameter, then run "update-grub" to regenerate the /boot/grub/grub.cfg file:
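A minimal sketch of that flow (adding slub_debug=FZ is just an example; the sed pattern assumes the stock Debian/Ubuntu layout of /etc/default/grub):

sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&slub_debug=FZ /' /etc/default/grub
sudo update-grub               # regenerates /boot/grub/grub.cfg
# after a reboot, verify that the option took effect:
cat /proc/cmdline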

In all these scenarios, what are the available kernel options you can enable?

At the latest count, there are about 460+ kernel options available (the full list is here: https://tthtlc.wordpress.com/a-list-of-kernel-command-line-option-as-collected-from-kernel-source-code/):

Too many; so which are the most useful ones? I cannot fully answer that myself either, but for debugging, the options seen in the QEMU "-append" line above (slub_debug, panic_on_warn, ftrace_dump_on_oops, earlyprintk, and so on) have proven useful.

In particular, one “slub_debug” is used for debugging a recently discoverd vulnerability (CVE-2017-7533):

https://bugzilla.redhat.com/attachment.cgi?id=1296934


The kernel function for handling "slub_debug" is "setup_slub_debug" (inside mm/slub.c), from which we can read and understand how the option is parsed by the kernel; for example, the flag letters in "slub_debug=FZ" select sanity checks (F) and red zoning (Z):

[Screenshot: setup_slub_debug() in mm/slub.c]

What are the xxx_initcall() in the Linux kernel source code?

Searching through some of the Linux kernel source for xxx_initcall() (the master list I have compiled is here: tthtlc.wordpress.com/a-compilation-of-the-xxx_initcall-initialization-routines-in-linux-kernel/):

./kernel/apic/apic.c:
early_initcall(validate_x2apic);
core_initcall(init_lapic_sysfs);
late_initcall(lapic_insert_resource);

./kernel/apic/io_apic.c:
device_initcall(ioapic_init_ops);

./kernel/apic/vector.c:
late_initcall(print_ICs);

./kernel/apic/probe_32.c:
late_initcall(print_ipi_mode);

./kernel/apic/x2apic_uv_x.c:
late_initcall(uv_init_heartbeat);

./kernel/apic/apic_numachip.c:
early_initcall(numachip_system_init);

./kernel/pcspeaker.c:
device_initcall(add_pcspkr);

./kernel/devicetree.c:
device_initcall(add_bus_probe);

./kernel/sysfb.c:
device_initcall(sysfb_init);

./xen/p2m.c:
fs_initcall(xen_p2m_debugfs);

./xen/grant-table.c:
core_initcall(xen_pvh_gnttab_setup);

./pci/broadcom_bus.c:
postcore_initcall(broadcom_postcore_init);

./pci/amd_bus.c:
postcore_initcall(amd_postcore_init);

You can see that there are many different types of xxx_initcall():

Going to the definition part:

./include/linux/init.h:

#define core_initcall(fn) __define_initcall(fn, 1)
#define core_initcall_sync(fn) __define_initcall(fn, 1s)
#define postcore_initcall(fn) __define_initcall(fn, 2)
#define postcore_initcall_sync(fn) __define_initcall(fn, 2s)
#define arch_initcall(fn) __define_initcall(fn, 3)
#define arch_initcall_sync(fn) __define_initcall(fn, 3s)
#define subsys_initcall(fn) __define_initcall(fn, 4)
#define subsys_initcall_sync(fn) __define_initcall(fn, 4s)
#define fs_initcall(fn) __define_initcall(fn, 5)
#define fs_initcall_sync(fn) __define_initcall(fn, 5s)
#define rootfs_initcall(fn) __define_initcall(fn, rootfs)
#define device_initcall(fn) __define_initcall(fn, 6)
#define device_initcall_sync(fn) __define_initcall(fn, 6s)
#define late_initcall(fn) __define_initcall(fn, 7)
#define late_initcall_sync(fn) __define_initcall(fn, 7s)

And so what are these initcall()?

Essentially, these macros tell the kernel which functions perform the initialization for each part of the kernel a developer writes. Different parts of the kernel have their initialization functions called in a different order, according to the level numbers in the macros above.

When are they called?

Inside the Linux kernel, in init/main.c:start_kernel():

And then start_kernel() eventually calls rest_init():

So rest_init() calls kernel_thread() to start the kernel_init() function, which then calls kernel_init_freeable(), which eventually calls do_pre_smp_initcalls():

And later, after do_pre_smp_initcalls(), do_basic_setup() will also run the initcalls (via do_initcalls()), as shown below:

And do_initcalls() actually traverses all the different initcall levels in order.
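(As a hedged aside: on many builds you can also inspect this link-time ordering from System.map, where each registered function appears as a __initcall_<fn><level> symbol:)

grep '__initcall_' System.map | head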

How to debug these initcall()?
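One practical way (a sketch): boot with the well-documented initcall_debug kernel parameter, which makes the kernel log every initcall and its duration:

# append "initcall_debug ignore_loglevel" to the kernel command line, reboot, then:
dmesg | grep initcall | head
# e.g. "initcall init_lapic_sysfs+0x0/0x2c returned 0 after 12 usecs" (values illustrative)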
[Figure: calling sequence of initcalls in the Linux kernel]

References:

  1. http://linuxgazette.net/157/amurray.html
  2. http://blog.techveda.org/kernel__initcalls/
  3. https://stackoverflow.com/questions/18605653/linux-module-init-vs-core-initcall-vs-early-initcall
  4. https://stackoverflow.com/questions/10368837/how-does-linux-determine-the-order-of-module-init-calls
  5. https://stackoverflow.com/questions/10540008/how-kernel-determine-the-sequence-of-init-calls
  6. https://lwn.net/Articles/141730/
  7. http://www.compsoc.man.ac.uk/~moz/kernelnewbies/documents/initcall/kernel.html
  8. https://0xax.gitbooks.io/linux-insides/content/Concepts/initcall.html