Archive for the ‘kernel_general’ Category

How to understand the crashdump and identify the source code in the Linux kernel?

Problem:

I got this in my crash dump; what does it mean? And how do I identify exactly where in the assembly listing this dump comes from?

[   90.984131] dump_stack+0xe6/0x15b
[   90.984346] ? _atomic_dec_and_lock+0x165/0x165
[   90.984628] ? do_raw_spin_lock+0xf7/0x1e0
[   90.984899] ubsan_epilogue+0x12/0x53
[   90.985131] __ubsan_handle_type_mismatch+0x1f1/0x2c6
[   90.985443] ? ubsan_epilogue+0x53/0x53
[   90.985683] ? debug_lockdep_rcu_enabled+0x26/0x40
[   90.985979] ? ubsan_epilogue+0x53/0x53
[   90.986219] ? __build_skb+0x87/0x320
[   90.986452] dev_gro_receive+0x1b2b/0x1c00
[   90.986805] ? trace_event_raw_event_fib_table_lookup_nh+0x8/0x4b0
[   90.987205] napi_gro_receive+0xa9/0x410
[   90.987446] e1000_clean_rx_irq+0x76f/0x1670
[   90.987709] ? e1000_alloc_jumbo_rx_buffers+0xd80/0xd80
[   90.988052] e1000_clean+0x8c2/0x2b60
[   90.988284] ? trace_hardirqs_on+0xd/0x10
[   90.988527] ? __lock_acquire+0xf4/0x39c0
[   90.988794] ? e1000_clean_tx_ring+0x3c0/0x3c0
[   90.989076] ? __ubsan_handle_type_mismatch+0x29e/0x2c6
[   90.989396] ? time_hardirqs_on+0x2a/0x380
[   90.989712] ? ubsan_epilogue+0x53/0x53
[   90.989947] ? net_rx_action+0x22e/0xe90
[   90.990207] net_rx_action+0x818/0xe90
[   90.990438] ? napi_complete_done+0x440/0x440
[   90.990726] ? debug_lockdep_rcu_enabled+0x26/0x40
[   90.991050] ? __do_softirq+0x18a/0xbc4
[   90.991293] ? time_hardirqs_on+0x2a/0x380
[   90.991560] ? do_raw_spin_trylock+0xd0/0xd0
[   90.991818] ? __do_softirq+0x18a/0xbc4
[   90.992063] __do_softirq+0x1ae/0xbc4
[   90.992336] irq_exit+0x149/0x1d0
[   90.992540] do_IRQ+0x92/0x1b0
[   90.992730] common_interrupt+0x90/0x90
[   90.992963] RIP: 0010:ext4_da_write_begin+0x5fa/0xee0

Firstly, I will assume you know that the addresses in the crashdump above are all return addresses of a “call xxxx” instruction. So if you map each one onto the assembly listing of the “vmlinux” file generated when the Linux kernel is built, you will see a “call xxx” instruction just before each of the addresses above.

The cause of the crashdump is somewhere at the top of the trace: that is what triggered the check in the UBSAN-compiled kernel and thus led to the dump_stack() call.

Therefore, start reading from the top.

First, we need to convert “vmlinux” into its assembly listing:

objdump -d vmlinux > vmlinux.S

Then go directly to the “dump_stack” symbol:
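To locate the symbol's base address in the listing, grep for its label (the address is of course specific to this particular vmlinux build):

```shell
# Find the line "ffffffff81e0ae65 <dump_stack>:" in the disassembly.
grep -n '<dump_stack>:' vmlinux.S
```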

The base address turns out to be “0xffffffff81e0ae65” for dump_stack():

dump_stack+0xe6/0x15b ==> 0xffffffff81e0ae65 + 0xe6 = 0xffffffff81e0af4b
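The same sum can be checked with shell arithmetic (the base address is, again, from this particular build):

```shell
# Symbol base + offset from the trace = the return address to look up.
printf '0x%x\n' $((0xffffffff81e0ae65 + 0xe6))
# prints 0xffffffff81e0af4b
```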

So now looking at “0xffffffff81e0af4b”:

So we know the dump address is the address immediately after the “call show_stack” instruction.

Therefore, using “addr2line -e vmlinux 0xffffffff81e0af4b”, we get:

/home/tteikhua/linux-4.12/lib/dump_stack.c:54

Now we go to that line in the source:

So line 54 is exactly the line after the “__dump_stack()” call, and we can now do the same for all the other addresses, using the script below:

backtrace=20
forwardtrace=40
# Strip everything up to the last space (timestamp and any "? "), drop
# carriage returns and the "/0xSIZE" part, then split "symbol+0xOFFSET"
# into "symbol 0xOFFSET".
cat crash.addr | sed 's/.* //' | tr -d '\r' | sed 's/\/.*//' | sed 's/+/ /' > /tmp/tmp$$
cat /tmp/tmp$$ | while read data data1
do
    # Base address of the symbol, taken from the objdump listing.
    addr=`grep "<${data}>:$" vmlinux.S | sed 's/ .*//'`
    # Base address + offset = the return address to resolve.
    # (Shell arithmetic here replaces the original addhex.bash helper.)
    addr1=`printf '%x' $((0x$addr + $data1))`
    output=`addr2line -e ./vmlinux 0x$addr1 | sed 's/ .*//'`
    echo "${output}"
    source=`echo "$output" | tail -1`
    filename=`echo $source | sed 's/:.*//'`
    line=`echo $source | sed 's/.*://'`
    line1=`expr ${line} - ${backtrace}`

    echo "==========================================================="
    # Show the surrounding source, with ">>" marking the resolved line.
    cat $filename | nl -ba | sed -n "${line1},+${forwardtrace}p" | sed "$((backtrace + 1))s/^/>>/"
    echo "==========================================================="
done

As the automated script above contained some control codes when pasted, the attached files are better (output as below):

https://drive.google.com/open?id=0B1hg6_FvnEa8cGpvZEdxTFNCVDA

https://drive.google.com/open?id=0B1hg6_FvnEa8MHEwNU10VDRXQzA

References:

http://helenfornazier.blogspot.sg/2015/07/linux-kernel-memory-corruption-debug.html

https://serverfault.com/questions/605946/kernel-stack-trace-to-source-code-lines

https://stackoverflow.com/questions/6151538/addr2line-on-kernel-module

http://elinux.org/Addr2line_for_kernel_debugging

http://www.linuxdevcenter.com/cmd/cmd.csp?path=a/addr2line

https://plus.google.com/+TheodoreTso/posts/h8H9z4EopkS

How to extend the disk size in my VMware guest running Centos 7

To increase the disk size inside a VMware guest running CentOS 7 (7.3, 1611), these are the steps needed:

a. Remove all snapshots from your VMware image, or, if you cannot afford to remove them, shut down the guest OS and use "clone from snapshot" (you can select any specific snapshot to clone) to create a new VMware image. This new image is totally independent and will not carry any of the snapshots.

b. Now go to VMware Settings, Hardware, "Hard Disk", and select the "Expand Disk" option.

c. After expanding disk, reboot into your CentOS.

d. Issue "sudo fdisk /dev/sda" to repartition /dev/sda inside the guest OS:

And here is the partition deletion and re-creation part (be careful): the starting block number must stay the same, while the end will default to the largest possible block number.

Now remember to REBOOT, as the partition table will not be re-read until after the reboot.

And now check using pvdisplay: it is not updated yet, still 75G. Use "pvresize" to extend it.

So now the PV occupies 200G.

Now check with lvdisplay:

So we need to extend the LV to 200G (using lvextend):

Followed by xfs_growfs:

Check now:

Updated.
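For reference, steps (d) onwards boil down to the following command sequence. The partition (/dev/sda2) and logical volume path (/dev/centos/root) are assumptions for a default CentOS 7 layout and must be adapted to your own setup:

```shell
sudo fdisk /dev/sda      # delete /dev/sda2 and recreate it with the SAME
                         # starting block; the end defaults to the last block
sudo reboot              # the partition table is re-read after reboot

sudo pvresize /dev/sda2  # grow the PV into the enlarged partition
sudo pvdisplay           # the PV should now show the new size

sudo lvextend -l +100%FREE /dev/centos/root   # grow the LV over the free extents
sudo xfs_growfs /        # finally grow the XFS filesystem online
```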

Linux kernel commandline bootup option

In your /boot/grub/grub.cfg kernel bootup setup, there is a line where you can add many kernel options (see the “linux” line below):

Or, if you start your kernel using QEMU, the place to add the kernel options is the “-append” argument:

qemu-system-x86_64 -m 3048 -net nic -net user,host=10.0.2.10,hostfwd=tcp::11398-:22 -display none -serial stdio -no-reboot -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp sockets=2,cores=2,threads=1 -enable-kvm -usb -usbdevice mouse -usbdevice tablet -soundhw all -hda /home/user/syz4123/wheezy_4.12.img -snapshot -initrd /home/tteikhua/syz4129/initrd -kernel /home/tteikhua/syz4129/bzImage -append "console=ttyS0 slub_debug=FZ vsyscall=native rodata=n oops=panic nmi_watchdog=panic panic_on_warn=1 panic=86400 ftrace_dump_on_oops=orig_cpu earlyprintk=serial net.ifnames=0 biosdevname=0 kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.vmm_exclusive=1 kvm-intel.fasteoi=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 root=/dev/sda"

Or you can add it inside the /etc/default/grub file: modify the “GRUB_CMDLINE_LINUX_DEFAULT” parameter and run “update-grub” to regenerate the /boot/grub/grub.cfg file:

In all these scenarios, what are the available kernel options you can set?

At the latest count, there are about 460+ kernel options available (the full list is here: https://tthtlc.wordpress.com/a-list-of-kernel-command-line-option-as-collected-from-kernel-source-code/):

Too many; so which are the most useful ones?

I cannot answer that myself either. But for debugging, the following seemed useful.
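For example, several debugging-oriented options already appear in the QEMU “-append” line above (the annotations are mine):

```
slub_debug=FZ                 # SLUB sanity checks (F) and red zoning (Z)
oops=panic                    # turn an oops into a panic
panic_on_warn=1               # turn WARN() into a panic as well
nmi_watchdog=panic            # panic when the NMI watchdog detects a lockup
ftrace_dump_on_oops=orig_cpu  # dump the ftrace buffer of the oopsing CPU
earlyprintk=serial            # early boot messages on the serial console
```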

In particular, one option, “slub_debug”, is used for debugging a recently discovered vulnerability (CVE-2017-7533):

https://bugzilla.redhat.com/attachment.cgi?id=1296934


The kernel function that handles “slub_debug” (from above) is setup_slub_debug() (inside mm/slub.c), from which we can read and understand how the option is processed by the kernel:

Selection_133

What are the xxx_initcall() in linux kernel source codes?

Searching through some of the Linux kernel source for xxx_initcall() (the master list I have compiled is here: tthtlc.wordpress.com/a-compilation-of-the-xxx_initcall-initialization-routines-in-linux-kernel/):

./kernel/apic/apic.c:
early_initcall(validate_x2apic);
core_initcall(init_lapic_sysfs);
late_initcall(lapic_insert_resource);

./kernel/apic/io_apic.c:
device_initcall(ioapic_init_ops);

./kernel/apic/vector.c:
late_initcall(print_ICs);

./kernel/apic/probe_32.c:
late_initcall(print_ipi_mode);

./kernel/apic/x2apic_uv_x.c:
late_initcall(uv_init_heartbeat);

./kernel/apic/apic_numachip.c:
early_initcall(numachip_system_init);

./kernel/pcspeaker.c:
device_initcall(add_pcspkr);

./kernel/devicetree.c:
device_initcall(add_bus_probe);

./kernel/sysfb.c:
device_initcall(sysfb_init);

./xen/p2m.c:
fs_initcall(xen_p2m_debugfs);

./xen/grant-table.c:
core_initcall(xen_pvh_gnttab_setup);

./pci/broadcom_bus.c:
postcore_initcall(broadcom_postcore_init);

./pci/amd_bus.c:
postcore_initcall(amd_postcore_init);

You can see that there are many different types of xxx_initcall():

Going to the definition part:

./include/linux/init.h:

#define core_initcall(fn) __define_initcall(fn, 1)
#define core_initcall_sync(fn) __define_initcall(fn, 1s)
#define postcore_initcall(fn) __define_initcall(fn, 2)
#define postcore_initcall_sync(fn) __define_initcall(fn, 2s)
#define arch_initcall(fn) __define_initcall(fn, 3)
#define arch_initcall_sync(fn) __define_initcall(fn, 3s)
#define subsys_initcall(fn) __define_initcall(fn, 4)
#define subsys_initcall_sync(fn) __define_initcall(fn, 4s)
#define fs_initcall(fn) __define_initcall(fn, 5)
#define fs_initcall_sync(fn) __define_initcall(fn, 5s)
#define rootfs_initcall(fn) __define_initcall(fn, rootfs)
#define device_initcall(fn) __define_initcall(fn, 6)
#define device_initcall_sync(fn) __define_initcall(fn, 6s)
#define late_initcall(fn) __define_initcall(fn, 7)
#define late_initcall_sync(fn) __define_initcall(fn, 7s)

So what are these initcall() macros?

Essentially, these macros tell the kernel which function performs the initialization for each part of the kernel the developer is writing. Different parts of the kernel have their initialization functions called in a different order, according to the level number in the macro.

When are they called?

Inside the Linux kernel, in init/main.c:start_kernel():

And then start_kernel() eventually calls rest_init():

So rest_init() calls kernel_thread() to start the kernel_init() function, which then calls kernel_init_freeable(), which eventually calls do_pre_smp_initcalls():

And later, after do_pre_smp_initcalls(), do_basic_setup() also runs the initcalls (via do_initcalls()):

As shown below:

And do_initcalls() then traverses all the different levels of initcalls in order:

How to debug these initcalls? One simple way is to boot with the “initcall_debug” kernel parameter, which makes the kernel log every initcall and how long it took.

References:

  1. http://linuxgazette.net/157/amurray.html
  2. http://blog.techveda.org/kernel__initcalls/
  3. https://stackoverflow.com/questions/18605653/linux-module-init-vs-core-initcall-vs-early-initcall
  4. https://stackoverflow.com/questions/10368837/how-does-linux-determine-the-order-of-module-init-calls
  5. https://stackoverflow.com/questions/10540008/how-kernel-determine-the-sequence-of-init-calls
  6. https://lwn.net/Articles/141730/
  7. http://www.compsoc.man.ac.uk/~moz/kernelnewbies/documents/initcall/kernel.html
  8. https://0xax.gitbooks.io/linux-insides/content/Concepts/initcall.html

HUGE memory copy

This is from mm/memory.c: copy_huge_page_from_user(), which copies memory from userspace into the kernel when the mapping is set up with HUGE pages in the page table.

Look at the cond_resched() in the last few lines.

What does that mean? It means the kernel has voluntarily scheduled itself out: if there is another, more urgent process that needs to be executed, that process takes over the CPU and continues execution.

In general, in any long or large operation, always introduce cond_resched() to ensure the CPU is not monopolized by the long-running kernel code.

The reflink(2) system call v2. LWN.net

https://lwn.net/Articles/332802/

https://pkalever.wordpress.com/2016/01/22/xfs-reflinks-tutorial/

https://lwn.net/Articles/331808/

https://blogs.oracle.com/devpartner/entry/whats_new_in_oracle_linux

https://blogs.oracle.com/wim/entry/ocfs2_reflink

https://lkml.org/lkml/2016/10/12/176

https://bbs.archlinux.org/viewtopic.php?id=212825

https://patchwork.kernel.org/patch/9480871/

TPM2 and Linux

http://blog.hansenpartnership.com/tpm2-and-linux/

http://twobit.us/2016/05/tpm2-uefi-measurements-and-event-log/

https://firmware.intel.com/sites/default/files/resources/A_Tour_Beyond_BIOS_Implementing_TPM2_Support_in_EDKII.pdf

https://github.com/01org/tpm2.0-tools/wiki/How-to-use-tpm2-tools

https://communities.intel.com/thread/76492 (Intel NUC + TPM2 + Linux)

http://www.slideshare.net/k33a/trusted-platform-module-tpm

http://www.slideshare.net/OWASP_Poland/wroclaw-3-trusted-computing
