Archive for the ‘Android’ Category

Recovery of a lost or deleted virtual machine .vmx configuration file

Rule No. 1 when using VMware: never put a VM on an NTFS-formatted drive that is shared between Windows and Ubuntu. I have had it corrupted twice (the VMX file) and the entire VM was lost.

Just putting it entirely on a Linux-formatted drive (like ext4) is much safer.

But luckily this time round I found this:

And using it I managed to salvage the entire VM.

The "vmdk" to be included in the new virtual machine has to be carefully selected: don’t pick the "redo log files" by mistake, as explained below:


Understanding FPU usage in linux kernel

I am interested in learning how the Linux kernel handles all the FPU registers (XMM, MMX, SSE, SSE2, etc.). This matters for security: if the values of these registers are not properly initialized when switching between tasks, they become a channel for memory information leakage. On the other hand, this register set is so large that saving and restoring the full FPU context on every context switch would seriously slow down the kernel.

To understand this, read this comment:

 * FPU context switching strategies:
 * Against popular belief, we don't do lazy FPU saves, due to the
 * task migration complications it brings on SMP - we only do
 * lazy FPU restores.
 * 'lazy' is the traditional strategy, which is based on setting
 * CR0::TS to 1 during context-switch (instead of doing a full
 * restore of the FPU state), which causes the first FPU instruction
 * after the context switch (whenever it is executed) to fault - at
 * which point we lazily restore the FPU state into FPU registers.
 * Tasks are of course under no obligation to execute FPU instructions,
 * so it can easily happen that another context-switch occurs without
 * a single FPU instruction being executed. If we eventually switch
 * back to the original task (that still owns the FPU) then we have
 * not only saved the restores along the way, but we also have the
 * FPU ready to be used for the original task.
 * 'lazy' is deprecated because it's almost never a performance win
 * and it's much more complicated than 'eager'.
 * 'eager' switching is by default on all CPUs, there we switch the FPU
 * state during every context switch, regardless of whether the task
 * has used FPU instructions in that time slice or not. This is done
 * because modern FPU context saving instructions are able to optimize
 * state saving and restoration in hardware: they can detect both
 * unused and untouched FPU state and optimize accordingly.


In summary:

a. LAZY mode: the FPU state is not saved/restored on every context switch, but only when the FPU is actually used. The context switch sets CR0.TS, the first FPU instruction executed afterwards faults, and the fault handler restores the FPU state and clears CR0.TS, so the kernel does not have to check for FPU register usage all the time. This mode is not the default: the time saved is not significant, and the algorithm becomes very complex, which adds processing overhead of its own.

b. EAGER mode: this is the default mode. The FPU state is always saved and restored on each context switch. However, the hardware can detect which parts of the long chain of FPU registers have actually been used, and only those components are written out and restored, so it is very efficient in hardware.

In the kernel, arch/x86/:


has_fpu() checks, via the following test on CR0, whether an FPU is present and usable:

        if (cr0 & (X86_CR0_EM|X86_CR0_TS)) {

which is called from get_cpuflags():

void get_cpuflags(void)
{
        ...
        if (has_fpu())
                set_bit(X86_FEATURE_FPU, cpu.flags);

The following is a series of 208 patches from 2015 reworking FPU handling in the kernel:

The instructions that save the whole FPU state (XMM, MMX, SSE, SSE2, etc.) are FXSAVE, FNSAVE and FSAVE:

and their overhead in the Linux kernel has been benchmarked at 87 cycles.

The optimized way of saving is described in the comment below:

 * When executing XSAVEOPT (or other optimized XSAVE instructions), if
 * a processor implementation detects that an FPU state component is still
 * (or is again) in its initialized state, it may clear the corresponding
 * bit in the header.xfeatures field, and can skip the writeout of registers
 * to the corresponding memory layout.
 * This means that when the bit is zero, the state component might still contain
 * some previous - non-initialized register state.

To catch the kernel acting on FPU state, we can set a breakpoint on fpstate_sanitize_xstate in KGDB; the kernel stack trace is as follows:

Thread 441 hit Breakpoint 1, fpstate_sanitize_xstate (fpu=0xffff8801e7a2ea80) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/xstate.c:111
111 {
#0  fpstate_sanitize_xstate (fpu=0xffff8801e7a2ea80) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/xstate.c:111
#1  0xffffffff8103b183 in copy_fpstate_to_sigframe (buf=0xffff8801e7a2ea80, buf_fx=0x7f73ad4fe3c0, size=) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/signal.c:178
#2  0xffffffff8102e207 in get_sigframe (frame_size=440, fpstate=0xffff880034dcbe10, regs=, ka=) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:247
#3  0xffffffff8102e703 in __setup_rt_frame (regs=, set=, ksig=, sig=) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:413
#4  setup_rt_frame (regs=, ksig=) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:627
#5  handle_signal (regs=, ksig=) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:671
#6  do_signal (regs=0xffff880034dcbf58) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:714
#7  0xffffffff8100320c in exit_to_usermode_loop (regs=0xffff880034dcbf58, cached_flags=4) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/entry/common.c:248
#8  0xffffffff81003c6e in prepare_exit_to_usermode (regs=) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/entry/common.c:283

You can use “info thread 441” (see above) to check which process the stack trace above corresponds to. Among them is “Xorg”; otherwise, the majority of processes do not use the FPU.

From the stack trace, “get_sigframe()” is the first function that seems to examine FPU usage:

static void __user *
get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
             void __user **fpstate)
{
        ...
        if (fpu->fpstate_active) {
                unsigned long fx_aligned, math_size;

                sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
                *fpstate = (struct _fpstate_32 __user *) sp;
                if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
                                             math_size) < 0)
                        return (void __user *) -1L;
        ...

So essentially what is happening here is that the FPU state is copied onto the user-space stack (at the stack pointer “sp”) as part of the signal frame.
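The XSAVEOPT comment above is the crux of the leakage concern: a clear bit in the header's xfeatures field means the hardware skipped that component, so the bytes in the save area are whatever was written there earlier, possibly by another task. Before copy_fpstate_to_sigframe() exposes the buffer on the user stack, fpstate_sanitize_xstate() therefore has to overwrite every skipped component with its init value. The following is an added example, not kernel code, that models that sanitizing step in user space on a constructed XSAVE-format buffer. The legacy-region offsets and the x87/MXCSR init values are architectural; the AVX offset and size (576/256) are what CPUID leaf 0xD reports on current CPUs and are assumed here rather than queried.

```c
/*
 * Added example (not kernel code): user-space model of sanitizing an
 * XSAVE-format buffer before it is handed to user space.  A clear bit in
 * XSTATE_BV means the corresponding bytes were not written by XSAVEOPT and
 * may hold stale data, so they are replaced with the component's init state.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define XSAVE_HDR_OFFSET    512           /* 64-byte XSAVE header */
#define AVX_OFFSET          576           /* assumed standard-format offset */
#define AVX_SIZE            256
#define XSAVE_AREA_SIZE     (AVX_OFFSET + AVX_SIZE)

#define XFEATURE_MASK_FP    (1ULL << 0)   /* x87 */
#define XFEATURE_MASK_SSE   (1ULL << 1)   /* XMM0-15 + MXCSR */
#define XFEATURE_MASK_YMM   (1ULL << 2)   /* upper halves of YMM0-15 */
#define XFEATURES_ALL       (XFEATURE_MASK_FP | XFEATURE_MASK_SSE | XFEATURE_MASK_YMM)

/* FXSAVE legacy region offsets */
#define FX_FCW_OFFSET       0             /* x87 control word, 2 bytes */
#define FX_MXCSR_OFFSET     24            /* MXCSR, 4 bytes */
#define FX_ST_OFFSET        32            /* ST0-7, 8 x 16 bytes */
#define FX_XMM_OFFSET       160           /* XMM0-15, 16 x 16 bytes */
#define FX_XMM_SIZE         256

#define X87_FCW_INIT        0x037f
#define MXCSR_INIT          0x1f80
#define STALE_BYTE          0xAA          /* constructed "leftover data" pattern */

uint64_t xsave_get_xstate_bv(const uint8_t *area)
{
    uint64_t bv;
    memcpy(&bv, area + XSAVE_HDR_OFFSET, sizeof bv);
    return bv;
}

void xsave_set_xstate_bv(uint8_t *area, uint64_t bv)
{
    memcpy(area + XSAVE_HDR_OFFSET, &bv, sizeof bv);
}

uint32_t xsave_get_mxcsr(const uint8_t *area)
{
    uint32_t v;
    memcpy(&v, area + FX_MXCSR_OFFSET, sizeof v);
    return v;
}

/*
 * Overwrite each component whose XSTATE_BV bit is clear with its init value
 * and return how many components were rewritten.  Finally mark every
 * component as present so a consumer takes the sanitized bytes rather than
 * assuming init state on its own.
 */
int sanitize_xstate(uint8_t *area)
{
    uint64_t bv = xsave_get_xstate_bv(area);
    int cleared = 0;

    if (!(bv & XFEATURE_MASK_FP)) {
        uint16_t fcw = X87_FCW_INIT;
        memset(area, 0, FX_MXCSR_OFFSET);                    /* FCW..FDP */
        memcpy(area + FX_FCW_OFFSET, &fcw, sizeof fcw);
        memset(area + FX_ST_OFFSET, 0, FX_XMM_OFFSET - FX_ST_OFFSET);
        cleared++;
    }
    if (!(bv & XFEATURE_MASK_SSE)) {
        uint32_t mxcsr = MXCSR_INIT;
        memcpy(area + FX_MXCSR_OFFSET, &mxcsr, sizeof mxcsr);
        memset(area + FX_XMM_OFFSET, 0, FX_XMM_SIZE);
        cleared++;
    }
    if (!(bv & XFEATURE_MASK_YMM)) {
        memset(area + AVX_OFFSET, 0, AVX_SIZE);
        cleared++;
    }
    xsave_set_xstate_bv(area, XFEATURES_ALL);
    return cleared;
}

/*
 * Constructed XSAVE area modelling the dangerous case: the whole buffer still
 * holds old data, but the header says only the components in `bv` were
 * actually written by the last XSAVEOPT.
 */
uint8_t *stale_area(uint64_t bv)
{
    static uint8_t area[XSAVE_AREA_SIZE];
    memset(area, STALE_BYTE, sizeof area);
    memset(area + XSAVE_HDR_OFFSET, 0, 64);   /* header reserved bytes are 0 */
    xsave_set_xstate_bv(area, bv);
    return area;
}

/* Count bytes of the old pattern that would reach user space unsanitized. */
size_t count_stale_bytes(const uint8_t *area, size_t off, size_t len)
{
    size_t n = 0;
    for (size_t i = off; i < off + len; i++)
        n += (area[i] == STALE_BYTE);
    return n;
}

int main(void)
{
    uint8_t *area = stale_area(XFEATURE_MASK_FP);   /* only x87 was saved */
    size_t before = count_stale_bytes(area, FX_XMM_OFFSET, FX_XMM_SIZE);
    int n = sanitize_xstate(area);
    size_t after = count_stale_bytes(area, FX_XMM_OFFSET, FX_XMM_SIZE);
    printf("sanitized %d components; stale XMM bytes %zu -> %zu\n", n, before, after);
    return 0;
}
```

The x87 region is deliberately left untouched in the run above because its bit was set: it was genuinely written by the save instruction, so its contents belong to the current task. Only the components the hardware chose to skip are rewritten.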

TO DO: Understanding Long Mode & “Self-referenced page table”?

Linux kernel memory exploitation via PTE

An introduction to KProbes

This article answers these questions:
How does kprobe work?
How does jprobe work?
Where in the kernel source are kprobes and jprobes detected and handled?
What hardware mechanisms are used for probing?

How to use qemu for setting up VM client?

How do you use QEMU to run a VM client, assuming the host kernel has KVM enabled and running?

a. Create a rootfs image as your OS file image, with all the general GNU/Linux utilities:

This is how I create the rootfs for Xenial (I copied and modified it from the Syzkaller project), mainly using the debootstrap command; for a CentOS rootfs, perhaps you can try:


And here is the script for creating Xenial-based rootfs using debootstrap:

# Copyright 2016 syzkaller project authors. All rights reserved.
# Use of this source code is governed by Apache 2 LICENSE that can be found in the LICENSE file.

# creates a minimal Debian-xenial Linux image suitable for syzkaller.

set -eux

# Create a minimal Debian-xenial distributive as a directory.
sudo rm -rf xenial
mkdir -p xenial
sudo debootstrap --include=openssh-server xenial xenial

# Set some defaults and enable promptless ssh to the machine for root.
sudo sed -i '/^root/ { s/:x:/::/ }' xenial/etc/passwd
echo 'V0:23:respawn:/sbin/getty 115200 hvc0' | sudo tee -a xenial/etc/inittab
printf '\nauto eth0\niface eth0 inet dhcp\n' | sudo tee -a xenial/etc/network/interfaces
echo 'debugfs /sys/kernel/debug debugfs defaults 0 0' | sudo tee -a xenial/etc/fstab
echo 'debug.exception-trace = 0' | sudo tee -a xenial/etc/sysctl.conf
sudo mkdir xenial/root/.ssh/
mkdir -p ssh
ssh-keygen -f ssh/id_rsa -t rsa -N ''
# install the public half of the key generated above as root's authorized key
cat ssh/ | sudo tee xenial/root/.ssh/authorized_keys

# Install some misc packages.
sudo chroot xenial /bin/bash -c "export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; \
apt-get update; apt-get install --yes curl tar time strace"

# Build a disk image
dd if=/dev/zero of=xenial.img bs=5M seek=2047 count=1
mkfs.ext4 -F xenial.img
sudo mkdir -p /mnt/xenial
sudo mount -o loop xenial.img /mnt/xenial
sudo cp -a xenial/. /mnt/xenial/.
sudo mkdir -p /mnt/xenial/lib/modules/xxx/
sudo cp -a /lib/modules/xxx/. /mnt/xenial/lib/modules/xxx/.
sudo umount /mnt/xenial

b. Compile the Linux kernel; this will generate a few files: vmlinux, initrd and bzImage.

When compiling the kernel:

make will generate the vmlinux and bzImage files.

make install will generate the initramfs.img file.

make modules_install will generate the kernel modules in the /lib/modules/xxx directory, which is used above.

c. Boot it up with the correct options:

qemu-system-x86_64 -hda xenial.img -snapshot -m 2048 -net nic -net user,host=,hostfwd=tcp::53167-:22 -nographic -enable-kvm -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet -soundhw all -kernel /linux/arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug earlyprintk=serial slub_debug=UZ" -initrd /boot/initramfs.img

From the above we can see that the range of options is very large, which is why virt-manager is highly recommended: it provides an interface that generates the different options automatically:

Notice that vmlinux is not used above, but it is needed when KGDB debugging is wanted:
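Steps a to c above are usually run together: check the prerequisites, boot the image with the generated bzImage, and then log in over the forwarded port with the key the rootfs script created. The script below is an added example, not part of the original post and not syzkaller code, that wraps the qemu invocation above in exactly that sequence. The default paths (xenial.img, /linux/arch/x86/boot/bzImage, /boot/initramfs.img, ssh/id_rsa, port 53167) mirror the names used in the post and are assumptions; override them through the environment.

```bash
#!/usr/bin/env bash
# Added example (not from the original post, not syzkaller code): preflight
# check, boot and ssh login for the Xenial guest built above.
set -u

IMG=${IMG:-xenial.img}
KERNEL=${KERNEL:-/linux/arch/x86/boot/bzImage}
INITRD=${INITRD:-/boot/initramfs.img}
SSH_KEY=${SSH_KEY:-ssh/id_rsa}
SSH_PORT=${SSH_PORT:-53167}
QEMU_PID=

# Report every missing prerequisite for an -enable-kvm boot on stderr and
# return the number of problems found (0 means ready).
# Arguments: disk image, bzImage, private ssh key.
preflight() {
    local img=$1 kernel=$2 key=$3 problems=0
    [ -r "$img" ]    || { echo "disk image not found: $img" >&2; problems=$((problems + 1)); }
    [ -r "$kernel" ] || { echo "bzImage not found: $kernel" >&2; problems=$((problems + 1)); }
    [ -r "$key" ]    || { echo "ssh key not found: $key" >&2; problems=$((problems + 1)); }
    [ -w /dev/kvm ]  || { echo "/dev/kvm not writable: -enable-kvm will fail" >&2; problems=$((problems + 1)); }
    return "$problems"
}

# ssh into the guest through the user-mode network forward (host port -> 22),
# using the key the rootfs script installed in root's authorized_keys.
vm_ssh() {
    ssh -i "$SSH_KEY" -p "$SSH_PORT" \
        -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
        -o ConnectTimeout=5 root@127.0.0.1 "$@"
}

# Start qemu in the background with the core options from the post (the
# NUMA, usb and sound extras are left out); the serial console goes to qemu.log.
boot_vm() {
    qemu-system-x86_64 -hda "$IMG" -snapshot -m 2048 -smp 2 -enable-kvm \
        -net nic -net "user,hostfwd=tcp::${SSH_PORT}-:22" -nographic \
        -kernel "$KERNEL" -initrd "$INITRD" \
        -append "console=ttyS0 root=/dev/sda earlyprintk=serial slub_debug=UZ" \
        >qemu.log 2>&1 &
    QEMU_PID=$!
}

# Poll ssh itself rather than the TCP port: qemu accepts the host-side
# connection as soon as it starts, long before the guest's sshd is up.
# Argument: maximum seconds to wait.
wait_for_ssh() {
    local deadline=$1 waited=0
    until vm_ssh true >/dev/null 2>&1; do
        waited=$((waited + 5))
        [ "$waited" -lt "$deadline" ] || return 1
        sleep 5
    done
    return 0
}

main() {
    preflight "$IMG" "$KERNEL" "$SSH_KEY" || exit 1
    boot_vm
    if ! wait_for_ssh 180; then
        echo "guest did not come up; see qemu.log" >&2
        kill "$QEMU_PID"
        exit 1
    fi
    vm_ssh uname -a
}

# Only boot when invoked as "./boot-vm.sh boot"; sourcing the file just
# defines the functions.
if [ "${1:-}" = "boot" ]; then
    main
fi
```

Because `-snapshot` is passed, anything written inside the guest is discarded when qemu exits, which is what you want for crash-prone kernel debugging runs; drop it when the guest's changes should persist in xenial.img.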

A Primer on Memory Consistency and Cache Coherence (and other processor related ebooks)
