TPM2 and Linux (Intel NUC + TPM2 + Linux)

Recovery of a lost or deleted virtual machine .vmx configuration file

Rule No. 1 when using VMware: never put the VM on an NTFS-formatted drive shared between Windows and Ubuntu. Twice I have had the VMX file corrupted that way, and the entire VM was lost.

Keeping it entirely on a Linux-formatted drive (such as ext4) is much safer.

But luckily this time round I found this:

Using it, I managed to salvage the entire VM.

The .vmdk file to be added to the new virtual machine has to be chosen carefully: do not select one of the "redo log" files by mistake, as explained below:

Understanding FPU usage in linux kernel

I am interested in learning how the Linux kernel handles the FPU and SIMD registers (x87, MMX, and the XMM registers used by SSE, SSE2, etc.). They are a potential avenue for memory information leakage if their values are not properly initialized. On the other hand, this register state is so large that saving and restoring all of it on every context switch could seriously slow the kernel down.

To understand this, read this comment:

 * FPU context switching strategies:
 * Against popular belief, we don't do lazy FPU saves, due to the
 * task migration complications it brings on SMP - we only do
 * lazy FPU restores.
 * 'lazy' is the traditional strategy, which is based on setting
 * CR0::TS to 1 during context-switch (instead of doing a full
 * restore of the FPU state), which causes the first FPU instruction
 * after the context switch (whenever it is executed) to fault - at
 * which point we lazily restore the FPU state into FPU registers.
 * Tasks are of course under no obligation to execute FPU instructions,
 * so it can easily happen that another context-switch occurs without
 * a single FPU instruction being executed. If we eventually switch
 * back to the original task (that still owns the FPU) then we have
 * not only saved the restores along the way, but we also have the
 * FPU ready to be used for the original task.
 * 'lazy' is deprecated because it's almost never a performance win
 * and it's much more complicated than 'eager'.
 * 'eager' switching is by default on all CPUs, there we switch the FPU
 * state during every context switch, regardless of whether the task
 * has used FPU instructions in that time slice or not. This is done
 * because modern FPU context saving instructions are able to optimize
 * state saving and restoration in hardware: they can detect both
 * unused and untouched FPU state and optimize accordingly.


In summary:

a. LAZY mode: the FPU state is not restored on every context switch. Instead, CR0.TS is set during the switch, so the first FPU instruction the task executes afterwards faults; only then is the FPU state restored and TS cleared. The kernel therefore does not have to track FPU usage itself. This mode is no longer the default, because it is almost never a performance win and its logic is much more complicated than eager switching.

b. EAGER mode: this is the default. The FPU state is saved and restored on every context switch, whether or not the task used the FPU in that time slice. The cost is kept down by hardware: the optimized save instructions (such as XSAVEOPT) can detect state components that are unused (still in their initial state) or unmodified since they were last restored, and skip writing them out, so only the parts actually in use are saved.

In the kernel, arch/x86/:


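has_fpu() detects whether an FPU is present at all (rather than whether it is in use). Before probing, it checks CR0 for the EM and TS bits and clears them if either is set, so that the probe's FPU instructions do not fault: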

if (cr0 & (X86_CR0_EM|X86_CR0_TS)) {

which is called by get_cpuflags():

void get_cpuflags(void)
...
        if (has_fpu())
                set_bit(X86_FEATURE_FPU, cpu.flags);

The following is a 2015 series of 208 patches to the kernel's FPU code:

The instructions that save the FPU state are FSAVE/FNSAVE (x87 state, which includes the MMX registers) and FXSAVE (x87, MMX and SSE/XMM state):

The overhead in the Linux kernel was benchmarked at 87 cycles.

These optimized ways of saving state are also described in the following kernel comment:

 * When executing XSAVEOPT (or other optimized XSAVE instructions), if
 * a processor implementation detects that an FPU state component is still
 * (or is again) in its initialized state, it may clear the corresponding
 * bit in the header.xfeatures field, and can skip the writeout of registers
 * to the corresponding memory layout.
 * This means that when the bit is zero, the state component might still contain
 * some previous - non-initialized register state.

To see when the kernel acts on FPU state, we can set a breakpoint on fpstate_sanitize_xstate() in KGDB; the resulting kernel stack trace is as follows:

Thread 441 hit Breakpoint 1, fpstate_sanitize_xstate (fpu=0xffff8801e7a2ea80) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/xstate.c:111
111 {
#0  fpstate_sanitize_xstate (fpu=0xffff8801e7a2ea80) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/xstate.c:111
#1  0xffffffff8103b183 in copy_fpstate_to_sigframe (buf=0xffff8801e7a2ea80, buf_fx=0x7f73ad4fe3c0, size=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/signal.c:178
#2  0xffffffff8102e207 in get_sigframe (frame_size=440, fpstate=0xffff880034dcbe10, regs=<optimized out>, ka=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:247
#3  0xffffffff8102e703 in __setup_rt_frame (regs=<optimized out>, set=<optimized out>, ksig=<optimized out>, sig=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:413
#4  setup_rt_frame (regs=<optimized out>, ksig=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:627
#5  handle_signal (regs=<optimized out>, ksig=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:671
#6  do_signal (regs=0xffff880034dcbf58) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:714
#7  0xffffffff8100320c in exit_to_usermode_loop (regs=0xffff880034dcbf58, cached_flags=4) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/entry/common.c:248
#8  0xffffffff81003c6e in prepare_exit_to_usermode (regs=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/entry/common.c:283

You can use “info thread 441” to check which process the stack trace above belongs to. Among the processes that hit this breakpoint is “Xorg”; most other processes do not appear to use the FPU.

From the stack trace, “get_sigframe()” is the first function that appears to examine FPU usage:

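static void __user *
get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
             void __user **fpstate)
{
        /* ... sp = regs->sp, fpu = &current->thread.fpu, etc. ... */
        if (fpu->fpstate_active) {      /* task has an FPU context */
                unsigned long fx_aligned, math_size;

                /* reserve space for the FPU state below the user stack pointer */
                sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
                *fpstate = (struct _fpstate_32 __user *) sp;
                /* copy the FPU registers into that user-stack area */
                if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
                                    math_size) < 0)
                        return (void __user *) -1L;
        }
        /* ... */
}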

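So essentially, this copies the task's FPU state onto the user-space stack, into an area reserved just below the stack pointer “sp”, so that the interrupted FPU state can be restored when the signal handler returns (via sigreturn).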

TO DO: Understanding Long Mode & “Self-referenced page table”?

Learning the core_pattern in the Linux kernel

The Core Pattern (core_pattern), or how to specify filename and path for core dumps | SIGQUIT

Operating Systems: File-System Implementation

A very good write-up on file-system implementation internals:


Figure 12.11 – I/O without a unified buffer cache.


Figure 12.12 – I/O using a unified buffer cache.

  • Page replacement strategies can be complicated with a unified cache, as one needs to decide whether to replace process or file pages, and how many pages to guarantee to each category of pages. Solaris, for example, has gone through many variations, resulting in priority paging giving process pages priority over file I/O pages, and setting limits so that neither can knock the other completely out of memory.
  • Another issue affecting performance is the question of whether to implement synchronous writes or asynchronous writes. Synchronous writes occur in the order in which the disk subsystem receives them, without caching; asynchronous writes are cached, allowing the disk subsystem to schedule writes in a more efficient order (see Chapter 12). Metadata writes are often done synchronously. Some systems support flags to the open call requiring that writes be synchronous, for example for the benefit of database systems that require their writes be performed in a required order.
  • The type of file access can also have an impact on optimal page replacement policies. For example, LRU is not necessarily a good policy for sequential access files. For these types of files progression normally goes in a forward direction only, and the most recently used page will not be needed again until after the file has been rewound and re-read from the beginning (if it is ever needed at all). On the other hand, we can expect to need the next page in the file fairly soon. For this reason sequential access files often take advantage of two special policies:
    • Free-behind frees up a page as soon as the next page in the file is requested, with the assumption that we are now done with the old page and won’t need it again for a long time.
    • Read-ahead reads the requested page and several subsequent pages at the same time, with the assumption that those pages will be needed in the near future. This is similar to the track caching that is already performed by the disk controller, except it saves the future latency of transferring data from the disk controller memory into motherboard main memory.
  • The caching system and asynchronous writes speed up disk writes considerably, because the disk subsystem can schedule physical writes to the disk to minimize head movement and disk seek times (see Chapter 12). Reads, on the other hand, must be done more synchronously in spite of the caching system, with the result that disk writes can counter-intuitively be much faster on average than disk reads.

Linux kernel memory exploitation via PTE
