How to set up libvirt and virtio with a custom kernel

While setting up a custom kernel on Ubuntu 14.04 LTS 64-bit so that the “virsh” / libvirt tools can run, I encountered many problems (see the references at the end of this section).

First, install all the essential tools the easy way on Ubuntu:

https://help.ubuntu.com/lts/serverguide/libvirt.html

sudo apt-get install qemu-kvm libvirt-bin
sudo apt-get install virtinst
sudo apt-get install qemu-system
sudo apt-get install virt-viewer

Next a custom kernel is needed, so download a stable kernel source from www.kernel.org. Copy the existing config of the running Ubuntu 14.04 kernel to “.config”, and make the following additional changes:

CONFIG_NETFILTER_XT_NAT=m
CONFIG_NF_NAT_MASQUERADE_IPV4=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_NF_NAT_MASQUERADE_IPV6=m

and then apply “make oldconfig”, followed by “make” to start the compilation. After that, run “sudo make modules_install” and “sudo make install”.
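For reference, the whole sequence looks roughly like this (a consolidated sketch; the linux-4.x directory name is just a placeholder for wherever the source was unpacked):

cd linux-4.x                      # placeholder for the unpacked kernel source
cp /boot/config-$(uname -r) .config
# edit .config to add the CONFIG_* options listed above
make oldconfig
make -j$(nproc)
sudo make modules_install
sudo make install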

Reboot into the new kernel, ensure that “nat” shows up in /proc/net/ip_tables_names, and ensure that “kvm” is listed in the output of “lsmod”.
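A quick check, for example (assuming the nat and kvm modules have been built and loaded):

grep nat /proc/net/ip_tables_names
lsmod | grep kvm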

So to test the setup:

a. sudo virsh net-start default

b. sudo virsh list

c. sudo virsh sysinfo

d. sudo virsh pool-list

e. sudo virsh net-list --all

f. To create a guest VM, first create a file called “guest.xml”:

<domain type='kvm'>
 <name>guest</name>
 <uuid>f5fe9230-6ef3-4eec-af54-65363a68f3ce</uuid>
 <memory>524288</memory>
<currentMemory>524288</currentMemory>
<vcpu>1</vcpu>
<os>
 <type arch='x86_64' machine='pc-i440fx-1.5-qemu-kvm'>hvm</type>
 <boot dev='cdrom'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<clock offset='localtime'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='file' device='disk'>
 <source file='/var/lib/libvirt/images/guest.img'/>
 <target dev='hda' bus='ide'/>
</disk>
<disk type='file' device='cdrom'>
 <source file='/home/user/ubuntu1404_x86_64/ubuntu-14.04-desktop-amd64.iso'/>
 <target dev='hdc' bus='ide'/>
<readonly/>
</disk>
<interface type='network'>
<mac address='54:52:00:2a:58:0d'/>
<source network='default'/>
</interface>
<input type='mouse' bus='ps2'/>
<graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
</devices>
</domain>

For each of the highlighted items above (UUID, machine type, disk image and CDROM ISO):

1. The UUID is generated by “uuidgen”.

2. The machine type must be one of the items listed by:

/usr/bin/qemu-system-x86_64 --machine ?

Choosing the wrong machine type may result in “tcg” being used as the QEMU emulation mode instead of “kvm” (which is based on hardware virtualization and thus much faster).

So beware if you find the emulation abnormally slow: run “ps -ef” and ensure that “accel=kvm” is displayed instead of “accel=tcg”.
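A quick way to check (assuming the guest was started through libvirt as described above):

ps -ef | grep qemu-system | grep -o 'accel=[a-z]*'

which should print “accel=kvm”.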

3.    Create the guest image beforehand:

sudo qemu-img create -f qcow2 /var/lib/libvirt/images/guest.img 8G

4. The CDROM ISO is just the downloaded Ubuntu installation ISO.

Now issue “sudo virsh define guest.xml”. (What if you hate XML files, or are not sure how to write a correct XML file for your version of libvirt, which might differ from the one here? See the virt-install section below.)

g. Run “sudo virsh list --all” to ensure that the “guest” domain is listed, then run “sudo virsh start guest” to start the guest booting from the CDROM.

h. After the VM is running, use “ps -ef” to list the QEMU process:

libvirt+ 4737 1 5 11:44 ? 00:02:04 qemu-system-x86_64 -enable-kvm -name guest -S -machine pc-i440fx-1.5-qemu-kvm,accel=kvm,usb=off -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid f5fe9230-6ef3-4eec-af54-65363a68f3ce -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/guest.img,if=none,id=drive-ide0-0-0,format=raw -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/home/user/ubuntu1404_x86_64/ubuntu-14.04-desktop-amd64.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=54:52:00:2a:58:0d,bus=pci.0,addr=0x3 -vnc 127.0.0.1:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

Notice how the complexity of the qemu command line is now handled for us simply by entering the correct parameters in the “guest.xml” file.

i. Finally, to connect to the running VM, you can either use “sudo virt-viewer <vm_name>”, where <vm_name> is the name of the guest (“guest” in our case), or “sudo virt-viewer -c qemu:///system guest”.

j. Use “sudo virsh shutdown guest” to shut down the running VM.

k. Use “sudo virsh destroy guest” to forcibly stop (destroy) the running VM.

How to set up virtio based on the libvirt infrastructure?

An architectural visualization of virtio and its use in a VM guest setup can be found here (http://www.openstack.cn/?p=580):

In summary, the diagram essentially means that some of the I/O processing can pass directly from the host into the guest without going through the QEMU event loop (which uses the /dev/kvm interface inside the host), as shown in another diagram here (http://slides.com/braoru/kvm/fullscreen#/).

By default the following should be enabled before compiling the custom kernel:

CONFIG_VHOST_NET=m
CONFIG_VHOST_SCSI=m
CONFIG_VHOST_RING=m
CONFIG_VHOST=m

After compiling and rebooting into the new kernel:

a. “modprobe -c | grep vhost” to search for all the vhost-related kernel modules (which are essentially just vhost, vhost_scsi and vhost_net).

b. “modprobe vhost”, “modprobe vhost_scsi” and “modprobe vhost_net” to load the kernel modules.

c. Use “sudo virt-host-validate” to check the kernel setup:
QEMU: Checking for hardware virtualization                                 : PASS
QEMU: Checking for device /dev/kvm                                         : PASS
QEMU: Checking for device /dev/vhost-net                                   : PASS
QEMU: Checking for device /dev/net/tun                                     : PASS
LXC: Checking for Linux >= 2.6.26                                         : PASS

And to set up virtio direct I/O between the host and the VM guest, you can follow the steps here:

https://easyengine.io/tutorials/kvm/enable-virtio-existing-vms/
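As a rough sketch (not taken from that tutorial; the qcow2 driver type is an assumption based on the image created earlier), switching the guest.xml devices over to virtio means changing the disk bus and the NIC model, something like:

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/guest.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>

With vhost_net loaded, libvirt/QEMU will normally use the in-kernel vhost backend for the virtio NIC automatically.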

How to set up the guest if no XML is given:

This is made possible by the “virtinst” package installed earlier.

And the command is:

sudo virt-install --virt-type qemu --arch x86_64 --machine 'pc-i440fx-2.0' --debug --name guest --ram 1024 --disk path=/var/lib/libvirt/images/guest.qcow2 --cdrom /home/user/ubuntu1404_x86_64/ubuntu-14.04-desktop-amd64.iso --boot cdrom

Again the machine type must come from one of those listed in:

/usr/bin/qemu-system-x86_64 --machine ?

And now “sudo virsh dumpxml guest” extracts the generated XML.
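For example, to save it for later editing and re-use:

sudo virsh dumpxml guest > guest.xml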

References:

How libvirt works internally: https://libvirt.org/internals.html

http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html

http://www.linux-kvm.org/images/4/41/2011-forum-virtio_net_whatsnew.pdf

http://www.ibm.com/developerworks/library/l-virtio/l-virtio-pdf.pdf
Diagrams of the basic and more complex vhost/virtio setups (virtio_linux_vhost and vhost_net_arch1) can be found here: http://dpdk.org/doc/guides/sample_app_ug/vhost.html

Architectural internals of virtio:

https://jipanyang.wordpress.com/2014/10/27/virtio-guest-side-implementation-pci-virtio-device-virtio-net-and-virtqueue/

Common setup problems:
https://www.digitalocean.com/community/questions/problem-with-iptables

https://forums.gentoo.org/viewtopic-t-1009770.html?sid=7822e8eefcdb28edcedf9db7526b7b1e

http://stackoverflow.com/questions/21983554/iptables-v1-4-14-cant-initialize-iptables-table-nat-table-does-not-exist-d

http://serverfault.com/questions/593263/iptables-nat-does-not-exist/593289

Intel Processor Trace: How to use it

Note that “Processor Trace” is a feature of recent Intel x86 processors (e.g. Skylake):

https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing

First clone this:

https://github.com/01org/processor-trace

And do a “sudo make install” to install the libipt.so libraries.

Next do a clone of Andi Kleen pt tracing tool:

git clone https://github.com/andikleen/simple-pt.git

Doing a “make install” builds the kernel module, after which “sudo insmod simple-pt.ko” loads it.

Next, “make user” builds all the relevant userspace binaries (see the Makefile; essentially sptdump, fastdecode, sptdecode and ptfeature).

During compilation you may encounter errors about missing files; the following are the packages I needed to install before “make user” succeeded (your mileage may differ):

sudo apt-get install libelf-dev
sudo apt-get install libdw-dev dwarfdump
sudo apt-get install libdwarf-dev

There are other binaries like “rdmsr” which the tester program depends on as well.

Next, run tester as root (“sudo ./tester”); the output is listed here:

http://pastebin.com/d3QeVNsV

And looking further into “stest.out”:

0 [+0] [+   1] native_write_msr_safe+18
 [+  31] trace_event_raw_event_msr+116 -> trace_event_buffer_reserve
 [+  21] trace_event_buffer_reserve+143 -> trace_event_buffer_lock_reserve
 [+  23] trace_event_buffer_lock_reserve+67 -> trace_buffer_lock_reserve
 [+  13] trace_buffer_lock_reserve+33 -> ring_buffer_lock_reserve
 [+  41] ring_buffer_lock_reserve+184 -> rb_reserve_next_event
 [+  36] rb_reserve_next_event+180 -> trace_clock_local
 [+   5] trace_clock_local+20 -> sched_clock
 [+   4] sched_clock+12 -> native_sched_clock
 [+   7] trace_buffer_lock_reserve+65 -> ring_buffer_event_data
 [+   4] ring_buffer_event_data+12 -> rb_event_data
 [+   6] trace_buffer_lock_reserve+90 -> tracing_generic_entry_update
 [+   7] trace_event_buffer_reserve+176 -> ring_buffer_event_data
 [+   4] ring_buffer_event_data+12 -> rb_event_data
 [+  11] trace_event_raw_event_msr+164 -> trace_event_buffer_commit
 [+  33] trace_event_buffer_commit+124 -> filter_check_discard
 [+  11] trace_event_buffer_commit+226 -> trace_buffer_unlock_commit
 [+  15] trace_buffer_unlock_commit+45 -> ring_buffer_unlock_commit
 [+  16] ring_buffer_unlock_commit+54 -> rb_update_write_stamp
 [+   7] trace_buffer_unlock_commit+115 -> ftrace_trace_userstack

It is generated by the sptdecode command:

sptdecode --sideband ${PREFIX}.sideband --pt ${PREFIX}.0 $DARGS > ${PREFIX}.out

The function “trace_event_raw_event_msr” is nowhere to be found inside simple-pt.c, nor among the kernel symbols. But “trace_event_buffer_reserve” is a kernel symbol (sudo cat /proc/kallsyms | grep trace_event_buffer_reserve).

So now we shall disassemble the APIs shown in stest.out from the live running kernel (everything is read-only, so it is safe and no modification is possible, but you will need root):

To see the APIs live in action, first identify the “vmlinux” from which the kernel was built. Mine was built by myself, so do this (sudo is needed because /proc/kcore is readable only by root):

sudo gdb ./vmlinux /proc/kcore ==> gdb prompt.

Next, identify the source directory of the simple-pt.ko kernel module and its load address in memory:

sudo cat /proc/modules |grep simple
 simple_pt 61440 0 - Live 0xffffffffa1021000 (OE)
 

And so enter the following at the gdb prompt:

 add-symbol-file /home/tteikhua/simple-pt/simple-pt.ko 0xffffffffa1021000
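(As an aside, a possibly more precise address for the module’s code is the .text section address from sysfs, which can be passed to add-symbol-file in place of the /proc/modules base address; the path below assumes the module is loaded as simple_pt.)

sudo cat /sys/module/simple_pt/sections/.text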

Now you can disassemble by symbol name:

(gdb) x /10i trace_event_raw_event_msr
 0xffffffffa1022c70 <trace_event_raw_event_msr>: push %rbp
 0xffffffffa1022c71 <trace_event_raw_event_msr+1>: mov %rsp,%rbp
 0xffffffffa1022c74 <trace_event_raw_event_msr+4>: push %r15
 0xffffffffa1022c76 <trace_event_raw_event_msr+6>: push %r14
 0xffffffffa1022c78 <trace_event_raw_event_msr+8>: push %r13
<snip>
 0xffffffffa1022ccd <trace_event_raw_event_msr+93>: lea -0x58(%rbp),%rdi
 0xffffffffa1022cd1 <trace_event_raw_event_msr+97>: mov $0x20,%edx
 0xffffffffa1022cd6 <trace_event_raw_event_msr+102>: mov %r12,%rsi
 0xffffffffa1022cd9 <trace_event_raw_event_msr+105>: addq $0x1,0x7d27(%rip) # 0xffffffffa102aa08
 0xffffffffa1022ce1 <trace_event_raw_event_msr+113>: mov %ecx,-0x5c(%rbp)
 0xffffffffa1022ce4 <trace_event_raw_event_msr+116>: callq 0xffffffff81205060 <trace_event_buffer_reserve>
 0xffffffffa1022ce9 <trace_event_raw_event_msr+121>: addq $0x1,0x7d1f(%rip) # 0xffffffffa102aa10

From the “stest.out” output above, we can also see that each line of output corresponds to a “basic block” (https://en.wikipedia.org/wiki/Basic_block) in the assembly listing.

References:

https://lwn.net/Articles/576551/

https://lwn.net/Articles/584539/

https://lwn.net/Articles/654705/

https://lwn.net/Articles/648154/

How to create Xen HVM domU guest in Ubuntu 14.04 (not with xenbr0 but with legacy bridge)

I already have many network bridges in my system arising from legacy setups: docker0, br0, virbr0, and the last thing I want is to create yet another one, xenbr0. (Just run “ls -al /sys/class/net/*/* | grep ‘bridge:’” to identify all the bridges.)

So how to setup the bridge networking for a HVM Xen domU guest using one of these bridges without xenbr0?

First, install Ubuntu 14.04 64-bit LTS. Next is to install the Xen hypervisor (see https://help.ubuntu.com/community/Xen):

sudo apt-get install xen-hypervisor-amd64

And after rebooting into Xen hypervisor, the dom0 should be running Ubuntu.   Open up a terminal and enter “sudo xl list” to confirm that dom0 is running.

Next is the bridge setup:

Inside /etc/xen/xl.conf, enter the name of the bridge to be used:

vif.default.bridge="virbr0"

and inside /etc/xen/xend-config.sxp enter the following:

(network-script 'network-bridge bridge=virbr0')

Then add the physical interface to the bridge and check the bridging:

sudo brctl addif virbr0 eth0

sudo brctl show

bridge name bridge id STP enabled interfaces
docker0 8000.33333 no
virbr0 8000.444444 yes eth0

And so now virbr0 is the bridge to be used by xen.

Do a “sudo service xen restart” and “sudo service xendomains restart” to restart all the services using the modified configuration.

Next, create an HVM domU running CentOS.

Copy the /etc/xen/xlexample.hvm sample config file (for HVM) into your home directory.

Inside this file, modify the name; memory must be at least 512 (MB), otherwise installation will hang (this happened to me twice on two different machines). “vcpus” can be set to 1 to minimize contention with the host. For the VGA console I prefer “sdl=1”. For the disk:

disk = [ 'file:/sda11/xenhvm.dd,xvda,rw', 'file:/sda11/CentOS-7-x86_64-DVD-1511.iso,hdc:cdrom,r' ]

where the file /sda11/xenhvm.dd is created via:

dd if=/dev/zero of=/sda11/xenhvm.dd bs=1M count=8192

which is 8 GB of disk storage space. CentOS needs at least about 1.5 GB, and 8 GB seemed to avoid the latency and slowness issues at runtime.
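If you would rather not write 8 GB of zeros up front, a sparse file of the same size can be created instead (just an alternative sketch, not what was used above):

truncate -s 8G /sda11/xenhvm.dd
# or, equivalently, with dd writing no data blocks:
dd if=/dev/zero of=/sda11/xenhvm.dd bs=1M count=0 seek=8192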

But if you want to pass an entire disk media directly to the domU, then specify it as:

'phy:/dev/sda,xvda,w' and inside the guest OS the host’s /dev/sda will be accessed as /dev/xvda instead.

In summary, the following is the configuration file (xenhvmguest.cfg) after modification:

 ###vif = [ 'mac=00:16:3e:00:00:00,bridge=xenbr0' ]
# =====================================================================
 # Example HVM guest configuration
 # =====================================================================
 #
 # This is a fairly minimal example of what is required for an
 # HVM guest. For a more complete guide see xl.cfg(5)
# This configures an HVM rather than PV guest
 builder = "hvm"
# Guest name
 name = 'HVM_domU'
# 128-bit UUID for the domain as a hexadecimal number.
 # Use "uuidgen" to generate one if required.
 # The default behavior is to generate a new UUID each time the guest is started.
 #uuid = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
# Enable Microsoft Hyper-V compatibile paravirtualisation /
 # enlightenment interfaces. Turning this on can improve Windows guest
 # performance and is therefore recommended
 #viridian = 1
# Initial memory allocation (MB)
 memory = 512
# Maximum memory (MB)
 # If this is greater than `memory' then the slack will start ballooned
 # (this assumes guest kernel support for ballooning)
 #maxmem = 512
# Number of VCPUS
 vcpus = 1
# Network devices
 # A list of 'vifspec' entries as described in
 # docs/misc/xl-network-configuration.markdown
 vif = [ '' ]
# Disk Devices
 # A list of `diskspec' entries as described in
 # docs/misc/xl-disk-configuration.txt
 ## disk = [ '/dev/vg/guest-volume,raw,xvda,rw' ]
 disk = [ 'file:/sda11/xenhvm.dd,xvda,rw', 'file:/sda11/CentOS-7-x86_64-DVD-1511.iso,hdc:cdrom,r' ]
# Guest VGA console configuration, either SDL or VNC
 sdl = 1
 #vnc = 1
 #vnclisten = '0.0.0.0'
 #vncdisplay = 1

Now create the domU using the above configuration file (xenhvmguest.cfg):

sudo xl list
sudo xl shutdown HVM_domU
sudo xl destroy HVM_domU
sudo xl create xenhvmguest.cfg
sudo xl list

Questions:

a. How to generate traces of calls from the guest to the host kernel?

b. How to do USB passthrough? PCI passthrough? SCSI passthrough?

How to use ftrace and gcov to understand kernel scheduling?

The following describes the steps for an FTRACE analysis.

First we need to mount debugfs for analysis:
sudo mkdir /debug
sudo mount -t debugfs nodev /debug
echo 0 >/debug/tracing/tracing_on
echo '*rt_mutex*' '*sched*' > /debug/tracing/set_ftrace_filter

You can experiment with this yourself: try putting different kernel API names (found in /proc/kallsyms) between the “*” wildcards.

And read it back with “cat /debug/tracing/set_ftrace_filter”.

echo function_graph >/debug/tracing/current_tracer
echo 1 >/debug/tracing/tracing_on

ls -alR /btrfs
cp /bin/l* /btrfs
sleep 3

This is where you can issue commands that make a lot of calls into the task-switching logic, e.g. a repeated “sleep 1; ls /proc/”.

Notice that procfs (“/proc”) is used because procfs does not read from a cache; an actual traversal of kernel data structures is needed to generate all the displayed data, which triggers a lot of kernel code paths.

echo 0 >/debug/tracing/tracing_on ===> to stop tracing.
cat /debug/tracing/trace ==> to view the tracing output.
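Putting the whole session together (a consolidated sketch of the commands above, run as root; the workload in the middle is just an example):

mkdir -p /debug
mount -t debugfs nodev /debug
echo 0 > /debug/tracing/tracing_on
echo '*rt_mutex*' '*sched*' > /debug/tracing/set_ftrace_filter
echo function_graph > /debug/tracing/current_tracer
echo 1 > /debug/tracing/tracing_on
for i in 1 2 3; do sleep 1; ls /proc > /dev/null; done   # example workload
echo 0 > /debug/tracing/tracing_on
cat /debug/tracing/trace > ftrace_sched.out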

Whether you have stopped tracing or not does not matter; if you have not, the tracing output will simply be different each time you “cat trace”.

 3)   0.371 us    |  _cond_resched();
 3)   0.278 us    |  resched_curr();
 3)   0.132 us    |  _cond_resched();
 3)               |  schedule_timeout() {
 3)               |    schedule() {
 3)   0.175 us    |      rcu_sched_qs();
 ------------------------------------------
 3)  btrfs-t-873   =>    <idle>-0
 ------------------------------------------
 3) + 22.614 us   |    }
 3) + 24.023 us   |  } /* schedule_preempt_disabled */
 3)   3.136 us    |  tick_nohz_stop_sched_tick();
 3)   0.150 us    |  sched_idle_set_state();
 0)   0.284 us    |  sched_idle_set_state();
 0)               |  tick_nohz_restart_sched_tick() {
 0)   0.267 us    |    sched_avg_update();
 0)   6.895 us    |  }
 0)   0.282 us    |  sched_ttwu_pending();
 0)               |  schedule_preempt_disabled() {
 0)               |    schedule() {
 0)   0.222 us    |      rcu_sched_qs();

The number in front represents the CPU core number, and thus the indentation does not look very logical, as output from different cores is interleaved. Moreover, only a very small subset of the APIs has been instrumented.

Let us instrument all the APIs available to us:

echo '*' > /sys/kernel/debug/tracing/set_ftrace_filter

And now filter the output to leave behind only the functions executing on CPU core 0 of my machine (an 8-core i7), also called the “boot CPU” as it is the first core started at system bootup time:

0)               |  call_cpuidle() {
 0)               |    cpuidle_enter() {
 0)               |      cpuidle_enter_state() {
 0)   0.058 us    |        sched_idle_set_state();
 0)   0.093 us    |        ktime_get();
 0)               |        intel_idle() {
 0)   0.455 us    |          leave_mm();
 0) # 5151.052 us |        }
 0)   0.223 us    |        ktime_get();
 0)   0.266 us    |        sched_idle_set_state();
 0)               |        smp_reschedule_interrupt() {
 0)               |          scheduler_ipi() {
 0)               |            irq_enter() {
 0)               |              rcu_irq_enter() {
 0)   0.226 us    |                rcu_sysidle_exit();
 0)   2.197 us    |              }
 0)               |              tick_irq_enter() {
 0)   0.058 us    |                tick_check_oneshot_broadcast_this_cpu();
 0)   0.195 us    |                ktime_get();
 0)               |                tick_nohz_stop_idle() {
 0)               |                  update_ts_time_stats() {
 0)   0.238 us    |                    nr_iowait_cpu();
 0)   2.658 us    |                  }
 0)   0.174 us    |                  touch_softlockup_watchdog();
 0)   7.120 us    |                }
 0)   0.114 us    |                tick_do_update_jiffies64();
 0)   0.186 us    |                touch_softlockup_watchdog();
 0) + 17.748 us   |              }
 0)               |              _local_bh_enable() {
 0)   0.172 us    |                __local_bh_enable();
 0)   2.071 us    |              }
 0) + 27.292 us   |            }
 0)   0.157 us    |            sched_ttwu_pending();
 0)               |            irq_exit() {
 0)               |              __do_softirq() {
 0)               |                run_rebalance_domains() {
 0)   0.122 us    |                  idle_cpu();
 0)   0.137 us    |                  _raw_spin_lock_irq();
 0)   0.181 us    |                  update_rq_clock();
 0)               |                  rebalance_domains() {
 0)               |                    update_blocked_averages() {
 0)   0.127 us    |                      _raw_spin_lock_irqsave();
 0)   0.319 us    |                      update_rq_clock();
 0)   0.166 us    |                      __compute_runnable_contrib();
 0)   0.137 us    |                      __compute_runnable_contrib();
 0)   0.107 us    |                      __compute_runnable_contrib();
 0)   0.041 us    |                      __compute_runnable_contrib();
 0)   0.086 us    |                      __compute_runnable_contrib();

I have not yet fully understood the internals of scheduling, so I will not make many comments here.

But it is safe to say that in any scheduling logic you need:

a. CPU switching logic: being able to dequeue a task from one CPU and enqueue it on another CPU.

b. Load balancing logic: every CPU has its own run queue, running largely independently of the other CPUs, so sometimes the queue lengths have to be examined to rebalance the queues.

c. IPI (inter-processor interrupt): the hardware signal one CPU uses to interrupt another and get its attention when it has something to communicate. It may be a one-to-one cross-switching bus or a broadcast mechanism.

d. Spinlocks: every CPU has its own independent timer, interrupt tables and so on, but memory is shared (SMP architecture), so read/write operations on shared data have to be protected by spinlocks to avoid collisions.

e. Process accounting: each slice of execution of a process has to update the resource-usage information for that process.
And so below are the snapshots of events happening on CPU 0:
24726  0)               |          schedule() {
 24727  0)               |            rcu_note_context_switch() {
 24728  0)   0.064 us    |              rcu_sched_qs();
 24729  0)   2.628 us    |            }
 24730  0)   0.219 us    |            _raw_spin_lock_irq();
 24731  0)               |            deactivate_task() {
 24732  0)   0.235 us    |              update_rq_clock();
 24733  0)               |              dequeue_task_fair() {
 24734  0)               |                dequeue_entity() {
 24735  0)               |                  update_curr() {
 24736  0)   0.203 us    |                    update_min_vruntime();
 24737  0)   0.307 us    |                    cpuacct_charge();
 24738  0)   6.199 us    |                  }
 24739  0)   0.187 us    |                  __compute_runnable_contrib();
 24740  0)   0.157 us    |                  clear_buddies();
 24741  0)   0.284 us    |                  account_entity_dequeue();
 24742  0)   0.165 us    |                  update_min_vruntime();
 24743  0)               |                  update_cfs_shares() {
 24744  0)               |                    update_curr() {
 24745  0)   0.122 us    |                      __calc_delta();
 24746  0)   0.332 us    |                      update_min_vruntime();
 24747  0)   5.724 us    |                    }
 24748  0)   0.336 us    |                    account_entity_dequeue();
 24749  0)   0.110 us    |                    account_entity_enqueue();
 24750  0) + 13.877 us   |                  }
 24751  0) + 37.606 us   |                }
 24752  0)               |                dequeue_entity() {
 24753  0)   0.202 us    |                  update_curr();
 24754  0)   0.089 us    |                  clear_buddies();
 24755  0)   0.092 us    |                  account_entity_dequeue();
 24756  0)   0.054 us    |                  update_min_vruntime();
 24757  0)   0.289 us    |                  update_cfs_shares();
 24758  0) + 14.732 us   |                }
 24759  0)   0.168 us    |                hrtick_update();
 24760  0) + 59.122 us   |              }
 24761  0) + 65.030 us   |            }
 24762  0)               |            pick_next_task_fair() {
 24763  0)   0.067 us    |              __msecs_to_jiffies();

And below shows a context switch from Xorg to the idle task:

24775  0)   Xorg-2305    =>    <idle>-0
 24776  0)   0.119 us    |      __switch_to_xtra();
 24777  0)   0.394 us    |      finish_task_switch();
 24778  0) ! 220.623 us  |    } /* schedule */
 24779  0) ! 223.059 us  |  } /* schedule_preempt_disabled */
 24780  0)               |  tick_nohz_idle_enter() {
 24781  0)   0.072 us    |    set_cpu_sd_state_idle();
 24782  0)               |    __tick_nohz_idle_enter() {
 24783  0)   0.150 us    |      ktime_get();
 24784  0)               |      tick_nohz_stop_sched_tick() {
 24785  0)   0.151 us    |        rcu_needs_cpu();
 24786  0)               |        get_next_timer_interrupt() {
 24787  0)   0.160 us    |          _raw_spin_lock();
 24788  0)               |          hrtimer_get_next_event() {
 24789  0)   0.305 us    |            _raw_spin_lock_irqsave();
 24790  0)   0.259 us    |            _raw_spin_unlock_irqrestore();
 24791  0)   6.779 us    |          }
 24792  0) + 11.664 us   |        }
 24793  0) + 17.438 us   |      }
 24794  0) + 22.068 us   |    }
 24795  0) + 27.167 us   |  }

And here is the logic for picking the next task during idling mode:

4765   0)               |            pick_next_task_idle() {
 24766  0)               |              put_prev_task_fair() {
 24767  0)               |                put_prev_entity() {
 24768  0)   0.208 us    |                  check_cfs_rq_runtime();
 24769  0)   3.249 us    |                }
 24770  0)               |                put_prev_entity() {
 24771  0)   0.106 us    |                  check_cfs_rq_runtime();
 24772  0)   3.038 us    |                }

And here is the timer statistics update, which ultimately propagates even into the KVM kernel module (which maintains timer information for its guests) through the notifier call chain mechanism; the callout is not hardcoded into the kernel source, so you will not see it by reading the source directly:

 1081  0)               |                update_wall_time() {
 1082  0)   0.184 us    |                  _raw_spin_lock_irqsave();
 1083  0)   0.139 us    |                  ntp_tick_length();
 1084  0)   0.142 us    |                  ntp_tick_length();
 1085  0)   0.149 us    |                  ntp_tick_length();
 1086  0)               |                  timekeeping_update() {
 1087  0)   0.142 us    |                    ntp_get_next_leap();
 1088  0)   0.232 us    |                    update_vsyscall();
 1089  0)               |                    raw_notifier_call_chain() {
 1090  0)               |                      notifier_call_chain() {
 1091  0)   0.385 us    |                        pvclock_gtod_notify [kvm]();
 1092  0)   1.865 us    |                      }
 1093  0)   3.014 us    |                    }
 1094  0)   0.155 us    |                    update_fast_timekeeper();
 1095  0)   0.152 us    |                    update_fast_timekeeper();
 1096  0)   8.926 us    |                  }
 1097  0)   0.220 us    |                  _raw_spin_unlock_irqrestore();
 1098  0) + 16.559 us   |                }
 1099  0) + 21.748 us   |              }
 1100  0)   0.133 us    |              touch_softlockup_watchdog();

And here is the processing triggered by the APIC timer interrupt:

 1113  0)               |          local_apic_timer_interrupt() {
 1114  0)               |            hrtimer_interrupt() {
 1115  0)   0.139 us    |              _raw_spin_lock();
 1116  0)   0.257 us    |              ktime_get_update_offsets_now();
 1117  0)               |              __hrtimer_run_queues() {
 1118  0)   0.860 us    |                __remove_hrtimer();
 1119  0)               |                tick_sched_timer() {
 1120  0)   0.194 us    |                  ktime_get();
 1121  0)               |                  tick_sched_do_timer() {
 1122  0)   0.162 us    |                    tick_do_update_jiffies64();
 1123  0)   1.462 us    |                  }
 1124  0)               |                  tick_sched_handle.isra.14() {
 1125  0)   0.136 us    |                    touch_softlockup_watchdog();
 1126  0)               |                    update_process_times() {
 1127  0)   0.328 us    |                      account_process_tick();
 1128  0)   0.150 us    |                      hrtimer_run_queues();
 1129  0)   0.251 us    |                      raise_softirq();
 1130  0)               |                      rcu_check_callbacks() {
 1131  0)   0.166 us    |                        rcu_sched_qs();
 1132  0)   0.199 us    |                        cpu_needs_another_gp();
 1133  0)               |                        invoke_rcu_core() {
 1134  0)   0.199 us    |                          raise_softirq();
 1135  0)   1.406 us    |                        }
 1136  0)   5.365 us    |                      }
 1137  0)               |                      scheduler_tick() {
 1138  0)   0.191 us    |                        _raw_spin_lock();
 1139  0)   0.336 us    |                        update_rq_clock();
 1140  0)   0.152 us    |                        task_tick_idle();
 1141  0)               |                        update_cpu_load_active() {
 1142  0)               |                          __update_cpu_load() {
 1143  0)   0.140 us    |                            sched_avg_update();
 1144  0)   1.587 us    |                          }
 1145  0)   2.719 us    |                        }
 1146  0)   0.178 us    |                        calc_global_load_tick();
 1147  0)               |                        trigger_load_balance() {
 1148  0)   0.189 us    |                          raise_softirq();
 1149  0)   1.386 us    |                        }
 1150  0) + 11.249 us   |                      }
 1151  0)   0.353 us    |                      run_posix_cpu_timers();
 1152  0) + 25.259 us   |                    }
 1153  0)   0.151 us    |                    profile_tick();
 1154  0) + 28.624 us   |                  }
 1155  0) + 33.276 us   |                }
 1156  0)   0.142 us    |                _raw_spin_lock();
 1157  0) + 37.941 us   |              }
 1158  0)   0.173 us    |              __hrtimer_get_next_event();
 1159  0)               |              tick_program_event() {
 1160  0)               |                clockevents_program_event() {

Now using gcov analysis:

cd /home/tthtlc/linux_latest
 gcov -o /debug/gcov/home/tthtlc/linux_latest/kernel/sched/ core.c
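Note that gcov data for the kernel only appears under /debug/gcov if the kernel itself was built with gcov profiling; as far as I recall that means enabling at least the following before compiling the custom kernel:

CONFIG_GCOV_KERNEL=y
CONFIG_GCOV_PROFILE_ALL=y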

And if you view the “*.gcov” files generated in the present directory:

 -:   41:#ifdef __ARCH_HAS_VTIME_TASK_SWITCH
 -:   42:extern void vtime_task_switch(struct task_struct *prev);
 -:   43:#else
 -:   44:extern void vtime_common_task_switch(struct task_struct *prev);
 1459453:   45:static inline void vtime_task_switch(struct task_struct *prev)
 -:   46:{
 1461467:   47:    if (vtime_accounting_enabled())
   #####:   48:        vtime_common_task_switch(prev);
 1461467:   49:}
       -:   50:#endif /* __ARCH_HAS_VTIME_TASK_SWITCH */
 -:   51:
 -:   52:extern void vtime_account_system(struct task_struct *tsk);
 -:   53:extern void vtime_account_idle(struct task_struct *tsk);
 -:   54:extern void vtime_account_user(struct task_struct *tsk);
 -:   55:
 -:   56:#ifdef __ARCH_HAS_VTIME_ACCOUNT
 -:   57:extern void vtime_account_irq_enter(struct

A lot of these are “#####”, meaning the line was never executed; otherwise the number represents how many times it has been executed, as on lines 47-48 above.

And if you sort it you can get the most frequently executed lines inside core.c:

grep -v "\-:" core.c.gcov |grep -v "####"|sort -n
 
 1681581:  565:}
 1701534:  545:void wake_up_q(struct wake_q_head *head)
 1701534:  547:    struct wake_q_node *node = head->first;
 1788519: 1680:        for_each_domain(this_cpu, sd) {
 1871785: 3569:    struct rq *rq = cpu_rq(cpu);
 1871785: 3571:    if (rq->curr != rq->idle)
 2012652: 1972:    while (p->on_cpu)
 2181250: 1032:        for_each_class(class) {
 2232245: 1046:    if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
 2303005: 3192:    if (!tsk->state || tsk_is_pi_blocked(tsk))
 2396808:  104:    if (rq->clock_skip_update & RQCF_ACT_SKIP)
 2396808:   98:void update_rq_clock(struct rq *rq)
 2910568: 1033:            if (class == rq->curr->sched_class)
 2911445: 1035:            if (class == p->sched_class) {
 2912432: 3052:    for_each_class(class) {
 2912432: 3053:        p = class->pick_next_task(rq, prev);
 2915415: 3054:        if (p) {
 3840916:  549:    while (node != WAKE_Q_TAIL) {
23406406: 4615:int __sched _cond_resched(void)
23406406: 4617:    if (should_resched(0)) {

This will not tell you much about the logic of scheduling, but it does point out the hot spots inside core.c.

How to fix docker’s devicemapper error, and how to estimate docker’s storage space requirements?

Running “sudo service docker restart” would never bring up a running docker daemon.

So reading the docker logs (as root):

# cat /var/log/upstart/docker.log

Getting lots of such errors:

plicitly choose storage driver (-s <DRIVER>)
/var/run/docker.sock is up
API listen on /var/run/docker.sock
Error starting daemon: error initializing graphdriver: “/var/lib/docker” contains other graphdrivers: devicemapper; Please cleanup or explicitly choose storage driver (-s <DRIVER>)
/var/run/docker.sock is up
API listen on /var/run/docker.sock
Error starting daemon: error initializing graphdriver: “/var/lib/docker” contains other graphdrivers: devicemapper; Please cleanup or explicitly choose storage driver (-s <DRIVER>)

And according to:

https://github.com/docker/docker/issues/14035

The issue is the devicemapper directory:

# cd /var/lib/docker/devicemapper/

ls -al
total 16
drwx------  4 root root 4096 Nov  7 07:06 .
drwx------ 11 root root 4096 Nov 22 21:17 ..
drwx------  2 root root 4096 Nov  7 07:06 devicemapper
drwx------  2 root root 4096 Nov 22 21:15 metadata

So I moved the devicemapper directory out of the way:

# mv /var/lib/docker/devicemapper /tmp

And then restarted the docker daemon: sudo service docker restart

And checking info (as non-root):

docker info
Containers: 6
Images: 56
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 68
Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-68-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 4
Total Memory: 14.57 GiB
Name: mamapapa
ID: BU43:2QUH:WCT4:66EN:MU2Y:EFVO:RM2M:WAMC:RL5S:F3H4:PLG6:XVO7
WARNING: No swap limit support

Next checking for images:

docker images
ubuntu              latest              91e54dfb1179        3 months ago        188.4 MB
<none>              <none>              07f8e8c5e660        7 months ago        188.3 MB

Seemed working now.

But where are all the repository files located?
Looking into:
/var/lib/docker/containers# ls -al
total 32
3467d66231f5f8e120510fd19ec6566f97d9d83e0d00dc41c655272c949bac78
38cc37c2512a22d9ad59b4d332aa417f856fee1ed5fd2c9c68058c9ccfb5b831
58f980b406beaacf48d101c76e9b698f943a0f402ba06883beaf37abe27fb36e
7bf91b8b6aa8c43ab18482b2c38af6057cff0977ace84fe8af16008664a2299e
a3d217cfcf5e7bfc7036bcab9973b71781e7810d3add9f783e88970a5e8dda73
ea7bd4d2185611e8b5107bb3e890a07a508df65d2923054b0721e4c6594dda25

The above six directories match the output of “docker ps -a”. But the bulk of the storage space still lies in aufs:

du -k -d 1
4       ./volumes
1424    ./containers
2520    ./graph
8       ./trust
3349732    ./aufs
40       ./network
24       ./execdriver
4        ./tmp
39156    ./init
3392932  .

And in /var/lib/docker/aufs:

.
├── diff
│   ├── 06b53d536bb553efd9338ce705022dfbd7ae7415a8f4bbfaad4405f91e2d5ae3
│   │   ├── root
│   │   ├── usr
│   │   │   └── local
│   │   │       └── bin
│   │   │           ├── rackup
│   │   │           └── tilt
│   │   └── var
│   │       └── lib
│   │           └── gems
│   │               └── 1.9.1
│   │                   ├── cache
│   │                   │   ├── rack-1.6.4.gem
│   │                   │   ├── rack-protection-1.5.3.gem
│   │                   │   ├── sinatra-1.4.6.gem
│   │                   │   └── tilt-2.0.1.gem
│   │                   ├── doc
│   │                   │   ├── rack-1.6.4
│   │                   │   │   ├── rdoc
│   │                   │   │   │   ├── created.rid
│   │                   │   │   │   ├── FCGI
│   │                   │   │   │   │   └── Stream.html
│   │                   │   │   │   ├── FCGI.html
│   │                   │   │   │   ├── images

There you go: it is a list of all the files in each specific container.

Get a list of all the containers that exist: docker ps -a

And then restarting one of them:

docker restart ea7bd4d21856

And followed by counting all the files within:
docker exec ea7bd4d21856 ls -R | wc

  51868   47195  679193

This offers another (rough) way to count the number of files inside a container.
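Another rough way to estimate per-container storage usage (assuming your docker version supports the -s flag) is:

docker ps -a -s

which adds a SIZE column showing how much data each container has written into its writable layer.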

 

Getting started with Mean.IO

As a start, the webpage at http://mean.io/#!/ is not really helpful.

But this page turned out to be much more helpful:

https://travismaynard.com/writing/getting-started-with-gulp

So summarizing here:

After downloading “node” from https://nodejs.org/ and placing it in /opt, create softlinks so that node, nodejs and npm all point to the downloaded binaries:

lrwxrwxrwx 1 root root 34 Nov 17 11:10 npm -> /opt/node-v4.2.2-linux-x64/bin/npm
lrwxrwxrwx 1 root root 35 Nov 17 11:10 node -> /opt/node-v4.2.2-linux-x64/bin/node
lrwxrwxrwx 1 root root 35 Nov 17 11:10 nodejs -> /opt/node-v4.2.2-linux-x64/bin/node

Next checking its version:

node -v
nodejs -v

Now using npm to install other dependencies globally:

sudo npm install -g jshint gulp-jshint gulp-sass gulp-concat gulp-uglify gulp-rename --save-dev
sudo npm install -g mean-cli
sudo npm install -g bower
sudo npm install -g gulp

Before the following operation, go to the directory where “mynewapp” will be created:

mean init mynewapp

cd mynewapp

npm install
bower install
gulp

A server running at port 3000 will be started after “gulp”.

The entire Mean.IO stack is installed successfully.

The installation log is here: http://pastebin.com/j6Mm2EDi

How to build Android SDK samples using Gradle?

First, my environment is Ubuntu 14.04 LTS, and I would like to compile Android SDK samples. The present write-up is a follow-up to this:

https://tthtlc.wordpress.com/2015/08/01/how-to-quickly-compile-all-the-android-samples-via-command-line/

where compilation was done using the traditional “ant”; the present write-up is a version using “gradle”.

First go to the Android sample directory and issue the compilation command:

cd /opt/android-sdk-linux/samples/android-23/testing/ActivityInstrumentation

./gradlew build

This gave an error, so I created a file (as requested by the error message) called “local.properties” with the following one-line content:

sdk.dir=/opt/android-sdk-linux

Recompiling gave another error:

FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ‘:Application:compileDebugJava’.
> invalid source release: 1.7

A fairly meaningless error message, but intuition suggested checking the Java version:

java -version
java version “1.6.0_45”
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

Possibly this version is not right, even though it is (or at least used to be) the right version for compiling AOSP. Anyway, after switching to Java 7:


export PATH=/usr/lib/jvm/java-7-openjdk-amd64/bin:$PATH

(The above “java-7-openjdk-amd64” comes from “sudo apt-get install openjdk-7-jdk”.)
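Alternatively (assuming both JDKs were installed via apt), the system-wide default can be switched with update-alternatives instead of changing PATH:

sudo update-alternatives --config java
sudo update-alternatives --config javac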

Now the entire sample builds to completion without error:


./gradlew build
:Application:preBuild UP-TO-DATE
:Application:preDebugBuild UP-TO-DATE
:Application:checkDebugManifest
:Application:preReleaseBuild UP-TO-DATE
[snip]
:Application:compileReleaseUnitTestSources UP-TO-DATE
:Application:assembleReleaseUnitTest
:Application:testRelease
:Application:test
:Application:check
:Application:build

BUILD SUCCESSFUL

Of course, if you want to debug the root cause of an error, another way is to generate verbose output with the following command:

./gradlew build --stacktrace --info --debug

Example 2

First cd to directory:


cd /opt/android_sdk/samples/android-22/renderscript/BasicRenderScript

Issue the command (after ensuring that “gradlew” command exists):


./gradlew build

:Application:preBuild
:Application:compileDebugNdk UP-TO-DATE
[snip]
:Application:assembleRelease UP-TO-DATE
:Application:assemble UP-TO-DATE
:Application:compileLint
:Application:lint
Ran lint on variant debug: 49 issues found
Ran lint on variant release: 49 issues found
Wrote HTML report to file:/sda7/android_sdk/samples/android-22/renderscript/BasicRenderScript/Application/build/outputs/lint-results.html
Wrote XML report to /sda7/android_sdk/samples/android-22/renderscript/BasicRenderScript/Application/build/outputs/lint-results.xml
:Application:check
:Application:build

BUILD SUCCESSFUL

Total time: 19.919 secs

In between, there were some errors; the corrections are shown below.

In the file “/opt/android_sdk/samples/android-22/renderscript/BasicRenderScript/Application/build.gradle”:


39 lintOptions {
40 abortOnError false
41 }

was inserted inside the android {} tag:

35 android {
36 compileSdkVersion 21
37 buildToolsVersion "21.0.1"
38
39 lintOptions {
40 abortOnError false
41 }
42
43 defaultConfig {
44 minSdkVersion 8
45 targetSdkVersion 20
46 }

But often the lint errors are not supposed to be ignored by adding "abortOnError false" to the build.gradle file. An example is the error below:

ls -1 /sda7/android_sdk/samples/android-22/renderscript/BasicRenderScript/Application/src/main/res/values/

attrs.xml
base-strings.xml
colors.xml
styles.xml
template-dimens.xml
template-styles.xml

There is a problem in base-strings.xml: just ensure that there are no apostrophes inside the CDATA section, specifically the following (the apostrophe has been removed):

20 <![CDATA[
21
22
23             This sample demonstrates using RenderScript to perform basic image manipulation. Specifically, it allows users
24             to dynamically adjust the saturation for an image using a slider. A custom RenderScript kernel performs the saturation
25             adjustment, running the computation on the device GPU or other compute hardware as deemed appropriate by the system.
26
27
28         ]]>
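Alternatively, for ordinary (non-CDATA) string resources, I believe the usual fix is to escape the apostrophe rather than remove it, e.g. (the string name here is made up):

<string name="sample_description">Don\'t adjust the saturation manually</string>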

Using the above method I was able to successfully compile 84 of the samples under the “android-22” directory in “samples”, with 19 failures, mostly attributable to lint errors:

http://pastebin.com/33JBDAqj

 
