Archive for November, 2018

How to run a 32-bit Docker container on a 64-bit host, and enable strace (and other blocked syscalls)

As documented here, Docker's default seccomp profile disables a long list of system calls:

https://docs.docker.com/engine/security/seccomp/

A few of the system calls disabled by default are listed below:

Syscall: Description
bpf: Deny loading potentially persistent bpf programs into kernel, already gated by CAP_SYS_ADMIN.
clock_adjtime: Time/date is not namespaced. Also gated by CAP_SYS_TIME.
clock_settime: Time/date is not namespaced. Also gated by CAP_SYS_TIME.
clone: Deny cloning new namespaces. Also gated by CAP_SYS_ADMIN for CLONE_* flags, except CLONE_NEWUSER.
create_module: Deny manipulation and functions on kernel modules. Obsolete. Also gated by CAP_SYS_MODULE.
delete_module: Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
finit_module: Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
get_kernel_syms: Deny retrieval of exported kernel and module symbols. Obsolete.
get_mempolicy: Syscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE.
init_module: Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
ioperm: Prevent containers from modifying kernel I/O privilege levels. Already gated by CAP_SYS_RAWIO.
iopl: Prevent containers from modifying kernel I/O privilege levels. Already gated by CAP_SYS_RAWIO.
kcmp: Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE.
kexec_file_load: Sister syscall of kexec_load that does the same thing, slightly different arguments. Also gated by CAP_SYS_BOOT.
kexec_load: Deny loading a new kernel for later execution. Also gated by CAP_SYS_BOOT.
keyctl: Prevent containers from using the kernel keyring, which is not namespaced.
lookup_dcookie: Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by CAP_SYS_ADMIN.
mbind: Syscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE.
mount: Deny mounting, already gated by CAP_SYS_ADMIN.
move_pages: Syscall that modifies kernel memory and NUMA settings.
name_to_handle_at: Sister syscall to open_by_handle_at. Already gated by CAP_SYS_NICE.
nfsservctl: Deny interaction with the kernel nfs daemon. Obsolete since Linux 3.1.
open_by_handle_at: Cause of an old container breakout. Also gated by CAP_DAC_READ_SEARCH.
perf_event_open: Tracing/profiling syscall, which could leak a lot of information on the host.
personality: Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns.
pivot_root: Deny pivot_root, should be privileged operation.
process_vm_readv: Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE.
process_vm_writev: Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE.
ptrace: Tracing/profiling syscall, which could leak a lot of information on the host. Already blocked by dropping CAP_SYS_PTRACE.

Programs like gdb and strace use ptrace() to trace and debug another process. Without the ptrace() syscall, neither gdb nor strace can work inside a Docker container.
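To make that dependency concrete, here is a minimal sketch of what a tracer like strace does with ptrace(), written in Python via ctypes. It assumes Linux with a libc that exports ptrace(); the request numbers are the standard Linux values, and the function name count_syscalls is hypothetical. Where ptrace() is blocked, as in the default profile above, PTRACE_TRACEME should fail in the child and the function returns -1.

```python
import ctypes
import os
import signal

# Linux ptrace request numbers and options (see <sys/ptrace.h>).
PTRACE_TRACEME = 0
PTRACE_SYSCALL = 24
PTRACE_SETOPTIONS = 0x4200
PTRACE_O_TRACESYSGOOD = 0x1

libc = ctypes.CDLL(None, use_errno=True)
libc.ptrace.restype = ctypes.c_long
libc.ptrace.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p]


def _kill_and_reap(pid):
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)


def count_syscalls(argv):
    """Run argv[0] under ptrace and count its syscalls (a tiny strace).

    Returns -1 if tracing is refused, e.g. EPERM from a seccomp filter.
    """
    pid = os.fork()
    if pid == 0:
        # Child: ask to be traced by the parent, then exec the target.
        if libc.ptrace(PTRACE_TRACEME, 0, None, None) == -1:
            os._exit(126)
        try:
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if execv failed

    # A traced child stops with SIGTRAP right after a successful execve.
    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        return -1  # child exited: PTRACE_TRACEME or execv failed
    # Make syscall-stops report SIGTRAP | 0x80 so they are easy to tell apart.
    if libc.ptrace(PTRACE_SETOPTIONS, pid, None, PTRACE_O_TRACESYSGOOD) == -1:
        _kill_and_reap(pid)
        return -1

    stops, sig = 0, 0
    while True:
        # Resume the child until the next syscall entry or exit.
        if libc.ptrace(PTRACE_SYSCALL, pid, None, sig) == -1:
            _kill_and_reap(pid)
            return -1
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) or os.WIFSIGNALED(status):
            break
        stopsig = os.WSTOPSIG(status)
        if stopsig == (signal.SIGTRAP | 0x80):
            stops += 1
            sig = 0
        elif stopsig == signal.SIGTRAP:
            sig = 0  # e.g. a legacy post-execve SIGTRAP: swallow it
        else:
            sig = stopsig  # forward ordinary signals to the child
    # Each syscall produces an entry stop and an exit stop.
    return stops // 2
```

For example, count_syscalls(["/bin/ls", "/"]) returns the number of syscalls ls made, or -1 if ptrace() is not permitted.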

To bypass the seccomp restriction:

docker run --rm -it --security-opt seccomp=unconfined 32bit/ubuntu:bionic bash

This runs a 32-bit Docker container on the existing 64-bit host, with seccomp filtering disabled.
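Disabling seccomp entirely lifts every restriction in the table above. A narrower option is to take Docker's default profile JSON (published in the Moby repository) and add an allow rule only for the syscalls you need. The sketch below assumes the rule format used by that default profile (a top-level "syscalls" list whose entries carry "names" and "action"); allow_syscalls and write_profile are hypothetical helper names.

```python
import copy
import json


def allow_syscalls(profile, names):
    """Return a copy of a Docker seccomp profile that also allows `names`.

    Assumes the rule format of Docker's default profile: a "syscalls" list
    whose entries have "names" and "action" keys.
    """
    patched = copy.deepcopy(profile)  # leave the caller's profile untouched
    patched.setdefault("syscalls", []).append(
        {"names": sorted(names), "action": "SCMP_ACT_ALLOW"}
    )
    return patched


def write_profile(profile, path):
    """Write the profile to disk for use with --security-opt seccomp=<path>."""
    with open(path, "w") as f:
        json.dump(profile, f, indent=2)
```

Usage: load the default profile with json.load, call allow_syscalls(profile, ["ptrace"]), save the result with write_profile, and start the container with docker run --security-opt seccomp=/path/to/profile.json instead of seccomp=unconfined.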

After installing the tools, we can run strace:

root@68ba6b846014:~# apt-get update
root@68ba6b846014:~# apt-get install strace ltrace

root@68ba6b846014:~# strace /bin/ls

execve("/bin/ls", ["/bin/ls"], 0xffb2e750 /* 11 vars */) = 0
brk(NULL) = 0x580ed000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7767000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=11433, ...}) = 0
mmap2(NULL, 11433, PROT_READ, MAP_PRIVATE, 3, 0) = 0xf7764000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/i386-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0L\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=169960, ...}) = 0
mmap2(NULL, 179612, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf7738000
mmap2(0xf7761000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0xf7761000
mmap2(0xf7763000, 3484, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf7763000
close(3) = 0
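The mmap2 and fstat64 calls in this trace are the 32-bit variants of those syscalls, a first hint that the userland is 32-bit. The small Python check below can confirm both points from inside the container; it assumes a Python interpreter has been installed in the image (the base image may not include one). The Seccomp field of /proc/self/status is 0 when no filter applies and 2 when a seccomp filter is active, the pointer size gives the userland word width, and uname() reports the shared host kernel. The helper names are hypothetical.

```python
import os
import struct

# Meaning of the Seccomp field in /proc/<pid>/status (see proc(5)).
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_path="/proc/self/status"):
    """Return this process's seccomp mode, or None if the kernel does not report it."""
    with open(status_path) as f:
        for line in f:
            # "Seccomp:" does not match the newer "Seccomp_filters:" line.
            if line.startswith("Seccomp:"):
                return SECCOMP_MODES.get(int(line.split()[1]), "unknown")
    return None


def describe_environment():
    """Report seccomp mode, userland word size and the kernel's machine name."""
    return {
        "seccomp": seccomp_mode(),
        # Pointer size of the running interpreter: 32 in a 32-bit userland.
        "userland_bits": struct.calcsize("P") * 8,
        # uname() comes from the host kernel, which containers share.
        "kernel_machine": os.uname().machine,
    }
```

In the unconfined 32-bit container I would expect seccomp to read "disabled" and userland_bits to be 32, while kernel_machine most likely still reports the 64-bit host (x86_64) unless a 32-bit personality has been set.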

How to run PPC binaries in docker

Using a QEMU solution:

https://tthtlc.wordpress.com/2018/09/28/how-to-install-ppc-based-ubuntu-inside-the-x86-64-environment/

is one option, but its performance is poor. Docker possibly offers six times better performance (roughly, six Docker containers for the resources of one VMware guest), but it requires a more elaborate setup:

First, download the QEMU user-mode emulator for PPC64LE:

wget https://github.com/multiarch/qemu-user-static/releases/download/v2.7.0/qemu-ppc64le-static.tar.gz

Unpack it into the $HOME directory (note x, for extract; t would only list the archive):

tar xvzf qemu-ppc64le-static.tar.gz -C $HOME

Then pull the PPC64LE image:

docker pull ppc64le/ubuntu:latest

This is a key step – to be repeated each time the host machine is rebooted:

docker run --rm --privileged multiarch/qemu-user-static:register

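This registers handlers with the host kernel's binfmt_misc mechanism, mapping each foreign binary type to the corresponding QEMU user-mode emulator. For PPC64LE binaries the emulator is “qemu-ppc64le-static”, which we unpacked into $HOME earlier. The registration lives in the kernel and is not persistent, which is why this step has to be repeated after every reboot. As registered here, the kernel looks up the emulator by its path (/usr/bin/qemu-ppc64le-static) when a binary is executed, so the emulator must also be visible at that path inside the container; that is the purpose of the volume mount below.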
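binfmt_misc recognises a binary format by comparing the first bytes of a file against a registered magic string under a mask (entries are written to /proc/sys/fs/binfmt_misc/register in the form :name:type:offset:magic:mask:interpreter:flags). The Python sketch below performs an equivalent check for ppc64le executables by reading the ELF header fields directly. It illustrates the idea only; it is not the exact magic/mask string that the register image installs, and the function names are hypothetical.

```python
import struct

# ELF constants (System V ABI, <elf.h>).
ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2      # e_ident[EI_CLASS]: 64-bit object
ELFDATA2LSB = 1     # e_ident[EI_DATA]: little-endian
EM_PPC64 = 21       # e_machine: 64-bit PowerPC


def is_ppc64le_elf(header):
    """Check the first 20 bytes of a file for a little-endian PPC64 ELF header.

    Similar in spirit to a binfmt_misc magic/mask entry: compare fixed
    offsets of the file header and ignore fields that may vary.
    """
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return False
    if header[4] != ELFCLASS64 or header[5] != ELFDATA2LSB:
        return False
    # e_machine is a 16-bit field at offset 18, after e_ident[16] and e_type.
    (e_machine,) = struct.unpack_from("<H", header, 18)
    return e_machine == EM_PPC64


def file_is_ppc64le(path):
    """Read just enough of `path` to classify it."""
    with open(path, "rb") as f:
        return is_ppc64le_elf(f.read(20))
```

Inside the ppc64le container, file_is_ppc64le("/bin/ls") should return True; on the x86-64 host it should return False, because e_machine there is 62 (EM_X86_64).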

Next start the container:

docker run -v ~/qemu-ppc64le-static:/usr/bin/qemu-ppc64le-static -it ppc64le/ubuntu

Once you know the container ID (from docker ps), you can open another shell in the same container (here 16b74063bd84 is the container ID):

docker exec -it 16b74063bd84 bash

If the host is rebooted, repeat the following steps to restart the same container:

docker run --rm --privileged multiarch/qemu-user-static:register
docker start 16b74063bd84

docker exec -it 16b74063bd84 bash

And verifying it is PPC: running uname -m inside the container should report ppc64le.

Learning about mbox: virtualization via ptrace()

http://cursuri.cs.pub.ro/pipermail/oss/2014-February/000163.html

https://github.com/tsgates/mbox

https://gts3.org/assets/papers/2013/kim:mbox-slides.pdf

https://news.ycombinator.com/item?id=7214419

https://pdos.csail.mit.edu/archive/mbox/

https://people.csail.mit.edu/nickolai/papers/kim-mbox.pdf

https://www.usenix.org/node/174518

https://www.usenix.org/system/files/conference/atc13/atc13-kim.pdf

A criticism of adding system calls for GPUs

https://arxiv.org/pdf/1705.06965.pdf

Why this is not a good idea:

1. GPUs vary widely in their capabilities. The decision about which capabilities to use should be made in userspace, not in the kernel.

2. Suppose you have multiple GPUs. You then need a different way of interfacing with each GPU, and that should be handled at the userspace level rather than through a single GPU system call interface that must work across all of them. The latter keeps the userspace program simple, but requires a highly complex kernel to route processing to the different GPUs.

In general, hardware choices should be made at the userspace level, as long as they have nothing to do with security. The kernel should be as thin as possible, with minimal code, and should deal only with security- and concurrency-related matters.

GPUs should be treated the same way as PCI devices and all other hardware, which lets kernel source code be reused. Treating one kind of hardware differently leads to a steep increase in kernel code complexity. For example, a single new system call has to account for how it can interact with each of the roughly 300 other system calls, how it shares the operation structures containing all the different function pointers, and so on.
