As documented here, Docker's seccomp mechanism disables a long list of system calls:
https://docs.docker.com/engine/security/seccomp/
A few of the system calls disabled by default are listed below:
| Syscall | Description |
|---|---|
| bpf | Deny loading potentially persistent bpf programs into kernel, already gated by CAP_SYS_ADMIN. |
| clock_adjtime | Time/date is not namespaced. Also gated by CAP_SYS_TIME. |
| clock_settime | Time/date is not namespaced. Also gated by CAP_SYS_TIME. |
| clone | Deny cloning new namespaces. Also gated by CAP_SYS_ADMIN for CLONE_* flags, except CLONE_NEWUSER. |
| create_module | Deny manipulation and functions on kernel modules. Obsolete. Also gated by CAP_SYS_MODULE. |
| delete_module | Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE. |
| finit_module | Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE. |
| get_kernel_syms | Deny retrieval of exported kernel and module symbols. Obsolete. |
| get_mempolicy | Syscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE. |
| init_module | Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE. |
| ioperm | Prevent containers from modifying kernel I/O privilege levels. Already gated by CAP_SYS_RAWIO. |
| iopl | Prevent containers from modifying kernel I/O privilege levels. Already gated by CAP_SYS_RAWIO. |
| kcmp | Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE. |
| kexec_file_load | Sister syscall of kexec_load that does the same thing, slightly different arguments. Also gated by CAP_SYS_BOOT. |
| kexec_load | Deny loading a new kernel for later execution. Also gated by CAP_SYS_BOOT. |
| keyctl | Prevent containers from using the kernel keyring, which is not namespaced. |
| lookup_dcookie | Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by CAP_SYS_ADMIN. |
| mbind | Syscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE. |
| mount | Deny mounting, already gated by CAP_SYS_ADMIN. |
| move_pages | Syscall that modifies kernel memory and NUMA settings. |
| name_to_handle_at | Sister syscall to open_by_handle_at. Already gated by CAP_SYS_NICE. |
| nfsservctl | Deny interaction with the kernel nfs daemon. Obsolete since Linux 3.1. |
| open_by_handle_at | Cause of an old container breakout. Also gated by CAP_DAC_READ_SEARCH. |
| perf_event_open | Tracing/profiling syscall, which could leak a lot of information on the host. |
| personality | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns. |
| pivot_root | Deny pivot_root, should be privileged operation. |
| process_vm_readv | Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE. |
| process_vm_writev | Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE. |
| ptrace | Tracing/profiling syscall, which could leak a lot of information on the host. Already blocked by dropping CAP_SYS_PTRACE. |
Programs such as gdb and strace rely on the ptrace() syscall to debug or trace another program. With ptrace() blocked, neither tool can attach to a process inside a Docker container.
To bypass the seccomp restriction:
docker run --rm -it --security-opt seccomp=unconfined 32bit/ubuntu:bionic bash
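The above starts a 32-bit Docker container on the existing 64-bit host. Note that seccomp=unconfined removes the entire seccomp filter for that container, not just the ptrace restriction.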
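To confirm from inside the container that the filter is really off, one option is to read the `Seccomp:` field in `/proc/self/status`, which Linux reports as 0 (disabled), 1 (strict) or 2 (filter). The sketch below is a minimal helper; the function name `seccomp_mode` and its optional file argument are illustrative, not part of Docker or any standard tool.

```shell
#!/bin/sh
# Print the seccomp mode recorded in a /proc status file.
# The awk child reads its own /proc/self/status, but it inherits
# the seccomp state of the shell that launched it.
# Argument 1 (optional, hypothetical): path to a status file, for testing.
seccomp_mode() {
    status_file="${1:-/proc/self/status}"
    # Match only the "Seccomp:" line, not "Seccomp_filters:".
    value="$(awk '/^Seccomp:/ { print $2 }' "$status_file" 2>/dev/null)"
    case "$value" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown" ;;   # field missing, e.g. kernel built without seccomp
    esac
}

seccomp_mode
```

In a container started with seccomp=unconfined this would be expected to report "disabled", while a container running under a seccomp profile would report "filter".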
After installing the appropriate tools, we can run strace:
root@68ba6b846014:~# apt-get update
root@68ba6b846014:~# apt-get install strace ltrace
root@68ba6b846014:~# strace /bin/ls
execve("/bin/ls", ["/bin/ls"], 0xffb2e750 /* 11 vars */) = 0
brk(NULL) = 0x580ed000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7767000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=11433, ...}) = 0
mmap2(NULL, 11433, PROT_READ, MAP_PRIVATE, 3, 0) = 0xf7764000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/i386-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0L\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=169960, ...}) = 0
mmap2(NULL, 179612, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf7738000
mmap2(0xf7761000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0xf7761000
mmap2(0xf7763000, 3484, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf7763000
close(3) = 0
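Turning seccomp off entirely is broader than tracing requires. A narrower alternative is to keep the default profile and grant only the ptrace capability with `--cap-add=SYS_PTRACE`. This assumes a Docker version whose default seccomp profile permits ptrace once the container holds CAP_SYS_PTRACE, which should be checked for your installation. The wrapper name `run_with_ptrace` and the `DRY_RUN` variable below are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Start a container with CAP_SYS_PTRACE added instead of seccomp=unconfined.
# Argument 1 (optional): image name; defaults to the image used in this post.
# DRY_RUN (hypothetical knob): when 1 (the default) the command is only
# printed; set DRY_RUN=0 to actually execute it.
run_with_ptrace() {
    image="${1:-32bit/ubuntu:bionic}"
    # Rebuild the positional parameters as the full docker command line.
    set -- docker run --rm -it --cap-add=SYS_PTRACE "$image" bash
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_with_ptrace
```

With the default dry-run setting this only prints the command, so it can be inspected before running it for real with DRY_RUN=0.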