Interesting stuff. What the article doesn’t mention is the relative sysadmin overhead of keeping multiple VMs up to date versus updating the single platform that multiple containers run on. Because of this one thing, my money is on containers being the better long-term bet for operational security:
Are virtual machines (VMs) more secure than containers? You may think you know the answer, but IBM Research has found containers can be as secure, or more secure, than VMs.
James Bottomley, an IBM Research Distinguished Engineer and top Linux kernel developer, writes: “One of the biggest problems with the current debate about Container vs Hypervisor security is that no-one has actually developed a way of measuring security, so the debate is all in qualitative terms (hypervisors ‘feel’ more secure than containers because of the interface breadth) but no-one actually has done a quantitative comparison.” To meet this need, Bottomley created Horizontal Attack Profile (HAP), designed to describe system security in a way that it can be objectively measured. Bottomley has discovered that “a Docker container with a well crafted seccomp profile (which blocks unexpected system calls) provides roughly equivalent security to a hypervisor.”
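To make the “well crafted seccomp profile” concrete, here is a minimal sketch of what such a profile looks like for Docker. The syscall allowlist below is purely illustrative (a real workload needs a much longer list; Docker’s default profile allows several hundred calls), and the filename is hypothetical:

```python
import json

# Hypothetical, deliberately narrow allowlist. A real profile must cover
# every syscall the containerized workload actually makes.
ALLOWED_SYSCALLS = ["read", "write", "openat", "close", "exit_group",
                    "futex", "mmap", "munmap", "brk", "rt_sigreturn"]

profile = {
    "defaultAction": "SCMP_ACT_ERRNO",      # reject any syscall not listed
    "architectures": ["SCMP_ARCH_X86_64"],
    "syscalls": [
        {"names": ALLOWED_SYSCALLS, "action": "SCMP_ACT_ALLOW"}
    ],
}

with open("narrow-profile.json", "w") as f:
    json.dump(profile, f, indent=2)

# The container is then launched under the profile, e.g.:
#   docker run --security-opt seccomp=narrow-profile.json <image>
```

The narrower the allowlist, the less kernel code an attacker can reach from inside the container, which is exactly the quantity Bottomley’s HAP metric tries to capture.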
Bottomley starts by defining the Vertical Attack Profile (VAP): all the code that is traversed to provide a service, all the way from input to database update to output. This code, like all programs, contains bugs. The bug density varies, but the more code you traverse, the greater your chance of exposure to a security hole. Exploits of stack security holes that can jump into either the physical server host or other VMs are HAPs.
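The underlying reasoning is a simple linear model: with a uniform bug density, expected exposure scales with the amount of code traversed. A back-of-envelope sketch, using an entirely made-up density figure for illustration:

```python
# Illustrative assumption only: 0.5 latent bugs per thousand lines (kLOC).
# The real kernel bug density is unknown; what matters for comparing
# systems is that it cancels out when the density is assumed uniform.
BUG_DENSITY = 0.5

def expected_bugs(kloc_traversed, density=BUG_DENSITY):
    """Expected latent bugs in the code paths actually exercised."""
    return kloc_traversed * density

# A system whose workload traverses 10,000 kLOC of kernel code versus
# one confined (e.g. by seccomp) to 2,000 kLOC:
print(expected_bugs(10_000))  # 5000.0
print(expected_bugs(2_000))   # 1000.0
```

Because the density is assumed uniform, the ratio of the two estimates depends only on code traversed, which is why the HAP can be approximated by lines of code alone.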
HAPs are the worst kind of security holes. Bottomley calls them “potentially business destroying events.” So, how do you measure a system for HAPs? Bottomley explains:
The Quantitative approach to measuring the HAP says that we take the bug density of the Linux Kernel code and multiply it by the amount of unique code traversed by the running system after it has reached a steady state (meaning that it doesn’t appear to be traversing any new kernel paths). For the sake of this method, we assume the bug density to be uniform and thus the HAP is approximated by the amount of code traversed in the steady state. Measuring this for a running system is another matter entirely, but, fortunately, the kernel has a mechanism called ftrace which can be used to provide a trace of all of the functions called by a given userspace process and thus gives a reasonable approximation of the number of lines of code traversed. (Note this is an approximation because we measure the total number of lines in the function taking no account of internal code flow, primarily because ftrace doesn’t give that much detail.) Additionally, this methodology works very well for containers where all of the control flow emanates from a well known group of processes via the system call information, but it works less well for hypervisors where, in addition to the direct hypercall interface, you also have to add traces from the back end daemons (like the kvm vhost kernel threads or dom0 in the case of Xen).
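The method Bottomley describes can be sketched in a few lines: collect the unique kernel functions a workload touches (from ftrace output), then sum their line counts, ignoring internal control flow just as he notes. Everything below is a toy illustration — the trace snippet, regex, and per-function line counts are invented, not real kernel data or the actual tooling used:

```python
import re

# Toy excerpt in the style of an ftrace function-tracer log.
SAMPLE_TRACE = """\
 redis-server-812   [000] ....  1.000001: vfs_read <-ksys_read
 redis-server-812   [000] ....  1.000002: __fget_light <-ksys_read
 redis-server-812   [000] ....  1.000003: vfs_read <-ksys_read
 redis-server-812   [000] ....  1.000004: tcp_sendmsg <-sock_sendmsg
"""

# Hypothetical lines-per-function table; a real tool would derive this
# from kernel source or debug info.
FUNC_LINES = {"vfs_read": 40, "__fget_light": 25, "tcp_sendmsg": 120}

def traced_functions(trace_text):
    """Set of unique kernel functions appearing in the trace."""
    funcs = set()
    for line in trace_text.splitlines():
        m = re.search(r":\s+(\S+)\s+<-", line)
        if m:
            funcs.add(m.group(1))
    return funcs

def hap_estimate(trace_text, func_lines):
    """Approximate HAP: total lines across unique functions traversed."""
    return sum(func_lines.get(f, 0) for f in traced_functions(trace_text))

print(hap_estimate(SAMPLE_TRACE, FUNC_LINES))  # 40 + 25 + 120 = 185
```

Note that `vfs_read` is counted once despite appearing twice in the trace: the metric is over *unique* code traversed at steady state, not call volume.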
In short, you measure how many lines of code a system, be it bare metal, VM, or container, uses to run a given application. The more code it runs, the more likely it is to have a HAP-level security hole.
Having defined HAPs and how to measure them, Bottomley then ran several standard benchmarks: redis-bench-set, redis-bench-get, python-tornado, and node-express, with the latter two also running the web servers with simple external transactional clients. He performed these tests with Docker; Google’s gVisor, a container runtime sandbox; gVisor-kvm, the same container sandbox using KVM, Linux’s built-in hypervisor; Kata Containers, an open-source lightweight VM; and Nabla, IBM’s just-released container type, which is designed for strong server isolation.