提交 c9751df2 编写于 作者: jia zhang's avatar jia zhang

inclavare-containers: an implementation of protected container

inclavare-containers is a set of tools for running trusted
applications in containers with the hardware-assisted enclave
technology. Enclave, referred to as a protected execution
environment, prevents the untrusted entity from accessing the
sensitive and confidential assets in use.

Currently, inclavare-containers consists of two core components:
rune and enclave runtime.

rune is a CLI tool for spawning and running enclaves in containers
according to the OCI specification. The codebase of rune is
a fork of runc, so rune can be used as runc if enclave is not
configured or available.

Enclave runtime is the backend of rune, which is responsible
for loading and running applications inside enclaves. The
interface between rune and enclave runtime is Enclave Runtime PAL
API, which allows invoking enclave runtime through well-defined
functions. The software for confidential computing may benefit
from this interface to interact with OCI runtime.

Additionally, this commit includes additional information about the
use of inclavare-containers.
- Run sample enclave runtime skeleton with rune
- Run enclave runtime Occlum with rune

See README.md for more details.
Signed-off-by: jia zhang's avatarJia Zhang <zhang.jia@linux.alibaba.com>
Signed-off-by: NXiaozhe Wang <wangxiaozhe@linux.alibaba.com>
Signed-off-by: NYilin Li <YiLin.Li@linux.alibaba.com>
上级 161f6e3d

要显示的变更太多。

To preserve performance only 1000 of 1000+ files are displayed.
# inclavare-containers
\ No newline at end of file
# inclavare-containers
## Introduction
`inclavare-containers` is a set of tools for running trusted applications in containers with the hardware-assisted enclave technology. Enclave, referred to as a protected execution environment, prevents the untrusted entity from accessing the sensitive and confidential assets in use.
## Components
### rune
`rune` is a CLI tool for spawning and running enclaves in containers according to the OCI specification. The codebase of `rune` is a fork of [runc](https://github.com/opencontainers/runc), so `rune` can be used as `runc` if enclave is not configured or available.
`rune` currently supports the Linux platform with x86-64 architecture only. It must be built with Go version 1.14 or higher.
`rune` depends on protobuf compiler. Please refer to [this guide](https://github.com/protocolbuffers/protobuf#protocol-compiler-installation) to install it on your platform. Additionally, `rune` by default enables seccomp support as [runc](https://github.com/opencontainers/runc#building) so you need to install libseccomp on your platform. Note that the libseccomp is also required in container environment, and the host version should be equal or higher than the one in container.
```bash
# create $WORKSPACE folder
mkdir -p "$WORKSPACE"
cd "$WORKSPACE"
git clone https://github.com/alibaba/inclavare-containers
cd inclavare-containers/rune
# install Go protobuf plugin for protobuf3
go get github.com/golang/protobuf/protoc-gen-go@v1.3.5
# build and install rune
make
sudo make install
```
`rune` will be installed to `/usr/local/sbin/rune` on your system.
### enclave runtime
The backend of `rune` is a component called enclave runtime, which is responsible for loading and running protected applications inside enclaves. The interface between `rune` and enclave runtime is [Enclave Runtime PAL API](https://github.com/alibaba/inclavare-containers/blob/master/rune/libenclave/internal/runtime/pal/spec.md), which allows invoking enclave runtime through well-defined functions. The software for confidential computing may benefit from this interface to interact with OCI runtime.
One typical class of enclave runtime implementations is based on library OSes. Currently, the default enclave runtime interacting with `rune` is [Occlum](https://github.com/occlum/occlum), a memory-safe, multi-process library OS for Intel SGX.
In addition, you can write your own enclave runtime with any programming language and SDK (e.g, [Intel SGX SDK](https://github.com/intel/linux-sgx)) you prefer as long as it implements Enclave Runtime PAL API.
### shim-rune
`shim-rune` resides in between `containerd` and `rune`, conducting enclave signing and management beyond the normal `shim` basis. `shim-rune` and `rune` can compose a basic enclave containerization stack for the cloud-native ecosystem.
`shim-rune` will be open source soon.
---
## Using rune
### Run Occlum
Please refer to [this guide](https://github.com/occlum/occlum/blob/master/docs/rune_quick_start.md) to run `Occlum` with `rune`.
### Run Docker
Please refer to [this guide](https://github.com/alibaba/inclavare-containers/blob/master/docs/running_rune_with_docker.md) to run `Docker` with `rune`.
### Run skeleton
Skeleton is an example of enclave runtime, interfacing with Enclave Runtime PAL API for easy interfacing with `rune`. Skeleton sample code is helpful to write your own enclave runtime.
Please refer to [this guide](https://github.com/alibaba/inclavare-containers/blob/master/rune/libenclave/internal/runtime/pal/skeleton/README.md) to run skeleton with `rune`.
For more information about Enclave Runtime PAL API, please refer to [Enclave Runtime PAL API Specification](https://github.com/alibaba/inclavare-containers/blob/master/rune/libenclave/internal/runtime/pal/spec.md).
# Quick Start: Running rune with Docker
## Build and install rune
`rune` is a CLI tool for spawning and running enclaves in containers according to the OCI specification.
Please refer to [this guide](https://github.com/alibaba/inclavare-containers/blob/master/README.md#rune) to build `rune` from scratch.
---
## Configure Docker runtimes
Add the `rune` OCI runtime configuration in dockerd config file (`/etc/docker/daemon.json`) in your system.
``` JSON
{
"runtimes": {
"rune": {
"path": "/usr/local/sbin/rune",
"runtimeArgs": []
}
}
}
```
then restart docker service on your system.
> e.g. `sudo systemctl restart docker` for CentOS, or `sudo service docker restart` for Ubuntu
You can check whether `rune` is correctly added to container runtime or not with
``` shell
sudo docker info | grep rune
Runtimes: rune runc
```
---
## Running Docker using rune
You need to specify a set of parameters to `docker run` in order to use `rune`, e.g,
``` shell
docker run -it --rm --runtime=rune \
-e ENCLAVE_TYPE=intelSgx \
-e ENCLAVE_RUNTIME_PATH=/run/rune/liberpal-skeleton.so \
-e ENCLAVE_RUNTIME_ARGS=skeleton,debug \
$image
```
where:
- @runtime: choose the runtime (`rune`, `runc` or others) to use for this container.
- @ENCLAVE_TYPE: specify the type of enclave hardware to use, such as `intelSgx`.
- @ENCLAVE_PATH: specify the path to enclave runtime to launch.
- @ENCLAVE_ARGS: specify the specific arguments to enclave runtime, seperated by the comma.
Note that the skeleton is a sample enclave runtime. Please refer to [this guide](https://github.com/alibaba/inclavare-containers/blob/master/rune/libenclave/internal/runtime/pal/skeleton/README.md) to run skeleton with `rune`. In addition, refer to [this guide]() to run more useful Occlum library OS.
vendor/pkg
/rune
/rune-*
contrib/cmd/recvtty/recvtty
man/man8
release
*.pb.go
liberpal-skeleton.so
approve_by_comment: true
approve_regex: ^LGTM
reject_regex: ^Rejected
reset_on_push: true
author_approval: ignored
reviewers:
teams:
- runc-maintainers
name: default
required: 2
dist: bionic
language: go
go:
- 1.14.x
- 1.13.x
- tip
cache:
directories:
- /home/travis/.vagrant.d/boxes
matrix:
include:
- go: 1.14.x
name: "verify-dependencies"
script:
- make verify-dependencies
- go: 1.13.x
name: "cgroup-systemd"
env:
- RUNC_USE_SYSTEMD=1
script:
- make all
- sudo PATH="$PATH" make localintegration RUNC_USE_SYSTEMD=1
- go: 1.13.x
name: "cgroup-v2"
env:
- VAGRANT_VERSION=2.2.7
before_install:
- cat /proc/cpuinfo
# https://github.com/alvistack/ansible-role-virtualbox/blob/6887b020b0ca5c59ddb6620d73f053ffb84f4126/.travis.yml#L30
- sudo apt-get install -q -y bridge-utils dnsmasq-base ebtables libvirt-bin libvirt-dev qemu-kvm qemu-utils ruby-dev && wget https://releases.hashicorp.com/vagrant/${VAGRANT_VERSION}/vagrant_${VAGRANT_VERSION}_$(uname -m).deb && sudo dpkg -i vagrant_${VAGRANT_VERSION}_$(uname -m).deb && rm -f vagrant_${VAGRANT_VERSION}_$(uname -m).deb
- sudo vagrant plugin install vagrant-libvirt
- sudo vagrant up && sudo mkdir -p /root/.ssh && sudo sh -c "vagrant ssh-config >> /root/.ssh/config"
script:
- sudo ssh default -t 'cd /vagrant && sudo make localunittest'
# cgroupv2+systemd: test on vagrant host itself as we need systemd
- sudo ssh default -t 'cd /vagrant && sudo make localintegration RUNC_USE_SYSTEMD=yes'
# same setup but with fs2 driver instead of systemd
- sudo ssh default -t 'cd /vagrant && sudo make localintegration'
# rootless
- sudo ssh default -t 'cd /vagrant && sudo make localrootlessintegration'
allow_failures:
- go: tip
go_import_path: github.com/opencontainers/runc
# `make ci` uses Docker.
sudo: required
services:
- docker
before_install:
- sudo apt-get -qq update
- sudo apt-get install -y libseccomp-dev
- GO111MODULE=off go get -u golang.org/x/lint/golint
- GO111MODULE=off go get -u github.com/vbatts/git-validation
- env | grep TRAVIS_
script:
- git-validation -run DCO,short-subject -v
- make
- make clean ci cross
## Contribution Guidelines
### Security issues
If you are reporting a security issue, do not create an issue or file a pull
request on GitHub. Instead, disclose the issue responsibly by sending an email
to security@opencontainers.org (which is inhabited only by the maintainers of
the various OCI projects).
### Pull requests are always welcome
We are always thrilled to receive pull requests, and do our best to
process them as fast as possible. Not sure if that typo is worth a pull
request? Do it! We will appreciate it.
If your pull request is not accepted on the first try, don't be
discouraged! If there's a problem with the implementation, hopefully you
received feedback on what to improve.
We're trying very hard to keep runc lean and focused. We don't want it
to do everything for everybody. This means that we might decide against
incorporating a new feature. However, there might be a way to implement
that feature *on top of* runc.
### Conventions
Fork the repo and make changes on your fork in a feature branch:
- If it's a bugfix branch, name it XXX-something where XXX is the number of the
issue
- If it's a feature branch, create an enhancement issue to announce your
intentions, and name it XXX-something where XXX is the number of the issue.
Submit unit tests for your changes. Go has a great test framework built in; use
it! Take a look at existing tests for inspiration. Run the full test suite on
your branch before submitting a pull request.
Update the documentation when creating or modifying features. Test
your documentation changes for clarity, concision, and correctness, as
well as a clean documentation build. See ``docs/README.md`` for more
information on building the docs and how docs get released.
Write clean code. Universally formatted code promotes ease of writing, reading,
and maintenance. Always run `gofmt -s -w file.go` on each changed file before
committing your changes. Most editors have plugins that do this automatically.
Pull requests descriptions should be as clear as possible and include a
reference to all the issues that they address.
Pull requests must not contain commits from other users or branches.
Commit messages must start with a capitalized and short summary (max. 50
chars) written in the imperative, followed by an optional, more detailed
explanatory text which is separated from the summary by an empty line.
Code review comments may be added to your pull request. Discuss, then make the
suggested modifications and push additional commits to your feature branch. Be
sure to post a comment after pushing. The new commits will show up in the pull
request automatically, but the reviewers will not be notified unless you
comment.
Before the pull request is merged, make sure that you squash your commits into
logical units of work using `git rebase -i` and `git push -f`. After every
commit the test suite should be passing. Include documentation changes in the
same commit so that a revert would remove all traces of the feature or fix.
Commits that fix or close an issue should include a reference like `Closes #XXX`
or `Fixes #XXX`, which will automatically close the issue when merged.
### Sign your work
The sign-off is a simple line at the end of the explanation for the
patch, which certifies that you wrote it or otherwise have the right to
pass it on as an open-source patch. The rules are pretty simple: if you
can certify the below (from
[developercertificate.org](http://developercertificate.org/)):
```
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```
then you just add a line to every git commit message:
Signed-off-by: Joe Smith <joe@gmail.com>
using your real name (sorry, no pseudonyms or anonymous contributions.)
You can add the sign off when creating the git commit via `git commit -s`.
ARG GO_VERSION=1.13
ARG BATS_VERSION=v1.1.0
ARG CRIU_VERSION=v3.13
FROM golang:${GO_VERSION}-buster
ARG DEBIAN_FRONTEND=noninteractive
RUN dpkg --add-architecture armel \
&& dpkg --add-architecture armhf \
&& dpkg --add-architecture arm64 \
&& dpkg --add-architecture ppc64el \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
build-essential \
crossbuild-essential-arm64 \
crossbuild-essential-armel \
crossbuild-essential-armhf \
crossbuild-essential-ppc64el \
curl \
gawk \
iptables \
jq \
kmod \
libaio-dev \
libcap-dev \
libnet-dev \
libnl-3-dev \
libprotobuf-c-dev \
libprotobuf-dev \
libseccomp-dev \
libseccomp-dev:arm64 \
libseccomp-dev:armel \
libseccomp-dev:armhf \
libseccomp-dev:ppc64el \
libseccomp2 \
pkg-config \
protobuf-c-compiler \
protobuf-compiler \
python-minimal \
sudo \
uidmap \
&& apt-get clean \
&& rm -rf /var/cache/apt /var/lib/apt/lists/*;
# Add a dummy user for the rootless integration tests. While runC does
# not require an entry in /etc/passwd to operate, one of the tests uses
# `git clone` -- and `git clone` does not allow you to clone a
# repository if the current uid does not have an entry in /etc/passwd.
RUN useradd -u1000 -m -d/home/rootless -s/bin/bash rootless
# install bats
ARG BATS_VERSION
RUN cd /tmp \
&& git clone https://github.com/bats-core/bats-core.git \
&& cd bats-core \
&& git reset --hard "${BATS_VERSION}" \
&& ./install.sh /usr/local \
&& rm -rf /tmp/bats-core
# install criu
ARG CRIU_VERSION
RUN mkdir -p /usr/src/criu \
&& curl -fsSL https://github.com/checkpoint-restore/criu/archive/${CRIU_VERSION}.tar.gz | tar -C /usr/src/criu/ -xz --strip-components=1 \
&& cd /usr/src/criu \
&& echo 1 > .gitid \
&& curl -sSL https://github.com/checkpoint-restore/criu/commit/4c27b3db4f4325a311d8bfa9a50ea3efb4d6e377.patch | patch -p1 \
&& curl -sSL https://github.com/checkpoint-restore/criu/commit/aac41164b2cd7f0d2047f207b32844524682e43f.patch | patch -p1 \
&& curl -sSL https://github.com/checkpoint-restore/criu/commit/6f19249b2565f3f7c0a1f8f65b4ae180e8f7f34b.patch | patch -p1 \
&& curl -sSL https://github.com/checkpoint-restore/criu/commit/378337a496ca759848180bc5411e4446298c5e4e.patch | patch -p1 \
&& make install-criu \
&& cd - \
&& rm -rf /usr/src/criu
COPY script/tmpmount /
WORKDIR /go/src/github.com/opencontainers/runc
ENTRYPOINT ["/tmpmount"]
# setup a playground for us to spawn containers in
COPY tests/integration/multi-arch.bash tests/integration/
ENV ROOTFS /busybox
RUN mkdir -p "${ROOTFS}"
RUN . tests/integration/multi-arch.bash \
&& curl -fsSL `get_busybox` | tar xfJC - "${ROOTFS}"
COPY . .
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
Copyright 2014 Docker, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Michael Crosby <michael@docker.com> (@crosbymichael)
Mrunal Patel <mpatel@redhat.com> (@mrunalp)
Daniel, Dao Quang Minh <dqminh89@gmail.com> (@dqminh)
Qiang Huang <h.huangqiang@huawei.com> (@hqhq)
Aleksa Sarai <asarai@suse.de> (@cyphar)
Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> (@AkihiroSuda)
Kir Kolyshkin <kolyshkin@gmail.com> (@kolyshkin)
## Introduction
Dear maintainer. Thank you for investing the time and energy to help
make runc as useful as possible. Maintaining a project is difficult,
sometimes unrewarding work. Sure, you will get to contribute cool
features to the project. But most of your time will be spent reviewing,
cleaning up, documenting, answering questions, justifying design
decisions - while everyone has all the fun! But remember - the quality
of the maintainers work is what distinguishes the good projects from the
great. So please be proud of your work, even the unglamorous parts,
and encourage a culture of appreciation and respect for *every* aspect
of improving the project - not just the hot new features.
This document is a manual for maintainers old and new. It explains what
is expected of maintainers, how they should work, and what tools are
available to them.
This is a living document - if you see something out of date or missing,
speak up!
## What are a maintainer's responsibility?
It is every maintainer's responsibility to:
* 1) Expose a clear roadmap for improving their component.
* 2) Deliver prompt feedback and decisions on pull requests.
* 3) Be available to anyone with questions, bug reports, criticism etc.
on their component. This includes IRC and GitHub issues and pull requests.
* 4) Make sure their component respects the philosophy, design and
roadmap of the project.
## How are decisions made?
Short answer: with pull requests to the runc repository.
runc is an open-source project with an open design philosophy. This
means that the repository is the source of truth for EVERY aspect of the
project, including its philosophy, design, roadmap and APIs. *If it's
part of the project, it's in the repo. It's in the repo, it's part of
the project.*
As a result, all decisions can be expressed as changes to the
repository. An implementation change is a change to the source code. An
API change is a change to the API specification. A philosophy change is
a change to the philosophy manifesto. And so on.
All decisions affecting runc, big and small, follow the same 3 steps:
* Step 1: Open a pull request. Anyone can do this.
* Step 2: Discuss the pull request. Anyone can do this.
* Step 3: Accept (`LGTM`) or refuse a pull request. The relevant maintainers do
this (see below "Who decides what?")
*I'm a maintainer, should I make pull requests too?*
Yes. Nobody should ever push to master directly. All changes should be
made through a pull request.
## Who decides what?
All decisions are pull requests, and the relevant maintainers make
decisions by accepting or refusing the pull request. Review and acceptance
by anyone is denoted by adding a comment in the pull request: `LGTM`.
However, only currently listed `MAINTAINERS` are counted towards the required
two LGTMs.
Overall the maintainer system works because of mutual respect across the
maintainers of the project. The maintainers trust one another to make decisions
in the best interests of the project. Sometimes maintainers can disagree and
this is part of a healthy project to represent the point of views of various people.
In the case where maintainers cannot find agreement on a specific change the
role of a Chief Maintainer comes into play.
The Chief Maintainer for the project is responsible for overall architecture
of the project to maintain conceptual integrity. Large decisions and
architecture changes should be reviewed by the chief maintainer.
The current chief maintainer for the project is Michael Crosby (@crosbymichael).
Even though the maintainer system is built on trust, if there is a conflict
with the chief maintainer on a decision, their decision can be challenged
and brought to the technical oversight board if two-thirds of the
maintainers vote for an appeal. It is expected that this would be a
very exceptional event.
### How are maintainers added?
The best maintainers have a vested interest in the project. Maintainers
are first and foremost contributors that have shown they are committed to
the long term success of the project. Contributors wanting to become
maintainers are expected to be deeply involved in contributing code,
pull request review, and triage of issues in the project for more than two months.
Just contributing does not make you a maintainer, it is about building trust
with the current maintainers of the project and being a person that they can
depend on and trust to make decisions in the best interest of the project. The
final vote to add a new maintainer should be approved by over 66% of the current
maintainers with the chief maintainer having veto power. In case of a veto,
conflict resolution rules expressed above apply. The voting period is
five business days on the Pull Request to add the new maintainer.
### What is expected of maintainers?
Part of a healthy project is to have active maintainers to support the community
in contributions and perform tasks to keep the project running. Maintainers are
expected to be able to respond in a timely manner if their help is required on specific
issues where they are pinged. Being a maintainer is a time consuming commitment and should
not be taken lightly.
When a maintainer is unable to perform the required duties they can be removed with
a vote by 66% of the current maintainers with the chief maintainer having veto power.
The voting period is ten business days. Issues related to a maintainer's performance should
be discussed with them among the other maintainers so that they are not surprised by
a pull request removing them.
CONTAINER_ENGINE := docker
GO := go
PREFIX := $(DESTDIR)/usr/local
BINDIR := $(PREFIX)/sbin
MANDIR := $(PREFIX)/share/man
GIT_BRANCH := $(shell git rev-parse --abbrev-ref HEAD 2>/dev/null)
GIT_BRANCH_CLEAN := $(shell echo $(GIT_BRANCH) | sed -e "s/[^[:alnum:]]/-/g")
RUNC_IMAGE := runc_dev$(if $(GIT_BRANCH_CLEAN),:$(GIT_BRANCH_CLEAN))
PROJECT := github.com/opencontainers/runc
BUILDTAGS ?= seccomp selinux apparmor
COMMIT_NO := $(shell git rev-parse HEAD 2> /dev/null || true)
COMMIT ?= $(if $(shell git status --porcelain --untracked-files=no),"$(COMMIT_NO)-dirty","$(COMMIT_NO)")
VERSION := $(shell cat ./VERSION)
# TODO: rm -mod=vendor once go 1.13 is unsupported
ifneq ($(GO111MODULE),off)
MOD_VENDOR := "-mod=vendor"
endif
ifeq ($(DEBUG),1)
GCFLAGS=-gcflags "-N -l"
endif
PROTO_DIR := libenclave/proto
GO_BUILD := $(GO) build $(MOD_VENDOR) -buildmode=pie $(GCFLAGS) $(EXTRA_FLAGS) -tags "$(BUILDTAGS)" \
-ldflags "-X main.gitCommit=$(COMMIT) -X main.version=$(VERSION) $(EXTRA_LDFLAGS)"
GO_BUILD_STATIC := CGO_ENABLED=1 $(GO) build $(MOD_VENDOR) $(GCFLAGS) $(EXTRA_FLAGS) -tags "$(BUILDTAGS) netgo osusergo" \
-ldflags "-w -extldflags -static -X main.gitCommit=$(COMMIT) -X main.version=$(VERSION) $(EXTRA_LDFLAGS)"
.DEFAULT: rune
rune: $(patsubst %.proto,%.pb.go,$(wildcard $(PROTO_DIR)/*.proto))
$(GO_BUILD) -o rune .
%.pb.go: %.proto
@protoc --proto_path=$(shell dirname $^) --go_out=$(shell dirname $^) $^
all: rune recvtty skeleton
recvtty:
$(GO_BUILD) -o contrib/cmd/recvtty/recvtty ./contrib/cmd/recvtty
skeleton: libenclave/internal/runtime/pal/skeleton/liberpal-skeleton.so
libenclave/internal/runtime/pal/skeleton/liberpal-skeleton.so:
gcc -fPIC -shared -Wall -Wno-unused-const-variable -Ilibenclave/pal -o $@ libenclave/internal/runtime/pal/skeleton/liberpal-skeleton.c
static:
$(GO_BUILD_STATIC) -o rune .
$(GO_BUILD_STATIC) -o contrib/cmd/recvtty/recvtty ./contrib/cmd/recvtty
release:
script/release.sh -r release/$(VERSION) -v $(VERSION)
dbuild: runcimage
$(CONTAINER_ENGINE) run $(CONTAINER_ENGINE_RUN_FLAGS) \
--privileged --rm \
-v $(CURDIR):/go/src/$(PROJECT) \
$(RUNC_IMAGE) make clean all
lint:
$(GO) vet ./...
$(GO) fmt ./...
man:
man/md2man-all.sh
runcimage:
$(CONTAINER_ENGINE) build $(CONTAINER_ENGINE_BUILD_FLAGS) -t $(RUNC_IMAGE) .
test: unittest integration rootlessintegration
localtest: localunittest localintegration localrootlessintegration
unittest: runcimage
$(CONTAINER_ENGINE) run $(CONTAINER_ENGINE_RUN_FLAGS) \
-t --privileged --rm \
-v /lib/modules:/lib/modules:ro \
-v $(CURDIR):/go/src/$(PROJECT) \
$(RUNC_IMAGE) make localunittest TESTFLAGS=$(TESTFLAGS)
localunittest: all
$(GO) test $(MOD_VENDOR) -timeout 3m -tags "$(BUILDTAGS)" $(TESTFLAGS) -v ./...
integration: runcimage
$(CONTAINER_ENGINE) run $(CONTAINER_ENGINE_RUN_FLAGS) \
-t --privileged --rm \
-v /lib/modules:/lib/modules:ro \
-v $(CURDIR):/go/src/$(PROJECT) \
$(RUNC_IMAGE) make localintegration TESTPATH=$(TESTPATH)
localintegration: all
bats -t tests/integration$(TESTPATH)
rootlessintegration: runcimage
$(CONTAINER_ENGINE) run $(CONTAINER_ENGINE_RUN_FLAGS) \
-t --privileged --rm \
-v $(CURDIR):/go/src/$(PROJECT) \
-e ROOTLESS_TESTPATH \
$(RUNC_IMAGE) make localrootlessintegration
localrootlessintegration: all
tests/rootless.sh
shell: runcimage
$(CONTAINER_ENGINE) run $(CONTAINER_ENGINE_RUN_FLAGS) \
-ti --privileged --rm \
-v $(CURDIR):/go/src/$(PROJECT) \
$(RUNC_IMAGE) bash
install:
install -D -m0755 rune $(BINDIR)/rune
install-bash:
install -D -m0644 contrib/completions/bash/rune $(PREFIX)/share/bash-completion/completions/rune
install-man: man
install -d -m 755 $(MANDIR)/man8
install -D -m 644 man/man8/*.8 $(MANDIR)/man8
clean:
rm -f rune rune-*
rm -f contrib/cmd/recvtty/recvtty
rm -rf release
rm -rf man/man8
rm -f libenclave/proto/*.pb.go
rm -f libenclave/internal/runtime/pal/skeleton/liberpal-skeleton.so
validate:
script/validate-gofmt
script/validate-c
$(GO) vet ./...
ci: validate test release
vendor:
export GO111MODULE=on \
$(GO) mod tidy && \
$(GO) mod vendor && \
$(GO) mod verify
verify-dependencies: vendor
@test -z "$$(git status --porcelain -- go.mod go.sum vendor/)" \
|| (echo -e "git status:\n $$(git status -- go.mod go.sum vendor/)\nerror: vendor/, go.mod and/or go.sum not up to date. Run \"make vendor\" to update"; exit 1) \
&& echo "all vendor files are up to date."
cross: runcimage
$(CONTAINER_ENGINE) run $(CONTAINER_ENGINE_RUN_FLAGS) \
-e BUILDTAGS="$(BUILDTAGS)" --rm \
-v $(CURDIR):/go/src/$(PROJECT) \
$(RUNC_IMAGE) make localcross
localcross:
CGO_ENABLED=1 GOARCH=arm GOARM=6 CC=arm-linux-gnueabi-gcc $(GO_BUILD) -o runc-armel .
CGO_ENABLED=1 GOARCH=arm GOARM=7 CC=arm-linux-gnueabihf-gcc $(GO_BUILD) -o runc-armhf .
CGO_ENABLED=1 GOARCH=arm64 CC=aarch64-linux-gnu-gcc $(GO_BUILD) -o runc-arm64 .
CGO_ENABLED=1 GOARCH=ppc64le CC=powerpc64le-linux-gnu-gcc $(GO_BUILD) -o runc-ppc64le .
.PHONY: rune all recvtty static release dbuild lint man runcimage \
test localtest unittest localunittest integration localintegration \
rootlessintegration localrootlessintegration shell install install-bash \
install-man clean validate ci \
vendor verify-dependencies cross localcross skeleton
runc
Copyright 2012-2015 Docker, Inc.
This product includes software developed at Docker, Inc. (http://www.docker.com).
The following is courtesy of our legal counsel:
Use and transfer of Docker may be subject to certain restrictions by the
United States and other governments.
It is your responsibility to ensure that your use and/or transfer does not
violate applicable laws.
For more information, please see http://www.bis.doc.gov
See also http://www.apache.org/dev/crypto.html and/or seek legal counsel.
# runc principles
In the design and development of runc and libcontainer we try to follow these principles:
(Work in progress)
* Don't try to replace every tool. Instead, be an ingredient to improve them.
* Less code is better.
* Fewer components are better. Do you really need to add one more class?
* 50 lines of straightforward, readable code is better than 10 lines of magic that nobody can understand.
* Don't do later what you can do now. "//TODO: refactor" is not acceptable in new code.
* When hesitating between two options, choose the one that is easier to reverse.
* "No" is temporary; "Yes" is forever. If you're not sure about a new feature, say no. You can change your mind later.
* Containers must be portable to the greatest possible number of machines. Be suspicious of any change which makes machines less interchangeable.
* The fewer moving parts in a container, the better.
* Don't merge it unless you document it.
* Don't document it unless you can keep it up-to-date.
* Don't merge it unless you test it!
* Everyone's problem is slightly different. Focus on the part that is the same for everyone, and solve that.
# runc
[![Build Status](https://travis-ci.org/opencontainers/runc.svg?branch=master)](https://travis-ci.org/opencontainers/runc)
[![Go Report Card](https://goreportcard.com/badge/github.com/opencontainers/runc)](https://goreportcard.com/report/github.com/opencontainers/runc)
[![GoDoc](https://godoc.org/github.com/opencontainers/runc?status.svg)](https://godoc.org/github.com/opencontainers/runc)
## Introduction
`runc` is a CLI tool for spawning and running containers according to the OCI specification.
## Releases
`runc` depends on and tracks the [runtime-spec](https://github.com/opencontainers/runtime-spec) repository.
We will try to make sure that `runc` and the OCI specification major versions stay in lockstep.
This means that `runc` 1.0.0 should implement the 1.0 version of the specification.
You can find official releases of `runc` on the [release](https://github.com/opencontainers/runc/releases) page.
Currently, the following features are not considered to be production-ready:
* Support for cgroup v2
## Security
The reporting process and disclosure communications are outlined [here](https://github.com/opencontainers/org/blob/master/SECURITY.md).
### Security Audit
A third party security audit was performed by Cure53, you can see the full report [here](https://github.com/opencontainers/runc/blob/master/docs/Security-Audit.pdf).
## Building
`runc` currently supports the Linux platform with various architecture support.
It must be built with Go version 1.13 or higher.
In order to enable seccomp support you will need to install `libseccomp` on your platform.
> e.g. `libseccomp-devel` for CentOS, or `libseccomp-dev` for Ubuntu
```bash
# create a 'github.com/opencontainers' in your GOPATH/src
cd github.com/opencontainers
git clone https://github.com/opencontainers/runc
cd runc
make
sudo make install
```
You can also use `go get` to install to your `GOPATH`, assuming that you have a `github.com` parent folder already created under `src`:
```bash
go get github.com/opencontainers/runc
cd $GOPATH/src/github.com/opencontainers/runc
make
sudo make install
```
`runc` will be installed to `/usr/local/sbin/runc` on your system.
#### Build Tags
`runc` supports optional build tags for compiling support of various features,
with some of them enabled by default (see `BUILDTAGS` in top-level `Makefile`).
To change build tags from the default, set the `BUILDTAGS` variable for make,
e.g.
```bash
make BUILDTAGS='seccomp apparmor'
```
| Build Tag | Feature | Enabled by default | Dependency |
|-----------|------------------------------------|--------------------|------------|
| seccomp | Syscall filtering | yes | libseccomp |
| selinux | selinux process and mount labeling | yes | <none> |
| apparmor | apparmor profile support | yes | <none> |
| nokmem | disable kernel memory accounting | no | <none> |
### Running the test suite
`runc` currently supports running its test suite via Docker.
To run the suite just type `make test`.
```bash
make test
```
There are additional make targets for running the tests outside of a container but this is not recommended as the tests are written with the expectation that they can write and remove anywhere.
You can run a specific test case by setting the `TESTFLAGS` variable.
```bash
# make test TESTFLAGS="-run=SomeTestFunction"
```
You can run a specific integration test by setting the `TESTPATH` variable.
```bash
# make test TESTPATH="/checkpoint.bats"
```
You can run a specific rootless integration test by setting the `ROOTLESS_TESTPATH` variable.
```bash
# make test ROOTLESS_TESTPATH="/checkpoint.bats"
```
You can run a test using your container engine's flags by setting `CONTAINER_ENGINE_BUILD_FLAGS` and `CONTAINER_ENGINE_RUN_FLAGS` variables.
```bash
# make test CONTAINER_ENGINE_BUILD_FLAGS="--build-arg http_proxy=http://yourproxy/" CONTAINER_ENGINE_RUN_FLAGS="-e http_proxy=http://yourproxy/"
```
### Dependencies Management
`runc` uses [Go Modules](https://github.com/golang/go/wiki/Modules) for dependencies management.
Please refer to [Go Modules](https://github.com/golang/go/wiki/Modules) for how to add or update
new dependencies. When updating dependencies, be sure that you are running Go `1.14` or newer.
```
# Update vendored dependencies
make vendor
# Verify all dependencies
make verify-dependencies
```
## Using runc
### Creating an OCI Bundle
In order to use runc you must have your container in the format of an OCI bundle.
If you have Docker installed you can use its `export` method to acquire a root filesystem from an existing Docker container.
```bash
# create the top most bundle directory
mkdir /mycontainer
cd /mycontainer
# create the rootfs directory
mkdir rootfs
# export busybox via Docker into the rootfs directory
docker export $(docker create busybox) | tar -C rootfs -xvf -
```
After a root filesystem is populated you just generate a spec in the format of a `config.json` file inside your bundle.
`runc` provides a `spec` command to generate a base template spec that you are then able to edit.
To find features and documentation for fields in the spec please refer to the [specs](https://github.com/opencontainers/runtime-spec) repository.
```bash
runc spec
```
### Running Containers
Assuming you have an OCI bundle from the previous step you can execute the container in two different ways.
The first way is to use the convenience command `run` that will handle creating, starting, and deleting the container after it exits.
```bash
# run as root
cd /mycontainer
runc run mycontainerid
```
If you used the unmodified `runc spec` template this should give you a `sh` session inside the container.
The second way to start a container is using the specs lifecycle operations.
This gives you more power over how the container is created and managed while it is running.
This will also launch the container in the background so you will have to edit the `config.json` to remove the `terminal` setting for the simple examples here.
Your process field in the `config.json` should look like this below with `"terminal": false` and `"args": ["sleep", "5"]`.
```json
"process": {
"terminal": false,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"sleep", "5"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"inheritable": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"ambient": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"noNewPrivileges": true
},
```
Now we can go through the lifecycle operations in your shell.
```bash
# run as root
cd /mycontainer
runc create mycontainerid
# view the container is created and in the "created" state
runc list
# start the process inside the container
runc start mycontainerid
# after 5 seconds view that the container has exited and is now in the stopped state
runc list
# now delete the container
runc delete mycontainerid
```
This allows higher level systems to augment the containers creation logic with setup of various settings after the container is created and/or before it is deleted. For example, the container's network stack is commonly set up after `create` but before `start`.
#### Rootless containers
`runc` has the ability to run containers without root privileges. This is called `rootless`. You need to pass some parameters to `runc` in order to run rootless containers. See below and compare with the previous version.
**Note:** In order to use this feature, "User Namespaces" must be compiled and enabled in your kernel. There are various ways to do this depending on your distribution:
- Confirm `CONFIG_USER_NS=y` is set in your kernel configuration (normally found in `/proc/config.gz`)
- Arch/Debian: `echo 1 > /proc/sys/kernel/unprivileged_userns_clone`
- RHEL/CentOS 7: `echo 28633 > /proc/sys/user/max_user_namespaces`
Run the following commands as an ordinary user:
```bash
# Same as the first example
mkdir ~/mycontainer
cd ~/mycontainer
mkdir rootfs
docker export $(docker create busybox) | tar -C rootfs -xvf -
# The --rootless parameter instructs runc spec to generate a configuration for a rootless container, which will allow you to run the container as a non-root user.
runc spec --rootless
# The --root parameter tells runc where to store the container state. It must be writable by the user.
runc --root /tmp/runc run mycontainerid
```
#### Supervisors
`runc` can be used with process supervisors and init systems to ensure that containers are restarted when they exit.
An example systemd unit file looks something like this.
```systemd
[Unit]
Description=Start My Container
[Service]
Type=forking
ExecStart=/usr/local/sbin/runc run -d --pid-file /run/mycontainerid.pid mycontainerid
ExecStopPost=/usr/local/sbin/runc delete mycontainerid
WorkingDirectory=/mycontainer
PIDFile=/run/mycontainerid.pid
[Install]
WantedBy=multi-user.target
```
## License
The code and docs are released under the [Apache 2.0 license](LICENSE).
# Security
The reporting process and disclosure communications are outlined [here](https://github.com/opencontainers/org/blob/master/SECURITY.md).
1.0.0-rc10+dev
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
# Fedora box is used for testing cgroup v2 support
config.vm.box = "fedora/32-cloud-base"
config.vm.provider :virtualbox do |v|
v.memory = 2048
v.cpus = 2
end
config.vm.provider :libvirt do |v|
v.memory = 2048
v.cpus = 2
end
config.vm.provision "shell", inline: <<-SHELL
cat << EOF | dnf -y shell
update
install iptables gcc make golang-go libseccomp-devel bats jq \
patch protobuf protobuf-c protobuf-c-compiler protobuf-c-devel protobuf-compiler \
protobuf-devel libnl3-devel libcap-devel libnet-devel \
nftables-devel libbsd-devel gnutls-devel
ts run
EOF
dnf clean all
# Add a user for rootless tests
useradd -u2000 -m -d/home/rootless -s/bin/bash rootless
# Add busybox for libcontainer/integration tests
. /vagrant/tests/integration/multi-arch.bash \
&& mkdir /busybox \
&& curl -fsSL $(get_busybox) | tar xfJC - /busybox
# Apr 25, 2020 (master)
( git clone https://github.com/checkpoint-restore/criu.git /usr/src/criu \
&& cd /usr/src/criu \
&& git checkout 5c5e7695a51318b17e3d982df8231ac83971641c \
&& make install-criu )
rm -rf /usr/src/criu
SHELL
end
// +build linux
package main
import (
"fmt"
"os"
"strconv"
"strings"
"github.com/opencontainers/runc/libcontainer"
"github.com/opencontainers/runc/libcontainer/system"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/sirupsen/logrus"
"github.com/urfave/cli"
"golang.org/x/sys/unix"
)
var checkpointCommand = cli.Command{
Name: "checkpoint",
Usage: "checkpoint a running container",
ArgsUsage: `<container-id>
Where "<container-id>" is the name for the instance of the container to be
checkpointed.`,
Description: `The checkpoint command saves the state of the container instance.`,
Flags: []cli.Flag{
cli.StringFlag{Name: "image-path", Value: "", Usage: "path for saving criu image files"},
cli.StringFlag{Name: "work-path", Value: "", Usage: "path for saving work files and logs"},
cli.StringFlag{Name: "parent-path", Value: "", Usage: "path for previous criu image files in pre-dump"},
cli.BoolFlag{Name: "leave-running", Usage: "leave the process running after checkpointing"},
cli.BoolFlag{Name: "tcp-established", Usage: "allow open tcp connections"},
cli.BoolFlag{Name: "ext-unix-sk", Usage: "allow external unix sockets"},
cli.BoolFlag{Name: "shell-job", Usage: "allow shell jobs"},
cli.BoolFlag{Name: "lazy-pages", Usage: "use userfaultfd to lazily restore memory pages"},
cli.StringFlag{Name: "status-fd", Value: "", Usage: "criu writes \\0 to this FD once lazy-pages is ready"},
cli.StringFlag{Name: "page-server", Value: "", Usage: "ADDRESS:PORT of the page server"},
cli.BoolFlag{Name: "file-locks", Usage: "handle file locks, for safety"},
cli.BoolFlag{Name: "pre-dump", Usage: "dump container's memory information only, leave the container running after this"},
cli.StringFlag{Name: "manage-cgroups-mode", Value: "", Usage: "cgroups mode: 'soft' (default), 'full' and 'strict'"},
cli.StringSliceFlag{Name: "empty-ns", Usage: "create a namespace, but don't restore its properties"},
cli.BoolFlag{Name: "auto-dedup", Usage: "enable auto deduplication of memory images"},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, exactArgs); err != nil {
return err
}
// XXX: Currently this is untested with rootless containers.
if os.Geteuid() != 0 || system.RunningInUserNS() {
logrus.Warn("runc checkpoint is untested with rootless containers")
}
container, err := getContainer(context)
if err != nil {
return err
}
status, err := container.Status()
if err != nil {
return err
}
if status == libcontainer.Created || status == libcontainer.Stopped {
fatalf("Container cannot be checkpointed in %s state", status.String())
}
options := criuOptions(context)
if !(options.LeaveRunning || options.PreDump) {
// destroy container unless we tell CRIU to keep it
defer destroy(container)
}
// these are the mandatory criu options for a container
setPageServer(context, options)
setManageCgroupsMode(context, options)
if err := setEmptyNsMask(context, options); err != nil {
return err
}
return container.Checkpoint(options)
},
}
func getCheckpointImagePath(context *cli.Context) string {
imagePath := context.String("image-path")
if imagePath == "" {
imagePath = getDefaultImagePath(context)
}
return imagePath
}
func setPageServer(context *cli.Context, options *libcontainer.CriuOpts) {
// xxx following criu opts are optional
// The dump image can be sent to a criu page server
if psOpt := context.String("page-server"); psOpt != "" {
addressPort := strings.Split(psOpt, ":")
if len(addressPort) != 2 {
fatal(fmt.Errorf("Use --page-server ADDRESS:PORT to specify page server"))
}
portInt, err := strconv.Atoi(addressPort[1])
if err != nil {
fatal(fmt.Errorf("Invalid port number"))
}
options.PageServer = libcontainer.CriuPageServerInfo{
Address: addressPort[0],
Port: int32(portInt),
}
}
}
func setManageCgroupsMode(context *cli.Context, options *libcontainer.CriuOpts) {
if cgOpt := context.String("manage-cgroups-mode"); cgOpt != "" {
switch cgOpt {
case "soft":
options.ManageCgroupsMode = libcontainer.CRIU_CG_MODE_SOFT
case "full":
options.ManageCgroupsMode = libcontainer.CRIU_CG_MODE_FULL
case "strict":
options.ManageCgroupsMode = libcontainer.CRIU_CG_MODE_STRICT
default:
fatal(fmt.Errorf("Invalid manage cgroups mode"))
}
}
}
var namespaceMapping = map[specs.LinuxNamespaceType]int{
specs.NetworkNamespace: unix.CLONE_NEWNET,
}
func setEmptyNsMask(context *cli.Context, options *libcontainer.CriuOpts) error {
/* Runc doesn't manage network devices and their configuration */
nsmask := unix.CLONE_NEWNET
for _, ns := range context.StringSlice("empty-ns") {
f, exists := namespaceMapping[specs.LinuxNamespaceType(ns)]
if !exists {
return fmt.Errorf("namespace %q is not supported", ns)
}
nsmask |= f
}
options.EmptyNs = uint32(nsmask)
return nil
}
/*
* Copyright 2016 SUSE LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package main
import (
"fmt"
"io"
"io/ioutil"
"net"
"os"
"strings"
"github.com/containerd/console"
"github.com/opencontainers/runc/libcontainer/utils"
"github.com/urfave/cli"
)
// version will be populated by the Makefile, read from
// VERSION file of the source code.
var version = ""
// gitCommit will be the hash that the binary was built from
// and will be populated by the Makefile
var gitCommit = ""
const (
usage = `Open Container Initiative contrib/cmd/recvtty
recvtty is a reference implementation of a consumer of runC's --console-socket
API. It has two main modes of operation:
* single: Only permit one terminal to be sent to the socket, which is
then hooked up to the stdio of the recvtty process. This is useful
for rudimentary shell management of a container.
* null: Permit as many terminals to be sent to the socket, but they
are read to /dev/null. This is used for testing, and imitates the
old runC API's --console=/dev/pts/ptmx hack which would allow for a
similar trick. This is probably not what you want to use, unless
you're doing something like our bats integration tests.
To use recvtty, just specify a socket path at which you want to receive
terminals:
$ recvtty [--mode <single|null>] socket.sock
`
)
func bail(err error) {
fmt.Fprintf(os.Stderr, "[recvtty] fatal error: %v\n", err)
os.Exit(1)
}
func handleSingle(path string) error {
// Open a socket.
ln, err := net.Listen("unix", path)
if err != nil {
return err
}
defer ln.Close()
// We only accept a single connection, since we can only really have
// one reader for os.Stdin. Plus this is all a PoC.
conn, err := ln.Accept()
if err != nil {
return err
}
defer conn.Close()
// Close ln, to allow for other instances to take over.
ln.Close()
// Get the fd of the connection.
unixconn, ok := conn.(*net.UnixConn)
if !ok {
return fmt.Errorf("failed to cast to unixconn")
}
socket, err := unixconn.File()
if err != nil {
return err
}
defer socket.Close()
// Get the master file descriptor from runC.
master, err := utils.RecvFd(socket)
if err != nil {
return err
}
c, err := console.ConsoleFromFile(master)
if err != nil {
return err
}
console.ClearONLCR(c.Fd())
// Copy from our stdio to the master fd.
quitChan := make(chan struct{})
go func() {
io.Copy(os.Stdout, c)
quitChan <- struct{}{}
}()
go func() {
io.Copy(c, os.Stdin)
quitChan <- struct{}{}
}()
// Only close the master fd once we've stopped copying.
<-quitChan
c.Close()
return nil
}
func handleNull(path string) error {
// Open a socket.
ln, err := net.Listen("unix", path)
if err != nil {
return err
}
defer ln.Close()
// As opposed to handleSingle we accept as many connections as we get, but
// we don't interact with Stdin at all (and we copy stdout to /dev/null).
for {
conn, err := ln.Accept()
if err != nil {
return err
}
go func(conn net.Conn) {
// Don't leave references lying around.
defer conn.Close()
// Get the fd of the connection.
unixconn, ok := conn.(*net.UnixConn)
if !ok {
return
}
socket, err := unixconn.File()
if err != nil {
return
}
defer socket.Close()
// Get the master file descriptor from runC.
master, err := utils.RecvFd(socket)
if err != nil {
return
}
// Just do a dumb copy to /dev/null.
devnull, err := os.OpenFile("/dev/null", os.O_RDWR, 0)
if err != nil {
// TODO: Handle this nicely.
return
}
io.Copy(devnull, master)
devnull.Close()
}(conn)
}
}
func main() {
app := cli.NewApp()
app.Name = "recvtty"
app.Usage = usage
// Set version to be the same as runC.
var v []string
if version != "" {
v = append(v, version)
}
if gitCommit != "" {
v = append(v, fmt.Sprintf("commit: %s", gitCommit))
}
app.Version = strings.Join(v, "\n")
// Set the flags.
app.Flags = []cli.Flag{
cli.StringFlag{
Name: "mode, m",
Value: "single",
Usage: "Mode of operation (single or null)",
},
cli.StringFlag{
Name: "pid-file",
Value: "",
Usage: "Path to write daemon process ID to",
},
}
app.Action = func(ctx *cli.Context) error {
args := ctx.Args()
if len(args) != 1 {
return fmt.Errorf("need to specify a single socket path")
}
path := ctx.Args()[0]
pidPath := ctx.String("pid-file")
if pidPath != "" {
pid := fmt.Sprintf("%d\n", os.Getpid())
if err := ioutil.WriteFile(pidPath, []byte(pid), 0644); err != nil {
return err
}
}
switch ctx.String("mode") {
case "single":
if err := handleSingle(path); err != nil {
return err
}
case "null":
if err := handleNull(path); err != nil {
return err
}
default:
return fmt.Errorf("need to select a valid mode: %s", ctx.String("mode"))
}
return nil
}
if err := app.Run(os.Args); err != nil {
bail(err)
}
}
#!/bin/bash
#
# bash completion file for runc command
#
# This script provides completion of:
# - commands and their options
# - filepaths
#
# To enable the completions either:
# - place this file in /usr/share/bash-completion/completions
# or
# - copy this file to e.g. ~/.runc-completion.sh and add the line
# below to your .bashrc after bash completion features are loaded
# . ~/.runc-completion.sh
#
# Configuration:
#
# Note for developers:
# Please arrange options sorted alphabetically by long name with the short
# options immediately following their corresponding long form.
# This order should be applied to lists, alternatives and code blocks.
__runc_previous_extglob_setting=$(shopt -p extglob)
shopt -s extglob
__runc_list_all() {
COMPREPLY=($(compgen -W "$(runc list -q)" -- $cur))
}
__runc_pos_first_nonflag() {
local argument_flags=$1
local counter=$((${subcommand_pos:-${command_pos}} + 1))
while [ $counter -le $cword ]; do
if [ -n "$argument_flags" ] && eval "case '${words[$counter]}' in $argument_flags) true ;; *) false ;; esac"; then
((counter++))
else
case "${words[$counter]}" in
-*) ;;
*)
break
;;
esac
fi
((counter++))
done
echo $counter
}
# Transforms a multiline list of strings into a single line string
# with the words separated by "|".
# This is used to prepare arguments to __runc_pos_first_nonflag().
__runc_to_alternatives() {
local parts=($1)
local IFS='|'
echo "${parts[*]}"
}
# Transforms a multiline list of options into an extglob pattern
# suitable for use in case statements.
__runc_to_extglob() {
local extglob=$(__runc_to_alternatives "$1")
echo "@($extglob)"
}
# Subcommand processing.
# Locates the first occurrence of any of the subcommands contained in the
# first argument. In case of a match, calls the corresponding completion
# function and returns 0.
# If no match is found, 1 is returned. The calling function can then
# continue processing its completion.
#
# TODO if the preceding command has options that accept arguments and an
# argument is equal to one of the subcommands, this is falsely detected as
# a match.
__runc_subcommands() {
local subcommands="$1"
local counter=$(($command_pos + 1))
while [ $counter -lt $cword ]; do
case "${words[$counter]}" in
$(__runc_to_extglob "$subcommands"))
subcommand_pos=$counter
local subcommand=${words[$counter]}
local completions_func=_runc_${command}_${subcommand}
declare -F $completions_func >/dev/null && $completions_func
return 0
;;
esac
((counter++))
done
return 1
}
# List all Signals
__runc_list_signals() {
COMPREPLY=($(compgen -W "$(for i in $(kill -l | xargs); do echo $i; done | grep SIG)"))
}
# suppress trailing whitespace
__runc_nospace() {
# compopt is not available in ancient bash versions
type compopt &>/dev/null && compopt -o nospace
}
# The list of capabilities is defined in types.go, ALL was added manually.
__runc_complete_capabilities() {
COMPREPLY=($(compgen -W "
ALL
AUDIT_CONTROL
AUDIT_WRITE
AUDIT_READ
BLOCK_SUSPEND
CHOWN
DAC_OVERRIDE
DAC_READ_SEARCH
FOWNER
FSETID
IPC_LOCK
IPC_OWNER
KILL
LEASE
LINUX_IMMUTABLE
MAC_ADMIN
MAC_OVERRIDE
MKNOD
NET_ADMIN
NET_BIND_SERVICE
NET_BROADCAST
NET_RAW
SETFCAP
SETGID
SETPCAP
SETUID
SYS_ADMIN
SYS_BOOT
SYS_CHROOT
SYSLOG
SYS_MODULE
SYS_NICE
SYS_PACCT
SYS_PTRACE
SYS_RAWIO
SYS_RESOURCE
SYS_TIME
SYS_TTY_CONFIG
WAKE_ALARM
" -- "$cur"))
}
_runc_exec() {
local boolean_options="
--help
--no-new-privs
--tty, -t
--detach, -d
"
local options_with_args="
--console-socket
--cwd
--env, -e
--user, -u
--additional-gids, -g
--process, -p
--pid-file
--process-label
--apparmor
--cap, -c
--preserve-fds
"
local all_options="$options_with_args $boolean_options"
case "$prev" in
--cap | -c)
__runc_complete_capabilities
return
;;
--console-socket | --cwd | --process | --apparmor)
case "$cur" in
*:*) ;; # TODO somehow do _filedir for stuff inside the image, if it's already specified (which is also somewhat difficult to determine)
'')
COMPREPLY=($(compgen -W '/' -- "$cur"))
__runc_nospace
;;
/*)
_filedir
__runc_nospace
;;
esac
return
;;
--env | -e)
COMPREPLY=($(compgen -e -- "$cur"))
__runc_nospace
return
;;
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$all_options" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
# global options that may appear after the runc command
_runc_runc() {
local boolean_options="
$global_boolean_options
--help
--version -v
--debug
"
local options_with_args="
--log
--log-format
--root
--criu
--rootless
"
case "$prev" in
--log | --root | --criu)
case "$cur" in
*:*) ;; # TODO somehow do _filedir for stuff inside the image, if it's already specified (which is also somewhat difficult to determine)
'')
COMPREPLY=($(compgen -W '/' -- "$cur"))
__runc_nospace
;;
*)
_filedir
__runc_nospace
;;
esac
return
;;
--log-format)
COMPREPLY=($(compgen -W 'text json' -- "$cur"))
return
;;
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
local counter=$(__runc_pos_first_nonflag $(__runc_to_extglob "$options_with_args"))
if [ $cword -eq $counter ]; then
COMPREPLY=($(compgen -W "${commands[*]} help" -- "$cur"))
fi
;;
esac
}
_runc_pause() {
local boolean_options="
--help
-h
"
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_ps() {
local boolean_options="
--help
-h
"
local options_with_args="
--format, -f
"
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_delete() {
local boolean_options="
--help
-h
--format, -f
"
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_kill() {
local boolean_options="
--help
-h
--all
-a
"
case "$prev" in
"kill")
__runc_list_all
return
;;
*)
__runc_list_signals
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_events() {
local boolean_options="
--help
--stats
"
local options_with_args="
--interval
"
case "$prev" in
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_list() {
local boolean_options="
--help
--quiet
-q
"
local options_with_args="
--format
-f
"
case "$prev" in
--format | -f)
COMPREPLY=($(compgen -W 'text json' -- "$cur"))
return
;;
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
local counter=$(__runc_pos_first_nonflag $(__runc_to_extglob "$options_with_args"))
;;
esac
}
_runc_spec() {
local boolean_options="
--help
--rootless
"
local options_with_args="
--bundle
-b
"
case "$prev" in
--bundle | -b)
case "$cur" in
'')
COMPREPLY=($(compgen -W '/' -- "$cur"))
__runc_nospace
;;
/*)
_filedir
__runc_nospace
;;
esac
return
;;
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
local counter=$(__runc_pos_first_nonflag $(__runc_to_extglob "$options_with_args"))
;;
esac
}
_runc_run() {
local boolean_options="
--help
--detatch
-d
--no-subreaper
--no-pivot
--no-new-keyring
"
local options_with_args="
--bundle
-b
--console-socket
--pid-file
--preserve-fds
"
case "$prev" in
--bundle | -b | --console-socket | --pid-file)
case "$cur" in
'')
COMPREPLY=($(compgen -W '/' -- "$cur"))
__runc_nospace
;;
/*)
_filedir
__runc_nospace
;;
esac
return
;;
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_checkpoint() {
local boolean_options="
--help
-h
--leave-running
--tcp-established
--ext-unix-sk
--shell-job
--lazy-pages
--file-locks
--pre-dump
--auto-dedup
"
local options_with_args="
--image-path
--work-path
--parent-path
--status-fd
--page-server
--manage-cgroups-mode
--empty-ns
"
case "$prev" in
--page-server) ;;
--manage-cgroups-mode)
COMPREPLY=($(compgen -W "soft full strict" -- "$cur"))
return
;;
--image-path | --work-path | --parent-path)
case "$cur" in
*:*) ;; # TODO somehow do _filedir for stuff inside the image, if it's already specified (which is also somewhat difficult to determine)
'')
COMPREPLY=($(compgen -W '/' -- "$cur"))
__runc_nospace
;;
*)
_filedir
__runc_nospace
;;
esac
return
;;
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_create() {
local boolean_options="
--help
--no-pivot
--no-new-keyring
"
local options_with_args="
--bundle
-b
--console-socket
--pid-file
--preserve-fds
"
case "$prev" in
--bundle | -b | --console-socket | --pid-file)
case "$cur" in
'')
COMPREPLY=($(compgen -W '/' -- "$cur"))
__runc_nospace
;;
/*)
_filedir
__runc_nospace
;;
esac
return
;;
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_help() {
local counter=$(__runc_pos_first_nonflag)
if [ $cword -eq $counter ]; then
COMPREPLY=($(compgen -W "${commands[*]}" -- "$cur"))
fi
}
_runc_restore() {
local boolean_options="
--help
--tcp-established
--ext-unix-sk
--shell-job
--file-locks
--detach
-d
--no-subreaper
--no-pivot
--auto-dedup
--lazy-pages
"
local options_with_args="
-b
--bundle
--image-path
--work-path
--manage-cgroups-mode
--pid-file
--empty-ns
"
local all_options="$options_with_args $boolean_options"
case "$prev" in
--manage-cgroups-mode)
COMPREPLY=($(compgen -W "soft full strict" -- "$cur"))
return
;;
--pid-file | --image-path | --work-path | --bundle | -b)
case "$cur" in
*:*) ;; # TODO somehow do _filedir for stuff inside the image, if it's already specified (which is also somewhat difficult to determine)
'')
COMPREPLY=($(compgen -W '/' -- "$cur"))
__runc_nospace
;;
/*)
_filedir
__runc_nospace
;;
esac
return
;;
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$all_options" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_resume() {
local boolean_options="
--help
-h
"
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_state() {
local boolean_options="
--help
-h
"
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_start() {
local boolean_options="
--help
-h
"
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc_update() {
local boolean_options="
--help
"
local options_with_args="
--blkio-weight
--cpu-period
--cpu-quota
--cpu-rt-period
--cpu-rt-runtime
--cpu-share
--cpuset-cpus
--cpuset-mems
--kernel-memory
--kernel-memory-tcp
--memory
--memory-reservation
--memory-swap
--pids-limit
--l3-cache-schema
--mem-bw-schema
"
case "$prev" in
$(__runc_to_extglob "$options_with_args"))
return
;;
esac
case "$cur" in
-*)
COMPREPLY=($(compgen -W "$boolean_options $options_with_args" -- "$cur"))
;;
*)
__runc_list_all
;;
esac
}
_runc() {
local previous_extglob_setting=$(shopt -p extglob)
shopt -s extglob
local commands=(
checkpoint
create
delete
events
exec
init
kill
list
pause
ps
restore
resume
run
spec
start
state
update
help
h
)
# These options are valid as global options for all client commands
# and valid as command options for `runc daemon`
local global_boolean_options="
--help -h
--version -v
"
COMPREPLY=()
local cur prev words cword
_get_comp_words_by_ref -n : cur prev words cword
local command='runc' command_pos=0 subcommand_pos
local counter=1
while [ $counter -lt $cword ]; do
case "${words[$counter]}" in
-*) ;;
=)
((counter++))
;;
*)
command="${words[$counter]}"
command_pos=$counter
break
;;
esac
((counter++))
done
local completions_func=_runc_${command}
declare -F $completions_func >/dev/null && $completions_func
eval "$previous_extglob_setting"
return 0
}
eval "$__runc_previous_extglob_setting"
unset __runc_previous_extglob_setting
complete -F _runc runc
package main
import (
"os"
"github.com/urfave/cli"
)
var createCommand = cli.Command{
Name: "create",
Usage: "create a container",
ArgsUsage: `<container-id>
Where "<container-id>" is your name for the instance of the container that you
are starting. The name you provide for the container instance must be unique on
your host.`,
Description: `The create command creates an instance of a container for a bundle. The bundle
is a directory with a specification file named "` + specConfig + `" and a root
filesystem.
The specification file includes an args parameter. The args parameter is used
to specify command(s) that get run when the container is started. To change the
command(s) that get executed on start, edit the args parameter of the spec. See
"runc spec --help" for more explanation.`,
Flags: []cli.Flag{
cli.StringFlag{
Name: "bundle, b",
Value: "",
Usage: `path to the root of the bundle directory, defaults to the current directory`,
},
cli.StringFlag{
Name: "console-socket",
Value: "",
Usage: "path to an AF_UNIX socket which will receive a file descriptor referencing the master end of the console's pseudoterminal",
},
cli.StringFlag{
Name: "pid-file",
Value: "",
Usage: "specify the file to write the process id to",
},
cli.BoolFlag{
Name: "no-pivot",
Usage: "do not use pivot root to jail process inside rootfs. This should be used whenever the rootfs is on top of a ramdisk",
},
cli.BoolFlag{
Name: "no-new-keyring",
Usage: "do not create a new session keyring for the container. This will cause the container to inherit the calling processes session key",
},
cli.IntFlag{
Name: "preserve-fds",
Usage: "Pass N additional file descriptors to the container (stdio + $LISTEN_FDS + N in total)",
},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, exactArgs); err != nil {
return err
}
if err := revisePidFile(context); err != nil {
return err
}
spec, err := setupSpec(context)
if err != nil {
return err
}
status, err := startContainer(context, spec, CT_ACT_CREATE, nil)
if err != nil {
return err
}
// exit with the container's exit status so any external supervisor is
// notified of the exit with the correct exit status.
os.Exit(status)
return nil
},
}
// +build !solaris
package main
import (
"fmt"
"os"
"path/filepath"
"time"
"github.com/opencontainers/runc/libcontainer"
"github.com/urfave/cli"
"golang.org/x/sys/unix"
)
func killContainer(container libcontainer.Container) error {
_ = container.Signal(unix.SIGKILL, false)
for i := 0; i < 100; i++ {
time.Sleep(100 * time.Millisecond)
if err := container.Signal(unix.Signal(0), false); err != nil {
destroy(container)
return nil
}
}
return fmt.Errorf("container init still running")
}
var deleteCommand = cli.Command{
Name: "delete",
Usage: "delete any resources held by the container often used with detached container",
ArgsUsage: `<container-id>
Where "<container-id>" is the name for the instance of the container.
EXAMPLE:
For example, if the container id is "ubuntu01" and runc list currently shows the
status of "ubuntu01" as "stopped" the following will delete resources held for
"ubuntu01" removing "ubuntu01" from the runc list of containers:
# runc delete ubuntu01`,
Flags: []cli.Flag{
cli.BoolFlag{
Name: "force, f",
Usage: "Forcibly deletes the container if it is still running (uses SIGKILL)",
},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, exactArgs); err != nil {
return err
}
id := context.Args().First()
force := context.Bool("force")
container, err := getContainer(context)
if err != nil {
if lerr, ok := err.(libcontainer.Error); ok && lerr.Code() == libcontainer.ContainerNotExists {
// if there was an aborted start or something of the sort then the container's directory could exist but
// libcontainer does not see it because the state.json file inside that directory was never created.
path := filepath.Join(context.GlobalString("root"), id)
if e := os.RemoveAll(path); e != nil {
fmt.Fprintf(os.Stderr, "remove %s: %v\n", path, e)
}
if force {
return nil
}
}
return err
}
s, err := container.Status()
if err != nil {
return err
}
switch s {
case libcontainer.Stopped:
destroy(container)
case libcontainer.Created:
return killContainer(container)
default:
if force {
return killContainer(container)
}
return fmt.Errorf("cannot delete container %s that is not stopped: %s\n", id, s)
}
return nil
},
}
# Checkpoint and Restore #
For a basic description about checkpointing and restoring containers with
`runc` please see [runc-checkpoint(8)](../man/runc-checkpoint.8.md) and
[runc-restore(8)](../man/runc-restore.8.md).
## Checkpoint/Restore Annotations ##
In addition to specifying options on the command-line like it is described
in the man-pages (see above), it is also possible to influence CRIU's
behaviour using CRIU configuration files. For details about CRIU's
configuration file support please see [CRIU's wiki](https://criu.org/Configuration_files).
In addition to CRIU's default configuration files `runc` tells CRIU to
also evaluate the file `/etc/criu/runc.conf`. Using the annotation
`org.criu.config` it is, however, possible to change this additional
CRIU configuration file.
If the annotation `org.criu.config` is set to an empty string `runc`
will not pass any additional configuration file to CRIU. With an empty
string it is therefore possible to disable the additional CRIU configuration
file. This can be used to make sure that no additional configuration file
changes CRIU's behaviour accidentally.
If the annotation `org.criu.config` is set to a non-empty string `runc` will
pass that string to CRIU to be evaluated as an additional configuration file.
If CRIU cannot open this additional configuration file, it will ignore this
file and continue.
### Annotation Example to disable additional CRIU configuration file ###
```
{
"ociVersion": "1.0.0",
"annotations": {
"org.criu.config": ""
},
"process": {
```
### Annotation Example to set a specific CRIU configuration file ###
```
{
"ociVersion": "1.0.0",
"annotations": {
"org.criu.config": "/etc/special-runc-criu-options"
},
"process": {
```
## Changing systemd unit properties
In case runc uses systemd to set cgroup parameters for a container (i.e.
`--systemd-cgroup` CLI flag is set), systemd creates a scope (a.k.a.
transient unit) for the container, usually named like `runc-$ID.scope`.
The systemd properties of this unit (shown by `systemctl show runc-$ID.scope`
after the container is started) can be modified by adding annotations
to container's runtime spec (`config.json`). For example:
```json
"annotations": {
"org.systemd.property.TimeoutStopUSec": "uint64 123456789",
"org.systemd.property.CollectMode":"'inactive-or-failed'"
},
```
The above will set the following properties:
* `TimeoutStopSec` to 2 minutes and 3 seconds;
* `CollectMode` to "inactive-or-failed".
The values must be in the gvariant format (for details, see
[gvariant documentation](https://developer.gnome.org/glib/stable/gvariant-text.html)).
To find out which type systemd expects for a particular parameter, please
consult systemd sources.
# Terminals and Standard IO #
*Note that the default configuration of `runc` (foreground, new terminal) is
generally the best option for most users. This document exists to help explain
what the purpose of the different modes is, and to try to steer users away from
common mistakes and misunderstandings.*
In general, most processes on Unix (and Unix-like) operating systems have 3
standard file descriptors provided at the start, collectively referred to as
"standard IO" (`stdio`):
* `0`: standard-in (`stdin`), the input stream into the process
* `1`: standard-out (`stdout`), the output stream from the process
* `2`: standard-error (`stderr`), the error stream from the process
When creating and running a container via `runc`, it is important to take care
to structure the `stdio` the new container's process receives. In some ways
containers are just regular processes, while in other ways they're an isolated
sub-partition of your machine (in a similar sense to a VM). This means that the
structure of IO is not as simple as with ordinary programs (which generally
just use the file descriptors you give them).
## Other File Descriptors ##
Before we continue, it is important to note that processes can have more file
descriptors than just `stdio`. By default in `runc` no other file descriptors
will be passed to the spawned container process. If you wish to explicitly pass
file descriptors to the container you have to use the `--preserve-fds` option.
These ancillary file descriptors don't have any of the strange semantics
discussed further in this document (those only apply to `stdio`) -- they are
passed untouched by `runc`.
It should be noted that `--preserve-fds` does not take individual file
descriptors to preserve. Instead, it takes how many file descriptors (not
including `stdio` or `LISTEN_FDS`) should be passed to the container. In the
following example:
```
% runc run --preserve-fds 5 <container>
```
`runc` will pass the first `5` file descriptors (`3`, `4`, `5`, `6`, and `7` --
assuming that `LISTEN_FDS` has not been configured) to the container.
In addition to `--preserve-fds`, `LISTEN_FDS` file descriptors are passed
automatically to allow for `systemd`-style socket activation. To extend the
above example:
```
% LISTEN_PID=$pid_of_runc LISTEN_FDS=3 runc run --preserve-fds 5 <container>
```
`runc` will now pass the first `8` file descriptors (and it will also pass
`LISTEN_FDS=3` and `LISTEN_PID=1` to the container). The first `3` (`3`, `4`,
and `5`) were passed due to `LISTEN_FDS` and the other `5` (`6`, `7`, `8`, `9`,
and `10`) were passed due to `--preserve-fds`. You should keep this in mind if
you use `runc` directly in something like a `systemd` unit file. To disable
this `LISTEN_FDS`-style passing just unset `LISTEN_FDS`.
**Be very careful when passing file descriptors to a container process.** Due
to some Linux kernel (mis)features, a container with access to certain types of
file descriptors (such as `O_PATH` descriptors) outside of the container's root
file system can use these to break out of the container's pivoted mount
namespace. [This has resulted in CVEs in the past.][CVE-2016-9962]
[CVE-2016-9962]: https://nvd.nist.gov/vuln/detail/CVE-2016-9962
## <a name="terminal-modes" /> Terminal Modes ##
`runc` supports two distinct methods for passing `stdio` to the container's
primary process:
* [new terminal](#new-terminal) (`terminal: true`)
* [pass-through](#pass-through) (`terminal: false`)
When first using `runc` these two modes will look incredibly similar, but this
can be quite deceptive as these different modes have quite different
characteristics.
By default, `runc spec` will create a configuration that will create a new
terminal (`terminal: true`). However, if the `terminal: ...` line is not
present in `config.json` then pass-through is the default.
*In general we recommend using new terminal, because it means that tools like
`sudo` will work inside your container. But pass-through can be useful if you
know what you're doing, or if you're using `runc` as part of a non-interactive
pipeline.*
### <a name="new-terminal"> New Terminal ###
In new terminal mode, `runc` will create a brand-new "console" (or more
precisely, a new pseudo-terminal using the container's namespaced
`/dev/pts/ptmx`) for your contained process to use as its `stdio`.
When you start a process in new terminal mode, `runc` will do the following:
1. Create a new pseudo-terminal.
2. Pass the slave end to the container's primary process as its `stdio`.
3. Send the master end to a process to interact with the `stdio` for the
container's primary process ([details below](#runc-modes)).
It should be noted that since a new pseudo-terminal is being used for
communication with the container, some strange properties of pseudo-terminals
might surprise you. For instance, by default, all new pseudo-terminals
translate the byte `'\n'` to the sequence `'\r\n'` on both `stdout` and
`stderr`. In addition there are [a whole range of `ioctls(2)` that can only
interact with pseudo-terminal `stdio`][tty_ioctl(4)].
> **NOTE**: In new terminal mode, all three `stdio` file descriptors are the
> same underlying file. The reason for this is to match how a shell's `stdio`
> looks to a process (as well as remove race condition issues with having to
> deal with multiple master pseudo-terminal file descriptors). However this
> means that it is not really possible to uniquely distinguish between `stdout`
> and `stderr` from the caller's perspective.
[tty_ioctl(4)]: https://linux.die.net/man/4/tty_ioctl
### <a name="pass-through"> Pass-Through ###
If you have already set up some file handles that you wish your contained
process to use as its `stdio`, then you can ask `runc` to pass them through to
the contained process (this is not necessarily the same as `--preserve-fds`'s
passing of file descriptors -- [details below](#runc-modes)). As an example
(assuming that `terminal: false` is set in `config.json`):
```
% echo input | runc run some_container > /tmp/log.out 2>& /tmp/log.err
```
Here the container's various `stdio` file descriptors will be substituted with
the following:
* `stdin` will be sourced from the `echo input` pipeline.
* `stdout` will be output into `/tmp/log.out` on the host.
* `stderr` will be output into `/tmp/log.err` on the host.
It should be noted that the actual file handles seen inside the container may
be different [based on the mode `runc` is being used in](#runc-modes) (for
instance, the file referenced by `1` could be `/tmp/log.out` directly or a pipe
which `runc` is using to buffer output, based on the mode). However the net
result will be the same in either case. In principle you could use the [new
terminal mode](#new-terminal) in a pipeline, but the difference will become
more clear when you are introduced to [`runc`'s detached mode](#runc-modes).
## <a name="runc-modes" /> `runc` Modes ##
`runc` itself runs in two modes:
* [foreground](#foreground)
* [detached](#detached)
You can use either [terminal mode](#terminal-modes) with either `runc` mode.
However, there are considerations that may indicate preference for one mode
over another. It should be noted that while two types of modes (terminal and
`runc`) are conceptually independent from each other, you should be aware of
the intricacies of which combination you are using.
*In general we recommend using foreground because it's the most
straight-forward to use, with the only downside being that you will have a
long-running `runc` process. Detached mode is difficult to get right and
generally requires having your own `stdio` management.*
### Foreground ###
The default (and most straight-forward) mode of `runc`. In this mode, your
`runc` command remains in the foreground with the container process as a child.
All `stdio` is buffered through the foreground `runc` process (irrespective of
which terminal mode you are using). This is conceptually quite similar to
running a normal process interactively in a shell (and if you are using `runc`
in a shell interactively, this is what you should use).
Because the `stdio` will be buffered in this mode, some very important
peculiarities of this mode should be kept in mind:
* With [new terminal mode](#new-terminal), the container will see a
pseudo-terminal as its `stdio` (as you might expect). However, the `stdio` of
the foreground `runc` process will remain the `stdio` that the process was
started with -- and `runc` will copy all `stdio` between its `stdio` and the
container's `stdio`. This means that while a new pseudo-terminal has been
created, the foreground `runc` process manages it over the lifetime of the
container.
* With [pass-through mode](#pass-through), the foreground `runc`'s `stdio` is
**not** passed to the container. Instead, the container's `stdio` is a set of
pipes which are used to copy data between `runc`'s `stdio` and the
container's `stdio`. This means that the container never has direct access to
host file descriptors (aside from the pipes created by the container runtime,
but that shouldn't be an issue).
The main drawback of the foreground mode of operation is that it requires a
long-running foreground `runc` process. If you kill the foreground `runc`
process then you will no longer have access to the `stdio` of the container
(and in most cases this will result in the container dying abnormally due to
`SIGPIPE` or some other error). By extension this means that any bug in the
long-running foreground `runc` process (such as a memory leak) or a stray
OOM-kill sweep could result in your container being killed **through no fault
of the user**. In addition, there is no way in foreground mode of passing a
file descriptor directly to the container process as its `stdio` (like
`--preserve-fds` does).
These shortcomings are obviously sub-optimal and are the reason that `runc` has
an additional mode called "detached mode".
### Detached ###
In contrast to foreground mode, in detached mode there is no long-running
foreground `runc` process once the container has started. In fact, there is no
long-running `runc` process at all. However, this means that it is up to the
caller to handle the `stdio` after `runc` has set it up for you. In a shell
this means that the `runc` command will exit and control will return to the
shell, after the container has been set up.
You can run `runc` in detached mode in one of the following ways:
* `runc run -d ...` which operates similar to `runc run` but is detached.
* `runc create` followed by `runc start` which is the standard container
lifecycle defined by the OCI runtime specification (`runc create` sets up the
container completely, waiting for `runc start` to begin execution of user
code).
The main use-case of detached mode is for higher-level tools that want to be
wrappers around `runc`. By running `runc` in detached mode, those tools have
far more control over the container's `stdio` without `runc` getting in the
way (most wrappers around `runc` like `cri-o` or `containerd` use detached mode
for this reason).
Unfortunately using detached mode is a bit more complicated and requires more
care than the foreground mode -- mainly because it is now up to the caller to
handle the `stdio` of the container.
Another complication is that the parent process is responsible for acting as
the subreaper for the container. In short, you need to call
`prctl(PR_SET_CHILD_SUBREAPER, 1, ...)` in the parent process and correctly
handle the implications of being a subreaper. Failing to do so may result in
zombie processes being accumulated on your host.
These tasks are usually performed by a dedicated (and minimal) monitor process
per-container. For the sake of comparison, other runtimes such as LXC do not
have an equivalent detached mode and instead integrate this monitor process
into the container runtime itself -- this has several tradeoffs, and runc has
opted to support delegating the monitoring responsibility to the parent process
through this detached mode.
#### Detached Pass-Through ####
In detached mode, pass-through actually does what it says on the tin -- the
`stdio` file descriptors of the `runc` process are passed through (untouched)
to the container's `stdio`. The purpose of this option is to allow a user to
set up `stdio` for a container themselves and then force `runc` to just use
their pre-prepared `stdio` (without any pseudo-terminal funny business). *If
you don't see why this would be useful, don't use this option.*
**You must be incredibly careful when using detached pass-through (especially
in a shell).** The reason for this is that by using detached pass-through you
are passing host file descriptors to the container. In the case of a shell,
usually your `stdio` is going to be a pseudo-terminal (on your host). A
malicious container could take advantage of TTY-specific `ioctls` like
`TIOCSTI` to fake input into the **host** shell (remember that in detached
mode, control is returned to your shell and so the terminal you've given the
container is being read by a shell prompt).
There are also several other issues with running non-malicious containers in a
shell with detached pass-through (where you pass your shell's `stdio` to the
container):
* Output from the container will be interleaved with output from your shell (in
a non-deterministic way), without any real way of distinguishing from where a
particular piece of output came from.
* Any input to `stdin` will be non-deterministically split and given to either
the container or the shell (because both are blocked on a `read(2)` of the
same FIFO-style file descriptor).
They are all related to the fact that there is going to be a race when either
your host or the container tries to read from (or write to) `stdio`. This
problem is especially obvious when in a shell, where usually the terminal has
been put into raw mode (where each individual key-press should cause `read(2)`
to return).
> **NOTE**: There is also currently a [known problem][issue-1721] where using
> detached pass-through will result in the container hanging if the `stdout` or
> `stderr` is a pipe (though this should be a temporary issue).
[issue-1721]: https://github.com/opencontainers/runc/issues/1721
#### Detached New Terminal ####
When creating a new pseudo-terminal in detached mode, and fairly obvious
problem appears -- how do we use the new terminal that `runc` created? Unlike
in pass-through, `runc` has created a new set of file descriptors that need to
be used by *something* in order for container communication to work.
The way this problem is resolved is through the use of Unix domain sockets.
There is a feature of Unix sockets called `SCM_RIGHTS` which allows a file
descriptor to be sent through a Unix socket to a completely separate process
(which can then use that file descriptor as though they opened it). When using
`runc` in detached new terminal mode, this is how a user gets access to the
pseudo-terminal's master file descriptor.
To this end, there is a new option (which is required if you want to use `runc`
in detached new terminal mode): `--console-socket`. This option takes the path
to a Unix domain socket which `runc` will connect to and send the
pseudo-terminal master file descriptor down. The general process for getting
the pseudo-terminal master is as follows:
1. Create a Unix domain socket at some path, `$socket_path`.
2. Call `runc run` or `runc create` with the argument `--console-socket
$socket_path`.
3. Using `recvmsg(2)` retrieve the file descriptor sent using `SCM_RIGHTS` by
`runc`.
4. Now the manager can interact with the `stdio` of the container, using the
retrieved pseudo-terminal master.
After `runc` exits, the only process with a copy of the pseudo-terminal master
file descriptor is whoever read the file descriptor from the socket.
> **NOTE**: Currently `runc` doesn't support abstract socket addresses (due to
> it not being possible to pass an `argv` with a null-byte as the first
> character). In the future this may change, but currently you must use a valid
> path name.
In order to help users make use of detached new terminal mode, we have provided
a [Go implementation in the `go-runc` bindings][containerd/go-runc.Socket], as
well as [a simple client][recvtty].
[containerd/go-runc.Socket]: https://godoc.org/github.com/containerd/go-runc#Socket
[recvtty]: /contrib/cmd/recvtty
package main
import (
"fmt"
"os"
"runtime"
"github.com/opencontainers/runc/libcontainer/logs"
"github.com/opencontainers/runc/libenclave"
"github.com/sirupsen/logrus"
"github.com/urfave/cli"
)
func init() {
if len(os.Args) > 1 && os.Args[1] == "enclave" {
runtime.GOMAXPROCS(1)
runtime.LockOSThread()
level := os.Getenv("_LIBENCLAVE_LOGLEVEL")
logLevel, err := logs.ParseLogLevel(level)
if err != nil {
panic(fmt.Sprintf("runelet: failed to parse log level: %q: %v", level,
err))
}
err = logs.ConfigureLogging(logs.Config{
LogPipeFd: os.Getenv("_LIBENCLAVE_LOGPIPE"),
LogFormat: "json",
LogLevel: logLevel,
})
if err != nil {
panic(fmt.Sprintf("runelet: failed to configure logging: %v", err))
}
logrus.Debug("runelet process started")
}
}
var enclaveCommand = cli.Command{
Name: "enclave",
Usage: `initialize the enclave runtime (do not call it outside of rune)`,
Action: func(context *cli.Context) error {
exitCode, err := libenclave.StartInitialization()
if err != nil {
logrus.Fatal(err)
}
os.Exit(int(exitCode))
panic("runelet process failed to exit")
},
}
// +build linux
package main
import (
"encoding/json"
"fmt"
"os"
"sync"
"time"
"github.com/opencontainers/runc/libcontainer"
"github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/intelrdt"
"github.com/opencontainers/runc/types"
"github.com/sirupsen/logrus"
"github.com/urfave/cli"
)
var eventsCommand = cli.Command{
Name: "events",
Usage: "display container events such as OOM notifications, cpu, memory, and IO usage statistics",
ArgsUsage: `<container-id>
Where "<container-id>" is the name for the instance of the container.`,
Description: `The events command displays information about the container. By default the
information is displayed once every 5 seconds.`,
Flags: []cli.Flag{
cli.DurationFlag{Name: "interval", Value: 5 * time.Second, Usage: "set the stats collection interval"},
cli.BoolFlag{Name: "stats", Usage: "display the container's stats then exit"},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, exactArgs); err != nil {
return err
}
container, err := getContainer(context)
if err != nil {
return err
}
duration := context.Duration("interval")
if duration <= 0 {
return fmt.Errorf("duration interval must be greater than 0")
}
status, err := container.Status()
if err != nil {
return err
}
if status == libcontainer.Stopped {
return fmt.Errorf("container with id %s is not running", container.ID())
}
var (
stats = make(chan *libcontainer.Stats, 1)
events = make(chan *types.Event, 1024)
group = &sync.WaitGroup{}
)
group.Add(1)
go func() {
defer group.Done()
enc := json.NewEncoder(os.Stdout)
for e := range events {
if err := enc.Encode(e); err != nil {
logrus.Error(err)
}
}
}()
if context.Bool("stats") {
s, err := container.Stats()
if err != nil {
return err
}
events <- &types.Event{Type: "stats", ID: container.ID(), Data: convertLibcontainerStats(s)}
close(events)
group.Wait()
return nil
}
go func() {
for range time.Tick(context.Duration("interval")) {
s, err := container.Stats()
if err != nil {
logrus.Error(err)
continue
}
stats <- s
}
}()
n, err := container.NotifyOOM()
if err != nil {
return err
}
for {
select {
case _, ok := <-n:
if ok {
// this means an oom event was received, if it is !ok then
// the channel was closed because the container stopped and
// the cgroups no longer exist.
events <- &types.Event{Type: "oom", ID: container.ID()}
} else {
n = nil
}
case s := <-stats:
events <- &types.Event{Type: "stats", ID: container.ID(), Data: convertLibcontainerStats(s)}
}
if n == nil {
close(events)
break
}
}
group.Wait()
return nil
},
}
func convertLibcontainerStats(ls *libcontainer.Stats) *types.Stats {
cg := ls.CgroupStats
if cg == nil {
return nil
}
var s types.Stats
s.Pids.Current = cg.PidsStats.Current
s.Pids.Limit = cg.PidsStats.Limit
s.CPU.Usage.Kernel = cg.CpuStats.CpuUsage.UsageInKernelmode
s.CPU.Usage.User = cg.CpuStats.CpuUsage.UsageInUsermode
s.CPU.Usage.Total = cg.CpuStats.CpuUsage.TotalUsage
s.CPU.Usage.Percpu = cg.CpuStats.CpuUsage.PercpuUsage
s.CPU.Throttling.Periods = cg.CpuStats.ThrottlingData.Periods
s.CPU.Throttling.ThrottledPeriods = cg.CpuStats.ThrottlingData.ThrottledPeriods
s.CPU.Throttling.ThrottledTime = cg.CpuStats.ThrottlingData.ThrottledTime
s.Memory.Cache = cg.MemoryStats.Cache
s.Memory.Kernel = convertMemoryEntry(cg.MemoryStats.KernelUsage)
s.Memory.KernelTCP = convertMemoryEntry(cg.MemoryStats.KernelTCPUsage)
s.Memory.Swap = convertMemoryEntry(cg.MemoryStats.SwapUsage)
s.Memory.Usage = convertMemoryEntry(cg.MemoryStats.Usage)
s.Memory.Raw = cg.MemoryStats.Stats
s.Blkio.IoServiceBytesRecursive = convertBlkioEntry(cg.BlkioStats.IoServiceBytesRecursive)
s.Blkio.IoServicedRecursive = convertBlkioEntry(cg.BlkioStats.IoServicedRecursive)
s.Blkio.IoQueuedRecursive = convertBlkioEntry(cg.BlkioStats.IoQueuedRecursive)
s.Blkio.IoServiceTimeRecursive = convertBlkioEntry(cg.BlkioStats.IoServiceTimeRecursive)
s.Blkio.IoWaitTimeRecursive = convertBlkioEntry(cg.BlkioStats.IoWaitTimeRecursive)
s.Blkio.IoMergedRecursive = convertBlkioEntry(cg.BlkioStats.IoMergedRecursive)
s.Blkio.IoTimeRecursive = convertBlkioEntry(cg.BlkioStats.IoTimeRecursive)
s.Blkio.SectorsRecursive = convertBlkioEntry(cg.BlkioStats.SectorsRecursive)
s.Hugetlb = make(map[string]types.Hugetlb)
for k, v := range cg.HugetlbStats {
s.Hugetlb[k] = convertHugtlb(v)
}
if is := ls.IntelRdtStats; is != nil {
if intelrdt.IsCatEnabled() {
s.IntelRdt.L3CacheInfo = convertL3CacheInfo(is.L3CacheInfo)
s.IntelRdt.L3CacheSchemaRoot = is.L3CacheSchemaRoot
s.IntelRdt.L3CacheSchema = is.L3CacheSchema
}
if intelrdt.IsMbaEnabled() {
s.IntelRdt.MemBwInfo = convertMemBwInfo(is.MemBwInfo)
s.IntelRdt.MemBwSchemaRoot = is.MemBwSchemaRoot
s.IntelRdt.MemBwSchema = is.MemBwSchema
}
}
s.NetworkInterfaces = ls.Interfaces
return &s
}
func convertHugtlb(c cgroups.HugetlbStats) types.Hugetlb {
return types.Hugetlb{
Usage: c.Usage,
Max: c.MaxUsage,
Failcnt: c.Failcnt,
}
}
func convertMemoryEntry(c cgroups.MemoryData) types.MemoryEntry {
return types.MemoryEntry{
Limit: c.Limit,
Usage: c.Usage,
Max: c.MaxUsage,
Failcnt: c.Failcnt,
}
}
func convertBlkioEntry(c []cgroups.BlkioStatEntry) []types.BlkioEntry {
var out []types.BlkioEntry
for _, e := range c {
out = append(out, types.BlkioEntry{
Major: e.Major,
Minor: e.Minor,
Op: e.Op,
Value: e.Value,
})
}
return out
}
func convertL3CacheInfo(i *intelrdt.L3CacheInfo) *types.L3CacheInfo {
return &types.L3CacheInfo{
CbmMask: i.CbmMask,
MinCbmBits: i.MinCbmBits,
NumClosids: i.NumClosids,
}
}
func convertMemBwInfo(i *intelrdt.MemBwInfo) *types.MemBwInfo {
return &types.MemBwInfo{
BandwidthGran: i.BandwidthGran,
DelayLinear: i.DelayLinear,
MinBandwidth: i.MinBandwidth,
NumClosids: i.NumClosids,
}
}
// +build linux
package main
import (
"encoding/json"
"fmt"
"os"
"strconv"
"strings"
"github.com/opencontainers/runc/libcontainer"
"github.com/opencontainers/runc/libcontainer/utils"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/urfave/cli"
)
var execCommand = cli.Command{
Name: "exec",
Usage: "execute new process inside the container",
ArgsUsage: `<container-id> <command> [command options] || -p process.json <container-id>
Where "<container-id>" is the name for the instance of the container and
"<command>" is the command to be executed in the container.
"<command>" can't be empty unless a "-p" flag provided.
EXAMPLE:
For example, if the container is configured to run the linux ps command the
following will output a list of processes running in the container:
# runc exec <container-id> ps`,
Flags: []cli.Flag{
cli.StringFlag{
Name: "console-socket",
Usage: "path to an AF_UNIX socket which will receive a file descriptor referencing the master end of the console's pseudoterminal",
},
cli.StringFlag{
Name: "cwd",
Usage: "current working directory in the container",
},
cli.StringSliceFlag{
Name: "env, e",
Usage: "set environment variables",
},
cli.BoolFlag{
Name: "tty, t",
Usage: "allocate a pseudo-TTY",
},
cli.StringFlag{
Name: "user, u",
Usage: "UID (format: <uid>[:<gid>])",
},
cli.Int64SliceFlag{
Name: "additional-gids, g",
Usage: "additional gids",
},
cli.StringFlag{
Name: "process, p",
Usage: "path to the process.json",
},
cli.BoolFlag{
Name: "detach,d",
Usage: "detach from the container's process",
},
cli.StringFlag{
Name: "pid-file",
Value: "",
Usage: "specify the file to write the process id to",
},
cli.StringFlag{
Name: "process-label",
Usage: "set the asm process label for the process commonly used with selinux",
},
cli.StringFlag{
Name: "apparmor",
Usage: "set the apparmor profile for the process",
},
cli.BoolFlag{
Name: "no-new-privs",
Usage: "set the no new privileges value for the process",
},
cli.StringSliceFlag{
Name: "cap, c",
Value: &cli.StringSlice{},
Usage: "add a capability to the bounding set for the process",
},
cli.BoolFlag{
Name: "no-subreaper",
Usage: "disable the use of the subreaper used to reap reparented processes",
Hidden: true,
},
cli.IntFlag{
Name: "preserve-fds",
Usage: "Pass N additional file descriptors to the container (stdio + $LISTEN_FDS + N in total)",
},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, minArgs); err != nil {
return err
}
if err := revisePidFile(context); err != nil {
return err
}
status, err := execProcess(context)
if err == nil {
os.Exit(status)
}
return fmt.Errorf("exec failed: %v", err)
},
SkipArgReorder: true,
}
func execProcess(context *cli.Context) (int, error) {
container, err := getContainer(context)
if err != nil {
return -1, err
}
status, err := container.Status()
if err != nil {
return -1, err
}
if status == libcontainer.Stopped {
return -1, fmt.Errorf("cannot exec a container that has stopped")
}
path := context.String("process")
if path == "" && len(context.Args()) == 1 {
return -1, fmt.Errorf("process args cannot be empty")
}
detach := context.Bool("detach")
state, err := container.State()
if err != nil {
return -1, err
}
bundle := utils.SearchLabels(state.Config.Labels, "bundle")
p, err := getProcess(context, bundle)
if err != nil {
return -1, err
}
logLevel := "info"
if context.GlobalBool("debug") {
logLevel = "debug"
}
r := &runner{
enableSubreaper: false,
shouldDestroy: false,
container: container,
consoleSocket: context.String("console-socket"),
detach: detach,
pidFile: context.String("pid-file"),
action: CT_ACT_RUN,
init: false,
preserveFDs: context.Int("preserve-fds"),
logLevel: logLevel,
}
return r.run(p)
}
func getProcess(context *cli.Context, bundle string) (*specs.Process, error) {
if path := context.String("process"); path != "" {
f, err := os.Open(path)
if err != nil {
return nil, err
}
defer f.Close()
var p specs.Process
if err := json.NewDecoder(f).Decode(&p); err != nil {
return nil, err
}
return &p, validateProcessSpec(&p)
}
// process via cli flags
if err := os.Chdir(bundle); err != nil {
return nil, err
}
spec, err := loadSpec(specConfig)
if err != nil {
return nil, err
}
p := spec.Process
p.Args = context.Args()[1:]
// override the cwd, if passed
if context.String("cwd") != "" {
p.Cwd = context.String("cwd")
}
if ap := context.String("apparmor"); ap != "" {
p.ApparmorProfile = ap
}
if l := context.String("process-label"); l != "" {
p.SelinuxLabel = l
}
if caps := context.StringSlice("cap"); len(caps) > 0 {
for _, c := range caps {
p.Capabilities.Bounding = append(p.Capabilities.Bounding, c)
p.Capabilities.Inheritable = append(p.Capabilities.Inheritable, c)
p.Capabilities.Effective = append(p.Capabilities.Effective, c)
p.Capabilities.Permitted = append(p.Capabilities.Permitted, c)
p.Capabilities.Ambient = append(p.Capabilities.Ambient, c)
}
}
// append the passed env variables
p.Env = append(p.Env, context.StringSlice("env")...)
// set the tty
p.Terminal = false
if context.IsSet("tty") {
p.Terminal = context.Bool("tty")
}
if context.IsSet("no-new-privs") {
p.NoNewPrivileges = context.Bool("no-new-privs")
}
// override the user, if passed
if context.String("user") != "" {
u := strings.SplitN(context.String("user"), ":", 2)
if len(u) > 1 {
gid, err := strconv.Atoi(u[1])
if err != nil {
return nil, fmt.Errorf("parsing %s as int for gid failed: %v", u[1], err)
}
p.User.GID = uint32(gid)
}
uid, err := strconv.Atoi(u[0])
if err != nil {
return nil, fmt.Errorf("parsing %s as int for uid failed: %v", u[0], err)
}
p.User.UID = uint32(uid)
}
for _, gid := range context.Int64Slice("additional-gids") {
if gid < 0 {
return nil, fmt.Errorf("additional-gids must be a positive number %d", gid)
}
p.User.AdditionalGids = append(p.User.AdditionalGids, uint32(gid))
}
return p, validateProcessSpec(p)
}
module github.com/opencontainers/runc
go 1.14
require (
github.com/checkpoint-restore/go-criu v0.0.0-20191125063657-fcdcd07065c5
github.com/cilium/ebpf v0.0.0-20200319110858-a7172c01168f
github.com/containerd/console v1.0.0
github.com/coreos/go-systemd/v22 v22.0.0
github.com/cyphar/filepath-securejoin v0.2.2
github.com/docker/go-units v0.4.0
github.com/godbus/dbus/v5 v5.0.3
github.com/golang/protobuf v1.3.5
github.com/moby/sys/mountinfo v0.1.3
github.com/mrunalp/fileutils v0.0.0-20171103030105-7d4729fb3618
github.com/opencontainers/runtime-spec v1.0.2
github.com/opencontainers/selinux v1.4.0
github.com/pkg/errors v0.9.1
github.com/seccomp/libseccomp-golang v0.9.1
github.com/sirupsen/logrus v1.5.0
github.com/syndtr/gocapability v0.0.0-20180916011248-d98352740cb2
// NOTE: urfave/cli must be <= v1.22.1 due to a regression: https://github.com/urfave/cli/issues/1092
github.com/urfave/cli v1.22.1
github.com/vishvananda/netlink v1.1.0
golang.org/x/sys v0.0.0-20200327173247-9dae0f8f5775
)
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/checkpoint-restore/go-criu v0.0.0-20191125063657-fcdcd07065c5 h1:950diauOSoCNaQfvLE0WaFadyI/ilKIjb6jEUQnm+hE=
github.com/checkpoint-restore/go-criu v0.0.0-20191125063657-fcdcd07065c5/go.mod h1:TrMrLQfeENAPYPRsJuq3jsqdlRh3lvi6trTZJG8+tho=
github.com/cilium/ebpf v0.0.0-20200319110858-a7172c01168f h1:W1RQPz3nR8RxUw/Uqk71GU3JlZ7pNa1pXrHs98h0o9U=
github.com/cilium/ebpf v0.0.0-20200319110858-a7172c01168f/go.mod h1:XT+cAw5wfvsodedcijoh1l9cf7v1x9FlFB/3VmF/O8s=
github.com/containerd/console v1.0.0 h1:fU3UuQapBs+zLJu82NhR11Rif1ny2zfMMAyPJzSN5tQ=
github.com/containerd/console v1.0.0/go.mod h1:8Pf4gM6VEbTNRIT26AyyU7hxdQU3MvAvxVI0sc00XBE=
github.com/coreos/go-systemd/v22 v22.0.0 h1:XJIw/+VlJ+87J+doOxznsAWIdmWuViOVhkQamW5YV28=
github.com/coreos/go-systemd/v22 v22.0.0/go.mod h1:xO0FLkIi5MaZafQlIrOotqXZ90ih+1atmu1JpKERPPk=
github.com/cpuguy83/go-md2man/v2 v2.0.0-20190314233015-f79a8a8ca69d h1:U+s90UTSYgptZMwQh2aRr3LuazLJIa+Pg3Kc1ylSYVY=
github.com/cpuguy83/go-md2man/v2 v2.0.0-20190314233015-f79a8a8ca69d/go.mod h1:maD7wRr/U5Z6m/iR4s+kqSMx2CaBsrgA7czyZG/E6dU=
github.com/cyphar/filepath-securejoin v0.2.2 h1:jCwT2GTP+PY5nBz3c/YL5PAIbusElVrPujOBSCj8xRg=
github.com/cyphar/filepath-securejoin v0.2.2/go.mod h1:FpkQEhXnPnOthhzymB7CGsFk2G9VLXONKD9G7QGMM+4=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/docker/go-units v0.4.0 h1:3uh0PgVws3nIA0Q+MwDC8yjEPf9zjRfZZWXZYDct3Tw=
github.com/docker/go-units v0.4.0/go.mod h1:fgPhTUdO+D/Jk86RDLlptpiXQzgHJF7gydDDbaIK4Dk=
github.com/godbus/dbus/v5 v5.0.3 h1:ZqHaoEF7TBzh4jzPmqVhE/5A1z9of6orkAe5uHoAeME=
github.com/godbus/dbus/v5 v5.0.3/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA=
github.com/golang/protobuf v1.3.5 h1:F768QJ1E9tib+q5Sc8MkdJi1RxLTbRcTf8LJV56aRls=
github.com/golang/protobuf v1.3.5/go.mod h1:6O5/vntMXwX2lRkT1hjjk0nAC1IDOTvTlVgjlRvqsdk=
github.com/konsorten/go-windows-terminal-sequences v1.0.1 h1:mweAR1A6xJ3oS2pRaGiHgQ4OO8tzTaLawm8vnODuwDk=
github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
github.com/moby/sys/mountinfo v0.1.3 h1:KIrhRO14+AkwKvG/g2yIpNMOUVZ02xNhOw8KY1WsLOI=
github.com/moby/sys/mountinfo v0.1.3/go.mod h1:w2t2Avltqx8vE7gX5l+QiBKxODu2TX0+Syr3h52Tw4o=
github.com/mrunalp/fileutils v0.0.0-20171103030105-7d4729fb3618 h1:7InQ7/zrOh6SlFjaXFubv0xX0HsuC9qJsdqm7bNQpYM=
github.com/mrunalp/fileutils v0.0.0-20171103030105-7d4729fb3618/go.mod h1:x8F1gnqOkIEiO4rqoeEEEqQbo7HjGMTvyoq3gej4iT0=
github.com/opencontainers/runtime-spec v1.0.2 h1:UfAcuLBJB9Coz72x1hgl8O5RVzTdNiaglX6v2DM6FI0=
github.com/opencontainers/runtime-spec v1.0.2/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/selinux v1.4.0 h1:cpiX/2wWIju/6My60T6/z9CxNG7c8xTQyEmA9fChpUo=
github.com/opencontainers/selinux v1.4.0/go.mod h1:yTcKuYAh6R95iDpefGLQaPaRwJFwyzAJufJyiTt7s0g=
github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/russross/blackfriday/v2 v2.0.1 h1:lPqVAte+HuHNfhJ/0LC98ESWRz8afy9tM/0RK8m9o+Q=
github.com/russross/blackfriday/v2 v2.0.1/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/seccomp/libseccomp-golang v0.9.1 h1:NJjM5DNFOs0s3kYE1WUOr6G8V97sdt46rlXTMfXGWBo=
github.com/seccomp/libseccomp-golang v0.9.1/go.mod h1:GbW5+tmTXfcxTToHLXlScSlAvWlF4P2Ca7zGrPiEpWo=
github.com/shurcooL/sanitized_anchor_name v1.0.0 h1:PdmoCO6wvbs+7yrJyMORt4/BmY5IYyJwS/kOiWx8mHo=
github.com/shurcooL/sanitized_anchor_name v1.0.0/go.mod h1:1NzhyTcUVG4SuEtjjoZeVRXNmyL/1OwPU0+IJeTBvfc=
github.com/sirupsen/logrus v1.5.0 h1:1N5EYkVAPEywqZRJd7cwnRtCb6xJx7NH3T3WUTF980Q=
github.com/sirupsen/logrus v1.5.0/go.mod h1:+F7Ogzej0PZc/94MaYx/nvG9jOFMD2osvC3s+Squfpo=
github.com/stretchr/testify v1.2.2 h1:bSDNvY7ZPG5RlJ8otE/7V6gMiyenm9RtJ7IUVIAoJ1w=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
github.com/syndtr/gocapability v0.0.0-20180916011248-d98352740cb2 h1:b6uOv7YOFK0TYG7HtkIgExQo+2RdLuwRft63jn2HWj8=
github.com/syndtr/gocapability v0.0.0-20180916011248-d98352740cb2/go.mod h1:hkRG7XYTFWNJGYcbNJQlaLq0fg1yr4J4t/NcTQtrfww=
github.com/urfave/cli v1.22.1 h1:+mkCCcOFKPnCmVYVcURKps1Xe+3zP90gSYGNfRkjoIY=
github.com/urfave/cli v1.22.1/go.mod h1:Gos4lmkARVdJ6EkW0WaNv/tZAAMe9V7XWyB60NtXRu0=
github.com/vishvananda/netlink v1.1.0 h1:1iyaYNBLmP6L0220aDnYQpo1QEV4t4hJ+xEEhhJH8j0=
github.com/vishvananda/netlink v1.1.0/go.mod h1:cTgwzPIzzgDAYoQrMm0EdrjRUBkTqKYppBueQtXaqoE=
github.com/vishvananda/netns v0.0.0-20191106174202-0a2b9b5464df h1:OviZH7qLw/7ZovXvuNyL3XQl8UFofeikI1NW1Gypu7k=
github.com/vishvananda/netns v0.0.0-20191106174202-0a2b9b5464df/go.mod h1:JP3t17pCcGlemwknint6hfoeCVQrEMVwxRLRjXpq+BU=
golang.org/x/sys v0.0.0-20190422165155-953cdadca894/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20190606203320-7fc4e5ec1444/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191115151921-52ab43148777/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191120155948-bd437916bb0e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200124204421-9fbb57f87de9/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200327173247-9dae0f8f5775 h1:TC0v2RSO1u2kn1ZugjrFXkRZAEaqMN/RW+OTZkBzmLE=
golang.org/x/sys v0.0.0-20200327173247-9dae0f8f5775/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 h1:E7g+9GITq07hpfrRu66IVDexMakfv52eLZ2CXBWiKr4=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
package main
import (
"fmt"
"os"
"runtime"
"github.com/opencontainers/runc/libcontainer"
"github.com/opencontainers/runc/libcontainer/logs"
_ "github.com/opencontainers/runc/libcontainer/nsenter"
"github.com/sirupsen/logrus"
"github.com/urfave/cli"
)
func init() {
if len(os.Args) > 1 && os.Args[1] == "init" {
runtime.GOMAXPROCS(1)
runtime.LockOSThread()
level := os.Getenv("_LIBCONTAINER_LOGLEVEL")
logLevel, err := logs.ParseLogLevel(level)
if err != nil {
panic(fmt.Sprintf("libcontainer: failed to parse log level: %q: %v", level, err))
}
err = logs.ConfigureLogging(logs.Config{
LogPipeFd: os.Getenv("_LIBCONTAINER_LOGPIPE"),
LogFormat: "json",
LogLevel: logLevel,
})
if err != nil {
panic(fmt.Sprintf("libcontainer: failed to configure logging: %v", err))
}
logrus.Debug("child process in init()")
}
}
var initCommand = cli.Command{
Name: "init",
Usage: `initialize the namespaces and launch the process (do not call it outside of runc)`,
Action: func(context *cli.Context) error {
factory, _ := libcontainer.New("")
if err := factory.StartInitialization(); err != nil {
// as the error is sent back to the parent there is no need to log
// or write it to stderr because the parent process will handle this
os.Exit(1)
}
panic("libcontainer: container init failed to exec")
},
}
// +build linux
package main
import (
"fmt"
"strconv"
"strings"
"github.com/urfave/cli"
"golang.org/x/sys/unix"
)
var killCommand = cli.Command{
Name: "kill",
Usage: "kill sends the specified signal (default: SIGTERM) to the container's init process",
ArgsUsage: `<container-id> [signal]
Where "<container-id>" is the name for the instance of the container and
"[signal]" is the signal to be sent to the init process.
EXAMPLE:
For example, if the container id is "ubuntu01" the following will send a "KILL"
signal to the init process of the "ubuntu01" container:
# runc kill ubuntu01 KILL`,
Flags: []cli.Flag{
cli.BoolFlag{
Name: "all, a",
Usage: "send the specified signal to all processes inside the container",
},
},
Action: func(context *cli.Context) error {
if err := checkArgs(context, 1, minArgs); err != nil {
return err
}
if err := checkArgs(context, 2, maxArgs); err != nil {
return err
}
container, err := getContainer(context)
if err != nil {
return err
}
sigstr := context.Args().Get(1)
if sigstr == "" {
sigstr = "SIGTERM"
}
signal, err := parseSignal(sigstr)
if err != nil {
return err
}
return container.Signal(signal, context.Bool("all"))
},
}
func parseSignal(rawSignal string) (unix.Signal, error) {
s, err := strconv.Atoi(rawSignal)
if err == nil {
return unix.Signal(s), nil
}
sig := strings.ToUpper(rawSignal)
if !strings.HasPrefix(sig, "SIG") {
sig = "SIG" + sig
}
signal := unix.SignalNum(sig)
if signal == 0 {
return -1, fmt.Errorf("unknown signal %q", rawSignal)
}
return signal, nil
}
# libcontainer
[![GoDoc](https://godoc.org/github.com/opencontainers/runc/libcontainer?status.svg)](https://godoc.org/github.com/opencontainers/runc/libcontainer)
Libcontainer provides a native Go implementation for creating containers
with namespaces, cgroups, capabilities, and filesystem access controls.
It allows you to manage the lifecycle of the container performing additional operations
after the container is created.
#### Container
A container is a self contained execution environment that shares the kernel of the
host system and which is (optionally) isolated from other containers in the system.
#### Using libcontainer
Because containers are spawned in a two step process you will need a binary that
will be executed as the init process for the container. In libcontainer, we use
the current binary (/proc/self/exe) to be executed as the init process, and use
arg "init", we call the first step process "bootstrap", so you always need a "init"
function as the entry of "bootstrap".
In addition to the go init function the early stage bootstrap is handled by importing
[nsenter](https://github.com/opencontainers/runc/blob/master/libcontainer/nsenter/README.md).
```go
import (
_ "github.com/opencontainers/runc/libcontainer/nsenter"
)
func init() {
if len(os.Args) > 1 && os.Args[1] == "init" {
runtime.GOMAXPROCS(1)
runtime.LockOSThread()
factory, _ := libcontainer.New("")
if err := factory.StartInitialization(); err != nil {
logrus.Fatal(err)
}
panic("--this line should have never been executed, congratulations--")
}
}
```
Then to create a container you first have to initialize an instance of a factory
that will handle the creation and initialization for a container.
```go
factory, err := libcontainer.New("/var/lib/container", libcontainer.Cgroupfs, libcontainer.InitArgs(os.Args[0], "init"))
if err != nil {
logrus.Fatal(err)
return
}
```
Once you have an instance of the factory created we can create a configuration
struct describing how the container is to be created. A sample would look similar to this:
```go
defaultMountFlags := unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NODEV
config := &configs.Config{
Rootfs: "/your/path/to/rootfs",
Capabilities: &configs.Capabilities{
Bounding: []string{
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE",
},
Effective: []string{
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE",
},
Inheritable: []string{
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE",
},
Permitted: []string{
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE",
},
Ambient: []string{
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE",
},
},
Namespaces: configs.Namespaces([]configs.Namespace{
{Type: configs.NEWNS},
{Type: configs.NEWUTS},
{Type: configs.NEWIPC},
{Type: configs.NEWPID},
{Type: configs.NEWUSER},
{Type: configs.NEWNET},
{Type: configs.NEWCGROUP},
}),
Cgroups: &configs.Cgroup{
Name: "test-container",
Parent: "system",
Resources: &configs.Resources{
MemorySwappiness: nil,
AllowAllDevices: nil,
AllowedDevices: configs.DefaultAllowedDevices,
},
},
MaskPaths: []string{
"/proc/kcore",
"/sys/firmware",
},
ReadonlyPaths: []string{
"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
},
Devices: configs.DefaultAutoCreatedDevices,
Hostname: "testing",
Mounts: []*configs.Mount{
{
Source: "proc",
Destination: "/proc",
Device: "proc",
Flags: defaultMountFlags,
},
{
Source: "tmpfs",
Destination: "/dev",
Device: "tmpfs",
Flags: unix.MS_NOSUID | unix.MS_STRICTATIME,
Data: "mode=755",
},
{
Source: "devpts",
Destination: "/dev/pts",
Device: "devpts",
Flags: unix.MS_NOSUID | unix.MS_NOEXEC,
Data: "newinstance,ptmxmode=0666,mode=0620,gid=5",
},
{
Device: "tmpfs",
Source: "shm",
Destination: "/dev/shm",
Data: "mode=1777,size=65536k",
Flags: defaultMountFlags,
},
{
Source: "mqueue",
Destination: "/dev/mqueue",
Device: "mqueue",
Flags: defaultMountFlags,
},
{
Source: "sysfs",
Destination: "/sys",
Device: "sysfs",
Flags: defaultMountFlags | unix.MS_RDONLY,
},
},
UidMappings: []configs.IDMap{
{
ContainerID: 0,
HostID: 1000,
Size: 65536,
},
},
GidMappings: []configs.IDMap{
{
ContainerID: 0,
HostID: 1000,
Size: 65536,
},
},
Networks: []*configs.Network{
{
Type: "loopback",
Address: "127.0.0.1/0",
Gateway: "localhost",
},
},
Rlimits: []configs.Rlimit{
{
Type: unix.RLIMIT_NOFILE,
Hard: uint64(1025),
Soft: uint64(1025),
},
},
}
```
Once you have the configuration populated you can create a container:
```go
container, err := factory.Create("container-id", config)
if err != nil {
logrus.Fatal(err)
return
}
```
To spawn bash as the initial process inside the container and have the
processes pid returned in order to wait, signal, or kill the process:
```go
process := &libcontainer.Process{
Args: []string{"/bin/bash"},
Env: []string{"PATH=/bin"},
User: "daemon",
Stdin: os.Stdin,
Stdout: os.Stdout,
Stderr: os.Stderr,
Init: true,
}
err := container.Run(process)
if err != nil {
container.Destroy()
logrus.Fatal(err)
return
}
// wait for the process to finish.
_, err := process.Wait()
if err != nil {
logrus.Fatal(err)
}
// destroy the container.
container.Destroy()
```
Additional ways to interact with a running container are:
```go
// return all the pids for all processes running inside the container.
processes, err := container.Processes()
// get detailed cpu, memory, io, and network statistics for the container and
// it's processes.
stats, err := container.Stats()
// pause all processes inside the container.
container.Pause()
// resume all paused processes.
container.Resume()
// send signal to container's init process.
container.Signal(signal)
// update container resource constraints.
container.Set(config)
// get current status of the container.
status, err := container.Status()
// get current container's state information.
state, err := container.State()
```
#### Checkpoint & Restore
libcontainer now integrates [CRIU](http://criu.org/) for checkpointing and restoring containers.
This let's you save the state of a process running inside a container to disk, and then restore
that state into a new process, on the same machine or on another machine.
`criu` version 1.5.2 or higher is required to use checkpoint and restore.
If you don't already have `criu` installed, you can build it from source, following the
[online instructions](http://criu.org/Installation). `criu` is also installed in the docker image
generated when building libcontainer with docker.
## Copyright and license
Code and documentation copyright 2014 Docker, inc.
The code and documentation are released under the [Apache 2.0 license](../LICENSE).
The documentation is also released under Creative Commons Attribution 4.0 International License.
You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.
## Container Specification - v1
This is the standard configuration for version 1 containers. It includes
namespaces, standard filesystem setup, a default Linux capability set, and
information about resource reservations. It also has information about any
populated environment settings for the processes running inside a container.
Along with the configuration of how a container is created the standard also
discusses actions that can be performed on a container to manage and inspect
information about the processes running inside.
The v1 profile is meant to be able to accommodate the majority of applications
with a strong security configuration.
### System Requirements and Compatibility
Minimum requirements:
* Kernel version - 3.10 recommended 2.6.2x minimum(with backported patches)
* Mounted cgroups with each subsystem in its own hierarchy
### Namespaces
| Flag | Enabled |
| --------------- | ------- |
| CLONE_NEWPID | 1 |
| CLONE_NEWUTS | 1 |
| CLONE_NEWIPC | 1 |
| CLONE_NEWNET | 1 |
| CLONE_NEWNS | 1 |
| CLONE_NEWUSER | 1 |
| CLONE_NEWCGROUP | 1 |
Namespaces are created for the container via the `unshare` syscall.
### Filesystem
A root filesystem must be provided to a container for execution. The container
will use this root filesystem (rootfs) to jail and spawn processes inside where
the binaries and system libraries are local to that directory. Any binaries
to be executed must be contained within this rootfs.
Mounts that happen inside the container are automatically cleaned up when the
container exits as the mount namespace is destroyed and the kernel will
unmount all the mounts that were setup within that namespace.
For a container to execute properly there are certain filesystems that
are required to be mounted within the rootfs that the runtime will setup.
| Path | Type | Flags | Data |
| ----------- | ------ | -------------------------------------- | ---------------------------------------- |
| /proc | proc | MS_NOEXEC,MS_NOSUID,MS_NODEV | |
| /dev | tmpfs | MS_NOEXEC,MS_STRICTATIME | mode=755 |
| /dev/shm | tmpfs | MS_NOEXEC,MS_NOSUID,MS_NODEV | mode=1777,size=65536k |
| /dev/mqueue | mqueue | MS_NOEXEC,MS_NOSUID,MS_NODEV | |
| /dev/pts | devpts | MS_NOEXEC,MS_NOSUID | newinstance,ptmxmode=0666,mode=620,gid=5 |
| /sys | sysfs | MS_NOEXEC,MS_NOSUID,MS_NODEV,MS_RDONLY | |
After a container's filesystems are mounted within the newly created
mount namespace `/dev` will need to be populated with a set of device nodes.
It is expected that a rootfs does not need to have any device nodes specified
for `/dev` within the rootfs as the container will setup the correct devices
that are required for executing a container's process.
| Path | Mode | Access |
| ------------ | ---- | ---------- |
| /dev/null | 0666 | rwm |
| /dev/zero | 0666 | rwm |
| /dev/full | 0666 | rwm |
| /dev/tty | 0666 | rwm |
| /dev/random | 0666 | rwm |
| /dev/urandom | 0666 | rwm |
**ptmx**
`/dev/ptmx` will need to be a symlink to the host's `/dev/ptmx` within
the container.
The use of a pseudo TTY is optional within a container and it should support both.
If a pseudo is provided to the container `/dev/console` will need to be
setup by binding the console in `/dev/` after it has been populated and mounted
in tmpfs.
| Source | Destination | UID GID | Mode | Type |
| --------------- | ------------ | ------- | ---- | ---- |
| *pty host path* | /dev/console | 0 0 | 0600 | bind |
After `/dev/null` has been setup we check for any external links between
the container's io, STDIN, STDOUT, STDERR. If the container's io is pointing
to `/dev/null` outside the container we close and `dup2` the `/dev/null`
that is local to the container's rootfs.
After the container has `/proc` mounted a few standard symlinks are setup
within `/dev/` for the io.
| Source | Destination |
| --------------- | ----------- |
| /proc/self/fd | /dev/fd |
| /proc/self/fd/0 | /dev/stdin |
| /proc/self/fd/1 | /dev/stdout |
| /proc/self/fd/2 | /dev/stderr |
A `pivot_root` is used to change the root for the process, effectively
jailing the process inside the rootfs.
```c
put_old = mkdir(...);
pivot_root(rootfs, put_old);
chdir("/");
unmount(put_old, MS_DETACH);
rmdir(put_old);
```
For container's running with a rootfs inside `ramfs` a `MS_MOVE` combined
with a `chroot` is required as `pivot_root` is not supported in `ramfs`.
```c
mount(rootfs, "/", NULL, MS_MOVE, NULL);
chroot(".");
chdir("/");
```
The `umask` is set back to `0022` after the filesystem setup has been completed.
### Resources
Cgroups are used to handle resource allocation for containers. This includes
system resources like cpu, memory, and device access.
| Subsystem | Enabled |
| ---------- | ------- |
| devices | 1 |
| memory | 1 |
| cpu | 1 |
| cpuacct | 1 |
| cpuset | 1 |
| blkio | 1 |
| perf_event | 1 |
| freezer | 1 |
| hugetlb | 1 |
| pids | 1 |
All cgroup subsystem are joined so that statistics can be collected from
each of the subsystems. Freezer does not expose any stats but is joined
so that containers can be paused and resumed.
The parent process of the container's init must place the init pid inside
the correct cgroups before the initialization begins. This is done so
that no processes or threads escape the cgroups. This sync is
done via a pipe ( specified in the runtime section below ) that the container's
init process will block waiting for the parent to finish setup.
### IntelRdt
Intel platforms with new Xeon CPU support Resource Director Technology (RDT).
Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
two sub-features of RDT.
Cache Allocation Technology (CAT) provides a way for the software to restrict
cache allocation to a defined 'subset' of L3 cache which may be overlapping
with other 'subsets'. The different subsets are identified by class of
service (CLOS) and each CLOS has a capacity bitmask (CBM).
Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
over memory bandwidth for the software. A user controls the resource by
indicating the percentage of maximum memory bandwidth or memory bandwidth limit
in MBps unit if MBA Software Controller is enabled.
It can be used to handle L3 cache and memory bandwidth resources allocation
for containers if hardware and kernel support Intel RDT CAT and MBA features.
In Linux 4.10 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.
Comparing with cgroups, it has similar process management lifecycle and
interfaces in a container. But unlike cgroups' hierarchy, it has single level
filesystem layout.
CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via
"resource control" filesystem.
Intel RDT "resource control" filesystem hierarchy:
```
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
| |-- L3
| | |-- cbm_mask
| | |-- min_cbm_bits
| | |-- num_closids
| |-- MB
| |-- bandwidth_gran
| |-- delay_linear
| |-- min_bandwidth
| |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
|-- ...
|-- schemata
|-- tasks
```
For runc, we can make use of `tasks` and `schemata` configuration for L3
cache and memory bandwidth resources constraints.
The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent.
The file `schemata` has a list of all the resources available to this group.
Each resource (L3 cache, memory bandwidth) has its own line and format.
L3 cache schema:
It has allocation bitmasks/values for L3 cache on each socket, which
contains L3 cache id and capacity bitmask (CBM).
```
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
```
For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0"
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
The valid L3 cache CBM is a *contiguous bits set* and number of bits that can
be set is less than the max bit. The max bits in the CBM is varied among
supported Intel CPU models. Kernel will check if it is valid when writing.
e.g., default value 0xfffff in root indicates the max bits of CBM is 20
bits, which mapping to entire L3 cache capacity. Some valid CBM values to
set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which contains
L3 cache id and memory bandwidth.
```
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
```
For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"
The minimum bandwidth percentage value for each CPU model is predefined and
can be looked up through "info/MB/min_bandwidth". The bandwidth granularity
that is allocated is also dependent on the CPU model and can be looked up at
"info/MB/bandwidth_gran". The available bandwidth control steps are:
min_bw + N * bw_gran. Intermediate values are rounded to the next control
step available on the hardware.
If MBA Software Controller is enabled through mount option "-o mba_MBps"
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
instead of "percentages". The kernel underneath would use a software feedback
mechanism or a "Software Controller" which reads the actual bandwidth using
MBM counters and adjust the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".
For example, on a two-socket machine, the schema line could be
"MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
and 7000 MBps memory bandwidth limit on socket 1.
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
```
An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
with a memory bandwidth granularity of 10%.
Tasks inside the container only have access to the "upper" 7/11 of L3 cache
on socket 0 and the "lower" 5/11 L3 cache on socket 1, and may use a
maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.
"linux": {
"intelRdt": {
"closID": "guaranteed_group",
"l3CacheSchema": "L3:0=7f0;1=1f",
"memBwSchema": "MB:0=20;1=70"
}
}
```
### Security
The standard set of Linux capabilities that are set in a container
provide a good default for security and flexibility for the applications.
| Capability | Enabled |
| -------------------- | ------- |
| CAP_NET_RAW | 1 |
| CAP_NET_BIND_SERVICE | 1 |
| CAP_AUDIT_READ | 1 |
| CAP_AUDIT_WRITE | 1 |
| CAP_DAC_OVERRIDE | 1 |
| CAP_SETFCAP | 1 |
| CAP_SETPCAP | 1 |
| CAP_SETGID | 1 |
| CAP_SETUID | 1 |
| CAP_MKNOD | 1 |
| CAP_CHOWN | 1 |
| CAP_FOWNER | 1 |
| CAP_FSETID | 1 |
| CAP_KILL | 1 |
| CAP_SYS_CHROOT | 1 |
| CAP_NET_BROADCAST | 0 |
| CAP_SYS_MODULE | 0 |
| CAP_SYS_RAWIO | 0 |
| CAP_SYS_PACCT | 0 |
| CAP_SYS_ADMIN | 0 |
| CAP_SYS_NICE | 0 |
| CAP_SYS_RESOURCE | 0 |
| CAP_SYS_TIME | 0 |
| CAP_SYS_TTY_CONFIG | 0 |
| CAP_AUDIT_CONTROL | 0 |
| CAP_MAC_OVERRIDE | 0 |
| CAP_MAC_ADMIN | 0 |
| CAP_NET_ADMIN | 0 |
| CAP_SYSLOG | 0 |
| CAP_DAC_READ_SEARCH | 0 |
| CAP_LINUX_IMMUTABLE | 0 |
| CAP_IPC_LOCK | 0 |
| CAP_IPC_OWNER | 0 |
| CAP_SYS_PTRACE | 0 |
| CAP_SYS_BOOT | 0 |
| CAP_LEASE | 0 |
| CAP_WAKE_ALARM | 0 |
| CAP_BLOCK_SUSPEND | 0 |
Additional security layers like [apparmor](https://wiki.ubuntu.com/AppArmor)
and [selinux](http://selinuxproject.org/page/Main_Page) can be used with
the containers. A container should support setting an apparmor profile or
selinux process and mount labels if provided in the configuration.
Standard apparmor profile:
```c
#include <tunables/global>
profile <profile_name> flags=(attach_disconnected,mediate_deleted) {
#include <abstractions/base>
network,
capability,
file,
umount,
deny @{PROC}/sys/fs/** wklx,
deny @{PROC}/sysrq-trigger rwklx,
deny @{PROC}/mem rwklx,
deny @{PROC}/kmem rwklx,
deny @{PROC}/sys/kernel/[^s][^h][^m]* wklx,
deny @{PROC}/sys/kernel/*/** wklx,
deny mount,
deny /sys/[^f]*/** wklx,
deny /sys/f[^s]*/** wklx,
deny /sys/fs/[^c]*/** wklx,
deny /sys/fs/c[^g]*/** wklx,
deny /sys/fs/cg[^r]*/** wklx,
deny /sys/firmware/efi/efivars/** rwklx,
deny /sys/kernel/security/** rwklx,
}
```
*TODO: seccomp work is being done to find a good default config*
### Runtime and Init Process
During container creation the parent process needs to talk to the container's init
process and have a form of synchronization. This is accomplished by creating
a pipe that is passed to the container's init. When the init process first spawns
it will block on its side of the pipe until the parent closes its side. This
allows the parent to have time to set the new process inside a cgroup hierarchy
and/or write any uid/gid mappings required for user namespaces.
The pipe is passed to the init process via FD 3.
The application consuming libcontainer should be compiled statically. libcontainer
does not define any init process and the arguments provided are used to `exec` the
process inside the application. There should be no long running init within the
container spec.
If a pseudo tty is provided to a container it will open and `dup2` the console
as the container's STDIN, STDOUT, STDERR as well as mounting the console
as `/dev/console`.
An extra set of mounts are provided to a container and setup for use. A container's
rootfs can contain some non portable files inside that can cause side effects during
execution of a process. These files are usually created and populated with the container
specific information via the runtime.
**Extra runtime files:**
* /etc/hosts
* /etc/resolv.conf
* /etc/hostname
* /etc/localtime
#### Defaults
There are a few defaults that can be overridden by users, but in their omission
these apply to processes within a container.
| Type | Value |
| ------------------- | ------------------------------ |
| Parent Death Signal | SIGKILL |
| UID | 0 |
| GID | 0 |
| GROUPS | 0, NULL |
| CWD | "/" |
| $HOME | Current user's home dir or "/" |
| Readonly rootfs | false |
| Pseudo TTY | false |
## Actions
After a container is created there is a standard set of actions that can
be done to the container. These actions are part of the public API for
a container.
| Action | Description |
| -------------- | ------------------------------------------------------------------ |
| Get processes | Return all the pids for processes running inside a container |
| Get Stats | Return resource statistics for the container as a whole |
| Wait | Waits on the container's init process ( pid 1 ) |
| Wait Process | Wait on any of the container's processes returning the exit status |
| Destroy | Kill the container's init process and remove any filesystem state |
| Signal | Send a signal to the container's init process |
| Signal Process | Send a signal to any of the container's processes |
| Pause | Pause all processes inside the container |
| Resume | Resume all processes inside the container if paused |
| Exec | Execute a new process inside of the container ( requires setns ) |
| Set | Setup configs of the container after it's created |
### Execute a new process inside of a running container
User can execute a new process inside of a running container. Any binaries to be
executed must be accessible within the container's rootfs.
The started process will run inside the container's rootfs. Any changes
made by the process to the container's filesystem will persist after the
process finished executing.
The started process will join all the container's existing namespaces. When the
container is paused, the process will also be paused and will resume when
the container is unpaused. The started process will only run when the container's
primary process (PID 1) is running, and will not be restarted when the container
is restarted.
#### Planned additions
The started process will have its own cgroups nested inside the container's
cgroups. This is used for process tracking and optionally resource allocation
handling for the new process. Freezer cgroup is required, the rest of the cgroups
are optional. The process executor must place its pid inside the correct
cgroups before starting the process. This is done so that no child processes or
threads can escape the cgroups.
When the process is stopped, the process executor will try (in a best-effort way)
to stop all its children and remove the sub-cgroups.
// +build apparmor,linux
package apparmor
import (
"fmt"
"io/ioutil"
"os"
"github.com/opencontainers/runc/libcontainer/utils"
)
// IsEnabled returns true if apparmor is enabled for the host.
func IsEnabled() bool {
if _, err := os.Stat("/sys/kernel/security/apparmor"); err == nil && os.Getenv("container") == "" {
if _, err = os.Stat("/sbin/apparmor_parser"); err == nil {
buf, err := ioutil.ReadFile("/sys/module/apparmor/parameters/enabled")
return err == nil && len(buf) > 1 && buf[0] == 'Y'
}
}
return false
}
func setProcAttr(attr, value string) error {
// Under AppArmor you can only change your own attr, so use /proc/self/
// instead of /proc/<tid>/ like libapparmor does
path := fmt.Sprintf("/proc/self/attr/%s", attr)
f, err := os.OpenFile(path, os.O_WRONLY, 0)
if err != nil {
return err
}
defer f.Close()
if err := utils.EnsureProcHandle(f); err != nil {
return err
}
_, err = fmt.Fprintf(f, "%s", value)
return err
}
// changeOnExec reimplements aa_change_onexec from libapparmor in Go
func changeOnExec(name string) error {
value := "exec " + name
if err := setProcAttr("exec", value); err != nil {
return fmt.Errorf("apparmor failed to apply profile: %s", err)
}
return nil
}
// ApplyProfile will apply the profile with the specified name to the process after
// the next exec.
func ApplyProfile(name string) error {
if name == "" {
return nil
}
return changeOnExec(name)
}
// +build !apparmor !linux
package apparmor
import (
"errors"
)
var ErrApparmorNotEnabled = errors.New("apparmor: config provided but apparmor not supported")
func IsEnabled() bool {
return false
}
func ApplyProfile(name string) error {
if name != "" {
return ErrApparmorNotEnabled
}
return nil
}
// +build linux
package libcontainer
import (
"fmt"
"strings"
"github.com/opencontainers/runc/libcontainer/configs"
"github.com/syndtr/gocapability/capability"
)
const allCapabilityTypes = capability.CAPS | capability.BOUNDS | capability.AMBS
var capabilityMap map[string]capability.Cap
func init() {
capabilityMap = make(map[string]capability.Cap)
last := capability.CAP_LAST_CAP
// workaround for RHEL6 which has no /proc/sys/kernel/cap_last_cap
if last == capability.Cap(63) {
last = capability.CAP_BLOCK_SUSPEND
}
for _, cap := range capability.List() {
if cap > last {
continue
}
capKey := fmt.Sprintf("CAP_%s", strings.ToUpper(cap.String()))
capabilityMap[capKey] = cap
}
}
func newContainerCapList(capConfig *configs.Capabilities) (*containerCapabilities, error) {
bounding := []capability.Cap{}
for _, c := range capConfig.Bounding {
v, ok := capabilityMap[c]
if !ok {
return nil, fmt.Errorf("unknown capability %q", c)
}
bounding = append(bounding, v)
}
effective := []capability.Cap{}
for _, c := range capConfig.Effective {
v, ok := capabilityMap[c]
if !ok {
return nil, fmt.Errorf("unknown capability %q", c)
}
effective = append(effective, v)
}
inheritable := []capability.Cap{}
for _, c := range capConfig.Inheritable {
v, ok := capabilityMap[c]
if !ok {
return nil, fmt.Errorf("unknown capability %q", c)
}
inheritable = append(inheritable, v)
}
permitted := []capability.Cap{}
for _, c := range capConfig.Permitted {
v, ok := capabilityMap[c]
if !ok {
return nil, fmt.Errorf("unknown capability %q", c)
}
permitted = append(permitted, v)
}
ambient := []capability.Cap{}
for _, c := range capConfig.Ambient {
v, ok := capabilityMap[c]
if !ok {
return nil, fmt.Errorf("unknown capability %q", c)
}
ambient = append(ambient, v)
}
pid, err := capability.NewPid2(0)
if err != nil {
return nil, err
}
err = pid.Load()
if err != nil {
return nil, err
}
return &containerCapabilities{
bounding: bounding,
effective: effective,
inheritable: inheritable,
permitted: permitted,
ambient: ambient,
pid: pid,
}, nil
}
type containerCapabilities struct {
pid capability.Capabilities
bounding []capability.Cap
effective []capability.Cap
inheritable []capability.Cap
permitted []capability.Cap
ambient []capability.Cap
}
// ApplyBoundingSet sets the capability bounding set to those specified in the whitelist.
func (c *containerCapabilities) ApplyBoundingSet() error {
c.pid.Clear(capability.BOUNDS)
c.pid.Set(capability.BOUNDS, c.bounding...)
return c.pid.Apply(capability.BOUNDS)
}
// Apply sets all the capabilities for the current process in the config.
func (c *containerCapabilities) ApplyCaps() error {
c.pid.Clear(allCapabilityTypes)
c.pid.Set(capability.BOUNDS, c.bounding...)
c.pid.Set(capability.PERMITTED, c.permitted...)
c.pid.Set(capability.INHERITABLE, c.inheritable...)
c.pid.Set(capability.EFFECTIVE, c.effective...)
c.pid.Set(capability.AMBIENT, c.ambient...)
return c.pid.Apply(allCapabilityTypes)
}
// +build linux
package cgroups
import (
"fmt"
"github.com/opencontainers/runc/libcontainer/configs"
)
type Manager interface {
// Applies cgroup configuration to the process with the specified pid
Apply(pid int) error
// Returns the PIDs inside the cgroup set
GetPids() ([]int, error)
// Returns the PIDs inside the cgroup set & all sub-cgroups
GetAllPids() ([]int, error)
// Returns statistics for the cgroup set
GetStats() (*Stats, error)
// Toggles the freezer cgroup according with specified state
Freeze(state configs.FreezerState) error
// Destroys the cgroup set
Destroy() error
// The option func SystemdCgroups() and Cgroupfs() require following attributes:
// Paths map[string]string
// Cgroups *configs.Cgroup
// Paths maps cgroup subsystem to path at which it is mounted.
// Cgroups specifies specific cgroup settings for the various subsystems
// Returns cgroup paths to save in a state file and to be able to
// restore the object later.
GetPaths() map[string]string
// GetUnifiedPath returns the unified path when running in unified mode.
// The value corresponds to the all values of GetPaths() map.
//
// GetUnifiedPath returns error when running in hybrid mode as well as
// in legacy mode.
GetUnifiedPath() (string, error)
// Sets the cgroup as configured.
Set(container *configs.Config) error
// Gets the cgroup as configured.
GetCgroups() (*configs.Cgroup, error)
}
type NotFoundError struct {
Subsystem string
}
func (e *NotFoundError) Error() string {
return fmt.Sprintf("mountpoint for %s not found", e.Subsystem)
}
func NewNotFoundError(sub string) error {
return &NotFoundError{
Subsystem: sub,
}
}
func IsNotFound(err error) bool {
if err == nil {
return false
}
_, ok := err.(*NotFoundError)
return ok
}
// +build linux
package cgroups
import (
"testing"
)
func TestParseCgroups(t *testing.T) {
cgroups, err := ParseCgroupFile("/proc/self/cgroup")
if err != nil {
t.Fatal(err)
}
if IsCgroup2UnifiedMode() {
return
}
if _, ok := cgroups["cpu"]; !ok {
t.Fail()
}
}
package ebpf
import (
"github.com/cilium/ebpf"
"github.com/cilium/ebpf/asm"
"github.com/pkg/errors"
"golang.org/x/sys/unix"
)
// LoadAttachCgroupDeviceFilter installs eBPF device filter program to /sys/fs/cgroup/<foo> directory.
//
// Requires the system to be running in cgroup2 unified-mode with kernel >= 4.15 .
//
// https://github.com/torvalds/linux/commit/ebc614f687369f9df99828572b1d85a7c2de3d92
func LoadAttachCgroupDeviceFilter(insts asm.Instructions, license string, dirFD int) (func() error, error) {
nilCloser := func() error {
return nil
}
// Increase `ulimit -l` limit to avoid BPF_PROG_LOAD error (#2167).
// This limit is not inherited into the container.
memlockLimit := &unix.Rlimit{
Cur: unix.RLIM_INFINITY,
Max: unix.RLIM_INFINITY,
}
_ = unix.Setrlimit(unix.RLIMIT_MEMLOCK, memlockLimit)
spec := &ebpf.ProgramSpec{
Type: ebpf.CGroupDevice,
Instructions: insts,
License: license,
}
prog, err := ebpf.NewProgram(spec)
if err != nil {
return nilCloser, err
}
if err := prog.Attach(dirFD, ebpf.AttachCGroupDevice, unix.BPF_F_ALLOW_MULTI); err != nil {
return nilCloser, errors.Wrap(err, "failed to call BPF_PROG_ATTACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI)")
}
closer := func() error {
if err := prog.Detach(dirFD, ebpf.AttachCGroupDevice, unix.BPF_F_ALLOW_MULTI); err != nil {
return errors.Wrap(err, "failed to call BPF_PROG_DETACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI)")
}
return nil
}
return closer, nil
}
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
// +build linux
package fs
import (
"bufio"
"os"
"path/filepath"
"strconv"
"github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/cgroups/fscommon"
"github.com/opencontainers/runc/libcontainer/configs"
)
type CpuGroup struct {
}
func (s *CpuGroup) Name() string {
return "cpu"
}
func (s *CpuGroup) Apply(d *cgroupData) error {
// We always want to join the cpu group, to allow fair cpu scheduling
// on a container basis
path, err := d.path("cpu")
if err != nil && !cgroups.IsNotFound(err) {
return err
}
return s.ApplyDir(path, d.config, d.pid)
}
func (s *CpuGroup) ApplyDir(path string, cgroup *configs.Cgroup, pid int) error {
// This might happen if we have no cpu cgroup mounted.
// Just do nothing and don't fail.
if path == "" {
return nil
}
if err := os.MkdirAll(path, 0755); err != nil {
return err
}
// We should set the real-Time group scheduling settings before moving
// in the process because if the process is already in SCHED_RR mode
// and no RT bandwidth is set, adding it will fail.
if err := s.SetRtSched(path, cgroup); err != nil {
return err
}
// because we are not using d.join we need to place the pid into the procs file
// unlike the other subsystems
return cgroups.WriteCgroupProc(path, pid)
}
func (s *CpuGroup) SetRtSched(path string, cgroup *configs.Cgroup) error {
if cgroup.Resources.CpuRtPeriod != 0 {
if err := fscommon.WriteFile(path, "cpu.rt_period_us", strconv.FormatUint(cgroup.Resources.CpuRtPeriod, 10)); err != nil {
return err
}
}
if cgroup.Resources.CpuRtRuntime != 0 {
if err := fscommon.WriteFile(path, "cpu.rt_runtime_us", strconv.FormatInt(cgroup.Resources.CpuRtRuntime, 10)); err != nil {
return err
}
}
return nil
}
func (s *CpuGroup) Set(path string, cgroup *configs.Cgroup) error {
if cgroup.Resources.CpuShares != 0 {
if err := fscommon.WriteFile(path, "cpu.shares", strconv.FormatUint(cgroup.Resources.CpuShares, 10)); err != nil {
return err
}
}
if cgroup.Resources.CpuPeriod != 0 {
if err := fscommon.WriteFile(path, "cpu.cfs_period_us", strconv.FormatUint(cgroup.Resources.CpuPeriod, 10)); err != nil {
return err
}
}
if cgroup.Resources.CpuQuota != 0 {
if err := fscommon.WriteFile(path, "cpu.cfs_quota_us", strconv.FormatInt(cgroup.Resources.CpuQuota, 10)); err != nil {
return err
}
}
return s.SetRtSched(path, cgroup)
}
func (s *CpuGroup) Remove(d *cgroupData) error {
return removePath(d.path("cpu"))
}
func (s *CpuGroup) GetStats(path string, stats *cgroups.Stats) error {
f, err := os.Open(filepath.Join(path, "cpu.stat"))
if err != nil {
if os.IsNotExist(err) {
return nil
}
return err
}
defer f.Close()
sc := bufio.NewScanner(f)
for sc.Scan() {
t, v, err := fscommon.GetCgroupParamKeyValue(sc.Text())
if err != nil {
return err
}
switch t {
case "nr_periods":
stats.CpuStats.ThrottlingData.Periods = v
case "nr_throttled":
stats.CpuStats.ThrottlingData.ThrottledPeriods = v
case "throttled_time":
stats.CpuStats.ThrottlingData.ThrottledTime = v
}
}
return nil
}
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册