#10111 iad2/openshift: F34 builds fail, possible runtime mismatch
Closed: Will Not/Can Not fix 3 years ago by kevin. Opened 3 years ago by lucab.

On the IAD2 OpenShift cluster, I triggered a build of this F34-based image, which used to work fine (i.e. when using F33 as the base image).

The build got scheduled on os-node04.iad2.fedoraproject.org and it failed with:

Step 6/10 : RUN cargo build --release &&   mv /src/target/release/fcos-graph-builder /usr/local/bin/fcos-graph-builder &&   mv /src/target/release/fcos-policy-engine /usr/local/bin/fcos-policy-engine

error: failed to get `actix-cors` as a dependency of package `commons v0.1.0 (/src/commons)`

Caused by:
  failed to initialize index git repository

Caused by:
  failed to resolve path '/root/.cargo/registry/index/github.com-1ecc6299db9ec823/.git/': Operation not permitted; class=Os (2)

This seems to be due to cargo failing with an EPERM when resolving filesystem paths, which is quite a bizarre error.

Instead, I think this is a symptom of https://github.com/opencontainers/runc/issues/2151. That is, the host's seccomp support and container runtime/stack may be too old to recognize newer syscalls used by the F34 images. This in turn leads to an EPERM instead of the expected ENOSYS, as reported in the runc bug.
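
To illustrate why the errno matters here, a minimal sketch of the "try the new syscall, fall back on ENOSYS" pattern follows. It is a simulation, not the actual libc or cargo code, and every function name in it is a hypothetical stand-in.

```python
import errno


def check_access(new_syscall, old_syscall, path):
    """Try the newer call first; fall back to the older one only on ENOSYS."""
    try:
        return new_syscall(path)
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            # EPERM (or any other errno) looks like a genuine failure,
            # so no fallback is attempted and the error propagates.
            raise
        return old_syscall(path)


def blocked_by_old_seccomp(path):
    # Hypothetical stand-in: newer syscall rejected by an outdated filter.
    raise OSError(errno.EPERM, "Operation not permitted")


def missing_in_kernel(path):
    # Hypothetical stand-in: newer syscall reported as not implemented.
    raise OSError(errno.ENOSYS, "Function not implemented")


def legacy_call(path):
    # Hypothetical stand-in: the older syscall, which succeeds.
    return 0
```

With `missing_in_kernel` the caller quietly falls back to `legacy_call`; with `blocked_by_old_seccomp` the EPERM reaches the application, which would match the cargo failure above.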

Would it be possible to double check the stack on all cluster nodes, and make sure it is running the latest/fixed versions?


Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

3 years ago

I can confirm this is indeed related to the seccomp profile from docker on ocp 3.11.

The strace output shows that the impacted syscall is, in this case, faccessat2:

mkdir("/root/.cargo/registry/index/github.com-1ecc6299db9ec823/.git", 0777) = 0
readlink("/root", 0x7ffd9bb1b770, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo", 0x7ffd9bb1b770, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo/registry", 0x7ffd9bb1b770, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo/registry/index", 0x7ffd9bb1b770, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo/registry/index/github.com-1ecc6299db9ec823", 0x7ffd9bb1b770, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo/registry/index/github.com-1ecc6299db9ec823/.git", 0x7ffd9bb1b770, 1023) = -1 EINVAL (Invalid argument)
faccessat2(AT_FDCWD, "/root/.cargo/registry/index/github.com-1ecc6299db9ec823/.git/", F_OK, AT_EACCESS) = -1 EPERM (Operation not permitted)
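
For longer traces, the failing call can be picked out mechanically. This is a rough sketch that assumes the plain single-line strace format shown above; the helper name `failed_syscalls` is made up for illustration, and multi-line or unfinished/resumed entries are not handled.

```python
import re

# Matches lines such as: name(args...) = -1 EPERM (Operation not permitted)
FAILED_CALL = re.compile(r"^(\w+)\(.*\)\s+=\s+-1\s+([A-Z0-9]+)\b")


def failed_syscalls(trace_text, errno_name="EPERM"):
    """Return the names of syscalls that failed with the given errno."""
    hits = []
    for line in trace_text.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match and match.group(2) == errno_name:
            hits.append(match.group(1))
    return hits
```

Fed the trace above, it would report only faccessat2 for EPERM, since the readlink calls fail with EINVAL.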

Seccomp is supposed to be disabled by default on OpenShift 3.11, so if you run the same image/command combo in a pod, it falls back to faccessat as expected:

mkdir("/root/.cargo/registry/index/github.com-1ecc6299db9ec823/.git", 0777) = 0
readlink("/root", 0x7ffc47fe50f0, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo", 0x7ffc47fe50f0, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo/registry", 0x7ffc47fe50f0, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo/registry/index", 0x7ffc47fe50f0, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo/registry/index/github.com-1ecc6299db9ec823", 0x7ffc47fe50f0, 1023) = -1 EINVAL (Invalid argument)
readlink("/root/.cargo/registry/index/github.com-1ecc6299db9ec823/.git", 0x7ffc47fe50f0, 1023) = -1 EINVAL (Invalid argument)
faccessat2(AT_FDCWD, "/root/.cargo/registry/index/github.com-1ecc6299db9ec823/.git/", F_OK, AT_EACCESS) = -1 ENOSYS (Function not implemented)
faccessat(AT_FDCWD, "/root/.cargo/registry/index/github.com-1ecc6299db9ec823/.git/", F_OK) = 0
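
Whether a given process is actually running under a seccomp filter can be checked from the `Seccomp:` field of `/proc/<pid>/status` (0 = disabled, 1 = strict, 2 = filter). A small sketch, with a hypothetical helper name, that interprets that field from the file's text:

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Map the Seccomp field of a /proc/<pid>/status dump to a mode name."""
    for line in status_text.splitlines():
        # The trailing colon keeps this from matching "Seccomp_filters:".
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, "unknown")
    # Field absent, e.g. a kernel built without seccomp support.
    return "unknown"
```

Calling it with the contents of `/proc/self/status` from inside the pod and from inside the build step would be one way to compare the two environments.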

But for an OpenShift build, we are in a "docker in docker" situation: the container running the OpenShift build has seccomp disabled, but the docker build command running inside it uses the default (outdated) seccomp profile.
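
To check whether a particular docker seccomp profile would let a syscall through, something like the following could be used. It is a simplified sketch: it assumes the usual JSON layout of docker profiles (`defaultAction` plus a `syscalls` list whose entries carry `names`, or a single `name` in older profiles), it ignores argument filters and conditional includes/excludes, and the helper name is invented for illustration.

```python
import json


def allows_syscall(profile_json, syscall):
    """Return True if the profile explicitly allows the syscall by name."""
    profile = json.loads(profile_json)
    for rule in profile.get("syscalls", []):
        # Newer profiles list several names per rule, older ones a single name.
        names = rule.get("names") or [rule.get("name")]
        if syscall in names and rule.get("action") == "SCMP_ACT_ALLOW":
            return True
    # Not listed: the profile's default action decides.
    return profile.get("defaultAction") == "SCMP_ACT_ALLOW"
```

A profile written before faccessat2 existed would allow faccessat but not faccessat2, so the newer call falls through to the default action and is rejected with an errno rather than reported as unimplemented.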

As far as I know, this is still not fixed in the latest version of ocp 3.11 (or more precisely the docker version coming with it), and I don't see any easy way to work around that yet (except moving to ocp 4).


edit:
There is a bugzilla open for rhel7: https://bugzilla.redhat.com/show_bug.cgi?id=1961206
The backport is very unlikely to happen at this point.

I think probably the best option here is to just wait for us to bring up our ocp4.x cluster and move things there.

I am hoping that will be done in the coming weeks.

Metadata Update from @kevin:
- Issue close_status updated to: Will Not/Can Not fix
- Issue status updated to: Closed (was: Open)

3 years ago

