#11440 OSError: [Errno 28] No space left on device
Closed: Fixed 2 years ago by darknao. Opened 2 years ago by lbalhar.

Hi.

A scratch build in CI failed because host buildvm-a64-30.iad2.fedoraproject.org has no space left on disk. See https://koji.fedoraproject.org/koji/taskinfo?taskID=104206283

Describe what you would like us to do:


Whatever makes the host work again.

When do you need this to be done by? (YYYY/MM/DD)


Sooner is better.


buildvm-a64-12.iad2.fedoraproject.org appears to be out of space as well:
https://koji.fedoraproject.org/koji/taskinfo?taskID=104211499

I checked both builders and they currently have plenty of space. So whatever happened, it somehow resolved itself.
This seems like #11422.
I will check the logs more closely to see if I can find anything, but no luck so far.

Thanks darknao!

I just ran into the same issue once more on buildvm-a64-12.iad2.fedoraproject.org:
https://koji.fedoraproject.org/koji/taskinfo?taskID=104214794

tagBuild of perl-System-Info-0.064-1.fc39 failed

https://koji.fedoraproject.org/koji/taskinfo?taskID=104221733 on buildvm-a64-12.iad2.fedoraproject.org

This sounds similar to the problem that affected ppc64le builders a while ago. They too claimed to have plenty of free disk space, but something broke with "no more space left on the device" regardless.

I might have found something, but would need to confirm when this happens again.

Please let us know if there are any other failed tasks with this error.

Thanks for the notice! I was able to confirm my theory: The btrfs metadata space was full and unable to expand.
I fixed it on all ppc64le and aarch64 builders, so we should be clear now.
Please rebuild and let me know if you are still experiencing it.
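
For anyone who wants to check for this condition on another host, here is a minimal sketch of how it might be detected. It assumes the text comes from `btrfs filesystem usage -b <mountpoint>` and that the output contains a `Device unallocated:` line plus per-type lines of the form `Metadata,DUP: Size:<bytes>, Used:<bytes>`. The sample text, the function names and both thresholds are made up for illustration and are not taken from the builders.

```python
import re

# Assumed line shapes in the raw-byte output of `btrfs filesystem usage -b`.
BLOCK_GROUP_RE = re.compile(r"^(Data|Metadata|System),\w+:\s+Size:(\d+),\s+Used:(\d+)")
UNALLOCATED_RE = re.compile(r"^\s*Device unallocated:\s+(\d+)")

# Illustrative sample only: a 100 GiB device that is fully allocated,
# with data about half used and metadata almost completely used.
SAMPLE = """\
Overall:
    Device size:                 107374182400
    Device allocated:            107374182400
    Device unallocated:                     0

Data,single: Size:105209921536, Used:53687091200 (51.03%)
Metadata,DUP: Size:1073741824, Used:1072693248 (99.90%)
System,DUP: Size:8388608, Used:16384 (0.20%)
"""


def parse_usage(text):
    """Return (unallocated_bytes, {type: (size, used)}) from the usage text."""
    unallocated = None
    groups = {}
    for line in text.splitlines():
        match = UNALLOCATED_RE.match(line)
        if match:
            unallocated = int(match.group(1))
            continue
        match = BLOCK_GROUP_RE.match(line)
        if match:
            groups[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return unallocated, groups


def metadata_exhausted(text, min_unallocated=1 << 30, max_ratio=0.95):
    """True when metadata is nearly full and there is no room to allocate more.

    Both thresholds are hypothetical defaults chosen for this sketch.
    """
    unallocated, groups = parse_usage(text)
    size, used = groups["Metadata"]
    return unallocated < min_unallocated and used / size >= max_ratio


if __name__ == "__main__":
    print(metadata_exhausted(SAMPLE))
```

In the sample, the data block groups are only about half used, which is why the host appears to have plenty of free space, yet metadata cannot grow because nothing is left unallocated.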

I have this side tag that might be affected by this issue:

https://koji.fedoraproject.org/koji/builds?inherited=0&tagID=70975&order=-build_id&latest=1

Invoking this would just stall indefinitely: koji wait-repo f38-build-side-70975 --build=rust-libbpf-sys-1.2.1-1.fc38

(thanks @pcreech17 for suggesting it might be this issue).

The last newrepo basically failed on ppc64le - can someone regenerate it? https://koji.fedoraproject.org/koji/taskinfo?taskID=104322198

^ ignore my request to regenerate, I ended up cloning my side tag then deleting the old one

Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

2 years ago

Either the weekly btrfs-balance.timer/service from the btrfsmaintenance package [1] or the autorelocation feature in sysfs [2] should help with this problem.

Optionally, and not urgently, it would be good to confirm that this is the same problem upstream btrfs developers are already aware of, rather than a new one. [2]

[1]
If I'm understanding the defaults in /etc/sysconfig/btrfsmaintenance correctly, the mx script selects block groups that are up to 10% full for balance. The extents in those targeted block groups are relocated to existing block groups, so the original block groups are emptied and become unallocated space. Unallocated space can then be assigned as either data or metadata block groups, which lets the btrfs kernel code create block groups according to a workload-specific data/metadata usage ratio.

[2]
https://lore.kernel.org/linux-btrfs/20230803211258.GA3669918@zen/#t
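
The effect described in [1] can be illustrated with a toy model. This is only a sketch under simplifying assumptions and not how the kernel or btrfsmaintenance is implemented: every block group has the same size, usage is tracked as an integer percentage, and relocated extents are packed first-fit into the block groups that were not selected. The `Filesystem` class and its methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Filesystem:
    """Toy model: equally sized block groups, usage as integer percent."""
    unallocated: int   # block-group-sized chunks not assigned to any type
    data: list         # percent used of each data block group
    metadata: list     # percent used of each metadata block group

    def allocate_metadata(self):
        """Create a new metadata block group, or fail with ENOSPC."""
        if self.unallocated == 0:
            raise OSError(28, "No space left on device")
        self.unallocated -= 1
        self.metadata.append(0)

    def balance_data(self, usage_percent):
        """Relocate extents out of data block groups at most usage_percent full."""
        keep = [used for used in self.data if used > usage_percent]
        for used in (u for u in self.data if u <= usage_percent):
            for i, target in enumerate(keep):
                if target + used <= 100:
                    # Extents fit into an existing block group, so the
                    # source block group is emptied and becomes unallocated.
                    keep[i] = target + used
                    self.unallocated += 1
                    break
            else:
                # No room anywhere else: the block group stays as it is.
                keep.append(used)
        self.data = keep


if __name__ == "__main__":
    fs = Filesystem(unallocated=0, data=[5, 8, 60, 70, 3], metadata=[100])
    try:
        fs.allocate_metadata()
    except OSError as err:
        print("before balance:", err)
    fs.balance_data(10)
    fs.allocate_metadata()
    print("after balance:", fs)
```

In this model the three nearly empty data block groups pin all remaining space, so metadata cannot grow even though the data is mostly free. After a balance with a 10% usage filter they are returned to the unallocated pool and the metadata allocation succeeds.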

It looks like it is now fixed.
The btrfs-balance timer has been activated on all builders to avoid such a situation in the future.

Metadata Update from @darknao:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

This was actually not on builders themselves, but the netapp volume. It should be all working again now.

Metadata
Boards 1
ops Status: Backlog
Related Pull Requests
  • #1533 Merged 2 years ago