We recently discovered that many machines in EC2 are leftovers from failed Duffy provisioning attempts.
When Duffy runs the Ansible playbook to create a machine, the run can fail, and Duffy then retries. However, it seems that in some cases the host is actually created and something else goes wrong afterwards (e.g. perhaps a communication error), so Duffy retries without deleting the host from the failed attempt.
We should look into some form of reporting or cleanup mechanism (even just a Zabbix notification) that keeps these hosts from building up again.
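As a starting point for discussion, here is a rough sketch of what the detection half of such a check could look like. It is not based on Duffy's actual code: the function name, the dict keys (`id`, `state`, `launch_time`) and the one-hour grace period are all hypothetical, and fetching the two inputs (the instance list from EC2 and the set of instance IDs Duffy knows about) is left out entirely.

```python
from datetime import datetime, timedelta, timezone


def find_orphaned_instances(ec2_instances, duffy_known_ids, now=None,
                            grace=timedelta(hours=1)):
    """Return instances that exist in EC2 but are not tracked by Duffy.

    ec2_instances: list of dicts with hypothetical keys "id", "state"
                   and "launch_time" (a timezone-aware datetime).
    duffy_known_ids: set of instance IDs Duffy currently tracks.
    grace: instances younger than this are skipped, since they may
           belong to a provision that is still in progress.
    """
    now = now or datetime.now(timezone.utc)
    orphans = []
    for inst in ec2_instances:
        if inst["id"] in duffy_known_ids:
            continue  # Duffy knows about this host
        if inst["state"] in ("terminated", "shutting-down"):
            continue  # already going away, nothing to clean up
        if now - inst["launch_time"] < grace:
            continue  # too new to judge
        orphans.append(inst)
    return orphans
```

The returned list could feed a Zabbix item (e.g. alert when its length is non-zero) or, later, a cleanup job. The grace period is there so that a host from a provision that is still running is not reported as orphaned.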
Metadata Update from @arrfab: - Issue tagged with: centos-ci-infra