#7 Added the infra SOPs ported to asciidoc.
Merged 2 years ago by pbokoc. Opened 2 years ago by asaleh.
asaleh/infra-docs-fpo master  into  master

Remove old sysadmin guide
Adam Saleh • 2 years ago  
file modified
-2
@@ -16,5 +16,3 @@ 

  - modules/ROOT/nav.adoc

  - modules/developer_guide/nav.adoc

  - modules/sysadmin_guide/nav.adoc

- - modules/old_sysadmin_guide/nav.adoc

- - modules/communishift/nav.adoc

@@ -7,11 +7,9 @@ 

  

  Services handling identity and providing personal space to our contributors.

  

- FAS https://fas.fedoraproject.org[fas.fp.o]::

- The __F__edora __A__ccount __S__ystem, our directory and identity management

- tool, provides community members with a single account to login on Fedora

- services. https://admin.fedoraproject.org/accounts/user/new[Creating an

- account] is one of the first things to do if you plan to work on Fedora.

+ Accounts https://accounts.fedoraproject.org/[accounts.fp.o]::

+ Our directory and identity management tool provides community members with a single account to login on Fedora

+ services. Registering an account there is one of the first things to do if you plan to work on Fedora.

  

  Fedora People https://fedorapeople.org/[fedorapeople.org]::

  Personal web space provided to community members to share files, git

@@ -1,1 +0,0 @@ 

- * xref:index.adoc[Communishift documentation]

@@ -1,10 +0,0 @@ 

- :experimental:

- = Communishift documentation

- 

- link:https://console-openshift-console.apps.os.fedorainfracloud.org/[Communishift] is the name for the OpenShift community cluster run by the Fedora project.

- It's intended to be a place where community members can test/deploy/run things that are of benefit to the community at a lower SLE (Service Level Expectation) than services directly run and supported by infrastructure, additionally doing so in a self service manner.

- It's also an incubator for applications that may someday be more fully supported once they prove their worth.

- Finally, it's a place for Infrastructure folks to learn and test and discover OpenShift in a less constrained setting than our production clusters.

- 

- This documentation focuses on implementation details of Fedora's OpenShift instance, not on OpenShift usage in general.

- These instructions are already covered by link:https://docs.openshift.com/container-platform/4.1/welcome/index.html[upstream documentation].

@@ -1,1 +0,0 @@ 

- * link:https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/old/index.html[Old System Administrator Guide]

file modified
+118 -1
@@ -1,1 +1,118 @@ 

- * link:https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/index.html[System Administrator Guide]

+ * xref:orientation.adoc[Orientation for Sysadmin Guide]

+ * xref:index.adoc[Sysadmin Guide]

+ ** xref:2-factor.adoc[Two factor auth]

+ ** xref:accountdeletion.adoc[Account Deletion SOP]

+ ** xref:anitya.adoc[Anitya Infrastructure SOP]

+ ** xref:ansible.adoc[ansible - SOP]

+ ** xref:apps-fp-o.adoc[apps-fp-o - SOP]

+ ** xref:archive-old-fedora.adoc[How to Archive Old Fedora Releases - SOP]

+ ** xref:arm.adoc[Fedora ARM Infrastructure - SOP]

+ ** xref:aws-access.adoc[Amazon Web Services Access - SOP]

+ ** xref:bastion-hosts-info.adoc[Fedora Bastion Hosts - SOP]

+ ** xref:blockerbugs.adoc[Blockerbugs Infrastructure - SOP]

+ ** xref:bodhi.adoc[Bodhi Infrastructure - SOP]

+ ** xref:bugzilla.adoc[Bugzilla Sync Infrastructure - SOP]

+ ** xref:bugzilla2fedmsg.adoc[bugzilla2fedmsg - SOP]

+ ** xref:collectd.adoc[Collectd - SOP]

+ ** xref:compose-tracker.adoc[Compose Tracker - SOP]

+ ** xref:contenthosting.adoc[Content Hosting Infrastructure - SOP]

+ ** xref:copr.adoc[Copr - SOP]

+ ** xref:database.adoc[Database Infrastructure - SOP]

+ ** xref:datanommer.adoc[datanommer - SOP]

+ ** xref:debuginfod.adoc[Fedora Debuginfod Service - SOP]

+ ** xref:departing-admin.adoc[Departing admin - SOP]

+ ** xref:dns.adoc[DNS repository for fedoraproject - SOP]

+ ** xref:docs.fedoraproject.org.adoc[Docs - SOP]

+ ** xref:fas-notes.adoc[Fedora Account System - SOP]

+ ** xref:fas-openid.adoc[FAS-OpenID - SOP]

+ ** xref:fedmsg-certs.adoc[fedmsg (Fedora Messaging) Certs, Keys, and CA - SOP]

+ ** xref:fedmsg-gateway.adoc[fedmsg-gateway - SOP]

+ ** xref:fedmsg-introduction.adoc[fedmsg introduction and basics - SOP]

+ ** xref:fedmsg-irc.adoc[fedmsg-irc - SOP]

+ ** xref:fedmsg-new-message-type.adoc[Adding a new fedmsg message type - SOP]

+ ** xref:fedmsg-relay.adoc[fedmsg-relay - SOP]

+ ** xref:fedmsg-websocket.adoc[WebSocket - SOP]

+ ** xref:fedocal.adoc[Fedocal - SOP]

+ ** xref:fedora-releases.adoc[Fedora Release Infrastructure - SOP]

+ ** xref:fedorawebsites.adoc[Websites Release - SOP]

+ ** xref:fmn.adoc[FedMsg Notifications (FMN) - SOP]

+ ** xref:gather-easyfix.adoc[Fedora gather easyfix - SOP]

+ ** xref:gdpr_delete.adoc[GDPR Delete - SOP]

+ ** xref:gdpr_sar.adoc[GDPR SAR - SOP]

+ ** xref:geoip-city-wsgi.adoc[geoip-city-wsgi - SOP]

+ ** xref:github2fedmsg.adoc[github2fedmsg - SOP]

+ ** xref:github.adoc[Using github for Infra Projects - SOP]

+ ** xref:greenwave.adoc[Greenwave - SOP]

+ ** xref:guestdisk.adoc[Guest Disk Resize - SOP]

+ ** xref:guestedit.adoc[Guest Editing - SOP]

+ ** xref:haproxy.adoc[Haproxy Infrastructure - SOP]

+ ** xref:hotfix.adoc[HOTFIXES - SOP]

+ ** xref:hotness.adoc[The New Hotness - SOP]

+ ** xref:infra-git-repo.adoc[Infrastructure Git Repos - SOP]

+ ** xref:infra-hostrename.adoc[Infrastructure Host Rename - SOP]

+ ** xref:infra-raidmismatch.adoc[Infrastructure Raid Mismatch Count - SOP]

+ ** xref:infra-repo.adoc[Infrastructure Yum Repo - SOP]

+ ** xref:infra-retiremachine.adoc[Infrastructure retire machine - SOP]

+ ** xref:ipsilon.adoc[Ipsilon Infrastructure - SOP]

+ ** xref:iscsi.adoc[iSCSI - SOP]

+ ** xref:jenkins-fedmsg.adoc[Jenkins Fedmsg - SOP]

+ ** xref:kerneltest-harness.adoc[Kerneltest-harness - SOP]

+ ** xref:kickstarts.adoc[Kickstart Infrastructure - SOP]

+ ** xref:koji.adoc[Koji Infrastructure - SOP]

+ ** xref:koji-archive.adoc[Koji Archive - SOP]

+ ** xref:koji-builder-setup.adoc[Setup Koji Builder - SOP]

+ ** xref:koschei.adoc[Koschei - SOP]

+ ** xref:layered-image-buildsys.adoc[Layered Image Build System - SOP]

+ ** xref:mailman.adoc[Mailman Infrastructure - SOP]

+ ** xref:making-ssl-certificates.adoc[SSL Certificate Creation - SOP]

+ ** xref:massupgrade.adoc[Mass Upgrade Infrastructure - SOP]

+ ** xref:mastermirror.adoc[Master Mirror Infrastructure - SOP]

+ ** xref:mbs.adoc[Module Build Service Infra - SOP]

+ ** xref:memcached.adoc[Memcached Infrastructure - SOP]

+ ** xref:message-tagging-service.adoc[Message Tagging Service - SOP]

+ ** xref:mirrorhiding.adoc[Mirror Hiding Infrastructure - SOP]

+ ** xref:mirrormanager.adoc[MirrorManager Infrastructure - SOP]

+ ** xref:mirrormanager-S3-EC2-netblocks.adoc[AWS Mirrors - SOP]

+ ** xref:mote.adoc[mote - SOP]

+ ** xref:nagios.adoc[Fedora Infrastructure Nagios - SOP]

+ ** xref:netapp.adoc[Netapp Infrastructure - SOP]

+ ** xref:new-hosts.adoc[DNS Host Addition - SOP]

+ ** xref:nonhumanaccounts.adoc[Non-human Accounts Infrastructure - SOP]

+ ** xref:nuancier.adoc[Nuancier - SOP]

+ ** xref:odcs.adoc[On Demand Compose Service - SOP]

+ ** xref:openqa.adoc[OpenQA Infrastructure - SOP]

+ ** xref:openshift.adoc[OpenShift - SOP]

+ ** xref:openvpn.adoc[OpenVPN - SOP]

+ ** xref:outage.adoc[Outage Infrastructure - SOP]

+ ** xref:packagereview.adoc[Package Review - SOP]

+ ** xref:pagure.adoc[Pagure Infrastructure - SOP]

+ ** xref:pdc.adoc[PDC - SOP]

+ ** xref:pesign-upgrade.adoc[Pesign upgrades/reboots - SOP]

+ ** xref:planetsubgroup.adoc[Planet Subgroup Infrastructure - SOP]

+ ** xref:publictest-dev-stg-production.adoc[Fedora Infrastructure Machine Classes - SOP]

+ ** xref:rabbitmq.adoc[RabbitMQ - SOP]

+ ** xref:rdiff-backup.adoc[rdiff-backup - SOP]

+ ** xref:registry.adoc[Container registry - SOP]

+ ** xref:requestforresources.adoc[Request for resources - SOP]

+ ** xref:resultsdb.adoc[ResultsDB - SOP]

+ ** xref:retrace.adoc[Retrace - SOP]

+ ** xref:scmadmin.adoc[SCM Admin - SOP]

+ ** xref:selinux.adoc[SELinux Infrastructure - SOP]

+ ** xref:sigul-upgrade.adoc[Sigul servers upgrades/reboots - SOP]

+ ** xref:simple_koji_ci.adoc[simple_koji_ci - SOP]

+ ** xref:sshaccess.adoc[SSH Access Infrastructure - SOP]

+ ** xref:sshknownhosts.adoc[SSH known hosts Infrastructure - SOP]

+ ** xref:staging.adoc[Staging - SOP]

+ ** xref:status-fedora.adoc[Fedora Status Service - SOP]

+ ** xref:syslog.adoc[Log Infrastructure - SOP]

+ ** xref:tag2distrepo.adoc[Tag2DistRepo Infrastructure - SOP]

+ ** xref:torrentrelease.adoc[Torrent Releases Infrastructure - SOP]

+ ** xref:unbound.adoc[Fedora Infra Unbound Notes - SOP]

+ ** xref:virt-image.adoc[Fedora Infrastructure Kpartx Notes - SOP]

+ ** xref:virtio.adoc[Virtio Notes - SOP]

+ ** xref:virt-notes.adoc[Fedora Infrastructure Libvirt Notes - SOP]

+ ** xref:voting.adoc[Voting Infrastructure - SOP]

+ ** xref:waiverdb.adoc[WaiverDB - SOP]

+ ** xref:wcidff.adoc[What Can I Do For Fedora - SOP]

+ ** xref:wiki.adoc[Wiki Infrastructure - SOP]

+ ** xref:zodbot.adoc[Zodbot Infrastructure - SOP]

@@ -0,0 +1,98 @@ 

+ = Two factor auth

+ 

+ Fedora Infrastructure has implemented a form of two factor auth for

+ people who have sudo access on Fedora machines. In the future we may

+ expand this to include more than sudo but this was deemed to be a high

+ value, low hanging fruit.

+ 

+ == Using two factor

+ 

+ http://fedoraproject.org/wiki/Infrastructure_Two_Factor_Auth

+ 

+ To enroll a Yubikey, use the fedora-burn-yubikey script like normal. To

+ enroll using FreeOTP or Google Authenticator, go to

+ https://admin.fedoraproject.org/totpcgiprovision/

+ 

+ === What's enough authentication?

+ 

+ FAS Password+FreeOTP or FAS Password+Yubikey. Note: don't actually enter

+ a "+"; simply enter your FAS password and then press your yubikey or enter

+ your FreeOTP code.

+ 

+ == Administrating and troubleshooting two factor

+ 

+ Two factor auth is implemented by a modified copy of the

+ https://github.com/mricon/totp-cgi project doing the authentication and

+ pam_url submitting the authentication tokens.

+ 

+ totp-cgi runs on the fas servers (currently fas01.stg and

+ fas01/fas02/fas03 in production), listening on port 8443 for pam_url

+ requests.

+ 

+ FreeOTP, Google authenticator and yubikeys are supported as tokens to

+ use with your password.

+ 

+ === FreeOTP, Google authenticator

+ 

+ FreeOTP application is preferred, however Google authenticator works as

+ well. (Note that Google authenticator is not open source)

+ 

+ This is handled via totpcgi. There's a command line tool to manage

+ users, totpprov. See 'man totpprov' for more info. Admins can use this

+ tool to revoke lost tokens (google authenticator only) with 'totpprov

+ delete-user username'

+ 

+ To enroll using FreeOTP or Google Authenticator for production machines,

+ go to https://admin.fedoraproject.org/totpcgiprovision/

+ 

+ To enroll using FreeOTP or Google Authenticator for staging machines, go

+ to https://admin.stg.fedoraproject.org/totpcgiprovision/

+ 

+ You'll be prompted to login with your fas username and password.

+ 

+ Note that staging and production differ.

+ 

+ === YubiKeys

+ 

+ Yubikeys are enrolled and managed in FAS. Users can self-enroll using

+ the fedora-burn-yubikey utility included in the fedora-packager package.

+ 

+ === What do I do if I lose my token?

+ 

+ Send an email to admin@fedoraproject.org that is encrypted/signed with

+ your gpg key from FAS, or that otherwise identifies you as the account owner.

+ 

+ === How to remove a token (so the user can re-enroll)?

+ 

+ First we MUST verify that the user is who they say they are, using any

+ of the following:

+ 

+ * Personal contact where the person can be verified by member of

+ sysadmin-main.

+ * Correct answers to security questions.

+ * Email request to admin@fedoraproject.org that is gpg encrypted by the

+ key listed for the user in fas.

+ 

+ Then:

+ 

+ . For google authenticator,

+ +

+ ____

+ .. ssh into batcave01 as root

+ .. ssh into os-master01.iad2.fedoraproject.org

+ .. $ oc project fas

+ .. $ oc get pods

+ .. $ oc rsh <pod> (Pick one of totpcgi pods from the above list)

+ .. $ totpprov delete-user <username>

+ ____

+ . For yubikey: login to one of the fas machines and run:

+ /usr/local/bin/yubikey-remove.py username

+ 

+ The user can then go to

+ https://admin.fedoraproject.org/totpcgiprovision/ and reprovision a new

+ device.

+ 

+ If the user emails admin@fedoraproject.org with the signed request, make

+ sure to reply to all indicating that a reset was performed. This is so

+ that other admins don't step in and reset it again after its been reset

+ once.

@@ -0,0 +1,294 @@ 

+ = Account Deletion SOP

+ 

+ For the most part we do not delete accounts. In the case that a deletion

+ is paramount, it will need to be coordinated with appropriate entities.

+ 

+ Disabling accounts is another story but is limited to those with the

+ appropriate privileges. Reasons for accounts to be disabled can be one

+ of the following:

+ 

+ ____

+ * Person has placed SPAM on the wiki or other sites.

+ * It is seen that the account has been compromised by a third party.

+ * A person wishes to leave the Fedora Project and wants the account

+ disabled.

+ ____

+ 

+ == Contents

+ 

+ * <<_disabling>>

+ ** <<_disable_accounts>>

+ ** <<_disable_groups>>

+ * <<_user_requested_disables>>

+ * <<_renames>>

+ ** <<_rename_accounts>>

+ ** <<_rename_groups>>

+ * <<_deletion>>

+ ** <<_delete_accounts>>

+ ** <<_delete_groups>>

+ 

+ === Disabling

+ 

+ Disabling accounts is the easiest to accomplish as it just blocks people

+ from using their account. It does not remove the account name and

+ associated UID so we don't have to worry about future, unintentional

+ collisions.

+ 

+ == Disable Accounts

+ 

+ To begin with, accounts should not be disabled until there is a ticket

+ in the Infrastructure ticketing system. After that the contents inside

+ the ticket need to be verified (to make sure people aren't playing

+ pranks or someone is in a crappy mood). This needs to be logged in the

+ ticket (who looked, what they saw, etc). Then the account can be

+ disabled:

+ 

+ ....

+ ssh db02

+ sudo -u postgres psql fas2

+ 

+ fas2=# begin;

+ fas2=# select * from people where username = 'FOOO';

+ ....

+ 

+ Here you need to verify that the account looks right, that there is only

+ one match, and that there are no other issues. If there are multiple matches you need to

+ contact one of the main sysadmin-db's on how to proceed:

+ 

+ ....

+ fas2=# update people set status = 'admin_disabled' where username = 'FOOO';

+ fas2=# commit;

+ fas2=# \q

+ ....

+ 

+ == Disable Groups

+ 

+ There is no explicit way to disable groups in FAS2. Instead, we close

+ the group for adding new members and optionally remove existing members

+ from it. This can be done from the web UI if you are an administrator of

+ the group or you are in the accounts group. First, go to the group info

+ page. Then click the (edit) link next to Group Details. Make sure that

+ the Invite Only box is checked. This will prevent other users from

+ requesting the group on their own.

+ 

+ If you want to remove the existing users, View the Group info, then

+ click on the View Member List link. Click on All under the Results

+ heading. Then go through and click on Remove for each member.

+ 

+ Doing this in the database instead can be quicker if you have a lot of

+ people to remove. Once again, this requires someone in sysadmin-db to do

+ the work:

+ 

+ ....

+ ssh db02

+ sudo -u postgres psql fas2

+ 

+ fas2=# begin;

+ fas2=# update groups set invite_only = true where name = 'FOOO';

+ fas2=# commit;

+ fas2=# begin;

+ fas2=# select p.name, g.name, r.role_status from people as p, person_roles as r, groups as g

+ where p.id = r.person_id and g.id = r.group_id

+ and g.name = 'FOOO';

+ fas2=# -- Make sure that the list of users in the groups looks correct

+ fas2=# delete from person_roles where person_roles.group_id = (select id from groups where groups.name = 'FOOO');

+ fas2=# -- number of rows in both of the above should match

+ fas2=# commit;

+ fas2=# \q

+ ....

+ 

+ === User Requested Disables

+ 

+ According to our Privacy Policy, a user may request that their personal

+ information be removed from FAS if they want to disable their account. We can do

+ this but need to do some extra work over simply setting the account

+ status to disabled.

+ 

+ == Record User's CLA information

+ 

+ If the user has signed the CLA/FPCA, then they may have contributed

+ something to Fedora that we'll need to contact them about at a later

+ date. For that, we need to keep at least the following information:

+ 

+ * Fedora username

+ * human name

+ * email address

+ 

+ All of this information should be on the CLA email that is sent out when

+ a user signs up. We need to verify with spot (Tom Callaway) that he has

+ that record. If not, we need to get it to him. Something like:

+ 

+ ....

+ select id, username, human_name, email, telephone, facsimile, postal_address from people where username = 'USERNAME';

+ ....

+ 

+ and send it to spot to keep.

+ 

+ == Remove the personal information

+ 

+ The following sequence of db commands should do it:

+ 

+ ....

+ fas2=# begin;

+ fas2=# select * from people where username = 'USERNAME';

+ ....

+ 

+ Here you need to verify that the account looks right, that there is only

+ one match, and that there are no other issues. If there are multiple matches you need to

+ contact one of the main sysadmin-db's on how to proceed:

+ 

+ ....

+ fas2=# update people set human_name = '', gpg_keyid = null, ssh_key = null, unverified_email = null, comments = null, postal_address = null, telephone = null, facsimile = null, affiliation = null, ircnick = null, status = 'inactive', locale = 'C', timezone = null, latitude = null, longitude = null, country_code = null, email = 'disabled1@fedoraproject.org'  where username = 'USERNAME';

+ ....

+ 

+ Make sure only one record was updated:

+ 

+ ....

+ fas2=# select * from people where username = 'USERNAME';

+ ....

+ 

+ Make sure the correct record was updated:

+ 

+ ....

+ fas2=# commit;

+ ....

+ 

+ [NOTE]

+ .Note

+ ====

+ The email address is both not null and unique in the database. Due to

+ this, you need to set it to a new string for every user who requests

+ deletion like this.

+ ====
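Because the placeholder address must be unique for every deleted account, the next free `disabledN` suffix has to be chosen before the update. A minimal sketch of that selection, assuming the list of already-used addresses is available (the hard-coded `used` list below stands in for a `select email from people ...` query):

```shell
# Pick the next unused disabledN@fedoraproject.org placeholder address.
# The 'used' list is a stand-in for the addresses already present in the
# people table; in practice you would query the fas2 database for them.
used="disabled1@fedoraproject.org
disabled2@fedoraproject.org"

n=1
while printf '%s\n' "$used" | grep -qx "disabled${n}@fedoraproject.org"; do
    n=$((n + 1))
done
echo "disabled${n}@fedoraproject.org"   # disabled3@fedoraproject.org
```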

+ 

+ === Renames

+ 

+ In general, renames do not require as much work as deletions but they

+ still require coordination. This is because renames do not change the

+ UID/GID but some of our applications save information based on

+ username/groupname rather than UID/GID.

+ 

+ == Rename Accounts

+ 

+ [WARNING]

+ .Warning

+ ====

+ Needs more eyes: this list may not be complete.

+ ====

+ 

+ * Check the databases for koji, pkgdb, and bodhi for occurrences of

+ the old username and update them to the new username.

+ * Check fedorapeople.org for home directories and yum repositories under

+ the old username that would need to be renamed

+ * Check (or ask the user to check and update) mailing list subscriptions

+ on fedorahosted.org and lists.fedoraproject.org under the old

+ username@fedoraproject.org email alias

+ * Check whether the user has a username@fedoraproject.org bugzilla

+ account in python-fedora and update that. Also ask the user to update

+ that in bugzilla.

+ * If the user is in a sysadmin-* group, check for home directories on

+ bastion and other infrastructure boxes that are owned by them and need

+ to be renamed (Could also just tell the user to backup any files there

+ themselves b/c they're getting a new home directory).

+ * grep through ansible for occurrences of the username

+ * Check for entries in trac on fedorahosted.org for the username as an

+ "Assigned to" or "CC" entry.

+ * Add other places to check here

+ 

+ == Rename Groups

+ 

+ [WARNING]

+ .Warning

+ ====

+ Needs more eyes: this list may not be complete.

+ ====

+ * grep through ansible for occurrences of the group name.

+ * Check for group-members,group-admins,group-sponsors@fedoraproject.org

+ email alias presence in any fedorahosted.org or lists.fedoraproject.org

+ mailing list

+ * Check for entries in trac on fedorahosted.org for the username as an

+ "Assigned to" or "CC" entry.

+ * Add other places to check here

+ 

+ === Deletion

+ 

+ Deletion is the toughest one to audit because it requires that we look

+ through our systems looking for the UID and GID in addition to looking

+ for the username and password. The UID and GID are used on things like

+ filesystem permissions so we have to look there as well. Not catching

+ these places may lead to security issues should the UID/GID ever be

+ reused.

+ 

+ [NOTE]

+ .Note

+ ====

+ Recommended to rename instead: when it is not strictly necessary to purge all

+ traces of an account, it is highly recommended to rename the user or group

+ to something like DELETED_oldusername instead of deleting it. This avoids

+ the problems and additional checking that we have to do below.

+ ====

+ == Delete Accounts

+ 

+ [WARNING]

+ .Warning

+ ====

+ Needs more eyes: this list may be incomplete. More people need to look

+ at this and find places that may need to be updated.

+ ====

+ * Check everything for the #Rename Accounts case.

+ * Figure out what boxes a user may have had access to in the past. This

+ means you need to look at all the groups a user may ever have been

+ approved for (even if they are not approved for those groups now). For

+ instance, any git*, svn*, bzr*, hg* groups would have granted access to

+ hosted03 and hosted04. packager would have granted access to

+ pkgs.fedoraproject.org. Pretty much any group grants access to

+ fedorapeople.org.

+ * For those boxes, run a find over the files there to see if the UID

+ owns any files on the system:

+ +

+ ....

+ # find / -uid 100068 -print

+ ....

+ +

+ Any files owned by that uid must be reassigned to another user or

+ removed.
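The audit step can be rehearsed safely on a scratch directory before running it as root against `/`. The sketch below scans only a temporary tree for files owned by the current user's uid; the paths and uid are illustrative, not real infra values:

```shell
# Demonstrate the UID audit on a throwaway tree instead of / .
# In production the scan is `find / -uid <uid> -print`, run as root.
tmp="$(mktemp -d)"
touch "$tmp/owned-file"

# List everything under $tmp owned by our own numeric uid.
found="$(find "$tmp" -uid "$(id -u)" -type f -print)"
echo "$found"

rm -rf "$tmp"
```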

+ 

+ [WARNING]

+ .Warning

+ ====

+ What to do about backups? Backups pose a special problem as they may

+ contain the uid that's being removed. Need to decide how to handle this

+ ====

+ * Add other places to check here

+ 

+ == Delete Groups

+ 

+ [WARNING]

+ .Warning

+ ====

+ Needs more eyes: this list may be incomplete. More people need to look

+ at this and find places that may need to be updated.

+ ====

+ * Check everything for the #Rename Groups case.

+ * Figure out what boxes may have had files owned by that group. This

+ means that you'd need to look at the users in that group, what boxes

+ they have shell accounts on, and then look at those boxes. Groups used

+ for hosted would also need to add hosted03 and hosted04 to that list and

+ the box that serves the hosted mailing lists.

+ * For those boxes, run a find over the files there to see if the GID

+ owns any files on the system:

+ +

+ ....

+ # find / -gid 100068 -print

+ ....

+ +

+ Any files owned by that GID must be reassigned to another group or

+ removed.

+ 

+ [WARNING]

+ .Warning

+ ====

+ What to do about backups? Backups pose a special problem as they may

+ contain the gid that's being removed. Need to decide how to handle this

+ ====

+ * Add other places to check here

@@ -0,0 +1,210 @@ 

+ = Anitya Infrastructure SOP

+ 

+ Anitya is used by Fedora to track upstream project releases and maps

+ them to downstream distribution packages, including (but not limited to)

+ Fedora.

+ 

+ Anitya staging instance: https://stg.release-monitoring.org

+ 

+ Anitya production instance: https://release-monitoring.org

+ 

+ Anitya project page: https://github.com/fedora-infra/anitya

+ 

+ == Contact Information

+ 

+ Owner::

+   Fedora Infrastructure Team

+ Contact::

+   #fedora-admin, #fedora-apps

+ Persons::

+   zlopez

+ Location::

+   iad2.fedoraproject.org

+ Servers::

+   Production

+   +

+   * os-master01.iad2.fedoraproject.org

+   +

+   Staging

+   +

+   * os-master01.stg.iad2.fedoraproject.org

+ Purpose::

+   Map upstream releases to Fedora packages.

+ 

+ == Hosts

+ 

+ The current deployment is made up of the release-monitoring OpenShift

+ namespace.

+ 

+ === release-monitoring

+ 

+ This OpenShift namespace runs the following pods:

+ 

+ * The apache/mod_wsgi application for release-monitoring.org

+ * A libraries.io SSE client

+ * A service checking for new releases

+ 

+ This OpenShift project relies on:

+ 

+ * A postgres db server running in OpenShift

+ * Lots of external third-party services. The anitya webapp can scrape

+ pypi, rubygems.org, sourceforge and many others on command.

+ * Lots of external third-party services. The check service makes all

+ kinds of requests out to the Internet that can fail in various ways.

+ * Fedora messaging RabbitMQ hub for publishing messages

+ 

+ Things that rely on this host:

+ 

+ * `hotness-sop` is a fedora messaging consumer running in Fedora Infra

+ in OpenShift. It listens for Anitya messages from here and performs

+ actions on koji and bugzilla.

+ 

+ == Releasing

+ 

+ The release process is described in

+ https://anitya.readthedocs.io/en/latest/contributing.html#release-guide[Anitya

+ documentation].

+ 

+ === Deploying

+ 

+ Staging deployment of Anitya is deployed in OpenShift on

+ os-master01.stg.iad2.fedoraproject.org.

+ 

+ To deploy the staging instance of Anitya, push changes to the staging

+ branch on https://github.com/fedora-infra/anitya[Anitya GitHub]. A GitHub

+ webhook will then automatically deploy the new version of Anitya on

+ staging.

+ 

+ Production deployment of Anitya is deployed in OpenShift on

+ os-master01.iad2.fedoraproject.org.

+ 

+ To deploy the production instance of Anitya, push changes to the

+ production branch on https://github.com/fedora-infra/anitya[Anitya

+ GitHub]. A GitHub webhook will then automatically deploy the new version of

+ Anitya on production.

+ 

+ ==== Configuration

+ 

+ To deploy the new configuration, you need

+ https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/sshaccess.html[ssh

+ access] to batcave01.iad2.fedoraproject.org and

+ https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/ansible.html[permissions

+ to run the Ansible playbook].

+ 

+ All the following commands should be run from batcave01.

+ 

+ First, ensure there are no configuration changes required for the new

+ update. If there are, update the Ansible anitya role(s) and optionally

+ run the playbook:

+ 

+ ....

+ $ sudo rbac-playbook openshift-apps/release-monitoring.yml

+ ....

+ 

+ The configuration changes can be limited to staging only using:

+ 

+ ....

+ $ sudo rbac-playbook openshift-apps/release-monitoring.yml -l staging

+ ....

+ 

+ This is recommended for testing new configuration changes.

+ 

+ ==== Upgrading

+ 

+ ===== Staging

+ 

+ To deploy a new version of Anitya, push changes to the staging

+ branch on https://github.com/fedora-infra/anitya[Anitya GitHub]. A GitHub

+ webhook will then automatically deploy the new version of Anitya on

+ staging.

+ 

+ ===== Production

+ 

+ To deploy a new version of Anitya, push changes to the production

+ branch on https://github.com/fedora-infra/anitya[Anitya

+ GitHub]. A GitHub webhook will then automatically deploy the new version of

+ Anitya on production.

+ 

+ Congratulations! The new version should now be deployed.

+ 

+ == Administrating release-monitoring.org

+ 

+ Anitya web application offers some functionality to administer itself.

+ 

+ User admin status is tracked in the Anitya database. Admin users can grant

+ or revoke admin privileges for users in the

+ https://release-monitoring.org/users[users tab].

+ 

+ Admin users have additional functionality available in the web interface. In

+ particular, admins can view flagged projects, remove projects, remove

+ package mappings, etc.

+ 

+ For more information see

+ https://anitya.readthedocs.io/en/stable/admin-user-guide.html[Admin user

+ guide] in Anitya documentation.

+ 

+ === Flags

+ 

+ Anitya lets users flag projects for administrator attention. This is

+ accessible to administrators in the

+ https://release-monitoring.org/flags[flags tab].

+ 

+ == Monitoring

+ 

+ To monitor the activity of Anitya you can connect to Fedora infra

+ OpenShift and look at the state of pods.

+ 

+ For staging look at the `release-monitoring` namespace in

+ https://os.stg.fedoraproject.org/console/project/release-monitoring/overview[staging

+ OpenShift instance].

+ 

+ For production look at the `release-monitoring` namespace in

+ https://os.fedoraproject.org/console/project/release-monitoring/overview[production

+ OpenShift instance].

+ 

+ == Troubleshooting

+ 

+ This section contains various issues encountered during deployment or

+ configuration changes and possible solutions.

+ 

+ === Fedmsg messages aren't sent

+ 

+ *Issue:* Fedmsg messages aren't sent.

+ 

+ *Solution:* Set USER environment variable in pod.

+ 

+ *Explanation:* Fedmsg uses the USER environment variable as the username

+ inside messages. Without USER set, it simply crashes and doesn't send anything.

+ 

+ === Cronjob is crashing

+ 

+ *Issue:* Cronjob pod is crashing on start, even after configuration

+ change that should fix the behavior.

+ 

+ *Solution:* Restart the cronjob. This could be done by OPS.

+ 

+ *Explanation:* Every time the cronjob is executed after a crash, it

+ tries to reuse the pod with the bad configuration instead of

+ creating a new one with the new configuration.

+ 

+ === Database migration is taking too long

+ 

+ *Issue:* Database migration is taking a few hours to complete.

+ 

+ *Solution:* Stop every pod and cronjob before migration.

+ 

+ *Explanation:* When creating new index or doing some other complex

+ operation on database, the migration script needs exclusive access to

+ the database.

+ 

+ === Old version is deployed instead of the new one

+ 

+ *Issue:* The pod is deployed with an old version of Anitya, but it says

+ that it was triggered by the correct commit.

+ 

+ *Solution:* Set `dockerStrategy` in buildconfig.yml to

+ noCache.

+ 

+ *Explanation:* OpenShift caches the layers of docker containers by

+ default, so if there is no change in the Dockerfile it will just use the

+ cached version and not run the commands again.

@@ -0,0 +1,249 @@ 

+ = Ansible infrastructure SOP/Information.

+ 

+ == Background

+ 

+ Fedora infrastructure used to use func and puppet for system change

+ management. We are now using ansible for all system change management and

+ ad-hoc tasks.

+ 

+ == Overview

+ 

+ Ansible runs from batcave01 or backup01. These hosts run an ssh-agent

+ that has unlocked the ansible root ssh private key. (This is unlocked

+ manually by a human with the passphrase each reboot, the passphrase

+ itself is not stored anywhere on the machines). Using 'sudo -i',

+ sysadmin-main members can use this agent to access any machines with the

+ ansible root ssh public key setup, either with 'ansible' for one-off

+ commands or 'ansible-playbook' to run playbooks.

+ 

+ Playbooks are idempotent (or should be). Meaning you should be able to

+ re-run the same playbook over and over and it should get to a state

+ where 0 items are changing.

+ 

+ Additionally (see below) there is a rbac wrapper that allows members of

+ some other groups to run playbooks against specific hosts.

+ 

+ === GIT repositories

+ 

+ There are 2 git repositories associated with Ansible:

+ 

+ * The Fedora Infrastructure Ansible repository and replicas.

+ 

+ [CAUTION]

+ ====

+ This is a public repository. Never commit private data to this repo.

+ ====

+ 

+ image:ansible-repositories.png[image]

+ 

+ This repository exists as several copies or replicas:

+ 

+ ** The "upstream" repository on Pagure.

+ 

+ https://pagure.io/fedora-infra/ansible

+ 

+ This repository is the public facing place where people can contribute

+ (e.g. pull requests) as well as the authoritative source. Members of the

+ `sysadmin` FAS group or the `fedora-infra` Pagure group have commit

+ access to this repository.

+ 

+ To contribute changes, fork the repository on Pagure and submit a Pull

+ Request. Someone from the aforementioned groups can then review and

+ merge them.

+ 

+ It is recommended that you configure git to use `pull --rebase` by

+ default by running `git config --bool pull.rebase true` in your ansible

+ clone directory. This configuration prevents unneeded merges which can

+ occur if someone else pushes changes to the remote repository while you

+ are working on your own local changes.

+ 

+ ** Two bare mirrors on _batcave01_, `/srv/git/ansible.git`

+ and `/srv/git/mirrors/ansible.git`

+ 

+ [CAUTION]

+ ====

+ These are public repositories. Never commit private data to these

+ repositories. Don't commit or push to these repos directly, unless

+ Pagure is unavailable.

+ ====

+ 

+ The `mirror_pagure_ansible` service on _batcave01_ receives

+ bus messages about changes in the repository on Pagure, fetches these

+ into `/srv/git/mirrors/ansible.git` and pushes from there to

+ `/srv/git/ansible.git`. When this happens, various actions are triggered

+ via git hooks:

+ 

+ *** The working copy at `/srv/web/infra/ansible` is updated.

+ 

+ *** A mail about the changes is sent to _sysadmin-members_.

+ 

+ *** The changes are announced on the message bus, which in turn triggers

+ announcements on IRC.

+ 

+ You can check out the repo locally on _batcave01_ with:

+ 

+ ....

+ git clone /srv/git/ansible.git

+ ....

+ 

+ If the Ansible repository on Pagure is unavailable, members of the

+ _sysadmin_ group may commit directly, provided this

+ procedure is followed:

+ 

+ [arabic]

+ . The synchronization service is stopped and disabled:

+ 

+ ....

+ sudo systemctl disable --now mirror_pagure_ansible.service

+ ....

+ . Changes are applied to the repository on _batcave01_.

+ . After Pagure is available again, the changes are pushed to the

+ repository there.

+ . The synchronization service is enabled and started:

+ 

+ ....

+ sudo systemctl enable --now mirror_pagure_ansible.service

+ ....

+ ** `/srv/web/infra/ansible` on _batcave01_, the working copy

+ from which playbooks are run.

+ 

+ [CAUTION]

+ ====

+ This is a public repository. Never commit private data to this repo.

+ Don't commit or push to this repo directly, unless Pagure is

+ unavailable.

+ ====

+ +

+ You can also browse it via the web interface at:

+ https://pagure.io/fedora-infra/ansible/

+ 

+ * `/srv/git/ansible-private` on _batcave01_.

+ 

+ [CAUTION]

+ ====

+ This is a private repository for passwords and other sensitive data. It

+ is not available in cgit, nor should it be cloned or copied remotely.

+ ====

+ 

+ This repository is only accessible to members of 'sysadmin-main'.

+ 
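+ The `pull --rebase` setting recommended above can be sanity-checked in

+ a throwaway clone; a minimal sketch:

```shell
#!/bin/sh
# Create a throwaway repository and apply the recommended setting.
tmp=$(mktemp -d)
git init -q "$tmp"
cd "$tmp"
git config --bool pull.rebase true
# Confirm that 'git pull' will now rebase instead of merging.
git config pull.rebase   # prints: true
```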

+ === Cron job/scheduled runs

+ 

+ The `run_ansible-playbook_cron.py` script, run daily via cron, walks

+ through the playbooks and runs them with the _--check --diff_

+ parameters to perform a dry run.

+ 

+ This way we make sure all the playbooks are idempotent and there are

+ no unexpected changes on servers (or in playbooks).

+ 

+ === Logging

+ 

+ We have in place a callback plugin that stores history for any

+ ansible-playbook runs and then sends a report each day to

+ sysadmin-logs-members with any CHANGED or FAILED actions. Additionally,

+ there's a fedmsg plugin that reports start and end of ansible playbook

+ runs to the fedmsg bus. Ansible also logs to syslog verbose reporting of

+ when and what commands and playbooks were run.

+ 

+ === Role-based access control for playbooks

+ 

+ There's a wrapper script on _batcave01_ called 'rbac-playbook' that allows

+ non sysadmin-main members to run specific playbooks against specific

+ groups of hosts. This is part of the ansible_utils package. The upstream

+ for ansible_utils is: https://bitbucket.org/tflink/ansible_utils

+ 

+ To add a new group:

+ 

+ [arabic]

+ . add the playbook name and sysadmin group to the rbac-playbook

+ (ansible-private repo)

+ . add that sysadmin group to sudoers on batcave01 (also in

+ ansible-private repo)

+ 

+ To use the wrapper:

+ 

+ ....

+ sudo rbac-playbook playbook.yml

+ ....

+ 

+ == Directory setup

+ 

+ === Inventory

+ 

+ The inventory directory tells ansible all the hosts that are managed by

+ it and the groups they are in. All files in this dir are concatenated

+ together, so you can split out groups/hosts into separate files for

+ readability. They are in INI file format.

+ 

+ Additionally under the inventory directory are host_vars and group_vars

+ subdirectories. These are files named for the host or group,

+ containing variables to set for that host or group. You should strive

+ to set variables at the highest level possible; precedence is in

+ global, group, host order.

+ 
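+ As an illustration (host and group names are hypothetical), an

+ inventory file and a matching group variables file might look like:

```
# inventory/inventory -- INI format
[webservers]
web01.stg.example.org
web02.stg.example.org

# inventory/group_vars/webservers -- YAML, applies to every host above
---
datacenter: iad2
max_mem_size: 4096
```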

+ === Vars

+ 

+ This directory contains global variables as well as OS specific

+ variables. Note that in order to use the OS specific ones you must have

+ 'gather_facts' as 'True' or ansible will not have the facts it needs to

+ determine the OS.

+ 
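+ For example, a play that uses those OS-specific variables must keep

+ fact gathering enabled; the play below is a hypothetical sketch:

```yaml
- name: example play that needs OS facts
  hosts: webservers          # hypothetical group
  gather_facts: true         # required, or ansible_distribution is undefined
  tasks:
    - name: show the detected distribution
      debug:
        var: ansible_distribution
```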

+ === Roles

+ 

+ Roles are a collection of tasks/files/templates that can be used on any

+ host or group of hosts that all share that role. In other words, roles

+ should be used except in cases where configuration only applies to a

+ single host. Roles can be reused between hosts and groups and are more

+ portable/flexible than tasks or specific plays.

+ 

+ === Scripts

+ 

+ In the ansible git repo under scripts are a number of utility scripts for

+ sysadmins.

+ 

+ === Playbooks

+ 

+ In the ansible git repo there's a directory for playbooks. The top level

+ contains utility playbooks for sysadmins. These playbooks perform

+ one-off functions or gather information. Under this directory are hosts

+ and groups playbooks. These playbooks are for specific hosts and groups

+ of hosts, from provision to fully configured. You should only use a host

+ playbook in cases where there will never be more than one of that thing.

+ 

+ === Tasks

+ 

+ This directory contains one-off tasks that are used in playbooks. Some

+ of these should be migrated to roles (we had this setup before roles

+ existed in ansible). Those that are truly only used on one host/group

+ could stay as isolated tasks.

+ 

+ === Syntax

+ 

+ Ansible now warns about deprecated syntax. Please fix any cases you see

+ related to deprecation warnings.

+ 

+ Templates use the jinja2 syntax.

+ 
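+ A small illustrative fragment in that syntax (the `sslonly` variable

+ is hypothetical; `ansible_managed` and `inventory_hostname` are

+ standard ansible variables):

```
# {{ ansible_managed }}
ServerName {{ inventory_hostname }}
{% if sslonly %}
Redirect permanent / https://{{ inventory_hostname }}/
{% endif %}
```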

+ == Libvirt virtuals

+ 

+ * TODO: add steps to make new libvirt virtuals in staging and production

+ * TODO: merge in new-hosts.txt

+ 

+ == Cloud Instances

+ 

+ * TODO: add how to make new cloud instances

+ * TODO: merge in from ansible README file.

+ 

+ == rdiff-backups

+ 

+ see:

+ https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/rdiff-backup.html

+ 

+ == Additional Reading/Resources

+ 

+ Upstream docs:::

+   https://docs.ansible.com/

+ Example repo with all kinds of examples:::

+   * https://github.com/ansible/ansible-examples

+   * https://gist.github.com/marktheunissen/2979474

+ Jinja2 docs:::

+   http://jinja.pocoo.org/docs/

@@ -0,0 +1,31 @@ 

+ = apps-fp-o SOP

+ 

+ Updating and maintaining the landing page at

+ https://apps.fedoraproject.org/

+ 

+ == Contact Information

+ 

+ Owner:::

+   Fedora Infrastructure Team

+ Contact:::

+   #fedora-apps, #fedora-admin

+ Servers:::

+   proxy0*

+ Purpose:::

+   Have a nice landing page for all our webapps.

+ 

+ == Description

+ 

+ We have a number of webapps, many of which our users don't know about.

+ This page was created so there was a central place where users could

+ stumble through them and learn.

+ 

+ The page is generated by an ansible role in `ansible/roles/apps-fp-o/`.

+ It makes use of an RPM package, the source code for which is at

+ https://github.com/fedora-infra/apps.fp.o

+ 

+ You can update the page by updating the apps.yaml file in that ansible

+ module.

+ 

+ When ansible is run next, the two ansible handlers should see your

+ changes and regenerate the static HTML and JSON data for the page.

@@ -0,0 +1,104 @@ 

+ = How to Archive Old Fedora Releases

+ 

+ The Fedora download servers contain terabytes of data, and to allow

+ mirrors to not have to carry all of that data, infrastructure regularly

+ moves data of end-of-life releases (from `/pub/fedora/linux`) to the

+ archives section (`/pub/archive/fedora/linux`).

+ 

+ == Steps Involved

+ 

+ [arabic]

+ . Log into batcave01 and ssh to bodhi-backend01:

+ +

+ [source]

+ ----

+ $ sudo -i ssh root@bodhi-backend01.iad2.fedoraproject.org

+ # su - ftpsync

+ ----

+ 

+ . Then change into the releases directory.

+ +

+ [source]

+ ----

+ $ cd /pub/fedora/linux/releases

+ ----

+ 

+ . Check to see that the target directory doesn't already exist.

+ +

+ [source]

+ ----

+ $ ls /pub/archive/fedora/linux/releases/

+ ----

+ 

+ . If the target directory does not already exist, do a recursive link

+ copy of the tree you want to the target

+ +

+ [source]

+ ----

+ $ cp -lvpnr 21 /pub/archive/fedora/linux/releases/21

+ ----

+ 

+ . If the target directory already exists, then we need to do a recursive

+ rsync to update any changes in the trees since the previous copy.

+ +

+ [source]

+ ----

+ $ rsync -avAXSHP --delete ./21/ /pub/archive/fedora/linux/releases/21/

+ ----

+ 

+ . We now do the updates and updates/testing in similar ways.

+ +

+ [source]

+ ----

+ $ cd ../updates/

+ $ cp -lpnr 21 /pub/archive/fedora/linux/updates/21

+ $ cd testing

+ $ cp -lpnr 21 /pub/archive/fedora/linux/updates/testing/21

+ ----

+ +

+ Alternatively, if this is a later refresh of an older copy:

+ +

+ [source]

+ ----

+ $ cd ../updates/

+ $ rsync -avAXSHP 21/ /pub/archive/fedora/linux/updates/21/

+ $ cd testing

+ $ rsync -avAXSHP 21/ /pub/archive/fedora/linux/updates/testing/21/

+ ----

+ 

+ . Do the same with fedora-secondary.

+ 

+ . Announce to the mirror list this has been done and that in 2 weeks you

+ will move the old trees to archives.

+ 

+ . In two weeks, log into mm-backend01 and run the archive script

+ +

+ [source]

+ ----

+ $ sudo -u mirrormanager mm2_move-to-archive --originalCategory="Fedora Linux" --archiveCategory="Fedora Archive" --directoryRe='/21/Everything'

+ ----

+ 

+ . If there are problems, the PostgreSQL database may have issues, and

+ you will need to get a DBA to update the backend to fix them.

+ 

+ . Wait an hour or so, then you can remove the files from the main tree.

+ +

+ [source]

+ ----

+ $ ssh bodhi-backend01

+ $ cd /pub/fedora/linux

+ $ cd releases/21

+ $ ls # make sure you have stuff here

+ $ rm -rf *

+ $ ln ../20/README .

+ $ cd ../../updates/21

+ $ ls #make sure you have stuff here

+ $ rm -rf *

+ $ ln ../20/README .

+ $ cd ../testing/21

+ $ ls # make sure you have stuff here

+ $ rm -rf *

+ $ ln ../20/README .

+ ----

+ 

+ This should complete the archiving.
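+ 

+ The hardlink copy in step 4 can be sandbox-tested locally; this sketch

+ (using temporary directories instead of the real trees) verifies that

+ `cp -l` shares inodes with the original tree, so the archive copy

+ consumes no extra space:

```shell
#!/bin/sh
# Build a tiny source tree, hardlink-copy it with the SOP's flags, and
# confirm both paths point at the same inode.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/21/Everything"
echo "payload" > "$src/21/Everything/file.txt"
cp -lpnr "$src/21" "$dst/21"               # link, preserve, no-clobber, recursive
stat -c %i "$src/21/Everything/file.txt"
stat -c %i "$dst/21/Everything/file.txt"   # same inode number as above
```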

@@ -0,0 +1,206 @@ 

+ = Fedora ARM Infrastructure

+ 

+ == Contact Information

+ 

+ Owner::

+   Fedora Infrastructure Team

+ Contact::

+   #fedora-admin, sysadmin-main, sysadmin-releng

+ Location::

+   Phoenix

+ Servers::

+   arm01, arm02, arm03, arm04

+ Purpose::

+   Information on working with the arm SOCs

+ 

+ == Description

+ 

+ We have 4 arm chassis in phx2, each containing 24 SOCs (System On Chip).

+ 

+ Each chassis has 2 physical network connections going out from it. The

+ first one is used for the management interface on each SOC. The second

+ one is used for eth0 for each SOC.

+ 

+ Current allocations (2016-03-11):

+ 

+ arm01::

+   primary builders attached to koji.fedoraproject.org

+ arm02::

+   primary arch builders attached to koji.fedoraproject.org

+ arm03::

+   In cloud network, public qa/packager and copr instances

+ arm04::

+   primary arch builders attached to koji.fedoraproject.org

+ 

+ == Hardware Configuration

+ 

+ Each SOC has:

+ 

+ * eth0 and eth1 (unused) and a management interface.

+ * 4 cores

+ * 4GB ram

+ * a 300GB disk

+ 

+ SOCs are addressed by:

+ 

+ ....

+ arm{chassisnumber}-builder{number}.arm.fedoraproject.org

+ ....

+ 

+ Where chassisnumber is 01 to 04 and number is 00 to 23.

+ 
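+ Given those ranges, the full list of 96 SOC addresses can be generated

+ with a short loop:

```shell
#!/bin/sh
# 4 chassis (01-04), 24 SOCs each (00-23) -> 96 hostnames.
for chassis in 01 02 03 04; do
  for soc in $(seq -w 0 23); do
    echo "arm${chassis}-builder${soc}.arm.fedoraproject.org"
  done
done
# First line printed: arm01-builder00.arm.fedoraproject.org
```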

+ == PXE installs

+ 

+ Kickstarts for the machines are in the kickstarts repo.

+ 

+ PXE config is on noc01. (or cloud-noc01.cloud.fedoraproject.org for

+ arm03)

+ 

+ The kickstart installs the latest Fedora and sets them up with a base

+ package set.

+ 

+ == IPMI tool Management

+ 

+ The SOCs are managed via their mgmt interfaces using a custom ipmitool

+ as well as a custom python script called 'cxmanage'. The ipmitool

+ changes have been submitted upstream and cxmanage is under review in

+ Fedora.

+ 

+ The ipmitool is currently installed on noc01 and it has the ability to talk