I would like to implement monitoring for OpenShift apps using Nagios. I know there are some plans to replace Nagios with something else, but that hasn't happened yet and Nagios is already there. For me this is a blocker for moving Koschei to OpenShift - I am not comfortable running production Koschei without monitoring that is integrated with our existing alert system (email/IRC notifications).
I would like to start with monitoring the number of pods. Nagios would check the number of pods matching a configured selector and compare it with a configured range of expected numbers. Result of the check would be defined as follows:
Example:
Implementation: Nagios plugin, non-NRPE. There would be a service account created for Nagios. The account would have minimal privileges that would allow it to list pods, but nothing else. Credentials for the account would be stored on noc01 and noc02. The Nagios plugin would use the Kubernetes REST API to communicate with OpenShift. noc01 would talk directly to each of the masters using internal addresses/names. noc02 would talk to OpenShift over the public interface.
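To make the idea more concrete, here is a rough sketch of what such a plugin could look like in Python. It is only an illustration under several assumptions, not a finished implementation: the function names and the example host name and selector are made up, and the mapping of results to Nagios states (count inside the range is OK, outside is CRITICAL, API failure is UNKNOWN) is my guess for the sketch rather than a settled definition. The service account token is sent as a bearer token to the standard Kubernetes pod list endpoint.

```python
import json
import ssl
import urllib.parse
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def build_pods_url(api_base, namespace, selector):
    """Build the Kubernetes REST URL listing pods that match a label selector."""
    query = urllib.parse.urlencode({"labelSelector": selector})
    return f"{api_base.rstrip('/')}/api/v1/namespaces/{namespace}/pods?{query}"


def fetch_pod_list(url, token, ca_file=None):
    """GET the pod list, authenticating with the service account token."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)


def evaluate(count, minimum, maximum):
    """Map a pod count to a Nagios status.

    ASSUMPTION: inside the range is OK, anything else is CRITICAL.
    """
    expected = f"expected {minimum}..{maximum}"
    if minimum <= count <= maximum:
        return OK, f"OK - {count} pods ({expected})"
    return CRITICAL, f"CRITICAL - {count} pods ({expected})"


def check_pods(api_base, namespace, selector, token, minimum, maximum, ca_file=None):
    """Run the whole check; any network or parsing failure becomes UNKNOWN."""
    url = build_pods_url(api_base, namespace, selector)
    try:
        pod_list = fetch_pod_list(url, token, ca_file)
    except (OSError, ValueError) as error:
        return UNKNOWN, f"UNKNOWN - cannot query OpenShift: {error}"
    return evaluate(len(pod_list.get("items", [])), minimum, maximum)


# Offline demo with a canned response; a real run would call check_pods().
sample = {"items": [{"metadata": {"name": "frontend-1"}},
                    {"metadata": {"name": "frontend-2"}}]}
status, message = evaluate(len(sample["items"]), 2, 4)
print(message)
```

A real plugin would read the token from a file on noc01/noc02, call something like `check_pods("https://openshift.example.org:8443", "koschei", "app=frontend", token, 2, 4)` (host name and selector are placeholders), print the message and exit with the returned status code.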
What do you think about this idea?
This sounds like a good idea. The plugin I looked at was:
https://github.com/appuio/nagios-plugins-openshift
Another example was
https://github.com/jmferrer/nagios-openshift
Sounds good to me. Either a basic script or leveraging one of those plugins...
Metadata Update from @mizdebsk: - Issue assigned to mizdebsk
Of the two plugins above, I like nagios-plugins-openshift better. The approach it uses is almost the same as mine - one difference is that it uses the oc command to communicate with OpenShift, while I would use curl. If we want to use this plugin, then I can try to package it and build it for epel7-infra (I don't want to maintain this package in EPEL 7 myself). Or I can write my own plugin and put it in ansible.git. We can talk about this during one of the future meetings.
Metadata Update from @mizdebsk: - Issue priority set to: Waiting on Assignee (was: Next Meeting)
Nagios is frozen. I'll try to work on this ticket after final freeze (F30 GA).
Metadata Update from @mizdebsk: - Issue tagged with: unfreeze
Update: the freeze is over now, I am planning to work on this issue some time next week.
Metadata Update from @mizdebsk: - Issue untagged with: unfreeze
Currently I don't have time to work on this due to different priorities and upcoming vacation. Lack of monitoring is still blocking Koschei from moving to OpenShift and therefore I would still like this feature to be implemented, but it will need to wait a few months, unless someone else wants to work on this.
@mizdebsk I believe you have done that for Koschei. Is there a small "How to" for doing that for other applications?
Metadata Update from @cverna: - Assignee reset
Going to close as we aren't moving on this and it should be rolled into the monitoring initiative
Metadata Update from @smooge: - Issue close_status updated to: Initiative Worthy - Issue status updated to: Closed (was: Open)