#399 Increase gunicorn worker default timeout.
Closed 5 years ago by cverna. Opened 5 years ago by cverna.
cverna/greenwave increase_gunicorn_timeout  into  master

file modified
+1 -1
@@ -35,4 +35,4 @@ 

  RUN rm -rf ./fedmsg.d

  USER 1001

  EXPOSE 8080

- ENTRYPOINT docker/install-ca.sh && gunicorn-3 --workers 8 --bind 0.0.0.0:8080 --access-logfile=- --enable-stdio-inheritance greenwave.wsgi:app

+ ENTRYPOINT docker/install-ca.sh && gunicorn-3 --workers 8 --timeout 330 --graceful-timeout 300 --bind 0.0.0.0:8080 --access-logfile=- --enable-stdio-inheritance greenwave.wsgi:app

By default, a gunicorn worker that stays silent for more than 30s
is killed and restarted. In greenwave's case, a worker can
legitimately wait longer than that while fetching results from
ResultsDB.

This commit increases the timeout to 330s and the graceful
timeout to 300s. These values are temporary and will need to
be fine-tuned later.

Signed-off-by: Clement Verna <cverna@tutanota.com>
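
For illustration only, the same settings could also be kept in a gunicorn configuration file rather than on the ENTRYPOINT command line. This is a minimal sketch, not part of this PR: the file name `gunicorn.conf.py` is an assumption, and it would be loaded with something like `gunicorn-3 -c gunicorn.conf.py greenwave.wsgi:app`. The values mirror the flags in the diff above.

```python
# gunicorn.conf.py (hypothetical file name; not part of this PR).
# Mirrors the command-line flags used in the ENTRYPOINT of the diff.

bind = "0.0.0.0:8080"           # --bind 0.0.0.0:8080
workers = 8                     # --workers 8

# A worker silent for longer than this many seconds is killed and restarted.
timeout = 330                   # --timeout 330

# After a restart signal, workers get this many seconds to finish
# in-flight requests before being force-killed.
graceful_timeout = 300          # --graceful-timeout 300

accesslog = "-"                 # --access-logfile=- (log to stdout)
enable_stdio_inheritance = True # --enable-stdio-inheritance
```

Keeping the values in a file makes them easier to tune per deployment, although overriding the command in the OpenShift template achieves the same thing without rebuilding the image.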

@cverna, that seems a bit long. Is there a specific issue/outage you're trying to address?

Can you instead specify those additional parameters in the OpenShift template which uses the container image?

> @cverna, that seems a bit long. Is there a specific issue/outage you're trying to address?

In Fedora's instance we were seeing a lot of these (105 times within an hour):

[2019-03-15 16:06:23 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:56)
[2019-03-15 16:06:23 +0000] [56] [INFO] Worker exiting (pid: 56)
[2019-03-15 16:06:23 +0000] [60] [INFO] Booting worker with pid: 60

We have deployed a fix to test this configuration. We are seeing far fewer errors, but there are still workers that wait for more than 5 minutes doing nothing (10 times in 1 hour) :(.

Also note that OpenShift uses the default haproxy configuration, which terminates requests after 30s.

We had to run the following on the greenwave route, which reduced the number of 504s returned by greenwave:

oc -n greenwave annotate route greenwave-web --overwrite haproxy.router.openshift.io/timeout=330s
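
For anyone wanting to reproduce counts like the ones quoted above, here is a rough sketch of how the `WORKER TIMEOUT` lines could be tallied per hour from captured gunicorn output. The function name `count_timeouts_per_hour` and the regular expression are illustrative assumptions based only on the three log lines shown in this comment; this is not the tooling that was actually used.

```python
import re
from collections import Counter

# Matches the master-process line reporting a killed worker, e.g.
# [2019-03-15 16:06:23 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:56)
# The "hour" group captures the date plus the hour, used as the bucket key.
TIMEOUT_RE = re.compile(
    r"^\[(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} [+-]\d{4}\] "
    r"\[\d+\] \[CRITICAL\] WORKER TIMEOUT \(pid:\d+\)"
)


def count_timeouts_per_hour(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH' to the number of worker timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group("hour")] += 1
    return counts


# Sample input: the log excerpt quoted in this comment.
SAMPLE = """\
[2019-03-15 16:06:23 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:56)
[2019-03-15 16:06:23 +0000] [56] [INFO] Worker exiting (pid: 56)
[2019-03-15 16:06:23 +0000] [60] [INFO] Booting worker with pid: 60
"""

if __name__ == "__main__":
    for hour, count in sorted(count_timeouts_per_hour(SAMPLE.splitlines()).items()):
        print(f"{hour}:00  {count} worker timeout(s)")
```

Only the `[CRITICAL] WORKER TIMEOUT` line is counted; the "Worker exiting" and "Booting worker" lines that follow each timeout are ignored so that one incident is not counted three times.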

> Can you instead specify those additional parameters in the OpenShift template which uses the container image?

Yes, this is what we ended up doing: https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=f6784eb283dde2aee057da0c2772b61c243b7fa5

Do you want me to close this PR?

We also saw those logs in the internal instance. I ran some tests with the timeout completely disabled, and in some cases I had to wait even more than 5 minutes. In my opinion, increasing the timeout is not going to solve the problem, and even if it did, it's not the solution I would hope for.
For our objectives, Greenwave should return in a more reasonable time (seconds).

Internally, the change to address this issue was already deployed in stage, and we no longer saw these kinds of logs and problems.

So I would prefer not to merge this change. But let's hear what other people say.

> For our objectives, Greenwave should return in a more reasonable time (seconds).
> Internally, the change to address this issue was already deployed in stage, and we no longer saw these kinds of logs and problems.

What was the change?

> > For our objectives, Greenwave should return in a more reasonable time (seconds).
> > Internally, the change to address this issue was already deployed in stage, and we no longer saw these kinds of logs and problems.
>
> What was the change?

https://pagure.io/greenwave/pull-request/378#request_diff

> > > For our objectives, Greenwave should return in a more reasonable time (seconds).
> > > Internally, the change to address this issue was already deployed in stage, and we no longer saw these kinds of logs and problems.
> >
> > What was the change?
>
> https://pagure.io/greenwave/pull-request/378#request_diff

Thanks :)

I'm going to close this, since we should not need it with the incoming change.

Pull-Request has been closed by cverna
