There are a number of URLs, including
which get hit a lot by the default Fedora web browser start page (see https://gitlab.com/fedora/websites-apps/fedora-websites/fedora-websites-3.0/-/blob/develop/pages/start.vue?ref_type=heads#L97)
There are also follow-up requests to topics that are returned in the json from these index queries.
This is causing a lot of extra load on Fedora Discussion (particularly the search query), though the others may be just as bad and simply obscured in the reports I get.
Could we put something in the middle here that caches these responses? I don't think Discourse itself needs to be hit more often than, say, once every fifteen minutes? (Even limiting it to once per minute would be an order-of-magnitude improvement, but there's no real reason for the data to be that up to date.)
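The idea of "something in the middle" is a time-based (TTL) cache: the first request for a URL goes to the origin, and every request in the following window is answered from the cache. The sketch below is illustrative only; `createTtlCache` and `fakeOrigin` are hypothetical names, the loader stands in for a real request to Discourse, and in practice this role would be played by a proxy or CDN rather than application code.

```javascript
// Minimal in-memory TTL cache: at most one origin request per URL per TTL window.
// `loader` is a hypothetical stand-in for a request to Discourse.
function createTtlCache(loader, ttlMs, now = Date.now) {
  const entries = new Map(); // url -> { value, expiresAt }

  return function get(url) {
    const entry = entries.get(url);
    if (entry && entry.expiresAt > now()) {
      return entry.value; // fresh hit: the origin is not contacted
    }
    // Miss or expired: fetch once and remember the result. If `loader` returns a
    // Promise, concurrent callers share that single in-flight request.
    const value = loader(url);
    entries.set(url, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Simulate 1,000 start-page loads inside one 15-minute window, then one more after it.
let originHits = 0;
let clock = 0; // fake clock in milliseconds
const fakeOrigin = (url) => {
  originHits += 1;
  return `response for ${url}`;
};
const cachedGet = createTtlCache(fakeOrigin, 15 * 60 * 1000, () => clock);

for (let i = 0; i < 1000; i++) {
  cachedGet("/c/news/announce-list/76.json");
}
clock += 15 * 60 * 1000; // the window has passed
cachedGet("/c/news/announce-list/76.json");
console.log(originHits); // 2
```

A real implementation would also avoid caching failed requests, so that one error does not stick for the whole window.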
We're not presently in trouble, but we're getting close to thresholds where we would need to pay more, and that seems like a bad reason to pay thousands of dollars more for hosting. So... by the end of June, maybe?
CC: @darknao
Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops
I was thinking about a solution for this, but couldn't find one that is easy to implement.
cc @glb
The third option does sound best to me as well for start.fp.o. However, it would be a lot of work.
The first option seems OK to me as a quick fix. Could the server be configured to only proxy the specific json queries that we need for start.fp.o? Also, I don't know the details of what you are running server-side, but if you are already running Apache httpd, you might be able to configure the caching directly in httpd instead of installing Squid. I haven't used it personally, but here is a link to the documentation for httpd's mod_cache:
https://httpd.apache.org/docs/2.4/mod/mod_cache.html#cacheenable
Edit: FWIW, this example might be similar to what we want:
https://taylor.callsen.me/creating-a-caching-proxy-server-with-apache/#:~:text=4.-,Bringing%20it%20all%20together,-The%20complete%20Apache
I'm not sure I like the idea of another app for this. ;)
Crazy idea: How about we set up a CloudFront distribution for these and have the start page hit CloudFront, which caches them? That unfortunately puts CloudFront in our critical path, but it would be super easy to do.
Otherwise, IMHO, just doing static, once-a-day builds should be OK. The only thing that likely changes much is "recently solved", and perhaps we could switch that to just a link instead of pulling from it?
I think a static build of the whole site is a bit intensive. Is there a way to only rebuild the start page automatically? If so, I think that would be OK, but I'd like to do it at least four times per day so things like announcements from discussion.fp.o (or things like release announcements or CVE announcements from Fedora Magazine) wouldn't be delayed for as long as 24 hours.
I like the idea of splitting off the start page build from the rest. Then we could just build every hour or something and it would be super quick...
I'm okay with the spun-off static start page idea, but FWIW Cloudfront was what I had in mind. Isn't this kind of thing what it's for? :)
I don't know how Cloudfront works, but if it creates caches at specific times, I'd suggest using something like 5 minutes past the hour so that the Fedora Magazine posts that typically run on the hour (08:00 UTC for normal posts, 14:00 UTC for release announcements) will show up quickly. Otherwise, it is all good with me. I don't (fore)see any obvious problems.
We will split the start page and statically build it eventually, but it will take some time to get there. In the meantime, I think the Cloudfront solution is the fastest to implement. Once the distribution point is created, it should not take long for us to update the start page to use it.
OK. I can set that up... although I don't like putting CloudFront so directly in our production path. :)
So what exactly do we need in the CloudFront setup? What origin should it use? Just all of discussion.fedoraproject.org, or something more targeted?
Ideally, we would need the origin paths mentioned by mattdm:
https://discussion.fedoraproject.org/c/news/announce-list/76.json
https://discussion.fedoraproject.org/tags/c/ask/common-issues/82/none/f40.json (or /f*.json to catch all versions)
https://discussion.fedoraproject.org/search.json?q=%23ask%20status%3Asolved%20order%3Alatest_topic
or, if that is easier to configure, all of https://discussion.fedoraproject.org.
edit: Discourse disables caching on those urls with the cache-control: no-cache, no-store header, so the Cloudfront distribution needs to set its own cache setting (TTL of 1 or 2 hours should be a good start).
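Discourse's `cache-control: no-cache, no-store` header is why the TTL has to be enforced on the CloudFront side. The sketch below is a simplified model, based on a reading of the AWS object-caching documentation linked later in this thread, of how a cache with CloudFront-style Minimum/Default/Maximum TTL settings picks an object's lifetime. It is not CloudFront's implementation, `effectiveTtl` is a hypothetical name, and details such as `Expires` headers are ignored.

```javascript
// Simplified model (not CloudFront's code) of how a cache with Minimum/Default/Maximum
// TTL settings picks an object's lifetime from the origin's Cache-Control header.
// All values are in seconds.
function effectiveTtl(cacheControl, { minTtl, defaultTtl, maxTtl }) {
  const directives = (cacheControl || "")
    .toLowerCase()
    .split(",")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);

  // Origin forbids caching: only a non-zero minimum TTL overrides that.
  if (directives.some((d) => d === "no-cache" || d === "no-store" || d === "private")) {
    return minTtl;
  }

  // Shared caches prefer s-maxage over max-age; clamp to the configured range.
  const ageDirective =
    directives.find((d) => d.startsWith("s-maxage=")) ||
    directives.find((d) => d.startsWith("max-age="));
  if (ageDirective) {
    const seconds = Number(ageDirective.split("=")[1]);
    if (Number.isFinite(seconds)) {
      return Math.min(Math.max(seconds, minTtl), maxTtl);
    }
  }

  return defaultTtl; // origin gave no usable lifetime
}

const discourseHeader = "no-cache, no-store";
console.log(effectiveTtl(discourseHeader, { minTtl: 0, defaultTtl: 86400, maxTtl: 31536000 })); // 0
console.log(effectiveTtl(discourseHeader, { minTtl: 7200, defaultTtl: 86400, maxTtl: 31536000 })); // 7200
```

With a minimum TTL of 0 the Discourse header means "never cache"; raising the minimum to 7200 seconds (2 hours) is what turns these responses into cacheable objects.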
I've created https://d36melcmqgchij.cloudfront.net with the entire site... easier than particular parts.
Let me know if that works or if anything more is needed for now.
So I need to update the start page to query d36melcmqgchij.cloudfront.net instead of discussion.fedoraproject.org, right?
For gathering the data, yes. Ideally, I think any actual links people end up clicking on should still go directly to Discussion, if that's possible?
I've submitted MR !986 to route the start page API queries through this new proxy.
Yes, I've made sure the links still go to the "real" discourse site. Thanks for pointing out that those should be maintained. 🙂
While updating the start page code, I noticed that the user avatars (displayed next to the latest solved issues) are being fetched from https://sea1.discourse-cdn.com/fedoraproject. Is that endpoint "metered"? If so, it might be necessary to add that to the proxy as well.
Does this new proxy adjust/remove the same origin headers? If not, this might not work.
Edit: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-cloudfront-supports-cors-security-custom-http-response-headers/
curl https://d36melcmqgchij.cloudfront.net/c/news/announce-list/76.json -I
HTTP/2 200
content-type: application/json; charset=utf-8
...
cache-control: no-cache, no-store
access-control-allow-origin: https://fedoraproject.org
x-cache: Miss from cloudfront
The CORS headers are preserved, so we are good on that front. But cache-control is still no-cache, no-store, and I get a cache miss for every request I make, so I think we need to enforce a cache TTL on CloudFront.
This cache-control configuration is the root cause of this issue, and it is set by Discourse. With the current configuration, nothing will cache these responses (neither browsers nor proxies).
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesObjectCaching
If it is difficult, don't worry about it, but if you could add stg.fedoraproject.org, fedora.gitlab.io, and localhost to that access-control-allow-origin list from cloudfront, that should enable some of this functionality to be previewed/tested on the staging sites before the start page goes live on the production site.
Just FYI, I read most of these tickets via email, and Pagure doesn't send edits via email... :) So, IMHO, it would always be better to add a new comment rather than edit an old one.
Anyhow, so I guess this won't work unless we can get cloudfront to ignore the cache control settings on the discussion side?
I am not sure what you mean by access-control-allow-origin list here? Currently it's open to everyone. We could of course lock it down if desired, but I didn't bother until we got it working...
Not sure localhost will work here, as the development env uses a specific port. Maybe http://localhost:3000. But I agree that having either of the two other domains would be really helpful, as we can only test this in production right now.
I'm seeing the same results that darknao reported in his earlier comment. It looks like both cache-control and access-control-allow-origin are set. The documentation for cloudfront appears to indicate that these headers can be adjusted on the proxy:
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/modifying-response-headers.html
Yes! Thanks for catching that darknao. If possible (and not difficult), I would like http://localhost:3000 added to the access-control-allow-origin list (along with https://fedoraproject.org, https://stg.fedoraproject.org, and https://fedora.gitlab.io).
Yes, in the CloudFront configuration, that should be:
Object Caching: Customize
Minimum TTL: 2h
It's the cross-origin access control (CORS) used by browsers. Currently, only fedoraproject.org can request assets from Discourse from the browser (so JavaScript or other assets). I think you can override this by following https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/creating-response-headers-policies.html (Origin override checkbox).
The cloudfront console interface is... a bit confusing to me.
But I think I have adjusted things. Can you test and see if I missed anything?
It doesn't look like there is any access-control-allow-origin now. I don't think that will work.
Huh... I have:
Access-Control-Allow-Origin
https://fedoraproject.org
http://localhost:3000
https://stg.fedoraproject.org
https://fedora.gitlab.io
Origin override
I'm just going by what I see in the output from the curl command that darknao provided earlier. Do those values (https://fedoraproject.org https://stg.fedoraproject.org https://fedora.gitlab.io http://localhost:3000) need to be on the same line as the key (Access-Control-Allow-Origin)? The typical format for showing/setting HTTP headers is <key>: <value>, [value, ...], but I have no idea how CloudFront works.
Actually, it looks like Access-Control-Allow-Origin only allows one value, so you might have to set it to * for all our staging sites to work. I don't know if that would be a problem or not?
Excerpted from https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Origin:
Limiting the possible Access-Control-Allow-Origin values to a set of allowed origins requires code on the server side to check the value of the Origin request header, compare that to a list of allowed origins, and then if the Origin value is in the list, set the Access-Control-Allow-Origin value to the same value as the Origin value.
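The server-side check that the MDN excerpt describes can be sketched as follows. This is an illustration of the technique, not CloudFront's response-headers-policy code; `ALLOWED_ORIGINS` and `corsHeaders` are hypothetical names, and the list simply mirrors the origins requested in this thread.

```javascript
// Allow-list CORS as described by MDN: Access-Control-Allow-Origin holds one origin
// (or "*"), so the server echoes the request's Origin only if it is on the list.
const ALLOWED_ORIGINS = new Set([
  "https://fedoraproject.org",
  "https://stg.fedoraproject.org",
  "https://fedora.gitlab.io",
  "http://localhost:3000",
]);

function corsHeaders(requestOrigin, allowed = ALLOWED_ORIGINS) {
  // The response now depends on the Origin request header, so shared caches
  // (such as a CDN) must keep a separate copy per Origin.
  const headers = { Vary: "Origin" };
  if (requestOrigin && allowed.has(requestOrigin)) {
    headers["Access-Control-Allow-Origin"] = requestOrigin;
  }
  // Otherwise the header is omitted and the browser blocks the cross-origin read.
  return headers;
}

console.log(corsHeaders("https://stg.fedoraproject.org")["Access-Control-Allow-Origin"]);
console.log(corsHeaders("https://example.com")["Access-Control-Allow-Origin"]); // undefined
```

The `Vary: Origin` header matters here because the responses pass through a cache: without it, a copy stored for one origin could be served to a page on another.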
It's a set of checkboxes. ;)
[Screenshot: Screenshot_from_2024-06-04_12-42-25.png]
Well, if darknao will approve my MR, we should be able to see if it is working on GitLab pages. 🙂
I just tested this on a local build and I see a problem.
[Screenshot: startpage.jpg]
The "Common Issues" query is working, but the "Latest Solved Issues" query is not. The difference between them is that the latest solved issues query requires query string parameters. I don't know why the latter would be blocked, but that is the only difference between the queries (unless I have a typo somewhere, but I don't see any).
Can we try it with Access-Control-Allow-Origin set to "All origins"? Are we concerned about this proxy being used to harvest data from our discourse instance?
sure. Set it to allow all...
OK, so the cache setting is correct and I get a cache hit every time. That's good. The CORS headers are good too, and work for all mentioned domains. Perfect. One of the URLs uses query parameters, but CloudFront ignores them for caching, so it doesn't return the correct result.
There should be a "Query string forwarding and caching" parameter that you can set to "Forward all, cache based on all" (I assume it's currently set to None) https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesQueryString
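Why ignoring the query string breaks the "Latest Solved Issues" query can be seen by comparing cache keys built with and without it. This is a hypothetical helper, not CloudFront's implementation; sorting the parameters is an illustrative normalization choice in this sketch, and the second search URL is a made-up example.

```javascript
// Hypothetical helper: derive a cache key with or without the query string.
function cacheKey(rawUrl, { includeQueryString }) {
  const url = new URL(rawUrl);
  const base = url.origin + url.pathname;
  if (!includeQueryString) {
    return base; // every /search.json?q=... variant collapses onto one entry
  }
  // Sort parameters so equivalent URLs share a key (illustrative normalization).
  const params = new URLSearchParams(url.search);
  params.sort();
  const qs = params.toString();
  return qs ? `${base}?${qs}` : base;
}

const proxy = "https://d36melcmqgchij.cloudfront.net";
const solved = `${proxy}/search.json?q=%23ask%20status%3Asolved%20order%3Alatest_topic`;
const otherSearch = `${proxy}/search.json?q=fedora`; // hypothetical second query

// Path-only keys collide, so one search's cached result is served for the other.
console.log(cacheKey(solved, { includeQueryString: false }) === cacheKey(otherSearch, { includeQueryString: false })); // true
// Keys that include the query string keep the two searches apart.
console.log(cacheKey(solved, { includeQueryString: true }) === cacheKey(otherSearch, { includeQueryString: true })); // false
```

Including the query string in the key means more distinct cache entries, but the start page only issues one fixed search query, so the extra cost is negligible.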
No dice with it set to allow all. Hopefully what darknao suggests will work instead.
Also note that the previous cache setting (if you already changed it) was working just fine.
ok, set the query string thing.
I currently have Access-Control-Allow-Origin set to all... should I switch it back to that list of domains?
Hmm, it still doesn't seem to be working. 😕
It doesn't matter to me. I guess the explicit list is, in theory, a little more secure.
An explicit list is best, I think; it's working in both cases.
For the query string, it's working now. So I think everything is good and we can push the updated start.fp-o to production.
@glb the "latest solved issues" is not working on your side because the function tries to query each topic's details on discussion.fp-o (instead of the CloudFront proxy) and fails due to the CORS headers set there. It should be fine on production (or you could use the CloudFront URL for that too, since it forwards all requests in the end).
I'm sure I have the solved query pointing at the cloudfront proxy:
[/home/glb/Repositories/fedora-websites-3.0]$ git diff HEAD~1
...
 const fp_solved_issues = async () => {
-  let dcdata = await $fetch(`${discourse_uri}/search.json?${solved_query}`);
+  let dcdata = await $fetch(`${_dc_proxy_uri}/search.json?${solved_query}`);
   let solved = dcdata.posts;
   let i = 0;
[/home/glb/Repositories/fedora-websites-3.0]$
I'm running/testing the route-startpage-queries-through-proxy branch locally with npm run dev.
I'm talking about the lines just below:
let topic;
if (solved[i].topic_id) {
  topic = await $fetch(`${discourse_uri}/t/${solved[i].topic_id}.json`);
}
Yeah, I get it now. :person_facepalming:
I guess I should change that to go through the proxy.
That's the only request left that doesn't use the proxy. Now that we have it, I'd say let's use it for all requests :) (this one also sets the no-cache, no-store header, so even if it's not the most-queried URL, it will still be beneficial to have a cached version).
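The resulting split, with every JSON request going through the proxy while user-facing links stay on Discourse, can be sketched as a standalone function. This is not the MR's code: `discourse_uri`, `_dc_proxy_uri` and `solved_query` are names from the start page snippets in this thread, but the values assigned here, the helper names, and the assumed response fields (`posts`, `topic_id`, `title`) are illustrative assumptions, and `fetchJson` is a hypothetical stand-in for Nuxt's `$fetch`.

```javascript
// JSON requests go through the CloudFront proxy; links shown to users point at Discourse.
const discourse_uri = "https://discussion.fedoraproject.org";
const _dc_proxy_uri = "https://d36melcmqgchij.cloudfront.net";
const solved_query = "q=%23ask%20status%3Asolved%20order%3Alatest_topic"; // assumed value

// API requests: served (and cached) by the proxy.
const searchApiUrl = () => `${_dc_proxy_uri}/search.json?${solved_query}`;
const topicApiUrl = (topicId) => `${_dc_proxy_uri}/t/${topicId}.json`;
// Human-facing links: the real Discourse site.
const topicLinkUrl = (topicId) => `${discourse_uri}/t/${topicId}`;

// `fetchJson` stands in for Nuxt's $fetch so the sketch can run with a stub.
async function fetchSolvedIssues(fetchJson, limit = 5) {
  const dcdata = await fetchJson(searchApiUrl());
  const issues = [];
  for (const post of (dcdata.posts || []).slice(0, limit)) {
    if (!post.topic_id) continue;
    const topic = await fetchJson(topicApiUrl(post.topic_id)); // proxied as well
    issues.push({ title: topic.title, link: topicLinkUrl(post.topic_id) });
  }
  return issues;
}
```

Keeping the two URL bases in separate helpers makes it easy to check that no API call bypasses the cache and no link sends users to the CloudFront hostname.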
Side effect to this: start.fp-o is now loading super fast :D
I think I've made the needed update to my MR on GitLab. Let me know if I've missed anything.
BTW: There is still the avatars query. Do you have any ideas about how that should be handled?
Avatars are cached (thanks to the cache-control header being properly set this time), so I don't think there is a need to cache them on that proxy.
ok. I am going to put back the Access-Control-Allow-Origin list... if everything looks ok after that, is there anything left to do here?
I don't think so. If it still works on https://fedora.gitlab.io/websites-apps/fedora-websites/fedora-websites-3.0/start then we are (probably) OK. 🙂
Everything looks good on staging, so I'm pushing it to prod right now. I think we can close this ticket. I would be interested in the stats on Discourse following this change, just to see how much of an improvement it makes.
Ditto. If we are well under whatever the threshold is where the cost would increase, we might consider reducing the cache time so that the solved issues will update more frequently (just to keep the start page a little more interesting/active).
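To weigh a shorter cache time against load on Discourse, a rough count helps. The sketch below assumes each cache location refetches each URL at most about once per TTL window, and ignores evictions, tiered caching, and the per-topic requests made for the solved issues; `maxOriginRequestsPerDay` is a hypothetical helper and the figures are estimates, not measurements.

```javascript
// Rough upper bound on origin (Discourse) requests per day for a set of cached URLs.
// Assumption: each cache location refetches each URL about once per TTL window.
function maxOriginRequestsPerDay(urlCount, ttlSeconds, cacheLocations = 1) {
  const windowsPerDay = Math.ceil(86400 / ttlSeconds);
  return urlCount * windowsPerDay * cacheLocations;
}

// The three start-page queries, per cache location:
for (const [label, ttl] of [["2 h", 7200], ["15 min", 900], ["1 min", 60]]) {
  console.log(`${label}: ${maxOriginRequestsPerDay(3, ttl)} requests/day`);
}
// 2 h: 36 requests/day
// 15 min: 288 requests/day
// 1 min: 4320 requests/day
```

Under these assumptions, dropping the TTL from 2 hours to 15 minutes multiplies origin traffic by 8 per cache location, which is still far below one request per start-page view.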
Thanks! Agree it would be good to look at this after a bit and see what effect it's having...
Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)