#9990 sssd fails on ipsilon servers, causing authentication issues.
Closed: Fixed 3 years ago by kevin. Opened 3 years ago by q5sys.

Describe what you would like us to do:


The error message says to reach out to you all, so here I am, following instructions.

When do you need this to be done by? (YYYY/MM/DD)


Preferably before the election cycle ends, as I'd like to vote.

400error.png


Update:
Others are reporting the same thing, and some have had success by first logging in to another Fedora site with their FAS account and then retrying. I did the same and was able to log in. I'm not sure what the issue is, but I wanted to report back my findings.

I don't think going to another site is what helps; I think the problem is transient and people should just retry...

But I'm investigating.

I got the same behavior. Clicking back and logging in again let me in. I also hit this on the OpenShift console, so I don't think it's specific to the Elections app. I saw it on another application this morning, but I don't remember which one it was.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, high-trouble, ops

3 years ago

Please re-try now. I have made a change that I hope will keep it working.

If folks are still seeing issues, please let me know your account name and the exact time it happened.

I'm currently experiencing this issue while trying to log in to the OpenShift Web Console.
username: mattia
time: 2021-05-29 15:16 UTC

I can successfully log in to accounts.fedoraproject.org.

I have the same issue with the elections app.

username: obudai
time: 2021-05-31 9:37 UTC

Most puzzling.

[Sat May 29 15:16:21.724960 2021]... authentication failed for user mattia: Authentication failure

[Mon May 31 09:35:56.396373 2021] ... authentication failed for user obudai: Authentication failure ...

So from the ipsilon point of view it seems like it's the wrong password. :(

@abompard or @puiterwijk any ideas here?

It's apparently only happening on ipsilon02; when I end up on ipsilon01 I can log in fine.

I get this in the journal:

May 31 17:50:57 ipsilon02.iad2.fedoraproject.org httpd[701471]: pam_sss(ipsilon:auth): authentication failure; logname= uid=48 euid=48 tty= ruser= rhost=192.168.1.195 user=abompard
May 31 17:50:57 ipsilon02.iad2.fedoraproject.org httpd[701471]: pam_sss(ipsilon:auth): received for user abompard: 4 (System error)

Restarting sssd seems to have fixed it.

ok. Sounds like that was it. I do wonder why sssd went into that state though. ;(

If anyone still has any problems, please re-open or file a new issue. Thanks!

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

Can't reopen, but I just saw this again with Elections and another app (either CommBlog or the wiki, I forget now).

Yeah, it looks like 02 is hitting those errors again. ;(

01 has almost no cases of it...

@abompard any idea of the underlying cause here?
I'm running the playbook against 02 to make sure nothing is out of sync config-wise.

Metadata Update from @kevin:
- Issue status updated to: Open (was: Closed)

3 years ago

In the journal I see quite a few entries like:

kernel: sssd_be[1511570]: segfault at 8 ip 00007f0d148d1e24 sp 00007ffffa182258 error 4 in libdbus-1.so.3.19.13[7f0d148ac000+30000]
systemd-coredump[1513354]: [🡕] Process 1513356 (sssd_be) of user 0 dumped core.
Stack trace of thread 1511570:
[...]

And yesterday there were lines like sssd_be[1508324]: LDAP connection error: unknown error.

I don't know where that could come from but I'll open a ticket with the sssd folks. And I'll restart sssd in the meantime.

Metadata Update from @ryanlerch:
- Issue priority set to: Waiting on External (was: Waiting on Assignee)

3 years ago

I just had this issue when trying to log in to ask.fedoraproject.org. Even resetting my password didn't help. I was able to log in here (on the first try), and now it seems to work again on ask.fedoraproject.org as well, which agrees with @q5sys's observation.

I can confirm: exactly this error message was shown to me 3 times today.

I came across this login issue today. The outcome depended on which Fedora service I started from.

Via discussion.fedoraproject.org, clicking the "Log In" button gives me the HTTP 400 error repeatedly, no matter how many attempts I make. I did not have a linked account set up.

Via the Pagure fedora-infrastructure forum, however, it took 3 attempts before my credentials were accepted.

I just had the same issue as @paulgb: ask.fedoraproject.org wouldn't let me log in (HTTP 400). But I could log in to pagure.io, after which ask.fedoraproject.org was fine.

I'm seeing HTTP 400 on the wiki right now. With Badges, I'm getting a different behavior: logging in to Badges keeps redirecting me back to the login page (with a different ipsilon_transaction_id each time). If I go back to Badges, I'm still not logged in.

On CommBlog, I'm getting the same loop behavior that I see with Badges.

I got two different reports of login problems this morning, and discovered that I get an error which looks like the initial image when I try to log in to ask, but

 OpenID request was cancelled

when trying to log into pagure.io in a private window.

I'm also seeing this error, "400 - Bad Request: User not authenticated at continue", and "OpenID request was cancelled" when trying to log in to ask.fedoraproject.org to report an issue with the sound card on my Acer laptop.
Some of my failed kernel tests are not showing up in my email notifications for today.
It seems to go away after resetting the password several times, but this has happened before.
I think you might want to consider printing carbon copies and keeping offline backups for sensitive tickets.

@duffy is reporting "OpenID request was cancelled" errors logging into Pagure right now.

Possibly this is a different issue, but it all feels connected.

@abompard any news here?

Perhaps we can at least add a Nagios check so we know when this starts and can restart sssd? Or just restart sssd every hour, or every 15 minutes, or something?
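A check along those lines could key on the pam_sss result code rather than on the generic "Authentication failure" that ipsilon reports. In Linux-PAM, code 4 is PAM_SYSTEM_ERR, which is what ipsilon02 logged while sssd was broken; a genuinely wrong password is assumed to come back with a different code (PAM_AUTH_ERR is 7). The sketch below is a minimal, hypothetical Nagios-style check under those assumptions: `check_sssd` and its `critical_at` threshold are illustrative names, and the log format is taken from the journal lines quoted earlier in this ticket.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches pam_sss result lines in the format seen on ipsilon02, e.g.
# "pam_sss(ipsilon:auth): received for user abompard: 4 (System error)"
PAM_RESULT = re.compile(r"pam_sss\([^)]*\): received for user (\S+): (\d+) \(")

# Linux-PAM return code 4 is PAM_SYSTEM_ERR. Assumption: a wrong password
# is reported with a different code (PAM_AUTH_ERR is 7), so it is ignored.
PAM_SYSTEM_ERR = 4


def check_sssd(journal_lines, critical_at=1):
    """Hypothetical check: return (exit_code, message) for Nagios."""
    failed_users = []
    for line in journal_lines:
        match = PAM_RESULT.search(line)
        if match and int(match.group(2)) == PAM_SYSTEM_ERR:
            failed_users.append(match.group(1))

    if len(failed_users) >= critical_at:
        users = ", ".join(sorted(set(failed_users)))
        return CRITICAL, (
            f"CRITICAL: {len(failed_users)} pam_sss System error(s) "
            f"for users: {users}"
        )
    return OK, "OK: no pam_sss System errors"
```

A small wrapper could feed it recent journal output (for example from journalctl --since -15min), print the message, and exit with the returned code; on CRITICAL, the remediation used in this ticket is simply restarting sssd. Keying on code 4 rather than on any failure keeps ordinary mistyped passwords from paging anyone.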

Would moving the ipsilon hosts over to RHEL 7 or RHEL 8 work around this?

I have added a note to fedorastatus about this issue. If you are reading this coming from that, please:

  1. Make sure you can log in OK to https://accounts.fedoraproject.org with the same username/password. This issue does NOT affect direct logins to the accounts page, so if you can't log in there, something else is wrong, not this issue.

  2. If you can log in to https://accounts.fedoraproject.org OK but not anywhere else, please report it in the #fedora-admin channel on irc.libera.chat or mail admin@fedoraproject.org, noting that you may be seeing this issue. We are working to track it down and are watching for it so we can restart sssd before it affects anyone, but we may miss that during off hours.

Sorry for the trouble. ;(

We are having an issue on the CentOS CI OCP cluster with the same "Error 400 - Bad Request - User not authenticated at continue".

Does CentOS have its own ipsilon, or is it using the fedoraproject ones? Both of the Fedora ones seem to be functional right now... no sssd issues.

It took me five attempts to get logged in here; the first four attempts were cancelled for no apparent reason. I've been noticing this mostly when logging in to https://ask.fedoraproject.org.
Clearing local storage and cookies seems to fix the problem most of the time.

It happened to me about 1h ago on Fedora infrastructure (trying to use Pagure failed repeatedly even after logging in to FAS itself, until the error cleared itself out).

I haven't been able to clone my Pagure repo for several days. I can log in through the web interface, but git clone ... keeps giving me the following:

$ git clone ssh://git@pagure.io/forks/glb/fedoramagazine-theme.git
Cloning into 'fedoramagazine-theme'...
Received disconnect from 8.43.85.76 port 22:2: Too many authentication failures
Disconnected from 8.43.85.76 port 22
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Edit: Never mind, I just realized what the problem was: I had too many SSH keys in my ~/.ssh directory. I just needed to add the following to ~/.ssh/config, and it is working now.

Host pagure.io
    IdentitiesOnly yes
    IdentityFile ~/.ssh/my-fedora-ssh-key
    User glb

Note: this was not this issue, we were having a network outage at our primary datacenter (IAD2) around that time.

From https://ask.fedoraproject.org/t/login-issues-with-fas-to-ask-fp-o-and-pagure-io/16474:

Login issues with FAS to ask.fp.o and pagure.io

When I tried from another guest machine, I could not log in to ask.fp.o using FAS.

Then when trying to logon to pagure.io to create a ticket, access failed also.

But logon to accounts.fp.o is OK.

When going back to this machine (where I am already logged in to ask.fp.o):

  • logging out and back in to ask.fp.o is fine
  • logging in to pagure.io failed
  • logging in to accounts.fp.o is OK

sssd on 01 was broken, restarted.

Also, I talked with @abompard, who was looking into this and got the upstream bug reports, etc.

I am going to work next week on gathering the info that sssd developers need to track this down.
Hopefully we will have a fix once and for all soon.

FYI got the 400 Bad Request trying to log in to Element. Worked on second try.

@kevin good news! I'm hitting this on the wiki (~1340 UTC) so the ludicrous logging now has something to log.

Sadly, it was 01 that was affected. I only set up debugging on 02. ;(

I've restarted 01 so it's back to normal.
If 02 doesn't fail soon, I'll ask upstream for more advice.

An outreachy applicant seems to be running into this now: Mon 2021-10-11 09:07:16 UTC

Except neither server is in the failed state. ;(

What's their username? Make sure they are not using their email address, and that they are appending their OTP if they enrolled one.

This was for vraj21, they were able to login a few minutes later though, so could be user error too perhaps.

Odd. I do see an error for them in the logs, but it's not causing the entire process to keep erroring from then on. Perhaps this is a different issue. ;(

Good Afternoon,

I've been locked out of the Fedora Magazine WordPress. I was stuck in a refresh loop and got "Jetpack has locked your site's login page." When I attempt to send myself the login link by email, I get "secure connection failed". I've tried on another device and still get the same issue.

Anything I can do to get my account unlocked?

Hey @zexcon that doesn't sound like it has anything to do with this issue. ;(

@misc perhaps you could help out with the Magazine here? I'd guess Jetpack has blocked the IP @zexcon is coming from?

I got the "OpenID request was cancelled" message when trying to log in.
I tried several times over the course of a couple of days, but I kept getting the same message.

I reset my password, and now I can log in.

So, this issue has not happened since we updated to a hopefully fixed version more than 2 weeks ago.

I am going to call it fixed.

For other authentication issues, please file new tickets and we will track them down.

@tmds That's pretty odd. When was the last time you changed your password before that? Was it back before we moved account systems?

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago


Metadata
Boards 1
ops Status: Backlog
Attachments 1
Attached 3 years ago