Problems with fetching epel release metalink from proxy06.fedoraproject.org (140.211.169.206)

I have been investigating random build failures in our Jenkins build system. The builds fail occasionally, maybe a few times per week, in the build phase of a CentOS 7 based Docker container, because fetching the EPEL repository metalink fails.

I have debugged the issue further, and it seems that HTTPS requests to the Fedora mirror at proxy06.fedoraproject.org (140.211.169.206) sometimes either get completely stuck or take an extremely long time to complete.

I have verified this with the following curl command:

# curl -v --resolve "mirrors.fedoraproject.org:443:140.211.169.206"  "https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=x86_64"
* About to connect() to mirrors.fedoraproject.org port 443 (#0)
*   Trying 140.211.169.206...
* Connected to mirrors.fedoraproject.org (140.211.169.206) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=*.fedoraproject.org,O="Red Hat, Inc.",L=Raleigh,ST=North Carolina,C=US
*       start date: Feb 27 00:00:00 2020 GMT
*       expire date: Mar 02 12:00:00 2022 GMT
*       common name: *.fedoraproject.org
*       issuer: CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
> GET /metalink?repo=epel-7&arch=x86_64 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: mirrors.fedoraproject.org
> Accept: */*
> 

The request above got stuck after the HTTP request had been sent. Requests do not get stuck on every try, but when running them in a loop it takes only a few minutes to see it happen. As you can see from the curl output, the problem occurs after TLS negotiation has completed, so it is in no way related to the system's trusted CA configuration.
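
To put a number on how often this happens, the loop can be scripted with a hard time limit per request. The following is only a sketch: the helper names `classify_exit` and `probe_proxy` and the 20-second default limit are made up for illustration, while the URL and IP address are the ones from the curl command above. It relies on curl's `--max-time` option, which makes curl exit with status 28 when the limit is reached.

```shell
# Sketch: probe one proxy IP repeatedly and log which requests stall.
# Helper names and the default time limit are illustrative assumptions.

# Map a curl exit status to a short label.
classify_exit() {
  case "$1" in
    0)  echo ok ;;
    28) echo timeout ;;        # curl: operation timed out (--max-time reached)
    *)  echo "error($1)" ;;    # any other curl failure
  esac
}

# Usage: probe_proxy IP COUNT [MAX_SECONDS]
# Runs in a subshell "( ... )" so its variables do not leak to the caller.
probe_proxy() (
  ip=$1
  count=$2
  max=${3:-20}
  url='https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=x86_64'
  i=1
  while [ "$i" -le "$count" ]; do
    # --resolve forces the connection to the given IP, as in the manual test.
    curl --silent --output /dev/null --max-time "$max" \
         --resolve "mirrors.fedoraproject.org:443:${ip}" "$url"
    rc=$?
    printf '%s attempt=%d ip=%s result=%s\n' \
           "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$i" "$ip" "$(classify_exit "$rc")"
    i=$((i + 1))
  done
)
```

For example, `probe_proxy 140.211.169.206 100` would log 100 attempts against the suspect address; running the same loop against one of the other addresses gives a baseline for comparison.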

Our builds fail much less often than that because mirrors.fedoraproject.org seems to be DNS load balanced, and the name resolver picks a different proxy IP address on each resolve request:

$ host mirrors.fedoraproject.org
mirrors.fedoraproject.org is an alias for wildcard.fedoraproject.org.
wildcard.fedoraproject.org has address 152.19.134.198
wildcard.fedoraproject.org has address 67.219.144.68
wildcard.fedoraproject.org has address 209.132.190.2
wildcard.fedoraproject.org has address 140.211.169.206
wildcard.fedoraproject.org has address 38.145.60.21
wildcard.fedoraproject.org has address 8.43.85.73
wildcard.fedoraproject.org has address 140.211.169.196
wildcard.fedoraproject.org has address 152.19.134.142
wildcard.fedoraproject.org has address 38.145.60.20
wildcard.fedoraproject.org has IPv6 address 2605:bc80:3010:600:dead:beef:cafe:fed9
wildcard.fedoraproject.org has IPv6 address 2604:1580:fe00:0:dead:beef:cafe:fed1
wildcard.fedoraproject.org has IPv6 address 2600:2701:4000:5211:dead:beef:fe:fed3
wildcard.fedoraproject.org has IPv6 address 2620:52:3:1:dead:beef:cafe:fed6
wildcard.fedoraproject.org has IPv6 address 2605:bc80:3010:600:dead:beef:cafe:feda

Does anyone know if this is a known issue, or if it is caused by some request rate limiting going on at proxy06…?

Are there any known workarounds? I am thinking of something like blacklisting the hostname proxy06… or the IP address 140.211.169.206, so that yum will never attempt to use that mirror.
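
A minimal sketch of the IP-based variant of that idea, assuming that pinning the hostname to a single known-good address is acceptable for the build. It works at the name-resolution level rather than inside yum. The function name `healthy_ipv4` is hypothetical; it filters the output of `host` and drops the address given as its argument.

```shell
# Sketch: list the IPv4 addresses behind mirrors.fedoraproject.org,
# excluding one address that is suspected to be unhealthy.
# The function name is an illustrative assumption.

# Usage: host mirrors.fedoraproject.org | healthy_ipv4 BAD_IP
# Reads `host` output on stdin and prints every IPv4 address except BAD_IP.
healthy_ipv4() {
  # Lines of the form "<name> has address <ip>" carry IPv4 addresses;
  # alias lines and "has IPv6 address" lines do not match the pattern.
  awk -v bad="$1" '/ has address / && $NF != bad { print $NF }'
}
```

One of the addresses printed by `host mirrors.fedoraproject.org | healthy_ipv4 140.211.169.206` could then be pinned for the build, for example with an `/etc/hosts` entry or with `docker build --add-host mirrors.fedoraproject.org:<address>`. The obvious drawback is that a pinned address loses the DNS load balancing and goes stale if the proxy addresses change.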

Br,
Sami

I’m not sure where one files issues related to EPEL, but you can start by filing an issue at the Fedora infrastructure team’s tracker (this requires detailed infrastructure knowledge that we here on the forum are unlikely to have):

https://pagure.io/fedora-infrastructure/issues

Yes, an infrastructure ticket is the right place. (Also, no problem asking here to check!)

I am not sure if there is rate-limiting, but I wouldn’t be completely surprised if that’s at least part of the issue. How frequent are your builds? Have you considered making a local mirror or cache?

Actually, we started cleaning up the Docker containers and found that we need packages from the EPEL repo in very few of them. I expect this cleanup by itself will bring the likelihood of build failures close to negligible.

I can file an infrastructure ticket though, as there seems to be a real issue that could affect other users too.

Br,
Sami
