Azure CLI task fails with AADSTS700024 after 60 minutes
#28708
Comments
refresh OIDC token is a feature |
Callback interface proposals
Different external identity providers (IdP) have different ways of retrieving the ID token:
I had a discussion with the MSAL team today and proposed 2 possible callback interfaces:
|
Mitigation: Extend task duration to 60 minutes
Warning: This mitigation doesn't work with Azure CLI 2.59.0. See #28708 (comment).
An ID token lasts for 5 minutes on GitHub Actions and 10 minutes on Azure DevOps, but an access token lasts for 60 minutes. When you run After the ID token expires, if acquiring an access token for other scopes, such as
as currently there is no access token for that scope in the token cache, Azure CLI/MSAL will try to get an access token with the ID token. However, as the ID token has expired, the command fails with So, the mitigation is pretty straightforward: Acquire all access tokens before the ID token expires. You have to know which scopes are used in your pipeline task and call For example:
Warning: Even though GitHub Actions can mask the access token as
You MUST specify Then subsequent commands using these scopes will use the access tokens saved in the token cache, so they won't fail after the ID token expires, but they will still fail after the access tokens expire (60 minutes). |
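A minimal bash sketch of this mitigation, assuming the task has just logged in with az login and a federated token, and that you already know every scope the task will use. The helper name prefetch_access_tokens is hypothetical, and the scopes in the usage line are only examples. Per the warning above, this does not help with Azure CLI 2.59.0, and the cached tokens still expire after about 60 minutes.

```shell
# Hypothetical helper: acquire an access token for each scope while the
# federated ID token is still valid, so later commands can be served from
# the Azure CLI token cache instead of needing the (by then expired) ID token.
prefetch_access_tokens() {
  local scope failed=0
  for scope in "$@"; do
    # --output none discards the token so it never reaches the job log.
    if ! az account get-access-token --scope "$scope" --output none; then
      echo "Failed to pre-acquire a token for scope: $scope" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, immediately after az login: prefetch_access_tokens "https://management.core.windows.net//.default" "https://storage.azure.com/.default". The function returns non-zero if any scope could not be acquired, so a script running with set -e stops right away instead of failing later with AADSTS700024.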
I tried the provided mitigation, but the issue persists. Maybe I'm doing something wrong?
After that I added step to mitigate the issue:
But after ~10 minutes I'm still getting:
Did I miss something? I use https://www.npmjs.com/package/@azure/service-bus |
Thanks for the mitigation @jiasli. However, I don't think I'm hitting the issue where Azure CLI tries to acquire an access token for a different audience after the ID token has expired. I'm fairly confident that the
The general flow is:
The time it takes to swap slots varies greatly; however, more than 5 minutes have always elapsed by the time it's done. What is strange is that stopping the slot sometimes works and sometimes doesn't, depending on how much time has passed since we ran To me, it sounds like the access token expires "quicker" than before. Edit: I checked across many workflow runs, and it looks like the access token expires after 10 minutes. |
@Kapsztajn, I can successfully get an access token for
Decoded claims:
I am not entirely sure why this line is printed:
The Azure Service Bus client library for JavaScript SDK also didn't fail with |
@mderriey, this seems odd as all these operations are indeed ARM operations. Could you check the actual expiration time of the access token issued for ARM?
|
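To check the actual lifetime of an access token independently of expiresOn, one option is to decode its iat and exp claims. This bash sketch assumes GNU coreutils (base64 -d) and performs no signature validation, so it is for inspection only; the function names are hypothetical.

```shell
# Decode one base64url JWT segment to stdout. JWT segments omit the '='
# padding, so it is restored before calling GNU base64 -d.
b64url_decode() {
  local s
  s=$(printf '%s' "$1" | tr '_-' '/+')
  case $(( ${#s} % 4 )) in
    2) s="${s}==" ;;
    3) s="${s}=" ;;
  esac
  printf '%s' "$s" | base64 -d
}

# Print a numeric claim (e.g. iat, nbf, exp) from a JWT payload.
# No signature check is done: inspection only.
jwt_claim() {
  local payload
  payload=$(b64url_decode "$(printf '%s' "$1" | cut -d. -f2)") || return 1
  printf '%s' "$payload" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# Print the token lifetime in seconds (exp - iat); fail if a claim is missing.
jwt_lifetime() {
  local iat exp
  iat=$(jwt_claim "$1" iat)
  exp=$(jwt_claim "$1" exp)
  [ -n "$iat" ] && [ -n "$exp" ] || return 1
  echo $(( exp - iat ))
}
```

For example, jwt_lifetime "$(az account get-access-token --resource-type arm --query accessToken --output tsv)" prints the ARM token's lifetime in seconds (about 3600 for a 60-minute token) without echoing the token itself.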
Hi @Kapsztajn, the suggested mitigation did not work for me either. It was able to fetch a token with a reasonable expiry, but I saw the same error once the OIDC token expired after 5 minutes. I propose a workaround: fetch the OIDC token every 4 minutes to avoid the expiry. I got this working by inserting the following step in my workflow just before the step where the token expiry issue was popping up:
Could you try this out and see if this works for you as well? |
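The step itself isn't shown here. As an illustration of the same idea only, a bash sketch for GitHub Actions that requests a fresh ID token from the runner's OIDC endpoint and passes it to az login again. It assumes the job has permissions: id-token: write (which makes GitHub set ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN), uses the audience api://AzureADTokenExchange, and reads the client and tenant IDs from AZURE_CLIENT_ID and AZURE_TENANT_ID, which are assumed variable names. Both function names are hypothetical.

```shell
# Hypothetical: fetch a fresh GitHub Actions ID token and log in to Azure
# CLI again with it. Needs `permissions: id-token: write` on the job;
# AZURE_CLIENT_ID / AZURE_TENANT_ID are assumed variable names.
refresh_federated_login() {
  local response token
  response=$(curl -sSf \
    -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" \
    "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=api://AzureADTokenExchange") || return 1
  # Extract the "value" field from the JSON response without requiring jq.
  token=$(printf '%s' "$response" |
    sed -n 's/.*"value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  [ -n "$token" ] || { echo "No ID token in response" >&2; return 1; }
  az login --service-principal -u "$AZURE_CLIENT_ID" \
    --tenant "$AZURE_TENANT_ID" \
    --federated-token "$token" \
    --allow-no-subscriptions --output none
}

# Re-run the login every INTERVAL seconds (default 240, i.e. 4 minutes,
# shorter than the 5-minute GitHub ID token lifetime) until killed.
keep_federated_login_fresh() {
  local interval="${1:-240}"
  while true; do
    refresh_federated_login || echo "Refresh failed; retrying next cycle" >&2
    sleep "$interval"
  done
}
```

For example, run keep_federated_login_fresh & before the long-running command, save $!, and kill that process afterwards. As noted later in the thread, re-running az login in the background while other az commands are running is not free of race conditions.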
Good suggestion @jiasli, thanks. Here's what I ran:
steps:
  - name: Login to Azure
    uses: azure/login@v2
    with:
      client-id: ${{ env.oidcAppRegistrationClientId }}
      tenant-id: ${{ env.azureTenantId }}
      allow-no-subscriptions: true
      enable-AzPSSession: true
  - name: Check token expiry
    shell: bash
    run: |
      echo "Current date: $(date '+%Y-%m-%dT%H:%M:%S')"
      echo "Token expiration: $(az account get-access-token --resource-type arm --query expiresOn --output tsv --debug)"
      echo "Token #2 expiration: $(az account get-access-token --resource-type arm --query expiresOn --output tsv --debug)"
And the output (debug output omitted):
Current date: 2024-04-11T06:57:14
Token expiration: 2024-04-11 07:57:14.000000
Token #2 expiration: 2024-04-11 07:57:14.000000
So the token is valid for 1 hour. And both calls to
I'm not sure what happens, then... Thanks again, let me know if I can perform some more testing if anything comes to mind. |
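Rather than comparing the printed timestamps by eye, a small bash helper can compute the remaining minutes. This sketch assumes GNU date (which parses both the 2024-04-11T06:57:14 and the 2024-04-11 07:57:14.000000 forms) and that expiresOn and the current time are in the same time zone, as in the output above; minutes_until is a hypothetical name.

```shell
# Hypothetical helper: whole minutes from NOW until TARGET, where TARGET is
# an expiresOn value and NOW defaults to the current local time.
# Assumes GNU date. Both values are parsed in the same zone (-u), so only
# their difference matters.
minutes_until() {
  local target="$1" now="${2:-$(date '+%Y-%m-%d %H:%M:%S')}" t n
  t=$(date -u -d "$target" +%s) || return 1
  n=$(date -u -d "$now" +%s) || return 1
  echo $(( (t - n) / 60 ))
}
```

For example: minutes_until "$(az account get-access-token --resource-type arm --query expiresOn --output tsv)".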
Apologies for the confusion. As I tested today, the mitigation I provided in #28708 (comment) stopped working with Azure CLI 2.59.0 because of an MSAL regression introduced in 1.27.0 (AzureAD/microsoft-authentication-extensions-for-python#127, AzureAD/microsoft-authentication-library-for-python#644), which Azure CLI 2.59.0 adopted (#28556). This regression makes MSAL's I will work with MSAL on this issue with high priority.
Workaround
For now, please keep using a service principal secret for authentication to get unblocked: https://github.com/marketplace/actions/azure-login#login-with-a-service-principal-secret |
My question is why this has popped up as an issue recently. We've had pipelines run for well over 20 minutes before and never seen this. But within the last week, it seems any workflow using Azure CLI with OIDC federated auth is experiencing this issue. |
@iamrk04
I had to add |
@smokedlinq, In my case, it's due to a new version of the GitHub hosted runner image for The image went from You can see which image your run uses in the "Set up job" step at the very top. |
@mderriey I assumed something like that, I was more referring to how that broke inside of |
@TomWildenhain, thanks for the information. If you used service principal secret to create the service connection, I don't think the federated identity credential added to the app is actually used. |
@jiasli Is it possible to give a realistic timeline for a fix? I am wondering whether it makes sense to ask for a rollback of the CLI version contained in actions/runner-images, which is used by both GitHub Actions and Azure DevOps. |
We are seeing the same issue related to moving away from service principal secrets. We are looking into adding logic for all Az CLI calls using the ARM token to ensure it gets refreshed (but not as a background process) to get the OIDC token from |
If you can help resolve this, it would be appreciated. |
I have exactly the same use case as @TomWildenhain. Is there a way to make the token validity period customizable? We can't use a service principal secret, as that's discouraged by the credential-free best practices. Even a workaround would be much appreciated. |
Have the same issue for our long-running tasks:
|
@panpanwa we are not using GitHub Actions. We're using Azure DevOps YAML pipelines, e.g.
- task: AzureCLI@2
  displayName: Run load profile
  inputs:
    azureSubscription: $(federatedCredConnection)
    scriptType: ps
    scriptLocation: scriptPath
    scriptPath: $(Pipeline.Workspace)/test.ps1
|
@panpanwa this is the stopgap solution, shared by a colleague, that we can implement in our AzureCLI task:
# Background job that periodically requests a fresh OIDC token for the
# pipeline's service connection and re-runs az login with it.
# Note: $env:SYSTEM_ACCESSTOKEN is only set if the task maps it explicitly,
# i.e. SYSTEM_ACCESSTOKEN: $(System.AccessToken) under the task's env:.
Start-Job -Name 'RefreshOidcToken' -ScriptBlock {
    do {
        # Derive the service connection ID from the ENDPOINT_DATA_<id>_* variables.
        Get-ChildItem -Path Env: -Recurse -Include ENDPOINT_DATA_* `
            | Select-Object -First 1 -ExpandProperty Name `
            | ForEach-Object { $_.Split("_")[2] } `
            | Set-Variable serviceConnectionId

        $oidcRequestUrl = "${env:SYSTEM_TEAMFOUNDATIONCOLLECTIONURI}${env:SYSTEM_TEAMPROJECTID}/_apis/distributedtask/hubs/build/plans/${env:SYSTEM_PLANID}/jobs/${env:SYSTEM_JOBID}/oidctoken?api-version=7.1-preview.1&serviceConnectionId=${serviceConnectionId}"
        Invoke-RestMethod -Headers @{
            Authorization  = "Bearer $env:SYSTEM_ACCESSTOKEN"
            'Content-Type' = 'application/json'
        } -Uri "${oidcRequestUrl}" -Method Post | Set-Variable oidcTokenResponse
        $oidcToken = $oidcTokenResponse.oidcToken

        if (!$oidcToken) {
            Write-Warning "OIDC token could not be acquired. Retrying..."
            Start-Sleep -Seconds 30
            continue
        }

        # Log in again with the fresh OIDC token, reusing the current identity and tenant.
        az account show -o json | ConvertFrom-Json | Set-Variable account
        az login --service-principal -u $account.user.name --tenant $account.tenantId --allow-no-subscriptions --federated-token $oidcToken | Out-Null

        Start-Sleep -Seconds 480 # 8 minutes, below the 10-minute OIDC token lifetime
    } while ($true)
} | Tee-Object -Variable refreshOidcTokenJob `
  | Select-Object -ExcludeProperty Command `
  | Write-Host -ForegroundColor DarkMagenta

# do long-running work

Receive-Job $refreshOidcTokenJob
Stop-Job -Job $refreshOidcTokenJob
Remove-Job -Job $refreshOidcTokenJob
|
Also this seems to be in preview for v1.12.0-beta.2 |
This might work in 99% of the cases but is not completely reliable; beware of race conditions. |
Azure DevOps's document now also explains https://learn.microsoft.com/en-us/azure/devops/pipelines/release/troubleshoot-workload-identity
|
Do we have any updates on the issue? A lot of our ADO pipelines are intermittently failing, and we have been asked to move away from service principals to be credential-free. The linked PR, #28778, is still in draft state. |
Thanks @jiasli! This works for my use case! |
I got the same error for runs lasting between 10 minutes and 1 hour, as described in the Microsoft docs. The docs suggest accessing the storage account at the beginning, but during terraform apply we cannot control that ourselves. I'm using terraform apply; the pipeline runs for around 10 minutes and then gives the error below:
|
Just to be clear - this isn't just a problem with github runners, happens on AzDO pipelines as well. Sprinkling |
I've got the same issue using a Terraform deployment and a federated identity via a user-assigned identity.
|
I stand corrected: we still see intermittent failures. Basically, federated identity is unusable at this stage if you have slightly more complex build scenarios that take more than 10 minutes to deploy. We have converted back to service principal + secret. |
Similar stance to @chriswue: we converted back to the service principal + secret route. As much, and as aggressively, as Microsoft is pushing for the use of OIDC, it's just not been tested properly and is not fit for purpose on a lot of production workloads, especially Terraform, but also anything else Azure CLI related that takes a few minutes to run. |
Perhaps interesting as a workaround: keepAzSessionActive: true https://github.com/microsoft/azure-pipelines-tasks/blob/master/Tasks/AzureCLIV2/azureclitask.ts
|
Acquiring an access token with an expired OIDC token fails with:
As the error indicates, the OIDC token is only valid for 10 minutes. After it is passed to az login via --federated-token, Azure CLI cannot get a new OIDC token once it expires. This is the designed v1 behavior of OIDC token support (#19853).
However, as Azure DevOps task AzureCLI@2 (microsoft/azure-pipelines-tasks#17633) and GitHub Action azure/login@v2 (Azure/login#147) now support OIDC token authentication, and workload identity federation is the recommended approach, this limitation is becoming more widespread.
Possible solutions
References