Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The produce-summaries script sometimes throws "maximum request timeout" #3771

Closed
trivikr opened this issue Jun 19, 2024 · 10 comments
Closed

Comments

@trivikr
Copy link
Member

trivikr commented Jun 19, 2024

Describe the issue

The produce-summaries script sometimes throws "maximum request timeout"

Steps to reproduce

View if you have access https://cloudlogging.app.goo.gl/fqXuDi35jwJpNoaK7

Observed behavior

The request has been terminated because it has reached the maximum request timeout. To change this limit, see https://cloud.google.com/run/docs/configuring/request-timeout

Expected behavior

No request timeout

Additional context

Possibly causing missing download stats #3697

@trivikr
Copy link
Member Author

trivikr commented Jun 25, 2024

This happens since produce_summaries is taking more than 3 minutes while being executed though cloud scheduler API

Logs, if you have access: https://cloudlogging.app.goo.gl/sxzThXp5egfCGZcD8

INFO 2024-06-25T11:00:00.388357577Z {"@type":"type.googleapis.com/google.cloud.scheduler.logging.AttemptStarted", "jobName":"projects/nodejs-org/locations/us-central1/jobs/produce_summaries", "scheduledTime":"2024-06-25T11:00:00.375875Z", "targetType":"HTTP", "url":"https://produce-summaries-kdtacnjogq-uc.a.run.app/"}
  {
    "insertId": "1t7extbfvjjkbu",
    "jsonPayload": {
      "targetType": "HTTP",
      "@type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptStarted",
      "url": "https://produce-summaries-kdtacnjogq-uc.a.run.app/",
      "scheduledTime": "2024-06-25T11:00:00.375875Z",
      "jobName": "projects/nodejs-org/locations/us-central1/jobs/produce_summaries"
    },
    "resource": {
      "type": "cloud_scheduler_job",
      "labels": {
        "job_id": "produce_summaries",
        "location": "us-central1",
        "project_id": "nodejs-org"
      }
    },
    "timestamp": "2024-06-25T11:00:00.388357577Z",
    "severity": "INFO",
    "logName": "projects/nodejs-org/logs/cloudscheduler.googleapis.com%2Fexecutions",
    "receiveTimestamp": "2024-06-25T11:00:00.388357577Z"
  }

ERROR 2024-06-25T11:03:00.436573558Z {"@type":"type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished", "debugInfo":"URL_TIMEOUT-TIMEOUT_WEB", "jobName":"projects/nodejs-org/locations/us-central1/jobs/produce_summaries", "status":"DEADLINE_EXCEEDED", "targetType":"HTTP", "url":"https://produce-summaries-kdtacnjogq-uc.a.run.app/"}
  {
    "insertId": "1qzm7chfx5f1d5",
    "jsonPayload": {
      "@type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished",
      "debugInfo": "URL_TIMEOUT-TIMEOUT_WEB",
      "url": "https://produce-summaries-kdtacnjogq-uc.a.run.app/",
      "jobName": "projects/nodejs-org/locations/us-central1/jobs/produce_summaries",
      "targetType": "HTTP",
      "status": "DEADLINE_EXCEEDED"
    },
    "httpRequest": {
      "status": 504
    },
    "resource": {
      "type": "cloud_scheduler_job",
      "labels": {
        "job_id": "produce_summaries",
        "project_id": "nodejs-org",
        "location": "us-central1"
      }
    },
    "timestamp": "2024-06-25T11:03:00.436573558Z",
    "severity": "ERROR",
    "logName": "projects/nodejs-org/logs/cloudscheduler.googleapis.com%2Fexecutions",
    "receiveTimestamp": "2024-06-25T11:03:00.436573558Z"
  }

An easy fix would be to increase the timeout.

@trivikr
Copy link
Member Author

trivikr commented Jun 26, 2024

An easy fix would be to increase the timeout.

A good timeout would be 10 20 mins (max allowed is 30 mins) to test successful execution in logs.

As per documentation, the command should be:

$ gcloud scheduler jobs update http produce_summaries --attempt-deadline=1200s --region=us-central1

@trivikr
Copy link
Member Author

trivikr commented Jun 26, 2024

To unblock download stats, the timeout needs to be increased to 20 mins. I've updated previous comment.


While looking into docs, I noticed that the request time latency has been hovering around 17 mins for some reason.
It started in the first week of May, aligned with download stats going missing.

Screenshot produce_summaries_request_time_latency

Logs, if you have permissions: https://console.cloud.google.com/run/detail/us-central1/produce-summaries/metrics?project=nodejs-org

@trivikr
Copy link
Member Author

trivikr commented Jun 30, 2024

I received "Cloud Invoker Role" in #3781 and verified that the summaries are produced if the script is run manually

curl -X POST https://produce-summaries-kdtacnjogq-uc.a.run.app -H "Authorization: Bearer $(gcloud auth print-identity-token)"

From the logs, it took ~10 mins to run

2024-06-30 08:24:53.539 PDT POST200116 B550.1 scurl/8.1.2 https://produce-summaries-kdtacnjogq-uc.a.run.app/ 
2024-06-30 08:24:53.564 PDT 20240629/
2024-06-30 08:34:05.364 PDT Upload complete

Confirmed that the summary for 20240629 is available at https://storage.googleapis.com/access-logs-summaries-nodejs/nodejs.org-access.log.20240629.json

@trivikr
Copy link
Member Author

trivikr commented Jul 1, 2024

Verified with another manual run that summaries are produced with UTC Date as updated in #3783

However, the attempt silently failed in around 10 minutes with no logs.

2024-06-30 17:02:43.458 PDT POST 200 116 B 594.1 s curl/8.1.2 https://produce-summaries-kdtacnjogq-uc.a.run.app/ 
2024-06-30 17:02:44.067 PDT > [email protected] produceSummaries
2024-06-30 17:02:44.067 PDT > node --trace-warnings summaries.js
2024-06-30 17:02:44.633 PDT Listening on port:  8080
2024-06-30 17:02:44.634 PDT Default STARTUP TCP probe succeeded after 1 attempt for container "producesummaries-1" on port 8080.
2024-06-30 17:02:44.654 PDT 20240630/

It's likely due to pre-set timeout being 10 minutes https://cloud.google.com/run/docs/configuring/request-timeout

@trivikr
Copy link
Member Author

trivikr commented Jul 1, 2024

I increased the timeout to 20 minutes, but unsure if that helped since request was executed within 10 mins.
I'll debug further after the option to pass date is provided in #3791

CLI to increase timeout (verified that it'll apply to all future versions)

gcloud run services update produce-summaries --timeout=20m

Logs for execution

2024-06-30 17:51:36.707 PDT POST 200 116 B 512.2 s curl/8.1.2 https://produce-summaries-kdtacnjogq-uc.a.run.app/ 
2024-06-30 17:51:36.738 PDT 20240630/
2024-06-30 18:00:10.919 PDT Upload complete

Summary for 20240630 is available at https://storage.googleapis.com/access-logs-summaries-nodejs/nodejs.org-access.log.20240630.json

@trivikr
Copy link
Member Author

trivikr commented Jul 2, 2024

The attempt for 20240701 took ~12 mins

2024-07-01 19:11:35.933 PDT POST 200 116 B 698.7 s curl/8.1.2 https://produce-summaries-kdtacnjogq-uc.a.run.app/ 
2024-07-01 19:11:36.982 PDT > [email protected] produceSummaries
2024-07-01 19:11:36.982 PDT > node --trace-warnings summaries.js
2024-07-01 19:11:37.752 PDT Listening on port:  8080
2024-07-01 19:11:37.753 PDT Default STARTUP TCP probe succeeded after 1 attempt for container "producesummaries-1" on port 8080.
2024-07-01 19:11:37.852 PDT 20240701/
2024-07-01 19:23:14.937 PDT Upload complete: nodejs.org-access.log.20240701.json

Summary: https://storage.googleapis.com/access-logs-summaries-nodejs/nodejs.org-access.log.20240701.json

@trivikr
Copy link
Member Author

trivikr commented Jul 2, 2024

As per scheduled job on 20240702 at 4 AM Pacific, although the scheduled job timed out, the summary was created.

Scheduler job logs, if you have access https://cloudlogging.app.goo.gl/JFmi5694JqyLhbXc7

ERROR 2024-07-02T11:03:00.771601438Z {"@type":"type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished", "debugInfo":"URL_TIMEOUT-TIMEOUT_WEB", "jobName":"projects/nodejs-org/locations/us-central1/jobs/produce_summaries", "status":"DEADLINE_EXCEEDED", "targetType":"HTTP", "url":"https://produce-summaries-kdtacnjogq-uc.a.run.app/"}
  {
    "insertId": "nv67v8f8asxe7",
    "jsonPayload": {
      "debugInfo": "URL_TIMEOUT-TIMEOUT_WEB",
      "url": "https://produce-summaries-kdtacnjogq-uc.a.run.app/",
      "status": "DEADLINE_EXCEEDED",
      "jobName": "projects/nodejs-org/locations/us-central1/jobs/produce_summaries",
      "@type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished",
      "targetType": "HTTP"
    },
    "httpRequest": {
      "status": 504
    },
    "resource": {
      "type": "cloud_scheduler_job",
      "labels": {
        "job_id": "produce_summaries",
        "location": "us-central1",
        "project_id": "nodejs-org"
      }
    },
    "timestamp": "2024-07-02T11:03:00.771601438Z",
    "severity": "ERROR",
    "logName": "projects/nodejs-org/logs/cloudscheduler.googleapis.com%2Fexecutions",
    "receiveTimestamp": "2024-07-02T11:03:00.771601438Z"
  }

However, as per the Cloud Run log the summary was created https://cloudlogging.app.goo.gl/Sk7g9kXw2j6pgj4Z7

INFO 2024-07-02T11:00:00.781423Z https://produce-summaries-kdtacnjogq-uc.a.run.app/
DEFAULT 2024-07-02T11:00:01.948839Z > [email protected] produceSummaries
DEFAULT 2024-07-02T11:00:01.948860Z > node --trace-warnings summaries.js
DEFAULT 2024-07-02T11:00:03.083742Z Listening on port: 8080
INFO 2024-07-02T11:00:03.084869Z Default STARTUP TCP probe succeeded after 1 attempt for container "producesummaries-1" on port 8080.
DEFAULT 2024-07-02T11:00:03.125523Z 20240701/
DEFAULT 2024-07-02T11:11:22.309444Z Upload complete: nodejs.org-access.log.20240701.json

It's likely that we don't need to increase the request timeout as mentioned in #3771 (comment), and some other updates in metrics scripts would have fixed it.

I'll revisit on 20240703 to check if the job succeeds.

@trivikr
Copy link
Member Author

trivikr commented Jul 5, 2024

@trivikr
Copy link
Member Author

trivikr commented Jul 14, 2024

The summaries have been created for 10+ days now without any failures.
Batch processing of logs in #3812 is expected to reduce the time needed further.

Closing.

@trivikr trivikr closed this as completed Jul 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant