Tune New Relic Apdex Score Parameters #4475

Closed
3 tasks
btylerburton opened this issue Sep 28, 2023 · 6 comments
Assignees
Labels
O&M Operations and maintenance tasks for the Data.gov platform

Comments

@btylerburton
Contributor

btylerburton commented Sep 28, 2023

User Story

In order to make better use of our monitoring and alerts, datagovteam wants to tune the Apdex score to ensure we are delivering an acceptable user experience.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN our logs' response times are being used to determine Apdex user experience score
    THEN we want to set realistic upper and lower bounds to prevent always/never being in alarm
    AND instead to know when performance degrades from what is expected.

Background

After speaking with NR support on 9.27.23, we determined that our Apdex score is not tuned effectively, resulting in our site being in a constant state of alert. We want to tune that metric to more accurately reflect normal performance for our site.

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

@btylerburton btylerburton added the O&M Operations and maintenance tasks for the Data.gov platform label Sep 28, 2023
@hkdctol hkdctol moved this to 📔 Product Backlog in data.gov team board Sep 28, 2023
@hkdctol hkdctol moved this from 📔 Product Backlog to 📟 Sprint Backlog [7] in data.gov team board Sep 28, 2023
@btylerburton btylerburton added O&M Operations and maintenance tasks for the Data.gov platform and removed O&M Operations and maintenance tasks for the Data.gov platform labels Oct 3, 2023
@Jin-Sun-tts Jin-Sun-tts self-assigned this Oct 4, 2023
@Jin-Sun-tts Jin-Sun-tts moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Oct 4, 2023
@Jin-Sun-tts Jin-Sun-tts removed their assignment Oct 10, 2023
@Jin-Sun-tts Jin-Sun-tts moved this from 🏗 In Progress [8] to 📟 Sprint Backlog [7] in data.gov team board Oct 10, 2023
@Jin-Sun-tts Jin-Sun-tts self-assigned this Oct 17, 2023
@Jin-Sun-tts Jin-Sun-tts moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Oct 17, 2023
@Jin-Sun-tts
Contributor

To reach a target Apdex score of 0.90, we need to set the threshold to the 80th-percentile response time:
Screenshot 2023-10-18 at 11 08 47 AM

Based on the following query, the value should be close to 0:
select percentile(duration, 80) from Transaction where appID=xxxx since 24 hours ago

Here are the current stats, with the Apdex threshold at its default of 0.5 seconds:
Screenshot 2023-10-18 at 12 13 37 PM
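
For reference, the link between the 80th percentile and an Apdex of 0.90 follows from the standard Apdex formula, (satisfied + tolerating / 2) / total, where requests at or under the threshold T are satisfied and those at or under 4T are tolerating. With T at the 80th percentile, 80% of requests are satisfied, so the score reaches 0.90 only if the remaining 20% all finish within 4T; otherwise it lands between 0.80 and 0.90. A minimal Python sketch of that arithmetic, assuming made-up response times (not data from this issue) and a nearest-rank percentile that may differ slightly from what NRQL's percentile() returns:

```python
import math


def apdex(durations, t):
    """Standard Apdex: satisfied <= t, tolerating <= 4t, frustrated otherwise."""
    if not durations:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for d in durations if d <= t)
    tolerating = sum(1 for d in durations if t < d <= 4 * t)
    return (satisfied + tolerating / 2) / len(durations)


def percentile(durations, p):
    """Nearest-rank percentile (an assumption; not New Relic's exact method)."""
    ordered = sorted(durations)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in seconds, for illustration only.
samples = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.5, 2.0, 9.0]

t = percentile(samples, 80)   # threshold taken from the 80th percentile
score = apdex(samples, t)     # 8 satisfied, 1 tolerating, 1 frustrated
print(f"T = {t}s, Apdex = {score:.2f}")
```

Here one slow outlier (9.0 s) exceeds 4T, which is why the score comes out below 0.90 even though T sits at the 80th percentile.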

@Jin-Sun-tts
Contributor

select percentile(duration, 80) from Transaction where entityGuid=xxxx since 30 days ago
1.781

Image

I also re-ran the above query this morning and got 2.063, so I set the threshold to that value; the Apdex should reach 0.88. I will monitor it over the next 24 hours.

@Jin-Sun-tts
Contributor

We may reach an Apdex score of 0.92 with a threshold of 3.087, based on the following query:
select apdex(duration, t: 3.087) from Transaction where entityGuid=xxxx since 30 days ago

I have set this threshold for now and will continue monitoring the dashboard and alert messages. Closing this issue.

@github-project-automation github-project-automation bot moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board Oct 20, 2023
@hkdctol hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board Oct 26, 2023
@hkdctol hkdctol reopened this Dec 15, 2023
@github-project-automation github-project-automation bot moved this from 🗄 Closed to 📟 Sprint Backlog [7] in data.gov team board Dec 15, 2023
@hkdctol
Contributor

hkdctol commented Dec 15, 2023

Reopening. As discussed on 12/15, we are still seeing too many alerts. We should raise the parameter again and then monitor alerts for the next few days to find the right level.

@Jin-Sun-tts Jin-Sun-tts moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Dec 26, 2023
@Jin-Sun-tts
Contributor

Apdex scores have been configured exclusively for the production environment, and they have been performing well thus far.

However, we've been receiving Apdex score-related warning messages from the staging environment. To address this, we have decided to establish the same threshold values for Apdex scores in the staging environment and closely monitor any warning emails.

Regarding the warning emails related to the 'error percentage,' we have set up alert conditions that were initially triggered at a 1% error rate. For now, we have adjusted this threshold to 5% to assess the frequency of these email alerts.

Furthermore, we will investigate ways to filter out errors such as 404s.
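
To illustrate why both the alert threshold and 404 filtering matter, here is a rough Python sketch of an error-percentage check. It assumes, for illustration only, that any status code of 400 or above counts as an error and that an ignored code still counts as a request but not as an error; the function name, the sample traffic, and the counting rule are hypothetical and are not New Relic's actual implementation.

```python
def error_percentage(status_codes, ignored=frozenset()):
    """Percent of requests that are errors (status >= 400), skipping ignored codes.

    Ignored codes stay in the denominator; they just stop counting as errors.
    """
    if not status_codes:
        return 0.0
    errors = sum(1 for c in status_codes if c >= 400 and c not in ignored)
    return 100 * errors / len(status_codes)


# Hypothetical traffic: 94 successes, 4 not-found responses, 2 server errors.
codes = [200] * 94 + [404] * 4 + [500] * 2

raw = error_percentage(codes)                      # 404s counted as errors
filtered = error_percentage(codes, ignored={404})  # 404s no longer counted

for threshold in (1, 5):
    print(
        f"threshold {threshold}%: "
        f"raw alerts={raw > threshold}, filtered alerts={filtered > threshold}"
    )
```

With this sample, the unfiltered rate trips both the 1% and 5% thresholds, while the filtered rate trips only the 1% threshold.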

@Jin-Sun-tts
Contributor

Attempted to exclude specific HTTP response codes from the server-side configuration, but the initial approach to add them to the ignore list did not yield the desired results. Currently, all log messages appear as plain text, and it seems we lack a mechanism to filter them based on HTTP response codes.

So we decided to implement a customized query or filter within the New Relic dashboard, as in #4234.

Closing this ticket: the warning emails from staging decreased after we set the threshold, and raising the error percentage on the alert condition led to fewer warning emails from New Relic.

@github-project-automation github-project-automation bot moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board Dec 28, 2023
@btylerburton btylerburton moved this from ✔ Done to 🗄 Closed in data.gov team board Jan 4, 2024
Projects
Archived in project
Development

No branches or pull requests

3 participants