patch time sync plugin to handle stack too deep errors #993

Open
3 tasks
zanete opened this issue Aug 27, 2024 · 12 comments

Comments

@zanete

zanete commented Aug 27, 2024

Why: Sub-issue of #949. The framework is built around very granular time units (seconds), but real-world measurements often come at larger time intervals that IF is unable to handle, causing overflow errors.
What: Find an MVP solution for the time being that enables working with larger time intervals. We will return to this problem at a later date to implement a more robust solution.

Context

We often have patchy data, we don't have granular data, or we are only interested in values at e.g. monthly time resolution.

We don't work well in these situations, even though they are very common - we're really just set up for situations where we can load in granular time series.

We often see people (and this is what I ended up doing in v1 of the GSF site manifest) have a single timestep with a duration of 1 month or one year in seconds and then execute a single, large pipeline that eventually yields SCI. This is because it's very fiddly and repetitive to compartmentalise into individual components and often we can't / don't really need to access granular temporal data anyway.
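
For reference, here is a minimal sketch of how "one month or one year in seconds" durations are commonly derived. It assumes a mean Julian year of 365.25 days, which is an assumption made here and not something IF prescribes; the constant names are illustrative.

```python
# Assumption: a "year" is a mean Julian year (365.25 days) and a "month"
# is one twelfth of it. Real calendar months vary between 28 and 31 days.
SECONDS_PER_DAY = 24 * 60 * 60

YEAR_S = int(365.25 * SECONDS_PER_DAY)
MONTH_S = YEAR_S // 12

print(YEAR_S)   # 31557600
print(MONTH_S)  # 2629800
```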

If I just want an overall SCI value for my entire application, I'm not going to spend a week trying to source granular data, or even worse naively chunking up the single value that fits my needs into time blocks just because that's what a manifest favours.

As a user in this situation it's much easier to interpret my manifest when I've just constructed a pipeline, can follow through the logic and read off my single value as compared to interpreting aggregated data, which is presented in quite a non-aesthetic way.

A separate time issue is the current behaviour of time sync. It works as a plugin, which means we have to invoke it in every pipeline in the tree. But we encountered some problematic side effects associated with making it a “global” feature. On balance we decided to stick with the status quo for now, but we need to revisit the way we handle time, perhaps starting from a blank page.

The current time-sync plugin often errors out with “RangeError: Maximum call stack size exceeded” when we’re using time units greater than minutes. For example, time-sync applied to monthly data, even when we just want to create time series at a resolution of 10 or even 100 hours, throws with this error. Maybe this is because of our “base” resolution being fixed at 1s, which creates too much data when we’re trying to operate over monthly data. Either way, this needs fixing or we’re only applicable to observations over seconds -> minute ranges, not week -> month -> year ranges.
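
To put rough numbers on that hypothesis, the sketch below counts the rows involved when one monthly observation is expanded to a 1 s base resolution and then resampled to 100 hours. It assumes time-sync materialises every base-resolution row before aggregating, which is the guess made above and not confirmed behaviour; `rows_at_resolution` is a hypothetical helper, not part of IF.

```python
MONTH_S = 2_629_800          # one mean month in seconds
HUNDRED_HOURS_S = 100 * 3600  # desired output resolution in seconds


def rows_at_resolution(duration_s: int, resolution_s: int) -> int:
    """Number of rows needed to cover duration_s (ceiling division)."""
    return -(-duration_s // resolution_s)


# Rows created if the observation is first expanded to 1 s steps.
base_rows = rows_at_resolution(MONTH_S, 1)
# Rows actually wanted at the 100-hour resolution.
output_rows = rows_at_resolution(MONTH_S, HUNDRED_HOURS_S)

print(base_rows)    # 2629800
print(output_rows)  # 8
```

Under these assumptions, about 2.6 million intermediate rows are created to produce 8 output rows for a single monthly observation.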

Scope of work:

  • @jmcook1186 Provide a manifest file that throws the errors
  • Discuss options for a quick n dirty solution
  • Implement solution
@zanete zanete added this to IF Aug 27, 2024
@zanete zanete converted this from a draft issue Aug 27, 2024
@zanete zanete added the core-only label (this issue is reserved for the IF core team only) Aug 27, 2024
@zanete zanete changed the title from "patch time sync plugin to handle overflow" to "patch time sync plugin to handle stack too deep errors" Aug 27, 2024
@zanete
Author

zanete commented Aug 27, 2024

There is also a community PR open that could fix this

@zanete zanete moved this from In Design to Ready in IF Sep 2, 2024
@zanete zanete mentioned this issue Sep 16, 2024
8 tasks
@zanete zanete added this to the IF Lifecycle Assessment milestone Sep 17, 2024
@narekhovhannisyan narekhovhannisyan moved this from Ready to In Progress in IF Sep 18, 2024
@zanete
Author

zanete commented Sep 19, 2024

@jmcook1186 any chance you could point us to the manifest that was failing?

@zanete zanete moved this from In Progress to Blocked in IF Sep 26, 2024
@jmcook1186
Contributor

@zanete @narekhovhannisyan

The manifest below yields the following error:

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xcb8196 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
 2: 0x1033090 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 3: 0x1033377 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0x12525c5  [node]
 5: 0x1252a9e  [node]
 6: 0x1267cc6 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
 7: 0x12687e9  [node]
 8: 0x1268df8  [node]
 9: 0x19b8811  [node]
Aborted (core dumped)

However, I really had to push the time-sync parameters to an extreme set of values to cause this!

name: nesting
description: a manifest that includes nested child components
tags:
  kind: web
  complexity: moderate
  category: on-premise
aggregation:
  metrics:
    - carbon
  type: "both"
initialize:
  plugins:
    "interpolate":
      method: Interpolation
      path: "builtin"
      config:
        method: linear
        x: [0, 10, 50, 100]
        y: [0.12, 0.32, 0.75, 1.02]
        input-parameter: "cpu/utilization"
        output-parameter: "cpu-factor"
      parameter-metadata:
        inputs:
          cpu/utilization:
            unit: percentage
            description: refers to CPU utilization.
            aggregation-method:
              time: avg
              component: sum
        outputs:
          cpu-factor:
            unit: kWh
            description: result of interpolate
            aggregation-method:
              time: avg
              component: avg
    "cpu-factor-to-wattage":
      method: Multiply
      path: builtin
      config:
        input-parameters: ["cpu-factor", "cpu/thermal-design-power"]
        output-parameter: "cpu-wattage"
      parameter-metadata:
        inputs:
          cpu-factor:
            unit: kWh
            description: result of interpolate
            aggregation-method:
              time: avg
              component: avg
          cpu/thermal-design-power:
            unit: kWh
            description: thermal design power for a processor
            aggregation-method:
              time: avg
              component: avg
        outputs:
          cpu-wattage:
            unit: kWh
            description: the energy used by the CPU
            aggregation-method:
              time: sum
              component: sum
    "wattage-times-duration":
      method: Multiply
      path: builtin
      config:
        input-parameters: ["cpu-wattage", "duration"]
        output-parameter: "cpu-wattage-times-duration"
    "wattage-to-energy-kwh":
      method: Divide
      path: "builtin"
      config:
        numerator: cpu-wattage-times-duration
        denominator: 3600000
        output: cpu-energy-raw
      parameter-metadata:
        inputs:
          cpu-wattage-times-duration:
            unit: kWh
            description: CPU wattage multiplied by duration
            aggregation-method:
              time: sum
              component: sum
        outputs:
          cpu-energy-raw:
            unit: kWh
            description: Raw energy used by CPU in kWh
            aggregation-method:
              time: sum
              component: sum
    "calculate-vcpu-ratio":
      method: Divide
      path: "builtin"
      config:
        numerator: vcpus-total
        denominator: vcpus-allocated
        output: vcpu-ratio
      parameter-metadata:
        outputs:
          vcpu-ratio:
            unit: none
            description: Ratio of vCPUs
            aggregation-method:
              time: copy
              component: copy
    "correct-cpu-energy-for-vcpu-ratio":
      method: Divide
      path: "builtin"
      config:
        numerator: cpu-energy-raw
        denominator: vcpu-ratio
        output: cpu-energy-kwh
    sci-embodied:
      path: "builtin"
      method: SciEmbodied
    "operational-carbon":
      method: Multiply
      path: builtin
      config:
        input-parameters: ["cpu-energy-kwh", "grid/carbon-intensity"]
        output-parameter: "carbon-operational"
      parameter-metadata:
        inputs:
          cpu-energy-kwh:
            unit: kWh
            description: Corrected CPU energy in kWh
            aggregation-method:
              time: sum
              component: sum
          grid/carbon-intensity:
            unit: gCO2eq/kWh
            description: Carbon intensity for the grid
            aggregation-method:
              time: avg
              component: avg
        outputs:
          carbon-operational:
            unit: gCO2eq
            description: Operational carbon footprint
            aggregation-method:
              time: sum
              component: sum
    sci:
      path: "builtin"
      method: Sci
      config:
        functional-unit: "requests"
      parameter-metadata:
        inputs:
          requests:
            unit: none
            description: expressed the final SCI value
            aggregation-method:
              time: sum
              component: sum
    "sum-carbon":
      path: "builtin"
      method: Sum
      config:
        input-parameters:
          - carbon-operational
          - embodied-carbon
        output-parameter: carbon
      parameter-metadata:
        inputs:
          carbon-operational:
            description: Operational carbon footprint
            unit: gCO2eq
            aggregation-method:
              time: sum
              component: sum
          embodied-carbon:
            description: Embodied carbon footprint
            unit: gCO2eq
            aggregation-method:
              time: sum
              component: sum
        outputs:
          carbon:
            description: Total carbon footprint
            unit: gCO2eq
            aggregation-method:
              time: sum
              component: sum
    time-sync:
      method: TimeSync
      path: "builtin"
      config:
        start-time: '2023-01-01T00:00:00.000Z'
        end-time: '2024-01-01T00:00:00.000Z'
        interval: 2
        allow-padding: true
      parameter-metadata:
        inputs:
          timestamp:
            unit: RFC3339
            description: refers to the time of occurrence of the input
            aggregation-method:
              time: none
              component: none
          duration:
            unit: seconds
            description: refers to the duration of the input
            aggregation-method:
              time: sum
              component: sum
          cloud/instance-type:
            unit: none
            description: type of Cloud Instance name used in the cloud provider APIs
            aggregation-method:
              time: copy
              component: copy
          cloud/region:
            unit: none
            description: region cloud instance
            aggregation-method:
              time: copy
              component: copy
          time-reserved:
            unit: seconds
            description: time reserved for a component
            aggregation-method:
              time: avg
              component: avg
          network/energy:
            description: "Energy consumed by the Network of the component"
            unit: "kWh"
            aggregation-method:
              time: sum
              component: sum

tree:
  children:
    child-0:
      defaults:
        cpu/thermal-design-power: 100
        grid/carbon-intensity: 800
        device/emissions-embodied: 1533.120 # gCO2eq
        time-reserved: 3600 # 1hr in seconds
        device/expected-lifespan: 94608000 # 3 years in seconds
        vcpus-allocated: 1
        vcpus-total: 8
      pipeline:
        compute:
          - interpolate
          - cpu-factor-to-wattage
          - wattage-times-duration
          - wattage-to-energy-kwh
          - calculate-vcpu-ratio
          - correct-cpu-energy-for-vcpu-ratio
          - sci-embodied
          - operational-carbon
          - sum-carbon
          - time-sync
          - sci
      inputs:
        - timestamp: "2023-01-01T00:00:00.000Z"
          cloud/instance-type: A1
          cloud/region: uk-west
          duration: 2629800
          cpu/utilization: 50
          network/energy: 0.000001
          requests: 50
        - timestamp: "2023-02-01T00:00:00.000Z"
          duration: 2629800
          cpu/utilization: 20
          cloud/instance-type: A1
          cloud/region: uk-west
          network/energy: 0.000001
          requests: 60
        - timestamp: "2023-03-01T00:00:00.000Z"
          duration: 2629800
          cpu/utilization: 15
          cloud/instance-type: A1
          cloud/region: uk-west
          network/energy: 0.000001
          requests: 70
        - timestamp: "2023-04-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-05-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-06-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-07-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
    child-1:
      defaults:
        cpu/thermal-design-power: 100
        grid/carbon-intensity: 800
        device/emissions-embodied: 1533.120 # gCO2eq
        time-reserved: 3600 # 1hr in seconds
        device/expected-lifespan: 94608000 # 3 years in seconds
        vcpus-allocated: 1
        vcpus-total: 8
      pipeline:
        compute:
          - interpolate
          - cpu-factor-to-wattage
          - wattage-times-duration
          - wattage-to-energy-kwh
          - calculate-vcpu-ratio
          - correct-cpu-energy-for-vcpu-ratio
          - sci-embodied
          - operational-carbon
          - sum-carbon
          - time-sync
          - sci
      inputs:
        - timestamp: "2023-01-01T00:00:00.000Z"
          cloud/instance-type: A1
          cloud/region: uk-west
          duration: 2629800
          cpu/utilization: 50
          network/energy: 0.000001
          requests: 50
        - timestamp: "2023-02-01T00:00:00.000Z"
          duration: 2629800
          cpu/utilization: 20
          cloud/instance-type: A1
          cloud/region: uk-west
          network/energy: 0.000001
          requests: 60
        - timestamp: "2023-03-01T00:00:00.000Z"
          duration: 2629800
          cpu/utilization: 15
          cloud/instance-type: A1
          cloud/region: uk-west
          network/energy: 0.000001
          requests: 70
        - timestamp: "2023-04-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-05-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-06-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-07-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
    child-2:
      children:
        child-2-0:
          defaults:
            cpu/thermal-design-power: 100
            grid/carbon-intensity: 800
            device/emissions-embodied: 1533.120 # gCO2eq
            time-reserved: 3600 # 1hr in seconds
            device/expected-lifespan: 94608000 # 3 years in seconds
            vcpus-allocated: 1
            vcpus-total: 8
          pipeline:
            compute:
              - interpolate
              - cpu-factor-to-wattage
              - wattage-times-duration
              - wattage-to-energy-kwh
              - calculate-vcpu-ratio
              - correct-cpu-energy-for-vcpu-ratio
              - sci-embodied
              - operational-carbon
              - sum-carbon
              - time-sync
              - sci
          inputs:
            - timestamp: "2023-01-01T00:00:00.000Z"
              cloud/instance-type: A1
              cloud/region: uk-west
              duration: 2629800
              cpu/utilization: 50
              network/energy: 0.000001
              requests: 50
            - timestamp: "2023-02-01T00:00:00.000Z"
              duration: 2629800
              cpu/utilization: 20
              cloud/instance-type: A1
              cloud/region: uk-west
              network/energy: 0.000001
              requests: 60
            - timestamp: "2023-03-01T00:00:00.000Z"
              duration: 2629800
              cpu/utilization: 15
              cloud/instance-type: A1
              cloud/region: uk-west
              network/energy: 0.000001
              requests: 70
            - timestamp: "2023-04-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-05-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-06-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-07-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
        child-2-1:
          defaults:
            cpu/thermal-design-power: 100
            grid/carbon-intensity: 800
            device/emissions-embodied: 1533.120 # gCO2eq
            time-reserved: 3600 # 1hr in seconds
            device/expected-lifespan: 94608000 # 3 years in seconds
            vcpus-allocated: 1
            vcpus-total: 8
          pipeline:
            compute:
              - interpolate
              - cpu-factor-to-wattage
              - wattage-times-duration
              - wattage-to-energy-kwh
              - calculate-vcpu-ratio
              - correct-cpu-energy-for-vcpu-ratio
              - sci-embodied
              - operational-carbon
              - sum-carbon
              - time-sync
              - sci
          inputs:
            - timestamp: "2023-01-01T00:00:00.000Z"
              cloud/instance-type: A1
              cloud/region: uk-west
              duration: 2629800
              cpu/utilization: 50
              network/energy: 0.000001
              requests: 50
            - timestamp: "2023-02-01T00:00:00.000Z"
              duration: 2629800
              cpu/utilization: 20
              cloud/instance-type: A1
              cloud/region: uk-west
              network/energy: 0.000001
              requests: 60
            - timestamp: "2023-03-01T00:00:00.000Z"
              duration: 2629800
              cpu/utilization: 15
              cloud/instance-type: A1
              cloud/region: uk-west
              network/energy: 0.000001
              requests: 70
            - timestamp: "2023-04-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-05-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-06-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-07-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
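
For a sense of scale, the sketch below works out how many rows the time-sync settings in this manifest imply. It assumes each leaf component that lists time-sync in its pipeline is expanded to a 1 s base resolution over the full start-time to end-time window and then aggregated to the configured interval. The `tree` dict is a hand-written skeleton of the manifest's tree with an abbreviated pipeline, and `count_time_sync_leaves` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Skeleton of the manifest's tree: only the nesting and the presence of
# time-sync in each pipeline matter here (pipeline abbreviated).
PIPELINE = {"compute": ["sum-carbon", "time-sync", "sci"]}
tree = {"children": {
    "child-0": {"pipeline": PIPELINE},
    "child-1": {"pipeline": PIPELINE},
    "child-2": {"children": {
        "child-2-0": {"pipeline": PIPELINE},
        "child-2-1": {"pipeline": PIPELINE},
    }},
}}


def count_time_sync_leaves(node: dict) -> int:
    """Count leaf components whose compute pipeline includes time-sync."""
    children = node.get("children")
    if children:
        return sum(count_time_sync_leaves(c) for c in children.values())
    return int("time-sync" in node.get("pipeline", {}).get("compute", []))


# Values copied from the time-sync config in the manifest.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, tzinfo=timezone.utc)
interval_s = 2

window_s = int((end - start).total_seconds())
leaves = count_time_sync_leaves(tree)
base_rows = window_s * leaves                    # rows at the assumed 1 s base
output_rows = (window_s // interval_s) * leaves  # rows at interval: 2

print(window_s)     # 31536000
print(leaves)       # 4
print(base_rows)    # 126144000
print(output_rows)  # 63072000
```

Tens of millions of output rows, each carrying every parameter in the pipeline, would be consistent with the heap exhaustion shown in the stack trace.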

@zanete zanete moved this from Blocked to Ready in IF Sep 27, 2024
@zanete zanete removed the core-only label (this issue is reserved for the IF core team only) Sep 30, 2024
@zanete
Author

zanete commented Oct 10, 2024

@jmcook1186 please attach the new manifest that throws this error

@zanete
Author

zanete commented Oct 10, 2024

@narekhovhannisyan your suggestion - to break down the file into logical chunks

@zanete zanete removed this from the IF 1.0 milestone Oct 21, 2024
@zanete
Author

zanete commented Oct 21, 2024

Put it aside given #1057

@zanete zanete closed this as not planned Oct 21, 2024
@github-project-automation github-project-automation bot moved this from Ready to Done in IF Oct 21, 2024
@zanete zanete reopened this Nov 1, 2024
@zanete
Author

zanete commented Nov 1, 2024

as #1057 is not going to be implemented, this again becomes an issue to fix

@zanete zanete moved this from Done to Ready in IF Nov 1, 2024
@zanete
Author

zanete commented Nov 15, 2024

Will get back to this after finding out what the blocker was with the group-by issue

@zanete zanete added this to the IF 1.0 milestone Nov 18, 2024
@zanete
Author

zanete commented Nov 22, 2024

Status update: expecting a PR by early next week

@zanete
Author

zanete commented Dec 2, 2024

@jmcook1186 to confirm that @narekhovhannisyan's solution in Slack would work, then we can finalise this :) 🙏

@zanete
Author

zanete commented Dec 17, 2024

Status update: Will need a quick sync with Joseph in order to restart.

@narekhovhannisyan
Member

Joseph's comment from the Slack thread:

I think the most efficient resampling resolution can be found as the greatest common divisor (GCD) across the concatenation of arrays A, B and C,
where
A = all the unique values in [duration[N], duration[N+1], ...] (i.e. the periods under observation in the inputs array, each running from timestamp[N] to timestamp[N] + duration[N])
B = all the unique values in [(timestamp[N+1] - (timestamp[N] + duration[N])), (timestamp[N+2] - (timestamp[N+1] + duration[N+1])), ...] (i.e. the gaps in the time series)
C = the user-defined resampling interval from the time-sync config

(applied after trimming or extrapolating to the time-sync start/end)

In an ideal scenario, there is only one unique value in each of A and B, and in B it is 0. In this happy case, the problem reduces to the GCD of A and C (i.e. the value that divides both the existing time interval and the desired time interval).
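
As an illustration of the happy case, the sketch below applies it to numbers used elsewhere in this thread: monthly observations of 2,629,800 s with no gaps, resampled to 100 hours. It is a minimal sketch using Python's standard `math.gcd`; the function name is hypothetical and this is not IF's implementation.

```python
from math import gcd

MONTH_S = 2_629_800           # single unique duration in A
HUNDRED_HOURS_S = 100 * 3600  # resampling interval C


def happy_case_resolution(duration_s: int, interval_s: int) -> int:
    """Base resolution when every observation has the same duration
    and there are no gaps: the GCD of the duration and the interval."""
    return gcd(duration_s, interval_s)


step_s = happy_case_resolution(MONTH_S, HUNDRED_HOURS_S)

print(step_s)                     # 1800
print(MONTH_S // step_s)          # 1461
print(HUNDRED_HOURS_S // step_s)  # 200
```

With a 1800 s base step, one month becomes 1,461 intermediate rows instead of 2,629,800 at a 1 s base, and each 100-hour output row is built from exactly 200 of them.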

So for a simple worked example:
let's say a time series has timestamps separated by the following number of seconds:
[50, 100, 20, 50]
and the durations are [40, 80, 20, 40]
This leaves gaps of
[10, 20, 0, 10]
And we want to resample at an interval of 30 s
So we have
A = [20, 40, 80]
B = [10, 20]
C = [30]
We want to find the greatest common divisor of ABC
ABC = [10, 20, 30, 40, 80]
So the resampling interval should be 10
Just for experimentation purposes, we remove the element in the time series that yields a gap of 0, and get
ABC = [20, 30, 40, 80]
The resampling interval is still 10
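
The worked example above can be checked with a short sketch. The function name and its inputs (separations between consecutive timestamps, durations, and the target interval, all in whole seconds) are illustrative assumptions, not IF's implementation.

```python
from functools import reduce
from math import gcd


def resampling_resolution(separations, durations, interval):
    """Return (ABC, GCD of ABC) for a time series.

    separations[i] is the time from the start of observation i to the
    start of the next one; durations[i] is the length of observation i.
    """
    gaps = [s - d for s, d in zip(separations, durations)]
    if any(g < 0 for g in gaps):
        raise ValueError("observations overlap: a duration exceeds its separation")

    a = set(durations)               # A: unique observation periods
    b = {g for g in gaps if g > 0}   # B: unique non-zero gaps
    abc = sorted(a | b | {interval})  # C: the requested interval
    return abc, reduce(gcd, abc)


abc, step = resampling_resolution(
    separations=[50, 100, 20, 50],
    durations=[40, 80, 20, 40],
    interval=30,
)
print(abc)   # [10, 20, 30, 40, 80]
print(step)  # 10

# The second ABC array from the example gives the same resolution.
print(reduce(gcd, [20, 30, 40, 80]))  # 10
```

Because gcd(0, n) is n, zero-length gaps would not change the result even if they were left in B.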

Labels
None yet
Projects
Status: In Progress
3 participants