Questions on CAMS experiments #20

Open
patel-zeel opened this issue Aug 30, 2024 · 2 comments
Labels
question Further information is requested

Comments


patel-zeel commented Aug 30, 2024

Hi authors and @wesselb!

It is great to see such a powerful foundation model for air quality. Thank you for making the codebase available. I had a few questions about the CAMS experiments.

  1. In Figure 2 of the paper (Aurora outperforms operational CAMS across many targets), what ground truth was used to compute the RMSE of CAMS and Aurora and to show that Aurora has, at best, a 30% better RMSE? For meteorology, for example, the Integrated Surface Database (ISD) was used as the ground truth.
  2. Do you have more insights on using or not using an emission inventory as an input to Aurora? Did you run any small- or large-scale experiments with and without an emission inventory as an input, and do you know whether it is useful? The question is more from an ML point of view, about intuitions on the usefulness of an emission inventory.

wesselb (Contributor) commented Aug 30, 2024

Hey @patel-zeel! Good to hear from you. :)

We're very excited about the air quality application, and hope to release the air quality version here soon too.

For the CAMS experiments, we used CAMS analysis as the ground truth. (To clarify, the CAMS system produces both forecasts and an analysis product. The forecasts are, well, forecasts; and the analysis product is the system's best estimate of the ground truth.) We did not compare to any station measurements. Such a comparison would be possible, but I think that the model's resolution is just too low for that. At 0.4 degrees, you really only capture average/background levels and no local effects.
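To make the comparison concrete, here is a minimal sketch of scoring two forecasts against an analysis field and computing the relative RMSE improvement. It is not the evaluation code used for the paper: the function names `lat_weighted_rmse` and `relative_improvement` are hypothetical, the cos(latitude) area weighting is an assumption that is not stated in this thread, and the arrays are synthetic stand-ins rather than CAMS data.

```python
import numpy as np


def lat_weighted_rmse(pred, truth, lats_deg):
    """RMSE over a (lat, lon) grid, weighting each row by cos(latitude).

    The weighting is an assumption made for this sketch; it compensates for
    grid cells shrinking towards the poles on a regular lat-lon grid.
    """
    weights = np.cos(np.deg2rad(lats_deg))
    weights = weights / weights.mean()  # normalise so the weights average to 1
    sq_err = (pred - truth) ** 2        # shape (n_lat, n_lon)
    return float(np.sqrt((weights[:, None] * sq_err).mean()))


def relative_improvement(rmse_model, rmse_baseline):
    """Fractional RMSE reduction of the model relative to the baseline."""
    return (rmse_baseline - rmse_model) / rmse_baseline


# Synthetic stand-ins on a 0.4 degree grid (451 x 900 points).
n_lat, n_lon = 451, 900
lats = np.linspace(90.0, -90.0, n_lat)
rng = np.random.default_rng(0)

analysis = rng.normal(size=(n_lat, n_lon))  # plays the role of the analysis product
baseline_forecast = analysis + rng.normal(scale=1.0, size=(n_lat, n_lon))
model_forecast = analysis + rng.normal(scale=0.7, size=(n_lat, n_lon))

rmse_baseline = lat_weighted_rmse(baseline_forecast, analysis, lats)
rmse_model = lat_weighted_rmse(model_forecast, analysis, lats)
improvement = relative_improvement(rmse_model, rmse_baseline)
print(f"baseline RMSE={rmse_baseline:.3f}, model RMSE={rmse_model:.3f}, "
      f"improvement={improvement:.1%}")
```

Because both forecasts are scored against the same analysis field, the improvement is relative to the system's own best estimate of the state, not to station measurements.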

Aurora would probably perform better if we also included estimates of anthropogenic factors. We did not do any ablation studies to see what the effect of including an emission inventory is. My intuition is that, given enough data, the model should be able to learn these effects automatically, at least to some extent. The problem is that CAMS data is very scarce, so it's likely that explicitly accounting for these factors will improve performance.

patel-zeel (Author) commented

Thank you for the clarification, @wesselb. This is useful.

wesselb added the "question" label (Further information is requested) on Sep 3, 2024