CI: Windows flow takes 1.5h #13726

Open
Tracked by #13813
comphead opened this issue Dec 10, 2024 · 16 comments

@comphead
Contributor

Is your feature request related to a problem or challenge?

The Windows CI flow has been getting steadily slower and now takes 1.5h

Describe the solution you'd like

Investigate the reason

Describe alternatives you've considered

No response

Additional context

No response

@comphead added the enhancement label Dec 10, 2024
@comphead self-assigned this Dec 10, 2024
@comphead
Contributor Author

With the latest PR, #13718, I can see that some SLT tests take an extremely long time on Windows:

Executed "map.slt". Took 460.5636141s
Executed "window.slt". Took 217.240136s
Executed "struct.slt". Took 1316.049712s
Executed "array.slt". Took 2407.7054095s

This is probably because Windows needs a different set of optimizations than Unix/macOS.
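
For anyone who wants to rank the slow files from a CI log, here is a minimal Python sketch. It assumes every timing line has exactly the shape shown above (`Executed "<file>". Took <seconds>s`); the helper names `parse_timings` and `slowest_first` are made up for illustration and are not part of DataFusion or its test runner. The summed figure is only an indication, since the files may not run strictly one after another.

```python
import re

# Log lines copied from the comment above.
LOG = """\
Executed "map.slt". Took 460.5636141s
Executed "window.slt". Took 217.240136s
Executed "struct.slt". Took 1316.049712s
Executed "array.slt". Took 2407.7054095s
"""

# Assumed line shape: Executed "<file>". Took <seconds>s
LINE_RE = re.compile(r'Executed "([^"]+)"\. Took ([0-9.]+)s')


def parse_timings(log: str) -> dict[str, float]:
    """Return {file name: seconds} for every matching line in the log."""
    return {name: float(secs) for name, secs in LINE_RE.findall(log)}


def slowest_first(timings: dict[str, float]) -> list[tuple[str, float]]:
    """Sort (file, seconds) pairs from slowest to fastest."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)


timings = parse_timings(LOG)
total = sum(timings.values())

for name, secs in slowest_first(timings):
    print(f"{name:<12} {secs / 60:6.1f} min  {secs / total:6.1%} of listed time")
print(f"{'total':<12} {total / 60:6.1f} min")
```

For the four files listed, the total is about 73 minutes, and array.slt alone accounts for more than half of it.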

@comphead
Contributor Author

Compilation takes 13 min for the main DF build and another 13 min for the CLI, so 26 min in total, compared to 10 min on macOS/Unix. The biggest problem for now, though, is the test run time, which is probably related to the different set of optimizations Windows expects.

@findepi
Member

findepi commented Dec 11, 2024

cc @sadboy

@comphead
Contributor Author

Thanks @korowa for pointing out the slow Windows runner issue:
actions/runner-images#7320

Some projects have given up on Rust testing on Windows, for example DataDog/orchestrion#415

@alamb, I'm wondering whether we should pause the Windows flow temporarily. The build is incredibly slow and we are holding GH resources for far longer than on the other platforms.

Leaving this issue open; it can probably be fixed by
actions/runner-images#10806

@alamb
Contributor

alamb commented Dec 15, 2024

@alamb, I'm wondering whether we should pause the Windows flow temporarily. The build is incredibly slow and we are holding GH resources for far longer than on the other platforms.

I think it is a good thing to consider.

The value of running Windows CI, in my mind, is that we ensure people developing on Windows can do so on a stable base. I wonder if there is some way we can pare back Windows testing (for example, compile with minimal features and do minimal testing, such as the sqllogictests)?

@comphead
Contributor Author

The thing is, the sqllogictests are enormously slow. Look at these numbers:

Executed "array.slt". Took 2407.7054095s

It depends slightly on LTO and optimization settings, but most of the sqllogictests are still 70-100x slower than on Linux/macOS.

@alamb
Contributor

alamb commented Dec 16, 2024

The thing is, the sqllogictests are enormously slow. Look at these numbers:

Executed "array.slt". Took 2407.7054095s

😮 that is crazy. I don't really know anything about building / optimizing for windows, but there must be some low hanging fruit there. 2400 seconds is like 40 minutes!

@comphead
Contributor Author

I have already experimented with different optimizations and made no progress. Namely:

  • different LTO settings
  • different opt-levels
  • Windows images
  • linkers
  • codegen-units

and many more. The best option for now is either to leave it as is or to pause the flow.

@findepi
Member

findepi commented Dec 17, 2024

I personally don't care how many CPU hours we spend on Windows runners, but I do care about build latency (overall throughput). Thanks for working on this, @comphead. I appreciate the effort. I understand we have exhausted what's readily available to the current project maintainers. I'm OK with disabling the tests on Windows if we see them queue up and affect build latency. If someone cares about Windows dev-ex and stability, we'd appreciate contributions to make the CI usable.

@alamb
Contributor

alamb commented Dec 17, 2024

I agree 100% about build latency, etc
I have started writing up several related tickets / organizing the work under the following epic

@comphead
Contributor Author

@alamb @findepi does that mean you are okay with temporarily pausing the Windows flow?
I'll keep an eye on the new Windows Server 2025 runner, which promises a 20x speed boost, and re-enable the tests once that runner is released.

@alamb
Contributor

alamb commented Dec 18, 2024

Thanks @comphead

@Alexhuszagh
Contributor

On my machine, one issue was memory usage, so decreasing the number of binaries created in the test cases is unlikely to help on my end. I tried on both Windows and Linux (Linux with ~12GB of RAM available, Windows with ~30GB available). I'm wondering whether a build script that conditionally adds options to disable tests when certain environment variables are set could improve this when testing specific changes during local development?

@alamb
Contributor

alamb commented Dec 19, 2024

@Alexhuszagh can you build DataFusion with a command like this:

cargo build -j 1

(I am wondering if something about doing so many things in parallel is the problem)

@Alexhuszagh
Contributor

The tests were able to build and run under these conditions but the full suite took ~90 min to build on WSL2. So at least with incremental builds I should be able to do some testing.

@alamb
Contributor

alamb commented Dec 21, 2024

The tests were able to build and run under these conditions but the full suite took ~90 min to build on WSL2. So at least with incremental builds I should be able to do some testing.

Yikes!

We were also seeing some very slow Windows builds:

I wonder if we are doing something silly with Windows that we should fix 🤔
