feat: force pods with volumes to be scheduled on Cloud servers #743

Merged: 12 commits, Oct 29, 2024

lukasmetzner (Contributor)

Due to a bug in the scheduler, a node with no driver instance might be picked. The volume then stays stuck in Pending because the "no capacity -> reschedule" recovery is never triggered [[0]](kubernetes/kubernetes#122109), [[1]](kubernetes-csi/external-provisioner#544).

@lukasmetzner lukasmetzner requested a review from a team as a code owner October 9, 2024 14:05
@lukasmetzner lukasmetzner linked an issue Oct 9, 2024 that may be closed by this pull request

codecov bot commented Oct 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 35.98%. Comparing base (7211dd8) to head (76abdbb).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #743      +/-   ##
==========================================
+ Coverage   35.95%   35.98%   +0.03%     
==========================================
  Files          20       20              
  Lines        1847     1848       +1     
==========================================
+ Hits          664      665       +1     
  Misses       1150     1150              
  Partials       33       33              


@lukasmetzner lukasmetzner self-assigned this Oct 9, 2024
@lukasmetzner lukasmetzner marked this pull request as draft October 9, 2024 14:23
@lukasmetzner lukasmetzner force-pushed the 400-no-topology-key-found-on-hw-nodes branch from 5145da1 to 0a409cb Compare October 23, 2024 08:33
Resolved review threads (outdated): docs/kubernetes/README.md, chart/templates/core/storageclass.yaml (2), chart/values.yaml
@lukasmetzner lukasmetzner requested a review from apricote October 28, 2024 09:52
@lukasmetzner lukasmetzner marked this pull request as ready for review October 28, 2024 10:26
Resolved review threads (outdated): docs/kubernetes/README.md (3), chart/.snapshots/full.values.yaml
lukasmetzner and others added 11 commits October 28, 2024 11:49
provided-by is a new label that hccm will apply automatically in a future release. It gives more detail about the origin of a server and makes custom extensions easier (e.g. robot + cloud + raspberry pi).
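To illustrate how a provided-by label can keep volume-backed pods off non-Cloud servers, here is a minimal sketch of a label-based node filter. It is not the driver's implementation: the full label key `instance.hetzner.cloud/provided-by`, the values `cloud`, `robot` and `raspberry-pi`, and the helper `eligible_nodes` are assumptions made for this example, as is the choice to exclude unlabeled nodes.

```python
# Hypothetical full label key; the commit message only names the label "provided-by".
PROVIDED_BY_LABEL = "instance.hetzner.cloud/provided-by"


def eligible_nodes(nodes, allowed_values=("cloud",)):
    """Return the names of nodes whose provided-by label is in allowed_values.

    `nodes` maps a node name to its label dict. Nodes without the label are
    excluded here; that is an assumption of this sketch, not documented
    driver behavior.
    """
    return [
        name
        for name, labels in nodes.items()
        if labels.get(PROVIDED_BY_LABEL) in allowed_values
    ]


# Example cluster mixing the server origins mentioned in the commit message.
nodes = {
    "cloud-1": {PROVIDED_BY_LABEL: "cloud"},
    "robot-1": {PROVIDED_BY_LABEL: "robot"},
    "pi-1": {PROVIDED_BY_LABEL: "raspberry-pi"},
    "unlabeled-1": {},
}

print(eligible_nodes(nodes))  # -> ['cloud-1']
```

Because the filter matches on a label value rather than a boolean flag, adding another origin only requires a new value in `allowed_values`, which is the kind of custom extension the commit message describes.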
@apricote apricote changed the title fix: Volume requests are falsely scheduled to Robot servers feat: force pods with volumes to be scheduled on Cloud servers Oct 29, 2024
@lukasmetzner lukasmetzner merged commit 702fe01 into main Oct 29, 2024
8 checks passed
@lukasmetzner lukasmetzner deleted the 400-no-topology-key-found-on-hw-nodes branch October 29, 2024 11:40
lukasmetzner pushed a commit that referenced this pull request Oct 29, 2024
🤖 I have created a release *beep* *boop*
---


## [2.10.0](v2.9.0...v2.10.0) (2024-10-29)


### Features

* add support & tests for Kubernetes 1.31 ([#721](#721)) ([85035b9](85035b9))
* allow arbitrary length API tokens ([#724](#724)) ([61c3a0e](61c3a0e))
* allow passing mkfs format options via storage class parameters ([#747](#747)) ([4b9aa4e](4b9aa4e))
* change XFS default options to support older kernels ([#747](#747)) ([4b9aa4e](4b9aa4e))
* drop tests for Kubernetes 1.27 ([#722](#722)) ([d46a54b](d46a54b))
* force pods with volumes to be scheduled on Cloud servers ([#743](#743)) ([702fe01](702fe01))
* fstype is directly passed to mkfs: `mkfs.<fstype>` ([#749](#749)) ([173bf2f](173bf2f))
* support for SELinux mount ([#756](#756)) ([719247e](719247e)), closes [#582](#582)
* Support SINGLE_NODE_MULTI_WRITER capability ([#725](#725)) ([cd53c23](cd53c23)), closes [#327](#327)
* **swarm:** removed workaround support for mock staging/unstaging ([#746](#746)) ([465ec21](465ec21))


### Bug Fixes

* do not log sensitive mount options ([#755](#755)) ([0b6e860](0b6e860))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
lukasmetzner added a commit that referenced this pull request Nov 11, 2024
Due to a bug in the scheduler, a node with no driver instance might be
picked. The volume then stays stuck in Pending because the "no capacity ->
reschedule" recovery is never triggered
[[0]](kubernetes/kubernetes#122109),
[[1]](kubernetes-csi/external-provisioner#544).

- See #400

---------

Co-authored-by: lukasmetzner <[email protected]>
Co-authored-by: Julian Tölle <[email protected]>
lukasmetzner pushed a commit that referenced this pull request Nov 11, 2024
Linked issue: No topology key found on hw nodes (#400)