Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update volume status after every reconcile loop #144

Conversation

lukas016
Copy link
Contributor

@lukas016 lukas016 commented Jan 29, 2024

Proposed Changes

Rewrite detach/attach logic for using of status as source of truth.

  • status of volumes is updated after every reconcile loop
  • volumes are deleted from status, when they aren't part of domain anymore
  • update state for detaching volumes

Fixes #114
Fixes #145

@lukas016 lukas016 linked an issue Jan 29, 2024 that may be closed by this pull request
@lukas016 lukas016 marked this pull request as ready for review January 29, 2024 14:39
@lukas016 lukas016 requested a review from a team as a code owner January 29, 2024 14:39
pkg/api/machine.go Outdated Show resolved Hide resolved
@lukas016 lukas016 force-pushed the 114-detachvolume-failure-disk-is-not-getting-unplugged-from-domain branch from 55b37bc to b712690 Compare February 5, 2024 11:58
@lukas016
Copy link
Contributor Author

lukas016 commented Feb 5, 2024

PR was updated and rebased

@so-sahu so-sahu requested a review from a team February 5, 2024 12:40
@lukasfrank lukasfrank added the integration-tests to run integration tests label Feb 6, 2024
pkg/api/machine.go Show resolved Hide resolved
pkg/controllers/machine_controller.go Outdated Show resolved Hide resolved
pkg/controllers/machine_controller.go Outdated Show resolved Hide resolved
pkg/controllers/machine_controller.go Outdated Show resolved Hide resolved
pkg/controllers/machine_controller_volumes.go Outdated Show resolved Hide resolved
@so-sahu so-sahu requested a review from a team February 7, 2024 07:54
pkg/controllers/machine_controller.go Show resolved Hide resolved
Comment on lines 905 to 920
if len(volumes) != 0 {
requireUpdate = true
machine.Status.VolumeStatus = volumes
}

if len(nics) != 0 {
requireUpdate = true
machine.Status.NetworkInterfaceStatus = nics
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not get the point of this code. How is the length of Volumes and NICs related to updates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you have right comparison for nil will be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 106 to 98
currentStatus := machine.Status.GetVolumesAsMap()

attachedVolumes, err := attacher.ListVolumesAsMap()
if err != nil {
// missing list of attached volumes can affect deletion of volumes, so we cannot continue
return machine.Status.VolumeStatus, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bad practice returning "data" in an error case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we can return nil, it isn't problem with new update status functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

return nil, fmt.Errorf("error iterating mounted volumes: %w", err)
syncStatusWithAttachedVolumes(currentStatus, attachedVolumes)

errs := r.deleteDetachedVolumes(ctx, log, currentStatus, mounter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is happening here? Why do we just continue in case of errors?

Copy link
Contributor Author

@lukas016 lukas016 Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functions are designed as failure tolerant. You have multiple logic on one place where one functionality can affect another (detach, delete, resize, attach). So one problematic disk cannot affect another and disks are reconciled based on their current status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function was reverted to previous implementation which was failure tolerant too.

pkg/controllers/machine_controller_volumes.go Outdated Show resolved Hide resolved
Copy link
Member

@afritzler afritzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to subscribe to the LifeCycleEvents of the libvirt Domain and enqueue Machines if something changes in the domain.

Here is some sample code how this might look like:

// Subscribe to lifecycle events using the provided context
eventChan, err := l.LifecycleEvents(ctx)
if err != nil {
	fmt.Println("Failed to subscribe to lifecycle events:", err)
	return
}

fmt.Println("Subscribed to lifecycle events. Listening for events...")

// Listen for lifecycle events in a separate goroutine
go func() {
	for event := range eventChan {
		fmt.Printf("Received lifecycle event: %+v\n", event)
                // enqueue Machines here and switch over the events
	}
	fmt.Println("Lifecycle event channel closed.")
}()

// Wait for context to be cancelled (e.g., by receiving a signal)
<-ctx.Done()
fmt.Println("Context cancelled. Shutting down...")

This can be also used to consolidate other go routines which we are starting during the initialization of the provider.

@lukas016
Copy link
Contributor Author

@afritzler we discussed implement listen to libvirt events. And we want to do that but not now.

@lukas016 lukas016 force-pushed the 114-detachvolume-failure-disk-is-not-getting-unplugged-from-domain branch 2 times, most recently from 850ff8f to 9fe64cd Compare February 16, 2024 08:19
@lukas016 lukas016 force-pushed the 114-detachvolume-failure-disk-is-not-getting-unplugged-from-domain branch from 9fe64cd to 3fe8959 Compare February 16, 2024 09:49
if err := attacher.ForEachVolume(func(volume *AttachVolume) bool {
currentVolumeNames.Insert(volume.Name)
return true
}); err != nil {
return nil, fmt.Errorf("error iterating attached volumes: %w", err)
return nil, fmt.Errorf("error iteratin attached volumes: %w", err)
Copy link
Contributor Author

@lukas016 lukas016 Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@lukas016
Copy link
Contributor Author

problems solve in this PR will solve with events and different logic.

@lukas016 lukas016 closed this Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
4 participants