Complete the implementation of the MD5 checks for "local storage" #829

Open
2 tasks
alberto-bortolan opened this issue Dec 3, 2024 · 1 comment

alberto-bortolan commented Dec 3, 2024

Project board link

Medusa allows MD5 checks to be optionally enabled during backups and backup verifications (the backup, backup-cluster and verify commands) via the command-line option --enable-md5-checks or the configuration parameter enable_md5_checks.

While this works in general, in the case of local storage:

  1. the methods that perform the check ignore MD5 values
  2. the MD5 value is calculated on the fly every time the files in the backups get listed
  3. no MD5 information is stored as part of the backup

Differences between cloud and local storage

Object metadata (such as size and datetime) for the various cloud providers also includes the MD5 value of the file, which is calculated automatically on upload and stored as part of the metadata.
Local filesystem metadata (stat()) does not come with an MD5 field, so the value has to be associated with the file in some other way. Options are:
A) if the underlying filesystem allows it, use an extended attribute
B) otherwise, store the MD5 value in dedicated files on the backup side

Option (A) would work on genuinely local filesystems (i.e. a locally mounted disk) and on NFS mounts, but only if NFS is version 4 or newer. Unfortunately, NFS v3 still seems to be widely used and does not support extended attributes, which would force us, at least in this case, to fall back to option (B) or to impose a minimum NFS version. Other network mounts (SMB) may or may not support extended attributes, depending on the version or mount type.
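Since support depends on the filesystem and mount type, one way to decide between the two options at runtime is to probe the backup directory. The following is only a sketch, assuming Linux (os.setxattr and os.getxattr are not available on other platforms); the function name and the attribute name user.medusa.probe are hypothetical and not part of Medusa.

```python
import os
import tempfile


def supports_user_xattrs(directory: str) -> bool:
    """Return True if files created under `directory` accept user.* xattrs."""
    # os.setxattr / os.getxattr only exist on Linux builds of Python.
    if not hasattr(os, "setxattr"):
        return False
    # The probe file is deleted automatically when the context exits.
    with tempfile.NamedTemporaryFile(dir=directory) as probe:
        try:
            os.setxattr(probe.name, "user.medusa.probe", b"1")
            return os.getxattr(probe.name, "user.medusa.probe") == b"1"
        except OSError:
            # Typically ENOTSUP when the filesystem or mount has no xattr support.
            return False
```

A caller could run this once per storage root and select option (A) or (B) accordingly.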

Option (B) would work regardless, but it needs to be implemented carefully to avoid generating too many extra files. It must also cater for cases where older backups were taken with a version of Medusa that did not have the feature, or where a cloud backup is moved to local storage.
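To make option (B) more concrete, here is a rough sketch that keeps a single manifest per backup directory rather than one extra file per data file, and fills in missing entries on demand, which would cover older backups and backups copied from cloud storage. The manifest name, its JSON layout and all function names are assumptions for illustration. It also assumes backup files never change once written, so a stored value is reused without rehashing.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = ".md5-manifest.json"  # hypothetical file name


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(backup_dir: Path) -> dict:
    """Return the stored {relative path: md5} map, or {} if none exists yet."""
    manifest = backup_dir / MANIFEST_NAME
    if not manifest.exists():
        return {}
    return json.loads(manifest.read_text())


def refresh_manifest(backup_dir: Path) -> dict:
    """Hash only files missing from the manifest, then rewrite it."""
    known = load_manifest(backup_dir)
    current = {}
    for path in sorted(backup_dir.rglob("*")):
        if not path.is_file() or path.name == MANIFEST_NAME:
            continue
        rel = path.relative_to(backup_dir).as_posix()
        # Reuse the stored value when present (assumes immutable files).
        current[rel] = known.get(rel) or md5_of(path)
    (backup_dir / MANIFEST_NAME).write_text(json.dumps(current, indent=2, sort_keys=True))
    return current
```

The trade-off is that the manifest becomes one more file to keep consistent whenever a backup is pruned or copied.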

A performance issue with the current code

Point 2 has the potential to introduce delays every time the list of files in a backup needs to be retrieved, which happens not just during a backup but also during conceptually simple operations such as list-backups. This delay can become substantial if the backups are large, especially if network mounts are used.
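As an illustration of how listing could avoid this cost, the sketch below collects only stat() data while walking the tree and defers hashing until a caller actually reads the MD5. The LocalFile class and list_files function are hypothetical names, not Medusa's current API.

```python
import hashlib
import os
from dataclasses import dataclass
from functools import cached_property


@dataclass
class LocalFile:
    path: str
    size: int
    mtime: float

    @cached_property
    def md5(self) -> str:
        # Computed on first access only, then cached on the instance.
        digest = hashlib.md5()
        with open(self.path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()


def list_files(root: str) -> list:
    """List files under `root` using stat() only; no file content is read."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            found.append(LocalFile(full, info.st_size, info.st_mtime))
    return sorted(found, key=lambda f: f.path)
```

With this shape, an operation like list-backups would never touch file contents, while a verification with MD5 checks enabled would pay the hashing cost only for the files it compares.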

I propose to tackle this in two steps:

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: MED-112

pstef commented Dec 17, 2024

I don't like option B) because of the burden it would place on maintaining the code and also on maintaining the files in each backup repository.

For this reason option A) seems more attractive to me. I wouldn't worry about requiring NFSv4, because that is only a problem in corporate environments where nothing is allowed to change, and in that case it's hard to imagine that introducing Medusa would be possible either.

The bigger problem here is that extended attributes over NFS are only supported since Linux 5.9 (release notes) and, for example, Ubuntu 22 meets this requirement just barely (currently 5.15, exactly one year younger than 5.9).

I also think that extended attributes, even if available, shouldn't be a hard requirement. It's easy to imagine a situation where someone unaware of this would send an entire backup repository using rsync or tar without using --xattrs.
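One way to treat extended attributes as an optimisation rather than a requirement is sketched below: the MD5 is read from the attribute when it exists and recomputed from the file contents otherwise, for example after a copy that dropped the attributes. This assumes Linux for the xattr calls; the attribute name user.medusa.md5 and the function names are hypothetical.

```python
import hashlib
import os

MD5_XATTR = "user.medusa.md5"  # hypothetical attribute name


def compute_md5(path: str) -> str:
    """Hash the file contents in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5(path: str) -> str:
    """Compute the MD5 and store it as an xattr when the platform allows it."""
    value = compute_md5(path)
    setxattr = getattr(os, "setxattr", None)  # Linux only
    if setxattr is not None:
        try:
            setxattr(path, MD5_XATTR, value.encode("ascii"))
        except OSError:
            pass  # xattrs unsupported here; the value is simply not persisted
    return value


def read_md5(path: str) -> str:
    """Prefer the stored xattr; fall back to hashing if it is absent."""
    getxattr = getattr(os, "getxattr", None)  # Linux only
    if getxattr is not None:
        try:
            return getxattr(path, MD5_XATTR).decode("ascii")
        except OSError:
            pass  # attribute missing (e.g. lost during a copy) or unsupported
    return compute_md5(path)
```

The fallback keeps results correct when attributes are missing, at the price of the same on-the-fly hashing cost the issue describes.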
