rm(..., recursive=true) "directory not empty" errors #34700

Open
stevengj opened this issue Feb 8, 2020 · 37 comments
Labels: bug, system:windows

@stevengj
Member

stevengj commented Feb 8, 2020

I was trying to delete my Anaconda installation, which is a huge directory tree (10000+ files), with

using Conda
rm(Conda.ROOTENV, recursive=true)

and I keep getting SystemError .... rmdir: Directory not empty errors. However, if I check that directory, I find that it is indeed empty. Indeed, if I run rm(Conda.ROOTENV, recursive=true) again, then it succeeds in deleting that directory, but then proceeds to give me a similar error with another directory, ad nauseam.

For example (apologies for the screenshot):
image

It looks like there is some kind of race condition in the recursive rm, where it empties the directory, but Windows doesn't realize that the files are gone when it tries to rmdir.

(To reproduce, add the Conda.jl package, make sure it installs Anaconda by running Conda.add("jupyter"), and then try running rm as above. I'm seeing this in a Windows VM, but I saw the same problem on a student's machine earlier today, both with Julia 1.3.1.)
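
A minimal sketch of a stand-in reproduction that avoids installing Anaconda: it builds a synthetic tree of about 10,000 small files and then deletes it recursively. The helper name make_tree and the directory layout are assumptions made for illustration; they are not part of Conda.jl or Base, and nothing in this thread confirms that a synthetic tree triggers the error on any particular Windows machine.

```julia
# Hypothetical reproduction helper (not from Conda.jl or Base).
# Builds ndirs * nfiles small files under `root`, two directory levels deep.
function make_tree(root::AbstractString; ndirs::Int=100, nfiles::Int=100)
    for d in 1:ndirs
        sub = joinpath(root, "dir$d", "nested")
        mkpath(sub)
        for f in 1:nfiles
            write(joinpath(sub, "file$f.txt"), "x")
        end
    end
    return root
end

root = make_tree(mktempdir())   # roughly 10,000 files with the defaults
try
    rm(root; recursive=true)
    println("removed on the first attempt")
catch err
    # On an affected system this is where "directory not empty" would show up.
    println("first attempt failed: ", sprint(showerror, err))
end
```

If the first attempt fails, calling rm on the same root again corresponds to the "run it a second time" behaviour described above.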

stevengj added the system:windows label Feb 8, 2020
@DilumAluthge
Member

Out of curiosity, what happens if you try running rm with the keyword argument force = true?

@DilumAluthge
Member

Also, can you post the output of ] st -m?

@stevengj
Member Author

stevengj commented Feb 8, 2020

I tried it with force=true (actually I did this in the screenshot), and there was no difference.

@stevengj
Member Author

stevengj commented Feb 8, 2020

st -m output is here, though that shouldn't matter much as rm is a Base function. (Again, apologies for the screenshot — I'm running Windows in a VM and having trouble getting copy-and-paste to cross the VM boundary. But a student's native Windows 10 laptop with a fresh Julia installation exhibited the same problem yesterday.)

image

StefanKarpinski added the bug label Feb 8, 2020
@EdoAlvarezR

Running into the same issue

@EdoAlvarezR

Turns out that rm(...) works just fine. It was a system-level issue: there was a .fuse_hidden file in the directory I wanted to remove that was keeping the system from being able to remove it (even the terminal rm was throwing the same error).

@stevengj
Member Author

> Turns out that rm(...) works just fine. It was a system-level issue

I don't think that was the case in my original issue, where simply running the rm command twice worked.

@KristofferC
Member

@AbhimanyuAryan

AbhimanyuAryan commented Feb 22, 2022

Running into the same issue: ERROR: LoadError: SystemError (with ../build): rmdir: Directory not empty, from rm("../build", force=true). The directory contains subdirectories with .html files.

julia 1.6 intel macbook m1

@vtjnash
Member

vtjnash commented Feb 22, 2022

> where it empties the directory, but Windows doesn't realize that the files are gone when it tries to rmdir.

Windows does not delete files when you rm them, though it is possible to implement that as a feature by abusing the rename syscall. For example: vtjnash/Pidfile.jl@81ef0a5#diff-3b86733f3bbb623c9149814588ef59698913a5e9d9654f999cc726c50a8334cdR241

@KristofferC
Member

That talks about open files. I don't think that is the case in the other scenarios here.

@vtjnash
Member

vtjnash commented Feb 22, 2022

If you are running a virus scanner, many of those cause deletion errors without the trick above.

@AhmedSalih3d

I am seeing this error again in Julia 1.8.

@StefanKarpinski
Member

Seems like we should probably implement the trick from pidfiles and/or put some retry logic in on Windows. Annoying but better than being unreliable at this.

@vtjnash
Member

vtjnash commented Oct 25, 2022

The pidfiles trick gets a bit harder if you are dealing with directories, since you need to attempt to move all of the files to the parent while deleting them and then the directory. Or delay the directory deletion for some random later time (maybe with fsnotify?)

@StefanKarpinski
Member

The stupidest thing we could do is retry a few times optionally with an increasing backoff delay.
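
A minimal sketch of what such a retry loop could look like in user code, assuming the failure is transient. The name rm_with_retry and its keyword arguments (attempts, delay, factor) are hypothetical; this is not how Base implements rm, only an illustration of retrying with an increasing backoff delay.

```julia
# Hypothetical helper, not part of Base: retry a recursive rm with a growing delay.
# Returns the number of attempts that were needed.
function rm_with_retry(path::AbstractString;
                       attempts::Int=5, delay::Real=0.05, factor::Real=2)
    for attempt in 1:attempts
        try
            rm(path; force=true, recursive=true)
            return attempt
        catch err
            # Give up on the last attempt, or on errors that are not
            # filesystem errors (those are unlikely to be transient).
            if attempt == attempts || !(err isa Union{Base.IOError, SystemError})
                rethrow()
            end
            sleep(delay)
            delay *= factor   # increasing backoff between attempts
        end
    end
end

demo = mktempdir()
write(joinpath(demo, "file.txt"), "x")
println("removed after ", rm_with_retry(demo), " attempt(s)")
```

The catch clause covers both SystemError and Base.IOError because both error types appear in the reports in this thread. A retry only helps when repeating the call eventually succeeds; it does nothing for files held open by another process.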

@stevengj
Member Author

Can we call a Win32 function on Windows and let the OS deal with it? SHFileOperationW is supposed to be able to recursively delete directories.

@vtjnash
Member

vtjnash commented Oct 25, 2022

I am uncertain if we load shell32.dll (I think libuv does), but it should just be delegating to our same underlying Win32 calls

@giordano
Contributor

giordano commented Oct 25, 2022

For what it's worth, I recently had this problem outside of Julia. On an x86_64 Linux system I got

$ rm -rf directory/
rm: cannot remove 'directory/.spack-env': Directory not empty

What I believe happened in my case is that a rogue spack process (which I thought I had killed, but had not) kept writing to the directory/.spack-env directory while I was trying to delete it, thus causing the surprising error message.

What I want to say here is that if anything Julia isn't any worse than Coreutils rm.

@stevengj
Member Author

@giordano, we aren't talking about rogue processes here. Even if the directory is completely static, Julia's rm is sometimes failing on Windows with large directory trees, because Windows apparently takes some time to realize that a directory is empty when you tell it to delete files.

@mestinso
Contributor

I've been facing this issue recently in a corporate Windows environment. I'm confident it's antivirus/security related. I've experienced it in two places:

  1. PythonCall environment cleanup at the end of running runtests.jl in a package that uses PythonCall
  2. rm operation that is issued as part of a cleanup process when using the FMI.jl toolbox

Not sure if there is anything I can do to help/expedite resolution/workaround, but I'd certainly offer to test any potential solutions.

@mestinso
Contributor

Does anybody have any update or workaround for this issue?

Also, does anybody know whether this PR helps or interacts with this issue: #50842? In my own testing and use cases I still hit the problem, though inconsistently, which makes me suspect a race condition. Perhaps the PR didn't go far enough, since it only covered the force=true cases?

@stevengj
Member Author

stevengj commented Dec 3, 2023

@mestinso, I don't think this has anything to do with #50842. The issue here is that a single process has trouble recursively deleting a directory on Windows, apparently because of some quirks in the Windows filesystem.

I still think we should try to just call SHFileOperationW or some similar function, since presumably Microsoft has figured out how to delete directories on their own operating system. But @StefanKarpinski's suggestion of just doing a stupid retry method has the merit of being simple(?).

@p-foresman

Just a heads up, I'm seeing this issue running Julia 1.10 on Linux CentOS 7.

@stevengj
Member Author

stevengj commented Dec 28, 2023

@p-foresman, without more information, I’m skeptical it’s the same issue. (There are legitimate reasons why you might get “directory not empty” errors, and the specific issue here seems Windows-specific thus far.)
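
One way to tell the two situations apart is to look at what is still in the directory after rm fails. The following is a sketch only; the helper name leftovers is hypothetical. It lists every remaining entry, including dotfiles such as the .fuse_hidden file mentioned earlier in the thread.

```julia
# Hypothetical diagnostic, not part of Base: list everything still present
# under `dir` (files and subdirectories, hidden entries included).
function leftovers(dir::AbstractString)
    found = String[]
    isdir(dir) || return found          # already gone: nothing to report
    for (root, dirs, files) in walkdir(dir)
        append!(found, joinpath.(root, dirs))
        append!(found, joinpath.(root, files))
    end
    return found
end

demo = mktempdir()
write(joinpath(demo, ".hidden_example"), "x")
foreach(println, leftovers(demo))
```

If the list is empty right after a "directory not empty" error, that matches the transient behaviour this issue is about; if entries keep reappearing or cannot be removed, the cause is more likely another process or a lock.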

@KristofferC
Member

Not sure this is related but I got this on Windows

julia> rm(".julia/compiled/v1.11"; recursive=true)
ERROR: IOError: rm(".julia/compiled/v1.11\\Accessors"): directory not empty (ENOTEMPTY)
Stacktrace:
 [1] uv_error
   @ Base .\libuv.jl:100 [inlined]
 [2] rm(path::String; force::Bool, recursive::Bool)
   @ Base.Filesystem .\file.jl:307

when I had another Windows Julia session opened that kept the shared library in that folder open in the process.

@StefanKarpinski
Member

Yeah, unfortunately, Windows just won't let you delete something that's opened by some other process.

@stevengj
Member Author

stevengj commented Jan 9, 2024

See also this stackoverflow thread — you can apparently mark the file to be deleted on the next reboot 😆 .

@StefanKarpinski
Member

Amazing 😂😭

@vtjnash
Member

vtjnash commented Jan 10, 2024

Often you can just rename it too, as the actual lock preventing deletion is often on the name, not the file handle
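
A rough sketch of the rename-first idea, assuming the goal is to free up the original name even if the data cannot be deleted right away. The helper rename_then_rm is hypothetical and is not the Pidfile.jl implementation linked above; it moves the path to an unused sibling name and then makes a best-effort attempt to delete it.

```julia
# Hypothetical helper, not part of Base or Pidfile.jl.
# Renames `path` to an unused sibling name, then tries to delete the renamed entry.
# Returns the temporary name so the caller can retry the deletion later.
function rename_then_rm(path::AbstractString)
    p = normpath(abspath(path))
    p = isdirpath(p) ? dirname(p) : p      # drop a trailing path separator
    parent = dirname(p)
    # tempname only picks an unused name inside `parent`; it creates nothing.
    trash = tempname(parent; cleanup=false)
    mv(p, trash)                           # same parent directory, so a rename
    try
        rm(trash; force=true, recursive=true)
    catch err
        @warn "renamed, but could not delete yet" trash exception=err
    end
    return trash
end

demo = joinpath(mktempdir(), "victim")
mkpath(joinpath(demo, "sub"))
rename_then_rm(demo)
println("original name free: ", !ispath(demo))
```

Even when the final rm fails, the original name is available again, which is often what the caller needs. Whether the rename itself succeeds while something holds the directory open is exactly the "often" in the comment above, so this sketch does not guarantee it.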

@tlnagy
Contributor

tlnagy commented Jan 12, 2024

I'm wondering if this is related to EBUSY errors when trying to remove a whole directory on Windows:

rm(absolute_root; force=true, recursive=true) 

(xref: JuliaDocs/DemoCards.jl#160)

Full output from the Github windows-latest runner:

[ Info: Clean up DemoCards build dir: "..\figures"
ERROR: LoadError: IOError: rm("D:\\a\\Nagy_2023_SwellMigration\\Nagy_2023_SwellMigration\\site\\src\\figures\\Notebooks"): resource busy or locked (EBUSY)
Stacktrace:
 [1] uv_error
   @ Base .\libuv.jl:100 [inlined]
 [2] rm(path::String; force::Bool, recursive::Bool)
   @ Base.Filesystem .\file.jl:307
 [3] rm(path::String; force::Bool, recursive::Bool)
   @ Base.Filesystem .\file.jl:294
 [4] rm
   @ .\file.jl:273 [inlined]
 [5] (::DemoCards.var"#113#118"{String, String, String, String})()
   @ DemoCards C:\Users\runneradmin\.julia\packages\DemoCards\Oz6IE\src\generate.jl:209
 [6] top-level scope
   @ D:\a\Nagy_2023_SwellMigration\Nagy_2023_SwellMigration\site\make.jl:47
in expression starting at D:\a\Nagy_2023_SwellMigration\Nagy_2023_SwellMigration\site\make.jl:47
GKS: could not find font bold.ttf
Error: Process completed with exit code 1.

@stevengj
Member Author

@tlnagy if calling rm(absolute_root; force=true, recursive=true) repeatedly does not resolve your problem, then it's unrelated.

The issue here is recursive deletion of directories where nothing is locked/busy, but it still fails because of a race condition in the Windows filesystem.

@vtjnash
Member

vtjnash commented Jan 13, 2024

FWIW, the race condition usually is not in the Windows filesystem, but rather that the NT kernel currently makes it impossible to implement a reliable virus scanner, but everyone runs a virus scanner anyways.

@inkydragon
Member

Same issue in Rust: rust-lang/rust#29497
May learn something from: https://github.com/XAMPPRocky/remove_dir_all

For Windows an implementation that handles the locking of directories that occurs when deleting directory trees rapidly.

@vtjnash
Member

vtjnash commented Jan 19, 2024

That functionality hasn't been present in remove_dir_all since it was deleted in XAMPPRocky/remove_dir_all@61c03eb#diff-a4c907d91078072b617381605d6ded19c006a1934007d2607c73ebde5f7fd6bcL39-L41 (see the modified comments in that commit), but the documentation and README weren't updated.

But the rust stdlib now uses the POSIX_SEMANTICS flag, when supported by the OS and filesystem, to probably fix this: rust-lang/rust@5ab67bf#diff-e8df55f38a9a224cf1cfd40e6c535535aa66e8073cc8d9b959308659ba1de1f9R564-R592

Seems like at least half of this bug might be in the implementation of DeleteFile on Win32 being bad, so that calling the underlying API instead is a bit more reliable? (DeleteFile sets the FILE_SHARE_DELETE flag, which delays deletion, while the underlying API can be called without that flag)

FWIW, this has all been known for at least a few years, but it is waiting for some developer to care enough about Windows to use the new kernel APIs: libuv/libuv#3839

@vtjnash
Member

vtjnash commented Jan 19, 2024

@arlowhite

arlowhite commented Apr 2, 2024

I was encountering this "directory not empty" error with pkg instantiate in Julia 1.10 when trying to rm("C:\\Users\\USER\\.julia\\artifacts\\jl_vxDd76\\bin"). I do have Cylance running (antivirus), possibly a factor?

Switching to Julia 1.11 fixed it! currently 1.11.0-alpha2+0.x64.w64.mingw32

possibly fixed by #53456 (though I did not confirm if this commit is in the 1.11 build)
