Artifact folder could not be made on Windows. #3822

Closed
KnutAM opened this issue Mar 4, 2024 · 20 comments · Fixed by #4001

Comments

@KnutAM
Contributor

KnutAM commented Mar 4, 2024

When running Pkg.add("CairoMakie") in an empty project, I get this error: "C:\\Users\\meyer\\.julia\\artifacts\\a8244d6d23cbb895fcd39dd3eddb859a0c05d1c6" could not be made. This seems to occur because the atomic rename fails after #3768.

However, before getting there, I hit JuliaLang/julia#34700. That could be due to a failure related to the above, but the rm is in a finally block...

To "fix" the problem, I first hacked Base.rm

function Base.rm(path::String; force::Bool=false, recursive::Bool=false)
    if !(force && recursive)
        return invoke(Base.rm, Tuple{AbstractString}, path; force, recursive)
    else
        max_attempts = 3
        attempts = 0
        while attempts < max_attempts
            attempts += 1
            try
                # Return on success so the removal is not repeated.
                return invoke(Base.rm, Tuple{AbstractString}, path; recursive=true, force=true)
            catch err
                if isa(err, Base.IOError) && attempts < max_attempts
                    println("Trying again for \"", path, "\"")
                    # Pause before the final attempt.
                    attempts == (max_attempts - 1) && sleep(1.0)
                    continue
                else
                    println("Failed for \"", path, "\"")
                    rethrow(err)
                end
            end
        end
    end
end

(but this could just as well be done in create_artifact)

And modified _mv_temp_artifact_dir (which I couldn't hack since the dispatch is concrete...)

function _mv_temp_artifact_dir(temp_dir::String, new_path::String)::Nothing
    if !isdir(new_path)
        # This next step is like
        # `mv(temp_dir, new_path)`.
        # However, `mv` defaults to `cp` if `rename` returns an error.
        # `cp` is not atomic, so avoid the potential of calling it.
        err = ccall(:jl_fs_rename, Int32, (Cstring, Cstring), temp_dir, new_path)
        # Ignore rename error, but ensure `new_path` exists.
        if !isdir(new_path)
            println("Just do cp 💀")
            mv(temp_dir, new_path)
        end
        if !isdir(new_path)
            error("$(repr(new_path)) could not be made")
        end
        chmod(new_path, filemode(dirname(new_path)))
        set_readonly(new_path)
    end
    nothing
end

Just do cp 💀 is printed twice, along with a couple of Trying again for ....
In fact, it seems like x264_jll and Pixman_jll are the ones that need retries; I'm not sure whether that is by chance or whether there is anything special about them.

Strangely, after trying many times, it seemed to work, but it fails again if I empty the artifacts folder...

Versioninfo
Julia Version 1.10.2
Commit bd47eca2c8 (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =
@KnutAM
Contributor Author

KnutAM commented Mar 4, 2024

Update: perhaps unsurprisingly, it seems to have been an anti-virus issue (using a different, non-centrally administered anti-virus solved the problem).
Even so, perhaps adding some fix with retries (or making it safe to use Base.mv) could make sense? Having these issues could be a reason for first-time users to give up on Julia?

@KristofferC
Member

Having these issues could be a reason for first-time users to give up on Julia?

Perhaps, but at the same time it feels fruitless to fight anti viruses since they sit at a different privilege level and can just block anything they feel like.

@KnutAM
Contributor Author

KnutAM commented Mar 4, 2024

What about providing a hint to Windows users to check whether their anti-virus could be interfering, then?

@nhz2
Contributor

nhz2 commented Mar 4, 2024

This post seems to be about a similar issue https://discourse.julialang.org/t/installation-of-nodejs-fails-directory-not-empty/111117

It might be good to put the rm at src\Artifacts.jl:376 in a try block. If the rm fails there will be some extra junk in the artifact directory, but no serious problems. I initially didn't do that because I thought rm failing was a very rare event.

If you run Pkg.add("CairoMakie") again after it fails it should hopefully make progress. Even if there is an error, the artifact folder might still be in a valid state.
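
A minimal sketch of the try-block idea above, assuming a hypothetical helper named cleanup_temp_dir (this is not the actual Pkg.jl code; the `remove` keyword exists only so the failure path can be exercised without a real permissions problem):

```julia
# Hypothetical helper for illustration; not the actual Pkg.jl implementation.
# `remove` is injectable only so the failure path can be exercised.
function cleanup_temp_dir(temp_dir::AbstractString; remove=rm)
    try
        remove(temp_dir; recursive=true, force=true)
        return true
    catch err
        # Leftover junk in the artifacts directory is harmless, so warn and move on.
        @warn "Failed to clean up temporary directory $(repr(temp_dir))" exception=err
        return false
    end
end
```

With this shape a failed cleanup is reported but does not abort the installation; the caller can still decide what to do with the `false` return value.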

@nhz2
Contributor

nhz2 commented Mar 5, 2024

Oh, I see, the real issue is jl_fs_rename failing. Would sleeping and multiple attempts of calling jl_fs_rename fix the issue?

@IanButterworth
Member

As @vtjnash points out, it may be possible to change libuv to use POSIX rename on Windows. POSIX rm on Windows is ready and tested with Julia: libuv/libuv#4318

@vtjnash
Member

vtjnash commented Mar 5, 2024

This seems to be a common misconception, so just note that the code comment does not reflect what atomic rename is documented to mean on Unix, or anywhere else. The atomic only refers to the new file name and contents, but does not require the old name to be deleted simultaneously

   If newpath already exists, it will be atomically replaced, so
   that there is no point at which another process attempting to
   access newpath will find it missing.  However, there will
   probably be a window in which both oldpath and newpath refer to
   the file being renamed.

@nhz2
Contributor

nhz2 commented Mar 5, 2024

When renaming directories instead of files, does that mean oldpath and newpath can refer to the same directory shortly after rename succeeds?

@vtjnash
Member

vtjnash commented Mar 5, 2024

No, most file systems usually won't permit a hard link to be created to a directory

@KnutAM
Contributor Author

KnutAM commented Mar 5, 2024

The atomic only refers to the new file name and contents, but does not require the old name to be deleted simultaneously

So in that case, it should be fine/equivalent to call cp instead of using jl_fs_rename, since this would only occur if new_path doesn't exist?

And in that case, most likely this problem would be solved? Unfortunately, I don't have time to re-install the old antivirus to test that now...

@mzaffalon

mzaffalon commented Mar 5, 2024

This post seems to be about a similar issue https://discourse.julialang.org/t/installation-of-nodejs-fails-directory-not-empty/111117

The fix suggested in #3822 (comment) (the rm part only) did not work for me. I deleted the artifacts directory and reinstalled without problems.

EDIT: after rereading the OP, the error I got with the hacked rm was the same.

@nhz2
Contributor

nhz2 commented Mar 5, 2024

The issue with using cp is that it is not atomic for directories. It makes the destination directory and then copies the files one at a time (if I understand the code in base/file.jl correctly). If you have a huge artifact and quit julia while it is doing the cp you will end up with an invalid artifact directory.
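
To illustrate the concern, here is a simplified directory copy. It is only a sketch of the general pattern described above (create the destination, then copy entries one by one), not the actual code in base/file.jl; naive_copy_dir and the after_file hook are hypothetical names, and the hook exists only to simulate an interruption.

```julia
# Simplified illustration only; this is NOT the actual code in base/file.jl.
# `after_file` is a hypothetical hook used to simulate an interruption.
function naive_copy_dir(src::AbstractString, dst::AbstractString;
                        after_file=(name -> nothing))
    mkdir(dst)  # `dst` exists from here on, but is still incomplete
    for name in readdir(src)
        from, to = joinpath(src, name), joinpath(dst, name)
        if isdir(from)
            naive_copy_dir(from, to; after_file)
        else
            cp(from, to)
            after_file(name)
        end
    end
    return dst
end
```

If the process stops partway through the loop, `dst` is left behind with only some of its files, and a later `isdir(dst)` check cannot tell it apart from a complete artifact.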

@vtjnash
Member

vtjnash commented Mar 5, 2024

Yeah, it feels like someone really needs to rewrite a lot of that code in Filesystem. It is currently intended to replicate the mv command with force instead of --clobber / -n, which has this non-atomic behavior, as documented in that man page.

For example, we could try to be inspired more by the rsync API, which is a general-purpose tool for mv, copy, and delete, atomically, given various combinations of --delay-updates, -a, and --remove-source-files options. The Filesystem module doesn't need most of the options--that can be left to a package--but better care with these options would be quite welcome for being able to deal with cases like this better.

@nhz2
Contributor

nhz2 commented Mar 5, 2024

@mzaffalon @KnutAM
Do you think the example error message in PR #3827 will help others work around similar issues?

I don't know how to test this in a real-world scenario because I don't have a problematic anti-virus installed.

But if you want to test this out, https://github.com/JuliaLang/Pkg.jl?tab=readme-ov-file#using-the-development-version-of-pkgjl has instructions for using the development version of Pkg.jl. You will also need the nightly version of Julia: https://julialang.org/downloads/nightlies/

@mzaffalon

@nhz2 I still had the old artifacts directory on my computer.

I gave it a try with the nightly version and your instructions for the development version of Pkg.jl: no errors during installation with an empty artifacts directory. I redid the same operation using the old artifacts directory, still no error messages.

I removed the directory 926... from the old artifacts, tried to install NodeJS with Julia v1.10.2: again no error messages. So it seems I cannot reproduce the error even with the release version.

@KnutAM
Contributor Author

KnutAM commented Mar 6, 2024

I don't know how to test this in a real-world scenario because I don't have a problematic anti-virus installed.

I don't know exactly how the antivirus is interfering, but perhaps it is possible to reproduce the error by opening one of the files that Pkg is attempting to rename?

@nhz2
Contributor

nhz2 commented Mar 6, 2024

I found a way to reproduce the error on Linux, but it requires sudo so try this at your own risk. After running this I had to manually move my artifact folder to trash.

The following Julia code continuously scans through the artifact folder, making any temporary directories immutable.

julia> d = joinpath(DEPOT_PATH[1], "artifacts");

julia> while true
           sleep(0.1)
           ds = readdir(d)
           for i in findall(startswith("jl_"), ds)
               t = joinpath(d, ds[i])
               run(`sudo chattr +i $(t)`)
           end
       end

While this is running, if I try to add NodeJS in 1.10 I get:

(@v1.10) pkg> add NodeJS
   Resolving package versions...
  Downloaded artifact: JpegTurbo
ERROR: IOError: rm("/home/nathan/.julia/artifacts/jl_39jFUR"): operation not permitted (EPERM)
Stacktrace:
...

In the PR I get:

(jl_Fa2kB2) pkg> add NodeJS
   Resolving package versions...
┌ Warning: Failed to clean up temporary directory "/home/nathan/.julia/artifacts/jl_x8Yk1d"
│   exception = IOError: rm("/home/nathan/.julia/artifacts/jl_x8Yk1d"): operation not permitted (EPERM)
└ @ Pkg.Artifacts ~/github/Pkg.jl/src/Artifacts.jl:385
┌ Warning: Failed to clean up temporary directory "/home/nathan/.julia/artifacts/jl_CqRuTl"
│   exception = IOError: rm("/home/nathan/.julia/artifacts/jl_CqRuTl"): operation not permitted (EPERM)
└ @ Pkg.Artifacts ~/github/Pkg.jl/src/Artifacts.jl:385
ERROR: Unable to automatically download/install artifact 'nodejs_app' from sources listed in '/home/nathan/.julia/packages/NodeJS/LntTk/Artifacts.toml'.
Sources attempted:
- https://pkg.julialang.org/artifact/9c278c61d6242d19deca58e582fc6a6f0a727de8
    Error: SystemError: opening file "/home/nathan/.julia/artifacts/jl_x8Yk1d/CHANGELOG.md": Operation not permitted
- https://github.com/davidanthoff/NodeJSBuilder/releases/download/v18.16.0%2B0/NodeJS-18.16.0+0-x86_64-linux-gnu.tar.gz
    Error: SystemError: opening file "/home/nathan/.julia/artifacts/jl_CqRuTl/CHANGELOG.md": Operation not permitted

Stacktrace:
...

@xlxs4

xlxs4 commented Jul 29, 2024

Manually renaming the artifact folder to the hash is another fix for anyone getting hit by this (e.g. mv .\jl_1vF2ic\ 8043c72c48288c74e7f13c0c4aecbd239ef872bb\)
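
To find candidates for such a manual rename, one could list the leftover temporary directories first. The sketch below assumes that temporary directories start with `jl_`, as they do in the examples in this thread; leftover_temp_dirs is a hypothetical helper, not part of Pkg.jl.

```julia
# Hypothetical helper; the `jl_` prefix is taken from the temporary directory
# names that appear in this thread, not from a documented guarantee.
function leftover_temp_dirs(artifacts_dir::AbstractString=joinpath(first(DEPOT_PATH), "artifacts"))
    isdir(artifacts_dir) || return String[]
    return [joinpath(artifacts_dir, name) for name in readdir(artifacts_dir)
            if startswith(name, "jl_") && isdir(joinpath(artifacts_dir, name))]
end
```

The helper only lists directories; which hash a given directory should be renamed to still has to come from the error message or the Artifacts.toml.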

@nhz2
Contributor

nhz2 commented Jul 29, 2024

If the rename was retried after sleeping for some increasing amount of time, would that solve this issue, or does the anti-virus always stop Julia from doing a rename?
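
A rough sketch of what such a retry could look like, assuming the same jl_fs_rename ccall used in _mv_temp_artifact_dir earlier in this thread. The names raw_rename and rename_with_retry, the attempt count, and the delays are all hypothetical; whether this helps against a real anti-virus is exactly the open question.

```julia
# Hypothetical sketch, not the Pkg.jl implementation.
# Same low-level call as in `_mv_temp_artifact_dir` earlier in the thread.
raw_rename(src, dst) = ccall(:jl_fs_rename, Int32, (Cstring, Cstring), src, dst)

function rename_with_retry(src::AbstractString, dst::AbstractString;
                           rename_fn=raw_rename, attempts::Int=4, delay::Real=0.1)
    for attempt in 1:attempts
        rename_fn(src, dst)        # return code ignored; success is judged below
        isdir(dst) && return true  # same success check as the original code
        # Sleep for an increasing amount of time before the next attempt.
        attempt < attempts && sleep(delay * 2^(attempt - 1))
    end
    return false
end
```

Because only rename is ever called, a failure after the last attempt leaves the temporary directory untouched rather than falling back to a non-atomic cp.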

@xlxs4

xlxs4 commented Aug 27, 2024

I think there's a related issue with Base.rm(path; recursive=true, force=true), where interference makes Pkg report ENOENT when trying to remove artifact folders for GC.
