misc updates (#23)
CarloLucibello authored Dec 19, 2024
1 parent 948de74 commit 6ee754b
Showing 14 changed files with 113 additions and 189 deletions.
7 changes: 7 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,7 @@
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/" # Location of package manifests
schedule:
interval: "weekly"
23 changes: 12 additions & 11 deletions .github/workflows/CI.yml
@@ -19,7 +19,7 @@ jobs:
matrix:
version:
- '1.9'
- '1' # add back when 1.10 is out
- '1'
- 'nightly'
os:
- ubuntu-latest
@@ -44,18 +44,19 @@ jobs:
name: Documentation
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: julia-actions/setup-julia@v1
- uses: actions/checkout@v4.2.1
- uses: julia-actions/setup-julia@v2
with:
version: '1'
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-docdeploy@v1
- run: |
julia --project=docs -e '
using Pkg
Pkg.develop(PackageSpec(path=pwd()))
Pkg.instantiate()'
- run: julia --color=yes --project=docs docs/make.jl
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }}
- run: |
julia --project=docs -e '
using Documenter: DocMeta, doctest
using HuggingFaceDatasets
DocMeta.setdocmeta!(HuggingFaceDatasets, :DocTestSetup, :(using HuggingFaceDatasets); recursive=true)
doctest(HuggingFaceDatasets)'
JULIA_CONDAPKG_OPENSSL_VERSION: "ignore"


7 changes: 6 additions & 1 deletion .github/workflows/CompatHelper.yml
@@ -15,7 +15,7 @@ jobs:
run: which julia
continue-on-error: true
- name: Install Julia, but only if it is not already available in the PATH
uses: julia-actions/setup-julia@v1
uses: julia-actions/setup-julia@v2
with:
version: '1'
arch: ${{ runner.arch }}
@@ -41,5 +41,10 @@ jobs:
shell: julia --color=yes {0}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# This repo uses Documenter, so we can reuse our [Documenter SSH key](https://documenter.juliadocs.org/stable/man/hosting/walkthrough/).
# If we didn't have one of those set up, we could configure a dedicated SSH deploy key `COMPATHELPER_PRIV` following https://juliaregistries.github.io/CompatHelper.jl/dev/#Creating-SSH-Key.
# Either way, we need an SSH key if we want the PRs that CompatHelper creates to be able to trigger CI workflows themselves.
# That is because a `GITHUB_TOKEN` can't trigger other workflows (see https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#using-the-github_token-in-a-workflow).
# Check whether you have a deploy key set up using these docs: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/reviewing-your-deploy-keys.
COMPATHELPER_PRIV: ${{ secrets.DOCUMENTER_KEY }}
# COMPATHELPER_PRIV: ${{ secrets.COMPATHELPER_PRIV }}
2 changes: 1 addition & 1 deletion .github/workflows/TagBot.yml
@@ -6,7 +6,7 @@ on:
workflow_dispatch:
inputs:
lookback:
default: 3
default: "3"
permissions:
actions: read
checks: read
9 changes: 2 additions & 7 deletions CondaPkg.toml
@@ -1,10 +1,5 @@
channels = ["conda-forge"]

[deps]
# h5py = ""
# pillow = ">=9.1, <10"
# pyarrow = "==6.0.0"
datasets = ">=2.12, <3"
numpy = ">=1.20, <2"
datasets = ">=3.0, <4"
numpy = ">=2.0, <3"
pillow = ""

2 changes: 1 addition & 1 deletion Project.toml
@@ -12,7 +12,7 @@ PythonCall = "6099a3de-0909-46bc-b1f4-468b9a2dfc0d"

[compat]
CondaPkg = "0.2"
DLPack = "0.1"
DLPack = "0.3"
ImageCore = "0.9, 0.10"
MLUtils = "0.4.1"
PythonCall = "0.9"
18 changes: 10 additions & 8 deletions README.md
@@ -23,31 +23,33 @@ HuggingFaceDatasets.jl provides wrappers around types from the `datasets` python
Check out the [examples/](https://github.com/JuliaGenAI/HuggingFaceDatasets.jl/tree/main/examples) folder for usage examples.

```julia
julia> using HuggingFaceDatasets

julia> train_data = load_dataset("mnist", split = "train")
Dataset({
features: ['image', 'label'],
num_rows: 60000
})

# Indexing starts with 1.
# Python types are returned by default.
julia> train_data[1]
Python: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x7F04DE661CD0>, 'label': 5}
Python: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x3340B0290>, 'label': 5}

julia> length(train_data)
60000

# Now we set the julia format
julia> train_data = load_dataset("mnist", split = "train").with_format("julia");

# Returned observations are now julia objects
julia> train_data[1]
julia> train_data[1] # Returned observations are now julia objects
Dict{String, Any} with 2 entries:
"label" => 5
"image" => Gray{N0f8}[Gray{N0f8}(0.0) Gray{N0f8}(0.0) Gray{N0f8}(0.0) Gray{N0f8}(0.0); Gray{N0f8}(0.0) Gray{N0f8}(0.0) Gray{N0f8}(0.0) Gray{N0f8}(0.0); ; Gray{N0f8}(0.0) Gray{N0f8}(0.0) ……
"image" => Gray{N0f8}[0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; ; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0]

julia> train_data[1:2]
Dict{String, Vector} with 2 entries:
"label" => [5, 0]
"image" => ReinterpretArray{Gray{N0f8}, 2, UInt8, Matrix{UInt8}, false}[[Gray{N0f8}(0.0) Gray{N0f8}(0.0) Gray{N0f8}(0.0) Gray{N0f8}(0.0); Gray{N0f8}(0.0) Gray{N0f8}(0.0) Gray{N0f8}(0.0) Gra
"image" => ReinterpretArray{Gray{N0f8}, 2, UInt8, Matrix{UInt8}, false}[[0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; ; 0
```
## Troubleshooting
- If you have problems resolving the CondaPkg environment, try setting `ENV["JULIA_CONDAPKG_OPENSSL_VERSION"] = "ignore"` before loading the package. See the [CondaPkg preferences](https://github.com/JuliaPy/CondaPkg.jl?tab=readme-ov-file#preferences) for more details.
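
A minimal sketch of that workaround, assuming the value `"ignore"` that this commit also uses in `CI.yml` and in the package's `__init__`. The point it illustrates is ordering: the environment variable is set first, and the package is loaded afterwards.

```julia
# Set the CondaPkg preference first, so it is already in place
# when the package (and with it CondaPkg) is loaded.
ENV["JULIA_CONDAPKG_OPENSSL_VERSION"] = "ignore"

using HuggingFaceDatasets
```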
100 changes: 0 additions & 100 deletions docs/Manifest.toml

This file was deleted.

1 change: 1 addition & 0 deletions docs/make.jl
@@ -15,6 +15,7 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"API" => "api.md",
],
)

11 changes: 4 additions & 7 deletions docs/src/api.md
@@ -1,12 +1,9 @@
# API

## Index

```@index
Pages = ["api.md"]
```@meta
CurrentModule = HuggingFaceDatasets
CollapsedDocStrings = true
```

## Docs
# API

```@autodocs
Modules = [HuggingFaceDatasets]
30 changes: 17 additions & 13 deletions docs/src/index.md
@@ -26,26 +26,30 @@ HuggingFaceDatasets.jl provides wrappers around types from the `datasets` python
Check out the `examples/` folder for usage examples.

```julia
# Returned observations are now julia objects
julia> using HuggingFaceDatasets

julia> train_data = load_dataset("mnist", split = "train")
Dataset(<py Dataset({
Dataset({
features: ['image', 'label'],
num_rows: 60000
})>, identity)
})

# Indexing starts with 1.
# By default, Python types are returned.
julia> train_data[1]
Python dict: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x2B64E2E90>, 'label': 5}
Python: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x3340B0290>, 'label': 5}

julia> set_format!(train_data, "julia")
Dataset(<py Dataset({
features: ['image', 'label'],
num_rows: 60000
})>, HuggingFaceDatasets.py2jl)
julia> length(train_data)
60000

# Now we have julia types
julia> train_data[1]
julia> train_data = load_dataset("mnist", split = "train").with_format("julia");

julia> train_data[1] # Returned observations are now julia objects
Dict{String, Any} with 2 entries:
"label" => 5
"image" => UInt8[0x00 0x00 0x00 0x00; 0x00 0x00 0x00 0x00; ; 0x00 0x00 0x00 0x00; 0x00 0x00 0x00 0x00]
"image" => Gray{N0f8}[0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; ; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0]

julia> train_data[1:2]
Dict{String, Vector} with 2 entries:
"label" => [5, 0]
"image" => ReinterpretArray{Gray{N0f8}, 2, UInt8, Matrix{UInt8}, false}[[0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; ; 0
```
5 changes: 3 additions & 2 deletions src/HuggingFaceDatasets.jl
@@ -3,7 +3,7 @@ module HuggingFaceDatasets
using PythonCall
using MLUtils: getobs, numobs
import MLUtils
using DLPack
using DLPack: DLPack
using ImageCore

const datasets = PythonCall.pynew()
@@ -37,8 +37,9 @@ include("load_dataset.jl")
export load_dataset

function __init__()
ENV["JULIA_CONDAPKG_OPENSSL_VERSION"] = "ignore"
# Since it is illegal in PythonCall to import a python module in a module, we need to do this here.
# https://cjdoris.github.io/PythonCall.jl/dev/pythoncall-reference/#PythonCall.pycopy!
# https://juliapy.github.io/PythonCall.jl/dev/pythoncall-reference/#PythonCall.Core.pycopy!
PythonCall.pycopy!(datasets, pyimport("datasets"))
PythonCall.pycopy!(PIL, pyimport("PIL"))
pyimport("PIL.PngImagePlugin")
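
Review note: the `pynew` / `pycopy!` pattern in this hunk can be sketched in isolation as follows. This is an illustrative sketch, not the package's code: the module name `PyInitSketch` and the binding `pymath` are hypothetical, and Python's standard `math` module stands in for `datasets` so that no extra Conda dependency is assumed.

```julia
module PyInitSketch  # hypothetical module, for illustration only

using PythonCall

# Placeholder Python object, created when the module is defined.
# It does not refer to any Python module yet.
const pymath = PythonCall.pynew()

function __init__()
    # Runs every time the module is loaded, so the import happens
    # in the live session and the placeholder is filled in place.
    PythonCall.pycopy!(pymath, pyimport("math"))
end

end # module

# After loading, the constant behaves like the imported Python module.
PyInitSketch.pymath.sqrt(4)
```

Keeping the import inside `__init__` means the `const` binding never changes, while the Python object it points to is only resolved at load time.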