Implement FlowSample on top of AxisArrays (#15)
* Rewriting of tests and addition of indexing interface tests

The existing tests have been rewritten to take their test data from
local files instead of attempting to download them --- the urls were
broken.

The large file test has been deleted.

New tests have been added for:
  - the key-based indexing of channels, which is existing
  functionality
  - the key-based indexing of multiple channels, which is to be
  implemented
  - the integer indexing of individual samples, which is to be
  implemented
  - the collection of integer indexing of multiple samples, which is
  to be implemented

* Implements different kinds of indexing for FlowSample

Multiple channels can be indexed using a Vector of Strings
Individual samples can be indexed using an Integer
Multiple samples can be indexed using a Vector of Integers

* Changing course to wrap AxisArray with FlowSample

I realised I was just rewriting the implementation of AxisArray. This
commit introduces the tests for a new FlowSample struct which is a
simple wrapper for AxisArray with some params attached.

AxisArray performance is supposed to be good, the interface doesn't
have to change, and they can easily be converted to matrices as needed
by GigaSOM.

* Wrapped the AxisArray in FlowSample

These changes add lots of indexing options for FlowSamples.

I have tried to keep backwards compatibility with the old FlowSample.
I think we have it, but cannot be absolutely certain.

* Change range call in testsuite to be Julia 1.6 compatible

* CompatHelper: add new compat entry for "AxisArrays" at version "0.4"

* Requested changes

The axes of the FlowSample data are now called :param and :event

size calls refactored to include dimension

version bumped

Base.axes redefined to call out to AxisArrays versions

Tests for large file added back in

* Deletes mistakenly committed HTML file

* Tests for iteration

Iterate just forwards the method to AxisArrays

* Quick fix of missing keyword in runtests.jl

* Deprecates params and data

A warning is emitted when the user tries to access params or data
directly.

Tests for these warnings are also added.

* Adds dot syntax access to FCS keywords in params

FCS TEXT segment keywords can now be accessed using the dot
syntax. E.g. `flowrun.par` retrieves the "\$PAR" keyword value.

Dot syntax access to `data` and `params` throws deprecation warnings.

`Base.propertynames` produces a list of the keywords for the
`FlowSample`.

This commit also adds tests for these features.

* Add AxisArrays.axisnames definition

* Fix compatibility with julia 1.6 regex

* Add @static to tests for testing getproperty in julia <v1.8

* Update README with indexing examples

* Increment major version

* Code formatting README

* Fix version number mistake

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
lgrozinger and github-actions[bot] authored Nov 2, 2022
1 parent 3663d41 commit 5c28a71
Showing 6 changed files with 405 additions and 25 deletions.
4 changes: 3 additions & 1 deletion Project.toml
@@ -1,10 +1,12 @@
name = "FCSFiles"
uuid = "d76558cf-badf-52d4-a17e-381ab0b0d937"
version = "0.1.5"
version = "0.2.0"

[deps]
AxisArrays = "39de3d68-74b9-583c-8d2d-e117c070f3a9"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"

[compat]
AxisArrays = "0.4"
FileIO = "1"
julia = "1"
137 changes: 135 additions & 2 deletions README.md
@@ -6,7 +6,8 @@ Add FileIO.jl integration for FCS files
|--------------------------------------------------|--------------|
| ![](https://juliahub.com/docs/FCSFiles/version.svg) | [![][ci-img]][ci-url] [![][codecov-img]][codecov-url] |

## Usage
## Loading an FCSFile
FCS files can be loaded by using the FileIO interface.

```julia
julia> using FileIO
@@ -27,14 +28,146 @@ FCS.FlowSample{Float32}
SSC-W
B_530-30-A
Time
```

## Metadata
Once loaded, the parameters of the FCS file are available as properties.

```
julia> flowrun.last_modified
"2019-Oct-03 15:35:15"
julia> flowrun.p1n
"FSC-A"
```
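
The property lookup is case-insensitive and prefers standard (`$`-prefixed) FCS keywords over user-defined ones, as described in the `param_lookup` docstring in `src/type.jl`. The following is a minimal sketch of that rule using a plain `Dict`. The `lookup` function and the example keyword values are hypothetical and only illustrate the precedence; they are not part of the package.

```julia
# Hypothetical stand-in for the TEXT-segment keywords of a loaded file.
params = Dict("\$P1N" => "FSC-A", "\$PAR" => "23",
              "PAR" => "user value", "OPERATOR" => "alice")

# Case-insensitive lookup: try the standard "$"-prefixed keyword first,
# then fall back to the bare (user-defined) keyword.
function lookup(params::Dict{String,String}, s::AbstractString)
    s = uppercase(s)
    standard = startswith(s, "\$") ? s : "\$" * s
    result = get(params, standard, nothing)
    return result === nothing ? get(params, s, nothing) : result
end

lookup(params, "p1n")       # finds the standard keyword "$P1N"
lookup(params, "par")       # "$PAR" takes precedence over "PAR"
lookup(params, "operator")  # falls back to the user-defined keyword
lookup(params, "missing")   # no match, so `nothing` is returned
```

Under this rule a user-defined keyword is only reachable by property access when no standard keyword of the same name exists.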

## Indexing
There are many ways to index into the FCS file. You can index the FCS file as a matrix (actually an `AxisArray`).

```
julia> flowrun[:, 1]
1-dimensional AxisArray{Float32,1,...} with axes:
:param, ["FSC-A", "FSC-H", "SSC-A", "SSC-H", "B1-A", "B1-H", "B2-A", "B2-H", "HDR-CE", "HDR-SE" … "V2-A", "V2-H", "Y1-A", "Y1-H", "Y2-A", "Y2-H", "Y3-A", "Y3-H", "Y4-A", "Y4-H"]
And data, a 23-element Vector{Float32}:
19.319384
12.838199
44.391308
20.214031
0.01834727
0.72980446
-0.25282443
0.4430968
0.54869235
-0.027989198
0.48970717
4.498265
5.900927
0.02512901
0.3956769
```

This retrieves the values of all the parameters for the first event in the FCS file.

Similarly you can get the values of a single parameter for all events.

```
julia> flowrun[1, :]
1-dimensional AxisArray{Float32,1,...} with axes:
:event, 1:83562
And data, a 83562-element Vector{Float32}:
19.319384
22.961153
36.157864
30.91769
5.644829
14.188097
34.42944
4.4080987
23.391977
-4.813841
-1.2413055
11.075016
13.712906
23.54529
5.740017
```

You can also take ranges of events.

```
julia> flowrun[1, end-99:end]
1-dimensional AxisArray{Float32,1,...} with axes:
:event, 83463:83562
And data, a 100-element Vector{Float32}:
4.576562
2.553804
10.608879
-6.4025674
-18.626959
6.1649327
24.049818
21.735662
23.391977
-4.813841
-1.2413055
11.075016
13.712906
23.54529
5.740017
```

If you know the name of a parameter you can use that name to index.

```
julia> flowrun["FSC-A"]
1-dimensional AxisArray{Float32,1,...} with axes:
:event, 1:83562
And data, a 83562-element Vector{Float32}:
19.319384
22.961153
36.157864
30.91769
5.644829
14.188097
34.42944
4.4080987
23.391977
-4.813841
-1.2413055
11.075016
13.712906
23.54529
5.740017
```

Or you can get multiple parameters at the same time.

```
julia> flowrun[["FSC-A", "FSC-H"]]
2-dimensional AxisArray{Float32,2,...} with axes:
:param, ["FSC-A", "FSC-H"]
:event, 1:83562
And data, a 2×83562 Matrix{Float32}:
19.3194 22.9612 36.1579 30.9177 … 11.075 13.7129 23.5453 5.74002
12.8382 3.40729 17.4995 14.0875 8.80171 5.29686 13.0893 11.3576
```

In general, any indexing that works with `AxisArray`s should work the same with FCS files.
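
This holds because `FlowSample` forwards `getindex` (and `axes`, which `end` relies on) to the wrapped array, as the definitions in `src/type.jl` show. A minimal sketch of that forwarding pattern follows. It assumes a hypothetical `Wrapped` type around a plain `Matrix` rather than the real `FlowSample` around an `AxisArray`, so name-based indexing is not covered here.

```julia
# Hypothetical wrapper type; the real FlowSample wraps an AxisArray.
struct Wrapped{T}
    data::Matrix{T}
end

# Forward indexing, size and axes to the wrapped array.
# `axes` is needed so that `end` works inside an index expression.
Base.getindex(w::Wrapped, args...) = getindex(w.data, args...)
Base.size(w::Wrapped, args...) = size(w.data, args...)
Base.axes(w::Wrapped, args...) = axes(w.data, args...)

w = Wrapped(Float32[1 2 3; 4 5 6])  # 2 parameters × 3 events

w[:, 1]      # all parameters for the first event
w[1, 2:end]  # first parameter, events 2 onwards
```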

## Plotting
Here is an example which constructs a 2D histogram visualisation of an FCS file.

```
julia> using Gadfly
julia> p = plot(x=flowrun["FSC-A"], y=flowrun["SSC-A"], Geom.histogram2d,
Guide.xlabel("FSC-A"), Guide.ylabel("SSC-A"), Coord.cartesian(xmin=0, ymin=0))
julia> draw(PNG("example.png", 10cm, 7cm, dpi=300), p)

```

![](example.png)
2 changes: 2 additions & 0 deletions src/FCSFiles.jl
@@ -1,6 +1,8 @@
module FCSFiles

using FileIO
using AxisArrays
const axes = Base.axes

include("type.jl")
include("utils.jl")
8 changes: 5 additions & 3 deletions src/parse.jl
@@ -91,11 +91,13 @@ function parse_data(io,
# data should be in multiples of `n_params` for list mode
(mod(length(flat_data), n_params) != 0) && error("FCS file is corrupt. DATA and TEXT sections don't match.")

data = Dict{String, Vector{dtype}}()
datamatrix = Matrix{dtype}(undef, n_params, length(flat_data) ÷ n_params)
rows = Vector{String}(undef, n_params)

for i in 1:n_params
data[text_mappings["\$P$(i)N"]] = flat_data[i:n_params:end]
rows[i] = text_mappings["\$P$(i)N"]
datamatrix[i, :] = flat_data[i:n_params:end]
end

data = AxisArray(datamatrix, Axis{:param}(rows), Axis{:event}(1:size(datamatrix, 2)))
FlowSample(data, text_mappings)
end
71 changes: 61 additions & 10 deletions src/type.jl
@@ -1,5 +1,5 @@
struct FlowSample{T}
data::Dict{String, Vector{T}}
struct FlowSample{T<:Number, I<:AbstractVector{Int}}
data::AxisArray{T, 2, Matrix{T}, Tuple{Axis{:param, Vector{String}}, Axis{:event, I}}}
params::Dict{String, String}
end

@@ -32,11 +32,62 @@ function Base.show(io::IO, f::FlowSample)
end
end

# Implement most important parts of Dict interface
Base.length(f::FlowSample) = length(f.data)
Base.haskey(f::FlowSample, x) = haskey(f.data, x)
Base.getindex(f::FlowSample, key) = f.data[key]
Base.keys(f::FlowSample) = keys(f.data)
Base.values(f::FlowSample) = values(f.data)
Base.iterate(iter::FlowSample) = Base.iterate(iter.data)
Base.iterate(iter::FlowSample, state) = Base.iterate(iter.data, state)
"""
Looks for `s` in the `params` dict.
`s` is searched for first as an FCS standard keyword, then as a user-defined keyword, with precedence given to the standard keywords. E.g. `param_lookup(flowrun, "par")` will look for both `"\$PAR"` and `"PAR"` but return the value for `"\$PAR"` if it exists, otherwise the value for `"PAR"`.
In accordance with the FCS3.0 standard, the search is case-insensitive.
If no match is found, `nothing` is returned.
"""
function param_lookup(f::FlowSample, s::AbstractString)
s = uppercase(s)
params = getfield(f, :params)

result = get(params, startswith(s, "\$") ? s : "\$" * s, nothing)

return result === nothing ? get(params, s, nothing) : result
end

function Base.getproperty(f::FlowSample, s::Symbol)
if s == :params
Base.depwarn("`flowrun.params` is deprecated and will be removed in a future release. Parameters can be accessed like any other member variable. E.g. `flowrun.par` or `flowrun.PAR`.", "flowrun.params")
elseif s == :data
Base.depwarn("`flowrun.data` is deprecated and will be removed in a future release. The data can be indexed, e.g. `flowrun[\"SSC-A\"]` or can be obtained as a matrix with `Array(flowrun)`.", "flowrun.data")
end

value = param_lookup(f, String(s))

if value === nothing
getfield(f, s)
else
value
end
end

function Base.propertynames(f::FlowSample, private::Bool=false)
makesym(x) = Symbol.(lowercase(first(match(r"^\$?(.+)", x).captures)))
names = makesym.(keys(getfield(f, :params)))

if private
append!(names, fieldnames(FlowSample))
end
names
end

Base.size(f::FlowSample) = size(getfield(f, :data))
Base.size(f::FlowSample, dim::Int) = size(f)[dim]
Base.length(f::FlowSample) = size(f, 1)

Base.keys(f::FlowSample) = getfield(f, :data).axes[1]
Base.haskey(f::FlowSample, x) = x in keys(f)
Base.values(f::FlowSample) = [getfield(f, :data)[key] for key in keys(f)]

Base.axes(f::FlowSample, args...) = AxisArrays.axes(getfield(f, :data), args...)
Base.getindex(f::FlowSample, args...) = getindex(getfield(f, :data), args...)
Base.iterate(iter::FlowSample) = iterate(getfield(iter, :data))
Base.iterate(iter::FlowSample, state) = iterate(getfield(iter, :data), state)
Base.Array(f::FlowSample) = Array(getfield(f, :data))

AxisArrays.axisnames(f::FlowSample) = axisnames(getfield(f, :data))

2 comments on commit 5c28a71

@tlnagy (Owner) commented on 5c28a71 Nov 2, 2022

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/71512

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.2.0 -m "<description of version>" 5c28a71e684f53b0833cf7f4b153762b1f760ab1
git push origin v0.2.0
