Sequel extension for querying large datasets in batches

umbrellio/sequel-batches

Sequel::Batches

This dataset extension provides the #in_batches method, which splits a dataset into parts and yields each part to a block.

Note: currently only PostgreSQL is supported.

Installation

Add this line to your application's Gemfile:

gem 'sequel-batches'

Usage

To use the feature, enable the extension:

DB.extension(:batches)

After that, the #in_batches method becomes available on datasets:

User.where(role: "admin").in_batches(of: 4) do |ds|
  ds.delete
end
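
To make the idea of splitting by primary key more concrete, here is a minimal pure-Ruby sketch that works on an in-memory list of key values. It is not the gem's implementation: each_key_range is a hypothetical helper, and the assumption is only that each batch covers a contiguous range of ordered keys holding at most `of` rows.

```ruby
# Hypothetical helper, not part of sequel-batches: sorts the key values and
# yields the key range covered by each chunk of at most `of` rows.
def each_key_range(ids, of: 1000)
  ids.sort.each_slice(of) { |chunk| yield chunk.first..chunk.last }
end

ranges = []
each_key_range([1, 2, 3, 5, 8, 13, 21, 34, 55], of: 4) { |range| ranges << range }
# ranges now holds three ranges: 1..5, 8..34 and 55..55
```

In the real extension the block receives a dataset rather than a range, so each iteration runs its own bounded query.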

Finally, here's an example including all the available options:

options = {
  of: 4,
  pk: [:project_id, :external_user_id],
  start: { project_id: 2, external_user_id: 3 },
  finish: { project_id: 5, external_user_id: 70 },
  order: :desc,
}

Event.where(type: "login").in_batches(**options) do |dataset|
  dataset.delete
end

Options

You can set the following options:

pk

Overrides the primary key of your dataset. This option is required if your table doesn't have a real primary key; otherwise you will get Sequel::Extensions::Batches::MissingPKError.

Note that the columns you provide must not contain NULL values, otherwise batching may not work as intended. You will receive Sequel::Extensions::Batches::NullPKError if batch processing detects a NULL value on its way, but this is not guaranteed, since for performance reasons it doesn't check all the rows.

of

Sets chunk size (1000 by default).

start

A hash of the form { [column]: <start_value> } that marks the start of the frame for batch processing. Note that you will get Sequel::Extensions::Batches::InvalidPKError if the hash has the wrong keys (their order matters as well).

finish

Same as start but represents the frame end.
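
As a rough illustration of how a frame over a composite key can be read, the sketch below compares key tuples column by column in the order the columns are listed. It is a pure-Ruby model, not the gem's code: in_frame? and PK are hypothetical names, and treating both bounds as inclusive is an assumption.

```ruby
PK = %i[project_id external_user_id].freeze

# Hypothetical helper: true when the row's key tuple lies between start and
# finish (both assumed inclusive), compared column by column in PK order.
def in_frame?(row, start:, finish:)
  # Mirror the documented rule that the frame keys, and their order, must match.
  unless [start, finish].all? { |frame| frame.keys == PK }
    raise ArgumentError, "frame keys must be #{PK.inspect}"
  end

  key = row.values_at(*PK)
  (key <=> start.values_at(*PK)) >= 0 && (key <=> finish.values_at(*PK)) <= 0
end

start  = { project_id: 2, external_user_id: 3 }
finish = { project_id: 5, external_user_id: 70 }

inside  = in_frame?({ project_id: 3, external_user_id: 999 }, start: start, finish: finish)
outside = in_frame?({ project_id: 2, external_user_id: 1 }, start: start, finish: finish)
# inside is true, outside is false
```

Because the first column is compared first, a row with project_id 3 falls inside the frame regardless of its external_user_id, while a row with project_id 2 also needs an external_user_id of at least 3.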

order

Specifies the primary key order (can be :asc or :desc). Defaults to :asc.
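
The effect of the order option can be pictured with a small pure-Ruby sketch that chunks a list of key values in either direction. chunk_keys is a hypothetical helper used only for illustration, and it assumes that :desc simply walks the keys from highest to lowest.

```ruby
# Hypothetical helper, not part of sequel-batches: groups key values into
# chunks of at most `of`, walking the keys in ascending or descending order.
def chunk_keys(ids, of:, order: :asc)
  sorted = order == :desc ? ids.sort.reverse : ids.sort
  sorted.each_slice(of).to_a
end

asc_chunks  = chunk_keys([3, 1, 5, 2, 4], of: 2)
desc_chunks = chunk_keys([3, 1, 5, 2, 4], of: 2, order: :desc)
# asc_chunks is [[1, 2], [3, 4], [5]]; desc_chunks is [[5, 4], [3, 2], [1]]
```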

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/umbrellio/sequel-batches.

License

The gem is available as open source under the terms of the MIT License.

Supported by Umbrellio
