Data Pipeline#
MLX::DSL::Data provides lazy, chainable enumerable transforms for trainer
input pipelines.
dataset = MLX::DSL::Data.from(records).batch(32)
trainer.fit(dataset, epochs: 3)
Entry points#
MLX::DSL::Data.from(source_or_block)MLX::DSL::Data.pipeline(...)
records = [{x: [1.0, 2.0], y: 0}, {x: [3.0, 4.0], y: 1}]
from_enum = MLX::DSL::Data.from(records)
from_block = MLX::DSL::Data.pipeline do
Enumerator.new do |yielder|
records.each { |row| yielder << row }
end
end
Transforms#
mapfilterbatch(size, drop_last:)take(count)repeat(times = nil)shuffle(seed:, random:)prefetch(size)
batches = MLX::DSL::Data
.from(records)
.shuffle(seed: 7)
.map { |row| [row[:x], row[:y]] }
.batch(32, drop_last: false)
.take(10)
Signature-aware map/filter#
map and filter support positional/keyword callable signatures with
item, index, and pipeline context.
data = MLX::DSL::Data
.from(records)
.map { |item, index| preprocess(item, index: index) }
.filter { |item, index:| index.even? }
.batch(32)
See implementation:
lib/mlx/dsl/data_pipeline.rb