Windows/WindowsIterator/WindowsDataset (#2338)
* Windows/WindowIterator/WindowDataset

* naming

* naming/book

* fix: format

* move to `transform` folder

* format

* fix: summary/example imports

* fix: docs/import

* fix: generics

* cleanup

* chore: format/cleanup

* chore: update summary

* chore: keep window.rs naming

* chore: summary

* fix: book table

* chore: summary fix

* chore: fix summary

* chore: fix summary

* chore: fix summaries

* chore: fix summaries

* add missing test coverage
NicoZweifel authored Oct 15, 2024
1 parent 2533509 commit 35bb19a
Showing 6 changed files with 298 additions and 159 deletions.
5 changes: 4 additions & 1 deletion burn-book/src/building-blocks/dataset.md
@@ -37,7 +37,7 @@ distributions.
| `PartialDataset` | Returns a view of the input dataset with a specified range. |
| `MapperDataset` | Computes a transformation lazily on the input dataset. |
| `ComposedDataset` | Composes multiple datasets together to create a larger one without copying any data. |
-| `WindowDataset` | Dataset designed to work with overlapping windows of data extracted from an input dataset. |
+| `WindowsDataset` | Dataset designed to work with overlapping windows of data extracted from an input dataset. |

Let us look at the basic usages of each dataset transform and how they can be composed together.
These transforms are lazy by default except when specified, reducing the need for unnecessary
@@ -91,6 +91,9 @@ let data_split = match split {
multiple sources (say different HuggingfaceDatasetLoader sources) into a single bigger dataset
which can be sampled from one source.

+- **WindowsDataset**: This transform creates overlapping windows of a dataset. It is
+particularly useful for sequential time series data, for example when working with an LSTM.

## Storage

There are multiple dataset storage options available for you to choose from. The choice of the
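The `WindowsDataset` bullet describes the transform only in prose. The following is a minimal sketch of the idea in plain `std` Rust, assuming hypothetical stand-ins `Dataset`, `InMem` and `Windows` (not burn-dataset's API; the new `transform/window.rs` is not shown on this page). Window `i` covers items `i..i + size`, so a base of length `n` yields `n - size + 1` windows, and items are fetched only when a window is requested.

```rust
use std::num::NonZeroUsize;

/// Hypothetical minimal dataset trait: random access by index plus a length.
/// (burn-dataset's real `Dataset<I>` is generic over the item type instead.)
trait Dataset {
    type Item;
    fn get(&self, index: usize) -> Option<Self::Item>;
    fn len(&self) -> usize;
}

/// Hypothetical in-memory dataset backed by a `Vec`.
struct InMem<T>(Vec<T>);

impl<T: Clone> Dataset for InMem<T> {
    type Item = T;
    fn get(&self, index: usize) -> Option<T> {
        self.0.get(index).cloned()
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

/// Lazy view over `base`: window `i` holds items `i..i + size`.
struct Windows<'a, D> {
    base: &'a D,
    size: NonZeroUsize,
}

impl<'a, D: Dataset> Windows<'a, D> {
    fn new(base: &'a D, size: NonZeroUsize) -> Self {
        Self { base, size }
    }
}

impl<'a, D: Dataset> Dataset for Windows<'a, D> {
    type Item = Vec<D::Item>;

    fn get(&self, index: usize) -> Option<Vec<D::Item>> {
        if index >= self.len() {
            return None;
        }
        // Collecting `Option<T>`s into `Option<Vec<T>>` stops at the first `None`.
        (index..index + self.size.get())
            .map(|i| self.base.get(i))
            .collect()
    }

    fn len(&self) -> usize {
        // `n - size + 1` windows, or 0 when the base is shorter than `size`.
        (self.base.len() + 1).saturating_sub(self.size.get())
    }
}

fn main() {
    // Toy univariate time series.
    let series = InMem(vec![1.0_f32, 2.0, 3.0, 4.0, 5.0]);
    let windows = Windows::new(&series, NonZeroUsize::new(3).expect("non-zero"));

    assert_eq!(windows.len(), 3);
    for i in 0..windows.len() {
        println!("{:?}", windows.get(i).unwrap());
    }
}
```

Run as-is, it prints `[1.0, 2.0, 3.0]`, `[2.0, 3.0, 4.0]` and `[3.0, 4.0, 5.0]`, one per line; each window could serve as one input sequence for a sequence model. Taking a `NonZeroUsize` moves the zero-size check to construction time, in the same spirit as the `NonZeroUsize::new(size).expect(..)` call in the `Dataset::windows` method this commit removes.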
34 changes: 2 additions & 32 deletions crates/burn-dataset/src/dataset/base.rs
@@ -1,6 +1,6 @@
-use std::{num::NonZeroUsize, sync::Arc};
+use std::sync::Arc;

-use crate::{dataset::window::WindowDataset, DatasetIterator};
+use crate::DatasetIterator;

/// The dataset trait defines a basic collection of items with a predefined size.
pub trait Dataset<I>: Send + Sync {
@@ -22,36 +22,6 @@ pub trait Dataset<I>: Send + Sync {
    {
        DatasetIterator::new(self)
    }
-
-    /// Returns a new `Dataset` of all the windows of length `size`. The windows overlap.
-    /// Is empty if the input `Dataset` is shorter than `size`.
-    ///
-    /// # Panics
-    ///
-    /// Panics if `size` is 0.
-    ///
-    /// # Examples
-    ///
-    /// ```
-    /// use crate::burn_dataset::{Dataset,InMemDataset};
-    /// let items = [1, 2, 3, 4].to_vec();
-    /// let dataset = InMemDataset::new(items.clone());
-    ///
-    /// let windows = dataset.windows(2);
-    ///
-    /// assert_eq!(windows.len(), 3);
-    /// ```
-    ///
-    /// # Returns
-    ///
-    /// A `WindowDataset` instance.
-    fn windows(&self, size: usize) -> WindowDataset<'_, I>
-    where
-        Self: Sized,
-    {
-        let size = NonZeroUsize::new(size).expect("window size must be non-zero");
-        WindowDataset::new(self, size)
-    }
}

impl<D, I> Dataset<I> for Arc<D>
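The removed `Dataset::windows` doc comment states the contract: overlapping windows of length `size`, an empty result when the input is shorter than `size`, and a panic when `size` is 0. The standard library's `slice::windows` follows the same rules, so a slice-based sketch can reproduce the doc example's count of `3` directly. `windows_of` is a hypothetical helper for illustration, not part of burn-dataset.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper following the removed `Dataset::windows` contract on a slice.
fn windows_of<T: Clone>(items: &[T], size: usize) -> Vec<Vec<T>> {
    // Reject a zero size up front, as the removed method did.
    let size = NonZeroUsize::new(size).expect("window size must be non-zero");
    // `slice::windows` yields overlapping sub-slices, and none when `items.len() < size`.
    items.windows(size.get()).map(|w| w.to_vec()).collect()
}

fn main() {
    // Same input as the removed doc example.
    let items = [1, 2, 3, 4].to_vec();
    let windows = windows_of(&items, 2);

    assert_eq!(windows.len(), 3);
    assert_eq!(windows, vec![vec![1, 2], vec![2, 3], vec![3, 4]]);
    // Shorter than the window size: no windows at all.
    assert!(windows_of(&items, 5).is_empty());
}
```

Checking with `NonZeroUsize` before calling `slice::windows` (which would also panic on 0) gives the clearer `"window size must be non-zero"` message used by the removed method.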
2 changes: 0 additions & 2 deletions crates/burn-dataset/src/dataset/mod.rs
@@ -1,12 +1,10 @@
mod base;
mod in_memory;
mod iterator;
-mod window;

pub use base::*;
pub use in_memory::*;
pub use iterator::*;
-pub use window::*;

#[cfg(any(test, feature = "fake"))]
mod fake;
124 changes: 0 additions & 124 deletions crates/burn-dataset/src/dataset/window.rs

This file was deleted.

2 changes: 2 additions & 0 deletions crates/burn-dataset/src/transform/mod.rs
@@ -3,9 +3,11 @@ mod mapper;
mod partial;
mod random;
mod sampler;
+mod window;

pub use composed::*;
pub use mapper::*;
pub use partial::*;
pub use random::*;
pub use sampler::*;
+pub use window::*;
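After this change the `window` module is declared under `transform` and glob re-exported with `pub use window::*;`, while `dataset/mod.rs` no longer declares or re-exports it. A minimal sketch of that pattern with inline modules follows, assuming a placeholder `WindowsDataset` struct rather than the real type. Public items of the private `window` module become reachable through the parent's path; how external callers reach them also depends on the crate root's exports, which this page does not show.

```rust
// Hypothetical miniature of the post-commit layout; `WindowsDataset` here is a
// placeholder struct, not burn-dataset's real type.
mod transform {
    // Private submodule, like `mod window;` in transform/mod.rs.
    mod window {
        pub struct WindowsDataset {
            pub size: usize,
        }
    }

    // Glob re-export, like `pub use window::*;`: public items of `window`
    // become reachable as `transform::WindowsDataset`.
    pub use window::*;
}

use transform::WindowsDataset;

fn main() {
    let dataset = WindowsDataset { size: 2 };
    println!("window size: {}", dataset.size);
}
```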