Update documentation pages (#57)
* add a sidebar for the more info page

* add descriptions and listings

* restructure create components

* restructure getting started

* fix project structure

* update test info

* Update running tests

* update more information pages

* create getting started page

* document how to run workflows

* document parameter lists

* update changelog

* remove duplicate info

* fix listing
rcannood authored Jan 12, 2024
1 parent 3f4002e commit 59f58de
Showing 27 changed files with 423 additions and 197 deletions.
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,43 @@
# OpenPipelines.bio next release

## MAJOR CHANGES

* Add descriptions to all pages and add listings to index pages.

* Update documentation on creating components for developers.

* Update getting started page for developers.

* Update project structure.

* Update information on running tests.

* Update "More information" pages.

* Write getting started page for user guide.

* Document how to run workflows.

* Document parameter lists.

# OpenPipelines.bio v0.12.1

## MAJOR CHANGES

* Update component documentation to v0.12.1 release.

# OpenPipelines.bio v0.12.0

## MAJOR CHANGES

* Update component documentation to v0.12.0 release.

# OpenPipelines.bio v0.11.0

## MAJOR CHANGES

* Update component documentation to v0.11.0 release.

# OpenPipelines.bio v0.10.0

## MAJOR CHANGES
6 changes: 5 additions & 1 deletion _quarto.yml
@@ -48,7 +48,11 @@ website:
- id: contributing
collapse-level: 3
title: Contributing
contents: contributing
contents: contributing
- id: more_information
collapse-level: 3
title: More information
contents: more_information

format:
html:
29 changes: 13 additions & 16 deletions contributing/creating_components.qmd
@@ -1,13 +1,14 @@
---
title: Creating components
description: A guide on how to create new components
order: 20
---

# A common file format
## A common file format

One of the core principles of OpenPipelines is to use [MuData](https://mudata.readthedocs.io/) as a common data format throughout the whole pipeline. See [the concepts page](../fundamentals/concepts.qmd#sec-common-file-format) for more information on how OpenPipelines uses MuData to store single-cell data.
One of the core principles of OpenPipelines is to use [MuData](https://mudata.readthedocs.io/) as a common data format throughout the whole pipeline. See [the concepts page](/fundamentals/concepts.qmd#sec-common-file-format) for more information on how OpenPipelines uses MuData to store single-cell data.

# Component location
## Component location
As discussed in [the project structure](project_structure.qmd#sec-project-structure), components in the repository are stored within `src`. Additionally, components are grouped into namespaces according to a common functionality. An example of such a namespace is the dimensionality reduction namespace (`dimred`), of which the components `pca` and `umap` are members. This means that `src` contains one folder per namespace, and each namespace folder stores the components that belong to it.

In order to create a new component in OpenPipelines, you will need to create a new folder that will contain the different elements of the component:
@@ -20,7 +21,7 @@ mkdir src/my_namespace/my_component
Take a look at the components that are already in `src/`! There might be a component that already does something similar to what you need.
:::

# The elements of a component
## The elements of a component
A component consists of one or more scripts that provide the functionality of the component, together with a configuration file containing the component's metadata. The [Viash config](https://viash.io/reference/config/) describes the metadata of your component, which script is used to run it, and the required dependencies. An in-depth guide on how to create components is available on the [Viash website](https://viash.io/guide/component/create-component.html), but a few specifics and guidelines will be discussed here.
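Once a component has a config, a quick sanity check is to let Viash print the fully parsed configuration; this confirms that the metadata, resources and dependencies are picked up as you expect. The component path below is only an illustration; substitute your own:

```bash
# Print the parsed config of an existing component (path is an example)
viash config view src/convert/from_10xh5_to_h5mu/config.vsh.yaml
```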

## The config
Expand Down Expand Up @@ -90,37 +91,33 @@ Resources checklist:
- Script resources are located next to the config and added to the config with the correct type (`python_script`, `r_script`, ...)
- Small resources (<50MB) that are not scripts can also be checked into the repo, next to the

### Build information
### The script file

TODO

## The script file
### Author information

TODO

## Author information
## Adding dependencies

TODO

# Adding dependencies

TODO

# Building components from their source
## Building components from their source
When running or [testing individual components](#running-component-unittests), it is not necessary to execute an extra command for the build step: `viash test` and `viash run` will build the component on the fly. However, before integrating components into a pipeline, you will need to build the components. More specifically, OpenPipelines uses Nextflow to combine components into pipelines, so the components need to be built for at least the `nextflow` platform. The easiest way to build the components is:

```bash
viash ns build --parallel --setup cachedbuild
```
After using `viash ns build`, the target folder will be populated with three subfolders, corresponding to the build platforms that viash supports: `native`, `docker` and `nextflow`. In contrast to `./bin/viash build`, `viash_build` will use all of the platforms defined in each of the components' configurations instead of the first one. Keep in mind that running `./bin/viash_build` will not always cause a component to be rebuilt completely. Caching mechanisms in the docker platform, for example, will make sure only components for which alterations have been made will be rebuilt, significantly reducing build times. In summary, using `./bin/viash_build` makes sure that the latest builds of the components are available before starting to integrate them in pipelines.
After using `viash ns build`, the target folder will be populated with three subfolders, corresponding to the build platforms that viash supports: `native`, `docker` and `nextflow`.
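If you only want to (re)build the components of a single namespace, the `ns` query flag can restrict the build; a sketch, assuming you are interested in the `convert` namespace:

```bash
# Only build components whose namespace matches the query
viash ns build --parallel --setup cachedbuild -q convert
```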

Building an individual component can still be useful, for example when debugging a component for which the build fails or if you want to create a standalone executable for a component to execute it without the need to use `viash`. To build an individual component, `./bin/viash build` can be used. Note that the default build directory of this viash base command is `output`, which is not the location where build components will be imported from when integrating them in pipelines. Using the `--output` argument, you can set it to any directory you want, for example:
Building an individual component can still be useful, for example when debugging a component for which the build fails or if you want to create a standalone executable for a component to execute it without the need to use `viash`. To build an individual component, `viash build` can be used. Note that the default build directory of this viash base command is `output`, which is not the location where built components will be imported from when integrating them in pipelines. Using the `--output` argument, you can set it to any directory you want, for example:

```bash
./bin/viash build ./src/filter/do_filter/config.vsh.yaml -o ./target/native/filter/do_filter/ -p native
viash build src/filter/do_filter/config.vsh.yaml -o target/native/filter/do_filter/ -p native
```
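The native build produces a standalone executable named after the component, so (assuming the build above succeeded) you can call it directly without going through `viash`:

```bash
# Inspect the arguments of the freshly built standalone executable
./target/native/filter/do_filter/do_filter --help
```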

# Containerization
## Containerization
One of the key benefits of using Viash is that containers can be created that gather dependencies per component,
which avoids building one container that has to incorporate all dependencies for a pipeline together. The containers for a single component can be reduced in size, defining the minimal requirements to run the component. That being said, building containers from scratch can be labour-intensive and error-prone, with base containers from reputable publishers often benefiting from improved reliability and security. Hence, a balance has to be struck between reducing the container's size and adding many dependencies to a small base container.

1 change: 1 addition & 0 deletions contributing/creating_pipelines.qmd
@@ -1,4 +1,5 @@
---
title: Creating pipelines
description: A guide on how to create new workflows
order: 30
---
26 changes: 19 additions & 7 deletions contributing/getting_started.qmd
@@ -1,10 +1,11 @@
---
title: Getting started
description: Install dependencies and fetch test resources
order: 0
---

# Forking the code and cloning the repository
The openpipelines code is hosted on GitHub. To start working on openpipeline, you should create your own copy of the repository by forking it. Visit the openpipeline repository [here](https://github.com/openpipelines-bio/openpipeline) and use the 'Fork' button on the top right hand side of the page. After you are done forking, you can clone the repository to a local directory on your computer using `git clone`. You can choose between using an SSH key to log in to GitHub or username and password (HTTPS) to connect to github.
## Forking the code and cloning the repository
The OpenPipelines code is hosted on GitHub. To start working on OpenPipelines, you should create your own copy of the repository by forking it. Visit the OpenPipelines repository [here](https://github.com/openpipelines-bio/openpipeline) and use the 'Fork' button on the top right-hand side of the page. After you are done forking, you can clone the repository to a local directory on your computer using `git clone`. You can choose between using an SSH key or a username and password (HTTPS) to connect to GitHub.

::: {.panel-tabset}

@@ -24,12 +25,23 @@ git remote add upstream https://github.com/openpipeline-bio/openpipeline.git

:::
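After cloning and adding the upstream remote, you can verify that both remotes point where you expect:

```bash
git remote -v
# origin    git@github.com:<your-username>/openpipeline.git (fetch, push)
# upstream  https://github.com/openpipelines-bio/openpipeline.git (fetch, push)
```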

# Installing `viash` and `nextflow` {#sec-install-viash-nextflow}
Openpipelines is being developed in Viash and Nextflow. If you are unfamiliar with either one of these platforms, you can check out their respective documentations [here](https://viash.io/guides/getting_started/introduction/) and [here](https://www.nextflow.io/docs/latest/index.html). To start contributing to openpipelines, you will need at least a working version of Java 11, OpenJDK 11, or a later version (up to Java 18). Additionally, by using [Docker](https://www.docker.com/),you can build and test pipeline components and pipelines without needing to manually install dependencies for these components on your machine.
## Install `viash` and `nextflow` {#sec-install-viash-nextflow}

Viash and Nextflow can be installed by following the guides in the documentation for both of these tools. Make sure the `viash` and `nextflow` binaries are located in an existing directory that is listed in your `$PATH`. You can check if everything worked by running the following two commands and see if they output the correct location of the executables.
To start contributing to OpenPipelines, you will need [Java 11](https://www.oracle.com/java/technologies/downloads/) or higher and [Docker](https://docs.docker.com/get-docker/) installed on your system.
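You can quickly confirm that both prerequisites are available from a terminal (the reported versions will differ on your system):

```bash
java -version          # should report Java 11 or higher
docker run hello-world # verifies that the Docker daemon is working
```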

OpenPipelines is being developed in [Viash](https://viash.io/quickstart/) and [Nextflow](https://www.nextflow.io/docs/latest/index.html). If you are unfamiliar with either one of these platforms, you can check out their respective documentation pages.

You can check if everything is installed correctly by running the following commands.

```bash
nextflow run hello -with-docker
viash --version
```

## Fetch test resources

OpenPipelines uses a number of test resources to test the pipelines. If everything is installed correctly, you should be able to fetch these resources by running the following command.

```bash
which viash
which nextflow
viash run src/download/sync_test_resources/config.vsh.yaml
```
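Arguments after the `--` separator are passed to the component itself rather than to Viash. The flags below are an assumption for illustration; check the component's own help output for the authoritative list:

```bash
# Show the component's arguments (everything after -- goes to the component)
viash run src/download/sync_test_resources/config.vsh.yaml -- --help
# Explicitly point the component at the public S3 bucket
viash run src/download/sync_test_resources/config.vsh.yaml -- --input s3://openpipelines-data
```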
Binary file added contributing/images/compare_across_forks.png
4 changes: 4 additions & 0 deletions contributing/index.qmd
@@ -1,4 +1,8 @@
---
title: Contributing
sidebar: true
listing:
template: ../templates/listing.ejs
type: default
contents: "./*.qmd"
---
8 changes: 4 additions & 4 deletions contributing/project_structure.qmd
@@ -1,18 +1,18 @@
---
title: Project structure
description: The structure of OpenPipelines
order: 10
---

## Project structure {#sec-project-structure}
The root of the repository contains three main folders:
The root of the repository contains two main folders:

1. `src`, which contains the source code for individual components.
2. The `workflows` folder containing the implementations of the pipelines (combining one or more components).
1. `src`, which contains the source code for components and workflows.
2. (optionally) the `target` folder

Each subfolder of `src` contains a Viash [namespace](https://viash.io/guides/projects/namespaces/), a logical grouping of pipeline components that perform a similar function. Within each namespace, subfolders designate individual pipeline components. For example, `./src/convert/from_bdrhap_to_h5ad` contains the implementation of a component `from_bdrhap_to_h5ad`, which is grouped together with other components such as `from_10xmtx_to_h5mu` into the namespace `convert`. In a similar manner to grouping components into namespaces, pipelines are grouped together into folders. However, these are not component namespaces and as such do not interact with `viash ns` commands.
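A sketch of what this layout looks like on disk (component names taken from the examples above; your checkout will contain many more namespaces):

```bash
# Show the namespace/component hierarchy two levels deep
# (requires the `tree` utility; `ls src/*` gives a similar overview)
tree -d -L 2 src
# src
# ├── convert
# │   ├── from_10xmtx_to_h5mu
# │   └── from_bdrhap_to_h5ad
# └── dimred
#     ├── pca
#     └── umap
```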

As will become apparent later on, Viash not only provides commands to perform operations on individual components, but also on groups of components in a namespace and all components in a project. As a rule of thumb, the basic Viash commands (like `./bin/viash test`) are designated for running commands on individual components, while `ns` commands are (`./bin/viash ns test`) are for namespaces.
As will become apparent later on, Viash not only provides commands to perform operations on individual components, but also on groups of components in a namespace and on all components in a project. As a rule of thumb, the basic Viash commands (like `viash test`) are designated for running commands on individual components, while `ns` commands (like `viash ns test`) are for namespaces.
When cloning a fresh repository, there will be no `target` folder present. This is because the target folder will only be created after components have been built.

## Versioning and branching strategy {#sec-versioning}
44 changes: 39 additions & 5 deletions contributing/pull_requests.qmd
@@ -1,14 +1,20 @@
---
title: Publishing your changes
description: How to create a pull request
order: 40
---

## Updating your pull request or branch
While creating changes on your local branch, another developer could have added new changes to the openpipeline repository.
These changes will need to be included into your local branch before a pull request can be merged. Updating your branch involves merging
the upstream branch (usually `main`) into your branch:
After ensuring that the implemented changes pass all relevant tests and meet the contribution guidelines, you can create a pull request following the steps below.

## Step 1: Merge upstream repository

Before contributing your changes, you need to merge the upstream main branch into your fork. This ensures that your changes are based on the latest version of the code.

To do this, enter the following commands adapted from [Syncing a Fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork#syncing-a-fork-branch-from-the-command-line) in your terminal or command prompt:

```bash
# add the upstream repository to your local repository
git remote add upstream https://github.com/openpipelines-bio/openpipeline.git
# download the changes from the openpipelines repo
git fetch upstream
# change your current branch to the branch of the pull request
@@ -17,4 +23,32 @@ git checkout <feature_branch>
git merge upstream/main
# push the updates, your pull request will also be updated
git push
```

## Step 2: Edit changelog

Add an entry to the CHANGELOG.md file describing the proposed changes.
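For example, an entry could look like the following; the component name and wording are placeholders, and the entry goes under the unreleased heading at the top of the file:

```markdown
# OpenPipelines.bio next release

## MAJOR CHANGES

* `my_namespace/my_component`: describe the change you are proposing here.
```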

## Step 3: Create pull request

The following steps were adapted from [Creating a pull request from a fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork).

1. Go to <https://github.com/openpipelines-bio/openpipeline/pulls>.
2. Click on the `New pull request` button.
3. On the compare page, click the `compare across forks` link below the title. ![](images/compare_across_forks.png)
4. On the right side, in the `head` section, select your fork and the branch you want to merge.
5. Click on `Create pull request`.
6. Construct your PR by giving it a title and description.
7. Make sure you select the `Allow edits from maintainers` box below the description.
8. If the PR is ready for review, click the `Create Pull Request` button. Otherwise, click the arrow next to the button, select `Create Draft Pull Request`, and click the button once it changes.

## Next steps

### Github Actions

Whenever a pull request (including a draft) is created, a GitHub workflow will perform checks. Passing these checks is a minimum requirement before a merge can be done.
When there are errors in the checks, try to fix them while waiting for a review. If it is not possible to fix the errors, add a comment to the PR to let the reviewers know.

### Review

Your PR will be reviewed by maintainers of OpenPipelines. During the review, you may be asked to make changes to the code.
29 changes: 10 additions & 19 deletions contributing/running_tests.qmd
@@ -1,46 +1,37 @@
---
title: Running tests
description: How to run component and integration tests.
order: 30
---


### Fetching the test data.
## Fetch the test data
The input data needed to run the tests must first be downloaded from the OpenPipelines Amazon AWS S3 bucket.
To do so, the `download/sync_test_resources` component can be used, which will download the data to the correct location (`resources_test`) by default.

```bash
./bin/viash run ./src/download/sync_test_resources/config.vsh.yaml -p docker -- --input s3://openpipelines-data
viash run src/download/sync_test_resources/config.vsh.yaml
```

Or, if you do not want to use Docker and have `aws-cli` tools installed natively:
```bash
./bin/viash run ./src/download/sync_test_resources/config.vsh.yaml -p native -- --input s3://openpipelines-data
```

### Running component unittests
## Run component tests
To build and run the tests for an individual component that you are working on, use [viash test](https://viash.io/api/commands/test/) with the `config.vsh.yaml` of the component you would like to test.
For example:

```bash
./bin/viash test ./src/convert/from_10xh5_to_h5mu/config.vsh.yaml
viash test src/convert/from_10xh5_to_h5mu/config.vsh.yaml
```
Keep in mind that when no platform is passed to `viash test`, it will use the first platform specified in the config, which is `docker` for most of the components in OpenPipelines. Use `-p native`, for example, if you do not want to use Docker.
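To run the same tests without Docker (assuming the component also defines a native platform and that its dependencies are installed locally), the call becomes:

```bash
viash test src/convert/from_10xh5_to_h5mu/config.vsh.yaml -p native
```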

It is also possible to execute the tests for all components in each namespace using `./bin/viash_test` (note the underscore instead of a space).

```bash
./bin/viash_test
```

### Integration tests
Individual integration tests can be run by using the `integration_test.sh` scripts for a pipeline, located next to the `main.nf` in the `workflows` folder.
It is also possible to execute the tests for all components in a namespace at once using:

```bash
./workflows/ingestion/cellranger_demux/integration_test.sh
viash ns test --parallel -q convert
```
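Dropping the query runs the tests for every component in every namespace; expect this to take considerably longer:

```bash
# Test every component in the repository
viash ns test --parallel
```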

Running all integration tests is also possible using a helper script that can be found at `workflows/test/integration_test.sh`. Using this script requires a working `R` installation with [tidyverse](https://www.tidyverse.org/packages/) installed. However, as pipelines are implemented by combining individual components
## Run integration tests
Individual integration tests can be run using the `integration_test.sh` script of a pipeline, located next to its `main.nf` in the `src/workflows` folder.

```bash
./workflows/test/integration_test.sh
src/workflows/ingestion/cellranger_demux/integration_test.sh
```
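A hypothetical way to run every integration test in one go, assuming each script can be executed from the repository root:

```bash
# Run all integration test scripts under src/workflows (can take a long time)
find src/workflows -name integration_test.sh -exec bash {} \;
```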
1 change: 1 addition & 0 deletions fundamentals/architecture.qmd
@@ -1,5 +1,6 @@
---
title: Architecture
description: Structure of the project
order: 30
d2:
layout: elk
1 change: 1 addition & 0 deletions fundamentals/concepts.qmd
@@ -1,5 +1,6 @@
---
title: Concepts
description: The core concepts behind this project
order: 20
---

4 changes: 4 additions & 0 deletions fundamentals/index.qmd
@@ -1,4 +1,8 @@
---
title: Fundamentals
sidebar: true
listing:
template: ../templates/listing.ejs
type: default
contents: "./*.qmd"
---
1 change: 1 addition & 0 deletions fundamentals/philosophy.qmd
@@ -1,5 +1,6 @@
---
title: Philosophy
description: Our approach and mission
order: 10
---

2 changes: 1 addition & 1 deletion fundamentals/roadmap.qmd
@@ -1,10 +1,10 @@
---
title: Roadmap
description: Development roadmap
order: 40
---



:::{.column-screen-inset-shaded}

```{mermaid}