PipeOp Specifications

General rules:

Inherit from PipeOp for general pipeops, from PipeOpTaskPreproc for preprocessing pipeops that have exactly one Task input and one Task output, and from PipeOpTaskPreprocSimple for the subset of these that perform exactly the same operation during training and prediction.
Override train_internal() and predict_internal() when inheriting from PipeOp. Override train_task()/train_dt() and predict_task()/predict_dt() when inheriting from PipeOpTaskPreproc, and possibly select_cols() when using the ..._dt() variants. Override get_state()/get_state_dt() and transform()/transform_dt() when inheriting from PipeOpTaskPreprocSimple, again with select_cols() as an option for the ..._dt() variants.
Set the train and predict columns of the $input and $output tables to the types these operations accept. Do not re-check input values for types that are already declared in the $input and $output tables. OK:
```
train_internal = function(inputs) {
  # inputs is a list; this checks a value, which the $input table cannot express
  if (inputs[[1]]$nrow < 1) stop("Input too small")
}
```
Bad (because the input type "Task" is already checked by the train() function):
```
train_internal = function(inputs) {
  # redundant: the declared input type has already been verified
  assert_task(inputs[[1]])
}
```
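The division of labour between the two examples above can be sketched in plain R. This is a toy stand-in only: `toy_train()`, `my_train_internal()` and `fake_task` are hypothetical names, and the real type check inside the PipeOp base class may differ in detail.

```r
# Toy stand-in for the type check that train() is described as performing.
# Not the mlr3pipelines implementation.
toy_train = function(inputs, input_types, train_internal) {
  for (i in seq_along(inputs)) {
    if (!inherits(inputs[[i]], input_types[[i]])) {
      stop(sprintf("Input %d is not of type '%s'", i, input_types[[i]]))
    }
  }
  # Declared types are verified once, above, so train_internal() can skip them.
  train_internal(inputs)
}

# Checks a value only; the type was already verified by toy_train().
my_train_internal = function(inputs) {
  if (inputs[[1]]$nrow < 1) stop("Input too small")
  inputs
}

# Hypothetical minimal "Task": a list with an nrow field and a class tag.
fake_task = structure(list(nrow = 3), class = "Task")

out = toy_train(list(fake_task), "Task", my_train_internal)

# A wrongly typed input is rejected before my_train_internal() ever runs.
type_error = tryCatch(
  toy_train(list(42), "Task", my_train_internal),
  error = function(e) conditionMessage(e)
)
```

Because the wrapper rejects wrongly typed inputs up front, a second type assertion inside `my_train_internal()` would never fire.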
Inputs to train_internal() / predict_internal() are always passed by reference, so any R6 object must be cloned before it is modified. This does not apply to train_task, train_dt, ... in PipeOpTaskPreproc[Simple]: PipeOpTaskPreproc[Simple] takes care of cloning, so Tasks/data.tables can be modified in place there.
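The by-reference behaviour is a property of R6 itself and can be seen without any PipeOp. In this minimal sketch, `Box` is a hypothetical stand-in for an R6 input such as a Task, and the two helper functions are illustrative only.

```r
library(R6)

# Hypothetical stand-in for an R6 input such as a Task.
Box = R6Class("Box", public = list(
  values = NULL,
  initialize = function(values) self$values = values
))

# Modifies its argument directly: the caller's object changes too.
scale_in_place = function(box) {
  box$values = box$values * 10
  box
}

# Clones first: the caller's object is left untouched.
scale_on_clone = function(box) {
  box = box$clone(deep = TRUE)
  box$values = box$values * 10
  box
}

original = Box$new(c(1, 2, 3))

result = scale_on_clone(original)
stopifnot(identical(original$values, c(1, 2, 3)))   # input unchanged
stopifnot(identical(result$values, c(10, 20, 30)))

invisible(scale_in_place(original))
stopifnot(identical(original$values, c(10, 20, 30)))  # input was mutated
```

A train_internal() or predict_internal() that behaved like `scale_in_place()` would silently change objects still held by other parts of the pipeline.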
The $state of a PipeOpTaskPreproc[Simple] must always be a named list. The machinery in PipeOpTaskPreproc[Simple] adds a few slots to it: $affected_cols, $intasklayout, $outtasklayout, and $dt_columns (the last only if train_task/predict_task/get_state/transform are not overridden). These names are therefore "reserved" and must not be set by a class inheriting from PipeOpTaskPreproc[Simple]. Although the $state of a plain PipeOp can be anything, keeping it a named list is recommended there as well.
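A small self-check along these lines can catch clashes early. `check_state()` is a hypothetical helper, not part of mlr3pipelines; it only encodes the two rules stated above (fully named list, no reserved names).

```r
# Slot names listed above as added by the PipeOpTaskPreproc[Simple] machinery.
reserved_slots = c("affected_cols", "intasklayout", "outtasklayout", "dt_columns")

# Hypothetical helper: validates a state before it is returned by a subclass.
check_state = function(state) {
  if (!is.list(state)) stop("state must be a list")
  nms = names(state)
  if (length(state) > 0 && (is.null(nms) || anyNA(nms) || any(nms == ""))) {
    stop("state must be fully named")
  }
  clash = intersect(nms, reserved_slots)
  if (length(clash) > 0) {
    stop("state uses reserved names: ", paste(clash, collapse = ", "))
  }
  invisible(state)
}

# A state with its own names passes silently.
check_state(list(center = 0, scale = 1))

# A state that reuses a reserved name is rejected.
msg = tryCatch(
  check_state(list(affected_cols = "x")),
  error = function(e) conditionMessage(e)
)
```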
Every change that the $train() method makes to the PipeOp must be captured in $state, i.e.
```
po2 = po1$clone(deep = TRUE)
po1$train(input)
po2$state = po1$state
po1 = po1$clone(deep = TRUE)
```
must leave po1 and po2 identical. (The last clone call is necessary to mirror effects done by po2 = po1$clone())
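The same sequence can be run against a toy class to see what the rule demands. `ToyCenter` is a hypothetical R6 class that only mimics the train/state contract; it is not a PipeOp, and comparing `$state` is sufficient here only because it is the class's single data field.

```r
library(R6)

# Hypothetical toy class, not part of mlr3pipelines.
ToyCenter = R6Class("ToyCenter", public = list(
  state = NULL,
  train = function(x) {
    # Everything learned during training goes into $state and nowhere else.
    self$state = list(mean = mean(x))
    x - self$state$mean
  }
))

po1 = ToyCenter$new()
po2 = po1$clone(deep = TRUE)
trained_output = po1$train(c(1, 2, 3))
po2$state = po1$state
po1 = po1$clone(deep = TRUE)

# Copying $state alone was enough to make the untrained clone match.
stopifnot(identical(po1$state, po2$state))
```

If `train()` also wrote to some other field, copying `$state` would no longer make `po2` match `po1`, which is exactly the situation the rule forbids.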

$predict() must be idempotent, that is, it must not change the PipeOp itself, i.e.

```
po2 = po1$clone(deep = TRUE)
po1$predict(input1)
po1$predict(input2)
po2$predict(input3)
po1 = po1$clone(deep = TRUE)
```

must leave po1 and po2 identical. (The last clone call for the same reason as above.)
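As a rough illustration, the check can be wrapped in a helper and applied to one class that respects the rule and one that does not. `GoodOp`, `BadOp` and `leaves_identical()` are hypothetical names; both toy classes start with a fixed `$state`, as if already trained, and only the listed fields are compared.

```r
library(R6)

# Respects the rule: predict() reads $state but never writes to the object.
GoodOp = R6Class("GoodOp", public = list(
  state = list(mean = 2),
  predict = function(x) x - self$state$mean
))

# Violates the rule: predict() has a side effect on the object.
BadOp = R6Class("BadOp", public = list(
  state = list(mean = 2),
  calls = 0,
  predict = function(x) {
    self$calls = self$calls + 1
    x - self$state$mean
  }
))

# Runs the sequence from the text and reports whether the named fields match.
leaves_identical = function(po1, fields) {
  po2 = po1$clone(deep = TRUE)
  po1$predict(c(1, 2))
  po1$predict(c(3, 4))
  po2$predict(c(5, 6))
  po1 = po1$clone(deep = TRUE)
  all(vapply(fields, function(f) identical(po1[[f]], po2[[f]]), logical(1)))
}

good_ok = leaves_identical(GoodOp$new(), "state")
bad_ok = leaves_identical(BadOp$new(), c("state", "calls"))
```

Here `good_ok` is `TRUE` and `bad_ok` is `FALSE`, because the call counter differs between the object that predicted twice and the one that predicted once.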
