PipeOp Specifications

mb706 edited this page Aug 12, 2019 · 4 revisions

General rules:

  • Inherit from PipeOp for general pipeops, from PipeOpTaskPreproc for preprocessing pipeops that have one task input and one task output, and from PipeOpTaskPreprocSimple for the subset of these that perform exactly the same operation during training and prediction.
  • Overwrite the train_internal() and predict_internal() functions when inheriting from PipeOp. When inheriting from PipeOpTaskPreproc, overwrite train_task()/train_dt() and predict_task()/predict_dt(), and possibly select_cols() (for the ..._dt() variants). When inheriting from PipeOpTaskPreprocSimple, overwrite get_state()/get_state_dt() and transform()/transform_dt(), and possibly select_cols() (for the ..._dt() variants).
  • Set the $input and $output train and predict columns to the acceptable types for these operations. Do not check input values for types that are already specified in the $input and $output tables. Ok:
    train_internal = function(inputs) {
      # value check that the $input table cannot express
      if (inputs[[1]]$nrow < 1) stop("Input too small")
    }
    Bad (because the input type "Task" is already checked by the train() function):
    train_internal = function(inputs) {
      # redundant: the type was already checked against $input
      assert_task(inputs[[1]])
    }
  • Inputs to train_internal() / predict_internal() are always passed by reference, so any R6 object among them must be cloned before it is modified. This is not the case for train_task(), train_dt(), ... in PipeOpTaskPreproc[Simple]: PipeOpTaskPreproc[Simple] takes care of cloning, so Tasks / data.tables can be modified in place.
  • PipeOpTaskPreproc[Simple] $state must always be a named list. The machinery in PipeOpTaskPreproc[Simple] adds a few slots: $affected_cols, $intasklayout, $outtasklayout, $dt_columns (the last one only if train_task/predict_task/get_state/transform are not overwritten). These names are therefore "reserved" and should not be set by a class inheriting from PipeOpTaskPreproc[Simple]. Even though the $state of a plain PipeOp can be anything, it is recommended to keep it a named list as well.
  • Every change made by the $train() method must be reflected in the $state variable, i.e.
    po2 = po1$clone(deep = TRUE)
    po1$train(input)
    po2$state = po1$state
    po1 = po1$clone(deep = TRUE)
    must leave po1 and po2 identical. (The last clone call is necessary so that po1 also carries any side effects that cloning had on po2 in po2 = po1$clone(deep = TRUE).)
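  • $predict() must be idempotent: the PipeOp must end up the same regardless of how often $predict() was called or with which inputs, i.e.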
    po2 = po1$clone(deep = TRUE)
    po1$predict(input1)
    po1$predict(input2)
    po2$predict(input3)
    po1 = po1$clone(deep = TRUE)
    must leave po1 and po2 identical. (The last clone call is there for the same reason as above.)
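
The last two rules can be tried out on a toy class. The following is a minimal sketch, assuming only the R6 package: ToyScaleOp is a hypothetical stand-in and not the mlr3pipelines PipeOp class, and its train() / predict() methods take a plain numeric vector rather than a list of inputs. Instead of comparing whole objects, it checks the weaker conditions that a copy given the trained $state predicts the same values, and that predicting leaves $state untouched.

```r
library(R6)

# Hypothetical toy class, NOT the mlr3pipelines PipeOp.
ToyScaleOp = R6Class("ToyScaleOp",
  public = list(
    state = NULL,
    train = function(x) {
      # Everything learned during training goes into $state and nowhere else.
      self$state = list(center = mean(x), scale = sd(x))
      (x - self$state$center) / self$state$scale
    },
    predict = function(x) {
      # Only reads $state; the object itself is not modified.
      (x - self$state$center) / self$state$scale
    }
  )
)

# Rule: every change made by $train() is reflected in $state.
po1 = ToyScaleOp$new()
po2 = po1$clone(deep = TRUE)
po1$train(c(1, 2, 3, 4))
po2$state = po1$state
state_rule_ok = identical(po1$predict(c(5, 6)), po2$predict(c(5, 6)))

# Rule: $predict() leaves the object as it was, however often it is called.
state_before = po1$state
po1$predict(c(10, 20))
po1$predict(c(30, 40))
predict_rule_ok = identical(state_before, po1$state)
```

A train() method that stored the center in a separate field instead of $state would make state_rule_ok FALSE here, which is the kind of hidden change the rule is meant to rule out.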