Improve %load_node for node that doesn't have persisted dataset #4169

Open
noklam opened this issue Sep 16, 2024 · 0 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

noklam (Contributor) commented Sep 16, 2024

Description

Originated from #4158

`%load_node` works great, but it doesn't work when a node's inputs are `MemoryDataset`s. I don't save every node's output, and it's not easy to figure out which nodes I need to run again to produce the data.
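To make the request concrete, here is a minimal sketch of the missing step: walking upstream from a target node until every input is either persisted or a raw input, and returning the nodes that would have to run first. It assumes a simplified node model; the `Node` dataclass, `nodes_to_rerun` and `example_nodes` are hypothetical names, not Kedro's API, and "persisted" simply means "not a `MemoryDataset`".

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Kedro node: a name plus dataset names."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def nodes_to_rerun(target: str, nodes: list[Node], persisted: set[str]) -> list[str]:
    """Return, in a valid execution order, the upstream nodes that must run
    before `target` so that all of its inputs are available.

    An input can be loaded directly if it is persisted or if no node produces
    it (a raw input or parameter); otherwise its producer has to be re-run,
    and that producer's inputs are checked in turn.
    """
    by_name = {n.name: n for n in nodes}
    producer = {out: n for n in nodes for out in n.outputs}

    needed: set[str] = set()
    stack = [by_name[target]]
    while stack:
        node = stack.pop()
        for ds in node.inputs:
            if ds in persisted or ds not in producer:
                continue  # loadable straight from the catalog
            upstream = producer[ds]
            if upstream.name not in needed:
                needed.add(upstream.name)
                stack.append(upstream)

    # Order the needed nodes so every producer runs before its consumers.
    graph = {
        name: {
            producer[ds].name
            for ds in by_name[name].inputs
            if ds in producer and producer[ds].name in needed
        }
        for name in needed
    }
    return list(TopologicalSorter(graph).static_order())


# Toy pipeline (hypothetical dataset names).
example_nodes = [
    Node("clean", ("raw",), ("clean_df",)),
    Node("features", ("clean_df",), ("features_df",)),
    Node("train", ("features_df", "parameters"), ("model",)),
]
```

With `features_df` kept in memory, `nodes_to_rerun("train", example_nodes, {"raw", "clean_df"})` returns `["features"]`; if only `raw` is persisted it returns `["clean", "features"]`. A `%load_node` built on something like this could run those nodes before loading the target's inputs.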

Context

  • Supporting these nodes would make the feature much more powerful: not every node has persisted data, and that currently limits how useful `%load_node` is.
  • This would enable a much more powerful slicing feature in Kedro-Viz, which is currently limited because we don't want slicing to generate a command that Kedro doesn't know how to run.
  • For the same reason, if we can support more flexible slicing, we may end up expanding the API of `kedro run` as well, since it could then support different combinations of slicing.

Possible Implementation

Check out `_suggest_resume_scenario` in `SequentialRunner`:

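```python
# Excerpt (not runnable on its own): `pipeline`, `done_nodes` and `catalog`
# come from the enclosing runner method, and `_find_nodes_to_resume_from`
# is a private helper in Kedro's runner module.
remaining_nodes = set(pipeline.nodes) - set(done_nodes)
postfix = ""
if done_nodes:
    # Find the nodes a resumed run should start from
    start_node_names = _find_nodes_to_resume_from(
        pipeline=pipeline,
        unfinished_nodes=remaining_nodes,
        catalog=catalog,
    )
    start_nodes_str = ",".join(sorted(start_node_names))
    # Suggested CLI argument for resuming, e.g. --from-nodes "a,b"
    postfix += f' --from-nodes "{start_nodes_str}"'
```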

This roughly contains the logic needed to resume a pipeline, but it's hidden in a private API; we need to surface it for more generic usage.
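As a rough illustration of what a surfaced, public version might look like, here is a sketch that returns the start nodes as data instead of only building a log message. It is not Kedro's implementation: `Node`, `find_nodes_to_resume_from`, `suggest_from_nodes_flag` and `example_nodes` are hypothetical, and "persisted" again means "not a `MemoryDataset`". It collects the unfinished nodes plus any ancestor reached through a non-persisted dataset, then keeps the nodes in that set that don't depend on another node in it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Kedro node."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def find_nodes_to_resume_from(
    nodes: list[Node], unfinished: set[str], persisted: set[str]
) -> set[str]:
    """Hypothetical public helper: names of the nodes a resumed run should
    start from so that every unfinished node can be (re)produced."""
    by_name = {n.name: n for n in nodes}
    producer = {out: n.name for n in nodes for out in n.outputs}

    # 1. Everything that must run: the unfinished nodes plus any ancestor
    #    whose output reaches them through a non-persisted dataset.
    to_run = set(unfinished)
    stack = list(unfinished)
    while stack:
        for ds in by_name[stack.pop()].inputs:
            parent = producer.get(ds)
            if ds in persisted or parent is None or parent in to_run:
                continue
            to_run.add(parent)
            stack.append(parent)

    # 2. Start points: nodes in `to_run` with no producer inside `to_run`.
    #    Every node in the set is downstream of at least one of them.
    return {
        name
        for name in to_run
        if not any(producer.get(ds) in to_run for ds in by_name[name].inputs)
    }


def suggest_from_nodes_flag(
    nodes: list[Node], unfinished: set[str], persisted: set[str]
) -> str:
    """Build a `kedro run` argument in the same shape as the runner's hint."""
    start = find_nodes_to_resume_from(nodes, unfinished, persisted)
    if not start:
        return ""
    start_nodes_str = ",".join(sorted(start))
    return f'--from-nodes "{start_nodes_str}"'


# Toy pipeline (hypothetical dataset names).
example_nodes = [
    Node("clean", ("raw",), ("clean_df",)),
    Node("features", ("clean_df",), ("features_df",)),
    Node("train", ("features_df", "parameters"), ("model",)),
    Node("evaluate", ("model", "features_df"), ("metrics",)),
]
```

If a run fails at `train` and `features_df` only lived in memory, the unfinished set `{"train", "evaluate"}` resolves to `{"features"}`, i.e. `--from-nodes "features"`. Returning names rather than a string is the design choice that would let `%load_node`, Kedro-Viz slicing and the runner share one piece of logic.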
