[BUG] Clarify non-opitionality of `TensordictPrimer` #2362

matteobettini · 2024-08-05T05:56:20Z

Without the primer, the collector does not feed any hidden state to the policy

in the RNN tutorial it is stated that the primer is optional and it is used just to store the hidden states in the buffer.

This is not true in practice. Not adding the primer will result in the collector not feeding the hidden states to the policy during execution. Which will silently cause the rnn to loose any recurrency.

In the tutorial it seems like the only reason the Primer is there is to store hidden states in the buffer (which I would also highly advise against, as this makes any algorithm on-policy and can lead to severe issues).

I think it needs to be changed and remove any claims on optionality of the primer. Instead strong claims about its non-optionality should be made as, if a user removes it, the tutorial will silently run without recurrency.

Reproduce

To reproduce, comment out this line

rl/tutorials/sphinx-tutorials/dqn_with_rnn.py

Line 269 in 0063741

env.append_transform(lstm.make_tensordict_primer())

and print the policy input at this line

rl/torchrl/collectors/collectors.py

Line 733 in 0063741

policy_output = self.policy(policy_input)

you will see that no hidden state is fed to the rnn during execution and no errors or warnings are thrown

The future I want to live in

In the future I want to live in there are no primers. The torchrl components are able to look at the policy outputs and carry forward whatever is in "next". The primer for me is a unique pain point particular to torchrl, as users doing recurrency in other libs won't have any equivalent of this and will probably forget it causing major silent bugs

matteobettini added the bug Something isn't working label Aug 5, 2024

matteobettini assigned vmoens Aug 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Clarify non-opitionality of `TensordictPrimer` #2362

[BUG] Clarify non-opitionality of `TensordictPrimer` #2362

matteobettini commented Aug 5, 2024 •

edited

Loading

[BUG] Clarify non-opitionality of TensordictPrimer #2362

[BUG] Clarify non-opitionality of TensordictPrimer #2362

Comments

matteobettini commented Aug 5, 2024 • edited Loading

Without the primer, the collector does not feed any hidden state to the policy

Reproduce

The future I want to live in

[BUG] Clarify non-opitionality of `TensordictPrimer` #2362

[BUG] Clarify non-opitionality of `TensordictPrimer` #2362

matteobettini commented Aug 5, 2024 •

edited

Loading