I think that when the actor generates the next state, it fails to use the state returned by the previous LSTM step as its initial state, as in this line: [trainer.py](https://github.com/voiler/unreal/blob/367620c3c6a94ea10bd78bb7bcad4ccb88b23d50/agent/trainer.py#L146)
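
To illustrate what I mean, here is a minimal sketch of the difference. It is not the repository's code: `toy_cell`, `rollout`, and `carry_state` are hypothetical names, and the cell is a toy stand-in for an LSTM that only mimics the `(output, new_state)` interface.

```python
import math


def toy_cell(x, state):
    """Hypothetical stand-in for an LSTM cell: returns (output, new_state)."""
    h, c = state
    c_new = 0.5 * c + x                 # toy "memory" update
    h_new = math.tanh(c_new + 0.1 * h)  # toy hidden output
    return h_new, (h_new, c_new)


def rollout(inputs, carry_state):
    """Run the toy cell over inputs, optionally carrying the state forward."""
    state = (0.0, 0.0)  # zero state at the start of an episode
    outputs = []
    for x in inputs:
        # With carry_state=False the cell is always fed the zero state,
        # which is the behaviour I suspect at the linked line.
        init = state if carry_state else (0.0, 0.0)
        out, state = toy_cell(x, init)
        outputs.append(out)
    return outputs


carried = rollout([1.0, 1.0, 1.0], carry_state=True)
reset = rollout([1.0, 1.0, 1.0], carry_state=False)
```

With identical inputs, `reset` gives the same output at every step because the cell has no memory of earlier steps, while `carried` changes from step to step. If the actor is meant to be recurrent across steps, I would expect the second behaviour.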