In your work, it seems that the decoder is only used to pre-train the model and is then discarded for downstream tasks, with only the encoder being used.
Is there a way for the decoder to predict text and recover the original Unicode representation of the input? Or is an OCR model (perhaps one trained jointly with the rest of the model) required for that?