Hello team,
Thank you so much for all your hard work in releasing this! I have a couple of quick questions about the model design and dataset:
- Did you initialize the weights from a specific base model before training, or was this model trained from scratch?
- How many tokens was this model trained on?
Thanks again!