Dear TR,
After training the "show, attend, and tell" and "top-down" caption models for 10 epochs, my training loss (~2.8) is still much higher than the upper bound suggested in the lab handout (2.3). As a benchmark, I also tried the "show and tell" model, and its loss is ~2.8 as well. Does the recommended upper bound also apply to the "show and tell" model? Since the "show and tell" model has no "TODO" sections in the code, I assume its loss should not be affected by the code I added.