Conversation
Thanks! This is highly appreciated. I haven't been able to make the time to do a careful review yet, but it's upcoming.

Multithreading support would be great, but even in a single thread, maintaining multiple computation graphs in parallel would help a lot, since it would enable model ensembling without having to reset the CG between querying different models. That would be great too!
Thanks again for contributing, and I'm super-sorry for taking so long to get to this! Here are a few comments/questions:
@oir FYI: If you're busy and don't have time to handle this, we can pick things up and do the rest on our side. Of course, if you're willing to help, we'll be happy to have you.

@neubig Hey! Sorry for the late response; I am willing to pick this up. I will go through your comments and address them, as well as attempt a rebase, hopefully soon enough.

@neubig To keep you updated: this week I am starting to look at this again (possibly alongside NAACL). We have noticed another minor issue with the PR that needs to be fixed (about guarding shared parameter pools), which will also be part of this PR after I do the rebase.

@oir Great, thanks!

Has any progress been made on this in the past two years?

If my comments above could be addressed, I'd be happy to merge a PR!

Please see also oir#1. It doesn't appear that the modifications are sufficient.
```cpp
for (size_t t = 0; t < 4; ++t) { threads[t].join(); }
for (size_t t = 0; t < 4; ++t) {
  for (size_t i = 1; i < results[t].size(); ++i) {
    BOOST_CHECK_CLOSE(results[t][0], results[t][i], 0.0001);
```
This code never runs because results[t].size() is always 1, so thread safety is never actually tested (unless the test crashes, which it does).
At dynet/tests/test-exec-dynamic.cc, lines 143–144 (commit 2da4a05), results[t].size() is always 1, so the loop is never entered.

If the code is changed to

```cpp
for (size_t t = 1; t < threadCount; ++t) {
  BOOST_CHECK_CLOSE(results[0][0], results[t][0], 0.0001);
}
```

the check will pass when the threads are all processed serially, which is not much of a surprise. When they are processed in parallel, the test crashes and the check is never performed. The PR contains some promising code, but it does not appear to be usable/correct. It should not be merged.
This PR includes changes to (optionally) enable thread-safe operation of DyNet, providing the ability to run multiple DyNet models within a single application, or to execute a single DyNet model over multiple data instances (computation graphs) concurrently.
This includes:
Multithread data parallelism, without copying a single model in memory, works as follows:

- a dynet::ParameterCollection object, shared between threads, contains the (physical) model parameters
- each thread makes its own copy of the model (e.g. an LSTMBuilder). This copy causes copies of model parameters, but that is okay because these are just shells that contain pointers to the same physical storage.

The main motivation was multithreaded inference, but the changes might possibly apply to training time as well, similar to asynchronous SGD training (which I did not test).
My implementation is limited to (and tested only on) the SimpleExecutionEngine (so no autobatch), and only for CPU devices.