…d laplacian noise to the counts to allow the usage of stats tables training examples to train a classifier
Conflicts: plugins/stats_table_procedure.cc testing/testing.mk
Conflicts: jml/math/xdiv.h plugins/stats_table_procedure.cc testing/MLDB-873_stats_table_test.py testing/testing.mk
…9_statstable_fncts_noise_injection
…l issue where the rounding was always done to the lower value
plugins/stats_table_procedure.cc
Outdated
static const std::string INJECT_NOISE_DOC_STR =
    "Inject laplacian noise to counts. This is useful when training "
    "a classifier on the examples that were used to generate the counts. "
Should we say that this is acceptable only if the count values are relatively large compared to the noise? How large?
I suggest we add more on label data leakage in the documentation of stats table functions. I feel this comment is hard to understand without an example.
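One way to put a number on "how large": a Laplace distribution with scale b has standard deviation b·√2, so the b = 3 default visible in the NoiseInjector constructor in this diff gives about 4.24. The sketch below only computes that ratio for a given count. The function names are hypothetical and not part of MLDB, and it does not settle what ratio is acceptable.

```cpp
#include <cmath>

// Standard deviation of a Laplace(mu, b) distribution: sqrt(2) * b.
// Hypothetical helper, not taken from the PR.
inline double laplaceStdDev(double b) {
    return std::sqrt(2.0) * b;
}

// Noise standard deviation relative to a count value.
// A small result means the injected noise barely perturbs the count.
inline double relativeNoise(double count, double b) {
    return laplaceStdDev(b) / count;
}
```

For example, relativeNoise(100.0, 3.0) is roughly 0.042, while relativeNoise(5.0, 3.0) is roughly 0.85, which would make the noise nearly as large as the count itself.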
mldb = mldb_wrapper.wrap(mldb)  # noqa

class MLDB1209StatstableBiasNoiseTest(MldbUnitTest):  # noqa
I had not realized that this is also lacking some validation points. I will add these too.
{
}

bool injectNoise;
We're trying to transition to C++11 default member initialization (i.e. bool injectNoise = false;)
std::string outcomeToUse;

bool injectNoise;

struct NoiseInjector {

NoiseInjector() : mu(0), b(3)
Initialize defaults in the declaration and kill the constructor.
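A minimal sketch of what the two suggestions above could look like once applied. Only mu, b, and injectNoise come from the diff. The double types and the ExampleConfig wrapper are assumptions made for illustration, since the surrounding declarations are not shown here.

```cpp
// Sketch of NoiseInjector with C++11 default member initializers.
// The member types are assumed to be double; the diff only shows
// the constructor initializer list mu(0), b(3).
struct NoiseInjector {
    double mu = 0.0;  // location parameter, default taken from the diff
    double b = 3.0;   // scale parameter, default taken from the diff
    // No user-written constructor: the defaults above apply whenever
    // an instance is default-constructed.
};

// Hypothetical stand-in for the config struct that holds injectNoise.
struct ExampleConfig {
    bool injectNoise = false;
};
```

With this layout, `NoiseInjector n;` yields mu == 0 and b == 3 without any constructor body, and a new member cannot be left uninitialized by forgetting to update an initializer list.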
mathieumb left a comment
+1 after very minor changes to constructor / member initialization.
This is Francois' branch with a change by vbisserie to make it compile and a unit test for the noise generation functions. A small issue was found in NoiseInjector::add_noise, where the rounding was always done to the lower value.
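To illustrate the rounding issue described above, here is a hedged sketch contrasting the two behaviours. It is not the actual body of NoiseInjector::add_noise, which is not shown in this conversation. Both function names are hypothetical, and the use of std::floor for the old behaviour is an assumption based on the phrase "always done to the lower value".

```cpp
#include <cmath>
#include <cstdint>

// Assumed old behaviour: always round toward the lower value.
// On average this shifts noisy counts down by about 0.5.
inline std::int64_t roundToLower(double noisyCount) {
    return static_cast<std::int64_t>(std::floor(noisyCount));
}

// Round to the nearest integer instead (halves round away from zero),
// which removes the systematic downward shift.
inline std::int64_t roundToNearest(double noisyCount) {
    return static_cast<std::int64_t>(std::llround(noisyCount));
}
```

For a noisy count of 4.7, roundToLower gives 4 while roundToNearest gives 5. For 4.2 both give 4.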