Mldb 1209 statstable fncts noise injection #884

Open
guyd wants to merge 14 commits into master from MLDB-1209_statstable_fncts_noise_injection

@guyd (Contributor) commented May 3, 2017

This is Francois' branch, with a change by vbisserie to make it compile and a unit test for the noise generation functions. A small issue was found in NoiseInjector::add_noise, where the noisy value was always rounded down to the lower value.
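
To make the rounding issue concrete, here is a minimal sketch of Laplace noise injection with both rounding behaviours. It is not the code from this branch: `NoiseInjectorSketch`, `sample_laplace`, `round_down` and `round_to_nearest` are hypothetical names, the sampling method (difference of two exponential draws) is a choice made for the illustration, and rounding to the nearest integer is only assumed to be the intended fix. The values `mu = 0` and `b = 3` mirror the constructor shown in the diff.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Behaviour described in the PR: always take the lower value.
inline std::int64_t round_down(double noisy)
{
    return static_cast<std::int64_t>(std::floor(noisy));
}

// One possible fix (assumption): round to the nearest integer.
inline std::int64_t round_to_nearest(double noisy)
{
    return static_cast<std::int64_t>(std::llround(noisy));
}

// Hypothetical stand-in for NoiseInjector; not the code from this branch.
struct NoiseInjectorSketch {
    double mu = 0.0;  // Laplace location, mirroring mu(0) in the diff
    double b = 3.0;   // Laplace scale, mirroring b(3) in the diff

    // The difference of two exponential draws with mean b follows
    // a Laplace(0, b) distribution; mu shifts its location.
    template <typename Rng>
    double sample_laplace(Rng & rng) const
    {
        std::exponential_distribution<double> expo(1.0 / b);
        double first = expo(rng);
        double second = expo(rng);
        return mu + first - second;
    }

    // Add noise to a count, then round to nearest rather than down.
    template <typename Rng>
    std::int64_t add_noise(std::int64_t count, Rng & rng) const
    {
        double noisy = static_cast<double>(count) + sample_laplace(rng);
        return round_to_nearest(noisy);
    }
};
```

With continuous noise, always rounding down shifts the noisy counts lower by roughly half a count on average, which is why the choice of rounding matters even though each individual difference is at most one.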

Francois Maillet and others added 12 commits July 3, 2016 16:47
…d laplacian noise to the counts to allow the usage of stats tables training examples to train a classifier
Conflicts:
	plugins/stats_table_procedure.cc
	testing/testing.mk
Conflicts:
	jml/math/xdiv.h
	plugins/stats_table_procedure.cc
	testing/MLDB-873_stats_table_test.py
	testing/testing.mk
…l issue where the rounding was always done to the lower value
…l issue where the rounding was always done to the lower value
…l issue where the rounding was always done to the lower value

static const std::string INJECT_NOISE_DOC_STR =
"Inject laplacian noise to counts. This is useful when training "
"a classifier on the examples that were used to genereate the counts. "

Contributor Author

genereate -> generate

Contributor Author

Should we say that this is acceptable only if the count values are relatively large compared to the noise? How large?
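
One rough way to reason about "how large": for Laplace noise with scale b, the expected absolute noise is b and the standard deviation is b times the square root of 2. The sketch below only computes these quantities; the function names are hypothetical, and it does not propose a threshold for the documentation.

```cpp
#include <cmath>

// Standard deviation of Laplace(mu, b) noise: b * sqrt(2).
inline double laplace_stddev(double b)
{
    return b * std::sqrt(2.0);
}

// Expected absolute noise (which equals b) as a fraction of the true count.
// Assumes count > 0.
inline double expected_relative_noise(double count, double b)
{
    return b / count;
}
```

With b = 3, the default in the NoiseInjector constructor, a count of 10 is perturbed by about 30% on average, while a count of 1000 is perturbed by about 0.3%.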

Contributor Author

I suggest we add more on label data leakage in the documentation of stats table functions. I feel this comment is hard to understand without an example.
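
As a possible starting point for such an example, the sketch below shows how counts built from the training rows can leak the label back into those same rows. It is a simplified illustration, not MLDB's stats table implementation: `Row`, `Counts`, `build_stats_table` and `positive_rate` are hypothetical names, and a single string key with a 0/1 label is assumed.

```cpp
#include <map>
#include <string>
#include <vector>

// One training example: a categorical feature value and a 0/1 label.
struct Row {
    std::string key;
    int label;
};

// Per-key counts, as a stats table would accumulate them.
struct Counts {
    int trials = 0;
    int positives = 0;
};

// Build the counts from the same rows that will later train the classifier.
inline std::map<std::string, Counts>
build_stats_table(const std::vector<Row> & rows)
{
    std::map<std::string, Counts> table;
    for (const Row & row : rows) {
        Counts & counts = table[row.key];
        counts.trials += 1;
        counts.positives += row.label;
    }
    return table;
}

// Feature derived from the counts: fraction of positive labels for a key.
inline double positive_rate(const std::map<std::string, Counts> & table,
                            const std::string & key)
{
    const Counts & counts = table.at(key);
    return static_cast<double>(counts.positives) / counts.trials;
}
```

For a key that appears in only one row, `positive_rate` is exactly that row's label, so a classifier trained on these features can read the label straight from the counts. Adding noise to the counts blurs that signal, which is most visible for rare keys.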


mldb = mldb_wrapper.wrap(mldb) # noqa

class MLDB1209StatstableBiasNoiseTest(MldbUnitTest): # noqa

Contributor Author

I had not realized that this test is also lacking some validation points. I will add these too.

Contributor Author

I've done that.

{
}

bool injectNoise;

Contributor

We're trying to transition to C++11 default member initialization (i.e. bool injectNoise = false).


std::string outcomeToUse;

bool injectNoise;

Contributor

Same.


struct NoiseInjector {

NoiseInjector() : mu(0), b(3)

Contributor

Initialize defaults in the declaration and kill the constructor.
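
A minimal sketch of what this suggestion could look like, using C++11 default member initializers. The struct name `NoiseInjectorDefaults` is hypothetical and the `double` member types are an assumption, since the diff excerpt shows only the constructor's initializer list.

```cpp
// Hypothetical sketch: defaults live in the declaration, so no
// user-written constructor is needed to set them.
struct NoiseInjectorDefaults {
    double mu = 0;  // assumed type; value taken from mu(0) in the diff
    double b = 3;   // assumed type; value taken from b(3) in the diff
};
```

A default-constructed `NoiseInjectorDefaults` then has `mu == 0` and `b == 3`, and any constructor added later picks up the same defaults unless it overrides them.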

Contributor

@mathieumb left a comment

+1 after very minor changes to constructor / member initialization.

3 participants