KevinLinGit edited this page Sep 24, 2014 · 7 revisions

How to prepare training data

The existing code consumes user-ad records from metaq, and then prepares training data automatically.

If your data comes from a source other than metaq, you have to implement your own pre-processor.

Offline model training data

You should extract user features and ad features for every user-ad record. Besides ordinary first-order features, LASER introduces conjunction features, which cross user features with ad features. For more detail, see the paper LASER: A Scalable Response Prediction Platform for Online Advertising.
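As an illustration, a conjunction feature can be formed by crossing each user feature with each ad feature. The sketch below is illustrative only; the key format ("userKey^adKey") and method names are assumptions, not LASER's actual feature extractor:

```java
import java.util.HashMap;
import java.util.Map;

public class ConjunctionFeatures {
    // Cross every user feature with every ad feature. The combined key
    // format "userKey^adKey" is an assumption for illustration.
    public static Map<String, Double> conjoin(Map<String, Double> userFeatures,
                                              Map<String, Double> adFeatures) {
        Map<String, Double> conj = new HashMap<>();
        for (Map.Entry<String, Double> u : userFeatures.entrySet()) {
            for (Map.Entry<String, Double> a : adFeatures.entrySet()) {
                // Value of the conjunction is the product of the two values.
                conj.put(u.getKey() + "^" + a.getKey(), u.getValue() * a.getValue());
            }
        }
        return conj;
    }
}
```

Note that conjunctions grow as the product of the two feature-set sizes, which is one reason LASER's paper discusses them separately from first-order features.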

Online model training data

Besides offline training data, you have to prepare online training data. For this, the trained offline model is used to calculate a known offset for each record.
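For instance, if the offline model is linear, its offset for a record is just the dot product of the offline weights and the record's features, and the online model then learns a correction on top of that fixed score. This is a hedged sketch of the idea, not LASER's exact formula (see the paper for details):

```java
public class OfflineOffset {
    // Assumed illustrative form: the offline model contributes a fixed
    // linear score w_offline · x, which the online model treats as a
    // known, untrained offset.
    public static double offset(double[] offlineWeights, double[] features) {
        double score = 0.0;
        for (int i = 0; i < features.length; i++) {
            score += offlineWeights[i] * features[i];
        }
        return score;
    }
}
```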

Note

Since each item has an individual model, the training data is stored with one line per item. Take OnlineVectorWritable as a reference.

Increment

Both kinds of training data above must be persisted on HDFS, and Quartz triggers the training task according to a specified policy. Before starting the task, it increments the filenames of the online and offline training data, so an increment interface is needed.
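A minimal sketch of what such an increment step might do: roll the active file to a new versioned name, so the triggered job reads a frozen snapshot while the consumer keeps appending to the next file. The class and method names here are hypothetical, not the actual LaserMessageConsumer API:

```java
public class TrainingDataIncrement {
    // Hypothetical naming scheme: "<basePath>.<version>". Incrementing the
    // version freezes the previous file for the training job while new
    // records go to the next file.
    public static String nextFileName(String basePath, int currentVersion) {
        return basePath + "." + (currentVersion + 1);
    }
}
```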

For details, see LaserMessageConsumer.

Train Model

How to train online model

Online training runs on Hadoop, and the optimizer is QNMinimizer.

To reduce latency, the partition policy is tuned in LrIterationInputFormat.
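One common way to reduce per-iteration latency is to size splits so the whole input is processed in a single map wave across the available slots. The arithmetic below is a hedged sketch of that idea, not the actual LrIterationInputFormat logic:

```java
public class SplitTuning {
    // Illustrative only: with totalBytes of input and mapSlots concurrent
    // mappers, a split of this size lets every mapper run exactly once,
    // so no iteration waits for a second map wave.
    public static long splitSize(long totalBytes, int mapSlots) {
        return (totalBytes + mapSlots - 1) / mapSlots; // ceiling division
    }
}
```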

How to train offline model

The offline model is trained with ADMM, and partitioning is also tuned. The guideline is that each mapper should process as much data as possible. For details, see AdmmIterationInputFormat.
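A sketch of the reasoning behind that guideline (illustrative only, not AdmmIterationInputFormat itself): ADMM solves one subproblem per partition and then reconciles them in a consensus step, so fewer, larger partitions mean less consensus overhead per iteration. The partition count is therefore driven by how much data a single mapper can hold:

```java
public class AdmmPartition {
    // Hypothetical sizing rule: give each mapper as much data as its
    // memory budget allows, which minimizes the number of ADMM
    // subproblems that must be reconciled.
    public static int numPartitions(long totalBytes, long maxBytesPerMapper) {
        return (int) ((totalBytes + maxBytesPerMapper - 1) / maxBytesPerMapper);
    }
}
```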

Note

Both the online model and the offline model use an L2 penalty.
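Concretely, an L2 penalty adds (lambda/2) * ||w||^2 to the training objective, which contributes lambda * w_i to each gradient component and shrinks weights toward zero. A small illustration (lambda is a tunable hyperparameter, not a value from the LASER code):

```java
public class L2Penalty {
    // Value of the L2 regularization term: (lambda / 2) * sum_i w_i^2.
    public static double penalty(double[] w, double lambda) {
        double sumSquares = 0.0;
        for (double wi : w) {
            sumSquares += wi * wi;
        }
        return 0.5 * lambda * sumSquares;
    }
}
```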
