Developer Document
The existing code consumes user-ad records from metaq and prepares the training data automatically.
If your data source is something other than metaq, you have to implement your own preprocessor.
Extract the user features and ad features for every user-ad record. Besides the ordinary first-order features, LASER introduces conjunction features. For more detail, see "LASER: A Scalable Response Prediction Platform for Online Advertising".
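As a rough illustration of the two feature kinds, the sketch below builds first-order features and conjunction features for one user-ad record. It assumes a conjunction feature is the product of one user feature and one ad feature. The function names, the `u:`/`a:` key prefixes, and the sample feature names are hypothetical and are not taken from the LASER code.

```python
from itertools import product


def first_order_features(user, ad):
    """Prefix each raw feature with its side so user and ad names cannot collide."""
    feats = {f"u:{name}": value for name, value in user.items()}
    feats.update({f"a:{name}": value for name, value in ad.items()})
    return feats


def conjunction_features(user, ad):
    """Cross every user feature with every ad feature; the value is their product."""
    return {
        f"u:{u_name}&a:{a_name}": u_value * a_value
        for (u_name, u_value), (a_name, a_value) in product(user.items(), ad.items())
    }


# Hypothetical record: two user features and one ad feature.
user = {"age_25_34": 1.0, "city_hz": 1.0}
ad = {"category_shoes": 1.0}
features = {**first_order_features(user, ad), **conjunction_features(user, ad)}
```

With two user features and one ad feature, this yields three first-order features and two conjunction features.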
Besides the offline training data, you have to prepare the online training data. For this, the trained offline model is used to calculate a known offset for each record.
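The role of the offset can be sketched as follows. This assumes the offline model is linear, that its score is held fixed during online training, and that the per-item online model only learns a correction on top of it. All names and weights here are hypothetical.

```python
import math


def dot(weights, features):
    """Sparse dot product; features missing from the weights contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def offline_offset(offline_weights, features):
    """Score from the already-trained offline model, treated as a constant online."""
    return dot(offline_weights, features)


def predict(offset, item_weights, user_features):
    """Response probability = sigmoid(offset + per-item correction)."""
    z = offset + dot(item_weights, user_features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical offline weights and one record.
offline_weights = {"u:age_25_34": 0.4, "a:category_shoes": -0.1}
record = {"u:age_25_34": 1.0, "a:category_shoes": 1.0}
offset = offline_offset(offline_weights, record)

# Before any online training the item weights are empty, so the
# prediction falls back to the offline model alone.
p = predict(offset, {}, {"u:age_25_34": 1.0})
```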
Because each item has its own model, the online training data is stored as one line per item. Take OnlineVectorWritable as a reference.
Both data sets have to be persisted on HDFS, and quartz triggers the training task according to the specified policy. Before starting a task, it increments the filenames of the online and offline training data, so an increment interface is needed.
For details, see LaserMessageConsumer.
This training is based on Hadoop, and the optimization is based on QNMinimizer.
To reduce latency, the partition policy is tuned by LrIterationInputFormat
The offline model is based on ADMM, and its partitioning is also tuned. The guideline is that one mapper should process as much data as possible. For details, see AdmmIterationInputFormat.
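For readers unfamiliar with ADMM, the sketch below shows the global step of consensus ADMM in scaled form with an L2 penalty on the shared weights. It assumes each mapper has already solved its local subproblem and returned its local weights `xs[i]` and scaled dual variables `us[i]`. How the LASER code actually divides this work between mappers and reducers is not shown here, and the names `consensus_update`, `rho`, and `lam` are hypothetical.

```python
def consensus_update(xs, us, rho, lam):
    """One global step of consensus ADMM for
        minimize  sum_i f_i(x_i) + (lam / 2) * ||z||^2   subject to  x_i = z.

    xs, us: lists of equal-length vectors, one pair per data partition.
    Returns the new consensus vector z and the updated dual variables.
    """
    n = len(xs)
    dim = len(xs[0])
    # z-update: closed form from setting the gradient with respect to z to zero.
    z = [
        rho * sum(xs[i][d] + us[i][d] for i in range(n)) / (lam + n * rho)
        for d in range(dim)
    ]
    # Dual update: accumulate each partition's disagreement with the consensus.
    new_us = [[us[i][d] + xs[i][d] - z[d] for d in range(dim)] for i in range(n)]
    return z, new_us


# Two partitions with one weight each.
z, us = consensus_update([[1.0], [3.0]], [[0.0], [0.0]], rho=1.0, lam=2.0)
```

Every iteration requires one pass in which all partitions report their local result, which is consistent with the guideline that each mapper should handle as much data as possible.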
Both the online model and the offline model use an L2 penalty.
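A minimal sketch of an L2-penalized logistic objective and its gradient is given below, in the value-plus-gradient form that a quasi-Newton optimizer typically consumes. The `(lam / 2) * ||w||^2` scaling, the dense-vector layout, and the function name are assumptions. For the offline model the offset would simply be 0.

```python
import math


def l2_logistic_loss(weights, rows, lam):
    """Negative log-likelihood plus (lam / 2) * ||weights||^2.

    rows: list of (features, offset, label) with label in {0, 1};
    features is a dense list the same length as weights.
    Returns (loss, gradient).
    """
    loss = 0.0
    grad = [0.0] * len(weights)
    for x, offset, y in rows:
        z = offset + sum(w * xi for w, xi in zip(weights, x))
        # log(1 + exp(z)) - y * z, written to avoid overflow for large |z|.
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
        # Numerically stable sigmoid.
        if z >= 0:
            p = 1.0 / (1.0 + math.exp(-z))
        else:
            ez = math.exp(z)
            p = ez / (1.0 + ez)
        for k, xi in enumerate(x):
            grad[k] += (p - y) * xi
    # L2 penalty and its gradient.
    loss += 0.5 * lam * sum(w * w for w in weights)
    grad = [g + lam * w for g, w in zip(grad, weights)]
    return loss, grad


# One positive example, zero weights, zero offset.
loss, grad = l2_logistic_loss([0.0], [([1.0], 0.0, 1)], lam=1.0)
```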