From 962bd979328b34deb3197d62fca56a650f39354b Mon Sep 17 00:00:00 2001 From: Patrik Schilter Date: Thu, 28 Aug 2025 16:43:31 +0200 Subject: [PATCH 1/4] Docu: `summary.clv.fitted` --- man-roxygen/template_summary_clvfitted.R | 5 +++++ man/summary.clv.fitted.Rd | 4 ++++ 2 files changed, 9 insertions(+) diff --git a/man-roxygen/template_summary_clvfitted.R b/man-roxygen/template_summary_clvfitted.R index ca0a1d49..aaea32ad 100644 --- a/man-roxygen/template_summary_clvfitted.R +++ b/man-roxygen/template_summary_clvfitted.R @@ -14,6 +14,11 @@ #' (for example if specified in parameter \code{optimx.args}), all information here refers to #' the last method/row of the resulting \code{optimx} object. #' +#' Note that for the main model coefficients (coefs not for covariates), +#' \code{z-val} and p-values are set to \code{NA} because they are by definition always +#' strictly positive and hypothesis test relative to a null of 0 does not make sense. +#' +#' #' @return This function computes and returns a list of summary information of the fitted model #' given in \code{object}. It returns a list of class \code{summary.clv.no.covariates} that contains the #' following components: diff --git a/man/summary.clv.fitted.Rd b/man/summary.clv.fitted.Rd index e0a3a0a5..28bb12e6 100644 --- a/man/summary.clv.fitted.Rd +++ b/man/summary.clv.fitted.Rd @@ -69,6 +69,10 @@ Summary method for fitted CLV models that provides statistics about the estimate and information about the optimization process. If multiple optimization methods were used (for example if specified in parameter \code{optimx.args}), all information here refers to the last method/row of the resulting \code{optimx} object. + +Note that for the main model coefficients (coefs not for covariates), +\code{z-val} and p-values are set to \code{NA} because they are by definition always +strictly positive and hypothesis test relative to a null of 0 does not make sense. 
} \examples{ \donttest{ From e29cd034dc8638dcc0a6fb1050eecc55613ebb68 Mon Sep 17 00:00:00 2001 From: Patrik Schilter Date: Thu, 28 Aug 2025 16:58:01 +0200 Subject: [PATCH 2/4] Vignette: Explain significance indicators are NA --- vignettes/CLVTools.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/CLVTools.Rmd b/vignettes/CLVTools.Rmd index 02df86cb..c3e9df51 100644 --- a/vignettes/CLVTools.Rmd +++ b/vignettes/CLVTools.Rmd @@ -155,7 +155,7 @@ To execute the model estimation you have the choice between a formula-based inte -Parameter estimates may be reported by either printing the estimated object (i.e. `est.pnbd`) directly in the console or by calling `summary(est.pnbd)` to get a more detailed report including the likelihood value as well as AIC and BIC. Alternatively parameters may be directly extracted using `coef(est.pnbd)`. Also `loglik()`, `confint()` and `vcov()` are available to directly access the Loglikelihood value, confidence intervals for the parameters and to calculate the Variance-Covariance Matrix for the fitted model. For the standard Pareto/NBD model, we get 4 parameters $r, \alpha, s$ and $\beta$. where $r,\alpha$ represent the shape and scale parameter of the gamma distribution that determines the purchase rate and $s,\beta$ of the attrition rate across individual customers. $r/\alpha$ can be interpreted as the mean purchase and $s/\beta$ as the mean attrition rate. A significance level is provided for each parameter estimates. In the case of the apparelTrans dataset we observe a an average purchase rate of $r/\alpha=0.147$ transactions and an average attrition rate of $s/\beta=0.031$ per customer per week. KKT 1 and 2 indicate the Karush-Kuhn-Tucker optimality conditions of the first and second order [@KKT]. If those criteria are not met, the optimizer has probably not arrived at an optimal solution. If this is the case it is usually a good idea to rerun the estimation using alternative starting values. 
+Parameter estimates may be reported by either printing the estimated object (i.e. `est.pnbd`) directly in the console or by calling `summary(est.pnbd)` to get a more detailed report including the likelihood value as well as AIC and BIC. Alternatively, parameters may be directly extracted using `coef(est.pnbd)`. Also `logLik()`, `confint()` and `vcov()` are available to directly access the log-likelihood value, the confidence intervals for the parameters and the variance-covariance matrix of the fitted model. For the standard Pareto/NBD model, we get 4 parameters $r, \alpha, s$ and $\beta$, where $r,\alpha$ represent the shape and scale parameter of the gamma distribution that determines the purchase rate and $s,\beta$ those of the gamma distribution that determines the attrition rate across individual customers. $r/\alpha$ can be interpreted as the mean purchase rate and $s/\beta$ as the mean attrition rate. Note that the significance indicators are set to `NA` for each parameter: the main model parameters are by definition always strictly positive, so a hypothesis test relative to a null of 0 does not make sense. In the case of the apparelTrans dataset we observe an average purchase rate of $r/\alpha=0.147$ transactions and an average attrition rate of $s/\beta=0.031$ per customer per week. KKT 1 and 2 indicate the Karush-Kuhn-Tucker optimality conditions of the first and second order [@KKT]. If those criteria are not met, the optimizer has probably not arrived at an optimal solution. If this is the case it is usually a good idea to rerun the estimation using alternative starting values. ```{r param-summary} @@ -383,7 +383,7 @@ est.pnbd.dyn <- pnbd(clv.dyn, optimx.args = list(control=list(trace=5))) ``` -To inspect the estimated model we use `summary()`, however all other commands such as `print()`, `coef()`, `loglike()`, `confint()` and `vcov()` are also available. Now, output contains also parameters for the covariates for both processes. 
Since covariates are added separately for the purchase and the attrition process, there are also separate model parameters for the two processes. These parameters are directly interpretable as rate elasticity of the corresponding factors: A 1% change in a contextual factor $\bf{X}^{P}$ or $\bf{X}^{L}$ changes the purchase or the attrition rate by $\gamma_{purch}\bf{X}^{P}$ or $\gamma_{life}\bf{X}^{L}$ percent, respectively [@Gupta1991]. In the example of the apparel retailer, we observe that female customer purchase significantly more (`trans.Gender=1.42576`). Note, that female customers are coded as 1, male customers as 0. Also customers acquired offline (coded as Channel=1), purchase more (`trans.Channel=0.40304`) and stay longer (`life.Channel=0.9343`). Make sure to check the Karush-Kuhn-Tucker optimality conditions of the first and second order [@KKT] (KKT1 and KKT1) before interpreting the parameters. If those criteria are not met, the optimizer has probably not arrived at an optimal solution. If this is the case it is usually a good idea to rerun the estimation using alternative starting values. +To inspect the estimated model we use `summary()`; all other commands such as `print()`, `coef()`, `logLik()`, `confint()` and `vcov()` are also available. The output now also contains parameters for the covariates of both processes. Since covariates are added separately for the purchase and the attrition process, there are also separate model parameters for the two processes. Note that while the significance indicators are `NA` for the main model parameters, they are reported for the covariate parameters because a hypothesis test relative to a null of 0 does make sense for them. 
These parameters are directly interpretable as rate elasticities of the corresponding factors: a 1% change in a contextual factor $\bf{X}^{P}$ or $\bf{X}^{L}$ changes the purchase or the attrition rate by $\gamma_{purch}\bf{X}^{P}$ or $\gamma_{life}\bf{X}^{L}$ percent, respectively [@Gupta1991]. In the example of the apparel retailer, we observe that female customers purchase significantly more (`trans.Gender=1.42576`). Note that female customers are coded as 1, male customers as 0. Customers acquired offline (coded as Channel=1) also purchase more (`trans.Channel=0.40304`) and stay longer (`life.Channel=0.9343`). Make sure to check the Karush-Kuhn-Tucker optimality conditions of the first and second order [@KKT] (KKT1 and KKT2) before interpreting the parameters. If those criteria are not met, the optimizer has probably not arrived at an optimal solution. If this is the case it is usually a good idea to rerun the estimation using alternative starting values. ```{r Cov-summary} From 7ef3062a0fdc6ca3ee7e3209265ff72f64cf35fd Mon Sep 17 00:00:00 2001 From: Patrik Schilter Date: Thu, 28 Aug 2025 17:03:47 +0200 Subject: [PATCH 3/4] Examples: Explain why NA --- man-roxygen/template_examples_nocovmodelinterface.R | 5 ++++- man/bgnbd.Rd | 5 ++++- man/ggomnbd.Rd | 5 ++++- man/pnbd.Rd | 5 ++++- 4 files changed, 16 insertions(+), 4 deletions(-) diff --git a/man-roxygen/template_examples_nocovmodelinterface.R b/man-roxygen/template_examples_nocovmodelinterface.R index b92de087..df3de503 100644 --- a/man-roxygen/template_examples_nocovmodelinterface.R +++ b/man-roxygen/template_examples_nocovmodelinterface.R @@ -21,7 +21,10 @@ #' # estimated coefs #' coef(apparel.<%=name_model_short%>) #' -#' # summary of the fitted model +#' # summary of the fitted model. +#' # Note that the significance indicators are set to NA on purpose because all +#' # model parameters are by definition strictly positive. A hypothesis test +#' # relative to a null of 0 therefore does not make sense. 
#' summary(apparel.<%=name_model_short%>) #' #' # predict CLV etc for holdout period diff --git a/man/bgnbd.Rd b/man/bgnbd.Rd index b9fc4e32..8023a9d6 100644 --- a/man/bgnbd.Rd +++ b/man/bgnbd.Rd @@ -130,7 +130,10 @@ apparel.bgnbd <- bgnbd(clv.data.apparel, # estimated coefs coef(apparel.bgnbd) -# summary of the fitted model +# summary of the fitted model. +# Note that the significance indicators are set to NA on purpose because all +# model parameters are by definition strictly positive. A hypothesis test +# relative to a null of 0 therefore does not make sense. summary(apparel.bgnbd) # predict CLV etc for holdout period diff --git a/man/ggomnbd.Rd b/man/ggomnbd.Rd index 6fcbbad0..708082eb 100644 --- a/man/ggomnbd.Rd +++ b/man/ggomnbd.Rd @@ -116,7 +116,10 @@ apparel.ggomnbd <- ggomnbd(clv.data.apparel, # estimated coefs coef(apparel.ggomnbd) -# summary of the fitted model +# summary of the fitted model. +# Note that the significance indicators are set to NA on purpose because all +# model parameters are by definition strictly positive. A hypothesis test +# relative to a null of 0 therefore does not make sense. summary(apparel.ggomnbd) # predict CLV etc for holdout period diff --git a/man/pnbd.Rd b/man/pnbd.Rd index 956570d9..f4a774e4 100644 --- a/man/pnbd.Rd +++ b/man/pnbd.Rd @@ -173,7 +173,10 @@ apparel.pnbd <- pnbd(clv.data.apparel, # estimated coefs coef(apparel.pnbd) -# summary of the fitted model +# summary of the fitted model. +# Note that the significance indicators are set to NA on purpose because all +# model parameters are by definition strictly positive. A hypothesis test +# relative to a null of 0 therefore does not make sense. 
summary(apparel.pnbd) # predict CLV etc for holdout period From 9473a0e6070c5ae0078fe4f832820ab4e79927d2 Mon Sep 17 00:00:00 2001 From: Patrik Schilter Date: Thu, 28 Aug 2025 17:05:24 +0200 Subject: [PATCH 4/4] Improve docu --- man-roxygen/template_summary_clvfitted.R | 4 ++-- man/summary.clv.fitted.Rd | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/man-roxygen/template_summary_clvfitted.R b/man-roxygen/template_summary_clvfitted.R index aaea32ad..092fad8e 100644 --- a/man-roxygen/template_summary_clvfitted.R +++ b/man-roxygen/template_summary_clvfitted.R @@ -14,8 +14,8 @@ #' (for example if specified in parameter \code{optimx.args}), all information here refers to #' the last method/row of the resulting \code{optimx} object. #' -#' Note that for the main model coefficients (coefs not for covariates), -#' \code{z-val} and p-values are set to \code{NA} because they are by definition always +#' Note that for the main model coefficients (all coefficients except those for covariates), +#' the significance indicators \code{z-val} and p-values are set to \code{NA} because these coefficients are by definition always #' strictly positive and hypothesis test relative to a null of 0 does not make sense. #' #' diff --git a/man/summary.clv.fitted.Rd b/man/summary.clv.fitted.Rd index 28bb12e6..566d4596 100644 --- a/man/summary.clv.fitted.Rd +++ b/man/summary.clv.fitted.Rd @@ -70,8 +70,8 @@ and information about the optimization process. If multiple optimization methods (for example if specified in parameter \code{optimx.args}), all information here refers to the last method/row of the resulting \code{optimx} object. 
-Note that for the main model coefficients (coefs not for covariates), -\code{z-val} and p-values are set to \code{NA} because they are by definition always +Note that for the main model coefficients (all coefficients except those for covariates), +the significance indicators \code{z-val} and p-values are set to \code{NA} because these coefficients are by definition always strictly positive and hypothesis test relative to a null of 0 does not make sense. } \examples{