The aim of the SURVPOOL project is to investigate the potential influence of lifetime exposure to modifiable risk factors, such as overweight and obesity, on the occurrence of cancer and death due to cancer. Here, we briefly describe the methodology used to:

- define individual-specific indicators of lifetime exposure to overweight and obesity from repeated BMI measurements; and
- assess the effect of these indicators on a time-to-event outcome (occurrence of cancer or death due to cancer).

\( \newcommand{\ud}{\mathrm{d}} \newcommand{\p}{\mathrm{P}} \newcommand{\np}{\mathrm{NP}} \newcommand{\pr}{\mathrm{Pr} } \newcommand{\tlm}{\textlengthmark} \newcommand{\bth}{\boldsymbol{\theta}} \newcommand{\yb}{\mathbf{y}} \newcommand{\xb}{\mathbf{x}} \newcommand{\zb}{\mathbf{z}} \newcommand{\w}{\mathbf{w}} \newcommand{\rr}{\mathrm{RR}} \newcommand{\bbt}{\boldsymbol{\beta}} \newcommand{\bgm}{\boldsymbol{\gamma}} \newcommand{\xbi}{\boldsymbol{\xi}} \newcommand{\bzt}{\boldsymbol{\zeta}} \)

Body mass index (BMI) trajectories over time will be modelled for every study participant with at least two BMI assessments (after exclusion of BMI information in the year before and the year after cancer diagnosis for those who developed an invasive malignancy). This will be done using a quadratic growth model with a random intercept and random slope. More precisely, for individual \(i\) and measurement occurrence \(j\), the BMI will be modelled as a quadratic polynomial of age according to the following equation:
\begin{equation} \label{eq:1} \nonumber \mathrm{BMI}_{ij}= (\alpha_0+u_{0i})+(\alpha_1+u_{1i})\cdot\mathrm{Age}_{ij}+\alpha_2\cdot\mathrm{Age}_{ij}^2+\varepsilon_{ij} \end{equation}
\begin{equation} \label{eq:3} \nonumber \mathrm{with~~} \begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} \sim N(\mathbf{0},\Sigma) \mathrm{~~and~~} \varepsilon_{ij}\sim N(0,\sigma^2) \end{equation}
This model describes an individual's BMI trajectory as the sum of:

- an overall trajectory (i.e. the mean trajectory of the study population), given by the quadratic polynomial in age with coefficients \(\alpha_0\), \(\alpha_1\) and \(\alpha_2\); and
- an individual-specific component given by the random coefficients \(u_{0i}\) and \(u_{1i}\) (the individual-specific random intercept and slope, respectively); this allows the individual trajectory to diverge from the overall trajectory.
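As a minimal sketch of how the two components combine, the Python snippet below builds an individual trajectory from a set of fixed effects and one individual's random intercept and slope. The function name and all numerical values are hypothetical and chosen only for illustration; in practice, the coefficients would be the estimates and predicted random effects obtained from the fitted growth model.

```python
def individual_trajectory(alpha, u):
    """Return a function age -> modelled BMI for one individual.

    alpha: (alpha0, alpha1, alpha2), the fixed effects (overall trajectory).
    u:     (u0, u1), the individual's random intercept and random slope.
    """
    a0, a1, a2 = alpha
    u0, u1 = u

    def bmi(age):
        # Overall quadratic curve shifted by the individual-specific terms.
        return (a0 + u0) + (a1 + u1) * age + a2 * age ** 2

    return bmi


# Hypothetical values, for illustration only (not SURVPOOL estimates).
alpha = (18.0, 0.25, -0.001)

# Random effects of zero reproduce the mean trajectory of the population.
population_mean = individual_trajectory(alpha, (0.0, 0.0))

# An individual with a higher intercept and a steeper slope than average.
person = individual_trajectory(alpha, (1.5, 0.05))

for age in (20, 35, 50):
    print(age, round(population_mean(age), 2), round(person(age), 2))
```

Because the quadratic term has no random component in this model, the gap between an individual curve and the overall curve is linear in age: \(u_{0i} + u_{1i}\cdot\mathrm{Age}\).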

Each individual BMI trajectory can then be described by a simple polynomial equation using the estimated parameters from this model, as illustrated in the figure below. The left panel shows the observed trajectories for a group of individuals, with two individual trajectories highlighted (red and dark-blue lines). The number and the timing of the measurements differ from one individual to the next, making it difficult to determine meaningful average values (e.g. the mean BMI at the age of 50 years). The right panel shows the result of the modelling process: the dashed black curve represents the mean BMI trajectory of the population, and each light-blue line represents an individual-specific trajectory (the red and dark-blue curves represent the modelled trajectories for the same two individuals highlighted in the left panel). These individual-specific curves can be used to define individual BMI-related variables, as explained in the next section.

The growth curve model described in the previous section enables us to define individual-specific BMI trajectories as simple quadratic polynomial functions of age. These curves will be used to define several BMI-related variables that summarize the individual BMI trajectories and that can be used as predictors in time-to-event models. For the purpose of this example, we restrict our interest to summary BMI-related variables over a specified period of time of length \(T\) (in this case, between the ages of 20 and 50 years, so \(T\) = 30 years). We describe three such variables below:

- \(\mu_{\mathrm{BMI}}\) is the area under the BMI trajectory curve over the defined period of time. The mean BMI over the entire time period can be calculated by dividing \(\mu_{\mathrm{BMI}}\) by \(T\), the length of the period.
- \(t_{v}\) is the time spent with a BMI at or exceeding a given threshold value, \(v\) (typically \(v\)=25 for *overweight*, and \(v\)=30 for *obesity*). This variable has a value of 0 for individuals whose BMI was continuously below \(v\) over the time period and a maximum value of \(T\) for individuals with a BMI continuously at or above \(v\).
- \(\mathrm{BMI}_{v}\) is the BMI-years spent with a BMI at or exceeding a given threshold value, \(v\). This variable combines information about the time spent at or above \(v\) with information about the magnitude of the BMI: \(\mathrm{BMI}_{v}\), like \(t_{v}\), has a value of 0 for individuals with a BMI continuously below \(v\); for a given amount of time spent at or above \(v\), individuals with a higher BMI will have a higher \(\mathrm{BMI}_{v}\) value.
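The three variables can be computed in closed form for a quadratic trajectory. The Python sketch below is one possible implementation, not the project's own code. It assumes that \(\mathrm{BMI}_{v}\) is the area under the trajectory restricted to the ages at which the BMI is at or above \(v\); an alternative reading (the area between the curve and \(v\)) would require subtracting \(v \cdot t_{v}\). The function name, argument names and example coefficients are hypothetical.

```python
import math


def summarize_trajectory(c0, c1, c2, v, start=20.0, end=50.0):
    """Summaries of BMI(age) = c0 + c1*age + c2*age**2 over [start, end]."""

    def bmi(age):
        return c0 + c1 * age + c2 * age ** 2

    def antiderivative(age):
        return c0 * age + c1 * age ** 2 / 2 + c2 * age ** 3 / 3

    # Area under the whole curve (mu_BMI); dividing by T gives the mean BMI.
    area = antiderivative(end) - antiderivative(start)

    # Ages at which the curve crosses the threshold v split the period
    # into pieces on which BMI stays on one side of v.
    cuts = [start, end]
    if c2 != 0:
        disc = c1 ** 2 - 4 * c2 * (c0 - v)
        if disc > 0:
            root = math.sqrt(disc)
            cuts += [(-c1 - root) / (2 * c2), (-c1 + root) / (2 * c2)]
    elif c1 != 0:
        cuts.append((v - c0) / c1)
    cuts = sorted(a for a in set(cuts) if start <= a <= end)

    time_above = 0.0   # t_v
    bmi_years = 0.0    # BMI_v (assumed definition, see lead-in)
    for lo, hi in zip(cuts, cuts[1:]):
        if bmi((lo + hi) / 2) >= v:  # the midpoint decides the side of v
            time_above += hi - lo
            bmi_years += antiderivative(hi) - antiderivative(lo)

    return {
        "area": area,
        "mean_bmi": area / (end - start),
        "time_above": time_above,
        "bmi_years_above": bmi_years,
    }


# Hypothetical individual whose BMI rises linearly from 23 (age 20) to 35 (age 50).
summary = summarize_trajectory(15.0, 0.4, 0.0, v=25.0)
print(summary)
```

For this hypothetical individual, the BMI reaches 25 at the age of 25 years, so the time spent at or above the overweight threshold is 25 of the 30 years in the period.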

The next step is to use the previously defined variables as predictors in time-to-event models (e.g. Cox proportional hazards regression models) to predict the occurrence of cancer (or of death in cancer patients). This kind of model is generally used to assess whether the individual-specific hazard (i.e. the function of time describing the instantaneous risk of experiencing the event of interest) depends on the variables of interest (e.g. BMI-related variables), possibly after adjustment for variables that are already known to have an effect on the event of interest.

A standard model formulation might relate the hazard \(\lambda\) of, for example, cancer occurrence at time \(t\) to a BMI-related variable (e.g. \(\mathrm{BMI}_{25}\) as defined in the previous section) according to the following equation: \begin{equation} \nonumber \lambda(t,\mathrm{BMI}_{25},\xb)=\lambda_0(t)\exp\bigl\{\beta\,\mathrm{BMI}_{25}+\xb^{T}\bgm\bigr\} \end{equation} where \(\lambda_0\) is the baseline hazard, \(\xb\) represents a vector of adjusting variables (e.g. age, gender, or smoking status), and \(\beta\) and \(\bgm\) are regression coefficients related to the effects of the variables on the hazard. In particular, the exponential of \(\beta\) is the hazard ratio related to the variable \(\mathrm{BMI}_{25}\): it is the factor by which the hazard at any point in time is multiplied when the variable \(\mathrm{BMI}_{25}\) is increased by one unit, with the adjusting variables held fixed.
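To make the interpretation of \(\beta\) concrete, the short Python sketch below evaluates the exponential part of the model for two individuals who differ only in \(\mathrm{BMI}_{25}\). The baseline hazard \(\lambda_0(t)\) cancels in the ratio, so it does not need to be specified. The function names and all coefficient values are hypothetical and are not estimates from the project.

```python
import math


def relative_hazard(beta, bmi25, gamma, x):
    """exp{beta * BMI_25 + x^T gamma}: the hazard divided by lambda_0(t)."""
    linear_predictor = beta * bmi25 + sum(g * xi for g, xi in zip(gamma, x))
    return math.exp(linear_predictor)


def hazard_ratio(beta, delta=1.0):
    """Hazard ratio for an increase of `delta` units in the covariate."""
    return math.exp(beta * delta)


# Hypothetical coefficients, for illustration only.
beta = 0.002            # per BMI-year at or above 25
gamma = (0.03, 0.5)     # e.g. age (years) and current smoking (0/1)
x = (55.0, 1.0)         # same adjusting variables for both individuals

# Two individuals identical except for a one-unit difference in BMI_25.
ratio = relative_hazard(beta, 301.0, gamma, x) / relative_hazard(beta, 300.0, gamma, x)

print(ratio, hazard_ratio(beta))          # both equal exp(beta)
print(hazard_ratio(beta, delta=100.0))    # ratio for 100 additional BMI-years
```

The ratio does not depend on \(t\) or on the values of the adjusting variables, which is the proportional hazards assumption underlying the model.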