From: Erik Olofsen <>
Subject: Maximizing coefficient of determination
Date: Thu, 8 Feb 2001 15:32:55 +0100 (CET)

Dear NONMEM users,

Instead of maximizing the likelihood of a set of observations I would like to maximize the coefficient of determination, given by:

r2 = 1 - sum((yi-yhati)^2)/sum((yi-mean(y))^2) eq.(1)

where yi is a measured variable and yhati is its prediction. Both yi and yhati are given by models that contain parameters to be estimated, and minimizing the sum of squares would lead to parameter values that give the optimal, but meaningless yi = yhati = constant.

Now note that the rightmost part of eq.(1) can be written as

sum((yi-yhati)^2)/N/sigma^2, with sigma^2 = sum((yi-mean(y))^2)/N eq.(2)

and the log-likelihood function for normally distributed observations is

LL = -N log(sigma) - N/2 log(2pi) - 1/2/sigma^2 sum((yi-yhati)^2) eq.(3)

so maximizing the coefficient of determination would be equivalent to maximizing LL if we dropped the first (and second) term and let sigma be given by eqs.(1) and (2), instead of treating it as an estimable parameter for the residual standard deviation.
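
A minimal numerical sketch of this equivalence, in Python with made-up data: if sigma^2 is fixed at sum((yi-mean(y))^2)/N, the last term of eq.(3) equals -N/2 (1 - r2), so it increases exactly when r2 increases. The function names and the data values are illustrative assumptions, not part of the NONMEM implementation.

```python
def r_squared(y, yhat):
    """Coefficient of determination, eq.(1)."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def ll_last_term(y, yhat):
    """Last term of eq.(3), with sigma^2 taken from the data as in eq.(2)."""
    n = len(y)
    ybar = sum(y) / n
    sigma2 = sum((yi - ybar) ** 2 for yi in y) / n  # not an estimated parameter
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    return -0.5 * sse / sigma2


# Hypothetical observations and predictions, for illustration only.
y = [1.0, 2.0, 4.0, 7.0]
yhat = [1.5, 2.5, 3.5, 6.5]

n = len(y)
print(ll_last_term(y, yhat))              # last term of eq.(3)
print(-n / 2 * (1 - r_squared(y, yhat)))  # same value, written via r2
```

Because the two printed quantities agree for any y and yhat, maximizing one maximizes the other for a single individual; the factor N/2 is where a per-individual N would enter when individuals are combined.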

I've implemented this in NONMEM using the LIKELIHOOD option of the $ESTIMATION record and it works. At the moment I combine observations of a population by taking the rightmost part of eq.(3) where sigma may depend on the individual, but I'm not sure yet how to incorporate the fact that N is not the same for each individual.

I would like to ask you to comment on this procedure and whether it affects NONMEM parameter estimation and hypothesis testing using the minimum value of the objective function in ways I might overlook.

Erik Olofsen
Department of Anesthesiology
Leiden University Medical Center
The Netherlands


From: LSheiner <>
Subject: Re: Maximizing coefficient of determination
Date: Thu, 08 Feb 2001 08:42:30 -0800

Dear Erik,

Before anything else, we all need to understand what you are doing and why.

I do not understand, for example, your statement: "Both yi and yhati are given by models that contain parameters to be estimated."

From the usual definition of r2, yi would be the observations, and they would be fixed. In that case your objective function is a linear transformation of the sum of squares sum((yi-yhati)^2), and maximizing it is equivalent to minimizing the latter.
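
A small sketch of this point, in Python with made-up data: when the yi are fixed, sum((yi-mean(y))^2) is a constant, so the parameter value that maximizes r2 is the one that minimizes the sum of squares. The one-parameter model yhat = b*x, the grid of slopes, and the data are assumptions chosen only for illustration.

```python
def sse(y, yhat):
    """Sum of squared residuals."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))


def r_squared(y, yhat):
    """r2 with fixed observations y: a decreasing linear function of sse."""
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse(y, yhat) / sst


# Hypothetical data and a hypothetical model yhat = b * x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

slopes = [0.5 + 0.01 * k for k in range(301)]  # candidate values of b

best_by_sse = min(slopes, key=lambda b: sse(y, [b * xi for xi in x]))
best_by_r2 = max(slopes, key=lambda b: r_squared(y, [b * xi for xi in x]))

print(best_by_sse, best_by_r2)  # the same grid point is selected by both criteria
```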

Perhaps, though, you are transforming yi (say g(yi)) and trying to estimate the parameters of the transformation g at the same time as the parameters of the model yhat? That would make your statement "minimizing the sum of squares would lead to parameter values that give the optimal, but meaningless yi = yhati = constant" understandable. In this case maximizing the (squared) correlation between yhati and g(yi) subject to the constraint that var(g(yi))=1 is indeed reasonable, and is implemented in the ACE algorithm (See Breiman, L. and J. H. Friedman (1985). Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80: 580-598.), available in S+. For parametric g and yhati, a likelihood-based approach is more flexible ("Transform Both Sides" - TBS - see Carroll & Ruppert, Transformation and Weighting in Regression, NY, Chapman & Hall, 1988) and can be adapted to hierarchical models in NONMEM.
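
As a rough sketch of what a likelihood-based TBS objective can look like, the Python below assumes a Box-Cox transformation with parameter lam applied to both the observations and the predictions, with positive y and yhat. The choice of Box-Cox, the function names, and the data are assumptions for illustration; this is not NONMEM code. The log-Jacobian term, (lam - 1) * sum(log(yi)), is what penalizes transformations that merely shrink all values toward a constant.

```python
import math


def box_cox(v, lam):
    """Box-Cox transform of a positive value v (log transform when lam == 0)."""
    if lam == 0.0:
        return math.log(v)
    return (v ** lam - 1.0) / lam


def tbs_neg_log_likelihood(y, yhat, lam, sigma):
    """Negative log-likelihood when both sides are Box-Cox transformed.

    Assumes normally distributed residuals on the transformed scale.
    """
    n = len(y)
    sse = sum((box_cox(yi, lam) - box_cox(fi, lam)) ** 2
              for yi, fi in zip(y, yhat))
    # Log-Jacobian of the transformation of the observations.
    log_jacobian = (lam - 1.0) * sum(math.log(yi) for yi in y)
    return (n * math.log(sigma) + 0.5 * n * math.log(2.0 * math.pi)
            + 0.5 * sse / sigma ** 2 - log_jacobian)


# Hypothetical positive observations and predictions.
y = [1.2, 2.3, 4.1, 7.9]
yhat = [1.0, 2.5, 4.0, 8.0]

for lam in (0.0, 0.5, 1.0):
    print(lam, tbs_neg_log_likelihood(y, yhat, lam, sigma=0.3))
```

With lam = 1 the transformation is a shift by one, the Jacobian term vanishes, and the expression reduces to the usual normal negative log-likelihood of eq.(3).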

Lewis B Sheiner, MD
Professor: Lab. Med., Bioph. Sci., Med.
Box 0626, UCSF, SF, CA, 94143-0626
415-476-1965 (v), 415-476-2796 (fax)