From:"S.Beal" 
Subject:NONMEM Bug - Noninfluential Correlated Eta
Date: Thu, 14 Nov 2002 12:08:33 -0800 (PST)

Nick Holford and Diane Mould have recently helped identify  an  interesting
bug  in  NONMEM.   It  affects results obtained with conditional estimation
only, and it has been present in  all  NONMEM  releases  since  conditional
estimation  was  introduced  in  1992.  It cannot be easily fixed in NONMEM
Version V.  It is identifiable in a fairly easily way, and chances are that
if  you  had  been  affected  by  it,  and paying attention, you would have
noticed it.  However, the circumstances in which it arises  (see  I  below)
are  rather  rare,  and to date, Nick and Diane may well have been the only
ones affected by this bug.

This bug affects NONMEM results in two different ways:
A)  Population  estimates  obtained  via  conditional  estimation  are  not
correct.  However, limited investigation and informal reasoning (see below)
suggest that the discrepancies will not be  large.   Most  analyses  obtain
population  estimates  using the First-Order estimation method, and because
this method is not a conditional  estimation  method,  such  estimates  are
unaffected by the bug.
B) Conditional on whatever values are used for the  population  parameters,
estimates  of individuals' parameters are not correct.  So if, for example,
the First-Order estimation method  is used to obtain population  estimates,
but  then  the  POSTHOC  option is used to obtain estimates of individuals'
parameters, these latter estimates are affected by the bug.   Certain  such
estimates  are  quite  noticably  affected,  allowing the bug to be clearly
observed.   Certain other such estimates are far less adversely affected.

The discussion below contains:

  (I) a description of the circumstances in which the bug arises

  (II) a description of the bug's effects and how  one  might  notice  that
  there is indeed some problem

  (III) a description of a method with which one might revisit an  analysis
  where  this  bug occurred and investigate to what extent the results were
  affected by the bug

  This method might also allow one to redo the entire analysis "correctly",
  and  it  can be used to more correctly implement future analyses where it
  is known that the bug will occur.

I.  Circumstances for the Bug to Appear
There are three conditions, all of which must occur together.

A.
As  indicated  above, conditional estimation (including POSTHOC estimation)
is used.

B.
The model contains an eta whose value for some individual has no  influence
on  the  individual's  data.  For example, suppose that PK and PD responses
are being fit simultaneously, that the PD model is an EMAX  model  with  an
eta (eta1) on EMAX, but no PD responses are available from some individual.
Then the value of eta1 for the individual in question has no  influence  on
the  individual's  data.  We shall call the eta "a noninfluential eta" (for
the data from the given individual).   With  most  studies  where  multiple
types  of  response are observed, all types are observed with all patients,
and so a noninfluential eta does not exist.

C.
The noninfluential eta is correlated with another eta that  does  influence
the  data.   That  is, the structure of OMEGA matrix must be specified such
that a nonzero covariance element between the two eta's exists.   We  shall
call  such  a  noninfluential  eta "a correlated noninfluential eta".  Com-
monly, but by no means invariably, population modelers have constrained the
OMEGA  matrix  so  that  it has only zero-valued covariance elements.   But
even when a noninfluential eta is correlated with another eta, usually this
other  eta  is also noninfluential; e.g.  etas affecting PK are usually not
taken to be correlated with etas affecting PD.

II.  The Bug's Effects

If there are too few data available from an individual, an estimate of  the
vector  of  etas  variables pertaining to the individual cannot be obtained
from these data alone.  So in general, along with the individual's data,  a
multivariate normal prior distribution on the eta vector is used to further
constrain the estimate.  The parameters of this distribution are the  popu-
lation  parameters:  the mean (which is the same as the mode) and variance-
covariance matrix of the distribution are taken to be the 0 vector and  the
OMEGA  matrix,  respectively.  Then the estimate of the eta vector is taken
to be the mode of the posterior Bayes distribution, given the  individual's
data.   If an eta in noninfluential, so that there are no data at all which
bear directly on its value, then its estimate will depend  heavily  on  the
prior  constraint.   The estimate is identical to the corresponding element
of the mode of the conditional prior distribution, given that the values of
the  influential  etas  are equal to their estimates.  If the eta is nonin-
fluential and noncorrelated, the conditioning has no effect, and the  esti-
mate  is  simply  0.   If  the  eta is a correlated noninfluential eta, the
correlation affects the conditioning in such a way that  estimate  is  gen-
erally  different  from 0.  A scatterplot between the estimates of a corre-
lated noninfluential eta and those of an eta with which  it  is  correlated
should reflect the correlation.

The effect of the bug is that this is not actually what happens.  If an eta
is  used  in the control stream, which, for some individual is a correlated
noninfluential eta, then the estimate of the eta vector for this individual
is  not  exactly the posterior mode; it lies closer to the 0 vector than it
should.  (Of course, if there are no actual observations  for  the  indivi-
dual, then the posterior mode is in fact 0.)

The error in the estimate of an influential eta is not likely to be  large.
As  it  turns  out,  for  the purpose of computing the estimate, a value of
OMEGA is taken which is too small, and this is what leads to the  incorrect
estimate.   It  may  be  seen  analytically that this results in a relative
error in the expected value of the estimate which, as the  amount  of  data
from  the  individual  increases,  decreases inversely proportional to this
amount.  Also, because the incorrect  estimate  lies  between  the  correct
estimate  and 0, the relative error cannot be larger than 100%.  It is pos-
sible that as the amount of data decreases, this bound is  approached,  but
at the same time the correct estimate itself becomes small.

On the other hand, with the bug the estimate of a correlated noninfluential
eta is always exactly 0 (even if it very highly correlated with an influen-
tial eta whose estimate is far from 0).  If one  displays  the  values  for
this  eta  in  a  table  or  scatterplot, one should see this spurious zero
value.

More generally, consider a parameter modeled in  terms  of  eta  variables.
For  a  given  individual,  we usually call the value of the parameter, the
"individual's parameter" (e.g. the "individual's CL").   To  estimate  this
value, the estimate of the eta vector is substituted into the model for the
parameter.  The typical individual is the (hypothetical)  individual  whose
eta  vector is 0, and the parameter for this individual is obtained by sub-
stituting the 0 vector into the model  for  the  parameter;  the  resulting
value is regarded as a population parameter.  If an eta is used in the con-
trol stream, which, for some individual is a correlated noninfluential eta,
then  the estimate of the individual's parameter lies closer to the parame-
ter for the typical individual than it should.  If the parameter itself  is
modeled  with  only  influential  etas, the discrepancy is likely not to be
much.   If  it  is  modeled  with  a  noninfluential  correlated  eta,  the
discrepancy can be much greater.  If it is modeled only with noninfluential
correlated etas, the estimate is exactly the typical  individual's  parame-
ter.

Using the First-Order estimation method, estimates of the population param-
eters THETA, OMEGA (and SIGMA) are obtained independently from estimates of
the eta vector.  Using conditional estimation methods, estimates  of  these
parameters  depend on the estimates of the eta vector from each individual.
(Of course, an estimate of the eta vector itself depends on values for  the
population  parameters; see NONMEM Users Guide VII for how this circularity
is handled.)  If the latter estimates  are  not  quite  correct,  then  the
former estimates are also not quite correct.  (Of course, only estimates of
influential elements of the eta vector affect the estimates of the  popula-
tion  parameters.)  However, compare the First-Order estimation method (FO)
with the First-Order Conditional Estimation method  (FOCE).   Both  methods
may  be viewed as using the same general objective function, which involves
values for eta vectors for all individuals, but each method uses  different
values  (again;  see NONMEM Users Guide VII).  FO uses the 0 vector for all
of these values, and FOCE, in an attempt to reduce bias, uses the posterior
modes  (described above).  If, due to the bug, the values actually used are
closer to 0 than they should be, then FOCE is simply "more FO-like" than it
should  be.  In this case FOCE may not be entirely as effective in reducing
bias as one hopes it will be, but at  worst,  its  estimates  are  probably
acceptable in the same way that FO estimates often are.

When the correct FOCE results are already close to the FO results, as  when
the  amounts  of data from all but an insignificant fraction of individuals
are small, what discrepancies exist due to the  bug  must  be  small.   The
correct  FOCE  results  can  differ substantially from FO results only when
there are at least moderate amounts of data from a significant fraction  of
individuals,  and  so  it  is only in this circumstance that FOCE is really
helpful.  As explained above, discrepancies in the computation of the  pos-
terior  modes  become  small  as  the  amounts of data from the individuals
increase, and therefore, so do the discrepancies in the computation of  the
FOCE  objective  function.  Therefore, it is unlikely that descrepancies in
FOCE estimates will be large.

Generally, it will not be possible to spot the bug by examining only  popu-
lation estimates.  One will need to look closely at estimates of individual
parameters or of etas, as explained above.

III.  Zeta Transformation

There is a way to rewrite a model that has used  etas,  so  that  all  etas
become  influential  for every individual, and thus the bug will not occur.
However, this technique is somewhat unwieldy, and one would not want to use
it  as  a  matter  of  course.   I suggest that it be tried only when it is
desired to check the effects of the bug on an old analysis, or to avoid the
bug in a future analysis where one knows the bug will arise.

Consider a subset of n etas whose variance-covariance matrix is a  diagonal
block  of  OMEGA  with nonzero covariance elements - i.e. the initial esti-
mates of the elements of the matrix is  given  by  an  $OMEGA  record  with
option BLOCK(n) - and which includes a correlated noninfluential eta.  Call
such a subset a "subset of concern".  Such a subset might  consist  of  all
etas  in  the  model (and then there is only a single $OMEGA record).  How-
ever, by definition, a subset of size n=1 cannot be "of concern".

Let the n etas be renamed zetas, and consider a new set of etas, related to
the zetas as follows:

zeta1 = eta1  +  (eta2  +  eta3  +  ...  +  etan)/2
zeta2 = eta2  +  (eta1  +  eta3  +  ...  +  etan)/2
 .
 .
 .
zetan = etan  +  (eta1  +  eta2  +  ...  +  etam)/2

where m=n-1.

If an individual's data are influenced by at least one zeta,  then  because
all zetas depend on all etas, it follows that there can only be influential
etas for the individual's data, and so there cannot be a correlated  nonin-
fluential  eta  for these data.  If an individual's data are not influenced
by any zeta, then these data are also not influenced by *any* eta,  and  by
definition, there can be no *correlated* noninfluential eta.  Therefore, in
both cases the bug will not be active.  (NONMEM accepts a model  such  that
the  data  from  some  individuals  are  not  influenced by any etas in the
model.)

We call the above  transformation  of  new  etas  to  old  etas,  the  zeta
transformation.   This  transformation  should be applied separately to all
subsets of concern.

So basically, the NONMEM user need only take abbreviated code such  as  the
following,

CL  =THETA(1)*EXP(ETA(1))
V   =THETA(2)*EXP(ETA(2))
KA  =THETA(3)*EXP(ETA(3))
ALAG=THETA(4)+EXP(ETA(4))

and assuming that {eta1, eta2, eta3} contains a  correlated  noninfluential
eta  and  that  the  variance-covariance matrix of these etas is a diagonal
block of OMEGA, replace this code with:

ZETA1=ETA(1)+HALF*(ETA(2)+ETA(3))
ZETA2=ETA(2)+HALF*(ETA(1)+ETA(3))
ZETA3=ETA(3)+HALF*(ETA(1)+ETA(2))
CL=THETA(1)*EXP(ZETA1)
V =THETA(2)*EXP(ZETA2)
KA=THETA(3)*EXP(ETA(3))
ALAG=THETA(4)+EXP(ETA(4))

In the original code, we have used an eta on the ALAG parameter simply  for
illustrative purposes.  Most often it is not possible to obtain an estimate
of interindividual variability in the absorption lag parameter.  Note  that
if  indeed, the only etas are the four shown above, then because the subset
{eta4} is a set of size 1, it cannot be of concern.

It is easy to implement the type of  abbreviated  code  illustrated  above.
However,  there  is  further complication.  The initial estimates given for
the diagonal block of OMEGA, i.e. in the $OMEGA BLOCK(n) record,  must  now
be those for the new etas.  How will they be obtained?

Using matrix notation, the new etas are given by

eta = A zeta

where

Aij= -1/(1+(n-1)/2)        if i!=j
   = (2+(n-2))/(1+(n-1)/2) if i=j

If OMO is the original matrix of initial estimates, then (A OMO A)  is  the
new matrix of initial estimates.

Here is a FORTRAN program that will produce (A OMO A).  It will prompt  the
user to key in the dimension of OMO, and then OMO itself.  The program will
output the new matrix of initial estimates.

      double precision omega(10,10),od,d,s,b(10,10),a(10,10)
      write (6,*) 'input dimension of diagonal block'
      read (5,*) n
      write (6,*) 'input diagonal block itself in lower-triangular form'
      read (5,*) ((omega(i,j),j=1,i),i=1,n)
      od=-1/(1+.5*(n-1))
      d=(2+(n-2))/(1+.5*(n-1))
      do 5 j=1,n
      do 5 i=1,n
      if (i.eq.j) then
         a(i,j)=d
      else
         a(i,j)=od
      endif
    5 continue
      do 10 j=1,n
      do 10 i=1,j
   10 omega(i,j)=omega(j,i)
      do 20 j=1,n
      do 20 i=1,j
      s=0.
      do 15 k=1,n
      do 15 l=1,n
   15 s=s+a(i,k)*omega(k,l)*a(j,l)
      b(i,j)=s
   20 b(j,i)=s
      write (6,*) n,((b(i,j),j=1,i),i=1,n)
      end

If OMO is, e.g. 

  .1  .01 .01
  .01 .1  .01
  .01 .01 .1

then the lower triangle is

  .1
  .01 .1
  .01 .01 .1

and one keys in: .1 .01 .1 .01 .01 .1
               
This program would be used for all subsets of concern.

Then, after running NONMEM and obtaining final  population  estimates,  one
needs  to  obtain the final estimate for the diagonal block of the original
OMEGA matrix.  We have

zeta = B eta

where

Bij= . 5  if i!=j
   = 1.0  if i=j

If OMN is the final estimate of the new diagonal block, then (B OMN  B)  is
the final estimate of the original diagonal block.

Here is a FORTRAN program that will produce (B OMN B).  It will prompt  the
user to key in the dimension of OMN, and then OMN itself.  The program will
output the final estimate of the original diagonal block.

      double precision omega(10,10),od,d,s,b(10,10),a(10,10)
      write (6,*) 'input dimension of diagonal block'
      read (5,*) n
      write (6,*) 'input diagonal block itself in lower-triangular form'
      read (5,*) ((omega(i,j),j=1,i),i=1,n)
      od=.5
      d=1.
      do 5 j=1,n
      do 5 i=1,n
      if (i.eq.j) then
         a(i,j)=d
      else
         a(i,j)=od
      endif
    5 continue
      do 10 j=1,n
      do 10 i=1,j
   10 omega(i,j)=omega(j,i)
      do 20 j=1,n
      do 20 i=1,j
      s=0.
      do 15 k=1,n
      do 15 l=1,n
   15 s=s+a(i,k)*omega(k,l)*a(j,l)
      b(i,j)=s
   20 b(j,i)=s
      write (6,*) n,((b(i,j),j=1,i),i=1,n)
      end

This   program   would   be   used   for   all    subsets    of    concern.
______________________________________________________________________