From: Justin Wilkins jwilkins@uctgsh1.uct.ac.za
Subject: [NMusers] $OMEGA blocks and log-likelihood profiling
Date: Mon, May 31, 2004 9:26 am

Hi all,

When doing a log-likelihood profile of a model containing an $OMEGA 
BLOCK(n) structure, what is best practice for mapping ETAs that are 
defined by this kind of parameterization? It's my understanding that 
only the entire block can be fixed...

Justin
-- 
-------------------------------------------
Justin Wilkins
-------------------------------------------
Division of Clinical Pharmacology
Department of Medicine
University of Cape Town
-------------------------------------------
K45 Old Main Building
Groote Schuur Hospital
Observatory 7925
South Africa
-------------------------------------------
Email:  jwilkins@uctgsh1.uct.ac.za
Phone:  +27 (0)21 406 6448
Fax:    +27 (0)21 448 1989
Web:    http://www.uct.ac.za/depts/pha/
_______________________________________________________

From: Nick Holford n.holford@auckland.ac.nz
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Tue, June 1, 2004 6:23 am

Justin,

My preference is to bootstrap rather than do lots of LLPs. The bootstrap gives you
confidence intervals without the chi-square distribution assumption under the null
hypothesis (known to be commonly wrong) that is required to interpret LLPs.

The bootstrap will of course let you compute confidence intervals for all your
parameters (and secondary parameters derived from the primary parameters).
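
For example (a minimal sketch only; the column names and the half-life formula
below are illustrative assumptions about how the bootstrap estimates have been
tabulated, not part of any particular analysis):

# Percentile CIs for primary parameters and a derived (secondary) parameter,
# computed from a table of bootstrap estimates (one row per bootstrap run).
import numpy as np
import pandas as pd

boot = pd.read_csv("bootstrap_estimates.csv")   # assumed columns include CL and V

# A secondary parameter computed from each replicate's primary estimates,
# e.g. an elimination half-life if CL and V are in consistent units.
boot["THALF"] = np.log(2) * boot["V"] / boot["CL"]

for p in ["CL", "V", "THALF"]:
    lo, hi = np.percentile(boot[p], [2.5, 97.5])
    print(f"{p}: 95% CI {lo:.3g} to {hi:.3g}")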

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
 
_______________________________________________________

From: mark.e.sale@gsk.com   
Subject: RE: [NMusers] $OMEGA blocks and log-likelihood profiling 
Date: Tue, June 1, 2004 6:55 am 

Nick, 
  I like your suggestion that the bootstrap is a better way to determine CIs than LLP.
  The other advantage is speed: one set of bootstrap samples will give you CIs for
  all parameters, rather than needing to do a set for each parameter. But a question
  (or two).
I know you've been an advocate of the view that covariance step success isn't always
necessary to believe the results (and I'm beginning to agree with you). So, I assume
you'd be willing to include NONMEM runs that failed the covariance step in your bootstrap
interval.  But what about other failures?  Do you include terminations due to
rounding errors?  How do you protect against local minima?  And then there are
the ones that simply crash.  We do this a lot as well (so far we have discarded
the runs that fail the covariance step), but we are concerned that the runs that fail
are the very ones that we're most interested in (the ones that define the tails
where one or more parameters is poorly estimated), so even if they are only 2%,
they may be critical in defining the CI.


Mark
_______________________________________________________

From: Leonid Gibiansky lgibiansky@emmes.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Tue, June 1, 2004 10:43 am

Justin
I'll leave the bootstrap/LLP comparison to others, but if you really want
to do LLP for OMEGA, you need to express the OMEGAs in terms of THETA
parameters. For a diagonal OMEGA matrix it is trivial:

$PK
CL=THETA(1)*EXP(SQRT(THETA(11))*ETA(1))
V=THETA(2)*EXP(SQRT(THETA(22))*ETA(2))
.....
$OMEGA
1 FIXED
1 FIXED

Here you can LLP THETA(11) and THETA(22), which are equivalent to OMEGA(1,1)
and OMEGA(2,2).

For off-diagonal elements it is trickier. One possibility is to express
the model in terms of
META1=THETA()*ETA(1)
META2=THETA()*ETA(1)+THETA()*ETA(2)
META3=THETA()*ETA(1)+THETA()*ETA(2)+THETA()*ETA(3)
......
CL=THETA()*EXP(META1)
etc.

and profile those THETA coefficients (this is also a way to check the
significance of each of the off-diagonal elements/correlations via the
LL).  The OMEGA elements are easily expressed in terms of the THETAs.
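
For illustration only (the coefficient values below are made up, and the
variable names are just placeholders for the THETA coefficients above), the
implied OMEGA is L*L' where L is the lower-triangular matrix of those
coefficients and the ETAs are fixed to unit variance:

# Sketch: OMEGA implied by the META1/META2/META3 coefficients.
import numpy as np

t11 = 0.30                        # META1 = t11*ETA(1)
t21, t22 = 0.10, 0.25             # META2 = t21*ETA(1) + t22*ETA(2)
t31, t32, t33 = 0.05, 0.08, 0.40  # META3 = t31*ETA(1) + t32*ETA(2) + t33*ETA(3)

L = np.array([[t11, 0.0, 0.0],
              [t21, t22, 0.0],
              [t31, t32, t33]])

omega = L @ L.T    # e.g. OMEGA(1,1) = t11**2, OMEGA(2,1) = t11*t21,
print(omega)       #      OMEGA(2,2) = t21**2 + t22**2, and so on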

Leonid
_______________________________________________________

From: Nick Holford n.holford@auckland.ac.nz
Subject: RE: [NMusers] $OMEGA blocks and log-likelihood profiling
Date: Tue, June 1, 2004 5:15 pm 

Mark,

You will find some anecdotal evidence about the influence of including all runs or
only successful runs for bootstrap CI here:
http://www.cognigencorp.com/nonmem/nm/99jul152003.html

Some remarks I made then were:

"I bootstrapped the original data set using the preferred model and found 28% of
bootstrap runs minimized successfully and 7.1% ran the $COV step. "

"The mean of the parameters obtained from all bootstrap runs and the mean from those
which ran the $COV step were all within 2%."

"95% confidence intervals obtained from all the bootstrap runs were very similar to
those obtained from minimization successful and $COV successful runs."

"I computed the ratio of the mean standard error from the $COV successful runs to
the bootstrap standard error obtained from all runs. For THETA:se estimates the $COV
SE was on average 3% smaller but for OMEGA:se the $COV SE was 58% larger than the
overall bootstrap SE."

I would add that because $COV ran on some of the bootstrap samples, the failure of
NONMEM to run $COV does not mean the model is somehow badly formed (e.g.
overparameterized). It must mean that NONMEM fails $COV because of the data, since
the only difference between runs that ran $COV and those that did not was a random
sample of the data.

BTW key parts of the results of the bootstrap exercise I described last year are
on-line and will appear soon in print in BJCP:
http://www.blackwell-synergy.com/links/doi/10.1111/j.1365-2125.2004.02114.x/full

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________

From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE: [NMusers] $OMEGA blocks and log-likelihood profiling
Date: Wed, June 2, 2004 9:16 am

Nick, Nmusers,

Regardless of whether the $COV step fails, I have a hard time understanding
why only 28% of the bootstrap runs minimized successfully if it wasn't due
to over-parameterization.  Note that over-parameterization does not mean
that the model is badly formed.  It could indeed be a correctly specified
model (e.g., no lack of fit); it may simply be that the existing data do not
support estimating all of the parameters.

Ken
_______________________________________________________

From: "Gastonguay, Marc" marc.gastonguay@snet.net
Subject: RE: [NMusers] $OMEGA blocks and log-likelihood profiling
Date: Wed, June 2, 2004 10:27 am   

Ken, Nick, Mark, NMusers:

In addition to problems with over-parameterization, it's possible that a
large fraction of bootstrap runs can fail (e.g. minimization unsuccessful)
even when the model is not over-parameterized: particularly when one of the
conditional estimation methods is used (e.g. methods other than FO).

I've learned from Tom Ludden that NONMEM V can sometimes run into local
minima when searching for individual ETA estimates - resulting in
convergence difficulties for the population model. I suspect that this may
be happening with a fraction of the bootstrap failures under conditional
estimation. I've found that placing constraints on the individual ETA
estimates through PRED-exit codes can be helpful in dramatically reducing
the fraction of failed bootstrap runs.

For example (assuming an exponential inter-individual variance model and 3
variance terms in OMEGA):

IF(ABS(ETA(1)).GT.10) EXIT 1 102
IF(ABS(ETA(2)).GT.10) EXIT 1 103
IF(ABS(ETA(3)).GT.10) EXIT 1 104
... (also use NOABORT option in $EST)

I hope this is helpful.

Marc Gastonguay
_______________________________________________________

From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Wed, June 2, 2004 11:19 am

Marc, Nmusers,

I just have a hard time believing that this is the primary reason when so
large a fraction (72%) fail to converge and, for most of those that do converge, the
$COV step fails.  Certainly there are a number of reasons why we may
encounter minimization failures but I would first rule-out
over-parameterization before I pursued other explanations when such a large
fraction fail.  It would be of interest to review the $COV step output for
some of the bootstrap runs of the 7% that had successful $COV estimation to
see if the diagnostics suggest that the model is over-parameterized for
these bootstrap runs.  For example, are there extremely large SEs for one or
more of the parameters, are there any  pairwise correlations near +/-1, etc?
If so, then over-parameterization would remain high on my list of potential
reasons for the large fraction of minimization failures.

Ken

_______________________________________________________

From: jeffrey.a.wald@gsk.com  
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Wed, June 2, 2004 3:04 pm

I am with Ken on this one.  I am not too troubled by the lack of a successful $COV,
but the low fraction of convergence is bothersome.  Whatever the cause, I
have a hard time reasoning out what the bootstrap actually means in this case.
The sample of parameter estimates is effectively censored by a presumably
non-random filter (viz. NONMEM).  Marc's suggestion sounds pragmatic, but I
still have to think that something more systematic is occurring with the
structural model.

If some parameter/s is/are not identifiable in a large fraction of the runs
then my feeling is that the 'correct' bootstrap distribution for one or
more of the parameters would have a large fraction of its values (up to
72%?) sitting on 1 or 0 or whatever its null value would be.

Jeff

Jeff Wald, PhD
jeffrey.a.wald@gsk.com
Clinical Pharmacokinetics/Modeling and Simulation
Neurology and GI
RTP, NC
_______________________________________________________

From: "Gastonguay, Marc" marc.gastonguay@snet.net   
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Wed, June 2, 2004 4:56 pm 

No argument here, either. When the primary problem is over-parameterization,
constraining ETA estimates probably won't help much. In other cases,
however, constraining individual ETA estimates can be very helpful with
convergence in NONMEM. I have one example where constraining ETAs (within
broad, realistic bounds) reduced the bootstrap run convergence failure rate
from near 50% to less than 1%.

Marc
_______________________________________________________

From: Nick Holford n.holford@auckland.ac.nz
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Thu, June 3, 2004 6:58 am  

Hi,

Thanks to all of you for your opinions.  

Ken does not consider over-parameterization to be a form of malformed model (but I
think this is hairsplitting). However, my impression is that it is not the
model per se that is causing the failure to converge. Random (non-parametric
bootstrap) samples of the original data are able to use the same model to converge
and in some cases complete the $COV step. So this means to me that the failure is
not a systematic feature of the model. It might, however, be a systematic problem with
the NONMEM code or the compiler/processor combination, which may take different
execution paths depending on the numbers it has to work with.

I looked at one of the bootstrap runs that ran the $COV step. The highest CV%
(SEE/estimate*100) was 85%, for between-occasion variability in V1, but all the rest
were less than 28%. The highest estimation error correlation was 0.93 (between
renal and non-renal clearance) but not enough to declare overparameterization (see
Ken's contribution http://www.cognigencorp.com/nonmem/nm/99may012003.html). It also
has eigenvalues ranging from 7.35E-03 to 5.26, suggesting ill-conditioning is not
severe (once again see Ken's comments). Do these values invalidate the model's
ability to describe and predict the influence of covariates on individual
clearance values (the main point of the model building study)?

I also tried Marc's suggestion to use EXIT to avoid high etas. Exiting if ETA was
>=5 did not let the model complete its initialization phase. Q and V2 caused exits
if ETA was >= 5.5 and <10. The runs terminated in the same way as the unconstrained
run on the original data "DUE TO PROXIMITY OF LAST ITERATION EST. TO A VALUE AT
WHICH THE OBJ. FUNC. IS INFINITE (ERROR=136) AT THE LAST COMPUTED INFINITE VALUE OF
THE OBJ. FUNCT.". I only tried this on the original dataset. It took about a month
to do the unconstrained bootstrap runs and I have other things to do right now.

With regard to Jeff's comment that overparameterized bootstrap runs would tend to have
many parameters "sitting on their null value" -- this certainly wasn't the case for the
parameters defining the covariate effects. But it's not clear to me what he means by
the 'correct' bootstrap distribution.

It seems to me that the performance of a model should be judged by its purpose and
not by criteria determined by numerical idiosyncrasies. NONMEM V is a bit of a dog
when it comes to local minima, as Marc reminds us, and the conditional methods are
clearly less robust (but nevertheless IMHO more believable). So I prefer to base
decisions about model performance on how good the fit is, assessed on its merits
in the intended application, rather than on the ability to jump over certain arbitrary
hurdles.

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________

From: jeffrey.a.wald@gsk.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling 
Date: Thu, June 3, 2004 7:59 am

Nick

My point referenced what might have been seen in the distributions of
parameter estimates if the runs that failed to converge had actually
provided some information...assuming that the full model was not
identifiable in this set of runs.  What you have seen in the parameters
defining the covariate effects is not surprising as the runs that
potentially show the over-parameterization are the ones that are censored
out of the distribution.

The  'correct' bootstrap distribution statement refers to the distribution
of parameter estimates that makes an appropriate statement about the
uncertainty that we have in the joint distribution of parameter values.  By
potentially ignoring a large fraction of the plausible parameter values -
that is assuming a simpler model would have given a plausible answer - we
risk assuming that we know more than we really do.

Most of this string has been based on the premise that the model is over
parameterized.  I think that proposing technical solutions detracts from
the philosophical point at hand.  Even if the model is not over
parameterized, the fact that so many runs fail to converge tells me
something about the uncertainty in the model, its parameter estimates, or
the tools we have to evaluate the model.  I do not necessarily know how to
quantify that uncertainty, but I have a hard time dismissing it.

Jeff
_______________________________________________________

From: "Kowalski, Ken" 
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Thu, June 3, 2004 12:40 pm

Nick, Jeff, Marc, Nmusers,

I don't think it is hairsplitting depending on your definition of a "badly
formed" or "malformed" model.  I tend to equate such statements with a poor
fitting model.  My point is that good fitting models can suffer from the
effects of over-parameterization just as much as poor fitting models.  

I don't see how you can conclude, because 28% of the bootstrap models
converged, that the failure of the other 72% is not related to some
systematic feature of the model (although I wouldn't characterize it as a
deficiency in a systematic feature of the model, as this might imply that the
model is poorly fitting; rather, I would characterize it as a possible
limitation of the data to support the model). Your latest COV step
information regarding the 7% that converged with a successful COV step is
encouraging from the standpoint that over-parameterization is not an issue
for these bootstrap runs, but Jeff makes a good point that the ones that did
not converge are censored out of this evaluation.  Perhaps the 93% where
the estimation and/or COV step failed are still related to problems with
over-parameterization.  For example, suppose your dataset is based on a pooled
analysis of several studies with varying designs, where only small portions
of the data from 1 or 2 studies provide information on some key parameter.
Then a basic bootstrap resampling scheme that does not stratify by study
and/or these key treatment/design features when resampling from the original
dataset may result in bootstrap datasets that under-represent key data
necessary to support the model.  I'm not saying that this is the issue you
are encountering, but it certainly is something I would investigate if I were
encountering such a large failure rate in my bootstrap runs.

I agree with Jeff on his philosophical point that we want to use
bootstrapping to characterize the uncertainty in our parameters, and having a
large fraction fail is somewhat disconcerting with regard to our ability to
characterize this uncertainty.  I always feel more enlightened when I can
identify the root cause of convergence/COV step failures.  From my own
experience these failures are often related to some aspect of
over-parameterization in elements of theta, Omega or Sigma.  I'm not always
successful in resolving these failures, but thinking through possible
limitations of the data to support the model, and diagnostic NONMEM runs
specifically aimed at resolving the convergence/COV step failures, are worth
the effort IMHO.

With regard to NONMEM V's estimation methods being a dog, I think you are
being too harsh.  I'm not fully informed on Marc's or Tom's work with FOCE
and problems with convergence to local minima, but I'm willing to bet these
problems are especially pronounced when fitting models that are somewhat
over-parameterized.  I draw an analogy to Pete Bonate's exercise where he
showed that sensitivity to compiler/NONMEM installations was most pronounced
when fitting ill-conditioned (over-parameterized) models.  I think a lot of
things can go wrong with NONMEM when we push the data too hard in supporting
the models we fit.  Statements that many of you make trivializing the
importance of trying to get a successful COV step and, just as important, of
actually reviewing the COV step output only galvanize my thinking that the
effects of over-parameterization are too often ignored.

I agree with you that at the end of the day we need to develop good fitting
models that meet the purposes/intended use of the model.  However, I
disagree that achieving convergence and successful COV steps in the majority
of the bootstrap runs is an arbitrary hurdle...it's good science.

Ken
_______________________________________________________

From: "Mats Karlsson" mats.karlsson@farmbio.uu.se>
Subject: RE: [NMusers] $OMEGA blocks and log-likelihood profiling
Date: Sat, June 5, 2004 3:27 am

Dear Ken, Jeff, Nick and all,

It seems that in the latest discussion, Nick's original observation that
parameter estimates were the same, regardless of whether the COV step was
successful or not, has been forgotten. Thus, Ken's suggestion below, that somehow the
7% of data sets with a successful COV step may contain some information that
the other 93% of data sets would not, does not seem plausible. If that had
been the case, at least one parameter would have displayed a different
mean and/or considerably more variability in the runs with a failed
COV step than in those with a successful COV step.

Jeff is worried that so many bootstrap runs fail to converge and that
therefore the bootstrap distribution of parameter estimates would be
censored. However, no censoring appears to have taken place, as Nick does
report the distribution of *all* parameter estimates (even unsuccessful
terminations such as "Rounding Errors dominating" will
provide a set of parameter estimates). As the type of termination
(whether successful COV step or not, successful minimization or not) was
not of importance for the resulting distribution of parameter estimates,
and local minima related to the use of the same initial estimates were
not a problem (which I think Nick said they weren't), this much
supports Nick's notion that there is no correlation (in this case)
between termination status and parameter estimates.

On a general note, we know that with poor models (overparametrisation,
misspecification) we can see failed convergence or failed covariance
steps, whereas for appropriate models, supported by adequate data, the
opposite is true. We use the convergence and covariance steps of NONMEM
to diagnose when we're moving from one type of problem (the appropriate)
to the other. However, Nick's results strongly suggest that these
diagnostics are not appropriate in this case. This should not come as a
surprise. We have here a situation where the COV step diagnostic is not
perfect. Most of us have had similar experiences before. Nick seems to
draw the conclusion that it is therefore useless. I don't agree with
this, but still see the COV step as a diagnostic with value, even if we
should be cautious in accepting both positive and negative results from
it. Thus, with access to results from a properly carried out bootstrap,
I would not care about COV step failure or not, just as I generally
would not pay attention to NONMEM's approximate SEs if I had bootstrap
confidence intervals. The COV step is a cheap (in a positive sense) and
often informative diagnostic, but not as informative as the more
expensive bootstrap.

Best regards,
Mats

--
Mats Karlsson, PhD
Professor of Pharmacometrics
Div. of Pharmacokinetics and Drug Therapy
Dept. of Pharmaceutical Biosciences
Faculty of Pharmacy
Uppsala University
Box 591
SE-751 24 Uppsala
Sweden
phone +46 18 471 4105
fax   +46 18 471 4003
mats.karlsson@farmbio.uu.se

_______________________________________________________

From: Nick Holford 
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Sat, June 5, 2004 4:52 am

Mats, thank you for re-emphasising my earlier assertion.

So everyone can make their own judgement of the similarity of the bootstrap average
and stdev obtained with all runs, minimization-successful runs, and covariance-step-successful
runs, here are the parameters of a 2-compartment model (BSV* are sqrt(OMEGA), CVAG and
SDAG are sqrt(SIGMA), and R12 and R34 are the correlations of the random effects of CL
and V1 and of V2 and Q, respectively) plus some covariate parameters.

Param    avg-all  avg-sxs  avg-cov  sd-all   sd-sxs   sd-cov
CL         4.70     4.68     4.64    0.12     0.13     0.11
Vss       30.8     31.1     30.6     1.75     1.68     1.68
CLNR       1.12     1.12     1.11    0.24     0.25     0.30
CLR        3.56     3.56     3.53    0.23     0.24     0.27
V1        19.52    19.53    19.53    0.26     0.25     0.22
Q          1.02     1.01     1.00    0.11     0.10     0.09
V2        11.28    11.29    11.03    1.52     1.52     1.66
BSVCL      0.30     0.29     0.29    0.017    0.016    0.019
BSVV1      0.24     0.24     0.24    0.014    0.014    0.013
BSVQ       0.82     0.82     0.84    0.081    0.076    0.076
BSVV2      1.17     1.20     1.17    0.22     0.21     0.21
CVAG       0.16     0.16     0.16    0.006    0.006    0.005
SDAG       0.14     0.14     0.15    0.025    0.024    0.020
R12        0.55     0.55     0.55    0.059    0.064    0.058
R34        0.90     0.91     0.90    0.085    0.082    0.052
KAGECR   111.9    111.8    112.4     5.88     6.39     7.63
FSEXCR     0.82     0.82     0.82    0.033    0.032    0.027
KAGERF   119.0    119.1    118.4     8.05     7.95     7.25
FCPRLO     0.70     0.70     0.71    0.030    0.031    0.031

I don't contend that NONMEM's asymptotic standard errors are useless. But failure of
the $COV step is not a reliable sign that the parameters are not trustworthy (see
above). Equally I believe that success of the $COV step implies little more than
good (numerical) luck. The SE estimates of the $COV step are a cheap and quick way
of understanding something about estimation uncertainty but when they disagree with
the bootstrap values I prefer the bootstrap.

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________

From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Tue, June 8, 2004 5:23 pm

Mats, Nick, and all,

I have not forgotten Nick's original premise that the parameter estimates
were the same regardless of successful convergence or a successful COV step.
However, my preference is to understand why he has such a high convergence
failure rate.  In my opinion Nick has two choices: 1) he can try to
understand why the failure rate is so high, perhaps leading to a more stable
model selection (if his model is over-parameterized) that resolves the high
failure rate, or 2) he can verify that the empirical marginal distributions
of the parameter estimates are the same between the successful and failed
convergence runs.  I don't believe Nick has provided sufficient information
to assess the latter, and I will respond directly to his message with what I
think would provide compelling evidence that his bootstrap sample
distribution is indeed independent of convergence status.  I could be
convinced with sufficient data that pooling bootstrap estimates from both
the successful and failed convergence runs is OK for his particular example,
but Nick wants to generalize from this one example to conclude that as
long as we are doing bootstrapping we never have to worry about convergence
and/or COV step failures, and this is where I take exception.

Suppose with another example with a high convergence failure rate the
bootstrap distributions between the successful and failed runs are
different.  In this setting the analyst has no choice but to go back and try
to figure out why he/she has such a high failure rate.  This is where the
COV step provides diagnostic information that may be helpful.  In Nick's
example, I wanted to try and understand why he has such a high convergence
failure rate and the 7% that had a successful COV step do have additional
information to assess the possible instability of the model at least with
respect to these particular bootstrap datasets.  It is in this regard that
they contain more information than the 93% where the COV step failed.  In a
previous message Nick provided information from the COV step output that
suggests the bootstrap runs for these 7% are indeed stable.  That didn't
have to be the case as the COV step can be successful and the model can
still be unstable.  It is for this reason that I agree with Nick that simple
success or failure of the COV step alone is a poor indicator of the
reliability/stability of the model.  If the COV step output for these 7% had
diagnostic information to suggest that these particular fits were unstable,
then it would have been of interest to postulate alternative, more stable
models that resolve this instability and see if that also resolves the high
convergence failure rate as well.  But in Nick's example the 7%
appear to be stable which means diagnosing the reason for the high
convergence failure rate is going to be more difficult.  It could still be
related to instability/over-parameterization of the model for the remaining
93% of bootstrap datasets but we would need more information from Nick
regarding the design of his dataset and bootstrap sampling scheme to assess
this.  

Ideally, one would be looking at the COV step output throughout the model
building process before getting to the bootstrap phase to give one a better
shot at not encountering such a high convergence failure rate when
performing bootstrapping.  In fact I often don't perform the COV step during
bootstrapping as it can be impractical especially for models/data with long
run-times.  On the other hand, I don't often encounter such a high
convergence failure rate when I perform bootstrapping either.  I believe
this is in part due to the conscious effort I make to avoid instability in my
model building.  I agree with Mats that the COV step provides imperfect
diagnostic information but that is the case with other diagnostics such as
empirical Bayes estimation as well.  Imperfect as the COV step information
may be, it still provides valuable diagnostic information.  That is not to
say I would use the COV step output to make formal inference via confidence
intervals because I generally don't, but I do think we should be reviewing
the COV step output routinely to help guide model development.

Ken
_______________________________________________________

From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Tue, June 8, 2004 5:27 pm 

Nick and all,

I can be convinced of your assertion that it is OK to pool bootstrap results
for all the runs, including those with failed convergence, but I would need
additional information.  The table below summarizing the means and SDs doesn't
provide sufficient information.  If you are planning to use these bootstrap
results to report out confidence intervals then it is important to compare
the tails of the distributions with and without the failed convergence runs.
At a minimum you should calculate bootstrap CIs with and without the failed
runs to see how they compare.  However, since only 28% of your runs had
successful convergence, you might be faced with very poor precision to
estimate the tail percentiles for the bootstrap CIs based on the successful
convergence runs alone.  In which case you might consider performing
quantile-quantile (Q-Q) plots comparing the order statistics between the
empirical distributions of the bootstrap estimates for the failed versus
successful runs.  If these Q-Q plots are fairly concordant between say the
10th and 90th percentiles (80% of the distribution) then I would be inclined
to believe that the empirical distribution of the bootstrap estimates is
independent of convergence status and that any breakdown in the tails is
probably due to poor precision.  In this setting I would then conclude it is
OK to pool both the successful and failed convergence results in reporting
bootstrap CIs.
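
As a minimal sketch of such a check (the file name, the 'status' column and
the parameter names here are only placeholders for however the bootstrap
estimates and termination statuses have been tabulated):

# Compare percentile CIs with and without the failed-convergence runs, and
# compare matching quantiles (a Q-Q style check) of the two distributions.
import numpy as np
import pandas as pd

boot = pd.read_csv("bootstrap_estimates.csv")   # one row per bootstrap run
ok = boot[boot["status"] == "SUCCESSFUL"]       # minimization successful
bad = boot[boot["status"] != "SUCCESSFUL"]      # failed runs

for p in ["CL", "V1", "Q", "V2"]:
    print(p, "all runs:", np.percentile(boot[p], [2.5, 97.5]),
          "successful only:", np.percentile(ok[p], [2.5, 97.5]))

    # Quantiles of failed vs successful runs between the 10th and 90th
    # percentiles; concordance (points near the identity line) suggests the
    # estimates are independent of convergence status.
    q = np.linspace(0.10, 0.90, 9)
    print(np.column_stack([np.quantile(bad[p], q), np.quantile(ok[p], q)]))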

Assuming such information supports your assertion, I would be careful not to
over-generalize these results to suggest that one can always pool failed
convergence results.  I believe every bootstrap simulation where you
encounter a high convergence failure rate has to be dealt with on a
case-by-case basis and a similar exercise as above would have to be
performed if you wanted to pool these results.  Moreover, if for a
particular example, the empirical distributions are different between the
failed and successful runs then the analyst has to go back to the drawing
board to try and diagnose the reason for the high convergence failure rate.
Here is where we seem to be in disagreement regarding the value of the COV
step.  Equating success of the COV step with good luck suggests that success
or failure of the COV step is a purely random event outside of our
control...I strongly disagree.  I agree with you that success or failure of
the COV step alone provides insufficient information regarding the
reliability of the estimates, and in general it is not a good idea to
perform formal inference with CIs based on the COV step std errors, but we
appear to disagree on the value of the COV step output as a diagnostic tool
to help assess instability which may be the reason for the high convergence
failure rate.  I agree with Mats that the COV step is an imperfect
diagnostic and certainly we can get COV step and convergence failures that
are unrelated to instability (e.g., over-parameterization) but that doesn't
mean it has no value as you seem to suggest.

Ken

_______________________________________________________

From: Leonid Gibiansky lgibiansky@emmes.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Tue, June 8, 2004 5:45 pm 

I think we need to discuss what a "failed" run is. The run may fail for a
variety of reasons with different error messages. The "safest" reason is
insufficient accuracy: failure to achieve the required precision. I am sure
everyone has observed this phenomenon, where the run stops at a precision just
below the requested one. If you request 3 digits you get 2.9; if you
request 5 you get 4.5. I believe these runs should be treated as
"converged", not failed. Indeed, what is the difference if the run attained
numerical precision of 3 significant digits with SIGDIGITS=3 (what is
called good convergence) or with SIGDIGITS=5 (which would be treated as
failure)? I would guess that the failed runs in Nick's example are of this variety.

On the other hand, if the run fails with undefined OF or undefined 
precision, I would be hesitant to accept it even in the bootstrap setting.

Leonid
_______________________________________________________

From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Wed, June 9, 2004 8:55 am

Leonid,

I don't think Nick has provided sufficient detail to know how insufficient
the accuracy might be.  Moreover, even for the 28% which did achieve
sufficient sigdigit accuracy, only one quarter (7%) of these had a
successful COV step so I still think the problem could be due to a very flat
likelihood response surface where difficulty in achieving sufficient
accuracy can be problematic (i.e., the problem of over-parameterization
where there is an infinite number of solutions for the parameters that
result in essentially the same OFV).  In any event, if indeed the majority
of runs are failing with rounding error messages where the minimum sigdigits
for the run is 2.9 then I would expect the bootstrap distribution for these
failed runs would be no different from the successful runs.  The whole point
to doing the bootstrapping is to construct these distributions.  So
regardless of the reason for the failure, the proof is in the estimates
obtained to generate these empirical distributions. I'm just concerned that
there is the potential for many of them to fail near a local minimum or
perhaps not move very much from the starting values.  In this setting, the
bootstrap distribution for the failed runs could be very ugly (e.g.,
bimodal).  That might not be the case for Nick's example but in general the
potential is there...which is why we have to be cautious when we have such a
high convergence failure rate.

Ken
_______________________________________________________

From: Nick Holford n.holford@auckland.ac.nz
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Thu, June 10, 2004 3:55 pm

Ken and Leonid,

Thanks for all your thoughtful remarks and suggestions. I am afraid I am rather busy
this week (leaving tomorrow for PAGE and 2 weeks in Europe) so I don't have time to
take up the suggested investigations right now. 

But a quick thought -- Can either of you provide a data based example where you can
show that bootstrap runs that minimized successfully with $COV produced more
reliable parameter estimates than those that did not complete $COV or terminated? It
seems that much of the discussion is based on opinion rather than data. I have
offered some data to test the hypothesis that successful runs (+/- $COV) are better
than all bootstrap runs. I realize it is only one data set (and I am working on
another as I type) but it is data not just opinion.

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________

From: Leonid Gibiansky lgibiansky@emmes.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling 
Date: Thu, June 10, 2004 4:29 pm

Nick,
I do not have an opinion on the subject that you raised:
"The hypothesis that successful runs (+/- $COV) are better than all
bootstrap runs". However, the question of whether to accept "strongly
failed" bootstrap runs (those with parameters defined with zero significant
digits or with an undefined OF) is interesting to discuss. Can someone offer
an example where a "strongly failed" NONMEM run was used in any paper or
regulatory submission? My guess would be no. I would simplify the model,
fix some parameters or do any other possible tricks to get reasonable
convergence.

  I think the criteria for accepting the bootstrap runs should be
similar.  We cannot accept "strongly failed" runs simply because they have
parameters similar to those that converged. They need to be treated as
failed, with undefined parameters. The next question is what to do with
those. One can try to push them to convergence with several starting points
(initial parameters). This would be my choice, but it needs to be automated.
Another option is to place them at the tails of the distribution: say, if 5%
of runs "strongly failed" then the most you can count on is to look for the
bounds in the 5 to 95% interval (5% set aside for the failed runs). But this
might be too strict. The other option (actually, my favorite) is to look at
the bootstrap as a useful diagnostic of the problem, not a tool to get
confidence intervals with great precision. Then the 5% of "strongly failed"
runs can be ignored and the rest used for approximate diagnostics,
investigation of the parameter distributions, publishing nice papers with
bootstrap figures, etc.

Have a nice trip to PAGE and Europe !
Leonid
_______________________________________________________

From: Nick Holford n.holford@auckland.ac.nz 
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Thu, June 10, 2004 4:57 pm

Leonid,

Thank you for your further comments. I am afraid your guess is wrong. The model (a
"strongly failed NONMEM run") which is the subject of this discussion has indeed
been published in Br J Clin Pharmacol.
http://www.blackwell-synergy.com/links/doi/10.1111/j.1365-2125.2004.02114.x/

I can assure you many tricks and initial estimate games were played to try and get a
cleaner solution for this model. If you have read the previous discussion you will
realize that the *model* can describe a random sample of the original data with
successful minimization and $COV. So my contention is that this is not a model
problem but a modelling program (NONMEM) problem.

I invite you to write to the editor with your opinions if you wish but I would
prefer to read some arguments based on data.

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________

From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE: [NMusers] $OMEGA blocks and log-likelihood profiling
Date: Thu, June 10, 2004 5:14 pm

Nick,

As I indicated in a previous message I don't typically run bootstraps with
$COV.  I really don't see the value in summarizing the bootstrap results
with/without a successful COV step.  The issue is the high convergence
failure rate and whether or not you can use the estimates from the failed
runs to provide valid inference via the empirical distribution generated
from your bootstrap samples.  The COV step only comes into play as a
diagnostic to provide insight into why you are getting such a high
convergence failure rate.  Ideally, closer attention to the COV step output
during model building to guide model selection may help to avoid the high
convergence failure rate when performing bootstrapping.  In the past you
have indicated that you would much rather just go right into bootstrapping
rather than take the time to get a successful COV step and review this
output for ill-conditioning during model building.  That's fine, but then
you are more likely to encounter the problem of a high convergence failure
rate during bootstrapping.  In which case you have more work on the back end
to verify that these failed runs can still be used to provide valid
inference regarding the uncertainty in the parameters. You may be fine with
your one example, but in general it's a slippery slope to be on, and in each case
where you have a high convergence failure rate the burden will be on you to
verify that the empirical distributions for the failed and successful runs
are unchanged before pooling them.

If you had a successful COV step from your model fit on the original
dataset, which suggested that your model was stable, and you ended up with
>90% successful convergence in your bootstrap runs, then I couldn't care
less whether you even ran the COV step for the bootstrap runs.

Ken
_______________________________________________________

From: Leonid Gibiansky lgibiansky@emmes.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Thu, June 10, 2004 5:47 pm


Nick
Every rule has exceptions. I will not write to the editor, of course (why
did you include this suggestion in the message?), but this is an open forum,
and everybody can express their opinions, even opinions that are not
based on data. My opinion is that results based on a strongly
failed run should be taken with great suspicion, even if they are confirmed
by 1000s more strongly failed bootstrap runs.  I would agree with you that
you can accept the model (treated as a black box) and model predictions based
on the model diagnostics (who cares how the model was obtained if it
describes the data well enough?), but it is more difficult to argue that the
parameters and CIs obtained by this procedure reflect the true parameters and
CIs. It cannot be only a NONMEM problem. NONMEM has been able to fit thousands of
models to thousands of data sets successfully. The failure of the program
on one particular data set means that this particular model and this
particular data set have some intrinsic problems, such that a general and stable
nonlinear model fitting algorithm fails on this particular problem. In any
case, this means that the experience gained on this problem is unique and
cannot be (or should not be) generalized to other problems.
Leonid

_______________________________________________________

From: "Hutmacher, Matt" 
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling 
Date: Fri, June 11, 2004 9:53 am

Hello all,

I have some general overall comments, based on my understanding of the
discussion and of the bootstrap.

It seems to me that for the bootstrap to perform correctly, one must have a
"proper" estimate.  In the simple case of a univariate, identically
distributed random sample (from a distribution that is not necessarily known),
the sample mean is an estimator of the population mean.  The sample may be
bootstrapped, the sample mean calculated for each bootstrap sample, and the
sample means ordered to provide an estimate of the confidence interval (which
I believe is asymptotically consistent with the true confidence interval for
very general situations).  In calculating the sample mean, no iteration is
necessary, so we are always assured of a "proper" estimate of the mean.
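
A minimal numerical sketch of that univariate percentile bootstrap, with
made-up data (the distribution and sample size are arbitrary illustrations):

# Percentile bootstrap CI for a sample mean: resample the data with
# replacement, compute the mean of each resample, and order the bootstrap
# means to read off the interval.
import numpy as np

rng = np.random.default_rng(2004)
x = rng.lognormal(mean=1.0, sigma=0.5, size=50)   # the observed sample

boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                       for _ in range(2000)])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean {x.mean():.3f}, 95% bootstrap CI {lo:.3f} to {hi:.3f}")
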
When things get more complicated in the non-linear mixed effects world, it
would seem to me that - to ensure good estimates of the confidence intervals -
one would want the "proper" (i.e. global minimum) parameter estimates for
each bootstrapped data set.  The problem in the practical setting is
determining when you have "proper" estimates.  This can be difficult to
determine even for the original data set.  Even when the COV step completes,
we may not be sure that we are at a global minimum.  To help ensure that we
get there, we build stable models and use good starting values.  Both of
these practices can be applied to the model run on the original data set,
but it becomes difficult when bootstrapping because of the large number of
runs.  Thus, I agree with Ken that to ensure the best overall performance,
not only of the original model run but of the bootstrap procedure, a
stable model should be built and used for inference.

As Mats points out, computationally, the COV step is cheap (compared to the
bootstrap) and it gives us an indication of the stability of the model.  If
the COV step runs, and the model condition number is low, then I would have
good faith in proceeding with the bootstrap.  However, if the COV step does
not run, as Ken indicated, I would be concerned about whether the model has
found a saddle point (indicating the estimates are not proper), is
overparameterized, etc.  I do agree with you, Nick, that sometimes the COV
may not run for a good model.  One often-used example is when TLAG is
estimated near a data point.  Here, the derivative is undefined, so no
covariance estimate is available.  This model may still yield good estimates
and be suitable for inference by bootstrapping.  However, in my opinion, for
general situations some detective work needs to be done to determine why the
COV failed.  Otherwise, I would think, some suspicion as to whether the
estimates are "proper" would exist in a reviewer's mind.

When the COV step fails on the original model, I think the bootstrap could be
a valuable tool to help show that the estimates are proper.  If one looks at
the distributions and finds they are reasonable (smooth, unimodal, [and
"ideally" symmetric due to the central limit theorem]) then one has some
credible evidence that the original model/data is robust.  However, one must
be careful.  In a case like Nick's, where, hypothetically, a large number of
runs do not converge, the parameter estimates from the runs that didn't
converge could be unnecessarily widening the confidence intervals.  Thus, in
my opinion, it is good to always include the distribution of the bootstrap
estimates in the appendix of one's report.

For rounding errors, I agree with Leonid that, to some extent, the selection
of significant digits is arbitrary.  However, in my mind, if one can't get
three significant digits, one must ask why one can't achieve them.  Perhaps
if the lowest is 2.9 you could proceed.  However, in this case if you change
sigdigits to 2 and you get 1.9, wouldn't that be concerning (in the sense of
instability)?  If the parameter estimates were identical then maybe they are
OK, but in my experience this is not usually the case.  [I know that usually
when this happens we increase the sigdigits and trick NONMEM into running the
COV step using the MSFI option - I state this as somewhat of an old argument
or thought process.]  If the sigdigits achieved is less than 1, would you
trust the estimates?

Nick, I would have liked to go to PAGE and discuss this with you over a
pint, as I think you have an interesting problem - that being, "how do you
know you have good estimates".  There is always room for debate there.  But,
I have to say, I disagree with the general tone of your 2nd paragraph below.
I believe this statement implies (or at least I infer) that this one
dataset/model provides proof that the COV step is meaningless.  You are
using the data to argue your point here, and you claim it is generalizable,
which I think is not a scientific argument.  You say the proof is in the
data and unarguable.  However, when data drives a correlation estimate
between CL and V to 95%, I believe that you would state that the model is
wrong and that this result was in error, even if the model fit the data well
(correct me if this is inconsistent with previous emails).  You would
discount the data in that situation based on an "opinion", wouldn't you?  Or
is your "opinion" subject matter knowledge...

Model fitting is not 100% composed of subject matter knowledge
(pharmacology/pharmacokinetics).  When one wants to "realize" the data by
fitting a model to the data and perform inference, estimation is necessary.
Estimation is statistical in nature.  Statistical theory plays a valuable
role in assessing estimates and ensuring proper inference.  Statistics
allows us to make probabilistic statements about the data, future
predictions, etc.  Some statistical rigor is necessary to ensure that the
probabilistic statements made will be valid.  These "hurdles" have been
shown to be valuable over time.  If this were not the case, statistics would
not be such a valuable part of interpreting clinical trials.  Therefore, I would
not call good statistical practice "arbitrary" hurdles.

Matt  
_______________________________________________________

From: Nick Holford n.holford@auckland.ac.nz
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling
Date: Fri, June 11, 2004 5:25 pm   

Matt,

Thanks for your remarks, most of which I fully agree with. But I do take issue with
your faith in statistical theory. For many years it was commonly accepted that the
difference in -2 log likelihood under the null was distributed chi-square. But we know
from data based experimentation that this is not true for NONMEM. It is also known
that NONMEM's standard errors are of little use for confidence intervals, because we
can find from data based experimentation (using bootstraps) that confidence
intervals are often asymmetrical and that SE-based confidence intervals for parameters
such as OMEGA can easily include impossible negative values. Once again statistical
theory applied to NONMEM is misleading. These data based experimental tests have forced
me to think harder about the assumptions we make when applying statistical theory in
this area.

So why should I accept the 'good statistical practice' notion that getting the $COV
step to run in NONMEM is a marker for a 'stable model' which is somehow more
reliable? It is here that I am asking for data. Can anyone support this hypothesis
with data based experiments using NONMEM?

Of course I would be happy if all my runs converged and the $COV step ran. But in
the real world of non-trivial PKPD analysis this cannot be guaranteed and indeed I
find the closer one gets to a mechanistically plausible model the harder it is to
get these things to happen. So in the real world I live in I feel I cannot rely on
statistical theory for NONMEM results but want some data based backup. The bootstrap
and randomization test are tools for doing this.

You and others have indicated that you think I wish to generalize the results from
one study to all cases. If I have given this impression it was not intentional. I am
not offering a general theory but I am offering an experiment to test a hypothesis.
The hypothesis is that NONMEM runs that converge with $COV are somehow more
reliable/stable than those that do not. I really don't have a good idea how to test
for model 'stability' but in this context I would consider reliability to mean that
the parameter estimates are unbiased. As I understand it this hypothesis is not
built on any mathematical theory but arises from 'good statistical practice'. In the
single case I have tested I can find no evidence to support this hypothesis. I have
read and understand the various viewpoints that have offered reasons why my one
experiment may be misleading. I accept these possibilities (e.g. failed runs might
widen bootstrap confidence intervals) but the data at hand give no obvious support
that this is happening. I am aware of the need to do some 'detective work' to try to
understand why a model may be failing to converge. There are numerous ad hoc tricks
one can apply, e.g. using SLOW or HYBRID, or increasing SIGDIG to get results for MSFI
with a lower SIGDIG. But at the end of the day, after months of work, I want to move
forward with the model that best describes the data and seems to explain how the
world works. It is here that I am reluctant to throw the baby out with the bathwater
because the model fails some test of 'good statistical practice' which I cannot find
any data to support.

Too bad you won't be at PAGE. I look forward to catching up on another occasion.

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________

From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling 
Date: Tue, June 29, 2004 3:08 pm  

Nick,

I'm sure we're all getting tired of this thread but I just can't leave it
where it last ended.  Below you ask the rhetorical question why we should
consider it good statistical practice to get the COV step to run as a marker
for a stable model that is somehow more reliable.  I don't consider it good
statistical practice to simply use success/failure of the COV step as such a
marker.  What I and Matt have been saying is that it is good statistical
practice to develop models that have a successful COV step AND to inspect
the COV step output to assess the stability of the models.  It is the
stability of the model that allows us to gauge our confidence (i.e.,
reliability) in the parameter estimates that we obtain from NONMEM as being
optimal from which to make inference via point and interval (i.e.,
bootstrapping) estimates.  You seem to want to rely solely on your
confidence that you have a correctly specified mechanistic model and that is
sufficient to have confidence in the estimates regardless of whether NONMEM
achieves successful convergence.  Surely you would agree that if we knew
with 100% certainty that your mechanistic model was correctly specified, and
you had a dataset where every subject was sampled only once at the same
fixed time point, it would be ludicrous to fit this data in NONMEM (we would
expect it not to converge) and put any level of trust in the estimates we
obtain.  The reliability of the parameter estimates depends not only on our
confidence in a correctly specified model (which we can never really know
with real data) but also on our confidence that the data in hand contain
rich enough information to estimate these parameters.

Reliable estimates should not be equated with unbiased estimates.  Unbiased
estimates speak to the accuracy of the estimates and the correctness of the
model which cannot be assessed with real data sets.  Even if you wanted to
assess the value of simple success/failure of the COV step in your bootstrap
runs as a marker of the validity of the results with real data, at best, you
can conclude that the distribution of the parameter estimates from the
failed COV step runs (and in your example the majority also fail in
convergence) are similar to the distribution of the estimates from the
successful COV step runs.  The problem with real data sets is we don't know
what the true distribution of the parameter estimates should be.  If you
want to assess whether a successful COV step provides any value as a marker
in assessing the accuracy and precision of the estimates you need to do this
via simulations where you know the true value of the parameters.  As you
vary the design to change the information content in the data, resulting in
increased instability (with fewer data points), I think you will find that
the accuracy of the parameter estimates and coverage probabilities of the
bootstrap CIs will get worse even though the model is correctly specified
(i.e., fitting the same model as you used to simulate the data) regardless
of COV step status.  

Note that the nonparametric bootstrap relies on statistical theory and is
not a data-based result.  In order for the bootstrap to give us valid
confidence intervals we need to rely on the randomness and optimality of
the estimates.  Rounding errors, lack of convergence, and COV step failures
may be indicative that the estimates are sub-optimal (not at the global
minimum) and could result in systematic biases that invalidate any inference
from the resulting empirical distribution of these sub-optimal estimates.  

Finally, I'd encourage you to read up more on statistical theory for
nonlinear models.  There is a wealth of information in the statistical
literature, dating back to the 60's and 70's, showing that Wald-based symmetric
confidence intervals for nonlinear models often do not have proper coverage
and that the distributions of many parameter estimates from nonlinear models
are asymmetric.  A lot of this is in standard texts for nonlinear regression
models; Bates & Watts, Nonlinear Regression Analysis and its Applications,
Wiley, NY, 1988 is a good text.  It is true that maximum likelihood theory
states that, asymptotically, maximum likelihood estimates have a multivariate
normal distribution; however, for population models, Vonesh (Biometrika
1996;83:447-452) has shown that the asymptotics require not only a large
number of subjects (N) but also a large number of observations per subject
(n).  For certain intrinsically nonlinear parameters the magnitude of n
necessary to achieve these asymptotics may never be realistically achieved.
Furthermore, with regard to NONMEM we are not doing exact maximum
likelihood estimation but approximate maximum likelihood estimation, due to
the first-order approximations that are employed, and this also plays into
the problem.  Mats Karlsson and his colleagues have shown situations where
the first-order approximations (especially the FO method) do not perform
well in maintaining the nominal type I error rate for likelihood ratio
tests.  However, they have also shown situations where the type I error rate
is maintained based on the Chi-Square distribution.  So, situations where
the likelihood ratio test does not follow a Chi-Square distribution are not
evidence of a failure of statistical theory but rather an indication that
our approximations and asymptotics may not be working to our advantage.
Application of statistical theory requires us to have an understanding of
the properties and limitations of the estimation methods that we employ.  We
(the PK/PD modeling community) are continuing to contribute to this
statistical theory as it applies to NONMEM and its estimation methods.

Regards,

Ken
_______________________________________________________

From: Nick Holford n.holford@auckland.ac.nz
Subject: RE:[NMusers] $OMEGA blocks and log-likelihood profiling 
Date: Wed, 30 Jun 2004 15:51:23 +1200
Ken,

Thanks for keeping the ball rolling...

I think we need to keep clear a distinction between statistical theory and
the assumptions required by the theory when applied to NONMEM. Please note
this sentence in my response to Matt (below): "These data based experimental
tests have forced me to think harder about the assumptions we make when
applying statistical theory in this area."

Thanks for the suggestions for improving my education. My own preference
for a textbook is Seber GAF, Wild CJ. Non-linear regression. New York:
John Wiley & Sons; 1989 (the authors are academic colleagues at the University
of Auckland <grin>). I have no quibble with the theories (at least to the
extent I understand them) but as you note below they often involve assumptions
of normality (e.g. for predicting CIs from SEs) or that the likelihood is
correct when applying the chi-square distribution to the likelihood ratio
test. Both of these assumptions are dubious when applied to results from
NONMEM. Bob Leary provided some additional evidence of the problems of
NONMEM's Maximum Approximate Likelihood at the PAGE meeting 2 weeks ago.
http://www.page-meeting.org/page/page2004/Leary.pdf

On matters that are not based on statistical theory, like 'reliability'
and 'model stability', which you justify by 'good statistical practice',
I can only say that I prefer data-based SOPs. If NONMEM's $COV results
are only approximately correct
then just how valuable is the inspection of the entrails for evidence
of model stability? I note that Bates & Watts offer no guidance on
either reliability or stability (in the index) but Seber & Wild do
offer a discussion of parameter stability but only give a practical
example in the case of a model with 2 parameters. There is no mention
of model or parameter reliability in the index.

My confidence in non-parametric bootstrap results rests not simply on the
basis of a plausible model but also on consideration of a reasonable
design confirmed by simulation-based tests. I consider your example of
a reductio ad absurdum design to be a straw man in this context.

With regard to your point about simulation -- you seem to have forgotten
that I have already reported to you results of using simulation with this
problem (http://www.cognigencorp.com/nonmem/nm/99jul152003.html).
There was no evidence of bias or important differences in CIs in runs that failed
compared with those that were successful, or those that also completed the covariance step.

Your recent suggestion to examine Q-Q plots of the distributions of estimates
obtained with varying termination/minimization conditions has been very
helpful - thank you. I ended up (with some assistance from Christoffer Turnoe)
creating CDFs of the empirical distribution of the non-parametric bootstrap
parameters. There are some minor systematic differences between these plots
when I compare the worst case (terminated due to proximity to infinite
objective function value) with the best (covariance step completed). But the
inferences about coverage are not importantly different in the context of the 
application (Matthews et al. 2004). I am currently doing some more runs to increase
the total number of $COV successful estimates to refine the evaluation of the
distributions. I would be happy to send you the data for you to perform your
own examinations.
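
(A minimal sketch of this kind of comparison, for readers who want to
reproduce it on their own bootstrap output: it assumes the bootstrap
estimates have been tabulated with a column recording how each run
terminated. The file name, column names, and parameter label are
hypothetical placeholders, not Nick's actual data.)

# Hedged sketch: compare empirical distributions of bootstrap parameter
# estimates grouped by NONMEM termination status.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

boot = pd.read_csv("bootstrap_results.csv")            # one row per bootstrap run

best = boot.loc[boot["status"] == "cov_ok", "CL"]      # covariance step completed
worst = boot.loc[boot["status"] == "inf_obj", "CL"]    # near-infinite objective function

# Two-sample Kolmogorov-Smirnov statistic: the largest vertical distance
# between the two empirical CDFs (small D => distributions look similar).
d, p = ks_2samp(best, worst)
print(f"KS D = {d:.3f}, p = {p:.3f}")

# Percentile CIs from each subset, to see whether the inference about
# coverage differs materially between termination categories.
for name, grp in [("cov_ok", best), ("inf_obj", worst)]:
    lo, hi = np.percentile(grp, [2.5, 97.5])
    print(f"{name}: 95% CI for CL = ({lo:.3g}, {hi:.3g})")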

NONMEM parameter estimation involves calculations that are often at the limits
of computational precision. I know that Stuart Beal has put a lot of effort
into trying to make this as platform-independent as possible, but it is widely
known that NONMEM results depend on compiler (and options) and CPU type. I
believe that Stuart fine-tuned NONMEM on a specific platform (Sun workstation
and compiler, I think). Heuristic decisions were no doubt made on the basis
of performance on this platform. Even if other compilers and processors
provide superior numerics NONMEM may not perform so well because of the
Sun specific pragmatic implementation. Thus NONMEM running with an AMD
Athlon CPU and the Compaq Visual Fortran compiler (my own platform) may
fail despite being very close to a solution that would be successful
on a Sun. It is my current hypothesis that it is this numerical dice 
throwing that gives rise to the high minimization failure rate rather
than any major deficiency in the data or model that I have been using.
My data based experiments are currently aimed at testing this hypothesis.
If minimization and/or covariance step success are dependent on pseudo-random
numerical issues then I would predict that the distribution of parameter
estimates would be very similar irrespective of success or failure. Results
to date do not provide evidence to reject this hypothesis.

Nick

Matthews I, Kirkpatrick C, Holford NHG. Quantitative justification
for target concentration intervention - Parameter variability and
predictive performance using population pharmacokinetic models for
aminoglycosides. British Journal of Clinical Pharmacology 2004;58(1):8-19.
--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________
From: Kowalski, Ken Ken.Kowalski@pfizer.com
Subject: RE: [NMusers] $OMEGA blocks and log-likelihood profiling
Date: Fri, 2 Jul 2004 09:37:48 -0400

Nick,

It sounds like we are not in disagreement about the value of statistical
theory, just that we have to understand the assumptions we make when applying
it...no argument there.  Just as there is a wealth of PK/PD terminology and
different words are sometimes used to mean the same thing, the same is true
in the statistical literature.  I equate stability, or rather 'instability',
with 'ill-conditioning', which can be a result of 'over-parameterization'.
Bates and Watts (pp. 86-91) discuss inspecting the correlation matrix of
the parameter estimates to help diagnose collinearity among the parameter
estimates (general ill-conditioning) and/or over-parameterization (fitting
too many parameters).  With regard to 'reliability', I'm using the term as
per Webster to mean the extent to which results are reproducible.  In this
context a model fit might be unreliable because it is difficult to reproduce
estimates across platforms/compilers, with different starting values that
may only differ by 10%, etc. 
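
(For readers who want to script the Bates & Watts style check Ken refers
to, a minimal sketch follows: it converts a variance-covariance matrix of
the parameter estimates, such as the one produced by a successful $COV
step, into a correlation matrix and flags near-collinear pairs. The matrix
values and the 0.95 threshold are made-up placeholders, not from any model
discussed here.)

# Hedged sketch: diagnose ill-conditioning from the variance-covariance
# matrix of the parameter estimates.
import numpy as np

cov = np.array([[0.0400, 0.0196, 0.0020],
                [0.0196, 0.0100, 0.0010],
                [0.0020, 0.0010, 0.0900]])   # hypothetical 3x3 covariance matrix

se = np.sqrt(np.diag(cov))
corr = cov / np.outer(se, se)                # correlation matrix of the estimates

# Flag pairs of estimates that are nearly collinear.
threshold = 0.95
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[1]):
        if abs(corr[i, j]) > threshold:
            print(f"parameters {i + 1} and {j + 1}: r = {corr[i, j]:.3f}")

# A large condition number of the correlation matrix is another common
# symptom of over-parameterization / general ill-conditioning.
print("condition number:", np.linalg.cond(corr))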

Certainly numerical instability of NONMEM itself may indeed be the
explanation for your particular problem; however, if you don't strive to get
the COV step to run and inspect the output during model building before
performing bootstrapping, a reviewer could always question whether your model
is stable.  Moreover, Pete Bonate's exercise showed that NONMEM differences
across platforms/compilers were most likely to occur when one is dealing
with an unstable model.  Thus, numerical instability of NONMEM and model
instability in many cases may just be two sides of the same coin. 

Regards,

Ken
_______________________________________________________