From: "TKT (Thomas Klitgaard)"
Subject: [NMusers] Outliers and the FDA guideline 
Date: Wed, August 18, 2004 3:21 am 

Dear all, 
Section VII.C (p. 11 onwards) of the FDA Guidance for Industry "Population Pharmacokinetics"
(February 1999) states the following about outliers:

"The statistical definition of an outlier is, to some extent, arbitrary. The
reasons for declaring a data point to be an outlier should be statistically convincing
and, if possible, prespecified in the protocol. 1* Any physiological or study-related event
that renders the data unusable should be explained in the study report. 2* A distinction
should be made between outlying individuals (intersubject variability) and
outlier data points (intrasubject variability). Because of the exploratory nature
of population analysis, the study protocol may not specify a procedure for dealing
with outliers. In such a situation, it would be possible to perform model building
on the reduced data set (i.e., the data set without outliers) to 3* reanalyze
the entire data set (including the outliers) using the final population model,
and to discuss the difference in the results. Including extreme outliers is not
a good practice when using least-squares or normal-theory type estimation methods,
as such outliers 4* inevitably have a disproportionate effect on estimates. Also, it
is well known that for most biological phenomena, outlying observations are far
more frequent than suggested by the normal distribution (i.e., biological distributions
are heavy-tailed). Some robust methods of population analysis have recently been suggested,
and these may allow outliers to be retained without giving them undue weight (38-40).
Outliers should be specified in a separate appendix to the report, with all data available."
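To illustrate the guidance's point about least-squares estimates, here is a minimal sketch with hypothetical data. It is not one of the robust population methods cited as (38-40), which the guidance does not name here. The ordinary mean is the least-squares estimate of a location, while Huber's M-estimator (an assumed stand-in for "a robust method") down-weights large residuals. The function name and the tuning constant c = 1.345 are assumptions of this sketch.

```python
import statistics

def huber_location(y, c=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted least squares.

    The scale is held fixed at the normalized median absolute deviation (MAD),
    so residuals larger than c * scale receive weight c * scale / |residual|.
    """
    mu = statistics.median(y)
    scale = 1.4826 * statistics.median([abs(v - mu) for v in y])
    if scale == 0.0:
        return mu  # degenerate case: at least half the values equal the median
    cut = c * scale
    for _ in range(max_iter):
        w = [1.0 if abs(v - mu) <= cut else cut / abs(v - mu) for v in y]
        new_mu = sum(wi * vi for wi, vi in zip(w, y)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Hypothetical concentrations with one extreme value
conc = [9.8, 10.1, 10.0, 9.7, 10.3, 10.2, 9.9, 10.0, 45.0]
ls_estimate = statistics.fmean(conc)      # least squares: pulled toward 45.0
robust_estimate = huber_location(conc)    # stays near the bulk of the data
print(f"least squares: {ls_estimate:.2f}, Huber: {robust_estimate:.2f}")
```

The other eight values average exactly 10.0. Adding the single value of 45.0 moves the mean to about 13.9, while the Huber estimate stays close to 10.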
Our interpretation of this is the following: either the criteria are predefined, in a
statistically reasonable way, in the protocol (1*), or they are not, in which case (3*) model
building on the reduced data set could be performed, followed by a re-run of the final model
on the full data set. (Section 2* does not appear to be an outlier issue, as it pertains to
a non-result.)

My questions are: 
1) What would be a statistically convincing criterion for the first approach in section 1*? Note
that for this approach, the criterion should be stated a priori in the protocol. 

2) The procedure explained in 3* requires that the outliers be known before model
development. This rules out first applying a model-based exclusion criterion to the full
data set (e.g. "exclude if WRES>4"), followed by a re-run on the reduced data set and a
discussion of the sensitivity to the applied censoring. How do you identify your
outliers beforehand?
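A simplified worked example of why a fixed cutoff such as |WRES| > 4 depends on the estimated residual error. Assume a proportional residual error model with coefficient of variation CV and, unlike NONMEM's WRES, ignore the contribution of the random effects. The standardized residual for an observation y with model prediction f is then approximately r below. The observation y = 3f is hypothetical.

```latex
r = \frac{y - f}{\mathrm{CV}\cdot f}, \qquad
y = 3f \;\Rightarrow\; r = \frac{2}{\mathrm{CV}}, \qquad
\mathrm{CV} = 0.3:\ r \approx 6.7, \qquad
\mathrm{CV} = 0.8:\ r = 2.5
```

The same observation exceeds a cutoff of 4 when the residual CV is 30% but not when it is 80%. A pre-specified residual cutoff is therefore only as sensitive as the residual error estimate allows.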

3) How much flexibility do you allow yourselves with the criteria? Would you go with
common-sense arguments ("looks better, CV% are notably smaller") instead of strict rules? Would the FDA?

Thanks in advance. 

Thomas Klitgaard, Pharmacometrics, Novo Nordisk Denmark 

From: "Robert L. James"
Subject: RE: [NMusers] Outliers and the FDA guideline 
Date: Wed, August 18, 2004 8:33 am

I always classify outliers into two types: 1) those that are "highly improbable", and
2) those that are due to "natural extremes" in variation. "Highly improbable"
outliers strongly suggest an experimental protocol error (for example, the lab
technician left out an important reagent when performing the assay, a laboratory
apparatus wasn't properly zeroed or "warmed up", or a drug was incompletely mixed in blood
during the first minutes following a bolus arterial injection). "Highly
improbable" outliers are usually near the limit of biologic impossibility. "Natural
extremes" outliers, on the other hand, are unlucky but real. Biologic systems can
occasionally vary enough to produce very extreme values.
For "highly improbable" outliers, I simply discard the outlier from all analyses and
make a note of discarding it in my results.
However, discarding "natural extreme" outliers is statistically problematic. Discarding
them outright will bias the results by shrinking the variance, while including
them may make it very difficult to fit a good model. For "natural extreme" outliers,
I initially exclude them from the data during model fitting. Then, for my final
model run, I put the "natural extreme" outliers back into the data so that the variance
structure reflects the natural (although extreme) variability. For this final run, I
may or may not fix the theta parameters to the estimates obtained by the earlier model
(without the outliers). I report model diagnostics from the final model fit, which was
based on the data that included the "natural extreme" outliers.
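The trade-off Robert describes can be sketched with simulated residuals from a hypothetical heavy-tailed (contaminated normal) distribution whose true variance is 0.95*1 + 0.05*25 = 2.2. Several details are assumptions for illustration only: the 3-SD exclusion rule, the single "theta" (here just a mean), and all function names. Robert describes identifying natural extremes by judgment, not by this rule.

```python
import random
import statistics

def simulate_residuals(n, seed=2004, p_wide=0.05, sd_narrow=1.0, sd_wide=5.0):
    """Contaminated normal: mostly N(0, sd_narrow^2), occasionally N(0, sd_wide^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd_wide if rng.random() < p_wide else sd_narrow)
            for _ in range(n)]

def two_step_fit(y, k=3.0):
    """Step 1: estimate theta without the extremes.
    Step 2: re-estimate the variance on the full data with theta fixed."""
    m = statistics.fmean(y)
    s = statistics.pstdev(y, m)
    reduced = [v for v in y if abs(v - m) <= k * s]  # hypothetical exclusion rule
    theta = statistics.fmean(reduced)
    var_reduced = statistics.pvariance(reduced, theta)   # extremes discarded outright
    var_full = statistics.pvariance(y, m)                 # naive fit to all data
    var_final = statistics.fmean([(v - theta) ** 2 for v in y])  # theta fixed, all data
    return theta, var_reduced, var_full, var_final, len(y) - len(reduced)

y = simulate_residuals(2000)
theta, var_reduced, var_full, var_final, n_removed = two_step_fit(y)
true_var = 0.95 * 1.0**2 + 0.05 * 5.0**2
print(f"removed {n_removed}; true variance {true_var:.2f}")
print(f"discarded: {var_reduced:.2f}, naive full: {var_full:.2f}, final run: {var_final:.2f}")
```

By construction, the variance after discarding cannot exceed the naive full-data variance. Fixing theta at its step-1 value can only increase the full-data variance further, because the mean minimizes the sum of squares.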
Robert James 

From: "Hutmacher, Matt"
Subject: RE: [NMusers] Outliers and the FDA guideline 
Date: Wed, August 18, 2004 11:43 am

Outliers are a difficult subject. I think if you asked 10
different modelers you would get 10 different answers on how to handle
them. I would suggest a systematic approach to data elimination in general.
A systematic approach is the analyst's best surrogate for objectivity, since
ultimately only the reviewer/audience can determine the level of objectivity.
For an analysis that will be submitted to a regulatory authority, I would
advocate specifying the criteria for classifying data as outliers a priori
(before unblinding the data) in a population modeling analysis plan. This
document should also specify how the analyst will determine whether an outlier
is influential and how he/she will proceed if it is.
This systematic, pre-specified approach will mitigate the subjectivity introduced
by eliminating data a posteriori. In general, my opinion is that it is best to
include all the data whenever possible. If there are a number of outliers, one
might try using a mixture of epsilons (and hence variances) to down-weight
these observations and reduce their influence.
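Here is a minimal sketch of why a mixture of epsilons down-weights outlying observations. This is not NONMEM code. It assumes a hypothetical two-component residual error: 95% of observations from N(0, 1) and 5% from N(0, 25). For a residual e = y - prediction, the derivative of the mixture log-density with respect to the prediction is e times an effective weight, (1 - w)/sd_narrow^2 + w/sd_wide^2. Here w is the posterior probability that the observation belongs to the wide component. Under a single normal error, the weight would always be 1/sd_narrow^2.

```python
import math

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_weight(resid, p_narrow=0.95, sd_narrow=1.0, sd_wide=5.0):
    """Return (posterior probability of the wide component, effective weight).

    The effective weight multiplies the residual in the derivative of the
    log-likelihood with respect to the prediction, so it measures influence.
    """
    f_narrow = p_narrow * normal_pdf(resid, sd_narrow)
    f_wide = (1.0 - p_narrow) * normal_pdf(resid, sd_wide)
    w_wide = f_wide / (f_narrow + f_wide)
    weight = (1.0 - w_wide) / sd_narrow**2 + w_wide / sd_wide**2
    return w_wide, weight

for e in [0.1, 2.0, 4.0, 10.0]:
    w_wide, weight = mixture_weight(e)
    print(f"residual {e:5.1f}: P(wide) = {w_wide:.3f}, weight = {weight:.3f}")
```

A small residual keeps nearly the full weight 1/sd_narrow^2. A very large one is attributed almost entirely to the wide component, so its weight falls toward 1/sd_wide^2.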
Sometimes, handling of outliers will depend on the goal of the analysis, and
the outliers may not fulfill pre-specified criteria such as |residuals|>=3 or 4.
For example, we did a population PK (PPK) analysis on some sparse data. Because
the estimated CV of residual variation was >80%, no data appeared as outliers by
the usual residual criteria. When you looked at the data (2 samples, 1 hour apart,
for each individual at each visit), some visits appeared to have concentrations
that were ascending with time (as if absorption were occurring). However, these
pseudo-absorption phases were occurring much too late in the dosing interval; these
were "highly improbable" observations (in Robert's terms), given that the drug had very
predictable absorption in every other study. We figured these results were due to incorrect
recollection/recording of the last administered dose. Thus, the large CV estimate
came from the model predicting elimination when the data were exhibiting this
pseudo-absorption. Ultimately, the purpose of the PPK analysis was to test for
influential covariates. The large %CV would reduce the power to detect these covariates,
so (in my opinion) it was of interest to eliminate these data points (since every attempt
we made to include them failed) to better perform the exploratory covariate analysis. To
mitigate the subjectivity introduced by selecting the points by visual inspection (again, 10
analysts might end up with 10 different data sets), we used a mixture model on Tlag.
Three mixture components were identified: the "typical", "unrealistic 1", and "unrealistic 2"
absorbers. The model classified each visit for each patient into one of these three
categories. We plotted the data by the three mixture classifications, and it was easy to
see that these data had different, unlikely characteristics. These data were deleted,
and the CV was reduced to ~30%. The reviewer/audience could disagree with the procedure,
but if he/she thought it was reasonable, then there would be no argument over
which data should be eliminated.
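The classification step of such a mixture can be sketched as follows, treating each visit separately. Everything here is hypothetical:

- a one-compartment oral model with lag time (F assumed to be 1);
- fixed parameter values;
- three Tlag classes with fixed proportions;
- log-normal residual error;
- two samples taken 1 hour apart.

In the analysis described above, the mixture model estimated the class-specific Tlags and proportions. This sketch only computes posterior class probabilities given fixed values.

```python
import math

DOSE, V, KA, KE = 100.0, 10.0, 1.5, 0.1  # hypothetical: mg, L, 1/h, 1/h
SIGMA = 0.2                              # hypothetical SD of log-scale residuals
# Hypothetical classes: name -> (Tlag in h, mixing proportion)
CLASSES = {"typical": (0.5, 0.8), "unrealistic 1": (4.0, 0.1), "unrealistic 2": (7.0, 0.1)}
TIMES = [8.0, 9.0]                       # sample times (h) after the recorded dose

def conc(t, tlag):
    """One-compartment oral model with first-order absorption and a lag time."""
    tau = t - tlag
    if tau <= 0:
        return 0.0
    return DOSE * KA / (V * (KA - KE)) * (math.exp(-KE * tau) - math.exp(-KA * tau))

def log_lik(obs, times, tlag, sigma=SIGMA):
    """log(obs) ~ N(log(pred), sigma^2); the -log(obs) Jacobian term is the same
    for every class, so it is omitted (it cancels in the posterior)."""
    ll = 0.0
    for y, t in zip(obs, times):
        pred = conc(t, tlag)
        if pred <= 0:
            return -math.inf
        r = math.log(y) - math.log(pred)
        ll += -0.5 * (r / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return ll

def classify_visit(obs, times):
    """Posterior class probabilities for one visit and the most probable class."""
    logpost = {name: math.log(p) + log_lik(obs, times, tlag)
               for name, (tlag, p) in CLASSES.items()}
    top = max(logpost.values())
    unnorm = {k: math.exp(v - top) for k, v in logpost.items()}
    total = sum(unnorm.values())
    probs = {k: v / total for k, v in unnorm.items()}
    return max(probs, key=probs.get), probs

# Hypothetical visits: A declines as expected, B rises late in the interval
for name, obs in {"A": [5.0, 4.6], "B": [7.2, 8.3]}.items():
    label, probs = classify_visit(obs, TIMES)
    print(name, label, {k: round(v, 3) for k, v in probs.items()})
```

Visit B's late rise is far more probable under a long lag time, so B is assigned to a late-absorber class. Those visits can then be plotted and reviewed before deciding on exclusion.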


From: Mats Karlsson
Subject: RE: [NMusers] Outliers and the FDA guideline 
Date: Monday, August 23, 2004 8:22 AM

Hi Matt,

Nice advice. I was just wondering about a technicality. In
applying the mixture model to Tlag in a visit-specific manner,
did you treat each visit as a separate subject, or did you apply
such a mixture while keeping the individual records intact (i.e.,
handling occasions as such, not as IDs)? If the latter, it would
be nice to see some code.




Mats Karlsson, PhD
Professor of Pharmacometrics
Div. of Pharmacokinetics and Drug Therapy
Dept. of Pharmaceutical Biosciences
Faculty of Pharmacy
Uppsala University
Box 591
SE-751 24 Uppsala
phone +46 18 471 4105
fax   +46 18 471 4003

From: "Hutmacher, Matt"
Subject: RE: [NMusers] Outliers and the FDA guideline 
Date: Mon, August 23, 2004 8:31 am

Hello Mats,
We did treat each individual visit as a separate subject. The
variability patterns in the Tlag parameters seemed to be
intrasubject (and not intersubject), and since it was a
screening procedure, we did not pursue anything more complicated.