From: VPIOTROV@PRDBE.jnj.com
Subject: [NMusers] FW: [S] The effect of default values on statistical results
Date: Wed, 19 Jun 2002 08:58:14 +0200

Dear NONMEM users, 

FYI I am forwarding a few mails I got through the S+ user forum. It is primarily about GLM and GAM,
but my understanding is that the topic is relevant for us, too.

Best regards, 
Vladimir 
----------------------------------------------------------------- 

Vladimir Piotrovsky, Ph.D. 
Global Clinical Pharmacokinetics and Clinical Pharmacology 
Johnson & Johnson Pharmaceutical Research & Development 
B-2340 Beerse 
Belgium 

=======================================================
I got an interesting call from a reporter at the New York Times yesterday, who alerted me to some
research by Francesca Dominici and colleagues at Johns Hopkins 
(see http://biosun01.biostat.jhsph.edu/~fdominic/research.html, although the actual
paper's link is no longer available).  They have discovered that when the effect to be estimated with a GAM model is
very small, the default convergence setting in many statistical packages' GAM routines (including S-PLUS's) can
lead to biased estimates.  This resulted in a downward revision of the estimates in an air pollution study when the
data were re-analysed using a stricter convergence criterion.
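The failure mode described above comes down to a general numerical fact: an iteration can take very small steps while still being far from its limit, so a stopping rule based on step size alone can halt early. A minimal sketch in Python (a toy fixed-point iteration, not GAM backfitting; all numbers are invented for illustration):

```python
# Toy fixed-point iteration x <- g(x) with g(x) = 0.99*x + 0.01*target,
# whose fixed point is x* = 2.0.  Because the contraction rate is 0.99,
# successive changes shrink long before the iterate is close to x*,
# so a loose step-size tolerance stops far from the answer.
def solve(tol, x=0.0, target=2.0, rate=0.99):
    while True:
        x_new = rate * x + (1.0 - rate) * target
        if abs(x_new - x) < tol:
            return x_new
        x = x_new

loose = solve(tol=1e-3)  # stops roughly 0.1 below the true value 2.0
tight = solve(tol=1e-9)  # essentially exact
```

With a contraction rate of 0.99 the remaining error is about 100 times the last step, so a step-size tolerance of 1e-3 leaves an error near 0.1; this is the same mechanism by which a loose default in a slowly converging fitting loop can bias a small estimate.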

You can read the full story in today's New York Times, or on their website (registration required) at 
http://www.nytimes.com/2002/06/05/science/05PART.html 

The reporter would like to do a follow-up story focusing on other statistical studies that may have had to revise
results after relying on defaults in statistical software.  If you have any similar tales or cautionary notes and
would like to send them on, I'll pass them to the reporter.

BTW, I'll add a warning to gam()'s help page about this issue.  I'd also welcome any discussion
about whether you think the default convergence criterion in gam() should be reduced in general.

# David 

-- 
David M Smith  
Product Manager, Insightful Corp, Seattle WA 
Tel: +1 (206) 802 2360 
Fax: +1 (206) 283 6310 

====================================================================================== 
Thanks to everyone who responded to my query about yesterday's New York Times article: Tom Filloon, Jim Pratt, Peter England,
Brian Ripley, Rich Calaway and Bert Gunter.  I summarize the responses below.

Thanks also to Trevor Hastie for his contribution, and we'll update S-PLUS in the next release to tighten the default 
convergence criteria for GAM and GLM as he suggests.  Thanks too to Francesca Dominici, the author of the paper
cited by the Times article, for filling me in on the background.  She tells me that she has also heard from the R and
Stata developers, who are looking into this as well, and she suggests that SAS should do the same.

Since I didn't get explicit permission to post names (except in one case), and given the media interest,
I am posting these summaries without the traditional attribution.

----- 

My opinion, for what it is worth, is that many software packages have 
weak convergence criteria, since they (obviously) want to appear fast. 
Statisticians, as a matter of course, should check that their procedures 
have converged adequately.  Whenever I do anything important using 
GAMs/GLMs, I adopt stricter convergence criteria than standard, and have 
been doing so for years.  Unfortunately, I do not have examples where 
using the standard convergence criteria would have altered a conclusion. 

From memory, I think Stata continues iterating until the parameter 
estimates (individually) change by less than a certain tolerance, whereas 
most packages rely on an overall goodness-of-fit measure such as the 
deviance. 
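The difference between the two kinds of stopping rule can be sketched in Python (a toy gradient-descent fit, not Stata's or S-PLUS's actual algorithms; the objective and all numbers are invented for illustration):

```python
# Toy comparison of two stopping rules on the same gradient-descent fit
# of the flat-bottomed objective f(x) = (x - 3)**4 (minimum at x = 3).
# Near the minimum the objective barely changes even though the
# parameter is still drifting, so the "objective" rule quits earlier.
def fit(stop_on, tol=1e-8, x=0.0, lr=0.01):
    f = lambda v: (v - 3.0) ** 4
    while True:
        x_new = x - lr * 4.0 * (x - 3.0) ** 3  # gradient step
        if stop_on == "objective" and abs(f(x_new) - f(x)) < tol:
            return x_new
        if stop_on == "parameter" and abs(x_new - x) < tol:
            return x_new
        x = x_new

by_objective = fit("objective")  # stops with x still about 0.06 from 3
by_parameter = fit("parameter")  # ends up roughly ten times closer
```

With the same nominal tolerance, the goodness-of-fit rule stops an order of magnitude farther from the minimum, because a fourth-power objective is nearly flat there while the parameter is still moving.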

----- 

I've always thought that one should check the robustness of results using 
different convergence criteria.  If the default criteria are set too 
small, the performance of the routine is degraded unnecessarily in most 
situations; but when the estimates or effect sizes are small, the user 
needs the ability to tighten the convergence criteria.  This is a general 
cautionary note for any iterative routine, not just GAM.  I do not see 
this as a criticism of any particular software package (unless the current 
default convergence criteria were not given considerable thought), but as 
a general caution that applies to any iterative routine in any software 
package. 
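One cheap way to apply this advice is to re-run the fit with a much tighter tolerance and compare the estimates. A Python sketch (the `toy_fit` routine and all thresholds below are invented for illustration):

```python
# Robustness check: re-run an iterative fit with the convergence
# tolerance tightened by a factor of 100 and flag the result if the
# estimate moves by more than a practical threshold.
def check_convergence(fit, default_tol=1e-4, factor=100.0, threshold=1e-3):
    loose_est = fit(default_tol)
    tight_est = fit(default_tol / factor)
    gap = abs(loose_est - tight_est)
    return gap, gap > threshold

def toy_fit(tol, x=0.0):
    # Slowly converging iteration toward 1.0 (contraction rate 0.99).
    while True:
        x_new = 0.99 * x + 0.01
        if abs(x_new - x) < tol:
            return x_new
        x = x_new

gap, suspicious = check_convergence(toy_fit)  # suspicious is True here
```

If the estimate is insensitive to the tolerance, the default was adequate for that problem; if it moves materially, the loose fit should not be trusted.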

I have no example to provide where default settings biased results. 

----- 

[Should default convergence criteria in gam be stricter?] 

My answer is yes, and more importantly in glm too. 

----- 

[Finally, this from Bert Gunter summarizes things well:] 

Translation 1: 
Data analysis is a tricky business -- a trickier business than even tricky 
data analysts sometimes think. 

Translation 2: 
There's no free lunch even when lunch is free. 

-- 
David M Smith  
Product Manager, Insightful Corp, Seattle WA 
Tel: +1 (206) 802 2360 
Fax: +1 (206) 283 6310 

======================================================================================= 
To follow up on my previous summary, I got one further reply on 
this issue from Bruce McCullough, which I include in its entirety 
below.  He also provides some relevant references to his papers on reliability of statistical software. 

He replied: 

The idea that nonlinear results are dependent upon 
default options is nothing new.  I made this point in 
my review of S-PLUS, SAS and SPSS.  I also made 
the point in other reviews, where I showed that 
reliance on default values is not a good idea.  Of 
course, I am hardly the first person to do this. 

Anyhow, this strikes me as a user problem, not a software 
problem.  Tightening up the tolerances may be 
somewhat useful, but there is no one set of criteria 
that works for all problems.  Hence, I think warnings 
should be attached to the documentation: the user 
should vary the options, switch algorithms, check the 
gradient, etc. -- that is, do all the things one usually does 
to ensure that one has found a local extremum, and to make 
sure that the solver has not simply stopped at a convenient 
point that is not an extremum. 
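The gradient check mentioned above can be done with finite differences even when the fitting routine does not report a gradient. A small Python illustration (the objective and the candidate points are invented):

```python
# Finite-difference check that a reported "solution" is a stationary
# point: at a genuine local extremum the gradient should be near zero.
def numerical_gradient(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)

f = lambda v: (v - 2.0) ** 2       # true minimum at x = 2.0
at_solution = numerical_gradient(f, 2.0)    # ~0: plausible extremum
stopped_early = numerical_gradient(f, 1.9)  # ~-0.2: quit too soon
```

A clearly nonzero gradient at the reported optimum is direct evidence that the solver stopped at a convenient point rather than an extremum.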

Better yet: supply no defaults for nonlinear, so that 
the user must choose all the options!   :) 

Bruce 

"Assessing the Reliability of Statistical Software: Part I," 
The American Statistician 52(4), 358-366, 1998 
  
"Assessing the Reliability of Statistical Software: Part II," 
The American Statistician 53(2), 149-159, 1999 
  
"The Numerical Reliability of Econometric Software" 
(with H.D. Vinod), 
Journal of Economic Literature 37(2), 633-665, 1999 
  
"Econometric Software Reliability: E-Views, LIMDEP, 
SHAZAM, and TSP," 
Journal of Applied Econometrics 14(2), 191-202, 1999 

B. D. McCullough, Associate Professor 
Department of Decision Sciences, Drexel University 
Philadelphia, PA  19104-2875 
w: 215-895-2134   f: 215-895-2907            
bdmccullough@drexel.edu    www.pages.drexel.edu/~bdm25 

-- 
David M Smith  
Product Manager, Insightful Corp, Seattle WA 
Tel: +1 (206) 802 2360 
Fax: +1 (206) 283 6310