From: David Foster david.foster@adelaide.edu.au
Subject: [NMusers] LLR test, AIC, BIC
Date: 10/21/2003 8:06 PM

Hello all,

I'm not so new to PK and POPPK (but I am a recent NONMEM user)... hopefully this question isn't too silly. I have read a thread posted some time ago on this topic, but it raised a few questions for me.

AIC = obj. fun + 2*NPAR
BIC = obj. fun + NPAR*log(NOBS)

As I understand it, log(NOBS) is the natural log, and NOBS is the number of data points - is this the number of concentration-time points or the number of subjects?

To examine the impact of including a covariable (no eta, just a new theta), I understand the use of the LLR test just fine. But what about, for example, going from a 1-compartment to a 2-compartment model - there are new thetas and matching etas. So my question is: does the "number of parameters" include thetas, etas and sigmas? This also has an impact on AIC and BIC calculations, as different results are obtained if one only counts thetas...

In addition, is it very possible that a drop in OBJ that gives a significant "yes" to a 2-compartment oral over a 1-compartment oral model in the LLR test (counting thetas and etas = 9.5; thetas only = 6 units drop) is often likely to be supported by the AIC, but not by the BIC - a much larger drop is needed, especially if the number of observations is large? This is to be expected, but how do people rationalize this in their model selection? The use of diagnostic plots and other statistics (MPE, RMSPE etc.) is important here, no doubt.

I have also seen a correction to the AIC:

AICc = obj. fun + 2*NPAR + (2*NPAR*(NPAR+1))/(NOBS-NPAR-1)

Does anyone use this, and does it also apply to the BIC? I do understand that one must look at a range of other factors/plots/statistics (that I am well aware of), but LLR and AIC/BIC may be useful, and I would appreciate the group's input.
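Spelled out in code, the three criteria above are (illustrative Python only, not NONMEM output; the OFV, NPAR and NOBS values are made up):

```python
import math

def aic(ofv, npar):
    # AIC = obj. fun + 2*NPAR
    return ofv + 2 * npar

def bic(ofv, npar, nobs):
    # BIC = obj. fun + NPAR*log(NOBS), natural log
    return ofv + npar * math.log(nobs)

def aicc(ofv, npar, nobs):
    # small-sample corrected AIC
    return aic(ofv, npar) + (2 * npar * (npar + 1)) / (nobs - npar - 1)

# Made-up example: OFV = 1234.5, 9 estimated parameters, 300 observations
print(aic(1234.5, 9))                  # 1252.5
print(round(bic(1234.5, 9, 300), 1))   # 1285.8
```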
Regards,
David (that's the "Other David Foster", Nick)

--
David Foster, PhD
NHMRC Research Officer
Department of Clinical and Experimental Pharmacology
Faculty of Health Sciences
The University of Adelaide
Adelaide, South Australia 5005
Tel: +61 08 8303 5985 Fax: +61 08 8224 0685
Email: david.foster@adelaide.edu.au
http://www.adelaide.edu.au/Pharm/index.htm
_______________________________________________________
From: Nick Holford n.holford@auckland.ac.nz
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/21/2003 8:31 PM

David (the other),

My advice is not to waste your time with AIC, LLR etc. if you are using NONMEM. If you want to know the true null distribution for an objective function change then you should be prepared to estimate it using the randomization test.

In your example this means fitting the original data with a one-compartment model and a two-compartment model and recording the delta OBJorg. Then use the one-compartment parameter estimates to simulate, say, 1000 data sets (the randomization part). Fit each of these data sets to a one-compartment model and a two-compartment model. Look at the distribution of the 1000 delta OBJ values to find the probability that you would have observed delta OBJorg under the null hypothesis. This is an estimate of the true P value for falsely rejecting the null (the test part).

Whether the time spent doing the randomization test is a better waste of time than worrying about AIC, LLR etc. is up to you.

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________
From: Paul Hutson prhutson@pharmacy.wisc.edu
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/21/2003 9:36 PM

Nick:

You always make it sound easy.
Do you do these 1- and 2-compartment fits, then 1000 simulations followed by fits to 1- and 2-compartment models, followed by statistical tests, all in one batch program? Even if not, are there sites in the NONMEM archive that have sample control streams, for those such as me who struggle with these iterations, to admire and emulate?

Paul
_______________________________________________________
From: Leonid Gibiansky lgibiansky@emmes.com
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/22/2003 10:06 AM

Nick,

This randomization method might be the most appropriate for the problem, but even with the most advanced hardware/software combination you will not be able to apply this procedure to each model comparison step of the modeling process (the number of these steps can easily be in the dozens for the base model, and in the hundreds for the covariate model, with each step running anywhere from a few minutes to a few hours). Therefore, there should be a way (you may call it a quick and dirty way) to decide whether to accept the model or make it more complicated. Then the question is not whether to use some quick criteria based on the objective function or use the randomization test, but rather which of the criteria (LLR, AIC, BIC) to use and how to compute them correctly. Correctly here means "with the highest probability that the crude criteria will correctly approximate the true distribution". It would be interesting to compare the crude approach with the randomization approach to extract some recommendations on when and how to use the crude approach.

Leonid
_______________________________________________________
From: "Hutmacher, Matt" mhutmach@amgen.com
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/22/2003 3:12 PM

Hello all,

I certainly agree that what Nick has proposed below is one way to proceed. However, I would like to comment on his suggestion. The algorithm that Nick proposes below, in my mind, is not technically a randomization test.
In my experience, the randomization test randomly permutes the "actual data" to establish the distribution of the null hypothesis test. What Nick is doing is "simulating data" to assess what the LRT would look like if the data were generated by the 1-compartment model that he describes. Since the data are simulated from a model that was based on a fit to the observed data, I would argue that what is being simulated is not the true null distribution but an approximation to it. Thus the cut-off value that you select is a prediction. While I believe that this is one way to proceed, the method is not as absolute (in my mind) as Nick suggests. The assessment of a type 1 error of 5% is a prediction, not a truth.

Perhaps a simpler way to proceed here is to look at a few diagnostics. Check the model's condition number to see that the model is not over-parameterized. Check to make sure the theta estimates don't indicate the 2-compartment model is wanting to collapse into a 1-compartment model. Look at the WRES and IWRES plots versus time to make sure the 1-compartment model is adequately characterizing the peak concentrations and the tail (no time trends). If you are still unsure, simulate from the 2-compartment and 1-compartment models and see which can better reproduce the data (a small posterior predictive check).

Matt
_______________________________________________________
From: Nick Holford n.holford@auckland.ac.nz
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/22/2003 5:21 PM

Matt,

Thanks for your comments. I am aware of the 'true' randomization test method that you refer to (e.g. see http://wfn.sourceforge.net/wfnrt.htm). I agree that a method based on randomization of the actual data is what Fisher originally proposed, e.g. the 'Fisher exact test'. However, I do not know of a way to perform this kind of randomization to investigate the performance of the LLR to distinguish a one- vs two-compartment model.
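In outline, that kind of 'true' randomization test for a covariate effect might be sketched as follows (illustrative only; `fit_reduced` and `fit_full` are hypothetical stand-ins for NONMEM estimation runs returning objective function values):

```python
import random

def randomization_test(covariate, fit_reduced, fit_full, n_perm=1000):
    """Permute the actual covariate values to build the null distribution
    of delta OBJ from the observed data themselves."""
    observed_delta = fit_reduced(covariate) - fit_full(covariate)
    perm_deltas = []
    cov = list(covariate)
    for _ in range(n_perm):
        random.shuffle(cov)  # break any real covariate-parameter association
        perm_deltas.append(fit_reduced(cov) - fit_full(cov))
    # P value: how often a permuted delta is at least as large as observed
    p = sum(d >= observed_delta for d in perm_deltas) / n_perm
    return observed_delta, p
```

Note there is no obvious analogue of "shuffling the covariate" for a one- vs two-compartment comparison, which is the difficulty raised above.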
The parametric bootstrap method I described to create an empirical null distribution has been called by one group of authors the 'simulation hypothesis test' (Gisleskog PO, Karlsson MO, Beal SL. Use of Prior Information to Stabilize a Population Data Analysis. Journal of Pharmacokinetics & Biopharmaceutics 2003;29(5/6):473-505). A randomization test, based strictly on the original data, is used to estimate the probability of rejecting the null for *that specific set of data*. It is not a generalized result. The 'simulation hypothesis test' more closely resembles the asymptotic flavour of conventional statistical testing because it is obtained by considering a large sample from the proposed distribution of typical data generated from the null model.

I prefer not to use the term 'simulation hypothesis test' because it focuses attention on hypothesis testing. The procedure can be seen in a broader context. It is an algorithm for generating the null distribution of a (test) statistic. This null distribution has several uses other than strictly doing a hypothesis test with some arbitrary alpha criterion: it can be used to estimate the true probability of the data arising under the null, it can be used to create a table of lookup values for doing hypothesis testing, and it can be used to teach and learn about the shape of distributions that are widely assumed to have certain shapes (but these assumptions may be wrong). The NONMEM community has been exposed over the last couple of years to the problems of assuming the chi-square distribution for the null distribution of the LLR (especially with FO but also with FOCE). If you need to get involved in making important modelling decisions using hypothesis testing with the LLR then I would encourage you to verify by experiment what null distribution is required for your decision.
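As a sketch, the algorithm for generating such an empirical null distribution is (illustrative Python; `simulate_from_null`, `fit_null` and `fit_alt` are hypothetical stand-ins for NONMEM simulation and estimation runs):

```python
def empirical_null(simulate_from_null, fit_null, fit_alt, n_reps=1000):
    """Null distribution of delta OBJ when the data truly come from
    the null (e.g. one-compartment) model."""
    deltas = []
    for _ in range(n_reps):
        data = simulate_from_null()  # one data set simulated under the null
        deltas.append(fit_null(data) - fit_alt(data))  # delta OBJ, this replicate
    return sorted(deltas)

def critical_value(deltas, alpha=0.05):
    # empirical (1 - alpha) quantile: the delta OBJ needed to reject the null
    return deltas[int((1 - alpha) * len(deltas)) - 1]
```

The sorted deltas serve any of the uses above: an empirical P value, a lookup table of critical values, or simply a picture of the distribution's actual shape.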
The other diagnostics you mention are of course valuable, and I would typically rely more on a visual examination of the time course of observed and predicted concentrations to make a decision on an individual data set. However, some tasks, e.g. using clinical trial simulation to examine the power of designs, require an automatable, objective decision criterion. I have been using the randomization test to get better critical values for rejecting the null when doing clinical trial simulation. This has had a major impact on the estimates of power - critical values for LLR changes are often much larger than expected, even using FOCE.

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________
From: Nick Holford n.holford@auckland.ac.nz
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/22/2003 7:39 PM

Leonid,

The simulation form of the randomization test can be made as crude as you wish to meet your own correctness criterion, "with the highest probability that crude criteria will correctly approximate the true distribution". The crudity (or precision) is determined simply by the number of replications. I am not advocating that the RT be used at every stage of model building. A 10-point change in LLR for one parameter using FOCE, along with other diagnostic info, is a practical approach while learning about a drug. But when the drug development modelling is meant to support confirming rather than learning, then attention should be paid to the assumptions made when doing hypothesis testing.

BTW, I cannot agree with your generalization "even with the most advanced hardware/software combination you will not be able to apply this procedure ...".
As other recent posts to nmusers have indicated, it is quite *possible* to apply the procedure to many problems, but I would not consider it an effective use of time and resources for model building.

Nick

--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________
From: David Foster [mailto:david.foster@adelaide.edu.au]
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: Thursday, October 23, 2003 1:44 AM

Dear all,

I seem to have stimulated a bit of discussion! However, a few of my original points have been left out of the ensuing discussion. I understand the limitations of LLR, AIC etc., but I still would like to be able to calculate them, given that it doesn't really take much effort (well, sort of...). Here are my questions. I have read a thread posted some time ago on this topic, but it raised a few questions for me. If:

AIC = obj. fun + 2*NPAR
BIC = obj. fun + NPAR*log(NOBS)

As I understand it, log(NOBS) is the natural log, and NOBS is the number of data points - is this the number of concentration-time points or the number of subjects?

To examine the impact of including a covariable (no eta, just a new theta), I understand the use of the LLR test etc. just fine. But what about, for example, going from a 1-compartment to a 2-compartment model - there are new thetas and matching etas. ***So my question is: does the "number of parameters (NPAR)" include thetas, etas and sigmas? This also has an impact on AIC and BIC calculations, as different results are obtained if one only counts thetas...***

I have also seen a correction to the AIC:

AICc = obj. fun + 2*NPAR + (2*NPAR*(NPAR+1))/(NOBS-NPAR-1)

Does anyone use this, and does it also apply to the BIC?
Regards,
David

--
David Foster, PhD
NHMRC Research Officer
Department of Clinical and Experimental Pharmacology
Faculty of Health Sciences
The University of Adelaide
Adelaide, South Australia 5005
Tel: +61 08 8303 5985 Fax: +61 08 8224 0685
Email: david.foster@adelaide.edu.au
http://www.adelaide.edu.au/Pharm/index.htm
CRICOS Provider Number 00123M
_______________________________________________________
From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/23/2003 8:34 AM

David,

To specifically answer your questions: NPAR is the total number of parameters estimated in the likelihood, including all the parameters in Theta, Omega and Sigma. NOBS is the total number of observations, NOT the total number of subjects. See Vonesh & Chinchilli, Linear and Nonlinear Models for the Analysis of Repeated Measures, Marcel Dekker, 1997, p. 262.

Ken
_______________________________________________________
From: "Bonate, Peter" pbonate@ilexonc.com
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/23/2003 8:43 AM

David,

I don't know if you ever got anybody to answer you, but the AIC was originally predicated on independent observations, as in linear regression. I am not sure it was ever really validated for repeated measures data, but people use it all the time. NOBS is the total number of observations. NPAR is the number of estimable parameters: all thetas, etas, and sigmas, including covariances. BIC tends to pick the simpler model more often than AIC or AICc. The AICc was intended for small sample sizes and I don't believe it really applies to pop PK models: NOBS is usually much larger than NPAR, so the second-order correction term is practically zero. An excellent book on this is Model Selection and Inference by Burnham and Anderson. This is a must-read.

Hope this helps,
pete bonate
_______________________________________________________
From: Robert L.
James rjames@rhoworld.com
Subject: Re: [NMusers] LLR test, AIC, BIC
Date: 10/23/2003 9:00 AM

David,

I regularly use BIC (along with graphical examination, convergence of standard errors, and physiological intuition) as a criterion in model development. I prefer BIC because it is more parsimonious than AIC; I always try to err on the side of a simpler model. For NPAR I use all parameters estimated by the model (thetas, etas, sigmas). For NOBS I use the number of concentration-time points.

Robert
_______________________________________________________
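A numerical footnote to the points made in this thread about BIC parsimony and the AICc correction (illustrative Python; the NPAR/NOBS values are made up, not from any real analysis):

```python
import math

# Extra OFV drop demanded per added parameter: AIC charges a flat 2 points,
# BIC charges ln(NOBS) points, so BIC is the stricter criterion whenever
# NOBS exceeds e^2 (about 7.4 observations).
for nobs in (50, 500, 5000):
    print(nobs, 2, round(math.log(nobs), 1))  # e.g. 500 obs: 6.2 vs 2

# Pete's point about AICc: with NOBS >> NPAR the correction term is tiny.
npar, nobs = 10, 500
correction = (2 * npar * (npar + 1)) / (nobs - npar - 1)
print(round(correction, 2))  # 0.45
```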