From: Joern Loetsch j.loetsch@em.uni-frankfurt.de
Subject: [NMusers] covariate selection question
Date: Tue, 17 Jan 2006 13:45:24 +0100

Dear NONMEM users,

Adding a particular covariate to the basic model did not provide a significant decrease in -2LL. In the course of model building and covariate assignment, adding that particular covariate at a later step did significantly improve -2LL. Deleting that covariate from the final, full model does significantly increase the objective function. Based on -2LL alone, without regard to the change in IIV, should that covariate be kept in the final model or not?

Thank you in advance.

Regards
Jörn Lötsch
_______________________________________________________
Prof. Dr. med. Jörn Lötsch
pharmazentrum frankfurt/ZAFES
Institut für Klinische Pharmakologie
Johann Wolfgang Goethe-Universität
Theodor-Stern-Kai 7
D-60590 Frankfurt am Main
Tel.: 069-6301-4589
Fax: 069-6301-7636
http://www.klinik.uni-frankfurt.de/zpharm/klin/
_______________________________________________________

From: Michael.J.Fossler@gsk.com
Subject: RE: [NMusers] covariate selection question
Date: Tue, 17 Jan 2006 08:20:24 -0500

What you describe happens frequently. Two (or more) covariates may not have much influence by themselves, but together they influence the fit to a significant extent. However, I would not obsess over the -2LL as a sole criterion by which to judge the fit. Ask yourself the following questions: Does the inclusion of the covariate a) decrease the standard error of the relevant parameter, b) improve the fit (as judged by plots), and c) make biologic sense? I would urge you not to rely on the -2LL as the sole criterion - I have seen too many examples where inclusion of a covariate decreased the -2LL but had a negative impact on the fit.

Mike
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Michael J. Fossler, Pharm. D., Ph. D., F.C.P.
Director, Clinical Pharmacokinetics, Modeling & Simulation
GlaxoSmithKline
(610) 270-4797 FAX: (610) 270-5598 Cell: (443) 350-1194
Michael_J_Fossler@gsk.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
_______________________________________________________

From: "Jakob Ribbing" Jakob.Ribbing@farmbio.uu.se
Subject: RE: [NMusers] covariate selection question
Date: Tue, 17 Jan 2006 15:47:48 +0100

Dear Joern, Mike and others,

I would agree with Mike. To answer Joern's question on how to interpret the results of the stepwise selection: as far as the p-value/LRT can guide you in selecting the covariate model, you should keep this particular covariate in the model. Just be sure to use a p-value/likelihood ratio that is adjusted for the number of parameter-covariate relations that you have tested (or otherwise explored). To judge whether the covariate relation makes biological sense, it may be helpful to understand why the covariate was not significant at first but later became so. There could be a number of reasons for the covariate selection behaving this way:

1. Including a very influential covariate relation may make the picture clearer, and other, weaker relations appear out of the mist due to the reduced random noise. For example, including CRCL on CL for a drug eliminated mainly by renal filtration would reduce the (random) variability in CL so that less important covariate relations could be found.

2. One covariate relation could be masking another relation. If the first relation is included in the model, the other becomes statistically significant. This behaviour is due to correlation between covariates that both end up influencing the same structural-model parameter (or correlation of estimates between two structural-model parameters). An example of this could be a drug with higher CL for females (compared to males of the same size). This relation may be masked by males generally being larger than females (and size is often an important covariate).
Including the one covariate would make inclusion of the other statistically significant. Another example would be model misspecification: including a linear covariate relation (where another relation would have been more appropriate) could cause a second covariate to compensate for this; e.g., if WT instead of lean body weight is included, BMI may become statistically significant to compensate.

3. Random chance. If the LRT gave almost the same result when including the covariate in the basic and in the later model (e.g., the nominal p-value changed from 0.011 to 0.0099), this could be seen as just a random change. If the p-value required for inclusion were 0.01, the covariate is significant in the latter test but not in the first. This is a problem with all selection methods that either include a covariate fully (according to the maximum-likelihood estimate) or not at all. On the other hand, getting rid of all the "maybe" covariates may provide the best big picture of what is important. Further, using the LRT often translates into a p-value - whatever that will tell you... :>)

Jakob
_______________________________________________________

From: mark.e.sale@gsk.com
Subject: RE: [NMusers] covariate selection question
Date: Tue, 17 Jan 2006 10:15:58 -0500

Joern,
Thanks for the opportunity to once again rant on one of my favourite subjects: the limitations of step wise model building. This behaviour is well documented (see Wade JR, Beal SL, Sambol NC. Interaction between structural, statistical, and covariate models in population pharmacokinetic analysis. Journal of Pharmacokinetics & Biopharmaceutics 1994;22(2):165-77). First, as you imply, one should clearly not base the final model decision on -2LL alone. Does the covariate addition have any other good or bad effects (better plots, better PPC, smaller inter-individual variances)? Is it biologically plausible, or even almost certainly the case? But, on to your question.
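For reference, the chi-square arithmetic behind the delta -2LL cutoffs discussed in this thread can be sketched in a few lines of plain Python. This is illustrative only and not tied to any NONMEM output; for one added parameter (1 degree of freedom) the chi-square survival function reduces to a stdlib call:

```python
from math import erfc, sqrt

def lrt_pvalue_1df(delta_m2ll: float) -> float:
    """p-value for a drop in -2LL when one parameter is added.

    For 1 degree of freedom the chi-square survival function reduces
    to erfc(sqrt(x/2)), so no external stats library is needed.
    """
    return erfc(sqrt(delta_m2ll / 2.0))

# The familiar cutoffs: a delta -2LL of 3.84 corresponds to p = 0.05,
# and 6.63 to p = 0.01.
for delta in (1.0, 3.84, 6.63, 10.83):
    print(f"delta -2LL = {delta:5.2f}  ->  p = {lrt_pvalue_1df(delta):.4f}")
```

The adjustment Jakob mentions for the number of relations tested amounts to demanding a smaller p (a larger delta -2LL) before inclusion.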
Imagine, if you will, that you are trying to explain the area of a rectangle. One covariate is a (very) imprecise measure of the length, another is a somewhat less imprecise measure of the width. You put in the length covariate and find a small improvement in the ability to explain area - it is a very imprecise measure - or perhaps your structural model is wrong (rather than Area = theta(1)*cov_l x theta(2), you have Area = theta(1)*exp(cov_l) x theta(2), where cov_l is the covariate proportional to length). Next you try cov_w on theta(2) (Area = theta(1) x theta(2)*cov_w) - and this is better. Now, you go back and try cov_l as a predictor of theta(1) - and you find it is helpful; now you have the correct structural and covariate model (with cov_l a predictor of length and cov_w a predictor of width). It can be shown that this can easily occur (the Wade and Beal paper demonstrates it for structural, covariate and variance effects). Hence, my view is that step wise searches will only give you the correct answer if all effects are independent - which they never are in complex biological systems. Therefore, step wise searches will never give you the correct answer. So, the answer is: put the covariate in.

Mark Sale M.D.
Global Director, Research Modeling and Simulation
GlaxoSmithKline
919-483-1808 Mobile 919-522-6668
_______________________________________________________

From: "Joern Loetsch" j.loetsch@em.uni-frankfurt.de
Subject: RE: [NMusers] covariate selection question
Date: 17-Jan-2006 09:29

Mark, thank you very much indeed. Now, I am still a bit uncertain. One, as you suggest, I could leave the covariate in, which is supported statistically. Two, I could leave it out because it does not convincingly improve anything and, moreover, it is weight on F, which is not very helpful anyway. So I rather tend to leave it out.
Because I am going to publish the stuff, I wanted support to be at ease with the covariate selection that is going to be published, and to avoid any taste of randomness or of biased selections not supported by statistics in the publication. Thank you again.

Regards
Jörn
_______________________________________________________
Prof. Dr. med. Jörn Lötsch
pharmazentrum frankfurt/ZAFES
Institut für Klinische Pharmakologie
Johann Wolfgang Goethe-Universität
Theodor-Stern-Kai 7
D-60590 Frankfurt am Main
Tel.: 069-6301-4589
Fax: 069-6301-7636
http://www.klinik.uni-frankfurt.de/zpharm/klin/
_______________________________________________________

From: mark.e.sale@gsk.com
Subject: RE: [NMusers] covariate selection question
Date: Tue, 17 Jan 2006

Joern,
It becomes a judgement call (reason #2 to abandon step wise model selection - it isn't objective). Ideally, all of this should be specified prior to analysis - and you don't ask questions that you don't have some a priori reason to believe might be true - don't ask if clearance is a function of astrological sign. In a Bayesian framework, you might have a different criterion for hypotheses that you believe a priori (1 point for WT on volume, but 80 points for hair color on volume) - better statisticians than I could put this in a more rigorous framework (if your prior probability is 0.8, then delta OBJ = 1; if your prior is 0.000001, then delta OBJ = 100; if your prior is 0.0, don't ask the question). It is always better to define subjective criteria prior to the beginning of the analysis; making subjective decisions during an analysis really violates principles of data analysis (but we do it all the time).

Mark Sale M.D.
Global Director, Research Modeling and Simulation
GlaxoSmithKline
919-483-1808 Mobile 919-522-6668
_______________________________________________________

From: Paul Hutson prhutson@pharmacy.wisc.edu
Subject: RE: [NMusers] covariate selection question
Date: Tue, 17 Jan 2006 22:50:33 -0600

Mark raises an interesting point that may be appropriate for comments from the Wise Ones. I typically consider model building in terms of increasing the number of exponential decay terms, then adding the covariates and etas, trying to lower the objective function. Is it standard practice for the covariates to be prospectively identified as being collected for inclusion in the model testing, or are they tested in the model post hoc just because they are available from the exhaustive efficacy and safety testing seen with Phase I and II trials? Most fundamentally, should I declare the covariates to be tested before beginning a trial? But also: should a higher standard (a greater required reduction) in the objective function be imposed when doing post hoc data mining, because of the effect of repeated testing? I can see where the subsequent use of bootstrap testing tests the robustness of the contribution (value) of the covariate, but should the covariate even have been accepted in the first place? Part of my concern is the future (maybe current for some of you?) testing of genotypic mutations as covariates of clearance and/or response. Thanks in advance for your wisdom.

Paul
--
Paul R. Hutson, Pharm.D.
Associate Professor
UW School of Pharmacy
777 Highland Avenue
Madison WI 53705-2222
Tel 608.263.2496
Fax 608.265.5421
Pager 608.265.7000, p7856
_______________________________________________________

From: mark.e.sale@gsk.com
Subject: RE: [NMusers] covariate selection question
Date: Wed, 18 Jan 2006 08:17:39 -0500

Mats,
It isn't the step wise part of the traditional model building that is subjective.
Step wise can be completely objective - as in the automated step wise linear and logistic regression algorithms in many stats packages. The subjective part comes when someone is weighing a 3-point decrease in OBJ against maybe a slightly better time vs. WRES (or CWRES) plot, but no compelling biological basis, and now a failed covariance step (which Nick tells us isn't really important anyway - and I'm beginning to agree ....). If all these were specified (and quantifiable?) prior to the analysis, then it would be fine, but they tend to be done in real time during the analysis. So .... did the analyst at GSK (I mean Pfizer) decide that the race effect on K21 shouldn't be included because the plot really wasn't any better and it doesn't make any biological sense, or because it might cause problems in discussions about labelling? WRT your comment "I think we all agree that improved model building procedures are valuable, but maybe the part that least needs new methods is the covariate model; we need much more guidance on how to build good structural models": I agree. I suspect that the best opportunity for improvement in most models is in the structural model, not the covariate model.

Mark Sale M.D.
Global Director, Research Modeling and Simulation
GlaxoSmithKline
919-483-1808 Mobile 919-522-6668
_______________________________________________________

From: "Mats Karlsson" mats.karlsson@farmbio.uu.se
Subject: RE: [NMusers] covariate selection question
Date: 18-Jan-2006 04:21

Hi Mark,

Some loose thoughts. Stepwise doesn't equal subjective. Often the stepwise covariate modeling is the least subjective part of the entire stepwise procedure of building a population model. It is generally more clearly outlined in analysis plans than the stepwise building of the structural or stochastic parts of the model. We know that stepwise model selection has problems, but most of the criticism seems to be focusing on the covariate sub-model.
The reason for that may be that none of us would take the time and effort to try a structural or stochastic model that didn't make biological sense. However, for covariate model building we do try models that don't make biological sense to everyone. The reasons being: (i) it is easier to try too many relations than too few (given that opinions about "biological sense" vary), and/or (ii) it is perceived that regulatory authorities want to have information even about relations that don't make sense (e.g., to confirm expected non-interactions). I like your point about penalizing decisions based on prior belief. The point that "making subjective decisions during an analysis really violates principles of data analysis" is relevant for confirmatory analyses, but most of the time when we apply biologically rational models we are in learning mode, and not making subjective (or data-driven) model building decisions would make the analyses rather useless. The article by Wade et al. that you reference concerns mostly the fact that if you get the structural model wrong, other parts of the model can become wrong too (like the covariate model). One would expect that this works the other way around too: if you get your covariate model wrong, the structural model may end up wrong too. Similar interactions are likely to occur between other model parts as well. Regarding your last comment, "step wise searches will never give you the correct answer": (i) the alternative to stepwise searches is to postulate a model before looking at the data - generally not a good idea; (ii) no model building procedure will give us the correct answer... I think we all agree that improved model building procedures are valuable, but maybe the part that least needs new methods is the covariate model; we need much more guidance on how to build good structural models.

Best regards,
Mats
--
Mats Karlsson, PhD
Professor of Pharmacometrics
Div. of Pharmacokinetics and Drug Therapy
Dept.
of Pharmaceutical Biosciences
Faculty of Pharmacy
Uppsala University
Box 591
SE-751 24 Uppsala
Sweden
phone +46 18 471 4105
fax +46 18 471 4003
mats.karlsson@farmbio.uu.se
_______________________________________________________

From: "Gobburu, Jogarao V" GOBBURUJ@cder.fda.gov
Subject: RE: [NMusers] covariate selection question
Date: Wed, 18 Jan 2006 14:34:41 -0500

Dear Mats,

I (and others in our pharmacometrics team) could not help but notice your following remark: "(ii) it is perceived that regulatory authorities want to have information even about relations that don't make sense (e.g., to confirm expected non-interactions)." The following are my personal comments: In my experience, this perception is unfounded. But then perception is reality, as they say. The exposure-response guidance clearly encourages mechanism-based modeling. In fact, I am unaware of any label where the dosing is based on a prognostic factor that does not make biological sense (and was derived using mixed-effects modeling). Statistical inference can (only) provide supportive evidence for mechanism-based covariates. I presume you have had an experience with that type of issue, hence your statement. Unless the specifics of your experience are known, a meaningful discussion cannot occur. Now, there are cases where modeling found covariates that did not make biological sense - no party involved with the drug wanted dose adjustments based on that covariate. There are cases where the opposite (i.e., a mechanistic covariate not included, but then included after subsequent discussions) also occurred. On the other hand, there is empirical evidence in a few cases where strong prior beliefs do not hold - so your 'don't make sense' becomes subjective and depends on the prior experience one might have had. All decisions will have some risk (false negatives/positives); the only way I can think of to increase comfort in taking this 'risk' is by adhering to biology. The alternative is prohibitively costly.
Joga
_______________________________________________________

From: mark.e.sale@gsk.com
Subject: RE: [NMusers] covariate selection question
Date: Wed, 18 Jan 2006 15:45:40 -0500

Joga - the rant continues;

Thanks for your insight; the view that you relate is consistent with my personal experience with the FDA. But I think it is important to point out the risk associated with that view. Not that I disagree - I entirely agree - but the risk of this approach needs to be pointed out. The risk is a high degree of inertia in our understanding. If we only ask questions that are based on what we already believe, it will greatly impede progress. I certainly agree (as I believe you and Mats are saying) that "data dredging" can only yield hypotheses, not conclusions. But it is reasonable to ask the questions, even questions that seem silly based on our current understanding of biology. May I point out: 1. H. pylori and ulcers (a silly hypothesis that turned out to be true); 2. PVCs and sudden cardiac death (everyone knew that preventing PVCs would reduce sudden death; it turns out not to be true); 3. beta-carotene and vitamin E and cancer (lots of retrospectively controlled data, a good biological explanation - turned out not to be true). The list of hypotheses that were inconsistent with the current understanding of biology but turned out to be true is very long. A good Bayesian, I think, never accepts a hypothesis - only assigns a probability that it is true - while assigning some non-zero probability to many other hypotheses, even the silly ones. In this way, as data are accumulated, we could, in theory, eventually accept hypotheses that are currently viewed as silly but are in fact correct. Unfortunately, human beings have a remarkably limited ability to entertain multiple hypotheses - in fact, rarely can we really entertain more than one at a time (this has actually been researched - no one can entertain more than about 3 at once).
We have one hypothesis, and we decide whether it is true (invariably we decide that it is; otherwise we wouldn't have a grant to write). Only if that hypothesis turns out not to be true do we look for another. Importantly, we also have a remarkable ability to dismiss data that are inconsistent with our current view of the world - also documented (e.g., events over the past few years in certain countries in the Middle East). It is generally thought that Gregor Mendel discarded lots of data that were inconsistent with his hypothesis about genetics - his statistics were far too perfect to be random; every experiment sorted out nearly exactly as it should. The result of these two effects is a high degree of persistence of hypotheses/conclusions, regardless of whether they are correct. I don't have a solution: to build models without a basis in an understanding of biology is silly, and will without question lead to many wrong conclusions. But to not ask questions just because our current view of biology would reject them as silly is a problem as well. As usual, Bayesians have the answer - if only we were mentally capable of objectively entertaining 10 competing hypotheses at the same time. In the US at least, the NIH funding system insists on one hypothesis, forcing researchers to decide what they believe and then defend it to the death, rather than keeping an open mind.

Mark Sale M.D.
Global Director, Research Modeling and Simulation
GlaxoSmithKline
919-483-1808 Mobile 919-522-6668
_______________________________________________________

From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE: [NMusers] covariate selection question
Date: Thu, 19 Jan 2006 16:49:25 -0500

NMusers,

I think blaming the NONMEM OBJ and stepwise procedures when we don't like the result of a covariate model selection process is a bit misplaced. I think there are three major factors that contribute to a successful covariate model building strategy. To make my point I'll draw an analogy to building a house.
The quality of the house we build depends on 1) the materials, 2) the tools available to work with those materials, and 3) the proficiency of the builder in using those materials and tools. In covariate model building, the material we have to work with is our data; the tools available to us are NONMEM, stepwise procedures, diagnostic plots, etc.; and the builder is the modeler. A successful covariate model building strategy depends on how well the modeler understands the limitations of the data, and how effective they are in using the available tools given those limitations. Of course, there are times when our tools are inadequate for the task at hand; however, I think more often the issue is not fully appreciating the limitations of our data and not tailoring our model building strategies to those limitations. I know I'm treading on old ground, but in my opinion the diagnostic output from a successful COV step - which can help us understand the limitations of our data and guide our model building strategy - is under-appreciated. Here is my 2 cents on covariate model building and stepwise procedures. I apologize in advance for the long and rambling message, and for treading on old ground.

1) We generally perform systematic procedures for covariate model building to identify a parsimonious model with the fewest covariates that explain as much of the inter-individual variability as possible. We should not be viewing such procedures as providing an assessment of the statistical significance of each covariate parameter. If we want to assess the statistical significance of each and every covariate parameter that we might entertain in a systematic covariate selection procedure (e.g., stepwise procedures), we are better off doing this based on a "full model" (see Point 9c below).
2) Stepwise procedures can routinely find a parsimonious model; however, there is no guarantee that they will find the most parsimonious model or the most biologically plausible one. There may be several almost equally parsimonious models, of which a stepwise procedure may find one. Other parsimonious models not selected by a stepwise procedure may be more biologically plausible.

3) While stepwise procedures and the delta OBJ often cannot be used to find a biologically plausible parsimonious model among a search space of both plausible and non-plausible candidate models, this should not be considered an indictment of stepwise procedures or the delta OBJ. In my opinion it should be considered an indictment of the practice of casting too wide a net, searching numerous covariate parameters, many of which may have dubious biological relevance. While we try to be mechanistic and guided by biology/pharmacology in postulating structural models, when it comes to specifying covariate submodels we often resort to empiricism. We are willing to investigate numerous covariate effects on several parameter submodels in part because it is easy to use a systematic procedure and just turn the crank with little forethought about the covariate parameters we are evaluating. In so doing, we often cross our fingers and hope that the final model selected by a stepwise procedure is one that can be scientifically justified. When the selected model is not scientifically justifiable, it is easy but misguided to place the blame on the stepwise procedure. In this setting the modeler should ask themselves why they are investigating covariate effects that cannot be scientifically justified. Of course, as Mark has pointed out, we need to be cautious here and recognize that with model building we are generating hypotheses, and we must be open-minded to possible hypotheses that may run counter to our prior beliefs.
4) Better upfront planning and judicious selection of covariates and covariate-parameter effects can help steer a stepwise procedure to focus only on biologically plausible models. In specifying the covariate parameters to be evaluated by a stepwise procedure, the modeler should ask themselves upfront, "Am I prepared to accept any model within the search space as being scientifically justifiable?" If the answer to this question is "no", then the modeler should re-think the set of covariate parameters before undertaking the stepwise procedure.

5) There may be degrees of biological plausibility. For example, a gender or sex effect may be interpreted as a surrogate for body size rather than an intrinsic gender/sex effect. In this setting one may question whether gender/sex should be included in the investigation. To be more plausible as well as parsimonious in our search of covariate effects, the modeler may wish to choose, among several measures of body size (body weight, lean body weight, BMI, BSA, etc.), the one body size covariate that they feel is the most plausible and use that in the covariate search. Of course, the modeler can and should evaluate diagnostics (graphically) to ensure that any trends in the other body size covariates not included in the stepwise procedure can be explained by the one selected for evaluation in the stepwise procedure.

6) To avoid, or at least reduce, the problems associated with collinearity and selection bias, we should try to understand the limitations of our data to provide information on the covariate parameters that we wish to evaluate in a stepwise procedure. This is where I depart from others regarding the value of the COV step. I do agree that a successful COV step should not be used as a "stamp of approval", nor should models be down-weighted/penalized when the COV step fails.
However, when the COV step runs successfully, there is useful diagnostic information in the COV step output that can help steer us away from some of the pitfalls of stepwise procedures such as those encountered by Joern which initiated this email thread (see Points 7 and 8). 7) During base structural model development it is useful to inspect the COV step output to assess correlation in the parameter estimates before undertaking a stepwise procedure. If two structural parameter estimates are highly correlated the modeler may be faced with a difficult decision as to whether a particular covariate effect is more plausible on one structural parameter or the other as there may be insufficient information in the data to investigate the covariate on both structural parameters. For example, suppose concentration-response data has sufficient curvature to support fitting an Emax model but Emax and EC50 may not be precisely estimated. In this setting the correlation in the estimates of Emax and EC50 may be high. This could lead to potentially unstable covariate model investigations (leading to convergence problems) if we begin to evaluate the same covariate on both parameters. For example, suppose that we are interested in evaluating the effect of sex on both Emax and EC50. Inclusion of a sex effect simultaneously on both Emax and EC50 may exacerbate the instability of the model such that the model may not converge. Because of the correlation in these two structural parameter estimates there may be insufficient information in the data to distinguish whether the sex effect should be on the potency or efficacy or both. In this setting the modeler should question whether it is more plausible to investigate a sex effect on potency or efficacy recognizing the limitations of the data to evaluate it on both. 
If one is more plausible than the other, we should not rely on a stepwise procedure to select between the two, as it could by random chance select the one that is less plausible simply due to the collinearity in the parameter estimates.

8) Another place where I use the COV step output to help with covariate model building is in evaluating a "full model". By full model I mean the model in which all of the covariate parameters that one might evaluate in a stepwise procedure are included in the model simultaneously. If the COV step output from this full model suggests that it is stable (i.e., no extremely high correlations, or numerous moderately high correlations, that would result in an extremely high ratio of the largest to the smallest eigenvalue of the correlation matrix of the estimates... obtained from the PRINT=E option on $COV), then we have some diagnostic information to suggest that the data can support evaluation of all the covariate effects.

9) Evaluating a full model has intrinsic value regardless of whether or not the full model is used as part of a systematic covariate model building procedure. Some of the benefits of fitting a full model include:

a) The COV step output can be used to ensure that the data can support the evaluation of ALL the covariate effects of interest (see Point 8 above).

b) Among a class of hierarchical covariate models, the full model represents the best that we can do with respect to OBJ. That is, the delta OBJ between the base and full model is the largest. Thus, the full model can be used to help assess the degree of parsimony of a final model selected by a stepwise procedure. A parsimonious model is one that has an OBJ as close to the full model OBJ as possible but with as few covariate parameters as possible.
So, if we use a forward selection procedure in a situation like Joern's - where the combination of two covariate effects that produces a large drop in OBJ is never evaluated by the forward selection procedure, because the drop only occurs when both are included simultaneously - we may very well end up with a final model that is not very parsimonious in comparison to the full model. In this particular setting, it may be advantageous to perform a pure backward elimination procedure beginning with the full model, which, by definition, would include both of these covariate effects in the model at the start of the procedure.

c) If one is interested in assessing the statistical significance of ALL the covariate effects, bootstrapping the full model to construct confidence intervals and/or bootstrap p-values is less prone to statistical issues regarding the adequacy of the chi-square assumption for the likelihood ratio test, and to the multiplicity-of-testing problems that arise when a final model based on a covariate selection procedure is used to assess statistical significance; both issues can result in the inflation of type I errors. Moreover, the issue of ruling out a DDI effect can easily be incorporated by including it in the full model.

d) I'll make a shameless plug for the WAM procedure (see Kowalski & Hutmacher, JPP 2001;28:253-275), which makes use of the COV step output from a full model run to identify a subset of potentially parsimonious models that can then be fit in NONMEM. Unlike stepwise procedures, which can only select a single parsimonious model, the WAM procedure can give the modeler a sense of the competing models that may have comparable degrees of parsimony. For those interested, Pfizer in collaboration with Pharsight has developed a freeware version of the WAM software that can be downloaded from the NONMEM repository (ftp://ftp.globomaxnm.com/Public/nonmem).
10) The benefits of the COV step and full model evaluation are difficult to realize unless we are more judicious in our selection of covariates to be investigated. We need to change our practices to understand the limitations of our data when we perform covariate model building and to apply biological reasoning more effectively in developing our submodels. Ken _______________________________________________________ From: mark.e.sale@gsk.com Subject: RE: [NMusers] covariate selection question Date: Fri, 20 Jan 2006 11:47:04 -0500 Ken, I agree with (nearly) everything you said, especially the part about casting too wide a net. Where we disagree is: "Stepwise procedures can routinely find a parsimonious model, however, there is no guarantee that they will find the most parsimonious model nor the most biologically plausible model. There may be several almost equally parsimonious models of which a stepwise procedure may find one. Other parsimonious models not selected by a stepwise procedure may be more biologically plausible." You seem to imply (perhaps you don't mean to) that stepwise will typically find the most parsimonious model. Not only is there no guarantee of finding the most parsimonious model, you have no reason to expect that you will. In fact we have internal data showing that stepwise rarely, if ever, finds the best solution (in our, as yet unreported, results stepwise is about 0 for 20 in finding the optimal model - a record that makes the Michigan football team look good - Go Bucks). For stepwise to find the true optimal solution, the effects must be assumed independent. The index case in this discussion is strong evidence (and I think we all must believe intuitively) that covariate effects - and probably all effects - are highly dependent. Stepwise cannot be expected to give you the most parsimonious model in the presence of dependencies among the effects. 
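[Editor's note] The significance criterion underlying every procedure in this thread is the likelihood-ratio test: the drop in -2LL between nested models is referred to a chi-square distribution. A minimal sketch of that conversion for the common 1-degree-of-freedom case (one covariate parameter added), using the standard identity that the chi-square(1) survival function equals erfc(sqrt(x/2)):

```python
import math

def lrt_pvalue_1df(delta_obj):
    """P-value for a likelihood-ratio test with 1 degree of freedom:
    P(chi-square(1) > delta_obj) = erfc(sqrt(delta_obj / 2)),
    where delta_obj = OBJ(reduced model) - OBJ(full model) >= 0."""
    if delta_obj < 0:
        raise ValueError("delta OBJ (reduced minus full) must be >= 0")
    return math.erfc(math.sqrt(delta_obj / 2))

# The familiar cutoffs: drops of 3.84 and 10.83 in -2LL correspond
# (to three decimals) to p = 0.05 and p = 0.001 respectively.
print(round(lrt_pvalue_1df(3.84), 3))
print(round(lrt_pvalue_1df(10.83), 3))
```

Note this is exactly the "adequacy of the chi-square assumption" Ken flags: the conversion is only as trustworthy as the asymptotic chi-square approximation, which is one reason he prefers bootstrap p-values for the full model.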
Personal opinion: the only reason we use stepwise is because we haven't found a better way (with the exception of WAM, of course). Textbooks on combinatorial optimization will provide insight into better ways. Mark Sale M.D. Global Director, Research Modeling and Simulation GlaxoSmithKline 919-483-1808 Mobile 919-522-6668 _______________________________________________________ From: "Bill Bachman" bachmanw@comcast.net Subject: RE: [NMusers] covariate selection question Date: Fri, 20 Jan 2006 12:49:21 -0500 Mark, While I agree that stepwise covariate selection is neither the optimal nor the most statistically rigorous method of model building, your "0 for 20" results seem contrary to numerous simulation and analysis studies in which the stepwise results are in good, if not perfect, accord with the known simulation model. To me this suggests that, while not the ultimate method, it is not completely ludicrous and can be part of the modeler's toolbox. To totally dismiss it seems a bit drastic in my opinion. Bill _______________________________________________________ From: mark.e.sale@gsk.com Subject: RE: [NMusers] covariate selection question Date: Fri, 20 Jan 2006 12:57:19 -0500 Bill, I might suggest that simulation studies are based on a model in which there are no complex interactions between effects. Interestingly, even when we have done simulation studies, we sometimes find that the optimal model is not the one used for the simulation: due to random variation, a better model exists. The differences have only been in things like the form of the covariate relationship (exponential rather than linear) or, very commonly, the structure of the OMEGA matrix. I would be even more generous than to say that stepwise is not completely ludicrous - it actually isn't too bad. And I don't totally dismiss it - I'm using it at this very moment. 
But the only way to be absolutely positive of finding the optimal solution is an exhaustive search, which we have done a couple of times - it takes a lot of computer time. One doesn't, however, have to do an exhaustive search to prove that a better model exists - one only has to find a single better model. Mark Sale M.D. Global Director, Research Modeling and Simulation GlaxoSmithKline 919-483-1808 Mobile 919-522-6668 _______________________________________________________ From: "Kowalski, Ken" Ken.Kowalski@pfizer.com Subject: RE: [NMusers] covariate selection question Date: Fri, 20 Jan 2006 13:06:24 -0500 Mark, I certainly did not mean to imply that stepwise procedures can find the most parsimonious model...so I don't think we have any disagreement. Of course we should not equate the most parsimonious model with the true model. Even though we may have sufficient power to detect certain covariate effects, the power to select the correct model (the correct combination of covariates), assuming that it is within the search space of hierarchical models being investigated, is often very poor regardless of the covariate selection procedure (including the WAM). In my paper on the WAM I did a simulation study for a relatively small problem (80 subjects, 400 observations, 9 covariate effects) where the true model was contained within the 2**9=512 possible covariate models. While the power to detect individual covariate effects was reasonable (80-90%), the power to correctly identify the true model out of the 512 possible models was only 11.5%. Of course with larger datasets we may have greater power, but if we also substantially increase the search space (number of covariate parameters), this will reduce our power to identify the correct model. 
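[Editor's note] Ken's 2**9 = 512 count comes from enumerating every subset of the nine candidate covariate effects. The enumeration itself is trivial; the cost of the exhaustive search Mark describes is the 512 separate NONMEM runs it implies. A sketch of generating that search space (the covariate names are made up for illustration):

```python
from itertools import combinations

# Nine hypothetical candidate covariate effects
covariates = ["WT", "AGE", "SEX", "CRCL", "ALB", "RACE", "SMOK", "HT", "BMI"]

def all_submodels(covs):
    """Yield every subset of the candidate covariate effects,
    i.e. the 2**n hierarchical models of an exhaustive search."""
    for k in range(len(covs) + 1):
        yield from combinations(covs, k)

models = list(all_submodels(covariates))
print(len(models))  # 512
```

Doubling the candidate list to 18 effects would multiply the space to 262,144 models, which is why the search space, and hence the power to identify the correct model, degrades so quickly as covariates are added indiscriminately.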
Ken _______________________________________________________ From: Leonid Gibiansky leonidg@metrumrg.com Subject: RE: [NMusers] covariate selection question Date: Fri, 20 Jan 2006 13:59:57 -0500 Just to pick on the word: "Interestingly, even when we have done simulation studies, we sometimes find that the optimal model is not the one used for the simulation, due to random variation, a better model exists." ------ You cannot find a better model than the true one (the one that was used for the simulations). If you were not able to get it, this means either that the study design (population selection/sample size, in the case of the covariate model) cannot support the model, or that the design has no power to separate very similar models. Leonid _______________________________________________________ From: "A.J. Rossini" blindglobe@gmail.com Subject: RE: [NMusers] covariate selection question Date: Fri, 20 Jan 2006 21:13:32 +0100 This isn't quite true; it's quite context-laden. To clarify (and I'm nitpicking here): any finite dataset (or datasets!) could reasonably have been generated by a number of models, not necessarily the one you used. The larger the individual dataset (and the more independent datasets taken), the better the chance that you actually rediscover the model that you originally used for generation. Of course, in a sense you are cheating, since you have a good clue about how to restrict the space of potential models in order to "rediscover" it. While we like to simulate, we have to remember that just as the same model can generate many realized datasets, the same dataset can originate from a number of models, and this has implications. And back to the original point: stepwise procedures are notoriously awful, failing to preserve type I error in the final model, i.e. they don't lead to sensible decisions based on the model unless you are lucky. 
Regularization methods of variable selection (where you slowly increase the amount that covariates contribute and look at the selection paths) seem to do reasonably well for automatic variable selection in linear and generalized linear (categorical data) regression, and I thought I'd seen a recent paper on this for nonlinear regression, but not yet for mixed effects. I'm not sure how you'd balance fixed and random effects in this case. _______________________________________________________ From: Mats Karlsson mats.karlsson@farmbio.uu.se Subject: RE: [NMusers] covariate selection question Date: 24 January 2006 12:27 PM Mark and all, I believe we mainly build models because of their predictive ability. In relation to that, it is hard to see any model other than the one the data were simulated from as the "optimal model". Mark, what definition of "optimal model" do you use? The discussion on covariate modeling procedures has gone on for years. We know that all procedures have theoretical deficiencies. However, are the properties of these methods so different that the differences in their predictive performance are clinically relevant? When we in Uppsala have applied different covariate methods in parallel on real data and then evaluated the relative predictive performance of the final models on a separate set of real data, we have found only marginal differences between the model building procedures. Does anyone have experience with clinically relevant differences in predictive performance between covariate model building procedures for real data? We also need to consider that the model building procedure itself is only one approximation in covariate model building. Many others are usually ignored. 
For example: (i) many covariates are measured with error, but this is ignored in the analysis of the data; (ii) the time-course of covariates is usually imputed using unrealistic assumptions; (iii) many time-varying covariates are assumed to be time-constant; (iv) the shape of the covariate-parameter relation is often assessed, if at all, using simplistic methods; (v) it is usually assumed that there is no inter-individual variability in covariate relationships; (vi) a change in a covariate within a subject is assumed to induce the same parameter change as the same covariate difference between subjects; (vii) interactions between covariates, that is, a parameter-covariate relation depending on the value of another covariate, are usually ignored; (viii) missing covariate data are regularly imputed with simplistic procedures; (ix) ... Best regards, Mats -- Mats Karlsson, PhD Professor of Pharmacometrics Div. of Pharmacokinetics and Drug Therapy Dept. of Pharmaceutical Biosciences Faculty of Pharmacy Uppsala University Box 591 SE-751 24 Uppsala Sweden phone +46 18 471 4105 fax +46 18 471 4003 mats.karlsson@farmbio.uu.se _______________________________________________________
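[Editor's note] As a footnote to Rossini's remark about regularization paths: a toy sketch of the idea. In the special case of an orthonormal design in ordinary linear regression, the lasso solution is simply the OLS estimate "soft-thresholded" toward zero, so increasing the penalty traces a path on which the weakest effects drop out first. This is textbook linear-regression theory, not a mixed-effects method, and the coefficient values below are invented:

```python
def soft_threshold(beta, lam):
    """Lasso solution for one coefficient under an orthonormal design:
    shrink the OLS estimate toward zero by lam, clipping at zero."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0

# Hypothetical OLS estimates for three covariate effects
ols = {"CRCL": 0.80, "WT": 0.25, "SEX": -0.10}

# As the penalty lam grows, weaker effects leave the model first,
# giving the "selection path" Rossini describes:
for lam in (0.0, 0.15, 0.5):
    path = {c: round(soft_threshold(b, lam), 2) for c, b in ols.items()}
    print(lam, path)
```

At lam = 0.15 only SEX has been zeroed out; at lam = 0.5 only CRCL survives. The unsolved question Rossini raises - how to penalize fixed-effect covariate coefficients and random-effect variances on a common scale in a mixed-effects model - is untouched by this sketch.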