From: "HUTMACHER, MATTHEW [Non-Pharmacia/1825]" <matthew.hutmacher@pharmacia.com>

Subject: Slow Gradient Method.

Date: Thu, 24 May 2001 14:08:44 -0500

I am trying to use the CENTERING option of the ESTIMATION statement for a mixture model. I get a statement at the end of the report file that says "CENTERED METHODS MUST USE SLOW GRADIENT METHOD WITH MIXTURE MODEL". Can someone tell me how to use this method and what it means/does?

From: "Niclas Jonsson" <Niclas.Jonsson@farmbio.uu.se>

Subject: Re: Slow Gradient Method.

Date: Thu, 24 May 2001 20:56:54 -0500 (CDT)

You can use the slow method by giving the SLOW option of the $ESTIMATION record.

Department of Pharmaceutical Biosciences

Division of Pharmacokinetics and Drug Therapy

E-mail: niclas.jonsson@farmbio.uu.se

From: "Piotrovskij, Vladimir [JanBe]" <VPIOTROV@janbe.jnj.com>

Subject: RE: Slow Gradient Method.

Date: Mon, 28 May 2001 08:57:45 +0200

One more nondocumented NONMEM option? Can anybody tell something about it?

From: Nick Holford <n.holford@auckland.ac.nz>

Subject: Re: Slow Gradient Method.

Date: Mon, 28 May 2001 19:30:29 +1200

> "Piotrovskij, Vladimir [JanBe]" wrote:

> One more nondocumented NONMEM option?

> Can anybody tell something about it?

What do you mean 'nondocumented'? Can't you read FORTRAN? :-)

IF (OPTWO.EQ.2.AND.OPNOGR.EQ.0) THEN

46 FORMAT ('0CENTERED METHODS MUST USE SLOW GRADIENT METHOD',

If the variable OPNOGR.NE.0 then the SLOW GRADIENT method is used. Just check out what this does in the following files and you will learn exactly what the SLOW GRADIENT method does.

IF (ICONTR.EQ.1.AND.OPETA1.EQ.1.AND.OPNOGR.EQ.0) THEN

IF (OPTWO.EQ.2.AND.OPNOGR.EQ.0) THEN

IF (OPNOGR.LT.0.OR.OPNOGR.GT.1) GO TO 1110

IF (OPNOGR.EQ.0.AND.OPLAPN.EQ.1) GO TO 1110

IF (OPNOGR.EQ.1) WRITE (UNOUT,1123) NY(OPNOGR+1)

Nick Holford, Divn Pharmacology & Clinical Pharmacology

University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand

email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556

http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm

From: "Bachman, William" <bachmanw@globomax.com>

Subject: RE: Slow Gradient Method.

Date: Wed, 30 May 2001 07:55:14 -0400

You are correct that this idea is not documented.

There are essentially two ways NONMEM obtains gradients needed for performing the pseudo-Newton minimization. One involves only numerical derivatives, the other involves a combination of analytical and numerical derivatives. As you might imagine, the first is often slower than the second. It is therefore used less often. It is used when NUMERICAL is specified.

From: "Piotrovskij, Vladimir [JanBe]" <VPIOTROV@janbe.jnj.com>

Subject: RE: Slow Gradient Method.

Date: Wed, 30 May 2001 14:18:25 +0200

Thanks Bill for your explanation. Now I understand that SLOW can hardly have any advantages over the default algorithm. Meanwhile I performed some exploration and found that using SLOW with METHOD=COND did not improve the convergence behaviour in complicated cases. I have currently one example where FOCE fails to converge whereas FO method converges perfectly. With FOCE NONMEM stops due to rounding errors, and before this it does a few iterations with very high gradient (>10^5) and no changes in parameters at all.

From: "Stephen Duffull" <sduffull@pharmacy.uq.edu.au>

Subject: RE: Slow Gradient Method.

Date: Thu, 31 May 2001 08:30:15 +1000

Based on the discussions I am a little unsure what the value of the slow gradient method is. I would have thought that analytical derivatives would be more accurate and perhaps more stable than numerical - and therefore I am not sure why a potentially slower and perhaps less reliable method is of interest to us? Could you explain where the numerical method might be valuable?

I presume for situations where the model can only be described as ODEs then there might be little choice - but otherwise I can't see the advantage.

http://www.uq.edu.au/pharmacy/duffull.htm

From: "Niclas Jonsson" <Niclas.Jonsson@farmbio.uu.se>

Subject: RE: Slow Gradient Method.

Date: Thu, 31 May 2001 09:31:36 +0200

I don't know if the SLOW method uses numerical derivatives or not but it is perhaps important to point out that the SLOW option on the $ESTIMATION is not the same as the NUMERICAL option. The NUMERICAL option requests that the second derivatives for the LAPLACE method are computed numerically, which, I presume, is quicker and sometimes more tractable than analytical second derivatives.

As I recall it, the SLOW option gives you the version of FOCE that was implemented in NONMEM IV. In one of the beta versions of NONMEM V there was an improvement to the FOCE algorithm that made it about three times faster (my own, hardly remembered, benchmarks). The new method could, however, not handle certain cases, i.e. CENTERing, mixture model and when the NUMERICAL option is used.

From: Erik Olofsen <E.Olofsen@lumc.nl>

Subject: RE: Slow Gradient Method.

Date: Thu, 31 May 2001 10:01:10 +0200 (CEST)

I came across the same phenomenon a while ago:

http://www.cognigencorp.com/nonmem/nm/99apr042001.html

Suddenly some components of the gradient vector get very large, and one or two iterations later the same might happen to other components; sometimes the problem even disappears after a few iterations. The magnitude of the largest values depends on the number of significant digits, and I have had successful convergence and a successful covariance step with e.g. SIG=3. That is why I got the feeling that it has something to do with the precision of numerical derivatives of the prediction with respect to the thetas. In PRED, first and second analytical derivatives with respect to the etas need to be computed, and second analytical derivatives only when the NUMERICAL option is not used. How is the SLOW option different from the NUMERICAL option?
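The general sensitivity of a finite-difference derivative to its step size can be seen in a small Python sketch. This is not NONMEM's algorithm, and whether it explains the behaviour described above is only a guess; the test function (sin), the evaluation point, and the three step sizes are assumptions chosen purely for illustration.

```python
import math

def forward_diff(f, x, h):
    """Forward-difference approximation of f'(x) with step h."""
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.cos(x)  # analytical derivative of sin at x

# Absolute error of the approximation for a large, a moderate,
# and a very small step (all three values are illustrative choices).
errors = {
    h: abs(forward_diff(math.sin, x, h) - exact)
    for h in (1e-2, 1e-8, 1e-14)
}

for h, err in errors.items():
    print(f"h={h:g}  abs error={err:.3e}")
```

With a large step the truncation error dominates; with a very small step the subtraction of nearly equal numbers loses precision, so the error grows again. The moderate step gives the smallest error of the three.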

From: Nick Holford <n.holford@auckland.ac.nz>

Subject: Re: Slow Gradient Method.

Date: Thu, 31 May 2001 20:12:19 +1200

Thank you for your hardly remembered viewpoint :-) Your historical perspective of the evolution of NONMEM IV to NONMEM V is certainly of interest.

Given the pedantic nature of the NONMEM Project Group documentation it seems quite reasonable to extrapolate that the SLOW (undocumented) and the NUMERICAL (documented) options do not have identical meanings.

But I am not clear why you think a numerical derivative might be quicker than an analytical derivative. Typical numerical derivatives are (f(t+dt) - f(t))/dt while analytical derivatives are f'(t), so unless f'(t) involves at least twice as much computation as f(t), plus the notoriously computationally expensive division by dt, it seems that an analytical derivative would usually be faster than a numerical derivative. There are cases, I believe, when no convenient analytical derivative exists, and then of course one must use numerical derivatives. I found that the use of numerical derivatives in MKMODEL seemed to give reasonable results without having to resort to the labour of obtaining analytical derivatives. So in terms of the end result, I am not sure if there is any real world difference when numerical vs analytical derivatives are used for the purposes of parameter estimation (assuming the analytical derivative is conveniently available).
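The cost argument can be made concrete with a minimal Python sketch that counts function evaluations. The model function f (a mono-exponential decay), the step dt, and the call counter are all assumptions for illustration and say nothing about how NONMEM or MKMODEL work internally.

```python
import math

# Counts how often each routine is evaluated.
calls = {"f": 0, "df": 0}

def f(t):
    """Illustrative model function (assumed): mono-exponential decay."""
    calls["f"] += 1
    return math.exp(-0.5 * t)

def df_analytic(t):
    """Analytical derivative f'(t), written out by hand."""
    calls["df"] += 1
    return -0.5 * math.exp(-0.5 * t)

def df_forward(t, dt=1e-6):
    """Forward difference (f(t+dt) - f(t)) / dt: two calls to f plus a division."""
    return (f(t + dt) - f(t)) / dt

t = 2.0
numeric = df_forward(t)
analytic = df_analytic(t)

print("numeric :", numeric)
print("analytic:", analytic)
print("evaluations:", calls)
```

The forward difference needs two evaluations of f for each derivative, while the analytical form needs one evaluation of f'. The numerical route is only cheaper when f' costs more than about twice as much as f.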

I have copied this message to those directly responsible (Stuart Beal and Alison Boeckmann) to see if they can throw some light on what distinguishes SLOW from NUMERICAL.

Nick Holford, Divn Pharmacology & Clinical Pharmacology

University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand

email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556

http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm

From: "Niclas Jonsson" <Niclas.Jonsson@farmbio.uu.se>

Subject: Re: Slow Gradient Method.

Date: Thu, 31 May 2001 10:51:38 +0200

I'm certain that your historical perspective by far exceeds mine;)

I'm convinced about your reasoning about the numerical derivatives, which leaves numerical tractability as the main benefit of the NUMERICAL option.

Date: Thu, 31 May 2001 09:51:15 -0700 (PDT)

>I am trying to use the CENTERING option of the ESTIMATION statement for a

>mixture model. I get a statement at the end of the report file that says

>"CENTERED METHODS MUST USE SLOW GRADIENT METHOD WITH MIXTURE MODEL".

>Can someone tell me how to use this method and what it means/does?

As some may know, there are undocumented and unsupported features in NONMEM which are not intended for the general user and which should not interfere with the general use of the program. We make no apology for this.

There are other arcane features which will pop into view on rare occasions. Please feel quite free to contact the NONMEM User Support Group when this happens. It seems that Matt has stumbled on one of these occasions. The meaning of the "SLOW gradient method" is one which the NONMEM user can essentially ignore. However, Matt will need to respond to the message; he should simply include the option SLOW in the $ESTIMATION record. (Matt, please ask yourself once again as to why indeed you wish to use the CENTER option with a mixture model.)

Commenting on some of the NM-Users discussion which ensued from Matt's question, in the order it seems to have been generated:

>There are essentially two ways NONMEM obtains gradients needed for

>performing the pseudo-Newton minimization. One involves only numerical

>derivatives, the other involves a combination of analytical and numerical

>derivatives. As you might imagine, the first is often slower than the

>second. It is therefore used less often. It is used when NUMERICAL is

>specified.

This is correct. Note that here Bill is saying that there is a choice between numerical derivatives and analytic ones concerning the way gradients to the objective function surface are computed.

There are second derivatives with respect to eta which are a part of the Laplacian objective function itself. These can often be computed analytically. If the NUMERICAL option is included in the $ESTIMATION record, these second derivatives are computed numerically. Then, as Bill states, it so happens that the SLOW option is always also used. However, when NM-TRAN is used, this choice should be transparent to the user (there will be no message such as the one Matt experienced).
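For readers unfamiliar with what computing a second derivative numerically involves, here is a minimal Python sketch using a central second difference. The stand-in function g of a single eta, the evaluation point, and the step h are assumptions for illustration only; how NONMEM actually forms these derivatives under NUMERICAL is not described here.

```python
import math

def second_diff(g, x, h=1e-4):
    """Central second difference: (g(x+h) - 2*g(x) + g(x-h)) / h**2."""
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)

def g(eta):
    """Stand-in function of one eta (assumed, not a NONMEM quantity)."""
    return math.exp(0.3 * eta)

eta = 0.5
approx = second_diff(g, eta)
exact = 0.09 * math.exp(0.3 * eta)  # analytical second derivative of g

print("numerical :", approx)
print("analytical:", exact)
```

The numerical version needs only evaluations of g itself, which is why it remains available when an analytical second derivative is awkward to supply.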

The NUMERICAL option is documented. It is necessary to use this option in certain cases. NM-TRAN will provide messages that indicate that the option should be used when it is mistakenly omitted. Unless one is using the option in the cases where it is necessary to do so, or unless one is simply experimenting with this option, there is no need to use it.

>Based on the discussions I am a little unsure what the value of the slow

>gradient method is. I would have thought that analytical derivatives would

>be more accurate and perhaps more stable than numerical - and therefore I am

>not sure why a potentially slower and perhaps less reliable method is of

>interest to us? Could you explain where the numerical method might be

>valuable?

Analytical derivatives can be more accurate, more stable, and faster to compute, as Stephen suggests. But, e.g. when NUMERICAL is used, and also in Matt's case, it just so happens that NONMEM is not using analytical derivatives to compute gradients of the objective function surface. This should be essentially of no concern to the user.

>I presume for situations where the model can only be described as ODEs then

>there might be little choice - but otherwise I can't see the advantage.

In fact, NONMEM is unaware when PREDPP is using DE's (differential equations), and NONMEM's choice as to whether or not to use analytical derivatives to compute gradients of the objective function surface is unaffected.

>I don't know if the SLOW method uses numerical derivatives or not but it

>is perhaps important to point out that the SLOW option on the $ESTIMATION

>is not the same as the NUMERICAL option. The NUMERICAL option requests

>that the second derivatives for the LAPLACE method are computed

>numerically, which, I presume, is quicker and sometimes more tractable

>than analytical second derivatives.

Here, Niclas emphasizes the same distinction I have tried to make above between the SLOW and NUMERICAL options. He suggests moreover that the use of the NUMERICAL option can sometimes result in quicker computations. Indeed this can happen, but the circumstances when this can happen are rare, and I think the user can fairly safely assume that where possible, NUMERICAL should be avoided.

>As I recall it, the SLOW option gives you the version of FOCE that was

>implemented in NONMEM IV. In one of the beta versions of NONMEM V there

>was an improvement to the FOCE algorithm that made it about three times

>faster (my own, hardly remembered, benchmarks). The new method could,

>however, not handle certain cases, i.e. CENTERing, mixture model and when

>the NUMERICAL option is used.

Indeed, with NONMEM IV, the only choice was to use the SLOW gradient method, and so no distinction was made. The newer and faster method may be used in most situations, including ones where the option CENTERING is used, except when there is also a mixture model (Matt's situation). The newer and faster method is the default method.