From: "HUTMACHER, MATTHEW [Non-Pharmacia/1825]" <matthew.hutmacher@pharmacia.com>

Subject: Slow Gradient Method.

Date: Thu, 24 May 2001 14:08:44 -0500

 

Hello,

 

I am trying to use the CENTERING option of the ESTIMATION statement for a mixture model. I get a statement at the end of the report file that says "CENTERED METHODS MUST USE SLOW GRADIENT METHOD WITH MIXTURE MODEL". Can someone tell me how to use this method and what it means/does?

 

Thanks for your help.

 

Matt

 

*****

 

From: "Niclas Jonsson" <Niclas.Jonsson@farmbio.uu.se>

Subject: Re: Slow Gradient Method.

Date: Thu, 24 May 2001 20:56:54 -0500 (CDT)

 

Matt,

 

You can use the slow method by giving the SLOW option of the $ESTIMATION record.

 

Best,

 

Niclas

--

Department of Pharmaceutical Biosciences

Division of Pharmacokinetics and Drug Therapy

Uppsala University

Box 591

SE-751 24 Uppsala

Sweden

Phone: +46 18 471 46 85

Fax: +46 18 471 40 03

Mobile: +46 70 485 61 98

E-mail: niclas.jonsson@farmbio.uu.se

 

*****

 

From: "Piotrovskij, Vladimir [JanBe]" <VPIOTROV@janbe.jnj.com>

Subject: RE: Slow Gradient Method.

Date: Mon, 28 May 2001 08:57:45 +0200

 

One more nondocumented NONMEM option? Can anybody tell something about it?

 

Best regards,

Vladimir

 

*****

 

From: Nick Holford <n.holford@auckland.ac.nz>

Subject: Re: Slow Gradient Method.

Date: Mon, 28 May 2001 19:30:29 +1200

 

> "Piotrovskij, Vladimir [JanBe]" wrote:

>

> One more nondocumented NONMEM option?

> Can anybody tell something about it?

 

What do you mean 'nondocumented'? Can't you read FORTRAN? :-)

 

      IF (FRSTPR.EQ.1) THEN
        ICALL=0
        CALL MIX (ICALL,NCALL,P)
        IMIX=0
        IF (ICALL.NE.9999) THEN
          IF (OPTWO.EQ.2.AND.OPNOGR.EQ.0) THEN
            WRITE (6,46)
            GO TO 9000
          ENDIF
          IMIX=1
        ENDIF
      ENDIF
      ...
   46 FORMAT ('0CENTERED METHODS MUST USE SLOW GRADIENT METHOD',
     1 ' WITH MIXTURE MODEL')
      ENDIF

 

If the variable OPNOGR.NE.0 then the SLOW GRADIENT method is used. Just check out what this does in the following files and you will learn exactly what the SLOW GRADIENT method does.

 

grep OPNOGR *.for

File ELS.FOR:
      IF (ICONTR.EQ.1.AND.OPETA1.EQ.1.AND.OPNOGR.EQ.0) THEN

File INITL.FOR:
      IF (OPTWO.EQ.2.AND.OPNOGR.EQ.0) THEN
      OPNOGR=1
      OPNOGR=1
      OPNOGR=1
      IF (MM.EQ.0) OPNOGR=1
      IF (OPNOGR.EQ.1) THEN

File INPT.FOR:
      IF (OPNOGR.LT.0.OR.OPNOGR.GT.1) GO TO 1110
      IF (OPNOGR.EQ.0.AND.OPLAPN.EQ.1) GO TO 1110
      IF (OPNOGR.EQ.1) WRITE (UNOUT,1123) NY(OPNOGR+1)
      OPNOGR=0

 

Good luck :-)

 

Nick

--

Nick Holford, Divn Pharmacology & Clinical Pharmacology

University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand

email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556

http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm

 

*****

 

From: "Bachman, William" <bachmanw@globomax.com>

Subject: RE: Slow Gradient Method.

Date: Wed, 30 May 2001 07:55:14 -0400

 

Vladimir,

 

You are correct that this idea is not documented.

 

There are essentially two ways NONMEM obtains gradients needed for performing the pseudo-Newton minimization. One involves only numerical derivatives, the other involves a combination of analytical and numerical derivatives. As you might imagine, the first is often slower than the second. It is therefore used less often. It is used when NUMERICAL is specified.

 

nmconsult@globomax.com

GloboMax LLC

7250 Parkway Drive, Suite 430

Hanover, MD 21076

Voice: (410) 782-2205

FAX: (410) 712-0737

 

*****

 

From: "Piotrovskij, Vladimir [JanBe]" <VPIOTROV@janbe.jnj.com>

Subject: RE: Slow Gradient Method.

Date: Wed, 30 May 2001 14:18:25 +0200

 

Thanks, Bill, for your explanation. Now I understand that SLOW can hardly have any advantages over the default algorithm. Meanwhile I did some exploration and found that using SLOW with METHOD=COND did not improve the convergence behaviour in complicated cases. I currently have one example where FOCE fails to converge whereas the FO method converges perfectly. With FOCE, NONMEM stops due to rounding errors, and before this it does a few iterations with a very high gradient (>10^5) and no changes in the parameters at all.

 

Best regards,

Vladimir

 

*****

 

From: "Stephen Duffull" <sduffull@pharmacy.uq.edu.au>

Subject: RE: Slow Gradient Method.

Date: Thu, 31 May 2001 08:30:15 +1000

 

Bill,

 

Based on the discussions I am a little unsure what the value of the slow gradient method is. I would have thought that analytical derivatives would be more accurate and perhaps more stable than numerical - and therefore I am not sure why a potentially slower and perhaps less reliable method is of interest to us? Could you explain where the numerical method might be valuable?

 

I presume for situations where the model can only be described as ODEs then there might be little choice - but otherwise I can't see the advantage.

 

Regards

 

Steve

=================

Stephen Duffull

School of Pharmacy

University of Queensland

Brisbane, QLD 4072

Australia

Ph +61 7 3365 8808

Fax +61 7 3365 1688

http://www.uq.edu.au/pharmacy/duffull.htm

 

*****

 

From: "Niclas Jonsson" <Niclas.Jonsson@farmbio.uu.se>

Subject: RE: Slow Gradient Method.

Date: Thu, 31 May 2001 09:31:36 +0200

 

I don't know if the SLOW method uses numerical derivatives or not but it is perhaps important to point out that the SLOW option on the $ESTIMATION is not the same as the NUMERICAL option. The NUMERICAL option requests that the second derivatives for the LAPLACE method are computed numerically, which, I presume, is quicker and sometimes more tractable than analytical second derivatives.

 

As I recall it, the SLOW option gives you the version of FOCE that was implemented in NONMEM IV. In one of the beta versions of NONMEM V there was an improvement to the FOCE algorithm that made it about three times faster (my own, hardly remembered, benchmarks). The new method could, however, not handle certain cases, i.e. CENTERing, mixture model and when the NUMERICAL option is used.

 

Niclas

 

*****

 

From: Erik Olofsen <E.Olofsen@lumc.nl>

Subject: RE: Slow Gradient Method.

Date: Thu, 31 May 2001 10:01:10 +0200 (CEST)

 

Dear Vladimir,

 

I came across the same phenomenon a while ago:

 

http://www.cognigencorp.com/nonmem/nm/99apr042001.html

 

Suddenly some components of the gradient vector get very large, and one or two iterations later the same might happen to other components; sometimes the problem even disappears after a few iterations. The magnitude of the largest values depends on the number of significant digits, and I get successful convergence and a successful covariance step with, e.g., SIG=3. That is why I got the feeling that it has something to do with the precision of the numerical derivatives of the prediction with respect to the thetas. In PRED, first and second analytical derivatives with respect to the etas need to be computed, and second analytical derivatives only when the NUMERICAL option is not used. How is the SLOW option different from the NUMERICAL option?
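
[Illustration, not part of the original message.] The suspicion above, that limited precision in a prediction can blow up a numerical derivative, can be reproduced in a toy setting. The sketch below is only an assumption-laden illustration in Python: it is not NONMEM's algorithm, and the helper names (round_sig, limited_precision_exp, forward_difference) are hypothetical. It takes exp(x) as a stand-in "prediction", reports it to three significant digits, and forms a one-sided finite difference.

```python
import math


def round_sig(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0.0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


def forward_difference(f, x, h):
    """One-sided finite-difference estimate of df/dx."""
    return (f(x + h) - f(x)) / h


def limited_precision_exp(digits):
    """exp(x) reported to only `digits` significant digits (toy 'prediction')."""
    return lambda x: round_sig(math.exp(x), digits)


h = 1e-6
coarse = limited_precision_exp(3)

# Full precision: the estimate is close to the true derivative exp(1).
full = forward_difference(math.exp, 1.0, h)

# Three digits, flat spot: both evaluations round to the same value.
flat = forward_difference(coarse, 1.0, h)

# Three digits, straddling a rounding boundary (2.72 -> 2.73): the
# one-unit jump in the last digit is divided by a tiny step.
x_edge = math.log(2.725) - h / 2
spike = forward_difference(coarse, x_edge, h)

print("full precision:", full)
print("3 digits, flat:", flat)
print("3 digits, edge:", spike)
```

In this toy case the same function yields a derivative estimate of zero at one point and one several orders of magnitude too large at a nearby point, purely because of rounding. Whether this mechanism explains the behaviour seen in NONMEM is not established in this thread.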

 

Best regards,

 

Erik Olofsen

 

*****

 

From: Nick Holford <n.holford@auckland.ac.nz>

Subject: Re: Slow Gradient Method.

Date: Thu, 31 May 2001 20:12:19 +1200

 

Niclas,

 

Thank you for your hardly remembered viewpoint :-) Your historical perspective of the evolution of NONMEM IV to NONMEM V is certainly of interest.

 

Given the pedantic nature of the NONMEM Project Group documentation it seems quite reasonable to extrapolate that the SLOW (undocumented) and the NUMERICAL (documented) options do not have identical meanings.

 

But I am not clear why you think a numerical derivative might be quicker than an analytical derivative. Typical numerical derivatives are (f(t+dt) - f(t))/dt while analytical derivatives are f'(t) so unless f'(t) involves at least twice as much computation as f(t) plus the notoriously computationally expensive division by dt it seems that an analytical derivative would usually be faster than a numerical derivative. There are cases I believe when no convenient analytical derivative exists and then of course one must use numerical derivatives. I found that the use of numerical derivatives in MKMODEL seemed to give reasonable results without having to resort to the labour of obtaining analytical derivatives so in terms of the end result I am not sure if there is any real world difference when numerical vs analytical derivatives are used for the purposes of parameter estimation (assuming the analytical derivative is conveniently available).
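
[Illustration, not part of the original message.] The cost argument above can be made concrete by counting function evaluations. The following Python sketch is a minimal example under stated assumptions: the function f(t) = exp(-0.5 t) and the CountingFunction wrapper are hypothetical and have nothing to do with NONMEM's internals. It compares the forward difference (f(t+dt) - f(t))/dt with the analytical derivative f'(t).

```python
import math


class CountingFunction:
    """Wrap a function and count how many times it is called."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.func(*args)


def f(t):
    """Hypothetical model function, chosen only for illustration."""
    return math.exp(-0.5 * t)


def f_prime(t):
    """Analytical derivative of f."""
    return -0.5 * math.exp(-0.5 * t)


def forward_difference(func, t, dt=1e-6):
    """Numerical derivative (f(t+dt) - f(t)) / dt."""
    return (func(t + dt) - func(t)) / dt


counted_f = CountingFunction(f)
counted_f_prime = CountingFunction(f_prime)

numeric = forward_difference(counted_f, 2.0)
analytic = counted_f_prime(2.0)

print("evaluations of f for the numerical derivative:", counted_f.calls)
print("evaluations of f' for the analytical derivative:", counted_f_prime.calls)
print("absolute difference:", abs(numeric - analytic))
```

The numerical derivative needs two evaluations of f plus a subtraction and a division, while the analytical one needs a single evaluation of f'. For a gradient over p parameters, a forward-difference scheme of this kind needs p + 1 evaluations of f.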

 

I have copied this message to those directly responsible (Stuart Beal and Alison Boeckmann) to see if they can throw some light on what distinguishes SLOW from NUMERICAL.

 

Nick

--

Nick Holford, Divn Pharmacology & Clinical Pharmacology

University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand

email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556

http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm

 

*****

 

From: "Niclas Jonsson" <Niclas.Jonsson@farmbio.uu.se>

Subject: Re: Slow Gradient Method.

Date: Thu, 31 May 2001 10:51:38 +0200

 

Nick,

 

I'm certain that your historical perspective by far exceeds mine;)

 

I'm convinced by your reasoning about the numerical derivatives, which leaves numerical tractability as the main benefit of the NUMERICAL option.

 

Niclas

 

*****

 

Date: Thu, 31 May 2001 09:51:15 -0700 (PDT)

From: stuart@c255.ucsf.edu

 

From Matt Hutmacher:

 

>I am trying to use the CENTERING option of the ESTIMATION statement for a

>mixture model. I get a statement at the end of the report file that says

>"CENTERED METHODS MUST USE SLOW GRADIENT METHOD WITH MIXTURE MODEL".

>Can someone tell me how to use this method and what it means/does?

 

As some may know, there are undocumented and unsupported features in NONMEM which are not intended for the general user and which should not interfere with the general use of the program. We make no apology for this.

 

There are other arcane features which will pop into view on rare occasions. Please feel quite free to contact the NONMEM User Support Group when this happens. It seems that Matt has stumbled on one of these occasions. The meaning of the "SLOW gradient method" is one which the NONMEM user can essentially ignore. However, Matt will need to respond to the message; he should simply include the option SLOW in the $ESTIMATION record. (Matt, please ask yourself once again as to why indeed you wish to use the CENTER option with a mixture model.)

 

Commenting on some of the NM-Users discussion which ensued from Matt's question, in the order it seems to have been generated:

 

From Bill Bachman:

 

>There are essentially two ways NONMEM obtains gradients needed for

>performing the pseudo-Newton minimization. One involves only numerical

>derivatives, the other involves a combination of analytical and numerical

>derivatives. As you might imagine, the first is often slower than the

>second. It is therefore used less often. It is used when NUMERICAL is

>specified.

 

This is correct. Note that here Bill is saying that there is a choice between numerical derivatives and analytic ones concerning the way gradients to the objective function surface are computed.

 

There are second derivatives with respect to eta which are a part of the Laplacian objective function itself. These can often be computed analytically. If the NUMERICAL option is included in the $ESTIMATION record, these second derivatives are computed numerically. Then, as Bill states, it so happens that the SLOW option is always also used. However, when NM-TRAN is used, this choice should be transparent to the user (there will be no message such as Matt experienced).
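
[Illustration, not part of the original message.] The difference between an analytical and a numerical second derivative with respect to eta can be sketched in Python as follows. This is a toy example only: the function g(eta), the constants Y, THETA, SIGMA2 and OMEGA2, and the central-difference step are all assumptions made for illustration, not NONMEM's actual Laplacian computation.

```python
import math

# Hypothetical toy values: one observation Y, a typical value THETA,
# residual variance SIGMA2 and between-subject variance OMEGA2.
Y, THETA, SIGMA2, OMEGA2 = 10.0, 8.0, 1.0, 0.09


def g(eta):
    """Toy individual objective: squared residual plus eta penalty."""
    pred = THETA * math.exp(eta)
    return (Y - pred) ** 2 / SIGMA2 + eta ** 2 / OMEGA2


def d2g_analytic(eta):
    """Second derivative of g with respect to eta, derived by hand."""
    pred = THETA * math.exp(eta)
    return 2.0 * pred * (2.0 * pred - Y) / SIGMA2 + 2.0 / OMEGA2


def d2g_numeric(eta, h=1e-4):
    """Central-difference estimate of the same second derivative."""
    return (g(eta + h) - 2.0 * g(eta) + g(eta - h)) / (h * h)


eta = 0.1
print("analytic:", d2g_analytic(eta))
print("numeric: ", d2g_numeric(eta))
```

For a smooth function like this one the two values agree closely. The numerical version needs three evaluations of g and no hand-derived formula, which is the trade-off discussed in this thread.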

 

The NUMERICAL option is documented. It is necessary to use this option in certain cases. NM-TRAN will provide messages that indicate that the option should be used when it is mistakenly omitted. Unless one is using the option in the cases where it is necessary to do so, or unless one is simply experimenting with this option, there is no need to use it.

 

From Stephen Duffull:

 

>Based on the discussions I am a little unsure what the value of the slow

>gradient method is. I would have thought that analytical derivatives would

>be more accurate and perhaps more stable than numerical - and therefore I am

>not sure why a potentially slower and perhaps less reliable method is of

>interest to us? Could you explain where the numerical method might be

>valuable?

 

Analytical derivatives can be more accurate, more stable, and faster to compute, as Stephen suggests. But, e.g. when NUMERICAL is used, and also in Matt's case, it just so happens that NONMEM is not using analytical derivatives to compute gradients of the objective function surface. This should be essentially of no concern to the user.

 

>I presume for situations where the model can only be described as ODEs then

>there might be little choice - but otherwise I can't see the advantage.

 

In fact, NONMEM is unaware when PREDPP is using DE's (differential equations), and NONMEM's choice as to whether or not to use analytical derivatives to compute gradients of the objective function surface is unaffected.

 

From Niclas Jonsson:

 

>I don't know if the SLOW method uses numerical derivatives or not but it

>is perhaps important to point out that the SLOW option on the $ESTIMATION

>is not the same as the NUMERICAL option. The NUMERICAL option requests

>that the second derivatives for the LAPLACE method are computed

>numerically, which, I presume, is quicker and sometimes more tractable

>than analytical second derivatives.

 

Here, Niclas emphasizes the same distinction I have tried to make above between the SLOW and NUMERICAL options. He suggests moreover that the use of the NUMERICAL option can sometimes result in quicker computations. Indeed this can happen, but the circumstances when this can happen are rare, and I think the user can fairly safely assume that where possible, NUMERICAL should be avoided.

 

>As I recall it, the SLOW option gives you the version of FOCE that was

>implemented in NONMEM IV. In one of the beta versions of NONMEM V there

>was an improvement to the FOCE algorithm that made it about three times

>faster (my own, hardly remembered, benchmarks). The new method could,

>however, not handle certain cases, i.e. CENTERing, mixture model and when

>the NUMERICAL option is used.

 

Indeed, with NONMEM IV, the only choice was to use the SLOW gradient method, and so no distinction was made. The newer and faster method may be used in most situations, including ones where the option CENTERING is used, except when there is also a mixture model (Matt's situation). The newer and faster method is the default method.

 

Stuart Beal