# Phase Retrieval V: Nonlinear Optimization

## Introduction

In
the last post I talked about focus diversity, the most common form of diversity for phase retrieval. There exist other forms of diversity, for example transverse translation diversity (using a subset of the pupil for each image), or wavelength diversity, but there are many others. Iterative transform phase retrieval *could* be used to implement at least the former, but the combobulation of the sub-apertures into a full aperture phase (and amplitude) estimate would be fairly complicated post processing, since there does not exist a natural method to enforce smoothness between the sub-apertures, or sameness in the overlap regions.

I mentioned in the previous post that multiple planes of focus diversity is slightly under-defined when using Misel’s algorithm to implement focus diversity. In this post, we will explore the logical extension of Conjugate-Gradient phase retrieval, into a nonlinear optimization problem, over the cost function $B$ from post 2.

Iterative transform phase retrieval algorithms, in general, require no “physicality,” since there is just a Fourier transform relationship between planes. There is no concept of wavelength, physical size, and so on. There is also no need to have a general modeling capability, since the operations of the algorithm are fully predefined. The method covered here is different, and marries a forward model, which will include many physical parameters, to a nonlinear optimizer that will adjust the parameters to make the model and measurement consistent.

Nonlinear optimization may be familiar from lens design or other fields. It is simply an algorithm that iterates a vector of parameters, to minimize some cost or objective function. The mechanism by which it determines which steps to make depends on the optimizer used, as well as the cost function. For details of those algorithms, consult the textbooks by either Nocedal or Bertsekas.

Most nonlinear optimization routines require access to the gradient, or the partial derivative of the cost function with respect to each parameter. In Fienup1982, he used what is now called algorithmic differentiation to figure out what the derivative of $B$ is with respect to each pixel of phase in the pupil plane. Before thinking about such a high dimensional optimization problem (Recall, 512x512 samples =262k parameters), we will think first about a low dimensional problem, where polynomial coefficients are the parameters. This is a low dimensional problem, having for example Noll Z4..Z11 = 7 parameters. One mechanism for evaluating the partial derivative of the cost with respect to each parameter is to simply perturb the parameters one at a time and evaluate the change. This is called finite differences. Many optimization packages do this for you if you do not provide the gradient.

If we leverage finite differences, all we must create to implement nonlinear optimization based phase retrieval is a forward model, and evaluation of $B$. That is, in order to use nonlinear optimization to do phase retrieval, all that is required is the capability to do forward modeling of the image chain, by propagating a field from the pupil plane to the image plane. The cost function evaluation is extremely trivial (only a line or two).

The polynomials describe the phase of the pupil, not the PSF, and are “mapped” to the PSF via a combination of linear and nonlinear operators, the final one - conversion from complex E field to intensity, $||^2$, being nonlinear.

I have not included this modeling capability in PRAISE; the user is free to use whichever tools they like to do the modeling. Prysm is not the only diffraction code out there, even if it is my favorite!

For this particular problem, the forward model for the optimizer simply looks like:

```
from scipy import optimize
D = psf0
cost_denom = np.sum(D**2)
history = []
def fwd_model(x):
# x is the colloquial parameter vector variable
# for nonlinear optimization
phs = polynomials.sum_of_2d_modes(basis, x)
wf = propagation.Wavefront.from_amp_and_phase(amp, phs, wvl, dx_p)
psf = wf.focus(efl, Q=2)
M = abs(psf.data)**2
return M
def cost(x):
M = fwd_model(x)
B = np.sum((D-M)**2) / cost_denom
return B
def nlopt_phase_retrieval():
return optimize.minimize(cost, x0 = np.zeros_like(coefs), method='L-BFGS-B', options={'eps': 1e-5, 'gtol': 1e-20, 'ftol': 1e-15, 'maxiter': 500})
```

Consult the previous posts to see the forward model that generates the data (`psf0`

), and other closure variables like basis and amp. Running this produces:

```
result = nlopt_phase_retrieval()
> coefs
array([3.36285288, 2.14547004, 0.0775794 , 2.29069011, 2.32366345,
3.08800282, 0.9280281 , 3.08185021, 4.77255725, 3.6927512 ,
0.56967056, 1.84092647])
> results.x
array([-3.37083169, -2.14063148, -0.08305467, 2.29121102, 2.32570854,
3.08787787, 0.9258548 , -3.0777322 , -4.76673893, -3.69359821,
-0.5718016 , -1.8468265 ])
```

Keep in mind that the output of this routine is Zernike polynomial coefficients in nanometers. You can see that several coefficients have the correct magnitude, but incorrect sign. This is the twin image problem in phase retrieval, in which multiple object phases produce the same Fourier magnitude. In this case, at a small level. If the scale factor on coefs is increased from 5 to 25 nm RMS, the algorithm converges with a final cost function of 1.8e-16 (~=double precision perfect) in 7 seconds. In comparison, iterative transform CG phase retrieval becomes entrapped at a cost function of about 2e-5, and required a higher order routine alternating between Misel’s version of phase diversity, and rounds of CG. The ultimate cost function value was about 2e-8, which is good agreement but not as good as the nonlinear optimizer found. So, to highlight:

for a large phase error, nonlinear optimization produces a significantly better solution, with lower error, more than ten times faster.

This is before we implement algorithmic differentiation to speed up the algorithm further. The small phase sign ambiguity can be resolved by adding diversity. To add diversity, we simply modify the forward model and cost function:

```
# ... snip
defocuses = [-3, 0, 3] # wvs zero to peak
psfs = []
for defocus in defocuses:
dz = thinlens.defocus_to_image_displacement(25, fno, wvl)
wf2 = psf.free_space(dz*1e3)
data = abs(wf2.data)**2
psfs.append(data)
# ... snip
```

This is not the most efficient forward model, but this particular tutorial is not especially geared for performance. The optimization routine is modified as:

```
D = psfs
cost_denoms = [np.sum(d**2) for d in D]
def fwd_model_focus_diverse(x):
# x is the colloquial parameter vector variable
# for nonlinear optimization
phs = polynomials.sum_of_2d_modes(basis, x)
wf = propagation.Wavefront.from_amp_and_phase(amp, phs, wvl, dx_p)
psf = wf.focus(efl, Q=2)
models = []
for defocus in defocuses:
dz = thinlens.defocus_to_image_displacement(25, fno, wvl)
wf2 = psf.free_space(dz*1e3)
data = abs(wf2.data)**2
models.append(data)
return models
def cost_focus_diverse(x):
models = fwd_model_focus_diverse(x)
B = 0
for d, m, denom in zip(D, models, cost_denoms):
B += np.sum((m-d)**2) / denom
B /= len(D)
return B
def nlopt_phase_retrieval():
return optimize.minimize(cost_focus_diverse, x0 = np.zeros_like(coefs), method='L-BFGS-B', options={'eps': 1e-5, 'gtol': 1e-20, 'ftol': 1e-15, 'maxiter': 500})
```

Running the optimizer produces:

```
took 22.931084237999983 seconds
fun: 3.445497884930999e-15
hess_inv: <12x12 LbfgsInvHessProduct with dtype=float64>
jac: array([-6.64462169e-13, -3.05493926e-12, 1.59373788e-11, 8.00309187e-12,
-3.53886450e-12, -6.51087867e-12, -2.04889752e-13, 3.73758672e-12,
2.64361840e-11, 2.18518401e-11, 8.61960446e-14, 1.95546966e-12])
message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 351
nit: 25
njev: 27
status: 0
success: True
x: array([3.3628582 , 2.14548413, 0.07760613, 2.29068534, 2.32365864,
3.08799708, 0.92802352, 3.08184338, 4.77254619, 3.69274328,
0.56966573, 1.84092193])
```

This is still much faster than CG iterative transform phase retrieval was able to find a solution, despite the inclusion of focus diversity. If we check the residual error:

```
> np.linalg.norm(coefs-result.x)
3.629006320925115e-05
```

Since this is in nanometers, 3.6e-5 is 36 femtometers.

## Conclusions

Nonlinear optimization based phase retrieval finds extraordinarily good solutions, and is easily adapted to include diversity or other parameters. It works significantly faster than iterative transform methods. Because the optimization parameters in this case were exactly matched to the modes of the data, this comparison is slightly unfair because no out-of-band “noise” can be optimized by the algorithm, in the form of higher order terms. Truncation of the basis set used will limit the algorithm’s ability to find higher frequency phase errors. This method is not restricted to finding a small number of modes, and can even solve elementwise phase solutions the same as iterative transform methods. To do so more or less requires the use of algorithmic differentiation to analytically compute the gradients, since a 262,000 parameter finite difference will take an extremely long time to compute.

## Next

In the next posts, we will explore algorithmic differentiation to further accelerate the nonlinear optimization algorithm, and show how to use it to find elementwise phase solutions.