Variational Source Conditions and Conditional Stability Estimates for Inverse Problems in PDEs



INTRODUCTION
Inverse problems arise naturally in science and industry: given some observation one wants to find some parameter of interest which is not directly observable. Such problems are common in fields as diverse as medical imaging, geoscientific exploration, non-destructive testing and finance. Very often these problems can be formulated in terms of partial differential equations (PDEs), where the observation is given by (a part of) the solution and the unknown can be an initial condition, a coefficient or even the domain of the equation.
Mathematically these different problems can be put into the following general framework: given the parameter of interest $f$, the observation $g$ is determined by an operator equation
$$F(f) = g,$$
where $F$ is the mapping that assigns to each parameter the corresponding data. Hence, naively, if we observe some phenomenon $g$ and know the mapping $F$ we simply have to apply $F^{-1}$ to find the parameter $f$. There are, however, problems with this approach, even if $F$ is invertible. Typically we will not be able to measure the right hand side $g$ exactly but will measure only $g^{\mathrm{obs}} \approx g$. Here "≈" includes several effects like discretization, round-off and modeling errors as well as unavoidable measurement errors. Consider for a moment the case that $f, g \in \mathbb{R}^d$ and $F$ is given by an invertible linear map $A \in \mathbb{R}^{d \times d}$. Then a standard result from numerical analysis states that
$$\frac{\|f - f^{\mathrm{obs}}\|}{\|f\|} \leq \operatorname{cond}(A)\, \frac{\|g - g^{\mathrm{obs}}\|}{\|g\|}, \qquad f^{\mathrm{obs}} := A^{-1} g^{\mathrm{obs}},$$
where $\operatorname{cond}(A) := \|A\|\,\|A^{-1}\|$ is the condition number of the matrix $A$ with respect to the norm $\|\cdot\|$. As the condition number may be arbitrarily large on the set of invertible matrices, this illustrates that even a small perturbation of the measured data might lead to large deviations in the reconstructed parameter.
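The amplification by $\operatorname{cond}(A)$ is easy to reproduce numerically. The following numpy sketch (an artificial $2\times 2$ example, chosen only for illustration) shows a relative data error of $10^{-3}$ producing a relative parameter error several thousand times larger, while still respecting the condition number bound:

```python
import numpy as np

# Ill-conditioned 2x2 system: a small data perturbation is amplified
# by up to cond(A) in the reconstructed parameter.
A = np.array([[1.0, 0.0],
              [0.0, 1e-4]])
cond_A = np.linalg.cond(A)                    # sigma_max / sigma_min = 1e4

f_true = np.array([1.0, 1.0])
g = A @ f_true

# Perturb the data by a relative error of 1e-3 in the "weak" direction.
g_obs = g + np.array([0.0, 1e-3 * np.linalg.norm(g)])

f_rec = np.linalg.solve(A, g_obs)             # naive reconstruction A^{-1} g_obs
rel_data_err = np.linalg.norm(g_obs - g) / np.linalg.norm(g)
rel_param_err = np.linalg.norm(f_rec - f_true) / np.linalg.norm(f_true)
```

Here the tiny data error is blown up by a factor of several thousand, in agreement with the bound above.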
We will, however, usually model inverse problems in infinite dimensional spaces. A typical situation here is that $F$ is a linear compact operator with dense range. In this case the inverse mapping $F^{-1}$ still exists on a dense subset, but it is no longer continuous, i.e. we cannot hope to bound the reconstruction error in any way. In other words, we have to consider any naive reconstruction as useless.
In order to obtain stable reconstructions from corrupted data one therefore does not try to solve the equation $F(f) = g^{\mathrm{obs}}$ exactly, but in an approximate way that takes the error of the right hand side into account. Such methods that approximate $F^{-1}$ by stable operators are called regularization methods. In this thesis we will focus on Tikhonov regularization of the form
$$f_\alpha \in \operatorname*{arg\,min}_f \left[ \frac{1}{\alpha}\, \mathcal{S}\left(F(f), g^{\mathrm{obs}}\right) + \mathcal{R}(f) \right].$$
Here $\mathcal{S}$ measures closeness of observed data $g^{\mathrm{obs}}$ and data generated by a candidate parameter $f$, $\mathcal{R}$ is a penalty term that incorporates our a priori knowledge on the solution and stabilizes the problem, while the regularization parameter $\alpha > 0$ balances the two terms. This is a generalization of the more classical method studied e.g. in [EHN96] where $\mathcal{S}(F(f), g^{\mathrm{obs}}) = \|F(f) - g^{\mathrm{obs}}\|^2$ and $\mathcal{R}(f) = \|f\|^2$.
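The role of $\alpha$ can be illustrated on a small discretized smoothing problem. In the following numpy sketch (toy operator, noise level and $\alpha$ grid are illustrative assumptions, with penalty $\|f\|^2$, i.e. $f_0 = 0$), an intermediate $\alpha$ beats both a tiny one, which amplifies noise, and a large one, which oversmooths:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50
# Smoothing forward operator: 20 steps of local averaging (a discrete heat semigroup),
# whose singular values decay rapidly, mimicking a compact operator.
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # discrete Laplacian
T = np.linalg.matrix_power(np.eye(n) - 0.25 * L, 20)

f_true = np.sin(np.linspace(0, np.pi, n))
g_obs = T @ f_true + 1e-3 * rng.standard_normal(n)

def tikhonov(alpha):
    # minimizer of (1/alpha) * ||T f - g_obs||^2 + ||f||^2
    return np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ g_obs)

errors = {a: np.linalg.norm(tikhonov(a) - f_true) for a in (1e-12, 1e-5, 1.0)}
```

With the tiny $\alpha$ the reconstruction is dominated by amplified noise, with the large $\alpha$ by the penalty term; the intermediate value balances both effects.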
That this general procedure indeed yields a stable reconstruction, and that $f_\alpha$ converges to the true solution $f^\dagger$ when $g^{\mathrm{obs}}$ converges to the true data for a suitable choice of $\alpha$, has been established e.g. in [Pös08, Fle11] (see Section 1.2.2 for more details). For the choice of $\alpha$ one roughly has to take the following effect into account: the larger we choose $\alpha$, the more stable is the problem we are solving, but the less we try to match the data exactly. Hence one has to find an optimal balance of the two effects in the sense that $f_\alpha$ is as close as possible to the true solution $f^\dagger$. It is well known in regularization theory, however, that any bound on the distance of $f_\alpha$ to $f^\dagger$ requires a priori knowledge on $f^\dagger$, i.e. we cannot find such a bound which holds globally for all possible choices of parameters $f^\dagger$. Such a priori knowledge is typically formulated as a source condition. E.g. in [EHN96], for the case that $F = T$ is a linear, compact operator acting between Hilbert spaces, the spectral source conditions $f^\dagger = (T^*T)^\nu w$, $\nu \in (0,1]$, have been studied, where the right hand side is defined by the functional calculus.

If the data obeys $\|g^{\mathrm{obs}} - g\| \leq \delta$, then for the above choice of the Tikhonov functional a convergence rate
$$\left\|f_{\bar\alpha} - f^\dagger\right\| \leq C\, \delta^{\frac{2\nu}{2\nu+1}}$$
has been obtained for an optimal choice $\alpha = \bar\alpha(\delta, g^{\mathrm{obs}})$ of the regularization parameter. For details on generalizations to conditions of the form $f^\dagger = \varphi(F^*F)w$ and to nonlinear operators see Section 2.1.
While the corresponding theory was quite successful it also has some drawbacks: using the functional calculus requires a Hilbert space setting, and the treatment of nonlinear operators requires strong additional assumptions. However, many interesting problems arising in practice are nonlinear, and Banach space settings allow greater flexibility in the choice of data and penalty terms, which is required for more accurate modeling and for promoting desired features of the reconstruction (see e.g. [SKHK12]). Hence a more general type of source condition is needed.
The current state of the art assumption is to require that the true solution $f^\dagger$ fulfills a variational inequality of the form
$$\mathcal{E}\left(f, f^\dagger\right) \leq \mathcal{R}(f) - \mathcal{R}\left(f^\dagger\right) + \psi\left(\left\|F(f) - F\left(f^\dagger\right)\right\|^2\right) \qquad \text{for all } f$$
for some loss function $\mathcal{E}$ and some index function $\psi$. These variational source conditions (VSCs) were introduced in [HKPS07] and have become a standard assumption in regularization theory. For an optimal choice $\alpha = \bar\alpha$ of the regularization parameter it is straightforward to show the convergence rate $\mathcal{E}(f_{\bar\alpha}, f^\dagger) \leq c\,\psi(\delta^2)$ under mild assumptions on the involved quantities, where the observed data again has to satisfy $\|g^{\mathrm{obs}} - g\| \leq \delta$. Recent results of [Fle18] illustrate that such a VSC is always fulfilled, but they do not tell us what the function $\psi$ will look like. The subject of this thesis is to investigate conditions which allow us to quantify $\psi$ in the VSC and hence the convergence rate.
If $F$ is an operator mapping between function spaces, $F^{-1}$ is often not continuous due to the fact that $F$ is a smoothing operator, that is, $F(f)$ is a smoother function than $f$. It has been observed for such problems that the spectral source condition can be interpreted as a smoothness assumption on $f$ relative to the smoothing properties of the operator. The smoothing properties of the operator are related to the degree of ill-posedness of the operator; e.g. in the case of linear, compact operators acting between Hilbert spaces this is typically measured by the speed of decay of the singular values of the operator.
In this thesis we will use the two factors, smoothness of the solution and ill-posedness of the operator equation, in order to derive explicit forms of variational source conditions; the second of the two factors is usually more difficult to characterize. It will be treated by using similarities between VSCs and conditional stability estimates. Conditional stability estimates are inequalities of the form
$$\left\|f_1 - f_2\right\|_X \leq \psi\left(\left\|F(f_1) - F(f_2)\right\|_Y\right) \qquad \text{for all } f_1, f_2 \in K$$
for a (usually smooth) subset $K$. Such estimates are quite common for many interesting problems, while results showing that for a given problem and given conditions on the true solution $f^\dagger$ a VSC with a specific function $\psi$ is fulfilled are still rare in the literature (see Remark 2.28). As VSCs imply stability estimates, this allows us to compare our new findings with existing results in the literature.
This thesis is structured as follows: • In Chapter 1 we will give a more detailed introduction into inverse problems and Tikhonov regularization.
• Chapter 2 will review the results on convergence rates known in the literature. We will discuss the advantages and limitations of various source conditions as well as their relation to each other. Further we will prove a main result of this thesis: a general strategy to verify VSCs with Bregman loss function for certain penalty terms R. This strategy generalizes our approach to verify VSCs in [HW15,HW17b] and quantifies solution smoothness and ill-posedness of the operator. It will be our main tool to verify VSCs later on.
• The case of linear operators acting between Hilbert spaces will be treated in Chapter 3. Here we will show that VSCs are not only sufficient but even necessary to obtain certain rates of convergence. As an important byproduct we obtain that $f^\dagger$ fulfills a VSC if and only if $f^\dagger$ lies in a certain interpolation space. The results of this chapter have been published in [HW17a].
• A choice of penalty term which has become popular is to choose $\mathcal{R}$ as a (weighted) sum of wavelet coefficients. Under certain conditions these sums turn out to be equivalent norms on Besov spaces; thus we study Tikhonov regularization with such penalty terms in Chapter 4. While Besov spaces are not necessarily Hilbert spaces, they have a rich structure not only in terms of wavelets but can also be characterized e.g. by Fourier transform and interpolation; the different characterizations will be exploited in order to demonstrate that under certain conditions VSCs are fulfilled. While we cannot show equivalence between convergence rates and VSCs here, we are able to illustrate that the derived convergence rates are of optimal order, i.e. in certain Besov balls no faster uniform convergence is possible.

CHAPTER I

TIKHONOV REGULARIZATION
Measure what can be measured, and make measurable what cannot be measured.
attributed to GALILEO GALILEI.

Ill-Posed Problems and their Regularization
In [Kel76] the concept of problems which are inverse to each other is introduced: it is required that the formulation of one problem contains the solution of the other. These kinds of problems occur for example in science where one wants to predict a parameter given some observation. An associated problem is to predict the observation given the parameter, which is usually much better understood and hence called the direct problem. The direct problem is often well-posed in the sense of Hadamard (see [Had53]), i.e. it meets the conditions of the following definition:

Definition 1.1 (well-posedness). A problem is called well-posed if
(a) a solution exists,
(b) the solution is unique,
(c) the solution depends continuously on the given data.
A problem that is not well-posed is called ill-posed.
Correspondingly we call the problem of predicting the parameter from the solution the inverse problem, which often turns out to be ill-posed. Note that reformulating the problem can help to overcome issues with the first two items by either enlarging or shrinking the set where we look for a solution. However, item (c) is the most important one, as in practice any measurement will be noisy, and thus any ad-hoc reconstruction not taking the noisy character of the measurement into account should be regarded as meaningless.
From now on we will assume that the direct problem can be written as an operator equation
$$F(f) = g, \tag{1.1}$$
where $F$ is the forward operator mapping the parameter $f \in \operatorname{dom}(F) \subset X$ to the observation $g \in Y$, and $X, Y$ are Banach spaces. We will denote the true solution by $f^\dagger$ and the corresponding data by $g^\dagger$. The well-posedness criteria of Hadamard in Definition 1.1 for the inverse problem then read as
(a) $F$ is surjective,
(b) $F$ is injective,
(c) $F^{-1}$ is continuous.
In case that $F^{-1}$ is not continuous, a small perturbation of the input can lead to an arbitrarily large perturbation of the output. Further, it is evident why this problem cannot be overcome by a simple reformulation, as we would either need a finer topology on $Y$ or a coarser topology on $X$. However, one would like to analyze the effect of errors on the solution. Assuming that the topologies are induced by norms, one has the problem that errors are usually unbounded in stronger norms on $Y$ (which would induce a finer topology), while error estimates in weaker norms on $X$ (i.e. a coarser topology) are usually non-informative. Hence, instead of solving the inverse problem directly, one approximates $F^{-1}$ by a family of continuous operators $\{R_\alpha\}_{\alpha>0}$ such that $R_\alpha(g) \to F^{-1}(g)$ as $\alpha \to 0$ whenever the right hand side is defined. A prominent example of such a method is Tikhonov regularization, which will be studied in more detail in the following.

Generalized Tikhonov Functionals
Consider for a moment the case that $X, Y$ are Hilbert spaces and the forward operator is a linear injective map, that is $F = T$. On the observed data $g^{\mathrm{obs}}$ suppose that $\|g^{\mathrm{obs}} - g^\dagger\|_Y \leq \delta$. Then $f$ is a least-squares solution to (1.1) with data $g^{\mathrm{obs}}$ if and only if it satisfies the normal equation $T^*Tf = T^*g^{\mathrm{obs}}$. As this equation is still ill-posed, one adds a stabilizing term together with an initial guess $f_0$ and computes the approximation
$$f_\alpha = (T^*T + \alpha I)^{-1}\left(T^* g^{\mathrm{obs}} + \alpha f_0\right).$$
The approximation $f_\alpha$ can also be characterized in a variational way, namely
$$f_\alpha = \operatorname*{arg\,min}_{f \in X} \mathcal{T}_{g^{\mathrm{obs}},\alpha}(f), \qquad \mathcal{T}_{g^{\mathrm{obs}},\alpha}(f) := \frac{1}{\alpha}\left\|Tf - g^{\mathrm{obs}}\right\|_Y^2 + \left\|f - f_0\right\|_X^2. \tag{1.2}$$
The functional $\mathcal{T}_{g^{\mathrm{obs}},\alpha}$ is called Tikhonov functional due to Tikhonov [Tik63b, Tik63a], who used spaces $X$ including derivatives of $f$. The first term of the Tikhonov functional in (1.2) measures the data misfit, i.e. how well the approximation satisfies the measured data, while the second term stabilizes the problem and incorporates the a priori knowledge that the solution is smooth and close to $f_0$. The parameter $\alpha > 0$ balances the two terms; the smaller we choose $\alpha$, the better our approximation of the original problem, but the more sensitive we become to data errors. In the following we will study a generalization of the form
$$f_\alpha \in \operatorname*{arg\,min}_f \mathcal{T}_{g^{\mathrm{obs}},\alpha}(f), \qquad \text{where } \mathcal{T}_{g^{\mathrm{obs}},\alpha}(f) := \frac{1}{\alpha}\,\mathcal{S}\left(F(f), g^{\mathrm{obs}}\right) + \mathcal{R}(f), \tag{1.3}$$
explained in more detail in the following subsection.
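The equivalence between the normal-equation form and the variational characterization (1.2) can be checked numerically. The following numpy sketch (random toy operator and data, chosen only for illustration) verifies that the closed-form solution annihilates the gradient of the Tikhonov functional:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((8, 5))
g_obs = rng.standard_normal(8)
f0 = rng.standard_normal(5)
alpha = 0.1

# Closed-form minimizer of (1/alpha)||T f - g_obs||^2 + ||f - f0||^2:
# the normal equation (T*T + alpha I) f = T* g_obs + alpha f0.
f_alpha = np.linalg.solve(T.T @ T + alpha * np.eye(5),
                          T.T @ g_obs + alpha * f0)

def functional(f):
    return (np.linalg.norm(T @ f - g_obs) ** 2 / alpha
            + np.linalg.norm(f - f0) ** 2)

# The gradient of the (strictly convex) functional vanishes at f_alpha.
grad = 2 * T.T @ (T @ f_alpha - g_obs) / alpha + 2 * (f_alpha - f0)
```

Since the functional is strictly convex, any perturbation of `f_alpha` strictly increases its value.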

Noise model
More generally, denote by $Y^{\mathrm{obs}}$ the set of all possible measured data $g^{\mathrm{obs}}$. In general one has $Y^{\mathrm{obs}} \neq Y$; note that there are even cases where $Y^{\mathrm{obs}}$ is not a vector space (see the references on impulsive noise below). Define a mapping $\mathcal{S}\colon Y \times Y^{\mathrm{obs}} \to \mathbb{R} \cup \{\infty\}$ which assigns to each measurement $g^{\mathrm{obs}} \in Y^{\mathrm{obs}}$ a data fidelity functional $\mathcal{S}(\cdot, g^{\mathrm{obs}})$, which should be relatively small at $g = F(f)$ if data close to $F(f)$ could have generated $g^{\mathrm{obs}}$. As we are looking for a minimizer of (1.3), the exact value of $\mathcal{S}(g, g^{\mathrm{obs}})$ has no meaning. If $g^{\mathrm{obs}}$ is observed data generated by $g^\dagger$, however, then $\mathcal{S}(g, g^{\mathrm{obs}}) - \mathcal{S}(g^\dagger, g^{\mathrm{obs}})$ should be approximately zero if $g$ is close to $g^\dagger$ and large if $g$ is far away from $g^\dagger$. To quantify how close $g$ and $g^\dagger$ are, we introduce an ideal data fidelity functional $\mathcal{T}_{g^\dagger}\colon Y \to [0, \infty]$. The noise level will now be defined by comparing the difference of the two data fidelity functionals with the ideal data fidelity functional.

Definition 1.2 (effective noise level). Let $g^\dagger \in Y$ be exact data and let $C_{\mathrm{err}} \geq 1$ be a constant. Then the effective noise level $\operatorname{err}\colon Y \to [0, \infty]$ given some data $g^{\mathrm{obs}} \in Y^{\mathrm{obs}}$ is defined as
$$\operatorname{err}(g) := \mathcal{S}\left(g^\dagger, g^{\mathrm{obs}}\right) - \mathcal{S}\left(g, g^{\mathrm{obs}}\right) + \frac{1}{C_{\mathrm{err}}}\,\mathcal{T}_{g^\dagger}(g).$$
The global effective noise level $\overline{\operatorname{err}}$ is defined as the supremum of $\operatorname{err}$ over $F(\operatorname{dom}(F))$.
Note that $\operatorname{err}(g)$ is the smallest number $e$ such that the inequality
$$\mathcal{S}\left(g, g^{\mathrm{obs}}\right) - \mathcal{S}\left(g^\dagger, g^{\mathrm{obs}}\right) \geq \frac{1}{C_{\mathrm{err}}}\,\mathcal{T}_{g^\dagger}(g) - e$$
holds true.
Example 1.3. The following example introduces the two noise models which will occur in this thesis, together with their respective choices of (ideal) data fidelity functionals.
(a) Classical deterministic noise model, see [WH12, Ex. 3.1]: In this case one has $Y^{\mathrm{obs}} = Y$ and one assumes that the observed data satisfies $\|g^{\mathrm{obs}} - g^\dagger\|_Y \leq \delta$. The typical choices of data fidelity functionals are
$$\mathcal{S}(g_1, g_2) = \mathcal{T}_{g_1}(g_2) = \frac{1}{p}\left\|g_1 - g_2\right\|_Y^p.$$
(b) Gaussian white noise model: Here the observed data is $g^{\mathrm{obs}} = g^\dagger + \varepsilon Z$ with noise level $\varepsilon > 0$ and white noise $Z$. Formally expanding the quadratic fidelity term
$$\mathcal{S}\left(g, g^{\mathrm{obs}}\right) = \frac{1}{2}\left\|g - g^{\mathrm{obs}}\right\|_Y^2 = \frac{1}{2}\left\|g\right\|_Y^2 - \left\langle g^{\mathrm{obs}}, g\right\rangle + \frac{1}{2}\left\|g^{\mathrm{obs}}\right\|_Y^2,$$
one sees that this term is infinite almost surely. However, all terms up to $\|g^{\mathrm{obs}}\|_Y^2$ are almost surely finite, and this last term does not depend on $g$. Hence we will drop it and define
$$\mathcal{S}\left(g, g^{\mathrm{obs}}\right) = \frac{1}{2}\left\|g\right\|_Y^2 - \left\langle g^{\mathrm{obs}}, g\right\rangle, \qquad \mathcal{T}_{g^\dagger}(g) = \frac{1}{2}\left\|g - g^\dagger\right\|_Y^2.$$
Setting $C_{\mathrm{err}} = 1$ this leads to $\operatorname{err}(g) = \varepsilon\left\langle Z, g - g^\dagger\right\rangle$, so one obtains $\mathbb{E}[\operatorname{err}(g)] = 0$ and $\mathbb{V}[\operatorname{err}(g)] = \varepsilon^2\|g - g^\dagger\|_Y^2$. Note that $\operatorname{err}(g)$ is still a random variable and that $\overline{\operatorname{err}} = \infty$ if $\operatorname{dom}(F)$ contains a one dimensional subspace. Therefore, to handle the error, more properties of the distribution in the form of concentration or deviation inequalities are required.
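The identity $\operatorname{err}(g) = \varepsilon\langle Z, g - g^\dagger\rangle$ and the moment formulas can be checked numerically in a discretized setting. The following sketch is an assumption-laden toy model (discrete white noise on $\mathbb{R}^n$, illustrative sizes), not the function-space construction:

```python
import numpy as np

rng = np.random.default_rng(42)
n, eps = 200, 0.05
g_dagger = np.linspace(0.0, 1.0, n)
g = g_dagger + 0.3                       # some candidate data element

def err(g, Z):
    # S(h, g_obs) = 0.5||h||^2 - <g_obs, h> with g_obs = g_dagger + eps*Z,
    # ideal fidelity T_{g_dagger}(h) = 0.5||h - g_dagger||^2, and C_err = 1.
    g_obs = g_dagger + eps * Z
    S = lambda h: 0.5 * np.dot(h, h) - np.dot(g_obs, h)
    T_ideal = 0.5 * np.dot(g - g_dagger, g - g_dagger)
    return S(g_dagger) - S(g) + T_ideal

samples = np.array([err(g, rng.standard_normal(n)) for _ in range(5000)])
```

The sample mean of `samples` is close to zero and the sample variance close to $\varepsilon^2\|g - g^\dagger\|^2$, matching the formulas above.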
Further examples where this error model has been used are impulsive noise models [HW14,KWH16] and Poisson noise models [WH12].

Regularizing property
In this section we prove the following regularization properties of (1.3): (a) Well-definedness, that is there exists a minimizer f α of T g obs ,α for any α > 0 and g obs ∈ Y obs .
(b) Stability, that is the regularized solutions f α depend sequentially continuously on the data g obs .
(c) Convergence, that is regularized solutions ( f α n ) n∈N for which α n → 0 and (err(F( f α n ))) n∈N → 0 in an appropriate manner converge to the true solution f † .
To prove these properties we will require that the subsequent assumption holds true.

Assumption 1.4. The following will be a standing assumption throughout the current and the following chapter.
(a) $(X, \tau_X)$ and $(Y, \tau_Y)$ are locally convex vector spaces and $(Y^{\mathrm{obs}}, \tau_{Y^{\mathrm{obs}}})$ is a Hausdorff space.
(b) The penalty functional $\mathcal{R}\colon X \to (-\infty, \infty]$ is proper, sequentially lower semicontinuous and lower semicompact, i.e. its sublevel sets $\{f \in X : \mathcal{R}(f) \leq c\}$ are sequentially compact.
(c) Let $F\colon \operatorname{dom}(F) \to Y$ be a possibly nonlinear operator defined on a set $\operatorname{dom}(F)$, and assume that $D := \operatorname{dom}(F) \cap \operatorname{dom}(\mathcal{R}) \neq \emptyset$ is a sequentially closed set and $F$ is sequentially continuous on $D$.
(d) Associate to all data $g^{\mathrm{obs}} \in Y^{\mathrm{obs}}$ a data fidelity functional $\mathcal{S}(\cdot, g^{\mathrm{obs}})\colon Y \to \mathbb{R} \cup \{\infty\}$ satisfying $\inf_{f \in \operatorname{dom}(F)} \mathcal{S}(F(f), g^{\mathrm{obs}}) \in \mathbb{R}$, which is lower semicontinuous w.r.t. $\tau_Y \times \tau_{Y^{\mathrm{obs}}}$ and sequentially continuous w.r.t. $\tau_{Y^{\mathrm{obs}}}$.
(e) Let $g^\dagger$ be exact data and $\mathcal{T}_{g^\dagger}\colon Y \to [0, \infty]$ be a corresponding ideal data fidelity functional, which is lower semicontinuous and satisfies $\mathcal{T}_{g^\dagger}(g) = 0$ if and only if $g = g^\dagger$.
(f) There exists a unique $f^\dagger \in \operatorname*{arg\,min}_{f : F(f) = g^\dagger} \mathcal{R}(f)$, called the $\mathcal{R}$-minimizing solution to $F(f) = g^\dagger$.
Example 1.5. Consider the classical Tikhonov functional (1.2) with norm-to-norm continuous operator T. Setting τ X as the weak topology on X and τ Y = τ Y obs as the weak topology on Y we see that all the assumptions are fulfilled. Indeed T is also weak-to-weak continuous and even if T is not injective the R-minimizing solution is unique as f † is the orthogonal projection of f 0 onto the affine subspace { f ∈ X : T f = g † }.
The following theorem shows that under the assumptions above Tikhonov regularization is well-defined. The proof follows along the lines of [Pös08, Thm. 1.6] or [Fle11, Thm. 3.2] and essentially shows that the Tikhonov functional is lower semicompact.

Theorem 1.6 (well-definedness). Suppose that Assumption 1.4 is satisfied. Then there exists a minimizer $f_\alpha$ of $\mathcal{T}_{g^{\mathrm{obs}},\alpha}$.
If $F = T$ is linear and $\mathcal{S}(\cdot, g^{\mathrm{obs}})$ and $\mathcal{R}$ are convex with at least one of them strictly convex, then the minimizer $f_\alpha$ is unique, as then the Tikhonov functional is strictly convex. This is e.g. the case for classical Tikhonov regularization (1.2). In general, however, one cannot guarantee uniqueness. Hence one cannot expect that, given a converging sequence of data, a sequence of corresponding minimizers converges, as the sequence of minimizers might jump between different solutions. Nevertheless, the following result shows that one still has convergence along subsequences.
Theorem 1.7 (stability). Let $(g^{\mathrm{obs}}_n)_{n\in\mathbb{N}} \subset Y^{\mathrm{obs}}$ be a sequence of data converging to $g^{\mathrm{obs}}$ w.r.t. $\tau_{Y^{\mathrm{obs}}}$, let $(\alpha_n)_{n\in\mathbb{N}}$ converge to some $\alpha > 0$, and let $f_n$ be minimizers of $\mathcal{T}_{g^{\mathrm{obs}}_n,\alpha_n}$. If Assumption 1.4 is satisfied, then the following holds true:
(a) $\limsup_{n\to\infty} \mathcal{T}_{g^{\mathrm{obs}}_n,\alpha_n}(f_n) < \infty$.
(b) $(f_n)_{n\in\mathbb{N}}$ has a convergent subsequence $(f_{n_k})_{k\in\mathbb{N}}$.
(c) Each limit $f_\alpha$ of a convergent subsequence satisfies $f_\alpha \in \operatorname*{arg\,min}_f \mathcal{T}_{g^{\mathrm{obs}},\alpha}(f)$.
(d) For each convergent subsequence one obtains $\mathcal{S}(F(f_{n_k}), g^{\mathrm{obs}}_{n_k}) \to \mathcal{S}(F(f_\alpha), g^{\mathrm{obs}})$, $\mathcal{R}(f_{n_k}) \to \mathcal{R}(f_\alpha)$ and $\mathcal{T}_{g^{\mathrm{obs}}_{n_k},\alpha_{n_k}}(f_{n_k}) \to \mathcal{T}_{g^{\mathrm{obs}},\alpha}(f_\alpha)$ as $k \to \infty$.
(e) If the minimizer of $\mathcal{T}_{g^{\mathrm{obs}},\alpha}$ is unique, then $(f_n)_{n\in\mathbb{N}}$ converges to $f_\alpha$.
The previous theorem shows that generalized Tikhonov estimators depend continuously on the observed data. So if we solve (1.3) instead of (1.1) we obtain a well-posed problem in the sense of Definition 1.1 up to uniqueness. It further shows that it is also robust with respect to only approximate evaluation.
Theorem 1.8 (convergence). Let $f^\dagger \in X$ with exact data $g^\dagger \in Y$. Let $(g^{\mathrm{obs}}_n)_{n\in\mathbb{N}} \subset Y^{\mathrm{obs}}$ be a sequence of observed data with associated global effective noise levels $(\overline{\operatorname{err}}_n)_{n\in\mathbb{N}}$ such that $\lim_{n\to\infty} \overline{\operatorname{err}}_n = 0$. Assume that the regularization parameters $(\alpha_n)_{n\in\mathbb{N}}$ are chosen such that
$$\lim_{n\to\infty} \alpha_n = 0 \qquad \text{and} \qquad \lim_{n\to\infty} \frac{\overline{\operatorname{err}}_n}{\alpha_n} = 0.$$
Let Assumption 1.4 be satisfied and let $f_n := f_{\alpha_n}$ be a sequence of minimizers of $\mathcal{T}_{g^{\mathrm{obs}}_n,\alpha_n}$. Then the following holds true:
(a) $f_n \to f^\dagger$ with respect to $\tau_X$;
(b) $\mathcal{R}(f_n) \to \mathcal{R}(f^\dagger)$.

Proof. As $f_n$ minimizes $\mathcal{T}_{g^{\mathrm{obs}}_n,\alpha_n}$, we obtain
$$\frac{1}{\alpha_n}\,\mathcal{S}\left(F(f_n), g^{\mathrm{obs}}_n\right) + \mathcal{R}(f_n) \leq \frac{1}{\alpha_n}\,\mathcal{S}\left(F(f^\dagger), g^{\mathrm{obs}}_n\right) + \mathcal{R}\left(f^\dagger\right).$$
Rearranging terms and using the definition of the effective noise level, this implies
$$\mathcal{R}(f_n) + \frac{1}{C_{\mathrm{err}}\,\alpha_n}\,\mathcal{T}_{g^\dagger}\left(F(f_n)\right) \leq \mathcal{R}\left(f^\dagger\right) + \frac{1}{\alpha_n}\operatorname{err}\left(F(f_n)\right).$$

Bounding the effective noise level by $\overline{\operatorname{err}}_n$ and taking the limit superior on both sides, this shows that
$$\limsup_{n\to\infty}\left[\mathcal{R}(f_n) + \frac{1}{C_{\mathrm{err}}\,\alpha_n}\,\mathcal{T}_{g^\dagger}\left(F(f_n)\right)\right] \leq \mathcal{R}\left(f^\dagger\right) \tag{1.4}$$
due to the assumption on the choice of $\alpha_n$. As $\mathcal{T}_{g^\dagger}$ is positive and $\mathcal{R}$ is lower semicompact by Assumption 1.4, we obtain that there exists a subsequence $(f_{n_k})_{k\in\mathbb{N}}$ converging to some $\tilde f$. As $D$ is supposed to be sequentially closed, we get $\tilde f \in D$; hence, by the lower semicontinuity of $\mathcal{T}_{g^\dagger}$ and the sequential continuity of $F$,
$$\mathcal{T}_{g^\dagger}\left(F(\tilde f)\right) \leq \liminf_{k\to\infty} \mathcal{T}_{g^\dagger}\left(F(f_{n_k})\right) = 0.$$
Thus $F(\tilde f) = g^\dagger$; as furthermore $\mathcal{R}(\tilde f) \leq \mathcal{R}(f^\dagger)$, this implies $\tilde f = f^\dagger$ by Assumption 1.4. Due to the uniqueness of $f^\dagger$ this implies that $f_n \to f^\dagger$, hence $\lim_{n\to\infty}\langle f^*, f_n - f^\dagger\rangle = 0$ for all $f^* \in \partial\mathcal{R}(f^\dagger)$. Combining (1.4) and the lower semicontinuity of $\mathcal{R}$ we see that
$$\limsup_{n\to\infty} \mathcal{R}(f_n) \leq \mathcal{R}\left(f^\dagger\right) \leq \liminf_{n\to\infty} \mathcal{R}(f_n),$$
finishing the proof.

Numerical minimization of Tikhonov functionals
In order to simplify the notation we will abbreviate $\mathcal{S}(g) := \mathcal{S}(g, g^{\mathrm{obs}})$ and $\mathcal{T}(f) := \mathcal{T}_{g^{\mathrm{obs}},\alpha}(f)$ in this section. Additionally we will assume that $\mathcal{S}(\cdot)$ is convex, which is the case for the most prominent data fidelity terms.

Linear Forward Mappings
In case that $F = T$ the Tikhonov functional is convex; hence any (sub-)gradient descent algorithm will eventually converge to a minimizer $f$ of $\mathcal{T}$. Especially in the Hilbert space setting, where the minimizer of the Tikhonov functional is given by the solution of the linear equation
$$(T^*T + \alpha I)\, f = T^* g^{\mathrm{obs}} + \alpha f_0,$$
the conjugate gradient method provides a very efficient and fast algorithm to solve this linear system. In the following we want to briefly present a first-order primal-dual algorithm for convex problems which works for functionals $\mathcal{S}$ and $\mathcal{R}$ which are proper, convex and lower semicontinuous. The algorithm solves the problem by simultaneously minimizing the Tikhonov functional and the corresponding dual problem. It was first developed in [CP10] for a (formally finite dimensional) Hilbert space setting and extended to an infinite dimensional Banach space setting in [Hom15].
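For illustration, the regularized normal equation above can be solved matrix-free by conjugate gradients. The following numpy sketch (toy dimensions and random data are assumptions made for the example) compares a textbook CG implementation against a direct solve:

```python
import numpy as np

def cg(apply_A, b, x0, tol=1e-10, maxit=500):
    """Conjugate gradient method for a symmetric positive definite operator."""
    x = x0.copy()
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Ap = apply_A(p)
        a = rs / (p @ Ap)
        x += a * p
        r -= a * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(3)
T = rng.standard_normal((30, 20))
g_obs = rng.standard_normal(30)
f0 = np.zeros(20)
alpha = 0.5

apply_A = lambda f: T.T @ (T @ f) + alpha * f      # (T*T + alpha I) f, matrix-free
b = T.T @ g_obs + alpha * f0
f_cg = cg(apply_A, b, np.zeros(20))
f_direct = np.linalg.solve(T.T @ T + alpha * np.eye(20), b)
```

Since $(T^*T + \alpha I)$ is symmetric positive definite, CG converges in at most 20 iterations here (up to round-off).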
Associated to each Tikhonov functional
$$\bar f \in \operatorname*{arg\,min}_{f \in X}\left[\mathcal{S}(Tf) + \mathcal{R}(f)\right], \tag{1.5P}$$
where for simplicity we absorbed the regularization parameter $\alpha$ into the data misfit functional, there is a corresponding dual problem
$$\bar p \in \operatorname*{arg\,max}_{p \in Y^*}\left[-\mathcal{S}^*(-p) - \mathcal{R}^*\left(T^*p\right)\right] \tag{1.5D}$$
as well as a saddle point problem
$$\min_{f \in X}\max_{p \in Y^*}\left[-\langle Tf, p\rangle + \mathcal{R}(f) - \mathcal{S}^*(-p)\right]. \tag{1.5S}$$
The three problems are related by the following theorem.

Theorem 1.9 (see [BC11, Chap. 19]). Assume that there exist $f_0 \in X$ and $p_0 \in Y^*$ such that $\mathcal{R}(f_0)$, $\mathcal{S}(Tf_0)$, $\mathcal{R}^*(T^*p_0)$ and $\mathcal{S}^*(p_0)$ are all finite. Set $\mu := \inf(1.5\mathrm{P})$ and $\mu^* := \sup(1.5\mathrm{D})$. Then the following holds true:
(a) If $\mathcal{R}^*$ is continuous at $T^*p_0$, then a solution $\bar f$ of (1.5P) exists and $\mu = \mu^*$.
(b) If $\mathcal{S}$ is continuous at $Tf_0$, then a solution $\bar p$ of (1.5D) exists and $\mu = \mu^*$.
(c) The following are equivalent:
(i) $\bar f$ is a solution of (1.5P), $\bar p$ is a solution of (1.5D) and $\mu = \mu^*$.
(ii) $(\bar f, \bar p)$ is a saddle point of (1.5S).
(iii) The extremal relations
$$T^*\bar p \in \partial\mathcal{R}(\bar f) \qquad \text{and} \qquad -\bar p \in \partial\mathcal{S}(T\bar f)$$
hold true.
If any of the points in Theorem 1.9(c) is fulfilled we say that strong duality holds true. In this case the extremal relations can be used to set up an algorithm for solving the saddle point problem (1.5S). Using Corollary A.15, we can rewrite the second extremal relation as $T\bar f \in \partial\mathcal{S}^*(-\bar p)$. Then scaling both relations with some positive parameters $\sigma, \tau$ as well as adding and subtracting $j_{Y^*}(-\bar p)$ and $j_X(\bar f)$, where $j_{Y^*}, j_X$ are selections of the normalized duality mappings (see Definition B.6), we see that
$$-\bar p = \left(j_{Y^*} + \sigma\partial\mathcal{S}^*\right)^{-1}\left(j_{Y^*}(-\bar p) + \sigma T\bar f\right), \qquad \bar f = \left(j_X + \tau\partial\mathcal{R}\right)^{-1}\left(j_X(\bar f) + \tau T^*\bar p\right). \tag{1.7}$$
In order to create an algorithm out of these equations, efficient ways to evaluate the operators $(j_{Y^*} + \sigma\partial\mathcal{S}^*)^{-1}$ and $(j_X + \tau\partial\mathcal{R})^{-1}$, called resolvent mappings, are needed. Under the conditions of Theorem 1.9 it can be shown that they are well-defined and single valued, see [Roc70]. Furthermore, they can be computed by solving the minimization problem
$$\left(j_X + \tau\partial\mathcal{R}\right)^{-1}(x) = \operatorname*{arg\,min}_{f \in X}\left[\frac{1}{2}\|f\|_X^2 - \langle x, f\rangle + \tau\mathcal{R}(f)\right],$$
and likewise for the other resolvent. For several important cases there even exist closed form expressions of the resolvents (cf. Example 1.10). We can now use (1.7) together with an overrelaxation step to obtain the following algorithm to solve the saddle point problem (1.5S).

Algorithm 1.11 (First-Order Primal-Dual Algorithm for Convex Problems).
• Initialize: Choose $f_0 \in X$, $p_0 \in Y^*$ and parameters $\sigma, \tau > 0$, $\Theta \in [0,1]$, and set $\bar f_0 = f_0$.
• Iterate: For $n \in \mathbb{N}_0$ set
$$\begin{aligned} -p_{n+1} &= \left(j_{Y^*} + \sigma\partial\mathcal{S}^*\right)^{-1}\left(j_{Y^*}(-p_n) + \sigma T\bar f_n\right),\\ f_{n+1} &= \left(j_X + \tau\partial\mathcal{R}\right)^{-1}\left(j_X(f_n) + \tau T^*p_{n+1}\right),\\ \bar f_{n+1} &= f_{n+1} + \Theta\left(f_{n+1} - f_n\right). \end{aligned}$$
For convergence properties of this algorithm under suitable parameter choice rules and mild assumptions on the involved functionals see [Hom15, Chap. 4]. As a particular example, let $X$ be 2-convex and $Y$ be 2-smooth with $\mathcal{S}$ and $\mathcal{R}$ as in Example 1.10; then there exists a constant parameter choice rule for $\sigma, \tau, \Theta$ such that $(p_k, f_k)$ converges linearly with respect to the Bregman distance to a solution $(\bar p, \bar f)$ of the saddle point problem.
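A minimal sketch of such a primal-dual iteration in the Hilbert space setting, where the duality mappings are identities and the resolvents reduce to proximal operators, might look as follows. The quadratic choices of $\mathcal{S}$ and $\mathcal{R}$, the toy operator `K`, and the step sizes are illustrative assumptions; this is the finite dimensional scheme of [CP10], not the Banach space version of [Hom15]:

```python
import numpy as np

rng = np.random.default_rng(4)
K = rng.standard_normal((15, 10))       # plays the role of T
b = rng.standard_normal(15)             # plays the role of g_obs

# S(g) = 0.5*||g - b||^2 (alpha absorbed), R(f) = 0.5*||f||^2, so
#   resolvent of sigma*dS*: v -> (v - sigma*b) / (1 + sigma)
#   resolvent of tau*dR:    v -> v / (1 + tau)
op_norm = np.linalg.norm(K, 2)
sigma = tau = 0.9 / op_norm             # ensures sigma * tau * ||K||^2 < 1
theta = 1.0                             # overrelaxation parameter

f = np.zeros(10); p = np.zeros(15); f_bar = f.copy()
for _ in range(20000):
    p = (p + sigma * (K @ f_bar) - sigma * b) / (1 + sigma)   # dual step
    f_new = (f - tau * (K.T @ p)) / (1 + tau)                 # primal step
    f_bar = f_new + theta * (f_new - f)                       # overrelaxation
    f = f_new

# Reference: minimizer of 0.5*||K f - b||^2 + 0.5*||f||^2.
f_ref = np.linalg.solve(K.T @ K + np.eye(10), K.T @ b)
```

For these strongly convex quadratic functionals the iterates converge to the Tikhonov minimizer.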

Nonlinear Forward Mappings
If the forward operator is nonlinear, then the Tikhonov functional will usually have local minima even for convex functionals S, R. Hence finding the global minimizer is a great numerical challenge. For the convergence rate analysis of Tikhonov regularization in the following chapters we will still always assume that the minimizer f α is given.
If the forward operator is nonlinear but differentiable, the corresponding inverse problem is often solved via successive linearization: given some guess $f_n$ one solves the linearized equation
$$F(f_n) + F'(f_n)(f - f_n) = g^{\mathrm{obs}}$$
to obtain an update $f_{n+1} = f$. If this equation is ill-posed one of course has to regularize again.
• Given a Tikhonov functional with fixed $\alpha$ the following methods can be used:
- If the functionals are smooth one can either apply a gradient descent method or a Newton-type method to obtain a locally convergent algorithm. The problem with gradient descent methods is that they generally converge quite slowly, but their performance can be improved by suitable adjustment of the step direction, see e.g. [BB88]. The main problem of Newton's method is that it requires the Hessian of the forward operator, which is usually very costly to compute, as the method is applied to the first order optimality conditions. So one typically uses quasi-Newton methods like BFGS where only the Jacobian of the forward operator is needed.
- Alternatively one can try to directly apply the technique of the previous subsection. While there exists no dual problem, one can still formulate a saddle point problem for a nonlinear operator, which is of the form
$$\min_{f}\max_{p}\left[-\langle F(f), p\rangle + \mathcal{R}(f) - \mathcal{S}^*(-p)\right].$$
An algorithm similar to Algorithm 1.11 for solving this problem with nonlinear operator $F$ was first developed in [Val14] for a finite dimensional setting and extended in [CV17] to an infinite dimensional Hilbert space setting. Assuming that the forward operator is differentiable, the iteration is obtained by replacing $T$ in Algorithm 1.11 by the current linearization $F'(f_n)$, and local convergence has been shown under suitable regularity assumptions. For global convergence properties see [CMV18].

• Another idea is to solve a sequence of Tikhonov functionals where $\alpha_n \to 0$, together with an early stopping rule, which leads to new regularization methods.
- Starting with [Ram03] and extended in [ZW16], an algorithm that yields global convergence has been studied where $\mathcal{S}$ is a norm power and $\mathcal{R}$ is convex of power type. The idea is to first compute an approximate minimizer of a Tikhonov functional with a very large $\alpha$ via an iterative algorithm, as it can be shown that then the Tikhonov functional is convex in a neighborhood of $f^\dagger$. Then $\alpha$ is reduced and the previous minimizer is used as an initial guess. It can be shown that this algorithm terminates after finitely many inner and outer iterations; however, strong assumptions have to be made (one needs the nonlinear version of the source condition (2.17) as well as a tangential cone condition (2.13)).
- A widely used method is the iteration
$$f_{n+1} \in \operatorname*{arg\,min}_f\left[\frac{1}{\alpha_n}\,\mathcal{S}\left(F(f_n) + F'(f_n)(f - f_n),\, g^{\mathrm{obs}}\right) + \mathcal{R}(f)\right],$$
the so-called iteratively regularized Gauss-Newton method. As at each step a Tikhonov-type functional has to be minimized, one can apply the first-order primal-dual algorithm presented in the previous section as an inner algorithm. In [HW12] it is shown that together with an early stopping rule this algorithm provides a regularization method under suitable assumptions restricting the nonlinearity of the forward operator (the conditions generalize the tangential cone condition (2.13)). Furthermore, a large part of the convergence rate theory described later on carries over to this regularization technique.
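For quadratic $\mathcal{S}$ and $\mathcal{R}$ each Gauss-Newton step has a closed form, so the iteration can be sketched in a few lines. The toy nonlinear operator, the noise-free data and the geometric decrease of $\alpha_n$ below are illustrative assumptions, not a claim about the assumptions of [HW12]:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((12, 6))

def F(f):                               # a mildly nonlinear forward operator
    return A @ f + 0.1 * (A @ f) ** 3

def F_prime(f):                         # its Jacobian
    return (1.0 + 0.3 * (A @ f) ** 2)[:, None] * A

f_true = 0.2 * rng.standard_normal(6)
g_obs = F(f_true)                       # noise-free data for illustration

f = np.zeros(6); f0 = np.zeros(6); alpha = 1.0
for _ in range(15):
    J = F_prime(f)
    # Tikhonov step for the linearized equation: minimize over h
    #   (1/alpha)*||F(f) + J h - g_obs||^2 + ||f + h - f0||^2
    h = np.linalg.solve(J.T @ J + alpha * np.eye(6),
                        J.T @ (g_obs - F(f)) + alpha * (f0 - f))
    f = f + h
    alpha *= 0.5                        # geometrically decreasing alpha_n
```

On this mildly nonlinear, noise-free toy problem the iterates approach the exact solution as $\alpha_n$ decreases.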

CHAPTER II CONVERGENCE RATES THEORY
Mathematics is the work of individuals. But its concepts and its theorems belong to no person and no ethnic, religious, or political group. They belong to all of us. Mathematical knowledge builds on the work of those who have gone before us. It is hard won, and we often do not value it as we should. Anyone of us with an elementary school education can solve arithmetic and algebraic problems that would have defeated the most learned Babylonian scribes. Anyone of us with a few courses of calculus and linear algebra can solve problems that Pythagoras, Archimedes, or even Newton could not have touched. A mathematics graduate student today can handle topological calculations that Riemann and Poincaré could not have begun. We are not smarter than they. Rather, we are their beneficiaries.
DONAL O'SHEA, in "The Poincaré conjecture"

We have seen in Theorem 1.8 that if we have a sequence of data for which the noise level vanishes in the limit, then the estimators of the Tikhonov functionals converge to the exact solution. A key question in regularization theory is to determine bounds on the distance between regularized and true solution. In the classical Tikhonov model one tries to find error estimates of the form
$$\left\|f_\alpha - f^\dagger\right\| \leq \psi(\delta)$$
for some continuous function $\psi\colon [0, \infty) \to [0, \infty)$ with $\psi(0) = 0$ and some parameter choice rule $\alpha = \alpha(\delta, g^{\mathrm{obs}})$. More generally one tries to find error estimates of the form
$$\mathcal{E}\left(f_\alpha, f^\dagger\right) \leq \psi\left(\overline{\operatorname{err}}\right), \tag{2.1}$$
where $\mathcal{E}$ is some generalized loss functional, for some parameter choice $\alpha$ depending on the data and the noise model. Such error estimates are called convergence rates in regularization theory. In order to obtain convergence rates one needs additional information on the true solution, as the following proposition shows.

Proposition 2.1. If there exist a continuous function $\psi$ with $\psi(0) = 0$ and a parameter choice rule such that an error estimate of the above form holds uniformly for all solutions $f^\dagger$, then $F^{-1}$ is continuous.
In this chapter we will study several conditions on the true solution f† which guarantee that convergence rates of the form (2.1) hold true. In Section 2.1 we will look at spectral source conditions, which yield convergence rates in the classical Hilbert space setting, and in Section 2.2 at the closely related range conditions, which allow one to extend some special cases to a Banach space setting. Next we will show how stability estimates can be used to derive convergence rates for specially constructed Tikhonov functionals. In Section 2.4 we will present the convergence rate theory based on variational source conditions, which can be considered as the state of the art. Afterwards we will describe an abstract framework to prove such a condition which is applicable to a wide range of settings and will be an important tool in the following chapters. Lastly we will discuss the relation to the previously studied conditions and give some further results on variational source conditions.

Spectral Source Conditions
In this section we will present the classical convergence rate theory in Hilbert spaces, the basics of which are already treated in [EHN96]. This theory is very well suited to solving linear equations in Hilbert spaces. We will recall the main convergence rate result and briefly comment on optimality, parameter choice rules and extensions to nonlinear operators. As this theory relies heavily on the spectral calculus, it is not extendable to a Banach space setting; however, in special cases the conditions are equivalent to range conditions, which can also be exploited in a Banach space setting to obtain convergence rates, as we will see in Section 2.2.

Definition 2.2. A continuous, monotonically increasing function ϕ : [0, ∞) → [0, ∞) with ϕ(0) = 0 is called an index function.

We say that f† satisfies a spectral source condition if

f† − f_0 = ϕ(T*T)w (2.2)

for some index function ϕ, source element w ∈ X and initial guess f_0 ∈ X. The two most commonly used examples of index functions in spectral source conditions are Hölder source conditions, where

ϕ(t) = ϕ_H^ν(t) := t^{ν/2} (2.3a)

for some ν > 0, and logarithmic source conditions, where

ϕ(t) = ϕ_L^p(t) := (−ln t)^{−p} for t ∈ (0, t_0] (2.3b)

for some p > 0 and some t_0 ∈ (0, 1). The difficulty with condition (2.2) is the interpretation of the equation. It turns out that for many problems spectral source conditions with the index functions as in (2.3) can be interpreted as smoothness conditions on f† − f_0 in the sense of classical Sobolev smoothness of the true solution. Hence one often refers to (2.2) as a smoothness condition.
Example 2.3. For the following examples such an interpretation holds true with f_0 = 0:
• In the case of numerical differentiation, spectral source conditions with Hölder index functions correspond to classical Sobolev smoothness of the true solution.
• The backward heat equation tries to determine the temperature distribution at t = 0 given the temperature distribution at some later time t = τ > 0 (see Section 3.6 for details on the operator). If one assumes a periodic setting, logarithmic source conditions can be interpreted in terms of Sobolev smoothness. Similar results hold for the sideways heat equation and in satellite gradiometry, see [Hoh00].
• In the case of inverse potential and inverse scattering (which are both nonlinear problems, see Section 2.1.4 below on how to treat nonlinear problems) it has been shown in [Hoh97] that even analyticity of the true solution is not enough in order to obtain (2.2) with ϕ = ϕ H ν as in (2.3a) while ϕ = ϕ L p as in (2.3b) has again meaningful interpretations in terms of Sobolev smoothness.
An example for the verification of a spectral source condition which is not of the form (2.3) can be found in [HH01, Lemma 4.2] (see also Lemma 5.16).

Convergence rates
In order to state the convergence rate result for spectral source conditions, we need to introduce a quasi-ordering on the set of index functions.
Definition 2.4. We say that an index function ϕ_0 covers an index function ϕ if there exist constants c > 0 and t_0 > 0 such that c ϕ_0(t) ≤ ϕ(t) for all t ∈ (0, t_0]. If ϕ_0 covers ϕ, we write ϕ ≼ ϕ_0.
Whether an index function covers another is only determined by their behavior around zero; roughly speaking, ϕ ≼ ϕ_0 means that ϕ_0 decays faster to zero than ϕ.
Theorem 2.5 (convergence rate, see [MP03, Thm. 2]). Let f† fulfill (2.2) for some index function ϕ, f_0 ∈ X and w ∈ X such that ‖w‖ ≤ ρ. Assume that ϕ ≼ id and set Θ(t) := √t ϕ(t). Choose ᾱ by

Θ(ᾱ) = δ/ρ; (2.4)

then there exists a constant c > 0 such that for fᾱ, the minimizer of the classical Tikhonov functional, the error estimate

‖fᾱ − f†‖ ≤ c ρ ϕ(Θ^{-1}(δ/ρ)) (2.5)

holds true. The proof is based on splitting the error into two different parts,

‖f_α^δ − f†‖ ≤ ‖f_α^δ − f_α‖ + ‖f_α − f†‖, (2.6)

where f_α is the minimizer of T_{g†,α} for exact data and f_α^δ the minimizer for the noisy data g_obs. The two error terms on the right hand side are called propagated data noise error and approximation error (or bias in a statistical context) respectively. For the propagated data noise error one has the estimate

‖f_α^δ − f_α‖ ≤ δ/(2√α), (2.7a)

while the estimate of the approximation error is given by

‖f_α − f†‖ ≤ c ρ ϕ(α) (2.7b)

if the conditions of the theorem are met. The parameter choice rule then ensures that the right hand sides of (2.7) are of equal order, and so their sum approximately minimizes the total error. If we specialize the theorem above to the index functions considered in (2.3), we see that in the case of ϕ = ϕ_H^ν we have ϕ_H^ν ≼ id if and only if ν ≤ 2. In the case of ν ≤ 2 we see that a choice of α = (δ/ρ)^{2/(ν+1)} implies the convergence rate

‖fᾱ − f†‖ = O(ρ^{1/(ν+1)} δ^{ν/(ν+1)}).

When ν > 2 the additional smoothness of the true solution cannot be used to improve the convergence rate when using Tikhonov regularization (other regularization methods have to be used in this case, see Section 3.2). One might wonder whether faster rates are possible for Tikhonov regularization which are attainable under different conditions. However, the following shows that this is only the case if f† = f_0:

Proposition 2.6 (see [Neu97, Thm. 3.1]). Let ᾱ = α(δ, g_obs) be a parameter choice rule such that sup{‖fᾱ − f†‖ : ‖g_obs − g†‖ ≤ δ} = o(δ^{2/3}) as δ → 0. Then f† = f_0.

We would like to give a small example (which we will refer to repeatedly in this chapter) where Hölder-type source conditions are fulfilled:

Example 2.7. Let T : ℓ²(N) → ℓ²(N) be given by (T f)_n = (1/n) f_n. Let f† ∈ ℓ²(N) be given by f†_n = 1/n. Then for f_0 = 0 a short calculation shows that f† fulfills (2.2) with ϕ = ϕ_H^ν (with source element w^(ν) given by w^(ν)_n = n^{ν−1}) for all ν ∈ (0, 1/2), but not for ν = 1/2.
Hence Theorem 2.5 shows that for a classical noise model and an appropriate parameter choice rule ᾱ the convergence rate ‖fᾱ − f†‖ = O(δ^{ν/(ν+1)}) for the minimizer fᾱ of the Tikhonov functional holds true. Note that ν/(ν+1) < 1/3 for all ν ∈ (0, 1/2) and that ‖w^(ν) | ℓ²(N)‖^{1/(ν+1)} → ∞ as ν → 1/2. For ϕ = ϕ_L^p we obtain that ϕ_L^p ≼ id for all p > 0. Since Θ^{-1} does not have a nice closed form expression in this case, we cannot simplify the parameter choice rule, but one can show that ‖fᾱ − f†‖ = O((−ln δ)^{−p}).
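The setting of Example 2.7 is easy to simulate. The following sketch (not part of the thesis; the truncation level, noise model and all numerical values are illustrative choices) applies classical Tikhonov regularization to a truncation of the operator (T f)_n = f_n/n and checks that the error scales like δ^{ν/(ν+1)} under the a-priori choice of Theorem 2.5:

```python
import numpy as np

def tikhonov_diag(sigma, gobs, alpha):
    # minimizer f_alpha = (T*T + alpha I)^{-1} T* g_obs of the classical
    # Tikhonov functional for a diagonal operator (T f)_n = sigma_n f_n
    return sigma * gobs / (sigma**2 + alpha)

N = 10_000
n = np.arange(1, N + 1, dtype=float)
sigma = 1.0 / n              # singular values of (T f)_n = f_n / n
f_true = 1.0 / n             # exact solution from Example 2.7
g_true = sigma * f_true

nu = 0.4                     # the source condition holds for all nu < 1/2
ratios = []
for delta in [1e-2, 1e-3, 1e-4]:
    gobs = g_true.copy()
    gobs[0] += delta         # perturbation with ||g_obs - g_true|| = delta
    alpha = delta ** (2.0 / (nu + 1.0))   # a-priori choice of Theorem 2.5
    err = np.linalg.norm(tikhonov_diag(sigma, gobs, alpha) - f_true)
    ratios.append(err / delta ** (nu / (nu + 1.0)))
print(ratios)                # stays bounded: err = O(delta^{nu/(nu+1)})
```

The quotients err/δ^{ν/(ν+1)} remain bounded as δ decreases, in line with the predicted rate.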

One can actually prove that this rate is even achievable by the very simple parameter choice rule α = δ/ρ (see [Hoh00, Thm. 5]). One of the natural questions that appears is whether all possible solutions f† fulfill a spectral source condition. This question has been answered for injective operators, first under the assumption that they are also compact in [MH08] and later for the general case in [HMvW09].
Proposition 2.8. Let T be injective. Then for all f† ∈ X there exist an index function ϕ and w ∈ X such that f† = ϕ(T*T)w and ‖w‖ ≤ 2‖f†‖.
Note that this does not contradict Proposition 2.1: while for each specific element f † ∈ X we get a convergence rate, we cannot uniformly bound these rates on the whole space X .

Lower bounds on rates
So far we have seen convergence rates for Tikhonov regularization under a-priori parameter choice rules for linear operators. One might wonder whether the derived convergence rates are optimal or whether better rates are possible under the same conditions, either for Tikhonov regularization or for any other reconstruction method. In order to compare different methods we define:

Definition 2.9. Let F : X → Y. Then for any (linear or nonlinear) mapping R : Y → X the worst case error at noise level δ on the set K ⊂ X is defined as

D_R(δ, K) := sup { ‖R(g_obs) − f‖ : f ∈ K, g_obs ∈ Y, ‖F(f) − g_obs‖ ≤ δ }.

We will say that a method R converges of optimal order on the set K if there exists a constant c ≥ 1 such that D_R(δ, K) ≤ c inf_{R̃} D_{R̃}(δ, K). In order to determine whether a method is of optimal order, we hence need a universal lower bound for the worst case error. This bound is given by the following:

Definition 2.10. Let F : X → Y be injective and K ⊂ X. Then the modulus of continuity of (F|_K)^{-1} is defined as

ω(δ, K) := sup { ‖f_1 − f_2‖ : f_1, f_2 ∈ K, ‖F(f_1) − F(f_2)‖ ≤ δ }.

Lemma 2.11 (compare [EHN96, Rem. 3.12]). The worst case error of any (linear or nonlinear) reconstruction method R : Y → X satisfies the lower bound D_R(δ, K) ≥ (1/2) ω(2δ, K). Indeed, for f_1, f_2 ∈ K with ‖F(f_1) − F(f_2)‖ ≤ 2δ there exists g_obs ∈ Y with ‖F(f_i) − g_obs‖ ≤ δ for i = 1, 2, hence ‖f_1 − f_2‖ ≤ ‖f_1 − R(g_obs)‖ + ‖R(g_obs) − f_2‖ ≤ 2 D_R(δ, K), and taking the supremum over the left hand side yields the claim.
It therefore remains to calculate lower bounds for the modulus of continuity on suitable sets K. In the case of linear operators and spectral source conditions a suitable source set is given by

K_ϕ(ρ) := { f_0 + ϕ(T*T)w : ‖w‖ ≤ ρ }.

For Tikhonov regularization we know that on these sets the worst case error is bounded by c ρ ϕ(Θ^{-1}(δ/ρ)) by Theorem 2.5.

Proposition 2.12 (lower bound, [MP03, Thm. 1]). Let ϕ be an index function and Θ be defined as in Theorem 2.5. (a) If Θ^{-1}(δ/ρ) ∈ σ(T*T), then ω(δ, K_ϕ(ρ)) ≥ ρ ϕ(Θ^{-1}(δ/ρ)).
Furthermore it can be shown that if the gaps in the spectrum are bounded on a logarithmic scale and ϕ is of type (2.3), then there exists a constant c > 1 such that the estimate in Proposition 2.12(a) also holds true for Θ^{-1}(δ/ρ) ∉ σ(T*T) if we multiply the left hand side by c; hence Tikhonov regularization is always of optimal order in this case.
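The lower bound can be probed numerically on the toy problem of Example 2.7. The sketch below (illustrative only; the single-spike candidates and the truncation level are ad-hoc choices) evaluates elements f = ρ n^{-ν} e_n of the source set, which satisfy ‖T f‖ = ρ n^{-ν-1}; together with f_2 = 0 each feasible spike yields ω(δ, K) ≥ ‖f‖, and the resulting bound scales like δ^{ν/(ν+1)}, matching the upper bound from Theorem 2.5 up to a constant:

```python
import numpy as np

def spike_bound(delta, nu, rho, N=10**6):
    # Lower bound on omega(delta, K) for (T f)_n = f_n / n and the source set
    # K = {(T*T)^{nu/2} w : ||w|| <= rho}, via the candidates f = rho n^{-nu} e_n.
    n = np.arange(1, N + 1, dtype=float)
    f_norm = rho * n ** (-nu)          # ||f|| of the spike at index n
    g_norm = rho * n ** (-nu - 1.0)    # corresponding ||T f||
    return f_norm[g_norm <= delta].max()

nu, rho = 0.4, 1.0
ratios = [spike_bound(d, nu, rho) / d ** (nu / (nu + 1.0))
          for d in (1e-3, 1e-4, 1e-5)]
print(ratios)   # all close to 1: the modulus scales like delta^{nu/(nu+1)}
```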

Parameter choice rules
The previous two sections have illustrated how to obtain order optimal convergence rates for Tikhonov regularization under the assumption of a spectral source condition (2.2). Achieving this rate required the choice of the regularization parameter α according to (2.4). In practice, however, it is usually unknown what type of source condition the solution fulfills, i.e. ϕ and ρ are unknown. Hence one would like to have a criterion which selects the best solution f_j from a sequence of minimizers of the Tikhonov functional, f_j := f_{α_j}. Here best means that the selection criterion will lead to order optimal convergence rates. We will discuss two of these methods here, the sequential discrepancy principle and the Lepskiȋ principle, which both operate on the set of regularization parameters

α_j := α_0 q^j, j ∈ N, (2.8)

for some sufficiently large α_0 > 0 and q ∈ (0, 1). The idea is to evaluate after the computation of f_j for j = 1, . . . , J which f_j is an order optimal solution with low computational cost.

The sequential discrepancy principle
The discrepancy principle introduced by Morozov [Mor66] is based on the following heuristic idea: if for the true solution f† solving T f† = g† we have ‖g† − g_obs‖ ≤ δ, then it does not make sense to look for approximate solutions f_α such that ‖T f_α − g_obs‖ ≪ δ.
As generally the residual decreases as α decreases, one should instead look for the largest α such that ‖T f_α − g_obs‖ ≈ δ.
For simplicity let f_0 = 0, let τ > 1 be a parameter and assume that the signal-to-noise ratio fulfills ‖g_obs‖/δ > τ. Then we choose the regularization parameter ᾱ_d by the rule

‖T f_{ᾱ_d} − g_obs‖ ≤ τδ ≤ ‖T f_{ᾱ_d/q} − g_obs‖, (2.9)

where ᾱ_d ∈ (α_j)_{j∈N} as defined in (2.8). By the functional calculus we get that T f_α − g_obs = −α (TT* + αI)^{-1} g_obs, which shows that the dependence of the residual on α is continuous and monotone. As the limits of the residual are given by lim_{α→∞} ‖T f_α − g_obs‖ = ‖g_obs‖ and lim_{α→0} ‖T f_α − g_obs‖ = ‖P_{ran(T)^⊥} g_obs‖ ≤ δ, we see that the sequential discrepancy principle is well defined if α_0 is large enough. Moreover, note that one can directly check after the computation of each f_j whether the criterion (2.9) is fulfilled by a simple evaluation of the forward operator; hence the discrepancy principle is numerically cheap to compute.
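A minimal implementation of the rule (2.9) for the diagonal toy operator of Example 2.7 might look as follows (a sketch; τ = 2, q = 1/2, α_0 = 1 and the noise model are arbitrary admissible choices, not values from the thesis):

```python
import numpy as np

def sequential_discrepancy(sigma, gobs, delta, tau=2.0, alpha0=1.0, q=0.5):
    # Walk down alpha_j = alpha0 * q**j and return the first Tikhonov
    # minimizer whose residual drops below tau * delta, cf. rule (2.9).
    alpha = alpha0
    for _ in range(200):
        f = sigma * gobs / (sigma**2 + alpha)    # minimizer for diagonal T
        if np.linalg.norm(sigma * f - gobs) <= tau * delta:
            return f, alpha
        alpha *= q
    return f, alpha

N = 10_000
n = np.arange(1, N + 1, dtype=float)
sigma = 1.0 / n                 # (T f)_n = f_n / n
f_true = 1.0 / n
delta = 1e-4
gobs = sigma * f_true
gobs[0] += delta                # perturbation of norm exactly delta
f_rec, alpha = sequential_discrepancy(sigma, gobs, delta)
print(alpha, np.linalg.norm(f_rec - f_true))
```

Only one residual evaluation per candidate α_j is needed, reflecting the low computational cost noted above.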
A proof of the corresponding convergence rate result for Hölder-type source conditions (2.3a) can also be found in [EHN96, Thm. 4.17 and Rem. 4.18] and a version for logarithmic source conditions in [Hoh00, Thm. 7]. There is an important difference between this result and Theorem 2.5 if one considers Hölder-type source conditions with index function ϕ_H^ν. The discrepancy principle yields order optimal rates only for ν ∈ (0, 1], since otherwise the condition ϕ_H^ν ≼ √· is violated. Indeed it can be shown that for Hölder-type source conditions with ν > 1 order optimal rates for this parameter choice rule are achieved only in very special cases, see [Gro84, Theorem 3.3.6]. Instead of using the sequential version presented above one can also solve the equation G(t) = 0 where G(t) = ‖T f_{1/t} − g_obs‖² − (τδ)². One can show that this function is twice differentiable and convex; therefore the equation is solvable via Newton's method and the iteration converges globally. It can be formulated in a way such that the only inverse operators appearing are of the form (T*T + (1/t) I)^{-1}, which have to be computed anyway. This is the original discrepancy principle; however, the sequential version described here generalizes more easily to more general Tikhonov functionals, see [AHM14].

The Lepskiȋ principle
The Lepskiȋ principle builds on the idea of splitting the error as in (2.6). It originates from [Lep90] and was further developed in [MP03, BH05, Mat06]. Note that the bound on the propagated data noise error (2.7a) is known a-priori and decreases in α (respectively increases in j if we pick α_j by (2.8)); we will denote it by Φ_noi(j). The approximation error (2.7b) on the other hand depends on the unknown smoothness of f† and has a bound, denoted by Φ_app(j), that is increasing in α (respectively decreasing in j). Hence we have an error decomposition of the form

‖f_j − f†‖ ≤ Φ_app(j) + Φ_noi(j). (2.10)

Ideally, after computing f_1, . . . , f_J for some J ∈ N, we would pick the index j_or ∈ {1, . . . , J} as a parameter choice rule such that

j_or ∈ argmin_{j∈{1,...,J}} ‖f_j − f†‖.

As such a parameter choice rule requires knowledge of the unknown exact solution, it is called an oracle choice. Adopting the idea of balancing the two errors which led to (2.5), another idea would be to choose j_* given by

j_* := min { j ∈ {1, . . . , J} : Φ_app(j) ≤ Φ_noi(j) },

but as the approximation error is unknown this rule is still not implementable. Even so, we know that for j ≥ j_* the total error is bounded by 2Φ_noi(j), which can be used to define the parameter choice. As for all k > j ≥ j_* the inequality ‖f_j − f_k‖ ≤ 4Φ_noi(k) holds true, we finally obtain the implementable parameter choice rule

j_Lep := min { j ∈ {1, . . . , J} : ‖f_j − f_k‖ ≤ 4Φ_noi(k) for all k ∈ {j + 1, . . . , J} },

which is the so called Lepskiȋ principle.
Theorem 2.14 (see [Mat06]). Let f† fulfill (2.2) for some index function ϕ and w ∈ X such that ‖w‖ ≤ ρ and ϕ ≼ id. Let α_j for j = 1, . . . , J be generated by (2.8) and let J be large enough such that Φ_noi(J) ≥ Φ_app(J). Then j_Lep ≤ j_* and with ᾱ_Lep := α_0 q^{j_Lep} the estimate ‖f_{ᾱ_Lep} − f†‖ ≤ 6 Φ_noi(j_*) holds true. The condition Φ_noi(J) ≥ Φ_app(J) guarantees that the choice of j_* is well defined, which is needed throughout the analysis. For Tikhonov regularization it can be shown that Φ_app(j) ≤ 2‖f†‖, and thus we have to choose J large enough such that Φ_noi(J) ≥ 2‖f†‖. It is evident that the main difference between the discrepancy principle and this approach is that we cannot determine when to stop the computation of the f_j on the fly but have to compute a fixed number of iterations as detailed above. In return we get order optimal convergence rates even for Hölder-type source conditions with ν ∈ (1, 2].
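For the same toy problem the Lepskiȋ balancing can be sketched as below. The stopping index is the smallest j whose reconstruction stays within 4Φ_noi(k) of all later candidates; this is one common formulation, and all concrete values (noise model, grid of α_j, bound Φ_noi(j) = δ/(2√α_j)) are illustrative assumptions rather than choices made in the thesis:

```python
import numpy as np

def lepskii_index(f_list, phi_noi):
    # Smallest j such that f_j stays within 4 * phi_noi(k) of every later
    # candidate f_k; one common formulation of the Lepskii balancing principle.
    J = len(f_list)
    for j in range(J):
        if all(np.linalg.norm(f_list[j] - f_list[k]) <= 4.0 * phi_noi[k]
               for k in range(j + 1, J)):
            return j
    return J - 1

N = 10_000
n = np.arange(1, N + 1, dtype=float)
sigma, f_true = 1.0 / n, 1.0 / n
delta = 1e-4
gobs = sigma * f_true
gobs[0] += delta

alphas = [0.5**j for j in range(40)]                    # alpha_j = alpha_0 q^j
f_list = [sigma * gobs / (sigma**2 + a) for a in alphas]
phi_noi = [delta / (2.0 * np.sqrt(a)) for a in alphas]  # bound (2.7a)

j_lep = lepskii_index(f_list, phi_noi)
print(j_lep, np.linalg.norm(f_list[j_lep] - f_true))
```

In contrast to the discrepancy principle, all J candidates must be computed before the comparisons can be made.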

Nonlinear operators
If the operator F is nonlinear, (2.2) is not well defined. Assuming that the operator is Fréchet differentiable, we replace it by

f† − f_0 = ϕ(F′[f†]* F′[f†]) w.

For Hölder index functions one additionally assumes that the derivative F′ is Lipschitz continuous with constant L; as many operators appearing in practical applications are smooth, this condition is not very restrictive. Furthermore the source condition with ν ≥ 1 implies that there exists some w̃ ∈ X such that f† − f_0 = F′[f†]* w̃ (also see Section 2.2 about range conditions). If the smallness assumption L‖w̃‖ < 1 is satisfied, then one obtains the same rates as in Theorem 2.5, see [EKN89, Neu89].
But for weaker source conditions this is not sufficient and stronger nonlinearity assumptions are necessary. Here two different concepts have been used:
• In [HNS95] the tangential cone condition

‖F(f_1) − F(f_2) − F′[f_2](f_1 − f_2)‖ ≤ η ‖F(f_1) − F(f_2)‖^a ‖f_1 − f_2‖^b (2.13)

for all f_1, f_2 in a ball around f†, with some a ∈ [0, 1], b ∈ [0, 2] and η > 0 usually assumed to be small, was introduced. Then if either a = 1 and η < 1 or a ≥ 1 + ν(1 − a − b) > 0 and ϕ = ϕ_H^ν, Theorem 2.5 remains valid even for ν ∈ (0, 1], see [HS94, Tau97]. Lipschitz continuity of the derivative as above implies a tangential cone condition with a = 0 and b = 2; however, for convergence rates one needs a ≈ 1. Hence this condition is much more restrictive, especially since for ill-posed problems ‖F(f_1) − F(f_2)‖ can decay much faster than ‖f_1 − f_2‖.
• Starting from [SEK93] a further condition, (2.14), for all f_1, f_2 in a ball around f† has been considered. Convergence rates as in Theorem 2.5 under smallness assumptions on k_0 ‖f† − f_0‖ for ϕ = ϕ_H^ν have been derived in [JH99, TJ03] and for general source conditions in [MN07].
The two conditions were put in relation in [Kal08]. Assume that for f_1, f_2 close enough to f† we have either (2.15a) for some C_r > 0 and β ∈ [0, 1], or (2.15b) for the same range of parameters. Then on the one hand it can be shown that (2.15a) with β = 1 is equivalent to (2.14). On the other hand (2.15b) with β > 0 implies (2.13) with a = 1 and b = β; the tangential cone condition in a small enough neighborhood then follows with η = 2C_r/(1 + β). In summary, obtaining convergence rates from spectral source conditions for nonlinear operators is still possible. Strong assumptions on the nonlinearity of F, however, have to be imposed for weak source conditions, which guarantee that F is reasonably well approximated by its linearization F′ and that F does not change too rapidly. But these conditions are very hard to verify and may even fail to hold, see e.g. [BH14].

Range Conditions
Coming back to linear operators T, we will now generalize two specific cases to a Banach space setting. Consider the case that f† fulfills a Hölder-type source condition (2.3a) with ν = 1 or ν = 2; then, using the fact that ran(T*) = ran((T*T)^{1/2}), we can write these two specific source conditions as

f† − f_0 ∈ ran(T*) and f† − f_0 ∈ ran(T*T),

respectively. We know that for proper parameter choice rules these conditions lead to convergence rates for ∆_{½‖·‖²}(f_α, f†) = ½‖f_α − f†‖² of O(δ) and O(δ^{4/3}) respectively. Note that this formulation of the two conditions does not require the functional calculus, hence one might hope for a generalization. The aim of this section is to provide such a generalization that yields similar rates in the non-Hilbert space setting when S(g, g_obs) = (1/q)‖g − g_obs‖^q for some q > 1.
If we explicitly state the dependence on α in (1.5P) and take the limit α → 0 for true data g†, we obtain the minimization problem

min { R(f) : f ∈ X, T f = g† } (2.16)

which by our assumptions on R (see Assumption 1.4) has the unique solution f†. Requiring that strong duality (i.e. one of the points of Theorem 1.9(c)) holds true for this problem, we obtain the extremal relation

T* p ∈ ∂R(f†) for some p ∈ Y*. (2.17)

Interpreting this equation as a source condition was first done in [BO04] and the following result has been achieved:

Theorem 2.15. Let f† fulfill the source condition (2.17). Then there exists a c > 0 such that the estimate ∆_R^{T*p}(f_α, f†) ≤ c (δ^q/α + α^{1/(q−1)} ‖p‖^{q/(q−1)}) holds true. In particular for α ∼ δ^{q−1} the convergence rate ∆_R^{T*p}(f_α, f†) = O(δ) follows.

Note that for the Hilbert space case and classical Tikhonov regularization (2.17) is equivalent to f† = T*p and therefore to the Hölder source condition with index ν = 1. As ∆_{½‖·‖²}(f_1, f_2) = ½‖f_1 − f_2‖², we obtain the same rate as expected by Theorem 2.5.
To motivate the stronger source condition, assume that Y and Y* are uniformly convex and uniformly smooth. We then replace in (2.16) the equality constraint T f = g† by a minimization property to obtain the problem (2.18), whose solution is again f† by assumption. Expressing the minimization property by the first order optimality conditions and using single-valuedness of the duality mapping, we can conclude that T f ∈ g† + j_{Y*,q}(ker(T*)) after applying the likewise single-valued operator (j_{Y,q})^{-1} = j_{Y*,q} on both sides. Note that j_{Y*,q}(ker(T*)) = j_{Y*}(ker(T*)), as the two duality maps only differ by a scaling, and that this set is a closed subspace of Y. Indeed j_{Y*}(ker(T*)) is symmetric, homogeneous and convex as the set of solutions to a convex optimization problem. Hence if we require strong duality to hold for the problem (2.18) we obtain as extremal relation p ∈ (j_{Y*}(ker(T*)))⁰, where for a set K we denote by K⁰ the set K⁰ := { f* : ⟨f*, f⟩ = 0 for all f ∈ K }. It holds that (j_{Y*}(ker(T*)))⁰ = j_{Y*}((ker(T*))⁰) if Y is a Hilbert space; assuming that the same relation holds true in a Banach space, we get that p ∈ j_{Y*}(cl(ran T)). Requiring that even p ∈ j_{Y*}(ran T) is fulfilled, see [Res05, Sec. 2] for further motivation, one obtains (2.19), which will again be regarded as a source condition.

Rates by Stability Estimates
Convergence rates for Tikhonov regularization can also be obtained from conditional stability estimates without any further assumptions restricting the nonlinearity of the operator F. Conditional stability estimates are of interest e.g. in parameter identification problems and have been proven for a lot of different applications, see e.g. [Isa06] as well as the references in Chapters 5 and 6. They have the form

‖f_1 − f_2 | X‖ ≤ C(ρ) Ψ(‖F(f_1) − F(f_2) | Y‖) for all f_1, f_2 ∈ ρB_Z, (2.20)

where C : (0, ∞) → (0, ∞) is a monotonically increasing function, Ψ is a concave index function and ρB_Z is the set of all f ∈ Z with ‖f | Z‖ ≤ ρ, where Z ⊂ X is a dense, continuously embedded subspace.
Assuming that the true solution satisfies f† ∈ Z, Tikhonov regularization with the functional (2.21), whose penalty term is built from the norm of Z, yields convergence rates of the order of the stability estimate. The main idea of the proof is to find a ρ that is large enough to ensure that f_α, f† ∈ ρB_Z and then apply the stability estimate.
One important difference to classical Tikhonov regularization is that the parameter choice rule violates the conditions of Theorem 1.8. Hence we cannot guarantee that f α → f † in Z for the given parameter choice rule.
In order to get a feeling for how this approach compares to spectral source conditions, we return to Example 2.7.

Example 2.18. For s ∈ R let ℓ²_s(N) denote the space of all sequences f for which the norm ‖f | ℓ²_s(N)‖ := ‖(n^s f_n)_{n∈N} | ℓ²(N)‖ is finite. Then for f† and T as in Example 2.7 we obtain that f† ∈ ℓ²_s(N) for s < 1/2. Results for two different stability estimates will be discussed here:

(a) For X = Y = ℓ²(N) and Z = ℓ²_s(N) for some s ∈ (0, 1/2), the stability estimate

‖f | ℓ²(N)‖ ≤ ρ^{1/(s+1)} ‖T f | ℓ²(N)‖^{s/(s+1)} for all f ∈ ρB_Z

holds true, and the exponents in this inequality are optimal. Thus Theorem 2.17 implies the same rate as we obtained in Example 2.7. Note, however, that we need to know the smoothness of f† (that is, the fact that f† ∈ ℓ²_s(N) = Z) in advance, while the rate in Example 2.7 can be achieved with a-posteriori parameter choice rules without knowing the specific spectral source condition.
(b) Besides the previous stability estimate, T is also Lipschitz stable, namely

‖f | ℓ²_{−1}(N)‖ ≤ ‖T f | ℓ²(N)‖ for all f ∈ ℓ²(N).

So by changing the functional setting to X = ℓ²_{−1}(N) and Z = ℓ²(N) we obtain a convergence rate of O(δ) in the norm of ℓ²_{−1}(N), which is the rate of a well-posed problem and therefore faster than the rate obtainable by classical Tikhonov regularization with an ill-posed operator. It is, however, obtained in a weaker norm, and we do not know whether f_α → f† with respect to ℓ²(N). Note that the knowledge f† ∈ ℓ²_s(N) for 0 < s < 1/2 is not used in this result.
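The Lipschitz stability claimed in (b) is immediate to check numerically: with the weighted norm ‖f | ℓ²_s‖ = ‖(n^s f_n)‖, the operator (T f)_n = f_n/n is even an isometry from ℓ²_{−1} into ℓ². A small sketch (illustrative; finite truncation and random test vector are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
n = np.arange(1, N + 1, dtype=float)

def norm_s(f, s):
    # norm of the weighted sequence space l^2_s: ||f | l^2_s|| = ||(n^s f_n)||
    return np.linalg.norm(n**s * f)

f = rng.standard_normal(N)
Tf = f / n                     # (T f)_n = f_n / n
# T maps l^2_{-1} isometrically into l^2, so T^{-1} is Lipschitz stable
print(abs(norm_s(f, -1.0) - np.linalg.norm(Tf)))
```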
To summarize the results so far: from the first estimate we could get the convergence rates as before, but we would need to know the exact smoothness of f† in advance, which is unrealistic in practical applications. From the second estimate we achieve a faster rate, but this rate is obtained in a weak norm and we cannot use the full smoothness of our true solution. These shortcomings have recently been addressed in [EH18]. There it is assumed that the involved spaces are part of a so called Hilbert scale (for the usage of Hilbert scales in regularization theory see e.g. [Nat84, Neu92]): given a Hilbert space X and a compact operator L : X → X, we define X_t for t ∈ R as the set of all elements such that ‖f | X_t‖ := ‖L^{-t} f | X‖ is finite. An example of a Hilbert scale was already given in Example 2.18; further examples are L²-Sobolev spaces on T^d.
Theorem 2.19 (see [EH18]). Let (X_t)_{t∈R} be a Hilbert scale and assume that (2.20) holds true with X = X_{−a}, Z = X_b and Ψ(δ) = δ^γ, with −a ≤ b, a ≥ 0 and 0 < γ ≤ 1. Assume that f† ∈ dom(F) ∩ X_s for b ≤ s ≤ 2b + a and let f_α be the minimizer of the Tikhonov functional (2.21); then there exists a constant c > 0 depending on ‖f† | X_s‖ such that

‖f_α − f† | X_b‖ ≤ c δ^{γ(s−b)/(s+a)}

holds true for the parameter choice α = δ^{2−2γ(s−b)/(s+a)}.
The same rates can be obtained via an a-posteriori parameter choice rule based on the discrepancy principle. The advantage of this result compared to Theorem 2.17 is that the smoothness of the exact solution has to be known only roughly. Coming back to Example 2.18(b), we see that we recover again the rate O(δ^{s/(s+1)}) for s ∈ (0, 1/2), with c depending on s, which is not surprising as the parameter choice rule coincides with the rule of Theorem 2.5. It was, however, derived quite differently. In particular, if the operator were nonlinear, none of the conditions discussed in Section 2.1.4 would have been needed. We will sketch the proof of a similar result in Section 2.4.3.3 with the help of variational source conditions and in Section 6.3.1 extend it to a scale of Besov spaces.

Variational Source Conditions
Up to now we have seen several different approaches to achieve convergence rates for Tikhonov regularization: (a) In Section 2.1 we have seen how to obtain rates for classical Tikhonov regularization via spectral source conditions. However, this theory builds heavily on the functional calculus; hence generalizing these results in a form such that we get rates for the general Tikhonov functional (1.3) seems impossible. Already turning to nonlinear problems as discussed in Section 2.1.4 turns out to be quite difficult, as the forward operator F does not only have to be differentiable but also needs to fulfill nonlinearity conditions which are often very hard to verify or may not even hold true.
(b) Section 2.2 provided a way to prove rates in a more general setting. However, the scale of rates which can be obtained is rather limited. The conditions (2.17) and (2.19) allow one to recover the rates for spectral source conditions with Hölder-type source function ϕ_H^ν for ν = 1 and ν = 2 respectively. But Example 2.3 has shown that these are rather special cases even in a Hilbert space setting, and might not have a meaningful interpretation.
(c) We obtained a large set of convergence rates in Section 2.3 by deriving them from stability estimates which are often quite well known for inverse problems. While the nonlinearity of the operator was not restricted, the main problem was that in general the smoothness of the solution needed to be known at least roughly a priori.
A key step to overcome the shortcomings of these approaches has been provided in [HKPS07]: here a condition of the form

⟨f*, f† − f⟩ ≤ β_1 ∆_R^{f*}(f, f†) + β_2 ‖F(f) − F(f†)‖

has been used, where f* ∈ ∂R(f†), with some parameters β_1 ∈ (0, 1), β_2 ∈ [0, ∞). If S(g_1, g_2) = (1/q)‖g_1 − g_2‖^q for some q > 1, this leads to the convergence rate ∆_R^{f*}(f_α, f†) = O(δ) for the parameter choice α = δ^{q−1}. This idea has been further developed in a series of papers [BH10, Fle10, Gra10] leading to the following: We say that f† fulfills a variational source condition (for short: VSC) for a loss function E and a concave index function ψ if

∀ f ∈ D : E(f, f†) ≤ R(f) − R(f†) + ψ(T_{g†}(F(f))). (2.22)

As with spectral source conditions, the difficulty lies in the interpretation of the condition, which is the main subject of this thesis. In the following we will first present the corresponding convergence rates result before providing a strategy to verify variational source conditions, which will be an important tool later on. This strategy is also one of the main tools to relate variational source conditions to the previous convergence rate results.

Convergence rates
This assumption now allows us to derive convergence rates easily.
Further, the following rate is obtained in the image space. If there exists err such that err ≥ err(F(f_α)), then the infimum of the right hand side of (2.23a), with err(F(f_α)) replaced by err, is attained at α = ᾱ. For convenience abbreviate T := T_{g†}(F(f_α)). As f_α is a minimizer of the Tikhonov functional, we obtain together with the definition of the noise level an estimate valid for all λ ∈ [0, 1). Replacing T = T_{g†}(F(f_α)) inside the square bracket by τ and taking the supremum over all τ ≥ 0 yields a bound on the loss. For λ = 0 this gives (2.23a); if λ = 1/2 we get (2.23b) by nonnegativity of the loss function and rearranging terms.
Remark 2.21. Let there exist err such that err ≥ err(F(f_α)). Then in order to get convergence rates one does not necessarily need the VSC (2.22) for all f ∈ D; instead any of the following is sufficient: (a) If one can characterize the set { f_α : f_α is a minimizer of T_{g_obs,α} } in advance, then it is enough that (2.22) is fulfilled on this set.
(b) It suffices that (2.22) holds on a set { f ∈ D : ∆_R(f, f†) ≤ ρ } for any ρ > 0, since by Theorem 1.8 we know that if err → 0 we will eventually get ∆_R(f_α, f†) ≤ ρ, and afterwards we can proceed as above.
For the two most common index functions introduced in (2.3), the behavior of the rate functions in (2.23a) can be made explicit: for Hölder-type functions this is possible for ν ∈ (0, 2), and for logarithmic functions the corresponding limiting behavior holds true as well, see [Fle11].

A meta-theorem to prove variational source conditions with Bregman loss
If the loss function is given by a multiple of the Bregman distance, that is E(f, f†) = β ∆_R^{f*}(f, f†), and the constant β satisfies β ∈ (0, 1), we can rewrite the variational source condition into the following form:

∀ f ∈ D : ⟨f*, f† − f⟩ ≤ (1 − β) ∆_R^{f*}(f, f†) + ψ(T_{g†}(F(f))).

Note that by Theorem 2.20 the constant β plays only a minor role; it influences the constant in front of the convergence rate O(ψ(err)), but not the limiting behavior, which is determined only by ψ. Further, a VSC with constant β implies a VSC with constant β̃ ∈ (0, β) for the same function ψ. Therefore we will focus on verifying variational source conditions with β = 1/4 in the form above, that is inequalities of the form

∀ f ∈ D : ⟨f*, f† − f⟩ ≤ (3/4) ∆_R^{f*}(f, f†) + ψ(T_{g†}(F(f))).

In principle our strategy presented below could be used to show that a VSC for any β ∈ (0, 1) is fulfilled, but we choose β = 1/4 for consistency with Section 2.4.4.3. A main ingredient of our strategy will be the following assumption.
Assumption 2.22. There exist constants C_∆ > 0 and r > 1 such that

∆_R^{f*}(f_2, f_1) ≥ C_∆ ‖f_1 − f_2‖^r for all f_1, f_2 and f* ∈ ∂R(f_1).

Example 2.23. The following provides a list of examples where Assumption 2.22 is fulfilled:
(a) Let X be a Hilbert space; then choosing R = ½‖·‖²_X yields ∆_R^{f*}(f_2, f_1) = ½‖f_1 − f_2‖², so Assumption 2.22 holds with r = 2 and C_∆ = 1/2.
(b) Let X be a Banach space which is r-convex, see Definition B.1. Then there exists a constant C_X depending only on r such that ∆_R^{f*}(f_2, f_1) ≥ C_X ‖f_1 − f_2‖^r for R = (1/r)‖·‖^r_X.
(c) Let X = L¹(Ω) and R be the Kullback-Leibler divergence with respect to a reference element, R(f) = ∫_Ω (f ln(f/f_0) − f + f_0) dx. Then the corresponding Bregman distance is given by ∆_R^{f*}(f_2, f_1) = KL(f_2; f_1) = ∫_Ω (f_2 ln(f_2/f_1) − f_2 + f_1) dx. If f_1 and f_2 are probability density functions, then the Kullback-Leibler divergence fulfills the estimate

KL(f_2; f_1) ≥ ½ ‖f_1 − f_2‖²_{L¹(Ω)},

showing that Assumption 2.22 is met with r = 2 and C_∆ = 1/2. A similar estimate for positive, uniformly bounded functions with X = L²(Ω) is proven by the same authors.
(d) An example where Assumption 2.22 is not fulfilled is given by X = ℓ¹(N) and R = ‖·‖_{ℓ¹(N)}: here any f* ∈ ∂R(f_1) satisfies f*(n) = sgn(f_1(n)) whenever f_1(n) ≠ 0, and thus if sgn(f_1(n)) = sgn(f_2(n)) for all n ∈ N we have ∆_R^{f*}(f_2, f_1) = 0, which cannot be bounded from below in the desired form for f_1 ≠ f_2.

Theorem 2.24. Let X and Y be Banach spaces and R a penalty term such that Assumption 2.22 is fulfilled. Let f† ∈ D and f* ∈ ∂R(f†). Suppose that there exists a family of operators P_j : X* → X* for j ∈ J, an index set, such that for some functions κ, σ, γ : J → [0, ∞), a constant ϑ ∈ (0, r) and some index functions φ, φ̃ such that φ and φ̃^{r/(r−ϑ)} are concave, the conditions (2.28a)-(2.28c) hold true for all j ∈ J. Then f† satisfies a variational source condition with the index function ψ given by (2.29).

Condition (2.28a) describes the smoothness of the solution (actually rather the smoothness of the subdifferential, but in the examples considered later one of the two uniquely determines the other, see Section 4.2), whereas (2.28c) describes the local degree of ill-posedness of the problem. Of course the two are not independent of each other: given estimates of the form (2.28), any reparametrization of the set J provides new estimates with different functions κ, σ and γ.
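As an aside, the Pinsker-type estimate of Example 2.23(c), KL(f_2; f_1) ≥ ½‖f_1 − f_2‖²_{L¹}, is easy to probe numerically in a discrete setting. The following sketch (not part of the thesis; dimension and sample count are arbitrary) checks the ratio of the two sides over random probability vectors:

```python
import numpy as np

def kl(p, q):
    # discrete Kullback-Leibler divergence KL(p; q) = sum p_i ln(p_i / q_i),
    # i.e. the Bregman distance of the negative entropy
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
worst = np.inf
for _ in range(500):
    p = rng.uniform(0.1, 1.0, 10); p /= p.sum()
    q = rng.uniform(0.1, 1.0, 10); q /= q.sum()
    ratio = kl(p, q) / (0.5 * np.abs(p - q).sum() ** 2)
    worst = min(worst, ratio)
print(worst)   # stays above 1, in line with Pinsker's inequality
```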
Example 2.25. We now justify these interpretations.
(a) Consider the case that X, Y are Hilbert spaces and T is an injective, compact operator with singular system (f_k, g_k, σ_k)_{k∈N}. Set P_j f = Σ_{k : σ_k² ≥ j} ⟨f, f_k⟩ f_k for J = (0, ∞). As f* = f†, we obtain that (2.28a) holds true with a function κ that measures the decay rate of the coefficients of f† in the system (f_k)_{k∈N}. When the f_k are trigonometric polynomials this measures classical smoothness.
We get an inequality of the form (2.28c) with σ measuring the decay rate of the singular values of T relative to the decay rate of the coefficients of f† in the singular system; therefore it measures the local degree of ill-posedness. In contrast, the upper bound only depends on the norm of f† and the decay of the singular values, thus it measures the global degree of ill-posedness; further possible definitions of this term are discussed in [HK10].
(b) In [HS94] the following construction is used to define the local degree of ill-posedness for nonlinear operators F acting between Hilbert spaces: Denote by G a suitable class of linearizations of F near f† and choose from this class the operator Ĝ with the smallest nullspace (to avoid G ≡ 0) and the slowest decay of singular values. Then the decay rate of the singular values of Ĝ is the local degree of ill-posedness of the operator. In addition one assumes a comparison estimate between F and Ĝ for some c > 0 and β > 0. If such an estimate holds true for Ĝ and β ∈ (0, 2), then we can proceed as in the first part of the example (at least in a small enough neighborhood, see Remark 2.21) with Ĝ instead of T and obtain σ(j) as above with the singular values of Ĝ replacing those of T; however, we will get φ(t) = t^{β/2}. Thus our definition of the local degree of ill-posedness extends [HS94]. This definition of local degree of ill-posedness is also connected to the nonlinearity conditions studied earlier. If (2.13) is fulfilled for some a > 0 then a corresponding comparison estimate holds (see [HS94, Cor. 2 and Prop. 6]). Hence we can conclude that the nonlinearity of F is already included in our condition.
(c) The notion of local degree of ill-posedness was also defined for noncompact linear operators acting between Hilbert spaces in [HK10, Sec. 4]. We will discuss the compatibility of this definition with ours after Lemma 2.29.
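The diagonal computation behind part (a) can be sketched numerically. The model below (singular values σ_k = k^{-1} and coefficients ⟨f†, f_k⟩ = k^{-s}) is purely illustrative and not taken from the text; it checks that κ(j) = ‖(I − P_j)f†‖ decays at the power-law rate j^{(2s−1)/4} obtained by comparing the two decay rates.

```python
import numpy as np

# Illustrative diagonal model (an assumption, not from the text):
# singular values sigma_k = k^{-1}, coefficients <f_dagger, f_k> = k^{-s}.
s = 1.5
K = 100_000
k = np.arange(1, K + 1)
sigma2 = k**-2.0               # sigma_k^2
c = k**-s                      # coefficients of f_dagger

# kappa(j) = ||(I - P_j) f_dagger|| with P_j keeping modes sigma_k^2 >= j
js = np.logspace(-8, -2, 50)
kappa = np.array([np.sqrt((c[sigma2 < j]**2).sum()) for j in js])

# for sigma_k = k^{-1} one expects kappa(j) ~ j^{(2s-1)/4}; check on log-log
slope = np.polyfit(np.log(js), np.log(kappa), 1)[0]
print(f"fitted decay exponent: {slope:.3f}")   # close to (2s-1)/4 = 0.5 here
```

Reparametrizing j (e.g. j → j²) changes the fitted exponent, illustrating the remark that κ, σ and γ are only determined up to a reparametrization of J.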
Before proving the theorem we first show that a function of the form (2.29) always gives rise to a concave index function.
Lemma 2.26. Let J be an index set, κ̃ : J → [0, ∞) a function such that inf_{j∈J} κ̃(j) = 0, and (φ̃_j)_{j∈J} a family of concave index functions. Then ψ(t) := inf_{j∈J} [κ̃(j) + φ̃_j(t)] is a concave index function.

Proof. Since ψ is defined as an infimum over concave monotonically increasing functions, ψ is itself a concave monotonically increasing function. As φ̃_j is continuous for all j ∈ J, ψ is upper semicontinuous and hence continuous on (0, ∞). Since inf_{j∈J} κ̃(j) = 0, for every ε > 0 there is a j with κ̃(j) < ε and φ̃_j(t) → 0 as t → 0, so limsup_{t→0} ψ(t) ≤ ε. This shows that ψ is continuous at t = 0 and therefore indeed a concave index function.

Proof of Theorem 2.24:
First consider the degenerate case in which the variational source condition holds true even with ψ ≡ 0; then there is nothing to show.

Otherwise, using (2.28a), (2.28c) and Young's inequality, we get an estimate for each j ∈ J. Taking the infimum over the right hand side with respect to j ∈ J yields (2.29). Since φ and φ̃^{r/(r−ϑ)} are concave, we see that φ̃_j is a concave index function for all j ∈ J; thus the claim follows by Lemma 2.26 with κ̃(j) = cκ(j)^r for c as above.
For linear operators one can often choose γ ≡ 0 and φ(t) = t^{1/2}, and the additional restriction that (2.28c) is needed only for f with ‖f† − f | X‖ small is not necessary, as already seen in the specific example. For some nonlinear operators, however, the more general case turns out to be useful, see e.g. [HW15, HW17b] as well as Chapters 5 and 6.
gained some attention. In this case convergence rates can (in a different manner than above) be derived when the VSC holds true with this loss function. A further advantage of this condition is that converse results are available, see [Fle17]. As a VSC of this form can equivalently be rewritten, we can use Theorem 2.24 with some small modifications to verify VSCs with this loss. For β = 0 it can be proven as before, and for β > 0 additional information on f* is needed. As we need the VSC only on minimizers of Tikhonov functionals (see Remark 2.21), this information can be obtained from the first order optimality conditions. If for example X = Y = L²(T^d) for the classical Tikhonov functional and T = T* is such that ran(T) = H^a(T^d), then f* = (T*T + αI)^{−1} T* g^obs ∈ H^a for all possible data g^obs, and thus the rate of decay of κ depends on whether f† or f* is less smooth.
(b) In case X = ℓ¹ and R(·) := ‖· | ℓ¹‖ the corresponding Bregman distance is not very informative, as seen in Example 2.23. However, the choice E(f, f†) = ‖f − f† | ℓ¹‖ was successfully used in a series of papers, see [FG18] and references therein.
Further strategies or explicit verifications for VSCs are given in the following:

Remark 2.28. Besides our results the following verifications of VSCs are known:
• For a phase retrieval and an option pricing problem VSCs with ψ(t) = √t were derived in [HKPS07].
• Spectral source conditions, range conditions and stability estimates all imply that VSCs hold true, see Section 2.4.3 for details. However, then the VSC does not yield additional information.
• In case X = ℓ¹, R(·) := ‖· | ℓ¹‖, E(f, f†) = ‖f − f† | ℓ¹‖ and a linear forward operator, convergence rates are often derived based on sparsity assumptions (i.e. the true solution has only finitely many nonzero coefficients), see e.g. [LT08, GHS08]. An analysis where the sparsity assumption is violated was developed in a series of papers starting with [BFH13] and leading to [FG18]. For an analysis in the spirit of the setup presented here that extends to the nonlinear case see [HM18]. Furthermore, the results have been extended to elastic net regularization in [CHZ17].
• Based on a sparsity assumption a VSC was verified for the autoconvolution problem in [BFH16].
The choice of operators in Example 2.25(a) extends to general linear operators acting between Hilbert spaces by setting J = (0, ∞) and P_j := I − E_j, where E_j := E_j(T) denotes the spectral family of T*T. For this choice of the operator family the notions of smoothness and local ill-posedness are often equivalent.
Lemma 2.29. Let X and Y be Hilbert spaces, T : X → Y be linear and injective, f † ∈ X and κ an index function.
Assume that for all j ∈ [0, ∞) and all f ∈ X the inequality (2.32) holds true.

Proof. The proof relies on the functional calculus.
Proof of (a): Since I − E_j is a projection we obtain a representation involving (T*)†, where (T*)† denotes the pseudoinverse of T*. Partial integration yields the corresponding integral bound. The assumptions ‖E_λ f† | X‖ ≤ κ(λ) and the monotonicity of t^{µ−1}κ(t)² then imply the desired estimate. Combining the last two inequalities and taking the square root yields the claim.
Proof of (b): Splitting the interval (0, j) uniformly on a logarithmic scale we get that

Evaluating (2.32) at j/2^{k+1} and a suitable f, the right hand side can be estimated accordingly, and thus the estimate holds true. Inserting into the expression for ‖E_j f† | X‖ and using the expression for κ then yields the claim. Let us check the requirement of (a) for Hölder type functions κ(j) = ϕ^H_ν(j) = c_ν j^{ν/2}: it holds that j^{µ−1}κ(j)² is decreasing if and only if ν ∈ (0, 1) and µ ∈ (0, 1 − ν). Hence for these cases our notions of smoothness and local ill-posedness given by (2.28a) and (2.28c) are actually equivalent.
In [HK10] a measure of local ill-posedness was defined by the decay behavior of ‖E_j f† | X‖ as j → 0, which we call smoothness of f†. The theorem above thus shows that our definition via (2.28c) is an extension of theirs. Indeed, Lemma 2.29(a) does not only hold for Hölder type index functions but also for logarithmic index functions κ(j) = ϕ^L_p(j) = c_p(−log(min{j, t_0}))^{−p}, as there always exists a t_0 depending on p such that the condition on κ is fulfilled.

Relation to previous convergence rate results
Next we study the relation between variational source conditions and the concepts presented in Sections 2.1 to 2.3 to show that they are a generalization of these concepts.

Relation to spectral source conditions
We will now use Theorem 2.24 to derive VSCs from spectral source conditions. That this is possible has been known for some time, see [HY10,Fle11], however it required a far more complicated proof.
Theorem 2.30. Let T be injective and let f† fulfill a spectral source condition (2.2) such that ϕ² is concave. Then f† fulfills a VSC (2.26) with an index function determined by ϕ.

Proof. We proceed similarly to the proof of Lemma 2.29 and set J = (0, ∞) with P_j := I − E_j. Then on the one hand we obtain, as ϕ is monotonically increasing, that (2.28a) is fulfilled with κ(j) = ϕ(j). As ϕ is an index function we note that (2.28b) is satisfied, since for injective operators lim_{j→0} ‖(I − P_j) f† | X‖ = 0.
On the other hand we can rewrite the remaining term, since the two operators P_j and ϕ(T*T) are self-adjoint and commute. With the Cauchy-Schwarz inequality we therefore obtain the desired bound,

where the last estimate follows from the fact that ϕ² is concave and ϕ(0) = 0. Therefore we get (2.28c). As a consequence a variational source condition holds true. Choosing j such that the two terms in the square brackets are equal, the claim follows.
Note that the spectral source condition again allows one to derive an ill-posedness estimate similar to Lemma 2.29(a), but under slightly relaxed assumptions on the index function. Indeed, the above theorem allows Hölder index functions with ν ≤ 1.
As the Bregman distance in the Hilbert space setting is given by the square of the norm, we immediately see that by using Theorem 2.20 we obtain the same convergence rates as implied by Theorem 2.5. To illustrate that variational source conditions are indeed a weaker condition we return to the problem discussed in Examples 2.7 and 2.18.
Example 2.31. Using the calculations of Example 2.25(a) we know how to choose κ; hence by Lemma 2.29(a) there is a c > 0 such that we get (2.28c), as well as φ(t) = √t and γ ≡ 0. Therefore we obtain by Theorem 2.24 that a VSC holds true with ψ(t) = ct^{1/3} by the choice j = t^{2/3}. Now using Theorem 2.20 we see that there exists a parameter choice rule ᾱ such that even the corresponding convergence rate holds true, a rate which we could not prove with the help of spectral source conditions.
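The balancing j = t^{2/3} can be checked numerically. The diagonal model below (σ_k = 1/k and ⟨f†, f_k⟩ = 1/k) is an illustrative assumption chosen so that κ(j) ≍ j^{1/4} and σ(j) ≍ j^{-1/4}; it is a sketch of the mechanism, not the exact example of the text.

```python
import numpy as np

# Assumed diagonal model: sigma_k = 1/k, <f_dagger, f_k> = 1/k (illustration only)
K = 100_000
k = np.arange(1, K + 1)
sigma2 = 1.0 / k**2
coeff2 = 1.0 / k**2

js = np.logspace(-6, 0, 150)
# kappa(j)^2: tail of the coefficients over modes with sigma_k^2 < j
kappa2 = np.array([coeff2[sigma2 < j].sum() for j in js])
# sigma(j): Cauchy-Schwarz bound over the kept modes, weighted by 1/sigma_k
sig = np.array([np.sqrt((coeff2[sigma2 >= j] / sigma2[sigma2 >= j]).sum())
                for j in js])

def psi(t):
    # infimum over j of kappa(j)^2 + sigma(j) * sqrt(t), cf. the form of (2.29)
    return (kappa2 + sig * np.sqrt(t)).min()

ts = np.logspace(-6, -2, 6)
slope = np.polyfit(np.log(ts), np.log([psi(t) for t in ts]), 1)[0]
print(f"psi(t) ~ t^{slope:.2f}")   # exponent close to 1/3
```

The fitted exponent confirms that balancing a j^{1/2} smoothness term against a j^{-1/4}√t ill-posedness term at j ≈ t^{2/3} produces ψ(t) ≍ t^{1/3}.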
Corollary 2.32. Let F fulfill the tangential cone condition (2.13) with a = 1, b = 0 and some η > 0, assume that F′[f†] is injective and that f† fulfills (2.12) such that ϕ² is concave. Then f† fulfills a variational source condition.

Proof. By Theorem 2.30 we see that f† fulfills a VSC with respect to the linearization F′[f†]. By the triangle inequality and the tangential cone condition the data fidelity term of the linearization can be bounded by that of F, hence we can resubstitute while only losing a constant.
Similar to Example 2.25(b), this result underlines that variational source conditions are well suited for getting a convergence rate in the nonlinear case since they combine source condition and nonlinearity assumption into one condition.

Relation to range conditions
If the data fidelity term is given by T_{g†}(g) = (1/q)‖g − g†‖^q for some q > 1, then we have seen that we can bound err ≤ (2/q)δ^q. Thus Theorem 2.20 implies a convergence rate, and therefore a rate of O(δ) for the specific cases ψ(t) = ct^{1/q} or ψ(T_{g†}(g)) = c‖g − g†‖. This is the same rate we obtained in Section 2.2 under condition (2.17).
The connection between those two conditions has already been studied in [HKPS07] and is one of the motivations for the introduction of VSCs in the given form.

Note that (a)(iii) is equivalent to (2.26). One of the main downsides of VSCs is that the previous proposition gives a limit on the convergence rates that are achievable.
This further illustrates that the restriction to concave index functions for (2.22) is a natural one, as we even need ψ(t^q) to be concave in order to not end up in this particular case.

Relation to stability estimates
Variational source conditions are also closely connected with stability estimates. Assume for a moment that a variational source condition with uniform constant β and function ψ holds true for a whole class of functions K ⊂ D. We will see later that this is a quite mild assumption and can often be achieved by (2.22). If further the loss function is either symmetric or bounded from below by a symmetric loss functional Ẽ (e.g. if it fulfills Assumption 2.22), then we obtain a stability estimate for Ẽ, since one of the two terms ±(R(f_1) − R(f_2)) will be bounded from above by 0. On the other hand it is unclear how to verify a VSC from (2.33) for the same penalty term R directly, as the term R(f) − R(f†) might be negative, and we further require the VSC to hold on the set D, which is usually much larger than K. However, if K = B_Z ∩ dom(F), then a stability estimate can indeed be used to verify a VSC where the penalty term is given by a norm power of Z. We will use this in order to prove Theorem 2.19 for the choice r = b. To keep things simple we will stick to the case of a Hilbert scale generated by a compact operator L; later on we will apply the same ideas in a Besov space setting, see Sections 4.4 and 6.3.1. We will assume as in (2.20) that a stability estimate holds with a monotonically increasing index function Ψ such that Ψ ∘ √· is concave. Further the true solution fulfills f† ∈ X_1 with ‖f† | X_1‖ bounded, where 1/2 ≤ s < 1. Note that the numbering of the scales is no restriction, since by redefining which space we consider as X_0 and replacing L by L^ν for some ν > 0 this is always possible, and we are immediately in the setting of Theorem 2.19.
As in Example 2.25(a) let (f_j, g_j, σ_j)_{j∈ℕ} be the singular system of L. Set P_j f = Σ_{k≤j} ⟨f, f_k⟩ f_k (note that this is just a reparametrization of the choice in Example 2.25(a); however, it allows a more explicit calculation). Setting R(f) = (1/2)‖f | X_s‖², we see by Example 2.23 that Assumption 2.22 holds true for X = X_s with r = 2 and C_∆ = 1/2. Moreover we get that f* = f† and hence can use that f* ∈ X_1. This provides the estimate (2.28a). As 2s ≥ 1, an estimate as above gives a corresponding bound. Since (2.28c) needs only be fulfilled for f with ‖f† − f | X_s‖ ≤ (8/3)‖f† | X_s‖, we can assume the bound ‖f | X_s‖ ≤ 4‖f† | X_s‖, which with the previous estimates gives us (2.28c) with γ ≡ 0. Choosing j such that σ_j ≈ φ(t) then shows that a VSC holds true. Note that, as explained above, this VSC implies again a stability estimate. The derived VSC yields for all f_1, f_2 ∈ dom(F) ∩ B_{X_1} the corresponding stability estimate.

Further results on variational source conditions
Lastly we want to discuss some further properties of convergence rates analysis assuming that a VSC holds true. First we will focus on a question arising from Proposition 2.34, namely whether it is also possible to obtain convergence rates of δ^{ν/(ν+1)} for ν ∈ (1, 2], as is possible with Hölder type spectral source conditions. In case of linear operators we will see that imposing a VSC on the dual problem allows such enhanced convergence rates. The second issue extends Section 2.1.3; we will point out that the two a posteriori parameter choice rules presented there also provide convergence rates of the same order as the a priori parameter choice (2.24a). Lastly we show a result similar to Proposition 2.8, namely that VSCs with Bregman loss are always fulfilled if the forward operator is injective.

Higher order rates
The question arises whether faster rates up to O(δ^{4/3}), as seen in Theorem 2.5, are achievable with VSCs. For linear operators this question has been answered in [Gra13], see also [SH18] for even faster rates obtained by using Bregman iteration as a regularization method. The idea is roughly the following: If we want to achieve faster rates than O(δ), then at least the condition that guarantees the rate of O(δ) should hold true, and hence by Proposition 2.33 we should have f* = T*p̄ for f* ∈ ∂R(f†). Further we will see that p̄ is itself a minimizer of a Tikhonov type functional, and thus we suppose that p̄ fulfills a VSC for this problem.
Recall from Section 2.2 that the motivation for the condition f* = T*p̄ was strong duality for the minimization problem defining f. This implies that p̄ is a solution to the dual problem. Hence, adding ⟨f†, T*p̄⟩ − R*(T*p̄), which does not depend on p, we see that p̄ solves a problem which can be understood as a Tikhonov functional with forward operator T*, data fidelity term ∆_{R*}^{f†}(·, T*p̄) and penalty term (1/q)‖·‖^q. Therefore a VSC for p̄ will take a corresponding form, especially if ψ(t) = t^{1/r} for some r > 1. Note that in the case where ψ(t) = t^{1/r} we always obtain a convergence rate with ∆_R(fᾱ, f†) = o(δ), but again we cannot be arbitrarily fast.
Lemma 2.36 ([Gra13, Lem. 5.1]). Let the assumptions of Theorem 2.35 be fulfilled and assume that R* is twice differentiable at T*p̄.
Hence the VSC on p̄ fills the gap between the two range conditions (2.17) and (2.19). Therefore, if f† does not minimize R, convergence rates of O(δ^µ) with µ ∈ (1, 4/3] are still achievable by requiring a VSC on the dual solution.

A posteriori parameter choice rules
The main convergence rate theorem 2.20 for VSCs states convergence rates under the a priori parameter choice rule (2.24a). However, as explained in Section 2.1.3 a posteriori parameter choice rules are favorable. We have already seen two of these methods in Section 2.1.3 and will explain how these two can be adapted to the setting of VSCs and more general Tikhonov functionals.

Discrepancy principle
The sequential discrepancy principle has been studied for the choices S(g_1, g_2) = T_{g_1}(g_2) = (1/q)‖g_1 − g_2‖^q for q > 1 in [HM12, AHM14]. Together with a classical noise model we have a well defined noise level δ; thus for some parameter τ > 0 choosing ᾱ_d according to (2.9) is well defined. Assuming that the true solution fulfills a VSC of the form (2.22) then yields the convergence rate if the following two criteria are met. In order to ensure that ᾱ_d < ∞ we again have to ensure some kind of data compatibility; one needs that τδ ≤ ‖g^obs − F(f_min)‖ for all f_min ∈ arg min R. Further, the case of exact penalization has to be excluded, i.e. the case that the Tikhonov functional recovers the true solution f† from exact data g† for α small enough. This is e.g. guaranteed if F and R are Gateaux-differentiable, or more generally if R is convex and F admits a suitable bounded linear approximation; otherwise dom(F) may be a convex set with empty interior and hence a derivative of F is not well defined.
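As an illustration, the sequential discrepancy principle can be sketched for a toy diagonal problem with q = 2. All model details below (σ_k = 1/k, f†_k = 1/k, the noise construction and the constants τ and the geometric factor) are assumptions for the sketch, not taken from [HM12, AHM14].

```python
import numpy as np

# Hypothetical diagonal problem: T has singular values sigma_k = 1/k,
# the truth is f_k = 1/k, and the loss is the squared norm (q = 2).
rng = np.random.default_rng(0)
K = 500
sigma = 1.0 / np.arange(1, K + 1)
f_true = 1.0 / np.arange(1, K + 1)
g_true = sigma * f_true

delta = 1e-3
noise = rng.standard_normal(K)
g_obs = g_true + delta * noise / np.linalg.norm(noise)  # ||g_obs - g_true|| = delta

def tikhonov(alpha):
    # minimizer of ||T f - g_obs||^2 + alpha ||f||^2 for diagonal T
    return sigma * g_obs / (sigma**2 + alpha)

# sequential discrepancy principle: decrease alpha along a geometric
# sequence until the residual drops below tau * delta
tau, q_geo, alpha = 2.0, 0.5, 1.0
while np.linalg.norm(sigma * tikhonov(alpha) - g_obs) > tau * delta:
    alpha *= q_geo

f_alpha = tikhonov(alpha)
print(alpha, np.linalg.norm(f_alpha - f_true))
```

The loop terminates because the residual tends to zero along the geometric sequence in this finite-dimensional model, mirroring the data compatibility requirement above.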

Lepskiȋ principle
The essence of the Lepskiȋ principle is the error split (2.10). Note that the same analysis as in Section 2.1.3.2 can be carried out if the left hand side is replaced by some metric d, see [Mat06]. The error split for an analysis of the Lepskiȋ principle under a VSC is provided by (2.23a). Again we have an approximation error of the form Φ̃_app = (−ψ)*(−1/(C_err α)), depending on the unknown smoothness of f† characterized by ψ, and a propagated data noise error of the form Φ̃_noi = (1/α) err(F(f_α)). If E is a metric and the error is globally bounded by err, then this split provides a convergence rate of E(f_{j_Lep}, f†) ≤ Cψ(err) as err → 0, as shown in [HM12].
When E is given by the Bregman distance, a similar analysis has been carried out under Assumption 2.22. As the validity of (2.22) with Bregman loss then implies a VSC with the same function ψ and E(f_1, f_2) ≥ C_∆‖f_1 − f_2‖^r, one obtains an error decomposition of the form (2.10). This decomposition then implies again the convergence rate when the error is globally bounded by err, see [Wer12, Cor. 3.43]. An analysis of the Lepskiȋ principle where the error is not globally bounded but only bounded with high probability can be found in [WH12, Thm. 5.1].
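A minimal sketch of the Lepskiȋ balancing in this setting: scanning a geometric grid of regularization parameters from large to small, one keeps the first reconstruction that stays within a multiple of the propagated noise bound of all later ones. The diagonal model, the bound Φ̃_noi(α) = δ/√α and the constant 4 are illustrative assumptions for the sketch.

```python
import numpy as np

# Assumed toy model: sigma_k = 1/k, f_k = 1/k; Hilbert-space Tikhonov
rng = np.random.default_rng(1)
K = 500
k = np.arange(1, K + 1)
sigma = 1.0 / k
f_true = 1.0 / k
delta = 1e-3
noise = rng.standard_normal(K)
g_obs = sigma * f_true + delta * noise / np.linalg.norm(noise)

alphas = [0.5**i for i in range(30)]                      # decreasing candidates
recs = [sigma * g_obs / (sigma**2 + a) for a in alphas]   # Tikhonov reconstructions
phi_noi = [delta / np.sqrt(a) for a in alphas]            # assumed noise bound

# first (largest) alpha whose reconstruction is consistent with all
# smaller candidates up to a multiple of their noise bounds
j_lep = next(j for j in range(len(alphas))
             if all(np.linalg.norm(recs[j] - recs[i]) <= 4 * phi_noi[i]
                    for i in range(j + 1, len(alphas))))
print(alphas[j_lep], np.linalg.norm(recs[j_lep] - f_true))
```

The selected parameter needs no knowledge of ψ: the pairwise comparisons implicitly locate the crossing of the (unknown) approximation error and the (known) noise bound.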

Validity of variational source conditions with Bregman loss
We have seen in Proposition 2.8 that for linear and injective forward operators a spectral source condition (2.2) always holds true. Combined with Theorem 2.30 this implies that also a variational source condition will hold true if ϕ 2 is concave. Hence the question arises whether a similar result is obtainable for the more general situation.
It was recently shown in [Fle18] that if F is injective and T_{g_1}(g_2) = (1/q)‖g_1 − g_2‖^q for q > 1, this is indeed the case under the general Assumptions 1.4 for some β > 0.
Following the same ideas we show that this generalizes to general T_{g†} and fixed β = 1/4. The main idea of the proof is to use approximate variational source conditions, a concept which also leads to convergence rates for general Tikhonov regularization. It was however shown in [Fle11, Sec. 12.4] that this concept is equivalent to the concept of variational source conditions, therefore we will only study the basics here. The main idea behind approximate source conditions is to choose a benchmark function Φ, which has to be a concave index function, and then try to measure how close f† is to fulfilling a VSC with ψ = rΦ for all r ≥ 0. This gives rise to the definition of the distance function D. Obviously D is a monotonically decreasing function and D(r) ≥ D̃(f†, r) = 0. Before showing how to obtain a VSC from the distance function D we first need some further properties of this function.
Lemma 2.37. Let f* ∈ ∂R(f†); then for all f ∈ X the stated inequality holds.

Proof. Writing out the left hand side and using Young's inequality A.14 on ⟨f*, f⟩ and ⟨f*, f†⟩ with the functionals R and R* yields the claim.

Lemma 2.38. Let F be injective; then lim_{r→∞} D(r) = 0.
Proof. By the properties of D stated above we may assume that D(r) > 0 for all r ≥ 0. We first show that the supremum in the definition of D is attained for each r ≥ 0. Let (f_n)_{n∈ℕ} be a maximizing sequence. We may assume without loss of generality that D̃(f_n, r) > 0, and using Lemma 2.37 we obtain a uniform bound. Now let (r_n)_{n∈ℕ} be a sequence such that lim_{n→∞} r_n = ∞ and define a new sequence (f_n)_{n∈ℕ} ⊂ X of maximizers by D(r_n) = D̃(f_n, r_n). As above we obtain a uniform bound. Using again lower semicompactness, the sequence of maximizers has a convergent subsequence, denoted by (f̃_n)_{n∈ℕ} with associated (r̃_n)_{n∈ℕ}. We claim that f̃_n → f†. Indeed, as D(r̃_n) > 0, the nonnegativity of the Bregman distance and Lemma 2.37 yield a bound which forces the data fidelity of f̃_n to vanish as r̃_n → ∞. As the loss is lower semicontinuous we get F(f̃_n) → g†, and by continuity and injectivity of F that f̃_n → f†. Therefore D(r̃_n) → 0. This implies D(r) → 0 as r → ∞, since D is monotonically decreasing (as noted earlier).
Now it is very easy to prove that a VSC is fulfilled for all f † .
Proof. By Lemma 2.38 we know that lim_{r→∞} D(r) = 0. Therefore the function ψ as defined above is a concave index function by Lemma 2.26, as we can set κ̃(r) = D(r) and φ̃_r(t) = rΦ(t) for r ≥ 0. As the corresponding estimate holds for all r ≥ 0, the claim follows by minimizing the right hand side with respect to r.
Note that the proof of this theorem is, together with Lemma 2.38, in principle constructive. The main challenge is to compute the function D(r). In a Hilbert space setting a good choice of the benchmark function is Φ(t) = t, as then the computation of D(r) is reduced to a quadratic optimization problem; note, however, that a VSC with ψ(t) = rΦ(t) will only be fulfilled if f† ∈ arg min R (see Proposition 2.34).
For the simple example considered in this chapter, starting with Example 2.7, the optimization problem can be solved explicitly.
Therefore one obtains an explicit expression for D(r). Inserting f†_n = 1/n then yields a bound which shows that a variational source condition with ψ(t) = ct^{1/3} is fulfilled (as already seen in Example 2.31).
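For a diagonal model the per-mode quadratic maximization can be carried out in closed form. The expression below, D(r) = Σ_n 1/(2n² + 2r), rests on the illustrative assumptions σ_n = 1/n, f†_n = 1/n and a particular normalization of the per-mode quadratic problem, so it sketches the mechanism rather than reproducing the text's exact computation.

```python
import numpy as np

# Assumed diagonal model: the per-mode supremum of a concave quadratic
# collapses to terms of the form 1/(2 n^2 + 2 r)
n = np.arange(1, 200_001)
rs = np.logspace(2, 6, 30)
Ds = np.array([(1.0 / (2.0 * n**2 + 2.0 * r)).sum() for r in rs])
slope_D = np.polyfit(np.log(rs), np.log(Ds), 1)[0]     # expect about -1/2

# psi(t) = inf_r [ D(r) + r * Phi(t) ] with benchmark Phi(t) = t
ts = np.logspace(-9, -5, 6)
psi = [float((Ds + rs * t).min()) for t in ts]
slope_psi = np.polyfit(np.log(ts), np.log(psi), 1)[0]  # expect about 1/3
print(f"D(r) ~ r^{slope_D:.2f}, psi(t) ~ t^{slope_psi:.2f}")
```

The fitted exponents show how a distance function decaying like r^{-1/2} turns, via the infimum over r, into the index function ψ(t) ≍ t^{1/3} seen before.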
For non-Hilbert space settings, however, solving the involved optimization problem is a difficult task on its own, as the computation of D(r) for each r is equivalent to minimizing a Tikhonov functional. Hence verifying VSCs via the explicit computation of D(r) does not seem to be a viable strategy.

CHAPTER III EQUIVALENCE RESULTS IN HILBERT SPACES
An old French mathematician said: A mathematical theory is not to be considered complete until you have made it so clear that you can explain it to the first man whom you meet on the street. This clearness and ease of comprehension, here insisted on for a mathematical theory, I should still more demand for a mathematical problem if it is to be perfect; for what is clear and easily comprehended attracts, the complicated repels us.

DAVID HILBERT in "Mathematical Problems"
In this chapter we study the simplest setup of inverse problems, namely linear inverse problems in Hilbert spaces, as published in [HW17a]. As reviewed in Section 2.1, convergence rate theory for such problems has been studied for some time. Most importantly, several conditions have been shown to be equivalent to convergence rates; the first major results are in [Neu97] for Tikhonov regularization with Hölder rates. Combining results of [FHM11, Fle12] shows that for certain index functions one of these conditions is that a VSC is fulfilled. The proof requires the concept of approximate source conditions we briefly touched upon in Section 2.4.4.3.
In Section 3.1 we will show equivalence of VSCs to a generalization of a condition introduced in [Neu97]; namely the decay of the spectral projections. By showing equivalence of VSCs on the one hand and the speed of the spectral decay on the other we sidestep the usage of approximate source conditions. Neubauer further proved that the speed of the spectral decay is equivalent to the approximation quality of Tikhonov regularization. The approximation property can be seen as a noise-free convergence rate as it measures how fast f α → f † with respect to α where the observed data is given by g obs = g † . This equivalence result was recently extended to more general regularization methods and index functions measuring the approximation rate in [AEdHS16]. These new results, however, exclude iterative regularization methods. In Section 3.2 we illustrate that under slightly different assumptions on the regularization methods which include prominent iterative methods the equivalence remains valid.
By Theorem 2.20 we hence know that the three conditions VSC, spectral decay and approximation quality are all sufficient in order to get convergence rates. We will show in Section 3.3 that for the deterministic error model they are even necessary. Results of this type, showing that a sufficient condition is even necessary, are called converse results. Our proof will be done in a two-step approach: first we prove equivalence to convergence rates with an oracle parameter choice and then to rates for more general parameter choice rules.
Another converse result is due to [And15]. It shows that the set of functions where the spectral decay is of Hölder type can be characterized as an interpolation space. We point out in Section 3.5 that for many interesting applications these interpolation spaces are actually Besov spaces, even for infinitely smoothing operators. This underlines the typical interpretation of source conditions as smoothness assumptions as discussed in Example 2.3 and extends it to VSCs.
We further derive a converse result for the white noise error model in Section 3.4. In statistics one is typically interested in the following converse result: one wants to characterize the maxisets of an estimator, i.e. the maximal set where the considered estimators converge with a certain speed. This is already covered by our result, and we can characterize the maxisets of many estimators as Besov spaces. This fits well with known results: in statistics, maxisets of wavelet methods for the estimation of the density of i.i.d. random variables have been characterized as Besov spaces in [KP93]. Furthermore, for thresholding and more general wavelet estimators maxisets have been investigated in [KP00, KP02], and their results have been generalized in [Riv04] for inverse problems in a sequence space model, where again Besov spaces were obtained as maxisets.
Lastly, in Section 3.6 we summarize our results of this chapter by applying them to a set of important inverse problems. Thereby we connect VSCs, smoothness assumptions and convergence rates for deterministic and white noise errors respectively.

Spectral Tail Condition
We start by recalling Theorem 2.30, which shows that for certain index functions ϕ a spectral source condition (2.2) implies that a VSC is fulfilled. In the proof we use the spectral source condition to obtain an estimate in which the operator P_j is given by the spectral projection. Specializing to the Hölder case this means that the spectral source condition f† = (T*T)^{ν/2} w implies ‖E_j f†‖ ≤ cj^{ν/2} and a convergence rate of δ^{ν/(ν+1)}. Between the two implications the following relation was observed in [Neu97] for ν ∈ (0, 2]: the decay rate of the spectral distribution function j ↦ ‖E_j f†‖ is necessary and sufficient in order to obtain Hölder convergence rates. It was further shown that ‖E_j f†‖ = O(j^{ν/2}) implies a Hölder-type source condition f† = (T*T)^{µ/2} w with 0 < µ < ν, but not with µ = ν. As a specific case of this we refer to Examples 2.7 and 2.31.
But combining Lemma 2.29 and Theorem 2.24 shows that already an estimate of the form ‖E_j f†‖ ≤ ϕ(j) is enough in order to get a VSC. As VSCs are sufficient for (not only Hölder) rates, we will, motivated by [Neu97] and Lemma 2.29, investigate the relation between VSCs and the decay of the spectral distribution function. To this end we define for an index function ϕ a space that measures the decay rate of the tail of the spectral distribution of elements in X. The nomenclature "tail" is rooted in considering compact operators, where I − E_j for j > 0 is a projection operator onto a finite dimensional subspace of X.
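The implication "spectral source condition ⇒ tail decay" can be sanity-checked in an assumed diagonal model: if f = (T*T)^{ν/2} w, then ‖E_j f‖² = Σ_{λ_k ≤ j} λ_k^ν w_k² ≤ j^ν ‖w‖², i.e. ‖E_j f‖ ≤ j^{ν/2}‖w‖. The eigenvalue sequence below is an arbitrary illustrative choice.

```python
import numpy as np

# Assumed diagonal model of T*T with eigenvalues lambda_k = k^{-2}
rng = np.random.default_rng(2)
K = 50_000
lam = 1.0 / np.arange(1, K + 1)**2        # eigenvalues of T*T
w = rng.standard_normal(K)
w /= np.linalg.norm(w)                     # ||w|| = 1
f = lam**0.25 * w                          # f = (T*T)^{nu/2} w with nu = 1/2

for j in [1e-2, 1e-4, 1e-6]:
    tail = np.sqrt((f[lam <= j]**2).sum())   # ||E_j f||
    assert tail <= j**0.25 + 1e-12           # bound j^{nu/2} * ||w||
    print(j, tail, j**0.25)
```

The converse direction is exactly where the gap appears: tail decay of order j^{ν/2} only yields source conditions with exponent µ < ν, as stated above.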

since Θ_ϕ is monotonically increasing. The concavity of ϕ² yields the inequality ϕ²(ct) ≤ max{1, c}ϕ²(t), which applied to the last line of the previous equation provides the claim.
The main idea is to use Theorem 2.24 and Lemma 2.29(a).
Due to the monotonicity of t^{µ−1}ϕ(t)² we can apply Lemma 2.29(a) to achieve (2.28c) with the corresponding σ(j). Thus we can conclude by the meta-theorem that a VSC is fulfilled. Recall the remarks after Lemma 2.29 and Theorem 2.30: if ϕ is a Hölder type index function, then ϕ² is concave for ν ∈ (0, 1] and t^{µ−1}ϕ(t)² is decreasing if and only if ν ∈ (0, 1) and µ ∈ (0, 1 − ν), while logarithmic type index functions always fulfill both conditions.

Approximation Property of Regularization Methods
Up to now we only considered Tikhonov regularization; if f_0 = 0, it is given by f_α = (T*T + αI)^{−1}T*g^obs. More generally, a large class of regularization methods in a Hilbert space setting is given by spectral filters. The function q_α is called a filter; e.g. the filter for Tikhonov regularization is given by q_α(λ) = 1/(λ + α). While further examples are discussed below, we mention that e.g. the conjugate gradient method applied to the normal equation with a stopping rule is not of this form, as it is nonlinear in the input data. Closely related to the filter is the residual function r_α(λ) := 1 − λq_α(λ), which measures the approximation error (or bias in the statistical context) of the regularization method. In order to ensure the regularization property and favorable characteristics of the method we will impose the following assumption on the filter and residual function throughout the rest of this chapter. Parts (a)-(c) are assumptions that guarantee regularization properties. Indeed, R_α is a bounded operator due to (a), and as a result we have stability. Furthermore we obtain that r_α(0) = 1 by definition and therefore the desired convergence property. While (a) and (c) are standard assumptions, usually instead of (b) only a relaxed version of (3.2), namely |r_α(λ)| ≤ C_4 for C_4 ∈ (0, ∞), is demanded to ensure the regularization property. However, we require later on that r_α and q_α do not change sign, which is guaranteed by (b). Items (d) and (e) will similarly be needed for converse results. They require a smooth transition at λ = α between modes that are accurately reconstructed, i.e. modes with r_α(λ) ≈ 0 or equivalently q_α(λ) ≈ 1/λ, and modes of the order r_α(λ) ≈ 1 which are neglected for stability reasons.
Example 3.3. The following provides a list of regularization methods which fulfill Assumption 3.2. If not stated otherwise we haveα = ∞.
More information on these methods can be found in [EHN96]. Well-known methods which do not satisfy the assumptions above are spectral cut-off (that is, r_α(λ) = 1 if λ ≤ α and r_α(λ) = 0 else), as it violates (e) as well as the continuity in (b), and the ν-methods, as they violate the monotonicity requirements in (b) and (d).
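The filters just discussed can be written down directly. The check below verifies the defining identity r_α(λ) = 1 − λq_α(λ) and the sign/boundedness behavior for Tikhonov and Landweber, and exhibits the jump of spectral cut-off that violates the continuity in (b); the step size and grid are arbitrary choices for the sketch.

```python
import numpy as np

# filter/residual pairs; r = 1 - lambda * q is the defining identity
lam = np.linspace(1e-6, 1.0, 1000)

def tikhonov_q(lam, alpha):
    return 1.0 / (lam + alpha)

def landweber_q(lam, alpha, mu=1.0):
    # alpha = 1/n for n Landweber iterations with step size mu
    n = int(round(1.0 / alpha))
    return (1.0 - (1.0 - mu * lam)**n) / lam

for q in (lambda l: tikhonov_q(l, 0.01), lambda l: landweber_q(l, 0.01)):
    r = 1.0 - lam * q(lam)
    assert np.all(r >= 0) and np.all(r <= 1)      # sign and boundedness of r
    assert np.all(lam * np.abs(q(lam)) <= 1.0)    # boundedness of lambda * q

# spectral cut-off: its residual jumps from 1 to 0 at lambda = alpha,
# which is exactly the discontinuity excluded by the assumption
cutoff_r = np.where(lam <= 0.01, 1.0, 0.0)
print("max jump of cut-off residual:", np.abs(np.diff(cutoff_r)).max())
```

Both Tikhonov and Landweber residuals decrease monotonically in λ and transition smoothly around λ ≈ α, which is the qualitative content of items (d) and (e).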
In [AEdHS16, Def. 2.1] similar assumptions on filter functions are imposed in order to obtain results as in the current and the following section. Whereas (b) requires nonnegativity of r_α, they only impose that r_α² is monotonically decreasing. At the same time they require continuity of α ↦ r_α(λ), which rules out iterative methods. Further, k-times iterated Tikhonov regularization for k ≥ 3 does not fulfill their assumptions, as they require q_α(λ) ≤ C_2/√(αλ), which has been relaxed to r_α(α) ≥ C_2 in Assumption 3.2.

Let a regularization method fulfill Assumption 3.2 and let ϕ be an index function such that (3.3) holds with some C > 0 and µ > 1 for all λ > 0 and α ∈ (0, ᾱ]. Then the stated conditions are equivalent for all f† ∈ X; furthermore, the corresponding estimates hold, where the last inequality follows from (3.2).
Let from now on α ∈ (0, ‖T‖²). Using integration by parts, the integral on the right-hand side will now be split into two terms. For the first integral note that λ ↦ −r_α(λ)² and λ ↦ ‖E_λ f† | X‖² are monotonically increasing, which implies the first bound by (3.2). For the second integral we obtain estimates in which we use ‖f† | X^T_ϕ‖ ≤ A, (3.3), and Assumption 3.2(b) and (e), respectively. Combining the previous estimates we arrive at the stated bound. Using (3.3) again we get the final estimate, since ϕ is an index function and µ > 1, which implies the claim.

Convergence Rates Results
The previous two sections have pointed out that we have equivalence of VSCs, the decay of the spectral tail, and approximation properties of regularization methods. We already know by Theorem 2.20 that the first condition implies optimal convergence rates under certain parameter choice rules for Tikhonov regularization. In this section we go in the other direction: for certain classes of parameter choice rules, optimal convergence rates in the deterministic setting imply optimal approximation properties.

Convergence rates for an oracle parameter choice
Under our assumptions on the filter function the following bounds hold on the norm of the corresponding regularization method.

Equivalence Results in Hilbert Spaces
Lemma 3.6. Let a spectral regularization method satisfy Assumption 3.2. Then the bounds (1 − C_3)²/α ≤ ‖R_α‖² ≤ C_1/α hold for all α ∈ (0, ᾱ]. Proof. The upper bound follows by Assumption 3.2(a) and (3.2). For the lower bound we use Assumption 3.2(e).
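The two-sided bound of the lemma can be illustrated for the Tikhonov filter, where ‖R_α‖² = sup_λ λ q_α(λ)² attains the value 1/(4α) at λ = α. This numerical check is our own sketch, not part of the text:

```python
def tikhonov_norm_sq(alpha, lams):
    # ||R_alpha||^2 = sup over the spectrum of lam * q_alpha(lam)^2,
    # with the Tikhonov filter q_alpha(lam) = 1/(lam + alpha)
    return max(l / (l + alpha) ** 2 for l in lams)

alpha = 0.01
# logarithmic grid around lam = alpha, where the supremum is attained
grid = [alpha * 10 ** (k / 50.0) for k in range(-200, 201)]
norm_sq = tikhonov_norm_sq(alpha, grid)
```

So for Tikhonov regularization the lemma holds with C_1 = 1/4 (and in particular ‖R_α‖² ≤ C_1/α), consistent with the 1/α scaling.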
Both bounds will be used to derive the following theorem. Theorem 3.7. Let a spectral regularization method satisfy Assumption 3.2 and let ϕ be an index function such that there exists p ≥ 1 with (3.4) for all α > 0 and r ≥ 1 (i.e. ϕ does not grow faster than polynomially). Then for f† ∈ X the following are equivalent: Furthermore the corresponding estimates hold. Proof. (a) ⟹ (b): Using the upper bound of Lemma 3.6, the standard error splitting leads to

For δ ≤ Θ_ϕ(ᾱ) choose α = Θ_ϕ^{−1}(δ) ≤ ᾱ; then we obtain the claimed rate. For the converse direction we notice that only the middle term on the right is affected by a sign change of ξ. So if we take the supremum over ξ we may assume that this term is positive and therefore obtain the lower bound. Replacing ‖R_α‖² by the lower bound of Lemma 3.6, we note that the resulting expression is monotonically increasing in α by Assumption 3.2(d), which implies that for all α_* ∈ (0, ᾱ] we get the corresponding bound.
Note that the previous theorem involves a very strong performance concept, as

The advantage of our result is that it holds without further assumptions relating the index function ϕ to the chosen filter, see [AEdHS16, eqs. (23) & (24)]. We will, however, show in the next section that the above inequality can often be reversed while losing only a constant, and therefore recover the results of [Neu97, AEdHS16] under few additional assumptions.

Convergence rates for quasioptimal parameter choice rules
The following characterization of parameter choice rules is due to [RH07]. A parameter choice rule ᾱ is called • weakly quasioptimal for the regularization method (R_α)_α if there exist constants c > 0 and δ_0 > 0 such that the corresponding error bound holds, • and strongly quasioptimal for the regularization method (R_α)_α if there exist constants c > 0 and δ_0 > 0 such that the stronger bound holds. In many cases the constant c appearing in the definition can be calculated explicitly. Example 3.9. For the two parameter choice rules introduced in Section 2.1.3 the following was verified in [RH07]: • The discrepancy principle is strongly quasioptimal for regularization methods with infinite classical qualification, see the end of Section 3.2 for examples. It is, however, not even weakly quasioptimal for (iterated) Tikhonov regularization.
• The Lepskiȋ principle is weakly quasioptimal for all methods considered in Example 3.3.
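As a concrete illustration of one of these rules, the discrepancy principle can be sketched in a toy diagonal model (all names and the geometric search are our own choices; this is not the general formulation of Section 2.1.3): shrink α until the residual ‖T f_α − g^obs‖ first drops below τδ.

```python
import math

def tikhonov_estimate(g, s, alpha):
    # componentwise Tikhonov solution for a diagonal operator with
    # singular values s_j: f_j = s_j * g_j / (s_j^2 + alpha)
    return [sj * gj / (sj * sj + alpha) for sj, gj in zip(s, g)]

def residual_norm(g, s, alpha):
    # ||T f_alpha - g|| = ||r_alpha(s^2) g|| with r_alpha(l) = alpha/(l+alpha)
    return math.sqrt(sum((alpha / (sj * sj + alpha) * gj) ** 2
                         for sj, gj in zip(s, g)))

def discrepancy_alpha(g, s, delta, tau=2.0, alpha_hi=1e4, factor=0.5):
    # Morozov's discrepancy principle on a geometric grid: decrease alpha
    # until the (monotone in alpha) residual falls below tau * delta
    alpha = alpha_hi
    while residual_norm(g, s, alpha) > tau * delta and alpha > 1e-14:
        alpha *= factor
    return alpha

s = [1.0 / j for j in range(1, 21)]                 # hypothetical singular values
delta = 0.05
g = [sj + delta / math.sqrt(20) for sj in s]        # data for f = 1, noise norm delta
alpha_star = discrepancy_alpha(g, s, delta)
```

Since the residual is monotonically increasing in α, the loop stops at the first α on the grid whose residual is at most τδ, while the previous grid point still overshoots.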
The following lemma shows that for continuous regularization methods the notions of weak and strong quasioptimality coincide.
(c) Let R_α be given either by Landweber iteration with µ‖T*T‖ < 1 or by Lardy's method. If f† ≠ 0, the size of the gaps of ∆(f†) on a logarithmic scale is bounded by ln γ with γ as below. By definition of the operator norm, for every ε ∈ (0, 1) a suitable ξ exists (even one with additional structure). Indeed, let T = U(T*T)^{1/2} be the polar decomposition of T with a unitary operator U: X → ran(T) ⊂ Y. As (ker(R_α))^⊥ ⊂ ran(T) we may assume that ξ ∈ ran(T). Let ζ ∈ X be such that ξ = Uζ. By Halmos' version of the spectral theorem (see [Hal63]), T*T is unitarily equivalent to a multiplication operator M_g: L²(Ω, µ) → L²(Ω, µ), (M_g h)(x) = g(x)h(x) for all x ∈ Ω, on a locally compact space Ω with positive Borel measure µ and a nonnegative function g ∈ L^∞(Ω, µ); that is, T*T = W* M_g W for some unitary operator W: X → L²(Ω, µ). By (3.2) we have r_α ≥ 0 and q_α ≥ 0 for all α > 0, hence nonnegativity of the right-hand side of the previous equation can be ensured if (W f†)(x)(Wζ)(x) ≥ 0 for µ-almost all x ∈ Ω. This can be achieved by replacing (Wζ)(x) by s(x)(Wζ)(x), where s: Ω → {−1, 1}. Hence replacing ξ by UW*(s · (Wζ)) shows that (3.6) holds true.
With these choices of α and ξ we obtain a lower bound for inf_{0<α≤ᾱ} sup_{‖ξ|X‖≤δ}. As ‖r_α(T*T) f† | X‖ is monotonically increasing in α and ‖R_α ξ | X‖ is monotonically decreasing in α, we get the corresponding bound. As for x, y ≥ 0 the inequality (x + y)² ≤ 2(x² + y²) holds true, we thus obtain from (3.6) a lower bound for inf_{0<α≤ᾱ} sup_{‖ξ|X‖≤δ}. As the above holds for all ε ∈ (0, 1), this shows the first claim. It remains to investigate the properties of ∆(f†). Proof of (a): By Lemma 3.6 we have (1 − C_3)²/α ≤ ‖R_α‖² ≤ C_1/α. We have ‖r_α(T*T) f† | X‖ ≤ ‖f† | X‖ by (3.2), so we only have to ensure that r_α(T*T) f† ≠ 0 for all α close to 0 in order to get the claim; this follows from Assumption 3.2(a). Proof of (b): If α ↦ q_α(λ) is continuous, then so is α ↦ r_α(λ), and by Lebesgue's dominated convergence theorem α ↦ ‖r_α(T*T) f† | X‖ and α ↦ ‖R_α‖ are continuous as well. As the above estimate on r_α(λ) for α > C_1 λ implies for f† ≠ 0 that r_α(T*T) f† ≠ 0, the statement follows from the intermediate value theorem and Lemma 3.6.
Proof of (c): For Landweber iteration and Lardy's method we can rewrite the quantities explicitly. Using Lemma 3.6 we obtain the bound ‖R_{1/(n+1)}‖ / ‖R_{1/n}‖ ≤ √C_1 / (1 − C_3), which gives a bound on the quotient of the denominators of neighboring δ_n(f†).
For Landweber iteration the quotients of numerators of neighboring δ_n(f†) are bounded as well. This shows that γ is finite in both cases.
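The boundedness of the quotients ‖R_{1/(n+1)}‖ / ‖R_{1/n}‖ for Landweber iteration can be checked numerically via the explicit filter q_{1/n}(λ) = (1 − (1 − µλ)^n)/λ. The sketch below is ours; the grid and step size are arbitrary choices under the assumption µ‖T*T‖ ≤ 1:

```python
import math

mu = 1.0  # step size, assuming mu * ||T*T|| <= 1

def q_landweber(lam, n):
    # filter of n Landweber iterations: q_{1/n}(lam) = (1 - (1 - mu*lam)^n)/lam
    return (1.0 - (1.0 - mu * lam) ** n) / lam

def op_norm(n, grid):
    # ||R_{1/n}|| = sup_lam sqrt(lam) * q_{1/n}(lam) over the spectrum
    return max(math.sqrt(l) * q_landweber(l, n) for l in grid)

grid = [k / 1000.0 for k in range(1, 1001)]          # spectrum in (0, 1]
ratios = [op_norm(n + 1, grid) / op_norm(n, grid) for n in range(1, 31)]
```

The ratios stay between 1 and √2, reflecting that ‖R_{1/n}‖² grows proportionally to n, so consecutive norms differ only by a bounded factor.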

White Noise Error Model
The previous section dealt with the case of deterministic noise g^obs = T f† + ξ for some ξ ∈ Y. We will now look into the case of white noise, meaning that the observed data is given by g^obs = T f† + εZ with a white noise process Z on Y as introduced in Example 1.3. Note that ‖f_α − f† | X‖² is a random variable in this case and hence not a useful error measure if one wants to evaluate the performance of the regularization method as ε → 0. We will use the expected square error as a loss function. This leads to the bias-variance decomposition (3.7), which replaces the splitting of the error into an approximation and a propagated data noise part. By the linearity of the expectation and the properties of Z, the bias term can be controlled via Section 3.2 by assuming that f† ∈ X^T_ϕ for some index function ϕ.
Note that, as R_α R*_α = q²_α(T*T) T*T, this requires the forward operator T to be at least compact, since R_α R*_α has to be of trace class. Thus the main difference from the previous section is that the noise is described not by the maximum but by the sum of the eigenvalues of R_α R*_α. Often the sum grows faster than the maximum as α → 0, and the rate depends not only on the regularization method but also on the distribution of the eigenvalues of T.
To handle the variance term we will therefore assume that there exist a constant D ≥ 1 and a continuous, monotonically decreasing function v: (0, ∞) → R such that (3.8a) holds, with limits lim_{α→0} v(α) = ∞ and lim_{α→∞} v(α) = 0. In comparison with the deterministic case in Lemma 3.6, we see that if we replaced E[‖R_α Z‖²] by ‖R_α‖² we could choose v(α) = 1/√α with a constant D depending on the specific regularization method. Moreover, we will assume that v does not grow faster than polynomially as α → 0 or, equivalently, that the inverse function v^{−1}: (0, ∞) → (0, ∞) does not decay faster than polynomially at infinity, i.e. it satisfies (3.8b) for some q ≥ 1. A way to calculate v has been given in [BHMR07]; under certain conditions the variance of the estimator behaves like the variance of the spectral cut-off estimator, for which explicit expressions have been derived.
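The trace quantity E[‖R_α Z‖²] = tr(R_α R*_α) = Σ_j σ_j² q_α(σ_j²)² can be verified against a Monte Carlo simulation in a toy singular-value model. Everything below (singular values, Tikhonov filter, sample sizes) is our own illustrative choice:

```python
import random

random.seed(0)
sigma = [1.0 / j for j in range(1, 31)]   # hypothetical singular values of T
alpha, eps = 0.01, 0.1

def q(lam):
    # Tikhonov filter
    return 1.0 / (lam + alpha)

# analytic variance: E||R_alpha(eps Z)||^2 = eps^2 * trace(R_alpha R_alpha^*)
#                  = eps^2 * sum_j sigma_j^2 * q(sigma_j^2)^2
analytic = eps ** 2 * sum(s * s * q(s * s) ** 2 for s in sigma)

# Monte Carlo: in the singular basis, R_alpha(eps Z) has independent
# components eps * q(sigma_j^2) * sigma_j * z_j with z_j ~ N(0, 1)
n_runs = 5000
mc = 0.0
for _ in range(n_runs):
    mc += sum((eps * q(s * s) * s * random.gauss(0.0, 1.0)) ** 2 for s in sigma)
mc /= n_runs
```

The simulated expected squared noise term matches the trace formula, which is exactly the quantity that v(α) is assumed to capture.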
Theorem 3.12. Let ϕ be an index function such that (3.4) holds true. Then for f† ∈ X the following statements are equivalent: Furthermore the corresponding estimates hold. Proof. (a) ⟹ (b): Using the bias-variance decomposition we obtain by (a) as well as (3.8a) the upper bound. The infimum of the right-hand side is approximately attained if ϕ(α) = εv(α). (b) ⟹ (a): Using again the bias-variance decomposition and the lower bound in (3.8a) yields the lower bound. Note that by Assumption 3.2(d) the first term is increasing in α while by our assumptions on v the second term is decreasing; hence the bound holds for all α_* ∈ (0, ᾱ]. For the given choice of α_* the minimum has to be attained at the first argument. Solving α_* for ε, abbreviating and inserting into the resulting equation yields the claim by the definition of ψ_{ϕ,v}. The growth restrictions (3.8b) and (3.4) now imply (a) for all α of the given form.
Note that the previous proof is essentially the same as the proof of Theorem 3.7, the only difference being that the latter treats the special case v(α) = 1/√α. Remark 3.13. If Assumption (3.8a) is relaxed to a two-sided bound with functions v_± having the properties of v in (3.8), where possibly lim_{α→0}(v_+/v_−)(α) = ∞, then an inspection of the proof shows that Theorem 3.12(a) still implies a corresponding upper bound. This is especially relevant for operators T with exponentially decaying singular values. While (3.8a) can be verified for polynomially decaying singular values by results in [BHMR07], for the asymptotic behavior σ_j = exp(−c j^β) for some c, β > 0 one can only easily verify the relaxed condition above with v_−(α) = c_− α^{−1/2} and v_+(α) = c_+ α^{−1/2−ν} (3.9) for any ν > 0 and some c_−, c_+ > 0. However, for such operators Theorem 3.12(a) is typically only fulfilled for ϕ of logarithmic type, that is, of the form (2.3b) for some p > 0, see Example 2.3. In these cases the resulting rate is independent of the choice of ν ∈ (0, ∞), see [Mai94]. Therefore the equivalence in Theorem 3.12 still holds true with either v = v_+ or v = v_−.

Interpretation of Maxisets
The previous sections have illustrated that f† ∈ X^T_ϕ does not only yield convergence rates; converse results also hold true. So the question arises whether these spaces have a more natural, that is, operator-independent, characterization. We will start with results from [And15] showing that these spaces can be regarded as interpolation spaces in Hilbert scales, before showing that they are Besov spaces for a wide class of operators.
We will build on the ideas and notation of Section B.2: we set X_0 := X and X_1 := (T*T)^k X, that is, ‖f | X_1‖ := ‖(T*T)^{−k} f | X‖, for some k ∈ (0, ∞). As X_1 ⊂ X_0 the spaces are obviously compatible. For fixed t > 0 we then consider the associated minimization problem. As this is a convex and coercive minimization problem, we know that there exists a unique solution f_t. The first-order optimality conditions determine f_t, and using the spectral measure we hence obtain an explicit representation.

Therefore for any θ ∈ (0, 1) this yields an explicit expression, which will be used to prove the following. Lemma 3.14 (see [And15, Prop. 2.2]). Let k > s > 0 and set ϕ(t) = t^s. Then we have X^T_ϕ = (X_0, X_1)_{s/k,∞} with equivalent norms.
Proof. Note the identity holding for j > 0. Thus we get the first estimate for f† ∈ (X_0, X_1)_{s/k,∞}, since λ ≤ j and the integrand is monotonically decreasing in λ. Furthermore, (3.10) yields the second estimate. Taking the supremum over j > 0 on the left-hand side reveals f† ∈ X^T_ϕ. If on the other hand f† ∈ X^T_ϕ, we obtain the corresponding bound for each t > 0.
Recall by (B.1) that the interpolation space can be identified for θ ∈ (0, 1) and l ∈ R if the underlying manifold is smooth enough. Hence if ran((T*T)^k) = H^l, then we get X^T_{id^s} = B^{ls/k}_{2,∞}; that is, we have found an interpretation of the spectral decay space in terms of more classical smoothness spaces.
We will extend this result further by assuming that T: X = L²(M) → Y, where M fulfills Assumption B.17, is such that T*T = Λ(−∆), with Λ meeting the following conditions. For such operators we obtain the subsequent characterization of maxisets: X^T_{ϕ_s} = B^s_{2,∞} with equivalent norms.
Denote by E^T_λ the spectral projection with respect to the operator T*T and by E^S_λ the spectral projection with respect to the operator S*S. For t ∈ (0, t_0^{−1/2}) we then obtain the corresponding bounds by the substitution t = ϕ(j). Taken together, the last two inequalities show, using the first inequality, that ‖f | X^S_{id^s}‖ ≥ ‖f | X^T_{ϕ_s}‖. Therefore X^S_{id^s} = X^T_{ϕ_s} and the norms coincide. In summary this implies that if we choose k > s, the claim follows by Lemma 3.14.

Examples
The previous sections have yielded several equivalence results for convergence rate theory in Hilbert spaces. To conclude this chapter, we now apply these results to a set of well-studied inverse problems in order to illustrate our findings.

Operators in Sobolev scales
In the following we describe a fairly general class of problems. It contains convolution operators (if M = R^d or M = T^d) whose convolution kernel has a certain type of singularity at 0, as well as boundary integral operators, injective elliptic pseudo-differential operators and compositions of such operators.
Theorem 3.17 ([HW17a, Thm. 7.1]). Let M be a d-dimensional manifold satisfying Assumption B.17, and let T be an operator which is a-times smoothing (a > d/2) in the sense that T: H^t(M) → H^{t+a}(M) is well-defined, bounded and has a bounded inverse for all t ∈ R. We will consider T as an operator from L²(M) into itself, i.e. X = Y = L²(M), together with a spectral regularization method with classical qualification (see Remark 3.5) µ_0 ≥ 1 satisfying Assumption 3.2. Then the following statements are equivalent for all f† ∈ X \ {0} and s ∈ (0, a): (a) f† satisfies a VSC (2.26) with ψ(t) = c t^{s/(s+a)} for some c > 0.
(b) f† ∈ B^s_{2,∞}(M). (c) For a quasioptimal parameter choice rule ᾱ and a regularization method for which ∆(f†) meets (3.5), the corresponding rate holds. In addition, (b)-(d) are equivalent for all s ∈ (0, 2aµ_0), and furthermore the assumption a > d/2 can be relaxed to a > 0 if (d) is dropped.
Example 3.18. We now revisit the example studied in Examples 2.7, 2.18 and 2.31 in the light of the previous theorem. If M = S, then the Fourier transform F maps L²(S) to ℓ²(Z), and a convolution operator T_h f = h * f can be written as T_h f = F*(ĥ · f̂). Via relabeling we can identify ℓ²(Z) and ℓ²(N), and hence the operator T of the examples can be seen as a convolution operator with the desired properties which is 1-times smoothing. As (after relabeling) (∑_{n∈N} n^{2s} |f̂(n)|²)^{1/2} is an equivalent norm on H^s(S), we see by Example 2.7 that a spectral source condition f = (T*T)^{ν/2} w for some w ∈ L²(S) is equivalent to f ∈ H^ν. Therefore we can infer for the considered f† that f† ∈ H^s for s < 1/2. On the other hand, Example 2.31 illustrates that f† ∈ B^s_{2,∞} if and only if s ≤ 1/2, as sup_{j>0} (j^{2s} ∑_{n>j} |f̂(n)|²)^{1/2} is an equivalent norm on B^s_{2,∞}. This shows that the convergence rates obtained from the variational source condition are optimal.
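The borderline membership f† ∈ B^{1/2}_{2,∞} \ H^{1/2} for coefficients |f̂(n)| ~ 1/n can be seen numerically: the B^{1/2}_{2,∞}-type quantity sup_j j Σ_{n>j} |f̂(n)|² stays bounded while the H^{1/2}-type sum Σ n|f̂(n)|² grows like a harmonic series. This check is our own sketch with the model coefficients |f̂(n)|² = 1/n²:

```python
N = 200_000
c_sq = [0.0] + [1.0 / n ** 2 for n in range(1, N + 1)]   # |f^(n)|^2 = 1/n^2

# tail[j] = sum_{n >= j} |f^(n)|^2, accumulated from the back
tail = [0.0] * (N + 2)
for n in range(N, 0, -1):
    tail[n] = tail[n + 1] + c_sq[n]

# B^{1/2}_{2,inf} proxy: sup_j j * sum_{n > j} |f^(n)|^2  (stays bounded)
besov_sup = max(j * tail[j + 1] for j in range(1, 1000))

# H^{1/2} proxy: sum_{n <= M} n * |f^(n)|^2  (harmonic sum, grows like log M)
h_half = [sum(n * c_sq[n] for n in range(1, M + 1)) for M in (100, 10_000)]
```

The supremum stays close to 1, while the H^{1/2} partial sums keep growing, consistent with f† lying in B^{1/2}_{2,∞} but not in H^{1/2}.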

Backward Heat Equation
Let us consider the heat equation on a manifold M satisfying Assumption B.17. The backward heat equation is the inverse problem of estimating the initial temperature f from observations of the final temperature g† = u(·, τ). This fits into the framework (3.11) with the function Λ_BH(t) = exp(−2τt).
This results from Theorem 3.1 and (2.25b).
This can be inferred from Theorems 3.4 and 3.11.
Using the results of [BHMR07, § 5.1] and applying Remark 3.13 and Theorem 3.12 it is evident that (3.9) is fulfilled for any ν > 0.
Example 3.20. We want to highlight again the difference between spectral and variational source conditions. It has been shown in [Hoh00] that a spectral source condition with ϕ_p as in (2.3b) holds for the backward heat equation on S. At the same time, the previous theorem shows that the convergence rate implied by the spectral source condition with ϕ_p is obtained on the set B^{2p}_{2,∞}(S); hence spectral source conditions fail to correctly predict the convergence rate for functions f† ∈ B^s_{2,∞} \ H^s. Due to the embeddings H^s ⊂ B^s_{2,∞} ⊂ H^{s−ε} for all ε > 0 this difference seems small, but it nevertheless contains important functions. If we consider f†(t) = 1 for t ∈ (−π/2, π/2) and f†(t) = 0 otherwise as a prototypical function that is smooth up to jumps, its Fourier coefficients satisfy f̂†(n) ∼ 1/n for n ≠ 0. As argued in Example 3.18 this shows that f† ∈ B^{1/2}_{2,∞} \ H^{1/2}, and hence for this class of functions spectral source conditions fail to predict the correct rate.

Sideways Heat Equation
We now consider the heat equation in the interval [0, 1]. We may think of [0, 1] as the wall of a furnace where the right boundary 1 is the inaccessible interior side and 0 the accessible outer side. We assume the left boundary is insulated and impose the no-flux boundary condition ∂_x u(0, t) = 0. The forward problem reads (3.14). We will consider the inverse problem of estimating the temperature f(t) = u(1, t) at the inaccessible side from measurements of the temperature g(t) = u(0, t) at the accessible side for all times t ∈ R. As shown in [Hoh00] this fits into the framework (3.11) with a suitable function Λ_SH. For spectral regularization methods satisfying Assumption 3.2 and the forward operator T: L²(R) → L²(R) with T*T = Λ_SH(−∆) of the sideways heat equation, the following statements are equivalent for s > 0 and f† ∈ L²(R) \ {0}: (a) f† ∈ B^{s/2}_{2,∞}(R). (b) f† satisfies a VSC (2.26) with an index function ψ(t) = c(−log(t))^{−2s}(1 + o(1)) for some c > 0.
Satellite Gradiometry
Let us assume that the earth is a perfect ball of radius 1. The gravitational potential u of the earth is determined by its values f at the surface by the exterior boundary value problem (3.15). In satellite gradiometry one studies the inverse problem of determining f from satellite measurements of the rate of change of the gravitational force in radial direction at height R > 0, i.e. the data is described by the function g = d²u/dr² |_{RS²}. As shown in [Hoh00] this fits into the framework (3.11) with a suitable function Λ_SG. Note that Λ_SG (unlike Λ_BH and Λ_SH) is not globally monotonically decreasing unless R is large enough (one needs R ≥ exp((4√2 + 2)/(√2 + 5)) ≈ 3.3, which is not realistic in applications).

CHAPTER IV OPTIMAL CONVERGENCE RATES IN BESOV SPACES ON THE TORUS
In Riemann, Hilbert or in Banach space Let superscripts and subscripts go their ways.
Our asymptotes no longer out of phase, We shall encounter, counting, face to face.
Verse of "Love and Tensor Algebra" from "The Cyberiad" by STANISŁAW LEM.
In this chapter we study Tikhonov regularization in the scale of Besov spaces on the torus. Readers unfamiliar with Besov spaces are advised to read Appendix B.3 first. Besov space regularization is often implicitly employed when using the weighted sum of a wavelet expansion as a penalty term, as the Besov norm can be expressed via wavelet coefficients as defined in (B.5). Such penalties with wavelet Besov norms with small index p are frequently applied to enforce sparsity (see e.g. [DDDM04, RR10]). For arbitrary p ∈ (1, 2] and q ∈ (1, ∞) we choose the functional setting X = B^0_{p,q}(T^d) with penalty term R(f) = (1/r)‖f | B^0_{p,q}‖^r. The idea behind picking p ∈ (1, 2] is to keep the idea of "sparse" representations of the minimizers of the Tikhonov functional; in this case sparse means a faster decay of the coefficients and not necessarily only a finite number of nonzero coefficients. On how to choose q no guideline seems to exist in the literature; the most common choice seems to be q = p for simplicity. Here we will study the whole range of possible values. Recall that our strategy to verify a VSC, Theorem 2.24, relies on Assumption 2.22, which allows us to bound the Bregman distance from below by a norm power. This assumption can be fulfilled if X is r-convex (see (2.27)), so we set r := max{2, p, q}, as Besov spaces B^s_{p,q} are known to be convex of power type r (see [Kaz13]).
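The wavelet-coefficient form of the Besov norm just mentioned can be sketched as a weighted ℓ^q(ℓ^p) sequence norm. The normalization below is the standard one and is only assumed to match (B.5); the function name is ours:

```python
def besov_seq_norm(coeffs, s, p, q, d=1):
    # wavelet-coefficient Besov norm (standard normalization):
    #   ( sum_j (2^{j(s + d/2 - d/p)} * ||lambda_j||_p)^q )^{1/q},
    # where coeffs[j] lists the wavelet coefficients on level j
    total = 0.0
    for j, lam_j in enumerate(coeffs):
        weight = 2.0 ** (j * (s + d / 2.0 - d / p))
        level = sum(abs(t) ** p for t in lam_j) ** (1.0 / p)
        total += (weight * level) ** q
    return total ** (1.0 / q)

coeffs = [[3.0, 4.0], [0.5, -0.5, 0.25, 0.0]]   # toy two-level coefficients
```

For a single level the weight is 1 and the norm reduces to an ℓ^p norm; increasing the smoothness index s can only increase the norm, reflecting the embedding B^s_{p,q} ⊂ B^{s'}_{p,q} for s ≥ s'.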
In the deterministic setting Tikhonov regularization will hence be given by f̂_α ∈ argmin_{f ∈ dom(F)} [ ‖F(f) − g^obs | L²(T^d)‖² + α‖f | B^0_{p,q}‖^r ]. (4.1a) Convergence properties of minimizers of this functional were already studied in [DDDM04], but no convergence rates were obtained. Rates of the order O(√δ) and O(δ^{2/3}) based on the range conditions of Section 2.2 have been derived in [LT08] and [RR10], respectively.
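For intuition on how the Besov penalty with exponent p ∈ (1, 2] acts, one can solve the Tikhonov problem componentwise in a toy one-dimensional model (a diagonal operator with a scalar singular value; this is our own sketch, not the general functional for an arbitrary F):

```python
def besov_prox_1d(g, s, alpha, p, iters=200):
    # componentwise minimizer for a diagonal operator: minimize over x
    #   0.5*(s*x - g)^2 + (alpha/p)*|x|^p,   p in (1, 2],
    # by bisection on the monotone optimality condition
    #   s*(s*x - g) + alpha*|x|^(p-1)*sign(x) = 0
    if g == 0.0:
        return 0.0
    sign, b = (1.0, g) if g > 0 else (-1.0, -g)
    lo, hi = 0.0, b / s                  # the minimizer lies in [0, |g|/s]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if s * (s * mid - b) + alpha * mid ** (p - 1.0) < 0.0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)
```

For p = 2 this reproduces the classical Tikhonov formula x = sg/(s² + α); for p < 2 large coefficients are shrunk less and small ones more, which is the "sparsity-promoting" effect described above.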
As our goal is to derive order optimal convergence rates we will first study lower bounds on regularization methods in Section 4.1 whose derivation will be similar to Lemma 2.11 and present an idea on how to compute them.
The application of Theorem 2.24 requires knowledge of subgradients. Therefore we will study whether additional smoothness on f † leads to additional smoothness of f * ∈ ∂R( f † ) in Section 4.2.
In Section 4.3 we move to a statistical setting where the error is given by Gaussian white noise. Hence the Tikhonov functional has to be altered as described in Example 1.3(b) to (4.1b), where Z is a white noise process on L²(T^d).
For statistical inverse problems convergence rates have been considered for methods based on wavelet shrinkage. There, minimax optimal rates under Besov smoothness assumptions have been achieved, see [CHR04, DJKP95, KPPW07, KMR06]. The advantage of our approach, however, is that fewer assumptions on the operator are required; it even works for nonlinear operators.
In Sections 4.4 and 4.5 we then derive convergence rates via variational source conditions for finitely and infinitely smoothing operators respectively. It will turn out that our strategy leads to order optimal rates for q ≥ 2.

Lower Bounds on Convergence Rates
Recall that the modulus of continuity was defined as ω(δ, K) := sup{‖f_1 − f_2 | X‖ : f_1, f_2 ∈ K, ‖F(f_1) − F(f_2) | Y‖ ≤ δ}. In a certain sense the modulus of continuity gives the best possible stability estimate fulfilled by F^{−1} on the set K. If F = T is linear, then the modulus of continuity is alternatively often defined as ω̃(δ, K) := sup{‖f | X‖ : f ∈ K, ‖T f | Y‖ ≤ δ}. Note that these moduli behave quite similarly, since ω(δ, K) ≤ ω̃(δ, K − K), and if K is given by some ball around 0, then K − K = 2K.
Lemma 4.1. Let R(·) = (1/q)‖· | X‖^q for some q > 1, let T: X → Y be linear and let K ⊂ X be such that K = −K. Then any reconstruction method R: Y → X fulfills the corresponding lower bound. Proof. Let f ∈ K be such that ‖T f‖ ≤ δ and set f* ∈ ∂R(f). Setting g^obs = 0 we see that ‖T f − g^obs‖ ≤ δ is fulfilled. By symmetry of K and linearity of T we also have −f ∈ K with ‖T(−f) − g^obs‖ ≤ δ and −f* ∈ ∂R(−f). The claim then follows from the equality condition in Young's inequality, Theorem B.7, and inf_R R(R(0)) = 0. For a nonlinear operator F whose penalty term meets Assumption 2.22 one can obviously conclude analogously with Lemma 2.11. In order to prove lower bounds one hence needs good estimates of the modulus of continuity from below. The following theorem builds upon [DDDM04, Prop. 4.6] and estimates the decay of the modulus of continuity in the case where the data has a nice representation in the Fourier domain.
Then, if F(0) = 0, the modulus of continuity satisfies the stated decay. Proof. For w ∈ Z^d let k ∈ N_0 be uniquely defined by w ∈ Γ_k and define f_w accordingly. One immediately calculates that f_w ∈ K for all w ∈ Z^d, and since 0 ∈ K we obtain the claim. Note that this theorem can easily be adapted to any other orthonormal system in L² that provides an unconditional basis for Besov spaces.
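A lower bound on the modulus of continuity can be computed in a toy diagonal model by testing with single-coordinate functions (all choices below, the smoothness a, the singular values s_n, and the function names, are our own illustration):

```python
import math

a = 1.0                                    # smoothness exponent of the ball K
N = 5000
s = [1.0 / n for n in range(1, N + 1)]     # singular values of the toy operator

def omega_lower(delta):
    # test with single-coordinate f in K = {||f||_{H^a} <= 1}: the n-th
    # coordinate may carry mass min(n^{-a}, delta / s_n) while ||Tf|| <= delta
    return max(min(n ** -a, delta / s[n - 1]) for n in range(1, N + 1))

# empirical decay exponent of the lower bound as delta -> 0
exponents = [math.log(omega_lower(d)) / math.log(d) for d in (1e-3, 1e-4)]
```

The measured exponent is close to a/(a + 1) = 1/2, matching the familiar Hölder-type conditional stability rate δ^{a/(a+1)} for this operator/smoothness pairing.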

Subgradient Smoothness
The general strategy to verify VSCs formulated in Theorem 2.24 requires knowledge of the subgradient f* ∈ ∂R(f†) at the true solution. It is, however, more natural to assume a priori knowledge of the true solution f†. If X is a Hilbert space and R(·) = (1/2)‖· | X‖², then (and only then) the mapping f† ↦ f* is the identity mapping, and hence a priori knowledge of one function immediately transfers to the other. In the more general case R(·) = h(‖· | X‖) we get f* ∈ J_{X,h}(f†) under mild assumptions, see Theorem B.7. The continuity properties of the duality mapping have been studied for some time, see e.g. [CSZ07] and references therein (recall from Proposition B.8 that a different choice of h only results in a different scaling of the mapping). Much less is, however, known about the following question: if f† ∈ X̃ ∩ X for some space X̃, does there exist a space X̂ such that f* ∈ X̂ ∩ X*?
In the language of convex analysis we can rephrase this as: are there linear subspaces that are mapped into linear subspaces by the duality mapping? In the inverse problems context this means: does the a priori knowledge f† ∈ X̃ give some insight on f*? This question is not only of interest for the proposed strategy; it also appears naturally in the study of the range conditions introduced in Section 2.2.

Smoothness of Besov norm subgradients
Before answering the question in the context of Besov spaces, we will take a look at sequence spaces. For this let I = I_1 × I_2 be an index set and w: I_1 → R be a weight function (that is, w(α) > 0 for all α ∈ I_1). Define for p, q ∈ [1, ∞] the space ℓ^q_w(ℓ^p) as the set of all sequences λ = (λ_{α,β})_{(α,β)∈I} for which the norm ‖λ | ℓ^q_w(ℓ^p)‖ := ( ∑_{α∈I_1} w(α)^q ( ∑_{β∈I_2} |λ_{α,β}|^p )^{q/p} )^{1/q} is finite, with the usual modification if p = ∞ or q = ∞. Note that this definition can also be extended to p, q ∈ (0, 1), although the spaces are then no longer Banach spaces. For p, q ∈ [1, ∞) one easily checks that the dual space is given by (ℓ^q_w(ℓ^p))* = ℓ^{q'}_{1/w}(ℓ^{p'}). In the context of subgradients, one sees that if p, q ∈ (1, ∞), then (1/r)‖· | ℓ^q_w(ℓ^p)‖^r is Gateaux differentiable for all r > 0. Thus Lemma A.7 yields the explicit form µ = (µ_{α,β})_{(α,β)∈I} given in (4.2). This allows us to answer the initial question of this section in the following way. Proposition 4.3. Let p_1, q_1 ∈ (1, ∞), p_2, p_3, q_2, q_3, r > 0 and let w_1, w_2, w_3 be weight functions. Let λ ∈ ℓ^{q_1}_{w_1}(ℓ^{p_1}) and µ ∈ ∂(1/r)‖λ | ℓ^{q_1}_{w_1}(ℓ^{p_1})‖^r. Assume that the parameters are related as in (a) or (b). In both cases the norm of µ in ℓ^{q_3}_{w_3}(ℓ^{p_3}) can be computed explicitly. Proof. Let λ ∈ ℓ^{q_2}_{w_2}(ℓ^{p_2}); then, using the explicit form of the subgradient (4.2), we see that A := ‖µ | ℓ^{q_3}_{w_3}(ℓ^{p_3})‖^{q_3} / ‖λ | ℓ^{q_1}_{w_1}(ℓ^{p_1})‖^{q_3(r−q_1)} can be written out. If (a) holds, then the first sum over I_2 is taken to the power zero and therefore does not influence the product; the relation of the parameters then yields the expression for the norm. If (b) holds, an analogous computation applies, and hence the norm result holds true.
For the reverse direction use that by Corollary A.15 and replace the parameters accordingly.
The result of Proposition 4.3(a) has already been worked out for q_2 = p_2 independently in [RR10] and [LT08] in the context of the range conditions of Section 2.2. While the former, as we did here, focuses more on abstract sequence spaces, the latter provides a more specific interpretation for Besov spaces; however, both results miss the reverse direction.
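The single-level ℓ^p case of the explicit subgradient formula (4.2) can be verified numerically: the gradient of (1/r)‖λ‖_p^r satisfies the duality-map identities ⟨µ, λ⟩ = ‖λ‖_p^r and ‖µ‖_{p'} = ‖λ‖_p^{r−1}. The sketch below is ours (drop the weights and the outer ℓ^q level):

```python
def lp_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def duality_map(x, p, r):
    # Gateaux gradient of (1/r)*||x||_p^r (single-level case of (4.2)):
    #   mu_i = ||x||_p^(r-p) * |x_i|^(p-1) * sign(x_i)
    nrm = lp_norm(x, p)
    sgn = lambda t: (t > 0) - (t < 0)
    return [nrm ** (r - p) * abs(t) ** (p - 1) * sgn(t) for t in x]

x = [1.0, -2.0, 0.5]
p, r = 1.5, 2.0
mu = duality_map(x, p, r)
```

Here µ lands in the dual space ℓ^{p'} with 1/p' = 1 − 1/p, illustrating the smoothness transfer from λ to µ discussed above in its simplest form.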
Let S: Z → X be a linear and bounded mapping; then for all h ∈ Z the subdifferential chain rule (see Proposition A.8(b)) applies. We can now extend the result to the scale of Besov spaces and unify the results of [LT08, RR10] with [WSH18, Thm. 3.5]. Recall that the wavelet norm on Besov spaces is given in terms of λ = W f, where W is the wavelet transform defined in (B.5a) for either R^d or T^d.
The consideration above does not only hold for Besov spaces on R^d or T^d, but whenever a wavelet system with the properties of (B.4) exists. If, e.g., Ω ⊂ R^d is a sufficiently nice open domain, one can show that a wavelet system carrying over the main properties of (B.4) exists, see [Tri08, Thm. 2.33], and that the characterization of Besov spaces (B.5) remains valid, see [Tri08, Thm. 3.13 and Prop. 3.21]. The difficult part is to compute this wavelet system explicitly.
The interesting case of the corollary above is when B^{s_2}_{p_2,q_2} ⊂ B^{s_1}_{p_1,q_1}, as in these cases B^{s_3}_{p_3,q_3} is also a proper subspace of B^{−s_1}_{p_1,q_1}; otherwise the explicit expression for the norm might still be useful. Note that if q_2 = ∞, that is, f† lies in the largest space with smoothness s_2, then also q_3 = ∞, so f* is again in the largest space with fixed smoothness s_3.
From now on we will always assume sufficient smoothness of the wavelet system in the sense of the previous corollary without further mentioning.
As the definition of the norm influences the form of the subgradient an immediate question that arises is whether the result holds true for other norm definitions as well. While we cannot give a complete answer we will show that at least for the norm defined via (B.3) one obtains the same smoothness results for T d and R d with the help of multiplier theorems, see the remark below. Hence the advantage of using the norm definition via orthogonal wavelets is in this case to provide a simple and self-contained proof of the subgradient smoothness result.

105
In comparison with the proof of Proposition 4.3, the difficulty lies in the Fourier-analytic definition of the norm. The first idea that may come to mind is to use the convolution theorem in order to estimate the relevant products. This is, however, an insufficient approach: on T^d we would obtain ‖F*ϕ_j | L¹‖ ∼ j^d by [Ton10], which would have to be compensated by a loss of ε-smoothness for some ε > 0. Even worse, on R^d we get F*ϕ_j ∉ L¹ for all j ∈ N_0, so this ansatz would not be useful at all. The remedy is the usage of multiplier theorems which follow from the Marcinkiewicz multiplier theorem: the relevant multiplier norms are bounded uniformly for all n > 0, so the claim follows as above. Hence in both cases the result of Corollary 4.4 remains true.

Jackson-type inequality
We now return to our motivation for studying properties of subgradients. If one chooses as operator family P_j in Theorem 2.24 a projection onto subspaces whose elements have compact support in the Fourier domain, then explicit bounds on κ in (2.28a) follow from exploiting the smoothness of the subgradients. In order to do so choose J = N_0 and define P_j f := ∑_{k<j} f_k with f_k := F*(ϕ_k · F f) and P_0 f := 0, with ϕ_k as in (B.3). Note that these operators fulfill f_j = (P_{j+1} − P_j) f and P_j P_k f = P_{min{j,k}} f.
To distinguish different norms on the Besov spaces B^s_{p,q}, we will write ‖· | B^s_{p,q}‖_F for the norm in (B.3); in case it does not matter which norm definition we use, we will drop the subscript. The following Jackson-type inequality will give an immediate expression for κ. Lemma 4.6. Let 1 < p, q < ∞, r = max{2, p, q} and s > 0. Let f† ∈ B^s_{p,∞} and f* ∈ ∂(1/r)‖f† | B^0_{p,q}‖^r_W. Then there exists a constant c > 0 such that the stated bounds hold.

Proof.
We can infer the corresponding bound on f*, which proves the second part of the claim.
In other words, our choice of (P_j)_{j∈N_0} fulfills (2.28a) with κ(j) := c^{r−1} 2^{−js(q−1)}, which satisfies (2.28b). Note that the constant c > 0 only depends on the wavelet system and the parameters s, p, q. The proof just requires r ≥ q, but the additional assumption on r guarantees that Assumption 2.22 will be met.
Keeping the notation of Corollary 4.4, a different approach for the choice of the operator family is the projections P̃_j defined by truncating the wavelet expansion after level j, which would give the same bounds as in the lemma. The difference then lies in how to prove an estimate of the form (2.28c): if the operator has good properties in the Fourier domain it makes sense to use P_j; however, if it is easier to study the operator in the wavelet domain, one should use P̃_j.

Abstract Convergence Rates for Gaussian White Noise
In this section we will extend the convergence rate theory of Theorem 2.20 to the white noise model (4.1b). In comparison to the deterministic case we have two difficulties to deal with: (a) The Tikhonov functional will not meet Assumption 1.4, as we cannot guarantee that inf_{f∈dom(F)} S(F(f), g^obs) ∈ R. Hence the existence of minimizers has to be shown.
(b) Another difficulty in the application of Theorem 2.20 is that no global bound on the error functional err exists. As seen in Example 1.3 we only know that err(g) = ε⟨Z, g − g†⟩ is a Gaussian random variable with E[err(g)] = 0 and V[err(g)] = ε²∥g − g† | Y∥². It turns out that both problems can be overcome by one further assumption. In order to motivate this property we address the second issue first.
Assume that Y = L²(T^d). Then a Gaussian white noise process can be seen as a mapping Z : Ω → D′(T^d), where (Ω, Σ, P) is the underlying probability space and D′(T^d) is the space of distributions on T^d. In this case the following deviation inequality is known for white noise Z: where M_Z is the median of ∥Z | B^{−d/2}_{p′,∞}∥.
As err(g) = ε⟨Z, g − g†⟩, it is natural in light of the previous lemma to estimate the error functional for p ∈ (1, 2] via (4.4), and the right-hand side will be bounded almost surely if g, g† ∈ B^{d/2}_{p,1}. Since we only need a bound on the error functional if g = F(f_α), this can be achieved by assuming that F(X) ⊂ B^{d/2}_{p,1}. Still, finiteness of ∥F(f_α) − F(f†) | B^{d/2}_{p,1}∥ is not enough, as control over this factor is needed. To formulate a general error bound as well as to establish existence of minimizers we will hence assume that the second factor can be estimated as follows:

Assumption 4.8. The operator F, regarded as F : dom(F) ⊂ B^0_{p,q} → B^{d/2}_{p,1}, is compact. Furthermore there exist constants C_com, β, γ > 0 with β < 2 and γ < 1 − β/2 such that the inequality holds true for all f_1, f_2 ∈ dom(F).
In many cases this assumption can be verified in the following way:

Remark 4.9. Note that if Assumption 2.22 is met, the previous assumption can be fulfilled by an interpolation approach if F is Lipschitz continuous into a space of higher smoothness and p ∈ (1, 2]. That is, assuming that for some q̃ ∈ [1, ∞] and some t > d/2 there exists a constant L > 0 such that ∥F(f_1) − F(f_2) | B^t_{p,q̃}∥ ≤ L∥f_1 − f_2 | X∥, then standard interpolation (see Proposition B.14) yields the required inequality. Now as ∥· | B^0_{p,2}∥ ≤ ∥· | L²∥ for p ∈ (1, 2], we obtain with the Lipschitz property Assumption 4.8.

This allows us to generalize [WSH18, Prop. 4.8] in order to show that there exists a minimizer of the Tikhonov functional almost surely.

Proof. As g^obs = F(f†) + εZ, we have N_Z := ∥g^obs | B^{−d/2}_{p′,∞}∥ < ∞ almost surely. Then, picking any f_0 ∈ dom(F), we can estimate by Assumption 4.8 and Young's inequality. Thus the data fidelity functional can be estimated from below, and hence there exists a constant D > −∞ depending on f_0 and N_Z such that the Tikhonov functional can almost surely be bounded from below. As 2γ < 2 − β, this lower bound tends to infinity as ∥f | B^0_{p,q}∥ → ∞. This shows that any minimizing sequence (f_n)_{n∈ℕ} of the Tikhonov functional must be bounded in B^0_{p,q} and hence has a weakly convergent subsequence. Without loss of generality assume f_n ⇀ f* as n → ∞ for some f* ∈ X. Since F is compact as a mapping into B^{d/2}_{p,1} by Assumption 4.8, we get that ∥F(f_n) − F(f*) | B^{d/2}_{p,1}∥ → 0 as n → ∞. Together with the weak lower semi-continuity of ∥· | B^0_{p,q}∥^r it follows that f* is a minimizer of the Tikhonov functional.
Furthermore the following estimates of both the noise level and the rate are obtained.¹

¹The idea and the proof of this theorem are due to Benjamin Sprung; further results presented in this and the following sections on the white noise error model have been obtained in cooperative research.

(a) There exists a constant c > 0 such that the effective noise level at F(f_α) is bounded by

(b) The error bound
holds true.
Proof. For (a) note that due to Assumption 4.8 we obtain with (4.4) that By the image space convergence rate (2.23b) of Theorem 2.20 we can hence estimate by Young's inequality. Rearranging terms yields the bound on the effective noise level.
To prove (b), note that due to (2.23a) in Theorem 2.20 we have Together with the first part we get which proves the claim.
Note that the obtained error bounds are still random variables, but now the deviation inequality can be used to derive deviation estimates as well as expectations for the error estimates.
Lemma 4.12. Let the assumptions of Theorem 4.11 hold true. Then there exists a constant c > 0 such that for all t > 0

Proof. As Theorem 4.11(b) holds true, the error bound can be expressed in terms of ∥Z | B^{−d/2}_{p′,∞}∥, since 2/(2 − β − 2γ) ∈ (0, 1). Hence one obtains for all t > 0 that the stated deviation estimate holds. For the second part note that due to Assumption 2.22 the norm bound carries over, as r > 1. By linearity and monotonicity of the expectation and t ≥ 1 we have the stated inequality, and the expectation on the right-hand side can be estimated by the deviation inequality.

Remark 4.13. Note that the results of this section extend to other probabilistic error models as long as a deviation inequality as in Lemma 4.7 with a sufficiently fast decay holds true.
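The expectation step at the end of the proof is a layer-cake computation. Assuming (as a simplified stand-in for the precise constants of Lemma 4.7) a Gaussian-type deviation bound P(Y ≥ M_Z + t) ≤ e^{−t²/(2ε²)} for a nonnegative error quantity Y, one obtains

```latex
\mathbb{E}[Y] \;=\; \int_0^\infty \mathbb{P}(Y > t)\,\mathrm{d}t
\;\le\; M_Z + \int_0^\infty e^{-t^2/(2\varepsilon^2)}\,\mathrm{d}t
\;=\; M_Z + \sqrt{\tfrac{\pi}{2}}\,\varepsilon ,
```

so the median bound plus a Gaussian tail controls the expectation, which is the mechanism behind the expectation estimates above.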

Application to Finitely Smoothing Operators
In this section we assume that the forward operator F := F_a is a-times smoothing for some a > d/p − d/2 in the following sense: and for some L > 0 and all f_1, f_2 in the weakly sequentially closed set D = dom(F_a) ⊂ B^0_{p,q}, which will be sufficient for rates in the deterministic error model. For the white noise error model we will strengthen (4.5a) and require that holds true. Due to the choice of parameters, (4.5a) and the continuous embedding of Lemma B.18(d), the operator F_a : B^0_{p,q} → B^a_{p,q} ⊂ L² is well-defined and continuous. A simple example which meets (4.5) is F_a = (I − Δ)^{−a/2}, in which case F_a : B^s_{p,q} → B^{s+a}_{p,q} is bounded and boundedly invertible for all s ∈ ℝ. More generally, it is fulfilled for injective elliptic pseudodifferential operators of order −a. In the case of a Fréchet differentiable nonlinear operator it was shown in [HM18, Lem. 2.9] that (4.5b) and (4.5c) follow from the corresponding conditions with F_a replaced by the derivative F′_a[f†] together with the nonlinearity condition (2.15b). It is further shown that this allows one to verify these conditions for well-studied inverse problems like the identification of reaction and diffusion coefficients or Hammerstein integral equations.
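The model example F_a = (I − Δ)^{−a/2} can be sketched directly. The following illustration (names and grid are mine, not from the text) realizes it as the Fourier multiplier (1 + k²)^{−a/2} on the one-dimensional torus; the multiplier damps every nonzero frequency, reflecting the a-times smoothing, and never vanishes, reflecting bounded invertibility on each Besov scale.

```python
import numpy as np

def F_a(f, a):
    """Sketch of F_a = (I - Laplace)^(-a/2) on the 1-d torus via the FFT."""
    k = np.fft.fftfreq(f.size, d=1.0 / f.size)   # integer frequencies
    mult = (1.0 + k**2) ** (-a / 2.0)
    return np.real(np.fft.ifft(mult * np.fft.fft(f)))

x = np.linspace(0.0, 1.0, 128, endpoint=False)
f = np.sign(np.sin(2 * np.pi * x))               # a potential with jumps
g = F_a(f, a=2.0)

# every nonzero frequency of f is strictly damped
assert np.abs(np.fft.fft(g)[1]) < np.abs(np.fft.fft(f)[1])
```

Applying the inverse multiplier (1 + k²)^{a/2} recovers f exactly, which is the discrete analogue of bounded invertibility.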

Verification of variational source conditions and convergence rates
We now validate a VSC for an operator fulfilling (4.5) via Theorem 2.24. The family of operators P j will be chosen as in (4.3).
To verify (2.28c), denote by S^a the operator mapping f = ∑_{k∈ℕ} f_k ↦ ∑_{k∈ℕ} 2^{ka} f_k. Then using the Besov norm ∥· | B^0_{p,q}∥_F given by (B.3), the relation to Lebesgue spaces (see Lemma B.18(c)) and (4.5b), we get the estimate (4.6). Setting φ(t) = √t and γ ≡ 0 in (2.28c), it hence remains to bound ∥P_j f* | B^a_{p′,2}∥ by a function of j in order to get σ(j).
Here we use again the subgradient smoothness f* ∈ B^{s(q−1)}_{p′,∞} with the norm bound of Lemma 4.6 as well as the assumption s ∈ (0, a/(q−1)) to obtain the required estimate. This implies that we can choose σ(j) = c ϱ^{r−1} 2^{j(a−s(q−1))} in (2.28c) for some c > 0 independent of f†. Now Theorem 2.24 implies that a variational source condition holds true with ψ_vsc(t) = inf_{j∈ℕ₀} c (ϱ^{r−1} 2^{j(a−s(q−1))} √t + ϱ^r 2^{−js(q−1)r′}).
Choosing j such that 2^j ∼ (ϱ/√t)^τ with τ = 1/(s(q−1)(r′−1) + a), we can estimate the infimum. Now use that for q ≤ 2 we have r = r′ = 2, while for q ≥ 2 we have r = q and r′ = q′.
Remark 4.15. In Remark 2.27(a) we focused on applying our strategy to the skewed Bregman distance: if one had chosen β = 0, then we would need information on f* ∈ ∂R(f_α). For the problem under consideration it can again be obtained from the first order optimality condition if F = T is linear. If (4.5c) holds true, then T*(L²) ⊂ B^a_{2,2} and we can use Corollary 4.4. The difficult part is to find bounds on the norm of f* in this space.
Remark 4.16. For small changes of the penalty functional the following can be obtained: • Let the requirements of Theorem 4.14 hold true, but use X = B^{s̃}_{p,q} and R(·) = (1/r)∥· | B^{s̃}_{p,q}∥_W^r for s̃ ∈ ℝ. Assuming that a* := a + s̃ > d/p − d/2, then for f† ∈ B^s_{p,∞} with s ∈ ℝ such that s* := s − s̃ ∈ (0, a*/(q−1)), a variational source condition as in Theorem 4.14 holds true with ψ given as in the theorem but with s, a replaced by s*, a* respectively.
• Suppose the constraint f† ∈ D is incorporated in the penalty term R by replacing it by R̃ := R + χ, where χ denotes the indicator function of D. Then ∂χ(f†) coincides with the normal cone to D at f†, and while {0} ⊂ ∂χ(f†) for all f† ∈ D, it contains more elements if f† is not in the interior of D. In this case ∂R̃(f†) may contain elements of higher smoothness than ∂R(f†), leading to faster rates of convergence (see [FH11] and [EHN96, §5.4]).

Convergence rates for deterministic error model
The variational source condition will now be used to derive convergence rates for deterministic errors.
Corollary 4.17. Let the assumptions of Theorem 4.14 hold true together with (4.5c). Then a minimizer of the Tikhonov functional (4.1a) exists. Furthermore, if for some c_α > 0 we choose α by ᾱ = c_α ϱ^{−ν} δ^{2−2µ} with µ, ν as in Theorem 4.14, then every minimizer f_ᾱ satisfies the error bounds and with a constant c independent of f†, f_ᾱ, ϱ, and δ.
Proof. Minimizers of the functional exist according to Theorem 1.6 if Assumption 1.4 is met. Using weak topologies we see that only sequential continuity of the forward operator is an issue, and this follows from (4.5c) and the compactness of the embedding. The convergence rate in the Bregman distance is an immediate consequence of Theorem 2.20, as the parameter choice rule is just (2.24a) up to a constant. For the rate in norm note that Assumption 2.22 is fulfilled.
Of course the parameters s and ϱ describing the smoothness of f† are typically unknown in practice. As shown in Section 2.4.4.2, however, the convergence rate (4.7) can also be obtained without prior knowledge of s and ϱ by using either the discrepancy or the Lepskiĭ principle as an a posteriori parameter choice rule.
Convergence rates for white noise error model

We now assume that the data is given by the white noise error model. To apply our results on existence of minimizers and convergence rates we need to verify Assumption 4.8, which is done as outlined in Remark 4.9. One first obtains an inequality for all g ∈ B^a_{p,q}, since ∥· | B^0_{p,2}(T^d)∥ ≤ ∥· | L²(T^d)∥ for p ≤ 2. Using (4.5d) and Assumption 2.22 one obtains a second inequality, and combining both inequalities for g = F_a(f_1) − F_a(f_2) gives the claim.
Therefore the following rate result is obtained for the white noise data model; a deviation inequality would follow as outlined in Lemma 4.12 for the same parameter choice rule. Proof. Existence of f_α follows from Theorem 4.10 and Lemma 4.18.
Concerning the rate: inserting the form of the variational source condition into Lemma 4.12, one sees with the notation of Theorem 4.14 that the stated bound holds true by (2.25a) and the rules for Fenchel conjugates. Hence for α = ᾱ both summands are of the same order in ϱ and ε, and the claim follows.

Lower bounds
We next show optimality of the rates in Corollary 4.17.
Theorem 4.20 (see [WSH18, Thm. 4.10]). Suppose that F_a satisfies (4.5c) and F_a(0) = 0, where we have chosen k ∈ ℕ₀ such that the terms are balanced, i.e. 2^k ∼ (ϱ/δ)^{1/(s+a)}. Now the claim follows by Lemmata 4.1 and 2.11 for optimality of the rate in the Bregman distance and in norm, respectively.
On optimality in Corollary 4.19 for q ≥ 2 see [WSH18, Cor. 4.12]. Remark 4.21. Let q ≥ 2. Comparing the optimal convergence rates obtained in Corollary 4.17 with those of Theorem 3.17, it is evident that convergence rates of O(δ^{s/(a+s)}) are obtained on B^s_{p,∞} and B^s_{2,∞} respectively. As for p_1 ≤ p_2 one gets B^s_{p_2,∞} ⊂ B^s_{p_1,∞} on T^d, at first glance it looks as if only the set on which optimal convergence rates are attained grows as p → 1. But for a specific solution f† the maximal smoothness index s_max with f† ∈ B^s_{p,∞} if and only if s ≤ s_max (or s < s_max) depends on p. Consider a function f† which is smooth up to jumps; then s_max = 1/p, and hence in the Hilbert space case one obtains for such a function the convergence rate O(δ^{1/(2a+1)}), while with the Besov space regularization approach one gets the rate O(δ^{1/(pa+1)}), i.e. for p < 2 one achieves a faster convergence rate.
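The rate comparison at the end of the remark is a one-line computation: inserting s_max = 1/p into the optimal rate exponent s/(s + a) gives

```latex
\frac{s_{\max}}{s_{\max}+a} \;=\; \frac{1/p}{1/p+a} \;=\; \frac{1}{1+pa},
```

so p = 2 recovers the Hilbert space rate O(δ^{1/(2a+1)}), while letting p → 1 improves the exponent towards 1/(1+a).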
Knowing that f† ∈ B^s_{p,∞} implies optimal convergence rates for B^0_{p,q} regularization with q ≥ 2 via a variational source condition, one could ask whether a variational source condition also implies smoothness of the solution, similar to Theorem 3.1 (recall that for linear operators of the form T = F_a the space X^T_ϕ could be interpreted as a Besov space, see Example 3.17). Proof. Let f†(x) = ∑_{j,m,l} λ^l_{j,m} φ^l_{j,m}(x) be the wavelet decomposition of f† and define f†_{j,l}(x) := ∑_m λ^l_{j,m} φ^l_{j,m}(x). By assumption f† fulfills a variational source condition. Note that in this case the dual pairing can be calculated explicitly, and thus the Bregman distance is given in closed form. Rearranging terms and using the smoothing properties of the operator (4.5c), we therefore infer that there exists a constant c such that the corresponding estimate holds. As for any p, q and s we have ∥f†_{j,l} | B^s_{p,q}∥ = ∥f†_{j,l} | B^s_{p,∞}∥, this implies a bound which, via norm estimates on ℓ^p spaces, can be inserted into the equation above; rearranging then yields the claimed decay. As c does not depend on j and l, the assertion follows. In summary we cannot show an equivalence similar to Theorem 3.1, but at least for p = 2 we can extend it to nonlinear operators fulfilling (4.5).

Numerical validation
Let us consider a problem of the type (4.5), where F_a : B^0_{p,q}(T) → L²(T) is given by F_a := (I − ∂²_x)^{−1}, that is a = 2, with a deterministic error model. The true solution f† is a continuous, piecewise linear function, therefore f† ∈ B^s_{p,∞} for s ≤ 1 + 1/p. As for q = 2 the obtained convergence rates are of optimal order, we test for different values of p whether they are also achieved numerically, using the a priori choice (2.24a), that is α ∼ δ^{2a/(s+a)}. Numerical computations were carried out in MATLAB. In order to make the implementation of the operator F_a efficient we used the FFT on a grid with 2^10 nodes. For the Besov norm we used the wavelet decomposition of the Wavelet Toolbox with periodic db7 wavelets. Data was first generated on a finer grid and then subsampled. To obtain the minimizer of the Tikhonov functional we used the extension of the Chambolle-Pock algorithm to Banach spaces with a constant parameter choice rule, see Section 1.2.3, where the iterations were stopped when the current step became small compared to the first. Note that the steps of this algorithm become especially simple since the considered spaces are 2-convex, see Example 1.10. The duality mappings were evaluated with the help of (4.2).
We tested which convergence rate we observe if we choose R( f ) = 1 2 f | B 0 p,2 2 for different values of p. The results of this test are shown in Figure 4.1. Note that the observed rates coincide quite well with the predicted optimal rates for the tested values of p.
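A simplified version of this experiment can be imitated in a few lines. The sketch below (Python instead of MATLAB, and with the closed-form Tikhonov minimizer for p = q = 2 in place of the Besov penalty and the Chambolle-Pock iteration; all names are illustrative) applies the a priori rule α ∼ δ^{2a/(s+a)} with a = 2, s = 1.5 and checks that the reconstruction error decays with the noise level.

```python
import numpy as np

# For p = q = 2 the Tikhonov minimizer for F_a = (I - d^2/dx^2)^{-1} has the
# closed Fourier-domain form fhat = m * ghat_obs / (m^2 + alpha),
# with multiplier m(k) = (1 + k^2)^{-1}.
n = 2**10
k = np.fft.fftfreq(n, d=1.0 / n)
m = 1.0 / (1.0 + k**2)

x = np.linspace(0.0, 1.0, n, endpoint=False)
f_true = np.minimum(x, 1.0 - x)                  # continuous, piecewise linear
g = np.real(np.fft.ifft(m * np.fft.fft(f_true)))

rng = np.random.default_rng(1)
errors = []
for delta in [1e-2, 1e-3, 1e-4]:
    noise = rng.standard_normal(n)
    noise *= delta * np.sqrt(n) / np.linalg.norm(noise)   # discrete L2 norm = delta
    alpha = delta ** (2 * 2.0 / (1.5 + 2.0))     # a priori rule with a = 2, s = 1.5
    f_hat = m * np.fft.fft(g + noise) / (m**2 + alpha)
    errors.append(np.linalg.norm(np.real(np.fft.ifft(f_hat)) - f_true) / np.sqrt(n))

assert errors[0] > errors[1] > errors[2]         # error decays with delta
```

For p < 2 one would replace the closed-form step by the Besov-penalized minimization described in the text; the parameter choice rule stays the same.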

Application to Backward Heat Equation
Consider again the problem of the backward heat equation as given in (3.13) and denote the forward operator by T.
Proof. Analogous to the proof of Theorem 4.14, we apply Theorem 2.24 with the choice of P_j as in (4.3). Again we obtain by Lemma 4.6 and our assumptions that we can choose κ(j) = c ϱ^{r−1} 2^{−js(q−1)}. In order to verify (2.28c), note that for z ∈ Γ_k we have |z| ≤ √d 2^k. Therefore we can estimate and hence can choose σ(j) = c ϱ^{r−1} e^{dτ2^{2j}}, φ(t) = √t and γ ≡ 0. This implies by Theorem 2.24 that a variational source condition with ψ_vsc(t) = inf_{j∈ℕ₀} c (ϱ^{r−1} e^{dτ2^{2j}} √t + ϱ^r 2^{−js(q−1)r′})

holds true. Now choosing j such that the terms are balanced, one notes that the VSC behaves logarithmically as δ → 0, so one immediately obtains the following: Corollary 4.24. There exists a unique minimizer of the Tikhonov functional (4.1a) for the backward heat equation for any α > 0. Furthermore, the parameter choice rule α = ᾱ given by (2.24a) implies the error bounds with a constant c > 0 independent of f†, δ and ϱ.
Proof. This follows along the lines of the proof of Corollary 4.17, see also Lemma 4.26 below.
Concerning optimality one obtains the following: where we have chosen k ∈ ℕ₀ such that 2^{2k} ∼ (1/(dτ)) ln(ϱ/δ). The optimality then follows from Lemmata 4.1 and 2.11.
Turning to the white noise error model, a rather coarse interpolation bound is sufficient: Lemma 4.26 (see [WSH18, Lem. 5.3]). The operator of the backward heat equation fulfills Assumption 4.8 with β = 1/2, γ = 1/(2r) and some C_com > 0.

Proof. By interpolation we obtain the splitting into two factors. As p ≤ 2, the first factor can be bounded by ∥T(f_1 − f_2) | L²∥^{1/2}. To control the second factor we again use p ≤ 2, together with the fact that there exists a constant c > 0 such that the corresponding bound holds by the embedding properties of Besov spaces (Lemma B.18(d)) and Assumption 2.22. As the embedding B^d_{p,q} → B^{d/2}_{p,1} is compact, this completes the proof.
Corollary 4.27. Suppose that f† ∈ B^s_{p,∞} for some s > 0 with ∥f† | B^s_{p,∞}∥ ≤ ϱ. Then f_α in (4.1b) is well-defined almost surely, and for the parameter choice rule ᾱ = ε/2 it satisfies for all t ≥ 1 the following error bounds in expectation with a constant c > 0 independent of f†, ϱ and ε.
The other two infinitely smoothing operators from Section 3.6 will not be treated here, for different reasons. For the sideways heat equation (3.14) and p < 2 the problem is that ∥1 | L^{2p/(2−p)}(ℝ^d)∥ = ∞; hence, in order to proceed similarly as above, a support constraint of finite measure on f† would have to be known a priori. For the satellite gradiometry problem (3.15) one can in principle use the approach above; however, we would need to introduce Besov spaces on the sphere S², which we would like to avoid here. One would then obtain that f† ∈ B^s_{p,∞} for p ∈ (1, 2] fulfills a VSC for the choice R(·) = (1/r)∥· | B^0_{p,q}∥^r, as might be expected from Theorem 3.22.

CHAPTER V CONVERGENCE RATES FOR THE REGULARIZED SCHRÖDINGER EQUATION
People think they don't understand math, but it's all about how you explain it to them. If you ask a drunkard what number is larger, 2/3 or 3/5, he won't be able to tell you. But if you rephrase the question: what is better, 2 bottles of vodka for 3 people or 3 bottles of vodka for 5 people, he will tell you right away: 2 bottles for 3 people, of course.
attributed to ISRAEL GELFAND in "Love and Math" by Edward Frenkel

We consider the following scattering problem: given an energy E and some compactly supported potential f, we want to find solutions u to the Schrödinger equation. The corresponding inverse problems are to recover the potential f from measurements of u at known energy E. The inverse problems studied here will be made more precise below.

Stability estimates for these problems were first studied in [Ale88, Ste90]. It is known that the corresponding stability estimates have to be of logarithmic type (see [Man01, Isa13b]), that is, the function Ψ in (2.20) has to be of the form (2.3b). Nevertheless, in recent years stability estimates for these problems have been improved significantly by deriving an explicit dependence of the stability estimate on the energy E. While in the first results the estimate increased exponentially with the energy [NUW13], more recent results show that it is possible to have only a polynomial dependence on the energy [Isa10, IN12, Isa13a, INUW14, IN14]. This leads to so-called Hölder-logarithmic stability estimates, where the stability estimate consists of two parts: one with a logarithmic dependence on the data that decays with increasing energy, and one with a Hölder dependence on the data that gets amplified with increasing energy.
In this chapter we want to improve our findings of [HW15] in two ways: we derive VSCs for the inverse problems under lower smoothness assumptions, and we make the dependence of ψ on the energy E explicit in a way that allows us to prove Hölder-logarithmic convergence rates. A similar result has been announced in [WH17].
While Section 5.1 offers a more explicit description of the first inverse problem with near field data, we discuss the choice of an appropriate penalty functional as well as the regularizing properties of the resulting Tikhonov functional in Section 5.2. In order to verify a VSC we will use the strategy presented in Theorem 2.24, and the main tool for the proof of the ill-posedness estimate (2.28c) will be the construction of complex geometric optics solutions which will be introduced in Section 5.3. Results based on these solutions to (5.1) will then be used in Section 5.3.2 together with trace estimates in Sobolev spaces and regularity estimates of boundary value problems to derive energy dependent bounds on the difference of two potentials for low frequencies in Fourier domain. The statement and verification of the VSC will then be given in Section 5.4; furthermore we extend our results to a second inverse problem with far field data and end with a discussion of the results.

Problem Description
It is well known that equation (5.1) together with the Sommerfeld radiation condition, which ensures that the solution models outgoing waves, has a unique solution in H¹_loc for compactly supported f ∈ L^∞, i.e. for supp(f) ⊂ B_r for some r > 0, where B_R := {x ∈ ℝ³ : |x| < R} for all R > 0. Furthermore, physically meaningful potentials have to satisfy ℑ(f) ≥ 0. The solution can be calculated by solving the Lippmann-Schwinger equation below. Splitting the solution into an incident field u^in, which is known a priori and satisfies (5.1) for f = 0, the total field u is the solution to an integral equation whose kernel is the free space fundamental solution. Efficient numerical implementations of this equation were first discussed in [Vai00]; see also [Hoh01].
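For reference, in three dimensions the free space fundamental solution of the Helmholtz operator Δ + E has the standard outgoing form (sign conventions for how f enters (5.1) vary, so only the kernel is fixed here):

```latex
\Phi_E(x,y) \;=\; \frac{e^{i\sqrt{E}\,|x-y|}}{4\pi\,|x-y|}, \qquad x \neq y,
```

and the Lippmann-Schwinger equation then couples the total field u to the incident field u^in through an integral over B_r with kernel Φ_E against the product f u.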
For the inverse problem choose R > r. Assume now that we can place a point source generating the incident field at each point y ∈ ∂B_R and measure the corresponding total field solving (5.1) and (5.2) on ∂B_R, i.e. we are able to measure the Green's function g(x, y) of the problem on ∂B_R × ∂B_R. The forward operator is hence defined with X as introduced later on. For this inverse problem it has been shown in [Nac88, Thm. 1.2] that the solution is unique. In contrast to most results on stability estimates we point out that in our case Y = L², and that for g = F(f) the norm ∥g | L²∥ can be interpreted as the Hilbert-Schmidt norm of the operator mapping incident fields generated by sources on ∂B_R to the total field on ∂B_R; see [LKK13, Sec. 4] for more details. For the stability estimates it is most often assumed that for all solutions u to (5.1) and (5.2) the normal derivative ∂_n u on ∂B_R is known, and the mapping u|_{∂B_R} ↦ ∂_n u|_{∂B_R} is considered as a measurement. It is, however, usually considered as a mapping from H^{1/2} → H^{−1/2}, see e.g. [Isa10, NUW13], or as a mapping from L^∞ → L^∞, see e.g. [IN12, IN14]. Yet boundedness of measurement errors of u|_{∂B_R} and ∂_n u|_{∂B_R} can usually only be obtained in L²-norms, which do not directly translate into boundedness of errors in the norms above.

Regularization Approach
The problem of reconstructing the potential from measurements is ill-posed; hence one should apply regularization techniques to obtain a reasonable reconstruction from measurements. We will again use Tikhonov regularization, this time of the form

with the operator introduced in (5.4) and the data g^obs given by the classical error model. We will choose the penalty functional where p ∈ (1, 2], the norm is defined as in (B.5) and ι is the indicator function of an L^∞ ball with radius C_∞ > 0.
As we have already seen in Chapter 4, a regularization approach with a B^0_{p,2} penalty term leads to order optimal convergence rates for certain problems. We choose q = 2 since our strategy leads to suboptimal rates for q < 2, and if we choose q > 2 a smoother wavelet system is required, because then f* will be smoother than f† (see Corollary 4.4). Furthermore, if p ≈ 1 we can take advantage of Remark 4.21, i.e. obtain faster rates for the same solution without further assumptions. The added indicator function has a twofold purpose: on the one hand, the construction of complex geometric optics solutions presented in Section 5.3 requires uniform L^∞ bounds on f. While these bounds could be achieved by using R(f) = ½∥f | B^s_{p,2}∥² for s large enough, this would exclude the interesting case of potentials f with jumps.
On the other hand, as we will see in the following, it guarantees that the Tikhonov functional is regularizing in the sense that it meets Assumption 1.4. To see this we choose X = B^0_{p,2} for p ∈ (1, 2] and Y = L²(∂B_R × ∂B_R), together with their weak topologies. As we are in the deterministic error model and the inverse problem has a unique solution, it remains to check the sequential closedness of the set D given below, as well as the sequential continuity properties of the operator F on this set. A regularization strategy similar to our choice has already been considered in [LKK13], where it was shown that for p > 3/2 Assumption 1.4 is fulfilled. However, due to the lack of a unique continuation principle for f ∈ L^p with p ∈ (1, 3/2], the result could not be extended to all p ∈ (1, ∞).
Lemma 5.1. Let (f_n)_{n∈ℕ} ⊂ D and f_n ⇀ f in B^0_{p,2}. Then f ∈ D, f_n ⇀ f in L^t for all t ∈ (1, ∞), and f_n ⇀* f in L^∞.
Proof. Since (f_n)_{n∈ℕ} ⊂ D, we get that ∥f_n∥_{L^∞} ≤ C_∞ for all n ∈ ℕ. Hence there exist a subsequence (f_{n_k})_{k∈ℕ} and some f_0 ∈ L^∞ with f_{n_k} ⇀* f_0 in L^∞. As the domain is bounded we can infer f_0 ∈ L^t for all t ∈ (1, ∞). Using again the boundedness of the domain, the weak-* convergence implies that f_{n_k} ⇀ f_0 in L^t for all t ∈ (1, ∞).
Due to the continuous embeddings L^p ⊂ B^0_{p,2} and B^0_{p′,2} ⊂ L^{p′} we obtain that f_{n_k} ⇀ f_0 in B^0_{p,2}, and hence f_0 = f by uniqueness of the limit. Repeating the above arguments, every subsequence of (f_n)_{n∈ℕ} has a subsequence that weak-*-converges to f in L^∞. Thus f_n ⇀* f in L^∞, which then implies ∥f∥_{L^∞} ≤ C_∞ and hence f ∈ D. This shows that D is weakly sequentially closed and that weakly convergent sequences in B^0_{p,2} for p ∈ (1, 2] also converge weakly in L^t for t ∈ (1, ∞). In other words: as F has been shown to be sequentially continuous with respect to the weak L² topology in [LKK13], our regularization strategy fulfills Assumption 1.4 and is hence regularizing by Section 1.2.2.
As usual we will measure convergence and the rate of convergence with respect to the Bregman distance Δ_R associated with the penalty term R. Due to the added indicator function, the subgradient of a potential f with R(f) < ∞ is not unique. However, R is regular enough to apply the sum rule for the calculation of the subdifferential, see Proposition A.8(a). If ι(f) < ∞, we always have 0 ∈ ∂ι(f); therefore in the following we will always consider the Bregman distance associated with the uniquely determined subgradient of the form (4.2) for λ = W f†. Note that due to ι(f†) = 0 we get, as in Lemma B.21, that our choice of the regularization functional fulfills Assumption 2.22. A VSC fulfilled by this problem will be stated and proven in Section 5.4, but beforehand we discuss the basics required for the ill-posedness estimate (2.28c).

Ill-posedness Estimate
The general idea for proving that a VSC is fulfilled is to choose the operator family similar to P_j as defined in (4.3), i.e. the Fourier transform of P_j f vanishes for high frequencies and coincides with the Fourier transform of f for low frequencies (for the concrete choice of P_j see the proof of Theorem 5.13 below; nevertheless (4.3) is the main part). Before going into the details, we would like to describe the idea behind the ill-posedness estimate. For simplicity assuming an L²-setting, we can estimate, where F denotes the Fourier transform as defined in (B.2), by norm equivalence for Sobolev spaces. While the first term can be estimated with the help of the smoothness of f†, the challenge is to find good control of ∥F P_j(f† − f) | L^∞∥. However, this term also appears when proving stability estimates for this problem, as we explain next.
For stability estimates one usually proceeds as follows: one chooses P_j as above and splits the difference. As one typically assumes that f_1 − f_2 ∈ H^s with s > ν, the Fourier coefficients decay with a certain speed, and hence one immediately obtains an estimate for the first summand. For the second summand one uses, as above, that the corresponding bound holds true. A stability estimate then follows after estimating ∥F P_j(f_1 − f_2) | L^∞∥ by properly balancing the terms.
Therefore, to verify the VSC we can rely on techniques for proving stability estimates. But how does one control ∥F P_j(f_1 − f_2) | L^∞∥? There are two key ingredients involved in estimating this term: Alessandrini-type identities and complex geometric optics solutions.
Alessandrini-type identities are relations of the form where u_k is a solution to the Schrödinger equation (5.1) for the potential f_k and G is some linear functional. The name originates from [Ale88], where the first such identity was proven in order to derive the first stability estimate for the Schrödinger equation as explained above. We will see two such identities later on (see Lemmas 5.9 and 6.2). Suppose now that we can find solutions u_k such that u_1 u_2 = e^{−iξ·x}(1 + p), where p is a (hopefully small) perturbation. By rearranging terms, an Alessandrini-type identity thus allows the estimate where the right-hand side can typically be controlled by a function depending on ξ, p, u_1 and u_2; the first two of these are the ones we need for the stability or ill-posedness estimate, while the others are at least bounded.
Note that the assumption that we can find solutions with u_1 u_2 = e^{−iξ·x}(1 + p) is not unrealistic: if f = 0, a solution to (5.1) is given by e^{iζ·x} for all ζ ∈ ℂ³ satisfying ζ · ζ = E. These are plane wave solutions, but with a complex valued direction of propagation ζ, leading to exponential growth in the direction −ℑ(ζ); hence for ℑ(ζ) ≠ 0 they are unphysical. If we have two such solutions with ζ_1 + ζ_2 = −ξ, then we have found the solutions we want to plug into the estimate.
If f ≠ 0, then of course u(x) = e^{iζ·x} with ζ · ζ = E no longer solves (5.1). Therefore we will use an ansatz whose solutions are called complex geometric optics solutions. Inserting it into (5.1) and using the relation ζ · ζ = E, we get a differential equation for the perturbation v_ζ. Note that we still have some freedom in our choice of ζ, so we are in particular interested in cases where v_ζ becomes small in some norm. Properties of D_ζ and its solution operator G_ζ, defined by D_ζ(G_ζ f) = f, have been studied by many different authors via Fourier methods. They already appear in [Fad65], and have been applied in [SU87] to prove uniqueness of the inverse problem. Starting with [Ale88] they have been used to derive stability estimates. The construction has been greatly simplified for a periodic setting in [Häh96]. Here we will employ estimates of [Wed91] in combination with ideas from [BT03] in order to show that v_ζ → 0 as |ζ| → ∞.
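The free-equation claim above can be checked symbolically. The sketch below (assuming the convention that the free equation reads Δu + Eu = 0, and picking one concrete complex direction; names are mine) verifies that u(x) = e^{iζ·x} with ζ · ζ = E solves it even for complex ζ.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
E = sp.Integer(5)
zeta = (sp.Integer(3), 2 * sp.I, sp.Integer(0))   # ζ·ζ = 9 - 4 = 5 = E
assert sum(z**2 for z in zeta) == E

u = sp.exp(sp.I * (zeta[0] * x1 + zeta[1] * x2 + zeta[2] * x3))
lap = sum(sp.diff(u, v, 2) for v in (x1, x2, x3))
assert sp.simplify(lap + E * u) == 0              # u solves the free equation
```

The factor exp(i·2i·x₂) = exp(−2x₂) makes the exponential growth in the direction −ℑ(ζ), and hence the unphysical nature of these solutions, directly visible.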

Complex geometric optics solutions
Before we are able to study the mapping properties of G ζ we have to introduce some Sobolev based function spaces.

Lemma 5.4. There exist constants C_µ > 0, depending only on |µ| and the choice of η in (5.8b), such that for all f ∈ H^{µ,−1}(ℝ^d) the estimate holds. Proof. Let η be the function in the definition of the H^s-norm in (5.8b) and let ρ > 1. Consider the operator T_ρ : H^{µ,t} → H^{µ,t+1} defined by T_ρ f = η_ρ f. We will first show that this operator is bounded.
Assume that µ ∈ ℕ₀ and f ∈ H^{µ,t} for some t ∈ ℝ. Then we get for all multi-indices α with 0 ≤ |α| ≤ µ that there exists a constant c_α > 0 depending on η such that the corresponding bound holds. As |∂^β f| ∈ H^{0,t} for all 0 ≤ β ≤ α, we therefore get that there exist constants C_µ depending only on µ and η such that the estimate holds. Using again Proposition 5.2(d), we see that T_ρ f = η_ρ f is a bounded linear operator from H^{µ,t} to H^{µ,t+1} with norm bound C′_µ ρ for C′_µ := √2 C_µ. Therefore interpolation implies the claim for an appropriate constant C_µ.
By definition of the norm on H µ ρ, for µ ≥ 0 we therefore choose t = −1. Using duality and t = 0 then yields the claim for the same constant C µ.
We will need the introduced spaces only for the construction of complex geometric optics solutions. As this construction will merely be carried out for d = 3, from now on we will write H µ,t , H µ ρ and H µ for the respective space with domain R 3 .

Complex geometric optics solutions for bounded potentials
We now return to studying the solution operator G ζ. Note that this operator can actually be represented as a convolution operator of the form (5.9). This representation was used to prove the following:

Proposition 5.5 ([Wed91, Theorem 1.1]). Let t > 1/2, µ ∈ R, ν ∈ [0, 2] and ζ ∈ C 3 with ζ · ζ = E and |ζ| ≥ 1/10. Then there exist constants C(ν, t) > 0 such that the corresponding estimate holds.

Proof. The result of Weder is only stated for µ = 0. The extension to µ ∈ R follows by observing that G ζ and F * (1 + |·| 2 ) µ/2 F commute.
The result in [Wed91] gives a more explicit value for C(ν, t), but for our purpose the estimate in the previous proposition is sufficient. Furthermore, the equivalent norm definition of Proposition 5.2(e) has been used.

Proof. Setting t = 1, we get by Proposition 5.5 and Lemma 5.3 the corresponding estimate for f ∈ H µ ρ. We now rewrite (5.7) as a fixed point equation; the operator involved has the following norm bound.

Lemma 5.7. Let f ∈ dom(F) and ζ ∈ C 3 with ζ · ζ = E such that |ζ| ≥ max{1/10, C r f | L ∞ } with C r := 4C 0 C(0, 1)(1 + r) 2 , (5.11) where C(0, 1) and C 0 are the constants of Proposition 5.5 and Lemma 5.4, respectively. Then there exists a c > 0 depending only on r such that the stated estimate holds.

Proof. The norm estimate is implied by Lemma 5.6. In order to see that G ζ M f is indeed a contraction, note that by Lemma 5.6 we can infer the corresponding bound for all h ∈ H 0 , since (5.11) guarantees |ζ| ≥ 1/10. The second lower bound on |ζ| then gives the desired contraction result, and the claim follows.
In summary, we obtain the following estimates for v ζ solving (5.7).

Theorem 5.8. Let the assumptions of Lemma 5.7 be met, let ν ∈ [0, 2] and choose ζ ∈ C 3 \ R 3 according to (5.11). Then there exists a constant c > 0 depending on r only such that for v ζ solving (5.7) the stated estimates are fulfilled.

Proof. We use the findings on complex geometric optics solutions.

Proof of (a): Since supp( f ) ⊂ B r we get by Lemma 5.7 that there exists a c > 0 depending on r only such that the claimed bound holds.
Proof of (b): By (5.7) and the first part we obtain a corresponding estimate. Combining this with the fact that |g ζ (x)| = O(1/|x|) uniformly in ζ ∈ C 3 \ R 3 (see [NK87]), we see that there exists a constant c > 0 depending only on r such that the claimed bound holds if ζ is chosen according to (5.11).
Proof of (c): Note that supp((1 + η 2r v ζ ) f ) ⊂ B r and by (a) this function even lies in H 0 r . Hence by (5.7) and Lemma 5.6 we get the corresponding estimate. Taking the supremum over C ν C(ν, 1) for ν ∈ [0, 2] (which is finite) yields the claim.

Energy dependent estimates of low Fourier frequencies for the difference of two potentials
The next aim is to derive the key estimate which will be used to obtain an estimate of the form (2.28c). This will be done in three steps: First we need to establish an Alessandrini-type identity as a connection between potentials and the corresponding data; this will lead to a transmission problem. A solution to the transmission problem is given by a complex geometric optics solution in the inner domain, which is then continued as a radiating solution meeting (5.2) in the outer domain. Secondly, as the desired connection between potentials and data involves the jump of the Neumann traces across the boundary, we have to find estimates on these as well. Lastly, a combination of these estimates yields an estimate of |F ( f 1 − f 2 )(ξ)| for |ξ| small enough.

Connecting measurements with potentials
In the following we consider a transmission problem that will help us to connect measurements and potentials in a different way. For f ∈ dom(F) and a given function h on the sphere Γ := ∂B R with R > r, we seek a solution w of (5.12), where the traces are defined with respect to the outer normal vector n of Γ. The solution to (5.12) is unique and can be calculated explicitly. We then obtain the following characterization of the difference of two potentials in dom(F) by the solutions of (5.12):

Lemma 5.9 ([HH01, Lem. 3.2]). Let R > r and let f j be potentials with f j ∈ dom(F) for j = 1, 2. Let w j be solutions to the boundary value problem (5.12) given by f j for j = 1, 2.
Then the identity holds true.
Note that if w 1 and w 2 have the form of complex geometric optics solutions (5.6) in B r with ζ 1 + ζ 2 = −ξ, then the left-hand side is of the form F ( f 1 − f 2 )(ξ) + O( v ζ 1 | H 0 , v ζ 2 | H 0 ), and the perturbation of the Fourier coefficient becomes small as |ζ 1 |, |ζ 2 | → ∞ due to Theorem 5.8. It therefore remains to find good control of the right-hand side.

Trace estimates for complex geometric optics solutions
We now want to find estimates for ∂ ± n w, where w is a solution to (5.12) whose restriction to B R has the form of a complex geometric optics solution. Note that such a solution exists, since we can define w ζ by (5.14), where ũ ζ is the unique scattering solution to the Helmholtz equation in R 3 \ B R with prescribed Dirichlet data ũ ζ | Γ = u ζ | Γ on Γ = ∂B R (see [CK13, Sec. 3.2] for uniqueness, existence and regularity properties of this solution). Then obviously w ζ is a solution to (5.12) and uniquely defined for ζ ∈ C 3 . To estimate ∂ + n w ζ we need an estimate for the Neumann trace of a radiating solution to the Helmholtz equation. Several estimates of this kind exist that also make the dependence on the energy explicit; see [CGLS12] for an overview. Here we will use the following estimate, which is known to be sharp with respect to the energy E.

Lemma 5.10. Let Ω − ⊂ R 3 be star-shaped with Lipschitz boundary and set Γ := ∂Ω − and Ω + := R 3 \ Ω − . Let u ∈ H 1 loc (Ω + ) be a radiating solution of the Helmholtz equation such that u| Γ ∈ H 1 (Γ). Then for every E 0 > 0 there exists a constant c independent of u and E such that ∂u/∂n + | L 2 (Γ) ≤ c E 1/2 u| Γ | H 1 (Γ) holds true for all E ≥ E 0 .
Note that w ζ and B R fulfill the regularity assumptions on u and Ω − respectively and hence Lemma 5.10 can be applied to estimate ∂ + n w ζ . We now use simple trace estimates and Lemma 5.7 to find a bound on the jump of the Neumann derivative of w ζ across ∂B R .
Lemma 5.11. Let E ≥ 1, R > r, f ∈ dom(F) and let w ζ be the solution to (5.12) defined by (5.14). Choose ζ ∈ C 3 \ R 3 according to Lemma 5.7. Then the stated estimate holds true for all ζ fulfilling (5.11), with c depending only on R and r.
Proof. By Theorem 5.8(c) with ν = 2 the complex geometric optics solution u ζ is in H 2 loc , so we can estimate its norm by c|ζ| 3 e R|Im ζ| for ζ meeting (5.11), where the constant c depends on R and r only. Likewise we obtain a norm estimate via Theorem 5.8(c) with ν = 1, with c again depending on R and r. Therefore, via interpolation we get a corresponding bound with a constant depending on R and r if ζ fulfills (5.11). By the trace theorem there exists a constant c depending on R such that, together with Lemma 5.10, we can estimate ∂w ζ /∂n + | L 2 (∂B R ) ≤ cE 1/2 u ζ | H 3/2 (B R ) .

Bounds of the Fourier transform at fixed frequency
We now want to apply the above results to obtain a bound on the low Fourier frequencies of the difference of two potentials, while making the dependence on the energy E explicit.
Theorem 5.12. Let ν ∈ [0, 1), E ≥ 1, R > r, C ∞ > 0 and let f 1 and f 2 be potentials with f j ∈ dom(F) and f j | L ∞ ≤ C ∞ for j = 1, 2. Denote by g j = F( f j ) the error-free data of f j for j = 1, 2. Choose t, b > 0 according to condition (5.15), where t 0 := C r C ∞ and C r is as in (5.11).
Then there exists a constant c depending only on R and r such that for all ξ ∈ R 3 satisfying |ξ| ≤ b the stated estimate holds.

Proof. For fixed ξ ∈ R 3 choose two unit vectors d 1 and d 2 in R 3 such that ξ · d 1 = ξ · d 2 = d 1 · d 2 = 0, and for t as in (5.15) define ζ t 1 and ζ t 2 accordingly. Then ζ t 1 , ζ t 2 ∈ C 3 \ R 3 and they satisfy ζ t 1 + ζ t 2 = −ξ and ζ t j · ζ t j = E. As |ζ t j | ≥ t 0 and |ζ t j | ≥ E ≥ 1 imply that ζ t j fulfills (5.11) for j = 1, 2, there exist by Section 5.3.1.2 complex geometric optics solutions of the form (5.6), where u j solves the equation −∆u j + f j u j = Eu j in B R for j = 1, 2. By extending u j to radiating solutions outside of B R solving (5.12) for f j we obtain (5.17). Using the results of Lemmas 5.9 and 5.11 we can bound the first term in (5.17). Note that there exists a constant c depending only on R such that t 4 e 2Rt ≤ ce (2R+1)t and t 6 e 2Rt ≤ ce (2R+1)t for all t > 0.
Therefore we obtain a bound for this term. The second integral in (5.17) can be bounded using the estimates of Theorem 5.8(b) and (c) for ν ∈ [0, 1), with c depending on r only. Combining the two last estimates gives the desired result.
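The vectors ζ t 1 , ζ t 2 used in the proof can be written down explicitly. The sketch below uses the classical Sylvester–Uhlmann-type choice (the exact formula in the text may differ) and checks the required properties numerically; the values of E, t and ξ are illustrative.

```python
import numpy as np

E, t = 4.0, 7.0                         # illustrative energy and parameter
xi = np.array([1.0, 2.0, 2.0])          # fixed frequency; needs |xi|^2/4 <= E + t^2

# orthonormal pair d1, d2 with xi . d1 = xi . d2 = d1 . d2 = 0
d1 = np.cross(xi, [1.0, 0.0, 0.0]); d1 /= np.linalg.norm(d1)
d2 = np.cross(xi, d1);              d2 /= np.linalg.norm(d2)

rad = np.sqrt(E + t**2 - xi @ xi / 4.0)
zeta1 = -xi / 2 + rad * d2 + 1j * t * d1
zeta2 = -xi / 2 - rad * d2 - 1j * t * d1

# zeta . zeta denotes the bilinear form (no complex conjugation)
for z in (zeta1, zeta2):
    assert abs(np.sum(z * z) - E) < 1e-10          # zeta . zeta = E
assert np.allclose(zeta1 + zeta2, -xi)             # frequencies add up to -xi
# |zeta|^2 = E + 2 t^2, matching the quantity appearing later in the proof
assert abs(np.vdot(zeta1, zeta1).real - (E + 2 * t**2)) < 1e-9
```

All cross terms vanish because ξ, d1 and d2 are mutually orthogonal, which is exactly why ζ · ζ = E holds despite the complex shift.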

Convergence Rates
Recall that we proposed the Tikhonov functional in (5.5) to recover the potential from the Green's function. We now have the necessary tools at hand to show that for this functional the following VSC is fulfilled.
We will now introduce a related inverse problem; afterwards we verify VSCs for both problems. We finish with a discussion of the results.

Extension to far field data
The data of the operator F described in (5.4) is sometimes called near field data, as we measure the data on ∂B R , which in this denomination is assumed to be not too far away from B r containing the support of the potential f. An interesting case for applications is the limit R → ∞. Recall that every solution u of the Schrödinger equation (5.1) fulfilling the Sommerfeld radiation condition (5.2) has a known asymptotic behavior, uniformly in all directions x̂ = x/|x| ∈ ∂B 1 , in which u ∞ is called the far field pattern (see e.g. [CK13]). Roughly speaking, this states that far away from the potential the scattered field u sc (x) := u(x) − u in (x) has the form of a field emitted by a point source as in (5.3) for y = 0, with an amplitude modulation that depends only on the direction of travel. As incident fields of the form (5.3) decay with the distance to their origin y, we cannot probe a potential located around the origin with such incident fields generated at ∂B R with R = ∞. Instead we will consider incident plane waves which propagate in the direction d ∈ ∂B 1 . Define now u ∞ (x̂, d) as the far field pattern in the direction x̂ generated by an incident plane wave propagating in the direction d, for all directions x̂, d ∈ ∂B 1 . Recovering the potential from such data coincides with solving F f ( f ) = g with the corresponding far field operator F f . Using a similar regularization approach one can verify the following VSC for this problem:

Theorem 5.14. Let r > 0 and E ≥ 1. Let C ∞ > 0, p ∈ (1, 2], s > 0, > 0 and choose ε, θ ∈ (0, 1). Suppose that the true potential f † satisfies f † ∈ D such that f † | L ∞ ≤ C ∞ and f † | B s p,∞ ≤ . Then there exists a constant c > 0 such that for the Tikhonov functional (5.18) the variational source condition holds true for all f ∈ D with µ = 2/(4 − p + ε) · min{s/4, 1}.
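The far field behavior described above can be checked numerically for a point source: for large |x|, the outgoing field e^{iκ|x−y|}/(4π|x−y|) equals, up to O(1/|x|²), an outgoing spherical wave modulated by a factor depending only on the observation direction. A quick sketch with assumed values:

```python
import numpy as np

E = 4.0                                  # illustrative energy
kappa = np.sqrt(E)                       # wavenumber
y = np.array([0.3, -0.2, 0.1])           # source point near the origin

def Phi(x, y):
    """Outgoing fundamental solution e^{i kappa |x-y|} / (4 pi |x-y|)."""
    d = np.linalg.norm(x - y)
    return np.exp(1j * kappa * d) / (4 * np.pi * d)

for R in (1e2, 1e4):
    xhat = np.array([1.0, 2.0, 2.0]) / 3.0      # unit observation direction
    x = R * xhat
    # far field form: spherical wave times direction-dependent amplitude
    far = np.exp(1j * kappa * R) / (4 * np.pi * R) * np.exp(-1j * kappa * xhat @ y)
    assert abs(Phi(x, y) - far) < 1.0 / R**2    # error decays like 1/|x|^2
```

The factor e^{−iκ x̂·y} is precisely the direction-of-travel modulation mentioned in the text; the 1/|x|² error bound mirrors the remainder in the far field expansion.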

Proof of the theorems
We now prove the two Theorems 5.13 and 5.14 stating that a VSC is fulfilled by each of the two inverse problems.

Verification of the variational source condition for near field data
To prove our claim the following lemma is necessary, which helps us bring Theorem 5.12 into the desired form of Theorem 2.24.
Lemma 5.15. Let f ∈ B 0 p,2 ∩ L ∞ with supp( f ) ⊂ B r for some r > 0 and p ∈ (1, 2], then for all ε ∈ (0, 1) there exists a constant c depending on ε, p and r such that the estimate holds true. If p = 2 we can even choose ε = 0.
Proof. For p < 2 and ε ∈ (0, 1), define θ ε by the equation (p − ε)/2 = 1 − θ ε and define p ε accordingly. Then Lemmas B.19 and B.14 yield the corresponding estimate. Using the wavelet norm · | B 0 p ε ,2 W , we note that each wavelet coefficient can be estimated accordingly. As we know that χ B r ∈ B 1/p ε p ε ,∞ , there exists a constant c depending on ε, p and r such that the claimed bound holds, and the claim follows.
Note that for p < 2 the constant of the lemma satisfies c → ∞ in the limit ε → 0, as χ B r ∈ B 0 ∞,∞ .
Proof of Theorem 5.13. For convenience the proof will be split into several parts.

Support results: Since f † and the wavelets φ l j,m are compactly supported, it is clear that at a fixed level of resolution j ∈ N 0 only a finite number of the wavelet coefficients λ l j,m can be nonzero. As for the wavelet coefficients µ l j,m of the subdifferential f * we have µ l j,m = 0 if and only if λ l j,m = 0, the function f * will also have compact support. To be more precise, we know that supp(φ 0 ) ∪ (∪ 1≤l≤7 supp(φ l )) ⊂ B 1/2 with φ 0 and φ l the functions from (B.4). Due to supp( f † ) ⊂ B r , we obtain λ l j,m = 0 if 2 min{1−j,0} m ∉ B r+1/2 =: Ω. This in addition implies that f * (x) = 0 for x ∉ Ω.
Choice of operator family: To apply our strategy, Theorem 2.24, we need to define the family of operators P j . To do this we combine the ideas of Chapter 4 with the support results above. Denote by P̃ j the projection defined in (4.3) and by P Ω the corresponding projection associated with Ω. We will set P j := P̃ j P Ω P̃ j P Ω for j ∈ N. Note that P Ω | B s p,q → B s p,q ≤ 1 as well as P̃ j | B s p,q → B s p,q ≤ 1 for all s ∈ R and p, q ∈ [1, ∞].
Smoothness: We can decompose I − P j = (I − P̃ j ) + P̃ j [P Ω (I − P̃ j )P Ω + (I − P Ω )]. By the support result we have supp( f * ) ⊂ Ω and hence P Ω f * = f * . With the norm bounds of the operators, we then get the smoothness estimate by Lemma 4.6.

Ill-posedness: For the ill-posedness result we use the findings of Section 5.3. For the first factor note that the bound holds, as f * ∈ B s p ,∞ . For the second factor we want to apply Theorem 5.12, and therefore we need to estimate | P j ( f † − f ), φ l k,m |. As by assumption the wavelets are continuous and compactly supported, we know that their Fourier transform is in L 1 , and by the definition of the wavelets and the calculus rules for the Fourier transform we obtain that F φ l k,m | L 1 ≤ 2 3k/2 F φ l | L 1 . As for fixed k ∈ N 0 there are O(2 3k ) points m ∈ Z 3 with 2 min{1−k,0} m ∈ Ω, we obtain the desired upper bound, and the sum is finite.
Extension to all τ: We now prove that a variational source condition of a similar form also holds true for τ ≥ τ max . Independently of τ, the following inequality holds. If τ ≥ τ max , then we have 1 ≤ E + 2t(τ) 2 ≤ t 0 with t chosen as above. Thus, as µ ∈ (0, 1], we obtain for these τ ≥ τ max the VSC (5.19b). If on the other hand τ ≥ 1, then we have ln 2 (3 + τ −1/2 ) < 2. Hence there exists a constant c such that the corresponding estimate holds, since again E ≥ 1 and 0 < µ ≤ 1. Therefore we obtain the VSC for τ ≥ 1 as well.

Final result: Combining now (5.19a) with (5.19c), we see that the VSC holds as stated in Theorem 5.13.

Verification of the variational source condition for far field data
The proof uses the VSC of Theorem 5.13 together with a spectral source condition which connects near field data with the far field pattern.

Lemma 5.16 ([HH01, Section 4]). Let R > r, > 0 and 0 < θ < 1. Then there exist constants ω, ρ, τ max > 0 such that for any two potentials f 1 , f 2 ∈ D satisfying f j | B 0 p,2 ≤ C B the stated estimate holds, where the data are the near field and far field scattering data for f j for j = 1, 2, respectively.
Note that the dependency of the constants of the lemma on E (and on the other parameters) is unknown. Thus, if we use this lemma in the proof, we cannot hope to make the dependence on these constants explicit in Theorem 5.14.
Remark 5.17. The statement in [HH01] requires stronger regularity assumptions on f j than stated here. It is required that f j ∈ H s for some s ≥ 3/2. This regularity is used to show that the mapping f j → g j = F( f j ) is compact. However, this continues to hold under the weaker regularity assumptions, see Lemma 5.1 and [LKK13,Sec. 4].
Proof of Theorem 5.14. Recall that the VSC of Theorem 5.13 holds, as seen in the proof of Theorem 2.24 (for the estimate of the constant C ∆ see Example 2.23(b) and Lemma B.21). Hence we may assume that f | B 0 p,2 ≤ (11/3) f † | B 0 p,2 ≤ 11/3; therefore Lemma 5.16 is applicable with C B = 11/3.
Thus we can find constants B > 0 and τ̃ max ∈ (0, (1/2) min{τ 2 max , 1}] such that the VSC holds for τ ≤ τ̃ max . In order to extend this to all τ, recall the corresponding inequality from the proof of Theorem 5.13. As (ln(3 + τ −1/2 )) −2µθ is bounded from below for τ ≥ τ̃ max , we obtain the result by enlarging the constant if necessary.
(b) The stability estimate holds.

Proof. First note that by Section 5.2 minimizers f α of (5.5a) exist. The convergence rate then follows from (2.24b), while the estimate for Ψ as δ → 0 is due to the fact that the logarithmic term dominates the mixed term. For the conditional stability estimate simply apply (2.33).
Proof. Analogous to the proof of Corollary 5.18.
If we compare our results, Theorems 5.13 and 5.14, for p = 2 with [HW15, Thm. 2.1, Thm. 3.2], several improvements are apparent. For one, the assumption of a smoothness larger than 3/2 on the true solution as well as on the elements allowed in the Tikhonov functional could be dropped at the cost of assuming L ∞ bounds, which is a realistic assumption for practical applications. Hence convergence rates are now obtained for less smooth functions. Furthermore, while the maximal convergence speed is still bounded, the corresponding maximal rate is now achieved at a lower smoothness s, and we do not have to exclude specific smoothness values because of divergence of the involved constants.
Comparing the stability estimate of Corollary 5.18 for near field data with the best known stability estimates [INUW14, IN14], we see that the main disadvantage of our result is that the exponent µ of the logarithmic part is bounded in s. We will discuss why this is the case after Theorem 6.10. On the other hand, in both cases the dependence of the stability function Ψ on E is better in our result, i.e. the exponent of E in the Hölder part is lower. At the same time we would like to point out (as already discussed in Section 5.1) that the data term is more meaningful in our setup.
A stability estimate for far field data similar to Corollary 5.19 has been derived in [IN13]. The advantage of their result is that the exponent µ of the stability estimate is again unbounded with respect to the smoothness of the solution; however, a minimal smoothness of f ∈ W 3,1 := { f ∈ S : ∂ α f ∈ L 1 for all α ∈ N 3 with |α| ≤ 3} is required. Likewise, the stability estimates for far field data in [HH01, HW15] require a minimal smoothness of H 3/2 ; thus the main improvement of our new result is that it is applicable to potentials with low smoothness.
Lastly we would like to discuss the advantage of making the dependency on the energy explicit. Assume we can perform a sequence of measurements where E → ∞ as δ → 0, and suppose that f is independent of E. In this case Hölder rates can be achieved. To be more precise, if we choose E in dependence on δ appropriately, then a Hölder convergence rate is attained as δ → 0, which is much better than any logarithmic rate of convergence (even one with a small exponent).
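A quick numerical comparison illustrates why even a small Hölder exponent eventually beats any logarithmic rate for small δ (the exponents µ and η below are illustrative, not the ones from the theorems):

```python
import math

mu, eta = 1.0, 0.1             # illustrative exponents (assumed values)
prev = None
for exp10 in (20, 40, 80):
    delta = 10.0 ** (-exp10)
    log_rate = (-math.log(delta)) ** (-mu)   # fixed-energy logarithmic rate
    hoelder = delta ** eta                    # high-energy Hoelder rate
    assert hoelder < log_rate                 # Hoelder wins for small delta
    ratio = hoelder / log_rate
    if prev is not None:
        assert ratio < prev                   # and the gap keeps widening
    prev = ratio
```

At δ = 10⁻²⁰ the logarithmic bound is still above 0.02, while the Hölder bound is already 10⁻², and the ratio between the two tends to zero as δ → 0.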
Note that it is known that the problem is even Lipschitz stable if one assumes that f (x) = ∑ J j=1 f j χ Ω j (x) for some a priori known Lipschitz sets Ω j ⊂ B r but unknown coefficients f j for j = 1, . . . , J, see for example [BdHFS16]. Yet besides the disadvantage of allowing only piecewise constant functions with known jump sets the Lipschitz constant is exponentially increasing with respect to the energy E and the number of partitions J. This makes the advantage of Hölder-logarithmic estimates in the high energy limit obvious.

CHAPTER VI ELECTRICAL IMPEDANCE TOMOGRAPHY
There were, however, hurdles on the road to becoming a professional mathematician, "a mathematician's mathematician", as Alberto Calderón was sometimes called, because other mathematicians would come to him for help when they got stuck on a difficult problem. [...] When I marveled at how he could remain so unassuming despite all the acclaim, he would simply answer "I know how little I know". It was the answer of a mathematician's mathematician.
ALEXANDRA B. CALDERÓN about ALBERTO P. CALDERÓN, in "Selected papers of Alberto P. Calderón with commentary"

Electrical impedance tomography is a noninvasive imaging technique with many applications in medical imaging, geoscience and nondestructive testing; see [MS12, Chap. 12] for a list of references. The idea is to place electrodes on the surface of a body and apply a current with these electrodes. Due to the spatial variations of the conductivity inside the body, one will measure a spatially varying potential distribution on the boundary. By repeating the measurement for different input currents one hopes to retrieve information about the interior conductivity distribution.

The advantage of using electrical impedance tomography as an imaging technique is that it is noninvasive and radiation free. Furthermore, anomalies one wants to detect often have a high contrast compared to the background. The main disadvantage, however, is the very low spatial resolution one achieves. While the required equipment for measurements is rather cheap, the problem is very sensitive to noise and modeling errors, thus making the application challenging. From a mathematical point of view, the starting point for the investigation of this problem is usually attributed to [Cal80]. In this pioneering work the questions were raised whether such measurements uniquely determine the conductivity and how to retrieve it. It was shown that the linearized problem has a unique solution for small perturbations of constant conductivities, and a reconstruction procedure was suggested. Since then a lot of progress on this topic has been made; yet the theory is (at least in the three-dimensional case) not complete. The challenges of the problem are its strong nonlinearity and ill-posedness. Several reconstruction methods specifically tailored to this problem have been proposed and successfully implemented. For example, the D-bar method starting with [Nac96, SMI00] aims to reconstruct the conductivity everywhere, whereas factorization-based methods (see e.g. [HB03, Har13]) try to find deviations from a reference conductivity.
In this chapter we will study Tikhonov regularization to solve for the conductivity. We will give a precise statement of the problem in Section 6.1 and review known results. While proving the regularization property turns out to be straightforward under typical assumptions on the conductivity, the difficult part will be to also prove convergence rates for this method.
Here we will deviate from the previous chapters and not prove a VSC directly. Instead we will first prove a stability estimate for the problem in Section 6.2. This will be done by using a close relation between electrical impedance tomography on the one hand and the Schrödinger equation studied in the previous chapter on the other. Most importantly, stability estimates for the latter imply stability estimates for the former. As one loses smoothness in the process of going from conductivities to potentials, we have to study complex geometric optics solutions again, but this time for potentials which are less smooth.
In Section 6.3 we will then prove the convergence rate result. In order to do so, we will show how general stability estimates can be used to show that VSCs are fulfilled. The result then follows from the stability estimate we showed earlier.

The Electrical Impedance Tomography Problem
We will now give a more rigorous introduction to the electrical impedance tomography problem by describing the direct and the inverse problem, including a literature review. Afterwards, we introduce a Tikhonov functional for this problem and prove its regularizing properties.

Problem setup
Let γ denote the spatially varying conductivity of the body Ω, where for simplicity we will assume that Ω = B 1 . Then the potential distribution u inside the body for an applied current h is given by the solution to the Neumann problem ∇ · (γ∇u) = 0 in B 1 , γ ∂u/∂n = h on ∂B 1 , where n is the outer normal vector of B 1 ; see [Bor02] for a derivation from Maxwell's equations.
Direct problem

For the direct problem one assumes that γ is given, and the goal is to calculate u| ∂B 1 . Note that the applied current has to satisfy the compatibility condition ∫ ∂B 1 h ds = 0, and a solution u can only be unique up to an additive constant. Therefore we introduce suitable spaces of functions with vanishing mean, where we set f | H 1 (B 1 ) := (∫ B 1 |∇ f | 2 dx) 1/2 , which by Poincaré's inequality is indeed a norm. The weak formulation of (6.1) is then given accordingly, where Tr : H 1 (B 1 ) → H 1/2 (∂B 1 ) denotes the trace operator. Assuming that γ ∈ L ∞ with γ(x) ∈ [γ, γ] for all x ∈ B 1 and for some γ > 0, we see that this problem is elliptic. Hence one can apply the Lax–Milgram lemma to see that there exists a bijective linear operator L γ fulfilling the estimates (6.2). Thus for all h ∈ H −1/2 (∂B 1 ) there exists a unique solution u ∈ H 1 (B 1 ) to (6.1) given by u = L −1 γ Tr * h, where Tr * is the adjoint of Tr. This allows us to define the Neumann-to-Dirichlet map (sometimes also called current-to-voltage map in this context), which to given Neumann data (or current) assigns the corresponding Dirichlet data (or voltage) of the problem. Note that by (6.2) u will depend continuously on γ with respect to the L ∞ norm if γ fulfills the ellipticity condition γ(x) ∈ [γ, γ]; thus the mapping γ → Λ γ is also continuous.
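As a sanity check for the Neumann-to-Dirichlet map, the two-dimensional unit disk with γ ≡ 1 can be diagonalized explicitly (a 2D analogue, not the d = 3 setting of the text): Λ 1 maps e^{inθ} to e^{inθ}/|n| for n ≠ 0. The following sketch verifies this symbolically:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
n = sp.symbols('n', integer=True, positive=True)

# On the 2D unit disk with gamma = 1, u_n = r^n e^{i n theta} / n is harmonic
u = r**n * sp.exp(sp.I * n * th) / n
laplace = sp.diff(u, r, 2) + sp.diff(u, r) / r + sp.diff(u, th, 2) / r**2
assert sp.simplify(laplace) == 0

# Neumann trace at r = 1 is e^{i n theta}, Dirichlet trace is e^{i n theta}/n,
# so Lambda_1 maps e^{i n theta} to e^{i n theta} / n
neumann = sp.diff(u, r).subs(r, 1)
assert sp.simplify(neumann - sp.exp(sp.I * n * th)) == 0
assert sp.simplify(u.subs(r, 1) - sp.exp(sp.I * n * th) / n) == 0
```

This explicit diagonalization is also a convenient test case for numerical forward solvers of the Neumann problem.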

Inverse problem
The inverse problem we are considering is to determine the conductivity from the knowledge of the Neumann-to-Dirichlet map Λ γ . The review [Uhl09] summarizes known results, focusing on complex geometric optics solutions; the connection between these solutions and the problem will be made clear in the following. Concerning the questions posed by Calderón, the state of the art is fundamentally different depending on whether d = 2 or d ≥ 3. For the two-dimensional case the questions have been fully answered. It has been shown in [AP06] that the problem has a unique solution for all γ ∈ L ∞ with γ ≥ γ(x) ≥ γ > 0, and in [BFR07, RFC10] that little regularity of γ suffices to prove stability estimates. Furthermore, an explicit reconstruction scheme based on [Nac96], called the D-bar method, has been developed in [SMI00]; for γ ∈ C 2 it converges with the rate (− ln(δ)) −1/14 , see [KLMS09, Thm. 3.1].
For the three-dimensional case (the case d > 3 usually follows similarly) the situation is different and the picture is less complete. In [KV85] uniqueness has been shown for piecewise analytic conductivities, and in [SU87] and [Nac88] uniqueness has been demonstrated for conductivities γ ∈ C ∞ and γ ∈ C 1,1 respectively, via a connection to the Schrödinger problem (which we will discuss below). The result of Sylvester and Uhlmann has then been improved in [BT03] to γ ∈ H 3/2 p for p > 6, and to γ slightly more regular than Lipschitz in [HT13]. Finally, uniqueness for Lipschitz conductivities has been proven in [CR16], and for conductivities with γ ∈ H 1 3 ∩ L ∞ in [Hab15]. Stability estimates requiring a bit more smoothness than the current state-of-the-art uniqueness results have been obtained in [Ale88, Hec09, CGR13]. These stability estimates are of the form (6.3) with Ψ(t) = c(− ln(t)) −µ . We summarize the achieved results in Table 6.1. As the finding of [Man01] generalizes to electrical impedance tomography, we know that the function Ψ in the stability estimate again has to be of logarithmic type, and therefore the results are optimal up to the value of the exponent. We should mention that for both the two- and the three-dimensional case, the formulation of the inverse problem is usually slightly different than presented here. It is usually assumed that the data is given by the Dirichlet-to-Neumann map Λ DtN γ (that is, the mapping u| ∂B 1 → ∂ n u| ∂B 1 ) instead of the Neumann-to-Dirichlet map Λ γ . However, in the given functional setting one can show that, as long as the conditions of the stability estimates imply that the involved mappings are uniformly bounded, a stability estimate for one gives a stability estimate for the other, and one loses at most a constant.
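The logarithmic form of Ψ makes reconstructions extremely sensitive to noise. A short computation shows what the exponent 1/14 of the D-bar rate quoted above means in practice (the target accuracy below is illustrative):

```python
import math

# How small must delta be for the D-bar rate (-ln delta)^(-1/14) to reach 0.1?
target = 0.1
needed = target ** (-14.0)            # -ln(delta) must be about 10^14
assert abs(needed - 1e14) < 1.0
# the corresponding delta = exp(-10^14) underflows double precision to zero
assert math.exp(-needed) == 0.0
```

A noise level of e^{−10^{14}} is far beyond anything physically attainable, which is why logarithmic stability is regarded as severe ill-posedness and why the exponent µ matters so much.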

Regularization of electrical impedance tomography
The goal of this section is to setup a Tikhonov functional for the electrical impedance tomography problem.
We have seen earlier that in order to derive well-definedness of the Neumann-to-Dirichlet operator Λ γ we needed that γ(x) ∈ [γ, γ] and that the support of γ − 1 is contained in B 1 . The first condition is usually called ellipticity condition, as it guarantees ellipticity of the PDE. In practice the conductivities of involved materials are usually quite well known, see e.g. the tables in [Bor02], and therefore upper and lower bounds are readily available. Hence an assumption of this form is not very restrictive, and we will include it in the definition of the domain of the operator. The second condition motivates to set f := γ − 1; that is, the unknown in the inverse problem is the deviation of the conductivity from the homogeneous conductivity. The domain of the forward operator is then given accordingly; for the exact definition and the image space of the operator see below. Note that this way f will be a function on the whole of R 3 . As we assume that the support of f is contained in the open ball B 1 , an extension of f to the whole space will not introduce any type of singularities, and hence the smoothness of γ defined on B 1 and of f defined on R 3 will coincide. As already discussed in Section 6.1, it is unknown whether the inverse problem has a unique solution for all f ∈ dom(F). As the problem is in addition nonlinear, we will also not be able to guarantee uniqueness of the R-minimizing solution if more than one solution might exist. Hence the choice of the penalty functional will enforce enough regularity on f such that the inverse problem is uniquely solvable. As a starting point we will use the uniqueness result of [BT03], which yields uniqueness for γ ∈ H 3/2 p̃ for all p̃ > 6. By Lemma B.18 we know that f ∈ B 3/p+1+τ p,2 for p ≤ 2 and τ > 0 implies the necessary smoothness on γ for some p̃ > 6 by embeddings.
The motivation for choosing a Besov space as a primary regularizer, as well as the choice of the fine index q = 2, are the same as in Section 5.2. Moreover, we will impose a stronger restriction on the support, which will be needed in order to get a closed set on which the Tikhonov functional is finite. The pre-image space and the penalty term will then finally be given for p ∈ (1, 2], τ ∈ (0, 1) and r ∈ (0, 1). The restriction on τ will become clear when constructing complex geometric optics solutions later. However, in order not to require too much regularity, one should think of τ as close to zero anyway. Note that if we were to start with the uniqueness result of [Hab15], then we could set X = B t p,2 for t = 3/p, i.e. we would require roughly one derivative less on the solution of the problem. Yet in this case the construction of complex geometric optics solutions and the corresponding estimates get much more involved. Therefore we stick to the result of [BT03], as then the analysis can be carried out similarly to Section 5.3 in combination with a simple approximation procedure.
We will now take a closer look at what we consider to be the data of our inverse problem. It will turn out that it is advantageous to define where Λ 1 is the Neumann-to-Dirichlet map for the conductivity γ ≡ 1 or equivalently f ≡ 0 and j : H 1/2 (∂B 1 ) → L 2 (∂B 1 ) is the canonical embedding and j * its adjoint. Thus the data is a mapping and as it has a continuous extension to a mapping H −1/2 (∂B 1 ) → H 1/2 (∂B 1 ) we get that it is also compact. It will turn out that it is even a Hilbert-Schmidt operator. This would be immediate from the decay of the singular values of j for d = 2 but for d = 3 this decay is too slow. The advantage of regarding g as a Hilbert-Schmidt operator and not as a mapping between the canonical trace spaces has already been discussed in Section 5.1.
In order to see that the data is indeed a Hilbert-Schmidt operator, we will split Λ 1+ f − Λ 1 = S(T 1+ f − T 1 ) and use that S is infinitely smoothing. Denote r̃ := (1 + r)/2 and define for any conductivity γ with γ − 1 ∈ dom(F) and supp(γ − 1) ⊂ B r the operator T γ : H −1/2 (∂B 1 ) → H 1/2 (∂B r̃ ) by h → u γ | ∂B r̃ , where u γ is the solution to (6.1), i.e. u γ solves ∇ · (γ∇u γ ) = 0 in B 1 . Furthermore, define S : H 1/2 (∂B r̃ ) → H 1/2 (∂B 1 ) as the operator that maps h̃ → w| ∂B 1 , where w is the unique solution to the corresponding Cauchy problem. If h ∈ H −1/2 (∂B 1 ) and h̃ = u γ 1 | ∂B r̃ − u γ 2 | ∂B r̃ , where u γ 1 and u γ 2 solve (6.1) for γ 1 and γ 2 respectively, then the solution of the Cauchy problem is given by w = u γ 1 − u γ 2 , since the u γ j solve ∆u γ j = 0 in B 1 \ B r̃ for j = 1, 2. Furthermore, w| ∂B 1 ∈ H 1/2 (∂B 1 ) since u γ j | ∂B 1 ∈ H 1/2 (∂B 1 ) for j = 1, 2. Thus the claimed splitting holds, and this yields that Λ 1+ f − Λ 1 : L 2 (∂B 1 ) → L 2 (∂B 1 ) is indeed a Hilbert-Schmidt operator. Thus we choose as image space of F and data fidelity term Y := HS(L 2 (∂B 1 ), L 2 (∂B 1 )) and S(g 1 , g 2 ) = T g 1 (g 2 ) := (1/2) g 1 − g 2 | Y 2 , (6.4e) as usual when dealing with the deterministic error model. It remains to choose the topologies on X and Y such that the Tikhonov functional defined by (6.4) is regularizing. We will equip X with its weak and Y with its norm topology. As our penalty term guarantees unique solvability of the inverse problem, it remains to check the continuity properties of F on D, which is obviously weakly sequentially closed. Note that γ → Λ γ is norm-to-norm continuous with respect to the L ∞ -norm on the closed set D̃, as we have seen in (6.2). Denote by J the embedding, which is compact. Then we get that J(D) ⊂ D̃, and this shows that F : D → Y is sequentially weak-to-norm continuous.
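A minimal sketch of the Hilbert-Schmidt property in the simpler 2D disk analogue (where even Λ 1 itself is Hilbert-Schmidt; in d = 3 the splitting via S is needed precisely because the singular values of j decay too slowly):

```python
import numpy as np

# On the 2D unit disk, Lambda_1 has eigenvalues 1/|n| on e^{i n theta}, n != 0,
# so the Hilbert-Schmidt norm squared is 2 * sum_{n>=1} 1/n^2 = pi^2 / 3.
n = np.arange(1, 200001, dtype=float)
hs_sq = 2.0 * np.sum(1.0 / n**2)       # both signs of n contribute
assert abs(hs_sq - np.pi**2 / 3.0) < 1e-4
```

The square-summability of the eigenvalues is exactly the Hilbert-Schmidt criterion; for the difference Λ 1+f − Λ 1 the decay is even faster, which is what the smoothing operator S encodes in the three-dimensional argument.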

Stability Estimates for Electrical Impedance Tomography
Our next aim is to prove stability estimates for the electrical impedance tomography problem. Later on these stability estimates will be used to verify VSCs; the reason for this detour will be discussed at the end of Section 6.2.1. The basic idea for investigating stability (but also uniqueness) of the inverse problem to (6.1) has remained unchanged since [SU87]: one exploits that a simple transformation turns (6.1) into a Schrödinger equation. For this equation we already know how to prove stability by complex geometric optics solutions (see the introduction of Section 5.3). The new challenge here is that the obtained potentials might not be in L^∞, and hence we have to rework the results of Section 5.3.1.2 while requiring less regularity on the potential. As the back transformation from potentials to conductivities involves an elliptic PDE, one can then use regularity results to obtain a stability estimate for conductivities. In order to estimate the smoothness of the involved functions we will rely on the generalizations of the product and chain rules presented in Section B.3.1.2.

Connection to Schrödinger equation
We now construct a Schrödinger equation from (6.1). Define the potential V by V = Δγ^{1/2} / γ^{1/2} (6.5) (the change in notation in comparison to Chapter 5 is due to the fact that the potential is no longer the sought-after quantity of the inverse problem). Further define w := γ^{1/2} u; then w is a solution to the Schrödinger equation (6.6). Note that for γ not smooth enough, (6.5) has to be understood in a distributional sense. We will make regularity properties of V given regularity properties of γ more precise later.
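The substitution can be verified directly; a short computation (assuming γ smooth enough to apply the product rule classically) shows that w = γ^{1/2}u indeed solves the Schrödinger equation:

```latex
\Delta w = \Delta\bigl(\gamma^{1/2}u\bigr)
         = u\,\Delta\gamma^{1/2}
           + 2\,\nabla\gamma^{1/2}\cdot\nabla u
           + \gamma^{1/2}\Delta u .
% From \nabla\cdot(\gamma\nabla u) = \gamma\Delta u + \nabla\gamma\cdot\nabla u = 0
% we get \gamma^{1/2}\Delta u = -\gamma^{-1/2}\,\nabla\gamma\cdot\nabla u,
% while 2\,\nabla\gamma^{1/2}\cdot\nabla u = \gamma^{-1/2}\,\nabla\gamma\cdot\nabla u,
% so the gradient terms cancel and
\Delta w = u\,\Delta\gamma^{1/2}
         = \frac{\Delta\gamma^{1/2}}{\gamma^{1/2}}\,\gamma^{1/2}u
         = V\,w .
```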
Let now h ∈ H^{−1/2}(∂B_1); then, similarly to (6.1), we define the Neumann problem for V as in (6.5). If we assume that supp(γ − 1) ⊂ B_r and γ ∈ C^1(B_1), then this problem has a unique solution. Indeed, there exists u solving (6.1) for the same Neumann data, and if w is defined by w = √γ u, then w ∈ H^1(B_1) and w solves (6.6), since γ ≡ 1 for |x| > r.

A similar calculation makes evident that under the given assumptions the Dirichlet data of the solutions also coincide, and hence the problems (6.1) and (6.6) have the same Neumann-to-Dirichlet map. We can now show that this operator is self-adjoint:

Lemma 6.1. Let γ ∈ C^1(B_1), supp(γ − 1) ⊂ B_r and γ(x) ∈ [γ, γ̄] for all x ∈ B_1 for some γ > 0. Then the Neumann-to-Dirichlet map Λ_γ is self-adjoint.
Proof. Let h_k ∈ H^{−1/2}(∂B_1) and denote by w_k the solutions to (6.6) with Neumann boundary data h_k for k = 1, 2. Then, using w_l for l ≠ k as a test function, we obtain the claim with the help of Green's theorem.
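The symmetry underlying this proof is the standard weak formulation; as a sketch (with the sign conventions of (6.6), so an assumption on the precise normalization), testing the equation for w_1 with w_2 and applying Green's theorem gives

```latex
\langle h_1, \Lambda_\gamma h_2\rangle
  = \int_{\partial B_1} h_1\, w_2 \,\mathrm{d}S
  = \int_{B_1} \nabla w_1\cdot\nabla w_2 + V\, w_1 w_2 \,\mathrm{d}x ,
```

and the right hand side is invariant under exchanging w_1 and w_2, whence ⟨h_1, Λ_γ h_2⟩ = ⟨Λ_γ h_1, h_2⟩.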
Now we immediately obtain the following version of Alessandrini's identity (compare [Ale88, Lem. 1] and Lemma 5.9).

Lemma 6.2. Let γ_k ∈ C^1(B_1), supp(γ_k − 1) ⊂ B_r and γ_k(x) ∈ [γ, γ̄] for all x ∈ B_1 for some γ > 0. Denote by V_k the potentials given by (6.5) for γ_k, and let w_k be the solution to (6.6) for Neumann data h_k ∈ H^{−1/2}(∂B_1) for k = 1, 2. Then the asserted identity holds.

Proof. Proceeding as in the proof of Lemma 6.1 we obtain the corresponding weak formulations for V_1 and V_2. As w_k = Λ_{γ_k} h_k on ∂B_1, the claim follows by rearranging terms and using Lemma 6.1.
The left hand side of the identity in the previous lemma is already familiar to us; it is the same as in Lemma 5.9. As w_k is a solution to a Schrödinger equation, we can insert complex geometric optics solutions as a special case. This allows, again, to estimate the low Fourier coefficients of V_1 − V_2 by a combination of the free parameter ζ of the complex geometric optics solutions and the data Λ_{γ_2} − Λ_{γ_1}. Together with a smoothness assumption, which controls the high Fourier coefficients, this will imply a stability estimate for the potentials.
But our goal is to derive a stability estimate for conductivities. Hence we need a relation that transfers stability from potentials to conductivities. To this end we note that there is another useful relation between these two quantities, which again was already used in [SU87]. Assume that for k = 1, 2 we have γ_k ∈ L^∞ with γ_k(x) ∈ [γ, γ̄] for all x ∈ B_1 for some γ > 0; then we can define

a := log(γ_1/γ_2). (6.7a)

By a simple calculation one verifies that formally a is a solution to the elliptic PDE

∇ · ((γ_1 γ_2)^{1/2} ∇a) = 2 (γ_1 γ_2)^{1/2} (V_1 − V_2) in B_1,  a = 0 on ∂B_1. (6.7b)

If we assume that the right hand side of this equation is in H^{−1}(B_1), then we get by the Lemma of Lax-Milgram that a ∈ H^1_0(B_1) and the estimate

‖a‖_{H^1(B_1)} ≤ (2/γ) ‖(γ_1 γ_2)^{1/2} (V_1 − V_2)‖_{H^{−1}(B_1)} (6.8)

holds true. It remains to estimate the norm of the left hand side from below by the norm of γ_1 − γ_2, and the norm of the right hand side from above by the norm of V_1 − V_2. The next two lemmas show that this is possible and that one only loses a constant depending on the C^1-norms of the γ_k.

Lemma 6.3. Let γ_k ∈ C^1(B_1), supp(γ_k − 1) ⊂ B_1 and γ_k(x) ∈ [γ, γ̄] for all x ∈ B_1 for some γ > 0. Then there exists a constant c > 0, depending on γ and γ̄, such that the asserted estimate holds with a as defined in (6.7a).
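The PDE (6.7b) can be checked by direct computation. Writing σ := (γ_1γ_2)^{1/2} and assuming the γ_k smooth enough for the classical product rule:

```latex
\sigma\,\nabla a
  = (\gamma_1\gamma_2)^{1/2}
    \Bigl(\frac{\nabla\gamma_1}{\gamma_1}-\frac{\nabla\gamma_2}{\gamma_2}\Bigr)
  = 2\bigl(\gamma_2^{1/2}\,\nabla\gamma_1^{1/2}
          -\gamma_1^{1/2}\,\nabla\gamma_2^{1/2}\bigr).
% Taking the divergence, the mixed terms
% \nabla\gamma_2^{1/2}\cdot\nabla\gamma_1^{1/2} cancel, leaving
\nabla\cdot(\sigma\,\nabla a)
  = 2\bigl(\gamma_2^{1/2}\,\Delta\gamma_1^{1/2}
          -\gamma_1^{1/2}\,\Delta\gamma_2^{1/2}\bigr)
  = 2\,(\gamma_1\gamma_2)^{1/2}\,(V_1-V_2),
% using V_k = \Delta\gamma_k^{1/2}/\gamma_k^{1/2} from (6.5).
```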
One might wonder why we do not prove a VSC directly by Theorem 2.24. The problem here is that we do not know how to prove the ill-posedness estimate (2.28c). Consider again our typical choice of P_j by (4.3); then we would have to estimate ‖P*_j(f_1 − f_2)‖. But inserting complex geometric optics solutions into the Alessandrini identity given by Lemma 6.2 only gives estimates for ‖P*_j(V_1 − V_2)‖, where V_k is the potential given by (6.5) for γ_k = f_k + 1 for k = 1, 2. As the mapping f ↦ V is nonlinear, we do not know how to relate the two norms ‖P*_j(f_1 − f_2)‖ and ‖P*_j(V_1 − V_2)‖ directly.
As τ > 0 this yields the estimate.
Note that this estimate is sharp in the sense that we cannot hope to get more regularity for V. Indeed, by setting p = 2 we see that V has exactly two derivatives less than γ, as is to be expected from the definition (6.5) of V, since it involves the Laplace operator.
Therefore we want to establish existence and norm estimates of v_ζ for potentials V ∈ H^{1/2+τ}, where from now on we will always assume that V is extended by zero to R^3 \ B_1. In order to do this we again follow the approach of [BT03], this time exploiting more of its capability. The main idea of this approach (which allows one to lower the regularity requirement by approximately one derivative) is to approximate the potential V by a sufficiently smooth function and then treat the remainder as a small perturbation.
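One standard way to realize such a splitting (given here only as an illustration of the idea; it is not necessarily the exact construction used in [BT03]) is mollification at a scale b > 0:

```latex
V = \underbrace{\varphi_b * V}_{\text{smooth part}}
  \;+\; \underbrace{\bigl(V - \varphi_b * V\bigr)}_{\text{small remainder}},
\qquad
\bigl\|V - \varphi_b * V\bigr\|_{H^{1/2}}
  \;\lesssim\; b^{\tau}\,\|V\|_{H^{1/2+\tau}} .
```

For b small the remainder can then be absorbed as a perturbation, while the mollified part enjoys the estimates available for regular potentials.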
Proof. Note that G_{ζ,V} f = g if and only if (I − G_ζ ∘ M_V) g = G_ζ f. Therefore, if G_ζ ∘ M_V is a contraction on H^{1/2} with norm bound 1/2, the Banach fixed point theorem yields a unique solution g, and thus ‖g‖_{H^{1/2}} ≤ 2 ‖G_ζ f‖_{H^{1/2}}.
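Behind the fixed point argument is the usual Neumann series bound: if ‖G_ζ ∘ M_V‖ ≤ 1/2 on H^{1/2}, then

```latex
g = (I - G_\zeta \circ M_V)^{-1} G_\zeta f
  = \sum_{k=0}^{\infty} (G_\zeta \circ M_V)^{k}\, G_\zeta f ,
\qquad
\|g\|_{H^{1/2}}
  \le \sum_{k=0}^{\infty} 2^{-k}\,\|G_\zeta f\|_{H^{1/2}}
  = 2\,\|G_\zeta f\|_{H^{1/2}} .
```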
Proof of (b): The first part and the choice of ζ imply that ‖v_ζ‖_{H^{1/2}} ≤ c |ζ|^{−1} ‖V‖_{H^{1/2}} ≤ c ‖V‖_{H^{1/2}} / (c_em C_r ‖V‖_{H^{1/2+τ}}), and the right hand side is bounded by a constant independent of V and ζ.
Note that the restriction to τ ∈ (0, 1) is motivated by the proof of (c).

The first integral in (6.19) can be bounded by using the estimates of Theorem 6.8 with c independent of V_k and t. Inserting the last three estimates into (6.19) gives the desired estimate.
Proof. Let b > 0. Since V_1 and V_2 are in H^s, we get the estimate for the Tikhonov functional given by (6.4), where f* ∈ ∂‖f†|B‖.

Proof. We will prove the VSC by applying Theorem 6.12. By Theorem 6.11 a stability estimate with Ψ(t) := (ln(3 + t^{−1}))^{−(1+τ)} + t and R(ρ) := cρ(1 + ρ^{ν+3}) ≥ c max{ρ^4(1 + ρ^ν), ρ^5(1 + ρ^{2+τ})} holds true, and we have r = 2. By increasing the constant if necessary we see that R(ρ) ≥ ρ, and the value of ν follows by setting s = 1/2 + τ. Inserting these parameters in Theorem 6.12 finishes the proof.

Proof. In Section 6.1.2 we have shown that Assumption 1.4 is met; thus Theorem 1.6 yields existence of a minimizer. The convergence rate then follows from Theorem 6.15 and Section 2.4.4.2.
To the best of our knowledge, Corollary 6.16 yields the first convergence rate result for a reconstruction algorithm for electrical impedance tomography in three dimensions. Note that the ellipticity condition, the support constraint, as well as the Besov norm by wavelets are all rather easy to implement in practice. However, calculating the minimizer of the Tikhonov functional is still a challenge, as the Tikhonov functional is nonconvex due to the nonlinearity of F. This is the advantage of the D-bar method, which reconstructs the conductivity from the data without having to solve a nonconvex optimization problem. As already mentioned in Section 6.1, it is known that for C^2 conductivities in two dimensions this method converges with rate (− ln(δ))^{−1/14} (see [KLMS09, Thm. 3.1]). This algorithm has been extended to a reconstruction algorithm in three dimensions in [CKS06] for conductivities close to γ = 1 and more recently to the full problem in [DK14] for γ ∈ C^{1,1}. However, for this three dimensional generalization a convergence rate result is not yet available.
Implicitly, convergence rates could have been obtained by Theorem 2.17. But there the smoothness of the true solution has to be known a priori. Further, the stability estimates discussed in Table 6.1 either require very high smoothness (γ ∈ H^s for s > 7/2) or have smoothness assumptions that would be hard to implement as a penalty term (γ ∈ C^{1,ε} for ε ∈ (0, 1)).
We think that our result could be improved in two ways. On the one hand, one could allow the conductivities to be nonconstant near the boundary. As the boundary values of γ depend stably on the data, this should not lead to slower convergence rates. The difficult part here would be to extend the conductivities outside of the domain to γ̃ in such a way that the smoothness of the function is preserved and a slightly weaker ellipticity condition, say γ̃(x) ∈ [γ/2, 2γ̄] for all x ∈ R^3, is fulfilled. However, in this case we would have to choose Y = L(H^{−1/2}(∂B_1), H^{1/2}(∂B_1)), as we could no longer show that the data is a Hilbert-Schmidt operator. On the other hand, applying the results of [Hab15] might allow one to reduce the smoothness assumptions by roughly one derivative. But even then one would still be far away from the ideal case, which would be regularization with a B^s_{p,q}-norm with s < 1/p in order to allow conductivities that are smooth up to jumps, which would be the most important application.
In this work we have studied how to verify VSCs for various settings and problems, with the motivation to obtain convergence rates for Tikhonov estimators. Most results of this thesis are based on our strategy, Theorem 2.24, to verify VSCs. Hence it can be seen as the central result of this work, while the following chapters have shown the great flexibility of the presented approach.
From Chapter 3 we can conclude that VSCs are (under mild assumptions) necessary and sufficient for low order convergence rates for the two most common error models and a large set of estimators. The advantage of VSCs, compared to most other conditions that yield equivalence, is that their formulation does not require the functional calculus. This underlines that they are the "right" condition in order to obtain low order convergence rates (i.e. rates slower than O(√δ)). For higher order rates (that is, rates of o(√δ)) higher order VSCs have been suggested (see Section 2.4.4.1 and references therein). For the verification of these conditions our main strategy is also applicable (see [SH18, Sec. 5]).
Chapter 4 has illustrated that our strategy is also applicable in Besov spaces. A key step was the characterization of smoothness of subgradients for certain norms. However, the picture we get is less complete than for Hilbert spaces. We are only able to obtain order optimal rates for Besov penalties with fine index q ≥ 2. For q < 2 several reasons are possible why we do not obtain optimal convergence rates. We believe that our upper bounds are too pessimistic; this can be seen e.g. by comparing with the case q = 1 treated in [HM18]. But note that in this case the VSC is not formulated with respect to the Bregman distance. Thus an immediate direction of further research is to try to close this gap in optimality for q ∈ (1, 2). However, proving better upper bounds will most likely have to rely on a different technique than the one presented here, since, taken individually, we believe all our estimates to be optimal up to constants. Another interesting topic would be to study converse results in the sense of equivalence of convergence rates, smoothness and VSCs.
In our study of the Schrödinger equation in Chapter 5 we have shown that the findings of the previous chapter are not limited to simple operators on the torus but can also be applied to more involved problems. Here we could improve our previous findings of [HW15] and obtain convergence rates under lower smoothness assumptions, as well as prove that the rates are of Hölder-logarithmic type for near field data, i.e. of Hölder type in the high energy limit. We would like to point out that (to the best of our knowledge) no other convergence rate result for this problem is available. As discussed later on, the main downside of our derived stability estimate, the bounded exponent of the logarithm, seems to be inherited from the fact that we verify a VSC first, as can e.g. be seen in Theorem 6.10.
Chapter 6 contains two main results: a first convergence rate result for electrical impedance tomography in three dimensions and a general method to verify VSCs by stability estimates. Possibilities to improve our stability estimate (which implies that the VSC is fulfilled) were already discussed at the end of that chapter, namely allowing varying conductivity up to the boundary and using estimates on complex geometric optics solutions that require less smoothness of the involved potentials. We believe that further investigation of Theorem 6.12 might lead to interesting new research questions. First of all, it shows the usefulness of stability estimates in weak (i.e. negative smoothness) norms. Second, now that we see that stability estimates are not only implied by VSCs but also imply them, the question arises whether the two conditions are actually equivalent. As we have seen, VSCs derived from stability estimates are not always optimal, so a step forward would be to investigate when sharp converse implications can be obtained.

APPENDIX A CONVEX ANALYSIS
In the following we will summarize some basic concepts of convex analysis. The presented results can e.g. be found in [BP86, Chap. 2] and [Zal02].

Convex functions
We will look at properties of functions mapping into the extended reals R̄ := R ∪ {±∞}.

Definition A.1. Let X be a vector space; then a mapping h : X → R̄ is called convex if h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y) for all λ ∈ [0, 1] and all x, y ∈ X. It is called strictly convex if the inequality is strict for all λ ∈ (0, 1) and x ≠ y. Further, h is (strictly) concave if −h is (strictly) convex. The set dom(h) := {x ∈ X : h(x) < ∞} is called the effective domain of h. The mapping is furthermore called proper if h(x) > −∞ for all x ∈ X and dom(h) ≠ ∅.
For convex mappings the following generalization of continuity is often of interest:

Definition A.2. Let X be a vector space and h : X → R̄ a mapping. Then h is lower semicontinuous at x_0 ∈ X if h(x_0) ≤ lim inf_{x→x_0} h(x). Furthermore, h is called lower semicontinuous if it is lower semicontinuous at every point x_0 ∈ X.

It connects to the classical continuity property in the following way:

Lemma A.3. Let X be a Banach space and h : X → R̄ a proper, lower semicontinuous and convex function. Then h is continuous at every point of the interior of its effective domain.
We are mainly interested in convex functionals due to the following minimization property.
Proposition A.4. Let X be a reflexive Banach space and h : X → R̄ a proper, convex and lower semicontinuous mapping. Then h attains its minimal value on every bounded, convex and closed subset C ⊂ X. If h is even strictly convex, then the minimizer is unique. Further, the statements remain true if boundedness is replaced by the coercivity condition lim_{x∈C, ‖x‖_X→∞} h(x) = ∞.
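As a toy illustration of Proposition A.4 in one dimension (the function below is hypothetical and chosen only for demonstration): a strictly convex continuous function attains a unique minimum on a closed bounded interval, and this unimodality lets a ternary search locate the minimizer.

```python
def ternary_min(h, lo, hi, tol=1e-9):
    """Locate the unique minimizer of a strictly convex h on [lo, hi].

    Strict convexity implies unimodality, so comparing h at two interior
    points lets us discard a third of the interval in every step.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if h(m1) < h(m2):
            hi = m2  # minimizer lies in [lo, m2]
        else:
            lo = m1  # minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

h = lambda x: (x - 1.5) ** 2 + 0.7  # strictly convex, minimizer at x = 1.5
x_star = ternary_min(h, 0.0, 4.0)
print(round(x_star, 6))  # → 1.5
```

Without strict convexity the minimizer need not be unique (e.g. a constant function), which is exactly the distinction the proposition draws.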
Subdifferential

For convex functions there is the following generalization of the derivative:

Definition A.5. Let X be a Banach space and h : X → R̄ convex. Then x* ∈ X* is called a subgradient of h at x if h(x) is finite and h(y) ≥ h(x) + ⟨x*, y − x⟩ for all y ∈ X.
We call the mapping ∂h : X → 2^{X*}, where 2^{X*} is the power set of X*, defined by ∂h(x) := {x* ∈ X* : x* is a subgradient of h at x} if h(x) is finite and ∂h(x) := ∅ else, the subdifferential of h.
Example A.6. Let C ⊂ X be a closed, nonempty, convex set. Then the indicator function ι_C : X → R̄ of C is defined by ι_C(x) := 0 for x ∈ C and ι_C(x) := ∞ for x ∉ C, and the corresponding subdifferential is given by ∂ι_C(x) = {x* ∈ X* : ⟨x*, y − x⟩ ≤ 0 for all y ∈ C} for x ∈ C and ∂ι_C(x) = ∅ for x ∉ C.
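A one-dimensional instance (standard, but not taken from the text) makes this subdifferential concrete: for C = [−1, 1] ⊂ R the defining inequality ⟨x*, y − x⟩ ≤ 0 for all y ∈ C yields

```latex
\partial\iota_{[-1,1]}(x) =
\begin{cases}
\{0\} & x \in (-1,1),\\
[0,\infty) & x = 1,\\
(-\infty,0] & x = -1,\\
\emptyset & |x| > 1 .
\end{cases}
```

At the boundary points the subdifferential is a genuine cone of outward-pointing slopes, while in the interior it collapses to {0}.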
The set ∂ι_C(x) is also called the normal cone to C at x. One immediately sees that 0 ∈ ∂ι_C(x) for all x ∈ C, and one can show that ∂ι_C(x) = {0} if and only if x is in the interior of C. The previous lemma showed how to calculate the subdifferential of differentiable functions. Furthermore, the subdifferential obeys the following calculus rules:

Proposition A.8. Let X, Y be Banach spaces, h_1, h_2 : X → R̄ be proper and convex and T : Y → X be linear and continuous.