Consider a real-valued Gaussian process $f$ on a compact domain $\mathcal{X}$ with mean zero and covariance function $k(x,x') \in [0,1]$ (also known as the kernel function). This question concerns a finite collection of points in $\mathcal{X}$, namely, a set of *sampled points* $\mathbf{x} = [x_1,\dotsc,x_n]^T$ and a *query point* $x$. Writing $\mathbf{f}(\mathbf{x}) = [f(x_1),\dotsc,f(x_n)]^T$, we have the joint distribution
$$\left[ \begin{array}{c} \mathbf{f}(\mathbf{x}) \\ f(x) \end{array}\right] \sim N\left( \left[ \begin{array}{c} \mathbf{0} \\ 0 \end{array}\right], \left[ \begin{array}{cc} \mathbf{K} & \mathbf{k}(x) \\ \mathbf{k}(x)^T & k(x,x) \end{array}\right] \right),$$
where $\mathbf{k}(x)$ is an $n\times 1$ vector with $i$-th entry $k(x,x_i)$, and $\mathbf{K}$ is an $n \times n$ matrix with $(i,j)$-th entry $k(x_i,x_j)$. The samples corresponding to $\mathbf{x} = [x_1,\dotsc,x_n]^T$ are denoted by $\mathbf{y} = [y_1,\dotsc,y_n]^T$, and take the form
$$y_i = f(x_i) + z_i,$$
where $z_i \sim N(0,\sigma^2)$ is additive Gaussian noise (independent for each sample) with $\sigma^2 \le 1$.
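For concreteness, here is a short numerical sketch of this setup; the RBF kernel choice, the constant $c = 5$, and all variable names below are my own illustrative assumptions, not part of the question:

```python
import numpy as np

# Illustrative construction of the joint Gaussian model above, using an
# assumed RBF kernel k(x, x') = exp(-5 * (x - x')^2) on [0, 1].
rng = np.random.default_rng(0)
n = 10
x_train = np.linspace(0.0, 1.0, n)   # sampled points x_1, ..., x_n
x_query = 0.37                       # query point x
pts = np.append(x_train, x_query)

# (n+1) x (n+1) joint covariance [[K, k(x)], [k(x)^T, k(x, x)]]
cov = np.exp(-5.0 * np.subtract.outer(pts, pts) ** 2)
f_joint = rng.multivariate_normal(np.zeros(n + 1), cov + 1e-10 * np.eye(n + 1))

sigma = 0.1
y = f_joint[:n] + sigma * rng.standard_normal(n)  # y_i = f(x_i) + z_i
```

(The tiny jitter added to the covariance is only for numerical stability of the sampler.)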

It is well known that the posterior distribution of $f(x)$ given $\mathbf{y}$ (with $\mathbf{x}$ assumed fixed and known) is Gaussian, with the posterior mean and variance taking the form
$$\mu_n(x) = \mathbf{k}(x)^T(\mathbf{K} + \sigma^2 \mathbf{I})^{-1}\mathbf{y},$$
$$\sigma_n^2(x) = k(x,x) - \mathbf{k}(x)^T(\mathbf{K} + \sigma^2 \mathbf{I})^{-1}\mathbf{k}(x).$$
**My question is as follows:** If we let $\widetilde{\sigma}_n^2(x) = k(x,x) - \mathbf{k}(x)^T \mathbf{K}^{-1}\mathbf{k}(x)$ be the posterior variance we would get under *noiseless* samples, then is it true that
$$\sigma_n^2(x) \le \widetilde{\sigma}_n^2(x) + C\sigma^2$$
for some universal constant $C$? Intuitively, if our samples are each corrupted by noise of variance $\sigma^2$, we shouldn't expect to incur more than $O(\sigma^2)$ additional uncertainty on the unknown function value $f(x)$ that we are trying to predict.
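For what it's worth, the gap $\sigma_n^2(x) - \widetilde{\sigma}_n^2(x)$ is easy to probe numerically; in the sketch below, the kernel, grid, query point, and noise level are all my own assumptions, and the noiseless variance is approximated with a tiny jitter in place of $\sigma^2$:

```python
import numpy as np

# Numerical comparison of the noisy and (near-)noiseless posterior variances,
# for an assumed RBF kernel k(x, x') = exp(-30 * (x - x')^2) on [0, 1].
c = 30.0
n = 10
sigma2 = 1e-3                          # noise variance sigma^2
x_train = np.linspace(0.0, 1.0, n)
x_query = 0.37

K = np.exp(-c * np.subtract.outer(x_train, x_train) ** 2)
k_x = np.exp(-c * (x_train - x_query) ** 2)
k_xx = 1.0                             # k(x, x) = exp(0) = 1

# Noisy posterior variance sigma_n^2(x)
post_var_noisy = k_xx - k_x @ np.linalg.solve(K + sigma2 * np.eye(n), k_x)
# (Near-)noiseless posterior variance, jittered for numerical stability
post_var_clean = k_xx - k_x @ np.linalg.solve(K + 1e-10 * np.eye(n), k_x)

excess = post_var_noisy - post_var_clean
print(f"excess = {excess:.3e}, excess / sigma^2 = {excess / sigma2:.3f}")
```

The printed ratio is the empirical analogue of the constant $C$ for this particular configuration.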

**Notes:** A potential starting point is to use the Woodbury matrix identity to write
$$\mathbf{k}(x)^T(\mathbf{K} + \sigma^2 \mathbf{I})^{-1}\mathbf{k}(x) = \mathbf{k}(x)^T \mathbf{K}^{-1}\mathbf{k}(x) - \sigma^2\mathbf{k}(x)^T \Big(\mathbf{K}^{-1} \big(\mathbf{I} + \sigma^2\mathbf{K}^{-1} \big)^{-1} \mathbf{K}^{-1}\Big)\mathbf{k}(x),$$
so that $\sigma_n^2(x) = \widetilde{\sigma}_n^2(x) + \sigma^2\mathbf{k}(x)^T \mathbf{K}^{-1} \big(\mathbf{I} + \sigma^2\mathbf{K}^{-1} \big)^{-1} \mathbf{K}^{-1}\mathbf{k}(x)$. By a matrix Taylor expansion, the final term should behave as $O(\sigma^2)$ as $\sigma^2 \to 0$, which appears to yield the desired result. However, this approach leads to a constant factor depending on $\mathbf{x}$ and $n$, whereas I would like to show the above result with an **absolute constant** $C$.
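As a sanity check of this decomposition (note that the correction term enters with a minus sign, which is consistent with the noisy posterior variance exceeding the noiseless one), the two sides can be compared numerically; the matrix below is synthetic and well-conditioned, purely for illustration:

```python
import numpy as np

# Numerical check of the Woodbury-based decomposition
#   k^T (K + s2 I)^{-1} k = k^T K^{-1} k - s2 * k^T K^{-1} (I + s2 K^{-1})^{-1} K^{-1} k
# using a synthetic, well-conditioned positive definite K.
rng = np.random.default_rng(1)
n, s2 = 5, 0.1
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)   # positive definite, eigenvalues bounded below by n
k = rng.standard_normal(n)

lhs = k @ np.linalg.solve(K + s2 * np.eye(n), k)
K_inv = np.linalg.inv(K)
correction = s2 * (k @ K_inv @ np.linalg.solve(np.eye(n) + s2 * K_inv, K_inv @ k))
rhs = k @ K_inv @ k - correction

print(abs(lhs - rhs))   # agreement up to floating-point error
```

The correction term is a positive quadratic form, which is exactly the $O(\sigma^2)$ excess in question.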

Having said that, if it makes things easier, I would be happy for $C$ to depend on the covariance function $k$, and/or on the input domain $\mathcal{X}$ (e.g., even the simple choices $\mathcal{X} = [0,1]$ and $k(x,x') = e^{-c\cdot(x-x')^2}$ would be of interest).