Why do spectrum plots look ugly?¶

Very often when we compute the spectrum of a Hamiltonian over a finite grid of parameter values, we cannot resolve whether crossings are avoided or not. Further if we only compute a part of the spectrum using e.g. a sparse diagonalization routine, we fail to find a proper sequence of levels.

Let us illustrate these two failure modes.

# Just some initialization

%matplotlib inline
import numpy as np
from scipy import linalg
from scipy.optimize import linear_sum_assignment
import matplotlib
from matplotlib import pyplot

matplotlib.rcParams['figure.figsize'] = (8, 6)

def ham(n):
    """A random matrix from a Gaussian Unitary Ensemble."""
    h = np.random.randn(n, n) + 1j*np.random.randn(n, n)
    h += h.T.conj()
    return h

def bad_ham(x, alpha1=.2, alpha2=.0001, n=10, seed=0):
    """A messy Hamiltonian with a bunch of crossings."""
    np.random.seed(seed)
    h1, h2, h3 = ham(n), ham(n), ham(n)
    a1, a2 = alpha1 * ham(2*n), alpha2 * ham(3*n) * (1 + 0.1*x)
    a2[:2*n, :2*n] += a1
    a2[:n, :n] += h1 * (1 - x)
    a2[n:2*n, n:2*n] += h2 * x
    a2[-n:, -n:] += h3 * (x - .5)
    return a2

xvals = np.linspace(0, 1)
data = [linalg.eigvalsh(bad_ham(x)) for x in xvals]
pyplot.plot(data)
pyplot.ylim(-2.5, 2.5);

This is mock data produced by a random Hamiltonian with a bunch of crossings. We know that some of these apparent avoided crossings are too tiny to resolve, and should instead be classified as real crossings.

Let's now simulate what would happen if we also use sparse diagonalization to obtain some number of eigenvalues closest to 0.

truncated = [sorted(i[np.argsort(abs(i))[:13]]) for i in data]
pyplot.plot(truncated);

The ugly jumps are not real, they appear merely because some levels exit our window and new ones enter.

A desperate person who needs results right now at this point replots the data using a scatterplot.

pyplot.plot(truncated, '.');

This is OK, but at the points where the lines are dense our eye identifies vertical lines, making the plot harder to interpret.

The solution part I¶

Let's try to fix this situation by using our knowledge of the inner products of neighboring sets of eigenvectors from neighboring points. We will start from assuming that the states from neighboring parameter values the maximal overlaps are continuations of each other.

Searching a bit online shows that selecting the set of pairs that maximizes the entries in a matrix is called an assignment problem, or weighted bipartite perfect matching. Searchign for "assignment problem python" leads us to scipy.optimize.linear_sum_assignment. This should be enough.

We start with the full problem to observe that there we have exactly what we need, and then continue to the reduced problem where just doing the matching is not going to be sufficient.

def eigh_interval(x, levels=30):
    """Emulate behavior of sparse diagonalization by computing levels closest to a point."""
    e, psi = linalg.eigh(bad_ham(x))
    ind = np.argsort(abs(e))[:levels]
    return e[ind], psi[:, ind]

e, psi = eigh_interval(xvals[0])
sorted_levels = [e]
for x in xvals[1:]:
    e2, psi2 = eigh_interval(x)
    Q = np.abs(psi.T.conj() @ psi2)  # Overlap matrix
    assignment = linear_sum_assignment(-Q)[1]
    sorted_levels.append(e2[assignment])
    psi = psi2[:, assignment]

pyplot.plot(sorted_levels)
pyplot.ylim(-2.5, 2.5);

Perfect! Compare the amount of information you get from this plot, as compared to either the original one or the scatter plot.

Now let's try the same but only taking a part of all the levels.

e, psi = eigh_interval(xvals[0], 13)
sorted_levels = [e]
for x in xvals[1:]:
    e2, psi2 = eigh_interval(x, 13)
    Q = np.abs(psi.T.conj() @ psi2)  # Overlap matrix
    assignment = linear_sum_assignment(-Q)[1]
    sorted_levels.append(e2[assignment])
    psi = psi2[:, assignment]

pyplot.plot(sorted_levels);

We're almost there, except that we always find a matching. This means that whenever a state exits our window at the top, and another one enters from below, the algorthm matches them.

The solution part II¶

I am unsure what the best strategy should be for deciding when we observe such behavior, however one decent idea is to not connect the two energies where they may not correspond to the same band under any circumstances.

Following a gut feeling we are going to classify states that overlap weakly enough as belonging to different bands. Two random vectors of length $N$ would have overlap of $1/\sqrt{N}$ on average. On the other hand, when the overlap is larger than $1/\sqrt{2}$, we're sure that this is the best match. If the eigenvectors bunch together, we cannot rely on the overlap being always larger than $1/\sqrt{2}$, so we'll choose a geometric mean between these two as the threshold.

def best_match(psi1, psi2, threshold=None):
    """Find the best match of two sets of eigenvectors.

    
    Parameters:
    -----------
    psi1, psi2 : numpy 2D complex arrays
        Arrays of initial and final eigenvectors.
    threshold : float, optional
        Minimal overlap when the eigenvectors are considered belonging to the same band.
        The default value is :math:`1/(2N)^{1/4}`, where :math:`N` is the length of each eigenvector.
    
    Returns:
    --------
    sorting : numpy 1D integer array
        Permutation to apply to ``psi2`` to make the optimal match.
    diconnects : numpy 1D bool array
        The levels with overlap below the ``threshold`` that should be considered disconnected.
    """
    if threshold is None:
        threshold = (2 * psi1.shape[0])**-0.25
    Q = np.abs(psi1.T.conj() @ psi2)  # Overlap matrix
    orig, perm = linear_sum_assignment(-Q)
    return perm, Q[orig, perm] < threshold


e, psi = eigh_interval(xvals[0], 13)
sorted_levels = [e]
for x in xvals[1:]:
    e2, psi2 = eigh_interval(x, 13)
    perm, line_breaks = best_match(psi, psi2)
    e2 = e2[perm]
    intermediate = (e + e2) / 2
    intermediate[line_breaks] = None
    psi = psi2[:, perm]
    e = e2
    
    sorted_levels.append(intermediate)
    sorted_levels.append(e)

pyplot.plot(sorted_levels);

Et voila!

The threshold is a bit arbitrary, but it should work for most practical cases.