I love open science. Since you are reading a scientific blog, I believe it is likely that you also support many open science ideas. Indeed, easy access to publications, code, and research data makes research easier to reuse, while also ensuring transparency of the process and better quality control. Unfortunately, the academic community is extremely conservative, and it takes forever for new standards to become commonplace.
You're a researcher doing numerical modelling. You're an old hand. You use Python (and therefore are awesome).
Your days are spent constructing mathematical models, implementing them in code, and exploring the models as you change different parameters. You realize that simplicity is key, so you make sure to write your models as pure functions of the parameters, maybe a little something like this:
def complicated_model(x, y):
    ...
    return result
Beautiful; now it's time to do some science! Of course, you'll want to plot your model as a function of the parameters. Python makes this super simple but, as we'll see, this simplicity has a price.
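A first pass at that plot usually looks something like the sketch below. The body of `complicated_model` and the parameter ranges are made-up stand-ins, so treat this as an illustration of the pattern rather than the actual model:

```python
import numpy as np


def complicated_model(x, y):
    """Hypothetical stand-in for an expensive model; any pure function works."""
    return np.sin(3 * x) * np.exp(-y**2)


# Parameter ranges chosen arbitrarily for illustration.
xs = np.linspace(-1, 1, 41)
ys = np.linspace(-2, 2, 3)

# Evaluate the model on a fixed grid: one curve per value of y.
# The result has shape (len(ys), len(xs)).
curves = np.array([[complicated_model(x, y) for x in xs] for y in ys])
```

Passing `xs` and `curves.T` to `matplotlib.pyplot.plot` then draws one line per value of `y`.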
I teach the undergraduate solid state physics course, where we just switched to a shiny new book, "The Oxford Solid State Basics" by Steve Simon.
Steve's story of condensed matter physics starts with the heat capacity of solid materials. It's a great way to dive into how quantum mechanics combines with lucky guesses to improve our understanding of what is happening. It is also what we do in our course.
A great source of experimental data showing the problem is Einstein's original work, and Steve's book reproduces the plot from Einstein (see also the English translation). Unfortunately, that plot belongs to the current publisher of Annalen der Physik and cannot be republished under a free license. So, to provide this data in the lecture notes and make it available to whoever wants it, I decided to take Einstein's original data and repeat the exercise. Because we are living in an enlightened age, I also wanted to see whether the more advanced Debye model would fit Einstein's data any better.
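For orientation, here is a minimal sketch of the two models in their standard textbook form, with the heat capacity per atom expressed in units of 3 k_B. The function names and the parameters `theta_E` and `theta_D` (the Einstein and Debye temperatures) are my own choices for this illustration; no experimental data is included, and this is not necessarily how the fit itself was done.

```python
import numpy as np
from scipy import integrate


def einstein_c(T, theta_E):
    """Einstein heat capacity per atom in units of 3 k_B."""
    t = theta_E / T
    # t^2 e^t / (e^t - 1)^2, written with e^{-t} to avoid overflow at low T.
    return t**2 * np.exp(-t) / np.expm1(-t)**2


def debye_c(T, theta_D):
    """Debye heat capacity per atom in units of 3 k_B (scalar T only)."""
    t = theta_D / T

    def integrand(x):
        # x^4 e^x / (e^x - 1)^2 in an overflow-safe form.
        return x**4 * np.exp(-x) / np.expm1(-x)**2

    integral, _ = integrate.quad(integrand, 0, t)
    return 3 * integral / t**3
```

Both functions approach 1 (the Dulong-Petit law) at high temperature. At low temperature the Einstein result is exponentially suppressed, while the Debye result falls off as T³.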
Neural networks have an advantage over humans because they have access to a much larger body of information. This is why what looks like random noise to a human turns out, after correct processing by a machine learning algorithm, to be a signature of the Higgs boson.
While analysing physics data with machine learning is definitely a great direction of research, another intriguing possibility is trying to infer what the researchers themselves think. For example, the field of sentiment analysis tries to extract not only the information contained in a text, but also the author's attitude towards that information.
Last semester Anton was lecturing on the undergraduate Solid State Physics course at TU Delft. The course lasted several weeks, and each week there was a mini exam that students on the course could take for partial credit. This was a big course with 200 participants, and the prospect of having to manually grade 200 exam manuscripts every week was not something that anyone on the course team was looking forward to.
I wrote a column for the newsletter of our institute. Since I liked the result, I'm also reposting it below.
As a child I had a book, "Bad advice", that contained nothing but poems advising you to do what you should really never do. So here is my bad professional advice (except that I won't risk making poetry):
Why do spectrum plots look ugly?
Very often, when we compute the spectrum of a Hamiltonian over a finite grid of parameter values, we cannot resolve whether crossings are avoided or not. Furthermore, if we only compute part of the spectrum using, e.g., a sparse diagonalization routine, we fail to find a proper sequence of levels.
Let us illustrate these two failure modes.
# Just some initialization
%matplotlib inline
import numpy as np
from scipy import linalg
from scipy.optimize import linear_sum_assignment
import matplotlib
from matplotlib import pyplot

matplotlib.rcParams['figure.figsize'] = (8, 6)
def ham(n):
    """A random matrix from a Gaussian Unitary Ensemble."""
    h = np.random.randn(n, n) + 1j*np.random.randn(n, n)
    h += h.T.conj()
    return h


def bad_ham(x, alpha1=.2, alpha2=.0001, n=10, seed=0):
    """A messy Hamiltonian with a bunch of crossings."""
    np.random.seed(seed)
    h1, h2, h3 = ham(n), ham(n), ham(n)
    a1, a2 = alpha1 * ham(2*n), alpha2 * ham(3*n) * (1 + 0.1*x)
    a2[:2*n, :2*n] += a1
    a2[:n, :n] += h1 * (1 - x)
    a2[n:2*n, n:2*n] += h2 * x
    a2[-n:, -n:] += h3 * (x - .5)
    return a2


xvals = np.linspace(0, 1)
data = [linalg.eigvalsh(bad_ham(x)) for x in xvals]
pyplot.plot(data)
pyplot.ylim(-2.5, 2.5);
This is mock data produced by a random Hamiltonian with a bunch of crossings. We know that some of these apparent avoided crossings are too tiny to resolve, and should instead be classified as real crossings.
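To see why a finite grid cannot settle the question, consider a minimal two-level sketch. It is a separate toy model, not part of `bad_ham`: the function names, the coupling `delta`, and the grid are all assumptions made for illustration. With coupling `delta` the exact minimum gap is `2 * abs(delta)`, reached at `x = 0`, but the grid below never samples that point.

```python
import numpy as np


def two_level(x, delta):
    """Two levels that cross at x = 0 if delta = 0 and anticross otherwise."""
    return np.array([[x, delta], [delta, -x]])


def smallest_sampled_gap(xgrid, delta):
    """Smallest level splitting seen on the grid."""
    return min(
        np.diff(np.linalg.eigvalsh(two_level(x, delta)))[0] for x in xgrid
    )


# An even number of points, so x = 0 itself is not on the grid.
xgrid = np.linspace(-0.5, 0.5, 50)
delta = 1e-4  # hypothetical coupling, much smaller than the grid spacing

avoided = smallest_sampled_gap(xgrid, delta)   # tiny avoided crossing
crossing = smallest_sampled_gap(xgrid, 0.0)    # true crossing
true_gap = 2 * abs(delta)

print(avoided, crossing, true_gap)
```

On this grid the avoided and the true crossing give almost the same sampled gap, and both are far larger than `true_gap`, so the sampled spectrum alone cannot tell them apart.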
Let's now simulate what would happen if we also use sparse diagonalization to obtain some number of eigenvalues closest to 0.
truncated = [sorted(i[np.argsort(abs(i))[:13]]) for i in data]
pyplot.plot(truncated);
The ugly jumps are not real; they appear merely because some levels exit our window and new ones enter.
At this point, a desperate person who needs results right now replots the data as a scatter plot.
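A minimal sketch of such a scatter plot follows. So that the snippet runs on its own, it builds a small stand-in family of random symmetric matrices instead of reusing `bad_ham`; the names `levels`, `window`, `x_flat` and `y_flat` are my own, and only the 13-level window mirrors the truncation above.

```python
import numpy as np
from matplotlib import pyplot

# Stand-in data (an assumption): a random real-symmetric family H(x),
# interpolating between two fixed matrices.
rng = np.random.default_rng(0)
n = 20
a = rng.standard_normal((n, n))
a = a + a.T
b = rng.standard_normal((n, n))
b = b + b.T
xvals = np.linspace(0, 1)
levels = np.array([np.linalg.eigvalsh((1 - x) * a + x * b) for x in xvals])

# Keep the 13 eigenvalues closest to zero at each x.
window = np.array([np.sort(e[np.argsort(abs(e))[:13]]) for e in levels])

# Flatten to one (x, energy) pair per point, so no lines connect the levels.
x_flat = np.repeat(xvals, window.shape[1])
y_flat = window.ravel()

pyplot.scatter(x_flat, y_flat, s=4, c="k")
```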
This is OK, but at the points where the lines are dense our eye identifies vertical lines, making the plot harder to interpret.