Connecting the dots: applying deep learning techniques in HEP
With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Deep-learning software attempts to mimic the activity in layers of neurons in the neocortex, the wrinkly 80 percent of the brain where thinking occurs. The software learns to recognize patterns in digital representations of sounds or images, and the same techniques can be applied to analyse the data collected by the detectors of the LHC experiments.
Beginning around the time of the Higgs discovery, the data analysis world outside high energy physics saw a resurgence in interest in machine learning, driven in part by new and innovative approaches to training neural networks.
These new “deep” networks were able to label raw data with much better accuracy than previous, carefully hand-crafted algorithms. Over the past few years, deep learning has given rise to a massive collection of ideas and techniques that were previously either unknown or thought to be untenable.
The ATLAS and CMS experiments at the LHC search for new particles, rare processes and short-lived particles. Challenges in these searches stem from the required statistics and from the backgrounds that could hide signals of new physics. These challenges drive the need to explore how advanced machine-learning methods could be applied to improve the analysis of the data recorded by the experiments.
Presently the experiments select interesting events - at the level of the so-called High-Level Trigger - by reconstructing the trajectory of each particle using the raw data from the silicon layers of the inner tracker. Raw data from the detector are processed to obtain hit clusters, formed by nearby silicon pixels that register a signal. The cluster shape depends on the particle, on its trajectory, and on the module that has been hit. Track reconstruction is therefore, by its nature, a combinatorial problem that requires substantial computational resources. It is implemented as an iterative algorithm that first searches for the easiest tracks, removes the hits associated with the tracks it has found from subsequent searches, and then looks for progressively more difficult tracks in later steps.
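As a rough illustration, a schematic sketch of this iterative approach is shown below. The Track type, the find_tracks() stand-in and the iteration cuts are hypothetical simplifications for the purpose of the example, not the actual CMS pattern-recognition code.

```python
# Schematic, simplified sketch of iterative track finding.
from dataclasses import dataclass

@dataclass(frozen=True)
class Track:
    hits: tuple                       # the hit clusters assigned to this track

def find_tracks(hits, seeding_cuts):
    """Stand-in for the combinatorial search performed in one iteration.

    In reality this builds seeds from hits in the inner layers, extends them outward
    through the tracker, and keeps only candidates passing the iteration's quality cuts.
    """
    return []

def iterative_tracking(hit_clusters, iteration_cuts):
    all_tracks = []
    remaining_hits = set(hit_clusters)
    for seeding_cuts in iteration_cuts:   # easy tracks (prompt, high-pT) are targeted first
        tracks = find_tracks(remaining_hits, seeding_cuts)
        all_tracks.extend(tracks)
        for track in tracks:
            # Hits used by accepted tracks are removed, shrinking the combinatorics
            # before the next, more difficult, search step.
            remaining_hits -= set(track.hits)
    return all_tracks
```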
Stages of track reconstruction (Image credit: joona.havukainen@helsinki.fi).
Perhaps one of the most interesting and most challenging applications for machine-learning techniques is the study of jets originating from heavy-flavour quarks (b/c “tagging”), which is at the same time one of the crucial handles in searches for new physics. A variety of b-tagging algorithms has been developed by CMS to select b-quark jets based on quantities such as track impact parameters and the properties of reconstructed decay vertices. These algorithms already rely heavily on machine-learning tools and are thus natural candidates for advanced tools like deep neural networks.
Starting with improved b-jet tagging techniques, the method can also be applied to jets containing W, Z or top quarks. Markus Stoye, a CMS physicist and Professor at Imperial College London, leads the effort of applying deep learning algorithms to interpret data recorded by the CMS detector. Over the last two years, the team has applied deep learning techniques to tackle the challenges of studying the jets of particles produced in Run 2 of the LHC.
Boosted objects like jets can have a large number of tracks in a small section of the tracker, which makes their reconstruction difficult: allowing hits to be shared between tracks increases the number of fake and duplicate tracks that are reconstructed. A neural network classifier identifies whether hits are caused by single or multiple tracks, and it has been shown to increase tracking efficiency in dense regions. Moreover, the use of deep neural networks (DNNs) instead of boosted decision trees (BDTs) as classifiers can further improve the efficiency and reduce the fraction of "fake" tracks.
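To make the idea concrete, below is a minimal Keras sketch of such a classifier: a small network that takes a handful of pixel-cluster shape variables and outputs the probability that the cluster was produced by more than one track. The feature count and layer sizes are assumptions for illustration, not the configuration used in the experiment.

```python
# Minimal sketch of a "merged cluster" classifier (illustrative sizes, not the CMS setup).
from tensorflow import keras
from tensorflow.keras import layers

N_CLUSTER_FEATURES = 10   # e.g. cluster width, total charge, incidence angle (assumed)

model = keras.Sequential([
    keras.Input(shape=(N_CLUSTER_FEATURES,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # P(cluster comes from multiple tracks)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Clusters flagged as merged can then be split, so their hits can be shared between
# track candidates when reconstructing dense environments such as boosted jets.
```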
The new generation of b-tagging algorithms has shown significantly improved performance compared to previous b-taggers. Stoye explains: “A variety of b tagging algorithms has been developed at CMS to select b-quark jets based on variables such as the impact parameters of the charged-particle tracks, the properties of reconstructed decay vertices, and the presence or absence of a lepton.”
Following an initial training period to familiarize himself with the concepts and available tools, he helped form a group within the CMS collaboration. The success of the deep learning approach resulted in a great team, and today more than ten people work to push deep learning techniques further in the analysis of CMS data. “Currently, we design a neural network architecture, DeepCSV, that can simultaneously perform the formerly independent steps followed in the analysis of jets, e.g. variable design per particle and track selection. The input to the deep learning algorithm is the constituents of the jet, meaning all its particles and secondary vertices.” These add up to about 1000 features, and with a general dense deep neural network one might have about 10,000,000 parameters to minimize in the optimization. Based on assumptions stemming from the physics describing these interactions, the complexity can be reduced, bringing this number down to about 250,000.
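The rough arithmetic behind these numbers can be sketched as follows; the layer widths and particle counts below are assumptions chosen only to illustrate the orders of magnitude quoted above, not the actual CMS configuration.

```python
# Back-of-the-envelope parameter counts for a fully dense versus a weight-sharing network.

def dense_params(layer_sizes):
    """Number of weights + biases in a fully connected stack."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

N_PARTICLES, F_PER_PARTICLE = 25, 40        # 25 x 40 = ~1000 input features (assumed split)

# Option 1: feed all ~1000 features into one generic, wide dense network.
generic = dense_params([N_PARTICLES * F_PER_PARTICLE, 2000, 2000, 2000, 500, 6])
print(f"generic dense network: {generic:,} parameters")     # ~9 million

# Option 2: apply one small, shared network to every particle (weight sharing),
# then combine the per-particle summaries with a compact dense head.
per_particle = dense_params([F_PER_PARTICLE, 64, 64, 32])    # shared across all particles
head = dense_params([N_PARTICLES * 32, 256, 128, 6])
structured = per_particle + head
print(f"structured network:    {structured:,} parameters")   # ~250,000
```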
In contrast to the other algorithms, it uses the properties of all charged and neutral particle-flow candidates, as well as of the secondary vertices within the jet, without a b-tagging-specific preselection. The neural network consists of multiple 1x1 convolutional layers for each input collection. Their output goes to recurrent layers, followed by several densely connected layers. So far, in the CMS simulations, this algorithm significantly outperforms the other taggers, especially for high-pT jets, which could lead to improved sensitivity in searches for new physics with highly energetic b jets in the final state.
Figure 1. Performance of the b jet identification algorithms, showing the probability for non-b jets to be misidentified as b jets as a function of the efficiency to correctly identify b jets. DeepCSV performs significantly better than CSVv2 at essentially every value of the b jet efficiency, for both light-jet and c-jet misidentification probabilities.
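For readers curious how an architecture like the one described above looks in code, here is a minimal Keras sketch with 1x1 convolutions per input collection, a recurrent summary, and dense layers. All input shapes, layer sizes and output classes are illustrative assumptions rather than the CMS configuration.

```python
# Minimal sketch of a jet tagger: per-collection 1x1 convolutions -> recurrent -> dense.
from tensorflow import keras
from tensorflow.keras import layers

N_CPF, F_CPF = 25, 16   # assumed: up to 25 charged particle-flow candidates, 16 features each
N_NPF, F_NPF = 25, 6    # assumed: up to 25 neutral candidates, 6 features each
N_SV,  F_SV  = 4, 12    # assumed: up to 4 secondary vertices, 12 features each
N_CLASSES = 5           # assumed set of jet flavour categories

def branch(x, widths):
    # 1x1 convolutions: the same small transformation applied to every candidate.
    for w in widths:
        x = layers.Conv1D(w, kernel_size=1, activation="relu")(x)
    # A recurrent layer summarizes the list of candidates into a fixed-size vector.
    return layers.LSTM(50)(x)

in_cpf = keras.Input(shape=(N_CPF, F_CPF), name="charged_pf")
in_npf = keras.Input(shape=(N_NPF, F_NPF), name="neutral_pf")
in_sv  = keras.Input(shape=(N_SV,  F_SV),  name="secondary_vertices")

merged = layers.concatenate([
    branch(in_cpf, [64, 32, 32]),
    branch(in_npf, [32, 16]),
    branch(in_sv,  [32, 16]),
])

# Densely connected layers combine the per-collection summaries into flavour scores.
x = merged
for width in (200, 100):
    x = layers.Dense(width, activation="relu")(x)
out = layers.Dense(N_CLASSES, activation="softmax", name="jet_flavour")(x)

model = keras.Model([in_cpf, in_npf, in_sv], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```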
Deep learning techniques have achieved great results in pattern recognition tasks. In this process one has to understand both the architectures that are available and the physics problems that one aims to address. Markus comments: “We input pretty complete information about the particles to the algorithm and gradually the neural network becomes able to figure out by itself what is most important for the analysis”, and continues: “We know that we have better tagging following the copious efforts of the past nine months. Thanks to the neural network technique there is an acceleration in the way we improve in these fields compared to the past, though this is not to undermine all past efforts and the way in which they pushed our understanding.”
Regarding future steps, the team plans to develop ways to reduce systematic uncertainties. Markus explains: “Presently there are different approaches to this within data science, and it is an aspect we are currently focusing on. It is a major branch of research in data science called domain adaptation, and it will be a major step in developing new techniques.”
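One widely used ingredient of domain adaptation, shown here purely as an illustration of the general idea and not of the CMS work, is a gradient-reversal layer: the network is trained so that a secondary "domain" classifier cannot tell simulation from real data from the shared features, which makes the tagger less sensitive to simulation mismodelling. A minimal sketch in TensorFlow:

```python
# Gradient-reversal layer (DANN-style), a common building block for domain adaptation.
# Illustrative only; not necessarily the approach adopted by CMS.
import tensorflow as tf

@tf.custom_gradient
def reverse_gradient(x):
    def grad(dy):
        return -dy                      # forward pass unchanged, gradients flipped backward
    return tf.identity(x), grad

class GradientReversal(tf.keras.layers.Layer):
    def call(self, inputs):
        return reverse_gradient(inputs)

# Usage sketch: shared features feed both the flavour head and, through GradientReversal,
# a domain head that tries to separate simulation from data. Training both jointly pushes
# the shared features to become domain-invariant, reducing the associated uncertainty.
```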
Machine learning has also been a part of the ATLAS physics program since the beginning of LHC data taking. "Neural networks, while basic by today’s standards, have been part of b-quark identification since the first data in 2009," note David Rousseau and Dan Guest, ATLAS ML conveners. "Boosted decision trees (BDTs), a relatively simple algorithm, have been the most frequently used technique, and since 2012 ATLAS has used these algorithms not only to identify visible particles, but also to measure them better."
Once the particles emerging from the collisions are identified and measured, and the event saved on disk, physicists have to sift through billions of events to find and study the rarest unstable particles such as the Higgs boson. Machine learning did not play a major role in the ATLAS Higgs boson discovery in 2012; however, the BDT technique has had a major impact on the study of the more difficult channels that were established later: the Higgs boson decaying to tau leptons and to b quarks, the separation of different Higgs boson production mechanisms, and, very recently, the direct observation of the coupling of the Higgs boson to the top quark. Regression techniques were used, e.g., for the calibration of electrons and photons, with an impact on the resolution of the Higgs peak and on the measurement of its mass.
Recently ATLAS physicists have put these new algorithms to work on their data. The ATLAS detector consists of millions of individual sensors, which must work in unison to detect hints of new particles. Conventional event reconstruction relies on many hand-crafted algorithms, which first distil the millions of raw sensor outputs down to hundreds of physical objects, and then further summarize these objects as a few variables per collision. BDTs are still useful as the final step, when objects must be classified from a dozen or so variables. But more modern machine learning, with its ability to distil very complicated information into a few meaningful quantities, has opened new doors.
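As a toy illustration of this final step, the sketch below trains a BDT on a table of a dozen per-event variables. The data are random stand-ins, and the variable count and hyperparameters are made up for the example.

```python
# Toy sketch: a BDT classifying collisions from a dozen high-level variables.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_events, n_vars = 10_000, 12          # e.g. invariant masses, angles, pT sums per collision
X = rng.normal(size=(n_events, n_vars))
y = rng.integers(0, 2, size=n_events)  # toy labels: 1 = signal-like, 0 = background-like

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
bdt.fit(X_train, y_train)

# The BDT score can then serve as a single discriminating variable in the final analysis.
scores = bdt.predict_proba(X_test)[:, 1]
print("mean score, signal-like events:", scores[y_test == 1].mean())
```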
Thanks to modern neural networks, physical objects like jets and electrons may soon be identified directly from raw data. For example, calorimeters measure energies in discrete cells and are somewhat analogous to cameras measuring light in pixels, meaning that image-processing techniques can be adapted to our calorimeters. Moreover, recurrent neural networks, initially developed for text analysis (where sentences have a variable number of words), can be used to analyse the variable-length list of tracks in a jet. As the machines advance, ATLAS physicists ask more sophisticated questions. With the help of generative networks, for example, physicists can invert the particle classification problem to simulate realistic physics instead.
Schematic of RNN for b tagging in ATLAS.
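A minimal Keras sketch of such a recurrent network over a padded, variable-length list of tracks might look as follows; the maximum track count, feature count and layer sizes are illustrative assumptions, not the ATLAS configuration.

```python
# Minimal sketch: an RNN b-tagger over the (padded) list of tracks in a jet.
from tensorflow import keras
from tensorflow.keras import layers

MAX_TRACKS, N_TRACK_FEATURES = 15, 8   # jets are zero-padded (or truncated) to 15 tracks

inputs = keras.Input(shape=(MAX_TRACKS, N_TRACK_FEATURES), name="tracks")
x = layers.Masking(mask_value=0.0)(inputs)   # skip the zero-padded entries
x = layers.LSTM(64)(x)                       # summarize the whole track sequence
x = layers.Dense(32, activation="relu")(x)
output = layers.Dense(1, activation="sigmoid", name="b_jet_probability")(x)

model = keras.Model(inputs, output)
model.compile(optimizer="adam", loss="binary_crossentropy")
```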
"But we can’t do it all on our own! None of these modern advances would be possible without help from outside software engineers, industry data scientists, and academic machine learning researchers." says David Rousseau. To help bring new ideas to High Energy Physics, ATLAS is in the process of releasing curated datasets to the public. ATLAS also recently set up a mechanism to “embed” Machine Learning researchers in the collaboration, giving them access to internal software and simulated data. Thus they can collaborate directly with ATLAS physicists and eventually publish their results together with the ATLAS collaboration.
The author would like to warmly thank Markus Stoye (CMS) as well as David Rousseau and Dan Guest (ATLAS) for their invaluable contributions and comments.