CERN Accelerating science

Deep Neural Networks for particle reconstruction in high-granularity calorimeters

by Jan Kieseler (CERN)

Precision measurements in high energy physics, as well as an increasing number of searches for new phenomena, rely on a precise reconstruction of the event that caused a particular signature in the detector. Particle flow algorithms [1, 2, 3, 4] aim to identify individual particles before they are merged into compound objects such as jets or missing (transverse) momentum. These approaches exploit all subdetector systems to resolve ambiguities and allow calibrations to be applied at the level of individual reconstructed particle candidates. As a consequence, particle flow algorithms typically lead to gains in physics performance. For existing detectors, they rely heavily on track information to resolve ambiguities [1, 2], since the calorimeter granularity is often insufficient to disentangle close-by particles or to identify the particle type with high accuracy. These shortcomings can be addressed by high-granularity calorimeters (HGCs) with high lateral and high longitudinal segmentation, in which even close-by showers can be disentangled. Moreover, the identification and the energy measurement of individual particles can be enhanced by resolving the fine structure of the shower.

Therefore, high-granularity calorimeters are being investigated for future lepton and hadron collider experiments. Lepton collider experiments focus more on a precise measurement of particle properties, and because the overall radiation is lower, scintillator-based solutions can be employed, as proposed by the CALICE collaboration [5, 6]. For hadron colliders, radiation considerations also play a crucial role, leading to more radiation-hard detector choices, such as the highly granular liquid argon sampling calorimeter proposed for the hadron collider stage of the Future Circular Collider (FCChh) [7], designed to cope with up to 1000 simultaneous interactions per bunch crossing (pileup). For the high-luminosity LHC phase, the endcap calorimeters of the CMS experiment will be replaced by the CMS high granularity calorimeter (HGCal), each endcap being a sampling calorimeter with 50 layers, with a total of 3 million sensors. The sensors are made of silicon, with sizes as small as 0.5 cm² at high pseudorapidities and in the first layers, and of scintillator material read out by silicon photomultipliers in regions with less radiation [8]. The other proposals for future calorimeters in Refs. [5, 6, 7] have similar granularities, with cells far smaller than individual showers. Therefore, the challenges of reconstructing individual particles from the energy deposits in each cell are very similar, despite different choices of detector hardware.

Reconstructing particles in these calorimeters poses a two-fold challenge. On the one hand, the large number of readout channels requires fast algorithms, in particular at trigger level. On the other hand, the algorithms should fully exploit the possibilities the detectors offer. This requires sophisticated techniques that can reconstruct not only electromagnetic but also hadronic showers, which have strongly irregular shapes with electromagnetic components, hadronic components, and minimum ionising particle (MIP) tracks connecting different parts of the shower. If possible, the algorithms should also include information from other subdetectors, in particular the tracker, to facilitate a fully consistent particle flow approach, including timing information for each particle to further mitigate the effect of pileup. Even though high-granularity calorimeters can be seen as dense trackers, approaches from tracking cannot easily be adapted. One of the main complications is that tracks have a well-defined and very consistent helix shape, whereas calorimeter shower shapes vary widely and irregularly, particularly for hadronic showers. This can lead to problems, in particular for strictly sequential algorithms, e.g. when electromagnetic showers are reconstructed first and the remaining deposits are then reconstructed as hadronic energy. In this case, an early-showering hadron would be reconstructed as an electromagnetic shower, with parameters optimised for such a shower. The remnant would then either be mis-reconstructed as a low-energy hadron, or the hadron reconstruction would fail due to the unexpected shape of the hadronic remnant, leading to a worse energy resolution and misidentification of individual particles.

Deep neural networks (DNNs) can in principle address both requirements on the reconstruction (good computing and good physics performance) simultaneously. Even though they usually involve many more operations than standard algorithms, almost all of these operations are large matrix multiplications, which are inherently parallelisable. In particular on dedicated hardware such as graphics processing units or field-programmable gate arrays, which support up to thousands of parallel operations per clock cycle, the computation time for large networks can be reduced almost linearly with the degree of parallelism. This can provide benefits at trigger level as well as for offline reconstruction. As far as physics performance is concerned, DNN-based approaches have proven superior to standard approaches, in particular for complex problems that cannot easily be expressed in a simple physics model, or where a simple physics model does not fully apply due to detector effects. One example is the identification of the jet origin (jet tagging), where DNN-based approaches have already become the default algorithms [9, 10, 11, 12, 13]. A further strength of DNNs is that, instead of a sequence of steps each of which potentially removes information, they in principle retain the full information up to the final reconstructed quantity, which is often even expressed as a probability. Therefore, complications such as the early-showering hadron example above can be avoided.
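To make the parallelisation argument concrete, the following minimal Python/NumPy sketch applies one dense layer to all sensors of an event as a single matrix product and checks that the result matches a sensor-by-sensor loop. The sizes, the random (untrained) weights and the function name `dense_layer` are illustrative assumptions, not taken from any specific reconstruction network.

```python
import numpy as np

# Hypothetical sizes: 10 000 sensors, 6 input features each, 32 output features.
N_SENSORS, N_IN, N_OUT = 10_000, 6, 32

rng = np.random.default_rng(42)
hits = rng.normal(size=(N_SENSORS, N_IN))   # one row of input features per sensor
weights = rng.normal(size=(N_IN, N_OUT))    # shared layer weights (random, untrained)
bias = np.zeros(N_OUT)


def dense_layer(x):
    """Apply one dense layer with ReLU activation to every row of x at once."""
    return np.maximum(x @ weights + bias, 0.0)


# All sensors in a single matrix product: rows are independent and can run in parallel.
batched = dense_layer(hits)

# The same computation done sensor by sensor (first 100 only, for comparison).
looped = np.stack([dense_layer(hit[np.newaxis, :])[0] for hit in hits[:100]])

print(batched.shape)                       # (10000, 32)
print(np.allclose(batched[:100], looped))  # True
```

Because each output row depends only on its own input row, the rows can be distributed over many parallel units, which is the property that GPUs and FPGAs exploit.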

The key to reconstruction with DNNs is to adapt the neural network architecture to the structure of the problem. For example, many advances in computer vision only became possible through convolutional neural networks (CNNs) [14], which exploit the translation invariance of images through moving kernels that find certain features (edges or complete objects) independently of their position. CNN architectures require a strictly equidistant grid and are therefore only applicable to particular detector geometries. One example of a calorimeter with such a geometry is the proposed barrel calorimeter for the FCChh detector. There, a customised CNN architecture shows excellent performance for charged-pion energy reconstruction, as shown in Figure 1. The CNN is extended to three dimensions and adapted to identify hadronic and electromagnetic shower components optimally: each kernel provides the local energy sum in addition to almost linear local corrections, together with more non-linear corrections that take a larger area into account.

Figure 1: Energy response and resolution of charged pions in the FCChh barrel calorimeter using a DNN-based approach, with a globally calibrated topological clustering approach as benchmark.
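The local energy sum provided by each kernel of the CNN described above corresponds to a 3D convolution with a fixed all-ones kernel. The sketch below computes it for a regular grid using NumPy only. The grid size, the neighbourhood half-width and the function name `local_energy_sum` are hypothetical; in the network described above the kernels are learned and also provide the correction terms, which are not modelled here.

```python
import numpy as np


def local_energy_sum(grid, half_width=1):
    """Sum the energy in a (2w+1)^3 neighbourhood around every cell of a regular 3D grid.

    Equivalent to a 3D convolution with an all-ones kernel and zero padding.
    """
    w = half_width
    padded = np.pad(grid, w)  # zero padding on all sides
    out = np.zeros_like(grid, dtype=float)
    nx, ny, nz = grid.shape
    # Add every shifted copy of the grid; each shift is one kernel position.
    for dx in range(2 * w + 1):
        for dy in range(2 * w + 1):
            for dz in range(2 * w + 1):
                out += padded[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return out


grid = np.zeros((7, 7, 7))
grid[3, 3, 3] = 2.0  # a single deposit in the centre of the grid
sums = local_energy_sum(grid)
print(sums[2, 2, 2], sums.sum())  # 2.0 54.0
```

The single deposit contributes to all 27 cells whose neighbourhood contains it, which is why the summed output is 27 times the deposited energy.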

Despite the promising results, this approach does not generalise to the irregular geometries of typical collider calorimeters, and it does not allow track information, needed for a consistent particle flow algorithm, to be incorporated in a natural way. A better representation of the calorimeter data is given by point clouds. In a point cloud, each sensor is represented by a point with a position and other features, e.g. energy, sensor size, and timing information. This representation does not require a regular grid and can be sparse, such that only sensors with an energy significantly above threshold need to be processed.
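A minimal sketch of this conversion, assuming the sensor properties are available as NumPy arrays: only sensors above an energy threshold are kept, and each becomes one row of features. The function name `to_point_cloud`, the column order, the threshold and the toy values are illustrative assumptions; no grid is needed because the positions are stored explicitly.

```python
import numpy as np


def to_point_cloud(positions, energy, size, time, threshold):
    """Convert per-sensor arrays into a sparse point cloud.

    positions: (n, 3) sensor centres; energy, size, time: (n,) per-sensor values.
    Only sensors with energy above `threshold` are kept. Returns an array of
    shape (n_kept, 6) with columns x, y, z, energy, size, time.
    """
    keep = energy > threshold
    return np.column_stack([positions[keep], energy[keep], size[keep], time[keep]])


# Toy example: five sensors at arbitrary (non-grid) positions.
positions = np.array([[0.0, 0.0, 320.0],
                      [1.2, 0.4, 321.0],
                      [5.0, 3.3, 325.5],
                      [0.8, 0.9, 322.0],
                      [9.1, 2.2, 330.0]])
energy = np.array([0.02, 1.5, 0.0, 0.7, 0.01])
size = np.array([0.5, 0.5, 1.1, 0.5, 1.1])  # sensor area, arbitrary units
time = np.array([0.0, 0.1, 0.0, 0.2, 0.0])

cloud = to_point_cloud(positions, energy, size, time, threshold=0.05)
print(cloud.shape)  # (2, 6)
```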

For reconstructing individual objects such as particles from a point cloud, or for segmenting (clustering) objects in a point cloud, graph neural networks (GNNs) [15] have received increasing attention in recent years. These rely on vertices and on edges connecting the vertices. Typically, information is exchanged along the edges, which may carry properties of their own that contribute to this exchange. This approach has proven very powerful for jet tagging [16, 17], but also for the segmentation of three-dimensional objects in computer vision applications, e.g. in Refs. [18, 19, 20]. In addition, the point cloud representation makes it possible to integrate track information and calorimeter hits naturally into the network, since both can be represented by vertices in the graph. For particle reconstruction in calorimeters, dedicated GNN architectures have been developed that address both physics performance and computing resource requirements [21]. The proposed GravNet architecture exchanges information between neighbouring vertices, weighted by their distance. The vertex properties are transformed into a low-dimensional space, in which distances and neighbour relations are evaluated, and into higher-dimensional features, which are exchanged between neighbouring vertices. This separation reduces the resource requirements compared to proposals from the computer science literature, e.g. Ref. [19], while achieving the same reconstruction performance. In addition, it removes the need to define the edges by hand in a preprocessing step. The qualitative performance is illustrated in Figure 2, which shows two charged pions with an energy of approximately 50 GeV entering the calorimeter. Their showers develop very differently; nevertheless, the GNN model predicts the energy fractions accurately.

Figure 2: Comparison of true energy fractions (left) and energy fractions reconstructed by the GravNet model (right) for two charged pions with an energy of approximately 50 GeV. Colours indicate the fraction belonging to each of the showers. The size of the markers scales with the square root of the energy deposit in each sensor [21].
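The distance-weighted exchange described above can be sketched in a few lines of NumPy. This is a simplified, untrained illustration of the idea described for GravNet in Ref. [21], not its reference implementation: the projections are plain random matrices, a Gaussian weight exp(-d²) is assumed for the distance weighting, neighbour messages are aggregated by mean and maximum, and the dense layer that would normally follow is omitted. The name `gravnet_like_layer` and all sizes are hypothetical.

```python
import numpy as np


def gravnet_like_layer(x, w_space, w_feat, k):
    """Simplified, untrained GravNet-style message passing (illustrative sketch).

    x:       (n, d_in) vertex features
    w_space: (d_in, d_s) projection into the low-dimensional space S
    w_feat:  (d_in, d_f) projection into the features that are exchanged
    k:       number of nearest neighbours in S (must be < n)
    """
    s = x @ w_space                                            # coordinates in S
    f = x @ w_feat                                             # features to exchange
    d2 = ((s[:, None, :] - s[None, :, :]) ** 2).sum(axis=-1)   # squared distances in S
    np.fill_diagonal(d2, np.inf)                               # a vertex is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]                       # k nearest neighbours in S
    weight = np.exp(-np.take_along_axis(d2, nbrs, axis=1))     # assumed Gaussian weighting
    msgs = f[nbrs] * weight[..., None]                         # (n, k, d_f) weighted features
    # Aggregate over neighbours and concatenate with the input features.
    return np.concatenate([x, msgs.mean(axis=1), msgs.max(axis=1)], axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))  # 50 hits with 4 features each (hypothetical)
out = gravnet_like_layer(x, rng.normal(size=(4, 3)), rng.normal(size=(4, 16)), k=8)
print(out.shape)  # (50, 36)
```

Because the neighbours are found in the learned space S at run time, no edges have to be defined beforehand. This sketch computes all pairwise distances, which scales quadratically with the number of vertices; a practical implementation would typically use a more efficient nearest-neighbour search.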

This architecture has been adapted to the CMS HGCal for up to five showers and shows similarly excellent reconstruction performance there, as shown in Figure 3. Particles of different types are separated accurately and simultaneously by the same algorithm.

Despite their excellent performance, these studies have the disadvantage that they do not generalise directly to an unknown number of particles in the detector. The models predict energy fractions per sensor, so the number of expected showers determines the number of fractions to be predicted, and an upper bound needs to be defined a priori. Setting this bound to a very large number is not feasible, since it introduces combinatorial problems in the training when matching and comparing true and reconstructed fractions. One solution would be a seed-driven algorithm, where seeds are first identified and particle properties are then reconstructed around those seeds. However, this shares some of the disadvantages of sequential algorithms with respect to information loss, and it leads to computational overhead, since the pattern recognition needed for seeding and for the final reconstruction is almost identical.
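The combinatorial problem can be illustrated with a toy loss. The sketch below compares true and predicted per-sensor fractions after trying every ordering of the K predicted showers, so the number of orderings grows as K!. Brute force is used deliberately to make the combinatorics visible; an assignment algorithm such as the Hungarian method would scale better, but a matching step would still be needed for every event. The function name and the mean-squared-error loss are assumptions for illustration and need not correspond to the loss used in Refs. [21, 22].

```python
from itertools import permutations
from math import factorial


def matched_fraction_loss(true_frac, pred_frac):
    """Mean squared error between true and predicted per-sensor energy fractions,
    minimised over all orderings of the K predicted showers (brute force).

    true_frac, pred_frac: lists with one entry per sensor, each a list of K fractions.
    Returns (best_loss, number_of_orderings_tried).
    """
    n_showers = len(true_frac[0])
    n_terms = len(true_frac) * n_showers
    best, n_tried = float("inf"), 0
    for perm in permutations(range(n_showers)):
        n_tried += 1
        loss = sum(
            (t[i] - p[perm[i]]) ** 2
            for t, p in zip(true_frac, pred_frac)
            for i in range(n_showers)
        ) / n_terms
        best = min(best, loss)
    return best, n_tried


# Two sensors, two showers; the prediction labels the showers in swapped order.
true_frac = [[1.0, 0.0], [0.25, 0.75]]
pred_frac = [[0.0, 1.0], [0.75, 0.25]]
print(matched_fraction_loss(true_frac, pred_frac))  # (0.0, 2)

# The number of orderings to check grows factorially with the shower bound K.
print([factorial(k) for k in (2, 5, 10)])  # [2, 120, 3628800]
```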

Another solution is to reconstruct edge features rather than vertex features. Instead of reconstructing a fraction per sensor, the edge connecting two sensors is classified as belonging either to the same shower or to different showers. This approach has been successfully studied for tracking applications [23, 24] and also for calorimeter clustering [25]. While this method in principle resolves the issues mentioned above and allows an unknown number of showers to be predicted, it comes with strict requirements: before the GNN is applied, all potentially true connections between sensors need to be inserted into the graph so that they can be classified by the network, and the same connections need to be evaluated after classification to build the shower in question. Moreover, the binary nature of edge classification makes this approach less suitable for situations with large overlaps and fractional assignments. Additional particle properties, such as particle type or energy, can also be attached to the edges and reconstructed; the final object properties are then determined by averaging over all connected edges, which requires an additional step.

Figure 3: Comparison of true energy fractions (left) and energy fractions reconstructed by the GravNet model (right) for different particles entering the CMS HGCal. Colours indicate the fraction belonging to each of the showers. The size of the markers scales logarithmically with the energy deposit in each sensor. The particles enter the calorimeter from the bottom right [22].
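Building showers from classified edges, as described for the edge-based approach above, amounts to finding connected components among the edges the network accepts. A minimal sketch with a union-find structure follows, together with the additional averaging step for an edge-level property such as a regressed energy. The 0.5 threshold, the function names and the toy inputs are hypothetical.

```python
def build_showers(n_sensors, edges, scores, threshold=0.5):
    """Group sensors into showers from classified edges.

    edges:  list of (a, b) sensor index pairs that were inserted into the graph
    scores: network output per edge, interpreted as P(same shower)
    Edges with score > threshold are kept; showers are the connected components.
    """
    parent = list(range(n_sensors))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in zip(edges, scores):
        if score > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    showers = {}
    for i in range(n_sensors):
        showers.setdefault(find(i), []).append(i)
    return sorted(showers.values())


def average_edge_property(shower, edges, scores, values, threshold=0.5):
    """Average a per-edge quantity over the accepted edges inside one shower."""
    members = set(shower)
    kept = [v for (a, b), s, v in zip(edges, scores, values)
            if s > threshold and a in members and b in members]
    return sum(kept) / len(kept) if kept else None  # None for single-sensor showers


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
scores = [0.9, 0.8, 0.1, 0.95, 0.7]
values = [10.0, 12.0, 0.0, 5.0, 11.0]  # hypothetical per-edge energy estimates
for shower in build_showers(5, edges, scores):
    print(shower, average_edge_property(shower, edges, scores, values))
# [0, 1, 2] 11.0
# [3, 4] 5.0
```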

There are multiple other approaches in computer vision to the problem of predicting an unknown number of objects in a point cloud or an image. Most use a grid to create anchor boxes [26, 27, 28, 29, 30, 31, 32]; these are sensitive to the anchor box sizes, aspect ratios, and density [30, 28] and do not generalise easily to sparse point clouds. Recent anchor-free approaches instead identify key points, which are tightly coupled to the physical centre of the object [33, 34]. While these techniques are well established in computer vision, they do not apply directly to detector data. The object condensation method [35] overcomes this problem by clustering the vertices in a learnable space that is fully detached from the physical input dimensions. All particle properties are accumulated by the neural network in one representative point per cluster. These representative points have a well-defined minimal distance in the clustering space and can therefore be collected with a simple algorithm. Moreover, they are not seeds around which the object properties are determined; instead, they form as the object is identified directly, avoiding a sequential approach. This training method can be applied to existing neural network architectures such as GravNet, extending the promising results obtained for a limited number of particles to an unlimited number of particles without combinatorial issues.
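A minimal sketch of such a simple collection algorithm, modelled on the inference step described in Ref. [35]: vertices are visited in order of decreasing condensation score β; the highest-scoring unassigned vertex above a threshold t_β becomes a representative (condensation) point, and all unassigned vertices within a distance t_d of it in the clustering space are assigned to that object. The network outputs are given by hand here, and the function name and threshold values are assumptions; t_d should be smaller than the minimal distance between representative points mentioned above.

```python
import numpy as np


def collect_condensation_points(beta, coords, t_beta=0.1, t_d=0.5):
    """Greedy collection of condensation points in the learned clustering space.

    beta:   (n,) per-vertex condensation score in [0, 1] predicted by the network
    coords: (n, d) per-vertex coordinates in the learned clustering space
    Returns the indices of the condensation points and a per-vertex object index
    (-1 for vertices not assigned to any object, i.e. noise).
    """
    assignment = np.full(len(beta), -1)
    points = []
    for idx in np.argsort(-beta):            # highest beta first
        if beta[idx] < t_beta:
            break                            # no further object candidates
        if assignment[idx] != -1:
            continue                         # already part of an earlier object
        dist = np.linalg.norm(coords - coords[idx], axis=1)
        members = (dist < t_d) & (assignment == -1)
        assignment[members] = len(points)
        points.append(int(idx))
    return points, assignment


beta = np.array([0.9, 0.2, 0.05, 0.8, 0.3])
coords = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [3.0, 3.0], [3.2, 3.0]])
points, assignment = collect_condensation_points(beta, coords)
print(points, assignment.tolist())  # [0, 3] [0, 0, -1, 1, 1]
```

The object properties would then be read off the network outputs at the condensation points, one set per object, without any seeding step.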

The next step will be to combine neural network architectures that have shown excellent performance for a limited number of showers with methods that extend them to reconstruct an unknown number of particles. In this process, it is only natural to also include information from other subsystems to facilitate a fully consistent particle flow approach. Once the performance of these algorithms is proven in simulation, techniques to mitigate differences between simulation and data can be incorporated directly into the reconstruction algorithms, e.g. through domain adaptation techniques [36, 37] or other adversarial approaches that allow data to be included directly in the training. The lessons learnt from developing GNN techniques for reconstruction will also be pivotal for simulation techniques based on generative adversarial networks, which could provide the speed-up needed for simulating events at the high-luminosity LHC, and which could also benefit from incorporating data to improve the simulation directly (see e.g. Ref. [38] and references therein).


Further Reading

[1] A.M. Sirunyan, A. Tumasyan, W. Adam, E. Asilar, et al. Particle-flow reconstruction and global event description with the CMS detector. Journal of Instrumentation, 12(10):P10003, Oct 2017.

[2] ATLAS Collaboration. Jet reconstruction and performance using particle flow with the ATLAS Detector. Eur. Phys. J., C77(7), 2017.

[3] Manqi Ruan and Henri Videau. Arbor, a new approach of the Particle Flow Algorithm. In Proceedings, International Conference on Calorimetry for the High Energy Frontier (CHEF 2013): Paris, France, April 22-25, 2013, pages 316–324, 2013.

[4] J. S. Marshall and M. A. Thomson. Pandora Particle Flow Algorithm. In Proceedings, International Conference on Calorimetry for the High Energy Frontier (CHEF 2013): Paris, France, April 22-25, 2013, pages 305–315, 2013.

[5] The CALICE Collaboration. Design and electronics commissioning of the physics prototype of a Si-W electromagnetic calorimeter for the International Linear Collider. Journal of Instrumentation, 3(08):P08001, Aug 2008.

[6] The CALICE Collaboration. Calorimetry for lepton collider experiments - CALICE results and activities, 2012.

[7] M. Aleksa, P. Allport, R. Bosley, J. Faltova, J. Gentil, R. Goncalo, C. Helsens, A. Henriques, A. Karyukhin, J. Kieseler, C. Neubüser, H. F. Pais Da Silva, T. Price, J. Schliwinski, M. Selvaggi, O. Solovyanov, and A. Zaborowska. Calorimeters for the FCC-hh, 2019.

[8] CMS Collaboration. The Phase-2 Upgrade of the CMS Endcap Calorimeter. Technical Report CERN-LHCC-2017-023, CMS-TDR-019, CERN, Geneva, Nov 2017. Technical Design Report of the endcap calorimeter for the Phase-2 upgrade of the CMS experiment, in view of the HL-LHC run.

[9] D. Guest, K. Cranmer, and D. Whiteson. Deep Learning and its Application to LHC Physics. Ann. Rev. Nucl. Part. Sci., 68, 2018.

[10] CMS Collaboration. CMS Phase 1 heavy flavour identification performance and developments. Technical Report CMS-DP-2017-013, 2017.

[11] CMS Collaboration. New Developments for Jet Substructure Reconstruction in CMS. Technical Report CMS-DP-2017-027, 2017.

[12] ATLAS Collaboration. Identification of Jets Containing b-Hadrons with Recurrent Neural Networks at the ATLAS Experiment. Technical Report ATL-PHYS-PUB-2017-003, 2017.

[13] A. Butter, K. Cranmer, D. Debnath, B. M. Dillon, et al. The Machine Learning Landscape of Top Taggers. SciPost Phys., 7:014, 2019.

[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Intelligent Signal Processing, pages 306–351. IEEE Press, 2001.

[15] F. Scarselli, M. Gori, A. Tsoi, M. Hagenbuchner, et al. The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 2009.

[16] H. Qu and L. Gouskos. ParticleNet: Jet Tagging via Particle Clouds. 2019.

[17] E. Moreno, O. Cerri, J. Duarte, H. Newman, et al. JEDI-net: a jet identification algorithm based on interaction networks. Eur. Phys. J., C80(1):58, 2020.

[18] R. Qi Charles, Hao Su, Mo Kaichun, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul 2017.

[19] Yue Wang et al. Dynamic graph CNN for learning on point clouds. arXiv:1801.07829 [cs.CV], 2018.

[20] Biao Zhang and Peter Wonka. Point cloud instance segmentation using probabilistic embeddings. ArXiv, abs/1912.00145, 2019.

[21] S.R. Qasim, J. Kieseler, Y. Iiyama, and M. Pierini. Learning representations of irregular particle-detector geometry with distance-weighted graph networks. Eur. Phys. J., C79(7):608, 2019.

[22] CMS Collaboration. Application of Distance-Weighted Graph Networks to Real-life Particle Detector Output. Technical Report CMS-DP-2020-001, 2020.

[23] Steven Farrell, Paolo Calafiura, Mayur Mudigonda, Prabhat, et al. Novel deep learning methods for track reconstruction. In 4th International Workshop Connecting The Dots 2018 (CTD2018) Seattle, Washington, USA, March 20-22, 2018, 2018.

[24] S. Farrell, D. Anderson, P. Calafiura, G. Cerati, et al. The HEP.TrkX project: deep neural networks for HL-LHC online and offline tracking. EPJ Web Conf., 150:00003, 2017.

[25] Xiangyang Ju, Steven Farrell, Paolo Calafiura, Daniel Murnane, et al. Graph Neural Networks for Particle Reconstruction in High Energy Physics detectors. In Thirty-third Conference on Neural Information Processing Systems (NeurIPS2019), Vancouver, Canada, 2019.

[26] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. CoRR, abs/1506.02640, 2015.

[27] Joseph Redmon and Ali Farhadi. YOLO9000: better, faster, stronger. CoRR, abs/1612.08242, 2016.

[28] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster R-CNN: towards real-time object detection with region proposal networks. CoRR, abs/1506.01497, 2015.

[29] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, et al. SSD: single shot multibox detector. CoRR, abs/1512.02325, 2015.

[30] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, et al. Focal loss for dense object detection. CoRR, abs/1708.02002, 2017.

[31] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. Mask R-CNN. CoRR, abs/1703.06870, 2017.

[32] Shaoshuai Shi, Xiaogang Wang, and Hongsheng Li. PointRCNN: 3D object proposal generation and detection from point cloud. CoRR, abs/1812.04244, 2018.

[33] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: fully convolutional one-stage object detection. CoRR, abs/1904.01355, 2019.

[34] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points. CoRR, abs/1904.07850, 2019.

[35] Jan Kieseler. Object condensation: one-stage grid-free multi-object reconstruction in physics detectors, graph and image data, 2020.

[36] John S. Bridle and Stephen J. Cox. RecNorm: Simultaneous normalisation and classification applied to speech recognition. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 234–240. Morgan-Kaufmann, 1991.

[37] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Vaughan. A theory of learning from different domains. Machine Learning, 79:151–175, 2010.

[38] D. Belayneh, F. Carminati, A. Farbin, B. Hooberman, et al. Calorimetry with deep learning: particle classification, energy regression, and simulation for high-energy physics. 2019.