Final Year Physics Project Interim Report
Anthony Hashemi
Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom
I. Introduction
The Standard Model of particle physics describes the fundamental particles (fermions and bosons) and their interactions [1]: the electromagnetic, strong and weak forces. Fermions have half-integer spin, and each has a corresponding antiparticle distinct from itself. Bosons have integer spin and can be their own antiparticle; they are responsible for the fundamental forces between particles and include the photon, the gluons, and the weak (W and Z) bosons. All the predicted particles of the Standard Model have now been discovered experimentally, the last being the Higgs boson, confirmed in the past few years [2, 3]. However, the model has many parameters that must be determined by experiment and includes only three of the four fundamental forces; it excludes gravity, as the graviton is yet to be found [4]. It also fails to explain many cosmological phenomena, such as dark matter [5] or what happened to all the antimatter after the Big Bang (CP violation) [6]. Therefore, it is believed to approximate a larger, complete theory.

The Standard Model organises fermions (quarks and leptons) using the flavour property (six quark flavours and six lepton flavours). Of the three Standard Model interactions, only the weak force can change flavour, and then only in charged-current transitions (mediated by a W+ or W− boson). Flavour transitions between fermions of the same charge (and different flavour), known as flavour-changing neutral-current (FCNC) processes, therefore receive no contribution from tree-level diagrams [7] and occur only through higher-order loop processes. Tree-level diagrams are those Feynman diagrams [8] without loops; higher-order processes have diagrams containing loops in which virtual particles are involved, and represent higher orders of perturbation theory. (Virtual particles are transient, off-mass-shell states, often pictured as very briefly violating energy and momentum conservation; they arise in perturbation theory, where particle interactions are described by the exchange of virtual particles.)
Figure 1: The CKM matrix. Each element encodes the strength of the transition between the two quarks in the subscript (the transition probability is proportional to its squared magnitude). For example, $V_{ud}$ governs the transition of an up (or anti-up) quark into a down (or anti-down) quark, or vice versa. Specifically, in the Wolfenstein parametrization shown, $\lambda = 0.2257^{+0.0009}_{-0.0010}$ is the expansion parameter, $\rho = 0.135^{+0.031}_{-0.016}$, $\eta = 0.349^{+0.015}_{-0.017}$ and $A = 0.814^{+0.021}_{-0.022}$. The Wolfenstein CKM matrix is therefore almost diagonal.

The Cabibbo-Kobayashi-Maskawa (CKM) matrix [9] gives information on flavour-changing
weak decays; specifically, it gives the likelihood of a transition between the up-type and down-type quarks. We will use the Wolfenstein parametrization of the CKM matrix [10], which is approximately diagonal: the diagonal elements describe the charge-changing transitions within each of the three quark generations (u↔d, c↔s, t↔b), while the elements involved in higher-order loop transitions are the off-diagonal ones, which are small compared to the tree-level diagonal elements. Therefore, in the Standard Model, higher-order loop processes are suppressed by the Glashow–Iliopoulos–Maiani (GIM) mechanism [11] and by the CKM matrix being approximately diagonal.

CP violation is the violation of the combined charge-conjugation (C) and parity (P) symmetry. Charge conjugation exchanges particles with their antiparticles (reversing the sign of all charges), and parity is the inversion of spatial coordinates. CP violation has been tested in the kaon system [12] but not across the whole SM, and there is a discrepancy between the CP violation provided by the CKM matrix and the amount required to explain the low abundance of antimatter in the universe. Therefore, looking for CP violation in B decays beyond that given by the SM CKM matrix is vital to account for the baryon asymmetry of the universe. Some
examples of new physics (NP) are low-energy supersymmetric extensions of the SM (SUSY) [13], where there are extra sources of CP violation and the CKM phase $\delta_{\mathrm{CKM}}$ is applied to new particles.
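As a numerical illustration of the near-diagonal structure discussed above, the sketch below builds the Wolfenstein parametrization to order λ³ from the central values quoted in the caption of Figure 1. It assumes the standard O(λ³) form of the matrix and treats the quoted ρ and η values as exact inputs (uncertainties are ignored); the function name `wolfenstein_ckm` is illustrative only.

```python
def wolfenstein_ckm(lam=0.2257, A=0.814, rho=0.135, eta=0.349):
    """Return the CKM matrix in the Wolfenstein parametrization to O(lambda^3).

    Rows are the up-type quarks (u, c, t); columns are the down-type quarks (d, s, b).
    Defaults are the central values quoted in Figure 1 (an assumption of this sketch).
    """
    l2, l3 = lam ** 2, lam ** 3
    return [
        [1 - l2 / 2, lam, A * l3 * complex(rho, -eta)],
        [-lam, 1 - l2 / 2, A * l2],
        [A * l3 * complex(1 - rho, -eta), -A * l2, 1.0],
    ]


V = wolfenstein_ckm()
magnitudes = [[abs(element) for element in row] for row in V]

# Print |V_ij|: the diagonal entries are close to 1, the off-diagonals are small.
for label, row in zip("uct", magnitudes):
    print(label, " ".join(f"{m:.4f}" for m in row))
```

The diagonal magnitudes come out close to one, while the elements linking the first and third generations are below one percent, which is the hierarchy the text relies on when arguing that loop-level transitions are suppressed.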
Rare B meson decays are flavour-changing decays of B mesons, i.e. mesons containing a bottom (b) quark or antiquark. Rare decays are suppressed because they involve higher-order loops, small (combinations of) CKM matrix elements, or both. These features are not necessarily present in physics beyond the Standard Model. As the SM does a very good job of describing fundamental particles and interactions, it is known that the deviations from the SM that NP brings must be small. Such changes are therefore only likely to be seen in processes which are suppressed in the SM, as is the case for rare decays, which makes them of interest when investigating NP. The first rare
B decay was observed by the CLEO experiment [14] in the b → s process, and the subsequent BaBar [15, 16] and Belle [17] experiments, among others, have given strong constraints on NP models.

Figure 2: LHCb schematic.
At the LHC [18] there are four large experiments ongoing: ATLAS [19], ALICE [20], CMS [21] and LHCb [22] (as well as other smaller ones). LHCb is a forward spectrometer covering the pseudorapidity range 2 < η < 5 whose objective is to look for evidence that the Standard Model (SM) of particle physics is not complete, specifically by looking at rare decays of B hadrons (particles containing b or anti-b quarks) as explained above. Before the LHC, the rare B decays studied agreed with the SM to the level of precision possible. LHCb allows greater precision because of the high-energy proton-proton collisions, which give a large production cross section of O(100 µb) [23], and so offers the possibility of revealing discrepancies with the SM.

II. Theory

The B0 is made of an anti-b quark and a d quark. The π0 is made up of d anti-d in this decay (it can also be u anti-u). Then there is the dimuon pair, µ+ and µ−. At quark level the decay is anti-b to anti-d, µ+ and µ−.

B mesons produced by proton-proton collisions at LHCb stay close to the line of the beam pipe, and so the detector is a series of different components stacked behind each other along that line. Firstly, the Vertex Locator (VELO) is where the proton beams collide. The B meson is detected indirectly by matching the theoretical decay distance of the B meson (just over 1 mm) to a particle decay distance in the VELO. The decay distance of a particle in the VELO is measured from the position of the particle decay and the position of the proton collision.

Then the Ring Imaging Cherenkov (RICH) detectors identify different charged particles from a B decay, such as pions, kaons and protons, by measuring their emission of Cherenkov radiation, which allows their velocity to be determined. Between the two RICH detectors there is a magnet, which allows the momentum of charged decay particles to be calculated from the curvature of their paths in the field. Silicon and outer trackers then record the trajectories of charged particles: charged particles collide with silicon atoms in the silicon tracker, giving an electric current, and they may ionise gas molecules in the outer tracker, also giving an electric current.
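The velocity and momentum measurements described above, and the way they combine into a particle mass, can be sketched with the textbook relations cos θ_c = 1/(nβ) for Cherenkov light and p [GeV/c] ≈ 0.3 |q| B [T] R [m] for a track bending in a uniform field. This is an idealised sketch, not the LHCb reconstruction: the uniform field, the refractive index and the momentum used in the example are illustrative assumptions.

```python
import math

# p [GeV/c] = 0.2998 * |q| * B [T] * R [m] for motion perpendicular to a uniform field
GEV_PER_TESLA_METRE = 0.299792458


def momentum_from_curvature(b_field_tesla, radius_m, charge=1):
    """Momentum (GeV/c) of a particle of given charge (units of e) from its bending radius."""
    return GEV_PER_TESLA_METRE * abs(charge) * b_field_tesla * radius_m


def beta_from_cherenkov_angle(theta_c_rad, refractive_index):
    """Velocity beta = v/c from the Cherenkov angle, using cos(theta_c) = 1 / (n * beta)."""
    return 1.0 / (refractive_index * math.cos(theta_c_rad))


def mass_from_momentum_and_beta(p_gev, beta):
    """Mass (GeV/c^2) from p = gamma * m * beta, i.e. m = p * sqrt(1 - beta^2) / beta."""
    return p_gev * math.sqrt(1.0 - beta * beta) / beta


# Illustrative round trip for a charged kaon (all numbers below are assumptions).
KAON_MASS_GEV = 0.4937
momentum = 10.0            # GeV/c
n_radiator = 1.0014        # hypothetical gas radiator refractive index
beta_true = momentum / math.hypot(momentum, KAON_MASS_GEV)
theta_c = math.acos(1.0 / (n_radiator * beta_true))

beta_measured = beta_from_cherenkov_angle(theta_c, n_radiator)
print(mass_from_momentum_and_beta(momentum, beta_measured))
```

Because pions, kaons and protons of the same momentum have different velocities, they produce different Cherenkov angles, which is what lets the mass hypothesis be chosen.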
Combining the velocity from the RICH with the momentum from the tracking system and magnetic field gives the mass and charge, and therefore the identity, of the particle. The π0 is not charged and so cannot be detected by the RICH, magnet or tracking detectors, so it (like other neutral particles) needs an extra detector. This is provided by the calorimeter system, which stops particles as they hit metal plates, releasing a shower of secondary particles which in turn excite plastic plates that emit ultraviolet light. The calorimeters then measure the energy deposited from the amount of UV light, which is proportional to the energy of the particles.

The other particles in the B decay are the dimuon pair, µ+ and µ−. Although they are charged particles, they need a separate detector system from the other charged particles and cannot be measured by the calorimeter system, as they pass through it with almost no energy loss because they are heavy (compared with electrons) and do not interact strongly. Only muons reach the muon system, as all other particles are stopped by the calorimeter system. The muon system is made up of five rectangular stations containing carbon dioxide, argon and tetrafluoromethane, with which the muons interact; the result is detected by electrodes.

A Monte Carlo (MC) algorithm is used to simulate proton-proton collisions, so that the real data can be compared with what we think it should look like. The simulated processes include the proton-proton collision and the B0 decay, hadronics (propagation of particles) and the electronics of the detector. Charges in the simulated electronics correspond to hits, which allow tracks and particles to be reconstructed in the same way as from the real particle charges in the detector.

All the events in our dataset are potential B0 decays, but most of them are background.
They are selected by combining two muons with a neutral pion, with requirements on the impact parameters of these daughter particles and of the B candidate, on the B momentum pointing back to the primary vertex, and further requirements on the muons. There are different types of background (misidentified groups of particles) present in the data. One type is combinatorial background, where the particles in an event come from different decays; another is peaking background, where the particles come from a single decay but at least one of them is misidentified; and lastly there are partially reconstructed backgrounds, where at least one of the final-state particles in the decay is not reconstructed [24].

To choose which variables were best to cut on, we use Receiver Operating Characteristic (ROC) curves [25]. ROC curves are graphs where $1 - \epsilon_B$ is plotted against $\epsilon_S$, where $\epsilon_B$ is the background efficiency (accepted background events / total number of background events) and $\epsilon_S$ is the signal efficiency (accepted signal events / total number of signal events). The accepted signal and background events are those kept by a cut at a given value of the variable (the direction of the cut is chosen as the one which removes more background than signal). A square ROC curve would be the optimum (full separation) and the worst would be a straight line between (0,1) and (1,0) (no separation). All curves have area > 0.5 by this convention, and so the greater the area, the better the variable is to cut on.
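The ROC construction described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the analysis code: the function names are hypothetical, every observed value is tried as a cut, and the cut direction is chosen as the one giving the larger area, matching the convention in the text.

```python
def roc_curve(signal, background, keep_above=True):
    """Return (signal efficiency, 1 - background efficiency) points, one per cut value."""
    points = {(0.0, 1.0), (1.0, 0.0)}  # end points: reject everything / accept everything
    for cut in set(signal) | set(background):
        if keep_above:
            s_pass = sum(1 for v in signal if v >= cut)
            b_pass = sum(1 for v in background if v >= cut)
        else:
            s_pass = sum(1 for v in signal if v <= cut)
            b_pass = sum(1 for v in background if v <= cut)
        eff_s = s_pass / len(signal)
        eff_b = b_pass / len(background)
        points.add((eff_s, 1.0 - eff_b))
    # Sort by signal efficiency; for equal efficiency put the higher rejection first.
    return sorted(points, key=lambda p: (p[0], -p[1]))


def roc_area(points):
    """Area under the curve by the trapezium rule."""
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def best_roc_area(signal, background):
    """Area for whichever cut direction separates better, so the result is >= 0.5."""
    return max(roc_area(roc_curve(signal, background, keep_above=direction))
               for direction in (True, False))


print(best_roc_area([3, 4, 5], [0, 1, 2]))  # fully separated toy samples
```

Ranking variables then amounts to calling `best_roc_area` once per variable and sorting by the result.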
Once good variables have been chosen to cut on (removing background while preserving most of the signal), the best value of each variable to cut on can be chosen using a figure of merit. Here this is chosen to be the Punzi figure of merit, $P = \epsilon_S / (1 + \sqrt{\epsilon_B N_B})$, where $\epsilon_S$ and $\epsilon_B$ are the signal and background efficiencies of the cut and $N_B$ is the total number of background events. Maximising this gives the best ratio of background loss to signal loss [26].
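A cut scan using this figure of merit can be sketched as follows. The sketch assumes the standard Punzi form ε_S / (a/2 + √B) with a = 2, where B = ε_B N_B is the number of background events surviving the cut; the toy samples, the cut values and the function names are hypothetical.

```python
import math


def punzi_fom(eff_signal, eff_background, n_background):
    """Punzi figure of merit with a = 2: eff_S / (1 + sqrt(eff_B * N_B))."""
    return eff_signal / (1.0 + math.sqrt(eff_background * n_background))


def scan_cuts(signal, background, cuts, keep_below=True):
    """Return a list of (cut, figure of merit) pairs, one per trial cut value."""
    results = []
    for cut in cuts:
        if keep_below:
            s_pass = sum(1 for v in signal if v < cut)
            b_pass = sum(1 for v in background if v < cut)
        else:
            s_pass = sum(1 for v in signal if v > cut)
            b_pass = sum(1 for v in background if v > cut)
        eff_s = s_pass / len(signal)
        eff_b = b_pass / len(background)
        results.append((cut, punzi_fom(eff_s, eff_b, len(background))))
    return results


def best_cut(signal, background, cuts, keep_below=True):
    """Cut value that maximises the figure of merit, with its value."""
    return max(scan_cuts(signal, background, cuts, keep_below), key=lambda r: r[1])


# Toy example: signal sits at low values of the variable, background at high values.
toy_signal = [1, 2, 3, 4]
toy_background = [3, 4, 5, 6, 7, 8]
print(best_cut(toy_signal, toy_background, cuts=[2.5, 4.5, 8.5]))
```

In a real analysis the background count would normally be scaled to the expected yield in the signal window rather than taken as the raw sample size.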
Classifiers predict the outcome of a binary question. Multivariate classifiers combine multiple input variables into a final output variable, which can give better separation than any single variable, where trends may not be visible. There are multiple types of multivariate classification methods, such as Boosted Decision Trees (BDT) and Artificial Neural Networks.

A decision tree (DT) is a system made up of a consecutive set of nodes (at the ends of branches) with a binary decision made at each. A final classification is given on reaching the end of a given node path (at a leaf). A DT is trained on a dataset for which the required outcome is already known. When looking for a particle decay signal, signal from simulated data and background from real data are given (the background is selected by looking outside the predicted mass range of the signal; if this is not possible, simulated background data can be used instead). A decision tree alone is a rather poor classifier, as it does not generalise well and overfits to the training set. A random forest is an ensemble learning method made up of a collection of decision trees, whose classification is decided as the most common classification of the separate trees. This corrects the overtraining issue of a single DT, as the average over all the trees becomes insensitive to fluctuations of specific training samples.
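The majority vote over a collection of trees can be sketched as below. The three hand-written "trees" stand in for trained ones; their variable names (`mass_err`, `pi0_pt`, `tau_err`) and thresholds are purely hypothetical and are not taken from the analysis.

```python
from collections import Counter


def tree_a(event):
    # Two-level tree: first node cuts on the mass error, second on the pion pT.
    if event["mass_err"] < 30.0:
        return "signal" if event["pi0_pt"] > 1000.0 else "background"
    return "background"


def tree_b(event):
    # Single-node tree (a "stump").
    return "signal" if event["pi0_pt"] > 800.0 else "background"


def tree_c(event):
    return "signal" if event["tau_err"] < 0.05 else "background"


def forest_classify(trees, event):
    """Return the most common classification among the individual trees."""
    votes = Counter(tree(event) for tree in trees)
    return votes.most_common(1)[0][0]


forest = [tree_a, tree_b, tree_c]  # an odd number of trees avoids tied votes
candidate = {"mass_err": 25.0, "pi0_pt": 900.0, "tau_err": 0.02}
print(forest_classify(forest, candidate))
```

Here one tree votes background and two vote signal, so the forest returns the majority answer even though an individual tree disagrees.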
A BDT is a forest of decision trees which uses a boosting method to make sure misclassified events are improved on. AdaBoost (Adaptive Boosting), used here, gives each event in the full training set a weight (its probability of being used in a training subset); a higher weight indicates a greater probability. It trains a tree on this subset of the training data, then increases the weights of misclassified events whilst reducing the weights of correctly classified events, and repeats the process until the weighted fraction of misclassified events exceeds 50%. The final output is the sum of all the classifiers' outputs weighted according to their errors: a classifier with accuracy below 50% is given a negative weight, one with accuracy of exactly 50% is given zero weight, and one with accuracy above 50% is given a positive weight.

An Artificial Neural Network (ANN), on the other hand, is a computational model based on the human brain, specifically its network of neurons. Like a BDT, it learns from a training set, but does so differently. ANNs usually consist of layers of interconnected nodes (artificial neurons): an input layer, one or more hidden layers and an output layer. They can be either feed-forward (signals travel only from input to output) or recurrent (signals can travel in both directions, allowing older results to be used again, giving the system a sort of ‘memory’). Here the focus is on feed-forward types, specifically multilayer perceptrons (MLP). A simple perceptron has an input layer and an output node. The inputs are fed into the output node, with each connection weighted (initially randomly); the weights change as the network learns.
The output node then takes the weighted sum of the input values and applies an activation function to give an output. Simple perceptrons, however, can only model linear problems well. So for classification of large background/signal samples an MLP is required, in which there are one or more ‘hidden’ layers between the input and output. The hidden neurons all act like the output node in the simple case, but feed into another layer of hidden nodes or, finally, into an output node. Combining the activation functions of the many neurons allows for a non-linear classification, provided the activation functions themselves are non-linear (a combination of purely linear functions would remain linear). The errors in the classification are then corrected by propagating the error backward through the network (backpropagation), readjusting the weights accordingly at each layer [27].

Figure 3: ROC curve for B0_MMERR. Area = 0.93.

Figure 4: Punzi figure of merit for B0_MMERR. It has a maximum at 29.9, which is the best value to cut on.
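The difference between a simple perceptron and a network with a hidden layer can be illustrated with the XOR problem, which no single weighted sum can separate. In the sketch below the weights are set by hand rather than learned by backpropagation, and a step function is used as the activation purely for readability (training in practice needs smooth activation functions so that gradients exist); all names are illustrative.

```python
def step(x):
    """Step activation: fires (1) when the weighted sum is positive."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def mlp_xor(x1, x2):
    """Two inputs -> two hidden neurons -> one output neuron."""
    hidden_or = neuron((x1, x2), (1.0, 1.0), -0.5)    # fires if either input is on
    hidden_and = neuron((x1, x2), (1.0, 1.0), -1.5)   # fires only if both are on
    # Output fires for "OR but not AND", which is XOR.
    return neuron((hidden_or, hidden_and), (1.0, -1.0), -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp_xor(a, b))
```

Each hidden neuron draws one straight-line boundary in the input plane; the output neuron combines the two, giving a decision region that a single perceptron cannot produce.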
III. Work Done in Term 1
Firstly, I familiarised myself with ROOT and C++ and revisited Python. Then I worked on the B decay kinematics questions provided, calculating the B0 decay distance, the momenta in an example B0 decay, impact parameters and kaon decay distances (all in ROOT). Then I determined the number of signal events in a toy sample by fitting to the mass variable and integrating the area under the fit curve. Then I started analysing the data from CERN, specifically looking for any variables which would give a good cut to remove background in the real data, so that a signal peak could be seen in the B0_M variable. This was done by plotting two different histograms of each variable superimposed on each other: one from the MC data in the signal region, given by B0_BKGCAT < 51, and the other from the real data in the background-only range, i.e. B0_M < 4900 MeV or B0_M > 5800 MeV (outside the MC signal range of B0_M). After looking at these, I decided on cuts to remove background in the real data, using the variables whose two histograms had the least overlap, so as to lose background without losing much signal. Single cuts and cuts applied in parallel gave no obvious signal once applied to the real-data B0_M plot, although they made it look much more exponential, so they removed a lot of the large background covering the whole range. Subsequently I plotted ROC curves for these good variables to find the best of them, listing them in order of area under the curve. The top 10 variables were (in order from best to worst): B0_MMERR, pi0_P, pi0_PT, pi0_MMERR, B0_TAUERR, gamma1_P,

Figure 5: gamma1_PY MC Background and Real Signal shows a strange issue in the
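The comparison of MC signal with real-data sidebands described in this section can be sketched in plain Python as below. The selection values (B0_BKGCAT < 51, B0_M below 4900 MeV or above 5800 MeV) are those quoted in the text; representing events as dictionaries, the binning and the overlap measure are assumptions made for illustration, not the ROOT code actually used.

```python
def is_mc_signal(event):
    """MC events in the signal category, as quoted in the text."""
    return event["B0_BKGCAT"] < 51


def is_data_sideband(event, low=4900.0, high=5800.0):
    """Real-data events outside the signal mass range (masses in MeV)."""
    return event["B0_M"] < low or event["B0_M"] > high


def normalised_histogram(values, lo, hi, n_bins):
    """Histogram of values in [lo, hi), normalised to unit area."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def overlap(signal_values, background_values, lo, hi, n_bins=20):
    """Shared area of the two normalised histograms: 0 = disjoint, 1 = identical."""
    hist_s = normalised_histogram(signal_values, lo, hi, n_bins)
    hist_b = normalised_histogram(background_values, lo, hi, n_bins)
    return sum(min(s, b) for s, b in zip(hist_s, hist_b))


print(is_data_sideband({"B0_M": 4800.0}), is_data_sideband({"B0_M": 5280.0}))
```

A variable with a small overlap between its signal and sideband histograms is a good candidate for a cut, which is the same information the ROC area summarises.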
Unexpected features were observed in the pi0_PY and gamma1(2)_PY distributions. After consideration this was thought to be an issue in the calorimeter, and this was confirmed by plotting, for both the MC and real data (in both signal and background regions), atan(gamma1_PY/gamma1_PZ) against atan(gamma1_PX/gamma1_PZ), which shows where particles are detected in the calorimeter. For the real data there was a strip along the x axis of the calorimeter, but in the MC data only the background had this. In addition, the MC data in both regions had a well-defined rectangular region near the centre of the calorimeter that is not present in the real data. The reason for this was not properly understood, and it will need addressing in term 2.
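The two angles plotted in this check can be computed per photon as below. The strip selection is a hypothetical illustration of how events along the x axis of the calorimeter could be flagged; the half-width value is an assumption, not a number from the analysis.

```python
import math


def calorimeter_angles(px, py, pz):
    """Horizontal and vertical projection angles, atan(PX/PZ) and atan(PY/PZ)."""
    return math.atan(px / pz), math.atan(py / pz)


def in_horizontal_strip(px, py, pz, half_width=0.01):
    """True if the photon points into a narrow band around the x axis (small vertical angle).

    half_width (radians) is a placeholder value chosen for this sketch.
    """
    _, theta_y = calorimeter_angles(px, py, pz)
    return abs(theta_y) < half_width


# Momentum components in MeV for two made-up photons.
print(calorimeter_angles(1000.0, 1000.0, 1000.0))
print(in_horizontal_strip(500.0, 5.0, 10000.0), in_horizontal_strip(0.0, 500.0, 10000.0))
```

Plotting the second angle against the first for every photon gives a picture of the calorimeter face, so a localised detector effect shows up as a band or block in that plane.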
Figure 6: Plots, for both the MC and real data (in both signal and background regions), of atan(gamma1_PY/gamma1_PZ) against atan(gamma1_PX/gamma1_PZ). Discrepancies between the images of the MC and real data indicate an issue in the calorimeter, confirming the assumption.

Then I started using TMVA, with only the BDT method active and with the given starting variable in the TMVA code, using the MC signal and real background data as the training sample. The output, from testing on the real data (non-training subset), gave a good ROC curve (area > 0.9).
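The boosting that such a BDT performs follows the AdaBoost scheme described in the Theory section. The sketch below is a minimal pure-Python version on a single variable, not TMVA's implementation: it uses one-node trees (stumps), applies the event weights directly in the error calculation instead of resampling a training subset, and stops when a new stump is no better than guessing. All function names are illustrative.

```python
import math


def train_stump(xs, ys, weights):
    """Find the threshold and polarity with the lowest weighted error on one variable."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (+1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if (polarity if x >= threshold else -polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, n_rounds=10):
    """Train stumps in sequence; labels ys are +1 (signal) or -1 (background)."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        if error >= 0.5:                      # no better than guessing: stop boosting
            break
        error = max(error, 1e-12)             # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - error) / error)  # weight of this stump in the vote
        ensemble.append((alpha, threshold, polarity))
        # Raise the weights of misclassified events and lower the rest, then renormalise.
        for i, (x, y) in enumerate(zip(xs, ys)):
            prediction = polarity if x >= threshold else -polarity
            weights[i] *= math.exp(-alpha * y * prediction)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted sum of the stump outputs."""
    score = sum(alpha * (polarity if x >= threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


# Toy sample that no single cut can separate: background sits between two signal regions.
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [1, 1, -1, -1, 1, 1]
model = adaboost(toy_x, toy_y, n_rounds=3)
print([predict(model, x) for x in toy_x])
```

The stump weight alpha is positive for better-than-random stumps, zero at exactly 50% error and would be negative below that, which matches the weighting rule described in the Theory section.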
I also began looking at the branching fractions of different possible background decays, to see how significant they are.
IV. Plan for Term 2
We will apply the TMVA BDT output acquired in term 1 to the original data by adding the output weight to the original tree, plotting a Punzi figure of merit curve for it and finding the optimum value to cut on. Then we will try different combinations of the best variables found earlier as input to TMVA and compare their corresponding ROC curves to find the best combination. This should take about a week, and some signal should then be identifiable in the data. We will then also test, and possibly use, other forms of machine learning, such as the neural networks discussed in the Theory section, again using the same TMVA code. This could take between 1 and 3 weeks, depending on how many variations of the TMVA methods are used and how far we tune their parameters. We will remove large amounts of background by identifying sources of background as discussed in the Theory section, including the J/psi reconstruction of the dimuon pair and different types of combinatorial backgrounds such as ... whose branching fractions were assessed in term 1. We will simulate particle decays for these background sources in a similar way to the B decay kinematics problems at the beginning of term 1. This should take about 1-2 weeks and should allow us to put constraints on the data. This will hopefully be accompanied by acquiring a new, streamlined data set which has all the J/psi contributions removed. We will also remove the data from the strip across the calorimeter that gives the issue in the PY variables discussed in term 1. This should remove substantial background without losing much signal. Finally, the branching fractions which we started looking at in term 1 will be used to focus on certain types of background.
1 Kibble T W B 2014 ArXiv e-prints
(Preprint 1412.4094)
2 Aad G et al. (ATLAS Collaboration)
2012 Phys.Lett. B716 1–29 (Preprint
3 2014 Nature Physics 10 557–560
(Preprint 1401.6527)
4 Dyson F 2013 International
Journal of Modern Physics A
28 1330041 (Preprint http:
URL http://www.worldscientific.
5 Abdallah J et al. 2015 Physics
of the Dark Universe 9-10 8 –
23 ISSN 2212-6864 URL http://
6 Y N 2000 165 (Preprint hep-ph/
7 Glashow S L, Iliopoulos J and Maiani
L 1970 Phys. Rev. D2 1285–1292
8 Kumericki K 2016 ArXiv e-prints
(Preprint 1602.04182)
9 Gershon T 2012 Pramana 79 1091–
1108 (Preprint 1112.1984)
10 Wolfenstein L 1983 Phys. Rev.
Lett. 51(21) 1945–1947 URL
11 Maiani L 2013 ArXiv e-prints
(Preprint 1303.6154)
12 D’Ambrosio G and Isidori G 1998
International Journal of Modern
Physics A 13 1–93 (Preprint
13 Sohnius M F 1985 Phys. Rept. 128 39–
14 Alam M S et al. (CLEO Collaboration)
1995 Phys. Rev. Lett. 74(15)
2885 URL
15 Lees J P et al. (BABAR Collaboration)
2012 Phys. Rev.
Lett. 109(19) 191801 URL
16 Lees J P et al. (BABAR Collaboration)
2012 Phys. Rev. D 86(11) 112008
17 Limosani A et al. 2009 Physical Review
Letters 103 241801 (Preprint 0907.
18 Alves Jr A A et al. (LHCb) 2008 Journal
of Instrumentation 3 S08005
19 Aad G et al. (ATLAS) 2008 JINST 3
20 Aamodt K et al. (ALICE) 2008 JINST
3 S08002
21 Chatrchyan S et al. (CMS) 2008
JINST 3 S08004
22 Alves Jr A A et al. (LHCb) 2008
JINST 3 S08005
23 LHCb Collaboration 2010 Physics Letters
B 694 209–216 (Preprint 1009.
24 Aaij R et al. (LHCb) 2012 JHEP 12
125 (Preprint 1210.2645)
25 Fawcett T 2006 Pattern Recogn.
Lett. 27 861–874 ISSN 0167-8655
26 Punzi G 2003 79 (Preprint physics/
27 Hoecker A et al. 2007 ArXiv Physics
e-prints (Preprint physics/0703039)