CEB NT97/05
30 September 1997
PIPELINE ARCHITECTURE
OF THE HERA-B PRETRIGGER CONTROLLER
C. Baldanza, M. Bruschi, I. D'Antone
Istituto Nazionale di Fisica Nucleare
Bologna, Italy
Abstract
This report describes the pipelined architecture of the controller on the pretrigger board. The adequate pipeline depth has been evaluated by modelling the controller FIFOs as a queueing system. The analytic result has been compared with the simulation output.
CENTRO DI ELETTRONICA
ISTITUTO NAZIONALE DI FISICA NUCLEARE
Sezione di Bologna
1. Introduction
The ECAL pretrigger system requires about two milliseconds to perform the complete algorithm. A pipeline architecture [1] has therefore been realized, in which each stage completes its work within 96 ns.
A controller on the pretrigger board holds the coordinates of each selected cluster in a pipeline [2]. It contains three FIFOs that take the coordinates of the candidate clusters and four FIFOs that store the addresses of the cells to be processed in the search for the Bremsstrahlung photon energy.
To evaluate the adequate depth of each pipeline we have modelled the controller FIFOs as a queueing system and, in addition, simulated the operating rules of the controller.
2. Queueing systems
The type of a queueing system [3] is often specified with the shorthand notation A/B/m, where A and B describe the interarrival time distribution and the service time distribution, respectively, and m is the number of parallel servers. The well-accepted symbols for the distributions used in the following are:
M Exponential distribution (Poisson arrivals or exponential service times)
D Deterministic variable, that is, interarrival times/service times are constant values.
G General distribution.
Thus, the system M/M/1 means a single server system with Poisson arrivals and exponential service times.
The exponential distribution is the most commonly used in queueing theory. There are two good reasons for this. First, it is generally a good approximation of the arrival distribution in many types of queues. Second, it makes the calculations relatively easy. The standard deviation of an exponential distribution is equal to its mean; both are the inverse of the rate. In other words: standard deviation = mean = 1/rate.
When the time between random events is exponentially distributed, the number of random events during a given period of time will have a Poisson distribution.
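The link between exponential gaps and Poisson counts can be checked numerically. The following is a minimal sketch, not part of the original note: the function names, the seed and the example rate and period are illustrative assumptions. It draws exponential interarrival times and counts the events that fall inside a fixed period; for a Poisson distribution the mean and the variance of the counts should both be close to rate * period.

```python
import random


def count_events(rate, period, rng):
    """Count events in [0, period) when the gaps between events are
    exponentially distributed with the given rate."""
    t = rng.expovariate(rate)
    n = 0
    while t < period:
        n += 1
        t += rng.expovariate(rate)
    return n


def sample_counts(rate, period, trials, seed=1):
    """Repeat the counting experiment `trials` times with a seeded generator."""
    rng = random.Random(seed)
    return [count_events(rate, period, rng) for _ in range(trials)]


def mean_and_variance(values):
    """Return the sample mean and the (population) variance of a list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


if __name__ == "__main__":
    # Illustrative numbers only: rate 0.16 per unit time, period of 10 units.
    counts = sample_counts(rate=0.16, period=10.0, trials=20000)
    mean, var = mean_and_variance(counts)
    print("mean =", mean, "variance =", var, "expected =", 0.16 * 10.0)
```

Equal mean and variance is the signature of the Poisson distribution, which is why the two printed values should agree within statistical fluctuations.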
In the M/M/1 queue the mean number in queue is:

$$Q = \frac{\rho^{2}}{1-\rho} \qquad (1)$$

Equation (1) is a special instance of the more general formula for the M/G/1 case:

$$Q = \frac{\rho^{2}\,(1 + C_{s}^{2})}{2\,(1-\rho)} \qquad (2)$$

where
$\rho$ = server utilization
$Q$ = mean number in queue
$C_{s}$ = coefficient of variation of the service time (standard deviation divided by mean).
This equation is called the Pollaczek-Khintchine formula after the two scientists who independently discovered it.
If the service time distribution is exponential (M/M/1), then $C_{s} = 1$ and (2) reduces to (1).
If the service time is constant (M/D/1), then $C_{s} = 0$ and

$$Q = \frac{\rho^{2}}{2\,(1-\rho)} \qquad (3)$$

Note that, at the same utilization, the mean queue length of an M/M/1 queue is twice that of an M/D/1 queue.
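The three single-server cases can be evaluated with one function. The sketch below is an illustration and not code from the note; the function names are assumptions. It uses the standard form of the Pollaczek-Khintchine formula, in which the coefficient of variation of the service time enters squared, so that the exponential case (coefficient 1) and the deterministic case (coefficient 0) come out as described above.

```python
def pk_queue_length(rho, cs):
    """Mean number waiting in an M/G/1 queue (Pollaczek-Khintchine formula).

    rho: server utilization, 0 <= rho < 1
    cs:  coefficient of variation of the service time
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    return rho ** 2 * (1.0 + cs ** 2) / (2.0 * (1.0 - rho))


def mm1_queue_length(rho):
    """Exponential service times: coefficient of variation equal to 1."""
    return pk_queue_length(rho, cs=1.0)


def md1_queue_length(rho):
    """Constant service times: coefficient of variation equal to 0."""
    return pk_queue_length(rho, cs=0.0)


if __name__ == "__main__":
    for rho in (0.2, 0.5, 0.8):
        print(rho, mm1_queue_length(rho), md1_queue_length(rho))
```

For every utilization the M/M/1 value is exactly twice the M/D/1 value, which is the factor of two quoted in the text.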
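If the service is performed by m servers (M/M/m), the average queue length is:

$$Q = p_{0}\,\frac{(m\rho)^{m}\,\rho}{m!\,(1-\rho)^{2}} \qquad (4)$$

where $\rho = \lambda / (m\mu)$ and

$$p_{0} = \left[\,\sum_{n=0}^{m-1} \frac{(\lambda/\mu)^{n}}{n!} + \frac{(\lambda/\mu)^{m}}{m!\,\left(1 - \lambda/(m\mu)\right)}\right]^{-1}.$$

For m = 1 this gives $p_{0} = 1-\rho$, and (4) reduces to (1).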
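The multi-server formula is easy to mistype, so a numerical check is useful. The following sketch is an illustration with assumed function names, not code from the note. It uses the standard textbook M/M/m result, in which the denominator of the queue length contains the square of (1 - utilization), and it verifies that a single server gives back the M/M/1 value.

```python
from math import factorial


def mmm_p0(lam, mu, m):
    """Probability that an M/M/m system is empty.

    lam: arrival rate, mu: service rate of one server, m: number of servers.
    """
    a = lam / mu      # offered load
    rho = a / m       # utilization of each server
    if rho >= 1.0:
        raise ValueError("unstable queue: lam / (m * mu) must be below 1")
    head = sum(a ** n / factorial(n) for n in range(m))
    tail = a ** m / (factorial(m) * (1.0 - rho))
    return 1.0 / (head + tail)


def mmm_queue_length(lam, mu, m):
    """Mean number waiting (not in service) in an M/M/m queue."""
    a = lam / mu
    rho = a / m
    p0 = mmm_p0(lam, mu, m)
    return p0 * a ** m * rho / (factorial(m) * (1.0 - rho) ** 2)


if __name__ == "__main__":
    # With one server the result must equal rho^2 / (1 - rho).
    print(mmm_queue_length(0.5, 1.0, 1), 0.5 ** 2 / (1.0 - 0.5))
    # Three servers sharing the same total load queue far less.
    print(mmm_queue_length(0.5, 1.0, 3))
```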
3. Process controller description
The process controller (CTRL) on the HERA-B pretrigger board handles the processing of the cell values in a 10x5 matrix. It receives the addresses of the cluster center cells (CCC) and of the cells used for the Bremsstrahlung recovery (BRM), and starts up to three processing operations per bunch crossing (BX). The CTRL associates with the CCC addresses the BX number (BCN) received from an external module.
The logic that stores and distributes the signals is implemented in a Xilinx 4013E FPGA.
The CTRL processes seven addresses:
- HA, HB and HC: the addresses of the (up to) three CCC.
- BR and BL: the two BRM addresses inside the matrix.
- XR and XL: the two BRM addresses coming from an external pretrigger board.
The addresses are stored in seven FIFOs, each 16 locations deep. Each FIFO (from here on named after the address it holds) is built from a RAM block in an FPGA and can be read and written asynchronously. Each FIFO has four status signals that inform the extraction circuit about its content:
FULL: the FIFO is full; the arrival of another address sets an error flag.
DATA: the FIFO contains at least one address.
MID: the FIFO is half full.
LAST: the FIFO contains only one address.
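The four status signals can be captured in a small behavioural model. The sketch below is a software illustration only, not the FPGA logic: the class name, the method names and the choice to drop an address that arrives when the FIFO is full are assumptions, and MID is interpreted here as "at least half full".

```python
from collections import deque


class AddressFifo:
    """Behavioural model of one address FIFO and its status signals."""

    def __init__(self, depth=16, mid_threshold=8):
        self.depth = depth
        self.mid_threshold = mid_threshold
        self._cells = deque()
        self.error = False  # set when an address arrives while FULL

    def write(self, address):
        """Store an address; return False and raise the error flag on overflow."""
        if self.full:
            self.error = True  # assumption: the overflowing address is dropped
            return False
        self._cells.append(address)
        return True

    def read(self):
        """Extract the oldest address, or None when the FIFO is empty."""
        return self._cells.popleft() if self._cells else None

    @property
    def full(self):
        return len(self._cells) == self.depth

    @property
    def data(self):
        return len(self._cells) >= 1

    @property
    def mid(self):
        # assumption: MID stays active from the threshold up to FULL
        return len(self._cells) >= self.mid_threshold

    @property
    def last(self):
        return len(self._cells) == 1


if __name__ == "__main__":
    fifo = AddressFifo()
    for address in range(9):
        fifo.write(address)
    print(fifo.data, fifo.mid, fifo.last, fifo.full)
```

Keeping the MID threshold as a parameter makes it possible to study values other than half the depth.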
fig.1: Process controller queueing structure
The extraction circuit takes three addresses from the FIFOs per BX (96 ns) and sends them to the external circuits at the fCLK (24 ns) rate.
A queueing model of the CTRL is shown in fig.1, where the server is the extraction circuit that reads the FIFOs and sends the data to the following circuits.
4. Performance evaluation
A Monte Carlo simulation gives the following worst-case input rates:
$\lambda_{HA} = 0.16$,
$\lambda_{HB} = \lambda_{HA} / 4$,
$\lambda_{HC} = \lambda_{HA} / 8$,
$\lambda_{XR} = \lambda_{HA} / 3$,
$\lambda_{XL} = \lambda_{HA} / 3$,
$\lambda_{BR} = \lambda_{HA} / 3$,
$\lambda_{BL} = \lambda_{HA} / 3$.
These values were obtained from a Monte Carlo simulation of the HERA-B electromagnetic calorimeter using 1000 events, each consisting of 5 minimum bias events superimposed on a pure J/psi event in the "hottest" region (inner ECAL, close to the beam pipe).
The $\lambda_{HA}$, $\lambda_{HB}$ and $\lambda_{HC}$ rates have been obtained assuming Poisson-distributed arrivals.
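The seven rates follow from the HA rate through the fixed ratios listed above. The sketch below is an illustration with assumed names; it tabulates them with exact fractions and sums them. It assumes that the rates are expressed in addresses per BX, which is what the later comparison with the service rates suggests but which the note does not state explicitly.

```python
from fractions import Fraction

# Ratio of each input rate to the HA rate, as listed in the text.
RATE_RATIO = {
    "HA": Fraction(1),
    "HB": Fraction(1, 4),
    "HC": Fraction(1, 8),
    "XR": Fraction(1, 3),
    "XL": Fraction(1, 3),
    "BR": Fraction(1, 3),
    "BL": Fraction(1, 3),
}


def input_rates(lam_ha):
    """Return the seven input rates for a given HA rate.

    lam_ha may be a string such as "0.16" or a Fraction, so that the
    arithmetic stays exact.
    """
    lam_ha = Fraction(lam_ha)
    return {name: lam_ha * ratio for name, ratio in RATE_RATIO.items()}


if __name__ == "__main__":
    rates = input_rates("0.16")
    for name, rate in rates.items():
        print(name, float(rate))
    print("total", float(sum(rates.values())))
```

Under the per-BX assumption the total worst-case input rate is 13/30 of an address per BX, well below the three addresses per BX that the extraction circuit can read.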
If the server read one FIFO buffer at each BX, the queue could be modelled, for each buffer, as an M/D/1 structure with $\mu = 1/7$. Since the mean queue length of an M/M/1 queue is twice that of an M/D/1 queue, adopting the M/M/1 model gives a more conservative estimate.
The server is implemented in a fast Xilinx FPGA and is able to read 3 buffers in 96 ns, so the queue can be modelled as a multiserver structure: M/M/3.
To obtain a more uniform working load across the seven buffers we assign them different priorities: a buffer with a higher input rate gets a higher priority (Fixed Priority).
The three time slots used to read the FIFOs in each BX are assigned as follows:
1st slot: read HA
2nd slot: read XR, XL, BR, BL
3rd slot: read HB, HC
The HA buffer, for example, is therefore read more frequently than the other buffers. With these three fixed priorities we have:
$\mu_{HA} = 1$,
$\mu_{XR} = 1/4$, $\mu_{XL} = 1/4$, $\mu_{BR} = 1/4$, $\mu_{BL} = 1/4$,
$\mu_{HB} = 1/2$, $\mu_{HC} = 1/2$.
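The service rates above can be recovered by counting reads in an explicit slot schedule. The sketch below is an illustration, not the authors' circuit: it assumes that the FIFOs sharing a slot are read in turn (round robin), which the note does not state but which is consistent with the quoted rates, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# FIFOs assigned to the 1st, 2nd and 3rd time slot of each BX.
SLOT_GROUPS = [["HA"], ["XR", "XL", "BR", "BL"], ["HB", "HC"]]


def schedule(slot_groups, n_bx):
    """List, for each BX, the FIFO read in each of the three slots.

    Assumption: FIFOs that share a slot take turns from one BX to the next.
    """
    return [[group[bx % len(group)] for group in slot_groups]
            for bx in range(n_bx)]


def service_rates(slot_groups, n_bx=4):
    """Reads per BX for each FIFO, counted over n_bx bunch crossings.

    The default of 4 BX is one full cycle for the slot groups above.
    """
    reads = Counter(name
                    for bx_reads in schedule(slot_groups, n_bx)
                    for name in bx_reads)
    return {name: Fraction(count, n_bx) for name, count in reads.items()}


if __name__ == "__main__":
    for bx, names in enumerate(schedule(SLOT_GROUPS, 4)):
        print("BX", bx, names)
    print(service_rates(SLOT_GROUPS))
```

The rates sum to three reads per BX, one per time slot, as expected.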
The resulting average queue lengths Q, for ten thousand events, are distributed as shown in fig.2.
fig.2
To keep $\rho = \lambda / (m\mu) < 1$ in every buffer, $\lambda_{HA}$ must stay below 3/4: the XR, XL, BR and BL buffers saturate first, since for them $\rho = (\lambda_{HA}/3)/(1/4) = 4\lambda_{HA}/3$.
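The stability limit can be reproduced by computing the utilization of every buffer. The sketch below is an illustration with assumed names; it treats each buffer as an independent single-server queue and uses the M/M/1 queue length as the conservative estimate, with the rate ratios and fixed-priority service rates quoted in the text.

```python
from fractions import Fraction

# Input rate of each buffer relative to the HA rate.
RATE_RATIO = {"HA": Fraction(1), "HB": Fraction(1, 4), "HC": Fraction(1, 8),
              "XR": Fraction(1, 3), "XL": Fraction(1, 3),
              "BR": Fraction(1, 3), "BL": Fraction(1, 3)}

# Service rate of each buffer with the three fixed priorities.
SERVICE_RATE = {"HA": Fraction(1), "HB": Fraction(1, 2), "HC": Fraction(1, 2),
                "XR": Fraction(1, 4), "XL": Fraction(1, 4),
                "BR": Fraction(1, 4), "BL": Fraction(1, 4)}


def utilizations(lam_ha):
    """Utilization (input rate / service rate) of each buffer."""
    lam_ha = Fraction(lam_ha)
    return {name: lam_ha * RATE_RATIO[name] / SERVICE_RATE[name]
            for name in RATE_RATIO}


def max_lambda_ha():
    """Largest HA rate for which every buffer keeps utilization below 1."""
    return min(SERVICE_RATE[name] / RATE_RATIO[name] for name in RATE_RATIO)


def mm1_queue_length(rho):
    """Mean number waiting in an M/M/1 queue with utilization rho < 1."""
    if not 0 <= rho < 1:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    return rho ** 2 / (1 - rho)


if __name__ == "__main__":
    print("limit on the HA rate:", max_lambda_ha())
    for name, rho in utilizations("0.16").items():
        print(name, float(rho), float(mm1_queue_length(rho)))
```

At the worst-case HA rate of 0.16 every utilization stays well below 1; the four BRM buffers are the most loaded, which is why they set the 3/4 limit.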
To improve the working load distribution and to raise the maximum allowable rate $\lambda_{HA}$, we have developed a control circuit that dynamically changes the priorities with which the server reads the buffers. The MID signal is used to change the priorities.
The extraction circuit operates according to the following rules [1]:
1. The addresses are extracted with the priority: HA, BR, BL, XR, XL, HB, HC.
the first time slot of each BX), but when the MID signal is active it is read in all the
time slots. In other words, more time slots are assigned to a FIFO having the MID
signal active.
4. When 3 or more MID signals are active, the FIFOs with no MID signals are
inhibited.
The analysis with variable priorities is analytically cumbersome. We have therefore developed a simulation program that implements the rules used to adapt the priorities.
fig.3
To compare the previous analytical result with the simulation output, fig.3 shows the average queue length in the seven buffers for $\lambda_{HA} = 3/7$. The dotted line is the analytical solution shown in fig.2.
Because of the different priorities, the XL, XR, BL and BR buffers have different average lengths, whereas in the previous case (three fixed priorities) they have the same average length.
The previous result has been obtained with pipeline depth=16 cells and MID=8, i.e. the MID signal is given by the FIFOs when they are half full.
We have also looked for the MID value that gives the best working load distribution and the minimum average queue length in all the queues. The best value was found to lie in the range 4 < MID < 6.
5. Conclusion
In this report we have described the pipelined architecture of the controller on the pretrigger board. The adequate pipeline depth has been evaluated by modelling the controller FIFOs as a queueing system. The analytic result has been compared with the simulation output.
References
[1] C. Baldanza et al., HERA-B ECAL Pretrigger Board Description, HERA-B note, September 1997.
[2] C. Baldanza et al., A Cellular Automaton for Cluster Selection in the HERA-B Pretrigger Board, Technical note CEB NT97/03, September 1997.
[3] L. Kleinrock, Queueing Systems, Volume I: Theory, New York, John Wiley, 1975.