CEB NT-97/04
30 September 1997



C.Baldanza, M.Bruschi, I.D'Antone, M.Piccinini, M.Zuffa

Istituto Nazionale di Fisica Nucleare



This report describes the HERA-B ECAL pretrigger board. It shows how all the operations in the board are pipelined and gives the latency of the operations.



Sezione di Bologna

1. Introduction

This note describes the HERA-B ECAL pretrigger board. It shows how all the operations in the board are pipelined and describes their latency.

2. Pretrigger data input

The input data are supplied by the readout cards over two wires per channel, using a serial synchronous protocol.

The input data rate equals the HERA-B bunch crossing (BX) rate, one datum every 96 ns. Each datum contains 8 bits: seven are the value given by the ADC of the front-end card, and the remaining bit is a flag set by a comparator on the front-end card to indicate, for each channel, a value over threshold. The pretrigger reassembles the data distributed on the two lines and processes only the data with the flag bit set.

The data are synchronized by means of the experiment HERA_clock, whose period is 96 ns. On the pretrigger board the HERA_clock frequency is multiplied by four with PLL circuits, and the resulting fCLK signal, with a period of 24 ns, synchronizes all the circuits of the board.

The data on the input lines are serialized with the 24 ns clock. The timing of the data transfer on a pair of lines with respect to the HERA_clock and to the fCLK is shown in fig.1.
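The two-line serial format can be illustrated with a short Python sketch. It assumes, in line with the comparator description later in this note, that the bit pair (7, 6) is sent first, that bit 7 carries the flag, and that one line carries the odd bits while the other carries the even bits. The function names are hypothetical; the sketch models only the bit ordering, not the electrical timing.

```python
BX_NS = 96               # HERA_clock period (one bunch crossing)
FCLK_NS = BX_NS // 4     # fCLK period after the PLL multiplies the clock by four
PAIRS_PER_DATUM = 4      # 8 bits sent two at a time, one pair per fCLK cycle


def serialize(datum):
    """Split an 8-bit datum into four (odd_bit, even_bit) pairs, MSB pair first."""
    if not 0 <= datum <= 0xFF:
        raise ValueError("datum must fit in 8 bits")
    pairs = []
    for high in (7, 5, 3, 1):
        odd_bit = (datum >> high) & 1          # bit carried by the 'O' line
        even_bit = (datum >> (high - 1)) & 1   # bit carried by the 'E' line
        pairs.append((odd_bit, even_bit))
    return pairs


def deserialize(pairs):
    """Rebuild the 8-bit datum from the four pairs received on the two lines."""
    datum = 0
    for odd_bit, even_bit in pairs:
        datum = (datum << 2) | (odd_bit << 1) | even_bit
    return datum


def split_flag(datum):
    """Return (flag bit, 7-bit ADC value), assuming the flag is bit 7."""
    return (datum >> 7) & 1, datum & 0x7F
```

With these assumptions one datum occupies exactly four fCLK cycles, i.e. one BX.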

3. Data organization

The pretrigger board must process groups of 48 or 50 data coming from the calorimeter. The data are organized in a 12x8 matrix so that each pretrigger card can handle either of two groupings of calorimeter cells, 10x5 or 8x6, plus the border cells.

For each BX the pretrigger board can process three clusters. Each cluster is composed of a central cell and the eight cells that surround it (nonet).

To process a nonet whose central cell lies on the border of the matrix, for example the C1 or the C30 cell in a 10x5 matrix (fig.2), the values of the border cells of the adjacent matrices are needed.

Fig.2 shows the arrangement of the cells in the 10x5 and 8x6 matrices. The cells of each matrix are labelled with the letter C (C1, C2, ..., C50). The U, D, L and R border cells originate from the eight matrices in the neighbouring pretrigger boards.


4. Block diagram description

The serial data are transferred simultaneously to the Event Buffer & Multiplexer (EBM) and to the Local Maxima Finder Unit (LMFU). The EBM stores the data of an event, associating them with the BX number. The LMFU identifies up to three cells that contain a value higher than their neighbours and have the flag active.

The addresses of the three cells are sent to the Process Controller (CTRL), which stores each address in one of three FIFOs together with the corresponding BX number.

The CTRL also contains 4 other FIFOs, which store the coordinates of the cells that must be processed in the search for the Bremsstrahlung photon (BRM).

Two FIFOs are used for the on-board Bremsstrahlung recovery, and the other two for the Bremsstrahlung recovery requested by the external pretrigger boards on the right or on the left side.

These coordinates identify the right-side and left-side photons belonging to the current nonet.

A sequencer within the CTRL extracts 3 coordinates per BX from the FIFOs according to a suitable priority criterion and sends them to the EBM and to the Message Buffer (MBUF).

The data coming from the EBM are processed by the LUT, whose results are sent to the MBUF. The MBUF is a circuit composed of 8 dual-port RAMs and a controller that handles the data flow. The controller stores the results in the dual-port RAM, distinguishes the results of a main cluster processing from those of the Bremsstrahlung recovery and, finally, extracts the complete data.

The complete data are transferred to the Message Formatter & VME Interface (MFVI), where the message for the TFU is assembled. The message is a packet of four 20-bit words.

The MBUF also controls the BRM Request Interface (BRI), which provides, by means of a look-up table, the coordinates for the Bremsstrahlung recovery.

The coordinates may belong to the matrix processed on the same board or to the two matrices on the right and left sides. In the first case the coordinates are sent directly to the FIFOs of the on-board CTRL; in the second case they are sent to the CTRL of the pretrigger board on the right or the left side.

At the same time the BRI can receive requests from the card on the right or the left side and send them to the CTRL. The results of this processing go through the BRI to the external boards.

The FCS interface receives the bunch crossing number (BCN) and transfers it to the CTRL. The FCS interface also receives the HERA_clock and sends it to the Fast Clock Generator (FCG), where the fCLK is generated and distributed to all the on-board circuits.

The FCG contains PLL circuits that can delay or advance the fCLK. In this way the clock edges in all the pretrigger boards can be synchronized.

The data arriving at the Input Interface (InI) are in differential TTL format; in the InI they are converted to single-ended TTL and synchronized with the fCLK.

The block diagram of the pretrigger board is shown in fig.3.


5. Input Interface

Table 1. Input connectors and signal names.

Connector  Cell group                  Signals
JCA        Central cells, first part   pCO1-12, nCO1-12, pCE1-13, nCE1-13
JCB        Central cells, second part  pCO13-25, nCO13-25, pCE14-25, nCE14-25
JCC        Central cells, third part   pCO26-37, nCO26-37, pCE26-38, nCE26-38
JCD        Central cells, fourth part  pCO38-50, nCO38-50, pCE39-50, nCE39-50
JU         Upper border cells          pUE1-10, nUE1-10, pUO1-10, nUO1-10
JD         Lower border cells          pDE1-10, nDE1-10, pDO1-10, nDO1-10
JL         Left border cells           pLE1-8, nLE1-8, pLO1-8, nLO1-8
JR         Right border cells          pRE1-8, nRE1-8, pRO1-8, nRO1-8

The Input Interface (InI) receives the signals coming from the front-end cards in differential TTL format, converts them to single-ended TTL, synchronizes them with the fCLK and distributes them to the EBM and to the LMFU.

There are eight input connectors; Table 1 shows the signal distribution. The signals are named according to their destination. A 'p' or an 'n' at the beginning of the name distinguishes the positive and negative signals. The second letter of the abbreviation identifies the group to which the signal belongs: 'C' for central cells, 'U' for the upper edge, 'D' for the lower edge, 'L' for the left edge and 'R' for the right edge. The third letter is 'E' if the signal line carries the even bits and 'O' if it carries the odd bits (Table 1).

The number of signals is 344: 50 central cells, 10 upper, 10 lower, 8 on the right side and 8 on the left side, multiplied by 4 (2 lines carry the even and odd bits, and each of them needs 2 lines for the positive and negative sides of the differential signal). The number of cells handled by the input interface is therefore larger than required by the 10x5 format (84 cells including the edges) and by the 8x6 format (80 cells including the edges). In this way the two formats are handled with the same pretrigger board.
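The arithmetic behind these counts can be checked with a few lines of Python. The figures of 84 and 80 cells are reproduced if each format is counted as a full frame including its four corner cells; that reading is an assumption, as are the names used below.

```python
# Cells received by the Input Interface, per group (from the text above).
CELLS = {"central": 50, "upper": 10, "lower": 10, "right": 8, "left": 8}

# Each cell uses 2 bit lines (even, odd), each transmitted as a differential pair.
LINES_PER_CELL = 2 * 2

total_cells = sum(CELLS.values())
total_signals = total_cells * LINES_PER_CELL


def cells_with_edges(columns, rows):
    """Cells in a columns x rows matrix plus a one-cell frame (corners included)."""
    return (columns + 2) * (rows + 2)
```

Under this counting the 86 cells served by the connectors cover both the 10x5 and the 8x6 case.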

The connectors are followed by the differential receivers and by latches synchronized with the fCLK. The signals are then transferred to the LMFU and to the multiplexers within the EBM that build the 8x6 or 10x5 matrix (fig.2). The selection between the two formats is made by means of a jumper. The multiplexers distribute the input signals into two matrices like those shown in fig.2. The signals are distributed in this way because, to extract the nonet, the data in the EBM are organized by columns.

6. Local Maxima Finder Unit

The Local Maxima Finder Unit (LMFU) is a device that finds, in a matrix, up to three cells that are the centre of a cluster. It is implemented in a Xilinx 4013 FPGA. The capability to process either type of matrix, 10x5 or 8x6, is obtained by loading one of two different programs into the FPGA at start-up, according to the position of a jumper.

The input data come from the InI in serial format and concern the central and the edge cells. In output the device gives three addresses of cells containing an event, each with a flag that validates the address.

The criteria for selecting a cell are:

1. the cell must have a value higher than the cell above it and the cell to its left.

2. the cell must have a value equal to or higher than the cell below it and the cell to its right.

3. the cell must have its flag bit active ('0').
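The three criteria can be written as a single predicate. The sketch below is a software model, not the FPGA logic: it assumes the values and flags are held in grids that already include the border cells, so that every tested cell has four neighbours, and that smaller row indices are "upper". One plausible reading of the asymmetry between "higher" and "equal or higher" is that two adjacent cells with equal values cannot both be selected.

```python
FLAG_ACTIVE = 0  # the flag is active low, as stated in criterion 3


def is_local_maximum(values, flags, row, col):
    """Apply the three selection criteria to the cell at (row, col).

    values and flags are lists of rows that include the border cells;
    (row, col) must point to a cell that is not on the outer frame.
    """
    centre = values[row][col]
    return (
        flags[row][col] == FLAG_ACTIVE
        and centre > values[row - 1][col]    # strictly higher than the upper cell
        and centre > values[row][col - 1]    # strictly higher than the left cell
        and centre >= values[row + 1][col]   # equal or higher than the lower cell
        and centre >= values[row][col + 1]   # equal or higher than the right cell
    )
```

For two equal neighbouring cells on the same row, only the left one passes the test in this model.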

Two cells are compared with a two-line serial comparator, matched to the format of the input data. At each fCLK edge two 2-bit data arrive, one per cell. The first pair of bits, i.e. the most significant bits of the datum, is marked by a '1' on a synchronization line. The most significant bit of this pair contains the flag. If the flag is not active, the cell is immediately discarded. If the first pairs of the two data are equal, the comparator waits for the second, then the third and the fourth pair. As soon as two pairs differ the comparator stops. The result of the comparison is stored and transferred to the next circuit at the next synchronization pulse.
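A behavioural sketch of this pair-by-pair comparison is given below. It is not the FPGA implementation: the function names are hypothetical, and it assumes that the flag bit is tested on the d input (as in the cycle description that follows) and is masked out before the magnitudes are compared, which the note does not state explicitly.

```python
FLAG_ACTIVE = 0  # the flag is active low


def to_pairs(datum):
    """Split an 8-bit datum into four 2-bit pairs, most significant pair first."""
    return [(datum >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def serial_compare(c_datum, d_datum):
    """Compare the data on inputs c and d one 2-bit pair per fCLK cycle.

    Returns (result, cycles_used), where result is 'discarded', 'd>c',
    'd<c' or 'equal'.
    """
    c_pairs, d_pairs = to_pairs(c_datum), to_pairs(d_datum)
    # The most significant bit of the first pair is the flag of the cell on d.
    if (d_pairs[0] >> 1) != FLAG_ACTIVE:
        return "discarded", 1
    # Assumption: the flag bits do not take part in the magnitude comparison.
    c_pairs[0] &= 0b01
    d_pairs[0] &= 0b01
    for cycle, (c, d) in enumerate(zip(c_pairs, d_pairs), start=1):
        if c != d:
            # The first differing pair decides; later pairs are not needed.
            return ("d>c" if d > c else "d<c"), cycle
    return "equal", 4
```

Because the most significant pairs arrive first, the comparison can stop at the first differing pair, which is the behaviour described above.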

Each C cell is connected to 4 comparators called Upper (U), Lower (D), Left (L) and Right (R). As shown in Fig.5, two cells are connected to the d input of the R and D comparators and to the c input of the U and L comparators.

The flow chart of the comparator cycle is shown in Fig.6. The description that follows refers to the elements of the comparator in Fig.5.

The cycle starts at the arrival of a synchronization signal R; at the same time bits 7 and 6 of the data arrive on the c and d inputs.

The flag bit is tested first: if it is not active ('1'), the cell connected to d is immediately discarded. Otherwise the pairs of bits are compared sequentially.

Each cell has a circuit that collects the results of the four comparators to which it is connected.

The output of this circuit is '0' if the cell is valid, i.e. it has passed the comparisons with the surrounding CU, CD, CL and CR cells, and '1' if the cell has been discarded.

This output is sampled by a latch synchronized with the fCLK.

The results are then transferred to three pipelined priority encoders with 50 inputs each. Each encoder contains five priority encoders with ten inputs each for the 10x5 matrix, or six priority encoders with eight inputs each for the 8x6 matrix. These encoders are in turn connected to a secondary encoder with variable priority (fig.7).


The matrices are divided into horizontal lines, 5 for the 10x5 and 6 for the 8x6. Each line is connected to a main encoder. The encoder identifies the active cell with the highest priority and puts on its output lines the address of the corresponding cell position. The first cell to the right of each line has address 0, the second 1 and so on. Each main encoder is connected to the secondary encoder through a line that becomes active when at least one of the cells of that encoder is valid. If more than one encoder is active, the secondary encoder selects the one with the highest priority, enabling its output buffer so that the column information (Column) is output. At the same time it outputs the address of the row handled by that encoder (Row). The secondary encoder also drives the Valid line, which is active when at least one cell in the matrix is valid.

The row with the highest priority changes at each BX. At power-up the line with the highest priority is the first, then the second, the third and so on.

The first 50-input encoder selects the first event and at the following BX sends the data to the second encoder, blanking the cell that it has selected. The second encoder, after a further BX, sends the data to the third encoder, again blanking the cell that it has selected.
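The selection of up to three cells can be modelled in software as three passes of a two-level priority encoder with blanking. The sketch below makes two assumptions that the note does not state: within a line the lowest address has the highest priority, and the same starting row is used by all three stages of one event. The name select_cells and its parameters are hypothetical.

```python
def select_cells(valid, first_row=0, max_cells=3):
    """Return up to max_cells (row, column) addresses of valid cells.

    valid[row][col] is True for cells that passed the local-maximum test.
    first_row is the row that currently has the highest priority.
    """
    remaining = [list(line) for line in valid]  # copy: blanking must not alter the input
    n_rows = len(remaining)
    selected = []
    for _ in range(max_cells):                  # one pass per pipelined encoder
        hit = None
        for offset in range(n_rows):            # secondary encoder, rotating row priority
            row = (first_row + offset) % n_rows
            if any(remaining[row]):
                col = remaining[row].index(True)  # main encoder: lowest address wins (assumption)
                hit = (row, col)
                break
        if hit is None:                         # Valid line inactive: no cell left
            break
        selected.append(hit)
        remaining[hit[0]][hit[1]] = False       # blank the selected cell for the next stage
    return selected
```

Changing first_row from one BX to the next reproduces the rotating row priority, so that no line of the matrix is systematically favoured.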

Before transmission to the CTRL the data are converted from the Row/Column value to the address of the cell in the matrix minus one: the cell C1 has address 0, the cell C2 address 1 and so on. From here on this cell address identifies the cells of the matrix.

The addresses of the three selected cells (HA, HB, HC), all relative to the same BX, are sent to the CTRL at three different times.

7. Process Controller

The Process Controller handles the processing of the data contained in the cells. It receives the addresses of the valid cells and the addresses of the cells with a BRM photon, and it starts three processing cycles for each BX. The CTRL takes the Bunch Crossing Number (BCN) from the FCS interface and uses it as the write address in the EBM. The reception and distribution of the signals within the Process Controller are implemented in a Xilinx 4013E FPGA. The control of the EBM and the management of the BCN are implemented in a Xilinx 4003E FPGA. This last part receives the BCN from the FCS card, delays it by means of a shift register to compensate for the latency of the read-out cards, and stores it at the same address as the matrix of the same BX, so that it can be dispatched to the MBUF when the valid cells of the matrix have been processed.

The cell addresses to be processed come from seven sources:

  1. HA originates from the LMFU; it is the first selected cell.
  2. HB originates from the LMFU; it is the second selected cell.
  3. HC also originates from the LMFU; it is the third selected cell.
  4. BR originates from the LUT. It is the address of the cell where a BRM photon has probably arrived, to the right of the processed cell.
  5. BL originates from the LUT. It is the address of the cell where a BRM photon has probably arrived, to the left of the processed cell.
  6. XR originates from the pretrigger card on the right side. Each card can request the processing of a cell on the card to its right or to its left where a BRM photon could have arrived.
  7. XL originates from the pretrigger card on the left side.

The addresses BR, BL, XR and XL are associated with the BCN of the active cell whose processing produced the address. The BCN of the addresses HA, HB and HC is associated directly by the CTRL.

The addresses are stored in seven FIFOs, each 16 words deep. Each FIFO is built from a RAM block in an FPGA and can be read and written simultaneously. Each FIFO has four state signals that inform the extraction circuit about its content. The signals are:

The extraction circuit takes three addresses per BX from the FIFOs and sends them to the EBM and to the MBUF. The addresses are output at a 24 ns rate. Besides the address and the associated BCN, a group of 5 flags containing information on the data is also sent to the MBUF (tab.2).
Table 2. Flags sent to the MBUF with each address.

Flag             Meaning
VALID (bit 0)    If bit = 1 the packet contains valid data.
LEFT (bit 1)     If EVENT = 0 and bit = 1 the packet contains left BRM data; if bit = 0 the packet contains right BRM data.
EXT (bit 2)      If EVENT = 0 and bit = 1 the packet contains an external BRM request to be transferred to the right or the left board, according to the status of the LEFT bit; if bit = 0 the packet contains a BRM request internal to the board.
EVENT (bit 3)    If bit = 1 the data concern an event processing, otherwise a BRM request processing.
ERR_IF_X         If bit = 1 an external BRM request has to be checked; if one is present the processing is discarded.

The extraction circuit operates according to the following rules:

  1. The priority with which the addresses are extracted varies according to table 3.
  2. The FIFOs selected in a BX are inhibited for the next 4 fCLK cycles. This rule is needed because the update of the state signals takes one BX. Each BX has 4 time slots; the FIFOs are read out in the first 3 time slots.
  3. HA is normally read out in the first time slot, but when its MID signal is active it uses all the time slots. In this way the HA FIFO can be emptied quickly, so that the principal event is processed in the shortest possible time.
  4. A FIFO with its MID signal active is assigned more time slots. In this way the FIFO is emptied quickly.
  5. When 3 or more MID signals are active, the FIFOs with no MID signal are inhibited.


The output of each FIFO holds the next address (and the associated BCN) to be sent and is connected to a tri-state buffer. The extraction circuit enables only one buffer at a time, when it must send an address from a FIFO to the rest of the card.

Each FIFO includes three circuits that check the data and that flag, and where possible correct, some error conditions:

1. OVERRUN: it checks whether the BCN associated with the address is too old. The EBM has a depth of 64 words, so after 64 BX a buffered event is overwritten.

The addresses of cells with events (HA, HB, HC) are eliminated before the addresses of cells with BRM, because the processing of an event is always followed by the BRM processing. In case of an error on a BRM FIFO the corresponding BRM processing is not launched, and the incomplete message is overwritten and lost within the MBUF. The error flag is set.

2. NOEXTERNAL: starting from the BCN of an event address HA, HB or HC, it checks whether the system has time to process an external BRM request after the event processing.

In a limit situation there is time to process an internal BRM but not an external one; the external BRM cycle has a longer latency of 6 BX. In this case the ERR_IF_X flag is set, and if the result of the processing calls for an external BRM processing the message is deleted.

3. FIFOFULL: it checks that the write cycles are inhibited when the FULL line of the FIFO is active. In this case the oldest address in the FIFO is overwritten.
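The OVERRUN test amounts to comparing the age of a queued BCN with the depth of the EBM. The sketch below illustrates the idea only: the range of the BCN counter is not given in this note, so BCN_MODULO = 256 is a hypothetical value, and any safety margin for the processing time is ignored.

```python
EBM_DEPTH = 64     # events held by the EBM before the oldest one is overwritten
BCN_MODULO = 256   # hypothetical range of the BCN counter (not stated in the note)


def bcn_age(current_bcn, event_bcn, modulo=BCN_MODULO):
    """Number of bunch crossings elapsed since event_bcn, allowing for counter wrap."""
    return (current_bcn - event_bcn) % modulo


def is_overrun(current_bcn, event_bcn, modulo=BCN_MODULO):
    """True if the buffered data for event_bcn have already been overwritten."""
    return bcn_age(current_bcn, event_bcn, modulo) >= EBM_DEPTH
```

An address whose BCN fails this test would point to EBM data belonging to a later event, which is why it has to be dropped rather than processed.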

The outputs of the three circuits described above go to three 7-bit registers readable from the VME bus. Each register contains the state of the FIFOs and is reset after it is read from the VME.

Furthermore, the output of each register is connected to a red LED on the front panel. The three LEDs are called OVER, XERR and FULL.

8. Event Buffer & Multiplexer

The EBM stores 64 matrices, 8x6 or 10x5, in a buffer made of static RAM implemented in 6 Xilinx 4013 FPGAs. The first pair of bits of the data arrives at the input of the EBM, where a serial-to-parallel converter handles the input serial data. At a suitable clock phase a valid parallel datum is obtained and stored in the buffer at the address given by the CTRL. For each cell of the matrix a RAM block of 64 7-bit words is implemented in the FPGA; in total the buffer is 64 x 96 7-bit words.

For each BX 3 nonets are extracted. The extracted data are sent to the LUT over nine 7-bit buses; each bus carries one cell of the nonet. During a write cycle the CTRL supplies the current BCN, which is used as the address in the buffer; during a read cycle the CTRL supplies the address of the central cell of the nonet and the BCN of the event that must be analyzed. The address of the central cell is given by the CTRL, in a command word, to the multiplexer in the EBM, which sends the data on the buses to the LUT.
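The write and read cycles can be pictured with a small software model of the buffer: 64 slots, each holding a 12x8 matrix of 7-bit values. The sketch assumes that the slot is selected by the BCN modulo 64 and that cells are addressed by (row, column) inside the 12x8 matrix; the class and method names are hypothetical.

```python
ROWS, COLS, DEPTH = 8, 12, 64   # 12x8 matrix, 64 buffered events (64 x 96 words)


class EventBuffer:
    """Software model of the EBM storage: one 12x8 matrix per buffered BX."""

    def __init__(self):
        self.slots = [[[0] * COLS for _ in range(ROWS)] for _ in range(DEPTH)]

    def write(self, bcn, matrix):
        """Write cycle: the current BCN selects the slot (modulo the depth)."""
        self.slots[bcn % DEPTH] = [list(row) for row in matrix]

    def read_nonet(self, bcn, row, col):
        """Read cycle: return the 3x3 nonet centred on (row, col) of event bcn."""
        if not (1 <= row <= ROWS - 2 and 1 <= col <= COLS - 2):
            raise ValueError("the central cell must have eight neighbours")
        event = self.slots[bcn % DEPTH]
        return [event[r][col - 1:col + 2] for r in (row - 1, row, row + 1)]
```

In this model a write 64 BX later lands in the same slot, which is the overwriting that the OVERRUN check of the Process Controller guards against.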
Table 4. Mapping of the six FPGAs on the matrix, 8x6 format.

Col l bit  SLl  SCl  SRl  SLh  SCh  SRh  MLl  MCl  MRl  MLh  MCh  MRh  Col h bit
0          L1   U1   U2   U3   U4   U5   U6   U7   U8   R1   -    -    8
1          L2   C1   C7   C13  C19  C25  C31  C37  C43  R2   -    -    9
2          L3   C2   C8   C14  C20  C26  C32  C38  C44  R3   -    -    10
3          L4   C3   C9   C15  C21  C27  C33  C39  C45  R4   -    -    11
4          L5   C4   C10  C16  C22  C28  C34  C40  C46  R5   -    -    12
5          L6   C5   C11  C17  C23  C29  C35  C41  C47  R6   -    -    13
6          L7   C6   C12  C18  C24  C30  C36  C42  C48  R7   -    -    14
7          L8   L1   L2   L3   L4   L5   L6   L7   L8   R8   -    -    15

Table 4 (continued). Mapping of the six FPGAs on the matrix, 10x5 format.

Col l bit  SLl  SCl  SRl  SLh  SCh  SRh  MLl  MCl  MRl  MLh  MCh  MRh  Col h bit
0          L1   U1   U2   U3   U4   U5   U6   U7   U8   U9   U10  R1   8
1          L2   C1   C6   C11  C16  C21  C26  C31  C36  C41  C46  R2   9
2          L3   C2   C7   C12  C17  C22  C27  C32  C37  C42  C47  R3   10
3          L4   C3   C8   C13  C18  C23  C28  C33  C38  C43  C48  R4   11
4          L5   C4   C9   C14  C19  C24  C29  C34  C39  C44  C49  R5   12
5          L6   C5   C10  C15  C20  C25  C30  C35  C40  C45  C50  R6   13
6          L8   L1   L2   L3   L4   L5   L6   L7   L8   L9   L10  R8   14


Each FPGA can store 16 cells organized in two columns of 8 cells. Tab.4 shows the mapping of the 6 FPGAs on the matrix. The cells are part of a 12x8 matrix. Both the 8x6 and the 10x5 matrices can be embedded in the 12x8 matrix; in this way the two cases are handled with only a slightly larger matrix. This configuration was adopted to optimize the FPGA area of the overall EBM implementation.

The six FPGAs are divided into 3 Slaves and 3 Masters. The Slaves are called SL, SR and SC and are connected to the Masters ML, MR and MC by means of a 21-bit bus. The Masters are connected to the Slaves and also to each other with three 21-bit buses. Each Master has an output that connects it to the other two Masters (fig.8).

Three 7-bit buses leave each Master FPGA and carry the nonet to the LUT.

The first Master FPGA, ML, transfers the values of the left side of the nonet. The second, MC, transfers the central values of the nonet. The third, MR, transfers the values of the right side of the nonet (fig.8).

The multiplexer that extracts the nonet is distributed over the six FPGAs and works in the following steps:

1. From every column the values located on the row of the central cell, on the row above and on the row below are extracted.

2. A second multiplexer, with a select signal that can differ for each Master-Slave pair, selects the trio of data from the first or the second column in each FPGA.

3. A third multiplexer within the Master, with a select signal that differs for each Master-Slave pair, selects the trio of data from the second multiplexer or from the Slave FPGA connected through the 21-bit bus. At this point the data have been reduced to the 9 values that compose the nonet.

4. Finally, a rotation of the three trios may be needed. Each FPGA has a 21-bit output bus AB and two 21-bit input buses A and B (fig.8). To correct the positions of the trios, a further three-way multiplexer at the output of the Master performs the suitable rotation.

The multiplexer selects, with a signal common to the three Masters, the trio on A, producing an anticlockwise rotation of the data, or the trio on B, producing a clockwise rotation of the data. The bus that brings the nonet to the LUT is connected to the multiplexer output.
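Why a rotation may be needed can be seen from the column order in tab.4, where consecutive columns of the 12x8 matrix are assigned cyclically to the left (L), centre (C) and right (R) Master-Slave pairs. The sketch below assumes exactly this cyclic assignment, with columns numbered from 0; the function names are hypothetical and the mapping of the shifts onto the clockwise and anticlockwise selections is not modelled.

```python
GROUPS = ("L", "C", "R")   # Master-Slave pairs, in the column order of tab.4


def group_of(column):
    """Pair that stores the given column (columns numbered from 0)."""
    return GROUPS[column % 3]


def align_trios(raw, centre_col):
    """Reorder the trios delivered by ML, MC and MR into (left, centre, right).

    raw maps each pair name to the trio (one column of three cells) it selected.
    """
    return tuple(raw[group_of(centre_col + d)] for d in (-1, 0, 1))


def rotation_steps(centre_col):
    """Cyclic left shifts of (L, C, R) needed to obtain (left, centre, right)."""
    return (centre_col - 1) % 3
```

When the central column is stored by the C pair the three trios are already in place; in the other two cases a cyclic shift by one or two positions is needed.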

9. LUT

The LUT performs the processing that produces the x-y coordinates of the centre of gravity, the energy of the cluster and other information useful for the first level trigger (Energy, h, x, dx, ddx, DEST, BCN).

The result of this processing is a message as defined in [2]. Fig.9 shows the logical scheme with which the RAMs perform the processing, together with the processing performed on the nonet.

The LUT also evaluates the coordinates of the cells where the Bremsstrahlung correction has to be searched for.
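As an illustration of the kind of quantities involved, the sketch below computes the summed value and an energy-weighted centre of gravity of a nonet. It is a generic calculation, not the content of the look-up tables, which is defined by fig.9 and [2]: the calibration from ADC value to energy is ignored, the coordinates are expressed in cell units relative to the central cell, and the function name is hypothetical.

```python
def cluster_summary(nonet):
    """Return (sum, x, y) for a 3x3 nonet given as three rows of ADC values.

    x and y are the weighted mean offsets from the central cell, in cell
    units; x grows to the right and y grows downwards (an assumption).
    """
    total = sum(sum(row) for row in nonet)
    if total == 0:
        return 0, 0.0, 0.0
    offsets = (-1, 0, 1)
    x = sum(v * dx for row in nonet for v, dx in zip(row, offsets)) / total
    y = sum(v * dy for row, dy in zip(nonet, offsets) for v in row) / total
    return total, x, y
```

A look-up table implementation would presumably hold precomputed results of operations of this kind, so that they are available within the pipeline timing.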

10. Message Buffer

The MBUF temporarily stores the data coming from the LUT, relative to the energy of the selected cell. It waits for the data of the BRM processing to complete the message. To do this it uses a 256x75 bit dual-port RAM, assembled from IDT7014S12J chips and controlled by a Xilinx 4013E FPGA.

For each data group coming from the LUT, the CTRL supplies the event parameters: the address of the central cell, the BCN of the event and the flags describing the event.

The parameters are re-synchronized through a delay circuit to compensate for the latency of the LUT. A control circuit routes the data arriving from the LUT, distinguishing the data of a principal event from those of a right-side or left-side BRM. This is done by means of the flags describing the event, according to these rules:

1. Bit 0 (VALID) must be active, otherwise the data packet is not recognized.

2. If bit 3 (EVENT) is active, the data are relative to a cell with an event selected by the LMFU. All the elements (Energy, h, x, dx, ddx, DEST, BCN) are stored in the dual-port RAM at the address BCN. The address of the cell and the Dbrem parameter from the LUT are then transferred to the BRI, to evaluate the addresses of the cells with the two BRM photons.

3. If bit 3 (EVENT) is not active, the data coming from the LUT contain the energy of a cell with a BRM photon, which can be on the left side (bit 1, LEFT, active) or on the right side (bit 1 not active). The BRM request can be internal or external, according to the state of bit 2 (EXT).

4. When VALID is not active, the control circuit enables the storage of the energy values coming from the right and from the left boards, which are the results of previous processing. The interface that receives the data coming from the right or from the left side contains a FIFO that holds the data, waiting for the extraction performed by the control circuit. This FIFO also contains test circuits to recognize error conditions, which are signalled on the front panel by means of a LED and are readable from the VME bus.

[Figure: LUT block diagram]

11. Message Formatter & VME Interface

The transfer of the message to the following cards, the TFUs, is controlled by the MBUF and the MFVI. When the board is enabled and has a message to transfer, the MBUF sends the data to the MFVI, which builds the message for the TFU, formatted in four 20-bit words.

Up to 16 pretrigger cards can be connected in parallel on a local bus. At each BX one of the cards on the bus transfers its message to the TFU.

Fig.10 shows the output section of 4 Message Formatter blocks. A bus arbiter handles the bus access in such a way that no board is privileged. The arbiter handles, by means of a local bus on the J3 connector, the data transfer of 16 pretrigger boards. The 80-bit data are transferred to the TFU on a 20-bit data bus at a rate of 4 x 20 bit / BX.
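The transfer of an 80-bit message as four 20-bit words can be sketched as follows. The word order (most significant word first) is an assumption, the field layout inside the message is defined in [2] and is not modelled, and the function names are hypothetical.

```python
WORD_BITS = 20
WORDS_PER_MESSAGE = 4          # 4 x 20 bit = 80 bit, sent within one BX


def split_message(message):
    """Split an 80-bit message into four 20-bit words, most significant first."""
    if not 0 <= message < 1 << (WORD_BITS * WORDS_PER_MESSAGE):
        raise ValueError("message must fit in 80 bits")
    mask = (1 << WORD_BITS) - 1
    return [(message >> (WORD_BITS * i)) & mask
            for i in reversed(range(WORDS_PER_MESSAGE))]


def join_message(words):
    """Rebuild the 80-bit message from its four 20-bit words."""
    message = 0
    for word in words:
        message = (message << WORD_BITS) | word
    return message
```

Four words per BX correspond to one 20-bit word per fCLK period of 24 ns.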

12. Pretrigger Latency

In the pretrigger board the events are synchronized with the fCLK. Therefore the event synchronization has a time resolution of one fCLK period = 0.25 BX = 24 ns.

For an event with 3 clusters, the latency of the complete process in the pretrigger board, including the Bremsstrahlung recovery, follows the time scale shown in fig.11.

With 3 events of 3 clusters each, the FIFOs in the Process Controller (CTRL) queue the data. Because of the queue, the Process Controller outputs the BCN+SEL relative to the last input 4 BX after its arrival. The data are extracted sequentially and processed by the LUT. In this case the latency, including the Bremsstrahlung recovery, increases by 4 BX.

13. Conclusions

In this report the HERA-B ECAL pretrigger board has been described. It has been shown how all the operations in the board are pipelined. Finally, the latency of all the operations has been described.


1. "The Programmable Logic Data Book", Xilinx, 1996

2. "HERA-B FLT Message Transfer Module", Universität Mannheim, 1996

3. D.Ressing, "FCS Specifications", January 13, 1997

4. C.Baldanza et al., "A Cellular Automaton for Cluster Selection in the HERA-B Pretrigger Board", Technical note CEB NT-97/03, September 1997

Page edited by Bisi Fabio