Saurabh Gupta* Al Crouch** Jennifer Dworak* Daniel Engels* *Southern Methodist University, Dallas, Texas, USA **Amida, Austin, Texas, USA
Abstract—The number of on-chip embedded instruments required for testing, debugging, and monitoring integrated circuits (ICs) has increased dramatically. The IEEE 1687 (IJTAG) standard can allow efficient access to these embedded instruments by dynamically reconfiguring the scan chain using Segment Insertion Bits (SIBs). Unfortunately, instruments that require a large amount of test data and several accesses during test mode still result in long test times when the test data is shifted through the scan path serially. To provide high bandwidth access to the embedded instruments, we describe a SIB-based Parallel-IJTAG network architecture that can significantly reduce test times. The SIB programming access time overhead is equal to that of the corresponding serial network. Different ways of implementing Parallel-SIBs (P-SIBs) and the security implications of a Parallel-IJTAG network are explored. We show that despite the increased bandwidth of the scan path, the security provided by Locking SIBs can be maintained in a parallel network. For example, the expected amount of time for successful random brute force attacks on Locking Parallel SIBs of sufficient key sizes is over 12,000 years.
I. INTRODUCTION
The increasing complexity of System-on-Chips (SOCs) has resulted in an increasing number of embedded instruments being included in the chip for the purposes of test, debug, monitoring, and configuration of the chip and the board on which it is placed. These instruments may be on-chip sensors to monitor temperature, voltage or current, delay monitors, built-in-self-test (BIST) engines, assertion checkers, error detection circuits, event counters, or trace buffers, among others. The IEEE 1687 standard [1] can be viewed as an extension of the IEEE 1149.1 standard [2]. IEEE 1687 focuses on the access and control of on-chip embedded instruments for device test, debug, and configuration. Instead of using different instructions in the IEEE 1149.1 Instruction Register (IR) to access embedded instruments in different configurations, IEEE 1687 defines the use of scan data within the scan chain itself to dynamically reconfigure the chain and open access to new chain segments. Special scan cells that act as Network Instruction Bits (NIBs) are used to generate control signals and reconfigure the scan chain by placing an appropriate value in the update cell of the NIB during the UpdateDR (Update Data Register) cycle of the IEEE 1149.1 state machine. A NIB that can reconfigure the scan path is called a Segment Insertion Bit (SIB).
This distributed control approach allows various combinations of instruments to be accessed concurrently without requiring new instructions to be encoded for the instruction register or without requiring parking in the Run-Test-Idle state while any given instrument is operating. As such, it also enables easier design reuse when the combinations of instruments included in a new version of a design change, because the instruction decoding for the instruction register does not necessarily need to be redesigned to handle the changes.
As the number of instruments that must be accessed concurrently in an IEEE 1687 network increases, the amount of data that must be shifted through the chain on every Capture-Shift-Update cycle of the JTAG state machine increases as well. This can dramatically increase test time in a traditional IEEE 1687 scan network, especially if the instruments have long Test Data Registers (TDRs), because the data is shifted through the network serially from the Test Data In (TDI) port to the Test Data Out (TDO) port.
To help address this issue, in this paper, we explore a Parallel-IJTAG architecture that is reconfigured through the use of Parallel-SIBs (P-SIBs). Some of the major contributions of this work are the following:
- We explore the design of a Parallel-IJTAG network using SIBs that were previously designed for serial IEEE 1687 networks. We show that the SIB programming overhead of this Parallel-IJTAG network is equal to that of its corresponding serial network. We also calculate the cost of instrument access times (number of clock cycles) in both the parallel and serial designs to show access time reductions in a parallel network.
- We design a new n-bit Parallel-SIB for an n-bit Parallel-IJTAG network that requires a reduced number of Update registers as compared to using regular SIBs. We also show how this n-bit Parallel-SIB design can be modified to design Addressable-PSIBs and show an example network architecture where these Addressable-PSIBs can be used.
- We explore some of the security implications of Parallel-IJTAG networks. We show how the Locking-SIBs (LSIBs) that were introduced for serial networks can also be used in Parallel-IJTAG networks and investigate the cost and effectiveness of LSIBs for Parallel-IJTAG.
- We introduce the design of a new n-bit Locking-Parallel-SIB (L-PSIB) for an n-bit Parallel-IJTAG network that requires a reduced number of Update registers. We show that such L-PSIBs with sufficient key sizes may require over 12,000 years to access on average when a brute force attack method is employed.
Delivery of parallel test data for the IEEE Standard 1687 structures using “remotely controlled scan mux architecture (RCSMA)” was discussed in [3] in the context of 3D stacked ICs. In addition, the IEEE 1687 standard itself presents one SIB-based network that allows parallel data transport: the Broadcast Mode. Broadcast mode allows one serial scan chain to feed multiple identical scan chains in parallel, but only one serial scan chain is returned. Other alternatives are being discussed in the P1687.1 Working Group under non-JTAG access to 1687 networks. However, to the best of our knowledge, ours is the first paper to describe the design of multiple types of Parallel-SIBs and to explore some of the various IJTAG networks that can be created with them, along with some of their security implications.
In the remainder of this paper, Section II presents an overview of prior work. Section III presents some of the relevant architectural features of an IEEE 1687 test network and previous work in LSIB-based security. Section IV discusses parallel IJTAG architectures and explores the design of different Parallel-SIBs. Section V discusses a hybrid IJTAG network. Section VI investigates securing a parallel IJTAG network. Finally, Section VII concludes the paper.
II. PRIOR WORK
Previous papers have discussed methods to calculate the time required for instrument access and methods to reduce overall instrument access times in an IEEE 1687 network. Algorithms for automated design and optimization of 1687 networks were discussed in [4]. In [5], a method to calculate the overall access time (OAT) for a given IEEE 1687 network was presented. OAT calculations were performed on flat and hierarchical network architectures using two access schedules—a sequential schedule and a concurrent schedule. A methodology to evaluate the test cost of reconfigurable networks, according to the functional fault model introduced in [6], was presented in [7]. The problem of test scheduling to reduce test access times while satisfying resource and power constraints was also discussed in [8], and hybrid test schedules (a combination of sequential and concurrent schedules) were explored. The hybrid test-scheduling method to reduce OAT while satisfying resource and power constraints that was proposed in [8] was further optimized in [9]. In [10], an efficient scalable methodology for the delivery and localization of interrupts was presented based on hierarchical, multi-mode IJTAG networks.
The authors of [3] investigated the potential use of parallel test access in the context of 3D stacked ICs. As part of this work they discussed increasing the number of scan paths in an IJTAG network that used the “Remote Controlled Scan Mux Architecture” (RCSMA) for scan path reconfiguration. While RCSMA has advantages, it also requires a pass through the IR (Instruction Register) scan to reset the value of the Scan Control Unit each time before the scan chain can be reconfigured. In this paper we explore the design of parallel IJTAG networks based on Parallel-SIBs instead of RCSMA.
Historically, attackers trying to gain access to internal data and functionality of the chips and the board have used the JTAG port. For example, hackers have previously used the JTAG port to disable the Digital Rights Management (DRM) policy on an Xbox [11]. In other cases an attacker may attempt to gain access to private data, debug instruments, test circuitry, state information, or configuration hardware by serially scanning in private or undocumented instructions or data into the instruction register and scan chains.
As a result of successful scan chain-based attacks, methods of securing the scan chain and the JTAG ports have been developed. Some methods involve limiting scan chain access by using challenge response pairs [12], [13], [14]. As an example, in [13], Clark introduced a JTAG design that uses the SHA256 hash algorithm and a true random number generator to create challenge response pairs that limit access to chip internals via the scan chain. In [15], a security module and test control module are used to protect memory content. When the security module is in restricted mode, data encryption and decryption of the memory contents is disabled, the TDO pin is forced to a logic 1, and only limited access to the memory is permitted.
Scan chain attacks have also been successful in compromising cryptographic algorithm implementations. Yang et al. [16] show how scan chains can be used to mount side channel attacks on a hardware implementation of the Data Encryption Standard (DES). A scan chain-based side channel attack on a stream cipher was explored by Mukhopadhyay et al. [17]. Similarly, Kamal and Youssef [18] describe the use of scan chain-based side channel attacks to retrieve the secret keys of the NTRUEncrypt cryptosystem. Other scan chain-based attacks have also been shown to extract secret keys [19].
To avoid scan chain-based attacks, it has been suggested that the scan chain be divided into segments whose access order may be reconfigured [20], [21], [22]. Reconfiguration may occur automatically when, for example, the first k bits that are shifted into the scan chain do not correspond to the first k bits of the key. Additional security mechanisms have been proposed to further secure an IEEE 1687 scan chain network against attacks. In [23], Dworak et al. introduced Locking Segment Insertion Bits (LSIBs) to secure access to segments in IEEE 1687. To open an LSIB, a preselected key must be present in the scan chain. This key is then used to enable the update signal for the LSIB. Thus, LSIBs can provide fine grained protection and secure access to different instruments using different keys or sets of keys. Further optimizations to LSIB-based security in an IEEE 1687 network were discussed in [24], [25], [26]. Some power analysis attacks on the LSIB key logic circuits and mitigation methods were explored in [27]. Liu and Agrawal [28] introduced a key generation mechanism that uses a linear feedback shift register (LFSR) to dynamically generate keys. A defined number of scan flip flops are used to create an LFSR that is used for key generation. In [29], Baranowski et al. propose a challenge-response protocol that uses secure segment insertion bits to reconfigure the length of the scan chain. All of these secret key methods make it harder for an attacker to use brute force attacks to gain access to hidden or secure segments of IEEE 1687 networks.
III. OVERVIEW OF IJTAG
Fig. 1 shows an overview of the JTAG architecture and the interfacing of the IEEE 1687 IJTAG network to JTAG. Note that additional test data registers (TDRs) that don’t implement IJTAG may also be present and selected through placing an appropriate instruction in the instruction register (IR).

An IEEE 1687 test network provides a scalable plug-and-play interface for interfacing and accessing on-chip embedded instruments. The dynamic reconfiguration of the IEEE 1687 network through distributed control has significant advantages over a pure instruction-based approach. Design modifications to the test network, including the addition of new embedded instruments, are easier because the decoding of the IEEE 1149.1 instruction register does not need to change. It is also possible to access various combinations of hundreds of instruments without requiring a new instruction for each combination.

The distributed control is obtained using special data-side scan path cells called Network Instruction Bits (NIBs), shown in Fig. 2. Control signals such as “local reset” and “deny” can be generated by placing correct values in the update cell of a NIB. For example, a local reset signal could be used to reset an individual instrument’s TDR. In contrast, conducting a reset of some portion of the JTAG architecture is generally accomplished by passing the state machine through the Test-Logic-Reset (TLR) state—which has the bad side effect of resetting all the JTAG chips on a board attached to the same active daisy chain. A “deny” signal could be used to prevent capture and update operations on a specific TDR to help with the debugging procedures. Implementing similar functionality using the IEEE 1149.1 JTAG would require a separate instruction for each instrument TDR on the scan path.

A Segment Insertion Bit (SIB) as shown in Fig. 3 is a special type of NIB that enables access to the embedded instruments by dynamically reconfiguring the scan path and adding (or removing) a scan segment to (or from) the active TDI-TDO scan path. The SIB shown in Fig. 3 can add (or remove) the scan network connected between TDI2 and TDO2 to the active scan path. When the SIB is closed, the active scan path consists of only the SIB cell between TDI and TDO. Placing an appropriate value in the update cell of the SIB asserts the Select* signal. This Select* signal causes the first multiplexer to select the input connected to the TDO2 signal, which is then fed to the shift cell. It also enables the shifting of the scan cells in the “Extra scan segment” (control not shown). Together, this adds the “Extra scan segment” between TDI2 and TDO2 to the scan path immediately before the SIB. Thus, the SIB acts as a one bit bypass register when it’s closed.
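The open/close behavior described above can be illustrated with a small behavioral sketch. This is a simplified software model, not the circuit of Fig. 3: the class name `Sib`, the helper `dr_scan`, and the convention that an update-cell value of 1 asserts Select* are assumptions made for illustration, and capture/update of the instrument TDR itself is not modeled.

```python
class Sib:
    """Behavioral sketch of one SIB guarding an extra scan segment (hypothetical model)."""

    def __init__(self, segment_length):
        self.shift_cell = 0
        self.update_cell = 0                  # assumed: 1 asserts Select* (SIB open)
        self.segment = [0] * segment_length   # cells between TDI2 and TDO2

    def active_length(self):
        # Closed: only the SIB cell is on the TDI-TDO path (one-bit bypass).
        # Open: the extra segment is inserted immediately before the SIB cell.
        return 1 + (len(self.segment) if self.update_cell else 0)

    def shift(self, tdi):
        """Shift one bit in at TDI and return the bit leaving at TDO."""
        tdo = self.shift_cell
        if self.update_cell and self.segment:
            self.shift_cell = self.segment[-1]          # mux selects TDO2
            self.segment = [tdi] + self.segment[:-1]    # segment shifts by one
        else:
            self.shift_cell = tdi                       # mux selects TDI directly
        return tdo

    def update(self):
        """UpdateDR: copy the shift cell into the update cell."""
        self.update_cell = self.shift_cell


def dr_scan(sib, bits):
    """Shift all bits (first bit ends up closest to TDO), then apply UpdateDR."""
    out = [sib.shift(b) for b in bits]
    sib.update()
    return out
```

For example, `dr_scan(Sib(4), [1])` opens the SIB with a one-bit scan, after which the active path is five bits long and the first bit of the next scan is the one that lands in the SIB cell.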


As mentioned earlier, LSIBs provide secure access to the embedded instruments on an IJTAG network. One possible implementation of an LSIB is shown in Fig. 4. An LSIB is a SIB that has been modified to prevent the SIB from “opening” or “closing” unless the conditions needed to “Unlock” the LSIB have been met. In this figure, an AND gate has been added so that an update operation (Update Data Register cycle of the TAP controller) on the update cell can only be performed if the Unlock signal is set to 1. In general, the value of the Unlock signal is determined by the values of other signals, such as values present in other scan cells.
Fig. 5 shows a 10-bit scan path segment with separate shift and update cells. An LSIB is inserted in this scan chain, and six key bits are present in the update cells (shown in dark black) of some of the other scan cells. The ScanMux shows the insertion point for the additional scan segment TDI2-TDO2. Note that even though the insertion point in this example is between shift bit-6 and bit-7 (i.e. next to the LSIB), it could legally be inserted anywhere in the chain. The unlock signal shown in Fig. 4 is generated if the correct key value is placed in all the key bit locations shown in Fig. 5. Thus, if an LSIB is used instead of a regular SIB, in this implementation, then accessing an instrument TDR in the “Extra Scan Segment” requires first shifting the correct “LSIB key” value into the chain and performing an UpdateDR to generate the Unlock signal. This is followed by shifting values into the chain again and performing another UpdateDR to place the correct value in the update cell of the LSIB itself. Thus two Data-Register-Scans (DR-Scans) are required to open the LSIB, and the extra scan segment can be accessed during the third DR-Scan. Other variations of this approach are also possible. For example, [27] showed that allowing different LSIBs to share the same physical key bit scan cells may help reduce susceptibility to power analysis attacks. It is also possible to place the key bits in the shift cells of other scan cells of the chain. Placing key bits in shift cells decreases the amount of time required for an authorized user to access the instrument, but it allows an attacker who does not know the key to make opening attempts with new key guesses more quickly and may also make power analysis attacks easier. To some extent, this reduction in security can be counteracted by adding additional key bits and reusing them for other LSIBs. In general, LSIBs provide a scalable method of securing access to the internal embedded instruments in an IJTAG network, where overhead (access time and key bits) can be traded for increased security.
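The two-scan opening sequence for an LSIB whose key bits sit in update cells can be sketched as follows. This is a minimal model under stated assumptions: the function `update_dr`, the dictionary layout, and the six-bit key value are hypothetical, and the Unlock signal is assumed to be evaluated from the key values held in the update cells before the current UpdateDR takes effect.

```python
def update_dr(state, shifted_key, shifted_lsib_bit, key):
    """Apply one UpdateDR to a chain containing key-bit cells and one LSIB (sketch).

    state["key_cells"] holds the update cells of the key-bit scan cells;
    state["lsib_open"] holds the LSIB update cell.
    """
    # Key logic: Unlock is 1 only if every key update cell already holds its key bit.
    unlock = state["key_cells"] == list(key)
    if unlock:
        # The AND gate lets the update pulse reach the LSIB update cell.
        state["lsib_open"] = shifted_lsib_bit
    # Ordinary scan cells update unconditionally.
    state["key_cells"] = list(shifted_key)


KEY = (1, 0, 1, 1, 0, 1)  # hypothetical six-bit key, for illustration only
state = {"key_cells": [0] * 6, "lsib_open": 0}

update_dr(state, KEY, 1, KEY)   # DR-Scan 1: key lands in update cells, LSIB stays closed
first_scan_open = state["lsib_open"]
update_dr(state, KEY, 1, KEY)   # DR-Scan 2: Unlock is asserted, LSIB opens
second_scan_open = state["lsib_open"]
```

In this model the extra scan segment only becomes reachable on the third DR-Scan, which matches the sequence described in the text.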
IV. A PARALLEL IJTAG NETWORK
As discussed in Section II, several methods of reducing SIB overhead and different instrument test scheduling algorithms have previously been proposed. However, even if the SIB overhead is reduced, the IJTAG networks discussed until now are limited by their serial scan designs. Thus, the test time (number of clock cycles) depends on the size of the instruments’ TDRs and the number of times the instruments are accessed. In this paper, to increase the bandwidth of the IJTAG network, we discuss multiple ways of designing a Parallel-SIB based distributed control network.
Fig. 6 shows a conceptual overview of the JTAG circuitry and the interfacing of a Parallel-IJTAG network to JTAG. The required Capture, Shift, and Update control signals to the instruction register and to the test data registers, including the Parallel-IJTAG network, are supplied from the TAP controller. The n-bit wide Parallel-IJTAG network is fed by TDI from the TAP and an (n − 1)-bit wide Parallel-TDI (P-TDI) bus. The (n − 1)-bit P-TDI and (n − 1)-bit P-TDO bus can be multiplexed with general purpose input/output (GPIO) pins on the chip such that during test mode the GPIO pins can be used for P-TDI and P-TDO. If enough GPIO pins cannot be multiplexed during the test mode, then an alternative may be to supply the parallel data through SerDes input/output ports, if such ports are provided.

In the subsequent subsection we begin with the obvious approach of replicating the SIBs in each of the parallel scan paths. This is followed by multiple design optimizations.
A. SIBs in Parallel-IJTAG network – I
The n-bit Parallel-IJTAG network is similar to a serial IJTAG network with multiple scan paths. In the simplest case, a k-bit TDR is divided into n registers (TDR segments) each of size (k/n) bits. Each SIB in the serial IJTAG network is also replaced with n SIBs distributed over the parallel scan paths. If the length (k) of the TDR and the width (n) of the parallel network is such that (k/n) is not an integer then the n-th TDR segment is padded with additional shift cells such that its length is equal to the other (n −1) TDR segments.
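The segment sizing rule can be written down directly. The helper below is an illustrative sketch (the name `split_tdr` is not from the paper); it returns the common segment length and the total number of padding shift cells needed when k is not a multiple of n.

```python
def split_tdr(k, n):
    """Return (segment_length, padding_cells) for a k-bit TDR on n parallel scan paths."""
    if k <= 0 or n <= 0:
        raise ValueError("k and n must be positive")
    segment_length = -(-k // n)            # integer ceiling of k / n
    padding_cells = segment_length * n - k  # extra shift cells so all paths are equal
    return segment_length, padding_cells
```

For a 32-bit TDR on four paths this gives 8-bit segments with no padding, while a 30-bit TDR on four paths gives 8-bit segments with two padding cells.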


An n-bit Parallel-IJTAG network is shown in Fig. 7, and the corresponding serial IJTAG network with three SIBs is shown in Fig. 8. For the sake of simplicity, instead of showing TDI from the TAP and the additional (n −1) P-TDIs for the parallel network separately, we labelled the entire n-bit input port as P-TDI. (This convention is followed in the rest of this paper.) These example network designs are based on the flat architecture, where each SIB is in the active scan path regardless of its open or closed state. Thus, each additional SIB adds at least one bit to the shift time on DR-Scan. This extra cycle due to the presence of the SIB is called the “SIB programming overhead” [4].
In Fig. 7, TDRa of length La bits from the serial IJTAG network is divided into a set of four segments – TDR1, TDR2, TDR3, and TDR4. These four TDR segments are in separate scan paths connected to four different SIBs – SIB1, SIB2, SIB3, and SIB4. Thus SIBa from the serial network is replaced by four single-bit SIBs in the 4-bit parallel network. Similarly, other SIBs (SIBb and SIBc in this instance) and their respective TDRs (TDRb and TDRc) from the serial network can be implemented in parallel.
Opening (or closing) SIBs requires scanning in their control bits through the scan path. Also, the JTAG state machine needs to be cycled in the sequence of Exit1-DR, Update-DR, Run-Test/Idle, Capture-DR, and Shift-DR states to perform update and capture operations on the TDRs between two shift operations. (Note that many software tools require going through the Run-Test/Idle state.) Thus, the test time depends on the number of SIBs in the scan path (i.e. the SIB programming overhead), the TAP controller cycles, the instrument’s TDR length (L), the number of times (A) the instrument needs to be accessed during test, and the concurrency of instrument access. To calculate the test time improvements in the Parallel-IJTAG network compared to the serial design, we calculate the SIB programming overhead and the number of clock cycles required to perform read/write operations on the instrument TDRs in the example flat network designs shown in Fig. 7 and Fig. 8. An example of this calculation is shown in Table I. In this example, we assume the length of TDRa is 32 bits, and the Parallel-IJTAG network is assumed to have four parallel scan paths. We also assume that the instruments are accessed sequentially, as opposed to concurrently, in the order that the SIBs associated with these instruments appear in the scan path.
TABLE I CLOCK CYCLES REQUIRED TO ACCESS 32-BIT TDRa 5 TIMES IN A SERIAL AND A 4-BIT PARALLEL NETWORK – FLAT ARCHITECTURE
Operation | Serial IJTAG # clock cycles | 4-bit Parallel IJTAG # clock cycles |
Open SIBa | 3(SIBs) + 5(TAP) | 3(SIBs) + 5(TAP) |
Scan out previous output vector + Scan in new input vector (TDRa) | (3 + 32) + 5 | (3 + 8) + 5 |
Repeat previous operation 4 more times | [(3 + 32) + 5]*4 | [(3 + 8) + 5]*4 |
Scan out final output vector | (3 + 32) + 5 | (3 + 8) + 5 |
Total | ∑=248 | ∑=104 |
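The Table I accounting for a single instrument can be reproduced with a few lines of arithmetic. The function below is a sketch; its name and parameter names are chosen here for illustration, with the defaults of three SIBs and five TAP cycles taken from the example network.

```python
from math import ceil


def single_instrument_cycles(tdr_bits, accesses, n_paths, n_sibs=3, tap_cycles=5):
    """Clock cycles to open a SIB and access one TDR `accesses` times (flat network)."""
    open_sib = n_sibs + tap_cycles                 # Row 1: program the SIBs
    segment = ceil(tdr_bits / n_paths)             # bits per scan path for this TDR
    per_scan = (n_sibs + segment) + tap_cycles     # Rows 2-4: one DR-Scan with TDR open
    # `accesses` scans shift in new inputs; one extra scan reads the final output.
    return open_sib + (accesses + 1) * per_scan


serial_total = single_instrument_cycles(32, 5, n_paths=1)    # 248
parallel_total = single_instrument_cycles(32, 5, n_paths=4)  # 104
```

The SIB and TAP terms are identical in both calls; only the TDR shift term shrinks with the number of parallel scan paths.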
Thus, SIBa in the serial design and in the parallel design (SIB1, SIB2, SIB3, and SIB4) is opened in the first scan operation by shifting in three bits in each scan path. Five more clock cycles are needed to apply UpdateDR and return to ShiftDR through the JTAG state machine so that a new scan shift cycle can commence. (This is shown in Row 1 in Table I). Once SIBa is opened, the active serial scan path will include the 32 bits of TDRa, and the four active scan paths of the parallel network will include the 8 bits of TDR1, TDR2, TDR3, and TDR4. Thus, the next scan operation in the serial network will shift in three bits for the SIBs and 32 bits for TDRa. On the other hand, the parallel network will shift in three bits for the SIBs and 8 bits for the TDRs in each of the four scan paths. During each of these scan operations, the previous output vector from the instrument can be scanned out while scanning in a new input vector. Hence, to access an instrument 5 times this scan operation will be performed 5 times in total (Rows 2 and 3). Finally, an additional scan operation would be required to scan out the final output vector, close SIBa, and return to ShiftDR (Row 4). The number of clock cycles required to perform these operations in both the serial and Parallel-IJTAG networks is shown in Row 5. Accessing the other instruments sequentially would require this process to be repeated again for each instrument. Concurrent access would involve fewer iterations of this procedure, but the shift cycles would be longer to accommodate multiple TDRs being present on the chain simultaneously. As shown in Table I, the SIB programming overhead for the Parallel-IJTAG and serial network remains the same. However, the instrument access time (number of clock cycles) decreases in the Parallel-IJTAG network depending on the number of parallel scan paths available. Table II shows the Overall Access Times (OAT) for three instruments in both parallel and serial networks based on the flat architecture using both concurrent (C) and sequential (S) test schedules and different numbers of accesses (A) and parallel scan paths (n). The calculation for the OAT values is based on the discussion in [5]. We can see that the OAT for instruments in a parallel network is drastically lower than in the serial network, for both concurrent and sequential schedules. The total number of single bit SIBs required in an n-bit simple Parallel-IJTAG network increases by a factor of n.
TABLE II OVERALL ACCESS TIMES (NO. OF CLOCK CYCLES) – FLAT ARCHITECTURE, THREE INSTRUMENTS. C=CONCURRENT, AND S=SEQUENTIAL INSTRUMENT ACCESS
TDR length La, Lb, Lc | TDR accesses Aa, Ab, Ac | Parallel n=4, C | Parallel n=4, S | Parallel n=8, C | Parallel n=8, S | Parallel n=16, C | Parallel n=16, S | Serial n=1, C | Serial n=1, S |
32, 48, 64 | 5, 4, 6 | 284 | 372 | 174 | 262 | 119 | 207 | 944 | 1032 |
128, 64, 256 | 4, 3, 2 | 464 | 520 | 256 | 312 | 152 | 208 | 1712 | 1768 |
32, 16, 64 | 6, 4, 8 | 300 | 396 | 190 | 286 | 135 | 231 | 960 | 1056 |
128, 256, 512 | 2, 2, 1 | 576 | 616 | 304 | 344 | 168 | 208 | 2208 | 2248 |
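The Table II entries can be reproduced with a simple cost model. The sketch below is inferred from the tabulated values and the Table I accounting rather than taken from [5], so its scheduling assumptions should be read as assumptions: in the sequential schedule, the scan that reads an instrument's final output is assumed to also open the next SIB, and in the concurrent schedule all three SIBs are opened together and each TDR stays on the path for its A accesses plus one read-out scan. Function and parameter names are illustrative.

```python
from math import ceil


def _scan(n_sibs, tdr_bits, tap_cycles):
    """Cycles for one DR-Scan: SIB bits plus active TDR bits plus TAP transitions."""
    return n_sibs + tdr_bits + tap_cycles


def oat_sequential(lengths, accesses, n_paths, n_sibs=3, tap_cycles=5):
    """OAT when instruments are accessed one after another (assumed model)."""
    total = _scan(n_sibs, 0, tap_cycles)  # initial scan opens the first SIB
    for length, count in zip(lengths, accesses):
        segment = ceil(length / n_paths)
        total += (count + 1) * _scan(n_sibs, segment, tap_cycles)
    return total


def oat_concurrent(lengths, accesses, n_paths, n_sibs=3, tap_cycles=5):
    """OAT when all instruments are opened together (assumed model)."""
    total = _scan(n_sibs, 0, tap_cycles)  # initial scan opens every SIB
    for scan in range(1, max(accesses) + 2):
        # A TDR is on the active path until its final output has been read.
        active_bits = sum(
            ceil(length / n_paths)
            for length, count in zip(lengths, accesses)
            if count + 1 >= scan
        )
        total += _scan(n_sibs, active_bits, tap_cycles)
    return total


row1_parallel4 = (
    oat_concurrent((32, 48, 64), (5, 4, 6), 4),
    oat_sequential((32, 48, 64), (5, 4, 6), 4),
)
row1_serial = (
    oat_concurrent((32, 48, 64), (5, 4, 6), 1),
    oat_sequential((32, 48, 64), (5, 4, 6), 1),
)
```

Under these assumptions the first row of Table II evaluates to 284 and 372 cycles for n=4 and to 944 and 1032 cycles for the serial network.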
B. Parallel-SIBs (P-SIBs) in Parallel-IJTAG network – II
In the previous subsection the n-bit Parallel-IJTAG network design was obtained by simply adding n serial scan paths in parallel to the IJTAG network. The SIB programming overhead and the number of clock cycles required to shift SIB data remained the same as compared to the corresponding serial IJTAG network design. However, the area overhead contributed by the SIBs increased by a factor of n. The n-bit Parallel-SIB shown in Fig. 9 reduces this SIB area overhead by using only one update cell and n shift cells.
To open this Parallel-SIB, first an appropriate value is scanned in the shift-1 cell and then an UpdateEn signal is applied. The update-1 cell generates the Select* signal which is applied to the muxes of the n shift cells in the parallel scan paths. The TDRs connected to these shift cells are also enabled and now become part of the active parallel scan paths.
The SIB programming overhead of the parallel network designed using this Parallel-SIB design is the same as that for the previous parallel design or existing serial IJTAG network designs. Thus, the bandwidth improvements are the same as in the previous design. However, this design requires (n −1) fewer update cells for each Parallel-SIB compared to the previous parallel network.
C. Parallel-IJTAG network using Addressable Parallel-SIBs
In the P-SIB design discussed in the previous sub-section, the bit from the shift-1 cell is fed to the update-1 cell directly to generate the Select* for all of the parallel SIBs. Another way to design this P-SIB is such that the bits from some or all of the shift cells are AND’ed together in complemented or uncomplemented form before feeding the result to the update-1 cell. In this way, the Select* signal will be generated only if all of the shift cells in the Parallel-SIB contain the correct values during the update operation. Thus, the parallel SIB is now addressable. The P-SIB shown in Fig. 10 is an addressable PSIB that opens when the bits 10…1 are shifted into the shift cells of the PSIB followed by an update operation.

One of the uses of addressable PSIBs involves the design of broadcast networks such as that shown in Fig. 11. When the address of the target PSIB is broadcast on all three parallel paths, the PSIB with the correct address opens and allows access to the instrument’s TDR. Contention on PTDO can be resolved using tri-state buffers or multiplexers. This network design can be useful in cases where the number of pins available for PTDI is limited and no two instruments need to be accessed concurrently. It also reduces the SIB programming overhead during each instrument access because the instruments are not connected to the scan path serially.
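The address decoding and broadcast selection can be sketched in a few lines. The names `psib_select` and `broadcast_open`, the instrument labels, and the three-bit addresses are hypothetical and chosen only to match the three parallel paths mentioned above; the actual address in Fig. 10 is not reproduced here.

```python
def psib_select(shift_bits, address):
    """Select* for an addressable PSIB: AND of shift cells, complemented where address bit is 0."""
    if len(shift_bits) != len(address):
        raise ValueError("address width must match the number of shift cells")
    return all(s == a for s, a in zip(shift_bits, address))


def broadcast_open(addresses, broadcast_bits):
    """Return the names of the PSIBs whose address matches the broadcast bits."""
    return [name for name, addr in addresses.items()
            if psib_select(broadcast_bits, addr)]


# Hypothetical three-instrument broadcast network with unique 3-bit addresses.
ADDRESSES = {"PSIB_A": (1, 0, 1), "PSIB_B": (0, 1, 1), "PSIB_C": (1, 1, 0)}
opened = broadcast_open(ADDRESSES, (0, 1, 1))
```

Giving every PSIB a distinct address means at most one of them opens per broadcast, which is what allows the PTDO outputs to be shared.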

V. HYBRID IJTAG NETWORK
The test architecture shown in Fig. 6 includes only the Parallel-IJTAG network. Even though the parallel design provides high bandwidth access to embedded instruments, it requires more SIBs than the serial network and adds to the complexity of routing control signals to the additional SIBs.
Fig. 12 shows a conceptual view of the interfacing of both a serial and Parallel-IJTAG network to JTAG. This architecture can provide the designer with opportunities to make tradeoffs between the instrument access times and the area of the test network. Instruments that require high bandwidth or instruments that need to be accessed several times during test can be included in the parallel network. Instruments with smaller TDRs or those that do not require frequent access during test can be placed in the serial network. Additional parallel and/or serial TDRs could also be included if desired.

VI. SECURING PARALLEL-IJTAG NETWORKS
Securing access to the embedded instruments in a serial IJTAG network using Locking-SIBs (LSIBs) was shown to be effective against brute force attacks in [23]. Next, we explore the use of LSIBs in a Parallel-IJTAG network.
A. LSIBs in Parallel-IJTAG network
As shown in Fig. 5, in a serial IJTAG network the key bits required to unlock the LSIB are present in the serial scan path. These key bits may be present in the update cells (or shift cells) of other SIBs or scan cells in the scan path. In Fig. 5, the output from these key bit holding cells is fed to a tree of AND and NOT gates (referred to as Key Logic in this paper) which generates the LSIB Unlock signal. (Note that other key logic implementations, including key logic that allows programmable or changeable keys and key logic shared among multiple LSIBs, are possible and may be preferable.) In an n-bit Parallel-IJTAG network these j key bits are spread over j scan cells in n parallel scan paths. Fig. 13 shows an n-bit Parallel-IJTAG network with Locking-SIBs.

The n-bit Parallel-IJTAG network shown in Fig. 13 is similar to the one in Fig. 7, but SIBa is replaced with LSIBa. Here LSIBa consists of n single-bit LSIBs (LSIB1, LSIB2, LSIB3, to LSIBn), with one LSIB in each of the n parallel scan paths providing secure access to its own subcomponent of TDRa. The unlock signal for the LSIBs in the parallel scan path is generated by the key logic circuit when the correct key bit values are placed in the update cells of the key-bit holding registers. (In this case, j update cells from SIBb and SIBc and previous cells on the scan path are used as key bits.)
B. Locking-Parallel-SIBs (L-PSIBs)
In the previous subsection, an n-bit LSIB with a j-bit key in a Parallel-IJTAG network was implemented by replacing the single-bit SIBs in the parallel scan paths by single-bit LSIBs. As in the case of the Parallel-SIBs described in Section IV-B, we can design an n-bit Locking-PSIB that consists of only one update cell and n shift cells. Fig. 14 shows an n-bit Locking-PSIB that can be used in an n-bit Parallel-IJTAG network.
If a Parallel-IJTAG network utilizes this Locking-PSIB and the Parallel-SIBs from Section IV-B, then the designer has decisions to make regarding the placement of key bits. To minimize area overhead and test time, the key bits may be placed in the parallel shift cells. This allows the L-PSIB to be opened in one DR-Scan because the key bits can enable the Unlock signal as soon as the right values are shifted in. The ability to shift the key in parallel may also reduce test time. However, such a design may be more susceptible to power analysis attacks. Alternatively, key bits may be placed in update cells. Because the L-PSIB and PSIB designs only contain a single update cell, additional update cells may need to be added to the outputs of shift cells designated as key bits to allow shifting the key bits in parallel. In addition, the test time will increase as one DR-Scan is needed to place the key bits in the update cells and a second DR-Scan is needed to open the L-PSIB.
Fig. 14. Locking-SIB using single update cell in an n-bit Parallel-IJTAG network
C. Cost of a Guess
The previous two subsections showed ways to provide secure access to the embedded instruments in a Parallel-IJTAG network. We now discuss the cost of guessing a j-bit key in an n-bit-wide Parallel-IJTAG network of length m-bits. Compared to the serial network in which each key bit is scanned in sequentially, in an n-bit Parallel-IJTAG network, n key bits can be shifted in simultaneously. Due to this, it might appear that the LSIB security in a parallel network may be more susceptible to brute force attacks that attempt to access the network by trying random keys. However, a Parallel-IJTAG network using the standard JTAG controller still needs to go through the JTAG FSM’s states for each key guess. In this subsection, we show how this affects the cost of applying guess keys through the Parallel-IJTAG network.
As discussed earlier, the SIB programming overhead for the Parallel-IJTAG network is the same as in the serial network. Thus, two networks designed with the same SIB hierarchy have equal length when all the SIBs are closed. In a serial network, when a SIB is opened to access a k-bit TDR, the length of the active scan path increases by k bits. In the Parallel-IJTAG network, when an n-bit SIB is opened to access a k-bit TDR, the length of each active parallel scan path increases by k/n bits (assuming k is evenly divisible by n). Thus, the length of an n-bit parallel network when all of the SIBs are open depends on the lengths k of the TDRs and the number n of parallel scan paths.
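The length relationship above can be made concrete with a short sketch. The function name, its parameters, and the example sizes are hypothetical and chosen only for illustration; setting n = 1 models the serial network.

```python
from typing import Sequence


def active_path_length(closed_len: int, open_tdr_bits: Sequence[int], n: int = 1) -> int:
    """Return the length, in bits, of each active scan path.

    closed_len    -- path length when every SIB is closed (identical for the
                     serial and the parallel network with the same hierarchy)
    open_tdr_bits -- sizes k of the TDRs behind the SIBs that are currently open
    n             -- number of parallel scan paths (n = 1 is the serial case)
    """
    total = closed_len
    for k in open_tdr_bits:
        if k % n != 0:
            # The text assumes k is evenly divisible by n; keep that assumption explicit.
            raise ValueError(f"TDR length {k} is not divisible by n = {n}")
        total += k // n  # each parallel path grows by k/n bits
    return total


# Illustrative numbers only: a 16-bit closed network and two open TDRs.
serial_len = active_path_length(16, [64, 128], n=1)
parallel_len = active_path_length(16, [64, 128], n=8)
print(serial_len, parallel_len)
```

With no SIBs open, both calls return the same closed length, which is the basis for the equal SIB programming overhead noted above.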
To investigate the time required to access an instrument hidden behind an L-PSIB, we first assume that each parallel scan path is m bits long, which is ideally longer than the ((j/n)+1) bits on the scan path that correspond to the key and the LSIB bit that the attacker is trying to open. We also assume that the Locking-PSIB discussed in Section VI-B is used in the network and that the key bits are present in the shift cells as opposed to update cells. The TAP controller state transitions that are required in an attempt to open an LSIB with a random guess key scanned into the network are shown in Fig. 15.

An UpdateDR should be performed after the guess key pattern is scanned into the network. If the guess key is correct, then the key logic circuit would generate an L-PSIB unlock signal, and the UpdateDR operation will open the L-PSIB if the correct value was simultaneously placed in the shift-1 cell of the L-PSIB. This will add (k/n)-bits of the TDR segments to each parallel scan path. This UpdateDR operation is followed by cycles required to check if the length of the chain has changed, indicating that the L-PSIB is now open. Checking the length of the chain involves shifting a deterministic pattern of length d through the chain of expected length m. Because the transition from UpdateDR to ShiftDR includes an intermediate CaptureDR state, any previously scanned in guess pattern may now be overwritten if the corresponding shift cells of the key bits are also designed to capture data. The deterministic pattern of length d is now shifted in, followed by the next guess key pattern, which is shifted through the length m of the parallel scan paths. Thus, the attacker would need to shift for (m+d) cycles for each guess key. The cost of applying a guess key, attempting to open the L-PSIB, and checking if the chain length is changed is thus:
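The length check described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: one scan path is modeled as a FIFO whose cells initially hold 0, the marker pattern begins with a 1, and the function names and the 10-bit marker are hypothetical rather than taken from the paper.

```python
from collections import deque
from typing import List, Optional, Sequence


def shift(chain: deque, bits_in: Sequence[int]) -> List[int]:
    """Shift bits into the scan-in side and collect what leaves scan-out."""
    out = []
    for b in bits_in:
        out.append(chain.popleft())  # bit leaving the chain this cycle
        chain.append(b)              # bit entering the chain this cycle
    return out


def measured_length(chain_len: int, marker: Sequence[int], max_len: int) -> Optional[int]:
    """Infer the chain length from the delay of a marker pattern at scan-out.

    Assumes chain_len >= 1, all cells initially 0, and marker[0] == 1 so that
    the marker cannot be confused with the initial chain contents.
    """
    chain = deque([0] * chain_len)
    stream = list(marker) + [0] * max_len
    out = shift(chain, stream)
    d = len(marker)
    for offset in range(max_len + 1):
        if out[offset:offset + d] == list(marker):
            return offset  # the marker was delayed by exactly the chain length
    return None            # chain is longer than max_len


MARKER = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical d = 10 pattern
m = 64                                   # expected length with the L-PSIB closed
print(measured_length(m, MARKER, 128))             # L-PSIB still closed
print(measured_length(m + 128 // 8, MARKER, 128))  # k = 128, n = 8 segment added
```

A measured delay larger than m tells the attacker that a segment of k/n bits was inserted, that is, that the L-PSIB opened.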
$\mathrm{Cost}_{\mathrm{guess}} = (5 + m + d)$ cycles
For an L-PSIB key of length j bits, the expected number of guess key patterns required is $2^{j+1}$. The “plus 1” in the exponent accounts for the correct value that must be clocked into the LSIB’s update cell. This expression also assumes that the chain is sufficiently longer than the key that even seemingly different random key guesses by an attacker who does not know the key bit locations may be identical in the key bits themselves. Even relatively small key sizes can lead to long brute force attack times. For example, the expected time for a successful brute force attack on a 56-bit key, when the parallel scan chains are all 256 bits long and the shift clock frequency is 50 MHz, is over 12,000 years.
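The arithmetic behind such an estimate can be sketched as follows. The helper names are hypothetical, a year is taken as 365 days, and d = 10 is an assumption, since the text does not state the marker length used for this example. The number of guesses is passed in as a parameter rather than derived.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def cost_guess_parallel(m: int, d: int) -> int:
    """Cycles per guess in the parallel network: 5 TAP state cycles + m + d shifts."""
    return 5 + m + d


def attack_time_years(num_guesses: int, cycles_per_guess: int, freq_hz: float) -> float:
    """Total attack time in years for a given number of guesses."""
    return num_guesses * cycles_per_guess / freq_hz / SECONDS_PER_YEAR


j = 56       # key size in bits
m = 256      # length of each parallel scan path
d = 10       # assumed length of the deterministic check pattern
f = 50e6     # shift clock frequency in Hz

cycles = cost_guess_parallel(m, d)
years = attack_time_years(2 ** j, cycles, f)
print(cycles, round(years))
```

Under these assumptions, 2^56 guesses at 271 cycles each take roughly 12,400 years, which is consistent with the "over 12,000 years" figure; doubling the number of guesses doubles the time.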
D. Securing instrument access in a hybrid IJTAG architecture
As discussed in the previous section, the time required to open an L-PSIB using guess keys depends on the size of the key and the length of the parallel scan paths. The length of the parallel network and its corresponding serial network is the same if all the SIBs are closed. However, as mentioned earlier, the length of the parallel network does not increase by the same amount as that of the serial network when a SIB is opened to allow access to an instrument’s TDR. Thus at any given time in a parallel network, if some of the SIBs are open, then the length of the parallel scan paths will be smaller than the length of the corresponding serial network designed with the same SIB hierarchy. As the number of parallel scan paths in the Parallel-IJTAG network increases, and as the number of L-PSIBs that are open increases, this difference in lengths between the two networks will increase as well.
The hybrid serial and parallel network discussed in Section V provides further opportunities to mitigate brute force attacks on L-PSIBs in a parallel network as the number of parallel scan paths increases. Because the hybrid architecture consists of both the serial and parallel networks, the key logic circuits of L-PSIBs from the parallel network can be moved to the serial network. Thus, these key logic circuits can be fed with key bits from the serial scan path. Moreover, the key logic circuits for the L-PSIBs that are moved to the serial network can share the same input bits as the existing key logic circuits designed for LSIBs in the serial network, reducing the susceptibility to power analysis attacks.
In this design, an L-PSIB would be unlocked by first selecting the TDR containing the serial network with the JTAG state machine. If we assume that the attacker is starting in the SelectDR state of the JTAG state machine, Fig. 16 shows the subsequent steps that would be followed by an attacker trying to open an L-PSIB in a hybrid network.
First, if the serial chain length is s, then leaving SelectDR, shifting a key guess into the serial network, applying UpdateDR, and returning to SelectDR requires s + 5 clock cycles. Next, the attacker must shift to the parallel network. This requires changing the instruction in the JTAG instruction register. Entering the IR-Scan half of the state machine takes one clock cycle. If the length of the instruction register is i, then leaving SelectIR, shifting in the instruction, applying UpdateIR, and returning to SelectDR requires i + 5 cycles.

Next, the attacker tries to get the correct value in the Update cells of any L-PSIBs in the hope that the key was set correctly and an L-PSIB will open. A new DR-scan begins and all 1’s are shifted into the parallel network. If the parallel network is of length m, this requires m+5 cycles. This is followed by a second DR-scan that attempts to check the length of the chain to see if an L-PSIB was opened successfully. This consists of shifting in a distinctive pattern of d bits followed by an m-bit pattern of all 0’s and applying UpdateDR. This requires m+d+5 cycles. Next, assuming the attacker was unsuccessful at opening an L-PSIB with the all 1’s attempt, he can use the 0’s already in the chain to attempt to get the correct value in the L-PSIB’s update cell and once again check the length of the chain. This requires an additional m+d+5 clock cycles. If the attacker is still unsuccessful, he will need to once again switch to the serial network to begin a new guess. This requires 1+i+5 clock cycles. The process can now begin again with a new key guess. Thus, the total cost of a guess is:
$\mathrm{Cost}_{\mathrm{guess}} = (32 + s + 3m + 2d + 2i)$ cycles
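The constant 32 and the coefficients in this expression follow from adding up the individual steps listed above. The sketch below tallies them; the function names and step labels are hypothetical, and the cycle counts per step are those given in the text.

```python
from typing import Dict


def hybrid_guess_steps(s: int, m: int, d: int, i: int) -> Dict[str, int]:
    """Clock cycles for each step of one key guess in the hybrid network."""
    return {
        "serial DR-scan with key guess": s + 5,
        "IR-scan to select parallel network": 1 + i + 5,
        "parallel DR-scan shifting all 1s": m + 5,
        "length check while shifting in 0s": m + d + 5,
        "second length check after trying 0s": m + d + 5,
        "IR-scan back to serial network": 1 + i + 5,
    }


def cost_guess_hybrid(s: int, m: int, d: int, i: int) -> int:
    """Total cycles per guess, obtained by summing the individual steps."""
    return sum(hybrid_guess_steps(s, m, d, i).values())


# Example values: s, m, d as in the text; i = 10 is an assumed IR length.
s, m, d, i = 256, 64, 10, 10
total = cost_guess_hybrid(s, m, d, i)
print(total, 32 + s + 3 * m + 2 * d + 2 * i)
```

The step constants are 5 + 6 + 5 + 5 + 5 + 6 = 32, and the parallel chain is traversed three times, which gives the 3m term.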
Because the attacker is now able to deterministically try both a logic 1 and a logic 0 in the L-PSIB’s update cells (as shown in boxes 5 and 6 of Fig. 16), the expected number of random guesses on average is $2^{j}$ for a j-bit key (assuming that the key size is much smaller than the serial scan path length and thus an attacker without detailed knowledge of the key locations cannot deterministically avoid repeatedly using the same key values in his guesses).
As shown in Table III, this approach provides more security than the previous version due to the extra passes through IR-Scan and the greater likelihood of being able to hide the locations of the key bits. In this case, when s = 256, m = 64, and d = 10, and the clock frequency is 50 MHz, the expected time to open an L-PSIB with a 56-bit key is approximately 24,000 years on average.
TABLE III
EXPECTED TIME TO UNLOCK AN L-PSIB IN A HYBRID (SERIAL AND PARALLEL) NETWORK. s = 256, m = 64, d = 10, SHIFT FREQUENCY = 50 MHz
| L-PSIB key size j (bits) | 48 | 56 | 64 | 72 | 80 |
| Expected time t (years) | 9.28E1 | 2.38E4 | 6.08E6 | 1.56E9 | 3.99E11 |
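The entries of Table III can be recomputed from the hybrid cost expression with 2^j guesses per key. The sketch below does so under two assumptions that the text does not state: an instruction register length of i = 10 bits and a 365-day year. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def hybrid_unlock_years(j: int, s: int, m: int, d: int, i: int, freq_hz: float) -> float:
    """Expected time in years: 2^j guesses at (32 + s + 3m + 2d + 2i) cycles each."""
    cycles_per_guess = 32 + s + 3 * m + 2 * d + 2 * i
    return (2 ** j) * cycles_per_guess / freq_hz / SECONDS_PER_YEAR


# s, m, d, and the shift frequency come from the table caption; i = 10 is assumed.
S, M, D, I, F = 256, 64, 10, 10, 50e6

table = {j: hybrid_unlock_years(j, S, M, D, I, F) for j in (48, 56, 64, 72, 80)}
for j, years in table.items():
    print(f"j = {j:2d} bits: {years:.2E} years")
```

Each additional 8 key bits multiplies the expected time by 256, which is why the rows grow by a little more than two orders of magnitude per column.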
VII. CONCLUSION
In this paper, we have introduced multiple types of parallel SIBs and parallel IJTAG networks targeting enhanced bandwidth, area, and/or security. We have shown that parallel SIB-based networks can significantly increase the test bandwidth with no increase in the SIB programming overhead (i.e., the clock cycles needed to shift through closed SIBs while accessing an instrument). Additional overhead can be saved by reducing the number of update cells used for the Parallel-SIB design to 1. When test I/O is limited, a broadcast mode with addressable SIBs can add significant test bandwidth and lower SIB overhead at the cost of disjoint instrument access.
The security of parallel IJTAG networks protected by locking parallel SIBs was also investigated. Although one might intuitively think that security would be compromised with the ability to scan key data into the network in parallel, the number of guesses required still grows exponentially with key size. Shorter chain lengths in the parallel case (and thus a smaller cost of a guess) when some of the SIBs are open can be counteracted with slightly larger keys. In addition, when all of the SIBs are closed, the chain lengths of both networks are identical, and the cost of a guess is thus the same as well. However, more cells are active on the chain in the parallel case, providing more possible key bit locations when all SIBs are closed. A 56-bit key in a 256-bit long parallel scan chain shifted at 50 MHz would require over 12,000 years on average for a successful brute force attack. Distributing the keys and LSIBs across a parallel hybrid network increases the time required even further to 24,000 years on average.
Future work will explore alternative parallel IJTAG network architectures as well as methods for performing intelligent tradeoffs between test access time, overhead, and security. We will also investigate the application of other security optimizations, including the use of traps introduced in [23] to parallel IJTAG. Finally, we will explore the implications of power analysis attacks on parallel IJTAG security.

REFERENCES
[1] “IEEE Standard for Access and Control of Instrumentation Embedded within a Semiconductor Device,” IEEE Std 1687-2014, pp. 1–283, Dec. 2014.
[2] “IEEE Standard for Test Access Port and Boundary-Scan Architecture Redline,” IEEE Std 1149.1-2013 (Revision of IEEE Std 1149.1-2001) Redline, pp. 1–899, May 2013.
[3] J. C. Ye, M. A. Kochte, K. J. Lee, and H. J. Wunderlich, “Autonomous Testing for 3D-ICs with IEEE Std. 1687,” in 2016 IEEE 25th Asian Test Symposium (ATS), Nov. 2016, pp. 215–220.
[4] F. G. Zadegan, U. Ingelsson, G. Carlsson, and E. Larsson, “Design automation for IEEE P1687,” in 2011 Design, Automation Test in Europe, Mar. 2011, pp. 1–6.
[5] ——, “Access Time Analysis for IEEE P1687,” IEEE Transactions on Computers, vol. 61, no. 10, pp. 1459–1472, Oct. 2012.
[6] R. Cantoro, M. Montazeri, M. S. Reorda, F. G. Zadegan, and E. Larsson, “On the testability of IEEE 1687 networks,” in 2015 IEEE 24th Asian Test Symposium (ATS), Nov. 2015, pp. 211–216.
[7] R. Cantoro, M. Palena, P. Pasini, and M. S. Reorda, “Test Time Minimization in Reconfigurable Scan Networks,” in 2016 IEEE 25th Asian Test Symposium (ATS), Nov. 2016, pp. 119–124.
[8] F. G. Zadegan, U. Ingelsson, G. Asani, G. Carlsson, and E. Larsson, “Test Scheduling in an IEEE P1687 Environment with Resource and Power Constraints,” in 2011 Asian Test Symposium, Nov. 2011, pp. 525–531.
[9] S. S. Nuthakki, R. Karmakar, S. Chattopadhyay, and K. Chakrabarty, “Optimization of the IEEE 1687 access network for hybrid access schedules,” in 2016 IEEE 34th VLSI Test Symposium (VTS), Apr. 2016, pp. 1–6.
[10] A. Ibrahim and H. G. Kerkhoff, “Efficient utilization of hierarchical iJTAG networks for interrupts management,” in 2016 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Sep. 2016, pp. 97–102.
[11] K. Rosenfeld and R. Karri, “Security and Testing,” in Introduction to Hardware Security and Trust, M. Tehranipoor and C. Wang, Eds. Springer New York, 2012, pp. 385–409.
[12] R. F. Buskey and B. B. Frosik, “Protected JTAG,” in 2006 International Conference on Parallel Processing Workshops (ICPPW’06), 2006, pp. 8 pp.–414.
[13] C. Clark, “Anti-Tamper JTAG TAP Design Enables DRM to JTAG Registers and P1687 on-Chip Instruments,” in 2010 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), Jun. 2010, pp. 19–24.
[14] K. Rosenfeld and R. Karri, “Attacks and Defenses for JTAG,” IEEE Design Test of Computers, vol. 27, no. 1, pp. 36–47, Jan. 2010.
[15] H. A. Little, J. R. Randell, R. C. Madter, and R. J. Hickey, “Debugging port security interface,” U.S. Patent US20120278630 A1, Nov. 2012.
[16] B. Yang, K. Wu, and R. Karri, “Scan Based Side Channel Attack on Dedicated Hardware Implementations of Data Encryption Standard,” in 2004 International Conference on Test, Oct. 2004, pp. 339–344.
[17] D. Mukhopadhyay, S. Banerjee, D. RoyChowdhury, and B. B. Bhattacharya, “CryptoScan: A Secured Scan Chain Architecture,” in 14th Asian Test Symposium (ATS’05), Dec. 2005, pp. 348–353.
[18] A. A. Kamal and A. M. Youssef, “A Scan-Based Side Channel Attack on the NTRUEncrypt Cryptosystem,” in 2012 Seventh International Conference on Availability, Reliability and Security, Aug. 2012, pp. 402–409.
[19] S. S. Ali and O. Sinanoglu, “Scan Attack on Elliptic Curve Cryptosystem,” in 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), Oct. 2015, pp. 115–118.
[20] J. Lee, M. Tehranipoor, C. Patel, and J. Plusquellic, “Securing Scan Design Using Lock and Key Technique,” in 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05), Oct. 2005, pp. 51–62.
[21] J. Lee, M. Tehranipoor, and J. Plusquellic, “A low-cost solution for protecting IPs against scan-based side-channel attacks,” in 24th IEEE VLSI Test Symposium, Apr. 2006, pp. 6 pp.–99.
[22] J. Lee, M. Tehranipoor, C. Patel, and J. Plusquellic, “Securing Designs against Scan-Based Side-Channel Attacks,” IEEE Transactions on Dependable and Secure Computing, vol. 4, no. 4, pp. 325–336, Oct. 2007.
[23] J. Dworak, A. Crouch, J. Potter, A. Zygmontowicz, and M. Thornton, “Don’t forget to lock your SIB: Hiding instruments using P1687,” in 2013 IEEE International Test Conference (ITC), Sep. 2013, pp. 1–10.
[24] A. Zygmontowicz, J. Dworak, A. Crouch, and J. Potter, “Making it harder to unlock an LSIB: Honeytraps and misdirection in a P1687 network,” in 2014 Design, Automation Test in Europe Conference Exhibition (DATE), Mar. 2014, pp. 1–6.
[25] J. Dworak, Z. Conroy, A. Crouch, and J. Potter, “Board security enhancement using new locking SIB-based architectures,” in 2014 International Test Conference, Oct. 2014, pp. 1–10.
[26] J. Dworak and A. Crouch, “A call to action: Securing IEEE 1687 and the need for an IEEE test Security Standard,” in 2015 IEEE 33rd VLSI Test Symposium (VTS), Apr. 2015, pp. 1–4.
[27] S. Gupta, J. Dworak, D. Engels, and A. Crouch, “Mitigating simple power analysis attacks on LSIB key logic,” in 2017 IEEE North Atlantic Test Workshop (NATW), 2017, pp. 1–6.
[28] H. Liu and V. D. Agrawal, “Securing IEEE 1687-2014 Standard Instrumentation Access by LFSR Key,” in 2015 IEEE 24th Asian Test Symposium (ATS), Nov. 2015, pp. 91–96.
[29] R. Baranowski, M. A. Kochte, and H. J. Wunderlich, “Fine-Grained Access Management in Reconfigurable Scan Networks,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 6, pp. 937–946, Jun. 2015.