A major challenge facing the semiconductor industry is the inability to detect product defects early in the mass production phase. If a defective product is put on the market, it will cause huge financial and reputational damage to the business. This is especially true for designers of high-performance computing system-on-chip (SoC) for hyperscale data centers, networking, and AI applications, as any product defect could have a catastrophic impact on AI R&D workloads or data processing.
The semiconductor industry has developed a range of test methods to increase the speed and coverage of production testing. These methods are standardized so that companies can use common test metrics and interfaces at every stage of final product manufacturing, from wafer testing to chip testing to board-level testing, which improves efficiency.
This article describes how to use Die-to-Die PHY IP to efficiently test system-in-package (SiP) in volume production to ensure that the final product is defect-free and maintains the highest possible production yield. It also explains how the Die-to-Die PHY IP’s internal test capabilities extend the test coverage of all dies.
Challenges of SiP Testing
Integrating multiple bare dies into a single package is attracting renewed interest. Two factors contribute to this trend: on the one hand, design complexity keeps increasing; on the other hand, SoCs are becoming too large for cost-effective monolithic integration, and a multi-die approach gives designers the flexibility to implement different SoC functions in the process nodes that make the most technical and economic sense.
A SiP is a chip that integrates multiple dies (or “chiplets”) in one package. These can be multiple of the same chiplet to improve system performance, or different chiplets to bring more functionality to the system in a cost-effective manner.
Often, chiplets are integrated into the same package after being produced by different suppliers. As shown in Figure 1, modern 2.5D or 3D packaging technologies integrate multiple dies in complex ways, using (simpler) organic substrates or (more complex) silicon interposers, silicon bridges, and through-silicon vias (TSVs) to transmit signals between dies and to the periphery of the package.
Figure 1: Different packaging technologies with different routing capabilities
Individual dies, package “structures” (interposers, TSVs, bumps) and package components can be yield constrained. Even if the yield of each individual component is relatively high, the total yield of SiP (the cumulative yield of all the different components) can be very low, as shown in the following formula:
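$$\mathrm{Yield}_{\mathrm{SiP}} = \left(\mathrm{Yield}_{\mathrm{Die}}\right)^{N} \times \mathrm{Yield}_{\mathrm{Package}} \times \mathrm{Yield}_{\mathrm{Component}}$$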
where N = the number of dies integrated in the same package.
Take a SiP with 4 dies as an example: if the yield of each die is 90% and the package and integration yields are 100%, the total SiP yield is only about 66% (0.9^4 ≈ 0.656). For large dies in advanced process nodes, an individual yield of 80% is good, but the final SiP yield can be very low, around 41% (0.8^4 ≈ 0.41). In other words, even if 3 dies are defect-free, a single defective die causes the entire SiP to fail.
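The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the formula above; the function name `sip_yield` and its parameter names are illustrative and do not come from any tool or standard.

```python
def sip_yield(die_yield: float, num_dies: int,
              package_yield: float = 1.0,
              component_yield: float = 1.0) -> float:
    """Cumulative SiP yield: die_yield**N * package_yield * component_yield."""
    return die_yield ** num_dies * package_yield * component_yield


# The two cases from the text: 4 dies, package and integration yield of 100%.
for die_yield in (0.90, 0.80):
    total = sip_yield(die_yield, num_dies=4)
    print(f"die yield {die_yield:.0%} -> SiP yield {total:.1%}")
# prints:
# die yield 90% -> SiP yield 65.6%
# die yield 80% -> SiP yield 41.0%
```

Because the die yield is raised to the power N, every additional die multiplies the loss, which is why screening dies before integration matters so much.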
To improve yield, companies need to follow two principles:
1. Identify and integrate only known good dies (KGD) in the package. Die defects then no longer compound: in the example above, the effective yield equals the yield of an individual die rather than that yield multiplied across all four dies.
2. After integration, verify cross-die functionality to detect defects during integration, as well as other defects that are difficult to identify by testing a single die (e.g., a defective bump may not be detected during single-die testing).
Test and repair functions at the die level and at the integrated system level can also help improve yield by avoiding defects or working around defects that have been discovered. Such functions can rely on redundancy or other schemes, and are particularly useful for large regular structures such as memories or very wide buses across dies.
Given the complexity of SiP testing and the variety of die sources, having a standardized test infrastructure and methodology across the ecosystem is critical to the success of the SiP and chiplet ecosystem. IEEE and other standards groups are stepping up efforts to develop new test architecture standards for 3D package dies.
SiP Test Architecture
For example, the recently released IEEE 1838 defines a standardized modular test access architecture for SiP products, helping system designers and test engineers to efficiently validate their products, as shown in Figure 2.
Figure 2: IEEE 1838 Test Access Architecture for Testing Single Dies, Integrated Dies, and Packaged SiPs
Building on existing test standards for monolithic SoCs (such as IEEE 1149.1 and IEEE 1500), IEEE 1838 defines a test architecture for managing the testing of individual and integrated dies, achieving full test coverage of die-to-die functional blocks with minimal additional test circuitry.
IEEE 1838 defines a serial port (based on IEEE 1149.1) for test control and low-speed test data access, which is implemented in each die and remains accessible even after final integration. It also defines an optional parallel test access port, which may not be accessible after integration. These ports can either be brought out through a reduced set of test bumps for testing a non-integrated die, or connect seamlessly to the corresponding ports in another die, expanding the test infrastructure to cover intra-die and inter-die testing after integration.
In addition, IEEE 1838 defines the test hierarchy, dividing the work into intra-die testing for KGD, inter-die testing of the integrated dies, and testing of the packaged component itself, as shown in Figure 2.
Inside each die, more test hierarchies can be defined to test digital logic blocks, memory blocks, and other blocks with scan chains and built-in self-test (BIST) structures according to established methods. Digital connections between dies are tested based on boundary scan chains.
High-speed analog block testing is typically based on functional testing, but it can also be integrated into the test management hierarchy by adding suitable test wrappers that interface with the test infrastructure, as shown in Figure 3.
Figure 3: Test architecture hierarchy inside a chiplet, including wrappers that integrate high-speed analog block testing into the overall test infrastructure
To automate testing and reduce test time, high-speed analog blocks such as high-speed PHY IP must provide adequate test coverage. This becomes even more challenging for high-speed die-to-die links. In such cases, it is necessary to rely on the test infrastructure built into the high-speed PHY to test the complete link, including the PHY on both dies, the associated bumps, and the package traces between them.
A high-speed PHY that implements inter-die connectivity must include a number of design-for-test (DFT) functions:
Scan chains for static and at-speed detection of faults in digital circuits (stuck-at, open, slow transitions)
Built-in self-test (BIST) functions that cover specific digital and analog blocks as fully as possible
Internal loopbacks for testing a single PHY; these loopbacks can be shallow (covering digital circuitry) or deep (covering the full transmit and receive signal paths up to the bump, or as close to the bump as possible without affecting mission-mode performance)
Pattern generators and matchers that support pseudo-random patterns or specific patterns
Ability to sweep reference levels and sampling phases to generate pass/fail eye diagrams and determine design margins
External loopback from one die to the next, which extends the test coverage to the bumps and die-to-die traces, as shown in Figure 4.
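As an illustration of the pattern generator, matcher, and loopback items in this list, the following Python sketch sends a PRBS7 sequence (polynomial x^7 + x^6 + 1) through a modelled link and counts bit errors. It is a behavioural model under stated assumptions, not a description of any particular PHY: the names `prbs7`, `loopback_test`, `good_link`, and `stuck_at_zero` are hypothetical, and the checker here simply regenerates the sequence from the same seed, whereas hardware matchers are often self-synchronizing.

```python
from itertools import islice


def prbs7(seed: int = 0x7F):
    """Yield bits of the PRBS7 sequence (polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps at stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def loopback_test(channel, num_bits: int = 1000, seed: int = 0x7F) -> int:
    """Send PRBS7 bits through `channel` and return the number of bit errors."""
    sent = list(islice(prbs7(seed), num_bits))
    received = [channel(index, bit) for index, bit in enumerate(sent)]
    expected = islice(prbs7(seed), num_bits)  # reference copy in the matcher
    return sum(r != e for r, e in zip(received, expected))


def good_link(index: int, bit: int) -> int:
    """Hypothetical defect-free loopback path: bits arrive unchanged."""
    return bit


def stuck_at_zero(index: int, bit: int) -> int:
    """Hypothetical defective path, e.g. an open bump read as constant 0."""
    return 0


print("good link errors:", loopback_test(good_link))
print("stuck-at-0 errors:", loopback_test(stuck_at_zero))
```

The same generator and matcher can serve an internal loopback (the channel stays inside one PHY) or an external loopback (the channel includes the bumps and die-to-die traces); only the path being exercised changes.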
Testing for Known Good Dies
A mandatory first step, performed before integration into the SiP, is to identify defective dies so that only KGDs are integrated, which significantly improves overall production yield.
Each die undergoes KGD testing prior to packaging. For IEEE 1838 compliant dies, the standard serial and parallel test access ports are used to access the die’s complete test infrastructure through a reduced set of test bumps.
Test functions within the analog block, such as high-speed PHY IP, are also interconnected with the die test infrastructure via an IEEE 1500 compliant wrapper, enabling PHY testing as well.
Coverage of any items that cannot be tested at this stage, as well as of the inter-die connections, is extended on the integrated SiP in subsequent steps of the test strategy.
Assuming both dies are IEEE 1838 compliant, their test infrastructures merge seamlessly into a single structure that is accessed through the test port of the primary (“first”) die and extended to the next die through secondary test ports.
Tests such as boundary-scan EXTEST for digital pins and cross-die loopback tests for high-speed PHYs can now be run, extending test coverage to the periphery of each die as well as to the package itself.
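To make the idea of an interconnect test concrete, here is a minimal Python sketch of a walking-one pattern applied across die-to-die nets, in the spirit of a boundary-scan interconnect test. It models only the pattern logic, not the IEEE 1149.1 instruction or register mechanics. The fault model is an assumption chosen for illustration (an open net is captured as 0, and shorted nets behave as a wired-OR), and all function names are hypothetical.

```python
def drive_interconnect(pattern, opens=(), shorts=()):
    """Return the values the receiving die captures for a driven pattern.

    opens:  net indices that are disconnected (assumed to be captured as 0).
    shorts: pairs of net indices shorted together (assumed wired-OR).
    """
    captured = list(pattern)
    for a, b in shorts:
        captured[a] = captured[b] = pattern[a] | pattern[b]
    for net in opens:
        captured[net] = 0
    return captured


def walking_one_test(num_nets, opens=(), shorts=()):
    """Drive a single 1 across each net in turn; return the nets that mismatch."""
    failing = set()
    for active in range(num_nets):
        pattern = [1 if net == active else 0 for net in range(num_nets)]
        captured = drive_interconnect(pattern, opens=opens, shorts=shorts)
        failing.update(net for net in range(num_nets)
                       if captured[net] != pattern[net])
    return sorted(failing)


print(walking_one_test(8))                    # []
print(walking_one_test(8, opens=(3,)))        # [3]
print(walking_one_test(8, shorts=((1, 2),)))  # [1, 2]
```

A defect such as a bad bump shows up here only once both dies are connected, which is why this class of test belongs to the post-integration step rather than to KGD screening.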
Other Yield Improvement Strategies
It is worth noting that, in some special cases, the above layered test method may still not improve the yield to the desired level.
This can happen when a wide parallel interface connects the two dies: for example, High Bandwidth Memory (HBM) between a memory and a digital chip, or High Bandwidth Interconnect (HBI) or Advanced Interface Bus (AIB) between two digital chips. These interfaces may have thousands of pins using micro-bumps, with very dense traces on the interposer connecting those pins. The yield of the substrate traces or micro-bumps may then be very low, resulting in the loss of KGDs. For such cases, a complementary test and repair strategy can be employed, relying on redundant pins on each PHY and corresponding redundant micro-bumps and traces to restore a higher yield after final product integration.
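The repair idea can be sketched as a lane-remapping step: after a post-integration test has identified failed physical lanes, the logical signals are reassigned to the remaining good lanes, using the spares. The Python below is a generic illustration only. Actual interfaces such as HBM or AIB define their own repair mechanisms, which are not reproduced here, and the function name `build_lane_map` is hypothetical.

```python
def build_lane_map(num_signals, num_lanes, failed_lanes):
    """Map each logical signal to a working physical lane, skipping failed ones.

    Returns a list where entry i is the physical lane carrying signal i,
    or None if there are not enough good lanes left to repair the link.
    """
    failed = set(failed_lanes)
    good = [lane for lane in range(num_lanes) if lane not in failed]
    if len(good) < num_signals:
        return None  # more failures than spare lanes: link cannot be repaired
    return good[:num_signals]


# 8 signals on 10 physical lanes (2 spares, an assumed configuration).
print(build_lane_map(8, 10, failed_lanes=[2, 5]))     # [0, 1, 3, 4, 6, 7, 8, 9]
print(build_lane_map(8, 10, failed_lanes=[1, 2, 3]))  # None
```

With two spare lanes, a package with up to two defective micro-bumps or traces on this link can still ship, instead of scrapping the known good dies it contains.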
With growing market demand for integrating multiple dies into the same package, for high-performance computing and many other applications, testing dies before and after integration becomes key to achieving the expected yields. Building on a standards-based die test infrastructure, test coverage must extend from the die level to the integrated SiP. The die-to-die interface spans both dies that make up the link, and thus plays an important role in the test strategy. The die-to-die PHY IP must include test capabilities that simplify testing of each die and of the post-integration link itself, while being able to integrate into the chip test infrastructure.