NAND Flash
There were three NAND flash papers, one each from Toshiba, Samsung, and Western Digital Corp. (WDC).
Toshiba 96-layer 1Tb QLC NAND
Toshiba described a 96-layer QLC 1.33 terabit chip. Like the chip that Toshiba presented last year, this one uses CUA, which Toshiba calls “Circuit Under Array,” although Micron, which originated the technology, says that CUA stands for “CMOS Under Array.” Toshiba improved the margins between the cells by extending the gate threshold ranges below zero, a move that forced the designers to rethink the sense amplifiers. They also implemented a newer, faster, lower-error way to program the cells. The developers made a number of speed-related enhancements, including the option of configuring pages as SLC, TLC, or QLC. Toshiba compared this chip to a similar one presented later by its partner Western Digital, pointing out that the Toshiba chip had a higher density despite its lower layer count (96 vs. 128 layers), largely because it uses QLC rather than the TLC approach of the WDC chip. The WDC chip was designed for speed.
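To make the SLC/TLC/QLC trade-off concrete, here is a back-of-the-envelope sketch in Python. It rests on a simplifying assumption that is not from either paper: every layer contributes exactly one storage cell per vertical string, with dummy and select layers ignored. The function names are illustrative only.

```python
# Bits stored per cell in each of the page modes named above.
BITS_PER_CELL = {"SLC": 1, "TLC": 3, "QLC": 4}


def threshold_levels(mode: str) -> int:
    """Distinct threshold-voltage states a cell must hold in this mode."""
    return 2 ** BITS_PER_CELL[mode]


def bits_per_string(layers: int, mode: str) -> int:
    """Bits along one vertical string, ASSUMING one cell per layer."""
    return layers * BITS_PER_CELL[mode]


for mode in BITS_PER_CELL:
    print(f"{mode}: {threshold_levels(mode)} threshold levels")

toshiba = bits_per_string(96, "QLC")   # 96 layers, 4 bits per cell
wdc = bits_per_string(128, "TLC")      # 128 layers, 3 bits per cell
print(f"Toshiba string: {toshiba} bits, WDC string: {wdc} bits")
```

In this simplified count, the fourth bit per cell makes up for the 32 fewer layers. The price is that each QLC cell must resolve 16 threshold levels rather than 8, which is why the margins between levels matter so much.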
Samsung 512Gb >110-layer TLC NAND
Samsung presented a 512Gb TLC part that had “more than 110 layers,” but the company’s management forbade the speaker from disclosing the actual layer count. This is peculiar to Samsung, which originated the practice of not disclosing process geometries, choosing instead to use the term “20nm-class” to denote its 27nm planar NAND. Samsung has been working on thinning the chip’s layers to avoid having to adopt the string stacking approach already embraced by its competition. Avoiding string stacking might have prevented Samsung from implementing a full 128 layers in this chip, which would explain the company’s reluctance to specify its layer count. The key focus of this design was to operate at the very high speeds required by the Toggle 4.0 interface, and it did a good job of that, with a 1.2Gb/s I/O bandwidth, 83MB/s program throughput, and a 45 microsecond read time. These speeds were achieved through optimizations that ranged from improved bitline precharge techniques, minimized wordline capacitive coupling, and new Toggle Mode interface techniques to modified erase sequences that compensate for the fact that the string column’s diameter shrinks from the top of the column to its bottom. Interestingly, this chip has the least efficient layout of any that The Memory Guy has seen to date, with only 13 million columns per square millimeter, half that of the densest chips.
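For a rough sense of how the 45 microsecond read time compares with the time spent moving a page across the interface, the Python sketch below works through the arithmetic. Two inputs are assumptions rather than figures from the Samsung paper: an 8-bit-wide data bus running at 1.2Gb/s per pin, and a 16kB page. It also assumes the array read and the transfer are not overlapped.

```python
READ_TIME_US = 45.0        # array read time quoted above
PIN_RATE_GBPS = 1.2        # per-pin I/O rate quoted above
BUS_WIDTH_BITS = 8         # ASSUMPTION: 8-bit data bus
PAGE_BYTES = 16 * 1024     # ASSUMPTION: 16kB page


def transfer_time_us(page_bytes: int, pin_rate_gbps: float, bus_bits: int) -> float:
    """Microseconds needed to shift one page out over the interface."""
    bytes_per_us = pin_rate_gbps * 1e9 * bus_bits / 8 / 1e6
    return page_bytes / bytes_per_us


xfer_us = transfer_time_us(PAGE_BYTES, PIN_RATE_GBPS, BUS_WIDTH_BITS)
total_us = READ_TIME_US + xfer_us          # no overlap assumed
throughput_mb_s = PAGE_BYTES / total_us    # bytes per microsecond equals MB/s

print(f"page transfer: {xfer_us:.2f} us")
print(f"read + transfer: {total_us:.2f} us")
print(f"non-pipelined read throughput: {throughput_mb_s:.0f} MB/s")
```

Under these assumptions the array read, not the interface, takes most of the time for a single page, which is one reason multi-plane and cached reads are used to keep a fast interface busy.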
Western Digital presented a different chip from its partner Toshiba’s: a 512Gb TLC part with 128 layers. The speaker made a point of repeatedly telling the audience that the design had the highest layer count ever announced, which worked to Samsung’s disadvantage since Samsung chose not to reveal its layer count. The chip was divided into four planes to double performance, an approach that would have penalized the die area by about 15% had the designers not used the CUA approach mentioned above. Slipping the logic beneath the array makes the area available for logic enormous, and the die area penalty for moving to four planes was reduced to less than 1%. When the power planes and the bitlines are also allowed to move under the array, the chip can run at higher speeds thanks to reduced capacitance and resistance. As a result, WDC achieved a 132MB/s program throughput, nearly 60% more than that of the Samsung chip. The WDC chip also uses a technique the company calls “multi-chip Peak Power Management,” or mPPM, to manage power in a multi-die stack and improve write times by as much as 47%, and it accesses data using a smaller 4kB page (vs. the industry standard 16kB) to limit peak currents.
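The two program-throughput figures quoted in this article can be compared directly. The short Python sketch below computes the relative gain and, as an illustration only, how long it would take to program an entire 512Gb die at each rate. The fill-time part assumes that 512Gb means 512 × 2^30 bits and that the quoted throughput is sustained across the whole die, neither of which is stated in the papers.

```python
SAMSUNG_MB_S = 83.0    # Samsung program throughput quoted above
WDC_MB_S = 132.0       # WDC program throughput quoted above

# ASSUMPTION: "512Gb" is taken as 512 * 2**30 bits.
CHIP_BYTES = 512 * 2**30 // 8


def relative_gain(new: float, old: float) -> float:
    """Fractional improvement of `new` over `old`."""
    return new / old - 1.0


def fill_time_s(mb_per_s: float) -> float:
    """Seconds to program the whole die, ASSUMING sustained throughput."""
    return CHIP_BYTES / (mb_per_s * 1e6)


gain = relative_gain(WDC_MB_S, SAMSUNG_MB_S)
print(f"WDC vs. Samsung program throughput: +{gain * 100:.0f}%")
print(f"Samsung full-die program time: {fill_time_s(SAMSUNG_MB_S):.0f} s")
print(f"WDC full-die program time: {fill_time_s(WDC_MB_S):.0f} s")
```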