Architecture Design of High Performance and Reliable Computer Memory Systems: Abstracts in Chinese and English

Time: 2019-05-13 16:14:29. Author: uploaded by a site member

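Thesis Abstracts in Chinese and English

Author: Sun Hongbin (孙宏滨)

Thesis title: Architecture Design of High Performance and Reliable Computer Memory Systems (高性能高可靠的计算机存储系统架构设计研究)

Chinese Abstract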

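Over the past few decades, the scaling of integrated circuit process technology has brought enormous performance improvements to circuit design. According to Moore's law, processor speed doubles every 18 months, while memory speed grows by only 7% per year. As a result, the speed gap between processor and memory doubles every 21 months, which is known as the "memory wall" problem. The architecture design of the computer memory system, including cache and main memory, is the main way to bridge the performance gap between processor and memory. As CMOS feature sizes keep shrinking, both the reliability and the performance of computer memory systems are under serious threat. Rising rates of hardware defects and soft errors steadily degrade cache yield and reliability. At the same time, maturing three-dimensional (3D) integration technology offers better means to address the memory wall. Designing high-performance, highly reliable memory structures has become a key technology for computer systems. This thesis makes the following major contributions toward narrowing the processor-memory performance gap and improving memory system reliability: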
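
The 21-month figure follows from the two growth rates quoted above. The sketch below redoes that arithmetic, assuming both speeds grow exponentially at constant rates; the function name and parameters are illustrative and do not come from the thesis.

```python
import math

def gap_doubling_months(cpu_doubling_months=18.0, mem_growth_per_year=0.07):
    """Months for the processor/memory speed ratio to double.

    Assumes constant exponential growth for both speeds (an idealization).
    """
    # Yearly processor speed-up when speed doubles every `cpu_doubling_months`.
    cpu_rate = 2.0 ** (12.0 / cpu_doubling_months)
    # Yearly memory speed-up, e.g. 1.07 for 7% per year.
    mem_rate = 1.0 + mem_growth_per_year
    # The gap grows by the ratio of the two yearly rates.
    gap_rate = cpu_rate / mem_rate
    # Solve gap_rate ** years == 2 for years, then convert to months.
    return 12.0 * math.log(2.0) / math.log(gap_rate)

print(round(gap_doubling_months(), 1))
```

With the default values the result is close to 21 months, which matches the figure stated in the abstract.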

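First, this thesis proposes an efficient built-in repair analysis (BIRA) method to improve the yield of embedded memory. Embedded memory has become a core component of processors and system-on-chip (SoC) designs and determines the yield of the whole chip. Unlike conventional memory, embedded memory is hard to test and analyze for repair with external test equipment; it needs built-in self-test and built-in repair analysis circuits to carry out testing and repair. Prior work on built-in repair analyzers assumed that hardware defects can be repaired only by on-chip redundant rows or columns, yet most embedded memories already integrate error correction code (ECC) circuits to guard against soft errors. By making appropriate use of the existing on-chip ECC circuit, the proposed method yields a built-in repair analyzer with a high repair rate and low hardware overhead. It uses a very simple block-defect-first repair analysis to reduce hardware overhead, uses on-chip ECC resources to correct the residual hardware defects, and finally applies a suitable method to restore soft error tolerance. The method effectively reduces the hardware overhead of the built-in repair analysis algorithm while maintaining the same or a higher hardware defect repair rate and soft error tolerance.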
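
The general idea of spending spare rows and columns on clustered defects and leaving isolated defects to ECC can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the greedy selection rule, the function name `repair_analysis`, and the single-error-per-word ECC limit are all hypothetical, and the final step that restores soft error tolerance is not modeled.

```python
from collections import Counter

def repair_analysis(faults, spare_rows, spare_cols, word_bits):
    """Toy repair analysis: spares for clustered faults first, ECC for the rest.

    faults: set of (row, col) coordinates of defective cells.
    Returns (rows_replaced, cols_replaced, repairable).
    """
    remaining = set(faults)
    rows_used, cols_used = [], []
    while remaining:
        row_counts = Counter(r for r, _ in remaining)
        col_counts = Counter(c for _, c in remaining)
        best_row, row_n = row_counts.most_common(1)[0]
        best_col, col_n = col_counts.most_common(1)[0]
        # Only isolated faults are left: stop spending spares on them.
        if max(row_n, col_n) < 2:
            break
        if row_n >= col_n and len(rows_used) < spare_rows:
            rows_used.append(best_row)
            remaining = {(r, c) for r, c in remaining if r != best_row}
        elif len(cols_used) < spare_cols:
            cols_used.append(best_col)
            remaining = {(r, c) for r, c in remaining if c != best_col}
        elif len(rows_used) < spare_rows:
            rows_used.append(best_row)
            remaining = {(r, c) for r, c in remaining if r != best_row}
        else:
            break  # no spares left
    # Assumed ECC limit: at most one residual faulty bit per memory word.
    per_word = Counter((r, c // word_bits) for r, c in remaining)
    repairable = all(n <= 1 for n in per_word.values())
    return rows_used, cols_used, repairable

faults = {(0, 0), (0, 1), (0, 2), (3, 5), (7, 9)}
print(repair_analysis(faults, spare_rows=1, spare_cols=0, word_bits=8))
```

In this example the three faults in row 0 consume the single spare row, and the two isolated faults fall in different words, so the assumed ECC can cover them.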

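Second, this thesis proposes a method that uses multi-bit ECC to improve the fault tolerance of the L2 cache. Errors in SRAM comprise hardware defects and soft errors caused by particle radiation. In conventional memory design, hardware defects are generally repaired by on-chip redundant rows and columns, while soft errors are handled by single-bit ECC. Continued technology scaling has made reliable, high-density cache design ever more complex, and conventional reliability techniques will no longer meet yield requirements. Although multi-bit ECC can significantly improve cache reliability, it is usually considered unsuitable for cache design because it noticeably degrades performance and adds area overhead. This thesis studies the feasibility and the potential reliability gains of using multi-bit ECC in the L2 cache to tolerate a large number of random hardware defects while still protecting against soft errors. Our work does not aim to develop new multi-bit ECCs; it focuses on how architecture design can make effective use of multi-bit ECC in the L2 cache. Because cache blocks with one or more defective bits can be identified during memory testing, an intuitive and better approach is to protect only the cache blocks that need it with multi-bit ECC, rather than protecting all blocks uniformly. Such selective protection greatly reduces the impact of the long multi-bit decoding latency on processor performance and also reduces the number of ECC redundancy bits that must be stored. Selective use of multi-bit ECC requires a run-time lookup table based on content-addressable memory (CAM) to determine whether the block being accessed is protected by multi-bit ECC. However, although a direct CAM-based implementation appears simple, it falls short at high defect densities: (1) as the random defect rate increases, most cache accesses trigger multi-bit ECC decoding, which lowers overall system performance; (2) CAM consumes far more power than ordinary SRAM, so constantly searching the defect table leads to excessive energy consumption. This thesis further exploits the locality of cache accesses: by assisting the cache with several small special-purpose caches, it avoids most of the multi-bit ECC decoding latency and greatly reduces the area overhead of multi-bit ECC. In addition, the proposed L2 cache design improves reliability while maintaining the same soft error tolerance.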
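
The direct form of selective protection, before the small auxiliary caches are added, can be sketched as a table lookup on every read. The class below is a hypothetical model: a Python set stands in for the CAM-based defect table, and the latency values are made-up cycle counts chosen only to show the effect, not figures from the thesis.

```python
class SelectiveEccCache:
    """Toy model of an L2 cache that uses multi-bit ECC only on defective blocks."""

    def __init__(self, defective_blocks, base_latency=10,
                 single_bit_latency=1, multi_bit_latency=8):
        # Set membership stands in for the CAM lookup of the defect table.
        self.defective = frozenset(defective_blocks)
        self.base_latency = base_latency
        self.single_bit_latency = single_bit_latency
        self.multi_bit_latency = multi_bit_latency

    def read_latency(self, block_addr):
        """Latency of one read, in hypothetical cycles."""
        if block_addr in self.defective:
            # Block was marked defective at test time: multi-bit decode needed.
            return self.base_latency + self.multi_bit_latency
        # Defect-free block: ordinary single-bit ECC decode.
        return self.base_latency + self.single_bit_latency

    def average_latency(self, trace):
        """Mean read latency over a sequence of block addresses."""
        return sum(self.read_latency(a) for a in trace) / len(trace)

cache = SelectiveEccCache(defective_blocks={3, 7})
print(cache.average_latency([3, 4, 4, 4]))
```

As the fraction of defective blocks in the trace rises, the average latency approaches the multi-bit case, which is limitation (1) described above.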

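3D integration has become a promising technology in processor design and offers a viable solution to the memory wall problem in high-performance processors. Targeting 3D integration technology, this thesis develops a 3D DRAM architecture that uses a coarse-grained partitioning strategy. Compared with prior work, the architecture fully exploits the advantages of 3D integration without imposing stringent through-silicon via (TSV) fabrication constraints: the global address and data buses are shared across all silicon dies, so only a small number of TSVs with a relatively relaxed size requirement are needed. Building on this memory structure, the thesis further designs a heterogeneous 3D DRAM architecture for multi-core computing systems, using 3D DRAM to implement both the L2 cache and the main memory. To improve the performance of the DRAM L2 cache, the thesis applies techniques such as variable sub-array size and multi-threshold circuits to reduce access latency. Contrary to the common impression that DRAM is much slower than SRAM, we use a modified memory modeling tool to show that a 3D DRAM L2 cache can match or even exceed the access speed of SRAM. With these techniques, the proposed 3D DRAM architecture effectively reduces access latency and thereby improves the overall performance of 3D integrated computing systems.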
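
One way to picture coarse-grained partitioning is that every die holds complete storage arrays and the shared address bus carries a field that selects the die. The decoder below is only an illustration of that idea: the address layout, the field sizes, and the function name `decode_address` are assumptions and are not taken from the thesis.

```python
def decode_address(addr, num_dies=4, rows_per_die=1024, cols_per_row=64):
    """Split a flat address into (die, row, column) for a stacked DRAM.

    Hypothetical layout: the lowest field selects the column, the next the
    row, and the highest selects which die answers on the shared bus.
    """
    col = addr % cols_per_row
    row = (addr // cols_per_row) % rows_per_die
    die = addr // (cols_per_row * rows_per_die)
    if die >= num_dies:
        raise ValueError("address exceeds the capacity of the die stack")
    return die, row, col

print(decode_address(64 * 1024 + 65))
```

Because only the die-select field differs between layers in this model, the address and data wires can be common to all dies, which is the property that keeps the TSV count low.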

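In future 3D integrated microprocessors, vertically stacked dies shield one another, so different die layers are affected to different degrees by particle-induced soft errors. Research shows that the outer dies can shield the inner dies from particle radiation, a phenomenon known as the shielding effect. Inspired by the shielding effect in 3D microprocessor architectures, this thesis proposes a soft-error-tolerant 3D cache architecture. Because the outer dies block alpha radiation for the inner-layer circuits, the inner-layer circuits may be inherently immune to alpha radiation and therefore soft error tolerant, so their error correction circuits can be omitted. Consequently, accessing cache data on the inner dies that are not threatened by soft errors takes much less latency and energy than on the other dies. Furthermore, we develop several techniques to dynamically move data from the outer dies to the inner dies, so that cache accesses concentrate on the die layers that are not affected by soft errors.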

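For the L1 cache, we propose an inner-layer direct-mapped cache architecture that maximizes accesses to inner-layer data while avoiding the power wasted on accessing unnecessary data. For lower-level caches, we propose decoupling the direct correspondence between tag and data blocks to compensate for their relatively poor access locality. This 3D cache architecture significantly improves processor performance and energy efficiency.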
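
The effect of moving data toward the shielded dies can be sketched with a single cache set split into inner and outer ways. This is a hypothetical model, not the thesis's mechanism: the promote-on-access policy, the LRU demotion, the class name `ShieldedCacheSet`, and the latency values are all assumptions made for illustration.

```python
class ShieldedCacheSet:
    """Toy cache set with fast inner-die ways (no ECC) and slower outer-die ways."""

    def __init__(self, inner_ways, outer_ways, inner_latency=2, outer_latency=4):
        self.inner_ways = inner_ways
        self.outer_ways = outer_ways
        self.inner_latency = inner_latency  # hypothetical cycles, no ECC decode
        self.outer_latency = outer_latency  # hypothetical cycles, with ECC decode
        self.inner = []  # tags on inner dies, least recently used first
        self.outer = []  # tags on outer dies, least recently used first

    def access(self, tag):
        """Return the hit latency, or None on a miss; always leaves tag on an inner die."""
        if tag in self.inner:
            self.inner.remove(tag)
            self.inner.append(tag)
            return self.inner_latency
        outer_hit = tag in self.outer
        if outer_hit:
            self.outer.remove(tag)
        # Promote (or fill) into the inner die, demoting the LRU inner block.
        self.inner.append(tag)
        if len(self.inner) > self.inner_ways:
            self.outer.append(self.inner.pop(0))
            if len(self.outer) > self.outer_ways:
                self.outer.pop(0)  # evict from the set entirely
        return self.outer_latency if outer_hit else None

s = ShieldedCacheSet(inner_ways=1, outer_ways=1)
print([s.access(t) for t in ["A", "B", "A", "A"]])
```

In this trace the second access to A hits on an outer die and is promoted, so the following access to A is served at the lower inner-die latency.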

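Finally, this thesis analyzes the performance and power improvements of future 3D integrated video processing circuits. As video processing algorithms grow more complex, memory bandwidth has become the main bottleneck of advanced video coding and display processing systems, and this bandwidth shortage will worsen further. Because 3D logic-memory integration provides a large number of vertical interconnects, it will have a major impact on video processing applications that need large memory capacity and high bandwidth. To quantify the performance and power gains of a 3D integrated video processing system, the thesis further develops a 3D integrated motion estimation accelerator that integrates seamlessly into a multimedia multi-core processing system. We propose a 3D integrated DRAM architecture and image frame storage strategy, and design a fully parallel 2D motion estimation acceleration method that uses 3D integrated DRAM to reduce system power. The method seamlessly supports various motion estimation algorithms, including the variable block-size motion estimation in the H.264/AVC coding standard. Taking multi-frame motion estimation as an example, we use hardware design and DRAM modeling tools to demonstrate the energy efficiency of the accelerator.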
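
For readers unfamiliar with the workload, motion estimation is commonly done by block matching: for each block of the current frame, search a window of the reference frame for the displacement with the smallest sum of absolute differences (SAD). The sketch below is a plain sequential full search, given only to show why the task is memory intensive; it is not the parallel accelerator described above, and the function names are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n, search_range):
    """Return (best_sad, dx, dy) over all displacements within the search range."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Reference frame with unique pixel values; the current frame is the same
# content displaced by 2 pixels horizontally and 1 pixel vertically.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[(y + 1) * 8 + (x + 2) for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=2, search_range=2))
```

Every candidate displacement rereads a full block of reference pixels, so the number of memory reads grows with the square of the search range, which is where the high bandwidth demand comes from.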

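Combining memory system design requirements with the latest advances in integrated circuit technology, this thesis proposes systematic solutions to several key problems in computer memory system design. All of the proposed architectures and methods have been validated with system-level and circuit-level simulation tools. Circuit-level memory design mainly uses hardware circuit design and simulation together with memory modeling tools to estimate circuit parameters. System-level performance is evaluated with single-core and multi-core processor system simulators, which provide a full assessment of the processing capability and power consumption of the proposed architectures.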

Keywords: memory architecture; reliability; fault tolerance; 3D integration

Architecture Design of High Performance and Reliable Computer Memory Systems

Sun Hongbin

ABSTRACT

Scaling of CMOS devices has provided remarkable improvements in the performance of integrated circuits over the past few decades. Moore’s law predicts that processor speed doubles every 18 months because of technology scaling. Memory speed, however, has increased by only about 7% per year. As a consequence, the processor-memory speed gap doubles every 21 months, which is known as the “memory wall”. To bridge the processor-memory gap, the computer memory hierarchy, including both cache and main memory, has played a key role in alleviating the effect of memory slowness. As CMOS technology continues to scale down, designing a high-performance and reliable memory hierarchy has become a grand challenge. The yield and reliability of cache memory are threatened by both hard faults and soft errors. Meanwhile, emerging three-dimensional (3D) integration technology provides better approaches to addressing the memory wall. As a consequence, the design of high-performance and reliable memory architectures has become a critical technique in computer systems. This thesis makes several important contributions to mitigating the processor-memory gap and improving the reliability of the memory hierarchy.

First, we present a cost-efficient built-in repair analysis (BIRA) approach to improve the yield of embedded memory. As embedded memories become more and more dominant in system-on-chip (SoC) design, it is crucial to achieve sufficiently high embedded memory yield. Because of the increasing number of diversified embedded memories on chip, external memory testing and redundancy repair analysis become inadequate, and the use of BIRA becomes more attractive and even indispensable. All prior work on BIRA assumed that defects can only be repaired by redundant rows or columns. Motivated by the fact that most embedded memories use error correction code (ECC) to uniformly protect all memory words from soft errors, we propose to leverage the existing on-chip error correction circuit to enable very low-cost BIRA implementations while maintaining the same or an even higher defect repair rate and the same soft error tolerance.

Second, we propose a defect-tolerant L2 cache memory that uses multi-bit error correction codes. Potential faults in SRAM can be parametric or catastrophic defects or transient soft errors, both of which are becoming increasingly serious as the technology feature size shrinks. In conventional design practice, memory defects are handled by using spare (or redundant) rows, columns, and/or words to repair (i.e., replace) the defective ones, while soft errors are handled by single-error-correcting codes. As technology continues to scale down, the traditional repair-only defect tolerance strategy may no longer be sufficient to ensure high enough yield. Although strong multi-bit ECCs appear to be a natural choice for improving reliability, it is commonly believed that multi-bit ECCs incur prohibitive performance degradation and silicon and energy cost for cache memory. This work examines the feasibility and potential of using multi-bit ECC to tolerate a large number of random defects in the L2 cache without loss of soft error tolerance. It does not intend to develop any new multi-bit ECC; instead, we focus on how to enable the effective use of multi-bit ECC in the L2 cache. Since cache blocks containing one or more defective cells can be identified during memory testing, an intuitive and better choice is to apply multi-bit ECC only to the cache blocks that need it, instead of uniformly protecting all cache blocks. Such selective use of multi-bit ECC may largely alleviate the impact on overall cache performance and area overhead. A straightforward implementation must perform a content-addressable memory (CAM) based run-time table look-up to check whether the cache block being accessed should be protected by multi-bit ECC. However, although a direct CAM-based realization is straightforward, its effectiveness may be inadequate in the presence of a relatively high random defect density for two main reasons: (i) as the random defect density increases, a larger percentage of cache read operations invoke multi-bit ECC decoding, which directly degrades overall system performance such as IPC; (ii) since the energy consumption of CAM is much larger than that of normal SRAM and the size of the CAM increases with the random defect density, a significant energy overhead is incurred. By supplementing a conventional L2 cache core with several special-purpose small caches and buffers, we can greatly reduce the silicon cost and minimize the probability of explicitly executing multi-bit ECC decoding on the cache read critical path. Moreover, the proposed L2 cache design maintains the same level of soft error tolerance.

Three-dimensional (3D) integration is emerging as an attractive technology for microprocessor design and provides a viable and promising option for addressing the well-known memory wall problem in high-performance computing systems. Based on 3D integration technology, we develop a 3D DRAM design that applies a coarse-grained 3D partitioning strategy, which requires far fewer through-silicon vias (TSVs) and imposes less stringent constraints on TSV pitch than prior work. The key is to share the global routing of the memory address and data buses among all the DRAM dies through coarse-grained TSVs with a small pitch. We also investigate the potential of using 3D DRAM to implement both the L2 cache and main memory in 3D multi-core processor-DRAM integrated computing systems. In contrast to the common impression that DRAM is much slower than SRAM, we show with a modified CACTI tool that a 3D DRAM L2 cache may achieve a speed comparable to or even faster than a 2D SRAM L2 cache. By employing these design techniques, the proposed 3D DRAM design can effectively reduce access latency and hence improve the overall performance of 3D integrated computing systems.

3D microarchitecture provides another interesting advantage: circuits on different dies may exhibit heterogeneous soft error vulnerabilities because of the shielding effect of die stacking. Recent research characterized microarchitecture soft error vulnerabilities across 3D-stacked chip dies and concluded that the inner dies can be shielded from particle strikes by the outer dies. Motivated by the shielding effect in 3D microarchitecture, we propose a soft error resilient 3D cache architecture. The underlying idea is to eliminate the error correction circuits on the soft error invulnerable dies (SIDs), since the inner dies may be inherently invulnerable to soft errors because they are implicitly protected from particle strikes by the outer dies. As a result, data access on the SIDs incurs much less latency and energy dissipation. Moreover, we develop techniques that enable dynamic data block movement in cache memory, which effectively maximizes data accesses on the SIDs. For the L1 cache, we propose a SID direct-mapped cache architecture that maximizes accesses to the SIDs while avoiding the energy wasted on useless data accesses. For lower-level caches, we propose to decouple the tag entry from the data block to compensate for the relatively poor locality of these caches. The overall cache hierarchy achieves a significant improvement in performance and energy efficiency.

Finally, we analyze the potential benefits of 3D-stacked video processing circuits in terms of performance and energy consumption. Memory bandwidth has become the primary bottleneck of advanced video coding and display processing systems. The bandwidth deficiency in video processors may become even worse as more sophisticated algorithms are used to further improve performance. We show that 3D integration will have a significant impact on memory-intensive video processing, given the massive logic-memory interconnect bandwidth enabled by die stacking. To demonstrate these advantages quantitatively, we develop a 3D integrated motion estimation accelerator that can be integrated into a multimedia multicore processor. We develop a 3D integrated DRAM organization and image frame storage strategy geared to motion estimation, and apply a fully parallel 2D motion estimation computation engine that takes advantage of the 3D stacked DRAM to minimize energy consumption. The proposed approach seamlessly supports various motion estimation algorithms, including the variable block-size motion estimation (VBSME) adopted in H.264/AVC. We present a case study on multi-frame motion estimation that applies the proposed accelerator design, based on DRAM performance modeling and ASIC design, to demonstrate its energy efficiency.

By focusing on the design requirements of the memory hierarchy and recent advances in semiconductor technology, this thesis proposes several efficient architecture solutions to critical problems in computer memory systems. All the architectures and approaches proposed in this thesis are extensively validated using system-level and/or circuit-level simulation tools. The electrical properties of the memory circuit designs are mainly evaluated and estimated with circuit design and simulation tools, while unicore and multicore microprocessor simulators are used to evaluate the computational capability and energy consumption of the proposed architectures.

Key words: Memory hierarchy; Reliability; Defect tolerance; 3D integration
