
Pulse-latch approach reduces dynamic power

ExASIC
Published 2020-07-17 11:31:59
Collected in the column: ExASIC

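Discussion of latches usually centers on their drawbacks, such as time borrowing in timing analysis, and on how to avoid inferring latches in digital design. Yet at some large companies, latches are routinely used to build high-speed digital circuits such as CPU cores. The author of this article describes a way to save dynamic power by driving latches with a pulsed clock, and results from several projects show a saving of around 20% in dynamic power. We should not be afraid of latches; we should learn to make good use of their advantages.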

Original article: https://www.eetimes.com/pulse-latch-approach-reduces-dynamic-powe

Recently, many methodologies have been introduced for reducing dynamic power in systems-on-chip (SoCs). These methodologies, however, either impose restrictive physical constraints that affect the schedule or, like clock gating, depend heavily on the logic function.

This article presents an elegant methodology that uses pulsed latches instead of flip-flops without altering the existing design style. It reduces the dynamic power of the clock network, which can consume half of a chip's dynamic power. Real designs have shown approximately a 20 percent reduction in dynamic power using the methodology described below.

Introduction

Dynamic power is consumed across all elements of a chip. The clock network is one of the large consumers of dynamic power. According to a recent IBM study [1], half of dynamic power is dissipated in the clock network.

Therefore, reducing power in the clock network can significantly affect overall dynamic power. Designers already use a variety of techniques to reduce clock power: using smaller clock buffers, reducing the overall wiring capacitance, employing clock gating [2], and de-cloning to move clock buffers to higher levels of the hierarchy.

Even with these techniques, the dynamic power of the clock network can be large because registers are used as state elements in the design. In general, a flip-flop is used as the register. A conventional flip-flop is composed of two latches (master and slave) triggered by a clock signal.

Flip-flop synchronization with the clock edge is widely used because it fits well with static timing analysis (STA), and STA-based timing optimization is a must for SoCs. Alternatively, designers may choose to use a latch to store state. A latch is simpler and consumes much less power than a flip-flop. However, it is difficult to apply static timing analysis to a latch design because of the latch's data-transparent behavior.
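As a rough illustration of why the clock network matters, the sketch below uses the standard first-order switching-power model P = alpha * C * Vdd^2 * f. The function names and the capacitance, voltage, and frequency values are illustrative assumptions, not figures from the cited study; the only numbers taken from this article are the 50 percent clock share and the roughly 20 percent total saving.

```python
def dynamic_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """First-order switching power in watts: P = alpha * C * Vdd^2 * f."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


def required_clock_reduction(clock_share: float, total_saving: float) -> float:
    """Fraction by which clock power must drop to produce `total_saving`.

    Assumption (not from the article): all of the saving comes from the
    clock network and nothing else changes.
    """
    return total_saving / clock_share


# Illustrative values only: 1 nF of switched clock capacitance, 1.0 V, 500 MHz.
# A clock net toggles every cycle, so its activity factor is taken as 1.0.
clock_power = dynamic_power(alpha=1.0, cap_farads=1e-9, vdd=1.0, freq_hz=500e6)
print(f"clock power: {clock_power:.3f} W")

# With a 50% clock share, a 20% total saving implies a 40% cut in clock power.
print(f"clock reduction needed: {required_clock_reduction(0.5, 0.2):.0%}")
```

Under that assumption, a technique aimed only at the clock network has to remove a large fraction of clock power to move the chip-level number, which is why replacing the register element itself is attractive.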

A methodology has been developed which uses latches triggered with pulse clock waveforms. With this methodology, designers can apply static timing analysis and timing optimization to a latch design while reducing the dynamic power of the clock networks. The following describes this pulsed latch design methodology in detail and gives some guidelines as to how designers can apply this methodology in their designs.

Pulsed latch concept

A latch can capture data during the sensitive time determined by the width of the clock waveform. If a pulse clock waveform triggers a latch, the latch is synchronized with the clock in much the same way as an edge-triggered flip-flop, because the rising and falling edges of the pulse clock are almost identical in terms of timing.

With this approach, the setup times of a pulsed latch are characterized with respect to the rising edge of the pulse clock, and the hold times with respect to the falling edge. This means that the timing models of pulsed latches are represented similarly to those of edge-triggered flip-flops.

The pulsed latch requires pulse generators that generate pulse clock waveforms from a source clock. The pulse width is chosen so that the latch can reliably complete its transition. The following diagram represents a simple pulse generator and the associated pulse waveform.
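The setup and hold conventions above can be sketched as two slack calculations. This is a simplified model under stated assumptions: an ideal clock with no skew or jitter, data launched on the rising edge of the pulse, and hypothetical parameter names (`t_cq`, `t_setup`, `t_hold`, `pulse_width`) that do not come from any particular timing library.

```python
from dataclasses import dataclass


@dataclass
class PulsedLatchTiming:
    period: float       # clock period (ns)
    pulse_width: float  # width of the clock pulse (ns)
    t_cq: float         # clock-to-Q delay of the launching latch (ns)
    t_setup: float      # setup time, referenced to the rising edge (ns)
    t_hold: float       # hold time, referenced to the falling edge (ns)


def setup_slack(t: PulsedLatchTiming, max_logic_delay: float) -> float:
    """Data launched at one rising edge must arrive t_setup before the next."""
    return t.period - (t.t_cq + max_logic_delay + t.t_setup)


def hold_slack(t: PulsedLatchTiming, min_logic_delay: float) -> float:
    """New data must not arrive until t_hold after the falling edge of the
    same pulse, so the pulse width subtracts directly from the hold margin."""
    return (t.t_cq + min_logic_delay) - (t.pulse_width + t.t_hold)


# Hypothetical numbers: a 2 ns clock with a 150 ps pulse.
timing = PulsedLatchTiming(period=2.0, pulse_width=0.15,
                           t_cq=0.10, t_setup=0.05, t_hold=0.02)
print(f"setup slack: {setup_slack(timing, max_logic_delay=1.5):+.2f} ns")
print(f"hold slack:  {hold_slack(timing, min_logic_delay=0.05):+.2f} ns")
```

In this toy case the setup check passes comfortably while the short path fails hold, which reflects the tight hold margins that come with a nonzero pulse width.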

Figure 1 Pulse generator and waveform
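One common way to build such a generator is to AND the source clock with a delayed, inverted copy of itself, so that the output is high only for the delay interval after each rising edge. Whether Figure 1 uses exactly this circuit is an assumption; the function name and the sample-based time model below are purely illustrative.

```python
def pulse_generator(clk: list[int], delay: int) -> list[int]:
    """Return pulse[i] = clk[i] AND NOT clk[i - delay].

    `clk` is a sampled 0/1 waveform and `delay` is the delay of the
    inverting path, in samples. Samples before time 0 are treated as 0
    (clock idle low).
    """
    pulse = []
    for i, level in enumerate(clk):
        delayed = clk[i - delay] if i >= delay else 0
        pulse.append(level & (1 - delayed))
    return pulse


# Two periods of a 50% duty-cycle clock, 8 samples per period.
clk = [1, 1, 1, 1, 0, 0, 0, 0] * 2
print(pulse_generator(clk, delay=2))
# prints [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
```

In this model the pulse width equals the delay of the inverting path, which is the knob a designer would tune to trade capture reliability against hold margin.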

In this methodology, the pulse generators are inserted automatically during clock-tree synthesis so as to satisfy several rules. Along with pulse generators, this approach also uses a number of matching delay cells to match clock insertion delays between paths with and without pulse generators.

Designing with pulsed latches

With this methodology, conventional edge-triggered flip-flops are used before clock tree synthesis (CTS). During CTS, pulsed latch replacement and pulse generator insertion are performed. To use pulsed latches in a design, the complete methodology should be implemented, including:

  • Pulsed latch replacement and pulse generator insertion
  • Skew and slew control of the clock tree
  • Timing analysis and optimization
  • Power analysis
  • Pulse latch design rule checking

A design can have a mixture of pulsed latches and edge-triggered flip-flops because some of the flip-flops cannot be replaced with pulsed latches. The methodology should support all designs.

Pulsed latch replacement and pulse generator insertion

Pulse generators must be inserted into the clock network in a way that accounts for the power trade-off between the added pulse generators and the pulsed latches they drive. The methodology therefore employs an objective function in clock tree synthesis (CTS) to pick the clock-tree structure that gives the most efficient power reduction through pulsed latch replacement. Pulsed latches are then substituted for existing flip-flops wherever such substitution is possible.

Consideration is given to flip-flops connected to primary ports for timing model generation when a bottom-up design approach is applied. For those flip-flops transitioning on a different clock edge and those with tight hold time margins, the approach applies local cloning of clock trees to ensure maximum use of pulsed latches.
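The power trade-off described above can be illustrated with a toy objective: a cluster of flip-flops is converted only if the per-register saving outweighs the overhead of the pulse generator and any matching delay cells. The article does not give the actual objective function used during CTS, so the functions, their parameters, and the power numbers below are hypothetical.

```python
def net_power_saving(num_registers: int,
                     ff_clock_power: float,
                     latch_clock_power: float,
                     pulse_gen_power: float,
                     delay_cell_power: float = 0.0) -> float:
    """Net clock-power saving (same units as the inputs) from converting a
    cluster of flip-flops that would share one pulse generator."""
    saving = num_registers * (ff_clock_power - latch_clock_power)
    overhead = pulse_gen_power + delay_cell_power
    return saving - overhead


def should_replace(num_registers: int, **powers: float) -> bool:
    """Convert the cluster only when the net saving is positive."""
    return net_power_saving(num_registers, **powers) > 0


# Hypothetical per-cell clock power in microwatts.
powers = dict(ff_clock_power=2.0, latch_clock_power=1.0, pulse_gen_power=6.0)
print(should_replace(4, **powers))   # small cluster: overhead dominates
print(should_replace(16, **powers))  # larger cluster: saving dominates
```

The point of the sketch is that a pulse generator only pays for itself when it is shared by enough latches, which is why the choice is tied to the clock-tree structure rather than made register by register.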

Skew and slew control

To insert pulse generators that control pulsed latches, clock tree synthesis needs to maintain skew balancing and the required clock slew. Unless CTS has good control of skew and slew, it is very difficult to find an optimal placement of pulsed latches.

As pulse generators are inserted, the delay must be matched with that of the other branches by using delay cells. To ensure the clock pulse is not degraded, CTS should have good control of the slew across the entire clock tree. This methodology also allows designers to specify different slew constraints for the clock tree before and after the pulse generator, providing a trade-off between maintaining pulse shape and reducing power consumption.
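Delay matching can be pictured as padding every branch up to the insertion delay of the slowest one. The branch names and delay values below are made up for illustration, and a real CTS tool would choose from discrete delay cells rather than hit an exact value.

```python
def matching_delays(branch_delays: dict[str, float]) -> dict[str, float]:
    """Extra delay each branch needs so that all branches match the slowest."""
    target = max(branch_delays.values())
    return {name: target - delay for name, delay in branch_delays.items()}


# Hypothetical insertion delays in ns: the branch with a pulse generator is
# slower, so the plain flip-flop branch needs a matching delay cell.
branches = {"with_pulse_gen": 0.42, "flip_flop_only": 0.30}
for name, extra in matching_delays(branches).items():
    print(f"{name}: add {extra:.2f} ns")
```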

Timing analysis and optimization

Pulsed latches have timing libraries similar to those of conventional flip-flops, allowing designers to fully utilize conventional static timing analysis. The timing reports for a pulsed latch design are exactly the same as those for edge-triggered flip-flops, although special care should be taken with hold timing analysis.

Conventional timing optimization is also performed on a pulsed latch design, but it must account for the tight hold time margins. Since this pulsed latch methodology is fully compatible with the entire timing optimization flow, designers can natively take coupling noise effects into account during optimization.

Power analysis

Power analysis must distinguish pulsed latches from normal flip-flops and apply the appropriate power values to each. Moreover, additional power is consumed in the pulse generators and delay cells. These power numbers must be included in the analysis to obtain an accurate picture of the overall power savings.

Pulse latch design rule checking

Since the pulsed latch methodology coexists with normal flip-flops, a number of pulse latch design rules must be checked to ensure design integrity:

  • Minimum and maximum slew limits on the clock network
  • Multiple pulse generators or dummy cells in the same clock path
  • Negative edge-triggered flip-flop or macro driven by the pulse clock
  • Pulsed latches not driven by the pulse clock

Results

This automated pulse latch design methodology has been implemented and tested in several production chips. On average, there was a 20 percent reduction in total dynamic power consumption. An example of a normal implementation and the implementation using pulsed latches is shown below:

Figure 2 Pulse latch replacement and CTS, before and after

Conclusion

Designing with pulsed latches is a novel approach that saves roughly 20 percent of dynamic power in nanometer designs. Reducing dynamic clock power is particularly important in high-frequency designs as well as in designs with high flip-flop counts. This article presents the requirements for an automated methodology to handle this new circuit element. Several ASICs have been designed using this methodology and show a promising power improvement.

Originally published 2020-05-23 on the ExASIC WeChat official account.
