System Power Management -- A HOT Topic

By Kerry Glover

System complexity and speed continue to grow every year at rates only dreamed about a few years ago. Had you asked personal computer (PC) owners in 1995 if they would ever own a 500-MHz computer system, the majority would have thought it impossible, yet it happened within four years. And less than three years after 500-MHz speeds were achieved, we expect to see systems operating in the GHz range, more than a 2X increase in speed. How is this possible?

Moore's Law has been used by the industry for many years to predict the constant progression of technology. The exponential growth of speeds and densities resulting from technology advancement is evident in Intel's processor clock frequency evolution (see graph below), and has been a key enabler for high-speed systems. However, these advances often come at the expense of power.

[Graph: Intel processor clock frequency evolution]

Increased power to support the advancement of system performance can mean higher power dissipation, resulting in increased heat generation inside the system enclosure. Heat is enemy No. 1 when it comes to reliability, for both electronic and electro-mechanical components. Present-day PC processors alone dissipate more than 20 W. The development of higher-performance graphics processors, memory and HDDs (hard-disk drives), which are required to realize the benefits of increased processor speeds, also adds to the heat challenge inside the PC case. For the PC user, if temperatures rise to the point that vital components malfunction, work and data will likely be lost. If temperatures are allowed to rise even higher, permanent damage to the system can occur.

As system performance has continued to increase, the ability to adequately address heat from power dissipation has also improved.
Designers can use IC technology and design techniques to reduce power dissipation through process feature-size reduction, along with lower voltage and improved architecture, but these techniques have limitations. History has shown that regardless of improvements in IC technology to reduce power dissipation, the desire for higher system performance, driven by the "greed for speed," has ultimately driven power up to the point of thermal limitations. This is evident in Intel's processor power dissipation versus clock frequency and process technology (see graph below). Despite the dramatic reduction in power dissipation at 266 MHz and 300 MHz with the introduction of 0.25-µm, 2.0-V technology, processor power is quickly approaching 35 W. How long will it be before 40 W is exceeded?

[Graph: Intel processor power dissipation versus clock frequency and process technology]

To remove heat and maintain lower temperatures inside the system case, cooling fans have been commonplace in desktop PCs and are becoming more prevalent in notebook PCs. But even when cooling fans are operating, the die temperatures of critical components can become excessively high if power is not managed. Thus, the need and the ability to monitor and react to the temperatures of particular critical components have evolved.

Because of the long thermal time constants that govern system temperature dynamics, fault prediction can be used as a solution in some situations. In systems that require a fan for adequate cooling, as most do today, the fan can be monitored to ensure it is operating properly. If the fan stops spinning, a rise in internal temperature or outright overheating can be predicted from the loss of cooling. This is only a first-level approach, however, because it does not address heating concerns while the cooling fan is operating. Actual temperature monitoring inside the system is an effective method to detect thermal issues while the cooling system is functioning.
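
The fan-monitoring idea described above can be illustrated with a minimal sketch. It assumes a fan with a tachometer output that produces two pulses per revolution and a hypothetical minimum acceptable speed of 1000 RPM; neither figure comes from the article, and the function names are invented for illustration.

```python
def fan_rpm(pulse_count: int, window_s: float, pulses_per_rev: int = 2) -> float:
    """Convert tachometer pulses counted over a time window into RPM.

    pulses_per_rev=2 is an assumption, not a value from the article.
    """
    if window_s <= 0 or pulses_per_rev <= 0:
        raise ValueError("window_s and pulses_per_rev must be positive")
    revolutions = pulse_count / pulses_per_rev
    return revolutions * 60.0 / window_s


def fan_fault(rpm: float, min_rpm: float = 1000.0) -> bool:
    """Return True when the fan is stalled or slower than the assumed minimum.

    A True result predicts a temperature rise before it can be measured.
    """
    return rpm < min_rpm
```

For example, `fan_rpm(100, 1.0)` gives 3000.0 RPM, and `fan_fault(0.0)` reports a fault for a stopped fan. As the text notes, this check says nothing about overheating that occurs while the fan is still spinning.
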
A thermal-measuring device can be implemented to detect the temperature rise of critical system components, but this approach also has its challenges. Each electronic component in the system dissipates some amount of power and has a thermal resistance and a thermal mass, which together determine how long the temperature takes to rise and be measured. In an IC, the source of heat from power dissipation is the die, which is located inside the device's package. If the die temperature is rising and the thermal sensor is a thermocouple on the outside of the package, the sensor registers the rise only after a time lag. In addition, the actual die temperature will be higher than the external measurement, because the heat must flow across the package's thermal resistance. Depending on the amount of power being dissipated and the thermal resistance of the device's package, the difference between die temperature and the external measurement can be 15 °C or higher.

Both problems associated with an external measurement can be addressed by measuring the IC die temperature directly, which gives a quicker response to temperature increases and reports the actual die temperature. Accurate and fast thermal sensing directly on the chip die is critical for a highly reliable hardware monitoring thermal subsystem. Mixed-signal temperature monitoring products have been developed that perform the measurement using only a diode on the die of the targeted device as the thermal sense element. This technique is highly reliable, as it remotely measures temperature at the target location. It does not require system-to-system calibration of temperature accuracy and is a low-cost solution. Implementation requires two extra pins for connection to the on-die diode of the target device, but no extra components, such as thermocouples or RTDs, are necessary.
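
The offset and lag of an external measurement described above can be put into numbers with a minimal sketch. It assumes the simplest possible model, a single thermal resistance and a single thermal mass giving a first-order response; the function names and the example values (30 W, 0.5 °C/W) are hypothetical and chosen only to reproduce the 15 °C figure mentioned in the text.

```python
import math


def die_temperature(external_c: float, power_w: float, theta_c_per_w: float) -> float:
    """Steady-state die temperature: external reading plus power times thermal resistance."""
    return external_c + power_w * theta_c_per_w


def thermal_time_constant(theta_c_per_w: float, thermal_mass_j_per_c: float) -> float:
    """Time constant in seconds of a single resistance/mass thermal model."""
    return theta_c_per_w * thermal_mass_j_per_c


def sensed_rise(step_rise_c: float, elapsed_s: float, tau_s: float) -> float:
    """Part of a step temperature rise visible at the external sensor after elapsed_s.

    Assumes a first-order lag, which is a simplification of a real package.
    """
    return step_rise_c * (1.0 - math.exp(-elapsed_s / tau_s))
```

With these assumed values, `die_temperature(60.0, 30.0, 0.5)` returns 75.0, so the die is 15 °C hotter than the external reading. After one time constant, `sensed_rise` reports only about 63% of a step change, which is the lag that direct on-die sensing avoids.
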
The ability to predict a thermal problem, even to the extent of measuring a CPU's die temperature, provides only one aspect of a thermal power management solution. How the system acts on the thermal information is another. One highly beneficial implementation is for the system to execute a soft recovery from a problem to prevent the user from losing data. This involves reducing power dissipation as much as possible without losing system functionality and then taking action to prevent loss of data. Data loss can be prevented by alerting the user to an impending problem and advising them to save their work or, better yet, by having the system save the information automatically before shutting down.

Several methods can be used to ensure a soft failure, or even reduced capability with no failure. When a high-temperature condition is detected, the first line of defense is to increase cooling by raising fan speed. A cooling fan that normally runs only at the speed required to maintain adequate cooling leaves headroom for additional cooling when needed. In addition, operating a fan below its rated RPM during normal operation extends fan life (increases MTBF) and keeps fan noise to a minimum. A second line of defense is to lower the CPU clock speed, which reduces power dissipation and performance without crippling the system. This response could also apply to a low-battery condition. Another safeguard is to alert the applications running on the system so that they save automatically. A final protection mechanism is to send a message to the operating system asking it to shut down.

One highly effective method in use today is to monitor the CPU die temperature and, based on that, increase fan speed, reduce processor power by lowering the CPU clock frequency, or do both.
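
The lines of defense listed above can be expressed as a simple escalation policy. The sketch below is only an illustration: the temperature thresholds and the names `Action`, `THRESHOLDS` and `actions_for` are hypothetical and do not come from the article or from any specific product.

```python
from enum import Enum
from typing import List


class Action(Enum):
    INCREASE_FAN = "increase fan speed"
    THROTTLE_CPU = "lower CPU clock frequency"
    REQUEST_SAVE = "ask applications to save"
    REQUEST_SHUTDOWN = "ask the operating system to shut down"


# Hypothetical die-temperature thresholds in degrees C, mildest response first.
THRESHOLDS = [
    (70.0, Action.INCREASE_FAN),
    (80.0, Action.THROTTLE_CPU),
    (90.0, Action.REQUEST_SAVE),
    (95.0, Action.REQUEST_SHUTDOWN),
]


def actions_for(die_temp_c: float) -> List[Action]:
    """Return every defense whose threshold has been reached, mildest first."""
    return [action for limit, action in THRESHOLDS if die_temp_c >= limit]
```

Under these assumed thresholds, a reading of 85 °C yields the fan and clock responses together, matching the combined approach mentioned in the text, while a reading of 60 °C yields no action at all.
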
The die-temperature method relies on a hardware monitoring subsystem that performs periodic temperature measurements and converts them to digital words, which can then be compared against digital temperature limits to assert interrupts. The subsystem communicates with an embedded controller, which handles the local interrupts and decides on adjustments to fan speed or CPU frequency (see diagram below).

[Diagram: hardware monitoring subsystem and embedded controller]

A consortium of industry leaders, including Intel, Microsoft and Toshiba, has developed an open specification for the PC industry, ACPI (Advanced Configuration and Power Interface), which defines the interfaces between OS software, BIOS software and hardware involved in PC power management under OS control. A portion of this specification relates to hardware monitoring and control for the purpose of thermal management, as described above.

A new line of ACPI-compatible HMC (Hardware Monitoring & Control) products, developed by TI, is targeted toward PC thermal management solutions. The THMC10 device, intended mainly for notebook PCs, includes local and remote temperature monitoring with programmable temperature limits via an SMBus interface. The THMC50 device, intended mainly for corporate desktop PCs, includes the same temperature monitoring functions, but adds cooling fan speed control, over-temperature failsafe detection and alerting, and dual voltage supervisor/RESET generators. In both devices, the IC implementation of the algorithm for accurately measuring temperatures from remote and local diodes, and their conversion to digital words, is based on TI's expertise in mixed-signal and analog design.

To deal effectively with the heat generated by today's CPUs, graphics processors and memory, PC manufacturers need a thermal management solution that lets the system adaptively manage heat dissipation.
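
The monitoring loop described earlier, in which measurements are converted to digital words and compared against programmable limits to assert interrupts, can be sketched as follows. The data format here (a signed 8-bit code at 1 °C per count) and the names `Limits`, `to_code` and `interrupt_flags` are assumptions made for illustration; they are not taken from the THMC10 or THMC50 documentation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Limits:
    low_code: int   # interrupt if the reading falls below this code
    high_code: int  # interrupt if the reading rises above this code


def to_code(temp_c: float) -> int:
    """Quantize a temperature to an assumed signed 8-bit word, 1 degree C per count."""
    return max(-128, min(127, round(temp_c)))


def interrupt_flags(code: int, limits: Limits) -> Dict[str, bool]:
    """Compare a digital reading against the programmed limits."""
    return {"high": code > limits.high_code, "low": code < limits.low_code}


def interrupt_asserted(code: int, limits: Limits) -> bool:
    """True when either limit is violated and the controller should be notified."""
    return any(interrupt_flags(code, limits).values())
```

In this sketch, the embedded controller would program `Limits`, wait for `interrupt_asserted` to become true, and then decide whether to adjust fan speed or CPU frequency.
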
As PC OEMs produce systems with higher-performance CPUs, graphics processors, dense memory and high-capacity disk drives to meet their customers' desires, they must re-evaluate their HMC devices to ensure that the system can manage the heat dissipation. As systems continue to increase in performance, the need for efficient thermal management devices will continue to grow.

Copyright © 2003 ChipCenter-QuestLink