ChipCenter Questlink

Improving Performance in Complex Programmable Logic Devices (CPLDs) with the FPGA Express Software

By Phil Simpson, Altera Corporation (psimpson@altera.com)

Introduction
As the demand for improved performance increases, designs must be constructed for maximum logic optimization. Achieving best performance is possible by using efficient VHDL and Verilog HDL coding techniques, FPGA Express design techniques, and FPGA Express constraints and settings. In a typical design flow, after creating the HDL description of the design, the design is synthesized in FPGA Express and an EDIF netlist file is generated to be imported into the place and route tool. However, it may be necessary to iterate through the design process using all or some of the techniques discussed below until the design specifications are met.

The first step in this iterative process is to write HDL code that is "architecture-aware" and "synthesis-aware." This means that your HDL source code should use constructs that exploit the architectural features of PLDs, such as the abundance of registers and the embedded memory blocks. While writing HDL code, you also need some knowledge of how synthesis tools generally interpret particular HDL styles. The next step is to use design methodologies, such as pipelining and logic duplication, to improve the design implementation. The final step is to use the options and constraints available in the FPGA Express software or in the place and route tool to achieve the desired results.

Effective HDL Design Techniques for the FPGA Express Software
By using effective HDL design techniques in the FPGA Express software, you can streamline your designs, reduce and optimize logic, reduce logic delay and improve overall performance. This section describes the following techniques:

  • Hierarchy
  • Latches vs. Registers
  • Priority-encoded If statements
  • "Don't-care" conditions
  • Gated clocks
  • State machines
Hierarchy: A very simple technique is to create a hierarchical design. The functionality of many designs is too complex to implement in a single design file. FPGA Express software allows you to have multiple design files and link the files into a hierarchy. You can build up the design through standard VHDL or Verilog HDL instantiations, and optimize sub-designs individually rather than optimizing the entire design. Partition the design at functional boundaries, and in a way that minimizes the I/O connections between blocks.

After creating a hierarchical design, you have the option of flattening the design during synthesis to allow FPGA Express to optimize across the boundaries. However, if you have different optimization conditions (area or speed) for different blocks of your design, you now have the option of preserving the hierarchy and setting the desired constraints during synthesis.
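As a minimal sketch of the instantiation style described above, the following VHDL links a leaf block into a top-level design through a component declaration. The names and_reg, top, and result are hypothetical and chosen only for illustration; in practice each entity would normally live in its own design file.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical leaf block: a registered 2-input AND.
entity and_reg is
  port (clk, x, y : in  std_logic;
        q         : out std_logic);
end and_reg;

architecture rtl of and_reg is
begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      q <= x and y;
    end if;
  end process;
end rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top level: instantiates the leaf block twice.
entity top is
  port (clk, a, b, c, d : in  std_logic;
        result          : out std_logic);
end top;

architecture struct of top is
  component and_reg
    port (clk, x, y : in  std_logic;
          q         : out std_logic);
  end component;
  signal q0, q1 : std_logic;
begin
  u0 : and_reg port map (clk => clk, x => a, y => b, q => q0);
  u1 : and_reg port map (clk => clk, x => c, y => d, q => q1);
  result <= q0 or q1;
end struct;
```

Because and_reg is a separate entity, it can be optimized on its own or flattened into top, depending on the constraints chosen during synthesis.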

Latches vs. Registers: Since PLDs have registers built into the silicon, designing with latches generates more logic and lower performance than designing with registers. Therefore, when you are designing combinatorial logic, you should avoid unintentionally creating a latch due to your HDL design style. For example, when Case or If statements do not cover all possible conditions of the inputs, combinatorial feedback can generate latches. Figure 1 shows sample VHDL code that generates a latch.

library ieee;
use ieee.std_logic_1164.all;

entity des1 is
  port (a, b, c : in  std_logic;
        sel     : in  std_logic_vector(1 downto 0);
        oput    : out std_logic);
end des1;

architecture behave of des1 is
begin
  process (a, b, c, sel)
  begin
    if sel = "00" then
      oput <= a;
    elsif sel = "01" then
      oput <= b;
    elsif sel = "10" then
      oput <= c;
    end if;  -- no final else: oput holds its value when sel = "11"
  end process;
end behave;
Figure 1. Sample VHDL Code Unintentionally Generating a Latch

A latch is generated when the final ELSE clause or WHEN OTHERS clause is omitted from an If or Case statement, respectively. Figure 2 shows sample VHDL code that prevents the unintentional creation of a latch.

library ieee;
use ieee.std_logic_1164.all;

entity des2 is
  port (a, b, c : in  std_logic;
        sel     : in  std_logic_vector(1 downto 0);
        oput    : out std_logic);
end des2;

architecture behave of des2 is
begin
  process (a, b, c, sel)
  begin
    if sel = "00" then
      oput <= a;
    elsif sel = "01" then
      oput <= b;
    elsif sel = "10" then
      oput <= c;
    else
      oput <= 'X'; -- removes latch
    end if;
  end process;
end behave;
Figure 2. Sample VHDL Code Preventing Unintentional Latch Creation
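Another common way to cover every input condition is to assign a default value at the top of the process, which is the same approach the Verilog HDL code in Figure 3 takes. The sketch below is an illustrative variant of Figure 2, not part of the original design set; the entity name des3 and the default value '0' are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity des3 is
  port (a, b, c : in  std_logic;
        sel     : in  std_logic_vector(1 downto 0);
        oput    : out std_logic);
end des3;

architecture behave of des3 is
begin
  process (a, b, c, sel)
  begin
    oput <= '0';            -- default assignment covers sel = "11"
    if sel = "00" then
      oput <= a;
    elsif sel = "01" then
      oput <= b;
    elsif sel = "10" then
      oput <= c;
    end if;
  end process;
end behave;
```

Since oput is assigned on every pass through the process, no storage element is implied even though the If statement has no final ELSE clause.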

Priority-encoded If statements: To reduce the propagation delay of critical-path signals in a design, you can use If statements to perform priority encoding. Figure 3 illustrates good design practice if sel1 is a late-arriving signal in the critical path. In this case, sel1 has the highest priority.

module priority (a,b,c,d,sel1,sel2,sel3,sel4,oput);
  input a,b,c,d,sel1,sel2,sel3,sel4;
  output oput;
  reg oput; // assigned inside the always block, so it must be a reg
  always @(a or b or c or d or sel1 or sel2 or sel3 or sel4)
  begin
    oput = 1'b0; // default value, used when no select is asserted
    if (sel1)
      oput = a;
    else if (sel2)
      oput = b;
    else if (sel3)
      oput = c;
    else if (sel4)
      oput = d;
  end
endmodule
Figure 3. Verilog HDL for Priority-Encoded If Statement

Figure 4 shows the schematic representation of the Verilog code from Figure 3. The late-arriving signal sel1 is placed such that it passes through minimum logic.

Figure 4. Schematic Representation of Priority-Encoded If Statement from Figure 3

"Don't Care" conditions: The FPGA Express software generally treats unknowns as "don't care" conditions to optimize logic. Within a design, you can assign the default case value to "don't care" instead of to a logic value to give the best logic optimization. However, you must verify all "don't care" conditions in simulation.
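A minimal sketch of a "don't care" default, written as a Case statement. The entity name des4 is hypothetical, and the sketch assumes the synthesis tool treats the std_logic value '-' as a "don't care" in the same way as the 'X' used in Figure 2; check this against the tool documentation and in simulation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity des4 is
  port (a, b, c : in  std_logic;
        sel     : in  std_logic_vector(1 downto 0);
        oput    : out std_logic);
end des4;

architecture behave of des4 is
begin
  process (a, b, c, sel)
  begin
    case sel is
      when "00"   => oput <= a;
      when "01"   => oput <= b;
      when "10"   => oput <= c;
      when others => oput <= '-';  -- don't care: tool may pick any value
    end case;
  end process;
end behave;
```

In simulation, oput shows '-' whenever sel is "11", which makes any downstream logic that depends on that case easy to spot.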

Gated clocks: Gated clocks create logic delays and clock skew, and use additional routing resources within devices. Therefore, you should avoid gated clocks where possible; in many cases you can use the register's clock enable input instead. However, if you must implement a gated clock in your design, some PLD architectures include features that reduce the associated hazards. For example, in Altera's FLEX devices you can use the GLOBAL primitive to place the gated clock on one of the high-fan-out internal global signals. Figure 5 shows an example that implements a gated clock using the GLOBAL primitive in a VHDL design.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity gate is
  port (a, b : in  std_logic;
        c, d : in  std_logic_vector(3 downto 0);
        oput : out std_logic_vector(3 downto 0));
end gate;

architecture behave of gate is
  signal clock : std_logic;
  signal gclk  : std_logic;
  signal count : std_logic_vector(3 downto 0);
  component GLOBAL
    port (A_IN  : in  std_logic;
          A_OUT : out std_logic);
  end component;
begin
  clock <= a and b;                          -- gated clock
  clk_buf : GLOBAL port map (clock, gclk);   -- route it on a global signal
  process (gclk)
  begin
    if gclk = '1' and gclk'event then
      count <= c + d;
    end if;
  end process;
  oput <= count;
end behave;
Figure 5. Implementing a Gated Clock in VHDL for Altera's FLEX Devices
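For comparison, the clock enable alternative mentioned above can be sketched as follows. This is not a drop-in replacement for Figure 5: it assumes a free-running clock input, here called clk, which the Figure 5 design does not have, and the entity name gate_en and the signal enable are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity gate_en is
  port (clk  : in  std_logic;   -- assumed free-running clock
        a, b : in  std_logic;
        c, d : in  std_logic_vector(3 downto 0);
        oput : out std_logic_vector(3 downto 0));
end gate_en;

architecture behave of gate_en is
  signal enable : std_logic;
  signal count  : std_logic_vector(3 downto 0);
begin
  enable <= a and b;            -- gating condition becomes an enable
  process (clk)
  begin
    if clk'event and clk = '1' then
      if enable = '1' then      -- register updates only when enabled
        count <= c + d;
      end if;
    end if;
  end process;
  oput <= count;
end behave;
```

Here the clock network carries only clk, so the gating logic no longer sits in the clock path.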

State Machines: In designs containing state machines, separate the state machine logic from all arithmetic functions and data paths, and use the state machine purely as control logic. For state machines targeting PLDs, one-hot encoding generally gives better results: it uses one bit per state, which requires more state registers but reduces the decoding logic. Use one-hot encoding when targeting register-rich, look-up table (LUT)-based architectures such as Altera's FLEX devices, because such devices work well with low-fan-in decoding logic. Consequently, one-hot encoding increases performance and efficiency for these devices.
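The separation of control and data path can be sketched as below. The design is hypothetical (the entity ctrl_dp, its ports, and the three states are invented for illustration): the state machine only produces the strobes ld_en and add_en, and all arithmetic sits in a separate process. The active-low reset follows the style of the other examples in this article.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity ctrl_dp is
  port (clk, rst, start : in  std_logic;
        din             : in  std_logic_vector(7 downto 0);
        acc             : out std_logic_vector(7 downto 0));
end ctrl_dp;

architecture rtl of ctrl_dp is
  type state_type is (s_idle, s_load, s_add);
  signal state         : state_type;
  signal ld_en, add_en : std_logic;
  signal acc_reg       : std_logic_vector(7 downto 0);
begin
  -- Control: state transitions only, no arithmetic.
  fsm : process (clk, rst)
  begin
    if rst = '0' then
      state <= s_idle;
    elsif clk'event and clk = '1' then
      case state is
        when s_idle =>
          if start = '1' then
            state <= s_load;
          end if;
        when s_load => state <= s_add;
        when s_add  => state <= s_idle;
      end case;
    end if;
  end process;

  ld_en  <= '1' when state = s_load else '0';
  add_en <= '1' when state = s_add  else '0';

  -- Data path: arithmetic steered by the control strobes.
  dp : process (clk, rst)
  begin
    if rst = '0' then
      acc_reg <= (others => '0');
    elsif clk'event and clk = '1' then
      if ld_en = '1' then
        acc_reg <= din;
      elsif add_en = '1' then
        acc_reg <= acc_reg + din;
      end if;
    end if;
  end process;

  acc <= acc_reg;
end rtl;
```

Because the states are declared as an enumerated type, the encoding (one-hot or otherwise) is left to the synthesis tool.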

Another design technique that increases state machine performance for PLDs with embedded memory blocks, such as the embedded array blocks (EABs) in Altera's FLEX 10K devices, is placing state machines in EABs. Extremely complex state machines with limited I/O are ideal for implementing in EABs, because the EAB implements complex functions in a single logic level, resulting in more efficient device utilization and higher performance. Follow these basic guidelines when placing a state machine in Altera's FLEX 10K EABs:

  • The state machine should not contain any latches. Use registers in EABs for best results.
  • Design blocks to be placed in an EAB cannot have any feedback. Therefore, place the state machine in a lower-level file and provide the feedback on an upper level.
  • Register all or none of the inputs and all or none of the outputs.
  • Depending on the configuration, an EAB can have a maximum of 11 inputs and a maximum of 8 outputs. However, EABs can be cascaded to implement functions that require more inputs and outputs than are available in a single EAB.

FPGA Express Design Techniques for CPLDs
In addition to HDL coding styles, there are techniques that a designer can implement during the design phase. There are also options like register duplication in FPGA Express to get improved results for CPLDs. These topics are briefly discussed here.

Register balancing: This technique shortens long register-to-register delays at the expense of shorter ones by moving registers across combinatorial logic. It is especially beneficial in applications where registers are used purely for latency purposes. As Figure 6 shows, the registers are moved so that the register-to-register timing requirements are satisfied. This does not change the latency of your design, because the number of register levels remains constant.

Figure 6. Register Balancing

Pipelining: This technique uses registers to hold intermediate results. When pipelining a design, you add registers to break up large combinatorial delays into shorter stages. Because LUT-based PLD architectures include a register with each LUT, pipelining combinatorial logic generally does not require additional device resources. Therefore, using registers for pipelining improves performance without increasing the logic utilization in a device. See Figure 7.

Figure 7. Pipelining a Design
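A minimal sketch of pipelining, assuming a hypothetical four-input adder (the entity add4_pipe is not from the original design set). Instead of computing a + b + c + d in one combinatorial path, the partial sums are registered first, so each clock period only has to cover a single adder.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity add4_pipe is
  port (clk        : in  std_logic;
        a, b, c, d : in  std_logic_vector(7 downto 0);
        sum        : out std_logic_vector(7 downto 0));
end add4_pipe;

architecture rtl of add4_pipe is
  signal ab, cd : std_logic_vector(7 downto 0);  -- pipeline registers
begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      ab  <= a + b;      -- stage 1: two partial sums
      cd  <= c + d;
      sum <= ab + cd;    -- stage 2: final sum, one clock later
    end if;
  end process;
end rtl;
```

The result appears two clock edges after the inputs are applied, so the shorter register-to-register delay comes with one extra cycle of latency. The 8-bit result wraps on overflow; widen the signals if the carry is needed.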

Logic Duplication: Logic duplication is a good method for reducing fan-out and improving the design performance on a given path. Use logic duplication on high-fan-out nodes and flip-flops, because it reduces the number of loads these signals drive and can ease routing. The Synopsys FPGA Express software provides the Merge Duplicate Register option in the GUI. By default, this option is enabled, so duplicate registers are optimized out. Disabling the option forces FPGA Express synthesis to preserve duplicate registers and not optimize across them. Figure 8 shows VHDL code that contains duplicate registers.

library ieee;
use ieee.std_logic_1164.all;

entity split is
  port (clk, rst      : in  std_logic;
        a, b, c, d, e : in  std_logic;
        data_out      : out std_logic_vector(1 downto 0));
end split;

architecture behave of split is
  signal inter : std_logic_vector(1 downto 0);
  signal temp  : std_logic;
begin
  reg : process (clk, rst)
  begin
    if rst = '0' then
      inter <= "00";
    elsif (clk = '1' and clk'event) then
      inter(0) <= temp;
      inter(1) <= temp;
    end if;
  end process;
  temp <= a and c and d and e;
  data_out(0) <= inter(0); -- the registers inter(0) and inter(1) are
  data_out(1) <= inter(1); -- duplicates.
end behave;
Figure 8. VHDL Source Code for Logic Duplication

Figure 9 shows the schematic representation of the VHDL logic from Figure 8 with the Merge Duplicate Register option disabled in FPGA Express.

Figure 9. Schematic Representation of Logic Duplication (Merge duplicate Register Option Disabled)

Figure 10 shows the schematic representation without logic duplication. In this case the Merge Duplicate Register option is enabled.

Figure 10. Schematic Representation Without Logic Duplication (Merge Duplicate Register Option Enabled)

LPM Functions: Functions from the Library of Parameterized Modules (LPM) are large building blocks that can be customized easily for your application by using different ports and by setting different parameters. Some PLD vendors, such as Altera, optimize LPM functions for efficient placement in their device architectures. In FPGA Express, LPM functions can be instantiated in the HDL source code or inferred from certain operators. Figure 11 shows a VHDL design that instantiates the LPM_MULT function.

LIBRARY ieee;
USE ieee.std_logic_1164.all;
-- Use these two commands to instantiate an lpm block
-- directly, without a component declaration
LIBRARY lpm;
USE lpm.lpm_components.all;

ENTITY mult IS
  PORT (a, b : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
        prod : OUT STD_LOGIC_VECTOR(15 DOWNTO 0));
END mult;

ARCHITECTURE struct OF mult IS
  SIGNAL prod_temp : STD_LOGIC_VECTOR(15 DOWNTO 0);
  SIGNAL gnd       : STD_LOGIC_VECTOR(7 DOWNTO 0); -- width matches lpm_widths
BEGIN -- struct
  prod <= prod_temp;
  gnd  <= (OTHERS => '0');
  u1 : lpm_mult
    GENERIC MAP (
      lpm_widtha => 8,
      lpm_widthb => 8,
      lpm_widthp => 16,
      lpm_widths => 8,
      lpm_representation => "UNSIGNED")
    PORT MAP (
      dataa  => a,
      datab  => b,
      sum    => gnd,
      result => prod_temp);
END struct;
Figure 11. Instantiating an LPM Function in VHDL

In addition to instantiating LPM functions, the FPGA Express software infers LPM functions from certain operators. For example, in designs targeting Altera's FLEX devices, LPM functions are inferred from relational operators with non-constant operands and from multiplication operators. Figure 12 shows a VHDL design where an lpm_mult function is inferred by the FPGA Express software from the multiplication operator.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;
ENTITY mult IS
PORT(	a, b : in std_logic_vector(7 downto 0);
prod : out std_logic_vector(15 downto 0));
END mult;
ARCHITECTURE behav OF mult IS 
BEGIN -- behav 
prod <= a*b;
END behav;
Figure 12. Inferring an LPM function
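The relational-operator case can be sketched in the same way. The entity cmp below is hypothetical, and it uses ieee.numeric_std for an unambiguous unsigned comparison rather than the std_logic_unsigned package of Figure 12; whether the tool infers an LPM comparator from this exact style is an assumption to confirm in the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cmp is
  port (a, b : in  std_logic_vector(7 downto 0);
        lt   : out std_logic);
end cmp;

architecture behav of cmp is
begin
  -- Both operands are non-constant, so the comparison cannot be
  -- reduced to simple decoding logic.
  lt <= '1' when unsigned(a) < unsigned(b) else '0';
end behav;
```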

FPGA Express Constraints and Settings
This section describes the various constraints and settings that can be applied in the FPGA Express software to improve the performance of your designs. These include:

  • State machine implementation options
  • Area/speed constraints
  • Advanced synthesis options
  • Device and timing constraints
  • Timing analysis tool - TimeTracker
  • Graphical analysis tool - Vista

Finite State Machine Implementation: FPGA Express software provides many options to control the implementation of state machines. When the state vectors are described using an enumerated type, it automatically extracts them and re-encodes the state machine as one-hot, binary or Altera-specific zero-one-hot. You must select the FSM encoding style before analyzing the HDL files in FPGA Express. Figure 13 shows a VHDL state machine design that uses an enumerated type. FPGA Express will automatically encode this design with the style you choose.

ARCHITECTURE behavior OF stmch1 IS
  TYPE state_type IS (idle, five, ten,
                      fifteen, twenty, twenty_five,
                      thirty, owe_dime);
  SIGNAL current_state, next_state : state_type;
BEGIN
.
.
.
END behavior;
Figure 13. VHDL FSM Example with Enumerated Types

The FPGA Express software also allows you to specify the encoding style in the HDL source code. In a VHDL file, you can do this by applying the Synopsys attribute ENUM_ENCODING to the enumerated type. In a Verilog design file, specify the state encodings by using the parameter keyword. Figures 14 and 15 show VHDL and Verilog HDL FSM examples, respectively, with user-specified encoding.

ARCHITECTURE fsm OF stmch2 IS
  TYPE state_type IS (idle, go, yield, stop);
  SIGNAL current_state, next_state : state_type;
  ATTRIBUTE enum_encoding : STRING;
  -- the attribute is applied to the type, not to the state signal
  ATTRIBUTE enum_encoding OF state_type : TYPE IS "0001 0010 0100 1000";
BEGIN
.
.
.
END fsm;
Figure 14. VHDL FSM Example with User-Specified Encoding

module stmch2 (reset, clk, out1);
input reset, clk;
output [1:0] out1;
parameter [3:0] idle=4'b0001, go=4'b0010, 
yield=4'b0100, stop=4'b1000;
.
.
.
endmodule
Figure 15. Verilog HDL FSM Example with User-Specified Encoding

Additionally, FPGA Express provides control over the implementation of the "when others" statement in your VHDL state machine design. This statement is used to cover all states that are not specified, including invalid states. When it is implemented, the FPGA Express software generates next-state logic to ensure that the implementation is an exact match of the VHDL description and that the FSM is able to recover from invalid state transitions.

However, if guaranteed recovery from invalid state transitions is not required, a smaller and faster implementation of the one-hot FSM can be generated. The selection logic for all of the invalid states from the "when others" choice is eliminated resulting in a smaller and faster one-hot state machine. This is the default option in FPGA Express.
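To make the trade-off concrete, the sketch below writes a one-hot state machine with explicit state codes, so that the invalid codes the "when others" choice must cover are visible in the source. The entity stmch_safe, its ports, and the constant names are hypothetical; the state names and codes mirror Figure 14. Whether FPGA Express extracts this style as a state machine, rather than treating it as ordinary logic, is an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity stmch_safe is
  port (clk, rst, advance : in  std_logic;
        at_stop           : out std_logic);
end stmch_safe;

architecture fsm of stmch_safe is
  constant S_IDLE  : std_logic_vector(3 downto 0) := "0001";
  constant S_GO    : std_logic_vector(3 downto 0) := "0010";
  constant S_YIELD : std_logic_vector(3 downto 0) := "0100";
  constant S_STOP  : std_logic_vector(3 downto 0) := "1000";
  signal state : std_logic_vector(3 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '0' then
      state <= S_IDLE;
    elsif clk'event and clk = '1' then
      case state is
        when S_IDLE  => if advance = '1' then state <= S_GO;    end if;
        when S_GO    => if advance = '1' then state <= S_YIELD; end if;
        when S_YIELD => if advance = '1' then state <= S_STOP;  end if;
        when S_STOP  => if advance = '1' then state <= S_IDLE;  end if;
        -- Any code that is not one-hot (for example "0011") lands here
        -- and returns to the idle state on the next clock edge.
        when others  => state <= S_IDLE;
      end case;
    end if;
  end process;

  at_stop <= '1' when state = S_STOP else '0';
end fsm;
```

The recovery branch costs decoding logic for the twelve unused codes, which is the logic the smaller and faster implementation described above leaves out.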

Area/Speed Constraints: In the FPGA Express software you can choose to synthesize your design for area optimization or for speed optimization. These constraints can be applied on a global basis or to individual hierarchical levels. The impact of this constraint varies depending on your HDL coding style and the size of your design; large design blocks generally show a significant difference in results.

Advanced Synthesis Optimization: The FPGA Express software allows you to choose between a High Effort and a Low Effort option. Low effort corresponds to a fast compilation mode, while high effort takes longer to compile because it uses advanced synthesis optimization algorithms. You can select this option during synthesis on a global basis or apply it to individual hierarchical levels. As described in the section on effective HDL design techniques, your HDL coding style plays an important role during synthesis and may therefore limit this optimization.

Device and Timing Assignments: The FPGA Express software allows you to make device and timing assignments for various PLD architectures. For Altera, these assignments are passed to the MAX+PLUS II software, Altera's place and route tool, via the Assignment & Configuration File (ACF) produced by the FPGA Express software. Figure 16 shows an example of the ACF generated by FPGA Express. This MAX+PLUS II-compatible ACF may contain assignments for family, device, clock frequency, timing, synthesis style and pad settings (pin location, slew rate and use of I/O register).

CHIP prep3
BEGIN
DEVICE = EPF10K10AFC256-1 {synopsys};
"|inn6" : PIN = 31 {synopsys};
"|outt2" : PIN = 14 {synopsys};
END;
GLOBAL_PROJECT_SYNTHESIS_ASSIGNMENT_OPTIONS
BEGIN
DEVICE_FAMILY = FLEX10KA {synopsys};
STYLE = FAST {synopsys};
OPTIMIZE_FOR_SPEED = 5 {synopsys};
AUTO_GLOBAL_CLOCK = ON {synopsys};
END;
LOGIC_OPTIONS
BEGIN
"|outt6" : IO_CELL_REGISTER = ON {synopsys};
"|outt3" : SLOW_SLEW_RATE = OFF {synopsys};
END;
COMPILER_INTERFACES_CONFIGURATION
BEGIN
EDIF_INPUT_VCC = VDD {synopsys};
EDIF_INPUT_GND = GND {synopsys};
EDIF_INPUT_USE_LMF1 = ON {synopsys};
EDIF_INPUT_LMF1 = "prep3.lmf" {synopsys};
END;
IGNORED_ASSIGNMENTS
BEGIN
FIT_IGNORE_TIMING = OFF {synopsys};
END;
TIMING_POINT
BEGIN
FREQUENCY = 20MHz {synopsys};
TPD = 50ns {synopsys};
"|inn7" : TSU = 5ns {synopsys};
"|outt7" : TCO = 50ns {synopsys};
"|outt4" : TCO = 10ns {synopsys};
END;
Figure 16. MAX+PLUS II Compatible ACF File Generated by FPGA Express with Device and Timing Assignments

TimeTracker: TimeTracker is the timing analysis tool of FPGA Express. This tool helps speed up the design cycle by allowing you to identify and fix critical portions of your design before doing place and route.

In the pre-optimization stage, this tool allows you to enter timing constraints for clocks, path groups, multi-cycle paths, sub paths and ports. After the chip has been optimized, you can view the results in a familiar and easy-to-use spreadsheet format. TimeTracker displays the required timing and the achieved timing side by side to show which constraints were met and which ones failed. The timing paths can also be traced in the schematic viewer. Although the timing results reported by the tool are pre-place and route, they give you a good idea of where the design is not performing optimally. You can then change the HDL source code to achieve better performance before running the design through the place and route tool. However, once you are satisfied with the results of FPGA Express, you must use the place and route tool's timing analyzer to obtain the actual results.

Visual Tools for Analysis (Vista): Vista is the FPGA Express schematic viewing tool. This tool helps you visualize your synthesis results and improve your device's performance and area results.

You can use it to view your pre-optimized design. At this stage the design consists of generic gates and it allows you to visualize how the FPGA Express synthesis engine interprets HDL code. Once you optimize the design the FPGA Express software maps the generic gates to architecture-specific elements. The optimized schematic shows the design mapped to LUTs, lcells, and carry and cascade chains. This view allows you to evaluate how well the design has been mapped to the given technology.

Vista's hierarchy browsing capability enables you to analyze your design at different levels of hierarchy and helps you visualize where changes might reduce device area or increase device performance.

Vista also highlights critical paths within your design, allowing you to quickly analyze problem areas. The optimized schematic view is tightly integrated with TimeTracker, so a clock group, path or cell selected in TimeTracker is shown in the schematic. You can move along the critical path by using the Next Pin and Previous Pin commands.

Vista also allows you to perform fan-in and fan-out analysis on both the pre-optimized and optimized designs. You can trace the fan-in and fan-out paths for a given cell using the filter options available in FPGA Express.

Conclusion
Design time and performance are valuable commodities in the programmable logic industry. This paper demonstrates various techniques to help you achieve performance goals and save design time by streamlining your design. By using VHDL and Verilog HDL coding techniques, FPGA Express software constraints, and PLD-vendor software options, you can improve performance in PLDs and ultimately improve your overall design.


Copyright © 2003 ChipCenter-QuestLink