

Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. G. Fettweis


#### Digital Signal Transmission Lab SS 08

Oliver Arnold, Steffen Kunze



# Introduction




# Hardware

- Why use digital signal processing?
- General introduction to DSPs
- The TMS320C6711 DSP
  - Architecture overview
  - Peripherals
- DSK6711 evaluation board

### Software

- Code Composer Studio
- DSP/BIOS
- Multi-channel Buffered Serial Port (McBSP)



## Hardware



# **System Considerations**



- Performance
- Interfacing
- Power
- Size
- **Ease of use**
  - Programming
  - Interfacing
  - Debugging
- Integration
  - Memory
  - Peripherals
- Cost
  - Device cost
  - System cost
  - Development cost
  - Time to market



 Digital signal processing techniques are now so **powerful** that sometimes it is extremely difficult, if not impossible, for analogue signal processing to achieve similar performance.

Examples:

- FIR filter with linear phase
- Adaptive filters



- Analogue signal processing is built from analogue components such as:
  - Resistors
  - Capacitors
  - Inductors
- The inherent tolerances of these components, together with temperature drift, voltage changes and mechanical vibration, can dramatically degrade the performance of the analogue circuitry.



With a DSP it is easy to:

- Change applications
- Correct applications
- Update applications

Additionally, DSPs reduce:

- Noise susceptibility
- Chip count
- Development time
- Cost
- Power consumption



# **General Introduction to DSPs**



What are the typical DSP algorithms?




The Sum of Products (SOP) is the key element in most DSP algorithms:

| Algorithm                        | Equation                                                                               |
|----------------------------------|----------------------------------------------------------------------------------------|
| Finite Impulse Response Filter   | $y(n) = \sum_{k=0}^{M} a_k x(n-k)$                                                     |
| Infinite Impulse Response Filter | $y(n) = \sum_{k=0}^{M} a_k x(n-k) + \sum_{k=1}^{N} b_k y(n-k)$                         |
| Convolution                      | $y(n) = \sum_{k=0}^{N} x(k)h(n-k)$                                                     |
| Discrete Fourier Transform       | $X(k) = \sum_{n=0}^{N-1} x(n) \exp[-j(2\pi / N)nk]$                                    |
| Discrete Cosine Transform        | $F(u) = \sum_{x=0}^{N-1} c(u) \cdot f(x) \cdot \cos\left[\frac{\pi}{2N}u(2x+1)\right]$ |
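As an illustration of the sum of products, the FIR equation from the table can be written as a plain multiply-accumulate loop. This is a minimal sketch, not the lab's implementation: the function name `fir_filter` is hypothetical, and input samples before the start of the signal are assumed to be zero.

```python
def fir_filter(coeffs, samples):
    """Compute y(n) = sum_{k=0}^{M} a_k * x(n-k) for every n.

    Assumption: samples before the start of the input are zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, a_k in enumerate(coeffs):
            if n - k >= 0:
                acc += a_k * samples[n - k]  # one multiply-accumulate (MAC)
        output.append(acc)
    return output


# 3-tap moving average applied to a unit step
y = fir_filter([1 / 3, 1 / 3, 1 / 3], [1.0, 1.0, 1.0, 1.0])
```

The inner loop body is a single multiply followed by an add, which is the operation a DSP is built to execute as fast as possible.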

Why do we need DSP processors?



- Use a DSP when the following are required:
  - Cost saving
  - Smaller size
  - Low power consumption
  - Real-time processing of many high-frequency signals
- Use a general-purpose processor (GPP) when the following are required:
  - Large memory
  - Advanced operating systems

Hardware vs. Microcode multiplication



- DSP processors are optimized to perform multiplication and addition operations.
- Multiplication and addition are done in hardware and complete in one cycle.
- Example: 4-bit unsigned multiply, 1011 × 1110 (11 × 14 = 154). Dots mark the left shift of each partial product.

| Hardware | Microcode | Cycle   |
|----------|-----------|---------|
| 1011     | 1011      |         |
| × 1110   | × 1110    |         |
| 10011010 | 0000      | Cycle 1 |
|          | 1011.     | Cycle 2 |
|          | 1011..    | Cycle 3 |
|          | 1011...   | Cycle 4 |
|          | 10011010  | Cycle 5 |
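The microcode column corresponds to the classic shift-and-add scheme, which can be sketched as follows. The function name `shift_add_multiply` is an assumption made for illustration, and the sketch accumulates the sum step by step instead of adding all partial products in a final fifth cycle.

```python
def shift_add_multiply(a, b, bits=4):
    """Multiply two unsigned `bits`-wide integers, one multiplier bit per step."""
    product = 0
    partials = []
    for i in range(bits):
        # Partial product: multiplicand shifted left by i if bit i of b is set
        partial = (a << i) if (b >> i) & 1 else 0
        partials.append(partial)
        product += partial
    return product, partials


product, partials = shift_add_multiply(0b1011, 0b1110)
```

For the example operands the partial products are 0, 10110, 101100 and 1011000 in binary, and their sum is 10011010.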

General Purpose DSP vs. DSP in ASIC




- Application Specific Integrated Circuits (ASICs) are semiconductors designed for dedicated functions.
- The advantages and disadvantages of using ASICs are listed below:

| Advantages                | Disadvantages                   |
|---------------------------|---------------------------------|
| High throughput           | High investment cost            |
| Lower silicon area        | Less flexibility                |
| Lower power consumption   | Long time from design to market |
| Improved reliability      |                                 |
| Reduction in system noise |                                 |
| Low overall system cost   |                                 |

Floating vs. Fixed point processors



- Applications that require the following need a floating-point processor:
  - High precision
  - Wide dynamic range
  - High signal-to-noise ratio
  - Ease of use
- Drawbacks of floating-point processors:
  - Higher power consumption
  - Usually higher cost
  - Usually slower than fixed-point counterparts and larger in size



# TMS320C6711 Architectural Overview





# Specification

- Clock rate: 100/150 MHz → 600/900 MFLOPS
- 0.18-µm, 5-level-metal CMOS process
- The CPU has **two datapaths**, which together provide:
  - Four ALUs (floating- and fixed-point)
  - Two ALUs (fixed-point)
  - Two multipliers (floating- and fixed-point)
  - Load-store architecture
  - 2 × 16 32-bit general-purpose registers



- VelociTI → advanced very long instruction word (VLIW) architecture
  - Program memory width is 256 bits
  - Up to eight 32-bit instructions can be executed in parallel per cycle
  - 16-, 32- and 40-bit fixed-point operands
  - 32- and 64-bit floating-point operands
  - Instruction parallelism is determined at compile time
    - No data dependency checking is done in hardware
  - Instruction packing reduces code size
  - All operations work on registers

#### **Memory Architecture**

- 4K-byte L1P program cache (direct mapped)
- 4K-byte L1D data cache (2-way set-associative)
- 64K-byte L2 unified mapped RAM/cache (flexible data/program allocation)

# Functional Block and CPU Diagram





# A '6711 Datapath






#### **Functional Units and Operations Performed**



| Functional Unit    | Fixed-Point Operations | Floating-Point Operations |
|--------------------|------------------------|---------------------------|
| .L unit (.L1, .L2) | 32/40-bit arithmetic and compare operations<br>Leftmost 1 or 0 bit counting for 32 bits<br>Normalization count for 32 and 40 bits<br>32-bit logical operations | Arithmetic operations<br>Conversion operations: DP $\rightarrow$ SP, INT $\rightarrow$ DP, INT $\rightarrow$ SP |
| .S unit (.S1, .S2) | 32-bit arithmetic operations<br>32/40-bit shifts and 32-bit bit-field operations<br>32-bit logical operations<br>Branches<br>Constant generation<br>Register transfers to/from the control register file (.S2 only) | Compare<br>Reciprocal and reciprocal square-root operations<br>Absolute value operations<br>SP $\rightarrow$ DP conversion operations |
| .M unit (.M1, .M2) | 16 $\times$ 16 bit multiply operations | 32 $\times$ 32 bit multiply operations<br>Floating-point multiply operations |
| .D unit (.D1, .D2) | 32-bit add, subtract, linear and circular address calculation<br>Loads and stores with a 5-bit constant offset<br>Loads and stores with a 15-bit constant offset (.D2 only) | Load double word with a 5-bit constant offset |

# C6700: Instruction Set



| Unit         | Fixed-point instructions | Additional 'C67x instructions |
|--------------|--------------------------|-------------------------------|
| .S unit      | ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO | ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP |
| .L unit      | ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO | ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP |
| .M unit      | MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH | MPYSP, MPYDP, MPYI, MPYID |
| .D unit      | ADD, ADDAB (B/H/W), LDB (B/H/W), STB (B/H/W), SUB, SUBAB (B/H/W), MV, NEG, ZERO | LDDW |
| No unit used | NOP, IDLE | |



# **'C6000 Internal Buses**






How are Peripherals Controlled?



- Internal peripherals are controlled and configured through memory-mapped control registers.
- These control registers form a separate memory-mapped register file.

Example: timer control register

| Bits  | Field  |
|-------|--------|
| 31-12 | Rsvd   |
| 11    | TSTAT  |
| 10    | INVINP |
| 9     | CLKSRC |
| 8     | C/P    |
| 7     | HLD    |
| 6     | GO     |
| 5     | Rsvd   |
| 4     | PWID   |
| 3     | DATIN  |
| 2     | DATOUT |
| 1     | INVOUT |
| 0     | FUNC   |
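Because a control register is just a word of bit fields, configuring a peripheral amounts to assembling the right bit pattern and writing it to the register's address. The sketch below only models the bit packing for a subset of the fields shown in the table; the names `TIMER_CTL_BITS` and `pack_timer_ctl` are hypothetical, the chosen field values are arbitrary, and no register address is implied.

```python
# Bit positions taken from the timer control register layout above (subset)
TIMER_CTL_BITS = {
    "FUNC": 0, "INVOUT": 1, "DATOUT": 2, "DATIN": 3, "PWID": 4,
    "GO": 6, "HLD": 7, "CP": 8, "CLKSRC": 9,
}


def pack_timer_ctl(**fields):
    """Build a 32-bit control word from single-bit field values."""
    value = 0
    for name, bit in fields.items():
        if bit not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
        value |= bit << TIMER_CTL_BITS[name]
    return value


def field(value, name):
    """Read one single-bit field back out of a control word."""
    return (value >> TIMER_CTL_BITS[name]) & 1


ctl = pack_timer_ctl(GO=1, HLD=1, CLKSRC=1)  # bits 6, 7 and 9 set
```

On the real device the resulting word would be stored to the memory-mapped address of the register, for example through a pointer in C.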

# 'C6711 Memory Map







### **Memory Map**



# Operands



### Operands can be

- 5-bit constants (or 16-bit constants in some special instructions)
- 32-bit registers
- 40-bit registers
- 64-bit registers
- A 40-bit or 64-bit register is obtained by concatenating two registers:
  - The registers must be from the same side (A or B)
  - The registers must be consecutive
  - The pair consists of an even register and the next odd register, written odd:even (e.g. A1:A0, B9:B8 or A15:A14)
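How two 32-bit registers form one 40-bit operand can be modelled in a few lines. This is a sketch under the assumption that the even register holds the 32 least significant bits and the low 8 bits of the odd register hold the most significant bits; the function name `pair_to_40bit` is made up for illustration.

```python
def pair_to_40bit(odd_reg, even_reg):
    """Combine a register pair odd:even (e.g. A1:A0) into one 40-bit value.

    Assumption: the even register supplies bits 31..0 and the low 8 bits
    of the odd register supply bits 39..32.
    """
    return ((odd_reg & 0xFF) << 32) | (even_reg & 0xFFFFFFFF)


value = pair_to_40bit(0x12, 0x89ABCDEF)
```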





- All instructions in every functional unit of both datapaths can be executed conditionally.
- Only the registers A1, A2, B0, B1 and B2 can hold the condition.
- Conditional execution uses the syntax [condition] instruction or [!condition] instruction, e.g.:

| Condition | Instruction | Operands | Comment          |
|-----------|-------------|----------|------------------|
| [!B0]     | ADD .L1     | A1,A2,A3 | ; add if B0 == 0 |
| [B0]      | ADD .L1     | A1,A2,A3 | ; add if B0 != 0 |





- Branches are required to implement loops and to change the program flow.
- Branches are very useful in conjunction with conditional execution.
- Two branch types are supported:
  - Relative branching
  - Absolute branching

- On this processor every instruction is encoded in 32 bits.
  - Because the opcode of the B instruction occupies part of those 32 bits, the label (branch target) must be encoded in fewer than 32 bits.

- Case 1: B .S1 label
  - Relative branch.
  - Label limited to an offset of $\pm 2^{20}$.

More on the Branch Instruction (2)




- By specifying a **register as an operand** instead of a label, it is possible to have an absolute branch.
- This will allow a dynamic range of  $2^{32}$ .





- All Instructions work exclusively on Registers
- The .D Units in the Data-Paths are used to load and store the required Data from and to the Memory
- Load and Store Instructions use an Address operator X:





- Two addressing modes are supported:
  - Linear addressing
  - Circular addressing (e.g. for convolution)
    - Circular addressing supports block sizes of 2<sup>N</sup>
    - Only the lower N bits of the address are modified by address arithmetic, which is equivalent to a mod 2<sup>N</sup> operation.

- The addressing mode is selected by the control register AMR (addressing mode register).
- Circular addressing can only be used with the registers A4-A7 and B4-B7.
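The effect of circular addressing on an address pointer can be sketched in software. The function below is an illustrative model, not how the hardware is programmed: `circular_advance` is a hypothetical name, and the block is assumed to be aligned to its size of 2^N.

```python
def circular_advance(address, step, n_bits):
    """Advance an address inside a circular block of 2**n_bits locations.

    Only the lower n_bits of the address change; the upper bits stay fixed.
    """
    mask = (1 << n_bits) - 1
    base = address & ~mask             # upper bits: start of the block
    offset = (address + step) & mask   # lower N bits wrap modulo 2**N
    return base | offset


# Block of 8 locations starting at 0x1000: stepping past the end wraps around
next_addr = circular_advance(0x1006, 4, 3)
```

This wrap-around is what makes the mode convenient for delay lines in convolution: the pointer never has to be reset by explicit code.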

# Floating vs. Fixed point processors



#### Fixed point arithmetic

- 16-bit (integer or fractional)
- Signed or unsigned

#### Floating point arithmetic

- 32-bit single precision
- 64-bit double precision

#### Using signed and unsigned integers

- Problems:
  - Multiplication overflow
  - Addition overflow
- Remedies:
  - Saturate the result
  - Use a double-precision result
  - Use fractional arithmetic

e.g. if A and B are fractional ($|A|, |B| < 1$), then $|A \times B| \le \min(|A|, |B|)$, so the product cannot overflow.

$$y(n) = \sum_{k=0}^{N-1} a(k) x(n-k)$$
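Fractional arithmetic and saturation can be illustrated with 16-bit fractional (Q15) numbers, where an integer `v` stands for the value `v / 2**15`. The helper names below are assumptions for this sketch; it models the arithmetic in plain integers and is not tied to any particular DSP instruction.

```python
Q15_MIN, Q15_MAX = -32768, 32767


def saturate16(value):
    """Clamp a result to the signed 16-bit range instead of letting it wrap."""
    return max(Q15_MIN, min(Q15_MAX, value))


def q15_mul(a, b):
    """Multiply two Q15 numbers: the 32-bit product is Q30, so shift by 15."""
    return saturate16((a * b) >> 15)


def q15_add(a, b):
    """Add two Q15 numbers with saturation."""
    return saturate16(a + b)


quarter = q15_mul(0x4000, 0x4000)   # 0.5 * 0.5
clipped = q15_add(0x6000, 0x6000)   # 0.75 + 0.75 exceeds the range
```

The product of two fractions stays within range, while the sum has to be saturated; the only multiplication that needs clamping is (-1) × (-1).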

TU Dresden, 4/29/2008

# C6000 C Data Types



| Type              | Size    | Representation |
|-------------------|---------|----------------|
| char, signed char | 8 bits  | ASCII          |
| unsigned char     | 8 bits  | ASCII          |
| short             | 16 bits | 2's complement |
| unsigned short    | 16 bits | binary         |
| int, signed int   | 32 bits | 2's complement |
| unsigned int      | 32 bits | binary         |
| long, signed long | 40 bits | 2's complement |
| unsigned long     | 40 bits | binary         |
| enum              | 32 bits | 2's complement |
| float             | 32 bits | IEEE 32-bit    |
| double            | 64 bits | IEEE 64-bit    |
| long double       | 64 bits | IEEE 64-bit    |
| pointers          | 32 bits | binary         |

#### Numerical Issues - Useful Tips



- Multiply by 2: use shift left
- Divide by 2: use shift right
- Log<sub>2</sub>N: use shift
- Sine, cosine, log: use look-up tables
- To convert a fractional number to hex:
  - Multiply the number by 2<sup>15</sup>
  - Then convert the result to hex
  - e.g. convert 0.5 to hex:
    - 0.5 × 2<sup>15</sup> = 16384
    - (16384)<sub>dec</sub> = (0x4000)<sub>hex</sub>
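The conversion recipe above fits in a small helper. This is a sketch with a hypothetical function name; rounding to the nearest integer and clamping just below +1.0 are assumptions that go beyond the two steps on the slide.

```python
def to_q15_hex(x):
    """Convert a fraction in [-1, 1) to its 16-bit fractional hex representation."""
    if not -1.0 <= x < 1.0:
        raise ValueError("value must lie in [-1, 1)")
    raw = int(round(x * 2**15))   # step 1: multiply by 2^15
    raw = min(raw, 32767)         # assumption: clamp values that round up to 1.0
    return format(raw & 0xFFFF, "#06x")  # step 2: 16-bit two's complement as hex


half = to_q15_hex(0.5)
```

Negative fractions come out in two's complement, so -0.5 becomes 0xc000.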

Numerical Issues - 32-bit Multiplication




- It is possible to perform 32-bit multiplication using 16-bit multipliers.
- Example: c = a x b (with 32-bit values).
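One way to see how this works is to split each 32-bit operand into a high and a low 16-bit half, so that c = a × b becomes four 16 × 16 products that are shifted and added. The sketch below assumes unsigned operands and uses a hypothetical function name; the decomposition shown on the original slide may differ in detail.

```python
MASK16 = 0xFFFF


def mul32_from_16(a, b):
    """Multiply two unsigned 32-bit values using only 16 x 16 bit products."""
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    low = a_lo * b_lo                  # weight 2^0
    mid = a_hi * b_lo + a_lo * b_hi    # weight 2^16
    high = a_hi * b_hi                 # weight 2^32
    return (high << 32) + (mid << 16) + low


c = mul32_from_16(0x12345678, 0x9ABCDEF0)
```

The result is the full 64-bit product; keeping only the lower 32 bits would make the `high` term unnecessary.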






# Selected '6711 Peripherals

### C6000 Peripherals





#### The McBSP



Multichannel Buffered Serial Port

- Up to 100 Mb/s performance
- 2 (or 3) full-duplex, synchronous serial ports
- Enables direct interfacing to industry-standard codecs, analog interface chips and other serially connected devices
- Supports a wide range of data sizes, including 8, 12, 16, 20, 24 and 32 bits
- Data is organized as bit, word (channel), frame and phase
- In our lab the McBSP connects the DSP to the A/D, D/A daughter card





- Monolithic 20-bit delta-sigma (ΔΣ) ADC and DAC
  - 16-/20-bit input/output data
  - Hardware control: PCM3003
  - Stereo ADC: SNR 90 dB, dynamic range 90 dB
  - Stereo DAC: SNR 94 dB, dynamic range 94 dB
  - Digital attenuation (256 steps), soft mute, digital loop-back
- Sampling rate: up to 48 kHz
- System clock: 256 f<sub>S</sub>, 384 f<sub>S</sub>, 512 f<sub>S</sub>









When the DSP is not powered or is held in reset, the internal program memory is in a random state.

When the DSP is powered and the CPU is taken out of reset, the internal memory is still in a random state and the program starts running from address zero.

During boot, a portion of the code can be copied automatically from external to internal memory.



- DSPs must be able to execute tasks in response to asynchronous events.
- An interrupt suspends the current processor task and saves its context.
- An interrupt service routine (ISR) is then executed.
- After the ISR completes, the context of the suspended task is restored and its execution continues.
- Interrupts are organized hierarchically.
- The alternative to interrupts is polling.

# Interrupt and Thread Types

- HWI (hardware interrupts): priorities set by hardware
  - One ISR per interrupt
- SWI (software interrupts): 14 priority levels
  - Multiple SWIs at each level
- TSK (tasks): 15 priority levels
  - Multiple TSKs at each level
- IDL (idle functions): multiple IDL functions
  - Run in a continuous loop
- An HWI is triggered by a hardware interrupt; IDL runs as the background thread.




### The DSK6711 Development Kit

# **DSK Contents**

#### Hardware

- 150 MHz 'C6711 DSP
- TI 16-bit A/D converter ('AD535)
- External memory
  - 16M bytes SDRAM
  - 128K bytes flash ROM
- LEDs
- Daughter card expansion
- Power supply and parallel port cable

#### Software

- Code generation tools (C compiler, assembler and linker)
- Code Composer debugger (256K program limitation)
- Example programs and software utilities
  - Power-on self test
  - Flash utility program
  - Board confidence test
  - Host access via DLL
  - Sample program(s)



#### C6711 DSK Overview






# Software: (4) PC $\rightarrow$ DSK Communications



- CCS uses the parallel port to control the DSP via the JTAG port.
- A full TI eXtended Development System (XDS) can be connected via the 14-pin header connector (JTAG emulation port).
- A Windows program (C++, VB) can communicate with the DSK via the parallel port using a Win32 DLL.
- **Use HPI via Win32 DLL**

