# AMD's processor lines belonging to the low-power oriented Cat family (Families 14h/16h)

# Dezső Sima

# October 2018

(Ver. 1.1)

© Sima Dezső, 2018

- 1. Introduction to AMD's low power oriented processor lines
- 2. Family 14h Models 00h-0Fh (Bobcat-based) APU lines
- 3. Family 16h Models 00h-0Fh (Jaguar-based) APU lines
- 4. Family 16h Models 30h-3Fh (Puma+ based) APU lines
- 5. AMD's withdrawal from the mobile market
- 6. References

# 1. Introduction to AMD's low power oriented processor lines

#### AMD's move to reshape their mobile and embedded market strategy -1

a) Around 2011 AMD recognized the rapidly growing importance of the mobile and embedded market segments, as demonstrated by a slide from their 2012 Financial Analyst Day presentations [1], shown below.

AMD's view of worldwide turnover and growth rate of different processor segments [1]



#### AMD's move to reshape their mobile and embedded market strategy -2

- b) In 1/2011 AMD's CEO (Chief Executive Officer) and President Dirk Meyer resigned, reportedly mainly because he had neglected the netbook, handheld and embedded market space.
- c) In the roadmaps and product announcements of 2011/2012 AMD put much more emphasis on mobile, laptop and embedded products.
- d) In 1/2012 AMD re-branded their Fusion APU concept as the Heterogeneous System Architecture (HSA) to indicate that their scope of interest on the processor market is broader than accelerated graphics alone.

#### Remarks

- Dirk Meyer was an outstanding processor architect; he co-designed three very successful processors: DEC's Alpha 21064 and 21264 as well as AMD's Athlon.
- In his role as CEO (2008-2011) he focused on the PC and data center market and intended to address the mobile and consumer electronics markets later [16].

#### AMD's revised concept for addressing the breadth of the processor market -1

Traditionally, AMD covered all market segments with the same basic processor design, typically in two kinds of implementations:

- one full-fledged processor design that addresses the performance oriented server and desktop segments, and
- a number of low cost, low power designs for the mainstream desktop to mobile segments, derived from the full-fledged basic architecture.

These derivative designs typically provide fewer resources, such as fewer cores, a smaller L2 cache or no L3 cache, and run at lower clock speeds. All in all, they offer lower power consumption at a lower price and lower performance.

#### Example

#### AMD's K10.5 Shanghai based server-, desktop- and mobile lines -1

As shown below

a) the Shanghai-based high performance desktop core (Deneb) is obviously based on the original Shanghai core,

#### **Contrasting the K10.5-based Shanghai server and Deneb desktop dies**





- Shanghai core: 4 cores, L2: 512 KB/core, L3: 6 MB, 258 mm<sup>2</sup>, 758 million transistors
- Deneb core [18]: 4 cores, L2: 512 KB/core, L3: 6 MB, 258 mm<sup>2</sup>, 758 million transistors

#### AMD's K10.5 Shanghai based server-, desktop- and mobile lines -2

b) the K10.5 Shanghai-based mainstream and value desktop cores (Propus, Regor) are L3-less derivatives of Deneb with reduced L2/Core cache size or core count.



[18]

[20]

#### AMD's K10.5 Shanghai based server-, desktop- and mobile lines -3

c) The K10.5 Shanghai-based mobile cores (Caspian, Champlain) are then derivatives of the mainstream and value desktop cores (Regor and Propus).

AMD's K10.5 Shanghai-based native mobile architectures [21]



#### AMD's K10.5 Shanghai based server-, desktop- and mobile lines -4

To sum up, despite the wide variety of K10.5 Shanghai-based server, desktop and mobile cores, all of these designs are derivatives of the basic K10.5 Shanghai core; that is, AMD maintained the same basic design across all of its K10.5 Shanghai-based cores.

#### AMD's revised concept for addressing the breadth of the processor market -2

In 2011, along with the introduction of the Bobcat lines, AMD changed its design concept and decided to cover the breadth of the processor market with two distinct processor designs rather than one, in order to better optimize the power and performance of its lines [23].

According to the new concept

- the Bulldozer design focuses on the performance oriented server and desktop market, whereas
- the Bobcat design addresses the low cost, low power mobile, entry-level desktop and embedded market.
- In addition, desktop, mobile and embedded devices basically provide integrated graphics as well.
- This is in line with AMD's visionary Fusion system architecture concept, which was announced after AMD's merger with ATI in 2006 and renamed to the Heterogeneous System Architecture concept in 2012.

#### Remark

Intel had already made the same move in 2008, when it introduced the Atom line. Since then Intel has pursued two major design lines:

- its main line, which focuses on performance oriented servers and desktops, and
- its Atom line, which addresses low cost, low power mobile devices.

#### **Evolution of AMD's basic architectures**



Overview of AMD's low power oriented APU lines (embedded/microserver APUs not shown) [14]



#### Brand names of AMD's Family 14h and 16h processor lines

|                     | Launched in                                         | 2011                                      | 2012                                | 2013                                | 2014                               | 2015                               |
|---------------------|-----------------------------------------------------|-------------------------------------------|-------------------------------------|-------------------------------------|------------------------------------|------------------------------------|
|                     | Base arch.                                          | Family 14h<br>(00h-0Fh)<br>(Bobcat)       | Family 14h<br>(00h-0Fh)<br>(Bobcat) | Family 16h<br>(00h-0Fh)<br>(Jaguar) | Family 16h<br>(30h-3Fh)<br>(Puma+) | Family 16h<br>(30h-3Fh)<br>(Puma+) |
| Servers             | 4P servers                                          |                                           |                                     |                                     |                                    |                                    |
|                     | 2P servers                                          |                                           |                                     |                                     |                                    |                                    |
|                     | 1P servers<br>(85-140 W)                            |                                           |                                     |                                     |                                    |                                    |
| Desktops            | High perf.<br>(~95-125 W)                           |                                           |                                     |                                     |                                    |                                    |
|                     | Mainstream<br>(~65-100 W)                           |                                           |                                     |                                     |                                    |                                    |
|                     | Entry level<br>(~30-60 W)                           |                                           |                                     |                                     |                                    |                                    |
| Notebooks           | High performance/<br>mainstream/entry<br>(~30-60 W) |                                           |                                     | Kabini A6                           |                                    |                                    |
|                     | Ultra portable<br>(~10-15 W)                        | Zacate<br>E-Series<br>Ontario<br>C-Series | Zacate<br>E1/E2                     | Kabini<br>A/E-Series                | Beema<br>A/E-Series                | Carrizo-L<br>A/L-Series            |
| Tablet<br>(~5 W)    |                                                     | Desna<br>Z-Series                         |                                     | Temash<br>A Series                  | Mullins<br>A Series/E1             |                                    |

#### Main features of AMD's Family 14h/16h (Cat-based) ultra-thin notebook processor lines

| Base arch.                          | Intro   | Ultra-thin<br>mobile<br>family | Series          | Techn. | Core count<br>(up to)                                  | L2<br>(up to)              | L3            | GPU<br>(APU) | Memory<br>(up to) | TDP<br>[W] | Socket       |
|-------------------------------------|---------|--------------------------------|-----------------|--------|--------------------------------------------------------|----------------------------|---------------|--------------|-------------------|------------|--------------|
| Family 14h<br>(00h-0Fh)<br>(Bobcat) | 1/2011  | Zacate<br>(not SoC)            | E<br>Series     | 40 nm  | 2                                                      | 512 KB/<br>core<br>private | -             | Yes          | DDR3L-<br>1333    | 18         | FT1<br>(BGA) |
| Family 14h<br>(00h-0Fh)<br>(Bobcat) | 6/2012  | Zacate<br>(not SoC)            | E1/E2<br>Models | 40 nm  | 2                                                      | 512 KB/<br>core<br>private | -             | Yes          | DDR3L-<br>1333    | 18         | FT1<br>(BGA) |
| Family 14h<br>(00h-0Fh)<br>(Bobcat) | 1/2011  | Ontario                        | C<br>Series     | 40 nm  | 2                                                      | 512 KB/<br>core<br>private | -             | Yes          | DDR3-<br>1066     | 9          | FT1<br>(BGA) |
| Family 16h<br>(00h-0Fh)<br>(Jaguar) | 5/2013  | Kabini<br>(SoC)                | A<br>Series     | 28 nm  | 4 cores<br>with a<br>shared L2<br>cache                | 2 MB<br>shared             | -             | Yes          | DDR3L-<br>1866    | 15         | FT3          |
| Family 16h<br>(30h-3Fh)<br>(Puma+)  | 4/2014  | Beema<br>(SoC)                 | A<br>Series     | 28 nm  | 4 cores<br>with a<br>shared L2<br>cache                | 2 MB<br>shared             | -             | Yes          | DDR3L-<br>1866    | 15         | FT3b         |
| Family 16h<br>(30h-3Fh)<br>(Puma+)  | 5/2015  | Carrizo-L<br>(SoC)             | A<br>Series     | 28 nm  | 4 cores<br>with a<br>shared L2<br>cache                | 2 MB<br>shared             | -             | Yes          | DDR3L-<br>1866    | 10/<br>15  | FP4          |
| Family 17h<br>(00h-0Fh)<br>(Zen)    | 10/2017 | Raven<br>Ridge<br>(SoC)        | Ryzen<br>7/5/3  | 14 nm  | 4-core CCX,<br>private L2<br>and shared<br>L3 cache(s) | ½ MB/<br>core              | 1 MB/<br>core | Yes          | DDR4-<br>2400     | 15         | AM4          |

- APU: Accelerated Processing Unit (CPU + GPU)
- CCX: Core CompleX

<sup>2</sup>: 2\*512 KB for Turion X2, 2\*1 MB for Turion X2 Ultra
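The family and model designations used throughout this material (e.g. Family 16h, Models 30h-3Fh) are the values that software derives from the EAX register returned by CPUID function 1. A minimal sketch of that decoding, following the AMD convention, is given below; the function name and the sample EAX values are illustrative assumptions and are not taken from the cited sources.

```python
def decode_family_model(eax):
    """Derive (family, model, stepping) from a CPUID function 1 EAX value."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    if base_family == 0xF:
        # AMD convention: the extended fields apply only if the base family is Fh
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family, model = base_family, base_model
    return family, model, stepping


# Hypothetical EAX samples, chosen to fall into the families discussed here
for eax in (0x00500F10, 0x00700F01, 0x00730F01):
    family, model, stepping = decode_family_model(eax)
    print(f"{eax:08X}h -> Family {family:02X}h, Model {model:02X}h, Stepping {stepping}")
```

With these sample inputs the first value decodes to Family 14h, the second to Family 16h Model 00h and the third to Family 16h Model 30h, matching the three model ranges listed in the outline.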

# 2. Family 14h Models 00h-0Fh (Bobcat-based) APU lines

- 2.1 Overview of the Bobcat-based APU lines
- 2.2 The Bobcat core
- 2.3 APU lines of the Brazos platform
- 2.4 APU lines of the Brazos 2.0 platform
- 2.5 APU lines of the Embedded G-Series platform

# 2.1 Overview of the Bobcat-based APU lines

- Bobcat-based APUs were AMD's first products to carry the Fusion brand name; they were introduced at the Consumer Electronics Show (CES) in 1/2011.
- They are the basic parts of the Brazos platform, which focuses on the mobile market but includes desktop models as well.
- The Brazos platform became one of AMD's most successful products; it is typically used in ultra-light notebooks and tablets [24], [25].
- Bobcat-based APUs compete with Intel's Atom processors.
- They mark AMD's new market strategy, which gives the mobile and desktop market segments a more prominent position in AMD's overall market policy.
- Bobcat-based Fusion products basically include up to two Bobcat cores and a GPU with capabilities similar to those of a low-end discrete graphics card.
- They are implemented in 40 nm technology on a 75 mm<sup>2</sup> die with 450 million transistors.

#### Remark [26]

Bobcat-based Brazos systems were a turning point for AMD.

By the launch of the Jaguar-based 28 nm mobile products (Temash, Kabini), the firm had sold nearly 50 million Brazos systems.

Jaguar-based systems improve both IPC (by about 20 %) and power consumption (also by about 20 %) vs. the previous products.

According to industry sources [27], these low power systems seemed to be AMD's last hope to avoid bankruptcy.

#### **Overview of the Bobcat-based APU lines**



Positioning of the Bobcat-based APU lines-1 (embedded lines not shown) [based on 14]



Positioning of the Bobcat-based APU lines-2 (embedded lines not shown) [14]



#### **Brand names of Bobcat-based processor lines**

|                     | Launched in                                         | 2011                                      | 2012                                | 2013                                | 2014                               | 2015                               |
|---------------------|-----------------------------------------------------|-------------------------------------------|-------------------------------------|-------------------------------------|------------------------------------|------------------------------------|
|                     | Base arch.                                          | Family 14h<br>(00h-0Fh)<br>(Bobcat)       | Family 14h<br>(00h-0Fh)<br>(Bobcat) | Family 16h<br>(00h-0Fh)<br>(Jaguar) | Family 16h<br>(30h-3Fh)<br>(Puma+) | Family 16h<br>(30h-3Fh)<br>(Puma+) |
| Servers             | 4P servers                                          |                                           |                                     |                                     |                                    |                                    |
|                     | 2P servers                                          |                                           |                                     |                                     |                                    |                                    |
|                     | 1P servers<br>(85-140 W)                            |                                           |                                     |                                     |                                    |                                    |
| Desktops            | High perf.<br>(~95-125 W)                           |                                           |                                     |                                     |                                    |                                    |
|                     | Mainstream<br>(~65-100 W)                           |                                           |                                     |                                     |                                    |                                    |
|                     | Entry level<br>(~30-60 W)                           |                                           |                                     |                                     |                                    |                                    |
| Notebooks           | High performance/<br>mainstream/entry<br>(~30-60 W) |                                           |                                     | Kabini A6                           |                                    |                                    |
|                     | Ultra portable<br>(~10-15 W)                        | Zacate<br>E-Series<br>Ontario<br>C-Series | Zacate<br>E1/E2                     | Kabini<br>A/E-Series                | Beema<br>A/E-Series                | Carrizo-L<br>A/L-Series            |
| Tablet<br>(~5 W)    |                                                     | Desna<br>Z-Series                         |                                     | Temash<br>A Series                  | Mullins<br>A Series/E1             |                                    |

# 2.2 The Bobcat core

- 2.2.1 The microarchitecture of the Bobcat core
- 2.2.2 Main features of the Bobcat core

# 2.2.1 The microarchitecture of the Bobcat core

The Bobcat core is the x86 "engine" of Bobcat-based APUs, as indicated in the next example.

#### **Example: Use of dual Bobcat cores in the Zacate processor** [28]



#### Rough block diagram of the microarchitecture of the Bobcat core [28]



512 KB/core ECC protected L2, clocked at ½ clock rate to reduce power

#### More detailed block diagram of the microarchitecture of the Bobcat core [8]



#### **Bobcat's floorplan** [8]



#### **Example: Block diagram of the Zacate APU** [28]



#### **Example: Floorplan and power domains of the Zacate die that includes 2 Bobcat cores** [28]



#### Use of the Bobcat CPU core in AMD's Family 14h-based mobile and embedded lines

- All Bobcat-based Zacate, Ontario and Desna mobile APUs and G-Series Fusion APUs share a common design that basically includes two Bobcat CPU cores and a GPU, as detailed in Sections 4.xx and in Sections 4.xx and 4.xx.
- Nevertheless, in different models AMD disables a CPU core, the GPU or both, and uses different GPU cores, as indicated next.

#### Use of different GPU cores in Bobcat-based Ontario, Zacate and Desna mobile models [29]

#### "Zacate" (40 nm)

| Model   | Step. | Cores | CPU freq. | CPU Turbo | L2 cache   | Multi | V<sub>core</sub> (V) | GPU model | GPU config | GPU freq. | GPU Turbo | Memory    | TDP  | Released     |
|---------|-------|-------|-----------|-----------|------------|-------|----------------------|-----------|------------|-----------|-----------|-----------|------|--------------|
| E-240   | B0    | 1     | 1.5 GHz   | N/A       | 512 KB     | 15×   | 1.175 - 1.35         | HD 6310   | 80:8:4     | 500 MHz   | N/A       | DDR3-1066 | 18 W | Jan 4, 2011  |
| E-300   | B0    | 2     | 1.3 GHz   | N/A       | 2 × 512 KB | 13×   |                      | HD 6310   | 80:8:4     | 488 MHz   | N/A       | DDR3-1333 | 18 W | Aug 22, 2011 |
| E-350   | B0    | 2     | 1.6 GHz   | N/A       | 2 × 512 KB | 16×   | 1.25 - 1.35          | HD 6310   | 80:8:4     | 492 MHz   | N/A       | DDR3-1066 | 18 W | Jan 4, 2011  |
| E-450   | C0    | 2     | 1.65 GHz  | N/A       | 2 × 512 KB | 16.5× |                      | HD 6320   | 80:8:4     | 508 MHz   | 600 MHz   | DDR3-1333 | 18 W | Aug 22, 2011 |
| E1-1200 |       | 2     | 1.4 GHz   | N/A       | 2 × 512 KB | 14×   |                      | HD 7310   | 80:8:4     | 500 MHz   | N/A       | DDR3-1066 | 18 W | Q1 2012      |
| E2-1800 |       | 2     | 1.7 GHz   | N/A       | 2 × 512 KB | 17×   |                      | HD 7340   | 80:8:4     | 523 MHz   | 680 MHz   | DDR3-1333 | 18 W | Q1 2012      |

#### "Ontario" (40 nm)

| Model | Step. | Cores | CPU freq. | CPU Turbo | L2 cache   | Multi    | V<sub>core</sub> (V) | GPU model | GPU config | GPU freq. | GPU Turbo | Memory    | TDP | Released     |
|-------|-------|-------|-----------|-----------|------------|----------|----------------------|-----------|------------|-----------|-----------|-----------|-----|--------------|
| C-30  | B0    | 1     | 1.2 GHz   | N/A       | 512 KB     | 12×      | 1.25 - 1.35          | HD 6250   | 80:8:4     | 276 MHz   | N/A       | DDR3-1066 | 9 W | Jan 4, 2011  |
| C-50  | B0    | 2     | 1.0 GHz   | N/A       | 2 × 512 KB | 10×      | 1.05 - 1.35          | HD 6250   | 80:8:4     | 276 MHz   | N/A       | DDR3-1066 | 9 W | Jan 4, 2011  |
| C-60  | C0    | 2     | 1.0 GHz   | 1.33 GHz  | 2 × 512 KB | 10-13.3× |                      | HD 6290   | 80:8:4     | 276 MHz   | 400 MHz   | DDR3-1066 | 9 W | Aug 22, 2011 |

#### "Desna" (40nm)

| Model | Step. | Cores | CPU freq. | CPU Turbo | L2 cache   | Multi | V<sub>core</sub> (V) | GPU model | GPU config | GPU freq. | GPU Turbo | Memory    | TDP   | Released     |
|-------|-------|-------|-----------|-----------|------------|-------|----------------------|-----------|------------|-----------|-----------|-----------|-------|--------------|
| Z-01  | B0    | 2     | 1.0 GHz   | N/A       | 2 × 512 KB | 10×   |                      | HD 6250   | 80:8:4     | 276 MHz   | N/A       | DDR3-1066 | 5.9 W | June 1, 2011 |
| Z-03  |       | 2     | 1.0 GHz   | N/A       | 2 × 512 KB | 10×   |                      | HD 6250   | 80:8:4     | 276 MHz   | N/A       | DDR3-1066 | 4.5 W | Q1 2012      |

# 2.2.2 Main features of the Bobcat core

### 2.2.2.1 Main features of the Bobcat core affecting the performance or security [8]

#### **Bobcat Core Overview**

#### Advanced Micro-architecture

- Dual x86 Decode
- Advanced Branch Predictor
- Full OOO instruction execution
- Full OOO load/store engine
- High Performance Floating Point
- AMD64 64-bit ISA
- SSE1,2,3, SSSE3 ISA
- Secure Virtualization
- 32 KB L1 caches, 512 KB L2 cache

#### Low Power Design

- Power Optimized Execution
- Micro-architecture that minimizes data movement and unnecessary reads
- Clock gating, Power gating
- System Low Power States

#### Small Core

- Area efficient balance of high performance and low power





#### The Turbo Core technology in Bobcat cores

- The first Bobcat-based models, introduced in 1/2011, were Stepping B0 devices that did not provide the Turbo Core technology.
- Turbo Core technology arrived with select dual core Stepping C0 mobile devices in 8/2011, such as
  - the E-450 (Zacate) model, which supports Turbo Core for the GPU, and
  - the C-60 (Ontario) model, which supports Turbo Core for both the CPU cores and the GPU [29].

To date no publication is available that details how Turbo Core is implemented in Bobcat-based processors. Nevertheless, it can be assumed that it is implemented in the same way as in the Llano processors, i.e. based on the digital power monitor circuitry described in [30].
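The clock and multiplier figures that [29] lists for the Turbo Core capable models can be related with a few lines of code. The sketch below assumes a 100 MHz reference clock, which is what the tabulated values imply (e.g. a 16× multiplier for 1.6 GHz) but which is not stated in the source; the function names are illustrative only.

```python
REF_CLOCK_MHZ = 100  # assumption: implied by the tables (16x -> 1.6 GHz), not stated in the source


def core_clock_mhz(multiplier):
    """Core clock in MHz for a given multiplier."""
    return round(multiplier * REF_CLOCK_MHZ)


def turbo_uplift_percent(base_mhz, turbo_mhz):
    """Relative gain of the Turbo Core clock over the base clock, in percent."""
    return (turbo_mhz - base_mhz) / base_mhz * 100


# C-60 (Ontario): CPU multiplier range 10-13.3x as listed in [29]
c60_base = core_clock_mhz(10)
c60_turbo = core_clock_mhz(13.3)
print(c60_base, c60_turbo, round(turbo_uplift_percent(c60_base, c60_turbo)))

# E-450 (Zacate): GPU clock 508 MHz, GPU Turbo clock 600 MHz as listed in [29]
print(round(turbo_uplift_percent(508, 600)))
```

Under this assumption Turbo Core raises the CPU clock of the C-60 by about a third and the GPU clock of the E-450 by somewhat less than a fifth.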

#### Supporting the Turbo Core mode in different Bobcat-based mobile models [29]

#### "Zacate" (40 nm)

| Model   | Step. | Cores | CPU freq. | CPU Turbo | L2 cache   | Multi | V<sub>core</sub> (V) | GPU model | GPU config | GPU freq. | GPU Turbo | Memory    | TDP  | Released     |
|---------|-------|-------|-----------|-----------|------------|-------|----------------------|-----------|------------|-----------|-----------|-----------|------|--------------|
| E-240   | B0    | 1     | 1.5 GHz   | N/A       | 512 KB     | 15×   | 1.175 - 1.35         | HD 6310   | 80:8:4     | 500 MHz   | N/A       | DDR3-1066 | 18 W | Jan 4, 2011  |
| E-300   | B0    | 2     | 1.3 GHz   | N/A       | 2 × 512 KB | 13×   |                      | HD 6310   | 80:8:4     | 488 MHz   | N/A       | DDR3-1333 | 18 W | Aug 22, 2011 |
| E-350   | B0    | 2     | 1.6 GHz   | N/A       | 2 × 512 KB | 16×   | 1.25 - 1.35          | HD 6310   | 80:8:4     | 492 MHz   | N/A       | DDR3-1066 | 18 W | Jan 4, 2011  |
| E-450   | C0    | 2     | 1.65 GHz  | N/A       | 2 × 512 KB | 16.5× |                      | HD 6320   | 80:8:4     | 508 MHz   | 600 MHz   | DDR3-1333 | 18 W | Aug 22, 2011 |
| E1-1200 |       | 2     | 1.4 GHz   | N/A       | 2 × 512 KB | 14×   |                      | HD 7310   | 80:8:4     | 500 MHz   | N/A       | DDR3-1066 | 18 W | Q1 2012      |
| E2-1800 |       | 2     | 1.7 GHz   | N/A       | 2 × 512 KB | 17×   |                      | HD 7340   | 80:8:4     | 523 MHz   | 680 MHz   | DDR3-1333 | 18 W | Q1 2012      |

#### "Ontario" (40 nm)

| Model | Step. | Cores | CPU freq. | CPU Turbo | L2 cache   | Multi    | V<sub>core</sub> (V) | GPU model | GPU config | GPU freq. | GPU Turbo | Memory    | TDP | Released     |
|-------|-------|-------|-----------|-----------|------------|----------|----------------------|-----------|------------|-----------|-----------|-----------|-----|--------------|
| C-30  | B0    | 1     | 1.2 GHz   | N/A       | 512 KB     | 12×      | 1.25 - 1.35          | HD 6250   | 80:8:4     | 276 MHz   | N/A       | DDR3-1066 | 9 W | Jan 4, 2011  |
| C-50  | B0    | 2     | 1.0 GHz   | N/A       | 2 × 512 KB | 10×      | 1.05 - 1.35          | HD 6250   | 80:8:4     | 276 MHz   | N/A       | DDR3-1066 | 9 W | Jan 4, 2011  |
| C-60  | C0    | 2     | 1.0 GHz   | 1.33 GHz  | 2 × 512 KB | 10-13.3× |                      | HD 6290   | 80:8:4     | 276 MHz   | 400 MHz   | DDR3-1066 | 9 W | Aug 22, 2011 |

#### "Desna" (40nm)

| Model | Step. | Cores | CPU freq. | CPU Turbo | L2 cache   | Multi | V<sub>core</sub> (V) | GPU model | GPU config | GPU freq. | GPU Turbo | Memory    | TDP   | Released     |
|-------|-------|-------|-----------|-----------|------------|-------|----------------------|-----------|------------|-----------|-----------|-----------|-------|--------------|
| Z-01  | B0    | 2     | 1.0 GHz   | N/A       | 2 × 512 KB | 10×   |                      | HD 6250   | 80:8:4     | 276 MHz   | N/A       | DDR3-1066 | 5.9 W | June 1, 2011 |
| Z-03  |       | 2     | 1.0 GHz   | N/A       | 2 × 512 KB | 10×   |                      | HD 6250   | 80:8:4     | 276 MHz   | N/A       | DDR3-1066 | 4.5 W | Q1 2012      |

### 2.2.2.2 Main features of the Bobcat core affecting the power consumption [8]



### 1) Power optimized microarchitecture [31]

#### Introducing physical register files instead of future files for register renaming

- When physical register files are used for register renaming, they hold both the architectural state and the results of executed instructions that are waiting to be committed to the architectural state in program order.
- Such an implementation replaces data movements with pointer updates, which saves power.
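The pointer-based bookkeeping behind this scheme can be sketched as a toy model. It is not Bobcat's actual implementation: the class, its register counts and all method names are assumptions made for illustration, and running out of free physical registers is not handled.

```python
class PhysicalRegisterFile:
    """Toy PRF-based renamer: each result is written once and never copied."""

    def __init__(self, num_arch=4, num_phys=8):
        self.values = [0] * num_phys                 # single storage for all values
        self.spec_map = list(range(num_arch))        # arch reg -> newest physical reg
        self.arch_map = list(range(num_arch))        # arch reg -> committed physical reg
        self.free = list(range(num_arch, num_phys))  # unused physical registers
        self.in_flight = []                          # (arch reg, physical reg) in program order

    def rename_dest(self, arch_reg):
        """Allocate a physical register for a new result (pointer update only)."""
        phys = self.free.pop(0)
        self.spec_map[arch_reg] = phys
        self.in_flight.append((arch_reg, phys))
        return phys

    def read(self, arch_reg):
        """Source operands are looked up through the speculative map."""
        return self.values[self.spec_map[arch_reg]]

    def write_result(self, phys, value):
        """Execution writes the result once into its physical register."""
        self.values[phys] = value

    def commit_oldest(self):
        """Retire in program order: repoint the architectural map, free the old register."""
        arch_reg, phys = self.in_flight.pop(0)
        self.free.append(self.arch_map[arch_reg])
        self.arch_map[arch_reg] = phys  # no value is copied here


prf = PhysicalRegisterFile()
p = prf.rename_dest(1)      # an instruction that writes architectural register 1
prf.write_result(p, 42)
prf.commit_oldest()
print(prf.values[prf.arch_map[1]])
```

In a future file design, by contrast, the result would be copied from a reorder buffer entry into the architectural register file at retirement; here commit only changes which physical register the architectural name points to.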

#### Pointer based implementation of queues

This minimizes data movements compared with traditional queue implementations, in which entries are shifted whenever data is added or removed.
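The difference between the two queue styles can be made concrete with a small comparison. The sketch below is an illustration only: both classes, the `data_moves` counter and the helper `drain` are assumptions introduced here and do not describe Bobcat's actual queue logic.

```python
class ShiftQueue:
    """Queue that keeps its oldest entry in slot 0 and shifts on every removal."""

    def __init__(self):
        self.slots = []
        self.data_moves = 0

    def push(self, item):
        self.slots.append(item)

    def pop(self):
        item = self.slots[0]
        # Every remaining entry is copied one slot towards the front.
        self.data_moves += len(self.slots) - 1
        self.slots = self.slots[1:]
        return item


class PointerQueue:
    """Circular queue: entries stay in place, only head/tail pointers change."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0        # index of the oldest entry
        self.tail = 0        # index of the next free slot
        self.count = 0
        self.data_moves = 0  # stays 0: no entry is ever relocated

    def push(self, item):
        if self.count == len(self.slots):
            raise OverflowError("queue full")
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("queue empty")
        item = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return item


def drain(queue, items):
    """Push all items, then pop them again in FIFO order."""
    for item in items:
        queue.push(item)
    return [queue.pop() for _ in items]


shifting, pointing = ShiftQueue(), PointerQueue(capacity=4)
drain(shifting, [10, 20, 30, 40])
drain(pointing, [10, 20, 30, 40])
print(shifting.data_moves, pointing.data_moves)
```

Both queues return the entries in the same order, but the shifting variant relocates entries on every removal, whereas the pointer-based variant only updates two indices.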

#### Remark

For an overview of register renaming see e.g. [62].

### 2) Using effective power saving techniques related to clocking and powering the processor

Bobcat cores utilize

- clock gating,
- power gating, as well as
- the core CC6 and package PC6 states,

i.e. low power techniques that are also used in Llano and Bulldozer cores and are described in the Section discussing the Llano lines.

#### Remarks

- 1) Data size-based clock gating is widely used in Bobcat cores for operations narrower than 64 bits. For this reason, e.g. the upper and lower halves of all result buses and of the forwarding logic are clocked independently, which allows clock gating of the circuitry that is not needed [31].
- 2) Power gating is implemented differently in the Bobcat cores than in the Llano cores and Bulldozer modules. Whereas in the Llano cores and Bulldozer modules AMD isolates VSS (ground), in the Bobcat cores VDD is isolated, as shown next [28].

#### **Power gating the Bobcat cores** [28]



#### **Power gating the NBB units** [28]



GFX: Graphics Core

GMC: Graphics Memory Control (if memory is in self-refresh)

UVD: Unified Video Decoder

#### **Power gating of VDDNB in Zacate's VDDNB power grid implementation** [28]



VDDNB: Supply voltage of the North Bridge

VDDNB\_INT: Power gated supply voltage of the North Bridge

VSS: Ground

M7, M8: Metal layers

### **Power gating of VSS in AMD's Llano cores** [32]



## **Power gating of VSS in AMD's Bulldozer modules** [33]



### Remark

Nevertheless, in Llano's NB AMD seems to power-gate VDDNB rather than VSS [34].



With

- GFX: Graphics Core
- GMC: Graphics Memory Control (when memory is in self-refresh)
- UVD: Unified Video Decoder
- PCIE GFX CONTROL: the x16 PCIe Graphics Expansion Controller
- AON: Always on (not power gated circuitry)

The GFX and GMC units are dynamically (hardware controlled) power gated whereas power gating of the other units is static, i.e. done under software (driver) control.

### **Implementation of the Core C6 state with power gating in a Bulldozer module** [33]

## POWER MANAGEMENT | Core C6 State (CC6)

- Core C6: if a core isn't active, remove power
- Implemented in this physical design by a power gating ring that isolates the Core VSS for each Bulldozer module from the "Real" VSS
- CC6 entry: when both Bulldozer cores in the module are idle, flush caches and dump register state to CC6 save space, then gate Core VSS

- CC6 exit: ungate Core VSS, reload CC6 saved state, resume execution (ex: service interrupts, etc.)




#### The Core C6 state (CC6 state)/Package C6 state (PC6 state) (simplified) [35]

#### Entering the CC6 state

- 1. OS requests entering into the CC6 state by issuing a specific I/O read or HLT instruction.
- 2. The processor enters the CC6 state if associated monitors allow this.

#### The process of entering the CC6 state

- 1. L1 and L2 caches are flushed to DRAM by hardware.
- 2. Internal core state is saved to DRAM by hardware.
- 3. The core clock ramps down to a low value.

#### Entering the power gating state

If a core has entered the CC6 state and a transition to power gating is allowed, the core is power gated by removing Vss.

#### Exiting the Core C6 state (CC6 state)

There is a set of events, such as interrupts, which cause a core to exit the CC6 state.

#### Entering the Package C6 state (PC6 state)

If all cores have entered the CC6 state and a transition to the PC6 state is enabled (by monitors), the processor enters the PC6 state.

#### Exiting the Package C6 state (PC6 state)

If one of the cores leaves the CC6 state, the processor exits the PC6 state.
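The simplified entry and exit rules above can be summarized in a small state model. This is only a sketch: the class `Package`, the flag `pc6_enabled` (standing in for the PC6 monitors) and the `monitors_allow` argument are hypothetical names, and power gating itself is not modeled.

```python
from enum import Enum


class CoreState(Enum):
    ACTIVE = "C0"
    CC6 = "CC6"


class Package:
    """Toy model of the CC6/PC6 entry and exit rules listed above."""

    def __init__(self, num_cores: int, pc6_enabled: bool = True) -> None:
        self.cores = [CoreState.ACTIVE] * num_cores
        self.pc6_enabled = pc6_enabled   # stands in for the PC6 monitors
        self.log = []                    # hardware actions, in order

    def request_cc6(self, core: int, monitors_allow: bool = True) -> bool:
        """OS request (I/O read or HLT); honoured only if the monitors allow it."""
        if not monitors_allow:
            return False
        self.log.append(f"core {core}: flush L1 and L2 caches to DRAM")
        self.log.append(f"core {core}: save internal core state to DRAM")
        self.log.append(f"core {core}: ramp core clock down")
        self.cores[core] = CoreState.CC6
        return True

    def wake(self, core: int) -> None:
        """An event such as an interrupt brings the core out of CC6."""
        self.cores[core] = CoreState.ACTIVE

    @property
    def in_pc6(self) -> bool:
        """PC6 holds only while every core is in CC6 and PC6 is enabled."""
        return self.pc6_enabled and all(
            state is CoreState.CC6 for state in self.cores
        )


pkg = Package(num_cores=2)
pkg.request_cc6(0)
print(pkg.in_pc6)    # False: core 1 is still active
pkg.request_cc6(1)
print(pkg.in_pc6)    # True: all cores are in CC6
pkg.wake(0)
print(pkg.in_pc6)    # False: one core has left CC6
```

The package state is derived from the core states rather than stored, which mirrors the rule that PC6 ends as soon as any core leaves CC6.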


## Remark [35]

Flow diagram of entering the Core C6 (CC6) and the Package C6 (PC6) states.

## 2.3 APU-lines of the Brazos platform

- 2.3.1 Overview of the Bobcat based APU lines of the Brazos platform
- 2.3.2 Mobile APU lines of the Brazos platform

### 2.3.1 Overview of the Bobcat based APU lines of the Brazos platform



DirectX 11 capable graphics.

No Turbo Core on the first devices (stepping B0), introduced in 1/2011. CPU and GPU Turbo Core were introduced later, in 8/2011, on select dual-core stepping C0 devices of the E and C Series.

Socket: FT1 BGA.

### Target markets of Bobcat based APU lines of the Brazos platform [36]



All-day battery life

#### Positioning of the Bobcat-based mobile APU lines of the Brazos platform [14]



# 2.3.2 Mobile APU lines of the Brazos platform




DirectX 11 capable graphics.

No Turbo Core on the first devices (stepping B0), introduced in 1/2011. CPU and GPU Turbo Core were introduced later, in 8/2011, on select dual-core stepping C0 devices of the E and C Series.

Socket: FT1 BGA.

#### Positioning of the Mobile APU lines of the Brazos platform [based on 14]




#### The common block diagram of the Zacate, Ontario and Desna Fusion APUs [28]



- GMC: Graphics Memory Controller
- DP: Display Port
- DFS: Digital Frequency Synthesizer
- UMI: Unified Media Interface
- FCH: Fusion Controller Hub
- LVDS: Low Voltage Differential Signaling
- VGA: Video Graphics Array

#### Remarks

- 1) AMD's Fusion architecture implements an efficient form of the UMA architecture (Unified Memory Architecture) in which a part of the system memory is used as the graphics frame buffer [28].
- 2) The Zacate, Ontario and Desna processors are part of the Brazos mobile platform, as already stated.

#### **Microarchitecture of Mobile APUs of the Brazos platform**

- Zacate, Ontario and Desna APUs have basically the same microarchitecture.
- The main difference between them is that they have decreasing clock rates (both CPU and GPU clock rates) and accordingly decreasing power consumption (18 W/9 W/~5 W), as shown below.


#### Key features of Mobile APU lines of the Brazos platform



DirectX 11 capable graphics.

No Turbo Core on the first devices (stepping B0), introduced in 1/2011. CPU and GPU Turbo Core were introduced later, in 8/2011, on select dual-core stepping C0 devices of the E and C Series.

Socket: FT1 BGA.

#### Main features of mobile APU lines of the Brazos platform

| Mobiles | Core<br>count | L2 | GPU | fc (GPU)<br>(MHz) | 1 Ch. DDR3<br>(up to) | APU line |
|---------|---------------|----|-----|-------------------|-----------------------|----------|
| Mainstream A-Series<br>(Llano-based) | | | | | | |
| Essential<br>E-series (Zacate)<br>(Bobcat-based) | 2C | 2x512 KB | HD 6320 | 488-508 | DDR3-1333 | 1/11<br><b>Zacate</b><br>40 nm, ~75 mm<sup>2</sup>, ~450 mtrs<br>1.3-1.65 GHz<br>18 W, FT1 BGA<br>E-4xx |
| Value | 1C | 512 KB | HD 6310 | 488-500 | DDR3-1333 | E-2xx/E-3xx |
| Low power<br>C-series (Ontario)<br>(Bobcat-based) | 2C | 2x512 KB | HD 6250<br>(C-30/C-50)<br>HD 6290<br>(C-60) | 276 | DDR3-1066 | 1/11<br><b>Ontario</b><br>40 nm, ~75 mm<sup>2</sup>, ~450 mtrs<br>1.0-1.2 GHz<br>9 W, FT1 BGA<br>C-50/C-60 |
| | 1C | 512 KB | | | | C-30 |
| Ultra Low power<br>Z-series (Desna)<br>(Bobcat-based) | 2C | 2x512 KB | HD 6250 | 276 | DDR3-1066 | 6/11<br><b>Desna</b><br>40 nm, ~75 mm<sup>2</sup>, ~450 mtrs<br>1.0 GHz<br>5.9 W, FT1 BGA<br>Z-0x |

# Differences in the microarchitectures of the Bobcat-based Zacate, Ontario and Desna APUs-1

a) A few Zacate and Ontario models (E-series and C-series) have only a single Bobcat core, while the other core is disabled on the die, as indicated in the next Figure.

# Differences in the microarchitectures of the Bobcat-based Zacate, Ontario and Desna APUs-2

| Mobiles | Core<br>count | L2 | GPU | <b>fc (GPU)</b><br>(MHz) | 1 Ch. DDR3<br>(up to) | APU line |
|---------|---------------|----|-----|--------------------------|-----------------------|----------|
| Mainstream A-Series<br>(Llano-based) | | | | | | |
| Essential<br>E-series (Zacate) | 2C | 2x512 KB | HD 6320 | 488-508 | DDR3-1333 | 1/11<br><b>Zacate</b><br>40 nm, ~75 mm<sup>2</sup>, ~450 mtrs<br>1.3-1.65 GHz<br>18 W, FT1 BGA<br>E-4xx |
| Value | 1C | 512 KB | HD 6310 | 488-500 | DDR3-1333 | E-2xx/E-3xx |
| Low power<br>C-series (Ontario) | 2C | 2x512 KB | HD 6250<br>(C-30/C-50)<br>HD 6290<br>(C-60) | 276 | DDR3-1066 | 1/11<br><b>Ontario</b><br>40 nm, ~75 mm<sup>2</sup>, ~450 mtrs<br>1.0-1.2 GHz<br>9 W, FT1 BGA<br>C-50/C-60 |
| | 1C | 512 KB | | | | C-30 |
| Ultra Low power<br>Z-series (Desna) | 2C | 2x512 KB | HD 6250 | 276 | DDR3-1066 | 6/11<br><b>Desna</b><br>40 nm, ~75 mm<sup>2</sup>, ~450 mtrs<br>1.0 GHz<br>5.9 W, FT1 BGA<br>Z-0x |

# Differences in the microarchitectures of the Bobcat-based Zacate, Ontario and Desna APUs-3

b) Different models of the Zacate, Ontario and Desna lines include slightly different GPUs, nevertheless all have the same number of processing units (80), as shown below.

#### Use of different GPU cores in different mobile processors [29]

#### "Zacate" (40 nm)

| Model | Step. | Cores | CPU Freq. | CPU Turbo | L2 Cache | Multi<sup>1</sup> | V<sub>core</sub> | GPU Model | GPU Config<sup>2</sup> | GPU Freq. | GPU Turbo | Memory | TDP | Released |
|-------|-------|-------|-----------|-----------|----------|--------------------|-------------------|-----------|------------------------|-----------|-----------|--------|-----|----------|
| E-240 | B0 | 1 | 1.5 GHz | N/A | 512 KB | 15× | 1.175 - 1.35 | HD 6310 | 80:8:4 | 500 MHz | N/A | DDR3-1066 | 18 W | Jan 4, 2011 |
| E-300 | B0 | 2 | 1.3 GHz | N/A | 2 × 512 KB | 13× | | HD 6310 | 80:8:4 | 488 MHz | N/A | DDR3-1333 | 18 W | Aug 22, 2011 |
| E-350 | B0 | 2 | 1.6 GHz | N/A | 2 × 512 KB | 16× | 1.25 - 1.35 | HD 6310 | 80:8:4 | 492 MHz | N/A | DDR3-1066 | 18 W | Jan 4, 2011 |
| E-450 | C0 | 2 | 1.65 GHz | N/A | 2 × 512 KB | 16.5× | | HD 6320 | 80:8:4 | 508 MHz | 600 MHz | DDR3-1333 | 18 W | Aug 22, 2011 |
| E1-1200 | | 2 | 1.4 GHz | | 2 × 512 KB | 14× | | HD 7310 | 80:8:4 | 500 MHz | N/A | DDR3-1066 | 18 W | Q1 2012 |
| E2-1800 | | 2 | 1.7 GHz | | 2 × 512 KB | 17× | | HD 7340 | 80:8:4 | 523 MHz | 680 MHz | DDR3-1333 | 18 W | Q1 2012 |

#### "Ontario" (40 nm)

| Model | Step. | Cores | CPU Freq. | CPU Turbo | L2 Cache | Multi<sup>1</sup> | V<sub>core</sub> | GPU Model | GPU Config<sup>2</sup> | GPU Freq. | GPU Turbo | Memory | TDP | Released |
|-------|-------|-------|-----------|-----------|----------|--------------------|-------------------|-----------|------------------------|-----------|-----------|--------|-----|----------|
| C-30 | | 1 | 1.2 GHz | N/A | 512 KB | 12× | 1.25 - 1.35 | HD 6250 | 80:8:4 | 276 MHz | N/A | DDR3-1066 | 9 W | Jan 4, 2011 |
| C-50 | B0 | 2 | 1.0 GHz | N/A | 2 × 512 KB | 10× | 1.05 - 1.35 | HD 6250 | 80:8:4 | 276 MHz | N/A | DDR3-1066 | 9 W | Jan 4, 2011 |
| C-60 | C0 | 2 | 1.0 GHz | 1.33 GHz | 2 × 512 KB | 10-13.3× | | HD 6290 | 80:8:4 | 276 MHz | 400 MHz | DDR3-1066 | 9 W | Aug 22, 2011 |

#### "Desna" (40 nm)

| Model | Step. | Cores | CPU Freq. | CPU Turbo | L2 Cache | Multi<sup>1</sup> | V<sub>core</sub> | GPU Model | GPU Config<sup>2</sup> | GPU Freq. | GPU Turbo | Memory | TDP | Released |
|-------|-------|-------|-----------|-----------|----------|--------------------|-------------------|-----------|------------------------|-----------|-----------|--------|-----|----------|
| Z-01 | B0 | 2 | 1.0 GHz | N/A | 2 × 512 KB | 10× | | HD 6250 | 80:8:4 | 276 MHz | N/A | DDR3-1066 | 5.9 W | June 1, 2011 |
| Z-03 | | 2 | 1.0 GHz | N/A | 2 × 512 KB | 10× | | HD 6250 | 80:8:4 | 276 MHz | N/A | DDR3-1066 | 4.5 W | Q1 2012 |

#### The Brazos notebook platform [37]



### Positioning the Hudson-M1 (A50M) Fusion Controller Hub (FCH) [38]

- All models support up to 4 channels of HD audio, except the A55T, which supports up to 2 channels.

| Model | Codename | Platform | Fab<br>(nm) | UMI | SATA | USB<br>3.0 + 2.0 + 1.1 | RAID | Gb Ethernet<br>MAC | 33 MHz<br>PCI | SD<sup>1</sup> | VGA<br>DAC | TDP<br>(W) | Features / Notes |
|-------|----------|----------|-------------|-----|------|------------------------|------|--------------------|---------------|-----------------|------------|------------|------------------|
| A55T | Hudson-M2T<sup>[N 1]</sup> | "Brazos" T | | ×2 Gen 1 | 1 × 3 Gbit/s<br>AHCI 1.1 | 0 + 8 + 0 | No | No | | SDIO | No | | |
| A50M | Hudson-M1<sup>[N 1]</sup> | "Brazos" | | ×4 Gen 1<sup>[M 1]</sup> | 6 × 6 Gbit/s | 0 + 14 + 2 | | | | No | | | ~920 mW idle |
| A60M | Hudson-M2<sup>[N 1]</sup> | "Sabine" | | | AHCI 1.2 | | | | | | Yes | | |
| A68M | Hudson-M3L<sup>[N 1]</sup> | "Brazos"<br>2.0 | 65 | ×4 Gen 1 | 2 × 6 Gbit/s<br>AHCI 1.2 | 2 + 8 + 0 | RAID 0,1 | 10/100/1000 | No | Yes | No | 4.7 | ~750 mW idle |
| A70M | Hudson-M3<sup>[N 1]</sup> | "Indus"<br>"Comal"<br>"Sabine" | | +DP | 6 × 6 Gbit/s<br>AHCI 1.2 | 4 + 10 + 2 | RAID 0,1 | | | | Yes | | First native<br>USB 3.0<br>controller |
| A45 | Hudson-D1<sup>[N 2]</sup> | | | ×4 Gen 2<sup>[M 2]</sup> | 6 × 3 Gbit/s<br>AHCI 1.1 | 0 + 14 + 2 | No | No | Up to<br>4 slots | No | No | | |
| A55 | Hudson-D2<sup>[N 2]</sup> | | | | | | | | | | | 7.6 | |
| A75 | Hudson-D3<sup>[N 2]</sup> | "Lynx" | 65 | ×4 Gen 2<br>+DP | 6 × 6 Gbit/s<br>AHCI 1.2 | 4 + 10 + 2 | RAID 0,1,10 | 10/100/1000 | Up to<br>3 slots | Yes | Yes | 7.8 | First native<br>USB 3.0<br>controller |
| A85X | Hudson-D4 | "Virgo" | | | 8 × 6 Gbit/s<br>AHCI 1.2 | | RAID<br>0,1,5,10 | | | | | | |
| A55E | Hudson-E1<sup>[N 3]</sup> | | 65 | ×4 Gen 2 | 6 × 6 Gbit/s<br>AHCI 1.2 | 0 + 14 + 2 | RAID<br>0,1,5,10 | 10/100/1000 | Up to<br>4 slots | No | No | 5.9 | |

Codenames:

- M: for notebook platform
- D: for desktop platform
- E: for embedded platform

UMI:

- UMI ×4 Gen 1 is based on PCIe 1.1 × 4 lanes, giving 1 GB/s bandwidth
- UMI ×4 Gen 2 is based on PCIe 2.0 × 4 lanes, giving 2 GB/s bandwidth
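The two UMI bandwidth figures follow from the PCIe signalling rates. The sketch below assumes the standard per-lane rates of 2.5 GT/s (PCIe 1.1) and 5 GT/s (PCIe 2.0) with 8b/10b encoding, and reports bandwidth per direction; the function name `link_bandwidth_gbytes` is chosen here only for illustration.

```python
# Per-lane raw signalling rates in GT/s; both generations use 8b/10b encoding.
PCIE_GT_PER_S = {"1.1": 2.5, "2.0": 5.0}


def link_bandwidth_gbytes(pcie_gen: str, lanes: int) -> float:
    """Usable bandwidth per direction, in GB/s, of a PCIe 1.1/2.0 based link."""
    raw_gbit = PCIE_GT_PER_S[pcie_gen] * lanes   # Gbit/s on the wire
    payload_gbit = raw_gbit * 8 / 10             # 8 data bits per 10 line bits
    return payload_gbit / 8                      # bits -> bytes


print(link_bandwidth_gbytes("1.1", 4))   # UMI x4 Gen 1
print(link_bandwidth_gbytes("2.0", 4))   # UMI x4 Gen 2
```

With four lanes this gives 1 GB/s for Gen 1 and 2 GB/s for Gen 2, matching the values quoted above.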

#### Die plots of the Zacate, Ontario and Desna APUs

All three APUs have the same die plot, as demonstrated below.

### Die plots of the Zacate and Ontario processors [28], [31]

#### Zacate

### Ontario




#### **Die plots of the Zacate and Desna processors** [36]

#### Zacate



#### Desna



### **Contrasting the die plots of Ontario and Intel's Pineview dual core Atom processor** [39]




#### **OpenCL programming support for the Ontario and the Zacate lines**

AMD's APP SDK 2.3 (1/2011), formerly called ATI Stream SDK, provides OpenCL support for both lines.

New features of APP SDK 2.3 [40]

- Improved OpenCL runtime performance:
  - Improved kernel launch times.
  - Improved PCIe transfer times.
  - Enabled DRMDMA for the ATI Radeon 5000 Series and AMD Radeon 6800 GPUs that are specified in the Supported Devices.
  - Increased size of staging buffers.
- Enhanced Binary Image Format (BIF).
- Support for UVD video hardware component through OpenCL (Windows 7).
- Support for AMD E-Series and C-Series platforms (AMD Fusion APUs).
- Support for Northern Islands family of devices.
- Support for AMD Radeon<sup>™</sup> HD 6310 and AMD Radeon<sup>™</sup> 6250 devices.
- Support for OpenCL math libraries: FFT and BLAS-3, available for download at AMD Accelerated Parallel Processing Math Libraries.
- Preview feature: An optimization pragma for unrolling loops.
- Preview feature: Support for CPU/X86 image. This enables the support for Image formats, as described in the Khronos specification for OpenCL, to be run on the x86 CPU. It is enabled by the following environment variable in your application: CPU\_IMAGE\_SUPPORT.

### Positioning of the Bobcat-based Brazos mobile lines vs. Intel's lines [41]



#### Remark

A-Series processors address the midrange and high-end market [41]

#### Main features of Bobcat-based ultra-portable mobile lines with a TDP of ~10-20W

| Base arch./<br>stepping | Intro | Ultra-portable<br>mobile family | Series | Techn. | Core count<br>(up to) | L2<br>(up to) | L3 | GPU<br>(APU) | Memory<br>(up to) | TDP<br>[W] | Socket |
|-------------------------|-------|---------------------------------|--------|--------|-----------------------|---------------|----|--------------|-------------------|------------|--------|
| <b>Family 14h</b><br>(00h-0Fh)<br>(Bobcat) | 1/2011 | Zacate<br>(not SoC) | E<br>Series | 40 nm | 2 | 512 KB/core<br>private | - | Yes | DDR3L-1333 | 18 | FT1<br>(BGA) |
| <b>Family 14h</b><br>(00h-0Fh)<br>(Bobcat) | 6/2012 | Zacate<br>(not SoC) | E1/E2<br>Models | 40 nm | 2 | 512 KB/core<br>private | - | Yes | DDR3L-1333 | 18 | FT1<br>(BGA) |
| <b>Family 14h</b><br>(00h-0Fh)<br>(Bobcat) | 1/2011 | Ontario | C<br>Series | 40 nm | 2 | 512 KB/core<br>private | - | Yes | DDR3-1066 | 9 | FT1<br>(BGA) |
| Family 16h<br>(00h-0Fh)<br>(Jaguar) | 5/2013 | Kabini<br>(SoC) | A<br>Series | 28 nm | 4 | 2 MB<br>shared | - | Yes | DDR3L-1866 | 9/15 | FT3 |
| Family 16h<br>(30h-3Fh)<br>(Puma+) | 4/2014 | Beema<br>(SoC) | A<br>Series | 28 nm | 4 cores<br>with a shared<br>L2 cache | 2 MB<br>shared | - | Yes | DDR3L-1866 | 15 | FT3b |
| Family 16h<br>(30h-3Fh)<br>(Puma+) | 5/2015 | Carrizo-L<br>(SoC) | A<br>Series | 28 nm | 4 cores<br>with a shared<br>L2 cache | 2 MB<br>shared | - | Yes | DDR3L-1866 | 10/15 | FP4 |
| Family 17h<br>(00h-0Fh)<br>(Zen) | 10/2017 | Raven Ridge<br>(SoC) | Ryzen<br>7/5/3 | 14 nm | 4-core CCX,<br>private L2<br>and shared<br>L3 cache(s) | ½ MB/core | 1 MB/core | Yes | DDR4-2400 | 15 | AM4 |

APU: Accelerated Processing Unit (CPU + GPU)

CCX: Core CompleX

<sup>2</sup>: 2\*512 KB for Turion X2, 2\*1 MB for Turion X2 Ultra

UMI: Unified Media Interface

# **Die plot of the Zacate APU** [42]




#### "Zacate" (40nm)


- All models support: SSE, SSE2, SSE3, SSSE3, SSE4a, NX bit, AMD64, PowerNow!, AMD-V
- Memory support: DDR3 SDRAM, DDR3L SDRAM (Single-channel, up to 1066 MHz)
- Config GPU: Unified shaders (vertex shader/geometry shader/pixel shader) : Texture mapping units : Render output units

| Model<br>Number | CPU<br>cores | Freq. | L2 Cache | Multi<sup>1</sup> | Voltage | Model<br>GPU | Config<br>GPU | GPU<br>Freq. | UMI | TDP | Socket | Release<br>Date |
|-----------------|--------------|-------|----------|--------------------|---------|--------------|---------------|--------------|-----|-----|--------|-----------------|
| E-Series<br>E-240 | 1 | 1.5 GHz | 512 KB | 18.75x | 1.175 - 1.35 | HD 6310 | 80:8:4 | 500 MHz | 2.5 GT/s | 18 W | BGA-413 | January 4, 2011 |
| E-Series<br>E-350 | 2 | 1.6 GHz | 2x 512 KB | 20x | 1.25 - 1.35 | HD 6310 | 80:8:4 | 500 MHz | 2.5 GT/s | 18 W | BGA-413 | January 4, 2011 |

Table: Main features of AMD's mainstream Zacate Fusion APU line [43]

#### "Ontario" (40nm)

- All models support: SSE, SSE2, SSE3, SSSE3, SSE4a, NX bit, AMD64, PowerNow!, AMD-V
- Memory support: DDR3 SDRAM, DDR3L SDRAM (Single-channel, up to 1066 MHz)
- Config GPU: Unified shaders (vertex shader/geometry shader/pixel shader) : Texture mapping units : Render output units

| Model<br>Number | CPU<br>cores | Freq. | L2 Cache | Multi<sup>1</sup> | Voltage | Model<br>GPU | Config<br>GPU | GPU<br>Freq. | UMI | TDP | Socket | Release Date |
|-----------------|--------------|-------|----------|--------------------|---------|--------------|---------------|--------------|-----|-----|--------|--------------|
| C-Series<br>C-30 | 1 | 1.2 GHz | 512 KB | 15x | 1.25 - 1.35 | HD 6250 | 80:8:4 | 280 MHz | 2.5 GT/s | 9 W | BGA-413 | January 4, 2011 |
| C-Series<br>C-50 | 2 | 1.0 GHz | 2x 512 KB | 12.5x | 1.05 - 1.35 | HD 6250 | 80:8:4 | 280 MHz | 2.5 GT/s | 9 W | BGA-413 | January 4, 2011 |

Table: Main features of AMD's low power Ontario Fusion APU line [43]
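The Multi column relates the core frequency to a reference clock (core frequency = multiplier × reference clock). As a quick consistency check, the sketch below derives the reference clock implied by the rows of the two tables above [43]; the helper name `implied_reference_clock_mhz` is an assumption made for illustration, and no claim is made here about which reference clock the hardware actually uses.

```python
def implied_reference_clock_mhz(core_freq_ghz: float, multiplier: float) -> float:
    """Reference clock in MHz implied by a core frequency and its multiplier."""
    return core_freq_ghz * 1000.0 / multiplier


# (core frequency in GHz, multiplier) as listed in the tables from [43]
ROWS_43 = {
    "E-240": (1.5, 18.75),
    "E-350": (1.6, 20.0),
    "C-30": (1.2, 15.0),
    "C-50": (1.0, 12.5),
}

for model, (freq_ghz, multi) in ROWS_43.items():
    print(model, round(implied_reference_clock_mhz(freq_ghz, multi), 1))
```

All four rows imply the same reference clock of 80 MHz. The multipliers quoted from [29] earlier (e.g. 15× for the 1.5 GHz E-240) would instead correspond to 100 MHz, so the two sources apparently assume different reference clocks.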

# 2.4 Bobcat-based APU lines of the Brazos 2.0 platform


#### **APU lines of the Brazos 2.0 platform**

Introduced in 6/2012

#### Brazos 2.0 APU line

- Zacate APUs: E2-1800, E1-1200
- TDP: 18 W
- DX 11 support
- OpenCL 1.1 support
- GPU Turbo mode available on the E2-1800 model

The same models are used both in value desktops and value notebooks (termed also as extra portable notebooks).

### **Contrasting the Brazos 2.0 and Brazos platforms**

### a) Contrasting the Brazos 2.0 APUs and the previous Brazos APUs [44]

Neither the E2-1800 nor the E1-1200 CPU cores are new designs; they are only minor upgrades of the previous Brazos E-450 and E-300 CPUs, as detailed in the subsequent Remark.

### Remark

Differences of the Brazos 2.0 APUs from the previous Brazos APUs

- The CPUs of the Brazos 2.0 APUs are minor upgrades of the previous Brazos line, as follows:

  The E2-1800 is an upgrade of the previous Brazos E-450 whereas the E1-1200 is an upgrade of the E-300.

  The improvement is restricted to a clock speed increase of a few %, as indicated in the Table below.

- The GPUs of the Brazos 2.0 APUs are rebranded GPUs of the HD 6xxx line, as follows:

  The HD 7310 is a rebranded HD 6310 whereas the HD 7340 is a rebranded HD 6320, both with a slight clock speed increase, as shown below.

|                     | Brazos 2.0:<br>E2-1800 | Brazos 2.0:<br>E1-1200 | Brazos:<br>E-450 | Brazos:<br>E-300 |
|---------------------|---------|---------|----------|---------|
| fc of the CPU       | 1.7 GHz | 1.4 GHz | 1.65 GHz | 1.3 GHz |
| GPU type            | HD 7340 | HD 7310 | HD 6320  | HD 6310 |
| Basic fc of the GPU | 523 MHz | 500 MHz | 508 MHz  | 488 MHz |
| Turbo fc of the GPU | 680 MHz |         | 600 MHz  |         |
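The size of these clock speed increases can be worked out directly from the frequencies in the table. The following sketch only does that arithmetic; the dictionary `UPGRADES` and the helper `percent_increase` are names introduced here for illustration.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# (previous Brazos value, Brazos 2.0 value), taken from the table above
UPGRADES = {
    "CPU fc, E-450 -> E2-1800 (GHz)": (1.65, 1.7),
    "CPU fc, E-300 -> E1-1200 (GHz)": (1.3, 1.4),
    "GPU basic fc, E-450 -> E2-1800 (MHz)": (508, 523),
    "GPU basic fc, E-300 -> E1-1200 (MHz)": (488, 500),
    "GPU Turbo fc, E-450 -> E2-1800 (MHz)": (600, 680),
}

for label, (old, new) in UPGRADES.items():
    print(f"{label}: +{percent_increase(old, new):.1f} %")
```

The CPU and basic GPU clocks rise by roughly 2.5 to 8 %, while the GPU Turbo clock of the E2-1800 rises by about 13 % over that of the E-450.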

# b) Contrasting the FCHs of the Brazos 2.0 platform with those of the previous Brazos platform [45].

|                               | Notebook       | Notebook       | Desktop        | Desktop        |
|-------------------------------|----------------|----------------|----------------|----------------|
| 2011 Branding                 | A50M           | A68M           | A45            | A68            |
| Target Platform               | "Brazos"       | "Brazos 2.0"   | "Brazos"       | "Brazos 2.0"   |
| PCI Express Support (CF)      | Y              | 1x4            | Y              | 1x4            |
| Clock Gen                     | N              | Y              | N              | Y              |
| SATA                          | 6 x 6Gb/s      | 2 x 6Gb/s      | 6 x 3Gb/s      | 3 x 6Gb/s      |
| HD Audio                      | Up to 4-codecs | Up to 4-codecs | Up to 4-codecs | Up to 4-codecs |
| PCIe GPPs                     | 4 x1 Gen 2     | 4 x1 Gen 2     | 4 x1 Gen 2     | 4 x1 Gen 2     |
| Unified Media Interface (UMI) | x4 Gen 1       | x4 Gen 2       | x4 Gen 2       | x4 Gen 2       |
| USB 3.0 + 2.0 + 1.1 Ports     | 0 + 14 + 2     | 2 + 8 + 0      | 0 + 14 + 2     | 2 + 8 + 0      |
| APU Fan Control               | Y              | Y              | Y              | Y              |
| SD Controller                 | N              | Y              | N              | Y              |
| 33MHz PCI                     | N              | N              | Up to 4 slots  | Up to 3 slots  |
As indicated, the major differences are additional PCIe 2.0 x4 and USB 3.0 support in Brazos 2.0 FCHs.

# Battery life comparison of Brazos, Brazos 2.0 and Intel's Atom models [45]



# Main features of Mobile APU models of the Brazos 2.0 platform [45]

2012 AMD E-Series APU for Essential Notebooks and Desktops

| APU Model | AMD<br>Radeon™<br>Brand | TDP | CPU<br>Cores | CPU Clock<br>(Max/Base) | AMD<br>Radeon™<br>Cores | GPU Clock<br>(Max/Base) | L2 Cache | Max DDR3 |
|-----------|-------------------------|-----|--------------|-------------------------|-------------------------|-------------------------|----------|----------|
| E2-1800   | HD 7340 | 18W | 2 | 1.7GHz | 80 | 680MHz/<br>523MHz | 1MB | DDR3-1333<br>DDR3L-1066<br>DDR3U-1066 |
| E1-1200   | HD 7310 | 18W | 2 | 1.4GHz | 80 | 500MHz | 1MB | DDR3-1066<br>DDR3L-1066<br>DDR3U-1066 |

# The Brazos 2.0 platform [45]

#### "Zacate" APUs

- Dual Core CPU
  - 40nm "Bobcat" core - 1 MB L2, 64-bit FPU
  - 18W TDP
  - Complete ISA support - SSE1-3 and virtualization
- Graphics Core Engine – 80 Radeon<sup>™</sup> Cores
  - New AMD Radeon<sup>™</sup> HD 7000 Series GPU
  - DirectX® 11 capable - OpenCL<sup>™</sup> 1.1 enabled
  - Improved graphics boost
- 3rd generation Unified Video Decoder
- DDR3 1066-1333, 2 DIMMs
- FT1 BGA package (same as current Brazos)

**Display and I/O**

- Two dedicated digital display interfaces
  - Configurable as HDMI, DVI, and/or Display Port
  - Also supports single link LVDS for internal panels
- VGA
- 8 PCIe Gen 2

**"Hudson-3L" FCH (A68/A68M)**

- USB 3.0 Support (2 ports) / USB 2.0 (8 ports)
- Desktops: A68
- Mobiles: A68M

Block diagram labels (partly recoverable): Memory Control, x86 Cores, SIMD Array, Universal Video Decoder, Platform Interfaces, SATA, LPC, SPI, PCIe 4x1, HD Audio, CIR, USB2/USB3, SD Controller.


# Positioning the Brazos 2.0 APU models [45]

| Platform     | AMD                                                                    | Intel         |
|--------------|------------------------------------------------------------------------|---------------|
| "Brazos 2.0" | AMD E2-1800 Accelerated Processor<br>with AMD Radeon™ HD 7340 Graphics | Pentium       |
| "Brazos 2.0" | AMD E1-1200 Accelerated Processor<br>with AMD Radeon™ HD 7310 Graphics | Celeron       |
| "Brazos"     | AMD C-60 Accelerated Processor<br>with AMD Radeon™ HD 6290 Graphics    | Atom          |
|              | AMD Z-01 Accelerated Processor<br>with AMD Radeon™ HD 6250 Graphics    | Atom Z-Series |

# 2.5 APU lines of the Embedded G-Series platform

### **Embedded G-Series**

- Introduced in 1/2011
- Typical use: In TV Set Top Boxes
- They are basically Zacate, Ontario or Desna designs, as shown below.

# Main features of the embedded APU lines [29]

| Group                | TDP       | Core count | L2       | fc CPU       | GPU (DirectX 11) | fc GPU      | Introduced | Design                                               | Models                            |
|----------------------|-----------|------------|----------|--------------|------------------|-------------|------------|------------------------------------------------------|-----------------------------------|
| G-Series with GPU    | 18 W      | 2C         | 2x512 KB | 1.4-1.65 GHz | HD 6310, HD 6320 | 500/520 MHz | 1/11       | **Zacate**<br>40 nm, up to DDR3-1333<br>FT1 BGA      | T48N, T56N                        |
| G-Series with GPU    | 18 W      | 1C         | 512 KB   | 1.4-1.65 GHz | HD 6310, HD 6320 | 500/520 MHz | 1/11       | **Zacate**                                           | T52R (1C disabled)                |
| G-Series with GPU    | 9 W       | 2C         | 2x512 KB | 1.0-1.2 GHz  | HD 6250, HD 6290 | 280 MHz     | 1/11       | **Ontario**                                          | T40N                              |
| G-Series with GPU    | 9 W       | 1C         | 512 KB   | 1.0-1.2 GHz  | HD 6250, HD 6290 | 280 MHz     | 1/11       | **Ontario**                                          | T44R                              |
| G-Series with GPU    | 6.4/5.5 W | 2C         | 2x512 KB | 1.0 GHz      | HD 6250          | 280 MHz     | 6/11       | **Desna**                                            | T40E                              |
| G-Series with GPU    | 6.4/5.5 W | 1C         | 512 KB   | 1.0 GHz      | HD 6250          | 280 MHz     | 6/11       | **Desna**                                            | T40R (1C disabled)                |
| G-Series without GPU | 18 W      | 2C         | 2x512 KB | 1.4 GHz      | none             | none        |            | **Zacate**                                           | T48L                              |
| G-Series without GPU | 18 W      | 1C         | 512 KB   | 1.4 GHz      | none             | none        |            | **Zacate**                                           | T30L (GPU disabled)               |
| G-Series without GPU | 5 W       | 1C         | 512 KB   | 0.8-1.0 GHz  | none             | none        | 3/11       | **Desna**                                            | T24L (1C disabled, GPU disabled)  |


# **Graphics support of the G-Series** [46]

- DirectX 11 with Shader Model 5
- OpenCL 1.1
- OpenGL 3.2.2.1

#### System architecture of the Bobcat-based embedded APU lines [47]



# Main features of the Controller Hubs of the lines of the Embedded G-Series platform [47]

| Model | PCI Express® | PCI | Gigabit Ethernet MAC | RAID | SATA | USB | Additional Interfaces/Features |
|-------|--------------|-----|----------------------|------|------|-----|--------------------------------|
| A55E  | 4x1 or 1x4 PCIe® Generation 2<br>UMI connection to APU | 33MHz with support for 4 masters | Yes | 0/1/5/10 support with FIS-based switching | 6x 6Gb/s ports | 14x v2.0 ports | SPI, LPC, SMBus, CIR, HD audio, up to 102 GPIO, Fan control, Integrated Clock Generation, pin compatible for A50M and A55E |
| A50M  | 4x1 or 1x4 | No | No | No | | | |


Use of a Bobcat-based embedded APU in a TV set top box [48]



# 3. Family 16h Models 00h-0Fh (Jaguar-based) APU lines

- 3.1 Overview of the Jaguar-based APU lines
- 3.2 The Jaguar core
- 3.3 Jaguar-based APU lines

# 3.1 Overview of the Jaguar-based APU lines


• Jaguar-based APUs are AMD's second low power oriented product series; they were announced at Hot Chips 2012 in 8/2012 and introduced in 5/2013.

They focus on the mobile market, especially on ultra-light notebooks and tablets, but also include a line of microservers, likewise released in 5/2013.

- Jaguar-based products follow AMD's successful Bobcat-based product lines of the Brazos platform that are typically used in ultra-light notebooks and tablets.
- Jaguar-based APUs compete with Intel's Atom line of processors, and also with ARM designs, as ARM's processor designs are used in most smartphones and tablets, e.g. in Apple's iPad, Google's Nexus 7 and Amazon's Kindle Fire [49].
- They are part of AMD's new market strategy that emphasizes the mobile market segment in AMD's overall market policy.
- Jaguar-based APU products basically include up to four Jaguar cores and a GPU with capabilities similar to those of a low-end discrete graphics card.
- They are built in 28 nm technology, with a die size of ~110 mm<sup>2</sup> (compared to 75 mm<sup>2</sup> for Bobcat-based APUs + 28 mm<sup>2</sup> for the associated FCH) [50] and 1178 million transistors (vs. 450 million for Bobcat-based APUs) [51].
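
The die-size and transistor figures in the last item can be put side by side with a few lines of arithmetic. The following is only a minimal sketch, assuming that "mtrs" stands for million transistors and that adding the FCH area to the Bobcat-based APU gives the fairer area baseline; all variable names are illustrative.

```python
# Figures quoted in the list above; "mtrs" is read here as million
# transistors (an assumption about the abbreviation).
JAGUAR_APU_MM2 = 110.0   # approximate die size of a Jaguar-based APU
JAGUAR_MTRS = 1178.0
BOBCAT_APU_MM2 = 75.0    # die size of a Bobcat-based APU
BOBCAT_FCH_MM2 = 28.0    # die size of the associated FCH
BOBCAT_MTRS = 450.0


def density(mtrs: float, area_mm2: float) -> float:
    """Return transistor density in million transistors per mm^2."""
    return mtrs / area_mm2


# Silicon area of the Bobcat-based APU plus its separate FCH.
bobcat_total_mm2 = BOBCAT_APU_MM2 + BOBCAT_FCH_MM2

jaguar_density = density(JAGUAR_MTRS, JAGUAR_APU_MM2)
bobcat_density = density(BOBCAT_MTRS, BOBCAT_APU_MM2)

print(f"Bobcat APU + FCH area: {bobcat_total_mm2:.0f} mm^2")
print(f"Jaguar density: {jaguar_density:.1f} Mtr/mm^2")
print(f"Bobcat density: {bobcat_density:.1f} Mtr/mm^2")
```

Under these assumptions the Jaguar-based die is only slightly larger than the Bobcat-based APU and its FCH taken together, while packing noticeably more transistors per unit area.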

#### **Design goals of the Jaguar-based APU lines** [26]



AMD TECH DAY | MAY 2013 | EMBARGO - MAY 23 2013 12:01 AM EST

Remark

Process portability means the ability of different manufacturers to take the Jaguar design and fabricate it [52].

**Positioning the Jaguar-based APU lines** (The Kyoto micro server line not shown)[based on 14]



# Brand names of AMD's Jaguar-based processor lines

| Segment   | Launched in                                         | 2011                                      | 2012                                | 2013                                | 2014                               | 2015                               |
|-----------|-----------------------------------------------------|-------------------------------------------|-------------------------------------|-------------------------------------|------------------------------------|------------------------------------|
|           |                                                     | Family 14h<br>(00h-0Fh)<br>(Bobcat)       | Family 14h<br>(00h-0Fh)<br>(Bobcat) | Family 16h<br>(00h-0Fh)<br>(Jaguar) | Family 16h<br>(30h-3Fh)<br>(Puma+) | Family 16h<br>(30h-3Fh)<br>(Puma+) |
| Servers   | 4P servers                                          |                                           |                                     |                                     |                                    |                                    |
| Servers   | 2P servers                                          |                                           |                                     |                                     |                                    |                                    |
| Servers   | 1P servers<br>(85-140 W)                            |                                           |                                     |                                     |                                    |                                    |
| Desktops  | High perf.<br>(~95-125 W)                           |                                           |                                     |                                     |                                    |                                    |
| Desktops  | <b>Mainstream</b><br>(~65-100 W)                    |                                           |                                     |                                     |                                    |                                    |
| Desktops  | <b>Entry level</b><br>(~30-60 W)                    |                                           |                                     |                                     |                                    |                                    |
| Notebooks | High performance/<br>mainstream/entry<br>(~30-60 W) |                                           |                                     | Kabini A6                           |                                    |                                    |
| Notebooks | <b>Ultra portable</b><br>(~10-15 W)                 | Zacate<br>E-Series<br>Ontario<br>C-Series | Zacate<br>E1/E2                     | Kabini<br>A/E-Series                | Beema<br>A/E-Series                | Carrizo-L<br>A/L-Series            |
|           | Tablet<br>(~5 W)                                    | Desna<br>Z-Series                         |                                     | Temash<br>A Series                  | Mullins<br>A Series/E1             |                                    |


#### **Overview of the Jaguar-based APU lines**



# **Overview of Jaguar cores in game consoles** [52], [53]

- Sony's PS4 (PlayStation 4) (the previous PS3 incorporated a Sony/Toshiba/IBM Cell processor and an Nvidia graphics chip based on the GeForce 7 series)
- Microsoft's Xbox One (the previous Xbox 360 incorporated a triple core IBM PowerPC-based processor and an AMD GPU)
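- Nintendo's Wii U, a console of the same generation, is not Jaguar-based: it retains an IBM PowerPC-based processor combined with an AMD GPU (the previous Wii incorporated a single-core IBM PowerPC-based processor and an ATI GPU).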


# **Design goals of the Jaguar core** [3]

# "JAGUAR" DESIGN GOALS

Improve on "Bobcat": performance in a given power envelope

- More IPC
- Better Frequency at given Voltage
- Improve power efficiency thru clock gating and unit redesign

Update the ISA/Feature Set



# Manually power optimized design of the Jaguar core [4]

- The Jaguar core uses the 28 nm process technology and has a small area of 3.1 mm<sup>2</sup>.
- The core design was manually optimized by using Calypto's PowerPro power simulation tool.
- This tool analyzes the pre-synthesis design and provides hints for improving clock gating.
- The design team went through a number of design iterations, each requiring a number of weeks, to reduce the number of active (not gated) flip-flops in the design.

# Example: Reducing the number of active (not gated) flops during cpu-halt in the design iteration process



#### Figure: Average clocked flops during cpu-halt [4]

#### Results of the manually optimized design of the Jaguar core

One of the circuit designers summarized the results of the design efforts as follows:

"As design goals included increasing the frequency and instructions per clock cycle (IPC) in this generation of the core, designers worked on timing and minimizing the gates between flops. The goal at the start of the project was to lower typical application power by 10%. Ultimately, using a design methodology that included deployment of PowerPro® from Calypto®, AMD was able to lower the typical power by approximately 20% while increasing frequency at the given voltage by over 10%" [4].

#### **IPC** improvements and clock gating efficiency of the Jaguar core vs. the Bobcat core [4]

|                 | Bobcat<br>x86 IPC | Jaguar<br>x86 IPC | Bobcat Core<br>% Flops Active | Jaguar Core %<br>Flops Active |
|-----------------|-------------------|-------------------|-------------------------------|-------------------------------|
| Halt            | 0.00              | 0.00              | 8.2                           | 1.2                           |
| AppTyp          | 0.95              | 1.10              | 10.3                          | 7.7                           |
| Bobcat<br>Virus | 1.74              | 1.78              | 15.4                          | 12.9                          |
| Jaguar<br>Virus | 0.81              | 1.86              | 14.3                          | 15.0                          |

*Table: Comparison of IPC and clock-gating improvements* [4]
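
The relative changes implied by the table can be worked out directly. The sketch below is illustrative only: it copies the table values as printed and computes the percentage change in IPC and in active (clocked) flops per workload; the helper names are made up for this example.

```python
# Values copied from the table above: (Bobcat IPC, Jaguar IPC,
# Bobcat % flops active, Jaguar % flops active).
TABLE = {
    "Halt": (0.00, 0.00, 8.2, 1.2),
    "AppTyp": (0.95, 1.10, 10.3, 7.7),
    "Bobcat Virus": (1.74, 1.78, 15.4, 12.9),
    "Jaguar Virus": (0.81, 1.86, 14.3, 15.0),
}


def pct_change(old, new):
    """Relative change from old to new in percent, or None if old is zero."""
    if old == 0:
        return None
    return (new - old) / old * 100.0


def summarize(table):
    """Map each workload to (IPC change %, active-flop change %)."""
    return {
        name: (pct_change(b_ipc, j_ipc), pct_change(b_flops, j_flops))
        for name, (b_ipc, j_ipc, b_flops, j_flops) in table.items()
    }


summary = summarize(TABLE)
for name, (ipc, flops) in summary.items():
    ipc_text = "n/a" if ipc is None else f"{ipc:+.1f}%"
    print(f"{name:13s} IPC {ipc_text:>8s}  active flops {flops:+.1f}%")
```

For the typical application case (AppTyp) this gives roughly 16% higher IPC with about 25% fewer active flops, and in the halt state about 85% fewer active flops.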

#### Remark

**IPC improvements** were also achieved by manually optimizing the design for timing, using Monte Carlo simulation tools.

## 3.2 The Jaguar core

- 3.2.1 The microarchitecture of the Jaguar core
- 3.2.2 Main features of the Jaguar core

## 3.2.1 The microarchitecture of the Jaguar core


The Jaguar cores are the key parts of the Compute Unit (x86 "engine") of the APU, as indicated in the next slide.

#### Block diagram of a Jaguar based APU [63]



**Key difference in the layout of the core complex in Bobcat and Jaguar based APUs** [35], [64]

#### Bobcat based APU

Jaguar-based APU



Note that Jaguar-based APUs are based on up to 4 cores with a shared (2 MB) L2 cache whereas Bobcat-based APUs are based on up to 2 cores with private 512 KB L2 caches.

#### Block diagram of the Compute Unit (CU) of the Jaguar core [3]



#### Major ISA changes vs. the Bobcat core [63]

- Physical address extended to 40 bits (from 36 bits)
- Instruction set extensions
  - AVX
  - AES
  - SSE4.1, SSE4.2
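
To make the effect of the wider physical address concrete, the addressable memory for both widths can be computed as follows. This is a small illustrative calculation; it only shows the architectural address range and says nothing about how much DRAM a given platform actually supports.

```python
GIB = 1 << 30  # bytes per gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of bytes reachable with the given physical address width."""
    return 1 << address_bits


bobcat_gib = addressable_bytes(36) // GIB   # 36-bit physical address
jaguar_gib = addressable_bytes(40) // GIB   # 40-bit physical address

print(f"36 bits: {bobcat_gib} GiB")
print(f"40 bits: {jaguar_gib} GiB (= 1 TiB)")
print(f"Growth factor: {jaguar_gib // bobcat_gib}x")
```

The four additional address bits raise the physical address range from 64 GiB to 1 TiB, a factor of 16.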

#### Power consumption vs. quad-core utilization while running different applications [26]

#### **QUAD-CORE UTILIZATION**


- If not all cores are being used, the unused power becomes available to the remaining cores
- Idle cores behave as thermal sinks to the active cores
- Power Density Multiplier (PDM) is applied to the TDP Limit of the active CPUs, based on the topology of the Idle CPUs and the ambient temperature
- Increases the CPU Thermal Headroom and lower thread count performance



#### The shared L2 cache of Jaguar [26]



## Shared Cache is major design addition in "Jaguar"

- Supports 4 cores
- Total shared 2MB, 16-way
  - Supported by 4 L2D banks of 512KB each
- L2 cache is inclusive, which allows using the L2 tags as a probe filter
  - Any line in a core's L1 instruction or data cache must also be in the L2
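
The geometry implied by these figures can be derived with simple arithmetic. The sketch below assumes a 64-byte cache line, which is not stated on the slide; the total size, associativity and bank count are taken from the list above.

```python
KIB = 1024

L2_BYTES = 2 * 1024 * KIB   # 2 MB shared L2
WAYS = 16                   # 16-way set associative
BANKS = 4                   # four L2D data banks
LINE_BYTES = 64             # assumption: 64-byte cache lines

bank_kib = L2_BYTES // BANKS // KIB   # capacity of one data bank
lines = L2_BYTES // LINE_BYTES        # total number of cache lines
sets = lines // WAYS                  # number of sets

print(f"Bank size: {bank_kib} KiB")
print(f"Cache lines: {lines}")
print(f"Sets: {sets}")
```

With these assumptions the cache holds 32768 lines organized in 2048 sets, and each of the four data banks holds 512 KiB, which matches the per-bank size on the slide.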

#### The L2 cache interface of Jaguar [26]



- All connections routed thru L2 interface
- L2 tags reside in interface block
  - Divided into 4 banks
  - L2D bank lookup only after L2 tag hit
- L2 Interface block runs at core clock
  - L2D's run at half clock for power, only clocked when required
- New L2 stream prefetcher per core
  - Allows improved bandwidths & IPC
- Up to a total of 24 paired read + write transactions in flight
- 16 additional L2 snoop queue entries
  - Allows for handling coherent probes at high bandwidth

#### Expected benefit of the shared L2 cache

- The shared cache consists of four 512 KB data banks that are connected to the four cores via an L2 interface unit.
- The L2 interface unit runs at full speed whereas the data banks run at half speed to save power.
- The data banks are only clocked when required.
- The shared design provides more L2 cache space for any core in a single threaded program and thus reduces the rate of cache misses but needs a new L2 data prefetcher per core [7].

#### **Comparing the microarchitecture of the Bobcat and Jaguar cores**



Jaguar core [3]

**Bobcat core** [8]

#### The front-end part of the microarchitecture of the Jaguar core [3]



#### **Comparing the front-end parts of Jaguar and Bobcat** [26]



#### Loop buffer [7], [26]

- One front-end improvement is the addition of a 4-entry (4 x 32 byte) loop buffer.
- Whenever a loop is detected, instructions are fetched from the loop buffer rather than from the L1 cache.

Nevertheless, it is not a trace cache like the one used in Intel's Pentium 4: the loop buffer holds undecoded instructions.

With the loop buffer, the repeated fetches from the L1 cache are avoided, but the decode unit is not shut down during subsequent iterations of the loop.

Using the loop buffer saves power but does not increase performance.
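
Whether a given loop could be served from such a buffer depends on how many 32-byte entries its code spans. The following is a minimal sketch of that check, assuming that each entry holds one aligned 32-byte block of instruction bytes; the alignment rule and the function name are assumptions made for illustration, not details taken from the cited sources.

```python
ENTRY_BYTES = 32   # size of one loop buffer entry
ENTRIES = 4        # number of entries (4 x 32 bytes)


def fits_in_loop_buffer(start: int, end: int) -> bool:
    """Check whether the loop body [start, end) fits in the loop buffer.

    start is the address of the first instruction byte of the loop,
    end is the address just past its last byte. Entries are assumed
    to hold aligned 32-byte blocks.
    """
    first_block = start // ENTRY_BYTES
    last_block = (end - 1) // ENTRY_BYTES
    return last_block - first_block + 1 <= ENTRIES


aligned_128 = fits_in_loop_buffer(0x1000, 0x1000 + 128)
misaligned_128 = fits_in_loop_buffer(0x1010, 0x1010 + 128)
misaligned_100 = fits_in_loop_buffer(0x1010, 0x1010 + 100)
print(aligned_128, misaligned_128, misaligned_100)
```

Under this assumption a 128-byte loop fits only if it starts on a 32-byte boundary, whereas a misaligned 128-byte loop touches five blocks and would still have to be fetched from the L1 cache.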

#### Remark

Using loop buffers to improve IPC or reduce power consumption of instruction fetching is not a new idea.

Different implementations were already used e.g. in Cray computers or in the Pentium M or Nehalem processors.

There are also numerous patents concerning loop buffers, see e.g. the US patent application 11/272,718 with a large number of references.

#### Integer execution in the Jaguar core [3]




#### **Comparing the integer execution units of Jaguar and Bobcat** [26]



#### FP execution in the Jaguar core [3]



#### Comparing the floating point units of Jaguar and Bobcat [26]



#### The core pipeline of Jaguar [3]

Two stages were added, an additional decode stage (iDec) and an FP register read stage (RegRd2), to raise the core frequency (fc).



#### The core pipeline of Bobcat for comparison [8]






#### The core floor plan of Jaguar [3]



#### Contrasting the core floor plans of Bobcat and Jaguar [3]

# CORE FLOOR PLAN COMPARISON

"Bobcat" core in 40nm = 4.9 mm<sup>2</sup> 7 core macros, 2 L2 macros, **3 clock macros**



"Jaguar" core in 28nm = 3.1 mm<sup>2</sup> 3 core macros, 1 L2 macro, 1 clock macro

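
The two core areas can be compared with what a pure process shrink would give. The sketch below uses the common rule of thumb that area scales with the square of the feature size; this ideal scaling is an assumption that real designs rarely meet, and the Jaguar core also adds functionality compared with Bobcat.

```python
BOBCAT_NM, BOBCAT_MM2 = 40, 4.9   # "Bobcat" core in 40 nm
JAGUAR_NM, JAGUAR_MM2 = 28, 3.1   # "Jaguar" core in 28 nm

# Measured area ratio between the two cores.
actual_ratio = JAGUAR_MM2 / BOBCAT_MM2

# Ideal ratio if the same design were shrunk from 40 nm to 28 nm
# (rule-of-thumb assumption: area ~ feature size squared).
ideal_ratio = (JAGUAR_NM / BOBCAT_NM) ** 2

# Area a straight Bobcat shrink would occupy under that assumption.
ideal_shrink_mm2 = BOBCAT_MM2 * ideal_ratio

print(f"Actual area ratio: {actual_ratio:.2f}")
print(f"Ideal shrink ratio: {ideal_ratio:.2f}")
print(f"Ideal Bobcat shrink: {ideal_shrink_mm2:.1f} mm^2")
```

The Jaguar core is about 63% of the Bobcat core area, whereas an ideal shrink would give about 49% (roughly 2.4 mm<sup>2</sup>), so part of the process gain appears to have been spent on the added features.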
## 3.2.2 Main features of the Jaguar core

#### **3.2.2** Main features of the Jaguar core compared with the Bobcat core [9]

| Feature                         | "Bobcat"                            | "Jaguar"                                                            | User Benefit                                                        |
|---------------------------------|-------------------------------------|---------------------------------------------------------------------|---------------------------------------------------------------------|
| Core Frequency                  | 1.4 GHz/1.7 GHz                     | >10%                                                                | Improved CPU performance                                            |
| Instructions per clock          | "90% of<br>mainstream"              | >15%                                                                | Improved CPU performance                                            |
| Core Count                      | 2                                   | 4                                                                   | More performance for highly threaded apps                           |
| L2 Cache                        | 512KB per core                      | 2MB Shared                                                          | More performance for less threaded apps                             |
| Power efficiency                | Core C6 power<br>gating             | Enhanced core C6<br>power gating                                    | Better battery life                                                 |
| Instruction Sets                | AMD64,<br>SSE1-SSE3,<br>SSSE3,SSE4a | Bobcat +<br>SSE4.1, 4.2,<br>AES/CLMUL,<br>MOVBE,<br>AVX, F16C, BMI1 | Improved performance/power when applications use newer instructions |
| Physical address<br>capability  | 36-bit                              | 40-bit                                                              | Memory capacity scale-up                                            |
| Out of Order (OOO)<br>Execution | Yes                                 | Yes                                                                 | More performance than In-Order designs                              |
| Machine Width                   | 2-wide                              | 2-wide                                                              | Power efficient design point                                        |
| Floating Point Unit             | 64-bit data path                    | 128-bit data path                                                   | Better media experience                                             |

#### **Expected benefit of the shared L2 cache** [7]

- In the previous design each Bobcat core had a 512KB private L2 cache, clocked at ½ CPU speed.
- With Jaguar, AMD has opted to attach a single shared (up to 2 MB) L2 cache to the CPUs (still providing 512 KB per core).
- This cache pool is connected via an L2 interface unit, running at full processor speed, while the L2 cache itself still runs at 50% core clock.

#### The Core C6 state (CC6 state)/Package C6 state (PC6 state) of Jaguar (simplified) [54]

#### Entering the CC6 state

- 1. The OS requests entry into the CC6 state by issuing a specific I/O read or an HLT instruction.
- 2. The processor enters the CC6 state if associated monitors allow this.

#### The process of entering the CC6 state and power gating

Due to using a shared L2 cache the process of entering the CC6 state has changed, as follows:

- 1. The internal core state is saved to the L1 cache or to DRAM, depending on internal settings.
- 2. L1 cache is flushed to the shared L2 cache by microcode.
- 3. Power is removed from the core and the Core PLL is powered down.

#### Exiting the Core C6 state (CC6 state)

There is a set of events, such as interrupts, which cause a core to exit the CC6 state.

#### Entering the Package C6 state (PC6 state)

- If all cores have entered the CC6 state and a transition to the PC6 state is enabled (by monitors), the processor enters the PC6 state.
- The last core entering the CC6 state has the L2 cache flushed to DRAM by microcode.
- In the PC6 state VDD can be reduced to a non-operational voltage that does not retain core state but reduces static and dynamic power dissipation.

#### Exiting the Package C6 state (PC6 state)

If one of the cores leaves the CC6 state, the processor exits the PC6 state.
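
The interplay between the per-core CC6 state and the package-level PC6 state described above can be illustrated with a toy model. This is a simplified sketch, not AMD's implementation: it assumes the monitors always allow the transitions, and the class and method names are hypothetical.

```python
class ComputeUnit:
    """Toy model of per-core CC6 and package-level PC6 bookkeeping."""

    def __init__(self, cores: int = 4):
        self.in_cc6 = [False] * cores
        self.l2_flushed = False
        self.log = []

    def enter_cc6(self, core: int) -> None:
        # CC6 entry: save core state, flush L1 to the shared L2, remove power.
        self.log.append(f"core {core}: state saved, L1 flushed to L2, power gated")
        self.in_cc6[core] = True
        if all(self.in_cc6):
            # The last core to enter CC6 triggers the L2 flush to DRAM.
            self.log.append(f"core {core}: last core, L2 flushed to DRAM")
            self.l2_flushed = True

    def wake(self, core: int) -> None:
        # An event such as an interrupt brings one core out of CC6,
        # which also ends the package-level PC6 state.
        self.in_cc6[core] = False
        self.l2_flushed = False
        self.log.append(f"core {core}: woken, leaving CC6")

    @property
    def in_pc6(self) -> bool:
        return all(self.in_cc6) and self.l2_flushed


cu = ComputeUnit()
for core in range(3):
    cu.enter_cc6(core)
three_cores_gated = cu.in_pc6   # one core is still active
cu.enter_cc6(3)
all_cores_gated = cu.in_pc6     # every core is in CC6, L2 flushed
cu.wake(0)
after_wake = cu.in_pc6          # one core has left CC6
print(three_cores_gated, all_cores_gated, after_wake)
```

The model only tracks state; in the real processor the voltage reduction, the monitors and the microcode routines decide when these transitions actually happen.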

#### Improved C6/CC6 latencies of the Jaguar APU [3]

#### "JAGUAR" C6

- Any core can independently go into CC6 power gating
- Optimized microcode routines and hardware allow for fast CC6 entry/exit
- Shared L2 leaves more cache for the remaining active cores (IPC)
- Last core in the compute unit to be power gated flushes shared L2 in preparation for full C6
- Hardware engines added to improve L2 flush times

Chart: Relative C6 latencies under normalized conditions ("Bobcat" C6, "Jaguar" C6, "Jaguar" CC6)

## 3.3 Jaguar-based APU lines

- 3.3.1 Overview of the Jaguar-based APU lines
- 3.3.2 The Kyoto Opteron X microserver line
- 3.3.3 The Kabini notebook APU line
- 3.3.4 The Temash tablet APU line

## 3.3.1 Overview of the Jaguar-based APU lines




#### The Kabini notebook and Temash tablet processors on AMD's roadmap for 2013 [65]



#### INDUSTRY-LEADING GRAPHICS, COMPUTE IP RAPIDLY LEVERAGED IN LOW POWER PLATFORMS VIA APUS


5 | 2012 Financial Analyst Day | February 2, 2012 | Consumerization, Cloud, Convergence. | Confidential

## 3.3.2 The Opteron X (Kyoto) microserver line




### Key features of the Opteron X (Kyoto) models [56]

**AMD Kyoto Offerings**

| Model         | CPU Cores | CPU Clock    | GPU Cores | GPU Clock    | TDP      | Price |
|---------------|-----------|--------------|-----------|--------------|----------|-------|
| Opteron X1150 | 4         | Up to 2.0GHz | -         | -            | 9 - 17W  | \$64  |
| Opteron X2150 | 4         | Up to 1.9GHz | 128       | 266 - 600MHz | 11 - 22W | \$99  |
Socket: FT3 BGA

Comparing key features of AMD's Opteron X and Intel's Atom S1200 lines [56]

**AMD OPTERON™ X-SERIES OUTPERFORMS INTEL® ATOM™ S1200 SERIES ACROSS ALL KEY PERFORMANCE METRICS**

|                                               | AMD<br>Opteron X-Series    | Intel<br>Atom S1200 Series | AMD ADVANTAGE |
|-----------------------------------------------|----------------------------|----------------------------|---------------|
| x86 CPU Cores                                 | 4                          | 2                          | 2X            |
| x86 CPU Threads                               | 4                          | 4                          |               |
| GPU Cores*                                    | 128 AMD Radeon™ 8000 Cores | None                       | AMD Only      |
| Max. DRAM per Socket                          | 32 GB**                    | 8 GB                       | 4X            |
| Max. DRAM Speed                               | DDR3-1600                  | DDR3-1333                  | 1.2X          |
| L2 Cache                                      | 2 MB                       | 1 MB                       | 2X            |
| Throughput Performance (est.) <sup>2</sup>    | 28.9 @ 2 GHz (CPU)         | 13.0 @ 2GHz                | 2.2X          |
| Single Thread Performance (est.) <sup>2</sup> | 10.0 @ 2 GHz (CPU)         | 5.2 @ 2 GHz                | 1.9X          |
| Integrated SATA Ports                         | Yes                        | No                         | AMD Only      |

\* X-Series APU only; \*\*Based on the capability of the Kyoto memory controller and expected 16GB DIMM availability.
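
The "AMD ADVANTAGE" column of the comparison can be cross-checked against the two value columns. The short sketch below recomputes each numeric ratio, rounded to one decimal place; the row names and the rounding rule are assumptions made for this check, while the numbers are copied from the table.

```python
# (AMD value, Intel value, advantage claimed in the table)
ROWS = {
    "x86 CPU cores": (4, 2, 2.0),
    "Max. DRAM per socket (GB)": (32, 8, 4.0),
    "Max. DRAM speed (DDR3 rating)": (1600, 1333, 1.2),
    "L2 cache (MB)": (2, 1, 2.0),
    "Throughput performance (est.)": (28.9, 13.0, 2.2),
    "Single thread performance (est.)": (10.0, 5.2, 1.9),
}


def advantage(amd: float, intel: float) -> float:
    """Ratio of the AMD value to the Intel value, rounded to one decimal."""
    return round(amd / intel, 1)


computed = {name: advantage(amd, intel) for name, (amd, intel, _) in ROWS.items()}
mismatches = [name for name, (_, _, claimed) in ROWS.items()
              if computed[name] != claimed]

for name, ratio in computed.items():
    print(f"{name}: {ratio}X")
print("Rows that do not match the table:", mismatches)
```

All numeric rows reproduce the ratios given in the table, so the advantage column is plain division of the two value columns.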

SERVER BU ANDREW FELDMAN | MAY 2013 | CONFIDENTIAL NDA ONLY-UNDER EMBARGO UNTIL May 29, 2013

### Block diagram of the Opteron X2150 APU [57]



### 3.3.2 Opteron X (Kyoto) microserver line (5)

### **Overview of the Opteron-X (Kyoto) series processors** [58]




# 3.3.3 The Kabini notebook APU line




### 3.3.3 Positioning the Kabini notebook APU line -1 [based on 14]



### **Positioning the Kabini notebook APU line -2** [2]

| AMD 2013 Client Roadmap           |                                                                                         |                                                                                         |                                                                                        |
|-----------------------------------|-----------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| Performance<br>Mainstream         | AMD 2<sup>nd</sup> Generation A-Series APUs codename "Trinity"                        | AMD 2<sup>nd</sup> Generatio… codename "Richla…"<br>Low Voltage (17-25)<br>2-4 "Piledriver" CPU cores<br>2<sup>nd</sup> Generation DirectX®11 GPU | "Kaveri" 3<sup>rd</sup> Gen A-Series APUs<br>2-4 "Steamroller" CPU Cores<br>Graphics Core Next (GCN) GPU<br>HSA Application Support |
| Low-Power<br>Essential            | AMD C-Series and E-Serie… "Brazos 2.0"<br>Low Voltage (9-18W)<br>2 "Bobcat" CPU Cores<br>DirectX®11 capable GPU | "Kabini" A-Series and E-Series APUs<br>complete SoC<br>Low Voltage (9-15W)<br>2-4 "Jaguar" CPU cores<br>Graphics Core Next (GCN) GPU |  |
| Ultra-Low Power<br>Tablet/Fanless | AMD Z-Series APU codena…<br>1-2 "Bobcat" CPU Cores, Ultra Low Voltage (4.5W), DirectX®11 capable … | "Temash" A-Series Eli…<br>Ultra Low Voltage; Complete SoC; 2-4 "Jaguar" CPU Cores<br>Graphics Core Next (GC… |  |

AMD roadmaps are subject to change without notice


January 7, 2013

phx.corporate-ir.net/External.File?item...t=1


### Main features of Kabini notebook APU models [59]

### 2013 AMD MAINSTREAM APU SPECS

| Model          | Radeon™<br>Brand | TDP | CPU<br>Cores | CPU<br>Clock Speed<br>(Max/Base) | L2<br>Cache | Radeon™<br>Cores¹ | GPU<br>Clock Speed<br>(Max/Base) | MAX<br>DDR3<br>Speed |
|----------------|------------------|-----|--------------|----------------------------------|-------------|-------------------|----------------------------------|----------------------|
| AMD A-Series APUs |                  |     |              |                                  |             |                   |                                  |                      |
| A6-5200        | HD 8400          | 25W | 4            | 2.0GHz                           | 2MB L2      | 128               | 600MHz                           | DDR3L-1600           |
| A4-5000        | HD 8330          | 15W | 4            | 1.5GHz                           | 2MB L2      | 128               | 500MHz                           | DDR3L-1600           |
| AMD E-Series APUs |                  |     |              |                                  |             |                   |                                  |                      |
| E2-3000        | HD 8280          | 15W | 2            | 1.65GHz                          | 1MB L2      | 128               | 450MHz                           | DDR3L-1600           |
| E1-2500        | HD 8240          | 15W | 2            | 1.4GHz                           | 1MB L2      | 128               | 400MHz                           | DDR3L-1333           |
| E1-2100        | HD 8210          | 9W  | 2            | 1.0GHz                           | 1MB L2      | 128               | 300MHz                           | DDR3L-1333           |


### Block diagram of the Kabini notebook APU [26]



### **Chip level power distribution** [60]

- Power consumption (and hence performance) is set by the cooling capabilities of the platform

- Power varies a lot by workload
- We measure and manage the power of each component on the chip to generate the best performance/Watt





FCH: Fusion Controller Hub

### **Digital Power Monitoring** [60]

- In order to manage temperature and send the power wherever it's needed, we use power monitors in all chip components
- "Kabini" and "Temash" have power monitors in each CPU, the GPU, the Display Interface and the FCH
- The central controller uses this information to optimize performance within thermal constraints
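The monitoring loop described above can be illustrated with a minimal sketch. It assumes a hypothetical controller that reads one power estimate per monitored block and, when the sum exceeds a platform budget, scales every block's allowance by the same factor. The block names follow the slide; the readings, the budget and the proportional policy are illustrative assumptions, not AMD's actual algorithm.

```python
def allocate_power(readings_w, budget_w):
    """Return per-block power caps (in watts) that fit within budget_w.

    readings_w maps a block name to its measured power in watts.
    Assumed policy: if the total is within budget, every block keeps its
    measured power; otherwise all blocks are scaled down proportionally.
    """
    total = sum(readings_w.values())
    if total <= budget_w:
        return dict(readings_w)
    scale = budget_w / total
    return {name: power * scale for name, power in readings_w.items()}


# Hypothetical monitor readings for the blocks named on the slide.
readings = {"CPU0": 3.0, "CPU1": 3.0, "CPU2": 1.0, "CPU3": 1.0,
            "GPU": 6.0, "Display": 1.0, "FCH": 1.0}
caps = allocate_power(readings, budget_w=12.0)

for name, cap in caps.items():
    print(f"{name}: {cap:.2f} W")
```

A real controller would also weigh temperature and workload priority; the proportional split is only the simplest policy that respects the budget.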



### Die plot of the Kabini/Temash APU [61]




### Block diagram and main features of the Kabini notebook APU [59]



AMD 2013 MOBILITY APU INTRODUCTION | MAY 2013




### Block diagram of the Kabini notebook APU platform [26]




### Main datapaths in the Kabini notebook APU platform [26]

#### FCL

- 128b (each direction) path for IO access to memory
- GPU access to coherent memory space
- CPU access to dedicated GPU framebuffer

#### **GRAPHICS MEMORY BUS**

- 256b (each direction) for GMC access to memory
- Full bandwidth path for graphics to system memory
- DRAM-friendly stream of reads and writes
- Bypasses coherency mechanism

#### **INTEGRATED FCH**

- Provides complete system connectivity
  - USB 3.0, USB 2.0, SATA 3, GPIO
  - Integrated system clock generator
- Reduced motherboard footprint required
- Higher IO performance at reduced power consumption




### Block diagram and main features of the HD8000 Graphics Core Next GPU of Kabini [26]

### First APU with GCN Architecture

#### API Support:

- Graphics: DirectX 11.1, OpenGL 4.3, OpenGL ES 3.0
- Compute: OpenCL 1.2, DirectCompute, C++ AMP

#### Hardware Configuration:

- Geometry Engine
  - ¼ prim/clock
- Two GCN Compute Units (CU)
- 1 Render Back-end
  - 4 Pixel Color Raster Operation Pipelines (ROPs)
  - 16 Depth Test (Z) / stencil Ops
  - Color Cache (C\$) / Depth Cache (Z\$)
- 128KB read/write L2 cache
- 4KB Global Data Share with Global synchronization resources

#### Advanced Power Management:

- Fine grain clock/clock tree gating
- PowerTune Dynamic V/F Scaling with power containment
- ZeroCore Power power gating
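The two GCN compute units listed above correspond to the 128 Radeon cores quoted in the model tables, since a GCN compute unit contains 64 stream processors. A rough peak single-precision throughput can be estimated from this, assuming each stream processor retires one fused multiply-add (counted as 2 FLOPs) per clock. The sketch below is an illustrative estimate rather than an AMD-published figure; the GPU clocks are taken from the Kabini model table.

```python
LANES_PER_CU = 64          # stream processors per GCN compute unit
FLOPS_PER_LANE_CLOCK = 2   # assumption: one fused multiply-add per clock


def peak_gflops(compute_units, gpu_clock_mhz):
    """Estimate peak single-precision GFLOPS of a GCN GPU."""
    lanes = compute_units * LANES_PER_CU
    return lanes * FLOPS_PER_LANE_CLOCK * gpu_clock_mhz / 1000.0


# GPU clocks (MHz) from the Kabini model table; all models have 2 CUs.
for model, clock_mhz in [("A6-5200", 600), ("A4-5000", 500), ("E1-2100", 300)]:
    print(f"{model}: {peak_gflops(2, clock_mhz):.1f} GFLOPS")
```

Sustained throughput is lower in practice, since it is also bounded by the single-channel memory interface.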



### **AMD's preview of the Kabini notebook APU line** [10]



**Performance figures and key features of select notebook APUs with 15-18 W dissipation** [61]

| Processor | PCMark 7 | Cinebench 11.5 (Single Threaded) | Cinebench 11.5 (Multithreaded) | 7-Zip Benchmark (Single Threaded) | 7-Zip Benchmark (Multithreaded) | Process | TDP | Price |
|---|---|---|---|---|---|---|---|---|
| AMD A4-5000 (1.5GHz Jaguar x 4) | 2425 | 0.39 | 1.5 | 1323 | 4509 | 28 nm | 15 W | 40 \$ |
| AMD E-350 (1.6GHz Bobcat x 2) | 1986 | 0.32 | 0.61 | 1281 | 2522 | 40 nm (Zacate) | 18 W | 80 \$ |
| Intel Atom Z2760 (1.8GHz Saltwell x 2) | | 0.17 | 0.52 | 754 | 2304 | 32 nm | 15 W | 41 \$ |
| Intel Core i5-3317U (1.7GHz IVB x 2) | 4318 | 1.07 | 2.39 | 2816 | 6598 | 22 nm | 17 W | 225 \$ |

### Remarks

- 1. Both Intel processors are dual-core designs that run two threads per core, so each of them presents four logical cores to the OS.
- 2. Taking price and performance figures into account, the Jaguar-based Kabini APUs seem to have good market chances.
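The second remark can be made concrete with a small calculation over the table above. The sketch below ranks the four processors by Cinebench 11.5 multithreaded score per dollar and per watt of TDP; the choice of this one benchmark as the yardstick is an assumption made for illustration.

```python
# (name, Cinebench 11.5 multithreaded score, TDP in W, price in $),
# copied from the table above.
CHIPS = [
    ("AMD A4-5000", 1.50, 15, 40),
    ("AMD E-350", 0.61, 18, 80),
    ("Intel Atom Z2760", 0.52, 15, 41),
    ("Intel Core i5-3317U", 2.39, 17, 225),
]


def score_per_dollar(chip):
    _, score, _, price = chip
    return score / price


def score_per_watt(chip):
    _, score, tdp, _ = chip
    return score / tdp


by_price = sorted(CHIPS, key=score_per_dollar, reverse=True)
by_power = sorted(CHIPS, key=score_per_watt, reverse=True)

for chip in by_price:
    print(f"{chip[0]:22s} {score_per_dollar(chip):.4f} pts/$  "
          f"{score_per_watt(chip):.3f} pts/W")
```

With these figures the A4-5000 leads on score per dollar, while the Core i5-3317U leads on score per watt.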


### AMD's preview of the Kabini notebook APU line [10], [26]



### Summary assessment of Kabini notebook APUs [61]

Compared with Bobcat-based models, Jaguar-based Kabini APUs offer compelling battery life and performance figures.

# 3.3.4 The Temash tablet APU line


**Increasing market share of tablets** [59]

### CHANGING MARKET PLACE CREATES WORLD OF "NEW CLIENT" PCS



Sources: IDC AMD Extended Forecast Client World Wide by Country March 2013, IDC AMD Extended Forecast Tablet December 2012



### The Temash tablet line



### Positioning the Temash tablet APU line -1 [based on 14]



### **Positioning the Temash tablet APU line -2** [2]

#### AMD 2013 Client Roadmap

| Segment | Earlier APU line | Follow-on APU line | Later APU line |
|---|---|---|---|
| Performance / Mainstream | AMD 2<sup>nd</sup> Generation A-Series APUs codename "Trinity" | AMD 2<sup>nd</sup> Generation A-Series APUs codename "Richland"<br>Desktop, Standard notebook (35W) and Low Voltage (17-25W)<br>2-4 "Piledriver" CPU cores<br>2<sup>nd</sup> Generation DirectX®11 GPU | "Kaveri" 3<sup>rd</sup> Gen A-Series APUs<br>2-4 "Steamroller" CPU Cores<br>Graphics Core Next (GCN) GPU<br>HSA Application Support |
| Low-Power Essential | AMD C-Series and E-Series APUs codename "Brazos 2.0"<br>Low Voltage (9-18W)<br>2 "Bobcat" CPU Cores<br>DirectX®11 capable GPU | "Kabini" A-Series and E-Series APUs<br>Complete SoC<br>Low Voltage (9-15W)<br>2-4 "Jaguar" CPU cores<br>Graphics Core Next (GCN) GPU | |
| Ultra-Low Power Tablet/Fanless | AMD Z-Series APU codename "Hondo"<br>1-2 "Bobcat" CPU Cores, Ultra Low Voltage (4.5W), DirectX®11 capable GPU | "Temash" A-Series Elite Mobility APUs<br>Ultra Low Voltage; Complete SoC; 2-4 "Jaguar" CPU Cores<br>Graphics Core Next (GCN) GPU | |

AMD roadmaps are subject to change without notice


January 7, 2013

phx.corporate-ir.net/External.File?item...t=1

### Remark

### Temash is AMD's third tablet design

AMD's first tablet design was the Bobcat-based Desna Z-01.

- It included dual Bobcat cores and a Radeon HD 6250 GPU, and ran at a 1 GHz core frequency.
- The Z-01 was not yet an SoC design (it needed an FCH) and had a TDP of 5.9 W.
- Its major drawback was that it needed a fan.
- The Z-01 did not gain wide market acceptance; only a single tablet adopted it (the MSI WinPad 110) [11].

AMD's second tablet design was the second-generation Bobcat-based Z-60 Hondo.

- The Hondo Z-60 supports Windows 8 rather than Android.
- It has the same basic features as the previous design (dual second-generation Bobcat cores, the Radeon HD 6250 GPU, 1 GHz clock frequency) but a reduced TDP of 4.5 W.
- In addition, the Z-60 comes with a redesigned Fusion Controller Hub (FCH), which has a number of capabilities switched off to reduce power consumption.
- The Z-60 Hondo could not achieve wide market acceptance either; only a few tablet designs are based on it [12].

### Main features of Temash tablet APU models [59]

| Model          | Radeon™<br>Brand    | TDP  | CPU<br>Cores | CPU<br>Clock Speed<br>(Max/Base) | L2<br>Cache | Radeon™<br>Cores⁺ | GPU<br>Clock Speed<br>(Max/Base) | MAX<br>DDR3<br>Speed |
|----------------|---------------------|------|--------------|----------------------------------|-------------|-------------------|----------------------------------|----------------------|
| AMD A-Series Elite Mobility APUs |                     |      |              |                                  |             |                   |                                  |                      |
| A6-1450        | HD 8250             | 8W   | 4            | 1.4GHz/1.0GHz                    | 2MB L2      | 128               | 400MHz/300MHz                    | DDR3L-1066           |
| A4-1250        | HD 8210             | 9W   | 2            | 1.0GHz                           | 1MB L2      | 128               | 300MHz                           | DDR3L-1333           |
| A4-1200        | HD 8180             | 3.9W | 2            | 1.0GHz                           | 1MB L2      | 128               | 225MHz                           | DDR3L-1066           |


### Main features of the K16 Temash tablet APU line [13]



### AMD's Turbo Dock technology-1

- It is first introduced in select Temash models.
- The Turbo Dock technology is actually a hybrid design (as shown below): the tablet has a docking station that provides a keyboard and fan-based cooling, and may provide an additional battery as well.



Figure: Hybrid design with a dock station [14]

### AMD's Turbo Dock technology-2

- Without the dock station the tablet runs at 1 GHz and has a TDP of 5.9 W.
- When docked, the tablet runs at 1.4 GHz and its TDP scales up to 15 W utilizing the fan-based cooling provided by the dock station.

This provides a performance gain of about 40 %.
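The quoted gain of about 40 % matches the ratio of the two clock frequencies, assuming performance scales linearly with the CPU clock (an upper bound, since memory-bound code scales less). The sketch below models the two operating points stated above; the type and function names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    cpu_clock_ghz: float
    tdp_w: float


# Operating points as stated in the text above.
UNDOCKED = OperatingPoint(cpu_clock_ghz=1.0, tdp_w=5.9)
DOCKED = OperatingPoint(cpu_clock_ghz=1.4, tdp_w=15.0)


def select_operating_point(is_docked):
    """Pick the operating point depending on whether the dock's fan is available."""
    return DOCKED if is_docked else UNDOCKED


def clock_gain_percent(base, boosted):
    """Relative clock increase in percent, used as a proxy for performance gain."""
    return (boosted.cpu_clock_ghz / base.cpu_clock_ghz - 1.0) * 100.0


gain = clock_gain_percent(select_operating_point(False),
                          select_operating_point(True))
print(f"Clock gain when docked: {gain:.0f} %")
```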

Remark

The concept underlying the Turbo Dock technology is not new.

Lenovo's ThinkPad Helix tablet already has a hybrid design in which the tablet boosts performance when docked, with the aid of a cooling system located in the hinge of the keyboard docking station [15].

### Block diagram and main features of the Temash tablet APU-based Elite Mobility APU platform [59]

### 2013 AMD ELITE MOBILITY APU PLATFORM DETAILS

- Dual/Quad Core, AMD Start Now Technology<sup>17</sup> w/smart sleep
- AMD Turbo Dock Technology for 40% performance boost<sup>1</sup>
- "Jaguar" Core: Up to 20% improvement over "Bobcat" cores<sup>18</sup>
- "GCN" GPU core: Up to 75% compute performance (GFLOP) improvement with DirectX® 11.1 support<sup>19</sup>
- Power gating for DCE, UVD, VCE, NB and DDR P-States
- Memory Support: Single-channel (64 bits) DDR3
- High Resolution display supporting dual up to 2560 x 1600
- Faster than real time HD encode H.264, SVC HDCP 2.1
- Protected Content DRM offloaded from CPU, HDCP 2.1
- Updated I/O: Up to 8 USB 2.0 and 2 USB 3.0, 1 SATA Gen2, SD Card Reader Version 3.0 or SDIO controller



- DCE: Display Controller Engine
- UVD: Unified Video Decoder
- VCE: Video Compression Engine
- NB: North Bridge

- HDCP: High-Bandwidth Digital Content Protection


### Performance figures of the Temash tablet line vs. Intel's Atom Z2760 [27]



# 4. Family 16h Models 30h-3Fh (Puma+ based) APU lines

- 4.1 Overview of the Puma+ based APU lines
- 4.2 The Puma+ core
- 4.3 Enhancements and innovations introduced in the Puma+ based APU lines
- 4.4 The Beema notebook line
- 4.5 The Mullins tablet line
- 4.6 The Carrizo-L notebook line

## 4.1 Overview of the Puma+ based APU lines -1

#### Brand names of AMD's Puma+ based processor lines

|           | Launched in | 2011 | 2012 | 2013 | 2014 | 2015 |
|-----------|-------------|------|------|------|------|------|
|           |             | Family 14h<br>(00h-0Fh)<br>(Bobcat) | Family 14h<br>(00h-0Fh)<br>(Bobcat) | Family 16h<br>(00h-0Fh)<br>(Jaguar) | Family 16h<br>(30h-3Fh)<br>(Puma+) | Family 16h<br>(30h-3Fh)<br>(Puma+) |
| Servers   | 4P servers | | | | | |
| Servers   | 2P servers | | | | | |
| Servers   | 1P servers<br>(85-140 W) | | | | | |
| Desktops  | High perf.<br>(~95-125 W) | | | | | |
| Desktops  | Mainstream<br>(~65-100 W) | | | | | |
| Desktops  | Entry level<br>(~30-60 W) | | | | | |
| Notebooks | High performance/<br>mainstream/entry<br>(~30-60 W) | | | Kabini A6 | | |
| Notebooks | Ultra portable<br>(~10-15 W) | Zacate<br>E-Series<br>Ontario<br>C-Series | Zacate<br>E1/E2 | Kabini<br>A/E-Series | Beema<br>A/E-Series | Carrizo-L<br>A/L-Series |
|           | Tablet<br>(~5 W) | Desna<br>Z-Series | | Temash<br>A Series | Mullins<br>A Series/E1 | |

#### Key features of AMD's subsequent mobile generations [67]



#### AMD's announcement of the Puma+ based Beema and Mullins lines in 11/2013 [68]



7 | 2014 AMD MOBILITY APU LINEUP ANNOUNCEMENT | NOVEMBER 2013

#### **Overview of the Puma+ based APU lines -2**

- The Puma+ core provides only modest performance improvements; the Puma+ based lines instead bring advances in power management and security, as detailed in Section 4.3.
- Puma+ based APU lines were launched in two groups, as indicated below.



## Key features of the Puma+ based lines (based on [66])

| Brand                       | Desna Ontario Zacate     | Kabini Temash                                  | Beema Mullins                                        | Carrizo-L                              |
|-----------------------------|--------------------------|------------------------------------------------|------------------------------------------------------|----------------------------------------|
| Aim                         | T T EDT/NB               | EDT/NB T                                       | EDT/NB T                                             | EDT/NB                                 |
| Released                    | Jan 2011                 | May 2013                                       | April 2014                                           | May 2015                               |
| Fab. (nm)                   | TSMC 40                  | 28                                             | 28                                                   | 28                                     |
| Die size (mm<sup>2</sup>) | 75 (+ 28 FCH)            | ~107                                           | ~107                                                 | TBA                                    |
| Core count<br>(up to)       | 2                        | 4                                              | 4                                                    | 4                                      |
| L2 cache                    | Private                  | Shared                                         | Shared                                               | Shared                                 |
| CPU microarch.              | Bobcat                   | Jaguar                                         | Puma+                                                | Puma+                                  |
| Socket                      | FT1                      | AM1, FT3                                       | FT3b                                                 | FP4                                    |
| Memory support              | DDR3L-1333<br>DDR3L-1066 | DDR3L-1600 DDR3L-1333<br>DDR3L-1333 DDR3L-1066 | DDR3L-1866<br>DDR3L-1600<br>DDR3L-1333<br>DDR3L-1333 | DDR3L-1866<br>DDR3L-1600<br>DDR3L-1333 |
| 3D engine                   | TeraScale (VLIW5)        | GCN 2nd Gen                                    | GCN 2nd Gen                                          | GCN 2nd Gen                            |
| Video Decoder<br>ASIC       | UVD 3.0                  | UVD 4.0                                        | UVD 4.2                                              | UVD 6.0                                |
| Video Encoding<br>ASIC      | N/A                      | VCE 2.0                                        | VCE 2.0                                              | VCE 3.1                                |

EDT: Entry-level desktop NB: Notebook T: Tablet

#### Transistor count and die area comparisons of mobile processors [69]

AMD/Intel Transistor Count & Die Area Comparison

| SoC                    | Process Node | Transistor Count | Die Area      |
|------------------------|--------------|------------------|---------------|
| AMD Zacate             | TSMC 40nm    | 450M+            | 75mm²         |
| AMD Kabini/Temash      | TSMC 28nm    | 914M             | ~107mm² (est) |
| AMD Beema/Mullins      | GF 28nm      | 930M             | ~107mm² (est) |
| AMD Llano              | GF 32nm SOI  | 1.18B            | 228mm²        |
| AMD Trinity/Richland   | GF 32nm SOI  | 1.30B            | 246mm²        |
| AMD Kaveri             | GF 28nm SHP  | 2.41B            | 245mm²        |
| Intel Haswell (4C/GT2) | Intel 22nm   | 1.40B            | 177mm²        |
GF: Globalfoundries (USA) TSMC: Taiwan Semiconductor Manufacturing Company
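The table above allows a quick comparison of transistor density (million transistors per mm²). The sketch below simply divides the two columns; note that the Zacate count is a lower bound ("450M+") and the Kabini/Temash and Beema/Mullins die areas are estimates, so the resulting densities are approximate.

```python
# (transistor count in millions, die area in mm^2), from the table above.
DIES = {
    "AMD Zacate": (450, 75),
    "AMD Kabini/Temash": (914, 107),
    "AMD Beema/Mullins": (930, 107),
    "AMD Llano": (1180, 228),
    "AMD Trinity/Richland": (1300, 246),
    "AMD Kaveri": (2410, 245),
    "Intel Haswell (4C/GT2)": (1400, 177),
}

# Density in million transistors per mm^2.
density = {name: count / area for name, (count, area) in DIES.items()}

for name, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:24s} {value:5.2f} Mtransistors/mm^2")
```

On these figures the 28 nm parts (Kaveri, Beema/Mullins, Kabini/Temash) come out densest, ahead of the 32 nm SOI designs.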

#### Preliminary performance figures given by AMD at announcing Beema and Mullins [68]



- More than 2X the performance per watt of the previous generation<sup>1</sup>
- AMD Radeon<sup>™</sup> graphics: The world's best graphics technology that is found in all the latest game consoles AND in PCs that deliver richer more lifelike videos, movies, photos and the best gaming experience
- AMD DockPort technology support
- Microsoft 8.1 optimizations and Microsoft InstantGo<sup>2</sup> for faster wake times and to ensure data, such as e-mail, actively refreshes in standby
- Platform security processor based on the ARM Cortex<sup>™</sup>-A5 processor featuring ARM TrustZone<sup>®</sup> technology for enhanced data security


# 4.2 The Puma+ core

- As AMD states in [], there are only a few minor changes in the Puma+ core (underlying the Family 16h Models 30h-3Fh processors) compared to the Jaguar core (underlying the Family 16h Models 00h-0Fh processors).
- These minor changes include the introduction of a performance time-stamp counter, a processor power accumulator and the implementation of the RDRAND instruction.
- Accordingly, the Puma+ core has a two-wide out-of-order superscalar microarchitecture like the Jaguar core.
- AMD claims a 19 % reduction in core leakage at 1.2 V compared to the Jaguar core.
- This contributes to achieving higher clock frequencies within the same TDP [70].

Remark on the Puma+ designation of the core

- AMD names the CPU core of the Beema and Mullins processors differently in different publications.
- AMD termed these cores Puma in its 2014 roadmap (revealed in 11/2013), whereas at launch (04/2014) they were dubbed Puma+.
- A possible explanation may be found in newer releases of the original description of the Puma+ based processor lines (e.g. [63]), as Revision 3.01 (dated 12/2014) provides a long list of changes.
- Based on this, the original Puma core can be assumed to have undergone an update and a renaming to Puma+, but the new designation obviously could not be applied retroactively to the previously issued roadmap.

# 4.3 Enhancements and innovations of the Puma+ based APU lines

#### Comparing the layouts of Jaguar and Puma+ based APUs [63], [64]

Jaguar

Puma+



A comparison shows that the layouts of Jaguar and Puma+ based APUs, as revealed in the official BKDG (BIOS and Kernel Developer Guide) publications, are identical.

#### **Enhancements of the Puma+ core based Beema and Mullins APUs**

- a) Higher clock frequencies at lower power consumption
- b) Higher memory transfer rates
- c) Lower powered memory and display controllers


#### a) Higher clock frequencies at lower power consumption [70]



#### b) Higher memory transfer rates

Both Beema and Mullins have single-channel DDR3L memory, but the entry-level desktop and notebook oriented Beema has a higher max. transfer rate than the preceding Kabini, as indicated below:

- Entry-level desktop and notebook aimed lines:
  - Beema provides up to 1866 MT/s vs. up to 1600 MT/s furnished by Kabini.
- Tablet aimed lines:
  - Mullins provides the same max. transfer rate of 1333 MT/s as the preceding Temash.
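The transfer rates above translate directly into peak memory bandwidth. The sketch below assumes a single 64-bit DDR3 channel (8 bytes per transfer), as stated for these platforms, and ignores refresh and protocol overhead, so the results are theoretical maxima.

```python
BUS_WIDTH_BITS = 64  # single DDR3 channel


def peak_bandwidth_gb_s(transfer_rate_mt_s, bus_width_bits=BUS_WIDTH_BITS):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


for label, rate in [("Beema, up to 1866 MT/s", 1866),
                    ("Preceding notebook line, up to 1600 MT/s", 1600),
                    ("Mullins, up to 1333 MT/s", 1333)]:
    print(f"{label}: {peak_bandwidth_gb_s(rate):.1f} GB/s")
```

The step from 1600 to 1866 MT/s thus raises the theoretical peak from 12.8 GB/s to about 14.9 GB/s, a gain of roughly 17 %.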

#### c) Lower powered memory and display controllers [70]



## Innovations introduced by the Puma+ based APUs (Beema/Mullins)

- a) Skin temperature aware power management (STAPM)
- b) Intelligent boost (selective boost)
- c) Adaptive voltage control to counter short voltage drops
- d) Support for ARM TrustZone via integrated Cortex-A5 processor

#### a) Skin temperature aware power management (STAPM) [70] (Chassis temperature aware turbo boost)

- Boost aggressively until Tskin reaches the user-defined maximum
- Reduce power only when necessary to adhere to the Tskin,max limit
- Most use-cases for mobile devices are short in duration → Result is higher performance most of the time
- All without using more power
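The control policy in the bullets above can be sketched with a simple simulation. It assumes a first-order thermal model in which the skin temperature drifts towards a steady state of ambient temperature plus power times a thermal resistance. All numeric parameters (ambient temperature, Tskin limit, thermal resistance, time constant, boost power) are hypothetical values chosen for illustration; they are not AMD's figures, and the real controller is certainly more elaborate.

```python
def simulate_skin_aware_boost(duration_s, t_ambient=25.0, t_skin_max=40.0,
                              r_thermal=5.0, tau_s=120.0, p_boost=6.0, dt=1.0):
    """Return a list of (skin temperature in degC, power in W) per time step.

    The controller boosts while the skin temperature is below the limit
    and otherwise falls back to the power level whose steady-state skin
    temperature equals the limit.
    """
    p_sustained = (t_skin_max - t_ambient) / r_thermal
    t_skin = t_ambient
    trace = []
    for _ in range(int(duration_s / dt)):
        power = p_boost if t_skin < t_skin_max else p_sustained
        trace.append((t_skin, power))
        # First-order lag towards the steady-state temperature for this power.
        steady_state = t_ambient + power * r_thermal
        t_skin += dt / tau_s * (steady_state - t_skin)
    return trace


trace = simulate_skin_aware_boost(duration_s=300)
boost_steps = sum(1 for _, power in trace if power == 6.0)
print(f"Boosted for {boost_steps} s out of {len(trace)} s")
```

With these assumed parameters the device starts cool, runs at boost power for somewhat more than a minute, and only then drops to the sustainable level. This mirrors the slide's point that short workloads finish entirely within the boosted phase.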



#### DYNAMIC SKIN-TEMPERATURE AWARE POWER MANAGEMENT CAN ENABLE UP TO 63% PERFORMANCE INCREASES<sup>1</sup> ON KEY WORKLOADS

Based on 3DMark11-P and PCMark8 V2 Home on 3.5W TSP Mullins with and without STAPM enabled. Pre-production engineering sample of "Mullins" quad-core APU with next generation AMD Radeon graphics (model number TBD), 2x2GB DDR3-1333MHz RAM, Windows 8.1, and unreleased reference driver.

#### Benefit of STAPM: Higher turbo boost frequency reduces the total energy consumption [71]
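One plausible reading of this claim is the "race to idle" effect: although boosting raises CPU power, the task finishes sooner, so the rest of the platform spends less time in its active state. The sketch below illustrates the arithmetic with entirely hypothetical power figures; whether boosting actually saves energy depends on the real ratio of CPU power to platform power.

```python
def task_energy_j(work_gcycles, clock_ghz, cpu_power_w,
                  platform_power_w, idle_power_w, window_s):
    """Energy in joules over a fixed time window containing one task.

    While the task runs, CPU and platform are both active; for the rest
    of the window the whole system sits at idle_power_w.
    """
    busy_s = work_gcycles / clock_ghz
    if busy_s > window_s:
        raise ValueError("task does not finish within the window")
    idle_s = window_s - busy_s
    return (cpu_power_w + platform_power_w) * busy_s + idle_power_w * idle_s


# Hypothetical numbers: 14 Gcycles of work observed over a 14 s window.
base = task_energy_j(14.0, clock_ghz=1.0, cpu_power_w=2.0,
                     platform_power_w=4.0, idle_power_w=0.5, window_s=14.0)
boosted = task_energy_j(14.0, clock_ghz=1.4, cpu_power_w=3.5,
                        platform_power_w=4.0, idle_power_w=0.5, window_s=14.0)
print(f"base: {base:.1f} J, boosted: {boosted:.1f} J")
```

Under these assumptions the boosted run uses less total energy than the base run, even though its CPU power is 75 % higher.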



#### b) Intelligent boost (selective boost) [72]

#### AVOIDING POWER WASTE WITH INTELLIGENT BOOST CONTROL

- Intelligent Boost is designed to avoid power waste that results from boosting applications that benefit very little from higher frequency
- Enables long battery life and cool operation while maintaining great performance
- Power management micro-controller tracks application behavior real-time to determine frequency sensitivity
- Boost behavior is adjusted accordingly
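The decision described above can be sketched as follows. The example assumes the power manager can estimate which fraction of run time is core-bound (scales with frequency) as opposed to memory- or I/O-bound (does not scale); this `core_bound_fraction` input, the Amdahl-style speedup estimate and the 10 % threshold are illustrative assumptions, not the metrics AMD's micro-controller actually tracks.

```python
def predicted_speedup(core_bound_fraction, base_ghz, boost_ghz):
    """Estimate the speedup from boosting, Amdahl-style.

    Only the core-bound share of the run time shrinks with frequency;
    the remainder (e.g. waiting on memory) is assumed unchanged.
    """
    scaled = core_bound_fraction * base_ghz / boost_ghz
    unscaled = 1.0 - core_bound_fraction
    return 1.0 / (scaled + unscaled)


def should_boost(core_bound_fraction, base_ghz, boost_ghz, min_speedup=1.10):
    """Boost only when the predicted speedup justifies the extra power."""
    return predicted_speedup(core_bound_fraction, base_ghz, boost_ghz) >= min_speedup


# A compute-heavy and a memory-bound workload at hypothetical clocks.
for name, fraction in [("compute-heavy", 0.9), ("memory-bound", 0.2)]:
    speedup = predicted_speedup(fraction, base_ghz=1.0, boost_ghz=1.4)
    print(f"{name}: predicted speedup {speedup:.2f}, "
          f"boost = {should_boost(fraction, 1.0, 1.4)}")
```

The memory-bound workload gains only about 6 % from a 40 % higher clock in this model, so boosting it would mostly waste power.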



#### Performance metrics tracked by power manager

#### c) Adaptive voltage control to counter short voltage drops [72]

#### VOLTAGE ADAPTIVE OPERATION

- CPUs, GPUs and APUs all operate at low voltages with high current – this creates a challenge for power supplies and packages to deliver a quality voltage
- In fact it's impossible to deliver a perfect voltage, and the variations that happen are often about 10% of the nominal value – that means at least 20% power is wasted covering these voltage variations
- AMD's unique voltage adaptation feature recovers most of that wasted power by operating at the average voltage and quickly reducing frequency for the brief periods when the voltage reduces
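The step from "about 10 % voltage variation" to "at least 20 % power wasted" follows from the quadratic dependence of dynamic power on supply voltage. The sketch below assumes the common approximation that dynamic power is proportional to V² at a fixed frequency and ignores leakage, which grows even faster with voltage.

```python
def guard_band_power_overhead(voltage_margin):
    """Fractional extra dynamic power caused by a voltage guard band.

    voltage_margin is the relative over-voltage (0.10 for 10 %).
    Assumes dynamic power ~ V^2 at a fixed clock frequency.
    """
    return (1.0 + voltage_margin) ** 2 - 1.0


for margin in (0.05, 0.10):
    overhead = guard_band_power_overhead(margin)
    print(f"{margin:.0%} voltage margin -> {overhead:.1%} extra dynamic power")
```

A 10 % margin gives (1.1)² − 1 = 21 % extra dynamic power, in line with the "at least 20 %" stated above.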



✓ VOLTAGE REDUCTION AT 3.5GHZ IS 50MV-70MV ACROSS TEMPERATURE → 10% TO 20% POWER REDUCTION FOR THE SAME PERFORMANCE<sup>14</sup>


#### d) Support for ARM TrustZone via integrated Cortex-A5 processor [68]

- Mullins and Beema are AMD's first lines to integrate ARM's TrustZone technology to provide enhanced system security.
- This technology is analogous to Intel's Trusted Computing technology.
- Provides a Trusted Execution Environment (TEE)
  - Protects against software attack from open/rich OS side of system
  - Provides scalable environment for secure applications like user authentication, anti-malware, content management, online payments, etc.
- Delivers two separate domains, normal and secure
  - Extends across entire system
  - Beyond simply the processor/SOC
  - Can deliver secure
    - Processing data path
    - On/off-chip memory
    - I/O and display



#### New features

**Turbo Core:** AMD's Kabini and Temash could reduce their own clock speeds to save power but didn't have a Turbo Mode for additional performance in single-threaded workloads. Beema and Mullins both add this capability to certain chips — Beema, the notebook processor, can burst up to 2.4GHz while Mullins, the tablet SoC, can ramp as high as 2.2GHz.

**ARM TrustZone:** Mullins and Beema are the first AMD processors to integrate a Cortex-A5 on-die for additional system security and management. According to ARM's own website, TrustZone is analogous to Intel's Trusted Computing technology. This is essentially a corporate or government-oriented feature; there doesn't seem to be much consumer software that actually uses the TrustZone system.

**Reduced leakage:** AMD claims that it has reduced leakage current loss by 19% in Mullins as compared to Kabini. This isn't the same thing as reducing total power consumption, but it should still have a measurable impact. The on-board GPU has improved even more; Beema and Mullins have 38% reduced leakage compared to Kabini/Temash.

A number of additional improvements were made to reduce power consumption in other areas: the display controller now draws less power when using DisplayPort, and low-power DDR optimizations allowed AMD to reduce memory controller power consumption by 600 mW compared to standard DDR modules.

**New power management:** This ties into the Turbo Core feature but is distinct enough to deserve its own mention. According to AMD, Beema and Mullins will include the ability to directly measure the skin temperature of the laptop or tablet and will adjust frequency based on how warm the chassis is — not just according to their own Tmax. Because heat dissipates out to the chassis rather slowly, AMD can therefore run their cores at a higher frequency for a longer period of time. According to AMD, Tskin will be a user definable variable (it's not clear how this capability will be exposed in software).

#### Power management techniques introduced by AMD and improvements achieved both in their mainstream and mobile lines [69]



# 4.3 Enhancements and innovations of the Puma+ based APU lines (13)

#### Power management techniques in development by AMD [73]

# MULLINS/BEEMA FEATURES ARE JUST THE TIP OF THE ICEBERG

- AMD has been building a pipeline of power focused IP for many years
- Watch for more leading edge innovations enabling accelerated power gains in the future





# 4.4 The Beema notebook line

#### 4.4 The Beema notebook line

Launched in 04/2014.

# 4.4 The Beema notebook line (2)

#### Announcing the Puma+ based Beema notebook line in AMD's 2013-2014 mobile roadmap in 11/2013 [68]



AMD roadmaps are subject to change without notice or obligations to notify of changes. Placement of boxes intended to represent first year of production shipments

7 | 2014 AMD MOBILITY APU LINEUP ANNOUNCEMENT | NOVEMBER 2013

#### Key features of the Beema notebook line [67]

# UNMATCHED FEATURE SET




- Puma+ x86 Cores
- Graphics Core Next (GCN)
- System-Aware Power Management
- A Platform Security Processor
- DDR3-1866 memory support
- Over 50% more frequency at nearly half the TDP of the previous generation <sup>2</sup>



4 | AMD 2014 LOW POWER AND MAINSTREAM MOBILE APUS | UNDER EMBARGO UNTIL APRIL 29<sup>TH</sup> / 12:01 AM EASTERN U.S. TIME

# 4.4 The Beema notebook line (4)

# Main features of the Beema notebook line [74]

| Family | Model | Socket | CPU cores | CPU frequency | Max. turbo | L2 cache | GPU model | GPU max. freq. | TDP  | Memory     |
|--------|-------|--------|-----------|---------------|------------|----------|-----------|----------------|------|------------|
| A8     | 6410  | FT3b   | 4         | 2.00 GHz      | 2.4 GHz    | 2 MB     | Radeon R5 | 800 MHz        | 15 W | DDR3L-1866 |
| A6     | 6310  | FT3b   | 4         | 1.80 GHz      | 2.4 GHz    | 2 MB     | Radeon R4 | 800 MHz        | 15 W | DDR3L-1866 |
| A4     | 6250J | FT3b   | 4         | 2.00 GHz      | N/A        | 2 MB     | Radeon R3 | 600 MHz        | 25 W | DDR3L-1600 |
| A4     | 6210  | FT3b   | 4         | 1.80 GHz      | N/A        | 2 MB     | Radeon R3 | 600 MHz        | 15 W | DDR3L-1600 |
| E2     | 6110  | FT3b   | 4         | 1.50 GHz      | N/A        | 2 MB     | Radeon R2 | 500 MHz        | 15 W | DDR3L-1600 |
| E1     | 6010  | FT3b   | 2         | 1.35 GHz      | N/A        | 1 MB     | Radeon R2 | 350 MHz        | 10 W | DDR3L-1333 |

Positioning AMD's Beema notebook processors vs. Intel's corresponding processors [67]



# 4.4 The Beema notebook line (6)

Power improvements of the Beema line vs. the preceding Kabini line [69]



# 4.5 The Mullins tablet line

#### 4.5 The Mullins tablet line

Launched in 04/2014.

# 4.5 The Mullins tablet line (2)

## Announcing the Puma+ based Mullins tablet line in AMD's 2013-2014 mobile roadmap in 11/2013 [68]




# Main features of the Mullins tablet line [74]

| Family    | Model | CPU cores | CPU frequency | Max. turbo | L2 cache | GPU model | GPU max. freq. | TDP    | SDP   | Memory     |
|-----------|-------|-----------|---------------|------------|----------|-----------|----------------|--------|-------|------------|
| A10 Micro | 6700T | 4         | 1.2 GHz       | 2.2 GHz    | 2 MB     | Radeon R6 | 500 MHz        | 4.5 W  | 2.8 W | DDR3L-1333 |
| A6 Micro  | 6500T | 4         | 1.2 GHz       | 1.8 GHz    | 2 MB     | Radeon R4 | 401 MHz        | 4.5 W  | 2.8 W | DDR3L-1333 |
| A4 Micro  | 6400T | 4         | 1.0 GHz       | 1.6 GHz    | 2 MB     | Radeon R3 | 350 MHz        | 4.5 W  | 2.8 W | DDR3L-1333 |
| E1 Micro  | 6200T | 2         | 1.0 GHz       | 1.4 GHz    | 1 MB     | Radeon R2 | 300 MHz        | 3.95 W | 2.8 W | DDR3L-1066 |

#### **Positioning AMD's Mullins tablet processors vs. the previous Temash processors** [67]

Generation vs. generation, top-of-stack comparison (AMD A10 Micro-6700T vs. AMD A6-1450)

| Processor                     | TDP   | SDP   | 3DMark 11 Performance | PCMark 8 Home v2 | Basemark CL |
|-------------------------------|-------|-------|-----------------------|------------------|-------------|
| 2014: AMD A10 Micro-6700T APU | 4.5 W | 2.8 W | 582                   | 1591             | 15          |
| 2013: "Temash" quad-core      | 8 W   | 5.4 W | 478                   | 1487             | 15          |
| 2013: "Temash" dual-core      | 3.9 W | 3.5 W | 299                   | 1186             | 10          |

As seen, the new Mullins processors provide higher performance for substantially lower power (4.5 W vs. 8 W) or substantially higher performance for the same power (3.9 W).
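The "higher performance for lower power" statement can be made concrete with a short performance-per-watt calculation. The sketch below uses the TDP values and the first benchmark figure quoted for each processor in the comparison above; treating that first figure as the 3DMark 11 score is an assumption about the layout of the original slide, and TDP is only a rough proxy for actual power draw.

```python
# (TDP in W, assumed 3DMark 11 score) taken from the comparison above.
chips = {
    "A10 Micro-6700T (Mullins)": (4.5, 582),
    "Temash quad-core": (8.0, 478),
    "Temash dual-core": (3.9, 299),
}

# Benchmark points per watt of TDP for each processor.
ppw = {name: score / tdp_w for name, (tdp_w, score) in chips.items()}

if __name__ == "__main__":
    baseline = ppw["Temash quad-core"]
    for name, value in ppw.items():
        print(f"{name:27s} {value:6.1f} points/W  ({value / baseline:.2f}x quad-core Temash)")
```

Under these assumptions the top Mullins part delivers roughly twice the benchmark points per watt of TDP of the quad-core Temash part.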

## 4.5 The Mullins tablet line (5)

#### Die shot and floorplan of the Mullins tablet processor [73]

# 4.6 The last processor based on the Puma+ core: the Carrizo-L APU line

#### 4.6 The last processor based on the Puma+ core: the Carrizo-L APU line -1

As AMD's Mobile and AIO platform roadmap for 2015-2017 shows, Carrizo-L is AMD's last processor based on the Cat line, more precisely on the Puma+ core. [75] [76]



The current generation OEM products are powered by Carrizo & Carrizo-L

In 2016 "Bristol Ridge" Further Improves Performance<sup>\*</sup> and Mobility, and Enables DDR4

#### The last processor based on the Puma+ core: the Carrizo-L APU line -2

- The Carrizo-L line was launched in 05/2015.
- It is based on the same Puma+ CPU core as the previous Beema notebook line.
- There are no significant improvements vs. the previous Beema line; max. clock frequencies are only 5-10 % higher.

#### Key features of the Carrizo-L notebook processor line [76]



#### Main features of the processors of the Carrizo-L line [77]

| AMD Carrizo-L   | A8-7410       | A6-7310       | A4-7210       | E2-7110       | E1-7010       |
|-----------------|---------------|---------------|---------------|---------------|---------------|
| Cores / Threads | 4 / 4         | 4 / 4         | 4 / 4         | 4 / 4         | 2 / 2         |
| CPU Frequency   | Up to 2.5 GHz | Up to 2.4 GHz | Up to 2.2 GHz | Up to 1.8 GHz | Up to 1.5 GHz |
| TDP             | 12-25 W       | 12-25 W       | 12-25 W       | 12-15 W       | 10 W          |
| L2 Cache        | 2 MB          | 2 MB          | 2 MB          | 2 MB          | 1 MB          |
| DRAM Frequency  | DDR3L-1866    | DDR3L-1600    | DDR3L-1600    | DDR3L-1600    | DDR3L-1333    |
| Radeon Graphics | R5            | R4            | R3            | 'Radeon'      | 'Radeon'      |
| GPU Frequency   | Unknown       | Unknown       | Unknown       | Unknown       | Unknown       |

# 5. AMD's withdrawal from the mobile market

#### 5. AMD's withdrawal from the mobile market

Sales statistics of application processors for smartphones and tablets (see below) reveal that none of the traditional processor or graphics card firms, including Intel, AMD and NVIDIA, could achieve a favorable market position in worldwide sales.

| Smartphone application processors worldwide market share 2015 (revenue) | Share |
|-------------------------------------------------------------------------|-------|
| Qualcomm (USA)                                                          | 42 %  |
| Apple (USA)                                                             | 21 %  |
| MediaTek (Taiwan)                                                       | 19 %  |
| Samsung (S. Korea)                                                      |       |
| Spreadtrum (China)                                                      |       |

| Tablet application processors worldwide market share 2015 (revenue) | Share |
|---------------------------------------------------------------------|-------|
| Apple (USA)                                                         | 31 %  |
| Qualcomm (USA)                                                      | 16 %  |
| Intel (USA)                                                         | 14 %  |
| MediaTek (Taiwan)                                                   |       |
| Samsung (S. Korea)                                                  |       |

[Source: Related press releases of Strategy Analytics]

Worldwide market share of smartphone and tablet application processors in 2015 (based on revenue) [34]

Note that Intel seemingly achieved third place in tablet application processor sales, but this was brought about by paying high subsidies to OEMs for choosing Intel processors, which could not be maintained for long due to the high losses incurred (about 7 billion USD in two years). As a consequence, all three vendors mentioned withdrew from the mobile market in 2015/2016, as shown next.

#### **Cancellation of AMD's low-power Cat line in 2015**

- Neither Intel nor AMD succeeded in the mobile market, so AMD, like Intel, stopped its activities in this market.
- The last core of AMD's Cat line was the Puma+ core (last used in the Carrizo-L APU, launched in 05/2015).
- In AMD's 2016 Mobility roadmap there is no sign of an APU powered by the Puma+ core or by a derivative of any core belonging to the Cat line, as seen in the next slide.
- Instead, AMD placed emphasis on the development of Zen core based products.

#### Intel's withdrawal from the smartphone and mobile market

- As Intel failed to gain traction in the mobile sector and suffered high losses in the mobile segment, the firm announced its withdrawal from the mobile market in 4/2016.
- Intel's statement says [78]:

"I can confirm that the changes included canceling the Broxton platform as well as SoFIA 3GX, SoFIA LTE and SoFIA LTE2 commercial platforms to enable us to move resources to products that deliver higher returns and advance our strategy.

These changes are effective immediately."

- At the same time Intel laid off about 12,000 employees (~ 11 % of their workforce).

#### NVIDIA's exit from the smartphone and tablet market in June 2016 [79]

- NVIDIA's Tegra 4 chips were also not successful, so the firm announced in 05/2014 that it would abandon the phone market.
- Apple's iPad Air 2, with its A8X processor and a GPU with 256 EUs, became a very powerful rival to NVIDIA's subsequent 64-bit K1 chip, whose GPU has 192 EUs. As a consequence, NVIDIA also gave up its tablet interests.
- In 6/2016 (at Computex) NVIDIA's CEO declared that the firm was leaving the smartphone and tablet market by saying:

"We are no longer interested in that market". He adds, "Anybody can build smartphones, and we're happy to enjoy these devices, but we'll let someone else build them".

- Instead, NVIDIA became interested in designing in-car computers and car infotainment systems.

# 6. References

- [1]: Su L., Consumerization, Cloud, Convergence, AMD 2012 Financial Analyst Day, Febr. 2 2012
- [2]: Ryan T., AMD's CES 2013 Press Conference, Semi Accurate, Jan. 8 2013, http://semiaccurate.com/2013/01/08/amds-ces-2013-press-conference/

- [3]: Rupley J., "Jaguar": AMD's Next Generation Low Power x86 Core, Hot Chips 24, Aug. 28 2012

- [4]: Kommrusch S., Implementing an Efficient RTL Clock Gating Analysis Flow at AMD, Calypto White Paper, 2013, http://calypto.com/en/uploads/collateral/WP0021\_ ImplementingAnEfficientRTLClockGatingAnalysisFlowatAMD\_0313.pdf
- [5]: Singh T., Bell J., Southard S., John D., Jaguar: A Next-Generation Low-Power x86-64 Core, IEEE International Solid-State Circuits Conference, 2013, http://www.hardware.fr/marc/ISSCC2013-Final-v5.pdf
- [6]: Kommrusch S., Reducing power in AMD processor core with RTL clock gating analysis, EE Times, Febr. 4 2013, http://www.eetimes.com/design/eda-design/4406251/ Reducing-power-in-AMD-processor-core-with-RTL-clock-gating-analysis-
- [7]: Hruska J., AMD's next-gen Bobcat APU could win big in notebooks and tablets if it launches on time, Extreme Tech, Dec. 4 2012, http://www.extremetech.com/gaming/142163amds-next-gen-bobcat-apu-could-win-big-in-notebooks-and-tablets-if-it-launches-on-time
- [8]: Burgess B., "Bobcat" AMD's New Low Power x86 Core Architecture, Aug. 24 2010, http://www.hotchips.org/uploads/archive22/HC22.24.730-Burgess-AMD-Bobcat-x86.pdf

- [9]: Eassa A., AMD's Sleeper Agent: 'Temash' For Tablets, Seeking Alpha, Nov. 26 2012, http://seekingalpha.com/article/1027391-amd-s-sleeper-agent-temash-for-tablets
- [10]: Paine S.C., AMD Shares SoC Line-Up for 2013. Kabini is for Ultrathins, Ultrabook News, Jan. 9 2013, http://ultrabooknews.com/2013/01/09/amd-shares-soc-line-up-for-2013kabini-is-for-ultrathins/
- [11]: Bodnár Á., Újra próbálkozik a tabletpiacon az AMD, HWSW, Oct. 9 2012, http://www.hwsw.hu/hirek/49157/amd-z60-hondo-tablet-processzor-windows.html
- [12]: Shilov A., Vizio Introduces World's Second AMD Z-60-Based Tablet, Xbit Labs, Jan. 9 2013, http://www.xbitlabs.com/news/mobile/display/20130109235910\_Vizio\_Introduces\_ World\_s\_Second\_AMD\_Z\_60\_Based\_Tablet.html
- [13]: AMD Temash Core, CPU World, http://www.cpu-world.com/Cores/Temash.html
- [14]: Három referenciatabletet demonstrált az AMD az MWC-n, Prohardver, Febr. 27 2013, http://prohardver.hu/hir/harom\_referenciatabletet\_demonstralt\_amd\_mwc.html
- [15]: Halfacree G., AMD unveils Turbo Dock for convertibles, Bit-Tech, Febr. 21 2013, http://www.bit-tech.net/news/hardware/2013/02/21/amd-turbo-dock/
- [16]: Wikipedia, Dirk Meyer, http://en.wikipedia.org/wiki/Dirk\_Meyer
- [17]: Enderle R., AMD Shanghai "We are back!", TGDaily, Nov. 13, 2008, http://www.tgdaily.com/content/view/40176/128/

- [18]: AMD Phenom II X4 975 BE 3.60 GHz, Tech Power Up, Jan. 5 2011, http://www.techpowerup.com/reviews/AMD/Phenom\_II\_X4\_975/
- [19]: AMD Phenom II X4 840 3.20 GHz, Tech Power Up, Jan. 5 2011, http://www.techpowerup.com/reviews/AMD/Phenom\_II\_X4\_840/
- [20]: Gavrichenkov I., AMD Phenom II X2 550 and AMD Athlon II X2 250 Processors Review, Xbitlabs.com, June 1 2009, http://www.xbitlabs.com/articles/cpu/display/phenom-athlon-ii-x2\_3.html
- [21]: Walrath J., AMD Introduces New Mainstream and Ultra-Portable Platforms, PC Perspective, Sept. 12 2009, http://www.pcper.com/reviews/Processors/AMD-Introduces-New-Mainstream-and-Ultra-Portable-Platforms/2009-Mainstream-Platfo
- [22]: K10: Barcelona, Shanghai, Quad-Core Opteron, Phenom, AMD Zone, http://www.amdzone.com/phpbb3/viewtopic.php?f=52&t=137000
- [23]: Kanter D., AMD's Bulldozer Microarchitecture, Real World Technologies, Aug. 26 2010, http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333&p=2
- [24]: Kirsch N., AMD Kabini Mainstream APU Notebook Platform Preview, Legit Reviews, May 23 2013, http://www.legitreviews.com/article/2197/
- [25]: Chiappetta M., AMD 2013 A & E-Series Kabini and Temash APUs, Hot Hardware, May 23 2013 http://hothardware.com/Reviews/AMD-2013-ASeries-Kabini-and-Temash-Mobile-APUs/

- [26]: Shimpi A.L., AMD's Jaguar Architecture: The CPU Powering Xbox One, PlayStation 4, Kabini & Temash, AnandTech, May 23 2013, http://www.anandtech.com/show/6976/ amds-jaguar-architecture-the-cpu-powering-xbox-one-playstation-4-kabini-temash/5
- [27]: Hruska J., AMD's last, best hope: Low-power Kabini, Temash are ready for action; could rejuvenate mobile market, Extreme Tech, May 23 2013, http://www.extremetech.com/computing/156552-amds-last-and-only-hope-low-powerkabini-temash-are-ready-for-action
- [28]: Foley D., A Low-Power Integrated x86-64 and Graphics Processor for Mobile Computing Devices, IEEE Vol. 47 No. 1, Jan. 2012, http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=06054032
- [29]: Wikipedia, List of AMD Fusion microprocessors, http://en.wikipedia.org/wiki/List\_of\_AMD\_Fusion\_microprocessors
- [30]: Jotwani R., Sundaram S., Kosonocky S., Schaefer A., Andrade V. F., Novak A., Naffziger S., An x86-64 Core in 32 nm SOI CMOS, IEEE Xplore, 2010, http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05624589
- [31]: Burgess B., Cohen B., Denman M., Dundas J., Kaplan D., Rupley J., Bobcat: AMD's, Low-Power x86 Processor, IEEE March/April 2011, http://home.dei.polimi.it/sami/architetture\_avanzate/AMDbobcat.pdf
- [32]: Branover A., Foley D., Steinman M., AMD Fusion APU: Llano, IEEE Micro, 2012, http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=06138843

- [33]: White S., High-Performance Power-Efficient X86-64 Server and Desktop Processors, Using the core codenamed "Bulldozer", Aug. 19 2011, http://hotchips.org/uploads/ hc23/HC23.19.9-Desktop-CPUs/HC23.19.940-Bulldozer-White-AMD.pdf
- [34]: Foley D., AMD's "LLANO" Fusion APU, Hot Chips 23, Aug. 19 2011, http://www.hotchips.org/archives/hc23/HC23-papers/HC23.19.9-Desktop-CPUs/ HC23.19.930-Llano-Fusion-Foley-AMD.pdf
- [35]: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 14h Models 00h-0Fh Processors, Rev. 3.13, Febr. 17 2012, http://support.amd.com/us/Embedded\_TechDocs/43170.pdf
- [36]: Tseng A., AMD Fusion 11 Taipei TFE, Technical Forum & Exhibition, Oct. 5 2011, http://sites.amd.com/us/Documents/John%20Taylor%20Fusion\_11\_TFE\_Oct5\_2011.pdf
- [37]: Kowaliski C., A closer look at AMD's Brazos platform, Tech Report, Nov. 8 2010, http://techreport.com/articles.x/19937
- [38]: Wikipedia, Comparison of AMD chipsets, http://en.wikipedia.org/wiki/Comparison\_of\_AMD\_chipsets
- [39]: AMD Ontario APU pictured, die size ~77mm^2, Xtreme Systems, http://www.xtremesystems.org/forums/showthread.php?258499-AMD-Ontario-APUpictured-die-size-77mm-2&s=1efcaa856d971673336f2e458e1ddf5f
- [40]: AMD Accelerated Parallel Processing (APP) SDK (formerly ATI Stream) with OpenCL 1.1 Support, APP SDK 2.3, Jan. 2011

- [41]: Altavilla D., AMD Fusion: A8-3500M A-Series Llano APU Review, Hot Hardware, June 14 2011, http://hothardware.com/Reviews/AMD-Fusion-A83500M-ASeries-Llano-APU-Review/?page=2
- [42]: Angelini C., ASRock's E350M1: AMD's Brazos Platform Hits The Desktop First, Tom's Hardware, Jan. 13 2011, http://www.tomshardware.com/reviews/asrock-e350m1amd-brazos-zacate-apu,2840.html
- [43]: Gonzales N., 2006 Technology Analyst Day, June 1 2006, http://setcom.ee/tanno/info/is/teave/ite\_arv\_cpu\_x86\_amd\_all\_prodinfo\_2007.pdf
- [44]: Walton J., AMD Reveals Brazos 2.0 APUs and FCH, AnandTech, June 5 2012, http://www.anandtech.com/show/5937/amd-reveals-brazos-20-apus-and-fch
- [45]: Sutphen C., Introducing the 2012 AMD E-Series APU, June 5 2012, http://www.slideshare.net/AMD/introducing-the-2012-amd-eseries-apu
- [46]: Ganesh T. S., AMD G-Series Brings APUs to the x86 Embedded Market, AnandTech, Jan. 19 2011, http://www.anandtech.com/show/4133/amd-gseries-brings-apus-to-thex86-embedded-market/2
- [47]: AMD Embedded G-Series Platform, 2011, http://www.amd.com/us/Documents/49282\_G-Series\_platform\_brief.pdf

- [48]: AMD White Paper, 2011, http://www.amd.com/us/Documents/50356\_G-Series\_xSTB\_Whitepaper.pdf

- [49]: Shah A., AMD to detail upcoming Jaguar low power chip design for tablets, Computerworld, July 16 2012, http://www.computerworld.com/s/article/9229223/ AMD\_to\_detail\_upcoming\_Jaguar\_low\_power\_chip\_design\_for\_tablets
- [50]: Shimpi A.L., A Closer Look at the Kabini Die, AnandTech, May 23 2013, http://www.anandtech.com/show/6977/a-closer-look-at-the-kabini-die
- [51]: AMD Radeon HD 8280 IGP, Tech Power Up, http://www.techpowerup.com/gpudb/2204/radeon-hd-8280-igp.html
- [52]: Case L., Everything You Need to Know about APUs in Next-Gen Consoles, Tested, Febr. 19 2013, http://www.tested.com/tech/gaming/453638-everything-you-need-knowabout-apus-next-gen-consoles/
- [53]: Wikipedia, Wii U, http://en.wikipedia.org/wiki/Wii\_U
- [54]: Preliminary BIOS and Kernel Developer's Guide (BKDG) for AMD Family 16h Models 00h-0Fh (Kabini) Processors, Rev. 3.00, May 30 2013, http://support.amd.com/us/Processor\_ TechDocs/48751\_BKDG\_Fam\_16h\_Mod\_00h-0Fh.pdf
- [55]: AMD Amplifies Mobile Experience with Responsive Performance, Rich Graphics, Elite Software and Long Battery Life, May 23 2013, http://www.amd.com/us/press-releases/Pages/ amd-rejuvenates-mobile-2013may23.aspx
- [56]: Shimpi A.L., AMD Opteron X1150 & X2150 "Kyoto": Kabini Heads to Servers, AnandTech, May 29 2013, http://www.anandtech.com/show/6992/amd-opteron-x1150-x2150-kyotokabini-heads-to-servers

- [57]: AMD Opteron X2150 APU, Quick Reference Guide, 2013, http://www.amd.com/us/Documents/Kyoto2150\_QRG.pdf
- [58]: Pollice M., AMD launches Opteron X-Series, Moving Jaguar into Servers, BSN, May 30 2013, http://www.brightsideofnews.com/news/2013/5/30/amd-launches-opteron-x-series2cmoving-jaguar-into-servers.aspx
- [59]: AMD 2013 Mobility APU Introduction, Slideshare, May 22 2013, http://www.slideshare.net/AMD/amd-2013-mobility-apu-introduction-deck-final-for-lp
- [60]: Walrath J., Jaguar + GCN The Compute Architecture for Temash and Kabini, PC Perspective, May 23 2013, http://www.pcper.com/reviews/Processors/Jaguar-GCN-Compute-Architecture-Temash-and-Kabini/Getting-Away-Cores
- [61]: Walton J., The AMD Kabini Review: A4-5000 APU Tested, AnandTech, May 23 2013, http://www.anandtech.com/show/6974/amd-kabini-review
- [62]: Sima D., The design space of register renaming techniques, IEEE Micro, Vol. 20, Issue 5, Aug. 6 2002
- [63]: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 16h Models 30h-3Fh Processors, March 18 2016, https://support.amd.com/TechDocs/52740\_16h\_Models\_30h-3Fh\_BKDG.pd
- [64]: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 16h Models 00h-0Fh Processors, Rev. 3.03, https://support.amd.com/TechDocs/48751\_16h\_bkdg.pdf
- [65]: Shimpi A.L., Understanding AMD's Roadmap & New Direction, AnandTech, Febr. 2 2012, https://www.anandtech.com/show/5503/understanding-amds-roadmap-new-direction

- [66]: Wikipedia, Bobcat (microarchitecture), https://en.wikipedia.org/wiki/Bobcat\_(microarchitecture)
- [67]: Kean S., AMD Beema and Mullins APU Performance 3rd Generation APUs, Legit Reviews, Apr. 29 2014, http://www.legitreviews.com/amd-beema-mullins-apu\_139863
- [68]: AMD 2014 Mobility APU Lineup Announcement, Slideshare, Nov. 15 2013, https://www.slideshare.net/AMD/amd-mobility-apu-lineup-announcement?from\_action=sav
- [69]: Shimpi A.L, AMD Beema/Mullins Architecture & Performance Preview, AnandTech, Apr. 29 2014, https://www.anandtech.com/show/7974/amd-beema-mullins-architecturea10-micro-6700t-performance-preview
- [70]: Woligroski D., Mullins And Beema APUs: AMD Gets Serious About Tablet SoCs, Tom's Hardware, April 28 2014, https://www.tomshardware.com/reviews/amd-tablet-processor,3813.html
- [71]: Power Management Becomes System Aware, Hardware Canucks, April 28 2014, http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/66162-amdmullins-beema-mobile-apus-preview-3.html
- [72]: Shilov A., AMD to boost energy efficiency of APUs by 25 times by 2020, Kitguru, Nov. 29 2014 https://www.kitguru.net/components/cpu/anton-shilov/amd-boost-energy-efficiency-ofapus-by-25-times-by-2020/

- [73]: Hruska J., AMD launches new Beema, Mullins SoCs: Higher performance at almost-lowenough TDPs, ExtremeTech, April 29 2014, https://www.extremetech.com/computing/181407-amd-launches-new-beema-mullinssocs-higher-performance-at-almost-low-enough-tdps/3

- [74]: Wikipedia, Puma (microarchitecture), https://en.wikipedia.org/wiki/Puma\_(microarchitecture)
- [75]: Shilov A., AMD rumoured to unveil next-gen 'Carrizo-L' APUs this December, Kitguru, Oct. 7 2014, https://www.kitguru.net/components/cpu/anton-shilov/amd-rumoured-tounveil-next-gen-carrizo-l-apus-this-december/
- [76]: AMD A-Series APUS Deliver What Customers Need, July 15 2016, http://ps-philgeps.gov.ph/home/images/BAD/PHILIPPINES\_PS-DBM%20Event%20July %2015,%202016.pdf
- [77]: Cutress I., AMD's Carrizo-L APUs Unveiled: 12-25W Quad Core Puma+, AnandTech, May 12 2015, https://www.anandtech.com/show/9246/amds-carrizo-l-apus-unveiled-12-25w-quad-core-puma
- [78]: Smith R. and Cutress I., Intel's Changing Future: Smartphone SoCs Broxton & SoFIA Officially Cancelled, AnandTech, April 29, 2016, https://www.anandtech.com/show/10288/intel-broxton-sofia-smartphone-socs-cancelled
- [79]: Nguyen H., NVIDIA Definitely "Not Interested" In Building Smartphones SoCs, Ubergizmo, 06/04/2016, http://www.ubergizmo.com/2016/06/nvidia-not-interested-smartphones-soc/