NEON ARCHITECTURE
NEON is the extension to the ARM architecture that supports parallel
operations for large-scale data processing, such as audio, video and other
multimedia processing. Parallel operations of this kind were first introduced
to the architecture as ARM SIMD.
Before moving on to NEON, here is a brief introduction to ARM SIMD. Modern
software such as codecs and graphics accelerators operates on large amounts of
data whose elements are smaller than a word. Digital audio and video samples,
for example, are typically 16 and 8 bits wide respectively. When these
operations are performed one element at a time on a 32-bit microprocessor,
parts of the computation units are unused but continue to consume power. To
make better use of the hardware, SIMD divides a 32-bit register into parts,
for example four 8-bit lanes, and performs the operations on all of them in
parallel. SIMD technology uses a single instruction to perform the same
operation in parallel on multiple data elements of the same type and size.
ARMv6 (the ARM architecture has many versions; ARMv7 and ARMv8 are the ones in
common use today) introduced SIMD instructions that operate on 8-bit and
16-bit data units packed into 32-bit general-purpose registers. This allows
certain operations to execute two or four times faster.
The following figure shows four parallel 8-bit addition operations. Four lanes
of 8-bit data are packed into each of the general-purpose registers R1 and R2,
and the four sums are placed in R0. The instruction for this is
UADD8 R0, R1, R2.
Fig N.1: 4-way 8-bit unsigned integer add operation
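The lane-wise behaviour of the 8-bit add can be modelled in a few lines of Python. This is only an illustrative sketch, not ARM code: the function name uadd8 and the sample register values are assumptions made for this example, and the model ignores the GE status flags that the real instruction also updates.

```python
def uadd8(r1: int, r2: int) -> int:
    """Model a 4-lane unsigned 8-bit add on two 32-bit register values."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        a = (r1 >> shift) & 0xFF  # extract this lane from the first operand
        b = (r2 >> shift) & 0xFF  # extract this lane from the second operand
        # Each lane wraps modulo 256; no carry leaks into the next lane.
        result |= ((a + b) & 0xFF) << shift
    return result


# Hypothetical register contents, chosen to show one lane wrapping around.
r0 = uadd8(0x01FF7F10, 0x01018020)
print(hex(r0))  # 0x200ff30
```

Note that the lane holding 0xFF + 0x01 wraps to 0x00 without disturbing the lane above it, which is the difference between a packed add and an ordinary 32-bit add.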
The ARMv7 architecture introduced the Advanced SIMD extension. It extends the
SIMD concept by defining groups of instructions that operate on 64-bit
doubleword and 128-bit quadword vector registers. The implementation of the
Advanced SIMD extension used in ARM processors is called NEON.
NEON technology is available on the ARM Cortex-A series processors. NEON
instructions are executed as part of the ARM or Thumb instruction stream,
which simplifies software development, debugging, and integration.
Traditional ARM or Thumb instructions manage all program flow and
synchronization. The NEON instructions perform:
· memory accesses
· data copying between NEON and general-purpose registers
· data type conversion
· data processing.
NEON provides standardized acceleration for media and signal processing
applications.
The following figure shows the eight-lane NEON addition operation on 128-bit
quadword registers holding 16-bit data. The instruction for this is
VADD.I16 Q0, Q1, Q2.
Fig N.2: 8-way 16-bit integer add operation
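As a rough illustration of the eight-lane add, the following Python sketch treats a quadword register as a list of eight 16-bit lanes. The names vadd_i16 and pack_q, and the choice of lane 0 as the least significant element, are assumptions made for this example rather than anything defined by ARM.

```python
LANES = 8        # a 128-bit quadword holds eight 16-bit elements
MASK16 = 0xFFFF  # keeps each result within 16 bits


def vadd_i16(q1, q2):
    """Model an 8-lane 16-bit integer add with per-lane wraparound."""
    if len(q1) != LANES or len(q2) != LANES:
        raise ValueError("each operand must have exactly 8 lanes")
    return [(a + b) & MASK16 for a, b in zip(q1, q2)]


def pack_q(lanes):
    """Pack eight 16-bit lanes into one 128-bit integer, lane 0 lowest."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & MASK16) << (16 * i)
    return value


# Hypothetical lane values; the last lane overflows and wraps to 0.
q1 = [1, 2, 3, 4, 5, 6, 7, 0xFFFF]
q2 = [10, 20, 30, 40, 50, 60, 70, 1]
q0 = vadd_i16(q1, q2)
print(q0)  # [11, 22, 33, 44, 55, 66, 77, 0]
```

The same bit pattern serves for signed and unsigned data here, because a wrapping add produces identical bits in both interpretations.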
SUPPORTED DATA TYPES
NEON instructions support 8-bit, 16-bit, 32-bit, and 64-bit signed and
unsigned integers. NEON also supports 32-bit single-precision floating-point
elements, and 8-bit and 16-bit polynomials.
The VCVT instruction converts elements between single-precision
floating-point and:
• 32-bit integer
• Fixed-point
• Half-precision floating-point, if the processor implements the
half-precision extensions.
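To make the fixed-point case concrete, here is a small Python sketch of converting a single element to and from signed 32-bit fixed-point. The function names and the fbits parameter (number of fraction bits) are assumptions for this example. The sketch rounds toward zero and saturates on overflow, which is the behaviour assumed here for the float-to-fixed direction. It works on Python floats, which are double precision, and does not handle NaN.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def float_to_fixed_s32(x: float, fbits: int) -> int:
    """Convert a float to signed 32-bit fixed-point with fbits fraction bits."""
    scaled = int(x * (1 << fbits))  # int() truncates toward zero
    # Saturate instead of wrapping when the value does not fit in 32 bits.
    return max(INT32_MIN, min(INT32_MAX, scaled))


def fixed_to_float_s32(value: int, fbits: int) -> float:
    """Convert signed fixed-point with fbits fraction bits back to a float."""
    return value / (1 << fbits)


print(float_to_fixed_s32(1.5, 16))     # 98304
print(float_to_fixed_s32(-0.75, 8))    # -192
print(fixed_to_float_s32(98304, 16))   # 1.5
```

With 16 fraction bits, the value 1.5 becomes 1.5 × 65536 = 98304, and converting back divides by the same scale factor.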
NEON REGISTERS
The NEON register bank consists of thirty-two 64-bit registers. These
registers are shared between SIMD and VFP (Vector Floating Point) operations.
The bank can be viewed as:
- sixteen 128-bit quadword registers, Q0-Q15
- thirty-two 64-bit doubleword registers, D0-D31.
The NEON D0-D31 registers are the same as the VFPv3 D0-D31 registers, and each
of the Q0-Q15 registers maps onto a pair of D registers. The following figure
shows the different views of the shared NEON and VFP register bank. All of
these views are accessible at any time.
Fig N.3: NEON and VFP register set
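The aliasing between the Q and D views can be sketched as a single block of 256 bytes read through two different widths. The RegisterBank class below is a hypothetical model written for this explanation, assuming that Qn overlays D(2n) in its low half and D(2n+1) in its high half.

```python
class RegisterBank:
    """Model of a shared bank: 32 x 64-bit D views or 16 x 128-bit Q views."""

    def __init__(self):
        self.mem = bytearray(32 * 8)  # 256 bytes of backing storage

    def write_d(self, n: int, value: int) -> None:
        if not 0 <= n < 32:
            raise IndexError("D register index must be 0-31")
        self.mem[8 * n:8 * n + 8] = value.to_bytes(8, "little")

    def read_d(self, n: int) -> int:
        if not 0 <= n < 32:
            raise IndexError("D register index must be 0-31")
        return int.from_bytes(self.mem[8 * n:8 * n + 8], "little")

    def write_q(self, n: int, value: int) -> None:
        if not 0 <= n < 16:
            raise IndexError("Q register index must be 0-15")
        self.mem[16 * n:16 * n + 16] = value.to_bytes(16, "little")

    def read_q(self, n: int) -> int:
        if not 0 <= n < 16:
            raise IndexError("Q register index must be 0-15")
        return int.from_bytes(self.mem[16 * n:16 * n + 16], "little")


bank = RegisterBank()
bank.write_d(2, 0x1111)  # low half of Q1
bank.write_d(3, 0x2222)  # high half of Q1
print(hex(bank.read_q(1)))  # 0x22220000000000001111
```

Writing through one view is immediately visible through the other, because both views read the same underlying storage.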


