Friday, January 5, 2018

NEON ARCHITECTURE

NEON is an extension to the ARM architecture that supports parallel operations on large data sets, typically in audio, video, and other multimedia processing. Parallel operations of this kind were first introduced to ARM as ARM SIMD.
Before moving on to NEON, here is a brief introduction to ARM SIMD. Modern software such as codecs and graphics accelerators operates on large amounts of data whose elements are smaller than the word size; digital audio and video samples, for example, are commonly 16 and 8 bits wide respectively. When such operations run on a 32-bit microprocessor, parts of the computation units go unused but continue to consume power. To make better use of the hardware, SIMD (Single Instruction, Multiple Data) can treat a 32-bit register as four 8-bit parts and perform four operations in parallel. SIMD technology uses a single instruction to perform the same operation in parallel on multiple data elements of the same type and size.
ARMv6 (ARM architectures come in many versions; ARMv7 and ARMv8 are the current mainstream ones) introduced SIMD instructions that operate on 16-bit or 8-bit data units packed into the 32-bit general-purpose registers. This allows certain operations to execute two or four times faster.
The following figure shows four parallel 8-bit addition operations. Four lanes of 8-bit data are packed into each of the general-purpose registers R1 and R2, and the four sums are placed in R0.
The instruction for this is UADD8 R0, R1, R2.
Fig N.1: 4-way 8-bit unsigned integer add operation
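To make the lane behaviour concrete, here is a minimal sketch in portable C that models the data result of a 4-way 8-bit unsigned add. The function name uadd8_model is hypothetical and is not an ARM API; it only imitates what the figure describes, assuming each byte lane wraps around modulo 256 and no carry crosses into the neighbouring lane. The GE flags that the real instruction also updates are not modelled.

```c
#include <stdint.h>

/* Hypothetical software model of the data result of UADD8 Rd, Rn, Rm.
 * Each 8-bit lane is added independently; a carry out of one lane is
 * dropped rather than propagated into the next lane. */
uint32_t uadd8_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));   /* extract lane from a */
        uint8_t y = (uint8_t)(b >> (8 * lane));   /* extract lane from b */
        uint8_t sum = (uint8_t)(x + y);           /* wraps modulo 256 */
        result |= (uint32_t)sum << (8 * lane);    /* repack into result */
    }
    return result;
}
```

For example, uadd8_model(0x00FF00FF, 0x00010001) gives 0x00000000, whereas an ordinary 32-bit add of the same operands gives 0x01000100 because the carries spill into the adjacent bytes.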
The ARMv7 architecture introduced the Advanced SIMD extension. It extends the SIMD concept by defining groups of instructions that operate on 64-bit doubleword and 128-bit quadword vector registers. The implementation of the Advanced SIMD extension used in ARM processors is called NEON.
NEON technology is implemented on all current ARM Cortex-A series processors. NEON instructions are executed as part of the ARM or Thumb instruction stream, which simplifies software development, debugging, and integration. Traditional ARM or Thumb instructions manage all program flow and synchronization. The NEON instructions perform:
·         memory accesses
·         data copying between NEON and general purpose registers
·         data type conversion
·         data processing.
NEON provides standardized acceleration for media and signal processing applications.

The following figure shows an eight-lane NEON addition operation on 128-bit quadword registers holding 16-bit data. The instruction for this is VADD.I16 Q0, Q1, Q2.
Fig N.2: 8-way 16-bit integer add operation
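The eight-lane add in the figure can be sketched in plain C as follows. The type q16x8_model and the function vadd_i16_model are hypothetical names used only for illustration; they model a quadword register as eight 16-bit lanes and assume each lane wraps modulo 65536. On a NEON-capable target, the vaddq_u16 intrinsic from arm_neon.h performs the corresponding operation on real registers.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit Q register viewed as eight 16-bit lanes. */
typedef struct {
    uint16_t lane[8];
} q16x8_model;

/* Model of VADD.I16 Qd, Qn, Qm: eight independent 16-bit additions.
 * On hardware all eight lanes are added by one instruction; this loop
 * only reproduces the per-lane result. */
q16x8_model vadd_i16_model(q16x8_model a, q16x8_model b)
{
    q16x8_model r;
    for (int i = 0; i < 8; i++) {
        r.lane[i] = (uint16_t)(a.lane[i] + b.lane[i]);  /* wraps modulo 65536 */
    }
    return r;
}
```

Because the addition is lane-wise, an overflow in one lane (for example 0xFFFF + 1) produces 0 in that lane and leaves the other seven lanes unaffected.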

SUPPORTED DATA TYPES
NEON instructions support 8-bit, 16-bit, 32-bit, and 64-bit signed and unsigned integers.
NEON also supports 32-bit single-precision floating point elements, and 8-bit and 16-bit polynomials.
The VCVT instruction converts elements between single-precision floating-point and:
• 32-bit integer
• Fixed-point
• Half-precision floating point, if the processor implements the half-precision extensions.
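As an illustration of what a floating-point to fixed-point conversion involves, here is a minimal sketch in portable C for a single 32-bit element. The function names are hypothetical, and the behaviour shown (scale by 2^frac_bits, truncate toward zero, saturate at the 32-bit limits, map NaN to 0) is an assumption made for this sketch rather than a statement of the exact VCVT specification; frac_bits is assumed to be in the range 1 to 32.

```c
#include <stdint.h>

/* Hypothetical model: single-precision float -> signed 32-bit fixed point
 * with frac_bits fractional bits (assumed range 1..32). */
int32_t float_to_fixed_model(float value, int frac_bits)
{
    if (value != value) {
        return 0;                                   /* NaN: assumed to give 0 */
    }
    double scale  = (double)(1ULL << frac_bits);    /* 2^frac_bits */
    double scaled = (double)value * scale;
    if (scaled >= 2147483647.0)  return INT32_MAX;  /* saturate high */
    if (scaled <= -2147483648.0) return INT32_MIN;  /* saturate low */
    return (int32_t)scaled;                         /* cast truncates toward zero */
}

/* Hypothetical model of the reverse direction: fixed point -> float. */
float fixed_to_float_model(int32_t fixed, int frac_bits)
{
    double scale = (double)(1ULL << frac_bits);
    return (float)((double)fixed / scale);
}
```

With 16 fractional bits, 1.5 maps to 98304 (1.5 × 65536), and converting 98304 back yields 1.5 again.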

NEON REGISTERS
The NEON register bank consists of 32 64-bit registers. These registers are shared between SIMD and VFP (Vector Floating Point) operations. The bank can also be viewed as:
  • Sixteen 128-bit quadword registers, Q0-Q15
  • Thirty-two 64-bit doubleword registers, D0-D31.

The NEON D0-D31 registers are the same as the VFPv3 D0-D31 registers, and each of the Q0-Q15 registers maps onto a pair of D registers. The following figure shows the different views of the shared NEON and VFP register bank. All of these views are accessible at any time.
Fig N.3: NEON and VFP register set
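The aliasing between the Q and D views can be sketched with a small software model. The names neon_bank_model, write_q and read_d are hypothetical and exist only for this illustration; the sketch assumes that Qn occupies D(2n) as its low half and D(2n+1) as its high half, so that Q0 overlays D0 and D1, Q1 overlays D2 and D3, and so on.

```c
#include <stdint.h>

/* Hypothetical software model of the shared bank: 32 doubleword registers. */
typedef struct {
    uint64_t d[32];   /* D0-D31 */
} neon_bank_model;

/* Write a 128-bit value to Qn (n = 0..15) as two 64-bit halves.
 * Qn is not separate storage: it is the pair D(2n), D(2n+1). */
void write_q(neon_bank_model *bank, int n, uint64_t low, uint64_t high)
{
    bank->d[2 * n]     = low;    /* low half lands in D(2n) */
    bank->d[2 * n + 1] = high;   /* high half lands in D(2n+1) */
}

/* Read doubleword register Dn (n = 0..31). */
uint64_t read_d(const neon_bank_model *bank, int n)
{
    return bank->d[n];
}
```

In this model, writing Q1 and then reading D2 and D3 returns the two halves just written, which is the sense in which both views refer to the same storage.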

