Introduction to ARM Architecture

ASSEMBLY LANGUAGE

An assembly language, abbreviated asm, is a low-level programming language for a computer, or other programmable device, in which there is a very strong correspondence between the language and the architecture's machine code instructions.

Why Assembly for DSP?

In real-time applications, the execution time of a program plays an important role in determining the stability and failure rate of the system. DSP algorithms such as audio processing algorithms are performance critical because they involve a large number of floating-point operations. To achieve the required performance, we optimize these algorithms using ARM NEON SIMD assembly language: the C implementation of the algorithm is converted to assembly using optimization methods such as SIMD vectorization and loop unrolling.
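As an illustration of loop unrolling, here is a minimal sketch in C of a scalar dot product next to a version unrolled by four. The function names dot_scalar and dot_unrolled4 are hypothetical and not taken from any library. The unrolled form only shows the shape of loop that maps naturally onto four-lane NEON registers; it is not a tuned implementation.

```c
#include <stddef.h>

/* Plain scalar dot product: one multiply-accumulate per iteration. */
float dot_scalar(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Same computation unrolled by four with independent accumulators.
 * The four partial sums have no dependency on each other, which is the
 * pattern a four-lane SIMD multiply-accumulate would process at once. */
float dot_unrolled4(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    float acc = (acc0 + acc1) + (acc2 + acc3);

    /* Tail loop for the elements left over when n is not a multiple of 4. */
    for (; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

Because the unrolled version adds the products in a different order, its result can differ from the scalar one in the last bits for general floating-point data.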

The following sections describe the main instructions used in ARM NEON assembly.

Instruction Set for ARM Assembly Language

Properties of ARM Instruction set:

1. All instructions are 32 bits long (in ARM state; Thumb instructions are 16 or 32 bits long).

2. Most instructions execute in a single cycle.

3. Most instructions can be conditionally executed.

4. Load/store architecture: data processing instructions act only on registers.

5. Three operand format.

6. Combined ALU and shifter for high speed bit manipulation.

7. Specific memory access instructions with powerful auto‐indexing addressing modes.

8. Flexible multiple register load and store instructions.

This article only lists the ARM and NEON instruction sets. For detailed information, please refer to the ARM technical reference manual.

Basic Operational instructions:

Arithmetic Operations

· ADD, ADC, SUB, SBC, RSB, RSC.

Syntax: <mnemonic>{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>

Rd: destination register; Rn and Rm: source registers; c: condition code; q: optional instruction width qualifier (.N or .W); mnemonic: operation. If S is present, the instruction updates the condition flags.
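To show why the S suffix and the carry-using variants matter, here is a small C model of a 64-bit addition built from a 32-bit ADDS followed by ADC, plus RSB. The type u64_pair and the function names are hypothetical helpers for this sketch; the register assignments in the comments are just one possible choice.

```c
#include <stdint.h>

/* Hypothetical helper: a 64-bit value held in two 32-bit registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Models:  ADDS r0, r0, r2   ; add low words, update the C flag
 *          ADC  r1, r1, r3   ; add high words plus the carry      */
u64_pair add64_via_adds_adc(u64_pair x, u64_pair y)
{
    u64_pair r;
    r.lo = x.lo + y.lo;               /* ADDS: result wraps modulo 2^32     */
    uint32_t carry = (r.lo < x.lo);   /* C flag: set on unsigned overflow   */
    r.hi = x.hi + y.hi + carry;       /* ADC: consumes the carry            */
    return r;
}

/* Models RSB Rd, Rn, Rm: reverse subtract, Rd = Rm - Rn.
 * With Rm = 0 this negates Rn. */
uint32_t model_rsb(uint32_t rn, uint32_t rm)
{
    return rm - rn;
}
```

Without the S on the first addition the carry flag would not be updated, and ADC would add whatever carry an earlier instruction left behind.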

Logical Operations

· AND, ORR, EOR, BIC, ORN (ORN is available only in the Thumb-2 instruction set).

Syntax: <mnemonic>{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>

Rd: destination register; Rn and Rm: source registers; c: condition code; mnemonic: operation. If S is present, the instruction updates the condition flags.
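The less familiar logical mnemonics are easier to remember next to their C equivalents. This is a minimal sketch; the op_ prefixed function names are hypothetical and only model the result register, not the flags.

```c
#include <stdint.h>

uint32_t op_and(uint32_t rn, uint32_t rm) { return rn & rm;  }  /* AND Rd, Rn, Rm */
uint32_t op_orr(uint32_t rn, uint32_t rm) { return rn | rm;  }  /* ORR Rd, Rn, Rm */
uint32_t op_eor(uint32_t rn, uint32_t rm) { return rn ^ rm;  }  /* EOR Rd, Rn, Rm */

/* BIC (bit clear): clears in Rn every bit that is set in Rm. */
uint32_t op_bic(uint32_t rn, uint32_t rm) { return rn & ~rm; }  /* BIC Rd, Rn, Rm */

/* ORN (OR NOT): ORs Rn with the complement of Rm. */
uint32_t op_orn(uint32_t rn, uint32_t rm) { return rn | ~rm; }  /* ORN Rd, Rn, Rm */
```

For example, op_bic(flags, mask) is the usual way to clear a group of bits without first computing the inverted mask in a separate instruction.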

Multiplication and division Operations.

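· MUL, MLA, UMULL, SMULL, UMLAL, SMLAL, SDIV, UDIV.

Syntax: <mnemonic>{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>

Rd: destination register; Rn and Rm: source registers; c: condition code; mnemonic: operation. If S is present, the instruction updates the condition flags. The syntax shown is the basic MUL form: MLA takes an additional accumulator register <Ra>, and the long multiplies (UMULL, SMULL, UMLAL, SMLAL) write a 64-bit result to a register pair <RdLo>, <RdHi>. SDIV and UDIV exist only on cores that implement the divide instructions and do not accept S.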
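The difference between the 32-bit and the long multiplies is mainly the width of the result. Here is a minimal C model under that reading; the model_ function names are hypothetical, and flags are not modelled.

```c
#include <stdint.h>

/* MUL Rd, Rn, Rm: keeps only the low 32 bits of the product. */
uint32_t model_mul(uint32_t rn, uint32_t rm)
{
    return rn * rm;
}

/* MLA Rd, Rn, Rm, Ra: multiply-accumulate, Rd = Rn * Rm + Ra (low 32 bits). */
uint32_t model_mla(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return rn * rm + ra;
}

/* UMULL RdLo, RdHi, Rn, Rm: full 64-bit unsigned product. */
uint64_t model_umull(uint32_t rn, uint32_t rm)
{
    return (uint64_t)rn * rm;
}

/* SMULL RdLo, RdHi, Rn, Rm: full 64-bit signed product. */
int64_t model_smull(int32_t rn, int32_t rm)
{
    return (int64_t)rn * rm;
}

/* UMLAL RdLo, RdHi, Rn, Rm: 64-bit accumulator += unsigned 64-bit product. */
uint64_t model_umlal(uint64_t acc, uint32_t rn, uint32_t rm)
{
    return acc + (uint64_t)rn * rm;
}
```

In DSP code the accumulate forms are the important ones, since a filter tap or dot-product step is exactly one multiply-accumulate.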

Data Movement and Stack Operations.

· MOV, MVN, PUSH, POP, MSR, MRS.

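Syntax: <mnemonic>{S}{<c>}{<q>} <Rd>, <Rm>

Rd: destination register; Rm: source register; c: condition code; mnemonic: operation. If S is present, the instruction updates the condition flags. This form applies to MOV and MVN; PUSH and POP take a register list in braces, and MRS and MSR transfer between a general-purpose register and a status register.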

Compare, Conditional and Convert Instructions.

· TST, TEQ, CMP, CMN.

Syntax: <mnemonic>{<c>}{<q>} <Rn>, <Rm>

Rn and Rm: source registers; c: condition code; mnemonic: operation. These instructions have no destination register and always update the condition flags, so no S suffix is needed.
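To make the link between a compare and a later conditional instruction concrete, here is a small C model of the N, Z, C and V flags produced by CMP, and of a few condition codes that read them. The flags_t type and the function names are hypothetical; the flag definitions follow the usual rule that CMP computes Rn - Rm and discards the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four condition flags. */
typedef struct {
    bool n;  /* negative: bit 31 of the result     */
    bool z;  /* zero: result is 0                  */
    bool c;  /* carry: no borrow occurred          */
    bool v;  /* overflow: signed result overflowed */
} flags_t;

/* CMP Rn, Rm: computes Rn - Rm, keeps only the flags. */
flags_t model_cmp(uint32_t rn, uint32_t rm)
{
    uint32_t diff = rn - rm;
    flags_t f;
    f.n = (diff >> 31) != 0;
    f.z = (diff == 0);
    f.c = (rn >= rm);
    f.v = (((rn ^ rm) & (rn ^ diff)) >> 31) != 0;
    return f;
}

/* TST Rn, Rm: computes Rn AND Rm; only the Z flag is modelled here. */
bool model_tst_z(uint32_t rn, uint32_t rm)
{
    return (rn & rm) == 0;
}

/* A few condition codes expressed in terms of the flags. */
bool cond_eq(flags_t f) { return f.z; }          /* EQ: equal                  */
bool cond_hi(flags_t f) { return f.c && !f.z; }  /* HI: unsigned higher        */
bool cond_lt(flags_t f) { return f.n != f.v; }   /* LT: signed less than       */
bool cond_ge(flags_t f) { return f.n == f.v; }   /* GE: signed greater/equal   */
```

Comparing 0x80000000 with 1 shows why both signed and unsigned conditions exist: the same flags satisfy HI (unsigned higher) and LT (signed less than).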

Load and Store Instructions.

· LDR, STR, LDM, STM, LDMIA, LDMIB, STMIA, STMIB.

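Syntax: <mnemonic>{<cond>}{<size>} Rd, <address>

Rd: destination register for a load, source register for a store; address: data address; cond: condition code; mnemonic: operation; size: size of the data. LDM and STM take a base register and a register list instead.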

Other Instructions.

· SWI, REV, ROR, LSL, LSR, ASR, RRX, B, BL, BX, BLX.

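Syntax: <mnemonic>{S}{<c>}{<q>} {<Rd>,} <Rn>

Rd: destination register; Rn: source register; c: condition code; mnemonic: operation. If S is present, the instruction updates the condition flags. The form shown is schematic: the shift and rotate instructions (LSL, LSR, ASR, ROR) also take a shift amount as a register or an immediate, REV and the branch instructions do not accept S, B and BL take a target label, BX takes a register, BLX takes either, and SWI (named SVC in current ARM syntax) takes an immediate number.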
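The shift, rotate and byte-reverse instructions have no direct C operator in some cases, so a portable C model helps to pin down what each one does. This is a sketch with hypothetical function names; it assumes shift amounts in the range 0 to 31 and does not model the carry-out or RRX.

```c
#include <stdint.h>

/* LSL Rd, Rm, #n: logical shift left, zeros enter from the right. */
uint32_t model_lsl(uint32_t x, unsigned n) { return x << n; }

/* LSR Rd, Rm, #n: logical shift right, zeros enter from the left. */
uint32_t model_lsr(uint32_t x, unsigned n) { return x >> n; }

/* ASR Rd, Rm, #n: arithmetic shift right, copies of the sign bit enter
 * from the left. Written with unsigned arithmetic so it is portable. */
uint32_t model_asr(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* ROR Rd, Rm, #n: rotate right, bits leaving on the right re-enter on the left. */
uint32_t model_ror(uint32_t x, unsigned n)
{
    n %= 32u;
    if (n == 0u)
        return x;
    return (x >> n) | (x << (32u - n));
}

/* REV Rd, Rm: reverse the byte order of a 32-bit word. */
uint32_t model_rev(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x >> 8) & 0x0000FF00u)  |
           (x >> 24);
}
```

REV is the instruction typically used to convert a word between little-endian and big-endian byte order.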

Single Register data transfer:

· The basic load and store instructions load and store a word or a byte: LDR / STR / LDRB / STRB.

· ARMv4 adds support for halfwords and signed data: LDRH / STRH.

· Load signed byte or halfword, which loads the value and sign-extends it to 32 bits: LDRSB / LDRSH.

· All of these instructions can be conditionally executed by inserting the appropriate condition code after LDR / STR, for example LDREQB (written LDRBEQ in the newer unified syntax).

· Syntax: <LDR|STR>{<cond>}{<size>} Rd, <address>
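The difference between the plain and the signed loads is only in how the upper bits of the 32-bit register are filled. Here is a minimal C model, assuming little-endian memory; the model_ function names are hypothetical.

```c
#include <stdint.h>

/* LDRB Rd, [addr]: load one byte and zero-extend it to 32 bits. */
uint32_t model_ldrb(const uint8_t *addr)
{
    return addr[0];
}

/* LDRSB Rd, [addr]: load one byte and sign-extend it to 32 bits. */
uint32_t model_ldrsb(const uint8_t *addr)
{
    uint32_t v = addr[0];
    return (v ^ 0x80u) - 0x80u;        /* portable sign extension from bit 7 */
}

/* LDRH Rd, [addr]: load a halfword and zero-extend it (little-endian assumed). */
uint32_t model_ldrh(const uint8_t *addr)
{
    return (uint32_t)addr[0] | ((uint32_t)addr[1] << 8);
}

/* LDRSH Rd, [addr]: load a halfword and sign-extend it to 32 bits. */
uint32_t model_ldrsh(const uint8_t *addr)
{
    uint32_t v = model_ldrh(addr);
    return (v ^ 0x8000u) - 0x8000u;    /* portable sign extension from bit 15 */
}

/* LDR Rd, [addr]: load a full 32-bit word (little-endian assumed). */
uint32_t model_ldr(const uint8_t *addr)
{
    return (uint32_t)addr[0]         | ((uint32_t)addr[1] << 8) |
           ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
}
```

For 16-bit audio samples, which are signed, LDRSH is the load that preserves the sample value when it is widened to a 32-bit register.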

Block Register data transfer.

· Load and Store Multiple instructions (LDM / STM) allow between 1 and 16 registers to be transferred to or from memory.

· The transferred registers can be any subset of the current bank of registers (the default).

· The whole register bank or a subset of it is copied with a single instruction.

· Appending ‘!’ to the base register updates it after the transfer.

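· The lowest-numbered register is always transferred to or from the lowest memory address, regardless of the order in which the registers are listed.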

· Very efficient for saving and restoring context and for moving large blocks of data.
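The block transfer behaviour can be sketched in C as a loop over a register list. This model covers only the increment-after forms (STMIA / LDMIA); the function names are hypothetical, memory is modelled as an array of words with the base given as a word index, and the register list is a 16-bit mask with bit r standing for register r. It assumes registers are transferred in ascending register number to ascending addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Models STMIA Rn!, {reglist}: store the selected registers to consecutive
 * words starting at base. Returns the updated base, which is what the
 * base register holds when '!' (writeback) is used. */
size_t model_stmia(uint32_t *mem, size_t base,
                   const uint32_t regs[16], uint16_t reglist)
{
    for (unsigned r = 0; r < 16; ++r) {
        if (reglist & (1u << r))
            mem[base++] = regs[r];
    }
    return base;
}

/* Models LDMIA Rn!, {reglist}: load the selected registers from consecutive
 * words starting at base. Returns the updated base. */
size_t model_ldmia(const uint32_t *mem, size_t base,
                   uint32_t regs[16], uint16_t reglist)
{
    for (unsigned r = 0; r < 16; ++r) {
        if (reglist & (1u << r))
            regs[r] = mem[base++];
    }
    return base;
}
```

A caller that ignores the returned base models the form without '!', where the base register keeps its original value.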

Instruction Set for NEON Assembly

Arithmetic Operations:

· VABA, VABD, VABS, VADD, VADDHN, VHADD, VADDL, VADDW, VSUB, VSUBHN, VHSUB, VSUBL, VSUBW, VPADD, VPADDL.
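Several of these mnemonics differ only in how they treat each lane. Here is a minimal sketch in plain C that models four of them on eight unsigned 8-bit lanes, the contents of one 64-bit D register. The model_ function names and the LANES constant are hypothetical; on an ARM target the same operations are available as intrinsics in arm_neon.h (for example vhadd_u8, vabd_u8, vaddl_u8 and vpadd_u8), but this version is written so that it runs on any host.

```c
#include <stdint.h>

#define LANES 8  /* eight 8-bit lanes, as in one 64-bit D register */

/* VHADD.U8: halving add, (a + b) >> 1 computed without intermediate overflow. */
void model_vhadd_u8(uint8_t out[LANES], const uint8_t a[LANES], const uint8_t b[LANES])
{
    for (int i = 0; i < LANES; ++i)
        out[i] = (uint8_t)(((unsigned)a[i] + (unsigned)b[i]) >> 1);
}

/* VABD.U8: absolute difference, |a - b| per lane. */
void model_vabd_u8(uint8_t out[LANES], const uint8_t a[LANES], const uint8_t b[LANES])
{
    for (int i = 0; i < LANES; ++i)
        out[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* VADDL.U8: long add, each 8-bit lane pair is added into a 16-bit lane. */
void model_vaddl_u8(uint16_t out[LANES], const uint8_t a[LANES], const uint8_t b[LANES])
{
    for (int i = 0; i < LANES; ++i)
        out[i] = (uint16_t)((unsigned)a[i] + (unsigned)b[i]);
}

/* VPADD.I8: pairwise add. Adjacent lanes of a fill the low half of the
 * result, adjacent lanes of b fill the high half; sums wrap to 8 bits. */
void model_vpadd_u8(uint8_t out[LANES], const uint8_t a[LANES], const uint8_t b[LANES])
{
    for (int i = 0; i < LANES / 2; ++i) {
        out[i]             = (uint8_t)(a[2 * i] + a[2 * i + 1]);
        out[LANES / 2 + i] = (uint8_t)(b[2 * i] + b[2 * i + 1]);
    }
}
```

The long form (VADDL) is the one to reach for when 8-bit or 16-bit samples are summed and the result must not wrap.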