Carlos's Tech Blog

09_ARMv8_Inline Assembly

https://github.com/carloscn/blog/issues/22


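Inline assembly is not specific to AArch64; it is a general feature of the GNU compiler (GCC). It serves two purposes: first, time-critical functions can use inline assembly to reduce execution overhead; second, C has no way to express certain architecture-specific instructions, such as memory barriers, and inline assembly gives access to them.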

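1. Basic Usage

1.1 Basic Inline Assembly Format

Definition: asm("assembly instruction")

Example:

asm("ic ialluis"); issues a single cache maintenance instruction (invalidate all instruction caches to the Point of Unification, Inner Shareable).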
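As a minimal sketch of basic asm (one instruction string, no operand or clobber sections), the helper below emits a single nop. ic ialluis itself is privileged and cannot run at EL0 in a normal Linux process, so nop stands in here because it assembles on both AArch64 and x86; the function names are hypothetical. The spelling __asm__ __volatile__ is used because GCC also accepts it under strict -std=c99/-std=c11 modes, where plain asm is not a keyword.

```c
/* Basic asm: a single instruction string, no operand sections. */
static inline void cpu_nop(void)
{
    __asm__ __volatile__("nop");
}

/* Hypothetical helper: executes n nops and returns how many ran. */
static int run_nops(int n)
{
    int i;
    for (i = 0; i < n; i++)
        cpu_nop();
    return i;
}
```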

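1.2 Extended Inline Assembly

  • Definition: asm asm-qualifiers (AssemblerTemplate : OutputOperands : InputOperands : Clobbers)

  • Instruction template:

    • Format: instruction part : output part : input part : clobber part

    • The compiler does not parse the instructions themselves; it treats the template as a string and only substitutes operand references such as %0.

  • Qualifiers: volatile, inline

    • volatile: prevents the compiler from deleting, merging, or hoisting the asm statement; use it when the assembly has side effects beyond producing its outputs.

    • inline: for inlining decisions, the compiler treats the size of the asm statement as the smallest possible.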
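To see all four parts in one statement, here is a hedged sketch (the function name is hypothetical) that returns the unsigned minimum of two values with cmp/csel. Because cmp changes the condition flags, "cc" goes in the clobber part. volatile is left out since the statement has no effect beyond its output, which leaves the compiler free to optimize it; on non-AArch64 hosts a plain C fallback keeps the file buildable.

```c
#include <stdint.h>

/* Sketch: unsigned minimum using an extended asm statement on AArch64. */
static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
    uint64_t m;
#if defined(__aarch64__)
    __asm__(
        "cmp  %1, %2\n\t"       /* instruction part: compare a with b */
        "csel %0, %1, %2, lo"   /* m = (a < b, unsigned) ? a : b */
        : "=r"(m)               /* output part: %0, write-only */
        : "r"(a), "r"(b)        /* input part: %1 and %2, read-only */
        : "cc");                /* clobber part: cmp modifies the flags */
#else
    m = (a < b) ? a : b;        /* portable fallback for other hosts */
#endif
    return m;
}
```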

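Example 1 (array_index_mask_nospec from the Linux arm64 kernel):

static inline unsigned long array_index_mask_nospec(unsigned long idx, unsigned long sz)
{
  unsigned long mask;
  asm volatile(
  "     cmp   %1, %2\n"               // instruction part, line 1: compare idx with sz
  "     sbc   %0, xzr, xzr\n"         // instruction part, line 2; lines are separated by \n
  : "=r" (mask)                       // output part: write-only variable mask (%0)
  : "r" (idx), "Ir" (sz)              // input part: read-only variables idx (%1) and sz (%2)
  : "cc");                            // clobber part: the condition flags are modified
  csdb();                             // kernel macro: speculation barrier (CSDB)
  return mask;                        // all ones if idx < sz, otherwise 0
}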
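The cmp/sbc pair builds the mask without a branch: cmp clears the carry flag when idx < sz (a borrow occurs), so sbc computes 0 - 0 - 1, i.e. all ones; otherwise carry is set and the result is 0. The hedged sketch below reproduces that logic outside the kernel under hypothetical names. It leaves out csdb() (a kernel macro), so it is not a speculation barrier by itself; on other hosts a C fallback computes the same value.

```c
/* Sketch: branch-free mask, ~0UL when idx < sz, otherwise 0. */
static inline unsigned long index_mask(unsigned long idx, unsigned long sz)
{
    unsigned long mask;
#if defined(__aarch64__)
    __asm__ __volatile__(
        "cmp %1, %2\n\t"          /* C = 0 (borrow) iff idx < sz */
        "sbc %0, xzr, xzr"        /* mask = 0 - 0 - !C */
        : "=r"(mask)
        : "r"(idx), "Ir"(sz)
        : "cc");
#else
    mask = 0UL - (unsigned long)(idx < sz);   /* same result in plain C */
#endif
    return mask;
}

/* Hypothetical usage: out-of-range indices collapse to 0 before the load. */
static int table_read(const int *table, unsigned long n, unsigned long idx)
{
    return table[idx & index_mask(idx, n)];
}
```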

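Example 2 (save the interrupt mask and disable IRQs):

static inline unsigned long arch_local_irq_save(void)
{
  unsigned long flags;
  asm volatile(
  "     mrs   %0, daif\n"             // read the current DAIF mask bits into flags
  "     msr   daifset, #2\n"          // set the I bit: mask IRQs
  : "=r" (flags)
  :
  : "memory"                          // compiler barrier: memory accesses are not moved across
  );
  return flags;
}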

1.2.1 Modifiers

| Modifier | Description |
| --- | --- |
| = | Means that this operand is written to by this instruction: the previous value is discarded and replaced by new data. (Write-only.) |
| + | Means that this operand is both read and written by the instruction. (Read-write.) |
| & | Means (in a particular alternative) that this operand is an earlyclobber operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address. (It is written before all inputs have been consumed, so it must not share a register with any input.) |
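A short sketch of the three modifiers, with hypothetical function names: "+r" for a value the asm both reads and updates, "=r" for a plain result, and "=&r" for a result written by the first instruction while the inputs are still needed by the second. Without the &, the compiler could legally give s the same register as a, and sub would then read an already-overwritten value.

```c
#include <stdint.h>

static inline uint64_t inc_in_place(uint64_t x)
{
#if defined(__aarch64__)
    __asm__("add %0, %0, #1" : "+r"(x));   /* '+': x is read and written */
#else
    x += 1;
#endif
    return x;
}

static inline void sum_diff(uint64_t a, uint64_t b, uint64_t *sum, uint64_t *diff)
{
    uint64_t s, d;
#if defined(__aarch64__)
    __asm__("add %0, %2, %3\n\t"   /* s is written here ...                */
            "sub %1, %2, %3"       /* ... while a and b are still read     */
            : "=&r"(s),            /* '&': must not share an input register */
              "=r"(d)              /* '=': written last, no '&' needed     */
            : "r"(a), "r"(b));
#else
    s = a + b;
    d = a - b;
#endif
    *sum = s;
    *diff = d;
}
```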

1.2.2 Constraints

| Constraint | Description |
| --- | --- |
| p | An operand that is a valid memory address is allowed. This is for "load address" and "push address" instructions. (Memory address.) |
| m | A memory operand is allowed, with any kind of address that the machine supports in general. Note that the letter used for the general memory constraint can be re-defined by a back end using the TARGET_MEM_CONSTRAINT macro. (Memory operand.) |
| o | A memory operand is allowed, but only if the address is offsettable. This means that adding a small integer (actually, the width in bytes of the operand, as determined by its machine mode) may be added to the address and the result is also a valid memory address. (Offsettable memory operand, base-plus-offset addressing.) |
| r | A register operand is allowed provided that it is in a general register. (General-purpose register.) |
| i | An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time or later. (Immediate.) |
| V | A memory operand that is not offsettable. In other words, anything that would fit the 'm' constraint but not the 'o' constraint. (Non-offsettable memory operand.) |
| n | An immediate integer operand with a known numeric value is allowed. Many systems cannot support assembly-time constants for operands less than a word wide. Constraints for these operands should use 'n' rather than 'i'. (Immediate with a known numeric value.) |
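A minimal sketch combining the generic memory constraint "m" with a compile-time immediate "n"; SHIFT_BITS and the function name are hypothetical. With "m", %1 expands to a complete addressing mode such as [x0], so the template writes ldr %0, %1 without extra brackets.

```c
#include <stdint.h>

#define SHIFT_BITS 4   /* hypothetical compile-time constant */

static inline uint64_t load_and_shift(const uint64_t *p)
{
    uint64_t r;
#if defined(__aarch64__)
    __asm__("ldr %0, %1\n\t"       /* %1 expands to an address such as [x0] */
            "lsl %0, %0, %2"       /* %2 is the literal immediate 4 */
            : "=r"(r)
            : "m"(*p), "n"(SHIFT_BITS));
#else
    r = *p << SHIFT_BITS;          /* portable fallback */
#endif
    return r;
}
```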

In addition to the generic constraints, AArch64 has its own machine-specific constraints:

| Constraint | Description |
| --- | --- |
| k | The stack pointer register (SP) |
| w | Floating point register, Advanced SIMD vector register or SVE vector register |
| x | Like w, but restricted to registers 0 to 15 inclusive. |
| y | Like w, but restricted to registers 0 to 7 inclusive. |
| Upl | One of the low eight SVE predicate registers (P0 to P7) |
| Upa | Any of the SVE predicate registers (P0 to P15) |
| I | Integer constant that is valid as an immediate operand in an ADD instruction |
| J | Integer constant that is valid as an immediate operand in a SUB instruction (once negated) |
| K | Integer constant that can be used with a 32-bit logical instruction |
| L | Integer constant that can be used with a 64-bit logical instruction |
| M | Integer constant that is valid as an immediate operand in a 32-bit MOV pseudo instruction. The MOV may be assembled to one of several different machine instructions depending on the value |
| N | Integer constant that is valid as an immediate operand in a 64-bit MOV pseudo instruction |
| S | An absolute symbolic address or a label reference |
| Y | Floating point constant zero |
| Z | Integer constant zero |
| Ush | The high part (bits 12 and upwards) of the pc-relative address of a symbol within 4GB of the instruction |
| Q | A memory address which uses a single base register with no offset |
| Ump | A memory address suitable for a load/store pair instruction in SI, DI, SF and DF modes |
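The sketch below (hypothetical names) uses three of these AArch64 constraints: "Q" for a memory operand addressed by a single base register, "rZ" so that a constant zero can be emitted as xzr through the %x operand modifier, and "rI" so that an ADD-compatible constant can be encoded as an immediate instead of occupying a register. Other hosts use plain C fallbacks.

```c
#include <stdint.h>

static inline void store_u64(uint64_t *p, uint64_t v)
{
#if defined(__aarch64__)
    /* "Q": %0 prints as [xN]; "rZ" with %x1 prints xzr when v is constant 0 */
    __asm__ __volatile__("str %x1, %0" : "=Q"(*p) : "rZ"(v) : "memory");
#else
    *p = v;
#endif
}

static inline uint64_t add_const(uint64_t a, uint64_t b)
{
    uint64_t r;
#if defined(__aarch64__)
    /* "rI": b becomes an immediate if it is a valid ADD constant */
    __asm__("add %0, %1, %2" : "=r"(r) : "r"(a), "rI"(b));
#else
    r = a + b;
#endif
    return r;
}
```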

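Example 3: analyzing the inline assembly of an atomic_add function

void my_atomic_add(unsigned long val, void *p)
{
  unsigned long tmp;
  int result;
  asm volatile (
  "1:     ldxr  %0, %2\n"             // exclusive load of *p into tmp ("Q" prints as [xN])
  "       add   %0, %0, %3\n"         // tmp += val
  "       stxr  %w1, %0, %2\n"        // try to store tmp; result = 0 on success
  "       cbnz  %w1, 1b\n"            // retry if the exclusive store failed
  : "=&r" (tmp), "=&r" (result), "+Q" (*(unsigned long *)p)
  : "r" (val)
  : "cc", "memory"
  );
}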

Inline assembly also supports symbolic (named) operands:

int add(int i, int j) {
  int ret = 0;
  asm volatile (
  "     add %w[result], %w[input_i], %w[input_j]\n"   // %w: 32-bit w registers for int
  : [result] "=r" (ret)
  : [input_i] "r" (i), [input_j] "r" (j)
  :
  );
  return ret;
}

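Experiment: implementing memcpy with inline assembly.

void test_memcpy(void)
{
    unsigned long src_addr = 0x80000, dest_addr = 0x200000;
    unsigned long sz = 32;                  // must be a non-zero multiple of 8

    asm volatile (
    "       mov x6, %x[_src_addr]\n"        // x6: source cursor
    "       mov x7, %x[_dest_addr]\n"       // x7: destination cursor
    "       add x8, x6, %x[_sz]\n"          // x8: source end address
    "   1:  ldr x9, [x6], #8\n"             // load 8 bytes, post-increment x6
    "       str x9, [x7], #8\n"             // store 8 bytes, post-increment x7
    "       cmp x6, x8\n"
    "       b.cc 1b\n"                      // loop while x6 < x8 (unsigned)
    :
    : [_src_addr] "r" (src_addr), [_dest_addr] "r" (dest_addr), [_sz] "r" (sz)
    : "cc", "memory", "x6", "x7", "x8", "x9"   // scratch registers must be declared clobbered
    );
}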
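test_memcpy only works on a bare-metal target where those fixed addresses are mapped. Here is a hedged, pointer-based variant of the same loop (hypothetical name asm_memcpy8) that lets the compiler choose the registers through named operands instead of hard-coding x6 to x9, so no register clobbers are needed. It assumes n is a multiple of 8; other hosts fall back to memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies n bytes, 8 at a time; n is assumed to be a multiple of 8. */
static void asm_memcpy8(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return;
#if defined(__aarch64__)
    uint64_t tmp;
    const void *end;
    __asm__ __volatile__(
        "   add  %[end], %[src], %[n]\n"
        "1: ldr  %[tmp], [%[src]], #8\n"   /* load, post-increment src */
        "   str  %[tmp], [%[dst]], #8\n"   /* store, post-increment dst */
        "   cmp  %[src], %[end]\n"
        "   b.cc 1b\n"                     /* loop while src < end */
        : [src] "+r"(src), [dst] "+r"(dst),
          [tmp] "=&r"(tmp), [end] "=&r"(end)
        : [n] "r"(n)
        : "cc", "memory");
#else
    memcpy(dst, src, n);
#endif
}
```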

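Points to note:

  • Inline assembly is hard to debug at source level in GDB: the whole asm statement maps to a single source line, so you need instruction-level stepping (stepi). It is often easier to write and debug the routine as standalone assembly first and then port it into C.

  • Operand attributes are the most common source of mistakes in inline assembly.

  • Do not get the modifiers of the output and input parts wrong, or the program may run away. For example, if a value in register x0 is declared only in the input part but the assembly modifies it (say, with an add instruction), the compiler still assumes x0 holds the original value afterwards and the program goes astray; such an operand must be declared "+r" in the output part.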
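To illustrate the last point, the hedged sketch below (hypothetical function name) counts a register down to zero. Because subs rewrites n inside the assembly, n is declared "+r" in the output part; declaring it only as an "r" input would let the compiler keep treating that register as if it still held the original n after the asm statement.

```c
#include <stdint.h>

static uint64_t count_iterations(uint64_t n)
{
    uint64_t iters = 0;
#if defined(__aarch64__)
    __asm__ __volatile__(
        "   cbz  %[n], 2f\n"            /* nothing to do for n == 0 */
        "1: add  %[it], %[it], #1\n"
        "   subs %[n], %[n], #1\n"      /* modifies n, so n must be "+r" */
        "   b.ne 1b\n"
        "2:\n"
        : [n] "+r"(n), [it] "+r"(iters)
        :
        : "cc");
#else
    while (n--)
        iters++;
#endif
    return iters;
}
```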

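Another example: memset

void test_memset(void)
{
    unsigned long addr = 0x80000;
    unsigned long sz = 16;                  // must be a non-zero multiple of 16
    unsigned long val = 0;                  // 64-bit fill pattern
    unsigned long end;

    asm volatile (
    "       add %x[_end], %x[_addr], %x[_sz]\n"            // end = addr + sz
    "   1:  stp %x[_val], %x[_val], [%x[_addr]], #16\n"    // store 16 bytes, post-increment addr
    "       cmp %x[_addr], %x[_end]\n"
    "       b.ne 1b\n"                                     // loop until addr reaches end
    : [_addr] "+r" (addr), [_end] "=&r" (end)
    : [_sz] "r" (sz), [_val] "r" (val)
    : "cc", "memory"
    );
}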
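As with memcpy, a pointer-based variant of the fixed-address memset loop can be exercised outside a bare-metal target. In this sketch (hypothetical name asm_memset16; n is assumed to be a multiple of 16), end must be "=&r": it is written by the first instruction while the input v is still read inside the loop, so it must not share v's register.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills n bytes with a 64-bit pattern; n is assumed to be a multiple of 16. */
static void asm_memset16(void *dst, uint64_t v, size_t n)
{
    if (n == 0)
        return;
#if defined(__aarch64__)
    void *end;
    __asm__ __volatile__(
        "   add %[end], %[dst], %[n]\n"
        "1: stp %[v], %[v], [%[dst]], #16\n"   /* store 16 bytes, post-increment dst */
        "   cmp %[dst], %[end]\n"
        "   b.ne 1b\n"
        : [dst] "+r"(dst), [end] "=&r"(end)
        : [n] "r"(n), [v] "r"(v)
        : "cc", "memory");
#else
    uint64_t *p = dst;
    for (size_t i = 0; i < n / 8; i++)
        p[i] = v;
#endif
}
```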

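2. Macro Functions

Inline assembly can also be combined with C macros. The ## operator pastes tokens to build the function name, and the # operator stringifies a macro argument, so "#asm_ops" splices the mnemonic into the double-quoted assembly template (adjacent string literals are concatenated).

#define MY_OPS(ops, asm_ops)                                        \
static inline void my_asm_##ops(unsigned long mask, void *p) {      \
    unsigned long tmp;                                              \
    asm volatile (                                                  \
        "       ldr %0, [%1]\n"                                     \
        "       "#asm_ops" %0, %0, %2\n"                            \
        "       str %0, [%1]\n"                                     \
        : "=&r" (tmp)                                               \
        : "r" (p), "r" (mask)                                       \
        : "memory"                                                  \
    );                                                              \
}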

MY_OPS(or, orr)
MY_OPS(and, and)
MY_OPS(andnot, bic)

After macro expansion, this produces three functions:

  • my_asm_or

  • my_asm_and

  • my_asm_andnot

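The three functions differ only in the instruction substituted at the "#asm_ops" position (orr, and, bic). Note that the ldr/op/str sequence is a plain read-modify-write and is not atomic.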
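To see what the preprocessor actually hands to the compiler, the sketch below (hypothetical macro and function names) reuses the same # and ## mechanics outside of any asm statement: #asm_ops turns orr into the string literal "orr", adjacent literals merge into one template, and ## pastes together the function name. It is portable, so the generated template can be checked with strcmp.

```c
#include <string.h>

/* Builds an asm template string the way MY_OPS does, as a plain string. */
#define OPS_TEMPLATE(asm_ops) \
    "ldr %0, [%1]\n\t" #asm_ops " %0, %0, %2\n\tstr %0, [%1]\n"

/* Token pasting: OPS_FN(or, orr) defines a function named template_or. */
#define OPS_FN(ops, asm_ops) \
    static const char *template_##ops(void) { return OPS_TEMPLATE(asm_ops); }

OPS_FN(or, orr)
OPS_FN(andnot, bic)
```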

Ref

GCC - 6.47.3.1 Simple Constraints