In RISC-V assembly, what does an underscore before a register name mean?

Something like lw sp, (_sp)

Becomes

auipc sp,0x3
lw sp, -1804(sp)

Whereas,

lw sp, (sp)

becomes

lw sp,0(sp)

It's not clear where the offset comes from in the first example, the one that uses _sp. What does that mean? I looked in the RISC-V Assembly Programmer's Manual and I don't see it there either. Is an underscore before a register a common assembly syntax that I'm missing?

I'm working on a simple bootloader. It's a modified version of ultraembedded's trivial bootloader example (Ultraembedded Trivial bootloader), and I'm trying to make sense of some of the assembly code in it.

The original source code looks like this:

start:
  # Setup stack pointer
  lui sp, %hi(_sp)
  add sp, sp, %lo(_sp)

  # Setup IRQ vector
  lui t0, %hi(isr_vector)
  add t0, t0, %lo(isr_vector)
  csrw mtvec, t0

  # t0 = _bss_start
  lui t0,%hi(_bss_start)
  add t0,t0,%lo(_bss_start)

  # t1 = _end
  lui t1,%hi(_end)
  add t1,t1,%lo(_end)

I'm trying to understand how the stack pointer is set up. If I understand this correctly, lui is a pseudo-instruction that will expand to auipc and add. I can't get this original code to compile, so I can't verify it, but other code does this.

Edit: I'm using the riscv-gnu-toolchain built with multi-lib support. I'm compiling for 64-bit with riscv64-unknown-linux-gnu-gcc (compiled with glibc support).
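
For reference, the `%hi`/`%lo` pair in the original code splits one 32-bit address into a 20-bit upper part for `lui` and a signed 12-bit lower part for `add`. Here is a minimal Python sketch of that arithmetic. The helper names `hi20`, `lo12` and `rebuild32` are made up for illustration, and `0x80002738` is the address that the disassembly later in this question shows for `_sp`. The `+ 0x800` rounding is there because the low part is sign-extended when it is added.

```python
# Sketch of the %hi/%lo split; helper names are hypothetical, not toolchain APIs.

def hi20(addr):
    """Upper 20 bits, rounded up when the signed low part will be negative."""
    return ((addr + 0x800) >> 12) & 0xFFFFF

def lo12(addr):
    """Low 12 bits, read as a signed value in [-2048, 2047]."""
    low = addr & 0xFFF
    return low - 0x1000 if low >= 0x800 else low

def rebuild32(addr):
    """Value produced by lui %hi + add %lo, truncated to 32 bits."""
    return ((hi20(addr) << 12) + lo12(addr)) & 0xFFFFFFFF

# 0x80002738 is _sp in this question; 0x80001FFC is an arbitrary address
# chosen to show a negative low part.
for addr in (0x80002738, 0x80001FFC):
    print(hex(addr), hex(hi20(addr)), lo12(addr), hex(rebuild32(addr)))
```

In both cases the rebuilt value equals the original address when only the low 32 bits are considered.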

Edit 2: After a bit of trial and error, I wrote the following modified start routine.

start:
  # Setup stack pointer
  lla sp, (_sp)

  # Setup IRQ vector
  lla t0, isr_vector
  csrw mtvec, t0

  # t0 = _bss_start
  lla t0, _bss_start

  # t1 = _end
  lw t1, _end

Which compiles to:

00000000800000f4 <start>:
800000f4:   00002117                auipc   sp,0x2
800000f8:   64410113                addi    sp,sp,1604 # 80002738 <_sp>
800000fc:   00000297                auipc   t0,0x0
80000100:   f4428293                addi    t0,t0,-188 # 80000040 <isr_vector>
80000104:   30529073                csrw    mtvec,t0
80000108:   00001297                auipc   t0,0x1
8000010c:   5a828293                addi    t0,t0,1448 # 800016b0 <load_reservation>
80000110:   00002317                auipc   t1,0x2
80000114:   63832303                lw      t1,1592(t1) # 80002748 <_end>

This seems to work for this part of the task. I'm not sure I understand why %hi and %lo don't work here, and maybe that's not the problem.
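
The offsets in the listing above can be checked by hand: `auipc` adds its 20-bit immediate, shifted left by 12, to the address of the `auipc` instruction itself, and the following `addi` or `lw` adds a signed 12-bit offset. The Python sketch below redoes that arithmetic for the four pairs in the disassembly. The function name `auipc_target` is made up for illustration. The same rule would explain the `auipc`/`lw sp, -1804(sp)` pair at the top of the question, although the pc of that instruction is not shown there.

```python
# Sketch: recompute the addresses formed by the auipc pairs in the listing.
# auipc_target is a hypothetical helper, not part of any toolchain.

MASK64 = (1 << 64) - 1

def auipc_target(pc, upper20, offset12):
    """Address formed by `auipc rd, upper20` followed by a signed 12-bit offset."""
    if upper20 & 0x80000:          # sign-extend the 20-bit immediate
        upper20 -= 0x100000
    return (pc + (upper20 << 12) + offset12) & MASK64

# (pc of the auipc, auipc immediate, addi/lw offset, symbol printed by objdump)
pairs = [
    (0x800000F4, 0x2, 1604, "_sp"),
    (0x800000FC, 0x0, -188, "isr_vector"),
    (0x80000108, 0x1, 1448, "load_reservation"),
    (0x80000110, 0x2, 1592, "_end"),
]

for pc, upper, offset, name in pairs:
    print(name, hex(auipc_target(pc, upper, offset)))
```

Each computed address matches the one objdump prints in the comment column, for example `0x80002738` for `_sp`.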

  • `_sp` is not a register, it's presumably a symbol defined elsewhere. Just like all the others such as `_bss_start` and `_end`. In fact it's defined in [custom.ld line 48](https://github.com/ultraembedded/riscv-linux-boot/blob/master/custom.ld#L48). What the code is doing is just initializing the `sp` register so it points to the `_sp` address which is at the end of the area allocated for the stack in the linker script. – Jester Aug 01 '22 at 23:35
  • No, `lui` is not a pseudo, it is an actual instruction. [See also this answer](https://stackoverflow.com/a/59546567/547981). – Jester Aug 01 '22 at 23:41
  • You're mixing up `lui` with the `li` and `la` pseudo-instructions, which will expand to the lui/add pairs you see here. (Or auipc in position-independent code.) – Peter Cordes Aug 02 '22 at 00:42
  • Assembly language is specific to the tools, not the target. Whose tools are you using? Please tag. – old_timer Aug 02 '22 at 02:36
  • Compilers have been using underscores for decades as a way to indicate a level of external. A file with hello() and world() might use the hello and world labels directly, but for other files that make an external call to hello, the compiler would look for the label _hello as a quick way to create a new label. If you do it systematically (end to end through the toolchain), it all just fits. Then, when you look at the assembly language level, you will see these underscores. It started long, long before RISC-V was even dreamed of... – old_timer Aug 02 '22 at 13:42
  • ...possibly even before MIPS, the ancestor of RISC-V. You will see this in many tools for many targets. This does not mean all toolchains do this, or do it all the time... – old_timer Aug 02 '22 at 13:47
  • @Jester thanks for pointing out the definition of `_sp`; I somehow missed that. @old_timer The tool is riscv-gnu-toolchain, 64-bit, compiled with multi-lib and using glibc. Where would I find the tool-specific assembly information, like `%hi` and `%lo`? As it stands, there seem to be problems with this during compilation. With the original code (`lui sp, %hi(_sp)` followed by `add sp, sp, %lo(_sp)`) I get: /mnt/data/minimal_bootloader/boot.S:112:(.text+0xf4): relocation truncated to fit: R_RISCV_HI20 against symbol `_sp' defined in .bss section ... – FirmwareRootkits Aug 03 '22 at 02:12
  • For debugging I look at `make --just-print`: `riscv64-unknown-elf-gcc -Ttext 0x80000000 -O0 -g -Wall -I. -DCONFIG_KERNEL_EMBEDDED -DPAYLOAD_BINARY=\"vmlinux.bin\" -DDTB_BINARY=\"config.dtb\" -Wno-unused-variable -mcmodel=medany -c assert.c -o /mnt/data/minimal_bootloader/obj/assert.o` – FirmwareRootkits Aug 03 '22 at 02:24
  • `echo "# LD riscv-linux-boot.elf"` `riscv64-unknown-elf-gcc /mnt/data/minimal_bootloader/obj/boot.o /mnt/data/minimal_bootloader/obj/payload.o /mnt/data/minimal_bootloader/obj/emulation.o /mnt/data/minimal_bootloader/obj/syscalls.o /mnt/data/minimal_bootloader/obj/exception.o /mnt/data/minimal_bootloader/obj/sbi.o /mnt/data/minimal_bootloader/obj/main.o /mnt/data/minimal_bootloader/obj/serial.o /mnt/data/minimal_bootloader/obj/assert.o -o riscv-linux-boot.elf -nostartfiles -nodefaultlibs -nostdlib -lgcc -T./custom.ld -Wl,--defsym=BASE_ADDRESS=0x80000000` – FirmwareRootkits Aug 03 '22 at 02:24
  • This code is written to work for 32-bit absolute addresses. `0x80000000` is in the top half of the low 32 bits, I guess as a high-half kernel? So I'm not sure why a `R_RISCV_HI20` relocation couldn't hold the high part of that 64-bit address which is all zero above bit #31. Except `-Ttext` *only* affects `.text`, not `.data` or `.bss`, and those might default to somewhere else, outside the low 32 bits for your 64-bit RISC-V build. Might want to use a linker script. – Peter Cordes Aug 03 '22 at 03:10
  • Or wait, you're getting a relocation truncated message even when you *assemble*, before you can even link. (The error or warning mentions the `.S` file, not a `.o`). IDK, maybe you can't use 32-bit `la` style address-generation in RV64; I don't have experience with it. But that's pretty much a separate question from the fairly trivial question of `_sp` being a symbol your code happens to use. – Peter Cordes Aug 03 '22 at 03:11
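
One possible explanation for the `R_RISCV_HI20` truncation, offered as an assumption rather than something confirmed in this thread: on RV64, `lui` sign-extends its 32-bit result to 64 bits, so a `lui`/`add` pair cannot produce an address in the range `0x80000000` to `0xFFFFFFFF` with the upper 32 bits clear. The Python sketch below models that behaviour using the `_sp` address from the question's disassembly. The function names `lui_rv64` and `addi_rv64` are made up for illustration.

```python
# Sketch: model RV64 lui/addi to show what lui %hi(_sp) + add %lo(_sp)
# would leave in the register. Function names are hypothetical.

MASK64 = (1 << 64) - 1

def lui_rv64(imm20):
    """lui on RV64: imm20 << 12, sign-extended from bit 31 to 64 bits."""
    value = (imm20 & 0xFFFFF) << 12
    if value & 0x80000000:
        value -= 1 << 32
    return value & MASK64

def addi_rv64(reg, imm12):
    """addi on RV64 with an already sign-interpreted 12-bit immediate."""
    return (reg + imm12) & MASK64

# %hi(0x80002738) is 0x80002 and %lo(0x80002738) is 0x738.
sp = addi_rv64(lui_rv64(0x80002), 0x738)
print(hex(sp))
```

Under this model the register ends up holding `0xffffffff80002738` instead of `0x80002738`. The pc-relative `auipc` sequences that `lla` expands to do not have this limitation, which would be consistent with the modified start routine working.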
