VBCC Optimizing C-compiler now supports NES

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Sun Nov 01, 2020 7:48 pm

Looks like I'm hitting some internal error.

Code: Select all

error 3005: reloc type 2, size 8, mask 0xffffffff (symbol  __ppu_OutputBuffer 1 + 0xffffffff) not supported

vbc
Posts: 67
Joined: Sun Jun 21, 2020 5:03 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by vbc » Tue Nov 03, 2020 11:01 am

yaros wrote:
Sun Nov 01, 2020 7:48 pm
Looks like I'm hitting some internal error.

Code: Select all

error 3005: reloc type 2, size 8, mask 0xffffffff (symbol  __ppu_OutputBuffer 1 + 0xffffffff) not supported
Looks like an unsupported relocation is generated. Do you have some code to reproduce this?

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Tue Nov 03, 2020 11:49 am

vbc wrote:
Tue Nov 03, 2020 11:01 am
Looks like an unsupported relocation is generated. Do you have some code to reproduce this?
The issue was that I was using a label that didn't exist, my bad. Here is code to reproduce the issue.

Code: Select all

void foo() {
    __asm("    jmp .1");
}

int main() {
    foo();
}

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Fri Nov 13, 2020 6:59 pm

Speaking of optimizations, is there a way to hint to the compiler that an array is not that big, so that it shifts the index in a register instead of adding an offset to the base address?

I have the following array:

Code: Select all

extern const unsigned char* levels_Collisions_x1[];
And read it like this:

Code: Select all

ptr = levels_Collisions_x1[level];
The compiler precalculates the offset from the base address, and does it only once, which is really nice.

Code: Select all

  LDA #$00
  STA r17                  
  LDY #$00                 
  LDA (sp),Y ; 'unsigned char level' parameter
  STA r16                  
  ASL r16                  
  ROL r17                  
Then, when I have lots of data[level] reads, it reuses that offset.

Code: Select all

  LDA #$62       ; AD62 - base address of the array          
  CLC                      
  ADC r16                  
  STA r2                   
  LDA #$AD                 
  ADC r17                  
  STA r3                   
  LDY #$01                 
  LDA (r2),Y               
  STA ptr+1                 
  DEY                      
  LDA (r2),Y               
  STA ptr+0
It is really nice, but I know that the array won't be longer than 128 elements. If the compiler knows that it's ROM data (const type const) and that its known size is less than 128, can it instead do the following?

Code: Select all

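  LDY #$00
  LDA (sp),Y     ; 'unsigned char level' parameter
  ASL A          ; level * 2, fits in 8 bits when level < 128
  TAY
  LDA #$62       ; AD62 - base address of the array
  STA r2
  LDA #$AD
  STA r3
  LDA (r2),Y
  STA ptr+0
  INY
  LDA (r2),Y
  STA ptr+1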
edit:

I also noticed that disabling stdlib also breaks things like division. I expected it to keep working...

vbc
Posts: 67
Joined: Sun Jun 21, 2020 5:03 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by vbc » Tue Nov 17, 2020 4:41 pm

yaros wrote:
Fri Nov 13, 2020 6:59 pm
Speaking of optimizations, is there a way to hint to the compiler that an array is not that big, so that it shifts the index in a register instead of adding an offset to the base address?
Currently not. The 6502 backend of vbcc tries to use the indexed addressing modes, but at the moment it is only able to do this in some cases. In your case

Code: Select all

extern const unsigned char* levels_Collisions_x1[];
the array elements are pointer sized, causing the index to be multiplied by two. To optimize this case, vbcc would have to know that this multiplication will fit into 8 bits. I did think about adding range analysis to vbcc which could help such cases. Maybe I will add this in the future, but at the moment it is not possible.
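Until something like range analysis exists, one common 6502 idiom is to avoid the multiplication altogether by splitting the 16-bit table into separate low-byte and high-byte tables, so the index is used unscaled. The following is only a sketch under assumptions: collision_blob, level_off_lo, level_off_hi, and level_collisions are hypothetical names, the data is made up, and the thread does not confirm what code vbcc emits for this pattern.

```c
#include <stdint.h>

/* Hypothetical data: three collision maps stored back to back in one blob. */
static const uint8_t collision_blob[] = {
    /* level 0 */ 1, 2, 3,
    /* level 1 */ 4, 5,
    /* level 2 */ 6, 7, 8, 9
};

/* Low and high bytes of each level's offset into the blob, kept in two
   separate byte tables so the index never has to be multiplied by two. */
static const uint8_t level_off_lo[] = { 0, 3, 5 };
static const uint8_t level_off_hi[] = { 0, 0, 0 };

/* Returns a pointer to the first byte of the given level's collision map. */
const uint8_t *level_collisions(uint8_t level)
{
    uint16_t off = (uint16_t)(level_off_lo[level] | (level_off_hi[level] << 8));
    return collision_blob + off;
}
```

Because each table has one-byte elements, an unsigned char index addresses up to 256 entries without any shift, which is the property the original request was after.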
I also noticed that disabling stdlib also breaks things like division. I expected it to keep working...
Can you add the standard library as the last linker library? In this case only missing functions will be pulled in from it.

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Wed Nov 18, 2020 11:48 am

vbc wrote:
Tue Nov 17, 2020 4:41 pm
the array elements are pointer sized, causing the index to be multiplied by two. To optimize this case, vbcc would have to know that this multiplication will fit into 8 bits. I did think about adding range analysis to vbcc which could help such cases. Maybe I will add this in the future, but at the moment it is not possible.
It's just a suggestion, but that would be really nice. I'll probably have to use assembler in hot paths for now, but we'll see.
vbc wrote:
Tue Nov 17, 2020 4:41 pm
Can you add the standard library as the last linker library? In this case only missing functions will be pulled in from it.
Yes, linking with libvc.a worked, thank you.

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Tue Nov 24, 2020 1:30 pm

I also noticed that vbcc insists on copying a pointer to r0 even when the pointer is already in the zero page.

The following code (-O2)

Code: Select all

tile = ptr[index];
Produces this assembly

Code: Select all

  LDA _ptr+1
  STA r1                   
  LDA _ptr+0                 
  STA r0                   
  LDY r2                   
  LDA (r0),Y      

whereas this would be sufficient:

Code: Select all

             
  LDY r2                   
  LDA (_ptr),Y      
edit:

Another thing: here tile is an unsigned char, so it should never overflow the Y register.

Code: Select all

__zpage uint8 const* ts01;
...
unsigned char tile;
...
value = ts01[tile];
The generated code doesn't actually use the Y register for indexing (it loads LDY #0) but instead keeps the value of 'tile' in r31 and adds it to the r0/r1 pair. I'm also not sure why it keeps a copy of tile in r30 and restores it after use.

Code: Select all

  LDA _ts01                
  CLC                      
  ADC r31                  
  STA r0                   
  LDA $32                  
  ADC #$00                 
  STA r1                   
  LDA r30                  
  STA r31                  
  LDY #$00                 
  LDA (r0),Y               
 

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Sat Nov 28, 2020 5:01 pm

Found another bug with -O3. The following function gets inlined without the value being assigned to the `data` argument:

Code: Select all

void ppu_LoadChr(__reg("r0/r1")const uint8 * data) {
    __asm(
    "   ldy #0\n"
    "   sty PPUADDR\n" 
    "   sty PPUADDR\n"
    "   ldx #32\n"
    ".1:\n"
    "   lda (r0),y\n"
    "   sta PPUDATA\n"
    "   iny\n"
    "   bne .1\n"
    "   inc r1\n"
    "   dex\n"
    "   bne .1\n"
    );
}
Output

Code: Select all

  LDY #$00                 
  STY PpuAddr_2006         
  STY PpuAddr_2006         
  LDX #$20                 
  LDA (r0),Y               
  STA PpuData_2007         
  INY                      
  BNE $AC28                
  INC r1                   
  DEX                      
  BNE $AC28                
edit:
Another problem I'm facing is that the linker refuses to place a variable in the given section.

Code: Select all

MEMORY
{
  out:     org=0x7FF0, len=0xffffff
  zero:    org=0, len=0x0100
  b0:      org=0x8000, len=0x8000
  stack_ram:     org=0x0100, len=0x0100
  oam_ram:     org=0x0200, len=0x0100
  ram:     org=0x0300, len=0x0500
}

NESMAPPER    = 7 ; /* mapper number */
NESPRG_BANKS = 2 ; /* number of 16K PRG banks, change to 2 for NROM256 */
NESCHR_BANKS = 0 ; /* number of 8K CHR banks */
NESMIRRORING = 1 ; /* 0 horizontal, 1 vertical, 8 four screen */

SECTIONS
{
  header: {BYTE(0x4e);BYTE(0x45);BYTE(0x53);BYTE(0x1a);
           BYTE(NESPRG_BANKS);
           BYTE(NESCHR_BANKS);
           BYTE(NESMIRRORING|(NESMAPPER<<4));
           BYTE(NESMAPPER&0xf0);
           LONG(0);
           LONG(0);
          } >out

  text:   {*(text)} >b0 AT>out
  .dtors: { *(.dtors) } >b0 AT>out
  .ctors: { *(.ctors) } >b0 AT>out
  rodata: {*(rodata)}  >b0 AT>out
  init:   {*(init)}  >b0 AT>out
  data:   {*(data)} >ram AT>out
  /* fill program bank */
  fill: { .=.+0x10000-6-ADDR(init)-SIZEOF(init)-SIZEOF(data);} >b0 AT>out
  vectors:{ *(vectors)} >b0 AT>out

  zpage (NOLOAD) : {*(zpage) *(zp1) *(zp2)} >zero
  bss (NOLOAD)   : {*(bss)} >ram
  oam (NOLOAD)   : {*(oam)} >oam_ram
  stack (NOLOAD) : {*(stack)} >stack_ram

  __DS = ADDR(data);
  __DE = ADDR(data) + SIZEOF(data);
  __DC = LOADADDR(data);

  __STACK = 0x800;

  ___heap = ADDR(bss) + SIZEOF(bss);
  ___heapend = __STACK;
}
And the C code

Code: Select all

#pragma section oam
uint8 _ppu_oam[0x100];
#pragma section default

#pragma section stack
uint8 ppu_vram[0x40];
#pragma section default
_ppu_oam is properly put at 0x200 but ppu_vram is still at 0x300 instead of 0x100
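For readers following the linker script, the header: section emits a 16-byte iNES header, and since the out region starts at 0x7FF0, those 16 bytes end exactly at 0x8000 where b0 begins. The sketch below reproduces the same byte arithmetic in plain C using the constants from the script; build_ines_header is a hypothetical helper written for illustration, not part of vbcc or vlink.

```c
#include <stdint.h>
#include <string.h>

/* Values copied from the linker script above. */
enum {
    NESMAPPER    = 7,
    NESPRG_BANKS = 2,
    NESCHR_BANKS = 0,
    NESMIRRORING = 1
};

/* Fills out[0..15] with the bytes that the header: section describes. */
void build_ines_header(uint8_t out[16])
{
    memset(out, 0, 16);                 /* covers the two LONG(0) entries */
    out[0] = 0x4e;                      /* 'N' */
    out[1] = 0x45;                      /* 'E' */
    out[2] = 0x53;                      /* 'S' */
    out[3] = 0x1a;
    out[4] = NESPRG_BANKS;
    out[5] = NESCHR_BANKS;
    out[6] = (uint8_t)(NESMIRRORING | (NESMAPPER << 4)); /* low mapper nibble + mirroring */
    out[7] = (uint8_t)(NESMAPPER & 0xf0);                /* high mapper nibble */
}
```

With mapper 7 and vertical mirroring, byte 6 works out to 0x71 and byte 7 to 0x00.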

vbc
Posts: 67
Joined: Sun Jun 21, 2020 5:03 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by vbc » Wed Dec 02, 2020 1:14 pm

yaros wrote:
Tue Nov 24, 2020 1:30 pm
I also noticed that vbcc insists on copying a pointer to r0 even when the pointer is already in the zero page.
You are correct. Currently pointer variables in zero page are not used directly with the indexed addressing mode. That is an oversight that should be easy to fix.

For your second example, I would need a small compilable example to look at. It is basically the same code as in your first example, so the differences are probably due to the code around it.
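A minimal test case for the second example might be shaped like the sketch below. It is an assumption-laden illustration rather than the code from the actual project: the ZPAGE macro and read_tile are hypothetical, the sketch assumes vbcc predefines __VBCC__, and there is no guarantee that this reduced form triggers the same code generation as the original surrounding code.

```c
typedef unsigned char uint8;

/* Use the zero-page attribute only when building with vbcc (assumed to
   predefine __VBCC__); other compilers see an ordinary global pointer. */
#ifdef __VBCC__
#define ZPAGE __zpage
#else
#define ZPAGE
#endif

/* Pointer variable as declared in the post. */
ZPAGE uint8 const *ts01;

/* The indexed read under discussion: an 8-bit index into a byte table. */
uint8 read_tile(uint8 tile)
{
    return ts01[tile];
}
```

Compiling just this file with the same optimization level and comparing the assembly against the full project would show whether the extra r30/r31 traffic comes from the surrounding code.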

vbc
Posts: 67
Joined: Sun Jun 21, 2020 5:03 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by vbc » Wed Dec 02, 2020 1:36 pm

yaros wrote:
Sat Nov 28, 2020 5:01 pm
Found another bug with -O3. The following function gets inlined without the value being assigned to the `data` argument:

Code: Select all

void ppu_LoadChr(__reg("r0/r1")const uint8 * data) {
    __asm(
      ...
     );
}
If you are using the __asm() statement like that, then there is no connection between the function argument and the asm code. It is not safe to assume that the registers will always have the correct value for the assembly code. To pass arguments to inline assembly, use the syntax for inline assembly functions:

Code: Select all

void ppu_LoadChr(__reg("r0/r1")const uint8 * data)  =
    "   ldy #0\n"
    "   sty PPUADDR\n" 
    "   sty PPUADDR\n"
    "   ldx #32\n"
    ".1:\n"
    "   lda (r0),y\n"
    "   sta PPUDATA\n"
    "   iny\n"
    "   bne .1\n"
    "   inc r1\n"
    "   dex\n"
    "   bne .1\n";
With this code, vbcc knows that the inline assembly gets a parameter in r0/r1 and it will work with inlining.
Another problem I'm facing is that the linker refuses to place a variable in the given section.
...
_ppu_oam is properly put at 0x200 but ppu_vram is still at 0x300 instead of 0x100
I tried the small code snippet and it was correctly located with your supplied linker file. There must be something going wrong somewhere else.

Please run vobjdump on the object file containing the variables and check if they have been placed correctly. You should see something like this:

Code: Select all

...
0000020a: GLOB 00000000         stack        0 _ppu_vram
00000226: GLOB 00000000           oam        0 __ppu_oam
...
If one is not placed correctly, check if there are any declarations that are not using the correct #pragma section (e.g. in a header file).

If they are correct in the object file, please run vlink with option -Mmapfile and send the resulting mapfile.

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Fri Dec 04, 2020 6:59 pm

vbc wrote:
Wed Dec 02, 2020 1:36 pm
If you are using the __asm() statement like that, then there is no connection between the function argument and the asm code. It is not safe to assume that the registers will always have the correct value for the assembly code. To pass arguments to inline assembly, use the syntax for inline assembly functions:

Code: Select all

void ppu_LoadChr(__reg("r0/r1")const uint8 * data)  =
    "   ldy #0\n"
    "   sty PPUADDR\n" 
    "   sty PPUADDR\n"
    "   ldx #32\n"
    ".1:\n"
    "   lda (r0),y\n"
    "   sta PPUDATA\n"
    "   iny\n"
    "   bne .1\n"
    "   inc r1\n"
    "   dex\n"
    "   bne .1\n";
With this code, vbcc knows that the inline assembly gets a parameter in r0/r1 and it will work with inlining.
I'm getting the following. Is exporting assembly functions in "non inline" mode from the object file not supported?
Error 21: vbcc0692.o (text+0x21): Reference to undefined symbol _ppu_SetAddr.

lib.c

Code: Select all

void ppu_SetAddr(__reg("r0/r1")uint16 addr) =
    "   lda PPUSTATUS\n"
    "   lda r1\n"
    "   sta PPUADDR\n"  
    "   lda r0\n"
    "   sta PPUADDR\n";
lib.h

Code: Select all

void ppu_SetAddr(uint16 addr);
vbc wrote:
Wed Dec 02, 2020 1:36 pm
If one is not placed correctly, check if there are any declarations that are not using the correct #pragma section (e.g. in a header file).

If they are correct in the object file, please run vlink with option -Mmapfile and send the resulting mapfile.
Thank you, you gave me a hint on how to fix it. The header file was simply:

Code: Select all

extern uint8 ppu_vram[0x40];
After changing it to the following, it works. I assumed "extern" would not allocate memory and that the pragmas wouldn't matter.

Code: Select all

#pragma section stack
extern uint8 ppu_vram[0x40];
#pragma section default

vbc
Posts: 67
Joined: Sun Jun 21, 2020 5:03 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by vbc » Sat Dec 05, 2020 7:47 am

yaros wrote:
Fri Dec 04, 2020 6:59 pm
I'm getting the following. Is exporting assembly functions in "non inline" mode from the object file not supported?
Error 21: vbcc0692.o (text+0x21): Reference to undefined symbol _ppu_SetAddr.

lib.c

Code: Select all

void ppu_SetAddr(__reg("r0/r1")uint16 addr) =
    "   lda PPUSTATUS\n"
    "   lda r1\n"
    "   sta PPUADDR\n"  
    "   lda r0\n"
    "   sta PPUADDR\n";
lib.h

Code: Select all

void ppu_SetAddr(uint16 addr);
Inline-assembly functions are not externally visible, as they are not real functions. If you want them externally visible, you can either add a stub:

Code: Select all

void ppu_SetAddr_asm(__reg("r0/r1")uint16 addr) =
    "   lda PPUSTATUS\n"
    "   lda r1\n"
    "   sta PPUADDR\n"  
    "   lda r0\n"
    "   sta PPUADDR\n";

void ppu_SetAddr(uint16 addr)
{
 ppu_SetAddr_asm(addr);
}
or just write the function label in assembly:

Code: Select all

static void dummy()
{
 __asm(
    "   global _ppu_SetAddr\n"
    "_ppu_SetAddr:\n"
    "   lda PPUSTATUS\n"
    "   lda r1\n"
    "   sta PPUADDR\n"  
    "   lda r0\n"
    "   sta PPUADDR\n"
    "   rts");
}
Of course you can also just write the entire function in an assembly source file.
vbc wrote:
Wed Dec 02, 2020 1:36 pm
If one is not placed correctly, check if there are any declarations that are not using the correct #pragma section (e.g. in a header file).

If they are correct in the object file, please run vlink with option -Mmapfile and send the resulting mapfile.
Thank you, you gave me a hint on how to fix it. The header file was simply:

Code: Select all

extern uint8 ppu_vram[0x40];
After changing it to the following, it works. I assumed "extern" would not allocate memory and that the pragmas wouldn't matter.

Code: Select all

#pragma section stack
extern uint8 ppu_vram[0x40];
#pragma section default
The current handling in vbcc is not too user-friendly, I guess. Maybe I will look into changing it or at least issue a warning if something is declared with and without #pragmas.
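In the meantime, one way to keep the declaration and the definition from drifting apart is to give both the same pragma wrapper. The sketch below shows the two pieces in a single listing for brevity; the file names in the comments are hypothetical, and the __VBCC__ guard (assumed to be predefined by vbcc) only exists so that other compilers skip the vbcc-specific pragma.

```c
typedef unsigned char uint8;

/* --- would live in a header, e.g. ppu_vram.h (hypothetical name) --- */
#ifdef __VBCC__
#pragma section stack
#endif
extern uint8 ppu_vram[0x40];
#ifdef __VBCC__
#pragma section default
#endif

/* --- would live in the defining source file, after including the header --- */
#ifdef __VBCC__
#pragma section stack
#endif
uint8 ppu_vram[0x40];
#ifdef __VBCC__
#pragma section default
#endif
```

Every translation unit that includes the header then sees the variable under the same section name as the definition.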

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Sun Dec 06, 2020 6:58 pm

vbc wrote:
Sat Dec 05, 2020 7:47 am
or just write the function label in assembly:

Code: Select all

static void dummy()
{
 __asm(
    "   global _ppu_SetAddr\n"
    "_ppu_SetAddr:\n"
    "   lda PPUSTATUS\n"
    "   lda r1\n"
    "   sta PPUADDR\n"  
    "   lda r0\n"
    "   sta PPUADDR\n"
    "   rts");
}
Of course you can also just write the entire function in an assembly source file.
Will writing the function directly in assembly and then declaring it in the header like `extern void ppu_SetAddr(__reg("r0/r1")uint16 addr)` break the optimizer, like you said happens when __asm is used, or will it be fine?

vbc
Posts: 67
Joined: Sun Jun 21, 2020 5:03 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by vbc » Mon Dec 07, 2020 11:39 am

yaros wrote:
Sun Dec 06, 2020 6:58 pm
Will writing the function directly in assembly and then declaring it in the header like `extern void ppu_SetAddr(__reg("r0/r1")uint16 addr)` break the optimizer, like you said happens when __asm is used, or will it be fine?
That will be fine.

yaros
Posts: 40
Joined: Mon Jul 27, 2020 1:14 pm

Re: VBCC Optimizing C-compiler now supports NES

Post by yaros » Mon Dec 21, 2020 10:31 am

Dr. Volker, is there any way to figure out what exactly vlink is complaining about with the following error?

Error 35: rpg.nes (text+0x2891): Calculated value 0x300 doesn't fit into relocation type R_ABS (offset=0, size=8, mask=0xffffffffffffffff).

edit: Ah, never mind. I allocated the variable in bss and imported it in C as zpage. It would be nice if the linker could report the symbol (or address) it fails to reference.
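The arithmetic behind the message is straightforward: code that treats the variable as zero page uses an 8-bit address field, while 0x300 (the start of the ram region in the linker script posted earlier) needs ten bits. The sketch below only illustrates that check; fits_unsigned is a hypothetical helper, and treating the field as unsigned is an assumption, not a description of how vlink implements it.

```c
#include <stdint.h>

/* Returns 1 if value can be stored in an unsigned field that is `bits` wide,
   0 otherwise. */
int fits_unsigned(uint32_t value, unsigned bits)
{
    if (bits >= 32) {
        return 1;               /* any 32-bit value fits */
    }
    return (value >> bits) == 0;
}
```

An address such as 0x300 fails the 8-bit check but passes a 16-bit one, which matches a variable that lives in bss while being referenced as zpage.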
