All times are UTC - 7 hours





Posted: Sun Mar 11, 2018 10:37 am
Joined: Wed Sep 20, 2017 1:14 pm
Posts: 24
Location: Green Hill Zone
Hi, I have a strange bug: my map is generated correctly in PAL mode but not in NTSC, where it looks like data is being written during the rendering period.

My question is: what can cause this bug?

If my update doesn't exceed the 20 vblank scanlines in PAL, it shouldn't exceed them in NTSC either, right?

PAL: [image]

NTSC: [image]

Thanks for your answers!


Posted: Sun Mar 11, 2018 10:46 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20562
Location: NE Indiana, USA (NTSC)
There are 70 vblank scanlines after NMI in PAL NES and only 20 in NTSC NES. In FCEUX, try setting a breakpoint on your NMI handler's RTI and looking at how long it takes to run.

(PAL famiclones are different, using a longer post-render period to appear more like an NTSC NES to programs made for NTSC.)


Posted: Sun Mar 11, 2018 11:30 am
Joined: Wed Sep 20, 2017 1:14 pm
Posts: 24
Location: Green Hill Zone
Oh god... It takes approximately 50 scanlines.

Well, I have to lighten my code.

Thanks!


Posted: Sun Mar 11, 2018 12:54 pm
Joined: Fri May 08, 2015 7:17 pm
Posts: 2262
Location: DIGDUG
C code is probably going to be too slow to be useful as a PPU update / NMI system.

Have you thought about Shiru's neslib?

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Posted: Sun Mar 11, 2018 2:08 pm
Joined: Mon May 27, 2013 9:40 am
Posts: 494
Even if you don't use Shiru's neslib, you should build your updates during rendering time in a buffer and then blast it to VRAM during vblank.

_________________
http://www.mojontwins.com


Posted: Sun Mar 11, 2018 3:06 pm
Joined: Sun Sep 19, 2004 10:59 pm
Posts: 1428
On PAL, it's possible to write a lot to the PPU during a single VBLANK - with 70 scanlines and 106+9/16 cycles per scanline, that gives you just under 7000 usable cycles (not counting sprite DMA and general interrupt handling), enough to write about 512 bytes using simple loops, more if you unroll them.

With NTSC, on the other hand, you've got under 1700 usable cycles - enough to transfer about 128 bytes, and even that's cutting things close if you want to update attribute bytes and the palette too.
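The arithmetic behind these figures can be checked with a small C sketch. It is only a rough model: the 341 PPU dots per scanline, the NTSC ratio of 3 dots per CPU cycle, and the 513-cycle sprite DMA cost are commonly cited values assumed here rather than taken from this thread, and the function names are made up for illustration.

```c
#include <assert.h>

/* Assumed constants (not stated in the thread). */
#define DOTS_PER_SCANLINE 341u  /* PPU dots per scanline */
#define OAM_DMA_CYCLES    513u  /* approximate cost of one sprite DMA */

/* PAL: one CPU cycle per 3.2 PPU dots, i.e. 5/16 cycle per dot,
   which gives 106+9/16 cycles per scanline. Result is rounded down. */
unsigned pal_vblank_cycles(unsigned scanlines)
{
    return scanlines * DOTS_PER_SCANLINE * 5u / 16u;
}

/* NTSC: one CPU cycle per 3 PPU dots (113+2/3 cycles per scanline). */
unsigned ntsc_vblank_cycles(unsigned scanlines)
{
    return scanlines * DOTS_PER_SCANLINE / 3u;
}

/* Cycles left for VRAM writes once sprite DMA has been paid for.
   Interrupt entry/exit overhead is not modelled. */
unsigned usable_cycles(unsigned vblank_cycles)
{
    assert(vblank_cycles >= OAM_DMA_CYCLES);
    return vblank_cycles - OAM_DMA_CYCLES;
}
```

Under these assumptions, 70 PAL scanlines come to 7459 cycles (6946 after sprite DMA) and 20 NTSC scanlines come to 2273 cycles (1760 after sprite DMA), before any interrupt handling overhead is subtracted.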

_________________
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.


Posted: Mon Mar 12, 2018 11:30 am
Joined: Wed Sep 20, 2017 1:14 pm
Posts: 24
Location: Green Hill Zone
dougeff wrote:
C code is probably going to be too slow to be useful as a PPU update / NMI system.

Have you thought about Shiru's neslib?


Why use Shiru's lib? To display a tile I use your technique: I have pointers to the memory addresses, so to write a tile I use this:
Code:
PPU_ADDRESS = addr_hi;  /* high byte of the tile's VRAM address */
PPU_ADDRESS = addr_lo;  /* low byte of the tile's VRAM address */
PPU_DATA = tile;        /* tile index */


So when I compile it turns into
Code:
lda #addr_hi   ; high byte of the tile's VRAM address
sta $2006

lda #addr_lo   ; low byte of the tile's VRAM address
sta $2006

lda #tile      ; tile index
sta $2007



I don't see how I would gain writing speed with Shiru's lib. But maybe you're talking about another kind of speed gain?

na_th_an wrote:
Even if you don't use Shiru's neslib, you should build your updates during rendering time in a buffer and then blast it to VRAM during vblank.


Yes, I looked around the forum a little and saw that we could use the stack as a buffer. I think it's a good idea: it would let me use the CPU to do all the calculations needed to find the right tiles for the next column during my update. Then during vblank I copy the whole buffer to VRAM.


Posted: Mon Mar 12, 2018 12:05 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3602
Location: Mountain View, CA
MS-DOS wrote:
Why use Shiru's lib? To display a tile I use your technique: I have pointers to the memory addresses, so to write a tile I use this:
Code:
PPU_ADDRESS = addr_hi;  /* high byte of the tile's VRAM address */
PPU_ADDRESS = addr_lo;  /* low byte of the tile's VRAM address */
PPU_DATA = tile;        /* tile index */


So when I compile it turns into
Code:
lda #addr_hi   ; high byte of the tile's VRAM address
sta $2006

lda #addr_lo   ; low byte of the tile's VRAM address
sta $2006

lda #tile      ; tile index
sta $2007

This code doesn't use pointers (indirect addressing) at all. If this is the assembly your code is resulting in, then it sounds like you're using either macros or something that resulted in unrolled code. And if all of your PPU updates are written in this fashion, then this could/would explain why much of your NMI time is wasted, especially for large sequential updates of PPU RAM.


Posted: Mon Mar 12, 2018 12:20 pm
Joined: Wed Sep 20, 2017 1:14 pm
Posts: 24
Location: Green Hill Zone
Code:
#define PPU_ADDRESS      (*(volatile unsigned char*)0x2006)
#define PPU_DATA         (*(volatile unsigned char*)0x2007)


That's what I use; sorry if what I said was wrong!

On the other hand, if this method is slow, which method should I use to update VRAM?


Posted: Mon Mar 12, 2018 12:36 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20562
Location: NE Indiana, USA (NTSC)
Ideally, your main program, be it in C or assembly language or whatever, fills a buffer of addresses and data values in otherwise unused memory, such as $0100-$01BF in the stack page. Then during vblank, your program calls a subroutine written in assembly language that reads the buffer, copying addresses and data from the buffer to the PPU's ports. There's one such subroutine in neslib; I wrote my own called Popslide that uses pointer arithmetic and (ab)use of the stack pointer to allow a fast unrolled loop that C on a 6502 can't practically match.
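A minimal sketch of this fill-then-flush idea, written in C so it can be run on a PC. Everything in it is an assumption made for illustration: the PPU is replaced by a plain array with hypothetical `ppu_set_addr`/`ppu_write_data` stand-ins, and the buffer layout (address high byte, address low byte, length, then data) is invented here. It is not the format used by neslib or Popslide, and on real hardware the flush would be the assembly subroutine described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VRAM_SIZE 0x4000u
#define BUF_SIZE  192u     /* same size as $0100-$01BF */

/* Hypothetical stand-ins for the PPU ports, so the sketch runs on a PC. */
static unsigned char vram[VRAM_SIZE];
static unsigned      ppu_addr;

void ppu_set_addr(unsigned char hi, unsigned char lo)
{
    ppu_addr = (((unsigned)hi << 8) | lo) % VRAM_SIZE;
}

void ppu_write_data(unsigned char value)
{
    vram[ppu_addr] = value;
    ppu_addr = (ppu_addr + 1u) % VRAM_SIZE;  /* auto-increment by 1 */
}

unsigned char vram_peek(unsigned addr)
{
    return vram[addr % VRAM_SIZE];
}

/* Update buffer: a sequence of [addr_hi, addr_lo, len, data...] packets. */
static unsigned char update_buf[BUF_SIZE];
static size_t        update_len;

/* Main program (during rendering): queue `len` bytes for VRAM address
   `addr`. Returns 1 on success, 0 if the packet is empty or does not fit. */
int queue_update(unsigned addr, const unsigned char *data, unsigned char len)
{
    if (len == 0 || update_len + 3u + len > BUF_SIZE)
        return 0;
    update_buf[update_len++] = (unsigned char)(addr >> 8);
    update_buf[update_len++] = (unsigned char)(addr & 0xFFu);
    update_buf[update_len++] = len;
    memcpy(&update_buf[update_len], data, len);
    update_len += len;
    return 1;
}

/* Vblank: copy every queued packet to the PPU, then empty the buffer.
   No address calculation happens here, only copying. */
void flush_updates(void)
{
    size_t i = 0;
    while (i < update_len) {
        unsigned char hi, lo, len;
        assert(i + 3u <= update_len);   /* packet header must fit */
        hi  = update_buf[i++];
        lo  = update_buf[i++];
        len = update_buf[i++];
        assert(i + len <= update_len);  /* packet data must fit */
        ppu_set_addr(hi, lo);
        while (len--)
            ppu_write_data(update_buf[i++]);
    }
    update_len = 0;
}
```

The main loop would call `queue_update` as many times as it needs during rendering, and the NMI handler would call `flush_updates` once. The point of the split is that all the expensive work happens outside vblank.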


Posted: Mon Mar 12, 2018 12:49 pm
Joined: Fri May 08, 2015 7:17 pm
Posts: 2262
Location: DIGDUG
Looking at the distribution of incorrect tiles on that NTSC picture, I would guess that there is a lot more to your update code than just

LDA byte
STA $2006
LDA byte
STA $2006
LDA byte
STA $2007

I would guess that it is inside a loop that calculates each PPU address in real time.

Loops can be MUCH more efficient in ASM [edit, autocorrect]

Addresses should be calculated beforehand (buffered), or selected from a table of precalculated addresses.
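One way to read "a table of precalculated addresses" is a pair of lookup tables holding the start address of each nametable row, filled once at startup. The sketch below assumes the first nametable at $2000 with 32 columns and 30 rows; the table and function names are hypothetical.

```c
#include <assert.h>

#define NT_BASE 0x2000u  /* assumed: first nametable */
#define NT_COLS 32u
#define NT_ROWS 30u

/* Start address of each row, split into high and low bytes so that
   6502 code can index them directly. */
static unsigned char row_addr_hi[NT_ROWS];
static unsigned char row_addr_lo[NT_ROWS];

/* Run once at startup, outside vblank. */
void init_row_table(void)
{
    unsigned y;
    for (y = 0; y < NT_ROWS; ++y) {
        unsigned addr = NT_BASE + y * NT_COLS;
        row_addr_hi[y] = (unsigned char)(addr >> 8);
        row_addr_lo[y] = (unsigned char)(addr & 0xFFu);
    }
}

/* VRAM address of tile (x, y) by table lookup, with no multiplication.
   Each row starts on a multiple of 32 and x is below 32, so x can be
   ORed into the low byte without producing a carry. */
unsigned tile_addr(unsigned x, unsigned y)
{
    assert(x < NT_COLS && y < NT_ROWS);
    return ((unsigned)row_addr_hi[y] << 8) | (row_addr_lo[y] | x);
}
```

The lookup replaces a multiply-by-32 per tile with two table reads, which matters on a CPU with no multiply instruction. Even so, the lookup is better done while filling the update buffer than inside the vblank handler.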

EDIT, also, in my latest blog posts, I DO use neslib, and I prefer that sort of approach. I have abandoned my earlier code, and plan to rewrite all the example code some day.

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Posted: Mon Mar 12, 2018 2:09 pm
Joined: Wed Sep 20, 2017 1:14 pm
Posts: 24
Location: Green Hill Zone
tepples wrote:
Ideally, your main program, be it in C or assembly language or whatever, fills a buffer of addresses and data values in otherwise unused memory, such as $0100-$01BF in the stack page. Then during vblank, your program calls a subroutine written in assembly language that reads the buffer, copying addresses and data from the buffer to the PPU's ports. There's one such subroutine in neslib; I wrote my own called Popslide that uses pointer arithmetic and (ab)use of the stack pointer to allow a fast unrolled loop that C on a 6502 can't practically match.


Yeah, I think I'll use this technique! I'm going to take a look at Popslide.

dougeff wrote:
Looking at the distribution of incorrect tiles on that NTSC picture, I would guess that there is a lot more to your update code than just

LDA byte
STA $2006
LDA byte
STA $2006
LDA byte
STA $2007

I would guess that it is inside a loop that calculates each PPU address in real time.

Loops can be MUCH more efficient in ASM [edit, autocorrect]

Addresses should be calculated beforehand (buffered), or selected from a table of precalculated addresses.

EDIT, also, in my latest blog posts, I DO use neslib, and I prefer that sort of approach. I have abandoned my earlier code, and plan to rewrite all the example code some day.


Of course, this is just an example of how I draw ONE tile ^^

I will indeed use the buffer technique, using the unused part of the stack page!

-------------------

Thank you all for your responses! :D

Edit: It works! I used the stack buffer technique: push data onto the stack during rendering, then pop it into VRAM during vblank. It works very well, and I no longer exceed 20 scanlines!

