All times are UTC - 7 hours



Posted: Sun Feb 24, 2019 5:28 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21452
Location: NE Indiana, USA (NTSC)
vivi168 wrote:
The problem is, I should also increment the source address by 32, to skip the current row remaining tiles. Is there a way to do this during DMA?

Not directly from ROM to the PPU, no. As I explained above, you have to bounce it off WRAM.

  1. At any time, copy the tilemap entries from ROM into a buffer in WRAM. For the ROM data format that you have described, you'll need to increment by 64 bytes (32 tilemap entries) when reading and 2 bytes when writing.
  2. During vblank, DMA that buffer to the PPU.
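
The two steps above can be modelled off-target before writing any 65816 code. This is a minimal Python sketch, not SNES code: it assumes the ROM tilemap is a flat byte string with 64 bytes (32 two-byte entries) per row, and `gather_column`, `rom_map`, and `wram_buffer` are hypothetical names used only for illustration.

```python
ROW_BYTES = 64    # 32 tilemap entries x 2 bytes, per the ROM format described above
ENTRY_BYTES = 2   # one tilemap entry


def gather_column(rom_map: bytes, column: int, rows: int = 32) -> bytes:
    """Copy one column of tilemap entries into a contiguous buffer.

    Reads advance by ROW_BYTES; writes advance by ENTRY_BYTES, which is
    the linear layout a later DMA from WRAM would expect.
    """
    buf = bytearray()
    src = column * ENTRY_BYTES
    for _ in range(rows):
        buf += rom_map[src:src + ENTRY_BYTES]
        src += ROW_BYTES
    return bytes(buf)


# Synthetic 32x32 map: each entry stores (column, row) so the result is easy to check.
rom_map = bytes(b for row in range(32) for col in range(32) for b in (col, row))
wram_buffer = gather_column(rom_map, column=5)
```

The resulting buffer is 64 bytes long and holds the column top to bottom, which is the shape a single vblank DMA can send with the PPU address increment set to 32.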

_________________
Pin Eight | Twitter | GitHub | Patreon


Posted: Sun Feb 24, 2019 9:54 pm
Joined: Tue Feb 07, 2017 2:03 am
Posts: 719
An added complication is that the 2x2 map is stored as 4 separate screens rather than 1 large screen, which means you have to do some "fancy" maths to work out the address.

Base = ((X & $40) << 8) + (X & $3F)

This means you also need to be careful with your horizontal DMA if it crosses the "32x32" barrier.
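
To make the "separate screens" point concrete, here is a rough Python sketch of the address maths in tile coordinates. It does not reproduce the formula above (whose units for X are not stated); instead it assumes the commonly documented layout in which each 32x32 screen occupies $400 words and screens are stored left-to-right, then top-to-bottom. `tilemap_word_offset` and its parameters are hypothetical names.

```python
SCREEN_WORDS = 0x400  # one 32x32 screen = 1024 tilemap entries (words)


def tilemap_word_offset(x: int, y: int, wide: bool = True, tall: bool = True) -> int:
    """Word offset of tile (x, y) from the tilemap base address.

    Assumes 32x32 screens stored one after another, left-to-right
    then top-to-bottom (an assumption, not taken from this thread).
    """
    offset = (y & 31) * 32 + (x & 31)   # position inside one screen
    screen = 0
    if wide and (x & 32):
        screen += 1                      # right-hand screen
    if tall and (y & 32):
        screen += 2 if wide else 1       # lower screen(s)
    return offset + screen * SCREEN_WORDS
```

Under that assumption, tile column 31 and tile column 32 on the same row are $3E1 words apart rather than adjacent, which is why a horizontal transfer that crosses the boundary has to be split in two.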


Posted: Mon Feb 25, 2019 1:16 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 949
koitsu wrote:
The answer is no, you can't control the "source increment" while DMAing to PPU RAM; it always reads in increments of 1, because that's the nature of DMA (I don't know of any DMA systems that let you control that, but I suspect the one on the PS2 probably does -- its DMA implementation is crazy).

N64's RSP DMA can. It's intended for copying sub-textures.


Posted: Mon Feb 25, 2019 10:04 am
Joined: Tue Oct 24, 2017 11:07 pm
Posts: 26
Oziphantom wrote:
An added complication is that the 2x2 map is stored as 4 separate screens rather than 1 large screen, which means you have to do some "fancy" maths to work out the address.

Base = ((X & $40) << 8) + (X & $3F)

This means you also need to be careful with your horizontal DMA if it crosses the "32x32" barrier.

That's why I like 16x16 tile mode (which is what the numbers I provided earlier were for): you can stick with a 32x32 tilemap and still get seamless scrolling in both directions with a linear layout.


Posted: Mon Feb 25, 2019 2:37 pm
Joined: Thu Feb 14, 2019 2:25 pm
Posts: 8
Quote:
You don't *have* to use DMA for these updates/writes, of course! It may make more sense to use DMA just for horizontal panning situations, and to do the $2118/2119 writes yourself natively for vertical panning situations


Oh ok, I was under the assumption one should never copy data manually because it would be too slow.

But in this case it makes sense, because there isn't much data to transfer after all ($80 bytes for one column).


Thank you all for your replies; they're very instructive.


Posted: Mon Feb 25, 2019 3:53 pm
Joined: Fri Jul 04, 2014 9:31 pm
Posts: 1045
If you're loading the system heavily, such that there's a serious risk of running out of VBlank time at some point, you should probably use DMA for everything unless it's so small that you can show manual copy isn't wasting time.

But if you're not, there may be no reason to go to the trouble. Since you're coding on bare metal, the only real "rule" in SNES development is that what you do has to work. (It also helps if you can still read your code when you come back to it after a week...)


Posted: Mon Feb 25, 2019 4:18 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4115
Location: A world gone mad
You should use DMA if/when technically possible, but not all situations can be handled with DMA. DMA is substantially faster than code, by around a factor of... 8? 10? I don't know (refs: #1, #2, #3, #4 but all of these talk about clocks, which is not the same thing as CPU cycles). The reason you can't use DMA in your particular case is that you want to increment the source address by something other than 1 or 2 -- SNES DMA can't do that. What most people end up doing is copying the bytes they want written to PPU RAM into WRAM, linearly (reading from the source with whatever increment is needed), and then using DMA on that buffer.

Here are cycle counts for a 128 byte transfer into PPU RAM (with a 32x32 increment in PPU RAM, as well as from WRAM). Don't just use this, please read everything I've written. I haven't done SNES code in ~20 years so I may have parts of this wrong (ex. bits of $2115, PPU RAM layout for tilemap, etc.). Cut me some slack please.

Code:
  sep #$20          ; 3 cycles

; $2115 bit 7   = %1  = increment PPU RAM address after the write to $2119 (low byte @ $2118, high byte @ $2119)
; $2115 bit 1,0 = %01 = increment PPU RAM address by 32 words (one tilemap row), so successive writes walk down one column
;
  lda   #%10000001  ; 2 cycles
  sta.l $002115     ; 5 cycles

  rep #$30          ; 3 cycles

; XXXX = PPU RAM address of tilemap start; fill in yourself
;
  lda   #$xxxx      ; 3 cycles
  sta.l $002116     ; 6 cycles

  ldx #0            ; 3 cycles
loop:
  lda.l $7f0000,x   ; 6 cycles
  sta.l $002118     ; 6 cycles
  txa               ; 2 cycles
  clc               ; 2 cycles
  adc #$40          ; 3 cycles
  tax               ; 2 cycles
  cpx #$800         ; 3 cycles
  bne loop          ; 3 cycles if branching, 2 cycles if not


The initial setup (everything up to and including ldx #0) takes 25 cycles.

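Each loop iteration (of writing 2 bytes to PPU RAM) takes 27 cycles, including the cost of the branch being taken. For a 128-byte transfer that is 64 iterations: 27*63 = 1701 cycles, plus the final iteration, where the branch isn't taken, at 26 cycles. So 1701+26 = 1727 cycles total for the loop, or 1727+25 = 1752 cycles for everything you see above. Note that the loop exactly as written (adc #$40, cpx #$800) runs only 32 times and moves 64 bytes, i.e. one 32-entry column: 27*31 + 26 = 863 cycles for the loop, or 863+25 = 888 cycles in total. (Edit: I suspect I may be off by 1 somewhere, as I had to edit my code due to forgetting you can't do stx long).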
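
The arithmetic is easy to re-check with a small script. This is only a sketch: the per-instruction numbers are copied from the comments in the listing above (they are CPU cycles, not master clocks), and `loop_iterations` and `total_cycles` are hypothetical helper names. It also reports how many passes the `adc #$40` / `cpx #$800` pair implies, which is 32 rather than 64.

```python
# Per-instruction cycle counts copied from the comments in the listing above.
SETUP = [3, 2, 5, 3, 3, 6, 3]        # sep #$20 .. ldx #0
LOOP_BODY = [6, 6, 2, 2, 3, 2, 3]    # lda.l .. cpx, excluding bne
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2


def loop_iterations(step: int = 0x40, limit: int = 0x800) -> int:
    """Number of passes before X (starting at 0) equals the cpx limit."""
    return limit // step


def total_cycles(iterations: int) -> int:
    """Setup plus loop cost, with the final bne falling through."""
    body = sum(LOOP_BODY)
    taken = (iterations - 1) * (body + BNE_TAKEN)
    last = body + BNE_NOT_TAKEN
    return sum(SETUP) + taken + last


print(loop_iterations(), total_cycles(loop_iterations()), total_cycles(64))
```

With 64 iterations (128 bytes) the model gives 1752 cycles; with the 32 iterations the listed loop actually performs (64 bytes) it gives 888.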

This is a "slow but safe" routine. It can be optimised in several different ways -- examples: not using long addressing when writing to $2118 (will only work in mode 20/LoROM), setting DB=$7F and then using absolute addressing for WRAM reads, switching DB=$00 and using absolute addressing for $2118/2119 writes, doing something like lda #$2100 / tcd / sta $18 (to write to $2118), unrolling the loop entirely + not using X indexing at all since the $7fxxxx addresses can be pre-calculated (this has the most savings but at the cost of ROM space), etc...

I forget how much time there is in NMI/VBlank on the SNES, but I imagine it's only a bit more than this.


Posted: Mon Feb 25, 2019 6:58 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21452
Location: NE Indiana, USA (NTSC)
Vblank on NTSC Super NES in 224-line mode is 262 - 224 = 38 lines or thereabouts, and each line is (1364 - 40) / 8 = 165.5 slow cycles. (DMA and WRAM access use slow cycles.) I haven't tested this in detail, but I imagine the usable vblank time may be reduced by up to 1 line to allow for retrieving the first scanline's sprite data. So for now, I'll say 37 * 165.5 = about 6123 cycles.

EDIT: Teaches me to trust mental math

Posted: Mon Feb 25, 2019 9:08 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4115
Location: A world gone mad
tepples wrote:
Vblank on NTSC Super NES in 224-line mode is 262 - 224 = 38 lines or thereabouts, and each line is (1364 - 40) / 8 = 161.5 slow cycles. (DMA and WRAM access use slow cycles.) I haven't tested this in detail, but I imagine the usable vblank time may be reduced by up to 1 line to allow for retrieving the first scanline's sprite data. So for now, I'll say 37 * 161.5 = 5975 cycles.

Here's something that might help you know if your estimate is correct or not: the official developers manual has this little footnote on the bottom of page 2-17-2 (general description/introduction to general DMA) that says: In case of 224 lines, general purpose DMA can transfer 6K byte of data maximum during V-Blank period.


Posted: Mon Feb 25, 2019 9:47 pm
Joined: Fri Jul 04, 2014 9:31 pm
Posts: 1045
(1364-40)/8 is 165.5, not 161.5. Multiply by 37 and you get 6123.5. Multiply by 38 (assuming that VRAM or at least CGRAM is safely writable while OAM is being scanned for the first time) and you get 6289.
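
For anyone who wants to redo these numbers, here is a minimal Python sketch of the same estimate. It only restates the figures used in this thread (1364 master clocks per line, 40 subtracted per line, 8 master clocks per slow cycle); the function names and the `reserved_lines` parameter are hypothetical.

```python
MASTER_CLOCKS_PER_LINE = 1364
SUBTRACTED_CLOCKS = 40       # the 40 clocks per line deducted in the posts above
CLOCKS_PER_SLOW_CYCLE = 8


def slow_cycles_per_line() -> float:
    """Slow (8-clock) cycles available on one scanline."""
    return (MASTER_CLOCKS_PER_LINE - SUBTRACTED_CLOCKS) / CLOCKS_PER_SLOW_CYCLE


def vblank_budget(total_lines: int, visible_lines: int, reserved_lines: int = 0) -> float:
    """Slow cycles in vblank, optionally holding back some lines as margin."""
    lines = total_lines - visible_lines - reserved_lines
    return lines * slow_cycles_per_line()


print(vblank_budget(262, 224, reserved_lines=1))  # NTSC, 37 usable lines
print(vblank_budget(262, 224))                    # NTSC, all 38 lines
```

Treat the results as upper bounds; as noted in the next paragraph, NMI latency and timing wobble mean the real budget is somewhat smaller.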

Best not to try to pack it 100.000% full. NMI doesn't start instantly at the end of the last active line, and there's wobble in the timing due to CPU instruction handling. Also, it's not clear what the full exact intervals are in which OAM and VRAM are writable. There's even a short scanline during VBlank if interlace is off, although it's only by half a byte cycle (4 master clocks) and the rest of the timing slop dwarfs that.

Which reminds me - these line counts do not apply to PAL. A PAL console has 72/73 lines of VBlank in 239-line mode, or 87/88 lines in 224-line mode. And if interlace is on, there's a scanline at the end of VBlank that's half a byte cycle longer than usual.

Oh yeah, and interlace adds an extra line every other frame, on any console.

Lots of information.


Posted: Mon Feb 25, 2019 9:58 pm
Joined: Wed May 19, 2010 6:12 pm
Posts: 2874
tepples wrote:
Vblank on NTSC Super NES in 224-line mode is 262 - 224 = 38 lines or thereabouts, and each line is (1364 - 40) / 8 = 161.5 slow cycles. (DMA and WRAM access use slow cycles.) I haven't tested this in detail, but I imagine the usable vblank time may be reduced by up to 1 line to allow for retrieving the first scanline's sprite data. So for now, I'll say 37 * 161.5 = 5975 cycles.


Isn't that 165.5 cycles, not 161.5 cycles?

@koitsu
You would also need to do some math with the VRAM address and X index. If you're using 16x16 sized tiles:

lda camera_x
lsr #4
and #$001f
sta temp
ora #map_address
sta $2116

If you're storing the level as 32x32 tile maps laid out horizontally, then you would do this to calculate the X index:

lda camera_x
and #$fe00
asl
ora temp
asl
tax


Posted: Mon Feb 25, 2019 11:33 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4115
Location: A world gone mad
@psycopathicteen I don't think this is relevant to what the fellow is doing in his first pass, but we need clarification from him on what he's going with for now. I get the impression, as a first-pass-attempt, he's going to keep a raw copy of the tilemap (all $800 bytes) in WRAM, and wanted to update bits/pieces of it (ex. a column) and then write that to the related/appropriate part of PPU RAM -- i.e. $7F0000-01 ends up in (assuming SC base address = PPU RAM $2000) $2000-01, $7F0040-41 ends up in $2040-41, etc.. -- for the far left column.

You're just giving out magical variables that have no description, nor were those variables in my code, so... respectfully, I don't think this helps someone learn. I actually understand the first set of code (for dynamically figuring out where in PPU RAM you want to start), and I know it relates to the whole "camera/view into PPU RAM/WRAM" concept, but it's not easily explained from just 6 lines of code.

I don't know about the tile size, but I'm operating off of 8x8 under modes 0 through 4. 16x16 I think is a different situation (everything doubles, including the BG scroll ranges).

Also, I think I might have some of the values wrong -- it might need to be adc #$20 and cpx #$400, but I can't remember. This is one of the areas of the official documentation that sucks: they often talk about things in words (2 bytes) but then refer to indices etc. as +0, +1, +2, +3 (i.e. +0, +2, +4, +6). This compounded by it being 20+ years for me doesn't help. I'm trying my best off of memory though.


Posted: Tue Feb 26, 2019 11:28 am
Joined: Wed May 19, 2010 6:12 pm
Posts: 2874
What if I add some notes?

Code:
lda camera_x                //--MMMMMTTTTTPPPP          M = map, T = tile, P = pixel
lsr #4                      //------MMMMMTTTTT
and #$001f                  //-----------TTTTT
sta temp
ora #map_address
sta $2116

lda camera_x                //--MMMMMTTTTTPPPP
and #$fe00                  //--MMMMM---------
asl                         //-MMMMM----------
ora temp                    //-MMMMM-----TTTTT
asl                         //MMMMM-----TTTTT-
tax
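
The same bit shuffling can be checked on a PC. This is a rough Python mirror of the two annotated snippets, assuming 16x16 tiles, levels stored as consecutive $800-byte 32x32 maps, and a `map_address` whose low 5 bits are zero; `column_addresses` is a hypothetical name, and the masks to 16 bits stand in for the width of the accumulator.

```python
def column_addresses(camera_x: int, map_address: int) -> tuple:
    """Mirror the two annotated snippets for 16x16 tiles.

    camera_x bit layout: --MMMMMTTTTTPPPP (map, tile, pixel).
    Returns (value written to $2116, X index into the level data).
    """
    temp = (camera_x >> 4) & 0x001F              # lsr #4 / and #$001f
    vram_address = temp | map_address            # ora #map_address
    a = ((camera_x & 0xFE00) << 1) & 0xFFFF      # and #$fe00 / asl
    a |= temp                                    # ora temp
    x_index = (a << 1) & 0xFFFF                  # asl, kept to 16 bits
    return vram_address, x_index


# Map 3, tile column 5, pixel 7 within the tile.
camera_x = 3 * 512 + 5 * 16 + 7
vram_address, x_index = column_addresses(camera_x, 0x2000)
```

For that example the VRAM address is $2005 and the X index is $180A, i.e. 3 maps of $800 bytes plus 5 entries of 2 bytes.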



Posted: Tue Feb 26, 2019 3:36 pm
Joined: Thu Feb 14, 2019 2:25 pm
Posts: 8
Quote:
This is a "slow but safe" routine. It can be optimised in several different ways -- examples: not using long addressing when writing to $2118 (will only work in mode 20/LoROM), setting DB=$7F and then using absolute addressing for WRAM reads, switching DB=$00 and using absolute addressing for $2118/2119 writes, doing something like lda #$2100 / tcd / sta $18 (to write to $2118), unrolling the loop entirely + not using X indexing at all since the $7fxxxx addresses can be pre-calculated (this has the most savings but at the cost of ROM space), etc...

I must admit I was a little lost here.
Why does long addressing only work in mode 20/LoROM?

As for TCD, I've done some research: I think it transfers the 16 bits of the accumulator to the Direct Page register, but I'm not exactly sure what that means.
Does it affect which page/bank you access? For example, instead of accessing Zero Page, do you access Page 02?

Quote:
@psycopathicteen I don't think this is relevant to what the fellow is doing in his first pass, but we need clarification from him on what he's going with for now. I get the impression, as a first-pass-attempt, he's going to keep a raw copy of the tilemap (all $800 bytes) in WRAM, and wanted to update bits/pieces of it (ex. a column) and then write that to the related/appropriate part of PPU RAM -- i.e. $7F0000-01 ends up in (assuming SC base address = PPU RAM $2000) $2000-01, $7F0040-41 ends up in $2040-41, etc.. -- for the far left column.

So to clarify, first of all, I've set $2105 (BGMODE) to $01 (bg mode 1, tile size 8). and $2107 (BG1SC) is set to $10 (only one 32x32 tilemap).
At the beginning, I first load the tilemap from the ROM to PPU via DMA. Then when going left and right, I scroll the background. For now, it only wraps back.

What I'm looking to do, is fetch a column of another tilemap in the ROM and write it over a column of the tilemap in the PPU. (So it gives the illusion of a longer level, like the GIF tepples posted). From what I've read in this thread, it might be a good idea to first transfer it to the WRAM, and then copy the column I need from the WRAM to the PPU.

I now have a precise idea of what I need to write to achieve this. I'm gonna try it and hopefully get back to you with something working :)

Thanks a bunch


Posted: Tue Feb 26, 2019 4:28 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21452
Location: NE Indiana, USA (NTSC)
Using 24-bit addressing works in both LoROM and HiROM. Not using long addressing, that is, using 16-bit absolute addressing without regard for the contents of the data bank register (B), is most practical in LoROM. The difference is that in HiROM, the data bank register will usually be in $C0-$FF, which is ROM, or $7E-$7F, which is WRAM. Neither of these banks allows access to MMIO. In LoROM, by contrast, the data bank register is more likely to be in $80-$BF, which contains the MMIO areas $2100-$21FF and $4200-$437F. There are more advanced ways to use MMIO with absolute addressing in HiROM, and they rely on the fact that the second half of each bank of $C00000-$FFFFFF ($C08000-$C0FFFF, $C18000-$C1FFFF, etc.) is mirrored down to banks $80-$BF. This requires bit manipulation either at assembly time or at runtime when determining which values to push onto the stack before plb, and it may require the linker script to place certain data in the second half of a bank.

When you load $0000 into the direct page base register (D), lda $3F reads $00003F.
When you load $0200 into the direct page base register, lda $3F instead reads $00023F.
When you load $0210 into the direct page base register, lda $3F reads $00024F and costs an extra cycle because bits 7-0 of the register are not 0.
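
The three cases above follow one rule, sketched here in Python. This is an illustration only, assuming native mode and a plain direct page operand such as `lda $3F`; `direct_page_access` is a hypothetical name.

```python
def direct_page_access(d: int, operand: int):
    """Return (effective address in bank $00, extra cycles) for `lda dp`."""
    address = (d + operand) & 0xFFFF   # direct page always resolves in bank $00
    extra = 1 if (d & 0x00FF) else 0   # penalty when the low byte of D is non-zero
    return address, extra


for d in (0x0000, 0x0200, 0x0210):
    address, extra = direct_page_access(d, 0x3F)
    print(f"D=${d:04X}: lda $3F reads $00{address:04X}, extra cycles: {extra}")
```

This is also why the `lda #$2100 / tcd / sta $18` trick reaches $2118: with D = $2100 the operand $18 resolves to $002118, and the low byte of D is zero so there is no penalty.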
