Internal OAM address sprite evaluation

paulb_nl
Posts: 32
Joined: Fri Nov 18, 2016 7:57 am

Internal OAM address sprite evaluation

Post by paulb_nl »

Is it known what the SNES is doing with the internal OAM address during sprite evaluation?

At the start of HDMA, Uniracers expects it to be at the High OAM address of the last sprite in range of the frame. So it is not reset even if there are no sprites on the current line.

The SNES scans through the OAM for 128 sprites and checks if they are in range. For this it also needs the High OAM bits to know the size and the full X-position. So I would expect the OAM address to be at the end of High OAM 0x21f after sprite evaluation every line.
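
For reference, here is a minimal sketch of how one sprite's fields are spread across the two tables, which is why evaluation needs both. It assumes the commonly documented layout (4 bytes per sprite in the 512-byte low table, 2 bits per sprite in the 32-byte high table starting at 0x200). The function name `decode_sprite` and the dictionary keys are made up for illustration.

```python
def decode_sprite(oam, index):
    """Decode sprite `index` (0-127) from a 544-byte OAM image."""
    if len(oam) != 544:
        raise ValueError("OAM image must be 512 + 32 bytes")
    if not 0 <= index < 128:
        raise ValueError("sprite index must be 0-127")

    # Low table: X low byte, Y, tile number, attributes (vhoopppN).
    lo = index * 4
    x_low, y, tile, attr = oam[lo:lo + 4]

    # High table: four sprites per byte, two bits each.
    hi_byte = oam[0x200 + index // 4]
    hi_bits = (hi_byte >> ((index % 4) * 2)) & 0b11

    return {
        "x": x_low | ((hi_bits & 1) << 8),   # full 9-bit X position
        "y": y,
        "tile": tile | ((attr & 1) << 8),
        "palette": (attr >> 1) & 7,
        "priority": (attr >> 4) & 3,
        "large": bool(hi_bits & 2),          # OBJ size select bit
    }
```

Sprite 127 takes its high bits from byte 0x21F, the last byte of OAM, which is the address mentioned above.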

How does the address end up at the last sprite in range of the frame? Could it be using multiple internal OAM addresses?

It takes 2 PPU clocks to evaluate 1 sprite. Would it be reading OAM the first clock and High OAM the second clock?
Kingizor
Posts: 24
Joined: Sat Jun 18, 2011 10:50 am

Re: Internal OAM address sprite evaluation

Post by Kingizor »

Two dots per sprite is about as close as I've been able to get.

The evaluation process needs X and Y, but also the data from the hi table. XY is one word and char/attr is the other, so I've been assuming one word per dot with the hi table bits accessed whenever.

As far as I can tell, reads always come from the lo table index being accessed, whereas writes go to both the lo table index being accessed and the equivalent address in the hi table. In my test, writes always went to XY, but never char/attr.

Reads and writes seem to respect the toggle as well. Reads return lo byte / hi byte of a word in the lo table, whereas it still takes two writes for the lo table to be written. The toggle doesn't increment the address which seems to be fixed. Once it reaches the last fetch address, the fixed address doesn't need to increment any more so it would stay at the end.

If a game only writes once it might stand to reason that only the hi table would be affected. I haven't gotten around to toying with Uniracers, but I've been somewhat suspicious of it. If as you say it does expect a write to the final address of the hi table, that might correlate with what I've observed.

I'm not too sure about timing relative to anything else, but I'd expect evaluation to occur within the first 256 dots of a scanline. There is a window of 0-271, since it can't possibly overlap with tile fetching for sprites.

I haven't looked at the rotation bit in $2102, but it should simply affect only the start address. I haven't thought about the n+3 bug either, but it might fit in here somewhere.

Independent verification would be nice. ^_^
Kingizor
Posts: 24
Joined: Sat Jun 18, 2011 10:50 am

Re: Internal OAM address sprite evaluation

Post by Kingizor »

I was half asleep when I wrote that and unbelievably distracted. I don't think I processed more than the first half of the OP.

That said, I think most of what I said is correct. The part that's clearly wrong is in regards to Uniracers. There is no prior mention of the write being to the last entry instead of the last active sprite, but that's what this model predicts.

But I think there is another part to the trick. I haven't ever studied the game in much detail, but a long time ago I was curious as to the manifestation of the bug. I found one emulator that didn't have it patched and I was able to see what the borked version looked like. After that, I opened whichever debugger I was using to isolate the write. If I recall correctly, it was using HDMA to write to OAM. Now if I'm not misremembering, the important piece of the puzzle that I've only just now connected to this aforementioned I/O behaviour is that it didn't just write to OAM once, it did so twice.

I think it was a write on the first line, then again halfway through the active frame. As I've mentioned and as far as I can tell, writes only go to the lo table based on the toggle. If the toggle isn't altered between those two writes, it should be a combined word going to XY of the last index. Both writes should independently also go to the last index of the hi table.

Whether any of that makes sense or whether it works that way in practice remains to be seen. I'm not really in a position to test anything right now, but I wonder if perhaps some tremendously kind person could check the values of the HDMA writes and see if they resemble something that could pass for XY, and whether the other char/attr word in the final sprite resembles what is expected to appear on screen? How the game updates OAM during vertical blanking might also be relevant. DMA is very common, but if there are only a few sprites it might write manually, which could be a problem.

It's a lot of maybes, but maybe there's a chance things actually work this way. :D
srg320
Posts: 32
Joined: Fri Feb 16, 2018 5:52 am
Location: Ukraine

Re: Internal OAM address sprite evaluation

Post by srg320 »

Found interesting information about Uniracers.
I made a temporary patch to the PPU and Uniracers now works well, but the sprite engine needs to be reworked for it to behave correctly.
paulb_nl
Posts: 32
Joined: Fri Nov 18, 2016 7:57 am

Re: Internal OAM address sprite evaluation

Post by paulb_nl »

I made a few different test roms to test this behaviour on a real SNES. It displays 2 rows of 32 8x8 sprites spread out over OAM and it writes to $2104 in a horizontal IRQ that can change position with the controller.

Test results:
Writes went to both low and high OAM, but most of the results below were tested by trying to write to the upper X-position bit in high OAM.

Cycles 0-255 OAM sprite in range scanning:
Writes go to sprite 0 through 127 as the PPU reads through OAM. Changing X or Y position will affect the current sprite.

One cycle after scanning is finished the OAM address is changed:
-If there were no sprites found on the current line and sprite pixel fetching has been done on previous lines then writes would go to the address of the last sprite fetched during pixel fetching.
-Otherwise writes would go to $200 or $202. Not sure why that varied.

Cycles ~272-339 Fetching pixels:
Writes go to sprites while the PPU is fetching pixels for the sprites found in range but this is done backwards so if sprites 0-31 were in range then writes would go to sprite 31 through 0.
So this means that after fetching pixels is finished a write will go to the first sprite in range and not the last.

Interestingly changing X or Y position of a sprite during fetching pixels will affect the sprite so it seems it is also reading OAM and checks if sprites are in range even during this process.

Maybe during range scanning it stores the sprite number/offset instead of the OAM data of the sprite in range? That way they wouldn't need to add memory for a 32-sprite cache.
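
The test results above can be summarised as a rough model of where a $2104 write lands depending on the dot it happens on. This is only a sketch of the observations, not a hardware description. It assumes 8x8 sprites (one tile each, as in the test ROM), two dots per sprite in both phases, and a fetch start of exactly 272 even though the post only says "~272". The names `write_target`, `in_range` and `last_fetched` are hypothetical.

```python
EVAL_END = 256      # in-range scanning covers dots 0-255
FETCH_START = 272   # assumed; the measured value was only approximate
LINE_END = 340      # dots 0-339

def write_target(dot, in_range, last_fetched=None):
    """Return ("sprite", n) or ("address", a) for a write on `dot`.

    in_range:     sprite indexes found in range on this line
    last_fetched: last sprite fetched on an earlier line, if any
    """
    if not 0 <= dot < LINE_END:
        raise ValueError("dot must be 0-339")

    # Scanning: the write hits whichever sprite is being evaluated.
    if dot < EVAL_END:
        return ("sprite", dot // 2)

    if in_range:
        if dot >= FETCH_START:
            # Fetching walks the in-range sprites backwards and then
            # stays on the first sprite in range.
            order = sorted(in_range, reverse=True)
            slot = (dot - FETCH_START) // 2
            return ("sprite", order[min(slot, len(order) - 1)])
        # Gap after scanning: $200 was observed ($202 was also seen).
        return ("address", 0x200)

    # No sprites on this line: the address of the last sprite fetched
    # on a previous line is kept, if there was one.
    if last_fetched is not None:
        return ("sprite", last_fetched)
    return ("address", 0x200)
```

With sprites 0-31 in range, `write_target(272, list(range(32)))` gives sprite 31 and `write_target(339, list(range(32)))` gives sprite 0, matching the backwards order described above.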
Kingizor
Posts: 24
Joined: Sat Jun 18, 2011 10:50 am

Re: Internal OAM address sprite evaluation

Post by Kingizor »

The part about fetching happening in reverse is a massive breakthrough. I've considered that possibility a few times but I wasn't sure if it were possible to test for it. My OAM write test was very primitive, never venturing past rendering or indeed using sprites in any meaningful way, so I never stumbled upon it.

An idea I rather liked was that evaluation counts up when a sprite is in range, then during fetching this counter counts back down. This determines whether tile overflow is set. The only way that theory works though is if the fetching phase is aware of or can calculate how many tiles are in a given sprite, as well as keeping a smaller intermediary counter to keep track of which tile it is currently fetching for the current sprite.

It's still a bit early for my mind to evaluate (!) this new information regarding the addresses, but it does seem to make some sense.

--When fetching begins, the address is set to the final sprite to pass evaluation.

It sounds like this will always happen. On a line with no sprites, the list of active sprites is never updated, so the address is still set to the same start of the previous list, only the address doesn't get ticked further because these sprites aren't active any more, or something to that effect. Whether physical fetches occur when there is nothing to fetch is up for debate too, I suppose.

It all fits together rather well, yes? Or maybe not. If a counter were a count down there would be no counter with which to set the address. Maybe there is a second counter or some other chicanery abound.

---

Merely storing the index during evaluation is very clean though. If copying the data to a buffer is deferred to the fetching phase, then evaluation might be as simple as one dot each checking size against Y and the 9-bit X.

Some things do need to be buffered for rendering. At the very least, we need the (8+1)-bit X, 3-bit palette and 2-bit priority. The equivalent setup on the NES doesn't sound too dissimilar from that point. The part about decrementing X counters explains why any sprite with an off-screen X doesn't end up on screen, including X=256 that somehow passes evaluation. For writes to OAM during the fetching phase, it's likely the newly written value that gets copied to the X counter instead of the old one.

It seems rather obvious now that buffering doesn't happen during evaluation because the buffer is being used during rendering which happens simultaneously.

There is another thing I've wondered that wouldn't be hard to test; whether or not evaluation and fetching occur on the final scanline before vertical blanking begins? I imagine they would, but it would be nice to confirm one way or the other. :)
paulb_nl
Posts: 32
Joined: Fri Nov 18, 2016 7:57 am

Re: Internal OAM address sprite evaluation

Post by paulb_nl »

Kingizor wrote:
Merely storing the index during evaluation is very clean though. If copying the data to a buffer is deferred to the fetching phase, then evaluation might be as simple as one dot each checking size against Y and the 9-bit X.
It also needs the OBJ size bit. Uniracers changes the upper X-position bit and the OBJ size bit to move sprites offscreen because it uses 64x64 sprites.
Kingizor wrote: Some things do need to be buffered for rendering. At the very least, we need the (8+1)-bit X, 3-bit palette and 2-bit priority. The equivalent setup on the NES doesn't sound too dissimilar from that point. The part about decrementing X counters explains why any sprite with an off-screen X doesn't end up on screen, including X=256 that somehow passes evaluation. For writes to OAM during the fetching phase, it's likely the newly written value that gets copied to the X counter instead of the old one.
I have read that it uses a 256 pixel line buffer. During rendering PPU1 sends the sprite data from the line buffer to PPU2 as 4-bit pixel + 3-bit palette + 2-bit priority. So evaluation would read from OAM, then store the in-range sprite numbers in a cache. Then tile fetching reads the OAM of the in-range sprites again to place them at the correct position in the line buffer.
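
A minimal sketch of why a line buffer pairs naturally with reverse fetching: if tiles are written highest index first, the lowest index is written last and ends up on top without any explicit priority compare between sprites. The tuple layout and the name `render_line` are invented for the example, and treating colour 0 as transparent (never overwriting) is an assumption.

```python
def render_line(tiles):
    """Composite fetched tile rows into a 256-entry line buffer.

    tiles: list of (sprite_index, x, pixels, palette, priority),
           where pixels is a sequence of eight 4-bit colour values.
    Returns a list of 256 entries, each None or
    (colour, palette, priority), i.e. the 4+3+2 bits per pixel.
    """
    line = [None] * 256
    # Highest sprite index first, so lower indexes overwrite it later.
    for _, x, pixels, palette, priority in sorted(
            tiles, key=lambda t: t[0], reverse=True):
        for i, colour in enumerate(pixels):
            column = x + i
            # Assumed: colour 0 is transparent and leaves the buffer alone.
            if colour != 0 and 0 <= column < 256:
                line[column] = (colour, palette, priority)
    return line
```

For two overlapping tiles, sprite 0 at X=10 and sprite 5 at X=14, columns 14-17 end up holding sprite 0's pixels.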
Kingizor wrote: It seems rather obvious now that buffering doesn't happen during evaluation because the buffer is being used during rendering which happens simultaneously.
The buffer used during in range evaluation would be different from the render line buffer. It would be like secondary OAM on the NES but it does not seem to be like the NES if we can affect sprites by writing to OAM during tile fetching.
Kingizor wrote: There is another thing I've wondered that wouldn't be hard to test; whether or not evaluation and fetching occur on the final scanline before vertical blanking begins? I imagine they would, but it would be nice to confirm one way or the other. :)
This would be difficult to test because the sprites won't be rendered in vblank. Maybe it could be done if it's possible to change to 239 line mode at the final line so it continues with rendering.
Kingizor
Posts: 24
Joined: Sat Jun 18, 2011 10:50 am

Re: Internal OAM address sprite evaluation

Post by Kingizor »

paulb_nl wrote:This would be difficult to test because...
I forgot that your test was using sprites to indicate what had been written. My own tests would prefill OAM with predetermined patterns, perform the reads or writes then display the numeric values to indicate which addresses had been read or written. It's a different approach, but it seemed like an easy way to pinpoint addresses.

I've gone ahead and modified my write test, and it shows identical results on line 224 as to those before it. That seems to indicate evaluation happens as usual at the very least.

I suppose we could have reasoned that away by realising that enabling overscan at the last second wouldn't appear to have any effect on sprites.

Nevertheless, I repeated the test with overscan enabled and 239 behaved the same, so there we are. Final line same as every other line, as far as we know.
paulb_nl wrote: The buffer used during in range evaluation would be different from the render line buffer. It would be like secondary OAM on the NES but it does not seem to be like the NES if we can affect sprites by writing to OAM during tile fetching.
Right. Evaluation buffers indexes only. Fetching uses those indexes to buffer some data from OAM and pattern from VRAM. The writes that happen during fetching affect the buffering at that point only.

Something like:
Eval -> OAMi -> Fetching -> OAM2 -> Render
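
A toy version of that pipeline, to show why a mid-line OAM write can still move a sprite: evaluation keeps indexes only, and fetching reads OAM again through those indexes. This is a deliberately simplified sketch (Y-only range test, fixed 8-pixel height, plain lists standing in for OAM), and `evaluate` and `fetch` are hypothetical names.

```python
def evaluate(oam_y, line, height=8, limit=32):
    """Eval -> OAMi: collect indexes of sprites covering `line`."""
    found = []
    for index, y in enumerate(oam_y):
        if (line - y) & 0xFF < height:   # simplified Y range test
            found.append(index)
            if len(found) == limit:      # 32 sprites per line at most
                break
    return found

def fetch(indexes, oam_x):
    """Fetching -> OAM2: re-read X for each stored index, in reverse."""
    return [(index, oam_x[index]) for index in reversed(indexes)]

# Sprites 3 and 7 cover line 14; everything else is far away.
oam_y = [200] * 128
oam_x = [0] * 128
oam_y[3], oam_x[3] = 10, 20
oam_y[7], oam_x[7] = 12, 50

indexes = evaluate(oam_y, 14)
oam_x[3] = 99                    # write to OAM after evaluation
buffered = fetch(indexes, oam_x) # fetch sees the new X for sprite 3
```

Because only indexes survive evaluation, the write to sprite 3 between the two stages changes what gets buffered for rendering.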

An edge case would be whether disabling a sprite during fetching would allow for another sprite or sprite tile to be displayed that would not otherwise be displayed, providing they had already passed evaluation?

I've only just realised after all this time that tile overflow cuts off the lowest indexed sprite instead of the highest. anomie_regs mentions it, but perhaps not as prominently as it should. I feel as though I've been working with a handicap by being unaware of that. With that scrap of information a lot of other things fall into place very easily. How frustrating.
paulb_nl wrote: I have read that it uses a 256 pixel line buffer.
A line buffer sounds conceptually far more awkward than a series of counters. It seems something of a leap to diverge from established canon. Where have you read such things? Is it from a patent? Is there reasoning? Is it..convincing?

As to functionality, the only relevant part might be the behaviour of sprite to sprite priority which always favours the combination of index and width over presence.

Let's see.. We have our buffer of up to 34 tile rows, each with an associated X, palette and priority. For a line buffer it would be incredibly wasteful to have copies of any of that data, so we can't do that. A better way might be to work by index again. For each of the 256 slots, use the entry with [index].

That might work but there are caveats. Setting eight slots per fetch is perhaps odd but might work, and backwards fetching perhaps makes it all seem too convenient. A limit of 34 is a big concern for indexes since it's such an awkward number. The resulting 5 * 256 seems excessive. X=256 isn't explained away as nicely either.

I suppose it wouldn't matter much for an implementation if the end result is the same.

Is there a simpler model?
creaothceann
Posts: 611
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: Internal OAM address sprite evaluation

Post by creaothceann »

Relevant links?
viewtopic.php?f=12&t=14467&p=211380
https://board.byuu.org/viewtopic.php?f=8&t=357&p=8007
http://board.zsnes.com/phpBB3/viewtopic ... 66#p204966 - someone (can't find it again) said it's hard to know if the info is accurate
paulb_nl
Posts: 32
Joined: Fri Nov 18, 2016 7:57 am

Re: Internal OAM address sprite evaluation

Post by paulb_nl »

Kingizor wrote: An edge case would be whether disabling a sprite during fetching would allow for another sprite or sprite tile to be displayed that would not otherwise be displayed, providing they had already passed evaluation?
That could be the case if the sprites are overlapping.
Kingizor wrote: A line buffer sounds conceptually far more awkward than a series of counters. It seems something of a leap to diverge from established canon. Where have you read such things? Is it from a patent? Is there reasoning? Is it..convincing?
I have seen multiple people say it on this forum. It also agrees with fetching the tiles in reverse so the lowest index sprite ends up on top of the higher index sprites.
Though using a line buffer does seem to require more memory. A line buffer requires 256x (4+3+2) = 2304 bits. A 34 tile buffer would require 34 x (8x4 +9+3+2) = 1564 bits. A tile buffer would also need to know the priority between sprites somehow.
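
The two totals can be checked with a few lines of arithmetic. The variable names are only for illustration; the bit widths are the ones given in the paragraph above.

```python
# Line buffer: one entry per pixel of a 256-pixel line.
PIXEL_BITS = 4 + 3 + 2                 # colour + palette + priority
line_buffer_bits = 256 * PIXEL_BITS

# Tile buffer: 34 tile rows, each with 8 pixels of 4 bits
# plus a 9-bit X, 3-bit palette and 2-bit priority.
TILE_ENTRY_BITS = 8 * 4 + 9 + 3 + 2
tile_buffer_bits = 34 * TILE_ENTRY_BITS

print(line_buffer_bits, tile_buffer_bits)
```

So the line buffer costs 740 bits more than the tile buffer under these assumptions, before counting whatever logic a tile buffer would need to resolve priority between sprites.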

Kingizor wrote: Let's see.. We have our buffer of up to 34 tile rows, each with an associated X, palette and priority. For a line buffer it would be incredibly wasteful to have copies of any of that data, so we can't do that. A better way might be to work by index again. For each of the 256 slots, use the entry with [index].

That might work but there are caveats. Setting eight slots per fetch is perhaps odd but might work, and backwards fetching perhaps makes it all seem too convenient. A limit of 34 is a big concern for indexes since it's such an awkward number. The resulting 5 * 256 seems excessive. X=256 isn't explained away as nicely either.
If you look at the pinouts of the PPUs there are pins for 4-bit sprite, 3-bit palette and 2-bit priority, so the line buffer would need to store those per pixel to send them to PPU2 during rendering.

The X=256 bug is described as counted as in range but not displayed so it would be explained by in-range scanning checking <=256 instead of <=255 and then during fetching stopping at 255. Tile rendering to the linebuffer has to check if the position of the current tile pixel does not exceed 255 anyway.
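
That hypothesis can be written down as two separate comparisons. This is only a sketch of the guess in the paragraph above, not confirmed hardware behaviour, and it covers the right-hand edge only: sprites with a negative X that wrap in from the left edge are deliberately ignored. The function names are hypothetical.

```python
def passes_range_check(x9):
    """Hypothesised scan-time test: <= 256 where <= 255 was intended."""
    return x9 <= 256

def drawn_columns(x9, width=8):
    """Columns a tile would reach in the line buffer, clipped at 255."""
    return [column for column in range(x9, x9 + width) if column <= 255]

for x9 in (255, 256, 257):
    print(x9, passes_range_check(x9), drawn_columns(x9))
```

Under this model a sprite at X=256 is counted as in range but contributes no pixels, while X=255 contributes a single column and X=257 is rejected outright.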
paulb_nl
Posts: 32
Joined: Fri Nov 18, 2016 7:57 am

Re: Internal OAM address sprite evaluation

Post by paulb_nl »

creaothceann wrote:Relevant links?
viewtopic.php?f=12&t=14467&p=211380
https://board.byuu.org/viewtopic.php?f=8&t=357&p=8007
http://board.zsnes.com/phpBB3/viewtopic ... 66#p204966 - someone (can't find it again) said it's hard to know if the info is accurate
Some nice info there. The info about the line buffer being 2x 128x9 is interesting.

I have looked at the die shots of PPU1 and PPU2 and counted the memory bits:

http://siliconpr0n.org/map/nintendo/s-ppu1/
PPU1
upper left is 64x32 + 64x32 = 4096 bits = 512 bytes of OAM
upper right is 32x20 + 32x16 + 32x16 + 32x20 = 2304 bits = 256x9 line buffer. Indeed 2x 128x9.

There is still some kind of memory between OAM and the line buffer and I am not sure how to count it but the top has 7 bit lines and bottom has 8 so the top could be 32x7 = Sprite index for in-range sprite and bottom 32x8 = High OAM.

http://siliconpr0n.org/map/nintendo/s-ppu2-b/
PPU2
lower left is 56x32 + 64x32 = 3840 bits = 480 bytes of CGRAM (256 colours x 15 bits)
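
The cell counts can be cross-checked against the expected memory sizes. The block dimensions are the ones counted from the die shots above; the interpretation of the two small PPU1 blocks (32x7 index cache, 32x8 high OAM) is still the guess from this post, and the variable names are only for illustration.

```python
# PPU1
oam_low_bits = 64 * 32 + 64 * 32                        # upper left
line_buffer_bits = 32 * 20 + 32 * 16 + 32 * 16 + 32 * 20  # upper right
index_cache_bits = 32 * 7    # guessed: in-range sprite indexes
oam_high_bits = 32 * 8       # guessed: high OAM table

# PPU2
cgram_bits = 56 * 32 + 64 * 32                          # lower left

print(oam_low_bits // 8, "bytes low OAM")
print(line_buffer_bits, "bits =", line_buffer_bits // 9, "x 9")
print(oam_high_bits // 8, "bytes high OAM")
print(cgram_bits // 15, "x 15 bits CGRAM")
```

A 7-bit entry is exactly enough to hold a sprite index from 0 to 127, which fits the idea that the cache stores indexes rather than sprite data.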
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Internal OAM address sprite evaluation

Post by tepples »

I was under the impression that Atari 7800 and Neo Geo also rendered sprites into a line buffer. Is that correct?

Would it be fair to say "S-PPU1 handles the sprites, and S-PPU2 handles the backgrounds and compositing"? Or "Splitting the PPU into two chips is a large part of what allows the Super NES to have more palettes, more background layers, and more flexible shadow/highlight than the Genesis"?
Kingizor
Posts: 24
Joined: Sat Jun 18, 2011 10:50 am

Re: Internal OAM address sprite evaluation

Post by Kingizor »

Well that's that cleared up. It's certainly not how I imagined it all, but it's much better to learn the correct way and the means of getting there. I'll have to re-evaluate some things.

There is one other loosely related thing I might as well ask:

There is a bug mentioned in one book (i.e. book1 2-25-2), that mentions tile overflow getting set erroneously in some circumstances. Something to the effect of sprite zero having a size larger than 8x8, being on-screen and having other sprites with negative positions. It's a bit vague and there are a few different ways to set up a configuration like that. I recall trying it one such way and not getting any results. I suppose it's possible it could have been the X=256 bug again and no-one realised at the time.

I've never seen it mentioned anywhere else, even for the purpose of debunking it. I don't think it exists as described, but I'm mainly asking because I have the feeling I've been missing a lot of obvious things lately.
That one is unusual. I'll have to study that too.. There is one thing in particular that I'm curious about.

It says sequence fetching begins at the start of the scanline, that there are 33 such sequence fetches and that rendering begins around two sequences in. I tend to agree with that, but it seems to omit specific timing for mode7 which doesn't use sequences.

33 sequences indicates a gap between sequence fetches and sprite fetching in the tile-based modes where no VRAM accesses occur. There is no point in buffering mode7 data, so it can be used almost immediately after it is fetched. Based on that, my model said rendering begins roughly around ~16, first mode7 nametable fetch on ~15, then first mode7 pattern fetch a dot later ~16.

If mode7 fetches happen around that time, then the sequence gap from ~264-271 doesn't exist in mode7 and VRAM is potentially occupied during that time too. There is a very subtle consequence here if anyone can spot it, perhaps more than one.

The only things I feel I haven't factored in are CGRAM accesses and the pixel pipeline, but I don't anticipate problems with either.

I wonder if any of that sounds unreasonable? Does anyone have insights or ideas regarding the timing of mode7 fetches relative to anything else? Perhaps there is a nice 100-page thread nearby devoted to it that I've overlooked?
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Internal OAM address sprite evaluation

Post by psycopathicteen »

paulb_nl wrote:I made a few different test roms to test this behaviour on a real SNES. It displays 2 rows of 32 8x8 sprites spread out over OAM and it writes to $2104 in a horizontal IRQ that can change position with the controller.

Test results:
Writes went to both low and high OAM, but most of the results below were tested by trying to write to the upper X-position bit in high OAM.

Cycles 0-255 OAM sprite in range scanning:
Writes go to sprite 0 through 127 as the PPU reads through OAM. Changing X or Y position will affect the current sprite.

One cycle after scanning is finished the OAM address is changed:
-If there were no sprites found on the current line and sprite pixel fetching has been done on previous lines then writes would go to the address of the last sprite fetched during pixel fetching.
-Otherwise writes would go to $200 or $202. Not sure why that varied.

Cycles ~272-339 Fetching pixels:
Writes go to sprites while the PPU is fetching pixels for the sprites found in range but this is done backwards so if sprites 0-31 were in range then writes would go to sprite 31 through 0.
So this means that after fetching pixels is finished a write will go to the first sprite in range and not the last.

Interestingly changing X or Y position of a sprite during fetching pixels will affect the sprite so it seems it is also reading OAM and checks if sprites are in range even during this process.

Maybe during range scanning it stores the sprite number/offset instead of the OAM data of the sprite in range? That way they wouldn't need to add memory for a 32-sprite cache.
How can you know if you're writing to X or Y?

Also has anybody figured out why weird sprite sizes like 16x32 and 32x64 show up in the two undocumented sprite sizes?
paulb_nl
Posts: 32
Joined: Fri Nov 18, 2016 7:57 am

Re: Internal OAM address sprite evaluation

Post by paulb_nl »

psycopathicteen wrote:
How can you know if you're writing to X or Y?
X is easy. Just write to OAM and you can change the upper X-position bit so the sprite goes offscreen.

For the Y position I wrote to OAM many times in a row and saw a sprite move down.