Poll: How would you prefer a first-person shooter on the NES to look like?

What kind of resolution would you prefer in an NES first-person shooter?

Lower resolution, occupying more screen space
Higher resolution, occupying less screen space, only if the frame rate doesn't drop
Higher resolution, occupying less screen space, even if the frame rate drops a bit
Higher resolution, occupying less screen space, even if the frame rate drops a lot
Total votes: 27

Posts: 12003
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Poll: How would you prefer a first-person shooter on the NES to look like?

Post by tokumaru » Tue May 05, 2020 1:26 pm

Bananmos wrote:
Tue May 05, 2020 11:21 am
Great to hear you're working on the raycaster again! I think it was such a cool project and would love to back a Kickstarter for the first fun'n'playable NES FPS. :)
Thanks! Let's keep our fingers crossed!
I think in cases like these, faking stuff is always better than brute-forcing it.
Yeah, I totally agree. I'm already faking quite a few things!
The key thing is cheating by using vertical symmetry to only render half the screen. Of course, vertical flipping of the background isn't totally free on a system like the NES which lacks BG tile flipping, and the feasibility heavily depends on whether you'll use an IRQ-capable cart or not.
Yeah, the symmetry trick isn't nearly as useful on the NES as it is on the SNES and on the Genesis. Plus, I am trying to go a little beyond what Wolfenstein 3D did, by adding some vertical movement and walls of variable height.
Untested, but as I've mentioned before, I think using the scroll registers to fake horizontal (and even partial vertical) movement has some great potential to create a smoother experience.
I guess you convinced me to give this a try, but I think that the flickering at the sides will be too distracting. To avoid the flickering and do this properly I'd have to render extra columns at the sides of the screen and also figure out a way to mask them until they need to be scrolled in, so unfortunately this trick isn't as trivial to implement as it sounds. I also worry about the disparity in smoothness when turning vs. walking forwards and backwards.
When you move forwards in an FPS the low resolution / FPS is usually not too distracting in my experience. It's when you try to turn / strafe that the chunkiness and low framerate become glaringly obvious.
I see. I will give this a try without bothering about the sides at first, and if the results are promising, I'll see if there are enough resources to do it the proper way.
Finally, as for sprite objects and clipping, just this week I have been playing around with making a sprite scaler based on Tepples's lookup table concept, but using unrolled loops for the vertical scaling in place of the DDA algorithm Tepples used. I've also chosen to overlap sprites vertically as well as horizontally, which greatly simplifies the loop unrolling problem - albeit at the expense of sprite assignment and OAM cycling being a lot more difficult, in order to make sure that blank "zombie pixels" in sprites that are not supposed to be visible any more never get assigned to horizontally higher priority sprites than those that have "live pixels" on the same scanline.
Interesting, I also remembered tepples's sprite scaling experiments when looking for a solution to my problem, but ended up not looking into them after all. I did think of approaches that would result in those dreaded "zombie pixels" you talked about, and couldn't find a satisfactory solution.

Currently, my (untested) solution for drawing sprites is to use pre-rendered tiles containing all 256 patterns of 2x4 soft-pixels of a single color + transparency. With sprite mirroring, all 256 patterns fit in only 76 tiles, so I can have 3 separate sets, one for each of colors 1, 2 and 3, occupying 228 tiles (actually 226, I don't need repeats of the transparent tile). This is cool because it gives objects access to the whole 12 colors of the sprites, they're not restricted to a single set of 3 colors, but the fact that each color needs to be rendered as a different set of sprites means that the graphics have to be carefully designed to avoid excessive overlap. Pre-scaling the sprites could help with optimizing the sprite arrangement and reducing overlap.
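
The 76-tile figure can be checked by brute force. The sketch below is only a sanity check, not part of the engine: it assumes a block 2 soft-pixels wide by 4 tall with one bit per soft-pixel, and it assumes "sprite mirroring" means the hardware horizontal and vertical flip bits. The function names are made up for illustration.

```python
from itertools import product

W, H = 2, 4  # assumed layout: 2 soft-pixels wide, 4 tall


def flips(pattern):
    """Return the pattern together with its H, V and H+V mirrored versions."""
    rows = [pattern[r * W:(r + 1) * W] for r in range(H)]
    h = tuple(p for row in rows for p in reversed(row))
    v = tuple(p for row in reversed(rows) for p in row)
    hv = tuple(p for row in reversed(rows) for p in reversed(row))
    return {pattern, h, v, hv}


def count_unique_tiles():
    """Count the patterns that cannot be obtained by flipping another pattern."""
    canonical = {min(flips(p)) for p in product((0, 1), repeat=W * H)}
    return len(canonical)


tiles_per_color = count_unique_tiles()
total_tiles = 3 * tiles_per_color - 2  # the all-transparent tile is shared by the 3 sets
print(tiles_per_color, total_tiles)
```

A 4-wide by 2-tall block gives the same count, since the two flips simply swap roles.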
This also made me think that only a subset of the columns / rows actually need updating each frame, and that temporal coherence could be taken advantage of in order to update only a subset of the scaled sprites data each frame.
That's a good point.
As long as you don't mind a *lot* of wasted PRG, you can actually get the clipping against 3d walls "for free" by combining the lookup table for the horizontal scaling with the table for horizontal clipping, and just point it at another 256-byte page. The lookup table and your scrambled tile data should still fit in a 16kB bank, which is all you really care about if it's a visual demo more than a size coding competition.
I'm not sure I follow the whole thing about the "free" clipping. BTW, the horizontal clipping is not much of an issue, you can just compare the distance to the wall and the distance to the object, and not draw that column of the object if the wall is closer. The problem is that when you have varying floor/ceiling heights, objects need to be clipped vertically, and that's a whole other can of worms!
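
As a rough illustration of the per-column depth test described above (a sketch only, not the engine's code): `wall_distance` is a hypothetical list holding one wall distance per screen column, as a raycaster would produce, and the function name and parameters are invented for this example.

```python
def visible_columns(object_distance, first_column, width, wall_distance):
    """Return the screen columns of an object that are not hidden behind a wall."""
    columns = []
    for x in range(first_column, first_column + width):
        on_screen = 0 <= x < len(wall_distance)
        # Draw the column only if the object is closer than the wall there.
        if on_screen and object_distance < wall_distance[x]:
            columns.append(x)
    return columns


# Hypothetical distances: a near wall covers columns 2-4.
wall_distance = [50, 50, 20, 20, 20, 60, 60, 60]
print(visible_columns(30, 1, 5, wall_distance))
```

This only handles whole columns; it does nothing for the vertical clipping problem with varying floor and ceiling heights.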
I'll try to find some time to clean up the code and do some more experiments in the next week or so.
I'd love to see what you have on sprite scaling.

I have the feeling that doing sprites "properly" will be way too slow for an FPS on the NES. Like almost everything else in this engine, this will have to be accomplished via a few tricks and a ton of lookup tables!

Posts: 1760
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: Poll: How would you prefer a first-person shooter on the NES to look like?

Post by Pokun » Thu May 14, 2020 5:10 am

Although I would probably need to test a demo to make a good judgement, my spontaneous impression is that performance is important in this case, no matter if it's an action game or a dungeon crawler. More so in an action game, but in a dungeon crawler it's also important not to lose orientation, which is easy to do if the 3D maze movement isn't smooth enough. Resolution should probably not be too low though, and like some others here said, I don't really mind a smaller screen with a frame either. For these reasons I voted for the second option, "Higher resolution, occupying less screen space, only if the frame rate doesn't drop".

All that said, like Bregalad I'm not really a fan of FPS games (Goldeneye 007 is a notable exception) and I hardly played either Wolfenstein 3D or Doom back in the day. I was only ever really interested in them for their technical aspects rather than their aesthetics, which never appealed to me. I'm still looking forward to seeing how this project turns out though.

Posts: 96
Joined: Sun Jan 12, 2020 8:42 pm

Re: Poll: How would you prefer a first-person shooter on the NES to look like?

Post by dink » Thu May 14, 2020 5:28 am

[x] Like Contra!!

Posts: 226
Joined: Fri Jan 24, 2014 9:05 am
Location: Hungary

Re: Poll: How would you prefer a first-person shooter on the NES to look like?

Post by za909 » Fri May 15, 2020 1:08 am

I wonder if lighting like this is possible on the NES. It really adds a lot to the mood of the level, like you are in a scary, enclosed space: https://m.youtube.com/watch?v=ieCqOjwh3fA

Posts: 12003
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Poll: How would you prefer a first-person shooter on the NES to look like?

Post by tokumaru » Fri May 15, 2020 1:45 am

za909 wrote:
Fri May 15, 2020 1:08 am
I wonder if lighting like this is possible on the NES.
I guess it should be possible to do something similar to what Doom does: https://doomwiki.org/wiki/COLORMAP

Apparently the Doom engine uses several lookup tables, each simulating a different brightness level, but still only using a total of 256 colors.
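
To make the idea concrete, here is a minimal sketch of brightness lookup tables over a fixed palette. It is not Doom's actual COLORMAP data or generation code; the 4-entry greyscale palette, the linear darkening and the function names are all assumptions chosen to keep the example small.

```python
def nearest_index(palette, rgb):
    """Index of the palette entry closest to rgb (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(palette[i], rgb)))


def build_colormaps(palette, levels):
    """One table per brightness level, mapping palette index -> palette index."""
    maps = []
    for level in range(levels):
        scale = (levels - level) / levels  # level 0 is full brightness
        table = []
        for r, g, b in palette:
            darker = (r * scale, g * scale, b * scale)
            table.append(nearest_index(palette, darker))
        maps.append(table)
    return maps


# Hypothetical 4-entry palette, far smaller than Doom's 256 colors.
palette = [(0, 0, 0), (85, 85, 85), (170, 170, 170), (255, 255, 255)]
colormaps = build_colormaps(palette, 4)
for level, table in enumerate(colormaps):
    print(level, table)
```

Drawing a pixel at a given light level is then a single table lookup, `colormaps[level][color]`, and no new colors are ever introduced.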

I, however, have only 16 colors, which are actually dithered patterns built from only 4 colors, making it that much harder to create convincing gradients into darkness. This would also lock the engine to a specific set of 4 colors, while with the setup I'm currently using I am able to change the palette as the player moves through the level.

Here's an interesting raycasting engine for the Amstrad CPC that implements lighting effects: https://youtu.be/TbUWK461Vkk

It's kinda cool, but also a bit disorienting. I'm not sure this is the best route to take when the number of colors we have is so limited.

Posts: 534
Joined: Wed Mar 09, 2005 9:08 am

Re: Poll: How would you prefer a first-person shooter on the NES to look like?

Post by Bananmos » Sat Jun 27, 2020 4:24 pm

I'd love to see what you have on sprite scaling.
Sorry for the silence. It took a while longer than I predicted to get on top of this.

I do however think I've got a pretty nifty variation on Tepples's original scaling method. But on the other hand, given its main drawback I'm not so sure it will be useful to your application...

For the cutscene composer I'm working on, I was using the Ninja Gaiden artwork for testing. And I never before appreciated just how elaborate its palette usage actually is. The face in the first cutscene for example, uses 3 palette variations just to allow a 4th color to be used within the 8x8 tile, and then a sprite overlay on top of that for the eye, maxing out the sprite palette.

Anyway, I was thinking I really wanted a version of sprite scaling that could accommodate these sorts of multi-palette sprites. Additionally, I wanted to see if Tepples's sprite scaling could be optimised a bit to run more efficiently, and if there was an opportunity to skip unnecessary updates to CHR-RAM.

The horizontal scaling is done using a lookup table, just as in Tepples's version. But for the vertical part, I've chosen to employ the same system of overlapping the sprites, and to use unrolled loops to scale the sprites vertically. For the unrolled loops I chose to have them write directly to CHR-RAM, mainly because my use case is nearly-full-screen images with scanline effects applied, so the trade-off of sacrificing general frame time for a more efficient vblank period didn't feel like the right one... Also, I figured that brute-force scaling updates of big sprites could potentially require more than the standard ~240 bytes that a transfer buffer in the stack can provide. Finally, using the auto-increment feature of $2007 allowed for more efficient register usage, and pha was not an option given that I was using the stack for efficient hand-over between vertical scalers.

Here's one of the unrolled loops:
; 4 cycles
sta $2007

; 8 cycles

.ALIGN $100
.proc SpriteScalingVertical_8_5
; Scale 8 -> 5
; X.X.XX.X
.repeat 2, I
; Use sliver 0
ldy SourceTilesS0 + 256*I,x
lda (ssHScaleTabAddr),y
sta $2007
; -> 4+5+4 = 13 cycles (3+2+3 = 8 bytes)
; Skip sliver 1
; [...]
; Use sliver 2
ldy SourceTilesS2 + 256*I,x
lda (ssHScaleTabAddr),y
sta $2007
; Skip sliver 3
; [...]
; Use sliver 4-5
ldy SourceTilesS4 + 256*I,x
lda (ssHScaleTabAddr),y
sta $2007
ldy SourceTilesS5 + 256*I,x
lda (ssHScaleTabAddr),y
sta $2007
; Skip sliver 6
; [...]
; Use sliver 7
ldy SourceTilesS7 + 256*I,x
lda (ssHScaleTabAddr),y
sta $2007
; 3 empty lines
lda #0
sta $2007
sta $2007
sta $2007
.endrepeat
; Return code
; Total cycles: 2*(5 * 13 + 2 + 3*4) + 8 = 2*79 + 8 = 166
; Total bytes: 2*(5 * 8 + 2 + 3*3) + 2 = 2*51 + 2 = 104
.endproc
The way these loops are executed in vblank is also worth mentioning: To keep logic minimal, I employ what I call a "call chain" on the stack, consisting of 2*MAX_ROWS bytes, where each pair of bytes denotes an RTS return address to the next unrolled loop, for an almost-instant handover. The actual address is then set depending on the vertical scale factor for each row. To accommodate longer/shorter columns and / or partially updated columns, the stack pointer can be set to start at a different point in the call chain, and an address in the call chain can be overwritten with the final return address to end the column updating prematurely.
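
The control flow of such a call chain can be modelled in a few lines. This is only a high-level simulation, not 6502 code: Python callables stand in for the unrolled loops, a list stands in for the return addresses on the stack, `None` plays the role of the final return address, and all names and row heights are hypothetical.

```python
def make_scaler(src, dst, log):
    """Stand-in for one unrolled loop such as SpriteScalingVertical_8_5."""
    def scaler():
        log.append(f"{src}->{dst}")
    return scaler


def run_chain(chain, start=0):
    """Emulate successive RTS hand-overs: take an entry, run it, move on."""
    sp = start
    while chain[sp] is not None:  # None marks the final return address
        chain[sp]()
        sp += 1


log = []
row_heights = [8, 5, 5, 3]  # hypothetical vertical scale factor per row
chain = [make_scaler(8, h, log) for h in row_heights] + [None]

run_chain(chain)            # update a full column
run_chain(chain, start=2)   # partial update: start further down the chain
chain[3] = None             # end the column early by overwriting one entry
run_chain(chain, start=1)
print(log)
```

On the real hardware there is no dispatch loop at all: each unrolled routine ends in an RTS that pulls the next address directly, which is what makes the hand-over nearly free.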

But even with the unrolled loops and selective updates of only the columns and rows that actually changed in the frame, the brute-force method was still taking almost half the frame time for ~60 sprites, which made it not very useful for practical scenarios. So I realised to get this thing off the ground I would have to really embrace the NES philosophy of just doing small incremental updates to VRAM...

The core problem here is that your default integer-rounding for deciding horizontal / vertical scale factors for each tile will end up with tile sizes oscillating between the current / next integer scale factor. This pattern can be observed in the left version of the following gif animation, which illustrates both the column / row scale factors, as well as overlaying red transparency on the "dirty" columns and rows.
The left-hand version thus means invalidating the heavy work we just did, just to re-do it again soon. Not ideal.

The right-hand version OTOH manages to minimise the updates to just one dirty row / column at a time, by using a "scaling progression" table for the columns / rows that essentially increases the size of just one column / row at a time according to a set pattern. Whilst I thought a bit about how to best generate this table, I just ended up manually writing it up to be visually appealing. Basically, it just picks columns / rows to increase to the next tile size, depending on the column width / row height for the sprite. Here are the hard-coded patterns I chose for a column or row size of 1-10:

 1: 0,
 2: 1, 0,
 3: 1, 0, 2,
 4: 2, 0, 3, 1,
 5: 2, 0, 4, 1, 3,
 6: 3, 0, 5, 2, 4, 1,
 7: 3, 0, 6, 2, 5, 1, 4,
 8: 4, 0, 2, 6, 5, 7, 1, 3,
 9: 4, 0, 8, 2, 6, 1, 7, 3, 5,
10: 5, 0, 2, 8, 6, 3, 9, 1, 7, 4
Basically, I just tried to start by incrementing the middle column first, and then successively try to increment the columns / rows that have the furthest distance to already-incremented neighbours, to try and manually "smooth out" the scaling.
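
The effect of the table can be illustrated with a small model. The table values below are copied from the post; everything else is an assumption made for the sketch: the naive version distributes pixels with plain integer rounding, a sprite's size along one axis is given as a total in pixels, and the function names are hypothetical.

```python
PROGRESSION = {
    1: [0],
    2: [1, 0],
    3: [1, 0, 2],
    4: [2, 0, 3, 1],
    5: [2, 0, 4, 1, 3],
    6: [3, 0, 5, 2, 4, 1],
    7: [3, 0, 6, 2, 5, 1, 4],
    8: [4, 0, 2, 6, 5, 7, 1, 3],
    9: [4, 0, 8, 2, 6, 1, 7, 3, 5],
    10: [5, 0, 2, 8, 6, 3, 9, 1, 7, 4],
}


def column_sizes_progressive(n, total):
    """Every column gets the base size; the first `extra` table entries get +1."""
    base, extra = divmod(total, n)
    sizes = [base] * n
    for col in PROGRESSION[n][:extra]:
        sizes[col] += 1
    return sizes


def column_sizes_naive(n, total):
    """Plain integer rounding of each column boundary."""
    return [(i + 1) * total // n - i * total // n for i in range(n)]


def dirty_columns(sizes_fn, n, lo, hi):
    """Number of columns that change size at each one-pixel growth step."""
    counts = []
    prev = sizes_fn(n, lo)
    for total in range(lo + 1, hi + 1):
        cur = sizes_fn(n, total)
        counts.append(sum(a != b for a, b in zip(prev, cur)))
        prev = cur
    return counts


print(max(dirty_columns(column_sizes_naive, 8, 8, 64)))
print(max(dirty_columns(column_sizes_progressive, 8, 8, 64)))
```

In this model the progressive version never dirties more than one column per step, while the naive rounding regularly changes several at once.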

This shows the same naive scaling on the left and the scaling progression table on the right, without the debug visualisations:
The right-side does end up looking different than the left, but not observably worse AFAICT. Both of the methods are kind of equally wrong, as the "proper" way would be to increase the pixel sizes, given infinite resolution.

There's also yet another trick I tried employing to minimise the VRAM bandwidth: If your scaling happens to just occasionally dirty both the column and the row in the same frame, you can delay one of them, with the only artifact being that the growing / shrinking of your sprite is delayed by a single frame, which won't be noticeable in practice. Essentially, the desired x/y scale of a sprite becomes a target value to attempt to reach within a set bandwidth limit.
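
A minimal sketch of treating the desired scale as a target under a bandwidth limit might look like this. The budget of one dirty axis per frame, the choice to favour the x axis when both are dirty, and the function name are assumptions for illustration only.

```python
def step_towards(current, target, budget=1):
    """Move (x_scale, y_scale) one step towards target, touching at most `budget` axes."""
    cx, cy = current
    tx, ty = target
    used = 0
    if cx != tx and used < budget:
        cx += 1 if tx > cx else -1
        used += 1
    if cy != ty and used < budget:  # delayed to a later frame if the budget is spent
        cy += 1 if ty > cy else -1
        used += 1
    return (cx, cy)


scale = (10, 10)
frames = []
while scale != (12, 11):
    scale = step_towards(scale, (12, 11))
    frames.append(scale)
print(frames)
```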

Finally, the big caveat of this method is the blank "zombie" pixels of vertically downscaled tiles, which overlap with the pixels in the tile below and would cause the sprites-per-scanline limit to kick in. Getting rid of this requires sorting the sprites so that zombie pixels always have a higher sprite index than the non-zombie ones. Here's another animation to illustrate it. (yellow parts are the blank zombie pixels that we want to be dropped before any other sprites)
Now in practice, a sorting routine would be too expensive on the NES. But merging two originally sorted lists is a linear operation. So essentially you just need your sprite drawing code to work on a per-row basis, and select a row from the metasprite whose partial vertically-scaled tiles have the lowest y-coordinate for its bottom edge.
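
The linear merge itself can be sketched as follows. This only shows the merge step under stated assumptions: each metasprite's rows are represented as hypothetical `(bottom_y, name)` pairs that are already sorted by bottom edge, and how the merged order is mapped onto OAM indices is left out.

```python
def merge_rows(rows_a, rows_b):
    """Merge two row lists, each already sorted by bottom edge, in linear time."""
    merged = []
    i = j = 0
    while i < len(rows_a) and j < len(rows_b):
        # Always take the row whose bottom edge has the lower y-coordinate.
        if rows_a[i][0] <= rows_b[j][0]:
            merged.append(rows_a[i])
            i += 1
        else:
            merged.append(rows_b[j])
            j += 1
    merged.extend(rows_a[i:])
    merged.extend(rows_b[j:])
    return merged


# Hypothetical bottom-edge coordinates for the rows of two metasprites.
rows_a = [(45, "A0"), (50, "A1"), (55, "A2")]
rows_b = [(48, "B0"), (56, "B1")]
print(merge_rows(rows_a, rows_b))
```

Each row is visited exactly once, so the cost grows with the total number of rows rather than with a full sort.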

But this caveat is also the big drawback of this method. Not only does it add some awkward complexity to your code and put limitations on your OAM cycling schemes, but it also prevents you from using the sprite index number to have metasprites appear in front of / behind each other based on Z-order.

And given that the top use case for sprite scaling is to simulate a 3d scenario, that's a pretty tough sacrifice to make to get smooth sprite scaling... :(

My own use-case for making deterministic cutscenes with a lot of room left for raster effects means that the lower CPU time and VRAM bandwidth wins out over depth-ordering features. But unless you can design your ray caster levels to avoid having enemies that can go in front of / behind each other - a very limiting constraint - then you might be better off just using prescaled sprites and accepting choppier scaling.

Reverting to Tepples's original scheme is also possible, but then you're back to using a lot of VRAM writes, as a single dirty row will effectively invalidate all other rows below it by default. Though the scaling progression table should still help alleviate *some* of this cost.

Another downside compared to Tepples's method is that it doesn't work so well for simulating rotation with sprite shearing. In Tepples's method you just rewrite rows instead of shifting the sprite grid vertically, so shifting the whole thing horizontally works fine without any artifacts. But trying to shift the grid both vertically and horizontally will introduce ugly holes in your sprite as can be seen in these images.
[Attachment: SpriteShearing.png]
[Attachment: SpriteShearingDebug.png]
You can probably get around this problem, though, by introducing a default horizontal overlap of 1 pixel for your sprite rows, at the expense of increasing the number of sprites used by one. It won't help the Ninja Gaiden face example though, as it already maxes out the sprites / scanline usage. :P

Finally, here is a simple demo rom using it in practice. It actually uses forced blanking at the top of the screen and the CHR-RAM writes are just slightly too long to fit in vblank due to the unrolled loop writing directly to CHR-RAM. But if you converted the unrolled loops to a version that fills a transfer buffer, it would definitely fit in the vblank period.

(edit: Replaced previously uploaded buggy demo ROM with one which has correct sprite indexing and also avoids the OAM corruption bug on NTSC)

A few other variations I thought of:
- If you have WRAM on your cart, it might make more sense to do your sprite scaling in two stages: The horizontal scaling updating dirty columns in WRAM, and the vertical scaling unrolled-loops then writing the data of the already-horizontally-scaled columns (if dirtied) to VRAM. Removing the horizontal scaling from the critical path effectively makes the VRAM writes as cheap as using a transfer buffer, but you also gain the benefit of re-using the work done for horizontal scaling, which should be a nice performance gain.

- I'd like to do an 8x16 sprite version of these unrolled loops at some point, not least because it's a bit cumbersome to have to track dirty rows / tile sizes in two separate bytes to accommodate. It'd also be nice to display a scaling 128-tile metasprite using this method, though it would require most of the vblank time and blanking the top / bottom 8 pixels to achieve.

- I think an unrolled version of Tepples's original version would be worth exploring as well at some point, due to the benefits it has over sprite overlapping. But the unrolled loop would be a bit more complicated, because the planes in the NES CHR format are interleaved. You essentially need to have your unrolled loop work on one bitplane of a tile at a time, and use one of the index registers to fetch source CHR data from a variable point. If you want to avoid using indexed-indirect for this source CHR data and use a fixed address, then this also limits the tiles a single unrolled 1bpp-loop can address to 256/8 = 32 tiles in height, which could be limiting. Though if your memory organisation so allows, you could pack that source CHR data into the start of each bank and employ bank switching to have your fixed $8000 fetch data from different PRG-ROM locations.
Last edited by Bananmos on Sun Jun 28, 2020 11:20 am, edited 2 times in total.

Posts: 534
Joined: Wed Mar 09, 2005 9:08 am

Re: Poll: How would you prefer a first-person shooter on the NES to look like?

Post by Bananmos » Sat Jun 27, 2020 4:30 pm

I'm not sure I follow the whole thing about the "free" clipping.
So what I meant here was that with the unrolled loop doing the "lda (ssHScaleTabAddr),y", you can just have your ssHScaleTabAddr point to a lookup table that combines *both* the scaling and a clipping by x pixels to the left / right into a single lookup table.

It does increase the size of your lookup tables quite drastically though, so it's definitely not free in terms of ROM space.
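
As a rough sketch of what such a combined table could contain: the code below assumes one lookup per 8-pixel, 1-bit sliver byte (bit 7 being the leftmost pixel), nearest-neighbour downscaling, and clipping from the left only. The function names and the table layout are hypothetical, not taken from the actual demo.

```python
def scale_sliver(byte, width):
    """Nearest-neighbour scale of an 8-pixel, 1-bit sliver down to `width` pixels."""
    out = 0
    for x in range(width):
        src = x * 8 // width            # source pixel for output pixel x
        bit = (byte >> (7 - src)) & 1   # bit 7 is the leftmost pixel
        out |= bit << (7 - x)
    return out


def build_table(width, clip_left=0):
    """256-entry page: scale to `width`, then blank the leftmost `clip_left` pixels."""
    mask = 0xFF >> clip_left
    return bytes(scale_sliver(b, width) & mask for b in range(256))


# One 256-byte page per (width, clip) combination is where the ROM cost comes from.
tables = {(w, c): build_table(w, c) for w in range(1, 9) for c in range(w + 1)}
print(len(tables), "pages,", 256 * len(tables), "bytes")
```

Selecting a different clip amount then only means pointing the table address at a different page; the unrolled loop itself stays unchanged.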
