2bpp bullet rendering

Discussion of hardware and software development for Super NES and Super Famicom. See the SNESdev wiki for more information.

psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

2bpp bullet rendering

Post by psycopathicteen »

I have an idea for rendering bullets for a bullet hell shmup. Store 2bpp bullet sprite patterns and their transparency masks in ROM, pre-shifted to each of the 8 horizontal pixel offsets. One row of 8 pixels would work like this:

lda buffer,y
and mask,x
ora pattern,x
sta buffer,y

That's already 8 pixels in ~20 cycles. I'm not too sure if that's enough though.
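
A minimal Python model of the mask-then-pattern step above, for readers who want to check the bit logic off-target. It is a sketch under assumptions, not the poster's code: it handles a single bitplane, treats each pre-shifted row as 16 pixels wide (a bullet shifted 1 to 7 pixels spills into the next tile), and puts the leftmost pixel in bit 15. The names `preshift_row` and `blit_row` are made up for illustration.

```python
def preshift_row(pattern, opaque):
    """Return 8 (mask16, pattern16) pairs, one per horizontal pixel offset.

    pattern: 8-bit row of one bitplane of the bullet.
    opaque:  8-bit row with 1 wherever the bullet covers the background.
    """
    table = []
    for shift in range(8):
        pattern16 = ((pattern & opaque) << 8) >> shift
        # The mask keeps background bits (1) and clears bits under the bullet (0).
        mask16 = ~((opaque << 8) >> shift) & 0xFFFF
        table.append((mask16, pattern16))
    return table


def blit_row(buffer_word, mask16, pattern16):
    """Same logic as: lda buffer / and mask / ora pattern / sta buffer."""
    return (buffer_word & mask16) | pattern16


TABLE = preshift_row(pattern=0b00111100, opaque=0b01111110)
```

Because the shifting happens when the table is built, the per-row work at run time stays at one AND and one OR, which is the point of storing 8 copies in ROM.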
darryl.revok
Posts: 520
Joined: Sat Jul 25, 2015 1:22 pm

Re: 2bpp bullet rendering

Post by darryl.revok »

I'm not sure if I understand the idea for this. If your bullets are sprites, why do you need to update the pattern? Wouldn't it be easier to have a single sprite shared between as many bullets as you want, and move the bullets on-screen via object position?

You're going to need a value with which to calculate collisions anyway. Perhaps I'm missing something, but the explanation is very vague.

It could be that I'm missing something because this is SNES and not NES, but you can still move sprites on SNES, right?

If you can figure out what they did for Aero Fighters, from what I've seen I feel like that game has probably the best bullet:slowdown ratio for the console.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: 2bpp bullet rendering

Post by Drew Sebastino »

darryl.revok wrote: I'm not sure if I understand the idea for this. If your bullets are sprites, why do you need to update the pattern?
But they aren't sprites. :wink: It's BG3 being used as a screen with a bunch of bullets being "painted" on it. The reason you'd do this is to avoid the sprite limit and the sprite-pixels-per-scanline limit. It's really a shame that OAM can only be updated during vblank; I think that's because sprites are being drawn to a line buffer the rest of the time?

Wait... Couldn't you just write to OAM during active display? I must be missing something, because this is way too obvious. I just don't see how OAM could be used during hblank and active display, because I thought I remembered hearing that sprites are drawn first and then BGs, or something like that.

Sorry for derailing this already. Sprites are really one of those situations where I'm not completely sure why they aren't just meant to be CPU driven, as in it wouldn't just have the same number of sprites per scanline as the total and you'd just multiplex it. Doesn't the Amiga actually work this way? Oh wait, I just said I was sorry for interrupting this. :lol:
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: 2bpp bullet rendering

Post by Sik »

Honestly, if I were to make a bullet hell for a 4th gen platform, I'd try to come up with regular patterns that are easy to recreate (e.g. by using tilemaps). As a bonus, it simplifies collision calculations (e.g. if there's a row of bullets, first check if the ship is inside the row, then check whether that position within the row is a bullet or a gap).
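
The two-step check for a regular row of bullets could look roughly like this in Python. Everything here is a hypothetical illustration: the function name, the parameters, and the assumption that the row is horizontal with square bullets at a fixed spacing are not from the post.

```python
def hits_bullet_row(ship_x, ship_y, row_x, row_y, count, spacing, size):
    """Return True if the point (ship_x, ship_y) touches a bullet in the row.

    The row starts at (row_x, row_y) and holds `count` square bullets of
    `size` pixels, one every `spacing` pixels (spacing > size leaves gaps).
    """
    # Step 1: is the ship inside the row's bounding box at all?
    if not (row_y <= ship_y < row_y + size):
        return False
    offset = ship_x - row_x
    if not (0 <= offset < count * spacing):
        return False
    # Step 2: within one repeat of the pattern, is this a bullet or a gap?
    return offset % spacing < size
```

The cost is constant per row no matter how many bullets the row contains, which is where the saving over per-bullet checks would come from.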
UnDisbeliever
Posts: 123
Joined: Mon Mar 02, 2015 1:11 am
Location: Australia (PAL)

Re: 2bpp bullet rendering

Post by UnDisbeliever »

psycopathicteen wrote:That's already 8 pixels in ~20 cycles. I'm not too sure if that's enough though.
Umm, according to the bsnes/higan source it would be 24 cycles if the Accumulator & Index registers are 16 bits long.


Anyway, I thought about coding a bullet hell game. Let me find my notes.

There would be 2 buffers, one for player bullets, one for enemy bullets. Each buffer would be a 1bpp bitmap, 256x192 px in size (6144 bytes). Bullets would be a single pixel in size.

Some VMAIN magic would combine the two buffers into a single 2bpp tileset.

Transfer One: player bullet buffer DMA DMAP_TRANSFER_1REG to VMDATAL with VMAIN set to $04 (increment on VMDATAL, 8 bit address shift).
Transfer Two: enemy bullet buffer DMA DMAP_TRANSFER_1REG to VMDATAH with VMAIN set to $84 (increment on VMDATAH, 8 bit address shift).

I never actually implemented this. My napkin math suggested that I would not have been able to fit 250 bullets and 10 enemies onto the screen at 30fps (I lost that sheet and I can't remember how I reached that conclusion).
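
For anyone trying to picture the two transfers, here is a small Python model of what they would leave in VRAM. It is a sketch that assumes the commonly documented behaviour of VMAIN remapping mode 1 (word address bits `aaaaaaaaYYYccccc` become `aaaaaaaacccccYYY`) and the usual 2bpp tile layout (8 words per tile, low byte = bitplane 0, high byte = bitplane 1). The function names are made up.

```python
WIDTH_BYTES = 32   # 256 pixels / 8 pixels per byte
HEIGHT = 192
SIZE = WIDTH_BYTES * HEIGHT  # 6144 bytes per 1bpp buffer


def remap_2bpp(word_addr):
    """Model VMAIN address remapping mode 1: rotate the low 8 bits left by 3."""
    low = word_addr & 0xFF
    rotated = ((low << 3) | (low >> 5)) & 0xFF
    return (word_addr & 0xFF00) | rotated


def combine(player, enemy):
    """Return the VRAM words left behind by the two DMA passes.

    Pass one writes player bytes to the low half of each word (VMDATAL),
    pass two writes enemy bytes to the high half (VMDATAH).
    """
    vram = [0] * SIZE
    for i in range(SIZE):
        vram[remap_2bpp(i)] = player[i] | (enemy[i] << 8)
    return vram
```

With this mapping, linear byte 321 (row 10, byte column 1) lands at word 266, which is row 2 of tile 33, so the linear bitmap comes out as tiles without any CPU-side shuffling.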

Draw Bullet Code:

Code:

.A8
.I16
; DP = bullet address

    LDA z:Bullet::xPos
    AND #$07
    TAY             ; with .I16, TAY copies all 16 bits, so the hidden high byte of A must be 0 here

    LDA z:Bullet::xPos
    LSR
    LSR
    LSR
    STA tmp

    REP #$30
.A16
    LDA z:Bullet::yPos
    AND #$00FF
    XBA
    LSR
    LSR
    LSR
    ; C always clear
    ; value at address tmp+1 is always 0
    ADC tmp
    TAX

    ; Y = (xPos & 7)
    ; X = yPos * 32 + xPos / 8
    SEP #$20
.A8
    LDA buffer, X
    ORA SetBulletTable, Y
    STA buffer, X

Code:

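SetBulletTable:
    ; Entry i sets bit i. The PPU displays bit 7 of a bitplane byte as the
    ; leftmost pixel, so tiles built this way would appear mirrored unless
    ; the tilemap entries are H-flipped.
.repeat 8, i
    .byte 1 << i
.endrepeat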
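
The index math in the draw routine can be checked with a quick Python model. This is only a sketch: it mirrors the shifts in the assembly (high byte swap, then three right shifts, giving yPos * 32) and keeps the same bit convention as SetBulletTable (bit i for xPos & 7 == i). The function names are hypothetical.

```python
ROW_BYTES = 32  # 256 pixels / 8 pixels per byte


def bullet_address(x_pos, y_pos):
    """Return (byte offset into the 1bpp buffer, bit number within that byte)."""
    # XBA followed by three LSRs: (yPos << 8) >> 3 == yPos * 32
    offset = (((y_pos & 0xFF) << 8) >> 3) + (x_pos >> 3)
    bit = x_pos & 7
    return offset, bit


def draw_bullet(buffer, x_pos, y_pos):
    """Set one pixel, like LDA buffer,X / ORA SetBulletTable,Y / STA buffer,X."""
    offset, bit = bullet_address(x_pos, y_pos)
    buffer[offset] |= 1 << bit
```

For a 256x192 buffer the largest offset is 191 * 32 + 31 = 6143, so the result always stays inside the 6144-byte buffer.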
Collision code would have been pixel perfect:

Code:

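.A16
.I16
Check_8x8Collision:
    ; X = frame collision data offset + (xPos & 7) * 2
    ; Y = yPos * 32 + xPos / 8

.repeat 8, i
    LDA buffer + i * 32, Y                 ; 16 bullet pixels from row i
    AND frameCollisionData + i * 8 * 2, X  ; row i of the mask (8 words per row)
    BNE CollisionOccurred
.endrepeat

    ; no collision
    RTS             ; return so execution does not fall through into CollisionOccurred

CollisionOccurred:
    ; collision code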

Code:

; CollisionData
; -------------
.macro _buildRow data
    .repeat 8, i
        .word data << i
    .endrepeat
.endmacro

CollisionDataFrame1:
    _buildRow %00011000
    _buildRow %00011000
    _buildRow %00111100
    _buildRow %00111100
    _buildRow %00111100
    _buildRow %00111100
    _buildRow %01111110
    _buildRow %11111111
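
A Python model of the same pixel-perfect test may help when checking the mask tables by hand. It is a sketch under assumptions: the buffer uses the bit i = pixel i convention of SetBulletTable, the 16-bit load is little-endian, and the buffer has at least one spare byte past the last row read. `build_row` mirrors the `_buildRow` macro; the other names are made up.

```python
def build_row(data):
    """Eight 16-bit copies of an 8-bit row, shifted left by 0..7 pixels."""
    return [(data << i) & 0xFFFF for i in range(8)]


FRAME1 = [build_row(r) for r in (
    0b00011000, 0b00011000, 0b00111100, 0b00111100,
    0b00111100, 0b00111100, 0b01111110, 0b11111111,
)]


def check_8x8_collision(buffer, frame, x_pos, y_pos):
    """Return True if any bullet pixel overlaps the 8x8 collision mask."""
    offset = y_pos * 32 + (x_pos >> 3)
    shift = x_pos & 7
    for i in range(8):
        addr = offset + i * 32
        word = buffer[addr] | (buffer[addr + 1] << 8)  # 16-bit little-endian load
        if word & frame[i][shift]:
            return True
    return False
```

Each row costs one load, one AND and one branch regardless of how many bullets are nearby, which is why the bitmap approach scales well with bullet count.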
EDIT: Added info about combining buffers.
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: 2bpp bullet rendering

Post by psycopathicteen »

I was basically thinking of doing this.

Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it. (Or underneath it, wait I need to check how priorities work again) The whole game would probably run at 30fps, alternating between updating the normal sprites and backgrounds, and updating bullets. It would need the screen to be cropped at 184 pixels in order to fit a whole 2bpp screen in one frame.
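
The 184-line figure can be sanity-checked with some rough arithmetic in Python. The constants are assumptions, not numbers from the post: an NTSC frame of 262 scanlines, roughly 165 bytes of DMA per scanline, and every line outside the visible area available for DMA, with setup overhead and other vblank work ignored.

```python
SCANLINES_PER_FRAME = 262      # NTSC (assumption)
DMA_BYTES_PER_SCANLINE = 165   # approximate DMA throughput (assumption)


def tile_bytes_2bpp(width_px, height_px):
    """Bytes of 2bpp tile data needed to cover the area (16 bytes per 8x8 tile)."""
    return (width_px // 8) * (height_px // 8) * 16


def blank_dma_budget(visible_lines):
    """Bytes that fit in the lines not being displayed."""
    return (SCANLINES_PER_FRAME - visible_lines) * DMA_BYTES_PER_SCANLINE


needed = tile_bytes_2bpp(256, 184)
budget = blank_dma_budget(184)
fits = needed <= budget
```

Under these assumptions a 256x184 screen needs 11776 bytes against a budget of 12870, while 192 lines would need 12288 bytes against 11550, so 184 is the tallest multiple of 8 that fits in one frame.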
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: 2bpp bullet rendering

Post by psycopathicteen »

This is 256 bullets moving at 20fps.
Attachments
bullet hell.zip
(2.3 KiB) Downloaded 231 times
UnDisbeliever
Posts: 123
Joined: Mon Mar 02, 2015 1:11 am
Location: Australia (PAL)

Re: 2bpp bullet rendering

Post by UnDisbeliever »

psycopathicteen wrote:This is 256 bullets moving at 20fps.
Nice.
The movements are smoother than I expected them to be.

Are you going to do more with this?
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: 2bpp bullet rendering

Post by psycopathicteen »

Umm, according to the bsnes/higan source it would be 24 cycles if the Accumulator & Index registers are 16 bits long.
At first I didn't know what you were talking about, but then I found this at http://www.defence-force.org/computing/ ... /annexe_2/:
3) Add 1 cycle if adding index crosses a page boundary
I seriously never knew that. Surprisingly the long index addressing doesn't have a similar limitation. I wonder if that was just something left over from the 6502, because I don't see why a CPU with a 16-bit ALU would need to do that.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: 2bpp bullet rendering

Post by Drew Sebastino »

psycopathicteen wrote: I was basically thinking of doing this. Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it.
I think 93143 is doing the same thing.
psycopathicteen wrote: It would need the screen to be cropped at 184 pixels in order to fit a whole 2bpp screen in one frame.
Why not double buffer?
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: 2bpp bullet rendering

Post by Sik »

Presumably you need memory for everything else too. (also could be referring to transfer bandwidth)

Also I was thinking, most bullet hells are vertical. You could probably just use that as an excuse to render only half the screen (the extra space would presumably be used for the HUD).
UnDisbeliever
Posts: 123
Joined: Mon Mar 02, 2015 1:11 am
Location: Australia (PAL)

Re: 2bpp bullet rendering

Post by UnDisbeliever »

psycopathicteen wrote:I seriously never knew that. Surprisingly the long index addressing doesn't have a similar limitation. I wonder if that was just something left over from the 6502, because I don't see why a CPU with a 16-bit ALU would need to do that.
It is because absolute indexed addressing can increment the bank when the index crosses the bank boundary. This means that 2 processing cycles are needed to perform the 24-bit addition with the 65816's 16-bit ALU.

The 65816 first performs an 8-bit addition between the low byte of the address and the low byte of the index while it reads ADDR.H.
In the next cycle it performs a 16-bit addition between the 16-bit DB:ADDR.H and IH.
If the page boundary is never crossed (8-bit index && carry of {ADDR.L + I} is 0) then DB:ADDR.H is unchanged and the addition is skipped, saving an unneeded cycle. (source)

With absolute long addressing the second addition is processed in the half-cycle after the bank byte is read from memory and will not save a cycle if skipped.

EDIT: added source, reordered sentences.
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: 2bpp bullet rendering

Post by psycopathicteen »

Add 1 cycle for indexing across page boundaries, or write, or X=0
Do they mean X=0 as in the status register bit that controls the size of the index registers? So does that mean that it always takes an extra cycle when the index registers are 16-bit?
UnDisbeliever
Posts: 123
Joined: Mon Mar 02, 2015 1:11 am
Location: Australia (PAL)

Re: 2bpp bullet rendering

Post by UnDisbeliever »

psycopathicteen wrote:
Add 1 cycle for indexing across page boundaries, or write, or X=0
Do they mean X=0 as in the status register bit that controls the size of the index registers? So does that mean that it always takes an extra cycle when the index registers are 16-bit?
Yes, that extra cycle always occurs when the Index registers are 16 bit.
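
To tie the cycle discussion together, here is a small Python tally of the four-instruction row operation (LDA, AND, ORA with absolute indexed addressing, then STA). It is a sketch that assumes the usual 65816 opcode table rules: reads cost 4 cycles, plus 1 for a 16-bit accumulator, plus 1 if the index registers are 16-bit or a page boundary is crossed; the store costs 5, plus 1 for a 16-bit accumulator. These are CPU cycles, not master clocks, and the function names are made up.

```python
def indexed_read_cycles(a16, index16, page_cross=False):
    """LDA/AND/ORA absolute,X or absolute,Y."""
    cycles = 4
    if a16:
        cycles += 1              # extra data byte
    if index16 or page_cross:
        cycles += 1              # extra address-calculation cycle
    return cycles


def indexed_write_cycles(a16):
    """STA absolute,X or absolute,Y: the indexing cycle is always paid."""
    return 5 + (1 if a16 else 0)


def row_cycles(a16, index16, page_cross=False):
    """Total for: lda buffer,y / and mask,x / ora pattern,x / sta buffer,y."""
    reads = 3 * indexed_read_cycles(a16, index16, page_cross)
    return reads + indexed_write_cycles(a16)
```

With a 16-bit accumulator this gives 24 cycles when the index registers are 16-bit and 21 when they are 8-bit and no page is crossed, which lines up with the "24" and "~20" figures quoted in the thread.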
93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: 2bpp bullet rendering

Post by 93143 »

Espozo wrote:
psycopathicteen wrote: I was basically thinking of doing this. Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it.
I think 93143 is doing the same thing.
Yeah, for a port of an existing game. But I'm not doing software rendering on the S-CPU, partly because of the sheer number, size, and colour depth of bullets in the original, and partly because I got stubborn about look&feel and blew 3/4 of my CPU budget on raster effects. I'm using the Super FX chip for bullets and collisions, which moves me out of direct competition with anything that doesn't need a coprocessor.

Also, it's been almost two years and I still haven't done a bullet test. Advantage: not me...