WIP: Wizard of Wor

A place where you can keep others updated about your NES-related projects through screenshots, videos or information in general.

tschak909

Re: WIP: Wizard of Wor

Post by tschak909 »

Am slowing down again, as I need to squash a whole bunch of bugs that have crept into the code.

-Thom
tschak909

Post by tschak909 »

Wizard of Wor WIP: Worrior and monster collision detection is fully implemented (worriors and monsters can both shoot and kill each other, with the appropriate points awarded). There is still at least one lingering bug in the laser code, but for now I need to optimize all the bounding-box checking code to win back much-needed cycles, as the game slows down when everybody is on screen and shooting. Can't have that. The computer is playing blue.

Now I need to take a break from new features to drastically optimize the bounds-checking code. I'm doing lots of multiplies and divides all over the code for what are ostensibly the same or similar values (at least I think so). I need to do those calculations once per frame and reuse the results, which should free up more than enough cycles to finish the gameplay implementation.
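A minimal sketch of the "compute once per frame" idea, in C since that is what the game logic is written in. Everything here is hypothetical: the actor count, the 16x16 box size, and the array and function names are illustrative, not taken from the project. Each actor's box edges are computed in one pass per frame, so the individual collision checks reduce to byte comparisons.

```c
#include <stdint.h>

#define NUM_ACTORS 6   /* hypothetical: worriors + monsters on screen */
#define ACTOR_W    16  /* hypothetical sprite width in pixels */
#define ACTOR_H    16  /* hypothetical sprite height in pixels */

/* Positions, written by the movement code (assumed to stay below 256 - ACTOR_W). */
uint8_t actor_x[NUM_ACTORS];
uint8_t actor_y[NUM_ACTORS];

/* Bounding-box edges, recomputed once per frame. */
static uint8_t box_left[NUM_ACTORS], box_right[NUM_ACTORS];
static uint8_t box_top[NUM_ACTORS], box_bottom[NUM_ACTORS];

/* Call once per frame, after movement and before any collision checks. */
void cache_boxes(void)
{
    uint8_t i;
    for (i = 0; i < NUM_ACTORS; ++i) {
        box_left[i]   = actor_x[i];
        box_right[i]  = (uint8_t)(actor_x[i] + ACTOR_W - 1);
        box_top[i]    = actor_y[i];
        box_bottom[i] = (uint8_t)(actor_y[i] + ACTOR_H - 1);
    }
}

/* Each pairwise check now only compares cached bytes: no multiply or divide. */
uint8_t boxes_overlap(uint8_t a, uint8_t b)
{
    return box_left[a] <= box_right[b] && box_left[b] <= box_right[a] &&
           box_top[a]  <= box_bottom[b] && box_top[b]  <= box_bottom[a];
}
```

With N actors this does N box computations per frame instead of repeating the arithmetic inside every pairwise check. It assumes positions stay below 256 - ACTOR_W so the 8-bit edge sums don't wrap.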

Latest build is here:
wow-kills-splosions.nes
(40.02 KiB)
-Thom
tschak909

Post by tschak909 »

Wizard of Wor NES WIP:

I had initially planned to do three major optimizations. I have done two, and the result is dramatic. It seems my game logic was taking at least two frames' worth of time. By simply rearranging the game state arrays and placing them in 6502 zero page, the game logic, as is, now runs at full frame rate, speeding up by at least 200%... WHAT A DIFFERENCE.

Before, I was using macros like these:

Code:


unsigned char stamps[NUM_FIELDS*NUM_STAMPS];

/* Every access computes x*NUM_FIELDS: a software multiply on the 6502. */
#define STAMP_NUM(x) ((x)*NUM_FIELDS)
#define STAMP_X(x) (STAMP_NUM(x)+0)
#define STAMP_Y(x) (STAMP_NUM(x)+1)
/* ... */

stamps[STAMP_X(i)]=new_stamp_x_position;
stamps[STAMP_Y(i)]=new_stamp_y_position;
/* ... */

if (stamps[STAMP_X(i)]==... && stamps[STAMP_Y(i)]==... )
{
/* ... */
}
and so on...

This caused a software multiply (the 6502 has no hardware multiply) on EACH AND EVERY read and write of game state, and there were about 220 such accesses throughout the game logic.

I replaced this with:

Code:

unsigned char stamp_x[NUM_STAMPS];
unsigned char stamp_y[NUM_STAMPS];
/* ... */

/* Plain indexed accesses: no multiply needed. */
stamp_x[i]=new_stamp_pos_x;
stamp_y[i]=new_stamp_pos_y;

/* ... */

if (stamp_x[i]==... && stamp_y[i]==...)
{
/* ... */
}
You can see that not only does this look cleaner, it also runs much better, because each access compiles down to a plain indexed load or store (X-indexed, or indirect Y-indexed), which the 6502 loves to do. That is why I am KICKING myself for not doing it earlier. I KNOW from writing 6502 assembler that it's better to keep each field in its own array (all the X positions together, all the Y positions together) instead of interleaving the fields in a C-style struct or array, since moving to another object is then simply an index change.
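For comparison, a compilable sketch of both layouts side by side. NUM_STAMPS, NUM_FIELDS = 6 and the set_pos_* helpers are hypothetical; the point is the address arithmetic in the comments: the interleaved layout needs i*NUM_FIELDS on every access, while the parallel-array layout uses i directly as the index.

```c
#include <stdint.h>

#define NUM_STAMPS 8
#define NUM_FIELDS 6   /* hypothetical field count */

/* Old layout: all fields of one stamp are adjacent. */
static uint8_t stamps[NUM_FIELDS * NUM_STAMPS];
#define STAMP_NUM(x) ((x) * NUM_FIELDS)   /* needs a multiply when x is a variable */
#define STAMP_X(x)   (STAMP_NUM(x) + 0)
#define STAMP_Y(x)   (STAMP_NUM(x) + 1)

/* New layout: one array per field, so stamp i's Y is simply stamp_y[i]. */
static uint8_t stamp_x[NUM_STAMPS];
static uint8_t stamp_y[NUM_STAMPS];

void set_pos_old(uint8_t i, uint8_t x, uint8_t y)
{
    stamps[STAMP_X(i)] = x;   /* address = stamps + i*NUM_FIELDS + 0 */
    stamps[STAMP_Y(i)] = y;   /* address = stamps + i*NUM_FIELDS + 1 */
}

void set_pos_new(uint8_t i, uint8_t x, uint8_t y)
{
    stamp_x[i] = x;           /* address = stamp_x + i: i can sit in an index register */
    stamp_y[i] = y;
}
```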

I've attached a copy of the latest ROM here. You can see it runs a fuckload faster, wowza!
wow-post-optimize.nes
(40.02 KiB)
And of course, a GIF showing the new speed. It flies, and I can now really start tuning the main game.

Damn, I feel good!

-Thom
tschak909

Post by tschak909 »

You can see the relevant diff here:

https://github.com/tschak909/wow/pull/9 ... af2acc3754

-Thom
lidnariq

Post by lidnariq »

LDA zpg,X is the same speed as LDA abs,X (4 cycles each), at least as long as the abs,X access doesn't cross a page boundary, which costs one extra cycle. So if you find there's memory pressure on zero page addresses, you may be able to move arrays up out of zero page.
tschak909

Post by tschak909 »

Now that everything is so smooth and zoomy-zoomy, I'm reworking the animation and delay routines to slow everything down, then gradually speed up as the level progresses (given a level number, adjust how fast the scaling happens and the top speed value).
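One way to drive the per-level ramp from tables, as a sketch under assumptions: the table values, the one-step-per-interval ramp, and the names ramp_frames, top_speed, start_level and update_speed are all hypothetical placeholders for whatever the tuning branch actually uses.

```c
#include <stdint.h>

/* Hypothetical tuning values: replace with real numbers from play-testing. */
#define NUM_LEVELS 4
static const uint8_t ramp_frames[NUM_LEVELS] = { 240, 180, 120, 90 }; /* frames per speed step */
static const uint8_t top_speed[NUM_LEVELS]   = {   2,   3,   4,   5 }; /* speed cap per level */

static uint8_t speed;       /* current speed, starts slow */
static uint8_t ramp_timer;  /* frames left until the next speed step */
static uint8_t level_idx;   /* level clamped to the table size */

void start_level(uint8_t level)
{
    level_idx  = (level < NUM_LEVELS) ? level : NUM_LEVELS - 1; /* reuse last entry */
    speed      = 1;
    ramp_timer = ramp_frames[level_idx];
}

/* Call once per game frame: speed rises by 1 every ramp_frames frames, up to the cap. */
void update_speed(void)
{
    if (speed >= top_speed[level_idx])
        return;
    if (--ramp_timer == 0) {
        ++speed;
        ramp_timer = ramp_frames[level_idx];
    }
}
```

Keeping both knobs in tables means the first tuning pass only touches data, not code.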

This is happening in the initiial_tuning branch.

-Thom
tschak909

Post by tschak909 »

Does anyone have a decent algorithm for a fractional delay? I need to apply both an animation cel delay, and a sprite position delay, and using frames for this seems to be too coarse.

-Thom
Kasumi

Post by Kasumi »

Keep the position as a 16-bit number and add to it every frame, but only use the high byte to display where the object is.

Code:

lda poslow,x
clc
adc #$C0
sta poslow,x

lda poshigh,x
adc #0
sta poshigh,x

sta OAM,y
This will move the object $C0/$100 = 3/4 of a pixel per frame, a bit faster than one pixel every two frames (which would be adc #$80).
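The same 8.8 fixed-point idea in C, as a sketch: step_position is a hypothetical helper, and the 16-bit add stands in for the lda/clc/adc/sta sequence with the carry chained into the high byte. The same trick should work for an animation cel delay by keeping the cel counter in the high byte.

```c
#include <stdint.h>

/* 8.8 fixed point: high byte = whole pixels, low byte = 1/256ths of a pixel. */
uint8_t step_position(uint16_t *pos, uint8_t speed)
{
    *pos = (uint16_t)(*pos + speed);  /* add to the low byte; the carry moves into the high byte */
    return (uint8_t)(*pos >> 8);      /* only the high byte is drawn (e.g. written to OAM) */
}
```

With speed $C0, four calls advance the drawn position by three pixels, matching the 3/4 pixel per frame above; like the assembly, the position wraps after 256 pixels.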
tschak909

Post by tschak909 »

Thanks.

The problem I seem to be having is that whatever amount I delay by, the delay seems uneven. I suspect this may be because of the code in the runtime that not only detects NTSC vs. PAL but also runs both at the same frame rate (50 fps). Could this be the case? I'm going bonkers trying to figure out wtf is going on so I can do appropriate speed tuning.

-Thom
Kasumi

Post by Kasumi »

Your game appears to skip running logic every sixth frame on NTSC.
So on NTSC:
5 gameplay frames are run for every 6 "real" frames.
50 gameplay frames are run for every 60 "real" frames.
At 60 frames per second (close enough), 50 gameplay frames for every second.
And on PAL:
5 gameplay frames are run for every 5 "real" frames.
50 gameplay frames are run for every 50 "real" frames.
At 50 frames per second (close enough), 50 gameplay frames for every second.

So yes, your game is attempting to match NTSC and PAL gameplay speed. I'm unsure whether you're asking this because you weren't aware it was doing that at all, or because you were aware and just want to do it a different way. (Or you don't want to do it at all, and want both versions to run one gameplay frame for every "real" frame, with the NTSC character moving 60 pixels per second and the PAL character moving 50 pixels per second.)
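A small model of the skip described above, which also shows why a per-frame delay counter can look uneven on NTSC. It is a sketch that assumes a 0..5 counter advanced once per vblank, with game logic skipped when it equals 5, per the description in this thread; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: one counter step per vblank, cycling 0..5. */
static uint8_t next_phase(uint8_t phase) { return (uint8_t)((phase + 1) % 6); }

/* Gameplay frames that run during 'real' vblanks (ntsc != 0 enables the skip). */
unsigned gameplay_frames(unsigned real, int ntsc)
{
    unsigned ran = 0;
    uint8_t phase = 0;
    while (real-- > 0) {
        if (!ntsc || phase != 5)
            ++ran;              /* logic runs on every frame except phase 5 on NTSC */
        phase = next_phase(phase);
    }
    return ran;
}

/* Real vblanks needed for a delay of 'n' gameplay frames on NTSC, starting at 'phase'. */
unsigned real_frames_for_delay(unsigned n, uint8_t phase)
{
    unsigned real = 0;
    while (n > 0) {
        if (phase != 5)
            --n;                /* the delay counter only ticks on frames where logic runs */
        phase = next_phase(phase);
        ++real;
    }
    return real;
}
```

For example, a 3-frame delay takes 3 vblanks if it starts at phase 0 but 4 if it starts at phase 3, because it straddles the skipped frame.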
tschak909

Post by tschak909 »

I'm simply trying to determine why, if I use e.g. a delay counter that decrements every 'frame', some frames seem to go faster than others.

-Thom
Kasumi

Post by Kasumi »

Here's the code in _ppu_wait_frame:
(Comments mine)

Code:


	lda #1;Tell the NMI the vram buffer is totally (rather than partially) updated (presumably)
	sta <VRAM_UPDATE
	lda <FRAME_CNT1;Load a counter changed in the NMI (presumably)

@1:

	cmp <FRAME_CNT1;Compare to what's in A. When the NMI changes this, it'll be different
	beq @1;and we'll stop looping
	lda <NTSC_MODE;Assuming PAL is zero, we're done
	beq @3;And branch
;If NTSC (non zero presumably)
@2:

	lda <FRAME_CNT2;We check if this frame is a multiple of six
	cmp #5
	beq @2;If it is, keep waiting until it's not.

@3:

	rts
So if you want it to not do that, you could do this:

Code:

	lda #1
	sta <VRAM_UPDATE
	lda <FRAME_CNT1;Load a counter changed in the NMI (presumably)

@1:

	cmp <FRAME_CNT1;Compare to what's in A. When the NMI changes this, it'll be different
	beq @1;and we'll stop looping
	rts
That should work in theory, but it may have other effects, since I'm not too familiar with neslib.
tschak909

Post by tschak909 »

OK, I replaced my BOX_PIXEL_X and BOX_PIXEL_Y multiply-by-24 macros with a straight table lookup, and this seems to have made everything extremely smooth, if fast. I'm debating whether to replace the div24 routine as well, though it's very fast anyway.
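A sketch of the table-lookup approach, assuming a 24-pixel box and at most 10 boxes per axis so every entry fits in a byte. The table name and the macro bodies are hypothetical stand-ins; the real BOX_PIXEL_X/BOX_PIXEL_Y may also add a screen offset for the maze.

```c
#include <stdint.h>

#define NUM_BOXES 10  /* hypothetical number of maze boxes along an axis */

/* box_to_pixel[n] == n * 24, computed ahead of time so no multiply runs per frame. */
static const uint8_t box_to_pixel[NUM_BOXES] = {
    0, 24, 48, 72, 96, 120, 144, 168, 192, 216
};

/* Hypothetical versions of the macros: a single indexed load each. */
#define BOX_PIXEL_X(n) (box_to_pixel[(n)])
#define BOX_PIXEL_Y(n) (box_to_pixel[(n)])
```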

-Thom
tschak909

Post by tschak909 »

With the multiplies removed, things are smooth now that I'm applying two types of delay: animation delay and move delay. I can now build a set of tables to scale those up per level.

With this and the current tuning that I've done for laser speeds and player movements, I just need to implement monster speed scaling, and it'll be good for the first pass of tuning.
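A sketch of two independent per-actor delays with per-level reload tables. The table values, the 4-cel walk cycle and all names are hypothetical; each counter is decremented once per frame and reloaded from the level's table entry when it reaches zero, so a delay of 1 means "every frame".

```c
#include <stdint.h>

#define NUM_ACTORS 4
#define NUM_LEVELS 3

/* Hypothetical per-level reload values, in frames: one animation cel step every
   anim_delay frames, one pixel of movement every move_delay frames. */
static const uint8_t anim_delay_tbl[NUM_LEVELS] = { 8, 6, 4 };
static const uint8_t move_delay_tbl[NUM_LEVELS] = { 3, 2, 1 };

static uint8_t anim_timer[NUM_ACTORS], move_timer[NUM_ACTORS];
uint8_t cel[NUM_ACTORS], xpos[NUM_ACTORS];

void init_actor(uint8_t i, uint8_t level)
{
    anim_timer[i] = anim_delay_tbl[level];
    move_timer[i] = move_delay_tbl[level];
    cel[i] = 0;
    xpos[i] = 0;
}

/* Call once per frame; the two delays run independently of each other. */
void tick_actor(uint8_t i, uint8_t level)
{
    if (--anim_timer[i] == 0) {
        cel[i] = (uint8_t)((cel[i] + 1) & 3);   /* hypothetical 4-cel walk cycle */
        anim_timer[i] = anim_delay_tbl[level];
    }
    if (--move_timer[i] == 0) {
        ++xpos[i];
        move_timer[i] = move_delay_tbl[level];
    }
}
```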

CC65's generalized multiply routines are, understandably, slower than grandma stuck in molasses in January going uphill in a fucking ice storm.

-Thom
na_th_an

Post by na_th_an »

If you're tight on space and want to ditch the tables, note that N*24 = N*8 + N*16, i.e. (N<<3) + (N<<4).
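The shift form in C, as a sketch: mul24_8 is the 8-bit version (valid for N <= 10), and mul24_16 computes N*8 once and adds twice that value to get a 16-bit result for any 8-bit N. Both function names are hypothetical, and whether cc65 turns these into tighter code than a 10-byte table is something to measure rather than assume.

```c
#include <stdint.h>

/* N*24 = N*16 + N*8 = (N<<4) + (N<<3). The result fits in a byte for N <= 10. */
uint8_t mul24_8(uint8_t n)
{
    return (uint8_t)((n << 4) + (n << 3));
}

/* 16-bit result for any 8-bit N: compute N*8 once, then add N*16 (= N*8 shifted left once). */
uint16_t mul24_16(uint8_t n)
{
    uint16_t n8 = (uint16_t)(n << 3);        /* N*8  */
    return (uint16_t)(n8 + (n8 << 1));       /* N*8 + N*16 */
}
```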