It is currently Tue Mar 26, 2019 5:10 pm

 All times are UTC - 7 hours

 Page 1 of 1 [ 15 posts ]
 Print view Previous topic | Next topic
Author Message
 Posted: Tue Oct 24, 2017 6:35 am

Joined: Tue Jun 28, 2011 2:39 pm
Posts: 192
I need to multiply/divide numbers by arbitrary values so bitwise ops aren't useful here as they only allow for multiplication/division by very specific values.

I know I'll have to do it in software, but I don't feel like reinventing the wheel. Anyone has good, fast code that would allow me to do both? Preferably something I can just jsr into.

Top

 Posted: Tue Oct 24, 2017 6:47 am

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7339
Bregalad shared a nice multiply here:
Wiki: 8-bit Multiply

CC65 has some multiply/divide routines: (see umul/udiv/imul/idiv)
CC65 GitHub runtime module

Keldon recently shared a multiply using a lookup table:
Forum: Relatively fast multi, ...

In general, you can make it faster with lookup tables, but they take up space.

There's unfortunately a lot of different needs for multiplication (e.g. 8 bit x 8 bit = 16 bit? 16 x 16 = 32? signed/unsigned? remainder or fixed point?) so it might be tricky to find a "generic" one that does exactly what you want efficiently.

If you want to try writing your own, the simple method isn't really that tricky. If you know how to do long multiplication / long division by hand, it's basically just this technique in binary (i.e. for multiply it's a left shift x2 for each row instead of x10, but you still just add up the rows in the end).

Finally, there's an even simpler version where you can implement multiply as just a repeated add, or divide as counting a repeated subtract. This is slow and inefficient in the general case, but if your numbers are small or you're in a situation where the result is cumulatively maintained, they can be pretty effective anyway.

Top

 Posted: Tue Oct 24, 2017 6:54 am

Joined: Fri May 08, 2015 7:17 pm
Posts: 2470
Location: DIGDUG
http://6502org.wikidot.com/software-math-intmul

http://6502org.wikidot.com/software-math-intdiv

If you absolutely know you need multiplication of a factor of 2, bit shift is much faster.

I've used a table of answers before, for example, if I always need to multiply by 10.
LDX value
LDA table, x

_________________
nesdoug.com -- blog/tutorial on programming for the NES

Top

 Posted: Tue Oct 24, 2017 7:12 am

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11246
Location: Rio de Janeiro - Brazil
I've been using versions of this for multiplication, and have avoided division like the plague, but the page where that comes from has a few suggestions for division as well. If I need to divide by a constant, I multiply by the reciprocal of that constant instead, which's faster.

May I ask why do you need multiplication and division in a 6502 program? Most games can get away without them, even when more complex physics is involved. Sometimes you can make do with approximations, so you can use smaller look-up tables and faster code. The only reason I'm using multiplication is because I'm working on a raycaster, which involves trigonometry, but even then I try to use look-up tables as much as possible to avoid/simplify the real-time math.

Top

 Posted: Tue Oct 24, 2017 7:17 am

Joined: Tue Jun 28, 2011 2:39 pm
Posts: 192
Because I want my game to have score multipliers/score dividers.

Top

 Posted: Tue Oct 24, 2017 7:25 am

Joined: Sun Nov 09, 2008 9:18 pm
Posts: 1107
Location: Pennsylvania, USA
Hmm, if you're just updating a score, you could spread repeated addition or subtraction over multiple frames perhaps, if you don't want to go the lookup table route. That's assuming you have some kind of action game. If it's a puzzle game or rpg where speed is not at a premium I'd probably just use repeated addition/subtraction (all at once rather than spread out)

Top

 Posted: Tue Oct 24, 2017 7:30 am

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11246
Location: Rio de Janeiro - Brazil
So I guess that speed isn't such a big concern then, since you'll most likely not be doing several of these operations every frame. In this case, I recommend straightforward shift-and-add and shift-and-subtract routines, which are more than efficient enough to be used every once in a while, instead of large look-up tables and obscure logic.

Top

 Posted: Tue Oct 24, 2017 3:12 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7339

Top

 Posted: Tue Oct 24, 2017 3:35 pm

Joined: Thu Mar 31, 2016 11:15 am
Posts: 506
rainwarrior wrote:

If you add DEY to the start of the subroutine you can drop the CLC inside the loop. Saves <=14 cycles.

I think.

Also, unrolling that loop saves like ~40 cycles at the cost of ~50 bytes. Well worth it, IMO. 50 bytes is nothing.

Top

 Posted: Tue Oct 24, 2017 4:02 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7339
pubby wrote:
If you add DEY to the start of the subroutine you can drop the CLC inside the loop. Saves <=14 cycles.

I think that's valid, yeah.

It would probably need a check for 0 before the DEY, which I guess would add 6 more cycles, so the difference is from -4 to 10 cycles? Though you also get a very quick Y=0 case in the bargain.

Edit: or wait, never mind, 255 + carry is going to be 0 anyway. I guess you don't actually need to check at all. So, 0 to 14 cycles, yeah.

I guess cc65's umul8x8r16 is actually the same as tepples' algorithm, just with slightly less parsimonious return value storage:
https://github.com/cc65/cc65/blob/master/libsrc/runtime/umul8x8r16.s

Last edited by rainwarrior on Tue Oct 24, 2017 6:47 pm, edited 2 times in total.

Top

 Posted: Tue Oct 24, 2017 4:31 pm

Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1276
Haven't checked if it's better than the ones on the wiki, but here's another player from atariage: http://atariage.com/forums/topic/71120- ... /?p=896028

Not a new method, just different code.

_________________
https://kasumi.itch.io/indivisible

Top

 Posted: Tue Oct 24, 2017 6:47 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7339
Kasumi wrote:
Haven't checked if it's better than the ones on the wiki, but here's another player from atariage: http://atariage.com/forums/topic/71120- ... /?p=896028

Not a new method, just different code.

It's equivalent to tepples' method with the (non-unrolling) optimization pubby suggested. Also it doesn't bother to initialize A and treats it as an extra "add" input.

Top

 Posted: Wed Oct 25, 2017 4:15 am

Joined: Thu Mar 31, 2016 11:15 am
Posts: 506
rainwarrior wrote:
Edit: or wait, never mind, 255 + carry is going to be 0 anyway. I guess you don't actually need to check at all. So, 0 to 14 cycles, yeah.

prodlo will remain zero, but register A will end up non-zero. So checking for Y=0 is still necessary.

Top

 Posted: Wed Oct 25, 2017 5:08 am
 Formerly WheelInventor

Joined: Thu Apr 14, 2016 2:55 am
Posts: 1953
Location: Gothenburg, Sweden
As long as something's to be multiplied/divided by a power of 2, doing this is the simplest way:

Code:
mult32:
asl
mult16:
asl
mult8:
asl
mult4:
asl
asl
rts

div32:
lsr
div16:
lsr
div8:
lsr
div4:
lsr
lsr
rts

Then jsr label

You can avoid the addtional 12 cycles that jsr and rts cost together by hardcoding it into your specific
function. If you can figure out a siginificant var or const to be a value so that the operation requires even fewer lsr/asl:s to get the desired value; even better.

Several integer and floating point math examples can be found at http://6502.org/source/ - i haven't looked them up but came across it when i was looking for a reference on how many cycles the above would take.

_________________
http://www.frankengraphics.com - personal NES blog

Top

 Posted: Wed Oct 25, 2017 10:32 pm

Joined: Wed Oct 18, 2017 1:33 pm
Posts: 5
I made an interesting 8*8=8bit shift-add multiply routine in 15 bytes (over in Commodore 64 land). It shrinks down to 11 bytes if you initialize .A yourself, for the added portion of a multiply-accumulate.

The "interesting" part is that It exits early whenever the right-shifting multiplier reaches zero, which is fundamentally faster than performing all the iterations for the bit width if one of your values is small. I also extended it to 8*8=16bit which can also function as 16*8=16bit, still performing the early exit. There's plenty of other multiplication & division algorithms under the math section in there as well, though I think all the other primary mult techniques have been covered here so far.

_________________
WFDis Interactive 6502 Disassembler

Top

 Display posts from previous: All posts1 day7 days2 weeks1 month3 months6 months1 year Sort by AuthorPost timeSubject AscendingDescending
 Page 1 of 1 [ 15 posts ]

 All times are UTC - 7 hours

#### Who is online

Users browsing this forum: Google Adsense [Bot], Revenant, tepples and 4 guests

 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum

Search for:
 Jump to:  Select a forum ------------------ NES / Famicom    NESdev    NESemdev    NES Graphics    NES Music    Homebrew Projects       2018 NESdev Competition       2017 NESdev Competition       2016 NESdev Competition       2014 NESdev Competition       2011 NESdev Competition    Newbie Help Center    NES Hardware and Flash Equipment       Reproduction    NESdev International       FCdev       NESdev China       NESdev Middle East Other    General Stuff    Membler Industries    Other Retro Dev       SNESdev       GBDev    Test Forum Site Issues    phpBB Issues    Web Issues    nesdevWiki