became

mulax10, which is a dedicated subroutine for multiplying by 10, and is pretty efficient. cc65 has dedicated routines for multiplying by 3, 5, 6, 7, 9, and 10. Powers of two are generally handled with reasonably efficient shifts. (More simply: 0-10 and other powers of 2 are efficient.) Other multipliers fall back to a more generic and slower iterative multiply subroutine.
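To give a rough idea of why a dedicated multiply-by-10 can be cheap, here is a minimal sketch of the shift-and-add idea in C. The function name mul10_shift_add is made up for illustration, and this is not the actual mulax10 source, just the general kind of trick such a routine can use.

```c
#include <stdint.h>

/* Hypothetical helper, NOT cc65's actual mulax10 source.
   x * 10 == x * 8 + x * 2, so two shifts and one add are enough.
   The result wraps modulo 65536, like any 16-bit multiply. */
static uint16_t mul10_shift_add(uint16_t x)
{
    return (uint16_t)((x << 3) + (x << 1));
}
```

No loop is needed here, which is the difference from a generic multiply that has to walk through the bits of an arbitrary multiplier.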

became

tosumoda0, which in turn calls a generic iterative division subroutine (udiv16). This is relatively slow.
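For a sense of what "generic iterative division" means, here is a sketch of the classic shift-and-subtract approach in C. The name udivmod16_sketch is hypothetical and this is not the udiv16 source; it only assumes that a generic 16-bit divide works one bit at a time, which is why it costs a fixed loop of 16 steps no matter how small the numbers are.

```c
#include <stdint.h>

/* Hypothetical illustration of shift-and-subtract (restoring) division.
   NOT the actual udiv16 source. The divisor must be nonzero. */
static uint16_t udivmod16_sketch(uint16_t dividend, uint16_t divisor,
                                 uint16_t *remainder)
{
    uint16_t quotient = 0;
    uint32_t rem = 0;   /* wider than 16 bits so the shift cannot overflow */
    uint8_t i;

    for (i = 0; i < 16; ++i) {
        /* Bring the next dividend bit (MSB first) into the remainder. */
        rem = (rem << 1) | ((dividend >> 15) & 1u);
        dividend = (uint16_t)(dividend << 1);
        quotient = (uint16_t)(quotient << 1);

        /* If the divisor fits, subtract it and set this quotient bit. */
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= 1u;
        }
    }

    *remainder = (uint16_t)rem;
    return quotient;
}
```

On a 6502 every one of those steps is several multi-byte shifts plus a 16-bit compare and subtract, so the cycle count adds up quickly.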

It's not at all unreasonable to do some multiplication or division in your program. Just because it's slower than addition or subtraction doesn't mean it can't be fit into a performance budget.

You can analyze the code and understand the algorithms to get a sense of their (in)efficiency, but the easiest way to figure this stuff out is just to measure the code and see how many cycles it takes. If it's too many, look at a different approach. If not, just carry on.

However, an immediate and simple suggestion is to make the game board 16 wide, because multiply and modulo by 16 are generally fast (power of 2). Even if you don't use the extra 6 bytes on the end of each row, the 60 wasted bytes will still buy you back a lot of performance if you're indexing into that array a lot.
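As a minimal sketch of the 16-wide layout, assuming a 10x10 board (which is what 6 spare bytes per row and 60 spare bytes total implies), something like the following would work. The names BOARD_STRIDE, cell_index, index_to_x and index_to_y are made up for this example.

```c
#include <stdint.h>

#define BOARD_W      10   /* columns actually used (assumed) */
#define BOARD_H      10   /* rows actually used (assumed) */
#define BOARD_STRIDE 16   /* padded row width, a power of 2 */

/* 10 rows * 16 bytes = 160 bytes, of which 60 are padding. */
static uint8_t board[BOARD_H * BOARD_STRIDE];

/* y * 16 + x becomes a shift and an OR, since x < 16 never
   overlaps the shifted row bits. The result fits in 8 bits
   because the largest index is 9 * 16 + 9 = 153. */
static uint8_t cell_index(uint8_t x, uint8_t y)
{
    return (uint8_t)((y << 4) | x);
}

/* index % 16 becomes a mask. */
static uint8_t index_to_x(uint8_t i)
{
    return (uint8_t)(i & 0x0F);
}

/* index / 16 becomes a shift. */
static uint8_t index_to_y(uint8_t i)
{
    return (uint8_t)(i >> 4);
}
```

Usage would look like board[cell_index(x, y)] = value; and none of it needs a multiply or divide subroutine call. Whether the compiler produces exactly these shifts from y * 16 and i % 16 written directly is something to confirm by looking at the generated assembly.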