I can't really do anything more than quote the datasheet:
The SN76494 and SN76494A require approximately 4 clock cycles to load the data into the control register. The SN76496 and SN76496A require approximately 32 clock cycles. The open-collector READY output is used to synchronize the microprocessor to this transfer and is pulled to the false state (low) immediately following the falling edge of C̅E̅ (or W̅E̅ when data transfer is initiated by W̅E̅). READY will go high upon completion of the data transfer cycle.
(emphasis mine, just to ease comprehension)
The SN76494 has a maximum clock of 500 kHz and the SN76496 has a maximum clock of 4 MHz; the only difference is that the latter has a global divide-by-8 on its clock input, which accounts for the 4-versus-32 cycle figures.
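To put those figures in wall-clock terms, here is a small sketch of the arithmetic. The cycle counts and clock limits come from the text above; the dictionary layout and the `load_time_us` helper are just my own illustration, and the datasheet only says "approximately", so treat the result as a rough figure.

```python
from fractions import Fraction

# Cycle counts from the datasheet quote, clock limits as stated above.
CHIPS = {
    "SN76494": {"max_clock_hz": 500_000, "load_cycles": 4},
    "SN76496": {"max_clock_hz": 4_000_000, "load_cycles": 32},
}


def load_time_us(chip: str) -> Fraction:
    """Approximate register load time in microseconds at the maximum clock."""
    spec = CHIPS[chip]
    return Fraction(spec["load_cycles"] * 1_000_000, spec["max_clock_hz"])


for name in CHIPS:
    print(f"{name}: about {float(load_time_us(name))} us per write")
```

Both parts come out at about 8 µs per write when run at their maximum clock, which is what you would expect if the SN76496 is the same core behind a divide-by-8.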
The requirement is that the data bus remain stable for the whole transfer; while the design clearly intended READY to stall the CPU during a write, an external latch that holds the data also works.
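If you want a feel for what that costs the CPU, the stall (or, with a latch, the minimum gap between writes) can be estimated like this. It's only a sketch: `stall_cycles` is a name I made up, the clock values in the examples are hypothetical, and it ignores any synchronisation overhead in how a particular CPU samples its wait input.

```python
def stall_cycles(load_cycles: int, chip_clock_hz: int, cpu_clock_hz: int) -> int:
    """CPU clock cycles spanned by the chip's load time, rounded up.

    load_cycles   -- chip clock cycles needed to load the register (4 or 32)
    chip_clock_hz -- clock fed to the sound chip, in Hz
    cpu_clock_hz  -- CPU clock, in Hz (hypothetical values in the examples)
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return (load_cycles * cpu_clock_hz + chip_clock_hz - 1) // chip_clock_hz


# Hypothetical system: CPU and SN76496 both run from the same 4 MHz clock.
print(stall_cycles(32, 4_000_000, 4_000_000))

# Hypothetical system: SN76494 at 500 kHz alongside a 1 MHz CPU.
print(stall_cycles(4, 500_000, 1_000_000))
```

With an external latch the CPU isn't held up at all, but software then has to make sure it doesn't issue a second write within that many cycles of the first.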
This delay is pretty incomprehensible to me. I don't see, even given late 1970s technology, why latching the values takes more than a single cycle.