Suggestion 1, counting the pixel changes across the whole image, helped a lot: what is measured now better matches what has actually changed. Combined with the suggestion to bias the merge changes, it now makes subtle changes to more tiles rather than big changes to fewer.
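For anyone following along, here is a minimal sketch of what "counting pixel changes of the whole image" can mean, assuming a merge replaces every placement of one tile (`src`) with another (`dst`). Each differing pixel is then counted once per placement. The tile representation and the names `pixel_diff`, `merge_cost` and `cheapest_merge` are hypothetical, and the bias mentioned above is left out because its exact form isn't stated here.

```python
from collections import Counter
from typing import Optional

# Hypothetical representation: a tile is a tuple of rows of pixel values.
Tile = tuple[tuple[int, ...], ...]


def pixel_diff(a: Tile, b: Tile) -> int:
    """Number of pixels that differ between two equally sized tiles."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def merge_cost(src: Tile, dst: Tile, usage: Counter) -> int:
    """Pixels changed across the whole image if every use of src becomes dst."""
    return usage[src] * pixel_diff(src, dst)


def cheapest_merge(placed: list[Tile]) -> Optional[tuple[Tile, Tile, int]]:
    """Find the (src, dst) merge that changes the fewest image pixels.

    placed lists the tile at every tile position of the image, so a tile
    used three times appears three times and its changes count three times.
    Returns None if there are fewer than two distinct tiles.
    """
    usage = Counter(placed)
    distinct = list(usage)
    best = None
    for i, src in enumerate(distinct):
        for j, dst in enumerate(distinct):
            if i == j:
                continue
            cost = merge_cost(src, dst, usage)
            if best is None or cost < best[2]:
                best = (src, dst, cost)
    return best
```

With `A` used three times and `B` used once, replacing `B` by `A` costs its pixel difference once, while replacing `A` by `B` costs it three times. That is why measuring over the whole image favours merging rarely used tiles.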
I'm not sure how to work in suggestion 2. I did try something with a 3x3 region, but I must have badly messed up the weights, because the result was garbage.
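Suggestion 2 itself isn't quoted here, so this is only a guess at one way a 3x3 region could be used: blur both tiles with a weighted 3x3 kernel and compare the blurred results, so that patterns which look alike from a distance score as close. It assumes grayscale intensity values rather than palette indices, and the kernel weights and function names are hypothetical. One common way such weights go wrong is forgetting to divide by their sum, which scales every pixel (here by 16) instead of averaging.

```python
# Hypothetical 3x3 weights; any kernel works as long as it is divided by its sum.
KERNEL = (
    (1, 2, 1),
    (2, 4, 2),
    (1, 2, 1),
)


def blur3x3(tile, kernel=KERNEL):
    """Weighted average of each pixel's 3x3 neighbourhood (edges clamped)."""
    h, w = len(tile), len(tile[0])
    total = sum(sum(row) for row in kernel)  # 16 for KERNEL; keeps flat areas unchanged
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    # Clamp so neighbours past the tile edge reuse the edge pixel.
                    ny = min(max(y + ky - 1, 0), h - 1)
                    nx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * tile[ny][nx]
            row.append(acc / total)
        out.append(row)
    return out


def blurred_diff(a, b):
    """Sum of absolute differences between the blurred versions of two tiles."""
    ba, bb = blur3x3(a), blur3x3(b)
    return sum(abs(pa - pb) for ra, rb in zip(ba, bb) for pa, pb in zip(ra, rb))
```

As a sanity check, a 2x2 checkerboard `((0, 16), (16, 0))` and its inverse differ by 16 in every pixel (64 in total), but `blurred_diff` gives 16.0, so dither-like patterns count as much closer than a raw pixel comparison would suggest.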
The way I see the problem is that tiles with opposite edges and corners are matched together, but every time I try to fix the random edge dots, the overall picture gets worse. I'm starting to think the random dots are actually a feature and the best compromise for reducing the tileset.
Something I might try another day is factoring in a manually made mask that prevents selected pixels from changing, so that crucial details, like faces, are preserved.
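A rough sketch of how such a mask could plug into the merge cost, assuming the mask is a hand-painted grid of booleans at image resolution and the image is described by a grid of tile ids. A tile pixel is treated as locked if the mask protects it in any placement of that tile, and any merge that would change a locked pixel gets an infinite cost. `tile_locks` and `masked_merge_cost` are hypothetical names, not from any existing tool.

```python
import math


def tile_locks(tile_ids, mask, tile_size):
    """Per tile id, mark pixels the hand-made mask protects in any placement.

    tile_ids: 2D grid of tile ids (one per tile position in the image).
    mask: 2D grid of bools at pixel resolution, True = must not change.
    """
    locks = {}
    for ty, row in enumerate(tile_ids):
        for tx, tid in enumerate(row):
            lock = locks.setdefault(
                tid, [[False] * tile_size for _ in range(tile_size)]
            )
            for y in range(tile_size):
                for x in range(tile_size):
                    if mask[ty * tile_size + y][tx * tile_size + x]:
                        lock[y][x] = True
    return locks


def masked_merge_cost(src, dst, lock, uses):
    """Pixels changed by replacing every use of src with dst, or inf if a locked pixel would change."""
    changed = 0
    for y, (row_s, row_d) in enumerate(zip(src, dst)):
        for x, (ps, pd) in enumerate(zip(row_s, row_d)):
            if ps != pd:
                if lock[y][x]:
                    return math.inf  # never pick a merge that touches a protected pixel
                changed += 1
    return uses * changed
```

A tile with locked pixels can still be merged into another tile that matches it at those pixels. Using a large finite penalty instead of `math.inf` would turn the mask into a strong preference rather than a hard rule.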