Hah, this explains everything. So yeah, today's CPUs are fast enough to fetch and execute in a single clock, but that's not the case for the 6502, so it makes dummy fetches (and never 'rests', even during NOPs).
Actually, the reason today's CPUs are able to execute one instruction (or more) per cycle is pipelining. Internally, the work is split into N stages that fill up with instructions, so once the pipeline is full one instruction can complete every cycle, even though each one still takes several cycles from start to finish. When the CPU accesses memory, it checks the cache first and only goes to main memory on a miss, stalling the pipeline until the data is ready.
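To put rough numbers on that, here's a minimal sketch of the textbook idealized pipeline model. The function names and the 5-stage / 100-cycle figures are just assumptions for illustration, not measurements of any real CPU.

```python
def pipeline_cycles(n_instructions: int, stages: int, stall_cycles: int = 0) -> int:
    """Cycles for an idealized in-order pipeline (hypothetical model).

    The first instruction needs `stages` cycles to get through; after that
    one instruction completes per cycle. Every stall cycle (for example a
    cache miss waiting on main memory) is added on top.
    """
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1) + stall_cycles


def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    """Same work with no overlap: every instruction pays for all stages."""
    return n_instructions * stages


if __name__ == "__main__":
    print(pipeline_cycles(1000, 5))                    # 1004
    print(unpipelined_cycles(1000, 5))                 # 5000
    print(pipeline_cycles(1000, 5, stall_cycles=100))  # 1104
```

So in this model 1000 instructions take about one cycle each once the pipeline is full, and a single miss that costs 100 cycles wipes out a good chunk of that gain.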
The 6502 has no cache and actually accesses memory on every cycle. There is no way to know when a "good" access is made. Also, keep in mind that RESET is implemented just like NMI, IRQ and BRK are. The differences are that R/W is forced high, so the stack writes turn into reads, and the vector is fetched from $FFFC/$FFFD. The stack pointer is still decremented, by 3 (one for each of PCH, PCL and the status register).
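Here's a rough sketch of that shared 7-cycle sequence as a bus trace, assuming the usual vector addresses. `interrupt_sequence` is a made-up helper, and the addresses shown for the first two cycles are simplified (on a real RESET the PC is whatever it happened to be). In this model the stack pointer drops by one for each of the three stack cycles.

```python
VECTORS = {"NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ": 0xFFFE, "BRK": 0xFFFE}


def interrupt_sequence(kind: str, pc: int, sp: int):
    """Return (bus_trace, new_sp) for one interrupt-style sequence.

    bus_trace is a list of ("R" or "W", address) pairs, one per cycle.
    Hypothetical model: only the bus direction and addresses are tracked.
    """
    is_reset = kind == "RESET"
    # Cycles 1-2: opcode fetch and a second read, both discarded.
    second = (pc + 1) & 0xFFFF if kind == "BRK" else pc
    trace = [("R", pc), ("R", second)]
    # Cycles 3-5: PCH, PCL, P go to the stack. RESET keeps R/W high,
    # so the same three accesses happen as reads, but SP still moves.
    for _ in range(3):
        trace.append(("R" if is_reset else "W", 0x0100 | sp))
        sp = (sp - 1) & 0xFF
    # Cycles 6-7: fetch the new PC from the vector.
    vector = VECTORS[kind]
    trace += [("R", vector), ("R", vector + 1)]
    return trace, sp


if __name__ == "__main__":
    trace, sp = interrupt_sequence("RESET", 0x0000, 0x00)
    for direction, address in trace:
        print(direction, f"${address:04X}")
    print(f"SP = ${sp:02X}")
```

Swap "RESET" for "IRQ" and the only things that change are the three stack cycles becoming writes and the vector address.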
Inside the 6502, NOPs are just like any other instruction except that the result is thrown away. That means you can get 1-byte, 2-byte, etc. NOPs. Think of it as an LDA that doesn't save anything: the operands are still fetched.
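A toy model of the "LDA without saving" idea, as a sketch only: `step` is a hypothetical helper that handles just three opcodes, $A9 (LDA #imm), $EA (the official NOP) and $80 (an undocumented 2-byte NOP on the NMOS 6502). All three read the byte after the opcode; only LDA keeps it.

```python
def step(mem, pc, a):
    """Execute one instruction; return (new_pc, new_a, addresses_read)."""
    reads = [pc]
    opcode = mem[pc]
    # Second cycle: the byte after the opcode is read no matter what.
    reads.append(pc + 1)
    operand = mem[pc + 1]
    if opcode == 0xA9:   # LDA #imm: operand ends up in A
        return pc + 2, operand, reads
    if opcode == 0x80:   # undocumented NOP #imm: operand fetched, then dropped
        return pc + 2, a, reads
    if opcode == 0xEA:   # NOP: next byte read and discarded, PC moves by one
        return pc + 1, a, reads
    raise ValueError(f"opcode ${opcode:02X} not modeled in this sketch")


if __name__ == "__main__":
    print(step([0xA9, 0x42], 0, 0x11))  # (2, 66, [0, 1])
    print(step([0x80, 0x42], 0, 0x11))  # (2, 17, [0, 1])
    print(step([0xEA, 0x42], 0, 0x11))  # (1, 17, [0, 1])
```

The list of addresses read is identical in all three cases, which is the point: on the bus a NOP looks as busy as a load.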