Presumably, the wave should be smoothed across pixels, not only within each pixel, which brings us to the color bleeding artifacts.
I.e., instead of this:
Code: Select all
000000000000111111111111222222222222 x coordinate
aaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbb pixel type
______¯¯¯¯¯¯____¯¯¯¯¯¯______¯¯¯¯¯¯__ signal
you get something like this:
Code: Select all
000000000000111111111111222222222222
aaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbb
______/¯¯¯¯¯`-__/¯¯¯¯¯`-____/¯¯¯¯¯`-
The waveforms for the two successive "b" pixels are different, because one is preceded by a signal high and the other is not.
This obviously cannot be modelled with a palette file. There seems to be merit to scanline rendering after all.
Basically, your pixel converter should produce 12 signal levels per pixel, and the scanline renderer should walk across those signal levels, smooth them a bit (e.g. so that every value becomes a weighted average of the last 10 samples), and then convert the result into YIQ and sRGB in 12-sample units.
EDIT: I tested this. Each pixel has 12 samples as the signal level, as usual. However, the signal levels are faded with the formula oldsignal × (1-M) + newsignal × M, where M might be e.g. 0.7 for 70% signal clarity. Before fading, the signal is translated to the -0.5 to 0.5 range; after fading, back to the 0 to 1 range.
Left: 10%, middle: 30%, right: 70%
Left: 100% (no color bleeding), middle: 120%, right: 70% and 150% mixed together in 70%-30% proportion.
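To make the fade step concrete, here is a minimal sketch of it in isolation (the function name is hypothetical; the actual source code below mixes two such filters together):
Code: Select all
void Fade(float signal[12], float& level, float M) // M = e.g. 0.7 for 70% clarity
{
    // signal[] holds one pixel's 12 samples in the 0..1 range; level is
    // the filter state, carried across pixels along the whole scanline.
    for(int p = 0; p < 12; ++p)
    {
        float sample = signal[p] - 0.5f;  // translate to the -0.5..0.5 range
        level = level*(1-M) + sample*M;   // oldsignal*(1-M) + newsignal*M
        signal[p] = level + 0.5f;         // translate back to the 0..1 range
    }
}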
I tested a number of different fade coefficients. At >100%, it should produce overshoot spikes at signal edges. At low clarity levels, it seems that the saturation suffers. However, you can indeed observe color artifacts wherever the color changes horizontally.
It is important to note that this is not a palette hack. Each pixel is interpreted from the raw transformed scanline signal. A palette cannot model this effect.
Here's one in which I applied it at the subpixel level. Each pixel is offset by 3 signal samples from the previous one. The signal is the 70%-150% mix explained above and below.
(Ahem, am I just reinventing what Blargg already did earlier?)
Here is a signal dump of scanline 10 of each image.
EDIT: Source code here. I removed some code not relevant for documentation:
Code: Select all
#include <cmath>   // std::cos, std::pow
#include <SDL.h>   // SDL 1.2

typedef Uint32 u32;
SDL_Surface* s;    // SDL video surface; initialized by setup code omitted here

unsigned Xscale = 4, Yscale = 3; // will render at 256*4 by 240*3

struct cache
{
    float levels[12];
} yiqmap[240][256] = {};

void PutPixel(unsigned px, unsigned py, unsigned pixel)
{
    // The input value is a NES color index (with de-emphasis bits).
    auto& r = yiqmap[py][px];
    // Decode the color index.
    int color = (pixel & 0x0F), level = color < 0xE ? (pixel >> 4) & 3 : 1;
    // Voltage levels, relative to synch voltage.
    static const float black = .518f, white = 1.962f, attenuation = .746f,
        levels[8] = { .350f, .518f, .962f, 1.550f,   // Signal low
                     1.094f, 1.506f, 1.962f, 1.962f }; // Signal high
    // Calculate the luma and chroma by emulating the relevant circuits:
    auto wave = [](int p, int color) { return (color + 8 + p) % 12 < 6; };
    for(int p = 0; p < 12; ++p) // 12 clock cycles per pixel.
    {
        // NES NTSC modulator (square wave between two voltage levels):
        float spot = levels[level + 4 * (color <= 12 * wave(p, color))];
        // De-emphasis bits attenuate a part of the signal:
        if(((pixel & 0x40)  && wave(p, 12))
        || ((pixel & 0x80)  && wave(p,  4))
        || ((pixel & 0x100) && wave(p,  8))) spot *= attenuation;
        // Normalize. The /12 makes the pixel's 12 samples sum to the
        // average signal level when FlushScanline() adds them up.
        float v = (spot - black) / (white - black) / 12.f;
        r.levels[p] = v;
    }
}

// One period of the color subcarrier, sampled 12 times.
// (Named cosw/sinw so as not to collide with the <cmath> functions.)
#define c(v) std::cos(3.141592653 * (v) / 6) * 1.5
static const float cosw[12] =
    { c(0),c(1),c(2),c(3),c(4),c(5),c(6),c(7),c(8),c(9),c(10),c(11) };
static const float sinw[12] = // sin is cos shifted by 3/4 period
    { c(9),c(10),c(11),c(0),c(1),c(2),c(3),c(4),c(5),c(6),c(7),c(8) };
#undef c

void FlushScanline(unsigned py)
{
    u32* pix = (u32*) s->pixels; // SDL surface
    float level07 = 0.f, level15 = 0.f, cache[256*12];
    // Fade the scanline's signal: a 70% clarity filter (color bleeding) and
    // a 150% clarity filter (edge overshoot), mixed in 70%-30% proportion.
    for(unsigned o = 0, px = 0; px < 256; ++px)
        for(int p = 0; p < 12; ++p)
        {
            level07 = level07 *  0.3 + 0.7 * (yiqmap[py][px].levels[p] - 0.5f);
            level15 = level15 * -0.5 + 1.5 * (yiqmap[py][px].levels[p] - 0.5f);
            cache[o++] = 0.5f + (level07 * 0.7 + level15 * 0.3);
        }
    for(unsigned px = 0; px < 256; ++px)
        for(int r = 0; r < int(Xscale); ++r)
        {
            float yiq[3] = { 0.f, 0.f, 0.f };
            // Each subpixel decodes a 12-sample window, offset by
            // 12/Xscale signal samples from the previous subpixel.
            for(int x = px*12 + ((r + 1 - int(Xscale)) * 12 / int(Xscale)),
                    p = 0; p < 12; ++p, ++x)
            {
                if(x < 0 || x >= 256*12) continue;
                float v = cache[x];
                // Simulate ideal TV NTSC decoder:
                yiq[0] += v;
                yiq[1] += v * cosw[x % 12] * 1.5;
                yiq[2] += v * sinw[x % 12] * 1.5;
            }
            float gamma = 1.8f;
            // Convert YIQ into RGB according to the FCC-sanctioned matrix.
            auto gammafix = [=](float f) { return f < 0.f ? 0.f : std::pow(f, 2.2f / gamma); };
            auto clamp    = [](int v) { return v < 0 ? 0 : v > 255 ? 255 : v; };
            unsigned rgb =
                  0x10000 * clamp(255 * gammafix(yiq[0] +  0.946882f*yiq[1] +  0.623557f*yiq[2]))
                + 0x00100 * clamp(255 * gammafix(yiq[0] + -0.274788f*yiq[1] + -0.635691f*yiq[2]))
                + 0x00001 * clamp(255 * gammafix(yiq[0] + -1.108545f*yiq[1] +  1.709007f*yiq[2]));
            for(int p = 0; p < int(Yscale); ++p)
                pix[(py*Yscale + p) * (256*Xscale) + px*Xscale + r] = rgb;
        }
    //SDL_UpdateRect(s, 0, py*Yscale, 256*Xscale, Yscale);
    if(py == 239) SDL_Flip(s);
}
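For completeness, a sketch of how these two functions might be driven with a plain SDL 1.2 setup; the surface flags and the test pattern here are only illustrative, since the actual setup code is among what was omitted:
Code: Select all
int main()
{
    SDL_Init(SDL_INIT_VIDEO);
    s = SDL_SetVideoMode(256*Xscale, 240*Yscale, 32, SDL_SWSURFACE);
    for(unsigned py = 0; py < 240; ++py)
    {
        for(unsigned px = 0; px < 256; ++px)
            PutPixel(px, py, px / 16);  // arbitrary test pattern: a hue sweep
        FlushScanline(py);              // calls SDL_Flip() on the last scanline
    }
    SDL_Delay(3000);
    SDL_Quit();
    return 0;
}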
P.S. With this code, the super black color is actually meaningful! There is a theoretical difference in the next pixel depending on whether the previous pixel was black or super black. A very slight difference, but a difference nonetheless.
Here is a test image. Horizontally, all even pixels are either 0D or 1D; odd pixels go through everything from 00..3F. The 0D/1D selection toggles every 4 scanlines, and the emphasis bits change every 30 pixels. Firefox's color picker extension reveals that there indeed are differences every 4 pixels, but they are very small.
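For reference, the pattern can be generated with something along these lines (a sketch; the exact order in which the odd pixels sweep through 00..3F is a guess):
Code: Select all
for(unsigned py = 0; py < 240; ++py)
{
    for(unsigned px = 0; px < 256; ++px)
    {
        unsigned emphasis = ((px / 30) % 8) << 6;   // emphasis bits change every 30 pixels
        unsigned color = (px % 2 == 0)
            ? (((py / 4) % 2) ? 0x1D : 0x0D)        // even pixels: 0D/1D, toggling every 4 scanlines
            : ((px / 2) % 0x40);                    // odd pixels: everything from 00..3F
        PutPixel(px, py, color | emphasis);
    }
    FlushScanline(py);
}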
Could someone verify this effect on the NES?
One more note. The combination of square wave and RC filtering is quantized to 12 samples per pixel in this generator. More accurately, this assumes that the TV samples the video signal exactly 12 times per pixel; in reality, it samples it effectively continuously. I am not a master of integral mathematics, so I won't even try to model it more accurately than that. Chances are that the differences are completely negligible... If you care, simply read the same square wave level multiple times (e.g. instead of 12 samples per pixel, you'd get 48), adjust the blur factors appropriately, and change the /12 divider accordingly (e.g. to a /48 divider). It will of course be slower.
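A hedged sketch of what that change would look like in PutPixel's sample loop (OVERSAMPLE is a hypothetical constant; levels[] in struct cache would have to grow to 12*OVERSAMPLE entries, the cosw/sinw tables and the 12-sample decode window would need the same treatment, and the fade coefficients would need retuning):
Code: Select all
const int OVERSAMPLE = 4;  // 48 samples per pixel instead of 12
for(int p = 0; p < 12*OVERSAMPLE; ++p)
{
    // The wave phase still advances 12 steps per pixel, so each square
    // wave level is simply read OVERSAMPLE times:
    float spot = levels[level + 4*(color <= 12*wave(p/OVERSAMPLE, color))];
    // ... de-emphasis as before, using wave(p/OVERSAMPLE, ...) ...
    float v = (spot - black) / (white - black) / (12.f*OVERSAMPLE); // the /48 divider
    r.levels[p] = v;
}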