C optimizations?

Zepper
Formerly Fx3
Posts: 3262
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

C optimizations?

Post by Zepper »

This is used to upscale a bitmap image to 4x.
Is it possible to optimize it?

Code:

 //surface_00 is a (unsigned int) pointer to the bitmap.
 //surface_08 is a pointer to the next line of the bitmap.
 //surface_16 and 24 are pointers to the next lines.
 //value is the color (24bit format RRGGBB).
 
   *surface_00 = *surface_08 = *surface_16 = *surface_24 =
   surface_00[1] = surface_08[1] = surface_16[1] = surface_24[1] =
   surface_00[2] = surface_08[2] = surface_16[2] = surface_24[2] =
   surface_00[3] = surface_08[3] = surface_16[3] = surface_24[3] = value;
   surface_00 += 4; surface_08 += 4; surface_16 += 4; surface_24 += 4;
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: C optimizations?

Post by rainwarrior »

That code is very simple and direct. I'd expect a modern compiler to do a very good job optimizing that, especially if it allows SSE2 optimizations. (Even more if you want to allow AVX.)

However, for the general question "can this be optimized", I don't think it can be answered without looking at the specific assembly the compiler generates (and on a modern CPU, even then the question might require more context than that).

I wouldn't really have any suggestions for optimizing that within C itself. Usually something that plain should come out well.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: C optimizations?

Post by calima »

A C-level optimization is making sure everything is aligned and writing in 64-bit units.
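As a rough sketch of what 64-bit writes could look like here (the function name and the `rows` array are made up for illustration, standing in for surface_00 to surface_24, and the 8-byte alignment of each row pointer is an assumption that the assert checks):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill a 4x4 block with one pixel using 64-bit writes.
   rows[0..3] point at the same x position on four consecutive output lines. */
void put_block_4x4_u64(uint32_t *rows[4], uint32_t value)
{
    /* Two copies of the 32-bit pixel packed into one 64-bit word.
       Both halves are equal, so byte order does not matter. */
    const uint64_t pair = ((uint64_t)value << 32) | value;

    for (int r = 0; r < 4; ++r) {
        /* Assumed precondition: each row pointer is 8-byte aligned. */
        assert(((uintptr_t)rows[r] & 7u) == 0);

        /* memcpy sidesteps strict-aliasing problems; compilers typically
           turn a fixed 8-byte copy into a single store. */
        memcpy(rows[r],     &pair, sizeof pair);
        memcpy(rows[r] + 2, &pair, sizeof pair);
    }
}
```

Each output line then takes two 8-byte writes per source pixel instead of four 4-byte writes. Whether that is actually faster depends on the compiler and target, so it is worth measuring.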
none
Posts: 117
Joined: Thu Sep 03, 2020 1:09 am

Re: C optimizations?

Post by none »

You could use vector intrinsics to make it faster:

Code:

        // Only compiles if surface..surface3 are __m128i pointers, one per output line.
        __m128i broadcast = _mm_set1_epi32(source[ii]);  // same pixel in all four lanes
        surface[ii] = surface1[ii] = surface2[ii] = surface3[ii] = broadcast;
If you don't want to use intrinsics, it is hard to get compilers to optimize code like this completely.
Neither gcc nor clang will apply SIMD optimizations to your code as written.
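For reference, a fuller version of the intrinsics idea might look like the sketch below. The function name, the `pitch` parameter (output line length in pixels) and the scalar fallback are assumptions added for illustration. It uses plain `uint32_t` pointers with unaligned stores, so it does not depend on the surfaces being declared as `__m128i` pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Upscale one row of n source pixels 4x in both directions.
   dst points at the first of four output lines; pitch is the output
   line length in pixels (assumed >= 4 * n). */
void upscale4x_row(uint32_t *dst, size_t pitch, const uint32_t *src, size_t n)
{
    assert(pitch >= 4 * n);
    for (size_t i = 0; i < n; ++i) {
        uint32_t *out = dst + 4 * i;
#if defined(__SSE2__)
        /* Broadcast the pixel to all four 32-bit lanes. */
        const __m128i px = _mm_set1_epi32((int)src[i]);
        for (size_t r = 0; r < 4; ++r) {
            /* Unaligned store, so out needs no special alignment. */
            _mm_storeu_si128((__m128i *)(out + r * pitch), px);
        }
#else
        /* Scalar fallback for targets without SSE2. */
        for (size_t r = 0; r < 4; ++r)
            for (size_t c = 0; c < 4; ++c)
                out[r * pitch + c] = src[i];
#endif
    }
}
```

Calling it once per source line, with `dst` advanced by `4 * pitch` each time, would cover a whole image.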

If you switch around the column / row ordering like this, at least clang will use SSE or AVX; gcc still won't.

Code:

        
        surface_00[0] = surface_00[1] = surface_00[2] = surface_00[3] =
        surface_08[0] = surface_08[1] = surface_08[2] = surface_08[3] = 
        surface_16[0] = surface_16[1] = surface_16[2] = surface_16[3] =
        surface_24[0] = surface_24[1] = surface_24[2] = surface_24[3] = value;
clang will generate assembly like this:

Code:

        vbroadcastss    xmm0, dword ptr [rdx + 4*rsi]
        inc     rsi
        vmovups xmmword ptr [rcx + 4*r8], xmm0
        vmovups xmmword ptr [rcx + 4*rdi], xmm0
        vmovups xmmword ptr [rcx + 4*rax], xmm0
        vmovups xmmword ptr [rcx], xmm0



Keeping a separate pointer for each line might or might not be a good idea; it depends heavily on the surrounding code, architecture and compiler. As a rule of thumb, however, if you have copying code in a loop, compilers usually optimize better if you use indexing instead and use row-major ordering, e.g.

Code:

for (int i = 0; i < n; ++i) { surface[i * 4 + w * 0 + 0] = surface[i * 4 + w * 0 + 1] = .... = source_image[i]; }
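Written out in full, the indexed loop might look something like this sketch. The function name, the `restrict` qualifiers and the assert are assumptions added here; `w` is taken to be the output line width in pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* For each source pixel i, fill a 4x4 block whose rows start at
   surface[i * 4 + w * r]. w is assumed to be >= 4 * n. */
void upscale4x_indexed(uint32_t *restrict surface, size_t w,
                       const uint32_t *restrict source_image, size_t n)
{
    assert(w >= 4 * n);
    for (size_t i = 0; i < n; ++i) {
        const uint32_t value = source_image[i];
        for (size_t r = 0; r < 4; ++r) {        /* row first ... */
            for (size_t c = 0; c < 4; ++c) {    /* ... then column */
                surface[i * 4 + w * r + c] = value;
            }
        }
    }
}
```

The `restrict` qualifiers tell the compiler that the source and destination do not overlap, which can make vectorization easier for it. Whether a given compiler actually vectorizes this is something to check in the generated assembly.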
Dwedit
Posts: 4922
Joined: Fri Nov 19, 2004 7:35 pm

Re: C optimizations?

Post by Dwedit »

I would avoid doing image scaling in software. A GPU will do a much better job drawing a single rectangle to a bigger render target.

I wonder if I should write a few D3D9 tutorials to explain how to use it for 2D graphics.
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: C optimizations?

Post by lidnariq »

In this case, the problem is "how does one do image scaling while still using Allegro, given that a total toolkit rewrite is out of scope".