C optimizations?

Formerly Fx3
Posts: 3265
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

C optimizations?

Post by Zepper » Thu May 06, 2021 2:52 pm

This is used to upscale a bitmap image to 4x.
Is it possible to optimize it?

Code: Select all

 //surface_00 is a (unsigned int) pointer to the bitmap.
 //surface_08 is a pointer to the next line of the bitmap.
 //surface_16 and 24 are pointers to the next lines.
 //value is the color (24bit format RRGGBB).
   *surface_00 = *surface_08 = *surface_16 = *surface_24 =
   surface_00[1] = surface_08[1] = surface_16[1] = surface_24[1] =
   surface_00[2] = surface_08[2] = surface_16[2] = surface_24[2] =
   surface_00[3] = surface_08[3] = surface_16[3] = surface_24[3] = value;
   surface_00 += 4; surface_08 += 4; surface_16 += 4; surface_24 += 4;

Posts: 8058
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: C optimizations?

Post by rainwarrior » Thu May 06, 2021 3:29 pm

That code is very simple and direct. I'd expect a modern compiler to do a very good job optimizing that, especially if it allows SSE2 optimizations. (Even more if you want to allow AVX.)

However, as for the general question "can this be optimized", I don't think that can be answered without looking at the specific assembly it generates (and on a modern CPU, even then the question might require more context than that).

I wouldn't really have any suggestions for optimizing that within C itself. Usually something that plain should come out well.

Posts: 1373
Joined: Tue Oct 06, 2015 10:16 am

Re: C optimizations?

Post by calima » Fri May 07, 2021 12:05 am

A C-level optimization is making sure everything is aligned and writing in 64-bit units.
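
A rough sketch of the 64-bit idea, not taken from the original code: `fill_4x4_u64`, `dst` and `pitch` are made-up names, and it assumes 32-bit pixels with `pitch` counted in pixels. `memcpy` is used for the stores so the code stays valid whatever the alignment of `dst`; compilers typically turn a fixed-size 8-byte `memcpy` into a single store, but that would need checking in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes one source pixel as a 4x4 block.
   dst   = top-left pixel of the block in the destination bitmap
   pitch = destination row length in pixels (assumed >= 4) */
static void fill_4x4_u64(uint32_t *dst, size_t pitch, uint32_t value)
{
    assert(pitch >= 4);

    /* Two copies of the 32-bit pixel packed into one 64-bit word.
       Both halves are identical, so byte order does not matter. */
    const uint64_t pair = ((uint64_t)value << 32) | value;

    for (size_t row = 0; row < 4; ++row) {
        uint32_t *p = dst + row * pitch;
        memcpy(p, &pair, sizeof pair);     /* pixels 0 and 1 */
        memcpy(p + 2, &pair, sizeof pair); /* pixels 2 and 3 */
    }
}
```

In the original loop this would replace the sixteen assignments, with `dst` advancing by 4 pixels per source pixel.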

Posts: 57
Joined: Thu Sep 03, 2020 12:56 am

Re: C optimizations?

Post by none » Sat May 22, 2021 5:57 pm

You could use vector intrinsics to make it faster:

Code: Select all

        // surface, surface1, surface2 and surface3 are assumed to be __m128i pointers to the four output rows.
        __m128i broadcast = _mm_set1_epi32(source[ii]);
        surface[ii] = surface1[ii] = surface2[ii] = surface3[ii] = broadcast;

If you don't want to use intrinsics, it is hard to get compilers to optimize things like this completely.
Neither gcc nor clang will apply SIMD optimizations to your code.
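
For reference, a more complete sketch of the intrinsics approach, under some assumptions: `fill_4x4_sse2`, `dst` and `pitch` are hypothetical names, pixels are 32-bit, and unaligned stores are used because nothing is known about the alignment of the destination rows. The scalar fallback is only there so the snippet still builds on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: writes one source pixel as a 4x4 block.
   pitch = destination row length in pixels (assumed >= 4) */
static void fill_4x4_sse2(uint32_t *dst, size_t pitch, uint32_t value)
{
    assert(pitch >= 4);
#if defined(__SSE2__)
    /* Replicate the pixel into all four 32-bit lanes. */
    const __m128i broadcast = _mm_set1_epi32((int)value);
    for (size_t row = 0; row < 4; ++row) {
        /* One unaligned 16-byte store covers four pixels of this row. */
        _mm_storeu_si128((__m128i *)(dst + row * pitch), broadcast);
    }
#else
    /* Plain C fallback for targets without SSE2. */
    for (size_t row = 0; row < 4; ++row) {
        uint32_t *p = dst + row * pitch;
        p[0] = p[1] = p[2] = p[3] = value;
    }
#endif
}
```

If the rows are known to be 16-byte aligned, `_mm_store_si128` could be used instead of the unaligned store.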

If you switch around the column / row ordering like this, at least clang will use SSE or AVX; gcc still won't.

Code: Select all

        surface_00[0] = surface_00[1] = surface_00[2] = surface_00[3] =
        surface_08[0] = surface_08[1] = surface_08[2] = surface_08[3] = 
        surface_16[0] = surface_16[1] = surface_16[2] = surface_16[3] =
        surface_24[0] = surface_24[1] = surface_24[2] = surface_24[3] = value;

clang will generate assembly like this:

Code: Select all

        vbroadcastss    xmm0, dword ptr [rdx + 4*rsi]
        inc     rsi
        vmovups xmmword ptr [rcx + 4*r8], xmm0
        vmovups xmmword ptr [rcx + 4*rdi], xmm0
        vmovups xmmword ptr [rcx + 4*rax], xmm0
        vmovups xmmword ptr [rcx], xmm0

Keeping a separate pointer for each line might or might not be a good idea; it depends heavily on the surrounding code, architecture, and compiler. As a rule of thumb, however, if you have copying code in a loop, compilers usually optimize better if you use indexing and row-major ordering instead, e.g.

Code: Select all

for (int i = 0; i < n; ++i) { surface[i * 4 + w * 0 + 0] = surface[i * 4 + w * 0 + 1] = .... = source_image[i]; }
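
Spelled out in full, the indexed version might look something like the sketch below. `upscale_row_4x` is a made-up name, and `w` is assumed to be the destination row length in pixels; whether a given compiler actually vectorizes it would still have to be checked in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expands one source row of n pixels into four
   destination rows, each pixel becoming a 4x4 block.
   w = destination row length in pixels (assumed >= 4 * n) */
static void upscale_row_4x(uint32_t *surface, size_t w,
                           const uint32_t *source_image, size_t n)
{
    assert(w >= 4 * n);
    for (size_t i = 0; i < n; ++i) {
        const uint32_t value = source_image[i];
        for (size_t row = 0; row < 4; ++row) {
            /* Four consecutive pixels in one row are written together
               before moving on to the next row. */
            uint32_t *p = surface + row * w + i * 4;
            p[0] = p[1] = p[2] = p[3] = value;
        }
    }
}
```

For a whole image, this would be called once per source row, with `surface` advanced by `4 * w` pixels each time.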

Posts: 4462
Joined: Fri Nov 19, 2004 7:35 pm

Re: C optimizations?

Post by Dwedit » Sat May 22, 2021 7:16 pm

I would avoid doing image scaling in software. A GPU will do a much better job drawing a single rectangle to a bigger render target.

I wonder if I should write a few D3D9 tutorials to explain how to use it for 2D graphics.

Posts: 10666
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: C optimizations?

Post by lidnariq » Sat May 22, 2021 7:36 pm

In this case, the problem is "how does one do image scaling while still using Allegro, given that a total toolkit rewrite is out of scope".
