Optimizing alpha blending of two 32bit images : Graphics
This is a discussion on Optimizing alpha blending of two 32bit images within the Graphics forums in Theory and Concepts category; Hi, I'm working on the alpha blending algorithm of two 32bit images. The background image is not opaque. The result should be an image that contains both the alpha channel mixed. I searched for the algorithm and I found this rule: FinalAlpha = 255-(255-Alpha1)*(255-Alpha2)/255; FinalColor = Alpha1*Color1/FinalAlpha + Alpha2*Color2*(255-Alpha1)/ FinalAlpha/255; where Color1/Alpha1 represents the foreground image, and Color2/Alpha2 represents the background. So I came up with the following code to blend them together: // pxD is the RGBA sequence for the output, // pxFG is foreground while pxBg is background BYTE *pxD, *pxFg, *pxBg; BYTE aD; for (px=0; px < ...
![]() |
| | LinkBack | Thread Tools |
|
#1
| |||
| |||
| I'm working on the alpha blending algorithm of two 32bit images. The background image is not opaque. The result should be an image that contains both the alpha channel mixed. I searched for the algorithm and I found this rule: FinalAlpha = 255-(255-Alpha1)*(255-Alpha2)/255; FinalColor = Alpha1*Color1/FinalAlpha + Alpha2*Color2*(255-Alpha1)/ FinalAlpha/255; where Color1/Alpha1 represents the foreground image, and Color2/Alpha2 represents the background. So I came up with the following code to blend them together: // pxD is the RGBA sequence for the output, // pxFG is foreground while pxBg is background BYTE *pxD, *pxFg, *pxBg; BYTE aD; for (px=0; px < nTotalPx; px+=4) { aD = 255 - (255-pxFg[px+3]) * (255-pxBg[px+3]) / 255; if (aD != 0 ) { pxD[px+0] = pxFg[px+3] * pxFg[px+0] / aD + pxBg[px+0] * pxBg[px+3] * (255 - pxFg[px+3]) / aD / 255; pxD[px+1] = pxFg[px+3] * pxFg[px+1] / aD + pxBg[px+1] * pxBg[px+3] * (255 - pxFg[px+3]) / aD / 255; pxD[px+2] = pxFg[px+3] * pxFg[px+2] / aD + pxBg[px+2] * pxBg[px+3] * (255 - pxFg[px+3]) / aD / 255; } pxD[px+3] = aD; } It works but it looks pretty complicated to me. And I have no idea how to optimize it. I appreciate it if anyone can give me some hints. Will MMX help in this situation? Thanks in advance, -- Bill Holt |
|
#2
| |||
| |||
| Bill Holt <mailbill@21cn.com> wrote: > Hi, > > I'm working on the alpha blending algorithm of two 32bit images. The > background image is not opaque. The result should be an image that > contains both the alpha channel mixed. You mean: final_color = background + foreground*(bck_alpha * fore_alpha)? Integer SIMD instructions (MMX/SSE2) will help. w. |
|
#3
| |||
| |||
| On Apr 7, 9:22 pm, Wojciech Muła <wojciech_m...@poczta.null.onet.pl.invalid> wrote: > You mean: final_color = background + foreground*(bck_alpha * fore_alpha)? > Integer SIMD instructions (MMX/SSE2) will help. > > w. Thanks, but where did you find that algorithm? The result doesn't look right. The bright part is brighter than expected, and the dark part is darker. What I want to simulate, is the effect that one translucent image is layered above another translucent image in Photoshop, without the background layer of course. Best regards, Bill Holt |
|
#4
| |||
| |||
| Look at Graphics32 it has MMX optimisation |
|
#5
| |||
| |||
| Bill Holt wrote: > I'm working on the alpha blending algorithm of two 32bit images. The > background image is not opaque. The result should be an image that > contains both the alpha channel mixed. I searched for the algorithm > and I found this rule: > > FinalAlpha = 255-(255-Alpha1)*(255-Alpha2)/255; > FinalColor = Alpha1*Color1/FinalAlpha + Alpha2*Color2*(255-Alpha1)/ > FinalAlpha/255; > > where Color1/Alpha1 represents the foreground image, and Color2/Alpha2 > represents the background. This rule can be derived in many different ways and I think it's probably one of the most essential rules in digital image compositing. It was first described in 1981 in a SIGGRAPH paper by Bruce A. Wallace and Mark Levoy: <http://graphics.stanford.edu/papers/merging-sig81/> It's also been described in the following paper by Alvy Ray Smith (where he argues that premultiplied alpha values should be used whenever possible): <ftp://ftp.alvyray.com/Acrobat/4_Comp.pdf> One way to approach this problem is to think of it in terms of physical transportation of light. If we think of a single photon being transmitted onto the foreground layer, we know that it may be either reflected or absorbed by the layer, or it may be transmitted through it. We also know that the probability of each of these events depends on the color and the translucensy of this particular layer. In fact, the opacity of the layer can be thought of as the probability by which the photon is either absorbed or reflected. Now, the derivation of the formula will require some understanding of conditional probability and Bayesian statistics. Let us start by defining the following events: r1 -- the photon is reflected or absorbed by the foreground layer, r2 -- the photon is reflected or absorbed by the background layer, rx -- the photon is reflected or absorbed by either layer [r1 OR r2]. We can then define the probability P(r1), that the photon is reflected/absorbed by layer 1 and the conditional probabilities P(r2|r1 = false) and P(r2|r1 = true) that the photon is reflected/absorbed by layer 2, given that it is first transmitted through layer 1: P(r1) = a1 (a1 = foreground alpha) P(r2|r1 = false) = a2 (a2 = background alpha) The probability that the photon is reflected/absorbed by layer 2, P(r2), is given as the joint probability of r1 = false and r2, i.e.: P(r2) = P(r1 = false, r2) = P(r1 = false) * P(r2|r1 = false) = (1 - a1) * a2. We know that the output alpha, a_out, should yield the same reflectance probability as the total reflectance probability of both of the unmerged layers, and hence: a_out = P(rx) = P(r1) + P(r2) = a1 + a2 - a1 * a2 Furthermore, we may think of the color value of a layer is the ratio between what is reflected and what is absorbed. In order to determine the output color, c_out, we need to know the color contribution of the foreground and the background layer respectively. Thus, we need to know the conditional probability that given that the photon is reflected/absorbed, it is reflected/absorbed by either the foreground or the background, i.e. P(r1|rx) and P(r2|rx). This is where Bayes theorem comes to rescue. Bayes theorem states that we can relate two random events A and B according to the following formula: P(A|B) = P(B|A)*P(A)/P(B) We know that given that the photon is reflected by either layer, the probability of rx must necessarily be 1, P(rx|r1) = P(rx|r2) = 1, and consequently, from Bayes theorem, it follows that P(r1|rx) = P(rx|r1)*P(r1)/P(rx) = a1 / (a1 + a2 - a1 * a2) P(r2|rx) = P(rx|r2)*P(r2)/P(rx) = (1 - a1) * a2 / (a1 + a2 - a1 * a2) If c1 and c2 are the colors of the foreground and the background respectively, then the resultant color is given as sum of the contribution of each layer: c_out = c1 * P(r1|rx) + c2 * P(r2|rx) = (c1 * a1 + c2 * (1 - a1) * a2) / (a1 + a2 - a1 * a2) = (c1 * a1 + c2 * (1 - a1) * a2) / a_out By using premultiplied opacity values, this can further be simplified to: c_out = c1 + c2 * (1 - a1) Mattias |
|
#6
| |||
| |||
| On Apr 8, 5:56 am, "Mattias Andersson" <matt...@centaurix.com> wrote: > By using premultiplied opacity values, this can further be simplified to: > > c_out = c1 + c2 * (1 - a1) > > Mattias Thank you Mattias for the explanation. Now I understand the origin of this rule. As you suggested, I used premultiplied alpha and your last equation to rewrite the blending code. Evaluation showed a 30% of performance increase. I guess premultiplication contributed a lot. With division optimization I found somewhere else, I got another 20% increase. Now the code looks like this: #define DIVIDE_BY_255(thisInput)\ ((thisInput+(((thisInput+128)>>8)+128))>>8 ) WORD tmpVar1, tmpVar2; DWORD tmpVar3; for (px=0; px < nTotalPx; px+=4) { tmpVar1 = pixelsFg[px+3]*pixelsBg[px+3]; pxD[px+3] = pixelsFg[px+3]+pixelsBg[px+3]-DIVIDE_BY_255(tmpVar1); tmpVar1 = 255-pixelsFg[px+3]; tmpVar2 = pixelsFg[px+0]*pixelsFg[px+3]; tmpVar3 = pixelsBg[px+0]*pixelsBg[px+3]*tmpVar1; pxD[px+0] = DIVIDE_BY_255(tmpVar2)+DIVIDE_BY_255(DIVIDE_BY_255 (tmpVar3)); tmpVar2 = pixelsFg[px+1]*pixelsFg[px+3]; tmpVar3 = pixelsBg[px+1]*pixelsBg[px+3]*tmpVar1; pxD[px+1] = DIVIDE_BY_255(tmpVar2)+DIVIDE_BY_255(DIVIDE_BY_255 (tmpVar3)); tmpVar2 = pixelsFg[px+2]*pixelsFg[px+3]; tmpVar3 = pixelsBg[px+2]*pixelsBg[px+3]*tmpVar1; pxD[px+2] = DIVIDE_BY_255(tmpVar2)+DIVIDE_BY_255(DIVIDE_BY_255 (tmpVar3)); } It's a bit hard to read. But it works perfectly. I'm still looking for SSE or MMX hints to optimize the above code. Or is it still worth further working? Best regards, Bill Holt |
|
#7
| |||
| |||
| Bill Holt wrote: > I'm still looking for SSE or MMX hints to optimize the above code. Or > is it still worth further working? Well, there is always a trade-off between performance and accuracy. In this routine you rely on using a DIVIDE_BY_255 look up table. If you were using MMX, this wouldn't be a feasible, since there are no instructions for reading from multiple memory locations at once. Instead you would have to use bitwise shift instructions, which will impact the accuracy. In the Graphics32 library, we decided not to use MMX, since the look up table approach was both faster and had better accuracy. One optimization, however, would be to also take some special cases into account, e.g.: if (a1 == 0) return c2; if (a1 == 1) return c1; if (a2 == 0) return c1; if (a2 == 1) return a1 * c1 + (1 - a1) * c2; Because quite often pixels may be either fully transparent or fully opaque. HTH, Mattias |
|
#8
| |||
| |||
| "Mattias Andersson" <mattias@centaurix.com> wrote: > Bill Holt wrote: > > I'm still looking for SSE or MMX hints to optimize the above code. Or > > is it still worth further working? > > Well, there is always a trade-off between performance and accuracy. In this > routine you rely on using a DIVIDE_BY_255 look up table. If you were using > MMX, this wouldn't be a feasible, since there are no instructions for > reading from multiple memory locations at once. Instead you would have to > use bitwise shift instructions, which will impact the accuracy. > > In the Graphics32 library, we decided not to use MMX, since the look up > table approach was both faster and had better accuracy. MMX has too short registers and do not provide any instruction to get a permutation of bytes (it is really hard to populate alpha values). However SSE instruction set brings 128-bit registers, and SSSE3 (or SSE3) has nice instruction called PSHUFB, that shuffles bytes in any order. It is possible (in best case) to add & multiply eight 16-bit components at the same time. Intel has published lot of MMX application notes, I think most of them remain feasible for SSE integer instructions: http://softwarecommunity.intel.com/a...s/eng/1713.htm. w. |
|
#9
| |||
| |||
| Bill Holt <mailbill@21cn.com> wrote: > Thanks, but where did you find that algorithm? Sorry, I didn't understand your problem and post too fast. w. |
|
#10
| |||
| |||
| For division by 255 there is the good old Jim Blinn-Trick which encodes the first continued fraction and adds some rounding: ui32 iyteblend (ui8 a, ui8 b) // returns a*b/255. Exact in all corner-cases but // some difference in the inbetweens. { i32 t = a * b; return ((t>>8) + t + 1)>>8; } For ARGB32 blending you might want to check out this weblink: http://www.stereopsis.com/doubleblend.html Nils |



