    Revert of Revert of SSE4 opaque blend using intrinsics instead of assembly.... · 4bf1ce27
    stephana authored
    Revert of Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #1 id:1 of https://codereview.chromium.org/873553003/)
    
    Reason for revert:
    Reverted the wrong CL.
    
    Original issue's description:
    > Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #16 id:300001 of https://codereview.chromium.org/874863002/)
    >
    > Reason for revert:
    > This causes a bug on the 'hittestpath' GM on MacMini 4,1
    >
    > See:
    >
    > https://gold.skia.org/#/triage/hittestpath?head=0
    >
    > for details.
    >
    > Original issue's description:
    > > SSE4 opaque blend using intrinsics instead of assembly.
    > >
    > > Since we had such a hard time with the assembly versions of this blit (to the
    > > point that we have them completely disabled everywhere), I thought I'd take
    > > a shot at writing a version of the blit using intrinsics.
    > >
    > > The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
    > > to skip the blend when the 16 src pixels we consider each loop are all opaque
    > > or all transparent.  _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
    > > all those alphas.
    > >
    > > It's worth looking to see if we can backport this type of logic to SSE2 using
    > > _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
    > >
    > > My local performance testing doesn't show this to be an unambiguous win
    > > (there are probably microbenchmarks and SKPs where we'd be better off just
    > > powering through the blend rather than looking at alphas), but the potential
    > > does seem tantalizing enough to let skiaperf vet it on the bots.  (< 1.0x is a win.)
    > >
    > > DM says it draws pixel perfect compared to the old code.
    > >
    > > Microbenchmarks:
    > >                bitmap_RGBA_8888_A_source_stripes_two	  14us -> 14.4us	1.03x
    > >              bitmap_RGBA_8888_A_source_stripes_three	14.3us -> 14.5us	1.01x
    > >                        bitmap_RGBA_8888_scale_bilerp	61.9us -> 62.2us	1.01x
    > > bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp	 102us ->  101us	0.99x
    > >                 bitmap_RGBA_8888_scale_rotate_bilerp	 103us ->  101us	0.99x
    > >                               bitmap_RGBA_8888_scale	18.4us -> 18.2us	0.99x
    > >              bitmap_RGBA_8888_A_scale_rotate_bicubic	  71us ->   70us	0.99x
    > >          bitmap_RGBA_8888_update_scale_rotate_bilerp	 103us ->  101us	0.99x
    > >               bitmap_RGBA_8888_A_scale_rotate_bilerp	 112us ->  109us	0.98x
    > >                     bitmap_RGBA_8888_update_volatile	5.72us -> 5.58us	0.98x
    > >                                     bitmap_RGBA_8888	5.73us -> 5.58us	0.97x
    > >                              bitmap_RGBA_8888_update	5.78us ->  5.6us	0.97x
    > >                      bitmap_RGBA_8888_A_scale_bilerp	70.7us ->   68us	0.96x
    > >                     bitmap_RGBA_8888_A_scale_bicubic	23.7us -> 21.8us	0.92x
    > >                                   bitmap_RGBA_8888_A	13.9us -> 10.9us	0.78x
    > >                     bitmap_RGBA_8888_A_source_opaque	  14us -> 6.29us	0.45x
    > >                bitmap_RGBA_8888_A_source_transparent	  14us -> 3.65us	0.26x
    > >
    > > Running over our ~70 SKP web page captures, this looks like we spend 0.7x
    > > the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
    > > be a decent predictor of real-world impact.
    > >
    > > BUG=chromium:399842
    > >
    > > Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785
    > >
    > > CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot
    > >
    > > Committed: https://skia.googlesource.com/skia/+/6dbfb21a6c88af6d94e8c823c3ad559f1a41b493
    >
    > TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
    > NOPRESUBMIT=true
    > NOTREECHECKS=true
    > NOTRY=true
    > BUG=chromium:399842
    >
    > Committed: https://skia.googlesource.com/skia/+/4988891a1173cd405bf1c1dd3a3668c451f45e4c
    
    TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
    NOPRESUBMIT=true
    NOTREECHECKS=true
    NOTRY=true
    BUG=chromium:399842
    
    Review URL: https://codereview.chromium.org/894083002
SkBlitRow_opts_SSE4.cpp 2.67 KB
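
For readers without the file open, here is a rough sketch of the idea the original description outlines: use ptest (`_mm_testz_si128` / `_mm_testc_si128`) to detect all-transparent or all-opaque runs of source pixels and skip the blend for them. It is not the code from SkBlitRow_opts_SSE4.cpp. It assumes premultiplied 32-bit pixels with alpha in the top byte, looks at 4 pixels per iteration instead of the 16 the description mentions, and omits the `_mm_shuffle_epi8` alpha packing. The names `classify_sse41`, `classify_sse2`, `srcover`, `blit_row`, and the `SKETCH_SSE41` macro are hypothetical, and the scalar blend uses simple rounding rather than Skia's own math. `classify_sse2` is only a guess at what the suggested `_mm_movemask_epi8` backport could look like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <smmintrin.h>  // SSE4.1 intrinsics; also pulls in the SSE2 ones

// GCC/Clang: enable SSE4.1 per function instead of building with -msse4.1.
#if defined(__GNUC__)
#define SKETCH_SSE41 __attribute__((target("sse4.1")))
#else
#define SKETCH_SSE41
#endif

enum class Coverage { kTransparent, kOpaque, kMixed };

// Assumption: alpha lives in the top byte of each 32-bit pixel.
static inline __m128i alpha_mask() {
    return _mm_set1_epi32(static_cast<int>(0xFF000000u));
}

// SSE4.1: classify 4 pixels using ptest.
SKETCH_SSE41 Coverage classify_sse41(const uint32_t* src) {
    const __m128i px = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    if (_mm_testz_si128(px, alpha_mask())) return Coverage::kTransparent;  // (px & mask) == 0
    if (_mm_testc_si128(px, alpha_mask())) return Coverage::kOpaque;       // (~px & mask) == 0
    return Coverage::kMixed;
}

// SSE2 variant: same classification via byte compares and _mm_movemask_epi8.
Coverage classify_sse2(const uint32_t* src) {
    const __m128i px = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    const __m128i alphas = _mm_and_si128(px, alpha_mask());
    // Non-alpha bytes are zero on both sides, so they always compare equal.
    if (_mm_movemask_epi8(_mm_cmpeq_epi8(alphas, _mm_setzero_si128())) == 0xFFFF)
        return Coverage::kTransparent;
    if (_mm_movemask_epi8(_mm_cmpeq_epi8(alphas, alpha_mask())) == 0xFFFF)
        return Coverage::kOpaque;
    return Coverage::kMixed;
}

// Scalar premultiplied src-over for one pixel (simple rounding, not Skia's).
static uint32_t srcover(uint32_t s, uint32_t d) {
    const uint32_t inv = 255u - (s >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const uint32_t sc = (s >> shift) & 0xFFu;
        const uint32_t dc = (d >> shift) & 0xFFu;
        const uint32_t c = sc + (dc * inv + 127u) / 255u;
        out |= (c > 255u ? 255u : c) << shift;
    }
    return out;
}

// Blend src over dst, skipping the math for uniform groups of 4 pixels.
SKETCH_SSE41 void blit_row(uint32_t* dst, const uint32_t* src, size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        switch (classify_sse41(src + i)) {
            case Coverage::kTransparent:
                break;  // premultiplied and alpha == 0: dst is unchanged
            case Coverage::kOpaque:
                _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                                 _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i)));
                break;  // alpha == 255: src replaces dst
            case Coverage::kMixed:
                for (size_t j = 0; j < 4; ++j) dst[i + j] = srcover(src[i + j], dst[i + j]);
                break;
        }
    }
    for (; i < count; ++i) dst[i] = srcover(src[i], dst[i]);  // leftover pixels
}
```

Calling `blit_row(dst, src, n)` on a row whose source is entirely opaque or entirely transparent takes only the copy or skip paths, which is consistent with where the description's microbenchmarks show the largest gains (the `source_opaque` and `source_transparent` cases).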