1. 26 Mar, 2012 1 commit
  2. 25 Mar, 2012 1 commit
  3. 24 Feb, 2012 1 commit
  4. 03 Feb, 2012 1 commit
  5. 31 Jan, 2012 2 commits
    • rv40: x86 SIMD for biweight · e5c9de2a
      Committed by Christophe Gisquet
      Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
      multiples of 512 (which is often the case when the values round up nicely).
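The fast path can be illustrated with a small scalar sketch. It assumes a per-pixel formula in which each weighted product is shifted right by 9 bits before the two are summed; that formula, and the function names below, are assumptions for illustration and are not taken from the commit. Under that assumption, a weight that is a multiple of 512 can be pre-shifted once, because `(w * s) >> 9` equals `(w >> 9) * s` exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: both weights are multiples of 512 (low 9 bits clear). */
static int weights_allow_fast_path(int w1, int w2)
{
    return ((w1 | w2) & 0x1FF) == 0;
}

/* Assumed general form: scale each product down by 9 bits, then round and
 * shift the sum by 5 bits. Not confirmed by the commit message. */
static uint8_t biweight_general(uint8_t a, uint8_t b, int w1, int w2)
{
    return (uint8_t)((((w2 * a) >> 9) + ((w1 * b) >> 9) + 0x10) >> 5);
}

/* Fast path: pre-shift the weights once, so the per-pixel work is two small
 * multiplies and one shift. Only valid when the low 9 bits of both weights
 * are zero. */
static uint8_t biweight_fast(uint8_t a, uint8_t b, int w1, int w2)
{
    assert(weights_allow_fast_path(w1, w2));
    const int s1 = w1 >> 9;
    const int s2 = w2 >> 9;
    return (uint8_t)((s2 * a + s1 * b + 0x10) >> 5);
}
```

With pre-shifted weights the products fit in 16 bits, which is plausibly why such a path suits SIMD multiply-add instructions; that reading is an inference, not a statement from the commit.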
      
      *_TIMER report for the 16x16 and 8x8 cases:
      C:
      9015 decicycles in 16, 524257 runs, 31 skips
      2656 decicycles in 8, 524271 runs, 17 skips
      MMX:
      4156 decicycles in 16, 262090 runs, 54 skips
      1206 decicycles in 8, 262131 runs, 13 skips
      MMX on fast-path:
      2760 decicycles in 16, 524222 runs, 66 skips
      995 decicycles in 8, 524252 runs, 36 skips
      SSE2:
      2163 decicycles in 16, 262131 runs, 13 skips
      832 decicycles in 8, 262137 runs, 7 skips
      SSE2 with fast path:
      1783 decicycles in 16, 524276 runs, 12 skips
      711 decicycles in 8, 524283 runs, 5 skips
      SSSE3:
      2117 decicycles in 16, 262136 runs, 8 skips
      814 decicycles in 8, 262143 runs, 1 skips
      SSSE3 with fast path:
      1315 decicycles in 16, 524285 runs, 3 skips
      578 decicycles in 8, 524286 runs, 2 skips
      
      This means around a 4% speedup for some sequences.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
    • x86: Give RV40 init file a more suitable name. · 91bafb52
      Committed by Diego Biurrun
  6. 30 Jan, 2012 2 commits
  7. 12 Jan, 2012 1 commit
    • rv34: DC-only inverse transform · 3faa303a
      Committed by Christophe GISQUET
      When decoding coefficients, detect whether the block is DC-only, and take
      advantage of this knowledge to perform DC-only inverse transform.
      
      This is achieved by:
      - first, changing the 108x4 element modulo_three_table into a 108 element
        table (kind of base4), and accessing each value using mask and shifts.
      - then, checking low bits for 0 (as they represent the presence of higher
        frequency coefficients)
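The two steps above can be sketched as follows. The layout is an assumption made for illustration: four small values packed two bits each into one byte, with the DC slot in the top two bits, so that the low six bits flag higher-frequency coefficients. The function names are hypothetical and the real table contents are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 2-bit values into one byte; c0 (the DC slot) goes in bits 7..6. */
static uint8_t pack4(unsigned c0, unsigned c1, unsigned c2, unsigned c3)
{
    assert(c0 <= 3 && c1 <= 3 && c2 <= 3 && c3 <= 3);
    return (uint8_t)((c0 << 6) | (c1 << 4) | (c2 << 2) | c3);
}

/* Read back slot idx (0 = DC slot, 3 = last slot) with a shift and a mask. */
static unsigned unpack2(uint8_t packed, int idx)
{
    assert(idx >= 0 && idx <= 3);
    return (packed >> (6 - 2 * idx)) & 3u;
}

/* True when none of the three non-DC slots is set, i.e. the low bits are 0. */
static int is_dc_only(uint8_t packed)
{
    return (packed & 0x3F) == 0;
}
```

A single byte per entry replaces four separate table elements, and the DC-only test becomes one mask and compare.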
      
      Also provide x86 SIMD code for the DC-only inverse transform.
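As a scalar illustration of why a DC-only block allows a shortcut, the sketch below compares a full 4x4 separable inverse transform with a version that computes a single value and fills the block. The basis constants (13, 17, 7), the rounding term and the 10-bit shift are assumptions chosen for the example, and the function names are hypothetical; the point is only that a block with a single nonzero DC coefficient transforms to a constant block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: every coefficient except block[0] is zero. */
static int is_dc_only_block(const int16_t block[16])
{
    for (int i = 1; i < 16; i++)
        if (block[i])
            return 0;
    return 1;
}

/* Full two-pass 4x4 inverse transform (assumed constants, see lead-in). */
static void inv_transform_full(int16_t out[16], const int16_t block[16])
{
    int temp[16];
    for (int i = 0; i < 4; i++) {
        const int z0 = 13 * (block[i] + block[i + 8]);
        const int z1 = 13 * (block[i] - block[i + 8]);
        const int z2 =  7 * block[i + 4] - 17 * block[i + 12];
        const int z3 = 17 * block[i + 4] +  7 * block[i + 12];
        temp[4 * i + 0] = z0 + z3;
        temp[4 * i + 1] = z1 + z2;
        temp[4 * i + 2] = z1 - z2;
        temp[4 * i + 3] = z0 - z3;
    }
    for (int i = 0; i < 4; i++) {
        const int z0 = 13 * (temp[i] + temp[i + 8]) + 0x200;
        const int z1 = 13 * (temp[i] - temp[i + 8]) + 0x200;
        const int z2 =  7 * temp[i + 4] - 17 * temp[i + 12];
        const int z3 = 17 * temp[i + 4] +  7 * temp[i + 12];
        out[4 * i + 0] = (int16_t)((z0 + z3) >> 10);
        out[4 * i + 1] = (int16_t)((z1 + z2) >> 10);
        out[4 * i + 2] = (int16_t)((z1 - z2) >> 10);
        out[4 * i + 3] = (int16_t)((z0 - z3) >> 10);
    }
}

/* DC-only shortcut: one multiply, one shift, then fill all 16 outputs. */
static void inv_transform_dc_only(int16_t out[16], const int16_t block[16])
{
    assert(is_dc_only_block(block));
    const int16_t v = (int16_t)((13 * 13 * block[0] + 0x200) >> 10);
    for (int i = 0; i < 16; i++)
        out[i] = v;
}
```

The fill loop is the part that maps naturally onto SIMD: the computed value is broadcast across a register and stored.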
      Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
  8. 09 Jan, 2012 1 commit
  9. 19 Dec, 2011 1 commit
  10. 14 Dec, 2011 1 commit
  11. 11 Oct, 2011 1 commit
  12. 12 Aug, 2011 1 commit
  13. 03 Jul, 2011 1 commit
  14. 21 Jun, 2011 1 commit
  15. 18 Jun, 2011 1 commit
  16. 06 Jun, 2011 1 commit
  17. 01 Jun, 2011 1 commit
  18. 21 May, 2011 1 commit
  19. 19 May, 2011 1 commit
  20. 11 May, 2011 1 commit
  21. 12 Mar, 2011 1 commit
  22. 11 Feb, 2011 1 commit
  23. 02 Feb, 2011 1 commit
  24. 17 Sep, 2010 1 commit
  25. 14 Sep, 2010 1 commit
    • Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from h264dsp_mmx.c to h264_idct.asm (as yasm code). · 1d16a1cf
      Committed by Ronald S. Bultje
      Because the loops are now
      coded in asm instead of C, this is (depending on the function) up to 50%
      faster for cases where gcc didn't do a great job at looping.
      
      Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
      in-asm idct calling can now be enabled for chroma as well (see r16207). For
      MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
      the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
      
      Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
  26. 10 Sep, 2010 1 commit
  27. 08 Sep, 2010 1 commit
  28. 04 Sep, 2010 1 commit
  29. 02 Sep, 2010 2 commits
  30. 31 Aug, 2010 4 commits
  31. 25 Aug, 2010 2 commits
  32. 08 Aug, 2010 1 commit
  33. 05 Aug, 2010 1 commit