- 26 Mar, 2012 1 commit
-
-
Committed by Diego Biurrun
-
- 25 Mar, 2012 1 commit
-
-
Committed by Diego Biurrun
-
- 24 Feb, 2012 1 commit
-
-
Committed by Christophe GISQUET
The 32-bit targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square
C/32 bits: 82c (unrolled) / 102c
C/64 bits: 69c (unrolled) / 82c
SSE/32 bits: 42c
SSE/64 bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
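The unrolling mentioned in the message above can be illustrated with a minimal C sketch. The log does not show the real signature or data layout of sbr_sum_square, so this version assumes a flat array of floats, and the names `sum_square_ref` and `sum_square_unrolled` are hypothetical. The four independent accumulators loosely mirror the four lanes of an SSE register; this is not the committed code.

```c
#include <stddef.h>

/* Straightforward reference: sum of x[i]^2 over n floats. */
static float sum_square_ref(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* Unrolled variant (hypothetical): four independent accumulators,
 * four elements per iteration, then a scalar loop for the tail. */
static float sum_square_unrolled(const float *x, size_t n)
{
    float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc[0] += x[i + 0] * x[i + 0];
        acc[1] += x[i + 1] * x[i + 1];
        acc[2] += x[i + 2] * x[i + 2];
        acc[3] += x[i + 3] * x[i + 3];
    }

    /* Final horizontal sum of the partial accumulators. */
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    /* Remaining 0..3 elements. */
    for (; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}
```

Because the unrolled version adds the terms in a different order, its result can differ from the reference in the last bits for general floating-point input; for small integer-valued inputs both are exact.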
-
- 03 Feb, 2012 1 commit
-
-
Committed by Ronald S. Bultje
This will be useful to test more aggressively for failures to mark XMM registers as clobbered in Win64 builds, and prevent regressions thereof. Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
-
- 31 Jan, 2012 2 commits
-
-
Committed by Christophe Gisquet
Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely).
*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips
This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
Committed by Diego Biurrun
-
- 30 Jan, 2012 2 commits
-
-
Committed by Ronald S. Bultje
-
Committed by Ronald S. Bultje
-
- 12 Jan, 2012 1 commit
-
-
Committed by Christophe GISQUET
When decoding coefficients, detect whether the block is DC-only, and take advantage of this knowledge to perform DC-only inverse transform. This is achieved by:
  - first, changing the 108x4 element modulo_three_table into a 108 element table (kind of base4), and accessing each value using mask and shifts.
  - then, checking low bits for 0 (as they represent the presence of higher frequency coefficients)
Also provide x86 SIMD code for the DC-only inverse transform.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
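The table packing described in the message above can be sketched in C as follows. This is an illustration under stated assumptions rather than the committed code: the helper names (`pack_digits`, `get_digit`, `is_dc_only`) are hypothetical, and the bit layout (first value in the top two bits, the remaining three in the low six bits) is assumed, since the log does not spell it out.

```c
#include <stdint.h>

/* Pack four values in the range 0..2 into one byte, two bits each
 * ("kind of base 4"). Assumed layout: d[0] occupies bits 7..6,
 * d[3] occupies bits 1..0. */
static uint8_t pack_digits(const uint8_t d[4])
{
    return (uint8_t)((d[0] << 6) | (d[1] << 4) | (d[2] << 2) | d[3]);
}

/* Extract value number idx (0..3) with a shift and a mask. */
static unsigned get_digit(uint8_t packed, int idx)
{
    return (packed >> (6 - 2 * idx)) & 3u;
}

/* Under the assumed layout, the low six bits hold the three
 * higher-frequency entries; if they are all zero, only the first
 * (DC) entry can be nonzero. */
static int is_dc_only(uint8_t packed)
{
    return (packed & 0x3F) == 0;
}
```

With this layout a 108x4 table of small values shrinks to 108 bytes, and the DC-only test becomes a single mask-and-compare on the packed entry.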
-
- 09 Jan, 2012 1 commit
-
-
Committed by Vitor Sessak
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 19 Dec, 2011 1 commit
-
-
Committed by Diego Biurrun
-
- 14 Dec, 2011 1 commit
-
-
Committed by Diego Biurrun
-
- 11 Oct, 2011 1 commit
-
-
Committed by Ronald S. Bultje
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
-
- 12 Aug, 2011 1 commit
-
-
Committed by Kostya Shishkov
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 03 Jul, 2011 1 commit
-
-
Committed by Daniel Kang
Mainly ported from 8-bit H.264 qpel. Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 21 Jun, 2011 1 commit
-
-
Committed by Daniel Kang
Mainly ported from 8-bit H.264 weight/biweight.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 18 Jun, 2011 1 commit
-
-
Committed by Daniel Kang
Mainly ported from 8-bit H.264 MC Chroma.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 06 Jun, 2011 1 commit
-
-
Committed by Daniel Kang
Parts are inspired by the 8-bit H.264 predict code in Libav. Other parts ported from x264 with relicensing permission from author.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 01 Jun, 2011 1 commit
-
-
Committed by Daniel Kang
Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired by 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
-
- 21 May, 2011 1 commit
-
-
Committed by Vitor Sessak
-
- 19 May, 2011 1 commit
-
-
Committed by Mans Rullgard
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 11 May, 2011 1 commit
-
-
Committed by Jason Garrett-Glaser
-
- 12 Mar, 2011 1 commit
-
-
Committed by Mans Rullgard
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 11 Feb, 2011 1 commit
-
-
Committed by Justin Ruggles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 02 Feb, 2011 1 commit
-
-
Committed by Justin Ruggles
This will be beneficial for use with the audio conversion API without requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 17 Sep, 2010 1 commit
-
-
Committed by Ronald S. Bultje
Win64/FATE issues. Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 14 Sep, 2010 1 commit
-
-
Committed by Ronald S. Bultje
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping. Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%. Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
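The kind of per-block driver loop that the message above describes moving from C into asm might look roughly like the following sketch. Everything here is hypothetical: the function names, the argument layout, and the placeholder per-block routine (which only adds a rounded DC term instead of performing a real inverse transform) are assumptions made for illustration, not the FFmpeg code.

```c
#include <stdint.h>

#define BLOCK_COEFFS 16  /* coefficients per 4x4 block */

/* Placeholder per-block routine: adds the rounded DC value to a
 * 4x4 pixel block, clamping to 0..255. A real decoder would run a
 * full inverse transform here. */
static void add_dc_4x4(uint8_t *dst, int stride, const int16_t *block)
{
    int dc = (block[0] + 32) >> 6;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int v = dst[y * stride + x] + dc;
            dst[y * stride + x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
    }
}

/* Driver loop: visit every block and skip those flagged as having
 * no nonzero coefficients. Returns the number of blocks processed. */
static int idct_add_blocks(uint8_t *dst, const int *block_offset,
                           const int16_t *coeffs, int stride,
                           const uint8_t *nnz, int nblocks)
{
    int done = 0;
    for (int i = 0; i < nblocks; i++) {
        if (nnz[i]) {
            add_dc_4x4(dst + block_offset[i], stride,
                       coeffs + i * BLOCK_COEFFS);
            done++;
        }
    }
    return done;
}
```

The loop body is tiny, so the call and branch overhead per block is a visible share of the cost; that is the part a hand-written asm loop can tighten when the compiler does a poor job.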
-
- 10 Sep, 2010 1 commit
-
-
Committed by Jason Garrett-Glaser
This leaves no more GPL-only H.264 decoding asm code. Approved by Loren. Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 08 Sep, 2010 1 commit
-
-
Committed by Stefano Sabatini
function and rename it to av_get_cpu_flags(). Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 04 Sep, 2010 1 commit
-
-
Committed by Ronald S. Bultje
format), LGPL'ed with permission from Jason and Loren. This includes mmx2 code, so remove inline asm from h264dsp_mmx.c accordingly. Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 02 Sep, 2010 2 commits
-
-
Committed by Ronald S. Bultje
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be removed. OK'ed by Jason on IRC. Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Committed by Ronald S. Bultje
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c, which represents H264DSPContext and is now compiled on its own. Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 31 Aug, 2010 4 commits
-
-
Committed by Ronald S. Bultje
Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Committed by Ronald S. Bultje
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in dsputil_mmx.c). Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Committed by Ronald S. Bultje
fate failures on Win64. Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Committed by Ronald S. Bultje
issues on Win64. Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 25 Aug, 2010 2 commits
-
-
Committed by Ronald S. Bultje
help in fixing the Win64 fate failures. Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Committed by Ronald S. Bultje
Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 08 Aug, 2010 1 commit
-
-
Committed by Jason Garrett-Glaser
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself. Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 05 Aug, 2010 1 commit
-
-
Committed by Eli Friedman
Patch by Eli Friedman <eli.friedman at gmail dot com> Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
-