Commits · 2ef15b46e42647f6688d05abe2400fe008de5e0a · 小白菜888 / Ffmpeg

26 Mar, 2012 (1 commit)

build: prettyprinting cosmetics · ad0e31f1
Authored by Diego Biurrun on Feb 02, 2012

25 Mar, 2012 (1 commit)

x86: conditionally compile H.264 QPEL optimizations · 915a2a0a
Authored by Diego Biurrun on Dec 18, 2011

24 Feb, 2012 (1 commit)

SBR DSP x86: implement SSE sbr_sum_square_sse · 34454c76

Authored by Christophe GISQUET on Feb 23, 2012

The 32-bit targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square  C  /32-bit: 82c (unrolled) / 102c
                C  /64-bit: 69c (unrolled) / 82c
                SSE/32-bit: 42c
                SSE/64-bit: 31c

Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

03 Feb, 2012 (1 commit)

win64: add a XMM clobber test configure option. · 7e4d9d5d

Authored by Ronald S. Bultje on Feb 02, 2012

This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.

Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>

31 Jan, 2012 (2 commits)

rv40: x86 SIMD for biweight · e5c9de2a

Authored by Christophe Gisquet on Jan 12, 2012

Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).

*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips

This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

x86: Give RV40 init file a more suitable name. · 91bafb52
Authored by Diego Biurrun on Jan 30, 2012

30 Jan, 2012 (2 commits)

png: convert DSP functions to yasm. · 59f474b4
Authored by Ronald S. Bultje on Jan 27, 2012

png: move DSP functions to their own DSP context. · e9200351
Authored by Ronald S. Bultje on Jan 27, 2012

12 Jan, 2012 (1 commit)

rv34: DC-only inverse transform · 3faa303a

Authored by Christophe GISQUET on Jan 01, 2012

When decoding coefficients, detect whether the block is DC-only, and take
advantage of this knowledge to perform DC-only inverse transform.

This is achieved by:
- first, changing the 108x4 element modulo_three_table into a 108 element
  table (kind of base4), and accessing each value using mask and shifts.
- then, checking low bits for 0 (as they represent the presence of higher
  frequency coefficients)

Also provide x86 SIMD code for the DC-only inverse transform.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>

09 Jan, 2012 (1 commit)

mpegaudiodec: optimized iMDCT transform · 39df0c43
Authored by Vitor Sessak on Jan 05, 2012

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

19 Dec, 2011 (1 commit)

x86: conditionally compile dnxhd encoder optimizations · 30bbd5cb
Authored by Diego Biurrun on Dec 18, 2011

14 Dec, 2011 (1 commit)

build: conditionally compile x86 H.264 chroma optimizations · 88b97357
Authored by Diego Biurrun on Dec 13, 2011

11 Oct, 2011 (1 commit)

prores: idct sse2/sse4 optimizations. · e3f530fe
Authored by Ronald S. Bultje on Sep 30, 2011

~3.0-3.5x as fast as original C version, 1.6x as fast overall.

12 Aug, 2011 (1 commit)

Move RV3/4-specific DSP functions into their own context · d241f51e
Authored by Kostya Shishkov on Aug 09, 2011

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

03 Jul, 2011 (1 commit)

H.264: Add x86 assembly for 10-bit H.264 qpel functions. · 9bfa5363

Authored by Daniel Kang on Jul 02, 2011

Mainly ported from 8-bit H.264 qpel.

Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

21 Jun, 2011 (1 commit)

h264: Add x86 assembly for 10-bit weight/biweight H.264 functions. · 84e70ef0
Authored by Daniel Kang on Jun 21, 2011

Mainly ported from 8-bit H.264 weight/biweight.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

18 Jun, 2011 (1 commit)

H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions. · f188a1e0
Authored by Daniel Kang on Jun 05, 2011

Mainly ported from 8-bit H.264 MC Chroma.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

06 Jun, 2011 (1 commit)

Add x86 assembly for some 10-bit H.264 intra predict functions. · a8d44f9d

Authored by Daniel Kang on Jun 05, 2011

Parts are inspired from the 8-bit H.264 predict code in Libav.
Other parts ported from x264 with relicensing permission from author.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

01 Jun, 2011 (1 commit)

Add IDCT functions for 10-bit H.264. · 836f47d3

Authored by Daniel Kang on May 24, 2011

Ports the majority of IDCT functions for 10-bit H.264.

Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>

21 May, 2011 (1 commit)

dct32: port SSE 32-point DCT to YASM · 3758eb0e
Authored by Vitor Sessak on May 17, 2011

19 May, 2011 (1 commit)

mpegaudiodsp: fix x86 and ppc makefiles · 0b5e44ed
Authored by Mans Rullgard on May 19, 2011

Signed-off-by: Mans Rullgard <mans@mansr.com>

11 May, 2011 (1 commit)

Port x86 10-bit H.264 deblock asm from x264 · 9f3d6ca4
Authored by Jason Garrett-Glaser on May 10, 2011

12 Mar, 2011 (1 commit)

Add CONFIG_AC3DSP symbol to simplify makefiles · a5444fee
Authored by Mans Rullgard on Mar 11, 2011

Signed-off-by: Mans Rullgard <mans@mansr.com>

11 Feb, 2011 (1 commit)

Add x86-optimized versions of exponent_min(). · dda3f0ef
Authored by Justin Ruggles on Feb 10, 2011

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

02 Feb, 2011 (1 commit)

Separate format conversion DSP functions from DSPContext. · c73d99e6

Authored by Justin Ruggles on Jan 30, 2011

This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>

17 Sep, 2010 (1 commit)

Move sse16_sse2() from inline asm to yasm. It is one of the functions causing · d0acc2d2
Authored by Ronald S. Bultje on Sep 17, 2010

Win64/FATE issues.

Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk

14 Sep, 2010 (1 commit)

Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from · 1d16a1cf

Authored by Ronald S. Bultje on Sep 14, 2010

h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.

Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.

Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk

10 Sep, 2010 (1 commit)

LGPL SSE2 H.264 iDCT · 8acb554a

Authored by Jason Garrett-Glaser on Sep 10, 2010

This leaves no more GPL-only H.264 decoding asm code.

Approved by Loren.

Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk

08 Sep, 2010 (1 commit)

Move mm_support() from libavcodec to libavutil, make it a public · c6c98d08
Authored by Stefano Sabatini on Sep 08, 2010

function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk

04 Sep, 2010 (1 commit)

Port latest x264 deblock asm (before they moved to using NV12 as internal · 2c166c3a

Authored by Ronald S. Bultje on Sep 03, 2010

format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk

02 Sep, 2010 (2 commits)

Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square · a33a2562

Authored by Ronald S. Bultje on Sep 01, 2010

biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC.

Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk

Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c, · 14bc1f24

Authored by Ronald S. Bultje on Sep 01, 2010

still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk

31 Aug, 2010 (4 commits)

Fix vertical align. · 5929b3a6
Authored by Ronald S. Bultje on Aug 31, 2010

Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk

Split intra prediction initialization (i.e. assigning of function pointers) · de1c253b
Authored by Ronald S. Bultje on Aug 30, 2010

into its own file, it doesn't belong in h264dsp_mmx.c (much less so in
dsputil_mmx.c).

Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk

Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1 · d0eb5a11
Authored by Ronald S. Bultje on Aug 30, 2010

fate failures on Win64.

Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk

Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6 · e9f5f020
Authored by Ronald S. Bultje on Aug 30, 2010

issues on Win64.

Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk

25 Aug, 2010 (2 commits)

Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should · 89fa3504
Authored by Ronald S. Bultje on Aug 25, 2010

help in fixing the Win64 fate failures.

Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk

Move vp6_filter_diag4() from DSPContext to VP56DSPContext. · 3a088514
Authored by Ronald S. Bultje on Aug 25, 2010

Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk

08 Aug, 2010 (1 commit)

Split h264dsp and h264pred in configure. · 4a384de5

Authored by Jason Garrett-Glaser on Aug 07, 2010

Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself.

Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk

05 Aug, 2010 (1 commit)

H.264: SSE2/SSSE3 weighted prediction asm · c12d6955

Authored by Eli Friedman on Aug 05, 2010

Patch by Eli Friedman <eli.friedman at gmail dot com>

Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
