Commits · 99684f3ae752fc8bfb44a2dd1482f8d7a3d8536d · 小白菜888 / Ffmpeg

11 Feb, 2017 · 10 commits

avio: add a destructor for AVIOContext · 99684f3a

Committed by Anton Khirnov on Jan 13, 2017

Before this commit, AVIOContext is to be freed with a plain av_free(),
which prevents us from adding any deeper structure to it.

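Why a dedicated destructor matters can be sketched in plain C. The names below (`io_ctx`, `io_ctx_internal`, `io_ctx_alloc`, `io_ctx_free`) are hypothetical and only mimic the pattern; in libavformat the destructor added by this commit is, to our understanding, `avio_context_free(AVIOContext **s)`. Once a context owns nested state, a plain `free()` of the outer struct leaks it, while a destructor can release everything and reset the caller's pointer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical private state owned by the context: the kind of "deeper
 * structure" that a plain free() of the outer struct would leak. */
typedef struct io_ctx_internal {
    int64_t bytes_seen;
} io_ctx_internal;

typedef struct io_ctx {
    io_ctx_internal *internal;
} io_ctx;

io_ctx *io_ctx_alloc(void)
{
    io_ctx *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->internal = calloc(1, sizeof(*s->internal));
    if (!s->internal) {
        free(s);
        return NULL;
    }
    return s;
}

/* Destructor: releases the nested state, then the context itself, and
 * resets the caller's pointer. Calling it on a NULL context is a no-op. */
void io_ctx_free(io_ctx **ps)
{
    if (!ps || !*ps)
        return;
    free((*ps)->internal);
    free(*ps);
    *ps = NULL;
}
```

Taking a double pointer lets the destructor clear the caller's variable, so a second call is harmless instead of a double free.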

arm: vp9lpf: Use orrs instead of orr+cmp · 435cd7bc

Committed by Martin Storsjö on Jan 13, 2017

Signed-off-by: Martin Storsjö <martin@martin.st>

arm/aarch64: vp9lpf: Calculate !hev directly · e1f9de86

Committed by Martin Storsjö on Jan 12, 2017

Previously we first calculated hev, and then negated it.

Since we were able to schedule the negation in the middle
of another calculation, we don't see any gain in all cases.

Before:                Cortex      A7      A8      A9     A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     147.0   129.0   115.8    89.0         88.7
vp9_loop_filter_v_8_8_neon:     242.0   198.5   174.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    500.0   419.5   382.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   971.2   825.5   731.5   579.0        453.0
After:
vp9_loop_filter_v_4_8_neon:     143.0   127.7   114.8    88.0         87.7
vp9_loop_filter_v_8_8_neon:     241.0   197.2   173.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    497.0   419.5   379.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   965.2   818.7   731.4   579.0        452.0
Signed-off-by: Martin Storsjö <martin@martin.st>

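A scalar illustration of computing !hev directly: `hev` ("high edge variance") is true when |p1 - p0| or |q1 - q0| exceeds the threshold, so by De Morgan's law `!hev` is "both differences are within the threshold" and needs no separate negation. The function names are hypothetical; the NEON code works on whole vectors, which this sketch imitates with 0x00/0xff byte masks.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : (uint8_t)(b - a);
}

/* Old approach: build the hev mask, then invert it. */
void not_hev_by_negation(const uint8_t *p1, const uint8_t *p0,
                         const uint8_t *q0, const uint8_t *q1,
                         uint8_t thresh, uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t hev = (absdiff_u8(p1[i], p0[i]) > thresh ||
                       absdiff_u8(q1[i], q0[i]) > thresh) ? 0xff : 0x00;
        mask[i] = (uint8_t)~hev;
    }
}

/* New approach: compute !hev directly, with no extra inversion step. */
void not_hev_direct(const uint8_t *p1, const uint8_t *p0,
                    const uint8_t *q0, const uint8_t *q1,
                    uint8_t thresh, uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mask[i] = (absdiff_u8(p1[i], p0[i]) <= thresh &&
                   absdiff_u8(q1[i], q0[i]) <= thresh) ? 0xff : 0x00;
}
```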

aarch64: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling · 3fcf788f

Committed by Martin Storsjö on Jan 04, 2017

This work is sponsored by, and copyright, Google.

Before:                           Cortex A53
vp9_inv_dct_dct_16x16_sub1_add_neon:   235.3
vp9_inv_dct_dct_32x32_sub1_add_neon:   555.1
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:   180.2
vp9_inv_dct_dct_32x32_sub1_add_neon:   475.3
Signed-off-by: Martin Storsjö <martin@martin.st>

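When only the DC coefficient is nonzero, the inverse transform produces the same value for every pixel, so the add step reduces to adding one constant to the whole block with clipping. Below is a scalar sketch of that add with the inner loop unrolled by four; `add_dc_unrolled` is a hypothetical name, the DC value is assumed to be precomputed, and the NEON code unrolls over vector registers rather than individual pixels.

```c
#include <stddef.h>
#include <stdint.h>

static inline uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add a precomputed DC value to a width x height block of 8-bit pixels.
 * The inner loop handles four pixels per iteration, so width must be a
 * multiple of 4 (true for the 16x16 and 32x32 cases). */
void add_dc_unrolled(uint8_t *dst, ptrdiff_t stride,
                     int width, int height, int dc)
{
    for (int y = 0; y < height; y++, dst += stride) {
        for (int x = 0; x < width; x += 4) {
            dst[x + 0] = clip_u8(dst[x + 0] + dc);
            dst[x + 1] = clip_u8(dst[x + 1] + dc);
            dst[x + 2] = clip_u8(dst[x + 2] + dc);
            dst[x + 3] = clip_u8(dst[x + 3] + dc);
        }
    }
}
```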

arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling · a76bf8cf

Committed by Martin Storsjö on Jan 04, 2017

This work is sponsored by, and copyright, Google.

Before:                            Cortex A7      A8      A9     A53
vp9_inv_dct_dct_16x16_sub1_add_neon:   273.0   189.5   211.7   235.8
vp9_inv_dct_dct_32x32_sub1_add_neon:   752.0   459.2   862.2   553.9
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:   226.5   145.0   225.1   171.8
vp9_inv_dct_dct_32x32_sub1_add_neon:   721.2   415.7   727.6   475.0
Signed-off-by: Martin Storsjö <martin@martin.st>


aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter · 388e0d25

Committed by Martin Storsjö on Dec 17, 2016

No measured speedup on a Cortex A53, but other cores might benefit.
Signed-off-by: Martin Storsjö <martin@martin.st>

arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter · fea92a4b

Committed by Martin Storsjö on Dec 17, 2016

Before:                    Cortex A7      A8     A9     A53
vp9_put_8tap_smooth_4h_neon:   378.1   273.2  340.7   229.5
After:
vp9_put_8tap_smooth_4h_neon:   352.1   222.2  290.5   229.5
Signed-off-by: Martin Storsjö <martin@martin.st>

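For the two "calculate less unused data" commits (388e0d25, fea92a4b): a 4-pixel-wide block needs only four filtered outputs per row, even if the SIMD registers have room for more lanes. A scalar reference for one row of an 8-tap horizontal filter follows. The tap layout (three pixels to the left, four to the right) and the rounding `(sum + 64) >> 7` are assumptions based on the usual VP9 subpixel filters, not taken from the assembly, and `filter_8tap_h_row` is a hypothetical name.

```c
#include <stdint.h>

static inline uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Filter one row of w output pixels. src points at the first output
 * position; 3 valid pixels must precede it and 4 must follow the last
 * output. Only w outputs are computed, nothing for unused lanes. */
void filter_8tap_h_row(uint8_t *dst, const uint8_t *src, int w,
                       const int16_t filter[8])
{
    for (int x = 0; x < w; x++) {
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += filter[k] * src[x + k - 3];
        dst[x] = clip_u8((sum + 64) >> 7);
    }
}
```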

aarch64: vp9mc: Simplify the extmla macro parameters · 5e0c2158

Committed by Martin Storsjö on Dec 17, 2016

Fold the field lengths into the macro.

This makes the macro invocations much more readable, since the
lines are shorter.

This also makes it easier to use only half the registers within
the macro.
Signed-off-by: Martin Storsjö <martin@martin.st>


mov: Rework stsc index validation · 53ea595e

Committed by Vittorio Giovara on Feb 03, 2017

In order to avoid a potential integer overflow, change the comparison
and make sure to use the same unsigned type for both elements.

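A hedged sketch of the kind of rewrite the stsc commit describes: asking "is there an entry after `index`?" as `index + 1 < count` can wrap (with unsigned types) or overflow (with signed ones), whereas `index < count - 1` with both operands unsigned, guarded by `count > 0`, cannot. The helper names are hypothetical and the upstream patch may differ in detail.

```c
#include <limits.h>
#include <stdbool.h>

/* Naive form: index + 1 wraps to 0 when index == UINT_MAX, so this
 * wrongly reports that a next entry exists. Shown for contrast only. */
bool stsc_next_exists_naive(unsigned int index, unsigned int count)
{
    return index + 1 < count;
}

/* Overflow-free form: both operands share one unsigned type, and the
 * subtraction only happens when count > 0, so count - 1 cannot wrap. */
bool stsc_next_exists(unsigned int index, unsigned int count)
{
    return count > 0 && index < count - 1;
}
```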


imgutils: Document av_image_get_buffer_size() · ce6d72d1

Committed by Vittorio Giovara on Feb 07, 2017


10 Feb, 2017 · 10 commits
- hlsenc: Correctly write down all 16 bytes in hex · b6093e8c
  Committed by Luca Barbato on Feb 09, 2017
  Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
- utvideodec: Add a missing include · bc258976
  Committed by Martin Storsjö on Feb 10, 2017
  This was missing from 77c23704; adding it fixes the build.
  Signed-off-by: Martin Storsjö <martin@martin.st>
- nvenc: make gpu indices independent of supported capabilities · a52976c0
  Committed by Timo Rothenpieler on Feb 06, 2017
  Do not allocate a CUDA context for every available GPU.
  Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
- avcodec: Mark some codecs with threadsafe init as such · 77c23704
  Committed by Derek Buitenhuis on Feb 08, 2017
  Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
  Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
- aarch64: vp9itxfm: Fix incorrect vertical alignment · 0c0b87f1
  Committed by Martin Storsjö on Jan 03, 2017
  Signed-off-by: Martin Storsjö <martin@martin.st>
- aarch64: vp9itxfm: Update a comment to refer to a register with a different name · 8476eb0d
  Committed by Martin Storsjö on Jan 03, 2017
  Signed-off-by: Martin Storsjö <martin@martin.st>
- aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability · 3dd78272
  Committed by Martin Storsjö on Jan 03, 2017
  Signed-off-by: Martin Storsjö <martin@martin.st>
- aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible · ed8d2933
  Committed by Martin Storsjö on Jan 03, 2017
  The ld1r is a leftover from the arm version, where this trick is
  beneficial on some cores.

  Use a single-lane load where we don't need the semantics of ld1r.
  Signed-off-by: Martin Storsjö <martin@martin.st>
- aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function · 4da4b2b8
  Committed by Martin Storsjö on Jan 03, 2017
  Signed-off-by: Martin Storsjö <martin@martin.st>
- arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function · 3933b86b
  Committed by Martin Storsjö on Jan 03, 2017
  Signed-off-by: Martin Storsjö <martin@martin.st>
09 Feb, 2017 · 7 commits

aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 · a63da451

Committed by Martin Storsjö on Nov 22, 2016

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14740 bytes to 24292 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2
Signed-off-by: Martin Storsjö <martin@martin.st>

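The idea behind the half/quarter functions, skipping coefficients known to be zero, holds for any linear inverse transform: each output is a sum of `in[k] * basis_k`, so terms with `in[k] == 0` contribute nothing. The sketch below shows this on a generic integer 1-D transform (a 2-D transform applies it to rows, then columns). It is not the fixed-point VP9 idct; `inv_transform_1d`, the `basis` table and the `nz` parameter are hypothetical, and the real code instead selects a specialised function based on roughly how many coefficients can be nonzero.

```c
#include <stdint.h>

/* Generic 1-D linear inverse transform of length n:
 * out[i] = sum over k of in[k] * basis[k * n + i].
 * Only the first nz coefficients are read; the caller guarantees that
 * in[nz..n-1] are zero, so skipping them does not change the result. */
void inv_transform_1d(int32_t *out, const int32_t *in,
                      const int16_t *basis, int n, int nz)
{
    for (int i = 0; i < n; i++) {
        int32_t sum = 0;
        for (int k = 0; k < nz; k++)
            sum += in[k] * basis[k * n + i];
        out[i] = sum;
    }
}
```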

arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible · 5eb5aec4

Committed by Martin Storsjö on Nov 22, 2016

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 12388 bytes to 19784 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                        Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0
Signed-off-by: Martin Storsjö <martin@martin.st>


aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · 79d332eb

Committed by Martin Storsjö on Feb 05, 2017

This allows reusing the macro for a separate implementation of the
pass2 function.
Signed-off-by: Martin Storsjö <martin@martin.st>

arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · 47b3c2c1

Committed by Martin Storsjö on Feb 05, 2017

This allows reusing the macro for a separate implementation of the
pass2 function.
Signed-off-by: Martin Storsjö <martin@martin.st>

aarch64: vp9itxfm: Make the larger core transforms standalone functions · 11547601

Committed by Martin Storsjö on Nov 23, 2016

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
19496 to 14740 bytes.

This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8
Signed-off-by: Martin Storsjö <martin@martin.st>


arm: vp9itxfm: Make the larger core transforms standalone functions · 0331c3f5

Committed by Martin Storsjö on Nov 23, 2016

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
15324 to 12388 bytes.

This gives a small slowdown of a couple of tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                        Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_neon:    2063.4   1516.0   1719.5   1245.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3279.3   2454.5   2525.2   1982.3
vp9_inv_dct_dct_32x32_sub4_add_neon:   10750.0   7955.4   8525.6   6754.2
vp9_inv_dct_dct_32x32_sub32_add_neon:  18574.0  17108.4  14216.7  12010.2

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    2060.8   1608.5   1735.7   1262.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.2   2443.5   2546.1   1999.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10682.0   8043.8   8581.3   6810.1
vp9_inv_dct_dct_32x32_sub32_add_neon:  18522.4  17277.4  14286.7  12087.9
Signed-off-by: Martin Storsjö <martin@martin.st>


configure: Correctly recurse in do_check_deps() · c546147d

Committed by Diego Biurrun on Feb 08, 2017

Fixes all sorts of configuration problems introduced by dad7a9c7
on non-Linux or non-vanilla configs. Also removes a line made redundant
in that commit.

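Conceptually, a component is usable only if all of its dependencies are, so the check has to recurse into each dependency before deciding. The C sketch below shows that depth-first resolution. configure itself is a shell script and its `do_check_deps()` is not reproduced here: the `component` struct, `MAX_DEPS`, and `check_deps()` are hypothetical. The `state` field memoizes results and stops infinite recursion on dependency cycles.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

enum dep_state { UNCHECKED, CHECKING, ENABLED, DISABLED };

typedef struct component {
    const char *name;
    bool requested;                      /* wanted by the user or by default */
    struct component *deps[MAX_DEPS];    /* NULL-terminated if shorter */
    enum dep_state state;
} component;

/* Returns true if c and, recursively, all of its dependencies are usable. */
bool check_deps(component *c)
{
    if (c->state == ENABLED)
        return true;
    if (c->state == DISABLED || c->state == CHECKING)
        return false;                    /* already rejected, or a cycle */
    c->state = CHECKING;
    bool ok = c->requested;
    for (size_t i = 0; ok && i < MAX_DEPS && c->deps[i]; i++)
        ok = check_deps(c->deps[i]);     /* recurse before deciding */
    c->state = ok ? ENABLED : DISABLED;
    return ok;
}
```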

08 Feb, 2017 · 4 commits

omx: Use the EOS flag to handle flushing at the end · 57ec83e4

Committed by Martin Storsjö on Feb 07, 2017

This avoids having to count the number of frames sent to the codec
and the number of output packets received; instead just wait until
the encoder returns a buffer with the EOS flag set.
Signed-off-by: Martin Storsjö <martin@martin.st>

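The EOS-based flushing can be sketched as: once end of input has been signalled to the encoder, keep taking output buffers until one arrives with the EOS flag, rather than comparing frame and packet counters. Everything here is a simplified, hypothetical stand-in: `out_buffer` and `FLAG_EOS` mimic an OpenMAX IL buffer header and its `OMX_BUFFERFLAG_EOS` flag, and the array-backed `buffer_queue` replaces the component callbacks a real implementation would wait on.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_EOS 0x00000001u    /* stand-in for OMX_BUFFERFLAG_EOS */

typedef struct out_buffer {
    uint32_t flags;
    size_t filled_len;          /* payload size in bytes */
} out_buffer;

/* Array-backed queue standing in for the buffers the encoder returns. */
typedef struct buffer_queue {
    const out_buffer *bufs;
    size_t count;
    size_t pos;
} buffer_queue;

static const out_buffer *next_output(buffer_queue *q)
{
    return q->pos < q->count ? &q->bufs[q->pos++] : NULL;
}

/* Take output buffers until one carries the EOS flag. Returns the number
 * of buffers with a payload, or -1 if output ran out before EOS (a real
 * implementation would block and wait for the encoder instead). */
int drain_until_eos(buffer_queue *q)
{
    int packets = 0;
    for (;;) {
        const out_buffer *buf = next_output(q);
        if (!buf)
            return -1;
        if (buf->filled_len > 0)
            packets++;          /* the EOS buffer itself may carry data */
        if (buf->flags & FLAG_EOS)
            return packets;
    }
}
```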

configure: Rework dependency handling for conflicting components · dad7a9c7

Committed by Diego Biurrun on Jan 20, 2017

This makes the feature more visible and obvious.

configure: Add name parameter to require_pkg_config() helper function · 9127ac5e

Committed by Diego Biurrun on Jan 23, 2017

This allows distinguishing between the internal variable name for
external libraries and the pkg-config package name. Having both
names available avoids special-casing outside the helper function
when the two identifiers do not match.



Use bitstream_init8() where appropriate · a25dac97

Committed by Diego Biurrun on Jun 06, 2016


06 Feb, 2017 · 3 commits
- configure: Use cppflags check helper functions where appropriate · 71a49fe2
  Committed by Diego Biurrun on Jan 20, 2017
- configure: Add stdlib.h #include to CPPFLAGS check helper functions · 0ce3761c
  Committed by Diego Biurrun on Feb 03, 2017
  This ensures that added CPPFLAGS are validated against libc headers.
- wma: Convert to the new bitstream reader · f7ec7f54
  Committed by Alexandra Hájková on Apr 15, 2016

05 Feb, 2017 · 2 commits

aarch64: vp9itxfm: Restructure the idct32 store macros · 58d87e0f

Committed by Martin Storsjö on Dec 01, 2016

This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.

This is also arguably more readable.
Signed-off-by: Martin Storsjö <martin@martin.st>


arm: vp9itxfm: Avoid .irp when it doesn't save any lines · 3bc5b28d

Committed by Martin Storsjö on Feb 04, 2017

This makes it more readable.
Signed-off-by: Martin Storsjö <martin@martin.st>

04 Feb, 2017 · 1 commit

asfdec: Use the ASF stream count when iterating · 8e67039c

Committed by John Stebbins on Jan 12, 2017

The AVFormat stream count can be larger due to external factors, such as
an appended ID3 tag.

Avoid an out-of-bounds read.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>

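The pattern behind the asfdec fix: an array sized by the demuxer's own stream count must be iterated with that count, not with the container-level count, which other code (for example an appended ID3 tag) may have increased. A sketch with hypothetical structures (`asf_ctx`, `asf_stream`, `ASF_MAX_STREAMS`); only the loop bound matters here.

```c
#include <stddef.h>

#define ASF_MAX_STREAMS 128     /* hypothetical fixed capacity */

typedef struct asf_stream {
    int index;
} asf_stream;

typedef struct asf_ctx {
    asf_stream streams[ASF_MAX_STREAMS];
    int nb_streams;             /* streams declared in the ASF header */
} asf_ctx;

/* Finds the ASF stream with the given index. The loop is bounded by
 * asf->nb_streams, never by the generic format context's stream count,
 * so every access stays inside entries the ASF header actually set up. */
const asf_stream *find_asf_stream(const asf_ctx *asf, int index)
{
    for (int i = 0; i < asf->nb_streams; i++)
        if (asf->streams[i].index == index)
            return &asf->streams[i];
    return NULL;
}
```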

03 Feb, 2017 · 3 commits


asm: Consistently uppercase SECTION markers · 7abdd026

Committed by Diego Biurrun on Feb 01, 2017


build: Ignore generated .version files · 740b0bf0

Committed by Diego Biurrun on Jan 31, 2017


rtmp: Correctly handle the Window Acknowledgement Size packets · 15a92e0c

Committed by Martin Storsjö on Jan 31, 2017

This swaps which field is set when the Window Acknowledgement Size
and Set Peer BW packets are received, renames the fields in
order to clarify their role further and adds verbose comments
explaining their respective roles and how well the code currently
does what it is supposed to.

The Set Peer BW packet tells the receiver of the packet (which
can be either client or server) that it should not send more data
if it already has sent more data than the specified number of bytes,
without receiving acknowledgement for them. Actually checking this
limit is currently not implemented.

In order to be able to check that properly, one can send the
Window Acknowledgement Size packet, which tells the receiver of the
packet that it needs to send Acknowledgement packets
(RTMP_PT_BYTES_READ) at least after receiving a given number of bytes
since the last Acknowledgement.

Therefore, when we receive a Window Acknowledgement Size packet,
this sets the maximum number of bytes we can receive without sending
an Acknowledgement; therefore when handling this packet we should set
the receive_report_size field (previously client_report_size).
Signed-off-by: Martin Storsjö <martin@martin.st>

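The receiving side of the Window Acknowledgement Size rule can be sketched as a byte counter. Once the peer's Window Acknowledgement Size packet sets `receive_report_size`, an Acknowledgement (carrying the total number of bytes received so far) is due whenever that many bytes have arrived since the last one. Only `receive_report_size` comes from the commit message; the struct, its other fields, and both functions are hypothetical. The sketch also ignores the 32-bit wraparound of the acknowledged byte count.

```c
#include <stdint.h>

typedef struct rtmp_state {
    uint32_t receive_report_size;  /* set from Window Acknowledgement Size */
    uint64_t bytes_read;           /* total bytes received so far */
    uint64_t last_bytes_read;      /* total at the last Acknowledgement */
    int acks_sent;                 /* bookkeeping for illustration only */
    uint32_t last_ack_value;       /* value carried by the last Acknowledgement */
} rtmp_state;

/* Called when the peer sends a Window Acknowledgement Size packet. */
void on_window_ack_size(rtmp_state *s, uint32_t size)
{
    s->receive_report_size = size;
}

/* Accounts for n received bytes and "sends" an Acknowledgement
 * (RTMP_PT_BYTES_READ) once the acknowledgement window is reached. */
void on_bytes_received(rtmp_state *s, uint32_t n)
{
    s->bytes_read += n;
    if (s->receive_report_size &&
        s->bytes_read - s->last_bytes_read >= s->receive_report_size) {
        s->last_ack_value = (uint32_t)s->bytes_read;
        s->last_bytes_read = s->bytes_read;
        s->acks_sent++;
    }
}
```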