- 11 2月, 2017 10 次提交
-
-
由 Anton Khirnov 提交于
Before this commit, AVIOContext is to be freed with a plain av_free(), which prevents us from adding any deeper structure to it.
-
由 Martin Storsjö 提交于
Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
Previously we first calculated hev, and then negated it. Since we were able to schedule the negation in the middle of another calculation, we don't see any gain in all cases. Before: Cortex A7 A8 A9 A53 A53/AArch64 vp9_loop_filter_v_4_8_neon: 147.0 129.0 115.8 89.0 88.7 vp9_loop_filter_v_8_8_neon: 242.0 198.5 174.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 500.0 419.5 382.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 971.2 825.5 731.5 579.0 453.0 After: vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7 vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0 Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
This work is sponsored by, and copyright, Google. Before: Cortex A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 180.2 vp9_inv_dct_dct_32x32_sub1_add_neon: 475.3 Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
This work is sponsored by, and copyright, Google. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 752.0 459.2 862.2 553.9 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 226.5 145.0 225.1 171.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 721.2 415.7 727.6 475.0 Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
No measured speedup on a Cortex A53, but other cores might benefit. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
Before: Cortex A7 A8 A9 A53 vp9_put_8tap_smooth_4h_neon: 378.1 273.2 340.7 229.5 After: vp9_put_8tap_smooth_4h_neon: 352.1 222.2 290.5 229.5 Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
Fold the field lengths into the macro. This makes the macro invocations much more readable, when the lines are shorter. This also makes it easier to use only half the registers within the macro. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Vittorio Giovara 提交于
In order to avoid potential integer overflow change the comparison and make sure to use the same unsigned type for both elements.
-
由 Vittorio Giovara 提交于
-
- 10 2月, 2017 10 次提交
-
-
由 Luca Barbato 提交于
Signed-off-by: NLuca Barbato <lu_zero@gentoo.org>
-
由 Martin Storsjö 提交于
This was missing from 77c23704, fixing building. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Timo Rothenpieler 提交于
Do not allocate a CUDA context for every available gpu. Signed-off-by: NLuca Barbato <lu_zero@gentoo.org>
-
由 Derek Buitenhuis 提交于
Signed-off-by: NDerek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: NLuca Barbato <lu_zero@gentoo.org>
-
由 Martin Storsjö 提交于
Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
The ld1r is a leftover from the arm version, where this trick is beneficial on some cores. Use a single-lane load where we don't need the semantics of ld1r. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
Signed-off-by: NMartin Storsjö <martin@martin.st>
-
- 09 2月, 2017 7 次提交
-
-
由 Martin Storsjö 提交于
This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 14740 bytes to 24292 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1387.4 vp9_inv_dct_dct_16x16_sub16_add_neon: 1387.6 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5198.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 5198.6 vp9_inv_dct_dct_32x32_sub8_add_neon: 5196.3 vp9_inv_dct_dct_32x32_sub12_add_neon: 6183.4 vp9_inv_dct_dct_32x32_sub16_add_neon: 6174.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 7151.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 7145.3 vp9_inv_dct_dct_32x32_sub28_add_neon: 8119.3 vp9_inv_dct_dct_32x32_sub32_add_neon: 8118.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 640.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 639.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 842.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1388.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 1389.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 3685.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 3685.1 vp9_inv_dct_dct_32x32_sub8_add_neon: 3684.4 vp9_inv_dct_dct_32x32_sub12_add_neon: 5312.2 vp9_inv_dct_dct_32x32_sub16_add_neon: 5315.4 vp9_inv_dct_dct_32x32_sub20_add_neon: 7154.9 vp9_inv_dct_dct_32x32_sub24_add_neon: 7154.5 vp9_inv_dct_dct_32x32_sub28_add_neon: 8126.6 vp9_inv_dct_dct_32x32_sub32_add_neon: 8127.2 Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 12388 bytes to 19784 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 212.0 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 2102.1 1521.7 1736.2 1265.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 2104.5 1533.0 1736.6 1265.5 vp9_inv_dct_dct_16x16_sub8_add_neon: 2484.8 1828.7 2014.4 1506.5 vp9_inv_dct_dct_16x16_sub12_add_neon: 2851.2 2117.8 2294.8 1753.2 vp9_inv_dct_dct_16x16_sub16_add_neon: 3239.4 2408.3 2543.5 1994.9 vp9_inv_dct_dct_32x32_sub1_add_neon: 758.3 456.7 864.5 553.9 vp9_inv_dct_dct_32x32_sub2_add_neon: 10776.7 7949.8 8567.7 6819.7 vp9_inv_dct_dct_32x32_sub4_add_neon: 10865.6 8131.5 8589.6 6816.3 vp9_inv_dct_dct_32x32_sub8_add_neon: 12053.9 9271.3 9387.7 7564.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 13328.3 10463.2 10217.0 8321.3 vp9_inv_dct_dct_32x32_sub16_add_neon: 14176.4 11509.5 11018.7 9062.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 15301.5 12999.9 11855.1 9828.2 vp9_inv_dct_dct_32x32_sub24_add_neon: 16482.7 14931.5 12650.1 10575.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17589.5 15811.9 13482.8 11333.4 vp9_inv_dct_dct_32x32_sub32_add_neon: 18696.2 17049.2 14355.6 12089.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 1203.5 998.2 1035.3 763.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1203.5 998.1 1035.5 760.8 vp9_inv_dct_dct_16x16_sub8_add_neon: 1926.1 1610.6 1722.1 1271.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 2873.2 2129.7 2285.1 1757.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 3221.4 2520.3 2557.6 2002.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 753.0 457.5 866.6 554.6 vp9_inv_dct_dct_32x32_sub2_add_neon: 7554.6 5652.4 6048.4 4920.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 7549.9 5685.0 6046.9 4925.7 vp9_inv_dct_dct_32x32_sub8_add_neon: 8336.9 6704.5 6604.0 5478.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 10914.0 9777.2 9240.4 7416.9 vp9_inv_dct_dct_32x32_sub16_add_neon: 11859.2 11223.3 9966.3 8095.1 vp9_inv_dct_dct_32x32_sub20_add_neon: 15237.1 13029.4 11838.3 9829.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 16293.2 14379.8 12644.9 10572.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17424.3 15734.7 13473.0 11326.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.3 17457.0 14298.6 12080.0 Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from 19496 to 14740 bytes. This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms. Before: vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub32_add_neon: 8095.7 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 1390.1 vp9_inv_dct_dct_32x32_sub4_add_neon: 5199.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8125.8 Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from 15324 to 12388 bytes. This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to add more optimized versions of these transforms. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub4_add_neon: 2063.4 1516.0 1719.5 1245.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 3279.3 2454.5 2525.2 1982.3 vp9_inv_dct_dct_32x32_sub4_add_neon: 10750.0 7955.4 8525.6 6754.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 18574.0 17108.4 14216.7 12010.2 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 2060.8 1608.5 1735.7 1262.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.2 2443.5 2546.1 1999.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 10682.0 8043.8 8581.3 6810.1 vp9_inv_dct_dct_32x32_sub32_add_neon: 18522.4 17277.4 14286.7 12087.9 Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Diego Biurrun 提交于
Fixes all sorts of configuration problems introducec by dad7a9c7 on non-Linux or non-vanilla configs. Also removes a line made redundant in that commit.
-
- 08 2月, 2017 4 次提交
-
-
由 Martin Storsjö 提交于
This avoids having to count the number of frames sent to the codec and the number of output packets received; instead just wait until the encoder returns a buffer with the EOS flag set. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Diego Biurrun 提交于
This makes the feature more visible and obvious.
-
由 Diego Biurrun 提交于
This allows distinguishing between the internal variable name for external libraries and the pkg-config package name. Having both names available avoids special-casing outside the helper function when the two identifiers do not match.
-
由 Diego Biurrun 提交于
-
- 06 2月, 2017 3 次提交
-
-
由 Diego Biurrun 提交于
-
由 Diego Biurrun 提交于
This ensures that added CPPFLAGS are validated against libc headers.
-
由 Alexandra Hájková 提交于
-
- 05 2月, 2017 2 次提交
-
-
由 Martin Storsjö 提交于
This avoids concatenation, which can't be used if the whole macro is wrapped within another macro. This is also arguably more readable. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
由 Martin Storsjö 提交于
This makes it more readable. Signed-off-by: NMartin Storsjö <martin@martin.st>
-
- 04 2月, 2017 1 次提交
-
-
由 John Stebbins 提交于
The AVFormat stream count can be larger due external factors, such as an id3 tag appended. Avoid an out of bound read. Signed-off-by: NLuca Barbato <lu_zero@gentoo.org>
-
- 03 2月, 2017 3 次提交
-
-
由 Diego Biurrun 提交于
-
由 Diego Biurrun 提交于
-
由 Martin Storsjö 提交于
This swaps which field is set when the Window Acknowledgement Size and Set Peer BW packets are received, renames the fields in order to clarify their role further and adds verbose comments explaining their respective roles and how well the code currently does what it is supposed to. The Set Peer BW packet tells the receiver of the packet (which can be either client or server) that it should not send more data if it already has sent more data than the specified number of bytes, without receiving acknowledgement for them. Actually checking this limit is currently not implemented. In order to be able to check that properly, one can send the Window Acknowledgement Size packet, which tells the receiver of the packet that it needs to send Acknowledgement packets (RTMP_PT_BYTES_READ) at least after receiving a given number of bytes since the last Acknowledgement. Therefore, when we receive a Window Acknowledgement Size packet, this sets the maximum number of bytes we can receive without sending an Acknowledgement; therefore when handling this packet we should set the receive_report_size field (previously client_report_size). Signed-off-by: NMartin Storsjö <martin@martin.st>
-