1. 11 Feb, 2017 10 commits
  2. 10 Feb, 2017 10 commits
  3. 09 Feb, 2017 7 commits
    • aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 · a63da451
      Martin Storsjö committed
      This work is sponsored by, and copyright, Google.
      
      This avoids loading and calculating coefficients that we know will
      be zero, and avoids filling the temp buffer with zeros in places
      where we know the second pass won't read.
      
      This gives a pretty substantial speedup for the smaller subpartitions.
      
      The code size increases from 14740 bytes to 24292 bytes.
      
      The idct16/32_end macros are moved above the individual functions; the
      instructions themselves are unchanged, but since new functions are added
      at the same place where the code is moved from, the diff looks rather
      messy.
      
      Before:
      vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
      vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
      vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
      vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
      vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
      vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
      vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
      vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
      vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
      vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
      vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
      vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
      vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
      vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
      vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
      vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7
      
      After:
      vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
      vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
      vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
      vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
      vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
      vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
      vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
      vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
      vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
      vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
      vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
      vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
      vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
      vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
      vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
      vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2
      Signed-off-by: Martin Storsjö <martin@martin.st>
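The reason skipping known-zero coefficients is safe can be sketched in a few lines. The block below is only an illustration, assuming a naive, unscaled floating-point inverse DCT rather than the integer VP9 transform or the NEON code from the commit; the function names are hypothetical. It shows that a "quarter" variant, which reads only the first four coefficients of a 16-point transform, produces the same output as the full version whenever the remaining coefficients are zero.

```python
import math

def idct_truncated(coeffs, nonzero):
    """Naive, unscaled inverse DCT that reads only coeffs[:nonzero].

    The result is correct only if coeffs[nonzero:] are all zero.
    """
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(nonzero):
            acc += coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(acc)
    return out

def idct_full(coeffs):
    """Same transform, reading every coefficient."""
    return idct_truncated(coeffs, len(coeffs))

# Hypothetical input: only the first quarter of the coefficients is nonzero.
coeffs = [12.0, -3.0, 5.0, 1.0] + [0.0] * 12

full = idct_full(coeffs)
quarter = idct_truncated(coeffs, 4)

# The quarter variant does 16 * 4 multiply-adds instead of 16 * 16.
print(max(abs(a - b) for a, b in zip(full, quarter)))
```

In this toy version the saving is a factor of four in multiply-adds per pass; the cycle counts in the commit message are what show the actual effect on real hardware.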
    • arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible · 5eb5aec4
      Martin Storsjö committed
      This work is sponsored by, and copyright, Google.
      
      This avoids loading and calculating coefficients that we know will
      be zero, and avoids filling the temp buffer with zeros in places
      where we know the second pass won't read.
      
      This gives a pretty substantial speedup for the smaller subpartitions.
      
      The code size increases from 12388 bytes to 19784 bytes.
      
      The idct16/32_end macros are moved above the individual functions; the
      instructions themselves are unchanged, but since new functions are added
      at the same place where the code is moved from, the diff looks rather
      messy.
      
      Before:                              Cortex A7       A8       A9      A53
      vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
      vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
      vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
      vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
      vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
      vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
      vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
      vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
      vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
      vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
      vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
      vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
      vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
      vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
      vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
      vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7
      
      After:
      vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
      vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
      vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
      vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
      vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
      vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
      vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
      vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
      vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
      vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
      vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
      vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
      vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
      vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
      vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
      vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0
      Signed-off-by: Martin Storsjö <martin@martin.st>
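To read the tables above as speedup factors, the before/after cycle counts can be divided row by row. The sketch below copies a handful of rows from the commit message; the selection of rows and the helper name `speedups` are choices made for this illustration.

```python
CORES = ["Cortex A7", "A8", "A9", "A53"]

# (before, after) cycle counts copied from the tables in the commit message.
BENCH = {
    "16x16_sub2":  ([2102.1, 1521.7, 1736.2, 1265.8],
                    [1203.5, 998.2, 1035.3, 763.0]),
    "16x16_sub8":  ([2484.8, 1828.7, 2014.4, 1506.5],
                    [1926.1, 1610.6, 1722.1, 1271.7]),
    "32x32_sub2":  ([10776.7, 7949.8, 8567.7, 6819.7],
                    [7554.6, 5652.4, 6048.4, 4920.2]),
    "32x32_sub16": ([14176.4, 11509.5, 11018.7, 9062.3],
                    [11859.2, 11223.3, 9966.3, 8095.1]),
    "32x32_sub32": ([18696.2, 17049.2, 14355.6, 12089.7],
                    [18531.3, 17457.0, 14298.6, 12080.0]),
}

def speedups(before, after):
    """Return before/after ratios; values above 1.0 mean the new code is faster."""
    return [b / a for b, a in zip(before, after)]

for name, (before, after) in BENCH.items():
    row = "  ".join(f"{core}: {s:.2f}x"
                    for core, s in zip(CORES, speedups(before, after)))
    print(f"{name:12s} {row}")
```

For the smallest 16x16 subpartition the ratios come out between about 1.5x and 1.75x across the four cores, while the full 32x32 case stays within a few percent of 1.0x, which matches the commit's claim that the gain is concentrated in the smaller subpartitions.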
    • aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · 79d332eb
      Martin Storsjö committed
      This allows reusing the macro for a separate implementation of the
      pass2 function.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · 47b3c2c1
      Martin Storsjö committed
      This allows reusing the macro for a separate implementation of the
      pass2 function.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: vp9itxfm: Make the larger core transforms standalone functions · 11547601
      Martin Storsjö committed
      This work is sponsored by, and copyright, Google.
      
      This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
      19496 to 14740 bytes.
      
      This gives a small slowdown of a couple of tens of cycles, but makes
      it more feasible to add more optimized versions of these transforms.
      
      Before:
      vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
      vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
      vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
      vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7
      
      After:
      vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
      vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
      vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
      vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8
      Signed-off-by: Martin Storsjö <martin@martin.st>
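The trade-off described in this commit (smaller object file, slightly more cycles) can be quantified directly from the numbers it quotes. The following is just the arithmetic, written out as a short script; the variable names are made up for the illustration.

```python
# Object file size of libavcodec/aarch64/vp9itxfm_neon.o, in bytes.
SIZE_BEFORE, SIZE_AFTER = 19496, 14740

# (before, after) cycle counts from the commit message.
CYCLES = {
    "16x16_sub4":  (1036.7, 1051.0),
    "16x16_sub16": (1372.2, 1390.1),
    "32x32_sub4":  (5180.0, 5199.9),
    "32x32_sub32": (8095.7, 8125.8),
}

saved = SIZE_BEFORE - SIZE_AFTER
saved_pct = 100.0 * saved / SIZE_BEFORE
extra = {name: after - before for name, (before, after) in CYCLES.items()}

print(f"code size: -{saved} bytes ({saved_pct:.1f}%)")
for name, cycles in extra.items():
    print(f"{name:12s} +{cycles:.1f} cycles")
```

The object file shrinks by 4756 bytes, a little under a quarter of its size, while each benchmarked function gets slower by roughly 14 to 30 cycles.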
    • arm: vp9itxfm: Make the larger core transforms standalone functions · 0331c3f5
      Martin Storsjö committed
      This work is sponsored by, and copyright, Google.
      
      This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
      15324 to 12388 bytes.
      
      This gives a small slowdown of a couple tens of cycles, up to around
      150 cycles for the full case of the largest transform, but makes
      it more feasible to add more optimized versions of these transforms.
      
      Before:                              Cortex A7       A8       A9      A53
      vp9_inv_dct_dct_16x16_sub4_add_neon:    2063.4   1516.0   1719.5   1245.1
      vp9_inv_dct_dct_16x16_sub16_add_neon:   3279.3   2454.5   2525.2   1982.3
      vp9_inv_dct_dct_32x32_sub4_add_neon:   10750.0   7955.4   8525.6   6754.2
      vp9_inv_dct_dct_32x32_sub32_add_neon:  18574.0  17108.4  14216.7  12010.2
      
      After:
      vp9_inv_dct_dct_16x16_sub4_add_neon:    2060.8   1608.5   1735.7   1262.0
      vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.2   2443.5   2546.1   1999.5
      vp9_inv_dct_dct_32x32_sub4_add_neon:   10682.0   8043.8   8581.3   6810.1
      vp9_inv_dct_dct_32x32_sub32_add_neon:  18522.4  17277.4  14286.7  12087.9
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • configure: Correctly recurse in do_check_deps() · c546147d
      Diego Biurrun committed
      Fixes all sorts of configuration problems introduced by dad7a9c7
      on non-Linux or non-vanilla configs. Also removes a line made redundant
      in that commit.
  4. 08 Feb, 2017 4 commits
  5. 06 Feb, 2017 3 commits
  6. 05 Feb, 2017 2 commits
  7. 04 Feb, 2017 1 commit
  8. 03 Feb, 2017 3 commits
    • asm: Consistently uppercase SECTION markers · 7abdd026
      Diego Biurrun committed
    • build: Ignore generated .version files · 740b0bf0
      Diego Biurrun committed
    • rtmp: Correctly handle the Window Acknowledgement Size packets · 15a92e0c
      Martin Storsjö committed
      This swaps which field is set when the Window Acknowledgement Size
      and Set Peer BW packets are received, renames the fields in
      order to clarify their role further and adds verbose comments
      explaining their respective roles and how well the code currently
      does what it is supposed to.
      
      The Set Peer BW packet tells the receiver of the packet (which
      can be either client or server) that it should not send more data
      if it already has sent more data than the specified number of bytes,
      without receiving acknowledgement for them. Actually checking this
      limit is currently not implemented.
      
      In order to be able to check that properly, one can send the
      Window Acknowledgement Size packet, which tells the receiver of the
      packet that it needs to send Acknowledgement packets
      (RTMP_PT_BYTES_READ) at least after receiving a given number of bytes
      since the last Acknowledgement.
      
      Therefore, when we receive a Window Acknowledgement Size packet,
      this sets the maximum number of bytes we can receive without sending
      an Acknowledgement; therefore when handling this packet we should set
      the receive_report_size field (previously client_report_size).
      Signed-off-by: Martin Storsjö <martin@martin.st>
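The roles of the two packets described above can be modelled with a small state object. This is a minimal sketch, not the code from the commit: only `receive_report_size` and `RTMP_PT_BYTES_READ` are names taken from the commit message, while the class, the other field and constant names, and the exact threshold check are assumptions made for the example. The numeric message type IDs (3, 5, 6) are those defined by the RTMP specification.

```python
RTMP_PT_BYTES_READ = 3       # Acknowledgement
RTMP_PT_WINDOW_ACK_SIZE = 5  # Window Acknowledgement Size (name assumed)
RTMP_PT_SET_PEER_BW = 6      # Set Peer Bandwidth (name assumed)

class AckState:
    """Hypothetical model of one endpoint's acknowledgement bookkeeping."""

    def __init__(self):
        self.receive_report_size = 0  # set by Window Acknowledgement Size
        self.max_sent_unacked = 0     # set by Set Peer BW (hypothetical name)
        self.bytes_read = 0
        self.last_reported = 0
        self.outgoing = []            # (packet type, value) we would send

    def handle_control(self, pkt_type, value):
        if pkt_type == RTMP_PT_WINDOW_ACK_SIZE:
            # Limits how much we may receive before acknowledging.
            self.receive_report_size = value
        elif pkt_type == RTMP_PT_SET_PEER_BW:
            # Limits how much we may send without being acknowledged.
            # As in the commit message, enforcing it is left out here.
            self.max_sent_unacked = value

    def on_data_received(self, nbytes):
        self.bytes_read += nbytes
        unreported = self.bytes_read - self.last_reported
        if self.receive_report_size > 0 and unreported >= self.receive_report_size:
            self.outgoing.append((RTMP_PT_BYTES_READ, self.bytes_read))
            self.last_reported = self.bytes_read

state = AckState()
state.handle_control(RTMP_PT_WINDOW_ACK_SIZE, 1000)
state.on_data_received(600)   # below the window, nothing to send yet
state.on_data_received(600)   # window reached, queue an Acknowledgement
print(state.outgoing)
```

The point of the sketch is the direction of each setting: Window Acknowledgement Size governs the receiving side's reporting, while Set Peer BW governs the sending side's unacknowledged budget.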