1. 12 Jan, 2016 2 commits
  2. 09 Jan, 2016 1 commit
  3. 07 Jan, 2016 8 commits
  4. 04 Jan, 2016 4 commits
  5. 01 Jan, 2016 3 commits
  6. 31 Dec, 2015 1 commit
  7. 30 Dec, 2015 1 commit
    • J
      x86: use emms after ff_int32_to_float_fmul_scalar_sse · 8563f988
      Janne Grunau committed
      Intel's Instruction Set Reference (as of September 2015) clearly states
      that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
      source is a memory location. The Instruction Set Reference from 1999
      (Order Number 243191) describes this behaviour but all later versions
      I've seen make no distinction between MMX registers and memory as the
      source.
      The documentation for the matching SSE2 instruction to convert to double
      (cvtpi2pd) was fixed (see the valgrind bug
      https://bugs.kde.org/show_bug.cgi?id=210264).
      
      It will take time to get a clarification and fixes in place. In the
      meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
      be correct according to the documentation. The vast majority of users
      will have SSE2 so a change to the SSE version has little effect.
      
      Fixes fate-checkasm on x86 valgrind targets.
      
      Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
      8563f988
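The idea behind this fix can be sketched with compiler intrinsics. This is only an illustration under stated assumptions: the real function is hand-written assembly, the name `int32_to_float_fmul_scalar_emms_sketch` is hypothetical, and a compiler is free to lower the intrinsics differently from the instructions named in the comments. `_mm_cvtpi32_ps` corresponds to `cvtpi2ps` and `_mm_empty` to `emms`.

```c
#include <stdint.h>

#if defined(__SSE__) && defined(__MMX__)
#include <xmmintrin.h> /* SSE intrinsics; also provides the MMX ones */
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

/* Hypothetical sketch: dst[i] = src[i] * mul, two values per iteration. */
void int32_to_float_fmul_scalar_emms_sketch(float *dst, const int32_t *src,
                                            float mul, int len)
{
    int i = 0;
#if SKETCH_HAVE_SSE
    const __m128 vmul = _mm_set1_ps(mul);

    for (; i + 2 <= len; i += 2) {
        /* Two packed int32 values in an MMX-typed operand. */
        __m64  in = _mm_set_pi32(src[i + 1], src[i]);
        /* cvtpi2ps: per the documentation this switches to MMX state. */
        __m128 v  = _mm_cvtpi32_ps(_mm_setzero_ps(), in);
        v = _mm_mul_ps(v, vmul);
        /* Store the two low floats. */
        _mm_storel_pi((__m64 *)(dst + i), v);
    }
    /* emms: leave MMX state before any later x87 code runs. */
    _mm_empty();
#endif
    /* Scalar tail, and the full loop on targets without SSE/MMX. */
    for (; i < len; i++)
        dst[i] = src[i] * mul;
}
```

Issuing the `emms` equivalent unconditionally costs little and keeps the function correct whichever reading of the instruction reference turns out to be right.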
  8. 29 Dec, 2015 2 commits
  9. 26 Dec, 2015 2 commits
  10. 24 Dec, 2015 1 commit
  11. 23 Dec, 2015 2 commits
    • A
      dca: change the core to work with integer coefficients. · aebf0707
      Alexandra Hájková committed
      The DCA core decoder converts integer coefficients read from the
      bitstream to floats just after reading them (along with dequantization).
      All the other steps of the audio reconstruction are done with floats
      which makes the output for the DTS lossless extension (XLL)
      actually lossy.
      This patch changes the DCA core to work with integer coefficients
      until QMF. At this point the integer coefficients are converted to floats.
      The coefficients for the LFE channel (lfe_data) are not touched.
      This is the first step towards truly lossless XLL decoding.
      aebf0707
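One general reason float processing cannot be bit-exact is that a 32-bit float has a 24-bit significand, so not every integer survives a trip through it. The helper below is a hypothetical illustration of that effect only; it is not taken from the decoder and does not model its actual data path.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if v is unchanged after int -> float -> int. */
int survives_float_round_trip(int32_t v)
{
    /* volatile forces a real single-precision value, avoiding excess precision */
    volatile float f = (float)v;
    /* compare in 64 bits so values that round up to 2^31 do not overflow */
    return (int64_t)f == (int64_t)v;
}
```

For example, `survives_float_round_trip(16777216)` holds, while `16777217` (2^24 + 1) does not, because it rounds to the nearest representable float.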
    • A
      dca: Add math helpers. · 85990140
      Alexandra Hájková committed
      They will be used by the integer core decoder.
      85990140
  12. 22 Dec, 2015 5 commits
  13. 21 Dec, 2015 1 commit
  14. 17 Dec, 2015 2 commits
  15. 14 Dec, 2015 5 commits
    • J
      arm: add ff_int32_to_float_fmul_array8_neon · 90b1b935
      Janne Grunau committed
      Quite a bit faster than int32_to_float_fmul_array8_c calling
      ff_int32_to_float_fmul_scalar_neon through FmtConvertContext.
      Number of cycles per int32_to_float_fmul_array8 call while decoding
      padded.dts on exynos5422:
      
                     before  after   change
      cortex-a7:     1270     951    -25%
      cortex-a15:     434     285    -34%
      
      checkasm --bench cycle counts:     cortex-a15   cortex-a7
      int32_to_float_fmul_array8_c:      1730.4       4384.5
      int32_to_float_fmul_array8_neon_c:  571.5       1694.3
      int32_to_float_fmul_array8_neon:    374.0       1448.8
      
      Interesting are the differences between
      int32_to_float_fmul_array8_neon_c and int32_to_float_fmul_array8_neon.
      The former is the current behaviour of calling
      ff_int32_to_float_fmul_scalar_neon repeatedly from the C function.
      The raw numbers differ since checkasm uses different lengths than the
      dca decoder.
      90b1b935
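The relationship between the two functions compared above can be sketched in plain C. This is a reference-style sketch, assuming the array8 variant applies one multiplier per block of eight samples and that `len` is a multiple of 8; the `_ref` names are hypothetical and the function pointer indirection through FmtConvertContext is left out.

```c
#include <stdint.h>

/* Hypothetical reference: convert and scale len samples by one multiplier. */
void int32_to_float_fmul_scalar_ref(float *dst, const int32_t *src,
                                    float mul, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* Hypothetical reference: one multiplier per block of 8 samples.
 * Assumes len is a multiple of 8 and mul holds len / 8 entries. */
void int32_to_float_fmul_array8_ref(float *dst, const int32_t *src,
                                    const float *mul, int len)
{
    int i;
    for (i = 0; i < len; i += 8)
        int32_to_float_fmul_scalar_ref(&dst[i], &src[i], *mul++, 8);
}
```

A dedicated array8 routine avoids one call per eight samples, which is a plausible source of the gap between the `_neon_c` and `_neon` rows.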
    • J
      arm64: int32_to_float_fmul neon asm · a0fc780a
      Janne Grunau committed
      3% faster dts decoding on a cortex-a57.
      
                                       cortex-a57   cortex-a53
      int32_to_float_fmul_array8_c:    1270.9       4475.6
      int32_to_float_fmul_array8_neon:  328.6        569.2
      int32_to_float_fmul_scalar_c:     928.5       4119.6
      int32_to_float_fmul_scalar_neon:  309.1        524.1
      a0fc780a
    • J
      arm64: port synth_filter_float_neon from arm · 705f5e5e
      Janne Grunau committed
      ~25% faster dts decoding overall. The checkasm CPU cycles numbers are
      not that useful since synth_filter_float() calls FFTContext.imdct_half().
      
                               cortex-a57   cortex-a53
      synth_filter_float_c:    1866.2       3490.9
      synth_filter_float_neon:  915.0       1531.5
      
      With fftc.imdct_half forced to imdct_half_neon:
                               cortex-a57   cortex-a53
      synth_filter_float_c:    1718.4       3025.3
      synth_filter_float_neon:  926.2       1530.1
      705f5e5e
    • J
      arm64: convert dcadsp neon asm from arm · c33c1fa8
      Janne Grunau committed
      ~2% faster dts decoding overall.
      
                          cortex-a57   cortex-a53
      dca_decode_hf_c:    474.8        1659.9
      dca_decode_hf_neon: 225.2         301.1
      dca_lfe_fir0_c:     913.2        1537.7
      dca_lfe_fir0_neon:  286.8         451.9
      dca_lfe_fir1_c:     848.7        1711.5
      dca_lfe_fir1_neon:  387.1         506.4
      c33c1fa8
    • J
      arm: add a cpu flag for the VFPv2 vector mode · e2710e79
      Janne Grunau committed
      The vector mode was deprecated in ARMv7-A/VFPv3 and various CPU
      implementations do not support it in hardware. Depending on the OS,
      vector mode code will either be emulated in software or result in an
      illegal instruction on CPUs which do not support it. This was not really
      a problem in practice since NEON implementations of the same functions
      are preferred. It will however become a problem for checkasm, which
      tests every CPU flag separately.
      
      Since this is a CPU feature that newer CPUs no longer support, the
      behaviour of this flag differs from the other flags. It can only be
      activated by runtime CPU feature selection.
      e2710e79
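The flag behaviour described above might look roughly like the following. Everything here is a hypothetical sketch: the `SKETCH_` flag names, `sketch_build_time_flags` and `sketch_select_impl` are invented for illustration and do not reproduce the project's actual API.

```c
/* Hypothetical CPU feature bits. */
enum {
    SKETCH_CPU_FLAG_VFP    = 1 << 0,
    SKETCH_CPU_FLAG_VFP_VM = 1 << 1, /* VFPv2 vector mode */
    SKETCH_CPU_FLAG_NEON   = 1 << 2
};

typedef enum {
    SKETCH_IMPL_C,
    SKETCH_IMPL_VFP,
    SKETCH_IMPL_VFP_VM,
    SKETCH_IMPL_NEON
} sketch_impl;

/* Flags implied by the build configuration alone. The vector mode bit is
 * never set here, since newer CPUs dropped the feature; only runtime
 * detection is allowed to add it. */
unsigned sketch_build_time_flags(int have_vfp, int have_neon)
{
    unsigned flags = 0;
    if (have_vfp)
        flags |= SKETCH_CPU_FLAG_VFP;
    if (have_neon)
        flags |= SKETCH_CPU_FLAG_NEON;
    return flags;
}

/* Later checks override earlier ones, so NEON wins when available. */
sketch_impl sketch_select_impl(unsigned flags)
{
    sketch_impl impl = SKETCH_IMPL_C;
    if (flags & SKETCH_CPU_FLAG_VFP)
        impl = SKETCH_IMPL_VFP;
    if (flags & SKETCH_CPU_FLAG_VFP_VM)
        impl = SKETCH_IMPL_VFP_VM;
    if (flags & SKETCH_CPU_FLAG_NEON)
        impl = SKETCH_IMPL_NEON;
    return impl;
}
```

A test harness that masks the flags down to a single bit at a time would reach the vector mode path only when runtime detection reported it, which is the situation the commit message is guarding.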