1. 06 Jan, 2013 3 commits
  2. 05 Jan, 2013 9 commits
  3. 28 Dec, 2012 10 commits
  4. 26 Dec, 2012 4 commits
  5. 22 Dec, 2012 14 commits
    • K
      [media] vivi: Optimize precalculate_line() · d40fbf8d
      Kirill Smelkov committed
      precalculate_line() is not very high in the profile, but it calls the
      expensive gen_twopix(), so let's polish it too:
          call gen_twopix() only once for every color bar and then distribute
          the result.
      before:
          # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
          #
          # Samples: 46K of event 'cycles'
          # Event count (approx.): 15574200568
          #
          # Overhead          Command         Shared Object
          # ........  ...............  ....................
          #
              27.99%             rawv  libc-2.13.so          [.] __memcpy_ssse3
              23.29%           vivi-*  [kernel.kallsyms]     [k] memcpy
              10.30%             Xorg  [unknown]             [.] 0xa75c98f8
               5.34%           vivi-*  [vivi]                [k] gen_text.constprop.6
               4.61%             rawv  [vivi]                [k] gen_twopix
               2.64%             rawv  [vivi]                [k] precalculate_line
               1.37%          swapper  [kernel.kallsyms]     [k] read_hpet
      after:
          # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
          #
          # Samples: 45K of event 'cycles'
          # Event count (approx.): 15561769214
          #
          # Overhead          Command         Shared Object
          # ........  ...............  ....................
          #
              30.73%             rawv  libc-2.13.so          [.] __memcpy_ssse3
              26.78%           vivi-*  [kernel.kallsyms]     [k] memcpy
              10.68%             Xorg  [unknown]             [.] 0xa73015e9
               5.55%           vivi-*  [vivi]                [k] gen_text.constprop.6
               1.36%          swapper  [kernel.kallsyms]     [k] read_hpet
               0.96%             Xorg  [kernel.kallsyms]     [k] read_hpet
               ...
               0.16%             rawv  [vivi]                [k] precalculate_line
               ...
               0.14%             rawv  [vivi]                [k] gen_twopix
      (i.e. gen_twopix and precalculate_line overheads are almost gone)
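      The idea of generating each color bar's pixels once and then copying
      them can be sketched in plain userspace C. This is only an illustration
      under assumptions: gen_pair(), fill_line() and NUM_BARS are hypothetical
      names, the pixel format is assumed to be a 4-byte two-pixel pair, and
      none of this is the actual vivi code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BARS 8  /* hypothetical: number of color bars across a line */

/* Counts calls to the (hypothetical) expensive generator below. */
static unsigned gen_calls;

/* Hypothetical stand-in for gen_twopix(): writes one two-pixel,
 * 4-byte pattern for the given bar. */
static void gen_pair(uint8_t out[4], unsigned bar)
{
    gen_calls++;
    out[0] = (uint8_t)(bar * 32);  /* Y0 */
    out[1] = 0x80;                 /* U  */
    out[2] = (uint8_t)(bar * 32);  /* Y1 */
    out[3] = 0x80;                 /* V  */
}

/* Fill `width` pixels (2 bytes each, width even) with NUM_BARS bars.
 * Each bar's pixel pair is generated once and then copied across the bar. */
static void fill_line(uint8_t *line, unsigned width)
{
    unsigned pairs = width / 2;
    unsigned start = 0;

    for (unsigned bar = 0; bar < NUM_BARS; bar++) {
        unsigned end = (bar + 1) * pairs / NUM_BARS;  /* first pair of the next bar */
        uint8_t pix[4];

        gen_pair(pix, bar);                /* the expensive call, once per bar */
        for (unsigned p = start; p < end; p++)
            memcpy(line + p * 4, pix, 4);  /* distribute the result */
        start = end;
    }
}
```

      For a 640-pixel line this makes 8 generator calls instead of one call
      per pixel pair.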
      Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
      Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      d40fbf8d
    • K
      [media] vivi: Move computations out of vivi_fillbuf linecopy loop · 13908f33
      Kirill Smelkov committed
      The "dev->mv_count % wmax" computation was showing high in profiles (we do
      it for each line, which is ~500 lines per frame)
                 │     000010c0 <vivi_fillbuff>:
                       ...
            0,39 │ 70:┌─→mov    0x3ff4(%edi),%esi
            0,22 │ 76:│  mov    0x2a0(%edi),%eax
            0,30 │    │  mov    -0x84(%ebp),%ebx
            0,35 │    │  mov    %eax,%edx
            0,04 │    │  mov    -0x7c(%ebp),%ecx
            0,35 │    │  sar    $0x1f,%edx
            0,44 │    │  idivl  -0x7c(%ebp)
           21,68 │    │  imul   %esi,%ecx
            0,70 │    │  imul   %esi,%ebx
            0,52 │    │  add    -0x88(%ebp),%ebx
            1,65 │    │  mov    %ebx,%eax
            0,22 │    │  imul   %edx,%esi
            0,04 │    │  lea    0x3f4(%edi,%esi,1),%edx
            2,18 │    │→ call   vivi_fillbuff+0xa6
            0,74 │    │  addl   $0x1,-0x80(%ebp)
           62,69 │    │  mov    -0x7c(%ebp),%edx
            1,18 │    │  mov    -0x80(%ebp),%ecx
            0,35 │    │  add    %edx,-0x84(%ebp)
            0,61 │    │  cmp    %ecx,-0x8c(%ebp)
            0,22 │    └──jne    70
      so, since all variables stay the same for all iterations, let's move the
      computations out of the loop: the above-mentioned division and
      "width*pixelsize" too
      before:
          # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
          #
          # Samples: 49K of event 'cycles'
          # Event count (approx.): 16475832370
          #
          # Overhead          Command           Shared Object
          # ........  ...............  ......................
          #
              29.07%             rawv  libc-2.13.so            [.] __memcpy_ssse3
              20.57%           vivi-*  [kernel.kallsyms]       [k] memcpy
              10.20%             Xorg  [unknown]               [.] 0xa7301494
               5.16%           vivi-*  [vivi]                  [k] gen_text.constprop.6
               4.43%             rawv  [vivi]                  [k] gen_twopix
               4.36%           vivi-*  [vivi]                  [k] vivi_fillbuff
               2.42%             rawv  [vivi]                  [k] precalculate_line
               1.33%          swapper  [kernel.kallsyms]       [k] read_hpet
      after:
          # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
          #
          # Samples: 46K of event 'cycles'
          # Event count (approx.): 15574200568
          #
          # Overhead          Command         Shared Object
          # ........  ...............  ....................
          #
              27.99%             rawv  libc-2.13.so          [.] __memcpy_ssse3
              23.29%           vivi-*  [kernel.kallsyms]     [k] memcpy
              10.30%             Xorg  [unknown]             [.] 0xa75c98f8
               5.34%           vivi-*  [vivi]                [k] gen_text.constprop.6
               4.61%             rawv  [vivi]                [k] gen_twopix
               2.64%             rawv  [vivi]                [k] precalculate_line
               1.37%          swapper  [kernel.kallsyms]     [k] read_hpet
               0.79%             Xorg  [kernel.kallsyms]     [k] read_hpet
               0.64%             Xorg  [kernel.kallsyms]     [k] unix_poll
               0.45%             Xorg  [kernel.kallsyms]     [k] fget_light
               0.43%             rawv  libxcb.so.1.1.0       [.] 0x0000aae9
               0.40%            runsv  [kernel.kallsyms]     [k] ext2_try_to_allocate
               0.36%             Xorg  [kernel.kallsyms]     [k] _raw_spin_lock_irqsave
               0.31%           vivi-*  [vivi]                [k] vivi_fillbuff
      (i.e. vivi_fillbuff's own overhead is almost gone)
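      A minimal userspace sketch of hoisting the loop-invariant division and
      the width*pixelsize product out of the per-line copy loop. The function
      copy_lines() and its parameters are hypothetical and only mirror the
      names used in the message; it assumes mv_count is non-negative and that
      `line` holds at least 2 * wmax pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `hmax` rows of `wmax` pixels from a precomputed line into `vbuf`.
 * Assumption of this sketch: `line` holds at least 2 * wmax pixels, so any
 * starting offset below wmax pixels stays in bounds. */
static void copy_lines(uint8_t *vbuf, const uint8_t *line,
                       int wmax, int hmax, int pixelsize, int mv_count)
{
    /* Loop-invariant values, computed once instead of once per row. */
    const size_t linesize = (size_t)wmax * (size_t)pixelsize;
    const uint8_t *linestart =
        line + (size_t)(mv_count % wmax) * (size_t)pixelsize;

    for (int h = 0; h < hmax; h++)
        memcpy(vbuf + (size_t)h * linesize, linestart, linesize);
}
```

      A compiler cannot always do this hoisting itself when the operands are
      reloaded through a pointer on every iteration, which is one reason to
      write it out explicitly.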
      Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
      Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      13908f33
    • K
      [media] vivi: vivi_dev->line[] was not aligned · 10ce8441
      Kirill Smelkov committed
      Though dev->line[] is a u8 array, we work with it as u16, u24 or u32
      pixels, and also pass it to memcpy(), so it's better to align it to at
      least 4 bytes.
      Before the patch, on x86 offsetof(vivi_dev, line) was 1003, and after the
      patch it is 1004.
      There is a slight performance increase, but I think it is only slight
      because we start copying not from line[0]:
          ---- 8< ---- drivers/media/platform/vivi.c
          static void vivi_fillbuff(struct vivi_dev *dev, struct vivi_buffer *buf)
          {
                  ...
                  for (h = 0; h < hmax; h++)
                          memcpy(vbuf + h * wmax * dev->pixelsize,
                                 dev->line + (dev->mv_count % wmax) * dev->pixelsize,
                                 wmax * dev->pixelsize);
      before:
          # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
          #
          # Samples: 49K of event 'cycles'
          # Event count (approx.): 16799780016
          #
          # Overhead          Command         Shared Object
          # ........  ...............  ....................
          #
              27.51%             rawv  libc-2.13.so          [.] __memcpy_ssse3
              23.77%           vivi-*  [kernel.kallsyms]     [k] memcpy
               9.96%             Xorg  [unknown]             [.] 0xa76f5e12
               4.94%           vivi-*  [vivi]                [k] gen_text.constprop.6
               4.44%             rawv  [vivi]                [k] gen_twopix
               3.17%           vivi-*  [vivi]                [k] vivi_fillbuff
               2.45%             rawv  [vivi]                [k] precalculate_line
               1.20%          swapper  [kernel.kallsyms]     [k] read_hpet
          23.77%           vivi-*  [kernel.kallsyms]     [k] memcpy
                           |
                           --- memcpy
                              |
                              |--99.28%-- vivi_fillbuff
                              |          vivi_thread
                              |          kthread
                              |          ret_from_kernel_thread
                               --0.72%-- [...]
      after:
          # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
          #
          # Samples: 49K of event 'cycles'
          # Event count (approx.): 16475832370
          #
          # Overhead          Command           Shared Object
          # ........  ...............  ......................
          #
              29.07%             rawv  libc-2.13.so            [.] __memcpy_ssse3
              20.57%           vivi-*  [kernel.kallsyms]       [k] memcpy
              10.20%             Xorg  [unknown]               [.] 0xa7301494
               5.16%           vivi-*  [vivi]                  [k] gen_text.constprop.6
               4.43%             rawv  [vivi]                  [k] gen_twopix
               4.36%           vivi-*  [vivi]                  [k] vivi_fillbuff
               2.42%             rawv  [vivi]                  [k] precalculate_line
               1.33%          swapper  [kernel.kallsyms]       [k] read_hpet
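      The effect of an alignment request on a member's offset can be checked
      with offsetof(). The two structs below are hypothetical miniatures, not
      the real struct vivi_dev, and the sketch uses C11 _Alignas rather than
      whatever attribute the driver itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. */
struct dev_unaligned {
    uint8_t bars[3];               /* odd-sized member puts `line` at offset 3 */
    uint8_t line[64];
};

struct dev_aligned {
    uint8_t bars[3];
    _Alignas(4) uint8_t line[64];  /* compiler pads so `line` starts at a multiple of 4 */
};
```

      Here offsetof(struct dev_unaligned, line) is 3 while
      offsetof(struct dev_aligned, line) is 4, the same kind of one-byte
      shift the message reports (1003 to 1004).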
      Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
      Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      10ce8441
    • K
      [media] vivi: Optimize gen_text() · e3a8b4d2
      Kirill Smelkov committed
      I've noticed that vivi takes a lot of CPU to produce its frames.
      For example, with 8 devices and 8 simple programs running, where each
      captures YUY2 640x480 and displays it to X via SDL, the profile is as
      follows:
          # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
          # Samples: 82K of event 'cycles'
          # Event count (approx.): 31551930117
          #
          # Overhead          Command         Shared Object                                                           Symbol
          # ........  ...............  ....................
          #
              49.48%           vivi-*  [vivi]                [k] gen_twopix
              10.79%           vivi-*  [kernel.kallsyms]     [k] memcpy
              10.02%             rawv  libc-2.13.so          [.] __memcpy_ssse3
               8.35%           vivi-*  [vivi]                [k] gen_text.constprop.6
               5.06%             Xorg  [unknown]             [.] 0xa73015f8
               2.32%             rawv  [vivi]                [k] gen_twopix
               1.22%             rawv  [vivi]                [k] precalculate_line
               1.20%           vivi-*  [vivi]                [k] vivi_fillbuff
          (rawv is the display program, vivi-* is a combination of vivi-000 through vivi-007)
      so a lot of time is spent in gen_twopix() which, as the following
      call-graph profile shows ...
          49.48%           vivi-*  [vivi]                [k] gen_twopix
                           |
                           --- gen_twopix
                              |
                              |--96.30%-- gen_text.constprop.6
                              |          vivi_fillbuff
                              |          vivi_thread
                              |          kthread
                              |          ret_from_kernel_thread
                              |
                               --3.70%-- vivi_fillbuff
                                         vivi_thread
                                         kthread
                                         ret_from_kernel_thread
      ... is called mostly from gen_text().
      If we look at gen_text(), in the inner loop we see
          if (chr & (1 << (7 - i)))
                  gen_twopix(dev, pos + j * dev->pixelsize, WHITE, (x+y) & 1);
          else
                  gen_twopix(dev, pos + j * dev->pixelsize, TEXT_BLACK, (x+y) & 1);
      which calls gen_twopix() for every character pixel, and that is very
      expensive, because gen_twopix() branches several times.
      Now, note that we operate on only two colors, WHITE and TEXT_BLACK, so
      the pixels for those colors can be precomputed and gen_twopix() moved
      out of the inner loop. Also note that for black and white, even/odd
      does not make a difference for any supported pixel format, so we can
      stop playing that `odd` gen_twopix() parameter game.
      So the first thing we are doing here is
          1) moving gen_twopix() calls out of gen_text() into vivi_fillbuff(),
             to pregenerate black and white colors, just before printing
             starts.
      Next, gen_text's font rendering loop, even with the gen_twopix() calls
      moved out, was inefficient and branchy, so let's
          2) rewrite the gen_text() loop so it uses fewer variables, unroll the
             char horizontal-rendering loop, and instantiate 3 code paths for
             pixelsizes 2, 3 and 4, so that in all inner loops we don't have
             to branch or make indirections (*).
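      Steps 1) and 2) can be sketched for the 2-byte pixel case as follows.
      This is a hedged illustration, not the driver's code: render_row16(),
      `fg` and `bg` are hypothetical names, and the two pixel values are
      assumed to have been generated once by the caller before rendering
      starts.

```c
#include <assert.h>
#include <stdint.h>

/* Render one 8-pixel glyph row for a 2-byte pixel format.
 * `fg` and `bg` are pixel values precomputed by the caller; bit 7 of
 * `bits` is the leftmost pixel of the row. */
static void render_row16(uint16_t *pos, uint8_t bits, uint16_t fg, uint16_t bg)
{
    /* Unrolled by hand: every store is a plain select between two
     * precomputed values, with no per-pixel function call. */
    pos[0] = (bits & 0x80) ? fg : bg;
    pos[1] = (bits & 0x40) ? fg : bg;
    pos[2] = (bits & 0x20) ? fg : bg;
    pos[3] = (bits & 0x10) ? fg : bg;
    pos[4] = (bits & 0x08) ? fg : bg;
    pos[5] = (bits & 0x04) ? fg : bg;
    pos[6] = (bits & 0x02) ? fg : bg;
    pos[7] = (bits & 0x01) ? fg : bg;
}
```

      Selects of this shape are what a compiler can turn into conditional
      moves instead of branches.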
      With all of the above reworks done, for gen_text() we get nice,
      non-branchy, streamlined code (showing the loop for pixelsize=2):
                 │       cmp    $0x2,%eax
                 │     ↑ jne    26
                 │       mov    -0x18(%ebp),%eax
                 │       mov    -0x20(%ebp),%edi
                 │       imul   -0x20(%ebp),%eax
                 │       movzwl 0x3ffc(%ebx),%esi
            0,08 │       movzwl 0x4000(%ebx),%ecx
            0,04 │       add    %edi,%edi
                 │       mov    0x0,%ebx
            0,51 │       mov    %edi,-0x1c(%ebp)
                 │       mov    %ebx,-0x14(%ebp)
                 │       movl   $0x0,-0x10(%ebp)
                 │       lea    0x20(%edx,%eax,2),%eax
                 │       mov    %eax,-0x18(%ebp)
                 │       xchg   %ax,%ax
            0,04 │ a0:   mov    0x8(%ebp),%ebx
                 │       mov    -0x18(%ebp),%eax
            0,04 │       movzbl (%ebx),%edx
            0,16 │       test   %dl,%dl
            0,04 │     ↓ je     128
            0,08 │       lea    0x0(%esi),%esi
            1,61 │ b0:┌─→shl    $0x4,%edx
            1,02 │    │  mov    -0x14(%ebp),%edi
            2,04 │    │  add    -0x10(%ebp),%edx
            2,24 │    │  lea    0x1(%ebx),%ebx
            0,27 │    │  movzbl (%edi,%edx,1),%edx
            9,92 │    │  mov    %esi,%edi
            0,39 │    │  test   %dl,%dl
            2,04 │    │  cmovns %ecx,%edi
            4,63 │    │  test   $0x40,%dl
            0,55 │    │  mov    %di,(%eax)
            3,76 │    │  mov    %esi,%edi
            0,71 │    │  cmove  %ecx,%edi
            3,41 │    │  test   $0x20,%dl
            0,75 │    │  mov    %di,0x2(%eax)
            2,43 │    │  mov    %esi,%edi
            0,59 │    │  cmove  %ecx,%edi
            4,59 │    │  test   $0x10,%dl
            0,67 │    │  mov    %di,0x4(%eax)
            2,55 │    │  mov    %esi,%edi
            0,78 │    │  cmove  %ecx,%edi
            4,31 │    │  test   $0x8,%dl
            0,67 │    │  mov    %di,0x6(%eax)
            5,76 │    │  mov    %esi,%edi
            1,80 │    │  cmove  %ecx,%edi
            4,20 │    │  test   $0x4,%dl
            0,86 │    │  mov    %di,0x8(%eax)
            2,98 │    │  mov    %esi,%edi
            1,37 │    │  cmove  %ecx,%edi
            4,67 │    │  test   $0x2,%dl
            0,20 │    │  mov    %di,0xa(%eax)
            2,78 │    │  mov    %esi,%edi
            0,75 │    │  cmove  %ecx,%edi
            3,92 │    │  and    $0x1,%edx
            0,75 │    │  mov    %esi,%edx
            2,59 │    │  mov    %di,0xc(%eax)
            0,59 │    │  cmove  %ecx,%edx
            3,10 │    │  mov    %dx,0xe(%eax)
            2,39 │    │  add    $0x10,%eax
            0,51 │    │  movzbl (%ebx),%edx
            2,86 │    │  test   %dl,%dl
            2,31 │    └──jne    b0
            0,04 │128:   addl   $0x1,-0x10(%ebp)
            4,00 │       mov    -0x1c(%ebp),%eax
            0,04 │       add    %eax,-0x18(%ebp)
            0,08 │       cmpl   $0x10,-0x10(%ebp)
                 │     ↑ jne    a0
      which almost goes away from the profile:
          # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
          # Samples: 49K of event 'cycles'
          # Event count (approx.): 16799780016
          #
          # Overhead          Command         Shared Object                                                           Symbol
          # ........  ...............  ....................
          #
              27.51%             rawv  libc-2.13.so          [.] __memcpy_ssse3
              23.77%           vivi-*  [kernel.kallsyms]     [k] memcpy
               9.96%             Xorg  [unknown]             [.] 0xa76f5e12
               4.94%           vivi-*  [vivi]                [k] gen_text.constprop.6
               4.44%             rawv  [vivi]                [k] gen_twopix
               3.17%           vivi-*  [vivi]                [k] vivi_fillbuff
               2.45%             rawv  [vivi]                [k] precalculate_line
               1.20%          swapper  [kernel.kallsyms]     [k] read_hpet
      i.e. gen_twopix() overhead dropped from 49% to 4% and gen_text() loops
      from ~8% to ~5%, and the overall cycle count dropped from 31551930117 to
      16799780016, which is a ~1.9x speedup for the whole workload.
      (*) for RGB24 rendering I've introduced x24, which can be thought of as a
          synthetic u24 that simplifies the code. That's done because when
          memcpy is used for conditional assignment, gcc generates suboptimal
          code with more indirections.
          Fortunately, struct assignment is built into C, and that's all we
          need from a pixel type for font rendering.
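      A small sketch of the "synthetic u24" idea, assuming a three-byte struct
      with no padding. The typedef name x24 comes from the message; its layout
      and the helper render_row24() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: three bytes. A struct made only of uint8_t members is
 * expected to have size 3 with no padding on common ABIs. */
typedef struct { uint8_t b[3]; } x24;

/* Hypothetical helper: render one 8-pixel glyph row of 24-bit pixels. */
static void render_row24(x24 *pos, uint8_t bits, x24 fg, x24 bg)
{
    for (int i = 0; i < 8; i++)
        pos[i] = (bits & (0x80 >> i)) ? fg : bg;  /* plain struct assignment */
}
```

      Because both arms of the conditional have the same struct type, the
      assignment needs no memcpy() call and no pointer to the chosen color.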
      Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
      Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      e3a8b4d2
    • J
      [media] media: m2m-deinterlace: Do not set debugging flag to true · 20272409
      Javier Martin committed
      Default value should be 'debugging disabled'.
      Signed-off-by: Javier Martin <javier.martin@vista-silicon.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      20272409
    • J
      [media] media: coda: Fix H.264 header alignment - v2 · 832fbb5a
      Javier Martin committed
      Length of H.264 headers is variable and thus it might not be
      aligned for the coda to append the encoded frame. This causes
      the first frame to overwrite part of the H.264 PPS.
      In order to solve that, a filler NAL must be added between
      the headers and the first frame to preserve alignment.
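      A rough sketch of padding with an H.264 filler-data NAL unit
      (nal_unit_type 12). The 8-byte alignment, the 4-byte start code and the
      function write_filler_nal() are assumptions of this example; the commit
      message does not state the alignment the coda needs, and this is not the
      driver's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGNMENT  8u  /* assumed alignment; not stated in the commit */
#define MIN_FILLER 6u  /* start code (4) + NAL header (1) + trailing bits (1) */

/* Write a filler-data NAL unit at `buf` (the position right after the
 * headers) so that `header_len` plus the filler is a multiple of ALIGNMENT.
 * Returns the number of bytes written, 0 if already aligned. */
static size_t write_filler_nal(uint8_t *buf, size_t header_len)
{
    static const uint8_t start[] = { 0x00, 0x00, 0x00, 0x01, 0x0c };
    size_t pad = (ALIGNMENT - header_len % ALIGNMENT) % ALIGNMENT;

    if (pad == 0)
        return 0;
    if (pad < MIN_FILLER)
        pad += ALIGNMENT;                /* too small for a NAL, use the next slot */

    memcpy(buf, start, sizeof(start));   /* start code + NAL header (type 12) */
    memset(buf + 5, 0xff, pad - 6);      /* filler payload bytes */
    buf[pad - 1] = 0x80;                 /* RBSP trailing bits */
    return pad;
}
```

      The caller is assumed to provide at least ALIGNMENT + MIN_FILLER - 1
      bytes of space at `buf`.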
      
      [mchehab@redhat.com: applied only v2 diff here, as v1 ended by mistakenly
       being applied]
      Signed-off-by: Javier Martin <javier.martin@vista-silicon.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      832fbb5a
    • J
      [media] media: coda: Fix H.264 header alignment · 3f3f5c7f
      Javier Martin committed
      Length of H.264 headers is variable and thus it might not be
      aligned for the coda to append the encoded frame. This causes
      the first frame to overwrite part of the H.264 PPS.
      In order to solve that, a filler NAL must be added between
      the headers and the first frame to preserve alignment.
      
      [mchehab@redhat.com: Fix a few CodingStyle issues]
      Signed-off-by: Javier Martin <javier.martin@vista-silicon.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      3f3f5c7f
    • W
      [media] davinci: vpbe: remove unused variable in vpbe_initialize() · cc91de5f
      Wei Yongjun committed
      The variable 'output_index' is initialized but never used
      otherwise, so remove the unused variable.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Acked-by: Prabhakar Lad <prabhakar.lad@ti.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      cc91de5f
    • W
      [media] media: davinci: vpbe: return error code on error in vpbe_display_g_crop() · e276f03b
      Wei Yongjun committed
      We assign an error code to 'ret' if crop->type is not
      V4L2_BUF_TYPE_VIDEO_OUTPUT, but never use it.
      Return the error code on this error instead.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Acked-by: Prabhakar Lad <prabhakar.lad@ti.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      e276f03b
    • W
      [media] media: davinci: vpbe: fix return value check in vpbe_display_reqbufs() · 4d22f108
      Wei Yongjun committed
      In case of error, the function vb2_dma_contig_init_ctx() returns
      ERR_PTR() and never returns NULL. The NULL test in the return value
      check should be replaced with IS_ERR().
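      Why a NULL test misses this kind of failure can be shown with a
      userspace imitation of the kernel's error-pointer helpers. The lowercase
      err_ptr(), is_err() and ptr_err() below are stand-ins written for this
      sketch, and init_ctx() is a hypothetical allocator, not
      vb2_dma_contig_init_ctx().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(): a negative
 * errno is encoded in the top 4095 values of the address space. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure as an encoded errno and
 * never returns NULL. */
static void *init_ctx(int fail)
{
    static int ctx;
    return fail ? err_ptr(-ENOMEM) : (void *)&ctx;
}
```

      A failed init_ctx() returns a non-NULL value, so `if (!p)` would treat
      it as success; only the is_err() test catches it.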
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Acked-by: Prabhakar Lad <prabhakar.lad@ti.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      4d22f108
    • L
      [media] media: davinci: vpbe: enable building of vpbe driver for DM355 and DM365 · cfe9dbd8
      Lad, Prabhakar committed
      This patch enables building the VPBE display driver for DM365
      and DM355. It also removes the unnecessary VIDEO_DM644X_VPBE entry
      from Kconfig, since a single entry is sufficient, and makes the
      corresponding changes in the Makefile.
      Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com>
      Signed-off-by: Manjunath Hadli <manjunath.hadli@ti.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      cfe9dbd8
    • L
      [media] davinci: vpbe: pass different platform names to handle different ip's · caff80c3
      Lad, Prabhakar committed
      The vpbe driver can handle different platforms: DM644X, DM36X and
      DM355. To differentiate between these platforms, venc_type/vpbe_type
      was passed as part of platform data, which was incorrect. The correct
      way to handle this case is to pass different platform names.
      This patch creates platform_device_id[] array supporting different
      platforms and assigns id_table to the platform driver, and finally
      in the probe gets the actual device by using platform_get_device_id()
      and gets the appropriate driver data for that platform.
      Taking this approach will also make the DT transition easier.
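      The name-based lookup can be pictured with a userspace sketch of a
      sentinel-terminated id table. Everything here is hypothetical (the
      struct, the enum, the device names and match_id()); it only imitates the
      shape of a platform_device_id[] table and is not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum venc_kind { KIND_DM644X, KIND_DM365, KIND_DM355 };  /* hypothetical */

struct id_entry {
    const char *name;            /* name the device is registered under */
    enum venc_kind driver_data;  /* per-platform data selected by name */
};

/* Sentinel-terminated table; the names are made up for this sketch. */
static const struct id_entry id_table[] = {
    { "dm644x-venc", KIND_DM644X },
    { "dm365-venc",  KIND_DM365 },
    { "dm355-venc",  KIND_DM355 },
    { NULL, KIND_DM644X },       /* sentinel: name == NULL ends the table */
};

/* Return the entry whose name matches, or NULL if none does. */
static const struct id_entry *match_id(const char *name)
{
    for (const struct id_entry *e = id_table; e->name; e++)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

      The probe path then reads the platform variant from the matched entry
      instead of from a field in platform data.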
      Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com>
      Signed-off-by: Manjunath Hadli <manjunath.hadli@ti.com>
      Acked-by: Sekhar Nori <nsekhar@ti.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      caff80c3
    • M
      [media] davinci/vpss: add helper functions for setting hw params · d31c1002
      Manjunath Hadli committed
      Add vpss helper functions to be used in the main driver for setting
      hardware parameters.
      
      Add interface functions to set sync polarity, interrupt completion and
      pageframe size in vpss to be used by the main driver.
      Signed-off-by: Manjunath Hadli <manjunath.hadli@ti.com>
      Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      d31c1002
    • M
      [media] davinci: vpss: dm365: set vpss clk ctrl · 3de93941
      Manjunath Hadli committed
      Call request_mem_region() for the VPSS_CLK_CTRL register, ioremap it,
      and enable the clocks appropriately.
      Signed-off-by: Manjunath Hadli <manjunath.hadli@ti.com>
      Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      3de93941