1. 11 Nov, 2021 16 commits
  2. 10 Nov, 2021 6 commits
    • Revert "drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping" · 4579509e
      Committed by Vandita Kulkarni
      This reverts commit 991d9557 ("drm/i915/tgl/dsi: Gate the ddi clocks
      after pll mapping"). The Bspec was recently updated with a pll ungate
      sequence similar to that of the icl dsi enable sequence, hence the revert.
      
      Bspec: 49187
      Fixes: 991d9557 ("drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping")
      Cc: <stable@vger.kernel.org> # v5.4+
      Signed-off-by: Vandita Kulkarni <vandita.kulkarni@intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211109120428.15211-1-vandita.kulkarni@intel.com
    • drm/i915: pin: delete duplicate check in intel_pin_and_fence_fb_obj() · 6cff894e
      Committed by Dan Carpenter
      The "ret" variable is checked on the previous line so we know it's
      zero.  No need to check again.
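
The kind of redundancy being removed can be illustrated with a minimal stand-alone sketch. `pin_object()`, `fence_object()` and both wrappers are hypothetical stand-ins invented for illustration; they are not the real i915 functions, and the actual code in intel_pin_and_fence_fb_obj() may be shaped differently.

```c
#include <assert.h>

/* Hypothetical stand-ins for two steps that can fail; not the i915 API. */
static int pin_object(int fail)   { return fail ? -12 : 0; }
static int fence_object(int fail) { return fail ? -16 : 0; }

/* Before: the second test of "ret" can never be false. */
static int pin_and_fence_before(int pin_fail, int fence_fail)
{
	int ret = pin_object(pin_fail);

	if (ret)
		return ret;
	if (!ret)	/* redundant: ret is known to be zero here */
		ret = fence_object(fence_fail);
	return ret;
}

/* After: same behaviour without the duplicate check. */
static int pin_and_fence_after(int pin_fail, int fence_fail)
{
	int ret = pin_object(pin_fail);

	if (ret)
		return ret;
	return fence_object(fence_fail);
}
```

Both variants return the same value for every combination of failures, which is why dropping the check is safe.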
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211109114850.GB16587@kili
    • drm/i915: Call intel_update_active_dpll() for both bigjoiner pipes · c68dac96
      Committed by Ville Syrjälä
      Currently we're only calling intel_update_active_dpll() for the
      bigjoiner master pipe but not for the slave. With TC ports this
      leads to the two pipes trying to use different PLLs
      (TC vs. TBT). What's worse, we're enabling the PLL that didn't get
      intel_update_active_dpll() called on it at the spot where we
      need the clocks turned on. So we turn on the wrong PLL and the
      DDI is now trying to source its clock from the other PLL which is
      still disabled. Naturally that doesn't end so well and the DDI
      fails to start up.
      
      The state checker also gets a bit unhappy (which is a good thing)
      when it notices that one of the pipes was using the wrong PLL.
      
      Let's fix this by remembering to call intel_update_active_dpll()
      for both pipes. That should get the correct PLL turned on when
      we need it, and the state checker should also be happy.
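
A toy model of the fix, assuming each pipe simply records which PLL it will enable. `update_active_dpll()` below is a simplified stand-in for intel_update_active_dpll(), and all types and helper names are hypothetical.

```c
#include <assert.h>

enum pll_id { PLL_TC, PLL_TBT };

struct pipe_state {
	enum pll_id active_pll;	/* PLL this pipe will enable and source from */
};

/* Simplified stand-in: pick the PLL that matches the current port mode. */
static void update_active_dpll(struct pipe_state *pipe, int port_is_tbt)
{
	pipe->active_pll = port_is_tbt ? PLL_TBT : PLL_TC;
}

/* Fixed flow: update every pipe of the bigjoiner pair, not just the master. */
static void update_bigjoiner_plls(struct pipe_state *pipes, int num_pipes,
				  int port_is_tbt)
{
	for (int i = 0; i < num_pipes; i++)
		update_active_dpll(&pipes[i], port_is_tbt);
}

/* What a state checker would want to see: both pipes on the same PLL. */
static int plls_match(const struct pipe_state *a, const struct pipe_state *b)
{
	return a->active_pll == b->active_pll;
}
```

Updating only `pipes[0]` leaves the two pipes disagreeing about the PLL; calling `update_bigjoiner_plls()` on the pair brings them back in sync.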
      
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Manasi Navare <manasi.d.navare@intel.com>
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4434
      Fixes: e12d6218 ("drm/i915: Reduce bigjoiner special casing")
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211105212156.5697-1-ville.syrjala@linux.intel.com
      Reviewed-by: Imre Deak <imre.deak@intel.com>
    • drm/i915: Use unlocked register accesses for LUT loads · 115e0f68
      Committed by Ville Syrjälä
      We have to bash in a lot of registers to load the higher
      precision LUT modes. The locking overhead is significant, especially
      as we have to get this done as quickly as possible during vblank.
      So let's switch to unlocked accesses for these. Fortunately the LUT
      registers are mostly spread around such that two pipes do not have
      any registers on the same cacheline. So as long as commits on the
      same pipe are serialized (which they are) we should get away with
      this without angering the hardware.
      
      The only exceptions are the PREC_PIPEGCMAX registers on ilk/snb which
      we don't use atm as they are only used in the 12bit gamma mode. If/when
      we add support for that we may need to remember to still serialize
      those registers, though I'm not sure ilk/snb are actually affected
      by the same cacheline issue. I think ivb/hsw at least were, but they
      use a different set of registers for the precision LUT.
      
      I have a test case which is updating the LUTs on two pipes from a
      single atomic commit. Running that in a loop for a minute I get the
      following worst case with the locks in place:
       intel_crtc_vblank_work_start: pipe B, frame=10037, scanline=1081
       intel_crtc_vblank_work_start: pipe A, frame=12274, scanline=769
       intel_crtc_vblank_work_end: pipe A, frame=12274, scanline=58
       intel_crtc_vblank_work_end: pipe B, frame=10037, scanline=74
      
      And here's the worst case with the locks removed:
       intel_crtc_vblank_work_start: pipe B, frame=5869, scanline=1081
       intel_crtc_vblank_work_start: pipe A, frame=7616, scanline=769
       intel_crtc_vblank_work_end: pipe B, frame=5869, scanline=1096
       intel_crtc_vblank_work_end: pipe A, frame=7616, scanline=777
      
      The test was done on a snb using the 10bit 1024 entry LUT mode.
      The vtotals for the two displays are 793 and 1125. So we can
      see that with the locks ripped out the LUT updates are pretty
      nicely confined within the vblank, whereas with the locks in
      place we're routinely blasting past the vblank end which causes
      visual artifacts near the top of the screen.
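
As a rough check of the traces above, the number of scanlines each update took can be computed from the start and end scanline, wrapping at vtotal. This sketch assumes pipe A is the display with vtotal 793 and pipe B the one with vtotal 1125 (inferred from the scanline values, not stated in the message); `elapsed_scanlines()` and `finished_before_wrap()` are hypothetical helpers.

```c
#include <assert.h>

/* Scanlines elapsed between start and end, allowing one wrap past vtotal. */
static int elapsed_scanlines(int start, int end, int vtotal)
{
	assert(start >= 0 && start < vtotal);
	assert(end >= 0 && end < vtotal);
	return (end - start + vtotal) % vtotal;
}

/*
 * If the work started inside the vblank and the scanline counter did not
 * wrap before it ended, the work finished before the next frame's
 * active area began.
 */
static int finished_before_wrap(int start, int end)
{
	return end >= start;
}
```

With the locks in place this gives 82 scanlines for pipe A (769 to 58) and 118 for pipe B (1081 to 74), both wrapping into the next frame. With the locks removed it gives 8 and 15 scanlines, with no wrap.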
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020223339.669-5-ville.syrjala@linux.intel.com
      Reviewed-by: Uma Shankar <uma.shankar@intel.com>
    • drm/i915: Use vblank workers for gamma updates · 2bbc6fca
      Committed by Ville Syrjälä
      The pipe gamma registers are single buffered so they should only
      be updated during the vblank to avoid screen tearing. In fact they
      really should only be updated between start of vblank and frame
      start because that is the only time the pipe is guaranteed to be
      empty. Already at frame start the pipe begins to fill up with
      data for the next frame.
      
      Unfortunately frame start happens ~1 scanline after the start
      of vblank which in practice doesn't always leave us enough time to
      finish the gamma update (gamma LUTs can be several KiB of
      data we have to bash into the registers). However we must try our
      best and so we'll add a vblank work for each pipe from where we
      can do the gamma update. Additionally we could consider pushing
      frame start forward to the max of ~4 scanlines after start of
      vblank. But not sure that's exactly a validated configuration.
      As it stands the ~100 first pixels tend to make it through with
      the old gamma values.
      
      Even though the vblank worker is running on a high priority thread
      we still have to contend with C-states. If the CPU happens to be in
      a deep C-state when the vblank interrupt arrives even the irq
      handler gets delayed massively (I've observed dozens of scanlines
      worth of latency). To avoid that problem we'll use the qos mechanism
      to keep the CPU awake while the vblank work is scheduled.
      
      With all this hooked up we can finally enjoy near atomic gamma
      updates. It even works across several pipes from the same atomic
      commit which previously was a total fail because we did the
      gamma updates for each pipe serially after waiting for all
      pipes to have latched the double buffered registers.
      
      In the future the DSB should take over this responsibility
      which will hopefully avoid some of these issues.
      
      Kudos to Lyude for finishing the actual vblank workers.
      Works like the proverbial train toilet.
      
      v2: Add missing intel_atomic_state fwd declaration
      v3: Clean up properly when not scheduling the worker
      v4: Clean up the rest and add tracepoints
      v5: s/intel_wait_for_vblank_works/intel_wait_for_vblank_workers/ (Jani,Uma)
      
      CC: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020223339.669-4-ville.syrjala@linux.intel.com
      Reviewed-by: Uma Shankar <uma.shankar@intel.com>
    • drm/i915: Do vrr push before sampling the frame counter · 6f9976bd
      Committed by Ville Syrjälä
      Do the vrr push before we sample the frame counter to
      know when the commit has been latched. Doing these in the
      wrong order could lead us to complete the flip before it
      has actually happened.
      
      Cc: Manasi Navare <manasi.d.navare@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020223339.669-3-ville.syrjala@linux.intel.com
      Reviewed-by: Uma Shankar <uma.shankar@intel.com>
  3. 09 Nov, 2021 2 commits
  4. 06 Nov, 2021 1 commit
  5. 05 Nov, 2021 6 commits
  6. 04 Nov, 2021 9 commits
    • drm/i915: Split vlv/chv sprite plane update into noarm+arm pair · a14fef80
      Committed by Ville Syrjälä
      Chop vlv_sprite_update() into two halves. First half becomes
      the _noarm() variant, second part the _arm() variant.
      
      Fortunately I have already previously grouped the register
      writes into roughly the correct order, so the split looks
      surprisingly clean.
      
      Looks like most of the hardware logic was copied from the
      pre-ctg sprite C, so SPSTRIDE/POS/SIZE are armed by SPSURF,
      while the rest are self arming. SPCONSTALPHA is the one
      entirely new register that didn't exist in the old sprite C,
      and looks like that one is self arming. The CHV pipe B CSC
      is also self arming, like the rest of the CHV pipe B
      additions.
      
      I didn't have time to capture i915_update_info numbers for
      these, but since all the other platforms generally showed
      improvements, and crucially no regression, I am fairly
      confident this should behave similarly.
      
      Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-10-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    • drm/i915: Split ivb+ sprite plane update into noarm+arm pair · 50105a3a
      Committed by Ville Syrjälä
      Chop ivb_sprite_update() into two halves. First half becomes
      the _noarm() variant, second part the _arm() variant.
      
      Fortunately I have already previously grouped the register
      writes into roughly the correct order, so the split looks
      surprisingly clean.
      
      Didn't bother with i915_update_info numbers for this one.
      I expect the results to be pretty much identical to the snb
      numbers from the corresponding g4x+ sprite modification.
      
      Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-9-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    • drm/i915: Split g4x+ sprite plane update into noarm+arm pair · 120542e2
      Committed by Ville Syrjälä
      Chop g4x_sprite_update() into two halves. First half becomes
      the _noarm() variant, second part the _arm() variant.
      
      Fortunately I have already previously grouped the register
      writes into roughly the correct order, so the split looks
      surprisingly clean.
      
      Not much of a change in i915_update_info on these older
      platforms that don't have so many planes or registers to
      begin with. Here are the numbers from snb (totally unpatched
      vs. both primary plane and sprite patches applied) running
      kms_atomic_transition --r plane-all-transition --extended:
      w/o patch                           w/ patch
      Updates: 5404                       Updates: 5405
             |                                   |
         1us |******                         1us |******
             |*********                          |*********
         4us |***********                    4us |***********
             |**********                         |**********
        16us |**                            16us |**
             |                                   |
        66us |                              66us |
             |                                   |
       262us |                             262us |
             |                                   |
         1ms |                               1ms |
             |                                   |
         4ms |                               4ms |
             |                                   |
        17ms |                              17ms |
             |                                   |
      Min update: 1400ns                  Min update: 1307ns
      Max update: 19809ns                 Max update: 20194ns
      Average update: 6957ns              Average update: 6432ns
      Overruns > 100us: 0                 Overruns > 100us: 0
      
      But there seems to be a slight improvement with
      lockdep enabled:
      w/o patch                           w/ patch
      Updates: 17612                      Updates: 16364
             |                                   |
         1us |                               1us |
             |******                             |******
         4us |**********                     4us |**********
             |************                       |*************
        16us |*************                 16us |************
             |***                                |*
        66us |                              66us |
             |                                   |
       262us |                             262us |
             |                                   |
         1ms |                               1ms |
             |                                   |
         4ms |                               4ms |
             |                                   |
        17ms |                              17ms |
             |                                   |
      Min update: 3141ns                  Min update: 3562ns
      Max update: 126450ns                Max update: 73354ns
      Average update: 16373ns             Average update: 15153ns
      Overruns > 250us: 0                 Overruns > 250us: 0
      
      Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-8-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    • drm/i915: Split pre-skl primary plane update into noarm+arm pair · 4d0d77de
      Committed by Ville Syrjälä
      Chop i9xx_plane_update() into two halves. First half becomes
      the _noarm() variant, second part the _arm() variant.
      
      Fortunately I have already previously grouped the register
      writes into roughly the correct order, so the split looks
      surprisingly clean.
      
      One slightly surprising fact was that the CHV pipe B PRIMPOS/SIZE
      registers are self arming unlike their pre-ctg DSPPOS/SIZE
      counterparts. In fact all the new CHV pipe B registers are
      self arming.
      
      Also we must remind ourselves that i830/i845 are a bit borked
      in that all of their plane registers are self-arming.
      
      I didn't do any i915_update_info measurements for this one
      alone. I'll get total numbers with the corresponding sprite
      plane changes.
      
      v2: Don't break my precious i830/i845
      
      Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020212757.13517-1-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    • drm/i915: Split skl+ plane update into noarm+arm pair · 890b6ec4
      Committed by Ville Syrjälä
      Chop skl_program_plane() into two halves. First half becomes
      the _noarm() variant, second part the _arm() variant.
      
      Fortunately I have already previously grouped the register
      writes into roughly the correct order, so the split looks
      surprisingly clean.
      
      A few notable oddities I did not realize were self arming
      are AUX_DIST and COLOR_CTL.
      
      i915_update_info doesn't look too terrible on my cfl running
      kms_atomic_transition --r plane-all-transition --extended:
      w/o patch                           w/ patch
      Updates: 2178                       Updates: 2018
             |                                   |
         1us |                               1us |
             |                                   |
         4us |                               4us |*****
             |*********                          |**********
        16us |**********                    16us |*******
             |***                                |
        66us |                              66us |
             |                                   |
       262us |                             262us |
             |                                   |
         1ms |                               1ms |
             |                                   |
         4ms |                               4ms |
             |                                   |
        17ms |                              17ms |
             |                                   |
      Min update: 8332ns                  Min update: 6164ns
      Max update: 48758ns                 Max update: 31808ns
      Average update: 19959ns             Average update: 13159ns
      Overruns > 100us: 0                 Overruns > 100us: 0
      
      And with lockdep enabled:
      w/o patch                           w/ patch
      Updates: 2177                       Updates: 2172
             |                                   |
         1us |                               1us |
             |                                   |
         4us |                               4us |
             |*******                            |*********
        16us |**********                    16us |**********
             |*******                            |*
        66us |                              66us |
             |                                   |
       262us |                             262us |
             |                                   |
         1ms |                               1ms |
             |                                   |
         4ms |                               4ms |
             |                                   |
        17ms |                              17ms |
             |                                   |
      Min update: 12645ns                 Min update: 9980ns
      Max update: 50153ns                 Max update: 33533ns
      Average update: 25337ns             Average update: 18245ns
      Overruns > 250us: 0                 Overruns > 250us: 0
      
      TODO: On icl+ everything seems to be armed by PLANE_SURF, so we
            can optimize this even further on modern platforms. But I
            think there's a bit of refactoring to be done first to
            figure out the best way to go about it (eg. just reusing
            the current skl+ functions, or doing a lower level split).
      
      TODO: Split scaler programming as well, but IIRC the scaler
            has some oddball double buffering behaviour on some
            platforms, so needs proper reverse engineering
      
      Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-6-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    • drm/i915: Split update_plane() into update_noarm() + update_arm() · 8ac80733
      Committed by Ville Syrjälä
      The amount of plane registers we have to write has been steadily
      increasing, putting more pressure on the vblank evasion mechanism
      and forcing us to increase its time budget. Let's try to take some
      of the pressure off by splitting plane updates into two parts:
      1) write all non-self arming plane registers, ie. the registers
         where the write actually does nothing until a separate arming
         register is also written which will cause the hardware to latch
         the new register values at the next start of vblank
      2) write all self arming plane registers, ie. registers which always
         just latch at the next start of vblank, and registers which also
         arm other registers to do so
      
      Here we just provide the mechanism, but don't actually implement
      the split on any platform yet, so everything stays in the _arm()
      hooks for now. Subsequently we can move a whole bunch of stuff into the
      _noarm() part, especially in more modern platforms where the number
      of registers we have to write is also the greatest. On older
      platforms this is less beneficial probably, but no real reason
      to deviate from a common behaviour.
      
      And let's sprinkle some TODOs around the areas that will need
      adapting.
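
The two-part split described above can be sketched with a toy plane that has one non-self-arming register and one arming register. The register names, the `toy_` functions and the latching model are all illustrative assumptions; they are not the real i915 hooks or hardware behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Toy plane: "stride" only takes effect once "surf" has been written. */
struct toy_plane {
	uint32_t stride_pending, surf_pending;	/* values written by software */
	uint32_t stride_live, surf_live;	/* values the hardware scans out with */
	int armed;				/* set by a write to the arming register */
};

/* Part 1: may run outside the critical section; does nothing until armed. */
static void toy_update_noarm(struct toy_plane *p, uint32_t stride)
{
	p->stride_pending = stride;
}

/* Part 2: must run under vblank evasion; the write arms the whole update. */
static void toy_update_arm(struct toy_plane *p, uint32_t surf)
{
	p->surf_pending = surf;
	p->armed = 1;
}

/* Hardware latches the pending values at start of vblank, if armed. */
static void toy_vblank_start(struct toy_plane *p)
{
	if (!p->armed)
		return;
	p->stride_live = p->stride_pending;
	p->surf_live = p->surf_pending;
	p->armed = 0;
}
```

In this model a vblank that arrives between `toy_update_noarm()` and `toy_update_arm()` changes nothing on screen, which is why only the `_arm()` half needs to fit inside the vblank evasion time budget.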
      
      Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-5-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    • drm/i915: Fix up the sprite namespacing · e56b80d9
      Committed by Ville Syrjälä
      Give all sprite exclusive functions/etc. a proper namespace.
      
      Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-4-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    • drm/i915: Fix async flip with decryption and/or DPT · 50faf7a1
      Committed by Ville Syrjälä
      We're currently forgetting to set the PLANE_SURF_DECRYPT
      flag in the async flip path. So if the hardware were to
      latch that bit despite this being an async flip we'd start
      scanning out garbage. And if it doesn't latch it then I
      guess we'd just end up with a weird register value that
      doesn't actually match the hardware state, which isn't
      great for anyone staring at register dumps.
      
      Similarly the async flip path also forgets to call
      skl_surf_address() which means the DPT address space to
      GGTT address space downshift is not being applied to
      the offset. Which means we are pointing PLANE_SURF
      at some random location in GGTT instead of the correct
      DPT page.
      
      So let's fix two birds with one stone and extract the
      PLANE_SURF calculation from skl_program_plane() into
      a small helper and use it in the async flip path as well.
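
The idea of routing both flip paths through one helper can be sketched as follows. The struct, the `toy_` names, the flag bit and the shift amount are placeholders chosen for illustration; the real values and the real helper live in the i915 source and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values, assumed for this sketch only. */
#define TOY_SURF_DECRYPT	(UINT32_C(1) << 2)
#define TOY_DPT_SHIFT		9

struct toy_plane_state {
	uint32_t offset;	/* surface offset in its own address space */
	int uses_dpt;		/* framebuffer is mapped through a DPT */
	int decrypt;		/* content must be decrypted on scanout */
};

/* Single place that builds the value written to the surface register. */
static uint32_t toy_plane_surf(const struct toy_plane_state *s)
{
	uint32_t surf = s->uses_dpt ? s->offset >> TOY_DPT_SHIFT : s->offset;

	if (s->decrypt)
		surf |= TOY_SURF_DECRYPT;
	return surf;
}

/* Both paths call the helper, so they cannot drift apart again. */
static uint32_t toy_sync_flip_surf(const struct toy_plane_state *s)
{
	return toy_plane_surf(s);
}

static uint32_t toy_async_flip_surf(const struct toy_plane_state *s)
{
	return toy_plane_surf(s);
}
```

Because the async path no longer computes the value on its own, it picks up both the address downshift and the decrypt flag automatically.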
      
      Cc: Anshuman Gupta <anshuman.gupta@intel.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Juston Li <juston.li@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Uma Shankar <uma.shankar@intel.com>
      Cc: Karthik B S <karthik.b.s@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-3-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    • drm/i915: Reject planar formats when doing async flips · aaec72ee
      Committed by Ville Syrjälä
      Async flips are only capable of changing PLANE_SURF, hence
      they can't easily be used with planar formats.
      
      Older platforms could require updating AUX_DIST as well, which
      is not possible. We'd have to make sure AUX_DIST doesn't change
      before allowing the async flip through. If we could get async
      flips with CCS then that might be interesting, but since the hw
      doesn't allow async flips with CCS I don't see much point in
      allowing this for planar formats either. No one renders their
      game content in YUV anyway.
      
      icl+ could in theory do this I suppose since each color plane
      has its own PLANE_SURF register, but I don't know if there is
      some magic to guarantee that both the Y and UV plane would
      async flip synchronously if you will. Ie. beyond just a clean
      tear we'd potentially get some kind of weird tear with some
      random mix of luma and chroma from the old and new frames.
      
      So let's just say no to async flips when scanning out planar
      formats.
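
A minimal sketch of such a check, assuming a per-format flag that marks planar YUV layouts. The struct, the flag, `TOY_EINVAL` and `toy_check_async_flip()` are hypothetical; the actual i915 check uses the driver's own format helpers.

```c
#include <assert.h>

#define TOY_EINVAL 22	/* placeholder error code for this sketch */

struct toy_format {
	const char *name;
	int planar_yuv;	/* luma and chroma live in separate planes */
};

/* Refuse an async flip when the framebuffer uses a planar format. */
static int toy_check_async_flip(const struct toy_format *fmt)
{
	if (fmt->planar_yuv)
		return -TOY_EINVAL;
	return 0;
}
```

A packed RGB format passes the check, while a planar YUV format such as NV12 is rejected before any register is touched.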
      
      Cc: Karthik B S <karthik.b.s@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-2-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>