1. 06 11月, 2017 11 次提交
    • C
      powerpc/powernv: Add OPAL_BUSY to opal_error_code() · 77adbd22
      Cyril Bur 提交于
      Also export opal_error_code() so that it can be used in modules
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      77adbd22
    • C
      powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async · 9aab2449
      Cyril Bur 提交于
      This patch adds an _interruptible version of opal_async_wait_response().
      This is useful when a long running OPAL call is performed on behalf of
      a userspace thread, for example, the opal_flash_{read,write,erase}
      functions performed by the powernv-flash MTD driver.
      
      It is foreseeable that these functions would take upwards of two
      minutes causing the wait_event() to block long enough to cause hung
      task warnings. Furthermore, wait_event_interruptible() is preferable
      as otherwise there is no way for signals to stop the process which is
      going to be confusing in userspace.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9aab2449
    • S
      powernv/opal-sensor: remove not needed lock · 95e1bc1d
      Stewart Smith 提交于
      Parallel sensor reads could run out of async tokens due to
      opal_get_sensor_data grabbing tokens but then doing the sensor
      read behind a mutex, essentially serializing the (possibly
      asynchronous and relatively slow) sensor read.
      
      It turns out that the mutex isn't needed at all, not only
      should the OPAL interface allow concurrent reads, the implementation
      is certainly safe for that, and if any sensor we were reading
      from somewhere isn't, doing the mutual exclusion in the kernel
      is the wrong place to do it, OPAL should be doing it for the kernel.
      
      So, remove the mutex.
      
      Additionally, we shouldn't be printing out an error when we don't
      get a token as the only way this should happen is if we've been
      interrupted in down_interruptible() on the semaphore.
      Reported-by: NRobert Lippert <rlippert@google.com>
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      95e1bc1d
    • C
      powerpc/opal: Rework the opal-async interface · 86cd6d98
      Cyril Bur 提交于
      Future work will add an opal_async_wait_response_interruptible()
      which will call wait_event_interruptible(). This work requires extra
      token state to be tracked as wait_event_interruptible() can return and
      the caller could release the token before OPAL responds.
      
      Currently token state is tracked with two bitfields which are 64 bits
      big but may not need to be as OPAL informs Linux how many async tokens
      there are. It also uses an array indexed by token to store response
      messages for each token.
      
      The bitfields make it difficult to add more state and also provide a
      hard maximum as to how many tokens there can be - it is possible that
      OPAL will inform Linux that there are more than 64 tokens.
      
      Rather than add a bitfield to track the extra state, rework the
      internals slightly.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      [mpe: Fix __opal_async_get_token() when no tokens are free]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      86cd6d98
    • C
      powerpc/opal: Make __opal_async_{get, release}_token() static · 59cf9a1c
      Cyril Bur 提交于
      There are no callers of both __opal_async_get_token() and
      __opal_async_release_token().
      
      This patch also removes the possibility of "emergency through
      synchronous call to __opal_async_get_token()" as such it makes more
      sense to initialise opal_sync_sem for the maximum number of async
      tokens.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      59cf9a1c
    • W
      powerpc/opal: Fix EBUSY bug in acquiring tokens · 71e24d77
      William A. Kennington III 提交于
      The current code checks the completion map to look for the first token
      that is complete. In some cases, a completion can come in but the
      token can still be on lease to the caller processing the completion.
      If this completed but unreleased token is the first token found in the
      bitmap by another tasks trying to acquire a token, then the
      __test_and_set_bit call will fail since the token will still be on
      lease. The acquisition will then fail with an EBUSY.
      
      This patch reorganizes the acquisition code to look at the
      opal_async_token_map for an unleased token. If the token has no lease
      it must have no outstanding completions so we should never see an
      EBUSY, unless we have leased out too many tokens. Since
      opal_async_get_token_inrerruptible is protected by a semaphore, we
      will practically never see EBUSY anymore.
      
      Fixes: 8d724823 ("powerpc/powernv: Infrastructure to support OPAL async completion")
      Signed-off-by: NWilliam A. Kennington III <wak@google.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      71e24d77
    • M
      powerpc/mm: Add a CONFIG option to choose if radix is used by default · 1fd6c022
      Michael Ellerman 提交于
      Currently if the hardware supports the radix MMU we will use
      it, *unless* "disable_radix" is passed on the kernel command line.
      
      However some users would like the reverse semantics. ie. The kernel
      uses the hash MMU by default, unless radix is explicitly requested on
      the command line.
      
      So add a CONFIG option to choose whether we use radix by default or
      not, and expand the disable_radix command line option to allow
      "disable_radix=no" which *enables* radix.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1fd6c022
    • M
      powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 · 4e003747
      Michael Ellerman 提交于
      CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
      on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
      found in "server" processors, from IBM mainly.
      
      Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
      annoying to have two symbols that always have the same value, it's not
      quite annoying enough to bother removing one.
      
      However with the arrival of Power9, we now have the situation where
      CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
      Radix MMU - *not* the "standard" MMU. So it is now actively confusing
      to use it, because it implies that code is disabled or inactive when
      the Radix MMU is in use, however that is not necessarily true.
      
      So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
      formatting updates of some of the affected lines.
      
      This will be a pain for backports, but c'est la vie.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4e003747
    • A
      powerpc/powernv: Reserve a hole which appears after enabling IOV · d6f934fd
      Alexey Kardashevskiy 提交于
      In order to make generic IOV code work, the physical function IOV BAR
      should start from offset of the first VF. Since M64 segments share
      PE number space across PHB, and some PEs may be in use at the time
      when IOV is enabled, the existing code shifts the IOV BAR to the index
      of the first PE/VF. This creates a hole in IOMEM space which can be
      potentially taken by some other device.
      
      This reserves a temporary hole on a parent and releases it when IOV is
      disabled; the temporary resources are stored in pci_dn to avoid
      kmalloc/free.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d6f934fd
    • T
      powerpc/pseries/vio: Dispose of virq mapping on vdevice unregister · b8f89fea
      Tyrel Datwyler 提交于
      When a vdevice is DLPAR removed from the system the vio subsystem
      doesn't bother unmapping the virq from the irq_domain. As a result we
      have a virq mapped to a hardware irq that is no longer valid for the
      irq_domain. A side effect is that we are left with /proc/irq/<irq#>
      affinity entries, and attempts to modify the smp_affinity of the irq
      will fail.
      
      In the following observed example the kernel log is spammed by
      ics_rtas_set_affinity errors after the removal of a VSCSI adapter.
      This is a result of irqbalance trying to adjust the affinity every 10
      seconds.
      
        rpadlpar_io: slot U8408.E8E.10A7ACV-V5-C25 removed
        ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
        ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
      
      This patch fixes the issue by calling irq_dispose_mapping() on the
      virq of the viodev on unregister.
      
      Fixes: f2ab6219 ("powerpc/pseries: Add PFO support to the VIO bus")
      Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b8f89fea
    • N
      powerpc/powernv: Use FIXUP_ENDIAN_HV in OPAL return · 63c9d8a4
      Nicholas Piggin 提交于
      Close the recoverability gap for OPAL calls by using FIXUP_ENDIAN_HV
      in the return path.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      63c9d8a4
  2. 22 10月, 2017 2 次提交
  3. 21 10月, 2017 1 次提交
    • M
      powerpc/powernv: Enable TM without suspend if possible · 54820530
      Michael Ellerman 提交于
      Some Power9 revisions can run in a mode where TM operates without
      suspended state. If we find ourself on a CPU that might be in this
      mode, we query OPAL to check, and if so we reenable TM in CPU
      features, and enable a new user feature to signal to userspace that we
      are in this mode.
      
      We do not enable the "normal" user feature, PPC_FEATURE2_HTM, but we
      do enable PPC_FEATURE2_HTM_NOSC because that indicates to userspace
      that the kernel will abort transactions on syscall entry, which is
      true regardless of the suspend mode.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      54820530
  4. 16 10月, 2017 1 次提交
  5. 06 10月, 2017 1 次提交
    • M
      powerpc/powernv: Make opal_event_shutdown() callable from IRQ context · c6baa077
      Michael Ellerman 提交于
      In opal_event_shutdown() we free all the IRQs hanging off the
      opal_event_irqchip. However it's not safe to do so if we're called
      from IRQ context, because free_irq() wants to synchronise versus IRQ
      context. This can lead to warnings and a stuck system.
      
      For example from sysrq-b:
      
        Trying to free IRQ 17 from IRQ context!
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1461 __free_irq+0x398/0x8d0
        ...
        NIP __free_irq+0x398/0x8d0
        LR __free_irq+0x394/0x8d0
        Call Trace:
          __free_irq+0x394/0x8d0 (unreliable)
          free_irq+0xa4/0x140
          opal_event_shutdown+0x128/0x180
          opal_shutdown+0x1c/0xb0
          pnv_shutdown+0x20/0x40
          machine_restart+0x38/0x90
          emergency_restart+0x28/0x40
          sysrq_handle_reboot+0x24/0x40
          __handle_sysrq+0x198/0x590
          hvc_poll+0x48c/0x8c0
          hvc_handle_interrupt+0x1c/0x50
          __handle_irq_event_percpu+0xe8/0x6e0
          handle_irq_event_percpu+0x34/0xe0
          handle_irq_event+0xc4/0x210
          handle_level_irq+0x250/0x770
          generic_handle_irq+0x5c/0xa0
          opal_handle_events+0x11c/0x240
          opal_interrupt+0x38/0x50
          __handle_irq_event_percpu+0xe8/0x6e0
          handle_irq_event_percpu+0x34/0xe0
          handle_irq_event+0xc4/0x210
          handle_fasteoi_irq+0x174/0xa10
          generic_handle_irq+0x5c/0xa0
          __do_irq+0xbc/0x4e0
          call_do_irq+0x14/0x24
          do_IRQ+0x18c/0x540
          hardware_interrupt_common+0x158/0x180
      
      We can avoid that by using disable_irq_nosync() rather than
      free_irq(). Although it doesn't fully free the IRQ, it should be
      sufficient when we're shutting down, particularly in an emergency.
      
      Add an in_interrupt() check and use free_irq() when we're shutting
      down normally. It's probably OK to use disable_irq_nosync() in that
      case too, but for now it's safer to leave that behaviour as-is.
      
      Fixes: 9f0fd049 ("powerpc/powernv: Add a virtual irqchip for opal events")
      Reported-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c6baa077
  6. 04 10月, 2017 2 次提交
    • A
      powerpc/powermac: Use setup_timer() helper · 01451ad4
      Allen Pais 提交于
      Use setup_timer function instead of initializing timer with the
      function and data fields.
      Signed-off-by: NAllen Pais <allen.lkml@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      01451ad4
    • N
      powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET · e36d0a2e
      Nicholas Piggin 提交于
      This allows MSR[EE]=0 lockups to be detected on an OPAL (bare metal)
      system similarly to the hcall NMI IPI on pseries guests, when the
      platform/firmware supports it.
      
      This is an example of CPU10 spinning with interrupts hard disabled:
      
        Watchdog CPU:32 detected Hard LOCKUP other CPUS:10
        Watchdog CPU:10 Hard LOCKUP
        CPU: 10 PID: 4410 Comm: bash Not tainted 4.13.0-rc7-00074-ge89ce1f8-dirty #34
        task: c0000003a82b4400 task.stack: c0000003af55c000
        NIP: c0000000000a7b38 LR: c000000000659044 CTR: c0000000000a7b00
        REGS: c00000000fd23d80 TRAP: 0100   Not tainted  (4.13.0-rc7-00074-ge89ce1f8-dirty)
        MSR: 90000000000c1033 <SF,HV,ME,IR,DR,RI,LE>
        CR: 28422222  XER: 20000000
        CFAR: c0000000000a7b38 SOFTE: 0
        GPR00: c000000000659044 c0000003af55fbb0 c000000001072a00 0000000000000078
        GPR04: c0000003c81b5c80 c0000003c81cc7e8 9000000000009033 0000000000000000
        GPR08: 0000000000000000 c0000000000a7b00 0000000000000001 9000000000001003
        GPR12: c0000000000a7b00 c00000000fd83200 0000000010180df8 0000000010189e60
        GPR16: 0000000010189ed8 0000000010151270 000000001018bd88 000000001018de78
        GPR20: 00000000370a0668 0000000000000001 00000000101645e0 0000000010163c10
        GPR24: 00007fffd14d6294 00007fffd14d6290 c000000000fba6f0 0000000000000004
        GPR28: c000000000f351d8 0000000000000078 c000000000f4095c 0000000000000000
        NIP [c0000000000a7b38] sysrq_handle_xmon+0x38/0x40
        LR [c000000000659044] __handle_sysrq+0xe4/0x270
        Call Trace:
        [c0000003af55fbd0] [c000000000659044] __handle_sysrq+0xe4/0x270
        [c0000003af55fc70] [c000000000659810] write_sysrq_trigger+0x70/0xa0
        [c0000003af55fca0] [c0000000003da650] proc_reg_write+0xb0/0x110
        [c0000003af55fcf0] [c0000000003423bc] __vfs_write+0x6c/0x1b0
        [c0000003af55fd90] [c000000000344398] vfs_write+0xd8/0x240
        [c0000003af55fde0] [c00000000034632c] SyS_write+0x6c/0x110
        [c0000003af55fe30] [c00000000000b220] system_call+0x58/0x6c
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Use kernel types for opal_signal_system_reset()]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e36d0a2e
  7. 27 9月, 2017 1 次提交
    • M
      powerpc/64s: Add workaround for P9 vector CI load issue · 5080332c
      Michael Neuling 提交于
      POWER9 DD2.1 and earlier has an issue where some cache inhibited
      vector load will return bad data. The workaround is two part, one
      firmware/microcode part triggers HMI interrupts when hitting such
      loads, the other part is this patch which then emulates the
      instructions in Linux.
      
      The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
      lxvh8x.
      
      When an instruction triggers the HMI, all threads in the core will be
      sent to the HMI handler, not just the one running the vector load.
      
      In general, these spurious HMIs are detected by the emulation code and
      we just return back to the running process. Unfortunately, if a
      spurious interrupt occurs on a vector load that's to normal memory we
      have no way to detect that it's spurious (unless we walk the page
      tables, which is very expensive). In this case we emulate the load but
      we need do so using a vector load itself to ensure 128bit atomicity is
      preserved.
      
      Some additional debugfs emulated instruction counters are added also.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5080332c
  8. 26 9月, 2017 1 次提交
    • B
      powerpc/powernv: Rework EEH initialization on powernv · b9fde58d
      Benjamin Herrenschmidt 提交于
      Remove the post_init callback which is only used
      by powernv, we can just call it explicitly from
      the powernv code.
      
      This partially kills the ability to "disable" eeh at
      runtime via debugfs as this was calling that same
      callback again, but this is both unused and broken
      in several ways. If we want to revive it, we need
      to create a dedicated enable/disable callback on the
      backend that does the right thing.
      
      Let the bulk of eeh initialize normally at
      core_initcall() like it does on pseries by removing
      the hack in eeh_init() that delays it.
      
      Instead we make sure our eeh->probe cleanly bails
      out of the PEs haven't been created yet and we force
      a re-probe where we used to call eeh_init() again.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b9fde58d
  9. 21 9月, 2017 2 次提交
    • T
      powerpc/pseries: Fix parent_dn reference leak in add_dt_node() · b537ca6f
      Tyrel Datwyler 提交于
      A reference to the parent device node is held by add_dt_node() for the
      node to be added. If the call to dlpar_configure_connector() fails
      add_dt_node() returns ENOENT and that reference is not freed.
      
      Add a call to of_node_put(parent_dn) prior to bailing out after a
      failed dlpar_configure_connector() call.
      
      Fixes: 8d5ff320 ("powerpc/pseries: Make dlpar_configure_connector parent node aware")
      Cc: stable@vger.kernel.org # v3.12+
      Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b537ca6f
    • T
      powerpc/pseries: Fix "OF: ERROR: Bad of_node_put() on /cpus" during DLPAR · 087ff6a5
      Tyrel Datwyler 提交于
      Commit 215ee763 ("powerpc: pseries: remove dlpar_attach_node
      dependency on full path") reworked dlpar_attach_node() to no longer
      look up the parent node "/cpus", but instead to have the parent node
      passed by the caller in the function parameter list.
      
      As a result dlpar_attach_node() is no longer responsible for freeing
      the reference to the parent node. However, commit 215ee763 failed
      to remove the of_node_put(parent) call in dlpar_attach_node(), or to
      take into account that the reference to the parent in the caller
      dlpar_cpu_add() needs to be held until after dlpar_attach_node()
      returns.
      
      As a result doing repeated cpu add/remove dlpar operations will
      eventually result in the following error:
      
        OF: ERROR: Bad of_node_put() on /cpus
        CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1
        Call Trace:
         dump_stack+0x15c/0x1f8 (unreliable)
         of_node_release+0x1a4/0x1c0
         kobject_put+0x1a8/0x310
         kobject_del+0xbc/0xf0
         __of_detach_node_sysfs+0x144/0x210
         of_detach_node+0xf0/0x180
         dlpar_detach_node+0xc4/0x120
         dlpar_cpu_remove+0x280/0x560
         dlpar_cpu_release+0xbc/0x1b0
         arch_cpu_release+0x6c/0xb0
         cpu_release_store+0xa0/0x100
         dev_attr_store+0x68/0xa0
         sysfs_kf_write+0xa8/0xf0
         kernfs_fop_write+0x2cc/0x400
         __vfs_write+0x5c/0x340
         vfs_write+0x1a8/0x3d0
         SyS_write+0xa8/0x1a0
         system_call+0x58/0x6c
      
      Fix the issue by removing the of_node_put(parent) call from
      dlpar_attach_node(), and ensuring that the reference to the parent
      node is properly held and released by the caller dlpar_cpu_add().
      
      Fixes: 215ee763 ("powerpc: pseries: remove dlpar_attach_node dependency on full path")
      Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      [mpe: Add a comment in the code and frob the change log slightly]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      087ff6a5
  10. 20 9月, 2017 1 次提交
  11. 14 9月, 2017 1 次提交
    • M
      mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Michal Hocko 提交于
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic it becomes much less clear when to use the
      highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing caller provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      motivates for cargo cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with highlevel semantic should be clearly defined to prevent from
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      replace all existing users to simply use GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
      so they will be placed properly for memory fragmentation prevention.
      
      I can see reasons we might want some gfp flag to reflect shorterm
      allocations but I propose starting from a clear semantic definition and
      only then add users with proper justification.
      
      This was been brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) its current users.  The follow up discussion has revealed that
      opinions on what might be temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
        Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ee931c4
  12. 02 9月, 2017 1 次提交
    • C
      powerpc/xive: guest exploitation of the XIVE interrupt controller · eac1e731
      Cédric Le Goater 提交于
      This is the framework for using XIVE in a PowerVM guest. The support
      is very similar to the native one in a much simpler form.
      
      Each source is associated with an Event State Buffer (ESB). This is a
      two bit state machine which is used to trigger events. The bits are
      named "P" (pending) and "Q" (queued) and can be controlled by MMIO.
      The Guest OS registers event (or notifications) queues on which the HW
      will post event data for a target to notify.
      
      Instead of OPAL calls, a set of Hypervisors call are used to configure
      the interrupt sources and the event/notification queues of the guest:
      
       - H_INT_GET_SOURCE_INFO
      
         used to obtain the address of the MMIO page of the Event State
         Buffer (PQ bits) entry associated with the source.
      
       - H_INT_SET_SOURCE_CONFIG
      
         assigns a source to a "target".
      
       - H_INT_GET_SOURCE_CONFIG
      
         determines to which "target" and "priority" is assigned to a source
      
       - H_INT_GET_QUEUE_INFO
      
         returns the address of the notification management page associated
         with the specified "target" and "priority".
      
       - H_INT_SET_QUEUE_CONFIG
      
         sets or resets the event queue for a given "target" and "priority".
         It is also used to set the notification config associated with the
         queue, only unconditional notification for the moment.  Reset is
         performed with a queue size of 0 and queueing is disabled in that
         case.
      
       - H_INT_GET_QUEUE_CONFIG
      
         returns the queue settings for a given "target" and "priority".
      
       - H_INT_RESET
      
         resets all of the partition's interrupt exploitation structures to
         their initial state, losing all configuration set via the hcalls
         H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
      
       - H_INT_SYNC
      
         issue a synchronisation on a source to make sure sure all
         notifications have reached their queue.
      
      As for XICS, the XIVE interface for the guest is described in the
      device tree under the "interrupt-controller" node. A couple of new
      properties are specific to XIVE :
      
       - "reg"
      
         contains the base address and size of the thread interrupt
         managnement areas (TIMA), also called rings, for the User level and
         for the Guest OS level. Only the Guest OS level is taken into
         account today.
      
       - "ibm,xive-eq-sizes"
      
         the size of the event queues. One cell per size supported, contains
         log2 of size, in ascending order.
      
       - "ibm,xive-lisn-ranges"
      
         the interrupt numbers ranges assigned to the guest. These are
         allocated using a simple bitmap.
      
      and also :
      
       - "/ibm,plat-res-int-priorities"
      
         contains a list of priorities that the hypervisor has reserved for
         its own use.
      
      Tested with a QEMU XIVE model for pseries and with the Power hypervisor.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      eac1e731
  13. 01 9月, 2017 2 次提交
  14. 31 8月, 2017 13 次提交