1. 25 6月, 2016 2 次提交
    • M
      x86: get rid of superfluous __GFP_REPEAT · a3a9a59d
      Michal Hocko 提交于
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.
      
      PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this
      flag is for more than order-0.  This means that this flag has never been
      actually useful here because it has always been used only for
      PAGE_ALLOC_COSTLY requests.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-3-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a3a9a59d
    • M
      tree wide: get rid of __GFP_REPEAT for order-0 allocations part I · 32d6bd90
      Michal Hocko 提交于
      This is the third version of the patchset previously sent [1].  I have
      basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
      rid of superfluous gfp flags" which went through dm tree.  I am sending
      it now because it is tree wide and chances for conflicts are reduced
      considerably when we want to target rc2.  I plan to send the next step
      and rename the flag and move to a better semantic later during this
      release cycle so we will have a new semantic ready for 4.8 merge window
      hopefully.
      
      Motivation:
      
      While working on something unrelated I've checked the current usage of
      __GFP_REPEAT in the tree.  It seems that a majority of the usage is and
      always has been bogus because __GFP_REPEAT has always been about costly
      high order allocations while we are using it for order-0 or very small
      orders very often.  It seems that a big pile of them is just a
      copy&paste when a code has been adopted from one arch to another.
      
      I think it makes some sense to get rid of them because they are just
      making the semantic more unclear.  Please note that GFP_REPEAT is
      documented as
      
      * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
      
      * _might_ fail.  This depends upon the particular VM implementation.
        while !costly requests have basically nofail semantic.  So one could
        reasonably expect that order-0 request with __GFP_REPEAT will not loop
        for ever.  This is not implemented right now though.
      
      I would like to move on with __GFP_REPEAT and define a better semantic
      for it.
      
        $ git grep __GFP_REPEAT origin/master | wc -l
        111
        $ git grep __GFP_REPEAT | wc -l
        36
      
      So we are down to the third after this patch series.  The remaining
      places really seem to be relying on __GFP_REPEAT due to large allocation
      requests.  This still needs some double checking which I will do later
      after all the simple ones are sorted out.
      
      I am touching a lot of arch specific code here and I hope I got it right
      but as a matter of fact I even didn't compile test for some archs as I
      do not have cross compiler for them.  Patches should be quite trivial to
      review for stupid compile mistakes though.  The tricky parts are usually
      hidden by macro definitions and thats where I would appreciate help from
      arch maintainers.
      
      [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
      
      This patch (of 19):
      
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.  Yet we
      have the full kernel tree with its usage for apparently order-0
      allocations.  This is really confusing because __GFP_REPEAT is
      explicitly documented to allow allocation failures which is a weaker
      semantic than the current order-0 has (basically nofail).
      
      Let's simply drop __GFP_REPEAT from those places.  This would allow to
      identify place which really need allocator to retry harder and formulate
      a more specific semantic for what the flag is supposed to do actually.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32d6bd90
  2. 24 6月, 2016 1 次提交
  3. 19 6月, 2016 1 次提交
    • L
      ARM: dts: STi: stih407-family: Disable reserved-memory co-processor nodes · 0e289e53
      Lee Jones 提交于
      This patch fixes a non-booting issue in Mainline.
      
      When booting with a compressed kernel, we need to be careful how we
      populate memory close to DDR start.  AUTO_ZRELADDR is enabled by default
      in multi-arch enabled configurations, which place some restrictions on
      where the kernel is placed and where it will be uncompressed to on boot.
      
      AUTO_ZRELADDR takes the decompressor code's start address and masks out
      the bottom 28 bits to obtain an address to uncompress the kernel to
      (thus a load address of 0x42000000 means that the kernel will be
      uncompressed to 0x40000000 i.e. DDR START on this platform).
      
      Even changing the load address to after the co-processor's shared memory
      won't render a booting platform, since the AUTO_ZRELADDR algorithm still
      ensures the kernel is uncompressed into memory shared with the first
      co-processor (0x40000000).
      
      Another option would be to move loading to 0x4A000000, since this will
      mean the decompressor will decompress the kernel to 0x48000000. However,
      this would mean a large chunk (0x44000000 => 0x48000000 (64MB)) of
      memory would essentially be wasted for no good reason.
      
      Until we can work with ST to find a suitable memory location to
      relocate co-processor shared memory, let's disable the shared memory
      nodes.  This will ensure a working platform in the mean time.
      
      NB: The more observant of you will notice that we're leaving the DMU
      shared memory node enabled; this is because a) it is the only one in
      active use at the time of this writing and b) it is not affected by
      the current default behaviour which is causing issues.
      
      Fixes: fe135c63 (ARM: dts: STiH407: Move over to using the 'reserved-memory' API for obtaining DMA memory)
      Signed-off-by: NLee Jones <lee.jones@linaro.org>
      Reviewed-by Peter Griffin <peter.griffin@linaro.org>
      Signed-off-by: NMaxime Coquelin <maxime.coquelin@st.com>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      0e289e53
  4. 18 6月, 2016 1 次提交
    • W
      isa: Allow ISA-style drivers on modern systems · 3a495511
      William Breathitt Gray 提交于
      Several modern devices, such as PC/104 cards, are expected to run on
      modern systems via an ISA bus interface. Since ISA is a legacy interface
      for most modern architectures, ISA support should remain disabled in
      general. Support for ISA-style drivers should be enabled on a per driver
      basis.
      
      To allow ISA-style drivers on modern systems, this patch introduces the
      ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now
      build conditionally on the ISA_BUS_API Kconfig option, which defaults to
      the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the
      ISA_BUS_API Kconfig option to be selected on architectures which do not
      enable ISA (e.g. X86_64).
      
      The ISA_BUS Kconfig option is currently only implemented for X86
      architectures. Other architectures may have their own ISA_BUS Kconfig
      options added as required.
      Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NWilliam Breathitt Gray <vilhelm.gray@gmail.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a495511
  5. 17 6月, 2016 4 次提交
  6. 16 6月, 2016 4 次提交
  7. 15 6月, 2016 4 次提交
    • W
      arm64: spinlock: Ensure forward-progress in spin_unlock_wait · c56bdcac
      Will Deacon 提交于
      Rather than wait until we observe the lock being free (which might never
      happen), we can also return from spin_unlock_wait if we observe that the
      lock is now held by somebody else, which implies that it was unlocked
      but we just missed seeing it in that state.
      
      Furthermore, in such a scenario there is no longer a need to write back
      the value that we loaded, since we know that there has been a lock
      hand-off, which is sufficient to publish any stores prior to the
      unlock_wait because the ARm architecture ensures that a Store-Release
      instruction is multi-copy atomic when observed by a Load-Acquire
      instruction.
      
      The litmus test is something like:
      
      AArch64
      {
      0:X1=x; 0:X3=y;
      1:X1=y;
      2:X1=y; 2:X3=x;
      }
       P0          | P1           | P2           ;
       MOV W0,#1   | MOV W0,#1    | LDAR W0,[X1] ;
       STR W0,[X1] | STLR W0,[X1] | LDR W2,[X3]  ;
       DMB SY      |              |              ;
       LDR W2,[X3] |              |              ;
      exists
      (0:X2=0 /\ 2:X0=1 /\ 2:X2=0)
      
      where P0 is doing spin_unlock_wait, P1 is doing spin_unlock and P2 is
      doing spin_lock.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      c56bdcac
    • W
      arm64: spinlock: fix spin_unlock_wait for LSE atomics · 3a5facd0
      Will Deacon 提交于
      Commit d86b8da0 ("arm64: spinlock: serialise spin_unlock_wait against
      concurrent lockers") fixed spin_unlock_wait for LL/SC-based atomics under
      the premise that the LSE atomics (in particular, the LDADDA instruction)
      are indivisible.
      
      Unfortunately, these instructions are only indivisible when used with the
      -AL (full ordering) suffix and, consequently, the same issue can
      theoretically be observed with LSE atomics, where a later (in program
      order) load can be speculated before the write portion of the atomic
      operation.
      
      This patch fixes the issue by performing a CAS of the lock once we've
      established that it's unlocked, in much the same way as the LL/SC code.
      
      Fixes: d86b8da0 ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      3a5facd0
    • W
      arm64: spinlock: order spin_{is_locked,unlock_wait} against local locks · 38b850a7
      Will Deacon 提交于
      spin_is_locked has grown two very different use-cases:
      
      (1) [The sane case] API functions may require a certain lock to be held
          by the caller and can therefore use spin_is_locked as part of an
          assert statement in order to verify that the lock is indeed held.
          For example, usage of assert_spin_locked.
      
      (2) [The insane case] There are two locks, where a CPU takes one of the
          locks and then checks whether or not the other one is held before
          accessing some shared state. For example, the "optimized locking" in
          ipc/sem.c.
      
      In the latter case, the sequence looks like:
      
        spin_lock(&sem->lock);
        if (!spin_is_locked(&sma->sem_perm.lock))
          /* Access shared state */
      
      and requires that the spin_is_locked check is ordered after taking the
      sem->lock. Unfortunately, since our spinlocks are implemented using a
      LDAXR/STXR sequence, the read of &sma->sem_perm.lock can be speculated
      before the STXR and consequently return a stale value.
      
      Whilst this hasn't been seen to cause issues in practice, PowerPC fixed
      the same issue in 51d7d520 ("powerpc: Add smp_mb() to
      arch_spin_is_locked()") and, although we did something similar for
      spin_unlock_wait in d86b8da0 ("arm64: spinlock: serialise
      spin_unlock_wait against concurrent lockers") that doesn't actually take
      care of ordering against local acquisition of a different lock.
      
      This patch adds an smp_mb() to the start of our arch_spin_is_locked and
      arch_spin_unlock_wait routines to ensure that the lock value is always
      loaded after any other locks have been taken by the current CPU.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      38b850a7
    • P
      arm: Use _rcuidle for smp_cross_call() tracepoints · 7c64cc05
      Paul E. McKenney 提交于
      Further testing with false negatives suppressed by commit 293e2421
      ("rcu: Remove superfluous versions of rcu_read_lock_sched_held()")
      identified another unprotected use of RCU from the idle loop.  Because RCU
      actively ignores idle-loop code (for energy-efficiency reasons, among
      other things), using RCU from the idle loop can result in too-short
      grace periods, in turn resulting in arbitrary misbehavior.
      
      The resulting lockdep-RCU splat is as follows:
      
      ------------------------------------------------------------------------
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.6.0-rc5-next-20160426+ #1112 Not tainted
      -------------------------------
      include/trace/events/ipi.h:35 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      RCU used illegally from idle CPU!
      rcu_scheduler_active = 1, debug_locks = 0
      RCU used illegally from extended quiescent state!
      no locks held by swapper/0/0.
      
      stack backtrace:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc5-next-20160426+ #1112
      Hardware name: Generic OMAP4 (Flattened Device Tree)
      [<c0110308>] (unwind_backtrace) from [<c010c3a8>] (show_stack+0x10/0x14)
      [<c010c3a8>] (show_stack) from [<c047fec8>] (dump_stack+0xb0/0xe4)
      [<c047fec8>] (dump_stack) from [<c010dcfc>] (smp_cross_call+0xbc/0x188)
      [<c010dcfc>] (smp_cross_call) from [<c01c9e28>] (generic_exec_single+0x9c/0x15c)
      [<c01c9e28>] (generic_exec_single) from [<c01ca0a0>] (smp_call_function_single_async+0 x38/0x9c)
      [<c01ca0a0>] (smp_call_function_single_async) from [<c0603728>] (cpuidle_coupled_poke_others+0x8c/0xa8)
      [<c0603728>] (cpuidle_coupled_poke_others) from [<c0603c10>] (cpuidle_enter_state_coupled+0x26c/0x390)
      [<c0603c10>] (cpuidle_enter_state_coupled) from [<c0183c74>] (cpu_startup_entry+0x198/0x3a0)
      [<c0183c74>] (cpu_startup_entry) from [<c0b00c0c>] (start_kernel+0x354/0x3c8)
      [<c0b00c0c>] (start_kernel) from [<8000807c>] (0x8000807c)
      
      ------------------------------------------------------------------------
      Reported-by: NTony Lindgren <tony@atomide.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NTony Lindgren <tony@atomide.com>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <linux-omap@vger.kernel.org>
      Cc: <linux-arm-kernel@lists.infradead.org>
      7c64cc05
  8. 14 6月, 2016 6 次提交
    • M
      arm64: mm: mark fault_info table const · bbb1681e
      Mark Rutland 提交于
      Unlike the debug_fault_info table, we never intentionally alter the
      fault_info table at runtime, and all derived pointers are treated as
      const currently.
      
      Make the table const so that it can be placed in .rodata and protected
      from unintentional writes, as we do for the syscall tables.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      bbb1681e
    • M
      arm64: fix dump_instr when PAN and UAO are in use · c5cea06b
      Mark Rutland 提交于
      If the kernel is set to show unhandled signals, and a user task does not
      handle a SIGILL as a result of an instruction abort, we will attempt to
      log the offending instruction with dump_instr before killing the task.
      
      We use dump_instr to log the encoding of the offending userspace
      instruction. However, dump_instr is also used to dump instructions from
      kernel space, and internally always switches to KERNEL_DS before dumping
      the instruction with get_user. When both PAN and UAO are in use, reading
      a user instruction via get_user while in KERNEL_DS will result in a
      permission fault, which leads to an Oops.
      
      As we have regs corresponding to the context of the original instruction
      abort, we can inspect this and only flip to KERNEL_DS if the original
      abort was taken from the kernel, avoiding this issue. At the same time,
      remove the redundant (and incorrect) comments regarding the order
      dump_mem and dump_instr are called in.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: <stable@vger.kernel.org> #4.6+
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reported-by: NVladimir Murzin <vladimir.murzin@arm.com>
      Tested-by: NVladimir Murzin <vladimir.murzin@arm.com>
      Fixes: 57f4959b ("arm64: kernel: Add support for User Access Override")
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      c5cea06b
    • J
      MIPS: KVM: Fix CACHE triggered exception emulation · 6df82a7b
      James Hogan 提交于
      When emulating TLB miss / invalid exceptions during CACHE instruction
      emulation, be sure to set up the correct PC and host_cp0_badvaddr state
      for the kvm_mips_emlulate_tlb*_ld() function to pick up for guest EPC
      and BadVAddr.
      
      PC needs to be rewound otherwise the guest EPC will end up pointing at
      the next instruction after the faulting CACHE instruction.
      
      host_cp0_badvaddr must be set because guest CACHE instructions trap with
      a Coprocessor Unusable exception, which doesn't update the host BadVAddr
      as a TLB exception would.
      
      This doesn't tend to get hit when dynamic translation of emulated
      instructions is enabled, since only the first execution of each CACHE
      instruction actually goes through this code path, with subsequent
      executions hitting the SYNCI instruction that it gets replaced with.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6df82a7b
    • J
      MIPS: KVM: Don't unwind PC when emulating CACHE · cc81e948
      James Hogan 提交于
      When a CACHE instruction is emulated by kvm_mips_emulate_cache(), the PC
      is first updated to point to the next instruction, and afterwards it
      falls through the "dont_update_pc" label, which rewinds the PC back to
      its original address.
      
      This works when dynamic translation of emulated instructions is enabled,
      since the CACHE instruction is replaced with a SYNCI which works without
      trapping, however when dynamic translation is disabled the guest hangs
      on CACHE instructions as they always trap and are never stepped over.
      
      Roughly swap the meanings of the "done" and "dont_update_pc" to match
      kvm_mips_emulate_CP0(), so that "done" will roll back the PC on failure,
      and "dont_update_pc" won't change PC at all (for the sake of exceptions
      that have already modified the PC).
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cc81e948
    • J
      MIPS: KVM: Include bit 31 in segment matches · 7f5a1ddc
      James Hogan 提交于
      When faulting guest addresses are matched against guest segments with
      the KVM_GUEST_KSEGX() macro, change the mask to 0xe0000000 so as to
      include bit 31.
      
      This is mainly for safety's sake, as it prevents a rogue BadVAddr in the
      host kseg2/kseg3 segments (e.g. 0xC*******) after a TLB exception from
      matching the guest kseg0 segment (e.g. 0x4*******), triggering an
      internal KVM error instead of allowing the corresponding guest kseg0
      page to be mapped into the host vmalloc space.
      
      Such a rogue BadVAddr was observed to happen with the host MIPS kernel
      running under QEMU with KVM built as a module, due to a not entirely
      transparent optimisation in the QEMU TLB handling. This has already been
      worked around properly in a previous commit.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7f5a1ddc
    • J
      MIPS: KVM: Fix modular KVM under QEMU · 797179bc
      James Hogan 提交于
      Copy __kvm_mips_vcpu_run() into unmapped memory, so that we can never
      get a TLB refill exception in it when KVM is built as a module.
      
      This was observed to happen with the host MIPS kernel running under
      QEMU, due to a not entirely transparent optimisation in the QEMU TLB
      handling where TLB entries replaced with TLBWR are copied to a separate
      part of the TLB array. Code in those pages continue to be executable,
      but those mappings persist only until the next ASID switch, even if they
      are marked global.
      
      An ASID switch happens in __kvm_mips_vcpu_run() at exception level after
      switching to the guest exception base. Subsequent TLB mapped kernel
      instructions just prior to switching to the guest trigger a TLB refill
      exception, which enters the guest exception handlers without updating
      EPC. This appears as a guest triggered TLB refill on a host kernel
      mapped (host KSeg2) address, which is not handled correctly as user
      (guest) mode accesses to kernel (host) segments always generate address
      error exceptions.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.10.x-
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      797179bc
  9. 13 6月, 2016 4 次提交
  10. 10 6月, 2016 7 次提交
  11. 09 6月, 2016 5 次提交
  12. 08 6月, 2016 1 次提交