1. 25 6月, 2016 4 次提交
    • K
      Revert "mm: make faultaround produce old ptes" · 315d09bf
      Kirill A. Shutemov 提交于
      This reverts commit 5c0a85fa.
      
      The commit causes ~6% regression in unixbench.
      
      Let's revert it for now and consider other solution for reclaim problem
      later.
      
      Link: http://lkml.kernel.org/r/1465893750-44080-2-git-send-email-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: N"Huang, Ying" <ying.huang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      315d09bf
    • A
      mm: mempool: kasan: don't poot mempool objects in quarantine · 9b75a867
      Andrey Ryabinin 提交于
      Currently we may put reserved by mempool elements into quarantine via
      kasan_kfree().  This is totally wrong since quarantine may really free
      these objects.  So when mempool will try to use such element,
      use-after-free will happen.  Or mempool may decide that it no longer
      need that element and double-free it.
      
      So don't put object into quarantine in kasan_kfree(), just poison it.
      Rename kasan_kfree() to kasan_poison_kfree() to respect that.
      
      Also, we shouldn't use kasan_slab_alloc()/kasan_krealloc() in
      kasan_unpoison_element() because those functions may update allocation
      stacktrace.  This would be wrong for the most of the remove_element call
      sites.
      
      (The only call site where we may want to update alloc stacktrace is
       in mempool_alloc(). Kmemleak solves this by calling
       kmemleak_update_trace(), so we could make something like that too.
       But this is out of scope of this patch).
      
      Fixes: 55834c59 ("mm: kasan: initial memory quarantine implementation")
      Link: http://lkml.kernel.org/r/575977C3.1010905@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: NKuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
      Acked-by: NAlexander Potapenko <glider@google.com>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b75a867
    • L
      fix up initial thread stack pointer vs thread_info confusion · 7f1a00b6
      Linus Torvalds 提交于
      The INIT_TASK() initializer was similarly confused about the stack vs
      thread_info allocation that the allocators had, and that were fixed in
      commit b235beea ("Clarify naming of thread info/stack allocators").
      
      The task ->stack pointer only incidentally ends up having the same value
      as the thread_info, and in fact that will change.
      
      So fix the initial task struct initializer to point to 'init_stack'
      instead of 'init_thread_info', and make sure the ia64 definition for
      that exists.
      
      This actually makes the ia64 tsk->stack pointer be sensible for the
      initial task, but not for any other task.  As mentioned in commit
      b235beea, that whole pointer isn't actually used on ia64, since
      task_stack_page() there just points to the (single) allocation.
      
      All the other architectures seem to have copied the 'init_stack'
      definition, even if it tended to be generally unusued.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f1a00b6
    • L
      Clarify naming of thread info/stack allocators · b235beea
      Linus Torvalds 提交于
      We've had the thread info allocated together with the thread stack for
      most architectures for a long time (since the thread_info was split off
      from the task struct), but that is about to change.
      
      But the patches that move the thread info to be off-stack (and a part of
      the task struct instead) made it clear how confused the allocator and
      freeing functions are.
      
      Because the common case was that we share an allocation with the thread
      stack and the thread_info, the two pointers were identical.  That
      identity then meant that we would have things like
      
      	ti = alloc_thread_info_node(tsk, node);
      	...
      	tsk->stack = ti;
      
      which certainly _worked_ (since stack and thread_info have the same
      value), but is rather confusing: why are we assigning a thread_info to
      the stack? And if we move the thread_info away, the "confusing" code
      just gets to be entirely bogus.
      
      So remove all this confusion, and make it clear that we are doing the
      stack allocation by renaming and clarifying the function names to be
      about the stack.  The fact that the thread_info then shares the
      allocation is an implementation detail, and not really about the
      allocation itself.
      
      This is a pure renaming and type fix: we pass in the same pointer, it's
      just that we clarify what the pointer means.
      
      The ia64 code that actually only has one single allocation (for all of
      task_struct, thread_info and kernel thread stack) now looks a bit odd,
      but since "tsk->stack" is actually not even used there, that oddity
      doesn't matter.  It would be a separate thing to clean that up, I
      intentionally left the ia64 changes as a pure brute-force renaming and
      type change.
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b235beea
  2. 24 6月, 2016 1 次提交
    • P
      locking/static_key: Fix concurrent static_key_slow_inc() · 4c5ea0a9
      Paolo Bonzini 提交于
      The following scenario is possible:
      
          CPU 1                                   CPU 2
          static_key_slow_inc()
           atomic_inc_not_zero()
            -> key.enabled == 0, no increment
           jump_label_lock()
           atomic_inc_return()
            -> key.enabled == 1 now
                                                  static_key_slow_inc()
                                                   atomic_inc_not_zero()
                                                    -> key.enabled == 1, inc to 2
                                                   return
                                                  ** static key is wrong!
           jump_label_update()
           jump_label_unlock()
      
      Testing the static key at the point marked by (**) will follow the
      wrong path for jumps that have not been patched yet.  This can
      actually happen when creating many KVM virtual machines with userspace
      LAPIC emulation; just run several copies of the following program:
      
          #include <fcntl.h>
          #include <unistd.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          int main(void)
          {
              for (;;) {
                  int kvmfd = open("/dev/kvm", O_RDONLY);
                  int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
                  close(ioctl(vmfd, KVM_CREATE_VCPU, 1));
                  close(vmfd);
                  close(kvmfd);
              }
              return 0;
          }
      
      Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call.
      The static key's purpose is to skip NULL pointer checks and indeed one
      of the processes eventually dereferences NULL.
      
      As explained in the commit that introduced the bug:
      
        706249c2 ("locking/static_keys: Rework update logic")
      
      jump_label_update() needs key.enabled to be true.  The solution adopted
      here is to temporarily make key.enabled == -1, and use go down the
      slow path when key.enabled <= 0.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # v4.3+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 706249c2 ("locking/static_keys: Rework update logic")
      Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com
      [ Small stylistic edits to the changelog and the code. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4c5ea0a9
  3. 23 6月, 2016 1 次提交
    • E
      IB/mlx5: Fix post send fence logic · c9b25495
      Eli Cohen 提交于
      If the caller specified IB_SEND_FENCE in the send flags of the work
      request and no previous work request stated that the successive one
      should be fenced, the work request would be executed without a fence.
      This could result in RDMA read or atomic operations failure due to a MR
      being invalidated. Fix this by adding the mlx5 enumeration for fencing
      RDMA/atomic operations and fix the logic to apply this.
      
      Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters')
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      c9b25495
  4. 18 6月, 2016 2 次提交
  5. 15 6月, 2016 2 次提交
  6. 10 6月, 2016 7 次提交
  7. 08 6月, 2016 3 次提交
    • R
      drivers: of: Fix of_pci.h header guard · 5c1d3310
      Robin Murphy 提交于
      The compilation of of_pci.c is governed by CONFIG_OF_PCI, but the
      corresponding declarations in of_pci.h are inconsistently guarded by
      CONFIG_OF, with the result that if CONFIG_PCI is disabled for an OF
      platform, the dangling external declarations are still active and the
      inline stub definitions not. So far this has managed to go unnoticed
      since it happens that the only references to these functions are from
      code which itself depends on CONFIG_PCI or CONFIG_OF_PCI.
      
      Fix this with the appropriate config guard so that any new callers
      outside PCI-specific code don't start unexpectedly breaking under
      certain configs.
      Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NRob Herring <robh@kernel.org>
      5c1d3310
    • T
      leds: core: Fix brightness setting upon hardware blinking enabled · 7cfe749f
      Tony Makkiel 提交于
      Commit 76931edd ("leds: fix brightness changing when software blinking
      is active") changed the semantics of led_set_brightness() which according
      to the documentation should disable blinking upon any brightness setting.
      Moreover it made it different for soft blink case, where it was possible
      to change blink brightness, and for hardware blink case, where setting
      any brightness greater than 0 was ignored.
      
      While the change itself is against the documentation claims, it was driven
      also by the fact that timer trigger remained active after turning blinking
      off. Fixing that would have required major refactoring in the led-core,
      led-class, and led-triggers because of cyclic dependencies.
      
      Finally, it has been decided that allowing for brightness change during
      blinking is beneficial as it can be accomplished without disturbing
      blink rhythm.
      
      The change in brightness setting semantics will not affect existing
      LED class drivers that implement blink_set op thanks to the LED_BLINK_SW
      flag introduced by this patch. The flag state will be from now on checked
      in led_set_brightness() which will allow to distinguish between software
      and hardware blink mode. In the latter case the control will be passed
      directly to the drivers which apply their semantics on brightness set,
      which is disable the blinking in case of most such drivers. New drivers
      will apply new semantics and just change the brightness while hardware
      blinking is on, if possible.
      
      The issue was smuggled by subsequent LED core improvements, which modified
      the code that originally introduced the problem.
      
      Fixes: f1e80c07 ("leds: core: Add two new LED_BLINK_ flags")
      Signed-off-by: NTony Makkiel <tony.makkiel@daqri.com>
      Signed-off-by: NJacek Anaszewski <j.anaszewski@samsung.com>
      7cfe749f
    • M
      coredump: fix dumping through pipes · 1607f09c
      Mateusz Guzik 提交于
      The offset in the core file used to be tracked with ->written field of
      the coredump_params structure. The field was retired in favour of
      file->f_pos.
      
      However, ->f_pos is not maintained for pipes which leads to breakage.
      
      Restore explicit tracking of the offset in coredump_params. Introduce
      ->pos field for this purpose since ->written was already reused.
      
      Fixes: a0083939 ("get rid of coredump_params->written").
      Reported-by: NZbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
      Signed-off-by: NMateusz Guzik <mguzik@redhat.com>
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1607f09c
  8. 07 6月, 2016 1 次提交
  9. 06 6月, 2016 1 次提交
    • E
      devpts: Make each mount of devpts an independent filesystem. · eedf265a
      Eric W. Biederman 提交于
      The /dev/ptmx device node is changed to lookup the directory entry "pts"
      in the same directory as the /dev/ptmx device node was opened in.  If
      there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx
      uses that filesystem.  Otherwise the open of /dev/ptmx fails.
      
      The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that
      userspace can now safely depend on each mount of devpts creating a new
      instance of the filesystem.
      
      Each mount of devpts is now a separate and equal filesystem.
      
      Reserved ttys are now available to all instances of devpts where the
      mounter is in the initial mount namespace.
      
      A new vfs helper path_pts is introduced that finds a directory entry
      named "pts" in the directory of the passed in path, and changes the
      passed in path to point to it.  The helper path_pts uses a function
      path_parent_directory that was factored out of follow_dotdot.
      
      In the implementation of devpts:
       - devpts_mnt is killed as it is no longer meaningful if all mounts of
         devpts are equal.
       - pts_sb_from_inode is replaced by just inode->i_sb as all cached
         inodes in the tty layer are now from the devpts filesystem.
       - devpts_add_ref is rolled into the new function devpts_ptmx.  And the
         unnecessary inode hold is removed.
       - devpts_del_ref is renamed devpts_release and reduced to just a
         deacrivate_super.
       - The newinstance mount option continues to be accepted but is now
         ignored.
      
      In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as
      they are never used.
      
      Documentation/filesystems/devices.txt is updated to describe the current
      situation.
      
      This has been verified to work properly on openwrt-15.05, centos5,
      centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3,
      ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1,
      slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01.  With the
      caveat that on centos6 and on slackware-14.1 that there wind up being
      two instances of the devpts filesystem mounted on /dev/pts, the lower
      copy does not end up getting used.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Aurelien Jarno <aurelien@aurel32.net>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Jann Horn <jann@thejh.net>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eedf265a
  10. 04 6月, 2016 1 次提交
  11. 03 6月, 2016 6 次提交
    • K
      of: add missing const for of_parse_phandle_with_args() in !CONFIG_OF · e93aeeae
      Kuninori Morimoto 提交于
      commit 93c667ca
      ("of: *node argument to of_parse_phandle_with_args should be const")
      changed to const for struct device node *np,
      but it cares CONFIG_OF case only, !CONFIG_OF case need it too.
      Signed-off-by: NKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: NRob Herring <robh@kernel.org>
      e93aeeae
    • V
      efi: Fix for_each_efi_memory_desc_in_map() for empty memmaps · 55f1ea15
      Vitaly Kuznetsov 提交于
      Commit:
      
        78ce248f ("efi: Iterate over efi.memmap in for_each_efi_memory_desc()")
      
      introduced a regression for systems booted with the 'noefi' kernel option.
      
      In particular, I observed an early kernel hang in efi_find_mirror()'s
      for_each_efi_memory_desc() call. As we don't have efi memmap on this
      system we enter this iterator with the following parameters:
      
        efi.memmap.map = 0, efi.memmap.map_end = 0, efi.memmap.desc_size = 28
      
      ... then for_each_efi_memory_desc_in_map() does the following comparison:
      
        (md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size);
      
      ... where md = 0, (m)->map_end = 0 and (m)->desc_size = 28 but when we subtract
      something from a NULL pointer wrap around happens and we end up returning
      invalid pointer and crash.
      
      Fix it by using the correct pointer arithmetics.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 78ce248f ("efi: Iterate over efi.memmap in for_each_efi_memory_desc()")
      Link: http://lkml.kernel.org/r/1464690224-4503-2-git-send-email-matt@codeblueprint.co.uk
      [ Made the changelog more readable. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      55f1ea15
    • P
      locking/seqcount: Re-fix raw_read_seqcount_latch() · 55eed755
      Peter Zijlstra 提交于
      Commit 50755bc1 ("seqlock: fix raw_read_seqcount_latch()") broke
      raw_read_seqcount_latch().
      
      If you look at the comment that was modified; the thing that changes is
      the seq count, not the latch pointer.
      
       * void latch_modify(struct latch_struct *latch, ...)
       * {
       *	smp_wmb();	<- Ensure that the last data[1] update is visible
       *	latch->seq++;
       *	smp_wmb();	<- Ensure that the seqcount update is visible
       *
       *	modify(latch->data[0], ...);
       *
       *	smp_wmb();	<- Ensure that the data[0] update is visible
       *	latch->seq++;
       *	smp_wmb();	<- Ensure that the seqcount update is visible
       *
       *	modify(latch->data[1], ...);
       * }
       *
       * The query will have a form like:
       *
       * struct entry *latch_query(struct latch_struct *latch, ...)
       * {
       *	struct entry *entry;
       *	unsigned seq, idx;
       *
       *	do {
       *		seq = lockless_dereference(latch->seq);
      
      So here we have:
      
      		seq = READ_ONCE(latch->seq);
      		smp_read_barrier_depends();
      
      Which is exactly what we want; the new code:
      
      		seq = ({ p = READ_ONCE(latch);
      			 smp_read_barrier_depends(); p })->seq;
      
      is just wrong; because it looses the volatile read on seq, which can now
      be torn or worse 'optimized'. And the read_depend barrier is also placed
      wrong, we want it after the load of seq, to match the above data[]
      up-to-date wmb()s.
      
      Such that when we dereference latch->data[] below, we're guaranteed to
      observe the right data.
      
       *
       *		idx = seq & 0x01;
       *		entry = data_query(latch->data[idx], ...);
       *
       *		smp_rmb();
       *	} while (seq != latch->seq);
       *
       *	return entry;
       * }
      
      So yes, not passing a pointer is not pretty, but the code was correct,
      and isn't anymore now.
      
      Change to explicit READ_ONCE()+smp_read_barrier_depends() to avoid
      confusion and allow strict lockless_dereference() checking.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 50755bc1 ("seqlock: fix raw_read_seqcount_latch()")
      Link: http://lkml.kernel.org/r/20160527111117.GL3192@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      55eed755
    • C
      cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE · 9bd616e3
      Catalin Marinas 提交于
      The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is
      enabled. Commit c8cc7d4d ("sched/idle: Reorganize the idle loop")
      removed the #ifdef CONFIG_CPU_IDLE around cpuidle_idle_call() with the
      compiler optimising away __this_cpu_read(cpuidle_devices). However, with
      CONFIG_UBSAN && !CONFIG_CPU_IDLE, this optimisation no longer happens
      and the kernel fails to link since cpuidle_devices is not defined.
      
      This patch introduces an accessor function for the current CPU cpuidle
      device (returning NULL when !CONFIG_CPU_IDLE) and uses it in
      cpuidle_idle_call().
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9bd616e3
    • A
      irqchip/gic-v3: Fix copy+paste mistakes in defines · fab0cdc3
      Andrew Jones 提交于
      ICC_SGI1R_AFFINITY_{2,3}_MASK are unused, which is good
      because they were defined with the wrong shifts.
      Signed-off-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      fab0cdc3
    • M
      irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask · dd5f1b04
      Marc Zyngier 提交于
      The INTID mask is wrong, and is made a signed value, which has
      nteresting effects in the KVM emulation. Let's sanitize it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      dd5f1b04
  12. 02 6月, 2016 3 次提交
  13. 01 6月, 2016 6 次提交
  14. 31 5月, 2016 2 次提交