1. 07 7月, 2016 16 次提交
    • A
      timers: Move __run_timers() function · 73420fea
      Anna-Maria Gleixner 提交于
      Move __run_timers() below __next_timer_interrupt() and next_pending_bucket()
      in preparation for __run_timers() NOHZ optimization.
      
      No functional change.
      Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.271872665@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      73420fea
    • T
      timers: Remove set_timer_slack() leftovers · 53bf837b
      Thomas Gleixner 提交于
      We now have implicit batching in the timer wheel. The slack API is no longer
      used, so remove it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrew F. Davis <afd@ti.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jaehoon Chung <jh80.chung@samsung.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Nyman <mathias.nyman@intel.com>
      Cc: Pali Rohár <pali.rohar@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: linux-usb@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      53bf837b
    • T
      timers: Switch to a non-cascading wheel · 500462a9
      Thomas Gleixner 提交于
      The current timer wheel has some drawbacks:
      
      1) Cascading:
      
         Cascading can be an unbound operation and is completely pointless in most
         cases because the vast majority of the timer wheel timers are canceled or
         rearmed before expiration. (They are used as timeout safeguards, not as
         real timers to measure time.)
      
      2) No fast lookup of the next expiring timer:
      
         In NOHZ scenarios the first timer soft interrupt after a long NOHZ period
         must fast forward the base time to the current value of jiffies. As we
         have no way to find the next expiring timer fast, the code loops linearly
         and increments the base time one by one and checks for expired timers
         in each step. This causes unbound overhead spikes exactly in the moment
         when we should wake up as fast as possible.
      
      After a thorough analysis of real world data gathered on laptops,
      workstations, webservers and other machines (thanks Chris!) I came to the
      conclusion that the current 'classic' timer wheel implementation can be
      modified to address the above issues.
      
      The vast majority of timer wheel timers is canceled or rearmed before
      expiry. Most of them are timeouts for networking and other I/O tasks. The
      nature of timeouts is to catch the exception from normal operation (TCP ack
      timed out, disk does not respond, etc.). For these kinds of timeouts the
      accuracy of the timeout is not really a concern. Timeouts are very often
      approximate worst-case values and in case the timeout fires, we already
      waited for a long time and performance is down the drain already.
      
      The few timers which actually expire can be split into two categories:
      
       1) Short expiry times which expect halfways accurate expiry
      
       2) Long term expiry times are inaccurate today already due to the
          batching which is done for NOHZ automatically and also via the
          set_timer_slack() API.
      
      So for long term expiry timers we can avoid the cascading property and just
      leave them in the less granular outer wheels until expiry or
      cancelation. Timers which are armed with a timeout larger than the wheel
      capacity are no longer cascaded. We expire them with the longest possible
      timeout (6+ days). We have not observed such timeouts in our data collection,
      but at least we handle them, applying the rule of the least surprise.
      
      To avoid extending the wheel levels for HZ=1000 so we can accomodate the
      longest observed timeouts (5 days in the network conntrack code) we reduce the
      first level granularity on HZ=1000 to 4ms, which effectively is the same as
      the HZ=250 behaviour. From our data analysis there is nothing which relies on
      that 1ms granularity and as a side effect we get better batching and timer
      locality for the networking code as well.
      
      Contrary to the classic wheel the granularity of the next wheel is not the
      capacity of the first wheel. The granularities of the wheels are in the
      currently chosen setting 8 times the granularity of the previous wheel.
      
      So for HZ=250 we end up with the following granularity levels:
      
       Level Offset   Granularity                  Range
           0      0          4 ms                 0 ms -        252 ms
           1     64         32 ms               256 ms -       2044 ms (256ms - ~2s)
           2    128        256 ms              2048 ms -      16380 ms (~2s   - ~16s)
           3    192       2048 ms (~2s)       16384 ms -     131068 ms (~16s  - ~2m)
           4    256      16384 ms (~16s)     131072 ms -    1048572 ms (~2m   - ~17m)
           5    320     131072 ms (~2m)     1048576 ms -    8388604 ms (~17m  - ~2h)
           6    384    1048576 ms (~17m)    8388608 ms -   67108863 ms (~2h   - ~18h)
           7    448    8388608 ms (~2h)    67108864 ms -  536870911 ms (~18h  - ~6d)
      
      That's a worst case inaccuracy of 12.5% for the timers which are queued at the
      beginning of a level.
      
      So the new wheel concept addresses the old issues:
      
      1) Cascading is avoided completely
      
      2) By keeping the timers in the bucket until expiry/cancelation we can track
         the buckets which have timers enqueued in a bucket bitmap and therefore can
         look up the next expiring timer very fast and O(1).
      
      A further benefit of the concept is that the slack calculation which is done
      on every timer start is no longer necessary because the granularity levels
      provide natural batching already.
      
      Our extensive testing with various loads did not show any performance
      degradation vs. the current wheel implementation.
      
      This patch does not address the 'fast lookup' issue as we wanted to make sure
      that there is no regression introduced by the wheel redesign. The
      optimizations are in follow up patches.
      
      This patch contains fixes from Anna-Maria Gleixner and Richard Cochran.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      500462a9
    • T
      timers: Reduce the CPU index space to 256k · b0d6e2dc
      Thomas Gleixner 提交于
      We want to store the array index in the flags space. 256k CPUs should be
      enough for a while.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.030144293@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b0d6e2dc
    • T
      timers: Give a few structs and members proper names · 494af3ed
      Thomas Gleixner 提交于
      Some of the names in the internal implementation of the timer code
      are not longer correct and others are simply too long to type.
      
      Clean it up before we switch the wheel implementation over to
      the new scheme.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.948752516@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      494af3ed
    • T
      hlist: Add hlist_is_singular_node() helper · 15dba1e3
      Thomas Gleixner 提交于
      Required to figure out whether the entry is the only one in the hlist.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.867631372@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      15dba1e3
    • T
      signals: Use hrtimer for sigtimedwait() · 2b1ecc3d
      Thomas Gleixner 提交于
      We've converted most timeout related syscalls to hrtimers, but
      sigtimedwait() did not get this treatment.
      
      Convert it so we get a reasonable accuracy and remove the
      user space exposure to the timer wheel properties.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Cyril Hrubis <chrubis@suse.cz>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.787164909@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2b1ecc3d
    • T
      timers: Remove the deprecated mod_timer_pinned() API · 177ec0a0
      Thomas Gleixner 提交于
      We switched all users to initialize the timers as pinned and call
      mod_timer(). Remove the now unused timer API function.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      177ec0a0
    • T
      timers, net/ipv4/inet: Initialize connection request timers as pinned · f3438bc7
      Thomas Gleixner 提交于
      Pinned timers must carry the pinned attribute in the timer structure
      itself, so convert the code to the new API.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.617891430@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f3438bc7
    • T
      timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned · 853f90d4
      Thomas Gleixner 提交于
      Pinned timers must carry the pinned attribute in the timer structure
      itself, so convert the code to the new API.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.537448301@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      853f90d4
    • T
      timers, drivers/tty/metag_da: Initialize the poll timer as pinned · 01bab536
      Thomas Gleixner 提交于
      Pinned timers must carry the pinned attribute in the timer structure
      itself, so convert the code to the new API.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.456452642@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      01bab536
    • T
      timers, driver/net/ethernet/tile: Initialize the egress timer as pinned · 25df5760
      Thomas Gleixner 提交于
      Pinned timers must carry the pinned attribute in the timer structure
      itself, so convert the code to the new API.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.376394205@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      25df5760
    • T
      timers, cpufreq/powernv: Initialize the gpstate timer as pinned · 7bc54b65
      Thomas Gleixner 提交于
      Pinned timers must carry the pinned attribute in the timer structure
      itself, so convert the code to the new API.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.297014487@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7bc54b65
    • T
      timers, x86/mce: Initialize MCE restart timer as pinned · f9c287ba
      Thomas Gleixner 提交于
      Pinned timers must carry the pinned attribute in the timer structure
      itself, so convert the code to the new API.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.215783439@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f9c287ba
    • T
      timers, x86/apic/uv: Initialize the UV heartbeat timer as pinned · 920a4a70
      Thomas Gleixner 提交于
      Pinned timers must carry the pinned attribute in the timer structure
      itself, so convert the code to the new API.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.133837204@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      920a4a70
    • T
      timers: Make 'pinned' a timer property · e675447b
      Thomas Gleixner 提交于
      We want to move the timer migration logic from a 'push' to a 'pull' model.
      
      Under the current 'push' model pinned timers are handled via
      a runtime API variant: mod_timer_pinned().
      
      The 'pull' model requires us to store the pinned attribute of a timer
      in the timer_list structure itself, as a new TIMER_PINNED bit in
      timer->flags.
      
      This flag must be set at initialization time and the timer APIs
      recognize the flag.
      
      This patch:
      
       - Implements the new flag and associated new-style initialization
         methods
      
       - makes mod_timer() recognize new-style pinned timers,
      
       - and adds some migration helper facility to allow
         step by step conversion of old-style to new-style
         pinned timers.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.049338558@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e675447b
  2. 04 7月, 2016 3 次提交
  3. 03 7月, 2016 7 次提交
    • V
      ovl: warn instead of error if d_type is not supported · e7c0b599
      Vivek Goyal 提交于
      overlay needs underlying fs to support d_type. Recently I put in a
      patch in to detect this condition and started failing mount if
      underlying fs did not support d_type.
      
      But this breaks existing configurations over kernel upgrade. Those who
      are running docker (partially broken configuration) with xfs not
      supporting d_type, are surprised that after kernel upgrade docker does
      not run anymore.
      
      https://github.com/docker/docker/issues/22937#issuecomment-229881315
      
      So instead of erroring out, detect broken configuration and warn
      about it. This should allow existing docker setups to continue
      working after kernel upgrade.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 45aebeaf ("ovl: Ensure upper filesystem supports d_type")
      Cc: <stable@vger.kernel.org> 4.6
      e7c0b599
    • L
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 4f302921
      Linus Torvalds 提交于
      Pull MIPS fix from Ralf Baechle:
       "Only a single fix for 4.7 pending at this point.  It fixes an issue
        that may lead to corruption of the cache mode bits in the page table"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: Fix possible corruption of cache mode by mprotect.
      4f302921
    • L
      Merge tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 70bd68d7
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
      
       - tm: Always reclaim in start_thread() for exec() class syscalls from
         Cyril Bur
      
       - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael
         Neuling
      
       - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan
      
       - Initialise pci_io_base as early as possible from Darren Stevens
      
      * tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc: Initialise pci_io_base as early as possible
        powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
        powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()
        powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
      70bd68d7
    • L
      Merge tag 'drm-fixes-for-v4.7-rc6' of git://people.freedesktop.org/~airlied/linux · 99b0f54e
      Linus Torvalds 提交于
      Pull drm fixes frlm Dave Airlie:
       "Just some AMD and Intel fixes, the AMD ones are further production
        Polaris fixes, and the Intel ones fix some early timeouts, some PCI ID
        changes and a couple of other fixes.
      
        Still a bit Internet challenged here, hopefully end of next week will
        solve it"
      
      * tag 'drm-fixes-for-v4.7-rc6' of git://people.freedesktop.org/~airlied/linux:
        drm/i915: Fix missing unlock on error in i915_ppgtt_info()
        drm/amd/powerplay: workaround for UVD clock issue
        drm/amdgpu: add ACLK_CNTL setting for polaris10
        drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11.
        drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10.
        drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
        drm/i915: Add more Kabylake PCI IDs.
        drm/i915: Avoid early timeout during AUX transfers
        drm/i915/hsw: Avoid early timeout during LCPLL disable/restore
        drm/i915/lpt: Avoid early timeout during FDI PHY reset
        drm/i915/bxt: Avoid early timeout during PLL enable
        drm/i915: Refresh cached DP port register value on resume
        drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation
        drm/amd/powerplay: disable FFC.
        drm/amd/powerplay: add some definition for FFC feature on polaris.
      99b0f54e
    • L
      Merge tag 'spi-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 467ce769
      Linus Torvalds 提交于
      Pull spi fixes from Mark Brown:
       "A few small driver-specific fixes for SPI, all in the normal important
        if you hit them category especially the rockchip driver fix which
        addresses a race which has been exposed more frequently with some
        recent performance improvements"
      
      * tag 'spi-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: sunxi: fix transfer timeout
        spi: sun4i: fix FIFO limit
        spi: rockchip: Signal unfinished DMA transfers
        spi: spi-ti-qspi: Suspend the queue before removing the device
      467ce769
    • L
      Merge tag 'regulator-fix-v4.7-rc5' of... · a2b0db5b
      Linus Torvalds 提交于
      Merge tag 'regulator-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "Two small fixes for the regulator subsystem - one fixing a crash with
        one of the devices supported by the max77620 driver, another fixing
        startup for the anatop regulator when it starts up with the regulator
        in bypass mode"
      
      * tag 'regulator-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: max77620: check for valid regulator info
        regulator: anatop: allow regulator to be in bypass mode
      a2b0db5b
    • L
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 44385120
      Linus Torvalds 提交于
      Pull clk fixes from Stephen Boyd:
       "A small fix for the newly added oxnas clk driver and a handful of
        rockchip clk driver fixes for newly added rk3399 support"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: Fix return value check in oxnas_stdclk_probe()
        clk: rockchip: release io resource when failing to init clk on rk3399
        clk: rockchip: fix cpuclk registration error handling
        clk: rockchip: Revert "clk: rockchip: reset init state before mmc card initialization"
        clk: rockchip: fix incorrect parent for rk3399's {c,g}pll_aclk_perihp_src
        clk: rockchip: mark rk3399 GIC clocks as critical
        clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
      44385120
  4. 02 7月, 2016 14 次提交
    • D
      Merge tag 'drm-intel-fixes-2016-06-30' of git://anongit.freedesktop.org/drm-intel into drm-fixes · 88c08710
      Dave Airlie 提交于
      here's a batch of i915 fixes for 4.7.
      
      * tag 'drm-intel-fixes-2016-06-30' of git://anongit.freedesktop.org/drm-intel:
        drm/i915: Fix missing unlock on error in i915_ppgtt_info()
        drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
        drm/i915: Add more Kabylake PCI IDs.
        drm/i915: Avoid early timeout during AUX transfers
        drm/i915/hsw: Avoid early timeout during LCPLL disable/restore
        drm/i915/lpt: Avoid early timeout during FDI PHY reset
        drm/i915/bxt: Avoid early timeout during PLL enable
        drm/i915: Refresh cached DP port register value on resume
      88c08710
    • D
      Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 40793e85
      Dave Airlie 提交于
      Just a few more late fixes for Polaris cards.
      
      * 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux:
        drm/amd/powerplay: workaround for UVD clock issue
        drm/amdgpu: add ACLK_CNTL setting for polaris10
        drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11.
        drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10.
        drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation
        drm/amd/powerplay: disable FFC.
        drm/amd/powerplay: add some definition for FFC feature on polaris.
      40793e85
    • R
      MIPS: Fix possible corruption of cache mode by mprotect. · 6d037de9
      Ralf Baechle 提交于
      The following testcase may result in a page table entries with a invalid
      CCA field being generated:
      
      static void *bindstack;
      
      static int sysrqfd;
      
      static void protect_low(int protect)
      {
      	mprotect(bindstack, BINDSTACK_SIZE, protect);
      }
      
      static void sigbus_handler(int signal, siginfo_t * info, void *context)
      {
      	void *addr = info->si_addr;
      
      	write(sysrqfd, "x", 1);
      
      	printf("sigbus, fault address %p (should not happen, but might)\n",
      	       addr);
      	abort();
      }
      
      static void run_bind_test(void)
      {
      	unsigned int *p = bindstack;
      
      	p[0] = 0xf001f001;
      
      	write(sysrqfd, "x", 1);
      
      	/* Set trap on access to p[0] */
      	protect_low(PROT_NONE);
      
      	write(sysrqfd, "x", 1);
      
      	/* Clear trap on access to p[0] */
      	protect_low(PROT_READ | PROT_WRITE | PROT_EXEC);
      
      	write(sysrqfd, "x", 1);
      
      	/* Check the contents of p[0] */
      	if (p[0] != 0xf001f001) {
      		write(sysrqfd, "x", 1);
      
      		/* Reached, but shouldn't be */
      		printf("badness, shouldn't happen but does\n");
      		abort();
      	}
      }
      
      int main(void)
      {
      	struct sigaction sa;
      
      	sysrqfd = open("/proc/sysrq-trigger", O_WRONLY);
      
      	if (sigprocmask(SIG_BLOCK, NULL, &sa.sa_mask)) {
      		perror("sigprocmask");
      		return 0;
      	}
      
      	sa.sa_sigaction = sigbus_handler;
      	sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_RESTART;
      	if (sigaction(SIGBUS, &sa, NULL)) {
      		perror("sigaction");
      		return 0;
      	}
      
      	bindstack = mmap(NULL,
      			 BINDSTACK_SIZE,
      			 PROT_READ | PROT_WRITE | PROT_EXEC,
      			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      	if (bindstack == MAP_FAILED) {
      		perror("mmap bindstack");
      		return 0;
      	}
      
      	printf("bindstack: %p\n", bindstack);
      
      	run_bind_test();
      
      	printf("done\n");
      
      	return 0;
      }
      
      There are multiple ingredients for this:
      
       1) PAGE_NONE is defined to _CACHE_CACHABLE_NONCOHERENT, which is CCA 3
          on all platforms except SB1 where it's CCA 5.
       2) _page_cachable_default must have bits set which are not set
          _CACHE_CACHABLE_NONCOHERENT.
       3) Either the defective version of pte_modify for XPA or the standard
          version must be in used.  However pte_modify for the 36 bit address
          space support is no affected.
      
      In that case additional bits in the final CCA mode may generate an invalid
      value for the CCA field.  On the R10000 system where this was tracked
      down for example a CCA 7 has been observed, which is Uncached Accelerated.
      
      Fixed by:
      
       1) Using the proper CCA mode for PAGE_NONE just like for all the other
          PAGE_* pte/pmd bits.
       2) Fix the two affected variants of pte_modify.
      
      Further code inspection also shows the same issue to exist in pmd_modify
      which would affect huge page systems.
      
      Issue in pte_modify tracked down by Alastair Bridgewater, PAGE_NONE
      and pmd_modify issue found by me.
      
      The history of this goes back beyond Linus' git history.  Chris Dearman's
      commit 35133692 ("[MIPS] Allow setting of
      the cache attribute at run time.") missed the opportunity to fix this
      but it was originally introduced in lmo commit
      d523832cf12007b3242e50bb77d0c9e63e0b6518 ("Missing from last commit.")
      and 32cc38229ac7538f2346918a09e75413e8861f87 ("New configuration option
      CONFIG_MIPS_UNCACHED.")
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      Reported-by: NAlastair Bridgewater <alastair.bridgewater@gmail.com>
      6d037de9
    • L
      Merge tag 'acpi-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · dbdc3bb7
      Linus Torvalds 提交于
      Pull ACPI fix from Rafael Wysocki:
       "Fix an expression in the ACPI PCI IRQ management code added by a
        recent commit that overlooked missing parens in it, so the result of
        the computation is incorrect in some cases (Sinan Kaya)"
      
      * tag 'acpi-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI,PCI,IRQ: correct operator precedence
      dbdc3bb7
    • L
      Merge tag 'pm-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 81dbd6f5
      Linus Torvalds 提交于
      Pull power management fixes from Rafael Wysocki:
       "Three cpufreq fixes, one in the core (stable-candidate) and two in
        drivers (intel_pstate and cpufreq-dt).
      
        Specifics:
      
         - Fix a recent intel_pstate regression that caused the number of
           wakeups to increase significantly on an idle system in some cases
           due to excessive synchronize_sched() invocations (Rafael Wysocki).
      
         - Fix unnecessary invocations of WARN_ON() in the cpufreq core after
           cpufreq has been suspended introduced during the 4.6 cycla (Rafael
           Wysocki).
      
         - Fix an error code path in the cpufreq-dt-platdev driver that
           forgets to drop a reference to a DT node (Masahiro Yamada)"
      
      * tag 'pm-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: Avoid false-positive WARN_ON()s in cpufreq_update_policy()
        cpufreq: dt: call of_node_put() before error out
        intel_pstate: Do not clear utilization update hooks on policy changes
      81dbd6f5
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 48c4565e
      Linus Torvalds 提交于
      Pull vfs fixes from Al Viro:
       "Tmpfs readdir throughput regression fix (this cycle) + some -stable
        fodder all over the place.
      
        One missing bit is Miklos' tonight locks.c fix - NFS folks had already
        grabbed that one by the time I woke up ;-)"
      
      [ The locks.c fix came through the nfsd tree just moments ago ]
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        namespace: update event counter when umounting a deleted dentry
        9p: use file_dentry()
        ceph: fix d_obtain_alias() misuses
        lockless next_positive()
        libfs.c: new helper - next_positive()
        dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
      48c4565e
    • L
      Merge tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux · 2728c57f
      Linus Torvalds 提交于
      Pull lockd/locks fixes from Bruce Fields:
       "One fix for lockd soft lookups in an error path, and one fix for file
        leases on overlayfs"
      
      * tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux:
        locks: use file_inode()
        lockd: unregister notifier blocks if the service fails to come up completely
      2728c57f
    • L
      Merge tag 'mfd-fixes-4.7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · 0d064a7b
      Linus Torvalds 提交于
      Pull more MFD fixes from Lee Jones:
       "Apologies for missing these from the first pull request.
      
        Final patches fixing Reset API change"
      
      * tag 'mfd-fixes-4.7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
        usb: dwc3: st: Use explicit reset_control_get_exclusive() API
        phy: phy-stih407-usb: Use explicit reset_control_get_exclusive() API
        phy: miphy28lp: Inform the reset framework that our reset line may be shared
      0d064a7b
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · f3683ccd
      Linus Torvalds 提交于
      Pull libnvdimm fixes from Dan Williams:
       "1/ Two regression fixes since v4.6: one for the byte order of a sysfs
           attribute (bz121161) and another for QEMU 2.6's NVDIMM _DSM (ACPI
           Device Specific Method) implementation that gets tripped up by new
           auto-probing behavior in the NFIT driver.
      
        2/ A fix tagged for -stable that stops the kernel from
           clobbering/ignoring changes to the configuration of a 'pfn'
           instance ("struct page" driver).  For example changing the
           alignment from 2M to 1G may silently revert to 2M if that value is
           currently stored on media.
      
        3/ A fix from Eric for an xfstests failure in dax.  It is not
           currently tagged for -stable since it requires an 8-exabyte file
           system to trigger, and there appear to be no user visible side
           effects"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        nfit: fix format interface code byte order
        dax: fix offset overflow in dax_io
        acpi, nfit: fix acpi_check_dsm() vs zero functions implemented
        libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
      f3683ccd
    • L
      Merge tag 'staging-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 6e5c4f13
      Linus Torvalds 提交于
      Pull staging and IIO fixes from Greg KH:
       "Here are a few small staging and iio driver fixes for 4.7-rc6.
      
        Nothing major here, just a number of small fixes, all have been in
        linux-next for a while, and the full details are in the shortlog"
      
      * tag 'staging-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio:ad7266: Fix probe deferral for vref
        iio:ad7266: Fix support for optional regulators
        iio:ad7266: Fix broken regulator error handling
        iio: accel: kxsd9: fix the usage of spi_w8r8()
        staging: iio: accel: fix error check
        staging: iio: ad5933: fix order of cycle conditions
        staging: iio: fix ad7606_spi regression
        iio: inv_mpu6050: Fix use-after-free in ACPI code
      6e5c4f13
    • L
      Merge tag 'tty-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 756c0aec
      Linus Torvalds 提交于
      Pull tty fixes from Greg KH:
       "Here are two tty fixes for some reported issues.  One resolves a crash
        in devpts, and the other resolves a problem with the fbcon cursor
        blink causing lockups.
      
        Both have been in linux-next with no reported problems"
      
      * tag 'tty-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        devpts: fix null pointer dereference on failed memory allocation
        tty: vt: Fix soft lockup in fbcon cursor blink timer.
      756c0aec
    • L
      Merge tag 'usb-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 0232b23d
      Linus Torvalds 提交于
      Pull USB and PHY fixes from Greg KH:
       "Here are a number of small USB and PHY driver fixes for 4.7-rc6.
      
        Nothing major here, all are described in the shortlog below.  All have
        been in linux-next with no reported issues"
      
      * tag 'usb-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: don't free bandwidth_mutex too early
        USB: EHCI: declare hostpc register as zero-length array
        phy-sun4i-usb: Fix irq free conditions to match request conditions
        phy: bcm-ns-usb2: checking the wrong variable
        phy-sun4i-usb: fix missing __iomem *
        phy: phy-sun4i-usb: Fix optional gpios failing probe
        phy: rockchip-dp: fix return value check in rockchip_dp_phy_probe()
        phy: rcar-gen3-usb2: fix unexpected repeat interrupts of VBUS change
        usb: common: otg-fsm: add license to usb-otg-fsm
      0232b23d
    • L
      Merge tag 'iommu-fixes-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · aa7a6c8e
      Linus Torvalds 提交于
      Pull IOMMU fixes from Joerg Roedel:
       "Three fixes:
      
         - Fix use of smp_processor_id() in preemptible code in the IOVA
           allocation code.  This got introduced with the scalability
           improvements in this release cycle.
      
         - A VT-d fix for out-of-bounds access of the iommu->domains array.
           The bug showed during suspend/resume.
      
         - AMD IOMMU fix to print the correct device id in the ACPI parsing
           code"
      
      * tag 'iommu-fixes-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/amd: Initialize devid variable before using it
        iommu/vt-d: Fix overflow of iommu->domains array
        iommu/iova: Disable preemption around use of this_cpu_ptr()
      aa7a6c8e
    • M
      Merge remote-tracking branches 'regulator/fix/anatop' and... · a29a36f2
      Mark Brown 提交于
      Merge remote-tracking branches 'regulator/fix/anatop' and 'regulator/fix/max77620' into regulator-linus
      a29a36f2