1. 20 3月, 2014 11 次提交
    • S
      s390, cacheinfo: Fix CPU hotplug callback registration · 6575080e
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the cacheinfo code in s390 by using this latter form of callback
      registration.
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6575080e
    • S
      arm, kvm: Fix CPU hotplug callback registration · 8146875d
      Srivatsa S. Bhat 提交于
      On 03/15/2014 12:40 AM, Christoffer Dall wrote:
      > On Fri, Mar 14, 2014 at 11:13:29AM +0530, Srivatsa S. Bhat wrote:
      >> On 03/13/2014 04:51 AM, Christoffer Dall wrote:
      >>> On Tue, Mar 11, 2014 at 02:05:38AM +0530, Srivatsa S. Bhat wrote:
      >>>> Subsystems that want to register CPU hotplug callbacks, as well as perform
      >>>> initialization for the CPUs that are already online, often do it as shown
      >>>> below:
      >>>>
      [...]
      >>> Just so we're clear, the existing code was simply racy as not prone to
      >>> deadlocks, right?
      
      > >>> This makes it clear that the test above for compatible CPUs can be quite
      > >>> easily evaded by using CPU hotplug, but we don't really have a good
      > >>> solution for handling that yet...  Hmmm, grumble grumble, I guess if you
      > >>> hotplug unsupported CPUs on a KVM/ARM system for now, stuff will break.
      
      >>
      >> In this particular case, there was no deadlock possibility, rather the
      >> existing code had insufficient synchronization against CPU hotplug.
      >>
      >> init_hyp_mode() would invoke cpu_init_hyp_mode() on currently online CPUs
      >> using on_each_cpu(). If a CPU came online after this point and before calling
      >> register_cpu_notifier(), that CPU would remain uninitialized because this
      >> subsystem would miss the hot-online event. This patch fixes this bug and
      >> also uses the new synchronization method (instead of get/put_online_cpus())
      >> to ensure that we don't deadlock with CPU hotplug.
      >>
      >
      > Yes, that was my conclusion as well.  Thanks for clarifying.  (It could
      > be noted in the commit message as well if you should feel so inclined).
      >
      
      Please find the patch with updated changelog (and your Ack) below.
      (No changes in code).
      
      From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Subject: [PATCH] arm, kvm: Fix CPU hotplug callback registration
      
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      In the existing arm kvm code, there is no synchronization with CPU hotplug
      to avoid missing the hotplug events that might occur after invoking
      init_hyp_mode() and before calling register_cpu_notifier(). Fix this bug
      and also use the new synchronization method (instead of get/put_online_cpus())
      to ensure that we don't deadlock with CPU hotplug.
      
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      8146875d
    • S
      arm, hw-breakpoint: Fix CPU hotplug callback registration · c5929bd3
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the hw-breakpoint code in arm by using this latter form of callback
      registration.
      
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c5929bd3
    • S
      ia64, err-inject: Fix CPU hotplug callback registration · f2e48a89
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the error injection code in ia64 by using this latter form of callback
      registration.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f2e48a89
    • S
      ia64, topology: Fix CPU hotplug callback registration · f5a7d445
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the topology code in ia64 by using this latter form of callback
      registration.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f5a7d445
    • S
      ia64, palinfo: Fix CPU hotplug callback registration · 9f37bca9
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the palinfo code in ia64 by using this latter form of callback
      registration.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9f37bca9
    • S
      ia64, salinfo: Fix hotplug callback registration · eff722b0
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the salinfo code in ia64 by using this latter form of callback
      registration.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      eff722b0
    • S
      CPU hotplug, perf: Fix CPU hotplug callback registration · f0bdb5e0
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the perf subsystem's hotplug notifier by using this latter form of
      callback registration.
      
      Also provide a bare-bones version of perf_cpu_notifier() that doesn't
      invoke the notifiers for the already online CPUs. This would be useful
      for subsystems that need to perform a different set of initialization
      for the already online CPUs, or don't need the initialization altogether.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f0bdb5e0
    • S
      Doc/cpu-hotplug: Specify race-free way to register CPU hotplug callbacks · 8489d90b
      Srivatsa S. Bhat 提交于
      Recommend the usage of the new CPU hotplug callback registration APIs
      (__register_cpu_notifier() etc), when subsystems need to also perform
      initialization for already online CPUs. Provide examples of correct
      and race-free ways of achieving this, and point out the kinds of code
      that are error-prone.
      
      Cc: Rob Landley <rob@landley.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      8489d90b
    • S
      CPU hotplug: Provide lockless versions of callback registration functions · 93ae4f97
      Srivatsa S. Bhat 提交于
      The following method of CPU hotplug callback registration is not safe
      due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
      and the cpu_hotplug.lock.
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      The deadlock is shown below:
      
                CPU 0                                         CPU 1
                -----                                         -----
      
         Acquire cpu_hotplug.lock
         [via get_online_cpus()]
      
                                                    CPU online/offline operation
                                                    takes cpu_add_remove_lock
                                                    [via cpu_maps_update_begin()]
      
         Try to acquire
         cpu_add_remove_lock
         [via register_cpu_notifier()]
      
                                                    CPU online/offline operation
                                                    tries to acquire cpu_hotplug.lock
                                                    [via cpu_hotplug_begin()]
      
                                  *** DEADLOCK! ***
      
      The problem here is that callback registration takes the locks in one order
      whereas the CPU hotplug operations take the same locks in the opposite order.
      To avoid this issue and to provide a race-free method to register CPU hotplug
      callbacks (along with initialization of already online CPUs), introduce new
      variants of the callback registration APIs that simply register the callbacks
      without holding the cpu_add_remove_lock during the registration. That way,
      we can avoid the ABBA scenario. However, we will need to hold the
      cpu_add_remove_lock throughout the entire critical section, to protect updates
      to the callback/notifier chain.
      
      This can be achieved by writing the callback registration code as follows:
      
      	cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* This doesn't take the cpu_add_remove_lock */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_maps_update_done();  [ or cpu_notifier_register_done(); see below ]
      
      Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
      because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
      notifiers, and hence get_online_cpus() cannot provide the necessary
      synchronization to protect the callback/notifier chains against concurrent
      reads and writes. On the other hand, since the cpu_add_remove_lock protects
      the entire hotplug operation (including CPU_POST_DEAD), we can use
      cpu_maps_update_begin/done() to guarantee proper synchronization.
      
      Also, since cpu_maps_update_begin/done() is like a super-set of
      get/put_online_cpus(), the former naturally protects the critical sections
      from concurrent hotplug operations.
      
      Since the names cpu_maps_update_begin/done() don't make much sense in CPU
      hotplug callback registration scenarios, we'll introduce new APIs named
      cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().
      
      In summary, introduce the lockless variants of un/register_cpu_notifier() and
      also export the cpu_notifier_register_begin/done() APIs for use by modules.
      This way, we provide a race-free way to register hotplug callbacks as well as
      perform initialization for the CPUs that are already online.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      93ae4f97
    • G
      CPU hotplug: Add lockdep annotations to get/put_online_cpus() · a19423b9
      Gautham R. Shenoy 提交于
      Add lockdep annotations for get/put_online_cpus() and
      cpu_hotplug_begin()/cpu_hotplug_end().
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a19423b9
  2. 17 3月, 2014 4 次提交
    • L
      Linux 3.14-rc7 · dcb99fd9
      Linus Torvalds 提交于
      dcb99fd9
    • L
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 59bf6c3c
      Linus Torvalds 提交于
      Pull scheduler fixes from Ingo Molnar:
       "Three small fixes"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/clock: Prevent tracing recursion in sched_clock_cpu()
        stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
        sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy
      59bf6c3c
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b44eeb4d
      Linus Torvalds 提交于
      Pull perf fixes from Ingo Molnar:
       "Misc smaller fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Fix leak in uncore_type_init failure paths
        perf machine: Use map as success in ip__resolve_ams
        perf symbols: Fix crash in elf_section_by_name
        perf trace: Decode architecture-specific signal numbers
      b44eeb4d
    • M
      ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation · 4f87dac3
      Michael Kerrisk 提交于
      While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
      Kinsbursky added in commit 4a674f34 ("ipc: introduce message queue
      copy feature" => kernel 3.8), I discovered a couple of bugs in the
      implementation.  The two bugs concern MSG_COPY interactions with other
      msgrcv() flags, namely:
      
       (A) MSG_COPY + MSG_EXCEPT
       (B) MSG_COPY + !IPC_NOWAIT
      
      The bugs are distinct (and the fix for the first one is obvious),
      however my fix for both is a single-line patch, which is why I'm
      combining them in a single mail, rather than writing two mails+patches.
      
       ===== (A) MSG_COPY + MSG_EXCEPT =====
      
      With the addition of the MSG_COPY flag, there are now two msgrcv()
      flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
      argument in unrelated ways.  Specifying both in the same call is a
      logical error that is currently permitted, with the effect that MSG_COPY
      has priority and MSG_EXCEPT is ignored.  The call should give an error
      if both flags are specified.  The patch below implements that behavior.
      
       ===== (B) (B) MSG_COPY + !IPC_NOWAIT =====
      
      The test code that was submitted in commit 3a665531 ("selftests: IPC
      message queue copy feature test") shows MSG_COPY being used in
      conjunction with IPC_NOWAIT.  In other words, if there is no message at
      the position 'msgtyp'.  return immediately with the error in ENOMSG.
      
      What was not (fully) tested is the behavior if MSG_COPY is specified
      *without* IPC_NOWAIT, and there is an odd behavior.  If the queue
      contains less than 'msgtyp' messages, then the call blocks until the
      next message is written to the queue.  At that point, the msgrcv() call
      returns a copy of the newly added message, regardless of whether that
      message is at the ordinal position 'msgtyp'.  This is clearly bogus, and
      problematic for applications that might want to make use of the MSG_COPY
      flag.
      
      I considered the following possible solutions to this problem:
      
       (1) Force the call to block until a message *does* appear at the
           position 'msgtyp'.
      
       (2) If the MSG_COPY flag is specified, the kernel should implicitly add
           IPC_NOWAIT, so that the call fails with ENOMSG for this case.
      
       (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
           an error (probably, EINVAL is the right one).
      
      I do not know if any application would really want to have the
      functionality of solution (1), especially since an application can
      determine in advance the number of messages in the queue using msgctl()
      IPC_STAT.  Obviously, this solution would be the most work to implement.
      
      Solution (2) would have the effect of silently fixing any applications
      that tried to employ broken behavior.  However, it would mean that if we
      later decided to implement solution (1), then user-space could not
      easily detect what the kernel supports (but, since I'm somewhat doubtful
      that solution (1) is needed, I'm not sure that this is much of a
      problem).
      
      Solution (3) would have the effect of informing broken applications that
      they are doing something broken.  The downside is that this would cause
      a ABI breakage for any applications that are currently employing the
      broken behavior.  However:
      
      a) Those applications are almost certainly not getting the results they
         expect.
      b) Possibly, those applications don't even exist, because MSG_COPY is
         currently hidden behind CONFIG_CHECKPOINT_RESTORE.
      
      The upside of solution (3) is that if we later decided to implement
      solution (1), user-space could determine what the kernel supports, via
      the error return.
      
      In my view, solution (3) is mildly preferable to solution (2), and
      solution (1) could still be done later if anyone really cares.  The
      patch below implements solution (3).
      
      PS.  For anyone out there still listening, it's the usual story:
      documenting an API (and the thinking about, and the testing of the API,
      that documentation entails) is the one of the single best ways of
      finding bugs in the API, as I've learned from a lot of experience.  Best
      to do that documentation before releasing the API.
      Signed-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: NStanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: stable@vger.kernel.org
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4f87dac3
  3. 16 3月, 2014 1 次提交
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 3b4df68d
      Linus Torvalds 提交于
      Pull SCSI fixes from James Bottomley:
       "This is a set of six fixes.  Two are instant crash/null deref types
        (storvsc and isci).  The two qla2xxx are initialisation problems that
        cause MSI-X failures and card misdetection, the isci erroneous macro
        is actually illegal C that's causing a miscompile with certain gcc
        versions and the be2iscsi bad if expression is a static checker fix"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        [SCSI] storvsc: NULL pointer dereference fix
        [SCSI] qla2xxx: Poll during initialization for ISP25xx and ISP83xx
        [SCSI] isci: correct erroneous for_each_isci_host macro
        [SCSI] isci: fix reset timeout handling
        [SCSI] be2iscsi: fix bad if expression
        [SCSI] qla2xxx: Fix multiqueue MSI-X registration.
      3b4df68d
  4. 15 3月, 2014 3 次提交
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a4ecdf82
      Linus Torvalds 提交于
      Pull x86 fixes from Peter Anvin:
       "Two x86 fixes: Suresh's eager FPU fix, and a fix to the NUMA quirk for
        AMD northbridges.
      
        This only includes Suresh's fix patch, not the "mostly a cleanup"
        patch which had __init issues"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/amd/numa: Fix northbridge quirk to assign correct NUMA node
        x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU
      a4ecdf82
    • L
      Merge tag 'pm+acpi-3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · cee152ff
      Linus Torvalds 提交于
      Pull ACPI and power management fixes from Rafael Wysocki:
       "Three of these are regression fixes, for two recent regressions and
        one introduced during the 3.13 cycle, and the fourth one is a working
        version of the fix that had to be reverted last time.
      
        Specifics:
      
         - A recent ACPI resources handling fix overlooked the fact that it
           had to update the ACPI PNP subsystem's resources parsing too and
           caused confusing warning messages to be printed during system
           intialization on some systems (with arguably buggy ACPI tables).
           Fix from Zhang Rui.
      
         - Moving the early ACPI initialization before timekeeping_init()
           earlier in this cycle broke fast TSC calibration on at least one
           system, so it needs to be done later, but still before
           efi_enter_virtual_mode() to allow the EFI initialization to refer
           to ACPI.
      
         - A change related to code duplication reduction in the cpufreq core
           inadvertently caused cpufreq intialization to fail for some CPUs
           handled by intel_pstate by adding checks that may fail for that
           driver, but aren't even necessary when it is used.  The issue is
           addressed by preventing those checks from run in the configurations
           in which they aren't needed.
      
         - If the Hardware Reduced ACPI flag is set in the ACPI tables, system
           suspend, hibernation and ACPI power off will only work when special
           sleep control and sleep status registeres are provided (their
           addresses in the ACPI tables are not zero).  If those registers are
           not available, the features in question have no chances to work, so
           they shouldn't even be regarded as supported.  That helps with
           power off in particular, because alternative power off methods may
           be used then and they may actually work"
      
      * tag 'pm+acpi-3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / sleep: Add extra checks for HW Reduced ACPI mode sleep states
        ACPI / init: Invoke early ACPI initialization later
        cpufreq: Skip current frequency initialization for ->setpolicy drivers
        PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failures
      cee152ff
    • L
      Merge tag 'dm-3.14-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 0c01b452
      Linus Torvalds 提交于
      Pull device-mapper fixes form Mike Snitzer:
       "Two small fixes for the DM cache target:
      
         - fix corruption with >2TB fast device due to truncation bug
         - fix access beyond end of origin device due to a partial block"
      
      * tag 'dm-3.14-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm cache: fix access beyond end of origin device
        dm cache: fix truncation bug when copying a block to/from >2TB fast device
      0c01b452
  5. 14 3月, 2014 14 次提交
    • D
      x86/amd/numa: Fix northbridge quirk to assign correct NUMA node · 847d7970
      Daniel J Blueman 提交于
      For systems with multiple servers and routed fabric, all
      northbridges get assigned to the first server. Fix this by also
      using the node reported from the PCI bus. For single-fabric
      systems, the northbriges are on PCI bus 0 by definition, which
      are on NUMA node 0 by definition, so this is invarient on most
      systems.
      
      Tested on fam10h and fam15h single and multi-fabric systems and
      candidate for stable.
      Signed-off-by: NDaniel J Blueman <daniel@numascale.com>
      Acked-by: NSteffen Persvold <sp@numascale.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1394710981-3596-1-git-send-email-daniel@numascale.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      847d7970
    • L
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · c60f7d5a
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "Pretty minor set of fixes for radeon, ttm and vmwgfx.  The ttm ones
        are a regression and an oops seen on server chipsets"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/vmwgfx: Fix a surface reference corner-case in legacy emulation mode
        drm/radeon/cik: properly set compute ring status on disable
        drm/radeon/cik: stop the sdma engines in the enable() function
        drm/radeon/cik: properly set sdma ring status on disable
        drm/radeon: fix runpm disabling on non-PX harder
        drm/ttm: don't oops if no invalidate_caches()
        drm/ttm: Work around performance regression with VM_PFNMAP
      c60f7d5a
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · c14c06b7
      Linus Torvalds 提交于
      Pull i2c Kconfig fix from Wolfram Sang.
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: Remove usage of orphaned symbol OF_I2C
      c14c06b7
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 53611c0c
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "I know this is a bit more than you want to see, and I've told the
        wireless folks under no uncertain terms that they must severely scale
        back the extent of the fixes they are submitting this late in the
        game.
      
        Anyways:
      
         1) vmxnet3's netpoll doesn't perform the equivalent of an ISR, which
            is the correct implementation, like it should.  Instead it does
            something like a NAPI poll operation.  This leads to crashes.
      
            From Neil Horman and Arnd Bergmann.
      
         2) Segmentation of SKBs requires proper socket orphaning of the
            fragments, otherwise we might access stale state released by the
            release callbacks.
      
            This is a 5 patch fix, but the initial patches are giving
            variables and such significantly clearer names such that the
            actual fix itself at the end looks trivial.
      
            From Michael S.  Tsirkin.
      
         3) TCP control block release can deadlock if invoked from a timer on
            an already "owned" socket.  Fix from Eric Dumazet.
      
         4) In the bridge multicast code, we must validate that the
            destination address of general queries is the link local all-nodes
            multicast address.  From Linus Lüssing.
      
         5) The x86 BPF JIT support for negative offsets puts the parameter
            for the helper function call in the wrong register.  Fix from
            Alexei Starovoitov.
      
         6) The descriptor type used for RTL_GIGA_MAC_VER_17 chips in the
            r8169 driver is incorrect.  Fix from Hayes Wang.
      
         7) The xen-netback driver tests skb_shinfo(skb)->gso_type bits to see
            if a packet is a GSO frame, but that's not the correct test.  It
            should use skb_is_gso(skb) instead.  Fix from Wei Liu.
      
         8) Negative msg->msg_namelen values should generate an error, from
            Matthew Leach.
      
         9) at86rf230 can deadlock because it takes the same lock from it's
            ISR and it's hard_start_xmit method, without disabling interrupts
            in the latter.  Fix from Alexander Aring.
      
        10) The FEC driver's restart doesn't perform operations in the correct
            order, so promiscuous settings can get lost.  Fix from Stefan
            Wahren.
      
        11) Fix SKB leak in SCTP cookie handling, from Daniel Borkmann.
      
        12) Reference count and memory leak fixes in TIPC from Ying Xue and
            Erik Hugne.
      
        13) Forced eviction in inet_frag_evictor() must strictly make sure all
            frags are deleted, otherwise module unload (f.e.  6lowpan) can
            crash.  Fix from Florian Westphal.
      
        14) Remove assumptions in AF_UNIX's use of csum_partial() (which it
            uses as a hash function), which breaks on PowerPC.  From Anton
            Blanchard.
      
            The main gist of the issue is that csum_partial() is defined only
            as a value that, once folded (f.e.  via csum_fold()) produces a
            correct 16-bit checksum.  It is legitimate, therefore, for
            csum_partial() to produce two different 32-bit values over the
            same data if their respective alignments are different.
      
        15) Fix endiannes bug in MAC address handling of ibmveth driver, also
            from Anton Blanchard.
      
        16) Error checks for ipv6 exthdrs offload registration are reversed,
            from Anton Nayshtut.
      
        17) Externally triggered ipv6 addrconf routes should count against the
            garbage collection threshold.  Fix from Sabrina Dubroca.
      
        18) The PCI shutdown handler added to the bnx2 driver can wedge the
            chip if it was not brought up earlier already, which in particular
            causes the firmware to shut down the PHY.  Fix from Michael Chan.
      
        19) Adjust the sanity WARN_ON_ONCE() in qdisc_list_add() because as
            currently coded it can and does trigger in legitimate situations.
            From Eric Dumazet.
      
        20) BNA driver fails to build on ARM because of a too large udelay()
            call, fix from Ben Hutchings.
      
        21) Fair-Queue qdisc holds locks during GFP_KERNEL allocations, fix
            from Eric Dumazet.
      
        22) The vlan passthrough ops added in the previous release causes a
            regression in source MAC address setting of outgoing headers in
            some circumstances.  Fix from Peter Boström"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
        ipv6: Avoid unnecessary temporary addresses being generated
        eth: fec: Fix lost promiscuous mode after reconnecting cable
        bonding: set correct vlan id for alb xmit path
        at86rf230: fix lockdep splats
        net/mlx4_en: Deregister multicast vxlan steering rules when going down
        vmxnet3: fix building without CONFIG_PCI_MSI
        MAINTAINERS: add networking selftests to NETWORKING
        net: socket: error on a negative msg_namelen
        MAINTAINERS: Add tools/net to NETWORKING [GENERAL]
        packet: doc: Spelling s/than/that/
        net/mlx4_core: Load the IB driver when the device supports IBoE
        net/mlx4_en: Handle vxlan steering rules for mac address changes
        net/mlx4_core: Fix wrong dump of the vxlan offloads device capability
        xen-netback: use skb_is_gso in xenvif_start_xmit
        r8169: fix the incorrect tx descriptor version
        tools/net/Makefile: Define PACKAGE to fix build problems
        x86: bpf_jit: support negative offsets
        bridge: multicast: enable snooping on general queries only
        bridge: multicast: add sanity check for general query destination
        tcp: tcp_release_cb() should release socket ownership
        ...
      53611c0c
    • R
      i2c: Remove usage of orphaned symbol OF_I2C · 62c19c9d
      Richard Weinberger 提交于
      The symbol is an orphan, don't depend on it anymore.
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      [wsa: enhanced commit message]
      Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
      Fixes: 687b81d0 (i2c: move OF helpers into the core)
      Cc: stable@kernel.org
      62c19c9d
    • R
      Merge branches 'pnp', 'acpi-init', 'acpi-sleep' and 'pm-cpufreq' · d5af40d6
      Rafael J. Wysocki 提交于
      * pnp:
        PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failures
      
      * acpi-init:
        ACPI / init: Invoke early ACPI initialization later
      
      * acpi-sleep:
        ACPI / sleep: Add extra checks for HW Reduced ACPI mode sleep states
      
      * pm-cpufreq:
        cpufreq: Skip current frequency initialization for ->setpolicy drivers
      d5af40d6
    • R
      ACPI / sleep: Add extra checks for HW Reduced ACPI mode sleep states · a4e90bed
      Rafael J. Wysocki 提交于
      If the HW Reduced ACPI mode bit is set in the FADT, ACPICA uses
      the optional sleep control and sleep status registers for making
      the system enter sleep states (including S5), so it is not possible
      to use system sleep states or power it off using ACPI if the HW
      Reduced ACPI mode bit is set and those registers are not available.
      
      For this reason, add a new function, acpi_sleep_state_supported(),
      checking if the HW Reduced ACPI mode bit is set and whether or not
      system sleep states are usable in that case in addition to checking
      the return value of acpi_get_sleep_type_data() and make the ACPI
      sleep setup routines use that function to check the availability of
      system sleep states.
      
      Among other things, this prevents the kernel from attempting to
      use ACPI for powering off HW Reduced ACPI systems without the sleep
      control and sleep status registers, because ACPI power off doesn't
      have a chance to work on them.  That allows alternative power off
      mechanisms that may actually work to be used on those systems.  The
      affected machines include Dell Venue 8 Pro, Asus T100TA, Haswell
      Desktop SDP and Ivy Bridge EP Demo depot.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=70931Reported-by: NAdam Williamson <awilliam@redhat.com>
      Tested-by: NAubrey Li <aubrey.li@linux.intel.com>
      Cc: 3.4+ <stable@vger.kernel.org> # 3.4+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a4e90bed
    • H
      ipv6: Avoid unnecessary temporary addresses being generated · ecab6701
      Heiner Kallweit 提交于
      tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore
      age needs to be added to the condition.
      
      Age calculation in ipv6_create_tempaddr is different from the one
      in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS.
      This can cause age in ipv6_create_tempaddr to be less than the one
      in addrconf_verify and therefore unnecessary temporary address to
      be generated.
      Use age calculation as in addrconf_modify to avoid this.
      Signed-off-by: NHeiner Kallweit <heiner.kallweit@web.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ecab6701
    • S
      eth: fec: Fix lost promiscuous mode after reconnecting cable · 84fe6182
      Stefan Wahren 提交于
      If the Freescale fec is in promiscuous mode and network cable is
      reconnected then the promiscuous mode get lost. The problem is caused
      by a too soon call of set_multicast_list to re-enable promisc mode.
      The FEC_R_CNTRL register changes are overwritten by fec_restart.
      
      This patch fixes this by moving the call behind the init of FEC_R_CNTRL
      register in fec_restart.
      
      Successful tested on a i.MX28 board.
      Signed-off-by: NStefan Wahren <stefan.wahren@i2se.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84fe6182
    • D
      bonding: set correct vlan id for alb xmit path · fb00bc2e
      dingtianhong 提交于
      The commit d3ab3ffd
      (bonding: use rlb_client_info->vlan_id instead of ->tag)
      remove the rlb_client_info->tag, but occur some issues,
      The vlan_get_tag() will return 0 for success and -EINVAL for
      error, so the client_info->vlan_id always be set to 0 if the
      vlan_get_tag return 0 for success, so the client_info would
      never get a correct vlan id.
      
      We should only set the vlan id to 0 when the vlan_get_tag return error.
      
      Fixes: d3ab3ffd (bonding: use rlb_client_info->vlan_id instead of ->tag)
      
      CC: Ding Tianhong <dingtianhong@huawei.com>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Acked-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb00bc2e
    • A
      at86rf230: fix lockdep splats · 6e07a1e0
      Alexander Aring 提交于
      This patch fix a lockdep in the at86rf230 driver, otherwise we get:
      
      [   30.206517] =================================
      [   30.211078] [ INFO: inconsistent lock state ]
      [   30.215647] 3.14.0-20140108-1-00994-g32e9426 #163 Not tainted
      [   30.221660] ---------------------------------
      [   30.226222] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
      [   30.232514] systemd-udevd/157 [HC1[1]:SC0[0]:HE0:SE1] takes:
      [   30.238439]  (&(&lp->lock)->rlock){?.+...}, at: [<c03600f8>] at86rf230_isr+0x18/0x44
      [   30.246621] {HARDIRQ-ON-W} state was registered at:
      [   30.251728]   [<c0061ce4>] __lock_acquire+0x7a4/0x18d8
      [   30.257135]   [<c0063500>] lock_acquire+0x68/0x7c
      [   30.262071]   [<c0588820>] _raw_spin_lock+0x28/0x38
      [   30.267203]   [<c0361240>] at86rf230_xmit+0x1c/0x144
      [   30.272412]   [<c057ba6c>] mac802154_xmit_worker+0x88/0x148
      [   30.278271]   [<c0047844>] process_one_work+0x274/0x404
      [   30.283761]   [<c00484c0>] worker_thread+0x228/0x374
      [   30.288971]   [<c004cfb8>] kthread+0xd0/0xe4
      [   30.293455]   [<c000dac8>] ret_from_fork+0x14/0x2c
      [   30.298493] irq event stamp: 8948
      [   30.301963] hardirqs last  enabled at (8947): [<c00cb290>] __kmalloc+0xb4/0x110
      [   30.309636] hardirqs last disabled at (8948): [<c00115d4>] __irq_svc+0x34/0x5c
      [   30.317215] softirqs last  enabled at (8452): [<c0037324>] __do_softirq+0x1dc/0x264
      [   30.325243] softirqs last disabled at (8439): [<c0037638>] irq_exit+0x80/0xf4
      
      We use the lp->lock inside the isr of at86rf230, that's why we need the
      irqsave spinlock calls.
      Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e07a1e0
    • O
      net/mlx4_en: Deregister multicast vxlan steering rules when going down · de123268
      Or Gerlitz 提交于
      When mlx4_en_stop_port() is called, we need to deregister also the
      tunnel steering rules that relate to multicast.
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de123268
    • A
      vmxnet3: fix building without CONFIG_PCI_MSI · 0a8d8c44
      Arnd Bergmann 提交于
      Since commit d25f06ea "vmxnet3: fix netpoll race condition",
      the vmxnet3 driver fails to build when CONFIG_PCI_MSI is disabled,
      because it unconditionally references the vmxnet3_msix_rx()
      function.
      
      To fix this, use the same #ifdef in the caller that exists around
      the function definition.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
      Cc: "VMware, Inc." <pv-drivers@vmware.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: stable@vger.kernel.org
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a8d8c44
    • D
      MAINTAINERS: add networking selftests to NETWORKING · f4e53f9a
      Daniel Borkmann 提交于
      Add it to NETWORKING [GENERAL] to make sure patches for selftests
      go to the netdev list as well.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4e53f9a
  6. 13 3月, 2014 7 次提交