提交 · 810bc075f78ff2c221536eb3008eac6a492dba2d · openanolis / cloud-kernel

17 7月, 2015 12 次提交

x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection · 810bc075

由 Andy Lutomirski 提交于 7月 15, 2015

We have a tricky bug in the nested NMI code: if we see RSP
pointing to the NMI stack on NMI entry from kernel mode, we
assume that we are executing a nested NMI.

This isn't quite true.  A malicious userspace program can point
RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
happen while RSP is still pointing at the NMI stack.

Fix it with a sneaky trick.  Set DF in the region of code that
the RSP check is intended to detect.  IRET will clear DF
atomically.

( Note: other than paravirt, there's little need for all this
  complexity. We could check RIP instead of RSP. )
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

810bc075

x86/nmi/64: Reorder nested NMI checks · a27507ca

由 Andy Lutomirski 提交于 7月 15, 2015

Check the repeat_nmi .. end_repeat_nmi special case first.  The
next patch will rework the RSP check and, as a side effect, the
RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
we'll need this ordering of the checks.

Note: this is more subtle than it appears.  The check for
repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
instead of adjusting the "iret" frame to force a repeat.  This
is necessary, because the code between repeat_nmi and
end_repeat_nmi sets "NMI executing" and then writes to the
"iret" frame itself.  If a nested NMI comes in and modifies the
"iret" frame while repeat_nmi is also modifying it, we'll end up
with garbage.  The old code got this right, as does the new
code, but the new code is a bit more explicit.

If we were to move the check right after the "NMI executing"
check, then we'd get it wrong and have random crashes.

( Because the "NMI executing" check would jump to the code that would
  modify the "iret" frame without checking if the interrupted NMI was
  currently modifying it. )
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

a27507ca

x86/nmi/64: Improve nested NMI comments · 0b22930e

由 Andy Lutomirski 提交于 7月 15, 2015

I found the nested NMI documentation to be difficult to follow.
Improve the comments.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

0b22930e

x86/nmi/64: Switch stacks on userspace NMI entry · 9b6e6a83

由 Andy Lutomirski 提交于 7月 15, 2015

Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.

The NMI nesting fixup relies on a precise stack layout and
atomic IRET.  Rather than trying to teach the NMI nesting fixup
to handle ESPFIX and failed IRET, punt: run NMIs that came from
user mode on the normal kernel stack.

This will make some nested NMIs visible to C code, but the C
code is okay with that.

As a side effect, this should speed up perf: it eliminates an
RDMSR when NMIs come from user mode.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

9b6e6a83

x86/nmi/64: Remove asm code that saves CR2 · 0e181bb5

由 Andy Lutomirski 提交于 7月 15, 2015

Now that do_nmi saves CR2, we don't need to save it in asm.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Acked-by: NBorislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

0e181bb5

x86/nmi: Enable nested do_nmi() handling for 64-bit kernels · 9d050416

由 Andy Lutomirski 提交于 7月 15, 2015

32-bit kernels handle nested NMIs in C.  Enable the exact same
handling on 64-bit kernels as well.  This isn't currently
necessary, but it will become necessary once the asm code starts
allowing limited nesting.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

9d050416

Merge tag 'pm+acpi-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 21bdb584

由 Linus Torvalds 提交于 7月 16, 2015

Pull power management and ACPI fixes from Rafael Wysocki:
 "These fix two bugs in the cpufreq core (including one recent
  regression), fix a 4.0 PCI regression related to the ACPI resources
  management and quieten an RCU-related lockdep complaint about a
  tracepoint in the suspend-to-idle code.

  Specifics:

   - Fix a recently introduced issue in the cpufreq policy object
     reinitialization that leads to CPU offline/online breakage (Viresh
     Kumar)

   - Make it possible to access frequency tables of offline CPUs which
     is needed by thermal management code among other things (Viresh
     Kumar)

   - Fix an ACPI resource management regression introduced during the
     4.0 cycle that may cause incorrect resource validation results to
     appear in 32-bit x86 kernels due to silent truncation of 64-bit
     values to 32-bit (Jiang Liu)

   - Fix up an RCU-related lockdep complaint about suspicious RCU usage
     in idle caused by using a suspend tracepoint in the core suspend-
     to-idle code (Rafael J Wysocki)"

* tag 'pm+acpi-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  cpufreq: Allow freq_table to be obtained for offline CPUs
  cpufreq: Initialize the governor again while restoring policy
  suspend-to-idle: Prevent RCU from complaining about tick_freeze()

21bdb584

Merge tag 'platform-drivers-x86-v4.2-3' of... · 3e87ee06

由 Linus Torvalds 提交于 7月 16, 2015

Merge tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
 "Fix SMBIOS call handling and hwswitch state coherency in the
  dell-laptop driver.  Cleanups for intel_*_ipc drivers.  Details:

  dell-laptop:
   - Do not cache hwswitch state
   - Check return value of each SMBIOS call
   - Clear buffer before each SMBIOS call

  intel_scu_ipc:
   - Move local memory initialization out of a mutex

  intel_pmc_ipc:
   - Update kerneldoc formatting
   - Fix compiler casting warnings"

* tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  intel_scu_ipc: move local memory initialization out of a mutex
  intel_pmc_ipc: Update kerneldoc formatting
  dell-laptop: Do not cache hwswitch state
  dell-laptop: Check return value of each SMBIOS call
  dell-laptop: Clear buffer before each SMBIOS call
  intel_pmc_ipc: Fix compiler casting warnings

3e87ee06

Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · f85c7124

由 Linus Torvalds 提交于 7月 16, 2015

Pull m68knommu/coldfire fixes from Greg Ungerer:
 "Contains build fixes and updates for the ColdFire defconfigs.

  Specifically there is a couple of fixes that address problems building
  allnoconfig.  Also fix for enabling PCI bus on the M54xx family of
  ColdFire"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: enable PCI support for m5475evb defconfig
  m68k: fix io functions for ColdFire/MMU/PCI case
  m68knommu: update defconfig for ColdFire m5475evb
  m68knommu: update defconfig for ColdFire m5407c3
  m68knommu: update defconfig for ColdFire m5307c3
  m68knommu: update defconfig for ColdFire m5275evb
  m68knommu: update defconfig for ColdFire m5272c3
  m68knommu: update defconfig for ColdFire m5249evb
  m68knommu: update defconfig for m5208evb
  m68knommu: make ColdFire SoC selection a choice
  m68knommu: improve the clock configuration defaults
  m68knommu: force setting of CONFIG_CLOCK_FREQ for ColdFire

f85c7124

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 761ab766

由 Linus Torvalds 提交于 7月 16, 2015

Pull block fixes from Jens Axboe:
 "A collection of fixes from the last few weeks that should go into the
  current series.  This contains:

   - Various fixes for the per-blkcg policy data, fixing regressions
     since 4.1.  From Arianna and Tejun

   - Code cleanup for bcache closure macros from me.  Really just
     flushing this out, it's been sitting in another branch for months

   - FIELD_SIZEOF cleanup from Maninder Singh

   - bio integrity oops fix from Mike

   - Timeout regression fix for blk-mq from Ming Lei"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: set default timeout as 30 seconds
  NVMe: Reread partitions on metadata formats
  bcache: don't embed 'return' statements in closure macros
  blkcg: fix blkcg_policy_data allocation bug
  blkcg: implement all_blkcgs list
  blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[]
  blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods
  block/blk-cgroup.c: free per-blkcg data when freeing the blkcg
  block: use FIELD_SIZEOF to calculate size of a field
  bio integrity: do not assume bio_integrity_pool exists if bioset exists

761ab766

Merge tag 'jfs-4.2' of git://github.com/kleikamp/linux-shaggy · f76d94de

由 Linus Torvalds 提交于 7月 16, 2015

Pull jfs fixes from David Kleikamp:
 "A couple trivial fixes and an error path fix"

* tag 'jfs-4.2' of git://github.com/kleikamp/linux-shaggy:
  jfs: clean up jfs_rename and fix out of order unlock
  jfs: fix indentation on if statement
  jfs: removed a prohibited space after opening parenthesis

f76d94de

Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'acpi-resources' · 17ffc8b0

由 Rafael J. Wysocki 提交于 7月 16, 2015

* pm-cpuidle:
  suspend-to-idle: Prevent RCU from complaining about tick_freeze()

* pm-cpufreq:
  cpufreq: Allow freq_table to be obtained for offline CPUs
  cpufreq: Initialize the governor again while restoring policy

* acpi-resources:
  ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel

17ffc8b0

16 7月, 2015 11 次提交

blk-mq: set default timeout as 30 seconds · e56f698b

由 Ming Lei 提交于 7月 16, 2015

It is reasonable to set default timeout of request as 30 seconds instead of
30000 ticks, which may be 300 seconds if HZ is 100, for example, some arm64
based systems may choose 100 HZ.
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Fixes: c76cbbcf ("blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()"
Signed-off-by: NJens Axboe <axboe@fb.com>

e56f698b

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 3aa20508

由 Linus Torvalds 提交于 7月 15, 2015

Pull TPM bugfixes from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  tpm, tpm_crb: fail when TPM2 ACPI table contents look corrupted
  tpm: Fix initialization of the cdev

3aa20508

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · 9090fdb9

由 Linus Torvalds 提交于 7月 15, 2015

Pull rdma fixes from Doug Ledford:
 "Mainly fix-ups for the various 4.2 items"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits)
  IB/core: Destroy ocrdma_dev_id IDR on module exit
  IB/core: Destroy multcast_idr on module exit
  IB/mlx4: Optimize do_slave_init
  IB/mlx4: Fix memory leak in do_slave_init
  IB/mlx4: Optimize freeing of items on error unwind
  IB/mlx4: Fix use of flow-counters for process_mad
  IB/ipath: Convert use of __constant_<foo> to <foo>
  IB/ipoib: Set MTU to max allowed by mode when mode changes
  IB/ipoib: Scatter-Gather support in connected mode
  IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES
  IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush
  IB/ucma: Fix lockdep warning in ucma_lock_files
  rds: rds_ib_device.refcount overflow
  RDMA/nes: Fix for incorrect recording of the MAC address
  RDMA/nes: Fix for resolving the neigh
  RDMA/core: Fixes for port mapper client registration
  IB/IPoIB: Fix bad error flow in ipoib_add_port()
  IB/mlx4: Do not attemp to report HCA clock offset on VFs
  IB/cm: Do not queue work to a device that's going away
  IB/srp: Avoid using uninitialized variable
  ...

9090fdb9

NVMe: Reread partitions on metadata formats · 7bee6074

由 Keith Busch 提交于 7月 14, 2015

This patch has the driver automatically reread partitions if a namespace
has a separate metadata format. Previously revalidating a disk was
sufficient to get the correct capacity set on such formatted drives,
but partitions that may exist would not have been surfaced.
Reported-by: NPaul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Tested-by: NPaul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

7bee6074

Merge tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux · 16ff49a0

由 Linus Torvalds 提交于 7月 15, 2015

Pull file locking updates from Jeff Layton:
 "I had thought that I was going to get away without a pull request this
  cycle.  There was a NFSv4 file locking problem that cropped up that I
  tried to fix in the NFSv4 code alone, but that fix has turned out to
  be problematic.  These patches fix this in the correct way.

  Note that this touches some NFSv4 code as well.  Ordinarily I'd wait
  for Trond to ACK this, but he's on holiday right now and the bug is
  rather nasty.  So I suggest we merge this and if he raises issues with
  it we can sort it out when he gets back"
Acked-by: NBruce Fields <bfields@fieldses.org>
Acked-by: NDan Williams <dan.j.williams@intel.com>
 [ +1 to this series fixing a 100% reproducible slab corruption +
   general protection fault in my nfs-root test environment. - Dan ]
Acked-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

* tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux:
  locks: inline posix_lock_file_wait and flock_lock_file_wait
  nfs4: have do_vfs_lock take an inode pointer
  locks: new helpers - flock_lock_inode_wait and posix_lock_inode_wait
  locks: have flock_lock_file take an inode pointer instead of a filp
  Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"

16ff49a0

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · df14a68d

由 Linus Torvalds 提交于 7月 15, 2015

Pull KVM fixes from Paolo Bonzini:

 - Fix FPU refactoring ("kvm: x86: fix load xsave feature warning")

 - Fix eager FPU mode (Cc stable)

 - AMD bits of MTRR virtualization

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: fix load xsave feature warning
  KVM: x86: apply guest MTRR virtualization on host reserved pages
  KVM: SVM: Sync g_pat with guest-written PAT value
  KVM: SVM: use NPT page attributes
  KVM: count number of assigned devices
  KVM: VMX: fix vmwrite to invalid VMCS
  KVM: x86: reintroduce kvm_is_mmio_pfn
  x86: hyperv: add CPUID bit for crash handlers

df14a68d

Merge tag 'arc-v4.2-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · bec33cd2

由 Linus Torvalds 提交于 7月 15, 2015

Pull ARC fixes from Vineet Gupta:
 - Makefile changes (top-level+ARC) reinstates -O3 builds (regression
   since 3.16)
 - IDU intc related fixes, IRQ affinity
 - patch to make bitops safer for ARC
 - perf fix from Alexey to remove signed PC braino
 - Futex backend gets llock/scond support

* tag 'arc-v4.2-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARCv2: support HS38 releases
  ARC: make sure instruction_pointer() returns unsigned value
  ARC: slightly refactor macros for boot logging
  ARC: Add llock/scond to futex backend
  arc:irqchip: prepare for drivers/irqchip/irqchip.h removal
  ARC: Make ARC bitops "safer" (add anti-optimization)
  ARCv2: [axs103] bump CPU frequency from 75 to 90 MHZ
  ARCv2: intc: IDU: Fix potential race in installing a chained IRQ handler
  ARCv2: intc: IDU: support irq affinity
  ARC: fix unused var wanring
  ARC: Don't memzero twice in dma_alloc_coherent for __GFP_ZERO
  ARC: Override toplevel default -O2 with -O3
  kbuild: Allow arch Makefiles to override {cpp,ld,c}flags
  ARCv2: guard SLC DMA ops with spinlock
  ARC: Kconfig: better way to disable ARC_HAS_LLSC for ARC_CPU_750D

bec33cd2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 9c69481e

由 Linus Torvalds 提交于 7月 15, 2015

Pull s390 fixes from Martin Schwidefsky:
 "One improvement for the zcrypt driver, the quality attribute for the
  hwrng device has been missing.  Without it the kernel entropy seeding
  will not happen automatically.

  And six bug fixes, the most important one is the fix for the vector
  register corruption due to machine checks"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/nmi: fix vector register corruption
  s390/process: fix sfpc inline assembly
  s390/dasd: fix kernel panic when alias is set offline
  s390/sclp: clear upper register halves in _sclp_print_early
  s390/oprofile: fix compile error
  s390/sclp: fix compile error
  s390/zcrypt: enable s390 hwrng to seed kernel entropy

9c69481e

jfs: clean up jfs_rename and fix out of order unlock · 26456955

由 Dave Kleikamp 提交于 7月 15, 2015

The end of jfs_rename(), which is also used by the error paths,
included a call to IWRITE_UNLOCK(new_ip) after labels out1, out2
and out3. If we come in through these labels, IWRITE_LOCK() has not
been called yet.

In moving that call to the correct spot, I also moved some
exceptional truncate code earlier as well, since the early error
paths don't need to deal with it, and I renamed out4: to out_tx: so
a future patch by Jan Kara doesn't need to deal with renumbering or
confusing out-of-order labels.
Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>

26456955

Merge tag 'module-final-v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux · 97d6e2b6

由 Linus Torvalds 提交于 7月 15, 2015

Pull final init.h/module.h code relocation from Paul Gortmaker:
 "With the release of 4.2-rc2 done, we should not be seeing any new code
  added that gets upset by this small code move, and we've banked yet
  another complete week of testing with this move in place on top of
  4.2-rc1 via linux-next to ensure that remained true.

  Given that, I'd like to put it in now so that people formulating new
  work for 4.3-rc1 will be exposed to the ever so slightly stricter (but
  sensible) requirements wrt.  whether they are needing init.h vs.
  module.h macros, even if they are not using linux-next.

  The diffstat of the move is slightly asymmetrical due to needing to
  leave behind a couple #ifdef in the old location and add the same ones
  to the new location, but other than that, it is a 1:1 move, complete
  with the module_init/exit trailing semicolon that we can't fix.  That
  is, until/unless someone does a tree-wide sed fix of all the
  approximately 800 currently in tree users relying on it"

* tag 'module-final-v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  module: relocate module_init from init.h to module.h

97d6e2b6

Merge tag 'trace-v4.2-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 75580097

由 Linus Torvalds 提交于 7月 15, 2015

Pull tracing fix from Steven Rostedt:
 "Fengguang Wu discovered a crash that happened to be because of the
  branch tracer (traces unlikely and likely branches) when enabled with
  certain debug options.

  What happened was that various debug options like lockdep and
  DEBUG_PREEMPT can cause parts of the branch tracer to recurse outside
  its recursion protection.  In fact, part of its recursion protection
  used these features that caused the lockup.  This cleans up the code a
  little and makes the recursion protection a bit more robust"

* tag 'trace-v4.2-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have branch tracer use recursive field of task struct

75580097

15 7月, 2015 17 次提交

J

Merge tag 'tpm-fixes-for-4.2-rc2' of https://github.com/PeterHuewe/linux-tpmdd into for-linus · 91ef5ccd
由 James Morris 提交于 7月 15, 2015

91ef5ccd

intel_scu_ipc: move local memory initialization out of a mutex · 8642d7f8

由 Christophe JAILLET 提交于 7月 13, 2015

'{ }' and memset will both reset the cbuf buffer.
Only once is enough and this can be done outside fo the mutex.
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: NDarren Hart <dvhart@linux.intel.com>

8642d7f8

IB/core: Destroy ocrdma_dev_id IDR on module exit · d8b2ba7c

由 Johannes Thumshirn 提交于 7月 08, 2015

Destroy ocrdma_dev_id IDR on module exit, reclaiming the allocated memory.

This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@

module_init(init);

@ defines_module_exit @
identifier exit;
@@

module_exit(exit);

@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@

DEFINE_IDR(idr);

@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@

exit(void)
{
 ...
 idr_destroy(&idr);
 ...
}

@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@

exit(void)
{
 ...
 +idr_destroy(&idr);
 }

</SmPL>
Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d8b2ba7c

IB/core: Destroy multcast_idr on module exit · 45d25420

由 Johannes Thumshirn 提交于 7月 08, 2015

Destroy multcast_idr on module exit, reclaiming the allocated memory.

This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@

module_init(init);

@ defines_module_exit @
identifier exit;
@@

module_exit(exit);

@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@

DEFINE_IDR(idr);

@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@

exit(void)
{
 ...
 idr_destroy(&idr);
 ...
}

@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@

exit(void)
{
 ...
 +idr_destroy(&idr);
}

</SmPL>
Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

45d25420

IB/mlx4: Optimize do_slave_init · d9a047ae

由 Doug Ledford 提交于 7月 09, 2015

There is little chance our memory allocation will fail, so we can
combine initializing the work structs with allocating them instead of
looping through all of them once to allocate and again to initialize.
Then when we need to actually find out if our device is up or in the
process of going down, have all of our work structs batched up, take the
spin_lock once and only once, and do all of the batch under the one
spin_lock invocation instead of incurring all of the locked memory cycles
we would otherwise incur to take/release the spin_lock over and over
again.
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d9a047ae

IB/mlx4: Fix memory leak in do_slave_init · 9bbf282d

由 Doug Ledford 提交于 7月 09, 2015

We create a number of work structs to be queued up to a workqueue, and
on completion of the workqueue handler, the workqueue handler frees the
allocated memory. If, however, we don't queue the work struct because
the device is going down, then we need to free the memory ourselves.
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9bbf282d

IB/mlx4: Optimize freeing of items on error unwind · a39a98ff

由 Maninder Singh 提交于 7月 08, 2015

On failure, we loop through all possible pointers and test them before
calling kfree. But really, why even attempt to free items we didn't
allocate when we can easily loop through exactly and only the devices
for which the original memory allocation succeeded and free just those.
Signed-off-by: NManinder Singh <maninder1.s@samsung.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a39a98ff

IB/mlx4: Fix use of flow-counters for process_mad · 43bfb972

由 Or Gerlitz 提交于 6月 25, 2015

For IB links, reading HCA flow counters through iboe_process_mad() should
be used when mlx4_ib_process_mad() is invoked only for VFs PMA queries and
exactly nothing else.

Fixes: 7193a141 ('IB/mlx4: Set VF to read from QP counters')
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

43bfb972

IB/ipath: Convert use of __constant_<foo> to <foo> · cb1ff431

由 Vaishali Thakkar 提交于 6月 16, 2015

In little endian cases, the macros be16_to_cpu and cpu_to_be64
unfolds to __swab{16,64} which provides special case for constants.
In big endian cases, __constant_be16_to_cpu and be16_to_cpu
expand directly to the same expression. The same applies for
__constant_cpu_to_be64 and cpu_to_be64.

So, replace __constant_be16_to_cpu with be16_to_cpu and
__constant_cpu_to_be64 with cpu_to_be64, with the goal of getting
rid of the definition of __constant_be16_to_cpu and
__constant_cpu_to_be64 completely.
Signed-off-by: NVaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

cb1ff431

IB/ipoib: Set MTU to max allowed by mode when mode changes · edcd2a74

由 Erez Shitrit 提交于 6月 07, 2015

When switching between modes (datagram / connected) change the MTU
accordingly.
datagram mode up to 4K, connected mode up to (64K - 0x10).
Signed-off-by: NELi Cohen <eli@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

edcd2a74

IB/ipoib: Scatter-Gather support in connected mode · c4268778

由 Yuval Shaia 提交于 7月 12, 2015

By default, IPoIB-CM driver uses 64k MTU. Larger MTU gives better
performance.
This MTU plus overhead puts the memory allocation for IP based packets at
32 4k pages (order 5), which have to be contiguous.
When the system memory under pressure, it was observed that allocating 128k
contiguous physical memory is difficult and causes serious errors (such as
system becomes unusable).

This enhancement resolve the issue by removing the physically contiguous
memory requirement using Scatter/Gather feature that exists in Linux stack.

With this fix Scatter-Gather will be supported also in connected mode.

This change reverts some of the change made in commit e112373f
("IPoIB/cm: Reduce connected mode TX object size").

The ability to use SG in IPoIB CM is possible because the coupling
between NETIF_F_SG and NETIF_F_CSUM was removed in commit
ec5f0615 ("net: Kill link between CSUM and SG features.")
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Acked-by: NChristian Marie <christian@ponies.io>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c4268778

IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES · 59d40dd9

由 Carol L Soto 提交于 6月 11, 2015

ib_ucm_release_dev clears the wrong bit if devnum is greater
than IB_UCM_MAX_DEVICES.
Signed-off-by: NCarol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

59d40dd9

IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush · 8b7cce0d

由 Haggai Eran 提交于 7月 07, 2015

__ipoib_ib_dev_flush calls itself recursively on child devices, and lockdep
complains about locking vlan_rwsem twice (see below). Use down_read_nested
instead of down_read to prevent the warning.

 =============================================
 [ INFO: possible recursive locking detected ]
 4.1.0-rc4+ #36 Tainted: G           O
 ---------------------------------------------
 kworker/u20:2/261 is trying to acquire lock:
  (&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]

 but task is already holding lock:
  (&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&priv->vlan_rwsem);
   lock(&priv->vlan_rwsem);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 3 locks held by kworker/u20:2/261:
  #0:  ("%s""ipoib_flush"){.+.+..}, at: [<ffffffff810827cc>] process_one_work+0x15c/0x760
  #1:  ((&priv->flush_heavy)){+.+...}, at: [<ffffffff810827cc>] process_one_work+0x15c/0x760
  #2:  (&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]

 stack backtrace:
 CPU: 3 PID: 261 Comm: kworker/u20:2 Tainted: G           O    4.1.0-rc4+ #36
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 Workqueue: ipoib_flush ipoib_ib_dev_flush_heavy [ib_ipoib]
  ffff8801c6c54790 ffff8801c9927af8 ffffffff81665238 0000000000000001
  ffffffff825b5b30 ffff8801c9927bd8 ffffffff810bba51 ffff880100000000
  ffffffff00000001 ffff880100000001 ffff8801c6c55428 ffff8801c6c54790
 Call Trace:
  [<ffffffff81665238>] dump_stack+0x4f/0x6f
  [<ffffffff810bba51>] __lock_acquire+0x741/0x1820
  [<ffffffff810bcbf8>] lock_acquire+0xc8/0x240
  [<ffffffffa0791e2a>] ? __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
  [<ffffffff81669d2c>] down_read+0x4c/0x70
  [<ffffffffa0791e2a>] ? __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
  [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
  [<ffffffffa0791e4a>] __ipoib_ib_dev_flush+0x5a/0x2b0 [ib_ipoib]
  [<ffffffffa07920ba>] ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib]
  [<ffffffff81082871>] process_one_work+0x201/0x760
  [<ffffffff810827cc>] ? process_one_work+0x15c/0x760
  [<ffffffff81082ef0>] worker_thread+0x120/0x4d0
  [<ffffffff81082dd0>] ? process_one_work+0x760/0x760
  [<ffffffff81082dd0>] ? process_one_work+0x760/0x760
  [<ffffffff81088b7e>] kthread+0xfe/0x120
  [<ffffffff81088a80>] ? __init_kthread_worker+0x70/0x70
  [<ffffffff8166c6e2>] ret_from_fork+0x42/0x70
  [<ffffffff81088a80>] ? __init_kthread_worker+0x70/0x70
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8b7cce0d

IB/ucma: Fix lockdep warning in ucma_lock_files · 31b57b87

由 Haggai Eran 提交于 7月 07, 2015

The ucma_lock_files() locks the mut mutex on two files, e.g. for migrating
an ID. Use mutex_lock_nested() to prevent the warning below.

 =============================================
 [ INFO: possible recursive locking detected ]
 4.1.0-rc6-hmm+ #40 Tainted: G           O
 ---------------------------------------------
 pingpong_rpc_se/10260 is trying to acquire lock:
  (&file->mut){+.+.+.}, at: [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]

 but task is already holding lock:
  (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&file->mut);
   lock(&file->mut);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by pingpong_rpc_se/10260:
  #0:  (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]

 stack backtrace:
 CPU: 0 PID: 10260 Comm: pingpong_rpc_se Tainted: G           O    4.1.0-rc6-hmm+ #40
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
  ffff8801f85b63d0 ffff880195677b58 ffffffff81668f49 0000000000000001
  ffffffff825cbbe0 ffff880195677c38 ffffffff810bb991 ffff880100000000
  ffff880100000000 ffff880100000001 ffff8801f85b7010 ffffffff8121bee9
 Call Trace:
  [<ffffffff81668f49>] dump_stack+0x4f/0x6e
  [<ffffffff810bb991>] __lock_acquire+0x741/0x1820
  [<ffffffff8121bee9>] ? dput+0x29/0x320
  [<ffffffff810bcb38>] lock_acquire+0xc8/0x240
  [<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffff8166b901>] ? mutex_lock_nested+0x291/0x3e0
  [<ffffffff8166b6d5>] mutex_lock_nested+0x65/0x3e0
  [<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffff810baeed>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff8166b66e>] ? mutex_unlock+0xe/0x10
  [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffffa0478474>] ucma_write+0xa4/0xb0 [rdma_ucm]
  [<ffffffff81200674>] __vfs_write+0x34/0x100
  [<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
  [<ffffffff810ec055>] ? current_kernel_time+0xc5/0xe0
  [<ffffffff812aa4d3>] ? security_file_permission+0x23/0x90
  [<ffffffff8120088d>] ? rw_verify_area+0x5d/0xe0
  [<ffffffff812009bb>] vfs_write+0xab/0x120
  [<ffffffff81201519>] SyS_write+0x59/0xd0
  [<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
  [<ffffffff8166ffee>] system_call_fastpath+0x12/0x76
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

31b57b87

rds: rds_ib_device.refcount overflow · 4fabb594

由 Wengang Wang 提交于 7月 06, 2015

Fixes: 3e0249f9 ("RDS/IB: add refcount tracking to struct rds_ib_device")

There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.

A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

fix is to drop refcount when rds_ib_alloc_fmr failed.
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4fabb594

RDMA/nes: Fix for incorrect recording of the MAC address · 0a691272

由 Tatyana Nikolova 提交于 7月 02, 2015

Fix for incorrect recording of the MAC address
Signed-off-by: NTatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0a691272

RDMA/nes: Fix for resolving the neigh · 165d6822

由 Tatyana Nikolova 提交于 7月 02, 2015

Neighbor resolution doesn't work without this fix
Signed-off-by: NTatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

165d6822

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功