1. 14 8月, 2015 5 次提交
    • K
      block: kill merge_bvec_fn() completely · 8ae12666
      Kent Overstreet 提交于
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: NDongsu Park <dpark@posteo.net>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8ae12666
    • K
      md/raid5: get rid of bio_fits_rdev() · 7140aafc
      Kent Overstreet 提交于
      Remove bio_fits_rdev() as sufficient merge_bvec_fn() handling is now
      performed by blk_queue_split() in md_make_request().
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      [dpark: add more description in commit message]
      Signed-off-by: NDongsu Park <dpark@posteo.net>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7140aafc
    • M
      md/raid5: split bio for chunk_aligned_read · 7ef6b12a
      Ming Lin 提交于
      If a read request fits entirely in a chunk, it will be passed directly to the
      underlying device (providing it hasn't failed of course).  If it doesn't fit,
      the slightly less efficient path that uses the stripe_cache is used.
      Requests that get to the stripe cache are always completely split up as
      necessary.
      
      So with RAID5, ripping out the merge_bvec_fn doesn't cause it to stop work,
      but could cause it to take the less efficient path more often.
      
      All that is needed to manage this is for 'chunk_aligned_read' do some bio
      splitting, much like the RAID0 code does.
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7ef6b12a
    • K
      bcache: remove driver private bio splitting code · 749b61da
      Kent Overstreet 提交于
      The bcache driver has always accepted arbitrarily large bios and split
      them internally.  Now that every driver must accept arbitrarily large
      bios this code isn't nessecary anymore.
      
      Cc: linux-bcache@vger.kernel.org
      Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      [dpark: add more description in commit message]
      Signed-off-by: NDongsu Park <dpark@posteo.net>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      749b61da
    • K
      block: make generic_make_request handle arbitrarily sized bios · 54efd50b
      Kent Overstreet 提交于
      The way the block layer is currently written, it goes to great lengths
      to avoid having to split bios; upper layer code (such as bio_add_page())
      checks what the underlying device can handle and tries to always create
      bios that don't need to be split.
      
      But this approach becomes unwieldy and eventually breaks down with
      stacked devices and devices with dynamic limits, and it adds a lot of
      complexity. If the block layer could split bios as needed, we could
      eliminate a lot of complexity elsewhere - particularly in stacked
      drivers. Code that creates bios can then create whatever size bios are
      convenient, and more importantly stacked drivers don't have to deal with
      both their own bio size limitations and the limitations of the
      (potentially multiple) devices underneath them.  In the future this will
      let us delete merge_bvec_fn and a bunch of other code.
      
      We do this by adding calls to blk_queue_split() to the various
      make_request functions that need it - a few can already handle arbitrary
      size bios. Note that we add the call _after_ any call to
      blk_queue_bounce(); this means that blk_queue_split() and
      blk_recalc_rq_segments() don't need to be concerned with bouncing
      affecting segment merging.
      
      Some make_request_fn() callbacks were simple enough to audit and verify
      they don't need blk_queue_split() calls. The skipped ones are:
      
       * nfhd_make_request (arch/m68k/emu/nfblock.c)
       * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
       * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
       * brd_make_request (ramdisk - drivers/block/brd.c)
       * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
       * loop_make_request
       * null_queue_bio
       * bcache's make_request fns
      
      Some others are almost certainly safe to remove now, but will be left
      for future patches.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
      Signed-off-by: NDongsu Park <dpark@posteo.net>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      54efd50b
  2. 12 8月, 2015 1 次提交
    • S
      block: don't access bio->bi_error after bio_put() · 9b81c842
      Sasha Levin 提交于
      Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few
      dereferences of 'bio' after a call to bio_put(). This causes use-after-frees
      such as:
      
      [521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714
      [521120.720638] Read of size 4 by task mount.ocfs2/9644
      [521120.721212] =============================================================================
      [521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected
      [521120.722968] -----------------------------------------------------------------------------
      [521120.722968]
      [521120.723915] Disabling lock debugging due to kernel taint
      [521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080
      [521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800
      [521120.726037]
      [521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff  ...6............
      [521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
      [521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00  ................
      [521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff  ................
      [521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff  ...........6....
      [521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff  .s".....@..<....
      [521120.736667] Object ffff880f36b38790: 00 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00  ................
      [521120.737596] Object ffff880f36b387a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.738524] Object ffff880f36b387b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.739388] Object ffff880f36b387c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.740277] Object ffff880f36b387d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.741187] Object ffff880f36b387e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.742233] Object ffff880f36b387f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [521120.743229] CPU: 41 PID: 9644 Comm: mount.ocfs2 Tainted: G    B           4.2.0-rc6-next-20150810-sasha-00039-gf909086 #2420
      [521120.744274]  ffff880f36b38000 ffff880d89c8f638 ffffffffb6e9ba8a ffff880101c0e5c0
      [521120.745025]  ffff880d89c8f668 ffffffffad76a313 ffff880101c0e5c0 ffffea003cdace00
      [521120.745908]  ffff880f36b38700 ffff880f36b38798 ffff880d89c8f690 ffffffffad772854
      [521120.747063] Call Trace:
      [521120.747520] dump_stack (lib/dump_stack.c:52)
      [521120.748053] print_trailer (mm/slub.c:653)
      [521120.748582] object_err (mm/slub.c:660)
      [521120.749079] kasan_report_error (include/linux/kasan.h:20 mm/kasan/report.c:152 mm/kasan/report.c:194)
      [521120.750834] __asan_report_load4_noabort (mm/kasan/report.c:250)
      [521120.753580] dio_bio_complete (fs/direct-io.c:478)
      [521120.755752] do_blockdev_direct_IO (fs/direct-io.c:494 fs/direct-io.c:1291)
      [521120.759765] __blockdev_direct_IO (fs/direct-io.c:1322)
      [521120.761658] blkdev_direct_IO (fs/block_dev.c:162)
      [521120.762993] generic_file_read_iter (mm/filemap.c:1738)
      [521120.767405] blkdev_read_iter (fs/block_dev.c:1649)
      [521120.768556] __vfs_read (fs/read_write.c:423 fs/read_write.c:434)
      [521120.772126] vfs_read (fs/read_write.c:454)
      [521120.773118] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594)
      [521120.776062] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
      [521120.777375] Memory state around the buggy address:
      [521120.778118]  ffff880f36b38600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [521120.779211]  ffff880f36b38680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [521120.780315] >ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [521120.781465]                          ^
      [521120.782083]  ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [521120.783717]  ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [521120.784818] ==================================================================
      
      This patch fixes a few of those places that I caught while auditing the patch, but the
      original patch should be audited further for more occurences of this issue since I'm
      not too familiar with the code.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9b81c842
  3. 29 7月, 2015 2 次提交
    • J
      block: manipulate bio->bi_flags through helpers · b7c44ed9
      Jens Axboe 提交于
      Some places use helpers now, others don't. We only have the 'is set'
      helper, add helpers for setting and clearing flags too.
      
      It was a bit of a mess of atomic vs non-atomic access. With
      BIO_UPTODATE gone, we don't have any risk of concurrent access to the
      flags. So relax the restriction and don't make any of them atomic. The
      flags that do have serialization issues (reffed and chained), we
      already handle those separately.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      b7c44ed9
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  4. 17 7月, 2015 1 次提交
  5. 16 7月, 2015 1 次提交
  6. 15 7月, 2015 24 次提交
    • C
      intel_scu_ipc: move local memory initialization out of a mutex · 8642d7f8
      Christophe JAILLET 提交于
      '{ }' and memset will both reset the cbuf buffer.
      Only once is enough and this can be done outside fo the mutex.
      Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: NDarren Hart <dvhart@linux.intel.com>
      8642d7f8
    • J
      IB/core: Destroy ocrdma_dev_id IDR on module exit · d8b2ba7c
      Johannes Thumshirn 提交于
      Destroy ocrdma_dev_id IDR on module exit, reclaiming the allocated memory.
      
      This was detected by the following semantic patch (written by Luis Rodriguez
      <mcgrof@suse.com>)
      <SmPL>
      @ defines_module_init @
      declarer name module_init, module_exit;
      declarer name DEFINE_IDR;
      identifier init;
      @@
      
      module_init(init);
      
      @ defines_module_exit @
      identifier exit;
      @@
      
      module_exit(exit);
      
      @ declares_idr depends on defines_module_init && defines_module_exit @
      identifier idr;
      @@
      
      DEFINE_IDR(idr);
      
      @ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
      identifier declares_idr.idr, defines_module_exit.exit;
      @@
      
      exit(void)
      {
       ...
       idr_destroy(&idr);
       ...
      }
      
      @ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
      identifier declares_idr.idr, defines_module_exit.exit;
      @@
      
      exit(void)
      {
       ...
       +idr_destroy(&idr);
       }
      
      </SmPL>
      Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      d8b2ba7c
    • J
      IB/core: Destroy multcast_idr on module exit · 45d25420
      Johannes Thumshirn 提交于
      Destroy multcast_idr on module exit, reclaiming the allocated memory.
      
      This was detected by the following semantic patch (written by Luis Rodriguez
      <mcgrof@suse.com>)
      <SmPL>
      @ defines_module_init @
      declarer name module_init, module_exit;
      declarer name DEFINE_IDR;
      identifier init;
      @@
      
      module_init(init);
      
      @ defines_module_exit @
      identifier exit;
      @@
      
      module_exit(exit);
      
      @ declares_idr depends on defines_module_init && defines_module_exit @
      identifier idr;
      @@
      
      DEFINE_IDR(idr);
      
      @ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
      identifier declares_idr.idr, defines_module_exit.exit;
      @@
      
      exit(void)
      {
       ...
       idr_destroy(&idr);
       ...
      }
      
      @ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
      identifier declares_idr.idr, defines_module_exit.exit;
      @@
      
      exit(void)
      {
       ...
       +idr_destroy(&idr);
      }
      
      </SmPL>
      Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      45d25420
    • D
      IB/mlx4: Optimize do_slave_init · d9a047ae
      Doug Ledford 提交于
      There is little chance our memory allocation will fail, so we can
      combine initializing the work structs with allocating them instead of
      looping through all of them once to allocate and again to initialize.
      Then when we need to actually find out if our device is up or in the
      process of going down, have all of our work structs batched up, take the
      spin_lock once and only once, and do all of the batch under the one
      spin_lock invocation instead of incurring all of the locked memory cycles
      we would otherwise incur to take/release the spin_lock over and over
      again.
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      d9a047ae
    • D
      IB/mlx4: Fix memory leak in do_slave_init · 9bbf282d
      Doug Ledford 提交于
      We create a number of work structs to be queued up to a workqueue, and
      on completion of the workqueue handler, the workqueue handler frees the
      allocated memory.  If, however, we don't queue the work struct because
      the device is going down, then we need to free the memory ourselves.
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      9bbf282d
    • M
      IB/mlx4: Optimize freeing of items on error unwind · a39a98ff
      Maninder Singh 提交于
      On failure, we loop through all possible pointers and test them before
      calling kfree.  But really, why even attempt to free items we didn't
      allocate when we can easily loop through exactly and only the devices
      for which the original memory allocation succeeded and free just those.
      Signed-off-by: NManinder Singh <maninder1.s@samsung.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      a39a98ff
    • O
      IB/mlx4: Fix use of flow-counters for process_mad · 43bfb972
      Or Gerlitz 提交于
      For IB links, reading HCA flow counters through iboe_process_mad() should
      be used when mlx4_ib_process_mad() is invoked only for VFs PMA queries and
      exactly nothing else.
      
      Fixes: 7193a141 ('IB/mlx4: Set VF to read from QP counters')
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      43bfb972
    • V
      IB/ipath: Convert use of __constant_<foo> to <foo> · cb1ff431
      Vaishali Thakkar 提交于
      In little endian cases, the macros be16_to_cpu and cpu_to_be64
      unfolds to __swab{16,64} which provides special case for constants.
      In big endian cases, __constant_be16_to_cpu and be16_to_cpu
      expand directly to the same expression. The same applies for
      __constant_cpu_to_be64 and cpu_to_be64.
      
      So, replace __constant_be16_to_cpu with be16_to_cpu and
      __constant_cpu_to_be64 with cpu_to_be64, with the goal of getting
      rid of the definition of __constant_be16_to_cpu and
      __constant_cpu_to_be64 completely.
      Signed-off-by: NVaishali Thakkar <vthakkar1994@gmail.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      cb1ff431
    • E
      IB/ipoib: Set MTU to max allowed by mode when mode changes · edcd2a74
      Erez Shitrit 提交于
      When switching between modes (datagram / connected) change the MTU
      accordingly.
      datagram mode up to 4K, connected mode up to (64K - 0x10).
      Signed-off-by: NELi Cohen <eli@mellanox.com>
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      edcd2a74
    • Y
      IB/ipoib: Scatter-Gather support in connected mode · c4268778
      Yuval Shaia 提交于
      By default, IPoIB-CM driver uses 64k MTU. Larger MTU gives better
      performance.
      This MTU plus overhead puts the memory allocation for IP based packets at
      32 4k pages (order 5), which have to be contiguous.
      When the system memory under pressure, it was observed that allocating 128k
      contiguous physical memory is difficult and causes serious errors (such as
      system becomes unusable).
      
      This enhancement resolve the issue by removing the physically contiguous
      memory requirement using Scatter/Gather feature that exists in Linux stack.
      
      With this fix Scatter-Gather will be supported also in connected mode.
      
      This change reverts some of the change made in commit e112373f
      ("IPoIB/cm: Reduce connected mode TX object size").
      
      The ability to use SG in IPoIB CM is possible because the coupling
      between NETIF_F_SG and NETIF_F_CSUM was removed in commit
      ec5f0615 ("net: Kill link between CSUM and SG features.")
      Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
      Acked-by: NChristian Marie <christian@ponies.io>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      c4268778
    • C
      IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES · 59d40dd9
      Carol L Soto 提交于
      ib_ucm_release_dev clears the wrong bit if devnum is greater
      than IB_UCM_MAX_DEVICES.
      Signed-off-by: NCarol L Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      59d40dd9
    • H
      IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush · 8b7cce0d
      Haggai Eran 提交于
      __ipoib_ib_dev_flush calls itself recursively on child devices, and lockdep
      complains about locking vlan_rwsem twice (see below). Use down_read_nested
      instead of down_read to prevent the warning.
      
       =============================================
       [ INFO: possible recursive locking detected ]
       4.1.0-rc4+ #36 Tainted: G           O
       ---------------------------------------------
       kworker/u20:2/261 is trying to acquire lock:
        (&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
      
       but task is already holding lock:
        (&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&priv->vlan_rwsem);
         lock(&priv->vlan_rwsem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       3 locks held by kworker/u20:2/261:
        #0:  ("%s""ipoib_flush"){.+.+..}, at: [<ffffffff810827cc>] process_one_work+0x15c/0x760
        #1:  ((&priv->flush_heavy)){+.+...}, at: [<ffffffff810827cc>] process_one_work+0x15c/0x760
        #2:  (&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
      
       stack backtrace:
       CPU: 3 PID: 261 Comm: kworker/u20:2 Tainted: G           O    4.1.0-rc4+ #36
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
       Workqueue: ipoib_flush ipoib_ib_dev_flush_heavy [ib_ipoib]
        ffff8801c6c54790 ffff8801c9927af8 ffffffff81665238 0000000000000001
        ffffffff825b5b30 ffff8801c9927bd8 ffffffff810bba51 ffff880100000000
        ffffffff00000001 ffff880100000001 ffff8801c6c55428 ffff8801c6c54790
       Call Trace:
        [<ffffffff81665238>] dump_stack+0x4f/0x6f
        [<ffffffff810bba51>] __lock_acquire+0x741/0x1820
        [<ffffffff810bcbf8>] lock_acquire+0xc8/0x240
        [<ffffffffa0791e2a>] ? __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
        [<ffffffff81669d2c>] down_read+0x4c/0x70
        [<ffffffffa0791e2a>] ? __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
        [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
        [<ffffffffa0791e4a>] __ipoib_ib_dev_flush+0x5a/0x2b0 [ib_ipoib]
        [<ffffffffa07920ba>] ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib]
        [<ffffffff81082871>] process_one_work+0x201/0x760
        [<ffffffff810827cc>] ? process_one_work+0x15c/0x760
        [<ffffffff81082ef0>] worker_thread+0x120/0x4d0
        [<ffffffff81082dd0>] ? process_one_work+0x760/0x760
        [<ffffffff81082dd0>] ? process_one_work+0x760/0x760
        [<ffffffff81088b7e>] kthread+0xfe/0x120
        [<ffffffff81088a80>] ? __init_kthread_worker+0x70/0x70
        [<ffffffff8166c6e2>] ret_from_fork+0x42/0x70
        [<ffffffff81088a80>] ? __init_kthread_worker+0x70/0x70
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      8b7cce0d
    • H
      IB/ucma: Fix lockdep warning in ucma_lock_files · 31b57b87
      Haggai Eran 提交于
      The ucma_lock_files() locks the mut mutex on two files, e.g. for migrating
      an ID. Use mutex_lock_nested() to prevent the warning below.
      
       =============================================
       [ INFO: possible recursive locking detected ]
       4.1.0-rc6-hmm+ #40 Tainted: G           O
       ---------------------------------------------
       pingpong_rpc_se/10260 is trying to acquire lock:
        (&file->mut){+.+.+.}, at: [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
      
       but task is already holding lock:
        (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&file->mut);
         lock(&file->mut);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       1 lock held by pingpong_rpc_se/10260:
        #0:  (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]
      
       stack backtrace:
       CPU: 0 PID: 10260 Comm: pingpong_rpc_se Tainted: G           O    4.1.0-rc6-hmm+ #40
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
        ffff8801f85b63d0 ffff880195677b58 ffffffff81668f49 0000000000000001
        ffffffff825cbbe0 ffff880195677c38 ffffffff810bb991 ffff880100000000
        ffff880100000000 ffff880100000001 ffff8801f85b7010 ffffffff8121bee9
       Call Trace:
        [<ffffffff81668f49>] dump_stack+0x4f/0x6e
        [<ffffffff810bb991>] __lock_acquire+0x741/0x1820
        [<ffffffff8121bee9>] ? dput+0x29/0x320
        [<ffffffff810bcb38>] lock_acquire+0xc8/0x240
        [<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
        [<ffffffff8166b901>] ? mutex_lock_nested+0x291/0x3e0
        [<ffffffff8166b6d5>] mutex_lock_nested+0x65/0x3e0
        [<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
        [<ffffffff810baeed>] ? trace_hardirqs_on+0xd/0x10
        [<ffffffff8166b66e>] ? mutex_unlock+0xe/0x10
        [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
        [<ffffffffa0478474>] ucma_write+0xa4/0xb0 [rdma_ucm]
        [<ffffffff81200674>] __vfs_write+0x34/0x100
        [<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
        [<ffffffff810ec055>] ? current_kernel_time+0xc5/0xe0
        [<ffffffff812aa4d3>] ? security_file_permission+0x23/0x90
        [<ffffffff8120088d>] ? rw_verify_area+0x5d/0xe0
        [<ffffffff812009bb>] vfs_write+0xab/0x120
        [<ffffffff81201519>] SyS_write+0x59/0xd0
        [<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
        [<ffffffff8166ffee>] system_call_fastpath+0x12/0x76
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      31b57b87
    • T
      RDMA/nes: Fix for incorrect recording of the MAC address · 0a691272
      Tatyana Nikolova 提交于
      Fix for incorrect recording of the MAC address
      Signed-off-by: NTatyana Nikolova <Tatyana.E.Nikolova@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      0a691272
    • T
      RDMA/nes: Fix for resolving the neigh · 165d6822
      Tatyana Nikolova 提交于
      Neighbor resolution doesn't work without this fix
      Signed-off-by: NTatyana Nikolova <Tatyana.E.Nikolova@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      165d6822
    • T
      RDMA/core: Fixes for port mapper client registration · a7f2f24c
      Tatyana Nikolova 提交于
      Fixes to allow clients to make remove mapping requests, after
      they have provided the user space service with the mapping
      information, they are using when the service is restarted.
      
      1) Adding IWPM_REG_VALID, IWPM_REG_INCOMPL and IWPM_REG_UNDEF
         registration types for the port mapper clients and functions
         to set/check the registration type.
      2) If the port mapper user space service is not available to register
         the client, then its registration stays IWPM_REG_UNDEF and the
         registration isn't checked until the service becomes available
         (no mappings are possible, if the user space service isn't running).
      3) After the service is restarted, the user space port mapper pid is set
         to valid and the client registration is set to IWPM_REG_INCOMPL
         to allow the client to make remove mapping requests.
      Signed-off-by: NTatyana Nikolova <Tatyana.E.Nikolova@intel.com>
      Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
      Tested-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      a7f2f24c
    • A
      IB/IPoIB: Fix bad error flow in ipoib_add_port() · 58e9cc90
      Amir Vadai 提交于
      Error values of ib_query_port() and ib_query_device() weren't propagated
      correctly. Because of that, ipoib_add_port() could return NULL value,
      which escaped the IS_ERR() check in ipoib_add_one() and we crashed.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      58e9cc90
    • M
      IB/mlx4: Do not attemp to report HCA clock offset on VFs · 8a7ff14d
      Matan Barak 提交于
      mlx4 VFs can provide CQE raw time-stamping services, but they
      don't have the hca core clock mapped to their PCI bars.
      
      As such, we should not attempt to query and report the clock offset
      to user space for VFs. Doing so causes query_device over VFs to fail
      with -ENOSUPP.
      
      Fixes: 4b664c43 ('IB/mlx4: Add support for CQ time-stamping')
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      8a7ff14d
    • E
      IB/cm: Do not queue work to a device that's going away · be4b4993
      Erez Shitrit 提交于
      Whenever ib_cm gets remove_one call, like when there is a hot-unplug
      event, the driver should mark itself as going_down and confirm that no
      new works are going to be queued for that device.
      so, the order of the actions are:
      1. mark the going_down bit.
      2. flush the wq.
      3. [make sure no new works for that device.]
      4. unregister mad agent.
      
      otherwise, works that are already queued can be scheduled after the mad
      agent was freed.
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      be4b4993
    • S
      IB/srp: Avoid using uninitialized variable · 3fdf70ac
      Sagi Grimberg 提交于
      We might return res which is not initialized. Also
      reduce code duplication by exporting srp_parse_tmo so
      srp_tmo_set can reuse it.
      
      Detected by Coverity.
      Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
      Signed-off-by: NJenny Falkovich <jennyf@mellanox.com>
      Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      3fdf70ac
    • V
      IB/srpt: Convert use of __constant_cpu_to_beXX to cpu_to_beXX · b356c1c1
      Vaishali Thakkar 提交于
      In little endian cases, the macro cpu_to_be{16,32,64} unfolds to
      __swab{16,32,64} which provides special case for constants. In
      big endian cases, __constant_cpu_to_be{16,32,64} and
      cpu_to_be{16,32,64} expand directly to the same expression. So,
      replace __constant_cpu_to_be{16,32,64} with cpu_to_be{16,32,64}
      with the goal of getting rid of the definitions of
      __constant_cpu_to_be{16,32,64} completely.
      
      The Coccinelle semantic patch that performs this transformation
      is as follows:
      
      @@expression x;@@
      
      (
      - __constant_cpu_to_be16(x)
      + cpu_to_be16(x)
      |
      - __constant_cpu_to_be32(x)
      + cpu_to_be32(x)
      |
      - __constant_cpu_to_be64(x)
      + cpu_to_be64(x)
      )
      Signed-off-by: NVaishali Thakkar <vthakkar1994@gmail.com>
      Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      b356c1c1
    • I
      IB/mad: Remove improper use of BUG_ON · 3b8ab700
      Ira Weiny 提交于
      We recently added BUG_ON's which were inappropriate for a condition which
      should never happen. Change these to be WARN_ON_ONCE as a debugging aid.
      
      Fixes: 4cd7c947 ('IB/mad: Add support for additional MAD info to/from drivers')
      Signed-off-by: NIra Weiny <ira.weiny@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      3b8ab700
    • I
      IB/mad: Fix compare between big endian and cpu endian · cd4cd565
      Ira Weiny 提交于
      The define OPA_LID_PERMISSIVE is big endian and was compared to the
      cpu endian variable opa_drslid.
      
      Problem caught by 0-day build infrastructure.
      
      Fixes: 8e4349d1 (IB/mad: Add final OPA MAD processing)
      Signed-off-by: NIra Weiny <ira.weiny@intel.com>
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: NJohn, Jubin <jubin.john@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      cd4cd565
    • H
      IB: Add rdma_cap_ib_switch helper and use where appropriate · 4139032b
      Hal Rosenstock 提交于
      Persuant to Liran's comments on node_type on linux-rdma
      mailing list:
      
      In an effort to reform the RDMA core and ULPs to minimize use of
      node_type in struct ib_device, an additional bit is added to
      struct ib_device for is_switch (IB switch). This is needed
      to be initialized by any IB switch device driver. This is a
      NEW requirement on such device drivers which are all
      "out of tree".
      
      In addition, an ib_switch helper was added to ib_verbs.h
      based on the is_switch device bit rather than node_type
      (although those should be consistent).
      
      The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs)
      as well as (IPoIB and SRP) ULPs are updated where
      appropriate to use this new helper. In some cases,
      the helper is now used under the covers of using
      rdma_[start end]_port rather than the open coding
      previously used.
      Reviewed-by: NSean Hefty <sean.hefty@intel.com>
      Reviewed-By: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Tested-by: NIra Weiny <ira.weiny@intel.com>
      Signed-off-by: NHal Rosenstock <hal@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      4139032b
  7. 14 7月, 2015 2 次提交
  8. 13 7月, 2015 4 次提交
    • S
      s390/dasd: fix kernel panic when alias is set offline · f81a49d1
      Stefan Haberland 提交于
      The dasd device driver selects which (alias or base) device is used
      for a given requests when the request is build. If the chosen alias
      device is set offline before the request gets queued to the device
      queue the starting function may use device structures that are
      already freed. This might lead to a hanging offline process or a
      kernel panic.
      
      Add a check to the starting function that returns the request to the
      upper layer if the device is already in offline processing.
      
      In addition to that prevent that an alias device that's already in
      offline processing gets chosen as start device.
      Reviewed-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Reviewed-by: NPeter Oberparleiter <peter.oberparleiter@linux.vnet.ibm.com>
      Signed-off-by: NStefan Haberland <stefan.haberland@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      f81a49d1
    • L
      Revert "drm/i915: Use crtc_state->active in primary check_plane func" · 01e2d062
      Linus Torvalds 提交于
      This reverts commit dec4f799.
      
      Jörg Otte reports a NULL pointder dereference due to this commit, as
      'crtc_state' very much can be NULL:
      
              crtc_state = state->base.state ?
                      intel_atomic_get_crtc_state(state->base.state, intel_crtc) : NULL;
      
      So the change to test 'crtc_state->base.active' cannot possibly be
      correct as-is.
      
      There may be some other minimal fix (like just checking crtc_state for
      NULL), but I'm just reverting it now for the rc2 release, and people
      like Daniel Vetter who actually know this code will figure out what the
      right solution is in the longer term.
      Reported-and-bisected-by: NJörg Otte <jrg.otte@gmail.com>
      Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01e2d062
    • O
      can: replace timestamp as unique skb attribute · d3b58c47
      Oliver Hartkopp 提交于
      Commit 514ac99c "can: fix multiple delivery of a single CAN frame for
      overlapping CAN filters" requires the skb->tstamp to be set to check for
      identical CAN skbs.
      
      Without timestamping to be required by user space applications this timestamp
      was not generated which lead to commit 36c01245 "can: fix loss of CAN frames
      in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
      by introducing several __net_timestamp() calls.
      
      This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
      to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
      in mainline Linux.
      
      This patch removes the timestamp dependency and uses an atomic counter to
      create an unique identifier together with the skbuff pointer.
      
      Btw: the new skbcnt element introduced in struct can_skb_priv has to be
      initialized with zero in out-of-tree drivers which are not using
      alloc_can{,fd}_skb() too.
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      d3b58c47
    • J
      can: c_can: Fix default pinmux glitch at init · 03336519
      J.D. Schroeder 提交于
      The previous change 3973c526 (net: can: c_can: Disable pins when CAN
      interface is down) causes a slight glitch on the pinctrl settings when used.
      Since commit ab78029e (drivers/pinctrl: grab default handles from device core),
      the device core will automatically set the default pins. This causes the pins
      to be momentarily set to the default and then to the sleep state in
      register_c_can_dev(). By adding an optional "enable" state, boards can set the
      default pin state to be disabled and avoid the glitch when the switch from
      default to sleep first occurs. If the "enable" state is not available
      c_can_pinctrl_select_state() falls back to using the "default" pinctrl state.
      
      [Roger Q] - Forward port to v4.2 and use pinctrl_get_select().
      Signed-off-by: NJ.D. Schroeder <jay.schroeder@garmin.com>
      Signed-off-by: NRoger Quadros <rogerq@ti.com>
      Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      03336519