1. 16 Feb 2016, 1 commit
    • mm/gup: Introduce get_user_pages_remote() · 1e987790
      Committed by Dave Hansen
      For protection keys, we need to understand whether protections
      should be enforced in software or not.  In general, we enforce
      protections when working on our own task, but not when on others.
      We call these "current" and "remote" operations.
      
      This patch introduces a new get_user_pages() variant:
      
              get_user_pages_remote()
      
      It is meant to replace get_user_pages() calls made on a
      non-current tsk/mm.
      
      We also introduce a new gup flag: FOLL_REMOTE which can be used
      for the "__" gup variants to get this new behavior.
      
      The uprobes is_trap_at_addr() location holds mmap_sem and
      calls get_user_pages(current->mm) on an instruction address.  This
      makes it a pretty unique gup caller.  Being an instruction access
      and also really originating from the kernel (vs. the app), I opted
      to consider this a 'remote' access where protection keys will not
      be enforced.
      
      Without protection keys, this patch should not change any behavior.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
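A minimal userspace sketch of the "current" vs. "remote" distinction described above. The struct names, the `FOLL_REMOTE_SKETCH` bit value, and the `pkeys_enforced()` helper are hypothetical stand-ins, not the kernel's gup internals; the sketch only models the stated policy: enforce protection keys when a task works on its own mm, and skip enforcement for accesses to another mm or when remote semantics are explicitly requested.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct mm_struct and struct task_struct. */
struct mm_sketch { int id; };
struct task_sketch { struct mm_sketch *mm; };

/* Hypothetical flag bit; the real FOLL_REMOTE value is defined by the kernel. */
#define FOLL_REMOTE_SKETCH 0x1u

/*
 * Return true if protection keys should be enforced in software for a
 * gup-style access to target_mm made on behalf of current_task.
 * "Current" access (own mm, no remote flag): enforce.
 * "Remote" access (another mm, or remote flag requested): do not enforce.
 */
static bool pkeys_enforced(const struct task_sketch *current_task,
                           const struct mm_sketch *target_mm,
                           unsigned int gup_flags)
{
    if (gup_flags & FOLL_REMOTE_SKETCH)
        return false;
    return target_mm == current_task->mm;
}
```

In this model, the uprobes is_trap_at_addr() case (the caller's own mm, deliberately treated as remote) corresponds to passing `FOLL_REMOTE_SKETCH`, which yields no enforcement.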
  2. 13 Feb 2016, 2 commits
    • IB/mlx5: Fix RC transport send queue overhead computation · 75c1657e
      Committed by Leon Romanovsky
      Fix the RC QPs' send queue overhead computation to take into account
      two additional segments in the WQE which are needed for registration
      operations.

      The ATOMIC and UMR segments can't coexist, so choose the maximum of
      the two.

      Commit 9e65dc37 ("IB/mlx5: Fix RC transport send queue overhead
      computation") was intended to update the RC transport, as its commit
      message states, but added the code to the UC transport.
      
      Fixes: 9e65dc37 ("IB/mlx5: Fix RC transport send queue overhead computation")
      Signed-off-by: Kamal Heib <kamalh@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
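The fix amounts to reserving space for the larger of two mutually exclusive segment groups. The sketch below assumes the ATOMIC group is a remote-address segment plus an atomic segment, and the UMR group is a UMR control segment plus a memory-key segment; the byte sizes are caller-supplied placeholders, not the real `sizeof` values of the mlx5 WQE structs, and the function name is hypothetical.

```c
#include <stddef.h>

/* Placeholder segment sizes in bytes (hypothetical; the driver uses sizeof). */
struct wqe_seg_sizes {
    size_t ctrl;      /* control segment, present in every WQE */
    size_t raddr;     /* remote address segment (ATOMIC group) */
    size_t atomic;    /* atomic segment (ATOMIC group) */
    size_t umr_ctrl;  /* UMR control segment (registration group) */
    size_t mkey;      /* memory key segment (registration group) */
};

static size_t max_size(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* RC send queue overhead: the ATOMIC and UMR groups never share a WQE,
 * so only the larger of the two needs to be reserved. */
static size_t rc_sq_overhead(const struct wqe_seg_sizes *s)
{
    size_t atomic_group = s->raddr + s->atomic;
    size_t umr_group = s->umr_ctrl + s->mkey;
    return s->ctrl + max_size(atomic_group, umr_group);
}
```

With illustrative sizes {ctrl 16, raddr 16, atomic 16, umr_ctrl 48, mkey 64}, the UMR group (112) dominates the ATOMIC group (32), giving an overhead of 128 bytes.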
    • IB/ipoib: fix for rare multicast join race condition · 08bc3276
      Committed by Alex Estrin
      A narrow window for a race condition still exists between the
      multicast join thread and the *dev_flush workers.
      A kernel crash caused by prolonged erratic link state changes
      was observed (most likely due to faulty cabling):
      
      [167275.656270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib]
      [167275.674443] PGD 0
      [167275.677373] Oops: 0000 [#1] SMP
      ...
      [167275.977530] Call Trace:
      [167275.982225]  [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib]
      [167275.992024]  [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490 [ib_ipoib]
      [167276.002149]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
      [167276.010754]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
      [167276.019088]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
      [167276.027737]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
      This is where the crash occurred:
      ipoib_mcast_join() {
      ..............
            rec.qkey      = priv->broadcast->mcmember.qkey;
                                             ^^^^^^^
      .....
       }
      The proposed patch should prevent the multicast join task from
      continuing if a link state change is detected.
      Signed-off-by: Alex Estrin <alex.estrin@intel.com>
      
      Changes from v4:
      - as suggested by Doug Ledford, optimized spinlock usage,
      i.e. ipoib_mcast_join() is called with lock held.
      Changes from v3:
      - sync with priv->lock before flag check.
      Changes from v2:
      - Move check for OPER_UP flag state to mcast_join() to
      ensure no event worker is in progress.
      - minor style fixes.
      Changes from v1:
      - No need to lock again if error detected.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
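A simplified userspace model of the fix: the join path checks an OPER_UP-style flag while holding the same lock the flush worker takes to clear it, so a join cannot proceed once a link state change has been recorded. The toy spinlock, struct, and function names are hypothetical; per the changelog, the kernel code checks the OPER_UP flag with priv->lock held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for priv->lock (hypothetical, userspace only). */
typedef struct { atomic_flag held; } toy_spinlock;

static void toy_lock(toy_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void toy_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct ipoib_priv_sketch {
    toy_spinlock lock;
    bool oper_up;       /* models the OPER_UP flag */
    int joins_started;  /* counts joins that actually proceeded */
};

/* Must be called with priv->lock held. */
static int mcast_join_locked(struct ipoib_priv_sketch *priv)
{
    if (!priv->oper_up)
        return -1;      /* link state changed: do not touch broadcast data */
    priv->joins_started++;
    return 0;
}

static int mcast_join_task(struct ipoib_priv_sketch *priv)
{
    toy_lock(&priv->lock);
    int ret = mcast_join_locked(priv);
    toy_unlock(&priv->lock);
    return ret;
}

/* A flush worker records the link state change under the same lock. */
static void dev_flush(struct ipoib_priv_sketch *priv)
{
    toy_lock(&priv->lock);
    priv->oper_up = false;
    toy_unlock(&priv->lock);
}
```

Because the flag is written and read under one lock, the join either runs entirely before the flush clears the flag or observes the cleared flag and bails out.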
  3. 12 Feb 2016, 1 commit
  4. 06 Feb 2016, 4 commits
  5. 04 Feb 2016, 2 commits
  6. 03 Feb 2016, 5 commits
  7. 23 Jan 2016, 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Committed by Al Viro
      These wrappers parallel mutex_{lock,unlock,trylock,is_locked,lock_nested}:
      inode_foo(inode) is mutex_foo(&inode->i_mutex).

      Please use these for access to ->i_mutex; over the coming cycle
      ->i_mutex will become an rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
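The wrapper pattern can be sketched in userspace C as follows. The toy mutex built on C11 `atomic_flag` and the `inode_sketch` struct are hypothetical stand-ins for `struct mutex` and `struct inode`. The point is that callers use only the `inode_*` wrappers, so the lock type behind `->i_mutex` can later change (for example to an rwsem) by editing the wrappers alone.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy mutex (hypothetical, userspace only) standing in for struct mutex. */
struct toy_mutex { atomic_flag locked; };

static void toy_mutex_lock(struct toy_mutex *m)
{
    while (atomic_flag_test_and_set_explicit(&m->locked, memory_order_acquire))
        ; /* spin until released */
}

static bool toy_mutex_trylock(struct toy_mutex *m)
{
    /* Succeeds only if the flag was previously clear. */
    return !atomic_flag_test_and_set_explicit(&m->locked, memory_order_acquire);
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
    atomic_flag_clear_explicit(&m->locked, memory_order_release);
}

struct inode_sketch { struct toy_mutex i_mutex; };

/* inode_foo(inode) == mutex_foo(&inode->i_mutex); callers never name
 * ->i_mutex directly. */
static inline void inode_lock(struct inode_sketch *inode)
{
    toy_mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode_sketch *inode)
{
    toy_mutex_unlock(&inode->i_mutex);
}

static inline bool inode_trylock(struct inode_sketch *inode)
{
    return toy_mutex_trylock(&inode->i_mutex);
}
```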
  8. 22 Jan 2016, 12 commits
  9. 20 Jan 2016, 12 commits
    • IB/srpt: Remove redundant wc array · f9a6ed62
      Committed by Sagi Grimberg
      The array is no longer used after the conversion to the new CQ API.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/qib: Improve ipoib UD performance · 967bcfc0
      Committed by Mike Marciniszyn
      Based on profiling, UD performance drops in case of processes
      in a single client due to excess context switches when
      the progress workqueue is scheduled.
      
      This is solved by modifying the heuristic to select
      direct progress instead of scheduling progress via
      the workqueue when UD-like situations are detected.
      Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Advertise RoCE v2 support · 4ed088e6
      Committed by Matan Barak
      Advertise RoCE v2 support in port_immutable attributes according to
      the hardware's capabilities. This enables the verbs stack to use
      RoCE v2 mode.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Create and use another QP1 for RoCEv2 · e1b866c6
      Committed by Moni Shoua
      The mlx4 driver uses a special QP to implement the GSI QP. This kind
      of QP allows the InfiniBand headers to be built in software.
      When mlx4 hardware builds the packet, it calculates the ICRC and puts
      it at the end of the payload. However, this ICRC calculation depends
      on the QP configuration, which is determined when the QP is modified
      (roce_mode during INIT->RTR).
      When receiving a packet, the ICRC verification doesn't depend on this
      configuration.
      Therefore, two GSI QPs are required for send (one for each RoCE
      version) and one GSI QP for receive.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
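A hypothetical sketch of the resulting QP selection: sends pick a GSI QP by RoCE version, because the ICRC computed on transmit depends on the QP's roce_mode, while receives use a single QP, because ICRC verification does not. The enum, struct, helper names, and QP numbers are illustrative only and do not reflect the driver's actual data structures.

```c
/* Hypothetical model of the GSI QP arrangement described above. */
enum roce_version { ROCE_V1 = 0, ROCE_V2 = 1 };

struct gsi_qps_sketch {
    int send_qpn[2]; /* one send QP per RoCE version: TX ICRC depends on roce_mode */
    int recv_qpn;    /* one receive QP: RX ICRC check is roce_mode-independent */
};

/* Choose the send QP matching the RoCE version of the outgoing packet. */
static int gsi_send_qp(const struct gsi_qps_sketch *g, enum roce_version v)
{
    return g->send_qpn[v];
}

/* All incoming GSI traffic is handled by the same receive QP. */
static int gsi_recv_qp(const struct gsi_qps_sketch *g)
{
    return g->recv_qpn;
}
```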
    • IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers · 3ef967a4
      Committed by Moni Shoua
      RoCEv2 packets are sent over IP/UDP protocols.
      The mlx4 driver uses a type of RAW QP to send packets for QP1 and
      therefore needs to build the network headers below BTH in software.

      This patch adds an option to build QP1 packets with IP and UDP headers
      if RoCEv2 is requested.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
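To make "the network headers below BTH" concrete, the sketch below totals the headers software would have to build in front of the payload for a QP1 packet, assuming no VLAN tag and no IP options. The header sizes are the standard ones (Ethernet 14, IPv4 20, IPv6 40, UDP 8, IB GRH 40, BTH 12 bytes); the macro and function names are hypothetical. RoCE v1 places a GRH between the Ethernet header and BTH, while RoCE v2 uses IP plus UDP, with 4791 being the IANA-assigned RoCE v2 UDP destination port.

```c
#include <stdbool.h>
#include <stddef.h>

/* Standard header sizes in bytes (no VLAN tag, no IP options). */
#define ETH_HLEN_SKETCH   14
#define IPV4_HLEN_SKETCH  20
#define IPV6_HLEN_SKETCH  40
#define UDP_HLEN_SKETCH    8
#define GRH_LEN_SKETCH    40  /* RoCE v1 carries an IB GRH instead of IP/UDP */
#define BTH_LEN_SKETCH    12

/* IANA-assigned UDP destination port for RoCE v2. */
#define ROCE_V2_UDP_DPORT 4791

/* Header bytes software must build for a QP1 packet (up to and including BTH). */
static size_t qp1_header_len(bool roce_v2, bool ipv4)
{
    size_t len = ETH_HLEN_SKETCH;

    if (roce_v2)
        len += (ipv4 ? IPV4_HLEN_SKETCH : IPV6_HLEN_SKETCH) + UDP_HLEN_SKETCH;
    else
        len += GRH_LEN_SKETCH; /* the ipv4 argument is ignored for RoCE v1 */

    return len + BTH_LEN_SKETCH;
}
```

Under these assumptions a RoCE v1 QP1 packet needs 66 header bytes, a RoCE v2 packet over IPv4 needs 54, and over IPv6 needs 74.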
    • IB/mlx4: Enable RoCE v2 when the IB device is added · 71a39bbb
      Committed by Moni Shoua
      If the hardware supports RoCE v2, we configure the hardware UDP
      port according to the RoCE v2 Annex when mlx4_ib device is added.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Support modify_qp for RoCE v2 · 3b5daf28
      Committed by Moni Shoua
      In order to support modify_qp for RoCE v2, we need to set
      the gid_type in the QP context.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Add support for setting RoCEv2 gids in hardware · 7e57b85c
      Committed by Moni Shoua
      To tell hardware about a gid with type RoCEv2, software needs a new
      modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can
      replace the old method, MLX4_SET_PORT_GID_TABLE, for RoCEv1 gids.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Add gid_type to GID properties · b699a859
      Committed by Moni Shoua
      The IB core adds a type property to struct ib_gid_attr.
      The mlx4 driver should take that into consideration when modifying or
      querying the hardware gid table.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Use hop-limit from IP stack for RoCE · c3efe750
      Committed by Matan Barak
      Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
      RoCE. Fix that by using ip4_dst_hoplimit and ip6_dst_hoplimit to obtain
      the hop limit values.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
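A simplified model of the behavioural change, assuming (as the kernel's dst hop-limit helpers broadly do) that a hop-limit metric on the route is preferred when present and the per-family stack default is used otherwise. The route struct and function names are hypothetical, and IPV6_DEFAULT_HOPLIMIT is taken to be 64 here.

```c
#include <stdbool.h>

/* Assumed value of IPV6_DEFAULT_HOPLIMIT for this sketch. */
#define IPV6_DEFAULT_HOPLIMIT_SKETCH 64

/* Hypothetical route description (the kernel reads this from dst metrics). */
struct route_sketch {
    bool has_hoplimit_metric;      /* route carries an explicit hop limit */
    unsigned char hoplimit_metric; /* that explicit value */
    unsigned char stack_default;   /* per-family default TTL / hop limit */
};

/* Old behaviour: a fixed constant, regardless of the route. */
static unsigned char roce_hop_limit_old(const struct route_sketch *rt)
{
    (void)rt;
    return IPV6_DEFAULT_HOPLIMIT_SKETCH;
}

/* New behaviour (modelled): take the value the IP stack would use. */
static unsigned char roce_hop_limit_new(const struct route_sketch *rt)
{
    return rt->has_hoplimit_metric ? rt->hoplimit_metric : rt->stack_default;
}
```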
    • IB/core: Rename rdma_addr_find_dmac_by_grh · f7f4b23e
      Committed by Matan Barak
      rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index, and a
      downstream patch will also add hop_limit as an output parameter,
      so rename it to rdma_addr_find_l2_eth_by_grh.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/cm: Fix a recently introduced deadlock · 4bfdf635
      Committed by Bart Van Assche
      ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
      that can be locked from inside an interrupt handler. Hence do not
      enable interrupts inside cm_enter_timewait() if called with interrupts
      disabled.
      
      This patch fixes, e.g., the following deadlock:
      Acked-by: Erez Shitrit <erezsh@mellanox.com>
      
      =================================
      [ INFO: inconsistent lock state ]
      4.4.0-rc7+ #1 Tainted: G            E
      ---------------------------------
      inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
      swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
      (&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
      {HARDIRQ-ON-W} state was registered at:
        [<ffffffff810a3c11>] mark_held_locks+0x71/0x90
        [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
        [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
        [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
        [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
        [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
        [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
        [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
        [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
        [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
        [<ffffffff8107184d>] process_one_work+0x1bd/0x460
        [<ffffffff81073148>] worker_thread+0x118/0x420
        [<ffffffff81078454>] kthread+0xe4/0x100
        [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
      irq event stamp: 1672286
      hardirqs last  enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
      hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
      softirqs last  enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
      softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&cm_id_priv->lock)->rlock);
        <Interrupt>
          lock(&(&cm_id_priv->lock)->rlock);
      
       *** DEADLOCK ***
      
      no locks held by swapper/8/0.
      
      stack backtrace:
      CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E   4.4.0-rc7+ #1
      Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
       ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
       0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
       ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
      Call Trace:
       <IRQ>  [<ffffffff81251c1b>] dump_stack+0x4f/0x74
       [<ffffffff810a32f4>] print_usage_bug+0x184/0x190
       [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
       [<ffffffff810a3995>] mark_lock+0x115/0x1b0
       [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
       [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
       [<ffffffff810a53c2>] lock_acquire+0x62/0x80
       [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
       [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
       [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
       [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
       [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
       [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
       [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
       [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
       [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
       [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
       [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
       [<ffffffff81007f3d>] handle_irq+0x5d/0x130
       [<ffffffff810071fd>] do_IRQ+0x6d/0x130
       [<ffffffff8151d309>] common_interrupt+0x89/0x89
       <EOI>  [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
       [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
       [<ffffffff810990d6>] call_cpuidle+0x36/0x60
       [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
       [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
       [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
       [<ffffffff8103c443>] start_secondary+0x83/0x90
      
      Fixes: commit be4b4993 ("IB/cm: Do not queue work to a device that's going away")
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Erez Shitrit <erezsh@mellanox.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
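The root cause is that spin_unlock_irq() re-enables interrupts unconditionally, even when its caller had them disabled. The userspace sketch below models the local CPU's interrupt-enable state as a boolean and contrasts that pattern with irqsave/irqrestore, which restores whatever state the caller had. All names are hypothetical; the caller is only loosely modelled on ib_send_cm_drep().

```c
#include <stdbool.h>

/* Userspace model of the local CPU's interrupt-enable state. */
static bool irqs_enabled = true;

static void local_irq_disable_sketch(void) { irqs_enabled = false; }
static void local_irq_enable_sketch(void)  { irqs_enabled = true; }

/* spin_lock_irq()/spin_unlock_irq() style: unlock re-enables unconditionally. */
static void enter_timewait_buggy(void)
{
    local_irq_disable_sketch();   /* spin_lock_irq() */
    /* ... move the cm_id to the timewait list ... */
    local_irq_enable_sketch();    /* spin_unlock_irq(): ignores caller state */
}

/* spin_lock_irqsave()/spin_unlock_irqrestore() style: restore caller state. */
static void enter_timewait_fixed(void)
{
    bool flags = irqs_enabled;    /* save the caller's state */
    local_irq_disable_sketch();
    /* ... move the cm_id to the timewait list ... */
    irqs_enabled = flags;         /* restore it instead of forcing "on" */
}

/* Caller that holds an IRQ-safe lock with interrupts disabled and calls the
 * timewait helper. Returns whether interrupts were still disabled at the end
 * of the caller's critical section (they must be, or an interrupt handler
 * taking the same lock can deadlock). */
static bool drep_caller(void (*enter_timewait)(void))
{
    bool flags = irqs_enabled;
    local_irq_disable_sketch();   /* spin_lock_irqsave(&lock, flags) */
    enter_timewait();
    bool still_disabled = !irqs_enabled;
    irqs_enabled = flags;         /* spin_unlock_irqrestore(&lock, flags) */
    return still_disabled;
}
```

With the buggy helper, interrupts come back on while the caller still holds the lock, which is the window lockdep reports above; the irqsave variant keeps them off until the caller releases the lock.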