1. 07 7月, 2017 1 次提交
  2. 24 5月, 2017 2 次提交
    • D
      selinux lsm IB/core: Implement LSM notification system · 8f408ab6
      Daniel Jurgens 提交于
      Add a generic notificaiton mechanism in the LSM. Interested consumers
      can register a callback with the LSM and security modules can produce
      events.
      
      Because access to Infiniband QPs are enforced in the setup phase of a
      connection security should be enforced again if the policy changes.
      Register infiniband devices for policy change notification and check all
      QPs on that device when the notification is received.
      
      Add a call to the notification mechanism from SELinux when the AVC
      cache changes or setenforce is cleared.
      Signed-off-by: NDaniel Jurgens <danielj@mellanox.com>
      Acked-by: NJames Morris <james.l.morris@oracle.com>
      Acked-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      8f408ab6
    • D
      IB/core: Enforce PKey security on QPs · d291f1a6
      Daniel Jurgens 提交于
      Add new LSM hooks to allocate and free security contexts and check for
      permission to access a PKey.
      
      Allocate and free a security context when creating and destroying a QP.
      This context is used for controlling access to PKeys.
      
      When a request is made to modify a QP that changes the port, PKey index,
      or alternate path, check that the QP has permission for the PKey in the
      PKey table index on the subnet prefix of the port. If the QP is shared
      make sure all handles to the QP also have access.
      
      Store which port and PKey index a QP is using. After the reset to init
      transition the user can modify the port, PKey index and alternate path
      independently. So port and PKey settings changes can be a merge of the
      previous settings and the new ones.
      
      In order to maintain access control if there are PKey table or subnet
      prefix change keep a list of all QPs are using each PKey index on
      each port. If a change occurs all QPs using that device and port must
      have access enforced for the new cache settings.
      
      These changes add a transaction to the QP modify process. Association
      with the old port and PKey index must be maintained if the modify fails,
      and must be removed if it succeeds. Association with the new port and
      PKey index must be established prior to the modify and removed if the
      modify fails.
      
      1. When a QP is modified to a particular Port, PKey index or alternate
         path insert that QP into the appropriate lists.
      
      2. Check permission to access the new settings.
      
      3. If step 2 grants access attempt to modify the QP.
      
      4a. If steps 2 and 3 succeed remove any prior associations.
      
      4b. If ether fails remove the new setting associations.
      
      If a PKey table or subnet prefix changes walk the list of QPs and
      check that they have permission. If not send the QP to the error state
      and raise a fatal error event. If it's a shared QP make sure all the
      QPs that share the real_qp have permission as well. If the QP that
      owns a security structure is denied access the security structure is
      marked as such and the QP is added to an error_list. Once the moving
      the QP to error is complete the security structure mark is cleared.
      
      Maintaining the lists correctly turns QP destroy into a transaction.
      The hardware driver for the device frees the ib_qp structure, so while
      the destroy is in progress the ib_qp pointer in the ib_qp_security
      struct is undefined. When the destroy process begins the ib_qp_security
      structure is marked as destroying. This prevents any action from being
      taken on the QP pointer. After the QP is destroyed successfully it
      could still listed on an error_list wait for it to be processed by that
      flow before cleaning up the structure.
      
      If the destroy fails the QPs port and PKey settings are reinserted into
      the appropriate lists, the destroying flag is cleared, and access control
      is enforced, in case there were any cache changes during the destroy
      flow.
      
      To keep the security changes isolated a new file is used to hold security
      related functionality.
      Signed-off-by: NDaniel Jurgens <danielj@mellanox.com>
      Acked-by: NDoug Ledford <dledford@redhat.com>
      [PM: merge fixup in ib_verbs.h and uverbs_cmd.c]
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      d291f1a6
  3. 22 4月, 2017 1 次提交
    • P
      IB/core: Fix kernel crash during fail to initialize device · 4be3a4fa
      Parav Pandit 提交于
      This patch fixes the kernel crash that occurs during ib_dealloc_device()
      called due to provider driver fails with an error after
      ib_alloc_device() and before it can register using ib_register_device().
      
      This crashed seen in tha lab as below which can occur with any IB device
      which fails to perform its device initialization before invoking
      ib_register_device().
      
      This patch avoids touching cache and port immutable structures if device
      is not yet initialized.
      It also releases related memory when cache and port immutable data
      structure initialization fails during register_device() state.
      
      [81416.561946] BUG: unable to handle kernel NULL pointer dereference at (null)
      [81416.570340] IP: ib_cache_release_one+0x29/0x80 [ib_core]
      [81416.576222] PGD 78da66067
      [81416.576223] PUD 7f2d7c067
      [81416.579484] PMD 0
      [81416.582720]
      [81416.587242] Oops: 0000 [#1] SMP
      [81416.722395] task: ffff8807887515c0 task.stack: ffffc900062c0000
      [81416.729148] RIP: 0010:ib_cache_release_one+0x29/0x80 [ib_core]
      [81416.735793] RSP: 0018:ffffc900062c3a90 EFLAGS: 00010202
      [81416.741823] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
      [81416.749785] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880859fec000
      [81416.757757] RBP: ffffc900062c3aa0 R08: ffff8808536e5ac0 R09: ffff880859fec5b0
      [81416.765708] R10: 00000000536e5c01 R11: ffff8808536e5ac0 R12: ffff880859fec000
      [81416.773672] R13: 0000000000000000 R14: ffff8808536e5ac0 R15: ffff88084ebc0060
      [81416.781621] FS:  00007fd879fab740(0000) GS:ffff88085fac0000(0000) knlGS:0000000000000000
      [81416.790522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [81416.797094] CR2: 0000000000000000 CR3: 00000007eb215000 CR4: 00000000003406e0
      [81416.805051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [81416.812997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [81416.820950] Call Trace:
      [81416.824226]  ib_device_release+0x1e/0x40 [ib_core]
      [81416.829858]  device_release+0x32/0xa0
      [81416.834370]  kobject_cleanup+0x63/0x170
      [81416.839058]  kobject_put+0x25/0x50
      [81416.843319]  ib_dealloc_device+0x25/0x40 [ib_core]
      [81416.848986]  mlx5_ib_add+0x163/0x1990 [mlx5_ib]
      [81416.854414]  mlx5_add_device+0x5a/0x160 [mlx5_core]
      [81416.860191]  mlx5_register_interface+0x8d/0xc0 [mlx5_core]
      [81416.866587]  ? 0xffffffffa09e9000
      [81416.870816]  mlx5_ib_init+0x15/0x17 [mlx5_ib]
      [81416.876094]  do_one_initcall+0x51/0x1b0
      [81416.880861]  ? __vunmap+0x85/0xd0
      [81416.885113]  ? kmem_cache_alloc_trace+0x14b/0x1b0
      [81416.890768]  ? vfree+0x2e/0x70
      [81416.894762]  do_init_module+0x60/0x1fa
      [81416.899441]  load_module+0x15f6/0x1af0
      [81416.904114]  ? __symbol_put+0x60/0x60
      [81416.908709]  ? ima_post_read_file+0x3d/0x80
      [81416.913828]  ? security_kernel_post_read_file+0x6b/0x80
      [81416.920006]  SYSC_finit_module+0xa6/0xf0
      [81416.924888]  SyS_finit_module+0xe/0x10
      [81416.929568]  entry_SYSCALL_64_fastpath+0x1a/0xa9
      [81416.935089] RIP: 0033:0x7fd879494949
      [81416.939543] RSP: 002b:00007ffdbc1b4e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
      [81416.947982] RAX: ffffffffffffffda RBX: 0000000001b66f00 RCX: 00007fd879494949
      [81416.955965] RDX: 0000000000000000 RSI: 000000000041a13c RDI: 0000000000000003
      [81416.963926] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001b652a0
      [81416.971861] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffdbc1b3e70
      [81416.979763] R13: 00007ffdbc1b3e50 R14: 0000000000000005 R15: 0000000000000000
      [81417.008005] RIP: ib_cache_release_one+0x29/0x80 [ib_core] RSP: ffffc900062c3a90
      [81417.016045] CR2: 0000000000000000
      
      Fixes: 55aeed06 ("IB/core: Make ib_alloc_device init the kobject")
      Fixes: 7738613e ("IB/core: Add per port immutable struct to ib_device")
      Cc: <stable@vger.kernel.org> # v4.2+
      Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      4be3a4fa
  4. 25 3月, 2017 2 次提交
    • S
      IB/device: Convert ib-comp-wq to be CPU-bound · b7363e67
      Sagi Grimberg 提交于
      This workqueue is used by our storage target mode ULPs
      via the new CQ API. Recent observations when working
      with very high-end flash storage devices reveal that
      UNBOUND workqueue threads can migrate between cpu cores
      and even numa nodes (although some numa locality is accounted
      for).
      
      While this attribute can be useful in some workloads,
      it does not fit in very nicely with the normal
      run-to-completion model we usually use in our target-mode
      ULPs and the block-mq irq<->cpu affinity facilities.
      
      The whole block-mq concept is that the completion will
      land on the same cpu where the submission was performed.
      The fact that our submitter thread is migrating cpus
      can break this locality.
      
      We assume that as a target mode ULP, we will serve multiple
      initiators/clients and we can spread the load enough without
      having to use unbound kworkers.
      
      Also, while we're at it, expose this workqueue via sysfs which
      is harmless and can be useful for debug.
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      b7363e67
    • B
      IB/core: Restore I/O MMU, s390 and powerpc support · 0957c29f
      Bart Van Assche 提交于
      Avoid that the following error message is reported on the console
      while loading an RDMA driver with I/O MMU support enabled:
      
      DMAR: Allocating domain for mlx5_0 failed
      
      Ensure that DMA mapping operations that use to_pci_dev() to
      access to struct pci_dev see the correct PCI device. E.g. the s390
      and powerpc DMA mapping operations use to_pci_dev() even with I/O
      MMU support disabled.
      
      This patch preserves the following changes of the DMA mapping updates
      patch series:
      - Introduction of dma_virt_ops.
      - Removal of ib_device.dma_ops.
      - Removal of struct ib_dma_mapping_ops.
      - Removal of an if-statement from each ib_dma_*() operation.
      - IB HW drivers no longer set dma_device directly.
      Reported-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Reported-by: NParav Pandit <parav@mellanox.com>
      Fixes: commit 99db9494 ("IB/core: Remove ib_device.dma_device")
      Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: parav@mellanox.com
      Tested-by: parav@mellanox.com
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      0957c29f
  5. 28 1月, 2017 1 次提交
  6. 25 1月, 2017 2 次提交
  7. 11 1月, 2017 1 次提交
    • P
      IB/core: added support to use rdma cgroup controller · 43579b5f
      Parav Pandit 提交于
      Added support APIs for IB core to register/unregister every IB/RDMA
      device with rdma cgroup for tracking rdma resources.
      IB core registers with rdma cgroup controller.
      Added support APIs for uverbs layer to make use of rdma controller.
      Added uverbs layer to perform resource charge/uncharge functionality.
      Added support during query_device uverb operation to ensure it
      returns resource limits by honoring rdma cgroup configured limits.
      Signed-off-by: NParav Pandit <pandit.parav@gmail.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      43579b5f
  8. 04 12月, 2016 1 次提交
  9. 24 6月, 2016 1 次提交
  10. 07 6月, 2016 2 次提交
  11. 25 5月, 2016 5 次提交
  12. 22 3月, 2016 1 次提交
  13. 03 3月, 2016 1 次提交
  14. 01 3月, 2016 1 次提交
  15. 23 12月, 2015 3 次提交
  16. 12 12月, 2015 1 次提交
    • C
      IB: add a proper completion queue abstraction · 14d3a3b2
      Christoph Hellwig 提交于
      This adds an abstraction that allows ULPs to simply pass a completion
      object and completion callback with each submitted WR and let the RDMA
      core handle the nitty gritty details of how to handle completion
      interrupts and poll the CQ.
      
      In detail there is a new ib_cqe structure which just contains the
      completion callback, and which can be used to get at the containing
      object using container_of.  It is pointed to by the WR and WC as an
      alternative to the wr_id field, similar to how many ULPs already use
      the field to store a pointer using casts.
      
      A driver using the new completion callbacks allocates it's CQs using
      the new ib_create_cq API, which in addition to the number of CQEs and
      the completion vectors also takes a mode on how we poll for CQEs.
      Three modes are available: direct for drivers that never take CQ
      interrupts and just poll for them, softirq to poll from softirq context
      using the to be renamed blk-iopoll infrastructure which takes care of
      rearming and budgeting, or a workqueue for consumer who want to be
      called from user context.
      
      Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote
      the current version of the workqueue code because my two previous
      attempts sucked too much and converted the iSER initiator to the new
      API.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      14d3a3b2
  17. 22 10月, 2015 2 次提交
  18. 31 8月, 2015 7 次提交
    • D
      IB/core: Remove needless bracketization · b8071ad8
      Doug Ledford 提交于
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      b8071ad8
    • D
      IB/core: missing curly braces in ib_find_gid() · 98d25afa
      Dan Carpenter 提交于
      Smatch says that, based on the indenting, we should probably add curly
      braces here.
      
      Fixes: 03db3a2d ('IB/core: Add RoCE GID table management')
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      98d25afa
    • M
      IB/core: Add RoCE GID table management · 03db3a2d
      Matan Barak 提交于
      RoCE GIDs are based on IP addresses configured on Ethernet net-devices
      which relate to the RDMA (RoCE) device port.
      
      Currently, each of the low-level drivers that support RoCE (ocrdma,
      mlx4) manages its own RoCE port GID table. As there's nothing which is
      essentially vendor specific, we generalize that, and enhance the RDMA
      core GID cache to do this job.
      
      In order to populate the GID table, we listen for events:
      
      (a) netdev up/down/change_addr events - if a netdev is built onto
          our RoCE device, we need to add/delete its IPs. This involves
          adding all GIDs related to this ndev, add default GIDs, etc.
      
      (b) inet events - add new GIDs (according to the IP addresses)
          to the table.
      
      For programming the port RoCE GID table, providers must implement
      the add_gid and del_gid callbacks.
      
      RoCE GID management requires us to state the associated net_device
      alongside the GID. This information is necessary in order to manage
      the GID table. For example, when a net_device is removed, its
      associated GIDs need to be removed as well.
      
      RoCE mandates generating a default GID for each port, based on the
      related net-device's IPv6 link local. In contrast to the GID based on
      the regular IPv6 link-local (as we generate GID per IP address),
      the default GID is also available when the net device is down (in
      order to support loopback).
      
      Locking is done as follows:
      The patch modify the GID table code both for new RoCE drivers
      implementing the add_gid/del_gid callbacks and for current RoCE and
      IB drivers that do not. The flows for updating the table are
      different, so the locking requirements are too.
      
      While updating RoCE GID table, protection against multiple writers is
      achieved via mutex_lock(&table->lock). Since writing to a table
      requires us to find an entry (possible a free entry) in the table and
      then modify it, this mutex protects both the find_gid and write_gid
      ensuring the atomicity of the action.
      Each entry in the GID cache is protected by rwlock. In RoCE, writing
      (usually results from netdev notifier) involves invoking the vendor's
      add_gid and del_gid callbacks, which could sleep.
      Therefore, an invalid flag is added for each entry. Updates for RoCE are
      done via a workqueue, thus sleeping is permitted.
      
      In IB, updates are done in write_lock_irq(&device->cache.lock), thus
      write_gid isn't allowed to sleep and add_gid/del_gid are not called.
      
      When passing net-device into/out-of the GID cache, the device
      is always passed held (dev_hold).
      
      The code uses a single work item for updating all RDMA devices,
      following a netdev or inet notifier.
      
      The patch moves the cache from being a client (which was incorrect,
      as the cache is part of the IB infrastructure) to being explicitly
      initialized/freed when a device is registered/removed.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      03db3a2d
    • J
      IB/core: Make ib_alloc_device init the kobject · 55aeed06
      Jason Gunthorpe 提交于
      This gets rid of the weird in-between state where struct ib_device
      was allocated but the kobject didn't work.
      
      Consequently ib_device_release is now guaranteed to be called in
      all situations and we needn't duplicate its kfrees on error paths.
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      55aeed06
    • Y
      IB/core: Find the network device matching connection parameters · 9268f72d
      Yotam Kenneth 提交于
      In the case of IPoIB, and maybe in other cases, the network device is
      managed by an upper-layer protocol (ULP). In order to expose this
      network device to other users of the IB device, let ULPs implement
      a callback that returns network device according to connection parameters.
      
      The IB device and port, together with the P_Key and the GID should
      be enough to uniquely identify the ULP net device. However, in current
      kernels there can be multiple IPoIB interfaces created with the same GID.
      Furthermore, such configuration may be desireable to support ipvlan-like
      configurations for RDMA CM with IPoIB.  To resolve the device in these
      cases the code will also take the IP address as an additional input.
      Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NYotam Kenneth <yotamke@mellanox.com>
      Signed-off-by: NShachar Raindel <raindel@mellanox.com>
      Signed-off-by: NGuy Shapiro <guysh@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      9268f72d
    • H
      IB/core: lock client data with lists_rwsem · 7c1eb45a
      Haggai Eran 提交于
      An ib_client callback that is called with the lists_rwsem locked only for
      read is protected from changes to the IB client lists, but not from
      ib_unregister_device() freeing its client data. This is because
      ib_unregister_device() will remove the device from the device list with
      lists_rwsem locked for write, but perform the rest of the cleanup,
      including the call to remove() without that lock.
      
      Mark client data that is undergoing de-registration with a new going_down
      flag in the client data context. Lock the client data list with lists_rwsem
      for write in addition to using the spinlock, so that functions calling the
      callback would be able to lock only lists_rwsem for read and let callbacks
      sleep.
      
      Since ib_unregister_client() now marks the client data context, no need for
      remove() to search the context again, so pass the client data directly to
      remove() callbacks.
      Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      7c1eb45a
    • H
      IB/core: Add rwsem to allow reading device list or client list · 5aa44bb9
      Haggai Eran 提交于
      Currently the RDMA subsystem's device list and client list are protected by
      a single mutex. This prevents adding user-facing APIs that iterate these
      lists, since using them may cause a deadlock. The patch attempts to solve
      this problem by adding a read-write semaphore to protect the lists. Readers
      now don't need the mutex, and are safe just by read-locking the semaphore.
      
      The ib_register_device, ib_register_client, ib_unregister_device, and
      ib_unregister_client functions are modified to lock the semaphore for write
      during their respective list modification. Also, in order to make sure
      client callbacks are called only between add() and remove() calls, the code
      is changed to only add items to the lists after the add() calls and remove
      from the lists before the remove() calls.
      
      This patch attempts to solve a similar need [1] that was seen in the RoCE
      v2 patch series.
      
      [1] http://www.spinics.net/lists/linux-rdma/msg24733.htmlReviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Matan Barak <matanb@mellanox.com>
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      5aa44bb9
  19. 13 6月, 2015 3 次提交
  20. 21 5月, 2015 2 次提交