1. 30 Oct, 2017 7 commits
    • mmc: block: Delete mmc_access_rpmb() · 14f4ca7e
      Linus Walleij committed
      This function is used by the block layer queue to bail out of
      requests if the current request is towards an RPMB
      "block device".
      
      This was done to avoid boot time scanning of this "block
      device" which was never really a block device, thus duct-taping
      over the fact that it was badly engineered.
      
      This problem is now gone as we removed the offending RPMB block
      device in another patch and replaced it with a character
      device.
      
      Cc: Tomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • mmc: block: Convert RPMB to a character device · 97548575
      Linus Walleij committed
      The RPMB partition on the eMMC devices is a special area used
      for storing cryptographically safe information signed by a
      special secret key. To write and read records from this special
      area, authentication is needed.
      
      The RPMB area is *only* and *exclusively* accessed using
      ioctl()s from userspace. It is not really a block device,
      as blocks cannot be read or written from the device. Also,
      the signed chunks that can be stored on the RPMB are actually
      256 bytes, not 512, making a block device a really bad fit.
      
      Currently the RPMB partition spawns a separate block device
      named /dev/mmcblkNrpmb for each device with an RPMB partition,
      including the creation of a block queue with its own kernel
      thread and all overhead associated with this. On the Ux500
      HREFv60 platform, for example, the two eMMCs mean that two
      block queues with separate threads are created for no use
      whatsoever.
      
      I have concluded that this block device design for RPMB is
      actually pretty wrong. The RPMB area should have been designed
      to be accessed from /dev/mmcblkN directly, using ioctl()s on
      the main block device. It is however way too late to change
      that, since userspace expects to open an RPMB device in
      /dev/mmcblkNrpmb and we cannot break userspace.
      
      This patch tries to amend the situation using the following
      strategy:
      
      - Stop creating a block device for the RPMB partition/area
      
      - Instead create a custom, dynamic character device with
        the same name.
      
      - Make this new character device support exactly the same
        set of ioctl()s as the old block device.
      
      - Wrap the requests back to the same ioctl() handlers, but
        issue them on the block queue of the main partition/area,
        i.e. /dev/mmcblkN
      
      We need to create a special "rpmb" bus type in order to get
      udev and/or busybox hot/coldplug to instantiate the device
      node properly.
      
      Before the patch, this appears in 'ps aux':
      
      101 root       0:00 [mmcqd/2rpmb]
      123 root       0:00 [mmcqd/3rpmb]
      
      After applying the patch these surplus block queue threads
      are gone, but RPMB is as usable as ever using the userspace
      MMC tools, such as 'mmc rpmb read-counter'.
      
      Instead we get these dynamic devices in /dev:
      
      brw-rw----    1 root     root      179,   0 Jan  1  2000 mmcblk0
      brw-rw----    1 root     root      179,   1 Jan  1  2000 mmcblk0p1
      brw-rw----    1 root     root      179,   2 Jan  1  2000 mmcblk0p2
      brw-rw----    1 root     root      179,   5 Jan  1  2000 mmcblk0p5
      brw-rw----    1 root     root      179,   8 Jan  1  2000 mmcblk2
      brw-rw----    1 root     root      179,  16 Jan  1  2000 mmcblk2boot0
      brw-rw----    1 root     root      179,  24 Jan  1  2000 mmcblk2boot1
      crw-rw----    1 root     root      248,   0 Jan  1  2000 mmcblk2rpmb
      brw-rw----    1 root     root      179,  32 Jan  1  2000 mmcblk3
      brw-rw----    1 root     root      179,  40 Jan  1  2000 mmcblk3boot0
      brw-rw----    1 root     root      179,  48 Jan  1  2000 mmcblk3boot1
      brw-rw----    1 root     root      179,  33 Jan  1  2000 mmcblk3p1
      crw-rw----    1 root     root      248,   1 Jan  1  2000 mmcblk3rpmb
      
      Notice the (248,0) and (248,1) character devices for RPMB.
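
A userspace tool can confirm which kind of node it is dealing with by calling stat(2) on the path. The following is a minimal sketch for illustration only and is not part of the patch; `node_kind()` is a hypothetical helper name.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns 'c' for a character device, 'b' for a
 * block device, '?' for any other file type, and 0 if stat() fails. */
static char node_kind(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    if (S_ISCHR(st.st_mode))
        return 'c';
    if (S_ISBLK(st.st_mode))
        return 'b';
    return '?';
}
```

Given the listing in the commit message, a call such as `node_kind("/dev/mmcblk2rpmb")` would be expected to report a character device after the patch and a block device before it.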
      
      Cc: Tomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • mmc: sdhci-of-esdhc: disable SD clock for clock value 0 · dd3f6983
      yangbo lu committed
      SD clock should be disabled for clock value 0. It's not
      right to just return. This may cause failure of signal
      voltage switching.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • mmc: sdhci-pci: Add support for Intel CDF · cdaba732
      Adrian Hunter committed
      Add PCI Id for Intel CDF.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • mmc: sdhci-msm: Enable delay circuit calibration clocks · 4946b3af
      Bjorn Andersson committed
      The delay circuit used to support HS400 is calibrated based on two
      additional clocks. When these clocks are not available and
      FF_CLK_SW_RST_DIS is not set in CORE_HC_MODE, reset might fail. On
      some platforms the reset then does not complete, and the dump below
      can be seen in the kernel log.
      
        mmc0: Reset 0x1 never completed.
        mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
        mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001102
        mmc0: sdhci: Blk size:  0x00004000 | Blk cnt:  0x00000000
        mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
        mmc0: sdhci: Present:   0x01f80000 | Host ctl: 0x00000000
        mmc0: sdhci: Power:     0x00000000 | Blk gap:  0x00000000
        mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000002
        mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00000000
        mmc0: sdhci: Int enab:  0x00000000 | Sig enab: 0x00000000
        mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
        mmc0: sdhci: Caps:      0x742dc8b2 | Caps_1:   0x00008007
        mmc0: sdhci: Cmd:       0x00000000 | Max curr: 0x00000000
        mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
        mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
        mmc0: sdhci: Host ctl2: 0x00000000
        mmc0: sdhci: ============================================
      
      Add support for the additional calibration clocks to allow these
      platforms to be configured appropriately.
      
      Cc: Venkat Gopalakrishnan <venkatg@codeaurora.org>
      Cc: Ritesh Harjani <riteshh@codeaurora.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Rob Herring <robh@kernel.org>
      Tested-by: Jeremy McNicoll <jeremymc@redhat.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • mmc: sdhci-msm: Utilize bulk clock API · e4bf91f6
      Bjorn Andersson committed
      By stuffing the runtime controlled clocks into a clk_bulk_data array we
      can utilize the newly introduced bulk clock operations and clean up the
      error paths. This allows us to handle additional clocks in a subsequent
      patch without the added complexity.
      
      Cc: Ritesh Harjani <riteshh@codeaurora.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Tested-by: Jeremy McNicoll <jeremymc@redhat.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • mmc: tegra: Mark 64 bit dma broken on Tegra186 · 68481a7e
      Krishna Reddy committed
      SDHCI controllers on Tegra186 support 40 bit addressing.
      IOVA addresses are 48-bit wide on Tegra186.
      SDHCI host common code sets dma mask as either 32-bit or 64-bit.
      To avoid access issues when SMMU is enabled, disable 64-bit dma.
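
The mismatch can be illustrated with plain mask arithmetic. This is a sketch, not driver code: the macro mirrors the kernel's `DMA_BIT_MASK()`, while `addr_fits()` and the sample address are hypothetical. A 64-bit mask accepts every 48-bit IOVA, including ones that a 40-bit controller cannot reach.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() in <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true if addr has no bits set above the given width. */
static bool addr_fits(uint64_t addr, unsigned int bits)
{
    return (addr & ~DMA_BIT_MASK(bits)) == 0;
}
```

For example, the address `1ULL << 45` is a valid 48-bit IOVA and passes a 64-bit mask check, but it does not fit in the 40 bits the controller supports.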
      Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
      Tested-by: Thierry Reding <treding@nvidia.com>
      Acked-by: Thierry Reding <treding@nvidia.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
  2. 28 Oct, 2017 4 commits
  3. 27 Oct, 2017 6 commits
  4. 26 Oct, 2017 19 commits
    • i40e: Add programming descriptors to cleaned_count · 62b4c669
      Alexander Duyck committed
      This patch updates the i40e driver to include programming descriptors in
      the cleaned_count. Without this change it becomes possible for us to leak
      memory as we don't trigger a large enough allocation when the time comes to
      allocate new buffers and we end up overwriting a number of rx_buffers equal
      to the number of programming descriptors we encountered.
      
      Fixes: 0e626ff7 ("i40e: Fix support for flow director programming status")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Anders K. Pedersen <akp@cohaesio.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Fix incorrect use of tx_itr_setting when checking for Rx ITR setup · 10781348
      Alexander Duyck committed
      It looks like there was either a copy/paste error or just a typo that
      resulted in the Tx ITR setting being used to determine if we were using
      adaptive Rx interrupt moderation or not.
      
      This patch fixes the typo.
      
      Fixes: 65e87c03 ("i40evf: support queue-specific settings for interrupt moderation")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: Fix Tx map failure path · 069db9cd
      Alexander Duyck committed
      This patch is a partial revert of "ixgbe: Don't bother clearing buffer
      memory for descriptor rings". Specifically I messed up the exception
      handling path a bit and this resulted in us incorrectly adding the count
      back in when we didn't need to.
      
      In order to make this simpler I am reverting most of the exception handling
      path change and instead just replacing the bit that was handled by the
      unmap_and_free call.
      
      Fixes: ffed21bc ("ixgbe: Don't bother clearing buffer memory for descriptor rings")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: Fix TX map failure path · 104ba833
      Jean-Philippe Brucker committed
      When the driver cannot map a TX buffer, instead of rolling back
      gracefully and retrying later, we currently get a panic:
      
      [  159.885994] igb 0000:00:00.0: TX DMA map failed
      [  159.886588] Unable to handle kernel paging request at virtual address ffff00000a08c7a8
                     ...
      [  159.897031] PC is at igb_xmit_frame_ring+0x9c8/0xcb8
      
      Fix the erroneous test that leads to this situation.
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000: avoid null pointer dereference on invalid stat type · 5983587c
      Colin Ian King committed
      Currently if the stat type is invalid then data[i] is being set
      either by dereferencing a null pointer p, or it is reading from
      an incorrect previous location if we had a valid stat type
      previously.  Fix this by skipping over the read of p on an invalid
      stat type.
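
The pattern of skipping an entry rather than reading through an unset pointer can be sketched as follows. The types and names here (`stat_desc`, `collect_stats()`, the two enum values) are hypothetical stand-ins chosen for illustration and are not the driver's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's stat table; not the real e1000 types. */
enum stat_type { NETDEV_STATS, ADAPTER_STATS };

struct stat_desc {
    int type;       /* normally one of enum stat_type */
    size_t offset;  /* byte offset of a uint64_t counter in its source */
};

/* Copy n counters into data[]. An entry with an unknown type is skipped,
 * leaving data[i] untouched, instead of being read through a NULL or
 * stale pointer. */
static void collect_stats(const struct stat_desc *tbl, size_t n,
                          const char *netdev, const char *adapter,
                          uint64_t *data)
{
    for (size_t i = 0; i < n; i++) {
        const char *p = NULL;

        switch (tbl[i].type) {
        case NETDEV_STATS:
            p = netdev + tbl[i].offset;
            break;
        case ADAPTER_STATS:
            p = adapter + tbl[i].offset;
            break;
        default:
            continue;   /* invalid type: never dereference p */
        }
        memcpy(&data[i], p, sizeof(data[i]));
    }
}
```

Declaring `p` inside the loop also means a valid pointer from an earlier iteration cannot leak into a later one.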
      
      Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000: fix race condition between e1000_down() and e1000_watchdog · 44c445c3
      Vincenzo Maffione committed
      This patch fixes a race condition that can result in the interface being
      up and carrier on, but with transmits disabled in the hardware.
      The bug may show up when repeatedly doing IFF_DOWN+IFF_UP on the interface,
      which allows e1000_watchdog() to interleave with e1000_down().
      
          CPU x                           CPU y
          --------------------------------------------------------------------
          e1000_down():
              netif_carrier_off()
                                          e1000_watchdog():
                                              if (carrier == off) {
                                                  netif_carrier_on();
                                                  enable_hw_transmit();
                                              }
              disable_hw_transmit();
                                          e1000_watchdog():
                                              /* carrier on, do nothing */
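
The interleaving in the diagram can be replayed deterministically in a small single-threaded model. This is only a sketch of the state transitions described in the message; `struct nic` and the function names are hypothetical and do not correspond to driver code or to the fix itself.

```c
#include <stdbool.h>

/* Hypothetical model of the two pieces of state involved in the race. */
struct nic {
    bool carrier_on;
    bool hw_tx_enabled;
};

/* First half of e1000_down() in the diagram: netif_carrier_off(). */
static void down_carrier_off(struct nic *n) { n->carrier_on = false; }

/* Second half of e1000_down() in the diagram: disable_hw_transmit(). */
static void down_disable_tx(struct nic *n) { n->hw_tx_enabled = false; }

/* Watchdog step from the diagram. */
static void watchdog(struct nic *n)
{
    if (!n->carrier_on) {
        n->carrier_on = true;
        n->hw_tx_enabled = true;
    }
    /* carrier already on: nothing to do */
}

/* Replays the interleaving from the diagram and returns the final state. */
static struct nic replay_race(void)
{
    struct nic n = { .carrier_on = true, .hw_tx_enabled = true };

    down_carrier_off(&n);  /* CPU x */
    watchdog(&n);          /* CPU y: sees carrier off, turns everything on */
    down_disable_tx(&n);   /* CPU x: disables transmit again */
    watchdog(&n);          /* CPU y: carrier is on, so nothing is repaired */
    return n;
}
```

The model ends with the carrier reported as on while hardware transmit stays disabled, which is the stuck state the commit message describes.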
      Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • xen: fix booting ballooned down hvm guest · 5266b8e4
      Juergen Gross committed
      Commit 96edd61d ("xen/balloon: don't
      online new memory initially") introduced a regression when booting a
      HVM domain with memory less than mem-max: instead of ballooning down
      immediately the system would try to use the memory up to mem-max
      resulting in Xen crashing the domain.
      
      For HVM domains the current size will be reflected in Xenstore node
      memory/static-max instead of memory/target.
      
      Additionally we have to trigger the ballooning process at once.
      
      Cc: <stable@vger.kernel.org> # 4.13
      Fixes: 96edd61d ("xen/balloon: don't online new memory initially")
      Reported-by: Simon Gaiser <hw42@ipsumj.de>
      Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    • tap: double-free in error path in tap_open() · 78e0ea67
      Girish Moodalbail committed
      Double free of skb_array in tap module is causing kernel panic. When
      tap_set_queue() fails we free skb_array right away by calling
      skb_array_cleanup(). However, later on skb_array_cleanup() is called
      again by tap_sock_destruct through sock_put(). This patch fixes that
      issue.
      
      Fixes: 362899b8 (macvtap: switch to use skb array)
      Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: do not sleep in set_rx_mode · 239dd4ee
      Antoine Tenart committed
      This patch replaces GFP_KERNEL by GFP_ATOMIC to avoid sleeping in the
      ndo_set_rx_mode() call which is called with BH disabled.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: fix invalid parameters order when calling the tcam init · 20746d71
      Antoine Tenart committed
      When calling mvpp2_prs_mac_multi_set() from mvpp2_prs_mac_init(), two
      parameters (the port index and the table index) are inverted. Fix
      this.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: fix typo in the tcam setup · ef4816f0
      Antoine Tenart committed
      This patch fixes a typo in the mvpp2_prs_tcam_data_cmp() function, as
      the shift value is inverted with the data.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: DCBNL, Implement tc with ets type and zero bandwidth · be0f161e
      Huy Nguyen committed
      Previously, a tc with ets type and zero bandwidth was not accepted
      by the driver. This behavior does not follow the IEEE 802.1Qaz spec.
      
      If there are tcs with ets type and zero bandwidth, these tcs are
      assigned to the lowest priority tc_group #0. We equally distribute
      100% bw of the tc_group #0 to these zero bandwidth ets tcs.
      Also, the non zero bandwidth ets tcs are assigned to tc_group #1.
      
      If there is no zero bandwidth ets tc, the non zero bandwidth ets tcs
      are assigned to tc_group #0.
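
The grouping rules described in this message can be sketched as below. This is an illustration under stated assumptions, not the driver's implementation: every tc passed in is assumed to be of ets type, `assign_ets_groups()` is a hypothetical name, and giving the rounding remainder to the last zero-bandwidth tc is an assumption made so that the shares sum to 100.

```c
#include <stddef.h>

/* Hypothetical sketch: bw[i] is the configured ETS bandwidth of tc i.
 * On return, group[i] is the tc_group and group_bw[i] the share of that
 * group's bandwidth given to tc i. */
static void assign_ets_groups(const unsigned int *bw, size_t n,
                              unsigned int *group, unsigned int *group_bw)
{
    size_t zero = 0, seen = 0;

    /* Count the zero-bandwidth tcs first; they decide the layout. */
    for (size_t i = 0; i < n; i++)
        if (bw[i] == 0)
            zero++;

    for (size_t i = 0; i < n; i++) {
        if (bw[i] == 0) {
            /* Zero-bandwidth tcs share 100% of tc_group #0 equally. */
            group[i] = 0;
            group_bw[i] = 100 / zero;
            if (++seen == zero)
                group_bw[i] += 100 % zero;  /* assumption: remainder to the last */
        } else {
            /* Non-zero tcs move to tc_group #1 only if a zero tc exists. */
            group[i] = zero ? 1 : 0;
            group_bw[i] = bw[i];
        }
    }
}
```

With bandwidths {0, 50, 0, 50, 0}, the three zero-bandwidth tcs land in group 0 and the two others in group 1; with {60, 40}, both stay in group 0.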
      
      Fixes: cdcf1121 ("net/mlx5e: Validate BW weight values of ETS")
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Properly deal with encap flows add/del under neigh update · 3c37745e
      Or Gerlitz committed
      Currently, the encap action offload is handled in the actions parse
      function and not in mlx5e_tc_add_fdb_flow() where we deal with all
      the other aspects of offloading actions (vlan, modify header) and
      the rule itself.
      
      When the neigh update code (mlx5e_tc_encap_flows_add()) recreates the
      encap entry and offloads the related flows, we wrongly call into
      mlx5e_tc_add_fdb_flow() again. This by itself causes us to handle
      the offloading of vlans and header re-write a second time, which puts
      things in an inconsistent state and steps on freed memory (e.g. the
      modify header parse buffer, which is already freed).
      
      Since on error, mlx5e_tc_add_fdb_flow() detaches and may release the
      encap entry, it causes a corruption at the neigh update code which goes
      over the list of flows associated with this encap entry, or double free
      when the tc flow is later deleted by user-space.
      
      When neigh update (mlx5e_tc_encap_flows_del()) unoffloads the flows related
      to an encap entry which is now invalid, we do a partial repeat of the eswitch
      flow removal code which is wrong too.
      
      To fix things up we do the following:
      
      (1) handle the encap action offload in the eswitch flow add function
          mlx5e_tc_add_fdb_flow() as done for the other actions and the rule itself.
      
      (2) modify the neigh update code (mlx5e_tc_encap_flows_add/del) to only
          deal with the encap entry and rules delete/add and not with any of
          the other offloaded actions.
      
      Fixes: 232c0013 ('net/mlx5e: Add support to neighbour update flow')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Delay events till mlx5 interface's add complete for pci resume · 4ca637a2
      Huy Nguyen committed
      mlx5_ib_add is called during mlx5_pci_resume after a pci error.
      Before mlx5_ib_add completes, there are multiple events which trigger
      function mlx5_ib_event. This causes a kernel panic because mlx5_ib_event
      accesses uninitialized resources.
      
      The fix is to extend Erez Shitrit's patch <97834eba>
      ("net/mlx5: Delay events till ib registration ends") to cover
      the pci resume code path.
      
      Trace:
      mlx5_core 0001:01:00.6: mlx5_pci_resume was called
      mlx5_core 0001:01:00.6: firmware version: 16.20.1011
      mlx5_core 0001:01:00.6: mlx5_attach_interface:164:(pid 779):
      mlx5_ib_event:2996:(pid 34777): warning: event on port 1
      mlx5_ib_event:2996:(pid 34782): warning: event on port 1
      Unable to handle kernel paging request for data at address 0x0001c104
      Faulting instruction address: 0xd000000008f411fc
      Oops: Kernel access of bad area, sig: 11 [#1]
      ...
      ...
      Call Trace:
      [c000000fff77bb70] [d000000008f4119c] mlx5_ib_event+0x64/0x470 [mlx5_ib] (unreliable)
      [c000000fff77bc60] [d000000008e67130] mlx5_core_event+0xb8/0x210 [mlx5_core]
      [c000000fff77bd10] [d000000008e4bd00] mlx5_eq_int+0x528/0x860[mlx5_core]
      
      Fixes: 97834eba ("net/mlx5: Delay events till ib registration ends")
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix health work queue spin lock to IRQ safe · 6377ed0b
      Moshe Shemesh committed
      spin_lock/unlock of health->wq_lock should be IRQ safe.
      It was changed to spin_lock_irqsave in commit 0179720d
      ("net/mlx5: Introduce trigger_health_work function"), which takes
      the lock from asynchronous event (IRQ) context.
      Thus, all spin_lock/unlock of health->wq_lock should have been moved
      to IRQ safe mode.
      However, one occurrence in new code using this lock missed that
      change, resulting in a possible deadlock:
        kernel: Possible unsafe locking scenario:
        kernel:       CPU0
        kernel:       ----
        kernel:  lock(&(&health->wq_lock)->rlock);
        kernel:  <Interrupt>
        kernel:    lock(&(&health->wq_lock)->rlock);
        kernel: #012 *** DEADLOCK ***
      
      Fixes: 2a0165a0 ("net/mlx5: Cancel delayed recovery work when unloading the driver")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • tun: allow positive return values on dev_get_valid_name() call · 5c25f65f
      Julien Gomes committed
      If the name argument of dev_get_valid_name() contains "%d", it will try
      to assign it a unit number in __dev__alloc_name() and return either the
      unit number (>= 0) or an error code (< 0).
      Treating positive values as errors prevents tun device creation from
      relying on this mechanism, therefore we should only consider negative
      values as errors here.
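
The difference between the two ways of checking the return value can be sketched in userspace. Everything here is hypothetical and for illustration only: `fake_get_valid_name()` merely imitates the return convention described in the message, and the assumption that the earlier check had the form `if (err)` is inferred from the description rather than quoted from the driver.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the described convention: with "%d" in the
 * template, fill in the unit number and return it (>= 0); without "%d",
 * copy the name and return 0; on an empty name return -EINVAL. */
static int fake_get_valid_name(char *out, size_t len, const char *tmpl, int unit)
{
    const char *pos;

    if (tmpl[0] == '\0')
        return -EINVAL;

    pos = strstr(tmpl, "%d");
    if (pos) {
        snprintf(out, len, "%.*s%d%s", (int)(pos - tmpl), tmpl, unit, pos + 2);
        return unit;
    }
    snprintf(out, len, "%s", tmpl);
    return 0;
}

/* Assumed shape of the check before the fix: any non-zero value is an error. */
static int is_error_nonzero(int ret) { return ret != 0; }

/* Check after the fix: only negative values are errors. */
static int is_error_negative(int ret) { return ret < 0; }
```

For the template "tun%d" with unit 3, the stand-in returns 3 and produces "tun3"; the non-zero check rejects that success, while the negative-only check accepts it.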
      Signed-off-by: Julien Gomes <julien@arista.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: refuse offloading filters that redirects to upper devices · d309ae5c
      Pieter Jansen van Vuuren committed
      Previously we did not ensure that a netdev is a representative netdev
      before dereferencing its private data. This can occur when an upper netdev
      is created on a representative netdev. This patch corrects this by first
      ensuring that the netdev is a representative netdev before using it.
      Checking only switchdev_port_same_parent_id is not sufficient to ensure
      that we can safely use the netdev. Failing to check that the netdev is also
      a representative netdev would result in incorrect dereferencing.
      
      Fixes: 1a1e586f ("nfp: add basic action capabilities to flower offloads")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag · b4d91aeb
      Michael J. Ruhl committed
      rdma_nl_rcv_msg() checks to see if it should use the .dump() callback
      or the .doit() callback.  The check is:
      
      if (flags & NLM_F_DUMP) ...
      
      The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH).
      
      When an RDMA_NL_LS message (response) is received, the bit used for
      indicating an error is the same bit as NLM_F_ROOT.
      
      NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR.
      
      ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error
      occurs in the service.  The current code then misinterprets this bit
      as part of NLM_F_DUMP and tries to call the .dump() callback.
      
      If the .dump() callback for the specified request is not available
      (which is true for the RDMA_NL_LS messages) the following Oops occurs:
      
      [ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at
         (null)
      [ 4555.969046] IP:           (null)
      [ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0
      [ 4555.979287] Oops: 0010 [#1] SMP
      [ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod
      target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm
      ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod
      dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass
      crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd
      glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core mei_me
      lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si ipmi_devintf
      ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace
      sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea
      sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci
      pps_core drm dca libata i2c_algo_bit i2c_core
      [ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G          I
      4.14.0-rc2+ #6
      [ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS
      SE5C610.86B.01.01.0008.021120151325 02/11/2015
      [ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000
      [ 4556.087967] RIP: 0010:          (null)
      [ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246
      [ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX:
      0000000000000000
      [ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI:
      ffff881056362000
      [ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09:
      0000000000001100
      [ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12:
      ffff881056362000
      [ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15:
      0000000000000ec0
      [ 4556.137945] FS:  00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000)
      knlGS:0000000000000000
      [ 4556.147000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4:
      00000000001606e0
      [ 4556.161419] Call Trace:
      [ 4556.164167]  ? netlink_dump+0x12c/0x290
      [ 4556.168468]  __netlink_dump_start+0x186/0x1f0
      [ 4556.173357]  rdma_nl_rcv_msg+0x193/0x1b0 [ib_core]
      [ 4556.178724]  rdma_nl_rcv+0xdc/0x130 [ib_core]
      [ 4556.183604]  netlink_unicast+0x181/0x240
      [ 4556.187998]  netlink_sendmsg+0x2c2/0x3b0
      [ 4556.192392]  sock_sendmsg+0x38/0x50
      [ 4556.196299]  SYSC_sendto+0x102/0x190
      [ 4556.200308]  ? __audit_syscall_entry+0xaf/0x100
      [ 4556.205387]  ? syscall_trace_enter+0x1d0/0x2b0
      [ 4556.210366]  ? __audit_syscall_exit+0x209/0x290
      [ 4556.215442]  SyS_sendto+0xe/0x10
      [ 4556.219060]  do_syscall_64+0x67/0x1b0
      [ 4556.223165]  entry_SYSCALL64_slow_path+0x25/0x25
      [ 4556.228328] RIP: 0033:0x7fe0b9db2a63
      [ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX:
      000000000000002c
      [ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX:
      00007fe0b9db2a63
      [ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI:
      000000000000000d
      [ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09:
      000000000000000c
      [ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12:
      00007ffc55edc280
      [ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15:
      0000000000000001
      [ 4556.282368] Code:  Bad RIP value.
      [ 4556.286629] RIP:           (null) RSP: ffffc900246b7bc8
      [ 4556.293013] CR2: 0000000000000000
      [ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]---
      [ 4556.305465] Kernel panic - not syncing: Fatal exception
      [ 4556.313786] Kernel Offset: disabled
      [ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception
      [ 4556.328960] ------------[ cut here ]------------
      
      Special case RDMA_NL_LS response messages to call the appropriate
      callback.
      
      Additionally, make sure that the .dump() callback is not NULL
      before calling it.
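
The bit overlap behind the misinterpretation can be shown with the flag values alone. This is a sketch for illustration: the NLM_F_* values match <linux/netlink.h>, RDMA_NL_LS_F_ERR is taken from the message, and the two helper functions are hypothetical. The full-mask comparison is shown only to make the overlap visible; the patch itself special-cases RDMA_NL_LS responses and checks the callback for NULL.

```c
#include <stdbool.h>

/* Flag values as defined in <linux/netlink.h>. */
#define NLM_F_ROOT   0x100
#define NLM_F_MATCH  0x200
#define NLM_F_DUMP   (NLM_F_ROOT | NLM_F_MATCH)

/* Value stated in the commit message: same bit as NLM_F_ROOT. */
#define RDMA_NL_LS_F_ERR 0x0100

/* The form of check quoted in the message: true if ANY dump bit is set. */
static bool any_dump_bit(unsigned int flags)
{
    return (flags & NLM_F_DUMP) != 0;
}

/* Stricter comparison, for contrast only: true if BOTH dump bits are set. */
static bool all_dump_bits(unsigned int flags)
{
    return (flags & NLM_F_DUMP) == NLM_F_DUMP;
}
```

A response carrying only RDMA_NL_LS_F_ERR satisfies `any_dump_bit()` even though it is not a dump request, which is how the NULL .dump() callback ends up being called.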
      
      Fixes: 647c75ac ("RDMA/netlink: Convert LS to doit callback")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Kaike Wan <kaike.wan@intel.com>
      Reviewed-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap() · 298d275d
      Juergen Gross committed
      In case gntdev_mmap() succeeds only partially in mapping grant pages,
      it will leave uninitialized some vital information needed later for
      cleanup. This will lead to an out of bounds array access when unmapping
      the already mapped pages.
      
      So just initialize the data needed for unmapping the pages a little bit
      earlier.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
  5. 25 Oct, 2017 4 commits