1. April 11, 2018 (7 commits)
    • vhost: Fix vhost_copy_to_user() · 7ced6c98
      By Eric Auger
      vhost_copy_to_user is used to copy vring used elements to userspace.
      We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
      
      Fixes: f8894913 ("vhost: introduce O(1) vq metadata cache")
      Signed-off-by: NEric Auger <eric.auger@redhat.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ced6c98
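      A minimal sketch of the fix described above (helper names follow
      drivers/vhost/vhost.c, but treat the exact signatures as assumptions): the
      metadata-cache lookup for used elements must use the VHOST_ADDR_USED slot.

      static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
                                    const void *from, unsigned int size)
      {
              /* was VHOST_ADDR_DESC, i.e. the wrong cache slot for the used ring */
              void __user *uaddr = vhost_vq_meta_fetch(vq, (u64)(uintptr_t)to,
                                                       size, VHOST_ADDR_USED);

              if (uaddr)
                      return __copy_to_user(uaddr, from, size);

              return copy_to_user(to, from, size) ? -EFAULT : 0;
      }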
    • net: aquantia: oops when shutdown on already stopped device · 9a11aff2
      By Igor Russkikh
      If the netdev is already closed at the moment of PCI shutdown, aq_nic_stop
      gets called a second time, and napi_disable then hangs indefinitely.
      In the other case, when the device was never opened at all, we get an oops
      because of a NULL pointer access.
      
      We should invoke aq_nic_stop conditionally, only if the device is running
      at the moment of shutdown (a sketch follows this entry).
      Reported-by: NDavid Arcari <darcari@redhat.com>
      Fixes: 90869ddf ("net: aquantia: Implement pci shutdown callback")
      Signed-off-by: NIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a11aff2
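      A minimal sketch of the conditional stop, with function and struct names
      assumed from the driver rather than quoted from the actual patch:

      static void aq_pci_shutdown(struct pci_dev *pdev)
      {
              struct aq_nic_s *self = pci_get_drvdata(pdev);

              rtnl_lock();
              /* Only stop a running NIC: stopping twice hangs in napi_disable(),
               * and stopping a never-opened device dereferences NULL. */
              if (netif_running(self->ndev))
                      aq_nic_stop(self);
              rtnl_unlock();
      }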
    • net: aquantia: Regression on reset with 1.x firmware · cce96d18
      By Igor Russkikh
      On the ASUS XG-C100C with 1.5.44 firmware a special mode called "dirty wake"
      is active. In this mode, once the motherboard gets standby power (but no
      power-on has happened yet), the NIC automatically brings up a power-save
      link and watches for a WOL packet.
      This normally allows powering up the PC after AC power failures.
      
      Not every motherboard or BIOS setting supplies power to the PCI slots,
      so this mode is not enabled on all hardware.
      
      The 4.16 Linux driver introduced a full hardware reset sequence.
      This was required because, before that, no NIC hardware reset was
      implemented, and an unclean start had side effects.
      
      But this full reset is incompatible with the "dirty wake" WOL feature:
      it keeps the PHY link in a special mode forever. As a consequence, the
      driver sees no link and no traffic.
      
      To fix this, forcibly change the FW state to idle before doing the
      full reset. This makes the FW restore the link state.
      
      Fixes: c8c82eb3 ("net: aquantia: Introduce global AQC hardware reset sequence")
      Signed-off-by: NIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cce96d18
    • cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN · 53765341
      By Bassem Boubaker
      The Cinterion AHS8 is a 3G device with one embedded WWAN interface
      using cdc_ether as a driver.
      
      The modem is controlled via AT commands through the exposed TTYs.
      
      The AT+CGDCONT write command can be used to activate or deactivate a WWAN
      connection for a PDP context defined with the same command. The UE
      supports one WWAN adapter (an illustrative match entry follows this entry).
      Signed-off-by: NBassem Boubaker <bassem.boubaker@actia.fr>
      Acked-by: NOliver Neukum <oneukum@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53765341
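      For illustration, a cdc_ether match entry of the kind this change adds.
      The GEMALTO_VENDOR_NUM macro and the 0x0055 product ID are placeholders,
      not the real AHS8 values; pointing driver_info at wwan_info is what flags
      the interface as WWAN:

      {
              USB_DEVICE_AND_INTERFACE_INFO(GEMALTO_VENDOR_NUM, 0x0055,
                                            USB_CLASS_COMM,
                                            USB_CDC_SUBCLASS_ETHERNET,
                                            USB_CDC_PROTO_NONE),
              .driver_info = (unsigned long)&wwan_info,
      },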
    • slip: Check if rstate is initialized before uncompressing · 3f01ddb9
      By Tejaswi Tanikella
      On receiving a packet, the state index points to the rstate which must be
      used to fill in the IP and TCP headers. But if the state index points to an
      rstate which is uninitialized, i.e. filled with zeros, the code gets stuck in
      an infinite loop inside ip_fast_csum, trying to compute the IP checksum of a
      header with zero length.
      
      89.666953:   <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468
      89.666965:   <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c
      89.666978:   <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0
      89.666991:   <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198
      89.667005:   <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370
      89.667027:   <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250
      89.667040:   <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60
      89.667053:   <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c
      89.667065:   <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38
      89.667085:   <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154
      89.667098:   <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c
      89.667117:   <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c
      89.667131:   <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c
      
      ./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
       ip_fast_csum at arch/arm64/include/asm/checksum.h:40
       (inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615
      
      Add a variable to indicate whether the current rstate is initialized. If
      such a packet arrives, move to the toss state (see the sketch after this entry).
      Signed-off-by: NTejaswi Tanikella <tejaswit@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f01ddb9
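      A sketch of the guard described above, assuming an "initialized" flag on
      the per-slot compression state (the field placement is illustrative):

      /* in slhc_uncompress(), after selecting cs from the received state index */
      if (!cs->initialized)
              goto bad;       /* uninitialized rstate: take the toss path instead
                               * of looping in ip_fast_csum() on a zero-length header */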
    • lan78xx: Avoid spurious kevent 4 "error" · fed56079
      By Phil Elwell
      lan78xx_defer_event generates an error message whenever the work item
      is already scheduled. lan78xx_open defers three events -
      EVENT_STAT_UPDATE, EVENT_DEV_OPEN and EVENT_LINK_RESET. Being aware
      of the likelihood (or certainty) of an error message, the DEV_OPEN
      event is added to the set of pending events directly, relying on
      the subsequent deferral of the EVENT_LINK_RESET call to schedule the
      work.  Take the same precaution with EVENT_STAT_UPDATE to avoid a
      totally unnecessary error message.
      Signed-off-by: NPhil Elwell <phil@raspberrypi.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fed56079
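      A sketch of the approach described above (identifiers are assumptions based
      on the lan78xx driver, where the deferral helper appears as
      lan78xx_defer_kevent):

      /* in lan78xx_open(): set both flags directly and let the single
       * EVENT_LINK_RESET deferral schedule the work item once */
      set_bit(EVENT_STAT_UPDATE, &dev->flags);
      set_bit(EVENT_DEV_OPEN, &dev->flags);
      lan78xx_defer_kevent(dev, EVENT_LINK_RESET);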
    • lan78xx: Correctly indicate invalid OTP · 4bfc3380
      By Phil Elwell
      lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP
      content, but the value gets overwritten before it is returned and the
      read goes ahead anyway. Make the read conditional as it should be
      and preserve the error code.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: NPhil Elwell <phil@raspberrypi.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bfc3380
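      A minimal sketch of the corrected flow, assuming the driver's OTP signature
      macros (exact names are illustrative):

      if (sig == OTP_INDICATOR_2)
              offset += 0x100;
      else if (sig != OTP_INDICATOR_1)
              return -EINVAL;         /* invalid OTP: report it and skip the read */

      ret = lan78xx_read_raw_otp(dev, offset, length, data);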
  2. April 9, 2018 (10 commits)
    • vhost-net: set packet weight of tx polling to 2 * vq size · a2ac9990
      By haibinzhang (Haibin Zhang)
      handle_tx can delay rx for tens or even hundreds of milliseconds when tx is
      busy polling UDP packets with a small length (e.g. a 1-byte UDP payload),
      because VHOST_NET_WEIGHT only takes the number of sent bytes into account,
      not the number of packets.
      
      The ping latencies shown below were measured between two virtual machines
      running netperf (UDP_STREAM, len=1), while another machine pinged the client:
      
      vq size=256
      Packet-Weight   Ping-Latencies(millisecond)
                         min      avg       max
      Origin           3.319   18.489    57.303
      64               1.643    2.021     2.552
      128              1.825    2.600     3.224
      256              1.997    2.710     4.295
      512              1.860    3.171     4.631
      1024             2.002    4.173     9.056
      2048             2.257    5.650     9.688
      4096             2.093    8.508    15.943
      
      vq size=512
      Packet-Weight   Ping-Latencies(millisecond)
                         min      avg       max
      Origin           6.537   29.177    66.245
      64               2.798    3.614     4.403
      128              2.861    3.820     4.775
      256              3.008    4.018     4.807
      512              3.254    4.523     5.824
      1024             3.079    5.335     7.747
      2048             3.944    8.201    12.762
      4096             4.158   11.057    19.985
      
      Seems pretty consistent, with a small dip at the two VQ sizes.
      The ring size is a hint from the device about the burst size it can tolerate.
      Based on the benchmarks, set the weight to 2 * vq size (a sketch follows
      this entry).
      
      To evaluate this change, further tests were done using netperf (RR, TX) between
      two machines with Intel(R) Xeon(R) Gold 6133 CPUs @ 2.50GHz, and the vq size was
      tweaked through qemu. The results shown below do not show obvious changes.
      
      vq size=256 TCP_RR                vq size=512 TCP_RR
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
         1/       1/  -7%/        -2%      1/       1/   0%/        -2%
         1/       4/  +1%/         0%      1/       4/  +1%/         0%
         1/       8/  +1%/        -2%      1/       8/   0%/        +1%
        64/       1/  -6%/         0%     64/       1/  +7%/        +3%
        64/       4/   0%/        +2%     64/       4/  -1%/        +1%
        64/       8/   0%/         0%     64/       8/  -1%/        -2%
       256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
       256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
       256/       8/  +2%/         0%    256/       8/  +1%/        -1%
      
      vq size=256 UDP_RR                vq size=512 UDP_RR
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
         1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
         1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
         1/       8/  -1%/        -1%      1/       8/  -1%/         0%
        64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
        64/       4/  -5%/        -1%     64/       4/  +2%/         0%
        64/       8/   0%/        -1%     64/       8/  -2%/        +1%
       256/       1/  +7%/        +1%    256/       1/  -7%/         0%
       256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
       256/       8/  +2%/        +2%    256/       8/  +1%/        +1%
      
      vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
        64/       1/   0%/        -3%     64/       1/   0%/         0%
        64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
        64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
       256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
       256/       4/  -1%/        -1%    256/       4/  -3%/         0%
       256/       8/  +7%/        +5%    256/       8/  -3%/         0%
       512/       1/  +1%/         0%    512/       1/  -1%/        -1%
       512/       4/  +1%/        -1%    512/       4/   0%/         0%
       512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
      1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
      1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
      1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
      2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
      2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
      2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
      4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
      4096/       4/  +2%/         0%   4096/       4/   0%/         0%
      4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NHaibin Zhang <haibinzhang@tencent.com>
      Signed-off-by: NYunfang Tai <yunfangtai@tencent.com>
      Signed-off-by: NLidong Chen <lidongchen@tencent.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2ac9990
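      The shape of the change, as a sketch (the macro name follows this series;
      the surrounding handle_tx() loop is abbreviated):

      /* in addition to the byte-based VHOST_NET_WEIGHT, cap one handle_tx() run
       * at 2 * vq size packets so small-packet floods cannot starve rx */
      #define VHOST_NET_PKT_WEIGHT(vq)  ((vq)->num * 2)

              if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
                  unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
                      vhost_poll_queue(&vq->poll);
                      break;
              }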
    • net: thunderx: rework mac addresses list to u64 array · 9b5c4dfb
      By Vadim Lomovtsev
      It is too expensive to pass u64 values via a linked list; instead, allocate
      an array for them, sized by the overall number of MAC addresses from the netdev.
      
      This removes the multiple kmalloc() calls, avoids memory fragmentation, and
      allows a single NULL check on the kmalloc() return value to prevent a
      potential NULL pointer dereference (a sketch follows this entry).
      
      Addresses-Coverity-ID: 1467429 ("Dereference null return value")
      Fixes: 37c3347e ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NVadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b5c4dfb
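      A sketch of the reworked collection, with illustrative struct and field names
      (the real driver structures may differ):

      struct xcast_addr_list {
              int count;
              u64 mc[];                       /* one u64 per MAC address */
      };

      struct netdev_hw_addr *ha;
      struct xcast_addr_list *list;
      int n = netdev_mc_count(netdev);

      list = kmalloc(sizeof(*list) + n * sizeof(u64), GFP_ATOMIC);
      if (!list)
              return;                         /* single allocation, single NULL check */
      list->count = 0;
      netdev_for_each_mc_addr(ha, netdev)
              list->mc[list->count++] = ether_addr_to_u64(ha->addr);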
    • dp83640: Ensure against premature access to PHY registers after reset · 76327a35
      By Esben Haabendal
      The datasheet specifies a 3 us pause after performing a software
      reset. The default implementation of genphy_soft_reset() does not
      provide this, so implement soft_reset with the needed pause (a sketch
      follows this entry).
      Signed-off-by: NEsben Haabendal <eha@deif.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76327a35
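      A plausible soft_reset implementation matching the description (the delay
      bounds are chosen to comfortably exceed the 3 us requirement):

      static int dp83640_soft_reset(struct phy_device *phydev)
      {
              int ret = genphy_soft_reset(phydev);

              if (ret < 0)
                      return ret;

              usleep_range(10, 20);   /* datasheet: wait at least 3 us after reset */
              return 0;
      }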
    • devlink: convert occ_get op to separate registration · fc56be47
      By Jiri Pirko
      This resolves a race during initialization where the resources with
      ops are registered before the driver and the structures used by the occ_get
      op are initialized. So keep the occ_get callbacks registered only while
      all the structs they use are initialized (see the sketch after this entry).
      
      The example flows, as it is in mlxsw:
      1) driver load/asic probe:
         mlxsw_core
            -> mlxsw_sp_resources_register
              -> mlxsw_sp_kvdl_resources_register
                -> devlink_resource_register IDX
         mlxsw_spectrum
            -> mlxsw_sp_kvdl_init
              -> mlxsw_sp_kvdl_parts_init
                -> mlxsw_sp_kvdl_part_init
                  -> devlink_resource_size_get IDX (to get the current setup
                                                    size from devlink)
              -> devlink_resource_occ_get_register IDX (register current
                                                        occupancy getter)
      2) reload triggered by devlink command:
        -> mlxsw_devlink_core_bus_device_reload
          -> mlxsw_sp_fini
            -> mlxsw_sp_kvdl_fini
      	-> devlink_resource_occ_get_unregister IDX
          (struct mlxsw_sp *mlxsw_sp is freed at this point, call to occ get
           which is using mlxsw_sp would cause use-after free)
          -> mlxsw_sp_init
            -> mlxsw_sp_kvdl_init
              -> mlxsw_sp_kvdl_parts_init
                -> mlxsw_sp_kvdl_part_init
                  -> devlink_resource_size_get IDX (to get the current setup
                                                    size from devlink)
              -> devlink_resource_occ_get_register IDX (register current
                                                        occupancy getter)
      
      Fixes: d9f9b9a4 ("devlink: Add support for resource abstraction")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc56be47
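      The registration pairing, sketched with the mlxsw identifiers from the flow
      above (the occ_get callback name and the exact devlink argument lists are
      assumptions):

      /* in mlxsw_sp_kvdl_init(), once mlxsw_sp and the kvdl parts exist */
      devlink_resource_occ_get_register(devlink, MLXSW_SP_RESOURCE_KVD_LINEAR,
                                        mlxsw_sp_kvdl_occ_get, mlxsw_sp);

      /* in mlxsw_sp_kvdl_fini(), before mlxsw_sp is freed */
      devlink_resource_occ_get_unregister(devlink, MLXSW_SP_RESOURCE_KVD_LINEAR);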
    • net/fsl_pq_mdio: Allow explicit specification of TBIPA address · 21481189
      By Esben Haabendal
      This introduces a simpler and more generic method for finding (and mapping)
      the TBIPA register.
      
      Instead of relying on complicated logic for finding the TBIPA register
      address based on the MDIO or MII register block base address, which in
      some cases even relies on undocumented shadow registers, a second "reg"
      entry in the mdio bus devicetree node specifies the TBIPA register.
      
      Backwards compatibility is kept, as the existing logic is applied when
      only a single "reg" mapping is specified.
      Signed-off-by: NEsben Haabendal <eha@deif.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21481189
    • ibmvnic: Do not reset CRQ for Mobility driver resets · 30f79625
      By Nathan Fontenot
      When resetting the ibmvnic driver after a partition migration occurs
      there is no requirement to do a reset of the main CRQ. The current
      driver code does the required re-enable of the main CRQ, then does
      a reset of the main CRQ later.
      
      What we should be doing for a driver reset after a migration is to
      re-enable the main CRQ, release all the sub-CRQs, and then allocate
      new sub-CRQs after capability negotiation.
      
      This patch updates the handling of mobility resets to do the proper
      work and not reset the main CRQ. To do this, the initialization/reset
      of the main CRQ had to be moved out of the ibmvnic_init routine
      and into the ibmvnic_probe and do_reset routines.
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30f79625
    • ibmvnic: Fix failover case for non-redundant configuration · 5a18e1e0
      By Thomas Falcon
      There is a failover case for a non-redundant pseries VNIC
      configuration that was not being handled properly. The current
      implementation assumes that the driver will always have a redundant
      device to communicate with following a failover notification. There
      are cases, however, when a non-redundant configuration can receive
      a failover request. If that happens, the driver should wait until
      it receives a signal that the device is ready for operation.
      
      The driver is agnostic of its backing hardware configuration,
      so this fix necessarily affects all device failover management.
      The driver needs to wait until it receives a signal that the device
      is ready for resetting. A flag is introduced to track this intermediary
      state where the driver is waiting for an active device.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a18e1e0
    • ibmvnic: Fix reset scheduler error handling · af894d23
      By Thomas Falcon
      In some cases, if the driver is waiting for a reset following
      a device parameter change, failure to schedule a reset can result
      in a hang since a completion signal is never sent.
      
      If the device configuration is being altered by a tool such
      as ethtool or ifconfig, it could cause the console to hang
      if the reset request does not get scheduled. Add some additional
      error handling code to exit the wait_for_completion if there is
      one in progress.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af894d23
    • ibmvnic: Zero used TX descriptor counter on reset · 41f71467
      By Thomas Falcon
      The counter that tracks used TX descriptors pending completion
      needs to be zeroed as part of a device reset. This change fixes
      a bug causing transmit queues to be stopped unnecessarily and in
      some cases a transmit queue stall and timeout reset. If the counter
      is not reset, the remaining descriptors will not be "removed",
      effectively reducing queue capacity. If the queue is over half full,
      it will cause the queue to stall if stopped.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41f71467
    • ibmvnic: Fix DMA mapping mistakes · 37e40fa8
      By Thomas Falcon
      Fix some mistakes caught by the DMA debugger. The first change
      fixes an unnecessary unmap that should have been removed in an
      earlier update. The next hunk fixes another bad unmap by zeroing
      the bit checked to determine that an unmap is needed. The final
      change fixes some buffers that are unmapped with the wrong
      direction specified.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37e40fa8
  3. April 8, 2018 (1 commit)
    • treewide: fix up files incorrectly marked executable · 90fda63f
      By Linus Torvalds
      Joe Perches noted that we have a few source files that for some
      inexplicable reason (read: I'm too lazy to even go look at the history)
      are marked executable:
      
        drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
        drivers/net/ethernet/cadence/macb_ptp.c
      
      A simple git command line to show executable C/asm/header files is this:
      
          git ls-files -s '*.[chsS]' | grep '^100755'
      
      and then you can fix them up with scripting by just feeding that output
      into:
      
          | cut -f2 | xargs chmod -x
      
      and commit it.
      
      Which is exactly what this commit does.
      Reported-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90fda63f
  4. April 7, 2018 (1 commit)
  5. April 6, 2018 (21 commits)
    • ARM: sa1100/simpad: switch simpad CF to use gpiod APIs · 64b2f129
      By Russell King
      Switch simpad's CF implementation to use the gpiod APIs.  The inverted
      detection is handled using gpiolib's native inversion abilities.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      64b2f129
    • ARM: sa1100/shannon: convert to generic CF sockets · b51af865
      By Russell King
      Convert shannon to use the generic CF socket support.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      b51af865
    • ARM: sa1100/nanoengine: convert to generic CF sockets · 80c799db
      By Russell King
      Convert nanoengine to use the generic CF socket support.
      Makefile fix from Arnd Bergmann <arnd@arndb.de>.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      80c799db
    • ice: Bug fixes in ethtool code · cba5957d
      By Anirudh Venkataramanan
      1) Return correct size from ice_get_regs_len.
      2) Fix incorrect use of ARRAY_SIZE in ice_get_regs.
      
      Fixes: fcea6f3d ("ice: Add stats and ethtool support")
      Signed-off-by: NAnirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      cba5957d
    • ice: Fix error return code in ice_init_hw() · 63bb4e1e
      By Wei Yongjun
      Fix to return error code ICE_ERR_NO_MEMORY from the alloc error
      handling case instead of 0, as done elsewhere in this function.
      
      Fixes: dc49c772 ("ice: Get MAC/PHY/link info and scheduler topology")
      Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
      Acked-by: NAnirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      63bb4e1e
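      The shape of the fix, as a sketch (variable and label names are illustrative):

      pcaps = devm_kzalloc(ice_hw_to_dev(hw), sizeof(*pcaps), GFP_KERNEL);
      if (!pcaps) {
              status = ICE_ERR_NO_MEMORY;     /* previously fell through with 0 */
              goto err_unroll_sched;
      }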
    • headers: untangle kmemleak.h from mm.h · 514c6032
      By Randy Dunlap
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h
      from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
      don't already #include it.  Also remove <linux/kmemleak.h> from source
      files that do not use it.
      
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
      be good to run it through the 0day bot for other $ARCHes.  I have
      neither the horsepower nor the storage space for the other $ARCHes.
      
      Update: This patch has been extensively build-tested by both the 0day
      bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
      for which patches are included here (in v2).
      
      [ slab.h is the second most used header file after module.h; kernel.h is
        right there with slab.h. There could be some minor error in the
        counting due to some #includes having comments after them and I didn't
        combine all of those. ]
      
      [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      514c6032
    • zram: drop max_zpage_size and use zs_huge_class_size() · 60f5921a
      By Sergey Senozhatsky
      Remove ZRAM's enforced "huge object" value and use zsmalloc huge-class
      watermark instead, which makes more sense.
      
      TEST
      - I used a 1G zram device with the LZO compression back-end; the original
        data set size was 444MB. Looking at the zsmalloc class stats, the
        test ended up being pretty fair.
      
      BASE ZRAM/ZSMALLOC
      =====================
      zram mm_stat
      
      498978816 191482495 199831552        0 199831552    15634        0
      
      zsmalloc classes
      
       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
      ...
         151  2448           0            0          1240       1240        744                3        0
         168  2720           0            0          4200       4200       2800                2        0
         190  3072           0            0         10100      10100       7575                3        0
         202  3264           0            0           380        380        304                4        0
         254  4096           0            0         10620      10620      10620                1        0
      
       Total                 7           46        106982     106187      48787                         0
      
      PATCHED ZRAM/ZSMALLOC
      =====================
      
      zram mm_stat
      
      498978816 182579184 194248704        0 194248704    15628        0
      
      zsmalloc classes
      
       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
      ...
         151  2448           0            0          1240       1240        744                3        0
         168  2720           0            0          4200       4200       2800                2        0
         190  3072           0            0         10100      10100       7575                3        0
         202  3264           0            0          7180       7180       5744                4        0
         254  4096           0            0          3820       3820       3820                1        0
      
       Total                 8           45        106959     106193      47424                         0
      
      As we can see, we reduced the number of objects stored in class-4096,
      because a huge number of objects which we previously forcibly stored in
      class-4096 now stored in non-huge class-3264.  This results in lower
      memory consumption:
      
      - zsmalloc now uses 47424 physical pages, which is less than 48787 pages
        zsmalloc used before.
      
      - objects that we store in class-3264 share zspages.  That's why overall
        the number of pages that both class-4096 and class-3264 consumed went
        down from 10924 to 9564.
      
      [sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
        Link: http://lkml.kernel.org/r/20180314081833.1096-3-sergey.senozhatsky@gmail.com
      Link: http://lkml.kernel.org/r/20180306070639.7389-3-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60f5921a
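      A sketch of how the watermark replaces the old constant, assuming the
      zs_huge_class_size() helper introduced alongside this change:

      /* at pool creation: ask zsmalloc for its own huge-class threshold */
      huge_class_size = zs_huge_class_size(zram->mem_pool);

      /* on write: store uncompressed only when zsmalloc itself would treat
       * the object as huge */
      if (comp_len >= huge_class_size)
              comp_len = PAGE_SIZE;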
    • device-dax: implement ->pagesize() for smaps to report MMUPageSize · c1d53b92
      By Dan Williams
      Given that device-dax is making similar page mapping size guarantees as
      hugetlbfs, emit the size in smaps and any other kernel path that
      requests the mapping size of a vma.
      
      Link: http://lkml.kernel.org/r/151996255287.27922.18397777516059080245.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: NJane Chu <jane.chu@oracle.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c1d53b92
    • mm/memory_hotplug: optimize memory hotplug · d0dc12e8
      By Pavel Tatashin
      During memory hotplugging we traverse struct pages three times:
      
      1. memset(0) in sparse_add_one_section()
      2. loop in __add_section() to do: set_page_node(page, nid); and
         SetPageReserved(page);
      3. loop in memmap_init_zone() to call __init_single_pfn()
      
      This patch removes the first two loops, and leaves only loop 3.  All
      struct pages are initialized in one place, the same as it is done during
      boot.
      
      The benefits:
      
       - We improve memory hotplug performance because we are not evicting the
         cache several times and also reduce loop branching overhead.
      
       - Remove condition from hotpath in __init_single_pfn(), that was added
         in order to fix the problem that was reported by Bharata in the above
         email thread, thus also improve performance during normal boot.
      
       - Make memory hotplug more similar to the boot memory initialization
         path because we zero and initialize struct pages only in one
         function.
      
       - Simplifies memory hotplug struct page initialization code, and thus
         enables future improvements, such as multi-threading the
         initialization of struct pages in order to improve hotplug
         performance even further on larger machines.
      
      [pasha.tatashin@oracle.com: v5]
        Link: http://lkml.kernel.org/r/20180228030308.1116-7-pasha.tatashin@oracle.com
      Link: http://lkml.kernel.org/r/20180215165920.8570-7-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0dc12e8
    • mm/memory_hotplug: don't read nid from struct page during hotplug · fc44f7f9
      By Pavel Tatashin
      During memory hotplugging the probe routine will leave struct pages
      uninitialized, the same as it is currently done during boot.  Therefore,
      we do not want to access the inside of struct pages before
      __init_single_page() is called during onlining.
      
      Because during hotplug we know that pages in one memory block belong to
      the same numa node, we can skip the checking.  We should keep checking
      for the boot case.
      
      [pasha.tatashin@oracle.com: s/register_new_memory()/hotplug_memory_register()]
        Link: http://lkml.kernel.org/r/20180228030308.1116-6-pasha.tatashin@oracle.com
      Link: http://lkml.kernel.org/r/20180215165920.8570-6-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc44f7f9
    • mm/memory_hotplug: optimize probe routine · b77eab70
      By Pavel Tatashin
      When memory is hotplugged, pages_correctly_reserved() is called to verify
      that the added memory is present. This routine traverses every
      struct page and verifies that PageReserved() is set.  This is a slow
      operation, especially if a large amount of memory is added.
      
      Instead of checking every page, it is enough to simply check that the
      section is present, has mapping (struct page array is allocated), and
      the mapping is online.
      
      In addition, we should not expect the probe routine to set flags in
      struct page, as the struct pages have not yet been initialized.  The
      initialization should be done in __init_single_page(), the same as
      during boot.
      
      Link: http://lkml.kernel.org/r/20180215165920.8570-5-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b77eab70
    • hv_netvsc: Pass net_device parameter to revoke and teardown functions · 3f076eff
      By Mohammed Gamal
      The callers of netvsc_revoke_*_buf() and netvsc_teardown_*_gpadl()
      already have their net_device instances. Pass them as a parameter to
      the functions instead of obtaining them from the netvsc_device struct
      every time.
      Signed-off-by: NMohammed Gamal <mgamal@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f076eff
    • hv_netvsc: Ensure correct teardown message sequence order · a56d99d7
      By Mohammed Gamal
      Prior to commit 0cf73780 ("hv_netvsc: netvsc_teardown_gpadl() split")
      the call sequence in netvsc_device_remove() was as follows (as
      implemented in netvsc_destroy_buf()):
      1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
      2- Teardown receive buffer GPADL
      3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
      4- Teardown send buffer GPADL
      5- Close vmbus
      
      This didn't work for WS2016 hosts. Commit 0cf73780
      ("hv_netvsc: netvsc_teardown_gpadl() split") rearranged the
      teardown sequence as follows:
      1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
      2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
      3- Close vmbus
      4- Teardown receive buffer GPADL
      5- Teardown send buffer GPADL
      
      That worked well for WS2016 hosts, but it prevented guests on older hosts from
      shutting down after changing network settings. Commit 0ef58b0a
      ("hv_netvsc: change GPAD teardown order on older versions") ensured the
      following message sequence for older hosts
      1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
      2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
      3- Teardown receive buffer GPADL
      4- Teardown send buffer GPADL
      5- Close vmbus
      
      However, with this sequence calling `ip link set eth0 mtu 1000` hangs and the
      process becomes uninterruptible. On further analysis it turns out that on tearing
      down the receive buffer GPADL the kernel waits indefinitely
      in vmbus_teardown_gpadl() for a completion to be signaled.
      
      Here is a snippet of where this occurs:
      int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
      {
              struct vmbus_channel_gpadl_teardown *msg;
              struct vmbus_channel_msginfo *info;
              unsigned long flags;
              int ret;
      
              info = kmalloc(sizeof(*info) +
                             sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
              if (!info)
                      return -ENOMEM;
      
              init_completion(&info->waitevent);
              info->waiting_channel = channel;
      [....]
              ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_gpadl_teardown),
                                   true);
      
              if (ret)
                      goto post_msg_err;
      
              wait_for_completion(&info->waitevent);
      [....]
      }
      
      The completion is signaled from vmbus_ongpadl_torndown(), which gets called when
      the corresponding message is received from the host, which apparently never happens
      in that case.
      This patch works around the issue by restoring the first mentioned message
      sequence for older hosts.
      
      Fixes: 0ef58b0a ("hv_netvsc: change GPAD teardown order on older versions")
      Signed-off-by: NMohammed Gamal <mgamal@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a56d99d7
    • hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl() · 7992894c
      By Mohammed Gamal
      Split each of the functions into two, one for each of the send/recv buffers.
      This will be needed in order to implement a fine-grained messaging
      sequence to the host so that we accommodate the requirements of
      different Windows versions.
      
      Fixes: 0ef58b0a ("hv_netvsc: change GPAD teardown order on older versions")
      Signed-off-by: NMohammed Gamal <mgamal@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7992894c
    • hv_netvsc: Use Windows version instead of NVSP version on GPAD teardown · 2afc5d61
      By Mohammed Gamal
      When changing network interface settings, Windows guests
      older than WS2016 can no longer shutdown. This was addressed
      by commit 0ef58b0a ("hv_netvsc: change GPAD teardown order
      on older versions"), however the issue also occurs on WS2012
      guests that share NVSP protocol versions with WS2016 guests.
      Hence we use the Windows version directly to differentiate them (a sketch
      follows this entry).
      
      Fixes: 0ef58b0a ("hv_netvsc: change GPAD teardown order on older versions")
      Signed-off-by: NMohammed Gamal <mgamal@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2afc5d61
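      A sketch of the resulting ordering for pre-WS2016 hosts; the version constant
      and the helper names follow this patch series but are not quoted verbatim:

      /* the revoke messages were already sent above */
      if (vmbus_proto_version < VERSION_WIN10) {
              /* older hosts: tear down the GPADLs before closing the channel */
              netvsc_teardown_recv_gpadl(device, net_device, ndev);
              netvsc_teardown_send_gpadl(device, net_device, ndev);
      }

      vmbus_close(device->channel);

      if (vmbus_proto_version >= VERSION_WIN10) {
              netvsc_teardown_recv_gpadl(device, net_device, ndev);
              netvsc_teardown_send_gpadl(device, net_device, ndev);
      }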
    • net: mvpp2: Fix parser entry init boundary check · 3d92f0b5
      By Maxime Chevallier
      The boundary check in mvpp2_prs_init_from_hw must be done on the
      passed "tid" parameter, not on the mvpp2_prs_entry index, which is not yet
      initialized at the time of the check (a sketch follows this entry).
      
      Fixes: 47e0e14e ("net: mvpp2: Make mvpp2_prs_hw_read a parser entry init function")
      Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d92f0b5
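      A sketch of the corrected check (the size constant is an assumption based on
      the mvpp2 driver):

      static int mvpp2_prs_init_from_hw(struct mvpp2 *priv,
                                        struct mvpp2_prs_entry *pe, int tid)
      {
              /* validate the caller-supplied tid; pe->index is only set below */
              if (tid > MVPP2_PRS_TCAM_SRAM_SIZE - 1)
                      return -EINVAL;

              memset(pe, 0, sizeof(*pe));
              pe->index = tid;
              /* ... read the TCAM/SRAM entry for tid into pe ... */
              return 0;
      }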
    • kernel.h: Retain constant expression output for max()/min() · 3c8ba0d6
      By Kees Cook
      In the effort to remove all VLAs from the kernel[1], it is desirable to
      build with -Wvla.  However, this warning is overly pessimistic, in that
      it is only happy with stack array sizes that are declared as constant
      expressions, and not constant values.  One case of this is the
      evaluation of the max() macro which, due to its construction, ends up
      converting constant expression arguments into a constant value result.
      
      All attempts to rewrite this macro with __builtin_constant_p() failed
      with older compilers (e.g.  gcc 4.4)[2].  However, Martin Uecker,
      constructed[3] a mind-shattering solution that works everywhere.
      Cthulhu fhtagn!
      
      This patch updates the min()/max() macros to evaluate to a constant
      expression when called on constant expression arguments.  This removes
      several false-positive stack VLA warnings from an x86 allmodconfig build
      when -Wvla is added:
      
        $ diff -u before.txt after.txt | grep ^-
        -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids variable length array ‘ids’ [-Wvla]
        -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array ‘namebuf’ [-Wvla]
        -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ [-Wvla]
        -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
        -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
        -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ [-Wvla]
      
      This also updates two cases where different enums were being compared
      and explicitly casts them to int (which matches the old side-effect of
      the single-evaluation code): one in tpm/tpm_tis_core.h, and one in
      drm/drm_color_mgmt.c.
      
       [1] https://lkml.org/lkml/2018/3/7/621
       [2] https://lkml.org/lkml/2018/3/10/170
       [3] https://lkml.org/lkml/2018/3/20/845
      Co-Developed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Co-Developed-by: Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c8ba0d6
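      The core of the trick from [3], reproduced as a sketch: a test that is itself
      an integer constant expression and reports whether its argument is one.

      #define __is_constexpr(x) \
              (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

      /* If x is a constant expression, (long)(x) * 0l is a null pointer constant,
       * the conditional takes the type int *, and the sizeof comparison holds;
       * otherwise the type is void * and the comparison is false. */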
    • IB/rxe: Fix for oops in rxe_register_device on ppc64le arch · efc365e7
      By Mikhail Malygin
      On the ppc64le arch the rxe_add command causes an oops in the kernel log:
      
      [   92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
      [   92.499710] SMP NR_CPUS=2048 NUMA pSeries
      [   92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
      _nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
       nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
      b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
      m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
      generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
      a2xxx(E)
      [   92.499832]  scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
      [   92.499834] Supported: No, Unsupported modules are loaded
      [   92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G           OE   NX 4.4.120-ttln.17-default #1
      [   92.499841] task: c0000000afe8a490 ti: c0000000beba8000 task.ti: c0000000beba8000
      [   92.499842] NIP: c00000000008ba3c LR: c000000000027644 CTR: c00000000008ba10
      [   92.499844] REGS: c0000000bebab750 TRAP: 0300   Tainted: G           OE   NX  (4.4.120-ttln.17-default)
      [   92.499850] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28424428  XER: 20000000
      [   92.499871] CFAR: 0000000000002424 DAR: 0000000000000208 DSISR: 40000000 SOFTE: 1
                     GPR00: c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
                     GPR04: d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
                     GPR08: 000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
                     GPR12: c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
                     GPR16: 0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
                     GPR20: c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
                     GPR24: c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
                     GPR28: d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
      [   92.499881] NIP [c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
      [   92.499885] LR [c000000000027644] dma_get_required_mask+0x44/0xac
      [   92.499886] Call Trace:
      [   92.499891] [c0000000bebab9d0] [c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
      [   92.499894] [c0000000bebaba10] [c000000000027644] dma_get_required_mask+0x44/0xac
      [   92.499904] [c0000000bebaba30] [d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
      [   92.499910] [c0000000bebabab0] [d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
      [   92.499915] [c0000000bebabb30] [d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
      [   92.499921] [c0000000bebabb60] [d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
      [   92.499924] [c0000000bebabbf0] [c0000000000e78c0] param_attr_store+0xa0/0x180
      [   92.499927] [c0000000bebabc70] [c0000000000e6448] module_attr_store+0x48/0x70
      [   92.499932] [c0000000bebabc90] [c000000000391f60] sysfs_kf_write+0x70/0xb0
      [   92.499935] [c0000000bebabcb0] [c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
      [   92.499939] [c0000000bebabd00] [c0000000002e22ac] __vfs_write+0x4c/0x1d0
      [   92.499942] [c0000000bebabd90] [c0000000002e2f94] vfs_write+0xc4/0x200
      [   92.499945] [c0000000bebabde0] [c0000000002e488c] SyS_write+0x6c/0x110
      [   92.499948] [c0000000bebabe30] [c000000000009384] system_call+0x38/0xe4
      [   92.499949] Instruction dump:
      [   92.499954] 4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
      [   92.499958] fbc1fff0 fbe1fff8 f8010010 f821ffc1 <e9230208> 7c7e1b78 2fa90000 419e0078
      [   92.499962] ---[ end trace bed077e15eb420cf ]---
      
      It fails in dma_get_required_mask(), which has a ppc-specific implementation
      and fails if the provided device argument is NULL.
      Signed-off-by: NMikhail Malygin <mikhail@malygin.me>
      Reviewed-by: NYonatan Cohen <yonatanc@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      efc365e7
    • IB/mlx5: Device memory mr registration support · 6c29f57e
      By Ariel Levkovich
      Add the mlx5_ib driver implementation for the reg_dm_mr callback,
      which allows registering device memory (DM) as an MR for
      local and remote access.
      Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      6c29f57e
    • net/mlx5: Mkey creation command adjustments · cdbd0d2b
      By Ariel Levkovich
      This change updates the mlx5 interface for creating an mkey
      on the device.
      
      The updates in the command mailbox include increasing the
      access mode type field to 5 bits in order to support additional
      types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device
      memory access type and will be used when registering MR on allocated
      device memory.
      
      All the places that use the old access mode format are adjusted as
      well.
      Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      cdbd0d2b
    • IB/mlx5: Device memory support in mlx5_ib · 24da0016
      By Ariel Levkovich
      This patch adds the mlx5_ib driver implementation for the device
      memory allocation API.
      It implements the ib_device callbacks for the allocation and deallocation
      operations, as well as support for a new mmap command which allows mapping
      allocated device memory into a VMA.
      
      The change also adds reporting of device memory maximum size and
      alignment parameters reported in device capabilities.
      
      The allocation/deallocation operations are using new firmware
      commands to allocate MEMIC memory on the device.
      Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      24da0016