1. 07 Dec, 2019 (2 commits)
  2. 03 Dec, 2019 (2 commits)
  3. 27 Nov, 2019 (7 commits)
    • nvme-fc: fix double-free scenarios on hw queues · c869e494
      Authored by James Smart
      If an error occurs on one of the ios used for creating an
      association, the creating routine has error paths that are
      invoked by the command failure, and those error paths free up
      the controller resources created to that point.
      
      But the io failure was ultimately detected by an asynchronous
      completion routine, which unconditionally invokes the
      error_recovery path; that path calls delete_association.
      delete_association deletes all outstanding io and then tears
      down the controller resources, so the create_association thread
      can run in parallel with the error_recovery thread. What was
      seen was the LLDD receiving a call to delete a queue, causing
      the LLDD to free a resource, then the transport calling delete
      queue again, causing the driver to repeat the free and corrupt
      the allocator. The transport shouldn't be making the duplicate
      call, and the delete queue is just one of the resources being
      freed.
      
      The key observation behind the fix is that the create_association
      path is completely serialized, one command at a time, so the
      failed io completion will always be seen by the create_association
      path; as of the failure there are no ios to terminate and no
      reason to manipulate queue freeze states, etc. This serialized
      condition holds until the controller transitions to the LIVE
      state. Thus the fix is to change the error recovery path to
      check the controller state and only invoke the teardown path if
      not already in the CONNECTING state.
      Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
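      A minimal userspace sketch of the fix's logic (illustrative names, not the actual nvme-fc code): error recovery runs the full teardown only when the controller is not in the CONNECTING state, because while CONNECTING the serialized create_association path owns cleanup and a second teardown would double-free the hw queue resources.

```c
#include <stdbool.h>

/* Possible controller states, mirroring the states named in the commit
 * message; the enum values here are illustrative. */
enum ctrl_state { CTRL_NEW, CTRL_CONNECTING, CTRL_LIVE, CTRL_DELETING };

/* Decide whether error recovery should invoke the teardown path. */
static bool should_teardown(enum ctrl_state state)
{
    /* While CONNECTING, the create_association thread handles cleanup,
     * so tearing down here would duplicate the frees. */
    return state != CTRL_CONNECTING;
}
```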
    • nvme: else following return is not needed · c80b36cd
      Authored by Edmund Nadolski
      Remove the unnecessary else keyword in nvme_create_queue().
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
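      An illustrative example of the pattern being removed (not the actual nvme_create_queue() code): an else branch directly following a return is redundant, because control cannot fall through the return.

```c
/* Before: if (err) return -1; else return 0;
 * After: the else is dropped with no behavior change. */
int classify(int err)
{
    if (err)
        return -1;
    /* previously an else branch; the early return makes it unnecessary */
    return 0;
}
```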
    • nvme: add error message on mismatching controller ids · a8157ff3
      Authored by James Smart
      We've seen a few devices that return different controller ids to
      the Fabric Connect command vs the Identify(controller) command.
      This failure is currently hard to identify from existing error
      messages: it comes across as a (re)connect attempt in the
      transport that fails with a -22 (-EINVAL) status. The issue is
      compounded by older kernels either not having the controller id
      check or having the Identify command overwrite the fabrics
      controller id value before it was checked. Both resulted in
      cases where the devices appeared fine until more recent kernels.
      
      Clarify the reject by adding an error message on controller id
      mismatches.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
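      A hedged sketch of the added check (names hypothetical, not the driver's): compare the controller id returned by the Fabric Connect command against the one reported by Identify(controller), and emit an explicit message instead of failing with a bare -EINVAL.

```c
#include <stdbool.h>
#include <stdio.h>

/* Reject the connection, with a clear log line, when the two commands
 * disagree about the controller id. */
static bool cntlid_matches(unsigned int connect_cntlid,
                           unsigned int identify_cntlid)
{
    if (connect_cntlid != identify_cntlid) {
        fprintf(stderr,
                "Mismatching cntlid: Connect %#x vs Identify %#x, rejecting\n",
                connect_cntlid, identify_cntlid);
        return false;
    }
    return true;
}
```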
    • nvme_fc: add module to ops template to allow module references · 863fbae9
      Authored by James Smart
      In nvme-fc it's possible to have connected active controllers
      and, as no references are taken on the LLDD, the LLDD can be
      unloaded.  The controller would enter a reconnect state and, as
      long as the LLDD resumed within the reconnect timeout, the
      controller would resume.  But if a namespace on the controller
      is the root device, allowing the driver to unload can be
      problematic: reloading the driver may require new io to the boot
      device, and as it's no longer connected we get into a catch-22
      that eventually fails, and the system locks up.
      
      Fix this issue by taking a module reference for every connected
      controller (which is what the core layer does for the transport
      module). The reference is released when the controller is removed.
      Acked-by: Himanshu Madhani <hmadhani@marvell.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
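      A userspace sketch of the reference-counting pattern (the real fix uses try_module_get()/module_put() on the module field added to the LLDD ops template): a reference is taken per connected controller and dropped on removal, so the LLDD cannot be unloaded while controllers are live.

```c
#include <stdbool.h>

/* Refcount modeled as a plain integer for illustration. */
static int module_get(int refcount) { return refcount + 1; } /* on connect */
static int module_put(int refcount) { return refcount - 1; } /* on removal */

/* Unloading is only safe once no connected controllers remain. */
static bool can_unload(int refcount) { return refcount == 0; }
```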
    • nvmet-loop: Avoid preallocating big SGL for data · 52e6d8ed
      Authored by Israel Rukshin
      nvme_loop_create_io_queues() preallocates a big buffer for the IO SGL based
      on SG_CHUNK_SIZE.
      
      Modern DMA engines are often capable of dealing with very big segments so
      the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB
      SGL allocation per command.
      
      If a controller has lots of deep queues, preallocation for the sg list can
      consume substantial amounts of memory. For nvmet-loop, nr_hw_queues can be
      128 and each queue's depth 128. This means the resulting preallocation
      for the data SGL is 128*128*4K = 64MB per controller.
      
      Switch to runtime allocation for SGL for lists longer than 2 entries. This
      is the approach used by NVMe PCI so it should be reasonable for NVMeOF as
      well. Runtime SGL allocation has always been the case for the legacy I/O
      path so this is nothing new.
      Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
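      A back-of-the-envelope check of the 64MB figure quoted above, plus the runtime-allocation rule this series (here and in the nvme-fc and nvme-rdma patches below) switches to; the helper names are illustrative, not the driver's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Static preallocation cost: one SGL buffer per command slot. */
static size_t prealloc_bytes(size_t nr_hw_queues, size_t queue_depth,
                             size_t sgl_bytes_per_cmd)
{
    return nr_hw_queues * queue_depth * sgl_bytes_per_cmd;
}

/* After the patch: up to 2 entries use the small inline SGL; longer
 * lists are allocated at I/O time. */
static bool needs_runtime_sgl(size_t nr_segments)
{
    return nr_segments > 2;
}
```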
    • nvme-fc: Avoid preallocating big SGL for data · b1ae1a23
      Authored by Israel Rukshin
      nvme_fc_create_io_queues() preallocates a big buffer for the IO SGL based
      on SG_CHUNK_SIZE.
      
      Modern DMA engines are often capable of dealing with very big segments so
      the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB
      SGL allocation per command.
      
      If a controller has lots of deep queues, preallocation for the sg list can
      consume substantial amounts of memory. For nvme-fc, nr_hw_queues can be
      128 and each queue's depth 128. This means the resulting preallocation
      for the data SGL is 128*128*4K = 64MB per controller.
      
      Switch to runtime allocation for SGL for lists longer than 2 entries. This
      is the approach used by NVMe PCI so it should be reasonable for NVMeOF as
      well. Runtime SGL allocation has always been the case for the legacy I/O
      path so this is nothing new.
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
    • nvme-rdma: Avoid preallocating big SGL for data · 38e18002
      Authored by Israel Rukshin
      nvme_rdma_alloc_tagset() preallocates a big buffer for the IO SGL based
      on SG_CHUNK_SIZE.
      
      Modern DMA engines are often capable of dealing with very big segments so
      the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB
      SGL allocation per command.
      
      If a controller has lots of deep queues, preallocation for the sg list can
      consume substantial amounts of memory. For nvme-rdma, nr_hw_queues can be
      128 and each queue's depth 128. This means the resulting preallocation
      for the data SGL is 128*128*4K = 64MB per controller.
      
      Switch to runtime allocation for SGL for lists longer than 2 entries. This
      is the approach used by NVMe PCI so it should be reasonable for NVMeOF as
      well. Runtime SGL allocation has always been the case for the legacy I/O
      path so this is nothing new.
      
      The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't
      support SG_CHAIN, use only runtime allocation for the SGL.
      
      We didn't notice a performance degradation, since for small IOs
      we'll use the inline SG, and for bigger IOs the allocation of a
      bigger SGL from the slab is fast enough.
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
  4. 26 Nov, 2019 (6 commits)
    • slip: Fix use-after-free Read in slip_open · e58c1912
      Authored by Jouni Hogander
      slip_open doesn't remove a device whose registration failed from
      the slip_devs device list. On the next open after such a failure,
      this list is iterated and the freed device is accessed. Fix this
      by calling sl_free_netdev in the error path.
      
      Here is the trace from the Syzbot:
      
      __dump_stack lib/dump_stack.c:77 [inline]
      dump_stack+0x197/0x210 lib/dump_stack.c:118
      print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
      __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
      kasan_report+0x12/0x20 mm/kasan/common.c:634
      __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
      sl_sync drivers/net/slip/slip.c:725 [inline]
      slip_open+0xecd/0x11b7 drivers/net/slip/slip.c:801
      tty_ldisc_open.isra.0+0xa3/0x110 drivers/tty/tty_ldisc.c:469
      tty_set_ldisc+0x30e/0x6b0 drivers/tty/tty_ldisc.c:596
      tiocsetd drivers/tty/tty_io.c:2334 [inline]
      tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2594
      vfs_ioctl fs/ioctl.c:46 [inline]
      file_ioctl fs/ioctl.c:509 [inline]
      do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:696
      ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
      __do_sys_ioctl fs/ioctl.c:720 [inline]
      __se_sys_ioctl fs/ioctl.c:718 [inline]
      __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
      do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
      entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 3b5a3997 ("slip: Fix memory leak in slip_open error path")
      Reported-by: syzbot+4d5170758f3762109542@syzkaller.appspotmail.com
      Cc: David Miller <davem@davemloft.net>
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
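      A userspace sketch of the bug class and the fix's shape (names hypothetical, not the slip driver's symbols): the global list must not retain a pointer to a device freed after a failed registration, or the next open dereferences freed memory. In the driver, sl_free_netdev both frees the device and clears its slip_devs slot.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DEVS 4

/* Stand-in for the global slip_devs device list. */
static void *slip_devs_stub[MAX_DEVS];

/* Allocate a slot; on registration failure, free it AND clear the list
 * entry (the step that was missing before the fix). */
static bool register_dev(int slot, bool register_ok)
{
    slip_devs_stub[slot] = malloc(16);
    if (!register_ok) {
        free(slip_devs_stub[slot]);
        slip_devs_stub[slot] = NULL; /* otherwise next open reads freed memory */
        return false;
    }
    return true;
}

static bool slot_is_clean(int slot)
{
    return slip_devs_stub[slot] == NULL;
}
```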
    • net: dsa: sja1105: fix sja1105_parse_rgmii_delays() · 9bca3a0a
      Authored by Oleksij Rempel
      This function was using port 0's devicetree configuration for all
      ports. When the CPU port was not 0, its delay settings were
      ignored, which broke communication between the CPU and the
      switch.
      
      Fixes: f5b8631c ("net: dsa: sja1105: Error out if RGMII delays are requested in DT")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
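      An illustrative sketch of this bug class (hypothetical names, not the sja1105 driver's): per-port settings must be read from each port's own config; the buggy code effectively indexed port 0 for every port, so a CPU port at any other index lost its delay settings.

```c
#include <stdbool.h>

#define NUM_PORTS 5

struct port_cfg { bool rgmii_rx_delay; };

/* Fixed version indexes with i; the buggy version used ports[0]. */
static bool rx_delay_for_port(const struct port_cfg ports[], int i)
{
    return ports[i].rgmii_rx_delay;
}

static bool demo_cpu_port_delay(void)
{
    /* Only CPU port 4 requests a delay; reading port 0's config
     * instead would silently drop it. */
    struct port_cfg ports[NUM_PORTS] = {
        { false }, { false }, { false }, { false }, { true },
    };
    return rx_delay_for_port(ports, 4) && !rx_delay_for_port(ports, 0);
}
```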
    • macvlan: schedule bc_work even if error · 1d7ea556
      Authored by Menglong Dong
      While enqueueing a broadcast skb to port->bc_queue,
      schedule_work() is called to add port->bc_work, which processes
      the skbs in bc_queue, to the "events" work queue. If
      port->bc_queue is full, the skb is discarded and
      schedule_work(&port->bc_work) is not called. However, if
      port->bc_queue is full and port->bc_work is not running or
      pending, port->bc_queue will stay full, schedule_work() will
      never be called again, and all broadcast skbs to macvlan will be
      discarded. This can happen as follows:
      
      macvlan_process_broadcast() is the work function of
      port->bc_work; it moves all the skbs in port->bc_queue to the
      local queue "list" and processes them. During this, new skbs
      keep being added to port->bc_queue in
      macvlan_broadcast_enqueue(), and port->bc_queue may already be
      full when macvlan_process_broadcast() returns. This can happen
      especially when there are a lot of real-time threads and the
      process is preempted.
      
      Fix this by calling schedule_work(&port->bc_work) even if
      port->bc_queue is full in macvlan_broadcast_enqueue().
      
      Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
      Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
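      A sketch of the control flow after the fix (illustrative, not the driver code): the drain work is scheduled even when the skb is dropped because bc_queue is full, so a full queue always gets drained eventually.

```c
#include <stdbool.h>

struct port_stub { int queued; int capacity; bool work_scheduled; };

/* Enqueue if there is room; either way, schedule the drain work. */
static bool broadcast_enqueue(struct port_stub *p)
{
    bool enqueued = p->queued < p->capacity;
    if (enqueued)
        p->queued++;
    p->work_scheduled = true; /* the fix: schedule unconditionally */
    return enqueued;
}

static bool demo_full_queue_still_schedules(void)
{
    struct port_stub p = { .queued = 1, .capacity = 1,
                           .work_scheduled = false };
    bool enqueued = broadcast_enqueue(&p); /* dropped: queue is full */
    return !enqueued && p.work_scheduled;  /* but drain is still queued */
}
```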
    • enetc: add support Credit Based Shaper(CBS) for hardware offload · c431047c
      Authored by Po Liu
      The ENETC hardware supports the Credit Based Shaper (CBS)
      defined in IEEE 802.1Qav. The CBS driver is loaded via the
      sch_cbs interface when configured through the kernel QoS
      framework.
      
      Here is an example command sequence to reserve 20Mbit of
      bandwidth on a 1Gbit port for traffic class 7:
      
      tc qdisc add dev eth0 root handle 1: mqprio \
      	   num_tc 8 map 0 1 2 3 4 5 6 7 hw 1
      
      tc qdisc replace dev eth0 parent 1:8 cbs \
      	   locredit -1470 hicredit 30 \
      	   sendslope -980000 idleslope 20000 offload 1
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
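      A sanity check of the sendslope value in the example above, using the sch_cbs relation sendslope = idleslope - port_transmit_rate (all in kbit/s): reserving 20 Mbit/s on a 1 Gbit/s port gives -980000, matching the command.

```c
/* Compute the CBS sendslope from the idleslope and port rate. */
static long cbs_sendslope(long idleslope_kbps, long port_rate_kbps)
{
    return idleslope_kbps - port_rate_kbps;
}
```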
    • net: phy: add helpers phy_(un)lock_mdio_bus · bec170e5
      Authored by Heiner Kallweit
      Add helpers to make locking/unlocking the MDIO bus easier.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mdio_bus: don't use managed reset-controller · 32085f25
      Authored by David Bauer
      Geert Uytterhoeven reported that using devm_reset_controller_get
      leads to a WARNING when probing a reset-controlled PHY. This is
      because the device supplied to devm_reset_controller_get is not
      actually the one being probed.
      
      Fix this by acquiring an unmanaged reset control and freeing the
      reset_control on unregister.
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      CC: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David Bauer <mail@david-bauer.net>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 25 Nov, 2019 (17 commits)
  6. 24 Nov, 2019 (6 commits)