1. 08 Dec, 2016, 17 commits
  2. 07 Dec, 2016, 23 commits
    • Merge branch 'bnxt_en-RDMA' · d3243aef
      Committed by David S. Miller
      Michael Chan says:
      
      ====================
      bnxt_en: Add interface to support RDMA driver.
      
      This series adds an interface to support a brand new RDMA driver bnxt_re.
      The first step is to re-arrange some code so that pci_enable_msix() can
      be called during pci probe.  The purpose is to allow the RDMA driver to
      initialize and stay initialized whether the netdev is up or down.
      
      Then we make some changes to VF resource allocation so that there are
      enough resources to support RDMA.
      
      Finally the last patch adds a simple interface to allow the RDMA driver to
      probe and register itself with any bnxt_en devices that support RDMA.
      Once registered, the RDMA driver can request MSIX, send fw messages, and
      receive some notifications.
      
      v2: Fixed kbuild test robot warnings.
      
      David, please consider this series for net-next.  Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add interface to support RDMA driver. · a588e458
      Committed by Michael Chan
      Since the network driver and RDMA driver operate on the same PCI function,
      we need to create an interface to allow the RDMA driver to share resources
      with the network driver.
      
      1. Create a new bnxt_en_dev struct which will be returned by
      bnxt_ulp_probe() upon success.  After that, all calls from the RDMA driver
      to bnxt_en will pass a pointer to this struct.
      
      2. This struct contains additional function pointers to register, request
      MSIX, send fw messages, and register for async events.
      
      3. If the RDMA driver wants to enable RDMA on the function, it needs to
      call the function pointer bnxt_register_device().  A ulp_ops structure
      is passed for RCU protected upcalls from bnxt_en to the RDMA driver.
      
      4. The RDMA driver can call firmware APIs using the bnxt_send_fw_msg()
      function pointer.
      
      5. 1 stats context is reserved when the RDMA driver registers.  MSIX
      and completion rings are reserved when the RDMA driver calls
      bnxt_request_msix() function pointer.
      
      6. When the RDMA driver calls bnxt_unregister_device(), all RDMA resources
      will be cleaned up.
      
      v2: Fixed 2 uninitialized variable warnings.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
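
The register / request-MSIX / unregister flow described in this commit can be modeled in plain userspace C. This is only a sketch under stated assumptions: every `demo_` name is hypothetical, the resource counts are invented, and none of it reproduces the real bnxt_en structures or signatures. It shows a shared device struct whose function pointers reserve one stats context on registration, hand out MSIX vectors on request, and release everything on unregistration.

```c
#include <stddef.h>

/* Hypothetical userspace model; demo_* names are NOT kernel APIs. */
struct demo_en_dev;

/* Upcalls from the network driver into the RDMA driver. */
struct demo_ulp_ops {
    void (*async_notifier)(void *handle, int event);
};

/* Calls from the RDMA driver into the network driver. */
struct demo_en_ops {
    int (*register_device)(struct demo_en_dev *edev,
                           const struct demo_ulp_ops *ops, void *handle);
    int (*unregister_device)(struct demo_en_dev *edev);
    int (*request_msix)(struct demo_en_dev *edev, int wanted);
};

struct demo_en_dev {
    const struct demo_en_ops *en_ops;
    const struct demo_ulp_ops *ulp_ops;  /* NULL while unregistered */
    void *ulp_handle;
    int stat_ctxs_reserved;
    int msix_reserved;
    int msix_free;
};

static int demo_register(struct demo_en_dev *edev,
                         const struct demo_ulp_ops *ops, void *handle)
{
    if (edev->ulp_ops)
        return -1;                    /* already registered */
    edev->ulp_ops = ops;
    edev->ulp_handle = handle;
    edev->stat_ctxs_reserved = 1;     /* one stats context on registration */
    return 0;
}

static int demo_request_msix(struct demo_en_dev *edev, int wanted)
{
    int granted;

    if (!edev->ulp_ops)
        return -1;                    /* must register first */
    granted = wanted < edev->msix_free ? wanted : edev->msix_free;
    edev->msix_free -= granted;
    edev->msix_reserved += granted;
    return granted;
}

static int demo_unregister(struct demo_en_dev *edev)
{
    if (!edev->ulp_ops)
        return -1;
    edev->msix_free += edev->msix_reserved;   /* give everything back */
    edev->msix_reserved = 0;
    edev->stat_ctxs_reserved = 0;
    edev->ulp_ops = NULL;
    edev->ulp_handle = NULL;
    return 0;
}

static const struct demo_en_ops demo_ops = {
    .register_device = demo_register,
    .unregister_device = demo_unregister,
    .request_msix = demo_request_msix,
};

/* Stand-in for the probe step that hands the shared struct to the RDMA side. */
struct demo_en_dev *demo_ulp_probe(struct demo_en_dev *edev, int msix_total)
{
    edev->en_ops = &demo_ops;
    edev->ulp_ops = NULL;
    edev->ulp_handle = NULL;
    edev->stat_ctxs_reserved = 0;
    edev->msix_reserved = 0;
    edev->msix_free = msix_total;
    return edev;
}
```

A caller would obtain the struct from `demo_ulp_probe()` and then go through `edev->en_ops` for every later call, which mirrors the "pass a pointer to this struct" convention in the commit message.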
    • bnxt_en: Refactor the driver registration function with firmware. · a1653b13
      Committed by Michael Chan
      The driver register function with firmware consists of passing version
      information and registering for async events.  To support the RDMA driver,
      the async events that we need to register may change.  Separate the
      driver register function into 2 parts so that we can just update the
      async events for the RDMA driver.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Reserve RDMA resources by default. · e4060d30
      Committed by Michael Chan
      If the device supports RDMA, we'll set up network default rings so that
      there are enough minimum resources for RDMA, if possible.  However, the
      user can still increase network rings to the max if desired.  The actual
      RDMA resources won't be reserved until the RDMA driver registers.
      
      v2: Fix compile warning when CONFIG_BNXT_SRIOV is not set.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Improve completion ring allocation for VFs. · 7b08f661
      Committed by Michael Chan
      All available remaining completion rings not used by the PF should be
      made available for the VFs so that there are enough rings in the VF to
      support RDMA.  The earlier workaround code of capping the rings by the
      statistics context is removed.
      
      When SRIOV is disabled, call a new function bnxt_restore_pf_fw_resources()
      to restore FW resources.  Later on we need to add some logic to account
      for RDMA resources.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Move function reset to bnxt_init_one(). · aa8ed021
      Committed by Michael Chan
      Now that MSIX is enabled in bnxt_init_one(), resources may be allocated by
      the RDMA driver before the network device is opened.  So we cannot do
      function reset in bnxt_open() which will clear all the resources.
      
      The proper place to do function reset now is in bnxt_init_one().
      If we get AER, we'll do function reset as well.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Enable MSIX early in bnxt_init_one(). · 7809592d
      Committed by Michael Chan
      To better support the new RDMA driver, we need to move pci_enable_msix()
      from bnxt_open() to bnxt_init_one().  This way, MSIX vectors are available
      to the RDMA driver whether the network device is up or down.
      
      Part of the existing bnxt_setup_int_mode() function is now refactored into
      a new bnxt_init_int_mode().  bnxt_init_int_mode() is called during
      bnxt_init_one() to enable MSIX.  The remaining logic in
      bnxt_setup_int_mode() to map the IRQs to the completion rings is called
      during bnxt_open().
      
      v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add bnxt_set_max_func_irqs(). · 33c2657e
      Committed by Michael Chan
      By refactoring existing code into this new function.  The new function
      will be used in subsequent patches.
      
      v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sock_rps_record_flow() is for connected sockets · 5b8e2f61
      Committed by Eric Dumazet
      Paolo noticed a cache line miss in UDP recvmsg() to access
      sk_rxhash, sharing a cache line with sk_drops.
      
      sk_drops might be heavily incremented by cpus handling a flood targeting
      this socket.
      
      We might place sk_drops on a separate cache line, but let's try
      to avoid wasting 64 bytes per socket just for this, since we have
      other bottlenecks to take care of.
      
      sock_rps_record_flow() should only access sk_rxhash for connected
      flows.
      
      Testing sk_state for TCP_ESTABLISHED covers most of the cases for
      connected sockets, for a zero cost, since system calls using
      sock_rps_record_flow() also access sk->sk_prot which is on the
      same cache line.
      
      A follow up patch will provide a static_key (Jump Label) since most
      hosts do not even use RFS.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
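
The idea of gating the hash read on the socket state can be sketched in userspace C. This is an illustration under assumptions, not the kernel code: `demo_sock`, `demo_record_flow()` and the counters are hypothetical stand-ins, and the state constants are arbitrary values chosen for the sketch.

```c
#include <stdint.h>

/* Hypothetical socket states for the sketch (values are arbitrary). */
enum demo_state { DEMO_ESTABLISHED = 1, DEMO_UNCONNECTED = 2 };

/* Hypothetical, heavily simplified socket. */
struct demo_sock {
    int state;        /* checked first; assumed cheap to read */
    uint32_t rxhash;  /* assumed to share a cache line with a hot counter */
};

static uint32_t demo_last_recorded;   /* stand-in for the flow table */
static unsigned int demo_rxhash_reads; /* counts how often rxhash is touched */

/* Record the flow only for connected sockets, so unconnected sockets
 * (for example a UDP socket under flood) never read rxhash at all. */
void demo_record_flow(const struct demo_sock *sk)
{
    if (sk->state != DEMO_ESTABLISHED)
        return;
    demo_rxhash_reads++;
    demo_last_recorded = sk->rxhash;
}
```

For an unconnected socket the function returns before touching `rxhash`, which is the access the commit message wants to avoid.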
    • tun: Use netif_receive_skb instead of netif_rx · d4aea20d
      Committed by Andrey Konovalov
      This patch changes tun.c to call netif_receive_skb instead of netif_rx
      when a packet is received (only if CONFIG_4KSTACKS is not enabled, to
      avoid stack exhaustion). The difference between the two is that netif_rx
      queues the packet into the backlog, and netif_receive_skb processes the
      packet in the current context.
      
      This patch is required for syzkaller [1] to collect coverage from packet
      receive paths when a packet is being received through tun (syzkaller
      collects coverage per process in the process context).
      
      As mentioned by Eric this change also speeds up tun/tap. As measured by
      Peter it speeds up his closed-loop single-stream tap/OVS benchmark by
      about 23%, from 700k packets/second to 867k packets/second.
      
      A similar patch was introduced back in 2010 [2, 3], but the author found
      out that the patch doesn't help with the task he had in mind (for cgroups
      to shape network traffic based on the original process) and decided not to
      go further with it. The main concern back then was about possible stack
      exhaustion with 4K stacks.
      
      [1] https://github.com/google/syzkaller
      
      [2] https://www.spinics.net/lists/netdev/thrd440.html#130570
      
      [3] https://www.spinics.net/lists/netdev/msg130570.html
      
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
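
The difference between queueing a packet into a backlog and processing it in the caller's context can be modeled with a small userspace sketch. All `demo_` names, the integer "packets" and the backlog size are assumptions made for illustration; the real functions operate on `struct sk_buff` and per-CPU queues.

```c
#include <stddef.h>

#define DEMO_BACKLOG_MAX 8

/* Hypothetical backlog: a fixed array of packet ids. */
struct demo_backlog {
    int pkts[DEMO_BACKLOG_MAX];
    size_t len;
};

static int demo_processed;   /* number of packets handled so far */

static void demo_handle(int pkt)
{
    (void)pkt;
    demo_processed++;
}

/* Deferred path (models netif_rx): only enqueue, processing happens later
 * in some other context. */
int demo_rx_deferred(struct demo_backlog *q, int pkt)
{
    if (q->len == DEMO_BACKLOG_MAX)
        return -1;            /* backlog full, packet dropped */
    q->pkts[q->len++] = pkt;
    return 0;
}

/* The later context that drains the backlog. */
void demo_drain(struct demo_backlog *q)
{
    for (size_t i = 0; i < q->len; i++)
        demo_handle(q->pkts[i]);
    q->len = 0;
}

/* Inline path (models netif_receive_skb): handled before returning,
 * in the caller's own context. */
int demo_rx_inline(int pkt)
{
    demo_handle(pkt);
    return 0;
}
```

With the inline path the work is attributed to the calling process, which is why per-process coverage collection can see it; the deferred path only becomes visible when `demo_drain()` runs elsewhere.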
    • Merge branch 'w83977af_ir-neatening' · d10d0198
      Committed by David S. Miller
      Joe Perches says:
      
      ====================
      irda: w83977af_ir: Neatening
      
      Originally on top of Arnd's overly long udelay patches because I
      noticed a misindented block.  That's now already fixed along with some
      other whitespace problems.  These patches are the remaining style
      issues from my original series.
      
      Even though I haven't turned on the netwinder in a box in the
      garage in who knows how long, if this device is still used somewhere,
      might as well neaten the code too.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: w83977af_ir: Neaten logging · 99d8d215
      Committed by Joe Perches
      Use more common logging style, standardize function output logging use.
      
      Miscellanea:
      
      o Add and use pr_fmt
      o Convert printks to pr_<level>
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: w83977af_ir: Parenthesis alignment · 646bf092
      Committed by Joe Perches
      Neaten function declaration and definition arguments.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: w83977af_ir: Use the common brace style · ae9e736b
      Committed by Joe Perches
      Add braces where appropriate and remove an unnecessary else.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: w83977af_ir: Neaten pointer comparisons · c9471646
      Committed by Joe Perches
      Convert pointer comparisons to NULL.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: w83977af_ir: Remove and add blank lines · 47fda1a7
      Committed by Joe Perches
      Use a more typical vertical spacing style.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: w83977af_ir: More whitespace neatening · 8a3fb40d
      Committed by Joe Perches
      Add spaces around operators.
      git diff -w shows no differences.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: w83977af_ir: Whitespace neatening · 352019c8
      Committed by Joe Perches
      Remove leading and trailing whitespace.
      git diff -w shows no differences.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      c63d352f
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · bc3913a5
      Committed by Linus Torvalds
      Pull sparc fix from David Miller:
       "A use-before-NULL-check from Dan Carpenter"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        dbri: move dereference after check for NULL
    • dbri: move dereference after check for NULL · 163117e8
      Committed by Dan Carpenter
      We accidentally introduced a dereference before the NULL check in
      xmit_descs() as part of silencing a GCC warning.
      
      Fixes: 16f46050 ("dbri: Fix compiler warning")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
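
The class of bug fixed here, a pointer dereferenced before its NULL check, can be shown with a generic sketch. The `demo_` types and function are hypothetical and do not reproduce xmit_descs(); the buggy ordering appears only in a comment so that the block stays safe to run.

```c
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct demo_desc { int len; };
struct demo_dev { struct demo_desc *desc; };

/* Buggy ordering (not compiled):
 *
 *     int len = dev->desc->len;    dereference happens first
 *     if (!dev->desc)              check arrives too late
 *             return -1;
 *
 * Fixed ordering: validate every pointer, then dereference.
 */
int demo_desc_len(const struct demo_dev *dev)
{
    int len;

    if (!dev || !dev->desc)
        return -1;

    len = dev->desc->len;   /* safe: both pointers were checked above */
    return len;
}
```

Moving an initializer up to the declaration, for instance to silence a compiler warning, is an easy way to reintroduce the buggy ordering, which matches the history described in the commit message.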
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · da1b466f
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
      
       1) When dcbnl_cee_fill() fails to be able to push a new netlink
          attribute, it returns 0 instead of an error code. From Pan Bian.
      
       2) Two suffix handling fixes to FIB trie code, from Alexander Duyck.
      
       3) bnxt_hwrm_stat_ctx_alloc() goes through all the trouble of setting
          and maintaining a return code 'rc' but fails to actually return it.
          Also from Pan Bian.
      
       4) ping socket ICMP handler needs to validate ICMP header length, from
          Kees Cook.
      
       5) caif_sktinit_module() has this interesting logic:
      
              int err = sock_register(...);
              if (!err)
                      return err;
              return 0;
      
          Just return sock_register()'s return value directly which is the
          only possible correct thing to do.
      
       6) Two bnx2x driver fixes from Yuval Mintz, return a reasonable
          estimate from get_ringparam() ethtool op when interface is down and
          avoid trying to use UDP port based tunneling on 577xx chips.
      
       7) Fix ep93xx_eth crash on module unload from Florian Fainelli.
      
       8) Missing uapi exports, from Stephen Hemminger.
      
       9) Don't schedule work from sk_destruct(), because the socket will be
          freed upon return from that function. From Herbert Xu.
      
      10) Buggy drivers, of which we know there is at least one, can send a
          huge packet into the TCP stack but forget to set the gso_size in the
          SKB, which causes all kinds of problems.
      
          Correct this when it happens, and emit a one-time warning with the
          device name included so that it can be diagnosed more easily.
      
          From Marcelo Ricardo Leitner.
      
      11) virtio-net doing DMA off the stack causes hiccups with VMAP_STACK,
          fix from Andy Lutomirski.
      
      12) Fix fec driver compilation with CONFIG_M5272, from Nikita
          Yushchenko.
      
      13) mlx5 fixes from Kamal Heib, Saeed Mahameed, and Mohamad Haj Yahia.
          (erroneously flushing queues on error, module parameter validation,
          etc)
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
        net/mlx5e: Change the SQ/RQ operational state to positive logic
        net/mlx5e: Don't flush SQ on error
        net/mlx5e: Don't notify HW when filling the edge of ICO SQ
        net/mlx5: Fix query ISSI flow
        net/mlx5: Remove duplicate pci dev name print
        net/mlx5: Verify module parameters
        net: fec: fix compile with CONFIG_M5272
        be2net: Add DEVSEC privilege to SET_HSW_CONFIG command.
        virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
        tcp: warn on bogus MSS and try to amend it
        uapi glibc compat: fix outer guard of net device flags enum
        net: stmmac: clear reset value of snps, wr_osr_lmt/snps, rd_osr_lmt before writing
        netlink: Do not schedule work from sk_destruct
        uapi: export nf_log.h
        uapi: export tc_skbmod.h
        net: ep93xx_eth: Do not crash unloading module
        bnx2x: Prevent tunnel config for 577xx
        bnx2x: Correct ringparam estimate when DOWN
        isdn: hisax: set error code on failure
        net: bnx2x: fix improper return value
        ...
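
The logic quoted in item 5 above always returns 0, so a registration failure is silently swallowed. The following userspace sketch contrasts it with the direct-return form; `demo_sock_register()` and `demo_register_result` are hypothetical stand-ins for sock_register() and its outcome.

```c
/* Hypothetical stand-in for sock_register(): returns 0 on success or a
 * negative errno-style value, controlled by demo_register_result. */
static int demo_register_result;

static int demo_sock_register(void)
{
    return demo_register_result;
}

/* The quoted logic: if err is 0 it returns 0, otherwise it also returns 0. */
int demo_init_buggy(void)
{
    int err = demo_sock_register();

    if (!err)
        return err;
    return 0;
}

/* The fix: propagate the registration result unchanged. */
int demo_init_fixed(void)
{
    return demo_sock_register();
}
```

With `demo_register_result` set to a negative value, `demo_init_buggy()` still reports success while `demo_init_fixed()` reports the failure.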
    • shmem: fix shm fallocate() list corruption · 10d20bd2
      Committed by Linus Torvalds
      The shmem hole punching with fallocate(FALLOC_FL_PUNCH_HOLE) does not
      want to race with generating new pages by faulting them in.
      
      However, the wait-queue used to delay the page faulting has a serious
      problem: the wait queue head (in shmem_fallocate()) is allocated on the
      stack, and the code expects that "wake_up_all()" will make sure that all
      the queue entries are gone before the stack frame is de-allocated.
      
      And that is not at all necessarily the case.
      
      Yes, a normal wake-up sequence will remove the wait-queue entry that
      caused the wakeup (see "autoremove_wake_function()"), but the key
      wording there is "that caused the wakeup".  When there are multiple
      possible wakeup sources, the wait queue entry may well stay around.
      
      And _particularly_ in a page fault path, we may be faulting in new pages
      from user space while we also have other things going on, and there may
      well be other pending wakeups.
      
      So despite the "wake_up_all()", it's not at all guaranteed that all list
      entries are removed from the wait queue head on the stack.
      
      Fix this by introducing a new wakeup function that removes the list
      entry unconditionally, even if the target process had already woken up
      for other reasons.  Use that "synchronous" function to set up the
      waiters in shmem_fault().
      
      This problem has never been seen in the wild afaik, but Dave Jones has
      reported it on and off while running trinity.  We thought we fixed the
      stack corruption with the blk-mq rq_list locking fix (commit
      7fe31130: "blk-mq: update hardware and software queues for sleeping
      alloc"), but it turns out there was _another_ stack corruptor hiding
      in the trinity runs.
      
      Vegard Nossum (also running trinity) was able to trigger this one fairly
      consistently, and made us look once again at the shmem code due to the
      faults often being in that area.
      
      Reported-and-tested-by: Vegard Nossum <vegard.nossum@oracle.com>.
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
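
Why an auto-removing wake function can leave entries on a wait queue, and how an unconditional removal avoids that, can be modeled in userspace. This is a simplified sketch under assumptions: all `demo_` names are hypothetical, "waking" is reduced to flipping a flag, and the wake function is passed to the wake-all helper rather than stored per entry as the kernel does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wait-queue entry; the head uses one as a list sentinel. */
struct demo_waiter {
    struct demo_waiter *prev, *next;
    bool already_awake;   /* woken earlier by some other wakeup source */
};

struct demo_wait_head { struct demo_waiter list; };

typedef void (*demo_wake_fn)(struct demo_waiter *w);

static void demo_init(struct demo_waiter *n) { n->prev = n->next = n; }

static void demo_del_init(struct demo_waiter *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    demo_init(n);
}

void demo_head_init(struct demo_wait_head *h)
{
    demo_init(&h->list);
    h->list.already_awake = false;
}

void demo_add(struct demo_wait_head *h, struct demo_waiter *w, bool awake)
{
    w->already_awake = awake;
    w->next = h->list.next;
    w->prev = &h->list;
    h->list.next->prev = w;
    h->list.next = w;
}

bool demo_empty(const struct demo_wait_head *h)
{
    return h->list.next == &h->list;
}

/* Returns true only if this call is what woke the waiter. */
static bool demo_try_wake(struct demo_waiter *w)
{
    if (w->already_awake)
        return false;
    w->already_awake = true;
    return true;
}

/* Auto-remove style: unlink only when this wakeup did the waking. */
void demo_wake_autoremove(struct demo_waiter *w)
{
    if (demo_try_wake(w))
        demo_del_init(w);
}

/* "Synchronous" style: unlink unconditionally. */
void demo_wake_synchronous(struct demo_waiter *w)
{
    (void)demo_try_wake(w);
    demo_del_init(w);
}

void demo_wake_up_all(struct demo_wait_head *h, demo_wake_fn fn)
{
    struct demo_waiter *w = h->list.next;

    while (w != &h->list) {
        struct demo_waiter *next = w->next;  /* fn may unlink w */
        fn(w);
        w = next;
    }
}
```

In this model, a waiter that was already awake survives `demo_wake_up_all()` with the auto-remove function, so a head living on the stack would still be linked to it when the frame goes away; with the synchronous function the list is always empty afterwards.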