1. 24 6月, 2017 3 次提交
  2. 21 6月, 2017 1 次提交
  3. 18 6月, 2017 1 次提交
  4. 16 6月, 2017 1 次提交
  5. 13 6月, 2017 1 次提交
  6. 10 6月, 2017 1 次提交
  7. 08 6月, 2017 2 次提交
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller 提交于
      Network devices can allocate reasources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally does also a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf124db5
    • A
      net: don't call strlen on non-terminated string in dev_set_alias() · c28294b9
      Alexander Potapenko 提交于
      KMSAN reported a use of uninitialized memory in dev_set_alias(),
      which was caused by calling strlcpy() (which in turn called strlen())
      on the user-supplied non-terminated string.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c28294b9
  8. 07 6月, 2017 1 次提交
  9. 28 5月, 2017 1 次提交
  10. 22 5月, 2017 1 次提交
  11. 20 5月, 2017 4 次提交
  12. 18 5月, 2017 1 次提交
  13. 12 5月, 2017 2 次提交
    • D
      xdp: refine xdp api with regards to generic xdp · d67b9cd2
      Daniel Borkmann 提交于
      While working on the iproute2 generic XDP frontend, I noticed that
      as of right now it's possible to have native *and* generic XDP
      programs loaded both at the same time for the case when a driver
      supports native XDP.
      
      The intended model for generic XDP from b5cdae32 ("net: Generic
      XDP") is, however, that only one out of the two can be present at
      once which is also indicated as such in the XDP netlink dump part.
      The main rationale for generic XDP is to ease accessibility (in
      case a driver does not yet have XDP support) and to generically
      provide a semantical model as an example for driver developers
      wanting to add XDP support. The generic XDP option for an XDP
      aware driver can still be useful for comparing and testing both
      implementations.
      
      However, it is not intended to have a second XDP processing stage
      or layer with exactly the same functionality of the first native
      stage. Only reason could be to have a partial fallback for future
      XDP features that are not supported yet in the native implementation
      and we probably also shouldn't strive for such fallback and instead
      encourage native feature support in the first place. Given there's
      currently no such fallback issue or use case, lets not go there yet
      if we don't need to.
      
      Therefore, change semantics for loading XDP and bail out if the
      user tries to load a generic XDP program when a native one is
      present and vice versa. Another alternative to bailing out would
      be to handle the transition from one flavor to another gracefully,
      but that would require to bring the device down, exchange both
      types of programs, and bring it up again in order to avoid a tiny
      window where a packet could hit both hooks. Given this complicates
      the logic for just a debugging feature in the native case, I went
      with the simpler variant.
      
      For the dump, remove IFLA_XDP_FLAGS that was added with b5cdae32
      and reuse IFLA_XDP_ATTACHED for indicating the mode. Dumping all
      or just a subset of flags that were used for loading the XDP prog
      is suboptimal in the long run since not all flags are useful for
      dumping and if we start to reuse the same flag definitions for
      load and dump, then we'll waste bit space. What we really just
      want is to dump the mode for now.
      
      Current IFLA_XDP_ATTACHED semantics are: nothing was installed (0),
      a program is running at the native driver layer (1). Thus, add a
      mode that says that a program is running at generic XDP layer (2).
      Applications will handle this fine in that older binaries will
      just indicate that something is attached at XDP layer, effectively
      this is similar to IFLA_XDP_FLAGS attr that we would have had
      modulo the redundancy.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d67b9cd2
    • D
      xdp: add flag to enforce driver mode · 0489df9a
      Daniel Borkmann 提交于
      After commit b5cdae32 ("net: Generic XDP") we automatically fall
      back to a generic XDP variant if the driver does not support native
      XDP. Allow for an option where the user can specify that always the
      native XDP variant should be selected and in case it's not supported
      by a driver, just bail out.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0489df9a
  14. 09 5月, 2017 2 次提交
  15. 02 5月, 2017 1 次提交
  16. 01 5月, 2017 1 次提交
  17. 28 4月, 2017 1 次提交
  18. 27 4月, 2017 1 次提交
  19. 26 4月, 2017 1 次提交
    • D
      net: Generic XDP · b5cdae32
      David S. Miller 提交于
      This provides a generic SKB based non-optimized XDP path which is used
      if either the driver lacks a specific XDP implementation, or the user
      requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.
      
      It is arguable that perhaps I should have required something like
      this as part of the initial XDP feature merge.
      
      I believe this is critical for two reasons:
      
      1) Accessibility.  More people can play with XDP with less
         dependencies.  Yes I know we have XDP support in virtio_net, but
         that just creates another depedency for learning how to use this
         facility.
      
         I wrote this to make life easier for the XDP newbies.
      
      2) As a model for what the expected semantics are.  If there is a pure
         generic core implementation, it serves as a semantic example for
         driver folks adding XDP support.
      
      One thing I have not tried to address here is the issue of
      XDP_PACKET_HEADROOM, thanks to Daniel for spotting that.  It seems
      incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or
      whatever even if the XDP program doesn't try to push headers at all.
      I think we really need the verifier to somehow propagate whether
      certain XDP helpers are used or not.
      
      v5:
       - Handle both negative and positive offset after running prog
       - Fix mac length in XDP_TX case (Alexei)
       - Use rcu_dereference_protected() in free_netdev (kbuild test robot)
      
      v4:
       - Fix MAC header adjustmnet before calling prog (David Ahern)
       - Disable LRO when generic XDP is installed (Michael Chan)
       - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
       - Do not perform generic XDP on reinjected packets (DaveM)
      
      v3:
       - Make sure XDP program sees packet at MAC header, push back MAC
         header if we do XDP_TX.  (Alexei)
       - Elide GRO when generic XDP is in use.  (Alexei)
       - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic
         XDP even if the driver has an XDP implementation.  (Alexei)
       - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS
         attribute.  (Daniel)
      
      v2:
       - Add some "fall through" comments in switch statements based
         upon feedback from Andrew Lunn
       - Use RCU for generic xdp_prog, thanks to Johannes Berg.
      Tested-by: NAndy Gospodarek <andy@greyhouse.net>
      Tested-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5cdae32
  20. 22 4月, 2017 1 次提交
  21. 14 4月, 2017 1 次提交
    • S
      net: Add a xfrm validate function to validate_xmit_skb · f6e27114
      Steffen Klassert 提交于
      When we do IPsec offloading, we need a fallback for
      packets that were targeted to be IPsec offloaded but
      rerouted to a device that does not support IPsec offload.
      For that we add a function that checks the offloading
      features of the sending device and and flags the
      requirement of a fallback before it calls the IPsec
      output function. The IPsec output function adds the IPsec
      trailer and does encryption if needed.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      f6e27114
  22. 12 4月, 2017 1 次提交
  23. 11 4月, 2017 1 次提交
    • N
      sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() · 717a94b5
      NeilBrown 提交于
      It is not safe for one thread to modify the ->flags
      of another thread as there is no locking that can protect
      the update.
      
      So tsk_restore_flags(), which takes a task pointer and modifies
      the flags, is an invitation to do the wrong thing.
      
      All current users pass "current" as the task, so no developers have
      accepted that invitation.  It would be best to ensure it remains
      that way.
      
      So rename tsk_restore_flags() to current_restore_flags() and don't
      pass in a task_struct pointer.  Always operate on current->flags.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      717a94b5
  24. 10 4月, 2017 1 次提交
  25. 05 4月, 2017 1 次提交
    • V
      rtnl: Add support for netdev event to link messages · def12888
      Vlad Yasevich 提交于
      When netdev events happen, a rtnetlink_event() handler will send
      messages for every event in it's white list.  These messages contain
      current information about a particular device, but they do not include
      the iformation about which event just happened.  The consumer of
      the message has to try to infer this information.  In some cases
      (ex: NETDEV_NOTIFY_PEERS), that is not possible.
      
      This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT
      that would have an encoding of the which event triggered this
      message.  This would allow the the message consumer to easily determine
      if it is interested in a particular event or not.
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      def12888
  26. 25 3月, 2017 4 次提交
  27. 15 3月, 2017 1 次提交
  28. 02 3月, 2017 2 次提交
    • E
      net: net_enable_timestamp() can be called from irq contexts · 13baa00a
      Eric Dumazet 提交于
      It is now very clear that silly TCP listeners might play with
      enabling/disabling timestamping while new children are added
      to their accept queue.
      
      Meaning net_enable_timestamp() can be called from BH context
      while current state of the static key is not enabled.
      
      Lets play safe and allow all contexts.
      
      The work queue is scheduled only under the problematic cases,
      which are the static key enable/disable transition, to not slow down
      critical paths.
      
      This extends and improves what we did in commit 5fa8bbda ("net: use
      a work queue to defer net_disable_timestamp() work")
      
      Fixes: b90e5794 ("net: dont call jump_label_dec from irq context")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13baa00a
    • E
      net: solve a NAPI race · 39e6c820
      Eric Dumazet 提交于
      While playing with mlx4 hardware timestamping of RX packets, I found
      that some packets were received by TCP stack with a ~200 ms delay...
      
      Since the timestamp was provided by the NIC, and my probe was added
      in tcp_v4_rcv() while in BH handler, I was confident it was not
      a sender issue, or a drop in the network.
      
      This would happen with a very low probability, but hurting RPC
      workloads.
      
      A NAPI driver normally arms the IRQ after the napi_complete_done(),
      after NAPI_STATE_SCHED is cleared, so that the hard irq handler can grab
      it.
      
      Problem is that if another point in the stack grabs NAPI_STATE_SCHED bit
      while IRQ are not disabled, we might have later an IRQ firing and
      finding this bit set, right before napi_complete_done() clears it.
      
      This can happen with busy polling users, or if gro_flush_timeout is
      used. But some other uses of napi_schedule() in drivers can cause this
      as well.
      
      thread 1                                 thread 2 (could be on same cpu, or not)
      
      // busy polling or napi_watchdog()
      napi_schedule();
      ...
      napi->poll()
      
      device polling:
      read 2 packets from ring buffer
                                                Additional 3rd packet is
      available.
                                                device hard irq
      
                                                // does nothing because
      NAPI_STATE_SCHED bit is owned by thread 1
                                                napi_schedule();
      
      napi_complete_done(napi, 2);
      rearm_irq();
      
      Note that rearm_irq() will not force the device to send an additional
      IRQ for the packet it already signaled (3rd packet in my example)
      
      This patch adds a new NAPI_STATE_MISSED bit, that napi_schedule_prep()
      can set if it could not grab NAPI_STATE_SCHED
      
      Then napi_complete_done() properly reschedules the napi to make sure
      we do not miss something.
      
      Since we manipulate multiple bits at once, use cmpxchg() like in
      sk_busy_loop() to provide proper transactions.
      
      In v2, I changed napi_watchdog() to use a relaxed variant of
      napi_schedule_prep() : No need to set NAPI_STATE_MISSED from this point.
      
      In v3, I added more details in the changelog and clears
      NAPI_STATE_MISSED in busy_poll_stop()
      
      In v4, I added the ideas given by Alexander Duyck in v3 review
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39e6c820