1. 16 May, 2018 1 commit
  2. 28 Mar, 2018 1 commit
  3. 10 Mar, 2018 1 commit
    • net: introduce IFF_NO_RX_HANDLER · f5426250
      Paolo Abeni committed
      Some network devices - notably ipvlan slave - are not compatible with
      any kind of rx_handler. Currently the hook can be installed but any
      configuration (bridge, bond, macsec, ...) is nonfunctional.
      
      This change allocates a priv_flag bit to mark such devices and explicitly
      forbids installing an rx_handler if that bit is set. The new bit is used
      by the ipvlan slave device.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
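The check described in this commit can be sketched in userspace C. This is only a model under stated assumptions: `demo_netdev`, `demo_rx_handler_register()` and the bit value of `DEMO_IFF_NO_RX_HANDLER` are hypothetical stand-ins, not the kernel's definitions, and the error codes are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the new priv_flags bit; the value is illustrative. */
#define DEMO_IFF_NO_RX_HANDLER (1u << 0)

typedef int (*demo_rx_handler_t)(void *pkt);

/* Minimal model of a network device: private flags plus one handler slot. */
struct demo_netdev {
    unsigned int priv_flags;
    demo_rx_handler_t rx_handler;
};

/* Refuse the hook up front instead of accepting a setup that cannot work. */
static int demo_rx_handler_register(struct demo_netdev *dev, demo_rx_handler_t h)
{
    assert(dev != NULL && h != NULL);
    if (dev->rx_handler != NULL)
        return -EBUSY;      /* only one handler per device */
    if (dev->priv_flags & DEMO_IFF_NO_RX_HANDLER)
        return -EINVAL;     /* device opted out of rx_handlers */
    dev->rx_handler = h;
    return 0;
}

/* Placeholder handler, standing in for a bridge or bond hook. */
static int demo_bridge_handler(void *pkt)
{
    (void)pkt;
    return 0;
}
```

With this model, a device that carries the flag rejects registration, so the caller sees an error rather than a configuration that silently does nothing.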
  4. 08 Mar, 2018 1 commit
    • net: unpollute priv_flags space · 1ec54cb4
      Paolo Abeni committed
      The ipvlan device driver defines and uses 2 bits inside the priv_flags
      net_device field. Such bits and the related helper are used only
      inside the ipvlan device driver, and the core networking does not
      need to be aware of them.
      
      This change moves the netif_is_ipvlan* helpers into the ipvlan driver and
      re-implements them by looking for ipvlan-specific symbols instead of
      using priv_flags.
      
      Overall this frees two bits inside priv_flags, and moves the following
      ones to avoid gaps, without any intended functional change.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
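One way to identify a device by a driver-specific symbol rather than a flag bit is to compare its operations-table pointer against the driver's own table. The sketch below illustrates that idea only; the commit message does not say which symbols are compared, so treating the ops table as that symbol is an assumption, and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced operations table. */
struct demo_netdev_ops {
    int (*open)(void *dev);
};

struct demo_netdev {
    const struct demo_netdev_ops *ops;
};

static int demo_ipvlan_open(void *dev) { (void)dev; return 0; }
static int demo_other_open(void *dev) { (void)dev; return 0; }

/* The driver's own table: its address identifies the driver's devices. */
static const struct demo_netdev_ops demo_ipvlan_ops = { demo_ipvlan_open };
static const struct demo_netdev_ops demo_other_ops = { demo_other_open };

/* No flag bit needed: a pointer comparison against a driver-private symbol. */
static bool demo_is_ipvlan(const struct demo_netdev *dev)
{
    return dev->ops == &demo_ipvlan_ops;
}
```

Because the table is private to the driver, the helper can live in the driver as well, which matches the stated goal of keeping the core unaware of these details.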
  5. 05 Mar, 2018 1 commit
  6. 01 Mar, 2018 1 commit
    • ipvlan: use per device spinlock to protect addrs list updates · 82308194
      Paolo Abeni committed
      This changeset moves the ipvlan address list under RCU protection, using
      a per ipvlan device spinlock to protect list mutation and RCU
      read access to protect list traversal.
      
      Also explicitly use RCU read lock to traverse the per port
      ipvlans list, so that we can now perform a full address lookup
      without asserting the RTNL lock.
      
      Overall this allows the ipvlan driver to check fully for duplicate
      addresses (before this commit, ipv6 addresses assigned by autoconf
      via prefix delegation were accepted without any check) and to avoid
      the following RTNL assertion failure, still in the same code path:
      
       RTNL: assertion failed at drivers/net/ipvlan/ipvlan_core.c (124)
       WARNING: CPU: 15 PID: 0 at drivers/net/ipvlan/ipvlan_core.c:124 ipvlan_addr_busy+0x97/0xa0 [ipvlan]
       Modules linked in: ipvlan(E) ixgbe
       CPU: 15 PID: 0 Comm: swapper/15 Tainted: G            E    4.16.0-rc2.ipvlan+ #1782
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
       RIP: 0010:ipvlan_addr_busy+0x97/0xa0 [ipvlan]
       RSP: 0018:ffff881ff9e03768 EFLAGS: 00010286
       RAX: 0000000000000000 RBX: ffff881fdf2a9000 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: 00000000000000f6 RDI: 0000000000000300
       RBP: ffff881fdf2a8000 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: ffff881ff9e034c0 R12: ffff881fe07bcc00
       R13: 0000000000000001 R14: ffffffffa02002b0 R15: 0000000000000001
       FS:  0000000000000000(0000) GS:ffff881ff9e00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fc5c1a4f248 CR3: 000000207e012005 CR4: 00000000001606e0
       Call Trace:
        <IRQ>
        ipvlan_addr6_event+0x6c/0xd0 [ipvlan]
        notifier_call_chain+0x49/0x90
        atomic_notifier_call_chain+0x6a/0x100
        ipv6_add_addr+0x5f9/0x720
        addrconf_prefix_rcv_add_addr+0x244/0x3c0
        addrconf_prefix_rcv+0x2f3/0x790
        ndisc_router_discovery+0x633/0xb70
        ndisc_rcv+0x155/0x180
        icmpv6_rcv+0x4ac/0x5f0
        ip6_input_finish+0x138/0x6a0
        ip6_input+0x41/0x1f0
        ipv6_rcv+0x4db/0x8d0
        __netif_receive_skb_core+0x3d5/0xe40
        netif_receive_skb_internal+0x89/0x370
        napi_gro_receive+0x14f/0x1e0
        ixgbe_clean_rx_irq+0x4ce/0x1020 [ixgbe]
        ixgbe_poll+0x31a/0x7a0 [ixgbe]
        net_rx_action+0x296/0x4f0
        __do_softirq+0xcf/0x4f5
        irq_exit+0xf5/0x110
        do_IRQ+0x62/0x110
        common_interrupt+0x91/0x91
        </IRQ>
      
       v1 -> v2: drop unneeded in_softirq check in ipvlan_addr6_validator_event()
      
      Fixes: e9997c29 ("ipvlan: fix check for IP addresses in control path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 28 Feb, 2018 1 commit
    • net: Convert ipvlan_net_ops · 68eabe8b
      Kirill Tkhai committed
      These pernet_operations unregister ipvlan net hooks.
      nf_unregister_net_hooks() removes hooks one by one,
      and then frees the memory via RCU. This is similar
      to what happens when a new hook is added: a bigger
      memory region is allocated, the old content is copied,
      and the old memory is freed via RCU. So all of the net
      code should already cope with this behavior. Also, at the
      time of hook unregistration there are no packets, and foreign
      net pernet_operations are not interested in others' hooks.
      So, we mark them as async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 22 Feb, 2018 1 commit
  9. 03 Dec, 2017 1 commit
  10. 18 Nov, 2017 1 commit
  11. 29 Oct, 2017 2 commits
    • ipvlan: implement VEPA mode · fe89aa6b
      Mahesh Bandewar committed
      This is very similar to the Macvlan VEPA mode; however, there are some
      differences. IPvlan uses the mac-address of the lower device, so the VEPA
      mode has implications for ICMP-redirects for packets destined for its
      immediate neighbors sharing the same master, since the packets will have the
      same source and dest mac. The external switch/router will send a redirect msg.
      
      Having said that, this will be a useful tool for debugging,
      since IPvlan will not switch packets within its slaves and will rely completely
      on the external entity, as intended in 802.1Qbg.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipvlan: introduce 'private' attribute for all existing modes. · a190d04d
      Mahesh Bandewar committed
      IPvlan has always operated in bridge mode. However, there are scenarios
      where each slave should be able to talk through the master device but
      not necessarily to the other slaves. Think of an environment where each
      namespace is a private and independent customer. In this scenario
      the machine hosting these namespaces does not want to reveal who
      their neighbors are, nor do the individual namespaces care to talk to a
      neighbor over a short-circuited network path.
      
      This patch implements a mode that is very similar to the 'private' mode
      in macvlan, where individual slaves can send and receive traffic through
      the master device but cannot talk to other slave devices.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
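The difference between the default bridge behavior and the two attributes introduced on this date (private and VEPA) can be summarized as a small decision function. This is a simplified model, not the driver's code: the flag names and values, the `demo_route()` helper, and the choice to model private mode as a drop are assumptions made for illustration, and the model assumes at most one of the two flags is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port flags; names and values are illustrative. */
#define DEMO_F_PRIVATE 0x01u
#define DEMO_F_VEPA    0x02u

enum demo_path {
    DEMO_LOCAL_SWITCH,  /* delivered directly to the sibling slave */
    DEMO_VIA_MASTER,    /* sent out through the master device */
    DEMO_DROP           /* not delivered at all */
};

/*
 * Decide what happens to a packet sent by one slave.
 * dst_is_sibling: the destination address belongs to another slave
 * of the same master.
 */
static enum demo_path demo_route(unsigned int port_flags, bool dst_is_sibling)
{
    if (!dst_is_sibling)
        return DEMO_VIA_MASTER;     /* every mode may use the master */
    if (port_flags & DEMO_F_VEPA)
        return DEMO_VIA_MASTER;     /* external switch decides (hairpin) */
    if (port_flags & DEMO_F_PRIVATE)
        return DEMO_DROP;           /* slaves are isolated from each other */
    return DEMO_LOCAL_SWITCH;       /* default bridge behavior */
}
```

For example, `demo_route(DEMO_F_VEPA, true)` yields `DEMO_VIA_MASTER`, reflecting that VEPA relies on the external entity instead of switching locally.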
  12. 20 Oct, 2017 2 commits
  13. 13 Oct, 2017 1 commit
    • ipvlan: always use the current L2 addr of the master · 32c10bbf
      Mahesh Bandewar committed
      If the underlying master ever changes its L2 address (e.g. a bonding
      device), then make sure that the IPvlan slaves always emit packets with
      the current L2 address of the master instead of the stale mac addr which
      was copied during device creation. The problem can be seen with
      the following script -
      
        #!/bin/bash
        # Create a vEth pair
        ip link add dev veth0 type veth peer name veth1
        ip link set veth0 up
        ip link set veth1 up
        ip link show veth0
        ip link show veth1
        # Create an IPvlan device on one end of this vEth pair.
        ip link add link veth0 dev ipvl0 type ipvlan mode l2
        ip link show ipvl0
        # Change the mac-address of the vEth master.
        ip link set veth0 address 02:11:22:33:44:55
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 05 Oct, 2017 1 commit
  15. 02 Aug, 2017 1 commit
  16. 01 Aug, 2017 1 commit
  17. 18 Jul, 2017 1 commit
  18. 27 Jun, 2017 3 commits
  19. 10 Jun, 2017 1 commit
    • Ipvlan should return an error when an address is already in use. · 3ad7d246
      Krister Johansen committed
      The ipvlan code already knows how to detect when a duplicate address is
      about to be assigned to an ipvlan device.  However, that failure is not
      propagated outward and leads to a silent failure.
      
      Introduce a validation step at ip address creation time and allow device
      drivers to register to validate the incoming ip addresses.  The ipvlan
      code is the first consumer.  If it detects an address in use, we can
      return an error to the user before beginning to commit the new ifa in
      the networking code.
      
      This can be especially useful if it is necessary to provision many
      ipvlans in containers.  The provisioning software (or operator) can use
      this to detect situations where an ip address is unexpectedly in use.
      Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
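The validation step described in this commit, where registered validators run before an address is committed, can be sketched as follows. The types and functions (`demo_port`, `demo_validator_t`, `demo_add_addr()`) are hypothetical and addresses are reduced to 32-bit integers; the real mechanism uses kernel notifier chains, which this sketch does not reproduce.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ADDRS 8

/* Hypothetical model: the addresses already in use on one port. */
struct demo_port {
    uint32_t addrs[DEMO_MAX_ADDRS];
    size_t count;
};

/* A validator inspects a candidate address and returns 0 or a negative errno. */
typedef int (*demo_validator_t)(const struct demo_port *port, uint32_t addr);

/* The ipvlan-style consumer: reject an address that is already in use. */
static int demo_ipvlan_validator(const struct demo_port *port, uint32_t addr)
{
    for (size_t i = 0; i < port->count; i++)
        if (port->addrs[i] == addr)
            return -EADDRINUSE;
    return 0;
}

/* Run every validator first; commit the address only if all of them agree. */
static int demo_add_addr(struct demo_port *port, uint32_t addr,
                         const demo_validator_t *validators, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int err = validators[i](port, addr);
        if (err)
            return err;     /* nothing has been committed yet */
    }
    if (port->count == DEMO_MAX_ADDRS)
        return -ENOSPC;
    port->addrs[port->count++] = addr;
    return 0;
}
```

The point of the ordering is that the error reaches the caller before any state changes, so provisioning software can react to an address that is unexpectedly in use.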
  20. 08 Jun, 2017 1 commit
    • net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller committed
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: David S. Miller <davem@davemloft.net>
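The split between releasing private state and freeing the device itself can be modeled in a few lines of userspace C. This is a sketch under stated assumptions: the `demo_*` functions and the `fail_late` knob are hypothetical, `ndo_uninit()` is left out for brevity, and only the ownership rules from the message above are reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced model of a network device. */
struct demo_netdev {
    void *priv;                                     /* allocated by init */
    void (*priv_destructor)(struct demo_netdev *);  /* frees priv only */
    bool needs_free_netdev;                         /* unregister frees dev */
    bool fail_late;                                 /* knob: fail after init */
};

static int demo_priv_live;  /* outstanding private allocations */

static int demo_init(struct demo_netdev *dev)
{
    dev->priv = malloc(16);
    if (dev->priv == NULL)
        return -1;
    demo_priv_live++;
    return 0;
}

/* Releases private resources but never the device structure itself. */
static void demo_priv_destructor(struct demo_netdev *dev)
{
    free(dev->priv);
    dev->priv = NULL;
    demo_priv_live--;
}

/* On failure, private state is released here; the caller still frees dev. */
static int demo_register(struct demo_netdev *dev)
{
    if (demo_init(dev))
        return -1;
    if (dev->fail_late) {
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

/* On teardown, release private state, then optionally the device. */
static void demo_unregister(struct demo_netdev *dev)
{
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (dev->needs_free_netdev)
        free(dev);
}
```

Since the destructor no longer frees the device, a failed registration can call it safely and the caller's own free of the device does not become a double free.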
  21. 25 Apr, 2017 1 commit
    • ipvlan: use pernet operations and restrict l3s hooks to master netns · 3133822f
      Florian Westphal committed
      commit 4fbae7d8 ("ipvlan: Introduce l3s mode") added
      registration of netfilter hooks via nf_register_hooks().
      
      This API provides the illusion of 'global' netfilter hooks by placing the
      hooks in all current and future network namespaces.
      
      In case of ipvlan the hook appears to be only needed in the namespace
      that contains the ipvlan master device (i.e., usually init_net), so
      placing them in all namespaces is not needed.
      
      This switches the ipvlan driver to pernet operations, and then only registers
      hooks in namespaces where an ipvlan master device is set to l3s mode.
      
      Extra care has to be taken when the master device is moved to another
      namespace, as we might have to 'move' the netfilter hooks too.
      
      This is done by storing the namespace the ipvlan port was created in.
      On REGISTER event, do (un)register operations in the old/new namespaces.
      
      This will also allow removal of the nf_register_hooks() in a future patch.
      
      Cc: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
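Registering hooks only in namespaces that contain an l3s-mode master, and moving them when the master changes namespace, can be modeled with a per-namespace counter. The counter itself is an assumption made for this sketch (the commit message only says that the port's namespace is stored), and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-namespace state. */
struct demo_netns {
    unsigned int l3s_ports;   /* l3s-mode ports living in this namespace */
    bool hooks_registered;    /* whether the netfilter hooks are installed */
};

/* First l3s port in a namespace installs the hooks. */
static void demo_hooks_get(struct demo_netns *ns)
{
    if (ns->l3s_ports++ == 0)
        ns->hooks_registered = true;
}

/* Last l3s port leaving a namespace removes them again. */
static void demo_hooks_put(struct demo_netns *ns)
{
    assert(ns->l3s_ports > 0);
    if (--ns->l3s_ports == 0)
        ns->hooks_registered = false;
}

/* Moving a master: take a reference in the new namespace, drop the old one. */
static void demo_port_move(struct demo_netns *from, struct demo_netns *to)
{
    demo_hooks_get(to);
    demo_hooks_put(from);
}
```

A namespace that never hosts an l3s master keeps `hooks_registered` false, which is the saving the commit is after.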
  22. 12 Feb, 2017 1 commit
  23. 21 Jan, 2017 1 commit
  24. 17 Jan, 2017 1 commit
    • ipvlan: fix dev_id creation corner case. · 019ec003
      Mahesh Bandewar committed
      In the last patch da36e13c ("ipvlan: improvise dev_id generation
      logic in IPvlan") I missed some part of Dave's suggestion, and because
      of that the dev_id creation could fail in a corner case scenario. This
      would happen when roughly 64k devices have already been created and
      several have been deleted, if the devices that are still around
      occupy the last n bits of the bitmap. In this scenario, even if lower
      bits are available, the dev_id search is so narrow that it always fails.
      
      Fixes: da36e13c ("ipvlan: improvise dev_id generation logic in IPvlan")
      CC: David Miller <davem@davemloft.org>
      CC: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 11 Jan, 2017 1 commit
    • ipvlan: improvise dev_id generation logic in IPvlan · da36e13c
      Mahesh Bandewar committed
      The patch 009146d1 ("ipvlan: assign unique dev-id for each slave
      device.") used ida_simple_get() to generate dev_ids assigned to the
      slave devices. However (Eric has pointed out that) there is a shortcoming
      with that approach as it always uses the first available ID. This
      becomes a problem when a slave gets deleted and a new slave gets added.
      The ID gets reassigned causing the new slave to get the same link-local
      address. This side-effect is undesirable.
      
      This patch adds a per-port variable that keeps track of the IDs
      assigned and is used as the start base for the IDR API. This base
      wraps around when it reaches the MAX (0xFFFE) value, possibly on a
      busy system where slaves are added and deleted routinely.
      
      Fixes: 009146d1 ("ipvlan: assign unique dev-id for each slave device.")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      CC: Eric Dumazet <edumazet@google.com>
      CC: David Miller <davem@davemloft.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
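The ID scheme from this commit, together with the wrap-around retry that the later corner-case fix (019ec003) calls for, can be sketched with a plain bitmap in place of the kernel's ID allocator. Everything named `demo_*` is hypothetical, and treating 0xFFFE as an exclusive upper bound is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ID_MIN 0x1u
#define DEMO_ID_MAX 0xFFFEu   /* assumed exclusive upper bound */

/* Hypothetical per-port allocator state. */
struct demo_port {
    bool used[DEMO_ID_MAX];   /* used[id] is true while id is assigned */
    uint32_t id_start;        /* where the next search begins */
};

/* Return the first free id in [lo, hi), or -1 if there is none. */
static int demo_find_free(const struct demo_port *p, uint32_t lo, uint32_t hi)
{
    for (uint32_t id = lo; id < hi; id++)
        if (!p->used[id])
            return (int)id;
    return -1;
}

/*
 * Search upward from the moving base so a freshly released id is not
 * handed out again immediately; if the upper range is exhausted, retry
 * in the lower range instead of failing.
 */
static int demo_alloc_id(struct demo_port *p)
{
    int id = demo_find_free(p, p->id_start, DEMO_ID_MAX);
    if (id < 0)
        id = demo_find_free(p, DEMO_ID_MIN, p->id_start);
    if (id < 0)
        return -1;
    p->used[id] = true;
    p->id_start = (uint32_t)id + 1;   /* next search starts past this id */
    return id;
}

static void demo_free_id(struct demo_port *p, int id)
{
    p->used[id] = false;
}
```

A port must start with `id_start` set to `DEMO_ID_MIN`. After allocating 1 and 2 and releasing 1, the next allocation returns 3 rather than 1, so a new slave does not inherit the deleted slave's ID and, with it, the same link-local address.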
  26. 09 Jan, 2017 1 commit
  27. 05 Jan, 2017 1 commit
  28. 29 Dec, 2016 1 commit
  29. 24 Dec, 2016 1 commit
  30. 09 Dec, 2016 1 commit
  31. 08 Dec, 2016 1 commit
  32. 01 Dec, 2016 1 commit
  33. 28 Nov, 2016 1 commit
  34. 16 Oct, 2016 1 commit
    • ipvlan: constify l3mdev_ops structure · ab530f63
      Julia Lawall committed
      This l3mdev_ops structure is only stored in the l3mdev_ops field of a
      net_device structure.  This field is declared const, so the l3mdev_ops
      structure can be declared as const also.  Additionally drop the
      __read_mostly annotation.
      
      The semantic patch that adds const is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r disable optional_qualifier@
      identifier i;
      position p;
      @@
      static struct l3mdev_ops i@p = { ... };
      
      @ok@
      identifier r.i;
      struct net_device *e;
      position p;
      @@
      e->l3mdev_ops = &i@p;
      
      @bad@
      position p != {r.p,ok.p};
      identifier r.i;
      struct l3mdev_ops e;
      @@
      e@i@p
      
      @depends on !bad disable optional_qualifier@
      identifier r.i;
      @@
      static
      +const
       struct l3mdev_ops i = { ... };
      // </smpl>
      
      The effect on the layout of the .o file is shown by the following output
      of the size command, first before then after the transformation:
      
         text    data     bss     dec     hex filename
         7364     466      52    7882    1eca drivers/net/ipvlan/ipvlan_main.o
         7412     434      52    7898    1eda drivers/net/ipvlan/ipvlan_main.o
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  35. 19 Sep, 2016 1 commit
    • ipvlan: Introduce l3s mode · 4fbae7d8
      Mahesh Bandewar committed
      In a typical IPvlan L3 setup, the master is in the default-ns and
      each slave is in a different (slave) ns. In this setup, egress
      packet processing for traffic originating from a slave-ns will
      hit all NF_HOOKs in the slave-ns as well as the default-ns. However,
      the same is not true for ingress processing: the NF_HOOKs are
      hit only in the slave-ns and are skipped in the default-ns.
      IPvlan in L3 mode is restrictive, and if admins want to deploy
      iptables rules in the default-ns, this asymmetric data path makes it
      impossible to do so.
      
      This patch makes use of the l3_rcv() (added as part of l3mdev
      enhancements) to perform input route lookup on RX packets without
      changing the skb->dev and then uses nf_hook at NF_INET_LOCAL_IN
      to change the skb->dev just before handing over skb to L4.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      CC: David Ahern <dsa@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  36. 10 Jun, 2016 1 commit