1. 18 Apr, 2011 · 1 commit
  2. 22 Mar, 2011 · 1 commit
    • macvlan: Fix use after free of struct macvlan_port. · d5cd9244
      Eric W. Biederman committed
      When the macvlan driver was extended to call unregister_netdevice_queue
      in 23289a37, a use after free of struct macvlan_port was introduced.
      The code in dellink relied on unregister_netdevice actually
      unregistering the net device, so that it would be safe to free
      macvlan_port.
      
      Since unregister_netdevice_queue can merely queue up the unregister
      instead of performing it immediately, we free the macvlan_port too
      soon; the code in macvlan_stop then removes the MAC address from the
      set of MAC addresses to listen for, using memory that has already
      been freed.
      
      To fix this, add a reference count to track when it is safe to free
      the macvlan_port, and move the call of macvlan_port_destroy into
      macvlan_uninit, which is guaranteed to be called after the final
      macvlan_port_close.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
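
A minimal userspace sketch of the reference-counting idea, under stated assumptions: `struct port`, `struct mvdev`, `port_create()`, `dev_attach()`, `dev_stop()` and `dev_uninit()` are hypothetical stand-ins, not the driver's code, and the queued unregister is modeled simply by running stop and then uninit in order. It shows the port being destroyed only when the last device drops its reference in the uninit path, after stop has finished touching the port's memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct macvlan_port. */
struct port {
    int refcount;            /* macvlan devices still attached to the port */
    unsigned char *filter;   /* state that the stop path still touches */
};

/* Hypothetical, simplified stand-in for a macvlan net_device. */
struct mvdev {
    struct port *port;
};

int ports_destroyed = 0;     /* counts port_destroy() calls, for the demo */

struct port *port_create(void)
{
    struct port *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->filter = calloc(6, 1);   /* room for one MAC address */
    if (p->filter == NULL) {
        free(p);
        return NULL;
    }
    return p;
}

static void port_destroy(struct port *p)
{
    free(p->filter);
    free(p);
    ports_destroyed++;
}

/* newlink: each device attached to the port takes a reference. */
void dev_attach(struct mvdev *d, struct port *p)
{
    p->refcount++;
    d->port = p;
}

/* stop: removes the device's MAC address from the port's filter,
 * so the port must still be alive at this point. */
void dev_stop(struct mvdev *d)
{
    d->port->filter[0] = 0;
}

/* uninit: runs after the final stop, so dropping the reference here
 * (instead of freeing the port in dellink) cannot free memory that
 * stop still needs. */
void dev_uninit(struct mvdev *d)
{
    struct port *p = d->port;

    assert(p->refcount > 0);
    d->port = NULL;
    if (--p->refcount == 0)
        port_destroy(p);
}
```

With two devices on one port, the port outlives the first device's stop and uninit and is destroyed only after the second device's uninit.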
  3. 17 Mar, 2011 · 1 commit
  4. 15 Mar, 2011 · 1 commit
    • macvlan : fix checksums error when we are in bridge mode · 12a2856b
      Daniel Lezcano committed
      When the lower device has offloading capabilities, packet checksums
      are not computed. As a result, no macvlan port in bridge mode works,
      because the packets are dropped due to a bad checksum.
      
      If the macvlan is in bridge mode, the packet is forwarded to another
      macvlan port and reaches the network stack, which looks for a checksum
      that was never computed because of the lower device's offloading.
      In this case, we have to mark the packet CHECKSUM_UNNECESSARY when it
      is forwarded to a bridged port, and restore the previous value of
      ip_summed when the packet goes to the lowerdev.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Andrian Nord <nightnord@gmail.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
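
A simplified model of the save/override/restore pattern described above. `enum csum_state` mirrors the kernel's `ip_summed` states by name only; `struct pkt`, `deliver_to_bridged_port()`, `xmit_to_lowerdev()` and `bridge_mode_xmit()` are hypothetical, and the sketch assumes a broadcast case in which one copy is delivered to a sibling port and the original then leaves through the lower device.

```c
#include <assert.h>

/* Hypothetical mirror of the kernel's ip_summed states (names only). */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE, CSUM_PARTIAL };

struct pkt {
    enum csum_state ip_summed;
};

/* Hypothetical receivers that record the state they were handed. */
enum csum_state seen_by_bridged_port;
enum csum_state seen_by_lowerdev;

void deliver_to_bridged_port(const struct pkt *p) { seen_by_bridged_port = p->ip_summed; }
void xmit_to_lowerdev(const struct pkt *p) { seen_by_lowerdev = p->ip_summed; }

/* Bridge-mode broadcast: one copy to a sibling port, original to lowerdev. */
void bridge_mode_xmit(struct pkt *p)
{
    enum csum_state saved = p->ip_summed;

    /* With offload, no checksum has been computed yet; tell the local
     * stack not to verify it, or it would drop the packet. */
    p->ip_summed = CSUM_UNNECESSARY;
    deliver_to_bridged_port(p);

    /* Restore the original state so the lower device still fills in
     * the checksum on transmit. */
    p->ip_summed = saved;
    xmit_to_lowerdev(p);
}
```

A packet that arrives as `CSUM_PARTIAL` is seen as `CSUM_UNNECESSARY` by the sibling port and as `CSUM_PARTIAL` again by the lower device.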
  5. 23 Nov, 2010 · 1 commit
    • macvlan: Introduce 'passthru' mode to takeover the underlying device · eb06acdc
      Sridhar Samudrala committed
      With the current default 'vepa' mode, a KVM guest using virtio with a
      macvtap backend has the following limitations:
      - cannot change/add a mac address on the guest virtio-net
      - cannot create a vlan device on the guest virtio-net
      - cannot enable promiscuous mode on guest virtio-net
      
      To address these limitations, this patch introduces a new mode called
      'passthru' when creating a macvlan device which allows takeover of the
      underlying device and passing it to a guest using virtio with macvtap
      backend.
      
      Only one macvlan device is allowed in passthru mode and it inherits
      the mac address from the underlying device and sets it in promiscuous
      mode to receive and forward all the packets.
      Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
      
      -------------------------------------------------------------------------
      Signed-off-by: David S. Miller <davem@davemloft.net>
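
A sketch of the passthru rules as the commit message states them: only one macvlan may exist on a port in passthru mode, it inherits the lower device's MAC address, and the lower device is put into promiscuous mode. All types and functions here (`struct lowerdev`, `struct mvport`, `mv_newlink()`, `mv_open()`) are hypothetical simplifications; doing the promiscuity change in a separate open step is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified lower device and macvlan port. */
struct lowerdev {
    unsigned char mac[ETH_ALEN];
    int promisc;                /* promiscuity reference count */
};

struct mvport {
    struct lowerdev *lower;
    int count;                  /* macvlan devices on this port */
    int passthru;               /* nonzero once a passthru device exists */
};

struct mvdev {
    unsigned char mac[ETH_ALEN];
};

/* Returns 0 on success or -1 if the passthru rule rejects the device. */
int mv_newlink(struct mvport *port, struct mvdev *dev, int want_passthru)
{
    /* A passthru device takes over the whole lower device: it must be
     * the only macvlan on the port, and nothing may be added after it. */
    if (port->passthru || (want_passthru && port->count > 0))
        return -1;

    if (want_passthru) {
        port->passthru = 1;
        /* Inherit the lower device's MAC address. */
        memcpy(dev->mac, port->lower->mac, ETH_ALEN);
    }
    port->count++;
    return 0;
}

/* Open: in passthru mode, put the lower device into promiscuous mode
 * so that every frame is received and forwarded to the guest. */
void mv_open(struct mvport *port)
{
    if (port->passthru)
        port->lower->promisc++;
}
```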
  6. 17 Nov, 2010 · 1 commit
    • macvlan: lockless tx path · 8ffab51b
      Eric Dumazet committed
      macvlan is a stacked device, like tunnels. We should use the lockless
      mechanism we are using in tunnels and loopback.
      
      This patch completely removes locking in TX path.
      
      TX stat counters are added to the existing percpu stat structure,
      which is renamed from rx_stats to pcpu_stats.
      
      Note: this reverts commit 2c114553 ("macvlan: add multiqueue
      capability").
      
      Note: rx_errors is converted to a 32-bit counter, like tx_dropped,
      since it doesn't need a 64-bit range.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Ben Greear <greearb@candelatech.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 18 Sep, 2010 · 1 commit
  8. 28 Jul, 2010 · 1 commit
  9. 23 Jul, 2010 · 1 commit
    • macvtap: Limit packet queue length · 8a35747a
      Herbert Xu committed
      Mark Wagner reported OOM symptoms when sending UDP traffic over
      a macvtap link to a kvm receiver.
      
      This appears to be caused by the fact that macvtap packet queues
      are unlimited in length.  This means that if the receiver can't
      keep up with the rate of flow, then we will hit OOM. Of course
      it gets worse if the OOM killer then decides to kill the receiver.
      
      This patch imposes a cap on the packet queue length, in the same
      way as the tuntap driver, using the device TX queue length.
      
      Please note that macvtap currently has no way of giving congestion
      notification, that means the software device TX queue cannot be
      used and packets will always be dropped once the macvtap driver
      queue fills up.
      
      This shouldn't be a great problem for the scenario where macvtap
      is used to feed a kvm receiver, as the traffic is most likely
      external in origin so congestion notification can't be applied
      anyway.
      
      Of course, if anybody decides to complain about guest-to-guest
      UDP packet loss down the track, then we may have to revisit this.
      
      Incidentally, this patch also fixes a real memory leak when
      macvtap_get_queue fails.
      
      Chris Wright noticed that for this patch to work, we need a
      non-zero TX queue length.  This patch includes his work to change
      the default macvtap TX queue length to 500.
      Reported-by: Mark Wagner <mwagner@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
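
A minimal model of the cap, assuming a fixed-size ring and an explicit `tx_queue_len` field; `struct tapq`, `QCAP`, `tapq_enqueue()` and `tapq_dequeue()` are hypothetical names, not macvtap's code. Because there is no congestion notification, a full queue simply drops the packet, and a queue length of zero would drop everything, which is why the non-zero default of 500 matters.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 1024   /* hypothetical hard upper bound for the ring */

/* Hypothetical, simplified macvtap receive queue. */
struct tapq {
    size_t len;           /* packets currently queued */
    size_t tx_queue_len;  /* cap, taken from the device TX queue length */
    size_t dropped;       /* packets dropped because the queue was full */
    size_t head;          /* index of the oldest packet */
    int pkts[QCAP];
};

/* Queue a packet unless tx_queue_len packets are already waiting.
 * With no congestion notification, a full queue means a drop. */
int tapq_enqueue(struct tapq *q, int pkt)
{
    if (q->len >= q->tx_queue_len || q->len >= QCAP) {
        q->dropped++;
        return -1;
    }
    q->pkts[(q->head + q->len) % QCAP] = pkt;
    q->len++;
    return 0;
}

/* Hand the oldest packet to the reader (the kvm receiver). */
int tapq_dequeue(struct tapq *q, int *pkt)
{
    if (q->len == 0)
        return -1;
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return 0;
}
```

Feeding 600 packets into a queue capped at 500 leaves 500 queued and 100 dropped; draining one packet makes room for the next.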
  10. 10 Jul, 2010 · 1 commit
    • net: Get rid of rtnl_link_stats64 / net_device_stats union · 3cfde79c
      Ben Hutchings committed
      In commit be1f3c2c "net: Enable 64-bit
      net device statistics on 32-bit architectures" I redefined struct
      net_device_stats so that it could be used in a union with struct
      rtnl_link_stats64, avoiding the need for explicit copying or
      conversion between the two.  However, this is unsafe because there is
      no locking required and no lock consistently held around calls to
      dev_get_stats() and use of the statistics structure it returns.
      
      In commit 28172739 "net: fix 64 bit
      counters on 32 bit arches" Eric Dumazet dealt with that problem by
      requiring callers of dev_get_stats() to provide storage for the
      result.  This means that the net_device::stats64 field and the padding
      in struct net_device_stats are now redundant, so remove them.
      
      Update the comment on net_device_ops::ndo_get_stats64 to reflect its
      new usage.
      
      Change dev_txq_stats_fold() to use struct rtnl_link_stats64, since
      that is what all its callers are really using and it is no longer
      going to be compatible with struct net_device_stats.
      
      Eric Dumazet suggested the separate function for the structure
      conversion.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 08 Jul, 2010 · 1 commit
    • net: fix 64 bit counters on 32 bit arches · 28172739
      Eric Dumazet committed
      There is a small possibility that a reader gets incorrect values on
      32-bit arches: SNMP applications could see incorrect counters when the
      high 32-bit half is changed by another stats consumer/provider.
      
      One way to solve this is to add a struct rtnl_link_stats64 parameter
      to all ndo_get_stats64() methods, and also add such a parameter to
      dev_get_stats().
      
      The rule is that we are not allowed to use dev->stats64 as temporary
      storage for 64-bit stats; a caller-provided area (usually on the
      stack) must be used instead.
      
      Old drivers (only providing get_stats() method) need no changes.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
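
A small sketch of the caller-provided-storage rule. `struct link_stats64` is a trimmed, hypothetical mirror of `struct rtnl_link_stats64`, and `get_stats()` and `demo_get_stats64()` only imitate the shape of `dev_get_stats()` and an `ndo_get_stats64()` method after this change. The point is that each reader fills its own (usually stack) buffer instead of a shared field on the device, so concurrent readers do not overwrite each other's results.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down mirror of struct rtnl_link_stats64. */
struct link_stats64 {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

/* Hypothetical device: live counters, but no shared scratch stats field. */
struct netdev {
    uint64_t rx, tx;
    struct link_stats64 *(*get_stats64)(struct netdev *dev,
                                        struct link_stats64 *storage);
};

/* Driver method: writes only into the storage the caller passed in. */
static struct link_stats64 *demo_get_stats64(struct netdev *dev,
                                             struct link_stats64 *storage)
{
    storage->rx_packets = dev->rx;
    storage->tx_packets = dev->tx;
    return storage;
}

/* dev_get_stats()-style wrapper: fills and returns the caller's buffer. */
struct link_stats64 *get_stats(struct netdev *dev, struct link_stats64 *storage)
{
    return dev->get_stats64(dev, storage);
}
```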
  12. 29 Jun, 2010 · 1 commit
  13. 16 Jun, 2010 · 2 commits
  14. 08 Jun, 2010 · 1 commit
  15. 02 Jun, 2010 · 1 commit
  16. 25 May, 2010 · 1 commit
  17. 16 May, 2010 · 2 commits
  18. 04 Apr, 2010 · 1 commit
  19. 19 Mar, 2010 · 1 commit
  20. 04 Feb, 2010 · 2 commits
  21. 16 Jan, 2010 · 1 commit
  22. 04 Dec, 2009 · 1 commit
  23. 27 Nov, 2009 · 3 commits
    • macvlan: export macvlan mode through netlink · 27c0b1a8
      Arnd Bergmann committed
      In order to support all three modes of macvlan at
      runtime, extend the existing netlink protocol
      to allow choosing the mode per macvlan slave
      interface.
      
      This depends on a matching patch to iproute2
      in order to become accessible in user land.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: implement bridge, VEPA and private mode · 618e1b74
      Arnd Bergmann committed
      This allows each macvlan slave device to be in one
      of three modes, depending on the use case:
      
      MACVLAN_PRIVATE:
        The device never communicates with any other device
        on the same upper_dev. This even includes frames
        coming back from a reflective relay, where supported
        by the adjacent bridge.
      
      MACVLAN_VEPA:
        The new Virtual Ethernet Port Aggregator (VEPA) mode:
        we assume that the adjacent bridge returns all frames
        where both source and destination are local to the
        macvlan port, i.e. the bridge is set up as a reflective
        relay.
        Broadcast frames coming in from the upper_dev get
        flooded to all macvlan interfaces in VEPA mode.
        We never deliver any frames locally.
      
      MACVLAN_BRIDGE:
        We provide the behavior of a simple bridge between
        different macvlan interfaces on the same port. Frames
        from one interface to another one get delivered directly
        and are not sent out externally. Broadcast frames get
        flooded to all other bridge ports and to the external
        interface, but when they come back from a reflective
        relay, we don't deliver them again.
        Since we know all the MAC addresses, the macvlan bridge
        mode does not require learning or STP like the bridge
        module does.
      
      Based on an earlier patch "macvlan: Reflect macvlan packets
      meant for other macvlan devices" by Eric Biederman.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
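
The three modes boil down to two decisions: what happens to an outgoing unicast frame addressed to a sibling macvlan, and whether a broadcast that comes back from a reflective relay is delivered to a sibling. The sketch below models only those two decisions as the commit text describes them; the enums, `tx_unicast()` and `rx_reflected()` are hypothetical, and real delivery also depends on cases it leaves out (for example frames from external sources).

```c
#include <assert.h>

/* Mode names follow the commit text; values are arbitrary here. */
enum mv_mode { MV_PRIVATE, MV_VEPA, MV_BRIDGE };

enum verdict {
    SEND_EXTERNAL,   /* hand the frame to the lower device */
    DELIVER_LOCAL,   /* deliver to the sibling macvlan */
    DROP             /* do not deliver to the sibling */
};

/* Outgoing unicast from a device in src_mode; dst_is_local is nonzero
 * when the destination MAC belongs to another macvlan on the same port. */
enum verdict tx_unicast(enum mv_mode src_mode, int dst_is_local)
{
    if (src_mode == MV_BRIDGE && dst_is_local)
        return DELIVER_LOCAL;   /* internal bridge: no learning or STP */
    return SEND_EXTERNAL;       /* everything else goes out via the lower device */
}

/* Broadcast that came back from a reflective relay, originally sent by
 * a sibling in src_mode: should the device in dst_mode receive it? */
enum verdict rx_reflected(enum mv_mode dst_mode, enum mv_mode src_mode)
{
    /* Private devices never exchange frames with siblings, even via a relay. */
    if (dst_mode == MV_PRIVATE || src_mode == MV_PRIVATE)
        return DROP;
    /* Between bridge-mode devices the frame was already delivered locally. */
    if (dst_mode == MV_BRIDGE && src_mode == MV_BRIDGE)
        return DROP;
    /* VEPA relies on the relay to bring sibling traffic back. */
    return DELIVER_LOCAL;
}
```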
    • macvlan: cleanup rx statistics · a1e514c5
      Arnd Bergmann committed
      We have very similar code for rx statistics in
      two places in the macvlan driver, with a third
      one being added in the next patch.
      
      Consolidate them into one function to improve
      overall readability of the driver.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 24 Nov, 2009 · 1 commit
  25. 18 Nov, 2009 · 1 commit
    • macvlan: Precise RX stats accounting · fccaf710
      Eric Dumazet committed
      With multiqueue devices, it's possible that several CPUs call
      macvlan RX routines simultaneously for the same macvlan device.
      
      We update RX stats counter without any locking, so we can
      get slightly wrong counters.
      
      One possible fix is to use percpu counters, to get precise
      accounting and also guarantee no cache-line ping-pong
      between CPUs.
      
      Note: this adds 16 bytes (32 bytes on 64bit arches) of percpu
      data per macvlan device.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
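
A sketch of per-CPU RX counters updated through a single helper, in the spirit of the consolidation in "macvlan: cleanup rx statistics". CPUs are modeled as array slots indexed by `this_cpu`, whereas the kernel allocates real per-CPU areas (for example with `alloc_percpu()`); `NR_CPUS`, `count_rx()` and `fold_rx()` are hypothetical names. Assuming four `unsigned long` counters, each per-CPU block is 16 bytes on 32-bit and 32 bytes on 64-bit arches, consistent with the note above.

```c
#include <assert.h>

#define NR_CPUS 4   /* hypothetical CPU count for the sketch */

/* One counter block per CPU: four unsigned longs. */
struct rx_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    unsigned long rx_multicast;
    unsigned long rx_errors;
};

struct mv_pcpu {
    struct rx_stats cpu[NR_CPUS];
};

/* Single helper for every RX path. Each CPU writes only its own slot,
 * so no lock is needed and no cache line bounces between CPUs.
 * this_cpu stands in for the current CPU id. */
void count_rx(struct mv_pcpu *s, int this_cpu, unsigned int len,
              int success, int multicast)
{
    struct rx_stats *rx = &s->cpu[this_cpu];

    if (success) {
        rx->rx_packets++;
        rx->rx_bytes += len;
        if (multicast)
            rx->rx_multicast++;
    } else {
        rx->rx_errors++;
    }
}

/* Readers fold all per-CPU slots into one total. */
struct rx_stats fold_rx(const struct mv_pcpu *s)
{
    struct rx_stats sum = { 0, 0, 0, 0 };

    for (int i = 0; i < NR_CPUS; i++) {
        sum.rx_packets   += s->cpu[i].rx_packets;
        sum.rx_bytes     += s->cpu[i].rx_bytes;
        sum.rx_multicast += s->cpu[i].rx_multicast;
        sum.rx_errors    += s->cpu[i].rx_errors;
    }
    return sum;
}
```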
  26. 14 Nov, 2009 · 1 commit
  27. 08 Nov, 2009 · 1 commit
    • net: Support specifying the network namespace upon device creation. · 81adee47
      Eric W. Biederman committed
      There is no good reason to not support userspace specifying the
      network namespace during device creation, and it makes it easier
      to create a network device and pass it to a child network namespace
      with a well known name.
      
      We have to be careful to ensure that the target network namespace
      for the new device exists through the life of the call.  To keep
      that logic clear I have factored out the network namespace grabbing
      logic into rtnl_link_get_net.
      
      In addition, we need to continue to pass the source network namespace
      to the rtnl_link_ops.newlink method so that we can find the network
      namespace of the base device.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
  28. 28 Oct, 2009 · 1 commit
  29. 04 Sep, 2009 · 1 commit
    • macvlan: add multiqueue capability · 2c114553
      Eric Dumazet committed
      macvlan devices are currently not multi-queue capable.
      
      We can do that by defining a new rtnl_link_ops method,
      get_tx_queues(), called from rtnl_create_link().
      
      This new method gets num_tx_queues/real_num_tx_queues
      from lower device.
      
      macvlan_get_tx_queues() is a copy of vlan_get_tx_queues().
      
      Because macvlan_start_xmit() has to update netdev_queue
      stats only (and not dev->stats), I chose to change
      tx_errors/tx_aborted_errors accounting to tx_dropped,
      since the netdev_queue structure doesn't define tx_errors /
      tx_aborted_errors.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 02 Sep, 2009 · 1 commit
  31. 01 Sep, 2009 · 1 commit
  32. 11 Jun, 2009 · 1 commit
    • drivers/net/macvlan.c: fix cloning of tagged VLAN interfaces · ef5c8996
      sg.tweak@gmail.com committed
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13348
      
      akpm: the reporter disappeared, so I typed it in again.
      
      It is not possible to clone a tagged VLAN interface for use as a
      MAC-based VLAN interface.
      
      How reproducible:
      Use any 802.1q tagged vlan interface, e.g. eth2.700 and clone it:
      
        ip link add link eth2.700 address 00:04:75:cb:38:09 macvlan0 type macvlan
        ip link set dev macvlan0 up
        ip addr add 10.195.1.1/24 dev macvlan0
      
      So far, so good. Now try to ping anything via macvlan0:
      
        ping 10.195.1.2
      
      Actual results:
      For every attempted packet transmission, the kernel writes to the console:
      
      ------------[ cut here ]------------
      WARNING: at net/8021q/vlan_dev.c:254 vlan_dev_hard_header+0x36/0x126 [8021q]()
      Hardware name: M22ES
      Modules linked in: arptable_filter arp_tables bridge veth macvlan arc4 ecb
      ppp_mppe ppp_async crc_ccitt ppp_generic slhc autofs4 sunrpc 8021q garp stp
      ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp
      x_tables dm_mirror dm_region_hash dm_log dm_multipath dm_mod sbs sbshc lp
      floppy snd_intel8x0 joydev snd_seq_dummy snd_intel8x0m snd_ac97_codec
      ide_cd_mod ac97_bus snd_seq_oss cdrom snd_seq_midi_event serio_raw snd_seq
      snd_seq_device snd_pcm_oss snd_mixer_oss parport_pc snd_pcm parport battery
      8139cp snd_timer i2c_sis96x ac button snd rtc_cmos rtc_core 8139too soundcore
      rtc_lib mii i2c_core pcspkr snd_page_alloc pata_sis libata sd_mod scsi_mod ext3
      jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: ip_tables]
      Pid: 0, comm: swapper Tainted: G        W  2.6.29.3 #1
      Call Trace:
       [<c0425f48>] warn_slowpath+0x60/0x9f
       [<c0425f6f>] warn_slowpath+0x87/0x9f
       [<dffb850d>] vlan_dev_hard_header+0x0/0x126 [8021q]
       [<dffb8543>] vlan_dev_hard_header+0x36/0x126 [8021q]
       [<dffb850d>] vlan_dev_hard_header+0x0/0x126 [8021q]
       [<df83155d>] macvlan_hard_header+0x3c/0x47 [macvlan]
       [<df831521>] macvlan_hard_header+0x0/0x47 [macvlan]
       [<c062bf3f>] arp_create+0xef/0x1ff
       [<c062c08c>] arp_send+0x3d/0x54
       [<c062c916>] arp_solicit+0x16c/0x177
       [<c05fadd2>] neigh_timer_handler+0x227/0x269
       [<c05fabab>] neigh_timer_handler+0x0/0x269
       [<c042ce4d>] run_timer_softirq+0xf0/0x141
       [<c0429e5a>] __do_softirq+0x76/0xf8
       [<c0429de4>] __do_softirq+0x0/0xf8
       <IRQ>  [<c044fb67>] handle_level_irq+0x0/0xad
       [<c0429db7>] irq_exit+0x35/0x62
       [<c04046bb>] do_IRQ+0xdf/0xf4
       [<c04035a7>] common_interrupt+0x27/0x2c
       [<c04079c5>] default_idle+0x2a/0x3d
       [<c0401bb6>] cpu_idle+0x57/0x70
      
      The macvlan driver always uses the standard Ethernet header length,
      whatever type of interface it is linked to. This patch fixes this problem.
      
      Reported-by: <sg.tweak@gmail.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
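
A sketch of what "always uses the standard Ethernet header length" means in numbers. It assumes the fix amounts to copying the lower device's `hard_header_len` instead of hard-coding `ETH_HLEN` (the commit text does not spell out the change), and that the VLAN device `eth2.700` itself reports 14 + 4 bytes because an 802.1Q tag adds 4 bytes. `struct netdev`, `macvlan_init_old()` and `macvlan_init_new()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN  14   /* standard Ethernet header */
#define VLAN_HLEN 4    /* 802.1Q tag */

/* Hypothetical, minimal net_device: just the header length and the
 * device this one is stacked on. */
struct netdev {
    unsigned short hard_header_len;
    const struct netdev *lower;
};

/* Behaviour described in the commit: always reserve an Ethernet header. */
void macvlan_init_old(struct netdev *dev)
{
    dev->hard_header_len = ETH_HLEN;
}

/* Assumed fix: inherit the lower device's header length, so a macvlan
 * on a tagged VLAN device also reserves room for the tag. */
void macvlan_init_new(struct netdev *dev)
{
    dev->hard_header_len = dev->lower->hard_header_len;
}
```

For a macvlan on `eth2.700`, the old rule reserves 14 bytes while the VLAN device below needs 18, which is the shortfall the warning in `vlan_dev_hard_header` points at.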
  33. 30 May, 2009 · 1 commit
    • net: convert unicast addr list · ccffad25
      Jiri Pirko committed
      This patch converts unicast address list to standard list_head using
      previously introduced struct netdev_hw_addr. It also relaxes the
      locking. Original spinlock (still used for multicast addresses) is not
      needed and is no longer used for a protection of this list. All
      reading and writing takes place under rtnl (with no changes).
      
      I also removed a possibility to specify the length of the address
      while adding or deleting unicast address. It's always dev->addr_len.
      
      The conversion especially touched the e1000 and ixgbe code, where the
      change is not so trivial.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      
       drivers/net/bnx2.c               |   13 +--
       drivers/net/e1000/e1000_main.c   |   24 +++--
       drivers/net/ixgbe/ixgbe_common.c |   14 ++--
       drivers/net/ixgbe/ixgbe_common.h |    4 +-
       drivers/net/ixgbe/ixgbe_main.c   |    6 +-
       drivers/net/ixgbe/ixgbe_type.h   |    4 +-
       drivers/net/macvlan.c            |   11 +-
       drivers/net/mv643xx_eth.c        |   11 +-
       drivers/net/niu.c                |    7 +-
       drivers/net/virtio_net.c         |    7 +-
       drivers/s390/net/qeth_l2_main.c  |    6 +-
       drivers/scsi/fcoe/fcoe.c         |   16 ++--
       include/linux/netdevice.h        |   18 ++--
       net/8021q/vlan.c                 |    4 +-
       net/8021q/vlan_dev.c             |   10 +-
       net/core/dev.c                   |  195 +++++++++++++++++++++++++++-----------
       net/dsa/slave.c                  |   10 +-
       net/packet/af_packet.c           |    4 +-
       18 files changed, 227 insertions(+), 137 deletions(-)
      Signed-off-by: David S. Miller <davem@davemloft.net>
  34. 19 May, 2009 · 1 commit
    • net: release dst entry in dev_hard_start_xmit() · 93f154b5
      Eric Dumazet committed
      One point of contention under high network load is the dst_release()
      performed when a transmitted skb is freed. This is because NIC TX
      completion calls dev_kfree_skb() long after the original call to
      dev_queue_xmit(skb).
      
      The CPU cache is cold and the atomic op in dst_release() stalls. On SMP,
      this is quite visible if one CPU is 100% busy handling softirqs for a
      network device, since dst_clone() is done by other CPUs, causing
      cache-line ping-pong.
      
      The right place to release the dst seems to be dev_hard_start_xmit(),
      for most devices except virtual ones and a few other exceptions.
      
      David Miller suggested defining a new device flag, set in alloc_netdev_mq()
      (so that most devices set it at init time), and carefully unset in devices
      that don't want a NULL skb->dst in their ndo_start_xmit().
      
      The devices that must clear this flag are:
      
      - the loopback device, because it calls netif_rx() and, quoting Patrick:
          "ip_route_input() doesn't accept loopback addresses, so loopback packets
           already need to have a dst_entry attached."
      - appletalk/ipddp.c: needs skb->dst in its xmit function
      
      - all devices that call dev_queue_xmit() again from their xmit function
      (as some classifiers need skb->dst): bonding, vlan, macvlan, eql, ifb, hdlc_fr
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
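
A minimal model of the flag-based decision described above. `XMIT_DST_RELEASE`, `struct skb`, `netdev_setup()`, `hard_start_xmit()` and `report_dst()` are hypothetical names standing in for the new device flag, `alloc_netdev_mq()` and `dev_hard_start_xmit()`, and dst reference counting is reduced to a plain integer. Ordinary devices drop the dst before their xmit handler runs, while a stacked device such as macvlan clears the flag and keeps the dst for its own later `dev_queue_xmit()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name and value for the new device flag. */
#define XMIT_DST_RELEASE 0x1u

struct dst { int refcnt; };          /* routing entry, refcount only */
struct skb { struct dst *dst; };     /* packet, dst pointer only */

struct netdev {
    unsigned int flags;
    int (*xmit)(struct skb *skb, struct netdev *dev);
};

static void dst_release(struct dst *d)
{
    if (d != NULL)
        d->refcnt--;
}

/* alloc_netdev_mq()-style default: most devices get the flag at init time. */
void netdev_setup(struct netdev *dev, int (*xmit)(struct skb *, struct netdev *))
{
    dev->flags = XMIT_DST_RELEASE;
    dev->xmit = xmit;
}

/* dev_hard_start_xmit()-style path: release the dst now, on the
 * transmitting CPU, unless the device asked to keep it. */
int hard_start_xmit(struct skb *skb, struct netdev *dev)
{
    if (dev->flags & XMIT_DST_RELEASE) {
        dst_release(skb->dst);
        skb->dst = NULL;
    }
    return dev->xmit(skb, dev);
}

/* Demo xmit handler: reports whether the skb still carries a dst. */
int report_dst(struct skb *skb, struct netdev *dev)
{
    (void)dev;
    return skb->dst != NULL;
}
```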
  35. 21 Apr, 2009 · 1 commit