1. 19 2月, 2013 1 次提交
  2. 16 2月, 2013 3 次提交
  3. 15 2月, 2013 1 次提交
  4. 07 2月, 2013 4 次提交
    • E
      net: reset mac header in dev_start_xmit() · 6d1ccff6
      Eric Dumazet 提交于
      On 64 bit arches :
      
      There is a off-by-one error in qdisc_pkt_len_init() because
      mac_header is not set in xmit path.
      
      skb_mac_header() returns an out of bound value that was
      harmless because hdr_len is an 'unsigned int'
      
      On 32bit arches, the error is abysmal.
      
      This patch is also a prereq for "macvlan: add multicast filter"
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Greear <greearb@candelatech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d1ccff6
    • C
      net: adjust skb_gso_segment() for calling in rx path · 12b0004d
      Cong Wang 提交于
      skb_gso_segment() is almost always called in tx path,
      except for openvswitch. It calls this function when
      it receives the packet and tries to queue it to user-space.
      In this special case, the ->ip_summed check inside
      skb_gso_segment() is no longer true, as ->ip_summed value
      has different meanings on rx path.
      
      This patch adjusts skb_gso_segment() so that we can at least
      avoid such warnings on checksum.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12b0004d
    • N
      netpoll: protect napi_poll and poll_controller during dev_[open|close] · ca99ca14
      Neil Horman 提交于
      Ivan Vercera was recently backporting commit
      9c13cb8b to a RHEL kernel, and I noticed that,
      while this patch protects the tg3 driver from having its ndo_poll_controller
      routine called during device initalization, it does nothing for the driver
      during shutdown. I.e. it would be entirely possible to have the
      ndo_poll_controller method (or subsequently the ndo_poll) routine called for a
      driver in the netpoll path on CPU A while in parallel on CPU B, the ndo_close or
      ndo_open routine could be called.  Given that the two latter routines tend to
      initizlize and free many data structures that the former two rely on, the result
      can easily be data corruption or various other crashes.  Furthermore, it seems
      that this is potentially a problem with all net drivers that support netpoll,
      and so this should ideally be fixed in a common path.
      
      As Ben H Pointed out to me, we can't preform dev_open/dev_close in atomic
      context, so I've come up with this solution.  We can use a mutex to sleep in
      open/close paths and just do a mutex_trylock in the napi poll path and abandon
      the poll attempt if we're locked, as we'll just retry the poll on the next send
      anyway.
      
      I've tested this here by flooding netconsole with messages on a system whos nic
      driver I modfied to periodically return NETDEV_TX_BUSY, so that the netpoll tx
      workqueue would be forced to send frames and poll the device.  While this was
      going on I rapidly ifdown/up'ed the interface and watched for any problems.
      I've not found any.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Ivan Vecera <ivecera@redhat.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Francois Romieu <romieu@fr.zoreil.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca99ca14
    • J
      net: core: Remove unnecessary alloc/OOM messages · 62b5942a
      Joe Perches 提交于
      alloc failures already get standardized OOM
      messages and a dump_stack.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62b5942a
  5. 30 1月, 2013 1 次提交
  6. 28 1月, 2013 1 次提交
    • E
      net: fix possible wrong checksum generation · cef401de
      Eric Dumazet 提交于
      Pravin Shelar mentioned that GSO could potentially generate
      wrong TX checksum if skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested to linearize skbs but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the SENDFILE is impacted by the extra allocation and
      copy, and because we use order-0 pages, while the TCP_STREAM uses
      bigger pages.
      Reported-by: NPravin Shelar <pshelar@nicira.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cef401de
  7. 22 1月, 2013 1 次提交
  8. 16 1月, 2013 1 次提交
  9. 12 1月, 2013 2 次提交
  10. 11 1月, 2013 6 次提交
    • A
      net: Add support for XPS without sysfs being defined · 024e9679
      Alexander Duyck 提交于
      This patch makes it so that we can support transmit packet steering without
      sysfs needing to be enabled.  The reason for making this change is to make
      it so that a driver can make use of the XPS even while the sysfs portion of
      the interface is not present.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      024e9679
    • A
      net: Rewrite netif_set_xps_queues to address several issues · 01c5f864
      Alexander Duyck 提交于
      This change is meant to address several issues I found within the
      netif_set_xps_queues function.
      
      If the allocation of one of the maps to be assigned to new_dev_maps failed
      we could end up with the device map in an inconsistent state since we had
      already worked through a number of CPUs and removed or added the queue.  To
      address that I split the process into several steps.  The first of which is
      just the allocation of updated maps for CPUs that will need larger maps to
      store the queue.  By doing this we can fail gracefully without actually
      altering the contents of the current device map.
      
      The second issue I found was the fact that we were always allocating a new
      device map even if we were not adding any queues.  I have updated the code
      so that we only allocate a new device map if we are adding queues,
      otherwise if we are not adding any queues to CPUs we just skip to the
      removal process.
      
      The last change I made was to reuse the code from remove_xps_queue to remove
      the queue from the CPU.  By making this change we can be consistent in how
      we go about adding and removing the queues from the CPUs.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      01c5f864
    • A
      net: Rewrite netif_reset_xps_queue to allow for better code reuse · 10cdc3f3
      Alexander Duyck 提交于
      This patch does a minor refactor on netif_reset_xps_queue to address a few
      items I noticed.
      
      First is the fact that we are doing removal of queues in both
      netif_reset_xps_queue and netif_set_xps_queue.  Since there is no need to
      have the code in two places I am pushing it out into a separate function
      and will come back in another patch and reuse the code in
      netif_set_xps_queue.
      
      The second item this change addresses is the fact that the Tx queues were
      not getting their numa_node value cleared as a part of the XPS queue reset.
      This patch resolves that by resetting the numa_node value if the dev_maps
      value is set.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10cdc3f3
    • A
      net: Add functions netif_reset_xps_queue and netif_set_xps_queue · 537c00de
      Alexander Duyck 提交于
      This patch adds two functions, netif_reset_xps_queue and
      netif_set_xps_queue.  The main idea behind these two functions is to
      provide a mechanism through which drivers can update their defaults in
      regards to XPS.
      
      Currently no such mechanism exists and as a result we cannot use XPS for
      things such as ATR which would require a basic configuration to start in
      which the Tx queues are mapped to CPUs via a 1:1 mapping.  With this change
      I am making it possible for drivers such as ixgbe to be able to use the XPS
      feature by controlling the default configuration.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      537c00de
    • A
      net: Split core bits of netdev_pick_tx into __netdev_pick_tx · 416186fb
      Alexander Duyck 提交于
      This change splits the core bits of netdev_pick_tx into a separate function.
      The main idea behind this is to make this code accessible to select queue
      functions when they decide to process the standard path instead of their
      own custom path in their select queue routine.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      416186fb
    • E
      net_sched: more precise pkt_len computation · 1def9238
      Eric Dumazet 提交于
      One long standing problem with TSO/GSO/GRO packets is that skb->len
      doesn't represent a precise amount of bytes on wire.
      
      Headers are only accounted for the first segment.
      For TCP, thats typically 66 bytes per 1448 bytes segment missing,
      an error of 4.5 % for normal MSS value.
      
      As consequences :
      
      1) TBF/CBQ/HTB/NETEM/... can send more bytes than the assigned limits.
      2) Device stats are slightly under estimated as well.
      
      Fix this by taking account of headers in qdisc_skb_cb(skb)->pkt_len
      computation.
      
      Packet schedulers should use qdisc pkt_len instead of skb->len for their
      bandwidth limitations, and TSO enabled devices drivers could use pkt_len
      if their statistics are not hardware assisted, and if they don't scratch
      skb->cb[] first word.
      
      Both egress and ingress paths work, thanks to commit fda55eca
      (net: introduce skb_transport_header_was_set()) : If GRO built
      a GSO packet, it also set the transport header for us.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Paolo Valente <paolo.valente@unimore.it>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1def9238
  11. 09 1月, 2013 2 次提交
  12. 05 1月, 2013 2 次提交
  13. 04 1月, 2013 2 次提交
  14. 29 12月, 2012 1 次提交
  15. 22 12月, 2012 1 次提交
  16. 12 12月, 2012 1 次提交
  17. 09 12月, 2012 1 次提交
  18. 08 12月, 2012 2 次提交
  19. 05 12月, 2012 1 次提交
    • S
      net: dev_change_net_namespace: send a KOBJ_REMOVED/KOBJ_ADD · 4e66ae2e
      Serge Hallyn 提交于
      When a new nic is created in namespace ns1, the kernel sends a KOBJ_ADD uevent
      to ns1.  When the nic is moved to ns2, we only send a KOBJ_MOVE to ns2, and
      nothing to ns1.
      
      This patch changes that behavior so that when moving a nic from ns1 to ns2, we
      send a KOBJ_REMOVED to ns1 and KOBJ_ADD to ns2.  (The KOBJ_MOVE is still
      sent to ns2).
      
      The effects of this can be seen when starting and stopping containers in
      an upstart based host.  Lxc will create a pair of veth nics, the kernel
      sends KOBJ_ADD, and upstart starts network-instance jobs for each.  When
      one nic is moved to the container, because no KOBJ_REMOVED event is
      received, the network-instance job for that veth never goes away.  This
      was reported at https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1065589
      With this patch the networ-instance jobs properly go away.
      
      The other oddness solved here is that if a nic is passed into a running
      upstart-based container, without this patch no network-instance job is
      started in the container.  But when the container creates a new nic
      itself (ip link add new type veth) then network-interface jobs are
      created.  With this patch, behavior comes in line with a regular host.
      
      v2: also send KOBJ_ADD to new netns.  There will then be a
      _MOVE event from the device_rename() call, but that should
      be innocuous.
      Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NDaniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e66ae2e
  20. 30 11月, 2012 1 次提交
  21. 27 11月, 2012 1 次提交
    • B
      sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name · c91f6df2
      Brian Haley 提交于
      Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
      will then require another call like if_indextoname() to get the actual interface
      name, have it return the name directly.
      
      This also matches the existing man page description on socket(7) which mentions
      the argument being an interface name.
      
      If the value has not been set, zero is returned and optlen will be set to zero
      to indicate there is no interface name present.
      
      Added a seqlock to protect this code path, and dev_ifname(), from someone
      changing the device name via dev_change_name().
      
      v2: Added seqlock protection while copying device name.
      
      v3: Fixed word wrap in patch.
      Signed-off-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c91f6df2
  22. 21 11月, 2012 1 次提交
  23. 19 11月, 2012 1 次提交
    • E
      net: Allow userns root control of the core of the network stack. · 5e1fccc0
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is hardware NIC
      that has been explicity moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow ethtool ioctls.
      
      Allow binding to network devices.
      Allow setting the socket mark.
      Allow setting the socket priority.
      
      Allow setting the network device alias via sysfs.
      Allow setting the mtu via sysfs.
      Allow changing the network device flags via sysfs.
      Allow setting the network device group via sysfs.
      
      Allow the following network device ioctls.
      SIOCGMIIPHY
      SIOCGMIIREG
      SIOCSIFNAME
      SIOCSIFFLAGS
      SIOCSIFMETRIC
      SIOCSIFMTU
      SIOCSIFHWADDR
      SIOCSIFSLAVE
      SIOCADDMULTI
      SIOCDELMULTI
      SIOCSIFHWBROADCAST
      SIOCSMIIREG
      SIOCBONDENSLAVE
      SIOCBONDRELEASE
      SIOCBONDSETHWADDR
      SIOCBONDCHANGEACTIVE
      SIOCBRADDIF
      SIOCBRDELIF
      SIOCSHWTSTAMP
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e1fccc0
  24. 17 11月, 2012 2 次提交