1. 11 1月, 2013 2 次提交
    • A
      net: Split core bits of netdev_pick_tx into __netdev_pick_tx · 416186fb
      Alexander Duyck 提交于
      This change splits the core bits of netdev_pick_tx into a separate function.
      The main idea behind this is to make this code accessible to select queue
      functions when they decide to process the standard path instead of their
      own custom path in their select queue routine.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      416186fb
    • E
      net_sched: more precise pkt_len computation · 1def9238
      Eric Dumazet 提交于
      One long standing problem with TSO/GSO/GRO packets is that skb->len
      doesn't represent a precise amount of bytes on wire.
      
      Headers are only accounted for the first segment.
      For TCP, thats typically 66 bytes per 1448 bytes segment missing,
      an error of 4.5 % for normal MSS value.
      
      As consequences :
      
      1) TBF/CBQ/HTB/NETEM/... can send more bytes than the assigned limits.
      2) Device stats are slightly under estimated as well.
      
      Fix this by taking account of headers in qdisc_skb_cb(skb)->pkt_len
      computation.
      
      Packet schedulers should use qdisc pkt_len instead of skb->len for their
      bandwidth limitations, and TSO enabled devices drivers could use pkt_len
      if their statistics are not hardware assisted, and if they don't scratch
      skb->cb[] first word.
      
      Both egress and ingress paths work, thanks to commit fda55eca
      (net: introduce skb_transport_header_was_set()) : If GRO built
      a GSO packet, it also set the transport header for us.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Paolo Valente <paolo.valente@unimore.it>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1def9238
  2. 09 1月, 2013 2 次提交
  3. 05 1月, 2013 2 次提交
  4. 04 1月, 2013 2 次提交
  5. 29 12月, 2012 1 次提交
  6. 22 12月, 2012 1 次提交
  7. 12 12月, 2012 1 次提交
  8. 09 12月, 2012 1 次提交
  9. 08 12月, 2012 2 次提交
  10. 05 12月, 2012 1 次提交
    • S
      net: dev_change_net_namespace: send a KOBJ_REMOVED/KOBJ_ADD · 4e66ae2e
      Serge Hallyn 提交于
      When a new nic is created in namespace ns1, the kernel sends a KOBJ_ADD uevent
      to ns1.  When the nic is moved to ns2, we only send a KOBJ_MOVE to ns2, and
      nothing to ns1.
      
      This patch changes that behavior so that when moving a nic from ns1 to ns2, we
      send a KOBJ_REMOVED to ns1 and KOBJ_ADD to ns2.  (The KOBJ_MOVE is still
      sent to ns2).
      
      The effects of this can be seen when starting and stopping containers in
      an upstart based host.  Lxc will create a pair of veth nics, the kernel
      sends KOBJ_ADD, and upstart starts network-instance jobs for each.  When
      one nic is moved to the container, because no KOBJ_REMOVED event is
      received, the network-instance job for that veth never goes away.  This
      was reported at https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1065589
      With this patch the networ-instance jobs properly go away.
      
      The other oddness solved here is that if a nic is passed into a running
      upstart-based container, without this patch no network-instance job is
      started in the container.  But when the container creates a new nic
      itself (ip link add new type veth) then network-interface jobs are
      created.  With this patch, behavior comes in line with a regular host.
      
      v2: also send KOBJ_ADD to new netns.  There will then be a
      _MOVE event from the device_rename() call, but that should
      be innocuous.
      Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NDaniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e66ae2e
  11. 30 11月, 2012 1 次提交
  12. 27 11月, 2012 1 次提交
    • B
      sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name · c91f6df2
      Brian Haley 提交于
      Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
      will then require another call like if_indextoname() to get the actual interface
      name, have it return the name directly.
      
      This also matches the existing man page description on socket(7) which mentions
      the argument being an interface name.
      
      If the value has not been set, zero is returned and optlen will be set to zero
      to indicate there is no interface name present.
      
      Added a seqlock to protect this code path, and dev_ifname(), from someone
      changing the device name via dev_change_name().
      
      v2: Added seqlock protection while copying device name.
      
      v3: Fixed word wrap in patch.
      Signed-off-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c91f6df2
  13. 21 11月, 2012 1 次提交
  14. 19 11月, 2012 1 次提交
    • E
      net: Allow userns root control of the core of the network stack. · 5e1fccc0
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is hardware NIC
      that has been explicity moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow ethtool ioctls.
      
      Allow binding to network devices.
      Allow setting the socket mark.
      Allow setting the socket priority.
      
      Allow setting the network device alias via sysfs.
      Allow setting the mtu via sysfs.
      Allow changing the network device flags via sysfs.
      Allow setting the network device group via sysfs.
      
      Allow the following network device ioctls.
      SIOCGMIIPHY
      SIOCGMIIREG
      SIOCSIFNAME
      SIOCSIFFLAGS
      SIOCSIFMETRIC
      SIOCSIFMTU
      SIOCSIFHWADDR
      SIOCSIFSLAVE
      SIOCADDMULTI
      SIOCDELMULTI
      SIOCSIFHWBROADCAST
      SIOCSMIIREG
      SIOCBONDENSLAVE
      SIOCBONDRELEASE
      SIOCBONDSETHWADDR
      SIOCBONDCHANGEACTIVE
      SIOCBRADDIF
      SIOCBRDELIF
      SIOCSHWTSTAMP
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e1fccc0
  15. 17 11月, 2012 2 次提交
  16. 16 11月, 2012 3 次提交
  17. 08 11月, 2012 1 次提交
    • E
      af-packet: fix oops when socket is not present · a3d744e9
      Eric Leblond 提交于
      Due to a NULL dereference, the following patch is causing oops
      in normal trafic condition:
      
      commit c0de08d0
      Author: Eric Leblond <eric@regit.org>
      Date:   Thu Aug 16 22:02:58 2012 +0000
      
          af_packet: don't emit packet on orig fanout group
      
      This buggy patch was a feature fix and has reached most stable
      branches.
      
      When skb->sk is NULL and when packet fanout is used, there is a
      crash in match_fanout_group where skb->sk is accessed.
      This patch fixes the issue by returning false as soon as the
      socket is NULL: this correspond to the wanted behavior because
      the kernel as to resend the skb to all the listening socket in
      this case.
      Signed-off-by: NEric Leblond <eric@regit.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3d744e9
  18. 22 10月, 2012 1 次提交
  19. 09 10月, 2012 2 次提交
    • F
      vlan: don't deliver frames for unknown vlans to protocols · 48cc32d3
      Florian Zumbiehl 提交于
      6a32e4f9 made the vlan code skip marking
      vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if
      there was an rx_handler, as the rx_handler could cause the frame to be received
      on a different (virtual) vlan-capable interface where that vlan might be
      configured.
      
      As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause
      frames for unknown vlans to be delivered to the protocol stack as if they had
      been received untagged.
      
      For example, if an ipv6 router advertisement that's tagged for a locally not
      configured vlan is received on an interface with macvlan interfaces attached,
      macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the
      macvlan interfaces, which caused it to be passed to the protocol stack, leading
      to ipv6 addresses for the announced prefix being configured even though those
      are completely unusable on the underlying interface.
      
      The fix moves marking as PACKET_OTHERHOST after the rx_handler so the
      rx_handler, if there is one, sees the frame unchanged, but afterwards,
      before the frame is delivered to the protocol stack, it gets marked whether
      there is an rx_handler or not.
      Signed-off-by: NFlorian Zumbiehl <florz@florz.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48cc32d3
    • E
      net: gro: selective flush of packets · 2e71a6f8
      Eric Dumazet 提交于
      Current GRO can hold packets in gro_list for almost unlimited
      time, in case napi->poll() handler consumes its budget over and over.
      
      In this case, napi_complete()/napi_gro_flush() are not called.
      
      Another problem is that gro_list is flushed in non friendly way :
      We scan the list and complete packets in the reverse order.
      (youngest packets first, oldest packets last)
      This defeats priorities that sender could have cooked.
      
      Since GRO currently only store TCP packets, we dont really notice the
      bug because of retransmits, but this behavior can add unexpected
      latencies, particularly on mice flows clamped by elephant flows.
      
      This patch makes sure no packet can stay more than 1 ms in queue, and
      only in stress situations.
      
      It also complete packets in the right order to minimize latencies.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e71a6f8
  20. 08 10月, 2012 1 次提交
  21. 02 10月, 2012 1 次提交
    • E
      net: add gro_cells infrastructure · c9e6bc64
      Eric Dumazet 提交于
      This adds a new include file (include/net/gro_cells.h), to bring GRO
      (Generic Receive Offload) capability to tunnels, in a modular way.
      
      Because tunnels receive path is lockless, and GRO adds a serialization
      using a napi_struct, I chose to add an array of up to
      DEFAULT_MAX_NUM_RSS_QUEUES cells, so that multi queue devices wont be
      slowed down because of GRO layer.
      
      skb_get_rx_queue() is used as selector.
      
      In the future, we might add optional fanout capabilities, using rxhash
      for example.
      
      With help from Ben Hutchings who reminded me
      netif_get_num_default_rss_queues() function.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9e6bc64
  22. 21 9月, 2012 1 次提交
    • E
      net: do not disable sg for packets requiring no checksum · c0d680e5
      Ed Cashin 提交于
      A change in a series of VLAN-related changes appears to have
      inadvertently disabled the use of the scatter gather feature of
      network cards for transmission of non-IP ethernet protocols like ATA
      over Ethernet (AoE).  Below is a reference to the commit that
      introduces a "harmonize_features" function that turns off scatter
      gather when the NIC does not support hardware checksumming for the
      ethernet protocol of an sk buff.
      
        commit f01a5236
        Author: Jesse Gross <jesse@nicira.com>
        Date:   Sun Jan 9 06:23:31 2011 +0000
      
            net offloading: Generalize netif_get_vlan_features().
      
      The can_checksum_protocol function is not equipped to consider a
      protocol that does not require checksumming.  Calling it for a
      protocol that requires no checksum is inappropriate.
      
      The patch below has harmonize_features call can_checksum_protocol when
      the protocol needs a checksum, so that the network layer is not forced
      to perform unnecessary skb linearization on the transmission of AoE
      packets.  Unnecessary linearization results in decreased performance
      and increased memory pressure, as reported here:
      
        http://www.spinics.net/lists/linux-mm/msg15184.html
      
      The problem has probably not been widely experienced yet, because
      only recently has the kernel.org-distributed aoe driver acquired the
      ability to use payloads of over a page in size, with the patchset
      recently included in the mm tree:
      
        https://lkml.org/lkml/2012/8/28/140
      
      The coraid.com-distributed aoe driver already could use payloads of
      greater than a page in size, but its users generally do not use the
      newest kernels.
      Signed-off-by: NEd Cashin <ecashin@coraid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0d680e5
  23. 20 9月, 2012 4 次提交
  24. 19 9月, 2012 1 次提交
  25. 18 9月, 2012 1 次提交
    • E
      userns: Convert the audit loginuid to be a kuid · e1760bd5
      Eric W. Biederman 提交于
      Always store audit loginuids in type kuid_t.
      
      Print loginuids by converting them into uids in the appropriate user
      namespace, and then printing the resulting uid.
      
      Modify audit_get_loginuid to return a kuid_t.
      
      Modify audit_set_loginuid to take a kuid_t.
      
      Modify /proc/<pid>/loginuid on read to convert the loginuid into the
      user namespace of the opener of the file.
      
      Modify /proc/<pid>/loginud on write to convert the loginuid
      rom the user namespace of the opener of the file.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Paul Moore <paul@paul-moore.com> ?
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      e1760bd5
  26. 17 9月, 2012 3 次提交