1. 29 10月, 2009 4 次提交
  2. 28 10月, 2009 3 次提交
  3. 27 10月, 2009 1 次提交
    • E
      vlan: allow null VLAN ID to be used · 05423b24
      Eric Dumazet 提交于
      We currently use a 16 bit field (vlan_tci) to store VLAN ID/PRIO on a skb.
      
      Null value is used as a special value, meaning vlan tagging not enabled.
      This forbids use of null vlan ID.
      
      As pointed by David, some drivers use the 3 high order bits (PRIO)
      
      As VLAN ID is 12 bits, we can use the remaining bit (CFI) as a flag, and
      allow null VLAN ID.
      
      In case future code really wants to use VLAN_CFI_MASK, we'll have to use
      a bit outside of vlan_tci.
      
      #define VLAN_PRIO_MASK         0xe000 /* Priority Code Point */
      #define VLAN_PRIO_SHIFT        13
      #define VLAN_CFI_MASK          0x1000 /* Canonical Format Indicator */
      #define VLAN_TAG_PRESENT       VLAN_CFI_MASK
      #define VLAN_VID_MASK          0x0fff /* VLAN Identifier */
      Reported-by: NGertjan Hofman <gertjan_hofman@yahoo.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05423b24
  4. 23 10月, 2009 1 次提交
  5. 20 10月, 2009 3 次提交
  6. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  7. 14 10月, 2009 1 次提交
  8. 13 10月, 2009 4 次提交
    • W
      can: make the number of echo skb's configurable · a6e4bc53
      Wolfgang Grandegger 提交于
      This patch allows the CAN controller driver to define the number of echo
      skb's used for the local loopback (echo), as suggested by Kurt Van
      Dijck, with the function:
      
        struct net_device *alloc_candev(int sizeof_priv,
                                        unsigned int echo_skb_max);
      
      The CAN drivers have been adapted accordingly. For the ems_usb driver,
      as suggested by Sebastian Haas, the number of echo skb's has been
      increased to 10, which improves the transmission performance a lot.
      Signed-off-by: NWolfgang Grandegger <wg@grandegger.com>
      Signed-off-by: NKurt Van Dijck <kurt.van.dijck@eia.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6e4bc53
    • E
      net: Add netdev_alloc_skb_ip_align() helper · 61321bbd
      Eric Dumazet 提交于
      Instead of hardcoding NET_IP_ALIGN stuff in various network drivers,
      we can add a helper around netdev_alloc_skb()
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61321bbd
    • A
      net: Introduce recvmmsg socket syscall · a2e27255
      Arnaldo Carvalho de Melo 提交于
      Meaning receive multiple messages, reducing the number of syscalls and
      net stack entry/exit operations.
      
      Next patches will introduce mechanisms where protocols that want to
      optimize this operation will provide an unlocked_recvmsg operation.
      
      This takes into account comments made by:
      
      . Paul Moore: sock_recvmsg is called only for the first datagram,
        sock_recvmsg_nosec is used for the rest.
      
      . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
        works in the same fashion as the ppoll one.
      
        If the underlying protocol returns a datagram with MSG_OOB set, this
        will make recvmmsg return right away with as many datagrams (+ the OOB
        one) it has received so far.
      
      . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
        datagrams and then recvmsg returns an error, recvmmsg will return
        the successfully received datagrams, store the error and return it
        in the next call.
      
      This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
      where we will be able to acquire the lock only at batch start and end, not at
      every underlying recvmsg call.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2e27255
    • N
      net: Generalize socket rx gap / receive queue overflow cmsg · 3b885787
      Neil Horman 提交于
      Create a new socket level option to report number of queue overflows
      
      Recently I augmented the AF_PACKET protocol to report the number of frames lost
      on the socket receive queue between any two enqueued frames.  This value was
      exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
      requested that this feature be generalized so that any datagram oriented socket
      could make use of this option.  As such I've created this patch, It creates a
      new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
      SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
      overflowed between any two given frames.  It also augments the AF_PACKET
      protocol to take advantage of this new feature (as it previously did not touch
      sk->sk_drops, which this patch uses to record the overflow count).  Tested
      successfully by me.
      
      Notes:
      
      1) Unlike my previous patch, this patch simply records the sk_drops value, which
      is not a number of drops between packets, but rather a total number of drops.
      Deltas must be computed in user space.
      
      2) While this patch currently works with datagram oriented protocols, it will
      also be accepted by non-datagram oriented protocols. I'm not sure if thats
      agreeable to everyone, but my argument in favor of doing so is that, for those
      protocols which aren't applicable to this option, sk_drops will always be zero,
      and reporting no drops on a receive queue that isn't used for those
      non-participating protocols seems reasonable to me.  This also saves us having
      to code in a per-protocol opt in mechanism.
      
      3) This applies cleanly to net-next assuming that commit
      97775007 (my af packet cmsg patch) is reverted
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b885787
  9. 12 10月, 2009 1 次提交
  10. 09 10月, 2009 1 次提交
    • J
      ipv6: Fix the size overflow of addrconf_sysctl array · e0e6f55d
      Jin Dongming 提交于
      (This patch fixes bug of commit f7734fdf
       title "make TLLAO option for NA packets configurable")
      
      When the IPV6 conf is used, the function sysctl_set_parent is called and the
      array addrconf_sysctl is used as a parameter of the function.
      
      The above patch added new conf "force_tllao" into the array addrconf_sysctl,
      but the size of the array was not modified, the static allocated size is
      DEVCONF_MAX + 1 but the real size is DEVCONF_MAX + 2, so the problem is
      that the function sysctl_set_parent accessed wrong address.
      
      I got the following information.
      Call Trace:
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff810622d5>] __register_sysctl_paths+0xde/0x272
          [<ffffffff8110892d>] ? __kmalloc_track_caller+0x16e/0x180
          [<ffffffffa00cfac3>] ? __addrconf_sysctl_register+0xc5/0x144 [ipv6]
          [<ffffffff8141f2c9>] register_net_sysctl_table+0x48/0x4b
          [<ffffffffa00cfaf5>] __addrconf_sysctl_register+0xf7/0x144 [ipv6]
          [<ffffffffa00cfc16>] addrconf_init_net+0xd4/0x104 [ipv6]
          [<ffffffff8139195f>] setup_net+0x35/0x82
          [<ffffffff81391f6c>] copy_net_ns+0x76/0xe0
          [<ffffffff8107ad60>] create_new_namespaces+0xf0/0x16e
          [<ffffffff8107afee>] copy_namespaces+0x65/0x9f
          [<ffffffff81056dff>] copy_process+0xb2c/0x12c3
          [<ffffffff810576e1>] do_fork+0x14b/0x2d2
          [<ffffffff8107ac4e>] ? up_read+0xe/0x10
          [<ffffffff81438e73>] ? do_page_fault+0x27a/0x2aa
          [<ffffffff8101044b>] sys_clone+0x28/0x2a
          [<ffffffff81011fb3>] stub_clone+0x13/0x20
          [<ffffffff81011c72>] ? system_call_fastpath+0x16/0x1b
      
      And the information of IPV6 in .config is as following.
      IPV6 in .config:
          CONFIG_IPV6=m
          CONFIG_IPV6_PRIVACY=y
          CONFIG_IPV6_ROUTER_PREF=y
          CONFIG_IPV6_ROUTE_INFO=y
          CONFIG_IPV6_OPTIMISTIC_DAD=y
          CONFIG_IPV6_MIP6=m
          CONFIG_IPV6_SIT=m
          # CONFIG_IPV6_SIT_6RD is not set
          CONFIG_IPV6_NDISC_NODETYPE=y
          CONFIG_IPV6_TUNNEL=m
          CONFIG_IPV6_MULTIPLE_TABLES=y
          CONFIG_IPV6_SUBTREES=y
          CONFIG_IPV6_MROUTE=y
          CONFIG_IPV6_PIMSM_V2=y
          # CONFIG_IP_VS_IPV6 is not set
          CONFIG_NF_CONNTRACK_IPV6=m
          CONFIG_IP6_NF_MATCH_IPV6HEADER=m
      
      I confirmed this patch fixes this problem.
      Signed-off-by: NJin Dongming <jin.dongming@np.css.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0e6f55d
  11. 08 10月, 2009 5 次提交
  12. 07 10月, 2009 5 次提交
    • S
      dcb: data center bridging ops should be r/o · 32953543
      Stephen Hemminger 提交于
      The data center bridging ops structure can be const
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Acked-by: NPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32953543
    • O
      make TLLAO option for NA packets configurable · f7734fdf
      Octavian Purdila 提交于
      On Friday 02 October 2009 20:53:51 you wrote:
      
      > This is good although I would have shortened the name.
      
      Ah, I knew I forgot something :) Here is v4.
      
      tavi
      
      >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
      From: Octavian Purdila <opurdila@ixiacom.com>
      Date: Fri, 2 Oct 2009 00:51:15 +0300
      Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs
      
      Neighbor advertisements responding to unicast neighbor solicitations
      did not include the target link-layer address option. This patch adds
      a new sysctl option (disabled by default) which controls whether this
      option should be sent even with unicast NAs.
      
      The need for this arose because certain routers expect the TLLAO in
      some situations even as a response to unicast NS packets.
      
      Moreover, RFC 2461 recommends sending this to avoid a race condition
      (section 4.4, Target link-layer address)
      Signed-off-by: NCosmin Ratiu <cratiu@ixiacom.com>
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7734fdf
    • B
      ethtool: Add reset operation · d73d3a8c
      Ben Hutchings 提交于
      After updating firmware stored in flash, users may wish to reset the
      relevant hardware and start the new firmware immediately.  This should
      not be completely automatic as it may be disruptive.
      
      A selective reset may also be useful for debugging or diagnostics.
      
      This adds a separate reset operation which takes flags indicating the
      components to be reset.  Drivers are allowed to reset only a subset of
      those requested, and must indicate the actual subset.  This allows the
      use of generic component masks and some future expansion.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d73d3a8c
    • Y
      ipv6 sit: 6rd (IPv6 Rapid Deployment) Support. · fa857afc
      YOSHIFUJI Hideaki / 吉藤英明 提交于
      IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
      mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
      deploy IPv6 unicast service to IPv4 sites to which it provides
      customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
      IPv4 encapsulation in order to transit IPv4-only network
      infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
      prefix of its own in place of the fixed 6to4 prefix.
      
      With this option enabled, the SIT driver offers 6rd functionality by
      providing additional ioctl API to configure the IPv6 Prefix for in
      stead of static 2002::/16 for 6to4.
      
      Original patch was done by Alexandre Cassen <acassen@freebox.fr>
      based on old Internet-Draft.
      Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa857afc
    • I
      add vif using local interface index instead of IP · ee5e81f0
      Ilia K 提交于
      When routing daemon wants to enable forwarding of multicast traffic it
      performs something like:
      
             struct vifctl vc = {
                     .vifc_vifi  = 1,
                     .vifc_flags = 0,
                     .vifc_threshold = 1,
                     .vifc_rate_limit = 0,
                     .vifc_lcl_addr = ip, /* <--- ip address of physical
      interface, e.g. eth0 */
                     .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
               };
             setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
      
      This leads (in the kernel) to calling  vif_add() function call which
      search the (physical) device using assigned IP address:
             dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);
      
      The current API (struct vifctl) does not allow to specify an
      interface other way than using it's IP, and if there are more than a
      single interface with specified IP only the first one will be found.
      
      The attached patch (against 2.6.30.4) allows to specify an interface
      by its index, instead of IP address:
      
             struct vifctl vc = {
                     .vifc_vifi  = 1,
                     .vifc_flags = VIFF_USE_IFINDEX,   /* NEW */
                     .vifc_threshold = 1,
                     .vifc_rate_limit = 0,
                     .vifc_lcl_ifindex = if_nametoindex("eth0"),   /* NEW */
                     .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
               };
             setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
      Signed-off-by: NIlia K. <mail4ilia@gmail.com>
      
      === modified file 'include/linux/mroute.h'
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee5e81f0
  13. 05 10月, 2009 7 次提交
    • J
      net: introduce NETDEV_POST_INIT notifier · 7ffbe3fd
      Johannes Berg 提交于
      For various purposes including a wireless extensions
      bugfix, we need to hook into the netdev creation before
      before netdev_register_kobject(). This will also ease
      doing the dev type assignment that Marcel was working
      on for cfg80211 drivers w/o touching them all.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ffbe3fd
    • M
      usbnet: Use wwan%d interface name for mobile broadband devices · e1e499ee
      Marcel Holtmann 提交于
      Add support for usbnet based devices like CDC-Ether to indicate that they
      are actually mobile broadband devices. In that case use wwan%d as default
      interface name.
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1e499ee
    • B
      net: Support inclusion of <linux/socket.h> before <sys/socket.h> · 9c501935
      Ben Hutchings 提交于
      The following user-space program fails to compile:
      
          #include <linux/socket.h>
          #include <sys/socket.h>
          int main() { return 0; }
      
      The reason is that <linux/socket.h> tests __GLIBC__ to decide whether it
      should define various structures and macros that are now defined for
      user-space by <sys/socket.h>, but __GLIBC__ is not defined if no libc
      headers have yet been included.
      
      It seems safe to drop support for libc 5 now.
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: NBastian Blank <waldi@debian.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c501935
    • N
      af_packet: add interframe drop cmsg (v6) · 97775007
      Neil Horman 提交于
      Add Ancilliary data to better represent loss information
      
      I've had a few requests recently to provide more detail regarding frame loss
      during an AF_PACKET packet capture session.  Specifically the requestors want to
      see where in a packet sequence frames were lost, i.e. they want to see that 40
      frames were lost between frames 302 and 303 in a packet capture file.  In order
      to do this we need:
      
      1) The kernel to export this data to user space
      2) The applications to make use of it
      
      This patch addresses item (1).  It does this by doing the following:
      
      A) Anytime we drop a frame for which we would increment po->stats.tp_drops, we
      also no increment a stats called po->stats.tp_gap.
      
      B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
      value of po->stats.tp_gap in skb->mark.  skb->cb would nominally be the place to
      record this, but since all the space there is used up, we're overloading
      skb->mark.  Its safe to do since any enqueued packet is guaranteed to be
      unshared at this point, and skb->mark isn't used for anything else in the rx
      path to the application.  After we record tp_gap in the skb, we zero
      po->stats.tp_gap.  This allows us to keep a counter of the number of frames lost
      between any two enqueued packets
      
      C) When the application goes to dequeue a frame from the packet socket, we look
      at skb->mark for that frame.  If it is non-zero, we add a cmsg chunk to the
      msghdr of level SOL_PACKET and type PACKET_GAPDATA.  Its a 32 bit integer that
      represents the number of frames lost between this packet and the last previous
      frame received.
      
      Note there is a chance that if there is frame loss after a receive, and then the
      socket is closed, some gap data might be lost.  This is covered by the use of
      the PACKET_AUXDATA socket option, which gives total loss data.  With a bit of
      math, the final gap can be determined that way.
      
      I've tested this patch myself, and it works well.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      
       include/linux/if_packet.h |    2 ++
       net/packet/af_packet.c    |   33 +++++++++++++++++++++++++++++++++
       2 files changed, 35 insertions(+)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97775007
    • B
      ethtool: Remove support for obsolete string query operations · a9828ec6
      Ben Hutchings 提交于
      The in-tree implementations have all been converted to
      get_sset_count().
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9828ec6
    • A
      a99bbaf5
    • J
      Revert "Seperate read and write statistics of in_flight requests" · 0f78ab98
      Jens Axboe 提交于
      This reverts commit a9327cac.
      
      Corrado Zoccolo <czoccolo@gmail.com> reports:
      
      "with 2.6.32-rc1 I started getting the following strange output from
      "iostat -kx 2":
      Linux 2.6.31bisect (et2) 	04/10/2009 	_i686_	(2 CPU)
      
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                10,70    0,00    3,16   15,75    0,00   70,38
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
      avgrq-sz avgqu-sz   await  svctm  %util
      sda              18,22     0,00    0,67    0,01    14,77     0,02
      43,94     0,01   10,53 39043915,03 2629219,87
      sdb              60,89     9,68   50,79    3,04  1724,43    50,52
      65,95     0,70   13,06 488437,47 2629219,87
      
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 2,72    0,00    0,74    0,00    0,00   96,53
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
      avgrq-sz avgqu-sz   await  svctm  %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00
      0,00     0,00    0,00   0,00 100,00
      sdb               0,00     0,00    0,00    0,00     0,00     0,00
      0,00     0,00    0,00   0,00 100,00
      
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 6,68    0,00    0,99    0,00    0,00   92,33
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
      avgrq-sz avgqu-sz   await  svctm  %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00
      0,00     0,00    0,00   0,00 100,00
      sdb               0,00     0,00    0,00    0,00     0,00     0,00
      0,00     0,00    0,00   0,00 100,00
      
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 4,40    0,00    0,73    1,47    0,00   93,40
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
      avgrq-sz avgqu-sz   await  svctm  %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00
      0,00     0,00    0,00   0,00 100,00
      sdb               0,00     4,00    0,00    3,00     0,00    28,00
      18,67     0,06   19,50 333,33 100,00
      
      Global values for service time and utilization are garbage. For
      interval values, utilization is always 100%, and service time is
      higher than normal.
      
      I bisected it down to:
      [a9327cac] Seperate read and write
      statistics of in_flight requests
      and verified that reverting just that commit indeed solves the issue
      on 2.6.32-rc1."
      
      So until this is debugged, revert the bad commit.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      0f78ab98
  14. 04 10月, 2009 1 次提交
  15. 03 10月, 2009 2 次提交