1. 13 Oct, 2009 3 commits
    • C
      c3faca05
    • A
      net: Introduce recvmmsg socket syscall · a2e27255
      Arnaldo Carvalho de Melo committed
      Meaning receive multiple messages, reducing the number of syscalls and
      net stack entry/exit operations.
      
      Next patches will introduce mechanisms where protocols that want to
      optimize this operation will provide an unlocked_recvmsg operation.
      
      This takes into account comments made by:
      
      . Paul Moore: sock_recvmsg is called only for the first datagram,
        sock_recvmsg_nosec is used for the rest.
      
      . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
        works in the same fashion as the ppoll one.
      
        If the underlying protocol returns a datagram with MSG_OOB set, this
        will make recvmmsg return right away with as many datagrams (plus the
        OOB one) as it has received so far.
      
      . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
        datagrams and then recvmsg returns an error, recvmmsg will return
        the successfully received datagrams, store the error and return it
        in the next call.
      
      This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
      where we will be able to acquire the lock only at batch start and end, not at
      every underlying recvmsg call.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a2e27255
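
The batching described in this commit can be sketched from user space as follows. This is a minimal illustration, assuming a C library that exposes a `recvmmsg()` wrapper and `struct mmsghdr`; `struct batch`, `recv_batch`, `MAX_BATCH` and `MAX_DGRAM` are hypothetical names chosen for the sketch and are not part of the patch.

```c
#define _GNU_SOURCE            /* must precede all includes: recvmmsg() is a GNU extension */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define MAX_BATCH 8            /* hypothetical batch size for this sketch */
#define MAX_DGRAM 256          /* hypothetical per-datagram buffer size */

/* One receive slot per datagram: header, scatter/gather entry, payload. */
struct batch {
    struct mmsghdr msgs[MAX_BATCH];
    struct iovec   iov[MAX_BATCH];
    char           buf[MAX_BATCH][MAX_DGRAM];
};

/*
 * Receive up to vlen datagrams with a single syscall.
 * Returns the number of datagrams received, or -1 with errno set.
 * On return, msgs[i].msg_len holds the length of datagram i.
 */
int recv_batch(int fd, struct batch *b, unsigned int vlen,
               int flags, struct timespec *timeout)
{
    assert(vlen <= MAX_BATCH);

    memset(b->msgs, 0, sizeof(b->msgs));
    for (unsigned int i = 0; i < vlen; i++) {
        b->iov[i].iov_base = b->buf[i];
        b->iov[i].iov_len = MAX_DGRAM;
        b->msgs[i].msg_hdr.msg_iov = &b->iov[i];
        b->msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return recvmmsg(fd, b->msgs, vlen, flags, timeout);
}
```

A caller would typically pass a non-NULL timeout to bound the wait, or `MSG_DONTWAIT` to drain only what is already queued.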
    • N
      net: Generalize socket rx gap / receive queue overflow cmsg · 3b885787
      Neil Horman committed
      Create a new socket level option to report number of queue overflows
      
      Recently I augmented the AF_PACKET protocol to report the number of frames lost
      on the socket receive queue between any two enqueued frames.  This value was
      exported via a SOL_PACKET level cmsg.  After I completed that work it was
      requested that this feature be generalized so that any datagram oriented socket
      could make use of this option.  As such I've created this patch. It creates a
      new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
      SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
      overflowed between any two given frames.  It also augments the AF_PACKET
      protocol to take advantage of this new feature (as it previously did not touch
      sk->sk_drops, which this patch uses to record the overflow count).  Tested
      successfully by me.
      
      Notes:
      
      1) Unlike my previous patch, this patch simply records the sk_drops value, which
      is not a number of drops between packets, but rather a total number of drops.
      Deltas must be computed in user space.
      
      2) While this patch currently works with datagram oriented protocols, it will
      also be accepted by non-datagram oriented protocols. I'm not sure if that's
      agreeable to everyone, but my argument in favor of doing so is that, for those
      protocols which aren't applicable to this option, sk_drops will always be zero,
      and reporting no drops on a receive queue that isn't used for those
      non-participating protocols seems reasonable to me.  This also saves us having
      to code in a per-protocol opt in mechanism.
      
      3) This applies cleanly to net-next assuming that commit
      97775007 (my af packet cmsg patch) is reverted.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b885787
  2. 12 Oct, 2009 4 commits
  3. 09 Oct, 2009 1 commit
  4. 08 Oct, 2009 22 commits
  5. 07 Oct, 2009 10 commits
    • S
      ipv4: arp_notify address list bug · a21090cf
      Stephen Hemminger committed
      This fixes a bug with arp_notify.
      
      If arp_notify is enabled, the kernel will crash if the address is
      changed and no IP address is assigned.
        http://bugzilla.kernel.org/show_bug.cgi?id=14330

      Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a21090cf
    • S
      net: mark net_proto_ops as const · ec1b4cf7
      Stephen Hemminger committed
      All usages of structure net_proto_ops should be declared const.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec1b4cf7
    • O
      make TLLAO option for NA packets configurable · f7734fdf
      Octavian Purdila committed
      On Friday 02 October 2009 20:53:51 you wrote:
      
      > This is good although I would have shortened the name.
      
      Ah, I knew I forgot something :) Here is v4.
      
      tavi
      
      >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
      From: Octavian Purdila <opurdila@ixiacom.com>
      Date: Fri, 2 Oct 2009 00:51:15 +0300
      Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs
      
      Neighbor advertisements responding to unicast neighbor solicitations
      did not include the target link-layer address option. This patch adds
      a new sysctl option (disabled by default) which controls whether this
      option should be sent even with unicast NAs.
      
      The need for this arose because certain routers expect the TLLAO in
      some situations even as a response to unicast NS packets.
      
      Moreover, RFC 2461 recommends sending this to avoid a race condition
      (section 4.4, Target link-layer address)
      Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7734fdf
    • B
      Use sk_mark for IPv6 routing lookups · 51953d5b
      Brian Haley committed
      Atis Elsts wrote:
      > Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.
      
      Ok, I'll just drop that part, I'm not sure what should be done in this case.
      
      > Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?
      
      I just sent that patch out too quickly, here's a better one with the updates.
      
      Add support for IPv6 route lookups using sk_mark.
      Signed-off-by: Brian Haley <brian.haley@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51953d5b
    • B
      ethtool: Add reset operation · d73d3a8c
      Ben Hutchings committed
      After updating firmware stored in flash, users may wish to reset the
      relevant hardware and start the new firmware immediately.  This should
      not be completely automatic as it may be disruptive.
      
      A selective reset may also be useful for debugging or diagnostics.
      
      This adds a separate reset operation which takes flags indicating the
      components to be reset.  Drivers are allowed to reset only a subset of
      those requested, and must indicate the actual subset.  This allows the
      use of generic component masks and some future expansion.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d73d3a8c
    • E
      pkt_sched: gen_estimator: Dont report fake rate estimators · d250a5f9
      Eric Dumazet committed
      Jarek Poplawski wrote:
      >
      >
      > Hmm... So you made me to do some "real" work here, and guess what?:
      > there is one serious checkpatch warning! ;-) Plus, this new parameter
      > should be added to the function description. Otherwise:
      > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      >
      > Thanks,
      > Jarek P.
      >
      > PS: I guess full "Don't" would show we really mean it...
      
      Okay :) Here is the last round, before the night !
      
      Thanks again
      
      [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators
      
      We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
      is running.
      
      # tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
       rate 0bit 0pps backlog 0b 0p requeues 0
      
      User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
      one (because no estimator is active)
      
      After this patch, tc command output is :
      $ tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      
      We add a parameter to gnet_stats_copy_rate_est() function so that
      it can use gen_estimator_active(bstats, r), as suggested by Jarek.
      
      This parameter can be NULL if the check is not necessary (htb, for
      example, has a mandatory rate estimator).
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d250a5f9
    • E
      Use sk_mark for routing lookup in more places · 2d37a186
      Eric Dumazet committed
      Here is a followup on this area, thanks.
      
      [RFC] af_packet: fill skb->mark at xmit
      
      skb->mark may be used by classifiers, so fill it in case user
      set a SO_MARK option on socket.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d37a186
    • Y
      ipv6 sit: 6rd (IPv6 Rapid Deployment) Support. · fa857afc
      YOSHIFUJI Hideaki / 吉藤英明 committed
      IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
      mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
      deploy IPv6 unicast service to IPv4 sites to which it provides
      customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
      IPv4 encapsulation in order to transit IPv4-only network
      infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
      prefix of its own in place of the fixed 6to4 prefix.
      
      With this option enabled, the SIT driver offers 6rd functionality by
      providing an additional ioctl API to configure the IPv6 prefix, instead
      of the static 2002::/16 used for 6to4.
      
      Original patch was done by Alexandre Cassen <acassen@freebox.fr>
      based on old Internet-Draft.
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa857afc
    • I
      add vif using local interface index instead of IP · ee5e81f0
      Ilia K committed
      When a routing daemon wants to enable forwarding of multicast traffic, it
      performs something like:
      
             struct vifctl vc = {
                     .vifc_vifi  = 1,
                     .vifc_flags = 0,
                     .vifc_threshold = 1,
                     .vifc_rate_limit = 0,
                     /* IP address of the physical interface, e.g. eth0 */
                     .vifc_lcl_addr = ip,
                     .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
             };
             setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
      
      In the kernel, this leads to a call to vif_add(), which searches for the
      (physical) device using the assigned IP address:
             dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);
      
      The current API (struct vifctl) does not allow specifying an
      interface other than by its IP address, and if more than one
      interface has the specified IP, only the first one will be found.

      The attached patch (against 2.6.30.4) allows specifying an interface
      by its index, instead of its IP address:
      
             struct vifctl vc = {
                     .vifc_vifi  = 1,
                     .vifc_flags = VIFF_USE_IFINDEX,   /* NEW */
                     .vifc_threshold = 1,
                     .vifc_rate_limit = 0,
                     .vifc_lcl_ifindex = if_nametoindex("eth0"),   /* NEW */
                     .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
             };
             setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
      Signed-off-by: Ilia K. <mail4ilia@gmail.com>
      
      === modified file 'include/linux/mroute.h'
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee5e81f0
    • E
      net: speedup sk_wake_async() · bcdce719
      Eric Dumazet committed
      An incoming datagram must bring a *lot* of cache lines into the CPU cache,
      in particular (other parts omitted: hash chains, ip route cache...):
      
      On 32bit arches :
      
      offsetof(struct sock, sk_rcvbuf)        = 0x30   (read)
      offsetof(struct sock, sk_lock)          = 0x34   (rw)

      offsetof(struct sock, sk_sleep)         = 0x50   (read)
      offsetof(struct sock, sk_rmem_alloc)    = 0x64   (rw)
      offsetof(struct sock, sk_receive_queue) = 0x74   (rw)

      offsetof(struct sock, sk_forward_alloc) = 0x98   (rw)

      offsetof(struct sock, sk_callback_lock) = 0xcc   (rw)
      offsetof(struct sock, sk_drops)         = 0xd8   (read if we add dropcount support, rw if frame dropped)
      offsetof(struct sock, sk_filter)        = 0xf8   (read)

      offsetof(struct sock, sk_socket)        = 0x138  (read)

      offsetof(struct sock, sk_data_ready)    = 0x15c  (read)
      
      
      We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
      with no fasync() structures. (socket->fasync_list ptr is probably already in cache
      because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)
      
      This avoids one cache line load per incoming packet for common cases (no fasync()).

      We can leave (or even move in a future patch) sk->sk_socket in a cold location.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bcdce719