1. 08 10月, 2009 7 次提交
  2. 07 10月, 2009 9 次提交
    • S
      net: mark net_proto_ops as const · ec1b4cf7
      Stephen Hemminger 提交于
      All usages of structure net_proto_ops should be declared const.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec1b4cf7
    • O
      make TLLAO option for NA packets configurable · f7734fdf
      Octavian Purdila 提交于
      On Friday 02 October 2009 20:53:51 you wrote:
      
      > This is good although I would have shortened the name.
      
      Ah, I knew I forgot something :) Here is v4.
      
      tavi
      
      >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
      From: Octavian Purdila <opurdila@ixiacom.com>
      Date: Fri, 2 Oct 2009 00:51:15 +0300
      Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs
      
      Neighbor advertisements responding to unicast neighbor solicitations
      did not include the target link-layer address option. This patch adds
      a new sysctl option (disabled by default) which controls whether this
      option should be sent even with unicast NAs.
      
      The need for this arose because certain routers expect the TLLAO in
      some situations even as a response to unicast NS packets.
      
      Moreover, RFC 2461 recommends sending this to avoid a race condition
      (section 4.4, Target link-layer address)
      Signed-off-by: NCosmin Ratiu <cratiu@ixiacom.com>
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7734fdf
    • B
      Use sk_mark for IPv6 routing lookups · 51953d5b
      Brian Haley 提交于
      Atis Elsts wrote:
      > Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.
      
      Ok, I'll just drop that part, I'm not sure what should be done in this case.
      
      > Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?
      
      I just sent that patch out too quickly, here's a better one with the updates.
      
      Add support for IPv6 route lookups using sk_mark.
      Signed-off-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51953d5b
    • B
      ethtool: Add reset operation · d73d3a8c
      Ben Hutchings 提交于
      After updating firmware stored in flash, users may wish to reset the
      relevant hardware and start the new firmware immediately.  This should
      not be completely automatic as it may be disruptive.
      
      A selective reset may also be useful for debugging or diagnostics.
      
      This adds a separate reset operation which takes flags indicating the
      components to be reset.  Drivers are allowed to reset only a subset of
      those requested, and must indicate the actual subset.  This allows the
      use of generic component masks and some future expansion.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d73d3a8c
    • E
      pkt_sched: gen_estimator: Dont report fake rate estimators · d250a5f9
      Eric Dumazet 提交于
      Jarek Poplawski a écrit :
      >
      >
      > Hmm... So you made me to do some "real" work here, and guess what?:
      > there is one serious checkpatch warning! ;-) Plus, this new parameter
      > should be added to the function description. Otherwise:
      > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      >
      > Thanks,
      > Jarek P.
      >
      > PS: I guess full "Don't" would show we really mean it...
      
      Okay :) Here is the last round, before the night !
      
      Thanks again
      
      [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators
      
      We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
      is running.
      
      # tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
       rate 0bit 0pps backlog 0b 0p requeues 0
      
      User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
      one (because no estimator is active)
      
      After this patch, tc command output is :
      $ tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      
      We add a parameter to gnet_stats_copy_rate_est() function so that
      it can use gen_estimator_active(bstats, r), as suggested by Jarek.
      
      This parameter can be NULL if check is not necessary, (htb for
      example has a mandatory rate estimator)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d250a5f9
    • E
      Use sk_mark for routing lookup in more places · 2d37a186
      Eric Dumazet 提交于
      Here is a followup on this area, thanks.
      
      [RFC] af_packet: fill skb->mark at xmit
      
      skb->mark may be used by classifiers, so fill it in case user
      set a SO_MARK option on socket.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d37a186
    • Y
      ipv6 sit: 6rd (IPv6 Rapid Deployment) Support. · fa857afc
      YOSHIFUJI Hideaki / 吉藤英明 提交于
      IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
      mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
      deploy IPv6 unicast service to IPv4 sites to which it provides
      customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
      IPv4 encapsulation in order to transit IPv4-only network
      infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
      prefix of its own in place of the fixed 6to4 prefix.
      
      With this option enabled, the SIT driver offers 6rd functionality by
      providing additional ioctl API to configure the IPv6 Prefix for in
      stead of static 2002::/16 for 6to4.
      
      Original patch was done by Alexandre Cassen <acassen@freebox.fr>
      based on old Internet-Draft.
      Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa857afc
    • I
      add vif using local interface index instead of IP · ee5e81f0
      Ilia K 提交于
      When routing daemon wants to enable forwarding of multicast traffic it
      performs something like:
      
             struct vifctl vc = {
                     .vifc_vifi  = 1,
                     .vifc_flags = 0,
                     .vifc_threshold = 1,
                     .vifc_rate_limit = 0,
                     .vifc_lcl_addr = ip, /* <--- ip address of physical
      interface, e.g. eth0 */
                     .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
               };
             setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
      
      This leads (in the kernel) to calling  vif_add() function call which
      search the (physical) device using assigned IP address:
             dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);
      
      The current API (struct vifctl) does not allow to specify an
      interface other way than using it's IP, and if there are more than a
      single interface with specified IP only the first one will be found.
      
      The attached patch (against 2.6.30.4) allows to specify an interface
      by its index, instead of IP address:
      
             struct vifctl vc = {
                     .vifc_vifi  = 1,
                     .vifc_flags = VIFF_USE_IFINDEX,   /* NEW */
                     .vifc_threshold = 1,
                     .vifc_rate_limit = 0,
                     .vifc_lcl_ifindex = if_nametoindex("eth0"),   /* NEW */
                     .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
               };
             setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
      Signed-off-by: NIlia K. <mail4ilia@gmail.com>
      
      === modified file 'include/linux/mroute.h'
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee5e81f0
    • E
      net: speedup sk_wake_async() · bcdce719
      Eric Dumazet 提交于
      An incoming datagram must bring into cpu cache *lot* of cache lines,
      in particular : (other parts omitted (hash chains, ip route cache...))
      
      On 32bit arches :
      
      offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
      offsetof(struct sock, sk_lock)         =0x34   (rw)
      
      offsetof(struct sock, sk_sleep)        =0x50 (read)
      offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
      offsetof(struct sock, sk_receive_queue)=0x74   (rw)
      
      offsetof(struct sock, sk_forward_alloc)=0x98   (rw)
      
      offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
      offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
      offsetof(struct sock, sk_filter)       =0xf8    (read)
      
      offsetof(struct sock, sk_socket)       =0x138 (read)
      
      offsetof(struct sock, sk_data_ready)   =0x15c   (read)
      
      
      We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
      with no fasync() structures. (socket->fasync_list ptr is probably already in cache
      because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)
      
      This avoids one cache line load per incoming packet for common cases (no fasync())
      
      We can leave (or even move in a future patch) sk->sk_socket in a cold location
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcdce719
  3. 05 10月, 2009 13 次提交
    • J
      wext: let get_wireless_stats() sleep · a160ee69
      Johannes Berg 提交于
      A number of drivers (recently including cfg80211-based ones)
      assume that all wireless handlers, including statistics, can
      sleep and they often also implicitly assume that the rtnl is
      held around their invocation. This is almost always true now
      except when reading from sysfs:
      
        BUG: sleeping function called from invalid context at kernel/mutex.c:280
        in_atomic(): 1, irqs_disabled(): 0, pid: 10450, name: head
        2 locks held by head/10450:
         #0:  (&buffer->mutex){+.+.+.}, at: [<c10ceb99>] sysfs_read_file+0x24/0xf4
         #1:  (dev_base_lock){++.?..}, at: [<c12844ee>] wireless_show+0x1a/0x4c
        Pid: 10450, comm: head Not tainted 2.6.32-rc3 #1
        Call Trace:
         [<c102301c>] __might_sleep+0xf0/0xf7
         [<c1324355>] mutex_lock_nested+0x1a/0x33
         [<f8cea53b>] wdev_lock+0xd/0xf [cfg80211]
         [<f8cea58f>] cfg80211_wireless_stats+0x45/0x12d [cfg80211]
         [<c13118d6>] get_wireless_stats+0x16/0x1c
         [<c12844fe>] wireless_show+0x2a/0x4c
      
      Fix this by using the rtnl instead of dev_base_lock.
      Reported-by: NMiles Lane <miles.lane@gmail.com>
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a160ee69
    • A
      net: export device speed and duplex via sysfs · d519e17e
      Andy Gospodarek 提交于
      This patch exports the link-speed (in Mbps) and duplex of an interface
      via sysfs.  This eliminates the need to use ethtool just to check the
      link-speed.  Not requiring 'ethtool' and not relying on the SIOCETHTOOL
      ioctl should be helpful in an embedded environment where space is at a
      premium as well.
      
      NOTE: This patch also intentionally allows non-root users to check the link
      speed and duplex -- something not possible with ethtool.
      
      Here's some sample output:
      
      # cat /sys/class/net/eth0/speed
      100
      # cat /sys/class/net/eth0/duplex
      half
      # ethtool eth0
      Settings for eth0:
              Supported ports: [ TP ]
              Supported link modes:   10baseT/Half 10baseT/Full
                                      100baseT/Half 100baseT/Full
                                      1000baseT/Half 1000baseT/Full
              Supports auto-negotiation: Yes
              Advertised link modes:  Not reported
              Advertised auto-negotiation: No
              Speed: 100Mb/s
              Duplex: Half
              Port: Twisted Pair
              PHYAD: 1
              Transceiver: internal
              Auto-negotiation: off
              Supports Wake-on: g
              Wake-on: g
              Current message level: 0x000000ff (255)
              Link detected: yes
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d519e17e
    • M
      cfg80211: assign device type in netdev notifier callback · 053a93dd
      Marcel Holtmann 提交于
      Instead of having to modify every non-mac80211 for device type assignment,
      do this inside the netdev notifier callback of cfg80211. So all drivers
      that integrate with cfg80211 will export a proper device type.
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      053a93dd
    • J
      net: introduce NETDEV_POST_INIT notifier · 7ffbe3fd
      Johannes Berg 提交于
      For various purposes including a wireless extensions
      bugfix, we need to hook into the netdev creation before
      before netdev_register_kobject(). This will also ease
      doing the dev type assignment that Marcel was working
      on for cfg80211 drivers w/o touching them all.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ffbe3fd
    • E
      tunnels: Optimize tx path · 0bfbedb1
      Eric Dumazet 提交于
      We currently dirty a cache line to update tunnel device stats
      (tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets
      counters that already are present in cpu cache, in the cache
      line shared with txq->_xmit_lock
      
      This patch extends IPTUNNEL_XMIT() macro to use txq pointer
      provided by the caller.
      
      Also &tunnel->dev->stats can be replaced by &dev->stats
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bfbedb1
    • S
      ipv4: fib table algorithm performance improvement · 16c6cf8b
      Stephen Hemminger 提交于
      The FIB algorithim for IPV4 is set at compile time, but kernel goes through
      the overhead of function call indirection at runtime. Save some
      cycles by turning the indirect calls to direct calls to either
      hash or trie code.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16c6cf8b
    • N
      af_packet: add interframe drop cmsg (v6) · 97775007
      Neil Horman 提交于
      Add Ancilliary data to better represent loss information
      
      I've had a few requests recently to provide more detail regarding frame loss
      during an AF_PACKET packet capture session.  Specifically the requestors want to
      see where in a packet sequence frames were lost, i.e. they want to see that 40
      frames were lost between frames 302 and 303 in a packet capture file.  In order
      to do this we need:
      
      1) The kernel to export this data to user space
      2) The applications to make use of it
      
      This patch addresses item (1).  It does this by doing the following:
      
      A) Anytime we drop a frame for which we would increment po->stats.tp_drops, we
      also no increment a stats called po->stats.tp_gap.
      
      B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
      value of po->stats.tp_gap in skb->mark.  skb->cb would nominally be the place to
      record this, but since all the space there is used up, we're overloading
      skb->mark.  Its safe to do since any enqueued packet is guaranteed to be
      unshared at this point, and skb->mark isn't used for anything else in the rx
      path to the application.  After we record tp_gap in the skb, we zero
      po->stats.tp_gap.  This allows us to keep a counter of the number of frames lost
      between any two enqueued packets
      
      C) When the application goes to dequeue a frame from the packet socket, we look
      at skb->mark for that frame.  If it is non-zero, we add a cmsg chunk to the
      msghdr of level SOL_PACKET and type PACKET_GAPDATA.  Its a 32 bit integer that
      represents the number of frames lost between this packet and the last previous
      frame received.
      
      Note there is a chance that if there is frame loss after a receive, and then the
      socket is closed, some gap data might be lost.  This is covered by the use of
      the PACKET_AUXDATA socket option, which gives total loss data.  With a bit of
      math, the final gap can be determined that way.
      
      I've tested this patch myself, and it works well.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      
       include/linux/if_packet.h |    2 ++
       net/packet/af_packet.c    |   33 +++++++++++++++++++++++++++++++++
       2 files changed, 35 insertions(+)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97775007
    • E
      pktgen: Avoid dirtying skb->users when txq is full · 0835acfe
      Eric Dumazet 提交于
      We can avoid two atomic ops on skb->users if packet is not going to be
      sent to the device (because hardware txqueue is full)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0835acfe
    • E
      icmp: No need to call sk_write_space() · b3a5b6cc
      Eric Dumazet 提交于
      We can make icmp messages tx completion callback a litle bit faster.
      
      Setting SOCK_USE_WRITE_QUEUE sk flag tells sock_wfree() to
      not call sk_write_space() on a socket we know no thread is posssibly
      waiting for write space. (on per cpu kernel internal icmp sockets only)
      
      This avoids the sock_def_write_space() call and
      read_lock(&sk->sk_callback_lock)/read_unlock(&sk->sk_callback_lock) calls
      as well.
      
      We avoid three atomic ops.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3a5b6cc
    • B
      ethtool: Remove support for obsolete string query operations · a9828ec6
      Ben Hutchings 提交于
      The in-tree implementations have all been converted to
      get_sset_count().
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9828ec6
    • E
      pktgen: restore nanosec delays · 9240d715
      Eric Dumazet 提交于
      Commit fd29cf72 (pktgen: convert to use ktime_t)
      inadvertantly converted "delay" parameter from nanosec to microsec.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9240d715
    • E
      pktgen: Fix multiqueue handling · 896a7cf8
      Eric Dumazet 提交于
      It is not currently possible to instruct pktgen to use one selected tx queue.
      
      When Robert added multiqueue support in commit 45b270f8, he added
      an interval (queue_map_min, queue_map_max), and his code doesnt take
      into account the case of min = max, to select one tx queue exactly.
      
      I suspect a high performance setup on a eight txqueue device wants
      to use exactly eight cpus, and assign one tx queue to each sender.
      
      This patchs makes pktgen select the right tx queue, not the first one.
      
      Also updates Documentation to reflect Robert changes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      896a7cf8
    • A
      a99bbaf5
  4. 03 10月, 2009 1 次提交
  5. 02 10月, 2009 4 次提交
    • A
      net: Use sk_mark for routing lookup in more places · 914a9ab3
      Atis Elsts 提交于
      This patch against v2.6.31 adds support for route lookup using sk_mark in some 
      more places. The benefits from this patch are the following.
      First, SO_MARK option now has effect on UDP sockets too.
      Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing 
      lookup correctly if TCP sockets with SO_MARK were used.
      Signed-off-by: NAtis Elsts <atis@mikrotik.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      914a9ab3
    • O
      IPv4 TCP fails to send window scale option when window scale is zero · 89e95a61
      Ori Finkelman 提交于
      Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
      and SYN headers even if our window scale is zero.
      
      This fixes the following observed behavior:
      
      1. Client sends a SYN with TCP window scaling option and non zero window scale
      value to a Linux box.
      2. Linux box notes large receive window from client.
      3. Linux decides on a zero value of window scale for its part.
      4. Due to compare against requested window scale size option, Linux does not to
       send windows scale TCP option header on SYN/ACK at all.
      
      With the following result:
      
      Client box thinks TCP window scaling is not supported, since SYN/ACK had no
      TCP window scale option, while Linux thinks that TCP window scaling is
      supported (and scale might be non zero), since SYN had  TCP window scale
      option and we have a mismatched idea between the client and server
      regarding window sizes.
      
      Probably it also fixes up the following bug (not observed in practice):
      
      1. Linux box opens TCP connection to some server.
      2. Linux decides on zero value of window scale.
      3. Due to compare against computed window scale size option, Linux does
      not to set windows scale TCP  option header on SYN.
      
      With the expected result that the server OS does not use window scale option
      due to not receiving such an option in the SYN headers, leading to suboptimal
      performance.
      Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: NOri Finkelman <ori@comsleep.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89e95a61
    • A
      net/ipv4/tcp.c: fix min() type mismatch warning · 4fdb78d3
      Andrew Morton 提交于
      net/ipv4/tcp.c: In function 'do_tcp_setsockopt':
      net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fdb78d3
    • E
      pktgen: Fix delay handling · 417bc4b8
      Eric Dumazet 提交于
      After last pktgen changes, delay handling is wrong.
      
      pktgen actually sends packets at full line speed.
      
      Fix is to update pkt_dev->next_tx even if spin() returns early,
      so that next spin() calls have a chance to see a positive delay.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      417bc4b8
  6. 01 10月, 2009 6 次提交