1. 20 Mar, 2009 2 commits
    • E
      net: reorder struct Qdisc for better SMP performance · 5e140dfc
      Eric Dumazet committed
      dev_queue_xmit() needs to dirty the fields "state", "q", "bstats" and "qstats".
      
      On x86_64, they currently span three cache lines, causing more cache line
      ping-pongs than necessary and lengthening the time the queue spinlock is held.
      
      We can reduce this to one cache line by grouping all read-mostly fields
      at the beginning of the structure. (Or should I say, all highly modified fields
      at the end :) )
      
      Before patch :
      
      offsetof(struct Qdisc, state)=0x38
      offsetof(struct Qdisc, q)=0x48
      offsetof(struct Qdisc, bstats)=0x80
      offsetof(struct Qdisc, qstats)=0x90
      sizeof(struct Qdisc)=0xc8
      
      After patch :
      
      offsetof(struct Qdisc, state)=0x80
      offsetof(struct Qdisc, q)=0x88
      offsetof(struct Qdisc, bstats)=0xa0
      offsetof(struct Qdisc, qstats)=0xac
      sizeof(struct Qdisc)=0xc0
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
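The offsets quoted in the Qdisc commit can be checked with a little arithmetic. The following is a minimal sketch, assuming 64-byte cache lines; `count_lines_touched` and the two offset tables are hypothetical names introduced here, and only the starting offset of each field is considered.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines, typical for x86_64 */

/* Count the distinct cache lines that contain the given field start offsets. */
static unsigned count_lines_touched(const size_t *offs, size_t n)
{
	unsigned count = 0;
	size_t i, j;

	for (i = 0; i < n; i++) {
		int seen = 0;

		/* Has an earlier field already landed on this cache line? */
		for (j = 0; j < i; j++)
			if (offs[j] / CACHE_LINE == offs[i] / CACHE_LINE)
				seen = 1;
		if (!seen)
			count++;
	}
	return count;
}

/* Offsets of state, q, bstats and qstats as listed in the commit message. */
static const size_t qdisc_before[] = { 0x38, 0x48, 0x80, 0x90 };
static const size_t qdisc_after[]  = { 0x80, 0x88, 0xa0, 0xac };
```

Calling `count_lines_touched(qdisc_before, 4)` and `count_lines_touched(qdisc_after, 4)` reproduces the "three cache lines" versus "one cache line" figures from the message.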
    • S
      rtnetlink: add new value for DHCP added routes · 2e1ab634
      Stephen Hemminger committed
      To improve manageability, it would be good to be able to disambiguate routes
      added by the administrator from those added by a DHCP client.  The only necessary
      kernel change is to add a value to the rtnetlink include file so the iproute2 utility
      can use it.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 17 Mar, 2009 4 commits
  3. 16 Mar, 2009 5 commits
    • P
      netfilter: conntrack: don't deliver events for racy packets · b1e93a68
      Pablo Neira Ayuso committed
      This patch skips the delivery of conntrack events if the packet
      was dropped due to a race condition in the conntrack insertion.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • I
      tcp: cache result of earlier divides when mss-aligning things · 2a3a041c
      Ilpo Järvinen committed
      The result is very unlikely to change often, so we
      hardly need to divide again after doing that once for a
      connection. Yet, if a divide still becomes necessary, we
      detect that, do the right thing, and settle again into the
      non-divide state. Takes the u16 space which was previously
      taken by the plain xmit_size_goal.
      
      This should take care of part of the tso vs non-tso difference
      we found earlier.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
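The idea of caching an earlier divide can be illustrated in user space. This is a minimal sketch under stated assumptions, not the kernel code: `struct goal_cache`, `mss_align` and the `divides` counter are hypothetical names, and the size goal is assumed to fit in 65535 segments so the quotient fits a 16-bit field.

```c
#include <assert.h>

struct goal_cache {
	unsigned short segs; /* cached quotient size_goal / mss from the last divide */
	unsigned divides;    /* number of real divides performed (illustration only) */
};

/*
 * Round size_goal down to a multiple of mss. The divide is repeated only
 * when the cached quotient no longer matches the current inputs.
 */
static unsigned mss_align(struct goal_cache *c, unsigned size_goal, unsigned mss)
{
	unsigned cached = c->segs * mss;

	assert(mss > 0);

	/* cached <= size_goal < cached + mss means segs == size_goal / mss. */
	if (cached <= size_goal && cached + mss > size_goal)
		return cached;

	/* Stale cache: divide once and remember the quotient. */
	c->segs = (unsigned short)(size_goal / mss);
	c->divides++;
	return c->segs * mss;
}
```

With a zero-initialized cache, repeated calls using the same `size_goal` and `mss` divide only on the first call; a changed `mss` is detected by the range check and triggers one new divide.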
    • I
      tcp: simplify tcp_current_mss · 0c54b85f
      Ilpo Järvinen committed
      There's very little need for most of the callsites to get
      tp->xmit_goal_size updated. That will cost us a divide as is,
      so slice the function in two. Also, the only users of
      tp->xmit_goal_size are directly behind tcp_current_mss(),
      so there's no need to store that variable in tcp_sock
      at all! Dropping xmit_goal_size currently leaves a 16-bit
      hole, and some reorganization would again be necessary to
      change that (but I'm aiming to fill that hole with a u16
      xmit_goal_size_segs to cache the results of the remaining
      divide to get that tso on regression).
      
      Bring xmit_goal_size parts into tcp.c
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      tcp: consolidate paws check · c887e6d2
      Ilpo Järvinen committed
      Wow, it was quite tricky to merge that stream of negations
      but I think I finally got it right:
      
      check & replace_ts_recent:
      (s32)(rcv_tsval - ts_recent) >= 0                  => 0
      (s32)(ts_recent - rcv_tsval) <= 0                  => 0
      
      discard:
      (s32)(ts_recent - rcv_tsval)  > TCP_PAWS_WINDOW    => 1
      (s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW    => 0
      
      I toggled the return values of tcp_paws_check around since
      the old encoding added yet-another negation making tracking
      of truth-values really complicated.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
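The consolidated comparison from the PAWS commit can be sketched as a single signed-difference test. This is a simplified illustration, assuming the toggled encoding where 1 means "passes"; `paws_check` and `PAWS_WINDOW` are hypothetical stand-ins here, and any additional conditions the real kernel function applies are left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAWS_WINDOW 1 /* assumption: stand-in for TCP_PAWS_WINDOW in this sketch */

/*
 * Returns 1 if the timestamp check passes and 0 if the segment should be
 * discarded. The unsigned subtraction wraps modulo 2^32; reinterpreting the
 * result as signed (two's complement assumed) gives the distance between
 * the two timestamps even across a wrap.
 */
static int paws_check(uint32_t ts_recent, uint32_t rcv_tsval, int32_t paws_win)
{
	return (int32_t)(ts_recent - rcv_tsval) <= paws_win;
}
```

Passing 0 as `paws_win` corresponds to the "check & replace_ts_recent" rows of the table, and passing `PAWS_WINDOW` corresponds to the "discard" rows with the return value inverted.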
    • E
      net: reorder fields of struct socket · 8bdd663a
      Eric Dumazet committed
      On x86_64, it's rather unfortunate that the "wait_queue_head_t wait"
      field of "struct socket" spans two cache lines (assuming a 64-byte
      cache line in current CPUs)
      
      offsetof(struct socket, wait)=0x30
      sizeof(wait_queue_head_t)=0x18
      
      This might explain why Kenny Chang noticed that his multicast workload
      was performing badly with 64-bit kernels, since more cache line ping-pongs
      were involved.
      
      This little patch moves the "wait" field next to "fasync_list" so that both
      fields share a single cache line, to speed up sock_def_readable()
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
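Whether a single field straddles a cache line boundary follows from its offset and size. The sketch below assumes 64-byte cache lines, as the commit message does; `straddles_cache_line` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64u /* assumption stated in the commit message */

/* Returns 1 if a field's first and last bytes fall on different cache lines. */
static int straddles_cache_line(size_t offset, size_t size)
{
	assert(size > 0);
	return offset / CACHE_LINE != (offset + size - 1) / CACHE_LINE;
}
```

For the values in the message, `straddles_cache_line(0x30, 0x18)` is true: the field occupies bytes 0x30 through 0x47, crossing the boundary at 0x40. The offset of "wait" after the patch is not given in the message, so it is not checked here.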
  4. 14 Mar, 2009 6 commits
    • G
      ppp: ppp_mp_explode() redesign · 9c705260
      Gabriele Paoloni committed
      I found the PPP subsystem to not work properly when connecting channels
      with different speeds to the same bundle.
      
      Problem Description:
      
      As the "ppp_mp_explode" function fragments the sk_buff buffer evenly
      among the PPP channels that are connected to a certain PPP unit to
      make up a bundle, if we are transmitting using an upper layer protocol
      that requires an Ack before sending the next packet (like TCP/IP for
      example), we will have a bandwidth bottleneck on the slowest channel
      of the bundle.
      
      Let's clarify with an example. Consider a scenario where we have
      two PPP links making up a bundle: a slow link (10KB/sec) and a fast
      link (1000KB/sec), both working at full bandwidth. On top we
      have a TCP/IP stack sending a 1000-byte sk_buff buffer down to the
      PPP subsystem. The "ppp_mp_explode" function will divide the buffer into
      two fragments of 500B each (we are neglecting all the headers, crc,
      flags, etc.). Before the TCP/IP stack sends out the next buffer, it
      will have to wait for the ACK response from the remote peer, so it
      will have to wait for both fragments to have been sent over the two
      PPP links, received by the remote peer and reconstructed. The
      resulting behaviour is that, rather than having a bundle working
      @1010KB/sec (the sum of the channels' bandwidths), we'll have a bundle
      working @20KB/sec (double the slowest channel's bandwidth).
      
      
      Problem Solution:
      
      The problem has been solved by redesigning the "ppp_mp_explode"
      function so that it splits the sk_buff buffer according
      to the speeds of the underlying PPP channels (the speeds of the serial
      interfaces respectively attached to the PPP channels). Referring to
      the above example, the redesigned "ppp_mp_explode" function will now
      divide the 1000-byte buffer into two fragments whose sizes are set
      according to the speeds of the channels on which they are going to be
      sent (e.g. 10 bytes on the 10KB/sec channel and 990 bytes on the
      1000KB/sec channel).  The reworked function delivers the same
      performance as the original one in optimal working conditions (i.e. a
      bundle made up of PPP links all working at the same speed), while
      greatly improving performance on bundles made up of channels
      working at different speeds.
      Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
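Splitting a buffer in proportion to channel speed can be sketched as follows. This is an illustration of the idea only, not the redesigned ppp_mp_explode(): `split_by_speed` is a hypothetical function, headers and per-fragment overhead are ignored, and giving the rounding remainder to the fastest channel is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split `total` bytes among n channels in proportion to speed[].
 * Integer division rounds each share down; the leftover bytes are
 * given to the fastest channel so the fragments always sum to total.
 */
static void split_by_speed(unsigned total, const unsigned *speed, size_t n,
			   unsigned *frag)
{
	unsigned long long sum = 0;
	unsigned assigned = 0;
	size_t i, fastest = 0;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		sum += speed[i];
		if (speed[i] > speed[fastest])
			fastest = i;
	}
	assert(sum > 0);

	for (i = 0; i < n; i++) {
		frag[i] = (unsigned)((unsigned long long)total * speed[i] / sum);
		assigned += frag[i];
	}
	frag[fastest] += total - assigned;
}
```

For the example in the message (1000 bytes over 10KB/sec and 1000KB/sec links) this sketch yields 9 and 991 bytes, close to the rounded 10/990 figures quoted above; with equal speeds it falls back to an even 500/500 split.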
    • M
      phylib: convert state_queue work to delayed_work · a390d1f3
      Marcin Slusarz committed
      It closes a race in phy_stop_machine when reprogramming of phy_timer
      (from phy_state_machine) happens between del_timer_sync and cancel_work_sync.
      
      Without this change it could lead to a crash if phy_device were freed after
      phy_stop_machine (the timer would fire and schedule freed work).
      Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
      Acked-by: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      Network Drop Monitor: Adding Build changes to enable drop monitor · 273ae44b
      Neil Horman committed
      Network Drop Monitor: Adding Build changes to enable drop monitor
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
       include/linux/Kbuild |    1 +
       net/Kconfig          |   11 +++++++++++
       net/core/Makefile    |    1 +
       3 files changed, 13 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      Network Drop Monitor: Adding drop monitor implementation & Netlink protocol · 9a8afc8d
      Neil Horman committed
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
       include/linux/net_dropmon.h |   56 +++++++++
       net/core/drop_monitor.c     |  263 ++++++++++++++++++++++++++++++++++++++++++++
       2 files changed, 319 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying... · ead2ceb0
      Neil Horman committed
      Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
       include/linux/skbuff.h |    4 +++-
       net/core/datagram.c    |    2 +-
       net/core/skbuff.c      |   22 ++++++++++++++++++++++
       net/ipv4/arp.c         |    2 +-
       net/ipv4/udp.c         |    2 +-
       net/packet/af_packet.c |    2 +-
       6 files changed, 29 insertions(+), 5 deletions(-)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      Network Drop Monitor: Add trace declaration for skb frees · 4893d39e
      Neil Horman committed
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
       include/trace/skb.h   |    8 ++++++++
       net/core/Makefile     |    2 ++
       net/core/net-traces.c |   29 +++++++++++++++++++++++++++++
       3 files changed, 39 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 06 Mar, 2009 1 commit
  6. 05 Mar, 2009 1 commit
    • D
      vlan: Fix vlan-in-vlan crashes. · 9d40bbda
      David S. Miller committed
      As analyzed by Patrick McHardy, vlan needs to reset its
      netdev_ops pointer in its ->init() function but this
      leaves the compat method pointers stale.
      
      Add a netdev_resync_ops() and call it from the vlan code.
      
      Any other driver which changes ->netdev_ops after register_netdevice()
      will need to call this new function after doing so too.
      
      With help from Patrick McHardy.
      Tested-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 04 Mar, 2009 1 commit
  8. 03 Mar, 2009 3 commits
    • E
      netns: Remove net_alive · 17edde52
      Eric W. Biederman committed
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves,
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace than that provided
      by net_alive in netif_receive_skb.  So remove net_alive, allowing
      packet reception to run a little faster.
      
      Additionally, document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      sctp: Fix broken RTO-doubling for data retransmits · 7e99013a
      Vlad Yasevich committed
      Commit faee47cd
      (sctp: Fix the RTO-doubling on idle-link heartbeats)
      broke the RTO doubling for data retransmits.  If the
      heartbeat was sent before the data T3-rtx time, the
      RTO will not double upon the T3-rtx expiration.
      Distinguish between the operations by passing an argument
      to the function.
      
      Additionally, Wei Youngjun pointed out that our treatment
      of requested HEARTBEATS and timer HEARTBEATS is the same
      wrt resetting the congestion window.  That needs to be separated,
      since user-requested HEARTBEATS should not treat the link
      as idle.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      tcp: tcp_init_wl / tcp_update_wl argument cleanup · ee7537b6
      Hantzis Fotis committed
      The above functions from include/net/tcp.h have been defined with an
      argument that they never use. The argument is 'u32 ack', which is never
      used inside the function body and thus can be removed. The rest of
      the patch makes the necessary changes to the callers of the
      above two functions.
      Signed-off-by: Hantzis Fotis <xantzis@ceid.upatras.gr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 02 Mar, 2009 8 commits
    • R
      skbuff.h: fix timestamps kernel-doc · d3a21be8
      Randy Dunlap committed
      Fix skbuff.h kernel-doc for timestamps: must include "struct" keyword,
      otherwise there are kernel-doc errors:
      
      Error(linux-next-20090227//include/linux/skbuff.h:161): cannot understand prototype: 'struct skb_shared_hwtstamps '
      Error(linux-next-20090227//include/linux/skbuff.h:177): cannot understand prototype: 'union skb_shared_tx '
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      wimax/i2400m: implement RX reorder support · c747583d
      Inaky Perez-Gonzalez committed
      Allow the device to give the driver RX data with reorder information.
      
      When that is done, the device will indicate to the driver whether a packet has
      to be held in a (sorted) queue. It will also tell the driver when held
      packets have to be released to the OS.
      
      This is done to improve the WiMAX-protocol level retransmission
      support when missing frames are detected.
      
      The code docs provide details about the implementation.
      
      In general, this just hooks into the RX path in rx.c; if a packet with
      the reorder bit in the RX header is detected, the reorder information
      in the header is extracted and one of the four main reorder operations
      is executed. In one case (queue) no packet will be delivered to the
      networking stack, just queued, whereas in the others (reset, update_ws
      and queue_update_ws), queued packets might be delivered depending on
      the window start for the specific queue.
      
      The modifications to files other than rx.c are:
      
      - control.c: during device initialization, enable reordering support
        if the rx_reorder_disabled module parameter is not enabled
      
      - driver.c: expose a rx_reorder_disable module parameter and call
        i2400m_rx_setup/release() to initialize/shutdown RX reorder
        support.
      
      - i2400m.h: introduce members in 'struct i2400m' needed for
        implementing reorder support.
      
      - linux/i2400m.h: introduce TLVs, commands and constant definitions
        related to RX reorder
      
      Last but not least, the rx reorder code includes a small circular log
      where the last N reorder operations are recorded to be displayed in
      case of inconsistency. Otherwise diagnosing issues would be almost
      impossible.
      Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
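The four reorder operations named in the RX reorder commit (queue, reset, update_ws, queue_update_ws) can be illustrated with a toy sorted queue of sequence numbers. Everything in this sketch is hypothetical: the `roq_*` names, the fixed capacity, the explicit `new_ws` argument, and the decision to ignore sequence-number wraparound are assumptions made for illustration and do not describe the driver's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define ROQ_MAX 16 /* hypothetical capacity of the hold queue */

struct roq {
	unsigned ws;            /* window start: held packets below it are due */
	unsigned held[ROQ_MAX]; /* held sequence numbers, sorted ascending */
	size_t len;
};

/* queue: hold sn in sorted position; nothing is delivered. -1 if full. */
static int roq_queue(struct roq *q, unsigned sn)
{
	size_t i;

	if (q->len == ROQ_MAX)
		return -1;
	for (i = q->len; i > 0 && q->held[i - 1] > sn; i--)
		q->held[i] = q->held[i - 1];
	q->held[i] = sn;
	q->len++;
	return 0;
}

/*
 * update_ws: move the window start and release every held sn below it
 * into out[] (capacity ROQ_MAX), in order. Returns the number released.
 */
static size_t roq_update_ws(struct roq *q, unsigned new_ws, unsigned *out)
{
	size_t n = 0, i;

	while (n < q->len && q->held[n] < new_ws) {
		out[n] = q->held[n];
		n++;
	}
	for (i = n; i < q->len; i++)
		q->held[i - n] = q->held[i];
	q->len -= n;
	q->ws = new_ws;
	return n;
}

/* queue_update_ws: hold sn, then advance the window start. */
static size_t roq_queue_update_ws(struct roq *q, unsigned sn, unsigned new_ws,
				  unsigned *out)
{
	(void)roq_queue(q, sn); /* a full queue drops sn in this sketch */
	return roq_update_ws(q, new_ws, out);
}

/* reset: release everything held and restart the window at 0. */
static size_t roq_reset(struct roq *q, unsigned *out)
{
	size_t n = q->len, i;

	for (i = 0; i < n; i++)
		out[i] = q->held[i];
	q->len = 0;
	q->ws = 0;
	return n;
}
```

In this model only `roq_queue` never delivers anything; the other three operations may hand packets back to the caller depending on where the window start ends up, which mirrors the distinction drawn in the commit message.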
    • I
      wimax/i2400m: support extended data RX protocol (no need to reallocate skbs) · fd5c565c
      Inaky Perez-Gonzalez committed
      Newer i2400m firmwares (>= v1.4) extend the data RX protocol so that
      each packet has a 16 byte header. This header is mainly used to
      implement host reordering (which is addressed in later commits).
      
      However, this header also allows us to overwrite it (once data has
      been extracted) with an Ethernet header and deliver to the networking
      stack without having to reallocate the skb (as happened in fw <=
      v1.3) to make room for it.
      
      - control.c: indicate to the device [dev_initialize()] that the driver
        wants to use the extended data RX protocol. Also involves adding the
        definition of the needed data types in include/linux/wimax/i2400m.h.
      
      - rx.c: handle the new payload type for the extended RX data
        protocol. Prepares the skb for delivery to
        netdev.c:i2400m_net_erx().
      
      - netdev.c: Introduce i2400m_net_erx() that adds the fake ethernet
        address to a prepared skb and delivers it to the networking
        stack.
      
      - cleanup: in most instances in rx.c, the variable 'single' was
        renamed to 'single_last' as it better conveys its meaning.
      Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • K
      wimax: struct device - replace bus_id with dev_name(), dev_set_name() · 347707ba
      Kay Sievers committed
      Cc: inaky.perez-gonzalez@intel.com
      Cc: linux-wimax@intel.com
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      wimax/i2400m: allow control of the base-station idle mode timeout · 8987691a
      Inaky Perez-Gonzalez committed
      For power saving reasons, WiMAX links can be put in idle mode while
      connected after a certain time of the link not being used for tx or
      rx. In this mode, the device pages the base-station regularly and when
      data is ready to be transmitted, the link is revived.
      
      This patch allows the user to control, from a sysfs interface, the time
      the device has to be idle before it decides to go to idle mode.
      
      It also updates the initialization code to acknowledge the module
      variable 'idle_mode_disabled' when the firmware is a newer version
      (upcoming 1.4 vs 2.6.29's v1.3).
      
      The method for setting the idle mode timeout in the older firmwares is
      much more limited and can only be used at initialization time. Thus,
      the sysfs file will return -ENOSYS on older ones.
      Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      tcp: kill eff_sacks "cache", the sole user can calculate itself · cabeccbd
      Ilpo Järvinen committed
      Also fixes an insignificant bug that would cause sending of a stale
      SACK block (would occur in some corner cases).
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      tcp: add helper for AI algorithm · 758ce5c8
      Ilpo Järvinen committed
      It seems that the implementation in yeah was inconsistent with what
      the others did, as it would increase cwnd one ack earlier than the
      others do.
      
      Size benefits:
      
        bictcp_cong_avoid |  -36
        tcp_cong_avoid_ai |  +52
        bictcp_cong_avoid |  -34
        tcp_scalable_cong_avoid |  -36
        tcp_veno_cong_avoid |  -12
        tcp_yeah_cong_avoid |  -38
      
      = -104 bytes total
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
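A shared additive-increase (AI) helper keeps the ACK counting identical for every congestion control module that calls it. The following user-space sketch shows one plausible shape for such a helper; `struct cwnd_state` and `cong_avoid_ai` are hypothetical names, and the exact counting rule is an assumption of this sketch rather than a statement about tcp_cong_avoid_ai.

```c
#include <assert.h>

struct cwnd_state {
	unsigned cwnd;       /* congestion window, in segments */
	unsigned cwnd_cnt;   /* ACKs counted since the last increase */
	unsigned cwnd_clamp; /* upper bound for cwnd */
};

/*
 * Additive increase: grow cwnd by one segment once w ACKs have been
 * counted, then restart the count. Callers pick w (for example the
 * current cwnd) to set how slowly the window grows.
 */
static void cong_avoid_ai(struct cwnd_state *s, unsigned w)
{
	if (s->cwnd_cnt >= w) {
		if (s->cwnd < s->cwnd_clamp)
			s->cwnd++;
		s->cwnd_cnt = 0;
	} else {
		s->cwnd_cnt++;
	}
}
```

Because every caller goes through the same comparison, an off-by-one such as "increase one ack earlier" can no longer differ between modules; it is either present everywhere or nowhere.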
    • P
      5ce04e3d
  10. 01 Mar, 2009 2 commits
  11. 28 Feb, 2009 7 commits