1. 11 Apr, 2012 7 commits
    • tcp: avoid order-1 allocations on wifi and tx path · a21d4572
      Eric Dumazet committed
      Marc Merlin reported many order-1 allocation failures in the TX path on his
      wireless setup, which don't make any sense with an MTU=1500 network and
      non-SG-capable hardware.
      
      After investigation, it turns out that TCP uses sk_stream_alloc_skb() and,
      by convention, uses skb_tailroom(skb) to know how many bytes of data
      payload can be put in this skb (for non-SG-capable devices).
      
      Note: these skbs used kmalloc-4096 (MTU=1500 + MAX_HEADER +
      sizeof(struct skb_shared_info) being above 2048).
      
      Later, the mac80211 layer needs to add some bytes at the tail of the skb
      (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and, since no more tailroom is
      available, has to call pskb_expand_head() and request order-1
      allocations.
      
      This patch changes sk_stream_alloc_skb() so that only
      sk->sk_prot->max_header bytes of headroom are reserved, and uses a new
      skb field, avail_size, to hold the data payload limit.
      
      This way, order-0 allocations done by the TCP stack can leave more than 2 KB
      of tailroom, and no further allocation is performed in the mac80211 layer (or
      any layer needing some tailroom).
      
      avail_size is unioned with mark/dropcount, since mark will be set later
      in IP stack for output packets. Therefore, skb size is unchanged.
      Reported-by: Marc MERLIN <marc@merlins.org>
      Tested-by: Marc MERLIN <marc@merlins.org>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
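The headroom change described in this commit can be sketched with a small userspace model. This is an illustration only, not the kernel code: `struct skb_model`, `model_alloc_old()`, `model_alloc_new()` and `model_tailroom()` are hypothetical names, and the buffer sizes in the note below are assumed values. Only the idea (reserve just max_header bytes of headroom and keep the payload limit in avail_size) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of an skb's linear buffer. */
struct skb_model {
    size_t buf_len;    /* usable bytes in the linear buffer */
    size_t headroom;   /* bytes reserved in front of the payload */
    size_t len;        /* payload bytes currently stored */
    size_t avail_size; /* payload limit (the field the patch introduces) */
};

/* Bytes still free behind the payload. */
size_t model_tailroom(const struct skb_model *skb)
{
    return skb->buf_len - skb->headroom - skb->len;
}

/* Old convention: tailroom itself is the payload limit,
 * so every spare byte of the buffer ends up as headroom. */
struct skb_model model_alloc_old(size_t buf_len, size_t size)
{
    assert(size <= buf_len);
    struct skb_model skb = { buf_len, buf_len - size, 0, size };
    return skb;
}

/* New convention: reserve only max_header bytes of headroom and
 * remember the payload limit separately in avail_size. */
struct skb_model model_alloc_new(size_t buf_len, size_t size, size_t max_header)
{
    assert(size + max_header <= buf_len);
    struct skb_model skb = { buf_len, max_header, 0, size };
    return skb;
}
```

With an assumed 3776 usable bytes, a 1448-byte payload and 160 bytes of header space, the old layout leaves no tailroom once the payload is filled, while the new layout leaves 2168 bytes, comfortably more than the 18 bytes mac80211 needs.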
    • net: allow pskb_expand_head() to get maximum tailroom · 87151b86
      Eric Dumazet committed
      Marc Merlin reported many order-1 allocation failures in the TX path on his
      wireless setup, which don't make any sense with an MTU=1500 network and
      non-SG-capable hardware.
      
      It turns out that part of the problem comes from pskb_expand_head() not
      using ksize() to get the exact head size given by kmalloc(). Doing the same
      thing as __alloc_skb() allows more tailroom in the skb and can prevent
      future reallocations.
      
      As a bonus, struct skb_shared_info becomes cache line aligned.
      Reported-by: Marc MERLIN <marc@merlins.org>
      Tested-by: Marc MERLIN <marc@merlins.org>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
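Why asking the allocator for the real buffer size helps can be illustrated as follows. This is a simplified model that assumes purely power-of-two size classes with an assumed minimum of 32 bytes; `size_class()` and `slack_bytes()` are hypothetical helpers, not kernel functions, and a real allocator may use other class sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power-of-two size class (assumed model). */
size_t size_class(size_t request)
{
    assert(request > 0);
    size_t cls = 32; /* assumed smallest class */
    while (cls < request)
        cls <<= 1;
    return cls;
}

/* Bytes the allocator hands out beyond what was requested.
 * A caller that only trusts its own request size never uses them. */
size_t slack_bytes(size_t request)
{
    return size_class(request) - request;
}
```

Under this model, a 2100-byte request is served from a 4096-byte class, so 1996 bytes are available as extra tailroom if the caller queries the true size instead of assuming the requested one.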
    • bridge: Do not send queries on multicast group leaves · 996304bb
      Herbert Xu committed
      As it stands, the bridge IGMP snooping system will respond to
      group leave messages with queries for remaining membership.
      This is both unnecessary and undesirable.  First of all, any
      multicast routers present should be doing this rather than us.
      What's more, the queries that we send may end up upsetting other
      multicast snooping switches in the system that are buggy.
      
      In fact, we can simply remove the code that sends these queries
      because the existing membership expiry mechanism doesn't rely
      on them anyway.
      
      So this patch simply removes all code associated with group
      queries in response to group leave messages.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: Mark NATSEMI driver as orphan'd. · 09d208ec
      David S. Miller committed
      After discussion with Tim Hockin.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample · 18a223e0
      Neal Cardwell committed
      Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
      unscaled RTT samples.
      
      The intent in the code was to only use the 'm' measurement if it was a
      new minimum.  However, since 'm' had not yet been shifted left 3 bits
      but 'new_sample' had, this comparison would nearly always succeed,
      leading us to erroneously set our receive-side RTT estimate to the 'm'
      sample when that sample could be nearly 8x too high to use.
      
      The overall effect is to often cause the receive-side RTT estimate to
      be significantly too large (up to 40% too large for brief periods in
      my tests).
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
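The scaled-versus-unscaled comparison described in this commit can be sketched as below. This is a simplified model of only the "use the sample if it is a new minimum" path, not the actual tcp_rcv_rtt_update() code; the function names `rtt_min_update_buggy()` and `rtt_min_update_fixed()` are hypothetical. The estimate is assumed to be stored shifted left by 3 bits, as the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* est_scaled holds the RTT estimate shifted left by 3 bits (8x).
 * Buggy variant: compares the raw sample against the scaled estimate. */
uint32_t rtt_min_update_buggy(uint32_t est_scaled, uint32_t sample)
{
    assert(sample <= (UINT32_MAX >> 3)); /* keep the shift from overflowing */
    uint32_t m = sample ? sample : 1;
    if (est_scaled == 0 || m < est_scaled) /* unscaled vs. scaled: nearly always true */
        return m << 3;
    return est_scaled;
}

/* Fixed variant: scale the sample first, then compare like with like. */
uint32_t rtt_min_update_fixed(uint32_t est_scaled, uint32_t sample)
{
    assert(sample <= (UINT32_MAX >> 3));
    uint32_t m = (sample ? sample : 1) << 3;
    if (est_scaled == 0 || m < est_scaled)
        return m;
    return est_scaled;
}
```

For example, with an estimate of 10 time units (stored as 80) and a new sample of 50, the buggy variant replaces the estimate with 50 (stored as 400), while the fixed variant keeps the estimate at 10 because 50 is not a new minimum.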
    • tcp: restore correct limit · 5fb84b14
      Eric Dumazet committed
      Commit c43b874d (tcp: properly initialize tcp memory limits) tried
      to fix a regression added in commits 4acb4190 & 3dc43e3e,
      but still got it wrong.
      
      The result is that machines with a low amount of memory have a too-small
      tcp_rmem[2] value and slow TCP receives: the per-socket limit is 1/1024 of
      memory instead of 1/128 as in old kernels, so the receive window is capped
      to small values.
      
      Fix this to match comment and previous behavior.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
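The effect of the two fractions named in this commit can be shown with simple arithmetic. This sketch only models "a fixed fraction of memory"; it ignores any caps, floors, or the exact memory figure the kernel uses, and `per_socket_limit()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Per-socket receive limit as a fixed fraction of memory
 * (illustrative only: caps and floors are ignored). */
uint64_t per_socket_limit(uint64_t total_bytes, uint64_t divisor)
{
    assert(divisor != 0);
    return total_bytes / divisor;
}
```

On an assumed machine with 256 MiB of memory, a 1/1024 share gives a 256 KiB limit, whereas the 1/128 share of older kernels gives 2 MiB, an eightfold difference in the maximum receive buffer.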
    • Merge branch 'master' of git://1984.lsi.us.es/net · ecd159fc
      David S. Miller committed
  2. 10 Apr, 2012 3 commits
  3. 09 Apr, 2012 2 commits
  4. 07 Apr, 2012 2 commits
    • Make the "word-at-a-time" helper functions more commonly usable · f68e556e
      Linus Torvalds committed
      I have a new optimized x86 "strncpy_from_user()" that will use these
      same helper functions for all the same reasons the name lookup code uses
      them.  This is preparation for that.
      
      This moves them into an architecture-specific header file.  It's
      architecture-specific for two reasons:
      
       - some of the functions are likely to want architecture-specific
         implementations.  Even if the current code happens to be "generic" in
         the sense that it should work on any little-endian machine, it's
         likely that the "multiply by a big constant and shift" implementation
         is less than optimal for an architecture that has a guaranteed fast
         bit count instruction, for example.
      
       - I expect that if architectures like sparc want to start playing
         around with this, we'll need to abstract out a few more details (in
         particular the actual unaligned accesses).  So we're likely to have
         more architecture-specific stuff if non-x86 architectures start using
         this.
      
         (and if it turns out that non-x86 architectures don't start using
         this, then having it in an architecture-specific header is still the
         right thing to do, of course)
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
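The kind of "word-at-a-time" helper this commit refers to can be sketched in portable C. This is a generic illustration of the well-known zero-byte detection trick and of a "multiply by a constant and shift" byte count, not the code moved by the commit; `load_le64()`, `has_zero()` and `first_zero_index()` as written here are assumptions for the example. The word is assembled in little-endian order explicitly so the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_BYTES 0x0101010101010101ULL
#define HIGH_BITS 0x8080808080808080ULL

/* Assemble 8 bytes into a word in little-endian order,
 * independent of the host's endianness. */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)p[i] << (8 * i);
    return w;
}

/* Nonzero if and only if at least one byte of a is zero. */
uint64_t has_zero(uint64_t a)
{
    return (a - ONE_BYTES) & ~a & HIGH_BITS;
}

/* Index (0..7) of the first zero byte.
 * bits must be a nonzero result of has_zero(). */
unsigned first_zero_index(uint64_t bits)
{
    assert(bits != 0);
    /* 0xff in every byte below the first zero byte, 0 elsewhere */
    uint64_t mask = ((bits - 1) & ~bits) >> 7;
    /* the multiply sums the per-byte ones into the top byte */
    return (unsigned)(((mask & ONE_BYTES) * ONE_BYTES) >> 56);
}
```

A string scan can then test eight bytes per iteration with `has_zero()` and call `first_zero_index()` only on the word that contains the terminator. An architecture with a fast bit-count instruction could replace the multiply, which is the sort of architecture-specific variation the commit message anticipates.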
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 23f347ef
      Linus Torvalds committed
      Pull networking updates from David Miller:
      
       1) Fix inaccuracies in network driver interface documentation, from Ben
          Hutchings.
      
       2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.
      
       3) Compile warning, locking, and refcounting fixes in netfilter's
          xt_CT, from Pablo Neira Ayuso.
      
       4) phonet sendmsg needs to validate user length just like any other
          datagram protocol, fix from Sasha Levin.
      
        5) IPv6 multicast code uses wrong loop index, from RongQing Li.
      
       6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
          and Yuval Mintz.
      
       7) mlx4 erroneously allocates 4 pages at a time, regardless of page
          size, fix from Thadeu Lima de Souza Cascardo.
      
       8) SCTP socket option wasn't extended in a backwards compatible way,
          fix from Thomas Graf.
      
       9) Add missing address change event emissions to bonding, from Shlomo
          Pongratz.
      
      10) /proc/net/dev regressed because it uses a private offset to track
          where we are in the hash table, but this doesn't track the offset
          pullback that the seq_file code does resulting in some entries being
          missed in large dumps.
      
          Fix from Eric Dumazet.
      
      11) do_tcp_sendpage() unloads the send queue way too fast, because it
          invokes tcp_push() when it shouldn't.  Let the natural sequence
           generated by the splice paths, and the associated MSG_MORE
          settings, guide the tcp_push() calls.
      
          Otherwise what goes out of TCP is spaghetti and doesn't batch
          effectively into GSO/TSO clusters.
      
          From Eric Dumazet.
      
      12) Once we put a SKB into either the netlink receiver's queue or a
          socket error queue, it can be consumed and freed up, therefore we
          cannot touch it after queueing it like that.
      
          Fixes from Eric Dumazet.
      
      13) PPP has this annoying behavior in that for every transmit call it
          immediately stops the TX queue, then calls down into the next layer
          to transmit the PPP frame.
      
          But if that next layer can take it immediately, it just un-stops the
          TX queue right before returning from the transmit method.
      
          Besides being useless work, it makes several facilities unusable, in
          particular things like the equalizers.  Well behaved devices should
          only stop the TX queue when they really are full, and in PPP's case
          when it gets backlogged to the downstream device.
      
          David Woodhouse therefore fixed PPP to not stop the TX queue until
           its downstream can't take data any more.
      
       14) IFF_UNICAST_FLT got accidentally lost in some recent stmmac driver
          changes, re-add.  From Marc Kleine-Budde.
      
      15) Fix link flaps in ixgbe, from Eric W. Multanen.
      
      16) Descriptor writeback fixes in e1000e from Matthew Vick.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
        net: fix a race in sock_queue_err_skb()
        netlink: fix races after skb queueing
        doc, net: Update ndo_start_xmit return type and values
        doc, net: Remove instruction to set net_device::trans_start
        doc, net: Update netdev operation names
        doc, net: Update documentation of synchronisation for TX multiqueue
        doc, net: Remove obsolete reference to dev->poll
        ethtool: Remove exception to the requirement of holding RTNL lock
        MAINTAINERS: update for Marvell Ethernet drivers
        bonding: properly unset current_arp_slave on slave link up
        phonet: Check input from user before allocating
        tcp: tcp_sendpages() should call tcp_push() once
        ipv6: fix array index in ip6_mc_add_src()
        mlx4: allocate just enough pages instead of always 4 pages
        stmmac: re-add IFF_UNICAST_FLT for dwmac1000
        bnx2x: Clear MDC/MDIO warning message
        bnx2x: Fix BCM57711+BCM84823 link issue
        bnx2x: Clear BCM84833 LED after fan failure
        bnx2x: Fix BCM84833 PHY FW version presentation
        bnx2x: Fix link issue for BCM8727 boards.
        ...
  5. 06 Apr, 2012 26 commits