1. 17 Nov, 2010 2 commits
    •
      packet: Enhance AF_PACKET implementation to not require high order contiguous... · 0e3125c7
      Neil Horman committed
      packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)
      
      Version 4 of this patch.
      
      Change notes:
      1) Removed extra memset.  Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)
      
      Summary:
      It was shown to me recently that systems under high load were driven very deep
      into swap when tcpdump was run.  This happened because the
      AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
      application to specify how many entries an AF_PACKET socket will have and how
      large each entry will be.  It seems the default setting for tcpdump is to set
      the ring buffer to 32 entries of 64 KB each, which implies 32 order 5
      allocations.  That's difficult under good circumstances, and horrid under memory
      pressure.
      
      I thought it would be good to make that a bit more usable.  I was going to do a
      simple conversion of the ring buffer from contiguous pages to iovecs, but
      unfortunately, the metadata which AF_PACKET places in these buffers can easily
      span a page boundary, and given that these buffers get mapped into user space,
      and the data layout doesn't easily allow for a change to padding between frames
      to avoid that, a simple iovec change is just going to break user space ABI
      consistency.
      
      So I've done this: I've added a three-tiered mechanism to the af_packet set_ring
      socket option.  It attempts to allocate memory in the following order:
      
      1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
      digging into swap
      
      2) Using vmalloc
      
      3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
      needed to get the memory
      
      The effect is that we don't disturb the system as much when we're under load,
      while still being able to conduct tcpdumps effectively.
      
      Tested successfully by me.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
      Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e3125c7
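The three-tier fallback described in the AF_PACKET commit above can be sketched in user-space C. This is a minimal illustration, not the kernel patch: the allocator callbacks, `struct ring_block`, `enum tier`, and the stand-in allocators are all hypothetical names, and recording which tier succeeded is an assumption about how a caller might later choose the matching free routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator signature: returns NULL on failure. */
typedef void *(*alloc_fn)(size_t size);

/* Which tier satisfied the request (hypothetical bookkeeping). */
enum tier { TIER_NONE = 0, TIER_PAGES_NORETRY, TIER_VMALLOC, TIER_PAGES_RETRY };

struct ring_block {
    void *mem;
    enum tier tier;
};

/*
 * Try the allocators in the order given in the commit message:
 * 1) contiguous pages, failing fast; 2) a vmalloc-like allocator;
 * 3) contiguous pages, trying as hard as needed.
 * Stops at the first allocator that returns memory.
 */
static struct ring_block alloc_ring_block(size_t size,
                                          alloc_fn pages_noretry,
                                          alloc_fn vmalloc_like,
                                          alloc_fn pages_retry)
{
    struct ring_block blk = { NULL, TIER_NONE };
    alloc_fn fns[3] = { pages_noretry, vmalloc_like, pages_retry };
    enum tier ids[3] = { TIER_PAGES_NORETRY, TIER_VMALLOC, TIER_PAGES_RETRY };
    size_t i;

    assert(size > 0);
    for (i = 0; i < 3; i++) {
        blk.mem = fns[i](size);
        if (blk.mem != NULL) {
            blk.tier = ids[i];
            break;
        }
    }
    return blk;
}

/* Stand-in allocators for experimenting outside the kernel. */
static void *always_fail(size_t size) { (void)size; return NULL; }
static void *use_malloc(size_t size) { return malloc(size); }
```

Passing `always_fail` as the first callback simulates memory pressure: the request then falls through to the second tier without the first tier having retried.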
    •
      ipv6: fix missing in6_ifa_put in addrconf · 9d82ca98
      John Fastabend committed
      Fix ref count bug introduced by
      
      commit 2de79570
      Author: Lorenzo Colitti <lorenzo@google.com>
      Date:   Wed Oct 27 18:16:49 2010 +0000
      
      ipv6: addrconf: don't remove address state on ifdown if the address
      is being kept
      
      Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
      refcnt correctly with in6_ifa_put().
      Reported-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d82ca98
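The kind of bug fixed by the ipv6 commit above, a reference taken but never dropped, can be illustrated with a generic hold/put pair. This is only a sketch of the reference-counting pattern: `struct obj`, `obj_hold()`, and `obj_put()` are hypothetical names, not the kernel's `inet6_ifaddr` API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
    int refcnt;
    bool released;   /* set when the last reference is dropped */
};

/* Take a reference. */
static void obj_hold(struct obj *o)
{
    o->refcnt++;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        o->released = true;
        return true;
    }
    return false;
}
```

Every code path that calls `obj_hold()` needs a matching `obj_put()`; if one path skips it, the count never reaches zero and the object is never released.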
  2. 16 Nov, 2010 28 commits
  3. 15 Nov, 2010 6 commits
    •
      dccp ccid-2: Separate option parsing from CCID processing · 7e87fe84
      Gerrit Renker committed
      This patch replaces an almost identical replication of code: large parts
      of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c.
      
      Apart from the duplication, this caused two more problems:
       1. CCIDs should not need to be concerned with parsing header options;
       2. one cannot assume that Ack Vectors appear as a contiguous area within an
          skb; it is legal to insert other options and/or padding in between. The
          current code would throw an error and stop reading in such a case.
      
      Since Ack Vectors provide CCID-specific information, they are now processed
      by the CCID directly, separating this functionality from the main DCCP code.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      7e87fe84
    •
      dccp ccid-2: Remove old infrastructure · 52394eec
      Gerrit Renker committed
      This removes
       * functions for which updates have been provided in the preceding patches and
       * the @av_vec_len field - it is no longer necessary since the buffer length is
         now always computed dynamically.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      52394eec
    •
      dccp ccid-2: Schedule Sync as out-of-band mechanism · d83447f0
      Gerrit Renker committed
      The problem with Ack Vectors is that
        i) their length is variable and can in principle grow quite large,
       ii) it is hard to predict exactly how large they will be.
      
      Due to the second point, reducing the MPS does not seem a good idea; in
      particular when on average there is enough room for the Ack Vector and an
      increase in length is only momentary, due to some burst loss, after which the
      Ack Vector returns to its normal/average length.
      
      The solution taken by this patch is to subtract a minimum-expected Ack Vector
      length from the MPS, and to defer any larger Ack Vectors onto a separate
      Sync - but only if indeed there is no space left on the skb.
      
      This patch provides the infrastructure to schedule Sync-packets for transporting
      (urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since
      it does not need to wait for new application data.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      d83447f0
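The policy described in the Sync-scheduling commit above (reserve a minimum-expected Ack Vector length out of the MPS, and defer a larger Ack Vector to a Sync only when the skb has no room left) can be sketched as follows. The function names, the enum, and any sizes passed in are hypothetical; this is not the DCCP implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Where the Ack Vector ends up (hypothetical names). */
enum av_action { AV_INLINE, AV_DEFER_TO_SYNC };

/*
 * Payload size offered to the application: the MPS minus the space
 * reserved for a minimum-expected Ack Vector.
 */
static size_t effective_mps(size_t mps, size_t min_av_len)
{
    assert(min_av_len <= mps);
    return mps - min_av_len;
}

/*
 * Carry the Ack Vector on the current packet if it fits in the space
 * left on the skb; otherwise schedule a separate Sync to carry it.
 */
static enum av_action place_ack_vector(size_t av_len, size_t skb_space_left)
{
    return av_len <= skb_space_left ? AV_INLINE : AV_DEFER_TO_SYNC;
}
```

With this split, a burst loss that temporarily inflates the Ack Vector costs one extra Sync rather than a permanently smaller MPS.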
    •
      dccp ccid-2: Consolidate Ack-Vector processing within main DCCP module · 18219463
      Gerrit Renker committed
      This aggregates Ack Vector processing (handling input and clearing old state)
      into one function, for the following reasons and benefits:
       * all Ack Vector-specific processing is now in one place;
       * duplicated code is removed;
       * ensuring sanity: from an Ack Vector point of view, it is better to clear the
                          old state first before entering new state;
       * Ack Event handling happens mostly within the CCIDs, not the main DCCP module.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      18219463
    •
      dccp ccid-2: Update code for the Ack Vector input/registration routine · 38024086
      Gerrit Renker committed
      This patch updates the code which registers new packets as received, using the
      new circular buffer interface. It contributes a new algorithm which
       * supports both tail/head pointers and buffer wrap-around and
       * deals with overflow (head/tail move in lock-step).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      38024086
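The two properties named in the commit above, head/tail pointers with wrap-around and overflow handling where head and tail move in lock-step, can be illustrated with a small generic ring. This is a sketch under assumptions: `struct ring`, `RING_LEN`, and the direction in which the head advances are hypothetical and need not match the DCCP Ack Vector buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_LEN 4   /* hypothetical capacity, kept small for illustration */

struct ring {
    uint8_t buf[RING_LEN];
    size_t head;      /* index of the newest entry */
    size_t tail;      /* index of the oldest entry */
    bool empty;
    bool overflow;    /* set once an old entry has been overwritten */
};

static void ring_init(struct ring *r)
{
    r->head = 0;
    r->tail = 0;
    r->empty = true;
    r->overflow = false;
}

/* Register a new entry, overwriting the oldest one when the ring is full. */
static void ring_add(struct ring *r, uint8_t v)
{
    assert(r->head < RING_LEN && r->tail < RING_LEN);
    if (r->empty) {
        r->empty = false;                     /* first entry goes in slot 0 */
    } else {
        r->head = (r->head + 1) % RING_LEN;   /* wrap-around */
        if (r->head == r->tail) {
            /* Full: tail moves in lock-step with head. */
            r->tail = (r->tail + 1) % RING_LEN;
            r->overflow = true;
        }
    }
    r->buf[r->head] = v;
}

/* Number of live entries between tail and head, inclusive. */
static size_t ring_count(const struct ring *r)
{
    if (r->empty)
        return 0;
    return (r->head + RING_LEN - r->tail) % RING_LEN + 1;
}
```

Once the ring is full, each further `ring_add()` advances both indices by one, so the count stays at `RING_LEN` and the oldest entry is the one that is lost.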
    •
      dccp ccid-2: Algorithm to update buffer state · 5753fdfe
      Gerrit Renker committed
      This provides a routine to consistently update the buffer state when the
      peer acknowledges receipt of Ack Vectors; updating state in the list of Ack
      Vectors as well as in the circular buffer.
      
      While based on RFC 4340, several additional (and necessary) precautions were
      added to protect the consistency of the buffer state. These additions are
      essential, since analysis and experience showed that the basic algorithm was
      insufficient for this task (which led to problems that were hard to debug).
      
      The algorithm now
       * deals with HC-sender acknowledging to HC-receiver and vice versa,
       * keeps track of the last unacknowledged but received seqno in tail_ackno,
       * has special cases to reset the overflow condition when appropriate,
       * is protected against receiving older information (would mess up buffer state).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      5753fdfe
  4. 13 Nov, 2010 4 commits