1. 20 Mar, 2012 1 commit
    • tcp: reduce out_of_order memory use · c8628155
      Eric Dumazet committed
      With increasing receive window sizes, but speed of light not improved
      that much, out of order queue can contain a huge number of skbs, waiting
      to be moved to receive_queue when missing packets can fill the holes.
      
      Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
      sk_buff)) to store regular (MTU <= 1500) frames. This makes it highly
      probable that sk_rmem_alloc hits the sk_rcvbuf limit, which can be 4 Mbytes
      in many cases.
      
      When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
      latency killer and cpu cache blower.
      
      Attempting the coalescing each time we add a frame to the ofo queue
      keeps memory use tight and in many cases avoids the
      tcp_collapse() call later.
      
      Tested on various wireless setups (b43, ath9k, ...) known to use big skb
      truesize, this patch removed the "packets collapsed in receive queue due
      to low socket buffer" I had before.
      
      This also reduced average memory used by tcp sockets.
      
      With help from Neal Cardwell.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
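The coalescing idea in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct ofo_segment`, `SEG_CAPACITY`, `ofo_tailroom()` and `ofo_try_coalesce()` are hypothetical names, and the model assumes a new frame is merged into the queue tail only when it is contiguous with the tail and fits in the tail's unused space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity of one "fat" receive buffer. */
#define SEG_CAPACITY 4096

/* Simplified stand-in for an skb sitting in the out-of-order queue. */
struct ofo_segment {
    uint32_t seq;      /* first sequence number stored */
    uint32_t end_seq;  /* one past the last sequence number stored */
    size_t len;        /* bytes currently used in data[] */
    unsigned char data[SEG_CAPACITY];
};

/* Unused bytes at the end of the segment's buffer. */
static size_t ofo_tailroom(const struct ofo_segment *seg)
{
    return SEG_CAPACITY - seg->len;
}

/*
 * Try to append a newly received payload to the tail segment.
 * Returns true when the payload was merged, so the caller does not
 * need to queue a separate buffer for it.
 */
static bool ofo_try_coalesce(struct ofo_segment *tail, uint32_t seq,
                             const unsigned char *payload, size_t len)
{
    assert(tail != NULL && payload != NULL);

    if (seq != tail->end_seq)        /* not contiguous with the tail */
        return false;
    if (len > ofo_tailroom(tail))    /* does not fit in the spare room */
        return false;

    memcpy(tail->data + tail->len, payload, len);
    tail->len += len;
    tail->end_seq += (uint32_t)len;
    return true;
}
```

With a 4096-byte buffer, two 1448-byte payloads fit into one segment in this model, which is the kind of saving the commit message describes for drivers that allocate large buffers for small frames.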
  2. 17 Mar, 2012 1 commit
    • arp: allow arp processing to honor per interface arp_accept sysctl · 124d37e9
      Neil Horman committed
      I found recently that the arp_process function, which handles all of our received
      arp frames, is using the IPV4_DEVCONF_ALL macro to check the state of the arp_accept
      flag.  This seems wrong, as it implies that either none or all of the network
      interfaces accept gratuitous arps.  This patch corrects that, allowing
      per-interface arp_accept configuration to deviate from the all setting.  Note
      this also brings us into line with the way the arp_filter setting is handled
      during arp_process execution.
      
      Tested this myself on my home network, and confirmed it works as expected.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
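The behavioural difference described in this commit can be modelled in a few lines. This is a sketch, not the kernel implementation: `struct ipv4_conf_model` and both functions are hypothetical, and the "after" variant assumes the effective setting is the logical OR of the `all` value and the per-interface value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one sysctl scope ("all" or a single interface). */
struct ipv4_conf_model {
    bool arp_accept;
};

/* Before the patch: only the global "all" setting is consulted. */
static bool arp_accept_before(const struct ipv4_conf_model *all,
                              const struct ipv4_conf_model *iface)
{
    assert(all != NULL && iface != NULL);
    (void)iface;                     /* per-interface value is ignored */
    return all->arp_accept;
}

/* After the patch (assumed OR semantics): either scope can enable it. */
static bool arp_accept_after(const struct ipv4_conf_model *all,
                             const struct ipv4_conf_model *iface)
{
    assert(all != NULL && iface != NULL);
    return all->arp_accept || iface->arp_accept;
}
```

In this model, an interface with `arp_accept` set while `all` is clear accepts gratuitous ARP only in the "after" variant.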
  3. 13 Mar, 2012 3 commits
  4. 12 Mar, 2012 1 commit
  5. 10 Mar, 2012 2 commits
  6. 08 Mar, 2012 13 commits
  7. 07 Mar, 2012 5 commits
  8. 06 Mar, 2012 14 commits
    • memcg: fix GPF when cgroup removal races with last exit · 7512102c
      Hugh Dickins committed
      When moving tasks from old memcg (with move_charge_at_immigrate on new
      memcg), followed by removal of old memcg, hit General Protection Fault in
      mem_cgroup_lru_del_list() (called from release_pages called from
      free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
      exit_mmap from mmput from exit_mm from do_exit).
      
      Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
      been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
      still trying to update its stats, and take page off lru before freeing.
      
      A task, or a charge, or a page on lru: each secures a memcg against
      removal.  In this case, the last task has been moved out of the old memcg,
      and it is exiting: anonymous pages are uncharged one by one from the
      memcg, as they are zapped from its pagetables, so the charge gets down to
      0; but the pages themselves are queued in an mmu_gather for freeing.
      
      Most of those pages will be on lru (and force_empty is careful to
      lru_add_drain_all, to add pages from pagevec to lru first), but not
      necessarily all: perhaps some have been isolated for page reclaim, perhaps
      some isolated for other reasons.  So, force_empty may find no task, no
      charge and no page on lru, and let the removal proceed.
      
      There would still be no problem if these pages were immediately freed; but
      typically (and the put_page_testzero protocol demands it) they have to be
      added back to lru before they are found freeable, then removed from lru
      and freed.  We don't see the issue when adding, because the
      mem_cgroup_iter() loops keep their own reference to the memcg being
      scanned; but when it comes to mem_cgroup_lru_del_list().
      
      I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
      PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
      of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
      38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
      removed those convolutions, but left this General Protection Fault.
      
      But it's surprisingly easy to restore the old behaviour: just check
      PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
      to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
      change?  just going back to how it worked before; testing, and an audit of
      uses of pc->mem_cgroup, show no problem.
      
      And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
      sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
      longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
      clear pc->mem_cgroup if necessary").
      
      Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
      is not strictly necessary: the lru_lock there, with RCU before memcg
      structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
      without that; but it seems cleaner to rely on one dependency less.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
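The fix described in this commit (redirecting an uncharged page to the root memcg when it is added to an lru) can be sketched as a toy model. Everything here is hypothetical and heavily simplified: `struct memcg_model`, `struct page_cgroup_model` and `lru_add_list_model()` only mirror the rule stated in the message and do not reproduce the kernel's data structures or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup with one lru counter. */
struct memcg_model {
    const char *name;
    long lru_pages;
};

/* Hypothetical stand-in for the per-page cgroup bookkeeping. */
struct page_cgroup_model {
    struct memcg_model *mem_cgroup;  /* owner recorded for the page */
    bool used;                       /* models the PageCgroupUsed flag */
};

/*
 * Pick the memcg whose lru receives the page. An uncharged page is
 * redirected to the root memcg, so a later removal from the lru never
 * touches a memcg that may already have been freed.
 */
static struct memcg_model *lru_add_list_model(struct page_cgroup_model *pc,
                                              struct memcg_model *root)
{
    assert(pc != NULL && root != NULL);

    if (!pc->used)
        pc->mem_cgroup = root;

    pc->mem_cgroup->lru_pages++;
    return pc->mem_cgroup;
}
```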
    • vfork: kill PF_STARTING · 6e27f63e
      Oleg Nesterov committed
      Previously it was (ab)used by utrace.  Then it was wrongly used by the
      scheduler code.
      
      Currently it is not used; kill it before it finds a new erroneous user.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump_wait: don't call complete_vfork_done() · 57b59c4a
      Oleg Nesterov committed
      Now that CLONE_VFORK is killable, coredump_wait() no longer needs
      complete_vfork_done().  zap_threads() should find and kill all tasks with
      the same ->mm, this includes our parent if ->vfork_done is set.
      
      mm_release() becomes the only caller, unexport complete_vfork_done().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: make it killable · d68b46fe
      Oleg Nesterov committed
      Make vfork() killable.
      
      Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
      fails we do not return to the user-mode and never touch the memory shared
      with our child.
      
      However, in this case we should clear child->vfork_done before return, we
      use task_lock() in do_fork()->wait_for_vfork_done() and
      complete_vfork_done() to serialize with each other.
      
      Note: now that we use task_lock() we don't really need completion, we
      could turn task->vfork_done into "task_struct *wake_up_me" but this needs
      some complications.
      
      NOTE: this and the next patches do not affect in-kernel users of
      CLONE_VFORK, kernel threads run with all signals ignored including
      SIGKILL/SIGSTOP.
      
      However, this is obviously a user-visible change.  Not only can a fatal
      signal kill the vforking parent, but a sub-thread can also call execve or
      exit_group() and kill the thread sleeping in vfork().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: introduce complete_vfork_done() · c415c3b4
      Oleg Nesterov committed
      No functional changes.
      
      Move the clear-and-complete-vfork_done code into the new trivial helper,
      complete_vfork_done().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmsg_dump: don't run on non-error paths by default · c22ab332
      Matthew Garrett committed
      Since commit 04c6862c ("kmsg_dump: add kmsg_dump() calls to the
      reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
      run on normal paths including poweroff and reboot.
      
      This is less than ideal given pstore implementations that can only
      represent single backtraces, since a reboot may overwrite a stored oops
      before it's been picked up by userspace.  In addition, some pstore
      backends may have low performance and provide a significant delay in
      reboot as a result.
      
      This patch adds a printk.always_kmsg_dump kernel parameter (which can also
      be changed from userspace).  Without it, the code will only be run on
      failure paths rather than on normal paths.  The option can be enabled in
      environments where there's a desire to attempt to audit whether or not a
      reboot was cleanly requested or not.
      Signed-off-by: Matthew Garrett <mjg@redhat.com>
      Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
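The gating rule this commit describes (dump on failure paths always, on normal paths only when `printk.always_kmsg_dump` is set) can be sketched as follows. The enum and function names are hypothetical and the list of reasons is deliberately reduced; the sketch assumes failure reasons are ordered before normal shutdown reasons so that a single comparison separates the two groups.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, reduced set of dump reasons. Failure paths come first,
 * normal shutdown paths come after DUMP_REASON_OOPS.
 */
enum dump_reason {
    DUMP_REASON_PANIC,
    DUMP_REASON_OOPS,
    DUMP_REASON_RESTART,
    DUMP_REASON_HALT,
    DUMP_REASON_POWEROFF
};

/*
 * Decide whether the dumpers should run. Failure reasons always dump;
 * normal reasons dump only when the always_kmsg_dump option is enabled.
 */
static bool should_kmsg_dump(enum dump_reason reason, bool always_kmsg_dump)
{
    if (reason > DUMP_REASON_OOPS && !always_kmsg_dump)
        return false;
    return true;
}
```

Under this model a clean reboot leaves a previously stored oops untouched unless the administrator has opted in to dumping on every path.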
    • cfg80211: Add an attribute to set inactivity timeout in AP mode · 1b658f11
      Vasanthakumar Thiagarajan committed
      This patch adds an attribute, NL80211_ATTR_INACTIVITY_TIMEOUT,
      to set the inactivity timeout which can be used to remove the
      station in AP mode. This can be passed in NL80211_CMD_START_AP
      and used by the drivers which have AP MLME in firmware but
      don't support get_station() properly. To disable inactivity
      timer in userspace, wpa_s for example, there is a new flag,
      NL80211_FEATURE_INACTIVITY_TIMER, in nl80211_feature_flags
      through which drivers can register their capability to use
      the inactivity timeout to free the stations.
      Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
      Acked-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • net: export netdev_stats_to_stats64 · 77a1abf5
      Eric Dumazet committed
      Some drivers use the internal netdev stats member to store part of their
      stats, yet advertise ndo_get_stats64() to implement some 64bit fields.
      
      Allow them to use netdev_stats_to_stats64() helper to make the copy of
      netdev stats before they compute their 64bit counters.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
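The usage pattern this commit enables, copying the native-width counters first and then overriding selected fields with true 64-bit values, can be sketched in userspace. The structures and function names below are hypothetical stand-ins with only four counters; they are not the kernel's `net_device_stats` or `rtnl_link_stats64` layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native-width counters (may be 32-bit on some platforms). */
struct dev_stats {
    unsigned long rx_packets, tx_packets, rx_bytes, tx_bytes;
};

/* Hypothetical 64-bit counters reported to callers. */
struct dev_stats64 {
    uint64_t rx_packets, tx_packets, rx_bytes, tx_bytes;
};

/* Widen every native counter into the 64-bit structure. */
static void dev_stats_to_stats64(struct dev_stats64 *dst,
                                 const struct dev_stats *src)
{
    assert(dst != NULL && src != NULL);
    dst->rx_packets = src->rx_packets;
    dst->tx_packets = src->tx_packets;
    dst->rx_bytes = src->rx_bytes;
    dst->tx_bytes = src->tx_bytes;
}

/* Hypothetical driver state: native stats plus private 64-bit byte counters. */
struct drv_priv {
    struct dev_stats stats;
    uint64_t rx_bytes64, tx_bytes64;
};

/* Driver callback: start from the native copy, then override byte counters. */
static void drv_get_stats64(const struct drv_priv *priv, struct dev_stats64 *out)
{
    assert(priv != NULL && out != NULL);
    dev_stats_to_stats64(out, &priv->stats);
    out->rx_bytes = priv->rx_bytes64;
    out->tx_bytes = priv->tx_bytes64;
}
```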
    • {nl,cfg,mac}80211: Implement RSSI threshold for mesh peering · 55335137
      Ashok Nagarajan committed
      Mesh peer links are established only if average rssi of the peer
      candidate satisfies the threshold. This is not in 802.11s specification
      but was requested by David Fulgham, an open80211s user. This is a way to avoid
      marginal peer links with stations that are barely within range.
      
      This patch adds a new mesh configuration parameter, mesh_rssi_threshold. This
      feature is supported only for hardware that reports signal in dBm.
      Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
      Signed-off-by: Javier Cardona <javier@cozybit.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
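The peering condition can be expressed as a single predicate. This is a sketch with a hypothetical function name; it assumes that a threshold of 0 means the check is disabled and that both values are in dBm (negative numbers, closer to zero meaning a stronger signal).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a peer candidate may be considered for a mesh link.
 * threshold_dbm == 0 is assumed to disable the check entirely.
 */
static bool mesh_rssi_threshold_ok(int32_t threshold_dbm, int32_t avg_rssi_dbm)
{
    if (threshold_dbm == 0)
        return true;
    return threshold_dbm < 0 && avg_rssi_dbm > threshold_dbm;
}
```

For example, with a threshold of -80 dBm a candidate averaging -70 dBm passes, while one averaging -90 dBm is rejected as a marginal link.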
    • bcma: add support for sprom not found on the device · a027237a
      Hauke Mehrtens committed
      On SoCs the sprom is stored in the nvram in a special partition on the
      flash chip. The nvram contains the sprom for the main bus, but
      sometimes also for PCI devices using bcma. This patch makes it
      possible for the arch code to register a function to fetch the needed
      sprom from the nvram and provide it to the bcma code.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
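The registration mechanism described here follows a common callback pattern, sketched below in userspace. All names (`struct sprom_model`, `register_fallback_sprom()`, `sprom_get_fallback()`, `nvram_sprom_stub()`) are hypothetical and do not reproduce the bcma API; the error codes and the single-registration rule are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal SPROM contents. */
struct sprom_model {
    int revision;
};

/* Callback type the architecture code would provide. */
typedef int (*fallback_sprom_fn)(struct sprom_model *out);

/* The single registered fallback, or NULL when none is registered. */
static fallback_sprom_fn fallback_sprom_cb;

/* Arch code registers its fetch function once (assumed rule). */
static int register_fallback_sprom(fallback_sprom_fn cb)
{
    assert(cb != NULL);
    if (fallback_sprom_cb)
        return -EEXIST;
    fallback_sprom_cb = cb;
    return 0;
}

/* Bus code path for a device without its own SPROM. */
static int sprom_get_fallback(struct sprom_model *out)
{
    assert(out != NULL);
    if (!fallback_sprom_cb)
        return -ENOENT;
    return fallback_sprom_cb(out);
}

/* Stub standing in for "read the SPROM image from nvram". */
static int nvram_sprom_stub(struct sprom_model *out)
{
    out->revision = 8;
    return 0;
}
```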
    • bcma: export bcma_find_core · 1c9351cf
      Hauke Mehrtens committed
      This function is needed by the bcm47xx arch code to get the number of
      the ieee80211 core.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • ssb: add some missing sprom attributes · 52aa63f5
      Hauke Mehrtens committed
      This patch extends the sprom struct to contain all sprom attributes
      found in sprom versions 1 to 9. This was done according to the open
      source part of the Broadcom SDK.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • ssb: add alpha2 · 03a5642b
      Hauke Mehrtens committed
      This member contains the country code encoded as two chars.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • ssb: fix per path sprom vars · 3b64e6f9
      Hauke Mehrtens committed
      On sprom versions 4 and 5 there are 4 values for pa_2g, pa_5gl, pa_5g
      and pa_5gh; for sprom versions 8 and 9 there are only 3. Make the per
      path sprom store also work for older sprom versions.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>