1. 27 7月, 2011 14 次提交
    • A
      asm-generic: add another generic ext2 atomic bitops · 148817ba
      Akinobu Mita 提交于
      The majority of architectures implement ext2 atomic bitops as
      test_and_{set,clear}_bit() without spinlock.
      
      This adds this type of generic implementation in ext2-atomic-setbit.h and
      use it wherever possible.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Suggested-by: NAndreas Dilger <adilger@dilger.ca>
      Suggested-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      148817ba
    • A
      fault-injection: use debugfs_remove_recursive · 7f5ddcc8
      Akinobu Mita 提交于
      Use debugfs_remove_recursive() to simplify initialization and
      deinitialization of fault injection debugfs files.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f5ddcc8
    • A
      fault-injection: remove nonexistent function extern · 6b16f748
      Akinobu Mita 提交于
      should_fail_srandom() does not exist.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b16f748
    • S
      ramoops: make record_size a module parameter · 3e5c4fad
      Sergiu Iordache 提交于
      The size of the dump is currently set using the RECORD_SIZE macro which
      is set to a page size.  This patch makes the record size a module
      parameter and allows it to be set through platform data as well to allow
      larger dumps if needed.
      Signed-off-by: NSergiu Iordache <sergiu@chromium.org>
      Acked-by: NMarco Stornelli <marco.stornelli@gmail.com>
      Cc: "Ahmed S. Darwish" <darwish.07@gmail.com>
      Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3e5c4fad
    • S
      ramoops: move dump_oops into platform data · 6b4d2a27
      Sergiu Iordache 提交于
      The platform driver currently allows setting the mem_size and
      mem_address.
      
      ince dump_oops is also a module parameter it would be more consistent if
      it could be set through platform data as well.
      Signed-off-by: NSergiu Iordache <sergiu@chromium.org>
      Acked-by: NMarco Stornelli <marco.stornelli@gmail.com>
      Cc: "Ahmed S. Darwish" <darwish.07@gmail.com>
      Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b4d2a27
    • M
      panic, vt: do not force oops output when panic_timeout < 0 · c958474b
      Mandeep Singh Baines 提交于
      Don't force output if you intend to reboot immediately.
      
      In this patch, I'm disabling the functionality enabled by
      vc->vc_panic_force_write if panic_timeout < 0 (i.e.  no timeout).
      vc_panic_force_write is only enabled for fb video consoles if the
      FBINFO_CAN_FORCE_OUTPUT flag is set.
      
      For our application, we're using ram_oops to preserved the panic in
      memory.  We want to reliably, and as fast as possible, machine_restart.
      The vc_panic_force_write flag results in a bunch of graphics driver code
      to be invoked which slows down restart and decreases reliability.  Since
      we're already storing the panic in RAM and are going to reboot
      immediately, there is no benefit in mode switching back to the vc in
      order to display the panic output.  The log buffer will get flushed by
      the console_unblank() call so remote management consoles should see all
      output.
      Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c958474b
    • W
      include/linux/dma-mapping.h: remove DMA_xxBIT_MASK macros · 91f6cdf8
      WANG Cong 提交于
      git grep shows there are no users in tree, so we can remove them safely.
      Signed-off-by: NWANG Cong <xiyou.wangcong@gmail.com>
      Acked-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Acked-by: NJiri Slaby <jslaby@suse.cz>
      Acked-by: NVinod Koul <vinod.koul@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      91f6cdf8
    • V
      ipc: introduce shm_rmid_forced sysctl · b34a6b1d
      Vasiliy Kulikov 提交于
      Add support for the shm_rmid_forced sysctl.  If set to 1, all shared
      memory objects in current ipc namespace will be automatically forced to
      use IPC_RMID.
      
      The POSIX way of handling shmem allows one to create shm objects and
      call shmdt(), leaving shm object associated with no process, thus
      consuming memory not counted via rlimits.
      
      With shm_rmid_forced=1 the shared memory object is counted at least for
      one process, so OOM killer may effectively kill the fat process holding
      the shared memory.
      
      It obviously breaks POSIX - some programs relying on the feature would
      stop working.  So set shm_rmid_forced=1 only if you're sure nobody uses
      "orphaned" memory.  Use shm_rmid_forced=0 by default for compatability
      reasons.
      
      The feature was previously impemented in -ow as a configure option.
      
      [akpm@linux-foundation.org: fix documentation, per Randy]
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: readability/conventionality tweaks]
      [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@canonical.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Solar Designer <solar@openwall.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b34a6b1d
    • K
      cpumask: add cpumask_var_t documentation · a64a26e8
      KOSAKI Motohiro 提交于
      cpumask_var_t has one notable difference from cpumask_t.  Add the
      explanation.
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Thiago Farina <tfransosi@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a64a26e8
    • M
      cpusets: randomize node rotor used in cpuset_mem_spread_node() · 778d3b0f
      Michal Hocko 提交于
      [ This patch has already been accepted as commit 0ac0c0d0 but later
        reverted (commit 35926ff5) because it itroduced arch specific
        __node_random which was defined only for x86 code so it broke other
        archs.  This is a followup without any arch specific code.  Other than
        that there are no functional changes.]
      
      Some workloads that create a large number of small files tend to assign
      too many pages to node 0 (multi-node systems).  Part of the reason is
      that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
      at node 0 for newly created tasks.
      
      This patch changes the rotor to be initialized to a random node number
      of the cpuset.
      
      [akpm@linux-foundation.org: fix layout]
      [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
      [mhocko@suse.cz: Make it arch independent]
      [akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
      Signed-off-by: NJack Steiner <steiner@sgi.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Paul Menage <menage@google.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Robin Holt <holt@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Menage <menage@google.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      778d3b0f
    • K
      memcg: add memory.vmscan_stat · 82f9d486
      KAMEZAWA Hiroyuki 提交于
      The commit log of 0ae5e89c ("memcg: count the soft_limit reclaim
      in...") says it adds scanning stats to memory.stat file.  But it doesn't
      because we considered we needed to make a concensus for such new APIs.
      
      This patch is a trial to add memory.scan_stat. This shows
        - the number of scanned pages(total, anon, file)
        - the number of rotated pages(total, anon, file)
        - the number of freed pages(total, anon, file)
        - the number of elaplsed time (including sleep/pause time)
      
        for both of direct/soft reclaim.
      
      The biggest difference with oringinal Ying's one is that this file
      can be reset by some write, as
      
        # echo 0 ...../memory.scan_stat
      
      Example of output is here. This is a result after make -j 6 kernel
      under 300M limit.
      
        [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
        [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
        scanned_pages_by_limit 9471864
        scanned_anon_pages_by_limit 6640629
        scanned_file_pages_by_limit 2831235
        rotated_pages_by_limit 4243974
        rotated_anon_pages_by_limit 3971968
        rotated_file_pages_by_limit 272006
        freed_pages_by_limit 2318492
        freed_anon_pages_by_limit 962052
        freed_file_pages_by_limit 1356440
        elapsed_ns_by_limit 351386416101
        scanned_pages_by_system 0
        scanned_anon_pages_by_system 0
        scanned_file_pages_by_system 0
        rotated_pages_by_system 0
        rotated_anon_pages_by_system 0
        rotated_file_pages_by_system 0
        freed_pages_by_system 0
        freed_anon_pages_by_system 0
        freed_file_pages_by_system 0
        elapsed_ns_by_system 0
        scanned_pages_by_limit_under_hierarchy 9471864
        scanned_anon_pages_by_limit_under_hierarchy 6640629
        scanned_file_pages_by_limit_under_hierarchy 2831235
        rotated_pages_by_limit_under_hierarchy 4243974
        rotated_anon_pages_by_limit_under_hierarchy 3971968
        rotated_file_pages_by_limit_under_hierarchy 272006
        freed_pages_by_limit_under_hierarchy 2318492
        freed_anon_pages_by_limit_under_hierarchy 962052
        freed_file_pages_by_limit_under_hierarchy 1356440
        elapsed_ns_by_limit_under_hierarchy 351386416101
        scanned_pages_by_system_under_hierarchy 0
        scanned_anon_pages_by_system_under_hierarchy 0
        scanned_file_pages_by_system_under_hierarchy 0
        rotated_pages_by_system_under_hierarchy 0
        rotated_anon_pages_by_system_under_hierarchy 0
        rotated_file_pages_by_system_under_hierarchy 0
        freed_pages_by_system_under_hierarchy 0
        freed_anon_pages_by_system_under_hierarchy 0
        freed_file_pages_by_system_under_hierarchy 0
        elapsed_ns_by_system_under_hierarchy 0
      
      total_xxxx is for hierarchy management.
      
      This will be useful for further memcg developments and need to be
      developped before we do some complicated rework on LRU/softlimit
      management.
      
      This patch adds a new struct memcg_scanrecord into scan_control struct.
      sc->nr_scanned at el is not designed for exporting information.  For
      example, nr_scanned is reset frequentrly and incremented +2 at scanning
      mapped pages.
      
      To avoid complexity, I added a new param in scan_control which is for
      exporting scanning score.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Cc: Andrew Bresticker <abrestic@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      82f9d486
    • K
      memcg: consolidate memory cgroup lru stat functions · bb2a0de9
      KAMEZAWA Hiroyuki 提交于
      In mm/memcontrol.c, there are many lru stat functions as..
      
        mem_cgroup_zone_nr_lru_pages
        mem_cgroup_node_nr_file_lru_pages
        mem_cgroup_nr_file_lru_pages
        mem_cgroup_node_nr_anon_lru_pages
        mem_cgroup_nr_anon_lru_pages
        mem_cgroup_node_nr_unevictable_lru_pages
        mem_cgroup_nr_unevictable_lru_pages
        mem_cgroup_node_nr_lru_pages
        mem_cgroup_nr_lru_pages
        mem_cgroup_get_local_zonestat
      
      Some of them are under #ifdef MAX_NUMNODES >1 and others are not.
      This seems bad. This patch consolidates all functions into
      
        mem_cgroup_zone_nr_lru_pages()
        mem_cgroup_node_nr_lru_pages()
        mem_cgroup_nr_lru_pages()
      
      For these functions, "which LRU?" information is passed by a mask.
      
      example:
        mem_cgroup_nr_lru_pages(mem, BIT(LRU_ACTIVE_ANON))
      
      And I added some macro as ALL_LRU, ALL_LRU_FILE, ALL_LRU_ANON.
      
      example:
        mem_cgroup_nr_lru_pages(mem, ALL_LRU)
      
      BTW, considering layout of NUMA memory placement of counters, this patch seems
      to be better.
      
      Now, when we gather all LRU information, we scan in following orer
          for_each_lru -> for_each_node -> for_each_zone.
      
      This means we'll touch cache lines in different node in turn.
      
      After patch, we'll scan
          for_each_node -> for_each_zone -> for_each_lru(mask)
      
      Then, we'll gather information in the same cacheline at once.
      
      [akpm@linux-foundation.org: fix warnigns, build error]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bb2a0de9
    • K
      memcg: export memory cgroup's swappiness with mem_cgroup_swappiness() · 1f4c025b
      KAMEZAWA Hiroyuki 提交于
      Each memory cgroup has a 'swappiness' value which can be accessed by
      get_swappiness(memcg).  The major user is try_to_free_mem_cgroup_pages()
      and swappiness is passed by argument.  It's propagated by scan_control.
      
      get_swappiness() is a static function but some planned updates will need
      to get swappiness from files other than memcontrol.c This patch exports
      get_swappiness() as mem_cgroup_swappiness().  With this, we can remove the
      argument of swapiness from try_to_free...  and drop swappiness from
      scan_control.  only memcg uses it.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f4c025b
    • S
      libceph: don't time out osd requests that haven't been received · 4cf9d544
      Sage Weil 提交于
      Keep track of when an outgoing message is ACKed (i.e., the server fully
      received it and, presumably, queued it for processing).  Time out OSD
      requests only if it's been too long since they've been received.
      
      This prevents timeouts and connection thrashing when the OSDs are simply
      busy and are throttling the requests they read off the network.
      Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net>
      Signed-off-by: NSage Weil <sage@newdream.net>
      4cf9d544
  2. 26 7月, 2011 26 次提交