1. 27 7月, 2011 4 次提交
    • K
      memcg: add memory.vmscan_stat · 82f9d486
      KAMEZAWA Hiroyuki 提交于
      The commit log of 0ae5e89c ("memcg: count the soft_limit reclaim
      in...") says it adds scanning stats to memory.stat file.  But it doesn't
      because we considered we needed to make a concensus for such new APIs.
      
      This patch is a trial to add memory.scan_stat. This shows
        - the number of scanned pages(total, anon, file)
        - the number of rotated pages(total, anon, file)
        - the number of freed pages(total, anon, file)
        - the number of elaplsed time (including sleep/pause time)
      
        for both of direct/soft reclaim.
      
      The biggest difference with oringinal Ying's one is that this file
      can be reset by some write, as
      
        # echo 0 ...../memory.scan_stat
      
      Example of output is here. This is a result after make -j 6 kernel
      under 300M limit.
      
        [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
        [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
        scanned_pages_by_limit 9471864
        scanned_anon_pages_by_limit 6640629
        scanned_file_pages_by_limit 2831235
        rotated_pages_by_limit 4243974
        rotated_anon_pages_by_limit 3971968
        rotated_file_pages_by_limit 272006
        freed_pages_by_limit 2318492
        freed_anon_pages_by_limit 962052
        freed_file_pages_by_limit 1356440
        elapsed_ns_by_limit 351386416101
        scanned_pages_by_system 0
        scanned_anon_pages_by_system 0
        scanned_file_pages_by_system 0
        rotated_pages_by_system 0
        rotated_anon_pages_by_system 0
        rotated_file_pages_by_system 0
        freed_pages_by_system 0
        freed_anon_pages_by_system 0
        freed_file_pages_by_system 0
        elapsed_ns_by_system 0
        scanned_pages_by_limit_under_hierarchy 9471864
        scanned_anon_pages_by_limit_under_hierarchy 6640629
        scanned_file_pages_by_limit_under_hierarchy 2831235
        rotated_pages_by_limit_under_hierarchy 4243974
        rotated_anon_pages_by_limit_under_hierarchy 3971968
        rotated_file_pages_by_limit_under_hierarchy 272006
        freed_pages_by_limit_under_hierarchy 2318492
        freed_anon_pages_by_limit_under_hierarchy 962052
        freed_file_pages_by_limit_under_hierarchy 1356440
        elapsed_ns_by_limit_under_hierarchy 351386416101
        scanned_pages_by_system_under_hierarchy 0
        scanned_anon_pages_by_system_under_hierarchy 0
        scanned_file_pages_by_system_under_hierarchy 0
        rotated_pages_by_system_under_hierarchy 0
        rotated_anon_pages_by_system_under_hierarchy 0
        rotated_file_pages_by_system_under_hierarchy 0
        freed_pages_by_system_under_hierarchy 0
        freed_anon_pages_by_system_under_hierarchy 0
        freed_file_pages_by_system_under_hierarchy 0
        elapsed_ns_by_system_under_hierarchy 0
      
      total_xxxx is for hierarchy management.
      
      This will be useful for further memcg developments and need to be
      developped before we do some complicated rework on LRU/softlimit
      management.
      
      This patch adds a new struct memcg_scanrecord into scan_control struct.
      sc->nr_scanned at el is not designed for exporting information.  For
      example, nr_scanned is reset frequentrly and incremented +2 at scanning
      mapped pages.
      
      To avoid complexity, I added a new param in scan_control which is for
      exporting scanning score.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Cc: Andrew Bresticker <abrestic@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      82f9d486
    • K
      memcg: consolidate memory cgroup lru stat functions · bb2a0de9
      KAMEZAWA Hiroyuki 提交于
      In mm/memcontrol.c, there are many lru stat functions as..
      
        mem_cgroup_zone_nr_lru_pages
        mem_cgroup_node_nr_file_lru_pages
        mem_cgroup_nr_file_lru_pages
        mem_cgroup_node_nr_anon_lru_pages
        mem_cgroup_nr_anon_lru_pages
        mem_cgroup_node_nr_unevictable_lru_pages
        mem_cgroup_nr_unevictable_lru_pages
        mem_cgroup_node_nr_lru_pages
        mem_cgroup_nr_lru_pages
        mem_cgroup_get_local_zonestat
      
      Some of them are under #ifdef MAX_NUMNODES >1 and others are not.
      This seems bad. This patch consolidates all functions into
      
        mem_cgroup_zone_nr_lru_pages()
        mem_cgroup_node_nr_lru_pages()
        mem_cgroup_nr_lru_pages()
      
      For these functions, "which LRU?" information is passed by a mask.
      
      example:
        mem_cgroup_nr_lru_pages(mem, BIT(LRU_ACTIVE_ANON))
      
      And I added some macro as ALL_LRU, ALL_LRU_FILE, ALL_LRU_ANON.
      
      example:
        mem_cgroup_nr_lru_pages(mem, ALL_LRU)
      
      BTW, considering layout of NUMA memory placement of counters, this patch seems
      to be better.
      
      Now, when we gather all LRU information, we scan in following orer
          for_each_lru -> for_each_node -> for_each_zone.
      
      This means we'll touch cache lines in different node in turn.
      
      After patch, we'll scan
          for_each_node -> for_each_zone -> for_each_lru(mask)
      
      Then, we'll gather information in the same cacheline at once.
      
      [akpm@linux-foundation.org: fix warnigns, build error]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bb2a0de9
    • K
      memcg: export memory cgroup's swappiness with mem_cgroup_swappiness() · 1f4c025b
      KAMEZAWA Hiroyuki 提交于
      Each memory cgroup has a 'swappiness' value which can be accessed by
      get_swappiness(memcg).  The major user is try_to_free_mem_cgroup_pages()
      and swappiness is passed by argument.  It's propagated by scan_control.
      
      get_swappiness() is a static function but some planned updates will need
      to get swappiness from files other than memcontrol.c This patch exports
      get_swappiness() as mem_cgroup_swappiness().  With this, we can remove the
      argument of swapiness from try_to_free...  and drop swappiness from
      scan_control.  only memcg uses it.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f4c025b
    • S
      libceph: don't time out osd requests that haven't been received · 4cf9d544
      Sage Weil 提交于
      Keep track of when an outgoing message is ACKed (i.e., the server fully
      received it and, presumably, queued it for processing).  Time out OSD
      requests only if it's been too long since they've been received.
      
      This prevents timeouts and connection thrashing when the OSDs are simply
      busy and are throttling the requests they read off the network.
      Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net>
      Signed-off-by: NSage Weil <sage@newdream.net>
      4cf9d544
  2. 26 7月, 2011 29 次提交
  3. 25 7月, 2011 1 次提交
  4. 24 7月, 2011 6 次提交
    • T
      VFS : mount lock scalability for internal mounts · 423e0ab0
      Tim Chen 提交于
      For a number of file systems that don't have a mount point (e.g. sockfs
      and pipefs), they are not marked as long term. Therefore in
      mntput_no_expire, all locks in vfs_mount lock are taken instead of just
      local cpu's lock to aggregate reference counts when we release
      reference to file objects.  In fact, only local lock need to have been
      taken to update ref counts as these file systems are in no danger of
      going away until we are ready to unregister them.
      
      The attached patch marks file systems using kern_mount without
      mount point as long term.  The contentions of vfs_mount lock
      is now eliminated.  Before un-registering such file system,
      kern_unmount should be called to remove the long term flag and
      make the mount point ready to be freed.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      423e0ab0
    • K
      module: add /sys/module/<name>/uevent files · 88bfa324
      Kay Sievers 提交于
      Userspace wants to manage module parameters with udev rules.
      This currently only works for loaded modules, but not for
      built-in ones.
      
      To allow access to the built-in modules we need to
      re-trigger all module load events that happened before any
      userspace was running. We already do the same thing for all
      devices, subsystems(buses) and drivers.
      
      This adds the currently missing /sys/module/<name>/uevent files
      to all module entries.
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (split & trivial fix)
      88bfa324
    • K
      module: change attr callbacks to take struct module_kobject · 4befb026
      Kay Sievers 提交于
      This simplifies the next patch, where we have an attribute on a
      builtin module (ie. module == NULL).
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (split into 2)
      4befb026
    • J
      modules: add default loader hook implementations · 74e08fcf
      Jonas Bonn 提交于
      The module loader code allows architectures to hook into the code by
      providing a small number of entry points that each arch must implement.
      This patch provides __weakly linked generic implementations of these
      entry points for architectures that don't need to do anything special.
      Signed-off-by: NJonas Bonn <jonas@southpole.se>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      74e08fcf
    • X
      KVM: MMU: filter out the mmio pfn from the fault pfn · fce92dce
      Xiao Guangrong 提交于
      If the page fault is caused by mmio, the gfn can not be found in memslots, and
      'bad_pfn' is returned on gfn_to_hva path, so we can use 'bad_pfn' to identify
      the mmio page fault.
      And, to clarify the meaning of mmio pfn, we return fault page instead of bad
      page when the gfn is not allowd to prefetch
      Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      fce92dce
    • J
      ata: Add and use ata_print_version_once · 06296a1e
      Joe Perches 提交于
      Use a single mechanism to show driver version.
      Reduces text a tiny bit too.
      
      Remove uses of static int printed_version
      Add and use ata_print_version(const struct device *, const char *ver)
      and ata_print_version_once.
      
      $ size drivers/ata/built-in.*
         text	   data	    bss	    dec	    hex	filename
       544969	  73893	 116584	 735446	  b38d6	drivers/ata/built-in.allyesconfig.ata.o
       543870	  73893	 116592	 734355	  b34ad	drivers/ata/built-in.allyesconfig.print_once.o
       141328	  14689	   4220	 160237	  271ed	drivers/ata/built-in.defconfig.ata.o
       141212	  14689	   4220	 160121	  27179	drivers/ata/built-in.defconfig.print_once.o
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NJeff Garzik <jgarzik@pobox.com>
      06296a1e