1. 29 Sep, 2014 1 commit
  2. 06 Sep, 2014 3 commits
    • clocksource: sh_tmu: Document r8a7779 binding · fb0eee2f
      Authored by Simon Horman
      In general Renesas hardware is not documented to the extent
      where the relationship between IP blocks on different SoCs can be assumed
      although they may appear to operate the same way. Furthermore the
      documentation typically does not specify a version for individual
      IP blocks. For these reasons a convention of using the SoC name in place
      of a version and providing SoC-specific compat strings has been adopted.
      
      Although not universally liked this convention is used in the bindings
      for a number of drivers for Renesas hardware. The purpose of this patch is
      to update the Renesas R-Car Timer Unit (TMU) driver to follow this
      convention.
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

      ---
      * I plan to follow up with a patch to use the new binding in the
        dtsi files for the r8a7779 SoC.
      commit 471269b790aec03385dc4fb127ed7094ff83c16d
      
      v2
      * Suggestions by Mark Rutland and Sergei Shtylyov
        - Compatible strings should be "one or more" not "one" of those listed
        - Describe the generic binding as covering any MTU2 device
        - Re-order compat strings from most to least specific
      
      v3
      * Suggested by Laurent Pinchart
        - Reword in keeping with a similar though more extensive patch for CMT
    • clocksource: sh_mtu2: Document r7s72100 binding · ffd24a54
      Authored by Simon Horman
      In general Renesas hardware is not documented to the extent
      where the relationship between IP blocks on different SoCs can be assumed
      although they may appear to operate the same way. Furthermore the
      documentation typically does not specify a version for individual
      IP blocks. For these reasons a convention of using the SoC name in place
      of a version and providing SoC-specific compat strings has been adopted.
      
      Although not universally liked this convention is used in the bindings
      for a number of drivers for Renesas hardware. The purpose of this patch is
      to update the Renesas R-Car Multi-Function Timer Pulse Unit 2 (MTU2) driver
      to follow this convention.
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

      ---
      * I plan to follow up with a patch to use the new binding in the
        dtsi files for the r7s72100 SoC.
      
      v2
      * Suggestions by Mark Rutland and Sergei Shtylyov
        - Compatible strings should be "one or more" not "one" of those listed
        - Describe the generic binding as covering any MTU2 device
        - Re-order compat strings from most to least specific
      
      v3
      * Suggested by Laurent Pinchart
        - Reword compat documentation for consistency with a more extensive
          CMT change
    • clocksource: sh_cmt: Document SoC specific bindings · 01fe3aaa
      Authored by Simon Horman
      In general Renesas hardware is not documented to the extent
      where the relationship between IP blocks on different SoCs can be assumed
      although they may appear to operate the same way. Furthermore the
      documentation typically does not specify a version for individual
      IP blocks. For these reasons a convention of using the SoC name in place
      of a version and providing SoC-specific compat strings has been adopted.
      
      Although not universally liked this convention is used in the bindings for
      a number of drivers for Renesas hardware. The purpose of this patch is to
      update the Renesas R-Car Compare Match Timer (CMT) driver to follow this
      convention.
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      
      ---
      * I plan to follow up with patches to use these new bindings in the
        dtsi files for the affected SoCs.
      
      v2
      * Reorder compat entries so more-specific entries and their fallbacks
        are grouped with the fallback entry coming last.
      * Explicitly document fallback
      
      v3
      * Avoid circular dependency in documentation of fallback
        behaviour of renesas,cmt-48-gen2
      * Use consistent case for SoC names in compat string descriptions
  3. 16 Aug, 2014 2 commits
  4. 15 Aug, 2014 1 commit
  5. 14 Aug, 2014 1 commit
  6. 12 Aug, 2014 2 commits
  7. 11 Aug, 2014 2 commits
  8. 10 Aug, 2014 4 commits
  9. 09 Aug, 2014 13 commits
    • panic: add TAINT_SOFTLOCKUP · 69361eef
      Authored by Josh Hunt
      This taint flag will be set if the system has ever entered a softlockup
      state.  Similar to TAINT_WARN it is useful to know whether or not the
      system has been in a softlockup state when debugging.
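
As a rough illustration of how such a flag might be checked, the sketch below tests one bit of a taint mask (the kind of value exposed in /proc/sys/kernel/tainted). The bit position 14 is an assumption based on kernel headers rather than on this commit message, and `taint_mask_has_softlockup()` is a hypothetical helper name.

```c
#include <stdbool.h>

/* Assumed bit position of TAINT_SOFTLOCKUP; verify against the kernel headers. */
#define TAINT_SOFTLOCKUP_BIT 14

/* Hypothetical helper: true if the softlockup bit is set in a taint mask. */
static bool taint_mask_has_softlockup(unsigned long mask)
{
    return ((mask >> TAINT_SOFTLOCKUP_BIT) & 1UL) != 0;
}
```

Because the flag is only ever set, never cleared, a set bit says the system has been in a softlockup state at some point since boot, not that it is in one now.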
      
      [akpm@linux-foundation.org: apply the taint before calling panic()]
      Signed-off-by: Josh Hunt <johunt@akamai.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rapidio/tsi721_dma: rework scatter-gather list handling · 50835e97
      Authored by Alexandre Bounine
      Rework Tsi721 RapidIO DMA engine support to allow handling data
      scatter/gather lists longer than the number of hardware buffer
      descriptors in the DMA channel's descriptor list.

      The current implementation of Tsi721 DMA transfers requires that the
      number of entries in a scatter/gather list provided by a caller of
      dmaengine_prep_rio_sg() not exceed the number of allocated hardware
      buffer descriptors.

      This patch removes the limitation by processing long scatter/gather lists
      in sections, each of which can be transferred using a hardware descriptor
      ring of the configured size.  It also introduces a module parameter
      "dma_desc_per_channel" to allow run-time configuration of Tsi721 hardware
      buffer descriptor rings.
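
The idea of walking a long list in ring-sized sections can be sketched roughly as follows. This is a simplified user-space model, not the driver's code: `submit_in_sections()` and the `submit_section_fn` callback are hypothetical names, and the sketch ignores any descriptors the real hardware might reserve for its own use.

```c
#include <stddef.h>

/*
 * Hypothetical callback standing in for "fill the descriptor ring and
 * start the transfer" for entries [first, first + count).
 */
typedef void (*submit_section_fn)(size_t first, size_t count, void *ctx);

/*
 * Walk a scatter/gather list of sg_len entries in sections of at most
 * ring_size entries each. Returns the number of sections submitted,
 * or 0 if ring_size is 0 or the list is empty.
 */
static size_t submit_in_sections(size_t sg_len, size_t ring_size,
                                 submit_section_fn submit, void *ctx)
{
    size_t first = 0;
    size_t sections = 0;

    if (ring_size == 0)
        return 0;

    while (first < sg_len) {
        size_t count = sg_len - first;

        if (count > ring_size)
            count = ring_size;      /* clamp to what the ring can hold */
        if (submit)
            submit(first, count, ctx);
        first += count;
        sections++;
    }
    return sections;
}
```

With a ring of 4 descriptors, a 10-entry list would go out as three sections of 4, 4 and 2 entries.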
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
      Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add /sys/fs/nilfs2/<device>/mounted_snapshots/<snapshot> group · a5a7332a
      Authored by Vyacheslav Dubeyko
      This patch adds creation of <snapshot> group for every mounted
      snapshot in /sys/fs/nilfs2/<device>/mounted_snapshots group.
      
      The group contains details about mounted snapshot:
      (1) inodes_count - show number of inodes for snapshot.
      (2) blocks_count - show number of blocks for snapshot.
      Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add /sys/fs/nilfs2/<device>/mounted_snapshots group · a2ecb791
      Authored by Vyacheslav Dubeyko
      This patch adds creation of /sys/fs/nilfs2/<device>/mounted_snapshots
      group.
      
      The mounted_snapshots group contains group for every
      mounted snapshot.
      Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add /sys/fs/nilfs2/<device>/checkpoints group · 02a0ba1c
      Authored by Vyacheslav Dubeyko
      This patch adds creation of /sys/fs/nilfs2/<device>/checkpoints
      group.
      
      The checkpoints group contains attributes that describe
      details about volume's checkpoints:
      (1) checkpoints_number - show number of checkpoints on volume.
      (2) snapshots_number - show number of snapshots on volume.
      (3) last_seg_checkpoint - show checkpoint number of the latest segment.
      (4) next_checkpoint - show next checkpoint number.
      Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add /sys/fs/nilfs2/<device>/segments group · ef43d5cd
      Authored by Vyacheslav Dubeyko
      This patch adds creation of /sys/fs/nilfs2/<device>/segments
      group.
      
      The segments group contains attributes that describe
      details about volume's segments:
      (1) segments_number - show number of segments on volume.
      (2) blocks_per_segment - show number of blocks in segment.
      (3) clean_segments - show count of clean segments.
      (4) dirty_segments - show count of dirty segments.
      Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add /sys/fs/nilfs2/<device>/segctor group · abc968db
      Authored by Vyacheslav Dubeyko
      This patch adds creation of /sys/fs/nilfs2/<device>/segctor
      group.
      
      The segctor group contains attributes that describe
      segctor thread activity details:
      (1) last_pseg_block - show start block number of the latest segment.
      (2) last_seg_sequence - show sequence value of the latest segment.
      (3) last_seg_checkpoint - show checkpoint number of the latest segment.
      (4) current_seg_sequence - show segment sequence counter.
      (5) current_last_full_seg - show index number of the latest full segment.
      (6) next_full_seg - show index number of the full segment index
      to be used next.
      (7) next_pseg_offset - show offset of next partial segment in
      the current full segment.
      (8) next_checkpoint - show next checkpoint number.
      (9) last_seg_write_time - show write time of the last segment
      in human-readable format.
      (10) last_seg_write_time_secs - show write time of the last segment
      in seconds.
      (11) last_nongc_write_time - show write time of the last segment
      not for cleaner operation in human-readable format.
      (12) last_nongc_write_time_secs - show write time of the last segment
      not for cleaner operation in seconds.
      (13) dirty_data_blocks_count - show number of dirty data blocks.
      Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add /sys/fs/nilfs2/<device>/superblock group · caa05d49
      Authored by Vyacheslav Dubeyko
      This patch adds creation of /sys/fs/nilfs2/<device>/superblock
      group.
      
      The superblock group contains attributes that describe
      superblock's details:
      (1) sb_write_time - show previous write time of super block in
      human-readable format.
      (2) sb_write_time_secs - show previous write time of super block
      in seconds.
      (3) sb_write_count - show write count of super block.
      (4) sb_update_frequency - show/set interval of periodical update
      of superblock (in seconds). You can set preferable frequency of
      superblock update by command:
      
      echo <value> > /sys/fs/nilfs2/<device>/superblock/sb_update_frequency
      Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add /sys/fs/nilfs2/<device> group · da7141fb
      Authored by Vyacheslav Dubeyko
      This patch adds creation of /sys/fs/nilfs2/<device> group.
      
      The <device> group contains attributes that describe file
      system partition's details:
      (1) revision - show NILFS file system revision.
      (2) blocksize - show volume block size in bytes.
      (3) device_size - show volume size in bytes.
      (4) free_blocks - show count of free blocks on volume.
      (5) uuid - show volume's UUID.
      (6) volume_name - show volume's name.
      Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add /sys/fs/nilfs2/features group · aebe17f6
      Authored by Vyacheslav Dubeyko
      This patchset implements creation of sysfs groups and attributes with
      the purpose to show NILFS2 volume details, internal state of the driver
      and to manage internal state of NILFS2 driver.
      
      Sysfs is a virtual file system that exports information about devices
      and drivers from the kernel device model to user space, and is also used
      for configuration.  NILFS2 is a complex file system that has segctor
      thread, GC thread, checkpoint/snapshot model and so on.  Sysfs namespace
      provides native and easy way for: (1) getting info and statistics about
      volume state; (2) getting info and configuration of internal subsystems
      (segctor thread); (3) snapshots management.
      
      Suggested patchset provides basis for managing segctor thread behaviour
      and manipulation by snapshots.  Currently, it informs only about segctor
      thread's internal parameters and about mounted snapshots.  But sysfs
      interface can provide easy and simple way for deep management of segctor
      thread and snapshots.
      
      This patchset provides opportunity to manage interval of periodical
      update of superblock (in seconds).  Default value is 10 seconds.  Now a
      user can increase this value by means of
      nilfs2/<device>/superblock/sb_update_frequency attribute in the case of
      necessity.
      
      The patchset also makes it easy to get information about key volume
      parameters (free blocks, superblock write count, superblock update
      frequency, latest segment info, dirty data blocks count, count of
      clean segments, count of dirty segments and so on) in real time.  Such
      information can be used in scripts for subtle management of the
      filesystem.
      
      Implemented functionality creates such groups:
      (1) /sys/fs/nilfs2 - root group
      (2) /sys/fs/nilfs2/features - group contains attributes that describe NILFS
      file system driver features
      (3) /sys/fs/nilfs2/<device> - group contains attributes that describe file
      system partition's details
      (4) /sys/fs/nilfs2/<device>/superblock - group contains attributes that describe
      superblock's details
      (5) /sys/fs/nilfs2/<device>/segctor - group contains attributes that describe
      segctor thread activity details
      (6) /sys/fs/nilfs2/<device>/segments - group contains attributes that describe
      details about volume's segments
      (7) /sys/fs/nilfs2/<device>/checkpoints - group contains attributes that describe
      details about volume's checkpoints
      (8) /sys/fs/nilfs2/<device>/mounted_snapshots - group contains group for every
      mounted snapshot
      (9) /sys/fs/nilfs2/<device>/mounted_snapshots/<snapshot> - group contains
      details about mounted snapshot
      
      This patch (of 9):
      
      This patch adds code of creation /sys/fs/nilfs2 group and
      /sys/fs/nilfs2/features group.
      
      The features group contains attributes that describe NILFS
      file system driver features:
      (1) revision - show current revision of NILFS file system driver.
      
      There are two formats of timestamp output - seconds and human-readable
      format.  Every timestamp shown has two sysfs files (time-<xxx> and
      time-<xxx>-secs).  One sysfs file (time-<xxx>) shows time in
      human-readable format.  The other sysfs file (time-<xxx>-secs) shows
      time in seconds.
      
      It was reported by Michael Semon that timestamp output in human-readable
      format should be changed from "2014-4-12 14:5:38" to "2014-04-12
      14:05:38".  Second version of the patch fixes this issue.
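
The difference between the two human-readable formats comes down to zero-padded fields. A minimal sketch of the padded form is shown below; `format_timestamp()` is a hypothetical helper for illustration and is not taken from the patch itself.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "YYYY-MM-DD HH:MM:SS" into buf with every
 * field after the year zero-padded to two digits. Returns what snprintf()
 * returns, i.e. the length the full string needs.
 */
static int format_timestamp(char *buf, size_t len, int year, int mon,
                            int day, int hour, int min, int sec)
{
    /* "%02d" pads to two digits; a plain "%d" would yield "2014-4-12 14:5:38". */
    return snprintf(buf, len, "%d-%02d-%02d %02d:%02d:%02d",
                    year, mon, day, hour, min, sec);
}
```

Fixed-width fields also keep the output easy to sort and parse in scripts.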
      Reported-by: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rtc: add pcf85063 support · 796b7abb
      Authored by Søren Andersen
      Add support for the pcf85063 rtc chip.

      [akpm@linux-foundation.org: fix comment typo, tweak coding style]
      Signed-off-by: Soeren Andersen <san@rosetechnology.dk>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: rewrite uncharge API · 0a31bc97
      Authored by Johannes Weiner
      The memcg uncharging code that is involved towards the end of a page's
      lifetime - truncation, reclaim, swapout, migration - is impressively
      complicated and fragile.
      
      Because anonymous and file pages were always charged before they had their
      page->mapping established, uncharges had to happen when the page type
      could still be known from the context; as in unmap for anonymous, page
      cache removal for file and shmem pages, and swap cache truncation for swap
      pages.  However, these operations happen well before the page is actually
      freed, and so a lot of synchronization is necessary:
      
      - Charging, uncharging, page migration, and charge migration all need
        to take a per-page bit spinlock as they could race with uncharging.
      
      - Swap cache truncation happens during both swap-in and swap-out, and
        possibly repeatedly before the page is actually freed.  This means
        that the memcg swapout code is called from many contexts that make
        no sense and it has to figure out the direction from page state to
        make sure memory and memory+swap are always correctly charged.
      
      - On page migration, the old page might be unmapped but then reused,
        so memcg code has to prevent untimely uncharging in that case.
        Because this code - which should be a simple charge transfer - is so
        special-cased, it is not reusable for replace_page_cache().
      
      But now that charged pages always have a page->mapping, introduce
      mem_cgroup_uncharge(), which is called after the final put_page(), when we
      know for sure that nobody is looking at the page anymore.
      
      For page migration, introduce mem_cgroup_migrate(), which is called after
      the migration is successful and the new page is fully rmapped.  Because
      the old page is no longer uncharged after migration, prevent double
      charges by decoupling the page's memcg association (PCG_USED and
      pc->mem_cgroup) from the page holding an actual charge.  The new bits
      PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
      to the new page during migration.
      
      mem_cgroup_migrate() is suitable for replace_page_cache() as well,
      which gets rid of mem_cgroup_replace_page_cache().  However, care
      needs to be taken because both the source and the target page can
      already be charged and on the LRU when fuse is splicing: grab the page
      lock on the charge moving side to prevent changing pc->mem_cgroup of a
      page under migration.  Also, the lruvecs of both pages change as we
      uncharge the old and charge the new during migration, and putback may
      race with us, so grab the lru lock and isolate the pages iff on LRU to
      prevent races and ensure the pages are on the right lruvec afterward.
      
      Swap accounting is massively simplified: because the page is no longer
      uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
      transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
      before the final put_page() in page reclaim.
      
      Finally, page_cgroup changes are now protected by whatever protection the
      page itself offers: anonymous pages are charged under the page table lock,
      whereas page cache insertions, swapin, and migration hold the page lock.
      Uncharging happens under full exclusion with no outstanding references.
      Charging and uncharging also ensure that the page is off-LRU, which
      serializes against charge migration.  Remove the very costly page_cgroup
      lock and set pc->flags non-atomically.
      
      [mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
      [vdavydov@parallels.com: fix flags definition]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Tested-by: Jet Chen <jet.chen@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: rewrite charge API · 00501b53
      Authored by Johannes Weiner
      These patches rework memcg charge lifetime to integrate more naturally
      with the lifetime of user pages.  This drastically simplifies the code and
      reduces charging and uncharging overhead.  The most expensive part of
      charging and uncharging is the page_cgroup bit spinlock, which is removed
      entirely after this series.
      
      Here are the top-10 profile entries of a stress test that reads a 128G
      sparse file on a freshly booted box, without even a dedicated cgroup (i.e.
       executing in the root memcg).  Before:
      
          15.36%              cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.31%              cat  [kernel.kallsyms]   [k] memset
          11.48%              cat  [kernel.kallsyms]   [k] do_mpage_readpage
           4.23%              cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.38%              cat  [kernel.kallsyms]   [k] put_page
           2.32%              cat  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
           2.18%          kswapd0  [kernel.kallsyms]   [k] __mem_cgroup_uncharge_common
           1.92%          kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.86%              cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.62%              cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
      
      After:
      
          15.67%           cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.48%           cat  [kernel.kallsyms]   [k] memset
          11.42%           cat  [kernel.kallsyms]   [k] do_mpage_readpage
           3.98%           cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.46%           cat  [kernel.kallsyms]   [k] put_page
           2.13%       kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.88%           cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.67%           cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
           1.39%       kswapd0  [kernel.kallsyms]   [k] free_pcppages_bulk
           1.30%           cat  [kernel.kallsyms]   [k] kfree
      
      As you can see, the memcg footprint has shrunk quite a bit.
      
         text    data     bss     dec     hex filename
        37970    9892     400   48262    bc86 mm/memcontrol.o.old
        35239    9892     400   45531    b1db mm/memcontrol.o
      
      This patch (of 4):
      
      The memcg charge API charges pages before they are rmapped - i.e.  have an
      actual "type" - and so every callsite needs its own set of charge and
      uncharge functions to know what type is being operated on.  Worse,
      uncharge has to happen from a context that is still type-specific, rather
      than at the end of the page's lifetime with exclusive access, and so
      requires a lot of synchronization.
      
      Rewrite the charge API to provide a generic set of try_charge(),
      commit_charge() and cancel_charge() transaction operations, much like
      what's currently done for swap-in:
      
        mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
        pages from the memcg if necessary.
      
        mem_cgroup_commit_charge() commits the page to the charge once it
        has a valid page->mapping and PageAnon() reliably tells the type.
      
        mem_cgroup_cancel_charge() aborts the transaction.
      
      This reduces the charge API and enables subsequent patches to
      drastically simplify uncharging.
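
The try/commit/cancel shape described above can be modelled in a few lines of user-space C. This is only a toy under stated assumptions: the `toy_*` names and structures are hypothetical, the "charge" is a plain counter against a limit, and no reclaim is attempted, unlike the real mem_cgroup_try_charge().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for a memory cgroup and a page (hypothetical). */
struct toy_memcg { size_t usage; size_t limit; };
struct toy_page  { struct toy_memcg *memcg; bool has_mapping; };

/* Step 1: reserve one page worth of charge; fail if the limit is reached. */
static int toy_try_charge(struct toy_memcg *cg, struct toy_memcg **reserved)
{
    if (cg->usage >= cg->limit)
        return -1;              /* the real code would try reclaim first */
    cg->usage++;
    *reserved = cg;
    return 0;
}

/* Step 2a: bind the reserved charge to the page once its type is known. */
static void toy_commit_charge(struct toy_page *page, struct toy_memcg *reserved)
{
    assert(page->has_mapping);  /* commit only after the mapping is set up */
    page->memcg = reserved;
}

/* Step 2b: give the reservation back if the operation is abandoned. */
static void toy_cancel_charge(struct toy_memcg *reserved)
{
    reserved->usage--;
}
```

A caller reserves first, then either commits once the page is set up or cancels on an error path, so the page's type never has to be known at reservation time.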
      
      As pages need to be committed after rmap is established but before they
      are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
      additions again.  Revive lru_cache_add_active_or_unevictable().
      
      [hughd@google.com: fix shmem_unuse]
      [hughd@google.com: Add comments on the private use of -EAGAIN]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 08 Aug, 2014 4 commits
  11. 07 Aug, 2014 4 commits
    • L
    • list: fix order of arguments for hlist_add_after(_rcu) · 1d023284
      Authored by Ken Helias
      All other add functions for lists have the new item as first argument
      and the position where it is added as second argument.  This was changed
      for no good reason in this function and makes using it unnecessarily
      confusing.
      
      The name was changed to hlist_add_behind() to cause unconverted code to
      generate a compile error instead of using the wrong parameter order.
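
To make the argument order concrete, here is a small user-space re-implementation of the two helpers involved. It is a sketch modelled on the kernel's hlist layout rather than a copy of include/linux/list.h; the point is only that the new node comes first and the existing position second.

```c
#include <stddef.h>

/* Minimal hlist types, modelled on the kernel's layout. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Insert n at the front of list h: new item first, position second. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Insert n directly after prev: same "new item, then position" order. */
static void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev)
{
    n->next = prev->next;
    prev->next = n;
    n->pprev = &prev->next;
    if (n->next)
        n->next->pprev = &n->next;
}
```

Since both parameters have the same type, a swapped call would have compiled silently under the old name, which is why the rename is what catches unconverted callers.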
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Ken Helias <kenhelias@firemail.de>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[intel driver bits]
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • printk: allow increasing the ring buffer depending on the number of CPUs · 23b2899f
      Authored by Luis R. Rodriguez
      The default size of the ring buffer is too small for machines with a
      large number of CPUs under heavy load.  What ends up happening when
      debugging is that the ring buffer wraps around and overwrites old
      messages, making debugging impossible unless the size is passed as a
      kernel parameter.  An idle system upon boot up will on average spew out
      only about one or two extra lines, but where this really matters is
      under heavy load, and that will vary widely depending on the system and
      environment.

      There are mechanisms to help increase the kernel ring buffer for tracing
      through debugfs, and those interfaces even allow growing the kernel ring
      buffer per CPU.  We also have a static value which can be passed upon
      boot.  Relying on debugfs however is not ideal for production, and
      the value passed upon bootup can only be used *after* an issue has
      crept up.  Instead of being reactive this adds a proactive measure
      which lets you scale the amount of contributions you'd expect to the
      kernel ring buffer under load by each CPU in the worst case scenario.
      
      We use num_possible_cpus() to avoid complexities which could be
      introduced by dynamically changing the ring buffer size at run time.
      num_possible_cpus() lets us use the upper limit on the possible number
      of CPUs, therefore avoiding having to deal with hotplugging CPUs on and
      off.

      This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT,
      which is used to specify the maximum amount of contributions to the
      kernel ring buffer in the worst case before the kernel ring buffer flips
      over; the size is specified as a power of 2.  The total amount of
      contributions made by each CPU must be greater than half of the default
      kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger
      an increase upon bootup.  The kernel ring buffer is increased to the
      next power of two that would fit the required minimum kernel ring buffer
      size plus the additional CPU contribution.

      For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at least
      128 KB contributions by other CPUs in order to trigger an increase of
      the kernel ring buffer.  With a LOG_CPU_MAX_BUF_SHIFT of 12 (4 KB) you'd
      require more than 64 possible CPUs to trigger an increase.  If you had
      128 possible CPUs the amount of minimum required kernel ring buffer
      bumps to:
      
         ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
      
      Since we require the ring buffer to be a power of two the new required
      size would be 1024 KB.
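
The sizing rule described above can be worked through in a short user-space sketch. `log_buf_len_for_cpus()` and `roundup_pow2()` are hypothetical helpers written from the description in this message, not the kernel's own functions.

```c
#include <stddef.h>

/* Smallest power of two that is >= v (for v >= 1). */
static size_t roundup_pow2(size_t v)
{
    size_t p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Hypothetical helper following the rule in the text: the extra per-CPU
 * contribution only counts if it exceeds half of the default buffer, and
 * the result is rounded up to a power of two.
 */
static size_t log_buf_len_for_cpus(unsigned int log_buf_shift,
                                   unsigned int log_cpu_max_buf_shift,
                                   unsigned int possible_cpus)
{
    size_t base = (size_t)1 << log_buf_shift;
    size_t cpu_extra;

    if (possible_cpus <= 1)
        return base;

    /* Every CPU other than the first contributes its worst-case share. */
    cpu_extra = (size_t)(possible_cpus - 1) << log_cpu_max_buf_shift;
    if (cpu_extra <= base / 2)
        return base;            /* not enough to justify growing */

    return roundup_pow2(base + cpu_extra);
}
```

Calling `log_buf_len_for_cpus(18, 12, 128)` follows the example in the text: 256 KB plus 127 contributions of 4 KB is 764 KB, which rounds up to 1024 KB.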
      
      These CPU contributions are ignored when the "log_buf_len" kernel
      parameter is used, as it forces the exact size of the ring buffer to an
      expected power of two value.
      
      [pmladek@suse.cz: fix build]
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: Petr Mladek <pmladek@suse.cz>
      Tested-by: Davidlohr Bueso <davidlohr@hp.com>
      Tested-by: Petr Mladek <pmladek@suse.cz>
      Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Arun KS <arunks.linux@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: trace-vmscan-postprocess.pl: report the number of file/anon pages respectively · 2c51856c
      Authored by Chen Yucong
      Until now, the reporting from trace-vmscan-postprocess.pl has not been
      very useful because we cannot directly use this script for checking the
      file/anon ratio of scanning.  This patch reports separately the number
      of file and anon pages which were scanned/reclaimed by kswapd or
      direct-reclaim.  Sample output looks something like the following.
      
      Summary
      Direct reclaims:                          8823
      Direct reclaim pages scanned:             2438797
      Direct reclaim file pages scanned:        1315200
      Direct reclaim anon pages scanned:        1123597
      Direct reclaim pages reclaimed:           446139
      Direct reclaim file pages reclaimed:      378668
      Direct reclaim anon pages reclaimed:      67471
      Direct reclaim write file sync I/O:       0
      Direct reclaim write anon sync I/O:       0
      Direct reclaim write file async I/O:      0
      Direct reclaim write anon async I/O:      4240
      Wake kswapd requests:                     122310
      Time stalled direct reclaim:              13.78 seconds
      
      Kswapd wakeups:                           25817
      Kswapd pages scanned:                     170779115
      Kswapd file pages scanned:                162725123
      Kswapd anon pages scanned:                8053992
      Kswapd pages reclaimed:                   129065738
      Kswapd file pages reclaimed:              128500930
      Kswapd anon pages reclaimed:              564808
      Kswapd reclaim write file sync I/O:       0
      Kswapd reclaim write anon sync I/O:       0
      Kswapd reclaim write file async I/O:      36
      Kswapd reclaim write anon async I/O:      730730
      Time kswapd awake:                        1015.50 seconds
      Signed-off-by: Chen Yucong <slaoub@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 06 Aug, 2014 3 commits