1. 05 June 2014, 8 commits
    • Documentation: expand/clarify debug documentation · 6e099f55
      Committed by Dan Streetman
      The pr_debug() and related debug print macros all differ from the
      normal pr_XXX() macros in that the normal ones print unconditionally,
      while the debug macros are compiled out unless DEBUG is defined or
      CONFIG_DYNAMIC_DEBUG is set.  This isn't obvious, and the only ways to
      find it out are to review the actual printk.h code or to read
      CodingStyle, and even there the point is not highlighted.
      
      Change Documentation/CodingStyle to clearly indicate that pr_debug()
      and related debug printing macros behave differently from all other
      pr_XXX() macros, and attempt to clarify when and where the different
      debug printing methods might be used.
      
      Add a short comment to printk.h above the pr_XXX() macros indicating
      that while those macros print unconditionally, pr_debug() does not.
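
      As a minimal illustration of the difference (hypothetical module
      code, not part of this patch):

        #include <linux/module.h>
        #include <linux/printk.h>

        static int __init example_init(void)
        {
                pr_info("always printed\n");  /* unconditional, like all pr_XXX() */
                pr_debug("compiled out unless DEBUG or dynamic debug enables it\n");
                return 0;
        }
        module_init(example_init);
        MODULE_LICENSE("GPL");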
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Documentation/sysctl/vm.txt: clarify vfs_cache_pressure description · 4a0da71b
      Committed by Denys Vlasenko
      The existing description is worded in a way that almost encourages
      setting vfs_cache_pressure above 100, possibly way above it.

      Users are left in the dark about what this numeric value is - an int?
      a percentage?  what is the scale?

      As a result, we are getting reports of noticeable performance
      degradation from users who have set vfs_cache_pressure to ridiculously
      high values because they thought there was no downside to it.
      
      Code inspection makes it obvious that this value is treated as a
      percentage.  This patch changes the text to reflect that fact, and adds
      a cautionary paragraph advising against setting vfs_cache_pressure
      sky-high.
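
      As a rough sketch of the percentage semantics (simplified;
      example_pressure_ratio is an illustrative stand-in, not the exact
      kernel helper):

        #include <linux/dcache.h>       /* sysctl_vfs_cache_pressure */

        /* Shrinkers scale the number of dentry/inode objects they report
         * as reclaimable by vfs_cache_pressure, treated as a percentage:
         * 100 keeps the default balance, and values far above 100 mostly
         * add scanning overhead rather than freeing proportionally more. */
        static unsigned long example_pressure_ratio(unsigned long count)
        {
                return count * sysctl_vfs_cache_pressure / 100;
        }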
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory-failure.c: support use of a dedicated thread to handle SIGBUS(BUS_MCEERR_AO) · 3ba08129
      Committed by Naoya Horiguchi
      Currently the memory error handler handles action optional errors in a
      deferred manner by default.  If a recovery-aware application wants to
      handle such an error immediately, it can do so by setting the
      PF_MCE_EARLY flag.  However, the signal can be sent only to the main
      thread, which is problematic if the application wants a dedicated
      thread to handle such signals.
      
      So this patch adds dedicated-thread support to the memory error
      handler.  The PF_MCE_EARLY flag is kept per thread, so with this patch
      the AO signal is sent to the thread with the PF_MCE_EARLY flag set, not
      to the main thread.  To implement a dedicated thread, call prctl() to
      set PF_MCE_EARLY on that thread, as in the sketch below.
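
      A minimal userspace sketch of that opt-in (error handling reduced
      to perror()):

        #include <stdio.h>
        #include <sys/prctl.h>

        /* Run in the dedicated SIGBUS handler thread: request early
         * delivery of SIGBUS(BUS_MCEERR_AO) to this thread only. */
        static void enable_early_mce_kill(void)
        {
                if (prctl(PR_MCE_KILL, PR_MCE_KILL_SET,
                          PR_MCE_KILL_EARLY, 0, 0) < 0)
                        perror("prctl(PR_MCE_KILL)");
        }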
      
      The memory error handler collects the processes to be killed, so this
      patch makes it check the PF_MCE_EARLY flag on each thread in the
      collecting routines.
      
      There is no behavioral change for any of the non-early-kill cases.
      
      Tony said:
      
      : The old behavior was crazy - someone with a multithreaded process
      : might well expect that if they call prctl(PF_MCE_EARLY) in just one
      : thread, then that thread would see the SIGBUS with si_code =
      : BUS_MCEERR_AO - even if that thread wasn't the main thread for the
      : process.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: Kamil Iskra <iskra@mcs.anl.gov>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chen Gong <gong.chen@linux.jf.intel.com>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Documentation/memcg: warn about incomplete kmemcg state · 2ee06468
      Committed by Vladimir Davydov
      Kmemcg is currently under development and lacks some important
      features.  In particular, it does not support kmem reclaim on memory
      pressure inside a cgroup, which practically makes it unusable in real
      life.  Let's warn about this in both Kconfig and the documentation to
      prevent complaints from arising.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: disable zone_reclaim_mode by default · 4f9b16a6
      Committed by Mel Gorman
      When it was introduced, zone_reclaim_mode made sense: NUMA distances
      were punishing and workloads were generally partitioned to fit into a
      NUMA node.  NUMA machines are now common, but few of the workloads are
      NUMA-aware, and it's routine to see major performance degradation due
      to zone_reclaim_mode being enabled, yet relatively few people can
      identify the problem.

      Those that require zone_reclaim_mode are likely able to detect when it
      needs to be enabled and tune it appropriately, so let's have a sensible
      default for the bulk of users.
      
      This patch (of 2):
      
      zone_reclaim_mode causes processes to prefer reclaiming memory from the
      local node instead of spilling over to other nodes.  This made sense
      initially, when NUMA machines were almost exclusively HPC systems and
      the workload was partitioned into nodes.  The NUMA penalties were
      sufficiently high to justify reclaiming the memory.  On current
      machines and workloads it is often the case that zone_reclaim_mode
      destroys performance, but not all users know how to detect this.
      Favour the common case and disable it by default.  Users that are
      sophisticated enough to know they need zone_reclaim_mode will detect
      that and can enable it explicitly.
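
      For context, a hypothetical sketch of the kind of distance-based
      auto-enabling heuristic that a default-off policy does away with
      (illustrative only, not the exact code this patch touches):

        #include <linux/swap.h>      /* zone_reclaim_mode */
        #include <linux/topology.h>  /* node_distance(), RECLAIM_DISTANCE */

        /* Old-style behaviour: switch zone_reclaim_mode on whenever a node
         * is farther from the local node than RECLAIM_DISTANCE.  With a
         * default-off policy, the mode is only enabled by an explicit
         * write to /proc/sys/vm/zone_reclaim_mode. */
        static void maybe_enable_zone_reclaim(int node)
        {
                if (node_distance(node, numa_node_id()) > RECLAIM_DISTANCE)
                        zone_reclaim_mode = 1;
        }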
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: update documentation to hide information about SECTIONS and remove end_phys_index · 56a3c655
      Committed by Li Zhong
      It seems we all agree that information about SECTIONs, e.g. section
      size and sections per memory block, should be kept as kernel internals
      and not exposed to userspace.
      
      This patch updates Documentation/memory-hotplug.txt to refer to memory
      blocks instead of memory sections where appropriate, and adds a
      paragraph explaining that memory blocks are made of memory sections.
      The documentation update is mostly provided by Nathan.
      
      Also, end_phys_index in the code is actually not the end section id but
      the end memory block id, which should always be the same as phys_index,
      so it is removed here.
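
      For illustration, a small userspace reader of the attribute that
      remains (a sketch; memory0 is just an example block directory):

        #include <stdio.h>

        int main(void)
        {
                char buf[32];
                /* phys_index stays; end_phys_index, which always held the
                 * same value, is removed. */
                FILE *f = fopen("/sys/devices/system/memory/memory0/phys_index", "r");

                if (f && fgets(buf, sizeof(buf), f))
                        printf("memory block id: %s", buf);
                if (f)
                        fclose(f);
                return 0;
        }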
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: remove hierarchy restrictions for swappiness and oom_control · 3dae7fec
      Committed by Johannes Weiner
      Per-memcg swappiness and oom killing currently cannot be tweaked on a
      memcg that is part of a hierarchy but is not the root of that
      hierarchy.  Users have complained that they can't configure these
      tunables when they turn on hierarchy mode.  In fact, with hierarchy
      mode becoming the default, this restriction disables the tunables
      entirely.
      
      But there is no good reason for this restriction.  The settings for
      swappiness and OOM killing are taken from whichever memcg's limit
      triggered reclaim and OOM invocation, regardless of its position in the
      hierarchy tree.
      
      Allow setting swappiness on any group.  The knob on the root memcg
      already reads the global VM swappiness; make it writable as well.

      Allow disabling the OOM killer on any non-root memcg.
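
      With the restriction gone, a nested group's knobs can be written
      directly, e.g. (a sketch; the cgroup path is hypothetical):

        #include <stdio.h>

        /* Write a value to a memcg control file, e.g. memory.swappiness
         * or memory.oom_control on a non-root group. */
        static int write_memcg_knob(const char *path, const char *val)
        {
                FILE *f = fopen(path, "w");

                if (!f)
                        return -1;
                fprintf(f, "%s\n", val);
                return fclose(f);
        }

        /* Usage (hypothetical hierarchy):
         *   write_memcg_knob("/sys/fs/cgroup/memory/parent/child/memory.swappiness", "30");
         *   write_memcg_knob("/sys/fs/cgroup/memory/parent/child/memory.oom_control", "1");
         */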
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cma: add placement specifier for "cma=" kernel parameter · 5ea3b1b2
      Committed by Akinobu Mita
      Currently, "cma=" kernel parameter is used to specify the size of CMA,
      but we can't specify where it is located.  We want to locate CMA below
      4GB for devices only supporting 32-bit addressing on 64-bit systems
      without iommu.
      
      This enables to specify the placement of CMA by extending "cma=" kernel
      parameter.
      
      Examples:
       1. locate a 64MB CMA area below 4GB with "cma=64M@0-4G"
       2. locate a 64MB CMA area exactly at 512MB with "cma=64M@512M"
      
      Note that the DMA contiguous memory allocator on x86 assumes that
      page_address() works for the pages to be allocated.  So this change
      limits the end address of the contiguous memory area to max_pfn_mapped,
      via the argument of dma_contiguous_reserve(), to prevent the area from
      being placed in highmem.
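
      On x86 that cap looks roughly like this (a sketch of the idea; the
      actual call site lives in the arch setup code and the header names
      are an assumption for the 3.x-era tree):

        #include <linux/init.h>
        #include <linux/dma-contiguous.h>  /* dma_contiguous_reserve() */
        #include <asm/page.h>              /* max_pfn_mapped, PAGE_SHIFT */

        /* Cap the CMA area below the highest directly mapped pfn so that
         * page_address() works for the reserved pages. */
        static void __init example_reserve_cma(void)
        {
                dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
        }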
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 01 June 2014, 1 commit
    • ACPI: Fix x86 regression related to early mapping size limitation · 4fc0a7e8
      Committed by Lv Zheng
      The following warning message is triggered:
       WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:136 __early_ioremap+0x11f/0x1f2()
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-00017-g86dfc6f3-dirty #298
       Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x036.091920111209 09/19/2011
        0000000000000009 ffffffff81b75c40 ffffffff817c627b 0000000000000000
        ffffffff81b75c78 ffffffff81067b5d 000000000000007b 8000000000000563
        00000000b96b20dc 0000000000000001 ffffffffff300e0c ffffffff81b75c88
       Call Trace:
        [<ffffffff817c627b>] dump_stack+0x45/0x56
        [<ffffffff81067b5d>] warn_slowpath_common+0x7d/0xa0
        [<ffffffff81067c3a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81d4b9d5>] __early_ioremap+0x11f/0x1f2
        [<ffffffff81d4bc5b>] early_ioremap+0x13/0x15
        [<ffffffff81d2b8f3>] __acpi_map_table+0x13/0x18
        [<ffffffff817b8d1a>] acpi_os_map_memory+0x26/0x14e
        [<ffffffff813ff018>] acpi_tb_acquire_table+0x42/0x70
        [<ffffffff813ff086>] acpi_tb_validate_table+0x27/0x37
        [<ffffffff813ff0e5>] acpi_tb_verify_table+0x22/0xd8
        [<ffffffff813ff6a8>] acpi_tb_install_non_fixed_table+0x60/0x1c9
        [<ffffffff81d61024>] acpi_tb_parse_root_table+0x218/0x26a
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d610cd>] acpi_initialize_tables+0x57/0x59
        [<ffffffff81d5f25d>] acpi_table_init+0x1b/0x99
        [<ffffffff81d2bca0>] acpi_boot_table_init+0x1e/0x85
        [<ffffffff81d23043>] setup_arch+0x99d/0xcc6
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d1bbbe>] start_kernel+0x8b/0x415
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d1b5ee>] x86_64_start_reservations+0x2a/0x2c
        [<ffffffff81d1b72e>] x86_64_start_kernel+0x13e/0x14d
       ---[ end trace 11ae599a1898f4e7 ]---
      when installing the following table during the early boot stage:
       ACPI: SSDT 0x00000000B9638018 07A0C4 (v02 INTEL  S2600CP  00004000 INTL 20100331)
      The regression is caused by the size limitation of the x86 early IO mapping.
      
      The root cause is:
       1. ACPICA doesn't split IO memory mapping and table mapping;
       2. the Linux x86 OSL implements acpi_os_map_memory() using a
          size-limited fixmap mechanism during the early boot stage, which is
          suitable for IO mappings only.
      
      This patch fixes the issue by using acpi_gbl_verify_table_checksum to
      disable the table mapping during the early stage and to enable it again
      for the late stage.  In this way, the normal code path is not affected.
      Then, after the code related to the root cause is cleaned up, the early
      checksum verification can easily be re-enabled.
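
      In outline, the toggle looks like this (a simplified sketch; the
      wrapper function is hypothetical, while acpi_gbl_verify_table_checksum
      is the ACPICA global named above):

        #include <linux/acpi.h>

        /* Skip checksum verification (which requires mapping whole tables)
         * while only the size-limited early fixmap is available; verify
         * again once normal ioremap works in the late stage. */
        static void example_set_table_verify(bool late_stage)
        {
                acpi_gbl_verify_table_checksum = late_stage ? TRUE : FALSE;
        }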
      
      A new boot parameter, acpi_force_table_verification, is introduced for
      platforms that require checksum verification to stop bad tables from
      being loaded.
      
      This fix also covers checksum verification for table overrides, so
      large tables can now also be overridden using the initrd override
      mechanism.
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  3. 31 May 2014, 4 commits
  4. 30 May 2014, 3 commits
  5. 29 May 2014, 12 commits
  6. 28 May 2014, 12 commits