1. 10 10月, 2014 2 次提交
  2. 29 9月, 2014 2 次提交
  3. 24 9月, 2014 2 次提交
  4. 09 9月, 2014 1 次提交
  5. 07 8月, 2014 1 次提交
    • L
      printk: allow increasing the ring buffer depending on the number of CPUs · 23b2899f
      Luis R. Rodriguez 提交于
      The default size of the ring buffer is too small for machines with a
      large amount of CPUs under heavy load.  What ends up happening when
      debugging is the ring buffer overlaps and chews up old messages making
      debugging impossible unless the size is passed as a kernel parameter.
      An idle system upon boot up will on average spew out only about one or
      two extra lines but where this really matters is on heavy load and that
      will vary widely depending on the system and environment.
      
      There are mechanisms to help increase the kernel ring buffer for tracing
      through debugfs, and those interfaces even allow growing the kernel ring
      buffer per CPU.  We also have a static value which can be passed upon
      boot.  Relying on debugfs however is not ideal for production, and
      relying on the value passed upon bootup is can only used *after* an
      issue has creeped up.  Instead of being reactive this adds a proactive
      measure which lets you scale the amount of contributions you'd expect to
      the kernel ring buffer under load by each CPU in the worst case
      scenario.
      
      We use num_possible_cpus() to avoid complexities which could be
      introduced by dynamically changing the ring buffer size at run time,
      num_possible_cpus() lets us use the upper limit on possible number of
      CPUs therefore avoiding having to deal with hotplugging CPUs on and off.
      This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT
      which is used to specify the maximum amount of contributions to the
      kernel ring buffer in the worst case before the kernel ring buffer flips
      over, the size is specified as a power of 2.  The total amount of
      contributions made by each CPU must be greater than half of the default
      kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger
      an increase upon bootup.  The kernel ring buffer is increased to the
      next power of two that would fit the required minimum kernel ring buffer
      size plus the additional CPU contribution.  For example if LOG_BUF_SHIFT
      is 18 (256 KB) you'd require at least 128 KB contributions by other CPUs
      in order to trigger an increase of the kernel ring buffer.  With a
      LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything over > 64
      possible CPUs to trigger an increase.  If you had 128 possible CPUs the
      amount of minimum required kernel ring buffer bumps to:
      
         ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
      
      Since we require the ring buffer to be a power of two the new required
      size would be 1024 KB.
      
      This CPU contributions are ignored when the "log_buf_len" kernel
      parameter is used as it forces the exact size of the ring buffer to an
      expected power of two value.
      
      [pmladek@suse.cz: fix build]
      Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: NPetr Mladek <pmladek@suse.cz>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Tested-by: NPetr Mladek <pmladek@suse.cz>
      Reviewed-by: NDavidlohr Bueso <davidlohr@hp.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Arun KS <arunks.linux@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23b2899f
  6. 02 8月, 2014 1 次提交
  7. 22 7月, 2014 1 次提交
  8. 17 7月, 2014 4 次提交
    • D
      KEYS: validate certificate trust only with builtin keys · 32c4741c
      Dmitry Kasatkin 提交于
      Instead of allowing public keys, with certificates signed by any
      key on the system trusted keyring, to be added to a trusted keyring,
      this patch further restricts the certificates to those signed only by
      builtin keys on the system keyring.
      
      This patch defines a new option 'builtin' for the kernel parameter
      'keys_ownerid' to allow trust validation using builtin keys.
      
      Simplified Mimi's "KEYS: define an owner trusted keyring" patch
      
      Changelog v7:
      - rename builtin_keys to use_builtin_keys
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      32c4741c
    • D
      KEYS: validate certificate trust only with selected key · ffb70f61
      Dmitry Kasatkin 提交于
      Instead of allowing public keys, with certificates signed by any
      key on the system trusted keyring, to be added to a trusted keyring,
      this patch further restricts the certificates to those signed by a
      particular key on the system keyring.
      
      This patch defines a new kernel parameter 'ca_keys' to identify the
      specific key which must be used for trust validation of certificates.
      
      Simplified Mimi's "KEYS: define an owner trusted keyring" patch.
      
      Changelog:
      - support for builtin x509 public keys only
      - export "asymmetric_keyid_match"
      - remove ifndefs MODULE
      - rename kernel boot parameter from keys_ownerid to ca_keys
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      ffb70f61
    • D
      ima: introduce multi-page collect buffers · 6edf7a89
      Dmitry Kasatkin 提交于
      Use of multiple-page collect buffers reduces:
      1) the number of block IO requests
      2) the number of asynchronous hash update requests
      
      Second is important for HW accelerated hashing, because significant
      amount of time is spent for preparation of hash update operation,
      which includes configuring acceleration HW, DMA engine, etc...
      Thus, HW accelerators are more efficient when working on large
      chunks of data.
      
      This patch introduces usage of multi-page collect buffers. Buffer size
      can be specified using 'ahash_bufsize' module parameter. Default buffer
      size is 4096 bytes.
      
      Changes in v3:
      - kernel parameter replaced with module parameter
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      6edf7a89
    • D
      ima: use ahash API for file hash calculation · 3bcced39
      Dmitry Kasatkin 提交于
      Async hash API allows the use of HW acceleration for hash calculation.
      It may give significant performance gain and/or reduce power consumption,
      which might be very beneficial for battery powered devices.
      
      This patch introduces hash calculation using ahash API. ahash performance
      depends on the data size and the particular HW. Depending on the specific
      system, shash performance may be better.
      
      This patch defines 'ahash_minsize' module parameter, which is used to
      define the minimal file size to use with ahash.  If this minimum file size
      is not set or the file is smaller than defined by the parameter, shash will
      be used.
      
      Changes in v3:
      - kernel parameter replaced with module parameter
      - pr_crit replaced with pr_crit_ratelimited
      - more comment changes - Mimi
      
      Changes in v2:
      - ima_ahash_size became as ima_ahash
      - ahash pre-allocation moved out from __init code to be able to use
        ahash crypto modules. Ahash allocated once on the first use.
      - hash calculation falls back to shash if ahash allocation/calculation fails
      - complex initialization separated from variable declaration
      - improved comments
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      3bcced39
  9. 15 7月, 2014 2 次提交
  10. 09 7月, 2014 1 次提交
  11. 08 7月, 2014 1 次提交
    • P
      rcu: Parallelize and economize NOCB kthread wakeups · fbce7497
      Paul E. McKenney 提交于
      An 80-CPU system with a context-switch-heavy workload can require so
      many NOCB kthread wakeups that the RCU grace-period kthreads spend several
      tens of percent of a CPU just awakening things.  This clearly will not
      scale well: If you add enough CPUs, the RCU grace-period kthreads would
      get behind, increasing grace-period latency.
      
      To avoid this problem, this commit divides the NOCB kthreads into leaders
      and followers, where the grace-period kthreads awaken the leaders each of
      whom in turn awakens its followers.  By default, the number of groups of
      kthreads is the square root of the number of CPUs, but this default may
      be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
      This reduces the number of wakeups done per grace period by the RCU
      grace-period kthread by the square root of the number of CPUs, but of
      course by shifting those wakeups to the leaders.  In addition, because
      the leaders do grace periods on behalf of their respective followers,
      the number of wakeups of the followers decreases by up to a factor of two.
      Instead of being awakened once when new callbacks arrive and again
      at the end of the grace period, the followers are awakened only at
      the end of the grace period.
      
      For a numerical example, in a 4096-CPU system, the grace-period kthread
      would awaken 64 leaders, each of which would awaken its 63 followers
      at the end of the grace period.  This compares favorably with the 79
      wakeups for the grace-period kthread on an 80-CPU system.
      Reported-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      fbce7497
  12. 01 7月, 2014 1 次提交
  13. 24 6月, 2014 2 次提交
    • A
      kernel/watchdog.c: print traces for all cpus on lockup detection · ed235875
      Aaron Tomlin 提交于
      A 'softlockup' is defined as a bug that causes the kernel to loop in
      kernel mode for more than a predefined period to time, without giving
      other tasks a chance to run.
      
      Currently, upon detection of this condition by the per-cpu watchdog
      task, debug information (including a stack trace) is sent to the system
      log.
      
      On some occasions, we have observed that the "victim" rather than the
      actual "culprit" (i.e.  the owner/holder of the contended resource) is
      reported to the user.  Often this information has proven to be
      insufficient to assist debugging efforts.
      
      To avoid loss of useful debug information, for architectures which
      support NMI, this patch makes it possible to improve soft lockup
      reporting.  This is accomplished by issuing an NMI to each cpu to obtain
      a stack trace.
      
      If NMI is not supported we just revert back to the old method.  A sysctl
      and boot-time parameter is available to toggle this feature.
      
      [dzickus@redhat.com: add CONFIG_SMP in certain areas]
      [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
      [mq@suse.cz: fix warning]
      Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NJan Moskyto Matejka <mq@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed235875
    • P
      rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832
      Paul E. McKenney 提交于
      Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
      fixed a problem where a CPU looping in the kernel with but one runnable
      task would give RCU CPU stall warnings, even if the in-kernel loop
      contained cond_resched() calls.  Unfortunately, in so doing, it introduced
      performance regressions in Anton Blanchard's will-it-scale "open1" test.
      The problem appears to be not so much the increased cond_resched() path
      length as an increase in the rate at which grace periods complete, which
      increased per-update grace-period overhead.
      
      This commit takes a different approach to fixing this bug, mainly by
      moving the RCU-visible quiescent state from cond_resched() to
      rcu_note_context_switch(), and by further reducing the check to a
      simple non-zero test of a single per-CPU variable.  However, this
      approach requires that the force-quiescent-state processing send
      resched IPIs to the offending CPUs.  These will be sent only once
      the grace period has reached an age specified by the boot/sysfs
      parameter rcutree.jiffies_till_sched_qs, or once the grace period
      reaches an age halfway to the point at which RCU CPU stall warnings
      will be emitted, whichever comes first.
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
        ktest build robot.  Also fixed smp_mb() comment as noted by
        Oleg Nesterov. ]
      
      Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      4a81e832
  14. 17 6月, 2014 2 次提交
  15. 07 6月, 2014 1 次提交
  16. 05 6月, 2014 2 次提交
    • P
      init/main.c: add initcall_blacklist kernel parameter · 7b0b73d7
      Prarit Bhargava 提交于
      When a module is built into the kernel the module_init() function
      becomes an initcall.  Sometimes debugging through dynamic debug can
      help, however, debugging built in kernel modules is typically done by
      changing the .config, recompiling, and booting the new kernel in an
      effort to determine exactly which module caused a problem.
      
      This patchset can be useful stand-alone or combined with initcall_debug.
      There are cases where some initcalls can hang the machine before the
      console can be flushed, which can make initcall_debug output inaccurate.
      Having the ability to skip initcalls can help further debugging of these
      scenarios.
      
      Usage: initcall_blacklist=<list of comma separated initcalls>
      
      ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and
      the log contains:
      
      	blacklisting initcall sgi_uv_sysfs_init
      	...
      	...
      	initcall sgi_uv_sysfs_init blacklisted
      
      ex) added "initcall_blacklist=foo_bar,sgi_uv_sysfs_init" as a kernel parameter
      and the log contains:
      
      	blacklisting initcall foo_bar
      	blacklisting initcall sgi_uv_sysfs_init
      	...
      	...
      	initcall sgi_uv_sysfs_init blacklisted
      
      [akpm@linux-foundation.org: tweak printk text]
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Richard Weinberger <richard.weinberger@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Rob Landley <rob@landley.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b0b73d7
    • A
      cma: add placement specifier for "cma=" kernel parameter · 5ea3b1b2
      Akinobu Mita 提交于
      Currently, "cma=" kernel parameter is used to specify the size of CMA,
      but we can't specify where it is located.  We want to locate CMA below
      4GB for devices only supporting 32-bit addressing on 64-bit systems
      without iommu.
      
      This enables to specify the placement of CMA by extending "cma=" kernel
      parameter.
      
      Examples:
       1. locate 64MB CMA below 4GB by "cma=64M@0-4G"
       2. locate 64MB CMA exact at 512MB by "cma=64M@512M"
      
      Note that the DMA contiguous memory allocator on x86 assumes that
      page_address() works for the pages to allocate.  So this change requires
      to limit end address of contiguous memory area upto max_pfn_mapped to
      prevent from locating it on highmem area by the argument of
      dma_contiguous_reserve().
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ea3b1b2
  17. 01 6月, 2014 1 次提交
    • L
      ACPI: Fix x86 regression related to early mapping size limitation · 4fc0a7e8
      Lv Zheng 提交于
      The following warning message is triggered:
       WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:136 __early_ioremap+0x11f/0x1f2()
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-00017-g86dfc6f3-dirty #298
       Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x036.091920111209 09/19/2011
        0000000000000009 ffffffff81b75c40 ffffffff817c627b 0000000000000000
        ffffffff81b75c78 ffffffff81067b5d 000000000000007b 8000000000000563
        00000000b96b20dc 0000000000000001 ffffffffff300e0c ffffffff81b75c88
       Call Trace:
        [<ffffffff817c627b>] dump_stack+0x45/0x56
        [<ffffffff81067b5d>] warn_slowpath_common+0x7d/0xa0
        [<ffffffff81067c3a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81d4b9d5>] __early_ioremap+0x11f/0x1f2
        [<ffffffff81d4bc5b>] early_ioremap+0x13/0x15
        [<ffffffff81d2b8f3>] __acpi_map_table+0x13/0x18
        [<ffffffff817b8d1a>] acpi_os_map_memory+0x26/0x14e
        [<ffffffff813ff018>] acpi_tb_acquire_table+0x42/0x70
        [<ffffffff813ff086>] acpi_tb_validate_table+0x27/0x37
        [<ffffffff813ff0e5>] acpi_tb_verify_table+0x22/0xd8
        [<ffffffff813ff6a8>] acpi_tb_install_non_fixed_table+0x60/0x1c9
        [<ffffffff81d61024>] acpi_tb_parse_root_table+0x218/0x26a
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d610cd>] acpi_initialize_tables+0x57/0x59
        [<ffffffff81d5f25d>] acpi_table_init+0x1b/0x99
        [<ffffffff81d2bca0>] acpi_boot_table_init+0x1e/0x85
        [<ffffffff81d23043>] setup_arch+0x99d/0xcc6
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d1bbbe>] start_kernel+0x8b/0x415
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d1b5ee>] x86_64_start_reservations+0x2a/0x2c
        [<ffffffff81d1b72e>] x86_64_start_kernel+0x13e/0x14d
       ---[ end trace 11ae599a1898f4e7 ]---
      when installing the following table during early stage:
       ACPI: SSDT 0x00000000B9638018 07A0C4 (v02 INTEL  S2600CP  00004000 INTL 20100331)
      The regression is caused by the size limitation of the x86 early IO mapping.
      
      The root cause is:
       1. ACPICA doesn't split IO memory mapping and table mapping;
       2. Linux x86 OSL implements acpi_os_map_memory() using a size limited fix-map
          mechanism during early boot stage, which is more suitable for only IO
          mappings.
      
      This patch fixes this issue by utilizing acpi_gbl_verify_table_checksum to
      disable the table mapping during early stage and enabling it again for the
      late stage. In this way, the normal code path is not affected. Then after
      the code related to the root cause is cleaned up, the early checksum
      verification can be easily re-enabled.
      
      A new boot parameter - acpi_force_table_verification is introduced for
      the platforms that require the checksum verification to stop loading bad
      tables.
      
      This fix also covers the checksum verification for the table overrides. Now
      large tables can also be overridden using the initrd override mechanism.
      Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Reported-and-tested-by: NYuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4fc0a7e8
  18. 30 5月, 2014 1 次提交
  19. 28 5月, 2014 1 次提交
  20. 26 5月, 2014 1 次提交
    • R
      PM / sleep: Introduce command line argument for sleep state enumeration · 0399d4db
      Rafael J. Wysocki 提交于
      On some systems the platform doesn't support neither
      PM_SUSPEND_MEM nor PM_SUSPEND_STANDBY, so PM_SUSPEND_FREEZE is the
      only available system sleep state.  However, some user space frameworks
      only use the "mem" and (sometimes) "standby" sleep state labels, so
      the users of those systems need to modify user space in order to be
      able to use system suspend at all and that is not always possible.
      
      For this reason, add a new kernel command line argument,
      relative_sleep_states, allowing the users of those systems to change
      the way in which the kernel assigns labels to system sleep states.
      Namely, for relative_sleep_states=1, the "mem", "standby" and "freeze"
      labels will enumerate the available system sleem states from the
      deepest to the shallowest, respectively, so that "mem" is always
      present in /sys/power/state and the other state strings may or may
      not be presend depending on what is supported by the platform.
      
      Update system sleep states documentation to reflect this change.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      0399d4db
  21. 14 5月, 2014 1 次提交
  22. 12 5月, 2014 1 次提交
  23. 07 5月, 2014 1 次提交
    • H
      ACPI / video: change acpi-video brightness_switch_enabled default to 0 · 886129a8
      Hans de Goede 提交于
      acpi-video is unique in that it not only generates brightness up/down
      keypresses, but also (sometimes) actively changes the brightness itself.
      
      This presents an inconsistent kernel interface to userspace, basically there
      are 2 different scenarios, depending on the laptop model:
      
       1) On some laptops a brightness up/down keypress means: show a brightness osd
       with the current brightness, iow it is a brightness has changed notification.
      
       2) Where as on (a lot of) other laptops it means a brightness up/down key was
       pressed, deal with it.
      
      Most of the desktop environments interpret any press as in scenario 2, and
      change the brightness up / down as a response to the key events, causing it
      to be changed twice, once by acpi-video and once by the DE.
      
      With the new default for video.use_native_backlight we will be moving even
      more laptops over to behaving as in scenario 2. Making the remaining laptops
      even more of a weird exception. Also note that it is hard to detect scenario
      1 properly in userspace, and AFAIK none of the DE-s deals with it.
      
      Therefor this commit changes the default of brightness_switch_enabled to 0
      making its behavior consistent with all the other backlight drivers.
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Reviewed-by: NAaron Lu <aaron.lu@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      886129a8
  24. 25 4月, 2014 2 次提交
  25. 21 4月, 2014 1 次提交
  26. 17 4月, 2014 1 次提交
  27. 08 4月, 2014 1 次提交
  28. 07 4月, 2014 1 次提交
  29. 26 3月, 2014 1 次提交