1. 03 3月, 2013 2 次提交
    • J
      metag: Basic documentation · fdabf525
      James Hogan 提交于
      Add basic metag documentation. This includes an outline description of
      the ABIs (including syscall ABI) and calling conventions, similar to the
      one in Documentation/frv/.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: linux-doc@vger.kernel.org
      fdabf525
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  2. 26 2月, 2013 2 次提交
  3. 24 2月, 2013 2 次提交
    • T
      acpi, memory-hotplug: support getting hotplug info from SRAT · 01a178a9
      Tang Chen 提交于
      We now provide an option for users who don't want to specify physical
      memory address in kernel commandline.
      
               /*
                * For movablemem_map=acpi:
                *
                * SRAT:                |_____| |_____| |_________| |_________| ......
                * node id:                0       1         1           2
                * hotpluggable:           n       y         y           n
                * movablemem_map:              |_____| |_________|
                *
                * Using movablemem_map, we can prevent memblock from allocating memory
                * on ZONE_MOVABLE at boot time.
                */
      
      So user just specify movablemem_map=acpi, and the kernel will use
      hotpluggable info in SRAT to determine which memory ranges should be set
      as ZONE_MOVABLE.
      
      If all the memory ranges in SRAT is hotpluggable, then no memory can be
      used by kernel.  But before parsing SRAT, memblock has already reserve
      some memory ranges for other purposes, such as for kernel image, and so
      on.  We cannot prevent kernel from using these memory.  So we need to
      exclude these ranges even if these memory is hotpluggable.
      
      Furthermore, there could be several memory ranges in the single node
      which the kernel resides in.  We may skip one range that have memory
      reserved by memblock, but if the rest of memory is too small, then the
      kernel will fail to boot.  So, make the whole node which the kernel
      resides in un-hotpluggable.  Then the kernel has enough memory to use.
      
      NOTE: Using this way will cause NUMA performance down because the
            whole node will be set as ZONE_MOVABLE, and kernel cannot use memory
            on it.  If users don't want to lose NUMA performance, just don't use
            it.
      
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: use strcmp()]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01a178a9
    • T
      page_alloc: add movable_memmap kernel parameter · 34b71f1e
      Tang Chen 提交于
      Add functions to parse movablemem_map boot option.  Since the option
      could be specified more then once, all the maps will be stored in the
      global variable movablemem_map.map array.
      
      And also, we keep the array in monotonic increasing order by start_pfn.
      And merge all overlapped ranges.
      
      [akpm@linux-foundation.org: improve comment]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: remove unneeded parens]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Reviewed-by: NWen Congyang <wency@cn.fujitsu.com>
      Tested-by: NLin Feng <linfeng@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34b71f1e
  4. 16 2月, 2013 1 次提交
  5. 10 2月, 2013 2 次提交
    • L
      x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag · 27be4570
      Len Brown 提交于
      Remove 32-bit x86 a cmdline param "no-hlt",
      and the cpuinfo_x86.hlt_works_ok that it sets.
      
      If a user wants to avoid HLT, then "idle=poll"
      is much more useful, as it avoids invocation of HLT
      in idle, while "no-hlt" failed to do so.
      
      Indeed, hlt_works_ok was consulted in only 3 places.
      
      First, in /proc/cpuinfo where "hlt_bug yes"
      would be printed if and only if the user booted
      the system with "no-hlt" -- as there was no other code
      to set that flag.
      
      Second, check_hlt() would not invoke halt() if "no-hlt"
      were on the cmdline.
      
      Third, it was consulted in stop_this_cpu(), which is invoked
      by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
      all cases where the machine is being shutdown/reset.
      The flag was not consulted in the more frequently invoked
      play_dead()/hlt_play_dead() used in processor offline and suspend.
      
      Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
      indicating that it would be removed in 2012.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: x86@kernel.org
      27be4570
    • L
      x86 idle: remove mwait_idle() and "idle=mwait" cmdline param · 69fb3676
      Len Brown 提交于
      mwait_idle() is a C1-only idle loop intended to be more efficient
      than HLT, starting on Pentium-4 HT-enabled processors.
      
      But mwait_idle() has been replaced by the more general
      mwait_idle_with_hints(), which handles both C1 and deeper C-states.
      ACPI processor_idle and intel_idle use only mwait_idle_with_hints(),
      and no longer use mwait_idle().
      
      Here we simplify the x86 native idle code by removing mwait_idle(),
      and the "idle=mwait" bootparam used to invoke it.
      
      Since Linux 3.0 there has been a boot-time warning when "idle=mwait"
      was invoked saying it would be removed in 2012.  This removal
      was also noted in the (now removed:-) feature-removal-schedule.txt.
      
      After this change, kernels configured with
      (CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware
      that supports MWAIT will simply use HLT.  If MWAIT is desired
      on those systems, cpuidle and the cpuidle drivers above
      can be enabled.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: x86@kernel.org
      69fb3676
  6. 01 2月, 2013 2 次提交
  7. 30 1月, 2013 1 次提交
    • Y
      x86: Add Crash kernel low reservation · 0212f915
      Yinghai Lu 提交于
      During kdump kernel's booting stage, it need to find low ram for
      swiotlb buffer when system does not support intel iommu/dmar remapping.
      
      kexed-tools is appending memmap=exactmap and range from /proc/iomem
      with "Crash kernel", and that range is above 4G for 64bit after boot
      protocol 2.12.
      
      We need to add another range in /proc/iomem like "Crash kernel low",
      so kexec-tools could find that info and append to kdump kernel
      command line.
      
      Try to reserve some under 4G if the normal "Crash kernel" is above 4G.
      
      User could specify the size with crashkernel_low=XX[KMG].
      
      -v2: fix warning that is found by Fengguang's test robot.
      -v3: move out get_mem_size change to another patch, to solve compiling
           warning that is found by Borislav Petkov <bp@alien8.de>
      -v4: user must specify crashkernel_low if system does not support
           intel or amd iommu.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Rob Landley <rob@landley.net>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      0212f915
  8. 09 1月, 2013 1 次提交
    • P
      rcu: Make rcu_nocb_poll an early_param instead of module_param · 1b0048a4
      Paul Gortmaker 提交于
      The as-documented rcu_nocb_poll will fail to enable this feature
      for two reasons.  (1) there is an extra "s" in the documented
      name which is not in the code, and (2) since it uses module_param,
      it really is expecting a prefix, akin to "rcutree.fanout_leaf"
      and the prefix isn't documented.
      
      However, there are several reasons why we might not want to
      simply fix the typo and add the prefix:
      
      1) we'd end up with rcutree.rcu_nocb_poll, and rather probably make
      a change to rcutree.nocb_poll
      
      2) if we did #1, then the prefix wouldn't be consistent with the
      rcu_nocbs=<cpumap> parameter (i.e. one with, one without prefix)
      
      3) the use of module_param in a header file is less than desired,
      since it isn't immediately obvious that it will get processed
      via rcutree.c and get the prefix from that (although use of
      module_param_named() could clarify that.)
      
      4) the implied export of /sys/module/rcutree/parameters/rcu_nocb_poll
      data to userspace via module_param() doesn't really buy us anything,
      as it is read-only and we can tell if it is enabled already without
      it, since there is a printk at early boot telling us so.
      
      In light of all that, just change it from a module_param() to an
      early_setup() call, and worry about adding it to /sys later on if
      we decide to allow a dynamic setting of it.
      
      Also change the variable to be tagged as read_mostly, since it
      will only ever be fiddled with at most, once at boot.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1b0048a4
  9. 21 12月, 2012 1 次提交
  10. 18 12月, 2012 1 次提交
  11. 11 12月, 2012 1 次提交
  12. 17 11月, 2012 1 次提交
    • P
      rcu: Add callback-free CPUs · 3fbfbf7a
      Paul E. McKenney 提交于
      RCU callback execution can add significant OS jitter and also can
      degrade both scheduling latency and, in asymmetric multiprocessors,
      energy efficiency.  This commit therefore adds the ability for selected
      CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
      to kthreads.  If the "rcu_nocb_poll" boot parameter is also specified,
      these kthreads will do polling, removing the need for the offloaded
      CPUs to do wakeups.  At least one CPU must be doing normal callback
      processing: currently CPU 0 cannot be selected as a no-CBs CPU.
      In addition, attempts to offline the last normal-CBs CPU will fail.
      
      This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
      this commit includes fixes to problems located by Fengguang Wu's
      kbuild test robot.
      
      [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3fbfbf7a
  13. 16 11月, 2012 1 次提交
  14. 15 11月, 2012 1 次提交
  15. 02 11月, 2012 2 次提交
  16. 10 10月, 2012 1 次提交
    • R
      module: signature checking hook · 106a4ee2
      Rusty Russell 提交于
      We do a very simple search for a particular string appended to the module
      (which is cache-hot and about to be SHA'd anyway).  There's both a config
      option and a boot parameter which control whether we accept or fail with
      unsigned modules and modules that are signed with an unknown key.
      
      If module signing is enabled, the kernel will be tainted if a module is
      loaded that is unsigned or has a signature for which we don't have the
      key.
      
      (Useful feedback and tweaks by David Howells <dhowells@redhat.com>)
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      106a4ee2
  17. 02 10月, 2012 1 次提交
    • C
      NFS: Add nfs4_unique_id boot parameter · 6f2ea7f2
      Chuck Lever 提交于
      An optional boot parameter is introduced to allow client
      administrators to specify a string that the Linux NFS client can
      insert into its nfs_client_id4 id string, to make it both more
      globally unique, and to ensure that it doesn't change even if the
      client's nodename changes.
      
      If this boot parameter is not specified, the client's nodename is
      used, as before.
      
      Client installation procedures can create a unique string (typically,
      a UUID) which remains unchanged during the lifetime of that client
      instance.  This works just like creating a UUID for the label of the
      system's root and boot volumes.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      6f2ea7f2
  18. 23 9月, 2012 1 次提交
    • P
      rcu: Control grace-period duration from sysfs · d40011f6
      Paul E. McKenney 提交于
      Although almost everyone is well-served by the defaults, some uses of RCU
      benefit from shorter grace periods, while others benefit more from the
      greater efficiency provided by longer grace periods.  Situations requiring
      a large number of grace periods to elapse (and wireshark startup has
      been called out as an example of this) are helped by lower-latency
      grace periods.  Furthermore, in some embedded applications, people are
      willing to accept a small degradation in update efficiency (due to there
      being more of the shorter grace-period operations) in order to gain the
      lower latency.
      
      In contrast, those few systems with thousands of CPUs need longer grace
      periods because the CPU overhead of a grace period rises roughly
      linearly with the number of CPUs.  Such systems normally do not make
      much use of facilities that require large numbers of grace periods to
      elapse, so this is a good tradeoff.
      
      Therefore, this commit allows the durations to be controlled from sysfs.
      There are two sysfs parameters, one named "jiffies_till_first_fqs" that
      specifies the delay in jiffies from the end of grace-period initialization
      until the first attempt to force quiescent states, and the other named
      "jiffies_till_next_fqs" that specifies the delay (again in jiffies)
      between subsequent attempts to force quiescent states.  They both default
      to three jiffies, which is compatible with the old hard-coded behavior.
      
      At some future time, it may be possible to automatically increase the
      grace-period length with the number of CPUs, but we do not yet have
      sufficient data to do a good job.  Preliminary data indicates that we
      should add an addiitonal jiffy to each of the delays for every 200 CPUs
      in the system, but more experimentation is needed.  For now, the number
      of systems with more than 1,000 CPUs is small enough that this can be
      relegated to boot-time hand tuning.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      d40011f6
  19. 22 9月, 2012 1 次提交
  20. 19 9月, 2012 3 次提交
  21. 08 9月, 2012 2 次提交
    • M
      ima: add appraise action keywords and default rules · 07f6a794
      Mimi Zohar 提交于
      Unlike the IMA measurement policy, the appraise policy can not be dependent
      on runtime process information, such as the task uid, as the 'security.ima'
      xattr is written on file close and must be updated each time the file changes,
      regardless of the current task uid.
      
      This patch extends the policy language with 'fowner', defines an appraise
      policy, which appraises all files owned by root, and defines 'ima_appraise_tcb',
      a new boot command line option, to enable the appraise policy.
      
      Changelog v3:
      - separate the measure from the appraise rules in order to support measuring
        without appraising and appraising without measuring.
      - change appraisal default for filesystems without xattr support to fail
      - update default appraise policy for cgroups
      
      Changelog v1:
      - don't appraise RAMFS (Dmitry Kasatkin)
      - merged rest of "ima: ima_must_appraise_or_measure API change" commit
        (Dmtiry Kasatkin)
      
        ima_must_appraise_or_measure() called ima_match_policy twice, which
        searched the policy for a matching rule.  Once for a matching measurement
        rule and subsequently for an appraisal rule. Searching the policy twice
        is unnecessary overhead, which could be noticeable with a large policy.
      
        The new version of ima_must_appraise_or_measure() does everything in a
        single iteration using a new version of ima_match_policy().  It returns
        IMA_MEASURE, IMA_APPRAISE mask.
      
        With the use of action mask only one efficient matching function
        is enough.  Removed other specific versions of matching functions.
      
      Changelog:
      - change 'owner' to 'fowner' to conform to the new LSM conditions posted by
        Roberto Sassu.
      - fix calls to ima_log_string()
      Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
      Signed-off-by: NDmitry Kasatkin <dmitry.kasatkin@intel.com>
      07f6a794
    • M
      ima: integrity appraisal extension · 2fe5d6de
      Mimi Zohar 提交于
      IMA currently maintains an integrity measurement list used to assert the
      integrity of the running system to a third party.  The IMA-appraisal
      extension adds local integrity validation and enforcement of the
      measurement against a "good" value stored as an extended attribute
      'security.ima'.  The initial methods for validating 'security.ima' are
      hashed based, which provides file data integrity, and digital signature
      based, which in addition to providing file data integrity, provides
      authenticity.
      
      This patch creates and maintains the 'security.ima' xattr, containing
      the file data hash measurement.  Protection of the xattr is provided by
      EVM, if enabled and configured.
      
      Based on policy, IMA calls evm_verifyxattr() to verify a file's metadata
      integrity and, assuming success, compares the file's current hash value
      with the one stored as an extended attribute in 'security.ima'.
      
      Changelov v4:
      - changed iint cache flags to hex values
      
      Changelog v3:
      - change appraisal default for filesystems without xattr support to fail
      
      Changelog v2:
      - fix audit msg 'res' value
      - removed unused 'ima_appraise=' values
      
      Changelog v1:
      - removed unused iint mutex (Dmitry Kasatkin)
      - setattr hook must not reset appraised (Dmitry Kasatkin)
      - evm_verifyxattr() now differentiates between no 'security.evm' xattr
        (INTEGRITY_NOLABEL) and no EVM 'protected' xattrs included in the
        'security.evm' (INTEGRITY_NOXATTRS).
      - replace hash_status with ima_status (Dmitry Kasatkin)
      - re-initialize slab element ima_status on free (Dmitry Kasatkin)
      - include 'security.ima' in EVM if CONFIG_IMA_APPRAISE, not CONFIG_IMA
      - merged half "ima: ima_must_appraise_or_measure API change" (Dmitry Kasatkin)
      - removed unnecessary error variable in process_measurement() (Dmitry Kasatkin)
      - use ima_inode_post_setattr() stub function, if IMA_APPRAISE not configured
        (moved ima_inode_post_setattr() to ima_appraise.c)
      - make sure ima_collect_measurement() can read file
      
      Changelog:
      - add 'iint' to evm_verifyxattr() call (Dimitry Kasatkin)
      - fix the race condition between chmod, which takes the i_mutex and then
        iint->mutex, and ima_file_free() and process_measurement(), which take
        the locks in the reverse order, by eliminating iint->mutex. (Dmitry Kasatkin)
      - cleanup of ima_appraise_measurement() (Dmitry Kasatkin)
      - changes as a result of the iint not allocated for all regular files, but
        only for those measured/appraised.
      - don't try to appraise new/empty files
      - expanded ima_appraisal description in ima/Kconfig
      - IMA appraise definitions required even if IMA_APPRAISE not enabled
      - add return value to ima_must_appraise() stub
      - unconditionally set status = INTEGRITY_PASS *after* testing status,
        not before.  (Found by Joe Perches)
      Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
      Signed-off-by: NDmitry Kasatkin <dmitry.kasatkin@intel.com>
      2fe5d6de
  22. 24 8月, 2012 1 次提交
  23. 23 8月, 2012 1 次提交
    • R
      x86/smp: Don't ever patch back to UP if we unplug cpus · 816afe4f
      Rusty Russell 提交于
      We still patch SMP instructions to UP variants if we boot with a
      single CPU, but not at any other time.  In particular, not if we
      unplug CPUs to return to a single cpu.
      
      Paul McKenney points out:
      
       mean offline overhead is 6251/48=130.2 milliseconds.
      
       If I remove the alternatives_smp_switch() from the offline
       path [...] the mean offline overhead is 550/42=13.1 milliseconds
      
      Basically, we're never going to get those 120ms back, and the
      code is pretty messy.
      
      We get rid of:
      
       1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
          documentation is wrong. It's now the default.
      
       2) The skip_smp_alternatives flag used by suspend.
      
       3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
          which were only used to set this one flag.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Paul McKenney <paul.mckenney@us.ibm.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.auSigned-off-by: NIngo Molnar <mingo@kernel.org>
      816afe4f
  24. 30 7月, 2012 1 次提交
  25. 20 7月, 2012 1 次提交
  26. 17 7月, 2012 1 次提交
  27. 03 7月, 2012 1 次提交
  28. 25 6月, 2012 1 次提交
  29. 30 5月, 2012 1 次提交
  30. 25 5月, 2012 2 次提交
    • S
      Documentation/kernel-parameters: remove autotest and mcatest · 9b170dbd
      Sebastian Andrzej Siewior 提交于
      It has no more users, the last one is gone in "[PATCH] ia64: Kconfig
      cleanup" aka ("6fd79ab50b").
      mcatest is gone in commit "[PATCH] ia64: SGI SN update"
      ("c6bacd5010ec").
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NSebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Acked-by: NRob Landley <rob@landley.net>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      9b170dbd
    • M
      tick: Add tick skew boot option · 5307c955
      Mike Galbraith 提交于
      Let the user decide whether power consumption or jitter is the
      more important consideration for their machines.
      
      Quoting removal commit af5ab277:
      
      "Historically, Linux has tried to make the regular timer tick on the
       various CPUs not happen at the same time, to avoid contention on
       xtime_lock.
          
       Nowadays, with the tickless kernel, this contention no longer happens
       since time keeping and updating are done differently. In addition,
       this skew is actually hurting power consumption in a measurable way on
       many-core systems."
      
      Problems:
      
      - Contrary to the above, systems do encounter contention on both
        xtime_lock and RCU structure locks when the tick is synchronized.
        
      - Moderate sized RT systems suffer intolerable jitter due to the tick
        being synchronized.
      
      - SGI reports the same for their large systems.
      
      - Fully utilized systems reap no power saving benefit from skew removal,
        but do suffer from resulting induced lock contention.
      
      - 0209f649 rcu: limit rcu_node leaf-level fanout
        This patch was born to combat lock contention which testing showed
        to have been _induced by_ skew removal.  Skew the tick, contention
        disappeared virtually completely.
      Signed-off-by: NMike Galbraith <mgalbraith@suse.de>
      Link: http://lkml.kernel.org/r/1336472458.21924.78.camel@marge.simpson.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5307c955