1. 18 3月, 2009 2 次提交
  2. 14 3月, 2009 1 次提交
  3. 13 3月, 2009 1 次提交
  4. 12 3月, 2009 2 次提交
  5. 10 3月, 2009 3 次提交
    • T
      percpu: generalize embedding first chunk setup helper · 66c3a757
      Tejun Heo 提交于
      Impact: code reorganization
      
      Separate out embedding first chunk setup helper from x86 embedding
      first chunk allocator and put it in mm/percpu.c.  This will be used by
      the default percpu first chunk allocator and possibly by other archs.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      66c3a757
    • T
      percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk() · 6074d5b0
      Tejun Heo 提交于
      Impact: cleanup, more flexibility for first chunk init
      
      Non-negative @dyn_size used to be allowed iff @unit_size wasn't auto.
      This restriction stemmed from implementation detail and made things a
      bit less intuitive.  This patch allows @dyn_size to be specified
      regardless of @unit_size and swaps the positions of @dyn_size and
      @unit_size so that the parameter order makes more sense (static,
      reserved and dyn sizes followed by enclosing unit_size).
      
      While at it, add @unit_size >= PCPU_MIN_UNIT_SIZE sanity check.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6074d5b0
    • D
      Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod." · 129f8ae9
      Dave Jones 提交于
      This reverts commit e088e4c9.
      
      Removing the sysfs interface for p4-clockmod was flagged as a
      regression in bug 12826.
      
      Course of action:
       - Find out the remaining causes of overheating, and fix them
         if possible. ACPI should be doing the right thing automatically.
         If it isn't, we need to fix that.
       - mark p4-clockmod ui as deprecated
       - try again with the removal in six months.
      
      It's not really feasible to printk about the deprecation, because
      it needs to happen at all the sysfs entry points, which means adding
      a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.
      Signed-off-by: NDave Jones <davej@redhat.com>
      129f8ae9
  6. 08 3月, 2009 1 次提交
  7. 06 3月, 2009 6 次提交
    • T
      x86, percpu: setup reserved percpu area for x86_64 · 6b19b0c2
      Tejun Heo 提交于
      Impact: fix relocation overflow during module load
      
      x86_64 uses 32bit relocations for symbol access and static percpu
      symbols whether in core or modules must be inside 2GB of the percpu
      segement base which the dynamic percpu allocator doesn't guarantee.
      This patch makes x86_64 reserve PERCPU_MODULE_RESERVE bytes in the
      first chunk so that module percpu areas are always allocated from the
      first chunk which is always inside the relocatable range.
      
      This problem exists for any percpu allocator but is easily triggered
      when using the embedding allocator because the second chunk is located
      beyond 2GB on it.
      
      This patch also changes the meaning of PERCPU_DYNAMIC_RESERVE such
      that it only indicates the size of the area to reserve for dynamic
      allocation as static and dynamic areas can be separate.  New
      PERCPU_DYNAMIC_RESERVED is increased by 4k for both 32 and 64bits as
      the reserved area separation eats away some allocatable space and
      having slightly more headroom (currently between 4 and 8k after
      minimal boot sans module area) makes sense for common case
      performance.
      
      x86_32 can address anywhere from anywhere and doesn't need reserving.
      
      Mike Galbraith first reported the problem first and bisected it to the
      embedding percpu allocator commit.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NMike Galbraith <efault@gmx.de>
      Reported-by: NJaswinder Singh Rajput <jaswinder@kernel.org>
      6b19b0c2
    • T
      percpu, module: implement reserved allocation and use it for module percpu variables · edcb4639
      Tejun Heo 提交于
      Impact: add reserved allocation functionality and use it for module
      	percpu variables
      
      This patch implements reserved allocation from the first chunk.  When
      setting up the first chunk, arch can ask to set aside certain number
      of bytes right after the core static area which is available only
      through a separate reserved allocator.  This will be used primarily
      for module static percpu variables on architectures with limited
      relocation range to ensure that the module perpcu symbols are inside
      the relocatable range.
      
      If reserved area is requested, the first chunk becomes reserved and
      isn't available for regular allocation.  If the first chunk also
      includes piggy-back dynamic allocation area, a separate chunk mapping
      the same region is created to serve dynamic allocation.  The first one
      is called static first chunk and the second dynamic first chunk.
      Although they share the page map, their different area map
      initializations guarantee they serve disjoint areas according to their
      purposes.
      
      If arch doesn't setup reserved area, reserved allocation is handled
      like any other allocation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      edcb4639
    • T
      percpu: use negative for auto for pcpu_setup_first_chunk() arguments · cafe8816
      Tejun Heo 提交于
      Impact: argument semantic cleanup
      
      In pcpu_setup_first_chunk(), zero @unit_size and @dyn_size meant
      auto-sizing.  It's okay for @unit_size as 0 doesn't make sense but 0
      dynamic reserve size is valid.  Alos, if arch @dyn_size is calculated
      from other parameters, it might end up passing in 0 @dyn_size and
      malfunction when the size is automatically adjusted.
      
      This patch makes both @unit_size and @dyn_size ssize_t and use -1 for
      auto sizing.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      cafe8816
    • T
      percpu: cosmetic renames in pcpu_setup_first_chunk() · 2441d15c
      Tejun Heo 提交于
      Impact: cosmetic, preparation for future changes
      
      Make the following renames in pcpur_setup_first_chunk() in preparation
      for future changes.
      
      * s/free_size/dyn_size/
      * s/static_vm/first_vm/
      * s/static_chunk/schunk/
      Signed-off-by: NTejun Heo <tj@kernel.org>
      2441d15c
    • T
      percpu: clean up percpu constants · 6a242909
      Tejun Heo 提交于
      Impact: cleaup
      
      Make the following cleanups.
      
      * There isn't much arch-specific about PERCPU_MODULE_RESERVE.  Always
        define it whether arch overrides PERCPU_ENOUGH_ROOM or not.
      
      * blackfin overrides PERCPU_ENOUGH_ROOM to align static area size.  Do
        it by default.
      
      * percpu allocation sizes doesn't have much to do with the page size.
        Don't use PAGE_SHIFT in their definition.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Bryan Wu <cooloney@kernel.org>
      6a242909
    • S
      ata: add CFA specific identify data words · d42ad15b
      Sergei Shtylyov 提交于
      Declare CFA specific identify data words 162 and 163 for future use.
      Signed-off-by: NSergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      [bart: update patch summary/description]
      Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      d42ad15b
  8. 05 3月, 2009 6 次提交
  9. 02 3月, 2009 2 次提交
    • I
      x86, mm: dont use non-temporal stores in pagecache accesses · f1800536
      Ingo Molnar 提交于
      Impact: standardize IO on cached ops
      
      On modern CPUs it is almost always a bad idea to use non-temporal stores,
      as the regression in this commit has shown it:
      
        30d697fa: x86: fix performance regression in write() syscall
      
      The kernel simply has no good information about whether using non-temporal
      stores is a good idea or not - and trying to add heuristics only increases
      complexity and inserts fragility.
      
      The regression on cached write()s took very long to be found - over two
      years. So dont take any chances and let the hardware decide how it makes
      use of its caches.
      
      The only exception is drivers/gpu/drm/i915/i915_gem.c: there were we are
      absolutely sure that another entity (the GPU) will pick up the dirty
      data immediately and that the CPU will not touch that data before the
      GPU will.
      
      Also, keep the _nocache() primitives to make it easier for people to
      experiment with these details. There may be more clear-cut cases where
      non-cached copies can be used, outside of filemap.c.
      
      Cc: Salman Qazi <sqazi@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f1800536
    • P
      5ce04e3d
  10. 01 3月, 2009 3 次提交
  11. 28 2月, 2009 1 次提交
  12. 27 2月, 2009 1 次提交
  13. 26 2月, 2009 4 次提交
    • J
      block: reduce stack footprint of blk_recount_segments() · 1e428079
      Jens Axboe 提交于
      blk_recalc_rq_segments() requires a request structure passed in, which
      we don't have from blk_recount_segments(). So the latter allocates one on
      the stack, using > 400 bytes of stack for that. This can cause us to spill
      over one page of stack from ext4 at least:
      
       0)     4560     400   blk_recount_segments+0x43/0x62
       1)     4160      32   bio_phys_segments+0x1c/0x24
       2)     4128      32   blk_rq_bio_prep+0x2a/0xf9
       3)     4096      32   init_request_from_bio+0xf9/0xfe
       4)     4064     112   __make_request+0x33c/0x3f6
       5)     3952     144   generic_make_request+0x2d1/0x321
       6)     3808      64   submit_bio+0xb9/0xc3
       7)     3744      48   submit_bh+0xea/0x10e
       8)     3696     368   ext4_mb_init_cache+0x257/0xa6a [ext4]
       9)     3328     288   ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
      10)     3040     160   ext4_mb_new_blocks+0x211/0x4b4 [ext4]
      11)     2880     336   ext4_ext_get_blocks+0xb61/0xd45 [ext4]
      12)     2544      96   ext4_get_blocks_wrap+0xf2/0x200 [ext4]
      13)     2448      80   ext4_da_get_block_write+0x6e/0x16b [ext4]
      14)     2368     352   mpage_da_map_blocks+0x7e/0x4b3 [ext4]
      15)     2016     352   ext4_da_writepages+0x2ce/0x43c [ext4]
      16)     1664      32   do_writepages+0x2d/0x3c
      17)     1632     144   __writeback_single_inode+0x162/0x2cd
      18)     1488      96   generic_sync_sb_inodes+0x1e3/0x32b
      19)     1392      16   sync_sb_inodes+0xe/0x10
      20)     1376      48   writeback_inodes+0x69/0xb3
      21)     1328     208   balance_dirty_pages_ratelimited_nr+0x187/0x2f9
      22)     1120     224   generic_file_buffered_write+0x1d4/0x2c4
      23)      896     176   __generic_file_aio_write_nolock+0x35f/0x393
      24)      720      80   generic_file_aio_write+0x6c/0xc8
      25)      640      80   ext4_file_write+0xa9/0x137 [ext4]
      26)      560     320   do_sync_write+0xf0/0x137
      27)      240      48   vfs_write+0xb3/0x13c
      28)      192      64   sys_write+0x4c/0x74
      29)      128     128   system_call_fastpath+0x16/0x1b
      
      Split the segment counting out into a __blk_recalc_rq_segments() helper
      to avoid allocating an onstack request just for checking the physical
      segment count.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1e428079
    • P
      rcu: Teach RCU that idle task is not quiscent state at boot · a6826048
      Paul E. McKenney 提交于
      This patch fixes a bug located by Vegard Nossum with the aid of
      kmemcheck, updated based on review comments from Nick Piggin,
      Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
      and function-name language.  ;-)
      
      The boot CPU runs in the context of its idle thread during boot-up.
      During this time, idle_cpu(0) will always return nonzero, which will
      fool Classic and Hierarchical RCU into deciding that a large chunk of
      the boot-up sequence is a big long quiescent state.  This in turn causes
      RCU to prematurely end grace periods during this time.
      
      This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
      function to ignore the idle task as a quiescent state until the
      system has started up the scheduler in rest_init(), introducing a
      new non-API function rcu_idle_now_means_idle() to inform RCU of this
      transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
      to track this state, which is then used by rcu_check_callback() to
      determine if it should believe idle_cpu().
      
      Because this patch has the effect of disallowing RCU grace periods
      during long stretches of the boot-up sequence, this patch also introduces
      Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
      no-op if num_online_cpus() returns 1.  This allows boot-time code that
      calls synchronize_rcu() to proceed normally.  Note, however, that RCU
      callbacks registered by call_rcu() will likely queue up until later in
      the boot sequence.  Although rcuclassic and rcutree can also use this
      same optimization after boot completes, rcupreempt must restrict its
      use of this optimization to the portion of the boot sequence before the
      scheduler starts up, given that an rcupreempt RCU read-side critical
      section may be preeempted.
      
      In addition, this patch takes Nick Piggin's suggestion to make the
      system_state global variable be __read_mostly.
      
      Changes since v4:
      
      o	Changes the name of the introduced function and variable to
      	be less emotional.  ;-)
      
      Changes since v3:
      
      o	WARN_ON(nr_context_switches() > 0) to verify that RCU
      	switches out of boot-time mode before the first context
      	switch, as suggested by Nick Piggin.
      
      Changes since v2:
      
      o	Created rcu_blocking_is_gp() internal-to-RCU API that
      	determines whether a call to synchronize_rcu() is itself
      	a grace period.
      
      o	The definition of rcu_blocking_is_gp() for rcuclassic and
      	rcutree checks to see if but a single CPU is online.
      
      o	The definition of rcu_blocking_is_gp() for rcupreempt
      	checks to see both if but a single CPU is online and if
      	the system is still in early boot.
      
      	This allows rcupreempt to again work correctly if running
      	on a single CPU after booting is complete.
      
      o	Added check to rcupreempt's synchronize_sched() for there
      	being but one online CPU.
      
      Tested all three variants both SMP and !SMP, booted fine, passed a short
      rcutorture test on both x86 and Power.
      Located-by: NVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: NVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a6826048
    • T
      percpu: fix too low alignment restriction on UP · e3176036
      Tejun Heo 提交于
      UP __alloc_percpu() triggered WARN_ON_ONCE() if the requested
      alignment is larger than that of unsigned long long, which is too
      small for all the cacheline aligned allocations.  Bump it up to
      SMP_CACHE_BYTES which kmalloc allocations generally guarantee.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NIngo Molnar <mingo@elte.hu>
      e3176036
    • B
      ide: fix refcounting in device drivers · 8fed4368
      Bartlomiej Zolnierkiewicz 提交于
      During host driver module removal del_gendisk() results in a final
      put on drive->gendev and freeing the drive by drive_release_dev().
      
      Convert device drivers from using struct kref to use struct device
      so device driver's object holds reference on ->gendev and prevents
      drive from prematurely going away.
      
      Also fix ->remove methods to not erroneously drop reference on a
      host driver by using only put_device() instead of ide*_put().
      Reported-by: NStanislaw Gruszka <stf_xl@wp.pl>
      Tested-by: NStanislaw Gruszka <stf_xl@wp.pl>
      Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      8fed4368
  14. 25 2月, 2009 5 次提交
  15. 24 2月, 2009 2 次提交
    • T
      percpu: give more latitude to arch specific first chunk initialization · 8d408b4b
      Tejun Heo 提交于
      Impact: more latitude for first percpu chunk allocation
      
      The first percpu chunk serves the kernel static percpu area and may or
      may not contain extra room for further dynamic allocation.
      Initialization of the first chunk needs to be done before normal
      memory allocation service is up, so it has its own init path -
      pcpu_setup_static().
      
      It seems archs need more latitude while initializing the first chunk
      for example to take advantage of large page mapping.  This patch makes
      the following changes to allow this.
      
      * Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
        to reserve in the first chunk for further dynamic allocation.
      
      * Rename pcpu_setup_static() to pcpu_setup_first_chunk().
      
      * Make pcpu_setup_first_chunk() much more flexible by fetching page
        pointer by callback and adding optional @unit_size, @free_size and
        @base_addr arguments which allow archs to selectively part of chunk
        initialization to their likings.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      8d408b4b
    • T
      vmalloc: add @align to vm_area_register_early() · c0c0a293
      Tejun Heo 提交于
      Impact: allow larger alignment for early vmalloc area allocation
      
      Some early vmalloc users might want larger alignment, for example, for
      custom large page mapping.  Add @align to vm_area_register_early().
      While at it, drop docbook comment on non-existent @size.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      c0c0a293