1. 15 Dec, 2008 3 commits
    • perfcounters: add context switch counter · 5d6a27d8
      Committed by Ingo Molnar
      Impact: add new feature, new sw counter
      
      Add a counter that counts the number of context-switches a task
      is doing.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perfcounters: implement "counter inheritance" · 9b51f66d
      Committed by Ingo Molnar
      Impact: implement new performance feature
      
      Counter inheritance can be used to run performance counters in a workload,
      transparently - and pipe back the counter results to the parent counter.
      
      Inheritance for performance counters works the following way: when creating
      a counter it can be marked with the .inherit=1 flag. Such counters are then
      'inherited' by all child tasks (be they fork()-ed or clone()-ed). These
      counters get inherited through exec() boundaries as well (except through
      setuid boundaries).
      
      The counter values get added back to the parent counter(s) when the child
      task(s) exit - much like stime/utime statistics are gathered. So inherited
      counters are ideal to gather summary statistics about an application's
      behavior via shell commands, without having to modify that application.
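
The add-back step described above can be sketched as a small userspace model. This is only an illustration, not the kernel code: `struct counter`, `counter_inherit()` and `counter_exit()` are hypothetical names, and fork()/exit() are modelled as plain function calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; this is not the kernel's struct perf_counter. */
struct counter {
    uint64_t count;          /* events counted by this task so far */
    int inherit;             /* mirrors the .inherit=1 flag from the text */
    struct counter *parent;  /* NULL for a top-level counter */
};

/* Model of fork()/clone(): the child gets its own counter only if the
 * parent's counter was marked inheritable. Returns 1 if it was inherited. */
int counter_inherit(struct counter *parent_ctr, struct counter *child_ctr)
{
    assert(parent_ctr != NULL && child_ctr != NULL);
    if (!parent_ctr->inherit)
        return 0;
    child_ctr->count = 0;        /* the child starts counting from zero */
    child_ctr->inherit = 1;      /* grandchildren inherit as well */
    child_ctr->parent = parent_ctr;
    return 1;
}

/* Model of child exit: add the child's value back to the parent counter,
 * much like stime/utime are accumulated. */
void counter_exit(struct counter *child_ctr)
{
    assert(child_ctr != NULL);
    if (child_ctr->parent != NULL) {
        child_ctr->parent->count += child_ctr->count;
        child_ctr->count = 0;
        child_ctr->parent = NULL;
    }
}
```

In this model a parent that counted 100 events and a child that counted 42 leave the parent counter at 142 once the child has exited.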
      
      The timec.c command utilizes counter inheritance:
      
        http://redhat.com/~mingo/perfcounters/timec.c
      
      Sample output:
      
         $ ./timec -e 1 -e 3 -e 5 ls -lR /usr/include/ >/dev/null
      
         Performance counter stats for 'ls':
      
                 163516953 instructions
                      2295 cache-misses
                   2855182 branch-misses
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perfcounters: restructure x86 counter math · ee06094f
      Committed by Ingo Molnar
      Impact: restructure code
      
      Change counter math from absolute values to clear delta logic.
      
      We try to extract elapsed deltas from the raw hw counter - and put
      that into the generic counter.
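
A minimal sketch of what such delta logic could look like, assuming a free-running raw counter of fixed width that wraps around. The 40-bit width and all names here are illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed raw counter width; real hardware widths vary. */
#define HW_COUNTER_BITS 40
#define HW_COUNTER_MASK ((UINT64_C(1) << HW_COUNTER_BITS) - 1)

/* Hypothetical generic counter: remembers the last raw value it saw. */
struct generic_counter {
    uint64_t prev_raw;  /* raw hw value at the previous update */
    uint64_t count;     /* accumulated 64-bit event count */
};

/* Extract the elapsed delta from the raw value and add it to the
 * generic count. Masking the difference handles a single wraparound
 * of the narrow raw counter. Returns the delta that was applied. */
uint64_t counter_update(struct generic_counter *c, uint64_t new_raw)
{
    assert((new_raw & ~HW_COUNTER_MASK) == 0);
    uint64_t delta = (new_raw - c->prev_raw) & HW_COUNTER_MASK;
    c->prev_raw = new_raw;
    c->count += delta;
    return delta;
}
```

Because only deltas are accumulated, the generic count keeps growing correctly even when the raw value wraps from its maximum back to zero between two updates.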
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 11 Dec, 2008 14 commits
    • perf counters: clean up state transitions · 6a930700
      Committed by Ingo Molnar
      Impact: cleanup
      
      Introduce a proper enum for the 3 states of a counter:
      
      	PERF_COUNTER_STATE_OFF		= -1
      	PERF_COUNTER_STATE_INACTIVE	=  0
      	PERF_COUNTER_STATE_ACTIVE	=  1
      
      and rename counter->active to counter->state and propagate the
      changes everywhere.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: add prctl interface to disable/enable counters · 1d1c7ddb
      Committed by Ingo Molnar
      Add a way for self-monitoring tasks to disable/enable counters summarily,
      via a prctl:
      
      	PR_TASK_PERF_COUNTERS_DISABLE		31
      	PR_TASK_PERF_COUNTERS_ENABLE		32
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: implement PERF_COUNT_TASK_CLOCK · bae43c99
      Committed by Ingo Molnar
      Impact: add new perf-counter type
      
      The 'task clock' counter counts the amount of time a task is executing,
      in nanoseconds. It stops ticking when a task is scheduled out either due
      to it blocking, sleeping or it being preempted.
      
      This counter type is a Linux kernel based abstraction, it is available
      even if the hardware does not support native hardware performance counters.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: consolidate hw_perf save/restore APIs · 01b2838c
      Committed by Ingo Molnar
      Impact: cleanup
      
      Rename them to better match up the usual IRQ disable/enable APIs:
      
       hw_perf_disable_all()  => hw_perf_save_disable()
       hw_perf_restore_ctrl() => hw_perf_restore()
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: implement PERF_COUNT_CPU_CLOCK · 5c92d124
      Committed by Ingo Molnar
      Impact: add new perf-counter type
      
      The 'CPU clock' counter counts the amount of CPU clock time that is
      elapsing, in nanoseconds. (regardless of how much of it the task is
      spending on a CPU executing)
      
      This counter type is a Linux kernel based abstraction, it is available
      even if the hardware does not support native hardware performance counters.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: hw driver API · 621a01ea
      Committed by Ingo Molnar
      Impact: restructure code, introduce hw_ops driver abstraction
      
      Introduce this abstraction to handle counter details:
      
       struct hw_perf_counter_ops {
      	void (*hw_perf_counter_enable)	(struct perf_counter *counter);
      	void (*hw_perf_counter_disable)	(struct perf_counter *counter);
      	void (*hw_perf_counter_read)	(struct perf_counter *counter);
       };
      
      This will be useful to support asymmetric hw details, and it will also
      be useful to implement "software counters". (Counters that count kernel
      managed sw events such as pagefaults, context-switches, wall-clock time
      or task-local time.)
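
To illustrate how an ops table like this lets generic code drive a "software counter", here is a self-contained userspace sketch. The `sketch_*` and `sw_*` names are hypothetical, and a plain variable stands in for a kernel-maintained statistic such as the context-switch count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_counter;

/* Same shape as the ops structure above, with shortened hypothetical names. */
struct sketch_counter_ops {
    void (*enable)(struct sketch_counter *counter);
    void (*disable)(struct sketch_counter *counter);
    void (*read)(struct sketch_counter *counter);
};

struct sketch_counter {
    const struct sketch_counter_ops *ops;
    uint64_t count;  /* value exposed to generic code */
    uint64_t base;   /* event source value at enable time or last read */
    int enabled;
};

/* Stand-in for a kernel-managed software event source. */
uint64_t sketch_context_switches;

void sw_enable(struct sketch_counter *c)
{
    assert(c != NULL);
    c->base = sketch_context_switches;  /* start counting from now */
    c->enabled = 1;
}

void sw_read(struct sketch_counter *c)
{
    if (!c->enabled)
        return;
    /* Fold in the events seen since the last enable/read. */
    c->count += sketch_context_switches - c->base;
    c->base = sketch_context_switches;
}

void sw_disable(struct sketch_counter *c)
{
    sw_read(c);      /* capture events up to the disable point */
    c->enabled = 0;
}

const struct sketch_counter_ops sw_ctxsw_ops = {
    .enable = sw_enable,
    .disable = sw_disable,
    .read = sw_read,
};
```

Generic code only ever calls `c->ops->enable()`, `c->ops->read()` and `c->ops->disable()`, so a hardware-backed implementation could be swapped in without changing the callers.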
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: add support for group counters · 04289bb9
      Committed by Ingo Molnar
      Impact: add group counters
      
      This patch adds the "counter groups" abstraction.
      
      Groups of counters behave much like normal 'single' counters, with a
      few semantic and behavioral extensions on top of that.
      
      A counter group is created by creating a new counter with the open()
      syscall's group-leader group_fd file descriptor parameter pointing
      to another, already existing counter.
      
      Groups of counters are scheduled in and out in one atomic group, and
      they are also roundrobin-scheduled atomically.
      
      Counters that are members of a group can also record events with an
      (atomic) extended timestamp that extends to all members of the group,
      if the record type is set to PERF_RECORD_GROUP.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: restructure the API · 9f66a381
      Committed by Ingo Molnar
      Impact: clean up new API
      
      Thorough cleanup of the new perf counters API, we now get clean separation
      of the various concepts:
      
       - introduce perf_counter_hw_event to separate out the event source details
      
       - move special type flags into separate attributes: PERF_COUNT_NMI,
         PERF_COUNT_RAW
      
       - extend the type to u64 and reserve it fully to the architecture in the
         raw type case.
      
      And make use of all these changes in the core and x86 perfcounters code.
      
      Also change the syscall signature to:
      
        asmlinkage int sys_perf_counter_open(
      
      	struct perf_counter_hw_event	*hw_event_uptr		__user,
      	pid_t				pid,
      	int				cpu,
      	int				group_fd);
      
      ( Note that group_fd is unused for now - it's reserved for the counter
        groups abstraction. )
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: expand use of counter->event · dfa7c899
      Committed by Thomas Gleixner
      Impact: change syscall, cleanup
      
      Make use of the new perf_counters event type.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: clean up 'raw' type API · eab656ae
      Committed by Thomas Gleixner
      Impact: cleanup
      
      Introduce a separate hw_event type.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf counters: protect them against CSTATE transitions · 4ac13294
      Committed by Thomas Gleixner
      Impact: fix rare lost events problem
      
      There are CPUs whose performance counters misbehave on CSTATE transitions,
      so provide a way to just disable/enable them around deep idle methods.
      
      (hw_perf_enable_all() is cheap on x86.)
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • KSYM_SYMBOL_LEN fixes · 9c246247
      Committed by Hugh Dickins
      Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
      to my 966c8c12 "sprint_symbol(): use less stack" exposing a bug in
      slub's list_locations() -
      kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
      beyond the end of page provided.
      
      The 100 slop which list_locations() allows at end of page looks roughly
      enough for all the other stuff it might print after the symbol before
      it checks again: break out KSYM_SYMBOL_LEN earlier than before.
      
      Latencytop and ftrace are using KSYM_NAME_LEN buffers where they
      need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
      where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
      them.
      
      [akpm@linux-foundation.org: ftrace.h needs module.h]
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Miles Lane <miles.lane@gmail.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Steven Rostedt <srostedt@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • revert "percpu_counter: new function percpu_counter_sum_and_set" · 02d21168
      Committed by Andrew Morton
      Revert
      
          commit e8ced39d
          Author: Mingming Cao <cmm@us.ibm.com>
          Date:   Fri Jul 11 19:27:31 2008 -0400
      
              percpu_counter: new function percpu_counter_sum_and_set
      
      As described in
      
      	revert "percpu counter: clean up percpu_counter_sum_and_set()"
      
      the new percpu_counter_sum_and_set() is racy against updates to the
      cpu-local accumulators on other CPUs.  Revert that change.
      
      This means that ext4 will be slow again.  But correct.
      Reported-by: Eric Dumazet <dada1@cosmosbay.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: <stable@kernel.org>		[2.6.27.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • revert "percpu counter: clean up percpu_counter_sum_and_set()" · 71c5576f
      Committed by Andrew Morton
      Revert
      
          commit 1f7c14c6
          Author: Mingming Cao <cmm@us.ibm.com>
          Date:   Thu Oct 9 12:50:59 2008 -0400
      
              percpu counter: clean up percpu_counter_sum_and_set()
      
      Before this patch we had the following:
      
      percpu_counter_sum(): return the percpu_counter's value
      
      percpu_counter_sum_and_set(): return the percpu_counter's value, copying
      that value into the central value and zeroing the per-cpu counters before
      returning.
      
      After this patch, percpu_counter_sum_and_set() has gone, and
      percpu_counter_sum() gets the old percpu_counter_sum_and_set()
      functionality.
      
      Problem is, as Eric points out, the old percpu_counter_sum_and_set()
      functionality was racy and wrong.  It zeroes out counters on "other" cpus,
      without holding any locks which will prevent races against updates from
      those other CPUS.
      
      This patch reverts 1f7c14c6.  This means
      that percpu_counter_sum_and_set() still has the race, but
      percpu_counter_sum() does not.
      
      Note that this is not a simple revert - ext4 has since started using
      percpu_counter_sum() for its dirty_blocks counter as well.
      
      Note that this revert patch changes percpu_counter_sum() semantics.
      
      Before the patch, a call to percpu_counter_sum() will bring the counter's
      central counter mostly up-to-date, so a following percpu_counter_read()
      will return a close value.
      
      After this patch, a call to percpu_counter_sum() will leave the counter's
      central accumulator unaltered, so a subsequent call to
      percpu_counter_read() can now return a significantly inaccurate result.
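
The difference between the two semantics can be modelled in a few lines of single-threaded C. This is a hedged sketch with hypothetical `sketch_*` names, not the kernel's percpu_counter implementation, and being single-threaded it cannot reproduce the race itself. It only shows which fields each variant touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4  /* assumed CPU count for the model */

struct sketch_percpu_counter {
    int64_t count;                  /* central value */
    int32_t local[SKETCH_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Cheap and approximate: looks at the central value only. */
int64_t sketch_read(const struct sketch_percpu_counter *fbc)
{
    assert(fbc != NULL);
    return fbc->count;
}

/* Semantics after the revert: accurate, but leaves all state untouched. */
int64_t sketch_sum(const struct sketch_percpu_counter *fbc)
{
    int64_t ret = fbc->count;
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        ret += fbc->local[cpu];
    return ret;
}

/* Semantics being reverted: fold the per-CPU deltas into the central
 * value and zero them. In the kernel, zeroing another CPU's slot
 * without excluding that CPU can lose a concurrent update. */
int64_t sketch_sum_and_set(struct sketch_percpu_counter *fbc)
{
    int64_t ret = fbc->count;
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        ret += fbc->local[cpu];
        fbc->local[cpu] = 0;
    }
    fbc->count = ret;
    return ret;
}
```

After `sketch_sum()` a following `sketch_read()` still returns the stale central value, which is the inaccuracy the message warns about. After `sketch_sum_and_set()` the two agree.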
      
      If there is any code in the tree which was introduced after
      e8ced39d was merged, and which depends
      upon the new percpu_counter_sum() semantics, that code will break.
      Reported-by: Eric Dumazet <dada1@cosmosbay.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 09 Dec, 2008 2 commits
  4. 08 Dec, 2008 1 commit
    • performance counters: core code · 0793a61d
      Committed by Thomas Gleixner
      Implement the core kernel bits of Performance Counters subsystem.
      
      The Linux Performance Counter subsystem provides an abstraction of
      performance counter hardware capabilities. It provides per task and per
      CPU counters, and it provides event capabilities on top of those.
      
      Performance counters are accessed via special file descriptors.
      There's one file descriptor per virtual counter used.
      
      The special file descriptor is opened via the perf_counter_open()
      system call:
      
       int
       perf_counter_open(u32 hw_event_type,
                         u32 hw_event_period,
                         u32 record_type,
                         pid_t pid,
                         int cpu);
      
      The syscall returns the new fd. The fd can be used via the normal
      VFS system calls: read() can be used to read the counter, fcntl()
      can be used to set the blocking mode, etc.
      
      Multiple counters can be kept open at a time, and the counters
      can be poll()ed.
      
      See more details in Documentation/perf-counters.txt.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 06 Dec, 2008 1 commit
  6. 04 Dec, 2008 3 commits
  7. 03 Dec, 2008 4 commits
    • block: fix setting of max_segment_size and seg_boundary mask · 0e435ac2
      Committed by Milan Broz
      Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
      devices.
      
      When stacking devices (LVM over MD over SCSI) some of the request queue
      parameters are not set up correctly in some cases by default, namely
      max_segment_size and seg_boundary mask.
      
      If you create MD device over SCSI, these attributes are zeroed.
      
      The problem appears when another device-mapper mapping is stacked on top
      of this one - queue attributes are set in DM this way:
      
      request_queue   max_segment_size  seg_boundary_mask
      SCSI                65536             0xffffffff
      MD RAID1                0                      0
      LVM                 65536                 -1 (64bit)
      
      Unfortunately bio_add_page (resp.  bio_phys_segments) calculates number of
      physical segments according to these parameters.
      
      During generic_make_request() the segment count is recalculated and can
      increase bio->bi_phys_segments count over the allowed limit.  (After
      bio_clone() in stack operation.)
      
      This is especially a problem in the CCISS driver, where it produces an OOPS here:
      
          BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
      
      (MAXSGENTRIES is 31 by default.)
      
      Sometimes even this command is enough to cause oops:
      
        dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10
      
      This command generates bios with 250 sectors, allocated in 32 4k-pages
      (last page uses only 1024 bytes).
      
      For LVM layer, it allocates bio with 31 segments (still OK for CCISS),
      unfortunately on the lower layer it is recalculated to 32 segments and this
      violates CCISS restriction and triggers BUG_ON().
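
The numbers quoted for the dd command can be checked with a short calculation. The sketch below assumes 512-byte sectors, 4 KiB pages, a page-aligned buffer, and a worst case of one segment per page (no merging of adjacent pages). The helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SECTOR_SIZE 512u   /* assumed sector size in bytes */
#define SKETCH_PAGE_SIZE   4096u  /* assumed page size in bytes */

struct bio_layout {
    size_t sectors;          /* request length in sectors */
    size_t pages;            /* pages needed to hold the data */
    size_t last_page_bytes;  /* bytes used in the final page */
};

/* Compute how a request of `bytes` bytes spreads over sectors and pages. */
struct bio_layout layout_for_request(size_t bytes)
{
    assert(bytes > 0 && bytes % SKETCH_SECTOR_SIZE == 0);
    struct bio_layout l;
    l.sectors = bytes / SKETCH_SECTOR_SIZE;
    l.pages = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;  /* round up */
    l.last_page_bytes = bytes - (l.pages - 1) * SKETCH_PAGE_SIZE;
    return l;
}

/* Worst case: every page becomes its own physical segment. */
int exceeds_segment_limit(const struct bio_layout *l, size_t max_segments)
{
    return l->pages > max_segments;
}
```

For bs=128000 this gives 250 sectors, 32 pages and 1024 bytes in the last page, so a per-page segment count of 32 is one more than the limit of 31 named in the BUG_ON().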
      
      The patch tries to fix it by:
      
       * initializing attributes above in queue request constructor
         blk_queue_make_request()
      
       * make sure that blk_queue_stack_limits() inherits setting
      
       (DM uses its own function to set the limits because
       blk_queue_stack_limits() was introduced later.  It should probably switch
       to use generic stack limit function too.)
      
       * sets the default seg_boundary value in one place (blkdev.h)
      
       * use this mask as default in DM (instead of -1, which differs in 64bit)
      
      Bugs related to this:
      https://bugzilla.redhat.com/show_bug.cgi?id=471639
      http://bugzilla.kernel.org/show_bug.cgi?id=8672
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Reviewed-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: internal dequeue shouldn't start timer · 53a08807
      Committed by Tejun Heo
      blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
      both start the timeout timer.  Barrier code dequeues the original
      barrier request but doesn't pass the request itself to the lower level
      driver, only broken down proxy requests; however, the original
      barrier request goes through the same dequeue path and the timeout timer is
      started on it.  If barrier sequence takes long enough, this timer
      expires but the low level driver has no idea about this request and
      oops follows.
      
      Timeout timer shouldn't have been started on the original barrier
      request as it never goes through actual IO.  This patch unexports
      elv_dequeue_request(), which has no external user anyway, and makes it
      operate on elevator proper w/o adding the timer and make
      blkdev_dequeue_request() call elv_dequeue_request() and add timer.
      Internal users which don't pass the request to driver - barrier code
      and end_that_request_last() - are converted to use
      elv_dequeue_request().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • nfsd: fix vm overcommit crash fix #2 · 1b79cd04
      Committed by Junjiro R. Okajima
      The previous patch from Alan Cox ("nfsd: fix vm overcommit crash",
      commit 731572d3) fixed the problem where
      knfsd crashes on exported shmemfs objects and strict overcommit is set.
      
      But the patch forgot supporting the case when CONFIG_SECURITY is
      disabled.
      
      This patch copies a part of his fix which is mainly for detecting a bug
      earlier.
      Acked-by: James Morris <jmorris@namei.org>
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • amd74xx: workaround unreliable AltStatus register for nVidia controllers · 6636487e
      Committed by Bartlomiej Zolnierkiewicz
      It seems that on some nVidia controllers using AltStatus register
      can be unreliable so default to Status register if the PCI device
      is in Compatibility Mode.  In order to achieve this:
      
      * Add ide_pci_is_in_compatibility_mode() inline helper to <linux/ide.h>.
      
      * Add IDE_HFLAG_BROKEN_ALTSTATUS host flag and set it in amd74xx host
        driver for nVidia controllers in Compatibility Mode.
      
      * Teach actual_try_to_identify() and drive_is_ready() about the new flag.
      
      This fixes the regression caused by removal of CONFIG_IDEPCI_SHARE_IRQ
      config option in 2.6.25 and using AltStatus register unconditionally when
      available (kernel.org bugs #11659 and #10216).
      
      [ Moreover for CONFIG_IDEPCI_SHARE_IRQ=y (which is what most people
        and distributions use) it never worked correctly. ]
      
      Thanks to Remy LABENE and Lars Winterfeld for help with debugging the problem.
      
      More info at:
      http://bugzilla.kernel.org/show_bug.cgi?id=11659
      http://bugzilla.kernel.org/show_bug.cgi?id=10216
      Reported-by: Remy LABENE <remy.labene@free.fr>
      Tested-by: Remy LABENE <remy.labene@free.fr>
      Tested-by: Lars Winterfeld <lars.winterfeld@tu-ilmenau.de>
      Acked-by: Borislav Petkov <petkovbb@gmail.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
  8. 02 Dec, 2008 3 commits
    • lib/idr.c: fix rcu related race with idr_find · 6ff2d39b
      Committed by Manfred Spraul
      2nd part of the fixes needed for
      http://bugzilla.kernel.org/show_bug.cgi?id=11796.
      
      When the idr tree is either grown or shrunk, then the update to the number
      of layers and the top pointer were not atomic.  This race caused crashes.
      
      The attached patch fixes that by replicating the layers counter in each
      layer, thus idr_find doesn't need idp->layers anymore.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Clement Calmels <cboulte@gmail.com>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: introduce resource usage limits · 7ef9964e
      Committed by Davide Libenzi
      It has been thought that the per-user file descriptors limit would also
      limit the resources that a normal user can request via the epoll
      interface.  Vegard Nossum reported a very simple program (a modified
      version attached) that lets a normal user request a pretty large
      amount of kernel memory, well within its maximum number of fds.  To
      solve such problem, default limits are now imposed, and /proc based
      configuration has been introduced.  A new directory has been created,
      named /proc/sys/fs/epoll/ and inside there, there are two configuration
      points:
      
        max_user_instances = Maximum number of devices - per user
      
        max_user_watches   = Maximum number of "watched" fds - per user
      
      The current default for "max_user_watches" limits the memory used by epoll
      to store "watches", to 1/32 of the amount of the low RAM.  As example, a
      256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
      That should be enough to not break existing heavy epoll users.  The
      default value for "max_user_instances" is set to 128, that should be
      enough too.
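
As a rough check of the "roughly 90000" figure, the default can be modelled as the 1/32 memory budget divided by a per-watch cost. The commit message does not state that cost, so the 92 bytes used in the usage note is purely an assumption for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Memory budget for watches: 1/32 of low RAM, as stated in the text. */
uint64_t epoll_watch_budget(uint64_t low_ram_bytes)
{
    return low_ram_bytes / 32;
}

/* Number of watches that fit into the budget for a given per-watch cost.
 * The per-watch cost is an input here because the text does not give it. */
uint64_t epoll_max_user_watches(uint64_t low_ram_bytes, uint64_t per_watch_bytes)
{
    assert(per_watch_bytes > 0);
    return epoll_watch_budget(low_ram_bytes) / per_watch_bytes;
}
```

With 256MB of low RAM the budget is 8388608 bytes (8MB); an assumed cost of 92 bytes per watch then gives 91180 watches, which is in line with the approximate value quoted above.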
      
      This also changes the userspace, because a new error code can now come out
      from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
      listed, so that should be ok.
      
      [akpm@linux-foundation.org: use get_current_user()]
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: <stable@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Reported-by: Vegard Nossum <vegardno@ifi.uio.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • libata: blacklist Seagate drives which time out FLUSH_CACHE when used with NCQ · ac70a964
      Committed by Tejun Heo
      Some recent Seagate harddrives have firmware bug which causes FLUSH
      CACHE to timeout under certain circumstances if NCQ is being used.
      This can be worked around by disabling NCQ and fixed by updating the
      firmware.  Implement ATA_HORKAGE_FIRMWARE_UPDATE and blacklist these
      devices.
      
      The wiki page has been updated to contain information on this issue.
      
        http://ata.wiki.kernel.org/index.php/Known_issues
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  9. 01 Dec, 2008 3 commits
  10. 29 Nov, 2008 1 commit
  11. 28 Nov, 2008 1 commit
    • Allow architectures to override copy_user_highpage() · 487ff320
      Committed by Russell King
      With aliasing VIPT cache support, the ARM implementation of
      clear_user_page() and copy_user_page() sets up a temporary kernel space
      mapping such that we have the same cache colour as the userspace page.
      This avoids having to consider any userspace aliases from this operation.
      
      However, when highmem is enabled, kmap_atomic() has to set up mappings.
      The copy_user_highpage() and clear_user_highpage() call these functions
      before delegating the copies to copy_user_page() and clear_user_page().
      
      The effect of this is that each of the *_user_highpage() functions setup
      their own kmap mapping, followed by the *_user_page() functions setting
      up another mapping.  This is rather wasteful.
      
      Thankfully, copy_user_highpage() can be overridden by architectures by
      defining __HAVE_ARCH_COPY_USER_HIGHPAGE.  However, replacement of
      clear_user_highpage() is more difficult because its inline definition
      is not conditional.  It seems that you're expected to define
      __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE and provide a replacement
      __alloc_zeroed_user_highpage() implementation instead.
      
      The allocation itself is fine, so we don't want to override that.  What
      we really want to do is to override clear_user_highpage() with our own
      version which doesn't kmap_atomic() unnecessarily.
      
      Other VIPT architectures (PARISC and SH) would also like to override
      this function.
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  12. 27 Nov, 2008 1 commit
  13. 25 Nov, 2008 1 commit
  14. 23 Nov, 2008 1 commit
  15. 20 Nov, 2008 1 commit