1. 02 Oct, 2006 22 commits
    • [PATCH] pids coding style use struct pidmap in next_pidmap · c88be3eb
      Committed by Eric W. Biederman
      Use struct pidmap instead of pidmap_t.

      This updates my proc: readdir race fix (take 3) patch
      to account for the changes made by Sukadev Bhattiprolu <sukadev@us.ibm.com>
      to kill pidmap_t.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pids: coding style: use struct pidmap · 6a1f3b84
      Committed by Sukadev Bhattiprolu
      Use struct pidmap instead of pidmap_t.

      It's a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/271.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] const struct tty_operations · b68e31d0
      Committed by Jeff Dike
      As part of an SMP cleanliness pass over UML, I consted a bunch of
      structures in order to not have to document their locking.  One of these
      structures was a struct tty_operations.  In order to const it in UML
      without introducing compiler complaints, the declaration of
      tty_set_operations needs to be changed, and then all of its callers need to
      be fixed.
      
      This patch declares all struct tty_operations in the tree as const.  In all
      cases, they are static and used only as input to tty_set_operations.  As an
      extra check, I ran an i386 allyesconfig build which produced no extra
      warnings.
      
      53 drivers are affected.  I checked the history of a bunch of them, and in
      most cases, there have been only a handful of maintenance changes in the
      last six months.  serial_core.c was the busiest one that I looked at.
      Signed-off-by: Jeff Dike <jdike@addtoit.com>
      Acked-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fs/inode.c tweaks · ed97bd37
      Committed by Andreas Mohr
      Only touch inode's i_mtime and i_ctime to make them equal to "now" in case
      they aren't yet (don't just update timestamp unconditionally).  Uninline
      the hash function to save 259 Bytes.
      
      This tiny inode change which may improve cache behaviour also shaves off 8
      Bytes from file_update_time() on i386.
      
      Included a tiny codestyle cleanup, too.
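
The conditional timestamp update described above can be sketched in user space roughly as follows. This is only an illustration of the "write only when the value differs" idea; `struct toy_inode` and `toy_update_time` are hypothetical names and not the kernel's `file_update_time()`.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the two inode timestamps discussed above. */
struct toy_inode {
    struct timespec i_mtime;
    struct timespec i_ctime;
};

static bool ts_equal(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/*
 * Store "now" into i_mtime / i_ctime only when they differ from it, so an
 * inode that is already up to date is not written to at all.
 * Returns true if anything was modified.
 */
static bool toy_update_time(struct toy_inode *inode, struct timespec now)
{
    bool dirty = false;

    if (!ts_equal(&inode->i_mtime, &now)) {
        inode->i_mtime = now;
        dirty = true;
    }
    if (!ts_equal(&inode->i_ctime, &now)) {
        inode->i_ctime = now;
        dirty = true;
    }
    return dirty;
}
```

Skipping the store when nothing changes keeps the cache line holding the inode clean, which is presumably the cache-behaviour benefit the message refers to.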
      Signed-off-by: Andreas Mohr <andi@lisas.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Remove NULL check in register_nls() · 07acaf28
      Committed by Alexey Dobriyan
      Every caller passes a valid pointer there.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] file: modify struct fown_struct to use a struct pid · 609d7fa9
      Committed by Eric W. Biederman
      File handles can be requested to send SIGIO and SIGURG to processes.  By
      tracking the destination processes using struct pid instead of pid_t we make
      the interface safe from all potential pid wrap around problems.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vt: Make vt_pid a struct pid (making it pid wrap around safe). · bde0d2c9
      Committed by Eric W. Biederman
      I took a good hard look at the locking and it appears the locking on vt_pid
      is the console semaphore.  Every modified path is called under the console
      semaphore except reset_vc when it is called from fn_SAK or do_SAK both of
      which appear to be in interrupt context.  In addition I need to be careful
      because in the presence of an oops the console_sem may be arbitrarily
      dropped.
      
      Which leads me to conclude the current locking is inadequate for my needs.
      
      Given the weird cases we could hit because of oops printing instead of
      introducing an extra spin lock to protect the data and keep the pid to
      signal and the signal to send in sync, I have opted to use xchg on just the
      struct pid * pointer instead.
      
      Due to console_sem we will stay in sync between vt_pid and vt_mode except
      for a small window during a SAK, or oops handling.  SAK handling should
      kill any user space processes that care, and during oops handling we are
      broken anyway.  Besides, the worst that can happen is that I try to send
      the wrong signal.
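
The "exchange just the pointer" approach can be illustrated in portable C11 as below. This is a user-space sketch only: the kernel uses its own `xchg()` primitive and reference-counted `struct pid`, whereas `struct toy_pid`, `toy_vt_pid`, and `toy_swap_pid` here are hypothetical names built on `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted pid object. */
struct toy_pid {
    int nr;
};

/* Shared slot, playing the role that vt_pid plays in the message above. */
static _Atomic(struct toy_pid *) toy_vt_pid;

/*
 * Atomically install new_pid and hand back whatever was stored before.
 * Exactly one caller receives each old pointer, so that caller can safely
 * drop its reference without taking a lock.
 */
static struct toy_pid *toy_swap_pid(struct toy_pid *new_pid)
{
    return atomic_exchange(&toy_vt_pid, new_pid);
}
```

The exchange protects only the pointer itself; keeping it in step with a second value (the signal to send) still relies on an outer lock, which matches the small unsynchronized window the message accepts.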
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vt: rework the console spawning variables · 81af8d67
      Committed by Eric W. Biederman
      This is such a rare path it took me a while to figure out how to test
      this after sorting out the locking.

      This patch does several things.
      - The variables used are moved into a structure and declared in vt_kern.h
      - A spinlock is added so we don't have SMP races updating the values.
      - Instead of a raw pid_t value a struct pid is used to guard against
        pid wrap around issues, if the daemon to spawn a new console dies.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: implement pid_nr · 5feb8f5f
      Committed by Eric W. Biederman
      As we stop storing pid_t's and move to storing struct pid *, we need a way to
      get the pid_t from the struct pid to report to user space what we have stored.

      Having a clean well defined way to do this is especially important as we move
      to multiple pid spaces, as we may need to report a different value to the
      caller depending on which pid space the caller is in.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: export the symbols needed to use struct pid * · bbf73147
      Committed by Eric W. Biederman
      pids aren't something that drivers should care about.  However there are a lot
      of helper layers in the kernel that do care, and are built as modules.  Before
      I can convert them to using struct pid instead of pid_t I need to export the
      appropriate symbols so they can continue to be built.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: implement signal functions that take a struct pid * · c4b92fc1
      Committed by Eric W. Biederman
      Currently the signal functions all either take a task or a pid_t argument.
      This patch implements variants that take a struct pid *.  After all of the
      users have been updated it is my intention to remove the variants that take a
      pid_t as using pid_t can be more work (an extra hash table lookup) and
      difficult to get right in the presence of multiple pid namespaces.

      There are two kinds of functions introduced in this patch.  There are the
      general use functions kill_pgrp and kill_pid which take a priv argument that
      is ultimately used to create the appropriate siginfo information.  Then there
      are _kill_pgrp_info, kill_pgrp_info, kill_pid_info the internal implementation
      helpers that take an explicit siginfo.

      The distinction is made because filling out an explicit siginfo is tricky, and
      will be even more tricky when pid namespaces are introduced.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: add do_each_pid_task · 558cb325
      Committed by Eric W. Biederman
      To avoid pid rollover confusion the kernel needs to work with struct pid *
      instead of pid_t.  Currently there is not an iterator that walks through all
      of the tasks of a given pid type starting with a struct pid.  This prevents us
      from replacing some pid_t instances with struct pid.  So this patch adds
      do_each_pid_task which walks through the set of tasks for a given pid type
      starting with a struct pid.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: implement access helpers for a task's various process groups · 22c935f4
      Committed by Eric W. Biederman
      In the last round of cleaning up the pid hash table a more general struct pid
      was introduced, that can be reference counted.

      With the more general struct pid most if not all places where we store a pid_t
      we can now store a struct pid * and remove the need for a hash table lookup,
      and avoid any possible problems with pid roll over.

      Looking forward to the pid namespaces struct pid * gives us an absolute form of
      a pid so we can compare and use them without caring which pid namespace we are
      in.
      
      This patchset introduces the infrastructure needed to use struct pid instead
      of pid_t, and then it goes on to convert two different kernel users that
      currently store a pid_t value.
      
      There are a lot more places to go but this is enough to get the basic idea.
      
      Before we can merge a pid namespace patch all of the kernel pid_t users need
      to be examined.  Those that deal with user space processes need to be
      converted to using a struct pid *.  Those that deal with kernel processes need
      to be converted to using the kthread api.  A rare few that only use their
      current process's pid values get to be left alone.

      This patch:

      task_session returns the struct pid of a task's session.
      task_pgrp    returns the struct pid of a task's process group.
      task_tgid    returns the struct pid of a task's thread group.
      task_pid     returns the struct pid of a task's process id.

      These can be used to avoid unnecessary hash table lookups, and to implement
      safe pid comparisons in the face of a pid namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: give the root directory a task · f6c7a1f3
      Committed by Eric W. Biederman
      Helper functions in base.c like proc_pident_readdir and proc_pident_lookup
      assume the directories have an associated task, and cannot currently be used
      on the /proc root directory because it does not have such a task.

      This small change allows for base.c to be simplified and later when multiple
      pid spaces are introduced it makes getting the needed context information
      trivial.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: modify proc_pident_lookup to be completely table driven · 20cdc894
      Committed by Eric W. Biederman
      Currently proc_pident_lookup gets the names and types from a table and then
      has a huge switch statement to get the inode and file operations it needs.
      That is silly and is becoming increasingly hard to maintain so I just put all
      of the information in the table.
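
A table-driven lookup of the kind described above might look like the following generic sketch. It does not reproduce the actual /proc tables: `struct toy_entry`, `toy_table`, the handlers, and `toy_lookup` are all hypothetical, and real entries would carry inode and file operations rather than a single function pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-entry operation; stands in for inode/file operations. */
typedef int (*toy_handler)(void);

static int toy_show_status(void) { return 1; }
static int toy_show_maps(void)   { return 2; }

/* Everything needed to serve an entry lives in its table row. */
struct toy_entry {
    const char *name;
    toy_handler handler;
};

static const struct toy_entry toy_table[] = {
    { "status", toy_show_status },
    { "maps",   toy_show_maps },
};

/* Look a name up in the table; no switch statement has to be kept in sync. */
static const struct toy_entry *toy_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
        if (strcmp(toy_table[i].name, name) == 0)
            return &toy_table[i];
    }
    return NULL;
}
```

Adding a new entry then means adding one row to the table, with no second place in the code that has to be updated to match.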
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: reorder the functions in base.c · 28a6d671
      Committed by Eric W. Biederman
      There were enough changes in my last round of cleaning up proc I had to break
      up the patch series into smaller chunks, and my last chunk never got resent.
      
      This patchset gives proc dynamic inode numbers (the static inode numbers were
      a pain to maintain and prevent all kinds of things), and removes the horrible
      switch statements that had to be kept in sync with everything else.  Being
      fully table driven takes us 90% of the way to being able to register new
      process specific attributes in proc.
      
      This patch:
      
      Group the functions by what they implement instead of by type of operation.
      As it existed base.c was quickly approaching the point where it could not be
      followed.
      
      No functionality or code changes aside from adding/removing forward
      declarations are implemented in this patch.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: readdir race fix (take 3) · 0804ef4b
      Committed by Eric W. Biederman
      The problem: An opendir, readdir, closedir sequence can fail to report
      process ids that are continually in use throughout the sequence of system
      calls.  For this race to trigger the process that proc_pid_readdir stops at
      must exit before readdir is called again.
      
      This can cause ps to fail to report processes, and it is in violation of
      posix guarantees and normal application expectations with respect to
      readdir.
      
      Currently there is no way to work around this problem in user space short
      of providing a gargantuan buffer to user space so the directory read all
      happens in one system call.

      This patch implements the normal directory semantics for proc, which
      guarantee that a directory entry that is neither created nor destroyed
      while reading the directory will be returned.  For directory entries that
      are either created or destroyed during the readdir you may or may not see
      them.  Furthermore you may seek to a directory offset you have previously
      seen.

      These are the guarantees that ext[23] provides and that posix requires, and
      more importantly that user space expects.  Plus it is a simple semantic on
      which to implement a reliable service.  It is just a matter of calling
      readdir a second time if you are wondering if something new has shown up.
      
      These better semantics are implemented by scanning through the pids in
      numerical order and by making the file offset a pid plus a fixed offset.
      
      The pid scan happens on the pid bitmap, which when you look at it is
      remarkably efficient for a brute force algorithm.  A typical cache line
      is 64 bytes and thus covers space for 64*8 == 512 pids, so there are
      only 64 cache lines for the entire 32K pid space.  A typical system
      will have 100 pids or more so this is actually fewer cache lines than we
      would have to look at to scan a linked list, and the worst case of having
      to scan the entire pid bitmap is pretty reasonable.
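
The bitmap scan and the cache-line arithmetic can be checked with a small user-space sketch. It assumes a 32768-entry pid space and 64-byte cache lines, which gives 64 * 8 = 512 pids per cache line and 32768 / 512 = 64 cache lines in total. The names `toy_pidmap`, `toy_set_pid`, `toy_next_pid`, and `toy_cache_lines` are hypothetical; the kernel's `next_pidmap()` works on its own `struct pidmap` pages.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_PID_MAX       32768
#define TOY_CACHE_LINE    64    /* bytes; assumed typical cache line size */
#define TOY_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS         (TOY_PID_MAX / TOY_BITS_PER_WORD)

/* One bit per pid: set means the pid is in use. */
static unsigned long toy_pidmap[TOY_WORDS];

static void toy_set_pid(int pid)
{
    toy_pidmap[pid / TOY_BITS_PER_WORD] |= 1UL << (pid % TOY_BITS_PER_WORD);
}

/* Return the lowest in-use pid >= start, or -1 if there is none. */
static int toy_next_pid(int start)
{
    if (start < 0)
        start = 0;
    for (int pid = start; pid < TOY_PID_MAX; pid++) {
        if (toy_pidmap[pid / TOY_BITS_PER_WORD] &
            (1UL << (pid % TOY_BITS_PER_WORD)))
            return pid;
    }
    return -1;
}

/* Number of cache lines the whole bitmap occupies. */
static int toy_cache_lines(void)
{
    return TOY_PID_MAX / (TOY_CACHE_LINE * CHAR_BIT);
}
```

Because the scan is ordered by pid number, a reader can resume from "last pid seen + 1" after any entry disappears, which is what makes the readdir offset stable.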
      
      If we need something more efficient we can go to a more efficient data
      structure for indexing the pids, but for now what we have should be
      sufficient.
      
      In addition this takes no additional locks and is actually less code than
      what we are doing now.
      
      Also another very subtle bug in this area has been fixed.  It is possible
      to catch a task in the middle of de_thread where a thread is assuming the
      thread of its thread group leader.  This patch carefully handles that case
      so if we hit it we don't fail to return the pid that is undergoing the
      de_thread dance.
      
      Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
      providing the first fix, pointing this out and working on it.
      
      [oleg@tv-sign.ru: fix it]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] list module taint flags in Oops/panic · 2bc2d61a
      Committed by Randy Dunlap
      When listing loaded modules during an oops or panic, also list each
      module's Tainted flags if non-zero (P: Proprietary or F: Forced load only).
      
      If a module did not taint the kernel, it is just listed like
      	usbcore
      but if it did taint the kernel, it is listed like
      	wizmodem(PF)
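
The listing format described above can be sketched as a small formatting helper. The flag bit values and the names `TOY_TAINT_PROPRIETARY`, `TOY_TAINT_FORCED`, and `toy_format_module` are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed bit assignments, for illustration only. */
#define TOY_TAINT_PROPRIETARY (1u << 0)   /* shown as 'P' */
#define TOY_TAINT_FORCED      (1u << 1)   /* shown as 'F' */

/*
 * Write "name" for an untainted module, or "name(FLAGS)" when the module
 * carries taint flags. Returns what snprintf() returns.
 */
static int toy_format_module(char *buf, size_t len, const char *name,
                             unsigned int taints)
{
    /* Ignore any bits this sketch does not know how to print. */
    taints &= TOY_TAINT_PROPRIETARY | TOY_TAINT_FORCED;

    if (!taints)
        return snprintf(buf, len, "%s", name);

    return snprintf(buf, len, "%s(%s%s)", name,
                    (taints & TOY_TAINT_PROPRIETARY) ? "P" : "",
                    (taints & TOY_TAINT_FORCED) ? "F" : "");
}
```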
      
      Example:
      [ 3260.121718] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
      [ 3260.121729]  [<ffffffff8804c099>] :dump_test:proc_dump_test+0x99/0xc8
      [ 3260.121742] PGD fe8d067 PUD 264a6067 PMD 0
      [ 3260.121748] Oops: 0002 [1] SMP
      [ 3260.121753] CPU 1
      [ 3260.121756] Modules linked in: dump_test(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ide_cd generic ohci1394 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd ieee1394 snd_page_alloc piix ide_core arcmsr aic79xx scsi_transport_spi usblp
      [ 3260.121785] Pid: 5556, comm: bash Tainted: P      2.6.18-git10 #1
      
      [Alternatively, I can look into listing tainted flags with 'lsmod',
      but that won't help in oopsen/panics so much.]
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] make genpool allocator adhere to kernel-doc standards · a58cbd7c
      Committed by Dean Nelson
      The exported kernel interfaces of genpool allocator need to adhere to
      the requirements of kernel-doc.
      Signed-off-by: Dean Nelson <dcn@sgi.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] LIB: add gen_pool_destroy() · 322acc96
      Committed by Steve Wise
      Modules using the genpool allocator need to be able to destroy the data
      structure when unloading.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Dean Nelson <dcn@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • pccard_store_cis: fix wrong error handling · d834c165
      Committed by Linus Torvalds
      The test for the error from pcmcia_replace_cis() was incorrect, and
      would always trigger (because if an error didn't happen, the "ret" value
      would not be zero, it would be the passed-in count).
      
      Reported and debugged by Fabrice Bellet <fabrice@bellet.info>
      
      Rather than just fix the single broken test, make the code in question
      use an understandable code-sequence instead, fixing the whole function
      to be more readable.
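
The class of bug described above, where one variable doubles as both byte count and status, can be illustrated as follows. The message does not show the original code, so this is an assumed reconstruction of the pattern only; `toy_replace`, `toy_store_buggy`, and `toy_store_fixed` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation: returns 0 on success, non-zero on failure. */
static int toy_replace(int fail)
{
    return fail ? -1 : 0;
}

/* Buggy shape: "ret" starts out as the count and is then tested as a status. */
static long toy_store_buggy(size_t count, int fail)
{
    long ret = (long)count;

    if (toy_replace(fail) != 0)
        ret = -EIO;
    if (ret)            /* a successful non-zero count also lands here */
        return -EIO;
    return (long)count;
}

/* Clearer shape: keep the status separate and return the count on success. */
static long toy_store_fixed(size_t count, int fail)
{
    int error = toy_replace(fail);

    if (error)
        return -EIO;
    return (long)count;
}
```

With a separate `error` variable the success path cannot be mistaken for a failure, whatever the byte count is.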
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rtc-sysfs fix · 4e9011d5
      Committed by Andrew Morton
      It's not clear how this thinko got through.

      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 01 Oct, 2006 18 commits
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart · 82965add
      Committed by Linus Torvalds
      * master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart:
        [AGPGART] printk fixups.
        [AGPGART] Use pci_get_slot not pci_find_slot
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq · f0b364a1
      Committed by Linus Torvalds
      * master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
        [CPUFREQ] Make acpi-cpufreq unsticky again.
        [CPUFREQ] longhaul: remove duplicated code.
        [CPUFREQ] Longhaul - Disable arbiter CLE266
        [CPUFREQ] Fix section mismatch warning
        [CPUFREQ] Fix cut-n-paste bug in suspend printk
    • [PATCH] Some config.h removals · 5a73fdc5
      Committed by Zachary Amsden
      While tracking down a PAE compile failure, I found that config.h was being
      included in a bunch of places in i386 code.  It is no longer necessary, so
      drop it.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: update pte hook · 789e6ac0
      Committed by Zachary Amsden
      Add a pte_update_hook which notifies about pte changes that have been made
      without using the set_pte / clear_pte interfaces.  This allows shadow mode
      hypervisors which do not trap on page table access to maintain synchronized
      shadows.
      
      It also turns out, there was one pte update in PAE mode that wasn't using any
      accessor interface at all for setting NX protection.  Considering it is PAE
      specific, and the accessor is i386 specific, I didn't want to add a generic
      encapsulation of this behavior yet.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: remove set pte atomic · a93cb055
      Committed by Zachary Amsden
      Now that ptep_establish has a definition in PAE i386 3-level paging code, the
      only paging model which is insane enough to have multi-word hardware PTEs
      which are not efficient to set atomically, we can remove the ghost of
      set_pte_atomic from other architectures which falsely duplicated it, and
      remove all knowledge of it from the generic pgtable code.

      set_pte_atomic is now a private pte operator which is specific to i386.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: optimize ptep establish for pae · d6d861e3
      Committed by Zachary Amsden
      The ptep_establish macro is only used on user-level PTEs, for P->P mapping
      changes.  Since these always happen under protection of the pagetable lock,
      the strong synchronization of a 64-bit cmpxchg is not needed, in fact, not
      even a lock prefix needs to be used.  We can simply instead clear the P-bit,
      followed by a normal set.  The write ordering is still important to avoid the
      possibility of the TLB snooping a partially written PTE and getting a bad
      mapping installed.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: kpte flush · 23002d88
      Committed by Zachary Amsden
      Create a new PTE function which combines clearing a kernel PTE with the
      subsequent flush.  This allows the two to be easily combined into a single
      hypercall or paravirt-op.  More subtly, reverse the order of the flush for
      kmap_atomic.  Instead of flushing on establishing a mapping, flush on clearing
      a mapping.  This eliminates the possibility of leaving stale kmap entries
      which may still have valid TLB mappings.  This is required for direct mode
      hypervisors, which need to reprotect all mappings of a given page when
      changing the page type from a normal page to a protected page (such as a page
      table or descriptor table page).  But it also provides some nicer semantics
      for real hardware, by providing extra debug-proofing against using stale
      mappings, as well as ensuring that no stale mappings exist when changing the
      cacheability attributes of a page, which could lead to cache conflicts when
      two different types of mappings exist for the same page.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: combine flush accessed dirty.patch · 25e4df5b
      Committed by Zachary Amsden
      Remove ptep_test_and_clear_{dirty|young} from i386, and instead use the
      dominating functions, ptep_clear_flush_{dirty|young}.  This allows the TLB
      page flush to be contained in the same macro, and allows for an eager
      optimization - if reading the PTE initially returned dirty/accessed, we can
      assume the fact that no subsequent update to the PTE which cleared accessed /
      dirty has occurred, as the only way A/D bits can change without holding the
      page table lock is if a remote processor clears them.  This eliminates an
      extra branch which came from the generic version of the code, as we know that
      no other CPU could have cleared the A/D bit, so the flush will always be
      needed.
      
      We still export these two defines, even though we do not actually define
      the macros in the i386 code:
      
       #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
       #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
      
      The reason for this is that the only use of these functions is within the
      generic clear_flush functions, and we want a strong guarantee that there
      are no other users of these functions, so we want to prevent the generic
      code from defining them for us.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: lazy mmu mode hooks.patch · 6606c3e0
      Committed by Zachary Amsden
      Implement lazy MMU update hooks which are SMP safe for both direct and shadow
      page tables.  The idea is that PTE updates and page invalidations while in
      lazy mode can be batched into a single hypercall.  We use this in VMI for
      shadow page table synchronization, and it is a win.  It also can be used by
      PPC and for direct page tables on Xen.
      
      For SMP, the enter / leave must happen under protection of the page table
      locks for page tables which are being modified.  This is because otherwise,
      you end up with stale state in the batched hypercall, which other CPUs can
      race ahead of.  Doing this under the protection of the locks guarantees the
      synchronization is correct, and also means that spurious faults which are
      generated during this window by remote CPUs are properly handled, as the page
      fault handler must re-check the PTE under protection of the same lock.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: pte clear not present · 9888a1ca
      Committed by Zachary Amsden
      Change pte_clear_full to a more appropriately named pte_clear_not_present,
      allowing optimizations when not-present mapping changes need not be reflected
      in the hardware TLB for protected page table modes.  There is also another
      case that can use it in the fremap code.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: remove read hazard from cow · 3dc90795
      Committed by Zachary Amsden
      We don't want to read PTEs directly like this after they have been modified,
      as a lazy MMU implementation of direct page tables may not have written the
      updated PTE back to memory yet.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] invalidate_inode_pages2(): ignore page refcounts · bd4c8ce4
      Committed by Andrew Morton
      The recent fix to invalidate_inode_pages() (git commit 016eb4a0) managed to
      unfix invalidate_inode_pages2().
      
      The problem is that various bits of code in the kernel can take transient refs
      on pages: the page scanner will do this when inspecting a batch of pages, and
      the lru_cache_add() batching pagevecs also hold a ref.
      
      Net result is transient failures in invalidate_inode_pages2().  This affects
      NFS directory invalidation (observed) and presumably also block-backed
      direct-io (not yet reported).
      
      Fix it by reverting invalidate_inode_pages2() back to the old version which
      ignores the page refcounts.
      
      We may come up with something more clever later, but for now we need a 2.6.18
      fix for NFS.
      
      Cc: Chuck Lever <cel@citi.umich.edu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern · d025c9db
      Committed by Andi Kleen
      Using the infrastructure created in previous patches implement support to
      pipe core dumps into programs.
      
      This is done by overloading the existing core_pattern sysctl
      with a new syntax:
      
      |program
      
      When the first character of the pattern is a '|', the kernel will instead
      treat the rest of the pattern as a command to run.  The core dump will be
      written to the standard input of that program instead of to a file.
      
      This is useful for having automatic core dump analysis without filling up
      disks.  The program can do some simple analysis and save only a summary of
      the core dump.
      
      The core dump process will run with the privileges and in the name space of
      the process that caused the core dump.
      
      I also increased the core pattern size to 128 bytes so that longer command
      lines fit.
      
      Most of the changes come from allowing core dumps without seeks.  They are
      fairly straightforward though.
      
      One small incompatibility is that anyone who previously had a core pattern
      starting with '|' will suddenly get new behaviour.  I think that's
      unlikely to be a real problem though.
      
      Additional background:
      
      > Very nice, do you happen to have a program that can accept this kind of
      > input for crash dumps?  I'm guessing that the embedded people will
      > really want this functionality.
      
      I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
      ran gdb on it to get a backtrace and wrote the summary to a shared directory.
      Then there was a simple CGI script to generate a "top 10" crashes HTML
      listing.
      
      Unfortunately this still had the disadvantage of needing full disk space for
      a dump, even though the dump was deleted afterwards (in fact it was worse,
      because holes don't work over the pipe, so a holey address map would
      require more space).
      
      Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
      cores (at least it worked with zsh's =(cat core) syntax), so it would
      likely be possible to do it without temporary space, using a simple wrapper
      that calls it in the right way.  I ran out of time before doing that though.
      
      The demo prototype scripts weren't very good.  If there is real interest I
      can dig them out (they are currently on a laptop disk on the desk, with the
      laptop itself being in service), but I would recommend rewriting them for
      any serious application of this and fixing the disk space problem.
      
      Also to be really useful it should probably find a way to automatically fetch
      the debuginfos (I cheated and just installed them in advance).  If nobody else
      does it I can probably do the rewrite myself again at some point.
      
      My hope at some point was that desktops would support it in their builtin
      crash reporters, but at least the KDE people I talked to seemed to be happy
      with their user-space-only solution.
      
      Alan sayeth:
      
        I don't believe that piping as such is necessarily the right model, but
        the ability to intercept and process core dumps from user space is asked
        for by many enterprise users as well.  They want to know about, capture,
        analyse and process core dumps, often centrally and in automated form.
      
      [akpm@osdl.org: loff_t != unsigned long]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d025c9db
    • A
      [PATCH] Create call_usermodehelper_pipe() · e239ca54
      Andi Kleen committed
      A new member of the ever-growing family of call_usermode* functions is
      born.  The new call_usermodehelper_pipe() function pipes data to the
      stdin of the called user-mode program and otherwise behaves like the
      normal call_usermodehelper() (except that it always waits for the child
      to finish).
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e239ca54
    • A
      [PATCH] Some cleanup in the pipe code · d6cbd281
      Andi Kleen committed
      Split the big and hard to read do_pipe function into smaller pieces.
      
      This creates new create_write_pipe/free_write_pipe/create_read_pipe
      functions.  These functions are made global so that they can be used by
      other parts of the kernel.
      
      The resulting code is more generic and easier to read, with cleaner error
      handling and fewer gotos.
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d6cbd281
    • A
      [PATCH] ioremap balanced with iounmap for drivers/serial/sunsu.c · 65da4d81
      Amol Lad committed
      ioremap must be balanced by an iounmap and failing to do so can result
      in a memory leak.
      Signed-off-by: Amol Lad <amol@verismonetworks.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David S. Miller <davem@sunset.davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      65da4d81
    • A
      [PATCH] ioremap balanced with iounmap for drivers/serial/mux.c · af907dc8
      Amol Lad committed
      ioremap must be balanced by an iounmap and failing to do so can result
      in a memory leak.
      Signed-off-by: Amol Lad <amol@verismonetworks.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      af907dc8
    • A
      [PATCH] ioremap balanced with iounmap for drivers/serial/mpsc.c · a141a043
      Amol Lad committed
      ioremap must be balanced by an iounmap and failing to do so can result
      in a memory leak.
      Signed-off-by: Amol Lad <amol@verismonetworks.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Mark A. Greer <mgreer@mvista.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a141a043