1. 02 Oct, 2006 11 commits
    • [PATCH] file: modify struct fown_struct to use a struct pid · 609d7fa9
      Eric W. Biederman committed
      File handles can be requested to send sigio and sigurg to processes.  By
      tracking the destination processes using struct pid instead of pid_t we make
      the interface safe from all potential pid wrap around problems.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vt: Make vt_pid a struct pid (making it pid wrap around safe). · bde0d2c9
      Eric W. Biederman committed
      I took a good hard look at the locking and it appears the locking on vt_pid
      is the console semaphore.  Every modified path is called under the console
      semaphore except reset_vc when it is called from fn_SAK or do_SAK both of
      which appear to be in interrupt context.  In addition I need to be careful
      because in the presence of an oops the console_sem may be arbitrarily
      dropped.
      
      Which leads me to conclude the current locking is inadequate for my needs.
      
      Given the weird cases we could hit because of oops printing instead of
      introducing an extra spin lock to protect the data and keep the pid to
      signal and the signal to send in sync, I have opted to use xchg on just the
      struct pid * pointer instead.
      
      Due to console_sem we will stay in sync between vt_pid and vt_mode except
      for a small window during a SAK, or oops handling.  SAK handling should
      kill any user space processes that care, and during oops handling we are
      broken anyway.  Besides, the worst that can happen is that I try to send
      the wrong signal.
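      The pointer swap described above can be sketched in user space with C11
      atomics.  This is only an illustration under assumptions: struct pid_ref,
      struct vt_state and vt_swap_pid are hypothetical names, and
      atomic_exchange stands in for the kernel's xchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's reference-counted struct pid. */
struct pid_ref {
    int nr;
};

/* Hypothetical stand-in for the per-console state that holds vt_pid. */
struct vt_state {
    _Atomic(struct pid_ref *) vt_pid;
};

/*
 * Install a new signal target and return the previous one in a single
 * atomic step.  Exactly one caller receives each old pointer, so exactly
 * one caller is responsible for releasing it, without taking a lock.
 */
static struct pid_ref *vt_swap_pid(struct vt_state *vt, struct pid_ref *new_pid)
{
    assert(vt != NULL);
    return atomic_exchange(&vt->vt_pid, new_pid);
}
```

      Only the pointer is protected this way; keeping it in sync with the
      signal number still relies on the console semaphore, as the message notes.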
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vt: rework the console spawning variables · 81af8d67
      Eric W. Biederman committed
      This is such a rare path it took me a while to figure out how to test
      this after sorting out the locking.
      
      This patch does several things.
      - The variables used are moved into a structure and declared in vt_kern.h
      - A spinlock is added so we don't have SMP races updating the values.
      - Instead of a raw pid_t value a struct pid is used to guard against
        pid wrap around issues, if the daemon to spawn a new console dies.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: implement pid_nr · 5feb8f5f
      Eric W. Biederman committed
      As we stop storing pid_t's and move to storing struct pid *, we need a way to
      get the pid_t from the struct pid to report to user space what we have stored.

      Having a clean well defined way to do this is especially important as we move
      to multiple pid spaces, as we may need to report a different value to the caller
      depending on which pid space the caller is in.
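      A rough user-space model of the idea: hold a counted reference to a pid
      object and derive the numeric value only when reporting it.  All names
      here (struct pid_model and the pid_model_* helpers) are hypothetical and
      are not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a reference-counted pid object. */
struct pid_model {
    int count;  /* number of holders of this object */
    int nr;     /* numeric id reported to user space */
};

static struct pid_model *pid_model_alloc(int nr)
{
    struct pid_model *pid = malloc(sizeof(*pid));

    if (pid) {
        pid->count = 1;
        pid->nr = nr;
    }
    return pid;
}

/* Take an extra reference; a NULL pointer is passed through unchanged. */
static struct pid_model *pid_model_get(struct pid_model *pid)
{
    if (pid)
        pid->count++;
    return pid;
}

/* Drop a reference and free the object when the last holder lets go. */
static void pid_model_put(struct pid_model *pid)
{
    if (!pid)
        return;
    assert(pid->count > 0);
    if (--pid->count == 0)
        free(pid);
}

/* Report the numeric id, or 0 when nothing is stored. */
static int pid_model_nr(const struct pid_model *pid)
{
    return pid ? pid->nr : 0;
}
```

      Because a holder keeps the object alive, a later process that reuses the
      same number gets a different object and cannot be confused with the old one.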
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: implement signal functions that take a struct pid * · c4b92fc1
      Eric W. Biederman committed
      Currently the signal functions all either take a task or a pid_t argument.
      This patch implements variants that take a struct pid *.  After all of the
      users have been updated it is my intention to remove the variants that take a
      pid_t, as using pid_t can be more work (an extra hash table lookup) and
      difficult to get right in the presence of multiple pid namespaces.

      There are two kinds of functions introduced in this patch.  There are the
      general use functions kill_pgrp and kill_pid, which take a priv argument that
      is ultimately used to create the appropriate siginfo information.  Then there
      are __kill_pgrp_info, kill_pgrp_info and kill_pid_info, the internal
      implementation helpers that take an explicit siginfo.

      The distinction is made because filling out an explicit siginfo is tricky, and
      will be even more tricky when pid namespaces are introduced.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: add do_each_pid_task · 558cb325
      Eric W. Biederman committed
      To avoid pid rollover confusion the kernel needs to work with struct pid *
      instead of pid_t.  Currently there is not an iterator that walks through all
      of the tasks of a given pid type starting with a struct pid.  This prevents us
      from replacing some pid_t instances with struct pid.  So this patch adds
      do_each_pid_task, which walks through the set of tasks for a given pid type
      starting with a struct pid.
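      The shape of such an iterator can be sketched as a pair of macros over a
      simplified data model.  The types and macro names below are hypothetical;
      the real macros walk the kernel's own lists and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical id types a task can be linked under. */
enum id_type { ID_PID, ID_PGID, ID_SID, ID_TYPE_MAX };

/* Hypothetical task: one singly linked chain per id type. */
struct task_model {
    int id;
    struct task_model *next[ID_TYPE_MAX];
};

/* Hypothetical pid object: the head of each per-type chain. */
struct pid_model {
    struct task_model *tasks[ID_TYPE_MAX];
};

/* Opening half: tolerate a NULL pid, then loop over the chosen chain. */
#define do_each_task_model(pid, type, task)                         \
    do {                                                            \
        if ((pid) != NULL)                                          \
            for ((task) = (pid)->tasks[(type)]; (task) != NULL;     \
                 (task) = (task)->next[(type)]) {

/* Closing half: ends the loop body and the enclosing do/while. */
#define while_each_task_model(pid, type, task)                      \
            }                                                       \
    } while (0)

/* Example user: count the tasks attached to a pid under one type. */
static int count_tasks(struct pid_model *pid, enum id_type type)
{
    struct task_model *task;
    int n = 0;

    assert(type < ID_TYPE_MAX);
    do_each_task_model(pid, type, task) {
        n++;
    } while_each_task_model(pid, type, task);
    return n;
}
```

      Starting from the pid object rather than a number means no lookup is
      needed before the walk begins.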
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: implement access helpers for a task's various process groups · 22c935f4
      Eric W. Biederman committed
      In the last round of cleaning up the pid hash table a more general struct pid
      was introduced, that can be reference counted.
      
      With the more general struct pid most if not all places where we store a pid_t
      we can now store a struct pid * and remove the need for a hash table lookup,
      and avoid any possible problems with pid roll over.
      
      Looking forward to the pid namespaces, struct pid * gives us an absolute form of
      a pid so we can compare and use them without caring which pid namespace we are
      in.
      
      This patchset introduces the infrastructure needed to use struct pid instead
      of pid_t, and then it goes on to convert two different kernel users that
      currently store a pid_t value.
      
      There are a lot more places to go but this is enough to get the basic idea.
      
      Before we can merge a pid namespace patch all of the kernel pid_t users need
      to be examined.  Those that deal with user space processes need to be
      converted to using a struct pid *.  Those that deal with kernel processes need
      to be converted to using the kthread api.  A rare few that only use their
      current process's pid values get to be left alone.
      
      This patch:
      
      task_session returns the struct pid of a task's session.
      task_pgrp    returns the struct pid of a task's process group.
      task_tgid    returns the struct pid of a task's thread group.
      task_pid     returns the struct pid of a task's process id.

      These can be used to avoid unnecessary hash table lookups, and to implement
      safe pid comparisons in the face of a pid namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: modify proc_pident_lookup to be completely table driven · 20cdc894
      Eric W. Biederman committed
      Currently proc_pident_lookup gets the names and types from a table and then
      has a huge switch statement to get the inode and file operations it needs.
      That is silly and is becoming increasingly hard to maintain so I just put all
      of the information in the table.
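      As a generic illustration of the table-driven approach (not the actual
      proc code), each table entry can carry its own handler so that lookup
      needs no switch statement.  The entry names, modes and handlers below are
      made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: each entry supplies its own behaviour. */
typedef int (*entry_handler)(void);

/* Hypothetical table row: name, mode and operations live together. */
struct entry_model {
    const char *name;
    unsigned int mode;
    entry_handler handler;
};

static int show_status(void) { return 1; }
static int show_maps(void)   { return 2; }

/* Adding an entry means adding one row; the lookup code is untouched. */
static const struct entry_model entries[] = {
    { "status", 0444, show_status },
    { "maps",   0444, show_maps   },
};

/* Find an entry by name, or return NULL if the name is unknown. */
static const struct entry_model *lookup_entry(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++)
        if (strcmp(entries[i].name, name) == 0)
            return &entries[i];
    return NULL;
}
```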
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: readdir race fix (take 3) · 0804ef4b
      Eric W. Biederman committed
      The problem: An opendir, readdir, closedir sequence can fail to report
      process ids that are continually in use throughout the sequence of system
      calls.  For this race to trigger the process that proc_pid_readdir stops at
      must exit before readdir is called again.
      
      This can cause ps to fail to report processes, and it is in violation of
      posix guarantees and normal application expectations with respect to
      readdir.
      
      Currently there is no way to work around this problem in user space short
      of user space providing a gargantuan buffer so that the whole directory
      read happens in one system call.
      
      This patch implements the normal directory semantics for proc, which
      guarantee that a directory entry that is neither created nor destroyed
      while reading the directory will be returned.  Directory entries that are
      either created or destroyed during the readdir may or may not be seen.
      Furthermore you may seek to a directory offset you have previously seen.

      These are the guarantees that ext[23] provides and that posix requires, and
      more importantly that user space expects.  Plus it is a simple semantic on
      which to implement reliable service.  It is just a matter of calling readdir
      a second time if you are wondering if something new has shown up.
      
      These better semantics are implemented by scanning through the pids in
      numerical order and by making the file offset a pid plus a fixed offset.
      
      The pid scan happens on the pid bitmap, which when you look at it is
      remarkably efficient for a brute force algorithm.  A typical cache line
      is 64 bytes and thus covers space for 64*8 == 512 pids, so there are only
      64 cache lines for the entire 32K pid space.  A typical system will have
      100 pids or more, so this is actually fewer cache lines than we would have
      to look at to scan a linked list, and the worst case of having to scan the
      entire pid bitmap is pretty reasonable.
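      Working the arithmetic through: a 64-byte line holds 64 * 8 = 512 bits,
      and a 32K (32768) pid space divided by 512 gives 64 lines.  A minimal
      sketch of a brute-force scan for the next pid in use, assuming a flat
      byte-array bitmap and hypothetical names:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define PID_SPACE      32768                        /* 32K pids */
#define CACHE_LINE     64                           /* bytes, assumed typical */
#define PIDS_PER_LINE  (CACHE_LINE * CHAR_BIT)      /* 512 with 8-bit bytes */
#define LINES_IN_MAP   (PID_SPACE / PIDS_PER_LINE)  /* 64 */

/* One bit per pid; static storage starts out all zero (no pids in use). */
static unsigned char pid_bitmap[PID_SPACE / CHAR_BIT];

/* Mark a pid as in use. */
static void mark_pid(int nr)
{
    assert(nr >= 0 && nr < PID_SPACE);
    pid_bitmap[nr / CHAR_BIT] |= 1u << (nr % CHAR_BIT);
}

/*
 * Return the first in-use pid that is >= start, or -1 if there is none.
 * A reader that remembers the last pid it returned can resume from
 * last + 1 and still find every pid that stayed alive in between.
 */
static int next_pid_from(int start)
{
    int nr;

    for (nr = start < 0 ? 0 : start; nr < PID_SPACE; nr++)
        if (pid_bitmap[nr / CHAR_BIT] & (1u << (nr % CHAR_BIT)))
            return nr;
    return -1;
}
```

      Scanning in numerical order is what lets the directory offset be derived
      from the pid itself.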
      
      If we need something more efficient we can go to a more efficient data
      structure for indexing the pids, but for now what we have should be
      sufficient.
      
      In addition this takes no additional locks and is actually less code than
      what we are doing now.
      
      Also another very subtle bug in this area has been fixed.  It is possible
      to catch a task in the middle of de_thread where a thread is assuming the
      identity of its thread group leader.  This patch carefully handles that
      case so if we hit it we don't fail to return the pid that is undergoing
      the de_thread dance.
      
      Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
      providing the first fix, pointing this out and working on it.
      
      [oleg@tv-sign.ru: fix it]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] list module taint flags in Oops/panic · 2bc2d61a
      Randy Dunlap committed
      When listing loaded modules during an oops or panic, also list each
      module's Tainted flags if non-zero (P: Proprietary or F: Forced load only).
      
      If a module did not taint the kernel, it is just listed like
      	usbcore
      but if it did taint the kernel, it is listed like
      	wizmodem(PF)
      
      Example:
      [ 3260.121718] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
      [ 3260.121729]  [<ffffffff8804c099>] :dump_test:proc_dump_test+0x99/0xc8
      [ 3260.121742] PGD fe8d067 PUD 264a6067 PMD 0
      [ 3260.121748] Oops: 0002 [1] SMP
      [ 3260.121753] CPU 1
      [ 3260.121756] Modules linked in: dump_test(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ide_cd generic ohci1394 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd ieee1394 snd_page_alloc piix ide_core arcmsr aic79xx scsi_transport_spi usblp
      [ 3260.121785] Pid: 5556, comm: bash Tainted: P      2.6.18-git10 #1
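      The listing format described above can be sketched as a small formatting
      helper.  The flag constants and the function name are hypothetical and
      only mirror the P and F letters mentioned in the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-module taint bits, named after the letters they print. */
#define TAINT_MODEL_PROPRIETARY 0x1u
#define TAINT_MODEL_FORCED      0x2u

/*
 * Write "name" for an untainted module, or "name(FLAGS)" for a tainted one.
 * Returns what snprintf returns: the length the full string needs.
 */
static int format_module(char *buf, size_t len, const char *name,
                         unsigned int taints)
{
    assert(buf != NULL && name != NULL);

    /* Ignore bits this sketch does not know how to print. */
    taints &= TAINT_MODEL_PROPRIETARY | TAINT_MODEL_FORCED;
    if (taints == 0)
        return snprintf(buf, len, "%s", name);
    return snprintf(buf, len, "%s(%s%s)", name,
                    (taints & TAINT_MODEL_PROPRIETARY) ? "P" : "",
                    (taints & TAINT_MODEL_FORCED) ? "F" : "");
}
```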
      
      [Alternatively, I can look into listing tainted flags with 'lsmod',
      but that won't help in oopsen/panics so much.]
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] LIB: add gen_pool_destroy() · 322acc96
      Steve Wise committed
      Modules using the genpool allocator need to be able to destroy the data
      structure when unloading.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Dean Nelson <dcn@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 01 Oct, 2006 29 commits
    • [PATCH] Some config.h removals · 5a73fdc5
      Zachary Amsden committed
      While tracking down a PAE compile failure, I found that config.h was being
      included in a bunch of places in i386 code.  It is no longer necessary, so
      drop it.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: update pte hook · 789e6ac0
      Zachary Amsden committed
      Add a pte_update_hook which notifies about pte changes that have been made
      without using the set_pte / clear_pte interfaces.  This allows shadow mode
      hypervisors which do not trap on page table access to maintain synchronized
      shadows.
      
      It also turns out, there was one pte update in PAE mode that wasn't using any
      accessor interface at all for setting NX protection.  Considering it is PAE
      specific, and the accessor is i386 specific, I didn't want to add a generic
      encapsulation of this behavior yet.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: remove set pte atomic · a93cb055
      Zachary Amsden committed
      Now that ptep_establish has a definition in PAE i386 3-level paging code, the
      only paging model which is insane enough to have multi-word hardware PTEs
      which are not efficient to set atomically, we can remove the ghost of
      set_pte_atomic from other architectures which falsely duplicated it, and
      remove all knowledge of it from the generic pgtable code.

      set_pte_atomic is now a private pte operator which is specific to i386.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: optimize ptep establish for pae · d6d861e3
      Zachary Amsden committed
      The ptep_establish macro is only used on user-level PTEs, for P->P mapping
      changes.  Since these always happen under protection of the pagetable lock,
      the strong synchronization of a 64-bit cmpxchg is not needed, in fact, not
      even a lock prefix needs to be used.  We can simply instead clear the P-bit,
      followed by a normal set.  The write ordering is still important to avoid the
      possibility of the TLB snooping a partially written PTE and getting a bad
      mapping installed.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: kpte flush · 23002d88
      Zachary Amsden committed
      Create a new PTE function which combines clearing a kernel PTE with the
      subsequent flush.  This allows the two to be easily combined into a single
      hypercall or paravirt-op.  More subtly, reverse the order of the flush for
      kmap_atomic.  Instead of flushing on establishing a mapping, flush on clearing
      a mapping.  This eliminates the possibility of leaving stale kmap entries
      which may still have valid TLB mappings.  This is required for direct mode
      hypervisors, which need to reprotect all mappings of a given page when
      changing the page type from a normal page to a protected page (such as a page
      table or descriptor table page).  But it also provides some nicer semantics
      for real hardware, by providing extra debug-proofing against using stale
      mappings, as well as ensuring that no stale mappings exist when changing the
      cacheability attributes of a page, which could lead to cache conflicts when
      two different types of mappings exist for the same page.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: combine flush accessed dirty.patch · 25e4df5b
      Zachary Amsden committed
      Remove ptep_test_and_clear_{dirty|young} from i386, and instead use the
      dominating functions, ptep_clear_flush_{dirty|young}.  This allows the TLB
      page flush to be contained in the same macro, and allows for an eager
      optimization - if reading the PTE initially returned dirty/accessed, we can
      assume the fact that no subsequent update to the PTE which cleared accessed /
      dirty has occurred, as the only way A/D bits can change without holding the
      page table lock is if a remote processor clears them.  This eliminates an
      extra branch which came from the generic version of the code, as we know that
      no other CPU could have cleared the A/D bit, so the flush will always be
      needed.
      
      We still export these two defines, even though we do not actually define
      the macros in the i386 code:
      
       #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
       #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
      
      The reason for this is that the only use of these functions is within the
      generic clear_flush functions, and we want a strong guarantee that there
      are no other users of these functions, so we want to prevent the generic
      code from defining them for us.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: lazy mmu mode hooks.patch · 6606c3e0
      Zachary Amsden committed
      Implement lazy MMU update hooks which are SMP safe for both direct and shadow
      page tables.  The idea is that PTE updates and page invalidations while in
      lazy mode can be batched into a single hypercall.  We use this in VMI for
      shadow page table synchronization, and it is a win.  It also can be used by
      PPC and for direct page tables on Xen.
      
      For SMP, the enter / leave must happen under protection of the page table
      locks for page tables which are being modified.  This is because otherwise,
      you end up with stale state in the batched hypercall, which other CPUs can
      race ahead of.  Doing this under the protection of the locks guarantees the
      synchronization is correct, and also means that spurious faults which are
      generated during this window by remote CPUs are properly handled, as the page
      fault handler must re-check the PTE under protection of the same lock.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] paravirt: pte clear not present · 9888a1ca
      Zachary Amsden committed
      Change pte_clear_full to a more appropriately named pte_clear_not_present,
      allowing optimizations when not-present mapping changes need not be reflected
      in the hardware TLB for protected page table modes.  There is also another
      case that can use it in the fremap code.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Create call_usermodehelper_pipe() · e239ca54
      Andi Kleen committed
      A new member in the ever growing family of call_usermode* functions is
      born.  The new call_usermodehelper_pipe() function allows data to be piped
      to the stdin of the called user mode program and otherwise behaves like the
      normal call_usermodehelper() (except that it always waits for the child to
      finish).
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Some cleanup in the pipe code · d6cbd281
      Andi Kleen committed
      Split the big and hard to read do_pipe function into smaller pieces.
      
      This creates new create_write_pipe/free_write_pipe/create_read_pipe
      functions.  These functions are made global so that they can be used by
      other parts of the kernel.
      
      The resulting code is more generic and easier to read and has cleaner error
      handling and fewer gotos.
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Generic ioremap_page_range: implementation · 74588d8b
      Haavard Skinnemoen committed
      This patch adds a generic implementation of ioremap_page_range() in
      lib/ioremap.c based on the i386 implementation. It differs from the
      i386 version in the following ways:
      
        * The PTE flags are passed as a pgprot_t argument and must be
          determined up front by the arch-specific code. No additional
          PTE flags are added.
        * Uses set_pte_at() instead of set_pte()
      
      [bunk@stusta.de: warning fix]
      [dhowells@redhat.com: nommu build fix]
      Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: <linux-m32r@ml.linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@parisc-linux.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] stack overflow safe kdump: safe_smp_processor_id() · dc2bc768
      Fernando Vazquez committed
      This is the first of a series of patch sets aiming at making kdump more
      robust against stack overflows.
      
      This patch set does the following:
      
      * Add safe_smp_processor_id function to i386 architecture (this function was
        inspired by the x86_64 function of the same name).
      
      * Substitute "smp_processor_id" with the stack overflow-safe
        "safe_smp_processor_id" in the reboot path to the second kernel.
      
      This patch:
      
      In the event of a stack overflow, critical data that usually resides at the
      bottom of the stack is likely to be stomped and, consequently, its use should
      be avoided.
      
      In particular, in the i386 and IA64 architectures the macro smp_processor_id
      ultimately makes use of the "cpu" member of struct thread_info which resides
      at the bottom of the stack.  x86_64, on the other hand, is not affected by
      this problem because it benefits from the use of the PDA infrastructure.
      
      To circumvent this problem I suggest implementing "safe_smp_processor_id()"
      (it already exists in x86_64) for i386 and IA64 and use it as a replacement
      for smp_processor_id in the reboot path to the dump capture kernel.  This is a
      possible implementation for i386.
      Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
      Looks-reasonable-to: Andi Kleen <ak@muc.de>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] r/o bind mounts: monitor zeroing of i_nlink · ce71ec36
      Dave Hansen committed
      Some filesystems, instead of simply decrementing i_nlink, simply zero it
      during an unlink operation.  We need to catch these in addition to the
      decrement operations.
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] r/o bind mount prepwork: inc_nlink() helper · d8c76e6f
      Dave Hansen committed
      This is mostly included for parity with dec_nlink(), where we will have some
      more hooks.  This one should stay pretty darn straightforward for now.
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] r/o bind mounts: unlink: monitor i_nlink · 9a53c3a7
      Dave Hansen committed
      When a filesystem decrements i_nlink to zero, it means that a write must be
      performed in order to drop the inode from the filesystem.
      
      We're shortly going to have to keep filesystems from being remounted r/o
      between the time of this i_nlink decrement and the time that write occurs.
      
      So, add a little helper function to do the decrements.  We'll tie into it in a
      bit to note when i_nlink hits zero.
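      A minimal sketch of what such a helper could look like, using a
      simplified inode model.  The struct, the function name and the
      pending_removal flag are hypothetical; the flag only marks the spot
      where a later hook might note that the count reached zero.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for an inode. */
struct inode_model {
    unsigned int i_nlink;   /* link count */
    int pending_removal;    /* hypothetical marker set when the count hits zero */
};

/*
 * Funnel every link-count decrement through one place so that extra
 * bookkeeping can be attached later without touching each filesystem.
 */
static void dec_nlink_model(struct inode_model *inode)
{
    assert(inode->i_nlink > 0);
    inode->i_nlink--;
    if (inode->i_nlink == 0)
        inode->pending_removal = 1;  /* placeholder for a future hook */
}
```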
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] csa accounting taskstats update · db5fed26
      Jay Lan committed
      ChangeLog:
         Feedback from Andrew Morton:
         - define TS_COMM_LEN to 32
         - change acct_stimexpd field of task_struct to be of
           cputime_t, which is to be used to save the tsk->stime
           of last timer interrupt update.
         - a new Documentation/accounting/taskstats-struct.txt
           to describe fields of taskstats struct.
      
         Feedback from Balbir Singh:
         - keep the stime of a task to be zero when both stime
           and utime are zero as recorded in task_struct.
      
         Misc:
         - convert accumulated RSS/VM from platform dependent
           pages-ticks to MBytes-usecs in the kernel
      
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] csa: convert CONFIG tag for extended accounting routines · 8f0ab514
      Jay Lan committed
      There were a few accounting data/macros that are used in CSA but are #ifdef'ed
      inside CONFIG_BSD_PROCESS_ACCT.  This patch is to change those ifdef's from
      CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
      kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
      include/linux/tsacct_kern.h.
      Signed-off-by: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] csa: Extended system accounting over taskstats · 9acc1853
      Jay Lan committed
      Add extended system accounting handling over taskstats interface.  A
      CONFIG_TASK_XACCT flag is created to enable the extended accounting code.
      Signed-off-by: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] csa: basic accounting over taskstats · f3cef7a9
      Jay Lan committed
      Add some basic accounting fields to the taskstats struct, add a new
      kernel/tsacct.c to handle basic accounting data handling upon exit.  A handle
      is added to taskstats.c to invoke the basic accounting data handling.
      Signed-off-by: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add genetlink utilities for payload length calculation · 17db952c
      Balbir Singh committed
      Add two utility helper functions genlmsg_msg_size() and genlmsg_total_size().
      These functions are derived from their netlink counterparts.
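      A user-space model of the two size calculations, patterned on the usual
      netlink convention of "header plus payload" and "the same, padded to the
      alignment boundary".  The 4-byte alignment and 4-byte generic netlink
      header are assumptions made for the sketch, and the model_ names are
      hypothetical rather than kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 4-byte alignment, 4-byte generic netlink header. */
#define MODEL_ALIGNTO     ((size_t)4)
#define MODEL_GENL_HDRLEN ((size_t)4)

/* Round len up to the next multiple of the alignment. */
static size_t model_align(size_t len)
{
    assert(len <= SIZE_MAX - (MODEL_ALIGNTO - 1));
    return (len + MODEL_ALIGNTO - 1) & ~(MODEL_ALIGNTO - 1);
}

/* Header plus payload, without trailing padding. */
static size_t model_genlmsg_msg_size(size_t payload)
{
    return MODEL_GENL_HDRLEN + payload;
}

/* Header plus payload, padded to the alignment boundary. */
static size_t model_genlmsg_total_size(size_t payload)
{
    return model_align(model_genlmsg_msg_size(payload));
}
```

      The padded size is the one to use when reserving buffer space; the
      unpadded size describes the message itself.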
      Signed-off-by: NBalbir Singh <balbir@in.ibm.com>
      Cc: Jamal Hadi <hadi@cyberus.ca>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      17db952c
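The size calculation described in the commit above can be sketched in user space. This is a minimal model, assuming the helpers mirror the netlink pattern of "header length plus payload" and "that sum rounded up to the netlink alignment". The `MODEL_` macros and `struct model_genlmsghdr` are hypothetical stand-ins for the kernel's `NLMSG_ALIGN`, `GENL_HDRLEN`, and `struct genlmsghdr`, so the block builds without kernel headers.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's netlink alignment macros. */
#define MODEL_NLMSG_ALIGNTO 4U
#define MODEL_NLMSG_ALIGN(len) \
	(((unsigned int)(len) + MODEL_NLMSG_ALIGNTO - 1U) & ~(MODEL_NLMSG_ALIGNTO - 1U))

/* Assumed layout of the generic netlink header: 4 bytes in total. */
struct model_genlmsghdr {
	unsigned char cmd;
	unsigned char version;
	unsigned short reserved;
};

#define MODEL_GENL_HDRLEN MODEL_NLMSG_ALIGN(sizeof(struct model_genlmsghdr))

/* Length of a generic netlink message: aligned header plus payload. */
static inline int genlmsg_msg_size(int payload)
{
	assert(payload >= 0);
	return (int)MODEL_GENL_HDRLEN + payload;
}

/* Same length, padded up to the next netlink alignment boundary. */
static inline int genlmsg_total_size(int payload)
{
	return (int)MODEL_NLMSG_ALIGN(genlmsg_msg_size(payload));
}
```

Under these assumptions a 5-byte payload gives a message size of 9 bytes and a padded total of 12 bytes.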
    • C
      [PATCH] clean up unused kiocb variables · 31608214
      Chen, Kenneth W committed
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Zach Brown <zach.brown@oracle.com>
      Cc: Suparna Bhattacharya <suparna@in.ibm.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      31608214
    • B
      [PATCH] Add vector AIO support · eed4e51f
      Badari Pulavarty committed
      This work was initially done by Zach Brown to add support for vectored AIO.
      These are the core changes for AIO to support
      IOCB_CMD_PREADV/IOCB_CMD_PWRITEV.

      [akpm@osdl.org: huge build fix]
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Acked-by: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      eed4e51f
    • B
      [PATCH] Streamline generic_file_* interfaces and filemap cleanups · 543ade1f
      Badari Pulavarty committed
      This patch cleans up the generic_file_*_read/write() interfaces.  Christoph
      Hellwig gave me the idea for these cleanups.

      In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
      do_sync_read()/do_sync_write() as their .read/.write methods.  This allows us
      to clean up all variants of the generic_file_* routines.

      Final available interfaces:

      generic_file_aio_read() - read handler
      generic_file_aio_write() - write handler
      generic_file_aio_write_nolock() - no lock write handler

      __generic_file_aio_write_nolock() - internal worker routine
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      543ade1f
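The wiring described in the commit above, where a synchronous read is a thin wrapper around a vectored aio_read method, can be illustrated with a user-space model. This is only a sketch: every `model_` name is hypothetical, the "file" is an in-memory string, and the real kernel methods take a kiocb and have different signatures. The one idea it shows is that the synchronous path builds a single-element iovec and hands it to the vectored handler.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

struct model_file;

/* Hypothetical method table: only the vectored read handler is required. */
struct model_file_ops {
	ssize_t (*aio_read)(struct model_file *f, const struct iovec *iov,
			    unsigned long nr_segs, off_t pos);
};

/* Hypothetical in-memory file. */
struct model_file {
	const struct model_file_ops *ops;
	const char *data;
	size_t size;
	off_t pos;
};

/* Vectored read handler: fills each segment in turn, starting at pos. */
static ssize_t model_aio_read(struct model_file *f, const struct iovec *iov,
			      unsigned long nr_segs, off_t pos)
{
	ssize_t total = 0;

	for (unsigned long i = 0; i < nr_segs; i++) {
		if (pos < 0 || (size_t)pos >= f->size)
			break;
		size_t left = f->size - (size_t)pos;
		size_t n = iov[i].iov_len < left ? iov[i].iov_len : left;

		memcpy(iov[i].iov_base, f->data + pos, n);
		pos += (off_t)n;
		total += (ssize_t)n;
	}
	return total;
}

/* Synchronous read: wrap the buffer in one iovec and reuse aio_read. */
static ssize_t model_sync_read(struct model_file *f, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	assert(f->ops && f->ops->aio_read);
	ssize_t ret = f->ops->aio_read(f, &iov, 1, f->pos);
	if (ret > 0)
		f->pos += ret;
	return ret;
}

static const struct model_file_ops model_ops = { .aio_read = model_aio_read };
```

With this shape a filesystem only has to provide the vectored handler; the plain read path needs no separate implementation.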
    • B
      [PATCH] Remove readv/writev methods and use aio_read/aio_write instead · ee0b3e67
      Badari Pulavarty committed
      This patch removes readv() and writev() methods and replaces them with
      aio_read()/aio_write() methods.
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ee0b3e67
    • B
      [PATCH] Vectorize aio_read/aio_write fileop methods · 027445c3
      Badari Pulavarty committed
      This patch vectorizes aio_read() and aio_write() methods to prepare for
      collapsing all aio & vectored operations into one interface - which is
      aio_read()/aio_write().
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      027445c3
    • J
      [PATCH] reiserfs: on-demand bitmap loading · 5065227b
      Jeff Mahoney committed
      This is the patch the three previous ones have been leading up to.
      
      It changes the behavior of ReiserFS from loading and caching all the bitmaps
      as special, to treating the bitmaps like any other bit of metadata and just
      letting the system-wide caches figure out what to hang on to.
      
      Buffer heads are allocated on the fly, so there is no need to retain pointers
      to all of them.  The caching of the metadata occurs when the data is read and
      updated, and is considered invalid and uncached until then.
      
      I needed to remove the vs-4040 check for performing a duplicate operation on a
      particular bit.  The reason is that while the other sites for working with
      bitmaps are allowed to schedule, is_reusable() is called from do_balance(),
      which will panic if a schedule occurs in certain places.
      
      The benefit of on-demand bitmaps clearly outweighs a sanity check that depends
      on a compile-time option that is discouraged.
      
      [akpm@osdl.org: warning fix]
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Cc: <reiserfs-dev@namesys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5065227b
    • J
      [PATCH] reiserfs: reorganize bitmap loading functions · 6f01046b
      Jeff Mahoney committed
      This patch moves the bitmap loading code from super.c to bitmap.c.
      
      The code is also restructured somewhat.  The only difference between new
      format bitmaps and old format bitmaps is where they are.  That's a two liner
      before loading the block to use the correct one.  There's no need for an
      entirely separate code path.
      
      The load path is generally the same, with the pattern being to throw out a
      bunch of requests and then wait for them, then cache the metadata from the
      contents.
      
      Again, like the previous patches, the purpose is to set up for later ones.
      
      Update: There was a bug in the previously posted version of this that resulted
      in corruption.  The problem was that bitmap 0 on new format file systems must
      be treated specially, and wasn't.  A stupid bug with an easy fix.
      
      This is hopefully the last fix for the disaster that is the reiserfs bitmap
      patch set.
      
      If a bitmap block was full, first_zero_hint would end up at zero since it
      would never be changed from its zeroed-out value.  This just sets it
      beyond the end of the bitmap block.  If any bits are freed, it will be
      reset to a valid bit.  When info->free_count = 0, then we already know it's
      full.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Cc: <reiserfs-dev@namesys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6f01046b
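The first_zero_hint fix described in the commit above can be modelled in a few lines. This is a sketch under assumptions, not the reiserfs code: `struct model_bitmap_info` and `model_cache_bitmap_metadata()` are hypothetical names, and the scan is a plain bit-by-bit loop over a byte array. It shows the hint starting "beyond the end" of the block, so a full bitmap never reports bit 0 as a free candidate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached metadata for one bitmap block. */
struct model_bitmap_info {
	size_t free_count;      /* number of zero (free) bits */
	size_t first_zero_hint; /* lowest zero bit, or nbits if the block is full */
};

/* Scan a bitmap block and cache its free count and first-zero hint. */
static void model_cache_bitmap_metadata(const unsigned char *bits, size_t nbytes,
					struct model_bitmap_info *info)
{
	size_t nbits = nbytes * 8;

	assert(bits != NULL && info != NULL);
	info->free_count = 0;
	/* Start beyond the end of the block instead of at zero. */
	info->first_zero_hint = nbits;

	for (size_t i = 0; i < nbits; i++) {
		if (!(bits[i / 8] & (1u << (i % 8)))) {
			if (info->first_zero_hint == nbits)
				info->first_zero_hint = i;
			info->free_count++;
		}
	}
}
```

A caller can treat `first_zero_hint == nbits` together with `free_count == 0` as "this block is full" without scanning it again.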
    • J
      [PATCH] reiserfs: fix is_reusable bitmap check to not traverse the bitmap info array · e1fabd3c
      Jeff Mahoney committed
      There is a check in is_reusable to determine if a particular block is a bitmap
      block.  It verifies this by going through the array of bitmap block buffer
      heads and comparing the block number to each one.
      
      Bitmap blocks are at defined locations on the disk in both old and current
      formats.  Simply checking against the known good values is enough.
      
      This is a trivial optimization for a non-production codepath, but this is the
      first in a series of patches that will ultimately remove the buffer heads from
      that array.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Cc: <reiserfs-dev@namesys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e1fabd3c
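Checking a block number against fixed bitmap locations, as the commit above describes, might look like the following user-space sketch. The layout is an assumption stated here and not taken from the commit text: old-format bitmaps are taken to sit contiguously right after the superblock block, and current-format bitmaps at multiples of the bits-per-block count, with bitmap 0 right after the superblock. `struct model_sb` and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical superblock summary used by the model. */
struct model_sb {
	bool old_format;              /* assumed: bitmaps packed after the superblock */
	unsigned long sb_block;       /* block number holding the superblock */
	unsigned long bits_per_block; /* block size in bytes times 8 */
	unsigned long bmap_nr;        /* number of bitmap blocks */
};

/* Block number of bitmap n under the assumed layout. */
static unsigned long model_bitmap_location(const struct model_sb *sb, unsigned long n)
{
	assert(n < sb->bmap_nr);
	if (sb->old_format)
		return sb->sb_block + 1 + n;
	if (n == 0)
		return sb->sb_block + 1; /* bitmap 0 is the special case */
	return n * sb->bits_per_block;
}

/* True if block is a bitmap block; no array of buffer heads is walked. */
static bool model_is_bitmap_block(const struct model_sb *sb, unsigned long block)
{
	unsigned long first = sb->sb_block + 1;

	if (sb->old_format)
		return block >= first && block < first + sb->bmap_nr;
	if (block == first)
		return true;
	return block != 0 &&
	       block % sb->bits_per_block == 0 &&
	       block / sb->bits_per_block < sb->bmap_nr;
}
```

The check is constant time in both formats, which is the point of comparing against known locations instead of traversing the bitmap info array.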
    • A
      [PATCH] kill wall_jiffies · 8ef38609
      Atsushi Nemoto committed
      With 2.6.18-rc4-mm2, wall_jiffies will now always be the same as jiffies.
      So we can kill wall_jiffies completely.

      This is just a cleanup and logically should not change any real behavior
      except for one thing: the RTC updating code in (old) ppc and xtensa uses a
      condition "jiffies - wall_jiffies == 1".  This condition is never met, so I
      suppose it is just a bug.  I remove only that condition instead of
      killing the whole "if" block.
      
      [heiko.carstens@de.ibm.com: s390 build fix and cleanup]
      Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8ef38609