1. 24 6月, 2005 40 次提交
    • P
      [PATCH] kprobes: Temporary disarming of reentrant probe for ppc64 · 42cc2060
      Prasanna S Panchamukhi 提交于
      This patch includes ppc64 architecture specific changes to support temporary
      disarming on reentrancy of probes.
      Signed-of-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      42cc2060
    • P
      [PATCH] kprobes: Temporary disarming of reentrant probe for x86_64 · aa3d7e3d
      Prasanna S Panchamukhi 提交于
      This patch includes x86_64 architecture specific changes to support temporary
      disarming on reentrancy of probes.
      Signed-of-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      aa3d7e3d
    • P
      [PATCH] kprobes: Temporary disarming of reentrant probe for i386 · 417c8da6
      Prasanna S Panchamukhi 提交于
      This patch includes i386 architecture specific changes to support temporary
      disarming on reentrancy of probes.
      Signed-of-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      417c8da6
    • P
      [PATCH] kprobes: Temporary disarming of reentrant probe · ea32c65c
      Prasanna S Panchamukhi 提交于
      In situations where a kprobes handler calls a routine which has a probe on it,
      then kprobes_handler() disarms the new probe forever.  This patch removes the
      above limitation by temporarily disarming the new probe.  When the another
      probe hits while handling the old probe, the kprobes_handler() saves previous
      kprobes state and handles the new probe without calling the new kprobes
      registered handlers.  kprobe_post_handler() restores back the previous kprobes
      state and the normal execution continues.
      
      However on x86_64 architecture, re-rentrancy is provided only through
      pre_handler().  If a routine having probe is referenced through
      post_handler(), then the probes on that routine are disarmed forever, since
      the exception stack is gets changed after the processor single steps the
      instruction of the new probe.
      
      This patch includes generic changes to support temporary disarming on
      reentrancy of probes.
      Signed-of-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ea32c65c
    • K
      [PATCH] Kprobes/IA64: check jprobe break before handling · 89cb14c0
      Keshavamurthy Anil S 提交于
      Once the jprobe instrumented function returns, it executes a jprobe_break
      which is a break instruction with __IA64_JPROBE_BREAK value.  The current
      patch checks for this break value, before assuming that jprobe instrumented
      function just completed.
      
      The previous code was not checking for this value and that was a bug.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      89cb14c0
    • A
      [PATCH] Kprobes IA64: safe register kprobe · 708de8f1
      Anil S Keshavamurthy 提交于
      The current kprobes does not yet handle register kprobes on some of the
      following kind of instruction which needs to be emulated in a special way.
      
      1) mov r1=ip
      2) chk -- Speculation check instruction
      
      This patch attempts to fail register_kprobes() when user tries to insert
      kprobes on the above kind of instruction.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      708de8f1
    • A
      [PATCH] Kprobes IA64: cmp ctype unc support · 1674eafc
      Anil S Keshavamurthy 提交于
      The current Kprobes when patching the original instruction with the break
      instruction tries to retain the original qualifying predicate(qp), however
      for cmp.crel.ctype where ctype == unc, which is a special instruction
      always needs to be executed irrespective of qp.  Hence, if the instruction
      we are patching is of this type, then we should not copy the original qp to
      the break instruction, this is because we always want the break fault to
      happen so that we can emulate the instruction.
      
      This patch is based on the feedback given by David Mosberger
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1674eafc
    • A
      [PATCH] Kprobes IA64: arch_prepare_kprobes() cleanup · a5403183
      Anil S Keshavamurthy 提交于
      arch_prepare_kprobes() was doing lots of functionality
      in just one single function. This patch
      attempts to clean up arch_prepare_kprobes() by moving
      specific sub task to the following (new)functions
      1)valid_kprobe_addr() -->> validate the given kprobe address
      2)get_kprobe_inst(slot..)->> Retrives the instruction for a given slot from the bundle
      3)prepare_break_inst() -->> Prepares break instruction within the bundle
      	3a)update_kprobe_inst_flag()-->>Updates the internal flags, required
      			for proper emulation of the instruction at later
      			point in time.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a5403183
    • R
      [PATCH] Kprobes ia64 qp fix · 13608d64
      Rusty Lynch 提交于
      Fix a bug where a kprobe still fires when the instruction is predicated
      off.  So given the p6=0, and we have an instruction like:
      
      (p6) move loc1=0
      
      we should not be triggering the kprobe.  This is handled by carrying over
      the qp section of the original instruction into the break instruction.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NRusty Lynch <Rusty.lynch@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      13608d64
    • R
      [PATCH] Kprobes ia64 cleanup · 8bc76772
      Rusty Lynch 提交于
      A cleanup of the ia64 kprobes implementation such that all of the bundle
      manipulation logic is concentrated in arch_prepare_kprobe().
      
      With the current design for kprobes, the arch specific code only has a
      chance to return failure inside the arch_prepare_kprobe() function.
      
      This patch moves all of the work that was happening in arch_copy_kprobe()
      and most of the work that was happening in arch_arm_kprobe() into
      arch_prepare_kprobe().  By doing this we can add further robustness checks
      in arch_arm_kprobe() and refuse to insert kprobes that will cause problems.
      Signed-off-by: NRusty Lynch <Rusty.lynch@intel.com>
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8bc76772
    • A
      [PATCH] Kprobes/IA64: support kprobe on branch/call instructions · cd2675bf
      Anil S Keshavamurthy 提交于
      This patch is required to support kprobe on branch/call instructions.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      cd2675bf
    • A
      [PATCH] Kprobes/IA64: architecture specific JProbes support · b2761dc2
      Anil S Keshavamurthy 提交于
      This patch adds IA64 architecture specific JProbes support on top of Kprobes
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NRusty Lynch <Rusty.lynch@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b2761dc2
    • A
      [PATCH] Kprobes/IA64: arch specific handling · fd7b231f
      Anil S Keshavamurthy 提交于
      This is an IA64 arch specific handling of Kprobes
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NRusty Lynch <Rusty.lynch@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fd7b231f
    • A
      [PATCH] Kprobes/IA64: kdebug die notification mechanism · 7213b252
      Anil S Keshavamurthy 提交于
      As many of you know that kprobes exist in the main line kernel for various
      architecture including i386, x86_64, ppc64 and sparc64.  Attached patches
      following this mail are a port of Kprobes and Jprobes for IA64.
      
      I have tesed this patches for kprobes and Jprobes and this seems to work fine.
       I have tested this patch by inserting kprobes on various slots and various
      templates including various types of branch instructions.
      
      I have also tested this patch using the tool
      http://marc.theaimsgroup.com/?l=linux-kernel&m=111657358022586&w=2 and the
      kprobes for IA64 works great.
      
      Here is list of TODO things and pathes for the same will appear soon.
      
      1) Support kprobes on "mov r1=ip" type of instruction
      2) Support Kprobes and Jprobes to exist on the same address
      3) Support Return probes
      3) Architecture independent cleanup of kprobes
      
      This patch adds the kdebug die notification mechanism needed by Kprobes.
      
      For break instruction on Branch type slot, imm21 is ignored and value
      zero is placed in IIM register, hence we need to handle kprobes
      for switch case zero.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NRusty Lynch <Rusty.lynch@intel.com>
      
      From: Rusty Lynch <rusty.lynch@intel.com>
      
      At the point in traps.c where we recieve a break with a zero value, we can
      not say if the break was a result of a kprobe or some other debug facility.
      
      This simple patch changes the informational string to a more correct "break
      0" value, and applies to the 2.6.12-rc2-mm2 tree with all the kprobes
      patches that were just recently included for the next mm cut.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7213b252
    • H
      [PATCH] kprobes: moves lock-unlock to non-arch kprobe_flush_task · 0aa55e4d
      Hien Nguyen 提交于
      This patch moves the lock/unlock of the arch specific kprobe_flush_task()
      to the non-arch specific kprobe_flusk_task().
      Signed-off-by: NHien Nguyen <hien@us.ibm.com>
      Acked-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0aa55e4d
    • R
      [PATCH] Move kprobe [dis]arming into arch specific code · 7e1048b1
      Rusty Lynch 提交于
      The architecture independent code of the current kprobes implementation is
      arming and disarming kprobes at registration time.  The problem is that the
      code is assuming that arming and disarming is a just done by a simple write
      of some magic value to an address.  This is problematic for ia64 where our
      instructions look more like structures, and we can not insert break points
      by just doing something like:
      
      *p->addr = BREAKPOINT_INSTRUCTION;
      
      The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent
      functions:
      
           * void arch_arm_kprobe(struct kprobe *p)
           * void arch_disarm_kprobe(struct kprobe *p)
      
      and then adds the new functions for each of the architectures that already
      implement kprobes (spar64/ppc64/i386/x86_64).
      
      I thought arch_[dis]arm_kprobe was the most descriptive of what was really
      happening, but each of the architectures already had a disarm_kprobe()
      function that was really a "disarm and do some other clean-up items as
      needed when you stumble across a recursive kprobe." So...  I took the
      liberty of changing the code that was calling disarm_kprobe() to call
      arch_disarm_kprobe(), and then do the cleanup in the block of code dealing
      with the recursive kprobe case.
      
      So far this patch as been tested on i386, x86_64, and ppc64, but still
      needs to be tested in sparc64.
      Signed-off-by: NRusty Lynch <rusty.lynch@intel.com>
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7e1048b1
    • R
      [PATCH] x86_64 specific function return probes · 73649dab
      Rusty Lynch 提交于
      The following patch adds the x86_64 architecture specific implementation
      for function return probes.
      
      Function return probes is a mechanism built on top of kprobes that allows
      a caller to register a handler to be called when a given function exits.
      For example, to instrument the return path of sys_mkdir:
      
      static int sys_mkdir_exit(struct kretprobe_instance *i, struct pt_regs *regs)
      {
      	printk("sys_mkdir exited\n");
      	return 0;
      }
      static struct kretprobe return_probe = {
      	.handler = sys_mkdir_exit,
      };
      
      <inside setup function>
      
      return_probe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("sys_mkdir");
      if (register_kretprobe(&return_probe)) {
      	printk(KERN_DEBUG "Unable to register return probe!\n");
      	/* do error path */
      }
      
      <inside cleanup function>
      unregister_kretprobe(&return_probe);
      
      The way this works is that:
      
      * At system initialization time, kernel/kprobes.c installs a kprobe
        on a function called kretprobe_trampoline() that is implemented in
        the arch/x86_64/kernel/kprobes.c  (More on this later)
      
      * When a return probe is registered using register_kretprobe(),
        kernel/kprobes.c will install a kprobe on the first instruction of the
        targeted function with the pre handler set to arch_prepare_kretprobe()
        which is implemented in arch/x86_64/kernel/kprobes.c.
      
      * arch_prepare_kretprobe() will prepare a kretprobe instance that stores:
        - nodes for hanging this instance in an empty or free list
        - a pointer to the return probe
        - the original return address
        - a pointer to the stack address
      
        With all this stowed away, arch_prepare_kretprobe() then sets the return
        address for the targeted function to a special trampoline function called
        kretprobe_trampoline() implemented in arch/x86_64/kernel/kprobes.c
      
      * The kprobe completes as normal, with control passing back to the target
        function that executes as normal, and eventually returns to our trampoline
        function.
      
      * Since a kprobe was installed on kretprobe_trampoline() during system
        initialization, control passes back to kprobes via the architecture
        specific function trampoline_probe_handler() which will lookup the
        instance in an hlist maintained by kernel/kprobes.c, and then call
        the handler function.
      
      * When trampoline_probe_handler() is done, the kprobes infrastructure
        single steps the original instruction (in this case just a top), and
        then calls trampoline_post_handler().  trampoline_post_handler() then
        looks up the instance again, puts the instance back on the free list,
        and then makes a long jump back to the original return instruction.
      
      So to recap, to instrument the exit path of a function this implementation
      will cause four interruptions:
      
        - A breakpoint at the very beginning of the function allowing us to
          switch out the return address
        - A single step interruption to execute the original instruction that
          we replaced with the break instruction (normal kprobe flow)
        - A breakpoint in the trampoline function where our instrumented function
          returned to
        - A single step interruption to execute the original instruction that
          we replaced with the break instruction (normal kprobe flow)
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      73649dab
    • H
      [PATCH] kprobes: function-return probes · b94cce92
      Hien Nguyen 提交于
      This patch adds function-return probes to kprobes for the i386
      architecture.  This enables you to establish a handler to be run when a
      function returns.
      
      1. API
      
      Two new functions are added to kprobes:
      
      	int register_kretprobe(struct kretprobe *rp);
      	void unregister_kretprobe(struct kretprobe *rp);
      
      2. Registration and unregistration
      
      2.1 Register
      
        To register a function-return probe, the user populates the following
        fields in a kretprobe object and calls register_kretprobe() with the
        kretprobe address as an argument:
      
        kp.addr - the function's address
      
        handler - this function is run after the ret instruction executes, but
        before control returns to the return address in the caller.
      
        maxactive - The maximum number of instances of the probed function that
        can be active concurrently.  For example, if the function is non-
        recursive and is called with a spinlock or mutex held, maxactive = 1
        should be enough.  If the function is non-recursive and can never
        relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should
        be enough.  maxactive is used to determine how many kretprobe_instance
        objects to allocate for this particular probed function.  If maxactive <=
        0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 *
        NR_CPUS) else maxactive=NR_CPUS)
      
        For example:
      
          struct kretprobe rp;
          rp.kp.addr = /* entrypoint address */
          rp.handler = /*return probe handler */
          rp.maxactive = /* e.g., 1 or NR_CPUS or 0, see the above explanation */
          register_kretprobe(&rp);
      
        The following field may also be of interest:
      
        nmissed - Initialized to zero when the function-return probe is
        registered, and incremented every time the probed function is entered but
        there is no kretprobe_instance object available for establishing the
        function-return probe (i.e., because maxactive was set too low).
      
      2.2 Unregister
      
        To unregiter a function-return probe, the user calls
        unregister_kretprobe() with the same kretprobe object as registered
        previously.  If a probed function is running when the return probe is
        unregistered, the function will return as expected, but the handler won't
        be run.
      
      3. Limitations
      
      3.1 This patch supports only the i386 architecture, but patches for
          x86_64 and ppc64 are anticipated soon.
      
      3.2 Return probes operates by replacing the return address in the stack
          (or in a known register, such as the lr register for ppc).  This may
          cause __builtin_return_address(0), when invoked from the return-probed
          function, to return the address of the return-probes trampoline.
      
      3.3 This implementation uses the "Multiprobes at an address" feature in
          2.6.12-rc3-mm3.
      
      3.4 Due to a limitation in multi-probes, you cannot currently establish
          a return probe and a jprobe on the same function.  A patch to remove
          this limitation is being tested.
      
      This feature is required by SystemTap (http://sourceware.org/systemtap),
      and reflects ideas contributed by several SystemTap developers, including
      Will Cohen and Ananth Mavinakayanahalli.
      Signed-off-by: NHien Nguyen <hien@us.ibm.com>
      Signed-off-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NFrederik Deweerdt <frederik.deweerdt@laposte.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b94cce92
    • C
      [PATCH] quota: sanitize dentry handling in vfs_quota_on_mount · 2fa389c5
      Christoph Hellwig 提交于
      Use lookup_one_len instead of opencoding a simplified lookup using
      lookup_hash with a fake hash.
      
      Also there's no need anymore for the d_invalidate as we have a completely
      valid dentry now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2fa389c5
    • C
      [PATCH] quota: consolidate code surrounding vfs_quota_on_mount · 84de856e
      Christoph Hellwig 提交于
      Move some code duplicated in both callers into vfs_quota_on_mount
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NJan Kara <jack@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      84de856e
    • A
      [PATCH] avoid resursive oopses · df164db5
      Alexander Nyberg 提交于
      Prevent recursive faults in do_exit() by leaving the task alone and wait
      for reboot.  This may allow a more graceful shutdown and possibly save the
      original oops.
      Signed-off-by: NAlexander Nyberg <alexn@telia.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      df164db5
    • C
      [PATCH] remove duplicate get_dentry functions in various places · 5f45f1a7
      Christoph Hellwig 提交于
      Various filesystem drivers have grown a get_dentry() function that's a
      duplicate of lookup_one_len, except that it doesn't take a maximum length
      argument and doesn't check for \0 or / in the passed in filename.
      
      Switch all these places to use lookup_one_len.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Greg KH <greg@kroah.com>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5f45f1a7
    • N
      [PATCH] add check to /proc/devices read routines · ac20427e
      Neil Horman 提交于
      Patch to add check to get_chrdev_list and get_blkdev_list to prevent reads
      of /proc/devices from spilling over the provided page if more than 4096
      bytes of string data are generated from all the registered character and
      block devices in a system
      Signed-off-by: NNeil Horman <nhorman@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: <viro@parcelfarce.linux.theplanet.co.uk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ac20427e
    • P
      [PATCH] remove redundant vm_flags clearing from madvise.c · 3bc1ee3e
      Pekka Enberg 提交于
      This patch removes redundant VM_ClearReadHint from mm/madvice.c which was
      left there by Prasanna's patch.
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3bc1ee3e
    • J
      [PATCH] preempt_count is int - remove cast and don't assign to unsigned type · be5b4fbd
      Jesper Juhl 提交于
      In kernel/sched.c the return value from preempt_count() is cast to an int.
      That made sense when preempt_count was defined as different types on is not
      needed and should go away.  The patch removes the cast.
      
      In kernel/timer.c the return value from preempt_count() is assigned to a
      variable of type u32 and then that unsigned value is later compared to
      preempt_count().  Since preempt_count() returns an int, an int is what
      should be used to store its return value.  Storing the result in an
      unsigned 32bit integer made a tiny bit of sense back when preempt_count was
      different types on different archs, but no more - let's not play signed vs
      unsigned comparison games when we don't have to.  The patch modifies the
      code to use an int to hold the value.  While I was around that bit of code
      I also made two changes to a nearby (related) printk() - I modified it to
      specify the loglevel explicitly and also broke the line into a few pieces
      to avoid it being longer than 80 chars and clarified the text a bit.
      Signed-off-by: NJesper Juhl <juhl-lkml@dif.dk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      be5b4fbd
    • J
      [PATCH] streamline preempt_count type across archs · dcd497f9
      Jesper Juhl 提交于
      The preempt_count member of struct thread_info is currently either defined
      as int, unsigned int or __s32 depending on arch.  This patch makes the type
      of preempt_count an int on all archs.
      
      Having preempt_count be an unsigned type prevents the catching of
      preempt_count < 0 bugs, and using int on some archs and __s32 on others is
      not exactely "neat" - much nicer when it's just int all over.
      
      A previous version of this patch was already ACK'ed by Robert Love, and the
      only change in this version of the patch compared to the one he ACK'ed is
      that this one also makes sure the preempt_count member is consistently
      commented.
      Signed-off-by: NJesper Juhl <juhl-lkml@dif.dk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      dcd497f9
    • N
      [PATCH] optimise loop driver a bit · 35a82d1a
      Nick Piggin 提交于
      Looks like locking can be optimised quite a lot.  Increase lock widths
      slightly so lo_lock is taken fewer times per request.  Also it was quite
      trivial to cover lo_pending with that lock, and remove the atomic
      requirement.  This also makes memory ordering explicitly correct, which is
      nice (not that I particularly saw any mem ordering bugs).
      
      Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K
      block size, 4K page size).  System is 2 socket Xeon with HT (4 thread).
      
      intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh
      
      Before:
      0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k
      0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
      0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
      0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k
      0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k
      
      After:
      0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
      0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
      0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
      0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
      0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
      Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      35a82d1a
    • G
      [PATCH] CON_CONSDEV bit not set correctly on last console · ab4af03a
      Greg Edwards 提交于
      According to include/linux/console.h, CON_CONSDEV flag should be set on
      the last console specified on the boot command line:
      
           86 #define CON_PRINTBUFFER (1)
           87 #define CON_CONSDEV     (2) /* Last on the command line */
           88 #define CON_ENABLED     (4)
           89 #define CON_BOOT        (8)
      
      This does not currently happen if there is more than one console specified
      on the boot commandline.  Instead, it gets set on the first console on the
      command line.  This can cause problems for things like kdb that look for
      the CON_CONSDEV flag to see if the console is valid.
      
      Additionaly, it doesn't look like CON_CONSDEV is reassigned to the next
      preferred console at unregister time if the console being unregistered
      currently has that bit set.
      
      Example (from sn2 ia64):
      
      elilo vmlinuz root=<dev> console=ttyS0 console=ttySG0
      
      in this case, the flags on ttySG console struct will be 0x4 (should be
      0x6).
      
      Attached patch against bk fixes both issues for the cases I looked at.  It
      uses selected_console (which gets incremented for each console specified on
      the command line) as the indicator of which console to set CON_CONSDEV on.
      When adding the console to the list, if the previous one had CON_CONSDEV
      set, it masks it out.  Tested on ia64 and x86.
      
      The problem with the current behavior is it breaks overriding the default from
      the boot line.  In the ia64 case, there may be a global append line defining
      console=a in elilo.conf.  Then you want to boot your kernel, and want to
      override the default by passing console=b on the boot line.  elilo constructs
      the kernel cmdline by starting with the value of the global append line, then
      tacks on whatever else you specify, which puts console=b last.
      Signed-off-by: NGreg Edwards <edwardsg@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ab4af03a
    • R
      [PATCH] kstrdup: convert a few existing implementations · dfe52244
      Robert Love 提交于
      Convert a bunch of strdup() implementations and their callers to the new
      kstrdup().  A few remain, for example see sound/core, and there are tons of
      open coded strdup()'s around.  Sigh.  But this is a start.
      Signed-off-by: NRobert Love <rml@novell.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      dfe52244
    • P
      [PATCH] create a kstrdup library function · 543537bd
      Paulo Marques 提交于
      This patch creates a new kstrdup library function and changes the "local"
      implementations in several places to use this function.
      
      Most of the changes come from the sound and net subsystems.  The sound part
      had already been acknowledged by Takashi Iwai and the net part by David S.
      Miller.
      
      I left UML alone for now because I would need more time to read the code
      carefully before making changes there.
      Signed-off-by: NPaulo Marques <pmarques@grupopie.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      543537bd
    • A
      [PATCH] fix for prune_icache()/forced final iput() races · 991114c6
      Alexander Viro 提交于
      Based on analysis and a patch from Russ Weight <rweight@us.ibm.com>
      
      There is a race condition that can occur if an inode is allocated and then
      released (using iput) during the ->fill_super functions.  The race
      condition is between kswapd and mount.
      
      For most filesystems this can only happen in an error path when kswapd is
      running concurrently.  For isofs, however, the error can occur in a more
      common code path (which is how the bug was found).
      
      The logic here is "we want final iput() to free inode *now* instead of
      letting it sit in cache if fs is going down or had not quite come up".  The
      problem is with kswapd seeing such inodes in the middle of being killed and
      happily taking over.
      
      The clean solution would be to tell kswapd to leave those inodes alone and
      let our final iput deal with them.  I.e.  add a new flag
      (I_FORCED_FREEING), set it before write_inode_now() there and make
      prune_icache() leave those alone.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      991114c6
    • O
      [PATCH] posix-timers: use try_to_del_timer_sync() · f972be33
      Oleg Nesterov 提交于
      sys_timer_settime/sys_timer_delete needs to delete k_itimer->real.timer
      synchronously while holding ->it_lock, which is also locked in
      posix_timer_fn.
      
      This patch removes timer_active/set_timer_inactive which plays with
      timer_list's internals in favour of using try_to_del_timer_sync(), which
      was introduced in the previous patch.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f972be33
    • O
      [PATCH] timers: introduce try_to_del_timer_sync() · fd450b73
      Oleg Nesterov 提交于
      This patch splits del_timer_sync() into 2 functions.  The new one,
      try_to_del_timer_sync(), returns -1 when it hits executing timer.
      
      It can be used in interrupt context, or when the caller hold locks which
      can prevent completion of the timer's handler.
      
      NOTE.  Currently it can't be used in interrupt context in UP case, because
      ->running_timer is used only with CONFIG_SMP.
      
      Should the need arise, it is possible to kill #ifdef CONFIG_SMP in
      set_running_timer(), it is cheap.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fd450b73
    • O
      [PATCH] timers fixes/improvements · 55c888d6
      Oleg Nesterov 提交于
      This patch tries to solve following problems:
      
      1. del_timer_sync() is racy. The timer can be fired again after
         del_timer_sync have checked all cpus and before it will recheck
         timer_pending().
      
      2. It has scalability problems. All cpus are scanned to determine
         if the timer is running on that cpu.
      
         With this patch del_timer_sync is O(1) and no slower than plain
         del_timer(pending_timer), unless it has to actually wait for
         completion of the currently running timer.
      
         The only restriction is that the recurring timer should not use
         add_timer_on().
      
      3. The timers are not serialized wrt to itself.
      
         If CPU_0 does mod_timer(jiffies+1) while the timer is currently
         running on CPU 1, it is quite possible that local interrupt on
         CPU_0 will start that timer before it finished on CPU_1.
      
      4. The timers locking is suboptimal. __mod_timer() takes 3 locks
         at once and still requires wmb() in del_timer/run_timers.
      
         The new implementation takes 2 locks sequentially and does not
         need memory barriers.
      
      Currently ->base != NULL means that the timer is pending. In that case
      ->base.lock is used to lock the timer. __mod_timer also takes timer->lock
      because ->base can be == NULL.
      
      This patch uses timer->entry.next != NULL as indication that the timer is
      pending. So it does __list_del(), entry->next = NULL instead of list_del()
      when the timer is deleted.
      
      The ->base field is used for hashed locking only, it is initialized
      in init_timer() which sets ->base = per_cpu(tvec_bases). When the
      tvec_bases.lock is locked, it means that all timers which are tied
      to this base via timer->base are locked, and the base itself is locked
      too.
      
      So __run_timers/migrate_timers can safely modify all timers which could
      be found on ->tvX lists (pending timers).
      
      When the timer's base is locked, and the timer removed from ->entry list
      (which means that _run_timers/migrate_timers can't see this timer), it is
      possible to set timer->base = NULL and drop the lock: the timer remains
      locked.
      
      This patch adds lock_timer_base() helper, which waits for ->base != NULL,
      locks the ->base, and checks it is still the same.
      
      __mod_timer() schedules the timer on the local CPU and changes it's base.
      However, it does not lock both old and new bases at once. It locks the
      timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and
      unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base,
      and adds this timer. This simplifies the code, because AB-BA deadlock is not
      possible. __mod_timer() also ensures that the timer's base is not changed
      while the timer's handler is running on the old base.
      
      __run_timers(), del_timer() do not change ->base anymore, they only clear
      pending flag.
      
      So del_timer_sync() can test timer->base->running_timer == timer to detect
      whether it is running or not.
      
      We don't need timer_list->lock anymore, this patch kills it.
      
      We also don't need barriers. del_timer() and __run_timers() used smp_wmb()
      before clearing timer's pending flag. It was needed because __mod_timer()
      did not lock old_base if the timer is not pending, so __mod_timer()->list_add()
      could race with del_timer()->list_del(). With this patch these functions are
      serialized through base->lock.
      
      One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch
      adds global
      
              struct timer_base_s {
                      spinlock_t lock;
                      struct timer_list *running_timer;
              } __init_timer_base;
      
      which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s
      struct are replaced by struct timer_base_s t_base.
      
      It is indeed ugly. But this can't have scalability problems. The global
      __init_timer_base.lock is used only when __mod_timer() is called for the first
      time AND the timer was compile time initialized. After that the timer migrates
      to the local CPU.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NRenaud Lienhart <renaud.lienhart@free.fr>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      55c888d6
    • N
      [PATCH] blk: unplug later · bdd646a4
      Nick Piggin 提交于
      get_request_wait needn't unplug the device immediately.
      Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bdd646a4
    • N
      [PATCH] blk: branch hints · fde6ad22
      Nick Piggin 提交于
      Sprinkle around a few branch hints in the block layer.
      Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fde6ad22
    • N
      [PATCH] blk: no memory barrier · 250dccc0
      Nick Piggin 提交于
      This memory barrier is not needed because the waitqueue will only get waiters
      on it in the following situations:
      
      rq->count has exceeded the threshold - however all manipulations of ->count
      are performed under the runqueue lock, and so we will correctly pick up any
      waiter.
      
      Memory allocation for the request fails.  In this case, there is no additional
      help provided by the memory barrier.  We are guaranteed to eventually wake up
      waiters because the request allocation mempool guarantees that if the mem
      allocation for a request fails, there must be some requests in flight.  They
      will wake up waiters when they are retired.
      Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      250dccc0
    • T
      [PATCH] blk: cleanup generic tag support error messages · 040c928c
      Tejun Heo 提交于
      Add KERN_ERR and __FUNCTION__ to generic tag error messages, and add a comment
      in blk_queue_end_tag() which explains the silent failure path.
      Signed-off-by: NTejun Heo <htejun@gmail.com>
      Acked-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      040c928c
    • T
      [PATCH] blk: remove BLK_TAGS_{PER_LONG|MASK} · f7d37d02
      Tejun Heo 提交于
      Replace BLK_TAGS_PER_LONG with BITS_PER_LONG and remove unused BLK_TAGS_MASK.
      Signed-off-by: NTejun Heo <htejun@gmail.com>
      Acked-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f7d37d02
    • T
      [PATCH] blk: remove blk_queue_tag->real_max_depth optimization · fa72b903
      Tejun Heo 提交于
      blk_queue_tag->real_max_depth was used to optimize out unnecessary
      allocations/frees on tag resize.  However, the whole thing was very broken -
      tag_map was never allocated to real_max_depth resulting in access beyond the
      end of the map, bits in [max_depth..real_max_depth] were set when initializing
      a map and copied when resizing resulting in pre-occupied tags.
      
      As the gain of the optimization is very small, well, almost nill, remove the
      whole thing.
      Signed-off-by: NTejun Heo <htejun@gmail.com>
      Acked-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fa72b903