1. 27 Jun, 2006 40 commits
    • [PATCH] proc: Remove tasklist_lock from proc_task_readdir. · cc288738
      Eric W. Biederman committed
      This is just like my previous removal of tasklist_lock from first_tgid, and
      next_tgid.  It simply had to wait until it was rcu safe to walk the thread
      list.
      
      This should be the last instance of the tasklist_lock in proc.  So user
      processes should not be able to influence the tasklist lock hold times.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Cleanup proc_fd_access_allowed · df26c40e
      Eric W. Biederman committed
      In the process of getting proc_fd_access_allowed to work it has developed a few
      warts.  In particular the special case that always allows introspection and
      the special case to allow inspection of kernel threads.
      
      The special case for introspection is needed for /proc/self/mem.
      
      The special case for kernel threads really should be overridable
      by security modules.
      
      So consolidate these checks into ptrace.c:may_attach().
      
      The check to always allow introspection is trivial.
      
      The check to allow access to kernel threads and zombies is a little
      trickier.  mem_read and mem_write already verify an mm exists so it isn't
      needed twice.  proc_fd_access_allowed, however, doesn't want a check to verify
      task->mm exists, as it prevents all access to kernel threads.  So just move
      the task->mm check into ptrace_attach where it is needed for practical
      reasons.
      
      I did a quick audit and none of the security modules in the kernel seem to
      care if they are passed a task without an mm into security_ptrace.  So the
      above move should be safe and it allows security modules to come up with
      more restrictive policy.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks · 778c1144
      Eric W. Biederman committed
      Since 2.2 we have been doing a chroot check to see if it is appropriate to
      read or follow one of these magic symlinks.  The chroot check was
      asking a question about the visibility of files to the calling process, but
      it was actually checking the destination process, and not the files
      themselves.  That test was clearly bogus.
      
      In my first pass through I simply fixed the test to check the visibility of
      the files themselves.  That naive approach to fixing the permissions was
      too strict and resulted in cases where a task could not even see all of
      its file descriptors.

      What has disturbed me about relaxing this check is that file descriptors
      are per-process private things, and they are occasionally used as user space
      capability tokens.  Looking a little further into the symlink path on /proc
      I did find userid checks and a check for capability (CAP_DAC_OVERRIDE) so
      there were permission checks covering this.
      
      But I was still concerned about privacy.  Besides /proc there is only one
      other way to find out this kind of information, and that is ptrace.  ptrace
      has been around for a long time and it has a well established security
      model.
      
      So after thinking about it I finally realized that the permission checks
      that make sense are the permission checks applied to ptrace_attach.  The
      checks are simple, per process, and won't cause nasty surprises for people
      coming from less capable unices.
      
      Unfortunately there is one case that the current ptrace_attach test does
      not cover: Zombies and kernel threads.  Single stepping those kinds of
      processes is impossible.  Being able to see which file descriptors are open
      on these tasks is important to lsof, fuser and friends.  So for these
      special processes I made the rule you can't find out unless you have
      CAP_SYS_PTRACE.
      
      These proc permission checks should now conform to the principle of least
      surprise.  As well as using much less code to implement :)
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: optimize proc_check_dentry_visible · 5b0c1dd3
      Eric W. Biederman committed
      The code doesn't need to sleep when making this check so I can just do the
      comparison and not worry about the reference counts.
      
      TODO: While looking at this I realized that my original cleanup did not push
      the permission check far enough down into the stack.  The call of
      proc_check_dentry_visible needs to move out of the generic proc
      readlink/follow link code and into the individual get_link instances.
      Otherwise the shared resources checks are not quite correct (shared
      files_struct does not require a shared fs_struct), and there are races with
      unshare.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Use struct pid not struct task_ref · 13b41b09
      Eric W. Biederman committed
      Incrementally update my proc-dont-lock-task_structs-indefinitely patches so
      that they work with struct pid instead of struct task_ref.
      
      Mostly this is a straight 1-1 substitution.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: don't lock task_structs indefinitely · 99f89551
      Eric W. Biederman committed
      Every inode in /proc holds a reference to a struct task_struct.  If a
      directory or file is opened and remains open after the task exits this
      pinning continues.  With 8K stacks on a 32bit machine the amount pinned per
      file descriptor is about 10K.
      
      Normally I would figure a reasonable per user process limit is about 100
      processes.  With 80 processes, with 1000 file descriptors each I can trigger
      the OOM killer on a 32bit kernel, because I have pinned about 800MB of useless
      data.
      
      This patch replaces the struct task_struct pointer with a pointer to a struct
      task_ref which has a struct task_struct pointer.  So the pinning of dead
      tasks does not happen.

      The code now has to contend with the fact that the task may now exit at any
      time.  Which is a little but not much more complicated.
      
      With this change it takes about 1000 processes each opening up 1000 file
      descriptors before I can trigger the OOM killer.  Much better.
      
      [mlp@google.com: task_mmu small fixes]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Albert Cahalan <acahalan@gmail.com>
      Signed-off-by: Prasanna Meda <mlp@google.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
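      The arithmetic in the message above (80 processes, 1000 file descriptors
      each, about 10K pinned per descriptor) can be checked with a small sketch.
      The function name pinned_kib and its parameters are hypothetical and only
      illustrate the multiplication; they are not part of the patch.

```c
#include <assert.h>

/*
 * Memory pinned by open /proc file descriptors, in KiB.
 * kib_per_fd is the per-descriptor cost quoted in the commit message
 * (about 10K with 8K stacks on a 32bit machine).
 */
static unsigned long pinned_kib(unsigned long processes,
                                unsigned long fds_per_process,
                                unsigned long kib_per_fd)
{
    assert(kib_per_fd > 0);
    return processes * fds_per_process * kib_per_fd;
}
```

      With the figures from the message, pinned_kib(80, 1000, 10) comes to
      800000 KiB, which matches the "about 800MB" quoted there.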
    • [PATCH] proc: make PROC_NUMBUF the buffer size for holding integers as strings · 8578cea7
      Eric W. Biederman committed
      Currently in /proc at several different places we define buffers to hold a
      process id, or a file descriptor.  In most of them we use either a hard coded
      number or a different define.  Modify them all to use PROC_NUMBUF, so the code
      has a chance of being maintained.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
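      A minimal user-space sketch of the idea above: one shared buffer size for
      integers rendered as strings, combined with snprintf so that running out
      of space is detected rather than overrunning the buffer.  EXAMPLE_NUMBUF,
      its value and format_id are assumptions made for illustration; the value
      the kernel actually uses for PROC_NUMBUF is not stated in the message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical shared size: 10 decimal digits for a 32-bit unsigned value,
 * plus the terminating NUL, plus a little slack.
 */
#define EXAMPLE_NUMBUF 13

/* Format id into buf; returns the string length, or -1 if it does not fit. */
static int format_id(char *buf, size_t size, unsigned int id)
{
    int len;

    assert(buf != NULL && size > 0);
    len = snprintf(buf, size, "%u", id);
    if (len < 0 || (size_t)len >= size)
        return -1;              /* output error or truncated output */
    return len;
}
```

      Every caller that declares char buf[EXAMPLE_NUMBUF] gets the same bound,
      and a caller that passes a smaller buffer gets -1 back instead of a
      silently truncated number.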
    • [PATCH] simply fix first_tgid · 9cc8cbc7
      Eric W. Biederman committed
      Like the bug Oleg spotted in first_tid there was also a small off by one
      error in first_tgid, when a seek was done on the /proc directory.  This
      fixes that and changes the code structure to make it a little more obvious
      what is going on.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Remove tasklist_lock from proc_pid_lookup() and proc_task_lookup() · de758734
      Eric W. Biederman committed
      Since we no longer need the tasklist_lock for get_task_struct the lookup
      methods no longer need the tasklist_lock.
      
      This just depends on my previous patch that makes get_task_struct() rcu
      safe.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Remove tasklist_lock from proc_pid_readdir · 454cc105
      Eric W. Biederman committed
      We don't need the tasklist_lock to safely iterate through processes
      anymore.
      
      This depends on my previous two task patches that make get_task_struct rcu
      safe, and that make next_task() rcu safe.  I haven't gotten
      first_tid/next_tid yet only because next_thread is missing an
      rcu_dereference.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: refactor reading directories of tasks · 0bc58a91
      Eric W. Biederman committed
      There are a couple of problems this patch addresses.
      - /proc/<tgid>/task currently does not work correctly if you stop reading
        in the middle of a directory.
      
      - /proc/ currently requires a full pass through the task list with
        the tasklist lock held, to determine there are no more processes to read.
      
      - The hand rolled integer to string conversion does not properly handle
        running out of buffer space.
      
      - We seem to be batching reading of pids from the tasklist without reason,
        and complicating the logic of the code.
      
      This patch addresses that by changing how tasks are processed.  A
      first_<task_type> function is built that handles restarts, and a
      next_<task_type> function is built that just advances to the next task.
      
      first_<task_type> when it detects a restart usually uses find_task_by_pid.  If
      that doesn't work because there has been a seek on the directory, or we have
      already given a complete directory listing, it first checks the number of tasks
      of that type, and only if we are under that count does it walk through all of
      the tasks to find the one we are interested in.
      
      The code that fills in the directory is simpler because there is only a single
      for loop.
      
      The hand rolled integer to string conversion is replaced by snprintf which
      should handle the out of buffer case correctly.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
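      The first_<task_type>/next_<task_type> split described above can be
      pictured with a user-space sketch.  Everything here is an assumption made
      for illustration: the task list is a plain array, and struct task_list,
      first_task, next_task and fill_dir are hypothetical names, not the
      functions added by the patch.  The lookup by id stands in for
      find_task_by_pid, and the position check stands in for the task count
      test.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel task list: a plain array of ids. */
struct task_list {
    const int *ids;
    size_t count;
};

#define NO_TASK ((size_t)-1)

/* Handle a restart: prefer the id we stopped at, else fall back to position. */
static size_t first_task(const struct task_list *tl, int resume_id, size_t pos)
{
    size_t i;

    /* Usual case: the task we stopped at still exists (lookup by id). */
    for (i = 0; i < tl->count; i++)
        if (tl->ids[i] == resume_id)
            return i;

    /* Otherwise only use the position if it is below the task count. */
    if (pos < tl->count)
        return pos;
    return NO_TASK;
}

/* Just advance to the next task, or report the end of the list. */
static size_t next_task(const struct task_list *tl, size_t cur)
{
    return (cur + 1 < tl->count) ? cur + 1 : NO_TASK;
}

/* Fill out[] with up to max ids using a single for loop. */
static size_t fill_dir(const struct task_list *tl, int resume_id, size_t pos,
                       int *out, size_t max)
{
    size_t n = 0;
    size_t t;

    for (t = first_task(tl, resume_id, pos); t != NO_TASK && n < max;
         t = next_task(tl, t))
        out[n++] = tl->ids[t];
    return n;
}
```

      The point of the split is that all restart handling lives in first_task,
      so the loop that fills the directory stays a single for loop.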
    • [PATCH] proc: Close the race of a process dying during lookup · cd6a3ce9
      Eric W. Biederman committed
      proc_lookup and task exiting are not synchronized, although some of the
      previous code may have suggested that.  Every time before we reuse a dentry
      namei.c calls d_op->d_revalidate which prevents us from reusing a stale dcache
      entry.  Unfortunately it does not prevent us from returning a stale dcache
      entry.  This race has been explicitly plugged in proc_pid_lookup but there is
      nothing to confine it to just that proc lookup function.
      
      So to prevent the race I call revalidate explicitly in all of the proc lookup
      functions after I call d_add, and report an error if the revalidate does not
      succeed.
      
      Years ago Al Viro did something similar but those changes got lost in the
      churn.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Rewrite the proc dentry flush on exit optimization · 48e6484d
      Eric W. Biederman committed
      To keep the dcache from filling up with dead /proc entries we flush them on
      process exit.  However over the years that code has gotten hairy with a
      dentry_pointer and a lock in task_struct and misdocumented as a correctness
      feature.
      
      I have rewritten this code to look and see if we have a corresponding entry in
      the dcache and if so flush it on process exit.  This removes the extra fields
      in the task_struct and allows me to trivially handle the case of a
      /proc/<tgid>/task/<pid> entry as well as the current /proc/<pid> entries.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Move proc_maps_operations into task_mmu.c · 662795de
      Eric W. Biederman committed
      All of the functions for proc_maps_operations are already defined in
      task_mmu.c so move the operations structure to keep the functionality
      together.
      
      Since task_nommu.c implements a dummy version of /proc/<pid>/maps give it a
      simplified version of proc_maps_operations that it can modify to best suit its
      needs.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Fix the link count for /proc/<pid>/task · 6e66b52b
      Eric W. Biederman committed
      Use getattr to get an accurate link count when needed.  This is cheaper and
      more accurate than trying to derive it by walking the thread list of a
      process.
      
      Especially as it happens when needed, at stat time, instead of at readdir time.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Properly filter out files that are not visible to a process · 0f2fe20f
      Eric W. Biederman committed
      Long ago and far away in 2.2 we started checking to ensure the files we
      displayed in /proc were visible to the current process.  It was an
      unsophisticated time and no one was worried about functions full of FIXMES in
      a stable kernel.  As time passed the function became sacred and was enshrined
      in the shrine of how things have always been.  The fixes came in but only to
      keep the function working, with no one really remembering or documenting why
      we did things that way.
      
      The intent and the functionality make a lot of sense.  Don't let /proc be an
      access point for files a process can see no other way.  The implementation
      however is completely wrong.
      
      We are currently checking the root directories of the two processes, we are
      not checking the actual file descriptors themselves.
      
      We are strangely checking with a permission method instead of just when we use
      the data.
      
      This patch fixes the logic to actually check the file descriptors and makes a
      note that implementing a permission method for this part of /proc almost
      certainly indicates a bug in the reasoning.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Kill proc_mem_inode_operations · 22c2c5d7
      Eric W. Biederman committed
      The inode operations only exist to support the proc_permission function.
      Currently mem_read and mem_write have all the same permission checks as
      ptrace.  The fs check makes no sense in this context, and we can trivially get
      around it by calling ptrace.
      
      So simplify the code by killing the strange weird case.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Remove bogus proc_task_permission · 68602066
      Eric W. Biederman committed
      First, we can access every /proc/<tgid>/task/<pid> directory as /proc/<pid> so
      proc_task_permission is not usefully limiting visibility.

      Second, having related filesystem information should have nothing to do with
      process visibility.  kill does not implement any checks like that.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Replace proc_inode.type with proc_inode.fd · aed7a6c4
      Eric W. Biederman committed
      The sole remaining use of proc_inode.type is to discover the file descriptor
      number, so just store the file descriptor number and don't worry about
      processing this field.  This removes any /proc limits on the maximum number of
      file descriptors, and clears the path to make the hard coded /proc inode
      numbers go away.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Simplify the ownership rules for /proc · 87bfbf67
      Eric W. Biederman committed
      Currently in /proc if the task is dumpable all of the files are owned by the
      task's effective user.  Otherwise the files are owned by root.  The exception
      is the /proc/<tgid>/ or /proc/<tgid>/task/<pid> directory; in that case we
      always make the directory owned by the effective user.
      
      However the special case for directories is pointless except as a way to read
      the effective user, because the permissions on both of those directories are
      world readable, and executable.
      
      /proc/<tgid>/status provides a much better way to read a process's effective
      userid, so it is silly to try to provide that on the directory.
      
      So this patch simplifies the code by removing a pointless special case and
      gets us one step closer to being able to remove the hard coded /proc inode
      numbers.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Remove unnecessary and misleading assignments from proc_pid_make_inode · 16796549
      Eric W. Biederman committed
      The removed fields are already set by proc_alloc_inode.  Initializing them in
      proc_alloc_inode implies they need it for proper cleanup.  At least ei->pde
      was not set on all paths making it look like proc_alloc_inode was buggy.  So
      just remove the redundant assignments.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Remove useless BKL in proc_pid_readlink · ff9724a3
      Eric W. Biederman committed
      We already call everything except do_proc_readlink outside of the BKL in
      proc_pid_followlink, and there appears to be nothing in do_proc_readlink that
      needs any special protection.
      
      So remove this leftover from one of the BKL cleanup efforts.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] nfsd kconfig: select things at the closest tristate instead of bool · f05e15b5
      Herbert Xu committed
      I noticed recently that my CONFIG_CRYPTO_MD5 turned into a y again instead
      of m.  It turns out that CONFIG_NFSD_V4 is selecting it to be y even though
      I've chosen to compile nfsd as a module.
      
      In general when we have a bool sitting under a tristate it is better to
      select things you need from the tristate rather than the bool since that
      allows the things you select to be modules.
      
      The following patch does it for nfsd.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] i4l: Gigaset drivers: add IOCTLs to compat_ioctl.h · 5024ad4a
      Hansjoerg Lipp committed
      Add the IOCTLs of the Gigaset drivers to compat_ioctl.h in order to make
      them available for 32 bit programs on 64 bit platforms.  Please merge.
      Signed-off-by: Hansjoerg Lipp <hjlipp@web.de>
      Acked-by: Tilman Schmidt <tilman@imap.cc>
      Cc: Karsten Keil <kkeil@suse.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] isdn4linux: Gigaset driver cleanup · 698e3ed9
      Tilman Schmidt committed
      The following patch to the common part of the Siemens Gigaset driver
      prevents it from trying to send the +++ break sequence if the device has
      been disconnected, and removes a couple of assignments which didn't have
      any effect.
      Signed-off-by: Tilman Schmidt <tilman@imap.cc>
      Acked-by: Hansjoerg Lipp <hjlipp@web.de>
      Cc: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] isdn4linux: Gigaset base driver: improve error recovery · 06163f86
      Tilman Schmidt committed
      The following patch to the Siemens Gigaset base driver adds graceful
      recovery for some frequently encountered error conditions, by retrying
      failed control requests (e.g. a stalled control pipe), and by closing and
      reopening the AT command channel when it appears to be stuck.
      Signed-off-by: Tilman Schmidt <tilman@imap.cc>
      Acked-by: Hansjoerg Lipp <hjlipp@web.de>
      Cc: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix typo in drivers/isdn/hisax/q931.c · 9f13fae2
      Eric Sesterhenn committed
      This fixes coverity bug #517.
      
      Since IESIZE is greater than IESIZE_NI1 we might run past the end of
      ielist_ni1.  This fixes it by using the proper IESIZE_NI1 define.
      Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
      Acked-by: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
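      The class of bug described above, looping over one table with the size of
      a larger one, can be sketched as follows.  The table contents, the name
      ielist for the larger table and the helper find_ie_ni1 are hypothetical;
      only ielist_ni1, IESIZE and IESIZE_NI1 are names taken from the message,
      and deriving the sizes with sizeof is an assumption about how such
      defines are usually written.

```c
#include <stddef.h>

/* Hypothetical tables: the NI1 variant is the shorter of the two. */
static const int ielist[]     = { 0x04, 0x08, 0x10, 0x14, 0x18, 0x1c };
static const int ielist_ni1[] = { 0x04, 0x08, 0x10 };

#define IESIZE     (sizeof(ielist) / sizeof(ielist[0]))
#define IESIZE_NI1 (sizeof(ielist_ni1) / sizeof(ielist_ni1[0]))

/* Search ielist_ni1 using its own size; returns the index or -1. */
static int find_ie_ni1(int id)
{
    size_t i;

    /* Using IESIZE here would read past the end of ielist_ni1. */
    for (i = 0; i < IESIZE_NI1; i++)
        if (ielist_ni1[i] == id)
            return (int)i;
    return -1;
}
```

      Bounding the loop by IESIZE_NI1 keeps every access inside ielist_ni1,
      which is the substance of the fix the message describes.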
    • [PATCH] CAPI crash / race condition · 6aa65472
      Michael Buesch committed
      I am getting more or less reproducible crashes from the CAPI subsystem
      using the fcdsl driver:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000010
       printing eip:
      c39bbca4
      *pde = 00000000
      Oops: 0000 [#1]
      Modules linked in: netconsole capi capifs 3c59x mii fcdsl kernelcapi uhci_hcd usbcore ide_cd cdrom
      CPU:    0
      EIP:    0060:[<c39bbca4>]    Tainted: P      VLI
      EFLAGS: 00010202   (2.6.16.11 #3)
      EIP is at handle_minor_send+0x17a/0x241 [capi]
      eax: c24abbc0   ebx: c0b4c980   ecx: 00000010   edx: 00000010
      esi: c1679140   edi: c2783016   ebp: 0000c28d   esp: c0327e24
      ds: 007b   es: 007b   ss: 0068
      Process swapper (pid: 0, threadinfo=c0326000 task=c02e1300)
      Stack: <0>000005b4 c1679180 00000000 c28d0000 c1ce04e0 c2f69654 c221604e c1679140
             c39bc19a 00000038 c20c0400 c075c560 c1f2f800 00000000 c01dc9b5 c1e96a40
             c075c560 c2ed64c0 c1e96a40 c01dcd3b c2fb94e8 c075c560 c0327f00 c1e96a40
      Call Trace:
       [<c39bc19a>] capinc_tty_write+0xda/0xf3 [capi]
       [<c01dc9b5>] ppp_sync_push+0x52/0xfe
       [<c01dcd3b>] ppp_sync_send+0x1f5/0x204
       [<c01d9bc1>] ppp_push+0x3e/0x9c
       [<c01dacd4>] ppp_xmit_process+0x422/0x4cc
       [<c01daf3f>] ppp_start_xmit+0x1c1/0x1f6
       [<c0213ea5>] qdisc_restart+0xa7/0x135
       [<c020b112>] dev_queue_xmit+0xba/0x19e
       [<c0223f69>] ip_output+0x1eb/0x236
       [<c0220907>] ip_forward+0x1c1/0x21a
       [<c021fa6c>] ip_rcv+0x38e/0x3ea
       [<c020b4c2>] netif_receive_skb+0x166/0x195
       [<c020b55e>] process_backlog+0x6d/0xd2
       [<c020a30f>] net_rx_action+0x6a/0xff
       [<c0112909>] __do_softirq+0x35/0x7d
       [<c0112973>] do_softirq+0x22/0x26
       [<c0103a9d>] do_IRQ+0x1e/0x25
       [<c010255a>] common_interrupt+0x1a/0x20
       [<c01013c5>] default_idle+0x2b/0x53
       [<c0101426>] cpu_idle+0x39/0x4e
       [<c0328386>] start_kernel+0x20b/0x20d
      Code: c0 e8 b3 b6 77 fc 85 c0 75 10 68 d8 c8 9b c3 e8 82 3d 75 fc 8b 43 60 5a eb 50 8d 56 50 c7 00 00 00 00 00 66 89 68 04 eb 02 89
      ca <8b> 0a 85 c9 75 f8 89 02 89 da ff 46 54 8b 46 10 e8 30 79 fd ff
       <0>Kernel panic - not syncing: Fatal exception in interrupt
      
      That oops took me to the "ackqueue" implementation in capi.c.  The crash
      occurred in capincci_add_ack() (auto-inlined by the compiler).

      I read the code a bit and finally decided to replace the custom linked list
      implementation (struct capiminor->ackqueue) by a struct list_head.  That
      did not solve the crash, but produced the following interesting oops:
      
      Unable to handle kernel paging request at virtual address 00200200
       printing eip:
      c39bb1f5
      *pde = 00000000
      Oops: 0002 [#1]
      Modules linked in: netconsole capi capifs 3c59x mii fcdsl kernelcapi uhci_hcd usbcore ide_cd cdrom
      CPU:    0
      EIP:    0060:[<c39bb1f5>]    Tainted: P      VLI
      EFLAGS: 00010246   (2.6.16.11 #3)
      EIP is at capiminor_del_ack+0x18/0x49 [capi]
      eax: 00200200   ebx: c18d41a0   ecx: c1385620   edx: 00100100
      esi: 0000d147   edi: 00001103   ebp: 0000d147   esp: c1093f3c
      ds: 007b   es: 007b   ss: 0068
      Process events/0 (pid: 3, threadinfo=c1092000 task=c1089030)
      Stack: <0>c2a17580 c18d41a0 c39bbd16 00000038 c18d41e0 00000000 d147c640 c29e0b68
             c29e0b90 00000212 c29e0b68 c39932b2 c29e0bb0 c10736a0 c0119ef0 c399326c
             c10736a8 c10736a0 c10736b0 c0119f93 c011a06e 00000001 00000000 00000000
      Call Trace:
       [<c39bbd16>] handle_minor_send+0x1af/0x241 [capi]
       [<c39932b2>] recv_handler+0x46/0x5f [kernelcapi]
       [<c0119ef0>] run_workqueue+0x5e/0x8d
       [<c399326c>] recv_handler+0x0/0x5f [kernelcapi]
       [<c0119f93>] worker_thread+0x0/0x10b
       [<c011a06e>] worker_thread+0xdb/0x10b
       [<c010c998>] default_wake_function+0x0/0xc
       [<c011c399>] kthread+0x90/0xbc
       [<c011c309>] kthread+0x0/0xbc
       [<c0100a65>] kernel_thread_helper+0x5/0xb
      Code: 7e 02 89 ee 89 f0 5a f7 d0 c1 f8 1f 5b 21 f0 5e 5f 5d c3 56 53 8b 48 50 89 d6 89 c3 8b 11 eb 2f 66 39 71 08 75 25 8b 41 04 8b 11 <89> 10 89 42 04 c7 01 00 01 10 00 89 c8 c7 41 04 00 02 20 00 e8
      
      The interesting part of it is the "virtual address 00200200", which is
      LIST_POISON2.  I thought about some race condition, but as this is a UP
      system, it leads to questions on how it can happen.  If we look at EFLAGS:
      00010202, we see that interrupts are enabled at the time of the crash
      (eflags & 0x200).
      
      Finally, I don't understand all the capi code, but I think that
      handle_minor_send() is racing somehow against capi_recv_message(), which
      both call capiminor_del_ack().  So if an IRQ occurs in the middle of
      capiminor_del_ack() and another instance of it is invoked, it leads to
      linked list corruption.
      
      I came up with the following patch.  With this, I could not reproduce the
      crash anymore.  Clearly, this is not the correct fix for the issue.  As this
      seems to be some locking issue, there might be more locking issues in that
      code.  For example, doesn't the whole struct capiminor have to be locked
      somehow?
      
      Cc: Carsten Paeth <calle@calle.de>
      Cc: Kai Germaschewski <kai.germaschewski@gmx.de>
      Cc: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
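      The two observations in the analysis above, the faulting address matching
      LIST_POISON2 and the (eflags & 0x200) test for enabled interrupts, can be
      reproduced with a short sketch.  The helper names are hypothetical; the
      constant values are the ones visible in the oops (00100100 in edx,
      00200200 as the faulting address) and the x86 interrupt flag bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x00000200u   /* interrupt enable flag (bit 9) */
#define LIST_POISON1  0x00100100u   /* written to ->next by list_del() */
#define LIST_POISON2  0x00200200u   /* written to ->prev by list_del() */

/* True if the saved EFLAGS value shows interrupts were enabled. */
static bool interrupts_enabled(uint32_t eflags)
{
    return (eflags & X86_EFLAGS_IF) != 0;
}

/* True if addr is one of the list poison values, i.e. the entry was
 * already deleted from its list when it was touched again. */
static bool is_list_poison(uint32_t addr)
{
    return addr == LIST_POISON1 || addr == LIST_POISON2;
}
```

      Applied to the values quoted above, both EFLAGS words (00010202 and
      00010246) have the interrupt flag set, and 00200200 is recognised as a
      poison value while the first oops address 00000010 is not.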
    • [PATCH] Notify page fault call chain · e6f47f97
      Anil S Keshavamurthy committed
      With this patch Kprobes now registers for page fault notifications only when
      there is an active probe registered.  Once all the active probes are
      unregistered there is no need to be notified of page faults and kprobes
      unregisters itself from the page fault notifications.  Hence we will have ZERO
      side effects when no probes are active.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Kprobes registers for notify page fault · 3d5631e0
      Anil S Keshavamurthy committed
      Kprobes now registers for page fault notifications.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavmurthy@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Notify page fault call chain for sparc64 · d98f8f05
      Anil S Keshavamurthy committed
      Overloading of page fault notification with the notify_die() has performance
      issues (since the only components interested in page faults are kprobes and/or
      kdb), and hence this patch introduces a new notifier call chain exclusively
      for page fault notifications, thereby avoiding notifying unnecessary
      components in the do_page_fault() code path.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
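      The benefit of a dedicated chain can be seen in a user-space sketch of a
      notifier list.  This is not the kernel notifier API: struct notifier,
      struct chain, chain_register, chain_unregister, chain_call and count_cb
      are hypothetical names chosen for illustration.  A subscriber on a
      separate page fault chain is the only one called for a page fault, and
      once it unregisters the chain is empty and the call costs nothing.

```c
#include <stddef.h>

/* One subscriber: a callback plus a link to the next subscriber. */
struct notifier {
    int (*call)(unsigned long event, void *data);
    struct notifier *next;
};

struct chain {
    struct notifier *head;
};

/* Add n at the front of the chain. */
static void chain_register(struct chain *c, struct notifier *n)
{
    n->next = c->head;
    c->head = n;
}

/* Unlink n from the chain if it is present. */
static void chain_unregister(struct chain *c, struct notifier *n)
{
    struct notifier **p;

    for (p = &c->head; *p != NULL; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Call every subscriber on the chain; returns how many were called. */
static int chain_call(struct chain *c, unsigned long event, void *data)
{
    struct notifier *n;
    int calls = 0;

    for (n = c->head; n != NULL; n = n->next) {
        n->call(event, data);
        calls++;
    }
    return calls;
}

/* Example callback: counts invocations through an int passed as data. */
static int count_cb(unsigned long event, void *data)
{
    (void)event;
    ++*(int *)data;
    return 0;
}
```

      With two subscribers on a general chain and one on a page fault chain,
      delivering a page fault through the dedicated chain invokes one callback
      instead of walking every subscriber of the general chain.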
    • [PATCH] Notify page fault call chain for powerpc · 4f9e87c0
      Anil S Keshavamurthy committed
      Overloading of page fault notification with the notify_die() has performance
      issues (since the only components interested in page faults are kprobes and/or
      kdb), and hence this patch introduces a new notifier call chain exclusively
      for page fault notifications, thereby avoiding notifying unnecessary
      components in the do_page_fault() code path.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • A
      [PATCH] Notify page fault call chain for ia64 · ae9a5b85
      Anil S Keshavamurthy committed
      Overloading page fault notification onto notify_die() has performance
      issues (since the only components interested in page faults are kprobes
      and/or kdb), and hence this patch introduces a new notifier call chain
      exclusively for page fault notifications, thereby avoiding notifying
      unnecessary components in the do_page_fault() code path.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ae9a5b85
    • A
      [PATCH] Notify page fault call chain for i386 · b71b5b65
      Anil S Keshavamurthy committed
      Overloading page fault notification onto notify_die() has performance
      issues (since the only components interested in page faults are kprobes
      and/or kdb), and hence this patch introduces a new notifier call chain
      exclusively for page fault notifications, thereby avoiding notifying
      unnecessary components in the do_page_fault() code path.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b71b5b65
    • A
      [PATCH] Notify page fault call chain for x86_64 · 1bd858a5
      Anil S Keshavamurthy committed
      Currently in the do_page_fault() code path, we call notify_die(DIE_PAGE_FAULT,
      ...) to notify the page fault.  Since notify_die() is highly overloaded, this
      page fault notification is currently sent to every component registered
      with register_die_notification(), which walks the same die_chain for all
      registered components; this is unnecessary.

      In order to optimize the do_page_fault() code path, this critical page fault
      notification is now moved to a different call chain, and the test results
      showed great improvements.

      Kprobes, which is interested in these notifications, now registers onto
      this new call chain only when it needs to, i.e. Kprobes registers for page
      fault notification only when there are active probes and unregisters from
      this page fault notification when no probes are active.
      
      I have incorporated all the feedback given by Ananth and Keith and everyone,
      and thanks for all the review feedback.
      
      This patch:
      
      Overloading page fault notification onto notify_die() has performance
      issues (since the only components interested in page faults are kprobes
      and/or kdb), and hence this patch introduces a new notifier call chain
      exclusively for page fault notifications, thereby avoiding notifying
      unnecessary components in the do_page_fault() code path.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1bd858a5
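
The idea described in this series (a call chain dedicated to page faults, which kprobes joins only while probes are active) can be sketched in user-space C. This is a minimal illustration, not the kernel code: `struct notifier`, `struct chain`, `chain_register()`, `chain_unregister()`, `chain_notify()`, `probe_enable()` and `probe_disable()` are all hypothetical names invented for this sketch, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a notifier block: one callback per subscriber. */
struct notifier {
    int (*call)(struct notifier *self, unsigned long event, void *data);
    struct notifier *next;
};

/* A chain is a singly linked list of subscribers. */
struct chain {
    struct notifier *head;
};

static void chain_register(struct chain *c, struct notifier *nb)
{
    assert(c != NULL && nb != NULL);
    nb->next = c->head;
    c->head = nb;
}

static void chain_unregister(struct chain *c, struct notifier *nb)
{
    struct notifier **pp;

    assert(c != NULL && nb != NULL);
    for (pp = &c->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == nb) {
            *pp = nb->next;   /* unlink the subscriber */
            nb->next = NULL;
            return;
        }
    }
}

/* Invoke every subscriber; return how many callbacks ran. */
static int chain_notify(struct chain *c, unsigned long event, void *data)
{
    struct notifier *nb;
    int calls = 0;

    for (nb = c->head; nb != NULL; nb = nb->next) {
        nb->call(nb, event, data);
        calls++;
    }
    return calls;
}

/* Dedicated chain for page faults, separate from any general "die" chain. */
static struct chain page_fault_chain;
static int active_probes;   /* number of currently enabled probes */
static int fault_hits;      /* how often the probe fault handler ran */

static int probe_fault_handler(struct notifier *self, unsigned long event,
                               void *data)
{
    (void)self; (void)event; (void)data;
    fault_hits++;
    return 0;
}

static struct notifier probe_fault_nb = { probe_fault_handler, NULL };

/* Subscribe on the first active probe only. */
static void probe_enable(void)
{
    if (active_probes++ == 0)
        chain_register(&page_fault_chain, &probe_fault_nb);
}

/* Unsubscribe when the last active probe goes away. */
static void probe_disable(void)
{
    assert(active_probes > 0);
    if (--active_probes == 0)
        chain_unregister(&page_fault_chain, &probe_fault_nb);
}
```

With no probes enabled, `chain_notify(&page_fault_chain, ...)` walks an empty list, which mirrors the "zero side effects when no probes are active" goal stated in the series.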
    • M
      [PATCH] Kprobe: multi kprobe posthandler for booster · 36721656
      mao, bibo committed
      If there are multiple kprobes on the same probepoint, there will be one
      extra aggr_kprobe at the head of the kprobe list.  The aggr_kprobe has
      aggr_post_handler/aggr_break_handler set whether or not the other kprobes'
      post_handler/break_handler is NULL.  This patch changes that: the
      post_handler of the aggr_kprobe is set to aggr_post_handler only when at
      least one kprobe in the list has a non-NULL post_handler.

      [soshima@redhat.com: !CONFIG_PREEMPT fix]
      Signed-off-by: bibo, mao <bibo.mao@intel.com>
      Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Yumiko Sugita <sugita@sdl.hitachi.co.jp>
      Cc: Hideo Aoki <haoki@redhat.com>
      Signed-off-by: Satoshi Oshima <soshima@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      36721656
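
The rule in the commit above (the aggregate probe gets a post handler only if some probe in its list has one) can be sketched as follows. This is a simplified user-space illustration under assumed names: `struct probe`, `struct aggr_probe`, `aggr_add_probe()` and `count_post()` are hypothetical and do not reproduce the kernel's `struct kprobe` layout or its registration code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified probe: only the post handler matters here. */
struct probe {
    void (*post_handler)(struct probe *p);
    struct probe *next;
};

/* Hypothetical aggregate probe that fronts several probes on one address. */
struct aggr_probe {
    struct probe *list;
    void (*post_handler)(struct aggr_probe *ap);
};

/* Fan out to every probe in the list that actually has a post handler. */
static void aggr_post_handler(struct aggr_probe *ap)
{
    struct probe *p;

    for (p = ap->list; p != NULL; p = p->next) {
        if (p->post_handler != NULL)
            p->post_handler(p);
    }
}

/*
 * Add a probe to the aggregate.  The aggregate's post handler stays NULL
 * until the first probe with a non-NULL post handler is added.
 */
static void aggr_add_probe(struct aggr_probe *ap, struct probe *p)
{
    assert(ap != NULL && p != NULL);
    p->next = ap->list;
    ap->list = p;
    if (p->post_handler != NULL && ap->post_handler == NULL)
        ap->post_handler = aggr_post_handler;
}

/* Example post handler used to observe the fan-out. */
static int post_calls;

static void count_post(struct probe *p)
{
    (void)p;
    post_calls++;
}
```

Leaving the aggregate's post handler NULL when nobody needs it lets a caller test one pointer to learn that no post-processing is required, which is presumably what makes such a probepoint a candidate for the booster named in the commit title.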
    • M
      [PATCH] kprobe: boost 2byte-opcodes on i386 · 585deaca
      Masami Hiramatsu committed
      The previous kprobe-booster patch did not handle any 2-byte opcodes or
      prefixes.  I checked the whole IA32 opcode map and classified it.

      This patch enables kprobe to boost those 2-byte opcodes and prefixes.
      Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Yumiko Sugita <sugita@sdl.hitachi.co.jp>
      Cc: Satoshi Oshima <soshima@redhat.com>
      Cc: Hideo Aoki <haoki@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      585deaca
    • J
      [PATCH] GTOD: add scx200 HRT clocksource · 6ae7440e
      Jim Cromie committed
      Add a GTOD clocksource driver based on the Geode SCx200's Hi-Res Timer.
      Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6ae7440e
    • R
      [PATCH] fix and optimize clock source update · 19923c19
      Roman Zippel committed
      This fixes the clock source updates in update_wall_time() to correctly
      track the time coming in via current_tick_length().  It also optimizes the
      fast paths to be as short as possible to keep the overhead low.
      Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
      Acked-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      19923c19