1. 24 May 2007, 1 commit
  2. 17 May 2007, 1 commit
  3. 11 May 2007, 3 commits
    • signal/timer/event: signalfd core · fba2afaa
      Committed by Davide Libenzi
      This patch series implements the new signalfd() system call.
      
      I took part of the original Linus code (and you know how badly it can be
      broken :), and I added even more breakage ;) Signals are fetched from the same
      signal queue used by the process, so signalfd will compete with standard
      kernel delivery in dequeue_signal().  If you want to reliably fetch signals on
      the signalfd file, you need to block them with sigprocmask(SIG_BLOCK).  This
      seems to be working fine on my Dual Opteron machine.  I made a quick test
      program for it:
      
      http://www.xmailserver.org/signafd-test.c
      
      The signalfd() system call implements signal delivery into a file descriptor
      receiver.  The signalfd file descriptor is created with the following API:
      
      int signalfd(int ufd, const sigset_t *mask, size_t masksize);
      
      The "ufd" parameter allows changing the sigmask of an existing signalfd
      without going through a close/create cycle (Linus' idea).  Use "ufd" == -1
      if you want a brand new signalfd file.
      
      The "mask" parameter specifies the set of signals that we are interested
      in.  The "masksize" parameter is the size of "mask".
      
      The signalfd fd supports the poll(2) and read(2) system calls.  The poll(2)
      will return POLLIN when signals are available to be dequeued.  As a direct
      consequence of supporting the Linux poll subsystem, the signalfd fd can be
      used together with epoll(2) too.
      
      The read(2) system call will return a "struct signalfd_siginfo" structure in
      the userspace supplied buffer.  The return value is the number of bytes copied
      in the supplied buffer, or -1 in case of error.  The read(2) call can also
      return 0, in case the sighand structure to which the signalfd was attached,
      has been orphaned.  The O_NONBLOCK flag is also supported, and read(2) will
      return -EAGAIN in case no signal is available.
      
      If the size of the buffer passed to read(2) is smaller than sizeof(struct
      signalfd_siginfo), -EINVAL is returned.  A read from the signalfd can also
      return -ERESTARTSYS in case a signal hits the process.  The format of
      struct signalfd_siginfo is shown below; which fields are valid depends on
      the (->code & __SI_MASK) value, in the same way as for a struct siginfo:
      
      struct signalfd_siginfo {
      	__u32 signo;	/* si_signo */
      	__s32 err;	/* si_errno */
      	__s32 code;	/* si_code */
      	__u32 pid;	/* si_pid */
      	__u32 uid;	/* si_uid */
      	__s32 fd;	/* si_fd */
      	__u32 tid;	/* si_tid */
      	__u32 band;	/* si_band */
      	__u32 overrun;	/* si_overrun */
      	__u32 trapno;	/* si_trapno */
      	__s32 status;	/* si_status */
      	__s32 svint;	/* si_int */
      	__u64 svptr;	/* si_ptr */
      	__u64 utime;	/* si_utime */
      	__u64 stime;	/* si_stime */
      	__u64 addr;	/* si_addr */
      };
      
      [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • attach_pid() with struct pid parameter · e713d0da
      Committed by Sukadev Bhattiprolu
      attach_pid() currently takes a pid_t and then uses find_pid() to find the
      corresponding struct pid.  Sometimes we already have the struct pid.  We could
      then skip find_pid() if attach_pid() took a struct pid parameter.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: <containers@lists.osdl.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] Abnormal End of Processes · 0a4ff8c2
      Committed by Steve Grubb
      Hi,
      
      I have been working on some code that detects abnormal events based on audit
      system events. One kind of event that we currently have no visibility for is
      when a program terminates due to segfault - which should never happen on a
      production machine. And if it did, you'd want to investigate it. Attached is a
      patch that collects these events and sends them into the audit system.
      Signed-off-by: Steve Grubb <sgrubb@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 09 May 2007, 2 commits
    • (re)register_binfmt returns with -EBUSY · 98701d1b
      Committed by kalash nainwal
      When a binary format is unregistered and re-registered, register_binfmt
      fails with -EBUSY.  The reason is that unregister_binfmt does not set
      fmt->next to NULL, and seeing (fmt->next != NULL), register_binfmt fails
      with -EBUSY.
      
      One can work around this by explicitly setting fmt->next to NULL after
      unregistering, but that is rather unclean (one should use only the
      interfaces, and not the internal members).
      
      The attached one-liner fixes it.
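
The failure mode is easy to reproduce with a toy singly linked list. The sketch below is a simplified user-space model, not the kernel code: struct fmt, register_fmt(), unregister_fmt() and the clear_next switch are hypothetical names chosen for illustration, and locking is omitted. With clear_next == 0 it mimics the behaviour described above; with clear_next == 1 it mimics the one-line fix.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a binary format descriptor. */
struct fmt {
    struct fmt *next;
};

static struct fmt *formats; /* head of the registered list */

/* Refuses entries that look already linked (next != NULL) or are on the list. */
int register_fmt(struct fmt *f)
{
    struct fmt **tmp;

    if (!f)
        return -EINVAL;
    if (f->next)
        return -EBUSY;
    for (tmp = &formats; *tmp; tmp = &(*tmp)->next)
        if (*tmp == f)
            return -EBUSY;

    f->next = formats;
    formats = f;
    return 0;
}

/* clear_next == 0 leaves a stale next pointer; clear_next == 1 resets it. */
int unregister_fmt(struct fmt *f, int clear_next)
{
    struct fmt **tmp;

    for (tmp = &formats; *tmp; tmp = &(*tmp)->next) {
        if (*tmp == f) {
            *tmp = f->next;
            if (clear_next)
                f->next = NULL; /* the one-line fix */
            return 0;
        }
    }
    return -EINVAL;
}
```

The stale pointer only shows up when the unregistered entry was not the last one on the list, which may be why the problem went unnoticed for a while.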
      Signed-off-by: Kalash Nainwal <kalash.nainwal@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • exec: fix remove_arg_zero · 4fc75ff4
      Committed by Nick Piggin
      Petr Tesarik discovered a problem in remove_arg_zero(). He writes:
      
       When a script is loaded, load_script() replaces argv[0] with the
       name of the interpreter and the filename passed to the exec syscall.
       However, there is no guarantee that the length of the interpreter
       name plus the length of the filename is greater than the length of
       the original argv[0]. If the difference happens to cross a page boundary,
       setup_arg_pages() will call put_dirty_page() [aka install_arg_page()]
       with an address outside the VMA.
      
       Therefore, remove_arg_zero() must free all pages which would be unused
       after the argument is removed.
      
      So, rewrite the remove_arg_zero function without gotos, with a few comments,
      and with the commonly used explicit index/offset. This fixes the problem
      and makes it easier to understand as well.
      
      [a.p.zijlstra@chello.nl: add comment]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 18 April 2007, 1 commit
  6. 12 February 2007, 1 commit
  7. 11 December 2006, 1 commit
    • [PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69
      Committed by Vadim Lobanov
      Currently, each fdtable supports three dynamically-sized arrays of data: the
      fdarray and two fdsets.  The code allows the number of fds supported by the
      fdarray (fdtable->max_fds) to differ from the number of fds supported by each
      of the fdsets (fdtable->max_fdset).
      
      In practice, it is wasteful for these two sizes to differ: whenever we hit a
      limit on the smaller-capacity structure, we will reallocate the entire fdtable
      and all the dynamic arrays within it, so any delta in the memory used by the
      larger-capacity structure will never be touched at all.
      
      Rather than hogging this excess, we shouldn't even allocate it in the first
      place, and keep the capacities of the fdarray and the fdsets equal.  This
      patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
      code becomes simpler.
      Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 09 December 2006, 2 commits
  9. 08 December 2006, 2 commits
  10. 02 October 2006, 1 commit
  11. 01 October 2006, 2 commits
    • [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern · d025c9db
      Committed by Andi Kleen
      Using the infrastructure created in previous patches, implement support for
      piping core dumps into programs.
      
      This is done by overloading the existing core_pattern sysctl
      with a new syntax:
      
      |program
      
      When the first character of the pattern is a '|', the kernel will instead
      treat the rest of the pattern as a command to run.  The core dump will be
      written to the standard input of that program instead of to a file.
      
      This is useful for having automatic core dump analysis without filling up
      disks.  The program can do some simple analysis and save only a summary of
      the core dump.
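
As a rough illustration of the "simple analysis" idea, the sketch below shows the core of a pipe handler that consumes the dump from a file descriptor strictly sequentially (a pipe cannot be seeked) and keeps only its size. The function name count_core_bytes is hypothetical, and a real handler would do something more useful than counting bytes.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: drain a core dump arriving on `fd` without seeking
 * and return the number of bytes seen, or -1 on a read error.
 */
long long count_core_bytes(int fd)
{
    char buf[4096];
    long long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n == 0)          /* EOF: the kernel finished writing the dump */
            return total;
        if (n < 0) {
            if (errno == EINTR)
                continue;    /* interrupted by a signal, retry */
            return -1;
        }
        total += n;
    }
}
```

A handler built around this would call count_core_bytes(STDIN_FILENO) from main() and log the result. It would be installed by writing a pattern such as |/usr/local/bin/core-summary to /proc/sys/kernel/core_pattern, where the program path is only an example.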
      
      The core dump process will run with the privileges and in the name space of
      the process that caused the core dump.
      
      I also increased the core pattern size to 128 bytes so that longer command
      lines fit.
      
      Most of the changes come from allowing core dumps without seeks.  They are
      fairly straightforward though.
      
      One small incompatibility is that if someone previously had a core pattern
      that started with '|', they will suddenly get new behaviour.  I think that's
      unlikely to be a real problem though.
      
      Additional background:
      
      > Very nice, do you happen to have a program that can accept this kind of
      > input for crash dumps?  I'm guessing that the embedded people will
      > really want this functionality.
      
      I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
      ran gdb on it to get a backtrace and wrote the summary to a shared directory.
      Then there was a simple CGI script to generate a "top 10" crashes HTML
      listing.
      
      Unfortunately this still had the disadvantage of needing full disk space for a
      dump, even though the dump was deleted afterwards (in fact it was worse because
      holes didn't work over the pipe, so with a holey address map it would require
      more space).
      
      Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
      cores (at least it worked with zsh's =(cat core) syntax), so it would be
      likely possible to do it without temporary space with a simple wrapper that
      calls it in the right way.  I ran out of time before doing that though.
      
      The demo prototype scripts weren't very good.  If there is real interest I
      can dig them out (they are currently on a laptop disk on the desk with the
      laptop itself being in service), but I would recommend rewriting them for any
      serious application of this and fixing the disk space problem.
      
      Also to be really useful it should probably find a way to automatically fetch
      the debuginfos (I cheated and just installed them in advance).  If nobody else
      does it I can probably do the rewrite myself again at some point.
      
      My hope at some point was that desktops would support it in their builtin
      crash reporters, but at least the KDE people I talked to seemed to be happy
      with their user space only solution.
      
      Alan sayeth:
      
        I don't believe that piping as such is necessarily the right model, but
        the ability to intercept and process core dumps from user space is asked
        for by many enterprise users as well.  They want to know about, capture,
        analyse and process core dumps, often centrally and in automated form.
      
      [akpm@osdl.org: loff_t != unsigned long]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] csa: convert CONFIG tag for extended accounting routines · 8f0ab514
      Committed by Jay Lan
      There were a few accounting data/macros that are used in CSA but are #ifdef'ed
      inside CONFIG_BSD_PROCESS_ACCT.  This patch is to change those ifdef's from
      CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
      kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
      include/linux/tsacct_kern.h.
      Signed-off-by: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 30 September 2006, 1 commit
  13. 27 September 2006, 2 commits
  14. 28 August 2006, 1 commit
    • [PATCH] fix up lockdep trace in fs/exec.c · 513627d7
      Committed by Dave Jones
      This fixes the locking error noticed by lockdep:
      
        =============================================
        [ INFO: possible recursive locking detected ]
        ---------------------------------------------
        init/1 is trying to acquire lock:
         (&sighand->siglock){....}, at: [<c047a78a>] flush_old_exec+0x3ae/0x859
      
        but task is already holding lock:
         (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859
      
        other info that might help us debug this:
        2 locks held by init/1:
         #0:  (tasklist_lock){..--}, at: [<c047a76a>] flush_old_exec+0x38e/0x859
         #1:  (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859
      
        stack backtrace:
         [<c04051e1>] show_trace_log_lvl+0x54/0xfd
         [<c040579d>] show_trace+0xd/0x10
         [<c04058b6>] dump_stack+0x19/0x1b
         [<c043b33a>] __lock_acquire+0x773/0x997
         [<c043bacf>] lock_acquire+0x4b/0x6c
         [<c060630b>] _spin_lock+0x19/0x28
         [<c047a78a>] flush_old_exec+0x3ae/0x859
         [<c0498053>] load_elf_binary+0x4aa/0x1628
         [<c0479cab>] search_binary_handler+0xa7/0x24e
         [<c047b577>] do_execve+0x15b/0x1f9
         [<c04022b4>] sys_execve+0x29/0x4d
         [<c0403faf>] syscall_call+0x7/0xb
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  15. 25 August 2006, 2 commits
  16. 01 July 2006, 1 commit
  17. 27 June 2006, 8 commits
  18. 23 June 2006, 1 commit
    • [PATCH] remove steal_locks() · c89681ed
      Committed by Miklos Szeredi
      This patch removes the steal_locks() function.
      
      steal_locks() doesn't work correctly with any filesystem that does its own
      lock management, including NFS, CIFS, etc.
      
      In addition it has weird semantics on local filesystems in case tasks
      sharing file-descriptor tables are doing POSIX locking operations in
      parallel to execve().
      
      The steal_locks() function has an effect on applications doing:
      
      clone(CLONE_FILES)
        /* in child */
        lock
        execve
        lock
      
      POSIX locks acquired before execve (by "child", "parent" or any further
      task sharing files_struct) will after the execve be owned exclusively by
      "child".
      
      According to Chris Wright, some LSB/LTP kind of test suite reports a failure
      without the stealing behavior, but there's no known real-world application
      that would also fail.
      
      Apps using NPTL are not affected, since all other threads are killed before
      execve.
      
      Apps using LinuxThreads are only affected if they
      
        - have multiple threads during exec (LinuxThreads doesn't kill other
          threads, the app may do it with pthread_kill_other_threads_np())
        - rely on POSIX locks being inherited across exec
      
      Both conditions are documented, but not their interaction.
      
      Apps using clone() natively are affected if they
      
        - use clone(CLONE_FILES)
        - rely on POSIX locks being inherited across exec
      
      The above scenarios are unlikely, but possible.
      
      If the patch is vetoed, there's a plan B, that involves mostly keeping the
      weird stealing semantics, but changing the way lock ownership is handled so
      that network and local filesystems work consistently.
      
      That would add more complexity though, so this solution seems to be
      preferred by most people.
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Steven French <sfrench@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 20 June 2006, 1 commit
  20. 20 April 2006, 1 commit
  21. 14 April 2006, 1 commit
    • [PATCH] de_thread: Don't change our parents and ptrace flags. · c06511d1
      Committed by Eric W. Biederman
      This is two distinct changes.
       - Not changing our real parents.
       - Not changing our ptrace parents.
      
      Not changing our real parents is trivially correct because both tasks
      have the same real parents as they are part of a thread group.  Now that
      we demote the leader to a thread there is no longer any reason to change
      its parentage.
      
      Not changing our ptrace parents is a user visible change if someone
      looks hard enough.  I don't think user space applications will care or
      even notice.
      
      In the practical and I think common case a debugger will have attached
      to all of the threads using the same ptrace flags.  From my quick skim
      of strace and gdb that appears to be the case.  Which if true means
      debuggers will not notice a change.
      
      Before this point we have already generated a ptrace event in do_exit
      that reports that the leader's pid has died, so de_thread is visible to a
      debugger.  This means attempting to hide this case by copying flags
      around appears excessive.
      
      By not doing anything it avoids all of the weird locking issues between
      de_thread and ptrace attach, and removes one case from consideration for
      fixing the ptrace locking.
      
      This only addresses Oleg's first concern with ptrace_attach, that of the
      problems caused by reparenting.  Oleg's second concern is essentially a
      race between ptrace_attach and release_task that causes an oops when we
      get to force_sig_specific.  There is nothing special about de_thread
      with respect to that race.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  22. 11 April 2006, 2 commits
    • [PATCH] process accounting: take original leader's start_time in non-leader exec · f5e90281
      Committed by Roland McGrath
      The only record we have of the real-time age of a process, regardless of
      execs it's done, is start_time.  When a non-leader thread execs, the
      original start_time of the process is lost.  Things looking at the
      real-time age of the process are fooled, for example the process accounting
      record when the process finally dies.  This change makes the oldest
      start_time stick around with the process after a non-leader exec.  This way
      the association between PID and start_time is kept constant, which seems
      correct to me.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] de_thread: Don't confuse users do_each_thread. · de12a787
      Committed by Eric W. Biederman
      Oleg Nesterov spotted two interesting bugs with the current de_thread
      code.  The simplest is a long standing double decrement of
      __get_cpu_var(process_counts) in __unhash_process.  Caused by
      two processes exiting when only one was created.
      
      The other is that since we no longer detach from the thread_group list
      it is possible for do_each_thread when run under the tasklist_lock to
      see the same task_struct twice.  Once on the task list as a
      thread_group_leader, and once on the thread list of another
      thread.
      
      The double appearance in do_each_thread can cause a double increment
      of mm_core_waiters in zap_threads resulting in problems later on in
      coredump_wait.
      
      To remedy those two problems this patch takes the simple approach
      of changing the old thread group leader into a child thread.
      The only routine in release_task that cares is __unhash_process,
      and it can be trivially seen that we handle cleaning up a
      thread group leader properly.
      
      Since de_thread doesn't change the pid of the exiting leader process
      and instead shares it with the new leader process, I change
      thread_group_leader to recognize group leadership based on the
      group_leader field and not based on pids.  This should also be
      slightly cheaper than the existing thread_group_leader macro.
      
      I performed a quick audit and I couldn't see any user of
      thread_group_leader that cared about the difference.
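
The difference between the two leadership tests can be shown with a small user-space model. This is only a sketch under stated assumptions: struct task and both function names are hypothetical, and only the fields relevant to the check are modelled. It reflects the situation described above, where the exiting old leader and the new leader share a pid for a short time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for task_struct with only the fields needed here. */
struct task {
    int pid;
    int tgid;
    struct task *group_leader;
};

/* Pid-based test: true for every task whose pid equals the group id. */
bool leader_by_pid(const struct task *p)
{
    return p->pid == p->tgid;
}

/* Pointer-based test: true only for the task the group points at. */
bool leader_by_group_pointer(const struct task *p)
{
    return p == p->group_leader;
}
```

When two tasks carry the same pid and tgid but group_leader points at only one of them, the pid-based test reports both as leaders while the pointer-based test reports exactly one. The pointer test is also a single comparison of a loaded field against the task's own address, which fits the remark about it being slightly cheaper.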
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 01 April 2006, 1 commit
  24. 29 March 2006, 1 commit