1. 11 5月, 2007 21 次提交
    • D
      signal/timer/event: KAIO eventfd support example · 9c3060be
      Davide Libenzi 提交于
      This is an example about how to add eventfd support to the current KAIO code,
      in order to enable KAIO to post readiness events to a pollable fd (hence
      compatible with POSIX select/poll).  The KAIO code simply signals the eventfd
      fd when events are ready, and this triggers a POLLIN in the fd.  This patch
      uses a reserved for future use member of the struct iocb to pass an eventfd
      file descriptor, that KAIO will use to post events every time a request
      completes.  At that point, an aio_getevents() will return the completed result
      to a struct io_event.  I made a quick test program to verify the patch, and it
      runs fine here:
      
      http://www.xmailserver.org/eventfd-aio-test.c
      
      The test program uses poll(2), but it'd, of course, work with select and epoll
      too.
      
      This can allow to schedule both block I/O and other poll-able devices
      requests, and wait for results using select/poll/epoll.  In a typical
      scenario, an application would submit KAIO request using aio_submit(), and
      will also use epoll_ctl() on the whole other class of devices (that with the
      addition of signals, timers and user events, now it's pretty much complete),
      and then would:
      
      	epoll_wait(...);
      	for_each_event {
      		if (curr_event_is_kaiofd) {
      			aio_getevents();
      			dispatch_aio_events();
      		} else {
      			dispatch_epoll_event();
      		}
      	}
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c3060be
    • D
      signal/timer/event: eventfd core · e1ad7468
      Davide Libenzi 提交于
      This is a very simple and light file descriptor, that can be used as event
      wait/dispatch by userspace (both wait and dispatch) and by the kernel
      (dispatch only).  It can be used instead of pipe(2) in all cases where those
      would simply be used to signal events.  Their kernel overhead is much lower
      than pipes, and they do not consume two fds.  When used in the kernel, it can
      offer an fd-bridge to enable, for example, functionalities like KAIO or
      syslets/threadlets to signal to an fd the completion of certain operations.
      But more in general, an eventfd can be used by the kernel to signal readiness,
      in a POSIX poll/select way, of interfaces that would otherwise be incompatible
      with it.  The API is:
      
      int eventfd(unsigned int count);
      
      The eventfd API accepts an initial "count" parameter, and returns an eventfd
      fd.  It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).
      
      The POLLIN flag is raised when the internal counter is greater than zero.
      
      The POLLOUT flag is raised when at least a value of "1" can be written to the
      internal counter.
      
      The POLLERR flag is raised when an overflow in the counter value is detected.
      
      The write(2) operation can never overflow the counter, since it blocks (unless
      O_NONBLOCK is set, in which case -EAGAIN is returned).
      
      But the eventfd_signal() function can do it, since it's supposed to not sleep
      during its operation.
      
      The read(2) function reads the __u64 counter value, and reset the internal
      value to zero.  If the value read is equal to (__u64) -1, an overflow happened
      on the internal counter (due to 2^64 eventfd_signal() posts that has never
      been retired - unlickely, but possible).
      
      The write(2) call writes an __u64 count value, and adds it to the current
      counter.  The eventfd fd supports O_NONBLOCK also.
      
      On the kernel side, we have:
      
      struct file *eventfd_fget(int fd);
      int eventfd_signal(struct file *file, unsigned int n);
      
      The eventfd_fget() should be called to get a struct file* from an eventfd fd
      (this is an fget() + check of f_op being an eventfd fops pointer).
      
      The kernel can then call eventfd_signal() every time it wants to post an event
      to userspace.  The eventfd_signal() function can be called from any context.
      An eventfd() simple test and bench is available here:
      
      http://www.xmailserver.org/eventfd-bench.c
      
      This is the eventfd-based version of pipetest-4 (pipe(2) based):
      
      http://www.xmailserver.org/pipetest-4.c
      
      Not that performance matters much in the eventfd case, but eventfd-bench
      shows almost as double as performance than pipetest-4.
      
      [akpm@linux-foundation.org: fix i386 build]
      [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e1ad7468
    • D
      signal/timer/event: timerfd compat code · 83f5d126
      Davide Libenzi 提交于
      This patch implements the necessary compat code for the timerfd system call.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83f5d126
    • D
      signal/timer/event: timerfd core · b215e283
      Davide Libenzi 提交于
      This patch introduces a new system call for timers events delivered though
      file descriptors.  This allows timer event to be used with standard POSIX
      poll(2), select(2) and read(2).  As a consequence of supporting the Linux
      f_op->poll subsystem, they can be used with epoll(2) too.
      
      The system call is defined as:
      
      int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);
      
      The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
      w/out going through the close/open cycle (same as signalfd).  If "ufd" is -1,
      s new file descriptor will be created, otherwise the existing "ufd" will be
      re-programmed.
      
      The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME.  The time
      specified in the "utmr->it_value" parameter is the expiry time for the timer.
      
      If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
      otherwise it's a relative time.
      
      If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
      tv_nsec == 0), this is the period at which the following ticks should be
      generated.
      
      The "utmr->it_interval" should be set to zero if only one tick is requested.
      Setting the "utmr->it_value" to zero will disable the timer, or will create a
      timerfd without the timer enabled.
      
      The function returns the new (or same, in case "ufd" is a valid timerfd
      descriptor) file, or -1 in case of error.
      
      As stated before, the timerfd file descriptor supports poll(2), select(2) and
      epoll(2).  When a timer event happened on the timerfd, a POLLIN mask will be
      returned.
      
      The read(2) call can be used, and it will return a u32 variable holding the
      number of "ticks" that happened on the interface since the last call to
      read(2).  The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
      be returned if no ticks happened.
      
      A quick test program, shows timerfd working correctly on my amd64 box:
      
      http://www.xmailserver.org/timerfd-test.c
      
      [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b215e283
    • D
      signal/timer/event: signalfd compat code · 6d18c922
      Davide Libenzi 提交于
      This patch implements the necessary compat code for the signalfd system call.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d18c922
    • D
      signal/timer/event: signalfd core · fba2afaa
      Davide Libenzi 提交于
      This patch series implements the new signalfd() system call.
      
      I took part of the original Linus code (and you know how badly it can be
      broken :), and I added even more breakage ;) Signals are fetched from the same
      signal queue used by the process, so signalfd will compete with standard
      kernel delivery in dequeue_signal().  If you want to reliably fetch signals on
      the signalfd file, you need to block them with sigprocmask(SIG_BLOCK).  This
      seems to be working fine on my Dual Opteron machine.  I made a quick test
      program for it:
      
      http://www.xmailserver.org/signafd-test.c
      
      The signalfd() system call implements signal delivery into a file descriptor
      receiver.  The signalfd file descriptor if created with the following API:
      
      int signalfd(int ufd, const sigset_t *mask, size_t masksize);
      
      The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
      to close/create cycle (Linus idea).  Use "ufd" == -1 if you want a brand new
      signalfd file.
      
      The "mask" allows to specify the signal mask of signals that we are interested
      in.  The "masksize" parameter is the size of "mask".
      
      The signalfd fd supports the poll(2) and read(2) system calls.  The poll(2)
      will return POLLIN when signals are available to be dequeued.  As a direct
      consequence of supporting the Linux poll subsystem, the signalfd fd can use
      used together with epoll(2) too.
      
      The read(2) system call will return a "struct signalfd_siginfo" structure in
      the userspace supplied buffer.  The return value is the number of bytes copied
      in the supplied buffer, or -1 in case of error.  The read(2) call can also
      return 0, in case the sighand structure to which the signalfd was attached,
      has been orphaned.  The O_NONBLOCK flag is also supported, and read(2) will
      return -EAGAIN in case no signal is available.
      
      If the size of the buffer passed to read(2) is lower than sizeof(struct
      signalfd_siginfo), -EINVAL is returned.  A read from the signalfd can also
      return -ERESTARTSYS in case a signal hits the process.  The format of the
      struct signalfd_siginfo is, and the valid fields depends of the (->code &
      __SI_MASK) value, in the same way a struct siginfo would:
      
      struct signalfd_siginfo {
      	__u32 signo;	/* si_signo */
      	__s32 err;	/* si_errno */
      	__s32 code;	/* si_code */
      	__u32 pid;	/* si_pid */
      	__u32 uid;	/* si_uid */
      	__s32 fd;	/* si_fd */
      	__u32 tid;	/* si_fd */
      	__u32 band;	/* si_band */
      	__u32 overrun;	/* si_overrun */
      	__u32 trapno;	/* si_trapno */
      	__s32 status;	/* si_status */
      	__s32 svint;	/* si_int */
      	__u64 svptr;	/* si_ptr */
      	__u64 utime;	/* si_utime */
      	__u64 stime;	/* si_stime */
      	__u64 addr;	/* si_addr */
      };
      
      [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fba2afaa
    • D
      signal/timer/event fds: anonymous inode source · 5dc8bf81
      Davide Libenzi 提交于
      This patch add an anonymous inode source, to be used for files that need
      and inode only in order to create a file*. We do not care of having an
      inode for each file, and we do not even care of having different names in
      the associated dentries (dentry names will be same for classes of file*).
      This allow code reuse, and will be used by epoll, signalfd and timerfd
      (and whatever else there'll be).
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5dc8bf81
    • S
      Replace pid_t in autofs with struct pid reference · fa0334f1
      Sukadev Bhattiprolu 提交于
      Make autofs container-friendly by caching struct pid reference rather than
      pid_t and using pid_nr() to retreive a task's pid_t.
      
      ChangeLog:
      	- Fix Eric Biederman's comments - Use find_get_pid() to hold a
      	  reference to oz_pgrp and release while unmounting; separate out
      	  changes to autofs and autofs4.
      	- Fix Cedric's comments: retain old prototype of parse_options()
      	  and move necessary change to its caller.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: containers@lists.osdl.org
      Acked-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fa0334f1
    • S
      Fix some coding-style errors in autofs · d78e53c8
      Sukadev Bhattiprolu 提交于
      Fix coding style errors (extra spaces, long lines) in autofs and autofs4 files
      being modified for container/pidspace issues.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: <containers@lists.osdl.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d78e53c8
    • S
      attach_pid() with struct pid parameter · e713d0da
      Sukadev Bhattiprolu 提交于
      attach_pid() currently takes a pid_t and then uses find_pid() to find the
      corresponding struct pid.  Sometimes we already have the struct pid.  We can
      then skip find_pid() if attach_pid() were to take a struct pid parameter.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: <containers@lists.osdl.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e713d0da
    • M
      consolidate generic_writepages and mpage_writepages · 0ea97180
      Miklos Szeredi 提交于
      Clean up massive code duplication between mpage_writepages() and
      generic_writepages().
      
      The new generic function, write_cache_pages() takes a function pointer
      argument, which will be called for each page to be written.
      
      Maybe cifs_writepages() too can use this infrastructure, but I'm not
      touching that with a ten-foot pole.
      
      The upcoming page writeback support in fuse will also want this.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Acked-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ea97180
    • O
      small cleanup in gpt partition handling · 4c64c30a
      Olaf Hering 提交于
      Remove unused argument in is_pmbr_valid()
      Remove unneeded initialization of local variable legacy_mbr
      Signed-off-by: NOlaf Hering <olaf@aepfle.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4c64c30a
    • G
      Let SYSV68_PARTITION default to yes on VME only · 22258d40
      Geert Uytterhoeven 提交于
      Don't enable SYSV68 partition table support on all m68k boxes by default,
      only on Motorola VME boards.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: Philippe De Muyter <phdm@macqel.be>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22258d40
    • D
      AFS: implement statfs · 45222b9e
      David Howells 提交于
      Implement the statfs() op for AFS.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      45222b9e
    • D
      AFS: fix a couple of problems with unlinking AFS files · 0f300ca9
      David Howells 提交于
      Fix a couple of problems with unlinking AFS files.
      
       (1) The parent directory wasn't being updated properly between unlink() and
           the following lookup().
      
           It seems that, for some reason, invalidate_remote_inode() wasn't
           discarding the directory contents correctly, so this patch calls
           invalidate_inode_pages2() instead on non-regular files.
      
       (2) afs_vnode_deleted_remotely() should handle vnodes that don't have a
           source server recorded without oopsing.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f300ca9
    • D
      AFS: fix interminable loop in afs_write_back_from_locked_page() · 9d577b6a
      David Howells 提交于
      Following bug was uncovered by compiling with '-W' flag:
      
        CC [M]  fs/afs/write.o
      fs/afs/write.c: In function ‘afs_write_back_from_locked_page’:
      fs/afs/write.c:398: warning: comparison of unsigned expression >= 0 is always true
      
      Loop variable 'n' is unsigned, so wraps around happily as far as I can
      see. Trival fix attached (compile tested only).
      Signed-off-by: NMika Kukkonen <mikukkon@iki.fi>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d577b6a
    • J
      locks: fix F_GETLK regression (failure to find conflicts) · 129a84de
      J. Bruce Fields 提交于
      In 9d6a8c5c we changed posix_test_lock
      to modify its single file_lock argument instead of taking separate input
      and output arguments.  This makes it no longer safe to set the output
      lock's fl_type to F_UNLCK before looking for a conflict, since that
      means searching for a conflict against a lock with type F_UNLCK.
      
      This fixes a regression which causes F_GETLK to incorrectly report no
      conflict on most filesystems (including any filesystem that doesn't do
      its own locking).
      
      Also fix posix_lock_to_flock() to copy the lock type.  This isn't
      strictly necessary, since the caller already does this; but it seems
      less likely to cause confusion in the future.
      
      Thanks to Doug Chapman for the bug report.
      Signed-off-by: N"J. Bruce Fields" <bfields@citi.umich.edu>
      Acked-by: NDoug Chapman <doug.chapman@hp.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      129a84de
    • S
      Allow compat_ioctl.c to compile without CONFIG_NET · d9de2622
      Simon Horman 提交于
      A small regression appears to have been introduced in the recent patch
      "cleanup compat ioctl handling", which was included in Linus' tree after
      2.6.20.
      
      siocdevprivate_ioctl() is no longer defined if CONFIG_NET is undefined,
      whereas previously it was a dummy function in this case.
      
      This causes compilation with CONFIG_COMPAT but without CONFIG_NET to fail.
      
      fs/compat_ioctl.c: In function `compat_sys_ioctl':
      fs/compat_ioctl.c:3571: warning: implicit declaration of function `siocdevprivate_ioctl'
      
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d9de2622
    • R
      ocfs2: kobject/kset foobar · c4a7f5eb
      Randy Dunlap 提交于
      Fix gcc warning and Oops that it causes:
      
      fs/ocfs2/cluster/masklog.c:161: warning: assignment from incompatible pointer type
      [ 2776.204120] OCFS2 Node Manager 1.3.3
      [ 2776.211729] BUG: spinlock bad magic on CPU#0, modprobe/4424
      [ 2776.214269]  lock: ffff810021c8fe18, .magic: ffffffff, .owner: /6394416, .owner_cpu: 0
      [ 2776.217864] [ 2776.217865] Call Trace:
      [ 2776.219662]  [<ffffffff803426c8>] spin_bug+0x9e/0xe9
      [ 2776.221921]  [<ffffffff803427bf>] _raw_spin_lock+0x23/0xf9
      [ 2776.224417]  [<ffffffff8051acf4>] _spin_lock+0x9/0xb
      [ 2776.226676]  [<ffffffff8033c3b1>] kobject_shadow_add+0x98/0x1ac
      [ 2776.229367]  [<ffffffff8033c4d0>] kobject_add+0xb/0xd
      [ 2776.231665]  [<ffffffff8033c4df>] kset_add+0xd/0xf
      [ 2776.233845]  [<ffffffff8033c5a6>] kset_register+0x23/0x28
      [ 2776.236309]  [<ffffffff8808ccb7>] :ocfs2_nodemanager:mlog_sys_init+0x68/0x6d
      [ 2776.239518]  [<ffffffff8808ccee>] :ocfs2_nodemanager:o2cb_sys_init+0x32/0x4a
      [ 2776.242726]  [<ffffffff880b80a6>] :ocfs2_nodemanager:init_o2nm+0xa6/0xd5
      [ 2776.245772]  [<ffffffff8025266c>] sys_init_module+0x1471/0x15d2
      [ 2776.248465]  [<ffffffff8033f250>] simple_strtoull+0x0/0xdc
      [ 2776.250959]  [<ffffffff8020948e>] system_call+0x7e/0x83
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Acked-by: NMark Fasheh <mark.fasheh@oracle.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4a7f5eb
    • D
      AFS: further write support fixes · 5bbf5d39
      David Howells 提交于
      Further fixes for AFS write support:
      
       (1) The afs_send_pages() outer loop must do an extra iteration if it ends
           with 'first == last' because 'last' is inclusive in the page set
           otherwise it fails to send the last page and complete the RxRPC op under
           some circumstances.
      
       (2) Similarly, the outer loop in afs_pages_written_back() must also do an
           extra iteration if it ends with 'first == last', otherwise it fails to
           clear PG_writeback on the last page under some circumstances.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5bbf5d39
    • D
      AFS: write support fixes · b9b1f8d5
      David Howells 提交于
      AFS write support fixes:
      
       (1) Support large files using the 64-bit file access operations if available
           on the server.
      
       (2) Use kmap_atomic() rather than kmap() in afs_prepare_page().
      
       (3) Don't do stuff in afs_writepage() that's done by the caller.
      
      [akpm@linux-foundation.org: fix right shift count >= width of type]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9b1f8d5
  2. 10 5月, 2007 19 次提交