1. 11 May, 2007 31 commits
    • D
      signal/timer/event: KAIO eventfd support example · 9c3060be
      Davide Libenzi committed
      This is an example of how to add eventfd support to the current KAIO code,
      in order to enable KAIO to post readiness events to a pollable fd (hence
      compatible with POSIX select/poll).  The KAIO code simply signals the eventfd
      fd when events are ready, and this triggers a POLLIN on the fd.  This patch
      uses a member of struct iocb that was reserved for future use to pass an
      eventfd file descriptor, which KAIO will use to post events every time a
      request completes.  At that point, an aio_getevents() will return the
      completed result in a struct io_event.  I made a quick test program to verify
      the patch, and it runs fine here:
      
      http://www.xmailserver.org/eventfd-aio-test.c
      
      The test program uses poll(2), but it'd, of course, work with select and epoll
      too.
      
      This allows an application to schedule both block I/O and requests to other
      pollable devices, and to wait for the results using select/poll/epoll.  In a
      typical scenario, an application would submit KAIO requests using
      aio_submit(), and would also use epoll_ctl() on the whole other class of
      devices (which, with the addition of signals, timers and user events, is now
      pretty much complete), and then would:
      
      	epoll_wait(...);
      	for_each_event {
      		if (curr_event_is_kaiofd) {
      			aio_getevents();
      			dispatch_aio_events();
      		} else {
      			dispatch_epoll_event();
      		}
      	}
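
      The dispatch loop above can be tried out in userspace with a small sketch.
      It is not the test program linked above and it does not use KAIO: a plain
      eventfd stands in for the KAIO completion fd, a pipe stands in for "some
      other pollable device", and the function name poll_and_dispatch() and the
      DISPATCHED_* flags are made up for illustration.  It uses the present-day
      glibc eventfd(initval, flags) wrapper.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define DISPATCHED_COMPLETION 1  /* the eventfd (stand-in for the KAIO fd) fired */
#define DISPATCHED_OTHER      2  /* the other pollable fd (a pipe here) fired */

/* Returns a mask of DISPATCHED_* flags, or -1 on setup failure. */
int poll_and_dispatch(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event events[4];
    uint64_t count = 1;
    char byte = 'x';
    int pipefd[2] = { -1, -1 };
    int result = -1;
    int n, i;
    int efd = eventfd(0, EFD_NONBLOCK);
    int epfd = epoll_create1(0);

    if (efd < 0 || epfd < 0 || pipe(pipefd) < 0)
        goto out;

    /* Register both fds with epoll, tagging each event with its fd. */
    ev.data.fd = efd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0)
        goto out;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out;

    /* Make both fds readable: post one event and write one byte. */
    if (write(efd, &count, sizeof count) != (ssize_t)sizeof count ||
        write(pipefd[1], &byte, 1) != 1)
        goto out;

    n = epoll_wait(epfd, events, 4, 1000);
    if (n < 0)
        goto out;

    result = 0;
    for (i = 0; i < n; i++) {
        if (events[i].data.fd == efd) {
            /* In the KAIO case this is where completions would be fetched. */
            if (read(efd, &count, sizeof count) == (ssize_t)sizeof count)
                result |= DISPATCHED_COMPLETION;
        } else {
            if (read(pipefd[0], &byte, 1) == 1)
                result |= DISPATCHED_OTHER;
        }
    }
out:
    if (efd >= 0) close(efd);
    if (epfd >= 0) close(epfd);
    if (pipefd[0] >= 0) close(pipefd[0]);
    if (pipefd[1] >= 0) close(pipefd[1]);
    return result;
}
```

      Because both descriptors are readable before epoll_wait() is called, a
      single call reports both and each branch of the loop runs once.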
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      signal/timer/event: eventfd core · e1ad7468
      Davide Libenzi committed
      This is a very simple and light file descriptor, that can be used as event
      wait/dispatch by userspace (both wait and dispatch) and by the kernel
      (dispatch only).  It can be used instead of pipe(2) in all cases where a pipe
      would simply be used to signal events.  Its kernel overhead is much lower
      than that of a pipe, and it does not consume two fds.  When used in the kernel, it can
      offer an fd-bridge to enable, for example, functionalities like KAIO or
      syslets/threadlets to signal to an fd the completion of certain operations.
      But more in general, an eventfd can be used by the kernel to signal readiness,
      in a POSIX poll/select way, of interfaces that would otherwise be incompatible
      with it.  The API is:
      
      int eventfd(unsigned int count);
      
      The eventfd API accepts an initial "count" parameter, and returns an eventfd
      fd.  It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).
      
      The POLLIN flag is raised when the internal counter is greater than zero.
      
      The POLLOUT flag is raised when at least a value of "1" can be written to the
      internal counter.
      
      The POLLERR flag is raised when an overflow in the counter value is detected.
      
      The write(2) operation can never overflow the counter, since it blocks (unless
      O_NONBLOCK is set, in which case -EAGAIN is returned).
      
      The eventfd_signal() function, however, can overflow it, since it must not
      sleep during its operation.
      
      The read(2) function reads the __u64 counter value, and resets the internal
      value to zero.  If the value read is equal to (__u64) -1, an overflow happened
      on the internal counter (due to 2^64 eventfd_signal() posts that have never
      been retired - unlikely, but possible).
      
      The write(2) call writes an __u64 count value, and adds it to the current
      counter.  The eventfd fd supports O_NONBLOCK also.
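
      The read/write semantics described above can be exercised with a short
      userspace sketch.  It assumes the present-day glibc wrapper
      eventfd(initval, flags), which differs from the one-argument signature
      introduced by this patch; the helper name eventfd_demo() is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes `a` and `b` to a fresh eventfd, reads the accumulated counter into
 * *total, then reads again and stores the resulting errno (0 if the second
 * read unexpectedly succeeded) in *empty_errno.
 * Returns 0 on success, -1 on failure.
 */
int eventfd_demo(uint64_t a, uint64_t b, uint64_t *total, int *empty_errno)
{
    uint64_t value = 0;
    int fd = eventfd(0, EFD_NONBLOCK);  /* counter starts at zero */

    if (fd < 0)
        return -1;

    /* Each write(2) adds its 8-byte value to the counter. */
    if (write(fd, &a, sizeof a) != (ssize_t)sizeof a ||
        write(fd, &b, sizeof b) != (ssize_t)sizeof b ||
        read(fd, &value, sizeof value) != (ssize_t)sizeof value) {
        close(fd);
        return -1;
    }
    *total = value;  /* read(2) returned the sum and reset the counter */

    /* The counter is now zero, so a non-blocking read has nothing to return. */
    *empty_errno = (read(fd, &value, sizeof value) < 0) ? errno : 0;

    close(fd);
    return 0;
}
```

      With a = 3 and b = 4 the first read yields 7, and the second read fails
      with EAGAIN because the descriptor is non-blocking.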
      
      On the kernel side, we have:
      
      struct file *eventfd_fget(int fd);
      int eventfd_signal(struct file *file, unsigned int n);
      
      The eventfd_fget() should be called to get a struct file* from an eventfd fd
      (this is an fget() + check of f_op being an eventfd fops pointer).
      
      The kernel can then call eventfd_signal() every time it wants to post an event
      to userspace.  The eventfd_signal() function can be called from any context.
      An eventfd() simple test and bench is available here:
      
      http://www.xmailserver.org/eventfd-bench.c
      
      This is the eventfd-based version of pipetest-4 (pipe(2) based):
      
      http://www.xmailserver.org/pipetest-4.c
      
      Not that performance matters much in the eventfd case, but eventfd-bench
      shows almost double the performance of pipetest-4.
      
      [akpm@linux-foundation.org: fix i386 build]
      [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      signal/timer/event: timerfd compat code · 83f5d126
      Davide Libenzi committed
      This patch implements the necessary compat code for the timerfd system call.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      signal/timer/event: timerfd core · b215e283
      Davide Libenzi committed
      This patch introduces a new system call for timer events delivered through
      file descriptors.  This allows timer events to be used with standard POSIX
      poll(2), select(2) and read(2).  As a consequence of supporting the Linux
      f_op->poll subsystem, they can be used with epoll(2) too.
      
      The system call is defined as:
      
      int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);
      
      The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
      without going through the close/open cycle (same as signalfd).  If "ufd" is -1,
      a new file descriptor will be created, otherwise the existing "ufd" will be
      re-programmed.
      
      The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME.  The time
      specified in the "utmr->it_value" parameter is the expiry time for the timer.
      
      If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
      otherwise it's a relative time.
      
      If the time specified in "utmr->it_interval" is nonzero (that is, not both
      tv_sec == 0 and tv_nsec == 0), it is the period at which the subsequent
      ticks are generated.
      
      The "utmr->it_interval" should be set to zero if only one tick is requested.
      Setting the "utmr->it_value" to zero will disable the timer, or will create a
      timerfd without the timer enabled.
      
      The function returns the new (or the same, in case "ufd" is a valid timerfd
      descriptor) file descriptor, or -1 in case of error.
      
      As stated before, the timerfd file descriptor supports poll(2), select(2) and
      epoll(2).  When a timer event has occurred on the timerfd, a POLLIN mask will
      be returned.
      
      The read(2) call can be used, and it will return a u32 variable holding the
      number of "ticks" that happened on the interface since the last call to
      read(2).  The read(2) call supports the O_NONBLOCK flag too, and EAGAIN will
      be returned if no ticks happened.
      
      A quick test program shows timerfd working correctly on my amd64 box:
      
      http://www.xmailserver.org/timerfd-test.c
      
      [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      signal/timer/event: signalfd core · fba2afaa
      Davide Libenzi committed
      This patch series implements the new signalfd() system call.
      
      I took part of the original Linus code (and you know how badly it can be
      broken :), and I added even more breakage ;) Signals are fetched from the same
      signal queue used by the process, so signalfd will compete with standard
      kernel delivery in dequeue_signal().  If you want to reliably fetch signals on
      the signalfd file, you need to block them with sigprocmask(SIG_BLOCK).  This
      seems to be working fine on my Dual Opteron machine.  I made a quick test
      program for it:
      
      http://www.xmailserver.org/signafd-test.c
      
      The signalfd() system call implements signal delivery into a file descriptor
      receiver.  The signalfd file descriptor is created with the following API:
      
      int signalfd(int ufd, const sigset_t *mask, size_t masksize);
      
      The "ufd" parameter allows changing the sigmask of an existing signalfd without
      going through a close/create cycle (Linus' idea).  Use "ufd" == -1 if you want
      a brand new signalfd file.
      
      The "mask" specifies the set of signals that we are interested in.  The
      "masksize" parameter is the size of "mask".
      
      The signalfd fd supports the poll(2) and read(2) system calls.  The poll(2)
      will return POLLIN when signals are available to be dequeued.  As a direct
      consequence of supporting the Linux poll subsystem, the signalfd fd can be
      used together with epoll(2) too.
      
      The read(2) system call will return a "struct signalfd_siginfo" structure in
      the userspace supplied buffer.  The return value is the number of bytes copied
      in the supplied buffer, or -1 in case of error.  The read(2) call can also
      return 0, in case the sighand structure to which the signalfd was attached,
      has been orphaned.  The O_NONBLOCK flag is also supported, and read(2) will
      return -EAGAIN in case no signal is available.
      
      If the size of the buffer passed to read(2) is smaller than sizeof(struct
      signalfd_siginfo), -EINVAL is returned.  A read from the signalfd can also
      return -ERESTARTSYS in case a signal hits the process.  The format of struct
      signalfd_siginfo is shown below; which fields are valid depends on the
      (->code & __SI_MASK) value, in the same way as for a struct siginfo:
      
      struct signalfd_siginfo {
      	__u32 signo;	/* si_signo */
      	__s32 err;	/* si_errno */
      	__s32 code;	/* si_code */
      	__u32 pid;	/* si_pid */
      	__u32 uid;	/* si_uid */
      	__s32 fd;	/* si_fd */
      	__u32 tid;	/* si_tid */
      	__u32 band;	/* si_band */
      	__u32 overrun;	/* si_overrun */
      	__u32 trapno;	/* si_trapno */
      	__s32 status;	/* si_status */
      	__s32 svint;	/* si_int */
      	__u64 svptr;	/* si_ptr */
      	__u64 utime;	/* si_utime */
      	__u64 stime;	/* si_stime */
      	__u64 addr;	/* si_addr */
      };
      
      [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      signal/timer/event fds: anonymous inode source · 5dc8bf81
      Davide Libenzi committed
      This patch adds an anonymous inode source, to be used for files that need
      an inode only in order to create a file*. We do not care about having an
      inode for each file, and we do not even care about having different names in
      the associated dentries (dentry names will be the same for classes of file*).
      This allows code reuse, and will be used by epoll, signalfd and timerfd
      (and whatever else there'll be).
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      Don't init pgrp and __session in INIT_SIGNALS · 325aa33d
      Sukadev Bhattiprolu committed
      Remove initialization of pgrp and __session in INIT_SIGNALS, as these are
      later set by the call to __set_special_pids() in init/main.c by the patch:
      
      	explicitly-set-pgid-and-sid-of-init-process.patch
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      statically initialize struct pid for swapper · 820e45db
      Sukadev Bhattiprolu committed
      Statically initialize a struct pid for the swapper process (pid_t == 0) and
      attach it to init_task.  This is needed so task_pid(), task_pgrp() and
      task_session() interfaces work on the swapper process also.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: <containers@lists.osdl.org>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      attach_pid() with struct pid parameter · e713d0da
      Sukadev Bhattiprolu committed
      attach_pid() currently takes a pid_t and then uses find_pid() to find the
      corresponding struct pid.  Sometimes we already have the struct pid.  We
      could then skip find_pid() if attach_pid() took a struct pid parameter.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: <containers@lists.osdl.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      consolidate generic_writepages and mpage_writepages · 0ea97180
      Miklos Szeredi committed
      Clean up massive code duplication between mpage_writepages() and
      generic_writepages().
      
      The new generic function, write_cache_pages() takes a function pointer
      argument, which will be called for each page to be written.
      
      Maybe cifs_writepages() too can use this infrastructure, but I'm not
      touching that with a ten-foot pole.
      
      The upcoming page writeback support in fuse will also want this.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      tty: add compat_ioctl · e10cc1df
      Paul Fulghum committed
      Add compat_ioctl method for tty code to allow processing of 32 bit ioctl
      calls on 64 bit systems by tty core, tty drivers, and line disciplines.
      
      Based on patch by Arnd Bergmann:
      http://www.uwsg.iu.edu/hypermail/linux/kernel/0511.0/1732.html
      
      [akpm@linux-foundation.org: make things static]
      Signed-off-by: Paul Fulghum <paulkf@microgate.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      module_author: don't advise putting in an email address · 108f39a1
      Rene Herman committed
      module_author: don't advise putting in an email address
      
      It's information that's easily outdated and easily mistaken for a driver
      contact which is a problem especially for modules with multiple current and
      non-current authors as well as for modules with a maintainer who may not
      even be a module author.
      Signed-off-by: Rene Herman <rene.herman@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • B
      Add hard_irq_disable() · 2d3fbbb3
      Benjamin Herrenschmidt committed
      Some architectures, like powerpc, implement lazy disabling of interrupts.
      That means that on those, local_irq_disable() doesn't actually disable
      interrupts on the CPU, but only sets a per-CPU flag which causes them to be
      disabled only if an interrupt actually occurs.
      
      However, in some cases, such as stop_machine, we really want interrupts to be
      fully disabled.  For example, I have code using stop machine to do ECC error
      injection, used to verify operations of the ECC hardware, that sort of thing.
      It really needs to make sure that nothing is actually writing to memory while
      the injection happens.  Similar examples can be found in other low level bits
      and pieces.
      
      This patch implements a generic hard_irq_disable() function which is meant to
      be called -after- local_irq_disable() and ensures that interrupts are fully
      disabled on that CPU.  The default implementation is a nop, though powerpc
      does already provide an appropriate one.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      synclink_gt: add compat_ioctl · 2acdb169
      Paul Fulghum committed
      Add support for 32 bit ioctl on 64 bit systems for synclink_gt
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Fulghum <paulkf@microgate.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      lib/hexdump · 99eaf3c4
      Randy Dunlap committed
      Based on ace_dump_mem() from Grant Likely for the Xilinx SystemACE
      CompactFlash interface.
      
      Add print_hex_dump() & hex_dumper() to lib/hexdump.c and linux/kernel.h.
      
      This patch adds the functions print_hex_dump() & hex_dumper().
      print_hex_dump() can be used to perform a hex + ASCII dump of data to
      syslog, in an easily viewable format, thus providing a common text hex dump
      format.
      
      hex_dumper() provides a dump-to-memory function.  It converts one "line" of
      output (16 bytes of input) at a time.
      
      Example usages:
      	print_hex_dump(KERN_DEBUG, DUMP_PREFIX_ADDRESS, frame->data, frame->len);
      	hex_dumper(frame->data, frame->len, linebuf, sizeof(linebuf));
      
      Example output using %DUMP_PREFIX_OFFSET:
      0009ab42: 40414243 44454647 48494a4b 4c4d4e4f-@ABCDEFG HIJKLMNO
      Example output using %DUMP_PREFIX_ADDRESS:
      ffffffff88089af0: 70717273 74757677 78797a7b 7c7d7e7f-pqrstuvw xyz{|}~.
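
      The line layout shown in the example output (without the address or
      offset prefix) can be reproduced with a small userspace sketch.  This is
      not the hex_dumper() added by the patch; format_hex_line() is a
      hypothetical helper whose grouping (4-byte hex words, a '-' separator,
      then ASCII split into two groups of 8) is inferred from the two sample
      lines above.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats up to 16 bytes of `data` as hex words followed by '-' and the
 * printable ASCII form, with '.' standing in for non-printable bytes.
 * `out` must hold at least 54 bytes.  Returns the string length written,
 * or -1 if the buffer is too small.
 */
int format_hex_line(const unsigned char *data, size_t len,
                    char *out, size_t outsz)
{
    size_t pos = 0;
    size_t i;

    if (len > 16)
        len = 16;
    /* Worst case: 32 hex digits + 3 spaces + '-' + 16 chars + 1 space + NUL. */
    if (outsz < 54)
        return -1;

    for (i = 0; i < len; i++) {
        if (i > 0 && i % 4 == 0)
            out[pos++] = ' ';        /* gap between 4-byte words */
        pos += (size_t)sprintf(out + pos, "%02x", (unsigned)data[i]);
    }

    out[pos++] = '-';

    for (i = 0; i < len; i++) {
        if (i == 8)
            out[pos++] = ' ';        /* gap between the two ASCII halves */
        out[pos++] = isprint(data[i]) ? (char)data[i] : '.';
    }

    out[pos] = '\0';
    return (int)pos;
}
```

      Feeding it the bytes 0x40 through 0x4f gives the same text as the first
      sample line after its offset prefix.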
      
      [akpm@linux-foundation.org: cleanups, add export]
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • E
      getrusage(): fill ru_inblock and ru_oublock fields if possible · 6eaeeaba
      Eric Dumazet committed
      If CONFIG_TASK_IO_ACCOUNTING is defined, we update io accounting counters for
      each task.
      
      This patch permits reporting of values using the well known getrusage()
      syscall, filling ru_inblock and ru_oublock instead of null values.
      
      As TASK_IO_ACCOUNTING currently counts bytes, we approximate the block
      count as: nr_blocks = nr_bytes / 512
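
      From userspace, the two fields can be read back with getrusage(2).  The
      sketch below is only an illustration: bytes_to_blocks() restates the
      approximation above, and read_io_blocks() is a hypothetical wrapper
      name.

```c
#include <sys/resource.h>

/* The approximation described above: one block per 512 bytes. */
long bytes_to_blocks(unsigned long long nr_bytes)
{
    return (long)(nr_bytes / 512);
}

/*
 * Fetches the block input/output counters for the calling process.
 * Returns 0 on success, -1 if getrusage() fails.
 */
int read_io_blocks(long *in_blocks, long *out_blocks)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    *in_blocks = ru.ru_inblock;    /* block input operations */
    *out_blocks = ru.ru_oublock;   /* block output operations */
    return 0;
}
```

      The values stay at zero on kernels built without
      CONFIG_TASK_IO_ACCOUNTING, which is the case the patch improves on.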
      
      Example of use :
      ----------------------
      After patch is applied, /usr/bin/time command can now give a good
      approximation of IO that the process had to do.
      
      $ /usr/bin/time grep tototo /usr/include/*
      Command exited with non-zero status 1
      0.00user 0.02system 0:02.11elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
      24288inputs+0outputs (0major+259minor)pagefaults 0swaps
      
      $ /usr/bin/time dd if=/dev/zero of=/tmp/testfile count=1000
      1000+0 records in
      1000+0 records out
      512000 bytes (512 kB) copied, 0.00326601 seconds, 157 MB/s
      0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+3000outputs (0major+299minor)pagefaults 0swaps
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • L
      [ALSA] SoC WM8753 codec support · 1f53aee0
      Liam Girdwood committed
      This patch series adds support for the WM8753 codec as found on the
      OpenMoko Neo 1973 (other Neo 1973 and Samsung S3C24xx patches to follow
      today) as well as other new devices.
      Features:-
       o HiFi and Voice DAI supported (inc runtime switching of DAI mode)
       o DAPM
       o All mixers
       o PLL calculator
       o 16,20 and 24bit samples.
       o WM8753 I2C ID added to include/linux/i2c-id.h
      From: Liam Girdwood <lg@opensource.wolfsonmicro.com>
      Signed-off-by: Harald Welte <laforge@openmoko.org>
      Signed-off-by: Graeme Gregory <gg@opensource.wolfsonmicro.com>
      Signed-off-by: Seth Forshee <seth.forshee@gmail.com>
      Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Jaroslav Kysela <perex@suse.cz>
    • J
      Fix compile/link of init/do_mounts.c with !CONFIG_BLOCK · 87c1efbf
      Jens Axboe committed
      We need a stub function for when CONFIG_BLOCK isn't set.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • N
      When stacked block devices are in-use (e.g. md or dm), the recursive calls · d89d8796
      Neil Brown committed
      to generic_make_request can use up a lot of space, and we would rather they
      didn't.
      
      As generic_make_request is a void function, and as it is generally not
      expected that it will have any effect immediately, it is safe to delay any
      call to generic_make_request until there is sufficient stack space
      available.
      
      As ->bi_next is reserved for the driver to use, it can have no valid value
      when generic_make_request is called, and as __make_request implicitly
      assumes it will be NULL (ELEVATOR_BACK_MERGE fork of switch) we can be
      certain that all callers set it to NULL.  We can therefore safely use
      bi_next to link pending requests together, providing we clear it before
      making the real call.
      
      So, we choose to allow each thread to only be active in one
      generic_make_request at a time.  If a subsequent (recursive) call is made,
      the bio is linked into a per-thread list, and is handled when the active
      call completes.
      
      As the list of pending bios is per-thread, there are no locking issues to
      worry about.
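
      The deferral scheme can be modelled in userspace to show that the call
      depth stays at one regardless of how many devices are stacked.  This is
      a sketch under stated assumptions, not the kernel code: the reduced
      struct bio, submit_bio_sketch(), make_request_sketch() and the counters
      are all hypothetical stand-ins.

```c
#include <stddef.h>

struct bio {
    int layers_left;      /* stacked devices still to pass through */
    struct bio *bi_next;  /* reused to link bios on the pending list */
};

/* Per-thread state, so no locking is needed. */
static _Thread_local struct bio *pending_head;
static _Thread_local struct bio **pending_tail;
static _Thread_local int active;
static _Thread_local int nesting, max_nesting, handled;

void submit_bio_sketch(struct bio *bio);

/* Stand-in for a stacked driver's make_request_fn: resubmit to the layer below. */
static void make_request_sketch(struct bio *bio)
{
    if (++nesting > max_nesting)
        max_nesting = nesting;
    handled++;
    if (bio->layers_left > 0) {
        bio->layers_left--;
        submit_bio_sketch(bio);  /* would recurse without the pending list */
    }
    nesting--;
}

void submit_bio_sketch(struct bio *bio)
{
    if (active) {
        /* Already inside a submission on this thread: queue and return. */
        bio->bi_next = NULL;
        *pending_tail = bio;
        pending_tail = &bio->bi_next;
        return;
    }

    active = 1;
    pending_head = NULL;
    pending_tail = &pending_head;
    do {
        bio->bi_next = NULL;     /* clear the link before the real call */
        make_request_sketch(bio);

        /* Pop the next deferred bio, if any. */
        bio = pending_head;
        if (bio) {
            pending_head = bio->bi_next;
            if (!pending_head)
                pending_tail = &pending_head;
        }
    } while (bio);
    active = 0;
}

int sketch_handled(void)     { return handled; }
int sketch_max_nesting(void) { return max_nesting; }
```

      A bio with three layers below it is handled four times in total, but
      make_request_sketch() is never active more than once at a time on the
      thread.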
      
      I say above that it is "safe to delay any call...".  There are, however,
      some behaviours of a make_request_fn which would make it unsafe.  These
      include any behaviour that assumes anything will have changed after a
      recursive call to generic_make_request.
      
      These could include:
       - waiting for that call to finish and call its bi_end_io function.
         md used to sometimes do this (marking the superblock dirty before
         completing a write) but doesn't any more
       - inspecting the bio for fields that generic_make_request might
         change, such as bi_sector or bi_bdev.  It is hard to see a good
         reason for this, and I don't think anyone actually does it.
       - inspecting the queue to see if, e.g. it is 'full' yet.  Again, I
         think this is very unlikely to be useful, or to be done.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <dm-devel@redhat.com>
      
      Alasdair G Kergon <agk@redhat.com> said:
      
       I can see nothing wrong with this in principle.
      
       For device-mapper at the moment though it's essential that, while the bio
       mappings may now get delayed, they still get processed in exactly
       the same order as they were passed to generic_make_request().
      
       My main concern is whether the timing changes implicit in this patch
       will make the rare data-corrupting races in the existing snapshot code
       more likely. (I'm working on a fix for these races, but the unfinished
       patch is already several hundred lines long.)
      
       It would be helpful if some people on this mailing list would test
       this patch in various scenarios and report back.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • S
      [PATCH] Abnormal End of Processes · 0a4ff8c2
      Steve Grubb committed
      Hi,
      
      I have been working on some code that detects abnormal events based on audit
      system events. One kind of event that we currently have no visibility for is
      when a program terminates due to segfault - which should never happen on a
      production machine. And if it did, you'd want to investigate it. Attached is a
      patch that collects these events and sends them into the audit system.
      Signed-off-by: Steve Grubb <sgrubb@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • A
      [PATCH] complete message queue auditing · 4fc03b9b
      Amy Griffis committed
      Handle the edge cases for POSIX message queue auditing. Collect inode
      info when opening an existing mq, and for send/receive operations. Remove
      audit_inode_update() as it has really evolved into the equivalent of
      audit_inode().
      Signed-off-by: Amy Griffis <amy.griffis@hp.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • A
      [PATCH] audit signal recipients · e54dc243
      Amy Griffis committed
      When auditing syscalls that send signals, log the pid and security
      context for each target process. Optimize the data collection by
      adding a counter for signal-related rules, and avoiding allocating an
      aux struct unless we have more than one target process. For process
      groups, collect pid/context data in blocks of 16. Move the
      audit_signal_info() hook up in check_kill_permission() so we audit
      attempts where permission is denied.
      Signed-off-by: Amy Griffis <amy.griffis@hp.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • A
      [PATCH] add SIGNAL syscall class (v3) · 7f13da40
      Amy Griffis committed
      Add a syscall class for sending signals.
      Signed-off-by: Amy Griffis <amy.griffis@hp.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • A
      [PATCH] auditing ptrace · a5cb013d
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • P
      [NETFILTER]: Clean up table initialization · 3c2ad469
      Patrick McHardy committed
      - move arp_tables initial table structure definitions to arp_tables.h
        similar to ip_tables and ip6_tables
      
      - use C99 initializers
      
      - use initializer macros where possible
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      [NET] link_watch: Move link watch list into net_device · 572a103d
      Herbert Xu committed
      These days the link watch mechanism is an integral part of the
      network subsystem as it manages the carrier status.  So it now
      makes sense to allocate some memory for it in net_device rather
      than allocating it on demand.
      
      In fact, this is necessary because we can't tolerate a memory
      allocation failure since that means we'd have to potentially
      throw a link up event away.
      
      It also simplifies the code greatly.
      
      In doing so I discovered a subtle race condition in the use
      of singleevent.  This race condition still exists (and is
      somewhat magnified) without singleevent but it's now plugged
      thanks to an smp_mb__before_clear_bit.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      add upper-32-bits macro · 218e180e
      Andrew Morton committed
      We keep on getting "right shift count >= width of type" warnings when doing
      things like
      
      	sector_t s;
      
      	x = s >> 56;
      
      because with CONFIG_LBD=n, s is only 32-bit.  Similar problems can occur with
      dma_addr_t's.
      
      So add a simple wrapper function which code can use to avoid this warning.
      The above example would become
      
      	x = upper_32_bits(s) >> 24;
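
      A userspace rendering of the idea, for experimentation: the macro below
      splits the 32-bit shift into two 16-bit shifts so that no single shift
      count reaches the width of a 32-bit operand.  uint32_t and uint64_t
      stand in for the kernel's u32 and sector_t here, and top_byte() is a
      hypothetical helper mirroring the example above.

```c
#include <stdint.h>

/*
 * Shifting by 16 twice is well defined for a 32-bit argument (the result
 * is simply 0), whereas a single shift by 32 or more would trigger the
 * "right shift count >= width of type" warning.
 */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))

/* Equivalent of `s >> 56` that also compiles cleanly if s is 32-bit. */
uint32_t top_byte(uint64_t s)
{
    return upper_32_bits(s) >> 24;
}
```

      For a 64-bit value the macro yields the high word; for a 32-bit value
      it yields zero instead of a warning.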
      
      The first user is in fact AFS.
      
      Cc: James Bottomley <James.Bottomley@SteelEye.com>
      Cc: "Cameron, Steve" <Steve.Cameron@hp.com>
      Cc: "Miller, Mike (OS Dev)" <Mike.Miller@hp.com>
      Cc: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • C
      slub: support concurrent local and remote frees and allocs on a slab · 894b8788
      Christoph Lameter committed
      Avoid atomic overhead in slab_alloc and slab_free
      
      SLUB needs to use the slab_lock for the per cpu slabs to synchronize with
      potential kfree operations.  This patch avoids that need by moving all free
      objects onto a lockless_freelist.  The regular freelist continues to exist
      and will be used to free objects.  So while we consume the
      lockless_freelist the regular freelist may build up objects.
      
      If we are out of objects on the lockless_freelist then we may check the
      regular freelist.  If it has objects then we move those over to the
      lockless_freelist and do this again.  There is a significant savings in
      terms of atomic operations that have to be performed.
      
      We can even free directly to the lockless_freelist if we know that we are
      running on the same processor.  So this speeds up short lived objects.
      They may be allocated and freed without taking the slab_lock.  This is
      particularly good for netperf.
      
      In order to maximize the effect of the new faster hotpath we extract the
      hottest performance pieces into inlined functions.  These are then inlined
      into kmem_cache_alloc and kmem_cache_free.  So hotpath allocation and
      freeing no longer requires a subroutine call within SLUB.
      
      [I am not sure that it is worth doing this because it changes the easy to
      read structure of slub just to reduce atomic ops.  However, there is
      someone out there with a benchmark on 4 way and 8 way processor systems
      that seems to show a 5% regression vs. SLAB.  It seems that the regression
      is due to the increased use of atomic operations in SLUB vs. SLAB.  I
      wonder if this is applicable or discernible at all in a real workload?]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      894b8788
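The two-list scheme described in this commit can be sketched as a toy, single-threaded model. This is only an illustration under stated assumptions: `struct slab_sketch`, `local_free()`, `remote_free()`, `alloc_object()` and the `lock_taken` counter are hypothetical names, not SLUB's real data structures, and the "lock" is merely counted rather than taken.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names and layout are hypothetical, not SLUB's real ones. */
struct object {
    struct object *next;
};

struct slab_sketch {
    struct object *lockless_freelist; /* consumed by the owning CPU, no lock */
    struct object *freelist;          /* regular list, filled by remote frees */
    int lock_taken;                   /* counts simulated slab_lock acquisitions */
};

/* Free from another CPU: takes the (simulated) lock, uses the regular list. */
static void remote_free(struct slab_sketch *s, struct object *obj)
{
    assert(obj != NULL);
    s->lock_taken++;
    obj->next = s->freelist;
    s->freelist = obj;
}

/* Free on the owning CPU: goes straight to the lockless list, no lock. */
static void local_free(struct slab_sketch *s, struct object *obj)
{
    assert(obj != NULL);
    obj->next = s->lockless_freelist;
    s->lockless_freelist = obj;
}

/* Allocate: fast path pops the lockless list; the slow path takes the lock
 * once and moves the whole regular freelist over before retrying. */
static struct object *alloc_object(struct slab_sketch *s)
{
    struct object *obj;

    if (s->lockless_freelist == NULL && s->freelist != NULL) {
        s->lock_taken++;
        s->lockless_freelist = s->freelist;
        s->freelist = NULL;
    }
    obj = s->lockless_freelist;
    if (obj != NULL)
        s->lockless_freelist = obj->next;
    return obj;
}
```

In this model a short-lived object that is freed and reallocated on the same CPU never touches the lock counter, which is the effect the commit message attributes to the lockless_freelist.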
    • I
      CRC ITU-T V.41 · 3e7cbae7
      Ivo van Doorn committed
      This adds the CRC calculation defined by
      ITU-T V.41 to the kernel lib/ directory.
      
      This code has been derived from the rt2x00 driver,
      currently found only in the wireless-dev tree, but
      this library is generic and could be used by more
      drivers that currently use their own implementation.
      Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
      
      Also useful for the new firewire stack.
      Signed-off-by: Kristian Hoegsberg <krh@redhat.com>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      3e7cbae7
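For readers unfamiliar with this CRC, a minimal bitwise sketch follows. It assumes the generator polynomial x^16 + x^12 + x^5 + 1 (0x1021) processed most significant bit first, which the commit message does not spell out; `crc_itu_t_sketch()` is a hypothetical name, and a library implementation would more likely use a lookup table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-16 with polynomial 0x1021, MSB first (assumed parameters).
 * 'crc' is the running value, so a buffer can be processed in several calls.
 */
static uint16_t crc_itu_t_sketch(uint16_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        /* Fold the next byte into the top eight bits of the register. */
        crc ^= (uint16_t)((uint16_t)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

With an initial value of 0, the ASCII string "123456789" yields 0x31C3 under these parameters, the commonly quoted check value for this polynomial and bit order.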
  2. 10 May, 2007 9 commits
    • S
      [POWERPC] pmu_sys_suspended is only defined for PPC32 · 49d687b6
      Stephen Rothwell committed
      thus we get a link error on ppc64 with CONFIG_PM=y.  This fixes it.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      49d687b6
    • L
      acpi,msi-laptop: Fall back to EC polling mode for MSI laptop specific EC commands · 00eb43a1
      Lennart Poettering committed
      The ACPI EC that is used in MSI laptops knows some non-standard
      commands for changing the screen brightness and a few other things,
      which are used by the msi-laptop.c driver. Unfortunately, no GPE events
      for IBF and OBF are triggered for these commands. Since the EC code
      nowadays uses the ec_intr=1 mode by default, these operations time out,
      although they don't fail. As a result, all operations that you can do
      with the msi-laptop.c driver take about 1s to complete, which is
      awfully slow.
      
      In one of the more recent kernels (2.6.20?) the EC subsystem has been
      revamped. With that change the EC timeout has been increased. Before
      that increase the MSI EC accesses were slow -- but not *that* slow,
      hence I took notice of this limitation of the MSI EC hardware only very
      recently.
      
      The standard EC operations on the MSI EC as defined in the ACPI spec
      support GPE events properly.
      
      The following patch adds a new argument "force_poll" to the
      ec_transaction() function (and friends). If set to 1, the function
      will poll for IBF/OBF even if ec_intr=1 is enabled. If set to 0 the
      current behaviour is used. The msi-laptop driver is modified to make
      use of this new flag, so that OBF/IBF is polled for the special MSI EC
      transactions -- but only for them.
      Signed-off-by: Lennart Poettering <mzxreary@0pointer.de>
      Acked-by: Alexey Starikovskiy <aystarik@gmail.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
      00eb43a1
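The decision that the new "force_poll" argument introduces can be sketched as below. This is an assumption-laden illustration, not the driver's code: `ec_pick_wait_mode()` and `enum ec_wait_mode` are hypothetical names, and the sketch assumes that ec_intr=0 already means polling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only ec_intr and force_poll come from the commit text. */
enum ec_wait_mode {
    EC_WAIT_INTERRUPT, /* wait for GPE events signalling IBF/OBF changes */
    EC_WAIT_POLL       /* busy-poll the IBF/OBF status bits */
};

static enum ec_wait_mode ec_pick_wait_mode(bool ec_intr, int force_poll)
{
    assert(force_poll == 0 || force_poll == 1);

    /* force_poll=1 overrides interrupt mode; force_poll=0 keeps old behaviour. */
    if (force_poll || !ec_intr)
        return EC_WAIT_POLL;
    return EC_WAIT_INTERRUPT;
}
```

Under this reading, only callers that pass force_poll=1 (the special MSI EC transactions) change behaviour; every other transaction keeps whatever mode ec_intr selects.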
    • L
      Revert "md: improve partition detection in md array" · 44ce6294
      Linus Torvalds committed
      This reverts commit 5b479c91.
      
      Quoth Neil Brown:
      
        "It causes an oops when auto-detecting raid arrays, and it doesn't
         seem easy to fix.
      
         The array may not be 'open' when do_md_run is called, so
         bdev->bd_disk might be NULL, so bd_set_size can oops.
      
         This whole approach of opening an md device before it has been
         assembled just seems to get more and more painful.  I think I'm going
         to have to come up with something clever to provide both backward
         compatibility with usage expectation, and sane integration into the
         rest of the kernel."
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44ce6294
    • B
      ide: legacy PCI bus order probing fixes · 6d208b39
      Bartlomiej Zolnierkiewicz committed
      IDE PCI host drivers should register themselves with the IDE core only when
      the IDE driver is built-in; otherwise (the IDE driver is modular and thus the
      IDE PCI host drivers are also modular) the code has no effect and just
      complicates the probing.
      
      Fix it by adding a new config option CONFIG_IDEPCI_PCIBUS (defined only when
      needed and invisible to the user) and wrapping the code in question in
      #ifdef/#endif.  It turned out that "ide=reverse" was silently accepted but
      did nothing when the IDE driver was modular; this is fixed now.
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      6d208b39
    • B
      ide: add ide_proc_register_port() · 5cbf79cd
      Bartlomiej Zolnierkiewicz committed
      * create_proc_ide_interfaces() tries to add /proc entries for every probed
        and initialized IDE port, replace it by ide_proc_register_port() which does
        it only for the given port (also rename destroy_proc_ide_interface() to
        ide_proc_unregister_port() for consistency)
        
      * convert {create,destroy}_proc_ide_interface[s]() users to use new functions
      
      * pmac driver depended on proc_ide_create() to add /proc port entries, fix it
        
      * au1xxx-ide, swarm and cs5520 drivers depended indirectly on ide-generic
        driver (CONFIG_IDE_GENERIC=y) to add port /proc entries, fix them
      
      * there is now no need to add /proc entries for IDE ports in proc_ide_create()
        so don't do it
      
      * proc_ide_create() needs now to be called before drivers are probed - fix it,
        while at it make proc_ide_create() create /proc "ide" directory
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      5cbf79cd
    • B
      ide: add "initializing" argument to ide_register_hw() · 869c56ee
      Bartlomiej Zolnierkiewicz committed
      Add "initializing" argument to ide_register_hw() and use it instead of ide.c
      wide variable of the same name.  Update all users of ide_register_hw()
      accordingly.
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      869c56ee
    • B
      ide: cable detection fixes (take 2) · 7f8f48af
      Bartlomiej Zolnierkiewicz committed
      Tejun's recent eighty_ninty_three() fix has inspired me to do a more thorough
      review of the cable detection code...
      
      * print a user-friendly warning about limiting the maximum transfer speed
        to UDMA33 (and the reason behind it) when an 80-wire cable is not detected,
        also while at it clean up eighty_ninty_three() a bit
      
      * use eighty_ninty_three() in ide_ata66_check(), this actually fixes 3 bugs:
        - bit 14 (word 93 validity check) == 1 && bit 13 (80-wire cable test) == 1
          were used as 80-wire cable present test for CONFIG_IDEDMA_IVB=n case
          (please see FIXME comment in eighty_ninty_three() for more details)
        - CONFIG_IDEDMA_IVB=y/n cases were interchanged
        - check for SATA devices was missing
      
      * remove private cable warnings from pdc_202xx{old,new} drivers now that core
        code provides this functionality (plus, in pdc202xx_new case the test could
        give false warnings for ATAPI devices because pdc202xx_new driver doesn't
        even support ATAPI DMA)
      
      Cc: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      7f8f48af
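The word 93 test and the UDMA33 limit mentioned in this commit can be sketched roughly as follows. The bit positions (14 for validity, 13 for the 80-wire test) are taken from the commit message; the function names, the `ignore_validity_bit` parameter (standing in for CONFIG_IDEDMA_IVB=y) and the mapping of UDMA33 to UDMA mode 2 are assumptions of this sketch, not the kernel's actual eighty_ninty_three() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD93_VALID   (1u << 14) /* word 93 validity check (per commit text) */
#define WORD93_80_WIRE (1u << 13) /* 80-wire cable test (per commit text) */
#define UDMA33_MODE    2          /* assumed: UDMA mode 2 corresponds to UDMA33 */

/* Hypothetical helper: does IDENTIFY word 93 report an 80-wire cable? */
static bool word93_reports_80_wire(uint16_t word93, bool ignore_validity_bit)
{
    if (!ignore_validity_bit && !(word93 & WORD93_VALID))
        return false;
    return (word93 & WORD93_80_WIRE) != 0;
}

/* Hypothetical helper: cap the UDMA mode when no 80-wire cable is detected. */
static int cap_udma_mode(int requested_mode, bool cable_80_wire)
{
    assert(requested_mode >= 0);
    if (!cable_80_wire && requested_mode > UDMA33_MODE)
        return UDMA33_MODE;
    return requested_mode;
}
```

A caller would feed the result of the first helper into the second and print the warning whenever the returned mode is lower than the requested one.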
    • B
      ide: move IDE settings handling to ide-proc.c · 7662d046
      Bartlomiej Zolnierkiewicz committed
      * move
      	__ide_add_setting()
      	ide_add_setting()
      	__ide_remove_setting()
      	auto_remove_settings()
      	ide_find_setting_by_name()
      	ide_read_setting()
      	ide_write_setting()
      	set_xfer_rate()
      	ide_add_generic_settings()
      	ide_register_subdriver()
      	ide_unregister_subdriver()
      
        from ide.c to ide-proc.c
      
      * set_{io_32bit,pio_mode,using_dma}() cannot be marked static now, fix it
      
      * rename ide_[un]register_subdriver() to ide_proc_[un]register_driver(),
        update device drivers to use new names
      
      * add CONFIG_IDE_PROC_FS=n versions of ide_proc_[un]register_driver()
        and ide_add_generic_settings()
      
      * make ide_find_setting_by_name(), ide_{read,write}_setting()
        and ide_{add,remove}_proc_entries() static
      
      * cover IDE settings code in device drivers with CONFIG_IDE_PROC_FS #ifdef,
        also while at it cover with CONFIG_IDE_PROC_FS #ifdef ide_driver_t.proc
      
      * remove bogus comment from ide.h
      
      * cover with CONFIG_IDE_PROC_FS #ifdef .proc and .settings in ide_drive_t
      
      Besides saner code, this patch makes the IDE core smaller by ~2 kB
      (on x86-32) and the IDE disk driver by ~1 kB (ditto) when CONFIG_IDE_PROC_FS=n.
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      7662d046
    • B
      ide: split off ioctl handling from IDE settings (v2) · 1497943e
      Bartlomiej Zolnierkiewicz committed
      * do write permission and min/max checks in ide_procset_t functions
      
      * ide-disk.c: drive->id is always available so clean up "multcount" setting
        accordingly
      
      * ide-disk.c: "address" setting was incorrectly defined as type TYPE_INTA,
        fix it by using type TYPE_BYTE and updating ide_drive_t->addressing field,
        the bug didn't trigger because this IDE setting uses custom ->set function
      
      * ide.c: add set_ksettings() for handling HDIO_SET_KEEPSETTINGS ioctl
      
      * ide.c: add set_unmaskirq() for handling HDIO_SET_UNMASKINTR ioctl
      
      * handle ioctls directly in generic_ide_ioctl() and idedisk_ioctl()
        instead of using IDE settings to deal with them
      
      * remove no longer needed ide_find_setting_by_ioctl() and {read,write}_ioctl
        fields from ide_settings_t, also remove now unused TYPE_INTA handling
      
      v2:
      * add missing EXPORT_SYMBOL_GPL(ide_setting_sem) needed now for ide-disk
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      1497943e