1. 15 10月, 2010 1 次提交
    • A
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann 提交于
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  2. 07 8月, 2010 1 次提交
    • D
      Fix init ordering of /dev/console vs callers of modprobe · 31d1d48e
      David Howells 提交于
      Make /dev/console get initialised before any initialisation routine that
      invokes modprobe because if modprobe fails, it's going to want to open
      /dev/console, presumably to write an error message to.
      
      The problem with that is that if the /dev/console driver is not yet
      initialised, the chardev handler will call request_module() to invoke
      modprobe, which will fail, because we never compile /dev/console as a
      module.
      
      This will lead to a modprobe loop, showing the following in the kernel
      log:
      
      	request_module: runaway loop modprobe char-major-5-1
      	request_module: runaway loop modprobe char-major-5-1
      	request_module: runaway loop modprobe char-major-5-1
      	request_module: runaway loop modprobe char-major-5-1
      	request_module: runaway loop modprobe char-major-5-1
      
      This can happen, for example, when the built in md5 module can't find
      the built in cryptomgr module (because the latter fails to initialise).
      The md5 module comes before the call to tty_init(), presumably because
      'crypto' comes before 'drivers' alphabetically.
      
      Fix this by calling tty_init() from chrdev_init().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31d1d48e
  3. 07 4月, 2010 3 次提交
  4. 13 3月, 2010 2 次提交
  5. 03 2月, 2010 2 次提交
  6. 16 12月, 2009 6 次提交
  7. 10 12月, 2009 1 次提交
    • C
      vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Christoph Hellwig 提交于
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
      since that day we had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches which are required before this patch it's actually surprisingly
      simple, we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
      numerical value to keep binary compatibility, and adds a new real O_SYNC
      flag.  To guarantee backwards compatiblity it is defined as expanding to
      both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
      sure we are backwards-compatible when compiled against the new headers.
      
      This also means that all places that don't care about the differences can
      just check O_DSYNC and get the right behaviour for O_SYNC, too - only
      places that actuall care need to check __O_SYNC in addition.  Drivers and
      network filesystems have been updated in a fail safe way to always do the
      full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
      lower layers are kept that way for now to stay failsafe.
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up their headers as they already define a
      O_DSYNC that has always been a no-op.  We try to repair it by using it for
      the new O_DSYNC and redefinining O_SYNC to send both the traditional
      O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NUlrich Drepper <drepper@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      6b2f3d1f
  8. 04 12月, 2009 1 次提交
  9. 14 10月, 2009 1 次提交
    • F
      mem_class: Drop the bkl from memory_open() · 205153aa
      Frederic Weisbecker 提交于
      The generic open callback for the mem class devices is "protected" by
      the bkl.
      
      Let's look at the datas manipulated inside memory_open:
      
      - inode and file: safe
      - the devlist: safe because it is constant
      - the memdev classes inside this array are safe too (constant)
      
      After we find out which memdev file operation we need to use, we call
      its open callback. Depending on the targeted memdev, we call either
      open_port() that doesn't manipulate any racy data (just a capable()
      check), or we call nothing.
      
      So it's safe to remove the big kernel lock there.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1255113062-5835-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      205153aa
  10. 28 9月, 2009 1 次提交
  11. 24 9月, 2009 1 次提交
  12. 20 9月, 2009 1 次提交
  13. 16 9月, 2009 2 次提交
  14. 11 9月, 2009 1 次提交
  15. 19 6月, 2009 1 次提交
  16. 10 6月, 2009 1 次提交
    • L
      Make /dev/zero reads interruptible by signals · 2b838687
      Linus Torvalds 提交于
      This helps with bad latencies for large reads from /dev/zero, but might
      conceivably break some application that "knows" that a read of /dev/zero
      cannot return early.  So do this early in the merge window to give us
      maximal test coverage, even if the patch is totally trivial.
      
      Obviously, no well-behaved application should ever depend on the read
      being uninterruptible, but hey, bugs happen.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2b838687
  17. 05 6月, 2009 1 次提交
    • S
      drivers/char/mem.c: avoid OOM lockup during large reads from /dev/zero · 730c586a
      Salman Qazi 提交于
      While running 20 parallel instances of dd as follows:
      
        #!/bin/bash
        for i in `seq 1 20`; do
                 dd if=/dev/zero of=/export/hda3/dd_$i bs=1073741824 count=1 &
        done
        wait
      
      on a 16G machine, we noticed that rather than just killing the processes,
      the entire kernel went down.  Stracing dd reveals that it first does an
      mmap2, which makes 1GB worth of zero page mappings.  Then it performs a
      read on those pages from /dev/zero, and finally it performs a write.
      
      The machine died during the reads.  Looking at the code, it was noticed
      that /dev/zero's read operation had been changed by
      557ed1fa ("remove ZERO_PAGE") from giving
      zero page mappings to actually zeroing the page.
      
      The zeroing of the pages causes physical pages to be allocated to the
      process.  But, when the process exhausts all the memory that it can, the
      kernel cannot kill it, as it is still in the kernel mode allocating more
      memory.  Consequently, the kernel eventually crashes.
      
      To fix this, I propose that when a fatal signal is pending during
      /dev/zero read operation, we simply return and let the user process die.
      Signed-off-by: NSalman Qazi <sqazi@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Modified error return and comment trivially.  - Linus]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      730c586a
  18. 10 4月, 2009 1 次提交
  19. 07 1月, 2009 1 次提交
  20. 17 10月, 2008 1 次提交
  21. 25 7月, 2008 1 次提交
  22. 22 7月, 2008 1 次提交
  23. 20 7月, 2008 1 次提交
    • I
      Subject: devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM · d092633b
      Ingo Molnar 提交于
      From: Arjan van de Ven <arjan@infradead.org>
      Date: Sat, 19 Jul 2008 15:47:17 -0700
      
      CONFIG_NONPROMISC_DEVMEM was a rather confusing name - but renaming it
      to CONFIG_PROMISC_DEVMEM causes problems on architectures that do not
      support this feature; this patch renames it to CONFIG_STRICT_DEVMEM,
      so that architectures can opt-in into it.
      
      ( the polarity of the option is still the same as it was originally; it
        needs to be for now to not break architectures that don't have the
        infastructure yet to support this feature)
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: "V.Radhakrishnan" <rk@atr-labs.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ---
      d092633b
  24. 18 7月, 2008 1 次提交
    • I
      x86: rename CONFIG_NONPROMISC_DEVMEM to CONFIG_PROMISC_DEVMEM · 64d206d8
      Ingo Molnar 提交于
      Linus observed:
      
      > The real bug is that we shouldn't have "double negatives", and
      > certainly not negative config options. Making that "promiscuous
      > /dev/mem" option a negated thing as a config option was bad.
      
      right ... lets rename this option. There should never be a negation
      in config options.
      
      [ that reminds me of CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER, but that
        is for another commit ;-) ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      64d206d8
  25. 21 6月, 2008 1 次提交
  26. 29 4月, 2008 1 次提交
  27. 25 4月, 2008 4 次提交