1. 09 10月, 2012 1 次提交
    • K
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov 提交于
      A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
      currently it lost original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes reserved_vm counter from mm_struct.  Seems like nobody
      cares about it, it does not exported into userspace directly, it only
      reduces total_vm showed in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      314e51b9
  2. 11 7月, 2012 1 次提交
  3. 10 5月, 2012 1 次提交
  4. 08 5月, 2012 2 次提交
    • K
      kmsg: export printk records to the /dev/kmsg interface · e11fea92
      Kay Sievers 提交于
      Support for multiple concurrent readers of /dev/kmsg, with read(),
      seek(), poll() support. Output of message sequence numbers, to allow
      userspace log consumers to reliably reconnect and reconstruct their
      state at any given time. After open("/dev/kmsg"), read() always
      returns *all* buffered records. If only future messages should be
      read, SEEK_END can be used. In case records get overwritten while
      /dev/kmsg is held open, or records get faster overwritten than they
      are read, the next read() will return -EPIPE and the current reading
      position gets updated to the next available record. The passed
      sequence numbers allow the log consumer to calculate the amount of
      lost messages.
      
        [root@mop ~]# cat /dev/kmsg
        5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
        6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
        7,160,424069;pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
         SUBSYSTEM=acpi
         DEVICE=+acpi:PNP0A03:00
        6,339,5140900;NET: Registered protocol family 10
        30,340,5690716;udevd[80]: starting version 181
        6,341,6081421;FDC 0 is a S82078B
        6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
        7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
         SUBSYSTEM=scsi
         DEVICE=+scsi:1:0:0:0
        6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0
      
      Cc: Karel Zak <kzak@redhat.com>
      Tested-by: NWilliam Douglas <william.douglas@intel.com>
      Signed-off-by: NKay Sievers <kay@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e11fea92
    • K
      printk: convert byte-buffer to variable-length record buffer · 7ff9554b
      Kay Sievers 提交于
      - Record-based stream instead of the traditional byte stream
        buffer. All records carry a 64 bit timestamp, the syslog facility
        and priority in the record header.
      
      - Records consume almost the same amount, sometimes less memory than
        the traditional byte stream buffer (if printk_time is enabled). The record
        header is 16 bytes long, plus some padding bytes at the end if needed.
        The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
        the timestamp and a newline.
      
      - Buffer management is based on message sequence numbers. When records
        need to be discarded, the reading heads move on to the next full
        record. Unlike the byte-stream buffer, no old logged lines get
        truncated or partly overwritten by new ones. Sequence numbers also
        allow consumers of the log stream to get notified if any message in
        the stream they are about to read gets discarded during the time
        of reading.
      
      - Better buffered IO support for KERN_CONT continuation lines, when printk()
        is called multiple times for a single line. The use of KERN_CONT is now
        mandatory to use continuation; a few places in the kernel need trivial fixes
        here. The buffering could possibly be extended to per-cpu variables to allow
        better thread-safety for multiple printk() invocations for a single line.
      
      - Full-featured syslog facility value support. Different facilities
        can tag their messages. All userspace-injected messages enforce a
        facility value > 0 now, to be able to reliably distinguish them from
        the kernel-generated messages. Independent subsystems like a
        baseband processor running its own firmware, or a kernel-related
        userspace process can use their own unique facility values. Multiple
        independent log streams can co-exist that way in the same
        buffer. All share the same global sequence number counter to ensure
        proper ordering (and interleaving) and to allow the consumers of the
        log to reliably correlate the events from different facilities.
      Tested-by: NWilliam Douglas <william.douglas@intel.com>
      Signed-off-by: NKay Sievers <kay@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ff9554b
  5. 04 1月, 2012 1 次提交
  6. 01 11月, 2011 1 次提交
  7. 20 4月, 2011 2 次提交
  8. 24 3月, 2011 1 次提交
  9. 26 10月, 2010 1 次提交
  10. 15 10月, 2010 1 次提交
    • A
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann 提交于
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  11. 22 9月, 2010 1 次提交
  12. 07 8月, 2010 1 次提交
    • D
      Fix init ordering of /dev/console vs callers of modprobe · 31d1d48e
      David Howells 提交于
      Make /dev/console get initialised before any initialisation routine that
      invokes modprobe because if modprobe fails, it's going to want to open
      /dev/console, presumably to write an error message to.
      
      The problem with that is that if the /dev/console driver is not yet
      initialised, the chardev handler will call request_module() to invoke
      modprobe, which will fail, because we never compile /dev/console as a
      module.
      
      This will lead to a modprobe loop, showing the following in the kernel
      log:
      
      	request_module: runaway loop modprobe char-major-5-1
      	request_module: runaway loop modprobe char-major-5-1
      	request_module: runaway loop modprobe char-major-5-1
      	request_module: runaway loop modprobe char-major-5-1
      	request_module: runaway loop modprobe char-major-5-1
      
      This can happen, for example, when the built in md5 module can't find
      the built in cryptomgr module (because the latter fails to initialise).
      The md5 module comes before the call to tty_init(), presumably because
      'crypto' comes before 'drivers' alphabetically.
      
      Fix this by calling tty_init() from chrdev_init().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31d1d48e
  13. 07 4月, 2010 3 次提交
  14. 13 3月, 2010 2 次提交
  15. 03 2月, 2010 2 次提交
  16. 16 12月, 2009 6 次提交
  17. 10 12月, 2009 1 次提交
    • C
      vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Christoph Hellwig 提交于
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
      since that day we had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches which are required before this patch it's actually surprisingly
      simple, we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
      numerical value to keep binary compatibility, and adds a new real O_SYNC
      flag.  To guarantee backwards compatiblity it is defined as expanding to
      both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
      sure we are backwards-compatible when compiled against the new headers.
      
      This also means that all places that don't care about the differences can
      just check O_DSYNC and get the right behaviour for O_SYNC, too - only
      places that actuall care need to check __O_SYNC in addition.  Drivers and
      network filesystems have been updated in a fail safe way to always do the
      full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
      lower layers are kept that way for now to stay failsafe.
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up their headers as they already define a
      O_DSYNC that has always been a no-op.  We try to repair it by using it for
      the new O_DSYNC and redefinining O_SYNC to send both the traditional
      O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NUlrich Drepper <drepper@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      6b2f3d1f
  18. 04 12月, 2009 1 次提交
  19. 14 10月, 2009 1 次提交
    • F
      mem_class: Drop the bkl from memory_open() · 205153aa
      Frederic Weisbecker 提交于
      The generic open callback for the mem class devices is "protected" by
      the bkl.
      
      Let's look at the datas manipulated inside memory_open:
      
      - inode and file: safe
      - the devlist: safe because it is constant
      - the memdev classes inside this array are safe too (constant)
      
      After we find out which memdev file operation we need to use, we call
      its open callback. Depending on the targeted memdev, we call either
      open_port() that doesn't manipulate any racy data (just a capable()
      check), or we call nothing.
      
      So it's safe to remove the big kernel lock there.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1255113062-5835-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      205153aa
  20. 28 9月, 2009 1 次提交
  21. 24 9月, 2009 1 次提交
  22. 20 9月, 2009 1 次提交
  23. 16 9月, 2009 2 次提交
  24. 11 9月, 2009 1 次提交
  25. 19 6月, 2009 1 次提交
  26. 10 6月, 2009 1 次提交
    • L
      Make /dev/zero reads interruptible by signals · 2b838687
      Linus Torvalds 提交于
      This helps with bad latencies for large reads from /dev/zero, but might
      conceivably break some application that "knows" that a read of /dev/zero
      cannot return early.  So do this early in the merge window to give us
      maximal test coverage, even if the patch is totally trivial.
      
      Obviously, no well-behaved application should ever depend on the read
      being uninterruptible, but hey, bugs happen.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2b838687
  27. 05 6月, 2009 1 次提交
    • S
      drivers/char/mem.c: avoid OOM lockup during large reads from /dev/zero · 730c586a
      Salman Qazi 提交于
      While running 20 parallel instances of dd as follows:
      
        #!/bin/bash
        for i in `seq 1 20`; do
                 dd if=/dev/zero of=/export/hda3/dd_$i bs=1073741824 count=1 &
        done
        wait
      
      on a 16G machine, we noticed that rather than just killing the processes,
      the entire kernel went down.  Stracing dd reveals that it first does an
      mmap2, which makes 1GB worth of zero page mappings.  Then it performs a
      read on those pages from /dev/zero, and finally it performs a write.
      
      The machine died during the reads.  Looking at the code, it was noticed
      that /dev/zero's read operation had been changed by
      557ed1fa ("remove ZERO_PAGE") from giving
      zero page mappings to actually zeroing the page.
      
      The zeroing of the pages causes physical pages to be allocated to the
      process.  But, when the process exhausts all the memory that it can, the
      kernel cannot kill it, as it is still in the kernel mode allocating more
      memory.  Consequently, the kernel eventually crashes.
      
      To fix this, I propose that when a fatal signal is pending during
      /dev/zero read operation, we simply return and let the user process die.
      Signed-off-by: NSalman Qazi <sqazi@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Modified error return and comment trivially.  - Linus]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      730c586a
  28. 10 4月, 2009 1 次提交