  1. 03 Feb 2010: 1 commit
  2. 16 Dec 2009: 6 commits
  3. 10 Dec 2009: 1 commit
    • vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Authored by Christoph Hellwig
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
      since that day we had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches which are required before this patch it's actually surprisingly
      simple, we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping
      its numerical value to preserve binary compatibility, and adds a new
      real O_SYNC flag.  To guarantee backwards compatibility, O_SYNC is
      defined to expand to both O_DSYNC and the new additional binary flag
      (__O_SYNC), so that code compiled against the new headers remains
      backwards-compatible.
      
      This also means that all places that don't care about the difference
      can just check O_DSYNC and get the right behaviour for O_SYNC, too;
      only places that actually care need to check __O_SYNC in addition.
      Drivers and network filesystems have been updated in a fail-safe way
      to always do the full sync magic if O_DSYNC is set.  The few places
      setting O_SYNC for lower layers are kept that way for now to stay
      fail-safe.
      
      We enforce that O_DSYNC is set whenever __O_SYNC is set, early in the
      open path, to make sure we always see this sane combination of flags.
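
      A minimal user-space sketch of the flag relationship described above
      (hypothetical demo code, not part of the patch; it relies only on the
      documented expansion that O_SYNC now includes the O_DSYNC bit):

        #include <fcntl.h>
        #include <stdio.h>

        int main(void)
        {
                /* With the new headers, O_SYNC expands to __O_SYNC | O_DSYNC,
                 * so anything opened with O_SYNC also satisfies code that
                 * only checks the weaker O_DSYNC flag. */
                printf("O_DSYNC = %#x, O_SYNC = %#x\n", O_DSYNC, O_SYNC);
                printf("O_SYNC implies O_DSYNC: %s\n",
                       (O_SYNC & O_DSYNC) == O_DSYNC ? "yes" : "no");
                return 0;
        }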
      
      Note that parisc really screwed up its headers, as it already defined
      an O_DSYNC that has always been a no-op.  We try to repair this by
      using that value for the new O_DSYNC and redefining O_SYNC to carry
      both the traditional O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
  4. 04 Dec 2009: 1 commit
  5. 14 Oct 2009: 1 commit
    • mem_class: Drop the bkl from memory_open() · 205153aa
      Authored by Frederic Weisbecker
      The generic open callback for the mem class devices is "protected" by
      the bkl.
      
      Let's look at the data manipulated inside memory_open:
      
      - inode and file: safe
      - the devlist: safe because it is constant
      - the memdev classes inside this array are safe too (constant)
      
      After we find out which memdev file operation we need to use, we call
      its open callback.  Depending on the targeted memdev, we call either
      open_port(), which doesn't manipulate any racy data (just a capable()
      check), or nothing at all.
      
      So it's safe to remove the big kernel lock there.
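
      A hedged sketch of the dispatch pattern described above, simplified
      from drivers/char/mem.c (the struct layout and table contents here are
      illustrative, not the exact kernel definitions):

        /* The devlist is a constant table mapping the minor number of a
         * mem-class node to its file_operations, so the open path only
         * reads immutable data: nothing the BKL could usefully protect. */
        struct memdev {
                const char *name;
                const struct file_operations *fops;
        };

        static const struct memdev devlist[] = {
                /* indexed by minor number; entries elided */
        };

        static int memory_open(struct inode *inode, struct file *filp)
        {
                int minor = iminor(inode);

                if (minor >= ARRAY_SIZE(devlist) || !devlist[minor].fops)
                        return -ENXIO;

                filp->f_op = devlist[minor].fops;  /* constant-table lookup */

                if (filp->f_op->open)
                        return filp->f_op->open(inode, filp);
                return 0;
        }
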
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1255113062-5835-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 28 Sep 2009: 1 commit
  7. 24 Sep 2009: 1 commit
  8. 20 Sep 2009: 1 commit
  9. 16 Sep 2009: 2 commits
  10. 11 Sep 2009: 1 commit
  11. 19 Jun 2009: 1 commit
  12. 10 Jun 2009: 1 commit
    • Make /dev/zero reads interruptible by signals · 2b838687
      Authored by Linus Torvalds
      This helps with bad latencies for large reads from /dev/zero, but might
      conceivably break some application that "knows" that a read of /dev/zero
      cannot return early.  So do this early in the merge window to give us
      maximal test coverage, even if the patch is totally trivial.
      
      Obviously, no well-behaved application should ever depend on the read
      being uninterruptible, but hey, bugs happen.
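
      A hedged sketch of the kind of check this adds inside the /dev/zero
      read loop (simplified; the exact placement and partial-count handling
      in the real patch may differ):

        /* Bail out of the zeroing loop when a signal is pending,
         * returning the partial count if anything was already copied. */
        if (signal_pending(current))
                return written ? written : -ERESTARTSYS;
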
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 05 Jun 2009: 1 commit
    • drivers/char/mem.c: avoid OOM lockup during large reads from /dev/zero · 730c586a
      Authored by Salman Qazi
      While running 20 parallel instances of dd as follows:
      
        #!/bin/bash
        for i in `seq 1 20`; do
                 dd if=/dev/zero of=/export/hda3/dd_$i bs=1073741824 count=1 &
        done
        wait
      
      on a 16G machine, we noticed that rather than just killing the
      processes, the entire kernel went down.  Stracing dd reveals that it
      first does an mmap2, which creates 1GB worth of zero-page mappings.
      Then it performs a read on those pages from /dev/zero, and finally it
      performs a write.
      
      The machine died during the reads.  Looking at the code, we noticed
      that /dev/zero's read operation had been changed by
      557ed1fa ("remove ZERO_PAGE") from giving
      zero-page mappings to actually zeroing the page.
      
      The zeroing of the pages causes physical pages to be allocated to the
      process.  But when the process exhausts all the memory that it can,
      the kernel cannot kill it, as it is still in kernel mode allocating
      more memory.  Consequently, the kernel eventually crashes.
      
      To fix this, I propose that when a fatal signal is pending during a
      /dev/zero read operation, we simply return and let the user process
      die.
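
      A hedged sketch of the proposed check, placed inside the page-zeroing
      loop of read_zero (the errno chosen here is illustrative; per the note
      below, Linus tweaked the error return before merging):

        /* A fatal signal means the process is dying anyway, so stop
         * zeroing (and thereby allocating) pages and let it exit. */
        if (fatal_signal_pending(current))
                return written ? written : -EINTR;
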
      Signed-off-by: Salman Qazi <sqazi@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Modified error return and comment trivially.  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 10 Apr 2009: 1 commit
  15. 07 Jan 2009: 1 commit
  16. 17 Oct 2008: 1 commit
  17. 25 Jul 2008: 1 commit
  18. 22 Jul 2008: 1 commit
  19. 20 Jul 2008: 1 commit
    • devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM · d092633b
      Authored by Ingo Molnar
      From: Arjan van de Ven <arjan@infradead.org>
      Date: Sat, 19 Jul 2008 15:47:17 -0700
      
      CONFIG_NONPROMISC_DEVMEM was a rather confusing name - but renaming it
      to CONFIG_PROMISC_DEVMEM causes problems on architectures that do not
      support this feature; this patch renames it to CONFIG_STRICT_DEVMEM,
      so that architectures can opt in to it.
      
      (The polarity of the option is still the same as it was originally; it
      needs to stay that way for now so as not to break architectures that
      don't yet have the infrastructure to support this feature.)
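
      A hedged sketch of how an opt-in option like this is typically wired
      up (devmem_is_allowed() is the arch-provided hook in the kernel, but
      the bodies below are simplified illustrations, not the real code):

        #ifdef CONFIG_STRICT_DEVMEM
        /* Opted-in architectures restrict which physical pages
         * /dev/mem may touch. */
        static inline int range_is_allowed(unsigned long pfn,
                                           unsigned long size)
        {
                unsigned long i;

                for (i = 0; i < size; i += PAGE_SIZE, pfn++)
                        if (!devmem_is_allowed(pfn))
                                return 0;
                return 1;
        }
        #else
        /* Without the option, /dev/mem stays fully "promiscuous". */
        static inline int range_is_allowed(unsigned long pfn,
                                           unsigned long size)
        {
                return 1;
        }
        #endif
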
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "V.Radhakrishnan" <rk@atr-labs.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 18 Jul 2008: 1 commit
    • x86: rename CONFIG_NONPROMISC_DEVMEM to CONFIG_PROMISC_DEVMEM · 64d206d8
      Authored by Ingo Molnar
      Linus observed:
      
      > The real bug is that we shouldn't have "double negatives", and
      > certainly not negative config options. Making that "promiscuous
      > /dev/mem" option a negated thing as a config option was bad.
      
      Right ... let's rename this option.  There should never be a negation
      in a config option's name.
      
      [ that reminds me of CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER, but that
        is for another commit ;-) ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  21. 21 Jun 2008: 1 commit
  22. 29 Apr 2008: 1 commit
  23. 25 Apr 2008: 5 commits
  24. 29 Oct 2007: 1 commit
  25. 17 Oct 2007: 2 commits
    • mm: bdi init hooks · e0bf68dd
      Authored by Peter Zijlstra
      Provide BDI (backing_dev_info) constructor/destructor hooks.
      
      [akpm@linux-foundation.org: compile fix]
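
      A hedged sketch of the usage pattern such hooks enable (bdi_init and
      bdi_destroy are the hook names this commit introduces; the embedding
      driver shown here is hypothetical):

        /* A driver embedding a backing_dev_info can now set it up and
         * tear it down explicitly at device create/destroy time. */
        struct mydev {
                struct backing_dev_info bdi;
                /* ... other per-device state ... */
        };

        static int mydev_setup(struct mydev *dev)
        {
                return bdi_init(&dev->bdi);      /* constructor hook */
        }

        static void mydev_teardown(struct mydev *dev)
        {
                bdi_destroy(&dev->bdi);          /* destructor hook */
        }
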
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • remove ZERO_PAGE · 557ed1fa
      Authored by Nick Piggin
      The commit b5810039 contains the note:
      
        A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
        (and thus mapcounted and count towards shared rss).  These writes to
        the struct page could cause excessive cacheline bouncing on big
        systems.  There are a number of ways this could be addressed if it is
        an issue.
      
      And indeed this cacheline bouncing has shown up on large SGI systems.
      There was a situation where an Altix system was essentially livelocked
      tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
      This situation can be avoided in userspace, but it does highlight the
      potential scalability problem with refcounting ZERO_PAGE, and corner
      cases where it can really hurt (we don't want the system to livelock!).
      
      There are several broad ways to fix this problem:
      1. add back some special casing to avoid refcounting ZERO_PAGE
      2. per-node or per-cpu ZERO_PAGES
      3. remove the ZERO_PAGE completely
      
      I will argue for 3. The others should also fix the problem, but they
      result in more complex code than does 3, with little or no real benefit
      that I can see.
      
      Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
      false optimisation: if an application is performance critical, it would
      not be doing many read faults of new memory, or at least it could be
      expected to write to that memory soon afterwards. If cache or memory use
      is critical, it should not be working with a significant number of
      ZERO_PAGEs anyway (a more compact representation of zeroes should be
      used).
      
      As a sanity check -- measuring on my desktop system, there are never
      many mappings to the ZERO_PAGE (eg. 2 or 3), so memory usage here
      should not increase much without it.
      
      When running a make -j4 kernel compile on my dual core system, there are
      about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
      ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
      is torn down without being COWed). So removing ZERO_PAGE will save 1,000
      page faults per second when running kbuild, while keeping it only saves
      less than 1 page clearing operation per second. 1 page clear is cheaper
      than a thousand faults, presumably, so there isn't an obvious loss.
      
      Neither the logical argument nor these basic tests give a guarantee of no
      regressions. However, this is a reasonable opportunity to try to remove
      the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
      we can reintroduce it and just avoid refcounting it.
      
      The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
      much use to them except on benchmarks.  All other users of ZERO_PAGE are
      converted just to use ZERO_PAGE(0) for simplicity. We can look at
      replacing them all and maybe ripping out ZERO_PAGE completely when we are
      more satisfied with this solution.
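
      A hedged sketch of what the /dev/zero read path looks like once the
      ZERO_PAGE mapping tricks are gone (simplified; the real post-patch
      read_zero chunks and handles errors somewhat differently):

        /* With the mapping tricks removed, reading /dev/zero just zeroes
         * the user buffer directly, one page-sized chunk at a time. */
        static ssize_t read_zero(struct file *file, char __user *buf,
                                 size_t count, loff_t *ppos)
        {
                size_t written = 0;

                while (count) {
                        size_t chunk = min_t(size_t, count, PAGE_SIZE);

                        if (clear_user(buf + written, chunk))
                                return written ? written : -EFAULT;
                        written += chunk;
                        count -= chunk;
                }
                return written;
        }
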
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
  26. 11 Jul 2007: 1 commit
  27. 10 Jul 2007: 1 commit
  28. 09 May 2007: 2 commits