1. 18 Dec, 2012 1 commit
  2. 14 Dec, 2012 2 commits
    • autofs4 - use simple_empty() for empty directory check · 0259cb02
      Ian Kent committed
      For direct (and offset) mounts, if an automounted mount is manually
      umounted the trigger mount dentry can appear non-empty causing it to
      not trigger mounts. This can also happen if there is a file handle
      leak in a user space automounting application.
      
      This happens because, when an ioctl control file handle is opened
      on the mount, a cursor dentry is created which causes list_empty()
      to see the dentry as non-empty. Since there is a case where listing
      the directory of these dentries is needed, the use of dcache_dir_*()
      functions for .open() and .release() is needed.
      
      Consequently simple_empty() must be used instead of list_empty()
      when checking for an empty directory.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
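The difference between the two emptiness checks in the autofs4 commit above can be illustrated with a small userspace model. This is only a sketch, not kernel code: the `Dentry` class and both helper functions are hypothetical stand-ins, and it assumes that a cursor is a child entry with no inode attached.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dentry:
    """Toy stand-in for a kernel dentry (hypothetical, userspace only)."""
    name: str
    has_inode: bool = True   # a "positive" dentry refers to a real object
    hashed: bool = True      # unhashed dentries are no longer visible by name
    children: List["Dentry"] = field(default_factory=list)


def list_empty_check(d: Dentry) -> bool:
    # Models a bare list_empty() test: any child at all counts.
    return len(d.children) == 0


def simple_empty_check(d: Dentry) -> bool:
    # Models the idea behind simple_empty(): only positive, hashed
    # children make the directory non-empty, so cursors are ignored.
    return not any(c.has_inode and c.hashed for c in d.children)


trigger = Dentry("direct-mount")
# Opening the directory adds a cursor: a placeholder child with no inode.
trigger.children.append(Dentry("<cursor>", has_inode=False))
```

With only the cursor present, `list_empty_check(trigger)` reports a non-empty directory while `simple_empty_check(trigger)` still reports it as empty, which is the behaviour the commit relies on.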
    • autofs4 - dont clear DCACHE_NEED_AUTOMOUNT on rootless mount · f55fb0c2
      Ian Kent committed
      The DCACHE_NEED_AUTOMOUNT flag is cleared on mount and set on expire
      for autofs rootless multi-mount dentries to prevent unnecessary calls
      to ->d_automount().
      
      Since DCACHE_MANAGE_TRANSIT is always set on autofs dentries, ->d_manage()
      is always called, so the check can be done in ->d_manage() without the
      need to change the flag. This still avoids unnecessary calls to
      ->d_automount(), adds negligible overhead and eliminates a seriously
      ugly check in the expire code.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 13 Dec, 2012 13 commits
  4. 12 Dec, 2012 11 commits
    • mm, oom: change type of oom_score_adj to short · a9c58b90
      David Rientjes committed
      The maximum oom_score_adj is 1000 and the minimum oom_score_adj is -1000,
      so this range can be represented by the signed short type with no
      functional change.  The extra space this frees up in struct signal_struct
      will be used for per-thread oom kill flags in the next patch.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
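The range argument in the oom_score_adj commit above can be checked with a quick round trip through a 16-bit signed field. This is a rough sketch: the constant names are chosen for illustration, and the comparison with a 4-byte int assumes the field was previously a 32-bit int, which the commit message does not state.

```python
import struct

# Limits quoted in the commit message.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000

# Standard-size format codes: "h" is a 16-bit signed short, "i" a 32-bit int.
SHORT_BYTES = struct.calcsize("<h")
INT_BYTES = struct.calcsize("<i")  # assumed previous width of the field


def roundtrip_short(value: int) -> int:
    """Pack value into a signed 16-bit field and read it back."""
    return struct.unpack("<h", struct.pack("<h", value))[0]
```

Both limits survive `roundtrip_short()` unchanged, since a signed 16-bit field covers -32768 to 32767.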
    • mm: redefine address_space.assoc_mapping · 252aa6f5
      Rafael Aquini committed
      Overhaul struct address_space.assoc_mapping, renaming it to
      address_space.private_data and redefining its type as void *.  This
      names the .private_* elements of struct address_space consistently
      and allows extended usage for address_space association with other
      data structures through ->private_data.
      
      Also, all users of the old ->assoc_mapping element are converted to
      reflect its new name and type (->private_data).
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: adjust address_space_operations.migratepage() return code · 78bd5209
      Rafael Aquini committed
      Memory fragmentation introduced by ballooning might significantly reduce
      the number of 2MB contiguous memory blocks that can be used within a
      guest, thus imposing performance penalties associated with the reduced
      number of transparent huge pages that could be used by the guest workload.
      
      This patch-set follows the main idea discussed at 2012 LSFMMS session:
      "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
      to introduce the required changes to the virtio_balloon driver, as well as
      the changes to the core compaction & migration bits, in order to make
      those subsystems aware of ballooned pages and allow memory balloon pages
      to become movable within a guest, thus avoiding the aforementioned
      fragmentation issue.
      
      The following numbers show how this patch set makes compaction more
      effective on memory-ballooned guests.
      
      Results for STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
      running on a 4GB RAM KVM guest which was ballooning 512MB RAM in 64MB
      chunks, at every minute (inflating/deflating), while test was running:
      
      ===BEGIN stress-highalloc
      
      STRESS-HIGHALLOC
                       highalloc-3.7     highalloc-3.7
                           rc4-clean         rc4-patch
      Pass 1          55.00 ( 0.00%)    62.00 ( 7.00%)
      Pass 2          54.00 ( 0.00%)    62.00 ( 8.00%)
      while Rested    75.00 ( 0.00%)    80.00 ( 5.00%)
      
      MMTests Statistics: duration
                       3.7         3.7
                 rc4-clean   rc4-patch
      User         1207.59     1207.46
      System       1300.55     1299.61
      Elapsed      2273.72     2157.06
      
      MMTests Statistics: vmstat
                                      3.7         3.7
                                rc4-clean   rc4-patch
      Page Ins                    3581516     2374368
      Page Outs                  11148692    10410332
      Swap Ins                         80          47
      Swap Outs                      3641         476
      Direct pages scanned          37978       33826
      Kswapd pages scanned        1828245     1342869
      Kswapd pages reclaimed      1710236     1304099
      Direct pages reclaimed        32207       31005
      Kswapd efficiency               93%         97%
      Kswapd velocity             804.077     622.546
      Direct efficiency               84%         91%
      Direct velocity              16.703      15.682
      Percentage direct scans          2%          2%
      Page writes by reclaim        79252        9704
      Page writes file              75611        9228
      Page writes anon               3641         476
      Page reclaim immediate        16764       11014
      Page rescued immediate            0           0
      Slabs scanned               2171904     2152448
      Direct inode steals             385        2261
      Kswapd inode steals          659137      609670
      Kswapd skipped wait               1          69
      THP fault alloc                 546         631
      THP collapse alloc              361         339
      THP splits                      259         263
      THP fault fallback               98          50
      THP collapse fail                20          17
      Compaction stalls               747         499
      Compaction success              244         145
      Compaction failures             503         354
      Compaction pages moved       370888      474837
      Compaction move failure       77378       65259
      
      ===END stress-highalloc
      
      This patch:
      
      Introduce MIGRATEPAGE_SUCCESS as the default return code for the
      address_space_operations.migratepage() method and document the expected
      return code for the same method in failure cases.
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
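The derived rows in the vmstat table of the commit above (efficiency and velocity) appear to follow from the raw counters. The sketch below recomputes them under two assumptions that the commit message does not state: efficiency is reclaimed pages over scanned pages as a truncated percentage, and velocity is scanned pages per elapsed second. The function and variable names are illustrative only.

```python
# Raw counters copied from the tables above, as (rc4-clean, rc4-patch).
kswapd_scanned = (1828245, 1342869)
kswapd_reclaimed = (1710236, 1304099)
direct_scanned = (37978, 33826)
direct_reclaimed = (32207, 31005)
elapsed_seconds = (2273.72, 2157.06)


def efficiency(reclaimed: int, scanned: int) -> int:
    # Assumed definition: percentage of scanned pages that were
    # reclaimed, truncated to a whole number.
    return reclaimed * 100 // scanned


def velocity(scanned: int, elapsed: float) -> float:
    # Assumed definition: pages scanned per second of elapsed time.
    return round(scanned / elapsed, 3)


kswapd_eff = [efficiency(r, s) for r, s in zip(kswapd_reclaimed, kswapd_scanned)]
direct_eff = [efficiency(r, s) for r, s in zip(direct_reclaimed, direct_scanned)]
kswapd_vel = [velocity(s, t) for s, t in zip(kswapd_scanned, elapsed_seconds)]
direct_vel = [velocity(s, t) for s, t in zip(direct_scanned, elapsed_seconds)]
```

Under these assumptions the recomputed values line up with the Kswapd and Direct efficiency and velocity rows reported in the table.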
    • mm: use vm_unmapped_area() in hugetlbfs · 08659355
      Michel Lespinasse committed
      Update the hugetlb_get_unmapped_area function to make use of
      vm_unmapped_area() instead of implementing a brute force search.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB · 42d7395f
      Andi Kleen committed
      There was some desire in large applications using MAP_HUGETLB or
      SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
      others.  This is useful together with NUMA policy: use 2MB interleaving
      on some mappings, but 1GB on local mappings.
      
      This patch extends the IPC/SHM syscall interfaces slightly to allow
      specifying the page size.
      
      It borrows some upper bits in the existing flag arguments and allows
      encoding the log of the desired page size in addition to the *_HUGETLB
      flag.  When 0 is specified the default size is used, which makes the
      change fully compatible.
      
      Extending the internal hugetlb code to handle this is straightforward.
      Instead of a single mount it just keeps an array of them and selects the
      right mount based on the specified page size.  When no page size is
      specified it uses the mount of the default page size.
      
      The change is not visible in /proc/mounts because internal mounts don't
      appear there.  It also has very little overhead: the additional mounts
      just consume a super block, but not more memory when not used.
      
      I also exported the new flags to the user headers (they were previously
      under __KERNEL__).  Right now only symbols for x86 and some other
      architecture for 1GB and 2MB are defined.  The interface should already
      work for all other architectures though.  Only architectures that define
      multiple hugetlb sizes actually need it (that is currently x86, tile,
      powerpc).  However tile and powerpc have user configurable hugetlb
      sizes, so it's not easy to add defines.  A program on those
      architectures would need to query sysfs and use the appropriate log2.
      
      [akpm@linux-foundation.org: cleanups]
      [rientjes@google.com: fix build]
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
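The flag encoding described in the MAP_HUGETLB commit above (log2 of the page size stored in upper flag bits, with 0 meaning the default size) can be sketched as plain bit arithmetic. The constant values below are not given in the commit message; they are taken from my understanding of the x86 Linux headers and should be treated as assumptions, and the two helper functions are hypothetical.

```python
# Assumed values (not stated in the commit message):
MAP_HUGETLB = 0x40000   # x86 flag value
MAP_HUGE_SHIFT = 26     # position of the log2(page size) field
MAP_HUGE_MASK = 0x3F    # the field is six bits wide


def hugetlb_flags(page_size: int = 0) -> int:
    """Return mmap-style flags requesting a huge page of page_size bytes.

    A page_size of 0 leaves the field empty, selecting the default size.
    """
    if page_size == 0:
        return MAP_HUGETLB
    if page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    log2 = page_size.bit_length() - 1
    return MAP_HUGETLB | ((log2 & MAP_HUGE_MASK) << MAP_HUGE_SHIFT)


def decode_page_size(flags: int) -> int:
    """Recover the requested page size, or 0 for the default."""
    log2 = (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK
    return 0 if log2 == 0 else 1 << log2
```

For example, `hugetlb_flags(2 << 20)` stores 21 in the size field and `hugetlb_flags(1 << 30)` stores 30, while existing callers that pass only the plain flag keep the default size.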
    • writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr() · d0e1d66b
      Namjae Jeon committed
      There is no reason to pass the nr_pages_dirtied argument, because
      nr_pages_dirtied value from the caller is unused in
      balance_dirty_pages_ratelimited_nr().
      Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
      Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • CIFS: Fix write after setting a read lock for read oplock files · c299dd0e
      Pavel Shilovsky committed
      If we have a read oplock and set a read lock on the file, we can't write
      to the locked area, so filemap_fdatawrite may fail without giving any
      information to the userspace application, even if we request a write to
      a non-locked area. Fix this by populating the page cache without marking
      affected pages dirty after a successful write directly to the server.
      
      Also remove CONFIG_CIFS_SMB2 ifdefs because it's suitable for both CIFS
      and SMB2 protocols.
      Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • cifs: parse the device name into UNC and prepath · d387a5c5
      Jeff Layton committed
      This should fix a regression that was introduced when the new mount
      option parser went in. Also, when the unc= and prefixpath= options
      are provided, check their values against the ones we parsed from
      the device string. If they differ, then throw a warning that tells
      the user that we're using the values from the unc= option for now,
      but that that will change in 3.10.
      
      Pavel Shilovsky <piastry@etersoft.ru>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
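Splitting a device string into a UNC and a prepath, as the commit above describes, might look roughly like the following. This is an illustrative userspace sketch, not the kernel parser: `parse_devname` is a hypothetical helper, and it assumes a device string of the form `//server/share/optional/path` with either delimiter accepted.

```python
from typing import Tuple


def parse_devname(devname: str) -> Tuple[str, str]:
    """Split '//server/share/some/path' into (UNC, prepath).

    The UNC is returned with backslash delimiters and the prepath
    without a leading delimiter.
    """
    normalized = devname.replace("\\", "/")
    if not normalized.startswith("//"):
        raise ValueError("device name must start with two delimiters")
    parts = normalized[2:].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("malformed UNC: need both server and share")
    server, share = parts[0], parts[1]
    prepath = parts[2].strip("/") if len(parts) > 2 else ""
    return "\\\\" + server + "\\" + share, prepath
```

A caller could then compare the returned pair against any explicit unc= and prefixpath= values and warn when they differ, which is the check the commit message describes.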
    • cifs: fix up handling of prefixpath= option · 839db3d1
      Jeff Layton committed
      Currently the code takes care to ensure that the prefixpath has a
      leading '/' delimiter. What if someone passes us a prefixpath with a
      leading '\\' instead? The code doesn't properly handle that currently
      AFAICS.
      
      Let's just change the code to skip over any leading delimiter character
      when copying the prepath. Then, fix up the users of the prepath option
      to prefix it with the correct delimiter when they use it.
      
      Also, there's no need to limit the length of the prefixpath to 1k. If
      the server can handle it, why bother forbidding it?
      
      Pavel Shilovsky <piastry@etersoft.ru>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
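The approach in the prefixpath= commit above (skip a leading delimiter when storing the prepath, then let each user prepend the delimiter it needs) can be sketched as two small helpers. Both function names are hypothetical and the code is a userspace illustration only.

```python
DELIMITERS = ("/", "\\")


def store_prepath(prefixpath: str) -> str:
    """Copy the prefixpath, skipping one leading delimiter of either kind."""
    if prefixpath and prefixpath[0] in DELIMITERS:
        return prefixpath[1:]
    return prefixpath


def build_path(prepath: str, delim: str) -> str:
    """Prefix a stored prepath with the delimiter the caller needs."""
    return delim + prepath if prepath else ""
```

Storing the prepath without a delimiter means a value given as `\dir` or `/dir` ends up in the same stored form, and the choice of delimiter is deferred to the point of use.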
    • cifs: clean up handling of unc= option · 62a1a439
      Jeff Layton committed
      Make sure we free any existing memory allocated for vol->UNC, just in
      case someone passes in multiple unc= options.
      
      Get rid of the check for too long a UNC. The check for >300 bytes seems
      arbitrary. We later copy this into the tcon->treeName, for instance, and
      it's a lot shorter than 300 bytes.
      
      Eliminate an extra kmalloc and copy as well. Just set the vol->UNC
      directly with the contents of match_strdup.
      
      Establish that the UNC should be stored with '\\' delimiters. Use
      convert_delimiter to change it in place in the vol->UNC.
      
      Finally, move the check for a malformed UNC into
      cifs_parse_mount_options so we can catch that situation earlier.
      
      Pavel Shilovsky <piastry@etersoft.ru>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • cifs: fix SID binary to string conversion · 193cdd8a
      Jeff Layton committed
      The authority fields are supposed to be represented by a single 48-bit
      value. The string form is also supposed to show that value as hex if
      it's equal to or greater than 2^32. This is documented in MS-DTYP,
      section 2.4.2.1.
      
      Also, fix up the max string length to account for this fix.
      Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
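The rule in the SID commit above (one 48-bit authority value, printed as hex once it reaches 2^32) can be sketched as follows. This is not the kernel code: `sid_to_string` is a hypothetical helper, the binary layout (revision byte, sub-authority count, 6-byte big-endian authority, 4-byte little-endian sub-authorities) is my understanding of the SID structure, and the zero-padded 12-digit hex form is my reading of MS-DTYP rather than something the commit message states.

```python
def sid_to_string(raw: bytes) -> str:
    """Render a binary SID in its 'S-R-A-S1-S2...' string form."""
    revision = raw[0]
    num_subauth = raw[1]
    # The six authority bytes form one 48-bit big-endian value.
    authority = int.from_bytes(raw[2:8], "big")
    subauths = [
        int.from_bytes(raw[8 + 4 * i:12 + 4 * i], "little")
        for i in range(num_subauth)
    ]
    if authority >= 2 ** 32:
        # Assumed format: "0x" followed by twelve hex digits.
        auth_text = "0x%012X" % authority
    else:
        auth_text = str(authority)
    return "-".join(["S", str(revision), auth_text] + [str(s) for s in subauths])


# Binary form of the well-known SID S-1-5-32-544.
builtin_admins = bytes.fromhex("01020000000000052000000020020000")
```

Here `sid_to_string(builtin_admins)` yields the decimal form because the authority value 5 is below 2^32, while an authority such as 2^40 would be rendered in hex.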
  5. 11 Dec, 2012 13 commits