1. 04 Jul, 2013 1 commit
    • mm: soft-dirty bits for user memory changes tracking · 0f8975ec
      Pavel Emelyanov committed
      The soft-dirty is a bit on a PTE which helps to track which pages a task
      writes to.  In order to do this tracking one should
      
        1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
        2. Wait some time.
        3. Read soft-dirty bits (bit 55 of the /proc/PID/pagemap2 entries)
      
      To do this tracking, the writable bit is cleared from PTEs when the
      soft-dirty bit is.  Thus, after this, when the task tries to modify a
      page at some virtual address the #PF occurs and the kernel sets the
      soft-dirty bit on the respective PTE.
      
      Note that although the task's whole address space is marked r/o after
      the soft-dirty bits are cleared, the #PF-s that occur after that are
      processed fast.  This is because the pages are still mapped to physical
      memory, so all the kernel does is find this fact out and put the
      writable, dirty and soft-dirty bits back on the PTE.
      
      Another thing to note, is that when mremap moves PTEs they are marked
      with soft-dirty as well, since from the user perspective mremap modifies
      the virtual memory at mremap's new address.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
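      As an illustration of step 3 in the soft-dirty message, here is a minimal
      sketch in C of how a userspace tool might test the bit. It assumes each
      entry is a 64-bit word, one per virtual page, and that the flag is bit 55
      as the message states. The helper names are hypothetical and are not part
      of the patch.

```c
#include <stdint.h>

/* Bit position of the soft-dirty flag, as stated in the commit message. */
#define PM_SOFT_DIRTY_BIT 55

/* Return 1 if the soft-dirty bit is set in a raw 64-bit entry, else 0. */
static int pagemap_entry_soft_dirty(uint64_t entry)
{
    return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/*
 * Byte offset of the entry describing vaddr, assuming one 8-byte entry
 * per virtual page (an assumption of this sketch).
 */
static uint64_t pagemap_entry_offset(uint64_t vaddr, uint64_t page_size)
{
    return (vaddr / page_size) * sizeof(uint64_t);
}
```

      A caller would seek to pagemap_entry_offset(addr, page_size) in the file
      named in the commit message, read eight bytes, and pass the value to
      pagemap_entry_soft_dirty().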
  2. 03 Jul, 2013 1 commit
  3. 29 Jun, 2013 6 commits
    • locks: give the blocked_hash its own spinlock · 7b2296af
      Jeff Layton committed
      There's no reason we have to protect the blocked_hash and file_lock_list
      with the same spinlock. With the tests I have, breaking it in two gives
      a barely measurable performance benefit, but it seems reasonable to make
      this locking as granular as possible.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • locks: add a new "lm_owner_key" lock operation · 3999e493
      Jeff Layton committed
      Currently, the hashing that the locking code uses to add these values
      to the blocked_hash is simply calculated using fl_owner field. That's
      valid in most cases except for server-side lockd, which validates the
      owner of a lock based on fl_owner and fl_pid.
      
      In the case where you have a small number of NFS clients doing a lot
      of locking between different processes, you could end up with all
      the blocked requests sitting in a very small number of hash buckets.
      
      Add a new lm_owner_key operation to the lock_manager_operations that
      will generate an unsigned long to use as the key in the hashtable.
      That function is only implemented for server-side lockd, and simply
      XORs the fl_owner and fl_pid.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
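      The key computation described in the lm_owner_key message can be sketched
      in userspace C as follows. struct lock_owner_info is a hypothetical
      stand-in for the two struct file_lock fields the message names; only the
      XOR itself is taken from the commit text.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fl_owner and fl_pid fields of a lock. */
struct lock_owner_info {
    const void *fl_owner;
    unsigned int fl_pid;
};

/* Build the hash key by XORing the owner pointer with the pid. */
static unsigned long owner_key(const struct lock_owner_info *fl)
{
    return (unsigned long)(uintptr_t)fl->fl_owner ^ (unsigned long)fl->fl_pid;
}
```

      Two locks that share an fl_owner but differ in fl_pid now produce
      different keys, so they no longer have to land in the same hash bucket.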
    • locks: protect most of the file_lock handling with i_lock · 1c8c601a
      Jeff Layton committed
      Having a global lock that protects all of this code is a clear
      scalability problem. Instead of doing that, move most of the code to be
      protected by the i_lock instead. The exceptions are the global lists
      that the ->fl_link sits on, and the ->fl_block list.
      
      ->fl_link is what connects these structures to the
      global lists, so we must ensure that we hold those locks when iterating
      over or updating these lists.
      
      Furthermore, sound deadlock detection requires that we hold the
      blocked_list state steady while checking for loops. We also must ensure
      that the search and update to the list are atomic.
      
      For the checking and insertion side of the blocked_list, push the
      acquisition of the global lock into __posix_lock_file and ensure that
      checking and update of the  blocked_list is done without dropping the
      lock in between.
      
      On the removal side, when waking up blocked lock waiters, take the
      global lock before walking the blocked list and dequeue the waiters from
      the global list prior to removal from the fl_block list.
      
      With this, deadlock detection should be race free while we minimize
      excessive file_lock_lock thrashing.
      
      Finally, in order to avoid a lock inversion problem when handling
      /proc/locks output we must ensure that manipulations of the fl_block
      list are also protected by the file_lock_lock.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Don't pass inode to ->d_hash() and ->d_compare() · da53be12
      Linus Torvalds committed
      Instances either don't look at it at all (the majority of cases) or
      only want it to find the superblock (which can be had as dentry->d_sb).
      A few cases that want more are actually safe with dentry->d_inode -
      the only precaution needed is to check that it hasn't been replaced with
      NULL by rmdir() or by an overwriting rename(), in which case it should
      simply be treated as a cache miss.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] ->readdir() is gone · 2233f31a
      Al Viro committed
      everything's converted to ->iterate()
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] introduce iterate_dir() and dir_context · 5c0ba4e0
      Al Viro committed
      iterate_dir(): new helper, replacing vfs_readdir().
      
      struct dir_context: contains the readdir callback (and will get more stuff
      in it), embedded into whatever data that callback wants to deal with;
      eventually, we'll be passing it to ->readdir() replacement instead of
      (data,filldir) pair.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
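      The idea of embedding the context into the callback's own data can be
      illustrated with a small userspace model. This is only a sketch: apart
      from the name dir_context, every identifier is hypothetical, and the
      callback signature is simplified rather than being the kernel's filldir
      prototype.

```c
#include <stddef.h>

struct dir_context;
/* Simplified callback: receives the context and one entry name. */
typedef int (*actor_fn)(struct dir_context *ctx, const char *name);

struct dir_context {
    actor_fn actor;
};

/* Caller-private data with the context embedded inside it. */
struct count_ctx {
    struct dir_context ctx;
    int count;
};

/* Recover the enclosing structure from a pointer to its member. */
#define ctx_container(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback that counts entries using the enclosing structure. */
static int count_actor(struct dir_context *ctx, const char *name)
{
    struct count_ctx *c = ctx_container(ctx, struct count_ctx, ctx);

    (void)name;
    c->count++;
    return 0;
}

/* Stand-in for a directory iterator: feeds each name to the callback. */
static int iterate_names(const char *const *names, size_t n,
                         struct dir_context *ctx)
{
    for (size_t i = 0; i < n; i++) {
        int err = ctx->actor(ctx, names[i]);
        if (err)
            return err;
    }
    return 0;
}
```

      The iterator only ever sees the context pointer, while the callback gets
      back to its private state without a separate opaque data argument.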
  4. 17 Jun, 2013 1 commit
    • f2fs: add remount_fs callback support · 696c018c
      Namjae Jeon committed
      Add the f2fs_remount function call which will be used
      during the filesystem remounting. This function
      will help us to change the mount options specific to
      f2fs.
      
      Also modify the f2fs background_gc mount option, which
      will allow the user to dynamically turn on/off the
      garbage collection in f2fs based on the background_gc
      value. If background_gc=on, garbage collection will
      be turned on, and if background_gc=off, garbage collection
      will be turned off.
      
      By default the garbage collection is on in f2fs.
      
      Change Log:
      v2: Incorporated the review comments by Gu Zheng.
          Removing the restore part for VFS flags
          Updating comments with proper flag conditions
          Display GC background option as ON/OFF
          Revised conditions to stop GC in case of remount
      
      v1: Initial changes for adding remount_fs callback
      support.
      
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
      Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: change /** with /* for the coding style]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  5. 06 Jun, 2013 1 commit
  6. 22 May, 2013 1 commit
    • mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner committed
      Currently there is no way to truncate a partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for the file system truncate
      operation to work properly. However, more file systems now support the
      punch hole feature, and it can benefit from mm supporting truncating a
      page just up to a certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way. Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
  7. 07 May, 2013 1 commit
  8. 30 Apr, 2013 1 commit
  9. 28 Apr, 2013 1 commit
  10. 26 Apr, 2013 1 commit
    • SUNRPC: Use gssproxy upcall for server RPCGSS authentication. · 030d794b
      Simo Sorce committed
      The main advantage of this new upcall mechanism is that it can handle
      big tickets as seen in Kerberos implementations where tickets carry
      authorization data like the MS-PAC buffer with AD or the Posix Authorization
      Data being discussed in IETF on the krbwg working group.
      
      The Gssproxy program is used to perform the accept_sec_context call on the
      kernel's behalf. The code is changed to also pass the input buffer straight
      to upcall mechanism to avoid allocating and copying many pages as tokens can
      be as big (potentially more in future) as 64KiB.
      Signed-off-by: Simo Sorce <simo@redhat.com>
      [bfields: containerization, negotiation api]
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  11. 10 Apr, 2013 1 commit
    • ext4: introduce reserved space · 27dd4385
      Lukas Czerner committed
      Currently in ENOSPC condition when writing into unwritten space, or
      punching a hole, we might need to split the extent and grow extent tree.
      However since we can not allocate any new metadata blocks we'll have to
      zero out unwritten part of extent or punched out part of extent, or in
      the worst case return ENOSPC even though the user actually does not
      allocate any space.
      
      Also in delalloc path we do reserve metadata and data blocks for the
      time we're going to write out, however metadata block reservation is
      very tricky especially since we expect that logical connectivity implies
      physical connectivity, however that might not be the case and hence we
      might end up allocating more metadata blocks than previously reserved.
      So in future, metadata reservation checks should be removed since we can
      not assure that we do not under reserve.
      
      And this is where reserved space comes into the picture. When mounting
      the file system we slice off a little bit of the file system space (2%
      or 4096 clusters, whichever is smaller) which can be then used for the
      cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.
      
      The number of reserved clusters can be set via sysfs, however it can
      never be bigger than number of free clusters in the file system.
      
      Note that this patch fixes the failure of xfstest 274 as expected.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
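      The default sizing rule from the reserved space message (2% or 4096
      clusters, whichever is smaller) can be written down as a short sketch.
      The function name is hypothetical, and computing 2% as an integer
      division by 50 is an assumption of this example rather than something the
      message spells out.

```c
#include <stdint.h>

/* Upper bound on the default reservation, as named in the commit message. */
#define RESV_CLUSTERS_CAP 4096ULL

/* Default reservation: 2% of all clusters, capped at RESV_CLUSTERS_CAP. */
static uint64_t default_reserved_clusters(uint64_t total_clusters)
{
    uint64_t two_percent = total_clusters / 50;

    return two_percent < RESV_CLUSTERS_CAP ? two_percent : RESV_CLUSTERS_CAP;
}
```

      On small file systems the 2% term decides the result, while on anything
      larger than 204800 clusters the fixed cap of 4096 takes over.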
  12. 09 Apr, 2013 1 commit
    • ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT · 393d1d1d
      Dr. Tilmann Bubeck committed
      Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
      associated attributes (like i_blocks, i_size, i_flags, ...) from the
      specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
      typically used to store a boot loader in a secure part of the
      filesystem, where it can't be changed by a normal user by accident.
      The data blocks of the previous boot loader will be associated with
      the given inode.
      
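      This userspace program is a simple example of the usage:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      
      /* Fallback if the installed headers do not define the ioctl number. */
      #ifndef EXT4_IOC_SWAP_BOOT
      #define EXT4_IOC_SWAP_BOOT _IO('f', 17)
      #endif
      
      int main(int argc, char *argv[])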
      {
        int fd;
        int err;
      
        if ( argc != 2 ) {
          printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
          exit(1);
        }
      
        fd = open(argv[1], O_WRONLY);
        if ( fd < 0 ) {
          perror("open");
          exit(1);
        }
      
        err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
        if ( err < 0 ) {
          perror("ioctl");
          exit(1);
        }
      
        close(fd);
        exit(0);
      }
      
      [ Modified by Theodore Ts'o to fix a number of bugs in the original code.]
      Signed-off-by: Dr. Tilmann Bubeck <t.bubeck@reinform.de>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  13. 03 Apr, 2013 1 commit
  14. 26 Feb, 2013 1 commit
    • vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op · ecf3d1f1
      Jeff Layton committed
      The following set of operations on an NFS client and server will cause
      the final stat to fail:
      
          server# mkdir a
          client# cd a
          server# mv a a.bak
          client# sleep 30  # (or whatever the dir attrcache timeout is)
          client# stat .
          stat: cannot stat `.': Stale NFS file handle
      
      Obviously, we should not be getting an ESTALE error back there since the
      inode still exists on the server. The problem is that the lookup code
      will call d_revalidate on the dentry that "." refers to, because NFS has
      FS_REVAL_DOT set.
      
      nfs_lookup_revalidate will see that the parent directory has changed and
      will try to reverify the dentry by redoing a LOOKUP. That of course
      fails, so the lookup code returns ESTALE.
      
      The problem here is that d_revalidate is really a bad fit for this case.
      What we really want to know at this point is whether the inode is still
      good or not, but we don't really care what name it goes by or whether
      the dcache is still valid.
      
      Add a new d_op->d_weak_revalidate operation and have complete_walk call
      that instead of d_revalidate. The intent there is to allow for a
      "weaker" d_revalidate that just checks to see whether the inode is still
      good. This also gives us an opportunity to kill off the FS_REVAL_DOT
      special casing.
      
      [AV: changed method name, added note in porting, fixed confusion re
      having it possibly called from RCU mode (it won't be)]
      
      Cc: NeilBrown <neilb@suse.de>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  15. 04 Jan, 2013 1 commit
  16. 21 Dec, 2012 3 commits
    • documentation: drop vmtruncate · b9f61c3c
      Marco Stornelli committed
      Removed vmtruncate
      Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • FS-Cache: Provide proper invalidation · ef778e7a
      David Howells committed
      Provide a proper invalidation method rather than relying on the netfs retiring
      the cookie it has and getting a new one.  The problem with this is that it
      isn't easy for the netfs to make sure that it has completed/cancelled all its
      outstanding storage and retrieval operations on the cookie it is retiring.
      
      Instead, have the cache provide an invalidation method that will cancel or wait
      for all currently outstanding operations before invalidating the cache, and
      will cause new operations to queue up behind that.  Whilst invalidation is in
      progress, some requests will be rejected until the cache can stack a barrier on
      the operation queue to cause new operations to be deferred behind it.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • FS-Cache: Fix operation state management and accounting · 9f10523f
      David Howells committed
      Fix the state management of internal fscache operations and the accounting of
      what operations are in what states.
      
      This is done by:
      
       (1) Give struct fscache_operation an enum variable that directly represents the
           state it's currently in, rather than spreading this knowledge over a bunch
           of flags, who's processing the operation at the moment and whether it is
           queued or not.
      
           This makes it easier to write assertions to check the state at various
           points and to prevent invalid state transitions.
      
       (2) Add an 'operation complete' state and supply a function to indicate the
           completion of an operation (fscache_op_complete()) and make things call
           it.  The final call to fscache_put_operation() can then check that an op
           is in the appropriate state (complete or cancelled).
      
       (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
           govern the state of an object:
      
      	(a) The ->n_ops is now the number of extant operations on the object
      	    and is now decremented by fscache_put_operation() only.
      
      	(b) The ->n_in_progress is simply the number of operations that have been
      	    taken off of the object's pending queue for the purposes of being
      	    run.  This is decremented by fscache_op_complete() only.
      
      	(c) The ->n_exclusive is the number of exclusive ops that have been
      	    submitted and queued or are in progress.  It is decremented by
      	    fscache_op_complete() and by fscache_cancel_op().
      
           fscache_put_operation() and fscache_operation_gc() now no longer try to
           clean up ->n_exclusive and ->n_in_progress.  That was leading to double
           decrements against fscache_cancel_op().
      
           fscache_cancel_op() now no longer decrements ->n_ops.  That was leading to
           double decrements against fscache_put_operation().
      
           fscache_submit_exclusive_op() now decides whether it has to queue an op
           based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
           will persist in being true even after all preceding operations have been
           cancelled or completed.  Furthermore, if an object is active and there are
           runnable ops against it, there must be at least one op running.
      
       (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
           provide a function to record completion of the pages as they complete.
      
           When n_pages reaches 0, the operation is deemed to be complete and
           fscache_op_complete() is called.
      
           Add calls to fscache_retrieval_complete() anywhere we've finished with a
           page we've been given to read or allocate for.  This includes places where
           we just return pages to the netfs for reading from the server and where
           accessing the cache fails and we discard the proposed netfs page.
      
      The bugs in the unfixed state management manifest themselves as oopses like the
      following where the operation completion gets out of sync with return of the
      cookie by the netfs.  This is possible because the cache unlocks and returns
      all the netfs pages before recording its completion - which means that there's
      nothing to stop the netfs discarding them and returning the cookie.
      
      
      FS-Cache: Cookie 'NFS.fh' still has outstanding reads
      ------------[ cut here ]------------
      kernel BUG at fs/fscache/cookie.c:519!
      invalid opcode: 0000 [#1] SMP
      CPU 1
      Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
      
      Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090                  /DG965RY
      RIP: 0010:[<ffffffffa007050a>]  [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
      RSP: 0018:ffff8800368cfb00  EFLAGS: 00010282
      RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
      RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
      RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
      R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
      R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
      FS:  0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
      Stack:
       ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
       ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
       ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
      Call Trace:
       [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
       [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
       [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
       [<ffffffff810d8d47>] evict+0xa1/0x15c
       [<ffffffff810d8e2e>] dispose_list+0x2c/0x38
       [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
       [<ffffffff810c56b7>] prune_super+0xd5/0x140
       [<ffffffff8109b615>] shrink_slab+0x102/0x1ab
       [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
       [<ffffffff8103e009>] ? process_timeout+0xb/0xb
       [<ffffffff8109dba3>] kswapd+0x270/0x289
       [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
       [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
       [<ffffffff8104bf7a>] kthread+0x7f/0x87
       [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
       [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
       [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
       [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
       [<ffffffff813ad6b0>] ? gs_change+0xb/0xb
      Signed-off-by: David Howells <dhowells@redhat.com>
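      Items (1), (2) and (4) of the FS-Cache message can be modelled with a
      small userspace sketch. The enum values, structure and function names
      here are hypothetical and much simpler than the real fscache types. The
      sketch only shows the pattern of a single explicit state plus a
      remaining-pages counter that drives completion.

```c
#include <assert.h>

/* One explicit state per operation instead of scattered flags. */
enum op_state { OP_PENDING, OP_IN_PROGRESS, OP_COMPLETE, OP_CANCELLED };

struct op {
    enum op_state state;
    int n_pages;            /* pages still outstanding */
};

/* Record that n pages are finished; the op completes when none remain. */
static void retrieval_complete(struct op *op, int n)
{
    assert(op->state == OP_IN_PROGRESS);
    assert(n > 0 && n <= op->n_pages);
    op->n_pages -= n;
    if (op->n_pages == 0)
        op->state = OP_COMPLETE;
}

/* A final put is only legal once the op is complete or cancelled. */
static int may_put(const struct op *op)
{
    return op->state == OP_COMPLETE || op->state == OP_CANCELLED;
}
```

      With the state held in one variable, the assertions can reject invalid
      transitions directly, which is the benefit item (1) describes.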
  17. 18 Dec, 2012 5 commits
    • docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output · e71ec593
      Cyrill Gorcunov committed
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • docs: add documentation about /proc/<pid>/fdinfo/<fd> output · f1d8c162
      Cyrill Gorcunov committed
      [akpm@linux-foundation.org: tweak documentation]
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • /proc/pid/status: add "Seccomp" field · 2f4b3bf6
      Kees Cook committed
      It is currently impossible to examine the state of seccomp for a given
      process.  While attaching with gdb and attempting "call
      prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not
      reliable.  If the process is in seccomp mode 1, this query will kill the
      process (prctl not allowed), if the process is in mode 2 with prctl not
      allowed, it will similarly be killed, and in weird cases, if prctl is
      filtered to return errno 0, it can look like seccomp is disabled.
      
      When reviewing the state of running processes, there should be a way to
      externally examine the seccomp mode.  ("Did this build of Chrome end up
      using seccomp?" "Did my distro ship ssh with seccomp enabled?")
      
      This adds the "Seccomp" line to /proc/$pid/status.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
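      A monitoring tool could pick up the new "Seccomp" field roughly as
      sketched below. The sketch assumes the line has the form "Seccomp:"
      followed by whitespace and a decimal mode number, and it works on text
      that has already been read from /proc/$pid/status. The function name is
      hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return the mode from the "Seccomp:" line, or -1 if there is none. */
static int parse_seccomp_mode(const char *status_text)
{
    const char *p = status_text;

    while (p && *p) {
        int mode;

        /* Only matches when the current line starts with "Seccomp:". */
        if (sscanf(p, "Seccomp: %d", &mode) == 1)
            return mode;
        /* Advance to the start of the next line, if any. */
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}
```

      Because the value is read from outside the process, the query cannot
      trip the target's own seccomp policy the way the gdb approach can.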
    • procfs: add VmFlags field in smaps output · 834f82e2
      Cyrill Gorcunov committed
      During c/r sessions we've found that there is no way at the moment to
      fetch some VMA associated flags, such as mlock() and madvise().
      
      This leads us to a problem -- we don't know if we should call for mlock()
      and/or madvise() after restore on the vma area we're bringing back to
      life.
      
      This patch introduces a new field into "smaps" output called VmFlags,
      where all set flags associated with the particular VMA are shown as
      two-letter mnemonics.
      
      [ Strictly speaking for c/r we only need mlock/madvise bits but it has been
        said that providing just a few flags looks somehow inconsistent.  So all
        flags are here now. ]
      
      This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as
      other applications may start to use these fields.
      
      The data is encoded in a somewhat awkward two-letter mnemonic form, to
      encourage userspace to be prepared for fields being added or removed in
      the future.
      
      [a.p.zijlstra@chello.nl: props to use for_each_set_bit]
      [sfr@canb.auug.org.au: props to use array instead of struct]
      [akpm@linux-foundation.org: overall redesign and simplification]
      [akpm@linux-foundation.org: remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()]
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
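      For the c/r use case in the VmFlags message, a restorer needs to know
      whether a given mnemonic is present on a VmFlags line. The following is a
      minimal sketch of such a check; the function name is hypothetical, and it
      assumes the line consists of "VmFlags:" followed by whitespace-separated
      two-letter tokens. Unknown tokens are simply skipped, in the spirit of
      the message's request that userspace tolerate added or removed fields.

```c
#include <string.h>

/* Return 1 if the two-letter mnemonic is a token on the line, else 0. */
static int vmflags_has(const char *line, const char *mnemonic)
{
    const char *p = strchr(line, ':');

    if (!p || strlen(mnemonic) != 2)
        return 0;
    p++;
    while (*p) {
        size_t len;

        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (!*p)
            break;
        len = strcspn(p, " \t\n");
        if (len == 2 && strncmp(p, mnemonic, 2) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```

      A caller might use vmflags_has(line, "lo") to decide whether to call
      mlock() again after restore; the choice of "lo" as the mnemonic for a
      locked mapping is an assumption here and should be checked against the
      procfs documentation.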
    • fat: provide option for setting timezone offset · 58156c8f
      Jan Kara committed
      So far FAT either offsets time stamps by sys_tz.minuteswest or leaves them
      as they are (when tz=UTC mount option is used).  However in some cases it
      is useful if one can specify time stamp offset on his own (e.g.  when time
      zone of the camera connected is different from time zone of the computer,
      or when HW clock is in UTC and thus sys_tz.minuteswest == 0).
      
      So provide a mount option time_offset= which allows user to specify offset
      in minutes that should be applied to time stamps on the filesystem.
      
      akpm: this code would work incorrectly when used via `mount -o remount',
      because cached inodes would not be updated.  But fatfs's fat_remount() is
      basically a no-op anyway.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 11 Dec, 2012 4 commits
    • f2fs: fix a typo in f2fs documentation · d08ab08d
      Huajun Li committed
      In f2fs_fs.h, one f2fs inode contains 923 data block pointers, while
      f2fs documentation says it is 929. Fix this inconsistency.
      Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
    • f2fs: update the f2fs document · 5bb446a2
      Jaegeuk Kim committed
      I moved the f2fs-tools.git into kernel.org.
      And I added a new mailing list, linux-f2fs-devel@lists.sourceforge.net.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: add document · 98e4da8c
      Jaegeuk Kim committed
      This adds a document describing the mount options, proc entries, usage, and
      design of Flash-Friendly File System, namely F2FS.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • ext4: Remove CONFIG_EXT4_FS_XATTR · 939da108
      Tao Ma committed
      Ted has sent out an RFC about removing this feature. Eric and Jan
      confirmed that both RedHat and SUSE enable this feature in all their
      products.  David also said that "As far as I know, it's enabled in all
      Android kernels that use ext4."  So it seems OK for us.
      
      And what's more, as inline data depends on xattr for its implementation,
      and to be frank, I haven't run any tests against inline data enabled while
      xattr is disabled.  So I think we should add inline data and remove this
      config option in the same release.
      
      [ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which
        isn't much in the grand scheme of things.  Since no one seems to be
        testing this configuration except for some automated compile farms, on
        balance we are better off removing this config option, so that it is
        effectively always enabled. -- tytso ]
      
      Cc: David Brown <davidb@codeaurora.org>
      Cc: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  19. 27 Nov, 2012 1 commit
  20. 26 Nov, 2012 1 commit
    • nfsd4: delay filling in write iovec array till after xdr decoding · ffe1137b
      J. Bruce Fields committed
      Our server rejects compounds containing more than one write operation.
      It's unclear whether this is really permitted by the spec; with 4.0,
      it's possibly OK, with 4.1 (which has clearer limits on compound
      parameters), it's probably not OK.  No client that we're aware of has
      ever done this, but in theory it could be useful.
      
      The source of the limitation: we need an array of iovecs to pass to the
      write operation.  In the worst case that array of iovecs could have
      hundreds of elements (the maximum rwsize divided by the page size), so
      it's too big to put on the stack, or in each compound op.  So we instead
      keep a single such array in the compound argument.
      
      We fill in that array at the time we decode the xdr operation.
      
      But we decode every op in the compound before executing any of them.  So
      once we've used that array we can't decode another write.
      
      If we instead delay filling in that array till the time we actually
      perform the write, we can reuse it.
      
      Another option might be to switch to decoding compound ops one at a
      time.  I considered doing that, but it has a number of other side
      effects, and I'd rather fix just this one problem for now.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  21. 17 Nov, 2012 1 commit
  22. 11 Nov, 2012 1 commit
  23. 08 Nov, 2012 1 commit
  24. 03 Nov, 2012 1 commit
  25. 30 Oct, 2012 1 commit
  26. 09 Oct, 2012 1 commit