1. 11 May, 2011 1 commit
    • ns: proc files for namespace naming policy. · 6b4e306a
      Eric W. Biederman committed
      Create files under /proc/<pid>/ns/ to allow controlling the
      namespaces of a process.
      
      This addresses three specific problems that can make namespaces hard to
      work with.
      - Namespaces require a dedicated process to pin them in memory.
      - It is not possible to use a namespace unless you are the child
        of the original creator.
      - Namespaces don't have names that userspace can use to talk about
        them.
      
      The namespace files under /proc/<pid>/ns/ can be opened and the
      file descriptor can be used to talk about a specific namespace, and
      to keep the specified namespace alive.
      
      A namespace can be kept alive by either holding the file descriptor
      open or bind mounting the file someplace else, for example:
      mount --bind /proc/self/ns/net /some/filesystem/path
      mount --bind /proc/self/fd/<N> /some/filesystem/path
      
      This allows namespaces to be named with userspace policy.
      
      Making use of these file descriptors requires additional support,
      which will be coming in the following patches.
      Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
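A minimal sketch of the file-descriptor half of this idea, in Python, assuming a Linux system with /proc mounted. The helper name `hold_namespace` is hypothetical and not part of the patch; it only shows that an ordinary open() of the namespace file is what produces the descriptor that refers to the namespace and keeps it alive.

```python
import os


def hold_namespace(path="/proc/self/ns/net"):
    """Open a namespace file read-only and return the file descriptor.

    Hypothetical helper for illustration: per the commit message, the
    namespace stays alive for as long as this descriptor remains open.
    """
    return os.open(path, os.O_RDONLY)


def release_namespace(fd):
    """Close the descriptor, dropping this reference to the namespace."""
    os.close(fd)
```

A caller would keep the value returned by `hold_namespace()` for as long as the namespace is needed and pass it to `release_namespace()` afterwards. The bind-mount variant shown in the commit message needs root privileges and is not sketched here.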
  2. 04 May, 2011 1 commit
    • logfs: initialize superblock entries earlier · cce2c56e
      Linus Torvalds committed
      In particular, s_freeing_list needs to be initialized early, since it is
      used on some of the error paths when mounts fail.  The mapping inode,
      for example, would be initialized and then free'd on an error path
      before s_freeing_list was initialized, but the inode drop operation
      needs the s_freeing_list to be set up.
      
      Normally you'd never see this, because not only is logfs fairly rare,
      but a successful mount will never have any issues.
      Reported-by: werner <w.landgraf@ru.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 03 May, 2011 2 commits
    • UBIFS: seek journal heads to the latest bud in replay · 52c6e6f9
      Artem Bityutskiy committed
      This is the second fix of the following symptom:
      
      UBIFS error (pid 34456): could not find an empty LEB
      
      which sometimes happens after power cuts when we mount the file-system - UBIFS
      refuses it with the above error message which comes from the
      'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test
      with the UBIFS power cut emulation enabled.
      
      Analysis of the problem.
      
      Currently UBIFS replay seeks the journal heads to the last _replayed_ bud.
      But the buds are replayed out-of-order, so the replay basically seeks journal
      heads to the "random" bud belonging to this head, and not to the _last_ one.
      
      The result of this is that the GC head may be seeked to a full LEB with no free
      space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a
      fully or mostly dirty LEB to match the current GC head (because we need to
      garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum).
      So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and
      also fail. As a result - recovery fails and mounting fails.
      
      This patch teaches the replay to initialize the GC heads exactly to the latest
      buds, i.e. the buds which have the largest sequence number in corresponding
      log reference nodes.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: stable@kernel.org
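The difference between the old and new replay behaviour can be sketched with a toy model in Python. This is not UBIFS code: the `Bud` record, its field names, and both function names are assumptions made up for illustration. The first function mimics seeking each head to whichever bud happened to be replayed last, the second seeks it to the bud with the largest sequence number.

```python
from collections import namedtuple

# Hypothetical record: journal head name, LEB number, sequence number.
Bud = namedtuple("Bud", ["jhead", "lnum", "sqnum"])


def heads_after_last_replayed(replay_order):
    """Old behaviour: each head ends up at the bud replayed last."""
    heads = {}
    for bud in replay_order:
        heads[bud.jhead] = bud.lnum
    return heads


def heads_at_latest_bud(buds):
    """New behaviour: each head ends up at its bud with the largest sqnum."""
    latest = {}
    for bud in buds:
        current = latest.get(bud.jhead)
        if current is None or bud.sqnum > current.sqnum:
            latest[bud.jhead] = bud
    return {jhead: bud.lnum for jhead, bud in latest.items()}
```

With buds replayed out of order, for example the GC head's newest bud (LEB 12, sqnum 30) replayed before an older one (LEB 10, sqnum 10), the first function leaves the GC head at LEB 10 while the second picks LEB 12.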
    • UBIFS: do not free write-buffers when in R/O mode · b50b9f40
      Artem Bityutskiy committed
      Currently UBIFS has a small optimization - it frees write-buffers when it is
      re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it
      does not allocate write-buffers as well.
      
      This optimization is nice but it leads to subtle problems and complications
      in recovery, which I can reproduce using the integck test. The symptoms are
      that after a power cut the file-system cannot be mounted if we first mount
      it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints:
      
      UBIFS error (pid 34456): could not find an empty LEB
      
      Analysis of the problem.
      
      When mounting R/W, the replay process sets journal heads to buds [1], but
      when mounting R/O - it does not do this, because the write-buffers are not
      allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the
      same file-system but for the following 2 cases:
      
      1. mounting R/W after a power cut and recovering
      2. mounting R/O after a power cut, re-mounting R/W and running deferred recovery
      
      In the former case, we have journal heads seeked to a bud, in the latter
      case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not
      try to recover the GC LEB by garbage-collecting to the GC head, but we just
      try to find an empty LEB, and there may be no empty LEBs, so we just fail.
      On the other hand, in the former case (mount R/W), we are able to make a GC LEB
      (@c->gc_lnum) by garbage-collecting.
      
      Thus, let's remove this small nice optimization and always allocate
      write-buffers. This should not make too big a difference - we have only 3
      of them, each of max. write unit size, which is usually 2KiB. So this is
      about 6KiB of RAM for the typical case, and only when mounted R/O.
      
      [1]: Note, currently the replay process is setting (seeking) the journal heads
      to _some_ buds, not necessarily to the buds which had been the journal heads
      before the power cut happened. This will be fixed separately.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: stable@kernel.org
  4. 29 Apr, 2011 1 commit
  5. 28 Apr, 2011 3 commits
  6. 27 Apr, 2011 1 commit
  7. 26 Apr, 2011 14 commits
  8. 25 Apr, 2011 2 commits
  9. 24 Apr, 2011 2 commits
    • vfs: get rid of insane dentry hashing rules · dea3667b
      Linus Torvalds committed
      The dentry hashing rules have been really quite complicated for a long
      while, in odd ways.  That made functions like __d_drop() very fragile
      and non-obvious.
      
      In particular, whether a dentry was hashed or not was indicated with an
      explicit DCACHE_UNHASHED bit.  That's despite the fact that the hash
      abstraction that the dentries use actually has an 'is this entry hashed
      or not' model (which is a simple test of the 'pprev' pointer).
      
      The reason that was done is because we used the normal 'is this entry
      unhashed' model to mark whether the dentry had _ever_ been hashed in the
      dentry hash tables, and that logic goes back many years (commit
      b3423415: "dcache: avoid RCU for never-hashed dentries").
      
      That, in turn, meant that __d_drop had totally different unhashing logic
      for the dentry hash table case and for the anonymous dcache case,
      because in order to use the "is this dentry hashed" logic as a flag for
      whether it had ever been on the RCU hash table, we had to unhash such a
      dentry differently so that we'd never think that it wasn't 'unhashed'
      and wouldn't be free'd correctly.
      
      That's just insane.  It made the logic really hard to follow, when there
      were two different kinds of "unhashed" states, and one of them (the one
      that used "list_bl_unhashed()") really had nothing at all to do with
      being unhashed per se, but with a very subtle lifetime rule instead.
      
      So turn all of it around, and make it logical.
      
      Instead of having a DCACHE_UNHASHED bit in d_flags to indicate whether
      the dentry is on the hash chains or not, use the hash chain unhashed
      logic for that.  Suddenly "d_unhashed()" just uses "list_bl_unhashed()",
      and everything makes sense.
      
      And for the lifetime rule, just use an explicit DCACHE_RCUACCESS bit.
      If we ever insert the dentry into the dentry hash table so that it is
      visible to RCU lookup, we mark it DCACHE_RCUACCESS to show that it now
      needs the RCU lifetime rules.  Now suddenly that test at dentry free
      time makes sense too.
      
      And because unhashing now is sane and doesn't depend on where the dentry
      got unhashed from (because the dentry hash chain details don't have
      some subtle side effects), we can re-unify the __d_drop() logic and use
      common code for the unhashing.
      
      Also fix one more open-coded hash chain bit_spin_lock() that I missed in
      the previous chain locking cleanup commit.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
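The split between "is it on a hash chain" and "does it need RCU lifetime rules" can be sketched with a toy model in Python. This is not kernel code: `ToyDentry`, `TOY_RCUACCESS`, and all function names are hypothetical. The `bucket` field stands in for the 'pprev' pointer test, and the flag stands in for the explicit RCU-access bit described above.

```python
TOY_RCUACCESS = 0x1  # illustrative flag value, not the kernel's


class ToyDentry:
    def __init__(self, name):
        self.name = name
        self.flags = 0
        self.bucket = None  # stands in for the 'pprev' pointer


def is_unhashed(dentry):
    """Hashed state comes only from hash-chain membership."""
    return dentry.bucket is None


def hash_dentry(dentry, bucket):
    """Insert into a hash bucket and record that RCU lookup may see it."""
    bucket.append(dentry)
    dentry.bucket = bucket
    dentry.flags |= TOY_RCUACCESS


def drop_dentry(dentry):
    """One unhashing path, independent of where the dentry was hashed."""
    if not is_unhashed(dentry):
        dentry.bucket.remove(dentry)
        dentry.bucket = None


def needs_rcu_free(dentry):
    """Lifetime rule: set once hashed, and it survives later unhashing."""
    return bool(dentry.flags & TOY_RCUACCESS)
```

In this model a dentry that was hashed and then dropped reports `is_unhashed()` as true while `needs_rcu_free()` stays true, which is the separation of the two states that the commit message argues for.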
    • vfs: get rid of 'struct dcache_hash_bucket' abstraction · b07ad996
      Linus Torvalds committed
      It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
      help anything - quite the reverse.  All the users end up having to know
      about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
      etc. So it just makes the code look confusing.
      
      And the cost of it is extra '&b->head' syntactic noise, but more
      importantly it spuriously makes the hash table dentry list look
      different from the per-superblock DCACHE_DISCONNECTED dentry list.
      
      As a result, the code ended up using ad-hoc locking for one case and
      special helper functions for what is really another totally identical
      case in the very same function.
      
      Make it all look and work the same.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 22 Apr, 2011 1 commit
  11. 21 Apr, 2011 4 commits
    • vfs: Pass setxattr(2) flags properly · df7e1303
      Jan Kara committed
      For some reason generic_setxattr() did not pass flags (XATTR_CREATE,
      XATTR_REPLACE) to the filesystem-specific helper. As a result, the
      setxattr(2) syscall simply ignored these flags.
      
      Fix the bug by passing flags correctly.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
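What honouring these flags means can be sketched with a toy in-memory attribute store in Python. This is not the kernel helper: `toy_setxattr` and the dict-based store are hypothetical. The flag values and the EEXIST/ENODATA errors follow the documented setxattr(2) semantics.

```python
import errno

XATTR_CREATE = 0x1   # fail if the attribute already exists
XATTR_REPLACE = 0x2  # fail if the attribute does not exist


def toy_setxattr(store, name, value, flags=0):
    """Set an attribute in a dict, honouring the create/replace flags."""
    exists = name in store
    if flags & XATTR_CREATE and exists:
        raise OSError(errno.EEXIST, "attribute already exists")
    if flags & XATTR_REPLACE and not exists:
        raise OSError(errno.ENODATA, "attribute does not exist")
    store[name] = value
```

If the flags were dropped on the way to this function, as in the bug described above, every call would behave like `flags=0` and silently create or overwrite the attribute.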
    • UBIFS: fix master node recovery · 6e0d9fd3
      Artem Bityutskiy committed
      This patch fixes the following symptoms:
      1. Unmount UBIFS cleanly.
      2. Start mounting UBIFS R/W and have a power cut immediately
      3. Start mounting UBIFS R/O, this succeeds
      4. Try to re-mount UBIFS R/W - this fails immediately or later on,
         because UBIFS will write the master node to the flash area
         which has been written before.
      
      The analysis of the problem:
      
      1. UBIFS is unmounted cleanly, both copies of the master node are clean.
      2. UBIFS is being mounted R/W, starts changing master node copy N1, and
         a power cut happens. The copy N1 becomes corrupted.
      3. UBIFS is being mounted R/O. It notices the copy N1 is corrupted and
         reads copy N2. Copy N2 is clean.
      4. Because of R/O mode, UBIFS cannot recover copy N1.
      5. The mount code (ubifs_mount()) sees that the master node is clean,
         so it decides that no recovery is needed.
      6. We are re-mounting R/W. UBIFS believes no recovery is needed and
         starts updating the master node, but copy N1 is still corrupted
         and was not recovered!
      
      Fix this problem by marking the master node as dirty every time we
      recover it and we are in R/O mode. This forces further recovery, so
      UBIFS cleans up the corruption and recovers the copy N1 when
      re-mounting R/W later.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: stable@kernel.org
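The sequence above can be sketched as a toy state machine in Python. This is a loose model, not UBIFS code: `ToyMaster`, `recover_master`, `remount_rw`, and the `mark_dirty_fix` switch are all hypothetical names used only to contrast the behaviour with and without the fix.

```python
class ToyMaster:
    """Two master node copies plus a 'dirty' (needs recovery) marker."""

    def __init__(self, copy1_ok=True, copy2_ok=True):
        self.copies_ok = [copy1_ok, copy2_ok]
        self.dirty = False


def recover_master(master, read_only, mark_dirty_fix=True):
    """Mount-time recovery of a corrupted master node copy."""
    if all(master.copies_ok):
        return
    if read_only:
        # Flash cannot be written in R/O mode; with the fix we at least
        # remember that recovery is still pending.
        if mark_dirty_fix:
            master.dirty = True
    else:
        master.copies_ok = [True, True]


def remount_rw(master):
    """Return True if both copies are sound after re-mounting R/W."""
    if master.dirty:
        master.copies_ok = [True, True]  # deferred recovery runs
        master.dirty = False
    return all(master.copies_ok)
```

With `mark_dirty_fix=False` the model reproduces step 6 of the analysis: the re-mount sees a clean marker, skips recovery, and proceeds with copy N1 still corrupted.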
    • UBIFS: fix false assertion warning in case of I/O failures · 1a067a22
      Artem Bityutskiy committed
      When UBIFS switches to R/O mode because it detects I/O failures, we may
      still have allocated budget when we unmount, and the assertions which
      verify that we have no budget will fire. But having budget is expected
      in case of I/O failures, so the assertion warnings will be false.
      Suppress them for the I/O failure case.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
    • xfs: fix duplicate message output · 3eff1268
      Dave Chinner committed
      Commit 957935dc ("xfs: fix xfs_debug warnings") broke the logic in
      __xfs_printk(). Instead of only printing one of two possible output
      strings based on whether the fs has a name or not, it outputs both.
      Fix it to only output one message again.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
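The intended either/or logic can be sketched in Python. This is not the XFS function: `toy_xfs_message` is a hypothetical name, and the exact message prefixes are an assumption made for illustration. The point is only that exactly one of the two formats is produced per call.

```python
def toy_xfs_message(fs_name, message):
    """Return a single log line, named or unnamed, never both."""
    if fs_name:
        return f"XFS ({fs_name}): {message}"
    return f"XFS: {message}"
```

The bug described above corresponds to emitting the first format and then falling through to emit the second as well.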
  12. 20 Apr, 2011 4 commits
    • UBIFS: fix false space checking failure · 8c230d9a
      Artem Bityutskiy committed
      This patch fixes a UBIFS mount failure that occurs when debugging support
      is enabled, we are recovering from a power cut, we were first mounted R/O
      and we are re-mounting R/W. In this case we should not assume that the
      amount of free space before and after re-mounting R/W is the same: while
      mounted R/O the file-system is in a non-committed state, so the amount of
      free space is slightly smaller, because we cannot predict the amount of
      free space precisely before we commit.
      
      This patch fixes the issue by skipping the debugging check in case of
      recovery. This issue was reported by Caizhiyong <caizhiyong@huawei.com>
      here: http://thread.gmane.org/gmane.linux.drivers.mtd/34350/focus=34387
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reported-by: Caizhiyong <caizhiyong@huawei.com>
      Cc: stable@kernel.org [2.6.30+]
    • Open with O_CREAT flag set fails to open existing files on non writable directories · 1574dff8
      Sachin Prabhu committed
      An open on a NFS4 share using the O_CREAT flag on an existing file for
      which we have permissions to open but contained in a directory with no
      write permissions will fail with EACCES.
      
      A tcpdump shows that the client had set the open mode to UNCHECKED, which
      indicates that the file should be created if it doesn't exist and that
      encountering an existing file is not an error. Since in this case the
      file exists and can be opened by the user, the NFS server is wrong in
      attempting to check create permissions on the parent directory.
      
      The patch adds a conditional statement to check for create permissions
      only if the file doesn't exist.
      Signed-off-by: Sachin S. Prabhu <sprabhu@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
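The corrected decision can be sketched as a small Python predicate. This is not the nfsd code: `may_open_with_create` and its three boolean parameters are hypothetical, chosen only to show that the directory's create permission is consulted solely when the file is missing.

```python
def may_open_with_create(file_exists, can_open_file, can_create_in_dir):
    """Decide whether an UNCHECKED open-with-create should succeed."""
    if file_exists:
        # Nothing is created, so only the file's own permissions matter.
        return can_open_file
    # The file must be created, so the parent directory must allow it.
    return can_create_in_dir
```

The failing case from the report, an existing and openable file inside a directory without write permission, is `may_open_with_create(True, True, False)`, which this predicate allows.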
    • Btrfs: do some plugging in the submit_bio threads · 211588ad
      Chris Mason committed
      The Btrfs submit bio threads have a small number of
      threads responsible for pushing down bios we've collected
      for a large number of devices.
      
      Since we do all the bios for a single device at once,
      we want to make sure we unplug and send down the bios
      for each device as we're done processing them.
      
      The new plugging API removed the btrfs code to
      unplug while processing bios; this patch adds it back
      using the new API.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • nfsd4: Fix filp leak · a96e5b90
      OGAWA Hirofumi committed
      23fcf2ec (nfsd4: fix oops on lock failure)
      
      The above patch breaks the free path for stp->st_file. If stp was
      inserted into sop->so_stateids, we have to drop the stp->st_file
      refcount, because that refcount is taken whether or not any refcounts
      are taken on the stp->st_file->fi_fds[].
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  13. 19 Apr, 2011 4 commits