1. 20 Aug, 2011 1 commit
    • ext4: flush any pending end_io requests before DIO reads w/dioread_nolock · dccaf33f
      Jiaying Zhang authored
      There is a race between ext4 buffered write and direct I/O read with the
      dioread_nolock mount option enabled. The problem is that we clear the
      PageWriteback flag at end_io time but do the
      uninitialized-to-initialized extent conversion later with dioread_nolock.
      If an O_DIRECT read request comes in during this period, ext4 will return
      zeros instead of the recently written data.
      
      This patch checks whether there are any pending uninitialized-to-initialized
      extent conversion requests before doing an O_DIRECT read, to close the race.
      Note that this is just a band-aid fix. The fundamental issue is that we
      clear the PageWriteback flag before we really complete an IO, which is
      error-prone. To fix the fundamental issue, we may need to implement an
      extent tree cache that we can use to look up pending to-be-converted extents.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  2. 19 Aug, 2011 2 commits
  3. 18 Aug, 2011 4 commits
  4. 17 Aug, 2011 13 commits
  5. 16 Aug, 2011 1 commit
    • cifs: demote cERROR in build_path_from_dentry to cFYI · fa71f447
      Jeff Layton authored
      Running the cthon tests on a recent kernel caused this message to pop
      up occasionally:
      
          CIFS VFS: did not end path lookup where expected namelen is 0
      
      Some added debugging showed that namelen and dfsplen were both 0 when
      this occurred. That means that the read_seqretry returned true.
      
      Assuming that the comment inside the if statement is true, this should
      be harmless and just means that we raced with a rename. If that is the
      case, then there's no need for alarm and we can demote this to cFYI.
      
      While we're at it, print the dfsplen too so that we can see what
      happened here if the message pops up during debugging.
      
      Cc: stable@kernel.org
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
  6. 14 Aug, 2011 3 commits
    • ext4: fix nomblk_io_submit option so it correctly converts uninit blocks · 9dd75f1f
      Theodore Ts'o authored
      Bug discovered by Jan Kara:
      
      Finally, commit 1449032b brought back the old IO submission code, but
      apparently it forgot to restore the old handling of uninitialized
      buffers, so we unconditionally call block_write_full_page() without
      specifying an end_io function. So AFAICS we never convert unwritten
      extents to written in some cases. For example, when I mount the fs as

          mount -t ext4 -o nomblk_io_submit,dioread_nolock /dev/ubdb /mnt

      and do
              int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
              char buf[1024];
              memset(buf, 'a', sizeof(buf));
              fallocate(fd, 0, 0, 16384);
              write(fd, buf, sizeof(buf));
      
      I get a file full of zeros (after remounting the filesystem so that
      pagecache is dropped) instead of seeing the first KB contain 'a's.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
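The fragment quoted in the report can be fleshed out into a small user-space reproducer. The following is only a sketch under stated assumptions: the function names `write_after_fallocate` and `first_kb_is_a` are hypothetical, the `fsync()` call is an addition not present in the report, and the file path is whatever the caller passes in.

```c
#define _GNU_SOURCE           /* needed for fallocate() */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Preallocate 16 KiB, then overwrite the first 1 KiB with 'a'.
 * Returns 0 on success, -1 on any failure. */
int write_after_fallocate(const char *path)
{
    char buf[1024];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 'a', sizeof(buf));
    if (fallocate(fd, 0, 0, 16384) != 0 ||
        write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
        fsync(fd) != 0) {     /* fsync is an addition to the reported steps */
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Returns 1 if the first 1 KiB of the file is all 'a',
 * 0 if it is not, and -1 on an I/O error. */
int first_kb_is_a(const char *path)
{
    char buf[1024];
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = read(fd, buf, sizeof(buf));
    close(fd);
    if (n != (ssize_t)sizeof(buf))
        return -1;
    for (size_t i = 0; i < sizeof(buf); i++)
        if (buf[i] != 'a')
            return 0;
    return 1;
}
```

To follow the report, one would call `write_after_fallocate()` on a file under the mount point, remount the filesystem so the page cache is dropped, and then call `first_kb_is_a()`. On an affected kernel the second call would be expected to return 0.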
    • ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. · 32c80b32
      Tao Ma authored
      Setting the EXT4_IO_END_UNWRITTEN flag and incrementing
      i_aiodio_unwritten should be done together, since ext4_end_io_nolock
      always clears the flag and decrements the counter at the same time.
      
      We don't increment i_aiodio_unwritten when setting
      EXT4_IO_END_UNWRITTEN, so it goes negative and causes some processes
      to wait forever.
      
      Part of the patch came from Eric in his e-mail, but it doesn't actually
      fix the problem Michael hit.
      
      http://marc.info/?l=linux-ext4&m=131316851417460&w=2
      
      Reported-and-Tested-by: Michael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
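The pairing rule described in this commit (flag and counter change together, on both the submission and completion side) can be illustrated with a small user-space model. This is a sketch, not the kernel code: the `model_*` types and the `io_end_*` and `dio_read_may_proceed` functions are hypothetical stand-ins, and C11 atomics replace the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define IO_END_UNWRITTEN 0x1u   /* stands in for EXT4_IO_END_UNWRITTEN */

struct model_inode {
    atomic_int aiodio_unwritten;        /* stands in for i_aiodio_unwritten */
};

struct model_io_end {
    unsigned flags;
    struct model_inode *inode;
};

/* Submission side: set the flag and bump the counter as one step.
 * Doing nothing when the flag is already set keeps the two in sync. */
void io_end_mark_unwritten(struct model_io_end *io)
{
    if (!(io->flags & IO_END_UNWRITTEN)) {
        io->flags |= IO_END_UNWRITTEN;
        atomic_fetch_add(&io->inode->aiodio_unwritten, 1);
    }
}

/* Completion side: clear the flag and drop the counter together. */
void io_end_complete(struct model_io_end *io)
{
    if (io->flags & IO_END_UNWRITTEN) {
        int before;

        io->flags &= ~IO_END_UNWRITTEN;
        before = atomic_fetch_sub(&io->inode->aiodio_unwritten, 1);
        assert(before > 0);   /* a non-positive value means a missed increment */
    }
}

/* A direct I/O reader waits until no conversions are pending. */
bool dio_read_may_proceed(struct model_inode *inode)
{
    return atomic_load(&inode->aiodio_unwritten) == 0;
}
```

If the increment in `io_end_mark_unwritten()` were skipped, the decrement on completion would push the counter below zero, and a reader that waits for the counter to reach exactly zero would never proceed, which matches the hang described above.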
    • ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode · 2581fdc8
      Jiaying Zhang authored
      Flush inode's i_completed_io_list before calling ext4_ioend_wait to
      prevent the following deadlock scenario: A page fault happens while
      some process is writing inode A. During the page fault,
      shrink_icache_memory is called, which in turn evicts another inode
      B. Inode B has some pending io_end work so it calls ext4_ioend_wait(),
      which waits for inode B's i_ioend_count to become zero. However, inode
      B's ioend work was queued behind some of inode A's ioend work on the
      same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
      thread on that cpu is processing inode A's ioend work, it tries to
      grab inode A's i_mutex lock. Since inode A's i_mutex lock has been
      held since before the page fault happened, we enter a deadlock.
      
      Also move ext4_flush_completed_IO and ext4_ioend_wait from
      ext4_destroy_inode() to ext4_evict_inode(). During inode deletion,
      ext4_evict_inode() is called before ext4_destroy_inode() and in
      ext4_evict_inode(), we may call ext4_truncate() without holding
      i_mutex lock. As a result, there is a race between flush_completed_IO
      that is called from ext4_ext_truncate() and ext4_end_io_work, which
      may cause corruption on an io_end structure. This change moves
      ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
      to ext4_evict_inode() to resolve the race between ext4_truncate() and
      ext4_end_io_work during inode deletion.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  7. 13 Aug, 2011 3 commits
  8. 12 Aug, 2011 6 commits
    • pnfs: Automatically select blocks & objects layouts · 8cf1fb21
      Boaz Harrosh authored
      Just like files-layout, blocks & objects layouts are part of the
      NFS 4.1 protocol and should be automatically selected if NFS_4_1
      is selected. The small problem is that these depend on other
      Kernel support being present, while files only depends on NFS
      itself.
      
      This patch removes the objects and blocks layouts from the
      user-visible choices, but makes sure they are selected only if
      the subsystems they depend on are present in the kernel.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Acked-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext4: Properly count journal credits for long symlinks · 8c208719
      Eric Sandeen authored
      Commit df5e6223 ("ext4: fix deadlock in ext4_symlink() in ENOSPC
      conditions") recalculated the number of credits needed for a long
      symlink, in the process of splitting it into two transactions.  However,
      the first credit calculation under-counted because if selinux is
      enabled, credits are needed to create the selinux xattr as well.
      
      Overrunning the reservation will result in an OOPS in
      jbd2_journal_dirty_metadata() due to this assert:
      
        J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
      
      Fix this by increasing the reservation size.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
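The effect of an under-counted reservation can be shown with a toy model of a journal handle. This is a sketch only: the credit costs below are made-up numbers rather than the kernel's, all names are hypothetical, and the model returns `false` at the point where the kernel assertion on `h_buffer_credits` would fire instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-operation costs in journal credits (not kernel values). */
enum { CREDITS_SYMLINK_BASE = 8, CREDITS_SECURITY_XATTR = 4 };

struct model_handle {
    int buffer_credits;       /* stands in for handle->h_buffer_credits */
};

/* Reservation that forgets the security xattr. */
int credits_undercounted(void)
{
    return CREDITS_SYMLINK_BASE;
}

/* Larger reservation that also leaves room for the xattr. */
int credits_fixed(void)
{
    return CREDITS_SYMLINK_BASE + CREDITS_SECURITY_XATTR;
}

/* Dirty one metadata buffer. Returns false where the kernel would
 * trip its "credits > 0" assertion. */
bool dirty_metadata(struct model_handle *h)
{
    if (h->buffer_credits <= 0)
        return false;
    h->buffer_credits--;
    return true;
}

/* Dirty as many buffers as the symlink needs; true if the reservation held. */
bool run_symlink(int reserved, bool selinux)
{
    struct model_handle h = { reserved };
    int needed = CREDITS_SYMLINK_BASE + (selinux ? CREDITS_SECURITY_XATTR : 0);

    for (int i = 0; i < needed; i++)
        if (!dirty_metadata(&h))
            return false;
    return true;
}
```

In this model the under-counted reservation is enough only when no security xattr is created; with the xattr it runs out partway through, while the larger reservation covers both cases.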
    • ext3: Properly count journal credits for long symlinks · d2db60df
      Eric Sandeen authored
      Commit ae54870a ("ext3: Fix lock inversion in ext3_symlink()")
      recalculated the number of credits needed for a long symlink, in the
      process of splitting it into two transactions.  However, the first
      credit calculation under-counted because if selinux is enabled, credits
      are needed to create the selinux xattr as well.
      
      Overrunning the reservation will result in an OOPS in
      journal_dirty_metadata() due to this assert:
      
        J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
      
      Fix this by increasing the reservation size.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • move RLIMIT_NPROC check from set_user() to do_execve_common() · 72fa5997
      Vasiliy Kulikov authored
      The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
      check in set_user() to catch the NPROC limit being exceeded via setuid()
      and similar functions.
      
      Before the check, an unprivileged user could greatly exceed the allowed
      number of processes if the program relied on rlimit only.  But the check
      created a new security threat: many poorly written programs simply don't
      check the setuid() return code and believe it cannot fail if executed
      with root privileges.  So, the check is removed in this patch because
      privilege escalations caused by such buggy programs are too frequent.
      
      The NPROC limit can still be enforced in the common code flow of daemons
      spawning user processes.  Most daemons do fork()+setuid()+execve().
      The check introduced in execve() (1) enforces the same limit as in
      setuid() and (2) doesn't create similar security issues.
      
      Neil Brown suggested tracking which specific process has exceeded the
      limit by setting the PF_NPROC_EXCEEDED process flag.  With the change only
      this process would fail on execve(), and other processes' execve()
      behaviour is not changed.
      
      Solar Designer suggested re-checking whether the NPROC limit is still
      exceeded at the moment of execve().  If the process was sleeping for
      days between set*uid() and execve(), and the NPROC counter dropped
      below the limit, a deferred execve() failure because the NPROC limit was
      exceeded days ago would be unexpected.  If the limit is not exceeded
      anymore, we clear the flag on successful calls to execve() and fork().
      
      The flag is also cleared on successful calls to set_user() as the limit
      was exceeded for the previous user, not the current one.
      
      A similar check was introduced in -ow patches (without the process flag).
      
      v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().
      Reviewed-by: James Morris <jmorris@namei.org>
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Acked-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
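The flag-based flow described in this commit can be modelled in user space as follows. This is a sketch under stated assumptions, not the kernel implementation: every `model_*` name is hypothetical, the `is_root` field stands in for the kernel's exemption of the root user, process accounting is reduced to a plain integer, and the use of `-EAGAIN` as the execve() error is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_PF_NPROC_EXCEEDED 0x1u   /* stands in for PF_NPROC_EXCEEDED */

struct model_user {
    int processes;        /* processes currently owned by this user */
    int nproc_limit;      /* stands in for RLIMIT_NPROC */
    bool is_root;         /* root is exempt from the limit in this model */
};

struct model_task {
    unsigned flags;
    struct model_user *user;
};

/* set_user(): never fails on the limit; it only records that the
 * limit was exceeded for the new user, or clears a stale record. */
int model_set_user(struct model_task *t, struct model_user *new_user)
{
    if (new_user->processes >= new_user->nproc_limit && !new_user->is_root)
        t->flags |= MODEL_PF_NPROC_EXCEEDED;
    else
        t->flags &= ~MODEL_PF_NPROC_EXCEEDED;

    /* Move this task's accounting from the old user to the new one. */
    t->user->processes--;
    new_user->processes++;
    t->user = new_user;
    return 0;
}

/* execve(): fail only if this task was flagged AND the limit is
 * still exceeded at the moment of the call. */
int model_execve(struct model_task *t)
{
    if ((t->flags & MODEL_PF_NPROC_EXCEEDED) &&
        t->user->processes > t->user->nproc_limit)
        return -EAGAIN;   /* error code is an assumption of this model */

    /* Below the limit (still or again): stop failing future calls. */
    t->flags &= ~MODEL_PF_NPROC_EXCEEDED;
    return 0;
}
```

The design point the model tries to capture is that the failure moves from setuid(), whose return value is often ignored, to execve(), where a failure simply means the new program does not run.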
    • cifs: Do not set cifs/ntfs acl using a file handle (try #4) · e22906c5
      Shirish Pargaonkar authored
      Set security descriptor using path name instead of a file handle.
      We can't be sure that the file handle has adequate permission to
      set a security descriptor (to modify DACL).
      
      Function set_cifs_acl_by_fid() has been removed: since we can't be
      sure how a file was opened for writing, a valid request can fail
      if the file was not opened with the WRITE_DAC and WRITE_OWNER
      permissions. We could have opted to add WRITE_DAC and WRITE_OWNER
      to file opens and then use that file handle, but adding such
      additional permissions could cause an open to fail.
      
      And it was incorrect to look for read file handle to set a
      security descriptor anyway.
      Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
    • [CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable · 789e6661
      Steve French authored
      Christoph had requested that the stats-related code (in
      CONFIG_CIFS_STATS2) be moved into helpers to make the code flow more
      readable. This patch should help. For example, the following
      section from transport.c
      
                             spin_unlock(&GlobalMid_Lock);
                             atomic_inc(&ses->server->num_waiters);
                             wait_event(ses->server->request_q,
                                        atomic_read(&ses->server->inFlight)
                                          < cifs_max_pending);
                             atomic_dec(&ses->server->num_waiters);
                             spin_lock(&GlobalMid_Lock);
      
      becomes simpler (with the patch below):
                             spin_unlock(&GlobalMid_Lock);
                             cifs_num_waiters_inc(server);
                             wait_event(server->request_q,
                                        atomic_read(&server->inFlight)
                                          < cifs_max_pending);
                             cifs_num_waiters_dec(server);
                             spin_lock(&GlobalMid_Lock);
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      CC: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
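The helper pattern this commit describes, where the conditional compilation lives in one place and call sites stay free of #ifdef, might look roughly like the following in user space. This is a sketch, not the CIFS code: `MODEL_CIFS_STATS2` stands in for CONFIG_CIFS_STATS2, the `model_*` names are hypothetical, and C11 atomics replace the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_CIFS_STATS2   /* stands in for CONFIG_CIFS_STATS2; remove to compile stats out */

struct model_server {
    atomic_int in_flight;
#ifdef MODEL_CIFS_STATS2
    atomic_int num_waiters;
#endif
};

#ifdef MODEL_CIFS_STATS2
static inline void model_num_waiters_inc(struct model_server *s)
{
    atomic_fetch_add(&s->num_waiters, 1);
}
static inline void model_num_waiters_dec(struct model_server *s)
{
    atomic_fetch_sub(&s->num_waiters, 1);
}
static inline int model_num_waiters(struct model_server *s)
{
    return atomic_load(&s->num_waiters);
}
#else
/* With stats disabled the helpers compile to nothing. */
static inline void model_num_waiters_inc(struct model_server *s) { (void)s; }
static inline void model_num_waiters_dec(struct model_server *s) { (void)s; }
static inline int model_num_waiters(struct model_server *s) { (void)s; return 0; }
#endif

/* The call site reads the same whether or not stats are compiled in. */
void model_wait_for_slot(struct model_server *s)
{
    model_num_waiters_inc(s);
    /* ... a real implementation would block here until in_flight
     * drops below the pending-request limit ... */
    model_num_waiters_dec(s);
}
```

Keeping both variants of each helper behind one #ifdef means the transport code needs no conditional blocks of its own, which is the readability gain the commit message points to.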
  9. 11 Aug, 2011 2 commits
  10. 10 Aug, 2011 5 commits