1. 14 3月, 2011 1 次提交
    • A
      compat breakage in preadv() and pwritev() · 586ce098
      Al Viro 提交于
      Fix for a dumb preadv()/pwritev() compat bug - unlike the native
      variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g.
      on pipe the native preadv() will fail with -ESPIPE and compat one will
      act as readv() and succeed.  Not critical, but it's a clear bug with trivial
      fix.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      586ce098
  2. 10 3月, 2011 10 次提交
  3. 09 3月, 2011 2 次提交
  4. 08 3月, 2011 2 次提交
    • A
      unfuck proc_sysctl ->d_compare() · dfef6dcd
      Al Viro 提交于
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there's no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      Said that, I really suspect that this check can be simply killed.
      Nick?
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dfef6dcd
    • J
      nfsd4: fix bad pointer on failure to find delegation · 32b007b4
      J. Bruce Fields 提交于
      In case of a nonempty list, the return on error here is obviously bogus;
      it ends up being a pointer to the list head instead of to any valid
      delegation on the list.
      
      In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
      then renew_client may oops, and it may take an embarassingly long time to
      figure out why.  Facepalm.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
      IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200
      ...
      
      Cc: stable@kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      32b007b4
  5. 05 3月, 2011 3 次提交
    • N
      nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) · e9e3d724
      Neil Horman 提交于
      The "bad_page()" page allocator sanity check was reported recently (call
      chain as follows):
      
        bad_page+0x69/0x91
        free_hot_cold_page+0x81/0x144
        skb_release_data+0x5f/0x98
        __kfree_skb+0x11/0x1a
        tcp_ack+0x6a3/0x1868
        tcp_rcv_established+0x7a6/0x8b9
        tcp_v4_do_rcv+0x2a/0x2fa
        tcp_v4_rcv+0x9a2/0x9f6
        do_timer+0x2df/0x52c
        ip_local_deliver+0x19d/0x263
        ip_rcv+0x539/0x57c
        netif_receive_skb+0x470/0x49f
        :virtio_net:virtnet_poll+0x46b/0x5c5
        net_rx_action+0xac/0x1b3
        __do_softirq+0x89/0x133
        call_softirq+0x1c/0x28
        do_softirq+0x2c/0x7d
        do_IRQ+0xec/0xf5
        default_idle+0x0/0x50
        ret_from_intr+0x0/0xa
        default_idle+0x29/0x50
        cpu_idle+0x95/0xb8
        start_kernel+0x220/0x225
        _sinittext+0x22f/0x236
      
      It occurs because an skb with a fraglist was freed from the tcp
      retransmit queue when it was acked, but a page on that fraglist had
      PG_Slab set (indicating it was allocated from the Slab allocator (which
      means the free path above can't safely free it via put_page.
      
      We tracked this back to an nfsv4 setacl operation, in which the nfs code
      attempted to fill convert the passed in buffer to an array of pages in
      __nfs4_proc_set_acl, which gets used by the skb->frags list in
      xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
      to a page struct via virt_to_page, but the vfs allocates the buffer via
      kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
      kmalloc and free it later in the tcp ack path with put_page, so we need
      to either:
      
      1) ensure that when we create the list of pages, no page struct has
         PG_Slab set
      
       or
      
      2) not use a page list to send this data
      
      Given that these buffers can be multiple pages and arbitrarily sized, I
      think (1) is the right way to go.  I've written the below patch to
      allocate a page from the buddy allocator directly and copy the data over
      to it.  This ensures that we have a put_page free-able page for every
      entry that winds up on an skb frag list, so it can be safely freed when
      the frame is acked.  We do a put page on each entry after the
      rpc_call_sync call so as to drop our own reference count to the page,
      leaving only the ref count taken by tcp_sendpages.  This way the data
      will be properly freed when the ack comes in
      
      Successfully tested by myself to solve the above oops.
      
      Note, as this is the result of a setacl operation that exceeded a page
      of data, I think this amounts to a local DOS triggerable by an
      uprivlidged user, so I'm CCing security on this as well.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Trond Myklebust <Trond.Myklebust@netapp.com>
      CC: security@kernel.org
      CC: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9e3d724
    • S
      ceph: no .snap inside of snapped namespace · 455cec0a
      Sage Weil 提交于
      Otherwise you can do things like
      
      # mkdir .snap/foo
      # cd .snap/foo/.snap
      # ls
      <badness>
      Signed-off-by: NSage Weil <sage@newdream.net>
      455cec0a
    • A
      minimal fix for do_filp_open() race · 1858efd4
      Al Viro 提交于
      failure exits on the no-O_CREAT side of do_filp_open() merge with
      those of O_CREAT one; unfortunately, if do_path_lookup() returns
      -ESTALE, we'll get out_filp:, notice that we are about to return
      -ESTALE without having trying to create the sucker with LOOKUP_REVAL
      and jump right into the O_CREAT side of code.  And proceed to try
      and create a file.  Usually that'll fail with -ESTALE again, but
      we can race and get that attempt of pathname resolution to succeed.
      
      open() without O_CREAT really shouldn't end up creating files, races
      or not.  The real fix is to rearchitect the whole do_filp_open(),
      but for now splitting the failure exits will do.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1858efd4
  6. 04 3月, 2011 3 次提交
  7. 03 3月, 2011 9 次提交
  8. 02 3月, 2011 3 次提交
    • J
      ext2: Fix link count corruption under heavy link+rename load · e8a80c6f
      Josh Hunt 提交于
      vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
      i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
      it as reported and analyzed by Josh.
      
      In fact, there is no good reason to mess with i_nlink of the moved file.
      We did it presumably to simulate linking into the new directory and unlinking
      from an old one. But the practical effect of this is disputable because fsck
      can possibly treat file as being properly linked into both directories without
      writing any error which is confusing. So we just stop increment-decrement
      games with i_nlink which also fixes the corruption.
      
      CC: stable@kernel.org
      CC: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NJosh Hunt <johunt@akamai.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      e8a80c6f
    • A
      xfs: zero proper structure size for geometry calls · af24ee9e
      Alex Elder 提交于
      Commit 493f3358 added this call to
      xfs_fs_geometry() in order to avoid passing kernel stack data back
      to user space:
      
      +       memset(geo, 0, sizeof(*geo));
      
      Unfortunately, one of the callers of that function passes the
      address of a smaller data type, cast to fit the type that
      xfs_fs_geometry() requires.  As a result, this can happen:
      
      Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
      in: f87aca93
      
      Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358+ #1
      Call Trace:
      
      [<c12991ac>] ? panic+0x50/0x150
      [<c102ed71>] ? __stack_chk_fail+0x10/0x18
      [<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]
      
      Fix this by fixing that one caller to pass the right type and then
      copy out the subset it is interested in.
      
      Note: This patch is an alternative to one originally proposed by
      Eric Sandeen.
      Reported-by: NJeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Tested-by: NJeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
      af24ee9e
    • R
      nilfs2: fix regression that i-flag is not set on changeless checkpoints · 72746ac6
      Ryusuke Konishi 提交于
      According to the report from Jiro SEKIBA titled "regression in
      2.6.37?"  (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and
      later kernels, lscp command no longer displays "i" flag on checkpoints
      that snapshot operations or garbage collection created.
      
      This is a regression of nilfs2 checkpointing function, and it's
      critical since it broke behavior of a part of nilfs2 applications.
      For instance, snapshot manager of TimeBrowse gets to create
      meaningless snapshots continuously; snapshot creation triggers another
      checkpoint, but applications cannot distinguish whether the new
      checkpoint contains meaningful changes or not without the i-flag.
      
      This patch fixes the regression and brings that application behavior
      back to normal.
      Reported-by: NJiro SEKIBA <jir@unicus.jp>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: NJiro SEKIBA <jir@unicus.jp>
      Cc: stable <stable@kernel.org>  [2.6.37]
      72746ac6
  9. 01 3月, 2011 1 次提交
  10. 26 2月, 2011 5 次提交
    • J
      aio: fix race between io_destroy() and io_submit() · 7137c6bd
      Jan Kara 提交于
      A race can occur when io_submit() races with io_destroy():
      
       CPU1						CPU2
      io_submit()
        do_io_submit()
          ...
          ctx = lookup_ioctx(ctx_id);
      						io_destroy()
          Now do_io_submit() holds the last reference to ctx.
          ...
          queue new AIO
          put_ioctx(ctx) - frees ctx with active AIOs
      
      We solve this issue by checking whether ctx is being destroyed in AIO
      submission path after adding new AIO to ctx.  Then we are guaranteed that
      either io_destroy() waits for new AIO or we see that ctx is being
      destroyed and bail out.
      
      Cc: Nick Piggin <npiggin@kernel.dk>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7137c6bd
    • N
      aio: fix rcu ioctx lookup · 3bd9a5d7
      Nick Piggin 提交于
      aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.
      
      lookup_ioctx doesn't implement the rcu lookup pattern properly.
      rcu_read_lock does not prevent refcount going to zero, so we might take
      a refcount on a zero count ioctx.
      
      Fix the bug by atomically testing for zero refcount before incrementing.
      
      [jack@suse.cz: added comment into the code]
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3bd9a5d7
    • T
      ldm: corrupted partition table can cause kernel oops · 294f6cf4
      Timo Warns 提交于
      The kernel automatically evaluates partition tables of storage devices.
      The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
      a bug that causes a kernel oops on certain corrupted LDM partitions.  A
      kernel subsystem seems to crash, because, after the oops, the kernel no
      longer recognizes newly connected storage devices.
      
      The patch changes ldm_parse_vmdb() to Validate the value of vblk_size.
      Signed-off-by: NTimo Warns <warns@pre-sense.de>
      Cc: Eugene Teo <eugeneteo@kernel.sg>
      Acked-by: NRichard Russon <ldm@flatcap.org>
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      294f6cf4
    • D
      epoll: prevent creating circular epoll structures · 22bacca4
      Davide Libenzi 提交于
      In several places, an epoll fd can call another file's ->f_op->poll()
      method with ep->mtx held.  This is in general unsafe, because that other
      file could itself be an epoll fd that contains the original epoll fd.
      
      The code defends against this possibility in its own ->poll() method using
      ep_call_nested, but there are several other unsafe calls to ->poll
      elsewhere that can be made to deadlock.  For example, the following simple
      program causes the call in ep_insert recursively call the original fd's
      ->poll, leading to deadlock:
      
       #include <unistd.h>
       #include <sys/epoll.h>
      
       int main(void) {
           int e1, e2, p[2];
           struct epoll_event evt = {
               .events = EPOLLIN
           };
      
           e1 = epoll_create(1);
           e2 = epoll_create(2);
           pipe(p);
      
           epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt);
           epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt);
           write(p[1], p, sizeof p);
           epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);
      
           return 0;
       }
      
      On insertion, check whether the inserted file is itself a struct epoll,
      and if so, do a recursive walk to detect whether inserting this file would
      create a loop of epoll structures, which could lead to deadlock.
      
      [nelhage@ksplice.com: Use epmutex to serialize concurrent inserts]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NNelson Elhage <nelhage@ksplice.com>
      Reported-by: NNelson Elhage <nelhage@ksplice.com>
      Tested-by: NNelson Elhage <nelhage@ksplice.com>
      Cc: <stable@kernel.org>		[2.6.34+, possibly earlier]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22bacca4
    • A
      afs: Fix oops in afs_unlink_writeback · f129ccc9
      Anton Blanchard 提交于
      I'm seeing the following oops when testing afs:
      
        Unable to handle kernel paging request for data at address 0x00000008
        ...
        NIP [c0000000003393b0] .afs_unlink_writeback+0x38/0xc0
        LR [c00000000033987c] .afs_put_writeback+0x98/0xec
        Call Trace:
        [c00000000345f600] [c00000000033987c] .afs_put_writeback+0x98/0xec
        [c00000000345f690] [c00000000033ae80] .afs_write_begin+0x6a4/0x75c
        [c00000000345f790] [c00000000012b77c] .generic_file_buffered_write+0x148/0x320
        [c00000000345f8d0] [c00000000012e1b8] .__generic_file_aio_write+0x37c/0x3e4
        [c00000000345f9d0] [c00000000012e2a8] .generic_file_aio_write+0x88/0xfc
        [c00000000345fa90] [c0000000003390a8] .afs_file_write+0x10c/0x178
        [c00000000345fb40] [c000000000188788] .do_sync_write+0xc4/0x128
        [c00000000345fcc0] [c000000000189658] .vfs_write+0xe8/0x1d8
        [c00000000345fd70] [c000000000189884] .SyS_write+0x68/0xb0
        [c00000000345fe30] [c000000000008564] syscall_exit+0x0/0x40
      
      afs_write_begin hits an error and calls afs_unlink_writeback. In there
      we do list_del_init on an uninitialised list.
      
      The patch below initialises ->link when creating the afs_writeback struct.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f129ccc9
  11. 25 2月, 2011 1 次提交
    • M
      fuse: fix truncate after open · 8d56addd
      Miklos Szeredi 提交于
      Commit e1181ee6 "vfs: pass struct file to do_truncate on O_TRUNC
      opens" broke the behavior of open(O_TRUNC|O_RDONLY) in fuse.  Fuse
      assumed that when called from open, a truncate() will be done, not an
      ftruncate().
      
      Fix by restoring the old behavior, based on the ATTR_OPEN flag.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      8d56addd