1. 22 Jan, 2016 4 commits
  2. 21 Jan, 2016 14 commits
  3. 20 Jan, 2016 2 commits
    • W
      pipe: limit the per-user amount of pages allocated in pipes · 759c0114
      Willy Tarreau committed
      On not-so-small systems, it is possible for a single process to cause an
      OOM condition by filling large pipes with data that are never read. A
      typical process filling 4000 pipes with 1 MB of data will use 4 GB of
      memory. On small systems it may be tricky to set the pipe max size to
      prevent this from happening.
      
      This patch makes it possible to enforce a per-user soft limit above
      which new pipes will be limited to a single page, effectively limiting
      them to 4 kB each, as well as a hard limit above which no new pipes may
      be created for this user. This has the effect of protecting the system
      against memory abuse without hurting other users, and still allowing
      pipes to work correctly though with less data at once.
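The soft/hard policy described above can be modelled in a few lines of user-space C. This is a minimal sketch, not the kernel implementation: `new_pipe_pages()` and `PIPE_DEF_PAGES` are hypothetical names, the 16-page default is derived from the 64 kB default pipe size and 4 kB page size that the message quotes, and treating "above the limit" as `>=` is an assumption. A limit of zero means "disabled", as the message goes on to state.

```c
#include <assert.h>

/* Assumed default: 64 kB pipe / 4 kB page = 16 pages (hypothetical constant). */
#define PIPE_DEF_PAGES 16

/*
 * Decide how many pages a new pipe gets for a user who already has
 * `user_pages` pages tied up in pipes.
 * Returns -1 when the hard limit forbids creating the pipe at all.
 * A limit of 0 disables that limit.
 */
long new_pipe_pages(unsigned long user_pages,
                    unsigned long soft_limit,
                    unsigned long hard_limit)
{
    assert(PIPE_DEF_PAGES >= 1);

    if (hard_limit != 0 && user_pages >= hard_limit)
        return -1;              /* hard limit reached: refuse the new pipe */
    if (soft_limit != 0 && user_pages >= soft_limit)
        return 1;               /* soft limit reached: single-page pipe */
    return PIPE_DEF_PAGES;      /* under both limits: default-sized pipe */
}
```

With a soft limit of 16384 pages (64 MB) and no hard limit, a user below the limit gets 16-page pipes, and every pipe created after that gets one page.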
      
      The limits are controlled by two new sysctls: pipe-user-pages-soft and
      pipe-user-pages-hard. Both may be disabled by setting them to zero. The
      default soft limit allows the default number of FDs per process (1024)
      to create pipes of the default size (64kB), thus reaching a limit of 64MB
      before starting to create only smaller pipes. With 256 processes limited
      to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
      1084 MB of memory allocated for a user. The hard limit is disabled by
      default to avoid breaking existing applications that make intensive use
      of pipes (e.g. for splicing).
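The 1084 MB figure can be checked with a short calculation. The sketch below is illustrative only: `worst_case_pipe_kb()` is a hypothetical helper, and it follows the message's simplification that each file descriptor counts as one pipe and that the soft limit equals one process's worth of default-sized pipes.

```c
#include <assert.h>

/*
 * Worst-case pipe memory (in kB) for one user when only the soft limit
 * is active. The first `fds_per_proc` pipes are full-sized; every pipe
 * after that is limited to a single page.
 */
unsigned long worst_case_pipe_kb(unsigned long procs,
                                 unsigned long fds_per_proc,
                                 unsigned long full_kb,
                                 unsigned long page_kb)
{
    assert(procs > 0);

    unsigned long total_pipes = procs * fds_per_proc;
    unsigned long full_pipes = fds_per_proc;   /* pipes created before the soft limit */

    return full_pipes * full_kb + (total_pipes - full_pipes) * page_kb;
}
```

For 256 processes, 1024 FDs each, 64 kB pipes and 4 kB pages this gives 65536 kB + 1044480 kB = 1110016 kB, which is 1084 MB.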
      
      Reported-by: socketpair@gmail.com
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Mitigates: CVE-2013-4312 (Linux 2.0+)
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • A
      find_filesystem(): simplify comparison · 558041d8
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 19 Jan, 2016 2 commits
    • D
      xfs: log mount failures don't wait for buffers to be released · 85bec546
      Dave Chinner committed
      Recently I've been seeing xfs/051 fail on 1k block size filesystems.
      Trying to trace the events during the test led to the problem going
      away, indicating that it was a race condition that led to this
      ASSERT failure:
      
      XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156
      .....
      [<ffffffff814e1257>] xfs_free_perag+0x87/0xb0
      [<ffffffff814e21b9>] xfs_mountfs+0x4d9/0x900
      [<ffffffff814e5dff>] xfs_fs_fill_super+0x3bf/0x4d0
      [<ffffffff811d8800>] mount_bdev+0x180/0x1b0
      [<ffffffff814e3ff5>] xfs_fs_mount+0x15/0x20
      [<ffffffff811d90a8>] mount_fs+0x38/0x170
      [<ffffffff811f4347>] vfs_kern_mount+0x67/0x120
      [<ffffffff811f7018>] do_mount+0x218/0xd60
      [<ffffffff811f7e5b>] SyS_mount+0x8b/0xd0
      
      When I finally caught it with tracing enabled, I saw that AG 2 had
      an elevated reference count and a buffer was responsible for it. I
      tracked down the specific buffer, and found that it was missing the
      final reference count release that would put it back on the LRU and
      hence be found by xfs_wait_buftarg() calls in the log mount failure
      handling.
      
      The last four traces for the buffer before the assert were (trimmed
      for relevance):
      
      kworker/0:1-5259   xfs_buf_iodone:        hold 2  lock 0 flags ASYNC
      kworker/0:1-5259   xfs_buf_ioerror:       hold 2  lock 0 error -5
      mount-7163	   xfs_buf_lock_done:     hold 2  lock 0 flags ASYNC
      mount-7163	   xfs_buf_unlock:        hold 2  lock 1 flags ASYNC
      
      This is an async write that is completing, so there's nobody waiting
      for it directly.  Hence we call xfs_buf_relse() once all the
      processing is complete. That does:
      
      static inline void xfs_buf_relse(xfs_buf_t *bp)
      {
      	xfs_buf_unlock(bp);
      	xfs_buf_rele(bp);
      }
      
      Now, it's clear that mount is waiting on the buffer lock, and that
      it has been released by xfs_buf_relse() and gained by mount. This is
      expected, because at this point the mount process is in
      xfs_buf_delwri_submit() waiting for all the IO it submitted to
      complete.
      
      The mount process, however, is waiting on the lock for the buffer
      because it is in xfs_buf_delwri_submit(). This waits for IO
      completion, but it doesn't wait for the buffer reference owned by
      the IO to go away. The mount process collects all the completions,
      fails the log recovery, and the higher level code then calls
      xfs_wait_buftarg() to free all the remaining buffers in the
      filesystem.
      
      The issue is that on unlocking the buffer, the scheduler has decided
      that the mount process has higher priority than the kworker
      thread that is running the IO completion, and so immediately
      switched contexts to the mount process from the semaphore unlock
      code, hence preventing the kworker thread from finishing the IO
      completion and releasing the IO reference to the buffer.
      
      Hence by the time that xfs_wait_buftarg() is run, the buffer still
      has an active reference and so isn't on the LRU list that the
      function walks to free the remaining buffers. Hence we miss that
      buffer and continue onwards to tear down the mount structures,
      at which time we find a stray reference count on the perag
      structure. On a non-debug kernel, this will be ignored and the
      structure torn down and freed. Hence when the kworker thread is then
      rescheduled and the buffer released and freed, it will access a
      freed perag structure.
      
      The problem here is that when the log mount fails, we still need to
      quiesce the log to ensure that the IO workqueues have returned to
      idle before we run xfs_wait_buftarg(). By synchronising the
      workqueues, we ensure that all IO completions are fully processed,
      not just to the point where buffers have been unlocked. This ensures
      we don't end up in the situation above.
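The ordering problem and the fix described above can be replayed deterministically in user space. The sketch below is a hypothetical, single-threaded simplification and not XFS code: `struct model_buf` and the `model_*` functions are illustrative names, the hold count starts at 1 (only the IO reference) rather than the 2 shown in the trace, and "flushing the workqueue" is modelled as simply letting the interrupted completion finish.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a buffer; NOT the real struct xfs_buf. */
struct model_buf {
    int  hold;     /* reference count; the in-flight IO owns one reference */
    bool locked;
    bool on_lru;   /* only unreferenced buffers sit on the LRU */
};

/* First half of the release: drop the lock but keep the IO reference. */
static void model_unlock(struct model_buf *bp)
{
    bp->locked = false;
}

/* Second half: drop the reference; the last release puts the buffer on the LRU. */
static void model_rele(struct model_buf *bp)
{
    assert(bp->hold > 0);
    if (--bp->hold == 0)
        bp->on_lru = true;
}

/* Model of the LRU walk: it can only reclaim buffers that are on the LRU. */
static bool model_wait_buftarg(struct model_buf *bp)
{
    if (!bp->on_lru)
        return false;           /* buffer missed: reference still held */
    bp->on_lru = false;
    return true;
}

/*
 * Replay the failure path. Returns true if teardown reclaimed the buffer.
 * flush_first models synchronising the IO workqueues before the LRU walk.
 */
bool model_mount_failure(bool flush_first)
{
    struct model_buf bp = { .hold = 1, .locked = true, .on_lru = false };

    model_unlock(&bp);          /* kworker completes IO and unlocks ... */
    /* ... and is preempted here, before dropping its reference. */

    if (flush_first)
        model_rele(&bp);        /* the flush lets the kworker finish */

    return model_wait_buftarg(&bp);
}
```

In this model `model_mount_failure(false)` leaves the buffer unreclaimed, which corresponds to the stray perag reference, while `model_mount_failure(true)` reclaims it.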
      
      cc: <stable@vger.kernel.org> # 3.18
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • D
      Revert "xfs: clear PF_NOFREEZE for xfsaild kthread" · 3e85286e
      Dave Chinner committed
      This reverts commit 24ba16bb as it
      prevents machines from suspending. This regression occurs when the
      xfsaild is idle on entry to suspend, and so there's no activity to
      wake it from its idle sleep and hence see that it is supposed to
      freeze. Hence the freezer times out waiting for it and suspend is
      cancelled.
      
      There is no obvious fix for this short of freezing the filesystem
      properly, so revert this change for now.
      
      cc: <stable@vger.kernel.org> # 4.4
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
  5. 17 Jan, 2016 1 commit
  6. 16 Jan, 2016 17 commits