1. 01 11月, 2016 1 次提交
  2. 28 10月, 2016 1 次提交
    • C
      block: better op and flags encoding · ef295ecf
      Christoph Hellwig 提交于
      Now that we don't need the common flags to overflow outside the range
      of a 32-bit type we can encode them the same way for both the bio and
      request fields.  This in addition allows us to place the operation
      first (and make some room for more ops while we're at it) and to
      stop having to shift around the operation values.
      
      In addition this allows passing around only one value in the block layer
      instead of two (and eventuall also in the file systems, but we can do
      that later) and thus clean up a lot of code.
      
      Last but not least this allows decreasing the size of the cmd_flags
      field in struct request to 32-bits.  Various functions passing this
      value could also be updated, but I'd like to avoid the churn for now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      ef295ecf
  3. 18 10月, 2016 1 次提交
  4. 15 10月, 2016 6 次提交
  5. 14 10月, 2016 12 次提交
  6. 13 10月, 2016 11 次提交
  7. 12 10月, 2016 8 次提交
    • M
      fs: use mapping_set_error instead of opencoded set_bit · 5114a97a
      Michal Hocko 提交于
      The mapping_set_error() helper sets the correct AS_ flag for the mapping
      so there is no reason to open code it.  Use the helper directly.
      
      [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
      Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5114a97a
    • M
      treewide: remove redundant #include <linux/kconfig.h> · 97139d4a
      Masahiro Yamada 提交于
      Kernel source files need not include <linux/kconfig.h> explicitly
      because the top Makefile forces to include it with:
      
        -include $(srctree)/include/linux/kconfig.h
      
      This commit removes explicit includes except the following:
      
        * arch/s390/include/asm/facilities_src.h
        * tools/testing/radix-tree/linux/kernel.h
      
      These two are used for host programs.
      
      Link: http://lkml.kernel.org/r/1473656164-11929-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97139d4a
    • M
      pipe: cap initial pipe capacity according to pipe-max-size limit · 086e774a
      Michael Kerrisk (man-pages) 提交于
      This is a patch that provides behavior that is more consistent, and
      probably less surprising to users. I consider the change optional, and
      welcome opinions about whether it should be applied.
      
      By default, pipes are created with a capacity of 64 kiB.  However,
      /proc/sys/fs/pipe-max-size may be set smaller than this value.  In this
      scenario, an unprivileged user could thus create a pipe whose initial
      capacity exceeds the limit. Therefore, it seems logical to cap the
      initial pipe capacity according to the value of pipe-max-size.
      
      The test program shown earlier in this patch series can be used to
      demonstrate the effect of the change brought about with this patch:
      
          # cat /proc/sys/fs/pipe-max-size
          1048576
          # sudo -u mtk ./test_F_SETPIPE_SZ 1
          Initial pipe capacity: 65536
          # echo 10000 > /proc/sys/fs/pipe-max-size
          # cat /proc/sys/fs/pipe-max-size
          16384
          # sudo -u mtk ./test_F_SETPIPE_SZ 1
          Initial pipe capacity: 16384
          # ./test_F_SETPIPE_SZ 1
          Initial pipe capacity: 65536
      
      The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size
      caps the initial allocation for a new pipe for unprivileged users, but
      not for privileged users.
      
      Link: http://lkml.kernel.org/r/31dc7064-2a17-9c5b-1df1-4e3012ee992c@gmail.comSigned-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: NVegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: <socketpair@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      086e774a
    • M
      pipe: make account_pipe_buffers() return a value, and use it · 9c87bcf0
      Michael Kerrisk (man-pages) 提交于
      This is an optional patch, to provide a small performance
      improvement.  Alter account_pipe_buffers() so that it returns the
      new value in user->pipe_bufs. This means that we can refactor
      too_many_pipe_buffers_soft() and too_many_pipe_buffers_hard() to
      avoid the costs of repeated use of atomic_long_read() to get the
      value user->pipe_bufs.
      
      Link: http://lkml.kernel.org/r/93e5f193-1e5e-3e1f-3a20-eae79b7e1310@gmail.comSigned-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: NVegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: <socketpair@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c87bcf0
    • M
      pipe: fix limit checking in alloc_pipe_info() · a005ca0e
      Michael Kerrisk (man-pages) 提交于
      The limit checking in alloc_pipe_info() (used by pipe(2) and when
      opening a FIFO) has the following problems:
      
      (1) When checking capacity required for the new pipe, the checks against
          the limit in /proc/sys/fs/pipe-user-pages-{soft,hard} are made
          against existing consumption, and exclude the memory required for
          the new pipe capacity. As a consequence: (1) the memory allocation
          throttling provided by the soft limit does not kick in quite as
          early as it should, and (2) the user can overrun the hard limit.
      
      (2) As currently implemented, accounting and checking against the limits
          is done as follows:
      
          (a) Test whether the user has exceeded the limit.
          (b) Make new pipe buffer allocation.
          (c) Account new allocation against the limits.
      
          This is racey. Multiple processes may pass point (a) simultaneously,
          and then allocate pipe buffers that are accounted for only in step
          (c).  The race means that the user's pipe buffer allocation could be
          pushed over the limit (by an arbitrary amount, depending on how
          unlucky we were in the race). [Thanks to Vegard Nossum for spotting
          this point, which I had missed.]
      
      This patch addresses the above problems as follows:
      
      * Alter the checks against limits to include the memory required for the
        new pipe.
      * Re-order the accounting step so that it precedes the buffer allocation.
        If the accounting step determines that a limit has been reached, revert
        the accounting and cause the operation to fail.
      
      Link: http://lkml.kernel.org/r/8ff3e9f9-23f6-510c-644f-8e70cd1c0bd9@gmail.comSigned-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: NVegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: <socketpair@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a005ca0e
    • M
      pipe: simplify logic in alloc_pipe_info() · 09b4d199
      Michael Kerrisk (man-pages) 提交于
      Replace an 'if' block that covers most of the code in this function
      with a 'goto'. This makes the code a little simpler to read, and also
      simplifies the next patch (fix limit checking in alloc_pipe_info())
      
      Link: http://lkml.kernel.org/r/aef030c1-0257-98a9-4988-186efa48530c@gmail.comSigned-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: NVegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: <socketpair@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09b4d199
    • M
      pipe: fix limit checking in pipe_set_size() · b0b91d18
      Michael Kerrisk (man-pages) 提交于
      The limit checking in pipe_set_size() (used by fcntl(F_SETPIPE_SZ))
      has the following problems:
      
      (1) When increasing the pipe capacity, the checks against the limits in
          /proc/sys/fs/pipe-user-pages-{soft,hard} are made against existing
          consumption, and exclude the memory required for the increased pipe
          capacity. The new increase in pipe capacity can then push the total
          memory used by the user for pipes (possibly far) over a limit. This
          can also trigger the problem described next.
      
      (2) The limit checks are performed even when the new pipe capacity is
          less than the existing pipe capacity. This can lead to problems if a
          user sets a large pipe capacity, and then the limits are lowered,
          with the result that the user will no longer be able to decrease the
          pipe capacity.
      
      (3) As currently implemented, accounting and checking against the
          limits is done as follows:
      
          (a) Test whether the user has exceeded the limit.
          (b) Make new pipe buffer allocation.
          (c) Account new allocation against the limits.
      
          This is racey. Multiple processes may pass point (a)
          simultaneously, and then allocate pipe buffers that are accounted
          for only in step (c).  The race means that the user's pipe buffer
          allocation could be pushed over the limit (by an arbitrary amount,
          depending on how unlucky we were in the race). [Thanks to Vegard
          Nossum for spotting this point, which I had missed.]
      
      This patch addresses the above problems as follows:
      
      * Perform checks against the limits only when increasing a pipe's
        capacity; an unprivileged user can always decrease a pipe's capacity.
      * Alter the checks against limits to include the memory required for
        the new pipe capacity.
      * Re-order the accounting step so that it precedes the buffer
        allocation. If the accounting step determines that a limit has
        been reached, revert the accounting and cause the operation to fail.
      
      The program below can be used to demonstrate problems 1 and 2, and the
      effect of the fix. The program takes one or more command-line arguments.
      The first argument specifies the number of pipes that the program should
      create. The remaining arguments are, alternately, pipe capacities that
      should be set using fcntl(F_SETPIPE_SZ), and sleep intervals (in
      seconds) between the fcntl() operations. (The sleep intervals allow the
      possibility to change the limits between fcntl() operations.)
      
      Problem 1
      =========
      
      Using the test program on an unpatched kernel, we first set some
      limits:
      
          # echo 0 > /proc/sys/fs/pipe-user-pages-soft
          # echo 1000000000 > /proc/sys/fs/pipe-max-size
          # echo 10000 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB
      
      Then show that we can set a pipe with capacity (100MB) that is
      over the hard limit
      
          # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
          Initial pipe capacity: 65536
              Loop 1: set pipe capacity to 100000000 bytes
                  F_SETPIPE_SZ returned 134217728
      
      Now set the capacity to 100MB twice. The second call fails (which is
      probably surprising to most users, since it seems like a no-op):
      
          # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000 0 100000000
          Initial pipe capacity: 65536
              Loop 1: set pipe capacity to 100000000 bytes
                  F_SETPIPE_SZ returned 134217728
              Loop 2: set pipe capacity to 100000000 bytes
                  Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted
      
      With a patched kernel, setting a capacity over the limit fails at the
      first attempt:
      
          # echo 0 > /proc/sys/fs/pipe-user-pages-soft
          # echo 1000000000 > /proc/sys/fs/pipe-max-size
          # echo 10000 > /proc/sys/fs/pipe-user-pages-hard
          # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
          Initial pipe capacity: 65536
              Loop 1: set pipe capacity to 100000000 bytes
                  Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted
      
      There is a small chance that the change to fix this problem could
      break user-space, since there are cases where fcntl(F_SETPIPE_SZ)
      calls that previously succeeded might fail. However, the chances are
      small, since (a) the pipe-user-pages-{soft,hard} limits are new (in
      4.5), and the default soft/hard limits are high/unlimited.  Therefore,
      it seems warranted to make these limits operate more precisely (and
      behave more like what users probably expect).
      
      Problem 2
      =========
      
      Running the test program on an unpatched kernel, we first set some limits:
      
          # getconf PAGESIZE
          4096
          # echo 0 > /proc/sys/fs/pipe-user-pages-soft
          # echo 1000000000 > /proc/sys/fs/pipe-max-size
          # echo 10000 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB
      
      Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe,
      first setting a pipe capacity (10MB), sleeping for a few seconds,
      during which time the hard limit is lowered, and then set pipe
      capacity to a smaller amount (5MB):
      
          # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
          [1] 748
          # Initial pipe capacity: 65536
              Loop 1: set pipe capacity to 10000000 bytes
                  F_SETPIPE_SZ returned 16777216
                  Sleeping 15 seconds
      
          # echo 1000 > /proc/sys/fs/pipe-user-pages-hard      # 4.096 MB
          #     Loop 2: set pipe capacity to 5000000 bytes
                  Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted
      
      In this case, the user should be able to lower the limit.
      
      With a kernel that has the patch below, the second fcntl()
      succeeds:
      
          # echo 0 > /proc/sys/fs/pipe-user-pages-soft
          # echo 1000000000 > /proc/sys/fs/pipe-max-size
          # echo 10000 > /proc/sys/fs/pipe-user-pages-hard
          # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
          [1] 3215
          # Initial pipe capacity: 65536
          #     Loop 1: set pipe capacity to 10000000 bytes
                  F_SETPIPE_SZ returned 16777216
                  Sleeping 15 seconds
      
          # echo 1000 > /proc/sys/fs/pipe-user-pages-hard
      
          #     Loop 2: set pipe capacity to 5000000 bytes
                  F_SETPIPE_SZ returned 8388608
      
      8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
      
      /* test_F_SETPIPE_SZ.c
      
         (C) 2016, Michael Kerrisk; licensed under GNU GPL version 2 or later
      
         Test operation of fcntl(F_SETPIPE_SZ) for setting pipe capacity
         and interactions with limits defined by /proc/sys/fs/pipe-* files.
      */
      
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <unistd.h>
      
      int
      main(int argc, char *argv[])
      {
          int (*pfd)[2];
          int npipes;
          int pcap, rcap;
          int j, p, s, stime, loop;
      
          if (argc < 2) {
              fprintf(stderr, "Usage: %s num-pipes "
                      "[pipe-capacity sleep-time]...\n", argv[0]);
              exit(EXIT_FAILURE);
          }
      
          npipes = atoi(argv[1]);
      
          pfd = calloc(npipes, sizeof (int [2]));
          if (pfd == NULL) {
              perror("calloc");
              exit(EXIT_FAILURE);
          }
      
          for (j = 0; j < npipes; j++) {
              if (pipe(pfd[j]) == -1) {
                  fprintf(stderr, "Loop %d: pipe() failed: ", j);
                  perror("pipe");
                  exit(EXIT_FAILURE);
              }
          }
      
          printf("Initial pipe capacity: %d\n", fcntl(pfd[0][0], F_GETPIPE_SZ));
      
          for (j = 2; j < argc; j += 2 ) {
              loop = j / 2;
              pcap = atoi(argv[j]);
              printf("    Loop %d: set pipe capacity to %d bytes\n", loop, pcap);
      
              for (p = 0; p < npipes; p++) {
                  s = fcntl(pfd[p][0], F_SETPIPE_SZ, pcap);
                  if (s == -1) {
                      fprintf(stderr, "        Loop %d, pipe %d: F_SETPIPE_SZ "
                              "failed: ", loop, p);
                      perror("fcntl");
                      exit(EXIT_FAILURE);
                  }
      
                  if (p == 0) {
                      printf("        F_SETPIPE_SZ returned %d\n", s);
                      rcap = s;
                  } else {
                      if (s != rcap) {
                          fprintf(stderr, "        Loop %d, pipe %d: F_SETPIPE_SZ "
                                  "unexpected return: %d\n", loop, p, s);
                          exit(EXIT_FAILURE);
                      }
                  }
      
                  stime = (j + 1 < argc) ? atoi(argv[j + 1]) : 0;
                  if (stime > 0) {
                      printf("        Sleeping %d seconds\n", stime);
                      sleep(stime);
                  }
              }
          }
      
          exit(EXIT_SUCCESS);
      }
      
      8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
      
      Patch history:
      
      v2
         * Switch order of test in 'if' statement to avoid function call
            (to capability()) in normal path. [This is a fix to a preexisting
            wart in the code. Thanks to Willy Tarreau]
          * Perform (size > pipe_max_size) check before calling
            account_pipe_buffers().  [Thanks to Vegard Nossum]
            Quoting Vegard:
      
              The potential problem happens if the user passes a very large number
              which will overflow pipe->user->pipe_bufs.
      
              On 32-bit, sizeof(int) == sizeof(long), so if they pass arg = INT_MAX
              then round_pipe_size() returns INT_MAX. Although it's true that the
              accounting is done in terms of pages and not bytes, so you'd need on
              the order of (1 << 13) = 8192 processes hitting the limit at the same
              time in order to make it overflow, which seems a bit unlikely.
      
              (See https://lkml.org/lkml/2016/8/12/215 for another discussion on the
              limit checking)
      
      Link: http://lkml.kernel.org/r/1e464945-536b-2420-798b-e77b9c7e8593@gmail.comSigned-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: NVegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: <socketpair@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b0b91d18
    • M
      pipe: refactor argument for account_pipe_buffers() · 3734a13b
      Michael Kerrisk (man-pages) 提交于
      This is a preparatory patch for following work. account_pipe_buffers()
      performs accounting in the 'user_struct'. There is no need to pass a
      pointer to a 'pipe_inode_info' struct (which is then dereferenced to
      obtain a pointer to the 'user' field). Instead, pass a pointer directly
      to the 'user_struct'. This change is needed in preparation for a
      subsequent patch that the fixes the limit checking in alloc_pipe_info()
      (and the resulting code is a little more logical).
      
      Link: http://lkml.kernel.org/r/7277bf8c-a6fc-4a7d-659c-f5b145c981ab@gmail.comSigned-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: NVegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: <socketpair@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3734a13b