1. 13 2月, 2019 4 次提交
  2. 16 1月, 2019 2 次提交
    • J
      fuse: call pipe_buf_release() under pipe lock · 9509941e
      Jann Horn 提交于
      Some of the pipe_buf_release() handlers seem to assume that the pipe is
      locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page
      without taking any extra locks. From a glance through the callers of
      pipe_buf_release(), it looks like FUSE is the only one that calls
      pipe_buf_release() without having the pipe locked.
      
      This bug should only lead to a memory leak, nothing terrible.
      
      Fixes: dd3bb14f ("fuse: support splice() writing to fuse device")
      Cc: stable@vger.kernel.org
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      9509941e
    • M
      fuse: handle zero sized retrieve correctly · 97e1532e
      Miklos Szeredi 提交于
      Dereferencing req->page_descs[0] will Oops if req->max_pages is zero.
      
      Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
      Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
      Fixes: b2430d75 ("fuse: add per-page descriptor <offset, length> to fuse_req")
      Cc: <stable@vger.kernel.org> # v3.9
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      97e1532e
  3. 09 11月, 2018 2 次提交
    • M
      fuse: fix possibly missed wake-up after abort · 2d84a2d1
      Miklos Szeredi 提交于
      In current fuse_drop_waiting() implementation it's possible that
      fuse_wait_aborted() will not be woken up in the unlikely case that
      fuse_abort_conn() + fuse_wait_aborted() runs in between checking
      fc->connected and calling atomic_dec(&fc->num_waiting).
      
      Do the atomic_dec_and_test() unconditionally, which also provides the
      necessary barrier against reordering with the fc->connected check.
      
      The explicit smp_mb() in fuse_wait_aborted() is not actually needed, since
      the spin_unlock() in fuse_abort_conn() provides the necessary RELEASE
      barrier after resetting fc->connected.  However, this is not a performance
      sensitive path, and adding the explicit barrier makes it easier to
      document.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: b8f95e5d ("fuse: umount should wait for all requests")
      Cc: <stable@vger.kernel.org> #v4.19
      2d84a2d1
    • M
      fuse: fix leaked notify reply · 7fabaf30
      Miklos Szeredi 提交于
      fuse_request_send_notify_reply() may fail if the connection was reset for
      some reason (e.g. fs was unmounted).  Don't leak request reference in this
      case.  Besides leaking memory, this resulted in fc->num_waiting not being
      decremented and hence fuse_wait_aborted() left in a hanging and unkillable
      state.
      
      Fixes: 2d45ba38 ("fuse: add retrieve request")
      Fixes: b8f95e5d ("fuse: umount should wait for all requests")
      Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org> #v2.6.36
      7fabaf30
  4. 01 10月, 2018 3 次提交
    • M
      fuse: realloc page array · e52a8250
      Miklos Szeredi 提交于
      Writeback caching currently allocates requests with the maximum number of
      possible pages, while the actual number of pages per request depends on a
      couple of factors that cannot be determined when the request is allocated
      (whether page is already under writeback, whether page is contiguous with
      previous pages already added to a request).
      
      This patch allows such requests to start with no page allocation (all pages
      inline) and grow the page array on demand.
      
      If the max_pages tunable remains the default value, then this will mean
      just one allocation that is the same size as before.  If the tunable is
      larger, then this adds at most 3 additional memory allocations (which is
      generously compensated by the improved performance from the larger
      request).
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      e52a8250
    • C
      fuse: add max_pages to init_out · 5da784cc
      Constantine Shulyupin 提交于
      Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to
      improve performance.
      
      Old RFC with detailed description of the problem and many fixes by Mitsuo
      Hayasaka (mitsuo.hayasaka.hu@hitachi.com):
       - https://lkml.org/lkml/2012/7/5/136
      
      We've encountered performance degradation and fixed it on a big and complex
      virtual environment.
      
      Environment to reproduce degradation and improvement:
      
      1. Add lag to user mode FUSE
      Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in
      passthrough_fh.c
      
      2. patch UM fuse with configurable max_pages parameter. The patch will be
      provided latter.
      
      3. run test script and perform test on tmpfs
      fuse_test()
      {
      
             cd /tmp
             mkdir -p fusemnt
             passthrough_fh -o max_pages=$1 /tmp/fusemnt
             grep fuse /proc/self/mounts
             dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \
      		count=1K bs=1M 2>&1 | grep -v records
             rm fusemnt/tmp/tmp
             killall passthrough_fh
      }
      
      Test results:
      
      passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
      	rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0
      1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s
      
      passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
      	rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0
      1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s
      
      Obviously with bigger lag the difference between 'before' and 'after'
      will be more significant.
      
      Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136),
      observed improvement from 400-550 to 520-740.
      Signed-off-by: NConstantine Shulyupin <const@MakeLinux.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      5da784cc
    • M
      fuse: allocate page array more efficiently · 8a7aa286
      Miklos Szeredi 提交于
      When allocating page array for a request the array for the page pointers
      and the array for page descriptors are allocated by two separate kmalloc()
      calls.  Merge these into one allocation.
      
      Also instead of initializing the request and the page arrays with memset(),
      use the zeroing allocation variants.
      
      Reserved requests never carry pages (page array size is zero). Make that
      explicit by initializing the page array pointers to NULL and make sure the
      assumption remains true by adding a WARN_ON().
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      8a7aa286
  5. 28 9月, 2018 10 次提交
  6. 26 7月, 2018 7 次提交
  7. 13 6月, 2018 1 次提交
    • K
      treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook 提交于
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
              kmalloc_array(a * b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6da2ec56
  8. 31 5月, 2018 1 次提交
    • T
      fuse: fix congested state leak on aborted connections · 8a301eb1
      Tejun Heo 提交于
      If a connection gets aborted while congested, FUSE can leave
      nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
      wait spuriously which can lead to severe performance degradation.
      
      The leak is caused by gating congestion state clearing with
      fc->connected test in request_end().  This was added way back in 2009
      by 26c36791 ("fuse: destroy bdi on umount").  While the commit
      description doesn't explain why the test was added, it most likely was
      to avoid dereferencing bdi after it got destroyed.
      
      Since then, bdi lifetime rules have changed many times and now we're
      always guaranteed to have access to the bdi while the superblock is
      alive (fc->sb).
      
      Drop fc->connected conditional to avoid leaking congestion states.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJoshua Miller <joshmiller@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: stable@vger.kernel.org # v2.6.29+
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      8a301eb1
  9. 21 3月, 2018 4 次提交
    • E
      fuse: Support fuse filesystems outside of init_user_ns · 8cb08329
      Eric W. Biederman 提交于
      In order to support mounts from namespaces other than init_user_ns, fuse
      must translate uids and gids to/from the userns of the process servicing
      requests on /dev/fuse. This patch does that, with a couple of restrictions
      on the namespace:
      
       - The userns for the fuse connection is fixed to the namespace
         from which /dev/fuse is opened.
      
       - The namespace must be the same as s_user_ns.
      
      These restrictions simplify the implementation by avoiding the need to pass
      around userns references and by allowing fuse to rely on the checks in
      setattr_prepare for ownership changes.  Either restriction could be relaxed
      in the future if needed.
      
      For cuse the userns used is the opener of /dev/cuse.  Semantically the cuse
      support does not appear safe for unprivileged users.  Practically the
      permissions on /dev/cuse only make it accessible to the global root user.
      If something slips through the cracks in a user namespace the only users
      who will be able to use the cuse device are those users mapped into the
      user namespace.
      
      Translation in the posix acl is updated to use the uuser namespace of the
      filesystem.  Avoiding cases which might bypass this translation is handled
      in a following change.
      
      This change is stronlgy based on a similar change from Seth Forshee and
      Dongsu Park.
      
      Cc: Seth Forshee <seth.forshee@canonical.com>
      Cc: Dongsu Park <dongsu@kinvolk.io>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      8cb08329
    • E
      fuse: Fail all requests with invalid uids or gids · c9582eb0
      Eric W. Biederman 提交于
      Upon a cursory examinination the uid and gid of a fuse request are
      necessary for correct operation.  Failing a fuse request where those
      values are not reliable seems a straight forward and reliable means of
      ensuring that fuse requests with bad data are not sent or processed.
      
      In most cases the vfs will avoid actions it suspects will cause
      an inode write back of an inode with an invalid uid or gid.  But that does
      not map precisely to what fuse is doing, so test for this and solve
      this at the fuse level as well.
      
      Performing this work in fuse_req_init_context is cheap as the code is
      already performing the translation here and only needs to check the
      result of the translation to see if things are not representable in
      a form the fuse server can handle.
      
      [SzM] Don't zero the context for the nofail case, just keep using the
      munging version (makes sense for debugging and doesn't hurt).
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      c9582eb0
    • E
      fuse: Remove the buggy retranslation of pids in fuse_dev_do_read · dbf107b2
      Eric W. Biederman 提交于
      At the point of fuse_dev_do_read the user space process that initiated the
      action on the fuse filesystem may no longer exist.  The process have been
      killed or may have fired an asynchronous request and exited.
      
      If the initial process has exited, the code "pid_vnr(find_pid_ns(in->h.pid,
      fc->pid_ns)" will either return a pid of 0, or in the unlikely event that
      the pid has been reallocated it can return practically any pid.  Any pid is
      possible as the pid allocator allocates pid numbers in different pid
      namespaces independently.
      
      The only way to make translation in fuse_dev_do_read reliable is to call
      get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in
      fuse_dev_do_read.  That reference counting in other contexts has been shown
      to bounce cache lines between processors and in general be slow.  So that
      is not desirable.
      
      The only known user of running the fuse server in a different pid namespace
      from the filesystem does not care what the pids are in the fuse messages so
      removing this code should not matter.
      
      Getting the translation to a server running outside of the pid namespace of
      a container can still be achieved by playing setns games at mount time.  It
      is also possible to add an option to pass a pid namespace into the fuse
      filesystem at mount time.
      
      Fixes: 5d6d3a30 ("fuse: allow server to run in different pid_ns")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      dbf107b2
    • S
      fuse: return -ECONNABORTED on /dev/fuse read after abort · 3b7008b2
      Szymon Lukasz 提交于
      Currently the userspace has no way of knowing whether the fuse
      connection ended because of umount or abort via sysfs. It makes it hard
      for filesystems to free the mountpoint after abort without worrying
      about removing some new mount.
      
      The patch fixes it by returning different errors when userspace reads
      from /dev/fuse (-ENODEV for umount and -ECONNABORTED for abort).
      
      Add a new capability flag FUSE_ABORT_ERROR. If set and the connection is
      gone because of sysfs abort, reading from the device will return
      -ECONNABORTED.
      Signed-off-by: NSzymon Lukasz <noh4hss@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      3b7008b2
  10. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  11. 28 11月, 2017 1 次提交
  12. 16 11月, 2017 1 次提交
  13. 25 10月, 2017 1 次提交
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland 提交于
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6aa7de05
  14. 12 9月, 2017 1 次提交
    • M
      fuse: allow server to run in different pid_ns · 5d6d3a30
      Miklos Szeredi 提交于
      Commit 0b6e9ea0 ("fuse: Add support for pid namespaces") broke
      Sandstorm.io development tools, which have been sending FUSE file
      descriptors across PID namespace boundaries since early 2014.
      
      The above patch added a check that prevented I/O on the fuse device file
      descriptor if the pid namespace of the reader/writer was different from the
      pid namespace of the mounter.  With this change passing the device file
      descriptor to a different pid namespace simply doesn't work.  The check was
      added because pids are transferred to/from the fuse userspace server in the
      namespace registered at mount time.
      
      To fix this regression, remove the checks and do the following:
      
      1) the pid in the request header (the pid of the task that initiated the
      filesystem operation) is translated to the reader's pid namespace.  If a
      mapping doesn't exist for this pid, then a zero pid is used.  Note: even if
      a mapping would exist between the initiator task's pid namespace and the
      reader's pid namespace the pid will be zero if either mapping from
      initator's to mounter's namespace or mapping from mounter's to reader's
      namespace doesn't exist.
      
      2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone.
      Userspace should not interpret this value anyway.  Also allow the
      setlk/setlkw operations if the pid of the task cannot be represented in the
      mounter's namespace (pid being zero in that case).
      Reported-by: NKenton Varda <kenton@sandstorm.io>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 0b6e9ea0 ("fuse: Add support for pid namespaces")
      Cc: <stable@vger.kernel.org> # v4.12+
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Seth Forshee <seth.forshee@canonical.com>
      5d6d3a30
  15. 21 4月, 2017 1 次提交