1. 13 Nov, 2019 2 commits
    • T
      kernfs: use dumber locking for kernfs_find_and_get_node_by_ino() · b680b081
      Tejun Heo committed
      kernfs_find_and_get_node_by_ino() uses RCU protection.  It's currently
      a bit buggy because it can look up a node which hasn't been activated
      yet and thus may end up exposing a node that the kernfs user is still
      prepping.
      
      While it can be fixed by pushing it further in the current direction,
      it's already complicated and isn't clear whether the complexity is
      justified.  The main use of kernfs_find_and_get_node_by_ino() is for
      exportfs operations.  They aren't super hot and all the follow-up
      operations (e.g. mapping to path) use normal locking anyway.
      
      Let's switch to a dumber locking scheme and protect the lookup with
      kernfs_idr_lock.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      b680b081
    • T
      kernfs: fix ino wrap-around detection · e23f568a
      Tejun Heo committed
      When the 32bit ino wraps around, kernfs increments the generation
      number to distinguish reused ino instances.  The wrap-around detection
      tests whether the allocated ino is lower than the cursor, but the
      cursor points to the next ino to allocate, so the condition never
      triggers.
      
      Fix it by remembering the last ino and comparing against that.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Fixes: 4a3ef68a ("kernfs: implement i_generation")
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org # v4.14+
      e23f568a
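The fix described above (remember the last ino and compare against it) can be illustrated with a small userspace sketch. This is not the kernel code: `struct ino_alloc`, its fields, and `alloc_ino()` are hypothetical names, and a plain wrapping 32-bit counter stands in for the real cyclic allocator.

```c
#include <stdint.h>

/* Hypothetical allocator state; names are illustrative, not the kernel's. */
struct ino_alloc {
	uint32_t cursor;   /* next ino to hand out */
	uint32_t last_ino; /* most recently allocated ino */
	uint32_t next_gen; /* generation assigned to new inos */
};

/*
 * Hand out the next ino from a wrapping 32-bit counter. A wrap-around is
 * detected by comparing the new ino against the previously allocated one,
 * not against the cursor, which already points past the new ino.
 */
static uint32_t alloc_ino(struct ino_alloc *a, uint32_t *gen)
{
	uint32_t ino = a->cursor++;

	if (ino < a->last_ino)
		a->next_gen++; /* counter wrapped: start a new generation */
	a->last_ino = ino;
	*gen = a->next_gen;
	return ino;
}
```

Starting with the cursor just below the 32-bit limit, successive calls keep the same generation until the counter wraps to 0, at which point the generation is incremented once.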
  2. 08 Aug, 2019 1 commit
    • G
      Revert "kernfs: fix memleak in kernel_ops_readdir()" · 8097c43b
      Greg Kroah-Hartman committed
      This reverts commit cc798c83.
      
      Tony writes:
      	Somehow this causes a regression in Linux next for me where I'm
      	seeing lots of sysfs entries now missing under
      	/sys/bus/platform/devices.
      
      	For example, I now only see one .serial entry show up in sysfs.
      	Things work again if I revert commit cc798c83 ("kernfs: fix
      	memleak in kernel_ops_readdir()"). Any ideas why that would be?
      
      Tejun says:
      	Ugh, you're right.  It can get double-put cuz ctx->pos is put by
      	release too.
      
      So reverting it for now.
      Reported-by: Tony Lindgren <tony@atomide.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Fixes: cc798c83 ("kernfs: fix memleak in kernel_ops_readdir()")
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8097c43b
  3. 06 Aug, 2019 1 commit
  4. 25 Jul, 2019 2 commits
  5. 05 Jun, 2019 1 commit
  6. 26 Apr, 2019 1 commit
  7. 21 Mar, 2019 3 commits
    • O
      kernfs: initialize security of newly created nodes · e19dfdc8
      Ondrej Mosnacek committed
      Use the new security_kernfs_init_security() hook to allow LSMs to
      possibly assign a non-default security context to a newly created kernfs
      node based on the attributes of the new node and also its parent node.
      
      This fixes an issue with cgroupfs under SELinux, where newly created
      cgroup subdirectories/files would not inherit its parent's context if
      it had been set explicitly to a non-default value (other than the genfs
      context specified by the policy). This can be reproduced as follows (on
      Fedora/RHEL):
      
          # mkdir /sys/fs/cgroup/unified/test
          # # Need permissive to change the label under Fedora policy:
          # setenforce 0
          # chcon -t container_file_t /sys/fs/cgroup/unified/test
          # ls -lZ /sys/fs/cgroup/unified
          total 0
          -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.controllers
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.depth
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.descendants
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.procs
          -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.stat
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.subtree_control
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.threads
          drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 init.scope
          drwxr-xr-x. 26 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:21 system.slice
          drwxr-xr-x.  3 root root system_u:object_r:container_file_t:s0 0 Jan 29 03:15 test
          drwxr-xr-x.  3 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 user.slice
          # mkdir /sys/fs/cgroup/unified/test/subdir
      
      Actual result:
      
          # ls -ldZ /sys/fs/cgroup/unified/test/subdir
          drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir
      
      Expected result:
      
          # ls -ldZ /sys/fs/cgroup/unified/test/subdir
          drwxr-xr-x. 2 root root unconfined_u:object_r:container_file_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir
      
      Link: https://github.com/SELinuxProject/selinux-kernel/issues/39
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      e19dfdc8
    • O
      kernfs: use simple_xattrs for security attributes · 0ac6075a
      Ondrej Mosnacek committed
      Replace the special handling of security xattrs with simple_xattrs, as
      is already done for the trusted xattrs. This simplifies the code and
      allows LSMs to use more than just a single xattr to do their business.
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      [PM: manual merge fixes]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      0ac6075a
    • O
      kernfs: clean up struct kernfs_iattrs · 05895219
      Ondrej Mosnacek committed
      Right now, kernfs_iattrs embeds the whole struct iattr, even though it
      doesn't really use half of its fields... This both leads to wasting
      space and makes the code look awkward. Let's just list the few fields
      we need directly in struct kernfs_iattrs.
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      [PM: merged a number of chunks manually due to fuzz]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      05895219
  8. 08 Feb, 2019 1 commit
    • A
      kernfs: Allocating memory for kernfs_iattrs with kmem_cache. · 26e28d68
      Ayush Mittal committed
      Creating a new cache for kernfs_iattrs.
      Currently, memory is allocated with kzalloc() which
      always gives aligned memory. On ARM, this is 64 byte aligned.
      To avoid the wastage of memory in aligning the size requested,
      a new cache for kernfs_iattrs is created.
      
      Size of struct kernfs_iattrs is 80 Bytes.
      On ARM, it comes from the kmalloc-128 slab,
      and from the kmalloc-192 slab if debug info is enabled.
      Extra bytes taken: 48 bytes.
      
      Total number of objects created : 4096
      Total saving = 48*4096 = 192 KB
      
      After creating new slab(When debug info is enabled) :
      sh-3.2# cat /proc/slabinfo
      ...
      kernfs_iattrs_cache   4069   4096    128   32    1 : tunables    0    0    0 : slabdata    128    128      0
      ...
      
      All testing has been done on ARM target.
      Signed-off-by: Ayush Mittal <ayush.m@samsung.com>
      Signed-off-by: Vaneet Narang <v.narang@samsung.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      26e28d68
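The saving quoted in the commit message above is simple arithmetic: an 80-byte object placed in a 128-byte kmalloc bucket wastes 48 bytes, and 4096 such objects waste 192 KB. A minimal sketch of that calculation follows; the helper name `slab_waste_bytes()` is hypothetical and the figures are the ones stated in the message, not measured here.

```c
#include <stddef.h>

/*
 * Bytes lost when nr_objs objects of obj_size bytes are each served from
 * a fixed-size bucket of bucket_size bytes (bucket_size >= obj_size).
 */
static size_t slab_waste_bytes(size_t obj_size, size_t bucket_size,
			       size_t nr_objs)
{
	return (bucket_size - obj_size) * nr_objs;
}
```

With the commit's numbers, `slab_waste_bytes(80, 128, 4096)` gives 196608 bytes, that is, 192 KB.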
  9. 21 Jul, 2018 1 commit
  10. 06 Jun, 2018 1 commit
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani committed
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following coccinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime1, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  11. 29 Jul, 2017 5 commits
  12. 10 Feb, 2017 1 commit
  13. 28 Dec, 2016 1 commit
  14. 08 Oct, 2016 1 commit
  15. 07 Oct, 2016 1 commit
  16. 27 Sep, 2016 2 commits
    • M
      fs: rename "rename2" i_op to "rename" · 2773bf00
      Miklos Szeredi committed
      Generated patch:
      
      sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
      sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      2773bf00
    • M
      fs: make remaining filesystems use .rename2 · 1cd66c93
      Miklos Szeredi committed
      This is trivial to do:
      
       - add flags argument to foo_rename()
       - check if flags is zero
       - assign foo_rename() to .rename2 instead of .rename
      
      This doesn't mean it's impossible to support RENAME_NOREPLACE for these
      filesystems, but it is not trivial, like for local filesystems.
      RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
      for a file to be created on one host while it is overwritten by rename on
      another host).
      
      Filesystems converted:
      
      9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.
      
      After this, we can get rid of the duplicate interfaces for rename.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David Howells <dhowells@redhat.com> [AFS]
      Acked-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jan Harkes <jaharkes@cs.cmu.edu>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      1cd66c93
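The three conversion steps listed in the commit message above (add a flags argument, check that flags is zero, hook the function up as .rename2) can be sketched in userspace as follows. `foo_do_rename()` is a hypothetical stand-in for a filesystem's existing rename logic, and the string arguments replace the kernel's inode and dentry parameters for the sake of a self-contained example.

```c
#include <errno.h>

/* Hypothetical stand-in for the filesystem's existing rename logic. */
static int foo_do_rename(const char *old_name, const char *new_name)
{
	(void)old_name;
	(void)new_name;
	return 0;
}

/*
 * .rename2-style entry point: it accepts a flags argument but supports
 * none of the flags, so any non-zero value is rejected up front.
 */
static int foo_rename(const char *old_name, const char *new_name,
		      unsigned int flags)
{
	if (flags)
		return -EINVAL;
	return foo_do_rename(old_name, new_name);
}
```

A caller passing flags of 0 gets the old behaviour, while a request such as RENAME_NOREPLACE fails cleanly instead of being silently ignored.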
  17. 10 Aug, 2016 2 commits
    • T
      kernfs: remove kernfs_path_len() · bb09c863
      Tejun Heo committed
      It doesn't have any in-kernel user and the same result can be obtained
      from kernfs_path(@kn, NULL, 0).  Remove it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      bb09c863
    • T
      kernfs: make kernfs_path*() behave in the style of strlcpy() · 3abb1d90
      Tejun Heo committed
      kernfs_path*() functions always return the length of the full path but
      the path content is undefined if the length is larger than the
      provided buffer.  This makes its behavior different from strlcpy() and
      requires error handling in all its users even when they don't care
      about truncation.  In addition, the implementation can actually be
      simplified by making it behave properly in strlcpy() style.
      
      * Update kernfs_path_from_node_locked() to always fill up the buffer
        with path.  If the buffer is not large enough, the output is
        truncated and terminated.
      
      * kernfs_path() no longer needs error handling.  Make it a simple
        inline wrapper around kernfs_path_from_node().
      
      * sysfs_warn_dup()'s use of kernfs_path() doesn't need error handling.
        Updated accordingly.
      
      * cgroup_path()'s use of kernfs_path() updated to retain the old
        behavior.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      3abb1d90
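The strlcpy()-style contract described in the commit message above (always fill and terminate the buffer, return the full length so callers can detect truncation) can be illustrated with a minimal userspace sketch. `copy_path()` is a hypothetical helper written for this illustration; it is not the kernfs implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src into buf in strlcpy() style: when buflen > 0 the result is
 * always NUL-terminated, and the return value is the full length of src,
 * so a return value >= buflen tells the caller the output was truncated.
 */
static size_t copy_path(char *buf, size_t buflen, const char *src)
{
	size_t len = strlen(src);

	if (buflen > 0) {
		size_t n = len < buflen - 1 ? len : buflen - 1;

		memcpy(buf, src, n);
		buf[n] = '\0';
	}
	return len;
}
```

Callers that do not care about truncation can ignore the return value and still get a valid, terminated string, which is why the error handling in such callers becomes unnecessary.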
  18. 11 Jun, 2016 1 commit
    • L
      vfs: make the string hashes salt the hash · 8387ff25
      Linus Torvalds committed
      We always mixed in the parent pointer into the dentry name hash, but we
      did it late at lookup time.  It turns out that we can simplify that
      lookup-time action by salting the hash with the parent pointer early
      instead of late.
      
      A few other users of our string hashes also wanted to mix in their own
      pointers into the hash, and those are updated to use the same mechanism.
      
      Hash users that don't have any particular initial salt can just use the
      NULL pointer as a no-salt.
      
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8387ff25
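The idea of salting a string hash with a pointer early, rather than mixing the pointer in afterwards, can be shown with a small sketch. The hash used here is FNV-1a, chosen only for brevity; it is not the kernel's dentry hash, and `salted_hash()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * FNV-1a over the name, with the initial state perturbed by the salt
 * pointer. Passing NULL as the salt yields the plain, unsalted hash.
 */
static uint64_t salted_hash(const void *salt, const char *name, size_t len)
{
	uint64_t h = 14695981039346656037ULL ^ (uint64_t)(uintptr_t)salt;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 1099511628211ULL;
	}
	return h;
}
```

The same name hashed under two different parent pointers produces different values, so lookups no longer need a separate step to combine the name hash with the parent.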
  19. 09 May, 2016 1 commit
  20. 03 May, 2016 1 commit
    • S
      kernfs_path_from_node_locked: don't overwrite nlen · e99ed4de
      Serge Hallyn committed
      We've calculated @len to be the bytes we need for '/..' entries from
      @kn_from to the common ancestor, and calculated @nlen to be the extra
      bytes we need to get from the common ancestor to @kn_to.  We use them
      as such at the end.  But in the loop copying the actual entries, we
      overwrite @nlen.  Use a temporary variable for that instead.
      
      Without this, the return length, when the buffer is large enough, is
      wrong.  (When the buffer is NULL or too small, the returned value is
      correct. The buffer contents are also correct.)
      
      Interestingly, no callers of this function are affected by this as of
      yet.  However the upcoming cgroup_show_path() will be.
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      e99ed4de
  21. 30 Mar, 2016 1 commit
    • D
      fs: kernfs: Replace CURRENT_TIME by current_fs_time() · 3a3a5fec
      Deepa Dinamani committed
      This is in preparation for the series that transitions
      filesystem timestamps to use 64 bit time and hence make
      them y2038 safe.
      
      CURRENT_TIME macro will be deleted before merging the
      aforementioned series.
      
      Use current_fs_time() instead of CURRENT_TIME for inode
      timestamps.
      
      struct kernfs_node is associated with a sysfs file/directory.
      Truncate the values to appropriate time granularity when
      writing to inode timestamps of the files.
      
      ktime_get_real_ts() is used to obtain times for
      struct kernfs_iattrs. Since these times are later assigned to
      inode times using timespec_truncate() for all filesystem based
      operations, we can save the supers list traversal time here by
      using ktime_get_real_ts() directly.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a3a5fec
  22. 17 Feb, 2016 1 commit
  23. 08 Feb, 2016 1 commit
  24. 23 Jan, 2016 1 commit
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5955102c
  25. 15 Jan, 2016 1 commit
    • V
      Revert "kernfs: do not account ino_ida allocations to memcg" · b2a209ff
      Vladimir Davydov committed
      Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
      alloc_kmem_pages call) are accounted to memory cgroup automatically.
      Callers have to explicitly opt out if they don't want/need accounting
      for some reason.  Such a design decision leads to several problems:
      
       - kmalloc users are highly sensitive to failures, many of them
         implicitly rely on the fact that kmalloc never fails, while memcg
         makes failures quite plausible.
      
       - A lot of objects are shared among different containers by design.
         Accounting such objects to one of containers is just unfair.
         Moreover, it might lead to pinning a dead memcg along with its kmem
         caches, which aren't tiny, which might result in noticeable increase
         in memory consumption for no apparent reason in the long run.
      
       - There are tons of short-lived objects. Accounting them to memcg will
         only result in slight noise and won't change the overall picture, but
         we still have to pay accounting overhead.
      
      For more info, see
      
       - http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
       - http://lkml.kernel.org/r/20151106090555.GK29259@esperanza
      
      Therefore this patchset switches to the white list policy.  Now kmalloc
      users have to explicitly opt in by passing __GFP_ACCOUNT flag.
      
      Currently, the list of accounted objects is quite limited and only
      includes those allocations that (1) are known to be easily triggered
      from userspace and (2) can fail gracefully (for the full list see patch
      no.  6) and it still misses many object types.  However, accounting only
      those objects should be a satisfactory approximation of the behavior we
      used to have for most sane workloads.
      
      This patch (of 6):
      
      Revert 499611ed ("kernfs: do not account ino_ida allocations
      to memcg").
      
      Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be
      fragile and difficult to maintain, because there seem to be many more
      allocations that should not be accounted than those that should be.
      Besides, false accounting an allocation might result in much worse
      consequences than not accounting at all, namely increased memory
      consumption due to pinned dead kmem caches.
      
      So it was decided to switch to the white-list policy.  This patch reverts
      bits introducing the black-list policy.  The white-list policy will be
      introduced later in the series.
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b2a209ff
  26. 21 Nov, 2015 1 commit
  27. 19 Aug, 2015 1 commit
  28. 01 Jul, 2015 1 commit
  29. 15 May, 2015 1 commit
    • V
      kernfs: do not account ino_ida allocations to memcg · 499611ed
      Vladimir Davydov committed
      root->ino_ida is used for kernfs inode number allocations. Since IDA has
      a layered structure, different IDs can reside on the same layer, which
      is currently accounted to some memory cgroup. The problem is that each
      kmem cache of a memory cgroup has its own directory on sysfs (under
      /sys/kernel/slab/<cache-name>/cgroup). If the inode number of such a
      directory or any file in it gets allocated from a layer accounted to the
      cgroup which the cache is created for, the cgroup will get pinned for
      good, because one has to free all kmem allocations accounted to a cgroup
      in order to release it and destroy all its kmem caches. That said we
      must not account layers of ino_ida to any memory cgroup.
      
      Since per net init operations may create new sysfs entries directly
      (e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache
      per each namespace, which, in turn, creates new sysfs entries), an easy
      way to reproduce this issue is by creating network namespace(s) from
      inside a kmem-active memory cgroup.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.0.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      499611ed
  30. 16 Apr, 2015 1 commit