1. 17 Apr, 2018 1 commit
  2. 16 Apr, 2018 1 commit
  3. 27 Mar, 2018 1 commit
  4. 12 Feb, 2018 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
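The NOTE above about POLL* and EPOLL* values can be checked from userspace. The following is a minimal sketch, assuming a Linux system with glibc-style `<poll.h>` and `<sys/epoll.h>` headers; `poll_epoll_mismatches()` is a hypothetical helper, not part of the kernel patch, and it compares only a handful of the flags the script touches.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>

/*
 * Hypothetical helper: count how many of a few basic POLL* flags differ
 * from their EPOLL* counterparts on the architecture we are built for.
 * Only IN, OUT, PRI, ERR and HUP are compared here; the flags that the
 * commit message says differ on some architectures are not covered.
 */
static int poll_epoll_mismatches(void)
{
    int mismatches = 0;

    mismatches += ((unsigned int)POLLIN  != (unsigned int)EPOLLIN);
    mismatches += ((unsigned int)POLLOUT != (unsigned int)EPOLLOUT);
    mismatches += ((unsigned int)POLLPRI != (unsigned int)EPOLLPRI);
    mismatches += ((unsigned int)POLLERR != (unsigned int)EPOLLERR);
    mismatches += ((unsigned int)POLLHUP != (unsigned int)EPOLLHUP);

    return mismatches;
}
```

A result of zero only says these five flags agree on the build architecture; it says nothing about the remaining flags the message warns about.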
  5. 28 Nov, 2017 1 commit
  6. 18 Nov, 2017 1 commit
  7. 28 Sep, 2016 1 commit
  8. 24 Jun, 2016 1 commit
    • vfs: Pass data, ns, and ns->userns to mount_ns · d91ee87d
      Eric W. Biederman authored
      Today what is normally called data (the mount options) is not passed
      to fill_super through mount_ns.
      
      Pass the mount options and the namespace separately to mount_ns so
      that filesystems such as proc that have mount options, can use
      mount_ns.
      
      Pass the user namespace to mount_ns so that the standard permission
      check that verifies the mounter has permissions over the namespace can
      be performed in mount_ns instead of in each filesystem's .mount method.
      Thus removing the duplication between mqueuefs and proc in terms of
      permission checks.  The extra permission check does not currently
      affect the rpc_pipefs filesystem and the nfsd filesystem as those
      filesystems do not currently allow unprivileged mounts.  Without
      unprivileged mounts it is guaranteed that the caller has already passed
      capable(CAP_SYS_ADMIN), which guarantees the extra permission check will
      pass.
      
      Update rpc_pipefs and the nfsd filesystem to ensure that the network
      namespace reference is always taken in fill_super and always put in kill_sb
      so that the logic is simpler and so that errors originating inside of
      fill_super do not cause a network namespace leak.
      Acked-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  9. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely that it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
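The first two rewrite rules in the commit above rely on the shift distance being zero. The following is a minimal sketch of that reasoning, assuming a PAGE_SHIFT of 12 purely for illustration; `old_index()` and `new_index()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* Assumed value for illustration only; the real one is per-architecture. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)

/* Pre-patch aliases: the page cache unit was always exactly one page. */
#define PAGE_CACHE_SHIFT  PAGE_SHIFT
#define PAGE_CACHE_SIZE   PAGE_SIZE

/* Pre-patch style: "convert" a page index into a page-cache index. */
static unsigned long old_index(unsigned long page_index)
{
    /* The shift distance is 0, so this is an identity operation. */
    return page_index >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* Post-patch style: the no-op shift is simply dropped. */
static unsigned long new_index(unsigned long page_index)
{
    return page_index;
}
```

Because both forms compute the same value for every input, the semantic patch can replace one with the other mechanically.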
  10. 23 Jan, 2016 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
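The wrapper pattern described in the commit above, inode_foo(inode) standing for mutex_foo(&inode->i_mutex), can be sketched in userspace. This is only an illustration under stated assumptions: a pthread mutex stands in for the kernel mutex, `struct inode` here is a hypothetical one-field mock, and `inode_init()` is a helper invented for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical one-field stand-in for the kernel's struct inode. */
struct inode {
    pthread_mutex_t i_mutex;
};

/* Helper for this sketch only: set up the mock inode's lock. */
static inline void inode_init(struct inode *inode)
{
    pthread_mutex_init(&inode->i_mutex, NULL);
}

/* Callers go through the wrappers and never name ->i_mutex directly. */
static inline void inode_lock(struct inode *inode)
{
    pthread_mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
    pthread_mutex_unlock(&inode->i_mutex);
}

/* Returns nonzero when the lock was taken, zero when it was busy. */
static inline int inode_trylock(struct inode *inode)
{
    return pthread_mutex_trylock(&inode->i_mutex) == 0;
}
```

The point of the indirection is the one the message gives: once all callers use the wrappers, the lock type behind them can change without touching every call site.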
  11. 15 Jan, 2016 1 commit
    • kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov authored
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to the "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 16 Apr, 2015 1 commit
  13. 13 Jul, 2014 1 commit
  14. 11 Dec, 2013 1 commit
  15. 07 Dec, 2013 4 commits
  16. 16 Nov, 2013 1 commit
  17. 25 Oct, 2013 1 commit
  18. 01 Sep, 2013 2 commits
  19. 30 Aug, 2013 2 commits
  20. 24 Jul, 2013 1 commit
  21. 14 Jul, 2013 3 commits
  22. 10 Jul, 2013 2 commits
  23. 29 Jun, 2013 2 commits
    • SUNRPC: fix races on PipeFS UMOUNT notifications · adb6fa7f
      Stanislav Kinsbursky authored
      CPU#0                                   CPU#1
      -----------------------------           -----------------------------
      rpc_kill_sb
      sn->pipefs_sb = NULL                    rpc_release_client
      (UMOUNT_EVENT)                          rpc_free_auth
      rpc_pipefs_event
      rpc_get_client_for_event
      !atomic_inc_not_zero(cl_count)
      <skip the client>
                                              atomic_inc(cl_count)
                                              rpc_free_client
                                              rpc_clnt_remove_pipedir
                                              <skip client dir removing>
      
      To fix this, this patch does the following:
      
      1) Calls the RPC_PIPEFS_UMOUNT notification with sn->pipefs_sb_lock held.
      2) Removes the SUNRPC client from the list AFTER its pipes are destroyed.
      3) Doesn't hold the RPC client on notification: if the client is in the list, it
      can't be destroyed while sn->pipefs_sb_lock is held by the notification caller.
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: fix races on PipeFS MOUNT notifications · 38481605
      Stanislav Kinsbursky authored
      Below is a race in which an RPC client can be created without PipeFS dentries:
      
      CPU#0					CPU#1
      -----------------------------		-----------------------------
      rpc_new_client				rpc_fill_super
      rpc_setup_pipedir
      mutex_lock(&sn->pipefs_sb_lock)
      rpc_get_sb_net == NULL
      (no per-net PipeFS superblock)
      					sn->pipefs_sb = sb;
      					notifier_call_chain(MOUNT)
      					(client is not in the list)
      rpc_register_client
      (client without pipes dentries)
      
      To fix this, this patch:
      1) makes the PipeFS mount notification call with pipefs_sb_lock held.
      2) releases pipefs_sb_lock on new SUNRPC client creation only after
      registration.
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
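The locking rule behind the MOUNT fix above can be sketched in userspace: the mount notification and the "set up pipes, then register" step on the client side both run under the same lock, so a client is either registered before the mount (and seen by the notification) or set up after it (and sees the superblock). This is a simplified model under stated assumptions, not the SUNRPC code: `struct client`, `struct net_state`, `new_client()` and `mount_pipefs()` are hypothetical names, and a pthread mutex stands in for sn->pipefs_sb_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* Hypothetical mock of an RPC client: only tracks whether pipes exist. */
struct client {
    bool has_pipes;
};

/* Hypothetical mock of the per-net state guarded by pipefs_sb_lock. */
struct net_state {
    pthread_mutex_t pipefs_sb_lock;
    bool sb_mounted;
    struct client *clients[MAX_CLIENTS];
    size_t nr_clients;
};

static void net_state_init(struct net_state *sn)
{
    pthread_mutex_init(&sn->pipefs_sb_lock, NULL);
    sn->sb_mounted = false;
    sn->nr_clients = 0;
}

/* Client creation: the lock is released only after registration. */
static int new_client(struct net_state *sn, struct client *clnt)
{
    int err = -1;

    pthread_mutex_lock(&sn->pipefs_sb_lock);
    if (sn->nr_clients < MAX_CLIENTS) {
        /* Pipes are created now only if the superblock already exists. */
        clnt->has_pipes = sn->sb_mounted;
        sn->clients[sn->nr_clients++] = clnt;
        err = 0;
    }
    pthread_mutex_unlock(&sn->pipefs_sb_lock);
    return err;
}

/* Mount: publish the superblock and notify clients under the same lock. */
static void mount_pipefs(struct net_state *sn)
{
    pthread_mutex_lock(&sn->pipefs_sb_lock);
    sn->sb_mounted = true;
    for (size_t i = 0; i < sn->nr_clients; i++)
        sn->clients[i]->has_pipes = true;
    pthread_mutex_unlock(&sn->pipefs_sb_lock);
}
```

In this model there is no window in which a client is set up without pipes and then registered after the notification has already walked the list, which is the window shown in the CPU#0/CPU#1 diagram.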
  24. 19 Jun, 2013 1 commit
    • rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set · e401452d
      Jeff Layton authored
      We had a report of a reproducible WARNING:
      
      [ 1360.039358] ------------[ cut here ]------------
      [ 1360.043978] WARNING: at fs/dcache.c:1355 d_set_d_op+0x8d/0xc0()
      [ 1360.049880] Hardware name: HP Z200 Workstation
      [ 1360.054308] Modules linked in: nfsv4 nfs dns_resolver fscache nfsd
      auth_rpcgss nfs_acl lockd sunrpc sg acpi_cpufreq mperf coretemp kvm_intel kvm
      snd_hda_codec_realtek snd_hda_intel snd_hda_codec hp_wmi crc32c_intel
      snd_hwdep e1000e snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd
      sparse_keymap rfkill soundcore serio_raw ptp iTCO_wdt pps_core pcspkr
      iTCO_vendor_support mei microcode lpc_ich mfd_core wmi xfs libcrc32c sr_mod
      sd_mod cdrom crc_t10dif radeon i2c_algo_bit drm_kms_helper ttm ahci libahci
      drm i2c_core libata dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
      auth_rpcgss]
      [ 1360.107406] Pid: 8814, comm: mount.nfs4 Tainted: G         I --------------   3.9.0-0.55.el7.x86_64 #1
      [ 1360.116771] Call Trace:
      [ 1360.119219]  [<ffffffff810610c0>] warn_slowpath_common+0x70/0xa0
      [ 1360.125208]  [<ffffffff810611aa>] warn_slowpath_null+0x1a/0x20
      [ 1360.131025]  [<ffffffff811af46d>] d_set_d_op+0x8d/0xc0
      [ 1360.136159]  [<ffffffffa05a7d6f>] __rpc_lookup_create_exclusive+0x4f/0x80 [sunrpc]
      [ 1360.143710]  [<ffffffffa05a8cc6>] rpc_mkpipe_dentry+0x86/0x170 [sunrpc]
      [ 1360.150311]  [<ffffffffa062a7b6>] nfs_idmap_new+0x96/0x130 [nfsv4]
      [ 1360.156475]  [<ffffffffa062e7cd>] nfs4_init_client+0xad/0x2d0 [nfsv4]
      [ 1360.162902]  [<ffffffff812f02df>] ? idr_get_empty_slot+0x16f/0x3c0
      [ 1360.169062]  [<ffffffff812f0582>] ? idr_mark_full+0x52/0x60
      [ 1360.174615]  [<ffffffff812f0699>] ? idr_alloc+0x79/0xe0
      [ 1360.179826]  [<ffffffffa0598081>] ? __rpc_init_priority_wait_queue+0x81/0xc0 [sunrpc]
      [ 1360.187635]  [<ffffffffa05980f3>] ? rpc_init_wait_queue+0x13/0x20 [sunrpc]
      [ 1360.194493]  [<ffffffffa05d05da>] nfs_get_client+0x27a/0x350 [nfs]
      [ 1360.200666]  [<ffffffffa062e438>] nfs4_set_client.isra.8+0x78/0x100 [nfsv4]
      [ 1360.207624]  [<ffffffffa062f2f3>] nfs4_create_server+0xf3/0x3a0 [nfsv4]
      [ 1360.214222]  [<ffffffffa06284be>] nfs4_remote_mount+0x2e/0x60 [nfsv4]
      [ 1360.220644]  [<ffffffff8119ea79>] mount_fs+0x39/0x1b0
      [ 1360.225691]  [<ffffffff81153880>] ? __alloc_percpu+0x10/0x20
      [ 1360.231348]  [<ffffffff811b7ccf>] vfs_kern_mount+0x5f/0xf0
      [ 1360.236822]  [<ffffffffa0628396>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
      [ 1360.243246]  [<ffffffffa06287b4>] nfs4_try_mount+0x44/0xc0 [nfsv4]
      [ 1360.249410]  [<ffffffffa05d1457>] ? get_nfs_version+0x27/0x80 [nfs]
      [ 1360.255659]  [<ffffffffa05db985>] nfs_fs_mount+0x5c5/0xd10 [nfs]
      [ 1360.261650]  [<ffffffffa05dc550>] ? nfs_clone_super+0x140/0x140 [nfs]
      [ 1360.268074]  [<ffffffffa05da8e0>] ? param_set_portnr+0x60/0x60 [nfs]
      [ 1360.274406]  [<ffffffff8119ea79>] mount_fs+0x39/0x1b0
      [ 1360.279443]  [<ffffffff81153880>] ? __alloc_percpu+0x10/0x20
      [ 1360.285088]  [<ffffffff811b7ccf>] vfs_kern_mount+0x5f/0xf0
      [ 1360.290556]  [<ffffffff811b9f5d>] do_mount+0x1fd/0xa00
      [ 1360.295677]  [<ffffffff81137dee>] ? __get_free_pages+0xe/0x50
      [ 1360.301405]  [<ffffffff811b9be6>] ? copy_mount_options+0x36/0x170
      [ 1360.307479]  [<ffffffff811ba7e3>] sys_mount+0x83/0xc0
      [ 1360.312515]  [<ffffffff8160ad59>] system_call_fastpath+0x16/0x1b
      [ 1360.318503] ---[ end trace 8fa1f4cbc36094a7 ]---
      
      The problem is that we're ending up in __rpc_lookup_create_exclusive
      with a negative dentry that already has d_op set. A little debugging
      has shown that when we hit this, the d_ops are already set to
      simple_dentry_operations.
      
      I believe that what's happening is that during a mount, idmapd is racing
      in and doing a lookup of /var/lib/nfs/rpc_pipefs/nfs/clnt???/idmap.
      Before that dentry reference is released, the kernel races in to create
      that file and finds the new negative dentry, which already has the
      d_op set.
      
      This patch just avoids setting the d_op if it's already set.
      simple_dentry_operations and rpc_dentry_operations are functionally
      equivalent so it shouldn't matter which one it's set to.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
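The guard described in the commit above ("just avoids setting the d_op if it's already set") can be sketched with mock types. Everything here is an assumption made for illustration: the structs are empty stand-ins, the mock `d_set_d_op()` returns -1 where the real kernel function would emit the WARNING, and `rpc_set_d_op_once()` is a hypothetical name rather than the function the patch touches.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types: the real kernel structures carry far more state. */
struct dentry_operations {
    int unused;
};

struct dentry {
    const struct dentry_operations *d_op;
};

static const struct dentry_operations simple_dentry_operations = { 0 };
static const struct dentry_operations rpc_dentry_operations = { 0 };

/* Mock of d_set_d_op(): returns -1 where the kernel would WARN. */
static int d_set_d_op(struct dentry *dentry,
                      const struct dentry_operations *op)
{
    if (dentry->d_op != NULL)
        return -1;
    dentry->d_op = op;
    return 0;
}

/* Hypothetical helper: install the RPC ops only if none are set yet. */
static int rpc_set_d_op_once(struct dentry *dentry)
{
    if (dentry->d_op != NULL)
        return 0;   /* a racing lookup already set some d_op; keep it */
    return d_set_d_op(dentry, &rpc_dentry_operations);
}
```

Keeping the existing operations is acceptable here only because, as the message states, the two operation tables are functionally equivalent.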
  25. 16 May, 2013 2 commits
  26. 13 Mar, 2013 1 commit
    • fs: Readd the fs module aliases. · fa7614dd
      Eric W. Biederman authored
      I had assumed that the only use of module aliases for filesystems
      prior to "fs: Limit sys_mount to only request filesystem modules."
      was in request_module.  It turns out I was wrong.  At least mkinitcpio
      in Arch Linux uses these aliases.
      
      So readd the preexisting aliases, to keep from breaking userspace.
      
      Userspace eventually will have to follow and use the same aliases the
      kernel does.  So at some point we may be able to delete these aliases without
      problems.  However that day is not today.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  27. 04 Mar, 2013 1 commit
    • fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman authored
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as its module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regard to the user's permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesystem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reported-by: Kees Cook <keescook@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
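The "fs-" prefixing described in the commit above can be sketched as plain string handling. This is a userspace illustration only: `fs_module_request_name()` is a hypothetical helper that builds the name a module request would carry, and it does not call request_module() or any kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the module name requested for a filesystem
 * type by prefixing it with "fs-". Returns 0 on success, or -1 if the
 * result does not fit in the buffer.
 */
static int fs_module_request_name(char *buf, size_t size, const char *fstype)
{
    int n = snprintf(buf, size, "fs-%s", fstype);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this scheme a mount of type "autofs" asks for "fs-autofs", which an alias can map to the autofs4 module mentioned in the message, while a user-supplied string that names a non-filesystem module no longer matches anything.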
  28. 23 Feb, 2013 1 commit
  29. 09 Nov, 2012 1 commit
  30. 05 Nov, 2012 1 commit