1. 07 Sep, 2017 4 commits
  2. 07 Jul, 2017 2 commits
  3. 04 May, 2017 1 commit
  4. 21 Apr, 2017 1 commit
  5. 25 Feb, 2017 1 commit
  6. 20 Feb, 2017 1 commit
  7. 13 Dec, 2016 1 commit
  8. 29 Oct, 2016 2 commits
  9. 18 Oct, 2016 1 commit
  10. 03 Oct, 2016 1 commit
  11. 28 Jul, 2016 3 commits
    • ceph: Mark the file cache as unreclaimable · 6b1a9a6c
      Nikolay Borisov committed
      Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so
      that it can satisfy its internal needs. Inspecting the code shows that
      most of the caches are indeed reclaimable since they are directly
      related to the generic inode/dentry shrinkers. However, one of the
      caches, used to satisfy struct file, is not reclaimable since its
      entries are freed only when the last reference to the file is
      dropped. If a heavily loaded node opens a lot of files it can
      introduce non-trivial discrepancies between memory shown as reclaimable
      and what is actually reclaimed when drop_caches is used.
      
      Fix this by removing the reclaimable flag for the file's cache.
      Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: mount non-default filesystem by name · 430afbad
      Yan, Zheng committed
      To mount a non-default filesystem, the user currently needs to provide
      the MDS namespace ID, which is inconvenient.
      
      This patch lets the user mount a filesystem by name. To mount a
      non-default filesystem, the client first subscribes to fsmap.user, then
      subscribes to mdsmap.<ID> once it has the ID of the filesystem.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
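The name-to-ID step described in the "mount non-default filesystem by name" commit can be sketched in userspace C. This is only an illustration under stated assumptions: `struct fs_entry`, `fs_id_by_name()` and `mdsmap_sub_name()` are hypothetical names, and the flat array stands in for whatever the client decodes from the fsmap.user reply. It is not the kernel client's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one filesystem listed in fsmap.user. */
struct fs_entry {
    long long id;
    const char *name;
};

/* Return the ID of the filesystem called `name`, or -1 if it is not listed. */
static long long fs_id_by_name(const struct fs_entry *fs, size_t n,
                               const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(fs[i].name, name) == 0)
            return fs[i].id;
    return -1;
}

/* Format the follow-up subscription target "mdsmap.<ID>".
 * Returns 0 on success, -1 if `buf` is too small. */
static int mdsmap_sub_name(long long id, char *buf, size_t len)
{
    int written = snprintf(buf, len, "mdsmap.%lld", id);

    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

A caller would look up the name first and only build the mdsmap subscription once a non-negative ID comes back, mirroring the two-step order the commit message describes.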
    • ceph: wait unsafe sync writes for evicting inode · 9a5530c6
      Yan, Zheng committed
      Otherwise ceph_sync_write_unsafe() may access or modify a freed inode.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
  12. 26 May, 2016 3 commits
  13. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely that it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it's a constant source of confusion on whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
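Why the shift rewrites in the PAGE_CACHE_* commit are behavior-preserving can be seen in a small userspace sketch. The macro definitions below are stand-ins written for illustration (the value 12, i.e. 4 KiB pages, is an assumption, and the function names are hypothetical); the point is only that once PAGE_CACHE_SHIFT is defined as PAGE_SHIFT, shifting by their difference is a shift by zero.

```c
#include <assert.h>

/* Illustrative userspace stand-ins for the kernel macros. The value 12
 * (4 KiB pages) is an assumption; real values are per-architecture. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)
#define PAGE_CACHE_SHIFT  PAGE_SHIFT
#define PAGE_CACHE_SIZE   PAGE_SIZE

/* The cleanup relies on the two shifts being identical. */
_Static_assert(PAGE_CACHE_SHIFT == PAGE_SHIFT,
               "page cache unit must equal the page size");

/* Old style: convert a page-cache index to a number of pages. */
static unsigned long index_to_pages_old(unsigned long index)
{
    return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);   /* shift by 0 */
}

/* New style after the patch: the no-op shift is dropped. */
static unsigned long index_to_pages_new(unsigned long index)
{
    return index;
}

/* Runtime check that both spellings agree for a given index. */
static void check_equivalence(unsigned long index)
{
    assert(index_to_pages_old(index) == index_to_pages_new(index));
}
```

The same reasoning covers the right-shift rule and the plain renames: every replaced expression evaluates to the same value as before.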
  14. 26 Mar, 2016 4 commits
    • ceph: fix mounting same fs multiple times · 132ca7e1
      Yan, Zheng committed
      Now __ceph_open_session() only accepts a closed client. An opened
      client will trigger BUG_ON().
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: kill ceph_empty_snapc · 34b759b4
      Ilya Dryomov committed
      ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
      ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
      equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
      for sizing the request message.
      
      Further, in all four cases the subsequent ceph_osdc_build_request() is
      passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
      and making ceph_empty_snapc entirely useless.  The two cases where it
      actually mattered were removed in commits 86056090 ("ceph: avoid
      sending unnessesary FLUSHSNAP message") and 23078637 ("ceph: fix
      queuing inode to mdsdir's snaprealm").
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
    • ceph: don't enable rbytes mount option by default · 133e9156
      Yan, Zheng committed
      When the rbytes mount option is enabled, a directory's size is its
      recursive size. The recursive size is not updated instantly, which can
      cause the directory size to change between successive stat(1) calls.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • libceph: revamp subs code, switch to SUBSCRIBE2 protocol · 82dcabad
      Ilya Dryomov committed
      It is currently hard-coded in the mon_client that mdsmap and monmap
      subs are continuous, while osdmap sub is always "onetime".  To better
      handle full clusters/pools in the osd_client, we need to be able to
      issue continuous osdmap subs.  Revamp subs code to allow us to specify
      for each sub whether it should be continuous or not.
      
      Although not strictly required for the above, switch to SUBSCRIBE2
      protocol while at it, eliminating the ambiguity between a request for
      "every map since X" and a request for "just the latest" when we don't
      have a map yet (i.e. have epoch 0).  SUBSCRIBE2 feature bit is now
      required - it's been supported since pre-argonaut (2010).
      
      Move the "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
      it before we validate the epoch and successfully install the new map
      can mess up mon_client sub state.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  15. 15 Jan, 2016 1 commit
    • kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov committed
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 09 Sep, 2015 1 commit
  17. 05 Sep, 2015 1 commit
    • fs: create and use seq_show_option for escaping · a068acf2
      Kees Cook committed
      Many file systems that implement the show_options hook fail to correctly
      escape their output which could lead to unescaped characters (e.g.  new
      lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
      could lead to confusion, spoofed entries (resulting in things like
      systemd issuing false d-bus "mount" notifications), and who knows what
      else.  This looks like it would only be the root user stepping on
      themselves, but it's possible weird things could happen in containers or
      in other situations with delegated mount privileges.
      
      Here's an example using overlay with setuid fusermount trusting the
      contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
      of "sudo" is something more sneaky:
      
        $ BASE="ovl"
        $ MNT="$BASE/mnt"
        $ LOW="$BASE/lower"
        $ UP="$BASE/upper"
        $ WORK="$BASE/work/ 0 0
        none /proc fuse.pwn user_id=1000"
        $ mkdir -p "$LOW" "$UP" "$WORK"
        $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
        $ cat /proc/mounts
        none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
        none /proc fuse.pwn user_id=1000 0 0
        $ fusermount -u /proc
        $ cat /proc/mounts
        cat: /proc/mounts: No such file or directory
      
      This fixes the problem by adding new seq_show_option and
      seq_show_option_n helpers, and updating the vulnerable show_option
      handlers to use them as needed.  Some, like SELinux, need to be open
      coded due to unusual existing escape mechanisms.
      
      [akpm@linux-foundation.org: add lost chunk, per Kees]
      [keescook@chromium.org: seq_show_option should be using const parameters]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Jan Kara <jack@suse.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Cc: J. R. Okajima <hooanon05g@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
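The kind of escaping the seq_show_option commit is about can be illustrated with a userspace sketch. Everything here is an assumption made for illustration: `escape_octal()` is a hypothetical helper, not the kernel's implementation, and the escape set passed in by the caller (comma, space, tab, newline, backslash) is chosen to match the characters that would let an option value forge extra fields or lines in /proc/mounts. Each escaped byte is written as a backslash followed by three octal digits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace helper: copy `in` to `out`, replacing every byte
 * that appears in `esc` with a backslash and three octal digits.
 * Returns the escaped length, or -1 if `out` is too small. */
static int escape_octal(const char *in, const char *esc,
                        char *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && esc != NULL && out != NULL);
    if (outlen == 0)
        return -1;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        size_t need = strchr(esc, c) ? 4 : 1;

        /* Keep one byte free for the terminating NUL. */
        if (n + need + 1 > outlen)
            return -1;

        if (need == 4) {
            out[n++] = '\\';
            out[n++] = (char)('0' + ((c >> 6) & 7));
            out[n++] = (char)('0' + ((c >> 3) & 7));
            out[n++] = (char)('0' + (c & 7));
        } else {
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

With a helper like this, the crafted workdir value from the example above would be emitted on a single line, with its spaces and newline turned into octal escapes, so a parser reading /proc/mounts would no longer see a second, forged mount entry.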
  18. 25 Jun, 2015 3 commits
  19. 20 Apr, 2015 3 commits
  20. 16 Apr, 2015 1 commit
  21. 19 Feb, 2015 1 commit
  22. 21 Jan, 2015 2 commits
  23. 18 Dec, 2014 1 commit