1. 31 Mar, 2016 2 commits
    • ext4: allow readdir()'s of large empty directories to be interrupted · 1028b55b
      Theodore Ts'o committed
      If a directory has a large number of empty blocks, iterating over all
      of them can take a long time, leading to scheduler warnings and users
      getting irritated when they can't kill a process in the middle of one
      of these long-running readdir operations.  Fix this by adding checks to
      ext4_readdir() and ext4_htree_fill_tree().
      Reported-by: Benjamin LaHaise <bcrl@kvack.org>
      Google-Bug-Id: 27880676
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
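The idea of checking for interruption once per directory block can be modeled in user space as follows. This is a minimal sketch, not the ext4 patch: `scan_dir_blocks`, `pending_fn` and the two helper callbacks are hypothetical names, and returning `-EINTR` is an assumption, since the commit message does not say which error code the kernel check uses.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's "is a fatal signal pending?" test. */
typedef int (*pending_fn)(void *ctx);

/* Never reports a pending signal. */
int never_pending(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Reports a pending signal once the budget pointed to by ctx is used up. */
int pending_after_budget(void *ctx)
{
    int *budget = ctx;

    if (*budget == 0)
        return 1;
    (*budget)--;
    return 0;
}

/*
 * Walk nblocks directory blocks, summing the entries found in each.
 * Returns the entry count, or -EINTR if the walk was interrupted.
 */
long scan_dir_blocks(const int *entries, size_t nblocks,
                     pending_fn pending, void *ctx)
{
    long found = 0;

    for (size_t i = 0; i < nblocks; i++) {
        /* The check runs once per block, so even a long run of
         * empty blocks gives the caller a chance to bail out. */
        if (pending(ctx))
            return -EINTR;
        found += entries[i];
    }
    return found;
}
```

Calling `scan_dir_blocks(blocks, n, never_pending, NULL)` walks every block, while passing `pending_after_budget` with a small budget stops the walk early, even if all blocks visited so far were empty.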
    • btrfs: fix crash/invalid memory access on fsync when using overlayfs · de17e793
      Filipe Manana committed
      If the lower or upper directory of an overlayfs mount belongs to a btrfs
      file system and we fsync the file through the overlayfs' merged directory,
      we end up accessing an inode that doesn't belong to btrfs as if it were
      a btrfs inode in btrfs_sync_file(), resulting in a crash like the following:
      
      [ 7782.588845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000544
      [ 7782.590624] IP: [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
      [ 7782.591931] PGD 4d954067 PUD 1e878067 PMD 0
      [ 7782.592016] Oops: 0002 [#6] PREEMPT SMP DEBUG_PAGEALLOC
      [ 7782.592016] Modules linked in: btrfs overlay ppdev crc32c_generic evdev xor raid6_pq psmouse pcspkr sg serio_raw acpi_cpufreq parport_pc parport tpm_tis i2c_piix4 tpm i2c_core processor button loop autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
      [ 7782.592016] CPU: 10 PID: 16437 Comm: xfs_io Tainted: G      D         4.5.0-rc6-btrfs-next-26+ #1
      [ 7782.592016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [ 7782.592016] task: ffff88001b8d40c0 ti: ffff880137488000 task.ti: ffff880137488000
      [ 7782.592016] RIP: 0010:[<ffffffffa030b7ab>]  [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
      [ 7782.592016] RSP: 0018:ffff88013748be40  EFLAGS: 00010286
      [ 7782.592016] RAX: 0000000080000000 RBX: ffff880133b30c88 RCX: 0000000000000001
      [ 7782.592016] RDX: 0000000000000001 RSI: ffffffff8148fec0 RDI: 00000000ffffffff
      [ 7782.592016] RBP: ffff88013748bec0 R08: 0000000000000001 R09: 0000000000000000
      [ 7782.624248] R10: ffff88013748be40 R11: 0000000000000246 R12: 0000000000000000
      [ 7782.624248] R13: 0000000000000000 R14: 00000000009305a0 R15: ffff880015e3be40
      [ 7782.624248] FS:  00007fa83b9cb700(0000) GS:ffff88023ed40000(0000) knlGS:0000000000000000
      [ 7782.624248] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 7782.624248] CR2: 0000000000000544 CR3: 00000001fa652000 CR4: 00000000000006e0
      [ 7782.624248] Stack:
      [ 7782.624248]  ffffffff8108b5cc ffff88013748bec0 0000000000000246 ffff8800b005ded0
      [ 7782.624248]  ffff880133b30d60 8000000000000000 7fffffffffffffff 0000000000000246
      [ 7782.624248]  0000000000000246 ffffffff81074f9b ffffffff8104357c ffff880015e3be40
      [ 7782.624248] Call Trace:
      [ 7782.624248]  [<ffffffff8108b5cc>] ? arch_local_irq_save+0x9/0xc
      [ 7782.624248]  [<ffffffff81074f9b>] ? ___might_sleep+0xce/0x217
      [ 7782.624248]  [<ffffffff8104357c>] ? __do_page_fault+0x3c0/0x43a
      [ 7782.624248]  [<ffffffff811a2351>] vfs_fsync_range+0x8c/0x9e
      [ 7782.624248]  [<ffffffff811a237f>] vfs_fsync+0x1c/0x1e
      [ 7782.624248]  [<ffffffff811a24d6>] do_fsync+0x31/0x4a
      [ 7782.624248]  [<ffffffff811a2700>] SyS_fsync+0x10/0x14
      [ 7782.624248]  [<ffffffff81493617>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [ 7782.624248] Code: 85 c0 0f 85 e2 02 00 00 48 8b 45 b0 31 f6 4c 29 e8 48 ff c0 48 89 45 a8 48 8d 83 d8 00 00 00 48 89 c7 48 89 45 a0 e8 fc 43 18 e1 <f0> 41 ff 84 24 44 05 00 00 48 8b 83 58 ff ff ff 48 c1 e8 07 83
      [ 7782.624248] RIP  [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
      [ 7782.624248]  RSP <ffff88013748be40>
      [ 7782.624248] CR2: 0000000000000544
      [ 7782.661994] ---[ end trace 721e14960eb939bc ]---
      
      This started happening with commit 4bacc9c9 (overlayfs: Make f_path
      always point to the overlay and f_inode to the underlay). Even though
      after that change we could still access the btrfs inode through
      struct file->f_mapping->host or struct file->f_inode, we would still
      run into similar issues later on in check_parent_dirs_for_sync(),
      because the dentry we got (from struct file->f_path.dentry) was from
      overlayfs and not from btrfs; that is, we had no way of getting the dentry
      that belonged to btrfs (we always got the dentry that belonged to
      overlayfs).
      
      The new patch from Miklos Szeredi, titled "vfs: add file_dentry()" and
      recently submitted to linux-fsdevel, adds a file_dentry() API that allows
      us to get the btrfs dentry from the input file and therefore to fsync
      when the upper and lower directories belong to btrfs filesystems.
      
      This issue has been reported several times by users in the mailing list
      and bugzilla. A test case for xfstests is being submitted as well.
      
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101951
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109791
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Cc: stable@vger.kernel.org
  2. 27 Mar, 2016 6 commits
    • ext4 crypto: use dget_parent() in ext4_d_revalidate() · 3d43bcfe
      Theodore Ts'o committed
      This avoids potential problems caused by a race where the inode gets
      renamed out from its parent directory and the parent directory is
      deleted while ext4_d_revalidate() is running.
      
      Fixes: 28b4c263
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: use file_dentry() · c0a37d48
      Miklos Szeredi committed
      EXT4 may be used as lower layer of overlayfs and accessing f_path.dentry
      can lead to a crash.
      
      Fix by replacing direct access of file->f_path.dentry with the
      file_dentry() accessor, which will always return a native object.
      Reported-by: Daniel Axtens <dja@axtens.net>
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Fixes: ff978b09 ("ext4 crypto: move context consistency check to ext4_file_open()")
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org> # v4.5
    • ext4: use dget_parent() in ext4_file_open() · 9dd78d8c
      Miklos Szeredi committed
      In f_op->open() lock on parent is not held, so there's no guarantee that
      parent dentry won't go away at any time.
      
      Even after this patch there's no guarantee that 'dir' will stay the parent
      of 'inode', but at least it won't be freed while being used.
      
      Fixes: ff978b09 ("ext4 crypto: move context consistency check to ext4_file_open()")
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: <stable@vger.kernel.org> # v4.5
    • nfs: use file_dentry() · be62a1a8
      Miklos Szeredi committed
      NFS may be used as lower layer of overlayfs and accessing f_path.dentry can
      lead to a crash.
      
      Fix by replacing direct access of file->f_path.dentry with the
      file_dentry() accessor, which will always return a native object.
      
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: <stable@vger.kernel.org> # v4.2
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
    • fs: add file_dentry() · d101a125
      Miklos Szeredi committed
      This series fixes bugs in nfs and ext4 due to 4bacc9c9 ("overlayfs:
      Make f_path always point to the overlay and f_inode to the underlay").
      
      Regular files opened on overlayfs will result in the file being opened on
      the underlying filesystem, while f_path points to the overlayfs
      mount/dentry.
      
      This confuses filesystems which get the dentry from struct file and assume
      it's theirs.
      
      Add a new helper, file_dentry() [*], to get the filesystem's own dentry
      from the file.  This checks file->f_path.dentry->d_flags against
      DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not
      set (this is the common, non-overlayfs case).
      
      In the uncommon case it will call into overlayfs's ->d_real() to get the
      underlying dentry, matching file_inode(file).
      
      The reason we need to check against the inode is that if the file is copied
      up while being open, d_real() would return the upper dentry, while the open
      file comes from the lower dentry.
      
      [*] If possible, it's better simply to use file_inode() instead.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: <stable@vger.kernel.org> # v4.2
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Daniel Axtens <dja@axtens.net>
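The lookup rule described in the file_dentry() commit can be modeled with a few mock structures. This is a user-space sketch under stated assumptions, not the kernel implementation: the struct layouts are simplified, `f_dentry` stands in for `f_path.dentry`, the `DCACHE_OP_REAL` value is illustrative, and `ovl_like_d_real` is a hypothetical overlay-style callback that picks the layer whose inode matches.

```c
#include <stddef.h>

#define DCACHE_OP_REAL 0x1u   /* illustrative value, not the kernel's */

struct inode { int id; };
struct dentry;

struct dentry_operations {
    struct dentry *(*d_real)(struct dentry *dentry, struct inode *inode);
};

struct dentry {
    unsigned int d_flags;
    const struct dentry_operations *d_op;
    struct inode *d_inode;
    struct dentry *lower;   /* only used by the overlay-like dentry */
    struct dentry *upper;   /* only used by the overlay-like dentry */
};

struct file {
    struct dentry *f_dentry;   /* models f_path.dentry */
    struct inode *f_inode;     /* models file_inode(file) */
};

/* Common case: flag not set, the dentry already belongs to the filesystem. */
struct dentry *file_dentry(const struct file *file)
{
    struct dentry *dentry = file->f_dentry;

    if (dentry->d_flags & DCACHE_OP_REAL)
        return dentry->d_op->d_real(dentry, file->f_inode);
    return dentry;
}

/* Overlay-like ->d_real(): return the layer that matches the open inode. */
struct dentry *ovl_like_d_real(struct dentry *dentry, struct inode *inode)
{
    if (dentry->upper && dentry->upper->d_inode == inode)
        return dentry->upper;
    return dentry->lower;
}

const struct dentry_operations ovl_like_ops = { .d_real = ovl_like_d_real };
```

Passing the inode into the callback is what covers the copy-up case from the commit message: a file opened on the lower layer keeps resolving to the lower dentry even after an upper dentry appears.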
    • ext4 crypto: don't let data integrity writebacks fail with ENOMEM · c9af28fd
      Theodore Ts'o committed
      We don't want the writeback triggered from the journal commit (in
      data=writeback mode) to cause the journal to abort due to
      generic_writepages() returning an ENOMEM error.  In addition, if
      fsync() fails with ENOMEM, most applications will probably not do the
      right thing.
      
      So if we are doing a data integrity sync, and ext4_encrypt() returns
      ENOMEM, we will submit any queued I/O to date, and then retry the
      allocation using GFP_NOFAIL.
      
      Google-Bug-Id: 27641567
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  3. 23 Mar, 2016 1 commit
  4. 22 Mar, 2016 7 commits
  5. 21 Mar, 2016 1 commit
    • btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums · 389f239c
      Chris Mason committed
      Commit c40a3d38 (Btrfs: Compute and look up csums based on
      sectorsized blocks) changes around how we walk the bios while looking up
      crcs.  There's an inner loop that is jumping to the next bvec based on
      sectors and before it derefs the next bvec, it needs to make sure we're
      still in the bio.
      
      In this case, the outer loop would have decided to stop moving forward
      too, and the bvec deref is never actually used for anything.  But
      CONFIG_DEBUG_PAGEALLOC catches it because we're outside our bio.
      Signed-off-by: Chris Mason <clm@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
  6. 19 Mar, 2016 1 commit
    • splice: handle zero nr_pages in splice_to_pipe() · d6785d91
      Rabin Vincent committed
      Running the following command:
      
       busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null
      
      with any tracing enabled pretty quickly leads to various NULL
      pointer dereferences and VM BUG_ON()s, such as these:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
       IP: [<ffffffff8119df6c>] generic_pipe_buf_release+0xc/0x40
       Call Trace:
        [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0
        [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10
        [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0
        [<ffffffff81196869>] do_sendfile+0x199/0x380
        [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0
        [<ffffffff8192cbee>] entry_SYSCALL_64_fastpath+0x12/0x6d
      
       page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
       kernel BUG at include/linux/mm.h:367!
       invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       RIP: [<ffffffff8119df9c>] generic_pipe_buf_release+0x3c/0x40
       Call Trace:
        [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0
        [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10
        [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0
        [<ffffffff81196869>] do_sendfile+0x199/0x380
        [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0
        [<ffffffff8192cd1e>] tracesys_phase2+0x84/0x89
      
      (busybox's cat uses sendfile(2), unlike the coreutils version)
      
      This is because tracing_splice_read_pipe() can call splice_to_pipe()
      with spd->nr_pages == 0.  spd_pages underflows in splice_to_pipe() and
      we fill the page pointers and the other fields of the pipe_buffers with
      garbage.
      
      All other callers of splice_to_pipe() avoid calling it when nr_pages ==
      0, and we could make tracing_splice_read_pipe() do that too, but it
      seems reasonable to have splice_to_pipe() handle this condition
      gracefully.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
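The underflow mentioned in the splice commit is ordinary unsigned wraparound. The following is a simplified model, not the body of splice_to_pipe(): `fill_pipe`, its parameters and the `guard` switch are hypothetical, and the loop shape is only an assumption about how a "decrement, then test for zero" counter misbehaves when it starts at zero.

```c
/*
 * Fill up to free_slots pipe slots from nr_pages source pages and
 * return how many slots were written. With guard == 0 the function
 * mimics a caller-trusting loop; with guard != 0 it rejects
 * nr_pages == 0 up front.
 */
unsigned int fill_pipe(unsigned int nr_pages, unsigned int free_slots,
                       int guard)
{
    unsigned int remaining = nr_pages;
    unsigned int filled = 0;

    if (guard && nr_pages == 0)
        return 0;

    while (filled < free_slots) {
        filled++;   /* a real implementation would copy a page here */

        /* With nr_pages == 0 this decrement wraps to UINT_MAX
         * instead of reaching 0, so the loop keeps going. */
        if (--remaining == 0)
            break;
    }
    return filled;
}
```

With `nr_pages == 0` and no guard, the model fills every free slot from pages that do not exist, which matches the "garbage" pipe buffers the commit message describes. The guarded variant returns 0 immediately.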
  7. 18 Mar, 2016 20 commits
    • f2fs: submit node page write bios when really required · 12bb0a8f
      Jaegeuk Kim committed
      If many threads call fsync with data writes, we don't need to flush every
      bio that carries node page writes.
      f2fs_wait_on_page_writeback will flush its bios when the page is really
      needed.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add missing argument to f2fs_setxattr stub · fff4c55d
      Arnd Bergmann committed
      The f2fs_setxattr() prototype for CONFIG_F2FS_FS_XATTR=n has
      been wrong for a long time, since 8ae8f162 ("f2fs: support
      xattr security labels"), but there have never been any callers,
      so it did not matter.
      
      Now, the function gets called from f2fs_ioc_keyctl(), which
      causes a build failure:
      
      fs/f2fs/file.c: In function 'f2fs_ioc_keyctl':
      include/linux/stddef.h:7:14: error: passing argument 6 of 'f2fs_setxattr' makes integer from pointer without a cast [-Werror=int-conversion]
       #define NULL ((void *)0)
                    ^
      fs/f2fs/file.c:1599:27: note: in expansion of macro 'NULL'
           value, F2FS_KEY_SIZE, NULL, type);
                                 ^
      In file included from ../fs/f2fs/file.c:29:0:
      fs/f2fs/xattr.h:129:19: note: expected 'int' but argument is of type 'void *'
       static inline int f2fs_setxattr(struct inode *inode, int index,
                         ^
      fs/f2fs/file.c:1597:9: error: too many arguments to function 'f2fs_setxattr'
        return f2fs_setxattr(inode, F2FS_XATTR_INDEX_KEY,
               ^
      In file included from ../fs/f2fs/file.c:29:0:
      fs/f2fs/xattr.h:129:19: note: declared here
       static inline int f2fs_setxattr(struct inode *inode, int index,
      
      This changes the prototype of the empty stub function to match
      that of the actual implementation. This will not make the key
      management work when F2FS_FS_XATTR is disabled, but it gets it
      to build at least.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid unneeded unlock_new_inode · d726732c
      Chao Yu committed
      During ->lookup, the I_NEW state of the inode has already been cleared in
      f2fs_iget, so in the error path we don't need to clear it again.
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up opened code with f2fs_update_dentry · 291bf80b
      Chao Yu committed
      Just replace open-coded logic with the existing function; no logic change.
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: declare static functions · 17a0ee55
      Jaegeuk Kim committed
      Just to avoid sparse warnings.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use cryptoapi crc32 functions · 43b6573b
      Keith Mok committed
      The crc function is done bit by bit.
      Optimize this by using the cryptoapi
      crc32 function, which is backed by h/w acceleration.
      Signed-off-by: Keith Mok <ek9852@gmail.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
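For readers unfamiliar with what "bit by bit" means here, a bit-at-a-time CRC-32 looks roughly like the sketch below. It is not the f2fs code that the commit replaced: the function name `crc32_le_bitwise` is hypothetical, and the reflected polynomial 0xEDB88320 is the standard CRC-32 one, assumed here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_LE 0xEDB88320u   /* reflected CRC-32 polynomial */

/*
 * Update crc with len bytes from buf, one bit at a time.
 * No initial or final XOR is applied; the caller chooses the seed.
 */
uint32_t crc32_le_bitwise(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len--) {
        crc ^= *p++;
        /* Eight shift-and-conditional-XOR steps per input byte. */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? CRC32_POLY_LE : 0u);
    }
    return crc;
}
```

Seeding with 0xFFFFFFFF and inverting the result gives the conventional CRC-32, for which the well-known check value of the ASCII string "123456789" is 0xCBF43926. The inner loop runs eight times per byte, which is the cost a table-driven or hardware-backed implementation avoids.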
    • f2fs: modify the readahead method in ra_node_page() · 999270de
      Fan Li committed
      ra_node_page() is used to read ahead one node page. Compared to a regular
      read, it's faster because it doesn't wait for IO completion.
      But if it is called twice for the same block, and the IO request
      from the first call hasn't completed before the second call, the second
      call will have to wait until the read is over.
      
      Here use the code in __do_page_cache_readahead() to solve this problem.
      It does nothing when someone else has already put the page in the mapping.
      The status of the page should be ensured by whoever put it there.
      This implementation also avoids altering the page reference count.
      Signed-off-by: Fan li <fanofcode.li@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: sync ext4_lookup and ext4_file_open · 8074bb51
      Jaegeuk Kim committed
      This patch tries to catch up with lookup and open policies in ext4.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • fs crypto: move per-file encryption from f2fs tree to fs/crypto · 0b81d077
      Jaegeuk Kim committed
      This patch adds the renamed functions moved from the f2fs crypto files.
      
      1. definitions for per-file encryption used by ext4 and f2fs.
      
      2. crypto.c for encrypt/decrypt functions
       a. IO preparation:
        - fscrypt_get_ctx / fscrypt_release_ctx
       b. before IOs:
        - fscrypt_encrypt_page
        - fscrypt_decrypt_page
        - fscrypt_zeroout_range
       c. after IOs:
        - fscrypt_decrypt_bio_pages
        - fscrypt_pullback_bio_page
        - fscrypt_restore_control_page
      
      3. policy.c supporting context management.
       a. For ioctls:
        - fscrypt_process_policy
        - fscrypt_get_policy
       b. For context permission
        - fscrypt_has_permitted_context
        - fscrypt_inherit_context
      
      4. keyinfo.c to handle permissions
        - fscrypt_get_encryption_info
        - fscrypt_free_encryption_info
      
      5. fname.c to support filename encryption
       a. general wrapper functions
        - fscrypt_fname_disk_to_usr
        - fscrypt_fname_usr_to_disk
        - fscrypt_setup_filename
        - fscrypt_free_filename
      
       b. specific filename handling functions
        - fscrypt_fname_alloc_buffer
        - fscrypt_fname_free_buffer
      
      6. Makefile and Kconfig
      
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: Michael Halcrow <mhalcrow@google.com>
      Signed-off-by: Ildar Muslukhov <ildarm@google.com>
      Signed-off-by: Uday Savagaonkar <savagaon@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • lib: update single-char callers of strtobool() · 1404297e
      Kees Cook committed
      Some callers of strtobool() were passing a pointer to unterminated
      strings.  In preparation of adding multi-character processing to
      kstrtobool(), update the callers to not pass single-character pointers,
      and switch to using the new kstrtobool_from_user() helper where
      possible.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Amitkumar Karwar <akarwar@marvell.com>
      Cc: Nishant Sarmukadam <nishants@marvell.com>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: Steve French <sfrench@samba.org>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
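The hazard with single-character callers is that a pointer to one `char` is not a NUL-terminated string, which only stays harmless while the parser reads exactly one byte. A minimal user-space sketch of the safe pattern follows; `parse_bool` is a simplified stand-in written for this example (it is not the kernel's strtobool() or kstrtobool()), and `parse_bool_char` is a hypothetical wrapper.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified first-character boolean parser, standing in for a
 * string-based helper. Expects a NUL-terminated string.
 */
int parse_bool(const char *s, bool *res)
{
    if (!s)
        return -EINVAL;

    switch (s[0]) {
    case 'y': case 'Y': case '1':
        *res = true;
        return 0;
    case 'n': case 'N': case '0':
        *res = false;
        return 0;
    default:
        return -EINVAL;
    }
}

/*
 * Parse a single character safely: copy it into a terminated buffer
 * first, so a parser that later learns to read several characters
 * never runs past the caller's storage.
 */
int parse_bool_char(char c, bool *res)
{
    const char buf[2] = { c, '\0' };

    return parse_bool(buf, res);
}
```

The wrapper costs two bytes of stack and keeps working unchanged if the parser is later extended to accept multi-character inputs.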
    • btrfs: use radix_tree_iter_retry() · c28f2420
      Matthew Wilcox committed
      Even though this is a 'can't happen' situation, use the new
      radix_tree_iter_retry() pattern to eliminate a goto.
      
      [akpm@linux-foundation.org: fix btrfs build]
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: David Sterba <dsterba@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc-vmcore: wrong data type casting fix · 0b50a2d8
      Dave Young committed
      On an i686 machine with PAE enabled, the contiguous physical area can be
      large, which can cause values to be truncated in the calculation below in
      read_vmcore() and mmap_vmcore():
      
      	tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);
      
      That is, the types involved on i686 are:
      m->offset: unsigned long long int
      m->size:   unsigned long long int
      *fpos:     loff_t (long long int)
      buflen:    size_t (unsigned int)
      
      So casting (m->offset + m->size - *fpos) to size_t truncates the value
      modulo 4GB.
      
      Suppose (m->offset + m->size - *fpos) is truncated to 0 while buflen > 0;
      then we get tsz = 0, which is of course not the expected result.
      Similarly, we could get other truncated values less than buflen.
      Then the real size passed down is no longer correct.
      
      If (m->offset + m->size - *fpos) is above 4GB, read_vmcore or
      mmap_vmcore uses the min_t result with the truncated value being compared
      to buflen.  Then fpos proceeds with the wrong value, so we hit the bugs
      below:
      
      1) read_vmcore will refuse to continue so makedumpfile fails.
      2) mmap_vmcore will trigger BUG_ON() in remap_pfn_range().
      
      Use unsigned long long in min_t instead so that the variables are not
      truncated.
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jianyu Zhan <nasa4836@gmail.com>
      Cc: Minfei Huang <mhuang@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
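The truncation in the vmcore commit can be reproduced on any machine by modeling the 32-bit size_t with `uint32_t`. The sketch below is a user-space illustration, not the kernel code: `tsz_truncating` and `tsz_wide` are hypothetical names, and plain comparisons stand in for the kernel's min_t() macro.

```c
#include <stdint.h>

/*
 * Models min_t(size_t, offset + size - fpos, buflen) on i686, where
 * size_t is 32 bits wide: the 64-bit difference is cast down first.
 */
uint64_t tsz_truncating(uint64_t offset, uint64_t size, int64_t fpos,
                        uint32_t buflen)
{
    uint32_t remaining = (uint32_t)(offset + size - (uint64_t)fpos);

    return remaining < buflen ? remaining : buflen;
}

/*
 * Models the fix: compare in a 64-bit type, so the remaining length
 * keeps its high bits.
 */
uint64_t tsz_wide(uint64_t offset, uint64_t size, int64_t fpos,
                  uint32_t buflen)
{
    uint64_t remaining = offset + size - (uint64_t)fpos;

    return remaining < buflen ? remaining : buflen;
}
```

For a region of exactly 4 GiB read from position 0 with a 4096-byte buffer, the truncating version yields 0 while the wide version yields 4096. This is the "tsz = 0" case the commit message describes.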
    • proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan" · 7e2bc81d
      Minfei Huang committed
      It is not elegant that the shell prompt does not start on a new line after
      executing "cat /proc/$pid/wchan".  Make the shell prompt start on a new
      line.
      Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • procfs: add conditional compilation check · b5946bea
      Eric Engestrom committed
      `proc_timers_operations` is only used when CONFIG_CHECKPOINT_RESTORE is
      enabled.
      Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: add /proc/<pid>/timerslack_ns interface · 5de23d43
      John Stultz committed
      This patch provides a proc/PID/timerslack_ns interface which exposes a
      task's timerslack value in nanoseconds and allows it to be changed.
      
      This allows power/performance management software to set timer slack for
      other threads according to its policy for the thread (such as when the
      thread is designated foreground vs.  background activity)
      
      If the value written is non-zero, slack is set to that value.  Otherwise
      it is set to the default for the thread.
      
      This interface checks that the calling task has permission to use
      PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure
      arbitrary apps do not change the timer slack for other apps.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • timer: convert timer_slack_ns from unsigned long to u64 · da8b44d5
      John Stultz committed
      This patchset introduces a /proc/<pid>/timerslack_ns interface which
      would allow controlling processes to be able to set the timerslack value
      on other processes in order to save power by avoiding wakeups (Something
      Android currently does via out-of-tree patches).
      
      The first patch tries to fix the internal timer_slack_ns usage which was
      defined as a long, which limits the slack range to ~4 seconds on 32bit
      systems.  It converts it to a u64, which provides the same basically
      unlimited slack (500 years) on both 32bit and 64bit machines.
      
      The second patch introduces the /proc/<pid>/timerslack_ns interface
      which allows the full 64bit slack range for a task to be read or set on
      both 32bit and 64bit machines.
      
      With these two patches, on a 32bit machine, after setting the slack on
      bash to 10 seconds:
      
      $ time sleep 1
      
      real    0m10.747s
      user    0m0.001s
      sys     0m0.005s
      
      The first patch is a little ugly, since I had to chase the slack delta
      arguments through a number of functions converting them to u64s.  Let me
      know if it makes sense to break that up more or not.
      
      Other than that things are fairly straightforward.
      
      This patch (of 2):
      
      The timer_slack_ns value in the task struct is currently an unsigned
      long.  This means that on 32bit applications, the maximum slack is just
      over 4 seconds.  However, on 64bit machines, it's much, much larger (~500
      years).
      
      This disparity could complicate application development, so this patch
      converts timer_slack_ns (as well as the default_slack) to a u64.  This
      means both 32bit and 64bit systems have the same effective internal
      slack range.
      
      Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specifies
      the interface as an unsigned long, so we preserve that limitation on
      32bit systems, where SET_TIMERSLACK can only set the slack to an unsigned
      long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
      actually larger than what can be stored by an unsigned long.
      
      This patch also modifies hrtimer functions which specified the slack
      delta as an unsigned long.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
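The "just over 4 seconds" and "~500 years" figures follow directly from the widths of the two types, and the PR_GET_TIMERSLACK clamp is a simple saturation. The sketch below works this out in user space under stated assumptions: the function names are hypothetical, `uint32_t` models a 32-bit unsigned long, and a year is taken as 365 days.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Largest slack, in whole seconds, that fits a 32-bit unsigned long. */
uint64_t max_slack_seconds_32bit(void)
{
    return UINT32_MAX / NSEC_PER_SEC;
}

/* Largest slack, in whole 365-day years, that fits a u64. */
uint64_t max_slack_years_64bit(void)
{
    return UINT64_MAX / NSEC_PER_SEC / 86400ULL / 365ULL;
}

/*
 * Models what PR_GET_TIMERSLACK reports on a 32-bit system after the
 * conversion: a u64 slack that does not fit is reported as ULONG_MAX.
 */
uint32_t get_timerslack_32bit_abi(uint64_t slack_ns)
{
    return slack_ns > UINT32_MAX ? UINT32_MAX : (uint32_t)slack_ns;
}
```

With these definitions the 32-bit limit comes out at 4 whole seconds and the 64-bit limit at 584 whole years, in line with the rough figures in the commit message. A 10-second slack, as in the `time sleep 1` example, would be reported through the 32-bit prctl interface as the saturated maximum.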
    • mm: introduce page reference manipulation functions · fe896d18
      Joonsoo Kim committed
      The success of CMA allocation largely depends on the success of
      migration, and a key factor in that is the page reference count.  Until
      now, the page reference count has been manipulated by calling atomic
      functions directly, so we cannot track who manipulates it and where.
      That makes it hard to find the actual reason for a CMA allocation
      failure.  CMA allocation should be guaranteed to succeed, so finding the
      offending place is really important.
      
      In this patch, call sites where the page reference count is manipulated
      are converted to the newly introduced wrapper functions.  This is a
      preparation step for adding a tracepoint to each page reference
      manipulation function.  With this facility, we can easily find the reason
      for a CMA allocation failure.  There is no functional change in this
      patch.
      
      In addition, this patch also converts reference read sites.  It will
      help a second step that renames page._count to something else and
      prevents later attempts to access it directly (Suggested by Andrew).
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: calculate 'available' memory in a separate function · d02bd27b
      Igor Redko committed
      Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
      statistics protocol, corresponding to 'Available' in /proc/meminfo.
      
      It indicates to the hypervisor how big the balloon can be inflated
      without pushing the guest system to swap.  This metric would be very
      useful in VM orchestration software to improve memory management of
      different VMs under overcommit.
      
      This patch (of 2):
      
      Factor out calculation of the available memory counter into a separate
      exportable function, in order to be able to use it in other parts of the
      kernel.
      
      In particular, it appears a relevant metric to report to the hypervisor
      via virtio-balloon statistics interface (in a followup patch).
      Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • /proc/kpageflags: return KPF_SLAB for slab tail pages · 0a71649c
      Naoya Horiguchi committed
      Currently /proc/kpageflags returns just KPF_COMPOUND_TAIL for slab tail
      pages, which is inconvenient when grasping how slab pages are
      distributed (userspace always needs to check which kind of tail pages by
      itself).  This patch sets KPF_SLAB for such pages.
      
      With this patch:
      
        $ grep Slab /proc/meminfo ; tools/vm/page-types -b slab
        Slab:              64880 kB
                     flags      page-count       MB  symbolic-flags                     long-symbolic-flags
        0x0000000000000080           16220       63  _______S__________________________________ slab
                     total           16220       63
      
      16220 pages equal 64880 kB, so the returned result is consistent with the
      global counter.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • /proc/kpageflags: return KPF_BUDDY for "tail" buddy pages · 832fc1de
      Naoya Horiguchi committed
      Currently /proc/kpageflags returns nothing for "tail" buddy pages, which
      is inconvenient when grasping how free pages are distributed.  This
      patch sets KPF_BUDDY for such pages.
      
      With this patch:
      
        $ grep MemFree /proc/meminfo ; tools/vm/page-types -b buddy
        MemFree:         3134992 kB
                     flags      page-count       MB  symbolic-flags                     long-symbolic-flags
        0x0000000000000400          779272     3044  __________B_______________________________ buddy
        0x0000000000000c00            4385       17  __________BM______________________________ buddy,mmap
                     total          783657     3061
      
      783657 pages is 3134628 kB (roughly consistent with the global counter),
      so it's OK.
      
      [akpm@linux-foundation.org: update comment, per Naoya]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
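The page-count checks in the two kpageflags commits above are straightforward unit conversions. The sketch below reproduces them, assuming 4096-byte pages (the commit messages do not state the page size, but the quoted numbers only work out with 4 kB pages); the helper names are hypothetical.

```c
#include <stdint.h>

/* Convert a page count to kB for a given page size in bytes. */
uint64_t pages_to_kb(uint64_t pages, uint64_t page_size_bytes)
{
    return pages * page_size_bytes / 1024u;
}

/* Convert a page count to whole MB, rounding down. */
uint64_t pages_to_mb(uint64_t pages, uint64_t page_size_bytes)
{
    return pages * page_size_bytes / (1024u * 1024u);
}
```

With 4096-byte pages, 16220 slab pages give 64880 kB, exactly the `Slab:` line, and 783657 buddy pages give 3134628 kB, which is 364 kB below the `MemFree:` value of 3134992 kB. The rounded-down MB figures (63 and 3061) also match the `MB` column of the page-types output.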
  8. 17 Mar, 2016 1 commit
  9. 16 Mar, 2016 1 commit