1. 21 5月, 2013 5 次提交
    • D
      xfs: fix missing KM_NOFS tags to keep lockdep happy · ac14876c
      Dave Chinner 提交于
      There are several places where we use KM_SLEEP allocation contexts
      and use the fact that they are called from transaction context to
      add KM_NOFS where appropriate. Unfortunately, there are several
      places where the code makes this assumption but can be called from
      outside transaction context but with filesystem locks held. These
      places need explicit KM_NOFS annotations to avoid lockdep
      complaining about reclaim contexts.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ac14876c
    • D
      xfs: Don't reference the EFI after it is freed · 52c24ad3
      Dave Chinner 提交于
      Checking the EFI for whether it is being released from recovery
      after we've already released the known active reference is a mistake
      worthy of a brown paper bag. Fix the (now) obvious use after free
      that it can cause.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      52c24ad3
    • D
      xfs: fix rounding in xfs_free_file_space · 28ca489c
      Dave Chinner 提交于
      The offset passed into xfs_free_file_space() needs to be rounded
      down to a certain size, but the rounding mask is built by a 32 bit
      variable. Hence the mask will always mask off the upper 32 bits of
      the offset and lead to incorrect writeback and invalidation ranges.
      
      This is not actually exposed as a bug because we writeback and
      invalidate from the rounded offset to the end of the file, and hence
      the offset we are actually punching a hole out of will always be
      covered by the code. This needs fixing, however, if we ever want to
      use exact ranges for writeback/invalidation here...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      28ca489c
    • D
      xfs: fix sub-page blocksize data integrity writes · 49b137cb
      Dave Chinner 提交于
      FSX on 512 byte block size filesystems has been failing for some
      time with corrupted data. The fault dates back to the change in
      the writeback data integrity algorithm that uses a mark-and-sweep
      approach to avoid data writeback livelocks.
      
      Unfortunately, a side effect of this mark-and-sweep approach is that
      each page will only be written once for a data integrity sync, and
      there is a condition in writeback in XFS where a page may require
      two writeback attempts to be fully written. As a result of the high
      level change, we now only get a partial page writeback during the
      integrity sync because the first pass through writeback clears the
      mark left on the page index to tell writeback that the page needs
      writeback....
      
      The cause is writing a partial page in the clustering code. This can
      happen when a mapping boundary falls in the middle of a page - we
      end up writing back the first part of the page that the mapping
      covers, but then never revisit the page to have the remainder mapped
      and written.
      
      The fix is simple - if the mapping boundary falls inside a page,
      then simple abort clustering without touching the page. This means
      that the next ->writepage entry that write_cache_pages() will make
      is the page we aborted on, and xfs_vm_writepage() will map all
      sections of the page correctly. This behaviour is also optimal for
      non-data integrity writes, as it results in contiguous sequential
      writeback of the file rather than missing small holes and having to
      write them a "random" writes in a future pass.
      
      With this fix, all the fsx tests in xfstests now pass on a 512 byte
      block size filesystem on a 4k page machine.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      49b137cb
    • J
      xfs: Avoid pathological backwards allocation · 211d022c
      Jan Kara 提交于
      Writing a large file using direct IO in 16 MB chunks sometimes results
      in a pathological allocation pattern where 16 MB chunks of large free
      extent are allocated to a file in a reversed order. So extents of a file
      look for example as:
      
       ext logical physical expected length flags
         0        0        13          4550656
         1  4550656 188136807   4550668 12562432
         2 17113088 200699240 200699238 622592
         3 17735680 182046055 201321831   4096
         4 17739776 182041959 182050150   4096
         5 17743872 182037863 182046054   4096
         6 17747968 182033767 182041958   4096
         7 17752064 182029671 182037862   4096
      ...
      6757 45400064 154381644 154389835   4096
      6758 45404160 154377548 154385739   4096
      6759 45408256 252951571 154381643  73728 eof
      
      This happens because XFS_ALLOCTYPE_THIS_BNO allocation fails (the last
      extent in the file cannot be further extended) so we fall back to
      XFS_ALLOCTYPE_NEAR_BNO allocation which picks end of a large free
      extent as the best place to continue the file. Since the chunk at the
      end of the free extent again cannot be further extended, this behavior
      repeats until the whole free extent is consumed in a reversed order.
      
      For data allocations this backward allocation isn't beneficial so make
      xfs_alloc_compute_diff() pick start of a free extent instead of its end
      for them. That avoids the backward allocation pattern.
      
      See thread at http://oss.sgi.com/archives/xfs/2013-03/msg00144.html for
      more details about the reproduction case and why this solution was
      chosen.
      
      Based on idea by Dave Chinner <dchinner@redhat.com>.
      
      CC: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      211d022c
  2. 10 5月, 2013 4 次提交
    • T
      eCryptfs: Use the ablkcipher crypto API · 4dfea4f0
      Tyler Hicks 提交于
      Make the switch from the blkcipher kernel crypto interface to the
      ablkcipher interface.
      
      encrypt_scatterlist() and decrypt_scatterlist() now use the ablkcipher
      interface but, from the eCryptfs standpoint, still treat the crypto
      operation as a synchronous operation. They submit the async request and
      then wait until the operation is finished before they return. Most of
      the changes are contained inside those two functions.
      
      Despite waiting for the completion of the crypto operation, the
      ablkcipher interface provides performance increases in most cases when
      used on AES-NI capable hardware.
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Acked-by: NColin King <colin.king@canonical.com>
      Reviewed-by: NZeev Zilberman <zeev@annapurnaLabs.com>
      Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Ying Huang <ying.huang@intel.com>
      Cc: Thieu Le <thieule@google.com>
      Cc: Li Wang <dragonylffly@163.com>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@iki.fi>
      4dfea4f0
    • A
      91c2e0bc
    • A
      ecryptfs: don't open-code kernel_read() · 39dfe6c6
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      39dfe6c6
    • J
      nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error · 7255e716
      Jeff Layton 提交于
      Toralf reported the following oops to the linux-nfs mailing list:
      
          -----------------[snip]------------------
          NFSD: unable to generate recoverydir name (-2).
          NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!
          BUG: unable to handle kernel NULL pointer dereference at 000003c8
          IP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
          *pdpt = 000000002ba33001 *pde = 0000000000000000
          Oops: 0000 [#1] SMP
          Modules linked in: loop nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun arc4 iwldvm mac80211 coretemp kvm_intel uvcvideo sdhci_pci sdhci mmc_core videobuf2_vmalloc videobuf2_memops usblp videobuf2_core i915 iwlwifi psmouse videodev cfg80211 kvm fbcon bitblit cfbfillrect acpi_cpufreq mperf evdev softcursor font cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt drm_kms_helper snd_hda_codec_conexant drm agpgart fb fbdev tpm_tis thinkpad_acpi tpm nvram e1000e rfkill thermal ptp wmi pps_core tpm_bios 8250_pci processor 8250 ac snd_hda_intel snd_hda_codec snd_pcm battery video i2c_i801 snd_page_alloc snd_timer button serial_core i2c_core snd soundcore thermal_sys hwmon aesni_intel ablk_helper cryp
      td lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
          Pid: 6374, comm: nfsd Not tainted 3.9.1 #6 LENOVO 4180F65/4180F65
          EIP: 0060:[<f90a3d91>] EFLAGS: 00010202 CPU: 0
          EIP is at nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
          EAX: 00000000 EBX: fffffffe ECX: 00000007 EDX: 00000007
          ESI: eb9dcb00 EDI: eb2991c0 EBP: eb2bde38 ESP: eb2bde34
          DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
          CR0: 80050033 CR2: 000003c8 CR3: 2ba80000 CR4: 000407f0
          DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
          DR6: ffff0ff0 DR7: 00000400
          Process nfsd (pid: 6374, ti=eb2bc000 task=eb2711c0 task.ti=eb2bc000)
          Stack:
          fffffffe eb2bde4c f90a3e0c f90a7754 fffffffe eb0a9c00 eb2bdea0 f90a41ed
          eb2991c0 1b270000 eb2991c0 eb2bde7c f9099ce9 eb2bde98 0129a020 eb29a020
          eb2bdecc eb2991c0 eb2bdea8 f9099da5 00000000 eb9dcb00 00000001 67822f08
          Call Trace:
          [<f90a3e0c>] legacy_recdir_name_error+0x3c/0x40 [nfsd]
          [<f90a41ed>] nfsd4_create_clid_dir+0x15d/0x1c0 [nfsd]
          [<f9099ce9>] ? nfsd4_lookup_stateid+0x99/0xd0 [nfsd]
          [<f9099da5>] ? nfs4_preprocess_seqid_op+0x85/0x100 [nfsd]
          [<f90a4287>] nfsd4_client_record_create+0x37/0x50 [nfsd]
          [<f909d6ce>] nfsd4_open_confirm+0xfe/0x130 [nfsd]
          [<f90980b1>] ? nfsd4_encode_operation+0x61/0x90 [nfsd]
          [<f909d5d0>] ? nfsd4_free_stateid+0xc0/0xc0 [nfsd]
          [<f908fd0b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
          [<f9081b7b>] nfsd_dispatch+0x8b/0x1a0 [nfsd]
          [<f857b85d>] svc_process+0x3dd/0x640 [sunrpc]
          [<f908165d>] nfsd+0xad/0x110 [nfsd]
          [<f90815b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
          [<c1054824>] kthread+0x94/0xa0
          [<c1486937>] ret_from_kernel_thread+0x1b/0x28
          [<c1054790>] ? flush_kthread_work+0xd0/0xd0
          Code: 86 b0 00 00 00 90 c5 0a f9 c7 04 24 70 76 0a f9 e8 74 a9 3d c8 eb ba 8d 76 00 55 89 e5 53 66 66 66 66 90 8b 15 68 c7 0a f9 85 d2 <8b> 88 c8 03 00 00 74 2c 3b 11 77 28 8b 5c 91 08 85 db 74 22 8b
          EIP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd] SS:ESP 0068:eb2bde34
          CR2: 00000000000003c8
          ---[ end trace 09e54015d145c9c6 ]---
      
      The problem appears to be a regression that was introduced in commit
      9a9c6478 "nfsd: make NFSv4 recovery client tracking options per net".
      Prior to that commit, it was safe to pass a NULL net pointer to
      nfsd4_client_tracking_exit in the legacy recdir case, and
      legacy_recdir_name_error did so. After that comit, the net pointer must
      be valid.
      
      This patch just fixes legacy_recdir_name_error to pass in a valid net
      pointer to that function.
      
      Cc: <stable@vger.kernel.org> # v3.8+
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Reported-and-tested-by: NToralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7255e716
  3. 09 5月, 2013 2 次提交
  4. 08 5月, 2013 29 次提交
    • J
      f2fs: cover free_nid management with spin_lock · 59bbd474
      Jaegeuk Kim 提交于
      After build_free_nids() searches free nid candidates from nat pages and
      current journal blocks, it checks all the candidates if they are allocated
      so that the nat cache has its nid with an allocated block address.
      
      In this procedure, previously we used
          list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list).
      But, this is not covered by free_nid_list_lock, resulting in null pointer bug.
      
      This patch moves this checking routine inside add_free_nid() in order not to use
      the spin_lock.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      59bbd474
    • H
      f2fs: optimize scan_nat_page() · 23d38844
      Haicheng Li 提交于
      When nm_i->fcnt > 2 * MAX_FREE_NIDS, stop scanning other NAT entries.
      Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
      [Jaegeuk Kim: fix handling the return value of add_free_nid()]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      23d38844
    • H
      f2fs: code cleanup for scan_nat_page() and build_free_nids() · 8760952d
      Haicheng Li 提交于
      This patch does two cleanups:
      1. remove unused variable "fcnt" in build_free_nids().
      2. make scan_nat_page() as void type and remove useless variable "fcnt".
      Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      8760952d
    • H
      f2fs: bugfix for alloc_nid_failed() · 95630cba
      Haicheng Li 提交于
      Directly drop the free_nid cache when nm_i->fcnt > 2 * MAX_FREE_NIDS
      
      Since there is NOT nmi->free_nid_list_lock spinlock protection between
      a sequential calling of alloc_nid() and alloc_nid_failed(), some other
      threads may already add new free_nid to the free_nid_list during this
      period.
      
      We need to make sure nmi->fcnt is never > 2 * MAX_FREE_NIDS.
      Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
      [Jaegeuk Kim: fit the coding style]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      95630cba
    • C
      f2fs: recover when journal contains deleted files · 047184b4
      Chris Fries 提交于
      When recovering a journal file with fsync data for files that have
      been deleted, don't bail out on recovery.
      Signed-off-by: NChris Fries <C.Fries@motorola.com>
      Reviewed-by: NRussell Knize <rknize2@motorola.com>
      Reviewed-by: NJason Hrycay <jason.hrycay@motorola.com>
      [Jaegeuk Kim: fit the coding style]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      047184b4
    • C
      f2fs: continue to mount after failing recovery · bde582b2
      Chris Fries 提交于
      When unable to roll forward the journal, we shouldn't bail out and
      not mount, we should continue to attempt the mount.  Bad recovery data
      is likely unrecoverable at this point, and requiring the user to try
      to mount again doesn't solve any issues.
      Signed-off-by: NChris Fries <C.Fries@motorola.com>
      Reviewed-by: NRussell Knize <rknize2@motorola.com>
      Reviewed-by: NJason Hrycay <jason.hrycay@motorola.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      bde582b2
    • J
      f2fs: avoid deadlock during evict after f2fs_gc · 531ad7d5
      Jaegeuk Kim 提交于
      o Deadlock case #1
      
      Thread 1:
      - writeback_sb_inodes
       - do_writepages
        - f2fs_write_data_pages
         - write_cache_pages
          - f2fs_write_data_page
           - f2fs_balance_fs
            - wait mutex_lock(gc_mutex)
      
      Thread 2:
      - f2fs_balance_fs
       - mutex_lock(gc_mutex)
       - f2fs_gc
        - f2fs_iget
         - wait iget_locked(inode->i_lock)
      
      Thread 3:
      - do_unlinkat
       - iput
        - lock(inode->i_lock)
         - evict
          - inode_wait_for_writeback
      
      o Deadlock case #2
      
      Thread 1:
      - __writeback_single_inode
       : set I_SYNC
        - do_writepages
         - f2fs_write_data_page
          - f2fs_balance_fs
           - f2fs_gc
            - iput
             - evict
              - inode_wait_for_writeback(I_SYNC)
      
      In order to avoid this, even though iput is called with the zero-reference
      count, we need to stop the eviction procedure if the inode is on writeback.
      So this patch links f2fs_drop_inode which checks the I_SYNC flag.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      531ad7d5
    • K
      aio: don't include aio.h in sched.h · a27bb332
      Kent Overstreet 提交于
      Faster kernel compiles by way of fewer unnecessary includes.
      
      [akpm@linux-foundation.org: fix fallout]
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a27bb332
    • K
      aio: kill ki_retry · 41ef4eb8
      Kent Overstreet 提交于
      Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
      need this anymore - ki_retry was only called right after the kiocb was
      initialized.
      
      This also refactors and trims some duplicated code, as well as cleaning up
      the refcounting/error handling a bit.
      
      [akpm@linux-foundation.org: use fmode_t in aio_run_iocb()]
      [akpm@linux-foundation.org: fix file_start_write/file_end_write tests]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41ef4eb8
    • K
      aio: kill ki_key · 8a660890
      Kent Overstreet 提交于
      ki_key wasn't actually used for anything previously - it was always 0.
      Drop it to trim struct kiocb a bit.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a660890
    • J
      audit: vfs: fix audit_inode call in O_CREAT case of do_last · 33e2208a
      Jeff Layton 提交于
      Jiri reported a regression in auditing of open(..., O_CREAT) syscalls.
      In older kernels, creating a file with open(..., O_CREAT) created
      audit_name records that looked like this:
      
      type=PATH msg=audit(1360255720.628:64): item=1 name="/abc/foo" inode=138810 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255720.628:64): item=0 name="/abc/" inode=138635 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      
      ...in recent kernels though, they look like this:
      
      type=PATH msg=audit(1360255402.886:12574): item=2 name=(null) inode=264599 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255402.886:12574): item=1 name=(null) inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255402.886:12574): item=0 name="/abc/foo" inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      
      Richard bisected to determine that the problems started with commit
      bfcec708, but the log messages have changed with some later
      audit-related patches.
      
      The problem is that this audit_inode call is passing in the parent of
      the dentry being opened, but audit_inode is being called with the parent
      flag false. This causes later audit_inode and audit_inode_child calls to
      match the wrong entry in the audit_names list.
      
      This patch simply sets the flag to properly indicate that this inode
      represents the parent. With this, the audit_names entries are back to
      looking like they did before.
      
      Cc: <stable@vger.kernel.org> # v3.7+
      Reported-by: NJiri Jaburek <jjaburek@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Test By: Richard Guy Briggs <rbriggs@redhat.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      33e2208a
    • K
      aio: give shared kioctx fields their own cachelines · 4e23bcae
      Kent Overstreet 提交于
      [akpm@linux-foundation.org: make reqs_active __cacheline_aligned_in_smp]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e23bcae
    • K
      aio: kill struct aio_ring_info · 58c85dc2
      Kent Overstreet 提交于
      struct aio_ring_info was kind of odd, the only place it's used is where
      it's embedded in struct kioctx - there's no real need for it.
      
      The next patch rearranges struct kioctx and puts various things on their
      own cachelines - getting rid of struct aio_ring_info now makes that
      reordering a bit clearer.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58c85dc2
    • K
      aio: kill batch allocation · a1c8eae7
      Kent Overstreet 提交于
      Previously, allocating a kiocb required touching quite a few global
      (well, per kioctx) cachelines...  so batching up allocation to amortize
      those was worthwhile.  But we've gotten rid of some of those, and in
      another couple of patches kiocb allocation won't require writing to any
      shared cachelines, so that means we can just rip this code out.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a1c8eae7
    • K
      aio: change reqs_active to include unreaped completions · 3e845ce0
      Kent Overstreet 提交于
      The aio code tries really hard to avoid having to deal with the
      completion ringbuffer overflowing.  To do that, it has to keep track of
      the number of outstanding kiocbs, and the number of completions
      currently in the ringbuffer - and it's got to check that every time we
      allocate a kiocb.  Ouch.
      
      But - we can improve this quite a bit if we just change reqs_active to
      mean "number of outstanding requests and unreaped completions" - that
      means kiocb allocation doesn't have to look at the ringbuffer, which is
      a fairly significant win.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3e845ce0
    • K
      aio: use cancellation list lazily · 0460fef2
      Kent Overstreet 提交于
      Cancelling kiocbs requires adding them to a per kioctx linked list,
      which is one of the few things we need to take the kioctx lock for in
      the fast path.  But most kiocbs can't be cancelled - so if we just do
      this lazily, we can avoid quite a bit of locking overhead.
      
      While we're at it, instead of using a flag bit switch to using ki_cancel
      itself to indicate that a kiocb has been cancelled/completed.  This lets
      us get rid of ki_flags entirely.
      
      [akpm@linux-foundation.org: remove buggy BUG()]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0460fef2
    • K
      aio: use flush_dcache_page() · 21b40200
      Kent Overstreet 提交于
      This wasn't causing problems before because it's not needed on x86, but
      it is needed on other architectures.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      21b40200
    • K
      aio: make aio_read_evt() more efficient, convert to hrtimers · a31ad380
      Kent Overstreet 提交于
      Previously, aio_read_event() pulled a single completion off the
      ringbuffer at a time, locking and unlocking each time.  Change it to
      pull off as many events as it can at a time, and copy them directly to
      userspace.
      
      This also fixes a bug where if copying the event to userspace failed,
      we'd lose the event.
      
      Also convert it to wait_event_interruptible_hrtimeout(), which
      simplifies it quite a bit.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a31ad380
    • K
      aio: refcounting cleanup · 36f55889
      Kent Overstreet 提交于
      The usage of ctx->dead was fubar - it makes no sense to explicitly check
      it all over the place, especially when we're already using RCU.
      
      Now, ctx->dead only indicates whether we've dropped the initial
      refcount. The new teardown sequence is:
      
        set ctx->dead
        hlist_del_rcu();
        synchronize_rcu();
      
      Now we know no system calls can take a new ref, and it's safe to drop
      the initial ref:
      
        put_ioctx();
      
      We also need to ensure there are no more outstanding kiocbs.  This was
      done incorrectly - it was being done in kill_ctx(), and before dropping
      the initial refcount.  At this point, other syscalls may still be
      submitting kiocbs!
      
      Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after
      kioctx->users has dropped to 0 and we know no more iocbs could be
      submitted.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      36f55889
    • K
      aio: make aio_put_req() lockless · 11599eba
      Kent Overstreet 提交于
      Freeing a kiocb needed to touch the kioctx for three things:
      
       * Pull it off the reqs_active list
       * Decrementing reqs_active
       * Issuing a wakeup, if the kioctx was in the process of being freed.
      
      This patch moves these to aio_complete(), for a couple reasons:
      
       * aio_complete() already has to issue the wakeup, so if we drop the
         kioctx refcount before aio_complete does its wakeup we don't have to
         do it twice.
       * aio_complete currently has to take the kioctx lock, so it makes sense
         for it to pull the kiocb off the reqs_active list too.
       * A later patch is going to change reqs_active to include unreaped
         completions - this will mean allocating a kiocb doesn't have to look
         at the ringbuffer. So taking the decrement of reqs_active out of
         kiocb_free() is useful prep work for that patch.
      
      This doesn't really affect cancellation, since existing (usb) code that
      implements a cancel function still calls aio_complete() - we just have
      to make sure that aio_complete does the necessary teardown for cancelled
      kiocbs.
      
      It does affect code paths where we free kiocbs that were never
      submitted; they need to decrement reqs_active and pull the kiocb off the
      reqs_active list.  This occurs in two places: kiocb_batch_free(), which
      is going away in a later patch, and the error path in io_submit_one.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      11599eba
    • K
      aio: do fget() after aio_get_req() · 1d98ebfc
      Kent Overstreet 提交于
      aio_get_req() will fail if we have the maximum number of requests
      outstanding, which depending on the application may not be uncommon.  So
      avoid doing an unnecessary fget().
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d98ebfc
    • K
      aio: dprintk() -> pr_debug() · caf4167a
      Kent Overstreet 提交于
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      caf4167a
    • K
      aio: move private stuff out of aio.h · 4e179bca
      Kent Overstreet 提交于
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e179bca
    • K
      aio: add kiocb_cancel() · 906b973c
      Kent Overstreet 提交于
      Minor refactoring, to get rid of some duplicated code
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      906b973c
    • K
      aio: kill return value of aio_complete() · 2d68449e
      Kent Overstreet 提交于
      Nothing used the return value, and it probably wasn't possible to use it
      safely for the locked versions (aio_complete(), aio_put_req()).  Just
      kill it.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Acked-by: NZach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d68449e
    • Z
      aio: remove retry-based AIO · 41003a7b
      Zach Brown 提交于
      This removes the retry-based AIO infrastructure now that nothing in tree
      is using it.
      
      We want to remove retry-based AIO because it is fundemantally unsafe.
      It retries IO submission from a kernel thread that has only assumed the
      mm of the submitting task.  All other task_struct references in the IO
      submission path will see the kernel thread, not the submitting task.
      This design flaw means that nothing of any meaningful complexity can use
      retry-based AIO.
      
      This removes all the code and data associated with the retry machinery.
      The most significant benefit of this is the removal of the locking
      around the unused run list in the submission path.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Signed-off-by: NZach Brown <zab@redhat.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41003a7b
    • N
      hugetlbfs: fix mmap failure in unaligned size request · af73e4d9
      Naoya Horiguchi 提交于
      The current kernel returns -EINVAL unless a given mmap length is
      "almost" hugepage aligned.  This is because in sys_mmap_pgoff() the
      given length is passed to vm_mmap_pgoff() as it is without being aligned
      with hugepage boundary.
      
      This is a regression introduced in commit 40716e29 ("hugetlbfs: fix
      alignment of huge page requests"), where alignment code is pushed into
      hugetlb_file_setup() and the variable len in caller side is not changed.
      
      To fix this, this patch partially reverts that commit, and adds
      alignment code in caller side.  And it also introduces hstate_sizelog()
      in order to get proper hstate to specified hugepage size.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
      
      [akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: <iceman_dvd@yahoo.com>
      Cc: Steven Truelove <steven.truelove@utoronto.ca>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af73e4d9
    • E
      xfs: fallback to vmalloc for large buffers in xfs_compat_attrlist_by_handle · 7dfbcbef
      Eric Sandeen 提交于
      Shamelessly copied from dchinner's:
      ad650f5b xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
          
      xfsdump uses a large buffer for extended attributes, which has a
      kmalloc'd shadow buffer in the kernel. This can fail after the
      system has been running for some time as it is a high order
      allocation. Add a fallback to vmalloc so that it doesn't require
      contiguous memory and so won't randomly fail while xfsdump is
      running.
      
      This was done for xfs_attrlist_by_handle but
      xfs_compat_attrlist_by_handle (the 32-bit version) needs the same
      attention.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      7dfbcbef
    • E
      xfs: fallback to vmalloc for large buffers in xfs_attrlist_by_handle · dd700d94
      Eric Sandeen 提交于
      Shamelessly copied from dchinner's:
      ad650f5b xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
          
      xfsdump uses for a large buffer for extended attributes, which has a
      kmalloc'd shadow buffer in the kernel. This can fail after the
      system has been running for some time as it is a high order
      allocation. Add a fallback to vmalloc so that it doesn't require
      contiguous memory and so won't randomly fail while xfsdump is
      running.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      dd700d94