提交 · f044c8847bb61eff5e1e95b6f6bb950e7f4a73a4 · openanolis / cloud-kernel

13 11月, 2017 2 次提交

afs: Lay the groundwork for supporting network namespaces · f044c884

由 David Howells 提交于 11月 02, 2017

Lay the groundwork for supporting network namespaces (netns) to the AFS
filesystem by moving various global features to a network-namespace struct
(afs_net) and providing an instance of this as a temporary global variable
that everything uses via accessor functions for the moment.

The following changes have been made:

 (1) Store the netns in the superblock info.  This will be obtained from
     the mounter's nsproxy on a manual mount and inherited from the parent
     superblock on an automount.

 (2) The cell list is made per-netns.  It can be viewed through
     /proc/net/afs/cells and also be modified by writing commands to that
     file.

 (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
     This is unset by default.

 (4) The 'rootcell' module parameter, which sets a cell and VL server list
     modifies the init net namespace, thereby allowing an AFS root fs to be
     theoretically used.

 (5) The volume location lists and the file lock manager are made
     per-netns.

 (6) The AF_RXRPC socket and associated I/O bits are made per-ns.

The various workqueues remain global for the moment.

Changes still to be made:

 (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
     from the old name.

 (2) A per-netns subsys needs to be registered for AFS into which it can
     store its per-netns data.

 (3) Rather than the AF_RXRPC socket being opened on module init, it needs
     to be opened on the creation of a superblock in that netns.

 (4) The socket needs to be closed when the last superblock using it is
     destroyed and all outstanding client calls on it have been completed.
     This prevents a reference loop on the namespace.

 (5) It is possible that several namespaces will want to use AFS, in which
     case each one will need its own UDP port.  These can either be set
     through /proc/net/afs/cm_port or the kernel can pick one at random.
     The init_ns gets 7001 by default.

Other issues that need resolving:

 (1) The DNS keyring needs net-namespacing.

 (2) Where do upcalls go (eg. DNS request-key upcall)?

 (3) Need something like open_socket_in_file_ns() syscall so that AFS
     command line tools attempting to operate on an AFS file/volume have
     their RPC calls go to the right place.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

f044c884

Pass mode to wait_on_atomic_t() action funcs and provide default actions · 5e4def20

由 David Howells 提交于 11月 02, 2017

Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
extra argument and make it 'unsigned int throughout.

Also, consolidate a bunch of identical action functions into a default
function that can do the appropriate thing for the mode.

Also, change the argument name in the bit_wait*() function declarations to
reflect the fact that it's the mode and not the bit number.

[Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
should be done differently, though he's not immediately sure as to how]
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
cc: Ingo Molnar <mingo@kernel.org>

5e4def20

12 11月, 2017 1 次提交

pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() · df27067e

由 Arnd Bergmann 提交于 11月 10, 2017

__getnstimeofday() is a rather odd interface, with a number of quirks:

- The caller may come from NMI context, but the implementation is not NMI safe,
  one way to get there from NMI is

      NMI handler:
        something bad
          panic()
            kmsg_dump()
              pstore_dump()
                 pstore_record_init()
                   __getnstimeofday()

- The calling conventions are different from any other timekeeping functions,
  to deal with returning an error code during suspended timekeeping.

Address the above issues by using a completely different method to get the
time: ktime_get_real_fast_ns() is NMI safe and has a reasonable behavior
when timekeeping is suspended: it returns the time at which it got
suspended. As Thomas Gleixner explained, this is safe, as
ktime_get_real_fast_ns() does not call into the clocksource driver that
might be suspended.

The result can easily be transformed into a timespec structure. Since
ktime_get_real_fast_ns() was not exported to modules, add the export.

The pstore behavior for the suspended case changes slightly, as it now
stores the timestamp at which timekeeping was suspended instead of storing
a zero timestamp.

This change is not addressing y2038-safety, that's subject to a more
complex follow up patch.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NKees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Colin Cross <ccross@android.com>
Link: https://lkml.kernel.org/r/20171110152530.1926955-1-arnd@arndb.de

df27067e

03 11月, 2017 4 次提交

fs/hugetlbfs/inode.c: fix hwpoison reserve accounting · ab615a5b

由 Mike Kravetz 提交于 11月 02, 2017

Calling madvise(MADV_HWPOISON) on a hugetlbfs page will result in bad
(negative) reserved huge page counts.  This may not happen immediately,
but may happen later when the underlying file is removed or filesystem
unmounted.  For example:

  AnonHugePages:         0 kB
  ShmemHugePages:        0 kB
  HugePages_Total:       1
  HugePages_Free:        0
  HugePages_Rsvd:    18446744073709551615
  HugePages_Surp:        0
  Hugepagesize:       2048 kB

In routine hugetlbfs_error_remove_page(), hugetlb_fix_reserve_counts is
called after remove_huge_page.  hugetlb_fix_reserve_counts is designed
to only be called/used only if a failure is returned from
hugetlb_unreserve_pages.  Therefore, call hugetlb_unreserve_pages as
required and only call hugetlb_fix_reserve_counts in the unlikely event
that hugetlb_unreserve_pages returns an error.

Link: http://lkml.kernel.org/r/20171019230007.17043-2-mike.kravetz@oracle.com
Fixes: 78bb9203 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ab615a5b

ocfs2: fstrim: Fix start offset of first cluster group during fstrim · 105ddc93

由 Ashish Samant 提交于 11月 02, 2017

The first cluster group descriptor is not stored at the start of the
group but at an offset from the start.  We need to take this into
account while doing fstrim on the first cluster group.  Otherwise we
will wrongly start fstrim a few blocks after the desired start block and
the range can cross over into the next cluster group and zero out the
group descriptor there.  This can cause filesytem corruption that cannot
be fixed by fsck.

Link: http://lkml.kernel.org/r/1507835579-7308-1-git-send-email-ashish.samant@oracle.comSigned-off-by: NAshish Samant <ashish.samant@oracle.com>
Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: NJoseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

105ddc93

mm, /proc/pid/pagemap: fix soft dirty marking for PMD migration entry · b83d7e43

由 Huang Ying 提交于 11月 02, 2017

When the pagetable is walked in the implementation of /proc/<pid>/pagemap,
pmd_soft_dirty() is used for both the PMD huge page map and the PMD
migration entries.  That is wrong, pmd_swp_soft_dirty() should be used
for the PMD migration entries instead because the different page table
entry flag is used.

As a result, /proc/pid/pagemap may report incorrect soft dirty information
for PMD migration entries.

Link: http://lkml.kernel.org/r/20171017081818.31795-1-ying.huang@intel.com
Fixes: 84c3fc4e ("mm: thp: check pmd migration entry in common path")
Signed-off-by: N"Huang, Ying" <ying.huang@intel.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b83d7e43

fs/ncpfs: Convert timers to use timer_setup() · 9b5dfbdd

由 Kees Cook 提交于 8月 29, 2017

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NJan Kara <jack@suse.cz>

9b5dfbdd

02 11月, 2017 1 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

30 10月, 2017 1 次提交

cifs: check MaxPathNameComponentLength != 0 before using it · f74bc7c6

由 Ronnie Sahlberg 提交于 10月 30, 2017

And fix tcon leak in error path.
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: NDavid Disseldorp <ddiss@samba.org>

f74bc7c6

27 10月, 2017 1 次提交

SMB3: Validate negotiate request must always be signed · 4587eee0

由 Steve French 提交于 10月 25, 2017

According to MS-SMB2 3.2.55 validate_negotiate request must
always be signed. Some Windows can fail the request if you send it unsigned

See kernel bugzilla bug 197311

CC: Stable <stable@vger.kernel.org>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

4587eee0

26 10月, 2017 5 次提交

SMB: fix validate negotiate info uninitialised memory use · a2d9daad

由 David Disseldorp 提交于 10月 20, 2017

An undersize validate negotiate info server response causes the client
to use uninitialised memory for struct validate_negotiate_info_rsp
comparisons of Dialect, SecurityMode and/or Capabilities members.

Link: https://bugzilla.samba.org/show_bug.cgi?id=13092
Fixes: 7db0a6ef ("SMB3: Work around mount failure when using SMB3 dialect to Macs")
Signed-off-by: NDavid Disseldorp <ddiss@suse.de>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

a2d9daad

SMB: fix leak of validate negotiate info response buffer · fe83bebc

由 David Disseldorp 提交于 10月 20, 2017

Fixes: ff1c038a ("Check SMB3 dialects against downgrade attacks")
Signed-off-by: NDavid Disseldorp <ddiss@suse.de>
Signed-off-by: NSteve French <smfrench@gmail.com>

fe83bebc

CIFS: Fix NULL pointer deref on SMB2_tcon() failure · db3b5474

由 Aurélien Aptel 提交于 10月 11, 2017

If SendReceive2() fails rsp is set to NULL but is dereferenced in the
error handling code.

Cc: stable@vger.kernel.org
Signed-off-by: NAurelien Aptel <aaptel@suse.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

db3b5474

CIFS: do not send invalid input buffer on QUERY_INFO requests · 48923d2a

由 Aurelien Aptel 提交于 10月 17, 2017

query_info() doesn't use the InputBuffer field of the QUERY_INFO
request, therefore according to [MS-SMB2] it must:

a) set the InputBufferOffset to 0
b) send a zero-length InputBuffer

Doing a) is trivial but b) is a bit more tricky.

The packet is allocated according to it's StructureSize, which takes
into account an extra 1 byte buffer which we don't need
here. StructureSize fields must have constant values no matter the
actual length of the whole packet so we can't just edit that constant.

Both the NetBIOS-over-TCP message length ("rfc1002 length") L and the
iovec length L' have to be updated. Since L' is computed from L we
just update L by decrementing it by one.
Signed-off-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

48923d2a

cifs: Select all required crypto modules · 5b454a64

由 Benjamin Gilbert 提交于 10月 19, 2017

Some dependencies were lost when CIFS_SMB2 was merged into CIFS.

Fixes: 2a38e120 ("[SMB3] Remove ifdef since SMB3 (and later) now STRONGLY preferred")
Signed-off-by: NBenjamin Gilbert <benjamin.gilbert@coreos.com>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

5b454a64

25 10月, 2017 2 次提交

fuse: fix READDIRPLUS skipping an entry · c6cdd514

由 Miklos Szeredi 提交于 10月 25, 2017

Marios Titas running a Haskell program noticed a problem with fuse's
readdirplus: when it is interrupted by a signal, it skips one directory
entry.

The reason is that fuse erronously updates ctx->pos after a failed
dir_emit().

The issue originates from the patch adding readdirplus support.
Reported-by: NJakob Unterwurzacher <jakobunt@gmail.com>
Tested-by: Marios Titas <redneb@gmx.com> 
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Fixes: 0b05b183 ("fuse: implement NFS-like readdirplus support")
Cc: <stable@vger.kernel.org> # v3.9

c6cdd514

ceph: unlock dangling spinlock in try_flush_caps() · 6c2838fb

由 Jeff Layton 提交于 10月 19, 2017

sparse warns:

  fs/ceph/caps.c:2042:9: warning: context imbalance in 'try_flush_caps' - wrong count at exit

We need to exit this function with the lock unlocked, but a couple of
cases leave it locked.

Cc: stable@vger.kernel.org
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

6c2838fb

24 10月, 2017 4 次提交

ovl: do not cleanup unsupported index entries · fa0096e3

由 Amir Goldstein 提交于 10月 24, 2017

With index=on, ovl_indexdir_cleanup() tries to cleanup invalid index
entries (e.g. bad index name). This behavior could result in cleaning of
entries created by newer kernels and is therefore undesirable.
Instead, abort mount if such entries are encountered. We still cleanup
'stale' entries and 'orphan' entries, both those cases can be a result
of offline changes to lower and upper dirs.

When encoutering an index entry of type directory or whiteout, kernel
was supposed to fallback to read-only mount, but the fill_super()
operation returns EROFS in this case instead of returning success with
read-only mount flag, so mount fails when encoutering directory or
whiteout index entries. Bless this behavior by returning -EINVAL on
directory and whiteout index entries as we do for all unsupported index
entries.

Fixes: 61b67471 ("ovl: do not cleanup directory and whiteout index..")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>

fa0096e3

ovl: handle ENOENT on index lookup · 7937a56f

由 Amir Goldstein 提交于 10月 20, 2017

Treat ENOENT from index entry lookup the same way as treating a returned
negative dentry. Apparently, either could be returned if file is not
found, depending on the underlying file system.

Fixes: 359f392c ("ovl: lookup index entry for copy up origin")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>

7937a56f

ovl: fix EIO from lookup of non-indexed upper · 6eaf0111

由 Amir Goldstein 提交于 10月 12, 2017

Commit fbaf94ee ("ovl: don't set origin on broken lower hardlink")
attempt to avoid the condition of non-indexed upper inode with lower
hardlink as origin. If this condition is found, lookup returns EIO.

The protection of commit mentioned above does not cover the case of lower
that is not a hardlink when it is copied up (with either index=off/on)
and then lower is hardlinked while overlay is offline.

Changes to lower layer while overlayfs is offline should not result in
unexpected behavior, so a permanent EIO error after creating a link in
lower layer should not be considered as correct behavior.

This fix replaces EIO error with success in cases where upper has origin
but no index is found, or index is found that does not match upper
inode. In those cases, lookup will not fail and the returned overlay inode
will be hashed by upper inode instead of by lower origin inode.

Fixes: 359f392c ("ovl: lookup index entry for copy up origin")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

6eaf0111

xfs: fix AIM7 regression · 942491c9

由 Christoph Hellwig 提交于 10月 23, 2017

Apparently our current rwsem code doesn't like doing the trylock, then
lock for real scheme.  So change our read/write methods to just do the
trylock for the RWF_NOWAIT case.  This fixes a ~25% regression in
AIM7.

Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads")
Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

942491c9

20 10月, 2017 1 次提交

membarrier: Provide register expedited private command · a961e409

由 Mathieu Desnoyers 提交于 10月 19, 2017

This introduces a "register private expedited" membarrier command which
allows eventual removal of important memory barrier constraints on the
scheduler fast-paths. It changes how the "private expedited" membarrier
command (new to 4.14) is used from user-space.

This new command allows processes to register their intent to use the
private expedited command.  This affects how the expedited private
command introduced in 4.14-rc is meant to be used, and should be merged
before 4.14 final.

Processes are now required to register before using
MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.

This fixes a problem that arose when designing requested extensions to
sys_membarrier() to allow JITs to efficiently flush old code from
instruction caches.  Several potential algorithms are much less painful
if the user register intent to use this functionality early on, for
example, before the process spawns the second thread.  Registering at
this time removes the need to interrupt each and every thread in that
process at the first expedited sys_membarrier() system call.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a961e409

19 10月, 2017 6 次提交

ovl: Return -ENOMEM if an allocation fails ovl_lookup() · 0ce5cdc9

由 Dan Carpenter 提交于 9月 22, 2017

The error code is missing here so it means we return ERR_PTR(0) or NULL.
The other error paths all return an error code so this probably should
as well.

Fixes: 02b69b28 ("ovl: lookup redirects")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

0ce5cdc9

ovl: add NULL check in ovl_alloc_inode · b3885bd6

由 Hirofumi Nakagawa 提交于 9月 26, 2017

This was detected by fault injection test
Signed-off-by: NHirofumi Nakagawa <nklabs@gmail.com>
Fixes: 13cf199d ("ovl: allocate an ovl_inode struct")
Cc: <stable@vger.kernel.org> # v4.13

b3885bd6

Convert fs/*/* to SB_I_VERSION · 357fdad0

由 Matthew Garrett 提交于 10月 18, 2017

[AV: in addition to the fix in previous commit]
Signed-off-by: NMatthew Garrett <mjg59@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Reviewed-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

357fdad0

CIFS: SMBD: Fix the definition for SMB2_CHANNEL_RDMA_V1_INVALIDATE · 4572f053

由 Long Li 提交于 10月 01, 2017

The channel value for requesting server remote invalidating local memory
registration should be 0x00000002
Signed-off-by: NLong Li <longli@microsoft.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

4572f053

cifs: handle large EA requests more gracefully in smb2+ · 7cb3def4

由 Ronnie Sahlberg 提交于 9月 28, 2017

Update reading the EA using increasingly larger buffer sizes
until the response will fit in the buffer, or we exceed the
(arbitrary) maximum set to 64kb.

Without this change, a user is able to add more and more EAs using
setfattr until the point where the total space of all EAs exceed 2kb
at which point the user can no longer list the EAs at all
and getfattr will abort with an error.

The same issue still exists for EAs in SMB1.
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Reported-by: NXiaoli Feng <xifeng@redhat.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

7cb3def4

Fix encryption labels and lengths for SMB3.1.1 · 06e22908

由 Steve French 提交于 9月 25, 2017

SMB3.1.1 is most secure and recent dialect. Fixup labels and lengths
for sMB3.1.1 signing and encryption.
Signed-off-by: NSteve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>

06e22908

18 10月, 2017 2 次提交

rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals · bc5e3a54

由 David Howells 提交于 10月 18, 2017

Make AF_RXRPC accept MSG_WAITALL as a flag to sendmsg() to tell it to
ignore signals whilst loading up the message queue, provided progress is
being made in emptying the queue at the other side.

Progress is defined as the base of the transmit window having being
advanced within 2 RTT periods.  If the period is exceeded with no progress,
sendmsg() will return anyway, indicating how much data has been copied, if
any.

Once the supplied buffer is entirely decanted, the sendmsg() will return.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

bc5e3a54

rxrpc: Support service upgrade from a kernel service · a68f4a27

由 David Howells 提交于 10月 18, 2017

Provide support for a kernel service to make use of the service upgrade
facility.  This involves:

 (1) Pass an upgrade request flag to rxrpc_kernel_begin_call().

 (2) Make rxrpc_kernel_recv_data() return the call's current service ID so
     that the caller can detect service upgrade and see what the service
     was upgraded to.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

a68f4a27

17 10月, 2017 6 次提交

fs: Avoid invalidation in interrupt context in dio_complete() · ffe51f01

由 Lukas Czerner 提交于 10月 17, 2017

Currently we try to defer completion of async DIO to the process context
in case there are any mapped pages associated with the inode so that we
can invalidate the pages when the IO completes. However the check is racy
and the pages can be mapped afterwards. If this happens we might end up
calling invalidate_inode_pages2_range() in dio_complete() in interrupt
context which could sleep. This can be reproduced by generic/451.

Fix this by passing the information whether we can or can't invalidate
to the dio_complete(). Thanks Eryu Guan for reporting this and Jan Kara
for suggesting a fix.

Fixes: 332391a9 ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
Reported-by: NEryu Guan <eguan@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Tested-by: NEryu Guan <eguan@redhat.com>
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ffe51f01

vfs: fix mounting a filesystem with i_version · 917086ff

由 Mimi Zohar 提交于 10月 08, 2017

The mount i_version flag is not enabled in the new sb_flags.  This patch
adds the missing SB_I_VERSION flag.

Fixes: e462ec50 "VFS: Differentiate mount flags (MS_*) from internal
       superblock flags"
Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

917086ff

xfs: move two more RT specific functions into CONFIG_XFS_RT · 785545c8

由 Arnd Bergmann 提交于 10月 13, 2017

The last cleanup introduced two harmless warnings:

fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not used
fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' defined but not used

This moves those two functions as well.

Fixes: bb9c2e54 ("xfs: move more RT specific code under CONFIG_XFS_RT")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

785545c8

xfs: trim writepage mapping to within eof · 40214d12

由 Brian Foster 提交于 10月 13, 2017

The writeback rework in commit fbcc0256 ("xfs: Introduce
writeback context for writepages") introduced a subtle change in
behavior with regard to the block mapping used across the
->writepages() sequence. The previous xfs_cluster_write() code would
only flush pages up to EOF at the time of the writepage, thus
ensuring that any pages due to file-extending writes would be
handled on a separate cycle and with a new, updated block mapping.

The updated code establishes a block mapping in xfs_writepage_map()
that could extend beyond EOF if the file has post-eof preallocation.
Because we now use the generic writeback infrastructure and pass the
cached mapping to each writepage call, there is no implicit EOF
limit in place. If eofblocks trimming occurs during ->writepages(),
any post-eof portion of the cached mapping becomes invalid. The
eofblocks code has no means to serialize against writeback because
there are no pages associated with post-eof blocks. Therefore if an
eofblocks trim occurs and is followed by a file-extending buffered
write, not only has the mapping become invalid, but we could end up
writing a page to disk based on the invalid mapping.

Consider the following sequence of events:

- A buffered write creates a delalloc extent and post-eof
  speculative preallocation.
- Writeback starts and on the first writepage cycle, the delalloc
  extent is converted to real blocks (including the post-eof blocks)
  and the mapping is cached.
- The file is closed and xfs_release() trims post-eof blocks. The
  cached writeback mapping is now invalid.
- Another buffered write appends the file with a delalloc extent.
- The concurrent writeback cycle picks up the just written page
  because the writeback range end is LLONG_MAX. xfs_writepage_map()
  attributes it to the (now invalid) cached mapping and writes the
  data to an incorrect location on disk (and where the file offset is
  still backed by a delalloc extent).

This problem is reproduced by xfstests test generic/464, which
triggers racing writes, appends, open/closes and writeback requests.

To address this problem, trim the mapping used during writeback to
within EOF when the mapping is validated. This ensures the mapping
is revalidated for any pages encountered beyond EOF as of the time
the current mapping was cached or last validated.
Reported-by: NEryu Guan <eguan@redhat.com>
Diagnosed-by: NEryu Guan <eguan@redhat.com>
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

40214d12

fs: invalidate page cache after end_io() in dio completion · 5e25c269

由 Eryu Guan 提交于 10月 13, 2017

Commit 332391a9 ("fs: Fix page cache inconsistency when mixing
buffered and AIO DIO") moved page cache invalidation from
iomap_dio_rw() to iomap_dio_complete() for iomap based direct write
path, but before the dio->end_io() call, and it re-introdued the bug
fixed by commit c771c14b ("iomap: invalidate page caches should
be after iomap_dio_complete() in direct write").

I found this because fstests generic/418 started failing on XFS with
v4.14-rc3 kernel, which is the regression test for this specific
bug.

So similarly, fix it by moving dio->end_io() (which does the
unwritten extent conversion) before page cache invalidation, to make
sure next buffer read reads the final real allocations not unwritten
extents. I also add some comments about why should end_io() go first
in case we get it wrong again in the future.

Note that, there's no such problem in the non-iomap based direct
write path, because we didn't remove the page cache invalidation
after the ->direct_IO() in generic_file_direct_write() call, but I
decided to fix dio_complete() too so we don't leave a landmine
there, also be consistent with iomap_dio_complete().

Fixes: 332391a9 ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
Signed-off-by: NEryu Guan <eguan@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NLukas Czerner <lczerner@redhat.com>

5e25c269

xfs: cancel dirty pages on invalidation · 793d7dbe

由 Dave Chinner 提交于 10月 13, 2017

Recently we've had warnings arise from the vm handing us pages
without bufferheads attached to them. This should not ever occur
in XFS, but we don't defend against it properly if it does. The only
place where we remove bufferheads from a page is in
xfs_vm_releasepage(), but we can't tell the difference here between
"page is dirty so don't release" and "page is dirty but is being
invalidated so release it".

In some places that are invalidating pages ask for pages to be
released and follow up afterward calling ->releasepage by checking
whether the page was dirty and then aborting the invalidation. This
is a possible vector for releasing buffers from a page but then
leaving it in the mapping, so we really do need to avoid dirty pages
in xfs_vm_releasepage().

To differentiate between invalidated pages and normal pages, we need
to clear the page dirty flag when invalidating the pages. This can
be done through xfs_vm_invalidatepage(), and will result
xfs_vm_releasepage() seeing the page as clean which matches the
bufferhead state on the page after calling block_invalidatepage().

Hence we can re-add the page dirty check in xfs_vm_releasepage to
catch the case where we might be releasing a page that is actually
dirty and so should not have the bufferheads on it removed. This
will remove one possible vector of "dirty page with no bufferheads"
and so help narrow down the search for the root cause of that
problem.
Signed-Off-By: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

793d7dbe

14 10月, 2017 2 次提交

fs/binfmt_misc.c: node could be NULL when evicting inode · 7e866006

由 Eryu Guan 提交于 10月 13, 2017

inode->i_private is assigned by a Node pointer only after registering a
new binary format, so it could be NULL if inode was created by
bm_fill_super() (or iput() was called by the error path in
bm_register_write()), and this could result in NULL pointer dereference
when evicting such an inode.  e.g.  mount binfmt_misc filesystem then
umount it immediately:

  mount -t binfmt_misc binfmt_misc /proc/sys/fs/binfmt_misc
  umount /proc/sys/fs/binfmt_misc

will result in

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000013
  IP: bm_evict_inode+0x16/0x40 [binfmt_misc]
  ...
  Call Trace:
   evict+0xd3/0x1a0
   iput+0x17d/0x1d0
   dentry_unlink_inode+0xb9/0xf0
   __dentry_kill+0xc7/0x170
   shrink_dentry_list+0x122/0x280
   shrink_dcache_parent+0x39/0x90
   do_one_tree+0x12/0x40
   shrink_dcache_for_umount+0x2d/0x90
   generic_shutdown_super+0x1f/0x120
   kill_litter_super+0x29/0x40
   deactivate_locked_super+0x43/0x70
   deactivate_super+0x45/0x60
   cleanup_mnt+0x3f/0x70
   __cleanup_mnt+0x12/0x20
   task_work_run+0x86/0xa0
   exit_to_usermode_loop+0x6d/0x99
   syscall_return_slowpath+0xba/0xf0
   entry_SYSCALL_64_fastpath+0xa3/0xa

Fix it by making sure Node (e) is not NULL.

Link: http://lkml.kernel.org/r/20171010100642.31786-1-eguan@redhat.com
Fixes: 83f91827 ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()")
Signed-off-by: NEryu Guan <eguan@redhat.com>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7e866006

fs/mpage.c: fix mpage_writepage() for pages with buffers · f892760a

由 Matthew Wilcox 提交于 10月 13, 2017

When using FAT on a block device which supports rw_page, we can hit
BUG_ON(!PageLocked(page)) in try_to_free_buffers().  This is because we
call clean_buffers() after unlocking the page we've written.  Introduce
a new clean_page_buffers() which cleans all buffers associated with a
page and call it from within bdev_write_page().

[akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew]
Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
Reported-by: NToshi Kani <toshi.kani@hpe.com>
Reported-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: NToshi Kani <toshi.kani@hpe.com>
Acked-by: NJohannes Thumshirn <jthumshirn@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f892760a

13 10月, 2017 2 次提交

ecryptfs: fix dereference of NULL user_key_payload · f66665c0

由 Eric Biggers 提交于 10月 09, 2017

In eCryptfs, we failed to verify that the authentication token keys are
not revoked before dereferencing their payloads, which is problematic
because the payload of a revoked key is NULL.  request_key() *does* skip
revoked keys, but there is still a window where the key can be revoked
before we acquire the key semaphore.

Fix it by updating ecryptfs_get_key_payload_data() to return
-EKEYREVOKED if the key payload is NULL.  For completeness we check this
for "encrypted" keys as well as "user" keys, although encrypted keys
cannot be revoked currently.

Alternatively we could use key_validate(), but since we'll also need to
fix ecryptfs_get_key_payload_data() to validate the payload length, it
seems appropriate to just check the payload pointer.

Fixes: 237fead6 ("[PATCH] ecryptfs: fs/Makefile and fs/Kconfig")
Reviewed-by: NJames Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org>    [v2.6.19+]
Cc: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

f66665c0

fscrypt: fix dereference of NULL user_key_payload · d60b5b78

由 Eric Biggers 提交于 10月 09, 2017

When an fscrypt-encrypted file is opened, we request the file's master
key from the keyrings service as a logon key, then access its payload.
However, a revoked key has a NULL payload, and we failed to check for
this.  request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.

Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.

Fixes: 88bd6ccd ("ext4 crypto: add encryption key management facilities")
Reviewed-by: NJames Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org>    [v4.1+]
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

d60b5b78

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功