1. 21 4月, 2018 15 次提交
    • I
      autofs: mount point create should honour passed in mode · 1e630665
      Ian Kent 提交于
      The autofs file system mkdir inode operation blindly sets the created
      directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can
      cause selinux dac_override denials.
      
      But the function also checks if the caller is the daemon (as no-one else
      should be able to do anything here) so there's no point in not honouring
      the passed in mode, allowing the daemon to set appropriate mode when
      required.
      
      Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.netSigned-off-by: NIan Kent <raven@themaw.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e630665
    • U
      MAINTAINERS: add personal addresses for Sascha and Uwe · 1551cf74
      Uwe Kleine-König 提交于
      The idea behind using kernel@pengutronix.de (i.e. the mail alias for the
      kernel people at Pengutronix) as email address was to have a backup when
      a given developer is on vacation or run over by a bus. Make this more
      explicit by adding the alias as reviewer and use the personal address
      for Sascha and me.
      
      Link: http://lkml.kernel.org/r/20180413083312.11213-1-u.kleine-koenig@pengutronix.deSigned-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Acked-by: NSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1551cf74
    • A
      kasan: add no_sanitize attribute for clang builds · 12c8f25a
      Andrey Konovalov 提交于
      KASAN uses the __no_sanitize_address macro to disable instrumentation of
      particular functions.  Right now it's defined only for GCC build, which
      causes false positives when clang is used.
      
      This patch adds a definition for clang.
      
      Note, that clang's revision 329612 or higher is required.
      
      [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
        Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
      Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com>
      Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Paul Lawrence <paullawrence@google.com>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12c8f25a
    • I
      rapidio: fix rio_dma_transfer error handling · c5157b76
      Ioan Nicu 提交于
      Some of the mport_dma_req structure members were initialized late
      inside the do_dma_request() function, just before submitting the
      request to the dma engine. But we have some error branches before
      that. In case of such an error, the code would return on the error
      path and trigger the calling of dma_req_free() with a req structure
      which is not completely initialized. This causes a NULL pointer
      dereference in dma_req_free().
      
      This patch fixes these error branches by making sure that all
      necessary mport_dma_req structure members are initialized in
      rio_dma_transfer() immediately after the request structure gets
      allocated.
      
      Link: http://lkml.kernel.org/r/20180412150605.GA31409@nokia.com
      Fixes: bbd876ad ("rapidio: use a reference count for struct mport_dma_req")
      Signed-off-by: NIoan Nicu <ioan.nicu.ext@nokia.com>
      Tested-by: NAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: NAlexandre Bounine <alex.bou9@gmail.com>
      Cc: Barry Wood <barry.wood@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Frank Kunz <frank.kunz@nokia.com>
      Cc: <stable@vger.kernel.org>	[4.6+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c5157b76
    • N
      mm: enable thp migration for shmem thp · e71769ae
      Naoya Horiguchi 提交于
      My testing for the latest kernel supporting thp migration showed an
      infinite loop in offlining the memory block that is filled with shmem
      thps.  We can get out of the loop with a signal, but kernel should return
      with failure in this case.
      
      What happens in the loop is that scan_movable_pages() repeats returning
      the same pfn without any progress.  That's because page migration always
      fails for shmem thps.
      
      In memory offline code, memory blocks containing unmovable pages should be
      prevented from being offline targets by has_unmovable_pages() inside
      start_isolate_page_range().  So it's possible to change migratability for
      non-anonymous thps to avoid the issue, but it introduces more complex and
      thp-specific handling in migration code, so it might not good.
      
      So this patch is suggesting to fix the issue by enabling thp migration for
      shmem thp.  Both of anon/shmem thp are migratable so we don't need
      precheck about the type of thps.
      
      Link: http://lkml.kernel.org/r/20180406030706.GA2434@hori1.linux.bs1.fc.nec.co.jp
      Fixes: commit 72b39cfc ("mm, memory_hotplug: do not fail offlining too early")
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Zi Yan <zi.yan@sent.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e71769ae
    • G
      writeback: safer lock nesting · 2e898e4c
      Greg Thelen 提交于
      lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
      the page's memcg is undergoing move accounting, which occurs when a
      process leaves its memcg for a new one that has
      memory.move_charge_at_immigrate set.
      
      unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
      the given inode is switching writeback domains.  Switches occur when
      enough writes are issued from a new domain.
      
      This existing pattern is thus suspicious:
          lock_page_memcg(page);
          unlocked_inode_to_wb_begin(inode, &locked);
          ...
          unlocked_inode_to_wb_end(inode, locked);
          unlock_page_memcg(page);
      
      If both inode switch and process memcg migration are both in-flight then
      unlocked_inode_to_wb_end() will unconditionally enable interrupts while
      still holding the lock_page_memcg() irq spinlock.  This suggests the
      possibility of deadlock if an interrupt occurs before unlock_page_memcg().
      
          truncate
          __cancel_dirty_page
          lock_page_memcg
          unlocked_inode_to_wb_begin
          unlocked_inode_to_wb_end
          <interrupts mistakenly enabled>
                                          <interrupt>
                                          end_page_writeback
                                          test_clear_page_writeback
                                          lock_page_memcg
                                          <deadlock>
          unlock_page_memcg
      
      Due to configuration limitations this deadlock is not currently possible
      because we don't mix cgroup writeback (a cgroupv2 feature) and
      memory.move_charge_at_immigrate (a cgroupv1 feature).
      
      If the kernel is hacked to always claim inode switching and memcg
      moving_account, then this script triggers lockup in less than a minute:
      
        cd /mnt/cgroup/memory
        mkdir a b
        echo 1 > a/memory.move_charge_at_immigrate
        echo 1 > b/memory.move_charge_at_immigrate
        (
          echo $BASHPID > a/cgroup.procs
          while true; do
            dd if=/dev/zero of=/mnt/big bs=1M count=256
          done
        ) &
        while true; do
          sync
        done &
        sleep 1h &
        SLEEP=$!
        while true; do
          echo $SLEEP > a/cgroup.procs
          echo $SLEEP > b/cgroup.procs
        done
      
      The deadlock does not seem possible, so it's debatable if there's any
      reason to modify the kernel.  I suggest we should to prevent future
      surprises.  And Wang Long said "this deadlock occurs three times in our
      environment", so there's more reason to apply this, even to stable.
      Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
      see "[PATCH for-4.4] writeback: safer lock nesting"
      https://lkml.org/lkml/2018/4/11/146
      
      Wang Long said "this deadlock occurs three times in our environment"
      
      [gthelen@google.com: v4]
        Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
      [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
      Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
      Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
      Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Reported-by: NWang Long <wanglong19@meituan.com>
      Acked-by: NWang Long <wanglong19@meituan.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>	[v4.2+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2e898e4c
    • H
      mm, pagemap: fix swap offset value for PMD migration entry · 88c28f24
      Huang Ying 提交于
      The swap offset reported by /proc/<pid>/pagemap may be not correct for
      PMD migration entries.  If addr passed into pagemap_pmd_range() isn't
      aligned with PMD start address, the swap offset reported doesn't
      reflect this.  And in the loop to report information of each sub-page,
      the swap offset isn't increased accordingly as that for PFN.
      
      This may happen after opening /proc/<pid>/pagemap and seeking to a page
      whose address doesn't align with a PMD start address.  I have verified
      this with a simple test program.
      
      BTW: migration swap entries have PFN information, do we need to restrict
      whether to show them?
      
      [akpm@linux-foundation.org: fix typo, per Huang, Ying]
      Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Jerome Glisse" <jglisse@redhat.com>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88c28f24
    • M
      mm: fix do_pages_move status handling · 8f175cf5
      Michal Hocko 提交于
      Li Wang has reported that LTP move_pages04 test fails with the current
      tree:
      
      LTP move_pages04:
         TFAIL  :  move_pages04.c:143: status[1] is EPERM, expected EFAULT
      
      The test allocates an array of two pages, one is present while the other
      is not (resp.  backed by zero page) and it expects EFAULT for the second
      page as the man page suggests.  We are reporting EPERM which doesn't make
      any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter
      THP migration").
      
      do_pages_move tries to handle as many pages in one batch as possible so we
      queue all pages with the same node target together and that corresponds to
      [start, i] range which is then used to update status array.
      add_page_for_migration will correctly notice the zero (resp.  !present)
      page and returns with EFAULT which gets written to the status.  But if
      this is the last page in the array we do not update start and so the last
      store_status after the loop will overwrite the range of the last batch
      with NUMA_NO_NODE (which corresponds to EPERM).
      
      Fix this by simply bailing out from the last flush if the pagelist is
      empty as there is clearly nothing more to do.
      
      Link: http://lkml.kernel.org/r/20180418121255.334-1-mhocko@kernel.org
      Fixes: cf5f16b23ec9 ("mm: unclutter THP migration")
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NLi Wang <liwang@redhat.com>
      Tested-by: NLi Wang <liwang@redhat.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f175cf5
    • K
      fork: unconditionally clear stack on fork · e01e8063
      Kees Cook 提交于
      One of the classes of kernel stack content leaks[1] is exposing the
      contents of prior heap or stack contents when a new process stack is
      allocated.  Normally, those stacks are not zeroed, and the old contents
      remain in place.  In the face of stack content exposure flaws, those
      contents can leak to userspace.
      
      Fixing this will make the kernel no longer vulnerable to these flaws, as
      the stack will be wiped each time a stack is assigned to a new process.
      There's not a meaningful change in runtime performance; it almost looks
      like it provides a benefit.
      
      Performing back-to-back kernel builds before:
      	Run times: 157.86 157.09 158.90 160.94 160.80
      	Mean: 159.12
      	Std Dev: 1.54
      
      and after:
      	Run times: 159.31 157.34 156.71 158.15 160.81
      	Mean: 158.46
      	Std Dev: 1.46
      
      Instead of making this a build or runtime config, Andy Lutomirski
      recommended this just be enabled by default.
      
      [1] A noisy search for many kinds of stack content leaks can be seen here:
      https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
      
      I did some more with perf and cycle counts on running 100,000 execs of
      /bin/true.
      
      before:
      Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
      Mean:  221015379122.60
      Std Dev: 4662486552.47
      
      after:
      Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
      Mean:  217745009865.40
      Std Dev: 5935559279.99
      
      It continues to look like it's faster, though the deviation is rather
      wide, but I'm not sure what I could do that would be less noisy.  I'm
      open to ideas!
      
      Link: http://lkml.kernel.org/r/20180221021659.GA37073@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e01e8063
    • D
      vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion · a9e5b732
      David Howells 提交于
      In do_mount() when the MS_* flags are being converted to MNT_* flags,
      MS_RDONLY got accidentally convered to SB_RDONLY.
      
      Undo this change.
      
      Fixes: e462ec50 ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9e5b732
    • D
      afs: Fix server record deletion · 66062592
      David Howells 提交于
      AFS server records get removed from the net->fs_servers tree when
      they're deleted, but not from the net->fs_addresses{4,6} lists, which
      can lead to an oops in afs_find_server() when a server record has been
      removed, for instance during rmmod.
      
      Fix this by deleting the record from the by-address lists before posting
      it for RCU destruction.
      
      The reason this hasn't been noticed before is that the fileserver keeps
      probing the local cache manager, thereby keeping the service record
      alive, so the oops would only happen when a fileserver eventually gets
      bored and stops pinging or if the module gets rmmod'd and a call comes
      in from the fileserver during the window between the server records
      being destroyed and the socket being closed.
      
      The oops looks something like:
      
        BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
        ...
        Workqueue: kafsd afs_process_async_call [kafs]
        RIP: 0010:afs_find_server+0x271/0x36f [kafs]
        ...
        Call Trace:
         afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
         afs_deliver_to_call+0x1ee/0x5e8 [kafs]
         afs_process_async_call+0x5b/0xd0 [kafs]
         process_one_work+0x2c2/0x504
         worker_thread+0x1d4/0x2ac
         kthread+0x11f/0x127
         ret_from_fork+0x24/0x30
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      66062592
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a72db42c
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Unbalanced refcounting in TIPC, from Jon Maloy.
      
       2) Only allow TCP_MD5SIG to be set on sockets in close or listen state.
          Once the connection is established it makes no sense to change this.
          From Eric Dumazet.
      
       3) Missing attribute validation in neigh_dump_table(), also from Eric
          Dumazet.
      
       4) Fix address comparisons in SCTP, from Xin Long.
      
       5) Neigh proxy table clearing can deadlock, from Wolfgang Bumiller.
      
       6) Fix tunnel refcounting in l2tp, from Guillaume Nault.
      
       7) Fix double list insert in team driver, from Paolo Abeni.
      
       8) af_vsock.ko module was accidently made unremovable, from Stefan
          Hajnoczi.
      
       9) Fix reference to freed llc_sap object in llc stack, from Cong Wang.
      
      10) Don't assume netdevice struct is DMA'able memory in virtio_net
          driver, from Michael S. Tsirkin.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
        net/smc: fix shutdown in state SMC_LISTEN
        bnxt_en: Fix memory fault in bnxt_ethtool_init()
        virtio_net: sparse annotation fix
        virtio_net: fix adding vids on big-endian
        virtio_net: split out ctrl buffer
        net: hns: Avoid action name truncation
        docs: ip-sysctl.txt: fix name of some ipv6 variables
        vmxnet3: fix incorrect dereference when rxvlan is disabled
        llc: hold llc_sap before release_sock()
        MAINTAINERS: Direct networking documentation changes to netdev
        atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit"
        net: qmi_wwan: add Wistron Neweb D19Q1
        net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"
        net: stmmac: Disable ACS Feature for GMAC >= 4
        net: mvpp2: Fix DMA address mask size
        net: change the comment of dev_mc_init
        net: qualcomm: rmnet: Fix warning seen with fill_info
        tun: fix vlan packet truncation
        tipc: fix infinite loop when dumping link monitor summary
        tipc: fix use-after-free in tipc_nametbl_stop
        ...
      a72db42c
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · b9abdcfd
      Linus Torvalds 提交于
      Pull vfs fixes from Al Viro:
       "Assorted fixes.
      
        Some of that is only a matter with fault injection (broken handling of
        small allocation failure in various mount-related places), but the
        last one is a root-triggerable stack overflow, and combined with
        userns it gets really nasty ;-/"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        Don't leak MNT_INTERNAL away from internal mounts
        mm,vmscan: Allow preallocating memory for register_shrinker().
        rpc_pipefs: fix double-dput()
        orangefs_kill_sb(): deal with allocation failures
        jffs2_kill_sb(): deal with failed allocations
        hypfs_kill_super(): deal with failed allocations
      b9abdcfd
    • L
      Merge tag 'ecryptfs-4.17-rc2-fixes' of... · 43f70c96
      Linus Torvalds 提交于
      Merge tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
      
      Pull eCryptfs fixes from Tyler Hicks:
       "Minor cleanups and a bug fix to completely ignore unencrypted
        filenames in the lower filesystem when filename encryption is enabled
        at the eCryptfs layer"
      
      * tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
        eCryptfs: don't pass up plaintext names when using filename encryption
        ecryptfs: fix spelling mistake: "cadidate" -> "candidate"
        ecryptfs: lookup: Don't check if mount_crypt_stat is NULL
      43f70c96
    • L
      Merge tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 0d9cf33b
      Linus Torvalds 提交于
       - isofs memory leak fix
      
       - two fsnotify fixes of event mask handling
      
       - udf fix of UTF-16 handling
      
       - couple other smaller cleanups
      
      * tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        udf: Fix leak of UTF-16 surrogates into encoded strings
        fs: ext2: Adding new return type vm_fault_t
        isofs: fix potential memory leak in mount option parsing
        MAINTAINERS: add an entry for FSNOTIFY infrastructure
        fsnotify: fix typo in a comment about mark->g_list
        fsnotify: fix ignore mask logic in send_to_group()
        isofs compress: Remove VLA usage
        fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
        fanotify: fix logic of events on child
      0d9cf33b
  2. 20 4月, 2018 24 次提交
  3. 19 4月, 2018 1 次提交