1. 13 5月, 2016 3 次提交
    • J
      ocfs2: fix posix_acl_create deadlock · c25a1e06
      Junxiao Bi 提交于
      Commit 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      refactored code to use posix_acl_create.  The problem with this function
      is that it is not mindful of the cluster wide inode lock making it
      unsuitable for use with ocfs2 inode creation with ACLs.  For example,
      when used in ocfs2_mknod, this function can cause deadlock as follows.
      The parent dir inode lock is taken when calling posix_acl_create ->
      get_acl -> ocfs2_iop_get_acl which takes the inode lock again.  This can
      cause deadlock if there is a blocked remote lock request waiting for the
      lock to be downconverted.  And same deadlock happened in ocfs2_reflink.
      This fix is to revert back using ocfs2_init_acl.
      
      Fixes: 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      Signed-off-by: NTariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c25a1e06
    • J
      ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang · 5ee0fbd5
      Junxiao Bi 提交于
      Commit 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      introduced this issue.  ocfs2_setattr called by chmod command holds
      cluster wide inode lock when calling posix_acl_chmod.  This latter
      function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl.  These
      two are also called directly from vfs layer for getfacl/setfacl commands
      and therefore acquire the cluster wide inode lock.  If a remote
      conversion request comes after the first inode lock in ocfs2_setattr,
      OCFS2_LOCK_BLOCKED will be set.  And this will cause the second call to
      inode lock from the ocfs2_iop_get_acl() to block indefinetly.
      
      The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which
      does not call back into the filesystem.  Therefore, we restore
      ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that
      instead.
      
      Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      Signed-off-by: NTariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ee0fbd5
    • L
      Merge tag 'pinctrl-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 422ce5a9
      Linus Torvalds 提交于
      Pull pinctrl fix from Linus Walleij:
       "A single last pin control fix for v4.6.  t's tagged for stable and
        only hits a single driver with two added lines so should be safe.
        Tested in linux-next.
      
         - The pull up/down logic for the AT91 PIO4 controller was tilted: we
           need to mask the reverse pull when unmasking a pull direction.
      
           Setting both pull up & pull down is illegal and makes no sense"
      
      * tag 'pinctrl-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: at91-pio4: fix pull-up/down logic
      422ce5a9
  2. 12 5月, 2016 7 次提交
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 685764b1
      Linus Torvalds 提交于
      Pull SCSI fixes from James Bottomley:
       "This is a couple of small fixes: one is a potential uninitialised
        error variable in the alua code, potentially causing spurious failures
        and the other is a problem caused by the conversion of SCSI to
        hostwide tags which resulted in the qla1280 driver always failing in
        host initialisation"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        qla1280: Don't allocate 512kb of host tags
        scsi_dh_alua: uninitialized variable in alua_rtpg()
      685764b1
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4d8bbbff
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "Hopefully the last round of fixes this release, fingers crossed :)
      
         1) Initialize static nf_conntrack_locks_all_lock properly, from
            Florian Westphal.
      
         2) Need to cancel pending work when destroying IDLETIMER entries,
            from Liping Zhang.
      
         3) Fix TX param usage when sending TSO over iwlwifi devices, from
            Emmanuel Grumbach.
      
         4) NFACCT quota params not validated properly, from Phil Turnbull.
      
         5) Resolve more glibc vs.  kernel header conflicts, from Mikko
            Tapeli.
      
         6) Missing IRQ free in ravb_close(), from Geert Uytterhoeven.
      
         7) Fix infoleak in x25, from Kangjie Lu.
      
         8) Similarly in thunderx driver, from Heinrich Schuchardt.
      
         9) tc_ife.h uapi header not exported properly, from Jamal Hadi Salim.
      
        10) Don't reenable PHY interreupts if device is in polling mode, from
            Shaohui Xie.
      
        11) Packet scheduler actions late binding was not being handled
            properly at all, from Jamal Hadi Salim.
      
        12) Fix binding of conntrack entries to helpers in openvswitch, from
            Joe Stringer"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
        gre: do not keep the GRE header around in collect medata mode
        openvswitch: Fix cached ct with helper.
        net sched: ife action fix late binding
        net sched: skbedit action fix late binding
        net sched: simple action fix late binding
        net sched: mirred action fix late binding
        net sched: ipt action fix late binding
        net sched: vlan action fix late binding
        net: phylib: fix interrupts re-enablement in phy_start
        tcp: refresh skb timestamp at retransmit time
        net: nps_enet: bug fix - handle lost tx interrupts
        net: nps_enet: Tx handler synchronization
        export tc ife uapi header
        net: thunderx: avoid exposing kernel stack
        net: fix a kernel infoleak in x25 module
        ravb: Add missing free_irq() call to ravb_close()
        uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h
        netfilter: nfnetlink_acct: validate NFACCT_QUOTA parameter
        iwlwifi: mvm: don't override the rate with the AMSDU len
        netfilter: IDLETIMER: fix race condition when destroy the target
        ...
      4d8bbbff
    • J
      gre: do not keep the GRE header around in collect medata mode · e271c7b4
      Jiri Benc 提交于
      For ipgre interface in collect metadata mode, it doesn't make sense for the
      interface to be of ARPHRD_IPGRE type. The outer header of received packets
      is not needed, as all the information from it is present in metadata_dst. We
      already don't set ipgre_header_ops for collect metadata interfaces, which is
      the only consumer of mac_header pointing to the outer IP header.
      
      Just set the interface type to ARPHRD_NONE in collect metadata mode for
      ipgre (not gretap, that still correctly stays ARPHRD_ETHER) and reset
      mac_header.
      
      Fixes: a64b04d8 ("gre: do not assign header_ops in collect metadata mode")
      Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.")
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e271c7b4
    • J
      openvswitch: Fix cached ct with helper. · 16ec3d4f
      Joe Stringer 提交于
      When using conntrack helpers from OVS, a common configuration is to
      perform a lookup without specifying a helper, then go through a
      firewalling policy, only to decide to attach a helper afterwards.
      
      In this case, the initial lookup will cause a ct entry to be attached to
      the skb, then the later commit with helper should attach the helper and
      confirm the connection. However, the helper attachment has been missing.
      If the user has enabled automatic helper attachment, then this issue
      will be masked as it will be applied in init_conntrack(). It is also
      masked if the action is executed from ovs_packet_cmd_execute() as that
      will construct a fresh skb.
      
      This patch fixes the issue by making an explicit call to try to assign
      the helper if there is a discrepancy between the action's helper and the
      current skb->nfct.
      
      Fixes: cae3a262 ("openvswitch: Allow attaching helpers to ct action")
      Signed-off-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16ec3d4f
    • M
      x86/extable: ensure entries are swapped completely when sorting · 50c73890
      Mathias Krause 提交于
      The x86 exception table sorting was changed in commit 29934b0f
      ("x86/extable: use generic search and sort routines") to use the arch
      independent code in lib/extable.c.  However, the patch was mangled
      somehow on its way into the kernel from the last version posted at [1].
      The committed version kind of attempted to incorporate the changes of
      commit 548acf19 ("x86/mm: Expand the exception table logic to allow
      new handling options") as in _completely_ _ignoring_ the x86 specific
      'handler' member of struct exception_table_entry.  This effectively
      broke the sorting as entries will only partly be swapped now.
      
      Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the
      exception table doesn't need to be sorted at runtime. However, in case
      that ever changes, we better not break the exception table sorting just
      because of that.
      
      [ Ard Biesheuvel points out that BUILDTIME_EXTABLE_SORT applies to the
        core image only, but we still rely on the sorting routines for modules
        in that case - Linus ]
      
      Fix this by providing a swap_ex_entry_fixup() macro that takes care of
      the 'handler' member.
      
      [1] https://lkml.org/lkml/2016/1/27/232Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Fixes: 29934b0f ("x86/extable: use generic search and sort routines")
      Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      50c73890
    • L
      Merge tag 'spi-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · e0d09e32
      Linus Torvalds 提交于
      Pull spi fixes from Mark Brown:
       "A bunch of small driver specific fixes that have come up, none of them
        remarkable in themselves.  One fixes a regression introduced in the
        merge window and another two are targetted at stable"
      
      * tag 'spi-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
        spi: spi-ti-qspi: Handle truncated frames properly
        spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
        spi: omap2-mcspi: Undo broken fix for dma transfer of vmalloced buffer
        spi: spi-fsl-dspi: Fix cs_change handling in message transfer
      e0d09e32
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · d32917ee
      Linus Torvalds 提交于
      Pull KVM fixes from Paolo Bonzini:
       "Two small x86 patches, improving "make kvmconfig" and fixing an
        objtool warning for CONFIG_PROFILE_ALL_BRANCHES"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvmconfig: add more virtio drivers
        x86/kvm: Add stack frame dependency to fastop() inline asm
      d32917ee
  3. 11 5月, 2016 17 次提交
  4. 10 5月, 2016 13 次提交
    • X
      sched/rt, sched/dl: Don't push if task's scheduling class was changed · 13b5ab02
      Xunlei Pang 提交于
      We got this warning:
      
          WARNING: CPU: 1 PID: 2468 at kernel/sched/core.c:1161 set_task_cpu+0x1af/0x1c0
          [...]
          Call Trace:
      
          dump_stack+0x63/0x87
          __warn+0xd1/0xf0
          warn_slowpath_null+0x1d/0x20
          set_task_cpu+0x1af/0x1c0
          push_dl_task.part.34+0xea/0x180
          push_dl_tasks+0x17/0x30
          __balance_callback+0x45/0x5c
          __sched_setscheduler+0x906/0xb90
          SyS_sched_setattr+0x150/0x190
          do_syscall_64+0x62/0x110
          entry_SYSCALL64_slow_path+0x25/0x25
      
      This corresponds to:
      
          WARN_ON_ONCE(p->state == TASK_RUNNING &&
                   p->sched_class == &fair_sched_class &&
                   (p->on_rq && !task_on_rq_migrating(p)))
      
      It happens because in find_lock_later_rq(), the task whose scheduling
      class was changed to fair class is still pushed away as if it were
      a deadline task ...
      
      So, check in find_lock_later_rq() after double_lock_balance(), if the
      scheduling class of the deadline task was changed, break and retry.
      
      Apply the same logic to RT tasks.
      Signed-off-by: NXunlei Pang <xlpang@redhat.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      13b5ab02
    • T
      x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n · 8d415ee2
      Thomas Gleixner 提交于
      Josef reported that the uncore driver trips over with CONFIG_SMP=n because
      x86_max_cores is 16 instead of 12.
      
      The reason is, that for SMP=n the extended topology detection is a NOOP and
      the cache leaf is used to determine the number of cores. That's wrong in two
      aspects:
      
      1) The cache leaf enumerates the maximum addressable number of cores in the
         package, which is obviously not correct
      
      2) UP has no business with topology bits at all.
      
      Make intel_num_cpu_cores() return 1 for CONFIG_SMP=n
      Reported-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team <Kernel-team@fb.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/761b4a2a-0332-7954-f030-c6639f949612@fb.com
      8d415ee2
    • J
      export tc ife uapi header · d99079e2
      Jamal Hadi Salim 提交于
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d99079e2
    • D
      Merge tag 'wireless-drivers-for-davem-2016-05-09' of... · 5e769ada
      David S. Miller 提交于
      Merge tag 'wireless-drivers-for-davem-2016-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.6
      
      iwlwifi
      
      * fix P2P rates (and possibly other issues)
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e769ada
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · adc0a8bf
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contain Netfilter simple fixes for your net tree,
      two one-liner and one two-liner:
      
      1) Oneliner to fix missing spinlock definition that triggers
         'BUG: spinlock bad magic on CPU#' when spinlock debugging is enabled,
         from Florian Westphal.
      
      2) Fix missing workqueue cancelation on IDLETIMER removal,
         from Liping Zhang.
      
      3) Fix insufficient validation of netlink of NFACCT_QUOTA in
         nfnetlink_acct, from Phil Turnbull.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      adc0a8bf
    • X
      net: thunderx: avoid exposing kernel stack · 161de2ca
      xypron.glpk@gmx.de 提交于
      Reserved fields should be set to zero to avoid exposing
      bits from the kernel stack.
      Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      161de2ca
    • K
      net: fix a kernel infoleak in x25 module · 79e48650
      Kangjie Lu 提交于
      Stack object "dte_facilities" is allocated in x25_rx_call_request(),
      which is supposed to be initialized in x25_negotiate_facilities.
      However, 5 fields (8 bytes in total) are not initialized. This
      object is then copied to userland via copy_to_user, thus infoleak
      occurs.
      Signed-off-by: NKangjie Lu <kjlu@gatech.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79e48650
    • G
      ravb: Add missing free_irq() call to ravb_close() · 7fa816b9
      Geert Uytterhoeven 提交于
      When reopening the network device on ra7795/salvator-x, e.g. after a
      DHCP timeout:
      
          IP-Config: Reopening network devices...
          genirq: Flags mismatch irq 139. 00000000 (eth0:ch24:emac) vs. 00000000 (eth0:ch24:emac)
          ravb e6800000.ethernet eth0: cannot request IRQ eth0:ch24:emac
          IP-Config: Failed to open eth0
          IP-Config: No network devices available
      
      The "mismatch" is due to requesting an IRQ that is already in use,
      while IRQF_PROBE_SHARED wasn't set.
      
      However, the real cause is that ravb_close() doesn't release the R-Car
      Gen3-specific secondary IRQ.
      
      Add the missing free_irq() call to fix this.
      
      Fixes: 22d4df8f ("ravb: Add support for r8a7795 SoC")
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fa816b9
    • M
      uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h · 4a91cb61
      Mikko Rapeli 提交于
      glibc's net/if.h contains copies of definitions from linux/if.h and these
      conflict and cause build failures if both files are included by application
      source code. Changes in uapi headers, which fixed header file dependencies to
      include linux/if.h when it was needed, e.g. commit 1ffad83d, made the
      net/if.h and linux/if.h incompatibilities visible as build failures for
      userspace applications like iproute2 and xtables-addons.
      
      This patch fixes compile errors when glibc net/if.h is included before
      linux/if.h:
      
      ./linux/if.h:99:21: error: redeclaration of enumerator ‘IFF_NOARP’
      ./linux/if.h:98:23: error: redeclaration of enumerator ‘IFF_RUNNING’
      ./linux/if.h:97:26: error: redeclaration of enumerator ‘IFF_NOTRAILERS’
      ./linux/if.h:96:27: error: redeclaration of enumerator ‘IFF_POINTOPOINT’
      ./linux/if.h:95:24: error: redeclaration of enumerator ‘IFF_LOOPBACK’
      ./linux/if.h:94:21: error: redeclaration of enumerator ‘IFF_DEBUG’
      ./linux/if.h:93:25: error: redeclaration of enumerator ‘IFF_BROADCAST’
      ./linux/if.h:92:19: error: redeclaration of enumerator ‘IFF_UP’
      ./linux/if.h:252:8: error: redefinition of ‘struct ifconf’
      ./linux/if.h:203:8: error: redefinition of ‘struct ifreq’
      ./linux/if.h:169:8: error: redefinition of ‘struct ifmap’
      ./linux/if.h:107:23: error: redeclaration of enumerator ‘IFF_DYNAMIC’
      ./linux/if.h:106:25: error: redeclaration of enumerator ‘IFF_AUTOMEDIA’
      ./linux/if.h:105:23: error: redeclaration of enumerator ‘IFF_PORTSEL’
      ./linux/if.h:104:25: error: redeclaration of enumerator ‘IFF_MULTICAST’
      ./linux/if.h:103:21: error: redeclaration of enumerator ‘IFF_SLAVE’
      ./linux/if.h:102:22: error: redeclaration of enumerator ‘IFF_MASTER’
      ./linux/if.h:101:24: error: redeclaration of enumerator ‘IFF_ALLMULTI’
      ./linux/if.h:100:23: error: redeclaration of enumerator ‘IFF_PROMISC’
      
      The cases where linux/if.h is included before net/if.h need a similar fix in
      the glibc side, or the order of include files can be changed userspace
      code as a workaround.
      
      This change was tested in x86 userspace on Debian unstable with
      scripts/headers_compile_test.sh:
      
      $ make headers_install && \
        cd usr/include && ../../scripts/headers_compile_test.sh -l -k
      ...
      cc -Wall -c -nostdinc -I /usr/lib/gcc/i586-linux-gnu/5/include -I /usr/lib/gcc/i586-linux-gnu/5/include-fixed -I . -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH/i586-linux-gnu -o /dev/null ./linux/if.h_libc_before_kernel.h
      PASSED libc before kernel test: ./linux/if.h
      Reported-by: NJan Engelhardt <jengelh@inai.de>
      Reported-by: NJosh Boyer <jwboyer@fedoraproject.org>
      Reported-by: NStephen Hemminger <shemming@brocade.com>
      Reported-by: NWaldemar Brodkorb <mail@waldemar-brodkorb.de>
      Cc: Gabriel Laskar <gabriel@lse.epita.fr>
      Signed-off-by: NMikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a91cb61
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 2d0bd953
      Linus Torvalds 提交于
      Pull libnvdimm build fix from Dan Williams:
       "A build fix for the usage of HPAGE_SIZE in the last libnvdimm pull
        request.
      
        I have taken note that the kbuild robot build success test does not
        include results for alpha_allmodconfig.  Thanks to Guenter for the
        report.  It's tagged for -stable since the original fix will land
        there and cause build problems"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, pfn: fix ARCH=alpha allmodconfig build failure
      2d0bd953
    • A
      perf/core: Change the default paranoia level to 2 · 0161028b
      Andy Lutomirski 提交于
      Allowing unprivileged kernel profiling lets any user dump follow kernel
      control flow and dump kernel registers.  This most likely allows trivial
      kASLR bypassing, and it may allow other mischief as well.  (Off the top
      of my head, the PERF_SAMPLE_REGS_INTR output during /dev/urandom reads
      could be quite interesting.)
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0161028b
    • L
      Merge branch 'akpm' (patches from Andrew) · 5c56b563
      Linus Torvalds 提交于
      Merge fixes from Andrew Morton:
       "2 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        zsmalloc: fix zs_can_compact() integer overflow
        Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan""
      5c56b563
    • S
      zsmalloc: fix zs_can_compact() integer overflow · 44f43e99
      Sergey Senozhatsky 提交于
      zs_can_compact() has two race conditions in its core calculation:
      
      unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
      				zs_stat_get(class, OBJ_USED);
      
      1) classes are not locked, so the numbers of allocated and used
         objects can change by the concurrent ops happening on other CPUs
      2) shrinker invokes it from preemptible context
      
      Depending on the circumstances, thus, OBJ_ALLOCATED can become
      less than OBJ_USED, which can result in either very high or
      negative `total_scan' value calculated later in do_shrink_slab().
      
      do_shrink_slab() has some logic to prevent those cases:
      
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
      
      However, due to the way `total_scan' is calculated, not every
      shrinker->count_objects() overflow can be spotted and handled.
      To demonstrate the latter, I added some debugging code to do_shrink_slab()
      (x86_64) and the results were:
      
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 92679974445502
       vmscan: resulting total_scan: 92679974445502
      [..]
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 22634041808232578
       vmscan: resulting total_scan: 22634041808232578
      
      Even though shrinker->count_objects() has returned an overflowed value,
      the resulting `total_scan' is positive, and, what is more worrisome, it
      is insanely huge. This value is getting used later on in
      shrinker->scan_objects() loop:
      
              while (total_scan >= batch_size ||
                     total_scan >= freeable) {
                      unsigned long ret;
                      unsigned long nr_to_scan = min(batch_size, total_scan);
      
                      shrinkctl->nr_to_scan = nr_to_scan;
                      ret = shrinker->scan_objects(shrinker, shrinkctl);
                      if (ret == SHRINK_STOP)
                              break;
                      freed += ret;
      
                      count_vm_events(SLABS_SCANNED, nr_to_scan);
                      total_scan -= nr_to_scan;
      
                      cond_resched();
              }
      
      `total_scan >= batch_size' is true for a very-very long time and
      'total_scan >= freeable' is also true for quite some time, because
      `freeable < 0' and `total_scan' is large enough, for example,
      22634041808232578. The only break condition, in the given scheme of
      things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
      bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
      
      To fix the issue, take a pool stat snapshot and use it instead of
      racy zs_stat_get() calls.
      
      Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>        [4.3+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44f43e99