1. 11 Mar, 2015 1 commit
    • arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting · a987370f
      Marc Zyngier committed
      We're using __get_free_pages to allocate the guest's stage-2
      PGD. The standard behaviour of this function is to return a set of
      pages where only the head page has a valid refcount.
      
      This behaviour gets us into trouble when we're trying to increment
      the refcount on a non-head page:
      
      page:ffff7c00cfb693c0 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x4000000000000000()
      page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0)
      BUG: failure at include/linux/mm.h:548/get_page()!
      Kernel panic - not syncing: BUG!
      CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825
      Hardware name: APM X-Gene Mustang board (DT)
      Call trace:
      [<ffff80000008a09c>] dump_backtrace+0x0/0x13c
      [<ffff80000008a1e8>] show_stack+0x10/0x1c
      [<ffff800000691da8>] dump_stack+0x74/0x94
      [<ffff800000690d78>] panic+0x100/0x240
      [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc
      [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0
      [<ffff8000000a420c>] handle_exit+0x58/0x180
      [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c
      [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754
      [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8
      [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78
      CPU0: stopping
      
      A possible approach for this is to split the compound page using
      split_page() at allocation time, and change the teardown path to
      free one page at a time.  It turns out that alloc_pages_exact() and
      free_pages_exact() do exactly that.
      
      While we're at it, the PGD allocation code is reworked to reduce
      duplication.
      
      This has been tested on an X-Gene platform with a 4kB/48bit-VA host
      kernel, and kvmtool hacked to place memory in the second page of
      the hardware PGD (PUD for the host kernel). Also regression-tested
      on a Cubietruck (Cortex-A7).
      
       [ Reworked to use alloc_pages_exact() and free_pages_exact() and to
         return pointers directly instead of by reference as arguments
          - Christoffer ]
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
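The refcounting problem described in the commit above can be illustrated with a small userspace model. This is only a sketch: `struct model_page` and the `model_*` functions are hypothetical stand-ins, not kernel APIs, and they capture just the one property the commit message relies on (a multi-page allocation gives only the head page a valid refcount until the block is split).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page; only the refcount is modelled. */
struct model_page {
    int count;
};

#define MODEL_MAX_ORDER 4 /* arbitrary limit chosen for this sketch */

/* Models a multi-page allocation: only the head page gets a valid refcount. */
static void model_alloc_high_order(struct model_page *pages, unsigned int order)
{
    assert(order <= MODEL_MAX_ORDER);
    for (size_t i = 0; i < ((size_t)1 << order); i++)
        pages[i].count = (i == 0) ? 1 : 0;
}

/* Models splitting the block: every tail page becomes independently refcounted. */
static void model_split_page(struct model_page *pages, unsigned int order)
{
    assert(order <= MODEL_MAX_ORDER);
    for (size_t i = 1; i < ((size_t)1 << order); i++)
        pages[i].count = 1;
}

/*
 * Models taking a reference. Returns 0 on success, or -1 in the case
 * where the kernel check shown in the log (count <= 0) would fire.
 */
static int model_get_page(struct model_page *page)
{
    if (page->count <= 0)
        return -1;
    page->count++;
    return 0;
}
```

In this model, taking a reference on the second page of an order-1 block fails before the split and succeeds after it, which mirrors why the commit moves to an allocator that splits the block at allocation time.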
  2. 06 Mar, 2015 1 commit
  3. 05 Mar, 2015 4 commits
  4. 04 Mar, 2015 14 commits
    • KVM: s390: non-LPAR case obsolete during facilities mask init · fb5bf93f
      Michael Mueller committed
      With patch "include guest facilities in kvm facility test" it is no
      longer necessary to have special handling for the non-LPAR case.
      Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390: include guest facilities in kvm facility test · 981467c9
      Michael Mueller committed
      Most facility related decisions in KVM have to take into account:
      
      - the facilities offered by the underlying run container (LPAR/VM)
      - the facilities supported by the KVM code itself
      - the facilities requested by a guest VM
      
      This patch adds the KVM driver requested facilities to the test routine.
      
      It additionally renames struct s390_model_fac to kvm_s390_fac and its field
      names to be more meaningful.
      
      The semantics of the facilities stored in the KVM architecture structure
      is changed. The address arch.model.fac->list now points to the guest
      facility list and arch.model.fac->mask points to the KVM facility mask.
      
      This patch fixes the behaviour of KVM for some facilities for guests
      that ignore the guest visible facility bits, e.g. guests could use
      transactional memory instructions on hosts supporting them even if the
      chosen cpu model would not offer them.
      
      The userspace interface is not affected by this change.
      Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
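The idea of testing a facility against both the KVM facility mask and the guest facility list can be sketched as follows. This is a deliberately simplified model, not the kernel's test routine: `struct fac_model` and `model_test_kvm_facility()` are hypothetical names, each set is squeezed into a single 64-bit word with bit n standing for facility n, and the host-level check mentioned in the commit message is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model: one 64-bit word per facility set.
 * Real s390 facility lists span several words and use a different
 * bit numbering; that detail is ignored here.
 */
struct fac_model {
    uint64_t mask; /* facilities the KVM code itself supports */
    uint64_t list; /* facilities offered to this particular guest */
};

/* A facility counts as usable only if it is set in both the mask and the list. */
static bool model_test_kvm_facility(const struct fac_model *fac, unsigned int nr)
{
    uint64_t bit;

    assert(fac != NULL);
    if (nr >= 64)
        return false;
    bit = UINT64_C(1) << nr;
    return (fac->mask & bit) != 0 && (fac->list & bit) != 0;
}
```

With `mask = 0x6` and `list = 0x2`, facility 1 passes the test while facility 2 does not, because it is supported by KVM but was not offered to the guest.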
    • KVM: s390: fix in memory copy of facility lists · 94422ee8
      Michael Mueller committed
      The facility lists were not fully copied.
      Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390/cpacf: Fix kernel bug under z/VM · 86044c8c
      Christian Borntraeger committed
      Under z/VM PQAP might trigger an operation exception if no crypto cards
      are defined via APVIRTUAL or APDEDICATED.
      
      [  386.098666] Kernel BUG at 0000000000135c56 [verbose debug info unavailable]
      [  386.098693] illegal operation: 0001 ilc:2 [#1] SMP
      [...]
      [  386.098751] Krnl PSW : 0704c00180000000 0000000000135c56 (kvm_s390_apxa_installed+0x46/0x98)
      [...]
      [  386.098804]  [<000000000013627c>] kvm_arch_init_vm+0x29c/0x358
      [  386.098806]  [<000000000012d008>] kvm_dev_ioctl+0xc0/0x460
      [  386.098809]  [<00000000002c639a>] do_vfs_ioctl+0x332/0x508
      [  386.098811]  [<00000000002c660e>] SyS_ioctl+0x9e/0xb0
      [  386.098814]  [<000000000070476a>] system_call+0xd6/0x258
      [  386.098815]  [<000003fffc7400a2>] 0x3fffc7400a2
      
      Let's add an extable entry and provide a zeroed config in that case.
      Reported-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
      Tested-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
    • powerpc/iommu: Remove IOMMU device references via bus notifier · 4ad04e59
      Nishanth Aravamudan committed
      After d905c5df ("PPC: POWERNV: move iommu_add_device earlier"), the
      refcnt on the kobject backing the IOMMU group for a PCI device is
      elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
      set_iommu_table_base_and_group). When we go to dlpar a multi-function
      PCI device out:
      
              iommu_reconfig_notifier ->
                      iommu_free_table ->
                              iommu_group_put
                              BUG_ON(tbl->it_group)
      
      We trip this BUG_ON, because there are still references on the table, so
      it is not freed. Fix this by moving the powernv bus notifier to common
      code and calling it for both powernv and pseries.
      
      Fixes: d905c5df ("PPC: POWERNV: move iommu_add_device earlier")
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/smp: Wait until secondaries are active & online · 875ebe94
      Michael Ellerman committed
      Anton has a busy ppc64le KVM box where guests sometimes hit the infamous
      "kernel BUG at kernel/smpboot.c:134!" issue during boot:
      
        BUG_ON(td->cpu != smp_processor_id());
      
      Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
      output confirms it:
      
        CPU: 0
        Comm: watchdog/130
      
      The problem is that we aren't ensuring the CPU active bit is set for the
      secondary before allowing the master to continue on. The master unparks
      the secondary CPU's kthreads and the scheduler looks for a CPU to run
      on. It calls select_task_rq() and realises the suggested CPU is not in
      the cpus_allowed mask. It then ends up in select_fallback_rq(), and
      since the active bit isn't set we choose some other CPU to run on.
      
      This seems to have been introduced by 6acbfb96 "sched: Fix hotplug
      vs. set_cpus_allowed_ptr()", which changed from setting active before
      online to setting active after online. However that was in turn fixing a
      bug where other code assumed an active CPU was also online, so we can't
      just revert that fix.
      
      The simplest fix is just to spin waiting for both active & online to be
      set. We already have a barrier prior to set_cpu_online() (which also
      sets active), to ensure all other setup is completed before online &
      active are set.
      
      Fixes: 6acbfb96 ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
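The "spin until both bits are set" approach from the commit above can be sketched in userspace with C11 atomics. This is a model under stated assumptions, not the powerpc code: `struct cpu_model` and the `model_*` functions are hypothetical, and the wait is bounded by a spin count so the sketch always terminates, which is an addition made here for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two per-CPU state bits discussed above. */
struct cpu_model {
    atomic_bool online;
    atomic_bool active;
};

/* Secondary side: marks itself online and active once its setup is done. */
static void model_secondary_start(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    atomic_store(&cpu->online, true);
    atomic_store(&cpu->active, true);
}

/* True only when both bits are visible as set. */
static bool model_cpu_ready(struct cpu_model *cpu)
{
    return atomic_load(&cpu->online) && atomic_load(&cpu->active);
}

/*
 * Master side: spin until the secondary is both online and active.
 * The max_spins bound exists only in this sketch.
 */
static bool model_wait_for_secondary(struct cpu_model *cpu, unsigned long max_spins)
{
    assert(cpu != NULL);
    for (unsigned long i = 0; i < max_spins; i++) {
        if (model_cpu_ready(cpu))
            return true;
    }
    return false;
}
```

A CPU that is online but not yet active is reported as not ready, so in this model the master would not proceed to unpark its threads in that window.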
    • Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux · a6c5170d
      Linus Torvalds committed
      Pull nfsd fixes from Bruce Fields:
       "Three miscellaneous bugfixes, most importantly the clp->cl_revoked
        bug, which we've seen several reports of people hitting"
      
      * 'for-4.0' of git://linux-nfs.org/~bfields/linux:
        sunrpc: integer underflow in rsc_parse()
        nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd
        svcrpc: fix memory leak in gssp_accept_sec_context_upcall
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 789d7f60
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) If an IPVS tunnel is created with a mixed-family destination
          address, it cannot be removed.  Fix from Alexey Andriyanov.
      
       2) Fix module refcount underflow in netfilter's nft_compat, from Pablo
          Neira Ayuso.
      
       3) Generic statistics infrastructure can reference variables sitting on
          a released function stack, therefore use dynamic allocation always.
          Fix from Ignacy Gawędzki.
      
       4) skb_copy_bits() return value test is inverted in ip_check_defrag().
      
       5) Fix network namespace exit in openvswitch, we have to release all of
          the per-net vports.  From Pravin B Shelar.
      
       6) Fix signedness bug in CAIF's cfpkt_iterate(), from Dan Carpenter.
      
       7) Fix rhashtable grow/shrink behavior, only expand during inserts and
          shrink during deletes.  From Daniel Borkmann.
      
       8) Netdevice names with semicolons should never be allowed, because
          they serve as a separator.  From Matthew Thode.
      
       9) Use {,__}set_current_state() where appropriate, from Fabian
          Frederick.
      
      10) Revert byte queue limits support in r8169 driver, it's causing
          regressions we can't figure out.
      
      11) tcp_should_expand_sndbuf() erroneously uses tp->packets_out to
          measure packets in flight, properly use tcp_packets_in_flight()
          instead.  From Neal Cardwell.
      
      12) Fix accidental removal of support for bluetooth in CSR based Intel
          wireless cards.  From Marcel Holtmann.
      
      13) We accidentally added a behavioral change between native and compat
          tasks, wrt testing the MSG_CMSG_COMPAT bit.  Just ignore it if the
          user happened to set it in a native binary as that was always the
          behavior we had.  From Catalin Marinas.
      
      14) Check genlmsg_unicast() return value in hwsim netlink tx frame
          handling, from Bob Copeland.
      
      15) Fix stale ->radar_required setting in mac80211 that can prevent
          starting new scans, from Eliad Peller.
      
      16) Fix memory leak in nl80211 monitor, from Johannes Berg.
      
      17) Fix race in TX index handling in xen-netback, from David Vrabel.
      
      18) Don't enable interrupts in amd-xgbe driver until all software et al.
          state is ready for the interrupt handler to run.  From Thomas
          Lendacky.
      
      19) Add missing netlink_ns_capable() checks to rtnl_newlink(), from Eric
          W Biederman.
      
      20) The amount of header space needed in macvtap was not calculated
          properly, fix it otherwise we splat past the beginning of the
          packet.  From Eric Dumazet.
      
      21) Fix bcmgenet TCP TX perf regression, from Jaedon Shin.
      
      22) Don't raw initialize or mod timers, use setup_timer() and
          mod_timer() instead.  From Vaishali Thakkar.
      
      23) Fix software maintained statistics in bcmgenet and systemport
          drivers, from Florian Fainelli.
      
      24) DMA descriptor updates in sh_eth need proper memory barriers, from
          Ben Hutchings.
      
      25) Don't do UDP Fragmentation Offload on RAW sockets, from Michal
          Kubecek.
      
      26) Openvswitch's non-masked set actions aren't constructed properly
          into netlink messages, fix from Joe Stringer.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
        openvswitch: Fix serialization of non-masked set actions.
        gianfar: Reduce logging noise seen due to phy polling if link is down
        ibmveth: Add function to enable live MAC address changes
        net: bridge: add compile-time assert for cb struct size
        udp: only allow UFO for packets from SOCK_DGRAM sockets
        sh_eth: Really fix padding of short frames on TX
        Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790"
        sh_eth: Fix RX recovery on R-Car in case of RX ring underrun
        sh_eth: Ensure proper ordering of descriptor active bit write/read
        net/mlx4_en: Disbale GRO for incoming loopback/selftest packets
        net/mlx4_core: Fix wrong mask and error flow for the update-qp command
        net: systemport: fix software maintained statistics
        net: bcmgenet: fix software maintained statistics
        rxrpc: don't multiply with HZ twice
        rxrpc: terminate retrans loop when sending of skb fails
        net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface.
        net: pasemi: Use setup_timer and mod_timer
        net: stmmac: Use setup_timer and mod_timer
        net: 8390: axnet_cs: Use setup_timer and mod_timer
        net: 8390: pcnet_cs: Use setup_timer and mod_timer
        ...
    • openvswitch: Fix serialization of non-masked set actions. · f4f8e738
      Joe Stringer committed
      Set actions consist of a regular OVS_KEY_ATTR_* attribute nested inside
      of a OVS_ACTION_ATTR_SET action attribute. When converting masked actions
      back to regular set actions, the inner attribute length was not changed,
      i.e., double the length was being serialized. This patch fixes the bug.
      
      Fixes: 83d2b9ba ("net: openvswitch: Support masked set actions.")
      Signed-off-by: Joe Stringer <joestringer@nicira.com>
      Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
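The length bug described in the commit above can be illustrated with a small model. It assumes, for the purpose of the sketch, that a masked set action carries a key immediately followed by a mask of the same size, so the regular form should serialize only half of the masked length. `model_unmask_set_action()` is a hypothetical helper, not an Open vSwitch function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: converts a masked payload (key followed by a
 * same-sized mask, an assumption of this sketch) into a regular
 * payload holding only the key.
 * Returns the number of bytes written, or 0 on invalid input.
 */
static size_t model_unmask_set_action(const unsigned char *masked, size_t masked_len,
                                      unsigned char *out, size_t out_cap)
{
    size_t key_len;

    assert(masked != NULL && out != NULL);
    if (masked_len % 2 != 0)
        return 0;
    key_len = masked_len / 2; /* the step the buggy path skipped */
    if (key_len > out_cap)
        return 0;
    memcpy(out, masked, key_len);
    return key_len;
}
```

Serializing `masked_len` bytes instead of `key_len` would emit the mask as if it were part of the key, which is the doubled length the commit message refers to.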
    • gianfar: Reduce logging noise seen due to phy polling if link is down · 0ae93b2c
      Guenter Roeck committed
      Commit 6ce29b0e ("gianfar: Avoid unnecessary reg accesses in adjust_link()")
      eliminates unnecessary calls to adjust_link for phy devices which don't support
      interrupts and need polling. As part of that work, the 'new_state' local flag,
      which was used to reduce logging noise on the console, was eliminated.
      
      Unfortunately, that means that a 'Link is Down' log message will now be
      issued continuously if a link is configured as UP, the link state is down,
      and the associated phy requires polling. This occurs because priv->oldduplex
      is -1 in this case, which always differs from phydev->duplex. In addition,
      phydev->speed may also differ from priv->oldspeed.  gfar_update_link_state()
      is therefore called each time a phy is polled, even if the link state did not
      change.
      
      Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
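The failure mode described in the commit above (a polled PHY that is down still looks "changed" on every poll because the cached duplex is -1) can be modelled with a small change detector. This is one possible way to gate logging on real transitions, not the change the driver actually made; `struct link_model` and `model_link_changed()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the link parameters a driver might cache. */
struct link_model {
    int link;   /* 1 = up, 0 = down */
    int speed;
    int duplex; /* -1 = unknown, as cached while the link is down */
};

/*
 * Returns 1 when the state really changed (and a message should be
 * logged), 0 otherwise. Updates the cached state on a change.
 */
static int model_link_changed(struct link_model *old, const struct link_model *cur)
{
    assert(old != NULL && cur != NULL);

    if (!cur->link) {
        /* While down, speed and duplex are not compared at all. */
        if (!old->link)
            return 0;
        old->link = 0;
        old->speed = 0;
        old->duplex = -1;
        return 1;
    }

    if (old->link && old->speed == cur->speed && old->duplex == cur->duplex)
        return 0;

    *old = *cur;
    return 1;
}
```

Polling a link that stays down returns 0 every time in this model, so a "Link is Down" message would be printed once on the transition rather than on each poll.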
    • ibmveth: Add function to enable live MAC address changes · c77c761f
      Thomas Falcon committed
      Add a function that will enable changing the MAC address
      of an ibmveth interface while it is still running.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Reviewed-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add compile-time assert for cb struct size · 71e168b1
      Florian Westphal committed
      Make the build fail if the structure no longer fits into ->cb storage.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
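A compile-time size check of the kind this commit describes can be sketched in standard C11 with `_Static_assert`. The structures below are hypothetical stand-ins (`struct model_skb`, `struct model_br_cb`), the 48-byte scratch area is an assumption of the sketch, and the kernel itself would use its own build-time assertion macro rather than this keyword.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CB_SIZE 48 /* assumed size of the scratch area in this sketch */

/* Hypothetical packet buffer with a fixed-size control block. */
struct model_skb {
    char cb[MODEL_CB_SIZE];
};

/* Hypothetical per-protocol data stored in the control block. */
struct model_br_cb {
    void *dev;
    int flags;
    unsigned short frag_max;
};

/* Compilation fails here if the private struct outgrows the storage. */
_Static_assert(sizeof(struct model_br_cb) <= sizeof(((struct model_skb *)0)->cb),
               "model_br_cb does not fit into the cb storage");

/* Copies the private data into the control block. */
static void model_set_cb(struct model_skb *skb, const struct model_br_cb *cb)
{
    assert(skb != NULL && cb != NULL);
    memcpy(skb->cb, cb, sizeof(*cb));
}

/* Copies the private data back out of the control block. */
static void model_get_cb(const struct model_skb *skb, struct model_br_cb *cb)
{
    assert(skb != NULL && cb != NULL);
    memcpy(cb, skb->cb, sizeof(*cb));
}
```

Because the check runs at compile time, adding a field that pushes the struct past the storage size is caught by the build instead of silently overwriting adjacent memory at run time.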
    • Linux 4.0-rc2 · 13a7a6ac
      Linus Torvalds committed
    • drm/i915: Fix modeset state confusion in the load detect code · 9128b040
      Daniel Vetter committed
      This is a tricky story of the new atomic state handling and the legacy
      code fighting each other. The bug at hand is an underrun of the
      framebuffer reference with subsequent hilarity caused by the load
      detect code. Which is peculiar since the exact same code works
      fine as the implementation of the legacy setcrtc ioctl.
      
      Let's look at the ingredients:
      
      - Currently our code is a crazy mix of legacy modeset interfaces to
        set the parameters and half-baked atomic state tracking underneath.
        While this transition is going we're using the transitional plane
        helpers to update the atomic side (drm_plane_helper_disable/update
        and friends), i.e. plane->state->fb. Since the state structure owns
        the fb those functions take care of that themselves.
      
        The legacy state (specifically crtc->primary->fb) is still managed
        by the old code (and mostly by the drm core), with the fb reference
        counting done by callers (core drm for the ioctl or the i915 load
        detect code). The relevant commit is
      
        commit ea2c67bb
        Author: Matt Roper <matthew.d.roper@intel.com>
        Date:   Tue Dec 23 10:41:52 2014 -0800
      
            drm/i915: Move to atomic plane helpers (v9)
      
      - drm_plane_helper_disable has special code to handle multiple calls
        in a row - it checks plane->crtc == NULL and bails out. This is to
        match the proper atomic implementation which needs the crtc to get
        at the implied locking context atomic updates always need. See
      
        commit acf24a39
        Author: Daniel Vetter <daniel.vetter@ffwll.ch>
        Date:   Tue Jul 29 15:33:05 2014 +0200
      
            drm/plane-helper: transitional atomic plane helpers
      
      - The universal plane code split out the implicit primary plane from
        the CRTC into its own full-blown drm_plane object. As part of that
        the setcrtc ioctl (which updated both the crtc mode and primary
        plane) learned to set crtc->primary->crtc on modeset to make sure
        the plane->crtc assignments stay up to date in
      
        commit e13161af
        Author: Matt Roper <matthew.d.roper@intel.com>
        Date:   Tue Apr 1 15:22:38 2014 -0700
      
            drm: Add drm_crtc_init_with_planes() (v2)
      
        Unfortunately we've forgotten to update the load detect code. Which
        wasn't a problem since the load detect modeset is temporary and
        always undone before we drop the locks.
      
      - Finally there is an organically grown history (i.e. don't ask) around
        who sets the legacy plane->fb for the various driver entry points.
        Originally updating that was the driver's duty, but for almost all
        places we've moved that (plus updating the refcounts) into the core.
        Again the exception is the load detect code.
      
      Taking all together the following happens:
      - The load detect code doesn't set crtc->primary->crtc. This is only
        really an issue on crtcs never before used or when userspace
        explicitly disabled the primary plane.
      
      - The plane helper glue code short-circuits because of that and leaves
        a non-NULL fb behind in plane->state->fb and plane->fb. The state
        fb isn't a real problem (it's properly refcounted on its own), it's
        just the canary.
      
      - Load detect code drops the reference for that fb, but doesn't set
        plane->fb = NULL. This is ok since it's still living in that old
        world where drivers had to clear the pointer but the core/callers
        handled the refcounting.
      
      - On the next modeset the drm core notices plane->fb and takes care of
        refcounting it properly by doing another unref. This drops the
        refcount to zero, leaving state->plane now pointing at freed memory.
      
      - intel_plane_duplicate_state still assumes it owns a reference to that
        very state->fb and bad things start to happen.
      
      Fix this all by applying the same duct-tape as for the legacy setcrtc
      ioctl code and set crtc->primary->crtc properly.
      
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Paulo Zanoni <przanoni@gmail.com>
      Cc: Sean Paul <seanpaul@chromium.org>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 03 Mar, 2015 17 commits
  6. 02 Mar, 2015 3 commits