1. 29 May, 2017 8 commits
  2. 26 May, 2017 11 commits
  3. 25 May, 2017 13 commits
    • G
      9pfs: local: metadata file for the VirtFS root · 81ffbf5a
      Greg Kurz committed
      When using the mapped-file security mode, credentials are stored in a metadata
      directory located in the parent directory. This is okay for all paths with
      the notable exception of the root path, since we don't want and probably
      can't create a metadata directory above the virtfs directory on the host.
      
      This patch introduces a dedicated metadata file, sitting in the virtfs root
      for this purpose. It relies on the fact that the "." name necessarily refers
      to the virtfs root.
      
      As for the metadata directory, we don't want the client to see this file.
      The current code only filters readdir(), but many other places need
      fixing as well. The filtering logic is hence put in a separate function.
      
      Before:
      
      # ls -ld
      drwxr-xr-x. 3 greg greg 4096 May  5 12:49 .
      # chown root.root .
      chown: changing ownership of '.': Is a directory
      # ls -ld
      drwxr-xr-x. 3 greg greg 4096 May  5 12:49 .
      
      After:
      
      # ls -ld
      drwxr-xr-x. 3 greg greg 4096 May  5 12:49 .
      # chown root.root .
      # ls -ld
      drwxr-xr-x. 3 root root 4096 May  5 12:50 .
      
      and from the host:
      
      $ ls -al .virtfs_metadata_root
      -rwx------. 1 greg greg 26 May  5 12:50 .virtfs_metadata_root
      $ cat .virtfs_metadata_root
      virtfs.uid=0
      virtfs.gid=0
      Reported-by: Leo Gaspard <leo@gaspard.io>
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Tested-by: Leo Gaspard <leo@gaspard.io>
      [groug: work around a patchew false positive in
              local_set_mapped_file_attrat()]
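The filtering idea described in this commit can be sketched as a single predicate. This is only an illustration under stated assumptions, not the code from the patch: the function name `is_metadata_name_sketch` is hypothetical, and the only names it knows are the two mentioned in these commit messages (`.virtfs_metadata` and `.virtfs_metadata_root`).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Names taken from the commit messages in this log. */
#define VIRTFS_META_DIR       ".virtfs_metadata"
#define VIRTFS_META_ROOT_FILE ".virtfs_metadata_root"

/*
 * Hypothetical helper: return true if 'name' (a single path element) is
 * one of the metadata names that must stay hidden from the client.
 */
static bool is_metadata_name_sketch(const char *name)
{
    assert(name != NULL);
    return strcmp(name, VIRTFS_META_DIR) == 0 ||
           strcmp(name, VIRTFS_META_ROOT_FILE) == 0;
}
```

A readdir() loop, or any other operation that takes a name from the client, could call such a predicate first and skip or reject matching names.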
    • G
      9pfs: local: simplify file opening · 3dbcf273
      Greg Kurz committed
      The logic to open a path currently sits between local_open_nofollow() and
      the relative_openat_nofollow() helper, which has no other user.
      
      For the sake of clarity, this patch moves all the code of the helper into
      its unique caller. While here we also:
      - drop the code to skip leading "/" because the backend isn't supposed to
        pass anything but relative paths without consecutive slashes. The assert()
        is kept because we really don't want a buggy backend to pass an absolute
        path to openat().
      - use strchrnul() to get simpler code. This is OK since virtfs is for
        Linux+glibc hosts only.
      - don't dup() the initial directory and add an assert() to ensure we don't
        return the global mountfd to the caller. BTW, this would mean that the
        caller passed an empty path, which isn't supposed to happen either.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      [groug: fixed typos in changelog]
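The three points listed in this commit (assert on absolute paths, strchrnul() to split elements, no dup() of the initial directory) can be sketched as follows. This is a minimal sketch assuming a Linux+glibc host; the name `open_nofollow_sketch` and its exact signature are hypothetical and are not claimed to match the patched function.

```c
#define _GNU_SOURCE /* strchrnul() is a GNU extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: open 'path' relative to 'dirfd' one element at a
 * time, never following symlinks. Returns a new fd, or -1 with errno set.
 */
static int open_nofollow_sketch(int dirfd, const char *path, int flags,
                                mode_t mode)
{
    int fd = dirfd;

    /* A buggy caller must never hand an absolute path to openat(). */
    assert(path[0] != '/');

    while (*path) {
        const char *end = strchrnul(path, '/'); /* next '/' or final NUL */
        char *elem = strndup(path, end - path);
        int next, saved_errno;

        if (!elem) {
            next = -1;
            saved_errno = ENOMEM;
        } else if (*end) {
            /* Intermediate element: must be a real directory. */
            next = openat(fd, elem, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
            saved_errno = errno;
        } else {
            /* Last element: apply the caller's flags. */
            next = openat(fd, elem, flags | O_NOFOLLOW, mode);
            saved_errno = errno;
        }
        free(elem);
        if (fd != dirfd) {
            close(fd); /* only close fds opened by this function */
        }
        if (next < 0) {
            errno = saved_errno;
            return -1;
        }
        fd = next;
        path = *end ? end + 1 : end;
    }

    /* An empty path would leave fd == dirfd; callers must not pass one. */
    assert(fd != dirfd);
    return fd;
}
```

Because the starting descriptor is never duplicated, the final assert() is what guarantees that the caller's own directory fd is never handed back.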
    • G
      9pfs: local: resolve special directories in paths · f57f5878
      Greg Kurz committed
      When using the mapped-file security mode, the creds of a path /foo/bar
      are stored in the /foo/.virtfs_metadata/bar file. This is okay for all
      paths unless they end with '.' or '..', because we cannot create the
      corresponding file in the metadata directory.
      
      This patch ensures that '.' and '..' are resolved in all paths.
      
      The core code only passes path elements (no '/') to the backend, with
      the notable exception of the '/' path, which refers to the virtfs root.
      This patch preserves the current behavior of converting it to '.' so
      that it can be passed to "*at()" syscalls ('/' would mean the host root).
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
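The mapping from a path to its metadata file, and the reason '.' and '..' are a problem, can be illustrated with a small string helper. This is a sketch under assumptions: the function name `metadata_path_sketch` is hypothetical and the helper only manipulates strings; it does not resolve special directories the way the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the metadata path for 'path' into 'out',
 * e.g. "/foo/bar" -> "/foo/.virtfs_metadata/bar".
 * Returns 0 on success, -1 if no metadata file can be named after the
 * last element ('.', '..' or empty) or if 'out' is too small.
 */
static int metadata_path_sketch(const char *path, char *out, size_t outlen)
{
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;
    int dirlen = slash ? (int)(slash - path) + 1 : 0; /* keep the '/' */
    int n;

    assert(out != NULL && outlen > 0);

    if (*name == '\0' || strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
        return -1;
    }
    n = snprintf(out, outlen, "%.*s.virtfs_metadata/%s", dirlen, path, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```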
    • G
      9pfs: check return value of v9fs_co_name_to_path() · 4fa62005
      Greg Kurz committed
      These v9fs_co_name_to_path() call sites have always been around. I guess
      no care was taken to check the return value because the name_to_path
      operation could never fail at the time. This is no longer true: the
      handle and synth backends can already fail this operation, and so will the
      local backend soon.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
    • G
      9pfs: assume utimensat() and futimens() are present · 24df3371
      Greg Kurz committed
      The utimensat() and futimens() syscalls have been around for ages (ie,
      glibc 2.6 and linux 2.6.22), and the decision was already taken to
      switch to utimensat() anyway when fixing CVE-2016-9602 in 2.9.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
    • G
      9pfs: local: fix unlink of alien files in mapped-file mode · 6a87e792
      Greg Kurz committed
      When trying to remove a file from a directory, both created in non-mapped
      mode, the file remains and EBADF is returned to the guest.
      
      This is a regression introduced by commit "df4938a6 9pfs: local:
      unlinkat: don't follow symlinks" when fixing CVE-2016-9602. It changed the
      way we unlink the metadata file from
      
          ret = remove("$dir/.virtfs_metadata/$name");
          if (ret < 0 && errno != ENOENT) {
               /* Error out */
          }
          /* Ignore absence of metadata */
      
      to
      
          fd = openat("$dir/.virtfs_metadata")
          ret = unlinkat(fd, "$name")
          if (ret < 0 && errno != ENOENT) {
               /* Error out */
          }
          /* Ignore absence of metadata */
      
      If $dir was created in non-mapped mode, openat() fails with ENOENT and
      we pass -1 to unlinkat(), which fails in turn with EBADF.
      
      We just need to check the return of openat() and ignore ENOENT, in order
      to restore the behaviour we had with remove().
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      [groug: rewrote the comments as suggested by Eric]
    • G
      9pfs: drop pdu_push_and_notify() · a17d8659
      Greg Kurz committed
      Only pdu_complete() needs to notify the client that a request has completed.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
    • G
      virtio-9p/xen-9p: move 9p specific bits to core 9p code · 506f3275
      Greg Kurz committed
      These bits aren't related to the transport so let's move them to the core
      code.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
    • G
      xics: add unrealize handler · 62f94fc9
      Greg Kurz committed
      Now that ICPState objects get finalized on CPU unplug, we should unregister
      reset handlers as well to avoid a QEMU crash at machine reset time.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • D
      hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release · 16ee9980
      Daniel Henrique Barboza committed
      When a LMB hot unplug starts, the current DRC LMB status is stored at
      spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus
      if a migration occurs in the middle of a LMB unplug the
      spapr_lmb_release callback will lose track of the LMB unplug progress.
      
      This patch implements a new recover function spapr_recover_pending_dimm_state
      that is used inside spapr_lmb_release to recover this DRC LMB release
      status that is lost during the migration.
      Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
      [dwg: Minor stylistic changes, simplify error handling]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • D
      hw/ppc: migrating the DRC state of hotplugged devices · a50919dd
      Daniel Henrique Barboza committed
      In pseries, a firmware abstraction called Dynamic Reconfiguration
      Connector (DRC) is used to assign a particular dynamic resource
      to the guest and provide an interface to manage configuration/removal
      of the resource associated with it. In other words, DRC is the
      'plugged state' of a device.
      
      Before this patch, DRC wasn't being migrated. This causes
      post-migration problems due to DRC state mismatch between source and
      target. The DRC state of a device X in the source might
      change, while in the target the DRC state of X is still fresh. When
      migrating the guest, X will not have the same hotplugged state as it
      did in the source. This means that we can't hot unplug X in the
      target after migration is completed because its DRC state is not consistent.
      https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1677552 is one
      bug that is caused by this DRC state mismatch between source and
      target.
      
      To migrate the DRC state, we defined the VMStateDescription struct for
      spapr_drc to enable the transmission of spapr_drc state in migration.
      Not all the elements in the DRC state are migrated - only those
      that can be modified by guest actions or device add/remove
      operations:
      
      - 'isolation_state', 'allocation_state' and 'indicator_state'
      are involved in the DR state transition diagram from
      PAPR+ 2.7, 13.4;
      
      - 'configured', 'signalled', 'awaiting_release' and 'awaiting_allocation'
      are needed in attaching and detaching devices;
      
      - 'indicator_state' provides users with hardware state information.
      
      These are the DRC elements that are migrated.
      
      In this patch the DRC state is migrated for PCI, LMB and CPU
      connector types. At this moment there is no support to migrate
      DRC for the PHB (PCI Host Bridge) type.
      
      In the 'realize' function the DRC is registered using vmstate_register,
      similar to what hw/ppc/spapr_iommu.c does in 'spapr_tce_table_realize'.
      This approach works because DRCs are bus-less and do not sit
      on a BusClass that implements bc->get_dev_path, so as a fallback the
      VMSD gets identified via "spapr_drc"/get_index(drc).
      Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • D
      hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque · 31834723
      Daniel Henrique Barboza committed
      The pointer drc->detach_cb is being used as a way of informing
      the detach() function inside spapr_drc.c which cb to execute. This
      information can also be retrieved simply by checking drc->type and
      choosing the right callback based on it. In this context, detach_cb
      is redundant information that must be managed.
      
      After the previous spapr_lmb_release change, no detach_cb_opaques
      are being used by any of the three callback functions. This is
      yet another piece of information that is now unused and, on top of that,
      can't be migrated either.
      
      This patch makes the following changes:
      
      - removal of detach_cb_opaque. The 'opaque' argument was removed from
      the callbacks and from the detach() function of sPAPRConnectorClass. The
      attribute detach_cb_opaque of sPAPRConnector was removed.
      
      - removal of detach_cb from the detach() call. The function pointer
      detach_cb of sPAPRConnector was removed. detach() now uses a
      switch(drc->type) to execute the appropriate callback. To achieve this,
      spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb
      callbacks were made public to be visible inside detach().
      Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • D
      hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState · 0cffce56
      David Gibson committed
      The LMB DRC release callback, spapr_lmb_release(), uses an opaque
      parameter, a sPAPRDIMMState struct that stores the current LMBs that
      are allocated to a DIMM (nr_lmbs). After each call to this callback,
      the nr_lmbs is decremented by one and, when it reaches zero, the callback
      proceeds with the qdev calls to hot unplug the LMB.
      
      Using drc->detach_cb_opaque is problematic because it can't be migrated in
      the future DRC migration work. This patch makes the following changes to
      eliminate the usage of this opaque callback inside spapr_lmb_release:
      
      - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new
      attribute called 'addr' was added to it. This is used as a unique
      identifier to associate a sPAPRDIMMState to a PCDIMM element.
      
      - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'.
      This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs
      that are currently undergoing an unplug process.
      
      - spapr_lmb_release() will now retrieve the nr_lmbs value by getting the
      corresponding sPAPRDIMMState. A helper function called spapr_dimm_get_address
      was created to fetch the address of a PCDIMM device inside spapr_lmb_release.
      When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug
      calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs.
      
      After these changes, the opaque argument for spapr_lmb_release is now
      unused and is passed as NULL inside spapr_del_lmbs. This and the other
      opaque arguments can now be safely removed from the code.
      
      As an additional cleanup made by this patch, the spapr_del_lmbs function
      was merged with spapr_memory_unplug_request. The former was being called
      only by the latter and both were small enough to fit one single function.
      Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
      [dwg: Minor stylistic cleanups]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
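The bookkeeping described in this commit (a queue of per-DIMM states keyed by address, with nr_lmbs decremented on each release) can be sketched with the TAILQ macros from <sys/queue.h>, used here as a stand-in for QEMU's QTAILQ. All type and function names ending in `Sketch` or `_sketch` are hypothetical, and the sketch leaves out the actual qdev unplug calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the per-DIMM unplug state. */
typedef struct DimmStateSketch {
    uint64_t addr;     /* unique key: address of the DIMM */
    uint32_t nr_lmbs;  /* LMBs that still have to be released */
    TAILQ_ENTRY(DimmStateSketch) next;
} DimmStateSketch;

typedef TAILQ_HEAD(PendingUnplugsSketch, DimmStateSketch) PendingUnplugsSketch;

/* Record that the DIMM at 'addr' starts an unplug of 'nr_lmbs' LMBs. */
static void pending_add_sketch(PendingUnplugsSketch *q, uint64_t addr,
                               uint32_t nr_lmbs)
{
    DimmStateSketch *ds = calloc(1, sizeof(*ds));

    assert(ds != NULL);
    ds->addr = addr;
    ds->nr_lmbs = nr_lmbs;
    TAILQ_INSERT_HEAD(q, ds, next);
}

/* Look up the pending state of the DIMM at 'addr', or NULL. */
static DimmStateSketch *pending_find_sketch(PendingUnplugsSketch *q,
                                            uint64_t addr)
{
    DimmStateSketch *ds;

    TAILQ_FOREACH(ds, q, next) {
        if (ds->addr == addr) {
            return ds;
        }
    }
    return NULL;
}

/*
 * Release one LMB of the DIMM at 'addr'. Returns true once the last LMB
 * is gone, which is the point where the DIMM itself could be unplugged;
 * the state is then dropped from the queue.
 */
static bool lmb_release_sketch(PendingUnplugsSketch *q, uint64_t addr)
{
    DimmStateSketch *ds = pending_find_sketch(q, addr);

    assert(ds != NULL && ds->nr_lmbs > 0);
    if (--ds->nr_lmbs > 0) {
        return false;
    }
    TAILQ_REMOVE(q, ds, next);
    free(ds);
    return true;
}
```

Keying the state by DIMM address is what lets the release callback find it without an opaque pointer.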
  4. 24 May, 2017 8 commits
    • L
      spapr: add pre_plug function for memory · c871bc70
      Laurent Vivier committed
      This allows errors to be handled before the memory
      has started to be hotplugged. We already have
      such a function for the CPU cores.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      [dwg: Fixed a couple of style nits]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • D
      pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types · 459264ef
      David Gibson committed
      As of pseries-2.7 and later, we require the total number of guest vcpus to
      be a multiple of the threads-per-core.  pseries-2.6 and earlier machine
      types, however, are supposed to allow this for the sake of migration from
      old qemu versions which allowed this.
      
      Unfortunately, 8149e299 "pseries: Enforce homogeneous threads-per-core"
      broke this by not considering the old machine type case.  This fixes it by
      only applying the check when the machine type supports hotpluggable cpus.
      By not-entirely-coincidence, that corresponds to the same time when we
      started enforcing total threads being a multiple of threads-per-core.
      
      Fixes: 8149e299
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Tested-by: Greg Kurz <groug@kaod.org>
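The rule this commit restores can be expressed as a small predicate. This is only a sketch: the function name and the `hotpluggable_cpus` flag are hypothetical stand-ins for however the machine type exposes that property.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: machine types with hotpluggable CPUs require the
 * total vCPU count to be a multiple of threads-per-core; older machine
 * types skip the check so that migration from old QEMU versions works.
 */
static bool vcpu_count_ok_sketch(unsigned vcpus, unsigned threads_per_core,
                                 bool hotpluggable_cpus)
{
    assert(threads_per_core > 0);

    if (!hotpluggable_cpus) {
        return true; /* old machine type: any count is accepted */
    }
    return vcpus % threads_per_core == 0;
}
```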
    • D
      pseries: Split CAS PVR negotiation out into a separate function · 80c33d34
      David Gibson committed
      Guests of the pseries machine type go through a feature negotiation process
      known as "client architecture support" (CAS) during early boot.  This does
      a number of things, one of which is finding a CPU compatibility mode which
      can be supported by both guest and host.
      
      In fact the CPU negotiation is probably the single most complex part of the
      CAS process, so this splits it out into a helper function.  We've recently
      made some mistakes in maintaining backward compatibility for old machine
      types here.  Splitting this out will also make it easier to fix this.
      
      This also adds a possibly useful error message if the negotiation fails
      (i.e. if there isn't a CPU mode that's suitable for both guest and host).
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
    • G
      spapr: fix error reporting in xics_system_init() · 3d85885a
      Greg Kurz committed
      If the user explicitly asked for kernel-irqchip support and "xics-kvm"
      initialization fails, we shouldn't fall back to emulated "xics" as we
      do now. It is also awkward to print an error message when we have an
      errp pointer argument.
      
      Let's use the errp argument to report the error and let the caller decide.
      This simplifies the code as we don't need a local Error * here.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • G
      spapr_cpu_core: drop reference on ICP object during CPU realization · 249127d0
      Greg Kurz committed
      When a piece of code allocates an object, it implicitly gets a reference
      on it. If it then makes that object a child property of another object, it
      should drop its own reference at some point otherwise the child object can
      never be finalized. The current code hence leaks one ICP object per CPU
      when hot-removing a core.
      
      Failing to add a newly allocated ICP object to the CPU is a bug. While here,
      let's ensure QEMU aborts if this ever happens.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • D
      hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry · bff30638
      Daniel Henrique Barboza committed
      Currently we do not have any RTAS event that is reported by the
      event-scan interface. The existing events, RTAS_LOG_TYPE_EPOW and
      RTAS_LOG_TYPE_HOTPLUG, are being reported by the check-exception
      interface and, as such, marked as 'exception=true'.
      
      Commit 79853e18, 'spapr_events: event-scan RTAS interface', added
      the event_scan interface because the guest kernel requires it to
      initialize other required interfaces. It has acted as
      a stub ever since, because no events that would be reported by it
      have been added. However, the existence of the 'exception' boolean adds
      an unnecessary load in the future migration of the pending_events,
      sPAPREventLogEntry QTAILQ that hosts the pending RTAS events.
      
      To make the code cleaner and ease the future migration changes, this
      patch makes the following changes:
      
      - remove the 'exception' boolean that filters these events. There is
      nothing to filter since all events are reported by check-exception;
      
      - functions rtas_event_log_queue, rtas_event_log_dequeue and
      rtas_event_log_contains don't receive the 'exception' boolean
      as parameter;
      
      - event_scan function was simplified. It was calling
      'rtas_event_log_dequeue(mask, false)' that was always returning
      'NULL' because we have no events that are created with
      exception=false, thus in the end it would execute a jump to
      'out_no_events' all the time. The function now assumes that
      this will always be the case and all the remaining logic was
      deleted.
      
      In the future, when or if we add new RTAS events that should
      be reported with the event_scan interface, we can refer to
      the changes made in this patch to add the event_scan logic
      back.
      Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • G
      spapr: ensure core_slot isn't NULL in spapr_core_unplug() · 07572c06
      Greg Kurz committed
      If we go that far on the path of hot-removing a core and we find out that
      the core-id is invalid, then we have a serious bug.
      
      Let's make it explicit with an assert() instead of dereferencing a NULL
      pointer.
      
      This fixes Coverity issue CID 1375404.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • G
      xics_kvm: cache already enabled vCPU ids · de86eccc
      Greg Kurz committed
      Since commit a45863bd ("xics_kvm: Don't enable KVM_CAP_IRQ_XICS if
      already enabled"), we were able to re-hotplug a vCPU that had been
      hot-unplugged earlier, thanks to a boolean flag in ICPState that we set when
      enabling KVM_CAP_IRQ_XICS.
      
      This could work because the lifecycle of all ICPState objects was the
      same as the machine. Commit 5bc8d26d ("spapr: allocate the ICPState
      object from under sPAPRCPUCore") broke this assumption and now we always
      pass a freshly allocated ICPState object (ie, with the flag unset) to
      icp_kvm_cpu_setup().
      
      This causes re-hotplug to fail with:
      
      Unable to connect CPU8 to kernel XICS: Device or resource busy
      
      Let's fix this by caching all the vCPU ids for which KVM_CAP_IRQ_XICS was
      enabled. This also drops the now useless boolean flag from ICPState.
      Reported-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Tested-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
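The caching idea in this commit, remembering which vCPU ids already had the capability enabled independently of any ICPState object, can be sketched as below. Everything here is hypothetical: the fixed-size table, the bound `MAX_VCPU_IDS_SKETCH`, and the counter that stands in for the real KVM call are illustrative choices, not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VCPU_IDS_SKETCH 1024 /* arbitrary bound for this sketch */

/* Cache that outlives individual per-CPU objects. */
static bool enabled_ids_sketch[MAX_VCPU_IDS_SKETCH];

/* Counts how often the (simulated) enable operation actually ran. */
static int enable_calls_sketch;

/*
 * Hypothetical setup routine: enable the capability for 'vcpu_id' only
 * the first time this id is seen, so that re-hotplugging the same vCPU
 * does not try to enable it again.
 */
static void cpu_setup_sketch(unsigned vcpu_id)
{
    assert(vcpu_id < MAX_VCPU_IDS_SKETCH);

    if (enabled_ids_sketch[vcpu_id]) {
        return; /* already enabled for this id: nothing to do */
    }
    enable_calls_sketch++; /* stand-in for enabling KVM_CAP_IRQ_XICS */
    enabled_ids_sketch[vcpu_id] = true;
}
```

Since the cache is keyed by vCPU id rather than stored in the per-CPU object, a freshly allocated object for a re-plugged CPU no longer loses the information.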