1. 07 12月, 2009 2 次提交
  2. 06 12月, 2009 2 次提交
  3. 04 12月, 2009 12 次提交
  4. 03 12月, 2009 24 次提交
    • W
      writeback: introduce wbc.for_background · b17621fe
      Wu Fengguang 提交于
      It will lower the flush priority for NFS, and maybe more in future.
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      b17621fe
    • S
      GFS2: Tag all metadata with jid · 0ab7d13f
      Steven Whitehouse 提交于
      There are two spare field in the header common to all GFS2
      metadata. One is just the right size to fit a journal id
      in it, and this patch updates the journal code so that each
      time a metadata block is modified, we tag it with the journal
      id of the node which is performing the modification.
      
      The reason for this is that it should make it much easier to
      debug issues which arise if we can tell which node was the
      last to modify a particular metadata block.
      
      Since the field is updated before the block is written into
      the journal, each journal should only contain metadata which
      is tagged with its own journal id. The one exception to this
      is the journal header block, which might have a different node's
      id in it, if that journal was recovered by another node in the
      cluster.
      
      Thus each journal will contain a record of which nodes recovered
      it, via the journal header.
      
      The other field in the metadata header could potentially be
      used to hold information about what kind of operation was
      performed, but for the time being we just zero it on each
      transaction so that if we use it for that in future, we'll
      know that the information (where it exists) is reliable.
      
      I did consider using the other field to hold the journal
      sequence number, however since in GFS2's journaling we write
      the modified data into the journal and not the original
      data, this gives no information as to what action caused the
      modification, so I think we can probably come up with a better
      use for those 64 bits in the future.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      0ab7d13f
    • S
      VFS: Export dquot_send_warning · 86e931a3
      Steven Whitehouse 提交于
      Sending a message to userspace in a generic format to warn
      of events (e.g. quota exceeded) in the quota subsystem is
      a generically useful feature. This patch makes some minor
      changes to the send_message function from dquot.c renaming
      it quota_send_message, moving it to quota.c and exporting it
      for use by filesystems which do not use the dquot code.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      86e931a3
    • S
      VFS: Add forget_all_cached_acls() · 796bd952
      Steven Whitehouse 提交于
      This is required for cluster filesystems which want to use
      cached ACLs so that they can invalidate the cache when
      required.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Alexander Viro <aviro@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      796bd952
    • M
      block: Allow devices to indicate whether discarded blocks are zeroed · 98262f27
      Martin K. Petersen 提交于
      The discard ioctl is used by mkfs utilities to clear a block device
      prior to putting metadata down.  However, not all devices return zeroed
      blocks after a discard.  Some drives return stale data, potentially
      containing old superblocks.  It is therefore important to know whether
      discarded blocks are properly zeroed.
      
      Both ATA and SCSI drives have configuration bits that indicate whether
      zeroes are returned after a discard operation.  Implement a block level
      interface that allows this information to be bubbled up the stack and
      queried via a new block device ioctl.
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      98262f27
    • C
      libata: add translation for SCSI WRITE SAME (aka TRIM support) · 18f0f978
      Christoph Hellwig 提交于
      Add support for the ATA TRIM command in libata.  We translate a WRITE SAME 16
      command with the unmap bit set into an ATA TRIM command and export enough
      information in READ CAPACITY 16 and the block limits EVPD page so that the new
      SCSI layer discard support will driver this for us.
      
      Note that I hardcode the WRITE_SAME_16 opcode for now as the patch to introduce
      the symbolic is not in 2.6.32 yet but only in the SCSI tree - as soon as it is
      merged we can fix it up to properly use the symbolic name.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      18f0f978
    • T
      libata: retry failed FLUSH if device didn't fail it · 6013efd8
      Tejun Heo 提交于
      If ATA device failed FLUSH, it means that the device failed to write
      out some amount of data and the error needs to be reported to upper
      layers. As retries can't recover the lost data, FLUSH failures need to
      be reported immediately in general.
      
      However, if FLUSH fails due to transmission errors, the FLUSH needs to
      be retried; otherwise, filesystems may switch to RO mode and/or raid
      array may drop a drive for a random transmission glitch.
      
      This condition can be rather easily reproduced on certain ahci
      controllers which go through a PHY event after powersave mode switch +
      ext4 combination.  Powersave mode switch is often closely followed by
      flush from the filesystem failing the FLUSH with ATA bus error which
      makes the filesystem code believe that data is lost and drop to RO
      mode.  This was reported in the following bugzilla bug.
      
        http://bugzilla.kernel.org/show_bug.cgi?id=14543
      
      This patch makes libata EH retry FLUSH if it wasn't failed by the
      device.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NAndrey Vihrov <andrey.vihrov@gmail.com>
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      6013efd8
    • C
      KVM: s390: Make psw available on all exits, not just a subset · d7b0b5eb
      Carsten Otte 提交于
      This patch moves s390 processor status word into the base kvm_run
      struct and keeps it up-to date on all userspace exits.
      
      The userspace ABI is broken by this, however there are no applications
      in the wild using this.  A capability check is provided so users can
      verify the updated API exists.
      
      Cc: stable@kernel.org
      Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      d7b0b5eb
    • J
      KVM: x86: Add KVM_GET/SET_VCPU_EVENTS · 3cfc3092
      Jan Kiszka 提交于
      This new IOCTL exports all yet user-invisible states related to
      exceptions, interrupts, and NMIs. Together with appropriate user space
      changes, this fixes sporadic problems of vmsave/restore, live migration
      and system reset.
      
      [avi: future-proof abi by adding a flags field]
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      3cfc3092
    • A
      KVM: VMX: Report unexpected simultaneous exceptions as internal errors · 65ac7264
      Avi Kivity 提交于
      These happen when we trap an exception when another exception is being
      delivered; we only expect these with MCEs and page faults.  If something
      unexpected happens, things probably went south and we're better off reporting
      an internal error and freezing.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      65ac7264
    • A
      KVM: Allow internal errors reported to userspace to carry extra data · a9c7399d
      Avi Kivity 提交于
      Usually userspace will freeze the guest so we can inspect it, but some
      internal state is not available.  Add extra data to internal error
      reporting so we can expose it to the debugger.  Extra data is specific
      to the suberror.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      a9c7399d
    • J
      KVM: Reorder IOCTLs in main kvm.h · c54d2aba
      Jan Kiszka 提交于
      Obviously, people tend to extend this header at the bottom - more or
      less blindly. Ensure that deprecated stuff gets its own corner again by
      moving things to the top. Also add some comments and reindent IOCTLs to
      make them more readable and reduce the risk of number collisions.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      c54d2aba
    • G
      KVM: allow userspace to adjust kvmclock offset · afbcf7ab
      Glauber Costa 提交于
      When we migrate a kvm guest that uses pvclock between two hosts, we may
      suffer a large skew. This is because there can be significant differences
      between the monotonic clock of the hosts involved. When a new host with
      a much larger monotonic time starts running the guest, the view of time
      will be significantly impacted.
      
      Situation is much worse when we do the opposite, and migrate to a host with
      a smaller monotonic clock.
      
      This proposed ioctl will allow userspace to inform us what is the monotonic
      clock value in the source host, so we can keep the time skew short, and
      more importantly, never goes backwards. Userspace may also need to trigger
      the current data, since from the first migration onwards, it won't be
      reflected by a simple call to clock_gettime() anymore.
      
      [marcelo: future-proof abi with a flags field]
      [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
      Signed-off-by: NGlauber Costa <glommer@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      afbcf7ab
    • E
      KVM: Xen PV-on-HVM guest support · ffde22ac
      Ed Swierk 提交于
      Support for Xen PV-on-HVM guests can be implemented almost entirely in
      userspace, except for handling one annoying MSR that maps a Xen
      hypercall blob into guest address space.
      
      A generic mechanism to delegate MSR writes to userspace seems overkill
      and risks encouraging similar MSR abuse in the future.  Thus this patch
      adds special support for the Xen HVM MSR.
      
      I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
      KVM which MSR the guest will write to, as well as the starting address
      and size of the hypercall blobs (one each for 32-bit and 64-bit) that
      userspace has loaded from files.  When the guest writes to the MSR, KVM
      copies one page of the blob from userspace to the guest.
      
      I've tested this patch with a hacked-up version of Gerd's userspace
      code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
      FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
      
      [jan: fix i386 build warning]
      [avi: future proof abi with a flags field]
      Signed-off-by: NEd Swierk <eswierk@aristanetworks.com>
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      ffde22ac
    • Z
      KVM: introduce kvm_vcpu_on_spin · d255f4f2
      Zhai, Edwin 提交于
      Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
      once the cpu detects pause-based looping.
      Signed-off-by: N"Zhai, Edwin" <edwin.zhai@intel.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      d255f4f2
    • A
      KVM: Activate Virtualization On Demand · 10474ae8
      Alexander Graf 提交于
      X86 CPUs need to have some magic happening to enable the virtualization
      extensions on them. This magic can result in unpleasant results for
      users, like blocking other VMMs from working (vmx) or using invalid TLB
      entries (svm).
      
      Currently KVM activates virtualization when the respective kernel module
      is loaded. This blocks us from autoloading KVM modules without breaking
      other VMMs.
      
      To circumvent this problem at least a bit, this patch introduces on
      demand activation of virtualization. This means, that instead
      virtualization is enabled on creation of the first virtual machine
      and disabled on destruction of the last one.
      
      So using this, KVM can be easily autoloaded, while keeping other
      hypervisors usable.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      10474ae8
    • A
      KVM: Move assigned device code to own file · bfd99ff5
      Avi Kivity 提交于
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      bfd99ff5
    • G
      KVM: Move irq ack notifier list to arch independent code · 136bdfee
      Gleb Natapov 提交于
      Mask irq notifier list is already there.
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      136bdfee
    • G
      KVM: Maintain back mapping from irqchip/pin to gsi · 3e71f88b
      Gleb Natapov 提交于
      Maintain back mapping from irqchip/pin to gsi to speedup
      interrupt acknowledgment notifications.
      
      [avi: build fix on non-x86/ia64]
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      3e71f88b
    • G
      KVM: Change irq routing table to use gsi indexed array · 46e624b9
      Gleb Natapov 提交于
      Use gsi indexed array instead of scanning all entries on each interrupt
      injection.
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      46e624b9
    • G
      KVM: Move irq sharing information to irqchip level · 1a6e4a8c
      Gleb Natapov 提交于
      This removes assumptions that max GSIs is smaller than number of pins.
      Sharing is tracked on pin level not GSI level.
      
      [avi: no PIC on ia64]
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      1a6e4a8c
    • A
      include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size · 7cff7ce9
      Andrew Morton 提交于
      Maybe 4.1.0 doesn't too, but this fixed it for me.
      
      Caused by:
      
       4a312769: x86: Turn the copy_from_user check into an (optional) compile time warning
       9f0cf4ad: x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      LKML-Reference: <200910090724.n997OQl6013538@imap1.linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7cff7ce9
    • W
      TCPCT part 1d: define TCP cookie option, extend existing struct's · 435cf559
      William Allen Simpson 提交于
      Data structures are carefully composed to require minimal additions.
      For example, the struct tcp_options_received cookie_plus variable fits
      between existing 16-bit and 8-bit variables, requiring no additional
      space (taking alignment into consideration).  There are no additions to
      tcp_request_sock, and only 1 pointer in tcp_sock.
      
      This is a significantly revised implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      The principle difference is using a TCP option to carry the cookie nonce,
      instead of a user configured offset in the data.  This is more flexible and
      less subject to user configuration error.  Such a cookie option has been
      suggested for many years, and is also useful without SYN data, allowing
      several related concepts to use the same extension option.
      
          "Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
          http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html
      
          "Re: what a new TCP header might look like", May 12, 1998.
          ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail
      
      These functions will also be used in subsequent patches that implement
      additional features.
      
      Requires:
         TCPCT part 1a: add request_values parameter for sending SYNACK
         TCPCT part 1b: generate Responder Cookie secret
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      435cf559
    • W
      TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS · 519855c5
      William Allen Simpson 提交于
      Define sysctl (tcp_cookie_size) to turn on and off the cookie option
      default globally, instead of a compiled configuration option.
      
      Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
      data values, retrieving variable cookie values, and other facilities.
      
      Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
      near its corresponding struct tcp_options_received (prior to changes).
      
      This is a straightforward re-implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      These functions will also be used in subsequent patches that implement
      additional features.
      
      Requires:
         net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      519855c5