1. 24 1月, 2013 1 次提交
    • A
      fuse: implement NFS-like readdirplus support · 0b05b183
      Anand V. Avati 提交于
      This patch implements readdirplus support in FUSE, similar to NFS.
      The payload returned in the readdirplus call contains
      'fuse_entry_out' structure thereby providing all the necessary inputs
      for 'faking' a lookup() operation on the spot.
      
      If the dentry and inode already existed (for e.g. in a re-run of ls -l)
      then just the inode attributes timeout and dentry timeout are refreshed.
      
      With a simple client->network->server implementation of a FUSE based
      filesystem, the following performance observations were made:
      
      Test: Performing a filesystem crawl over 20,000 files with
      
      sh# time ls -lR /mnt
      
      Without readdirplus:
      Run 1: 18.1s
      Run 2: 16.0s
      Run 3: 16.2s
      
      With readdirplus:
      Run 1: 4.1s
      Run 2: 3.8s
      Run 3: 3.8s
      
      The performance improvement is significant as it avoided 20,000 upcalls
      calls (lookup). Cache consistency is no worse than what already is.
      Signed-off-by: NAnand V. Avati <avati@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      0b05b183
  2. 18 1月, 2013 1 次提交
  3. 12 1月, 2013 2 次提交
  4. 05 1月, 2013 1 次提交
    • S
      ipc: introduce message queue copy feature · 4a674f34
      Stanislav Kinsbursky 提交于
      This patch is required for checkpoint/restore in userspace.
      
      c/r requires some way to get all pending IPC messages without deleting
      them from the queue (checkpoint can fail and in this case tasks will be
      resumed, so queue have to be valid).
      
      To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
      was introduced.  If this flag was specified, then mtype is interpreted as
      number of the message to copy.
      
      If MSG_COPY is set, then kernel will allocate dummy message with passed
      size, and then use new copy_msg() helper function to copy desired message
      (instead of unlinking it from the queue).
      
      Notes:
      
      1) Return -ENOSYS if MSG_COPY is specified, but
         CONFIG_CHECKPOINT_RESTORE is not set.
      Signed-off-by: NStanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a674f34
  5. 27 12月, 2012 1 次提交
    • B
      PCI: Add PCIe Link Capability link speed and width names · 130f1b8f
      Bjorn Helgaas 提交于
      Add standard #defines for the Supported Link Speeds field in the PCIe
      Link Capabilities register.
      
      Note that prior to PCIe spec r3.0, these encodings were defined:
      
          0001b  2.5GT/s Link speed supported
          0010b  5.0GT/s and 2.5GT/s Link speed supported
      
      Starting with spec r3.0, these encodings refer to bits 0 and 1 in the
      Supported Link Speeds Vector in the Link Capabilities 2 register, and bits
      0 and 1 there mean 2.5 GT/s and 5.0 GT/s, respectively.  Therefore, code
      that followed r2.0 and interpreted 0x1 as 2.5GT/s and 0x2 as 5.0GT/s will
      continue to work, and we can identify a device using the new encodings
      because it will have a non-zero Link Capabilities 2 register.
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      130f1b8f
  6. 22 12月, 2012 1 次提交
  7. 20 12月, 2012 1 次提交
  8. 18 12月, 2012 2 次提交
  9. 16 12月, 2012 1 次提交
  10. 14 12月, 2012 2 次提交
  11. 13 12月, 2012 4 次提交
  12. 11 12月, 2012 7 次提交
    • M
      mm: numa: Migrate on reference policy · 5606e387
      Mel Gorman 提交于
      This is the simplest possible policy that still does something of note.
      When a pte_numa is faulted, it is moved immediately. Any replacement
      policy must at least do better than this and in all likelihood this
      policy regresses normal workloads.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      5606e387
    • M
      mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now · a720094d
      Mel Gorman 提交于
      The use of MPOL_NOOP and MPOL_MF_LAZY to allow an application to
      explicitly request lazy migration is a good idea but the actual
      API has not been well reviewed and once released we have to support it.
      For now this patch prevents an application using the services. This
      will need to be revisited.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      a720094d
    • L
      mm: mempolicy: Add MPOL_MF_LAZY · b24f53a0
      Lee Schermerhorn 提交于
      NOTE: Once again there is a lot of patch stealing and the end result
      	is sufficiently different that I had to drop the signed-offs.
      	Will re-add if the original authors are ok with that.
      
      This patch adds another mbind() flag to request "lazy migration".  The
      flag, MPOL_MF_LAZY, modifies MPOL_MF_MOVE* such that the selected
      pages are marked PROT_NONE. The pages will be migrated in the fault
      path on "first touch", if the policy dictates at that time.
      
      "Lazy Migration" will allow testing of migrate-on-fault via mbind().
      Also allows applications to specify that only subsequently touched
      pages be migrated to obey new policy, instead of all pages in range.
      This can be useful for multi-threaded applications working on a
      large shared data area that is initialized by an initial thread
      resulting in all pages on one [or a few, if overflowed] nodes.
      After PROT_NONE, the pages in regions assigned to the worker threads
      will be automatically migrated local to the threads on 1st touch.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      b24f53a0
    • L
      mm: mempolicy: Check for misplaced page · 771fb4d8
      Lee Schermerhorn 提交于
      This patch provides a new function to test whether a page resides
      on a node that is appropriate for the mempolicy for the vma and
      address where the page is supposed to be mapped.  This involves
      looking up the node where the page belongs.  So, the function
      returns that node so that it may be used to allocated the page
      without consulting the policy again.
      
      A subsequent patch will call this function from the fault path.
      Because of this, I don't want to go ahead and allocate the page, e.g.,
      via alloc_page_vma() only to have to free it if it has the correct
      policy.  So, I just mimic the alloc_page_vma() node computation
      logic--sort of.
      
      Note:  we could use this function to implement a MPOL_MF_STRICT
      behavior when migrating pages to match mbind() mempolicy--e.g.,
      to ensure that pages in an interleaved range are reinterleaved
      rather than left where they are when they reside on any page in
      the interleave nodemask.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      [ Added MPOL_F_LAZY to trigger migrate-on-fault;
        simplified code now that we don't have to bother
        with special crap for interleaved ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      771fb4d8
    • L
      mm: mempolicy: Add MPOL_NOOP · d3a71033
      Lee Schermerhorn 提交于
      This patch augments the MPOL_MF_LAZY feature by adding a "NOOP" policy
      to mbind().  When the NOOP policy is used with the 'MOVE and 'LAZY
      flags, mbind() will map the pages PROT_NONE so that they will be
      migrated on the next touch.
      
      This allows an application to prepare for a new phase of operation
      where different regions of shared storage will be assigned to
      worker threads, w/o changing policy.  Note that we could just use
      "default" policy in this case.  However, this also allows an
      application to request that pages be migrated, only if necessary,
      to follow any arbitrary policy that might currently apply to a
      range of pages, without knowing the policy, or without specifying
      multiple mbind()s for ranges with different policies.
      
      [ Bug in early version of mpol_parse_str() reported by Fengguang Wu. ]
      Bug-Reported-by: NReported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      d3a71033
    • P
      mm: mempolicy: Make MPOL_LOCAL a real policy · 479e2802
      Peter Zijlstra 提交于
      Make MPOL_LOCAL a real and exposed policy such that applications that
      relied on the previous default behaviour can explicitly request it.
      Requested-by: NChristoph Lameter <cl@linux.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      479e2802
    • J
      f2fs: add superblock and major in-memory structure · 39a53e0c
      Jaegeuk Kim 提交于
      This adds the following major in-memory structures in f2fs.
      
      - f2fs_sb_info:
        contains f2fs-specific information, two special inode pointers for node and
        meta address spaces, and orphan inode management.
      
      - f2fs_inode_info:
        contains vfs_inode and other fs-specific information.
      
      - f2fs_nm_info:
        contains node manager information such as NAT entry cache, free nid list,
        and NAT page management.
      
      - f2fs_node_info:
        represents a node as node id, inode number, block address, and its version.
      
      - f2fs_sm_info:
        contains segment manager information such as SIT entry cache, free segment
        map, current active logs, dirty segment management, and segment utilization.
        The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
        curseg_info.
      
      In addition, add F2FS_SUPER_MAGIC in magic.h.
      Signed-off-by: NChul Lee <chur.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      39a53e0c
  13. 09 12月, 2012 1 次提交
    • J
      virtio_net: multiqueue support · 986a4f4d
      Jason Wang 提交于
      This patch adds the multiqueue (VIRTIO_NET_F_MQ) support to virtio_net
      driver. VIRTIO_NET_F_MQ capable device could allow the driver to do packet
      transmission and reception through multiple queue pairs and does the packet
      steering to get better performance. By default, one one queue pair is used, user
      could change the number of queue pairs by ethtool in the next patch.
      
      When multiple queue pairs is used and the number of queue pairs is equal to the
      number of vcpus. Driver does the following optimizations to implement per-cpu
      virt queue pairs:
      
      - select the txq based on the smp processor id.
      - smp affinity hint to the cpu that owns the queue pairs.
      
      This could be used with the flow steering support of the device to guarantee the
      packets of a single flow is handled by the same cpu.
      Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      986a4f4d
  14. 08 12月, 2012 2 次提交
    • C
      bridge: export multicast database via netlink · ee07c6e7
      Cong Wang 提交于
      V5: fix two bugs pointed out by Thomas
          remove seq check for now, mark it as TODO
      
      V4: remove some useless #include
          some coding style fix
      
      V3: drop debugging printk's
          update selinux perm table as well
      
      V2: drop patch 1/2, export ifindex directly
          Redesign netlink attributes
          Improve netlink seq check
          Handle IPv6 addr as well
      
      This patch exports bridge multicast database via netlink
      message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
      We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).
      
      (Thanks to Thomas for patient reviews)
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee07c6e7
    • B
      PCI: Add standard PCIe Capability Link ASPM field names · 75083206
      Bjorn Helgaas 提交于
      Add standard #defines for ASPM fields in PCI Express Link Capability and
      Link Control registers.
      
      Previously we used PCIE_LINK_STATE_L0S and PCIE_LINK_STATE_L1 directly, but
      these are defined for the Linux ASPM interfaces, e.g.,
      pci_disable_link_state(), and only coincidentally match the actual register
      bits.  PCIE_LINK_STATE_CLKPM, also part of that interface, does not match
      the register bit.
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NKenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      Acked-by: NKenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      75083206
  15. 06 12月, 2012 4 次提交
    • D
      byteorder: allow arch to opt to use GCC intrinsics for byteswapping · cf66bb93
      David Woodhouse 提交于
      Since GCC 4.4, there have been __builtin_bswap32() and __builtin_bswap16()
      intrinsics. A __builtin_bswap16() came a little later (4.6 for PowerPC,
      48 for other platforms).
      
      By using these instead of the inline assembler that most architectures
      have in their __arch_swabXX() macros, we let the compiler see what's
      actually happening. The resulting code should be at least as good, and
      much *better* in the cases where it can be combined with a nearby load
      or store, using a load-and-byteswap or store-and-byteswap instruction
      (e.g. lwbrx/stwbrx on PowerPC, movbe on Atom).
      
      When GCC is sufficiently recent *and* the architecture opts in to using
      the intrinsics by setting CONFIG_ARCH_USE_BUILTIN_BSWAP, they will be
      used in preference to the __arch_swabXX() macros. An architecture which
      does not set ARCH_USE_BUILTIN_BSWAP will continue to use its own
      hand-crafted macros.
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      cf66bb93
    • P
      KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT · a2932923
      Paul Mackerras 提交于
      A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
      this fd return the contents of the HPT (hashed page table), writes
      create and/or remove entries in the HPT.  There is a new capability,
      KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
      takes an argument structure with the index of the first HPT entry to
      read out and a set of flags.  The flags indicate whether the user is
      intending to read or write the HPT, and whether to return all entries
      or only the "bolted" entries (those with the bolted bit, 0x10, set in
      the first doubleword).
      
      This is intended for use in implementing qemu's savevm/loadvm and for
      live migration.  Therefore, on reads, the first pass returns information
      about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
      end of the HPT, it returns from the read.  Subsequent reads only return
      information about HPTEs that have changed since they were last read.
      A read that finds no changed HPTEs in the HPT following where the last
      read finished will return 0 bytes.
      
      The format of the data provides a simple run-length compression of the
      invalid entries.  Each block of data starts with a header that indicates
      the index (position in the HPT, which is just an array), the number of
      valid entries starting at that index (may be zero), and the number of
      invalid entries following those valid entries.  The valid entries, 16
      bytes each, follow the header.  The invalid entries are not explicitly
      represented.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a2932923
    • D
      bridge: implement multicast fast leave · c2d3babf
      David S. Miller 提交于
      V3: make it a flag
      V2: make the toggle per-port
      
      Fast leave allows bridge to immediately stops the multicast
      traffic on the port receives IGMP Leave when IGMP snooping is enabled,
      no timeouts are observed.
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      c2d3babf
    • B
      PCI: Add and use standard PCI-X Capability register names · 7793eeab
      Bjorn Helgaas 提交于
      Add and use #defines for PCI-X Capability registers and fields.
      Note that the PCI-X Capability has a different layout for
      type 0 (endpoint) and type 1 (bridge) devices.
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      7793eeab
  16. 05 12月, 2012 3 次提交
  17. 04 12月, 2012 1 次提交
    • M
      tun: only queue packets on device · 5d097109
      Michael S. Tsirkin 提交于
      Historically tun supported two modes of operation:
      - in default mode, a small number of packets would get queued
        at the device, the rest would be queued in qdisc
      - in one queue mode, all packets would get queued at the device
      
      This might have made sense up to a point where we made the
      queue depth for both modes the same and set it to
      a huge value (500) so unless the consumer
      is stuck the chance of losing packets is small.
      
      Thus in practice both modes behave the same, but the
      default mode has some problems:
      - if packets are never consumed, fragments are never orphaned
        which cases a DOS for sender using zero copy transmit
      - overrun errors are hard to diagnose: fifo error is incremented
        only once so you can not distinguish between
        userspace that is stuck and a transient failure,
        tcpdump on the device does not show any traffic
      
      Userspace solves this simply by enabling IFF_ONE_QUEUE
      but there seems to be little point in not doing the
      right thing for everyone, by default.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d097109
  18. 03 12月, 2012 1 次提交
    • P
      netfilter: ctnetlink: dump entries from the dying and unconfirmed lists · d871befe
      Pablo Neira Ayuso 提交于
      This patch adds a new operation to dump the content of the dying and
      unconfirmed lists.
      
      Under some situations, the global conntrack counter can be inconsistent
      with the number of entries that we can dump from the conntrack table.
      The way to resolve this is to allow dumping the content of the unconfirmed
      and dying lists, so far it was not possible to look at its content.
      
      This provides some extra instrumentation to resolve problematic situations
      in which anyone suspects memory leaks.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d871befe
  19. 27 11月, 2012 1 次提交
  20. 26 11月, 2012 3 次提交
    • J
      nl80211/cfg80211: add VHT MCS support · db9c64cf
      Johannes Berg 提交于
      Add support for reporting and calculating VHT MCSes.
      
      Note that I'm not completely sure that the bitrate
      calculations are correct, nor that they can't be
      simplified.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      db9c64cf
    • J
      nl80211/cfg80211: support VHT channel configuration · 3d9d1d66
      Johannes Berg 提交于
      Change nl80211 to support specifying a VHT (or HT)
      using the control channel frequency (as before) and
      new attributes for the channel width and first and
      second center frequency. The old channel type is of
      course still supported for HT.
      
      Also change the cfg80211 channel definition struct
      to support these by adding the relevant fields to
      it (and removing the _type field.)
      
      This also adds new helper functions:
       - cfg80211_chandef_create to create a channel def
         struct given the control channel and channel type,
       - cfg80211_chandef_identical to check if two channel
         definitions are identical
       - cfg80211_chandef_compatible to check if the given
         channel definitions are compatible, and return the
         wider of the two
      
      This isn't entirely complete, but that doesn't matter
      until we have a driver using it. In particular, it's
      missing
       - regulatory checks on the usable bandwidth (if that
         even makes sense)
       - regulatory TX power (database can't deal with it)
       - a proper channel compatibility calculation for the
         new channel types
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      3d9d1d66
    • J
      nl80211: add documentation for channel type · fe4b3181
      Johannes Berg 提交于
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      fe4b3181