1. 15 Jan, 2015 2 commits
  2. 12 Jan, 2015 1 commit
  3. 07 Jan, 2015 1 commit
  4. 29 Dec, 2014 1 commit
  5. 23 Dec, 2014 1 commit
  6. 19 Dec, 2014 1 commit
  7. 17 Dec, 2014 3 commits
  8. 14 Dec, 2014 3 commits
    • virtio_pci: add VIRTIO_PCI_NO_LEGACY · 0dce3771
      Authored by Michael S. Tsirkin
      Add macro to disable all legacy register defines.
      Helpful to make sure legacy macros don't leak
      through into modern code.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
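The guard pattern described in this commit can be sketched as follows. This is a minimal illustration, not the contents of the real header: every `SKETCH_` name is hypothetical, and the single legacy define merely stands in for the group of legacy register defines the commit message refers to.

```c
#include <assert.h>

/* Modern-only code defines the switch before pulling in the definitions. */
#define SKETCH_PCI_NO_LEGACY

/* What a header using this pattern might look like (hypothetical names). */
#ifndef SKETCH_PCI_NO_LEGACY
#define SKETCH_PCI_LEGACY_QUEUE_PFN 8   /* stand-in for a legacy register define */
#endif
#define SKETCH_PCI_MODERN_DEFINE 1      /* stand-in for a define that stays visible */

/* Report whether any legacy define leaked into this translation unit. */
int sketch_legacy_defines_visible(void)
{
#ifdef SKETCH_PCI_LEGACY_QUEUE_PFN
    return 1;
#else
    return 0;
#endif
}

/* Modern code can check at run time that nothing leaked through. */
void sketch_modern_probe(void)
{
    assert(!sketch_legacy_defines_visible());
}
```

With the switch defined, any accidental use of the legacy name in modern code fails at compile time as an undeclared identifier, which is the leak detection the commit message describes.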
    • ipc/msg: increase MSGMNI, remove scaling · 0050ee05
      Authored by Manfred Spraul
      SysV can be abused to allocate locked kernel memory.  For most systems, a
      small limit doesn't make sense, see the discussion with regards to SHMMAX.
      
      Therefore: increase MSGMNI to the maximum supported.
      
      And: If we ignore the risk of locking too much memory, then an automatic
      scaling of MSGMNI doesn't make sense.  Therefore the logic can be removed.
      
      The code preserves auto_msgmni to avoid breaking any user space applications
      that expect that the value exists.
      
      Notes:
      1) If an administrator must limit the memory allocations, then he can set
      MSGMNI as necessary.
      
      Or he can disable sysv entirely (as e.g. done by Android).
      
      2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
      to control latency vs. throughput:
      If MSGMNB is large, then msgsnd() just returns and more messages can be queued
      before a task switch to a task that calls msgrcv() is forced.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM · e843e7d2
      Authored by Manfred Spraul
      a)
      
      SysV can be abused to allocate locked kernel memory.  For most systems, a
      small limit doesn't make sense, see the discussion with regards to SHMMAX.
      
      Therefore: Increase the sysv sem limits so that all known applications
      will work with these defaults.
      
      b)
      
      With regards to the maximum supported:
      Some of the specified hard limits are not correct anymore, therefore the
      patch updates the documentation.
      
      - SEMMNI must stay below IPCMNI, which is 32768.
        As for SHMMAX: Stay a bit below this limit.
      
      - SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array
        was limited to 16 kB (order=2)
      
        This doesn't apply anymore:
         - the allocation size isn't sizeof(short)*nsems anymore.
         - ipc_alloc falls back to vmalloc
      
      - SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an
        order=1 allocation.
        Therefore: Leave it at 500 (order=0 allocation).
      
      Note:
      If an administrator must limit the memory allocations, then he can set the
      values as necessary.
      
      Or he can disable sysv entirely (as e.g. done by Android).
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
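The allocation-order arithmetic behind the SEMOPM and SEMMSL limits in this commit can be checked with a small helper. This is a sketch under stated assumptions: 4 KiB pages, a 6-byte `struct sembuf` (three 16-bit fields), and a plain power-of-two page model rather than the kernel's actual allocator. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= bytes.
 * Order 0 is one page, order 1 is two pages, order 2 is four pages.
 */
unsigned sketch_alloc_order(size_t bytes, size_t page_size)
{
    assert(page_size > 0);

    unsigned order = 0;
    while ((page_size << order) < bytes)
        order++;
    return order;
}
```

Under those assumptions, 500 operations need 500 * 6 = 3000 bytes (order 0), 1000 operations need 6000 bytes (order 1), and the old SEMMSL reasoning of 8k entries * sizeof(short) = 16 kB lands on order 2, matching the figures quoted in the commit message.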
  9. 12 Dec, 2014 1 commit
  10. 11 Dec, 2014 2 commits
    • take the targets of /proc/*/ns/* symlinks to separate fs · e149ed2b
      Authored by Al Viro
      New pseudo-filesystem: nsfs.  Targets of /proc/*/ns/* live there now.
      It's not mountable (not even registered, so it's not in /proc/filesystems,
      etc.).  Files on it *are* bindable - we explicitly permit that in do_loopback().
      
      This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
      get_proc_ns() is a macro now (it's simply returning ->i_private; would
      have been an inline, if not for header ordering headache).
      proc_ns_inode() is an ex-parrot.  The interface used in procfs is
      ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).
      
      Dentries and inodes are never hashed; a non-counting reference to dentry
      is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
      if present.  See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
      of that mechanism.
      
      As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
      it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
      from ns_get_path().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
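From user space, the targets of the /proc/*/ns/* symlinks are usually inspected with readlink(). The sketch below assumes the link text has the commonly seen form `type:[inode]`, for example `net:[4026531956]`; that format is an assumption here, not something this commit message spells out. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the target of /proc/self/ns/<ns_name> into buf. Returns 0 on success. */
int sketch_read_ns_link(const char *ns_name, char *buf, size_t size)
{
    char path[64];

    assert(ns_name != NULL && buf != NULL && size > 1);
    int len = snprintf(path, sizeof(path), "/proc/self/ns/%s", ns_name);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;

    ssize_t n = readlink(path, buf, size - 1);   /* readlink does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Split "net:[4026531956]" into a type name and an inode number. Returns 0 on success. */
int sketch_parse_ns_link(const char *target, char type[16], unsigned long *inum)
{
    char closing = 0;

    assert(target != NULL && type != NULL && inum != NULL);
    if (sscanf(target, "%15[^:]:[%lu%c", type, inum, &closing) != 3 || closing != ']')
        return -1;
    return 0;
}
```

Two processes share a namespace when the parsed type and inode number match, which is a common way to compare namespaces without any special privileges.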
    • kernel: add panic_on_warn · 9e3961a0
      Authored by Prarit Bhargava
      There have been several times where I have had to rebuild a kernel to
      cause a panic when hitting a WARN() in the code in order to get a crash
      dump from a system.  Sometimes this is easy to do, other times (such as
      in the case of a remote admin) it is not trivial to send new images to
      the user.
      
      A much easier method would be a switch to change the WARN() over to a
      panic.  This makes debugging easier in that I can now test the actual
      image the WARN() was seen on and I do not have to engage in remote
      debugging.
      
      This patch adds a panic_on_warn kernel parameter and a
      /proc/sys/kernel/panic_on_warn sysctl; when set, panic() is called in
      the warn_slowpath_common() path.  The function will still print out the
      location of the warning.
      
      An example of the panic_on_warn output:
      
      The first line below is from the WARN_ON() to output the WARN_ON()'s
      location.  After that the panic() output is displayed.
      
          WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
          Kernel panic - not syncing: panic_on_warn set ...
      
          CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ #57
          Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
           0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
           0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
           ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
          Call Trace:
           [<ffffffff81665190>] dump_stack+0x46/0x58
           [<ffffffff8165e2ec>] panic+0xd0/0x204
           [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
           [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
           [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
           [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
           [<ffffffff81002144>] do_one_initcall+0xd4/0x210
           [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
           [<ffffffff810f8889>] load_module+0x16a9/0x1b30
           [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
           [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
           [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
           [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17
      
      Successfully tested by me.
      
      hpa said: There is another very valid use for this: many operators would
      rather a machine shuts down than being potentially compromised either
      functionally or security-wise.
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
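The control flow this commit describes (print the warning location first, then panic if the switch is set) can be modelled in user space. This is only a model: the names are hypothetical, the "panic" just counts calls and prints a line, and nothing here is the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

bool sketch_panic_on_warn;   /* stand-in for the boot parameter / sysctl value */
int sketch_panic_calls;      /* how often the modelled panic path was taken */

/* Modelled panic: a real kernel would stop here; the model only records it. */
void sketch_panic(const char *msg)
{
    sketch_panic_calls++;
    fprintf(stderr, "Kernel panic - not syncing: %s\n", msg);
}

/* Modelled warning path: the location is always printed before any panic. */
void sketch_warn(const char *file, int line)
{
    assert(file != NULL);

    fprintf(stderr, "WARNING: at %s:%d\n", file, line);
    if (sketch_panic_on_warn)
        sketch_panic("panic_on_warn set ...");
}
```

The ordering matters for the use case in the commit message: because the warning location is emitted before the panic, the crash dump still identifies which WARN() fired.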
  11. 10 Dec, 2014 4 commits
  12. 09 Dec, 2014 14 commits
    • virtio: make VIRTIO_F_VERSION_1 a transport bit · 747ae34a
      Authored by Michael S. Tsirkin
      Activate VIRTIO_F_VERSION_1 automatically unless legacy_only
      is set.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio_console: virtio 1.0 support · 1f0f9106
      Authored by Michael S. Tsirkin
      Pretty straight-forward, just use accessors for all fields.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio_scsi: export to userspace · fba7f020
      Authored by Michael S. Tsirkin
      Replace uXX by __uXX and _packed by __attribute((packed))
      as seems to be the norm for userspace headers.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    • virtio_scsi: move to uapi · 106d81f5
      Authored by Michael S. Tsirkin
      Guests need to use the virtio scsi API, so export it to uapi.
      This is convenient for e.g. qemu and will help us remember that
      this file affects the ABI.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    • tun: add VNET_LE flag · e999d6ea
      Authored by Michael S. Tsirkin
      virtio 1.0 modified virtio net header format,
      making all fields little endian.
      
      Users can tweak the header format before submitting it to tun,
      but this means more data copies where none were necessary.
      And if the iovec is in RO memory, we might need to split the
      iovec, which in turn means we might in theory overflow the
      iovec max size.
      
      This patch adds a simpler way for applications to handle this,
      using new "little endian" flag in tun.
      As a result, tun simply byte-swaps header fields as appropriate.
      This is a NOP on LE architectures.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • tun: move internal flag defines out of uapi · 031f5e03
      Authored by Michael S. Tsirkin
      TUN_ flags are internal and never exposed
      to userspace. Any application using it is almost
      certainly buggy.
      
      Move them out to tun.c.
      
      Note: we remove these completely in follow-up patches,
      this code movement is split out for ease of review.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio_blk: v1.0 support · 19c1c5a6
      Authored by Michael S. Tsirkin
      Based on patch by Cornelia Huck.
      
      Note: for consistency, and to avoid sparse errors,
            convert all fields, even those no longer in use
            for virtio v1.0.
      Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio_net: v1.0 endianness · fdd819b2
      Authored by Michael S. Tsirkin
      Based on patches by Rusty Russell, Cornelia Huck.
      Note: more code changes are needed for 1.0 support
      (due to different header size).
      So we don't advertise support for 1.0 yet.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio: set FEATURES_OK · cb3f6d9d
      Authored by Michael S. Tsirkin
      set FEATURES_OK as per virtio 1.0 spec
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
    • virtio: memory access APIs · eef960a0
      Authored by Michael S. Tsirkin
      virtio 1.0 makes all memory structures LE, so
      we need APIs to conditionally do a byteswap on BE
      architectures.
      
      To make it easier to check code statically,
      add virtio specific types for multi-byte integers
      in memory.
      
      Add low level wrappers that do a byteswap conditionally, these will be
      useful e.g. for vhost.  Add high level wrappers that
      query device endian-ness and act accordingly.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
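The two layers of wrappers described in this commit can be sketched in portable user-space C. This is an illustration of the idea rather than the kernel API: all names are hypothetical, the low-level wrapper takes an explicit "little endian" flag, and the high-level wrapper derives that flag from a device structure. It assumes legacy devices use the host's native byte order, so only virtio 1.0 data on a big-endian host is swapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Detect host byte order at run time by looking at the first byte of 1. */
bool sketch_host_is_le(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

uint16_t sketch_swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Low-level wrapper: the caller states whether the value is little endian. */
uint16_t sketch_virtio16_to_cpu(bool little_endian, uint16_t val)
{
    if (little_endian && !sketch_host_is_le())
        return sketch_swab16(val);
    return val;   /* native layout, or LE data on an LE host: no swap */
}

/* Hypothetical device handle; version_1 models the negotiated 1.0 feature. */
struct sketch_device {
    bool version_1;
};

/* High-level wrapper: query the device's endianness and act accordingly. */
uint16_t sketch_dev16_to_cpu(const struct sketch_device *dev, uint16_t val)
{
    assert(dev != NULL);
    return sketch_virtio16_to_cpu(dev->version_1, val);
}
```

Keeping the flag explicit in the low-level wrapper is what makes it usable from code that has no device object at hand, which is the vhost case the commit message mentions.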
    • virtio: add virtio 1.0 feature bit · 4ec22fae
      Authored by Michael S. Tsirkin
      Based on original patches by Rusty Russell, Thomas Huth
      and Cornelia Huck.
      
      Note: at this time, we do not negotiate this feature bit
      in core, drivers have to declare VERSION_1 support explicitly.
      
      For this reason we treat this bit as a device bit
      and not as a transport bit for now.
      
      After all drivers are converted, we will be able to
      move VERSION_1 to core and drop it from all drivers.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
    • thermal: provide an UAPI header file · af6c9f16
      Authored by Florian Fainelli
      include/linux/thermal.h contains definitions for the Thermal generic
      netlink family, but none of the valuable information relevant to
      user-space such as the Genl family name, multicast group, version or
      command set and data types is exported to user-space.
      
      Export all the relevant generic netlink information to user-space to
      make this genl family usable by user-space, and while at it, export
      THERMAL_NAME_LENGTH since it limits name length for thermal_hwmon
      devices.
      
      Kbuild and MAINTAINERS are also updated accordingly to reflect this new
      file: include/uapi/linux/thermal.h.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    • ethtool: Support for configurable RSS hash function · 892311f6
      Authored by Eyal Perry
      This patch extends the set/get_rxfh ethtool-options for getting or
      setting the RSS hash function.
      
      It modifies drivers implementation of set/get_rxfh accordingly.
      
      This change also delegates the responsibility of checking whether a
      modification to a certain RX flow hash parameter is supported to the
      driver implementation of set_rxfh.
      
      The user-kernel API is the new hfunc bitmask field in the
      ethtool_rxfh struct. A bit set in the hfunc field corresponds to an
      index in the new string-set ETH_SS_RSS_HASH_FUNCS.
      
      Most of the relevant driver maintainers confirmed that their driver
      uses Toeplitz; for the few that didn't answer, Toeplitz was assumed
      as well.
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Ariel Elior <ariel.elior@qlogic.com>
      Cc: Prashant Sreedharan <prashant@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Sathya Perla <sathya.perla@emulex.com>
      Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
      Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Cc: Bruce Allan <bruce.w.allan@intel.com>
      Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Cc: Don Skidmore <donald.c.skidmore@intel.com>
      Cc: Greg Rose <gregory.v.rose@intel.com>
      Cc: Matthew Vick <matthew.vick@intel.com>
      Cc: John Ronciak <john.ronciak@intel.com>
      Cc: Mitch Williams <mitch.a.williams@intel.com>
      Cc: Amir Vadai <amirv@mellanox.com>
      Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
      Cc: Shradha Shah <sshah@solarflare.com>
      Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
      Cc: "VMware, Inc." <pv-drivers@vmware.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
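The mapping between a bit in hfunc and an index in a string set can be sketched as below. The two-entry name table is an illustrative stand-in for ETH_SS_RSS_HASH_FUNCS, not a copy of it, and the function names are hypothetical. The only assumption taken from the commit message is that bit i of the mask refers to entry i of the string set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the string set; the real entries may differ. */
static const char *const sketch_rss_hash_funcs[] = { "toeplitz", "xor" };
#define SKETCH_FUNC_COUNT \
    ((int)(sizeof(sketch_rss_hash_funcs) / sizeof(sketch_rss_hash_funcs[0])))

/* Index of the lowest set bit that names a known function, or -1 if none. */
int sketch_hfunc_index(uint8_t hfunc)
{
    assert(SKETCH_FUNC_COUNT <= 8);   /* the mask modelled here has 8 bits */

    for (int i = 0; i < SKETCH_FUNC_COUNT; i++)
        if (hfunc & (1u << i))
            return i;
    return -1;
}

/* Name for the selected hash function, or NULL if no known bit is set. */
const char *sketch_hfunc_name(uint8_t hfunc)
{
    int i = sketch_hfunc_index(hfunc);

    return i < 0 ? NULL : sketch_rss_hash_funcs[i];
}
```

Encoding the choice as a bitmask rather than an enum lets the same field express which functions a device supports when read back, while a set request would normally carry a single bit.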
    • net-timestamp: allow reading recv cmsg on errqueue with origin tstamp · 829ae9d6
      Authored by Willem de Bruijn
      Allow reading of timestamps and cmsg at the same time on all relevant
      socket families. One use is to correlate timestamps with egress
      device, by asking for cmsg IP_PKTINFO.
      
      On AF_INET sockets, call the relevant function (ip_cmsg_recv). To
      avoid changing legacy expectations, only do so if the caller sets a
      new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.
      
      On AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
      returned for all origins. The only change is to set ifindex, which is
      not initialized for all error origins.
      
      In both cases, only generate the pktinfo message if an ifindex is
      known. This is not the case for ACK timestamps.
      
      The difference between the protocol families is probably a historical
      accident as a result of the different conditions for generating cmsg
      in the relevant ip(v6)_recv_error function:
      
      ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
      ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {
      
      At one time, this was the same test bar for the ICMP/ICMP6
      distinction. This is no longer true.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      
      ----
      
      Changes
        v1 -> v2
          large rewrite
          - integrate with existing pktinfo cmsg generation code
          - on ipv4: only send with new flag, to maintain legacy behavior
          - on ipv6: send at most a single pktinfo cmsg
          - on ipv6: initialize fields if not yet initialized
      
      The recv cmsg interfaces are also relevant to the discussion of
      whether looping packet headers is problematic. For v6, cmsgs that
      identify many headers are already returned. This patch expands
      that to v4. If it sounds reasonable, I will follow with patches
      
      1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
         (http://patchwork.ozlabs.org/patch/366967/)
      2. sysctl to conditionally drop all timestamps that have payload or
         cmsg from users without CAP_NET_RAW.
      Signed-off-by: David S. Miller <davem@davemloft.net>
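The two ee_origin tests quoted in this commit message behave differently for timestamp notifications, which can be seen by evaluating them side by side. The enum below is a local mirror of the SO_EE_ORIGIN_* values as I understand them from `<linux/errqueue.h>` (treat the numbers as an assumption), and the predicate names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the error-queue origin codes (assumed values). */
enum sketch_origin {
    SKETCH_ORIGIN_NONE = 0,
    SKETCH_ORIGIN_LOCAL = 1,
    SKETCH_ORIGIN_ICMP = 2,
    SKETCH_ORIGIN_ICMP6 = 3,
    SKETCH_ORIGIN_TIMESTAMPING = 4
};

/* The ipv4 condition from the commit message: only ICMP errors qualify. */
bool sketch_v4_legacy_cmsg(int origin)
{
    assert(origin >= SKETCH_ORIGIN_NONE && origin <= SKETCH_ORIGIN_TIMESTAMPING);
    return origin == SKETCH_ORIGIN_ICMP;
}

/* The ipv6 condition from the commit message: everything but local errors. */
bool sketch_v6_cmsg(int origin)
{
    assert(origin >= SKETCH_ORIGIN_NONE && origin <= SKETCH_ORIGIN_TIMESTAMPING);
    return origin != SKETCH_ORIGIN_LOCAL;
}
```

For a timestamping origin the ipv4 test is false and the ipv6 test is true, which is the asymmetry the commit addresses by adding an opt-in flag on the ipv4 side.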
  13. 08 Dec, 2014 2 commits
  14. 06 Dec, 2014 2 commits
  15. 05 Dec, 2014 2 commits