1. 12 3月, 2013 8 次提交
  2. 09 3月, 2013 1 次提交
  3. 08 3月, 2013 1 次提交
  4. 04 3月, 2013 4 次提交
    • R
      ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type · 92414481
      Rafael J. Wysocki 提交于
      After PCI and USB have stopped using the .find_bridge() callback in
      struct acpi_bus_type, the only remaining user of it is SATA, but SATA
      only pretends to be a user, because it points that callback to a stub
      always returning -ENODEV.
      
      For this reason, drop the SATA's dummy .find_bridge() callback and
      remove .find_bridge(), which is not used any more, from struct
      acpi_bus_type entirely.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NJeff Garzik <jgarzik@pobox.com>
      92414481
    • R
      ACPI / glue: Add .match() callback to struct acpi_bus_type · 53540098
      Rafael J. Wysocki 提交于
      USB uses the .find_bridge() callback from struct acpi_bus_type
      incorrectly, because as a result of the way it is used by USB every
      device in the system that doesn't have a bus type or parent is
      passed to usb_acpi_find_device() for inspection.
      
      What USB actually needs, though, is to call usb_acpi_find_device()
      for USB ports that don't have a bus type defined, but have
      usb_port_device_type as their device type, as well as for USB
      devices.
      
      To fix that replace the struct bus_type pointer in struct
      acpi_bus_type used for matching devices to specific subsystems
      with a .match() callback to be used for this purpose and update
      the users of struct acpi_bus_type, including USB, accordingly.
      Define the .match() callback routine for USB, usb_acpi_bus_match(),
      in such a way that it will cover both USB devices and USB ports
      and remove the now redundant .find_bridge() callback pointer from
      usb_acpi_bus.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NJeff Garzik <jgarzik@pobox.com>
      53540098
    • K
      eCryptfs: allow userspace messaging to be disabled · 290502be
      Kees Cook 提交于
      When the userspace messaging (for the less common case of userspace key
      wrap/unwrap via ecryptfsd) is not needed, allow eCryptfs to build with
      it removed. This saves on kernel code size and reduces potential attack
      surface by removing the /dev/ecryptfs node.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      290502be
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman 提交于
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  5. 03 3月, 2013 7 次提交
    • J
      mm: define VM_GROWSUP for CONFIG_METAG · 9ca52ed9
      James Hogan 提交于
      Commit cc2383ec ("mm: introduce
      arch-specific vma flag VM_ARCH_1") merged in v3.7-rc1.
      
      The above commit combined several arch-specific vma flags into one, and
      in the process it changed the VM_GROWSUP definition to depend on
      specific architectures rather than CONFIG_STACK_GROWSUP. Therefore add
      an ifdef for CONFIG_METAG to also set VM_GROWSUP.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-mm@kvack.org
      9ca52ed9
    • J
      metag: Internal and external irqchips · 5698c50d
      James Hogan 提交于
      Meta core internal interrupts (from HWSTATMETA and friends) are vectored
      onto the TR1 core trigger for the current thread. This is demultiplexed
      in irq-metag.c to individual Linux IRQs for each internal interrupt.
      
      External SoC interrupts (from HWSTATEXT and friends) are vectored onto
      the TR2 core trigger for the current thread. This is demultiplexed in
      irq-metag-ext.c to individual Linux IRQs for each external SoC interrupt.
      The external irqchip has devicetree bindings for configuring the number
      of irq banks and the type of masking available.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Dom Cobley <popcornmix@gmail.com>
      Cc: Simon Arlott <simon@fire.lp0.eu>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      5698c50d
    • J
      metag: Time keeping · a2c5d4ed
      James Hogan 提交于
      Add time keeping code for metag. Meta hardware threads have 2 timers.
      The background timer (TXTIMER) is used as a free-running time base, and
      the interrupt timer (TXTIMERI) is used for the timer interrupt. Both
      counters traditionally count at approximately 1MHz.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      a2c5d4ed
    • J
      metag: ptrace · bc3966bf
      James Hogan 提交于
      The ptrace interface for metag provides access to some core register
      sets using the PTRACE_GETREGSET and PTRACE_SETREGSET operations. The
      details of the internal context structures is abstracted into user API
      structures to both ease use and allow flexibility to change the internal
      context layouts. Copyin and copyout functions for these register sets
      are exposed to allow signal handling code to use them to copy to and
      from the signal context.
      
      struct user_gp_regs (NT_PRSTATUS) provides access to the core general
      purpose register context.
      
      struct user_cb_regs (NT_METAG_CBUF) provides access to the TXCATCH*
      registers which contains information abuot a memory fault, unaligned
      access error or watchpoint. This can be modified to alter the way the
      fault is replayed on resume ("catch replay"), or to prevent the replay
      taking place.
      
      struct user_rp_state (NT_METAG_RPIPE) provides access to the state of
      the Meta read pipeline which can be used to hide memory latencies in
      hand optimised data loops.
      
      Extended DSP register state, DSP RAM, and hardware breakpoint registers
      aren't yet exposed through ptrace.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      bc3966bf
    • J
      asm-generic/unistd.h: handle symbol prefixes in cond_syscall · 4dd3c959
      James Hogan 提交于
      Some architectures have symbol prefixes and set CONFIG_SYMBOL_PREFIX,
      but this wasn't taken into account by the generic cond_syscall. It's
      easy enough to fix in a generic fashion, so add the symbol prefix to
      symbol names in cond_syscall when CONFIG_SYMBOL_PREFIX is set.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMike Frysinger <vapier@gentoo.org>
      4dd3c959
    • J
      asm-generic/io.h: check CONFIG_VIRT_TO_BUS · c93d0312
      James Hogan 提交于
      Make asm-generic/io.h check CONFIG_VIRT_TO_BUS before defining
      virt_to_bus() and bus_to_virt(), otherwise it's easy to accidentally
      have a silently failing incorrect direct mapped definition rather then
      no definition at all.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      c93d0312
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  6. 02 3月, 2013 9 次提交
    • A
      constify path_get/path_put and fs_struct.c stuff · dcf787f3
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dcf787f3
    • A
      cache the value of file_inode() in struct file · dd37978c
      Al Viro 提交于
      Note that this thing does *not* contribute to inode refcount;
      it's pinned down by dentry.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dd37978c
    • A
      dm: add target num_write_bios fn · b0d8ed4d
      Alasdair G Kergon 提交于
      Add a num_write_bios function to struct target.
      
      If an instance of a target sets this, it will be queried before the
      target's mapping function is called on a write bio, and the response
      controls the number of copies of the write bio that the target will
      receive.
      
      This provides a convenient way for a target to send the same data to
      more than one device.  The new cache target uses this in writethrough
      mode, to send the data both to the cache and the backing device.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      b0d8ed4d
    • M
      dm kcopyd: introduce configurable throttling · df5d2e90
      Mikulas Patocka 提交于
      This patch allows the administrator to reduce the rate at which kcopyd
      issues I/O.
      
      Each module that uses kcopyd acquires a throttle parameter that can be
      set in /sys/module/*/parameters.
      
      We maintain a history of kcopyd usage by each module in the variables
      io_period and total_period in struct dm_kcopyd_throttle. The actual
      kcopyd activity is calculated as a percentage of time equal to
      "(100 * io_period / total_period)".  This is compared with the user-defined
      throttle percentage threshold and if it is exceeded, we sleep.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      df5d2e90
    • M
      dm ioctl: allow message to return data · a2606241
      Mikulas Patocka 提交于
      This patch introduces enhanced message support that allows the
      device-mapper core to recognise messages that are common to all devices,
      and for messages to return data to userspace.
      
      Core messages are processed by the function "message_for_md".  If the
      device mapper doesn't support the message, it is passed to the target
      driver.
      
      If the message returns data, the kernel sets the flag
      DM_MESSAGE_OUT_FLAG.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a2606241
    • M
      dm ioctl: optimize functions without variable params · 02cde50b
      Mikulas Patocka 提交于
      Device-mapper ioctls receive and send data in a buffer supplied
      by userspace.  The buffer has two parts.  The first part contains
      a 'struct dm_ioctl' and has a fixed size.  The second part depends
      on the ioctl and has a variable size.
      
      This patch recognises the specific ioctls that do not use the variable
      part of the buffer and skips allocating memory for it.
      
      In particular, when a device is suspended and a resume ioctl is sent,
      this now avoid memory allocation completely.
      
      The variable "struct dm_ioctl tmp" is moved from the function
      copy_params to its caller ctl_ioctl and renamed to param_kernel.
      It is used directly when the ioctl function doesn't need any arguments.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      02cde50b
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon 提交于
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      55a62eef
    • M
      dm: fix truncated status strings · fd7c092e
      Mikulas Patocka 提交于
      Avoid returning a truncated table or status string instead of setting
      the DM_BUFFER_FULL_FLAG when the last target of a table fills the
      buffer.
      
      When processing a table or status request, the function retrieve_status
      calls ti->type->status. If ti->type->status returns non-zero,
      retrieve_status assumes that the buffer overflowed and sets
      DM_BUFFER_FULL_FLAG.
      
      However, targets don't return non-zero values from their status method
      on overflow. Most targets returns always zero.
      
      If a buffer overflow happens in a target that is not the last in the
      table, it gets noticed during the next iteration of the loop in
      retrieve_status; but if a buffer overflow happens in the last target, it
      goes unnoticed and erroneously truncated data is returned.
      
      In the current code, the targets behave in the following way:
      * dm-crypt returns -ENOMEM if there is not enough space to store the
        key, but it returns 0 on all other overflows.
      * dm-thin returns errors from the status method if a disk error happened.
        This is incorrect because retrieve_status doesn't check the error
        code, it assumes that all non-zero values mean buffer overflow.
      * all the other targets always return 0.
      
      This patch changes the ti->type->status function to return void (because
      most targets don't use the return code). Overflow is detected in
      retrieve_status: if the status method fills up the remaining space
      completely, it is assumed that buffer overflow happened.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fd7c092e
    • R
      hsi: fix kernel-doc warnings · 8eae508b
      Randy Dunlap 提交于
      Fix kernel-doc warnings in hsi files:
      
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'e_handler' description in 'hsi_client'
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'pclaimed' description in 'hsi_client'
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'nb' description in 'hsi_client'
        Warning(drivers/hsi/hsi.c:434): No description found for parameter 'handler'
        Warning(drivers/hsi/hsi.c:434): Excess function parameter 'cb' description in 'hsi_register_port_event'
      
      Don't document "private:" fields with kernel-doc notation.
      If you want to leave them fully documented, that's OK, but
      then don't mark them as "private:".
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Carlos Chinea <carlos.chinea@nokia.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-omap@vger.kernel.org
      Acked-by: NNishanth Menon <nm@ti.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8eae508b
  7. 01 3月, 2013 8 次提交
    • F
      watchdog: core: dt: add support for the timeout-sec dt property · 3048253e
      Fabio Porcedda 提交于
      Add support for watchdog drivers to initialize/set the timeout field
      of the watchdog_device structure. The timeout field is initialised
      either with the module timeout parameter value (if valid) or with the
      timeout-sec dt property (if valid). If both are invalid the initial
      value is unchanged.
      Signed-off-by: NFabio Porcedda <fabio.porcedda@gmail.com>
      Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
      3048253e
    • H
      watchdog: bcm47xx_wdt.c: use platform device · f82dedf8
      Hauke Mehrtens 提交于
      Instead of accessing the function to set the watchdog timer directly,
      register a platform driver the platform could register to use this
      watchdog driver.
      Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
      f82dedf8
    • W
      rtc: stmp3xxx: add wdt-accessor function · 1a71fb84
      Wolfram Sang 提交于
      This RTC also includes a watchdog timer. Provide an accessor function
      for setting the watchdog timeout value which will be picked up by a
      watchdog driver. Also register the platform_device for the watchdog here
      to get the boot-time dependencies right.
      Signed-off-by: NWolfram Sang <w.sang@pengutronix.de>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
      1a71fb84
    • N
      regulator: core: update kernel documentation for regulator_desc · 5838b032
      Nishanth Menon 提交于
      commit df367931 (regulator: core: Provide regmap get/set bypass
      operations) introduced regulator_[gs]et_bypass_regmap
      
      However structure documentation for regulator_desc needs an update.
      ./scripts/kernel-doc include/linux/regulator/driver.h >/dev/null
      generates:
      Warning(include/linux/regulator/driver.h:233): No description found for parameter 'bypass_reg'
      Warning(include/linux/regulator/driver.h:233): No description found for parameter 'bypass_mask'
      
      Cc: Liam Girdwood <lgirdwood@gmail.com>
      Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NNishanth Menon <nm@ti.com>
      Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
      5838b032
    • T
      ext4: optimize ext4_es_shrink() · 24630774
      Theodore Ts'o 提交于
      When the system is under memory pressure, ext4_es_srhink() will get
      called very often.  So optimize returning the number of items in the
      file system's extent status cache by keeping a per-filesystem count,
      instead of calculating it each time by scanning all of the inodes in
      the extent status cache.
      
      Also rename the slab used for the extent status cache to be
      "ext4_extent_status" so it's obviousl the slab in question is created
      by ext4.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Zheng Liu <gnehzuil.liu@gmail.com>
      24630774
    • W
      NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS · 30005121
      Weston Andros Adamson 提交于
      The client will currently try LAYOUTGETs forever if a server is returning
      NFS4ERR_LAYOUTTRYLATER or NFS4ERR_RECALLCONFLICT - even if the client no
      longer needs the layout (ie process killed, unmounted).
      
      This patch uses the DS timeout value (module parameter 'dataserver_timeo'
      via rpc layer) to set an upper limit of how long the client tries LATOUTGETs
      in this situation.  Once the timeout is reached, IO is redirected to the MDS.
      
      This also changes how the client checks if a layout is on the clp list
      to avoid a double list_add.
      Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      30005121
    • W
      SUNRPC: add call to get configured timeout · edddbb1e
      Weston Andros Adamson 提交于
      Returns the configured timeout for the xprt of the rpc client.
      Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      edddbb1e
    • E
      tcp: avoid wakeups for pure ACK · 79ffef1f
      Eric Dumazet 提交于
      TCP prequeue mechanism purpose is to let incoming packets
      being processed by the thread currently blocked in tcp_recvmsg(),
      instead of behalf of the softirq handler, to better adapt flow
      control on receiver host capacity to schedule the consumer.
      
      But in typical request/answer workloads, we send request, then
      block to receive the answer. And before the actual answer, TCP
      stack receives the ACK packets acknowledging the request.
      
      Processing pure ACK on behalf of the thread blocked in tcp_recvmsg()
      is a waste of resources, as thread has to immediately sleep again
      because it got no payload.
      
      This patch avoids the extra context switches and scheduler overhead.
      
      Before patch :
      
      a:~# echo 0 >/proc/sys/net/ipv4/tcp_low_latency
      a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k
      231676
      
       Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k':
      
           116251.501765 task-clock                #   11.369 CPUs utilized
               5,025,463 context-switches          #    0.043 M/sec
               1,074,511 CPU-migrations            #    0.009 M/sec
                 216,923 page-faults               #    0.002 M/sec
         311,636,972,396 cycles                    #    2.681 GHz
         260,507,138,069 stalled-cycles-frontend   #   83.59% frontend cycles idle
         155,590,092,840 stalled-cycles-backend    #   49.93% backend  cycles idle
         100,101,255,411 instructions              #    0.32  insns per cycle
                                                   #    2.60  stalled cycles per insn
          16,535,930,999 branches                  #  142.243 M/sec
             646,483,591 branch-misses             #    3.91% of all branches
      
            10.225482774 seconds time elapsed
      
      After patch :
      
      a:~# echo 0 >/proc/sys/net/ipv4/tcp_low_latency
      a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k
      233297
      
       Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k':
      
            91084.870855 task-clock                #    8.887 CPUs utilized
               2,485,916 context-switches          #    0.027 M/sec
                 815,520 CPU-migrations            #    0.009 M/sec
                 216,932 page-faults               #    0.002 M/sec
         245,195,022,629 cycles                    #    2.692 GHz
         202,635,777,041 stalled-cycles-frontend   #   82.64% frontend cycles idle
         124,280,372,407 stalled-cycles-backend    #   50.69% backend  cycles idle
          83,457,289,618 instructions              #    0.34  insns per cycle
                                                   #    2.43  stalled cycles per insn
          13,431,472,361 branches                  #  147.461 M/sec
             504,470,665 branch-misses             #    3.76% of all branches
      
            10.249594448 seconds time elapsed
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79ffef1f
  8. 28 2月, 2013 2 次提交