1. 28 3月, 2013 1 次提交
    • S
      net: add ETH_P_802_3_MIN · e5c5d22e
      Simon Horman 提交于
      Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for
      an 802.3 frame. Frames with a lower value in the ethernet type field
      are Ethernet II.
      
      Also update all the users of this value that David Miller and
      I could find to use the new constant.
      
      Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
      should be >= not >.
      
      As suggested by Jesse Gross.
      
      Compile tested only.
      
      Cc: David Miller <davem@davemloft.net>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: John W. Linville <linville@tuxdriver.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Bart De Schuymer <bart.de.schuymer@pandora.be>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Johan Hedberg <johan.hedberg@gmail.com>
      Cc: linux-bluetooth@vger.kernel.org
      Cc: netfilter-devel@vger.kernel.org
      Cc: bridge@lists.linux-foundation.org
      Cc: linux-wireless@vger.kernel.org
      Cc: linux1394-devel@lists.sourceforge.net
      Cc: linux-media@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: dev@openvswitch.org
      Acked-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Acked-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5c5d22e
  2. 27 3月, 2013 1 次提交
  3. 22 3月, 2013 2 次提交
    • A
      netlink: Diag core and basic socket info dumping (v2) · eaaa3139
      Andrey Vagin 提交于
      The netlink_diag can be built as a module, just like it's done in
      unix sockets.
      
      The core dumping message carries the basic info about netlink sockets:
      family, type and protocol, portis, dst_group, dst_portid, state.
      
      Groups can be received as an optional parameter NETLINK_DIAG_GROUPS.
      
      Netlink sockets cab be filtered by protocols.
      
      The socket inode number and cookie is reserved for future per-socket info
      retrieving. The per-protocol filtering is also reserved for future by
      requiring the sdiag_protocol to be zero.
      
      The file /proc/net/netlink doesn't provide enough information for
      dumping netlink sockets. It doesn't provide dst_group, dst_portid,
      groups above 32.
      
      v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eaaa3139
    • A
      net: fix *_DIAG_MAX constants · ae5fc987
      Andrey Vagin 提交于
      Follow the common pattern and define *_DIAG_MAX like:
      
              [...]
              __XXX_DIAG_MAX,
      };
      
      Because everyone is used to do:
      
              struct nlattr *attrs[XXX_DIAG_MAX+1];
      
              nla_parse([...], XXX_DIAG_MAX, [...]
      Reported-by: NThomas Graf <tgraf@suug.ch>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae5fc987
  4. 21 3月, 2013 2 次提交
  5. 20 3月, 2013 1 次提交
    • W
      packet: packet fanout rollover during socket overload · 77f65ebd
      Willem de Bruijn 提交于
      Changes:
        v3->v2: rebase (no other changes)
                passes selftest
        v2->v1: read f->num_members only once
                fix bug: test rollover mode + flag
      
      Minimize packet drop in a fanout group. If one socket is full,
      roll over packets to another from the group. Maintain flow
      affinity during normal load using an rxhash fanout policy, while
      dispersing unexpected traffic storms that hit a single cpu, such
      as spoofed-source DoS flows. Rollover breaks affinity for flows
      arriving at saturated sockets during those conditions.
      
      The patch adds a fanout policy ROLLOVER that rotates between sockets,
      filling each socket before moving to the next. It also adds a fanout
      flag ROLLOVER. If passed along with any other fanout policy, the
      primary policy is applied until the chosen socket is full. Then,
      rollover selects another socket, to delay packet drop until the
      entire system is saturated.
      
      Probing sockets is not free. Selecting the last used socket, as
      rollover does, is a greedy approach that maximizes chance of
      success, at the cost of extreme load imbalance. In practice, with
      sufficiently long queues to absorb bursts, sockets are drained in
      parallel and load balance looks uniform in `top`.
      
      To avoid contention, scales counters with number of sockets and
      accesses them lockfree. Values are bounds checked to ensure
      correctness.
      
      Tested using an application with 9 threads pinned to CPUs, one socket
      per thread and sufficient busywork per packet operation to limits each
      thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
      packets, a FANOUT_CPU setup processes 32 Kpps in total without this
      patch, 270 Kpps with the patch. Tested with read() and with a packet
      ring (V1).
      
      Also, passes psock_fanout.c unit test added to selftests.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77f65ebd
  6. 18 3月, 2013 2 次提交
    • C
      tcp: Remove TCPCT · 1a2c6181
      Christoph Paasch 提交于
      TCPCT uses option-number 253, reserved for experimental use and should
      not be used in production environments.
      Further, TCPCT does not fully implement RFC 6013.
      
      As a nice side-effect, removing TCPCT increases TCP's performance for
      very short flows:
      
      Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
      for files of 1KB size.
      
      before this patch:
      	average (among 7 runs) of 20845.5 Requests/Second
      after:
      	average (among 7 runs) of 21403.6 Requests/Second
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a2c6181
    • D
      vxlan: generalize forwarding tables · 6681712d
      David Stevens 提交于
      This patch generalizes VXLAN forwarding table entries allowing an administrator
      to:
      	1) specify multiple destinations for a given MAC
      	2) specify alternate vni's in the VXLAN header
      	3) specify alternate destination UDP ports
      	4) use multicast MAC addresses as fdb lookup keys
      	5) specify multicast destinations
      	6) specify the outgoing interface for forwarded packets
      
      The combination allows configuration of more complex topologies using VXLAN
      encapsulation.
      
      Changes since v1: rebase to 3.9.0-rc2
      Signed-Off-By: NDavid L Stevens <dlstevens@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6681712d
  7. 14 3月, 2013 3 次提交
    • D
      UAPI: fix endianness conditionals in linux/raid/md_p.h · ca044f9a
      David Howells 提交于
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of struct mdp_superblock_s in linux/raid/md_p.h is wrong in
      this way.  Note that userspace will likely interpret the ordering of the
      fields incorrectly as the big-endian variant on a little-endian machines -
      depending on header inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the ordering of events_hi, events_lo, cp_events_hi and
      cp_events_lo in struct mdp_superblock_s / typedef mdp_super_t.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ca044f9a
    • D
      UAPI: fix endianness conditionals in linux/acct.h · 29ba06b9
      David Howells 提交于
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of ACCT_BYTEORDER in linux/acct.h is wrong in this way.
      Note that userspace will likely interpret this incorrectly as the
      big-endian variant on little-endian machines - depending on header
      inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the value of ACCT_BYTEORDER.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29ba06b9
    • D
      UAPI: fix endianness conditionals in linux/aio_abi.h · 51b154ed
      David Howells 提交于
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of PADDED() in linux/aio_abi.h is wrong in this way.  Note
      that userspace will likely interpret this and thus the order of fields in
      struct iocb incorrectly as the little-endian variant on big-endian
      machines - depending on header inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the ordering of aio_key and aio_reserved1 in struct iocb.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NBenjamin LaHaise <bcrl@kvack.org>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51b154ed
  8. 12 3月, 2013 3 次提交
    • L
      tty/serial: Add support for Altera serial port · e06c93ca
      Ley Foon Tan 提交于
      Add support for Altera 8250/16550 compatible serial port.
      Signed-off-by: NLey Foon Tan <lftan@altera.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e06c93ca
    • N
      tcp: TLP loss detection. · 9b717a8d
      Nandita Dukkipati 提交于
      This is the second of the TLP patch series; it augments the basic TLP
      algorithm with a loss detection scheme.
      
      This patch implements a mechanism for loss detection when a Tail
      loss probe retransmission plugs a hole thereby masking packet loss
      from the sender. The loss detection algorithm relies on counting
      TLP dupacks as outlined in Sec. 3 of:
      http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
      
      The basic idea is: Sender keeps track of TLP "episode" upon
      retransmission of a TLP packet. An episode ends when the sender receives
      an ACK above the SND.NXT (tracked by tlp_high_seq) at the time of the
      episode. We want to make sure that before the episode ends the sender
      receives a "TLP dupack", indicating that the TLP retransmission was
      unnecessary, so there was no loss/hole that needed plugging. If the
      sender gets no TLP dupack before the end of the episode, then it reduces
      ssthresh and the congestion window, because the TLP packet arriving at
      the receiver probably plugged a hole.
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b717a8d
    • N
      tcp: Tail loss probe (TLP) · 6ba8a3b1
      Nandita Dukkipati 提交于
      This patch series implement the Tail loss probe (TLP) algorithm described
      in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
      first patch implements the basic algorithm.
      
      TLP's goal is to reduce tail latency of short transactions. It achieves
      this by converting retransmission timeouts (RTOs) occuring due
      to tail losses (losses at end of transactions) into fast recovery.
      TLP transmits one packet in two round-trips when a connection is in
      Open state and isn't receiving any ACKs. The transmitted packet, aka
      loss probe, can be either new or a retransmission. When there is tail
      loss, the ACK from a loss probe triggers FACK/early-retransmit based
      fast recovery, thus avoiding a costly RTO. In the absence of loss,
      there is no change in the connection state.
      
      PTO stands for probe timeout. It is a timer event indicating
      that an ACK is overdue and triggers a loss probe packet. The PTO value
      is set to max(2*SRTT, 10ms) and is adjusted to account for delayed
      ACK timer when there is only one oustanding packet.
      
      TLP Algorithm
      
      On transmission of new data in Open state:
        -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms).
        -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
        -> PTO = min(PTO, RTO)
      
      Conditions for scheduling PTO:
        -> Connection is in Open state.
        -> Connection is either cwnd limited or no new data to send.
        -> Number of probes per tail loss episode is limited to one.
        -> Connection is SACK enabled.
      
      When PTO fires:
        new_segment_exists:
          -> transmit new segment.
          -> packets_out++. cwnd remains same.
      
        no_new_packet:
          -> retransmit the last segment.
             Its ACK triggers FACK or early retransmit based recovery.
      
      ACK path:
        -> rearm RTO at start of ACK processing.
        -> reschedule PTO if need be.
      
      In addition, the patch includes a small variation to the Early Retransmit
      (ER) algorithm, such that ER and TLP together can in principle recover any
      N-degree of tail loss through fast recovery. TLP is controlled by the same
      sysctl as ER, tcp_early_retrans sysctl.
      tcp_early_retrans==0; disables TLP and ER.
      		 ==1; enables RFC5827 ER.
      		 ==2; delayed ER.
      		 ==3; TLP and delayed ER. [DEFAULT]
      		 ==4; TLP only.
      
      The TLP patch series have been extensively tested on Google Web servers.
      It is most effective for short Web trasactions, where it reduced RTOs by 15%
      and improved HTTP response time (average by 6%, 99th percentile by 10%).
      The transmitted probes account for <0.5% of the overall transmissions.
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ba8a3b1
  9. 11 3月, 2013 2 次提交
  10. 09 3月, 2013 1 次提交
  11. 07 3月, 2013 1 次提交
  12. 06 3月, 2013 7 次提交
    • T
      nl80211: user_mpm overrides auto_open_plinks · d37bb18a
      Thomas Pedersen 提交于
      If the user requested a userspace MPM, automatically
      disable auto_open_plinks to fully disable the kernel MPM.
      Signed-off-by: NThomas Pedersen <thomas@cozybit.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      d37bb18a
    • T
      nl80211: explicit userspace MPM · bb2798d4
      Thomas Pedersen 提交于
      Secure mesh had the implicit requirement that the Mesh
      Peering Management entity be in userspace.  However
      userspace might want to implement an open MPM as well, so
      specify a mesh setup parameter to indicate this.
      Signed-off-by: NThomas Pedersen <thomas@cozybit.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      bb2798d4
    • J
      cfg80211: Extend support for IEEE 802.11r Fast BSS Transition · 355199e0
      Jouni Malinen 提交于
      Add NL80211_CMD_UPDATE_FT_IES to support update of FT IEs to the WLAN
      driver and NL80211_CMD_FT_EVENT to send FT events from the WLAN driver.
      This will carry the target AP's MAC address along with the relevant
      Information Elements. This event is used to report received FT IEs
      (MDIE, FTIE, RSN IE, TIE, RICIE). These changes allow FT to be supported
      with drivers that use an internal SME instead of user space option (like
      FT implementation in wpa_supplicant with mac80211-based drivers).
      Signed-off-by: NJouni Malinen <jouni@qca.qualcomm.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      355199e0
    • J
      cfg80211: add ability to override VHT capabilities · ee2aca34
      Johannes Berg 提交于
      For testing it's sometimes useful to be able to
      override certain VHT capability advertisement,
      add the ability to do that in cfg80211.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      ee2aca34
    • J
      nl80211: allow splitting wiphy information in dumps · 3713b4e3
      Johannes Berg 提交于
      The per-wiphy information is getting large, to the point
      where with more than the typical number of channels it's
      too large and overflows, and userspace can't get any of
      the information at all.
      
      To address this (in a way that doesn't require making all
      messages bigger) allow userspace to specify that it can
      deal with wiphy information split across multiple parts
      of the dump, and if it can split up the data. This also
      splits up each channel separately so an arbitrary number
      of channels can be supported.
      
      Additionally, since GET_WIPHY has the same problem, add
      support for filtering the wiphy dump and get information
      for a single wiphy only, this allows userspace apps to
      use dump in this case to retrieve all data from a single
      device.
      
      As userspace needs to know if all this this is supported,
      add a global nl80211 feature set and include a bit for
      this behaviour in it.
      
      Cc: Dennis H Jensen <dennis.h.jensen@siemens.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      3713b4e3
    • J
      cfg80211: comprehensively check station changes · 77ee7c89
      Johannes Berg 提交于
      The station change API isn't being checked properly before
      drivers are called, and as a result it is difficult to see
      what should be allowed and what not.
      
      In order to comprehensively check the API parameters parse
      everything first, and then have the driver call a function
      (cfg80211_check_station_change()) with the additionally
      information about the kind of station that is being changed;
      this allows the function to make better decisions than the
      old code could.
      
      While at it, also add a few checks, particularly in mesh
      and clarify the TDLS station lifetime in documentation.
      
      To be able to reduce a few checks, ignore any flag set bits
      when the mask isn't set, they shouldn't be applied then.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      77ee7c89
    • J
      cfg80211: clean up mesh plink station change API · f8bacc21
      Johannes Berg 提交于
      Make the ability to leave the plink_state unchanged not use a
      magic -1 variable that isn't in the enum, but an explicit change
      flag; reject invalid plink states or actions and move the needed
      constants for plink actions to the right header file. Also
      reject plink_state changes for non-mesh interfaces.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      f8bacc21
  13. 03 3月, 2013 1 次提交
    • J
      metag: ptrace · bc3966bf
      James Hogan 提交于
      The ptrace interface for metag provides access to some core register
      sets using the PTRACE_GETREGSET and PTRACE_SETREGSET operations. The
      details of the internal context structures is abstracted into user API
      structures to both ease use and allow flexibility to change the internal
      context layouts. Copyin and copyout functions for these register sets
      are exposed to allow signal handling code to use them to copy to and
      from the signal context.
      
      struct user_gp_regs (NT_PRSTATUS) provides access to the core general
      purpose register context.
      
      struct user_cb_regs (NT_METAG_CBUF) provides access to the TXCATCH*
      registers which contains information abuot a memory fault, unaligned
      access error or watchpoint. This can be modified to alter the way the
      fault is replayed on resume ("catch replay"), or to prevent the replay
      taking place.
      
      struct user_rp_state (NT_METAG_RPIPE) provides access to the state of
      the Meta read pipeline which can be used to hide memory latencies in
      hand optimised data loops.
      
      Extended DSP register state, DSP RAM, and hardware breakpoint registers
      aren't yet exposed through ptrace.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      bc3966bf
  14. 02 3月, 2013 2 次提交
    • M
      dm ioctl: allow message to return data · a2606241
      Mikulas Patocka 提交于
      This patch introduces enhanced message support that allows the
      device-mapper core to recognise messages that are common to all devices,
      and for messages to return data to userspace.
      
      Core messages are processed by the function "message_for_md".  If the
      device mapper doesn't support the message, it is passed to the target
      driver.
      
      If the message returns data, the kernel sets the flag
      DM_MESSAGE_OUT_FLAG.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a2606241
    • M
      dm ioctl: optimize functions without variable params · 02cde50b
      Mikulas Patocka 提交于
      Device-mapper ioctls receive and send data in a buffer supplied
      by userspace.  The buffer has two parts.  The first part contains
      a 'struct dm_ioctl' and has a fixed size.  The second part depends
      on the ioctl and has a variable size.
      
      This patch recognises the specific ioctls that do not use the variable
      part of the buffer and skips allocating memory for it.
      
      In particular, when a device is suspended and a resume ioctl is sent,
      this now avoid memory allocation completely.
      
      The variable "struct dm_ioctl tmp" is moved from the function
      copy_params to its caller ctl_ioctl and renamed to param_kernel.
      It is used directly when the ioctl function doesn't need any arguments.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      02cde50b
  15. 28 2月, 2013 5 次提交
    • A
      nbd: support FLUSH requests · 75f187ab
      Alex Bligh 提交于
      Currently, the NBD device does not accept flush requests from the Linux
      block layer.  If the NBD server opened the target with neither O_SYNC nor
      O_DSYNC, however, the device will be effectively backed by a writeback
      cache.  Without issuing flushes properly, operation of the NBD device will
      not be safe against power losses.
      
      The NBD protocol has support for both a cache flush command and a FUA
      command flag; the server will also pass a flag to note its support for
      these features.  This patch adds support for the cache flush command and
      flag.  In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl,
      and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush.  When the
      flag is active the block layer will send REQ_FLUSH requests, which we
      translate to NBD_CMD_FLUSH commands.
      
      FUA support is not included in this patch because all free software
      servers implement it with a full fdatasync; thus it has no advantage over
      supporting flush only.  Because I [Paolo] cannot really benchmark it in a
      realistic scenario, I cannot tell if it is a good idea or not.  It is also
      not clear if it is valid for an NBD server to support FUA but not flush.
      The Linux block layer gives a warning for this combination, the NBD
      protocol documentation says nothing about it.
      
      The patch also fixes a small problem in the handling of flags: nbd->flags
      must be cleared at the end of NBD_DO_IT, but the driver was not doing
      that.  The bug manifests itself as follows.  Suppose you two different
      client/server pairs to start the NBD device.  Suppose also that the first
      client supports NBD_SET_FLAGS, and the first server sends
      NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two
      things.  Before this patch, the second invocation of NBD_DO_IT will use a
      stale value of nbd->flags, and the second server will issue an error every
      time it receives an NBD_CMD_FLUSH command.
      
      This bug is pre-existing, but it becomes much more important after this
      patch; flush failures make the device pretty much unusable, unlike
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAlex Bligh <alex@alex.org.uk>
      Acked-by: NPaul Clements <Paul.Clements@steeleye.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      75f187ab
    • R
      ipmi: remove superfluous kernel/userspace explanation · 59fb1b9f
      Robert P. J. Day 提交于
      Given the obvious distinction between kernel and userspace supported
      by uapi/, it seems unnecessary to comment on that.
      Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca>
      Signed-off-by: NCorey Minyard <cminyard@mvista.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59fb1b9f
    • O
      fat: mark fs as dirty on mount and clean on umount · b88a1058
      Oleksij Rempel 提交于
      There is no documented methods to mark FAT as dirty.  Unofficially MS
      started to use reserved Byte in boot sector for this purpose, at least
      since Win 2000.  With Win 7 user is warned if fs is dirty and asked to
      clean it.
      
      Different versions of Win, handle it in different ways, but always have
      same meaning:
      
      - Win 2000 and XP, set it on write operations and
        remove it after operation was finnished
      - Win 7, set dirty flag on first write and remove it on umount.
      
      We will do it as follows:
      
      - set dirty flag on mount. If fs was initially dirty, warn user,
        remember it and do not do any changes to boot sector.
      - clean it on umount. If fs was initially dirty, leave it dirty.
      - do not do any thing if fs mounted read-only.
      - TODO: leave fs dirty if we found some error after mount.
      Signed-off-by: NOleksij Rempel <bug-track@fisher-privat.net>
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b88a1058
    • O
      fat: add extended fileds to struct fat_boot_sector · 6b46419b
      Oleksij Rempel 提交于
      Later we will need "state" field to check if volume was cleanly unmounted.
      Signed-off-by: NOleksij Rempel <bug-track@fisher-privat.net>
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b46419b
    • V
      hfsplus: add osx.* prefix for handling namespace of Mac OS X extended attributes · 5841ca09
      Vyacheslav Dubeyko 提交于
      hfsplus: reworked support of extended attributes.
      
      Current mainline implementation of hfsplus file system driver treats as
      extended attributes only two fields (fdType and fdCreator) of user_info
      field in file description record (struct hfsplus_cat_file).  It is
      possible to get or set only these two fields as extended attributes.
      But HFS+ treats as com.apple.FinderInfo extended attribute an union of
      user_info and finder_info fields as for file (struct hfsplus_cat_file)
      as for folder (struct hfsplus_cat_folder).  Moreover, current mainline
      implementation of hfsplus file system driver doesn't support special
      metadata file - attributes tree.
      
      Mac OS X 10.4 and later support extended attributes by making use of the
      HFS+ filesystem Attributes file B*-tree feature which allows for named
      forks.  Mac OS X supports only inline extended attributes, limiting
      their size to 3802 bytes.  Any regular file may have a list of extended
      attributes.  HFS+ supports an arbitrary number of named forks.  Each
      attribute is denoted by a name and the associated data.  The name is a
      null-terminated Unicode string.  It is possible to list, to get, to set,
      and to remove extended attributes from files or directories.
      
      It exists some peculiarity during getting of extended attributes list by
      means of getfattr utility.  The getfattr utility expects prefix "user."
      before any extended attribute's name.  So, it ignores any names that
      don't contained such prefix.  Such behavior of getfattr utility results
      in unexpected empty output of extended attributes list even in the case
      when file (or folder) contains extended attributes.  It needs to use
      empty string as regular expression pattern for names matching (getfattr
      --match="").
      
      For support of extended attributes in HFS+:
      1. It was added necessary on-disk layout declarations related to Attributes
         tree into hfsplus_raw.h file.
      2. It was added attributes.c file with implementation of functionality of
         manipulation by records in Attributes tree.
      3. It was reworked hfsplus_listxattr, hfsplus_getxattr, hfsplus_setxattr
         functions in ioctl.c. Moreover, it was added hfsplus_removexattr method.
      
      This patch:
      
      Add osx.* prefix for handling namespace of Mac OS X extended attributes.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com>
      Reported-by: NHin-Tak Leung <htl10@users.sourceforge.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5841ca09
  16. 22 2月, 2013 2 次提交
  17. 21 2月, 2013 4 次提交