1. 06 Oct, 2017 20 commits
    • tcp: more efficient RACK loss detection · 043b87d7
      Committed by Yuchung Cheng
      Use the new time-ordered list to speed up RACK. The detection
      logic is identical. But since the list is chronologically ordered
      by skb_mstamp and contains only skbs not yet acked or sacked,
      RACK can abort the loop upon hitting skbs that were sent more
      recently. On YouTube servers this patch reduces the iterations on
      write queue by 40x. The improvement is even bigger with large
      BDP networks.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
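      The early exit that a time-ordered list allows can be sketched in
      userspace C. This is a simplified illustration, not the kernel code:
      struct pkt, rack_detect_loss(), and the single fixed reordering window
      are hypothetical stand-ins, and the real loss rule also accounts for
      the RTT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb on the time-ordered list. */
struct pkt {
    uint64_t sent_us;   /* transmit timestamp, playing the role of skb_mstamp */
    bool lost;          /* set once the packet is deemed lost */
};

/*
 * Walk packets ordered oldest-first. A packet is marked lost when it was
 * sent at least reo_wnd_us before the most recently delivered packet.
 * Returns the number of entries visited, to make the early exit visible.
 */
static size_t rack_detect_loss(struct pkt *q, size_t n,
                               uint64_t delivered_sent_us,
                               uint64_t reo_wnd_us)
{
    size_t visited = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        visited++;
        /* Every later entry was sent even more recently, so stop here. */
        if (q[i].sent_us + reo_wnd_us > delivered_sent_us)
            break;
        q[i].lost = true;
    }
    return visited;
}
```

      On a sequence-ordered queue the same check could not stop early,
      because a retransmitted packet with a low sequence number may carry a
      recent timestamp.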
    • tcp: new list for sent but unacked skbs for RACK recovery · e2080072
      Committed by Eric Dumazet
      This patch adds a new queue (list) that tracks the sent but not yet
      acked or SACKed skbs for a TCP connection. The list is chronologically
      ordered by skb->skb_mstamp (the head is the oldest sent skb).
      
      This list will be used to optimize TCP RACK recovery, which checks
      an skb's timestamp to judge if it has been lost and needs to be
      retransmitted. Since TCP write queue is ordered by sequence instead
      of sent time, RACK has to scan over the write queue to catch all
      eligible packets to detect lost retransmission, and iterates through
      SACKed skbs repeatedly.
      
      Special care is needed for rare events:
      1. TCP repair fakes skb transmission, so the send queue needs to be adjusted.
      2. SACK reneging would require re-inserting SACKed skbs into the
         send queue. For now I believe it's not worth the complexity to
         make RACK work perfectly on SACK reneging, so we do nothing here.
      3. Fast Open: currently for non-TFO, send-queue correctly queues
         the pure SYN packet. For TFO which queues a pure SYN and
         then a data packet, send-queue only queues the data packet but
         not the pure SYN due to the structure of TFO code. This is okay
         because the SYN receiver would never respond with a SACK on a
         missing SYN (i.e. SYN is never fast-retransmitted by SACK/RACK).
      
      In order to not grow sk_buff, we use a union for the new list and
      _skb_refdst/destructor fields. This is a bit complicated because
      we need to make sure _skb_refdst and destructor are properly zeroed
      before skb is cloned/copied at transmit, and before being freed.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
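      The union trick described in this commit message can be illustrated
      with a reduced userspace sketch. The struct mini_skb type, its field
      names, and mini_skb_clear_shared() are hypothetical; only the idea of
      sharing storage between a list anchor and the refdst/destructor pair
      comes from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, standing in for a kernel list head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical, heavily reduced stand-in for sk_buff. */
struct mini_skb {
    union {
        struct {
            unsigned long refdst;                    /* like _skb_refdst */
            void (*destructor)(struct mini_skb *);   /* like destructor */
        };
        struct list_node tsorted_anchor;  /* reuses the same bytes */
    };
    unsigned int len;
};

/*
 * Zero the shared storage. This has to happen before the other view of
 * the union is used, otherwise stale list pointers would be read back as
 * a dst reference or a destructor.
 */
static void mini_skb_clear_shared(struct mini_skb *skb)
{
    assert(skb != NULL);
    skb->refdst = 0;
    skb->destructor = NULL;
}
```

      Because both views start at the same offset, adding the list anchor
      costs no extra space as long as it is no larger than the fields it
      overlays.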
    • RDS: IB: Initialize max_items based on underlying device attributes · b1fb67fa
      Committed by Avinash Repaka
      Use max_1m_mrs/max_8k_mrs while setting max_items, as the former
      variables are set based on the underlying device attributes.
      Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: IB: Limit the scope of has_fr/has_fmr variables · 9dff9936
      Committed by Avinash Repaka
      This patch fixes the scope of has_fr and has_fmr variables as they are
      needed only in rds_ib_add_one().
      Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: Remove unused variable in route.c · 1bcdca3f
      Committed by Tim Hansen
      int rc is unmodified after initialization in net/ipv4/route.c; this
      patch simply cleans up that variable and returns 0.
      
      This was found with coccicheck M=net/ipv4/ on Linus' tree.
      Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: clean up TFO server's initial tcp_rearm_rto() call · 6d05081e
      Committed by Wei Wang
      This commit does a cleanup and moves tcp_rearm_rto() call in the TFO
      server case into a previous spot in tcp_rcv_state_process() to make
      it more compact.
      This is only a cosmetic change.
      Suggested-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: uniform the set up of sockets after successful connection · 27204aaa
      Committed by Wei Wang
      Currently in the TCP code, the initialization sequence for cached
      metrics, congestion control, BPF, etc, after successful connection
      is very inconsistent. This introduces inconsistent behavior and is
      prone to bugs. The current call sequence is as follows:
      
      (1) for active case (tcp_finish_connect() case):
              tcp_mtup_init(sk);
              icsk->icsk_af_ops->rebuild_header(sk);
              tcp_init_metrics(sk);
              tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
              tcp_init_congestion_control(sk);
              tcp_init_buffer_space(sk);
      
      (2) for passive case (tcp_rcv_state_process() TCP_SYN_RECV case):
              icsk->icsk_af_ops->rebuild_header(sk);
              tcp_call_bpf(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
              tcp_init_congestion_control(sk);
              tcp_mtup_init(sk);
              tcp_init_buffer_space(sk);
              tcp_init_metrics(sk);
      
      (3) for TFO passive case (tcp_fastopen_create_child()):
              inet_csk(child)->icsk_af_ops->rebuild_header(child);
              tcp_init_congestion_control(child);
              tcp_mtup_init(child);
              tcp_init_metrics(child);
              tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
              tcp_init_buffer_space(child);
      
      This commit unifies the above functions to use the following sequence:
              tcp_mtup_init(sk);
              icsk->icsk_af_ops->rebuild_header(sk);
              tcp_init_metrics(sk);
              tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE/PASSIVE_ESTABLISHED_CB);
              tcp_init_congestion_control(sk);
              tcp_init_buffer_space(sk);
      This sequence is the same as the (1) active case. We pick this sequence
      because this order correctly allows BPF to override the settings
      including congestion control module and initial cwnd, etc from
      the route, and then allows the CC module to see those settings.
      Suggested-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
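      Why the order matters can be shown with a toy model. Everything here
      is hypothetical (struct toy_sock, the toy_* helpers, and the concrete
      cwnd and module values); the sketch only mirrors the ordering argument
      above: route metrics first, then the BPF callback, then congestion
      control initialization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy socket; fields and helpers are illustrative only. */
struct toy_sock {
    unsigned int init_cwnd;     /* initial congestion window */
    char cc_name[16];           /* selected congestion control module */
    unsigned int cc_seen_cwnd;  /* value the CC module observed at init */
};

/* Step 1: load defaults that would come from cached route metrics. */
static void toy_init_metrics(struct toy_sock *sk)
{
    sk->init_cwnd = 10;
    strcpy(sk->cc_name, "cubic");
}

/* Step 2: a BPF program may override what the route provided. */
static void toy_call_bpf(struct toy_sock *sk)
{
    sk->init_cwnd = 40;
    strcpy(sk->cc_name, "bbr");
}

/* Step 3: the CC module initializes last and sees the final settings. */
static void toy_init_congestion_control(struct toy_sock *sk)
{
    sk->cc_seen_cwnd = sk->init_cwnd;
}

/* The unified order: metrics, then BPF, then congestion control. */
static void toy_finish_setup(struct toy_sock *sk)
{
    assert(sk != NULL);
    toy_init_metrics(sk);
    toy_call_bpf(sk);
    toy_init_congestion_control(sk);
}
```

      If toy_init_congestion_control() ran before toy_call_bpf(), as in the
      old TFO passive path, the module would be initialized with the route
      value and never see the override.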
    • Merge branch 'VSOCK-sock_diag' · 5820299a
      Committed by David S. Miller
      Stefan Hajnoczi says:
      
      ====================
      VSOCK: add sock_diag interface
      
      v3:
       * Rebased onto net-next/master and resolved Hyper-V transport conflict
      
      v2:
       * Moved tests to tools/testing/vsock/.  I was unable to put them in selftests/
         because they require manual setup of a VMware/KVM guest.
       * Moved __vsock_in_bound/connected_table() to af_vsock.h
       * Fixed local variable ordering in Patch 4
      
      There is currently no way for userspace to query open AF_VSOCK sockets.  This
      means ss(8), netstat(8), and other utilities cannot display AF_VSOCK sockets.
      
      This patch series adds the netlink sock_diag interface for AF_VSOCK.  Userspace
      programs send a DUMP request including an sk_state bitmap to filter sockets
      based on their state (connected, listening, etc).  The vsock_diag.ko module
      replies with information about matching sockets.  This userspace ABI is defined
      in <linux/vm_sockets_diag.h>.
      
      The final patch adds a test suite that exercises the basic cases.
      
      Jorgen and Dexuan: I have only tested the virtio transport but this should also
      work for VMCI and Hyper-V.  Please give it a shot if you have time.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • VSOCK: add tools/testing/vsock/vsock_diag_test · 0b025033
      Committed by Stefan Hajnoczi
      This patch adds tests for the vsock_diag.ko module.
      
      These tests are not self-tests because they require manual set up of a
      KVM or VMware guest.  Please see tools/testing/vsock/README for
      instructions.
      
      The control.h and timeout.h infrastructure can be used for additional
      AF_VSOCK tests in the future.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • VSOCK: add sock_diag interface · 413a4317
      Committed by Stefan Hajnoczi
      This patch adds the sock_diag interface for querying sockets from
      userspace.  Tools like ss(8) and netstat(8) can use this interface to
      list open sockets.
      
      The userspace ABI is defined in <linux/vm_sockets_diag.h> and includes
      netlink request and response structs.  The request can query sockets
      based on their sk_state (e.g. listening sockets only) and the response
      contains socket information fields including the local/remote addresses,
      inode number, etc.
      
      This patch does not dump VMCI pending sockets because I have only tested
      the virtio transport, which does not use pending sockets.  Support can
      be added later by extending vsock_diag_dump() if needed by VMCI users.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
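      Filtering by an sk_state bitmap can be sketched as follows. This
      assumes one bit per state number, (1 << sk_state), which the message
      above does not spell out; struct toy_vsock, the ST_* constants, and
      diag_count_matches() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative state numbers, chosen to mirror TCP state values. */
enum {
    ST_ESTABLISHED = 1,
    ST_CLOSE = 7,
    ST_LISTEN = 10
};

/* Hypothetical reduced socket record. */
struct toy_vsock {
    unsigned int sk_state;
    uint32_t local_port;
};

/* Count the sockets whose state bit is set in the request bitmap. */
static size_t diag_count_matches(const struct toy_vsock *socks, size_t n,
                                 uint32_t states)
{
    size_t matches = 0;

    assert(socks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (states & (UINT32_C(1) << socks[i].sk_state))
            matches++;
    }
    return matches;
}
```

      A request for listening sockets only would then pass a bitmap with
      just the ST_LISTEN bit set.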
    • VSOCK: use TCP state constants for sk_state · 3b4477d2
      Committed by Stefan Hajnoczi
      There are two state fields: socket->state and sock->sk_state.  The
      socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
      sock->sk_state typically uses values that match TCP state constants
      (TCP_CLOSE, TCP_ESTABLISHED).  AF_VSOCK does not follow this convention
      and instead uses SS_* constants for both fields.
      
      The sk_state field will be exposed to userspace through the vsock_diag
      interface for ss(8), netstat(8), and other programs.
      
      This patch switches sk_state to TCP state constants so that the meaning
      of this field is consistent with other address families.  Not just
      AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.
      
      The following mapping was used to convert the code:
      
        SS_FREE -> TCP_CLOSE
        SS_UNCONNECTED -> TCP_CLOSE
        SS_CONNECTING -> TCP_SYN_SENT
        SS_CONNECTED -> TCP_ESTABLISHED
        SS_DISCONNECTING -> TCP_CLOSING
        VSOCK_SS_LISTEN -> TCP_LISTEN
      
      In __vsock_create() the sk_state initialization was dropped because
      sock_init_data() already initializes sk_state to TCP_CLOSE.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
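      The mapping table above can be written as a small lookup function.
      The patch itself converted the code by replacing constants rather
      than by adding a runtime translation, so this is only a compact
      restatement. The OLD_* and NEW_* enums are local placeholders whose
      numeric values are not the kernel's.

```c
#include <assert.h>

/* Placeholder names for the states AF_VSOCK used before the patch. */
enum old_state {
    OLD_SS_FREE,
    OLD_SS_UNCONNECTED,
    OLD_SS_CONNECTING,
    OLD_SS_CONNECTED,
    OLD_SS_DISCONNECTING,
    OLD_VSOCK_SS_LISTEN
};

/* Placeholder names for the TCP-style states used after the patch. */
enum new_state {
    NEW_TCP_CLOSE,
    NEW_TCP_SYN_SENT,
    NEW_TCP_ESTABLISHED,
    NEW_TCP_CLOSING,
    NEW_TCP_LISTEN
};

/* Translate an old sk_state value following the table in the message. */
static enum new_state vsock_map_state(enum old_state s)
{
    switch (s) {
    case OLD_SS_FREE:          return NEW_TCP_CLOSE;
    case OLD_SS_UNCONNECTED:   return NEW_TCP_CLOSE;
    case OLD_SS_CONNECTING:    return NEW_TCP_SYN_SENT;
    case OLD_SS_CONNECTED:     return NEW_TCP_ESTABLISHED;
    case OLD_SS_DISCONNECTING: return NEW_TCP_CLOSING;
    case OLD_VSOCK_SS_LISTEN:  return NEW_TCP_LISTEN;
    }
    assert(!"unknown state");
    return NEW_TCP_CLOSE;
}
```

      Note that two old states collapse into one: both SS_FREE and
      SS_UNCONNECTED become TCP_CLOSE, so the mapping is not reversible.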
    • VSOCK: move __vsock_in_bound/connected_table() to af_vsock.h · bf359b81
      Committed by Stefan Hajnoczi
      The vsock_diag.ko module will need to check socket table membership.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • VSOCK: export socket tables for sock_diag interface · 44f20980
      Committed by Stefan Hajnoczi
      The socket table symbols need to be exported from vsock.ko so that the
      vsock_diag.ko module will be able to traverse sockets.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 53954cf8
      Committed by David S. Miller
      Just simple overlapping changes.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7a92616c
      Committed by Linus Torvalds
      Pull power management fix from Rafael Wysocki:
       "This fixes a code ordering issue in the main suspend-to-idle loop that
        causes some "low power S0 idle" conditions to be incorrectly reported
        as unmet with suspend/resume debug messages enabled"
      
      * tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / s2idle: Invoke the ->wake() platform callback earlier
    • Merge branch 'pm-sleep' · ca935f8e
      Committed by Rafael J. Wysocki
      * pm-sleep:
        PM / s2idle: Invoke the ->wake() platform callback earlier
    • Merge tag 'for-4.14/dm-fixes' of... · 076264ad
      Committed by Linus Torvalds
      Merge tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - a stable fix for the alignment of the event number reported at the
         end of the 'DM_LIST_DEVICES' ioctl.
      
       - a couple stable fixes for the DM crypt target.
      
       - a DM raid health status reporting fix.
      
      * tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm raid: fix incorrect status output at the end of a "recover" process
        dm crypt: reject sector_size feature if device length is not aligned to it
        dm crypt: fix memory leak in crypt_ctr_cipher_old()
        dm ioctl: fix alignment of event number in the device list
    • dm raid: fix incorrect status output at the end of a "recover" process · 41dcf197
      Committed by Jonathan Brassow
      There are three important fields that indicate the overall health and
      status of an array: dev_health, sync_ratio, and sync_action.  They tell
      us the condition of the devices in the array, and the degree to which
      the array is synchronized.
      
      This commit fixes a condition that is reported incorrectly.  When a member
      of the array is being rebuilt or a new device is added, the "recover"
      process is used to synchronize it with the rest of the array.  When the
      process is complete, but the sync thread hasn't yet been reaped, it is
      possible for the state of MD to be:
       mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ]
       curr_resync_completed = <max dev size> (but not MaxSector)
       and all rdevs to be In_sync.
      This causes the 'array_in_sync' output parameter that is passed to
      rs_get_progress() to be computed incorrectly and reported as 'false' --
      or not in-sync.  This in turn causes the dev_health status characters to
      be reported as all 'a', rather than the proper 'A'.
      
      This can cause erroneous output for several seconds at a time when tools
      will want to be checking the condition due to events that are raised at
      the end of a sync process.  Fix this by properly calculating the
      'array_in_sync' return parameter in rs_get_progress().
      
      Also, remove an unnecessary intermediate 'recovery_cp' variable in
      rs_get_progress().
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • Merge tag 'sound-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 0f380715
      Committed by Linus Torvalds
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes, mostly with stable ones:
      
       - X32 ABI fix for PCM; likely not so many people suffer from it, but
         still better to fix
      
       - Two minor kernel warning fixes on USB audio devices spotted by
         syzkaller
      
       - Regression fix of echoaudio due to its inconsistent dimension
      
       - Fix for HBR support on Intel DP audio, on some recent chips
      
       - USB-audio quirk for yet another Plantronics devices
      
       - Fix for potential double-fetch in ASIHPI FIFO queue"
      
      * tag 'sound-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usx2y: Suppress kernel warning at page allocation failures
        Revert "ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members"
        ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
        ALSA: pcm: Fix structure definition for X32 ABI
        ALSA: usb-audio: Add sample rate quirk for Plantronics C310/C520-M
        ALSA: hda - program ICT bits to support HBR audio
        ALSA: asihpi: fix a potential double-fetch bug when copying puhm
        ALSA: compress: Remove unused variable
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 77ede3a0
      Committed by Linus Torvalds
      Pull HID subsystem fixes from Jiri Kosina:
      
       - buffer management size fix for i2c-hid driver, from Adrian Salido
      
       - tool ID regression fixes for Wacom driver from Jason Gerecke
      
       - a few small assorted fixes and a few device ID additions
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        Revert "HID: multitouch: Support ALPS PTP stick with pid 0x120A"
        HID: hidraw: fix power sequence when closing device
        HID: wacom: Always increment hdev refcount within wacom_get_hdev_data
        HID: wacom: generic: Clear ABS_MISC when tool leaves proximity
        HID: wacom: generic: Send MSC_SERIAL and ABS_MISC when leaving prox
        HID: i2c-hid: allocate hid buffers for real worst case
        HID: rmi: Make sure the HID device is opened on resume
        HID: multitouch: Support ALPS PTP stick with pid 0x120A
        HID: multitouch: support buttons and trackpoint on Lenovo X1 Tab Gen2
        HID: wacom: Correct coordinate system of touchring and pen twist
        HID: wacom: Properly report negative values from Intuos Pro 2 Bluetooth
        HID: multitouch: Fix system-control buttons not working
        HID: add multi-input quirk for IDC6680 touchscreen
        HID: wacom: leds: Don't try to control the EKR's read-only LEDs
        HID: wacom: bits shifted too much for 9th and 10th buttons
  2. 05 Oct, 2017 20 commits