# Efficient Subsequence Checking Using Binary Search
**Translator: [youyun](https://github.com/youyun)**

**Author: [labuladong](https://github.com/labuladong)**

Binary search itself is not hard to understand; the hard part is recognizing when it applies. Some problems do not look related to binary search at all. For example, the earlier article [Longest Increasing Subsequence](../dynamic_programming/动态规划设计:最长递增子序列.md) derives a binary search solution from a card game.

This article covers another problem where binary search helps: how to determine whether a string `s` is a subsequence of another string `t` (assume `s` is fairly short and `t` is very long). Here are two examples:
> s = "abc", t = "**a**h**b**gd**c**", return true.
>
> s = "axc", t = "ahbgdc", return false.
The problem looks simple. But can you see how it relates to binary search?
### 1. Problem Analysis

Here is the signature of an intuitive solution:

    bool isSubsequence(string s, string t);

The idea is to use two pointers `i` and `j`, pointing into `s` and `t` respectively, and to match characters while moving forward: `j` advances on every step, and `i` advances only when `s[i]` equals `t[j]`. If `i` reaches the end of `s`, then `s` is a subsequence of `t`.
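The two-pointer idea can be sketched in Java as follows. This is a minimal sketch rather than the author's original listing; the class name `TwoPointerSubsequence` is hypothetical and chosen only for illustration.

```java
// Hypothetical class name, used only for this sketch.
class TwoPointerSubsequence {
    // Returns true if s is a subsequence of t.
    static boolean isSubsequence(String s, String t) {
        int i = 0, j = 0;
        while (i < s.length() && j < t.length()) {
            // i advances only when the current characters match
            if (s.charAt(i) == t.charAt(j)) i++;
            // j advances on every step
            j++;
        }
        // true when every character of s was matched, in order
        return i == s.length();
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("abc", "ahbgdc")); // true
        System.out.println(isSubsequence("axc", "ahbgdc")); // false
    }
}
```

Each call scans `t` at most once, which is where the O(N) bound comes from.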
You may ask whether this is already the optimal solution: its time complexity is only O(N), where N is the length of `t`.

Indeed, for this problem alone the solution is good enough. __However, there is a follow-up__:
Given a list of strings `s1, s2, ...` and a string `t`, determine whether each string `s` is a subsequence of `t` (assume each `s` is short and `t` is very long).

    boolean[] isSubsequence(String[] sn, String t);

You might think this is easy: reuse the same logic inside a `for` loop. That works, but each `s` still costs O(N). With binary search, after a one-time O(N) preprocessing pass over `t`, the cost per string drops to roughly O(MlogN), where M is the length of `s`. Since N is much larger than M, this is significantly more efficient.
### 2. Using Binary Search
The binary search approach starts by preprocessing `t`: a dictionary `index` stores, for each character, the positions where it occurs in `t`, in increasing order.

    int m = s.length(), n = t.length();
    // assumes every character has a code below 256
    ArrayList<Integer>[] index = new ArrayList[256];
    // record the indices of each character in t
    for (int i = 0; i < n; i++) {
        char c = t.charAt(i);
        if (index[c] == null)
            index[c] = new ArrayList<>();
        index[c].add(i);
    }
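To make the preprocessing concrete, here is a small self-contained sketch that builds the position lists for one sample string. The class name `IndexDemo`, the helper `buildIndex`, and the sample string `"cacbhbc"` are all assumptions for illustration; the string was picked so that the positions of `c` come out as `[0, 2, 6]`, the list used in the example that follows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class IndexDemo {
    // Builds index[c] = positions of character c in t, in increasing order.
    // Assumes every character of t has a code below 256.
    @SuppressWarnings("unchecked")
    static List<Integer>[] buildIndex(String t) {
        List<Integer>[] index = new List[256];
        for (int i = 0; i < t.length(); i++) {
            char c = t.charAt(i);
            if (index[c] == null) index[c] = new ArrayList<>();
            index[c].add(i);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Integer>[] index = buildIndex("cacbhbc");
        System.out.println(index['a']); // [1]
        System.out.println(index['b']); // [3, 5]
        System.out.println(index['c']); // [0, 2, 6]
    }
}
```

Because `t` is scanned from left to right, every list is already sorted, which is what makes binary search on it possible.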
For example, suppose "ab" has already been matched, so the next character to match is "c". With the first solution, `j` has to scan forward linearly to find "c". With the information in `index`, __we can instead binary search `index["c"]` for the smallest index that is not less than `j`__ (`j` is the first position of `t` that has not been consumed yet). In this example, `index["c"]` is `[0, 2, 6]` and `j` is 4, so we look for the smallest index in `[0, 2, 6]` that is at least 4, which is 6.

This gives the index of the next "c" directly. The remaining question is how to compute that index with binary search. The answer is the binary search that finds the left boundary.
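As a quick check of the `[0, 2, 6]` example, the same lookup can be sketched with the standard library's `Collections.binarySearch`. This is a substitute for the hand-written left-boundary search, not the article's method, and it relies on the list having no duplicates (true for position lists). The names `NextOccurrence` and `firstAtLeast` are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class NextOccurrence {
    // Returns the index of the first element >= target in a sorted list
    // without duplicates, or sorted.size() if there is none.
    static int firstAtLeast(List<Integer> sorted, int target) {
        int r = Collections.binarySearch(sorted, target);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return r >= 0 ? r : -r - 1;
    }

    public static void main(String[] args) {
        List<Integer> positions = Arrays.asList(0, 2, 6);
        int pos = firstAtLeast(positions, 4);
        System.out.println(pos);                // 2
        System.out.println(positions.get(pos)); // 6
    }
}
```

When the returned position equals the list size, no occurrence remains at or after `j`, so the match fails.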
### 3. More about Binary Search

The article [Detailed Binary Search](../think_like_computer/Detailed%20Binary%20Search.md) explains in detail how to implement three variants of binary search. The variant that searches for __the left boundary__ of a target `val` has a special property:

__When `val` does not exist, the returned index is the index of the smallest element greater than `val`__.

For example, searching for 2 in the array `[0,1,3,4]` returns index 2, the position of element 3, which is the smallest element greater than 2. If every element is smaller than `val`, the returned index equals the length of the array. Hence, we can use binary search to avoid a linear scan.

    // binary search to find the left boundary:
    // returns the index of the first element >= tar,
    // or arr.size() if every element is smaller than tar
    int left_bound(ArrayList<Integer> arr, int tar) {
        int lo = 0, hi = arr.size();
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (arr.get(mid) < tar) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
The function above is the left-boundary binary search; see [Detailed Binary Search](../think_like_computer/Detailed%20Binary%20Search.md) for the details. Let's apply it.
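Before applying it, the left-boundary property can be checked on the `[0,1,3,4]` example with a small standalone sketch. The class name `LeftBoundDemo` and the use of a plain `int[]` instead of an `ArrayList` are assumptions made to keep the example short.

```java
// Hypothetical class name, used only for this sketch.
class LeftBoundDemo {
    // Returns the index of the first element >= target,
    // or arr.length if every element is smaller than target.
    static int leftBound(int[] arr, int target) {
        int lo = 0, hi = arr.length;   // search the half-open range [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (arr[mid] < target) lo = mid + 1; // answer is right of mid
            else hi = mid;                       // mid may be the answer
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] arr = {0, 1, 3, 4};
        System.out.println(leftBound(arr, 2)); // 2 (element 3)
        System.out.println(leftBound(arr, 3)); // 2 (exact match)
        System.out.println(leftBound(arr, 5)); // 4 (past the end)
        System.out.println(leftBound(new int[]{1, 2, 2, 2, 3}, 2)); // 1
    }
}
```

The last call shows why this variant is called the left boundary: with duplicates, it returns the leftmost matching position.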
### 4. Implementation

The code below handles a single string `s`. For multiple strings, the preprocessing of `t` can be factored out and done once.

    boolean isSubsequence(String s, String t) {
        int m = s.length(), n = t.length();
        // pre-process t: index[c] lists the positions of c in t, in increasing order
        // (assumes every character has a code below 256)
        ArrayList<Integer>[] index = new ArrayList[256];
        for (int i = 0; i < n; i++) {
            char c = t.charAt(i);
            if (index[c] == null)
                index[c] = new ArrayList<>();
            index[c].add(i);
        }

        // the pointer in t
        int j = 0;
        // find s[i] using index
        for (int i = 0; i < m; i++) {
            char c = s.charAt(i);
            // character c does not exist in t
            if (index[c] == null) return false;
            int pos = left_bound(index[c], j);
            // no occurrence of c at or after position j
            if (pos == index[c].size()) return false;
            // move pointer j just past the matched position
            j = index[c].get(pos) + 1;
        }
        return true;
    }
The gif below illustrates how the algorithm executes:

As this shows, binary search improves efficiency significantly: once `t` has been indexed, each `s` is checked in about O(MlogN) time instead of O(N).
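For the follow-up with many query strings, the preprocessing can be pulled out so that `t` is indexed only once. The sketch below is one possible arrangement, not the author's code: the class `SubsequenceMatcher` and its method names are hypothetical, and it uses a `HashMap` in place of the 256-slot array so that characters with codes of 256 or above also work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: indexes t once, then answers many queries.
class SubsequenceMatcher {
    // character -> positions of that character in t, in increasing order
    private final Map<Character, List<Integer>> index = new HashMap<>();

    SubsequenceMatcher(String t) {
        for (int i = 0; i < t.length(); i++) {
            index.computeIfAbsent(t.charAt(i), k -> new ArrayList<>()).add(i);
        }
    }

    // Checks one string in about O(M log N) time.
    boolean isSubsequence(String s) {
        int j = 0; // first position of t that has not been consumed yet
        for (int i = 0; i < s.length(); i++) {
            List<Integer> positions = index.get(s.charAt(i));
            if (positions == null) return false;              // never occurs in t
            int pos = leftBound(positions, j);
            if (pos == positions.size()) return false;        // no occurrence at or after j
            j = positions.get(pos) + 1;                       // step past the match
        }
        return true;
    }

    // Checks every string in sn against the same indexed t.
    boolean[] isSubsequence(String[] sn) {
        boolean[] result = new boolean[sn.length];
        for (int k = 0; k < sn.length; k++) result[k] = isSubsequence(sn[k]);
        return result;
    }

    // Index of the first element >= target, or arr.size() if none.
    private static int leftBound(List<Integer> arr, int target) {
        int lo = 0, hi = arr.size();
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (arr.get(mid) < target) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

A caller would construct `new SubsequenceMatcher(t)` once and then pass in the whole array of query strings, so the O(N) indexing cost is shared across all of them.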
