如何找到包含给定字符串中所有字符的最小子字符串?

时间:2022-09-07 00:26:52

I have recently come across an interesting question on strings. Suppose you are given following:

我最近遇到一个关于字符串的有趣问题。假设你有以下条件:

Input string1: "this is a test string"
Input string2: "tist"
Output string: "t stri"

So, given above, how can I approach towards finding smallest substring of string1 that contains all the characters from string 2?

那么,在上面的例子中,如何找到包含字符串2中所有字符的string1的最小子字符串呢?

12 个解决方案

#1


32  

You can do a histogram sweep in O(N+M) time and O(1) space where N is the number of characters in the first string and M is the number of characters in the second.

可以在O(N+M)时间和O(1)空间中进行直方图扫描,其中N是第一个字符串中的字符数,M是第二个字符串中的字符数。

It works like this:

是这样的:

  • Make a histogram of the second string's characters (key operation is hist2[ s2[i] ]++).
  • 制作第二个字符串字符的直方图(关键操作是hist2[s2[i]]]++ +)。
  • Make a cumulative histogram of the first string's characters until that histogram contains every character that the second string's histogram contains (which I will call "the histogram condition").
  • 对第一个字符串的字符进行累积直方图,直到该直方图包含第二个字符串的直方图所包含的每个字符(我称之为“直方图条件”)。
  • Then move forwards on the first string, subtracting from the histogram, until it fails to meet the histogram condition. Mark that bit of the first string (before the final move) as your tentative substring.
  • 然后在第一个字符串上向前移动,从直方图中减去直方图,直到它不能满足直方图条件。将第一个字符串的那一点(在最后一步之前)标记为您的暂定子字符串。
  • Move the front of the substring forwards again until you meet the histogram condition again. Move the end forwards until it fails again. If this is a shorter substring than the first, mark that as your tentative substring.
  • 再次向前移动子字符串的前部,直到再次满足直方图条件。将末端向前移动,直到它再次失败。如果这是一个比第一个短的子字符串,那么将其标记为您的暂定子字符串。
  • Repeat until you've passed through the entire first string.
  • 重复,直到你通过了整个第一个字符串。
  • The marked substring is your answer.
  • 标记的子字符串是你的答案。

Note that by varying the check you use on the histogram condition, you can choose either to have the same set of characters as the second string, or at least as many characters of each type. (Its just the difference between a[i]>0 && b[i]>0 and a[i]>=b[i].)

注意,通过改变在直方图条件下使用的检查,您可以选择具有与第二个字符串相同的字符集,或者至少具有每种类型相同的字符数。(这只是a[i]> & b[i]>0和a[i]>=b[i]之间的差异。)

You can speed up the histogram checks if you keep a track of which condition is not satisfied when you're trying to satisfy it, and checking only the thing that you decrement when you're trying to break it. (On the initial buildup, you count how many items you've satisfied, and increment that count every time you add a new character that takes the condition from false to true.)

如果你跟踪哪个条件在你试图满足它的时候不满足,你可以加快直方图检查,并且只检查你想要破坏它的时候衰减的东西。(在初始的构建过程中,您可以计算您已经满足了多少项,并在每次添加一个新字符时,将该条件从false改为true。)

#2


37  

To see more details including working code, check my blog post at:

要了解更多的细节,包括工作代码,请查看我的博客文章:

http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html

http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html

To help illustrate this approach, I use an example: string1 = "acbbaca" and string2 = "aba". Here, we also use the term "window", which means a contiguous block of characters from string1 (could be interchanged with the term substring).

为了说明这种方法,我使用了一个示例:string1 = "acbbaca"和string2 = "aba"。在这里,我们还使用术语“窗口”,它表示来自string1的连续字符块(可以与术语子字符串互换)。

如何找到包含给定字符串中所有字符的最小子字符串?

i) string1 = "acbbaca" and string2 = "aba".

i) string1 = "acbbaca"和string2 = "aba"。

如何找到包含给定字符串中所有字符的最小子字符串?

ii) The first minimum window is found. Notice that we cannot advance begin pointer as hasFound['a'] == needToFind['a'] == 2. Advancing would mean breaking the constraint.

ii)找到第一个最小窗口。注意,我们不能像hasFound['a'] = needToFind['a'] = 2那样提前开始指针。前进意味着打破约束。

如何找到包含给定字符串中所有字符的最小子字符串?

iii) The second window is found. begin pointer still points to the first element 'a'. hasFound['a'] (3) is greater than needToFind['a'] (2). We decrement hasFound['a'] by one and advance begin pointer to the right.

iii)找到第二个窗口。begin pointer仍然指向第一个元素“a”。已经找到了[a](3)比需要找到[a](2)要大。我们已经找到[a]一个,提前开始指针指向右。

如何找到包含给定字符串中所有字符的最小子字符串?

iv) We skip 'c' since it is not found in string2. Begin pointer now points to 'b'. hasFound['b'] (2) is greater than needToFind['b'] (1). We decrement hasFound['b'] by one and advance begin pointer to the right.

iv)我们跳过c,因为它在string2中不存在。现在开始指针指向“b”。发现['b'](2)比需要找到['b'](1)更大。我们减少了一个['b'],并且提前开始指向正确的指针。

如何找到包含给定字符串中所有字符的最小子字符串?

v) Begin pointer now points to the next 'b'. hasFound['b'] (1) is equal to needToFind['b'] (1). We stop immediately and this is our newly found minimum window.

v)开始指针现在指向下一个b。已经找到['b'] (1) = needToFind['b'](1).我们立即停止,这是我们新找到的最小窗口。

The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing string1. needToFind stores the total count of a character in string2 and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in string2 that's met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals string2's length, we know a valid window is found.

这个想法主要基于在遍历string1时两个指针(窗口的开始和结束位置)和两个表(needToFind和hasFound)的帮助。needToFind存储string2中一个字符的总数,并且已经找到了一个字符的总数。我们还使用一个count变量来存储到目前为止在string2中遇到的所有字符(不包括在hasFound[x]超过needToFind[x]的字符)。当count等于string2的长度时,我们知道找到了一个有效的窗口。

Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to string2's size), we immediately advance begin pointer as far right as possible while maintaining the constraint.

每次我们向前进结束指针(指向一个元素x),我们就会增加一个。如果hasFound[x]小于或等于needToFind[x],我们也将计数增加1。为什么?当满足约束时(也就是说,count等于string2的大小),我们立即在维护约束的同时尽可能地向右推进begin指针。

How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.

如何检查它是否维护了约束?假设begin指向元素x,我们检查hasFound[x]是否大于needToFind[x]。如果是的话,我们可以一个递减hasFound[x]并在不破坏约束的情况下提前开始指针。另一方面,如果不是,我们会立即停止,因为向前的开始指针会破坏窗口约束。

Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.

最后,我们检查窗口的最小长度是否小于当前的最小长度。如果发现新的最小值,则更新当前的最小值。

Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout.

本质上,算法找到第一个满足约束的窗口,然后在整个过程中继续保持约束。

#3


6  

Here's an O(n) solution. The basic idea is simple: for each starting index, find the least ending index such that the substring contains all of the necessary letters. The trick is that the least ending index increases over the course of the function, so with a little data structure support, we consider each character at most twice.

这是一个O(n)的解决方案。基本思想很简单:对于每个起始索引,查找最小结束索引,以便子字符串包含所有必要的字母。关键在于,在函数的过程中,最小结尾的索引会增加,因此,在少量的数据结构支持下,我们最多考虑两个字符。

In Python:

在Python中:

from collections import defaultdict

def smallest(s1, s2):
    assert s2 != ''
    d = defaultdict(int)
    nneg = [0]  # number of negative entries in d
    def incr(c):
        d[c] += 1
        if d[c] == 0:
            nneg[0] -= 1
    def decr(c):
        if d[c] == 0:
            nneg[0] += 1
        d[c] -= 1
    for c in s2:
        decr(c)
    minlen = len(s1) + 1
    j = 0
    for i in xrange(len(s1)):
        while nneg[0] > 0:
            if j >= len(s1):
                return minlen
            incr(s1[j])
            j += 1
        minlen = min(minlen, j - i)
        decr(s1[i])
    return minlen

#4


2  

I received the same interview question. I am a C++ candidate but I was in a position to code relatively fast in JAVA.

我也收到了同样的面试问题。我是一个c++的候选人,但是我在JAVA中代码相对较快。

Java [Courtesy : Sumod Mathilakath]

Java[提供:Sumod Mathilakath]

import java.io.*;
import  java.util.*;

class UserMainCode
{


    public String GetSubString(String input1,String input2){
        // Write code here...
        return find(input1, input2);
    }
  private static boolean containsPatternChar(int[] sCount, int[] pCount) {
        for(int i=0;i<256;i++) {
            if(pCount[i]>sCount[i])
                return false;
        }
        return true;
    }
  public static String find(String s, String p) {
        if (p.length() > s.length())
            return null;
        int[] pCount = new int[256];
        int[] sCount = new int[256];
        // Time: O(p.lenght)
        for(int i=0;i<p.length();i++) {
            pCount[(int)(p.charAt(i))]++;
            sCount[(int)(s.charAt(i))]++;
        }
        int i = 0, j = p.length(), min = Integer.MAX_VALUE;
        String res = null;
        // Time: O(s.lenght)
        while (j < s.length()) {
            if (containsPatternChar(sCount, pCount)) {
                if ((j - i) < min) {
                    min = j - i;
                    res = s.substring(i, j);
                    // This is the smallest possible substring.
                    if(min==p.length())
                        break;
                    // Reduce the window size.
                    sCount[(int)(s.charAt(i))]--;
                    i++;
                }
            } else {
                sCount[(int)(s.charAt(j))]++;
                // Increase the window size.
                j++;
            }
        }
        System.out.println(res);
        return res;
    }
}

C++ [Courtesy : sundeepblue]

c++(礼貌sundeepblue):

#include <iostream>
#include <vector>
#include <string>
#include <climits>
using namespace std;
string find_minimum_window(string s, string t) {
    if(s.empty() || t.empty()) return;

    int ns = s.size(), nt = t.size();
    vector<int> total(256, 0);
    vector<int> sofar(256, 0);
    for(int i=0; i<nt; i++) 
        total[t[i]]++;

    int L = 0, R; 
    int minL = 0;                           //gist2
    int count = 0;
    int min_win_len = INT_MAX;

    for(R=0; R<ns; R++) {                   // gist0, a big for loop
        if(total[s[R]] == 0) continue;
        else sofar[s[R]]++;

        if(sofar[s[R]] <= total[s[R]])      // gist1, <= not <
            count++;

        if(count == nt) {                   // POS1
            while(true) {
                char c = s[L]; 
                if(total[c] == 0) { L++; }
                else if(sofar[c] > total[c]) {
                    sofar[c]--;
                    L++;
                }
                else break;
            }  
            if(R - L + 1 < min_win_len) {   // this judge should be inside POS1
                min_win_len = R - L + 1;
                minL = L;
            }
        }
    }
    string res;
    if(count == nt)                         // gist3, cannot forget this. 
        res = s.substr(minL, min_win_len);  // gist4, start from "minL" not "L"
    return res;
}
int main() {
    string s = "abdccdedca";
    cout << find_minimum_window(s, "acd");
}

Erlang [Courtesy : wardbekker]

Erlang(礼貌wardbekker):

-module(leetcode).

-export([min_window/0]).

%% Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

%% For example,
%% S = "ADOBECODEBANC"
%% T = "ABC"
%% Minimum window is "BANC".

%% Note:
%% If there is no such window in S that covers all characters in T, return the emtpy string "".
%% If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.



min_window() ->
    "eca" = min_window("cabeca", "cae"),
    "eca" = min_window("cfabeca", "cae"),
    "aec" = min_window("cabefgecdaecf", "cae"),
    "cwae" = min_window("cabwefgewcwaefcf", "cae"),
    "BANC" = min_window("ADOBECODEBANC", "ABC"),
    ok.

min_window(T, S) ->
    min_window(T, S, []).

min_window([], _T, MinWindow) ->
    MinWindow;
min_window([H | Rest], T, MinWindow) ->
    NewMinWindow = case lists:member(H, T) of
                       true ->
                           MinWindowFound = fullfill_window(Rest, lists:delete(H, T), [H]),
                           case length(MinWindow) == 0 orelse (length(MinWindow) > length(MinWindowFound)
                               andalso length(MinWindowFound) > 0) of
                               true ->
                                   MinWindowFound;
                               false ->
                                   MinWindow
                           end;
                       false ->
                           MinWindow
                   end,
    min_window(Rest, T, NewMinWindow).

fullfill_window(_, [], Acc) ->
    %% window completed
    Acc;
fullfill_window([], _T, _Acc) ->
    %% no window found
    "";
fullfill_window([H | Rest], T, Acc) ->
    %% completing window
    case lists:member(H, T) of
        true ->
            fullfill_window(Rest, lists:delete(H, T), Acc ++ [H]);
        false ->
            fullfill_window(Rest, T, Acc ++ [H])
    end.

REF:

裁判:

#5


1  

Please have a look at this as well:

请大家也来看看这个:

//-----------------------------------------------------------------------

bool IsInSet(char ch, char* cSet)
{
    char* cSetptr = cSet;
    int index = 0;
    while (*(cSet+ index) != '\0')
    {
        if(ch == *(cSet+ index))
        {
            return true;            
        }
        ++index;
    }
    return false;
}

void removeChar(char ch, char* cSet)
{
    bool bShift = false;
    int index = 0;
    while (*(cSet + index) != '\0')
    {
        if( (ch == *(cSet + index)) || bShift)
        {
            *(cSet + index) = *(cSet + index + 1);
            bShift = true;
        }
        ++index;
    }
}
typedef struct subStr
{
    short iStart;
    short iEnd;
    short szStr;
}ss;

char* subStringSmallest(char* testStr, char* cSet)
{
    char* subString = NULL;
    int iSzSet = strlen(cSet) + 1;
    int iSzString = strlen(testStr)+ 1;
    char* cSetBackUp = new char[iSzSet];
    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);

    int iStartIndx = -1;    
    int iEndIndx = -1;
    int iIndexStartNext = -1;

    std::vector<ss> subStrVec;
    int index = 0;

    while( *(testStr+index) != '\0' )
    {
        if (IsInSet(*(testStr+index), cSetBackUp))
        {
            removeChar(*(testStr+index), cSetBackUp);

            if(iStartIndx < 0)
            {
                iStartIndx = index;
            }
            else if( iIndexStartNext < 0)
                iIndexStartNext = index;
            else
                ;

            if (strlen(cSetBackUp) == 0 )
            {
                iEndIndx = index;
                if( iIndexStartNext == -1)
                    break;
                else
                {
                    index = iIndexStartNext;
                    ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)};
                    subStrVec.push_back(stemp);
                    iStartIndx = iEndIndx = iIndexStartNext = -1;
                    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
                    continue;
                }
            }
        }
        else
        {
            if (IsInSet(*(testStr+index), cSet))
            {
                if(iIndexStartNext < 0)
                    iIndexStartNext = index;
            }
        }

        ++index;
    }


    int indexSmallest = 0;
    for(int indexVec = 0; indexVec < subStrVec.size(); ++indexVec)
    {
        if(subStrVec[indexSmallest].szStr > subStrVec[indexVec].szStr)
            indexSmallest = indexVec;       
    }

    subString = new char[(subStrVec[indexSmallest].szStr) + 1];
    memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr);
    memset((void*)(subString + subStrVec[indexSmallest].szStr), 0, 1);

    delete[] cSetBackUp;
    return subString;
}
//--------------------------------------------------------------------

#6


0  

Edit: apparently there's an O(n) algorithm (cf. algorithmist's answer). Obviously this have this will beat the [naive] baseline described below!

编辑:显然有一个O(n)算法。显然,这将击败下面描述的[天真的]基线!

Too bad I gotta go... I'm a bit suspicious that we can get O(n). I'll check in tomorrow to see the winner ;-) Have fun!

真遗憾我得走了……我有点怀疑我们能得到O(n)我明天去看获胜者;-)玩得开心!

Tentative algorithm:
The general idea is to sequentially try and use a character from str2 found in str1 as the start of a search (in either/both directions) of all the other letters of str2. By keeping a "length of best match so far" value, we can abort searches when they exceed this. Other heuristics can probably be used to further abort suboptimal (so far) solutions. The choice of the order of the starting letters in str1 matters much; it is suggested to start with the letter(s) of str1 which have the lowest count and to try with the other letters, of an increasing count, in subsequent attempts.

暂定算法:一般的想法是顺序地尝试并使用str1中找到的str2中的字符作为所有str2的其他字母的搜索(在任意/两个方向)的开始。通过保持“到目前为止最佳匹配的长度”值,我们可以在搜索超过这个值时终止搜索。其他启发式可能用于进一步终止次优(到目前为止)解决方案。str1中起始字母的顺序的选择很重要;建议从str1的字母(s)开始,它具有最低的计数,并在随后的尝试中使用其他字母,增加计数。

  [loose pseudo-code]
  - get count for each letter/character in str1  (number of As, Bs etc.)
  - get count for each letter in str2
  - minLen = length(str1) + 1  (the +1 indicates you're not sure all chars of 
                                str2 are in str1)
  - Starting with the letter from string2 which is found the least in string1,
    look for other letters of Str2, in either direction of str1, until you've 
    found them all (or not, at which case response = impossible => done!). 
    set x = length(corresponding substring of str1).
 - if (x < minLen), 
         set minlen = x, 
         also memorize the start/len of the str1 substring.
 - continue trying with other letters of str1 (going the up the frequency
   list in str1), but abort search as soon as length(substring of strl) 
   reaches or exceed minLen.  
   We can find a few other heuristics that would allow aborting a 
   particular search, based on [pre-calculated ?] distance between a given
   letter in str1 and some (all?) of the letters in str2.
 - the overall search terminates when minLen = length(str2) or when 
   we've used all letters of str1 (which match one letter of str2)
   as a starting point for the search

#7


0  

Here is Java implementation

这是Java实现

public static String shortestSubstrContainingAllChars(String input, String target) {
    int needToFind[] = new int[256];
    int hasFound[] = new int[256];
    int totalCharCount = 0;
    String result = null;

    char[] targetCharArray = target.toCharArray();
    for (int i = 0; i < targetCharArray.length; i++) {
        needToFind[targetCharArray[i]]++;           
    }

    char[] inputCharArray = input.toCharArray();
    for (int begin = 0, end = 0; end < inputCharArray.length; end++) {

        if (needToFind[inputCharArray[end]] == 0) {
            continue;
        }

        hasFound[inputCharArray[end]]++;
        if (hasFound[inputCharArray[end]] <= needToFind[inputCharArray[end]]) {
            totalCharCount ++;
        }
        if (totalCharCount == target.length()) {
            while (needToFind[inputCharArray[begin]] == 0 
                    || hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {

                if (hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
                    hasFound[inputCharArray[begin]]--;
                }
                begin++;
            }

            String substring = input.substring(begin, end + 1);
            if (result == null || result.length() > substring.length()) {
                result = substring;
            }
        }
    }
    return result;
}

Here is the Junit Test

这是Junit测试

@Test
public void shortestSubstringContainingAllCharsTest() {
    String result = StringUtil.shortestSubstrContainingAllChars("acbbaca", "aba");
    assertThat(result, equalTo("baca"));

    result = StringUtil.shortestSubstrContainingAllChars("acbbADOBECODEBANCaca", "ABC");
    assertThat(result, equalTo("BANC"));

    result = StringUtil.shortestSubstrContainingAllChars("this is a test string", "tist");
    assertThat(result, equalTo("t stri"));
}

#8


0  

//[ShortestSubstring.java][1]

public class ShortestSubstring {

    public static void main(String[] args) {
        String input1 = "My name is Fran";
        String input2 = "rim";
        System.out.println(getShortestSubstring(input1, input2));
    }

    private static String getShortestSubstring(String mainString, String toBeSearched) {

        int mainStringLength = mainString.length();
        int toBeSearchedLength = toBeSearched.length();

        if (toBeSearchedLength > mainStringLength) {
            throw new IllegalArgumentException("search string cannot be larger than main string");
        }

        for (int j = 0; j < mainStringLength; j++) {
            for (int i = 0; i <= mainStringLength - toBeSearchedLength; i++) {
                String substring = mainString.substring(i, i + toBeSearchedLength);
                if (checkIfMatchFound(substring, toBeSearched)) {
                    return substring;
                }
            }
            toBeSearchedLength++;
        }

        return null;
    }

    private static boolean checkIfMatchFound(String substring, String toBeSearched) {
        char[] charArraySubstring = substring.toCharArray();
        char[] charArrayToBeSearched = toBeSearched.toCharArray();
        int count = 0;

        for (int i = 0; i < charArraySubstring.length; i++) {
            for (int j = 0; j < charArrayToBeSearched.length; j++) {
                if (String.valueOf(charArraySubstring[i]).equalsIgnoreCase(String.valueOf(charArrayToBeSearched[j]))) {
                    count++;
                }
            }
        }
        return count == charArrayToBeSearched.length;
    }
}

#9


0  

This is an approach using prime numbers to avoid one loop, and replace it with multiplications. Several other minor optimizations can be made.

这是一种使用质数来避免一个循环的方法,并将其替换为乘法。还可以进行其他一些较小的优化。

  1. Assign a unique prime number to any of the characters that you want to find, and 1 to the uninteresting characters.

    为要查找的任何字符分配唯一的素数,为无趣的字符分配1。

  2. Find the product of a matching string by multiplying the prime number with the number of occurrences it should have. Now this product can only be found if the same prime factors are used.

    通过将质数与它应有的出现次数相乘,找到匹配字符串的乘积。现在只有使用相同的素因子才能找到这个乘积。

  3. Search the string from the beginning, multiplying the respective prime number as you move into a running product.

    从一开始搜索字符串,在进入一个正在运行的产品时将相应的质数相乘。

  4. If the number is greater than the correct sum, remove the first character and divide its prime number out of your running product.

    如果数字大于正确的和,则删除第一个字符,并将其素数从运行产品中除去。

  5. If the number is less than the correct sum, include the next character and multiply it into your running product.

    如果数字小于正确的和,包含下一个字符,并将其相乘到运行产品中。

  6. If the number is the same as the correct sum you have found a match, slide beginning and end to next character and continue searching for other matches.

    如果数字与找到匹配的正确和相同,则从开始到结束滑动到下一个字符,并继续搜索其他匹配。

  7. Decide which of the matches is the shortest.

    决定哪一个匹配是最短的。

Gist

要点

charcount = { 'a': 3, 'b' : 1 };
str = "kjhdfsbabasdadaaaaasdkaaajbajerhhayeom"

def find (c, s):
  Ns = len (s)

  C = list (c.keys ())
  D = list (c.values ())

  # prime numbers assigned to the first 25 chars
  prmsi = [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 , 97]

  # primes used in the key, all other set to 1
  prms = []
  Cord = [ord(c) - ord('a') for c in C]

  for e,p in enumerate(prmsi):
    if e in Cord:
      prms.append (p)
    else:
      prms.append (1)

  # Product of match
  T = 1
  for c,d in zip(C,D):
    p = prms[ord (c) - ord('a')]
    T *= p**d

  print ("T=", T)

  t = 1 # product of current string
  f = 0
  i = 0

  matches = []
  mi = 0
  mn = Ns
  mm = 0

  while i < Ns:
    k = prms[ord(s[i]) - ord ('a')]
    t *= k

    print ("testing:", s[f:i+1])

    if (t > T):
      # included too many chars: move start
      t /= prms[ord(s[f]) - ord('a')] # remove first char, usually division by 1
      f += 1 # increment start position
      t /= k # will be retested, could be replaced with bool

    elif t == T:
      # found match
      print ("FOUND match:", s[f:i+1])
      matches.append (s[f:i+1])

      if (i - f) < mn:
        mm = mi
        mn = i - f

      mi += 1

      t /= prms[ord(s[f]) - ord('a')] # remove first matching char

      # look for next match
      i += 1
      f += 1

    else:
      # no match yet, keep searching
      i += 1

  return (mm, matches)


print (find (charcount, str))

(note: this answer was originally posted to a duplicate question, the original answer is now deleted.)

(注意:这个答案本来是贴在一个重复的问题上的,现在原答案被删除了。)

#10


0  

C# Implementation:

c#实现:

public static Tuple<int, int> FindMinSubstringWindow(string input, string pattern)
{
    Tuple<int, int> windowCoords = new Tuple<int, int>(0, input.Length - 1);
    int[] patternHist = new int[256];
    for (int i = 0; i < pattern.Length; i++)
    {
        patternHist[pattern[i]]++;
    }
    int[] inputHist = new int[256];
    int minWindowLength = int.MaxValue;
    int count = 0;
    for (int begin = 0, end = 0; end < input.Length; end++)
    {
        // Skip what's not in pattern.
        if (patternHist[input[end]] == 0)
        {
            continue;
        }
        inputHist[input[end]]++;
        // Count letters that are in pattern.
        if (inputHist[input[end]] <= patternHist[input[end]])
        {
            count++;
        }
        // Window found.
        if (count == pattern.Length)
        {
            // Remove extra instances of letters from pattern
            // or just letters that aren't part of the pattern
            // from the beginning.
            while (patternHist[input[begin]] == 0 ||
                   inputHist[input[begin]] > patternHist[input[begin]])
            {
                if (inputHist[input[begin]] > patternHist[input[begin]])
                {
                    inputHist[input[begin]]--;
                }
                begin++;
            }
            // Current window found.
            int windowLength = end - begin + 1;
            if (windowLength < minWindowLength)
            {
                windowCoords = new Tuple<int, int>(begin, end);
                minWindowLength = windowLength;
            }
        }
    }
    if (count == pattern.Length)
    {
        return windowCoords;
    }
    return null;
}

#11


-1  

Java code for the approach discussed above:

上述方法的Java代码:

private static Map<Character, Integer> frequency;
private static Set<Character> charsCovered;
private static Map<Character, Integer> encountered;
/**
 * To set the first match index as an intial start point
 */
private static boolean hasStarted = false;
private static int currentStartIndex = 0;
private static int finalStartIndex = 0;
private static int finalEndIndex = 0;
private static int minLen = Integer.MAX_VALUE;
private static int currentLen = 0;
/**
 * Whether we have already found the match and now looking for other
 * alternatives.
 */
private static boolean isFound = false;
private static char currentChar;

public static String findSmallestSubStringWithAllChars(String big, String small) {

    if (null == big || null == small || big.isEmpty() || small.isEmpty()) {
        return null;
    }

    frequency = new HashMap<Character, Integer>();
    instantiateFrequencyMap(small);
    charsCovered = new HashSet<Character>();
    int charsToBeCovered = frequency.size();
    encountered = new HashMap<Character, Integer>();

    for (int i = 0; i < big.length(); i++) {
        currentChar = big.charAt(i);
        if (frequency.containsKey(currentChar) && !isFound) {
            if (!hasStarted && !isFound) {
                hasStarted = true;
                currentStartIndex = i;
            }
            updateEncounteredMapAndCharsCoveredSet(currentChar);
            if (charsCovered.size() == charsToBeCovered) {
                currentLen = i - currentStartIndex;
                isFound = true;
                updateMinLength(i);
            }
        } else if (frequency.containsKey(currentChar) && isFound) {
            updateEncounteredMapAndCharsCoveredSet(currentChar);
            if (currentChar == big.charAt(currentStartIndex)) {
                encountered.put(currentChar, encountered.get(currentChar) - 1);
                currentStartIndex++;
                while (currentStartIndex < i) {
                    if (encountered.containsKey(big.charAt(currentStartIndex))
                            && encountered.get(big.charAt(currentStartIndex)) > frequency.get(big
                                    .charAt(currentStartIndex))) {
                        encountered.put(big.charAt(currentStartIndex),
                                encountered.get(big.charAt(currentStartIndex)) - 1);
                    } else if (encountered.containsKey(big.charAt(currentStartIndex))) {
                        break;
                    }
                    currentStartIndex++;
                }
            }
            currentLen = i - currentStartIndex;
            updateMinLength(i);
        }
    }
    System.out.println("start: " + finalStartIndex + " finalEnd : " + finalEndIndex);
    return big.substring(finalStartIndex, finalEndIndex + 1);
}

private static void updateMinLength(int index) {
    if (minLen > currentLen) {
        minLen = currentLen;
        finalStartIndex = currentStartIndex;
        finalEndIndex = index;
    }

}

private static void updateEncounteredMapAndCharsCoveredSet(Character currentChar) {
    if (encountered.containsKey(currentChar)) {
        encountered.put(currentChar, encountered.get(currentChar) + 1);
    } else {
        encountered.put(currentChar, 1);
    }

    if (encountered.get(currentChar) >= frequency.get(currentChar)) {
        charsCovered.add(currentChar);
    }
}

private static void instantiateFrequencyMap(String str) {

    for (char c : str.toCharArray()) {
        if (frequency.containsKey(c)) {
            frequency.put(c, frequency.get(c) + 1);
        } else {
            frequency.put(c, 1);
        }
    }

}

public static void main(String[] args) {

    String big = "this is a test string";
    String small = "tist";
    System.out.println("len: " + big.length());
    System.out.println(findSmallestSubStringWithAllChars(big, small));
}

#12


-1  

def minimum_window(s, t, min_length = 100000):
    d = {}
    for x in t:
        if x in d:
            d[x]+= 1
        else:
            d[x] = 1

    tot = sum([y for x,y in d.iteritems()])
    l = []
    ind = 0 
    for i,x in enumerate(s):
        if ind == 1:
            l = l + [x]
        if x in d:
            tot-=1
            if not l:
                ind = 1
                l = [x]

        if tot == 0:
            if len(l)<min_length:
                min_length = len(l)
                min_length = minimum_window(s[i+1:], t, min_length)

return min_length

l_s = "ADOBECODEBANC"
t_s = "ABC"

min_length = minimum_window(l_s, t_s)

if min_length == 100000:
      print "Not found"
else:
      print min_length

#1


32  

You can do a histogram sweep in O(N+M) time and O(1) space where N is the number of characters in the first string and M is the number of characters in the second.

可以在O(N+M)时间和O(1)空间中进行直方图扫描,其中N是第一个字符串中的字符数,M是第二个字符串中的字符数。

It works like this:

是这样的:

  • Make a histogram of the second string's characters (key operation is hist2[ s2[i] ]++).
  • 制作第二个字符串字符的直方图(关键操作是hist2[s2[i]]]++ +)。
  • Make a cumulative histogram of the first string's characters until that histogram contains every character that the second string's histogram contains (which I will call "the histogram condition").
  • 对第一个字符串的字符进行累积直方图,直到该直方图包含第二个字符串的直方图所包含的每个字符(我称之为“直方图条件”)。
  • Then move forwards on the first string, subtracting from the histogram, until it fails to meet the histogram condition. Mark that bit of the first string (before the final move) as your tentative substring.
  • 然后在第一个字符串上向前移动,从直方图中减去直方图,直到它不能满足直方图条件。将第一个字符串的那一点(在最后一步之前)标记为您的暂定子字符串。
  • Move the front of the substring forwards again until you meet the histogram condition again. Move the end forwards until it fails again. If this is a shorter substring than the first, mark that as your tentative substring.
  • 再次向前移动子字符串的前部,直到再次满足直方图条件。将末端向前移动,直到它再次失败。如果这是一个比第一个短的子字符串,那么将其标记为您的暂定子字符串。
  • Repeat until you've passed through the entire first string.
  • 重复,直到你通过了整个第一个字符串。
  • The marked substring is your answer.
  • 标记的子字符串是你的答案。

Note that by varying the check you use on the histogram condition, you can choose either to have the same set of characters as the second string, or at least as many characters of each type. (Its just the difference between a[i]>0 && b[i]>0 and a[i]>=b[i].)

注意,通过改变在直方图条件下使用的检查,您可以选择具有与第二个字符串相同的字符集,或者至少具有每种类型相同的字符数。(这只是a[i]> & b[i]>0和a[i]>=b[i]之间的差异。)

You can speed up the histogram checks if you keep a track of which condition is not satisfied when you're trying to satisfy it, and checking only the thing that you decrement when you're trying to break it. (On the initial buildup, you count how many items you've satisfied, and increment that count every time you add a new character that takes the condition from false to true.)

如果你跟踪哪个条件在你试图满足它的时候不满足,你可以加快直方图检查,并且只检查你想要破坏它的时候衰减的东西。(在初始的构建过程中,您可以计算您已经满足了多少项,并在每次添加一个新字符时,将该条件从false改为true。)

#2


37  

To see more details including working code, check my blog post at:

要了解更多的细节,包括工作代码,请查看我的博客文章:

http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html

http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html

To help illustrate this approach, I use an example: string1 = "acbbaca" and string2 = "aba". Here, we also use the term "window", which means a contiguous block of characters from string1 (could be interchanged with the term substring).

为了说明这种方法,我使用了一个示例:string1 = "acbbaca"和string2 = "aba"。在这里,我们还使用术语“窗口”,它表示来自string1的连续字符块(可以与术语子字符串互换)。

如何找到包含给定字符串中所有字符的最小子字符串?

i) string1 = "acbbaca" and string2 = "aba".

i) string1 = "acbbaca"和string2 = "aba"。

如何找到包含给定字符串中所有字符的最小子字符串?

ii) The first minimum window is found. Notice that we cannot advance begin pointer as hasFound['a'] == needToFind['a'] == 2. Advancing would mean breaking the constraint.

ii)找到第一个最小窗口。注意,我们不能像hasFound['a'] = needToFind['a'] = 2那样提前开始指针。前进意味着打破约束。

如何找到包含给定字符串中所有字符的最小子字符串?

iii) The second window is found. begin pointer still points to the first element 'a'. hasFound['a'] (3) is greater than needToFind['a'] (2). We decrement hasFound['a'] by one and advance begin pointer to the right.

iii)找到第二个窗口。begin pointer仍然指向第一个元素“a”。已经找到了[a](3)比需要找到[a](2)要大。我们已经找到[a]一个,提前开始指针指向右。

如何找到包含给定字符串中所有字符的最小子字符串?

iv) We skip 'c' since it is not found in string2. Begin pointer now points to 'b'. hasFound['b'] (2) is greater than needToFind['b'] (1). We decrement hasFound['b'] by one and advance begin pointer to the right.

iv)我们跳过c,因为它在string2中不存在。现在开始指针指向“b”。发现['b'](2)比需要找到['b'](1)更大。我们减少了一个['b'],并且提前开始指向正确的指针。

如何找到包含给定字符串中所有字符的最小子字符串?

v) Begin pointer now points to the next 'b'. hasFound['b'] (1) is equal to needToFind['b'] (1). We stop immediately and this is our newly found minimum window.

v)开始指针现在指向下一个b。已经找到['b'] (1) = needToFind['b'](1).我们立即停止,这是我们新找到的最小窗口。

The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing string1. needToFind stores the total count of a character in string2 and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in string2 that's met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals string2's length, we know a valid window is found.

这个想法主要基于在遍历string1时两个指针(窗口的开始和结束位置)和两个表(needToFind和hasFound)的帮助。needToFind存储string2中一个字符的总数,并且已经找到了一个字符的总数。我们还使用一个count变量来存储到目前为止在string2中遇到的所有字符(不包括在hasFound[x]超过needToFind[x]的字符)。当count等于string2的长度时,我们知道找到了一个有效的窗口。

Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to string2's size), we immediately advance begin pointer as far right as possible while maintaining the constraint.

每次我们向前进结束指针(指向一个元素x),我们就会增加一个。如果hasFound[x]小于或等于needToFind[x],我们也将计数增加1。为什么?当满足约束时(也就是说,count等于string2的大小),我们立即在维护约束的同时尽可能地向右推进begin指针。

How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.

如何检查它是否维护了约束?假设begin指向元素x,我们检查hasFound[x]是否大于needToFind[x]。如果是的话,我们可以一个递减hasFound[x]并在不破坏约束的情况下提前开始指针。另一方面,如果不是,我们会立即停止,因为向前的开始指针会破坏窗口约束。

Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.

最后,我们检查窗口的最小长度是否小于当前的最小长度。如果发现新的最小值,则更新当前的最小值。

Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout.

本质上,算法找到第一个满足约束的窗口,然后在整个过程中继续保持约束。

#3


6  

Here's an O(n) solution. The basic idea is simple: for each starting index, find the least ending index such that the substring contains all of the necessary letters. The trick is that the least ending index increases over the course of the function, so with a little data structure support, we consider each character at most twice.

这是一个O(n)的解决方案。基本思想很简单:对于每个起始索引,查找最小结束索引,以便子字符串包含所有必要的字母。关键在于,在函数的过程中,最小结尾的索引会增加,因此,在少量的数据结构支持下,我们最多考虑两个字符。

In Python:

在Python中:

from collections import defaultdict

def smallest(s1, s2):
    assert s2 != ''
    d = defaultdict(int)
    nneg = [0]  # number of negative entries in d
    def incr(c):
        d[c] += 1
        if d[c] == 0:
            nneg[0] -= 1
    def decr(c):
        if d[c] == 0:
            nneg[0] += 1
        d[c] -= 1
    for c in s2:
        decr(c)
    minlen = len(s1) + 1
    j = 0
    for i in xrange(len(s1)):
        while nneg[0] > 0:
            if j >= len(s1):
                return minlen
            incr(s1[j])
            j += 1
        minlen = min(minlen, j - i)
        decr(s1[i])
    return minlen

#4


2  

I received the same interview question. I am a C++ candidate but I was in a position to code relatively fast in JAVA.

我也收到了同样的面试问题。我是一个c++的候选人,但是我在JAVA中代码相对较快。

Java [Courtesy : Sumod Mathilakath]

Java[提供:Sumod Mathilakath]

import java.io.*;
import  java.util.*;

class UserMainCode
{


    public String GetSubString(String input1,String input2){
        // Write code here...
        return find(input1, input2);
    }
  private static boolean containsPatternChar(int[] sCount, int[] pCount) {
        for(int i=0;i<256;i++) {
            if(pCount[i]>sCount[i])
                return false;
        }
        return true;
    }
  public static String find(String s, String p) {
        if (p.length() > s.length())
            return null;
        int[] pCount = new int[256];
        int[] sCount = new int[256];
        // Time: O(p.lenght)
        for(int i=0;i<p.length();i++) {
            pCount[(int)(p.charAt(i))]++;
            sCount[(int)(s.charAt(i))]++;
        }
        int i = 0, j = p.length(), min = Integer.MAX_VALUE;
        String res = null;
        // Time: O(s.lenght)
        while (j < s.length()) {
            if (containsPatternChar(sCount, pCount)) {
                if ((j - i) < min) {
                    min = j - i;
                    res = s.substring(i, j);
                    // This is the smallest possible substring.
                    if(min==p.length())
                        break;
                    // Reduce the window size.
                    sCount[(int)(s.charAt(i))]--;
                    i++;
                }
            } else {
                sCount[(int)(s.charAt(j))]++;
                // Increase the window size.
                j++;
            }
        }
        System.out.println(res);
        return res;
    }
}

C++ [Courtesy : sundeepblue]

c++(礼貌sundeepblue):

#include <iostream>
#include <vector>
#include <string>
#include <climits>
using namespace std;
string find_minimum_window(string s, string t) {
    if(s.empty() || t.empty()) return;

    int ns = s.size(), nt = t.size();
    vector<int> total(256, 0);
    vector<int> sofar(256, 0);
    for(int i=0; i<nt; i++) 
        total[t[i]]++;

    int L = 0, R; 
    int minL = 0;                           //gist2
    int count = 0;
    int min_win_len = INT_MAX;

    for(R=0; R<ns; R++) {                   // gist0, a big for loop
        if(total[s[R]] == 0) continue;
        else sofar[s[R]]++;

        if(sofar[s[R]] <= total[s[R]])      // gist1, <= not <
            count++;

        if(count == nt) {                   // POS1
            while(true) {
                char c = s[L]; 
                if(total[c] == 0) { L++; }
                else if(sofar[c] > total[c]) {
                    sofar[c]--;
                    L++;
                }
                else break;
            }  
            if(R - L + 1 < min_win_len) {   // this judge should be inside POS1
                min_win_len = R - L + 1;
                minL = L;
            }
        }
    }
    string res;
    if(count == nt)                         // gist3, cannot forget this. 
        res = s.substr(minL, min_win_len);  // gist4, start from "minL" not "L"
    return res;
}
int main() {
    string s = "abdccdedca";
    cout << find_minimum_window(s, "acd");
}

Erlang [Courtesy : wardbekker]

Erlang(礼貌wardbekker):

-module(leetcode).

-export([min_window/0]).

%% Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

%% For example,
%% S = "ADOBECODEBANC"
%% T = "ABC"
%% Minimum window is "BANC".

%% Note:
%% If there is no such window in S that covers all characters in T, return the emtpy string "".
%% If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.



min_window() ->
    "eca" = min_window("cabeca", "cae"),
    "eca" = min_window("cfabeca", "cae"),
    "aec" = min_window("cabefgecdaecf", "cae"),
    "cwae" = min_window("cabwefgewcwaefcf", "cae"),
    "BANC" = min_window("ADOBECODEBANC", "ABC"),
    ok.

min_window(T, S) ->
    min_window(T, S, []).

min_window([], _T, MinWindow) ->
    MinWindow;
min_window([H | Rest], T, MinWindow) ->
    NewMinWindow = case lists:member(H, T) of
                       true ->
                           MinWindowFound = fullfill_window(Rest, lists:delete(H, T), [H]),
                           case length(MinWindow) == 0 orelse (length(MinWindow) > length(MinWindowFound)
                               andalso length(MinWindowFound) > 0) of
                               true ->
                                   MinWindowFound;
                               false ->
                                   MinWindow
                           end;
                       false ->
                           MinWindow
                   end,
    min_window(Rest, T, NewMinWindow).

fullfill_window(_, [], Acc) ->
    %% window completed
    Acc;
fullfill_window([], _T, _Acc) ->
    %% no window found
    "";
fullfill_window([H | Rest], T, Acc) ->
    %% completing window
    case lists:member(H, T) of
        true ->
            fullfill_window(Rest, lists:delete(H, T), Acc ++ [H]);
        false ->
            fullfill_window(Rest, T, Acc ++ [H])
    end.

REF:

裁判:

#5


1  

Please have a look at this as well:

请大家也来看看这个:

//-----------------------------------------------------------------------

bool IsInSet(char ch, char* cSet)
{
    char* cSetptr = cSet;
    int index = 0;
    while (*(cSet+ index) != '\0')
    {
        if(ch == *(cSet+ index))
        {
            return true;            
        }
        ++index;
    }
    return false;
}

void removeChar(char ch, char* cSet)
{
    bool bShift = false;
    int index = 0;
    while (*(cSet + index) != '\0')
    {
        if( (ch == *(cSet + index)) || bShift)
        {
            *(cSet + index) = *(cSet + index + 1);
            bShift = true;
        }
        ++index;
    }
}
typedef struct subStr
{
    short iStart;
    short iEnd;
    short szStr;
}ss;

char* subStringSmallest(char* testStr, char* cSet)
{
    char* subString = NULL;
    int iSzSet = strlen(cSet) + 1;
    int iSzString = strlen(testStr)+ 1;
    char* cSetBackUp = new char[iSzSet];
    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);

    int iStartIndx = -1;    
    int iEndIndx = -1;
    int iIndexStartNext = -1;

    std::vector<ss> subStrVec;
    int index = 0;

    while( *(testStr+index) != '\0' )
    {
        if (IsInSet(*(testStr+index), cSetBackUp))
        {
            removeChar(*(testStr+index), cSetBackUp);

            if(iStartIndx < 0)
            {
                iStartIndx = index;
            }
            else if( iIndexStartNext < 0)
                iIndexStartNext = index;
            else
                ;

            if (strlen(cSetBackUp) == 0 )
            {
                iEndIndx = index;
                if( iIndexStartNext == -1)
                    break;
                else
                {
                    index = iIndexStartNext;
                    ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)};
                    subStrVec.push_back(stemp);
                    iStartIndx = iEndIndx = iIndexStartNext = -1;
                    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
                    continue;
                }
            }
        }
        else
        {
            if (IsInSet(*(testStr+index), cSet))
            {
                if(iIndexStartNext < 0)
                    iIndexStartNext = index;
            }
        }

        ++index;
    }


    int indexSmallest = 0;
    for(int indexVec = 0; indexVec < subStrVec.size(); ++indexVec)
    {
        if(subStrVec[indexSmallest].szStr > subStrVec[indexVec].szStr)
            indexSmallest = indexVec;       
    }

    subString = new char[(subStrVec[indexSmallest].szStr) + 1];
    memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr);
    memset((void*)(subString + subStrVec[indexSmallest].szStr), 0, 1);

    delete[] cSetBackUp;
    return subString;
}
//--------------------------------------------------------------------

#6


0  

Edit: apparently there's an O(n) algorithm (cf. algorithmist's answer). Obviously this have this will beat the [naive] baseline described below!

编辑:显然有一个O(n)算法。显然,这将击败下面描述的[天真的]基线!

Too bad I gotta go... I'm a bit suspicious that we can get O(n). I'll check in tomorrow to see the winner ;-) Have fun!

真遗憾我得走了……我有点怀疑我们能得到O(n)我明天去看获胜者;-)玩得开心!

Tentative algorithm:
The general idea is to sequentially try and use a character from str2 found in str1 as the start of a search (in either/both directions) of all the other letters of str2. By keeping a "length of best match so far" value, we can abort searches when they exceed this. Other heuristics can probably be used to further abort suboptimal (so far) solutions. The choice of the order of the starting letters in str1 matters much; it is suggested to start with the letter(s) of str1 which have the lowest count and to try with the other letters, of an increasing count, in subsequent attempts.

暂定算法:一般的想法是顺序地尝试并使用str1中找到的str2中的字符作为所有str2的其他字母的搜索(在任意/两个方向)的开始。通过保持“到目前为止最佳匹配的长度”值,我们可以在搜索超过这个值时终止搜索。其他启发式可能用于进一步终止次优(到目前为止)解决方案。str1中起始字母的顺序的选择很重要;建议从str1的字母(s)开始,它具有最低的计数,并在随后的尝试中使用其他字母,增加计数。

  [loose pseudo-code]
  - get count for each letter/character in str1  (number of As, Bs etc.)
  - get count for each letter in str2
  - minLen = length(str1) + 1  (the +1 indicates you're not sure all chars of 
                                str2 are in str1)
  - Starting with the letter from string2 which is found the least in string1,
    look for other letters of Str2, in either direction of str1, until you've 
    found them all (or not, at which case response = impossible => done!). 
    set x = length(corresponding substring of str1).
 - if (x < minLen), 
         set minlen = x, 
         also memorize the start/len of the str1 substring.
 - continue trying with other letters of str1 (going the up the frequency
   list in str1), but abort search as soon as length(substring of strl) 
   reaches or exceed minLen.  
   We can find a few other heuristics that would allow aborting a 
   particular search, based on [pre-calculated ?] distance between a given
   letter in str1 and some (all?) of the letters in str2.
 - the overall search terminates when minLen = length(str2) or when 
   we've used all letters of str1 (which match one letter of str2)
   as a starting point for the search

#7


0  

Here is Java implementation

这是Java实现

public static String shortestSubstrContainingAllChars(String input, String target) {
    int needToFind[] = new int[256];
    int hasFound[] = new int[256];
    int totalCharCount = 0;
    String result = null;

    char[] targetCharArray = target.toCharArray();
    for (int i = 0; i < targetCharArray.length; i++) {
        needToFind[targetCharArray[i]]++;           
    }

    char[] inputCharArray = input.toCharArray();
    for (int begin = 0, end = 0; end < inputCharArray.length; end++) {

        if (needToFind[inputCharArray[end]] == 0) {
            continue;
        }

        hasFound[inputCharArray[end]]++;
        if (hasFound[inputCharArray[end]] <= needToFind[inputCharArray[end]]) {
            totalCharCount ++;
        }
        if (totalCharCount == target.length()) {
            while (needToFind[inputCharArray[begin]] == 0 
                    || hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {

                if (hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
                    hasFound[inputCharArray[begin]]--;
                }
                begin++;
            }

            String substring = input.substring(begin, end + 1);
            if (result == null || result.length() > substring.length()) {
                result = substring;
            }
        }
    }
    return result;
}

Here is the Junit Test

这是Junit测试

@Test
public void shortestSubstringContainingAllCharsTest() {
    String result = StringUtil.shortestSubstrContainingAllChars("acbbaca", "aba");
    assertThat(result, equalTo("baca"));

    result = StringUtil.shortestSubstrContainingAllChars("acbbADOBECODEBANCaca", "ABC");
    assertThat(result, equalTo("BANC"));

    result = StringUtil.shortestSubstrContainingAllChars("this is a test string", "tist");
    assertThat(result, equalTo("t stri"));
}

#8


0  

//[ShortestSubstring.java][1]

public class ShortestSubstring {

    public static void main(String[] args) {
        String input1 = "My name is Fran";
        String input2 = "rim";
        System.out.println(getShortestSubstring(input1, input2));
    }

    private static String getShortestSubstring(String mainString, String toBeSearched) {

        int mainStringLength = mainString.length();
        int toBeSearchedLength = toBeSearched.length();

        if (toBeSearchedLength > mainStringLength) {
            throw new IllegalArgumentException("search string cannot be larger than main string");
        }

        for (int j = 0; j < mainStringLength; j++) {
            for (int i = 0; i <= mainStringLength - toBeSearchedLength; i++) {
                String substring = mainString.substring(i, i + toBeSearchedLength);
                if (checkIfMatchFound(substring, toBeSearched)) {
                    return substring;
                }
            }
            toBeSearchedLength++;
        }

        return null;
    }

    private static boolean checkIfMatchFound(String substring, String toBeSearched) {
        char[] charArraySubstring = substring.toCharArray();
        char[] charArrayToBeSearched = toBeSearched.toCharArray();
        int count = 0;

        for (int i = 0; i < charArraySubstring.length; i++) {
            for (int j = 0; j < charArrayToBeSearched.length; j++) {
                if (String.valueOf(charArraySubstring[i]).equalsIgnoreCase(String.valueOf(charArrayToBeSearched[j]))) {
                    count++;
                }
            }
        }
        return count == charArrayToBeSearched.length;
    }
}

#9


0  

This is an approach using prime numbers to avoid one loop, and replace it with multiplications. Several other minor optimizations can be made.

这是一种使用质数来避免一个循环的方法,并将其替换为乘法。还可以进行其他一些较小的优化。

  1. Assign a unique prime number to any of the characters that you want to find, and 1 to the uninteresting characters.

    为要查找的任何字符分配唯一的素数,为无趣的字符分配1。

  2. Find the product of a matching string by multiplying the prime number with the number of occurrences it should have. Now this product can only be found if the same prime factors are used.

    通过将质数与它应有的出现次数相乘,找到匹配字符串的乘积。现在只有使用相同的素因子才能找到这个乘积。

  3. Search the string from the beginning, multiplying the respective prime number as you move into a running product.

    从一开始搜索字符串,在进入一个正在运行的产品时将相应的质数相乘。

  4. If the number is greater than the correct sum, remove the first character and divide its prime number out of your running product.

    如果数字大于正确的和,则删除第一个字符,并将其素数从运行产品中除去。

  5. If the number is less than the correct sum, include the next character and multiply it into your running product.

    如果数字小于正确的和,包含下一个字符,并将其相乘到运行产品中。

  6. If the number is the same as the correct sum you have found a match, slide beginning and end to next character and continue searching for other matches.

    如果数字与找到匹配的正确和相同,则从开始到结束滑动到下一个字符,并继续搜索其他匹配。

  7. Decide which of the matches is the shortest.

    决定哪一个匹配是最短的。

Gist

要点

charcount = { 'a': 3, 'b' : 1 };
str = "kjhdfsbabasdadaaaaasdkaaajbajerhhayeom"

def find (c, s):
  Ns = len (s)

  C = list (c.keys ())
  D = list (c.values ())

  # prime numbers assigned to the first 25 chars
  prmsi = [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 , 97]

  # primes used in the key, all other set to 1
  prms = []
  Cord = [ord(c) - ord('a') for c in C]

  for e,p in enumerate(prmsi):
    if e in Cord:
      prms.append (p)
    else:
      prms.append (1)

  # Product of match
  T = 1
  for c,d in zip(C,D):
    p = prms[ord (c) - ord('a')]
    T *= p**d

  print ("T=", T)

  t = 1 # product of current string
  f = 0
  i = 0

  matches = []
  mi = 0
  mn = Ns
  mm = 0

  while i < Ns:
    k = prms[ord(s[i]) - ord ('a')]
    t *= k

    print ("testing:", s[f:i+1])

    if (t > T):
      # included too many chars: move start
      t /= prms[ord(s[f]) - ord('a')] # remove first char, usually division by 1
      f += 1 # increment start position
      t /= k # will be retested, could be replaced with bool

    elif t == T:
      # found match
      print ("FOUND match:", s[f:i+1])
      matches.append (s[f:i+1])

      if (i - f) < mn:
        mm = mi
        mn = i - f

      mi += 1

      t /= prms[ord(s[f]) - ord('a')] # remove first matching char

      # look for next match
      i += 1
      f += 1

    else:
      # no match yet, keep searching
      i += 1

  return (mm, matches)


print (find (charcount, str))

(note: this answer was originally posted to a duplicate question, the original answer is now deleted.)

(注意:这个答案本来是贴在一个重复的问题上的,现在原答案被删除了。)

#10


0  

C# Implementation:

c#实现:

public static Tuple<int, int> FindMinSubstringWindow(string input, string pattern)
{
    Tuple<int, int> windowCoords = new Tuple<int, int>(0, input.Length - 1);
    int[] patternHist = new int[256];
    for (int i = 0; i < pattern.Length; i++)
    {
        patternHist[pattern[i]]++;
    }
    int[] inputHist = new int[256];
    int minWindowLength = int.MaxValue;
    int count = 0;
    for (int begin = 0, end = 0; end < input.Length; end++)
    {
        // Skip what's not in pattern.
        if (patternHist[input[end]] == 0)
        {
            continue;
        }
        inputHist[input[end]]++;
        // Count letters that are in pattern.
        if (inputHist[input[end]] <= patternHist[input[end]])
        {
            count++;
        }
        // Window found.
        if (count == pattern.Length)
        {
            // Remove extra instances of letters from pattern
            // or just letters that aren't part of the pattern
            // from the beginning.
            while (patternHist[input[begin]] == 0 ||
                   inputHist[input[begin]] > patternHist[input[begin]])
            {
                if (inputHist[input[begin]] > patternHist[input[begin]])
                {
                    inputHist[input[begin]]--;
                }
                begin++;
            }
            // Current window found.
            int windowLength = end - begin + 1;
            if (windowLength < minWindowLength)
            {
                windowCoords = new Tuple<int, int>(begin, end);
                minWindowLength = windowLength;
            }
        }
    }
    if (count == pattern.Length)
    {
        return windowCoords;
    }
    return null;
}

#11


-1  

Java code for the approach discussed above:

上述方法的Java代码:

private static Map<Character, Integer> frequency;
private static Set<Character> charsCovered;
private static Map<Character, Integer> encountered;
/**
 * To set the first match index as an intial start point
 */
private static boolean hasStarted = false;
private static int currentStartIndex = 0;
private static int finalStartIndex = 0;
private static int finalEndIndex = 0;
private static int minLen = Integer.MAX_VALUE;
private static int currentLen = 0;
/**
 * Whether we have already found the match and now looking for other
 * alternatives.
 */
private static boolean isFound = false;
private static char currentChar;

public static String findSmallestSubStringWithAllChars(String big, String small) {

    if (null == big || null == small || big.isEmpty() || small.isEmpty()) {
        return null;
    }

    frequency = new HashMap<Character, Integer>();
    instantiateFrequencyMap(small);
    charsCovered = new HashSet<Character>();
    int charsToBeCovered = frequency.size();
    encountered = new HashMap<Character, Integer>();

    for (int i = 0; i < big.length(); i++) {
        currentChar = big.charAt(i);
        if (frequency.containsKey(currentChar) && !isFound) {
            if (!hasStarted && !isFound) {
                hasStarted = true;
                currentStartIndex = i;
            }
            updateEncounteredMapAndCharsCoveredSet(currentChar);
            if (charsCovered.size() == charsToBeCovered) {
                currentLen = i - currentStartIndex;
                isFound = true;
                updateMinLength(i);
            }
        } else if (frequency.containsKey(currentChar) && isFound) {
            updateEncounteredMapAndCharsCoveredSet(currentChar);
            if (currentChar == big.charAt(currentStartIndex)) {
                encountered.put(currentChar, encountered.get(currentChar) - 1);
                currentStartIndex++;
                while (currentStartIndex < i) {
                    if (encountered.containsKey(big.charAt(currentStartIndex))
                            && encountered.get(big.charAt(currentStartIndex)) > frequency.get(big
                                    .charAt(currentStartIndex))) {
                        encountered.put(big.charAt(currentStartIndex),
                                encountered.get(big.charAt(currentStartIndex)) - 1);
                    } else if (encountered.containsKey(big.charAt(currentStartIndex))) {
                        break;
                    }
                    currentStartIndex++;
                }
            }
            currentLen = i - currentStartIndex;
            updateMinLength(i);
        }
    }
    System.out.println("start: " + finalStartIndex + " finalEnd : " + finalEndIndex);
    return big.substring(finalStartIndex, finalEndIndex + 1);
}

private static void updateMinLength(int index) {
    if (minLen > currentLen) {
        minLen = currentLen;
        finalStartIndex = currentStartIndex;
        finalEndIndex = index;
    }

}

private static void updateEncounteredMapAndCharsCoveredSet(Character currentChar) {
    if (encountered.containsKey(currentChar)) {
        encountered.put(currentChar, encountered.get(currentChar) + 1);
    } else {
        encountered.put(currentChar, 1);
    }

    if (encountered.get(currentChar) >= frequency.get(currentChar)) {
        charsCovered.add(currentChar);
    }
}

private static void instantiateFrequencyMap(String str) {

    for (char c : str.toCharArray()) {
        if (frequency.containsKey(c)) {
            frequency.put(c, frequency.get(c) + 1);
        } else {
            frequency.put(c, 1);
        }
    }

}

public static void main(String[] args) {

    String big = "this is a test string";
    String small = "tist";
    System.out.println("len: " + big.length());
    System.out.println(findSmallestSubStringWithAllChars(big, small));
}

#12


-1  

def minimum_window(s, t, min_length = 100000):
    d = {}
    for x in t:
        if x in d:
            d[x]+= 1
        else:
            d[x] = 1

    tot = sum([y for x,y in d.iteritems()])
    l = []
    ind = 0 
    for i,x in enumerate(s):
        if ind == 1:
            l = l + [x]
        if x in d:
            tot-=1
            if not l:
                ind = 1
                l = [x]

        if tot == 0:
            if len(l)<min_length:
                min_length = len(l)
                min_length = minimum_window(s[i+1:], t, min_length)

return min_length

l_s = "ADOBECODEBANC"
t_s = "ABC"

min_length = minimum_window(l_s, t_s)

if min_length == 100000:
      print "Not found"
else:
      print min_length