读书笔记-python数据结构与算法分析（一）-异序词检测

一站式接入主流 AI 大模型，支持对话 · 生图 · 生视频，即开即用

ChatGPT Claude Gemini Grok DeepSeek 通义千问 Ollama

要展示不同数量级的算法，一个好例子就是经典的异序词检测问题。如果一个字符串只是重排了另一个字符串的字符，那么这个字符串就是另一个的异序词，列如heart与earth，以及python与typhon。为了简化问题，假设要检查的两个字符串长度一样，并且都是由26个英文字母的小谢形式组成的。我们的目标是编写一个布尔函数，它接受两个字符串，并能判断它们是否为异序词。

1 清点法

清点第一个字符串的每个字符，看看它们是否都出目前第2个字符串中。如果是，那么两个字符串必然是异序词。清点是通过python中的特殊值None取代字符来实现的。但是，由于python中的字符串是不可修改的，所以先要江第2个字符串转换成列表。在字符列表中检查第1个字符串中的每个字符，如果找到了，就替换掉。

python

def inventory(s1, s2):
    """
    清点法 O(n^2)
    :param s1:
    :param s2:
    :return:
    """
    s2_list = list(s2)
    pos1 = 0
    still_ok = True
    # 遍历s1
    while pos1 < len(s1) and still_ok:
        pos2 = 0
        found = False
        # 遍历s2
        while pos2 < len(s2_list) and not found:
            if s1[pos1] == s2_list[pos2]:
                found = True
            else:
                pos2 += 1
        # 如果找到了一样的字符，则把list中的值赋值为None
        if found:
            s2_list[pos2] = None
        else:
            still_ok = False
        pos1 += 1
    return still_ok

//
//  inventory
//  @Description: 清点法 O(n^2)
//  @param s1
//  @param s2
//  @return bool
//
func inventory(s1, s2 string) bool {
    s1Array := strings.Split(s1, "")
    s2Array := strings.Split(s2, "")

    pos1 := 0
    stillOk := true
    for pos1 < lenS1 && stillOk {
        pos2 := 0
        found := false
        for pos2 < lenS2 && !found {
            if s1Array[pos1] == s2Array[pos2] {
                found = true
            } else {
                pos2++
            }
        }
        if found {
            s2Array[pos2] = ""
        } else {
            stillOk = false
        }
        pos1++
    }
    return stillOk
}

注意，对于s1中的n个字符，检查每一个时都要遍历s2中的n个字符。要匹配s1中的一个字符，列表中的n个位置都要被访问一次。因此，访问次数就成了从1到n的整数之和。公式如下，所以时间复杂度为O(n^2)
读书笔记-python数据结构与算法分析（一）-异序词检测

2 排序法

尽管s1与s2是不同的字符串，但只要由一样的字符构成，它们就是异序词。基于这一点，可以采用另一个方案。如果按照字母表顺序给字符串排序，异序词得到的结果将是同一个字符串。可以先将字符串转换为列表，然后使用内建的sort方法对列表排序。

python

def sort_(s1, s2):
    """
    排序法 O(n^2)或O(nlog(n))
    :param s1:
    :param s2:
    :return:
    """
    s1_list = list(s1)
    s2_list = list(s2)

    s1_list.sort()
    s2_list.sort()

    for index, value in enumerate(s1_list):
        if s2_list[index] != value:
            return False
    return True

//
//  sort_
//  @Description: 排序法 O(n^2)或O(nlog(n))
//  @param s1
//  @param s2
//  @return bool
//
func sort_(s1, s2 string) bool {
    s1Array := strings.Split(s1, "")
    s2Array := strings.Split(s2, "")
    sort.Strings(s1Array)
    sort.Strings(s2Array)

    for index, value := range s1Array {
        if value != s2Array[index] {
            return false
        }
    }
    return true
}

乍看之下，你可能会认为这个算法的时间复杂度是O(n)，由于在排序之后需要遍历一次就可以比较n个字符。但是，调用两次sort方法不是没有代价。排序的时间复杂度基本上是O(n^2)或O(nlogn)，所以排序操作起主导作用。也就是说，该算法和排序过滤的数量级一样。

3 计数法

最后一个方案基于这样一个实际：两个异序词有一样数目的a、一样数目的b、一样数目的c，等等。要判断两个字符串是否为异序词，先数一下每一个出现的次数。由于字符可能有26种，所以使用26个计数器，对应每个字符。每遇到一个字符，就将对应的计数器加1。最后，如果两个计数器列表一样，那么两个字符串肯定是异序词。

python

def count(s1, s2):
    """
    计数法 O(n)
        ord: 它以一个字符（长度为1的字符串）作为参数，返回对应的 ASCII 数值，或者 Unicode 数值，如果所给的 Unicode 字符超出了你的 Python 定义范围，则会引发一个 TypeError 的异常
    :param s1:
    :param s2:
    :return:
    """
    # 26个字母
    c1 = [0] * 26
    c2 = [0] * 26

    for i in s1:
        index = ord(i) - ord( a )
        c1[index] += 1

    for i in s2:
        index = ord(i) - ord( a )
        c2[index] += 1

    return c1 == c2

//
//  count
//  @Description: 计数法 O(n) 单引号的字符为rune类型，用int可转ASCII
//  @param s1
//  @param s2
//  @return bool
//
func count(s1, s2 string) bool {
    var c1 [26]int
    var c2 [26]int
    startA := int( a )

    for _, ch := range s1 {
        c1[int(ch) - startA] ++
    }
    for _, ch := range s2 {
        c2[int(ch) - startA] ++
    }
    return c1 == c2
}

这个方案也有循环。但不同于方案1，这个方案的循环没有嵌套。两个计数循环都是n阶的。