串是由零个或多个字符组成的有限序列,又叫做字符串
串的逻辑结构和线性表很相似的,不同的是串针对是是字符集,所以在操作上与线性表还是有很大区别的。线性表更关注的是单个元素的操作CURD,串则是关注查找子串的位置,替换等操作。
当然不同的高级语言对串的基本操作都有不同的定义方法,但是总的来说操作的本质都是相似的。比如javascrript查找就是indexOf, 去空白就是trim,转化大小写toLowerCase/toUpperCase等等
这里主要讨论下字符串模式匹配的几种经典的算法:BF、BM、KMP
BF(Brute Force)算法
Brute-Force算法的基本思想:
从目标串s 的第一个字符起和模式串t的第一个字符进行比较,若相等,则继续逐个比较后续字符,否则从串s 的第二个字符起再重新和串t进行比较。
依此类推,直至串t 中的每个字符依次和串s的一个连续的字符序列相等,则称模式匹配成功,此时串t的第一个字符在串s 中的位置就是t 在s中的位置,否则模式匹配不成功
可见BF算法是一种暴力算法,又称为朴素匹配算法或蛮力算法。
主串 BBC ABB ABCF
子串 ABC
在主串中找出子串的位置,对应了其实就是javascript的indexOf查找方法的实现了
var sourceStr = "BBC ABB ABCF"; var searchStr = "ABC"; function BF_Ordinary(sourceStr, searchStr) { var sourceLength = sourceStr.length; var searchLength = searchStr.length; var padding = sourceLength - searchLength; //循环的次数 //BBC ABB ABCF =>ABC => 搜索9次 for (var i = 0; i <= padding; i++) { //如果满足了第一个charAt是相等的 //开始子循环检测 //其中sourceStr的取值是需要叠加i的值 if (sourceStr.charAt(i) == searchStr.charAt(0)) { //匹配成功的数据 var complete = searchLength; for (var j = 0; j < searchLength; j++) { if (sourceStr.charAt(i + j) == searchStr.charAt(j)) { --complete if (!complete) { return i; } } } } } return -1; }
BF算法就是简单粗暴,直接把BBC ABB ABCF母串的每一个字符的下表取出来与模式串的第一个字符匹配,如果相等就进去字串的再次匹配
这里值得注意:
1:最外围循环的次数sourceLength - searchLength,因为我们匹配的母串至少要大于等于子串
2:在子串的继续匹配中,母串的起点是需要叠加的(i+j)
3:通过一个条件判断是否完全匹配complete,BBC ABB ABCF中,我们在ABB的时候就需要跳过去
上面是最简单的一个算法了,代码上还有更优的处理,比如在自串的匹配上可以采取取反的算法
优化算法(一)
function BF_Optimize(sourceStr, searchStr) { var mainLength = sourceStr.length; var searchLength = searchStr.length; var padding = mainLength - searchLength; for (var offset = 0; offset <= padding; offset++) { var match = true; for (var i = 0; i < searchLength; i++) { //取反,如果只要不相等 if (searchStr.charAt(i) !== sourceStr.charAt(offset + i)) { match = false; break; } } if (match) return offset; } return -1; }
我们不需要判断为真的情况,我们只要判断为假的情况就可以了,当子匹配结束后match没有被修改过的话,则说明此匹配是完全匹配
以上2种方法我们都用到了子循环,我们能否改成一个循环体呢?
其实我们可以看到规律,主串每次都只会递增+1,子串每次匹配也是从头开始匹配,所以我们可以改成一个while,控制下标指针就可以了
优化算法(二)
function BF_Optimize_2(sourceStr, searchStr) { var i = 0, j = 0; while (i < sourceStr.length) { // 两字母相等则继续 if (sourceStr.charAt(i) == searchStr.charAt(j)) { i++; j++; } else { // 两字母不等则角标后退重新开始匹配 i = i - j + 1; // i 回退到上次匹配首位的下一位 j = 0; // j 回退到子串的首位 } if (j == searchStr.length) { return i - j; } } }
i就是主串的下标定位,j就是子串的下标定位
当主串子串相等的时候,就进入了子串的循环模式,当子循环的次数j满足子串长度时,就验证是完全匹配
当主串子串不相等的时候,就需要把主串的下标往后移一位,当然i的时候,因为可能经过子串的处理,所以需要i-j+1, 然后复位子串
具体我们可以看看代码比较
基于BF算法的四种结构,for/while/递归
<!doctype html>由于电脑性能的不断提高,测试的数据量的大小,可能会导致得到的结果不太准确;<script type="text/javascript"> ///////// //暴力算法 // //普通版 ///////// function BF_Ordinary(sourceStr, searchStr) { var sourceLength = sourceStr.length; var searchLength = searchStr.length; var padding = sourceLength - searchLength; //循环的次数 //BBC ABB ABCF =>ABC => 搜索9次 for (var i = 0; i <= padding; i++) { //如果满足了第一个charAt是相等的 //开始子循环检测 //其中sourceStr的取值是需要叠加i的值 if (sourceStr.charAt(i) == searchStr.charAt(0)) { //匹配成功的数据 var complete = searchLength; for (var j = 0; j < searchLength; j++) { if (sourceStr.charAt(i + j) == searchStr.charAt(j)) { --complete if (!complete) { return i; } } } } } return -1; } ///////// //暴力算法 // //优化版 ///////// function BF_Optimize_1(sourceStr, searchStr) { var mainLength = sourceStr.length; var searchLength = searchStr.length; var padding = mainLength - searchLength; for (var offset = 0; offset <= padding; offset++) { var match = true; for (var i = 0; i < searchLength; i++) { //取反,如果只要不相等 if (searchStr.charAt(i) !== sourceStr.charAt(offset + i)) { match = false; break; } } if (match) return offset; } return -1; } //////// //优化版 // //while //////// function BF_Optimize_2(sourceStr, searchStr) { var i = 0, j = 0; while (i < sourceStr.length) { // 两字母相等则继续 if (sourceStr.charAt(i) == searchStr.charAt(j)) { i++; j++; } else { // 两字母不等则角标后退重新开始匹配 i = i - j + 1; // i 回退到上次匹配首位的下一位 j = 0; // j 回退到子串的首位 } if (j == searchStr.length) { return i - j; } } } ///////// //暴力算法 //递归版本 ///////// function BF_Recursive(sourceStr, searchStr, offset) { var mainLength = sourceStr.length; var searchLength = searchStr.length; if (searchLength > mainLength - offset) { return -1; } offset = offset || 0; for (var i = 0; searchLength > i; i++) { if (searchStr.charAt(i) !== sourceStr.charAt(offset + i)) { return BF_Recursive(sourceStr, searchStr, offset + 1) } } return offset; } var sourceStr = "There are some times wThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkhen clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer There are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely thinkThere are some times when clicking “like” on a friend's Facebook status doesn't feel appropriate. A bad day. A loved one lost. A break up. It only seems natural that a “dislike” button could solve the conundrum of wanting to empathize but not seem inappropriate by clicking “like.” Mark Zuckerberg Puts the Rest of Us to Shame by Speaking Fluent Chinese. Mark Zuckerberg: Facebook Founder and Animal Butcher. Mark Zuckerberg and That Shirt. The idea has been on Mark Zuckerberg's radar for a while, he said. In 2010, he told ABC News' Diane Sawyer that that Facebook would “definitely think that that Facebook would “definitely think about” adding a dislike button. “People definitely seem to want it,” Zuckerberg said. Four years later — Zuckerberg says Facebook is still “thinking about” adding the oft-requested sdfafd button, Zuckerberg says Facebook is still “thinking about” adding the oft-requested button. At a town hall meeting on Thursday, the CEO revealed he has some reservations about the feature. “There are two things that it can mean,” Zuckerberg said of the potential button, which could be used in a mean spirited way or to express empathy. Finding how to limit it to the latter is the challenge. Zuckerberg said he doesn't want the button to turn into a “voting mechanism” or something that isn't “socially valuable.” “Often people will tell us they don't feel comfortable pressing ‘like,'” Zuckerberg said. “What's the right way to make it so people can easier express a wide range of emotions"; var searchStr = "adding the oft-requested sdf"; function show(bf_name,fn) { var myDate = +new Date() var r = fn(); var div = document.createElement('div') div.innerHTML = bf_name +'算法,搜索位置:' + r + ",耗时" + (+new Date() - myDate) + "ms"; document.body.appendChild(div); } show('BF_Ordinary',function() { return BF_Ordinary(sourceStr, searchStr) }) show('BF_Optimize_1',function() { return BF_Optimize_1(sourceStr, searchStr) }) show('BF_Optimize_2',function() { return BF_Optimize_2(sourceStr, searchStr) }) show('BF_Recursive',function() { return BF_Recursive(sourceStr, searchStr) }) </script>
BF也是经典的前缀匹配算法,前缀还包括KMP,我们可见这种算法最大缺点就是字符匹配失败指针就要回溯,所以性能很低,之后会写一下KMP与BM算法针对BF的的升级
git代码下载: https://github.com/JsAaron/data_structure