Using Assertions in Regular Expressions

A Few Words First

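Assertions in regular expressions are called zero-width assertions (also known as lookarounds): they test the text at a position without consuming any characters. There are four kinds:
- Zero-width positive lookahead, written (?=exp)
- Zero-width negative lookahead, written (?!exp)
- Zero-width positive lookbehind, written (?<=exp)
- Zero-width negative lookbehind, written (?<!exp)
The names are a mouthful. In plain terms the four mean: followed by xxx, not followed by xxx, preceded by xxx, and not preceded by xxx. That is all there is to it.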
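As a quick illustration of the four forms, here is a minimal sketch on a made-up string. The sample text and the helper name `find` are assumptions introduced only for this example. Note that each match contains only `foo` or `bar`: the asserted text is checked but never becomes part of the match.

```javascript
// Hypothetical sample text, chosen only for illustration.
const sample = "foobar foobaz quxbar";

// Collect [matched text, start index] for every match of a global regex.
const find = (re) => [...sample.matchAll(re)].map((m) => [m[0], m.index]);

const followedByBar = find(/foo(?=bar)/g);     // "foo" followed by "bar"
const notFollowedByBar = find(/foo(?!bar)/g);  // "foo" not followed by "bar"
const precededByFoo = find(/(?<=foo)bar/g);    // "bar" preceded by "foo"
const notPrecededByFoo = find(/(?<!foo)bar/g); // "bar" not preceded by "foo"

console.log(followedByBar);    // [ [ 'foo', 0 ] ]
console.log(notFollowedByBar); // [ [ 'foo', 7 ] ]
console.log(precededByFoo);    // [ [ 'bar', 3 ] ]
console.log(notPrecededByFoo); // [ [ 'bar', 17 ] ]
```

Lookbehind was added to JavaScript in ES2018, so older engines may reject the last two patterns with a SyntaxError.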
The examples below check each one in turn.

Code Examples

Taking a snippet of HTML as the example, start by matching all of the a tags.

Plain matching without assertions

let html = `
    <div>
        <a href="https://www.sina.com.cn">新浪</a>
        <a href="https://cn.bing.com">必应</a>
        <a href="http://www.qq.com">腾讯</a>
    </div>
`
// Match every a tag; group 1 is the quote character, group 2 the URL, group 3 the trailing suffix
let reg = /<a.*?href=(['"])(https?.+(com|cn|org|net))\1[\s\S]+?<\/a>/g;
/**
 * Output:
 * [
    '<a href="https://www.sina.com.cn">新浪</a>',
    '<a href="https://cn.bing.com">必应</a>',
    '<a href="http://www.qq.com">腾讯</a>'
   ]
 */
console.log(html.match(reg));
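With the `g` flag, `match` returns only the full matches. If the captured URL is what you are after, one option is `matchAll`, which exposes the groups of every match. The sketch below repeats the HTML and the pattern so that it runs on its own; the names `page`, `anchorRe`, and `links` are assumptions made for this example.

```javascript
const page = `
    <div>
        <a href="https://www.sina.com.cn">新浪</a>
        <a href="https://cn.bing.com">必应</a>
        <a href="http://www.qq.com">腾讯</a>
    </div>
`;
const anchorRe = /<a.*?href=(['"])(https?.+(com|cn|org|net))\1[\s\S]+?<\/a>/g;

// Group 2 holds the URL and group 3 the suffix that closed the URL.
const links = [...page.matchAll(anchorRe)].map((m) => ({ url: m[2], suffix: m[3] }));

/**
 * Output:
 * [
 *   { url: 'https://www.sina.com.cn', suffix: 'cn' },
 *   { url: 'https://cn.bing.com', suffix: 'com' },
 *   { url: 'http://www.qq.com', suffix: 'com' }
 * ]
 */
console.log(links);
```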

Zero-width positive lookahead

// Zero-width positive lookahead (must be followed by xxx)
// To require that the URL end in .cn, put (?=cn) in front of the suffix group
reg = /<a.*?href=(['"])(https?.+(?=cn)(com|cn|org|net))\1[\s\S]+?<\/a>/g;
/**
 * Output:
 * [ '<a href="https://www.sina.com.cn">新浪</a>' ]
 */
console.log(html.match(reg));
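The same idea on a smaller scale may make the zero-width behaviour easier to see. This is only a sketch: the host list and the names `hostNames`, `beforeCn`, and `stems` are assumptions for illustration. The lookahead requires `.cn` at the end of the string, yet `.cn` is not included in the match.

```javascript
// Hypothetical host list mirroring the links used in this post.
const hostNames = ["www.sina.com.cn", "cn.bing.com", "www.qq.com"];

// Match the leading part of a host only if ".cn" follows and ends the string.
const beforeCn = /^[\w.]+(?=\.cn$)/;

const stems = hostNames.map((h) => {
  const m = h.match(beforeCn);
  return m ? m[0] : null;
});

console.log(stems); // [ 'www.sina.com', null, null ]
```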

Zero-width negative lookahead

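// Zero-width negative lookahead (must not be followed by xxx)
// To require that the URL not end in .cn, put (?!cn) in front of the suffix group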
reg = /<a.*?href=(['"])(https?.+(?!cn)(com|cn|org|net))\1[\s\S]+?<\/a>/g;
/**
 * Output:
 * [
    '<a href="https://cn.bing.com">必应</a>',
    '<a href="http://www.qq.com">腾讯</a>'
   ]
 */
console.log(html.match(reg));
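Negative lookahead is easy to get subtly wrong when it follows a greedy quantifier, because the engine may give characters back until the assertion passes. The sketch below, using made-up inputs and assumed names (`candidates`, `notCnHost`, `kept`, `partial`), shows an anchored form that rejects a whole string and then the backtracking pitfall.

```javascript
// Hypothetical host list mirroring the links used in this post.
const candidates = ["www.sina.com.cn", "cn.bing.com", "www.qq.com"];

// Anchored negative lookahead: reject the whole string if it ends in ".cn".
const notCnHost = /^(?!.*\.cn$)[\w.]+$/;
const kept = candidates.filter((h) => notCnHost.test(h));
console.log(kept); // [ 'cn.bing.com', 'www.qq.com' ]

// Pitfall: \d+ first takes "100", the assertion fails because "px" follows,
// so the engine backtracks to "10", where the next characters are "0p".
const partial = "100px".match(/\d+(?!px)/)[0];
console.log(partial); // '10'
```

The pattern in the post avoids this pitfall because the suffix group and the closing quote must still match right after the assertion.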

Zero-width positive lookbehind

// Zero-width positive lookbehind (must be preceded by xxx)
// To require that the URL start with https, put (?<=https) right after https?
reg = /<a.*?href=(['"])(https?(?<=https).+(com|cn|org|net))\1[\s\S]+?<\/a>/g;
/**
 * Output:
 * [
    '<a href="https://www.sina.com.cn">新浪</a>',
    '<a href="https://cn.bing.com">必应</a>'
   ]
 */
console.log(html.match(reg));
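A common use of positive lookbehind is to pull out the text that comes after a required prefix without including the prefix itself. The following is a sketch under assumed names (`urlList`, `secureHostRe`, `secureHosts`) and an assumed URL list taken from the links above.

```javascript
// Hypothetical URL list mirroring the links used in this post.
const urlList = [
  "https://www.sina.com.cn",
  "https://cn.bing.com",
  "http://www.qq.com",
];

// Match a host only when it is preceded by "https://"; the prefix is not captured.
const secureHostRe = /(?<=https:\/\/)[\w.]+/;

const secureHosts = urlList.map((u) => {
  const m = u.match(secureHostRe);
  return m ? m[0] : null;
});

console.log(secureHosts); // [ 'www.sina.com.cn', 'cn.bing.com', null ]
```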

Zero-width negative lookbehind

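// Zero-width negative lookbehind (must not be preceded by xxx)
// To exclude plain http links, put (?<!http) right after https?
// Note: the ? in https? means zero or one and is greedy by default, so on an https link
// the engine consumes https first. The four characters before the assertion are then ttps,
// not http, and the assertion passes. On a plain http link they are http, so the match fails.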
reg = /<a.*?href=(['"])(https?(?<!http).+(com|cn|org|net))\1[\s\S]+?<\/a>/g;
/**
 * Output:
 * [
    '<a href="https://www.sina.com.cn">新浪</a>',
    '<a href="https://cn.bing.com">必应</a>'
   ]
 */
console.log(html.match(reg));
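To see what the negative lookbehind actually inspects, it can help to test the scheme part in isolation. This is a small sketch; the names `schemeProbe`, `onHttps`, and `onHttp` are assumptions for the example.

```javascript
// Only the scheme part of the pattern from the post.
const schemeProbe = /https?(?<!http)/;

const onHttps = "https://www.sina.com.cn".match(schemeProbe);
const onHttp = "http://www.qq.com".match(schemeProbe);

console.log(onHttps && onHttps[0]); // 'https'
console.log(onHttp);                // null

// After "https" is consumed, the lookbehind sees these four characters:
console.log("https".slice(-4));     // 'ttps'
```

If the goal is simply to accept https links only, writing `https` without the optional `s` says the same thing more directly; the lookbehind form is kept here to show how the assertion behaves.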
