A Simple HTML Entity Parser in C++

  • Date: 2020-10-11 15:25:20
  • Category: Web Digest

An HTML entity parser takes HTML code as input and replaces all the entities of the special characters with the characters themselves.

The special characters and their entities for HTML are:
Quotation Mark: the entity is &quot; and symbol character is ".
Single Quote Mark: the entity is &apos; and symbol character is '.
Ampersand: the entity is &amp; and symbol character is &.
Greater Than Sign: the entity is &gt; and symbol character is >.
Less Than Sign: the entity is &lt; and symbol character is <.
Slash: the entity is &frasl; and symbol character is /.

Given the input text string to the HTML parser, you have to implement the entity parser. Return the text after replacing the entities by the special characters.

Example 1:
Input: text = “&amp; is an HTML entity but &ambassador; is not.”
Output: “& is an HTML entity but &ambassador; is not.”
Explanation: The parser will replace the &amp; entity by &

Example 2:
Input: text = “and I quote: &quot;…&quot;”
Output: “and I quote: \"…\"”

Example 3:
Input: text = “Stay home! Practice on Leetcode :)”
Output: “Stay home! Practice on Leetcode :)”

Example 4:
Input: text = “x &gt; y &amp;&amp; x &lt; y is always false”
Output: “x > y && x < y is always false”

Example 5:
Input: text = “leetcode.com&frasl;problemset&frasl;all”
Output: “leetcode.com/problemset/all”

Constraints:
1 <= text.length <= 10^5
The string may contain any possible characters out of all the 256 ASCII characters.

Hints:
Search the string for all the occurrences of the character ‘&’.
For every ‘&’ check if it matches an HTML entity by checking the ‘;’ character and if entity found replace it in the answer.
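The two hints can be sketched as a single left-to-right scan. This is only an illustration, not the solution presented in this post: the function name parseEntitiesByScan and the constant kMaxEntityLen are hypothetical, and the sketch assumes that only the six entities listed above need to be handled.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>

// Hypothetical helper: follows the hints by looking for '&', then for the
// closing ';', and replacing the span only if it is a known entity.
std::string parseEntitiesByScan(const std::string& text) {
    static const std::unordered_map<std::string, std::string> entities = {
        {"&quot;", "\""}, {"&apos;", "'"}, {"&amp;", "&"},
        {"&gt;", ">"},    {"&lt;", "<"},   {"&frasl;", "/"}
    };
    const std::size_t kMaxEntityLen = 7;  // length of "&frasl;", the longest key

    std::string result;
    result.reserve(text.size());
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '&') {
            // Search for ';' only within the longest possible entity.
            const std::size_t limit = std::min(text.size(), i + kMaxEntityLen);
            std::size_t semi = std::string::npos;
            for (std::size_t j = i + 1; j < limit; ++j) {
                if (text[j] == ';') {
                    semi = j;
                    break;
                }
            }
            if (semi != std::string::npos) {
                auto it = entities.find(text.substr(i, semi - i + 1));
                if (it != entities.end()) {
                    result += it->second;  // known entity: emit its character
                    i = semi + 1;          // continue after the ';'
                    continue;
                }
            }
        }
        result += text[i];  // ordinary character, or '&' that starts no entity
        ++i;
    }
    return result;
}
```

Because the output is never re-scanned, an input such as "&amp;gt;" becomes "&gt;" and is not decoded a second time.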

HTML Entity Parser in C++

The following is a simple HTML entity parser. We store the mappings in an unordered hash map (unordered_map). Then we go through each character and check whether any of the mappings matches at the current position of the HTML string. Once a mapping is applied, we skip past the rest of the matched entity and continue with the next character.

The time complexity is O(NM), where N is the number of characters in the HTML string and M is the number of mappings (each comparison costs at most the length of the longest entity, which is a small constant). We build the parsed result in a separate string. You can also apply the HTML entity transform in-place.
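As a rough sketch of the in-place variant mentioned above, one option is to keep a read index and a write index over the same string. Every entity is longer than its replacement character, so the write index never overtakes the read index. The function name parseEntitiesInPlace is hypothetical, and this is one possible way to do it rather than a reference implementation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: rewrites `text` in place with a read and a write index.
void parseEntitiesInPlace(std::string& text) {
    static const std::vector<std::pair<std::string, char>> entities = {
        {"&quot;", '"'}, {"&apos;", '\''}, {"&amp;", '&'},
        {"&gt;", '>'},   {"&lt;", '<'},    {"&frasl;", '/'}
    };
    std::size_t write = 0;
    std::size_t read = 0;
    while (read < text.size()) {
        bool replaced = false;
        if (text[read] == '&') {
            for (const auto& entry : entities) {
                const std::string& key = entry.first;
                // compare() checks the characters at `read` without copying them.
                if (text.compare(read, key.size(), key) == 0) {
                    text[write++] = entry.second;
                    read += key.size();
                    replaced = true;
                    break;
                }
            }
        }
        if (!replaced) {
            text[write] = text[read];  // copy an ordinary character forward
            ++write;
            ++read;
        }
    }
    text.resize(write);  // drop the unused tail
}
```

The only extra memory is the two indices, at the cost of modifying the caller's string.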

We use the C++ substr to return a copy of the substring. The first parameter is the start index, and the second parameter is the length of the substring.
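To make the substr behaviour concrete, here is a small sketch of the "does this key start at this position" test. The helper name matchesAt is hypothetical. Note that substr clips the requested length at the end of the string, and only throws std::out_of_range when the start index is greater than the string's size.

```cpp
#include <string>

// Hypothetical helper: true if `key` occurs in `text` starting at index `pos`.
bool matchesAt(const std::string& text, std::size_t pos, const std::string& key) {
    if (pos > text.size()) {
        return false;  // substr would throw std::out_of_range here
    }
    // substr(pos, len) copies at most `len` characters starting at `pos`;
    // if fewer remain, the copy is shorter and cannot equal `key`.
    return text.substr(pos, key.size()) == key;
}
```

For instance, std::string("hello").substr(1, 3) yields "ell", while substr(3, 10) yields just "lo".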

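#include <string>
#include <unordered_map>
using namespace std;

class Solution {
public:
    string entityParser(string text) {
        // Maps each supported entity to its replacement character.
        unordered_map<string, string> convert({
            {"&quot;", "\""},
            {"&apos;", "'"},
            {"&amp;", "&"},
            {"&gt;", ">"},
            {"&lt;", "<"},
            {"&frasl;", "/"}
        });
        string res = "";
        for (size_t i = 0; i < text.size(); ++i) {
            bool flag = false;  // set when an entity matches at position i
            for (auto it = begin(convert); it != end(convert); ++it) {
                const string& key = it->first;
                const string& value = it->second;
                // Only compare when the whole key fits in the remaining text.
                if (i + key.size() - 1 < text.size()) {
                    if (text.substr(i, key.size()) == key) {
                        res += value;
                        // Move to the last character of the entity;
                        // the loop's ++i then steps past it.
                        i += key.size() - 1;
                        flag = true;
                        break;
                    }
                }
            }
            if (!flag) {
                res += text[i];  // ordinary character, copied as-is
            }
        }
        return res;
    }
};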

The space complexity is O(N), as we need to allocate a string to hold the parsed result.

–EOF (The Ultimate Computing & Technology Blog) —
