python lib
I've been working on some interesting Python stuff at Mozilla, and one recent task called for rendering a page, finding elements with a URL attribute value (like img[src] or a[href]), and ensuring those values become absolute URLs. One problem I encountered when using html5lib was that LINK and IMG elements were being skipped when I tokenized the HTML. After browsing through the html5lib source code, I found a variable called voidElements which included both LINK and IMG:
voidElements = frozenset((
    "base", "command", "event-source", "link", "meta", "hr", "br",
    "img", "embed", "param", "area", "col", "input", "source"
))
When I commented out those two elements, they were found on the next run of my routine, meaning their presence in the set was causing my problem. Here's how I skirted the issue:
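# Assumed import: the post does not show how html5lib_constants was bound
from html5lib import constants as html5lib_constants

# Copy the frozenset into a mutable set
new_void_set = set()
for item in html5lib_constants.voidElements:
    new_void_set.add(item)

# Drop the two elements that were being skipped
new_void_set.remove('link')
new_void_set.remove('img')

# Swap the trimmed set back in as a frozenset
html5lib_constants.voidElements = frozenset(new_void_set)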
Since voidElements is a frozenset, which is immutable, I couldn't simply remove LINK and IMG in place, so I needed to create a new frozenset without those elements. Let me know if there's a more Pythonic way of creating this frozenset. In any event, delving into the deep recesses of html5lib paid off and I accomplished the goal!
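One possibly more Pythonic option is set difference, which builds the trimmed frozenset in a single expression. This is only a sketch: void_elements below is a local stand-in copied from the listing above so the snippet runs without html5lib installed, and the name trimmed is made up for illustration.

```python
# Local stand-in for html5lib's voidElements, copied from the post.
void_elements = frozenset((
    "base", "command", "event-source", "link", "meta", "hr", "br",
    "img", "embed", "param", "area", "col", "input", "source",
))

# The "-" operator returns a new frozenset (the type of the left operand)
# and leaves the original untouched, so no mutable copy is needed.
trimmed = void_elements - {"link", "img"}

print(sorted(trimmed))
```

With html5lib itself, the same idea would presumably read html5lib_constants.voidElements = html5lib_constants.voidElements - {"link", "img"}, assuming html5lib_constants is the library's constants module as in the workaround above.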