Python html5lib Skipped Elements
Published: 2019-05-11

I've been working on some interesting Python stuff at Mozilla, and one recent task called for rendering a page, finding elements with a URL attribute value (like img[src] or a[href]), and ensuring they become absolute URLs.  One problem I encountered when using html5lib was that LINK and IMG elements were being skipped when I tokenized the HTML.  After browsing through the html5lib source code, I found a variable called voidElements which included both LINK and IMG:

voidElements = frozenset((
    "base",
    "command",
    "event-source",
    "link",
    "meta",
    "hr",
    "br",
    "img",
    "embed",
    "param",
    "area",
    "col",
    "input",
    "source"
))

When I commented out those two elements, they were found on the next run of my routine, meaning their presence in the set was causing my problem.  Here's how I skirted the issue:

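# Assumed import: the original snippet did not show how html5lib_constants was bound
from html5lib import constants as html5lib_constants

# voidElements is a frozenset, so copy its items into a mutable set first
new_void_set = set()
for item in html5lib_constants.voidElements:
    new_void_set.add(item)
# Drop the two elements that were being skipped
new_void_set.remove('link')
new_void_set.remove('img')
# Swap the trimmed set back in as a frozenset
html5lib_constants.voidElements = frozenset(new_void_set)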

Since voidElements is a frozenset, I couldn't simply remove LINK and IMG, so I needed to create a new frozenset without those elements.  Let me know if there's a more Pythonic way of creating this frozenset.  In any event, delving into the deep recesses of html5lib paid off and I accomplished the goal!
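
On the question of a more Pythonic approach: set difference works directly on a frozenset and returns a new frozenset, so the loop and the intermediate set may not be needed. Here is a minimal sketch, using a local copy of the set listed earlier so that it runs without html5lib installed. The names void_elements and trimmed are hypothetical and do not come from html5lib.

```python
# Local copy of the set shown in the article (hypothetical name),
# so this sketch runs without html5lib installed.
void_elements = frozenset((
    "base", "command", "event-source", "link", "meta", "hr", "br",
    "img", "embed", "param", "area", "col", "input", "source",
))

# Subtracting a set from a frozenset yields a new frozenset;
# the original is left untouched.
trimmed = void_elements - {"link", "img"}
```

With html5lib imported, the same expression could presumably be applied to html5lib_constants.voidElements and the result assigned back, which is the one-line equivalent of the loop-and-remove version.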

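
For context, the URL step that motivated all this (turning img[src] or a[href] values into absolute URLs) can be sketched with the standard library's urljoin. This is only an illustration under assumed inputs: the page URL and attribute values below are made up, and the original routine's actual code is not shown in this post.

```python
from urllib.parse import urljoin

# Hypothetical page URL and attribute values, for illustration only.
page_url = "https://example.com/blog/post/"
found = ["../img/photo.png", "/static/site.css", "https://other.example/x.js"]

# urljoin resolves relative references against the page URL and
# leaves values that are already absolute unchanged.
absolute = [urljoin(page_url, value) for value in found]
```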

Reposted from: http://auvwd.baihongyu.com/
