bsoup needs lots of improvements #155

dustmop · 2021-11-30T21:42:01Z

Bsoup was written a long time ago, based mainly on the reference Python implementation, without much regard for how easy it would be for developers to use. It was also built when Starlark was much younger and still lacked key features such as arbitrary attribute support.

Some things that should be fixed:

- printing nodes should work; perhaps they could display as an expandable tree
- contents() returns weird results
- get_text is not recursive
- there is no method to get a node's tag name
- parent.div should work, returning a div child node of parent; child() would then be unnecessary
- rename parseHtml to bsoup()

Also the docs need lots of work.
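
To make a few of the items above concrete, here is a rough sketch in plain Python (not Starlark, and not bsoup's actual code) of a node that exposes its tag name, supports `parent.div`-style access, and prints as an indented tree. The `Node` class and its `tree` method are hypothetical names used only for illustration; how this would map onto the real implementation is an open question.

```python
class Node:
    """Hypothetical element node, used only to illustrate the wishlist."""

    def __init__(self, name, children=()):
        self.name = name                  # tag name, e.g. "div"
        self.children = list(children)    # child Node objects

    def __getattr__(self, tag):
        # Called only when normal attribute lookup fails, so `name`,
        # `children`, and methods are resolved before we get here.
        if tag.startswith("_"):
            raise AttributeError(tag)
        for child in self.__dict__.get("children", ()):
            if child.name == tag:
                return child              # first direct child with that tag
        raise AttributeError("no <%s> child" % tag)

    def tree(self, indent=0):
        # Render this node and its descendants, one tag per line.
        lines = ["  " * indent + "<" + self.name + ">"]
        for child in self.children:
            lines.append(child.tree(indent + 1))
        return "\n".join(lines)

    def __repr__(self):
        return self.tree()


parent = Node("body", [Node("p"), Node("div", [Node("span")])])
print(parent.div.name)   # tag name of the first <div> child
print(parent)            # indented tree of the whole subtree
```

One caveat with this approach: a child tag that shares a name with a real attribute or method (here `name`, `children`, or `tree`) cannot be reached this way, so something like child() might still be needed as a fallback.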

GeoffBarrett · 2024-09-27T05:48:20Z

This issue is old, but I am also seeing problems with get_text(). I am calling it on the entire page contents and it returns an empty string, which I assume is due to the lack of recursion described in this issue.
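
For illustration, here is a minimal Python sketch of how a non-recursive get_text could produce an empty string on a whole page. It assumes a simple tree where each child is either a string or an element; the `Element` class and both functions are hypothetical and are not bsoup's real code.

```python
class Element:
    """Hypothetical element; each child is either an Element or a str."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def get_text_shallow(node):
    # Only looks at text that is a direct child of `node`.
    return "".join(c for c in node.children if isinstance(c, str))


def get_text(node):
    # Walks every descendant and concatenates text in document order.
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        else:
            parts.append(get_text(child))
    return "".join(parts)


page = Element("html", [
    Element("body", [
        Element("p", ["Hello, ", Element("b", ["world"]), "!"]),
    ]),
])

shallow = get_text_shallow(page)   # the root has no direct text children
deep = get_text(page)              # text gathered from all descendants
print(repr(shallow), repr(deep))
```

If the real behavior matches the shallow version, calling it on the root of a page would return nothing, because the root element typically holds only other elements and no text of its own.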

dustmop self-assigned this Nov 30, 2021
