Issues with unlinking nodes #60
Update: The problem with the loop was keeping the xpath context alive but not the document, as described in #61. So that's solved now that I know that (again). Regarding the hard crash I observed: it's not due to unlinking a single specific node; rather, it always crashes at some point. Lowering the amount of stuff I strip, step by step, eventually makes it run. I'll try to reproduce this behavior in a minimal example.
Thanks for troubleshooting in this detail and letting us know, @jangernert! There is clearly a need for a more transparent (and ideally earlier) failure when the DocumentWeak refs are no longer around. It's also reassuring that the root cause for both issues was the same flaw; definitely something to address for the next crate release.
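The "more transparent (and ideally earlier) failure" idea can be sketched in plain std Rust, without libxml at all: a context that holds only a `Weak` reference to its document can surface a loud error as soon as the document is gone, instead of crashing later. Everything here (`Document`, `Context`, `evaluate`) is a toy stand-in for illustration, not the crate's actual API.

```rust
use std::rc::{Rc, Weak};

// Toy stand-in for the crate's Document / DocumentWeak relationship.
struct Document(String);

struct Context {
    doc: Weak<Document>,
}

impl Context {
    // Fail loudly (not silently, and not with a segfault) when the
    // document has already been dropped.
    fn evaluate(&self, xpath: &str) -> Result<String, String> {
        let doc = self
            .doc
            .upgrade()
            .ok_or_else(|| format!("document dropped before evaluating {:?}", xpath))?;
        Ok(format!("evaluated {} on {}", xpath, doc.0))
    }
}

fn main() {
    let doc = Rc::new(Document("golem.html".into()));
    let ctx = Context { doc: Rc::downgrade(&doc) };
    assert!(ctx.evaluate("//a").is_ok());
    drop(doc); // keep the context alive but not the document, as in #61
    assert!(ctx.evaluate("//a").is_err()); // transparent, early failure
    println!("ok");
}
```

The point of the sketch is only the `upgrade()` check at the top of `evaluate`: any operation that needs the document verifies the weak ref first and returns an error, rather than dereferencing a dangling pointer.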
Update: Good but unexpected news: the issue isn't related to async at all. The master branch has the exact same issue now, so the only thing that changed to trigger this is the "Golem.de" website. I tested the example with an older release (Rust 1.36.0) just to be sure. Taking out some of the things that get stripped from the document fixes it, but I couldn't pin it down to one thing specifically. It seems to be either a threshold or a certain combination. I hope this is concrete enough information to help find the cause.
So, since I remember this working at some point, I tested with libxml2 version 2.9.9. Sadly, without success.
Hi @jangernert, I think you won't like my answer, but I don't think this is a bug in the crate; rather, it's an issue with your aggressive use of unlink+xpath. If you trim down your example to:

```rust
// ... snip ...
let context = Context::new(&doc).unwrap();
let xpath = "//*[contains(@class, 'tags') or contains(@id, 'tags')]";
let res = context.evaluate(xpath).unwrap();
let node_vec = res.get_nodes_as_vec();
for node in node_vec {
    println!(" Node: {:?}", doc.node_to_string(&node));
}
// ... snip ...
```

you see the matched nodes printed.
In particular, the 4th printed node lies inside the subtree of an earlier result. The likely "intended" fix here, in the spirit of libxml2, would be for you to design your xpath restrictively enough that you don't get multiple nodes from the same subtree in the result set. A "higher level" solution, which would cost runtime performance, is to design something akin to a deduplicated node set that drops results already covered by an ancestor in the same set.
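The "drop results already covered by an ancestor" idea can be illustrated without libxml2 at all. Below is a minimal sketch in plain std Rust: nodes are just integers, a map records each node's parent, and `top_level_only` (a hypothetical helper, not part of the crate) keeps only the matches with no matched ancestor, so no subtree would ever be unlinked or freed twice.

```rust
use std::collections::{HashMap, HashSet};

/// Given each node's parent (a toy stand-in for a DOM), keep only the
/// top-most nodes of `matches`: any node with an ancestor that is itself
/// in `matches` is dropped from the result.
fn top_level_only(parent: &HashMap<u32, u32>, matches: &[u32]) -> Vec<u32> {
    let set: HashSet<u32> = matches.iter().copied().collect();
    matches
        .iter()
        .copied()
        .filter(|&node| {
            // walk up towards the root; reject if any ancestor is also matched
            let mut cur = node;
            while let Some(&p) = parent.get(&cur) {
                if set.contains(&p) {
                    return false;
                }
                cur = p;
            }
            true
        })
        .collect()
}

fn main() {
    // tree: 1 -> 2 -> 4 and 1 -> 3; the xpath matched nodes 2, 4 and 3,
    // where 4 sits inside the subtree of 2
    let parent: HashMap<u32, u32> = [(2, 1), (3, 1), (4, 2)].into();
    let safe = top_level_only(&parent, &[2, 4, 3]);
    assert_eq!(safe, vec![2, 3]); // 4 is dropped: its ancestor 2 is also matched
    println!("{:?}", safe);
}
```

The walk is O(depth) per match, which is the runtime cost alluded to above; for a real node set the same check would compare libxml node pointers while walking `get_parent()` links.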
Taking a step back, I am always in favor of adding more safety to the wrapper when there is no obvious performance cost. For cases where there is a noticeable cost, I lean towards giving it a shot as an experiment, but as a separately namespaced set of methods. Maybe we need a convention where we add slower, checked equivalents of dangerous methods under a distinguishing name. Alternatively, one could make a separate, safety-checked variant of the API.
@dginev No, that seems like a fair answer. But what baffles me is that the error appeared at some point in a unit test. So I'm quite confident in saying that it worked at the time of writing the test. I'm also not sure why the same code in C works.
Indeed, I just validated that the equivalent C program runs without a segfault.
Ah right, this is actually something I realized way back when and then forgot, but luckily I left a comment:

```rust
impl Drop for _Node {
    /// Free node if it isn't bound in some document
    /// Warning: xmlFreeNode is RECURSIVE into the node's children,
    /// so this may lead to segfaults if used carelessly
    // ...
}
```
The Rust wrapper needs to not leak memory, which means whenever it Drops an unlinked node it should free it explicitly; unlinked nodes will not be deallocated internally by libxml2. That is what the wrapper implementation does, but it does not spend the time and effort to check whether there are multiple nodes from the same subtree -- and that leads to freeing twice in your example. So indeed your C file does not segfault, but here's something else it does instead:

It leaks memory!

The wrapper, meanwhile, shows no leaks! It took us a few iterations until we arrived at a very safe memory deallocation scheme here, which I will try to keep -- it's extremely useful for large batch jobs not to have your RAM leaking out. Your main.c file, on the other hand, will be in RAM-usage trouble unless it frees the nodes it gets from XPath.
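To make the double-free mechanics concrete, here is a toy model in plain std Rust (no libxml2 involved, and no real deallocation): `free_recursive` mimics `xmlFreeNode`'s documented recursion into children, so "freeing" both a node and one of its descendants visits the descendant twice. In real libxml2 that second visit touches already-freed memory and can segfault.

```rust
use std::collections::HashMap;

// Record every node a recursive free would touch, instead of actually
// freeing memory, so the double visit is observable.
fn free_recursive(children: &HashMap<u32, Vec<u32>>, node: u32, freed: &mut Vec<u32>) {
    freed.push(node);
    for &c in children.get(&node).into_iter().flatten() {
        free_recursive(children, c, freed);
    }
}

fn main() {
    // tree: node 2 has child 4; the xpath result set contained both 2 and 4
    let children: HashMap<u32, Vec<u32>> = HashMap::from([(2, vec![4])]);
    let mut freed = Vec::new();
    free_recursive(&children, 2, &mut freed); // frees node 2 and, recursively, 4
    free_recursive(&children, 4, &mut freed); // node 4 is "freed" a second time
    assert_eq!(freed, vec![2, 4, 4]);
    println!("{:?}", freed);
}
```

This is exactly why restricting the result set to one node per subtree (or deduplicating it) avoids the crash: each node is then reached by the recursive free exactly once.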
Yes, I had a similar thought about an hour ago while riding my bike (it's surprising how often you realize these things during physical exercise): in the C example I don't free every unlinked node. I'll work around the issue and close this report. Thank you for the support!
With the release of Rust 1.39 I hopped on the async train for the first time. However, I am having some difficulties with libxml-related code. I'm not sure if it's a problem with my code or with libxml itself. I'll try to describe my problems here; hopefully someone can help.
I'm running my unit tests with `#[tokio::test]`, whose documentation states "Uses current_thread runtime." So there should be no multi-threading shenanigans.

The first test straight up crashes. I nailed it down to the `node.unlink()` line in one of my functions. A very similar function does not trigger this error, so maybe it's a very specific case that is triggered here.
The second problem I'm having is that async functions don't support recursion right now, so I refactored my code into a loop. However, in the second iteration of the loop the check_for_next_page function gets stuck evaluating an XPath expression (for which I wrote a macro). The complete code is over here. So if anyone has any ideas, I'd be happy to hear them.
If the code base of my crate is too big, I can try to whip up some minimal samples.
Thank you!