Inserting a large block is very slow #149
Wassup @sdh4, does it take input as a string to convert to nodes?
--- base_nodes.py.orig 2017-01-02 15:59:58.000000000 -0600
+++ base_nodes.py 2017-11-14 14:16:19.626943779 -0600
@@ -1375,6 +1375,17 @@
         self.data.insert(index, [value, None])
         self._synchronise()
 
+    # sdh4 11/14/17 add insert_multiple
+    def insert_multiple(self, index, values):
+        """ Insert multiple values from an iterable into the ProxyList
+        at the specified index, without resynchronizing after each one. """
+        for value in values:
+            value = self._convert_input_to_node_object(value, parent=self.node_list, on_attribute=self.on_attribute)
+            self.data.insert(index, [value, None])
+            index += 1
+            pass
+        self._synchronise()
+
     def append(self, value):
         self.insert(len(self), value)

What about converting if we want to do multiple inserts from one RB tree to another RB tree?
Here's an improvement that can handle a broader range of sources (such as fsts), because it uses _convert_input_to_node_object_list(). I'm not sure how best to do multiple inserts at different places with no synchronization in between; I guess a data structure holding an array of indices and node lists could be used, but leaving the tree unsynchronized seems like it could be a bad idea. Separately, this seems to trigger an indentation bug, which I will file separately.
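A rough sketch of what that improved variant might look like; the exact code posted isn't shown here, and the helper name _convert_input_to_node_object_list() plus its parent/on_attribute keyword arguments are assumed to mirror _convert_input_to_node_object() from the patch above:

```python
# Hedged approximation of the improved insert_multiple described above,
# intended to replace the patched method on ProxyList in base_nodes.py.
# The list-conversion helper accepts strings, node lists, fsts, etc.
def insert_multiple(self, index, values):
    """Insert multiple values into the ProxyList at `index`,
    converting them in one pass and resynchronising only once."""
    nodes = self._convert_input_to_node_object_list(
        values, parent=self.node_list, on_attribute=self.on_attribute)
    for node in nodes:
        self.data.insert(index, [node, None])
        index += 1
    self._synchronise()
```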
I haven't looked at the LineProxyList algorithm, but I took a pretty close look at the CommaProxyList algorithm a few months ago. CommaProxyList was extremely slow for big files when computing the inter-element indentation (node.indentation is shockingly slow), and I suspect that could be what's happening here. #146 has my changes; a similar change to LineProxyList might fix this and #150.
The speed problem here seems to come from the self._synchronise() call completely rebuilding the inner list, and then the node list from the inner list, after every insertion. Doing that iteratively makes the whole process O(n^2), which blows up quickly for large n. The patch above addresses the problem and does not seem to introduce any others (#150 is a separate, independent problem). Shall I make a pull request?
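To make the O(n^2) argument concrete, here is a toy timing sketch using plain Python lists (not redbaron objects), where a full-copy `rebuild` stands in for _synchronise():

```python
# Contrast a full rebuild after every insert (what insert() effectively does)
# with a single rebuild at the end (what insert_multiple() proposes).
import timeit

def rebuild(data):
    # Stand-in for rebuilding node_list from the inner data list: walks everything.
    return [item for item in data]

def insert_with_resync_each_time(values):
    data = []
    for i, v in enumerate(values):
        data.insert(i, v)
        rebuild(data)          # per-insert resync -> roughly n*n/2 copies total
    return data

def insert_then_resync_once(values):
    data = []
    for i, v in enumerate(values):
        data.insert(i, v)
    rebuild(data)              # single resync -> O(n)
    return data

values = range(5000)
print(timeit.timeit(lambda: insert_with_resync_each_time(values), number=1))
print(timeit.timeit(lambda: insert_then_resync_once(values), number=1))
```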
Inserting a large block into a ProxyList can be very slow because of the _synchronise() method being called after each insertion.
I propose an insert_multiple() method that allows multiple insertions at the same time:
insert_multiple.patch.txt
Use of this patch can easily give an order of magnitude speedup for use cases such as loop unrolling that involve a large number of insert operations.
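For context, a hedged usage sketch of how the proposed method would be called from redbaron once the patch is applied; the method name and signature are taken from the patch, and the `red[0].value` proxy-list access follows the usual redbaron API:

```python
# Assumes insert_multiple() has been added to ProxyList in base_nodes.py.
# Without the patch, each insert() pays a full _synchronise() on its own.
from redbaron import RedBaron

red = RedBaron("def f(x):\n    return x\n")
body = red[0].value  # the def body, a LineProxyList

# A loop-unrolling-style workload: many generated lines to insert at once.
unrolled = ["y%d = x + %d" % (i, i) for i in range(100)]

# With the patch: one conversion per line, a single resynchronisation.
body.insert_multiple(0, unrolled)

# The equivalent without the patch would be the slow, per-insert version:
# for i, line in enumerate(unrolled):
#     body.insert(i, line)

print(red.dumps())
```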