
Inserting a large block is very slow #149

Open
sdh4 opened this issue Nov 14, 2017 · 4 comments

Comments

@sdh4

sdh4 commented Nov 14, 2017

Inserting a large block into a ProxyList can be very slow because of the _synchronise() method called after each insertion.

I propose an insert_multiple() method that performs multiple insertions in a single call:
insert_multiple.patch.txt

Using this patch can easily give an order-of-magnitude speedup for use cases, such as loop unrolling, that involve a large number of insert operations.
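The idea can be sketched with a toy stand-in for ProxyList (a hypothetical class, for illustration only; the real ProxyList and _synchronise() live in RedBaron's base_nodes.py and do much more work):

```python
class ToyProxyList:
    """Toy stand-in for RedBaron's ProxyList, illustration only."""

    def __init__(self):
        self.data = []
        self.sync_count = 0  # how many times the node list was "rebuilt"

    def _synchronise(self):
        # the real method rebuilds the whole node list from self.data
        self.sync_count += 1

    def insert(self, index, value):
        self.data.insert(index, [value, None])
        self._synchronise()

    def insert_multiple(self, index, values):
        # batch variant: defer the rebuild until all values are in place
        for value in values:
            self.data.insert(index, [value, None])
            index += 1
        self._synchronise()


slow = ToyProxyList()
for i, v in enumerate(range(100)):
    slow.insert(i, v)               # 100 synchronisations

fast = ToyProxyList()
fast.insert_multiple(0, range(100))  # 1 synchronisation

print(slow.sync_count, fast.sync_count)  # → 100 1
print(slow.data == fast.data)            # → True
```

Both paths produce the same final list; only the number of rebuilds differs.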

@rojaster
Contributor

Wassup @sdh4, does it take the input as a string and convert it to nodes?

P.S. Please post code in fenced code blocks, like this:

--- base_nodes.py.orig	2017-01-02 15:59:58.000000000 -0600
+++ base_nodes.py	2017-11-14 14:16:19.626943779 -0600
@@ -1375,6 +1375,17 @@
         self.data.insert(index, [value, None])
         self._synchronise()
 
+    # sdh4 11/14/17 add insert_multiple
+    def insert_multiple(self,index,values):
+        """ Insert multiple values from an iterable into the ProxyList 
+        at the specified index, without resynchronizing after each one. """
+        for value in values:
+            value = self._convert_input_to_node_object(value, parent=self.node_list, on_attribute=self.on_attribute)
+            self.data.insert(index, [value, None])
+            index+=1
+            pass
+        self._synchronise()
+    
     def append(self, value):
         self.insert(len(self), value)

What about conversion if we want to do multiple inserts from one RB tree to another?

@sdh4
Author

sdh4 commented Nov 15, 2017

Here's an improvement that can handle a broader range of sources (such as FSTs) because it uses _convert_input_to_node_object_list().

I'm not sure how best to do multiple inserts at different places with no synchronisation in between. I guess a data structure holding an array of indices and node lists could be used, but leaving the tree unsynchronised seems like it could be a bad idea.

This also seems to trigger an indentation bug, which I will report separately.

--- base_nodes.py.orig	2017-01-02 15:59:58.000000000 -0600
+++ base_nodes.py	2017-11-15 09:42:22.360961636 -0600
@@ -1375,6 +1375,17 @@
         self.data.insert(index, [value, None])
         self._synchronise()
 
+    # sdh4 11/14/17 add insert_multiple
+    def insert_multiple(self,index,values):
+        """ Insert multiple values from an iterable into the ProxyList 
+        at the specified index, without resynchronizing after each one. """
+        values = self._convert_input_to_node_object_list(values, parent=self.node_list, on_attribute=self.on_attribute)
+        for value in values:
+            self.data.insert(index, [value, None])
+            index+=1
+            pass
+        self._synchronise()
+    
     def append(self, value):
         self.insert(len(self), value)
 

@duncf
Contributor

duncf commented Nov 16, 2017

I haven't looked at the LineProxyList algorithm, but I took a pretty close look at the CommaProxyList algorithm a few months ago.

CommaProxyList was super slow for big files while computing the inter-element indentation. (node.indentation is shockingly slow.) I suspect that could be what's happening here. #146 has my changes. A similar change to LineProxyList might fix this and #150.

@sdh4
Author

sdh4 commented Feb 6, 2018

The speed problem here seems to come from the _synchronise() call completely rebuilding the inner list, and then the node list from the inner list, after every insertion. Done iteratively, the process ends up O(n^2), which blows up quickly for large n.
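The quadratic blow-up can be sanity-checked with a back-of-envelope model (an assumption for illustration, not a measurement of RedBaron: take _synchronise() to do work proportional to the current list length when it rebuilds):

```python
def cost_sync_each_insert(n):
    # rebuild after every one of n insertions: 1 + 2 + ... + n
    return sum(length for length in range(1, n + 1))

def cost_sync_once(n):
    # n raw insertions plus one final rebuild of length n
    return n + n

for n in (10, 100, 1000):
    print(n, cost_sync_each_insert(n), cost_sync_once(n))
# → 10 55 20
# → 100 5050 200
# → 1000 500500 2000
```

Per-insert synchronisation grows as n*(n+1)/2, i.e. O(n^2), while the batched variant grows as 2n, i.e. O(n).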

The patch above addresses the problem and does not seem to generate any other problems (#150 is a separate and independent problem). Shall I make a pull request?
