-
Notifications
You must be signed in to change notification settings - Fork 11
/
cell_row_span.py
272 lines (221 loc) · 11.6 KB
/
cell_row_span.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
"""
Table Cell and Row Span extension for Python Markdown
=====================================================
Adds spanning for rows and cells in tables.
**Example:**
| Column 1 | Col 2 | Big row span |
|:-----------------------:|-------| -------------- |
| r1_c1 spans two cols || One large cell |
| r2_c1 spans two rows | r2_c2 | |
|_^ _| r3_c2 | |
| ______  | r4_c2 |_ _|
The example renders as:
.--------------------------------------------------.
| Column 1 | Col 2 | Big row span |
|---------------------------------+----------------|
| r1_c1 spans two cols | |
|---------------------------------| |
| r2_c1 spans two rows | r2_c2 | |
| |-------| One large cell |
| | r3_c2 | |
|-------------------------+-------| |
| ____ | r4_c2 | |
`--------------------------------------------------'
To span cells across multiple columns, end them with two or more consecutive
vertical bars. Cells to the left will be merged together, as many cells are
there are bars. In the example above, there are two bars at the end of cell
2 on row 1, so the two cells to the left of it (numbers 1 and 2) are merged.
To span cells across rows, fill the cell on the last row with at least two
underscores, one at the start and the other at the end of its content, and no
other characters than spaces, underscores, '^' or '='. This is referred to as
the *marker.* The cell with the marker and all the empty cells above it to the
first non-empty cell will be made into a single cell, with the content of the
non-empty cell. See column 3 ("Big row span") in the example.
By default the contents are vertically aligned in the middle of the cell. To
align to the top, include at least one '^' character in the marker between the
two underscores; for example, '|_^^^_|' or simply '|_^ _|'. See row 2 in
column 1 of the example, which is merged with row 3 and aligned at the top. To
align to the bottom, use at least one '=' character between the underscores;
for example, '|_ = _|'. Including both '^' and '=' in a marker raises a
'ValueError' exception.
Note: If this extension finds a cell with at least two underscores and no other
characters other than spaces, '^' or '=', it assumes it's a row span marker and
attempts to process it. If you need a cell that looks like a marker (generally
one with only underscores in it), add the text '' as well---this extension
won't process it as a row span marker and Markdown will change the '' to a
space.
Bug in Markdown 2.6: Python Markdown 2.6 does not process the following table
correctly:
| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------- | -------- | -------- |
| r1,c1 | r1,c2 | r1,c3 | r1,c4 |
| r2,c1 || r2,c3 | r2,c4 |
The table should be rendered as follows:
.-------------------------------------------.
| Column 1 | Column 2 | Column 3 | Column 4 |
|----------+----------+----------+----------|
| r1,c1 | r1,c2 | r1,c3 | r1,c4 |
|----------+----------+----------+----------|
| r2,c1 | r2,c3 | r2,c4 |
`-------------------------------------------'
Instead it comes out as:
.-------------------------------------------.
| Column 1 | Column 2 | Column 3 | Column 4 |
|----------+----------+----------+----------|
| r1,c1 | r1,c2 | r1,c3 | r1,c4 |
|----------+----------+----------+----------|
| r2,c1 | r2,c4 | |
`-------------------------------------------'
The bug is in the *tables* extension, not this one. If you're having problems
getting a table with column spans to work correctly in Markdown 2.6, try
replacing '||' with '|~~|', as follows:
| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------- | -------- | -------- |
| r1,c1 | r1,c2 | r1,c3 | r1,c4 |
| r2,c1 |~~| r2,c3 | r2,c4 |
The table extension processes the above correctly, and this extension recognizes
a cell containing only '~~' as an empty cell. (I chose '~~' because I can't think
of a reason anyone would use that in a cell.) If you want to use something
different, change '~~' to another value in the following line in the code:
RE_empty_cell = re.compile(r'\s*(~~)?\s*$')
Keep in mind that many characters have special meaning in regular expressions.
If you use any of the following characters in the expression, preceed them with
a backslash ("\\") to avoid problems:
. + * | ? $ ( ) [ ] { }
Usage: See Extensions for general extension usage. Use 'cell_row_span' as the
name of the extension. You must include the 'tables' extension *before* this
one, or this extension will not be run.
This extension does not accept any special configuration options.
See <https://python-markdown.github.io/extensions/tables/> for documentation on
the **tables** extension.
This extension works with Python Markdown versions 2.6 and 3 under Python 2
and Python 3.
License: [BSD](http://www.opensource.org/licenses/bsd-license.php)
"""
from markdown.extensions import Extension
from markdown.blockprocessors import BlockProcessor
from markdown.treeprocessors import Treeprocessor
from markdown.util import etree
import re
class CellRowSpanExtension(Extension):
""" Table Cell and Row Span extension """
# Table blocks found by the block processor. Because we need to share the
# list between the block and the tree processors, we make it a property of
# the extension itself.
table_blocks = []
def extendMarkdown(self, md, md_globals):
""" Add our block and tree processors """
if 'table' in md.parser.blockprocessors:
md.parser.blockprocessors.add('cell_row_span',
CellRowSpanBlockProcessor(self, md.parser), '<table')
md.treeprocessors.add('cell_row_span',
CellRowSpanTreeProcessor(self), '<inline')
class CellRowSpanBlockProcessor(BlockProcessor):
""" Save the current block if it's a table """
table_blockprocessor = None
def __init__(self, extension_obj, md_parser):
self.table_blocks = extension_obj.table_blocks
self.table_blockprocessor = md_parser.blockprocessors['table']
super(CellRowSpanBlockProcessor, self).__init__(md_parser)
def test(self, parent, block):
""" Use the 'table' BlockProcessor's test method to determine if this a table block """
return self.table_blockprocessor.test(parent, block)
def run(self, parent, blocks):
""" Add the table block's source to our list """
self.table_blocks.append(blocks[0])
return False # Tell the BlockProcessor we didn't process the block
class CellRowSpanTreeProcessor(Treeprocessor):
""" Add cell and row spans to table as needed """
RE_adjacent_bars = re.compile(r'\|(~~)?\|')
RE_remove_lead_pipe = re.compile(r'^ *\|') # ... Colonel Mustard in the Library? ;)
RE_row_span_marker = re.compile(r'^_[_^= ]*_$')
RE_valign_top = re.compile(r'\^')
RE_valign_bottom = re.compile(r'=')
RE_empty_cell = re.compile(r'\s*(~~)?\s*$')
def __init__(self, extension_obj):
self.table_blocks = extension_obj.table_blocks
super(CellRowSpanTreeProcessor, self).__init__()
def _update_colspan_attrib(self, text, t_index, tr_index, tr, td_remove):
""" Update 'colspan' attributes in 'td' entries """
text = self.RE_remove_lead_pipe.sub('', text) # Remove leading '|' from text
td_index = 0
td_last_active_index = 0
for c in text.split('|'):
if len(c) == 0 or c == '~~':
try:
td = tr[td_last_active_index] # Update 'colspan' on previous cell
except IndexError:
row_content = ''
for i in range(len(tr)):
x = tr[i].text
row_content += " Cell %i: %s\n" % (i+1, x if x else 'Empty')
raise IndexError(
'Cannot merge cell beyond end of row '
"(one too many '|' characters in row?)\n"
'Check row %i of table %i in your document. Row contents:\n%s' % (
tr_index+1, t_index, row_content
)
)
if td_index < len(tr):
span = 1
if 'colspan' in td.keys():
span = int(td.get('colspan'))
td.set('colspan', str(span+1))
td_remove.append( (tr, tr[td_index]) )
else:
td_last_active_index = td_index
td_index += 1
def _update_rowspan_attrib(self, tbody, tr_index, td_index, td_remove):
""" Update 'rowspan' attributes in 'td' entries """
# Look for '^' (vertical align top) or '=' (bottom) in the marker
marker = tbody[tr_index][td_index].text
v_align = 'middle'
if self.RE_valign_top.search(marker):
v_align = 'top'
if self.RE_valign_bottom.search(marker):
if v_align == 'top':
raise ValueError(
'Cannot use both ^ (top) and = (bottom) codes in a row span '
'marker\nCheck row %i, column %i in table %i in your '
'document' % (tr_index+1, td_index+1, self.table_count)
)
v_align='bottom'
# Starting from the current row, go up the rows and delete columns
# until we hit a non-empty column (or the start of the table)
for row_num in reversed(range(tr_index+1)):
td = tbody[row_num][td_index]
if row_num == tr_index or self.RE_empty_cell.match(td.text):
td_remove.append( (tbody[row_num], td) )
else:
break
# Update the colspan and valign attributes on the row
td = tbody[row_num][td_index] if row_num >=0 else tbody[0][td_index]
td.set('rowspan', str(tr_index-row_num+1))
def run(self, root):
# Process all the tables in the ElementTree
t_index = 0
for table in root.findall('.//table'):
# Retrieve the block saved by the BlockProcessor
rows = self.table_blocks[t_index].split('\n')
t_index += 1
# Scan the original table text for adjacent columns and spanned rows
tbody = table.find('tbody')
td_remove = [] # List of td objects to be removed
tr_index = 0
for tr in tbody:
# Check for adjacent columns
if tr_index + 2 in rows and self.RE_adjacent_bars.search(rows[tr_index+2]):
self._update_colspan_attrib(rows[tr_index+2], t_index, tr_index, tr, td_remove)
# Check for spanned rows
td_index = 0
for td in tr:
if self.RE_row_span_marker.match(td.text):
self._update_rowspan_attrib(tbody, tr_index, td_index, td_remove)
td_index += 1
tr_index += 1
# Remove unneeded td elements
for tr, td in td_remove:
if td in tr:
tr.remove(td)
def makeExtension(*args, **kwargs):
return CellRowSpanExtension(*args, **kwargs)