This repository has been archived by the owner on Feb 11, 2024. It is now read-only.
.. Copyright (c) 2009 Ars Aperta, Itaapy, Pierlis, Talend.
Authors: Hervé Cauwelier <[email protected]>
Jean-Marie Gouarné <[email protected]>
Luis Belmar-Letelier <[email protected]>
This file is part of Lpod (see: http://lpod-project.org).
Lpod is free software; you can redistribute it and/or modify it under
the terms of either:
a) the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option)
any later version.
Lpod is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with Lpod. If not, see <http://www.gnu.org/licenses/>.
b) the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
################################
Level 1 Functional specification
################################
.. contents::
Common features and conventions
===============================
All the lpOD level 0 features are available through the level 1 API, so the
applications can create, retrieve or delete any element. They can create,
select, update or delete any attribute or sub-element in a previously retrieved
element.
The API provides functions and methods.
Object creation
---------------
Functions are mainly used as object constructors, in order to create new ODF
elements that could be later attached to a document. The name of an object
constructor is like ``odf_create_xxx()`` where "xxx" is the object type.
These constructors return "free" ODF elements, i.e. elements which don't belong
yet to any document; these elements may be attached later through a document or
context based method. However, some very specific objects may be created "in
place", through ``set_xxx()`` element specific methods that create the objects
and directly append them to the calling element.
Once created, an object may be changed through the ``set_text()`` and
``set_attribute()`` level 0 methods; however, the level 1 features allow the
user to set the most commonly used properties in a friendlier way.
Object property handling
------------------------
The level 1 ``set_attribute()`` method extends the level 0 one by allowing
the user, in some situations, to omit the ODF namespaces. Knowing that every
ODF attribute name belongs to a namespace, if ``set_attribute()`` is
called from an ODF element with an attribute name without namespace prefix, the
method transparently concatenates the given name to the namespace prefix of the
calling object. ``get_attribute()`` behaves the same way. As a consequence,
the prefix may be safely omitted with attributes whose namespace is the same as
the namespace of the target element. In addition, knowing that an XML attribute
name can't contain blank spaces, these methods automatically replace every
space by a dash. For example, assuming ``p`` is a paragraph (which belongs to
the "text" namespace), the three instructions below (that return the style of
the given paragraph) are equivalent::

    p.get_attribute('text:style-name')
    p.get_attribute('style-name')
    p.get_attribute('style name')

There is an exception regarding a particular attribute, which is the style name.
When ``get_attribute()`` or ``set_attribute()`` is called with an attribute
name without prefix and ending with "style", the namespace prefix is inserted as
usual, but in addition a "-name" string is silently appended. Knowing that
attributes like "xxx-style-name" are very frequently used, this feature provides
a "xxx style" shortcut. As a consequence, the following instruction is
equivalent to each of the instructions in the previous example::

    p.get_attribute('style')

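The name resolution rules described above can be sketched in a few lines of
Python. This is only an illustration, not the lpOD implementation:
``normalize_attribute_name()`` and its ``element_prefix`` argument are
hypothetical names introduced here, and the sketch assumes that a name which
already carries a namespace prefix is left untouched apart from the
space-to-dash replacement.

```python
def normalize_attribute_name(name: str, element_prefix: str) -> str:
    """Expand a shorthand attribute name into a prefixed ODF attribute name.

    Hypothetical helper: `element_prefix` stands for the namespace prefix
    of the calling element (e.g. "text" for a paragraph).
    """
    # An XML attribute name can't contain spaces, so every space becomes a dash.
    name = name.strip().replace(" ", "-")
    if ":" in name:
        # Assumption of this sketch: an explicit prefix is kept as given.
        return name
    if name.endswith("style"):
        # The "xxx style" shortcut stands for "xxx-style-name".
        name += "-name"
    # Without explicit prefix, the prefix of the calling element is used.
    return f"{element_prefix}:{name}"


# The four spellings used in the examples above resolve to the same name.
for spelling in ("text:style-name", "style-name", "style name", "style"):
    print(spelling, "->", normalize_attribute_name(spelling, "text"))
```

Each of the four spellings resolves to ``text:style-name`` in this sketch.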
A ``get_attributes()`` method is provided, that returns all the attributes of
the calling element (with their real ODF names) and their values as an array
of named items. The ``set_attributes()`` method allows the user to change or
create several attributes at a time; it checks and transforms the given
attribute names in the same way as ``set_attribute()``.
Some ODF elements own a ``set_properties()`` method, which could sound redundant
with ``set_attributes()``. However, ``set_properties()`` may set element
properties that imply element-specific transformations or constructs, make some
consistency checks, and allow the user to provide property names that aren't
directly translated into simple attributes using the same name transformation
rules as ``set_attributes()``. The same logic applies to ``get_properties()``,
when defined.
In the present specification, some element properties or attributes may be
named using multiple-word designations (ex: ``display name``, ``page layout``)
that include spaces or dashes. Knowing that such designations are not easy to
use as variable names in every programming language, spaces and dashes should
be replaced by underscore ("_") characters in the lpOD executable
implementations.
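As a minimal sketch of this naming convention, the hypothetical helper below
(``to_identifier()`` is not part of the specification) turns a multiple-word
designation into a name that is usable as a variable or keyword argument.

```python
import re


def to_identifier(designation: str) -> str:
    """Replace each run of spaces and/or dashes by a single underscore."""
    return re.sub(r"[ -]+", "_", designation.strip())


print(to_identifier("display name"))  # display_name
print(to_identifier("page layout"))   # page_layout
```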
Method scopes
-------------
Some methods are document-based, others are context-based, and others are
element-specific.
A document-based method is a method that makes sense at the document level
only. As an example, ``insert_style()`` is document-based knowing that a style
is always defined at the document level.
A context-based method is designed in order to allow the user to insert, search,
process or delete content elements either in the whole document body, or in a
particular branch in the content tree. For example ``insert_element()`` is
context-based because it allows the insertion of an element in any context. Of
course, a context is always an ODF element, but context-based methods are
available whatever the element type (however, a context-based method can raise
an error, for example when it's used to execute an operation that is not legal
for the current context).
The level 1 ``insert_element()`` method supports all the features of the level 0
version, but it accepts the additional parameters ``before`` and ``after``,
whose value is an ODF element. The element to be inserted takes place
immediately *after* the reference element provided through the ``after``
parameter (if set). Alternatively, the insertion will take place *before* any
element which is provided through the ``before`` parameter. These parameters are
intended to hide the low level XML jargon, and they are, of course, optional and
mutually exclusive.
On the other hand, ``append_element()`` always attaches an element after the
last child of the context element.
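The positioning rules above can be modelled with a plain Python list standing
for the children of the context element. This is a sketch under stated
assumptions, not the lpOD code: the functions take the child list explicitly
instead of being methods, and the behaviour when neither ``before`` nor
``after`` is given (appending) is an assumption of this sketch, since it
depends on the level 0 parameters that are not described here.

```python
def insert_element(children, element, before=None, after=None):
    """Insert `element` in `children` relative to a reference element."""
    if before is not None and after is not None:
        # The two positioning parameters are mutually exclusive.
        raise ValueError("'before' and 'after' are mutually exclusive")
    if before is not None:
        # The new element takes the place just before the reference element.
        children.insert(children.index(before), element)
    elif after is not None:
        # The new element takes the place just after the reference element.
        children.insert(children.index(after) + 1, element)
    else:
        # Assumption of this sketch only: no reference means "append".
        children.append(element)
    return element


def append_element(children, element):
    """Attach `element` after the last child of the context."""
    children.append(element)
    return element


body = ["heading", "paragraph 1", "paragraph 2"]
insert_element(body, "table", after="paragraph 1")
insert_element(body, "title", before="heading")
append_element(body, "footer text")
print(body)
```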
An element-specific method works with specific ODF elements only, according to
their particular role. For example ``set_header()`` is provided with ODF master
pages, because a header is an extension of a page style element, while
``set_background()`` is available with objects where a background definition
makes sense (such as page layouts or paragraph styles).
Common element-specific functions and methods
=============================================
Any ODF element in the level 1 API inherits all the features of the underlying
XML element.
Every ODF element comes with methods that directly return its parent, next
sibling, previous sibling, and the list of its children. These methods (which
are provided by the underlying XML API) are available whatever the element type.
Any element provides a ``clone`` method, which creates a new instance of the
element with all its children; this instance is free and can be inserted later
in any place in the same document or in another document. An element may be
removed through a ``delete`` method from its parent element; the deletion
removes the element itself and all its children.
Some elements are created without any predefined attachment, i.e. as free
elements, by specific constructor functions whose name is like
``odf_create_xxx()``, where ``xxx`` is the kind of element to be created.
A free element can be inserted later at the right place. Other elements, whose
definition doesn't make sense outside a specific context, are directly created in
place, through context-based methods whose name is ``set_xxx()``. Beware, every
``set_xxx()`` method creates or replaces something in the calling element, but
some of them don't create new elements.
Any element can be serialized and exported as a UTF-8 encoded XML
string. Symmetrically, an element can be created from an application-provided
XML string. As a consequence, lpOD-based applications can remotely transmit or
receive any kind of ODF content.
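The export/import round trip can be illustrated with the Python standard
library. This is a sketch only: ``xml.etree.ElementTree`` is used here as a
stand-in for the XML layer underlying lpOD, and ``export_element()`` and
``import_element()`` are hypothetical names, not lpOD functions.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the ODF "text" prefix.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("text", TEXT_NS)


def export_element(element):
    """Serialize an element (with its children) as UTF-8 encoded XML bytes."""
    return ET.tostring(element, encoding="utf-8")


def import_element(xml_bytes):
    """Create a new free element from an application-provided XML string."""
    return ET.fromstring(xml_bytes)


# Build a paragraph, export it, and rebuild an independent copy from the XML.
paragraph = ET.Element(f"{{{TEXT_NS}}}p", {f"{{{TEXT_NS}}}style-name": "TextBody"})
paragraph.text = "My first paragraph"
payload = export_element(paragraph)
received = import_element(payload)
print(received.text)
```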
The level 1 API is not validating, so the user is responsible for the ODF
compliance (the API doesn't automatically prevent the applications from
inserting an element at the wrong place or from setting non-ODF elements).
Any element can be retrieved according to its sequential position in a given
context or its text content (if defined), through methods like
``get_xxx_by_position()`` and ``get_xxx_by_content()`` where "xxx" is the
element type (e.g. "paragraph", "heading", etc.). For example::

    element = context.get_xxx_by_position(p)
    element = context.get_xxx_by_content(regex)

The two lines above retrieve an element among the children of the context
element. The first one selects the child element at the given ``p`` position.
The given position is an integer; the first position is zero; negative positions
are counted back from the last (-1 is the last position).
The second instruction retrieves the first element whose text content matches a
given ``regex`` regular expression. Knowing that the regexp could be matched by
more than one element, the same method is available in a list context.
It's also possible to get the list of elements of a known type in the context,
using ``get_xxx_list()``.
Additional retrieval methods are available according to the element type.
Every search method operates in context, knowing that the context could be the
whole document as well as a particular element (section, table, etc).
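The selection rules for positions and contents can be sketched as follows,
using a list of strings in place of the elements of a context. The function
names are hypothetical, and returning ``None`` for an out-of-range position
or an unmatched expression is an assumption of this sketch.

```python
import re


def get_by_position(texts, position):
    """Return the item at a zero-based position; -1 is the last position."""
    try:
        return texts[position]
    except IndexError:
        return None


def get_by_content(texts, regex):
    """Return the first item whose text matches the regular expression."""
    pattern = re.compile(regex)
    return next((text for text in texts if pattern.search(text)), None)


def get_list_by_content(texts, regex):
    """List variant: return every item whose text matches the expression."""
    pattern = re.compile(regex)
    return [text for text in texts if pattern.search(text)]


paragraphs = ["Introduction", "The lpOD Project", "Conclusion"]
print(get_by_position(paragraphs, -1))
print(get_by_content(paragraphs, r"lpOD"))
```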
Basic text containers
=====================
Paragraphs
-----------
A paragraph element inherits all the basic element features introduced above,
and owns the following ones.
All the visible text content of a document is held in paragraphs (and in
*headings*, which are special paragraphs, cf. later in this documentation).
A paragraph is basically a text container associated with a layout style.
The text content may be held directly as the text of the paragraph element;
however, a paragraph can contain sub-paragraph elements called *spans*
(introduced later in this documentation).
As soon as a piece of text is displayed somewhere in a document,
whatever the context, this text belongs to a paragraph.
In a text document, paragraphs may appear as top level elements, i.e.
directly in the document body, as well as inside complex containers, such as
lists, tables, text boxes. Paragraphs may be used as components of page headers
or footers. In other documents, a paragraph can't appear as a top level element,
knowing that any visible text is embedded in a structured container (table cell,
text box, etc).
Creation and attachment
~~~~~~~~~~~~~~~~~~~~~~~
A paragraph can be created with a given style and a given text content. The
default content is an empty string. There is no default style; a paragraph can
be created without explicit style, as long as the default paragraph style of the
document suits the application. The style and the text content can
be set or changed later.
A paragraph is created (as a free element) using the ``odf_create_paragraph()``
function, with optional ``text`` and ``style`` parameters. It may be
attached later through the standard ``append_element()`` or
``insert_element()`` method::

    p = odf_create_paragraph(text='My first paragraph', style='TextBody')
    document.append_element(p)

Retrieval
~~~~~~~~~
Like any element, a paragraph can be retrieved in a given context using
``get_paragraph_by_position()`` or ``get_paragraph_by_content()``, and
``get_paragraph_list()`` returns all the paragraphs in the context.
Calling ``get_paragraph_list()`` with a ``style`` named parameter restricts the
search to the paragraphs which use the given style.
Text processing
~~~~~~~~~~~~~~~
The traditional string editing methods (i.e. regex-based search & replace
functions) are available against the text content of a paragraph.
``search()`` is an element-based method which takes a search string (or a
regular expression) as argument and returns the position of the first substring
matching the argument in the text content of the element. A null return value
means no match. This method works with the direct text content of the calling
element, not with the children, so it makes sense with paragraphs, headings and
text spans only.
``replace()`` is a context-based method. It takes two arguments, the first one
being a search string like with ``search()``, the second one a text which will
replace any substring matching the search string. The return value of the
method is the total number of matches. If the second argument is an empty
string, every matching substring is just deleted without replacement. If the
second argument is missing, then nothing is changed, and the method just counts
the number of matches. Being context-based, the method works recursively on
all the paragraphs, headings and spans below the calling element; the calling
element may be any ODF element, including the elements that can't directly own a
text content. It may be called at the document level.
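The return values of ``search()`` and ``replace()`` can be illustrated with
the sketch below, which works on plain strings rather than on ODF elements.
It is an approximation under stated assumptions: the functions are standalone
instead of being methods, the context is modelled as a flat list of
paragraph texts, and ``None`` plays the role of the null return value.

```python
import re


def search(text, pattern):
    """Return the position of the first match in `text`, or None."""
    match = re.search(pattern, text)
    return match.start() if match else None


def replace(texts, pattern, replacement=None):
    """Replace every match in every text, in place; return the match count.

    With replacement="" the matches are deleted; with no replacement at all
    nothing is changed and the matches are only counted.
    """
    total = 0
    for index, text in enumerate(texts):
        if replacement is None:
            total += len(re.findall(pattern, text))
        else:
            texts[index], count = re.subn(pattern, replacement, text)
            total += count
    return total


context = ["ODF is open", "nothing here", "ODF ODF"]
print(replace(context, "ODF"))                  # counts only
print(replace(context, "ODF", "OpenDocument"))  # replaces and counts
print(context)
```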
Multiple spaces and intra-paragraph breaks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
According to the ODF specification, a sequence of multiple spaces is regarded
as a single space, so multiple spaces must be represented by an appropriate
ODF element. In the same way, tabulation marks and line breaks can't be
directly included in the text content, and must be replaced by appropriate
ODF elements. This API transparently does the job: it allows the user to put
in a paragraph text strings containing multiple spaces, tab stops ("\t")
and/or line breaks ("\n").
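The transformation can be pictured with the following sketch, which turns a
Python string into the XML content of a paragraph using the ODF
``text:s``, ``text:tab`` and ``text:line-break`` elements. It is an
illustration only: ``encode_whitespace()`` is a hypothetical name, the output
is a plain string rather than an element tree, and keeping the first space
of a run as a literal space is a convention assumed here.

```python
import re
from xml.sax.saxutils import escape


def encode_whitespace(text):
    """Encode space runs, tabs and line breaks as ODF elements."""
    parts = []
    # The capturing group keeps the separators in the result of the split.
    for chunk in re.split(r"( {2,}|\t|\n)", text):
        if chunk == "\t":
            parts.append("<text:tab/>")
        elif chunk == "\n":
            parts.append("<text:line-break/>")
        elif len(chunk) >= 2 and chunk.strip(" ") == "":
            # One literal space, then the remaining spaces as a counted element.
            parts.append(f' <text:s text:c="{len(chunk) - 1}"/>')
        else:
            # Ordinary text: only XML special characters need escaping.
            parts.append(escape(chunk))
    return "".join(parts)


print(encode_whitespace("Name:\tlpOD\nThree   spaces"))
```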
Headings
---------
All the features that apply to paragraphs, as described above, apply to headings
as well. As a consequence, a heading may be regarded as a subclass of the
paragraph class.
However, a heading is a special paragraph which owns additional properties
related to its hierarchical level and its numbering. As a consequence, some
heading-specific methods are provided, and the constructor function is
``odf_create_heading()``. The ``text`` and ``style`` parameters are allowed
like with ``odf_create_paragraph()``. In addition, this constructor accepts more
optional parameters:
- ``level`` which indicates the hierarchical level of the heading (default 1,
i.e. the top level);
- ``restart numbering``, a boolean which, if true, indicates that the numbering
should be restarted at the current heading (default false);
- ``start value`` to restart the heading numbering of the current level at a
given value;
- ``suppress numbering``, a boolean which, if true, indicates that the heading
must not be numbered (default false).
See below for explanations about level and numbering.
In addition, the layout of the headings depends partly on the paragraph style
that individually applies to each one, and partly on the outline style of the
document (see the "Outline style" section in the present document).
Heading level
~~~~~~~~~~~~~
A heading owns a special property which indicates its hierarchical level in the
document. A "level" property can be set at creation time or later and changed at
any time. A heading without a level attribute is assumed to be at level 1, which
is the top level. The level may be any positive integer value (while the ODF
spec doesn't set an explicit limit, we don't recommend levels beyond 10).
Heading numbering
~~~~~~~~~~~~~~~~~~
Whatever the visibility of the numbers, all the headings of a given level are
potentially numbered. By default, the numbering is related to the whole
document, starting from 1. However, optional properties allow the user to change
this behaviour.
An arbitrary, explicit numbering value can be set, so the automatic numbering
restarts from this value at the target heading element and applies to the
following headings at the same level.
The automatic numbering can be inhibited through an optional property which
prevents the current heading from being numbered.
In addition, the API allows the users to provide a heading with an arbitrary
hidden number. A hidden number is a static, user-provided value available for
applications that can't dynamically calculate the numbering, but safely ignored
by applications that support dynamic numbering in text documents.
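The effect of the numbering properties can be sketched as below. This is a
simplified model, not the behaviour of any particular ODF consumer: headings
are represented as dictionaries with hypothetical keys (``level``,
``start_value``, ``suppress_numbering``), each level has its own counter for
the whole document, and the interplay between levels (such as resetting the
sub-level counters under a new parent heading) is deliberately left out.

```python
def number_headings(headings):
    """Return the number of each heading, or None when it is not numbered."""
    counters = {}  # one running counter per hierarchical level
    numbers = []
    for heading in headings:
        level = heading.get("level", 1)  # a missing level means level 1
        if heading.get("suppress_numbering", False):
            # The heading is skipped and doesn't consume a number.
            numbers.append(None)
            continue
        if "start_value" in heading:
            # Explicit value: the numbering restarts here for this level.
            counters[level] = heading["start_value"]
        else:
            counters[level] = counters.get(level, 0) + 1
        numbers.append(counters[level])
    return numbers


outline = [
    {"level": 1},
    {"level": 2},
    {"level": 1},
    {"level": 1, "suppress_numbering": True},
    {"level": 1, "start_value": 10},
    {"level": 1},
]
print(number_headings(outline))
```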
Text spans
----------
A text span, in the lpOD scope, is a delimited area included in a paragraph or
a heading. It's a sub-paragraph text container whose essential function is to
associate a particular feature to a limited text run instead of a whole
paragraph.
There are several kinds of text spans.
- Style spans: a text span can be defined in order to apply a special style to
a part of the content of a paragraph/heading. As a consequence, it's
associated to a text style.
- Hyperlinks: a hyperlink can be defined in order to associate a part of the
content of a paragraph/heading to another content element in the current
document or to an external resource.
Unlike paragraphs and headings, spans are created "in place", i.e. their
creation methods create and directly insert them in an existing container.
A style span is created through a ``set_span()`` method from the object that
will contain the span. This object is a paragraph, a heading or an existing
style span. The method must be called with a ``style`` named parameter whose
value should be the name of any text style (common or automatic, existing or to
be created in the same document). ``set_span()`` may use a string or a regular
expression, which may match the text content of the calling object zero, one or
several times, so the span is applied repeatedly to every substring that
matches. The string is provided through a ``filter`` parameter. Alternatively,
``set_span()`` may be called with given ``position`` and ``length`` parameters,
in order to apply the span once whatever the content. Note that ``position`` is
an offset that may be a non-negative integer (starting from 0 for the first
position), or a negative integer (starting from -1 for the last position) if the
user prefers to count back from the end of the target. If the ``length``
parameter is omitted or set to 0, the span runs up to the end of the target
content. If ``position`` is out of range, nothing is done; if ``position`` is
valid, any extra length is ignored. The following instructions create two text
spans with a so-called "HighLight" style; the first one applies the given style
to every "The lpOD Project" substring, while the second one applies it once to a
fixed-length substring at a given position, ``p`` being the target paragraph::

    p.set_span(filter='The lpOD Project', style='HighLight')
    p.set_span(position=3, length=5, style='HighLight')

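The positioning rules of ``set_span()`` can be checked with a small sketch
that only computes the character ranges a span would cover, without creating
any element. The two helper functions are hypothetical names, and the ranges
are expressed as Python-style ``(start, end)`` pairs with an exclusive end.

```python
import re


def span_range(text, position, length=0):
    """Return the (start, end) range of a positional span, or None."""
    size = len(text)
    # A negative position is counted back from the end (-1 is the last one).
    start = position if position >= 0 else size + position
    if not 0 <= start < size:
        return None  # out of range: nothing is done
    # An omitted or zero length runs up to the end; extra length is ignored.
    end = size if not length else min(start + length, size)
    return start, end


def span_ranges_by_filter(text, pattern):
    """Return one (start, end) range per substring matching the filter."""
    return [match.span() for match in re.finditer(pattern, text)]


content = "ODF and OpenDocument"
print(span_range(content, 3, 5))
print(span_range(content, -5))
print(span_ranges_by_filter(content, "ODF|OpenDocument"))
```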
A hyperlink span is created through ``set_hyperlink()``, which expects the
same positioning parameters (by regex or by position and length). However,
there is no style, and a ``url`` parameter (whose value is any kind of path
specification that is supported by the application) is required instead.
A hyperlink span can't contain any other span, while a style span can contain
one or more spans. As a consequence, the only way to provide a hyperlink
span with a text style consists of embedding it in a style span.
The objects that can directly contain text spans are paragraphs, headings and
style spans. However, ``set_span()`` and ``set_hyperlink()`` may be called
from any higher level container that can contain paragraphs or headings,
including the whole document. The span creation process may work recursively and
repeatedly in all the paragraphs and spans below the calling ODF element. Both
methods return the list of the created span objects; a span object is an ODF
element itself. However, it's possible to prevent this recursion with a boolean
``norecurse`` parameter; if this option is set to ``true``, it prevents
``set_span()`` or ``set_hyperlink()`` from searching and processing the children
of the calling ODF element; of course, nothing is done when ``norecurse`` is set
and the current object is not able to directly contain text spans.
As an example, the instruction below applies the "HighLight" text style to
every "ODF" and "OpenDocument" substring in the ``p`` context::

    p.set_span(filter='ODF|OpenDocument', style='HighLight')

The following example attaches a hyperlink to the last 5 characters of the
``p`` container (note that the ``length`` parameter is omitted, meaning that
the hyperlink will run up to the end)::

    p.set_hyperlink(position=-5, url='http://here.org')

The following sequence shows the way to set a style span and a hyperlink for
the same text run. The style span is created first, then it's used as the
context to create a hyperlink span that spreads over its whole content::

    s = p.set_span(filter='The lpOD Project', style='Outstanding')
    s.set_hyperlink(position=0, url='http://www.lpod-project.org')

Text marks and indices
======================
Position bookmarks
------------------
A position bookmark is a location mark somewhere in a text container, which is
identified by a unique name, but without any content.
A bookmark is created "in place", in a given element at a given position. The
name and the target element are mandatory arguments. By default, the bookmark is put before the first character of the content.
The position can be explicitly provided by the user. Alternatively, the user can provide a regular expression, so the bookmark is set before the first substring that matches the expression::

    document.create_bookmark("BM1", paragraph, text="xyz")
    document.create_bookmark("BM2", paragraph, position=4)

The first instruction above sets a bookmark before the first substring matching
the given expression (here ``xyz``), which is processed as a regular expression.
The second instruction sets a bookmark in the same paragraph at a given
(zero-based) position, here 4, so before the 5th character.
In order to put a bookmark according to a regex that could be matched more than
once in the same paragraph, it's possible to combine the position and text
options, so the search area begins at the given position.
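How the ``position`` and ``text`` options combine can be sketched with a
function that only computes the offset where the bookmark would be put in a
string. ``bookmark_offset()`` is a hypothetical helper, and returning
``None`` when the position is out of range or the expression is not matched
is an assumption of this sketch.

```python
import re


def bookmark_offset(content, position=0, text=None):
    """Return the offset before which a bookmark would be inserted."""
    if not 0 <= position <= len(content):
        return None
    if text is None:
        # No expression: the bookmark goes at the given position,
        # i.e. before the first character by default.
        return position
    # With an expression, the search area begins at the given position.
    match = re.compile(text).search(content, position)
    return match.start() if match else None


paragraph_text = "abc xyz abc xyz"
print(bookmark_offset(paragraph_text, text="xyz"))              # first match
print(bookmark_offset(paragraph_text, position=4))              # fixed offset
print(bookmark_offset(paragraph_text, position=5, text="xyz"))  # later match
```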
A bookmark can be retrieved by its unique name. The containing ODF element can
then be obtained as the parent of the bookmark element. However, if the bookmark
is located inside a span, its parent is the span element instead of a regular
paragraph. So another method is provided, that returns the main text container
of the bookmark. In the following example, the first line returns the parent of
a given bookmark (whatever the kind of element), while the second one returns
the paragraph (or heading) where the bookmark is located::

    context.get_bookmark("BM1").parent
    context.get_paragraph_by_bookmark("BM1")

Another method allows the user to get the offset of a given bookmark in the host ODF element. Beware: this offset is related to the text of the parent element (which could be a text span).
Range bookmarks
----------------
A range bookmark is an identified text range which can spread across paragraph
boundaries. It's a named content area, independent of the document tree
structure. It starts somewhere in a paragraph and stops somewhere in the same
paragraph or in a following one. Technically, it's a pair of special position
bookmarks, so-called bookmark start and bookmark end, owning the same name.
The API allows the user to create a named range bookmark over an
existing content, as well as to retrieve and extract it according to its name.
The provided methods allow the user to get:
- the pair of elements containing the bookmark start and the bookmark end
(possibly the same);
- the text content of the bookmark (without the structure).
A retrieved range bookmark can be safely removed through a single method.
A range bookmark can be safely processed only if it's entirely contained in the
calling context. A context that is not the whole document can contain a bookmark
start or a bookmark end but not both. In addition, a bookmark spreading across
several elements becomes corrupted if the element containing its start point or
its end point is later removed.
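The extraction of the text content of a range bookmark can be pictured with
the sketch below, where the document is reduced to a list of paragraph texts
and the bookmark start and end are given as ``(paragraph index, offset)``
pairs. All names are hypothetical, and joining the paragraph pieces with a
line break is an assumption of this sketch, not a rule of the specification.

```python
def range_bookmark_text(paragraphs, start, end, separator="\n"):
    """Return the text between a bookmark start and a bookmark end."""
    (start_par, start_offset), (end_par, end_offset) = start, end
    if (end_par, end_offset) < (start_par, start_offset):
        raise ValueError("the bookmark end precedes the bookmark start")
    if start_par == end_par:
        # Start and end are located in the same paragraph.
        return paragraphs[start_par][start_offset:end_offset]
    # Tail of the first paragraph, whole paragraphs in between,
    # then the head of the last paragraph.
    parts = [paragraphs[start_par][start_offset:]]
    parts.extend(paragraphs[start_par + 1:end_par])
    parts.append(paragraphs[end_par][:end_offset])
    return separator.join(parts)


texts = ["First paragraph.", "Second one.", "Third and last."]
print(range_bookmark_text(texts, (0, 6), (2, 5)))
```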
Tables of contents [todo]
=========================
Indices [todo]
=======================
Notes
=======================
Generally speaking, a note is an object whose main function is to allow the user
to set some text content out of the main document body but to structurally
associate this content to a specific location in the document body. The content
of a note is stored in a sequence of one or more paragraphs and/or item lists.
The lpOD API supports three kinds of notes, so-called footnotes, endnotes and
annotations. Footnotes and endnotes have the same structure and differ only by
their display location in the document body, while annotations are specific
objects.
Footnote and endnote creation
-----------------------------
Footnotes and endnotes are created through the same method. The user must
provide a note identifier, i.e. an arbitrary code name (not visible in the
document), unique in the scope of the document, and a class option, knowing that
a note class is either 'footnote' or 'endnote'.
These notes are created as free elements, so they can be inserted later in place
(and replicated for reuse in several locations in one or more documents). As a
consequence, creation and insertion are done through two distinct functions,
i.e. ``odf_create_note()`` and ``insert_note()``, the second one being a
context-related method.
While the identifier and the class are mandatory as soon as a note is inserted
in a document, these parameters are not required at the creation time. They can
be provided (or changed) through the ``insert_note()`` method.
The ``insert_note()`` method allows the user to insert the note in the same way
as a position bookmark (see above). As a consequence, its first arguments are
the same as those of the bookmark creation method. However, ``insert_note()``
requires additional arguments providing the identifier and the class
(if not previously set), and the citation mark, i.e. the symbol which will be
displayed in the document body as a reference to the note. Remember that the
note citation is not an identifier; it's designed to be displayed according
to a context-related logic, while the identifier is unique for the whole
document.
Regarding the identifier, the user can provide either an explicit value, or a
function that is supposed to return an automatically generated unique value. If
the class option is missing, the API automatically selects 'footnote'.
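The handling of the identifier and class options can be sketched as follows.
This is an illustration under stated assumptions, not the lpOD code: a note
is modelled as a dictionary, ``prepare_note()`` and ``make_id_generator()``
are hypothetical helpers, and the positioning arguments of ``insert_note()``
are left out.

```python
import itertools


def make_id_generator(prefix="note"):
    """Return a function producing a new unique identifier at each call."""
    counter = itertools.count(1)
    return lambda: f"{prefix}{next(counter)}"


def prepare_note(note, note_id=None, note_class=None, citation=None):
    """Complete a note with its identifier, class and citation mark."""
    if note_id is not None:
        # The identifier is either an explicit value or a generator function.
        note["id"] = note_id() if callable(note_id) else note_id
    if note_class is not None:
        note["class"] = note_class
    # A missing class option selects 'footnote'.
    note.setdefault("class", "footnote")
    if note["class"] not in ("footnote", "endnote"):
        raise ValueError("a note class is either 'footnote' or 'endnote'")
    if "id" not in note:
        raise ValueError("an inserted note requires an identifier")
    if citation is not None:
        note["citation"] = citation
    return note


new_id = make_id_generator()
print(prepare_note({}, note_id=new_id, citation="1"))
print(prepare_note({}, note_id="end-a", note_class="endnote", citation="i"))
```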
Footnote and endnote content
-----------------------------
A note is a container whose body can be filled with one or more paragraphs or
item lists at any time, before or after the insertion in the document. As a
consequence, a note can be used as a regular context for paragraph or list
appending or retrieval operations.
Note that neither the OpenDocument schema nor the lpOD level 1 API prevents the
user from including notes into a note body; however the lpOD team doesn't
recommend such a practice.
Annotation creation [tbc]
-------------------------
Annotations don't have identifiers and are directly linked to a given offset in
a given text container.
Change tracking [todo]
----------------------
Structured containers
=====================
Tables
-------
An ``odf_table`` object is a structured container that holds two sets
of objects, a set of *rows* and a set of *columns*, and that is
optionally associated with a table style.
The basic information unit in a table is the *cell*. Every cell is
contained in a row. Table columns don't contain cells; an ODF column
holds information related to the layout of a particular column at the
display time, not content data.
A cell can directly contain one or more paragraphs. However, a cell
may be used as a container for high level containers, including lists,
tables, sections and frames.
Every table is identified by a name (which must be unique for the
document) and may own some optional properties.
Table creation and retrieval
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A table is created using ``odf_create_table()`` with a mandatory name
as its first argument and the following optional parameters:
- ``width``, ``length``: the initial size of the new table
(rows then columns), knowing that it's zero-sized by default
(beware: because cells are contained in rows, no cell is created
as long as ``width`` is less than 1);
- ``style``: the name of a table style, already existing or to be
defined;
- ``cell style``: the style to use by default for every cell in the table;
- ``protected``: a boolean that, if true, means that the table should
be write-protected when the document is edited through a user-oriented,
interactive application (of course, such a protection doesn't prevent
an lpOD-based tool from modifying the table) (default is false);
- ``protection key``: a (supposedly encrypted) string that represents
a password; if this parameter is set and if ``protected`` is true,
an end-user interactive application should ask for a password that matches
this string before removing the write-protection (beware, such a protection
is *not* a security feature);
- ``display``: boolean, tells that the table should be visible; default is true;
- ``print``: boolean, tells that the table should be printable; however, the
table is not printable if ``display`` is false, whatever the value of
``print``; default is true;
- ``print ranges``: the cell ranges to be printed, if some areas are not to
be printed; the value of this parameter is a space-separated list of cell
ranges expressed in spreadsheet-style format (ex: "E6:K12").
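As an illustration of the ``print ranges`` format, the sketch below splits such a value into individual ranges. The ``split_print_ranges()`` helper is hypothetical and is not part of the lpOD API; it only assumes the space-separated, spreadsheet-style syntax described above, and treating a single address as a one-cell range is an assumption of this sketch.

```python
def split_print_ranges(value):
    """Split a space-separated "print ranges" string into (start, end) pairs.

    Hypothetical helper for illustration; not an lpOD function.
    """
    ranges = []
    for item in value.split():
        start, _, end = item.partition(":")
        # Assumption: a lone address such as "A1" stands for a one-cell range.
        ranges.append((start, end or start))
    return ranges


ranges = split_print_ranges("E6:K12 A1:B2")
```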
Once created, a table may be incorporated somewhere using ``insert_element()``.
A table may be retrieved in a document according to its unique name using
the context-based ``get_table_by_name()`` with the name as argument. It may
be selected by its sequential position in the list of the tables belonging
to the context, using ``get_table_by_position()``, with a zero-based numeric
argument (possibly counted back from the end if the argument is negative).
In addition, it's possible to retrieve a table according to its content,
through ``get_table_by_content()``; this method returns the first table (in
the order of the document) whose text content matches the given argument,
which is regarded as a regular expression.
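The position-based and content-based lookups can be pictured with a toy model in which each table is reduced to a (name, text) pair. The functions below are illustrative stand-ins, not the lpOD methods themselves; returning ``None`` when nothing is found is an assumption of this sketch.

```python
import re

# Toy stand-in for the tables of a context, in document order.
# Each entry is (name, flattened text content); this is not the lpOD model.
tables = [
    ("Prices", "Item Price Apple 3 Pear 4"),
    ("Stock", "Item Quantity Apple 120 Pear 75"),
    ("Summary", "Total 195"),
]


def table_by_position(tables, position):
    """Zero-based lookup; a negative position counts back from the end."""
    try:
        return tables[position]
    except IndexError:
        return None  # assumption: no match yields None


def table_by_content(tables, pattern):
    """Return the first table whose text matches the regular expression."""
    for table in tables:
        if re.search(pattern, table[1]):
            return table
    return None


last = table_by_position(tables, -1)
stock = table_by_content(tables, r"Quantity")
```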
Table content retrieval
~~~~~~~~~~~~~~~~~~~~~~~
A table object provides methods that allow the user to retrieve any column, row
or cell using its logical position. A position may be expressed using either
zero-based numeric coordinates, or alphanumeric, spreadsheet-like coordinates.
For example, the top left cell may be addressed either by [0,0] or by "A1".
However, only numeric coordinates allow the user to address an object relative
to the end of the table; for example, [-1,-1] designates the last cell of the
last row whatever the table size.
Table object selection methods return a null value, without error, when the
given address is out of range.
The number of rows and columns may be obtained using ``get_size()``.
An individual cell is selected using ``get_cell()`` with either a pair of
numeric arguments corresponding to the row then the column, or an alphanumeric
argument whose first character is a letter. The second argument, if provided,
is ignored as soon as the first one begins with a letter; if only one numeric
argument is provided, the column number is assumed to be 0.
The two following instructions are equivalent and return the second cell of the
second row in a table (assuming that ``t`` is a previously selected table)::
c = t.get_cell('B2')
c = t.get_cell(1, 1)
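The addressing rules above can be sketched in plain Python. The helpers ``column_index()``, ``cell_coordinates()`` and ``resolve()`` are hypothetical names introduced for illustration; they are not lpOD functions, and they only mimic the documented conventions (zero-based numbers, "A1"-style addresses, negative positions counted from the end, a null result when out of range).

```python
import re


def column_index(letters):
    """Convert spreadsheet column letters to a zero-based index (A -> 0)."""
    index = 0
    for char in letters.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def cell_coordinates(first, second=0):
    """Return zero-based (row, column) from 'B2' style or numeric arguments."""
    if isinstance(first, str):
        match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", first)
        if match is None:
            raise ValueError("invalid cell address: %r" % first)
        # The second argument is ignored for alphanumeric addresses.
        return int(match.group(2)) - 1, column_index(match.group(1))
    # Numeric form: row first, then column (0 when omitted).
    return first, second


def resolve(position, size):
    """Map a possibly negative position to an index, or None if out of range."""
    if position < 0:
        position += size
    return position if 0 <= position < size else None
```

With these helpers, ``cell_coordinates('B2')`` and ``cell_coordinates(1, 1)`` designate the same cell, and ``resolve(-1, size)`` designates the last row or column of a table of the given size.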
``get_row()`` allows the user to select a table row as an ODF element. This
method requires a zero-based numeric value.
``get_column()`` works according to the same logic and returns a table column
ODF element.
The full set of row and column objects may be selected using the table-based
``get_row_list()`` and ``get_column_list()`` methods. By default these methods
return respectively the full list of rows or columns. They can be restricted to
a specified range of rows or columns. The restriction may be expressed through
two numeric, zero-based arguments indicating the positions of the first and the
last item of the range. Alternatively, the range may be specified using a more
"spreadsheet-like" syntax, in only one alphanumeric argument representing the
visible representation of the range through a GUI; this argument is the
concatenation of the visible numbers of the starting and ending elements,
separated by a ":", knowing that "1" is the visible number of the row zero
while "A" is the visible number of the column zero. As a consequence, the two
following instructions are equivalent and return a list including the rows from
5 to 10 belonging to the table ``t``::
rows = t.get_row_list(5, 10)
rows = t.get_row_list('6:11')
According to the same logic, each of the two instructions below returns the
columns from 8 to 15::
cols = t.get_column_list(8, 15)
cols = t.get_column_list('I:P')
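The correspondence between the two notations is a simple offset: visible row numbers start at 1 and column letters start at "A", while numeric positions start at 0. The following sketch makes the conversion explicit; ``row_range()``, ``column_range()`` and ``column_index()`` are hypothetical helpers, not lpOD functions.

```python
def column_index(letters):
    """Convert spreadsheet column letters to a zero-based index (A -> 0)."""
    index = 0
    for char in letters.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def row_range(spec):
    """Convert a visible row range such as '6:11' to zero-based positions."""
    start, end = spec.split(":")
    return int(start) - 1, int(end) - 1


def column_range(spec):
    """Convert a visible column range such as 'I:P' to zero-based positions."""
    start, end = spec.split(":")
    return column_index(start), column_index(end)


# Toy list of rows; both ends of the range are included in the selection.
rows = ["row %d" % i for i in range(20)]
first, last = row_range("6:11")
selected = rows[first:last + 1]
```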
Knowing that cells are contained in rows, a row-based ``get_cell()`` method is
provided for use once a row has been selected. When called from a row object,
``get_cell()`` requires the same parameters as the table-based ``get_column()``
method. For example, the following sequence returns the same cell as the
``t.get_cell('B2')`` example above::
r = t.get_row(1)
c = r.get_cell(1)
Cell range selection
~~~~~~~~~~~~~~~~~~~~
The API can extract rectangular ranges of cells in order to allow the
applications to store and process them out of the document tree, through
regular 2D tables. The range selection is defined by the coordinates of the
top left and the bottom right cells of the target area. The selection is
done using the table-based ``get_cells()`` method, with two possible syntaxes,
i.e. the spreadsheet-like one and the numeric one. The first one requires an
alphanumeric argument whose first character is a letter and that includes a
':', while the second one requires four numeric arguments. As an example, the
two following instructions, which are equivalent, return a bi-dimensional array
corresponding to the cells of the ``B2:D15`` area of a table::
cells = t.get_cells("B2:D15")
cells = t.get_cells(1,1,14,3)
Note that, after such a selection, ``cells[0,0]`` contains the "B2" cell of
the ODF table.
If ``get_cells()`` is called without argument, the selection covers the whole
table.
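To make the mapping between the two syntaxes concrete, the sketch below parses a spreadsheet-like range and extracts the corresponding rectangle from a plain list of lists. It is a toy model under stated assumptions: ``parse_range()``, ``extract()`` and ``column_index()`` are hypothetical helpers, and the ``grid`` variable stands in for a table; none of them belongs to the lpOD API.

```python
import re


def column_index(letters):
    """Convert spreadsheet column letters to a zero-based index (A -> 0)."""
    index = 0
    for char in letters.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def parse_range(spec):
    """Convert 'B2:D15' to (top row, left column, bottom row, right column)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+):([A-Za-z]+)([0-9]+)", spec)
    if match is None:
        raise ValueError("invalid cell range: %r" % spec)
    left, top, right, bottom = match.groups()
    return int(top) - 1, column_index(left), int(bottom) - 1, column_index(right)


def extract(grid, top, left, bottom, right):
    """Copy a rectangular area (bounds included) into a new 2D list."""
    return [row[left:right + 1] for row in grid[top:bottom + 1]]


# Toy 20x6 table whose cells hold their own spreadsheet address.
grid = [["%s%d" % ("ABCDEF"[c], r + 1) for c in range(6)] for r in range(20)]
cells = extract(grid, *parse_range("B2:D15"))
```

In this model the extracted array is indexed from its own origin, so ``cells[0][0]`` holds the content of "B2".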
A row object has its own ``get_cells()`` method. The row-based version of
``get_cells()`` returns, of course, a one-dimensional table (a single row) of
cell objects. When used without argument, it selects all the cells of the row.
It may be called with either a pair of numeric arguments that represent the
start and the end positions of the cell range, or an alphanumeric argument
(whose numeric content is ignored and should be omitted) corresponding to the start and end
columns in conventional spreadsheet notation. The following example shows two
ways to select the same cell range (beginning at the 2nd position and ending
at the 26th one) in a previously selected row::
cells = r.get_cells('B:Z')
cells = r.get_cells(1, 25)
If the user needs to select a range of cells as a list instead of a 2D array,
the ``get_cell_list()`` method should be preferred. This method requires the same
arguments as ``get_cells()`` and exists in table- and row-based versions.
**Note**: The range selection feature provided by the level 1 API is a
building block for the lpOD level 2 business-oriented cell range objects.
Row and column customization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The objects returned by ``get_row()`` and ``get_column()`` can be customized
using the standard ``set_attribute()`` or ``set_attributes()`` method. Possible
attributes are:
- ``style``: the name of the applicable style (which should be at display time
a valid row or column style);
- ``cell style``: the default style which applies to each cell in the column or
row unless this cell has its own style attribute;
- ``visibility``: specifies the visibility of the row or column; legal values
are ``visible``, ``collapse`` and ``filter``.
Table expansion
~~~~~~~~~~~~~~~
A table may be expanded vertically and horizontally, using its ``add_row()`` and
``add_column()`` methods.
``add_row()`` allows the user to insert one or more rows at a given position in
the table. The new rows are copies of an existing one. Without argument, a
single row is just appended at the end. A ``number`` named parameter provides
the number of rows to insert.
An optional ``before`` named parameter may be provided; if defined, the value
of this parameter must be a row number (in numeric, zero-based form) in the
range of the table; the new rows are created as clones of the row existing at
the given position then inserted at this position, i.e. *before* the original
reference row. An ``after`` parameter may be provided instead of ``before``;
it produces a similar result, but the new rows are inserted *after* the
reference row. Note that the two following instructions produce the same
result::
t.add_row(number=1, after=-1)
t.add_row()
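The cloning logic can be illustrated with a toy model where a table is a plain list of rows. The ``add_row()`` function below is a stand-alone sketch written for this illustration, not the lpOD method; it assumes, in line with the equivalence stated above, that the default behaviour clones the last row and appends the copy.

```python
import copy


def add_row(rows, number=1, before=None, after=None):
    """Insert `number` clones of a reference row into a list of rows.

    Toy model: `rows` is a plain list of lists, not an lpOD table.
    """
    if before is not None and after is not None:
        raise ValueError("use either 'before' or 'after', not both")
    if before is not None:
        position = before
    elif after is not None:
        position = after
    else:
        position = -1  # default: same as after=-1
    if position < 0:
        position += len(rows)  # negative positions count from the end
    if not 0 <= position < len(rows):
        raise IndexError("reference row out of range")
    clones = [copy.deepcopy(rows[position]) for _ in range(number)]
    insert_at = position if before is not None else position + 1
    rows[insert_at:insert_at] = clones
    return clones


t = [["a"], ["b"], ["c"]]
add_row(t)                      # appends a copy of the last row
u = [["a"], ["b"], ["c"]]
add_row(u, number=1, after=-1)  # same effect, spelled out
```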
The ``add_column()`` does the same thing with columns as ``add_row()`` for
rows. However, because the cells belong to rows, it works according to a very
different logic. ``add_column()`` inserts new column objects (clones of an
existing column), then it goes through all the rows and inserts new cells
(cloning the cell located at the reference position) in each one.
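The two-step logic can be sketched with a toy model in which the column objects and the rows are kept in two separate Python lists. This ``add_column()`` function is an illustration only, not the lpOD method; the dictionaries used as column objects and the default of appending a clone of the last column are assumptions of the sketch.

```python
import copy


def add_column(columns, rows, number=1, before=None, after=None):
    """Insert `number` clones of a reference column, then patch every row."""
    if before is not None and after is not None:
        raise ValueError("use either 'before' or 'after', not both")
    if before is not None:
        position = before
    elif after is not None:
        position = after
    else:
        position = -1  # assumption: clone the last column and append
    if position < 0:
        position += len(columns)
    if not 0 <= position < len(columns):
        raise IndexError("reference column out of range")
    insert_at = position if before is not None else position + 1
    # Step 1: clone the column object (layout information only).
    columns[insert_at:insert_at] = [
        copy.deepcopy(columns[position]) for _ in range(number)
    ]
    # Step 2: clone the reference cell in each row, because cells live in rows.
    for row in rows:
        row[insert_at:insert_at] = [
            copy.deepcopy(row[position]) for _ in range(number)
        ]


columns = [{"style": "narrow"}, {"style": "wide"}]
rows = [["a1", "b1"], ["a2", "b2"]]
add_column(columns, rows, after=0)
```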
Of course, it's possible to use ``insert_element()`` in order to insert a row,
a column or a cell externally created (or extracted from another table or
another document), provided that the user carefully checks the consistency of
the resulting construct. As an example, the following sequence inserts a copy
of the first row of ``t1`` after the row at position 5 (the 6th row) of ``t2``::
to_be_inserted = t1.get_row(0).clone()
t2.insert_element(to_be_inserted, after=t2.get_row(5))
Row and column group handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The content expansion and content selection methods above work with the table
body. However it's possible to manage groups of rows or columns. A group may
be created with existing adjacent rows or columns, using ``set_row_group()``
and ``set_column_group()`` respectively. These methods take two mandatory
arguments, which are the numeric positions of the starting and ending elements
of the group. In addition, an optional ``display`` named boolean parameter
may be provided (default=true), instructing the applications about the
visibility of the group.
Both ``set_row_group()`` and ``set_column_group()`` return an object which can
be used later as a context object for any row, column or cell retrieval or
processing. An existing group may be retrieved according to its numeric
position using ``get_row_group()`` or ``get_column_group()`` with the position
as argument, or without argument to get the first (or the only) group.
A group can't bring a particular style; it's just visible or not. Once created,
its visibility may be turned on and off by changing its ``display`` value
through ``set_attribute()``.
A row group provides an ``add_row()`` method, while a column group provides an
``add_column()`` method. These methods work like their table-based versions,
and they allow the user to expand the content of a particular group.
A group can contain a *header* (see below).
Table headers
~~~~~~~~~~~~~
One or more rows or columns in the beginning of a table may be organized as
a *header*. Row and column headers are created using the ``set_row_header()``
and ``set_column_header()`` table-based methods, and retrieved using
``get_row_header()`` and ``get_column_header()``. A row header object brings its
own ``add_row()`` method, which works like the table-based ``add_row()`` but
appends the new rows in the space of the row header. The same logic applies to
column headers, which have an ``add_column()`` method.
A table can't directly contain more than one row header and one column header.
However, a column group can contain a column header, while a row group can
contain a row header. So the header-focused methods above work with groups as
well as with tables.
A table header doesn't bring particular properties; it's just a construct
allowing the author to designate rows and columns that should be automatically
repeated on every page if the table doesn't fit on a single page.
The ``get_xxx()`` table-based retrieval methods ignore the content of the
headers. However, it's always possible to select a header, then to use it as
the context object to select an object using its coordinates inside the header.
For example, the first instruction below gets the first cell of a table body,
while the second and third instructions select the first cell of a table header::
c1 = table.get_cell(0,0)
header = table.get_header()
c2 = header.get_cell(0,0)
Individual cell property handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A cell owns both a *content* and some *properties* which may be processed
separately.
The cell content is a list of one or more ODF elements. While this content is
generally made of a single paragraph, it may contain several paragraphs and
various other objects. The user can attach any content element to a cell using
the standard ``insert_element()`` method. However, for the simplest (and the
most usual) cases, it's possible to use ``set_text()``. The cell-based
``set_text()`` method differs from the level 0 ``set_text()``: it removes the
previous content elements, if any, then creates a single paragraph with the
given text as the new content. In addition, this method accepts an optional
``style`` named parameter, allowing the user to set a paragraph style for the
new content. To insert more content (i.e. additional paragraphs and/or other
ODF elements), the needed objects have to be created externally and attached
to the cell using ``insert_element()``. Alternatively, it's possible to remove
the existing content (if any) and attach a full set of content elements in a
single instruction using ``set_content()``; this last cell method takes a list
of arbitrary ODF elements and appends them (in the given order) as the new
content.
The ``get_content()`` cell method returns all the content elements as a list.
For the simplest cases, the cell-based ``get_text()`` method directly returns
the text content as a flat string, without any structural information and
whatever the number and the type of the content elements.
The properties may be accessed using ``set_properties()`` and
``get_properties()``; ``set_properties()`` works with the following optional
named parameters:
- ``style``: the name of a cell style;
- ``type``: the cell value type, which may be one of the ODF supported data
types, used when the cell has to contain a computable value (omitted with
text cells);
- ``value``: the numeric computable value of the cell, used when the ``type`` is
defined;
- ``currency``: the international standard currency unit identifier (ex: EUR,
USD), used when the ``type`` is ``currency``;
- ``formula``: a calculation formula whose result is a computable value (the
grammar and syntax of the formula are application-specific and not checked
by the lpOD API; the formula is stored as flat text and not interpreted);
- ``protected``: boolean (default false), tells the applications that the cell
can't be edited.
All the existing properties may be retrieved using the cell ``get_properties()``
which returns a list of named parameters.
Cell span extension
~~~~~~~~~~~~~~~~~~~
A cell may be expanded so that it covers one or more adjacent columns and/or rows.
The cell-based ``set_span()`` method allows the user to control this expansion.
It takes ``rows`` and ``columns`` as parameters, specifying the number of rows
and the number of columns covered. The following example selects the "B4" cell
then expands it over 4 columns and 3 rows::
cell = table.get_cell('B4')
cell.set_span(rows=3, columns=4)
The existing span of a cell may be retrieved using ``get_span()``, which returns
the ``rows`` and ``columns`` values.
``set_span()`` replaces the previous span of the cell. The default value for each
parameter is 1, so a ``set_span()`` without argument reduces the cell to its
minimal span.
When a cell is covered due to the span of another cell, it remains present and
holds its content and properties. However, it's possible to know at any time if
a given cell is covered or not through the boolean ``is_covered()`` cell method.
In addition, the span values of a covered cell are automatically set to 1, and
``set_span()`` is forbidden with covered cells.
Note that the API doesn't support cell spans that spread across table header
or group boundaries.
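As an illustration of which cells become covered, the sketch below lists the zero-based coordinates hidden by a span. ``covered_cells()`` is a hypothetical helper, not an lpOD method; it only assumes that a span extends downwards and to the right from the spanning cell, which itself is not covered.

```python
def covered_cells(row, column, rows=1, columns=1):
    """Return the (row, column) pairs covered by a span anchored at a cell.

    Coordinates are zero-based; the anchor cell itself is excluded.
    """
    return [
        (r, c)
        for r in range(row, row + rows)
        for c in range(column, column + columns)
        if (r, c) != (row, column)
    ]


# "B4" is row 3, column 1 in zero-based coordinates.
covered = covered_cells(3, 1, rows=3, columns=4)
```

With the default values (``rows=1``, ``columns=1``) the result is empty, which matches the minimal span described above.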
Item lists
----------
A list is a structured object that contains an optional list header followed by
any number of list items. The list header, if defined, contains one or more
paragraphs that are displayed before the list. A list item can contain
paragraphs, headings, or lists. The list properties are ``style``, that is an
appropriate list style, and ``continue numbering``, a boolean value that, if
true, means that *if the numbering style of the preceding list is the same as the current list, the number of the first list item in the current list is the number of the last item in the preceding list incremented by one* (default=false).
.. figure:: figures/lpod_list.*
:align: center
A list is created using ``odf_create_list()``, then inserted using
``insert_element()`` as usual.
A list header is created "in place" with ``set_header()``, called from a list
element; this method returns an ODF element that can be used later as a context
to append paragraphs in the header. Alternatively, it's possible to call the
list-based ``set_header()`` with one or more existing paragraphs as arguments,
so these paragraphs are immediately incorporated in the new list header. Note
that every use of ``set_header()`` replaces any existing header by a new one.
Regular list items are created in place (like the optional list header) using
``add_item()`` which creates one or more new items and inserts them at a
position which depends on optional parameters, according to the same kind
of logic as the table-based ``add_row()`` method. Without any argument, a
single item is appended at the end of the list. An optional ``before`` named
parameter may be provided; if defined, the value of this parameter must be an
item position (in numeric, zero-based form) in the range of the list; the new
items are inserted *before* the original item that existed at the given
position. Alternatively, an ``after`` parameter may be provided instead of
``before``; it produces a similar result, but the new items are inserted
*after* the given position. If an additional ``number`` parameter is provided
with an integer value, the corresponding number of identical items are
inserted in place.
By default, a new item is created empty. However, as a shortcut for the most
common case, it's possible to directly create it with a text content. To do
so, the text content must be provided through a ``text`` parameter; an
optional ``style`` parameter, whose value is a regular paragraph style, may be
provided too. The new item is then created with a single paragraph as content
(that is the most typical situation).
Another optional ``start value`` parameter may be set in order to restart the
numbering of the current list at the given value. Of course, this start value
applies to the first inserted item if ``add_item()`` is used to create many items
in a single call.
``add_item()`` returns the newly created list of item elements. In addition,
an existing item may be selected in the list context using ``get_item()`` with
its numeric position. A list item is an ODF element, so any content element
may be attached to it using ``insert_element()``.
Note that, unlike headings, list items don't have an explicit level property.
All the items in an ODF list have the same level. Knowing that a list may be
inside an item belonging to another list, the hierarchy is represented by the
structural list nesting, not by item attributes.
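The following sketch shows how a level can be derived from nesting alone. It uses a toy representation (a dictionary with an ``items`` key for a list, a Python list of content elements for an item, strings for paragraphs) that is an assumption of this example and not the lpOD object model; ``walk()`` is a hypothetical helper.

```python
def walk(odf_list, level=1):
    """Yield (level, text) pairs; the level is derived from nesting only."""
    for item in odf_list["items"]:
        for content in item:
            if isinstance(content, dict):
                # A list inside an item: one level deeper.
                yield from walk(content, level + 1)
            else:
                yield level, content


outline = {"items": [
    ["Fruits", {"items": [["Apple"], ["Pear"]]}],
    ["Vegetables"],
]}
levels = list(walk(outline))
```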
Data pilot (pivot) tables [todo]
--------------------------------
Sections
--------
A section is a named region in a text document. It's a high level container that
can include one or more content elements of any kind (including sections, that
may be nested).
The purpose of a section is either to assign certain formatting properties to a
document region, or to include an external content.
A section is created using ``odf_create_section()`` with a mandatory name
as the first argument and the following optional parameters:
- ``style``: the name of a section style, already existing or to be defined;
- ``url`` : the URL of an external resource that will provide the content of the
section;
- ``protected``: a boolean that, if true, means that the section should
be write-protected when the document is edited through a user-oriented,
interactive application (of course, such a protection doesn't prevent
an lpOD-based tool from modifying the section) (default is false);
- ``protection key``: a (supposedly encrypted) string that represents
a password; if this parameter is set and if ``protected`` is true,