[1]The Columbia Crown The Kermit Project | Columbia University
612 West 115th Street, New York NY 10025 USA o [2][email protected]
...since 1981
C-Kermit Program Logic Manual
Frank da Cruz
[11]The Kermit Project
As of: C-Kermit 9.0.300, 30 June 2011
Last update: Fri Jul 1 15:47:34 2011
IF YOU ARE READING A PLAIN-TEXT version of this document, note that
this file is a plain-text dump of a Web page. You can visit the
original (and possibly more up-to-date) Web page here:
[12]http://www.columbia.edu/kermit/ckcplm.html
CONTENTS
1. [13]INTRODUCTION
2. [14]FILES
3. [15]SOURCE CODE PORTABILITY AND STYLE
4. [16]MODULES
4.A. [17]Group A: Library Routines
4.B. [18]Group B: Kermit File Transfer
4.C. [19]Group C: Character-Set Conversion
4.D. [20]Group D: User Interface
4.E. [21]Group E: Platform-Dependent I/O
4.F. [22]Group F: Network Support
4.G. [23]Group G: Formatted Screen Support
4.H. [24]Group H: Pseudoterminal Support
4.I. [25]Group I: Security
I. [26]APPENDIX I: FILE PERMISSIONS
1. INTRODUCTION
The Kermit Protocol is specified in the book [27]Kermit, A File
Transfer Protocol by Frank da Cruz, Digital Press / Butterworth
Heinemann, Newton, MA, USA (1987), 379 pages, ISBN 0-932376-88-6. It is
assumed the reader is familiar with the Kermit protocol specification.
This file describes the relationship among the modules and functions of
C-Kermit 5A and later, and other programming considerations. C-Kermit
is designed to be portable to any kind of computer that has a C
compiler. The source code is broken into many files that are grouped
according to their function, as shown in the [28]Contents.
C-Kermit has seen constant development since 1985. Throughout its
history, there has been a neverending tug-of-war among:
a. Functionality: adding new features, fixing bugs, improving
performance.
b. Adding support for new platforms or communication methods.
c. "Buzzword 1.0 compliance".
The latter category is the most frustrating, since it generally
involves massive changes just to keep the software doing what it did
before in some new setting: e.g. the K&R-to-ANSIC conversion (which had
to be done, of course, without breaking K&R); Y2K (not a big deal in
our case); the many and varied UNIX and other API "standards" with
which to "comply".
Upon first glance at the source code, you will probably be appalled.
Many will be tempted to clean it up and modernize it. But as soon as
you do, you are sure to break something. Remember that above all else,
the C-Kermit code is portable to every Unix platform that ever existed,
going back to Unix V7 (1979)*, and to several other completely different
and unrelated operating-system families such as DEC/HP VMS, DG AOS/VS,
and Stratus VOS, as well as to some Unix offshoots like OS-9 and Plan 9
(from Outer Space). Every release of Kermit has been checked on every
platform available -- the older the better! -- to make sure it still
builds and runs. Even today (2011), there are modern Unix systems that
have non-ANSI C compilers, foremost among them HP-UX (where an ANSI
optimizing C compiler is available, but only as an expensive add-on).
In a way, portability is the most important feature of C-Kermit and
every effort should be made to preserve it through future releases.
Voluminous edit histories are available going back to May 1985. The
first versions of C-Kermit were done on our [29]DEC VAX-11/750 with
Ultrix 1.0 and 2.0 (as well as departmental 750s with 4.2BSD**), DEC
Pro-380 workstations (desktop PDP-11s) running 2.9BSD, which was
[30]ported to the 380 by us. Later (1988 or so) on a big VAX 8650 with
Ultrix, which became an 8700 (these no doubt weighed several tons), and
finally a succession of non-DEC equipment: an Encore Multimax, 25 years'
worth of Suns, and now Linux on [31]HP Blades. We also had our own VMS
development systems for some years. All this plus a generous assortment
of departmental and offsite guest accounts on a multitude of platforms.
Anyway, the edit histories:
[32]ckc04e.txt C-Kermit 4.2(030) May 1985 to 4E(072) Jan 1989.
[33]ckc04f.txt C-Kermit 4F(077) Apr 1989 to 4F(095) Aug 1989.
[34]ckc168.txt Updates to C-Kermit 5A(168) for VMS Nov 1991
[35]ckc178.txt C-Kermit 5A(100) Jul 1989 to 5A(178) Jan 1992
[36]ckc188.txt C-Kermit 5A(188) development, 1992
[37]ckc189.txt C-Kermit 5A(189) development, 1993
[38]ckc192.txt C-Kermit 6.0(192) development, 1998
[39]ckc197.txt C-Kermit 7.0(197) development, 2000
[40]ckc200.txt C-Kermit 8.0.200 development, 2001
[41]ckc211.txt C-Kermit 8.0.201 through 8.0.209 2001-2004
[42]ckc300.txt C-Kermit 9.0.300 June 2011
_________________________________
* C-Kermit 6.0 was the last one to be built on V7, as I recall. The
code should still be good for V7 but it probably has outgrown the
16-bit address space. In any case there is still a V7 makefile target
and a V7 path through the forest of #ifdefs in the code if anybody is
running V7 on an emulator and would like to try building C-Kermit.
There is no support for V6 but that is only because no V6 system was
ever found for development. Notice that some other 16-bit Unixes are
supported in the code, including 2.9BSD and Tandy Xenix 3.0, but have
not been tried since C-Kermit 6.0.
** C-Kermit 9.0.300 was built successfully on 4.2BSD about 25 years
later, in June 2011.
[ [43]Contents ] [ [44]C-Kermit ] [ [45]Kermit Home ]
2. FILES
C-Kermit source files begin with the two letters "ck", for example
ckutio.c. Filenames are kept short (6.3) for maximum portability and
(obviously I hope) do not contain spaces or more than one period. The
third character in the name denotes something about the function group
and the expected level of portability:
a General descriptive material and documentation (text)
b BOO file encoders and decoders (obsolete)
c All platforms with C compilers (*)
d Data General AOS/VS
e Reserved for "ckermit" files, like ckermit.ini, ckermit2.txt
f (reserved)
g (reserved)
h (reserved)
i Commodore Amiga (Intuition)
j (unused)
k (unused)
l Stratus VOS
m Macintosh with Mac OS 1-9
n (unused)
o OS/2 and Microsoft Windows 9x/ME/NT/2000/XP/Vista/etc
p Plan 9 from Bell Labs
q (reserved)
r DEC PDP-11 with RSTS/E (never used, open for reassignment)
s Atari ST GEMDOS (last supported in version 5A(189))
t DEC PDP-11 with RT-11 (never used, open for reassignment)
u Unix-based operating systems (*)
v VMS and OpenVMS
w Wart (Lex-like preprocessor, platform independent)
x (reserved)
y (reserved)
z (reserved)
0-3 (reserved)
4 IBM AS/400
5-8 (reserved)
9 Microware OS-9
_ (underscore) Encryption modules
(*) In fact there is little distinction between the ckc*.* and cku*.*
categories. It would make more sense for all cku*.* modules to be
ckc*.* ones, except ckufio.c, ckutio.c, ckucon.c, ckucns.c, and
ckupty.c, which truly are specific to Unix. The rest (ckuus*.c,
ckucmd.c, etc) are quite portable.
One hint before proceeding: functions are scattered all over the ckc*.c
and cku*.c modules, where function size has begun to take precedence
over the desirability of grouping related functions together, the aim
being to keep any particular module from growing disproportionately
large. The easiest way (in UNIX) to find out in what source file a
given function is defined is like this (where the desired function is
foo()...):
grep ^foo\( ck*.c
This works because the coding convention has been to make function
names always start on the left margin with their contents indented, for
example:
static char *
foo(x,y) int x, y; {
...
}
Also note the style for bracket placement. This allows bracket-matching
text editors (such as EMACS) to help you make sure you know which
opening bracket a closing bracket matches, particularly when the
opening bracket is above the visible screen, and it also makes it easy
to find the end of a function (search for '}' on the left margin).
Of course EMACS tags work nicely with this format too:
$ cd kermit-source-directory
$ etags ck[cu]*.c
$ emacs
Esc-X Visit-Tags-Table<CR><CR>
(but remember that the source file for ckcpro.c is [46]ckcpro.w!)
Also:
* Tabs should be set every 8 spaces, as on a VT100.
* All lines must be no more than 79 characters wide after tab expansion.
* Note the distinction between physical tabs (ASCII 9) and the
indentation conventions, which are: 4 for block contents, 2 for
most other stuff (obviously this is not a portability issue, just
style).
3. SOURCE CODE PORTABILITY AND STYLE
C-Kermit was designed in 1985 as a platform-independent replacement for
the earlier Unix Kermit. C-Kermit's design was expected to promote
portability, and judging from the number of platforms to which it has
been adapted since then, the model is effective, if not ideal
(obviously if we had it all to do over, we'd change a few things). To
answer the oft-repeated question: "Why are there so many #ifdefs?",
it's because:
* Many of them are related to feature selection and program size, and
so need to be there anyway.
* Those that treat compiler, library, platform, header-file, and
similar differences have built up over time as hundreds of people
all over the world adapted C-Kermit to their particular
environments and sent back their changes. There might be more
politically-correct ways to achieve portability, but this one is
natural and proven. The basic idea is to introduce changes that can
be selected by defining a symbol, which, if not defined, leaves the
program exactly as it was before the changes.
* Although it might be possible to "clean up" the "#ifdef mess",
nobody has access to all the hundreds of platforms served by the
#ifdefs to check the results.
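As a minimal sketch of the basic idea described in the second item above, a platform-specific adjustment can be added so that builds without the new symbol compile exactly the code they compiled before. NEWPLAT, DEFTMO, and rdtmo() are hypothetical names invented for this illustration; they are not Kermit symbols.

```c
/* Hypothetical illustration only; none of these names are Kermit's. */
#define DEFTMO 10                       /* Default timeout, seconds */

int
rdtmo() {                               /* Return read timeout to use */
    int t;
    t = DEFTMO;
#ifdef NEWPLAT                          /* Seen only by the new platform */
    t = t * 2;                          /* (assumed to need more time) */
#endif /* NEWPLAT */
    return(t);
}
```

Compiling with -DNEWPLAT selects the new behavior. Without it, rdtmo() returns 10 just as it would have before the change was made.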
And to answer the second-most-oft-repeated question: "Why don't you
just use GNU autoconfig / automake / autowhatever instead of
hard-coding all those #ifdefs?" Answers:
* The GNU tools are not available on all the platforms where C-Kermit
must be built and I wouldn't necessarily trust them if they were.
* Each platform is a moving target, so the tools themselves would
need to be updated before Kermit could be updated.
* It would only add another layer of complexity to an already complex
process.
* Conversion at this point would not be practical unless there was a
way to test the results on all the hundreds of platforms where
C-Kermit is supposed to build.
When writing code for the system-independent C-Kermit modules, please
stick to the following coding conventions to ensure portability to the
widest possible variety of C preprocessors, compilers, and linkers, as
well as certain network and/or email transports. The same holds true
for many of the "system dependent" modules too; particularly the Unix
ones, since they must be buildable by a wide variety of compilers and
linkers, new and old.
This list does not purport to be comprehensive, and although some items
on it might seem far-fetched, they would not be listed unless I had
encountered them somewhere, some time. I wish I had kept better records
so I could cite specific platforms and compilers.
* Try to keep variable and function names unique within 6 characters,
especially if they are used across modules, since 6 is the maximum
for some old linkers (actually, this goes back to TOPS-10 and -20
and other old DEC OS's where C-Kermit never ran anyway; a more
realistic maximum is probably somewhere between 8 and 16). We know
for certain that VAX C has a 31-character max because it complains
-- others might not complain, but just silently truncate, thus
folding two or more routines/variables into one.
* Keep preprocessor symbols unique within 8 characters; that's the
max for some preprocessors (sorry, I can't give a specific example,
but in 1988 or thereabouts, I had to change character-set symbols
like TC_LATIN1 and TC_LATIN2 to TC_1LATIN and TC_2LATIN because the
digits were being truncated and ignored on a platform where I
actually had to build C-Kermit 5A; unfortunately I didn't note
which platform -- maybe some early Ultrix version?)
* Don't create preprocessor symbols, or variable or function names,
that start with underscore (_). These are usually reserved for
internal use by the compiler and header files.
* Don't put #include directives inside functions or { blocks }.
* Don't use the #if or #elif preprocessor constructions, only use
#ifdef, #ifndef, #define, #undef, and #endif.
* Put tokens after #endif in comment brackets, e.g. #endif /* FOO */.
* Don't indent preprocessor statements - # must always be first char
on line.
* Don't put whitespace after # in preprocessor statements.
* Don't use #pragma, even within #ifdefs -- it makes some
preprocessors give up.
* Same goes for #module, #if, etc - #ifdefs do NOT protect them.
* Don't use logical operators in preprocessor constructions.
* Avoid #ifdefs inside argument list to function calls (I can't
remember why this one is here, but probably needn't be; we do this
all the time).
* Always cast strlen() in expressions to int:
if ((int)strlen(foo) < x)...
* Avoid typedefs; they might be portable but they are very confusing
and there's no way to test for their presence or absence at compile
time. Use preprocessor symbols instead if possible; at least you
can test their definitions.
* Unsigned long is not portable; use a preprocessor symbol (Kermit
uses ULONG for this).
* Long long is not portable. If you really need it, be creative.
* Similarly 1234LL is not portable, nor almost any other constant
modifier other than L.
* Unsigned char is not portable, use CHAR (a preprocessor symbol
defined in the Kermit header files) and always take precautions
against character signage (more about this [50]below).
* Don't use initializers with automatic arrays or structs: it's not
portable.
* Don't use big automatic arrays or structs in functions that might
be called recursively; some platforms have fixed-size stacks (e.g.
Windows 9x: 256K) and recursive functions crash with stack
overflow. Even when there is not a compiler limitation, this causes
memory to be consumed without bound, and can end up filling swap
space.
* Don't assume that struct assignment performs a copy, or that it
even exists.
* Don't use sizeof to get the size of an array; someone might come
along later and change it from static to malloc'd. Always use a
symbol to refer to the array's size.
* Don't put prototypes for static functions into header files that
are used by modules that don't contain that function; the link step
can fail with unresolved references (e.g. on AOS/VS).
* Avoid the construction *++p (the order of evaluation varies; it
shouldn't but at least one compiler had a bug that made me include
this item).
* Don't use triple assignments, like a = b = c = 0; (or quadruple,
etc). Some compilers generate bad code for these, or crash, etc
(some version of DEC C as I recall).
* Some compilers don't allow structure members to have the same names
as other identifiers. Try to give structure members unique names.
* Don't assume anything about order of evaluation in boolean
expressions, or that they will stop early if a required condition
is not true, e.g.:
if (i > 0 && p[i-1] == blah)
can still dump core if i == 0 (hopefully this is not true of any
modern compiler, but I would not have said this if it did not
actually happen somewhere).
* Don't have a switch() statement with no cases (e.g. because of
#ifdefs); this is a fatal error in some compilers.
* Don't put lots of code in a switch case; move it out to a separate
function; some compilers run out of memory when presented with a
huge switch() statement -- it's not the number of cases that
matters; it's the overall amount of code.
* Some compilers might also limit the number of switch() cases, e.g.
to 254.
* Don't put anything between "switch() {" and "case:" -- switch
blocks are not like other blocks.
* Don't jump into or out of switches.
* Don't make character-string constants longer than about 250 bytes.
Longer strings should be broken up into arrays of strings.
* Don't write into character-string constants (obviously). Even when
you know you are not writing past the end; the compiler or linker
might have put them into read-only and/or shared memory, and/or
coalesced multiple equal constants so if you change one you change
them all.
* Don't depend on '\r' being carriage return.
* Don't depend on '\n' being linefeed or for that matter any SINGLE
character.
* Don't depend on '\r' and '\n' being different (e.g. as separate
switch() cases).
* In other words, don't use \n or \r to stand for specific
characters; use \012 and \015 instead.
* Don't code for "buzzword 1.0 compliance", unless "buzzword" is K&R
and "1.0" is the first edition.
* Don't use or depend on anything_t (size_t, pid_t, etc), except
time_t, without #ifdef protection (time_t is the only one I've
found that is accepted everywhere). This is a tough one because the
same function might require (say) a size_t arg on one platform,
whereas size_t is unheard of on another; or worse, it might require
a totally different data type, like int or long or some other
typedef'd thing. It has often proved necessary to define a symbol
to stand for the type of a particular argument to a particular
library or system function to get around this problem.
* Don't use or depend on internationalization ("i18n") features,
wchar_t, locales, etc, in portable code; they are not portable.
Anyway, locales are not the right model for Kermit's
multi-character-set support. Kermit does all character-set
conversion itself and does not use any external libraries or
functions.
* In particular, don't use any library functions that deal with wide
characters or Unicode in any form. These are not only nonportable,
but a constantly shifting target (e.g. the ones in glibc).
* Don't make any assumption about signal handler type. It can be
void, int, long, or anything else. Always declare signal handlers
as SIGTYP (see definition in ckcdeb.h and augment it if necessary)
and always use SIGRETURN at exit points from signal handlers.
* Signals should always be re-armed to be used again (this barely
scratches the surface -- the differences between BSD/V7 and System
V and POSIX signal handling are numerous, and some platforms do not
even support signals, alarms, or longjmps correctly or at all --
avoid all of this if you can).
* On the other hand, don't assume that signals are disarmed after
being raised. In some platforms you have to re-arm them, in others
they stay armed.
* Don't call malloc() and friends from a signal handler; don't do
anything but setting integer global variables in a signal handler.
* malloc() does not initialize allocated memory -- it never said it
did. Don't expect it to be all 0's.
* Did You Know: malloc() can succeed and the program can still dump
core later when it attempts to use the malloc'd memory? (This
happens when allocation is deferred until use and swap space is
full.)
* memset(), memmove(), and memcpy() are not portable, don't use them
without protecting them in ifdefs (we have USE_MEMCPY for this).
bzero()/bcopy() too, except we're guaranteed to have
bzero()/bcopy() when using the sockets library (not really). See
examples in the source.
* Don't assume that strncpy() stops on the first null byte -- most
versions always copy the number of bytes given in arg 3, padding
out with 0's and overwriting whatever was there before. Use
C-Kermit ckstrncpy() if you want predictable non-padding behavior,
guaranteed NUL-termination, and a useful return code.
* DID YOU KNOW.. that some versions of inet_blah() routines return IP
addresses in network byte order, while others return them in local
machine byte order? So passing them to ntohs() or whatever is not
always the right thing to do.
* Don't use ANSI-format function declarations without #ifdef
CK_ANSIC, and always provide an #else for the non-ANSI case.
* Use the Kermit _PROTOTYP() macro for declaring function prototypes;
it works in both the ANSI and non-ANSI cases.
* Don't depend on any other ANSI preprocessor features like "pasting"
-- they are often missing or nonoperational.
* Don't assume any C++ syntax or semantics.
* Don't use // as a comment introducer. C is not C++.
* Don't declare a string as "char foo[]" in one module and "extern
char * foo" in another, or vice-versa: this causes core dumps.
* With compiler makers falling all over themselves trying to outdo
each other in ANSI strictness, it has become increasingly necessary
to cast EVERYTHING. Especially char vs unsigned char. We need to
use unsigned chars if we want to deal with 8-bit character sets,
but most character- and string-oriented APIs want (signed) char
arguments, so explicit casts are necessary. It would be nice if
every compiler had a -funsigned-char option (as gcc does), but they
don't.
* a[x], where x is an unsigned char, can produce a wild memory
reference if x, when promoted to an int, becomes negative. Cast it
to (unsigned), even though it ALREADY IS unsigned.
* Be careful how you declare functions that have char or long
arguments; for ANSI compilers you MUST use ANSI declarations to
avoid promotion problems, but you can't use ANSI declarations with
non-ANSI compilers. Thus declarations of such functions must be
hideously entwined in #ifdefs. Example:
int /* Put character in server command buffer */
#ifdef CK_ANSIC
putsrv(char c)
#else
putsrv(c) char c;
#endif /* CK_ANSIC */
/* putsrv */ {
*srvptr++ = c;
*srvptr = '\0'; /* Make sure buffer is null-terminated */
return(0);
}
* Be careful how you return characters from functions that return int
values -- "getc-like functions" -- in the ANSI world. Unless you
explicitly cast the return value to (unsigned), it is likely to be
"promoted" to an int and have its sign extended.
* At least one compiler (the one on DEC OSF/1 1.3) treats "/*" and
"*/" within string constants as comment begin and end. No amount of
#ifdefs will get around this one. You simply can't put these
sequences in a string constant, e.g. "/usr/local/doc/*.*".
* Avoid putting multiple macro references on a single line, e.g.:
putchar(BS); putchar(SP); putchar(BS)
This overflows the CPP output buffer of more than a few C preprocessors
(this happened, for example, with SunOS 4.1 cc, which evidently has a
1K macro expansion buffer).
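Several of the conventions in the list above can be sketched together: the prototype macro, the #ifdef-guarded ANSI and non-ANSI function headers, casting strlen() to int, and avoiding #if. XANSIC, XPROTO(), and addch() are hypothetical stand-ins invented for this illustration (for CK_ANSIC and _PROTOTYP() in particular). The real definitions are in the Kermit header files and may differ, and keying XANSIC off __STDC__ is an assumption made only to keep the sketch self-contained.

```c
#include <string.h>

/* Hypothetical names; see the Kermit headers for the real ones. */
#ifdef __STDC__
#define XANSIC
#endif /* __STDC__ */

#ifdef XANSIC
#define XPROTO(func,parms) func parms   /* Keep the argument list */
#else
#define XPROTO(func,parms) func()       /* Drop it for old compilers */
#endif /* XANSIC */

XPROTO( int addch, (char *, int, char) );

int                                     /* Append c to buf if it fits */
#ifdef XANSIC
addch(char *buf, int size, char c)
#else
addch(buf, size, c) char *buf; int size; char c;
#endif /* XANSIC */
/* addch */ {
    int n;
    n = (int)strlen(buf);               /* Always cast strlen() to int */
    if (n + 1 >= size)                  /* No room for c plus the NUL */
      return(-1);
    buf[n] = c;
    buf[n+1] = '\0';
    return(n + 1);                      /* New length of the string */
}
```

With a 4-byte buffer holding an empty string, three successive calls return 1, 2, and 3, and a fourth returns -1 because no room remains for another character and the terminating NUL.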
C-Kermit needs constant adjustment to new OS and compiler releases.
Every new OS release shuffles header files or their contents, or
prototypes, or data types, or levels of ANSI strictness, etc. Every
time you make an adjustment to remove a new compilation error, BE VERY
CAREFUL to #ifdef it on a symbol unique to the new configuration so
that the previous configuration (and all other configurations on all
other platforms) remain as before.
Assume nothing. Don't assume header files are where they are supposed
to be, that they contain what you think they contain, that they define
specific symbols to have certain values -- or define them at all! Don't
assume system header files protect themselves against multiple
inclusion. Don't assume that particular system or library calls are
available, or that the arguments are what you think they are -- order,
data type, passed by reference vs value, etc. Be conservative when
attempting to write portable code. Avoid all advanced features.
If you see something that does not make sense, don't assume it's a
mistake -- it might be there for a reason, and changing it or removing
is likely to cause compilation, linking, or runtime failures sometime,
somewhere. Some huge percentage of the code, especially in the
platform-dependent modules, is workarounds for compiler, linker, or API
bugs.
But finally... feel free to violate any or all of these rules in
platform-specific modules for environments in which the rules are
certain not to apply. For example, in VMS-specific code, it is OK to
use #if, because VAX C, DEC C, and VMS GCC all support it.
3.1. Memory Leaks
The C language and standard C library are notoriously inadequate and
unsafe. Strings are arrays of characters, usually referenced through
pointers. There is no native string datatype. Buffers are fixed size,
and C provides no runtime bounds checking, thus allowing overwriting of
other data or even program code. With the popularization of the
Internet, the "buffer exploit" has become a preferred method for
hackers to hijack privileged programs; long data strings are fed to a
program in hopes that it uses unsafe C library calls such as strcpy()
or sprintf() to copy strings into automatic arrays, thus overwriting
the call stack, and therefore the routine's return address. When such a
hole is discovered, a "string" can be constructed that contains machine
code to hijack the program's privileges and penetrate the system.
This problem is partially addressed by the strn...() routines, which
should always be used in preference to their str...() equivalents
(except when the copy operation has already been prechecked, or there
is a good reason for not using them, e.g. the sometimes undesirable
side effect of strncpy() zeroing the remainder of the buffer). The most
gaping hole, however, is sprintf(), which performs no length checking
on its destination buffer, and is not easy to replace. Although
snprintf() routines are starting to appear, they are not yet
widespread, and certainly not universal, nor are they especially
portable, or even full-featured.
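The two strncpy() behaviors that make it only a partial answer can be seen in a short sketch. The function names unterm() and padded() are hypothetical, and the definitions are written in ANSI style for brevity, so they would need the #ifdef treatment described in Section 3 before going into a portable module.

```c
#include <string.h>

/* Returns 1 if strncpy() leaves the destination without a NUL when
   the source is longer than the byte count. */
int
unterm(void) {
    char dst[4];
    int i;
    strncpy(dst, "abcdef", 4);          /* Copies "abcd", stores no NUL */
    for (i = 0; i < 4; i++)
      if (dst[i] == '\0')
        return(0);
    return(1);
}

/* Returns 1 if strncpy() zero-fills the rest of the destination when
   the source is shorter than the byte count. */
int
padded(void) {
    char dst[8];
    int i;
    for (i = 0; i < 8; i++)             /* Fill with a known pattern */
      dst[i] = 'x';
    strncpy(dst, "ab", 8);              /* 'a', 'b', then six NULs */
    for (i = 2; i < 8; i++)
      if (dst[i] != '\0')
        return(0);
    return(1);
}
```

With a standard-conforming strncpy(), both functions return 1: the first case leaves a buffer that is unsafe to treat as a string, and the second overwrites everything after the copied text.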
For these reasons, we have started to build up our own little library
of C Library replacements, ckclib.[ch]. These are safe and highly
portable primitives for memory management and string manipulation, such
as:
ckstrncpy()
Like strncpy but returns a useful value, doesn't zero buffer.
ckitoa()
Opposite of atoi()
ckltoa()
Opposite of atol()
ckctoa()
Returns character as string
ckmakmsg()
Used with ck?to?() as a safe sprintf() replacement for up to 4
items
ckmakxmsg()
Like ckmakmsg() but accepts up to 12 items
More about library functions in [54]Section 4.A.
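To illustrate the kind of primitive meant here, the following is a hypothetical sketch of a bounded copy in the spirit of ckstrncpy(). It is not the ckclib source. The name xstrncpy() and the choice of return value (the number of bytes copied, not counting the NUL) are assumptions made for this example, and the ANSI-style definition is used only for brevity.

```c
/* Hypothetical sketch, not the ckclib code: copy at most len-1 bytes
   from src to dst, always NUL-terminate, never pad the remainder,
   and return the number of bytes copied. */
int
xstrncpy(char *dst, const char *src, int len) {
    int i;
    if (len < 1 || !dst)                /* Nowhere to put anything */
      return(0);
    if (!src)                           /* Treat null source as empty */
      src = "";
    for (i = 0; i < len - 1 && src[i]; i++)
      dst[i] = src[i];
    dst[i] = '\0';                      /* Terminate, but do not pad */
    return(i);
}
```

Copying "abcdef" into a 4-byte buffer stores "abc" plus the NUL and returns 3, so the caller can detect truncation by comparing the result with the source length. Copying "ab" leaves the fourth byte of the buffer untouched.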
3.2. The "char" vs "unsigned char" Dilemma
This is one of the most aggravating and vexing characteristics of the C
language. By design, chars (and char *'s) are SIGNED. In the modern
era, however, we need to process characters that can have (or include)
8-bit values, as in the ISO Latin-1, IBM CP 850, or UTF-8 character
sets, so this data must be treated as unsigned. But some C compilers
(such as those based on the Bell UNIX V7 compiler) do not support
"unsigned char" as a data type. Therefore we have the macro or typedef
CHAR, which we use when we need chars to be unsigned, but which,
unfortunately, resolves itself to "char" on those compilers that don't
support "unsigned char". AND SO... We have to do a lot of fiddling at
runtime to avoid sign extension and so forth.
Some modern compilers (e.g. IBM, DEC, Microsoft) have options that say
"make all chars be unsigned" (e.g. GCC "-funsigned-char") and we use
them when they are available. Other compilers don't have this option,
and at the same time, are becoming increasingly strict about type
mismatches, and spew out torrents of warnings when we use a CHAR where
a char is expected, or vice versa. We fix these one by one using casts,
and the code becomes increasingly ugly. But there remains a serious
problem, namely that certain library and kernel functions have
arguments that are declared as signed chars (or pointers to them),
whereas our character data is unsigned. Fine, we can use casts here
too -- but who knows what happens inside these routines.
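The runtime fiddling usually amounts to masking a character before it is used as a number. A minimal sketch follows, assuming 8-bit chars. cval() and n8bit() are hypothetical helpers invented for this example and are not Kermit functions, and the ANSI-style definitions are used only for brevity.

```c
/* Hypothetical helper: return the value of c as an int in the range
   0..255 whether plain char is signed or unsigned on this compiler. */
int
cval(char c) {
    return((int)c & 0xFF);              /* Mask off any sign extension */
}

/* Count the bytes in s that have the 8th bit set. */
int
n8bit(const char *s) {
    int n;
    n = 0;
    while (*s) {
        if (cval(*s) > 127)             /* Plain (*s > 127) can never be */
          n++;                          /* true if char is signed */
        s++;
    }
    return(n);
}
```

For example, cval() maps the Latin-1 byte written as '\351' to 233 on both kinds of compiler, whereas using the char directly as an int would yield a negative number where char is signed.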
4. MODULES
When C-Kermit is on the far end of a connection, it is said to be in
remote mode. When C-Kermit has made a connection to another computer,
it is in local mode. (If C-Kermit is "in the middle" of a multihop
connection, it is still in local mode.)
On another axis, C-Kermit can be in any of several major states:
Command State
Reading and writing from the job's controlling terminal or
"console". In this mode, all i/o is handled by the Group E
conxxx() (console i/o) routines.
Protocol State
Reading and writing from the communications device. In this
mode, all i/o is handled by the Group E ttxxx() (terminal i/o)
routines.
Terminal State
Reading from the keyboard with conxxx() routines and writing to
the communications device with ttxxx() routines AND vice-versa.
When in local mode, the console and communications device are distinct.
During file transfer, Kermit may put up a file-transfer display on the
console and sample the console for interruption signals.
When in remote mode, the console and communications device are the
same, and therefore there can be no file-transfer display on the
console or interruptions from it (except for "in-band" interruptions
such as ^C^C^C).
4.A. Group A: Library Functions
Library functions, strictly portable, can be used by all modules on all
platforms: [64]ckclib.h, [65]ckclib.c.
(To be filled in... For now, see [66]Section 3.1 and the comments in
ckclib.c.)
4.B. Group B: Kermit File Transfer
The Kermit protocol kernel. These files, whose names start with "ckc",
are supposed to be totally portable C, and are expected to compile
correctly on any platform with any C compiler. "Portable" does not mean
the same as "ANSI" -- these modules must compile on 10- and 20-year
old computers, with C preprocessors, compilers, and/or linkers that
have all sorts of restrictions. The Group B modules do not include any
header files other than those that come with Kermit itself. They do not
contain any library calls except from the standard C library (e.g.
printf()). They most certainly do not contain any system calls. Files:
[70]ckcsym.h
For use by C compilers that don't allow -D on the command line.
[71]ckcasc.h
ASCII character symbol definitions.
[72]ckcsig.h
System-independent signal-handling definitions and prototypes.
[73]ckcdeb.h
Originally, debugging definitions. Now this file also contains
all definitions and prototypes that are shared by all modules in
all groups.
[74]ckcker.h
Kermit protocol symbol definitions.
[75]ckcxla.h
Character-set-related symbol definitions (see next section).
[76]ckcmai.c
The main program. This module contains the declarations of all
the protocol-related global variables that are shared among the
other modules.
[77]ckcpro.w
The protocol module itself, written in "wart", a lex-like
preprocessor that is distributed with Kermit under the name
CKWART.C.
[78]ckcfns.c, [79]ckcfn2.c, [80]ckcfn3.c
The protocol support functions used by the protocol module.
[81]Group B modules may call upon functions from [82]Group E, but not
from [83]Group D modules (with the single exception that the main
program invokes the user interface, which is in Group D). (This last
assertion is really only a conjecture.)
4.C. Group C: Character-Set Conversion
Character set translation tables and functions. Used by the [87]Group
B protocol modules, but may be specific to different computers. (So
far, all character sets supported by C-Kermit are supported
in [88]ckuxla.c and [89]ckuxla.h, including Macintosh and IBM character
sets). These modules should be completely portable, and not rely on any
kind of system or library services.
[90]ckcxla.h
Character-set definitions usable by all versions of C-Kermit.
ck?xla.h
Character-set definitions for computer "?", e.g. [91]ckuxla.h
for UNIX, [92]ckmxla.h for Macintosh.
[93]ck?xla.c
Character-set translation tables and functions for computer "?".
For example, CKUXLA.C for UNIX, CKMXLA.C for Macintosh. So far,
these are the only two such modules. The UNIX module is used for
all versions of C-Kermit except the Macintosh version.
[94]ckcuni.h
Unicode definitions
[95]ckcuni.c
Unicode module
Here's how to add a new file character set in the original
(non-Unicode) modules, assuming it is based on the Roman (Latin)
alphabet. Let's call it "Barbarian". First, in ck?xla.h, add a
definition for FC_BARBA
(8 chars maximum length) and increase MAXFCSETS by 1. Then, in
ck?xla.c:
* Add a barbarian entry into the fcsinfo array.
* Add a "barbarian" entry to file character set keyword table,
fcstab.
* Add a "barbarian" entry to terminal character set keyword table,
ttcstab.
* Add a translation table from Latin-1 to barbarian: yl1ba[].
* Add a translation table from barbarian to Latin-1: ybal1[].
* Add a translation function from Barbarian to ASCII: xbaas().
* Add a translation function from Barbarian to Latin-1: xbal1().
* Add a translation function from Latin-1 to Barbarian: xl1ba().
* etc etc for each transfer character set...
* Add translation function pointers to the xls and xlr tables.
Other translations involving Barbarian (e.g. from Barbarian to
Latin-Cyrillic) are performed through these tables and functions. See
ckuxla.h and ckuxla.c for extensive examples.
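The table-and-function pairs listed above might look roughly like the
following sketch. The names yl1ba, ybal1, xl1ba(), xbal1(), and xbaas()
come from the list; everything else is assumed for illustration. In
particular, the "Barbarian" encoding is invented (Latin-1 with two code
points exchanged), CHAR is a local stand-in, init_barbarian() is a
hypothetical helper that fills the tables at run time to keep the
example short, and ANSI prototypes are used for brevity. See ckuxla.c
for how the actual tables are written.

```c
typedef unsigned char CHAR;        /* stand-in for Kermit's CHAR */

static CHAR yl1ba[256];            /* Latin-1 -> Barbarian */
static CHAR ybal1[256];            /* Barbarian -> Latin-1 */

/* Hypothetical helper: fill both tables.  Invented encoding: Barbarian
   equals Latin-1 except that code points 0xA4 and 0xD7 trade places. */
void init_barbarian(void) {
    int i;
    for (i = 0; i < 256; i++) {
        yl1ba[i] = (CHAR) i;
        ybal1[i] = (CHAR) i;
    }
    yl1ba[0xA4] = 0xD7;  yl1ba[0xD7] = 0xA4;
    ybal1[0xD7] = 0xA4;  ybal1[0xA4] = 0xD7;   /* inverse of yl1ba */
}

/* Translation functions: each is a simple table lookup. */
CHAR xl1ba(CHAR c) { return yl1ba[c]; }        /* Latin-1 to Barbarian */
CHAR xbal1(CHAR c) { return ybal1[c]; }        /* Barbarian to Latin-1 */

/* Barbarian to ASCII: 7-bit characters pass through; in this sketch
   anything else becomes a question mark. */
CHAR xbaas(CHAR c) { return (c < 128) ? c : (CHAR) '?'; }
```

The two tables must be exact inverses of each other wherever the
mapping is one-to-one, so that a character survives a round trip
through the transfer character set.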
To add a new Transfer Character Set, e.g. Latin Alphabet 9 (for the
Euro symbol), again in the "old" character-set modules:
In ckcxla.h:
+ Add a TC_xxxx definition and increase MAXTCSETS accordingly.
In ck?xla.h (since any transfer charset is also a file charset):
+ Add an FC_xxxx definition and increase MAXFCSETS accordingly.
In ck?xla.c:
+ Add a tcsinfo[] entry.
+ Make a tcstab[] keyword table entry.
+ Make an fcsinfo[] table entry.
+ Make an fcstab[] keyword table entry.
+ If necessary, make a langinfo[] table entry.
+ Make entries in the function pointer arrays.
+ Provide any needed functions.
As of C-Kermit 7.0, character sets are also handled in parallel by the
new (and very large) Unicode module, ckcuni.[ch]. Eventually we should
phase out the old way, described just above, and operate entirely in
(and through) Unicode. The advantages are many. The disadvantages are
size and performance. To add a character set to the Unicode modules:
In ckcuni.h:
+ (To be filled in...)
In ckcuni.c:
+ (To be filled in...)
4.D. Group D: User Interface
This is the code that communicates with the user, gets her commands,
informs her of the results. It may be command-line oriented,
interactive prompting dialog, menus and arrow keys, windows and mice,
speech recognition, telepathy, etc. The one provided is
command-and-prompt, with the ability to read commands from various
sources: the console keyboard, a file, or a macro definition. The user
interface has
three major functions:
1. Sets the parameters for the file transfer and then starts it. This
is done by setting certain (many) global variables, such as the
protocol machine start state, the file specification, file type,
communication parameters, packet length, window size, character
set, etc.
2. Displays messages on the user's screen during the file transfer,
using the screen() function, which is called by the Group B
modules.
3. Executes any commands directly that do not require Kermit protocol,
such as the CONNECT command, local file management commands,
parameter-setting commands, FTP client commands, etc.
If you plan to embed the [99]Group B files into a program with a
different user interface, your interface must supply an appropriate
screen() function, plus a couple of related ones like chkint() and
intmsg() for handling keyboard (or mouse, etc) interruptions during
file transfer. The best way to find out about this is to link all the
C-Kermit modules together except the ckuu*.o and ckucon.o modules, and
see which missing symbols turn up.
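As a starting point for such an experiment, a replacement interface
could begin with do-nothing stubs along these lines. This is only a
sketch: the parameter lists and return values shown here are
assumptions made for the example and should be checked against the
prototypes in the C-Kermit headers; interrupt_requested and
screen_calls are hypothetical variables.

```c
#include <stdio.h>

static int  interrupt_requested = 0;  /* hypothetical: set by your UI */
static long screen_calls = 0;         /* hypothetical: for diagnostics */

/* Called by the protocol modules to report file-transfer progress.
   Assumed arguments: a function code, a character, a number, a string. */
void screen(int f, char c, long n, char *s) {
    screen_calls++;
    fprintf(stderr, "screen: f=%d c=%d n=%ld s=%s\n",
            f, (int) c, n, s ? s : "(null)");
}

/* Called to see whether the user wants to interrupt the transfer.
   Assumed convention for this sketch: 0 = carry on, -1 = cancel. */
int chkint(void) {
    return interrupt_requested ? -1 : 0;
}

/* Called to tell the user how to interrupt a transfer. */
void intmsg(long n) {
    (void) n;                         /* unused in this stub */
    fprintf(stderr, "Use your interface's cancel control to interrupt.\n");
}
```

Linking with stubs like these in place of the ckuu*.o modules should
shrink the list of missing symbols to the ones that still need real
implementations.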
C-Kermit's character-oriented user interface (as opposed to the
Macintosh version's graphical user interface) consists of the following
modules. C-Kermit can be built with an interactive command parser, a
command-line-option-only parser, a graphical user interface, or any
combination, and it can even be built with no user interface at all (in
which case it runs as a remote-mode Kermit server).
[100]ckucmd.h
[101]ckucmd.c
The command parsing primitives used by the interactive command
parser to parse keywords, numbers, filenames, etc, and to give
help, complete fields, supply defaults, allow abbreviations and
editing, etc. This package is totally independent of Kermit, but
does depend on the [102]Group E functions.
[103]ckuusr.h
Definitions of symbols used in Kermit's commands.
ckuus*.c
Kermit's interactive command parser, including the script
programming language: [104]ckuusr.c (includes top-level keyword
tables); [105]ckuus2.c (HELP command text); [106]ckuus3.c (most
of the SET command); [107]ckuus4.c (includes variables and
functions); ckuus[567].c (miscellaneous).
[108]ckuusy.c
The command-line-option parser.
[109]ckuusx.c
User interface functions common to both the interactive and
command-line parsers.
[110]ckuver.h
Version heralds for different implementations.
[111]ckuscr.c
The (old, uucp-like) SCRIPT command.
[112]ckudia.c
The DIAL command. Includes specific knowledge of many types of
modems.
Note that none of the above files is actually Unix-specific. Over time
they have proven to be portable among all platforms where C-Kermit is
built: Unix, VMS, AOS/VS, Amiga, OS-9, VOS, etc etc. Thus the third
letter should more properly be "c", but changing it would be too
confusing.
ck?con.c, ckucns.c
The CONNECT command. Terminal connection, and in some cases
(Macintosh, Windows) also terminal emulation. NOTE: As of
C-Kermit 7.0, there are two different CONNECT modules for UNIX:
[113]ckucon.c -- the traditional, portable, fork()-based version
-- and [114]ckucns.c, a new version that uses select() rather
than forks so it can handle encryption. ckucns.c is the
preferred version for Unix; ckucon.c is not likely to keep pace
with it in terms of upgrades, etc. However, since select() is
not portable to every platform, ckucon.c will be kept
indefinitely for those platforms that can't use ckucns.c. NOTE:
SunLink X.25 support is available only in ckucon.c.
ck_*.*, ckuat*.*
Modules having to do with authentication and encryption. Since
the relaxation of USA export laws, they are included with the
general source-code distribution. Secure C-Kermit binaries can
be built using special targets in the standard makefile.
However, secure prebuilt binaries may not be distributed.
For other implementations, the files may, and probably do, have
different names. For example, the Macintosh graphical user interface
filenames start with "ckm". Kermit 95 uses the ckucmd and ckuus*
modules, but has its own CONNECT command modules. And so on.
Here is a brief description of C-Kermit's "user interface interface",
from ckuusr.c. It is nowhere near complete; in particular, hundreds of
global variables are shared among the many modules. These should, some
day, be collected into classes or structures that can be passed around
as needed; not only for purity's sake, but also to allow for multiple
simultaneous communication sessions and/or user interfaces. Our list of
things to do is endless, and reorganizing the source is almost always
at the bottom.
The ckuus*.c modules (like many of the ckc*.c modules) depend on the
existence of C library features like fopen, fgets, feof, (f)printf,
argv/argc, etc. Other functions that are likely to vary among operating
systems -- like setting terminal modes or interrupts -- are invoked via
calls to functions that are defined in the [115]Group E
platform-dependent modules, ck?[ft]io.c. The command line parser
processes any arguments found on the command line, as passed to main()
via argv/argc. The interactive parser uses the facilities of the cmd
package (developed for this program, but, in theory, usable by any
program). Any command parser may be substituted for this one. The only
requirements for the Kermit command parser are these:
1. Set parameters via global variables like duplex, speed, ttname,
etc. See [116]ckcmai.c for the declarations and descriptions of
these variables.
2. If a command can be executed without the use of Kermit protocol,
then execute the command directly and set the sstate (start state)
variable to 0. Examples include SET commands, local directory
listings, the CONNECT command.
3. If a command requires the Kermit protocol, set the following
variables:
sstate                              string data
'x' (enter server mode)             (none)
'r' (send a 'get' command)          cmarg, cmarg2
'v' (enter receive mode)            cmarg2
'g' (send a generic command)        cmarg
's' (send files)                    nfils, cmarg & cmarg2 OR cmlist
'c' (send a remote host command)    cmarg
cmlist is an array of pointers to strings.
cmarg, cmarg2 are pointers to strings.
nfils is an integer (hmmm, probably should be an unsigned long).
cmarg can be:
A filename string (possibly wild), or:
a pointer to a prefabricated generic command string, or:
a pointer to a host command string.
cmarg2 is:
The name to send a single file under, or:
the name under which to store an incoming file; must not
be wild.
If it's the name for receiving, a null value means to
store the file under the name it arrives with.
cmlist is:
A list of nonwild filenames, such as passed via argv.
nfils is an integer, interpreted as follows:
-1: filespec (possibly wild) in cmarg, must be expanded
internally.
0: send from stdin (standard input).
>0: number of files to send, from cmlist.
The screen() function is used to update the screen during file
transfer. The tlog() function writes to a transaction log (if TLOG is
defined). The debug() function writes to a debugging log (if DEBUG is
defined). The intmsg() and chkint() functions provide the user i/o for
interrupting file transfers.
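The start-state conventions listed above (sstate, cmarg, cmarg2,
cmlist, nfils) can be sketched as follows. The five variables are
declared locally here as stand-ins so that the fragment compiles by
itself; in C-Kermit they are the globals declared in ckcmai.c, and
their exact types there may differ. The four function names are
hypothetical.

```c
#include <stddef.h>

/* Local stand-ins for Kermit's globals (assumed types). */
int    sstate = 0;
char  *cmarg  = NULL;
char  *cmarg2 = NULL;
char **cmlist = NULL;
int    nfils  = 0;

/* SEND one filespec (possibly wild) under an optional as-name. */
void start_send_spec(char *filespec, char *asname) {
    cmarg  = filespec;
    cmarg2 = asname;
    nfils  = -1;          /* -1: filespec in cmarg, expanded internally */
    sstate = 's';
}

/* SEND a list of nonwild names, such as those passed via argv. */
void start_send_list(char **names, int count) {
    cmlist = names;
    nfils  = count;       /* >0: number of files to send, from cmlist */
    sstate = 's';
}

/* RECEIVE; a null asname means keep the name the file arrives with. */
void start_receive(char *asname) {
    cmarg2 = asname;
    sstate = 'v';
}

/* A command that needs no protocol, e.g. a SET command. */
void no_protocol(void) {
    sstate = 0;
}
```

A substitute command parser would call something like
start_send_spec("*.c", NULL) and then return to the main program,
which enters the protocol module with the start state it finds in
sstate.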
4.E. Group E: Platform-Dependent I/O
Platform-dependent function definitions. All the Kermit modules,
including the command package, call upon these functions, which are
designed to provide system-independent primitives for controlling and
manipulating devices and files. For Unix, these functions are defined
in the files [120]ckufio.c (files), [121]ckutio.c (communications), and
[122]ckusig.c (signal handling).
For VMS, the files are [123]ckvfio.c, ckvtio.c, and [124]ckusig.c (VMS
can use the same signal handling routines as Unix). It doesn't really
matter what the files are called, except for Kermit distribution
purposes (grouping related files together alphabetically), only that
each function is provided with the name indicated, observes the same
calling and return conventions, and has the same type.
The Group E modules contain both functions and global variables that
are accessed by modules in the other groups. These are now described.
(By the way, I got this list by linking all the C-Kermit modules
together except ckutio and ckufio. These are the symbols that ld
reported as undefined. But that was a long time ago, probably circa
Version 6.)
4.E.1. Global Variables
char *DELCMD;
Pointer to string containing command for deleting files.
Example: char *DELCMD = "rm -f "; (UNIX)
Example: char *DELCMD = "delete "; (VMS)
Note trailing space. Filename is concatenated to end of this
string. NOTE: DELCMD is used only in versions that do not
provide their own built-in DELETE command.
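The reason for the trailing space can be seen in a sketch of the
concatenation. The helper build_delcmd() is hypothetical (it is not a
C-Kermit function); it only shows how a filename appended directly to
DELCMD yields a complete command line, with a length check added so
the result cannot overflow the caller's buffer.

```c
#include <string.h>

char *DELCMD = "rm -f ";      /* UNIX value; note the trailing space */

/* Hypothetical helper: put DELCMD followed by filename into buf.
   Returns 0 on success, -1 if the result would not fit in buflen. */
int build_delcmd(char *buf, size_t buflen, const char *filename) {
    size_t need = strlen(DELCMD) + strlen(filename) + 1;  /* +1 for NUL */
    if (need > buflen)
        return -1;
    strcpy(buf, DELCMD);
    strcat(buf, filename);
    return 0;
}
```

Given the filename "oofa.txt", the buffer ends up holding
"rm -f oofa.txt". The sketch does no quoting of the filename for the
system's command interpreter.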
char *DIRCMD;
Pointer to string containing command for listing files when a
filespec is given.
Example: char *DIRCMD = "/bin/ls -l "; (UNIX)
Example: char *DIRCMD = "directory "; (VMS)
Note trailing space. Filename is concatenated to end of this
string. NOTE: DIRCMD is used only in versions that do not
provide their own built-in DIRECTORY command.
char *DIRCM2;
Pointer to string containing command for listing files when a
filespec is not given. (currently not used, handled in another
way.)
Example: char *DIRCM2 = "/bin/ls -ld *";
NOTE: DIRCM2 is used only in versions that do not provide their
own built-in DIRECTORY command.
char *PWDCMD;
Pointer to string containing command to display current
directory.