Wring out starline parsing #132

Open
wants to merge 37 commits into
base: master
37 commits
8095b9a
Fix import goof
CyberiaResurrection Dec 27, 2024
d2cc5f0
Pre-filter bad economics and social strings during star creation
CyberiaResurrection Dec 2, 2024
7cf5840
Change 0000 hexes to 0101 before regex change
CyberiaResurrection Dec 2, 2024
7fb7be3
Trim up spaces in Ix when transforming tree
CyberiaResurrection Dec 4, 2024
f62322c
Tighten star-position chunk of regex
CyberiaResurrection Dec 2, 2024
e285fac
Tighten as-PBG check
CyberiaResurrection Dec 3, 2024
e8f5d91
Tighten checks in square_up_parsed_zero
CyberiaResurrection Dec 3, 2024
14398e9
Fill in empty allegiance code
CyberiaResurrection Dec 3, 2024
10dc898
Tighten checks in square_up_parsed_one
CyberiaResurrection Dec 3, 2024
8fa7262
Deal with dangling parenthesis in trade code
CyberiaResurrection Dec 4, 2024
4ee3083
Tighten singleton tradecode parsing
CyberiaResurrection Dec 4, 2024
67dec6b
Fill blank worlds value in transformer
CyberiaResurrection Dec 4, 2024
d0f1f0d
Count spaces when parsing world count
CyberiaResurrection Dec 4, 2024
3c681a4
Count spaces when parsing base codes
CyberiaResurrection Dec 4, 2024
0569f97
Revert singleton trade-code parsing change
CyberiaResurrection Dec 7, 2024
02371b6
Slightly loosen allegiance-handling overflow
CyberiaResurrection Dec 8, 2024
f0e072d
Strip duplicate spaces from name when parsing
CyberiaResurrection Dec 8, 2024
a14f334
Handle identical successive trade codes when checking overrun
CyberiaResurrection Dec 9, 2024
4459a9f
Count opening space when checking trade code overflow
CyberiaResurrection Dec 9, 2024
b25a635
Count overrun from trade codes onwards
CyberiaResurrection Dec 10, 2024
f4b3aa7
Count overrun from back of trade codes
CyberiaResurrection Dec 10, 2024
496dcca
Handle blank trade codes when reparsing star line
CyberiaResurrection Dec 10, 2024
e4f1292
Only move overrun segments that can be nobles
CyberiaResurrection Dec 10, 2024
2605b66
Don't overrun when no trade segment _can_ be nobles
CyberiaResurrection Dec 11, 2024
485d0ab
Handle when last two trade codes are actually first two of nbz
CyberiaResurrection Dec 12, 2024
c3cfa7f
Align pop-code error messages with other trade-code error messages
CyberiaResurrection Dec 12, 2024
e846296
Make importance warning more consistent with other extensions
CyberiaResurrection Dec 12, 2024
334500b
Handle when last two trade codes aren't first two of nbz
CyberiaResurrection Dec 12, 2024
89eeb51
Disable full shift-right in processing
CyberiaResurrection Dec 13, 2024
1990aac
Shut up, ruff!
CyberiaResurrection Dec 14, 2024
0ed4707
Fix test blast damage
CyberiaResurrection Dec 14, 2024
222aead
Handle suspected empty-trade code with valid zone code
CyberiaResurrection Dec 22, 2024
c5b6791
Fill in empty nobles value in transformer
CyberiaResurrection Dec 23, 2024
ab7afa7
Enforce valid pop digits in homeworld trade codes
CyberiaResurrection Dec 25, 2024
fa72a04
Refine raw-code skip when creating TradeCode
CyberiaResurrection Dec 25, 2024
a8a4359
Trim overlong homeworld codes where needed
CyberiaResurrection Dec 26, 2024
3610b2d
Handle direct-dieback in homeworld trade codes
CyberiaResurrection Dec 27, 2024

7 changes: 4 additions & 3 deletions PyRoute/DeltaStar.py
@@ -277,11 +277,12 @@ def _check_pop_code(self, msg, code, pop):
code_match = code in self.tradeCode.codeset

if pop_match and not code_match:
line = '{} - Calculated "{}" not in trade codes {}'.format(self, code, self.tradeCode.codeset)
line = '{}-{} Calculated "{}" not in trade codes {}'.format(self, self.uwp, code, self.tradeCode.codeset)
msg.append(line)
if code_match and not pop_match:
line = '{} - Found invalid "{}" code on world with {} population: {}'.format(self, code, self.pop,
self.tradeCode.codeset)
line = '{}-{} Found invalid "{}" code on world with {} population: {}'.format(self, self.uwp, code,
self.pop,
self.tradeCode.codeset)
msg.append(line)
check = False
return check
114 changes: 73 additions & 41 deletions PyRoute/Inputs/BaseTransformer.py
@@ -32,7 +32,7 @@ def starline(self, args):
tradelen = sum([len(item) for item in args[2]]) + len(args[2]) - 1
if 16 < tradelen and 3 <= len(args[2]) and 1 == len(args[3]) and '' == args[3][0].value.strip(): # Square up overspilled trade codes
if '' == args[4][0].value and '' != args[5][0].value and '' == args[6][0].value:
move_fwd = 3 == len(args[5][0].value) # Will base code still make sense as PBG?
move_fwd = 3 == len(args[5][0].value) and args[5][0].value[0].isdigit() # Will base code still make sense as PBG?
move_rev = 3 == len(args[7][2][0].value) # Will allegiance code still make sense as PBG?
if move_fwd and not move_rev:
last = args[2][-1]
@@ -54,19 +54,6 @@ def starline(self, args):
args[6][0].value = args[5][0].value
args[5][0].value = args[4][0].value
args[4][0].value = ''
elif '*' != args[5][0].value and 3 == len(args[3]):
if '' == args[4][0].value and '' != args[5][0].value and '' == args[6][0].value:
if args[7][0][0].value == args[7][2][0].value:
args[4][0].value = args[5][0].value
args[5][0].value = args[7][0][0].value
args[6][0].value = args[7][1][0].value
args[7][0][0].value = args[7][2][0].value
args[7][1][0].value = ' '
if 9 == len(args):
args[7][2][0].value = args[8][0].value
args[8][0].value = ''
else:
args[7][2][0].value = ''
if 8 == len(args): # If there's no residual argument
if 1 < len(args[7]):
tailend = args[7][2][0].value
@@ -105,6 +92,8 @@ def extensions(self, args):

def nobles(self, args):
args[0].value = args[0].value.strip()
if '' == args[0].value:
args[0].value = '-'
return args

def base(self, args):
@@ -169,6 +158,8 @@ def extensions_transform(self, extensions):
def world_alg_transform(self, world_alg):
if 1 == len(world_alg):
return world_alg[0][0], world_alg[0][1], world_alg[0][2]
if '' == world_alg[1][0].value.strip():
world_alg[1][0].value = '0'
return world_alg[0][0].value, world_alg[1][0].value, world_alg[2][0].value

def transform(self, tree):
@@ -196,7 +187,7 @@ def transform(self, tree):
self.trim_raw_string(parsed)
rawbitz = self._trim_raw_bitz(parsed)
parsed = self._square_up_parsed_zero(rawbitz[0], parsed)
parsed = self._square_up_parsed_one(rawbitz[1], parsed)
# parsed = self._square_up_parsed_one(rawbitz[1], parsed)
parsed = self._square_up_allegiance_overflow(parsed)

no_extensions = parsed['ix'] is None and parsed['ex'] is None and parsed['cx'] is None
@@ -260,6 +251,7 @@ def _preprocess_extensions_and_nbz(self, tree):
return tree

def _preprocess_trade_and_nbz(self, tree):
from PyRoute.Inputs.ParseStarInput import ParseStarInput
trade = tree.children[2]
tradelen = sum([len(item.value) for item in trade.children]) + (len(trade.children) - 1)
if 17 > tradelen:
@@ -270,15 +262,21 @@ def _preprocess_trade_and_nbz(self, tree):
if trade_final_keep:
return tree

overrun = self._calc_trade_overrun(trade.children, self.raw)
starname = tree.children[1].children[0].value
bitz = starname.split(' ')
bitz = [item for item in bitz if '' != item]
rawbitz = self.raw.split(bitz[-1])

if ParseStarInput.can_be_nobles(tree.children[4].children[0].value) and \
ParseStarInput.can_be_base(tree.children[5].children[0].value):
overrun = 0
else:
overrun = self._calc_trade_overrun(trade.children, rawbitz[1])
if 0 == overrun: # if the reconstructed trade code is fully in the raw string, nothing to do - bail out now
return tree
nobles = tree.children[4]
base = tree.children[5]
zone = tree.children[6]
world_alg = tree.children[7]
pbg = world_alg.children[0]
worlds = world_alg.children[1]

if 0 < overrun:
relocate = trade.children[-overrun:]
@@ -299,20 +297,19 @@ def _preprocess_trade_and_nbz(self, tree):
base.children[0].value = relocate[1].value
nobles.children[0].value = relocate[0].value
trade.children = trade.children[:-2]
elif '' != nobles.children[0].value.strip() and '' == zone.children[0].value.strip():
worlds.children[0].value = pbg.children[0].value
pbg.children[0].value = base.children[0].value
zone.children[0].value = nobles.children[0].value
base.children[0].value = relocate[1].value
nobles.children[0].value = relocate[0].value
trade.children = trade.children[:-2]

return tree

def _is_noble(self, noble_string):
noble = "BCcDEeFfGH"
return all(char in noble for char in noble_string)

def _is_zone(self, zone_string):
if 1 != len(zone_string):
return False
from PyRoute.Inputs.ParseStarInput import ParseStarInput
return zone_string[0] in ParseStarInput.valid_zone

def _preprocess_tree_suspect_empty_trade_code(self, tree):
if 1 != len(tree.children[2].children):
return tree
@@ -325,6 +322,8 @@ def _preprocess_tree_suspect_empty_trade_code(self, tree):
all_noble = self._is_noble(tree.children[2].children[0])
if not all_noble:
return tree
if self._is_zone(tree.children[6].children[0].value.strip()):
return tree
tree.children[6].children[0].value = tree.children[5].children[0].value
tree.children[5].children[0].value = tree.children[4].children[0].value
tree.children[4].children[0].value = tree.children[2].children[0].value
@@ -334,25 +333,50 @@ def _preprocess_tree_suspect_empty_trade_code(self, tree):

@staticmethod
def _calc_trade_overrun(children, raw):
from PyRoute.Inputs.ParseStarInput import ParseStarInput
trade_ext = ''
overrun = 0
# first check whether trade codes are straight up aligned
for item in children:
trade_ext += item.value + ' '
if trade_ext in raw:
return 0
trade_ext = ''
num_child = len(children) - 1
gubbinz = [item.value for item in children]
nobles = [item for item in gubbinz if ParseStarInput.can_be_nobles(item)]
if 0 == len(nobles):
return 0
if 1 < len(gubbinz):
if ParseStarInput.can_be_nobles(gubbinz[-2]) and ParseStarInput.can_be_base(gubbinz[-1]):
return 2

for k in range(num_child, 1, -1):
trade_bar = " ".join(gubbinz[:k])
if trade_bar in raw:
overrun = len(children) - k
for j in range(k, len(children)):
if not ParseStarInput.can_be_nobles(gubbinz[j]):
overrun -= 1
return overrun
trade_ext = ' '
i = 0
for item in children: # Dig out the largest left-subset of trade children that are in the raw string
trade_ext += item.value + ' '
if trade_ext in raw: # if it worked with one space appended, try a second space
trade_ext += ' '
if trade_ext not in raw: # if it didn't, drop the second space
trade_ext = trade_ext[:-1]
substr = False
if i < num_child:
substr = children[i + 1].value.rfind(item.value) == (len(children[i + 1].value) - len(item.value))

if not substr:
trade_ext += ' '
if trade_ext not in raw: # if it didn't, drop the second space
trade_ext = trade_ext[:-1]
else: # if appending the space didn't work, try without it
trade_ext = trade_ext[:-1]
# after all that, if we've overrun (such as a nobles code getting transplanted), throw hands up and move on
if trade_ext not in raw:
overrun += 1
i += 1
return overrun

def _square_up_parsed(self, parsed):
@@ -376,6 +400,12 @@ def trim_raw_string(self, tree):
continue
rawval = tree[dataval]
if rawval is not None:
if rawval.startswith('{ '):
oldlen = 0
while oldlen != len(self.raw):
oldlen = len(self.raw)
self.raw = self.raw.replace('{  ', '{ ')

index = self.raw.find(rawval)
self.raw = self.raw.replace(rawval, '', 1)
if 0 < index:
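The loop added to `trim_raw_string` repeats a replace until the string length stops changing. A minimal standalone sketch of that fixed-point idea follows. It assumes the search literal is a brace followed by two spaces (the page rendering collapses runs of spaces, so this is a guess), and `collapse_after_brace` is a hypothetical helper name, not something in PyRoute.

```python
def collapse_after_brace(raw: str) -> str:
    """Collapse any run of spaces after '{' down to a single space."""
    old_len = 0
    # Keep replacing until a pass leaves the length unchanged.
    while old_len != len(raw):
        old_len = len(raw)
        # Assumption: two spaces after the brace are reduced to one per pass.
        raw = raw.replace("{  ", "{ ")
    return raw


collapsed = collapse_after_brace("{    3 } (A46+2) [1716]")
untouched = collapse_after_brace("{ 3 } (A46+2) [1716]")
```

Each pass removes one space per match, so the loop runs at most once per surplus space plus a final no-op pass.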
@@ -425,15 +455,17 @@ def _square_up_parsed_zero(self, rawstring, parsed):
parsed['zone'] = ''
return parsed
if not rawstring.endswith(' '):
parsed['nobles'] = ''
parsed['base'] = bitz[0]
parsed['zone'] = bitz[1]
return parsed
if 4 > len(bitz[0]):
parsed['nobles'] = ''
parsed['base'] = bitz[0]
parsed['zone'] = bitz[1]
return parsed
if rawstring.startswith(' '):
parsed['nobles'] = ''
parsed['base'] = bitz[0]
parsed['zone'] = bitz[1]
return parsed
if 4 > len(bitz[0]):
parsed['nobles'] = ''
parsed['base'] = bitz[0]
parsed['zone'] = bitz[1]
return parsed
else:
parsed['nobles'] = bitz[0]
parsed['base'] = bitz[1]
@@ -469,7 +501,7 @@ def _square_up_parsed_one(self, rawstring, parsed):
if 2 == len(trimbitz):
allegiance = trimbitz[1]
rawtrim = rawtrim.replace(allegiance, '', 1)
if alg.isdigit() and 5 > len(alg) and 1 < len(allegiance): # if first trimbit fits in worlds field, stick it there
if alg.isdigit() and 5 > len(alg) and 1 < len(allegiance) and (not allegiance[0].islower()): # if first trimbit fits in worlds field, stick it there
parsed['worlds'] = alg
parsed['allegiance'] = allegiance
parsed['residual'] = rawtrim.strip()
@@ -522,9 +554,9 @@ def _square_up_allegiance_overflow(self, parsed):
if alleg.startswith('----') and 4 <= len(alleg):
parsed['allegiance'] = '----'
parsed['residual'] = alleg[4:] + parsed['residual']
elif alleg.startswith('--') and 2 <= len(alleg):
elif alleg.startswith('--') and 4 <= len(alleg):
parsed['allegiance'] = '--'
parsed['residual'] = alleg[2:] + parsed['residual']
parsed['residual'] = alleg[2:] + ' ' + parsed['residual']
else:
counter = 0
while counter < len(alleg) and (alleg[counter].isalnum() or '-' == alleg[counter] or '?' == alleg[counter]) and 4 > counter:
49 changes: 45 additions & 4 deletions PyRoute/Inputs/ParseStarInput.py
@@ -19,7 +19,7 @@

class ParseStarInput:
regex = """
^([0-3]\d[0-4]\d) +
^((0[1-9]|[1-2]\d|3[0-2])(0[1-9]|40|[1-3]\d)) +
(.{15,}) +
([A-HXYa-hxy][0-9A-Fa-f]\w\w[0-9A-Fa-f][0-9A-Xa-x][0-9A-Ja-j]-\w|\?\?\?\?\?\?\?-\?|[A-HXYa-hxy\?][0-9A-Fa-f\?][\w\?]{2,2}[0-9A-Fa-f\?][0-9A-Xa-x\?][0-9A-Ja-j\?]-[\w\?]) +
(.{15,}) +
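The effect of the tightened position pattern can be checked in isolation. The sketch below copies the old and new position sub-patterns from this hunk and compares what each accepts. `OLD_POSITION`, `NEW_POSITION`, `accepted`, and the sample hexes are illustrative names and values, not part of PyRoute.

```python
import re

# Sub-patterns copied from the hunk above, without the leading '^'.
OLD_POSITION = re.compile(r"[0-3]\d[0-4]\d")
NEW_POSITION = re.compile(r"(0[1-9]|[1-2]\d|3[0-2])(0[1-9]|40|[1-3]\d)")


def accepted(pattern: re.Pattern, hex_position: str) -> bool:
    """True when the whole four-digit position matches the pattern."""
    return pattern.fullmatch(hex_position) is not None


samples = ["0101", "3240", "0000", "3301", "0141", "1700"]
# Map each sample to (accepted by old pattern, accepted by new pattern).
results = {s: (accepted(OLD_POSITION, s), accepted(NEW_POSITION, s)) for s in samples}
```

Under the new pattern the column must lie in 01..32 and the row in 01..40, so `0000`, `3301`, `0141`, and `1700` are rejected although the old pattern let them through.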
@@ -76,8 +76,12 @@ def parse_line_into_star_core(star, line, sector, pop_code, ru_calc, fix_pop=Fal
star.tradeCode = TradeCodes(data[3].strip())
star.ownedBy = star.tradeCode.owned_by(star)

star.economics = data[6].strip().upper() if data[6] and data[6].strip() != '-' else None
star.social = data[7].strip().upper() if data[7] and data[7].strip() != '-' else None
raw_economics = data[6].strip().upper() if data[6] else None
if raw_economics is not None:
star.economics = raw_economics if 7 == len(raw_economics) else None
raw_social = data[7].strip().upper() if data[7] else None
if raw_social is not None:
star.social = raw_social if 6 == len(raw_social) else None

star.nobles = Nobles()
star.nobles.count(data[11])
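The economics and social pre-filter in this hunk amounts to a length check after stripping and upper-casing. A minimal sketch, assuming the lengths 7 and 6 shown above. `prefilter_extension` is a hypothetical helper, and unlike the hunk it returns `None` for empty input rather than leaving the attribute untouched.

```python
from typing import Optional


def prefilter_extension(raw: Optional[str], expected_len: int) -> Optional[str]:
    """Return the cleaned string only when it has exactly the expected length."""
    if not raw:
        return None
    cleaned = raw.strip().upper()
    return cleaned if len(cleaned) == expected_len else None


economics = prefilter_extension(" (a46+2) ", 7)  # well-formed Ex string
social = prefilter_extension("[1716]", 6)        # well-formed Cx string
dash = prefilter_extension("-", 7)               # placeholder dash is filtered out
short = prefilter_extension("(A46+)", 7)         # malformed: one character short
```

A lone `-` falls out of the length check without a separate comparison, which the old code did explicitly.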
@@ -126,7 +130,7 @@ def parse_line_into_star_core(star, line, sector, pop_code, ru_calc, fix_pop=Fal
star.calculate_importance()
if imp != star.importance:
star.logger.warning(
'{}-{} Calculated importance {} does not match generated importance {}'.
'{}-{} IX Calculated importance {} does not match generated importance {}'.
format(star, star.baseCode, star.importance, imp))
else:
star.calculate_importance()
@@ -238,6 +242,7 @@ def _unpack_starline_fallback(line):
if matches is None:
return
data = list(matches.groups())
del data[2], data[1]
parsed = {'position': data[0], 'name': data[1], 'uwp': data[2], 'trade': data[3]}
raw_extensions = data[4].replace('  ', ' ').replace('{ ', '{').replace(' }', '}')
oldlen = 0
@@ -266,6 +271,18 @@ def _unpack_starline_fallback(line):

spacer = ' '
data = [parsed['position'], parsed['name'], parsed['uwp'], parsed['trade'], extensions, parsed['ix'], parsed['ex'], parsed['cx'], spacer, spacer, spacer, parsed['nobles'], parsed['base'], parsed['zone'].upper(), parsed['pbg'], parsed['worlds'], parsed['allegiance'], parsed['residual']]

data = ParseStarInput._unpack_starline_tweak(data)
return data

@staticmethod
def _unpack_starline_tweak(data):
data[1] = data[1].replace('  ', ' ')
if data[16].startswith('--') and '----' != data[16]:
if 2 < len(data[16]):
data[17] = data[16][2:] + " " + data[17]
data[16] = '--'

return data

@staticmethod
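The behaviour of `_unpack_starline_tweak` can be sketched on plain values instead of the 18-element `data` list. This assumes the name replace targets a double space, in line with the "Strip duplicate spaces from name when parsing" commit. `tweak` and its three-argument signature are hypothetical, chosen only to keep the example self-contained.

```python
def tweak(name: str, allegiance: str, residual: str):
    """Mirror the name cleanup and the '--' allegiance split on plain strings."""
    # Assumption: a single pass replacing double spaces with single spaces.
    name = name.replace("  ", " ")
    # A '--' prefix (other than the full '----' code) keeps only '--';
    # anything after it is pushed to the front of the residual.
    if allegiance.startswith("--") and allegiance != "----":
        if len(allegiance) > 2:
            residual = allegiance[2:] + " " + residual
        allegiance = "--"
    return name, allegiance, residual


split = tweak("New  Terra", "--Xy", "note")
kept = tweak("Regina", "----", "")
plain = tweak("Regina", "ImDd", "note")
```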
@@ -319,3 +336,27 @@ def check_tl_core(star):
min_tl = max(0, mod + 1)
max_tl = mod + 6
return max_tl, min_tl

@staticmethod
def can_be_nobles(segment) -> bool:
if not isinstance(segment, str):
return False

if '-' == segment.strip():
return True

non_noble = [item for item in segment if item not in ParseStarInput.valid_nobles]
return 0 == len(non_noble)

@staticmethod
def can_be_base(segment) -> bool:
if not isinstance(segment, str):
return False

if '-' == segment or '*' == segment or ' ' == segment:
return True

if 3 < len(segment):
return False

return segment.isupper()
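The two new predicates can be tried outside the class. The sketch below restates them as free functions. `ParseStarInput.valid_nobles` does not appear in this diff, so `VALID_NOBLES` is an assumption borrowed from the string used by `_is_noble` in BaseTransformer.py.

```python
# Assumption: same character set as _is_noble in BaseTransformer.py.
VALID_NOBLES = "BCcDEeFfGH"


def can_be_nobles(segment) -> bool:
    """True when every character of the segment could be a noble code."""
    if not isinstance(segment, str):
        return False
    if segment.strip() == "-":
        return True
    return all(ch in VALID_NOBLES for ch in segment)


def can_be_base(segment) -> bool:
    """True for the placeholder values or an upper-case code of up to 3 chars."""
    if not isinstance(segment, str):
        return False
    if segment in ("-", "*", " "):
        return True
    if len(segment) > 3:
        return False
    return segment.isupper()


ambiguous = can_be_nobles("De")   # desert trade code also reads as nobles
not_nobles = can_be_nobles("Ri")  # 'R' and 'i' are not noble characters
base_ok = can_be_base("NS")
base_mixed = can_be_base("Na")    # lower-case letter rules it out
```

Under that assumption a two-letter trade code such as `De` passes `can_be_nobles`, which may be why the overrun logic also looks at neighbouring fields before relocating anything.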
17 changes: 9 additions & 8 deletions PyRoute/Inputs/StarlineParser.py
@@ -18,17 +18,18 @@ class StarlineParser:
starline_grammar = r"""
starline: position starname trade extensions nobles base zone world_alg residual?

position: /^([0-3]\d[0-4]\d)/
position: /^((0[1-9]|[1-2]\d|3[0-2])(0[1-9]|40|[1-3]\d))/

starname: /(.{15,}) ([A-HXYa-hxy\?][0-9A-Fa-f\?][\w\?]{2,2}[0-9A-Fa-f\?][0-9A-Xa-x\?][0-9A-Ja-j\?]-[\w\?]) /

trade: TRADECODE*
trade: BLANK | TRADECODE*
TRADECODE: MINOR_DIEBACK | BINARY | POPCODE | MINOR_SOPHONT | OWNED_COLONY | MAJOR_SOPHONT | RESIDUAL | SINGLETON
BLANK: / /
BINARY.3: /[A-Z][a-z]/
POPCODE.3: /[A-Z][a-z!]{1,3}[W\d]{0,1}/
MINOR_SOPHONT.3: /\([^\)\{]{1,}\)[W\d\?]{0,1}/
MINOR_SOPHONT.3: /\([^\)\{\(]{1,}\)[WX\d\?]{0,1}/
MINOR_DIEBACK.3: /Di\([^\)]{1,}\)[\d]{0,1}/
MAJOR_SOPHONT.3: /\[[^\]\{]{1,}\][W\d\?]{0,1}/
MAJOR_SOPHONT.3: /\[[^\]\{]{1,}\][WX\d\?]{0,1}/
OWNED_COLONY.3: /[OC]:[X\d\?]{0,4}/ | /[OC]:[A-Z][A-Za-z]{3,3}[-\:]{0,1}\d{4,4}/
RESIDUAL.2: /[0-9A-Za-z?\-+*()\'\{\}\[\]]{2,}/
SINGLETON: /[0-9AC-Za-z\+\*()?\']/
@@ -39,18 +40,18 @@ class StarlineParser:
ex: /\([0-9A-Za-z]{3}[+-]\d\)|-/
cx: /(\[[0-9A-Za-z]{4}[\]\}]|-)/

nobles: /([BcCDeEfFGH]{1,5}|-| )/
nobles: /([BcCDeEfFGH]{1,5}|-| ) /

base: /([A-Z]{1,3}|-|\*)/
base: /([A-Z]{1,3}|-|\*) /

zone: /([ARUFGBarufgb]|-| )[ ]{0,}/

pbg: /[0-9X?][0-9A-FX?][0-9A-FX?]/
pbg: /[0-9X?][0-9A-FX?][0-9A-FX?] /

residual: /(.{1,})/

world_alg: pbg worlds allegiance
worlds: /(\d{1,2} | |-)/
worlds: /(\d{1,} | |- )/

allegiance: /[A-Z0-9?-][A-Za-z0-9?-]{1,3}/

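The terminal changes can be sanity-checked with Python's `re` module, which is what Lark uses for regex terminals. The sketch below compares the old and new `MINOR_SOPHONT` patterns copied from the grammar above. The sample trade codes are made up for illustration and `matches` is a hypothetical helper.

```python
import re

# Patterns copied from the grammar hunk above.
OLD_MINOR_SOPHONT = re.compile(r"\([^\)\{]{1,}\)[W\d\?]{0,1}")
NEW_MINOR_SOPHONT = re.compile(r"\([^\)\{\(]{1,}\)[WX\d\?]{0,1}")


def matches(pattern: re.Pattern, code: str) -> bool:
    """True when the entire trade code matches the terminal pattern."""
    return pattern.fullmatch(code) is not None


# (old result, new result) for each illustrative trade code.
suffix_x = (matches(OLD_MINOR_SOPHONT, "(Ursa)X"), matches(NEW_MINOR_SOPHONT, "(Ursa)X"))
dangling = (matches(OLD_MINOR_SOPHONT, "((Ursa)"), matches(NEW_MINOR_SOPHONT, "((Ursa)"))
regular = (matches(OLD_MINOR_SOPHONT, "(Ursa)5"), matches(NEW_MINOR_SOPHONT, "(Ursa)5"))
```

The new pattern accepts an `X` population suffix and no longer swallows a stray opening parenthesis inside the sophont name.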