-
Notifications
You must be signed in to change notification settings - Fork 15
The Shunting Yard Algorithm
@beng showed us his BASIC interpreter for which he eventually had to implement the shunting yard algorithm. It was written because he wanted to run old BASIC games like traitor's castle.
It needed to handle BASIC weirdness like arbitrary gotos, ubiquitous global state. Initially it used a eval
-based strategy, where each line was converted into a ruby method that more-or-less eval'd the line contents in a specific context.
This was great, but Monsters of Galacticon was super tricky, a specific line preventing the eval strategy from working, and stopped work for 6 years.
So eventually Ben wrote a proper expression parser, and this used the shunting-yard algo to turn BASIC expressions into an RPN form that could be evaluated with a stack-based VM.
[aside: Leo: why are the line numbers multiples of 10? well, if you realise you need to insert a line between 10 and 20, you can define a new line 15 without renumbering everything (then use RENUMBER to print a rebased listing)]
Infix notation leads to ambiguities about evaluation order e.g.
10 + 2 * 3 => (10 + 2) * 3 or 10 + (2 * 3) ?
Operator precedence is a convention to standardise this without parentheses, so in the above example because *
conventionally has a higher precedence than +
, the second interpretation is used.
RPN / postfix notation lets you express these unambiguously (assuming you know the operator arity):
10 + 2 * 3 => 10 2 3 * +
Tom: you can think of RPN expressions as being a kind of program for a limited stack-based VM, where there are only two instructions:
- push a value onto the stack
- pop values off, perform an operation, push answer
Matt: how does this vary from lisp-style prefix notation?
Tom: this allows step-by-step evaluation, with a program counter moving along the expression token by token (stack-based vm vs register-based).
Okay, so how do we get from infix expressions to these postfix ones?
We discussed two variants of the algorithm - the basic one, where there is a single level of precedence, and the more complex one with multiple precedence levels.
Tuzz LEAPT to the whiteboard and wrote this example for the basic algo:
L O L O L O L
2 + 3 - 4 + 5
L = literal
O = operator
We were asked to believe that this key was accidental, but the reader can judge for themselves.
The basic case of the algorithm is simply:
* for each input token
* if it's an operator, push it to the stack
* if it's a literal, push it to the output, and push the whole stack to the output afterwards
This doesn't model precedence at all (or, I think, assumes all ops are of equal precedence).
The basic case having been ruthlessly dispatched, we moved on to dealing with precedence:
L O L O L O L
2 + 3 * 4 + 5
Precedence:
* = 2
+ = 1
- = 1
Now the algorithm is more complex - literals are still pushed to the output directly, but whether the stack is cleared depends on the next operator encountered. We only considered left-associative operators: when an operator is encountered in the input queue, we clear operators from the stack while they have a greater precedence than the new token.
We worked through this on the board. Choo-choo noises were made.
Associativity (I think) implies a kind of precedence among equal precedence operators
e.g. 1 + 2 + 3 => (1 + 2) + 3 because + is left-associative
whereas 1 ^ 2 ^ 3 => 1 ^ (2 ^ 3) because ^ is (conventionally) right-associative
This slightly modifies the precedence check when an operator is next in the input queue.
[HERE WE DECIDE TO SKIP BRACKETS AND FUNCTIONS AND MOB AN IMPLEMENTATION]
[GREAT AMOUNT OF PAUL DOING VIM STUFF CENSORED]
Some discussion of our model ensued. After a false start we agreed to have an object that could carry some state (our stack and output queue).
[INTERLUDE FOR TRAIN TERMINOLOGY DISCUSSION]
After some train-related pedantry (the best kind of pedantry), we decided that the stack was in fact a siding, and the whole object was the yard. This was approved. We ended the metaphor when Paul tried to call tokens "cars".
We quickly got to the basic implementation and started on precedence.
[NOISES OF JOEL PLAYING WITH WOODEN TRAINS BROUGHT BY BEN]
[MORE VIM NONSENSE CENSORED]
Chris Z CONTROVERSIALLY converted us from "siding" to "operator_stack", which, while unpopular, was an eminently sensible suggestion. Some discussion ensued about whether it really is a purely "operator" stack (since parentheses get placed on it, and they're not really operators), but we decided it was basically fine.
[TOM LIVESTREAMS HIS OWN FEET FOR SEVERAL SECONDS]
Okay, so we've got our stuff to the point where basic precedence works (no associativity), we decide to do parentheses next to override precedence in local circumstances
Ben: it's basically like you're creating a new stack - whatever happens inside the parens, the whole output of the inner expression should end up in the output, and then it continues as if nothing had ever happened.
Leo: what if we just create a new stack and recurse on the inner expression?
We talked this through and thought that, though it makes loads of sense, this might be a lot of complexity - we either need to decide how much of the input to pass down to the recursive call, or have some mechanism for passing the remainder up from the recursive calls.
Reader, we implemented it using the stack-based approach.
Tuzz: Can we hack the precedence system to implement brackets without a special parsing case? e.g. '(' has super high prec, ')' super low.
Tom: well, they're kinda special cased anyway in that they disappear in the final output, so maybe we don't gain any simplicity that way anyway.
Tuzz: could we move the special casing into an output filter that just rejects all paren tokens?
All: MAYBE WE COULD
Mudge: [SOUNDS OF TYPING]
All: OMG THAT ACTUALLY WORKED
Simon: but we've effectively set ')' to minus infinity prec, doesn't this mean that when we encounter ')', we clear the ENTIRE opstack instead of just up until the first '('?
Mudge: [SOUNDS OF TYPING]
All: YES IT DOES
Tom: I WANT TO PLAY WITH TRAINS
All: [MURMURED APPROVAL]
(You can find our final code in our shunting-yard
repository.)
Mudge: This meeting, then. How was it eh?
Joel: purpose of this was to feel less like it's a long book haul, let more people in. Was this good?
All: [general murmurs of assent. maybe we haven't had loads of extra attendees but it felt refreshing innit. Takes pressure off the regular TAPL leaders.]
<3 <3 <3 <3 <3 <3 <3
Organisation: next time we agreed we should:
- pick an organiser
- organise it more than 7-10 days in advance so if there's an obvious person who's sharing a thing, they can prep
ACTION: identify the next natural break, it might not be part 2 of the book
Mudge: next tues Gecko can't host and it's right after Easter. Do we push the meeting back, push it forward?
All: not forward eh
Mudge: TO THE BAT-DOODLE!
All: nods of assent
Mudge: AN ORGANISER THEN?
All: [UNCOMFORTABLE SILENCE]
Abeer: I WILL DO THIS THING
All: [TUMULTUOUS ACCLAIM]
- Home
- Documentation
- Choosing a Topic
- Shows & Tells
- Miscellaneous
- Opt Art
- Reinforcement Learning: An Introduction
- 10 Technical Papers Every Programmer Should Read (At Least Twice)
- 7 More Languages in 7 Weeks
- Lua, Day 1: The Call to Adventure
- Lua, Day 2: Tables All the Way Down
- Lua, Day 3
- Factor, Day 1: Stack On, Stack Off
- Factor, Day 2: Painting the Fence
- Factor, Day 3: Balancing on a Boat
- Elm, Day 1: Handling the Basics
- Elm, Day 2: The Elm Architecture
- Elm, Day 3: The Elm Architecture
- Elixir, Day 1: Laying a Great Foundation
- Elixir, Day 2: Controlling Mutations
- Elixir, Day 3: Spawning and Respawning
- Julia, Day 1: Resistance Is Futile
- Julia, Day 2: Getting Assimilated
- Julia, Day 3: Become One With Julia
- Minikanren, Days 1-3
- Minikanren, Einstein's Puzzle
- Idris Days 1-2
- Types and Programming Languages
- Chapter 1: Introduction
- Chapter 2: Mathematical Preliminaries
- Chapter 3: Untyped Arithmetic Expressions
- Chapter 4: An ML Implementation of Arithmetic Expressions
- Chapter 5: The Untyped Lambda-Calculus
- Chapters 6 & 7: De Bruijn Indices and an ML Implementation of the Lambda-Calculus
- Chapter 8: Typed Arithmetic Expressions
- Chapter 9: The Simply-Typed Lambda Calculus
- Chapter 10: An ML Implementation of Simple Types
- Chapter 11: Simple Extensions
- Chapter 11 Redux: Simple Extensions
- Chapter 13: References
- Chapter 14: Exceptions
- Chapter 15: Subtyping – Part 1
- Chapter 15: Subtyping – Part 2
- Chapter 16: The Metatheory of Subtyping
- Chapter 16: Implementation
- Chapter 18: Case Study: Imperative Objects
- Chapter 19: Case Study: Featherweight Java
- The New Turing Omnibus
- Errata
- Chapter 11: Search Trees
- Chapter 8: Random Numbers
- Chapter 35: Sequential Sorting
- Chapter 58: Predicate Calculus
- Chapter 27: Perceptrons
- Chapter 9: Mathematical Research
- Chapter 16: Genetic Algorithms
- Chapter 37: Public Key Cryptography
- Chapter 6: Game Trees
- Chapter 5: Gödel's Theorem
- Chapter 34: Satisfiability (also featuring: Sentient)
- Chapter 44: Cellular Automata
- Chapter 47: Storing Images
- Chapter 12: Error-Correcting Codes
- Chapter 32: The Fast Fourier Transform
- Chapter 36: Neural Networks That Learn
- Chapter 41: NP-Completeness
- Chapter 55: Iteration and Recursion
- Chapter 19: Computer Vision
- Chapter 61: Searching Strings
- Chapter 66: Church's Thesis
- Chapter 52: Text Compression
- Chapter 22: Minimum spanning tree
- Chapter 64: Logic Programming
- Chapter 60: Computer Viruses
- Show & Tell
- Elements of Computing Systems
- Archived pages