
=Parrot Assembly Language =

=PDD 6: Parrot Assembly Language (PASM) =

Abstract
The format of Parrot's bytecode assembly language.

Description
Parrot's bytecode can be thought of as a form of machine language for a virtual super CISC machine. It makes sense, then, to define an assembly language for it for those people who may need to generate bytecode directly, rather than indirectly through a high-level language. 

Questions

 *  Can we get rid of PASM? Conversely, does PASM need to be kept up to date?
 *  PASM is just a text form of PBC, so it should be kept
 *  are there specific PBC features that can't currently be represented in PASM?
 *  besides hll and :outer?
 *  :init
 *  lexicals?
 *  :vtable
 *  I'm a bit rusty, but anything that starts with a '.' or ':' is suspect
 *  things that start with '.' are just directives to IMCC, equally applicable to PASM and PIR
 *  isn't PASM separate from IMCC?
 *  mdiep: it used to be separate
 *  so to say that PASM can have directives is a major architectural change
 *  perhaps the biggest thing we need is a definition of what PASM actually is
 *  the line has grown quite fuzzy over the years
 *  PASM could be defined as stringified PBC
 *  compilable stringified pbc
 *  it should be defined that way if we're going to call it assembly.
 *  barney: that's the most likely direction, and if so, it has some implications for how PASM behaves
 *  allison: which is what we want, anyway, right?
 *  particle: yup
 *  yes
 *  good, looks like we're in agreement and headed in the proper direction on that topic.

Implementation
Parrot opcodes take the format:

    code destination[dest_key], source1[source1_key], source2[source2_key]

The brackets do not denote optional arguments as such; they are real brackets. They may be left out entirely, however. If any argument has a key, the assembler will substitute the null key for arguments missing keys.

Conditional branches take the format:

    code boolean[bool_key], true_dest

The key parameters are optional, and may be either an integer or a string. If either is passed, it is associated with the parameter to its left, and is assumed to be either an array/list entry number or a hash key. Any time a source or destination can be a PMC register, there may be a key.

Destinations for conditional branches are an integer offset from the current PC.

All registers have a type prefix of P, S, I, or N, for PMC, string, integer, and number respectively. While Parrot bytecode does not have a fixed limit on the number of registers, PASM has an implementation limit on the number of addressable registers of each type, currently set at 100 (0-99).
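The operand rules above (a type prefix, a register number from 0 to 99, and an optional integer or string key in real brackets) can be modelled with a short Python sketch. This is only an illustration, not part of any Parrot tool: the names `Operand` and `parse_operand` are hypothetical, and the assumption that string keys are written in single or double quotes is ours rather than something the text states.

```python
import re
from typing import NamedTuple, Optional, Union

MAX_REGISTER = 99  # PASM implementation limit stated in the text (0-99)

# Type prefix, register number, and an optional key inside real brackets.
_OPERAND = re.compile(r"^([PSIN])([0-9]+)(?:\[(.*)\])?$")


class Operand(NamedTuple):
    reg_type: str                   # "P", "S", "I", or "N"
    number: int                     # register number
    key: Optional[Union[int, str]]  # integer index, string key, or None


def parse_operand(text: str) -> Operand:
    """Parse an operand such as 'I5', 'P1[3]', or 'P0["name"]'."""
    match = _OPERAND.match(text.strip())
    if match is None:
        raise ValueError(f"not a register operand: {text!r}")
    reg_type, digits, raw_key = match.groups()
    number = int(digits)
    if number > MAX_REGISTER:
        raise ValueError(f"register number out of range: {text!r}")
    key: Optional[Union[int, str]] = None
    if raw_key is not None:
        raw_key = raw_key.strip()
        if re.fullmatch(r"-?[0-9]+", raw_key):
            key = int(raw_key)      # array/list entry number
        elif (len(raw_key) >= 2 and raw_key[0] == raw_key[-1]
              and raw_key[0] in "\"'"):
            key = raw_key[1:-1]     # hash key (quoting style is an assumption)
        else:
            raise ValueError(f"unsupported key: {raw_key!r}")
    return Operand(reg_type, number, key)
```

For instance, `parse_operand('P1[3]')` yields a PMC register 1 with the integer key 3, while `parse_operand('N100')` is rejected because it exceeds the 0-99 limit.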

Assembly Syntax
All assembly opcodes contain only ASCII lowercase letters, digits, and the underscore.

Assembler directives are prefixed with a dot. These directives are instructions for the assembler and may or may not translate to a PASM instruction.

Labels all end with a colon. They may have ASCII letters, numbers, and underscores in them.

Namespaces are noted with the .namespace directive. It takes a single parameter, the name of the namespace, in the form of a multi-dimensional key.

Constants can be declared with the .macro_const directive. It takes two parameters: the name of the constant and the value.

Subroutine names are noted with the .pcc_sub directive. It takes a single parameter, the name of the subroutine, which is added to the namespace's symbol table. Sub names may consist of any valid Unicode alphanumeric characters and the underscore. The .pcc_sub directive may take flags to indicate when the sub should be invoked. The following flags are available:

 *  :main to indicate that execution should start at the specified subroutine;
 *  :immediate or :postcomp to indicate that the sub should be run immediately after compilation;
 *  :load to indicate that the sub should be executed when its bytecode segment is loaded;
 *  :init to indicate the sub should be run when the file is run directly.
Constants don't need to be named and put in a separate section of the assembly source. The assembler will take care of putting them in the appropriate part of the generated bytecode.

Below is an overview of the grammar of a PASM file.

    pasm_file: [ pasm_line '\n' ]*
    pasm_line: pasm_instruction | constant_directive | namespace_directive
    pasm_instruction: [ [ sub_directive ]? label ]? instruction
    sub_directive: ".pcc_sub" [ sub_flag ]?
    sub_flag: ":init" | ":main" | ":load" | ":postcomp" | ":immediate" | ":anon"
    label: identifier ":"
    constant_directive: ".macro_const" identifier literal
    namespace_directive: ".namespace" "[" multi_dimensional_key "]"
    multi_dimensional_key: quoted_string [ ";" quoted_string ]*
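To make the grammar concrete, here is a rough Python sketch that classifies a single line according to the `pasm_line` alternatives. It is not how any Parrot assembler is implemented; the function name `classify_line` and the regular expressions are hypothetical. It assumes that identifiers are ASCII letters, digits, and underscores, that a quoted string is double-quoted without escapes, and that the literal and the instruction arguments are simply the rest of the line.

```python
import re

IDENT = r"[A-Za-z0-9_]+"   # assumption: ASCII-only identifiers
QUOTED = r'"[^"]*"'        # assumption: double quotes, no escape sequences
SUB_FLAGS = (":init", ":main", ":load", ":postcomp", ":immediate", ":anon")

NAMESPACE_RE = re.compile(
    r"^\.namespace\s*\[\s*(" + QUOTED + r"(?:\s*;\s*" + QUOTED + r")*)\s*\]$")
CONST_RE = re.compile(r"^\.macro_const\s+(" + IDENT + r")\s+(.+)$")
INSTR_RE = re.compile(
    r"^(?:(?P<sub>\.pcc_sub(?:\s+(?P<flag>:[a-z]+))?\s+)?"
    r"(?P<label>" + IDENT + r"):\s*)?"
    r"(?P<opcode>[a-z0-9_]+)(?P<args>(?:\s.*)?)$")


def classify_line(line: str):
    """Return a tuple describing which pasm_line alternative the line matches."""
    text = line.strip()
    match = NAMESPACE_RE.match(text)
    if match:
        parts = re.findall(QUOTED, match.group(1))
        return ("namespace", [part[1:-1] for part in parts])
    match = CONST_RE.match(text)
    if match:
        return ("constant", match.group(1), match.group(2).strip())
    match = INSTR_RE.match(text)
    if match:
        flag = match.group("flag")
        if flag is not None and flag not in SUB_FLAGS:
            raise ValueError(f"unknown sub flag: {flag}")
        return ("instruction", {
            "sub": match.group("sub") is not None,
            "flag": flag,
            "label": match.group("label"),
            "opcode": match.group("opcode"),
            "args": match.group("args").strip(),
        })
    raise ValueError(f"not a valid PASM line: {line!r}")
```

Following the grammar as written, a sub directive is only accepted in front of a label, and a label is only accepted in front of an instruction; a line consisting of a label alone is rejected by this sketch.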

Opcode List
There may be multiple (but unlisted) versions of an opcode. If an opcode takes a register that might be keyed, the keyed version of the opcode has a _k suffix. If an opcode might take multiple types of registers for a single parameter, the opcode function really has a _x suffix, where x is either P, S, I, or N, depending on whether a PMC, string, integer, or numeric register is involved. The suffix is not required (though supplying it is not an error), because the assembler can infer the information from the code.

In those cases where an opcode can take several types of registers, and more than one of the sources or destinations are of variable type, then the register is passed in extended format. An extended format register number is of the form:

    register_number | register_type

where register_type is 0x100, 0x200, 0x400, or 0x800 for PMC, string, integer, or number respectively. So N19 would be 0x813.

**Note**: Instructions tagged with a * will call a vtable function to handle the instruction if used on PMC registers.

In all cases, the letters x, y, and z refer to register numbers. The letter t refers to a generic register (P, S, I, or N). A lowercase p, s, i, or n means either a register or constant of the appropriate type (PMC, string, integer, or number).
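The extended register format is a bitwise OR of the register number and a type flag, which the following Python sketch works through. The helper names `encode_extended` and `decode_extended` are hypothetical, and the decoding masks (`0xFF` for the number, `0xF00` for the type) are an assumption that holds only because register numbers are limited to 0-99; the text does not specify how decoding is done.

```python
import re

# Type flags as given in the text: PMC, string, integer, number.
REGISTER_TYPE_BITS = {"P": 0x100, "S": 0x200, "I": 0x400, "N": 0x800}
MAX_REGISTER = 99  # PASM implementation limit (0-99)


def encode_extended(register: str) -> int:
    """Encode a register name such as 'P5' as register_number | register_type."""
    match = re.fullmatch(r"([PSIN])([0-9]+)", register)
    if match is None:
        raise ValueError(f"not a register name: {register!r}")
    number = int(match.group(2))
    if number > MAX_REGISTER:
        raise ValueError(f"register number out of range: {register!r}")
    return number | REGISTER_TYPE_BITS[match.group(1)]


def decode_extended(value: int) -> str:
    """Recover the register name from an extended-format value."""
    type_bits = value & 0xF00  # assumed mask for the type flag
    number = value & 0xFF      # assumed mask for the register number
    for prefix, bits in REGISTER_TYPE_BITS.items():
        if bits == type_bits:
            return f"{prefix}{number}"
    raise ValueError(f"unknown register type bits: {type_bits:#x}")


if __name__ == "__main__":
    for name in ("P5", "S12", "I7", "N99"):
        print(name, hex(encode_extended(name)))
```

Because each type flag occupies its own bit above the low byte, the flag and the number never overlap, so the OR can be undone with simple masking.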