Myer - Format of .myerN files
Jonathan Yavner  (2003 November 15)


Each line represents one token.  The first character on the line is the token type and indicates the line format.
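As a rough illustration of this line-oriented layout, the Python sketch below splits one line into its one-character token type and the remaining whitespace-separated fields. The function name split_token is hypothetical, not part of Myer. It assumes fields are separated by spaces or tabs, which fits the examples in this document. A maxsplit argument keeps a trailing field intact, such as a string constant that contains spaces.

```python
def split_token(line, maxsplit=-1):
    """Split one .myerN line into (type_char, fields).

    The first character is the token type; the rest of the line is
    split on runs of whitespace.  Pass maxsplit to keep a trailing
    field (such as a string constant) in one piece.
    """
    line = line.rstrip("\n")
    if not line:
        raise ValueError("empty line has no token type")
    return line[0], line[1:].split(None, maxsplit)


# Example: a FILEPUSH token from a .myer1 file.
kind, fields = split_token("> 1  foo.c")
print(kind, fields)  # > ['1', 'foo.c']
```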

Table of contents: .myer1  .myer2  .myer3  .myer4  .myer5



 Phase 1 - Parse


This is the raw output from myer_cc1 (using the -d@ option).  Line numbers follow gcc's unified preprocessor numbering, which assigns line numbers sequentially across include files.  When counting columns, a tab ("\t") counts as one column.  Tokens are emitted in the order the compiler generates them.

* text

(TOKEN_COMMENT)  The text is to be ignored.  It tells Emacs to make the tabbed columns 14 characters wide.

B #maxuid  rev

(TOKEN_BUILTIN)  Indicates the last UID reserved by gcc for built-in objects and the major.minor.patch revision for gcc.  Any UID numbered maxuid or less that's referenced in the token stream is intrinsically public.

> lineno  filename

(TOKEN_FILEPUSH)  Line lineno is an #include directive that refers to filename.  This token is also used at the beginning of the .myer1 file to record the name of the initial .c file.  Following tokens (until FILEPOP) refer to text from filename.

< lineno  physline  filename

(TOKEN_FILEPOP)  End of #include, return to filename.  Preprocessor line lineno corresponds to physical line physline in filename.
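The FILEPUSH/FILEPOP pair behaves like a stack of open files. The sketch below reflects that behavior, assuming whitespace-free filenames. The generator name current_files is hypothetical. It reports, for each token, which file is in effect once that token has been processed. As a design choice, FILESKIP ('I') tokens are left alone, since the skipped file is never read.

```python
def current_files(lines):
    """Yield (token_line, filename) for each .myer1 token.

    filename is the file in effect after the token is processed: '>'
    (FILEPUSH) pushes a file, '<' (FILEPOP) pops back to the file it
    names.  'I' (FILESKIP) leaves the stack alone because the file is
    not read again.  Assumes filenames contain no whitespace.
    """
    stack = []
    for line in lines:
        kind, fields = line[0], line[1:].split()
        if kind == ">":                      # > lineno filename
            stack.append(fields[1])
        elif kind == "<":                    # < lineno physline filename
            stack.pop()
            if stack and stack[-1] != fields[2]:
                raise ValueError(f"FILEPOP to {fields[2]} does not match stack")
        yield line, (stack[-1] if stack else None)
```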

| lineno  physline  filename

(TOKEN_FILECHG)  Line lineno is a #line directive, telling us to pretend we're on physical line physline of filename.  Later Myer phases currently ignore this token.

I lineno  filename

(TOKEN_FILESKIP)  Line lineno is an #include directive that refers to already-included file filename.  The file will not be read again, but this token will be counted as a reference.

/ begline - endline

(TOKEN_SKIP)  Lines begline through endline are being ignored due to a preprocessor conditional such as #if.

D begline - endline  ident

(TOKEN_MACRODEF)  Lines begline through endline are a #define for ident.  The tokens within the macro will be emitted at each macro invocation, using lineno values in the begline...endline range.

M begline,begcol - endline,endcol  ident

(TOKEN_MACROREF)  An invocation of macro ident starts at begline,begcol.  The entire invocation (including any parenthesized arguments) ends at endline,endcol.  Identifiers and string literals generated via stringizing or token pasting (#, ##) will have tokens with the "impossible" location endline,endcol-1.  Following tokens whose begline is within the range of the definition of ident refer to source code within the macro, although the compiler sees them as part of the invocation.
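Here is a minimal sketch of reading a MACROREF token and recognizing the "impossible" endline,endcol-1 location used for pasted or stringized tokens. The helper names parse_loc, parse_macroref, and is_pasted are hypothetical. The sketch assumes the fields are separated by whitespace exactly as in the format line above.

```python
def parse_loc(text):
    """Turn 'line,col' into an (int, int) tuple."""
    line, col = text.split(",")
    return int(line), int(col)


def parse_macroref(line):
    """Parse 'M begline,begcol - endline,endcol  ident'."""
    kind, beg, dash, end, ident = line.split()
    if kind != "M" or dash != "-":
        raise ValueError("not a MACROREF token")
    return parse_loc(beg), parse_loc(end), ident


def is_pasted(token_loc, macroref):
    """True if a token sits at endline,endcol-1, the spot reserved for
    identifiers and strings produced by # or ## in this invocation."""
    _, (end_line, end_col), _ = macroref
    return token_loc == (end_line, end_col - 1)
```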

K line,col  ident

(TOKEN_KEYWORD)  The keyword ident starts at offset col in line.  This is mostly useless, and is just here to help the reader keep his/her bearings while reading the .myer1 file.

N line,col  number
C line,col  'c'
S line,col  "string"


(TOKEN_NUMBCONST, TOKEN_CHARCONST, TOKEN_STRCONST)  A numeric, character, or string constant begins at offset col in line.

E line,col  #uid  enumconstName
F line,col  #uid  fieldName
L line,col  #uid  labelName
f line,col  #uid  functionName
s line,col  #uid  structName
v line,col  #uid  variableName
t line,col  #uid  typeName

(TOKEN_ENUMCONST, TOKEN_FIELD, TOKEN_LABEL, TOKEN_FUNCTION, TOKEN_STRUCT, TOKEN_VARIABLE, TOKEN_TYPEDEF)  A reference to an identifier of the specified type begins at line,col and has unique identifier uid.  Names for unions and enums are lumped with structs.

p line,col  #uid  parameterName

(TOKEN_PARAMETER)  A reference to parameter parameterName starts at line,col and has unique identifier uid.  It's sometimes unclear which function declaration each parameter belongs to (e.g., when declaring a function that takes a function-pointer parameter whose own parameters are also declared), but this doesn't matter for Myer, since parameters are local variables with no coupling/cohesion cost.
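All of the identifier-reference tokens above, including 'p', share the layout "type line,col #uid name". A sketch of a parser for that common layout is shown below. The names Ref, REF_KINDS, and parse_ref are hypothetical, and the sketch assumes the name field contains no whitespace.

```python
from typing import NamedTuple

# Token type letter -> kind of identifier, as listed in this document.
REF_KINDS = {
    "E": "enumconst", "F": "field", "L": "label", "f": "function",
    "s": "struct", "v": "variable", "t": "typedef", "p": "parameter",
}


class Ref(NamedTuple):
    kind: str
    line: int
    col: int
    uid: int
    name: str


def parse_ref(text):
    """Parse an identifier-reference token such as 'v 7,2  #41  counter'."""
    kind = REF_KINDS[text[0]]
    loc, uid, name = text[1:].split()
    line, col = (int(n) for n in loc.split(","))
    return Ref(kind, line, col, int(uid.lstrip("#")), name)
```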

=  #uid
d  #uid


(TOKEN_DEF,TOKEN_DECLSPOT)  Mark the preceding reference as "the spot" where object uid is defined (=) or declared (d).  The distinction between the two is minor and was added late to fix a bug: "extern int x" is a declaration, "int x = 14" is a definition.
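Because '=' and 'd' carry no location of their own, a reader has to remember the most recent reference token. The sketch below is one way to do it; the function name find_spots is hypothetical. It also makes one choice that is an assumption, not documented Myer behavior: a definition is kept in preference to a declaration of the same uid.

```python
REF_TYPES = set("EFLfsvtp")   # token types that carry a line,col reference


def find_spots(lines):
    """Map uid -> ('def' | 'decl', (line, col)) using '=' and 'd' tokens.

    Each '=' or 'd' token applies to the reference token just before it.
    ASSUMPTION: a definition is never overwritten by a later declaration.
    """
    spots = {}
    last_loc = None
    for text in lines:
        kind, fields = text[0], text[1:].split()
        if kind in REF_TYPES:
            line, col = fields[0].split(",")
            last_loc = (int(line), int(col))
        elif kind in ("=", "d"):
            if last_loc is None:
                raise ValueError("spot marker with no preceding reference")
            uid = int(fields[0].lstrip("#"))
            if kind == "=" or uid not in spots:
                spots[uid] = ("def" if kind == "=" else "decl", last_loc)
    return spots
```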

P  #uid

(TOKEN_PUBLIC)  Mark uid as a "public" object, which can be shared by name between different modules.  Non-public objects are shared only if both modules include the same header file that defines them.
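The two sources of "public" status are the BUILTIN token's maxuid and explicit 'P' tokens. A sketch combining them over a .myer1 stream is shown below. The name public_uids is hypothetical, and it only collects UIDs from token types documented above as carrying "#uid" fields. That way text inside comments or string constants is never parsed as a UID.

```python
UID_KINDS = set("EFLfsvtp=de+-")   # token types whose '#' fields are UIDs


def public_uids(lines):
    """Return the set of UIDs that count as public in a .myer1 stream.

    A UID is public if a 'P' token marks it, or if it is at most the
    maxuid from the 'B' token (built-in objects) and is referenced.
    """
    maxuid = None
    public = set()
    seen = set()
    for text in lines:
        kind, fields = text[0], text[1:].split()
        if kind == "B":
            maxuid = int(fields[0].lstrip("#"))
        elif kind == "P":
            public.add(int(fields[0].lstrip("#")))
        elif kind in UID_KINDS:
            seen.update(int(f[1:]) for f in fields if f.startswith("#"))
    if maxuid is not None:
        public.update(u for u in seen if u <= maxuid)
    return public
```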

e  #uid #olduid

(TOKEN_EQUATEUID) Treat uid as synonymous with olduid.  All references to uid are changed to be references to olduid.  For duplicate declarations, the new declaration is elaborated as uid, then equated to olduid for the previous declaration.
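Applying EQUATEUID amounts to a lookup table from uid to olduid. The sketch below follows the links repeatedly, in case a later duplicate is equated to a UID that was itself equated earlier; whether Myer ever produces such chains is an assumption here. The names read_equates and resolve are hypothetical.

```python
def read_equates(lines):
    """Collect the mapping uid -> olduid from every 'e  #uid #olduid' token."""
    equates = {}
    for text in lines:
        if text.startswith("e"):
            new, old = (int(f.lstrip("#")) for f in text[1:].split())
            equates[new] = old
    return equates


def resolve(uid, equates):
    """Follow equate links until reaching a UID that is not equated."""
    seen = set()
    while uid in equates:
        if uid in seen:
            raise ValueError(f"cycle in equated UIDs at #{uid}")
        seen.add(uid)
        uid = equates[uid]
    return uid
```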

+ line,col #uid funcName +++++++++++++++++++++++++

(TOKEN_FUNCSTART)  The open-brace for the body of function funcName is at line,col.  The string of plus signs is just to make this token easier to find in the .myer1 files.  Any objects defined after this token and before the FUNCEND are function-local unless marked public.  If the function was pre-declared before the body, the uid will be a duplicate-UID rather than the original UID; this should be cleaned up somehow.

- line,col #uid funcName -------------------------

(TOKEN_FUNCEND)  The close-brace for the body of function funcName is at line,col.  The string of minus signs is just to make this token easier to find in the .myer1 files.  If the function was pre-declared before the body, the uid will be the original-UID and not the same as the one in the FUNCSTART token.

; line,col

(TOKEN_ENDSTMT)  A statement ends at line,col. Often there is a semicolon or close-brace at that spot.  All MACROREF tokens with lower line,col values have now gone out of scope.
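One way to track which MACROREF tokens are still in scope is sketched below. The generator name active_macrorefs is hypothetical. The sketch assumes that "lower line,col values" refers to each MACROREF's starting position begline,begcol, and it compares positions as (line, col) tuples.

```python
def active_macrorefs(lines):
    """Yield (token_line, [macro names still in scope]) for each token.

    An 'M' token opens a scope; a ';' token at line,col closes every
    MACROREF whose starting line,col is lower (ASSUMED interpretation).
    """
    active = []
    for text in lines:
        kind, fields = text[0], text[1:].split()
        if kind == "M":                      # M beg - end ident
            beg = tuple(int(n) for n in fields[0].split(","))
            active.append((beg, fields[-1]))
        elif kind == ";":                    # ; line,col
            end = tuple(int(n) for n in fields[0].split(","))
            active = [m for m in active if m[0] >= end]
        yield text, [name for _, name in active]
```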

q

(TOKEN_ENDINPUT)  The last token in a complete .myer1 file, if compilation was successful.



 Phase 2 - Token


Cleaned-up token stream: a master table of UIDs, tokens sorted by line,col with duplicates removed, etc.  Line numbers start at 1 for each input file.

The file is in three sections:  the ':' tokens for UIDs assigned by the compiler, the ':' tokens for negative UIDs assigned by Myer for macros and literal constants, and then the token streams (each beginning with '@') for the input files.

* text

(TOKEN_COMMENT) The text is to be ignored. It specifies that this is a .myer2 file, the date and time when it was created, and that Emacs should use 12-character tab-stops.

: #uid type @fileno lineno name
: #uid type name


(TOKEN_DECL) Declares object uid, with name and type. If fileno and lineno are present, they indicate the defining spot for this UID. (The exact spot is assumed to be the first reference to uid on that line.) If fileno and lineno are absent, the object is undefined. The type is two characters. The first character is similar to the token types from phase 1:
v  variable
L  label for goto
f  function
E  enum constant
p  parameter
D  macro
F  field in struct or union
N  numeric constant
s  struct, union, or enum
C  character constant
t  typedef
S  string constant

The second type character is 'P' if the object is public, or 'f' if it is function-local, or ' ' if module-global.
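The two-character type code decodes with two small lookup tables, as sketched below. The names KINDS, SCOPES, and describe_type are hypothetical. The sketch decodes only the type field, because the exact column separators of a full ':' line (spaces or tabs) are not specified here.

```python
KINDS = {
    "v": "variable", "L": "label for goto", "f": "function",
    "E": "enum constant", "p": "parameter", "D": "macro",
    "F": "field in struct or union", "N": "numeric constant",
    "s": "struct, union, or enum", "C": "character constant",
    "t": "typedef", "S": "string constant",
}
SCOPES = {"P": "public", "f": "function-local", " ": "module-global"}


def describe_type(code):
    """Decode a two-character .myer2 type such as 'vP' or 's '."""
    if len(code) != 2:
        raise ValueError(f"type code must be two characters: {code!r}")
    return KINDS[code[0]], SCOPES[code[1]]
```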

@fileno filename

(TOKEN_NEWFILE) Begin the token stream for filename, whose associated ID is fileno. The stream continues until the next '@' token or the end of the .myer2 file.

< @fileno begline,0 endline,0

(SPOT_IncludeFile) Line begline is an #include directive that refers to fileno. Generally, endline=begline+1 (although theoretically the directive could occupy multiple lines via backslash continuation). The zeroes are just to maintain a consistent format for most .myer2 tokens.

/ 0 begline,0 endline,0

(SPOT_Skip) Lines begline through endline-1 have been skipped due to a conditional-compilation directive such as #if. The zeroes are just to maintain the columnar format.

D #uid begline,0 endline,0

(SPOT_MacroDef) The definition for macro uid occupies lines begline through endline-1. Following tokens whose begline is within this range are sited within the macro.

M #uid begline,begcol endline,endcol

(SPOT_MacroRef) An invocation of macro uid starts at begline,begcol and ends just before endline,endcol. Following tokens for the macro argument(s) will have positions within this range. Identifiers and string literals generated via token-pasting will show the same begline,begcol and endline,endcol as the macro invocation (i.e., they are "co-extensive" with the macro).

f #uid begline,begcol endline,endcol

(SPOT_FunDef) The body for function uid has its open-brace at begline,begcol and its close-brace at endline,endcol. Following tokens within this range are inside the function body.

C #uid begline,begcol endline,endcol

(SPOT_ConstRef) A reference to literal constant uid (numeric, character, or string) starts at begline,begcol and ends just before endline,endcol.

= #uid begline,begcol endline,endcol
d #uid begline,begcol endline,endcol
r #uid begline,begcol endline,endcol

(SPOT_Def,SPOT_Decl,SPOT_Ref) A definition, declaration, or reference for object uid. The object's identifier starts at begline,begcol and ends just before endline,endcol.
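Most .myer2 spots describe a half-open range [begline,begcol, endline,endcol). The function-body spot is the exception: its close-brace sits exactly at endline,endcol. A sketch of a parser and a containment test follows. The names Spot, parse_spot, and contains are hypothetical. The sketch treats the '@fileno' of an include spot and the leading 0 of a skip spot as the uid field.

```python
from typing import NamedTuple


class Spot(NamedTuple):
    kind: str     # '<', '/', 'D', 'M', 'f', 'C', '=', 'd', or 'r'
    uid: int      # fileno for '<', 0 for '/'
    beg: tuple    # (begline, begcol)
    end: tuple    # (endline, endcol): exclusive, except for 'f'


def to_pos(text):
    """Turn 'line,col' into an (int, int) tuple."""
    return tuple(int(n) for n in text.split(","))


def parse_spot(line):
    """Parse 'kind #uid begline,begcol endline,endcol' (uid may be '@n' or '0')."""
    kind, uid, beg, end = line.split()
    return Spot(kind, int(uid.lstrip("#@")), to_pos(beg), to_pos(end))


def contains(outer, pos):
    """True if pos (line, col) falls inside the spot's range."""
    if outer.kind == "f":                    # close-brace is AT endline,endcol
        return outer.beg <= pos <= outer.end
    return outer.beg <= pos < outer.end
```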



 Phase 3 - Merge


Combined token-streams for all compilation units.

The file format is the same as for .myer2, with these additions:

#Wanted

Precedes the '@' token for files that were mentioned on the command-line.  Mentioning .h files on the command-line just adds this #Wanted token.

#Inconsistent contents

Precedes the '@' token for files that could not be successfully merged.  The merge-failure point can be detected because the line numbers suddenly jump backward (the unmergeable tokens are simply appended to the previous copy of the stream).
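Finding that failure point is a single scan over the begline values of one file's stream. A minimal sketch follows; the function name first_backward_jump is hypothetical, and it assumes the caller has already extracted begline from each token in stream order.

```python
def first_backward_jump(beglines):
    """Return the index of the first token whose begline is lower than
    the previous token's, or None if the stream never goes backward."""
    for i in range(1, len(beglines)):
        if beglines[i] < beglines[i - 1]:
            return i
    return None
```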



 Phase 4 - Sum


Annotated token-stream with sums of various kinds that the coupling/cohesion formulas will use.  Of very little interest unless you're debugging the formulas!

The file format consists of three parts.  The first is the same as for .myer3, with these additions:

@fileno  filename

After the actual file-streams, there are six additional '@' tokens for fake files whose names begin with backslashes: "\f-special" for functions referenced via extern and not through a common include-file, "\v-special" for variables referenced that way.  There's also "\t-special" for built-in types, and also "\C-special", "\S-special", and "\N-special" which are the defining modules for literal constants.

& numrefs

Follows each '@' token to indicate the number of #include references to this file from other files.

%

Indicates the end of the first part of a .myer4 file.


The second part of the .myer4 file, in a brilliant burst of creativity, reuses the same token-type codes but with completely different purposes and syntax!

@myfileno numinclude numfunc

File myfileno contains numinclude #include directives and numfunc function bodies.  There is one of these '@' tokens for each "wanted" file.  Next come numinclude+7 '<' tokens and then numfunc+1 'f' tokens, each followed by its own 'r' tokens.

< @fileno numuid f
< @fileno numuid v
< @fileno numuid t
< @fileno numuid C
< @fileno numuid S
< @fileno numuid N
< @fileno numuid

The current file (myfileno) refers to numuid objects from file fileno. Next will follow numuid 'r' tokens.  After an '@' token, the first six file-refs always refer to the six special files, even when numuid=0.  The type letters serve little purpose except to orient the human reader as to which special file is which.  The seventh file-ref is for references to locally-defined objects; it has fileno = myfileno.  The remaining numinclude file-refs are for the included files.

f #0: uidcounts
f #funuid: uidcounts


Counts of UIDs that function funuid refers to.  There are numinclude+7 numbers in the uidcounts set, giving reference-counts for UIDs from each special file, then the locally-defined UIDs, and then those from each included file.  The first 'f' token in a file has "#0" and gives the reference-counts for the areas of myfileno that are outside of function bodies.  Add up all the numbers in uidcounts to find out how many 'r' tokens will follow.

r #uid numrefs

There are numrefs references to uid.  If this token follows a '<' file-ref, it indicates that fileno is considered to be the defining file for this uid and also gives the total number of references to uid from myfileno.  If it follows an 'f' token, it gives the number of references from within funuid's body.
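The counting rules for part two can be checked mechanically, as sketched below. The names check_file_block and skip_refs are hypothetical. Two further assumptions apply: the uidcounts after "f #funuid:" are whitespace-separated integers, and the fileno values 90 to 95 used for the special files in the example are invented. The sketch only verifies token counts, not the reference totals.

```python
def skip_refs(lines, i, n):
    """Check that lines[i:i+n] are all 'r' tokens and return i + n."""
    if i + n > len(lines):
        raise ValueError(f"expected {n} 'r' tokens starting at line {i}")
    for j in range(i, i + n):
        if not lines[j].startswith("r"):
            raise ValueError(f"line {j}: expected an 'r' token")
    return i + n


def check_file_block(lines):
    """Validate the counts in one part-2 block of a .myer4 file.

    lines[0] must be the '@myfileno numinclude numfunc' token.
    Returns the number of lines the block occupies.
    """
    head = lines[0][1:].split()              # drop '@', keep the numbers
    numinclude, numfunc = int(head[1]), int(head[2])
    i = 1
    for _ in range(numinclude + 7):          # 6 special + local + includes
        fields = lines[i].split()            # '<', '@fileno', numuid[, letter]
        if fields[0] != "<":
            raise ValueError(f"line {i}: expected a '<' file-ref")
        i = skip_refs(lines, i + 1, int(fields[2]))
    for _ in range(numfunc + 1):             # '#0' first, then one per function
        tag, counts = lines[i].split(":", 1) # ASSUMED: space-separated counts
        if not tag.startswith("f"):
            raise ValueError(f"line {i}: expected an 'f' token")
        uidcounts = [int(n) for n in counts.split()]
        if len(uidcounts) != numinclude + 7:
            raise ValueError(f"line {i}: expected {numinclude + 7} counts")
        i = skip_refs(lines, i + 1, sum(uidcounts))
    return i


# Hypothetical block: no #includes, one function body (uid 40).
example = [
    "@1 0 1",
    "< @90 0 f", "< @91 0 v", "< @92 1 t", "r #5 3",
    "< @93 0 C", "< @94 0 S", "< @95 0 N",
    "< @1 1", "r #40 1",
    "f #0: 0 0 1 0 0 0 1", "r #5 1", "r #40 1",
    "f #40: 0 0 1 0 0 0 0", "r #5 2",
]
print(check_file_block(example))  # 15
```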


The third part of the .myer4 file consists entirely of ':' tokens:

: #uid  numfiles  numrefs

There are numrefs total references to object uid, from numfiles different files.



 Phase 5 - Calc


Token stream, annotated with marginal coupling/cohesion costs.  Mainly of interest for debugging the coupling/cohesion formulas and the HTML-generation code.  Since Myer currently cannot read these .myer5 files, it is unclear whether they actually contain all the information needed to generate the HTML output.  This format is difficult for humans to read because most tokens don't have names or UIDs.

The marginal costs for cohesion range from 0.0 (no effect on the cohesion metric) to 1.0 (maximal effect).  The marginal costs for coupling either range from 0.0 (no effect on the coupling metric) to 1.0 (maximal effect) or are one of these special values: -1 = noncompiled code due to #if; -2 = function-local object (hence no coupling, and cohesion = 0.0); -3 = module-global object (hence no coupling, but cohesion is valid).
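A reader of .myer5 files has to check for the special coupling values before treating a pair as two fractions. A sketch of that check is shown below. The names SPECIAL_COUPLING and describe_costs are hypothetical, and the human-readable strings simply paraphrase the list above.

```python
SPECIAL_COUPLING = {
    -1: "noncompiled code (#if)",
    -2: "function-local object (no coupling, cohesion 0.0)",
    -3: "module-global object (no coupling, cohesion valid)",
}


def describe_costs(coupling, cohesion):
    """Interpret one coupling,cohesion pair from a .myer5 token."""
    if coupling in SPECIAL_COUPLING:        # -1.0 matches the int key -1
        return SPECIAL_COUPLING[coupling]
    if not (0.0 <= coupling <= 1.0 and 0.0 <= cohesion <= 1.0):
        raise ValueError("marginal costs must lie in 0.0..1.0")
    return f"coupling {coupling:.2f}, cohesion {cohesion:.2f}"


# Example: parse the 'coupling,cohesion' field of a token.
print(describe_costs(*map(float, "0.25,0.5".split(","))))
```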

* text

(TOKEN_COMMENT) The text is to be ignored. It specifies that this is a .myer5 file, the date and time when it was created, and that Emacs should use 12-character tab-stops.

@ filename

Begin the token-stream for filename. The stream continues until the next '@' token or the end of the .myer5 file.

< begline,0 endline,0  coupling,cohesion

An #include directive.

/ begline,0 endline,0  -1,0

Code that has been conditionally excluded from compilation due to #if.

D begline,0 endline,0  coupling,cohesion  ident

A #define directive.  The coupling/cohesion color will be displayed on the name of the macro, ident; the begcol and endcol of that name are not recorded.

f begline,begcol endline,endcol  0,0

A function body.  No marginal-cost formulas that apply to function bodies have been developed yet.

begline,begcol endline,endcol  coupling,cohesion
begline,begcol endline,endcol  coupling,cohesion
r begline,begcol endline,endcol  coupling,cohesion

A macro invocation, literal constant, or identifier reference.