Custom languages in semantic version control

Tuesday, September 08, 2015 Miguel semantic 5 Comments

As we continue down our path toward semantic version control, we are well aware of the huge number of programming languages out there. Even though we natively support popular languages such as C#, Java and C, we can't expect to cover that ever-growing list. But we don't want the Plastic SCM community to think we've forgotten about them, either.

This is why we've allowed external language parsers to be plugged into Plastic since Plastic SCM release 5.4.16.689. Now developers can collaborate with the community to implement custom parsers, which means support for virtually any language! There are, however, some rules that all custom parsers must follow in order to communicate successfully with the Plastic SCM semantic version control engine.

Take a look at the outcome:

Wouldn't you love to have this kind of information right on your version control GUI? Keep reading, then!

How to implement your custom language parser

First of all, the parser binary must accept two command-line arguments:

customparser.exe shell <flag-file>

The first argument will always be shell, which tells your parser to wait indefinitely for input (shell mode).

The second argument contains the path to a flag file. The parser will need to create this file once it's ready to start parsing. It's useful to delay parsing requests until the parser has completed its initialization routines (if there are any).

At this point, the parser should be waiting for input lines. Plastic SCM will write pairs of lines to the parser's standard input. The first line of each pair contains the input path, pointing to the source code file retrieved from Plastic. The second line contains the output path, where Plastic expects the parsed results to be written. Finally, the parser should exit when it receives a line containing just the keyword "end".

The parser standard input will typically look like this:

C:\Users\Developer\AppData\Local\Temp\mysourcecode.lang
C:\Users\Developer\AppData\Local\Temp\parsed_mysourcecode.yaml
C:\Users\Developer\AppData\Local\Temp\infiniteloop.lang
C:\Users\Developer\AppData\Local\Temp\parsed_infiniteloop.yaml
end

After receiving each pair of lines, the parser must write "OK" to its standard output if the parse operation succeeded, or "KO" if it failed.
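The protocol described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the names run_shell and parse_file are hypothetical, the stub writes a bare file node instead of a real tree, and it assumes that each "OK"/"KO" reply goes on its own line.

```python
import sys
from pathlib import Path


def parse_file(source_path: str, output_path: str) -> None:
    """Hypothetical stub: a real parser would build the full node tree here."""
    # Reading the source raises if the file is missing, which becomes a "KO".
    Path(source_path).read_text(encoding="utf-8")
    Path(output_path).write_text(
        "---\n"
        "type : file\n"
        f"name : {source_path}\n"
        "footerSpan : [0, -1]\n"
        "parsingErrorsDetected : false\n",
        encoding="utf-8",
    )


def run_shell(flag_file: str, stdin=sys.stdin, stdout=sys.stdout, parse=parse_file) -> None:
    """Create the flag file, then answer input/output path pairs until "end"."""
    Path(flag_file).touch()  # tells the caller that initialization is finished
    while True:
        source_line = stdin.readline()
        if not source_line or source_line.strip() == "end":
            break
        output_line = stdin.readline()
        if not output_line:
            break  # input closed in the middle of a pair
        try:
            parse(source_line.strip(), output_line.strip())
            reply = "OK"
        except Exception:
            reply = "KO"
        # Assumption: one reply per line, flushed immediately so the caller sees it.
        stdout.write(reply + "\n")
        stdout.flush()


# Only runs when started as: python customparser.py shell <flag-file>
if __name__ == "__main__" and len(sys.argv) == 3 and sys.argv[1] == "shell":
    run_shell(sys.argv[2])
```

Flushing after every reply matters here: the parser stays alive between requests, so a buffered "OK" would never reach the other side.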

Parser output format

The parsing results are expected to be written in YAML. As Míryam, our semantic expert engineer, explains in our forum, we chose YAML over other options (plain text, JSON, XML) for three main reasons:

YAML parsers are available for practically every language.
YAML is a superset of JSON, so anyone who prefers JSON may write their results in that format and Plastic will still be able to understand it!
It is human-readable.

All YAML output files will be encoded using UTF-8 and they can contain 3 types of data structures: file, container and terminal node.

File

It is the root node, unique and required.

Fields:

type - file
name - path of the file
locationSpan - row and column where the file starts and ends (optional)
footerSpan - start and end chars where the footer of the file starts and ends ([0, -1] if there is none, as in the sample below)
parsingErrorsDetected - Boolean, whether or not the file contains parsing errors
children - set of containers and/or terminal nodes inside the file. If there aren't any, this field shouldn't be specified.
parsingError - set of parsing errors (optional, see description below)

Container

Fields:

type - relevant, generic name of the container in the current programming language
name - actual name of the container
locationSpan - row and column where the container starts and ends (optional)
headerSpan - start and end chars where the header of the container starts and ends
footerSpan - start and end chars where the footer of the container starts and ends. This field should be set to [0, -1] if the container has no footer
children - set of containers and/or terminal nodes present inside the current container. If there aren’t any, this field shouldn’t be specified.

Terminal node

Fields:

type - relevant, generic name of the node in the current programming language
name - actual name of the node
locationSpan - row and column where the node starts and ends (optional)
span - start and end char where the node starts and ends

Parsing error

Fields:

location - row and column where the error was detected
message - error message
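To make the four structures more concrete, here is a rough Python sketch that builds them as plain dictionaries and writes them out with the standard json module, relying on the point made above that JSON output is accepted too. The helper names (terminal, container, file_node, write_tree) are hypothetical, and representing an error location as a [row, column] list is an assumption modelled on locationSpan.

```python
import json


def terminal(type_, name, span, location_span=None):
    """Terminal node: type, name and the inclusive char span it covers."""
    node = {"type": type_, "name": name, "span": list(span)}
    if location_span is not None:
        node["locationSpan"] = location_span
    return node


def container(type_, name, header_span, footer_span=(0, -1), children=(), location_span=None):
    """Container node; footerSpan defaults to [0, -1], meaning "no footer"."""
    node = {
        "type": type_,
        "name": name,
        "headerSpan": list(header_span),
        "footerSpan": list(footer_span),
    }
    if location_span is not None:
        node["locationSpan"] = location_span
    if children:  # the field is left out entirely when there are no children
        node["children"] = list(children)
    return node


def file_node(path, children=(), errors=(), footer_span=(0, -1)):
    """Root node; errors is a sequence of ((row, column), message) pairs."""
    node = {
        "type": "file",
        "name": path,
        "footerSpan": list(footer_span),
        "parsingErrorsDetected": bool(errors),
    }
    if children:
        node["children"] = list(children)
    if errors:
        # Assumption: location is written as a [row, column] list.
        node["parsingError"] = [
            {"location": list(location), "message": message}
            for location, message in errors
        ]
    return node


def write_tree(tree, output_path):
    """Write the tree as UTF-8 JSON, which is also valid YAML."""
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(tree, handle, indent=2, ensure_ascii=False)
```

For example, file_node("/path/to/file", children=[terminal("unit", "Unit1", (0, 12))]) produces a root node with a single terminal child and no parsingError field.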

Sample output file

For instance, if we parsed the following Delphi contents:

unit Unit1;

interface

type
  TTest = class(TObject)
    procedure Test;
  end;

implementation

{ TTest }

procedure TTest.Test;
begin
  //
end;

end.

We would obtain the following YAML result (using \r\n as the line separators):

 
---
type : file
name : /path/to/file
locationSpan : {start: [1,0], end: [19,4]}
footerSpan : [0, -1]
parsingErrorsDetected : false
children :

  - type : unit
    name : Unit1
    locationSpan : {start: [1,0], end: [1,13]}
    span : [0, 12]

  - type : interface
    name : interface
    locationSpan : {start: [2,0], end: [9,0]}
    headerSpan : [13, 25]
    footerSpan : [0, -1]
    children :

      - type : type
        name : type
        locationSpan : {start: [4,0], end: [9,0]}
        headerSpan : [26, 33]
        footerSpan : [0, -1]
        children :

          - type : class
            name : TTest
            locationSpan : {start: [6,0], end: [9,0]}
            headerSpan : [34, 59]
            footerSpan : [81, 88]
            children :

              - type : procedure declaration
                name : Test
                locationSpan : {start: [7,0], end: [7,21]}
                span : [60, 80]

  - type : implementation
    name : implementation
    locationSpan : {start: [9,0], end: [19,4]}
    headerSpan : [89, 106]
    footerSpan : [164, 169]
    children :

      - type : procedure 
        name : TTest.Test
        locationSpan : {start: [11,0], end: [18,0]}
        span : [107, 163]
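The char spans are the easiest part to get wrong, so it's worth recomputing them. The Python sketch below rebuilds the Delphi sample with \r\n separators and derives each span from line numbers. It assumes a blank line right after "unit Unit1;" (which is what the offsets and the 19-line locationSpan imply) and no trailing line break after the final "end."; the names line_span and CHECKS are just illustrative.

```python
# The Delphi sample, one entry per line. The blank second line is an
# assumption inferred from the offsets in the YAML sample.
SOURCE_LINES = [
    "unit Unit1;",
    "",
    "interface",
    "",
    "type",
    "  TTest = class(TObject)",
    "    procedure Test;",
    "  end;",
    "",
    "implementation",
    "",
    "{ TTest }",
    "",
    "procedure TTest.Test;",
    "begin",
    "  //",
    "end;",
    "",
    "end.",
]
NEWLINE = "\r\n"
SOURCE = NEWLINE.join(SOURCE_LINES)  # no line break after the final "end."


def line_span(first, last, with_newline=True):
    """Zero-based, inclusive [start, end] offsets of 1-based lines first..last."""
    start = sum(len(line) + len(NEWLINE) for line in SOURCE_LINES[:first - 1])
    end = sum(len(line) + len(NEWLINE) for line in SOURCE_LINES[:last]) - 1
    if not with_newline:
        end -= len(NEWLINE)  # the span stops at the last visible character
    return [start, end]


# (label, span computed from line numbers, span taken from the YAML sample)
CHECKS = [
    ("unit span", line_span(1, 1), [0, 12]),
    ("interface headerSpan", line_span(2, 3), [13, 25]),
    ("type headerSpan", line_span(4, 5), [26, 33]),
    ("class headerSpan", line_span(6, 6), [34, 59]),
    ("procedure declaration span", line_span(7, 7), [60, 80]),
    ("class footerSpan", line_span(8, 8), [81, 88]),
    ("implementation headerSpan", line_span(9, 10), [89, 106]),
    ("procedure span", line_span(11, 17), [107, 163]),
    ("implementation footerSpan", line_span(18, 19, with_newline=False), [164, 169]),
]

for label, computed, expected in CHECKS:
    status = "matches" if computed == expected else "differs"
    print(f"{label}: computed {computed}, sample {expected} ({status})")
```

Two things stand out in this sample: spans are inclusive at both ends and count the \r\n characters, and each node absorbs the blank lines that precede it, so that the spans cover the whole file without gaps.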

Connecting your parser to Plastic

Once everything is in place, you just need to tell Plastic where to find the parser executable and which file extensions should be matched.

To do so, edit or create a file called externalparsers.conf in your local configuration directory (C:\Users\<your-username>\AppData\Local\plastic4). This file contains file extensions and their associated parser executables. This is a valid example:

.pas=C:\Users\Developer\SemanticMergeDelphi\pas2yaml.exe
.js=C:\Program Files\JavaScriptMagicParser\bin\js2yaml.exe
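If you want to double-check which parser a given file would be routed to, the format is simple enough to read with a few lines of Python. Treat this as a sketch of the idea rather than a description of how Plastic reads the file: load_parser_map and parser_for are hypothetical helpers, the .lang entry is made up for illustration, and matching extensions case-insensitively is an assumption.

```python
from pathlib import PureWindowsPath

SAMPLE_CONF = (
    r".pas=C:\Users\Developer\SemanticMergeDelphi\pas2yaml.exe" "\n"
    r".lang=C:\parsers\lang2yaml.exe" "\n"  # hypothetical entry
)


def load_parser_map(conf_text):
    """Map each extension (lower-cased) to its parser executable path."""
    mapping = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        extension, executable = line.split("=", 1)
        mapping[extension.strip().lower()] = executable.strip()
    return mapping


def parser_for(path, mapping):
    """Return the configured parser for path, or None if no extension matches."""
    # Assumption: extensions are compared case-insensitively.
    return mapping.get(PureWindowsPath(path).suffix.lower())


parsers = load_parser_map(SAMPLE_CONF)
print(parser_for(r"C:\wkspaces\demo\src\Unit1.pas", parsers))
```

Splitting on the first "=" only keeps paths intact even if they happen to contain an equals sign further along.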

Once Plastic is restarted, any controlled file with one of those extensions will be processed by the semantic engine. The related semantic version control information will be displayed on the diff views like any C#/Java/C controlled file. You've unleashed the power of Plastic!

Example: Delphi parser

To help you get familiar with this external parser system, we're going to guide you through a real-world scenario: using an actual Delphi language parser. It was developed by André Mussche and Jeroen Vandezande (great job, guys!) as a result of this forum thread: http://www.plasticscm.net/index.php?/topic/1857-delphi-parser-development/. The parser was initially written for SemanticMerge, but since Plastic uses the same inner mechanism it's a perfect match for our external parser test. You can download the parser from GitHub: https://github.com/andremussche/SemanticMergeDelphi.

We'll assume the parser to be located at C:\Users\Miguel\SemanticMergeDelphi, so the parser executable path will be C:\Users\Miguel\SemanticMergeDelphi\pas2yaml.exe.

Now I'll edit the externalparsers.conf file (C:\Users\Miguel\AppData\Local\plastic4\externalparsers.conf) and add the following line:

.pas=C:\Users\Miguel\SemanticMergeDelphi\pas2yaml.exe

After that, open the Plastic GUI. We'll create a new workspace called semanticdelphi at C:\Users\Miguel\wkspaces\semanticdelphi, pointing to a new repository called semanticdelphi as well.

Clicking on the "OK" button will create the workspace and the Plastic GUI will open up afterwards. We'll use GitSync to retrieve the contents of a sample GitHub repository for our tests: https://github.com/fabriciocolombo/delphi-rest-client-api.

Once the replication is complete we'll go back to the workspace explorer and we'll update our workspace to download the source files.

Let's create some differences to test our new parser! I'll edit /src/HttpConnectionWinHttp.pas to reorder some procedure definitions and change their implementations. Let's have a look at the embedded diff view on the pending changes view:

Check that out! The mighty Plastic semantic engine read the output of the Delphi parser and detected which methods were renamed, which ones were moved and which ones had their contents modified. For instance, THttpConnectionWinHttp.Get() was moved up 3 positions and its body was modified to include a new comment. You can click on the downward arrow next to the "C" (changed) symbol at the method signature (or control + click) to display just the differences in the method code:

But we're not going to stop right here! Let's see what Plastic is capable of. We're going to move some methods to a new file and add it to version control. Plastic is designed to analyze a set of files and extract refactors across all of them. We published this feature a few days ago, and you should definitely see it for yourself!

Back to our sample: I'm taking the HTTP method procedures out of /src/HttpConnectionWinHttp.pas and placing them into a new file, /src/http/HttpMethods.pas. Once the changes are saved to disk, we'll just add the new directory and file to version control. This is what the pending changes view looks like now:

As expected, the moved methods appear as removed. We'll check in these changes and open the new changeset differences right after. Let's click on "Analyze refactors" and see what happens:

It worked! You can see how Plastic found out what we had done and properly arranged all methods moved across files.

Now it's your turn! Find a suitable parser for your language of choice and let Plastic understand your code! Remember to stop by our other blog posts about semantic version control and tracking refactored code across files.

And don't forget to have some semantic fun!

Miguel González

Prior to becoming a Plastic hard-core developer, I worked in a consulting firm in France, where I also finished my Computing Engineering master's degree.
I'm a Linux enthusiast (I was the one developing the Plastic SCM Linux packages), heavy-metal guitar player in a band, LP collector, YouTube expert and talented Plastic hacker.
You can find me at @TheRealMig_El.

5 comments:

Pete, September 9, 2015 at 12:06 PM
Does this mean you're not going to finish the C++ parser?

Göran W., September 16, 2015 at 11:35 PM
Cool! I guess we would have to configure additional include paths for the parser, then. But I see the problem when you have to load includes from each of the involved changesets in a merge... Thanks for the comment, I was clearly too naive! :P

Branched Code

Thoughts on version control, software development, branching and merging from the Plastic dev team
