Who we are

We are the developers of Plastic SCM, a full version control stack (not a Git variant). We work on the strongest branching and merging you can find, and a core that doesn't cringe with huge binaries and repos. We also develop the GUIs, mergetools and everything needed to give you the full version control stack.

If you want to give it a try, download it from here.

We also code SemanticMerge, and the gmaster Git client.

Plastic repo spelunking with PowerBI

Monday, September 05, 2016 Modesto San Juan 0 Comments

Code repositories can give you lots of useful information about how you work with your code. This post will explain how to extract part of this information and how to visualize it using powerful tools such as Power BI.

Prerequisites

Getting the information

The Plastic SCM command line tool allows you to export all the information you need from the commits. Unlike Git, Plastic SCM allows you to export logs in XML format so you can parse from Power BI without having to develop a parser.

This command will export all logs since the first changeset in xml format for all branches:

cm log --from=cs:0 --xml=commits.xml --allbranches

For this example, we will use Roslyn's repository with more than 17k commits at this time. This is a sample of the content of the commits.xml file:

<?xml version="1.0" encoding="Windows-1252"?>
<LogList>
  <Changeset>
    <ObjId>1660352</ObjId>
    <ChangesetId>18439</ChangesetId>
    <Branch>/main</Branch>
    <Comment>Merge pull request #12979 from CyrusNajmabadi/farCleanup Small cleanup of streaming FAR.</Comment>
    <Owner>GitHub</Owner>
    <GUID>0c1f6908-20fb-4e0a-817e-d0730f4edb4c</GUID>
    <Changes>
      <Item>
        <Branch>/main</Branch>
        <RevNo>18435</RevNo>
        <Owner>CyrusNajmabadi</Owner>
        <RevId>1660045</RevId>
        <ParentRevId>1626987</ParentRevId>
        <SrcCmPath>/src/Features/Core/Portable/FindReferences/DefinitionItem.DocumentLocationDefinitionItem.cs</SrcCmPath>
        <SrcParentItemId>1619775</SrcParentItemId>
        <DstCmPath>/src/Features/Core/Portable/FindReferences/DefinitionItem.DocumentLocationDefinitionItem.cs</DstCmPath>
        <DstParentItemId>1619775</DstParentItemId>
        <Date>2016-08-07T14:23:23.0000000+01:00</Date>
        <Type>Changed</Type>
      </Item>
      ...
    </Changes>
    <Date>2016-08-08T03:52:21.0000000+01:00</Date>
  </Changeset>
  ...
</LogList>

As you can see, every changeset contains its date, branch name, author, comments and every file added, deleted, changed or moved.

Getting some visuals with Power BI

With all the information in the XML format, you can use Power BI to easily obtain some visuals. Power BI allows you to import data in XML format and display it in multiple formats. To do this you need to:

  1. Click the Get Data button.
  2. Choose the XML format.
  3. Select the file.

    Once you have selected the file, Power BI Desktop will show you the Navigator dialog with a preview of the data. If it is a very large XML file, this will take a while.

  4. Check Changeset and load the data.

Our first query with Power BI

Now that you have imported the data, you can do a sample query. To create a query:

  1. Open the query editor pressing the Edit Queries button in the Ribbon.
  2. Rename the query from the left pane and name it as Authors.

    Since we want to know who are the top 10 committers, the next step will be group by Author and get the count. To do this, you must:

  3. Click on the Group By button.
  4. Select Owner.
    • Notice that the new column name has changed to Count. You will use it later.
    • After grouping by Author, you will get two columns: author and commits per author, so you can sort descending by the second column and get the top 10 authors.

    After these steps, you have your first query.

  5. Click the Close & Apply button to go back to the Power BI Desktop main window and add a visualization.

Adding a visualization is very easy; you only need to:

  1. Select the Pie Chart visualization.
  2. Drag Owner to Details and Count to Values to get something like this:

This sample chart has something like a bug. Here there is duplicated user CyrusNajmadabi. This is because we have two owners CyrusNajmadabi and Cyrus Najmadabi. In this case, the difference is only one space, but in other cases there can be more differences.

To avoid this kind of data errors, there are more than one solution. In this case, we are going to choose an easy one from Power BI:

  1. Open the Query Editor and select the Owner column.
  2. Select Replace Values from the ribbon and enter the name without spaces in the Value To Find text box and the fixed name in the Replace With text box.
  3. Select Ok to apply.

Now you have two rows with the same name, this is because in the Applied Steps panel, Replace Value is applied after Grouped Rows. You only need to:

  1. Drag and drop the Replace Value step before Applied Steps.
  2. Press Close & Apply in the ribbon.

Now we have a pie chart with only one Cyrus Najmadabi as expected.

Querying commits with more files

Let's make another query to get commits with more files changed. To do this, you need to open the Query Editor and create a new query or duplicate an existing query. Both actions can be performed by clicking the right button in the Queries panel. If you duplicate the Authors query, you must remove all the previous steps you created in the previous example.

To create this query, you need to:

  1. Expand the Changes column twice. The first is to expand the item node from the XML and then another time to expand the information about every modified file.
  2. Group by ChangesetId adding a column to count files.
  3. Return to the main screen and add a stacked column chart like this:

More committed files

We are going to use the last example. What if you want to get the most committed files over the time? You only need to:

  1. Create another query and group by Date and Changes.Item.DstCmPath:
  2. Create a graph and a table. The table will have Changes.Item.DstCmPath and Count of Changed times as values:

Some tips and details

Power BI provides you with some features that will allow you to get a better user experience.

Filters

As you can see, ".csproj" files are at the top of the table. If you want to ignore these files, you can modify your query and add a filter or you can add a filter to the table. The first one will apply to all visuals that use the query and the second one will only apply to the visual with the filter:

Rename Fields

If you want to modify the name of a field to improve readability, it's possible to do through the field's panel:

Conclusions

With tools like Power BI, you can create dashboards full of information about your repositories. You can get all information you need; the only limitation is your knowledge about Power BI Desktop and its features. If you need something more powerful than the XML format, you can always put your data in a database so it will be easier and the performance will be better.

Plastic SCM provides an easy method to export information in an easy-to-analyze format. If you want to do the same with Git, you will need to create a parser for its csv format or use a library like LibGit to extract the desired information.

Some references

If you want to explore further on how you can go extract information from your repositories, we recommend that you read Your Code as a Crime Scene by Adam Tornhill.

If you are a F# lover, Vicenç García Altés has some good examples about this topic explaining how to parse and extract information using F# and Git repositories.

0 comentarios: