This post details how Business Central stores git repositories. This information is useful for advanced users and administrators who want to fine-tune Business Central to their needs. In this post, you’ll also learn why Business Central requires an internal git infrastructure.
Digging Deeper
To manipulate git content, Business Central internally uses the NIO2 API with a custom file system provider, which is why the default storage directory is named .niogit. By default, this directory is created in the application working directory, but you can change both its name and its location using the following Java system properties:
Variable name | Description |
---|---|
org.uberfire.nio.git.dir | The location of the directory .niogit |
org.uberfire.nio.git.dirname | The name of the git directory |
All repositories are stored inside this directory, and we use the concept of Spaces to organize them in folders. So if you create a new project named Example inside the default MySpace space, you’ll find its repository at .niogit/MySpace/Example.git/. The same is valid for all internal repositories under the system space, for example .niogit/system/datasources.git/.
Important: never interact directly with the .niogit directory. As a web application with multiple concurrent users, Business Central needs total visibility and control over all git operations. Direct manipulation of .niogit isn’t visible to Business Central, which may cause data corruption and potential data loss! The most useful thing you can do with the location of .niogit is to set up a backup plan for it. If you need access to the repository content outside Business Central, use the Git (read-only) or SSH (read-write) protocols to clone it.
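When planning such a backup, it can help to work out where a given repository is expected to live on disk. The following is a minimal sketch of the layout described above, not code from Business Central: the function name and its parameters are hypothetical, and the two keyword arguments simply stand in for the two settings from the table (location and directory name). It only computes a path and never touches the directory.

```python
from pathlib import Path


def repository_path(space: str, project: str,
                    git_dir: str = ".", dirname: str = ".niogit") -> Path:
    """Return the expected on-disk location of a project's repository.

    Hypothetical helper: `git_dir` mirrors org.uberfire.nio.git.dir and
    `dirname` mirrors org.uberfire.nio.git.dirname from the table above.
    """
    return Path(git_dir) / dirname / space / f"{project}.git"


# Default layout, relative to the application working directory.
print(repository_path("MySpace", "Example").as_posix())

# Internal repository under the system space, with a custom location
# and directory name (both values are made up for illustration).
print(repository_path("system", "datasources",
                      git_dir="/opt/bc", dirname="repos").as_posix())
```

Treat the result as input for a backup job only; any change to the repository content should still go through Business Central or a clone.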
Why an internal git?
Once people learn about .niogit and a bit about how Business Central operates git internally, the most common follow-up question is: why an internal git? The natural way of thinking, based on everyday git usage, is to connect Business Central directly to an existing external centralized git infrastructure like GitHub or GitLab. Well… unfortunately, that’s not possible, and here I’ll try to shed some light on the reasons behind it.
Git Distributed Architecture
The git architecture is probably the most fundamental aspect of Business Central’s internals. Git is designed to be a fully distributed version control system, which means that there is no inherently central git infrastructure: every clone or fork holds no more and no less than the “central” version of the repository. So, in other words, we could say that Business Central does not have an internal git, it just uses git.
This might be a controversial point, but since the early days of git in Business Central we have always allowed users to clone external repositories, which proves the point that Business Central doesn’t necessarily have to own the repository creation itself.
Of course, this is a simplified statement, because Business Central requires a specific data format to operate, and it’s frequently easier to create a repository using Business Central, export it and then re-import it if needed. But the point is that Business Central doesn’t require exclusive ownership of repositories. The data format limitation that usually forces the create-export-import flow comes from the build system, which will be covered in a future post.
If you find it hard to understand the implications of the distributed aspect of git from the oversimplified statements above, sorry, I’ve failed to explain it. However, it looks like I’m not the only one who has a hard time explaining the implications of a distributed version control system. If you want to understand it better, including the complexity of the distributed part, please check this phenomenal talk from Linus Torvalds about git.
Lack of bidirectional git sync
This is also an essential aspect of the enforced git ownership of Business Central: there is no way to create an out-of-the-box solution for bidirectional sync that is reliable and safe in multi-purpose workflows. Offering something like this would risk conflicts and, ultimately, a high chance of data not being updated correctly.
Even products dedicated exclusively to git don’t provide such a feature. You can take a look at what the GitLab documentation says about this here.
Does this mean it’s not possible? No, it might be possible to create a bidirectional solution for a rigid, pre-defined workflow, but it would need to be tailored to a use case and can’t be provided out-of-the-box. The other option would be to enforce a single pre-defined workflow in Business Central and guarantee the bidirectional behavior behind the scenes. However, offering a one-size-fits-all solution in Business Central could please some users, but it would certainly bother others.
Multiple users
Multiple users collaborating in the same repository at the same time requires some level of coordination, as our abstraction on top of git can only commit fast-forward. Also, we want to make sure that users aren’t exposed to internal details of storage, especially the less technical users.
To avoid unexpected errors, like “your current git is out of sync” or “rebase needed”, the best option is to have total control of the repository interaction, coordinate change requests by queuing them, and make sure (using internal locks) that only a single operation is executed at a time. With this approach we can automatically reconcile the next action on top of the previous one, creating a linear, fast-forward-only history.
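The coordination idea can be illustrated with a small, self-contained sketch. This is not Business Central’s implementation: the classes and functions below are hypothetical, and an in-memory list stands in for the repository. It only shows why serializing operations behind a lock yields a linear history in which every commit sits directly on top of the previous one, even when several users commit concurrently.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    id: int
    parent: Optional[int]  # None only for the very first commit
    author: str
    message: str


class SerializedRepository:
    """Hypothetical stand-in for a repository guarded by an internal lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._history: List[Commit] = []

    def commit(self, author: str, message: str) -> Commit:
        # Only one thread at a time gets past this point; the others wait
        # here, which effectively queues their change requests.
        with self._lock:
            parent = self._history[-1].id if self._history else None
            new = Commit(len(self._history) + 1, parent, author, message)
            self._history.append(new)
            return new

    def history(self) -> List[Commit]:
        with self._lock:
            return list(self._history)


def is_linear(history: List[Commit]) -> bool:
    """True if every commit's parent is the commit right before it."""
    expected_parent = None
    for commit in history:
        if commit.parent != expected_parent:
            return False
        expected_parent = commit.id
    return True


def simulate(users: int = 8, commits_per_user: int = 25) -> List[Commit]:
    """Let several threads commit concurrently and return the history."""
    repo = SerializedRepository()

    def work(name: str) -> None:
        for i in range(commits_per_user):
            repo.commit(name, f"change {i}")

    threads = [threading.Thread(target=work, args=(f"user-{n}",))
               for n in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return repo.history()


history = simulate()
print(len(history), is_linear(history))
```

The order in which users’ commits interleave differs from run to run, but no commit is lost and the chain of parents never forks, which is the property a fast-forward-only abstraction depends on.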
Simultaneous Collaborative Modeling
This topic may be a bit of a sidetrack; however, as we’re talking here about internal aspects of git, it’s also an excellent opportunity to explain why providing a simultaneous collaborative modeling experience, like Google Docs, wouldn’t have a direct impact on how Business Central handles git.
Although this feature is indeed in our plans, it would not solve any potential git issue, as the technical restrictions pointed out above would still be valid.
The reason is simple: even if we had Google Docs-like behavior in our modeling editors, a different file could still be opened by a third user, and control of operations would still be required.
And if even this is not enough to convince you, there’s also the fact that we enable external push via SSH, which means that to support external access we’d still need this level of control, regardless of near real-time collaborative editors.
Complex data reconciliation
Business Central authoring is composed of several graphical modeling tools, like DMN, BPMN and Decision Tables. Those graphical tools, as expected, usually store data in complex XML or an equivalent format, which makes it virtually impossible to just diff and merge potential conflicts between versions from different users who were working on the same file.
The solution would be to provide a visual tool for merging graphical models; however, building such a tool wouldn’t be trivial.
Bare repositories
In this last section, we’ll cover the internal data structure used by Business Central to store and manipulate all git content. If you’ve had the opportunity to explore the content of the .niogit directory, you probably noticed that the repositories don’t look like the regular checked-out repositories that most git users are familiar with. But first, let’s make sure that we’re on the same page and understand how a regular checked-out git repository works.
When you, for example, clone a repository to your local file system, git by default creates a directory with the same name as the repository and automatically checks out the default branch (usually master) into it. As you probably know, you can change branches, and the content of the checked-out branch will then be available. As we can switch between different branches, git clearly has to store the content of the other branches (and the history) somewhere, as all you can see in the directory is the content of the chosen branch.
The whole content of the repository (versions, branches, tags, etc.) is available inside the .git directory. The content of this directory is the bare format, in other words: just the data storage. Consider the bare format the real storage format, and the checked-out content the way to manipulate data; once done with your changes, you move the content back to the storage (using the commit command).
Why is understanding this important? Because now you know that every checkout or commit involves an I/O operation that moves content between the storage (the .git folder) and the checked-out directory.
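To make the difference between the two layouts concrete, here is a small sketch that tells them apart by looking at where the storage sits. It is only a heuristic under stated assumptions: it checks for the HEAD file and the objects and refs directories that git keeps in its storage directory, the helper names are hypothetical, and the demo builds empty placeholder directories rather than real repositories. For real repositories, `git rev-parse --is-bare-repository` is the reliable check.

```python
import tempfile
from pathlib import Path
from typing import Optional


def storage_dir(repo: Path) -> Optional[Path]:
    """Return the directory holding git's storage for `repo`, if any.

    A checked-out repository keeps it in `repo/.git`; a bare repository
    keeps the same entries directly in `repo`.
    """
    for candidate in (repo / ".git", repo):
        if (
            (candidate / "HEAD").is_file()
            and (candidate / "objects").is_dir()
            and (candidate / "refs").is_dir()
        ):
            return candidate
    return None


def is_bare_layout(repo: Path) -> bool:
    """True if the storage sits at the top level (no checked-out files)."""
    return storage_dir(repo) == repo


def fake_storage(directory: Path) -> None:
    """Create placeholder HEAD, objects/ and refs/; NOT a real repository."""
    (directory / "objects").mkdir(parents=True)
    (directory / "refs").mkdir()
    (directory / "HEAD").write_text("ref: refs/heads/master\n")


with tempfile.TemporaryDirectory() as tmp:
    checked_out = Path(tmp) / "Example"   # storage hidden inside .git
    bare = Path(tmp) / "Example.git"      # storage is the directory itself
    fake_storage(checked_out / ".git")
    fake_storage(bare)
    print(is_bare_layout(checked_out), is_bare_layout(bare))
```

The directories under .niogit follow the second shape: the name ends in .git and there is no checked-out copy of the files next to the storage.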
Now that we’re on the same page about bare repositories, let’s explore the two main reasons why Business Central uses that format.
Avoid double I/O operation
As we just learned, inside any checked-out repository you have the .git directory where the data is stored, and every change made on the file system has to be moved back to that storage. This is how all git users work: once all changes are ready, they store them (commit). However, Business Central is a web tool, and individual users aren’t really isolated from each other; in fact, many users interact with a single repository at virtually the same time.
Even if we could solve the problem described in the next section (multi-user, multi-branch), we would still necessarily execute a double I/O operation: one on the file system, followed by one that moves the changes from the file system to the storage.
While this is an acceptable compromise for local use, it would be a significant waste of resources in a web environment.
Multiple Users and Branches
Now, the most complex issue that forced Business Central to use the bare format is how to enable multiple users to interact with the same repository safely. With multiple users changing files at the same time, the state of the local file system could easily be messed up by different users changing the same data at the same time. Those local changes (not yet committed) don’t have history, so users could step on each other’s toes and possibly lose essential data.
True, it could theoretically be possible to work around this issue by creating some kind of hard lock mechanism that wouldn’t allow users to open a file that someone else is editing. However, it’s impractical to do something similar with different branches, as it’s a two-dimensional problem (files and branches), while the file system has only one dimension.
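The “two dimensions” argument can be modeled in a few lines. This is a deliberately simplified sketch: plain dictionaries stand in for a checked-out directory and for the storage, the function names and the file name model.bpmn are hypothetical, and real git storage is made of objects and references rather than a table keyed by branch and path. The point is only that a single checked-out directory has one slot per path, so two users working on the same file in different branches collide, while storage addressed by branch and path keeps both versions.

```python
from typing import Dict, Tuple


def save_to_checkout(checkout: Dict[str, str],
                     branch: str, path: str, content: str) -> None:
    # A checked-out directory shows one branch at a time, so the branch
    # cannot be part of the key: there is exactly one slot per path.
    checkout[path] = content


def save_to_store(store: Dict[Tuple[str, str], str],
                  branch: str, path: str, content: str) -> None:
    # The storage can hold every branch's version of every path.
    store[(branch, path)] = content


checkout: Dict[str, str] = {}
store: Dict[Tuple[str, str], str] = {}

# Two users edit the same file at the same time, each in their own branch.
save_to_checkout(checkout, "master", "model.bpmn", "edit from user A")
save_to_checkout(checkout, "feature", "model.bpmn", "edit from user B")

save_to_store(store, "master", "model.bpmn", "edit from user A")
save_to_store(store, "feature", "model.bpmn", "edit from user B")

print(len(checkout), len(store))
```

In the shared checkout, user A’s uncommitted edit is silently overwritten, and since it was never committed there is no history to recover it from; in the keyed storage both edits survive side by side.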
Wrapping up
This post is part of a series of posts about Business Central and Git. In this post, we explored the where, why, what and how of Business Central’s internal git repositories. Such information gives you a deep understanding of the architectural decisions and technical limitations behind how Business Central uses git.