Git the importance of a commit

Git is a system centered around the idea of a save point, a commit. When you use the source control software you create many commits throughout the life cycle of the application. These save points serve a number of purposes. They can show what you did to address a given issue, for example when you tie the commit to an issue tracker like Jira. They can also show who wrote which pieces of an application in a team setting, since each commit is stamped with the user who made it. Probably the most important purpose is to provide a history of the application, and specifically of the different files within the repository, at any given point in the development process.
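To make the "who" and "why" of a commit concrete, here is a minimal sketch in a throwaway repository. The author name, e-mail and the Jira-style key PROJ-123 are hypothetical; putting the issue key in the commit message is a common convention for linking commits to a tracker, not something Git itself requires.

```shell
#!/bin/sh
# Sketch: every commit records an author and a message.
# The identity and the issue key PROJ-123 are hypothetical examples.
set -e

# Identity stamped on each commit (normally set once with git config).
export GIT_AUTHOR_NAME="Ada Apprentice" GIT_AUTHOR_EMAIL="ada@example.com"
export GIT_COMMITTER_NAME="Ada Apprentice" GIT_COMMITTER_EMAIL="ada@example.com"

repo=$(mktemp -d)
git -C "$repo" init -q

echo "hello" > "$repo/app.txt"
git -C "$repo" add app.txt
# Mentioning the issue key in the message ties the commit to the ticket.
git -C "$repo" commit -q -m "PROJ-123: add greeting"

# Who did what: short hash, author name and subject of each commit.
git -C "$repo" log --format='%h %an %s'

# Find every commit that mentions a given issue.
git -C "$repo" log --oneline --grep='PROJ-123'
```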

Why is this important, you ask?

Ever accidentally delete a file off your file system and go “oh crap, I didn’t mean to do that”? In some cases you can go to the trash can of your favorite OS and get the file back, but what if you accidentally hit “empty trash” when you only meant to delete a single file? What then? In this case you are pretty SOL.

With Git, if you accidentally delete a file or an entire directory from the repository but it was present in another commit, you can get it back. If you do a pull and it overwrites a file where you really wanted your changes and not the ones from the server, you can get your version back. Not only that, but you can get the file back as part of the merge process. No matter what happens in Git, as long as there is a commit in the repository with the state of the file that you want, you can get to that file or directory again.
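Here is a minimal sketch of one such recovery, assuming the simplest case: a file was deleted and the deletion was committed. It runs in a throwaway repository, and notes.txt plus the identity values are hypothetical. It uses git log --diff-filter=D to find the commit that deleted the file, then checks the file out from that commit's parent.

```shell
#!/bin/sh
# Sketch: recover a deleted file, as long as some commit still has it.
# File name and identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
git -C "$repo" init -q
echo "important notes" > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" commit -q -m "Add notes"

# Oops: delete the file and commit the deletion.
git -C "$repo" rm -q notes.txt
git -C "$repo" commit -q -m "Remove notes by mistake"

# Find the most recent commit that deleted it (D = deletions only).
deleting_commit=$(git -C "$repo" log -1 --format=%H --diff-filter=D -- notes.txt)

# Restore the file as it was in the parent of that commit
# (this updates both the staging area and the working directory).
git -C "$repo" checkout "$deleting_commit^" -- notes.txt
# On Git 2.23+ an equivalent is:
#   git restore --source="$deleting_commit^" --staged --worktree notes.txt
```

After this, git status shows notes.txt as a new staged file; committing it brings it back into the current history.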

Now some will say committing too much is a bad thing. I don’t know why they would, but I highly recommend that you commit often, especially as you complete features and functionality. These save points, these commits, can be life savers later.

So, if you run into a situation where you think your repository is toast, trashed, totaled, … but you have a commit with the files you want, know that you can get them back.

How do you get them back? … well, that depends…

It depends on the situation, and as I come across different situations I will post about them, so check my blog for solutions.

Git rid of unwanted files…

In working with apprentices, a common issue I see is forgetting to add a .gitignore file when initially setting up the repository. This can suck in the long run, as many files that you don’t want to track start showing up in your repository and, worse yet, in merge conflicts with other users. Great examples are the files within the target/ folder in NetBeans (Java) or, for my .NET Visual Studio folks, the files within the bin/ directory. In either case these are files generated when we build the code we wrote, and we only want the code in the repository, not the assemblies and jars, right… So how do we fix the mistake once it has already happened?…

To fix this issue, start by getting a .gitignore file into the repository pronto. Why create it from scratch… instead, here is a link to a GitHub repo with many .gitignore files already created and ready for your copying pleasure.

github link

Find the .gitignore file that you need and add it to the repository, but don’t commit just yet.

With the .gitignore file in place, start running git rm to get rid of the unwanted files.

Example: git rm -rf MyProject/bin/

git rm – removes the files from the index (so Git stops tracking them) and from your working directory
-r – recursively traverses the directory, including any subfolders
-f – forces the removal by overriding Git’s up-to-date check, so files with local changes are removed too
MyProject/bin/ – the directory to remove; this can also be a single file

Once you run this command you will see the files marked as deleted. With the .gitignore properly in the repository, they should never show back up as files waiting to be added, even after the next build recreates them. Thus, it is safe to commit the .gitignore together with the deletions and forget the files were ever part of the repo.
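The whole fix can be sketched end to end in a throwaway repository. MyProject/App.cs, MyProject/bin/app.dll, the single-line .gitignore and the identity values are hypothetical stand-ins for your real project and template. If you would rather keep the local copies of the files, git rm -r --cached removes them from the index only and leaves them on disk.

```shell
#!/bin/sh
# Sketch: stop tracking build output that was committed by mistake.
# Paths, file contents and identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/MyProject/bin"
echo "class App {}" > "$repo/MyProject/App.cs"
echo "binary" > "$repo/MyProject/bin/app.dll"
git -C "$repo" add .
git -C "$repo" commit -q -m "Initial commit (bin/ included by mistake)"

# 1. Add a .gitignore that excludes the build folder.
printf 'bin/\n' > "$repo/.gitignore"
git -C "$repo" add .gitignore

# 2. Remove the tracked build output from the index and the working directory.
git -C "$repo" rm -r -q -f MyProject/bin/

# 3. Commit the .gitignore and the deletions together.
git -C "$repo" commit -q -m "Ignore and remove build output"

# A rebuild recreates bin/, but Git now ignores it.
mkdir -p "$repo/MyProject/bin"
echo "binary" > "$repo/MyProject/bin/app.dll"
git -C "$repo" status --short   # nothing listed: the tree is clean
```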

Special Note: when working in teams, make sure the other members of the team pull your changes after you delete such files. If they don’t know to look for the deletions, they may accidentally override your changes and add all the files back. A good practice here is to get everyone on the same commit, remove the files, and then have everyone pull the new commit without the files and continue working.

New to Git… What is git?

Git is a source control management system, hence http://git-scm.com. By source this actually refers to any file of any type. We can add any file that we want to the system and begin tracking modifications to that file. As we create other related files we can add those to the system as well. This means that we can organize a group of files together and track them as a whole. From the perspective of a developer we can think of this as our source files for a project we are coding. For instance, in a C# project this would be all the .cs files. Now this grouping of files actually has a special name as well, a repository. Let’s review:

git – source control management software that tracks the modifications to one or more files

source – a file whose modifications we wish to track with the software

repository – a group of files organized into a single set and tracked together as one group.

So how does git work? Well, let’s think about this in terms of the computer we are on and its file system. A file system consists of files and folders, arranged into a structure that makes it easy to find the files we are searching for. Git uses this structure for its repositories. A repository is really nothing more than a folder containing the files and subfolders that we wish to track. Although only the files are really tracked for modifications, the repository can contain subfolders as well. A repository therefore exists locally, on the machine we are working on, and any modifications we make are made locally on that computer.
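A minimal sketch of "a repository is just a folder": git init creates a hidden .git folder, and git status then reports files, not folders. The folder and file names here (a temporary folder, docs/, README.txt) are hypothetical.

```shell
#!/bin/sh
# Sketch: a repository is just a folder that Git has been told to track.
# Folder and file names are hypothetical.
set -e

project=$(mktemp -d)
git -C "$project" init -q         # creates the hidden .git folder
ls -A "$project"                  # lists only .git so far

mkdir "$project/docs"             # an empty folder...
echo "readme" > "$project/README.txt"

# Only README.txt is reported as untracked; the empty docs/ folder is not,
# because Git tracks files and sees folders only through the files in them.
git -C "$project" status --short --untracked-files=all
```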

So wait… we are going to work with everything locally? But what about portability to other machines, collaborating with other developers, or just saving a copy to a central location so that if our machine’s hard drive is lost we don’t lose all our changes? Git solves this too… Git also has the concept of a remote repository. A remote is a location that we intend to copy our changes to periodically. This could be another server within the network, an online host (such as GitHub or Bitbucket), or really any other directory that we want to copy the local repository to. Our only interaction with the remote will be to copy changes to it (push) and retrieve changes from it (pull). The remote itself will be a repository, but one we won’t directly modify. Instead, we work locally and push our changes to it.
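Since a remote can be "really any other directory", here is a minimal sketch that uses a local folder as the remote and two other folders as two machines. The names remote.git, laptop/, desktop/ and the branch name main are hypothetical choices; with GitHub or Bitbucket you would pass the host’s URL to git remote add instead. The remote is created as a bare repository, which holds history but no working files, matching the idea of a repository we never modify directly.

```shell
#!/bin/sh
# Sketch: push to and pull from a "remote" that is just another directory.
# Folder names, branch name and identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

base=$(mktemp -d)
# The remote: a bare repository (history only, no working files to edit).
git init -q --bare "$base/remote.git"

# Work locally and commit.
git init -q "$base/laptop"
echo "v1" > "$base/laptop/app.txt"
git -C "$base/laptop" add app.txt
git -C "$base/laptop" commit -q -m "First version"
git -C "$base/laptop" branch -M main
git -C "$base/laptop" remote add origin "$base/remote.git"
git -C "$base/laptop" push -q -u origin main     # copy changes to the remote

# Another "machine" (here, another folder) copies the repository down.
git clone -q --branch main "$base/remote.git" "$base/desktop"

# Later work on the laptop is pushed, then pulled on the desktop.
echo "v2" > "$base/laptop/app.txt"
git -C "$base/laptop" commit -q -am "Second version"
git -C "$base/laptop" push -q
git -C "$base/desktop" pull -q --ff-only
```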

Let’s take a look at a possible structure here.

Let’s start with a _repos folder. This folder will not be a git repository; instead it will be the folder that we create all our other repositories in. This way, if we need to work in more than one repository at a time, we can keep them apart using separate folders within this folder. For instance:

+ _repos
|-+ project1
|-+ project2

In this example we have two projects. Each directory, project1 and project2, is a git repository. They are separate repositories tracking files for two completely different projects. If we need to start working on another project, we just create another directory. The important thing is that we do not put a repository within the folder of another repository (this is a more advanced setup, and if we are just starting out it is best to avoid). So we would never put the project2 folder inside of project1. All the repos go inside the _repos folder.
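A minimal sketch of creating that layout: _repos, project1 and project2 come from the example above, while the temporary parent folder is an assumption so the sketch can run anywhere. Only the project folders get git init; _repos stays a plain folder.

```shell
#!/bin/sh
# Sketch: one plain folder holding several sibling repositories.
# The temporary parent folder is a placeholder for wherever you keep _repos.
set -e

root=$(mktemp -d)
mkdir "$root/_repos"                   # a plain folder, NOT a repository
git init -q "$root/_repos/project1"    # repository #1
git init -q "$root/_repos/project2"    # repository #2, beside project1, not inside it

# Only the project folders contain a .git directory.
ls -d "$root"/_repos/*/.git
```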

Once we understand the folder structure of git, we have a good base level of knowledge to get set up and get git going with the basic commands… something I will save for another post.

Git Repository in a Repository Fix

In working with apprentices new to Git, and really new to source control in general, I have come across a couple of cases where someone has a repository defined within a repository. In these cases the parent repository is not tracking the files within the subdirectory, and there appears to be nothing that can be done to resolve this. To paint a better picture, let’s work with this example:

project
|
+ sub_project

In this example there is a .git directory within both the project folder and the sub_project folder. Thus Git believes that these are separate repositories and, in fact by design, treats sub_project as a submodule of the project repository, although not a correctly defined one. I say not correctly defined since it cannot be found in a .gitmodules file; it only appears to be a submodule. We would have some work to do if we wanted it to be correctly defined. However, in the simple case we don’t want that and instead want to remove the repository from the subdirectory.

To fix the issue we need to delete the .git folder in the sub_project directory. We also need to remove the directory from the index of the project repository so that we may re-add the directory and be able to track the files within the sub_project directory in the project repository. To do this we can run the git rm sub_project command. Thus to fix the issue:

1. delete the .git folder from the subdirectory
2. run git rm sub_project
3. re-add the sub_project directory to the project repository
4. commit

Now things should be ok and we should be able to continue using our project repository with no issues. We just need to be careful not to create repositories within the project directory unless we really want to have submodules.

UPDATE: 9/11/2015

In doing this again, I did have to run git rm with the --cached option, which removes the entry from the index only and leaves the sub_project files on disk:

git rm --cached sub_project
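Here is a minimal sketch that reproduces the nested-repository problem in a throwaway folder and then applies the four steps, using git rm --cached as in the update. main.txt, file.txt and the identity values are hypothetical. The git ls-files --stage line shows the parent recording sub_project as a single entry with mode 160000 (the mode Git uses for submodule-style "gitlink" entries) instead of the files inside it.

```shell
#!/bin/sh
# Sketch: reproduce a repository inside a repository, then fix it.
# project/ and sub_project/ mirror the example; file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

project=$(mktemp -d)
git -C "$project" init -q
echo "top" > "$project/main.txt"

# The mistake: running git init again inside a subfolder.
mkdir "$project/sub_project"
echo "child" > "$project/sub_project/file.txt"
git -C "$project/sub_project" init -q
git -C "$project/sub_project" add file.txt
git -C "$project/sub_project" commit -q -m "Nested commit"

# Git warns about an embedded repository and records one gitlink entry.
git -C "$project" add .
git -C "$project" commit -q -m "Initial commit"
git -C "$project" ls-files --stage sub_project   # mode 160000, no file.txt

# The fix:
rm -rf "$project/sub_project/.git"               # 1. delete the nested .git folder
git -C "$project" rm -q --cached sub_project     # 2. drop the entry from the index only
git -C "$project" add sub_project                # 3. re-add the folder as normal files
git -C "$project" commit -q -m "Track sub_project files directly"   # 4. commit

git -C "$project" ls-files   # now lists main.txt and sub_project/file.txt
```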

My Git commit isn’t adding my files…

In some cases you may run the git commit --all command and think that it will stage and commit all the changes, deletions and new files in your project. The truth is in the documentation for git commit, though…

git commit documentation

-a
--all

Tell the command to automatically stage files that have been modified and deleted, but new files you have not told Git about are not affected.

This means that in order to add new files to the repository you can’t just run git commit --all. Instead, you have to run git add on the new files first and then git commit.
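A minimal sketch of this behavior in a throwaway repository: tracked.txt, new.txt and the identity values are hypothetical. The modified file is picked up by --all, while the new file stays untracked until it is explicitly added.

```shell
#!/bin/sh
# Sketch: git commit --all stages modified and deleted files, not new ones.
# File names and identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
git -C "$repo" init -q
echo "v1" > "$repo/tracked.txt"
git -C "$repo" add tracked.txt
git -C "$repo" commit -q -m "Add tracked.txt"

echo "v2" > "$repo/tracked.txt"    # a modification: --all stages this
echo "new" > "$repo/new.txt"       # a new file: --all leaves it alone
git -C "$repo" commit -q --all -m "Update tracked.txt"
git -C "$repo" status --short      # new.txt is still listed as untracked (??)

# New files need an explicit git add before they can be committed.
git -C "$repo" add new.txt
git -C "$repo" commit -q -m "Add new.txt"
```

If you want one step that picks up new, modified and deleted files alike, git add --all (or git add -A) followed by git commit does that.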