How Does Git Work
I’m working with Git now but only for my personal projects and those I have on GitHub. At work we still use TFS and SVN (as of now).
Git is a distributed version control system, meaning your local copy of the code is a complete version control repository. These fully functional local repositories make it easy to work offline or remotely. You commit your work locally, and then sync your copy of the repository with the copy on the server.

It also pays to know the command line. For one, it is the only place where you can run all Git commands; most GUIs implement only a subset of Git functionality for simplicity. If you know how to run the command-line version, you can probably also figure out how to run the GUI version, while the opposite is not necessarily true. A few commonly used commands:

git checkout -b <branch>: creates a new branch from the current HEAD and switches the working directory to the new branch.
git diff <branch> -- <path>: shows the difference between the working directory and the given branch.
git checkout <branch> -- <path>: checks out files from the given branch into the working directory.
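
The three commands above can be tried out safely in a throwaway repository. The following is only a sketch under a few assumptions: the file name notes.txt, the branch name experiment and the identity passed to git config are made up for illustration, and the initial branch is forced to be called master so the result does not depend on local defaults.

```shell
set -eu

# Work in a temporary directory so nothing existing is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Create a branch from the current HEAD and switch to it in one step.
git checkout -q -b experiment
current_branch="$(git rev-parse --abbrev-ref HEAD)"

# Change the file in the working directory without committing.
echo "second line" >> notes.txt

# Show how the working directory copy differs from the one on master.
diff_output="$(git diff master -- notes.txt)"

# Check the file out from master again, discarding the local change.
git checkout -q master -- notes.txt
restored="$(cat notes.txt)"

echo "branch: $current_branch"
echo "$diff_output"
echo "restored content: $restored"
```

Note that the last checkout overwrites the uncommitted change without asking, so on real work it is worth running the diff first.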
Recently our company hosted a course about Agile planning, and since Git was quite new to most of my colleagues, the instructor also quickly explained Git in the context of refactoring. I really liked his approach to explaining it, and that’s why I’d like to replicate his explanation here.

Just before we start: how is Git different from other VCS (Version Control Systems)? Probably the most obvious difference is that Git is distributed (unlike SVN or TFS, for instance). This means you’ll have a local repository which lives inside a special folder named .git, and you’ll normally (but not necessarily) have a remote, central repository where different collaborators may contribute their code. Note that each of those contributors has an exact clone of the repository on their local workstation.

Git itself can be imagined as something that sits on top of your file system and manipulates files.
Even better, you can imagine Git as a tree structure where each commit creates a new node in that tree. Nearly all Git commands actually serve to navigate this tree and to manipulate it accordingly.

As such, in this tutorial I’d like to take a look at how Git works by viewing a Git repository from the point of view of the tree it constructs. To do so I walk through some common use cases, like:

- adding/modifying a new file
- creating and merging a branch with and without merge conflicts
- viewing the history/changelog
- performing a rollback to a certain commit
- sharing/syncing your code to a remote/central repository

Before starting here, I highly recommend first going through an introduction to the Git basics.

(Figure: illustration of the three main states in the lifecycle of a Git-versioned file.)

Terminology

Here’s the Git terminology:
master - the repository’s main branch. Depending on the workflow, it is the one people work on or the one where the integration happens.
clone - copies an existing Git repository, normally from some remote location to your local environment.
commit - submitting files to the repository (the local one); in other VCS it is often referred to as “checkin”.
fetch or pull - like “update” or “get latest” in other VCS. The difference between fetch and pull is that pull combines both: it fetches the latest code from a remote repo and also performs the merge.
push - used to submit the code to a remote repository.
remote - these are “remote” locations of your repository, normally on some central server.
SHA - every commit or node in the Git tree is identified by a unique SHA key. You can use them in various commands in order to manipulate a specific node.
head - a reference to the node to which our working space of the repository currently points.
branch - just like in other VCS, with the difference that a branch in Git is actually nothing more special than a particular label on a given node.
It is not a physical copy of the files as in other popular VCS.

Workstation Setup

I do not want to go into the details of setting up your workstation, as there are numerous tools which partly vary across platforms. For this post I perform all of the operations on the command line. Even if you’re not the shell guy, you should give it a try (it never hurts ;) ).

To set up command-line Git access, simply get the required download for your OS.

After everything is set up and you have “git” in your PATH environment variable, the first thing you have to do is configure Git with your name and email:

    $ git config --global user.name 'Juri Strumpflohner'
    $ git config --global user.email '<your email address>'

Let’s get started: Create a new Git Repository

Before starting, let’s create a new directory where the Git repository will live and cd into it:

    $ mkdir mygitrepo
    $ cd mygitrepo

Now we’re ready to initialize a brand new Git repository.

    $ git init
    Initialized empty Git repository in c:/projects/mystuff/temprepos/mygitrepo/.git/

We can check the current status of the Git repository by using

    $ git status
    # On branch master
    #
    # Initial commit
    #
    nothing to commit (create/copy files and use 'git add' to track)

Create and commit a new file

The next step is to create a new file and add some content to it.

    $ touch hallo.txt
    $ echo Hello, world! > hallo.txt

Again, checking the status now reveals the following

    $ git status
    # On branch master
    #
    # Initial commit
    #
    # Untracked files:
    #   (use 'git add <file>...' to include in what will be committed)
    #
    #       hallo.txt
    nothing added to commit but untracked files present (use 'git add' to track)

To “register” the file for committing we need to add it to Git using

    $ git add hallo.txt

Checking the status now indicates that the file is ready to be committed:

    $ git status
    # On branch master
    #
    # Initial commit
    #
    # Changes to be committed:
    #   (use 'git rm --cached <file>...' to unstage)
    #
    #       new file:   hallo.txt
    #

We can now commit it to the repository

    $ git commit -m 'Add my first file'
    1 file changed, 1 insertion(+)
    create mode 100644 hallo.txt

It is common practice to use the present tense in commit messages.
So rather than writing “added my first file” we write “add my first file”.

If we now step back for a second and take a look at the tree, we have the following.

(Figure: state of the repo tree after the first commit.)

There is one node, which the “label” master points to.

Add another file

Let’s add another file:

    $ echo "Hi, I'm another file" > anotherfile.txt
    $ git add .
    $ git commit -m 'add another file with some other content'
    1 file changed, 1 insertion(+)
    create mode 100644 anotherfile.txt

By the way, note that this time I used git add . which adds all files in the current directory (.).

From the point of view of the tree, we now have another node, and master has moved on to that one.

Create a (feature) branch

Branching and merging is what makes Git so powerful and what it has been optimized for, being a distributed version control system (VCS). Indeed, feature branches are quite popular with Git.
Feature branches are created for every new kind of functionality you’re going to add to your system and they are normally deleted afterwards once the feature is merged back into the main integration branch (normally the master branch). The advantage is that you can experiment with new functionality in a separated, isolated “playground” and quickly switch back and forth to the original “master” branch when needed.
Moreover, it can be easily discarded again (in case it is not needed) by simply dropping the feature branch.

But let’s get started. First of all I create the new feature branch:

    $ git branch my-feature-branch

Executing

    $ git branch
    * master
      my-feature-branch

we get a list of branches. The * in front of master indicates that we’re currently on that branch. Let’s switch to my-feature-branch instead:

    $ git checkout my-feature-branch
    Switched to branch 'my-feature-branch'

Again

    $ git branch
      master
    * my-feature-branch

Note that you can directly use the command git checkout -b my-feature-branch to create and check out a new branch in one step.

What’s different from other VCS is that there is only one working directory.
All of your branches live in the same one, and there is not a separate folder for each branch you create. Instead, when you switch between branches, Git replaces the content of your working directory to reflect the branch you’re switching to.

Let’s modify one of our existing files

    $ echo 'Hi' >> hallo.txt
    $ cat hallo.txt
    Hello, world!
    Hi

and then commit it to our new branch

    $ git commit -a -m 'modify file adding hi'
    2fa266a modify file adding hi
    1 file changed, 1 insertion(+)

Note that this time I used git commit -a -m to add and commit a modification in one step. This works only on files that have already been added to the Git repo before. New files won’t be added this way and need an explicit git add as seen before.

What about our tree?

So far everything seems pretty normal and we still have a straight line in the tree, but note that master has remained where it was while we moved forward with my-feature-branch.

Let’s switch back to master and modify the same file there as well.
    $ git checkout master
    Switched to branch 'master'

As expected, hallo.txt is unmodified:

    $ cat hallo.txt
    Hello, world!

Let’s change and commit it on master as well (this will generate a nice conflict later).

    $ echo 'Hi I was changed in master' >> hallo.txt
    $ git commit -a -m 'add line on hallo.txt'
    c8616db add line on hallo.txt
    1 file changed, 1 insertion(+)

Our tree now visualizes the branch.

Polishing your feature branch commits

When you create your own personal feature branch, you’re allowed to make as many commits as you want, even with somewhat dirty commit messages. This is a really powerful approach, as you can jump back to any point in your dev cycle. However, once you’re ready to merge back to master, you should polish your commit history. This is done with the rebase command in interactive mode, where <commit> is the commit after which you want to rewrite history (HEAD~3, for example, covers the last three commits):

    $ git rebase -i <commit>

(Figure: animated demo on cleaning up your commit history.)

Merge and resolve conflicts

The next step is to merge our feature branch back into master. This is done by using the merge command

    $ git merge my-feature-branch
    Auto-merging hallo.txt
    CONFLICT (content): Merge conflict in hallo.txt
    Automatic merge failed; fix conflicts and then commit the result.

As expected, we have a merge conflict in hallo.txt.

    Hello, world!
    <<<<<<< HEAD
    Hi I was changed in master
    =======
    Hi
    >>>>>>> my-feature-branch

Let’s resolve it:

    Hello, world!
    Hi I was changed in master
    Hi

and then commit it

    $ git commit -a -m 'resolve merge conflicts'
    [master 6834fb2] resolve merge conflicts

The tree reflects our merge.

Fig 1: Tree state after the merge

Jump to a certain commit

Let’s assume we want to jump back to a given commit. We can use the git log command to get all the SHA identifiers that uniquely identify each node in the tree.
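
To see the whole sequence in one go (diverging branches, the conflict, the merge node, and jumping back to an older node via its SHA), here is a minimal sketch that replays the walkthrough in a temporary repository. It is not the exact session from above: the identity passed to git config is made up, the initial branch is forced to be called master, and the resolved file is staged with an explicit git add rather than commit -a.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"

# First node in the tree.
echo "Hello, world!" > hallo.txt
git add hallo.txt
git commit -q -m "Add my first file"
first_commit="$(git rev-parse HEAD)"

# Diverge: one commit on the feature branch, one on master, same file.
git checkout -q -b my-feature-branch
echo "Hi" >> hallo.txt
git commit -q -a -m "modify file adding hi"

git checkout -q master
echo "Hi I was changed in master" >> hallo.txt
git commit -q -a -m "add line on hallo.txt"

# The merge stops with a conflict, so guard it against `set -e`.
if git merge my-feature-branch >/dev/null 2>&1; then
  merge_status="clean"
else
  merge_status="conflict"
fi

# Resolve by keeping both lines, then conclude the merge.
printf 'Hello, world!\nHi I was changed in master\nHi\n' > hallo.txt
git add hallo.txt
git commit -q -m "resolve merge conflicts"

# The merge node has two parents: this prints its SHA plus two parent SHAs.
sha_count="$(git rev-list --parents -n 1 HEAD | wc -w)"

# List every node with its abbreviated SHA, drawn as a tree.
git --no-pager log --oneline --graph

# Jump back to the first node (detached HEAD) and look at the file.
git checkout -q "$first_commit"
old_content="$(cat hallo.txt)"
echo "merge status: $merge_status, content at first commit: $old_content"

# Return to the tip of master.
git checkout -q master
```

Checking out a SHA directly leaves you on a detached HEAD, which is fine for looking around; to continue working from that point you would create a branch there first.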
Git Basics

So, what is Git in a nutshell? This is an important section to absorb, because if you understand what Git is and the fundamentals of how it works, then using Git effectively will probably be much easier for you. As you learn Git, try to clear your mind of the things you may know about other VCSs, such as Subversion and Perforce; doing so will help you avoid subtle confusion when using the tool.
Git stores and thinks about information much differently than these other systems, even though the user interface is fairly similar; understanding those differences will help prevent you from becoming confused while using it.

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time, as illustrated in Figure 1-4.

Figure 1-4. Other systems tend to store data as changes to a base version of each file.

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem.
Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again—just a link to the previous identical file it has already stored.
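
The reuse of unchanged files can be observed directly. The sketch below assumes a throwaway repository with two made-up files, unchanged.txt and changing.txt, and a made-up identity; it makes two commits and compares the object IDs that each snapshot records for each file.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "stable" > unchanged.txt
echo "v1" > changing.txt
git add .
git commit -q -m "First snapshot"

# Only one of the two files changes before the second snapshot.
echo "version 2" > changing.txt
git commit -q -a -m "Second snapshot"

# <commit>:<path> names the stored object for that file in that snapshot.
unchanged_before="$(git rev-parse HEAD~1:unchanged.txt)"
unchanged_after="$(git rev-parse HEAD:unchanged.txt)"
changing_before="$(git rev-parse HEAD~1:changing.txt)"
changing_after="$(git rev-parse HEAD:changing.txt)"

echo "unchanged.txt: $unchanged_before -> $unchanged_after"
echo "changing.txt:  $changing_before -> $changing_after"
```

Both snapshots point at the same stored object for unchanged.txt, while changing.txt gets a new object in the second snapshot.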
Git thinks about its data more like Figure 1-5.

Figure 1-5. Git stores data as snapshots of the project over time.

This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation.
This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS. We’ll explore some of the benefits you gain by thinking of your data this way when we cover Git branching in Chapter 3.

Most operations in Git only need local files and resources to operate; generally no information is needed from another computer on your network.
If you’re used to a CVCS where most operations have that network latency overhead, this aspect of Git will make you think that the gods of speed have blessed Git with unworldly powers. Because you have the entire history of the project right there on your local disk, most operations seem almost instantaneous.

For example, to browse the history of the project, Git doesn’t need to go out to the server to get the history and display it for you; it simply reads it directly from your local database. This means you see the project history almost instantly. If you want to see the changes introduced between the current version of a file and the file a month ago, Git can look up the file a month ago and do a local difference calculation, instead of having to either ask a remote server to do it or pull an older version of the file from the remote server to do it locally.

This also means that there is very little you can’t do if you’re offline or off VPN. If you get on an airplane or a train and want to do a little work, you can commit happily until you get to a network connection to upload.
If you go home and can’t get your VPN client working properly, you can still work. In many other systems, doing so is either impossible or painful. In Perforce, for example, you can’t do much when you aren’t connected to the server; and in Subversion and CVS, you can edit files, but you can’t commit changes to your database (because your database is offline). This may not seem like a huge deal, but you may be surprised what a big difference it can make.

Everything in Git is check-summed before it is stored and is then referred to by that checksum. This means it’s impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy. You can’t lose information in transit or get file corruption without Git being able to detect it.

The mechanism that Git uses for this checksumming is called a SHA-1 hash.
This is a 40-character string composed of hexadecimal characters (0–9 and a–f) and calculated based on the contents of a file or directory structure in Git. A SHA-1 hash looks something like this:

    24b9da6552252987aa493b52f8696cd6d3b00373

You will see these hash values all over the place in Git because it uses them so much. In fact, Git stores everything not by file name but in the Git database addressable by the hash value of its contents.

When you do actions in Git, nearly all of them only add data to the Git database. It is very difficult to get the system to do anything that is not undoable or to make it erase data in any way.
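
Both ideas, addressing by content hash and a database that mostly only grows, can be sketched in a throwaway repository. The file name and identity below are made up for illustration, and the hash in the comment assumes a repository using the default SHA-1 object format.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"

# The ID is computed from the content alone, so the same content
# always yields the same ID and different content a different one.
id_a="$(echo 'test content' | git hash-object --stdin)"
id_b="$(echo 'test content' | git hash-object --stdin)"
id_other="$(echo 'other content' | git hash-object --stdin)"
echo "$id_a"   # d670460b4b4aece5915caf5c68d12f560a9fe3e4 with SHA-1

# Make two commits, then move the branch back by one.
echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" >> file.txt
git commit -q -a -m "Second"
dropped="$(git rev-parse HEAD)"
git reset -q --hard HEAD~1

# The "dropped" commit is off the branch but still in the database.
dropped_type="$(git cat-file -t "$dropped")"

# Knowing its hash is enough to bring it back.
git reset -q --hard "$dropped"
restored_tail="$(tail -n 1 file.txt)"
echo "object type: $dropped_type, last line after restore: $restored_tail"
```

Here the hash of the dropped commit was saved in a variable beforehand; in practice git reflog is the usual place to look it up afterwards.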
As in any VCS, you can lose or mess up changes you haven’t committed yet; but after you commit a snapshot into Git, it is very difficult to lose, especially if you regularly push your database to another repository.

This makes using Git a joy because we know we can experiment without the danger of severely screwing things up. For a more in-depth look at how Git stores its data and how you can recover data that seems lost, see Chapter 9.

Now, pay attention. This is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can reside in: committed, modified, and staged. Committed means that the data is safely stored in your local database. Modified means that you have changed the file but have not committed it to your database yet. Staged means that you have marked a modified file in its current version to go into your next commit snapshot.

This leads us to the three main sections of a Git project: the Git directory, the working directory, and the staging area.

Figure 1-6. Working directory, staging area, and Git directory.

The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.

The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.

The staging area is a simple file, generally contained in your Git directory, that stores information about what will go into your next commit. It’s sometimes referred to as the index, but it’s becoming standard to refer to it as the staging area.

The basic Git workflow goes something like this:

1. You modify files in your working directory.
2. You stage the files, adding snapshots of them to your staging area.
3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

If a particular version of a file is in the Git directory, it’s considered committed. If it’s modified but has been added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified. In Chapter 2, you’ll learn more about these states and how you can either take advantage of them or skip the staged part entirely.
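
The three states can be watched as a file moves through the workflow. This is a minimal sketch in a throwaway repository, with a made-up file name and identity; it uses git status --porcelain, whose two status columns stand for the staging area (first) and the working directory (second).

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"       # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Initial snapshot"

# Committed: nothing to report, working directory matches the Git directory.
state_committed="$(git status --porcelain)"

# Modified: changed in the working directory, not yet staged.
echo "v2" >> file.txt
state_modified="$(git status --porcelain)"    # " M file.txt"

# Staged: the current version is marked to go into the next commit.
git add file.txt
state_staged="$(git status --porcelain)"      # "M  file.txt"

# Committed again: the staged snapshot is stored in the Git directory.
git commit -q -m "Second snapshot"
state_after="$(git status --porcelain)"

printf 'modified: [%s]\nstaged:   [%s]\n' "$state_modified" "$state_staged"
```

The M moving from the second column to the first is the file passing from modified to staged; an empty status means everything is committed.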