Using Git and GitHub for LaTeX writing

The problem

As an academic, I spend my life writing papers. Since these papers are mostly about maths or applications of maths, I am a heavy user of LaTeX.

This document summarizes my current workflow for LaTeX automation and version control.

Isn't version control meant for code rather than LaTeX documents?

Yes, tools like SVN and Git were developed to version source code. But the LaTeX you are writing is source code too: it compiles to a document rather than a program, but it has the same structure. In fact, TeX is a Turing-complete language.

Git plus LaTeX is a wonderful workflow. Besides the standard Git recommendations (write relevant commit messages, etc.), three guidelines usually come up online when LaTeX and Git are discussed:

  • Use .gitignore and do not commit compilation products, including the paper.pdf file.
  • Add a line break in your source after each comma and period, instead of the usual “one sentence per line”. This makes your document “taller” and, since Git diffs are line-based, makes them much more precise.
  • Use branches as you would for code: to develop a new idea, change a theorem, etc., and keep the most finished version of the paper on master.
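
As an illustration of the first guideline, here is a minimal sketch of a starter .gitignore, written from the shell. The list of extensions is an assumption about a typical pdflatex/bibtex/latexmk build and is certainly not exhaustive; paper.pdf and diff.pdf are the output names used later in this post. Note that the snippet overwrites any .gitignore already present in the current directory.

```shell
#!/bin/bash
# Write a starter .gitignore for a LaTeX project in the current directory.
# The extension list is an assumption (typical pdflatex/bibtex/latexmk output);
# adjust it to your own toolchain.
cat > .gitignore <<'EOF'
# LaTeX build products
*.aux
*.bbl
*.blg
*.fdb_latexmk
*.fls
*.log
*.out
*.synctex.gz
*.toc
# Compiled paper and latexdiff output
paper.pdf
diff.pdf
EOF
```

Once the file is committed, git status should stop listing the build products after each compilation.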

But some less obvious things can also simplify your life drastically.

Journal template, arXiv export, etc. get their own branches.

Of course, I use branches for new features, potential ideas, and changes that I am not ready to merge into the main paper yet. If other people are working with me, branches and pull requests are a very good co-writing framework.

But I also use special branches to keep modifications that are specific to the publisher side of things apart from the rest.

Indeed, the journal might request changes from you: switch the template via another documentclass and several files, change “Figure X” to “Fig. X” everywhere, use that convention instead of this one, etc. Each journal has its own guidelines and can be picky about these things.

Therefore, I keep these commits on a specific branch, the journalname branch. Then, if the journal is unhappy and rejects my work, I can just delete the branch and voilà. If, on the other hand, they ask for revisions, I can still use a fast-forward merge.

arXiv is known to be a little picky when compiling the files you give them: they have old versions of packages and so on. Therefore, arXiv deserves its own branch with exactly the same structure.

These journal branches are not supposed to be merged back into master: I bring master into each of them regularly (or when I need them), usually through a rebase.
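
To make the branch layout concrete, here is a small sketch that replays this workflow in a throwaway repository. Everything in it is a placeholder chosen for the demonstration (the journalname branch, the journal.cls file, the file contents); only the Git commands themselves matter.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway repository: all names and contents below are placeholders.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
git config user.name "Example Author"
git config user.email "author@example.org"

printf 'See Figure 1.\n' > paper.tex
git add paper.tex
git commit -q -m "First draft"

# Publisher-specific changes live on their own branch.
git checkout -q -b journalname
printf '%% placeholder journal template\n' > journal.cls
git add journal.cls
git commit -q -m "Switch to the journal template"

# Scientific work continues on master.
git checkout -q master
printf 'See Figure 1.\nNew theorem.\n' > paper.tex
git commit -q -am "Add a theorem"

# Bring the journal branch up to date by replaying its commits on top of master.
git checkout -q journalname
git rebase -q master

git log --oneline --graph --all
```

After the rebase, journalname contains everything on master plus the publisher-specific commit, and master itself is untouched. One caveat: a rebase rewrites the branch, so if journalname was already pushed, the next push has to be forced (for example with git push --force-with-lease).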

Using Semver tags for the paper

I use tags a lot, and I follow a kind of semantic versioning, major.minor.patch:

  • A patch increase represents an internal change that adds nothing new: maybe I rewrote a section, or changed a proof because it was wrong, or something like that. I do not always use these tags.
  • A minor increase represents new content: I wrote a new theorem, a new proof, a new section, or I added a different example, a new property, a new remark, etc.
  • Major increases are reserved for publication status: usually, 1.0 is the version that corresponds to the preprint, the one that is published on arXiv and submitted to the journal, and 2.0 corresponds to the result after taking the first review into account. If there is no second round, this is also the last version (otherwise there will be a 3.0).

A latexdiff script that exploits these tags

The main reason I use these SemVer rules is that I use latexdiff to compile the changes. The script I use is:

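#!/bin/bash
# Extract paper.tex as it was at the most recent tag reachable from the parent commit.
git show "$(git describe --tags --abbrev=0 HEAD^)":paper.tex > tmp.tex
# Mark up the differences; adjust the latexdiff path to your own installation.
C:/latexdiff/bin/latexdiff -t UNDERLINE --verbose --no-links --flatten tmp.tex paper.tex > tmpdiff.tex
# Compile the diff, pushing through errors where possible.
latexmk -pdf -f -interaction=nonstopmode tmpdiff.tex
mv -v tmpdiff.pdf diff.pdf
rm -v tmp.tex tmpdiff.* # this line can be commented out while debugging.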

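This simple script produces a latexdiff file that compares the checked-out version to the most recent tagged version reachable from the parent commit. Note that git describe picks up any tag here, not only major ones. Of course, you could modify it to always compare to the same version.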
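
If you want the comparison to go back to the last major version specifically, git describe accepts a --match glob. The sketch below shows the difference in a throwaway repository; it assumes tags are written with three components (1.0.0, 1.1.0, ...), so that major versions are exactly the tags matching '*.0.0'. That naming is an assumption made for this example, so adapt the pattern to your own tags.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway repository with placeholder contents and three-component tags.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.org"

echo 'preprint' > paper.tex
git add paper.tex
git commit -q -m "Preprint"
git tag 1.0.0                      # major: publication status changed

echo 'preprint plus a new section' > paper.tex
git commit -q -am "New section"
git tag 1.1.0                      # minor: new content

echo 'work in progress' > paper.tex
git commit -q -am "Work in progress"

# Most recent tag of any kind, as in the script above.
last_tag="$(git describe --tags --abbrev=0 HEAD^)"
# Most recent tag matching the (assumed) major-version pattern.
last_major="$(git describe --tags --abbrev=0 --match '*.0.0' HEAD^)"

echo "last tag:       $last_tag"     # prints: last tag:       1.1.0
echo "last major tag: $last_major"   # prints: last major tag: 1.0.0
```

Swapping the second git describe call into the latexdiff script would give a diff against the last submitted version rather than against the last tagged one.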

But latexdiff can be faulty sometimes, and it will be a pain if you do not run it regularly. Since it takes time, you do not want to run it every time you change something, so you might end up in a position where the diff no longer compiles and you do not know why.

I promise, finding the faulty commit can be a pain (git bisect may simplify the operation). To prevent this from happening, I run latexdiff in a GitHub workflow.
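
For the record, here is a sketch of how git bisect run can do that search automatically. The helper find_first_bad is a name made up for this example. To keep the demonstration runnable without a LaTeX installation, the check is a stand-in (a grep for the word BROKEN); in real use you would pass a command that exits with a non-zero status when the diff fails to compile.

```shell
#!/bin/bash
set -euo pipefail

# find_first_bad GOOD BAD CMD...: print the hash of the first commit between
# GOOD and BAD for which CMD exits non-zero. (Hypothetical helper name.)
find_first_bad() {
  local good="$1" bad="$2"
  shift 2
  git bisect start "$bad" "$good" >/dev/null
  git bisect run "$@" >/dev/null
  git rev-parse refs/bisect/bad      # set by bisect to the first bad commit
  git bisect reset >/dev/null 2>&1   # go back to where we started
}

# Demonstration in a throwaway repository: commit 4 introduces the problem.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.org"
for i in 1 2 3 4 5; do
  if [ "$i" -eq 4 ]; then
    echo 'BROKEN' >> paper.tex
  else
    echo "line $i" >> paper.tex
  fi
  git add paper.tex
  git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then
    git tag 1.0.0                    # last version known to be good
  fi
done

# Stand-in check: "good" means paper.tex does not contain BROKEN.
first_bad="$(find_first_bad 1.0.0 HEAD sh -c '! grep -q BROKEN paper.tex')"
git log -1 --format='%s' "$first_bad"   # prints: commit 4
```

The number of checks grows only logarithmically with the number of commits since the last good tag, which matters when each check is a slow latexdiff run followed by a compilation.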

Automate latexdiff runs with a GitHub workflow

I use the following GitHub workflow:

name: Build LaTeX document & latexdiff to previous tagged version.
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout new version
        uses: actions/checkout@v2
        with:
          path: new
          fetch-depth: 0
      - name: Set last tag value
        run: |
          echo "LAST_TAG_NAME=$(cd new && git describe --tags --abbrev=0 HEAD^)" >> "$GITHUB_OUTPUT"
        id: tag_val_stp
      - name: Checkout old version
        uses: actions/checkout@v2
        with:
          path: old
          ref: ${{ steps.tag_val_stp.outputs.LAST_TAG_NAME }}
      - name: Compile LaTeX document
        uses: xu-cheng/latex-action@v2
        with:
          root_file: paper.tex
          working_directory: new/
      - name: Install latexdiff
        run: sudo apt-get update && sudo apt-get install -y latexdiff
      - name: Run latexdiff
        run: latexdiff -t UNDERLINE --verbose --no-links --flatten old/paper.tex new/paper.tex > new/diff.tex
      - name: Compile LaTeX difference document
        uses: xu-cheng/latex-action@v2
        with:
          root_file: diff.tex
          working_directory: new/
      - name: Check pdf files
        run: |
          file new/diff.pdf | grep -q ' PDF '
      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: paper
          path: |
            new/diff.pdf
            new/paper.pdf

This workflow uploads these files as artifacts, and it also ensures that the latexdiff output always compiles, even if I do not run it locally often enough.

I also receive an email if latexdiff or, worse, the paper compilation stops working.

Another option is to publish the produced files as a website, so that you have fixed links to share the latest version (even with a private repo).

Version the code alongside the paper

Usually my papers contain some numerical applications that rely on code: it might be R, Python, Matlab or Julia.

I split this code into two parts. One part, which I publish somewhere else, is usually a kind of package that is supposed to be usable by others. But there is always a part that generates data and graphs directly for the paper. This part should not go inside the package; it should be versioned alongside the LaTeX document.

I usually use a code/ folder that produces things (images, tables, etc.) inside an assets folder.

This ensures full reproducibility of the research. I do not generally make the source of the paper available before full publication, but when I do, these files sit alongside it and can be used to reproduce my results directly.

Add a this.bib file to the repo that is presented in the readme.

The readme of the repo might not be as useful as for a package project, since not many people will see it: mainly you and your coauthors, since we are talking about private repos.

However, it might be useful to keep a this.bib file alongside the paper, containing the reference to this paper. Create it after the first publication as a preprint, and update it when the paper is published.

The point is that, someday, I will write a script that goes through all my repos, gathers these this.bib files, and populates the publication list on my personal website and CV. But this is work for another day…
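
In the meantime, a first step in that direction could look like the sketch below. It assumes that all the paper repositories are cloned locally as direct subfolders of one directory, and it only concatenates the this.bib files; turning the result into a web page or a CV is left out. The function name collect_bib and the two placeholder entries are made up for the demonstration.

```shell
#!/bin/bash
set -euo pipefail

# collect_bib ROOT OUT: concatenate every ROOT/<repo>/this.bib into OUT,
# in a stable (sorted) order. (Hypothetical helper name.)
collect_bib() {
  local root="$1" out="$2"
  : > "$out"                                   # start from an empty file
  find "$root" -mindepth 2 -maxdepth 2 -type f -name this.bib | sort |
  while IFS= read -r bib; do
    printf '%% from %s\n' "$bib" >> "$out"     # remember where each entry came from
    cat "$bib" >> "$out"
    printf '\n' >> "$out"
  done
}

# Demonstration with two fake repositories and placeholder entries.
root="$(mktemp -d)"
mkdir -p "$root/paper-a" "$root/paper-b"
printf '@misc{paperA, title={Placeholder A}}\n' > "$root/paper-a/this.bib"
printf '@misc{paperB, title={Placeholder B}}\n' > "$root/paper-b/this.bib"

out="$(mktemp)"
collect_bib "$root" "$out"
grep -c '^@' "$out"    # prints: 2
```

Keeping the output file outside the scanned directory avoids any risk of the script picking up its own result on the next run.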

Oskar Laverny
Actuary, Ph.D.

What would be the dependence structure between quality of code and quantity of coffee ?
