2019-11-27

git

Your closest collaborator is you six months ago, but you don’t reply to emails.
    -Paul Wilson

git and Github

  • git and http://github.com make version control and collaboration easier
  • Package development requires iteration and maybe input from multiple sources
  • Ad hoc approaches scale poorly and can have disastrous consequences
  • Here are some examples

Example 1: Reverting changes

  • In an attempt to speed up computation time, I tried reimplementing an MLE function.
  • Turns out it broke the method entirely and didn’t end up speeding things up.
  • How to go back to what it was originally?
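One way to answer that question, sketched in a throwaway repository. The file name mle.R, the commit messages, and the placeholder identity are assumptions made for illustration; `git revert` is only one of several ways to go back.

```shell
set -e
repo=$(mktemp -d)                       # scratch directory, nothing real is touched
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity for the demo
git config user.email "author@example.com"

echo "original implementation" > mle.R  # hypothetical file
git add mle.R
git commit -q -m "Add MLE function"

echo "faster but broken implementation" > mle.R
git commit -q -am "Reimplement MLE function for speed"

# Undo the last commit by adding a new commit that reverses it.
# The failed attempt stays in the history, so nothing is lost.
git revert --no-edit HEAD

restored=$(cat mle.R)
commit_count=$(git rev-list --count HEAD)
echo "mle.R now contains: $restored"
```

Because the revert is itself a commit, the record shows both the experiment and the decision to abandon it.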

Example 2: Incorporating feature requests or edits

  • A package gets popular on github, and I occasionally get feature requests or contributions
  • How to keep track of the requests?
  • How to keep track of contributions from other people?

Example 3: Sharing content

Once a paper gets published, occasionally people want to use or extend the method.

“I would be very grateful if you are able to help me implement this tool in my dataset as well.”

“Could you please send me your code so that I can try to apply it to my example?”

“Would you please kindly e-mail me your article and other works in that field promptly.?”

Email is an ineffective tool for sharing code, data, documents

What is it?

git

  • “The stupid content tracker”, developed to manage Linux source code
    • Files organized into repositories
    • Users commit changes, additions, deletions
    • Entire history of commits saved
    • Each commit is a snapshot, but unchanged files are stored only once and the history is compressed
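A minimal sketch of the repository, commit, history cycle on the command line. The file name notes.txt and the identity are placeholders, and the repository lives in a temporary directory.

```shell
set -e
repo=$(mktemp -d)                       # scratch directory for the demo
cd "$repo"
git init -q                             # turn the directory into a repository
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "first draft" > notes.txt
git add notes.txt                       # stage the new file
git commit -q -m "Add first draft of notes"

echo "second thoughts" >> notes.txt
git commit -q -am "Extend notes with second thoughts"

# The entire history is kept: one line per commit, newest first
git log --oneline
commit_count=$(git rev-list --count HEAD)
echo "commits so far: $commit_count"
```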

Github

  • A web interface and host for repositories
    • Explore repositories
    • View code, documents, etc.
    • Interact with collaborators

Why use it?

  • Add structure to the iterative process of package development
    • are you sensing a trend?
  • Creates a record of all the steps you took
  • Gives structure to collaboration, with benefits for asynchronous work
  • GitHub adds the benefit of creating a web presence
    • Not everyone agrees that this is a benefit

Open up RStudio

How often should I commit?

git is not dropbox!

Each commit should be minimally complete

Minimal

  • Relates to a single problem
  • Makes it easier to describe and understand
  • Reduces risk of conflicts (if collaborating)

Complete

  • Solves the problem in the code
  • Is documented
  • Has tests
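One way to keep a commit minimal when several unrelated edits are sitting in the working directory is to stage only the files that belong to the problem at hand. A sketch, with hypothetical file names and a placeholder identity:

```shell
set -e
repo=$(mktemp -d)                       # scratch directory for the demo
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "sampler code" > sampler.R         # hypothetical files
echo "readme text" > README.txt
git add sampler.R README.txt
git commit -q -m "Add sampler and README"

# Two unrelated edits are made in the same sitting
echo "handle small datasets" >> sampler.R
echo "fix a typo" >> README.txt

# Stage and commit only the change that relates to the sampler
git add sampler.R
git commit -q -m "Ensure sampler works for small datasets"

# The README edit is still waiting for its own commit
left_over=$(git diff --name-only)
echo "not yet committed: $left_over"
```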

Common problems

Oops I forgot to document

  • use “Amend previous commit” before pushing to github
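On the command line, amending might look like the following sketch. The file f.R and its contents are made up for the example; the point is that the commit count stays at one.

```shell
set -e
repo=$(mktemp -d)                       # scratch directory for the demo
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo 'f <- function(x) x + 1' > f.R     # hypothetical function, no documentation yet
git add f.R
git commit -q -m "Add increment function"

# Oops: add the missing documentation line, then fold it into the same commit
printf '%s\n' "#' Add one to x" 'f <- function(x) x + 1' > f.R
git add f.R
git commit -q --amend --no-edit         # rewrite the last commit, keep its message

commit_count=$(git rev-list --count HEAD)
first_line=$(git show HEAD:f.R | head -n 1)
echo "commits: $commit_count"
```

Amending rewrites the commit, which is why it is safest before the commit has been pushed.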

I broke something

  • do a “hard reset” back to the last commit (“Discard All” in RStudio)
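A command-line sketch of the same idea, assuming the breakage has not been committed yet. The file name analysis.R is hypothetical. Note that a hard reset throws away uncommitted edits for good.

```shell
set -e
repo=$(mktemp -d)                       # scratch directory for the demo
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "working code" > analysis.R        # hypothetical file
git add analysis.R
git commit -q -m "Add working analysis"

echo "broken experiment" > analysis.R   # uncommitted change that broke things
git status --short                      # lists analysis.R as modified

# Discard every uncommitted change to tracked files
git reset -q --hard HEAD

contents=$(cat analysis.R)
echo "analysis.R now contains: $contents"
```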

A collaborator is working on something else at the same time

  • use branches and merge later
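A sketch of two lines of work proceeding side by side and being merged afterwards. The branch name collaborator-edits and the file names are invented for the example, and the edits touch different files so no conflict arises.

```shell
set -e
repo=$(mktemp -d)                       # scratch directory for the demo
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "introduction" > paper.tex
git add paper.tex
git commit -q -m "Add draft of paper"
base=$(git rev-parse --abbrev-ref HEAD) # name of the main branch

# The collaborator works on a separate branch
git checkout -q -b collaborator-edits   # hypothetical branch name
echo "simulation code" > simulation.R
git add simulation.R
git commit -q -m "Add simulation study"

# Meanwhile, work continues on the main branch
git checkout -q "$base"
echo "analysis code" > analysis.R
git add analysis.R
git commit -q -m "Add data analysis"

# Later: bring the collaborator's work in
git merge -q --no-edit collaborator-edits

merge_count=$(git rev-list --merges --count HEAD)
ls
```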

Writing good commit messages

  • The diff shows you exactly what changed
  • The commit message should communicate the context

Basic guidelines

  1. Write a concise subject heading (about 50 characters)
    • Complete the following sentence: “If applied, this commit will …”
  2. If more explanation needed, write a body to explain why and how
  3. Reference any relevant issue tracking IDs or links
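The three guidelines can be sketched in a single commit. With `git commit`, the first `-m` becomes the subject and the second becomes the body. The file, the wording of the body, and the issue number #12 are all hypothetical.

```shell
set -e
repo=$(mktemp -d)                       # scratch directory for the demo
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "# sampling function (placeholder content)" > sample.R
git add sample.R

# Subject: short, imperative, completes "If applied, this commit will ..."
# Body: why and how, plus a reference to a (hypothetical) issue
git commit -q \
  -m "Ensure sampling function works for small datasets" \
  -m "Cap the sample size at the number of rows so that requests larger than the data no longer fail. Refs #12."

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "subject (${#subject} characters): $subject"
```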

Examples

Bad                      Better
Fix bug                  Ensure that sampling function works for small datasets
changing documentation   Improve documentation with a vignette tutorial
typo                     Add a comma to README
  • Use the imperative mood (like you’re telling someone what to do)
  • Be concise, but detailed enough to trigger your memory

Collaborating with Github

The lifecycle of a paper

  • I’m working on a paper with my friend Sue
  • Statistical analysis + simulation studies + discussion
  • I decide to take the lead and create an initial repository
  • Add a draft .tex file, some analyses in .R with data and output
  • Commit at appropriate stopping points

Branching and merging

Main branch is called master
Sue creates a new branch called sues-working, then adds, edits, and commits.

Commits do not affect the other branch

[Figure: branch]

Push all of her changes to github, then submit a pull request.
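The push step might look like this sketch. A local bare repository stands in for the GitHub remote so the example runs offline; with a real project, origin would point at the GitHub URL instead. The pull request itself is then opened in the GitHub web interface, not with git.

```shell
set -e
work=$(mktemp -d)                           # scratch directory for the demo
git init -q --bare "$work/remote.git"       # local stand-in for the GitHub repository
git init -q "$work/paper"
cd "$work/paper"
git config user.name "Sue"                  # placeholder identity
git config user.email "sue@example.com"
git remote add origin "$work/remote.git"

echo "draft" > paper.tex
git add paper.tex
git commit -q -m "Add draft of paper"
git push -q -u origin HEAD                  # publish the main branch

# Sue's work happens on her own branch
git checkout -q -b sues-working
echo "discussion section" >> paper.tex
git commit -q -am "Add discussion section"
git push -q -u origin sues-working          # publish the branch for review

remote_branch=$(git ls-remote --heads origin sues-working)
echo "$remote_branch"
```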

Pull request

[Figure: pull request]

Changes

[Figure: pull request changes]

[Figure: merge]

Smaller contributions

[Figure: proposing a change]

Issues

[Figure: issue]

Summary

  • git is a structured approach to tracking content
    • Small commitment to learn and use
    • But benefits are enormous
    • … especially if using plain text files
  • github is a web interface and repository host
    • Adds value to git
    • It’s not Dropbox: commits are formal and structured, with discussion alongside
    • Everything is public
    • Alternatives: bitbucket, gitlab, self-hosting, local-only