Michael Sachs
2018-01-25
Interests:
However, if the selection of \(f\) from \(\mathcal{F}\) involves interacting with \(S\), then the estimate/inference may be biased.
Which ones give valid estimates?
All aspects of \(\mathcal{F}\) should be documented and reported, i.e., how is \(f\) selected?
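A small simulation can make the bias concrete. The sketch below assumes \(S\) is the observed sample and \(\mathcal{F}\) is a family of 200 candidate predictors that are all pure noise; the sample size, the number of predictors, the seed, and the variable names are illustrative choices, not taken from the talk.

```{r selection-bias-sketch}
set.seed(2018)                       # arbitrary seed, for repeatability only
n <- 50                              # sample size (assumed)
p <- 200                             # number of candidate predictors (assumed)
y <- rnorm(n)                        # outcome, unrelated to every predictor
X <- matrix(rnorm(n * p), n, p)      # candidate predictors, all pure noise

# Select f by interacting with S: keep the predictor most correlated with y
cors <- abs(cor(X, y))
best <- which.max(cors)
naive_estimate <- cors[best]         # reported on the same data used to select

# Honest check: evaluate the same selected column on fresh, independent data
y_new <- rnorm(n)
X_new <- matrix(rnorm(n * p), n, p)
honest_estimate <- abs(cor(X_new[, best], y_new))

c(naive = naive_estimate, honest = honest_estimate)
```

Because the naive estimate is the maximum of 200 noisy correlations, it will typically sit well above the value obtained on fresh data, even though the true correlation is zero for every candidate.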
Goals:
knit to create results.doc \(\rightarrow\) .docx
Plain-text formatting. Indicate what elements represent, not how they should look. Minimal, yet flexible. HTML and LaTeX commands are interpreted correctly.
`# headers`, `## subheader`, etc., `> blockquotes`
`_italics_`, `*italics*`, `__bold__`, `**bold**`
`![name](pathtoimage)`, `[text](link)`
`- > +`, `1. 2. 3.`
`$\sum_{i=1}^nX_i/n$` = \(\sum_{i=1}^nX_i/n\)
`[@citekey]`, bibtex, endnote, others supported
Three backticks:
```{r my-first-chunk, results='asis'}
## code goes in here and gets evaluated
```
Inline code uses single backticks.
Here I am using `#r rnorm(1)` to generate a random number: -0.89047. (Omit the pound sign.)
Raw output using the mtcars dataset:
```{r mtcars-example}
summary(lm(mpg ~ hp + wt, data = mtcars))
```
##
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.941 -1.600 -0.182  1.050  5.854
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## hp          -0.03177    0.00903  -3.519  0.00145 **
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
## F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
Generate every in-text number from code.
Easy to reproduce results when data/assumptions change
The provenance of every result is clearly documented
# Format a numeric vector as "mean (sd)", rounded to `digits` decimal places
paste_meansd <- function(x, digits = 2, na.rm = TRUE) {
  paste0(round(mean(x, na.rm = na.rm), digits),
         " (", round(sd(x, na.rm = na.rm), digits), ")")
}
## The mean (sd) of a random sample of normals is `r paste_meansd(rnorm(100))`
The mean (sd) of a random sample of normals is -0.12 (0.99)
# Format an estimate with a normal-approximation (Wald) 95% confidence interval
sprint_CI95 <- function(mu, se) {
  lim <- mu + c(-1.96, 1.96) * se
  sprintf("%.2f (95%% CI: %.2f to %.2f)", mu, lim[1], lim[2])
}
bfit <- lm(hp ~ disp, mtcars)
## The coefficient estimate is `r sprint_CI95(bfit$coeff[2], sqrt(diag(vcov(bfit)))[2])`
The coefficient estimate is 0.44 (95% CI: 0.32 to 0.56).
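The multiplier 1.96 in `sprint_CI95` is the normal approximation. As a rough check, and not something shown in the original slides, the sketch below compares that interval with the t-based interval from `confint()` for the same model; the variable names `wald` and `tbased` are introduced here for illustration.

```{r ci-check-sketch}
bfit <- lm(hp ~ disp, data = mtcars)

est <- unname(coef(bfit)["disp"])              # slope estimate
se  <- unname(sqrt(diag(vcov(bfit)))["disp"])  # its standard error

wald   <- est + c(-1.96, 1.96) * se            # normal approximation, as in sprint_CI95
tbased <- confint(bfit, "disp", level = 0.95)  # uses a t quantile with 30 residual df

rbind(wald = wald, tbased = as.numeric(tbased))
```

With 30 residual degrees of freedom the t-based interval is slightly wider; the two agree to two decimal places here, but the gap grows in smaller samples.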
render('doc.Rmd'): format specified in front matter
render('doc.Rmd', output_format = 'word_document')
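To show how the front matter and the `render()` call interact, here is a minimal sketch. It writes a hypothetical `doc.Rmd` to a temporary directory (the title, body text, and default format are made up for illustration) and renders it only if the rmarkdown package and pandoc are available.

```{r render-sketch}
# A minimal R Markdown file; the YAML front matter sets the default output format
rmd <- c(
  "---",
  "title: \"Results\"",
  "output: html_document",
  "---",
  "",
  "The mean of the sample is `r round(mean(c(1, 2, 3)), 2)`."
)
doc <- file.path(tempdir(), "doc.Rmd")
writeLines(rmd, doc)

if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  # Uses the format named in the front matter (html_document)
  rmarkdown::render(doc, quiet = TRUE)
  # Overrides the front matter and produces a Word document instead
  rmarkdown::render(doc, output_format = "word_document", quiet = TRUE)
}
```

Keeping the default format in the front matter means collaborators get the same output from a bare `render()` call, while `output_format` allows a one-off .docx without editing the file.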
git and GitHub
git and http://github.com make version control and collaboration easier.
A recent paper I worked on used data from a disease registry, which released “frozen” databases quarterly. While I was working on the revisions, a new database was released. I used the new database to update the analysis because it contained the most reliable and up-to-date information. After completing the revisions, I received this email from the lead author (this was in 2013, by the way):
“As you can see from the paper I sent you, it is almost complete and I do not want to re-write it. Therefore, I just want the data described in the e-mail below from the June 1, 2011 data freeze. … Is it possible to reconstruct the data inquiry as per what was originally delivered?”
I had not saved prior versions of the analysis code, not to mention the manuscript with all of the results incorporated into the text. My only option at that point was to start over.
Applied papers that I’ve worked on had between 5 and 13 authors. Inevitably, a “final” draft of the manuscript (usually a Word document) gets circulated via email and comments or suggestions are solicited. Here are the typical types of responses that I get:
The challenge is to incorporate (or not) all of the changes from a variety of collaborators, while keeping a record of who has contributed what.
Once a paper gets published, occasionally people want to use or extend the method.
“I would be very grateful if you are able to help me implement this tool in my dataset as well.”
“Could you please send me your code so that I can try to apply it to my example?”
“Would you please kindly e-mail me your article and other works in that field promptly.?”
Email is an ineffective tool for sharing code, data, documents
git
git is a structured approach to tracking content
Programs are meant to be read by humans and only incidentally for computers to execute.
-Donald Knuth
Your closest collaborator is you six months ago, but you don’t reply to emails.
-Paul Wilson