Save input as CSV file

For information about R-Index, see http://www.r-index.org/.

For information about TIVA, see replicationindex.wordpress.com.

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer.

The test statistics are converted to Cohen's d (or Hedges' g) wherever possible, based on the formulas provided by Borenstein, Hedges, Higgins, & Rothstein (2011). **Warning:** These effect size conversions are based on approximate formulas. Although they work well under many conditions, this cannot replace a proper meta-analysis!
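To make the kind of conversion concrete, here is a rough Python sketch (not the app's actual R code) of the standard approximate formulas for two-group designs with roughly equal group sizes; the function names are made up for illustration:

```python
import math

def t_to_d(t, df):
    """Cohen's d from an independent-samples t test,
    assuming two groups of roughly equal size: d = 2t / sqrt(df)."""
    return 2 * t / math.sqrt(df)

def f_to_d(f, df2):
    """Cohen's d from F(1, df2); such an F is a squared t value."""
    return 2 * math.sqrt(f) / math.sqrt(df2)

def r_to_d(r):
    """Cohen's d from a correlation: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)
```

For example, `t(45)=3.4` implies d = 2(3.4)/sqrt(45) ≈ 1.01 under these assumptions — which is exactly why such conversions are only approximations: they ignore the actual design and group sizes.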

- Everything before a colon is an identifier for the paper. For optimal parsing, it should have the format `XXXX (YYYY) ZZ`. Everything *before* the year in parentheses (i.e., `XXXX`) is an ID for the paper; everything *after* the year is an ID for the study within that paper. Example: `AB&C (2013) Study1`. Test statistics with the same paper and study ID belong together (this is relevant for the R-Index).
- By default, a critical two-tailed p value of .05 is assumed. For one-tailed tests you can add `; one-tailed` (or shorter: `; 1t`) to set the critical p value to .10.
- You can also directly define the critical p: `; crit = .10`, for example.
- You can check whether a p value has been correctly reported by providing the reported p value, for example `p < .05` or `p = .037`.
- In general, all options should be written after the test statistic and separated by semicolons, e.g. `A&B (2001) Study1: t(88)=2.1; one-tailed; p < .02`.
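To make the line format concrete, here is a rough Python sketch of how such a line can be split into its parts (the real app's R parser handles many more cases; the field names here are made up for illustration):

```python
import re

def parse_line(line):
    """Split one input line into study ID, test statistic, and options.
    A minimal sketch, not the app's actual parser."""
    line = line.split("#", 1)[0].strip()          # drop trailing comments
    study_id, _, rest = line.partition(":")       # colon separates ID from the rest
    parts = [p.strip() for p in rest.split(";") if p.strip()]
    result = {
        "study_id": study_id.strip(),
        "focal": not study_id.lstrip().startswith("_"),  # leading _ = not a focal test
        "statistic": parts[0],                    # first datum must be the test statistic
        "one_tailed": any(p in ("one-tailed", "one", "1t") for p in parts[1:]),
    }
    for p in parts[1:]:
        m = re.match(r"crit\s*=\s*([\d.]+)", p)   # e.g. "crit = .10"
        if m:
            result["crit"] = float(m.group(1))
        m = re.match(r"p\s*([<=>])\s*([\d.]+)", p)  # e.g. "p < .02"
        if m:
            result["reported_p"] = (m.group(1), float(m.group(2)))
    return result
```

Because the options come after the statistic and are semicolon-separated, they can be scanned in any order, which matches the format described above.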

Possible test statistics:

- t-values: `t(45)=3.4`
- F-values: `F(1, 25)=4.4`
- Z-values: `Z=2.02`
- chi2-values: `chi2(1)=2.4`
- r-values: `r(188)=0.276`

If two numbers are provided for chi2, the first is the degrees of freedom and the second is the sample size (e.g., `chi2(1, 326) = 3.8`).
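For the check of reported p values, each test statistic is converted to the p value it implies. As a stdlib-only Python illustration for the Z case (the app does this in R, for all statistic types):

```python
from statistics import NormalDist

def p_from_z(z, one_tailed=False):
    """p value implied by a Z statistic (two-tailed by default)."""
    p_one = 1 - NormalDist().cdf(abs(z))
    return p_one if one_tailed else 2 * p_one

# e.g., Z=2.02 implies p of about .043 two-tailed
```

Comparing this implied p value with the reported one (e.g., `p < .05` or `p = .037`) is how reporting errors are flagged.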
p values can also be entered directly, e.g. `p = .034`; if degrees of freedom are provided, as in `p(52) = 0.02`, an approximate effect size is computed as well.

*All* p values can be extracted, both from focal hypothesis tests and from ancillary analyses, such as manipulation checks. But only p values for which precise dfs are reported are extracted (i.e., results such as "Fs < 1, ps > .50" are *not* extracted).

Format:
- Study ID: *test statistic*; [optional] reported *p* value; [optional] critical *p* value; [optional, if one-tailed testing] the keyword one-tailed
- [optional] reported *p* value: e.g., `p = .03` or `p < .05`
- [optional] critical *p* value: e.g., `crit = .10` or `crit = .08`
- [optional, if one-tailed testing]: write the keyword `one-tailed`, or just `one`, or `1t`
- The colon separates the study ID from everything else
- If the study ID starts with an underscore, this test statistic is *not* a focal test (e.g., from a manipulation check, a pre-test, or an ancillary analysis for possible alternative explanations) and will not be included in R-Index or p-curve analyses (but it will be included in the test for correct p values)
- The first datum after the colon must be the test statistic
- All optional information is separated by semicolons and can be given in any order
- At the end of a line, a comment can be written after a # sign (everything after the # is ignored)

- Examples:
- M&E (2005) S1: t(25) = 2.1; p < .05; one-tailed
- M&E (2005) S2: F(1, 45) = 4.56; p = .03 # wrong p value?
- M&E (2005) S3: chi2(1) = 3.7; crit=.10
- _M&X (2011) S1: r(123) = .08; p = .45 # this was a manipulation check (see underscore)

- Be careful if you **copy & paste** results from a PDF:
  - Sometimes there are invisible special characters. They are shown in the app as weird signs and must be removed.
  - The minus sign sometimes looks a bit longer (an "em-dash"). This should be replaced with a standard minus sign.
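A cleanup pass like the following (a Python sketch; the character table is illustrative, not exhaustive) handles the most common PDF copy-and-paste artifacts:

```python
def clean_pasted(text):
    """Remove common PDF copy-paste artifacts before parsing.
    A minimal sketch; extend the table as new characters turn up."""
    replacements = {
        "\u2013": "-",   # en-dash -> ASCII minus
        "\u2014": "-",   # em-dash -> ASCII minus
        "\u2212": "-",   # Unicode minus sign -> ASCII minus
        "\u00a0": " ",   # non-breaking space -> plain space
        "\u200b": "",    # zero-width space (invisible) -> removed
        "\ufeff": "",    # byte-order mark -> removed
    }
    for bad, good in replacements.items():
        text = text.replace(bad, good)
    return text
```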

- Which tests to select in the presence of **interactions**? Some hints from Simonsohn et al.'s (2014) p-curve paper:
  - “When the researcher’s stated hypothesis is that the interaction attenuates the impact of X on Y (e.g., people always sweat more in summer, but less so indoors), the relevant test is whether the interaction is significant (Gelman & Stern, 2006), and hence p-curve must include only the interaction’s p-value. […] Simple effects from a study examining the attenuation of an effect should not be included in p-curve, as they bias p-curve to conclude evidential value is present even when it is not.”
  - “When the researcher’s stated hypothesis is that the interaction reverses the impact of X on Y (e.g., people sweat more outdoors in the summer, but more indoors in the winter), the relevant test is whether the two simple effects of X on Y are of opposite sign and are significant, and so both simple effects’ p-values ought to go into p-curve. The interaction that is predicted to reverse the sign of an effect should not be included in p-curve, as it biases p-curve to conclude evidential value is present even when it is not.”

The sign of a test statistic can indicate the direction of an effect (e.g., `t(123) = -2.8`). This, however, is ignored in all analyses that are currently available (R-Index, TIVA, and p-curve); for now, it is implicitly assumed that all effects go in the predicted direction.

- Significant studies are used to determine the success rate in the R-Index analyses. But sometimes marginally non-significant *p* values (e.g., *p* = .051) are falsely rounded downwards and cross the critical boundary only due to this error (i.e., they are reported as "p < .05"). In this case, the ES does not count as a "success" (see column "significant" in the R-Index tab), as the actual *p* value is not significant. But if the ES has been (falsely) interpreted as significant by the original authors, the critical level can be slightly increased so that the ES is also counted as a success in the R-Index analysis: set, for example, `crit = .055`. This decision (whether "near significant" studies that are falsely interpreted as significant should be counted as "successes" in the R-Index analysis) should be made a priori.
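The rule above reduces to a single comparison; a Python sketch (`counts_as_success` is a made-up name for illustration):

```python
def counts_as_success(p_actual, crit=0.05):
    """Does this result count as a 'success' in the R-Index sense?
    The critical level is adjustable per test statistic."""
    return p_actual < crit

# p = .051 reported as "p < .05":
counts_as_success(0.051)              # False with the default crit = .05
counts_as_success(0.051, crit=0.055)  # True once crit is raised a priori
```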


When you do an actual analysis, remember:

- It is *not OK* to search for single papers which score low on a certain index ("cherry-picking") and to single out these papers. Sampling variation applies to papers as well, and rare combinations of results can occur by chance.
- Always analyze papers with a defensible a priori inclusion criterion, e.g., "all papers from a certain journal issue which have more than 2 studies", or "the 10 most cited papers of a working group".
- Disclose the inclusion rule.
- Take care which p values can be included. p-curve, for example, assumes independence of p values; that means you usually extract only one p value per sample.
- In general: RTFM of the tests you do!

I strongly recommend reading Simonsohn et al.'s (2014) p-curve paper. They give sensible recommendations and rules of thumb about which papers and test statistics to include in an analysis.

This Shiny app implements the R-Index, TIVA, and p-curve analyses.

The p-curve code is to a large extent adapted or copied from Uri Simonsohn (see here). The TIVA code is adapted from Moritz Heene.

Schönbrodt, F. D. (2015).


Schimmack, U. (2014).

I cross-validated the results with p-curve.com and did not find differences (unsurprisingly, as I use Uri's p-curve code to a large extent). With a single click (see the "Export" tab) you can transfer the test statistics to p-curve.com and cross-validate the results yourself. I also checked the results against the R-Index Excel sheet and did not find differences so far.

Nonetheless, this app could contain errors, and a healthy scepticism towards the results is warranted. I always recommend performing some plausibility checks. Feel free to go to the source code and check the validity yourself. If you suspect a bug or encounter errors, please send me an email with your test statistics and a description of the error.

- *Non-hacked JPSP data*: See Simonsohn, Nelson, & Simmons (2014), Figure 3B. Retrieved from http://www.p-curve.com/Supplement/full_pdt.xlsx
- *855 t-tests*: See Wetzels, Matzke, Lee, Rouder, Iverson, & Wagenmakers (2011). Retrieved from http://www.ejwagenmakers.com/2011/effectsize_data.zip
- *Elderly priming*: See Lakens, D. (2014). Professors are Not Elderly: Evaluating the Evidential Value of Two Social Priming Effects Through P-Curve Analyses. Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2381936. Data available at https://osf.io/3urp2/

Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E.-J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests.

Lakens, D. (2014). Professors are not elderly: Evaluating the evidential value of two social priming effects through p-curve analyses. doi: http://dx.doi.org/10.2139/ssrn.2381936. Retrieved from http://ssrn.com/abstract=2381936

- Changed TIVA computation to log(p), which allows much smaller p-values (thanks to Rickard Carlsson @RickCarlsson for pointing out the bug).
- Added power posing p-curve data from Joe Simmons and Uri Simonsohn (see http://datacolada.org/37)
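For background on the TIVA change: TIVA (the Test of Insufficient Variance) converts each p value to a z score and asks whether the z scores vary less than sampling error allows (variance below 1, tested against a chi-square distribution). A stdlib Python sketch of the core computation — converting via the lower tail, z = −Φ⁻¹(p), avoids the loss of precision in 1 − p for very small p, which is the same motivation as the log(p) change in the app itself:

```python
from statistics import NormalDist, variance

def tiva(p_values):
    """Test of Insufficient Variance (sketch): variance of the z scores
    corresponding to a set of p values.  Values well below 1 suggest the
    results vary less than sampling error predicts."""
    # Lower-tail conversion keeps precision for tiny p (cf. log(p) in the app).
    z = [-NormalDist().inv_cdf(p) for p in p_values]
    var_z = variance(z)                 # sample variance, k - 1 in denominator
    chi2_stat = (len(z) - 1) * var_z    # compare to chi-square with k - 1 df
    return var_z, chi2_stat
```

The chi-square p value for `chi2_stat` would come from a chi-square CDF, which the standard library does not provide; the app computes it in R.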

- New "test statistic": You can now directly enter p values (optionally with df in parentheses), based on a suggestion by Roger Giner-Sorolla. If df are provided, an effect size is computed based on an approximate conversion formula (see here).

Examples: `p=.034`, `p(48)=.024`

- Added 33% (or other) theoretical p-curve in plot
- Moved comparison-power-slider to standard controls

- Included Begg's test for publication bias
- Fixed bug in effect size plot
- "Send to p-curve" now links to app4
- Much improved parser (at least 100x faster)

- TODO: code clean-up: clearly separate the inference functions from the UI functions