- New logging functions: `logger()` and `lastlog()` inspired by @levibaruch
- New `test_paper()` for creating paper objects with specific test text
- FReD replication database and associated functions now renamed to `FLoRA()`
- Various bug fixes discovered when running modules on large numbers of papers (e.g., handling when zero references have DOIs)
- Modules "function_check" and "coi_check" reverted to the rtransparent versions (the re-written versions were overinclusive and need more development).
- `reports()` now takes a paperlist and makes a report from each
- New `report_module_run()` and `report_qmd()` break down the `report()` function to allow separation of module output lists and creation of QMD report from them (might be changed to internal functions).
- Ability to select returned columns in `crossref_query()`
- Module "ref_accuracy" now returns info for references with missing DOIs that were found by ref_doi_check
- Module "code_check" split into "repo_check" and "code_check"
- `llm()` allows you to set the model to any provider or provider/model supported by ellmer (must have appropriate *****_API_KEY set in your Renviron)
- `llm()` arguments have changed to align with `ellmer::chat()` arguments
- `llm_models()` now returns models from all platforms for which you have a valid API key set
- The power module uses a new prompt that utilises a JSON schema for power
- Updated report styles
- New `github_links()` function to find github references in a paper.
- `code_check` module very much improved - checks SAS and STATA code in OSF, researchbox, and github repos.
- `power` module much improved
- New modules: `coi_check`, `funding_check`
- New functions `extract_p_values()` and `extract_urls()`, so now no need to use `all_p_values` and `all_urls` modules to get their tables. These modules remain because they are used in demos, but may be deprecated soon.
- Enhanced module help
- "ref_replication" module no longer warns about replications if you have cited them.
- Extensive changes to clean up tests.
- `get_doi()` has been removed in favour of `crossref_query()`, to look up crossref info by bibliographic query, and `crossref_doi()`, to look up crossref info by DOI.
- `scroll_table()` changed arguments. `height` is removed and `scroll_above` changed to `maxrows`. It now paginates above maxrows (default = 2), rather than scrolling within a fixed height. This is a more accessible solution, since scrolling is hard with touchscreens and it's often hard to copy text in a scroll window. We will continually improve this with further user feedback.
- Fixed a bunch of small problems with modules and let the report render even with errors
- Updated the report template with light and dark themes (set to user preference)
- The module `reference_check` is split into `ref_doi_check` and `ref_accuracy`.
- Lots of modules got renamed so they have a consistent format.
- `json_expand()` updated to handle LLM JSON errors more gracefully.
- You can pass arguments to modules via `report()` now with the new `args` argument.
- New `get_prev_outputs()` module helper function
- Updated the vignettes.
- Modules `aspredicted` and `retractionwatch` are removed, as they are superseded by `prereg_check` and `reference_check`.
- The module `nonsignificant_pvalue` has changed to `nonsig_p`
- The default modules in a report have changed.
- A new module report helper, `format_ref()` for displaying references in bibentry or bibtex formats
- The ref column of the bib table in paper objects is now the bibentry for a reference, not just the formatted text. This will allow for more formatting options.
- Efficiency improvements to the OSF functions
- Fixed some confusing parts of the articles that changed when the module output report structure changed.
- Modules are now categorised by section: general, intro, method, results, discussion, reference
- Reports are organised by section
- Display improvement in reports
- Module report improvement (e.g., fixing broken links)
- New example report on the pkgdown website
- Lots of changes for how reports are formatted
- In module output, `summary` is now `summary_table`
- Fixed a bug where some .docx files wouldn't read in (support for Word files is still patchy -- ideally render to PDF)
- New `pubpeer_comments()` function (now vectorised)
- Module helpers: `scroll_table()`, `collapse_section()`, `link()`, `plural()`, `pb()`
- Package name changed to metacheck!
- Fixed a bug in `osf_file_download()` when multiple files have the same name and `ignore_folder_structure = TRUE`.
- `osf_file_download()` should handle errors more gracefully (with warnings, but not fail)
- `openalex()` results now include `abstract`, which parses the abstract_inverted_index for you
- New module: `miscitation` to detect commonly mis-cited papers (a proof-of-concept)
- New module: `power` to detect and classify power analyses (currently being validated)
- New module: `aspredicted` to get structured data from AsPredicted preregistrations (mainly for info)
- `module_template()` creates a module file from a template
- `orcid_person()` gets details from an ORCiD, such as name, emails, country
- `osf_preprint_list()` returns a table of preprints from the OSF optionally filtered by archive and dates created or modified
- Added an API wrapper - it is now possible to run papercheck functions and modules via a REST API. See `inst/plumber/README.md` for details.
- Added documentation and plumber/Docker quickstart for the API
- Changes to `module_find()` to find potential modules in the working directory and ./modules/
- Changes to `effectsize` module so text of the potential effect size is given in `mod_output$table$es` (`mod_output$summary$ttests_n` and `mod_output$summary$Ftests_n` columns removed, as they are just the sum of `*tests_with_es` and `*tests_without_es`)
- `pdf2grobid()` now gives more useful information in the warning if some files do not convert when converting more than one PDF
- Changed parameter names in pdf2grobid to be consistently snake_case (consolidate_headers etc.) whilst keeping backward compatibility for the old camelCase (consolidateHeaders etc.)
- Fixed warning messages in `osf_check` module when there are no OSF links
- Fixed a problem in module_report() that happens when the table returned from module_run() has no rows
- Fixed a bug that crashed the `stat_table()` function; it now generates a summary table when the stat table is empty
- If `expand_text()` doesn't find a text match because sentence location info is missing, it now returns the original text instead of NA
- Fixed a bug that prevented matching xrefs sentences under some circumstances (when there was an initial with a full stop in the citation) -- re-run `read()` on XMLs to update any saved paper objects
- `psychsci` updated for these fixes
- Changed `retractionwatch` internal data to `retractionwatch()` function (alias `rw()`) to support user updating.
- Added new function `rw_date()` so you can find out when retractionwatch was last updated
- New function `rw_update()` lets you update retractionwatch yourself
- `pdf2grobid()` handles `save_path` better if any path components don't exist yet. The argument `save_path` also now can take a vector of the same length as the number of PDFs to convert, so you can specify the name of each output XML.
- `read()` now skips any imports with errors and warns you about them after importing all files
- Fixed a bug that errored on read() when bibentry files don't format correctly
- Function `osf_get_all_pages()` now has a new argument `page_end` to limit the number of pages retrieved (mainly for testing purposes), and is external (previously internal)
- Fixed a bug in `osf_files()` that failed on paths with spaces
- Fixed a bug in `read()` that duplicated entries in xrefs
- `osf_file_download()` now also retrieves files from linked storage
- Removed the last dependency to {osfr} and updated `osf_check_id()` to return expected IDs from various URLs
- OSF functions added to getting started vignette
- Functions that require an API are now tested using httptest
- module_list() doesn't fail if there are any errors in the modules
- Updated `read()` to parse more stupid date formats that turn up in the submission string (and added the unparsed submission string back just in case)
- Completely overhauled how paper objects handle references.
  - the `paper$reference` table is now `paper$bib`
  - the `paper$citations` table is now `paper$xrefs` and also contains information for internal cross-references to figures, tables, footnotes, and formulae
  - the `ref_id` and `bib_id` in both tables is now `xref_id`
  - the `xrefs` table also contains location information (section, div, p, s) for the sentence containing the cross-ref, so you can use `expand_text()`
- The `read()` function now returns paper objects with these new tables, so you will need to re-read any XML files (if you have stored the papercheck list as Rdata)
- The `psychsci` object has been updated for this new format
- Modules and vignettes have been updated as well
- Fixed a bug in `expand_text()` where expanded sentences were duplicated if there are multiple matches from the same sentence in the data frame.
- Updated the `retractionwatch` table
- Fixed a bug in `read()` that omitted paper DOIs from paper$info
- Updated `read()` to add correctly parsed "accepted" and "received" dates to paper$info (replaces paper$submission string) (ISO 8601 is the only correct date format!)
- Updated `psychsci` for new info structure
- Small bug fixes to `osf_file_download()`
- `osf_file_download()` now returns a table of file info, including info for files not downloaded because of file size limits
- Added `read()` function, which supersedes `read_grobid()`, `read_cermine()` and `read_text()` (they are still available, but are now just aliases to `read()`). This should work with XML files in TEI (grobid), JATS APA-DTD, NLM-DTD and cermine formats, plus full text-only parsing of .docx and plain text files.
- Added `osf_file_download()` function, which downloads all files under a project or node and structures them the same as the project.
- Updated `read_grobid()` to classify headers as intro, method, results, discussion with better accuracy (to handle garbled headers)
- Updated `pdf2grobid()` to allow some grobid parameters
- Updated the module "all_p_values" to handle more scientific notation formats
- Functions to check ResearchBox.org (`rbox_links()` and `rbox_retrieve()`) -- very preliminary
- The module "all_p_values" now returns the p-value as a numeric column `p_value` and the comparator as `p_comp`, like "exact_p"
- fixed some bugs in osf and aspredicted functions (mainly around dealing with private or empty projects)
- added rvest dependency for better webpage parsing
- changed name of the column returned by `summarize_contents()` from `best_guess` to `file_category`
- New `aspredicted_links()` and `aspredicted_retrieve()` functions
- New related blog post
- General bug fixes in newer stuff
- Updated license to AGPL (GNU Affero General Public License)
- When reading a paper with `read_grobid()`, the paper$references table now contains new columns for bibtype, title, journal, year, and authors to facilitate reference checks, and more reliably pulls DOIs.
- The `psychsci` set has been updated for the new reference tables
- fixed bug in `info_table()` where adding "id" to the items argument borked the id column
- Added `json_expand()` function to expand JSON-formatted LLM responses
- Updated the LLM examples in the vignettes
- Added `find_project` argument to `osf_retrieve()` to make searching for the parent project optional (it takes 1+ API calls)
- Added `emojis` for convenience
- Revised the OSF functions again!
- Organised the Reference section of the website
- Added some blog posts to the website
- Upgraded the "osf_check" module to give more info
- Totally re-wrote the OSF functions
- New OSF functions and vignette
- Build pkgdown manually
- Fixed a bug in `validate()` that returned incorrect summary stats if the data type of an expected column didn't match the data type of an observed column (e.g., double vs integer)
- Combined the two effect size modules into "effect_size"
- Renamed the module "imprecise_p" to "exact_p" (I keep typo-ing "imprecise")
- Added a loading message
- Added code coverage at https://app.codecov.io/gh/scienceverse/papercheck
- updated "all_p_values" to handle unicode operators like <= or >>
- Updated default llm model to llama-3.3-70b-versatile (old one is being deprecated in August)
- Updated reporting function for modules to show the summary table
- Fixed a bug in `validate()` that returned FALSE for matches if the expected and observed results were both `NA`
- Added two preliminary modules: "effect_size_ttest" and "effect_size_ftest"
- removed the llm_summarise module
- updated `papercheck_app()` to show all modules
- removed the LLM tab from the shiny app
- fixed a bug in `pdf2grobid()` where a custom grobid_url was not used in batch processing
- `psychsci` object updated to use XMLs from grobid 0.8.2, which fixes some grobid-related errors in PDF import
- `validate()` function is updated for the new module structure
- the validation, metascience, and text_model vignettes are updated
- modules can now use relative paths (to their own location) to access helper files
- The way modules are created has been majorly changed -- it is now very similar to R package functions, using roxygen for documentation, instead of JSON format. There is no longer a need to distinguish text search, code, and LLM types of modules, they all use code. The vignettes have been updated to reflect this.
- Modules now return a `summary` table that is appended to a master summary table if you chain modules like `psychsci |> module_run("all_p_values") |> module_run("marginal")`
- The `validate()` function is temporarily removed to adapt the workflow to the new summary tables.
- new `module_help()` function and some help/examples in modules
- new `module_info()` helper function
- new `paperlist()` function to create paper list objects
- paper lists now print as a table of IDs, titles, and DOIs
- updated `read_grobid()` to have fewer false positives for citations
- updated `retractionwatch`
- Now reads in grobid XMLs that have badly parsed figures
- updated the shiny app for recent changes
- `openalex()` takes paper objects, paper lists, and vectors of DOIs as input, not just a single DOI
- fixed paper object naming problem when nested files are not all at the same depth
- added `read_cermine()` and associated internal functions for reading cermine-formatted XMLs
- New functions for exploring github repositories: `github_repo()`, `github_readme()`, `github_languages()`, `github_files()`, `github_info()`
- A new vignette about github functions
- `read_grobid()` now includes figure and table captions, plus footnotes, in the full_text table
- the `psychsci` paper list object is updated to include the above
- The functions that `module_run()` delegates to now check and only pass valid arguments
- modules are now updated for clearer output, and added a new module vignette
- `llm()` no longer returns NA when the rate limit is hit, but slows down queries accordingly
- `read_grobid()` now includes back matter (e.g., acknowledgements, COI statements) in the full_text, so is searchable with `search_text()`
- references are now converted to bibtex format, so are more complete and consistent
- Machine-learning module types are removed (the python/reticulate setup was too complex for many users), and instructions for how to create simple text feature models are included in the metascience vignette
- added `author_table()` to get a dataframe of author info from a list of paper objects
- fixed a bunch of tests now that multiple matches in a sentence are possible
- added back text (acknowledgements, annex, funding notes) to the full_text of a paper
- Fixed a bug in `search_text()` that omitted duplicate matches in the same sentence when using results = "match"
- Upgraded the search string for the "all-p-values" module to not error when a numeric value is followed by "-"
- Error catching for `stats()` related to the above problem (and filed an issue on statcheck)
- URLs in grobid XML are now converted to "" using the source url, not the text url, which is often mangled
- added `psychsci` dataset of 250 open access papers from Psychological Science
- added "all" option to the return argument of `search_text()`
- added `info_table()` to get a dataframe of info from a list of paper objects
- experimental functions for text prediction: `distinctive_words()` and `text_features()`
- Removed ChatGPT and added groq support
- Updated `llm()` and associated functions like `llm_models()`
- Working on div vs section aggregation for `search_text()`
- metascience and batch vignettes
- removed scienceverse as a dependency
- revised validation functions
- added `tl_accuracy()`
- Added `expand_text()`
- Added `validate()` function and vignette