2024-08-24 12:53:32
blog.stephenturner.us
-
I wrote an R package that summarizes recent bioRxiv preprints using a locally running LLM via Ollama+ollamar, and produces a summary HTML report from a parameterized RMarkdown template. The package is on GitHub and you can install it with devtools: https://github.com/stephenturner/biorecap.
-
I published a paper about the package on arXiv: Turner, S. D. (2024). biorecap: an R package for summarizing bioRxiv preprints with a local LLM. arXiv, 2408.11707. https://doi.org/10.48550/arXiv.2408.11707.
-
I wrote both the package and the paper with assistance from LLMs: GitHub copilot for documentation, llama3.1:70b for tests, llama3.1:405b via HuggingFace assistants for drafting, GPT-4o for editing.
I recently started to explore prompting a local LLM (e.g. Llama3.1) from R via Ollama. Last week I wrote about how to do this, with a few contrived examples: (1) trying to figure out what’s interesting about a set of genes, and (2) summarizing a set of preprints retrieved from bioRxiv’s RSS feed. The gene set analysis really was contrived — as I mentioned in the footnote to that post, you’d never actually want to do a gene set analysis this way when there are plenty of well-understood first-principles approaches to gene set analysis.
The second example wasn’t so contrived. I subscribe to bioRxiv’s RSS feeds, along with many other journals and blogs in genetics, bioinformatics, data science, synthetic biology, and others. The fusillade of new preprints and peer-reviewed papers relevant to my work is relentless. Late last year bioRxiv started adding AI summaries to newly published preprints, but this required multiple clicks to get out of my RSS feed onto the paper’s landing page, another to click into the AI summary. I wanted a way to give me a quick TL;DR on all the recent papers published in particular subject areas (e.g., bioinformatics, genomics, or synthetic biology).
Shortly after putting together that one-off demo, on a sultry Sunday afternoon in Virginia too hot to do anything outside, I took the code from that post, generalized it a bit, and created the biorecap package: https://github.com/stephenturner/biorecap.
You can install it with devtools/remotes:
remotes::install_github("stephenturner/biorecap",
dependencies=TRUE)
Create a report with 2-sentence summaries of recent papers published in the bioinformatics, genomics, and synthetic biology sections, using llama3.1:8b running locally on your laptop:
my_subjects
This report will look different each time you run it, depending on what’s been published that day in the sections you’re interested in.
The package documentation provides further instructions on usage.
I mentioned I used LLMs to help write this package. I started out using Positron for package development, but quickly fell back to RStudio because I wanted GitHub copilot integration. Among other things, Copilot is great for quickly writing Roxygen documentation for relatively simple functions. I also used llama3.1:70b running locally via Open WebUI to help me write some of the unit tests using testthat. Starting from the code I worked out in the previous post, it took about 2 hours to get the package working, and another hour or so to write documentation and tests and set up GitHub actions for pkgdown deployment and R CMD checks on PRs to main.
I published a short paper about the package on arXiv: Turner, S. D. (2024). biorecap: an R package for summarizing bioRxiv preprints with a local LLM. arXiv, 2408.11707. https://doi.org/10.48550/arXiv.2408.11707.
I used two LLMs to assist me in writing the package, which itself uses an LLM to summarize research published on bioRxiv. It only makes sense to close the circle and use an LLM to help me write a preprint to publish on arXiv describing the software.
I’ll write a post soon about how to set up a llama3.1:405b-powered chatbot on HuggingFace connected to a GitHub repository to be able to ask questions about the codebase in that repo. It’s free and takes about 60 seconds or less. I started out doing this, asking for help crafting a narrative outline and introduction section for a paper based on the code in the repo. I ended up scrapping this and writing most of the first draft text myself, then using GPT-4o to help with editing, tone, and length reduction. It’s hard to exactly put my finger on it, but what I ended up with still had that feeling of “sounds like it was written by ChatGPT.” I did some of the final editing on my own, and used Mike Mahoney’s arXiv template Quarto plugin to write and typeset the final document.
Support Techcratic
If you find value in our blend of original insights (Techcratic articles and Techs Got To Eat), up-to-date daily curated articles, and the extensive technical work required to keep everything running smoothly, consider supporting Techcratic with Bitcoin. Your support helps me, as a solo operator, continue delivering high-quality content while managing all the technical aspects, from server maintenance to future updates and improvements. I am committed to continually enhancing the site and staying at the forefront of trends to provide the best possible experience. Your generosity and commitment are deeply appreciated. Thank you!
Bitcoin Address:
bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge
Please verify this address before sending any funds to ensure your donation is directed correctly.
Bitcoin QR Code
Your contribution is vital in supporting my efforts to deliver valuable content and manage the technical aspects of the site. To donate, simply scan the QR code below. Your generosity allows me to keep providing insightful articles and maintaining the server infrastructure that supports them.
Privacy and Security Disclaimer
- No Personal Information Collected: We do not collect any personal information or transaction details when you make a donation via Bitcoin. The Bitcoin address provided is used solely for receiving donations.
- Data Privacy: We do not store or process any personal data related to your Bitcoin transactions. All transactions are processed directly through the Bitcoin network, ensuring your privacy.
- Security Measures: We utilize industry-standard security practices to protect our Bitcoin address and ensure that your donations are received securely. However, we encourage you to exercise caution and verify the address before sending funds.
- Contact Us: If you have any concerns or questions about our donation process, please contact us via the Techcratic Contact form. We are here to assist you.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.