The big push for today is to get @sixvable’s PacBio CCS read pull request for Q2-DADA2 ready. Remaining steps: finishing the paper on the addition of PacBio CCS read denoising to DADA2; deciding between refactoring to improve DRYness and maintainability vs. leaving the code as is to avoid monkeying with existing code (I am leaning towards a bit of refactoring that produces the best of both); creating a subset of the Zymo dataset (as recommended by Ben Callahan, the author of DADA2, its PacBio CCS functionality, and the accompanying paper); running the dataset through R and QIIME2 to ensure the same output; and writing some smoke tests that check that QIIME2 gets the same results as R does (DADA2 is already tested extensively in R).
This is a really accurate method: ~50% of 1.5-kilobase 16S rRNA sequencing reads were completely error-free! It is so accurate that the authors ran into too little resolution in the reference datasets. You do have to be careful about chemistry: RSII, pre-P6-C4, and SMRT Portal generated reads do not have the same accuracy. Also, there are almost no homopolymeric repeats in 16S.
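For the R vs. QIIME2 smoke test, one rough way to check "same output" is to compare the sets of denoised sequences from each side. The sketch below assumes both results have been written out as FASTA files with one sequence per line (for example, the QIIME2 representative sequences exported with `qiime tools export`, and the R sequences written to a file by hand). The function name and the file names in the usage comment are hypothetical. Headers and order are ignored, since feature IDs may differ between the two tools.

```shell
#!/usr/bin/env bash
# Sketch: succeed if two FASTA files contain the same set of sequences.
# Assumptions: one sequence per line (no line wrapping); headers start with '>'.
same_sequences() {
  local a b
  # Drop header lines, normalise case, and sort so order does not matter.
  a="$(grep -v '^>' "$1" | tr 'acgt' 'ACGT' | sort -u)"
  b="$(grep -v '^>' "$2" | tr 'acgt' 'ACGT' | sort -u)"
  [ "$a" = "$b" ]
}

# Example (hypothetical file names):
# same_sequences r_asvs.fasta qiime2_export/dna-sequences.fasta && echo "sequences match"
```

This only checks that the same sequences were recovered; comparing per-sample abundances would need a separate check on the feature tables.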
The authors were able to detect differences to the strain level, as opposed to the genus or higher level, consistently.
Oxford Nanopore generates longer but less accurate reads.
Short-read: Cheapest, per-sample depth allows for detection of rare community members.
Shotgun Metagenomics: Species and some variant resolution (reliant on suitable reference datasets); can provide information about functional genetic potential. With deep shotgun sequencing: some de novo assembly of community genomes?
Full Length: Combines the targeting of amplicon sequencing with the resolution achieved by shotgun approaches.
Could also be applied in other domains, such as generating complete oncogene sequences instead of just detecting partial ones, identification of unknown pathogens, and possibly the entire ~5 kb 16S-ITS-23S region, which could encompass entire viral genomes, as well as a variety of other uses.
Fortunately, Benjamin Callahan produced this awesome GitHub repo of the analysis for the paper!
Retrieve the mock Zymo sequencing run from NCBI accession PRJNA521754. The direct download for the raw reads from the sequencer is here.
copy data strip the
generate manifest file
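A minimal sketch of the manifest step, assuming the reads sit in one directory and are named `<sample-id>.fastq.gz` (the directory name and the naming scheme are assumptions, not taken from the dataset). It writes the tab-separated `sample-id` / `absolute-filepath` layout that the `SingleEndFastqManifestPhred33V2` format expects.

```shell
#!/usr/bin/env bash
# Sketch: build a single-end QIIME 2 manifest (V2, tab-separated) from a directory of FASTQ files.
# Assumption: each file is named <sample-id>.fastq.gz.
make_manifest() {
  local fastq_dir="$1"   # directory holding the FASTQ files
  local out="$2"         # manifest path to write
  local abs_dir f
  # Resolve to an absolute path, since the manifest requires absolute file paths.
  abs_dir="$(cd "$fastq_dir" && pwd)" || return 1
  printf 'sample-id\tabsolute-filepath\n' > "$out"
  for f in "$abs_dir"/*.fastq.gz; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when no files match
    printf '%s\t%s\n' "$(basename "$f" .fastq.gz)" "$f" >> "$out"
  done
}

# Example (hypothetical directory name):
# make_manifest zymo_fastq zymo_manifest.tsv
```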
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path zymo_manifest.tsv \
  --output-path zymo_raw.qza \
  --input-format SingleEndFastqManifestPhred33V2
Working on pre-release activities w/Liz and Evan.
As a side note, if any tag-dev on busywork is red, there is a good chance that it is because it has already been run and it does not want to do it again.
Remember there are pre-built docs on
Getting a local copy of a PR:
git fetch $REPO_THE_PR_IS_ON pull/$PR_NUMBER/head:$DESIRED_BRANCH_NAME
git checkout $DESIRED_BRANCH_NAME