Jekyll | 2022-08-15T17:40:45+00:00 | keeganevans.com/feed.xml | Keegan Evans - Home
Blog | 2022-02-23T00:00:00+00:00 | keeganevans.com/2022/02/23/blog
<h2 id="to-start">To start</h2>
<p>Started the day off skiing the first good snow we have had this year with
Joey. A gentle white out and lots of soft, lumpy snow with bits of untouched
powder to be found. I am definitely not as smooth in the soft snow as I
would like yet, but it was fun. Fatter skis would help.</p>
<h2 id="work">Work</h2>
<p>Wrapping up release activities, which, while tedious, took Liz and me a lot less
time than they have historically. Not much else other than responding to forum posts
and some reading:</p>
<ul>
<li><a href="https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c">A One-Stop Shop for Principal Component Analysis</a> turned out to be a pretty good overview of PCA.</li>
<li><a href="https://www.nature.com/articles/s41467-019-12669-6">Species abundance information improves sequence taxonomy classification
accuracy</a></li>
</ul>
Blog | 2022-02-22T00:00:00+00:00 | keeganevans.com/2022/02/22/blog
<p>I am trying wo</p>
Blog | 2022-02-21T00:00:00+00:00 | keeganevans.com/2022/02/21/blog
<p>There were essentially just 2 things that happened today:</p>
<ol>
<li>Finished reading chapter 7 from <em>Digital Computer Electronics</em>.</li>
<li>Finished up the QIIME2-2022.2 release.</li>
</ol>
<p><em>DCE</em> has thus far been a very straightforward look at how computers operate
at the level of the metal. Ch 7 was on flip-flops, which, while not the most
interesting chapter, is a crucial foundational piece for being able to store
information. Working through this book really has improved my mental model of
how a computer fundamentally operates.</p>
<p>Considering that Matt is now on his sabbatical, this release has gone pretty
smoothly. For the most part it feels like Liz and I are able to do most things
and are pretty quick to figure out when we <em>do not know</em>. It is nice to be
able to fail fast.</p>
Blog | 2022-02-18T00:00:00+00:00 | keeganevans.com/2022/02/18/blog
<h3 id="subsampling-script">Subsampling script</h3>
<p>Ok, the subsampling script that I had been planning on working on yesterday did
not really go anywhere then, though I did get the q2-dada2 expected data
output set up. So here we go on the subsampling stuff. Here are some design
points:</p>
<ul>
<li>
<p>Inputs: a manifest of the fastq files and the desired number of reads
from each.</p>
</li>
<li>
<p>create directory to store subsampled files.</p>
</li>
<li>
<p>read files from manifest and subsample them one at a time.</p>
</li>
<li>
<p>If any of the fastqs are zipped, unzip them (maybe unnecessary if <code class="language-plaintext highlighter-rouge">sk-bio</code>
can read compressed files).</p>
</li>
<li>
<p>Open each file using <code class="language-plaintext highlighter-rouge">skbio.io.read</code></p>
</li>
<li>
<p>reservoir sample each file to the target number of reads. This function
should be a generator that takes the read file generator as input.</p>
</li>
<li>
<p>write back out to file in the subsampled directory.</p>
</li>
<li>
<p>compress new file (maybe handled by skbio?)</p>
</li>
<li>
<p>write entry in a subsampled manifest.</p>
</li>
</ul>
Blog | 2022-02-17T00:00:00+00:00 | keeganevans.com/2022/02/17/blog
<h3 id="q2-dada2">q2-dada2</h3>
<p>Work on scripts to automate the production of test data for the smoke-screen
tests in <code class="language-plaintext highlighter-rouge">q2-dada2</code>.</p>
<p>The first of these is used to subset data from larger fastq files, so that
these smaller datasets can be run in a reasonable amount of time. Eventually
this might be added to <code class="language-plaintext highlighter-rouge">dev tools</code>.</p>
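<p>One way the subsetting script could work is reservoir sampling over 4-line fastq records. This is only a minimal sketch: the function names <code class="language-plaintext highlighter-rouge">fastq_records</code> and <code class="language-plaintext highlighter-rouge">reservoir_sample</code> are made up for illustration, and it reads plain lines rather than going through <code class="language-plaintext highlighter-rouge">skbio</code>.</p>

```python
import random
from itertools import islice


def fastq_records(lines):
    """Group an iterable of fastq lines into 4-line records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            # Stop at the end of the input (or at a truncated final record).
            return
        yield record


def reservoir_sample(records, k, rng=random):
    """Yield up to k records chosen uniformly at random in one pass."""
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep record i with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    yield from reservoir
```

<p>Because both pieces are generators, a file handle (for example one from <code class="language-plaintext highlighter-rouge">gzip.open(path, "rt")</code>) can be passed straight to <code class="language-plaintext highlighter-rouge">fastq_records</code> without loading the whole file into memory. Only the sampled records are held at once.</p>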
<p>The other essentially hacks into the <code class="language-plaintext highlighter-rouge">_denoise_helper</code> function and writes the
raw data to <code class="language-plaintext highlighter-rouge">.fasta</code> and <code class="language-plaintext highlighter-rouge">.tsv</code> files when running denoise. This data can then
be used as the expected data for the tests.</p>
<h3 id="helpful-vim-thingy">Helpful Vim thingy:</h3>
<p><code class="language-plaintext highlighter-rouge">:set list</code></p>
<p>shows whitespaces!</p>
Blog | 2022-02-15T00:00:00+00:00 | keeganevans.com/2022/02/15/blog
<h3 id="subsetting-the-data">subsetting the data</h3>
<p>Alright, I got a bit ahead of myself importing the data yesterday. Because I
am using the data to run tests, I want to create a subset containing only
about 20 datapoints. To do this I want to get random data points from the set:</p>
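<pre><code class="language-{python}">import random

# fastq records are 4 lines long, so record headers sit on lines 1, 5, 9, ...
x = 20
selected_elements = []
while x > 0:
    # randrange takes start, stop, and step as separate arguments, not a tuple
    selected_elements.append(random.randrange(1, 309812, 4))
    x -= 1
print(selected_elements)
# Output from one run:
# [140445, 279801, 99949, 128401, 22385, 113897, 7493, 143441, 245621, 220889,
# 155237, 205217, 92565, 77941, 173313, 181325, 263481, 49605, 178705, 153261]
</code></pre>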
<p>Then, save these sequences (all 4 lines!) into a new test data fastq file, put
this in the manifest and import and run the tests on this file.</p>
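<p>Pulling the selected records out could look something like the sketch below. It assumes the numbers are 1-based line numbers of record headers (which is what the <code class="language-plaintext highlighter-rouge">randrange</code> step of 4 suggests); <code class="language-plaintext highlighter-rouge">extract_records</code> is a made-up name and the file names in the comment are placeholders.</p>

```python
def extract_records(lines, header_line_numbers):
    """Yield all 4 lines of each fastq record whose header is on a selected line.

    Line numbers are 1-based, so valid header lines are 1, 5, 9, ...
    """
    wanted = set(header_line_numbers)
    remaining = 0  # lines still to emit for the current record
    for line_number, line in enumerate(lines, start=1):
        if line_number in wanted:
            remaining = 4
        if remaining:
            yield line
            remaining -= 1


# Hypothetical usage with placeholder file names:
# with open("zymo.fastq") as src, open("zymo_subset.fastq", "w") as dst:
#     dst.writelines(extract_records(src, selected_elements))
```

<p>Records come out in file order rather than in the order they were drawn, which should not matter for test data.</p>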
<h3 id="import-test-data">import test data</h3>
<pre><code class="language-{bash}">qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path zymo_manifest.tsv \
--output-path zymo_test_data.qza \
--input-format SingleEndFastqManifestPhred33V2
</code></pre>
Blog | 2022-02-14T00:00:00+00:00 | keeganevans.com/2022/02/14/blog
<h2 id="q2-dada2--pacbio-ccs-reads">Q2-DADA2 + PacBio CCS Reads</h2>
<p>The big push for today is to get
<a href="https://github.com/qiime2/q2-dada2/pull/135">@sixvable</a>’s PacBio CCS read
pull request for Q2-DADA2 ready. Steps to go include: finishing <a href="https://academic.oup.com/nar/article/47/18/e103/5527971?login=false">the
paper</a>
on the addition of PacBio CCS read denoising to DADA2; deciding between
refactoring to improve DRYness and leaving things as is to avoid monkeying with
existing code and to improve maintainability (I am leaning towards a bit of
refactoring that produces the best of both); creating a subset of the Zymo
dataset (as recommended by Ben Callahan, the author of DADA2, its PacBio CCS
functionality, and the accompanying paper); running the dataset through R and
QIIME2 to ensure the same output; and writing some smoke tests that check
that QIIME2 is getting the same results as R does (DADA2 is already tested
extensively in R).</p>
<h3 id="paper">Paper</h3>
<p>This is a really accurate method: ~50% of 1.5 kilobase 16S rRNA sequencing
reads were completely error free! So accurate that the authors encountered too
little resolution in the reference datasets. You do have to be careful about
the chemistry, though: RSII, pre-P6-C4, and SMRT Portal generated reads do not have the
same accuracy. Also, there are almost no homopolymeric repeats in 16S.</p>
<p>The authors were able to detect differences to the strain level, as opposed to
the genus or higher level, consistently.</p>
<p>Oxford Nanopore generates longer but less accurate reads.</p>
<h4 id="advantages-and-disadvantages-of-sequencing-techniques">Advantages and Disadvantages of Sequencing Techniques:</h4>
<ul>
<li>
<p><strong>Short-read</strong>: Cheapest, per-sample depth allows for detection of rare
community members.</p>
</li>
<li>
<p><strong>Shotgun Metagenomics</strong>: Species and some variant resolution (reliant on
suitable reference datasets), can provide information
about functional genetic potential. With deep shotgun sequencing: some
de novo assembly of community genomes?</p>
</li>
<li>
<p><strong>Full Length</strong>: Combines the targeting of amplicon sequencing with the
resolution achieved by shotgun approaches.</p>
</li>
</ul>
<p>Could also be applied in other domains, such as generating complete oncogene
sequences instead of just detecting partial ones, identifying unknown
pathogens, and possibly covering the entire ~5kb 16S-ITS-23S region. Reads of this
length could also encompass entire viral genomes, among a variety of other uses.</p>
<h3 id="producing-a-subset-of-data">Producing a Subset of Data</h3>
<p>Fortunately, Benjamin Callahan produced this <a href="https://github.com/benjjneb/LRASManuscript">awesome GitHub
repo</a> of the analysis for the
paper!</p>
<ol>
<li>
<p>Retrieve the mock Zymo sequencing run from <a href="https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA521754">NCBI Accession
PRJNA521754</a>.
The direct download for the raw reads from the sequencer is
<a href="https://sra-pub-src-1.s3.amazonaws.com/SRR9089357/zymo_CCS_99_9.fastq.gz.1">here</a>.</p>
</li>
<li>
<p>copy the data and strip the <code class="language-plaintext highlighter-rouge">.1</code> suffix.</p>
</li>
<li>
<p>generate manifest file</p>
</li>
<li>
<p>import:</p>
</li>
</ol>
<pre><code class="language-{bash}">qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path zymo_manifest.tsv \
--output-path zymo_raw.qza \
--input-format SingleEndFastqManifestPhred33V2
</code></pre>
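<p>The manifest from step 3 can be written by hand, but a small script is handy once there is more than one file. A minimal sketch, assuming the single-end V2 manifest layout (tab-separated, with <code class="language-plaintext highlighter-rouge">sample-id</code> and <code class="language-plaintext highlighter-rouge">absolute-filepath</code> columns); <code class="language-plaintext highlighter-rouge">write_manifest</code> and the rule for deriving sample IDs from file names are made up for illustration.</p>

```python
import csv
from pathlib import Path


def write_manifest(fastq_paths, manifest_path):
    """Write a single-end manifest with one row per fastq file."""
    with open(manifest_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for path in fastq_paths:
            path = Path(path).resolve()  # the manifest needs absolute paths
            # Hypothetical convention: sample ID is the file name up to the first dot.
            sample_id = path.name.split(".")[0]
            writer.writerow([sample_id, str(path)])
```

<p>For the Zymo file this would give a sample ID of <code class="language-plaintext highlighter-rouge">zymo_CCS_99_9</code>; swap in whatever naming makes sense for the dataset.</p>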
<h2 id="qiime2-20222-release-stuff">QIIME2 2022.2 Release Stuff</h2>
<p>Working on pre-release activities w/Liz and Evan.</p>
<p>As a side note, if any <code class="language-plaintext highlighter-rouge">tag-dev</code> on busywork is red, there is a good chance
that it is because it has already been run and it does not want to do it
again.</p>
<p>Remember there are pre-built docs on <code class="language-plaintext highlighter-rouge">ghost</code>.</p>
<h2 id="ancillary-tidbits">Ancillary Tidbits</h2>
<p>Getting a local copy of a PR:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git fetch $REPO_THE_PR_IS_ON pull/$PR_NUMBER/head:$DESIRED_BRANCH_NAME
git checkout $DESIRED_BRANCH_NAME
</code></pre></div></div>
learning to gui | 2022-02-13T00:00:00+00:00 | keeganevans.com/software/2022/02/13/Learning-to-GUI
<p>I started learning about programming seriously<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> as an adult 3 years ago.
Not quite to the day, but pretty close. I started off with Zed Shaw’s <em>Learn
Python The Hard Way</em>. At the time, this seemed like a great way to get exposed
to what programming was all about in as short a time as possible. I still
think I can stand behind this reasoning.</p>
<p>I found myself diving into learning about programming for 2 reasons:</p>
<ul>
<li>
<p>I was in the middle of taking classes for a GIS certification and all 3
of the classes for the upcoming semester were going to require
programming, which I essentially knew nothing about.</p>
</li>
<li>
<p>I try to keep a loose list in my head of things that I “know” and
things that I don’t. I like to keep one thing from the “don’t” category
moving towards the “know” category most of the time. 2018 was the year
where the top 2 things on the list were “computers” and “vehicles”.</p>
</li>
</ul>
<p>I have come a long way since then. However, nearly everything that I have done
since then has “looked” more or less the same, meaning that I write something
up to be called directly from the command line, like nearly all programming
tutorials show. I have never really gotten into the web side of things (though
that needs to change soon…). That is to say, I have never made anything that
most average people would recognize as a program or app.</p>
<p>I got inspired to change this during a recent Rust class that I did with some
of my coworkers. The class mostly consisted of working through <a href="https://doc.rust-lang.org/book/">The Rust
Programming Language</a> together.
We each did a very simple final project that would give us the chance to
actually build something rather than working problems from the book and rustlings
exercises. I chose to do a Pomodoro timer, as this is a tool that I find very
helpful for my day to day work.</p>
<p>Coming into this process, I really have no clue about <em>how</em> GUIs actually
work. It seemed like you would need some kind of loop to monitor changes to
state and somehow send these to the screen. Based on some cursory research,
this turns out to be the case, but the whole process is pretty
complicated…so let’s get learning.</p>
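<p>To make the “loop that watches state and sends it to the screen” idea concrete, here is a toy version in Python with no real windowing toolkit involved. Everything in it (<code class="language-plaintext highlighter-rouge">TimerState</code>, the <code class="language-plaintext highlighter-rouge">"toggle"</code> and <code class="language-plaintext highlighter-rouge">"tick"</code> events, rendering to a string) is made up for illustration. Real GUI frameworks are a lot more involved.</p>

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TimerState:
    seconds_left: int = 25 * 60  # one Pomodoro
    running: bool = False


def update(state, event):
    """Apply one event to the state."""
    if event == "toggle":
        state.running = not state.running
    elif event == "tick" and state.running and state.seconds_left > 0:
        state.seconds_left -= 1
    return state


def render(state):
    """Turn the state into what would be drawn; here it is just a string."""
    minutes, seconds = divmod(state.seconds_left, 60)
    status = "running" if state.running else "paused"
    return f"{minutes:02d}:{seconds:02d} ({status})"


def run(events, state=None):
    """Drain a queue of events, re-rendering after each one."""
    if state is None:
        state = TimerState()
    queue = deque(events)
    frames = []
    while queue:
        state = update(state, queue.popleft())
        frames.append(render(state))
    return frames
```

<p>Feeding it <code class="language-plaintext highlighter-rouge">["tick", "toggle", "tick", "tick"]</code> ignores the first tick because the timer is paused, then counts down from 25:00 to 24:58. A real GUI would block waiting for events from the OS instead of draining a fixed list.</p>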
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Maybe let’s not take the use of the word seriously here too seriously… <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Blog | 2021-02-15T00:00:00+00:00 | keeganevans.com/2021/02/15/blog
<ul>
<li>
<p>Worked through SICP exercise 1.19 more carefully and
think that I finally at least sort of grok what is going on. I at
least was able to get the correct values for $q_1$ and $p_1$ so that
the function works correctly.</p>
</li>
<li>
<p>Yep, more playing with snippets and $\LaTeX$. I also plan on
documenting how I got this setup.</p>
</li>
</ul>
Blog | 2021-02-14T00:00:00+00:00 | keeganevans.com/2021/02/14/blog
<ul>
<li>
<p>Finished the Real Python course on modules and packages. I had signed
up for a subscription to this site. After trying a couple of other
courses on here I canceled my subscription, as the quality of the
courses seems to be somewhat variable (with the majority being quite
good, but definitely a few that at least lack in editing and
production value) and all of the material feels
quite superficial. To be fair, they have really good overviews of
topics, but $20/month will also buy you a lot of good programming
books that will be much more in-depth and that can serve as references
rather than just an introduction to the topic. I hope that my feeling
this way is an indicator of my growth as a programmer rather than a
criticism of the material itself. It is nice to feel like I finally
know enough to be able to distinguish whether a particular piece of
material is too easy/something I already understand well, just right
for expanding my understanding, or will require some prerequisite
study before I can really understand it. I will take it as a mild
indicator of competence.</p>
</li>
<li>
<p>More work on getting snippets, $\LaTeX$, markdown preview, and Conjure/Racket working in
my Nvim setup. These are another set of surprisingly time-consuming
things to get working correctly.</p>
</li>
<li>
<p>Finished <code class="language-plaintext highlighter-rouge">ConjureSchool</code> and looked over the documentation for
<code class="language-plaintext highlighter-rouge">Conjure</code>. One of the more helpful commands that I found was
<code class="language-plaintext highlighter-rouge"><localleader>cs</code> which starts a new Racket REPL when inside of a
Racket file inside of Vim.</p>
</li>
</ul>