Extract total RNA with double elution for maximal template recovery.
cDNA synthesis and 30 cycles of PCR using life technologies SuperScript® III One-Step RT-PCR System with Platinum® Taq High Fidelity DNA Polymerase. PCR is done separately for six overlapping fragments labeled F1 through F6.
Library preparation using Illumina Nextera XT® sample preparation kit, followed by column purification. Size selection with the Sage Science BluePippin® with cutoffs at 550 and 900 bp. These sizes include the library adaptors.
Sequencing on illumina MiSeq® using 500 or 600 cycles kit.
Mapping and cleaning of the sequencing reads (in house Python application).
Sequencing quality:
The graphs show the fraction of bases sequenced with a quality better than the indicated threshold for each position in read 1 and read 2 (this run used a 3x250 cycle kit). Phred qualities are estimate by CASAVA 1.8 during base calling. Things to note:
a slow linear decrease even for the threshold Q10. This is due to short inserts escaping size selection
quality deteriorates towards the end of read 2, likely due to phasing errors and optical overlap of bridge-amplified clusters on the flow cell.
We typically used a Phred quality threshold of 30 for our analysis, corresponding to a miscalling rate of 0.1%.
Sequencing errors:
A reference HIV-1 strain expressed from a plasmid (NL4-3) was amplified and sequenced to measure sequencing errors and, to some extent, PCR errors. The error level was around 0.1-0.2%.
The six fragments F1 to F6 overlap between 200 and 350 bases
Variants in these overlaps are sequenced in replicate in a completely independent fashion (starting at reverse transcription)
The figure shows the correlation between variant frequencies in the leading and trailing fragments, deviations from the diagonal signal PCR bias or low template input
Variant frequencies in overlaps are used to quantify the accuracy of frequency estimation for each sample
For each sample and fragment, we quantify the variant frequency accuracy by an effective template number. Effective template numbers are several hundreds for most samples and fragments, but tend to be low for F5.