EvolBioInf/o111h8


Serotype O111 belongs to the “Big Six” Escherichia coli taxa that cause food poisoning in the US and beyond. In this tutorial we analyze the subtype carrying the H8 flagellar antigen in the context of all E. coli genomes, with a view to designing serotype-specific diagnostic markers. This analysis is part of a forthcoming manuscript entitled “Sorting Bacterial Genomes Below the Species Rank”.

Authors

Beatriz Vieira Mourato, Sara-Lena Welk, Fabian Klötzl, and Bernhard Haubold

Dependencies

This tutorial has six main dependencies:

  • Neighbors for finding target and neighbor genomes
  • `datasets` for downloading genomes
  • Fur for finding unique genome regions
  • Prim for designing PCR primers
  • Biobox for general sequence manipulation
  • the Unix tools `curl`, `bzip2`, and `zip` for downloading and decompressing files

If you are on a Debian-based system such as Ubuntu (including Ubuntu under WSL), you can install these dependencies into `~/bin/` by running the following command from inside the `o111h8` repository:

bash scripts/setup.sh

Once that’s done, make sure `~/bin/` is in your `PATH` by running

source ~/.profile
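To confirm the setup worked, you can check that `~/bin` is on your `PATH` and that the programs are found. The sketch below is not part of the repository: `check_tools` and `home_bin_on_path` are hypothetical helpers, and only the tools named explicitly above (`datasets`, `curl`, `bzip2`, `zip`) are checked. Neighbors, Fur, Prim, and Biobox each install several programs whose names are not listed here.

```shell
# Hypothetical helper (not part of o111h8): print each program that
# cannot be found on the PATH; return the number of missing programs.
check_tools() {
    local missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Hypothetical helper: succeed if $HOME/bin is one of the PATH entries.
home_bin_on_path() {
    case ":$PATH:" in
        *":$HOME/bin:"*) return 0 ;;
        *) return 1 ;;
    esac
}

home_bin_on_path || echo "~/bin is not on your PATH"
check_tools datasets curl bzip2 zip || echo "install the missing tools before continuing"
```

Both checks only report problems. They never modify your environment.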

We have tested this setup in our “minimal box” Docker container, `mix`.

Download and Construct Data Set

Execute

make data

to generate the directory `data` and calculate the Neighbors database inside it. This takes approximately 2.5 minutes and produces 166 warnings, which you can safely ignore. It results in four relevant files inside `data`:

  • `neidb`, the Neighbors database calculated from the GenBank assemblies, RefSeq assemblies, and the taxonomy database downloaded from the NCBI on 17th June 2026
  • `eco.json`, the genome summaries of the 7884 E. coli genomes with assembly level “complete”
  • `eco.nwk`, the tree of the 7386 complete E. coli genomes that passed the quality filter
  • `sero.txt`, the serotypes of the 7386 complete E. coli genomes calculated with `ectyper`
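As a quick sanity check, `eco.nwk` should contain 7386 leaves, one per genome that passed the quality filter. The sketch below (`count_leaves` is a hypothetical helper, not part of the repository) relies on the fact that in a single Newick tree each comma separates two sibling subtrees, so the number of leaves is the number of commas plus one. It assumes the file holds exactly one tree and that no leaf label contains a comma.

```shell
# Hypothetical helper: count the leaves of the single Newick tree in "$1".
# Each internal node with k children contributes k-1 commas, which sums
# to (number of leaves - 1) over the whole tree, so leaves = commas + 1.
# Assumes no label (quoted or not) contains a comma.
count_leaves() {
    local commas
    commas=$(tr -cd ',' < "$1" | wc -c)
    echo $((commas + 1))
}

if [ -f data/eco.nwk ]; then
    count_leaves data/eco.nwk
fi
```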

Tutorial

Build the tutorial directory and change into it:

make tutorial

The tutorial directory now contains the scripts and data files needed to follow the tutorial described in the documentation.

About

Genome Analysis of Escherichia coli O111:H8
