README for NCBI Microbial Genomes
_______________________________________________________________________

National Center for Biotechnology Information (NCBI)
National Library of Medicine
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894, USA

tel: (301) 496-2475
fax: (301) 480-9241
email: info@ncbi.nlm.nih.gov (for general questions)
email: genomes@ncbi.nlm.nih.gov (for specific questions)
_______________________________________________________________________

=======================================================================
Announcement:

Please be aware that NCBI is initiating changes to its prokaryotic
genomic resources because of the large number of genomic sequences being
deposited.

1. At some point in the future, strain-level taxids will no longer be
   issued for every prokaryotic organism whose genome is sequenced.

2. The Reference Sequence (RefSeq) resources will no longer contain a
   record matching every GenBank prokaryotic genomic sequence (both
   complete and WGS genomes). It is likely that only a subset of
   representatives from a given species will be used as RefSeq records.
   This will affect numerous Entrez databases, NCBI resources and tools,
   and a variety of FTP archives.
    - GenBank sequences that are not made into RefSeqs become neighbors
      of a specific RefSeq record and/or of an entire species
      (e.g., CP001637 is a neighbor of NC_000913).

3. For WGS genomes, the current restrictions are:
    - WGS genomes without annotation are not copied into RefSeq.
    - If no annotation has been submitted after 6 months, NCBI will make
      a RefSeq copy and annotate the genome.
    - WGS genomes with more than 200 contigs are not copied into RefSeq.
    - NCBI is examining cases of multiple strains of the same species
      with respect to WGS and complete genomes.

NCBI is currently examining how best to deal with these issues and will
make additional announcements on this topic.
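The WGS restrictions in item 3 can be restated as a small decision rule. The Python sketch below is purely illustrative and is not an NCBI tool or API: the WgsGenome record and refseq_action function are hypothetical, and the thresholds are copied from the list above. Two interpretive choices are assumptions: that the 200-contig limit takes precedence over everything else, and that "after 6 months" means six or more months since submission. It also ignores item 2, so "eligible" does not mean a RefSeq record will necessarily be made.

```python
from dataclasses import dataclass

# Thresholds taken from the announcement; a 2010 snapshot, not current policy.
MAX_CONTIGS = 200
ANNOTATION_GRACE_MONTHS = 6


@dataclass
class WgsGenome:
    """Hypothetical summary of a submitted WGS genome."""
    contigs: int
    has_annotation: bool
    months_since_submission: float


def refseq_action(genome: WgsGenome) -> str:
    """Restate the announced WGS restrictions as one decision (illustrative only).

    Assumptions: the contig limit is checked first, and "after 6 months"
    is read as ANNOTATION_GRACE_MONTHS or more months since submission.
    """
    if genome.contigs > MAX_CONTIGS:
        return "not copied: too many contigs"
    if genome.has_annotation:
        # Eligible only; item 2 says RefSeq may keep just a subset per species.
        return "eligible for RefSeq copy"
    if genome.months_since_submission >= ANNOTATION_GRACE_MONTHS:
        return "NCBI makes RefSeq copy and annotates"
    return "not copied: awaiting annotation"


# Unannotated, 150 contigs, submitted 7 months ago.
print(refseq_action(WgsGenome(contigs=150, has_annotation=False, months_since_submission=7)))
# NCBI makes RefSeq copy and annotates
```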
=======================================================================

This FTP directory contains data from complete microbial genomes:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
    Updated every week on Friday

Related directories:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria_DRAFT/
    Updated every week on Saturday
ftp://ftp.ncbi.nih.gov/genomes/Fungi/
    Updated every week on Thursday
ftp://ftp.ncbi.nih.gov/genomes/MITOCHONDRIA/
    Updated every day
ftp://ftp.ncbi.nih.gov/genomes/Viruses/
    Updated every day
ftp://ftp.ncbi.nih.gov/genomes/Plasmids/
    Updated every week on Monday
ftp://ftp.ncbi.nih.gov/genomes/Chloroplasts/plastids/plastids/
    Updated every day
ftp://ftp.ncbi.nih.gov/genomes/HUMAN_MICROBIOM/
    Special directory of genomes derived from the NIHGRI Human
    Microbiome Project
    Updated every Sunday

The data for each microbial genome are contained in a separate folder.
Directory names usually, but not always, approximate the organism names
used in the NCBI Taxonomy database. For duplicate projects, the project
ID is appended to the directory name to tell them apart.

NOTE: Data submissions to a project may span a significant amount of
time, so the data content for a particular genome may change over that
period. In particular, when only one replicon is submitted initially and
further replicons are submitted much later, the way the project, genome
and data are organized may change.

Provided for each chromosome/plasmid:
 - Summary data (.rpt)
 - GenBank (.gbk), ASN.1 (.asn), binary ASN.1 (.val) and FASTA (.fna)
   format sequences
 - FASTA format gene (.ffn) and protein (.faa) sequences
 - Gene and protein information (location, strand, product, etc.)
   (.gff and .ptt)
 - GeneMark, GeneMarkHMM, Glimmer and Prodigal gene predictions
 - A list of updates since the last release (.rps)

NOTE: The *.gff files may contain errors. They are provided as is, with
no guarantee that the data are correct.
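The nucleotide (.fna) and protein (.faa) sequence files, like the per-gene sequence file, are plain FASTA text, so they can be read without special tools. The Python sketch below is a minimal reader that assumes nothing about the header layout beyond the leading '>'; the read_fasta name and the sample records are made up for illustration.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is the text after '>' with surrounding whitespace removed;
    the sequence lines that follow are joined without line breaks.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        # Any text before the first '>' header is ignored.
    if header is not None:
        yield header, "".join(chunks)


# Made-up records; with a real file, pass open("<file>.faa") instead.
example = """>example_1 hypothetical protein
MKT
AYIA
>example_2 hypothetical protein
MSS
"""
for header, seq in read_fasta(io.StringIO(example)):
    print(header.split()[0], len(seq))
# example_1 7
# example_2 3
```

Because read_fasta is a generator, it streams one record at a time, which keeps memory use low even for large chromosome files.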
The same data are also provided for all genomes as gzipped tar archives
(all.xxxx.tar.gz).

Genome project information for complete and partial genomes is provided
in the lproks text files.

Summary data for all complete genomes (RefSeq) are provided in
summary.txt.

Genomes (chromosomes and plasmids) in GenBank that are not made into
RefSeqs are listed by gi and taxid in SameSpecies.gi. They also appear
in the lproks_1.txt file as entries with no RefSeq accession listed.

Cluster data for complete and partial genomes are in the genomes/CLUSTERS
directory.

Genome data are updated daily; cluster data are updated quarterly to
semi-annually.

Edited Oct 22, 2010
NCBI
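Because each all.xxxx.tar.gz archive bundles many per-genome files, it can be inspected without unpacking it to disk. The Python sketch below uses only the standard tarfile module to count the files in an opened archive by extension. The real archives' internal layout is not assumed: the demo builds a small archive in memory, and its member names (Genome_A/NC_A.faa and so on) and the helper names count_by_extension and make_demo_archive are hypothetical.

```python
import io
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(archive: tarfile.TarFile) -> Counter:
    """Count the regular files in an opened tar archive by file extension."""
    counts: Counter = Counter()
    for member in archive.getmembers():
        if member.isfile():
            suffix = PurePosixPath(member.name).suffix or "(none)"
            counts[suffix] += 1
    return counts


def make_demo_archive() -> io.BytesIO:
    """Build a small gzipped tar in memory; member names are made up."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in ["Genome_A/NC_A.faa", "Genome_A/NC_A.fna", "Genome_B/NC_B.faa"]:
            data = b">demo\nACGT\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


# With a downloaded archive, use tarfile.open(path, "r:gz") instead.
with tarfile.open(fileobj=make_demo_archive(), mode="r:gz") as tar:
    print(dict(count_by_extension(tar)))
# {'.faa': 2, '.fna': 1}
```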