|   | cai | 
Please help by correcting and extending the Wiki pages.
cai calculates the Codon Adaptation Index for a given nucleotide sequence, given a reference codon usage table. The CAI index is a simple, effective measure of synonymous codon usage bias. It index assesses the extent to which selection has been effective in moulding the pattern of codon usage. In that respect it is useful for predicting the level of expression of a gene, for assessing the adaptation of viral genes to their hosts, and for making comparisons of codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
The CAI index uses a reference set of highly expressed genes from a species to assess the relative merits of each codon. A score for a gene sequence is calculated from the frequency of use of all codons in that gene sequence.
| % cai TEMBL:AB009602 Calculate codon adaptation index Codon usage file [Eyeast_cai.cut]: Output file [ab009602.cai]: | 
Go to the input files for this example
Go to the output files for this example
| 
Calculate codon adaptation index
Version: EMBOSS:6.6.0.0
   Standard (Mandatory) qualifiers:
  [-seqall]            seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -cfile              codon      [Eyeast_cai.cut] Codon usage table name
  [-outfile]           outfile    [*.cai] Output file name
   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:
   "-seqall" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -scircular1         boolean    Sequence is circular
   -squick1            boolean    Read id and sequence only
   -sformat1           string     Input sequence format
   -iquery1            string     Input query fields or ID list
   -ioffset1           integer    Input start position offset
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name
   "-cfile" associated qualifiers
   -format             string     Data format
   "-outfile" associated qualifiers
   -odirectory2        string     Output directory
   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
 | 
| Qualifier | Type | Description | Allowed values | Default | 
|---|---|---|---|---|
| Standard (Mandatory) qualifiers | ||||
| [-seqall] (Parameter 1) | seqall | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required | 
| -cfile | codon | Codon usage table name | Codon usage file in EMBOSS data path | Eyeast_cai.cut | 
| [-outfile] (Parameter 2) | outfile | Output file name | Output file | <*>.cai | 
| Additional (Optional) qualifiers | ||||
| (none) | ||||
| Advanced (Unprompted) qualifiers | ||||
| (none) | ||||
| Associated qualifiers | ||||
| "-seqall" associated seqall qualifiers | ||||
| -sbegin1 -sbegin_seqall | integer | Start of each sequence to be used | Any integer value | 0 | 
| -send1 -send_seqall | integer | End of each sequence to be used | Any integer value | 0 | 
| -sreverse1 -sreverse_seqall | boolean | Reverse (if DNA) | Boolean value Yes/No | N | 
| -sask1 -sask_seqall | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N | 
| -snucleotide1 -snucleotide_seqall | boolean | Sequence is nucleotide | Boolean value Yes/No | N | 
| -sprotein1 -sprotein_seqall | boolean | Sequence is protein | Boolean value Yes/No | N | 
| -slower1 -slower_seqall | boolean | Make lower case | Boolean value Yes/No | N | 
| -supper1 -supper_seqall | boolean | Make upper case | Boolean value Yes/No | N | 
| -scircular1 -scircular_seqall | boolean | Sequence is circular | Boolean value Yes/No | N | 
| -squick1 -squick_seqall | boolean | Read id and sequence only | Boolean value Yes/No | N | 
| -sformat1 -sformat_seqall | string | Input sequence format | Any string | |
| -iquery1 -iquery_seqall | string | Input query fields or ID list | Any string | |
| -ioffset1 -ioffset_seqall | integer | Input start position offset | Any integer value | 0 | 
| -sdbname1 -sdbname_seqall | string | Database name | Any string | |
| -sid1 -sid_seqall | string | Entryname | Any string | |
| -ufo1 -ufo_seqall | string | UFO features | Any string | |
| -fformat1 -fformat_seqall | string | Features format | Any string | |
| -fopenfile1 -fopenfile_seqall | string | Features file name | Any string | |
| "-cfile" associated codon qualifiers | ||||
| -format | string | Data format | Any string | |
| "-outfile" associated outfile qualifiers | ||||
| -odirectory2 -odirectory_outfile | string | Output directory | Any string | |
| General qualifiers | ||||
| -auto | boolean | Turn off prompts | Boolean value Yes/No | N | 
| -stdout | boolean | Write first file to standard output | Boolean value Yes/No | N | 
| -filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N | 
| -options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N | 
| -debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N | 
| -verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y | 
| -help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N | 
| -warning | boolean | Report warnings | Boolean value Yes/No | Y | 
| -error | boolean | Report errors | Boolean value Yes/No | Y | 
| -fatal | boolean | Report fatal errors | Boolean value Yes/No | Y | 
| -die | boolean | Report dying program messages | Boolean value Yes/No | Y | 
| -version | boolean | Report version number and exit | Boolean value Yes/No | N | 
| 
ID   AB009602; SV 1; linear; mRNA; STD; FUN; 561 BP.
XX
AC   AB009602;
XX
DT   15-DEC-1997 (Rel. 53, Created)
DT   14-APR-2005 (Rel. 83, Last updated, Version 2)
XX
DE   Schizosaccharomyces pombe mRNA for MET1 homolog, partial cds.
XX
KW   MET1 homolog.
XX
OS   Schizosaccharomyces pombe (fission yeast)
OC   Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC   Schizosaccharomycetes; Schizosaccharomycetales; Schizosaccharomycetaceae;
OC   Schizosaccharomyces.
XX
RN   [1]
RP   1-561
RA   Kawamukai M.;
RT   ;
RL   Submitted (07-DEC-1997) to the INSDC.
RL   Makoto Kawamukai, Shimane University, Life and Environmental Science; 1060
RL   Nishikawatsu, Matsue, Shimane 690, Japan
RL   (E-mail:kawamuka@life.shimane-u.ac.jp, Tel:0852-32-6587, Fax:0852-32-6499)
XX
RN   [2]
RP   1-561
RA   Kawamukai M.;
RT   "S.pmbe MET1 homolog";
RL   Unpublished.
XX
DR   EnsemblGenomes; SPCC1739.06c; Schizosaccharomyces_pombe.
DR   EnsemblGenomes; SPCC1739.06c.1; Schizosaccharomyces_pombe.
DR   PomBase; SPCC1739.06c.
DR   PomBase; SPCC1739.06c.1.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..561
FT                   /organism="Schizosaccharomyces pombe"
FT                   /mol_type="mRNA"
FT                   /clone_lib="pGAD GH"
FT                   /db_xref="taxon:4896"
FT   CDS             <1..275
FT                   /codon_start=3
FT                   /transl_table=1
FT                   /product="MET1 homolog"
FT                   /db_xref="GOA:O74468"
FT                   /db_xref="InterPro:IPR000878"
FT                   /db_xref="InterPro:IPR003043"
FT                   /db_xref="InterPro:IPR006366"
FT                   /db_xref="InterPro:IPR012066"
FT                   /db_xref="InterPro:IPR014776"
FT                   /db_xref="InterPro:IPR014777"
FT                   /db_xref="InterPro:IPR016040"
FT                   /db_xref="UniProtKB/Swiss-Prot:O74468"
FT                   /protein_id="BAA23999.1"
FT                   /translation="SMPKIPSFVPTQTTVFLMALHRLEILVQALIESGWPRVLPVCIAE
FT                   RVSCPDQRFIFSTLEDVVEEYNKYESLPPGLLITGYSCNTLRNTA"
XX
SQ   Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other;
     gttcgatgcc taaaatacct tcttttgtcc ctacacagac cacagttttc ctaatggctt        60
     tacaccgact agaaattctt gtgcaagcac taattgaaag cggttggcct agagtgttac       120
     cggtttgtat agctgagcgc gtctcttgcc ctgatcaaag gttcattttc tctactttgg       180
     aagacgttgt ggaagaatac aacaagtacg agtctctccc ccctggtttg ctgattactg       240
     gatacagttg taataccctt cgcaacaccg cgtaactatc tatatgaatt attttccctt       300
     tattatatgt agtaggttcg tctttaatct tcctttagca agtcttttac tgttttcgac       360
     ctcaatgttc atgttcttag gttgttttgg ataatatgcg gtcagtttaa tcttcgttgt       420
     ttcttcttaa aatatttatt catggtttaa tttttggttt gtacttgttc aggggccagt       480
     tcattattta ctctgtttgt atacagcagt tcttttattt ttagtatgat tttaatttaa       540
     aacaattcta atggtcaaaa a                                                 561
//
 | 
| Sequence: AB009602 CAI: 0.188 | 
cai requires a reference codon usage table prepared from a set of genes which are known to be highly expressed. This is specified by the -cfile option and must exist in the EMBOSS data directory. The default codon usage table Eyeastcai.cut is the standard set of Saccharomyces cerevisiae highly expressed gene codon frequiencies. Another table (Eschpo_cai.cut) was prepared from a set of Schizosaccharomyces pombe genes by Peter Rice for the S. pombe sequencing team at the Sanger Centre, and is available in the EMBOSS data directory. You should prepare your own codon usage table for your organism of interest.
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".
The directories are searched in the following order:
Codons are nucleotide triplet that encode an amino acid residue in a polypeptide chain. There are four possible nucleotides in DNA; adenine (A), guanine (G), cytosine (C) and thymine (T), therefore 64 possible triplets to encode the 20 amino acids plus the translation termination signal. The encoding is therefore redundant, with all but two amino acids coded for by more than one triplet. Organisms often have a particular preference for one of the possible codons for a given amino acid.
Codon preferences reflect a balance between mutational bias and selection for efficiency of translation. In fast-growing microorganisms there are optimal codons that reflect the composition of the genomic tRNA pool and probably help achieve faster translation rates and high accuracy. Such selection is expected to be strong in highly expressed genes, as is the case for Escherichia coli or Saccharomyces cerevisiae. In contrast, codon usage optimization is normally absent in organisms with slower growing rates such as Homo sapiens (human), where codon preferences are determined by mutational biases characteristic to a particular genome.
Various factors are thought to influence codon usage bias in baceteria, including gene expression level already mentioned, %G+C composition (reflecting horizontal gene transfer or mutational bias), GC skew (reflecting strand-specific mutational bias), amino acid conservation, protein hydropathy, transcriptional selection, RNA stability, and optimal growth temperature.
Various methods have been used to analyze codon usage bias. CAI and methods such as the 'frequency of optimal codons' (Fop) are commonly used to predict gene expression levels. Others such as the 'effective number of codons' (Nc) and Shannon entropy are used to measure codon usage evenness, whereas multivariate statistical methods, iincluding correspondence analysis and principal component analysis, may be used to analyze variations in codon usage between genes.
| Program name | Description | 
|---|---|
| chips | Calculate Nc codon usage statistic | 
| codcmp | Codon usage table comparison | 
| codcopy | Copy and reformat a codon usage table | 
| cusp | Create a codon usage table from nucleotide sequence(s) | 
| syco | Draw synonymous codon usage statistic plot for a nucleotide sequence | 
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.