Online DNA and Protein Sequence Manipulation Tools (SMS, 找生物 Chinese-localized edition)
About
The Sequence Manipulation Suite is written in JavaScript 1.5, which is a lightweight, cross-platform, object-oriented scripting language. JavaScript is now standardized by the ECMA (European Computer Manufacturers Association). The first version of the ECMA standard is documented in the ECMA-262 specification. The ECMA-262 standard is also approved by the ISO (International Organization for Standardization) as ISO/IEC 16262. JavaScript 1.5 is fully compatible with ECMA-262, Edition 3.

Sequences submitted to the Sequence Manipulation Suite do not leave your computer and are instead manipulated by your web browser, which executes the JavaScript. The Sequence Manipulation Suite was written by Paul Stothard (University of Alberta, Canada). Send questions and comments to stothard@ualberta.ca.

Here are short descriptions of the programs that comprise the Sequence Manipulation Suite:

Format Conversion:
  • Combine FASTA - converts multiple FASTA sequence records into a single sequence. Use Combine FASTA, for example, when you wish to determine the codon usage for a collection of sequences using a program that accepts a single sequence as input.
  • EMBL to FASTA - accepts one or more EMBL files as input and returns the DNA sequence from each in FASTA format. Use this program when you wish to quickly remove all of the non-DNA sequence information from an EMBL file.
  • EMBL Feature Extractor - accepts one or more EMBL files as input and reads the sequence feature information described in the feature tables. The program extracts or highlights the relevant sequence segments and returns each sequence feature in FASTA format. EMBL Feature Extractor is particularly helpful when you wish to derive the sequence of a cDNA from a genomic sequence that contains many introns.
  • EMBL Trans Extractor - accepts one or more EMBL files as input and returns each of the protein translations described in the files in FASTA format. EMBL Trans Extractor can be used when you are more interested in the predicted protein translations of a DNA sequence than the DNA sequence itself.
  • Filter DNA - removes non-DNA characters from text. Use this program when you wish to remove digits and blank spaces from a sequence to make it suitable for other applications.
  • Filter Protein - removes non-protein characters from text. Use this program when you wish to remove digits and blank spaces from a sequence to make it suitable for other applications.
  • GenBank to FASTA - accepts one or more GenBank files as input and returns the entire DNA sequence from each in FASTA format. Use this program when you wish to quickly remove all of the non-DNA sequence information from a GenBank file.
  • GenBank Feature Extractor - accepts one or more GenBank files as input and reads the sequence feature information described in the feature tables, according to the rules outlined in the GenBank release notes. The program extracts or highlights the relevant sequence segments and returns each sequence feature in FASTA format. GenBank Feature Extractor is particularly helpful when you wish to derive the sequence of a cDNA from a genomic sequence that contains many introns.
  • GenBank Trans Extractor - accepts one or more GenBank files as input and returns each of the protein translations described in the files in FASTA format. GenBank Trans Extractor can be used when you are more interested in the predicted protein translations of a DNA sequence than the DNA sequence itself.
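  • One to Three - converts single-letter translations to three-letter translations.
  • Range Extractor DNA - accepts one or more DNA sequences along with a set of positions or ranges. The bases corresponding to the positions or ranges are returned, either as a single new sequence, a set of FASTA records, uppercase text, or lowercase text. Use Range Extractor DNA to obtain subsequences using position information.
  • Range Extractor Protein - accepts one or more protein sequences along with a set of positions or ranges. The residues corresponding to the positions or ranges are returned, either as a single new sequence, a set of FASTA records, uppercase text, or lowercase text. Use Range Extractor Protein to obtain subsequences using position information.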
  • Reverse Complement - converts a DNA sequence into its reverse, complement, or reverse-complement counterpart. The entire IUPAC DNA alphabet is supported, and the case of each input sequence character is maintained. You may want to work with the reverse-complement of a sequence if it contains an ORF on the reverse strand.
  • Split Codons - divides a coding sequence into three new sequences, each consisting of the bases from one of the three codon positions.
  • Split FASTA - divides FASTA sequence records into smaller FASTA sequences of the size you specify. An optional overlap value can be used to create sequences that overlap.
  • Three to One - converts three-letter translations to single-letter translations. Digits and blank spaces are removed automatically. Non-standard triplets are ignored.
  • Window Extractor DNA - accepts one or more DNA sequences along with a position and window size. The bases located in the window are returned, either as a new sequence, uppercase text, or lowercase text. Use Window Extractor DNA to obtain subsequences using position information.
  • Window Extractor Protein - accepts one or more protein sequences along with a position and window size. The residues located in the window are returned, either as a new sequence, uppercase text, or lowercase text. Use Window Extractor Protein to obtain subsequences using position information.
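
The reverse-complement conversion described in the list above (full IUPAC DNA alphabet, case of each character preserved) can be sketched as follows. This is a minimal Python illustration, not the suite's JavaScript source; the function names and the choice to leave unrecognized characters untouched are assumptions made for the example.

```python
# Complement pairs for the IUPAC DNA alphabet (uppercase only here;
# the lowercase half of the translation table is derived below).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D",
    "S": "S", "W": "W", "N": "N",
}

_SOURCE = "".join(IUPAC_COMPLEMENT.keys())
_TARGET = "".join(IUPAC_COMPLEMENT.values())
# Map uppercase to uppercase and lowercase to lowercase so case is kept.
_TABLE = str.maketrans(_SOURCE + _SOURCE.lower(), _TARGET + _TARGET.lower())


def reverse(seq: str) -> str:
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Return the complement; characters outside the table are left as-is (assumption)."""
    return seq.translate(_TABLE)


def reverse_complement(seq: str) -> str:
    """Return the reverse-complement of a DNA sequence."""
    return reverse(complement(seq))
```

For example, `reverse_complement("ATGCnnRy")` returns `"rYnnGCAT"`: the ambiguity codes R and Y swap, and the lowercase characters stay lowercase.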
Sequence Analysis:
  • Codon Plot - accepts a DNA sequence and generates a graphical plot consisting of a horizontal bar for each codon. The length of the bar is proportional to the frequency of the codon in the codon frequency table you enter. Use Codon Plot to find portions of DNA sequence that may be poorly expressed, or to view a graphic representation of a codon usage table (by using a DNA sequence consisting of one of each codon type).
  • Codon Usage - accepts one or more DNA sequences and returns the number and frequency of each codon type. Since the program also compares the frequencies of codons that code for the same amino acid (synonymous codons), you can use it to assess whether a sequence shows a preference for particular synonymous codons.
  • CpG Islands - reports potential CpG island regions using the method described by Gardiner-Garden and Frommer (1987). The calculation is performed using a 200 bp window moving across the sequence at 1 bp intervals. CpG islands are defined as sequence ranges where the Obs/Exp value is greater than 0.6 and the GC content is greater than 50%. The expected number of CpG dimers in a window is calculated as the number of 'C's in the window multiplied by the number of 'G's in the window, divided by the window length. CpG islands are often found in the 5' regions of vertebrate genes, so this program can be used to highlight potential genes in genomic sequences.
  • DNA Molecular Weight - accepts one or more DNA sequences and calculates molecular weight. Sequences can be treated as double-stranded or single-stranded, and as linear or circular. Use DNA Molecular Weight when calculating molecule copy number.
  • DNA Pattern Find - accepts one or more sequences along with a search pattern and returns the number and positions of sites that match the pattern. The search pattern is written as a JavaScript regular expression, which resembles the regular expressions written in other programming languages, such as Perl.
  • DNA Stats - returns the number of occurrences of each residue in the sequence you enter. Percentage totals are also given for each residue, and for certain groups of residues, allowing you to quickly compare the results obtained for different sequences.
  • Fuzzy Search DNA - accepts a DNA sequence along with a query sequence and returns sites that are identical or similar to the query. You can use this program, for example, to find sequences that can be easily mutated into a useful restriction site.
  • Fuzzy Search Protein - accepts a protein sequence along with a query sequence and returns sites that are identical or similar to the query.
  • Ident and Sim - accepts a group of aligned sequences (in FASTA or GDE format) and calculates the identity and similarity of each sequence pair. Identity and similarity values are often used to assess whether or not two sequences share a common ancestor or function.
  • Mutate for Digest - accepts a DNA sequence as input and searches for regions that can easily be mutated to create a restriction site of interest. The program also reports protein translations so that you can see which reading frames are altered by the proposed mutations. Use Mutate for Digest to find sequences that can be converted to a useful restriction site using PCR or site-directed mutagenesis.
  • Multi Rev Trans - accepts a protein alignment and uses a codon usage table to generate a degenerate DNA coding sequence. The program also returns a graph that can be used to find regions of minimal degeneracy at the nucleotide level. Use Multi Rev Trans when designing PCR primers to anneal to an unsequenced coding sequence from a related species.
  • ORF Finder - searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. ORF Finder supports the entire IUPAC alphabet and several genetic codes. Use ORF Finder to search newly sequenced DNA for potential protein encoding segments.
  • Pairwise Align Codons - accepts two coding sequences and determines the optimal global alignment. Use Pairwise Align Codons to look for conserved coding sequence regions.
  • Pairwise Align DNA - accepts two DNA sequences and determines the optimal global alignment. Use Pairwise Align DNA to look for conserved sequence regions.
  • Pairwise Align Protein - accepts two protein sequences and determines the optimal global alignment. Use Pairwise Align Protein to look for conserved sequence regions.
  • PCR Primer Stats - accepts a list of PCR primer sequences and returns a report describing the properties of each primer, including melting temperature, percent GC content, and PCR suitability. Use PCR Primer Stats to evaluate potential PCR primers.
  • PCR Products - accepts one or more DNA sequence templates and two primer sequences. The program searches for perfectly matching primer annealing sites that can generate a PCR product. Any resulting products are sorted by size, and they are given a title specifying their length, their position in the original sequence, and the primers that produced them. You can use linear or circular molecules as the template. Use PCR Products to determine the product sizes you can expect to see when you perform PCR in the lab.
  • Protein GRAVY - returns the GRAVY (grand average of hydropathy) value for the protein sequences you enter. The GRAVY value is calculated by adding the hydropathy value for each residue and dividing by the length of the sequence (Kyte and Doolittle, 1982).
  • Protein Isoelectric Point - calculates the theoretical pI (isoelectric point) for the protein sequence you enter. Use Protein Isoelectric Point when you want to know approximately where on a 2-D gel a particular protein will be found.
  • Protein Molecular Weight - accepts one or more protein sequences and calculates molecular weight. You can append copies of commonly used epitopes and fusion proteins using the supplied list. Use Protein Molecular Weight when you wish to predict the location of a protein of interest on a gel in relation to a set of protein standards.
  • Protein Pattern Find - accepts one or more sequences along with a search pattern and returns the number and positions of sites that match the pattern. The search pattern is written as a JavaScript regular expression, which resembles the regular expressions written in other programming languages, such as Perl.
  • Protein Stats - returns the number of occurrences of each residue in the sequence you enter. Percentage totals are also given for each residue, and for certain groups of residues, allowing you to quickly compare the results obtained for different sequences.
  • Restriction Digest - cleaves a DNA sequence in a virtual restriction digest, with one, two, or three restriction enzymes. The resulting fragments are sorted by size, and they are given a title specifying their length, their position in the original sequence, and the enzyme sites that produced them. You can digest linear or circular molecules, and even a mixture of molecules (by entering more than one sequence in FASTA format). Use Restriction Digest to determine the fragment sizes you will see when you perform a digest in the lab.
  • Restriction Summary - accepts a DNA sequence and returns the number and positions of commonly used restriction endonuclease cut sites. Use this program if you wish to quickly determine whether or not an enzyme cuts a particular segment of DNA.
  • Reverse Translate - accepts a protein sequence as input and uses a codon usage table to generate a DNA sequence representing the most likely non-degenerate coding sequence. A consensus sequence derived from all the possible codons for each amino acid is also returned. Use Reverse Translate when designing PCR primers to anneal to an unsequenced coding sequence from a related species.
  • Translate - accepts a DNA sequence and converts it into a protein in the reading frame you specify. Translate supports the entire IUPAC alphabet and several genetic codes.
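
The CpG island criteria given in the list above (200 bp window, 1 bp step, Obs/Exp greater than 0.6, GC content greater than 50%, expected CpG count equal to the C count times the G count divided by the window length) can be illustrated with a short Python sketch. The function names are hypothetical, and returning the start offset of each qualifying window, rather than merging windows into ranges, is a simplification assumed here; the description does not say how the suite reports ranges.

```python
def cpg_window_stats(window: str) -> tuple[float, float]:
    """Return (Obs/Exp CpG ratio, GC fraction) for a single window."""
    w = window.upper()
    n = len(w)
    if n == 0:
        return 0.0, 0.0
    c_count = w.count("C")
    g_count = w.count("G")
    observed = w.count("CG")            # "CG" cannot overlap itself, so count() is exact
    expected = c_count * g_count / n    # formula stated in the description
    obs_exp = observed / expected if expected else 0.0
    gc_fraction = (c_count + g_count) / n
    return obs_exp, gc_fraction


def cpg_island_windows(seq: str, window: int = 200,
                       min_obs_exp: float = 0.6, min_gc: float = 0.5) -> list[int]:
    """Return 0-based start offsets of windows that meet both thresholds."""
    hits = []
    for start in range(len(seq) - window + 1):   # 1 bp step
        obs_exp, gc = cpg_window_stats(seq[start:start + window])
        if obs_exp > min_obs_exp and gc > min_gc:
            hits.append(start)
    return hits
```

As a sanity check, a 200 bp run of `"CG"` repeats has 100 observed CpG dimers against 100 × 100 / 200 = 50 expected, so its Obs/Exp value is 2.0 and the window qualifies.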
Sequence Figures:
  • Color Align Conservation - accepts a group of aligned sequences (in FASTA or GDE format) and colors the alignment. The program examines each residue and compares it to the other residues in the same column. Residues that are identical among the sequences are given a black background, and those that are similar among the sequences are given a gray background. The remaining residues receive a white background. You can specify the percentage of residues that must be identical and similar for the coloring to be applied. Use Color Align Conservation to enhance the output of sequence alignment programs.
  • Color Align Properties - accepts a group of aligned sequences (in FASTA or GDE format) and colors the alignment. The program examines each residue and compares it to the other residues in the same column. Residues that are identical or similar among the sequences are given a colored background. The color is chosen according to the biochemical properties of the residue. You can specify the percentage of residues that must be identical and similar for the coloring to be applied. Use Color Align Properties to highlight protein regions with conserved biochemical properties.
  • Group DNA - adjusts the spacing of DNA sequences and adds numbering. You can specify the group size (the number of bases per group), as well as the number of bases per line. The output of this program can serve as a convenient reference, since the numbering and spacing allow you to quickly locate specific bases.
  • Group Protein - adjusts the spacing of protein sequences and adds numbering. You can specify the group size (the number of residues per group), as well as the number of residues per line. The output of this program can serve as a convenient reference, since the numbering and spacing allow you to quickly locate specific residues.
  • Primer Map - accepts a DNA sequence and returns a textual map showing the annealing positions of PCR primers. Restriction endonuclease cut sites and the protein translations of the DNA sequence can also be shown. Use this program to produce a useful reference figure, particularly when you have designed a large number of primers for a particular template. Primer Map supports the entire IUPAC alphabet and several genetic codes.
  • Restriction Map - accepts a DNA sequence and returns a textual map showing the positions of restriction endonuclease cut sites. The translation of the DNA sequence is also given, in the reading frame you specify. Use the output of this program as a reference when planning cloning strategies. Restriction Map supports the entire IUPAC alphabet and several genetic codes.
  • Translation Map - accepts a DNA sequence and returns a textual map displaying protein translations. The reading frame of the translation can be specified (1, 2, 3, or all three), or you can choose to treat uppercase text as the reading frame. Translation Map supports the entire IUPAC alphabet and several genetic codes.
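
The grouping behavior described for Group DNA and Group Protein (a chosen group size, a chosen number of characters per line, and added numbering) might look like the Python sketch below. The exact output layout of the suite is not specified in this page, so the format here, with each line prefixed by the 1-based position of its first character, is an assumption, as is the function name.

```python
def group_sequence(seq: str, group_size: int = 10, per_line: int = 60) -> str:
    """Break a sequence into space-separated groups with a position label per line."""
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        # Split the line's characters into fixed-size groups.
        groups = [chunk[i:i + group_size] for i in range(0, len(chunk), group_size)]
        # Label each line with the 1-based position of its first character.
        lines.append(f"{start + 1:>8} " + " ".join(groups))
    return "\n".join(lines)
```

Calling `group_sequence("ACGTACGTACGT", group_size=4, per_line=8)` yields two lines, the first labeled 1 with two groups of four bases and the second labeled 9 with the remaining four bases.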
Random Sequences:
  • Mutate DNA - introduces base changes into a DNA sequence. You can select the number of mutations to introduce, and whether or not to preserve the first and last three bases in the sequence, to reflect selection acting to maintain start and stop codons. The position of each mutation is chosen randomly, and multiple mutations can occur at a single site. Mutated sequences can be used to evaluate the significance of sequence analysis results.
  • Mutate Protein - introduces residue changes into a protein sequence. You can select the number of mutations to introduce, and whether or not to preserve the first residue in the sequence, to reflect selection acting to maintain a start codon. The position of each mutation is chosen randomly, and multiple mutations can occur at a single site. Mutated sequences can be used to evaluate the significance of sequence analysis results.
  • Random Coding DNA - generates a random open reading frame beginning with a start codon and ending with a stop codon. You can choose the genetic code to use and the length of the sequence to generate. Random sequences can be used to evaluate the significance of sequence analysis results.
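  • Random DNA - generates a random DNA sequence of the length you specify. Random sequences can be used to evaluate the significance of sequence analysis results.
  • Random DNA Regions - replaces the regions of a DNA sequence that you specify with random bases. The modified sequences can be used to evaluate the significance of sequence analysis results.
  • Random Protein - generates a random protein sequence of the length you specify. Random sequences can be used to evaluate the significance of sequence analysis results.
  • Random Protein Regions - replaces the regions of a protein sequence that you specify with random residues. The modified sequences can be used to evaluate the significance of sequence analysis results.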
  • Sample DNA - randomly selects bases from the guide sequence until a sequence of the length you specify is constructed. Each selected base is replaced so that it can be selected again.
  • Sample Protein - randomly selects residues from the guide sequence until a sequence of the length you specify is constructed. Each selected residue is replaced so that it can be selected again.
  • Shuffle DNA - randomly shuffles a DNA sequence. Shuffled sequences can be used to evaluate the significance of sequence analysis results, particularly when sequence composition is an important consideration.
  • Shuffle Protein - randomly shuffles a protein sequence. Shuffled sequences can be used to evaluate the significance of sequence analysis results, particularly when sequence composition is an important consideration.

Tue Sep 20 18:12:27 2011