Molecular file formats
A brief overview of various formats for describing molecules and crystals. The program Babel is the standard tool for conversion, although OpenBabel is catching up fast.
Note that a unit cell can be specified either by three cartesian axes, or in a b c alpha beta gamma format. The latter format loses information about the absolute orientation of the cell in space, and by convention a lies along x, b in the xy plane, and abc forms a right-hand set. Converting from cartesian to abc notation may thus require a rotation.
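The relationship between the two notations can be sketched in a few lines of Python. This is a minimal illustration, not taken from any particular program: the function names cell_to_abc and abc_to_cell are hypothetical, angles are assumed to be in degrees, and alpha, beta, gamma are taken as the angles between (b, c), (a, c) and (a, b) respectively.

```python
import math

def cell_to_abc(a_vec, b_vec, c_vec):
    """Return (a, b, c, alpha, beta, gamma) from three cartesian axes.

    Angles are in degrees. The absolute orientation of the cell is lost.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def angle(u, v):
        cosang = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

    return (norm(a_vec), norm(b_vec), norm(c_vec),
            angle(b_vec, c_vec), angle(a_vec, c_vec), angle(a_vec, b_vec))

def abc_to_cell(a, b, c, alpha, beta, gamma):
    """Rebuild cartesian axes using the convention described above:
    a along x, b in the xy plane, abc a right-hand set."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    a_vec = (a, 0.0, 0.0)
    b_vec = (b * cg, b * sg, 0.0)
    cx = c * cb
    cy = c * (ca - cb * cg) / sg
    # Taking the positive root for cz makes the set right-handed.
    cz = math.sqrt(max(0.0, c * c - cx * cx - cy * cy))
    return a_vec, b_vec, (cx, cy, cz)
```

A cell whose a axis lies along y, once passed through cell_to_abc and then abc_to_cell, comes back with a along x. That is the rotation mentioned above, and any atomic positions given in absolute co-ordinates would need the same rotation applied.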
Most formats do not contain bonding information, so a viewer infers bonds as it sees fit.
Beware viewers! They may believe that xyz is a right-hand set, or perhaps a left-hand set. They may believe that all co-ordinates need reducing into the unit cell (if supplied), or they may not. They may believe that an atom at (0,0,0) needs to be shown at all eight corners of the cell, if just one repeat unit is specified, or they may not.
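Two of these viewer choices are easy to make concrete. The sketch below assumes positions are already in relative (fractional) co-ordinates; the names wrap_fractional and corner_images, and the tolerance tol, are hypothetical and chosen only for illustration.

```python
import itertools
import math

def wrap_fractional(frac):
    """Reduce fractional co-ordinates into the range [0, 1)."""
    wrapped = []
    for f in frac:
        w = f - math.floor(f)
        # Rounding can give exactly 1.0 for tiny negative inputs.
        wrapped.append(0.0 if w >= 1.0 else w)
    return tuple(wrapped)

def corner_images(frac, tol=1e-8):
    """List the periodic images of an atom that lie on the cell boundary.

    Each component that is (nearly) zero is also drawn at 1, so an atom
    at (0,0,0) yields all eight corners of the cell.
    """
    choices = [(0.0, 1.0) if abs(f) < tol else (f,) for f in frac]
    return list(itertools.product(*choices))
```

A viewer that wraps but does not add images shows one atom at the origin; one that adds images shows eight. Neither is wrong, but the pictures differ.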
Cell
Castep input file. Free format. Units are specified as Angstrom or Bohr. A unit cell is required and may be in cartesian or abc form. Atomic positions can be given in absolute or relative co-ordinates.
Cube
Gaussian grid format. Poorly documented. Units are Bohrs, not Angstroms. In effect it can contain a single 3D density and a list of atomic numbers and positions. Many programs will read just the density, and not the atomic positions.
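Since the atomic positions are so often skipped, here is a minimal sketch of reading them and converting to Angstroms. It assumes the commonly described layout, which should be checked against real files: two comment lines, a line starting with the atom count, three lines describing the grid axes, then one line per atom holding atomic number, a charge field, and x, y, z. The function name read_cube_atoms is hypothetical.

```python
BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 value

def read_cube_atoms(text):
    """Return a list of (atomic_number, (x, y, z)) with positions in Angstroms.

    Assumes the layout described above; the density data is ignored.
    """
    lines = text.splitlines()
    # The atom count is the first field of the third line. Some files
    # write it as a negative number, so take the absolute value.
    natoms = abs(int(lines[2].split()[0]))
    atoms = []
    for line in lines[6:6 + natoms]:
        fields = line.split()
        atomic_number = int(fields[0])
        # fields[1] is the charge field; positions follow, in Bohr.
        position = tuple(float(v) * BOHR_TO_ANGSTROM for v in fields[2:5])
        atoms.append((atomic_number, position))
    return atoms
```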
PDB
Protein Data Bank. A fixed format file, one record per line, with atoms in absolute co-ordinates in units of Angstroms written as F8.3, so to the nearest 0.001A. Optional unit cell definition as abc and including symmetry class. Optional bonding information. Comments are permitted and are introduced with the keyword REMARK.
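The consequences of the fixed F8.3 fields can be seen in a short Python sketch. It assumes the standard column layout in which x, y and z occupy columns 31-38, 39-46 and 47-54 of an ATOM or HETATM record; the function names are hypothetical.

```python
def format_pdb_coords(x, y, z):
    """Write three co-ordinates as consecutive F8.3 fields."""
    return "%8.3f%8.3f%8.3f" % (x, y, z)

def parse_pdb_coords(line):
    """Read x, y, z from an ATOM/HETATM record by column position.

    Columns 31-54 (1-based) are slices 30:54 in Python's 0-based indexing.
    Splitting on whitespace is unsafe, as adjacent fields may touch.
    """
    return float(line[30:38]), float(line[38:46]), float(line[46:54])

def fits_f8_3(value):
    """True if the value can be written in eight characters as F8.3."""
    return len("%8.3f" % value) == 8
```

Values are rounded to three decimal places on writing, and anything below -999.999 or above 9999.999 does not fit in the field at all, which is one way incorrect pdb files get produced.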
The format is designed for proteins, and contains fields for residue names, chain identifiers and similar things which are irrelevant for inorganic crystals.
Sometimes the unit cell is defined as the unit cube to signify that no unit cell exists. This is confusing.
This is a very widely used, and abused, format (some programs produce incorrect pdb files, or misread correct ones).
Vasp output
A formatted file containing a basis set, atomic positions and a single density. Units are Angstroms. Unfortunately, atoms are indexed by sequential species number, not atomic number, and there is no map from these numbers to the atomic number. Coincidentally, the old academic Castep code had this property too.
Not widely used.
Xplor
Poorly documented, and data grid only, no atoms. Axes specified in abc notation.
XSF
XCrySDen's native format. Free format text file containing atomic positions, optionally forces, optionally a unit cell, optionally a conventional unit cell as well as a reduced one (e.g. 2 and 8 atom cubic), and optionally any number of 2D and 3D data sets. Comments not permitted.
Not widely used, and may change in future versions of XCrySDen.
XYZ
Simply a free format list of atomic symbols and co-ordinates. No unit cell can be defined, but multiple frames can be, so movies can be generated from a single xyz file. The native format of xmol. It can contain charges or vectors, but few programs support this.
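A multi-frame file can be read with very little code. The sketch below assumes the usual xmol layout, which is not spelled out above: each frame starts with a line giving the number of atoms, then a comment line, then one line per atom holding a symbol and three co-ordinates. Any extra columns (charges or vectors) are ignored, and the name read_xyz_frames is hypothetical.

```python
def read_xyz_frames(text):
    """Return a list of frames, each a (comment, atoms) pair.

    atoms is a list of (symbol, (x, y, z)) tuples.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i].split()[0])
        comment = lines[i + 1] if i + 1 < len(lines) else ""
        atoms = []
        for line in lines[i + 2:i + 2 + natoms]:
            fields = line.split()
            # Only the first four columns are used; extras are dropped.
            atoms.append((fields[0], tuple(float(v) for v in fields[1:4])))
        frames.append((comment, atoms))
        i += 2 + natoms
    return frames
```

Each frame carries its own atom count, so the number of atoms may in principle change from frame to frame, though a viewer may not cope with that.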
Widely used, but beware the lack of unit cell.