Symplectic in Cambridge

Symplectic is the database of publications used by the University to ease the submission of various returns to funding agencies, particularly its REF return. Symplectic is a commercial product, and, as of 2016, the University's instance of it, ref.cam.ac.uk, can collect data for academic staff, postdocs and PhD students (but not yet all possible types of Fellow).

The data in it are used to generate the Group's publication pages on its web site, so, if you wish to advertise your work, you will probably want to be in it. It is very easy to use. Log in at https://ref.cam.ac.uk/ and perform a one-time setup: go to Menu, Publications, Search settings, and check that "arXiv", "Scopus" and "Web of Science (Lite)" at the bottom of the page are all set to "Currently searched". If any is not, set it to "Currently searched", click "Save", and wait about 48 hours. You should then start to receive some very well-targeted emails asking you to confirm that papers are indeed yours, and that is it. If you have an ORCiD, add that too by clicking the "configure" link beside ORCiD.

Symplectic in TCM

It can be hard to extract data from this database in a useful form, so TCM has copied some of the data relevant to it into a local sqlite3 database, which anyone can query with the usual sqlite3 tools. It has also created a script to perform common queries more easily.

This is very much work in progress. Ideas for enhancements are welcome, although there are constraints caused by the data available in the original database.

Examples of Queries

Output Format: Text, Python, BibTeX, LaTeX or HTML

m1:~$ pubs
Id:         23835
Title:      Ab initio calculation of electron affinities of diamond surfaces
Authors:    Rutter, MJ and Robertson, J
Journal:    Phys. Rev. B
Volume:     57
Pages:      9241 -- 9245
Date:       15/4/1998

A human-readable dump of the main fields in the database for all records which include your CRSID. For brevity, this page shows just the first record, in this example and in all the others.

m1:~$ pubs -f py
{'volume': '57', 'book': None, 'pages': '9241 -- 9245',
'arxiv': None, 'eissn': '1095-3795',
'type': 'journal-article',
'authors': 'Rutter, MJ and Robertson, J',
'crsids': 'mjr19 jr214', 'doi': '10.1103/PhysRevB.57.9241',
'day': 15, 'month': 4, 'year': 1998,
'journal': 'Phys. Rev. B', 'id': 23835, 
'issn': '0163-1829', 'timefetched': 1396731649,
'title': 'Ab initio calculation of electron affinities of diamond surfaces'}

This simply dumps all fields of all records in the local database which include your CRSID, in the format of a Python dictionary.
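Because a record is a plain Python dictionary literal, it can be read back without writing a parser. A minimal sketch, assuming the text of a single record has been captured as a string (how multiple records are separated in the real output is not shown on this page, and the helper cite is hypothetical):

```python
import ast

# One record, copied from the example output above.
record_text = """{'volume': '57', 'book': None, 'pages': '9241 -- 9245',
'arxiv': None, 'eissn': '1095-3795',
'type': 'journal-article',
'authors': 'Rutter, MJ and Robertson, J',
'crsids': 'mjr19 jr214', 'doi': '10.1103/PhysRevB.57.9241',
'day': 15, 'month': 4, 'year': 1998,
'journal': 'Phys. Rev. B', 'id': 23835,
'issn': '0163-1829', 'timefetched': 1396731649,
'title': 'Ab initio calculation of electron affinities of diamond surfaces'}"""

# literal_eval parses Python literals only; it does not execute code.
record = ast.literal_eval(record_text)

def cite(rec):
    """Hypothetical helper: a one-line plain-text citation from a record."""
    return "{authors}, {journal} {volume} ({year})".format(**rec)

print(cite(record))
print(record["crsids"].split())   # the CRSIDs attached to this record
```

The crsids field appears to be a space-separated list, so split() is enough to recover the individual CRSIDs.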

m1:~$ pubs -f bibtex
@article{ 23835
  journal = {Phys. Rev. B}
  title = {Ab initio calculation of electron affinities of diamond surfaces}
  author = {Rutter, MJ and Robertson, J}
  year = {1998}
  volume = {57}
  pages = {9241 -- 9245}
}

m1:~$ pubs -f latex
\item {\it Ab initio calculation of electron affinities of diamond surfaces}
MJ Rutter and J Robertson,
Phys. Rev. B {\bf 57} 9241 -- 9245 (1998)

m1:~$ pubs -f html
<li>
<a href='http://dx.doi.org/10.1103/PhysRevB.57.9241'>
<span style='font-style: italic;'>Ab initio calculation of electron affinities of diamond surfaces</span></a>
MJ Rutter and J Robertson, 
Phys. Rev. B <span style='font-weight: bold;'>57</span> 9241 -- 9245 (1998) 
</li>

The last of these would be formatted in a list context as

Ab initio calculation of electron affinities of diamond surfaces MJ Rutter and J Robertson, Phys. Rev. B 57 9241 -- 9245 (1998)

Output Order

The default ordering of the output is chronological. The option `-r' changes to reverse chronological. Records with no date have an assumed date of zero.
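The effect of the assumed date of zero is that undated records come first in chronological order and last with -r. A minimal sketch of such an ordering, assuming the year, month and day fields seen in the Python output; the records with ids 2 and 3 are made up for illustration, and whether the script zeroes each field separately is a guess:

```python
def date_key(rec):
    """Sort key (year, month, day), with any missing part counted as zero."""
    return (rec.get("year") or 0, rec.get("month") or 0, rec.get("day") or 0)

records = [
    {"id": 1, "year": 1998, "month": 4, "day": 15},
    {"id": 2, "year": None, "month": None, "day": None},   # undated
    {"id": 3, "year": 1997, "month": 11, "day": 3},
]

chronological = sorted(records, key=date_key)
reverse = sorted(records, key=date_key, reverse=True)   # the -r ordering

print([r["id"] for r in chronological])
print([r["id"] for r in reverse])
```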

Selecting Users

By default, the publications shown are those of the user running the pubs command. To select a different user, use the option -u CRSID. To select all users, use simply -a.

Other Selections

One can specify any sqlite3 condition, which, unless -a is specified, will be combined with a restriction to a single CRSID. Examples include:

m1:~$ pubs -f html -q 'year=1997'

(Anything you published in 1997.)

m1:~$ pubs -f html -a -q 'type="patent"'

(All patents in the local database. Note that sqlite requires quotes around strings. Patents are not found automatically by Symplectic, so this lists only those patents which people in TCM have bothered to enter manually.)

m1:~$ pubs -f html -a -q 'title like "%Silicon Carbide%"'

(Anything with the string "Silicon Carbide" in its title. The wildcards used by sqlite's like operator are "%", which is the equivalent of "*" for the shell's filename globbing or the regular expression ".*", and "_", which is the equivalent of "?" for the shell's filename globbing or the regular expression ".". Note that, by default, SQLite regards "like" as case insensitive for ASCII characters, and "=" as case sensitive.)
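The wildcard and case rules are easy to check without touching the real database. A small sketch using Python's sqlite3 module and an in-memory database; the helper truth is hypothetical, and SQL strings are written here with single quotes, which sqlite also accepts:

```python
import sqlite3

db = sqlite3.connect(":memory:")

def truth(expr):
    """Evaluate a SQL boolean expression and return it as a Python bool."""
    return bool(db.execute("select " + expr).fetchone()[0])

# like ignores the case of ASCII letters; = does not.
print(truth("'Silicon Carbide' like '%silicon carbide%'"))
print(truth("'Silicon Carbide' = 'silicon carbide'"))

# _ matches exactly one character; % matches any run, including none.
print(truth("'SiC' like 'Si_'"))
print(truth("'Si' like 'Si_'"))
print(truth("'Si' like 'Si%'"))
```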

Authors unknown to Symplectic

Authors who are not known to the University's database cannot be found by CRSID. So

m1:~$ pubs -u vh200

finds nothing, whereas

m1:~$ pubs -a -q 'authors like "%Heine, V%"'

finds 43 publications Volker has co-authored with people who are known to the University's database. This will be the case for all PhD students too. The "authors" field of the local database stores names in the BibTeX format of

surname, initials[ and surname, initials[ and ...]]
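This format is simple to take apart. A minimal sketch, with hypothetical function names, which splits the field into (surname, initials) pairs and rebuilds the "initials surname" order seen in the LaTeX and HTML output; how the real script joins more than two authors is not shown on this page, so the plain "and" used here is an assumption:

```python
def split_authors(authors):
    """Split 'surname, initials and surname, initials' into pairs."""
    pairs = []
    for name in authors.split(" and "):
        surname, _, initials = name.partition(",")
        pairs.append((surname.strip(), initials.strip()))
    return pairs

def initials_first(authors):
    """Rebuild the field as 'initials surname and initials surname'."""
    names = [(initials + " " + surname).strip()
             for surname, initials in split_authors(authors)]
    return " and ".join(names)

print(split_authors("Rutter, MJ and Robertson, J"))
print(initials_first("Rutter, MJ and Robertson, J"))
```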

Abstracts

Abstracts are stored for some records. They can be searched with expressions such as:

m1:~$ pubs -a -q 'abstract like "%diamond%"'

(All abstracts mentioning diamond.)

m1:~$ pubs -a -q '(abstract like "%diamond%") and (abstract like "%temperature%")'

(All abstracts mentioning diamond and temperature.)

This could be useful for finding who else has been working in particular areas.

Abstracts are output only if "-l" is specified. If it is, the HTML output changes from a list to the sort of output intended as one page per article.

Journal Titles

The journal titles stored in the Symplectic database are a mess. They suffer from a seemingly random capitalisation and abbreviation regime. The one thing which is well-defined is the ISSN. As an excuse for showing raw sqlite3 tools operating on the local database, one can try:

m1:~$ sqlite3 /rscratch/Apps/Symplectic/pubs.sqlite3 \
'select journal from Pubs where issn="0031-9007";' | sort | uniq
PHYS REV LETT
Phys Rev Lett
Phys. Rev. Lett
Phys. Rev. Lett.
Phys.Rev.Lett.
Physical Review Letters

To hide this mess, the pubs command uses the issn field as an index into /rscratch/Apps/Symplectic/issn.txt, and sets the journal field to what it finds there. Only if there is no entry in issn.txt are the original contents of the journal field seen. The file issn.txt is very incomplete; additions are welcome.
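The lookup amounts to a dictionary keyed by ISSN with the stored title as a fallback. A sketch only: the layout of issn.txt is not described on this page, so the one-entry-per-line form "ISSN, whitespace, preferred name" is an assumption, as are the function names, the preferred name shown and the made-up ISSN 0000-0000:

```python
def load_issn_table(lines):
    """Parse lines of the ASSUMED form '<issn> <preferred journal name>'."""
    table = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:          # skip blank or malformed lines
            table[parts[0]] = parts[1]
    return table

def journal_name(record, table):
    """Prefer the name keyed by ISSN; fall back to the stored journal field."""
    return table.get(record.get("issn"), record.get("journal"))

# Illustrative contents only, not copied from the real issn.txt.
table = load_issn_table(["0031-9007 Phys. Rev. Lett."])

print(journal_name({"issn": "0031-9007", "journal": "PHYS REV LETT"}, table))
print(journal_name({"issn": "0000-0000", "journal": "Some Journal"}, table))
```

Keying on the ISSN means that all six spellings listed above collapse to a single name, while journals missing from the table are left exactly as Symplectic stores them.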

UTF-8

Records in the database are UTF-8, as is the HTML output. The BibTeX and LaTeX outputs attempt to convert non-ASCII characters to LaTeX's symbols. This conversion is incomplete, and reports of errors will lead to corrections.
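To give an idea of what such a conversion involves, here is one possible approach, which is not necessarily what the pubs script does: decompose each accented letter into a base letter plus a combining mark, and translate the mark into the corresponding LaTeX accent command. The function name to_latex and the small table of accents are illustrative, and anything not in the table is passed through unchanged:

```python
import unicodedata

# Combining marks and the LaTeX accent commands which produce them.
ACCENTS = {
    "\u0300": "\\`",    # grave
    "\u0301": "\\'",    # acute
    "\u0302": "\\^",    # circumflex
    "\u0303": "\\~",    # tilde
    "\u0308": '\\"',    # diaeresis / umlaut
    "\u0327": "\\c ",   # cedilla
}

def to_latex(text):
    """Replace accented letters by LaTeX accent commands where known."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in ACCENTS:
            out.append("{" + ACCENTS[decomposed[1]] + decomposed[0] + "}")
        else:
            out.append(ch)   # ASCII, or something this sketch cannot convert
    return "".join(out)

print(to_latex("Schrödinger"))
print(to_latex("Néel"))
```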

Citations

That citation counts are precisely quantified does not mean that they are accurate. The underlying database has different citation counts for different data sources, depending on the degree of coverage, or duplication, in that source. These counts can vary widely. One PRL I happened to notice has four different counts: 106, 31, 107, 41. The local database takes just the highest. Low values are likely to come from relatively narrow databases, and would serve to depress unnaturally any attempt to take an average.
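The arithmetic for the example above shows how far the narrow sources would drag an average down. The variable names are illustrative:

```python
from statistics import mean

counts = [106, 31, 107, 41]   # the four counts quoted above for one paper

highest = max(counts)         # the value the local database keeps
average = mean(counts)        # depressed by the two low counts

print(highest)
print(average)
```

The highest count is 107, whereas the mean is 71.25, about a third lower than either of the two broader sources reports.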

An example of a query involving a citation would be:

m1:~$ pubs -a -q 'citations>1000'