Class CifDataParser
- All Implemented Interfaces:
GenericCifDataParser
- Direct Known Subclasses:
Cif2DataParser
Regarding the treatment of single quotes vs. primes in CIF files, PMR wrote:
* There is a formal grammar for CIF (see http://www.iucr.org/iucr-top/cif/index.html) which confirms this. The textual explanation is
14. Matching single or double quote characters (' or ") may be used to bound a string representing a non-simple data value provided the string does not extend over more than one line.
15. Because data values are invariably separated from other
tokens in the file by white space, such a quote-delimited
character string may contain instances of the character used
to delimit the string provided they are not followed by white
space. For example, the data item
_example 'a dog's life'
is legal; the data value is a dog's life.
[PMR - the terminating character(s) are quote+whitespace.
That would mean that:
_example 'Jones' life'
would be an error.]
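The quote+whitespace termination rule quoted above can be illustrated with a minimal sketch. The class and method names here (QuoteRuleSketch, readQuoted) are hypothetical and are not part of CifDataParser; the sketch only assumes that a closing quote counts when it is followed by a space, a tab, or the end of the line.

```java
// Hypothetical illustration of the CIF 1.x quoting rule; not the parser's actual code.
final class QuoteRuleSketch {

    /**
     * Reads a quoted value whose opening quote is at index start.
     * A quote character closes the string only when it is followed by
     * white space or the end of the line.
     * Returns the text between the quotes, or null if no closing quote is found.
     */
    static String readQuoted(String line, int start) {
        char quote = line.charAt(start);
        for (int i = start + 1; i < line.length(); i++) {
            if (line.charAt(i) != quote) {
                continue;
            }
            boolean atEnd = (i + 1 == line.length());
            if (atEnd || line.charAt(i + 1) == ' ' || line.charAt(i + 1) == '\t') {
                return line.substring(start + 1, i);
            }
            // Otherwise the quote is embedded (as in dog's) and scanning continues.
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(readQuoted("'a dog's life'", 0));
        System.out.println(readQuoted("'Jones' life'", 0));
    }
}
```

Under this rule the second value ends early at Jones, leaving trailing text on the line, which is why PMR describes that input as an error.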
The CIF format was developed in the late 1980s under the aegis of the International Union of Crystallography (I am a consultant to the COMCIFS committee). It was ratified by the Union and there have been several workshops. mmCIF is an extension of CIF which includes a relational structure. The formal publications are:
Hall, S. R. (1991). "The STAR File: A New Format for Electronic Data Transfer and Archiving", J. Chem. Inform. Comp. Sci., 31, 326-333.
Hall, S. R., Allen, F. H. and Brown, I. D. (1991). "The Crystallographic Information File (CIF): A New Standard Archive File for Crystallography", Acta Cryst., A47, 655-685.
Hall, S. R. and Spadaccini, N. (1994). "The STAR File: Detailed Specifications", J. Chem. Info. Comp. Sci., 34, 505-508.
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected boolean asObject - A flag to create and return Java objects, not strings.
protected int cch - length of str
protected int columnCount
protected String[] columnNames
protected char cterm - optional token terminator; in CIF 2.0 could be } or ]
protected boolean debugging - debugging flag passed from reader; unused
protected boolean haveData
htFields - A global, static map that contains field information.
protected int ich - pointer to current character on str
static final int KEY_MAX - The maximum number of columns (data keys) passed to the parser or found in the file for a given loop_ or category.subkey listing.
protected String line - from buffered reader
protected String nullString - string to return for CIF data value . and ?
protected String str - working string (buffer)
protected boolean wasUnquoted - whether we are processing an unquoted value or key
Fields inherited from interface javajs.api.GenericCifDataParser
EMPTY, NONE
-
Constructor Summary
Constructors -
Method Summary
Methods (Modifier and Type, Method, Description)
fixKey
fullTrim - Used especially for data that might be multi-line data that might have unwanted white space at start or end.
getAllCifData - Parses all CIF data for a reader defined in the constructor into a standard Map structure and closes the BufferedReader if it exists.
getAllCifDataType(String... types)
int getColumnCount()
getColumnData(int i)
getColumnName(int i)
boolean getData() - The work horse; a general reader for loop data.
getFileHeader
getNextDataToken - first checks to see if the next token is an unquoted control code, and if so, returns null
getNextToken - Get a token as a String value (for the reader)
getNextTokenObject - Get the token as a Java Object
protected Object getNextTokenProtected - Just makes sure
protected Object getQuotedStringOrObject(char ch) - CIF 1.0 only.
protected int getVersion()
protected boolean isQuote(char ch) - CIF 1.0 only; we handle various quote types here
protected boolean isTerminator(char c) - The token terminator is space or tab in CIF 1.0, but it can be quoted strings in CIF 2.0.
void parseDataBlockParameters(String[] fields, String key, String data, int[] key2col, int[] col2key) - Process a data block, with or without a loop_.
peekToken - Just look at the next token.
protected String prepareNextLine - sets the string for parsing to be from the next line when the token buffer is empty, and if ';' is at the beginning of that line, extends the string to include that full multiline string.
protected String preprocessSemiString - Encapsulate a multi-line ; ....
protected String preprocessString - Preprocess the string on a line starting with a semicolon to produce a string with a \1 ...
readLine()
readList() - Read a CIF 2.0 list structure, converting it to either a JSON string or to a Java data structure
set(GenericLineReader reader, BufferedReader br, boolean debugging) - A Crystallographic Information File (CIF) data parser.
void setNullValue(String nullString) - Set the string value of what is returned for "." and "?"
protected String setString - sets global str and line to be parsed from the beginning \1 ....
skipLoop(boolean doReport) - Skips all associated loop data.
skipNextToken
toUnicode - Only translating the basic Greek set here, not all the other stuff.
protected Object unquoted - In CIF 2.0, this method turns a String into an Integer or Float. In CIF 1.0 (here) it just returns the unchanged value.
-
Field Details
-
KEY_MAX
public static final int KEY_MAX
The maximum number of columns (data keys) passed to the parser or found in the file for a given loop_ or category.subkey listing.
-
line
from buffered reader -
str
working string (buffer) -
ich
protected int ich
pointer to current character on str -
cch
protected int cch
length of str -
wasUnquoted
protected boolean wasUnquoted
whether we are processing an unquoted value or key -
cterm
protected char cterm
optional token terminator; in CIF 2.0 could be } or ] -
nullString
string to return for CIF data value . and ? -
asObject
protected boolean asObject
A flag to create and return Java objects, not strings. Used only by Jmol scripting x = getProperty("cifInfo", filename). -
debugging
protected boolean debugging
debugging flag passed from reader; unused -
columnCount
protected int columnCount -
columnNames
-
haveData
protected boolean haveData -
htFields
A global, static map that contains field information. The assumption is that if we read a set of fields for, say, atom_site, once in a lifetime, then that should be good forever. Those are static lists. Or should be....
-
-
Constructor Details
-
CifDataParser
public CifDataParser()
-
-
Method Details
-
getVersion
protected int getVersion() -
setNullValue
Set the string value of what is returned for "." and "?"
- Parameters:
nullString
- null here returns "." and "?"; default is "\0"
-
getColumnData
- Specified by:
getColumnData
in interface GenericCifDataParser
-
getColumnCount
public int getColumnCount()
- Specified by:
getColumnCount
in interface GenericCifDataParser
-
getColumnName
- Specified by:
getColumnName
in interface GenericCifDataParser
-
set
A Crystallographic Information File (CIF) data parser. set() should be called immediately upon construction. Two options; one of reader or br should be null, or reader will be ignored. Just simpler this way...
- Specified by:
set
in interface GenericCifDataParser
- Parameters:
reader
- Anything that can deliver a line of text or null
br
- A standard BufferedReader.
debugging
-
-
getFileHeader
- Specified by:
getFileHeader
in interface GenericCifDataParser
- Returns:
- commented-out section at the start of a CIF file.
-
getAllCifData
Parses all CIF data for a reader defined in the constructor into a standard Map structure and closes the BufferedReader if it exists.
- Specified by:
getAllCifData
in interface GenericCifDataParser
- Returns:
- Hashtable of models Vector of Hashtable data
-
getAllCifDataType
-
readLine
- Specified by:
readLine
in interface GenericCifDataParser
-
getData
The work horse; a general reader for loop data. Fills columnData with fieldCount fields.
- Specified by:
getData
in interface GenericCifDataParser
- Returns:
- false if EOF
- Throws:
Exception
-
skipLoop
Skips all associated loop data. (Skips to next control word.)
- Specified by:
skipLoop
in interface GenericCifDataParser
- Throws:
Exception
-
getNextToken
Get a token as a String value (for the reader)
- Specified by:
getNextToken
in interface GenericCifDataParser
- Returns:
- the next token of any kind, or null
- Throws:
Exception
-
getNextTokenObject
Get the token as a Java Object
- Returns:
- the next token of any kind, or null
- Throws:
Exception
-
getNextTokenProtected
Just makes sure
- Returns:
- String from buffer.
- Throws:
Exception
-
getNextDataToken
First checks to see if the next token is an unquoted control code, and if so, returns null
- Specified by:
getNextDataToken
in interface GenericCifDataParser
- Returns:
- next data token or null
- Throws:
Exception
-
peekToken
Just look at the next token. Saves it for retrieval using getTokenPeeked().
- Specified by:
peekToken
in interface GenericCifDataParser
- Returns:
- next token or null if EOF
- Throws:
Exception
-
getTokenPeeked
- Specified by:
getTokenPeeked
in interface GenericCifDataParser
- Returns:
- the token last acquired; may be null
-
fullTrim
Used especially for data that might be multi-line data that might have unwanted white space at start or end.
- Specified by:
fullTrim
in interface GenericCifDataParser
- Parameters:
str
-
- Returns:
- str without any leading/trailing white space, and no '\n'
-
toUnicode
Only translating the basic Greek set here, not all the other stuff. See http://www.iucr.org/resources/cif/spec/version1.1/semantics#markup
- Specified by:
toUnicode
in interface GenericCifDataParser
- Parameters:
data
-
- Returns:
- cleaned string
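As a rough illustration of the kind of translation described here, the sketch below replaces a few backslash-letter Greek codes from the CIF 1.1 markup conventions with their Unicode characters. The class name GreekMarkupSketch and the four-entry table are assumptions made for the example; the table actually used by toUnicode is not shown in this documentation.

```java
import java.util.Map;

// Hypothetical sketch of Greek markup translation; not the parser's actual code.
final class GreekMarkupSketch {

    // A small subset of the CIF markup codes: \a, \b, \g, \d.
    private static final Map<Character, Character> GREEK = Map.of(
        'a', '\u03B1',  // alpha
        'b', '\u03B2',  // beta
        'g', '\u03B3',  // gamma
        'd', '\u03B4'); // delta

    /** Replaces known backslash codes; anything else is copied unchanged. */
    static String toUnicode(String data) {
        StringBuilder sb = new StringBuilder(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '\\' && i + 1 < data.length()) {
                Character greek = GREEK.get(data.charAt(i + 1));
                if (greek != null) {
                    sb.append(greek);
                    i++; // skip the letter that followed the backslash
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Codes outside the table pass through untouched in this sketch, which mirrors the "basic Greek set only" scope stated above.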
-
parseDataBlockParameters
public void parseDataBlockParameters(String[] fields, String key, String data, int[] key2col, int[] col2key) throws Exception
Process a data block, with or without a loop_.
Passed an array of field names, this method fills two int[] arrays. The first, key2col, maps desired key values to actual order of appearance (column number) in the file; the second, col2key, is a reverse look-up for that, mapping column numbers to desired field indices. When called within a loop_ context, this.columnData will be created but not filled.
Alternatively, if fields is null, then this.fieldNames is filled, in order, with key data, and both key2col and col2key will be simply 0,1,2,... This array is used in cases such as matrices for which there are simply too many possibilities to list, and the key name itself contains information that we need.
When not a loop_ context, keys are expected to be in the mmCIF form category.subkey and will be unique within a data block (see http://mmcif.wwpdb.org/docs/tutorials/mechanics/pdbx-mmcif-syntax.html). Keys and data will be read for all data in the same category, filling this.columnData. In this way, the calling class does not need to enumerate all possible category names, but instead can focus on just those of interest.
- Specified by:
parseDataBlockParameters
in interface GenericCifDataParser
- Parameters:
fields
- list of normalized field names, such as "_pdbx_struct_assembly_gen_assembly_id" (with "_" instead of ".")
key
- null to indicate a loop_ construct, otherwise the initial category.subkey found
data
- when not loop_ the initial data read, otherwise ignored
key2col
- map of desired keys to actual columns
col2key
- map of actual columns to desired keys
- Throws:
Exception
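The relationship between key2col and col2key can be sketched as follows. This is a minimal standalone illustration, not the method's implementation: the class ColumnMapSketch, the method mapColumns, and the NOT_FOUND sentinel of -1 are all assumptions introduced for the example (the value the parser really uses for an unmatched entry is not given here).

```java
import java.util.Arrays;

// Hypothetical sketch of the two-way column mapping; not the parser's actual code.
final class ColumnMapSketch {

    // Assumed sentinel for "no match".
    static final int NOT_FOUND = -1;

    /**
     * fields      - the field names the caller is interested in (desired keys)
     * fileColumns - the column names in the order they appear in the file
     * key2col     - filled so that key2col[key] is the file column for that key
     * col2key     - filled so that col2key[col] is the desired key for that column
     */
    static void mapColumns(String[] fields, String[] fileColumns,
                           int[] key2col, int[] col2key) {
        Arrays.fill(key2col, NOT_FOUND);
        Arrays.fill(col2key, NOT_FOUND);
        for (int col = 0; col < fileColumns.length; col++) {
            for (int key = 0; key < fields.length; key++) {
                if (fields[key].equals(fileColumns[col])) {
                    key2col[key] = col;
                    col2key[col] = key;
                    break;
                }
            }
        }
    }
}
```

With desired fields {"_atom_site_label", "_atom_site_fract_x"} and file columns in the order fract_x, occupancy, label, the caller can look up either direction without rescanning the names, and the occupancy column is simply marked as not of interest.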
-
fixKey
- Specified by:
fixKey
in interface GenericCifDataParser
-
setString
sets global str and line to be parsed from the beginning
\1 .... \1 indicates an embedded fully escaped data object
- Parameters:
str
- new data string
- Returns:
- str
-
prepareNextLine
sets the string for parsing to be from the next line when the token buffer is empty, and if ';' is at the beginning of that line, extends the string to include that full multiline string. Uses \1 to indicate that this is a special quotation.
- Returns:
- the next line or null if EOF
- Throws:
Exception
-
preprocessString
Preprocess the string on a line starting with a semicolon to produce a string with a \1 ... \1 segment that will be picked up in the next round
- Returns:
- escaped part with attached extra data
- Throws:
Exception
-
preprocessSemiString
Encapsulate a multi-line ; .... ; string with \1 ... \1 (CIF 1.0 and CIF 2.0)
- Returns:
- encapsulated string
- Throws:
Exception
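The \1 ... \1 encapsulation of a semicolon-delimited text field can be pictured with the sketch below. It is only an illustration under stated assumptions: the class SemiStringSketch and the method encapsulate are hypothetical, lines are supplied through an Iterator rather than the parser's reader, and the exact handling of line breaks and of data following the closing semicolon in the real method is not documented here.

```java
import java.util.Iterator;

// Hypothetical sketch of wrapping a ;...; text field in \1 markers; not the parser's code.
final class SemiStringSketch {

    // The marker character written as \1 in the descriptions above (U+0001).
    static final char MARK = '\u0001';

    /**
     * firstLine - the line that begins with ';' and opens the text field
     * lines     - the lines that follow it
     * Returns MARK + field text + MARK, followed by whatever trailed the closing ';'.
     */
    static String encapsulate(String firstLine, Iterator<String> lines) {
        StringBuilder sb = new StringBuilder();
        sb.append(MARK).append(firstLine.substring(1));
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.startsWith(";")) {
                // Closing semicolon: end the escaped segment, keep any trailing data.
                return sb.append(MARK).append(line.substring(1)).toString();
            }
            sb.append('\n').append(line);
        }
        // Input ended without a closing ';' (assumed here to close the segment anyway).
        return sb.append(MARK).toString();
    }
}
```

Marking the segment with a character that cannot occur in normal CIF text lets the tokenizer treat the whole multi-line value as a single quoted token on the next pass.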
-
unquoted
In CIF 2.0, this method turns a String into an Integer or Float. In CIF 1.0 (here) it just returns the unchanged value.
- Parameters:
s
- unquoted string
- Returns:
- unchanged value
-
isTerminator
protected boolean isTerminator(char c)
The token terminator is space or tab in CIF 1.0, but it can be quoted strings in CIF 2.0.
- Parameters:
c
-
- Returns:
- true if this character is a terminator
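A terminator test of this kind might be used as sketched below to find the end of an unquoted token. The names TerminatorSketch and readUnquoted are hypothetical, and passing the optional terminator as a parameter with '\0' meaning "none" is an assumption made for the example rather than a description of how the cterm field is actually consulted.

```java
// Hypothetical sketch of terminator-based token scanning; not the parser's actual code.
final class TerminatorSketch {

    /** Space and tab always terminate; cterm terminates too unless it is '\0'. */
    static boolean isTerminator(char c, char cterm) {
        return c == ' ' || c == '\t' || (cterm != '\0' && c == cterm);
    }

    /** Returns the unquoted token that starts at index ich in str. */
    static String readUnquoted(String str, int ich, char cterm) {
        int start = ich;
        while (ich < str.length() && !isTerminator(str.charAt(ich), cterm)) {
            ich++;
        }
        return str.substring(start, ich);
    }
}
```

The second argument to isTerminator reflects the cterm description earlier on this page, where a closing } or ] can end a token inside a CIF 2.0 structure.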
-
isQuote
protected boolean isQuote(char ch)
CIF 1.0 only; we handle various quote types here
- Parameters:
ch
-
- Returns:
- true if this character is a (starting) quote
-
getQuotedStringOrObject
CIF 1.0 only.
- Parameters:
ch
- current character being pointed to
- Returns:
- a String data object
-
readList
Read a CIF 2.0 list structure, converting it to either a JSON string or to a Java data structure
- Returns:
- a string or data structure, depending upon setting asObject
- Throws:
Exception
-
skipNextToken
- Specified by:
skipNextToken
in interface GenericCifDataParser
- Throws:
Exception
-