summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--.gitignore2
-rw-r--r--README.md102
-rw-r--r--parsebib.el635
3 files changed, 599 insertions, 140 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..6db6ff5
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+*.elc
+testbib.bib
diff --git a/README.md b/README.md
index 02e2b86..05a8f27 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,104 @@
Parsebib
=======
-(c) 2014 Joost Kremers
+(c) 2014-2017 Joost Kremers
-`Parsebib` is an Elisp library for reading `.bib` files. It provides functions for going through a `.bib` file and reading the entries one by one. There are separate functions for reading `@Preamble`, `@String` and `@Comment` items.
+`Parsebib` is an Elisp library for reading `.bib` files. It provides two different APIs, a higher-level one that reads all items in one go, and a lower-level one that reads one item at a time. Supported items are `@Preamble`, `@String` and `@Comment` items, and obviously actual bibliographic entries.
-To get started reading a `.bib` file, use the function `parsebib-find-next-item`, which finds the first BibTeX item following point, where an item is either an entry, or a `@Preamble`, `@String`, or `@Comment`. This function moves point to the item and returns the item’s name as a string without the `@` (i.e., either the entry type, or `"preamble"`, `"string"`, or `"comment"`). The functions `parsebib-read-preamble`, `parsebib-read-string`, `parsebib-read-comment`, and `parsebib-read-entry` all do as their names suggest. They start reading at the beginning of the line that point is on and thus can be used after `parsebib-find-next-item`. Note that `parsebib-read-entry` takes the entry type (as returned by `parsebib-find-next-entry`) as argument. All these functions take an optional position argument (either a number or a marker), from which to start searching, which defaults to point.
+Both APIs parse the current buffer. If you wish to combine multiple `.bib` files, you need to parse each separately.
-The reading functions return the contents of the item they read: `parsebib-read-preamble` and `parsebib-read-comment` return the text as a string. `parsebib-read-string` returns a cons cell of the form `(<abbrev> . <string>)`, and `parsebib-read-entry` returns the entry as an alist of `(<field> . <value>)` pairs. The alist contains an element with the key `=type=`, which holds the entry type, and an element with the key `=key=`, which holds the entry key. All functions move point to the end of the entry.
-The reading functions return `nil` if they do not find the element they should be reading at the line point is on. Point is nonetheless moved, however. Similarly, `parsebib-find-next-item` returns `nil` if it finds no next entry, leaving point at the end of the buffer. Additionally, it will signal an error of type `parsebib-entry-type-error` if it finds something that it deems to be an invalid item name. What is considered to be a valid name is determined by the regexp `parsebib-bibtex-identifier`, which is set to `"[^^\"@\\&$#%',={}() \t\n\f]*"`, meaning that any string not containing whitespace or any of the characters `^"@\&$#%',={}()` is considered a valid identifier.
+Resolving `@string` abbreviations and cross-references
+------------------------------------------------------
+
+Parsebib can resolve `@string` abbrevs and cross-references while reading the contents of a `.bib` file. When `@string` abbrevs are resolved, abbreviations in field values (or `@string` definitions) are replaced with their expansion. In addition, the braces or double quotes around field values are removed, and multiple spaces and newlines in sequence are reduced to a single space. In essence, the field values are modified in such a way that they are suitable for display, but they no longer reliably represent the contents of the `.bib` file. When `@string` abbrevs are not resolved, no modifications are applied to the field values, so that the parsing results reflect the contents of the `.bib` file accurately.
+
+Cross-references can also be resolved. This means that if an entry that has a `crossref` field, fields in the cross-referenced entry that are not already part of the cross-referencing entry are added to it. Both BibTeX's (rather simplistic) inheritance rule and BibLaTeX's more sophisticated inheritance schema are supported. It is also possible to specify a custom inheritance schema. Note that resolving cross-references can be done independently from resolving `@string` abbrevs, but the former generally won't make sense without the latter.
+
+Resolving `@string` abbrevs can be done with both the higher-level and the lower-level API. Resolving cross-references can only be done with the higher-level API. This is mainly because cross-referenced entries appear *after* cross-referencing entries in the `.bib` file, so that when an entry with a `crossref` field is read, its cross-referenced entry is not known yet, while `@string` definitions appear in the `.bib` file before they are used. It is possible, however, to resolve cross-references after all entries have been read.
+
+
+Higher-level API
+----------------
+
+The higher-level API consists of functions that read and return all items of a specific type in the current buffer. They do not move point.
+
+
+### `parsebib-collect-entries (&optional hash strings inheritance)` ###
+
+Collect all entries in the current buffer and return them as a hash table, where the keys correspond to the BibTeX keys and the values are alists consisting of `(<field> . <value>)` pairs of the relevant entries. In this alist, the BibTeX key and the entry type are stored under `=key=` and `=type=`, respectively.
+
+The variable `hash` can be used to pass a hash table in which the entries are stored. This can be used to combine multiple `.bib` files into a single hash table, or to update an existing hash table by rereading its `.bib` file.
+
+If the variable `strings` is present, `@string` abbreviations are expanded. `strings` should be a hash table of `@string` definitions as returned by `parsebib-collect-strings`.
+
+If the variable `inheritance` is present, cross-references among entries are resolved. It can be `t`, in which case the file-local or global value of `bibtex-dialect` is used to determine which inheritance schema is used. It can also be one of the symbols `BibTeX` or `biblatex`, or it can be a custom inheritance schema.
+
+
+### `parsebib-collect-strings (&optional hash expand-strings)` ###
+
+Collect all `@string` definitions in the current buffer and return them as a hash table. The variable `hash` can be used to provide a hash table to store the definitions in. If it is `nil`, a new hash table is created.
+
+The argument `expand-strings` is a boolean value. If non-nil, any abbreviations found in the string definitions are expanded against the `@string` definitions appearing earlier in the `.bib` file and against `@string` definitions in `hash`, if provided.
+
+
+### `parsebib-collect-preambles` ###
+
+Collect all `@preamble` definitions in the current buffer and return them as a list.
+
+
+### `parsebib-collect-comments` ###
+
+Collect all `@comments` in the current buffer and return them as a list.
+
+
+### `parsebib-find-bibtex-dialect` ###
-There is one additional function: `parsebib-find-bibtex-dialect`. This function looks for a local variable block in a `@Comment` and checks if the variable `bibtex-dialect` is set. If it is, it returns the value that is set (as a symbol). The value should be one of the elements in `bibtex-dialect-list`, i.e., by default one of `(BibTeX biblatex)`.
+Find and return the BibTeX dialect for the current buffer. The BibTeX dialect is either `BibTeX` or `biblatex` and can be defined in a local-variable block at the end of the file.
+
+
+### `parsebib-parse-buffer (&optional entries strings expand-strings inheritance)` ###
+
+Collect all BibTeX data in the current buffer. Return a five-element list:
+
+ (<entries> <strings> <preambles> <comments> <BibTeX dialect>)
+
+The `<entries>` and `<strings>` are hash tables, `<preambles>` and `<comments>` are lists, `<BibTeX dialect>` is a symbol (either `BibTeX` or `biblatex`) or `nil`.
+
+If the arguments `entries` and `strings` are present, they should be hash tables with `equal` as the `:test` function. They are then used to store the entries and strings, respectively.
+
+The argument `expand-strings` functions as the same-name argument in `parsebib-collect-strings`, and `inheritance` functions as the same-name argument in `parsebib-collect-entries`.
+
+Note that `parsebib-parse-buffer` only makes one pass through the buffer. It is therefore a bit faster than calling all the `parsebib-collect-*` functions above in a row, since that would require making four passes through the buffer.
+
+
+### `parsebib-expand-xrefs (entries inheritance)` ###
+
+Expand cross-references in `entries` according to inheritance schema `inheritance`. `entries` should be a hash table as returned by `parsebib-collect-entries`. Each entry with a `crossref` field is expanded as described above. The results are stored in the hash table `entries` again, the return value of this function is always `nil`.
+
+
+Lower-level API
+---------------
+
+The lower-level API consists of functions that do the actual reading of a BibTeX item. Unlike the higher-level API, the functions here are dependent on the position of `point`. They are meant to be used in a `while` loop in which `parsebib-find-next-item` is used to move `point` to the next item and then use one of the `parsebib-read-*` functions to read the contents of the item.
+
+All functions here take an optional position argument, which is the position in the buffer from which they should start reading. The default value is `(point)`.
+
+
+### `parsebib-find-next-item (&optional pos)` ###
+
+Find the first BibTeX item following `pos`, where an item is either a BibTeX entry, or a `@Preamble`, `@String`, or `@Comment`. This function returns the item's type as a string, i.e., either `"preamble"`, `"string"`, or `"comment"`, or the entry type. Note that the `@` is *not* part of the returned string. This function moves point into the correct position to start reading the actual contents of the item, which is done by one of the following functions.
+
+
+### `parsebib-read-string (&optional pos strings)` ###
+### `parsebib-read-entry (type &optional pos strings)` ###
+### `parsebib-read-preamble (&optional pos)` ###
+### `parsebib-read-comment (&optional pos)` ###
+
+These functions do what their names suggest: read one single item of the type specified. Each takes the `pos` argument just mentioned. In addition, `parsebib-read-string` and `parsebiib-read-entry` take an extra argument, a hash table of `@string` definitions. When provided, abbreviations in the `@string` definitions or in field values are expanded. Note that `parsebib-read-entry` takes the entry type (as returned by `parsebib-find-next-entry`) as argument.
+
+The reading functions return the contents of the item they read: `parsebib-read-preamble` and `parsebib-read-comment` return the text as a string. `parsebib-read-string` returns a cons cell of the form `(<abbrev> . <string>)`, and `parsebib-read-entry` returns the entry as an alist of `(<field> . <value>)` pairs. One of these pairs contains the entry type `=type=`, and one contains the entry key. These have the keys `"=key="` and `"=type="`, respectively.
+
+Note that all `parsebib-read*` functions move point to the end of the entry.
+
+The reading functions return `nil` if they do not find the element they should be reading at the line point is on. Point is nonetheless moved, however. Similarly, `parsebib-find-next-item` returns `nil` if it finds no next entry, leaving point at the end of the buffer. Additionally, it will signal an error of type `parsebib-entry-type-error` if it finds something that it deems to be an invalid item name. What is considered to be a valid name is determined by the regexp `parsebib-bibtex-identifier`, which is set to `"[^^\"@\\&$#%',={}() \t\n\f]*"`, meaning that any string not containing whitespace or any of the characters `^"@\&$#%',={}()` is considered a valid identifier.
diff --git a/parsebib.el b/parsebib.el
index 2ed9494..d66b39a 100644
--- a/parsebib.el
+++ b/parsebib.el
@@ -1,12 +1,12 @@
-;;; parsebib.el --- A library for parsing bib files
+;;; parsebib.el --- A library for parsing bib files -*- lexical-binding: t -*-
-;; Copyright (c) 2014 Joost Kremers
+;; Copyright (c) 2014-2017 Joost Kremers
;; All rights reserved.
;; Author: Joost Kremers <joostkremers@fastmail.fm>
;; Maintainer: Joost Kremers <joostkremers@fastmail.fm>
;; Created: 2014
-;; Version: 1.0
+;; Version: 2.3
;; Keywords: text bibtex
;; Package-Requires: ((emacs "24.3"))
@@ -41,39 +41,154 @@
(require 'bibtex)
(require 'cl-lib)
-
-(defconst parsebib--bibtex-identifier "[^^\"@\\&$#%',={}() \t\n\f]*" "Regexp describing a licit BibTeX identifier.")
-(defconst parsebib--key-regexp "[^^\"@\\&$#%',={} \t\n\f]*" "Regexp describing a licit key.")
+(require 'subr-x) ; for `string-join'.
+
+(defvar parsebib--biblatex-inheritances '(("all"
+ "all"
+ (("ids" . none)
+ ("crossref" . none)
+ ("xref" . none)
+ ("entryset" . none)
+ ("entrysubtype" . none)
+ ("execute" . none)
+ ("label" . none)
+ ("options" . none)
+ ("presort" . none)
+ ("related" . none)
+ ("relatedoptions" . none)
+ ("relatedstring" . none)
+ ("relatedtype" . none)
+ ("shorthand" . none)
+ ("shorthandintro" . none)
+ ("sortkey" . none)))
+
+ ("mvbook, book"
+ "inbook, bookinbook, suppbook"
+ (("author" . "author")
+ ("author" . "bookauthor")))
+
+ ("mvbook"
+ "book, inbook, bookinbook, suppbook"
+ (("title" . "maintitle")
+ ("subtitle" . "mainsubtitle")
+ ("titleaddon" . "maintitleaddon")
+ ("shorttitle" . none)
+ ("sorttitle" . none)
+ ("indextitle" . none)
+ ("indexsorttitle" . none)))
+
+ ("mvcollection, mvreference"
+ "collection, reference, incollection, inreference, suppcollection"
+ (("title" . "maintitle")
+ ("subtitle" . "mainsubtitle")
+ ("titleaddon" . "maintitleaddon")
+ ("shorttitle" . none)
+ ("sorttitle" . none)
+ ("indextitle" . none)
+ ("indexsorttitle" . none)))
+
+ ("mvproceedings"
+ "proceedings, inproceedings"
+ (("title" . "maintitle")
+ ("subtitle" . "mainsubtitle")
+ ("titleaddon" . "maintitleaddon")
+ ("shorttitle" . none)
+ ("sorttitle" . none)
+ ("indextitle" . none)
+ ("indexsorttitle" . none)))
+
+ ("book"
+ "inbook, bookinbook, suppbook"
+ (("title" . "booktitle")
+ ("subtitle" . "booksubtitle")
+ ("titleaddon" . "booktitleaddon")
+ ("shorttitle" . none)
+ ("sorttitle" . none)
+ ("indextitle" . none)
+ ("indexsorttitle" . none)))
+
+ ("collection, reference"
+ "incollection, inreference, suppcollection"
+ (("title" . "booktitle")
+ ("subtitle" . "booksubtitle")
+ ("titleaddon" . "booktitleaddon")
+ ("shorttitle" . none)
+ ("sorttitle" . none)
+ ("indextitle" . none)
+ ("indexsorttitle" . none)))
+
+ ("proceedings"
+ "inproceedings"
+ (("title" . "booktitle")
+ ("subtitle" . "booksubtitle")
+ ("titleaddon" . "booktitleaddon")
+ ("shorttitle" . none)
+ ("sorttitle" . none)
+ ("indextitle" . none)
+ ("indexsorttitle" . none)))
+
+ ("periodical"
+ "article, suppperiodical"
+ (("title" . "journaltitle")
+ ("subtitle" . "journalsubtitle")
+ ("shorttitle" . none)
+ ("sorttitle" . none)
+ ("indextitle" . none)
+ ("indexsorttitle" . none))))
+ "Inheritance scheme for BibLaTeX cross-referencing.
+Inheritances are specified for pairs of source and target entry
+type, where the target is the cross-referencing entry and the
+source the cross-referenced entry. Each pair specifies the
+fields in the source and the fields in the target that they
+correspond with.
+
+Inheritances valid for all entry types are defined by specifying
+the entry type as \"all\". The entry type may also be a
+comma-separated list of entry types.
+
+If no inheritance rule is set up for a given entry type+field
+combination, the field inherits from the same-name field in the
+cross-referenced entry. If no inheritance should take place, the
+target field is set to the symbol `none'.")
+
+;; Regexes describing BibTeX identifiers and keys. Note that while $ ^ & are
+;; valid in BibTeX keys, they may nonetheless be problematic, because they are
+;; special for TeX. The difference between `parsebib--bibtex-identifier' and
+;; `parsebib--key-regexp' are the parentheses (), which are valid in keys. It may in
+;; fact not be necessary (or desirable) to distinguish the two, but until
+;; someone complains, I'll keep it this way.
+(defconst parsebib--bibtex-identifier "[^\"@\\#%',={}() \t\n\f]+" "Regexp describing a licit BibTeX identifier.")
+(defconst parsebib--key-regexp "[^\"@\\#%',={} \t\n\f]+" "Regexp describing a licit key.")
(defconst parsebib--entry-start "^[ \t]*@" "Regexp describing the start of an entry.")
;; Emacs 24.3 compatibility code.
-(if (version-list-< (version-to-list emacs-version) '(24 4))
- ;; This definition is simply copied from the Emacs 24.4 sources
- (defun define-error (name message &optional parent)
- "Define NAME as a new error signal.
+(unless (fboundp 'define-error)
+ ;; This definition is simply copied from the Emacs 24.4 sources
+ (defun define-error (name message &optional parent)
+ "Define NAME as a new error signal.
MESSAGE is a string that will be output to the echo area if such an error
is signaled without being caught by a `condition-case'.
PARENT is either a signal or a list of signals from which it inherits.
Defaults to `error'."
- (unless parent (setq parent 'error))
- (let ((conditions
- (if (consp parent)
- (apply #'nconc
- (mapcar (lambda (parent)
- (cons parent
- (or (get parent 'error-conditions)
- (error "Unknown signal `%s'" parent))))
- parent))
- (cons parent (get parent 'error-conditions)))))
- (put name 'error-conditions
- (delete-dups (copy-sequence (cons name conditions))))
- (when message (put name 'error-message message)))))
+ (unless parent (setq parent 'error))
+ (let ((conditions
+ (if (consp parent)
+ (apply #'nconc
+ (mapcar (lambda (parent)
+ (cons parent
+ (or (get parent 'error-conditions)
+ (error "Unknown signal `%s'" parent))))
+ parent))
+ (cons parent (get parent 'error-conditions)))))
+ (put name 'error-conditions
+ (delete-dups (copy-sequence (cons name conditions))))
+ (when message (put name 'error-message message)))))
(define-error 'parsebib-entry-type-error "Illegal entry type" 'error)
-;;;;;;;;;;;;;;;;;;;;
-;; matching stuff ;;
-;;;;;;;;;;;;;;;;;;;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; matching and parsing stuff ;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun parsebib--looking-at-goto-end (str &optional match)
"Like `looking-at' but move point to the end of the matching string STR.
@@ -90,48 +205,37 @@ This function handles parentheses () and braces {}. Return t if a
matching parenthesis was found. Note that this function puts
point right before the closing delimiter (unlike e.g.,
`forward-sexp', which puts it right after.)"
- (let ((result (cond
- ((eq (char-after) ?\{)
- (parsebib--match-brace-forward))
- ((eq (char-after) ?\()
- ;; This is really a hack. We want to allow unbalanced parentheses in
- ;; field values (BibTeX does), so we cannot use forward-sexp
- ;; here. For the same reason, looking for the matching paren by hand
- ;; is pretty complicated. However, balanced parentheses can only be
- ;; used to enclose entire entries (or @STRINGs or @PREAMBLEs) so we
- ;; can be pretty sure we'll find it right before the next @ at the
- ;; start of a line, or right before the end of the file.
- (let ((beg (point)))
- (re-search-forward parsebib--entry-start nil 0)
- (skip-chars-backward "@ \n\t\f")
- (if (eq (char-after) ?\))
- ;; if we've found a closing paren, return t
- t
- ;; otherwise put the cursor back and signal an error
- (goto-char beg)
- (signal 'scan-error (list "Unbalanced parentheses" beg (point-max)))))))))
- (when result
- ;; move point one char back to place it where the rest of parsebib expects it
- (forward-char -1)
- ;; make sure we return t
- result)))
+ (cond
+ ((eq (char-after) ?\{)
+ (parsebib--match-brace-forward))
+ ((eq (char-after) ?\()
+ ;; This is really a hack. We want to allow unbalanced parentheses in
+ ;; field values (BibTeX does), so we cannot use forward-sexp
+ ;; here. For the same reason, looking for the matching paren by hand
+ ;; is pretty complicated. However, balanced parentheses can only be
+ ;; used to enclose entire entries (or @STRINGs or @PREAMBLEs) so we
+ ;; can be pretty sure we'll find it right before the next @ at the
+ ;; start of a line, or right before the end of the file.
+ (let ((beg (point)))
+ (re-search-forward parsebib--entry-start nil 0)
+ (skip-chars-backward "@ \n\t\f")
+ (if (eq (char-after) ?\))
+ ;; if we've found a closing paren, return t
+ t
+ ;; otherwise put the cursor back and signal an error
+ (goto-char beg)
+ (signal 'scan-error (list "Unbalanced parentheses" beg (point-max))))))))
(defun parsebib--match-delim-forward ()
"Move forward to the closing delimiter matching the delimiter at point.
This function handles braces {} and double quotes \"\". Return t
-if a matching delimiter was found. Note that this function puts
-point right before the closing delimiter (unlike e.g.,
-`forward-sexp', which puts it right after.)"
+if a matching delimiter was found."
(let ((result (cond
((eq (char-after) ?\{)
(parsebib--match-brace-forward))
((eq (char-after) ?\")
(parsebib--match-quote-forward)))))
- (when result
- ;; move point one char back to place it where the rest of parsebib expects it
- (forward-char -1)
- ;; make sure we return t
- result)))
+ result))
(defun parsebib--match-brace-forward ()
"Move forward to the closing brace matching the opening brace at point."
@@ -147,30 +251,130 @@ point right before the closing delimiter (unlike e.g.,
;; if forward-sexp does not result in an error, we want to return t
t))
-
-;;;;;;;;;;;;;;;;;;;;;;;;
-;; parsing a bib file ;;
-;;;;;;;;;;;;;;;;;;;;;;;;
-
-(defun parsebib-find-bibtex-dialect ()
- "Find the BibTeX dialect of a file if one is set.
-This function looks for a local value of the variable
-`bibtex-dialect' in the local variable block at the end of the
-file. Return nil if no dialect is found."
- (save-excursion
- (goto-char (point-max))
- (let ((case-fold-search t))
- (when (re-search-backward (concat parsebib--entry-start "comment") (- (point-max) 3000) t)
- (let ((comment (parsebib-read-comment)))
- (when (and comment
- (string-match-p "\\`[ \n\t\r]*Local Variables:" comment)
- (string-match-p "End:[ \n\t\r]*\\'" comment)
- (string-match (concat "bibtex-dialect: " (regexp-opt (mapcar #'symbol-name bibtex-dialect-list) t)) comment))
- (intern (match-string 1 comment))))))))
+(defun parsebib--parse-value (limit &optional strings)
+ "Parse value at point.
+A value is either a field value or a @String expansion. Return
+the value as a string. No parsing is done beyond LIMIT, but note
+that parsing may stop well before LIMIT.
+
+STRINGS, if non-nil, is a hash table of @String definitions.
+@String abbrevs in the value to be parsed are then replaced with
+their expansions. Additionally, newlines in field values are
+removed, white space is reduced to a single space and braces or
+double quotes around field values are removed."
+ (let (res)
+ (while (and (< (point) limit)
+ (not (looking-at-p ",")))
+ (cond
+ ((looking-at-p "[{\"]")
+ (let ((beg (point)))
+ (parsebib--match-delim-forward)
+ (push (buffer-substring-no-properties beg (point)) res)))
+ ((looking-at parsebib--bibtex-identifier)
+ (push (buffer-substring-no-properties (point) (match-end 0)) res)
+ (goto-char (match-end 0)))
+ ((looking-at "[[:space:]]*#[[:space:]]*")
+ (goto-char (match-end 0)))
+ (t (forward-char 1)))) ; so as not to get stuck in an infinite loop.
+ (if strings
+ (string-join (parsebib--expand-strings (nreverse res) strings))
+ (string-join (nreverse res) " # "))))
+
+;;;;;;;;;;;;;;;;;;;;;
+;; expanding stuff ;;
+;;;;;;;;;;;;;;;;;;;;;
+
+(defun parsebib--expand-strings (strings abbrevs)
+ "Expand strings in STRINGS using expansions in ABBREVS.
+STRINGS is a list of strings. If a string in STRINGS has an
+expansion in hash table ABBREVS, replace it with its expansion.
+Otherwise, if the string is enclosed in braces {} or double
+quotes \"\", remove the delimiters. In addition, newlines and
+multiple spaces in the string are replaced with a single space."
+ (mapcar (lambda (str)
+ (setq str (replace-regexp-in-string "[ \t\n\f]+" " " str))
+ (cond
+ ((gethash str abbrevs))
+ ((string-match "\\`[\"{]\\(.*?\\)[\"}]\\'" str)
+ (match-string 1 str))
+ (t str)))
+ strings))
+
+(defun parsebib-expand-xrefs (entries inheritance)
+ "Expand cross-referencing items in ENTRIES.
+BibTeX entries in ENTRIES that have a `crossref' field are
+expanded with the fields in the cross-referenced entry. ENTRIES
+is a hash table with entries. This hash table is updated with
+the new fields. The return value of this function is always nil.
+
+INHERITANCE indicates the inheritance schema. It can be a symbol
+`BibTeX' or `biblatex', or it can be an explicit inheritance
+schema. See the variable `parsebib--biblatex-inheritances' for
+details on the structure of such an inheritance schema."
+ (maphash (lambda (key fields)
+ (let ((xref (cdr (assoc-string "crossref" fields))))
+ (when xref
+ (if (string-match-p (concat "\\b[\"{]" parsebib--key-regexp "[\"}]\\b") xref)
+ (setq xref (substring xref 1 -1)))
+ (let* ((source (gethash xref entries))
+ (updated-entry (parsebib--get-xref-fields fields source inheritance)))
+ (when updated-entry
+ (puthash key updated-entry entries))))))
+ entries))
+
+(defun parsebib--get-xref-fields (target-entry source-entry inheritance)
+ "Return TARGET-ENTRY supplemented with fields inherited from SOURCE-ENTRY.
+TARGET-ENTRY and SOURCE-ENTRY are entry alists. Fields in
+SOURCE-ENTRY for which TARGET-ENTRY has no value are added to
+TARGET-ENTRY. Return value is the modified TARGET-ENTRY.
+
+INHERITANCE is an inheritance schema. It can either be one of
+the symbols `BibTeX' or `biblatex', or it can be an explicit
+inheritance schema. See the variable
+`parsebib--biblatex-inheritances' for details on the structure of
+such an inheritance schema."
+ (when (and target-entry source-entry)
+ (when (eq inheritance 'biblatex)
+ (setq inheritance parsebib--biblatex-inheritances))
+ (let* ((inheritable-fields (unless (eq inheritance 'BibTeX)
+ (append (cl-third (cl-find-if (lambda (elem)
+ (and (string-match-p (concat "\\b" (cdr (assoc-string "=type=" source-entry)) "\\b") (cl-first elem))
+ (string-match-p (concat "\\b" (cdr (assoc-string "=type=" target-entry)) "\\b") (cl-second elem))))
+ inheritance))
+ (cl-third (assoc-string "all" inheritance)))))
+ (new-fields (delq nil (mapcar (lambda (field)
+ (let ((target-field (parsebib--get-target-field (car field) inheritable-fields)))
+ (if (and target-field
+ (not (assoc-string target-field target-entry 'case-fold)))
+ (cons target-field (cdr field)))))
+ source-entry))))
+ (append target-entry new-fields))))
+
+(defun parsebib--get-target-field (source-field inheritances)
+ "Return the target field for inheritance from SOURCE-FIELD.
+Inheritance is determined by INHERITANCES, which is an alist of
+source/target pairs. If no inheritance should take place for
+SOURCE-FIELD, the target in the relevant item in INHERITANCES is
+the symbol `none'. If there is no item for SOURCE-FIELD in
+INHERITANCES, SOURCE-FIELD is returned. Note that it is valid
+for INHERITANCES to be nil."
+ ;; Note: the argument INHERITANCES differs from the INHERITANCE argument in
+ ;; the previous two functions. It is a simple alist of (source-field
+ ;; . target-field) pairs.
+ (let ((target-field (cdr (assoc-string source-field inheritances 'case-fold))))
+ (cond
+ ((null target-field)
+ source-field)
+ ((eq target-field 'none)
+ nil)
+ (t target-field))))
+
+;;;;;;;;;;;;;;;;;;;
+;; low-level API ;;
+;;;;;;;;;;;;;;;;;;;
(defun parsebib-find-next-item (&optional pos)
"Find the first (potential) BibTeX item following POS.
-
This function simply searches for an @ at the start of a line,
possibly preceded by spaces or tabs, followed by a string of
characters as defined by `parsebib--bibtex-identifier'. When
@@ -187,15 +391,19 @@ at the end of the buffer.
POS can be a number or a marker and defaults to point."
(when pos (goto-char pos))
- (when (re-search-forward parsebib--entry-start nil 0)
- (if (parsebib--looking-at-goto-end (concat "\\(" parsebib--bibtex-identifier "\\)" "[[:space:]]*[\(\{]") 1)
- (match-string 1)
+ (when (re-search-forward parsebib--entry-start nil 0)
+ (if (parsebib--looking-at-goto-end (concat "\\(" parsebib--bibtex-identifier "\\)" "[[:space:]]*[\(\{]?") 1)
+ (match-string-no-properties 1)
(signal 'parsebib-entry-type-error (list (point))))))
(defun parsebib-read-comment (&optional pos)
"Read the @Comment beginning at the line POS is on.
-Return value is the text of the @Comment or nil if no comment is
-found.
+Return value is the text of the @Comment including the braces.
+For comments that last until the end of the line (i.e., comments
+that are not delimited by braces), the return value includes the
+whitespace between `@comment' and the actual comment text.
+
+If no comment could be found, return nil.
POS can be a number or a marker. It does not have to be at the
beginning of a line, but the @Comment entry must start at the
@@ -203,44 +411,46 @@ beginning of the line POS is on. If POS is nil, it defaults to
point."
(when pos (goto-char pos))
(beginning-of-line)
- (when (parsebib--looking-at-goto-end (concat parsebib--entry-start "comment[[:space:]]*[\(\{]"))
- (let ((beg (point))) ; we are right after the opening brace / parenthesis
- (forward-char -1) ; move back to the brace / paren
- (when (parsebib--match-paren-forward)
- (buffer-substring-no-properties beg (point))))))
+ (when (parsebib--looking-at-goto-end (concat parsebib--entry-start "\\(comment\\)[[:space:]]*[\(\{]?") 1)
+ (let ((beg (point)))
+ (if (looking-at-p "[[:space:]]*[\(\{]")
+ (progn (skip-chars-forward "[:space:]")
+ (parsebib--match-paren-forward))
+ (goto-char (point-at-eol)))
+ (buffer-substring-no-properties beg (point)))))
-(defun parsebib-read-string (&optional pos)
+(defun parsebib-read-string (&optional pos strings)
"Read the @String definition beginning at the line POS is on.
-If a proper abbreviation and string are found, they are returned
-as a cons cell (<abbrev> . <string>). Otherwise, nil is returned.
+If a proper abbreviation and expansion are found, they are
+returned as a cons cell (<abbrev> . <expansion>). Otherwise, nil
+is returned.
POS can be a number or a marker. It does not have to be at the
beginning of a line, but the @String entry must start at the
beginning of the line POS is on. If POS is nil, it defaults to
-point."
+point.
+
+If STRINGS is provided it should be a hash table with string
+abbreviations, which are used to expand abbrevs in the string's
+expansion."
+ (interactive)
(when pos (goto-char pos))
(beginning-of-line)
- (when (parsebib--looking-at-goto-end (concat parsebib--entry-start "string[[:space:]]*[\(\{]"))
- (let ((limit (save-excursion ; find the position of the matching end parenthesis
- (forward-char -1)
+ (when (parsebib--looking-at-goto-end (concat parsebib--entry-start "\\(string[[:space:]]*\\)[\(\{]") 1)
+ (let ((limit (save-excursion
(parsebib--match-paren-forward)
(point))))
- (skip-chars-forward "\"#%'(),={} \n\t\f" limit) ; move up to the abbrev
- (let* ((beg (point)) ; read the abbrev
- (abbr (if (parsebib--looking-at-goto-end (concat "\\(" parsebib--bibtex-identifier "\\)[ \t\n\f]*=") 1)
- (buffer-substring-no-properties beg (point))
- nil)))
- (when (and abbr (> (length abbr) 0)) ; if we found an abbrev
- (skip-chars-forward "^\"{" limit) ; move forward to the definition
- (let* ((beg (point)) ; read the definition
- (string (if (parsebib--match-delim-forward)
- (buffer-substring-no-properties beg (1+ (point))))))
- (and string (> (length string) 0)
- (cons abbr string))))))))
+ (parsebib--looking-at-goto-end (concat "[({]\\(" parsebib--bibtex-identifier "\\)[[:space:]]*=[[:space:]]*"))
+ (let ((abbr (match-string-no-properties 1)))
+ (when (and abbr (> (length abbr) 0)) ; if we found an abbrev
+ (let ((expansion (parsebib--parse-value limit strings)))
+ (goto-char limit)
+ (cons abbr expansion)))))))
(defun parsebib-read-preamble (&optional pos)
"Read the @Preamble definition at the line POS is on.
-Return the preamble as a string, or nil if none was found.
+Return the preamble as a string (including the braces surrounding
+the preamble text), or nil if no preamble was found.
POS can be a number or a marker. It does not have to be at the
beginning of a line, but the @Preamble must start at the
@@ -248,15 +458,13 @@ beginning of the line POS is on. If POS is nil, it defaults to
point."
(when pos (goto-char pos))
(beginning-of-line)
- (when (parsebib--looking-at-goto-end (concat parsebib--entry-start "preamble[[:space:]]*[\(\{]"))
+ (when (parsebib--looking-at-goto-end (concat parsebib--entry-start "\\(preamble[[:space:]]*\\)[\(\{]") 1)
(let ((beg (point)))
- (forward-char -1)
(when (parsebib--match-paren-forward)
(buffer-substring-no-properties beg (point))))))
-(defun parsebib-read-entry (type &optional pos)
+(defun parsebib-read-entry (type &optional pos strings)
"Read a BibTeX entry of type TYPE at the line POS is on.
-
TYPE should be a string and should not contain the @
sign. The return value is the entry as an alist of (<field> .
<contents>) cons pairs, or nil if no entry was found. In this
@@ -270,7 +478,11 @@ the line POS is on. If POS is nil, it defaults to point.
ENTRY should not be \"Comment\", \"Preamble\" or \"String\", but
is otherwise not limited to any set of possible entry types. If
so required, the calling function has to ensure that the entry
-type is valid."
+type is valid.
+
+If STRINGS is provided, it should be a hash table with string
+abbreviations, which are used to expand abbrevs in the entry's
+fields."
(unless (member-ignore-case type '("comment" "preamble" "string"))
(when pos (goto-char pos))
(beginning-of-line)
@@ -287,37 +499,192 @@ type is valid."
(buffer-substring-no-properties beg (point)))))
(or key (setq key "")) ; if no key was found, we pretend it's empty and try to read the entry anyway
(skip-chars-forward "^," limit) ; move to the comma after the entry key
- (let ((fields (cl-loop for field = (parsebib--find-bibtex-field limit)
+ (let ((fields (cl-loop for field = (parsebib--find-bibtex-field limit strings)
while field collect field)))
(push (cons "=type=" type) fields)
(push (cons "=key=" key) fields)
(nreverse fields))))))
-(defun parsebib--find-bibtex-field (limit)
+(defun parsebib--find-bibtex-field (limit &optional strings)
"Find the field after point.
Do not search beyond LIMIT (a buffer position). Return a
-cons (FIELD . VALUE), or nil if no field was found."
+cons (FIELD . VALUE), or nil if no field was found.
+
+If STRINGS is provided it should be a hash table with string
+abbreviations, which are used to expand abbrevs in the field's
+value."
(skip-chars-forward "\"#%'(),={} \n\t\f" limit) ; move to the first char of the field name
- (unless (>= (point) limit) ; if we haven't reached the end of the entry
+ (unless (>= (point) limit) ; if we haven't reached the end of the entry
(let ((beg (point)))
- (if (parsebib--looking-at-goto-end (concat "\\(" parsebib--bibtex-identifier "\\)[ \t\n\f]*=") 1)
+ (if (parsebib--looking-at-goto-end (concat "\\(" parsebib--bibtex-identifier "\\)[[:space:]]*=[[:space:]]*") 1)
(let ((field-type (buffer-substring-no-properties beg (point))))
- (skip-chars-forward "#%'()=} \n\t\f" limit) ; move to the field contents
- (let* ((beg (point))
- (field-contents (buffer-substring-no-properties beg (parsebib--find-end-of-field limit))))
+ (let ((field-contents (parsebib--parse-value limit strings)))
(cons field-type field-contents)))))))
-(defun parsebib--find-end-of-field (limit)
- "Move point to the end of a field's contents and return point.
-The contents of a field is delimited by a comma or by the closing brace of
-the entry. The latter should be at position LIMIT."
- (while (and (not (eq (char-after) ?\,))
- (< (point) limit))
- (parsebib--match-delim-forward) ; check if we're on a delimiter and if so, jump to the matching closing delimiter
- (forward-char 1))
- (if (= (point) limit)
- (skip-chars-backward " \n\t\f"))
- (point))
+;;;;;;;;;;;;;;;;;;;;
+;; high-level API ;;
+;;;;;;;;;;;;;;;;;;;;
+
+(defun parsebib-collect-preambles ()
+ "Collect all @Preamble definitions in the current buffer.
+Return a list of strings, each string a separate @Preamble."
+ (save-excursion
+ (goto-char (point-min))
+ (let (res)
+ (cl-loop for item = (parsebib-find-next-item)
+ while item do
+ (when (cl-equalp item "preamble")
+ (push (parsebib-read-preamble) res)))
+ (nreverse res))))
+
+(defun parsebib-collect-comments ()
+ "Collect all @Comment definitions in the current buffer.
+Return a list of strings, each string a separate @Comment."
+ (save-excursion
+ (goto-char (point-min))
+ (let (res)
+ (cl-loop for item = (parsebib-find-next-item)
+ while item do
+ (when (cl-equalp item "comment")
+ (push (parsebib-read-comment) res)))
+ (nreverse (delq nil res)))))
+
+(defun parsebib-collect-strings (&optional hash expand-strings)
+ "Collect all @String definitions in the current buffer.
+Return value is a hash with the abbreviations as keys and the
+expansions as values. If HASH is a hash table with test function
+`equal', it is used to store the @String definitions. If
+EXPAND-STRINGS is non-nil, @String expansions are expanded
+themselves using the @String definitions already stored in HASH."
+ (or (and (hash-table-p hash)
+ (eq 'equal (hash-table-test hash)))
+ (setq hash (make-hash-table :test #'equal)))
+ (save-excursion
+ (goto-char (point-min))
+ (cl-loop with string = nil
+ for item = (parsebib-find-next-item)
+ while item do
+ (when (cl-equalp item "string")
+ (setq string (parsebib-read-string nil (if expand-strings hash)))
+ (puthash (car string) (cdr string) hash)))
+ hash))
+
+(defun parsebib-collect-entries (&optional hash strings inheritance)
+ "Collect all entries in the current buffer.
+Return value is a hash table containing the entries. If HASH is
+a hash table, with test function `equal', it is used to store the
+entries. If STRINGS is non-nil, it should be a hash table of
+string definitions, which are used to expand abbreviations used
+in the entries.
+
+If INHERITANCE is non-nil, cross-references in the entries are
+resolved: if the crossref field of an entry points to an entry
+already in HASH, the fields of the latter that do not occur in
+the entry are added to it. INHERITANCE indicates the inheritance
+schema used for determining which fields inherit from which
+fields. It can be a symbol `BibTeX' or `biblatex', or it can be
+an explicit inheritance schema. (See the variable
+`parsebib--biblatex-inheritances' for details on the structure of
+such an inheritance schema.) It can also be the symbol t, in
+which case the local variable block is checked for a
+dialect (using the variable `bibtex-dialect'), or, if no such
+local variable is found, the value of the variable
+`bibtex-dialect'."
+ (or (and (hash-table-p hash)
+ (eq 'equal (hash-table-test hash)))
+ (setq hash (make-hash-table :test #'equal)))
+ (if (eq inheritance t)
+ (setq inheritance (or (parsebib-find-bibtex-dialect)
+ bibtex-dialect
+ 'BibTeX)))
+ (save-excursion
+ (goto-char (point-min))
+ (cl-loop with entry = nil
+ for entry-type = (parsebib-find-next-item)
+ while entry-type do
+ (unless (member-ignore-case entry-type '("preamble" "string" "comment"))
+ (setq entry (parsebib-read-entry entry-type nil strings))
+ (if entry
+ (puthash (cdr (assoc-string "=key=" entry)) entry hash))))
+ (when inheritance
+ (parsebib-expand-xrefs hash inheritance))
+ hash))
+
+(defun parsebib-find-bibtex-dialect ()
+ "Find the BibTeX dialect of a file if one is set.
+This function looks for a local value of the variable
+`bibtex-dialect' in the local variable block at the end of the
+file. Return nil if no dialect is found."
+ (save-excursion
+ (goto-char (point-max))
+ (let ((case-fold-search t))
+ (when (re-search-backward (concat parsebib--entry-start "comment") (- (point-max) 3000) t)
+ (let ((comment (parsebib-read-comment)))
+ (when (and comment
+ (string-match-p "\\`[ \n\t\r]*Local Variables:" comment)
+ (string-match-p "End:[ \n\t\r]*\\'" comment)
+ (string-match (concat "bibtex-dialect: " (regexp-opt (mapcar #'symbol-name bibtex-dialect-list) t)) comment))
+ (intern (match-string 1 comment))))))))
+
+(defun parsebib-parse-buffer (&optional entries strings expand-strings inheritance)
+ "Parse the current buffer and return all BibTeX data.
+Return list of five elements: a hash table with the entries, a
+hash table with the @String definitions, a list of @Preamble
+definitions, a list of @Comments and the BibTeX dialect, if
+present in the file.
+
+If ENTRIES is a hash table with test function `equal', it is used
+to store the entries. Any existing entries with identical keys
+are overwritten. Similarly, if STRINGS is a hash table with test
+function `equal', the @String definitions are stored in it.
+
+If EXPAND-STRINGS is non-nil, abbreviations in the entries and
+@String definitions are expanded using the @String definitions
+already in STRINGS.
+
+If INHERITANCE is non-nil, cross-references in the entries are
+resolved: if the crossref field of an entry points to an entry
+already in ENTRIES, the fields of the latter that do not occur in
+the entry are added to it. INHERITANCE indicates the inheritance
+schema used for determining which fields inherit from which
+fields. It can be a symbol `BibTeX' or `biblatex', which means
+to use the default inheritance schema for either dialect, or it
+can be an explicit inheritance schema. (See the variable
+`parsebib--biblatex-inheritances' for details on the structure of
+such an inheritance schema.) It can also be the symbol t, in
+which case the local variable block is checked for a
+dialect (using the variable `bibtex-dialect'), or, if no such
+local variable is found, the value of the variable
+`bibtex-dialect'."
+ (save-excursion
+ (goto-char (point-min))
+ (or (and (hash-table-p entries)
+ (eq (hash-table-test entries) 'equal))
+ (setq entries (make-hash-table :test #'equal)))
+ (or (and (hash-table-p strings)
+ (eq (hash-table-test strings) 'equal))
+ (setq strings (make-hash-table :test #'equal)))
+ (let ((dialect (or (parsebib-find-bibtex-dialect)
+ bibtex-dialect
+ 'BibTeX))
+ preambles comments)
+ (cl-loop for item = (parsebib-find-next-item)
+ while item do
+ (cond
+ ((cl-equalp item "string") ; `cl-equalp' compares strings case-insensitively.
+ (let ((string (parsebib-read-string nil (if expand-strings strings))))
+ (if string
+ (puthash (car string) (cdr string) strings))))
+ ((cl-equalp item "preamble")
+ (push (parsebib-read-preamble) preambles))
+ ((cl-equalp item "comment")
+ (push (parsebib-read-comment) comments))
+ ((stringp item)
+ (let ((entry (parsebib-read-entry item nil (if expand-strings strings))))
+ (when entry
+ (puthash (cdr (assoc-string "=key=" entry)) entry entries))))))
+ (when inheritance (parsebib-expand-xrefs entries (if (eq inheritance t) dialect inheritance)))
+ (list entries strings (nreverse preambles) (nreverse comments) dialect))))
(provide 'parsebib)