Diffstat (limited to 'BibTeX/Structure.pm')
-rw-r--r-- BibTeX/Structure.pm 1201
1 files changed, 1201 insertions, 0 deletions
diff --git a/BibTeX/Structure.pm b/BibTeX/Structure.pm
new file mode 100644
index 0000000..2fcd571
--- /dev/null
+++ b/BibTeX/Structure.pm
@@ -0,0 +1,1201 @@
+# ----------------------------------------------------------------------
+# NAME : BibTeX/Structure.pm
+# CLASSES : Text::BibTeX::Structure, Text::BibTeX::StructuredEntry
+# RELATIONS :
+# DESCRIPTION: Provides the two base classes needed to implement
+# Text::BibTeX structure modules.
+# CREATED : in original form: Apr 1997
+# completely redone: Oct 1997
+# MODIFIED :
+# VERSION : $Id: Structure.pm 3033 2006-09-21 20:07:27Z ambs $
+# COPYRIGHT : Copyright (c) 1997-2000 by Gregory P. Ward. All rights
+# reserved.
+#
+# This file is part of the Text::BibTeX library. This
+# library is free software; you may redistribute it and/or
+# modify it under the same terms as Perl itself.
+# ----------------------------------------------------------------------
+
+package Text::BibTeX::Structure;
+
+require 5.004; # for 'isa' and 'can'
+
+use strict;
+use Carp;
+
+use Text::BibTeX ('check_class');
+
+=head1 NAME
+
+Text::BibTeX::Structure - provides base classes for user structure modules
+
+=head1 SYNOPSIS
+
+   # Define a 'Foo' structure for BibTeX databases: first, the
+   # structure class:
+
+   package Text::BibTeX::FooStructure;
+   @ISA = ('Text::BibTeX::Structure');
+
+   sub known_option
+   {
+      my ($self, $option) = @_;
+
+      ...
+   }
+
+   sub default_option
+   {
+      my ($self, $option) = @_;
+
+      ...
+   }
+
+   sub describe_entry
+   {
+      my $self = shift;
+
+      $self->set_fields ($type,
+                         \@required_fields,
+                         \@optional_fields,
+                         $constraint_1, $constraint_2, ...);
+      ...
+   }
+
+
+   # Now, the structured entry class
+
+   package Text::BibTeX::FooEntry;
+   @ISA = ('Text::BibTeX::StructuredEntry');
+
+   # define whatever methods you like
+
+=head1 DESCRIPTION
+
+The module C<Text::BibTeX::Structure> provides two classes that form the
+basis of the B<btOOL> "structure module" system. This system is how
+database structures are defined and imposed on BibTeX files, and
+provides an elegant synthesis of object-oriented techniques with
+BibTeX-style database structures. Nothing described here is
+particularly deep or subtle; anyone familiar with object-oriented
+programming should be able to follow it. However, a fair bit of jargon
+is invented and tossed around, so pay attention.
+
+A I<database structure>, in B<btOOL> parlance, is just a set of allowed
+entry types and the rules for fields in each of those entry types.
+Currently, there are three kinds of rules that apply to fields: some
+fields are I<required>, meaning they must be present in every entry for
+a given type; some are I<optional>, meaning they may be present, and
+will be used if they are; other fields are members of I<constraint
+sets>, which are explained in L<"Field lists and constraint sets">
+below.
+
+A B<btOOL> structure is implemented with two classes: the I<structure
+class> and the I<structured entry class>. The former defines everything
+that applies to the structure as a whole (allowed types and field
+rules). The latter provides methods that operate on individual entries
+which conform (or are supposed to conform) to the structure. The two
+classes provided by the C<Text::BibTeX::Structure> module are
+C<Text::BibTeX::Structure> and C<Text::BibTeX::StructuredEntry>; these
+serve as base classes for, respectively, all structure classes and all
+structured entry classes. One canonical structure is provided as an
+example with B<btOOL>: the C<Bib> structure, which (via the
+C<BibStructure> and C<BibEntry> classes) provides the same functionality
+as the standard style files of BibTeX 0.99. It is hoped that other
+programmers will write new bibliography-related structures, possibly
+deriving from the C<Bib> structure, to emulate some of the functionality
+that is available through third-party BibTeX style files.
+
+The purpose of this manual page is to describe the whole "structure
+module" system. It is mainly for programmers wishing to implement a new
+database structure for data files with BibTeX syntax; if you are
+interested in the particular rules for the BibTeX-emulating C<Bib>
+structure, see L<Text::BibTeX::Bib>.
+
+Please note that the C<Text::BibTeX> prefix is dropped from most module
+and class names in this manual page, except where necessary.
+
+=head1 STRUCTURE CLASSES
+
+Structure classes have two roles: to define the list of allowed types
+and field rules, and to handle I<structure options>.
+
+=head2 Field lists and constraint sets
+
+Field lists and constraint sets define the database structure for a
+particular entry type: that is, they specify the rules which an entry
+must follow to conform to the structure (assuming that entry is of an
+allowed type). There are three components to the field rules for each
+entry type: a list of required fields, a list of optional fields, and
+I<field constraints>. Required and optional fields should be obvious to
+anyone with BibTeX experience: all required fields must be present, and
+any optional fields that are present have some meaning to the structure.
+(One could conceive of a "strict" interpretation, where any field not
+mentioned in the official definition is disallowed; this would be
+contrary to the open spirit of BibTeX databases, but could be useful in
+certain applications where a stricter level of control is desired.
+Currently, B<btOOL> does not offer such an option.)
+
+Field constraints capture the "one or the other, but not both" type of
+relationships present for some entry types in the BibTeX standard style
+files. Most BibTeX documentation glosses over the distinction between
+mutually constrained fields and required/optional fields. For instance,
+one of the standard entry types is C<book>, and "C<author> or C<editor>"
+is given in the list of required fields for that type. The meaning of
+this is that an entry of type C<book> must have I<either> the C<author>
+or C<editor> field, but not both. Likewise, "C<volume> or
+C<number>" is listed under the "optional fields" heading for C<book>
+entries; it would be more accurate to say that every C<book> entry may
+have one or the other, or neither, of C<volume> or C<number>---but not
+both.
+
+B<btOOL> attempts to clarify this situation by creating a third category
+of fields, those that are mutually constrained. For instance, neither
+C<author> nor C<editor> appears in the list of required fields for
+the C<book> type according to B<btOOL>; rather, a field constraint is
+created to express this relationship:
+
+ [1, 1, ['author', 'editor']]
+
+That is, a field constraint is a reference to a three-element list. The
+last element is a reference to the I<constraint set>, the list of fields
+to which the constraint applies. (Calling this a set is a bit
+inaccurate, as there are conditions in which the order of fields
+matters---see the C<check_field_constraints> method in L<"METHODS 2:
+BASE STRUCTURED ENTRY CLASS">.) The first two elements are the minimum
+and maximum number of fields from the constraint set that must be
+present for an entry to conform to the constraint. This constraint thus
+expresses that there must be exactly one (>= 1 and <= 1) of the fields
+C<author> and C<editor> in a C<book> entry.
+
+The "either one or neither, but not both" constraint that applies to the
+C<volume> and C<number> fields for C<book> entries is expressed slightly
+differently:
+
+ [0, 1, ['volume', 'number']]
+
+That is, either 0 or 1, but not the full 2, of C<volume> and C<number>
+may be present.
+
+It is important to note that checking and enforcing field constraints is
+based purely on counting which fields from a set are actually present;
+this mechanism can't capture "x must be present if y is" relationships.
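+
+As a rough illustration of that counting rule, a constraint can be
+tested against the list of fields present in an entry as sketched below.
+This is a standalone sketch only: the helper C<constraint_satisfied> is
+hypothetical, is not part of this module's interface, and is not the
+code the module itself uses to check constraints.
+
+   sub constraint_satisfied
+   {
+      my ($constraint, @present) = @_;
+      my ($min, $max, $fields) = @$constraint;
+
+      # count how many fields of the constraint set are present
+      my %present = map { ($_ => 1) } @present;
+      my $count = grep { $present{$_} } @$fields;
+      return ($count >= $min && $count <= $max);
+   }
+
+   # true for an entry that has 'author' but no 'editor'
+   constraint_satisfied ([1, 1, ['author', 'editor']],
+                         'author', 'title', 'year');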
+
+The requirements imposed on the actual structure class are simple: it
+must provide a method C<describe_entry> which sets up a fancy data
+structure describing the allowed entry types and all the field rules for
+those types. The C<Structure> class provides methods, inherited by each
+structure class, that help build this data structure in a consistent,
+controlled way. For instance, the C<describe_entry> method in the
+BibTeX 0.99-emulating C<BibStructure> class is quite simple:
+
+   sub describe_entry
+   {
+      my $self = shift;
+
+      # series of 13 calls to $self->set_fields (one for each standard
+      # entry type)
+   }
+
+One of those calls to the C<set_fields> method defines the rules for
+C<book> entries:
+
+   $self->set_fields ('book',
+                      [qw(title publisher year)],
+                      [qw(series address edition month note)],
+                      [1, 1, [qw(author editor)]],
+                      [0, 1, [qw(volume number)]]);
+
+The first field list is the list of required fields, and the second is
+the list of optional fields. Any number of field constraints may follow
+the list of optional fields; in this case, there are two, one for each
+of the constraints (C<author>/C<editor> and C<volume>/C<number>)
+described above. At no point is a list of allowed types explicitly
+supplied; rather, each call to C<set_fields> adds one more allowed type.
+
+New structure modules that derive from existing ones will probably use the
+C<add_fields> method (and possibly C<add_constraints>) to augment an
+existing entry type. Adding new types should be done with C<set_fields>,
+though.
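+
+A minimal sketch of that approach follows. It assumes a hypothetical
+C<Ext> structure deriving from C<Bib>; the class name C<ExtStructure>,
+the C<isbn> field, and the C<dataset> entry type are all invented for
+illustration.
+
+   package Text::BibTeX::ExtStructure;
+   use Text::BibTeX::Bib;
+   @ISA = ('Text::BibTeX::BibStructure');
+
+   sub describe_entry
+   {
+      my $self = shift;
+
+      # start from the entry types defined by the Bib structure
+      $self->SUPER::describe_entry;
+
+      # augment an existing type with one more optional field
+      $self->add_fields ('book', undef, ['isbn']);
+
+      # define a brand-new (hypothetical) type from scratch
+      $self->set_fields ('dataset', [qw(title year)], [qw(note)]);
+   }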
+
+=head2 Structure options
+
+The other responsibility of structure classes is to handle I<structure
+options>. These are scalar values that let the user customize the
+behaviour of both the structure class and the structured entry class.
+For instance, one could have an option to enable "extended structure",
+which might add on a bunch of new entry types and new fields. (In this
+case, the C<describe_entry> method would have to pay attention to this
+option and modify its behaviour accordingly.) Or, one could have
+options to control how the structured entry class sorts or formats
+entries (for bibliography structures such as C<Bib>).
+
+The easy way to handle structure options is to provide two methods,
+C<known_option> and C<default_option>. These return, respectively,
+whether a given option is supported, and what its default value is. (If
+your structure doesn't support any options, you can just inherit these
+methods from the C<Structure> class. The default C<known_option>
+returns false for all options, and its companion C<default_option>
+crashes with an "unknown option" error.)
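+
+One possible arrangement, sketched here with hypothetical option names,
+is to keep every supported option and its default in a single hash (here
+called C<%default_options>) and have C<known_option> consult it; a
+matching C<default_option> would look up the same hash.
+
+   # hypothetical options and defaults for this structure
+   my %default_options = (extended => 0, language => 'english');
+
+   sub known_option
+   {
+      my ($self, $option) = @_;
+
+      # an option is supported exactly when it has a default entry
+      return exists $default_options{$option};
+   }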
+
+Once C<known_option> and C<default_option> are provided, the structure
+class can sit back and inherit the more visible C<set_options> and
+C<get_options> methods from the C<Structure> class. These are the
+methods actually used to modify/query options, and will be used by
+application programs to customize the structure module's behaviour, and
+by the structure module itself to pay attention to the user's wishes.
+
+Options should generally have pure string values, so that the generic
+C<set_options> method doesn't have to parse user-supplied strings into some
+complicated structure. However, C<set_options> will take any scalar
+value, so if the structure module clearly documents its requirements,
+the application program could supply a structure that meets its needs.
+Keep in mind that this requires cooperation between the application and
+the structure module; the intermediary code in
+C<Text::BibTeX::Structure> knows nothing about the format or syntax of
+your structure's options, and whatever scalar the application passes via
+C<set_options> will be stored for your module to retrieve via
+C<get_options>.
+
+As an example, the C<Bib> structure supports a number of "markup"
+options that allow applications to control the markup language used for
+formatting bibliographic entries. These options are naturally paired,
+as formatting commands in markup languages generally have to be turned
+on and off. The C<Bib> structure thus expects references to two-element
+lists for markup options; to specify LaTeX 2e-style emphasis for book
+titles, an application such as C<btformat> would set the C<btitle_mkup>
+option as follows:
+
+ $structure->set_options (btitle_mkup => ['\emph{', '}']);
+
+Other options for other structures might have a more complicated
+structure, but it's up to the structure class to document and enforce
+this.
+
+=head1 STRUCTURED ENTRY CLASSES
+
+A I<structured entry class> defines the behaviour of individual entries
+under the regime of a particular database structure. This is the
+I<raison d'E<ecirc>tre> for any database structure: the structure class
+merely lays out the rules for entries to conform to the structure, but
+the structured entry class provides the methods that actually operate on
+individual entries. Because this is completely open-ended, the
+requirements of a structured entry class are much less rigid than for a
+structure class. In fact, all of the requirements of a structured entry
+class can be met simply by inheriting from
+C<Text::BibTeX::StructuredEntry>, the other class provided by the
+C<Text::BibTeX::Structure> module. (For the record, those requirements
+are: a structured entry class must provide the entry
+parse/query/manipulate methods of the C<Entry> class, and it must
+provide the C<check>, C<coerce>, and C<silently_coerce> methods of the
+C<StructuredEntry> class. Since C<StructuredEntry> inherits from
+C<Entry>, both of these requirements are met "for free" by structured
+entry classes that inherit from C<Text::BibTeX::StructuredEntry>, so
+naturally this is the recommended course of action!)
+
+There are deliberately no other methods required of structured entry
+classes. A particular application (e.g. C<btformat> for bibliography
+structures) will require certain methods, but it's up to the application
+and the structure module to work out the requirements through
+documentation.
+
+=head1 CLASS INTERACTIONS
+
+Imposing a database structure on your entries sets off a chain reaction
+of interactions between various classes in the C<Text::BibTeX> library
+that should be transparent when all goes well. It could prove confusing
+if things go wrong and you have to go wading through several levels of
+application program, core C<Text::BibTeX> classes, and some structure
+module.
+
+The justification for this complicated behaviour is that it allows you
+to write programs that will use a particular structure module without
+knowing the name of the structure when you write the program. Thus, the
+user can supply a database structure, and ultimately the entry objects
+you manipulate will be blessed into a class supplied by the structure
+module. A short example will illustrate this.
+
+Typically, a C<Text::BibTeX>-based program is based around a kernel of
+code like this:
+
+   $bibfile = new Text::BibTeX::File "foo.bib";
+   while ($entry = new Text::BibTeX::Entry $bibfile)
+   {
+      # process $entry
+   }
+
+In this case, nothing fancy is happening behind the scenes: the
+C<$bibfile> object is blessed into the C<Text::BibTeX::File> class, and
+C<$entry> is blessed into C<Text::BibTeX::Entry>. This is the
+conventional behaviour of Perl classes, but it is not the only possible
+behaviour. Let us now suppose that C<$bibfile> is expected to conform
+to a database structure specified by C<$structure> (presumably a
+user-supplied value, and thus unknown at compile-time):
+
+   $bibfile = new Text::BibTeX::File "foo.bib";
+   $bibfile->set_structure ($structure);
+   while ($entry = new Text::BibTeX::Entry $bibfile)
+   {
+      # process $entry
+   }
+
+A lot happens behind the scenes with the call to C<$bibfile>'s
+C<set_structure> method. First, a new structure object is created from
+C<$structure>. The structure name implies the name of a Perl
+module---the structure module---which is C<require>'d by the
+C<Structure> constructor. (The main consequence of this is that any
+compile-time errors in your structure module will not be revealed until
+a C<Text::BibTeX::File::set_structure> or
+C<Text::BibTeX::Structure::new> call attempts to load it.)
+
+Recall that the first responsibility of a structure module is to define
+a structure class. The "structure object" created by the
+C<set_structure> method call is actually an object of this class; this
+is the first bit of trickery---the structure object (buried behind the
+scenes) is blessed into a class whose name is not known until run-time.
+
+Now, the behaviour of the C<Text::BibTeX::Entry::new> constructor
+changes subtly: rather than returning an object blessed into the
+C<Text::BibTeX::Entry> class as you might expect from the code, the
+object is blessed into the structured entry class associated with
+C<$structure>.
+
+For example, if the value of C<$structure> is C<"Foo">, that means the
+user has supplied a module implementing the C<Foo> structure.
+(Ordinarily, this module would be called C<Text::BibTeX::Foo>---but you
+can customize this.) Calling the C<set_structure> method on C<$bibfile>
+will attempt to create a new structure object via the
+C<Text::BibTeX::Structure> constructor, which loads the structure module
+C<Text::BibTeX::Foo>. Once this module is successfully loaded, the new
+object is blessed into its structure class, which will presumably be
+called C<Text::BibTeX::FooStructure> (again, this is customizable). The
+new object is supplied with the user's structure options via the
+C<set_options> method (usually inherited), and then it is asked to
+describe the actual entry layout by calling its C<describe_entry>
+method. This, in turn, will usually call the inherited C<set_fields>
+method for each entry type in the database structure. When the
+C<Structure> constructor is finished, the new structure object is stored
+in the C<File> object (remember, we started all this by calling
+C<set_structure> on a C<File> object) for future reference.
+
+Then, when a new C<Entry> object is created and parsed from that
+particular C<File> object, some more trickery happens. Trivially, the
+structure object stored in the C<File> object is also stored in the
+C<Entry> object. (The idea is that entries could belong to a database
+structure independently of any file, but usually they will just get the
+structure that was assigned to their database file.) More importantly,
+the new C<Entry> object is re-blessed into the structured entry class
+supplied by the structure module---presumably, in this case,
+C<Text::BibTeX::FooEntry> (also customizable).
+
+Once all this sleight-of-hand is accomplished, the application may treat
+its entry objects as objects of the structured entry class for the
+C<Foo> structure---they may call the check/coerce methods inherited from
+C<Text::BibTeX::StructuredEntry>, and they may also call any methods
+specific to entries for this particular database structure. What these
+methods might be is up to the structure implementor to decide and
+document; thus, applications may be specific to one particular database
+structure, or they may work on all structures that supply certain
+methods. The choice is up to the application developer, and the range
+of options open to them depends on which methods structure implementors
+provide.
+
+=head1 EXAMPLE
+
+For example code, please refer to the source of the C<Bib> module and
+the C<btcheck>, C<btsort>, and C<btformat> applications supplied with
+C<Text::BibTeX>.
+
+=head1 METHODS 1: BASE STRUCTURE CLASS
+
+The first class provided by the C<Text::BibTeX::Structure> module is
+C<Text::BibTeX::Structure>. This class is intended to provide methods
+that will be inherited by user-supplied structure classes; such classes
+should not override any of the methods described here (except
+C<known_option> and C<default_option>) without very good reason.
+Furthermore, overriding the C<new> method would be useless, because in
+general applications won't know the name of your structure class---they
+can only call C<Text::BibTeX::Structure::new> (usually via
+C<Text::BibTeX::File::set_structure>).
+
+Finally, there are three methods that structure classes should
+implement: C<known_option>, C<default_option>, and C<describe_entry>.
+The first two are described in L<"Structure options"> above, the latter
+in L<"Field lists and constraint sets">. Note that C<describe_entry>
+depends heavily on the C<set_fields>, C<add_fields>, and
+C<add_constraints> methods described here.
+
+=head2 Constructor/simple query methods
+
+=over 4
+
+=item new (STRUCTURE, [OPTION =E<gt> VALUE, ...])
+
+Constructs a new structure object---I<not> a C<Text::BibTeX::Structure>
+object, but rather an object blessed into the structure class associated
+with STRUCTURE. More precisely:
+
+=over 4
+
+=item *
+
+Loads (with C<require>) the module implementing STRUCTURE. In the
+absence of other information, the module name is derived by appending
+STRUCTURE to C<"Text::BibTeX::">---thus, the module C<Text::BibTeX::Bib>
+implements the C<Bib> structure. Use the pseudo-option C<module> to
+override this module name. For instance, if the structure C<Foo> is
+implemented by the module C<Foo>:
+
+   $structure = new Text::BibTeX::Structure
+      ('Foo', module => 'Foo');
+
+This method C<die>s if there are any errors loading/compiling the
+structure module.
+
+=item *
+
+Verifies that the structure module provides a structure class and a
+structured entry class. The structure class is named by appending
+C<"Structure"> to the name of the module, and the structured entry class
+by appending C<"Entry">. Thus, in the absence of a C<module> option,
+these two classes (for the C<Bib> structure) would be named
+C<Text::BibTeX::BibStructure> and C<Text::BibTeX::BibEntry>. Either or
+both of the default class names may be overridden by having the
+structure module return a reference to a hash (as opposed to the
+traditional C<1> returned by modules). This hash could then supply a
+C<structure_class> element to name the structure class, and an
+C<entry_class> element to name the structured entry class.
+
+Apart from ensuring that the two classes actually exist, C<new> verifies
+that they inherit correctly (from C<Text::BibTeX::Structure> and
+C<Text::BibTeX::StructuredEntry> respectively), and that the structure
+class provides the required C<known_option>, C<default_option>, and
+C<describe_entry> methods.
+
+=item *
+
+Creates the new structure object, and blesses it into the structure
+class. Supplies it with options by passing all (OPTION, VALUE) pairs to
+its C<set_options> method. Calls its C<describe_entry> method, which
+should list the field requirements for all entry types recognized by
+this structure. C<describe_entry> will most likely use some or all of
+the C<set_fields>, C<add_fields>, and C<add_constraints>
+methods---described below---for this.
+
+=back
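+
+As a sketch of the class-name override mentioned in the second step
+above, the final statements of a structure module could return a hash
+reference in place of the usual C<1;>. The class names C<Foo::Rules> and
+C<Foo::Record> are hypothetical; the module would still have to define
+both classes with the inheritance and methods that C<new> checks for.
+
+   # last statements of the (hypothetical) structure module Foo.pm
+   my $classes = { structure_class => 'Foo::Rules',
+                   entry_class     => 'Foo::Record' };
+   $classes;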
+
+=cut
+
+sub new
+{
+   my ($type, $name, %options) = @_;
+
+   # - $type is presumably "Text::BibTeX::Structure" (if called from
+   #   Text::BibTeX::File::set_structure), but shouldn't assume that
+   # - $name is the name of the user-supplied structure; it also
+   #   determines the module we will attempt to load here, unless
+   #   a 'module' option is given in %options
+   # - %options is a mix of options recognized here (in particular
+   #   'module'), by Text::BibTeX::StructuredEntry (? 'check', 'coerce',
+   #   'warn' flags), and by the user structure classes
+
+   my $module = (delete $options{'module'}) || ('Text::BibTeX::' . $name);
+
+   my $module_info = eval "require $module";
+   die "Text::BibTeX::Structure: unable to load module \"$module\" for " .
+       "user structure \"$name\": $@\n"
+      if $@;
+
+   my ($structure_class, $entry_class);
+   if (ref $module_info eq 'HASH')
+   {
+      $structure_class = $module_info->{'structure_class'};
+      $entry_class = $module_info->{'entry_class'};
+   }
+   $structure_class ||= $module . 'Structure';
+   $entry_class ||= $module . 'Entry';
+
+   check_class ($structure_class, "user structure class",
+                'Text::BibTeX::Structure',
+                ['known_option', 'default_option', 'describe_entry']);
+   check_class ($entry_class, "user entry class",
+                'Text::BibTeX::StructuredEntry',
+                []);
+
+   my $self = bless {}, $structure_class;
+   $self->{entry_class} = $entry_class;
+   $self->{name} = $name;
+   $self->set_options (%options);       # these methods are both provided by
+   $self->describe_entry;               # the user structure class
+   $self;
+}
+
+
+=item name ()
+
+Returns the name of the structure described by the object.
+
+=item entry_class ()
+
+Returns the name of the structured entry class associated with this
+structure.
+
+=back
+
+=cut
+
+sub name { shift->{'name'} }
+
+sub entry_class { shift->{'entry_class'} }
+
+
+=head2 Field structure description methods
+
+=over 4
+
+=item add_constraints (TYPE, CONSTRAINT, ...)
+
+Adds one or more field constraints to the structure. A field constraint
+is specified as a reference to a three-element list; the last element is
+a reference to the list of fields affected, and the first two elements
+are the minimum and maximum number of fields from the constraint set
+allowed in an entry of type TYPE. See L<"Field lists and constraint
+sets"> for a full explanation of field constraints.
+
+=cut
+
+sub add_constraints
+{
+   my ($self, $type, @constraints) = @_;
+   my ($constraint);
+
+   foreach $constraint (@constraints)
+   {
+      my ($min, $max, $fields) = @$constraint;
+      croak "add_constraints: constraint record must be a 3-element " .
+            "list, with the last element a list ref"
+         unless (@$constraint == 3 && ref $fields eq 'ARRAY');
+      croak "add_constraints: constraint record must have 0 <= 'min' " .
+            "<= 'max' <= length of field list"
+         unless ($min >= 0 && $max >= $min && $max <= @$fields);
+      map { $self->{fields}{$type}{$_} = $constraint } @$fields;
+   }
+   push (@{$self->{fieldgroups}{$type}{'constraints'}}, @constraints);
+
+}  # add_constraints
+
+
+=item add_fields (TYPE, REQUIRED [, OPTIONAL [, CONSTRAINT, ...]])
+
+Adds fields to the required/optional lists for entries of type TYPE.
+Can also add field constraints, but you can just as easily use
+C<add_constraints> for that.
+
+REQUIRED and OPTIONAL, if defined, should be references to lists of
+fields to add to the respective field lists. The CONSTRAINTs, if given,
+are exactly as described for C<add_constraints> above.
+
+=cut
+
+sub add_fields                          # add fields for a particular type
+{
+   my ($self, $type, $required, $optional, @constraints) = @_;
+
+   # to be really robust and inheritance-friendly, we should:
+   #   - check that no field is in > 1 list (just check $self->{fields}
+   #     before we start assigning stuff)
+   #   - allow sub-classes to delete fields or move them to another group
+
+   if ($required)
+   {
+      push (@{$self->{fieldgroups}{$type}{'required'}}, @$required);
+      map { $self->{fields}{$type}{$_} = 'required' } @$required;
+   }
+
+   if ($optional)
+   {
+      push (@{$self->{fieldgroups}{$type}{'optional'}}, @$optional);
+      map { $self->{fields}{$type}{$_} = 'optional' } @$optional;
+   }
+
+   $self->add_constraints ($type, @constraints);
+
+}  # add_fields
+
+
+=item set_fields (TYPE, REQUIRED [, OPTIONAL [, CONSTRAINT, ...]])
+
+Sets the lists of required/optional fields for entries of type TYPE.
+Identical to C<add_fields>, except that the field lists and list of
+constraints are set from scratch here, rather than being added to.
+
+=back
+
+=cut
+
+sub set_fields
+{
+   my ($self, $type, $required, $optional, @constraints) = @_;
+   my ($constraint, $field);
+
+   undef %{$self->{fields}{$type}};
+
+   if ($required)
+   {
+      $self->{fieldgroups}{$type}{'required'} = $required;
+      map { $self->{fields}{$type}{$_} = 'required' } @$required;
+   }
+
+   if ($optional)
+   {
+      $self->{fieldgroups}{$type}{'optional'} = $optional;
+      map { $self->{fields}{$type}{$_} = 'optional' } @$optional;
+   }
+
+   undef @{$self->{fieldgroups}{$type}{'constraints'}};
+   $self->add_constraints ($type, @constraints);
+
+}  # set_fields
+
+
+=head2 Field structure query methods
+
+=over 4
+
+=item types ()
+
+Returns the list of entry types supported by the structure.
+
+=item known_type (TYPE)
+
+Returns true if TYPE is a supported entry type.
+
+=item known_field (TYPE, FIELD)
+
+Returns true if FIELD is in the required list, optional list, or one of
+the constraint sets for entries of type TYPE.
+
+=item required_fields (TYPE)
+
+Returns the list of required fields for entries of type TYPE.
+
+=item optional_fields (TYPE)
+
+Returns the list of optional fields for entries of type TYPE.
+
+=item field_constraints (TYPE)
+
+Returns the list of field constraints (in the format supplied to
+C<add_constraints>) for entries of type TYPE.
+
+=back
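+
+As a sketch of how these query methods fit together, an application
+could print the rules for every entry type of a structure. This assumes
+the C<Bib> structure module is installed; the variable names are
+illustrative only.
+
+   use Text::BibTeX;
+   use Text::BibTeX::Structure;
+
+   my $structure = new Text::BibTeX::Structure ('Bib');
+   my @types = $structure->types;
+   foreach my $type (sort @types)
+   {
+      my @required = $structure->required_fields ($type);
+      my @optional = $structure->optional_fields ($type);
+      print "$type\n  required: @required\n  optional: @optional\n";
+
+      # each constraint is a [MIN, MAX, [FIELD, ...]] record
+      foreach my $constraint ($structure->field_constraints ($type))
+      {
+         my ($min, $max, $fields) = @$constraint;
+         print "  $min to $max of: @$fields\n";
+      }
+   }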
+
+=cut
+
+sub types
+{
+   my $self = shift;
+
+   keys %{$self->{'fieldgroups'}};
+}
+
+sub known_type
+{
+   my ($self, $type) = @_;
+
+   exists $self->{'fieldgroups'}{$type};
+}
+
+sub _check_type
+{
+   my ($self, $type) = @_;
+
+   croak "unknown entry type \"$type\" for $self->{'name'} structure"
+      unless exists $self->{'fieldgroups'}{$type};
+}
+
+sub known_field
+{
+   my ($self, $type, $field) = @_;
+
+   $self->_check_type ($type);
+   $self->{'fields'}{$type}{$field};    # either 'required', 'optional', or
+}                                       # a constraint record (or undef!)
+
+sub required_fields
+{
+   my ($self, $type) = @_;
+
+   $self->_check_type ($type);
+   @{$self->{'fieldgroups'}{$type}{'required'}};
+}
+
+sub optional_fields
+{
+   my ($self, $type) = @_;
+
+   $self->_check_type ($type);
+   @{$self->{'fieldgroups'}{$type}{'optional'}};
+}
+
+sub field_constraints
+{
+   my ($self, $type) = @_;
+
+   $self->_check_type ($type);
+   @{$self->{'fieldgroups'}{$type}{'constraints'}};
+}
+
+
+=head2 Option methods
+
+=over 4
+
+=item known_option (OPTION)
+
+Returns false. This is mainly for the use of derived structures that
+don't have any options, and thus don't need to provide their own
+C<known_option> method. Structures that actually offer options should
+override this method; it should return true if OPTION is a supported
+option.
+
+=cut
+
+sub known_option
+{
+   return 0;
+}
+
+
+=item default_option (OPTION)
+
+Crashes with an "unknown option" message. Again, this is mainly for use
+by derived structure classes that don't actually offer any options.
+Structures that handle options should override this method; every option
+handled by C<known_option> should have a default value (which might just
+be C<undef>) that is returned by C<default_option>. Your
+C<default_option> method should crash on an unknown option, perhaps by
+calling C<SUPER::default_option> (in order to ensure consistent error
+messages). For example:
+
+   sub default_option
+   {
+      my ($self, $option) = @_;
+      return $default_options{$option}
+         if exists $default_options{$option};
+      $self->SUPER::default_option ($option);    # crash
+   }
+
+The default value for an option is returned by C<get_options> when that
+option has not been explicitly set with C<set_options>.
+
+=cut
+
+sub default_option
+{
+   my ($self, $option) = @_;
+
+   croak "unknown option \"$option\" for structure \"$self->{'name'}\"";
+}
+
+
+=item set_options (OPTION =E<gt> VALUE, ...)
+
+Sets one or more option values. (You can supply as many
+C<OPTION =E<gt> VALUE> pairs as you like, just so long as there are an even
+number of arguments.) Each OPTION must be handled by the structure
+module (as indicated by the C<known_option> method); if not,
+C<set_options> will C<croak>. Each VALUE may be any scalar value; it's
+up to the structure module to validate them.
+
+=cut
+
+sub set_options
+{
+   my $self = shift;
+   my ($option, $value);
+
+   croak "must supply an even number of arguments (option/value pairs)"
+      unless @_ % 2 == 0;
+   while (@_)
+   {
+      ($option, $value) = (shift, shift);
+      croak "unknown option \"$option\" for structure \"$self->{'name'}\""
+         unless $self->known_option ($option);
+      $self->{'options'}{$option} = $value;
+   }
+}
+
+
+=item get_options (OPTION, ...)
+
+Returns the value(s) of one or more options. Any OPTION that has not
+been set by C<set_options> will return its default value, fetched using
+the C<default_option> method. If OPTION is not supported by the
+structure module, then your program either already crashed (when it
+tried to set it with C<set_options>), or it will crash here (thanks to
+calling C<default_option>). In list context, the values are returned in
+the order the options were requested; in scalar context, only the value
+of the first OPTION is returned.
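+
+As a rough sketch of how the option methods fit together, assume a
+hypothetical structure class C<My::DemoStructure> with two made-up
+options, C<sortby> and C<labels>. Blessing a hash directly is a shortcut
+used only to keep the example short; it relies on the C<name> and
+C<options> hash keys used by the code in this file, and a real program
+would obtain its structure object in the usual way.
+
+   package My::DemoStructure;                # hypothetical class name
+   use strict;
+   use vars qw(@ISA);
+   use Text::BibTeX::Structure;
+   @ISA = ('Text::BibTeX::Structure');
+
+   # made-up options and their defaults
+   my %default_options = (sortby => 'name', labels => undef);
+
+   sub known_option
+   {
+      my ($self, $option) = @_;
+      return exists $default_options{$option};
+   }
+
+   sub default_option
+   {
+      my ($self, $option) = @_;
+      return $default_options{$option}
+         if exists $default_options{$option};
+      $self->SUPER::default_option ($option);       # crash
+   }
+
+   package main;
+
+   # illustration only: assumes the object is a hash with these keys
+   my $structure = bless { name => 'Demo', options => {} },
+                         'My::DemoStructure';
+
+   $structure->set_options (sortby => 'year');
+
+   # list context: one value per option, in the order requested
+   my ($sortby, $labels) = $structure->get_options ('sortby', 'labels');
+
+   # scalar context: only the first option's value
+   my $first = $structure->get_options ('sortby', 'labels');
+
+Here C<$sortby> and C<$first> hold the explicitly set value, while
+C<$labels> falls back to the default supplied by C<default_option>.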
+
+=back
+
+=cut
+
+sub get_options
+{
+   my $self = shift;
+   my ($options, $option, $value, @values);
+
+   $options = $self->{'options'};
+   while (@_)
+   {
+      $option = shift;
+      $value = (exists $options->{$option})
+         ? $options->{$option}
+         : $self->default_option ($option);
+      push (@values, $value);
+   }
+
+   wantarray ? @values : $values[0];
+}
+
+
+
+# ----------------------------------------------------------------------
+# Text::BibTeX::StructuredEntry methods dealing with entry structure
+
+package Text::BibTeX::StructuredEntry;
+use strict;
+use vars qw(@ISA);
+use Carp;
+
+@ISA = ('Text::BibTeX::Entry');
+use Text::BibTeX qw(:metatypes display_list);
+
+=head1 METHODS 2: BASE STRUCTURED ENTRY CLASS
+
+The other class provided by the C<Structure> module is
+C<StructuredEntry>, the base class for all structured entry classes.
+This class inherits from C<Entry>, so all of its entry
+query/manipulation methods are available. C<StructuredEntry> adds
+methods for checking that an entry conforms to the database structure
+defined by a structure class.
+
+It only makes sense for C<StructuredEntry> to be used as a base class;
+you would never create standalone C<StructuredEntry> objects. The
+superficial reason for this is that only particular structured-entry
+classes have an actual structure class associated with them;
+C<StructuredEntry> on its own doesn't have any information about allowed
+types, required fields, field constraints, and so on. For a deeper
+understanding, consult L<"CLASS INTERACTIONS"> above.
+
+Since C<StructuredEntry> derives from C<Entry>, it naturally operates on
+BibTeX entries. Hence, the following descriptions refer to "the entry";
+this is just the object (entry) being operated on. Note that these
+methods are presented in bottom-up order, meaning that the methods
+you're most likely to actually use (C<check>, C<coerce>, and
+C<silently_coerce>) are at the bottom. On a first reading, you'll
+probably want to skip down to them for a quick summary.
+
+=over 4
+
+=item structure ()
+
+Returns the object that defines the structure to which the entry is
+supposed to conform. This will be an instantiation of some structure
+class, and exists mainly so the check/coerce methods can query the
+structure about the types and fields it recognizes. If, for some
+reason, you wanted to query an entry's structure about the validity of
+type C<foo>, you might do this:
+
+   # assume $entry is an object of some structured entry class, i.e.
+   # it inherits from Text::BibTeX::StructuredEntry
+   my $structure = $entry->structure;
+   my $foo_known = $structure->known_type ('foo');
+
+=cut
+
+sub structure
+{
+ my $self = shift;
+ $self->{'structure'};
+}
+
+
+=item check_type ([WARN])
+
+Returns true if the entry has a valid type according to its structure.
+If WARN is true, then an invalid type results in a warning being
+printed.
+
+=cut
+
+sub check_type
+{
+   my ($self, $warn) = @_;
+
+   my $type = $self->{'type'};
+   if (! $self->{'structure'}->known_type ($type))
+   {
+      $self->warn ("unknown entry type \"$type\"") if $warn;
+      return 0;
+   }
+   return 1;
+}
+
+
+=item check_required_fields ([WARN [, COERCE]])
+
+Checks that all required fields are present in the entry. If WARN is
+true, then a warning is printed for every missing field. If COERCE is
+true, then missing fields are set to the empty string. Returns true if
+no required field is missing, or if COERCE is true (coercion always
+succeeds).
+
+This isn't generally used by other code; see the C<check> and C<coerce>
+methods below.
+
+=cut
+
+sub check_required_fields
+{
+   my ($self, $warn, $coerce) = @_;
+   my ($field, $warning);
+   my $num_errors = 0;
+
+   foreach $field ($self->{'structure'}->required_fields ($self->type))
+   {
+      if (! $self->exists ($field))
+      {
+         $warning = "required field '$field' not present" if $warn;
+         if ($coerce)
+         {
+            $warning .= " (setting to empty string)" if $warn;
+            $self->set ($field, '');
+         }
+         $self->warn ($warning) if $warn;
+         $num_errors++;
+      }
+   }
+
+   # Coercion is always successful, so if $coerce is true return true.
+   # Otherwise, return true if no errors found.
+
+   return $coerce || ($num_errors == 0);
+
+} # check_required_fields
+
+
+=item check_field_constraints ([WARN [, COERCE]])
+
+Checks that the entry conforms to all of the field constraints imposed
+by its structure. Recall that a field constraint consists of a list of
+fields, and a minimum and maximum number of those fields that must be
+present in an entry. For each constraint, C<check_field_constraints>
+simply counts how many fields in the constraint's field set are present.
+If this count falls below the minimum or above the maximum for that
+constraint and WARN is true, a warning is issued. In general, this
+warning is of the form "between x and y of fields foo, bar, and baz must
+be present". The more common cases are handled specially to generate
+more useful and human-friendly warning messages.
+
+If COERCE is true, then the entry is modified to force it into
+conformance with all field constraints. How this is done depends on
+whether the violation is a matter of not enough fields present in the
+entry, or of too many fields present. In the former case, just enough
+fields are added (as empty strings) to meet the requirements of the
+constraint; in the latter case, fields are deleted. Which fields to add
+or delete is controlled by the order of fields in the constraint's field
+list.
+
+An example should clarify this. For instance, a field constraint
+specifying that exactly one of C<author> or C<editor> must appear in an
+entry would look like this:
+
+ [1, 1, ['author', 'editor']]
+
+Suppose the following entry is parsed and expected to conform to this
+structure:
+
+ @inbook{unknown:1997a,
+ title = "An Unattributed Book Chapter",
+ booktitle = "An Unedited Book",
+ publisher = "Foo, Bar \& Company",
+ year = 1997
+ }
+
+If C<check_field_constraints> is called on this entry with COERCE true
+(which is done by the C<coerce> and C<silently_coerce> methods, via
+C<full_check>), then the C<author> field is set to the empty string.
+(We go through the list of fields in the constraint's field set in
+order: since C<author> is the first missing field, we supply it; with
+that done, the entry now conforms to the C<author>/C<editor> constraint,
+so we're done.)
+
+However, if the same structure was applied to this entry:
+
+ @inbook{smith:1997a,
+ author = "John Smith",
+ editor = "Fred Jones",
+ ...
+ }
+
+then the C<editor> field would be deleted. In this case, we keep the
+first field in the constraint's field list, C<author>. Since only one
+field from the set may be present, all fields after the first one are in
+violation, so they are deleted.
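+
+The two C<author>/C<editor> cases above can be sketched on plain hashes.
+This is only an illustration of the behaviour described in the text, not
+the module's own code: C<coerce_fields> is a hypothetical helper, the
+entries are ordinary hashes rather than entry objects, and the sketch
+assumes that "first missing" and "first present" fields are the ones
+chosen.
+
+   use strict;
+
+   # Hypothetical helper, not part of Text::BibTeX: force a plain hash
+   # of fields to satisfy one [MIN, MAX, [FIELDS]] constraint.
+   sub coerce_fields
+   {
+      my ($entry, $constraint) = @_;
+      my ($min, $max, $fields) = @$constraint;
+
+      my @present = grep {   exists $entry->{$_} } @$fields;
+      my @missing = grep { ! exists $entry->{$_} } @$fields;
+
+      if (@present < $min)
+      {
+         # too few: add the first missing fields as empty strings
+         my $needed = $min - @present;
+         $entry->{$_} = '' foreach @missing[0 .. $needed - 1];
+      }
+      elsif (@present > $max)
+      {
+         # too many: keep the first $max present fields, drop the rest
+         delete @{$entry}{ @present[$max .. $#present] };
+      }
+      return $entry;
+   }
+
+   my $constraint = [1, 1, ['author', 'editor']];
+
+   my %chapter = (title => 'An Unattributed Book Chapter', year => 1997);
+   coerce_fields (\%chapter, $constraint);      # adds author => ''
+
+   my %book = (author => 'John Smith', editor => 'Fred Jones');
+   coerce_fields (\%book, $constraint);         # deletes editor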
+
+Again, this method isn't generally used by other code; rather, it is
+called by C<full_check> and its friends below.
+
+=cut
+
+sub check_field_constraints
+{
+   my ($self, $warn, $coerce) = @_;
+
+   my $num_errors = 0;
+   my $constraint;
+
+   foreach $constraint ($self->{'structure'}->field_constraints ($self->type))
+   {
+      my ($warning);
+      my ($min, $max, $fields) = @$constraint;
+
+      my $field;
+      my $num_seen = 0;
+      map { $num_seen++ if $self->exists ($_) } @$fields;
+
+      if ($num_seen < $min || $num_seen > $max)
+      {
+         if ($warn)
+         {
+            if ($min == 0 && $max > 0)
+            {
+               $warning = sprintf ("at most %d of fields %s may be present",
+                                   $max, display_list ($fields, 1));
+            }
+            elsif ($min < @$fields && $max == @$fields)
+            {
+               $warning = sprintf ("at least %d of fields %s must be present",
+                                   $min, display_list ($fields, 1));
+            }
+            elsif ($min == $max)
+            {
+               $warning = sprintf ("exactly %d of fields %s %s be present",
+                                   $min, display_list ($fields, 1),
+                                   ($num_seen < $min) ? "must" : "may");
+            }
+            else
+            {
+               $warning = sprintf ("between %d and %d of fields %s " .
+                                   "must be present",
+                                   $min, $max, display_list ($fields, 1))
+            }
+         }
+
+         if ($coerce)
+         {
+            if ($num_seen < $min)
+            {
+               my @blank = @{$fields}[$num_seen .. ($min-1)];
+               $warning .= sprintf (" (setting %s to empty string)",
+                                    display_list (\@blank, 1))
+                  if $warn;
+               @blank = map (($_, ''), @blank);
+               $self->set (@blank);
+            }
+            elsif ($num_seen > $max)
+            {
+               my @delete = @{$fields}[$max .. ($num_seen-1)];
+               $warning .= sprintf (" (deleting %s)",
+                                    display_list (\@delete, 1))
+                  if $warn;
+               $self->delete (@delete);
+            }
+         } # if $coerce
+
+         $self->warn ($warning) if $warn;
+         $num_errors++;
+      } # if $num_seen out-of-range
+
+   } # foreach $constraint
+
+   # Coercion is always successful, so if $coerce is true return true.
+   # Otherwise, return true if no errors found.
+
+   return $coerce || ($num_errors == 0);
+
+} # check_field_constraints
+
+
+=item full_check ([WARN [, COERCE]])
+
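+Returns true if an entry's type and fields are all valid. That is, it
+calls C<check_type>, C<check_required_fields>, and
+C<check_field_constraints>; if all of them return true, then so does
+C<full_check>. WARN is passed on to all three C<check_*> methods and
+controls the printing of warnings; COERCE is passed on to the two field
+checks and decides whether we should modify the entry to force it into
+conformance. An entry with an unknown type fails immediately, without
+its fields being checked or coerced, and any entry that is not a regular
+entry (a comment, preamble, or macro definition, say) is trivially valid.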
+
+=cut
+
+sub full_check
+{
+   my ($self, $warn, $coerce) = @_;
+
+   return 1 unless $self->metatype == &BTE_REGULAR;
+   return unless $self->check_type ($warn);
+   return $self->check_required_fields ($warn, $coerce) &&
+          $self->check_field_constraints ($warn, $coerce);
+}
+
+
+# Front ends for full_check -- there are actually four possible wrappers,
+# but having both $warn and $coerce false is pointless.
+
+=item check ()
+
+Checks that the entry conforms to the requirements of its associated
+database structure: the type must be known, all required fields must be
+present, and all field constraints must be met. See C<check_type>,
+C<check_required_fields>, and C<check_field_constraints> for details.
+
+Calling C<check> is the same as calling C<full_check> with WARN true and
+COERCE false.
+
+=item coerce ()
+
+Same as C<check>, except entries are coerced into conformance with the
+database structure---that is, it's just like C<full_check> with both
+WARN and COERCE true.
+
+=item silently_coerce ()
+
+Same as C<coerce>, except warnings aren't printed---that is, it's just
+like C<full_check> with WARN false and COERCE true.
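+
+A minimal sketch of how these three front ends might be used while
+reading a database follows. It assumes the file is to be checked against
+the C<Bib> structure; C<count_invalid_entries> is a made-up helper name,
+and the file name is whatever the caller passes in.
+
+   use strict;
+   use Text::BibTeX;
+
+   # Hypothetical helper: count the entries in FILENAME that do not
+   # conform to the 'Bib' structure.
+   sub count_invalid_entries
+   {
+      my ($filename) = @_;
+
+      my $bibfile = Text::BibTeX::File->new ($filename)
+         or die "$filename: $!\n";
+      $bibfile->set_structure ('Bib');
+
+      my $invalid = 0;
+      while (my $entry = Text::BibTeX::Entry->new ($bibfile))
+      {
+         next unless $entry->parse_ok;
+
+         $invalid++ unless $entry->check;    # warn, leave entry as it is
+         # $entry->coerce;                   # warn and repair the entry
+         # $entry->silently_coerce;          # repair without warnings
+      }
+      return $invalid;
+   }
+
+Which of the three calls to use depends on whether the program should
+only report problems or also repair the entries before using them.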
+
+=back
+
+=cut
+
+sub check { shift->full_check (1, 0) }
+
+sub coerce { shift->full_check (1, 1) }
+
+sub silently_coerce { shift->full_check (0, 1) }
+
+1;
+
+=head1 SEE ALSO
+
+L<Text::BibTeX>, L<Text::BibTeX::Entry>, L<Text::BibTeX::File>
+
+=head1 AUTHOR
+
+Greg Ward <gward@python.net>
+
+=head1 COPYRIGHT
+
+Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This file
+is part of the Text::BibTeX library. This library is free software; you
+may redistribute it and/or modify it under the same terms as Perl itself.