(http://www.faximum.com/technotes/215)
TITLE: #215 - Extended DID string manipulation feature
KEYWORDS: DID regular expression
RELEASE: Special
CLASSIFICATION: All
PROBLEM: This TN documents the special DID pattern matching feature.
CAUSE: N/A
SOLUTION: This note provides detailed information on the use of the
DID pattern matching feature.
The purpose of this feature is to enable users to interface
the Faximum fax server with DID trunks that format their
DID information in strange and wonderful ways.
If your DID system merely provides four (or so) digits
which represent the "extension" being dialled, then read
no further -- this is not for you.
On the other hand, if your system sends out strings like:
#12#2343#2221#
where (for example) 12 is a special operations code, and
2343 is the originating extension and 2221 is the actual
DID string you want Faximum to use for routing purposes,
then you, my friend, need DID pattern matching.
With DID pattern matching you can configure Faximum to
extract those parts of the DID string that you want.
In the above example, one would configure Faximum by
editing the appropriate fax-line-* file(s) and adding
parameters of the form:
did-pattern = '#..#....#\(....\)#'
did-macro = '$a'
(As an aside, the location of your fax-line-* files will
depend upon the product you are configuring. Please
refer to your Faximum documentation for more information.)
The did-pattern is an extended regular expression (see
documentation below). The did-macro is the string to
use for routing, with $a, $b, $c, etc. replaced by the
first, second, third, etc. parts of the pattern enclosed
within the \( and \) characters.
Please note that the asterisk character '*' is a regular
expression operator. If you need to match a literal * sent
by your phone system, you must write \* in the pattern.
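As a rough illustration of how a pattern and macro pair
behaves, the following Python sketch emulates the extraction
step. It is not Faximum code and only approximates the
behaviour described above. Assumptions: the helper name
apply_did_macro is hypothetical, Python's re module writes
groups as ( ) rather than \( \), and the group uses four dots
so that it fits the four-digit example string.

```python
import re

def apply_did_macro(pattern, macro, did_string):
    """Return the routing string built from macro, or None if no match.

    Hypothetical helper: $a, $b, $c, ... in the macro are replaced by the
    first, second, third, ... parenthesized group of the pattern.
    """
    match = re.search(pattern, did_string)
    if match is None:
        return None

    def substitute(m):
        index = ord(m.group(1)) - ord("a") + 1  # $a -> group 1, $b -> group 2
        return match.group(index) or ""         # unmatched group -> empty text

    return re.sub(r"\$([a-z])", substitute, macro)

# Python spelling of a pattern with one four-character group
PATTERN = r"#..#....#(....)#"

routing = apply_did_macro(PATTERN, "$a", "#12#2343#2221#")
print(routing)

# A literal asterisk sent by the phone system must be escaped as \*
starred = apply_did_macro(r"\*(....)\*", "$a", "*2221*")
print(starred)
```

With the example string, the sketch yields 2221 as the
routing string. A DID string that does not match the pattern
yields None here; how Faximum itself treats a non-matching
string is not covered by this note.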
For a more advanced example, let us assume that we must
be able to handle DID strings of two flavours:
#12#2343#2221#
and
#12##2221#
One possible pattern that would accomplish that would be:
did-pattern = '#..#[0123456789]*#\(....\)#'
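A pattern of this shape can be checked against both flavours
with a short Python sketch. This is only an approximation for
experimenting outside Faximum: Python writes the group as ( )
rather than \( \), the group uses four dots to fit the
four-digit example strings, and the name routing_digits is
hypothetical.

```python
import re

# The [0123456789]* piece accepts zero or more digits, so the
# middle field may be present or empty.
PATTERN = re.compile(r"#..#[0123456789]*#(....)#")

def routing_digits(did_string):
    """Return the captured routing digits, or None if there is no match."""
    match = PATTERN.search(did_string)
    return match.group(1) if match else None

for sample in ("#12#2343#2221#", "#12##2221#"):
    print(sample, "->", routing_digits(sample))
```

Both sample strings produce 2221, while a string with a
non-digit in the middle field is rejected.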
You may wish to consider seriously the advantages of
purchasing Technical Support from Faximum Software to
assist with the configuration and implementation of your
DID regular expressions.
The following is extracted from the Linux documentation
on regular expressions.
An extended RE is one or more non-empty branches, separated
by `|'. It matches anything that matches one of the
branches.
A branch is one or more pieces, concatenated. It matches
a match for the first, followed by a match for the second,
etc.
A piece is an atom possibly followed by a single `*', `+',
`?', or bound. An atom followed by `*' matches a sequence
of 0 or more matches of the atom. An atom followed by `+'
matches a sequence of 1 or more matches of the atom. An
atom followed by `?' matches a sequence of 0 or 1 matches
of the atom.
A bound is `{' followed by an unsigned decimal integer,
possibly followed by `,' possibly followed by another
unsigned decimal integer, always followed by `}'. The
integers must lie between 0 and RE_DUP_MAX (255) inclusive,
and if there are two of them, the first may not
exceed the second. An atom followed by a bound containing
one integer i and no comma matches a sequence of exactly i
matches of the atom. An atom followed by a bound
containing one integer i and a comma matches a sequence of i or
more matches of the atom. An atom followed by a bound
containing two integers i and j matches a sequence of i
through j (inclusive) matches of the atom.
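The repetition operators and bounds described above can be
tried out with a small Python sketch. Python's re module is
not the POSIX implementation quoted here, but it treats `*',
`+', `?' and bounds the same way for simple cases like these.
The helper name matches_whole and the sample strings are
illustrative only.

```python
import re

def matches_whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

CHECKS = [
    (r"#[0-9]*#", "##"),        # '*': zero or more digits
    (r"#[0-9]+#", "##"),        # '+': needs at least one digit
    (r"#[0-9]?#", "#7#"),       # '?': zero or one digit
    (r"#[0-9]?#", "#77#"),      # '?': two digits is too many
    (r"[0-9]{4}", "2221"),      # {4}: exactly four
    (r"[0-9]{4}", "22210"),     # {4}: five is too many
    (r"[0-9]{2,}", "2"),        # {2,}: one is too few
    (r"[0-9]{2,}", "22210"),    # {2,}: two or more
    (r"[0-9]{2,4}", "222"),     # {2,4}: two through four
    (r"[0-9]{2,4}", "22210"),   # {2,4}: five is too many
]

results = [matches_whole(pattern, text) for pattern, text in CHECKS]
for (pattern, text), ok in zip(CHECKS, results):
    print(f"{pattern!r:16} {text!r:10} {ok}")
```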
An atom is a regular expression enclosed in `()' (matching
a match for the regular expression), an empty set of `()'
(matching the null string), a bracket expression (see
below), `.' (matching any single character), `^'
(matching the null string at the beginning of a line), `$'
(matching the null string at the end of a line), a `\'
followed by one of the characters `^.[$()|*+?{\' (matching
that character taken as an ordinary character), a `\'
followed by any other character (matching that character
taken as an ordinary character, as if the `\' had not been
present), or a single character with no other significance
(matching that character). A `{' followed by a character
other than a digit is an ordinary character, not the
beginning of a bound. It is illegal to end an RE with
`\'.
A bracket expression is a list of characters enclosed in
`[]'. It normally matches any single character from the
list (but see below). If the list begins with `^', it
matches any single character (but see below) not from the
rest of the list. If two characters in the list are
separated by `-', this is shorthand for the full range of
characters between those two (inclusive) in the collating
sequence, e.g. `[0-9]' in ASCII matches any decimal digit.
It is illegal for two ranges to share an endpoint, e.g.
`a-c-e'. Ranges are very collating-sequence-dependent,
and portable programs should avoid relying on them.
To include a literal `]' in the list, make it the first
character (following a possible `^'). To include a
literal `-', make it the first or last character, or the
second endpoint of a range. To use a literal `-' as the
first endpoint of a range, enclose it in `[.' and `.]' to
make it a collating element (see below). With the
exception of these and some combinations using `[' (see next
paragraphs), all other special characters, including `\',
lose their special significance within a bracket
expression.
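The bracket expression rules above can be sketched in Python
as follows. One difference to keep in mind: in Python's re
module a backslash remains special inside brackets, whereas
the POSIX text above says it does not. The examples below
avoid backslashes for that reason. The helper name
first_match and the sample strings are illustrative only.

```python
import re

def first_match(pattern, text):
    """Return the text of the leftmost match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

digits = first_match(r"[0-9]+", "#12#2343#")     # range of digits
non_digit = first_match(r"[^0-9]", "12#2343")    # leading ^ negates the list
bracket = first_match(r"[]#]", "12]34")          # literal ] placed first
hyphen = first_match(r"[0-9-]+", "#555-2221#")   # literal - placed last
star = first_match(r"[*#]+", "12*#34")           # * is ordinary inside [ ]

for name, value in [("digits", digits), ("non_digit", non_digit),
                    ("bracket", bracket), ("hyphen", hyphen), ("star", star)]:
    print(name, "=", value)
```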
Within a bracket expression, a collating element (a
character, a multi-character sequence that collates as if it
were a single character, or a collating-sequence name for
either) enclosed in `[.' and `.]' stands for the sequence
of characters of that collating element. The sequence is
a single element of the bracket expression's list. A
bracket expression containing a multi-character collating
element can thus match more than one character, e.g. if
the collating sequence includes a `ch' collating element,
then the RE `[[.ch.]]*c' matches the first five characters
of `chchcc'.
Within a bracket expression, a collating element enclosed
in `[=' and `=]' is an equivalence class, standing for the
sequences of characters of all collating elements
equivalent to that one, including itself. (If there are no
other equivalent collating elements, the treatment is as
if the enclosing delimiters were `[.' and `.]'.) For
example, if o and ô are the members of an equivalence
class, then `[[=o=]]', `[[=ô=]]', and `[oô]' are all
synonymous. An equivalence class may not be an endpoint of a
range.
In the event that an RE could match more than one
substring of a given string, the RE matches the one starting
earliest in the string. If the RE could match more than
one substring starting at that point, it matches the
longest. Subexpressions also match the longest possible
substrings, subject to the constraint that the whole match
be as long as possible, with subexpressions starting
earlier in the RE taking priority over ones starting later.
Note that higher-level subexpressions thus take priority
over their lower-level component subexpressions.
Match lengths are measured in characters, not collating
elements. A null string is considered longer than no
match at all. For example, `bb*' matches the three middle
characters of `abbbc', `(wee|week)(knights|nights)'
matches all ten characters of `weeknights', when `(.*).*'
is matched against `abc' the parenthesized subexpression
matches all three characters, and when `(a*)*' is matched
against `bc' both the whole RE and the parenthesized
subexpression match the null string.
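Some of the examples above can be reproduced with Python's re
module, with one important caveat: Python picks the first
alternative that leads to a match rather than the longest
overall match, so it is not a faithful model of the POSIX
rules for patterns using `|'. The sketch below shows three
cases where the two agree and one where they differ. The
helper name search_info is illustrative only.

```python
import re

def search_info(pattern, text):
    """Return (matched text, span, groups) for the leftmost match, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(0), m.span(), m.groups()

# Agrees with the text above: the three middle characters of 'abbbc'
middle = search_info(r"bb*", "abbbc")

# Agrees with the text above: all ten characters are matched
week = search_info(r"(wee|week)(knights|nights)", "weeknights")

# Agrees with the text above: the group takes all three characters
greedy = search_info(r"(.*).*", "abc")

# Differs: Python stops at the first alternative and matches 'wee',
# where the leftmost-longest rule described above would match 'week'
first_alt = search_info(r"wee|week", "week")

for name, value in [("middle", middle), ("week", week),
                    ("greedy", greedy), ("first_alt", first_alt)]:
    print(name, "=", value)
```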
If case-independent matching is specified, the effect is
much as if all case distinctions had vanished from the
alphabet. When an alphabetic that exists in multiple
cases appears as an ordinary character outside a bracket
expression, it is effectively transformed into a bracket
expression containing both cases, e.g. `x' becomes `[xX]'.
When it appears inside a bracket expression, all case
counterparts of it are added to the bracket expression, so
that (e.g.) `[x]' becomes `[xX]' and `[^x]' becomes
`[^xX]'.
No particular limit is imposed on the length of REs.
Programs intended to be portable should not employ REs longer
than 256 bytes, as an implementation can refuse to accept
such REs and remain POSIX-compliant.
The Faximum software expects the DID string used for
routing purposes to be fewer than 32 characters in length.
ACKNOWLEDGEMENT
This description of regular expressions was taken from
Henry Spencer's regex package.
TechNote: 215 - Copyright 2000 Faximum Software Inc., All Rights Reserved.
Last Updated: Wed Jun 21 21:40:48 PDT 2000
The complete set of Faximum TechNotes is available on the Internet at
http://www.faximum.com/TechSupport