Faximum TechNote #215

(http://www.faximum.com/technotes/215)

TITLE:    #215 - Extended DID string manipulation feature

KEYWORDS: DID regular expression

RELEASE:  Special

CLASSIFICATION: All

PROBLEM:  This TN documents the special DID pattern matching feature.

CAUSE:	  N/A

SOLUTION: This note provides detailed information on the use of the
	  DID pattern matching feature.


	  The purpose of this feature is to enable users to interface
	  the Faximum fax server with DID trunks that format their
	  DID information in strange and wonderful ways.

	  If your DID system merely provides four (or so) digits
	  which represent the "extension" being dialled, then read
	  no further -- this is not for you.

	  On the other hand, if your system sends out strings like:
	  	#12#2343#2221#
	
	  where (for example) 12 is a special operations code,
	  2343 is the originating extension, and 2221 is the actual
	  DID string you want Faximum to use for routing purposes,
	  then you, my friend, need DID pattern matching.

	  With DID pattern matching you can configure Faximum to
	  extract those parts of the DID string that you want.

	  In the above example, one would configure Faximum by
	  editing the appropriate fax-line-* file(s) and adding
	  parameters of the form:

	  	did-pattern = '#..#....#\(....\)#'
	  	did-macro = '$a'

	  (As an aside, the location of your fax-line-* files
	  depends upon the product you are configuring. Please
	  refer to your Faximum documentation for more information.)

	  The did-pattern is an extended regular expression (see
	  documentation below). The did-macro is the string to
	  use for routing (with $a, $b, $c, etc. replaced with the
	  first, second, third, etc. parts of the pattern within
	  the \( and \) characters).
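
	  The substitution described above can be sketched in Python.
	  This is only an illustration, not Faximum's implementation:
	  the helper name expand_did_macro is hypothetical, the \( and
	  \) group markers are rewritten to Python's ( and ) because
	  Python's re module uses a different syntax, and the group
	  is four characters wide to fit the four-digit sample string.

```python
import re
import string


def expand_did_macro(did_pattern, did_macro, did_string):
    """Hypothetical helper that mimics did-pattern / did-macro.

    Assumes groups are written as \\( ... \\) as in this note; other
    escapes such as \\* are passed through to Python unchanged.
    """
    py_pattern = did_pattern.replace(r"\(", "(").replace(r"\)", ")")
    match = re.search(py_pattern, did_string)
    if match is None:
        return None  # the DID string did not fit the pattern
    result = did_macro
    # $a is the first group, $b the second, and so on.
    for index, text in enumerate(match.groups()):
        result = result.replace("$" + string.ascii_lowercase[index], text or "")
    return result


print(expand_did_macro(r"#..#....#\(....\)#", "$a", "#12#2343#2221#"))
print(expand_did_macro(r"#\(..\)#....#\(....\)#", "$b-$a", "#12#2343#2221#"))
```

	  The first call keeps only the final field; the second
	  captures two fields and swaps their order in the macro.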

	  Please note that the '*' (asterisk) character is a regular
	  expression operator. If you need to match a literal * sent
	  by your phone system, you must write it as \*.
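
	  A short Python sketch of the difference between an escaped
	  and an unescaped asterisk. The DID string "*12*2221#" is an
	  invented sample, and the group is written in Python's ( )
	  form rather than the \( \) form used in fax-line-* files.

```python
import re

# Hypothetical DID string in which the switch sends literal asterisks.
did_string = "*12*2221#"

# \* matches a literal asterisk.
literal_star = re.search(r"\*..\*(....)#", did_string)
print(literal_star.group(1))

# An unescaped * is a repetition operator: "2*" means "zero or more 2s".
repeated = re.fullmatch(r"2*1", "2221")
print(repeated is not None)
```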

	  For a more advanced example, let us assume that we must
	  be able to handle DID strings of two flavours:
	  	#12#2343#2221#
	  and
	  	#12##2221#

	  One possible pattern that would accomplish that would be:
	  	did-pattern = '#..#[0123456789]*#\(....\)#'
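
	  As a rough check, the two-flavour pattern can be tried
	  against both sample strings in Python. The pattern here is
	  a Python rendering (groups as ( ) instead of \( \)) with a
	  four-character group to fit the four-digit samples; it is
	  a sketch, not a test of the Faximum matcher itself.

```python
import re

# Python form of the pattern; [0123456789]* allows an empty middle field.
pattern = re.compile(r"#..#[0123456789]*#(....)#")

results = {}
for did_string in ("#12#2343#2221#", "#12##2221#"):
    match = pattern.search(did_string)
    results[did_string] = match.group(1) if match else None
    print(did_string, "->", results[did_string])
```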

	  You may wish to consider seriously the advantages of
	  purchasing Technical Support from Faximum Software to
	  assist with the configuration and implementation of your
	  DID regular expressions.

	  The following is extracted from the Linux documentation
	  on regular expressions.

	  An extended RE is one or more non-empty branches, separated
	  by `|'. It matches anything that matches one of the
	  branches.

	  A branch is one or more pieces, concatenated. It matches
	  a match for the first, followed by a match for the second,
	  etc.

	  A piece is an atom possibly followed by a single `*', `+',
	  `?', or bound. An atom followed by `*' matches a sequence
	  of 0 or more matches of the atom. An atom followed by `+'
	  matches a sequence of 1 or more matches of the atom. An
	  atom followed by `?' matches a sequence of 0 or 1 matches
	  of the atom.

	  A bound is `{' followed by an unsigned decimal integer,
	  possibly followed by `,' possibly followed by another
	  unsigned decimal integer, always followed by `}'. The
	  integers must lie between 0 and RE_DUP_MAX (255)
	  inclusive, and if there are two of them, the first may not
	  exceed the second. An atom followed by a bound containing
	  one integer i and no comma matches a sequence of exactly i
	  matches of the atom. An atom followed by a bound
	  containing one integer i and a comma matches a sequence of
	  i or more matches of the atom. An atom followed by a bound
	  containing two integers i and j matches a sequence of i
	  through j (inclusive) matches of the atom.
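
	  Bounds can make a DID pattern stricter than a row of dots.
	  The Python sketch below illustrates the bound syntax only.
	  Whether the Faximum matcher accepts bounds in a did-pattern
	  is not stated in this note, so treat that as an assumption
	  to verify; the helper name route is hypothetical.

```python
import re

# {2} = exactly two digits, {0,4} = zero to four digits, {4} = exactly four.
pattern = re.compile(r"#[0-9]{2}#[0-9]{0,4}#([0-9]{4})#")


def route(did_string):
    """Return the captured routing digits, or None if nothing matches."""
    match = pattern.search(did_string)
    return match.group(1) if match else None


print(route("#12#2343#2221#"))   # middle field of four digits
print(route("#12##2221#"))       # empty middle field
print(route("#12#23435#2221#"))  # five digits exceed the {0,4} bound
```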

	  An atom is a regular expression enclosed in `()' (matching
	  a match for the regular expression), an empty set of `()'
	  (matching the null string), a bracket expression (see
	  below), `.' (matching any single character), `^'
	  (matching the null string at the beginning of a line), `$'
	  (matching the null string at the end of a line), a `\'
	  followed by one of the characters `^.[$()|*+?{\' (matching
	  that character taken as an ordinary character), a `\'
	  followed by any other character (matching that character
	  taken as an ordinary character, as if the `\' had not been
	  present), or a single character with no other significance
	  (matching that character). A `{' followed by a character
	  other than a digit is an ordinary character, not the
	  beginning of a bound. It is illegal to end an RE with
	  `\'.

	  A bracket expression is a list of characters enclosed in
	  `[]'. It normally matches any single character from the
	  list (but see below). If the list begins with `^', it
	  matches any single character (but see below) not from the
	  rest of the list. If two characters in the list are
	  separated by `-', this is shorthand for the full range of
	  characters between those two (inclusive) in the collating
	  sequence, e.g. `[0-9]' in ASCII matches any decimal digit.
	  It is illegal for two ranges to share an endpoint, e.g.
	  `a-c-e'. Ranges are very collating-sequence-dependent,
	  and portable programs should avoid relying on them.
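
	  Bracket expressions are handy for DID fields whose content
	  varies. The Python sketch below uses `[0-9]' for digit
	  fields and the negated list `[^#]' for "anything up to the
	  next #". The sample strings are invented, and the group is
	  written in Python's ( ) form.

```python
import re

# [^#]* skips any run of non-'#' characters, so the middle field may
# hold digits, letters, or nothing at all.
pattern = re.compile(r"#[0-9][0-9]#[^#]*#([0-9][0-9][0-9][0-9])#")

samples = ["#12#2343#2221#", "#12#A7#2221#", "#12##22x1#"]
captured = []
for did_string in samples:
    match = pattern.search(did_string)
    captured.append(match.group(1) if match else None)
    print(did_string, "->", captured[-1])
```

	  The last sample is rejected because `x' is not in `[0-9]'.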

	  To include a literal `]' in the list, make it the first
	  character (following a possible `^'). To include a
	  literal `-', make it the first or last character, or the
	  second endpoint of a range. To use a literal `-' as the
	  first endpoint of a range, enclose it in `[.' and `.]' to
	  make it a collating element (see below). With the
	  exception of these and some combinations using `[' (see
	  next paragraphs), all other special characters, including
	  `\', lose their special significance within a bracket
	  expression.

	  Within a bracket expression, a collating element (a
	  character, a multi-character sequence that collates as if
	  it were a single character, or a collating-sequence name
	  for either) enclosed in `[.' and `.]' stands for the
	  sequence of characters of that collating element. The
	  sequence is a single element of the bracket expression's
	  list. A bracket expression containing a multi-character
	  collating element can thus match more than one character,
	  e.g. if the collating sequence includes a `ch' collating
	  element, then the RE `[[.ch.]]*c' matches the first five
	  characters of `chchcc'.

	  Within a bracket expression, a collating element enclosed
	  in `[=' and `=]' is an equivalence class, standing for the
	  sequences of characters of all collating elements
	  equivalent to that one, including itself. (If there are no
	  other equivalent collating elements, the treatment is as
	  if the enclosing delimiters were `[.' and `.]'.) For
	  example, if o and ô are the members of an equivalence
	  class, then `[[=o=]]', `[[=ô=]]', and `[oô]' are all
	  synonymous. An equivalence class may not be an endpoint of
	  a range.

	  In the event that an RE could match more than one
	  substring of a given string, the RE matches the one
	  starting earliest in the string. If the RE could match
	  more than one substring starting at that point, it matches
	  the longest. Subexpressions also match the longest
	  possible substrings, subject to the constraint that the
	  whole match be as long as possible, with subexpressions
	  starting earlier in the RE taking priority over ones
	  starting later. Note that higher-level subexpressions thus
	  take priority over their lower-level component
	  subexpressions.

	  Match lengths are measured in characters, not collating
	  elements. A null string is considered longer than no
	  match at all. For example, `bb*' matches the three middle
	  characters of `abbbc', `(wee|week)(knights|nights)'
	  matches all ten characters of `weeknights', when `(.*).*'
	  is matched against `abc' the parenthesized subexpression
	  matches all three characters, and when `(a*)*' is matched
	  against `bc' both the whole RE and the parenthesized
	  subexpression match the null string.
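
	  The longest-match rule matters for DID patterns: a group
	  such as `.*' will swallow `#' separators if it can. The
	  Python sketch below shows the effect. Note the assumption:
	  Python's re module is a backtracking matcher rather than a
	  POSIX longest-match engine, and the two can differ on some
	  alternations, but they agree on the cases shown here.

```python
import re

did_string = "#12#2343#2221#"

# .* is greedy, so the group runs to the LAST '#', not the next one.
greedy = re.search(r"#(.*)#", did_string).group(1)
print(greedy)

# [^#]* cannot cross a '#', so the group stops at the first field.
bounded = re.search(r"#([^#]*)#", did_string).group(1)
print(bounded)

# Two of the examples from the text above.
middle = re.search(r"bb*", "abbbc").group(0)
whole = re.match(r"(.*).*", "abc").group(1)
print(middle, whole)
```

	  Using `[^#]*' or a fixed number of dots instead of `.*'
	  keeps each group inside its own field.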

	  If case-independent matching is specified, the effect is
	  much as if all case distinctions had vanished from the
	  alphabet. When an alphabetic that exists in multiple
	  cases appears as an ordinary character outside a bracket
	  expression, it is effectively transformed into a bracket
	  expression containing both cases, e.g. `x' becomes `[xX]'.
	  When it appears inside a bracket expression, all case
	  counterparts of it are added to the bracket expression, so
	  that (e.g.) `[x]' becomes `[xX]' and `[^x]' becomes
	  `[^xX]'.

	  No particular limit is imposed on the length of REs.
	  Programs intended to be portable should not employ REs
	  longer than 256 bytes, as an implementation can refuse to
	  accept such REs and remain POSIX-compliant.

	  The Faximum software expects the DID string used for
	  routing purposes (that is, the result of the did-macro)
	  to be less than 32 characters in length.

	  ACKNOWLEDGEMENT

	  This description of regular expressions was taken from 
	  Henry Spencer's regex package.


TechNote: 215 - Copyright 2000 Faximum Software Inc., All Rights Reserved.
Last Updated: Wed Jun 21 21:40:48 PDT 2000
The complete set of Faximum TechNotes is available on the Internet at 
	http://www.faximum.com/TechSupport

© Copyright 2001 Faximum Software Inc. All Rights Reserved.