Java RegEx Utilities

jre-utils v0.5.1



Usage

To start, create and initialise the component. It's as simple as this:

<cfset jre = CreateObject("component","jre-utils").init()/>

However, you may want to change the default options...

<cfset jre = CreateObject("component","jre-utils").init ( DefaultFlags = 'CASE_INSENSITIVE' , BackslashReferences = true )/>

For details of the available options, see the "INIT OPTIONS" section below.

Examples

A good example of when jre-utils comes in handy can be seen in my QueryParam Scanner project. In order to locate missing cfqueryparam tags, it is first necessary to locate all cfquery blocks, which requires the jre-utils get function.

jre.Get(FileData,"(?si)(?<=(<cfquery[^p][^>]{0,300}>)).*?(?=</cfquery>)")

The get function returns an array of all the matches to the regular expression - in this example, it returns the contents of all the queries. Also worth noting is the look-behind operator (?<=...) which allows the query tag itself to be excluded from the match - this is another Java regex benefit; regular CF regex only supports look-aheads, not look-behinds.
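For readers who want to see what happens underneath, here is a minimal sketch in plain Java of the same idea: collect every match of a pattern, using a look-behind to keep the opening tag out of the result. The class name GetSketch, its get method and the sample FileData string are illustrative assumptions, not the jre-utils source. Note that Java requires look-behinds to have a bounded length, which is presumably why the pattern uses {0,300} rather than *.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetSketch {
    // Hypothetical stand-in for get(): returns every match of pattern in text.
    static List<String> get(String text, String pattern) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample input containing two cfquery blocks.
        String fileData =
            "<cfquery name=\"a\">SELECT 1</cfquery>\n"
          + "<cfquery name=\"b\" datasource=\"x\">\nSELECT 2\n</cfquery>";

        // Same pattern as the example above: the look-behind and look-ahead
        // assert the tags without including them in the match.
        String pattern = "(?si)(?<=(<cfquery[^p][^>]{0,300}>)).*?(?=</cfquery>)";

        for (String query : get(fileData, pattern)) {
            System.out.println("[" + query.trim() + "]");
        }
    }
}
```

Run against the sample input, this prints [SELECT 1] and [SELECT 2]: only the query bodies, with neither tag included.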



Init Options

DefaultFlags

Comma-delimited list of Regex Pattern flags. Default value is "MULTILINE". See "REGEX PATTERN FLAGS" for details of available flags.

IgnoreInvalidFlags

Boolean value, defaults to false. Enable IgnoreInvalidFlags to ignore invalid flags, instead of the default action of throwing an error.

BackslashReferences

Boolean value, defaults to false. Java replacement strings use $ for group references, whilst CF uses \ instead. Enable BackslashReferences to allow \ instead of $ in replace values.
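To illustrate the difference between the two reference styles, here is a minimal sketch in plain Java. The helper toJavaReferences is hypothetical: it shows one way a CF-style replacement string could be rewritten into Java's form, and is not taken from the jre-utils source. It only handles single-digit references and makes no attempt to deal with escaped backslashes or literal $ characters.

```java
public class BackslashReferenceSketch {
    // Hypothetical helper: rewrite CF-style \1..\9 references as Java-style $1..$9.
    static String toJavaReferences(String replacement) {
        // Regex \\(\d) matches a backslash followed by one digit;
        // the replacement \$$1 emits a literal $ followed by that digit.
        return replacement.replaceAll("\\\\(\\d)", "\\$$1");
    }

    public static void main(String[] args) {
        String text = "2024-05-17";
        String pattern = "(\\d{4})-(\\d{2})-(\\d{2})";

        // Java-style replacement string, using $ references.
        System.out.println(text.replaceAll(pattern, "$3/$2/$1"));

        // CF-style replacement string, converted before it reaches the Java engine.
        String cfStyle = "\\3/\\2/\\1";
        System.out.println(text.replaceAll(pattern, toJavaReferences(cfStyle)));
    }
}
```

Both calls print 17/05/2024, since the converted CF-style string is identical to the Java-style one.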



Functions

get ( Text , Pattern , [Flags] ) - Array

Returns an Array containing the matches of Pattern in Text. Accepts an optional comma-delimited list of flags (see "REGEX PATTERN FLAGS").

getNoCase ( Text , Pattern , [Flags] ) - Array

Same as get(), but case-insensitive.

replace ( Text , Pattern , Replacement , [Scope] ) - String

Returns the Text after applying Replacement to match(es) of Pattern. Set Scope to "ALL" to replace every match; otherwise only the first match is replaced. Replacement can be a standard RegEx replacement string, or the name of a callback function to be applied to each match in turn. The callback function should accept two arguments: the matched string, and an optional array of any groups found within the regex.

This is an example callback function that does nothing with the match:

<cffunction name="CallbackFunction" returntype="String" output="false">
    <cfargument name="Match" type="String"/>
    <cfargument name="Groups" type="Array" default="#ArrayNew(1)#"/>
    <cfreturn Arguments.Match/>
</cffunction>
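As a rough picture of how a callback-driven replace can be built on the Java regex engine, here is a minimal sketch in plain Java. The class CallbackReplaceSketch, its replace method and the boolean all parameter (standing in for Scope = "ALL") are assumptions made for illustration; the actual jre-utils implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CallbackReplaceSketch {
    // Hypothetical stand-in for replace() with a callback.
    // The callback receives the matched string and the list of captured groups.
    static String replace(String text, String pattern,
                          BiFunction<String, List<String>, String> callback,
                          boolean all) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i)); // null if the group did not participate
            }
            String result = callback.apply(m.group(), groups);
            // quoteReplacement stops $ and \ in the result being read as references.
            m.appendReplacement(out, Matcher.quoteReplacement(result));
            if (!all) {
                break; // only the first match is replaced
            }
        }
        m.appendTail(out); // copy whatever follows the last replaced match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "price: 10, tax: 2";
        BiFunction<String, List<String>, String> doubleIt =
            (match, groups) -> String.valueOf(Integer.parseInt(groups.get(0)) * 2);

        System.out.println(replace(text, "(\\d+)", doubleIt, true));
        System.out.println(replace(text, "(\\d+)", doubleIt, false));
    }
}
```

With all set to true this prints "price: 20, tax: 4"; with false only the first number is doubled, giving "price: 20, tax: 2".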



RegEx Pattern Flags

The following can be used for the DefaultFlags option or the Flags argument of the get() functions:

UNIX_LINES

Only \n is recognised as a line terminator (\r is not). Equivalent to (?d) embedded expression.

CASE_INSENSITIVE

Enables case-insensitive matching, by default only for the US-ASCII charset (see UNICODE_CASE below). Equivalent to (?i) embedded expression.

UNICODE_CASE

Allows CASE_INSENSITIVE to also work for Unicode. Equivalent to (?u) embedded expression.

COMMENTS

Permits whitespace and single-line comments in the regex pattern: whitespace is ignored, and anything from a # to the end of the line is treated as a comment. Equivalent to (?x) embedded expression.

MULTILINE

Enables multiline mode, so ^ and $ also match at the start/end of each line. By default ^ and $ do not match at each line, only at the start/end of the entire string. Equivalent to (?m) embedded expression.

DOTALL

Enables dotall mode, so the . also matches line terminators. By default . excludes line terminators. Equivalent to (?s) embedded expression.

CANON_EQ

Enables canonical equivalence. When this flag is specified, two characters will be considered to match if their full canonical decompositions match. There is no equivalent embedded expression for this flag.
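The flag names above correspond to constants on java.util.regex.Pattern. The following is a minimal sketch in plain Java of how a comma-delimited flag list could be turned into the bitmask that Pattern.compile expects, with an ignoreInvalid switch mirroring the IgnoreInvalidFlags option. The names FlagSketch, parseFlags and countMatches are hypothetical and not part of jre-utils.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagSketch {
    // Hypothetical helper: turn "MULTILINE,DOTALL" into a Pattern flag bitmask.
    static int parseFlags(String flagList, boolean ignoreInvalid) {
        int flags = 0;
        for (String raw : flagList.split(",")) {
            String name = raw.trim().toUpperCase(Locale.ROOT);
            if (name.isEmpty()) {
                continue;
            }
            switch (name) {
                case "UNIX_LINES":       flags |= Pattern.UNIX_LINES;       break;
                case "CASE_INSENSITIVE": flags |= Pattern.CASE_INSENSITIVE; break;
                case "UNICODE_CASE":     flags |= Pattern.UNICODE_CASE;     break;
                case "COMMENTS":         flags |= Pattern.COMMENTS;         break;
                case "MULTILINE":        flags |= Pattern.MULTILINE;        break;
                case "DOTALL":           flags |= Pattern.DOTALL;           break;
                case "CANON_EQ":         flags |= Pattern.CANON_EQ;         break;
                default:
                    if (!ignoreInvalid) {
                        throw new IllegalArgumentException("Invalid flag: " + name);
                    }
            }
        }
        return flags;
    }

    // Counts how many times pattern matches in text under the given flags.
    static int countMatches(String text, String pattern, int flags) {
        Matcher m = Pattern.compile(pattern, flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";

        // Without MULTILINE, ^ and $ only anchor to the whole string.
        System.out.println(countMatches(text, "^\\w+$", 0));
        // With MULTILINE, each line matches.
        System.out.println(countMatches(text, "^\\w+$", parseFlags("MULTILINE", false)));
        // The embedded (?m) expression has the same effect as the flag.
        System.out.println(countMatches(text, "(?m)^\\w+$", 0));
        // With DOTALL, . also matches the line terminator between "one" and "two".
        System.out.println(countMatches(text, "one.two", parseFlags("DOTALL", false)));
    }
}
```

The four calls print 0, 3, 3 and 1 respectively.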



Support

For Regular Expression help, use the RegEx mailing list on House of Fusion.

The reference sheets on Regular-Expressions.info can be useful.

For support on this project, please use the RIA Forge contact page.



Credits

jre-utils project created and maintained by Peter Boughton

Significant amount of code re-factored from UDFs originally by Ben Nadel

Descriptions of Regex Pattern flags adapted from Java documentation: http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html



Engine Compatibility

jre-utils uses Java libraries, so it requires a Java-based CFML engine.

It has been tested and works on the following:

It has not been tested, but might work with:

  • Smith Project



Known Issues

There are 0 known issues with the current release...

BlueDragon Server compatibility: fixed in v0.5.1