SEDIT
From ICE Enterprises
Edits a string and writes it to a results parameter
<ISTR> Input string to be edited (String results label or literal string)
<OSTR> Output results label
<FUNC> Edit operation to perform (see below for LIST)
<P1> Edit parameter as defined for each operation
<P2> Edit parameter as defined for each operation
NOTE: SEDIT can now handle MULTIPLE operations per line (entering just ONE
output result name):
nM> SEDIT <input> <output result> FUNC1 [P1] [P2] ... FUNCn [P1n] [P2n]
This command performs a few basic string editing functions that may be required
by some macros with a sophisticated user interface or command line alteration
functions.
The supported operations (CAPS indicate minimum required name) are:
APPEND, BETWeen, BSEArch, CLEAN, ELEMent, EXT, GSUBstitute, HEAD, JOIN,
KEY, LENgth, LOCAse, NELem, NFORM, PADL, PADR, PADB, PARSE, PARSEALl,
PARSEARgs, PARSEDInd, PATH, PREPEND, RANGE, ROOT, SEARch, SELect,
SPLIT, STRIM, SUBstitute, TAIL, TRIM, UPCAse, WORD
NOTE: As of NeXtMidas 3.1.2, the behaviour of the BETWeen, LENgth, RANGE, and SELect
operations is deprecated and will change in a future release. These
changes are to bring SEDIT in line with X-Midas SEDIT behaviour. These
changes are:
BETWeen: Negative numbers will be treated as offsets from the last
character in the input string.
LENgth: The index passed to the LENgth will be one-based, since it
is counting elements.
RANGE: Like BETWeen, negative numbers will be treated as offsets
from the last character in the input string. The net effect
of this is that (in zero-index mode) a range of -5 str.length-1
will return the last 5 characters of the string, where
previously it would return the last 6 characters. Also,
leaving off the second index will equate to a value of 0, NOT
the end of the input string.
SELect: This operation will be 1-index based only, since it is
returning an element number, not an index.
These changes can be utilized as of NeXtMidas 3.1.2 using the
/LEGACY=FALSE switch. In a future release, this will be the default
setting. Until that time, users of the above operations not using
/LEGACY=FALSE will get deprecation warnings. There is also an
equivalent switch /seditLegacy added as a convenience so that users
can use the switch at the top of a macro without interfering with
other primitives that use /legacy (e.g. LIST2).
By default SEDIT is "offset" or "zero-based". The /OB or /ONEBASE switch may
be used to FORCE one-based processing.
When using functions that return an index (SEARCH, BSEARCH, PARSEDINDEX, ...),
check for a value < 0 to detect an invalid or not-found result. A return value
of zero (0) is NO LONGER SUPPORTED as an error indicator, because zero is a
valid index in Java.
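The not-found convention above is the same one most string libraries use. The following is a minimal Python sketch of the check, not NeXtMidas code; the helper name found is hypothetical.

```python
# Python's str.find uses the same convention: a negative result means "not found".
text = "This is a string"

hit_at_start = text.find("This")   # 0 is a valid zero-based index, not an error
missing = text.find("STRING")      # the search is case-sensitive, so this fails


def found(index):
    """Treat only negative values as 'not found' (hypothetical helper)."""
    return index >= 0


print(found(hit_at_start), found(missing))  # True False
```

Testing for index == 0 instead of index < 0 would wrongly reject a match at the start of the string.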
NOTE 1: In NeXtMidas a string result has access to all of the methods of the
string class. Use the QUERY command to show the available methods. For
instance, we can use raw JAVA toUpperCase method as follows:
nM> res str "This is a string"
nM> res str2 str.touppercase()
nM> res str2
16S: STR2 = THIS IS A STRING
NOTE 2: In NeXtMidas, null strings ("") are differentiated from blank spaces
(" "). Thus, there is no "special string" "<SPACE>" as in the X-Midas
version of SEDIT. For example,
nM> res orig "Underscores_to_spaces"
nM> sedit orig str gsubs "_" " "
21S: STR = "Underscores to spaces"
FUNCTIONS:
APPend - Append <P1> to <string>
BETWeen - ZERO-BASED: extracts any possible substring from <string> between
<P1> and <P2>. It doesn't matter whether <P1> is "less than" <P2>
or not; BETWEEN returns what is *between* the two indices,
inclusive.
A negative number means that many characters from the other index.
In general, BETWEEN will give as much of the string as possible if
at least one index is within the string. If both indices are out of
range, a blank string will be returned.
ONE-BASED (/OB) ONLY: A zero means the end of the string.
The zero is a special case; if the other index is within range,
it returns the indicated substring; if not, it returns blank.
NOTE: As of NeXtMidas 3.1.2, the BETWeen function will have the
following change to behaviour when using /LEGACY=FALSE. This
will be the default behaviour in a later release.
Negative numbers will represent an index from the end of
the string. So, in zero-based, "-1" is the last character
in the string, and in one-based, "0" is the last character.
Like RANGE, BETWeen will support leaving off the last index
to mean the end of String. See the examples below.
Examples (Zero-Based):
* Get the first 7 characters of a string:
nM> SEDIT "This is a string" str "BETWEEN" 0 6
str = "This is"
nM> SEDIT "This is a string" str "BETWEEN" 6 0
str = "This is"
* Get the last 6 characters of a string
nM> res instr "This is a string"
nM> SEDIT instr str "BETWEEN" ^instr.length -5
str = "string"
nM> SEDIT/LEGACY=FALSE instr str "BETWEEN" -1 -6
str = "string"
* Get the characters OUT-OF-RANGE of a string
nM> SEDIT "This is a string" str "BETWEEN" 100 200
str = ""
* Get the entire string
nM> SEDIT "This is a string" str "BETWEEN" 0 200
str = "This is a string"
* Get five (5) characters from character 4 to the beginning
of the string
nM> SEDIT "This is a string" str "BETWEEN" 4 -100
str = "This "
* The INCORRECT way to get the last character of the string
nM> res instr "This is a string"
nM> SEDIT instr str "BETWEEN" ^instr.length ^instr.length
str = ""
* The CORRECT way to get the last character of the string
nM> res instr "This is a string"
nM> SEDIT instr str "BETWEEN" ^instr.length-1 ^instr.length-1
str = "g"
nM> SEDIT/LEGACY=FALSE instr str "BETWEEN" -1 -1
str = "g"
* Get the first character of the string
nM> SEDIT instr str "BETWEEN" 0 0
str = "T"
* Get the last six characters in a string
nM> SEDIT/LEGACY=FALSE instr str "BETWEEN" -6
str = "string"
Examples (one-based):
SEDIT/OB "This is a string" str "BETWEEN" 1 7
str = "This is"
SEDIT/OB "This is a string" str "BETWEEN" 7 1
str = "This is"
res instr "This is a string"
SEDIT/OB instr str "BETWEEN" ^instr.length -5
str = "string"
SEDIT/OB "This is a string" str "BETWEEN" 100 200
str = ""
SEDIT/OB "This is a string" str "BETWEEN" 4 -100
str = "This"
res instr "This is a string"
SEDIT/OB instr str "BETWEEN" ^instr.length ^instr.length
str = "g"
SEDIT/OB instr str "BETWEEN" 0 0
str = ""
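The zero-based /LEGACY=FALSE behaviour of BETWEEN described above can be sketched in Python. This is an illustration of the documented rules only, not the SEDIT implementation; the function name between is hypothetical, and clamping an index that falls before the start of the string is an assumption.

```python
def between(text, p1, p2=None):
    """Order-independent, inclusive, zero-based substring (sketch only)."""
    n = len(text)
    if p2 is None:
        p2 = n - 1                     # second index left off: end of string
    # Negative values index from the end: -1 is the last character.
    a = p1 + n if p1 < 0 else p1
    b = p2 + n if p2 < 0 else p2
    a_ok = 0 <= a < n
    b_ok = 0 <= b < n
    if not (a_ok or b_ok):
        return ""                      # both indices out of range
    lo, hi = min(a, b), max(a, b)
    # Assumption: the out-of-range index is clamped to the string bounds.
    return text[max(lo, 0):min(hi, n - 1) + 1]


s = "This is a string"
print(repr(between(s, 6, 0)))    # 'This is'
print(repr(between(s, -1, -6)))  # 'string'
```

Because the indices are sorted before slicing, between(s, 0, 6) and between(s, 6, 0) give the same result, which is the main difference from RANGE.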
BSEArch - searches for the substring <P1> in <string>, starting from
the back, and returns the index of the start of the match in
<label>. Returns -1 if not found.
Examples:
SEDIT "This is a string" idx "BSEARCH" "string"
idx = 10
SEDIT "This is a string" idx "BSEARCH" "STRING"
idx = -1
CLEAN - returns the CLEANed version of <string> in <label>
(see nxm.sys.lib.Parser)
Example:
SEDIT "This is a string" str CLEAN
str = THIS,IS,A,STRING
ELEMent - extracts the nth delimited word where n = <P1> and the delimiter =
<P2>. The string is not CLEANED as in PARSE. <P2> can be more than
1 character in length.
Examples:
SEDIT "This/is/a//string" str "ELEM" 2 "/"
str = is
SEDIT "Bill and Bob and Jebediah" str "ELEM" 3 " and "
str = Jebediah
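As a rough illustration of ELEMent, the Python sketch below splits on the literal delimiter and picks the nth piece, counting from one as in the examples above. The function name element is hypothetical; keeping the empty element produced by "//" and returning an empty string for an out-of-range n are assumptions, not documented behaviour.

```python
def element(text, n, delimiter):
    """Return the n-th (one-based) piece of text split on a literal delimiter."""
    parts = text.split(delimiter)      # the delimiter may be longer than one character
    if 1 <= n <= len(parts):
        return parts[n - 1]
    return ""                          # assumption: out-of-range n gives an empty string


print(element("This/is/a//string", 2, "/"))                 # is
print(element("Bill and Bob and Jebediah", 3, " and "))     # Jebediah
```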
ENDS - returns true if <string> ends with <P1>, false otherwise.
Examples:
nM>SEDIT "This is a string" x "ENDS" "string"
z: X = true
nM>SEDIT "This is a string" x "ENDS" "This"
z: X = false
EXT - returns the filename extension (see ROOT, TAIL, PATH)
GSUBstitute - substitutes string <P2> for every instance of <P1>
Examples:
SEDIT "This is a string" str "GSUBS" "is" "at"
str = That at a string
SEDIT "This is a string" str "GSUBS" " " "xx"
str = Thisxxisxxaxxstring
SEDIT "This is a string" str "GSUBS" "is" ""
12S: str = Th  a string
SEDIT "This is a string" str "GSUBS" "is" " "
14S: str = Th    a string
HEAD - alias for PATH (see ROOT, TAIL, EXT)
JOIN - Join a string array into a String.
Examples:
nM> SEDIT "A string array to join" word SPLIT " "
nM> SEDIT word str JOIN " "
22S: STR = A string array to join
KEY - Gets the value from a string containing TAG/VALUE pairs, where <P1>
is the tag name and <P2> is the delimiter (= is DEFAULT).
Examples:
nM> SEDIT "NAME=homer CITY=SPRINGFIELD" myname KEY "NAME"
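The KEY lookup can be pictured with the Python sketch below. It is only an illustration: the function name key_value is hypothetical, and it assumes that the pairs are separated by whitespace, that tag matching is case-sensitive, and that a missing tag yields None. None of these details are stated above.

```python
def key_value(text, tag, delimiter="="):
    """Return the value paired with tag in a TAG<delimiter>VALUE string (sketch)."""
    for pair in text.split():                      # assumption: whitespace-separated pairs
        name, sep, value = pair.partition(delimiter)
        if sep and name == tag:                    # assumption: case-sensitive tag match
            return value
    return None                                    # assumption: missing tag gives None


print(key_value("NAME=homer CITY=SPRINGFIELD", "NAME"))  # homer
```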
LENgth - calculates the length of string through the Nth naturally delimited
entry where <P1> = N. Use N=-1 (default) to calculate length of the
entire string.
Note: The default behaviour of LENgth will change from zero-based to
one-based in a future release, since length is counting elements.
Examples (zero-based):
SEDIT "This is a string" lstr "LEN" 0
lstr = 4
SEDIT "This is a string" lstr "LEN" 2
lstr = 9
SEDIT "This is a string" lstr "LEN"
lstr = 16
Examples (one-based):
SEDIT/OB "This is a string" lstr "LEN" 0
lstr = 16
SEDIT/OB "This is a string" lstr "LEN" 3
lstr = 9
Note that you can, for example, extract the first three
words of some string with the string length followed by RANGE:
RESULT string "This is one powerful utility!"
SEDIT/OB string lstr length 3
SEDIT/OB string substr RANGE 1 lstr
substr = This is one
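The LENgth-then-RANGE idiom above can be mimicked in Python as follows. This is a sketch of the one-based case only; the function name length_through is hypothetical, "naturally delimited" is assumed to mean separated by commas or spaces, and returning the full length when N exceeds the word count is an assumption.

```python
import re


def length_through(text, n):
    """Length of text up to the end of the n-th word (one-based); n <= 0 means the whole string."""
    if n <= 0:
        return len(text)
    # Assumption: words are runs of characters other than commas and spaces.
    words = list(re.finditer(r"[^, ]+", text))
    if n > len(words):
        return len(text)               # assumption: fall back to the full length
    return words[n - 1].end()


string = "This is one powerful utility!"
lstr = length_through(string, 3)
substr = string[:lstr]                 # one-based RANGE 1..lstr
print(lstr, substr)                    # 11 This is one
```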
LOCAse - converts alphabetic characters to lower case
Examples:
SEDIT "This is a string" str "LOCA"
str = this is a string
MASK - Calls Parser.mask to build an integer bit mask of enabled items from a list.
Examples:
SEDIT "AA,BB,CC,DD" mask "MASK" "A|DD"
L: mask = 0x9
NELem - Count the number of elements in string delimited by <P1>. The default
delimiter is a comma (,). Only an empty string or the RESERVED
word "NULL" has zero (0) elements.
Examples:
SEDIT "" nels "NELEM"
L: NELS = 0
SEDIT "NULL" nels "NELEM"
L: NELS = 0
SEDIT " " nels "NELEM"
L: NELS = 1
SEDIT "," nels "NELEM"
L: NELS = 2
SEDIT "x,y" nels "NELEM"
L: NELS = 2
SEDIT "blank separated string" nels "NELEM" " "
L: NELS = 3
SEDIT "blanks and commas,separated string" nels "NELEM" " "
L: NELS = 4
SEDIT "    " nels "NELEM" " " ! Four blanks with blank as delimiter
L: NELS = 5
SEDIT "Bill and Bob and Jebediah" nels "NELEM" " and " ! String delimiter
L: NELS = 3
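The counting rule behind the NELEM examples above is easy to reproduce: every delimiter adds one element, and only the empty string and the reserved word NULL count as zero. The Python sketch below (function name nelem is hypothetical) reproduces each result listed above; it is an illustration, not the SEDIT implementation.

```python
def nelem(text, delimiter=","):
    """Count delimited elements; only "" and the reserved word NULL have zero."""
    if text == "" or text == "NULL":
        return 0
    # A literal split keeps empty pieces, so N delimiters give N + 1 elements.
    return len(text.split(delimiter))


print(nelem(","))                                   # 2
print(nelem("    ", " "))                           # 5
print(nelem("Bill and Bob and Jebediah", " and "))  # 3
```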
NFORM - Format numbers according to given mask. Besides all of the standard Java
format strings, Fortran format strings can be applied by surrounding the
Fortran format string with parentheses like (F12.2). The following
format keywords are also supported:
GEN - X-Midas GENeral format (no exponent if between 1E-3, 1E15).
VIS - X-Midas VISual format (no exponent if between 1E-3, 1E7).
SCI - SCIentific notation.
ENG - ENGineering notation (exponent is a multiple of 3).
MAN - MANtissa notation (no exponents).
DMS - Deg-Min-Sec angular format. ddd'mm'ss
LAT - Deg-Min-Sec format for latitude. ddd'mm'ssN
LON - Deg-Min-Sec format for longitude. ddd'mm'ssE
STD - STanDard time code format. [yy]yy:mm:dd::hh:mm:ss
ACQ - ACQuisition time code format. [yy.ddd]:[hh:mm:ss]
EPOCH - EPOCH quadwords for time code. [yy]yy:sec_in_year
NORAD - NORAD timecode format. yyddd.frac_of_day
TCR - TimeCode Reader format. ddd:hh:mm:ss
VAX - VAX time format. dd-MMM-yy[yy]:hh:mm:ss
HMS - Hour-Min-Sec time format. hh:mm:ss.ffff
YMD - Year-Month-Day format. yyyy:mm:dd
NET - 32-bit integer formatted as URL. 127.0.0.1
Examples:
SEDIT 1 str "NFORM" "#"
str = 1
SEDIT 1.12345678 str "NFORM" "#.###"
str = 1.123
SEDIT 1.12345678 str "NFORM" "00.000"
str = 01.123
SEDIT 1 str NFORM "0.00"
str = 1.00
SEDIT 1 str NFORM "(F3.2)"
str = 1.00
SEDIT 123.456 str NFORM "DMS"
STR = 123'27'22
NSEA - searches for the substring <P1> in <string> and returns the index of
the start of the <P2>th occurrence of <P1>. If <P1> is not found, or if the
specified occurrence of <P1> is not found, returns -1. (Since 3.5.0)
Examples:
nM> sedit "1 and 2 and 3 and" index NSEA "and" 3
L: INDEX = 14
nM> sedit "1 and 2 and 3 and" index NSEA "and" 10
L: INDEX = -1
nM> sedit "1 and 2 and 3 and" index NSEA "missing" 3
L: INDEX = -1
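The NSEA examples above can be reproduced with a short Python loop that repeats a forward search. The function name nth_search is hypothetical, and allowing overlapping matches (restarting one character after the previous hit) is an assumption.

```python
def nth_search(text, target, occurrence):
    """Zero-based index of the given occurrence of target, or -1 if there is none."""
    index = -1
    for _ in range(occurrence):
        # Assumption: restart one character past the previous match.
        index = text.find(target, index + 1)
        if index < 0:
            return -1
    return index


s = "1 and 2 and 3 and"
print(nth_search(s, "and", 3))       # 14
print(nth_search(s, "and", 10))      # -1
print(nth_search(s, "missing", 3))   # -1
```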
PADx - PADRight, PADLeft and PADBoth. Default pad character is a space. You
can specify a length or a relative length by prepending a + before the
number. When padding BOTH with an odd number of pad characters, the extra
character goes on the right.
Examples:
nM> sedit "a string" str PADL 20
20S: STR = "            a string"
nM> sedit "a string" str PADR 20 "."
20S: STR = "a string............"
nM> sedit "a string" str PADB 20
20S: STR = "      a string      "
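The padding rules above, including the "extra character goes on the right" rule for PADB, can be sketched in Python. The function name pad and its mode argument are hypothetical, and the relative (+N) length form is not covered.

```python
def pad(text, width, mode, fill=" "):
    """Pad text to width: PADL pads on the left, PADR on the right, anything else on both sides."""
    extra = max(width - len(text), 0)
    if mode == "PADL":
        return fill * extra + text
    if mode == "PADR":
        return text + fill * extra
    left = extra // 2                  # PADB: any odd leftover goes on the right
    return fill * left + text + fill * (extra - left)


print(repr(pad("a string", 20, "PADR", ".")))  # 'a string............'
print(repr(pad("abc", 6, "PADB")))             # ' abc  '
```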
PARSE - extracts the nth naturally delimited word where n = <P1>.
<P2> is an optional delimiter, if not set commas or spaces are used.
<label> will be in all uppercase letters unless /CLEAN=FALSE.
Examples:
nM> SEDIT "This is a string" str "PARSE" 2
str = IS
nM> SEDIT "This/is/a/string" str "PARSE" 4 "/" /clean=f
nM> res str
6S: STR = string
PARSEALl - parses <string> into naturally delimited words and puts the parsed
words into a results array named <label>. <P1> specifies the number
of elements in the array. If <P1> is not specified or made 0, the
number of elements within the array will be defaulted to the number
of words in the string. If <P1> is less than the number of words,
only the first <P1> words will be placed into the array. If <P1> is
greater than the number of words, the remaining elements of the
array are null. <P2> is an optional delimiter, if not set commas or
spaces are used. All results will be in all uppercase letters unless
/CLEAN=FALSE.
Examples:
* nM> SEDIT "This is a string" word "PARSEALL" 4
nM> res word(0)
4S: WORD(0) = THIS
nM> res word(2)
1S: WORD(2) = A
nM> res word(4)
ERROR: java.lang.IllegalArgumentException: KeyObject.getIndexed():
Error in accessing element 4 from [Ljava.lang.String;@33b121:
java.lang.ArrayIndexOutOfBoundsException
* nM> SEDIT "This is a string" word "PARSEALL" 4 /clean=f
nM> res word(0)
4S: WORD(0) = This
PARSEARgs - parses the arguments of a command. The returned object is of type
nxm.sys.lib.Args, and as such has access to the Args class methods.
Examples:
nM> sedit "PATH,FUNC=SET,SYS" args PARSEARGS
nM> res L:size args.size
nM> res size
L: SIZE = 2
PARSEDIndex - parses <string> into naturally delimited words and returns the
index of <P1>. Returns -1 if not found.
Examples (ZERO-BASED):
nM> SEDIT "This is a string" idx "PARSEDI" is
idx = 1
nM> SEDIT "This is a string" idx "PARSEDI" "not"
idx = -1
Examples (ONE-BASED):
nM> SEDIT/OB "This is a string" idx "PARSEDI" is
idx = 2
nM> SEDIT/OB "This is a string" idx "PARSEDI" "not"
idx = -1
PATH - returns the path of a filename (see ROOT, TAIL, EXT)
PREPend - Prepend <P1> to <string>
RANGE - extracts the substring from <string> between <P1> and <P2>.
For either <P1> or <P2>, a negative number means that many characters
from the end. The indices <P1> and <P2> are order-DEPENDENT; that is,
<P1> must refer to a position in the string that is equal to or to the
left of the position specified by <P2>.
If either index is out of range, a blank string is returned. RANGE is
therefore useful for catching errors: if it cannot give the entire
range of characters, it won't give any. The BETWEEN operator is
error-tolerant.
Note: As of NeXtMidas 3.1.2, the RANGE function will have the
following change to behaviour when using /LEGACY=FALSE. This
will be the default behaviour in a later release.
Negative numbers will count as an index from the end of the
string. See negative number examples below.
Leaving off the second index value will equate to a value of
zero, NOT the end of the string.
Examples (Zero-Based):
nM> SEDIT "This is a string" str "RANGE" 0 6
str = "This is"
nM> SEDIT "This is a string" str "RANGE" 6 0
str = ""
nM> res inStr "This is a string"
nM> SEDIT inStr str RANGE -5 ^inStr.length-1
str = "string"
nM> SEDIT/LEGACY=FALSE "This is a string" str RANGE -6 -1
str = "string"
nM> SEDIT "This is a string" str "RANGE" 100 200
str = ""
nM> SEDIT "This is a string" str "RANGE" 4 -100
str = ""
nM> SEDIT "This is a string" str "RANGE" -1 -2
str = ""
nM> SEDIT "This is a string" str "RANGE" -2 -1
str = "in"
nM> SEDIT/LEGACY=FALSE "This is a string" str "RANGE" -2 -1
str = "ng"
nM> res inStr "This is a string"
nM> SEDIT inStr str RANGE ^inStr.length ^inStr.length
str = ""
nM> SEDIT inStr str RANGE -1 -1
str = "n"
nM> SEDIT/LEGACY=FALSE inStr str RANGE -1 -1
str = "g"
Examples (One-Based):
nM> SEDIT/OB "This is a string" str "RANGE" 1 7
str = "This is"
nM> SEDIT/OB "This is a string" str "RANGE" 7 1
str = ""
nM> res inStr "This is a string"
nM> SEDIT/OB inStr str RANGE -5 ^inStr.length
str = "string"
nM> SEDIT/OB "This is a string" str "RANGE" 100 200
str = ""
nM> SEDIT/OB "This is a string" str "RANGE" 4 -100
str = ""
nM> SEDIT/OB "This is a string" str "RANGE" -1 -2
str = ""
nM> SEDIT/OB "This is a string" str "RANGE" -2 -1
str = "in"
nM> res inStr "This is a string"
nM> SEDIT/OB inStr str RANGE ^inStr.length ^inStr.length
str = "g"
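To contrast RANGE with the error-tolerant BETWEEN, the Python sketch below models the zero-based /LEGACY=FALSE rules described above: negative values index from the end, the order of the indices matters, and any out-of-range index yields an empty string. The function name strict_range is hypothetical, and the omitted-second-index case is not modelled.

```python
def strict_range(text, p1, p2):
    """Order-dependent, inclusive, zero-based substring; empty on any invalid index."""
    n = len(text)
    # Negative values index from the end: -1 is the last character.
    start = p1 + n if p1 < 0 else p1
    end = p2 + n if p2 < 0 else p2
    if not (0 <= start <= end < n):
        return ""                      # out of range or reversed: give nothing
    return text[start:end + 1]


s = "This is a string"
print(repr(strict_range(s, 0, 6)))    # 'This is'
print(repr(strict_range(s, 6, 0)))    # ''
print(repr(strict_range(s, -2, -1)))  # 'ng'
```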
ROOT - returns the filename without extension (see TAIL, PATH, EXT)
SEARch - Case sensitive search for the substring <P1> in
<string> and return index in <label>.
Returns -1 if not found.
Examples (Zero-Based):
nM> SEDIT "This is a string" idx SEARCH "string"
idx = 10
nM> SEDIT "This is a string" idx SEARCH "STRING"
idx = -1
Examples (One-Based):
nM> SEDIT/OB "This is a string" idx SEARCH "string"
idx = 11
nM> SEDIT/OB "This is a string" idx SEARCH "STRING"
idx = -1
SELect - Find the index of the first token that starts with <P1>.
See nxm.sys.lib.Parser for documentation on find method.
Note: As of NeXtMidas 3.1.2, the SELect function will have the
following change to behaviour when using /LEGACY=FALSE. This
will be the default behaviour in a later release.
The index returned will always be one-based, and using the /OB
switch will have no effect.
Example (/LEGACY=FALSE):
nM> SEDIT/LEGACY=FALSE "This,is,a,string" idx SELECT "str"
idx = 4
Example (Zero-Based):
nM> SEDIT "This,is,a,string" idx SELECT "str"
idx = 3
Example (One-Based):
nM> SEDIT/OB "This,is,a,string" idx SELECT "str"
idx = 4
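The difference between the zero-based and one-based SELect results above comes down to whether an element number or an index is returned. A Python sketch follows; the function name select is hypothetical, and the comma delimiter, the case-sensitive prefix match and the -1 result for no match are all assumptions about nxm.sys.lib.Parser rather than documented facts.

```python
def select(text, prefix, one_based=True):
    """Position of the first comma-separated token that starts with prefix (sketch)."""
    for position, token in enumerate(text.split(",")):
        if token.startswith(prefix):   # assumption: case-sensitive prefix match
            return position + 1 if one_based else position
    return -1                          # assumption: no match gives -1


print(select("This,is,a,string", "str"))                   # 4
print(select("This,is,a,string", "str", one_based=False))  # 3
```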
SPLIT - Split a string into an array (0 to N-1) of N tokens. Unlike
PARSEALL, the delimiter is specified.
Examples:
nM> SEDIT "A string to split" word SPLIT " "
word(0) = A
word(1) = string
word(2) = to
word(3) = split
nM> SEDIT "A,string,to,split" word SPLIT ","
word(0) = A
word(1) = string
word(2) = to
word(3) = split
STARTS - returns true if <string> starts with <P1>, false otherwise.
Examples:
nM>SEDIT "This is a string" x "STARTS" "This"
z: X = true
nM>SEDIT "This is a string" x "STARTS" "string"
z: X = false
STRIM - trims all leading and trailing spaces off of <string>.
Examples:
nM> SEDIT " stranded " str STRIM
str = stranded
SUBstitute - substitutes string <P2> for the first instance of <P1> only
Examples:
nM> SEDIT "This is a string" str SUBS "is" "at"
str = That is a string
nM> SEDIT "This is a string" str SUBS "is" " "
15S: STR = Th  is a string ! Note length
nM> SEDIT "This is a string" str SUBS "is" ""
14S: STR = Th is a string ! Note length
TAIL - returns the filename and extension (see ROOT, PATH, EXT)
TRIM - trims off string before <P1> and after <P2>. <label> will not contain
<P1> or <P2>.
Examples:
nM> SEDIT "ARRAY(FRAME;INDEX)+5" str TRIM "(" ")"
str = FRAME;INDEX
nM> SEDIT "FILENAME.EXT" str trim "."
str = EXT
nM> SEDIT "FILENAME.EXT" str trim ,, "."
str = FILENAME
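The three TRIM examples above can be reproduced with the Python sketch below. The function name trim is hypothetical; using the first occurrence of each delimiter, and leaving the string unchanged when <P1> is absent, are assumptions.

```python
def trim(text, before=None, after=None):
    """Drop everything up to and including `before`, and from `after` onward (sketch)."""
    if before:
        head, sep, rest = text.partition(before)   # assumption: first occurrence
        if sep:
            text = rest
    if after:
        text = text.partition(after)[0]            # text preceding the first `after`
    return text


print(trim("ARRAY(FRAME;INDEX)+5", "(", ")"))  # FRAME;INDEX
print(trim("FILENAME.EXT", "."))               # EXT
print(trim("FILENAME.EXT", None, "."))         # FILENAME
```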
UPCAse - converts alphabetic characters to upper case. One may also use
the Java method directly, for example:
nM> res str "my string"
nM> res str2 str.toUpperCase()
9S: STR2 = MY STRING
WORD - extracts the naturally delimited word containing <P1>
Examples:
nM> SEDIT "This is a string" str WORD "ri"
str = string
Performing Multiple Functions:
=============================
To perform multiple operations per line simply chain the operations
together (entering just ONE output result name):
nM> sedit "this is a string" out GSUBS "is" "at" GSUBS "at" "is"
nM> res out
16S: OUT = this is a string
When performing multiple operations per line the operations
must "make sense". For example, the following will generate an exception:
nM> sedit "this is a string" out LEN GSUBS "is" "at"
ERROR: Unable to convert GSUBS to type L
because SEDIT expects the parameter after LEN to be an integer.
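Chaining works like a left-to-right pipeline: each operation receives the output of the one before it. The Python sketch below mirrors the GSUBS/GSUBS example above; the names gsubs and operations are hypothetical and this is not how SEDIT is implemented.

```python
from functools import reduce


def gsubs(text, old, new):
    """Replace every instance of old with new, like GSUBstitute."""
    return text.replace(old, new)


# Each entry is (function, parameters); they are applied left to right.
operations = [(gsubs, ("is", "at")), (gsubs, ("at", "is"))]
intermediate = gsubs("this is a string", "is", "at")
out = reduce(lambda value, op: op[0](value, *op[1]), operations, "this is a string")

print(intermediate)  # that at a string
print(out)           # this is a string
```

The second substitution only restores the original text here because the input contains no "at" of its own; in general the two steps are not inverses.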
SWITCHES:
/CLEAN - [DEFAULT=TRUE] Cleans the string before performing the functions:
PARSE, PARSEALL, PARSEDINDEX
/DEBUG - Turn on debug output
/FORCE - Sets <OSTR> in a readonly table. (Since 2.9.3)
/LEGACY - Use legacy behaviour of BETWEEN, LENGTH, RANGE and SELECT
functions. [DEF=TRUE], but will change to [DEF=FALSE] in a
future version. (Since 3.1.2)
/OB - Same as /ONEBASE.
/ONEBASE - Force 1-based indexing. Affects the BETWEEN, BSEARCH,
PARSEDINDEX, LENGTH, RANGE, SEARCH and SELECT functions.
/SEDITLEGACY - Same as /LEGACY. (Since 3.1.2)
/STRIP - Remove the quotes from the string
/ZEROBASE - DEPRECATED -- This is now the default. Use /OB to override.
SEE ALSO: Query, nxm.sys.test.test_sedit.mm, nxm.sys.lib.Format,
nxm.sys.lib.Parser