SEDIT

From ICE Enterprises

Edits a string and writes it to a results parameter

<ISTR>  Input string to be edited (string results label or literal string);
        referred to as <string> in the function descriptions below
<OSTR>  Output results label; referred to as <label> below
<FUNC>  Edit operation to perform (see below for LIST)
<P1>    Edit parameter as defined for each operation
<P2>    Edit parameter as defined for each operation

NOTE: SEDIT can now handle MULTIPLE operations per line (entering just ONE
      output result name):
  nM> SEDIT <input> <output result> FUNC1 [P1] [P2] ... FUNCn [P1n] [P2n]

This command performs a few basic string editing functions that may be required
by some macros with a sophisticated user interface or command line alteration
functions.

The supported operations (CAPS indicate minimum required name) are:
  APPEND, BETWeen, BSEArch, CLEAN, ELEMent, EXT, GSUBstitute, HEAD, JOIN,
  KEY, LENgth, LOCAse, NELem, NFORM, PADL, PADR, PADB, PARSE, PARSEALl,
  PARSEARgs, PARSEDInd, PATH, PREPEND, RANGE, ROOT, SEARch, SELect,
  SPLIT, STRIM, SUBstitute, TAIL, TRIM, UPCAse, WORD

NOTE: As of NeXtMidas 3.1.2, the behaviour of the BETWeen, LENgth, RANGE, and
      SELect operations is deprecated and will change in a future release. These
      changes are to bring SEDIT in line with X-Midas SEDIT behaviour. These
      changes are:
         BETWeen: Negative numbers will be treated as offsets from the last
                  character in the input string.
         LENgth:  The index passed to LENgth will be one-based, since it
                  is counting elements.
         RANGE:   Like BETWeen, negative numbers will be treated as offsets
                  from the last character in the input string. The net effect
                  of this is that (in zero index mode) a range of
                  -5 str.length-1 will return the last 5 characters of the
                  string, where previously it would return the last 6
                  characters. Also, leaving off the second index will equate
                  to a value of 0, NOT the end of the input string.
         SELect:  This operation will be 1-index based only, since it is
                  returning an element number, not an index.

      These changes can be utilized as of NeXtMidas 3.1.2 using the
      /LEGACY=FALSE switch. In a future release, this will be the default
      setting. Until that time, users of the above operations not using
      /LEGACY=FALSE will get deprecation warnings. There is also an
      equivalent switch /seditLegacy added as a convenience so that users
      can use the switch at the top of a macro without interfering with
      other primitives that use /legacy (i.e. LIST2).


By default SEDIT is "offset" or "zero-based".  The /OB or /ONEBASE switch may
be used to FORCE one-based processing.

When using functions that return an index (SEARCH, BSEARCH, PARSEDINDEX, ...),
check for a value < 0 to detect an invalid or not-found result.  A return value
of zero (0) as an error indicator is NO LONGER SUPPORTED, because zero is a
valid index in Java.

NOTE 1: In NeXtMidas a string result has access to all of the methods of the
Java String class.  Use the QUERY command to show the available methods. For
instance, we can use the raw Java toUpperCase method as follows:
        nM> res str "This is a string"
        nM> res str2 str.touppercase()
        nM> res str2
        16S: STR2            = THIS IS A STRING

NOTE 2: In NeXtMidas null strings ("") are differentiated from blank spaces
       (" "). Thus, there is no "special string" "<SPACE>" as in the X-Midas
       version of SEDIT. For example,

       nM> res orig "Underscores_to_spaces"
       nM> sedit orig str gsubs "_" " "
       21S: STR =   "Underscores to spaces"

FUNCTIONS:

 APPend  - Append <P1> to <string>

 BETWeen - ZERO-BASED: extracts any possible substring from <string> between
           <P1> and <P2>.  It doesn't matter whether <P1> is "less than" <P2>
           or not; BETWEEN returns what is *between* the two indices,
           inclusive.

           A negative number means that many characters from the other index.

           In general, BETWEEN will give as much of the string as possible if
           at least one index is within the string. If both indices are out of
           range, a blank string will be returned.

           ONE-BASED (/OB) ONLY: A zero means the end of the string.
           The zero is a special case; if the other index is within range,
           it returns the indicated substring; if not, it returns blank.

           NOTE: As of NeXtMidas 3.1.2, the BETWeen function will have the
                 following change to behaviour when using /LEGACY=FALSE. This
                 will be the default behaviour in a later release.

                 Negative numbers will represent an index from the end of
                 the string. So, in zero-based, "-1" is the last character
                 in the string, and in one-based, "0" is the last character.

                 Like RANGE, BETWeen will support leaving off the last index
                 to mean the end of String. See the examples below.


        Examples (Zero-Based):
         * Get the first 7 characters of a string:
            nM> SEDIT "This is a string" str "BETWEEN" 0 6
            str = "This is"

            nM> SEDIT "This is a string" str "BETWEEN" 6 0
            str = "This is"

         * Get the last 6 characters of a string
            nM> res instr "This is a string"
            nM> SEDIT instr str "BETWEEN" ^instr.length -5
            str = "string"

            nM> SEDIT/LEGACY=FALSE instr str "BETWEEN" -1 -6
            str = "string"

         * Get the characters OUT-OF-RANGE of a string
            nM> SEDIT "This is a string" str "BETWEEN" 100 200
            str = ""

         * Get the entire string
            nM> SEDIT "This is a string" str "BETWEEN" 0 200
            str = "This is a string"

         * Get five (5) characters from character 4 to the beginning
           of the string
            nM> SEDIT "This is a string" str "BETWEEN" 4 -100
            str = "This "

         * The INCORRECT way to get the last character of the string
            nM> res instr "This is a string"
            nM> SEDIT instr str "BETWEEN" ^instr.length ^instr.length
            str = ""

         * The CORRECT way to get the last character of the string
            nM> res instr "This is a string"
            nM> SEDIT instr str "BETWEEN" ^instr.length-1 ^instr.length-1
            str = "g"

            nM> SEDIT/LEGACY=FALSE instr str "BETWEEN" -1 -1
            str = "g"

         * Get the first character of the string
            nM> SEDIT instr str "BETWEEN" 0 0
            str = "T"

         * Get the last six characters in a string
            nM> SEDIT/LEGACY=FALSE instr str "BETWEEN" -6
            str = "string"
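
        The /LEGACY=FALSE zero-based rules described above can be modelled
        with a short Python sketch. This is only one reading of the text, not
        the NeXtMidas implementation; the function name `between` is
        hypothetical, and legacy behaviour and /OB are deliberately not
        modelled.

```python
def between(s, p1, p2=None):
    """Rough model of BETWEEN with /LEGACY=FALSE, zero-based indexing."""
    n = len(s)

    def resolve(i):
        # Negative numbers index from the end: -1 is the last character.
        return n + i if i < 0 else i

    a = resolve(p1)
    # Leaving off the last index means "end of string".
    b = n - 1 if p2 is None else resolve(p2)
    lo, hi = min(a, b), max(a, b)      # order of the indices does not matter
    if hi < 0 or lo > n - 1:
        return ""                      # both indices out of range
    lo, hi = max(lo, 0), min(hi, n - 1)  # give as much of the string as possible
    return s[lo:hi + 1]                # inclusive of both indices


text = "This is a string"
print(between(text, 0, 6))     # This is
print(between(text, 6, 0))     # This is
print(between(text, -1, -6))   # string
print(between(text, -6))       # string
```

        The clamping step is what makes BETWEEN error-tolerant: an index past
        either end is pulled back to the nearest valid position as long as
        the other index still lies inside the string.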

        Examples (one-based):
          SEDIT/OB "This is a string" str "BETWEEN" 1 7
          str = "This is"

          SEDIT/OB "This is a string" str "BETWEEN" 7 1
          str = "This is"

          res instr "This is a string"
          SEDIT/OB instr str "BETWEEN" ^instr.length -5
          str = "string"

          SEDIT/OB "This is a string" str "BETWEEN" 100 200
          str = ""

          SEDIT/OB "This is a string" str "BETWEEN" 4 -100
          str = "This"

          res instr "This is a string"
          SEDIT/OB instr str "BETWEEN" ^instr.length ^instr.length
          str = "g"

          SEDIT/OB instr str "BETWEEN" 0 0
          str = ""

 BSEArch - search for the substring <P1> in <string> starting from
           the back and return the index of the start of the substring in
           <label>. Returns -1 if not found.

        Examples:
          SEDIT "This is a string" idx "BSEARCH" "string"
          idx = 11

          SEDIT "This is a string" idx "BSEARCH" "STRING"
          idx = -1

 CLEAN - returns the CLEANed version of <string> in <label>
         (see nxm.sys.lib.Parser)

        Example:
          SEDIT  "This is a string" str CLEAN
          str = THIS,IS,A,STRING

 ELEMent - extracts the nth delimited word where n = <P1> and the delimiter =
          <P2>.  The string is not CLEANED as in PARSE. <P2> can be more than
          1 character in length.

        Examples:
          SEDIT "This/is/a//string" str "ELEM" 2 "/"
          str  = is
          
          SEDIT "Bill and Bob and Jebediah" str "ELEM" 3 " and "
          str  = Jebediah

 ENDS - returns true if <string> ends with <P1>, false otherwise

        Examples:
          nM>SEDIT "This is a string" x "ENDS" "string"
            z: X               = true
          
          nM>SEDIT "This is a string" x "ENDS" "This"
            z: X               = false

 EXT - returns the filename extension  (see ROOT, TAIL, PATH)

 GSUBstitute - substitutes string <P2> for every instance of <P1>

        Examples:
          SEDIT "This is a string" str "GSUBS" "is" "at"
          str  = That at a string

          SEDIT "This is a string" str "GSUBS" " " "xx"
          str  = Thisxxisxxaxxstring

          SEDIT "This is a string" str "GSUBS" "is" ""
          12S: str  = Th  a string

          SEDIT "This is a string" str "GSUBS" "is" " "
          14S: str  = Th    a string

 HEAD - alias for PATH (see ROOT, TAIL, EXT)

 JOIN - Join a string array into a single string, using <P1> as the delimiter.
        Examples:
          nM> SEDIT "A string array to join" word SPLIT " "
          nM> SEDIT word str JOIN " "
          17S: STR = A string array to join

 KEY - Gets the value from a string containing TAG/VALUE pairs, where <P1>
       is the tag name and <P2> is the delimiter (= is DEFAULT).
        Examples:
            nM> SEDIT "NAME=homer CITY=SPRINGFIELD" myname KEY "NAME"


 LENgth  - calculates the length of string through the Nth naturally delimited
           entry where <P1> = N.  Use N=-1 (default) to calculate length of the
           entire string.

        Note: The default behaviour of LENgth will change from zero-based to
              one-based in a future release, since length is counting elements.

        Examples (zero-based):
          SEDIT "This is a string" lstr "LEN" 0
          lstr = 4
          SEDIT "This is a string" lstr "LEN" 2
          lstr = 9
          SEDIT "This is a string" lstr "LEN"
          lstr = 16

        Examples (one-based):
          SEDIT/OB "This is a string" lstr "LEN" 0
          lstr = 16
          SEDIT/OB "This is a string" lstr "LEN" 3
          lstr = 9

        Note that you can, for example, extract the first three words of a
        string by computing the length with LENgth and then applying RANGE:
          RESULT string "This is one powerful utility!"
          SEDIT/OB  string lstr length 3
          SEDIT/OB  string substr RANGE 1 lstr
          substr = This is one
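
        As a rough illustration of what "length through the Nth entry" means,
        the Python sketch below models LENgth. It is not the NeXtMidas
        implementation: the name `length_through`, the choice of space and
        comma as the natural delimiters, and the handling of an N beyond the
        last word are all assumptions.

```python
import re


def length_through(s, n=-1, one_based=False):
    """Length of s up to the end of its Nth word (whole string if N < 0)."""
    if one_based:
        n -= 1                       # map 1..N onto 0..N-1; 0 becomes -1
    if n < 0:
        return len(s)                # default: length of the entire string
    # Assumed natural delimiters: spaces and commas.
    words = list(re.finditer(r"[^ ,]+", s))
    if n >= len(words):
        return len(s)                # assumption: past the last word
    return words[n].end()


text = "This is a string"
print(length_through(text, 0))                  # 4
print(length_through(text, 2))                  # 9
print(length_through(text))                     # 16
print(length_through(text, 3, one_based=True))  # 9

phrase = "This is one powerful utility!"
print(phrase[:length_through(phrase, 3, one_based=True)])  # This is one
```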

 LOCAse - converts alphabetic characters to lower case

        Examples:
          SEDIT "This is a string" str "LOCA"
          str = this is a string

 MASK   - Calls Parser.mask to build an integer bit mask of enabled items from a list.

        Examples:
          SEDIT "AA,BB,CC,DD" mask "MASK" "A|DD"
          L: mask             = 0x9

 NELem  - Count the number of elements in string delimited by <P1>.  The default
          delimiter is a comma (,).  Only an empty string or the RESERVED
          word "NULL" has zero (0) elements.

        Examples:
          SEDIT "" nels "NELEM"
          L: NELS             = 0

          SEDIT "NULL" nels "NELEM"
          L: NELS             = 0

          SEDIT " " nels "NELEM"
          L: NELS             =  1

          SEDIT "," nels "NELEM"
          L: NELS             =  2

          SEDIT "x,y" nels "NELEM"
          L: NELS             =  2

          SEDIT "blank separated string" nels "NELEM" " "
          L: NELS             =  3

          SEDIT "blanks and commas,separated string" nels "NELEM" " "
          L: NELS             =  4

          SEDIT "    " nels "NELEM" " " ! Four blanks with blank as delimiter
          L: NELS             =  5
          
          SEDIT "Bill and Bob and Jebediah" nels "NELEM" " and " ! String delimiter
          L: NELS             =  3
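
        The counting rule above (every delimiter adds one element, and only ""
        and "NULL" count as zero) can be sketched in Python. The function name
        `nelem` is hypothetical and this is a model of the documented results,
        not the NeXtMidas code.

```python
def nelem(s, delim=","):
    """Model of NELEM: number of delimiter-separated elements in s."""
    if s == "" or s == "NULL":
        return 0                    # the only two inputs with zero elements
    # str.split with an explicit delimiter keeps empty fields, so N
    # delimiters always yield N + 1 elements.
    return len(s.split(delim))


print(nelem(""))                                   # 0
print(nelem(","))                                  # 2
print(nelem("blank separated string", " "))        # 3
print(nelem("    ", " "))                          # 5
print(nelem("Bill and Bob and Jebediah", " and ")) # 3
```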

 NFORM  - Format numbers according to given mask.  Besides all of the standard Java
        format strings, Fortran format strings can be applied by surrounding the
        Fortran format string with parentheses like (F12.2). The following
        format keywords are also supported:
          GEN    - X-Midas GENeral format (no exponent if between 1E-3, 1E15).
          VIS    - X-Midas VISual  format (no exponent if between 1E-3, 1E7).
          SCI    - SCIentific notation.
          ENG    - ENGineering notation   (exponent is a multiple of 3).
          MAN    - MANtissa notation      (no exponents).
          DMS    - Deg-Min-Sec angular format.            ddd'mm'ss
          LAT    - Deg-Min-Sec format for latitude.       ddd'mm'ssN
          LON    - Deg-Min-Sec format for longitude.      ddd'mm'ssE
          STD    - STanDard time code format.             [yy]yy:mm:dd::hh:mm:ss
          ACQ    - ACQuisition time code format.          [yy.ddd]:[hh:mm:ss]
          EPOCH  - EPOCH quadwords for time code.         [yy]yy:sec_in_year
          NORAD  - NORAD timecode format.                 yyddd.frac_of_day
          TCR    - TimeCode Reader format.                ddd:hh:mm:ss
          VAX    - VAX time format.                       dd-MMM-yy[yy]:hh:mm:ss
          HMS    - Hour-Min-Sec time format.              hh:mm:ss.ffff
          YMD    - Year-Month-Day format.                 yyyy:mm:dd
          NET    - 32-bit integer formatted as URL.       127.0.0.1

        Examples:
          SEDIT 1 str "NFORM" "#"
          str = 1

          SEDIT 1.12345678 str "NFORM" "#.###"
          str = 1.123

          SEDIT 1.12345678 str "NFORM" "00.000"
          str = 01.123

          SEDIT 1 str NFORM "0.00"
          str = 1.00

          SEDIT 1 str NFORM "(F3.2)"
          str = 1.00

          SEDIT 123.456 str NFORM "DMS"
          STR = 123'27'22

 NSEA   - search for the substring <P1> in <string> and return the index of
          the start of the <P2>-th occurrence of <P1>. If <P1> is not found, or
          if the specified occurrence of <P1> is not found, returns -1.
          (Since 3.5.0)
          
         Examples:
         nM> sedit "1 and 2 and 3 and" index NSEA "and" 3
           L: INDEX         = 14
           
         nM> sedit "1 and 2 and 3 and" index NSEA "and" 10
           L: INDEX         = -1
           
         nM> sedit "1 and 2 and 3 and" index NSEA "missing" 3
           L: INDEX         = -1

 PADx  - PADRight, PADLeft and PADBoth.  Default pad character is a space. You
         can specify an absolute length, or a relative length by prepending a +
         to the number.  When padding BOTH sides with an odd number of pad
         characters, the extra character goes on the right.

        Examples:
          nM> sedit "a string" str PADL 20
          20S: STR             = "            a string"

          nM> sedit "a string" str PADR 20 "."
          20S: STR             = "a string............"

          nM> sedit "a string" str PADB 20
          20S: STR             = "      a string      "
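
        The padding rules can be sketched in Python as follows. The helper
        `pad` and its `mode` argument are hypothetical, and leaving a string
        unchanged when it is already longer than the requested length is an
        assumption; the text does not say what SEDIT does in that case.

```python
def pad(s, spec, mode="R", fill=" "):
    """Model of PADL / PADR / PADB.

    spec is an absolute length ("20" or 20) or a relative one ("+4").
    """
    spec = str(spec)
    if spec.startswith("+"):
        width = len(s) + int(spec[1:])   # relative: grow by this many chars
    else:
        width = int(spec)                # absolute target length
    extra = max(width - len(s), 0)       # assumption: never truncate
    if mode == "L":
        return fill * extra + s          # PADL: pad characters on the left
    if mode == "R":
        return s + fill * extra          # PADR: pad characters on the right
    left = extra // 2                    # PADB: odd leftover goes on the right
    return fill * left + s + fill * (extra - left)


print(repr(pad("a string", 20, "L")))       # '            a string'
print(repr(pad("a string", 20, "R", ".")))  # 'a string............'
print(repr(pad("a string", 20, "B")))       # '      a string      '
print(repr(pad("ab", "+3", "B")))           # ' ab  '
```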

 PARSE - extracts the nth naturally delimited word where n = <P1>.
         <P2> is an optional delimiter, if not set commas or spaces are used.
         <label> will be in all uppercase letters unless /CLEAN=FALSE.

        Examples:
          nM> SEDIT "This is a string" str "PARSE" 2
          str  = IS

          nM> SEDIT "This/is/a/string" str "^func" 4 "/" /clean=f
          nM> res str
           6S: STR             = string

 PARSEALl - parses <string> into naturally delimited words and puts the parsed
            words into a results array named <label>.  <P1> specifies the number
            of elements in the array.  If <P1> is not specified or made 0, the
            number of elements within the array will be defaulted to the number
            of words in the string.  If <P1> is less than the number of words,
            only the first <P1> words will be placed into the array.  If <P1> is
            greater than the number of words, the remaining elements of the
            array are null.  <P2> is an optional delimiter, if not set commas or
            spaces are used. All results will be in all uppercase letters unless
            /CLEAN=FALSE.


        Examples:
          * nM> SEDIT "This is a string" word "PARSEALL" 4
            nM> res word(0)
             4S: WORD(0)       = THIS
            nM> res word(2)
             1S: WORD(2)       = A

            nM> res word(4)
            ERROR: java.lang.IllegalArgumentException: KeyObject.getIndexed():
            Error in accessing element 4 from [Ljava.lang.String;@33b121:
            java.lang.ArrayIndexOutOfBoundsException

          * nM> SEDIT "This is a string" word "PARSEALL" 4 /clean=f
            nM> res word(0)
            4S: WORD(0)       = This

 PARSEARgs - parses the arguments of a command.  The returned object is of type
             nxm.sys.lib.Args, and as such has access to the Args class methods.
        Examples:
          nM> sedit "PATH,FUNC=SET,SYS" args PARSEARGS
          nM> res L:size args.size
          nM> res size
           L: SIZE            = 2

 PARSEDIndex - parses <string> into naturally delimited words and returns the
               index of <P1>.  Returns -1 if not found.

        Examples (ZERO-BASED):
          nM> SEDIT  "This is a string" idx "PARSEDI" is
          idx  = 1

          nM> SEDIT  "This is a string" idx "PARSEDI" "not"
          idx  = -1

        Examples (ONE-BASED):
          nM> SEDIT/OB "This is a string" idx "PARSEDI" is
          idx  = 2

          nM> SEDIT/OB "This is a string" idx "PARSEDI" "not"
          idx  = -1

 PATH  - returns the path of a filename (see ROOT, TAIL, EXT)

 PREPend  - Prepend <P1> to <string>

 RANGE - extracts the substring from <string> between <P1> and <P2>.

         For either <P1> or <P2>, a negative number means that many characters
         from the end.  The indices <P1> and <P2> are order-DEPENDENT; that is,
         <P1> must refer to a position in the string that is equal to or to the
         left of the position specified by <P2>.

         If either index is out of range, a blank string is returned. RANGE is
         therefore useful for catching errors:  if it cannot give the entire
         range of characters, it won't give any. The BETWEEN operator is
         error-tolerant.

         Note: As of NeXtMidas 3.1.2, the RANGE function will have the
               following change to behaviour when using /LEGACY=FALSE. This
               will be the default behaviour in a later release.

               Negative numbers will count as an index from the end of the
               string. See negative number examples below.

               Leaving off the second index value will equate to a value of
               zero, NOT the end of the string.

        Examples(Zero-Based):
          nM> SEDIT "This is a string" str "RANGE" 0 6
          str = "This is"

          nM> SEDIT "This is a string" str "RANGE" 6 0
          str = ""

          nM> res inStr "This is a string"
          nM> SEDIT inStr str RANGE -5 ^inStr.length-1
          str = "string"

          nM> SEDIT/LEGACY=FALSE "This is a string" str RANGE -6 -1
          str = "string"

          nM> SEDIT "This is a string" str "RANGE" 100 200
          str = ""

          nM> SEDIT "This is a string" str "RANGE" 4 -100
          str = ""

          nM> SEDIT "This is a string" str "RANGE" -1 -2
          str = ""

          nM> SEDIT "This is a string" str "RANGE" -2 -1
          str = "in"

          nM> SEDIT/LEGACY=FALSE "This is a string" str "RANGE" -2 -1
          str = "ng"

          nM> res inStr "This is a string"
          nM> SEDIT inStr str RANGE  ^inStr.length ^inStr.length
          str = ""

          nM> SEDIT inStr str RANGE  -1 -1
          str = "n"

          nM> SEDIT/LEGACY=FALSE inStr str RANGE  -1 -1
          str = "g"
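
        For comparison with the error-tolerant BETWEEN, the strict behaviour
        of RANGE under /LEGACY=FALSE (zero-based) can be sketched in Python.
        This is a model of the rules as described here, not the NeXtMidas
        implementation; the name `sedit_range` is hypothetical, and legacy
        behaviour and /OB are not modelled.

```python
def sedit_range(s, p1, p2=0):
    """Rough model of RANGE with /LEGACY=FALSE, zero-based indexing."""
    n = len(s)
    start = n + p1 if p1 < 0 else p1   # negative: index from the end
    stop = n + p2 if p2 < 0 else p2    # an omitted <P2> equates to 0
    # Order-dependent and strict: both indices must be in range and
    # start must not lie to the right of stop.
    if not (0 <= start <= stop <= n - 1):
        return ""
    return s[start:stop + 1]           # inclusive of both indices


text = "This is a string"
print(repr(sedit_range(text, 0, 6)))      # 'This is'
print(repr(sedit_range(text, 6, 0)))      # ''
print(repr(sedit_range(text, -6, -1)))    # 'string'
print(repr(sedit_range(text, -2, -1)))    # 'ng'
print(repr(sedit_range(text, 100, 200)))  # ''
```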

        Examples(One-Based):
          nM> SEDIT/OB "This is a string" str "RANGE" 1 7
          str = "This is"

          nM> SEDIT/OB "This is a string" str "RANGE" 7 1
          str = ""

          nM> res inStr "This is a string"
          nM> SEDIT/OB inStr str RANGE -5 ^inStr.length
          str = "string"

          nM> SEDIT/OB "This is a string" str "RANGE" 100 200
          str = ""

          nM> SEDIT/OB "This is a string" str "RANGE" 4 -100
          str = ""

          nM> SEDIT/OB "This is a string" str "RANGE" -1 -2
          str = ""

          nM> SEDIT/OB "This is a string" str "RANGE" -2 -1
          str = "in"

          nM> res inStr "This is a string"
          nM> SEDIT/OB inStr str RANGE  ^inStr.length ^inStr.length
          str = "g"

 ROOT    - returns the filename without extension (see TAIL, PATH, EXT)

 SEARch  - Case-sensitive search for the substring <P1> in <string>; returns
           the index in <label>. Returns -1 if not found.

        Examples (Zero-Based):
          nM> SEDIT "This is a string" idx SEARCH "string"
          idx = 10

          nM> SEDIT "This is a string" idx SEARCH "STRING"
          idx = -1

        Examples (One-Based):
          nM> SEDIT/OB "This is a string" idx SEARCH "string"
          idx = 11

          nM> SEDIT/OB "This is a string" idx SEARCH "STRING"
          idx = -1
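
        The two index conventions, and the reason callers should test for
        < 0 rather than 0, can be sketched in Python. The function `search`
        and its `one_based` flag are hypothetical stand-ins for SEARCH and
        /OB; this is a model, not the NeXtMidas implementation.

```python
def search(s, sub, one_based=False):
    """Model of SEARCH: case-sensitive find, -1 when not found."""
    idx = s.find(sub)        # str.find returns -1 when sub is absent
    if idx < 0:
        return -1            # not found, in either indexing mode
    return idx + 1 if one_based else idx


text = "This is a string"
print(search(text, "string"))                  # 10
print(search(text, "string", one_based=True))  # 11
print(search(text, "STRING"))                  # -1
print(search(text, "This"))                    # 0 (a valid hit, not an error)
```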

 SELect - Find the index of the first token that starts with <P1>.
          See nxm.sys.lib.Parser for documentation on find method.

          Note: As of NeXtMidas 3.1.2, the SELect function will have the
                following change to behaviour when using /LEGACY=FALSE. This
                will be the default behaviour in a later release.

                The index returned will always be one-based, and using the /OB
                switch will have no effect.

        Example (/LEGACY=FALSE):
          nM> SEDIT/LEGACY=FALSE "This,is,a,string" idx SELECT "str"
          idx = 4

        Example (Zero-Based):
          nM> SEDIT "This,is,a,string" idx SELECT "str"
          idx = 3

        Example (One-Based):
          nM> SEDIT/OB "This,is,a,string" idx SELECT "str"
          idx = 4

 SPLIT - Split a string into an array (0 to N-1) of N tokens.  Unlike
                PARSEALL, the delimiter is specified.
        Examples:
          nM> SEDIT "A string to split" word SPLIT " "
          word(0) = A
          word(1) = string
          word(2) = to
          word(3) = split

          nM> SEDIT "A,string,to,split" word SPLIT ","
          word(0) = A
          word(1) = string
          word(2) = to
          word(3) = split

 STARTS - returns true if <string> starts with <P1>, false otherwise

        Examples:
          nM>SEDIT "This is a string" x "STARTS" "This"
            z: X               = true
          
          nM>SEDIT "This is a string" x "STARTS" "string"
            z: X               = false

 STRIM - trims all leading and trailing spaces off of <string>.
        Examples:
          nM> SEDIT "    stranded   " str STRIM
          str  = stranded

 SUBstitute - substitutes string <P2> for the first instance of <P1> only
        Examples:
          nM> SEDIT "This is a string" str SUBS "is" "at"
          str  = That is a string

          nM> SEDIT "This is a string" str SUBS "is" " "
          15S: STR = Th  is a string ! Note length

          nM> SEDIT "This is a string" str SUBS "is" ""
          14S: STR = Th is a string ! Note length
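
        The difference between SUBstitute (first instance only) and
        GSUBstitute (every instance) corresponds to the optional count
        argument of Python's str.replace. The sketch below is only a model of
        the documented results; the names `subs` and `gsubs` are hypothetical.

```python
def subs(s, old, new):
    """Model of SUBstitute: replace the first instance only."""
    return s.replace(old, new, 1)


def gsubs(s, old, new):
    """Model of GSUBstitute: replace every instance."""
    return s.replace(old, new)


text = "This is a string"
print(subs(text, "is", "at"))       # That is a string
print(gsubs(text, "is", "at"))      # That at a string
print(len(subs(text, "is", " ")))   # 15
print(len(subs(text, "is", "")))    # 14
print(len(gsubs(text, "is", "")))   # 12
```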

 TAIL - returns the filename and extension  (see ROOT, PATH, EXT)

 TRIM - trims off string before <P1> and after <P2>. <label> will not contain
        <P1> or <P2>.

        Examples:
          nM> SEDIT "ARRAY(FRAME;INDEX)+5" str TRIM "(" ")"
          str  = FRAME;INDEX
          nM> SEDIT "FILENAME.EXT" str trim "."
          str  = EXT
          nM> SEDIT "FILENAME.EXT" str trim ,, "."
          str = FILENAME

 UPCAse - converts alphabetic characters to upper case.  One may also use
          the Java method directly, for example:
          nM> res str "my string"
          nM> res str2 str.toUpperCase()
          9S: STR2            = MY STRING

 WORD  - extracts the naturally delimited word containing <P1>

        Examples:
          nM> SEDIT "This is a string" str WORD "ri"
          str  = string

Performing Multiple Functions:
=============================
  To perform multiple operations per line simply chain the operations
  together (entering just ONE output result name):
    nM> sedit "this is a string" out GSUBS "is" "at" GSUBS "at" "is"
    nM> res out
    16S: OUT             = this is a string

  When performing multiple operations per line the operations
  must "make sense".  For example, the following will generate an exception:
    nM> sedit "this is a string" out LEN GSUBS "is" "at"
    ERROR: Unable to convert GSUBS to type L
  because SEDIT expects the parameter after LEN to be an integer.
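
  Conceptually, chaining feeds the output of each operation into the next
  one, left to right. A minimal Python sketch of that idea follows; the
  helpers `gsubs` and `chain` are hypothetical and say nothing about how
  SEDIT parses its command line.

```python
from functools import reduce


def gsubs(old, new):
    """Build a one-argument operation that replaces every old with new."""
    return lambda s: s.replace(old, new)


def chain(s, *ops):
    """Apply each operation in turn, passing the result along."""
    return reduce(lambda acc, op: op(acc), ops, s)


out = chain("this is a string", gsubs("is", "at"), gsubs("at", "is"))
print(out)        # this is a string
print(len(out))   # 16
```

  The intermediate value here is "that at a string"; the second operation
  turns every "at" back into "is", which is why the final result matches
  the input.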


SWITCHES:
  /CLEAN   - [DEFAULT=TRUE] Cleans the string before performing the functions:
               PARSE, PARSEALL, PARSEDINDEX
  /DEBUG   - Turn on debug output
  /FORCE   - Sets <OSTR> in a readonly table. (Since 2.9.3)
  /LEGACY  - Use legacy behaviour of BETWEEN, LENGTH, RANGE and SELECT
             functions. [DEF=TRUE], but will change to [DEF=FALSE] in a
             future version. (Since 3.1.2)
  /OB      - Same as /ONEBASE.
  /ONEBASE - Force 1-based indexing. Affects the BETWEEN, BSEARCH,
              PARSEDINDEX, LENGTH, RANGE, SEARCH and SELECT functions.
  /SEDITLEGACY - Same as /LEGACY. (Since 3.1.2)
  /STRIP    - Remove the quotes from the string
  /ZEROBASE - DEPRECATED -- This is now the default.  Use /OB to override.

SEE ALSO: Query, nxm.sys.test.test_sedit.mm, nxm.sys.lib.Format,
          nxm.sys.lib.Parser