Forum Discussion

nshelton85
Mar 09, 2020

Basic findstr question

So when using findstr, for instance set user [ findstr [HTTP::payload] "user=" 5 & ], I know that the 5 & means to return all data after "user=". If my user ID is 5555555555@domain.com, but has additional data after the domain.com, is there a way to tell the findstr command not to return the data after the domain? Is there a good resource that shows all of the characters/commands that can be used with findstr to return data in different ways? Sorry for the very basic questions, but I don't have a strong background in programming.
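For readers following along, here is a rough Python emulation of what the skip count and terminator arguments do, as I understand the iRules findstr documentation: the skip count (5) steps past the five characters of "user=", and a terminator string such as "&" stops the result just before the first "&". The function below is only a sketch and not F5 code; the payload value is hypothetical, and the behaviour when the terminator is missing (return to the end of the string) is an assumption.

```python
def findstr(text, search, skip=0, terminator=None):
    """Rough emulation of iRules findstr (an assumption, not F5's implementation)."""
    start = text.find(search)
    if start == -1:
        return ""                      # search string not found
    rest = text[start + skip:]         # skip past e.g. the 5 chars of "user="
    if terminator is None:
        return rest                    # no terminator: return to end of string
    if isinstance(terminator, int):
        return rest[:terminator]       # numeric terminator: treat as a length
    end = rest.find(terminator)
    return rest if end == -1 else rest[:end]


# Hypothetical payload, for illustration only.
payload = "user=5555555555@domain.com&pass=hypothetical"

print(findstr(payload, "user=", 5))       # everything after "user="
print(findstr(payload, "user=", 5, "&"))  # stops before the first "&"
```

If the data after the domain always starts with the same character, using that character as the terminator may be enough without any regex.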

  • findstr is a simple string search function, and you can't really do anything complex with it.

    For more complex string extraction, you probably need a regular expression:

    iRules 101 - #10 - Regular Expressions

    However, be aware that using a regex is computationally expensive.

    If your URL is /test?user=5555555@my.domain.com&pass=..... and you apply the regex

    user=(.*@([\w*\.]*))(\&|\#|$)

    then the Group 1 match is 5555555@my.domain.com

    You can be more selective about what you terminate on.

    To play with regex, go to regex101.com
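To see the groups outside regex101, the same pattern can be tried in Python. This is only a sketch: Python's re module is standing in for the iRules regex engine here, which is an assumption, and the two engines may differ in details.

```python
import re

# The pattern from the reply above, unchanged.
pattern = re.compile(r"user=(.*@([\w*\.]*))(\&|\#|$)")

# The example URL from the reply, including its elided password.
url = "/test?user=5555555@my.domain.com&pass=....."

match = pattern.search(url)
print(match.group(1))  # 5555555@my.domain.com  (the whole user ID)
print(match.group(2))  # my.domain.com          (the domain part only)
print(match.group(3))  # &                      (what terminated the match)
```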

  • Thanks, but this is all Greek to me, unfortunately. I checked the expression on the site you linked, and the first part makes sense. I don't entirely understand what the last part, (\&|\#|$), is doing.

     

    /user=(.*@([\w*\.]*))(\&|\#|$)/gm

    user= matches the characters user= literally (case sensitive)

    1st Capturing Group (.*@([\w*\.]*))
      .* matches any character (except for line terminators)
        Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
      @ matches the character @ literally (case sensitive)

      2nd Capturing Group ([\w*\.]*)
        Match a single character present in the list below [\w*\.]*
          Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
        \w matches any word character (equal to [a-zA-Z0-9_])
        * matches the character * literally (case sensitive)
        \. matches the character . literally (case sensitive)

    3rd Capturing Group (\&|\#|$)
      1st Alternative \&
        \& matches the character & literally (case sensitive)
      2nd Alternative \#
        \# matches the character # literally (case sensitive)
      3rd Alternative $
        $ asserts position at the end of a line

    Global pattern flags
      g modifier: global. All matches (don't return after first match)
      m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

     

    • gersbah

      I think that last part is meant to "anchor" the expression. The idea would be to end the expression with either "&" (which means the next parameter follows), "#" (a URL fragment) or $ (end of the line - no other parameters).

      However, in practice I don't believe it does a whole lot, because the character class in the second capture group already limits the match to word characters, "*" and "." (which "&" isn't). It also doesn't really prevent the first ".*" from overmatching, which would come into play if you have another "@" anywhere else in the query string after the user parameter.

      There's probably a million ways to do this and a lot of it depends on how robust you want or need to make it. What if there are multiple user parameters? What about a user parameter with no "@" or no value at all? Can there be special characters in the user name? If the domain is always the same you could even match that literally.

      I would probably start with something like this:

      user=([\w]+@[\w.]+)

      But as I said, a million ways...
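The overmatching concern can be checked with a short Python sketch. The URL below is hypothetical (a second parameter containing another "@"), and Python's re module is assumed to behave like the engine in question for these simple patterns.

```python
import re

original = re.compile(r"user=(.*@([\w*\.]*))(\&|\#|$)")
suggested = re.compile(r"user=([\w]+@[\w.]+)")

# Hypothetical URL with a second "@" later in the query string.
url = "/test?user=5555555@my.domain.com&mail=other@example.org"

# The greedy ".*" runs to the LAST "@", so group 1 swallows the next parameter.
print(original.search(url).group(1))
# 5555555@my.domain.com&mail=other@example.org

# Restricting both sides to [\w] / [\w.] stops at the "&".
print(suggested.search(url).group(1))
# 5555555@my.domain.com
```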

      • Simon_Blakely
        Employee

        As you say, a million ways to die in the regex engine ...

         

        To be honest, I was trying to compensate for the requirement from the OP:

         

        > If my user ID is 5555555555@domain.com, but has additional data after the domain.com, is there a way to tell the findstr command not to return the data after the domain?

         

        I was guessing what data might come after the domain if there wasn't another &.

         

        In a correctly specified URI, once you are past the path and into the query parameters, the only things that can validly follow a value are a query parameter separator (& or ;), the fragment specifier (#), or the end of the line (and I missed the ;).

         

        Special characters should be %-encoded in the URL at this stage, but I haven't accounted for %-encoding in the domain portion, because there probably shouldn't be any (apart from the dots, the only non-alphanumeric character allowed in an FQDN is a dash, -, which probably needs to be included as well).

         

        It gets complicated, real quick ...
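One possible way to fold those last two points into the pattern, sketched in Python: add "-" to the domain character class and accept ";" alongside "&" and "#" as a terminator. This is an illustration under the assumptions stated in the thread, not a vetted URL parser; the test URLs are hypothetical and %-encoding is still not handled.

```python
import re

# Sketch: user part is word characters, domain allows letters, digits,
# "_", "." and "-", and the value must be followed by "&", ";", "#" or end.
pattern = re.compile(r"user=(\w+@[\w.-]+)(?:[&;#]|$)")

# Hypothetical URLs covering each terminator.
urls = [
    "/test?user=5555555@my-domain.example.com;pass=x",
    "/test?user=5555555@my.domain.com&pass=x",
    "/test?user=5555555@my.domain.com#section",
    "/test?user=5555555@my.domain.com",
]

results = [pattern.search(u).group(1) for u in urls]
for r in results:
    print(r)
```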