Basic findstr question

nshelton85
Altostratus

So when using findstr, for instance set user [ findstr [HTTP::payload] "user=" 5 & ], I know that the 5 & means to return all data after "user=". If my user ID is 5555555555@domain.com, but has additional data after the domain.com, is there a way to tell the findstr command not to return the data after the domain? Is there a good resource that shows all of the characters/commands that can be used with findstr to return data in different ways? Sorry for the very basic questions, but I don't have a strong background in programming.

4 REPLIES

Simon_Blakely
F5 Employee

findstr is a simple string search function, and you can't really do anything complex with it.
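To make the arguments concrete, here is a rough Python model of how I understand findstr's skip count and terminator to behave. The function name and signature are my own for illustration, not an F5 API, and the real command may differ in edge cases.

```python
from typing import Optional, Union

def findstr(text: str, search: str, skip: int = 0,
            terminator: Optional[Union[str, int]] = None) -> str:
    """Hypothetical Python model of iRules findstr (not an F5 API)."""
    start = text.find(search)
    if start == -1:
        return ""                     # search string absent: empty result
    rest = text[start + skip:]        # skip is counted from the start of the match
    if terminator is None:
        return rest                   # no terminator: everything to the end
    if isinstance(terminator, int):
        return rest[:terminator]      # numeric terminator: treated as a length
    end = rest.find(terminator)       # string terminator: stop just before it
    return rest if end == -1 else rest[:end]

payload = "user=5555555555@domain.com&pass=....."
print(findstr(payload, "user=", 5, "&"))   # 5555555555@domain.com
```

Under this model, the 5 steps past the five characters of "user=" and the & ends the result at the next "&". Anything that follows the domain without an "&" in between would still be returned.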

 

For more complex string extraction, you probably need a regular expression:

 

iRules 101 - #10 - Regular Expressions

 

However, be aware that a regex is computationally expensive compared with findstr and the other simple string commands, so use it only where you need it.

 

If your URL is /test?user=5555555@my.domain.com&pass=....., you could use:

user=(.*@([\w*\.]*))(\&|\#|$)

With that regex, the Group 1 match is 5555555@my.domain.com.
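If it helps to see it run, here is a minimal sketch using Python's re module as a stand-in for the iRule regexp command (an assumption for illustration; Tcl's regex dialect can differ in details). The variable names are made up.

```python
import re

url = "/test?user=5555555@my.domain.com&pass=....."

# Same pattern as above; group 1 is the whole user ID, group 2 the domain part.
pattern = re.compile(r"user=(.*@([\w*\.]*))(\&|\#|$)")

m = pattern.search(url)
user = m.group(1) if m else None
domain = m.group(2) if m else None

print(user)    # 5555555@my.domain.com
print(domain)  # my.domain.com
```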

 

You can be more selective about what you terminate on.

 

To play with regex, go to regex101.com

nshelton85
Altostratus

Thanks, but this is all Greek to me, unfortunately. I checked the expression on the site you linked, and the first part makes sense. I don't entirely understand what the last part, (\&|\#|$), is doing. Here is the explanation the site gave me:

 

/user=(.*@([\w*\.]*))(\&|\#|$)/gm

user= matches the characters user= literally (case sensitive)

 

1st Capturing Group (.*@([\w*\.]*))

 

.* matches any character (except for line terminators)

Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

@ matches the character @ literally (case sensitive)

 

2nd Capturing Group ([\w*\.]*)

 

Match a single character present in the list below [\w*\.]*

Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

\w matches any word character (equal to [a-zA-Z0-9_])

* matches the character * literally (case sensitive)

\. matches the character . literally (case sensitive)

 

3rd Capturing Group (\&|\#|$)

 

1st Alternative \&

\& matches the character & literally (case sensitive)

 

2nd Alternative \#

\# matches the character # literally (case sensitive)

 

3rd Alternative $

$ asserts position at the end of a line

 

Global pattern flags

g modifier: global. All matches (don't return after first match)

m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

 

I think that last part is meant to "anchor" the expression. The idea would be to end the expression with either "&" (which means the next parameter follows), "#" (a URL fragment) or $ (end of the line - no other parameters).

 

However, practically I don't believe it does a whole lot, because the character class in the second capture group already limits the domain match to word characters, "*" and "." (none of which is "&"). It also doesn't prevent the first ".*" from overmatching, which comes into play if there is another "@" anywhere in the query string after the user parameter.
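The overmatching is easy to reproduce. Here is a small sketch in Python's re (used only as an illustration; the query string is made up, and the lazy ".*?" variant is my own suggestion rather than something from the original pattern). The regex engine behind iRules may treat mixed greedy and lazy quantifiers differently, so test it there before relying on it.

```python
import re

# Made-up query string with a second "@" after the user parameter.
query = "user=5555555@my.domain.com&email=other@example.org&x=1"

# Original pattern: the greedy ".*" runs on to the LAST "@" it can use.
greedy = re.search(r"user=(.*@([\w*\.]*))(\&|\#|$)", query)

# Variant with a lazy ".*?": stops at the FIRST "@" after "user=".
lazy = re.search(r"user=(.*?@([\w*\.]*))(\&|\#|$)", query)

greedy_user = greedy.group(1)
lazy_user = lazy.group(1)

print(greedy_user)  # 5555555@my.domain.com&email=other@example.org
print(lazy_user)    # 5555555@my.domain.com
```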

 

There's probably a million ways to do this and a lot of it depends on how robust you want or need to make it. What if there are multiple user parameters? What about a user parameter with no "@" or no value at all? Can there be special characters in the user name? If the domain is always the same you could even match that literally.

 

I would probably start with something like this:

user=([\w]+@[\w.]+)

But as I said, a million ways...
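For what it's worth, here is how that starting point behaves on a few inputs, sketched in Python's re as a stand-in for the iRule regexp command. The helper name extract_user and the sample inputs are hypothetical.

```python
import re
from typing import Optional

# The starting-point pattern from above: word characters, "@", then word characters and dots.
USER_RE = re.compile(r"user=([\w]+@[\w.]+)")

def extract_user(query: str) -> Optional[str]:
    """Return the user ID from the query string, or None if the pattern does not match."""
    m = USER_RE.search(query)
    return m.group(1) if m else None

print(extract_user("/test?user=5555555555@domain.com&pass=....."))  # 5555555555@domain.com
print(extract_user("/test?user=bob&pass=....."))                    # None (no "@")
print(extract_user("/test?user=first.last@domain.com"))             # None ("." not allowed before "@")
```

The last case is the "special characters in the user name" question: if those are possible, the first character class has to be widened.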

As you say, a million ways to die in the regex engine ...

 

To be honest, I was trying to accommodate this requirement from the OP:

 

> If my user ID is 5555555555@domain.com, but has additional data after the domain.com, is there a way to tell the findstr command not to return the data after the domain?

 

I was guessing at what data might follow the domain if it wasn't another "&".

 

In a correctly specified URI, once you are past the path and into the query parameters, the only valid element separators are the query parameter separators (& and ;), the fragment specifier (#), or the end of the line. (I missed the ; in my regex.)

 

Special characters should be %-encoded in the URL at this stage, but I haven't accounted for %-encoding in the domain portion, because there probably shouldn't be any. Apart from the dots between labels, the only non-alphanumeric character allowed in an FQDN is a hyphen (-), which probably needs to be included as well.
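Putting those points together, one possible revision is sketched below in Python's re (a stand-in for the iRule regexp command, so verify it against the regex engine you actually use). The pattern, the helper name extract_user and the sample inputs are my own assumptions. It allows a hyphen in the domain and only accepts a match that is followed by &, ;, # or the end of the string.

```python
import re
from typing import Optional

# User part: anything except "@" and the separators; domain: word chars, dots, hyphens.
# The lookahead requires a separator (&, ;, #) or end of string right after the domain.
USER_RE = re.compile(r"user=([^@&;#]+@[\w.-]+)(?=[&;#]|$)")

def extract_user(query: str) -> Optional[str]:
    """Return the user ID, or None when the value is not cleanly terminated."""
    m = USER_RE.search(query)
    return m.group(1) if m else None

print(extract_user("/test?user=5555555555@my-domain.com;pass=....."))  # 5555555555@my-domain.com
print(extract_user("/test?user=5555555555@domain.com#top"))            # 5555555555@domain.com
print(extract_user("/test?user=5555555555@domain.com"))                # 5555555555@domain.com
print(extract_user("/test?user=5555555555@domain.com%20extra"))        # None
```

Rejecting the last input rather than returning a truncated value is a design choice; if you would rather keep the part before the unexpected character, drop the lookahead.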

 

It gets complicated, real quick ...