Discussion:
Regex Split on forward slash?
Peter Lambrechtsen
2013-12-17 03:18:29 UTC
I'm trying to do a global split on a forward slash in FreeRadius 2.1.12.

Essentially I am trying to do a split on the ADSL Circuit ID:

ADSL-Agent-Circuit-Id = "POLT01 eth 1/1/02/05/4/14/1:10"

I want to grab the value of the 6th slash-delimited field, "14", and check
that it is 14.

I would have hoped something like this regex would work:

if ("%{ADSL-Agent-Circuit-Id}" =~ /(?<=\/)([\w-]*)(?=\/)/) {
    update request {
        Tmp-String-1 = "%{0}"
        Tmp-String-2 = "%{1}"
        Tmp-String-3 = "%{2}"
        Tmp-String-4 = "%{3}"
        Tmp-String-5 = "%{4}"
        Tmp-String-6 = "%{5}"
        Tmp-String-7 = "%{6}"
    }
}

I hoped it would put each match into its own variable, or that /\//g would do the same.

But neither option works.

Any suggestions on how to make it work? Otherwise I think taking it into Perl
may be the best option.
Arran Cudbard-Bell
2013-12-17 11:41:50 UTC
Post by Peter Lambrechtsen
I'm trying to do a global split on a forward slash in FreeRadius 2.1.12.
The g (global) flag is not currently supported in any version of FreeRADIUS. You are
also limited to 8 capture groups.
Post by Peter Lambrechtsen
ADSL-Agent-Circuit-Id = "POLT01 eth 1/1/02/05/4/14/1:10"
And trying to grab the value in the 6th split being "14" and check it's a 14.
if ("%{ADSL-Agent-Circuit-Id}" =~ /(?<=\/)([\w-]*)(?=\/)/ ) {
update request {
Tmp-String-1 = "%{0}"
Tmp-String-2 = "%{1}"
Tmp-String-3 = "%{2}"
Tmp-String-4 = "%{3}"
Tmp-String-5 = "%{4}"
Tmp-String-6 = "%{5}"
Tmp-String-7 = "%{6}"
}
}
And it would put each match into a Variable, or using /\//g too.
But neither option works.
No, g won't work. You probably need \\/ because of the multiple levels of escaping...
Post by Peter Lambrechtsen
Any suggestions on how to make it work, as I think taking it into Perl may be the best option.
Sure, but there will be a significant performance hit, and as (presumably) your company is
an ISP, that may be an issue.

Additionally, the default regular expression flavour is POSIX extended regular expressions,
though PCRE is supported in 3.x.x if you have the PCRE library available at build time.

-Arran

Arran Cudbard-Bell <***@freeradius.org>
FreeRADIUS Development Team

FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2
Phil Mayers
2013-12-17 12:19:22 UTC
Post by Arran Cudbard-Bell
No, g won't work. You probably need \\/ because of the multiple levels of escaping...
Really? Isn't the delimiter handled at top-level of the unlang parser?

I should resurrect my "use any delimiter / more flags / remove
double-escaping craziness" regexp patch:

if (Attr =~ !no\tleaning/toothpicks!giu) {
...
}
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
Alan DeKok
2013-12-17 15:05:18 UTC
The issue with the parser is that it's terrible. The "unlang" parser is just the normal config file parser. The backslashes get "eaten" by the config parser. The unlang code then sees an un-escaped string. So if you want unlang to see an escaped string, you have to escape it twice.
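
To make the two layers concrete, here is a minimal Python sketch. It is not FreeRADIUS code: `config_unescape` is a hypothetical stand-in for the config parser pass that consumes one level of backslashes (the real parser also interprets sequences such as \t, which this ignores), and Python's `re` module stands in for the server's regex library.

```python
import re

def config_unescape(text):
    """Hypothetical stand-in for the config parser: drop one level of backslashes."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

# What is typed in the config file (raw strings, so Python adds no escaping).
single_escaped = r"\w+"
double_escaped = r"\\w+"

# What the regex engine would receive after the config parser pass.
seen_single = config_unescape(single_escaped)  # the two characters w+
seen_double = config_unescape(double_escaped)  # backslash, w, +

# Only the double-escaped form still means "one or more word characters".
print(repr(seen_single), bool(re.fullmatch(seen_single, "POLT01")))
print(repr(seen_double), bool(re.fullmatch(seen_double, "POLT01")))
```

Under these assumptions, the single-escaped pattern reaches the regex engine without its backslash and no longer matches "POLT01", while the double-escaped one does.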

Fixing it properly is hard, as all of the existing configurations use the double escapes. So we can't just make single escapes the norm. It's something we'd have to change in a major release.

It may be better to allow non-slash characters as regex delimiters. That means adding new functionality, which doesn't break existing configs. It just means changing the regex parser in src/main/parser.c. That should be simple.
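
As a rough illustration of the idea (not the code in src/main/parser.c, and not an existing FreeRADIUS feature), a regex literal with a caller-chosen delimiter could be split as below. The function name `split_regex_literal` and the way flags are returned are hypothetical.

```python
def split_regex_literal(literal):
    """Split e.g. '!a/b!i' into (pattern, flags), using the first character as delimiter.

    Hypothetical sketch: a backslash escapes the next character, so an escaped
    delimiter does not end the pattern.
    """
    if len(literal) < 2:
        raise ValueError("regex literal too short")
    delim = literal[0]
    i = 1
    while i < len(literal):
        if literal[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if literal[i] == delim:
            return literal[1:i], literal[i + 1:]
        i += 1
    raise ValueError("unterminated regex literal")

# With '!' as the delimiter, slashes in the pattern need no escaping.
pattern, flags = split_regex_literal("!1/1/02!i")
print(pattern, flags)

# With '/' as the delimiter, each slash in the pattern must be escaped.
pattern2, flags2 = split_regex_literal(r"/1\/1\/02/i")
print(pattern2, flags2)
```

Because existing configs always start a regex with /, a scheme like this could leave their behaviour untouched and apply new escaping rules only to other delimiters.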

Phil Mayers
2013-12-17 17:03:34 UTC
Post by Alan DeKok
The issue with the parser is that it's terrible. The "unlang" parser
is just the normal config file parser. The backslashes get "eaten" by
Ah, I'd forgotten that first layer.
Post by Alan DeKok
It may be better to allow non-slash characters as regex delimiters.
That means adding new functionality, which doesn't break existing
configs. It just means changing the regex parser in
src/main/parser.c. That should be simple.
That's what I was thinking; any delimiter *other* than / can have new
behaviour.
Arran Cudbard-Bell
2013-12-17 17:06:07 UTC
Post by Alan DeKok
The issue with the parser is that it's terrible. The "unlang" parser is just the normal config file parser. The backslashes get "eaten" by the config parser. The unlang code then sees an un-escaped string. So if you want unlang to see an escaped string, you have to escape it twice.
Fixing it properly is hard, as all of the existing configurations use the double escapes. So we can't just make single escapes the norm. It's something we'd have to change in a major release.
It may be better to allow non-slash characters as regex delimiters. That means adding new functionality, which doesn't break existing configs. It just means changing the regex parser in src/main/parser.c. That should be simple.
Yes, that'd help. Then / wouldn't need to be escaped and there'd be none of the confusion over double escapes.

Really, the parser should be modified to only process escape sequences between double quotes, with a special case for the ends of lines, but that still breaks everyone.



Arran Cudbard-Bell
2013-12-17 16:20:29 UTC
Post by Phil Mayers
Post by Arran Cudbard-Bell
No, g won't work. You probably need \\/ because of the multiple levels of escaping...
Really? Isn't the delimiter handled at top-level of the unlang parser?
The regex delimiter is hardcoded to be '/', and we only support the case-insensitive flag.
I'm fairly sure forward slashes need to be double escaped.
Post by Phil Mayers
if (Attr =~ !no\tleaning/toothpicks!giu) {
:)

Phil Mayers
2013-12-17 11:46:52 UTC
Post by Peter Lambrechtsen
ADSL-Agent-Circuit-Id = "POLT01 eth 1/1/02/05/4/14/1:10"
And trying to grab the value in the 6th split being "14" and check it's
a 14.
/(?<=\/)([\w-]*)(?=\/)/
Few things:

1. This looks like a PCRE regexp; are you sure you're compiling w/
PCRE support?

2. The "\w" will need a double backslash, i.e. \\w - the double backslash
makes the unlang parser emit a single backslash to the regexp input. The
regexp parser in unlang is a bit funky in this regard I'm afraid :o(

3. However, you should only need to single-escape the regexp delimiter
(i.e. once, for the unlang parser).

4. That regexp doesn't work for me when I try it from the python repl;
how are you sure it's right?

5. It's obvious from your text that you're assuming the regexps work
differently than they do - they don't do a "searchall" operation or
similar. It's a single match, and each %{n} refers to a single capture
group from a single match.

You'll need to write out a regexp in full that matches your input e.g.

if (blah =~ /\\w+ \\w+ .+\/.+\/.+\/.+\/.+\/(.+)\/.+/) {
    update control {
        Tmp-String-0 := "6th /-delimited is %{1}"
    }
}
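
The difference between the two behaviours can be checked outside the server. The sketch below uses Python's `re` module as a stand-in (it is not the regex library FreeRADIUS uses, and the variable names are only illustrative): a global split returns every field, whereas a single match only ever yields the groups written into the pattern.

```python
import re

circuit_id = "POLT01 eth 1/1/02/05/4/14/1:10"

# What the original post hoped for: a global split on "/".
fields = circuit_id.split("/")
print(fields[5])

# What a single match provides: one group per pair of parentheses, nothing more.
# Same shape as the unlang pattern above, with single backslashes because
# Python raw strings have no config-parser layer to satisfy.
full = re.search(r"\w+ \w+ .+/.+/.+/.+/.+/(.+)/.+", circuit_id)
print(full.group(1))

# The compiled pattern reports how many capture groups it defines.
print(full.re.groups)
```

Both approaches pick out the 6th field here, but only the written-out pattern corresponds to what a single unlang match can expose through %{1}.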